You will need some basic understanding of the concept of neural networks and, more specifically, convolutional neural networks. Here are some useful pointers:
In this task, you will not need to design a neural network or train it. We will use an existing pre-trained network that is freely available online.
The network that we will use is called VGG-19, developed by Karen Simonyan and Andrew Zisserman, and described in the following technical report:
Karen Simonyan and Andrew Zisserman (2014): Very deep convolutional networks for large-scale image recognition, arXiv:1409.1556.
The input of the network is an RGB image of dimensions 224 × 224, and the outputs of the network are classification labels; there are 1000 possible labels that the network can output.
The network has 19 weight layers (16 convolution layers and 3 fully connected layers), plus 5 maxpool layers, but each layer is very simple. A convolution layer calculates convolutions with a sliding window of size 3 × 3 and then applies ReLU. A maxpool layer reduces the dimensionality by replacing each 2 × 2 block with its maximum value. Finally, the fully connected layers are the same as in ordinary neural networks.
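The two simplest layer types can be sketched in a few lines of C++. This is only a minimal illustration, not the interface from nn.h: the Tensor struct, the channel-major memory layout, the weight ordering, and the function names conv3x3Relu and maxpool2x2 are all assumptions made for this example. It also assumes zero padding of one pixel, so that a convolution layer keeps the image dimensions unchanged.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical feature-map type (not from nn.h). Data is stored in
// channel-major order: element (c, y, x) is at (c * height + y) * width + x.
struct Tensor {
    int channels, height, width;
    std::vector<float> data;

    Tensor(int c, int h, int w)
        : channels(c), height(h), width(w),
          data(static_cast<std::size_t>(c) * h * w, 0.0f) {}

    float& at(int c, int y, int x) {
        return data[(static_cast<std::size_t>(c) * height + y) * width + x];
    }
    float at(int c, int y, int x) const {
        return data[(static_cast<std::size_t>(c) * height + y) * width + x];
    }
};

// 3x3 convolution with zero padding of one pixel, followed by ReLU.
// Assumed weight layout: [outChannel][inChannel][dy][dx], flattened.
// As is usual in neural networks, the kernel is not flipped.
Tensor conv3x3Relu(const Tensor& in, const std::vector<float>& weights,
                   const std::vector<float>& bias, int outChannels) {
    Tensor out(outChannels, in.height, in.width);
    for (int oc = 0; oc < outChannels; ++oc) {
        for (int y = 0; y < in.height; ++y) {
            for (int x = 0; x < in.width; ++x) {
                float sum = bias[oc];
                for (int ic = 0; ic < in.channels; ++ic) {
                    for (int dy = 0; dy < 3; ++dy) {
                        for (int dx = 0; dx < 3; ++dx) {
                            int yy = y + dy - 1;
                            int xx = x + dx - 1;
                            // Pixels outside the image count as zero.
                            if (yy < 0 || yy >= in.height ||
                                xx < 0 || xx >= in.width) continue;
                            std::size_t wi =
                                ((static_cast<std::size_t>(oc) * in.channels + ic) * 3 + dy) * 3 + dx;
                            sum += weights[wi] * in.at(ic, yy, xx);
                        }
                    }
                }
                out.at(oc, y, x) = std::max(sum, 0.0f);  // ReLU
            }
        }
    }
    return out;
}

// Replace each 2x2 block by its maximum value (height and width assumed even).
Tensor maxpool2x2(const Tensor& in) {
    Tensor out(in.channels, in.height / 2, in.width / 2);
    for (int c = 0; c < in.channels; ++c) {
        for (int y = 0; y < out.height; ++y) {
            for (int x = 0; x < out.width; ++x) {
                float m = std::max(
                    std::max(in.at(c, 2 * y, 2 * x), in.at(c, 2 * y, 2 * x + 1)),
                    std::max(in.at(c, 2 * y + 1, 2 * x), in.at(c, 2 * y + 1, 2 * x + 1)));
                out.at(c, y, x) = m;
            }
        }
    }
    return out;
}
```

This straightforward loop nest is meant to show what is computed, not how to compute it quickly; the actual data layout used by the course code may differ.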
Computation starts with a 3-dimensional array of dimensions 224 × 224 × 3. This is our input image: 224 × 224 pixels and 3 channels (RGB). Then we apply 64 different convolutions with a window of size 3 × 3 to obtain an array of dimensions 224 × 224 × 64. Another convolution layer maps this to a new array of dimensions 224 × 224 × 64, and a maxpool layer then reduces the dimensionality to 112 × 112 × 64. Note that we have increased the number of channels and decreased the image dimensions. Similar steps of interleaved convolution and maxpool layers are repeated until we are left with an array of dimensions 7 × 7 × 512, which is finally interpreted as a flat array with 7 · 7 · 512 = 25088 elements. Fully connected layers then map this to an array of 4096 elements, then to another array of 4096 elements, and finally to an array of 1000 elements, one for each classification label.
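The sequence of array dimensions can be checked with a short C++ sketch. It assumes the standard VGG-19 layout (configuration E in the technical report): five blocks with 2, 2, 4, 4 and 4 convolution layers and 64, 128, 256, 512 and 512 channels, each block followed by one maxpool layer. The names Shape, vgg19Shapes and flatSize are hypothetical and not part of nn.h.

```cpp
#include <vector>

// Dimensions of one intermediate array (hypothetical helper type).
struct Shape {
    int height, width, channels;
};

// Returns the input shape followed by the output shape of every
// convolution and maxpool layer, in order.
std::vector<Shape> vgg19Shapes() {
    // Assumed layout: configuration E of the VGG report.
    const int convsPerBlock[5] = {2, 2, 4, 4, 4};
    const int channelsPerBlock[5] = {64, 128, 256, 512, 512};

    std::vector<Shape> shapes;
    Shape s{224, 224, 3};  // RGB input image
    shapes.push_back(s);
    for (int b = 0; b < 5; ++b) {
        for (int i = 0; i < convsPerBlock[b]; ++i) {
            // A 3x3 convolution with padding keeps height and width
            // and only changes the number of channels.
            s.channels = channelsPerBlock[b];
            shapes.push_back(s);
        }
        // A 2x2 maxpool halves height and width.
        s.height /= 2;
        s.width /= 2;
        shapes.push_back(s);
    }
    return shapes;
}

// Number of elements when the array is interpreted as a flat array.
int flatSize(const Shape& s) {
    return s.height * s.width * s.channels;
}
```

Applying flatSize to the last shape gives the size of the input to the first fully connected layer; the remaining three layers then produce 4096, 4096 and 1000 elements.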
See the file nn.h for the interface and ../nn-common/nn-main.cc for an example of how the main program will call the classifier. The global variable g_weights will point to the data structure with the weights (which comes from weights.bin). The entry point to the classifier is the function evalNetwork; its only parameter, buf0, serves as both the input and the output.