You will need a basic understanding of neural networks and, more specifically, of convolutional neural networks. Here are some useful pointers:

- Convolutional Neural Networks — Stanford CS231n: Convolutional Neural Networks for Visual Recognition
- Convolutional Neural Networks — Wikipedia

In this task, you will not need to design a neural network or train it. We will use an existing pre-trained network that is freely available online.

The network that we will use is called *VGG-19*, developed by Karen Simonyan and Andrew Zisserman, and described in the following technical report:

Karen Simonyan and Andrew Zisserman (2014): Very deep convolutional networks for large-scale image recognition, arXiv:1409.1556.

The network is called *“configuration E”* in the above paper. The network is available for download under the CC BY 4.0 license.

The input of the network is an RGB image of dimensions 224 × 224, and the outputs of the network are classification labels; there are 1000 possible labels that the network can output.
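As a minimal sketch of how the output might be turned into a single label: assuming the result is an array with one score per label (the name `bestLabel` and the use of `std::vector<float>` are hypothetical, not taken from `nn.h`), the predicted class is the index of the largest score.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: returns the index of the largest score.
// Assumes one score per label, e.g. 1000 entries for this network.
int bestLabel(const std::vector<float>& scores) {
    assert(!scores.empty());
    auto best = std::max_element(scores.begin(), scores.end());
    return static_cast<int>(std::distance(scores.begin(), best));
}
```

If several entries share the maximum, `std::max_element` returns the first one, so ties resolve to the smallest index.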

The network consists of 19 layers with weights (16 convolution layers and 3 fully connected layers), but each of them is very simple. *Convolution layers* compute convolutions with a sliding window of size 3 × 3 and then apply ReLU. *Maxpool layers* halve the height and the width by replacing each 2 × 2 block with its maximum value. Finally, *fully connected layers* are the same as in ordinary neural networks.
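The two layer types specific to this network can be sketched as follows. This is an unoptimized illustration, not the interface from `nn.h`: the `Tensor` struct, the function names, the memory layouts, the per-channel bias term, and the zero padding at the image border (needed so that the output keeps the same height and width as the input) are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: h rows, w columns, c channels,
// stored as data[(y * w + x) * c + channel].
struct Tensor {
    int h, w, c;
    std::vector<float> data;

    Tensor(int h_, int w_, int c_)
        : h(h_), w(w_), c(c_),
          data(static_cast<std::size_t>(h_) * w_ * c_, 0.0f) {}

    std::size_t index(int y, int x, int ch) const {
        return (static_cast<std::size_t>(y) * w + x) * c + ch;
    }
    float& at(int y, int x, int ch) { return data[index(y, x, ch)]; }
    float at(int y, int x, int ch) const { return data[index(y, x, ch)]; }
};

// 3x3 convolution with zero padding, followed by ReLU.
// Assumed weight layout: weights[((o * 3 + dy) * 3 + dx) * in.c + i].
Tensor conv3x3Relu(const Tensor& in, int outChannels,
                   const std::vector<float>& weights,
                   const std::vector<float>& bias) {
    assert(weights.size() == static_cast<std::size_t>(outChannels) * 9 * in.c);
    assert(bias.size() == static_cast<std::size_t>(outChannels));
    Tensor out(in.h, in.w, outChannels);
    for (int y = 0; y < in.h; ++y) {
        for (int x = 0; x < in.w; ++x) {
            for (int o = 0; o < outChannels; ++o) {
                float sum = bias[o];
                for (int dy = 0; dy < 3; ++dy) {
                    for (int dx = 0; dx < 3; ++dx) {
                        int yy = y + dy - 1;
                        int xx = x + dx - 1;
                        // Pixels outside the image count as zero.
                        if (yy < 0 || yy >= in.h || xx < 0 || xx >= in.w) continue;
                        for (int i = 0; i < in.c; ++i) {
                            sum += weights[((o * 3 + dy) * 3 + dx) * in.c + i]
                                   * in.at(yy, xx, i);
                        }
                    }
                }
                out.at(y, x, o) = std::max(sum, 0.0f);  // ReLU
            }
        }
    }
    return out;
}

// Replaces each 2x2 block by its maximum, separately for every channel.
Tensor maxpool2x2(const Tensor& in) {
    assert(in.h % 2 == 0 && in.w % 2 == 0);
    Tensor out(in.h / 2, in.w / 2, in.c);
    for (int y = 0; y < out.h; ++y) {
        for (int x = 0; x < out.w; ++x) {
            for (int ch = 0; ch < in.c; ++ch) {
                float a = std::max(in.at(2 * y, 2 * x, ch), in.at(2 * y, 2 * x + 1, ch));
                float b = std::max(in.at(2 * y + 1, 2 * x, ch), in.at(2 * y + 1, 2 * x + 1, ch));
                out.at(y, x, ch) = std::max(a, b);
            }
        }
    }
    return out;
}
```

Every output element of the convolution depends only on a 3 × 3 neighbourhood of the input across all input channels, so the three outer loops are independent of each other.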

Computation starts with a 3-dimensional array of dimensions 224 × 224 × 3. This is our input image: 224 × 224 pixels and 3 channels (RGB). We then apply 64 different convolutions with a window of size 3 × 3 to obtain an array of dimensions 224 × 224 × 64. Another convolution layer maps this to a new array of dimensions 224 × 224 × 64, and a maxpool layer then reduces the dimensions to 112 × 112 × 64. Note that we have increased the number of channels and decreased the image dimensions. Similar steps of interleaved convolution and maxpool layers are repeated until we are left with an array of dimensions 7 × 7 × 512, which is finally interpreted as a flat array with 7 · 7 · 512 = 25088 elements. Fully connected layers then map this to an array of 4096 elements, then to another array of 4096 elements, and finally to an array of 1000 classification labels.
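The sequence of array dimensions can be traced with a short sketch. The per-block channel counts below follow configuration E of the paper (five blocks of convolutions, each followed by one maxpool); the names `Shape` and `traceShapes` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <vector>

struct Shape {
    int h, w, c;
};

// Lists the array dimensions after the input and after every
// convolution and maxpool layer, in order.
std::vector<Shape> traceShapes() {
    // Output channels of the convolution layers in each block;
    // each block is followed by one 2x2 maxpool.
    const std::vector<std::vector<int>> blocks = {
        {64, 64},
        {128, 128},
        {256, 256, 256, 256},
        {512, 512, 512, 512},
        {512, 512, 512, 512},
    };
    Shape s{224, 224, 3};
    std::vector<Shape> trace{s};
    for (const auto& block : blocks) {
        for (int channels : block) {
            s.c = channels;  // a convolution changes only the channel count
            trace.push_back(s);
        }
        s.h /= 2;  // a maxpool halves height and width
        s.w /= 2;
        trace.push_back(s);
    }
    return trace;
}
```

The last entry of the trace is 7 × 7 × 512, which is the array that gets flattened to 25088 elements before the three fully connected layers.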

See the file `nn.h` for the interface and `../nn-common/nn-main.cc` for an example of how the main program will call the classifier. The global variable `g_weights` will point to the data structure with the weights (comes from `weights.bin`). The entry point to the classifier is function `evalNetwork`; the only parameter `buf0` serves both as the input and the output.