Overview: Convolutional Neural Networks (CNNs)

Oct 31, 2021

Many cars today ship with features that take work off the driver, such as lane keeping, automatic braking, and self-parking, and self-driving cars have recently started to appear on the road. Still, many of us wonder what kind of technology lets a vehicle tell a red light from a green one, or find the lines on the road. Much of this is powered by Convolutional Neural Networks.

Convolutional Neural Networks, also known as CNNs, are a type of neural network popular for image analysis, computer vision, data analysis, and classification problems. CNNs pick out patterns in their input using convolutional layers, which rely on linear algebra, specifically matrix multiplication, to match small filters against regions of an image.

Basics on Neural Networks

Neural networks are a subset of machine learning and form the core of deep learning applications. A neural network is made up of several layers: an input layer where data enters, one or more hidden layers where the data is transformed and analyzed, and an output layer that produces the result presented to the user.
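To make the three kinds of layer concrete, here is a minimal sketch of data flowing from an input layer, through one hidden layer, to an output layer. The weights, biases, and the ReLU activation are made-up assumptions for illustration; in a real network the weights are learned during training.

```python
def dense(inputs, weights, biases):
    """Each output is a weighted sum of all inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Common activation (an assumption here): negative values become 0."""
    return [max(0.0, v) for v in values]


# Input layer: the raw data (two made-up numbers).
inputs = [1.0, 2.0]

# Hidden layer: hypothetical weights and biases, two nodes.
hidden_weights = [[0.5, -1.0], [1.0, 1.0]]
hidden_biases = [0.0, -1.0]

# Output layer: hypothetical weights and bias, one node.
output_weights = [[1.0, 2.0]]
output_biases = [0.5]

hidden = relu(dense(inputs, hidden_weights, hidden_biases))
output = dense(hidden, output_weights, output_biases)

print(hidden)  # [0.0, 2.0]
print(output)  # [4.5]
```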

CNN Layers

Within a CNN, the hidden layers do the work of filtering and analyzing the data. As layers are added, the filters and the task of each layer become more and more complex. Earlier layers focus on simple features, such as colours and edges, while later layers focus on more complex features. The three main types of hidden layer are the convolutional layer, the pooling layer, and the fully connected (FC) layer. There can be multiple convolutional layers, followed by a pooling layer, and finally the FC layer.

Convolutional Layer(s)

Convolutional layers do most of the heavy lifting in a CNN. Each convolutional layer has filters that detect patterns in the input data. A filter is a small matrix whose number of rows and columns we choose; 3x3 is a common size. The process of sliding a filter over the input to check for a feature is known as convolution. The filter is applied to one area of the image at a time, and a dot product is calculated between the input pixels in that area and the filter. Each dot product is written to an output array as the filter slides, or convolves, across every 3x3 pixel area until the entire image is covered. The resulting array of dot products is known as a feature map, activation map, or convolved feature.
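The sliding dot product described above can be sketched in a few lines of plain Python. The image and filter values below are made up for illustration, and the sketch assumes the filter moves one pixel at a time with no padding around the image. Like most deep learning libraries, it does not flip the filter before multiplying.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and return the feature map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Dot product between the filter and the patch underneath it.
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Made-up 4x5 image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

# Made-up 3x3 filter that responds where brightness increases left to right.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 3, 3], [0, 3, 3]]
```

The feature map is smaller than the image (2x3 instead of 4x5) because the 3x3 filter can only sit in positions where it fits entirely inside the image.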

As the image shows, the filter looks at the input in 3x3 patches, sliding across the entire area of the image. Where a patch contains some colour, the pixel values are higher, and so is the dot product calculated from them.

In the next convolutional layer, the feature map produced by the previous layer becomes the input, and a new set of filters is convolved over it. In this example, the first layer only looked at the colour of a handwritten number to try to figure out which number it is, while the next layer looks at the specific features and characteristics of the number’s shape.

In the filters, -1 represents black, 1 represents white, and 0 represents grey. Filter 1 detects top horizontal edges, filter 2 right vertical edges, filter 3 bottom horizontal edges, and filter 4 left vertical edges. In each filter’s output, the brightest pixels fall on the edges that filter is designed to detect.
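As a rough illustration of four edge filters like these, the sketch below applies one filter per edge direction to a white square on a grey background. The filter values are hypothetical, chosen only to show the idea, and are not necessarily the ones pictured; the `convolve2d` helper is likewise an illustrative name.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over the image (one pixel at a time, no padding)."""
    k = len(kernel)
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(len(image[0]) - k + 1)
        ]
        for r in range(len(image) - k + 1)
    ]


# A white 3x3 square (1) on a grey background (0).
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

# Hypothetical filter values, for illustration only.
filters = {
    "top": [[-1, -1, -1], [1, 1, 1], [0, 0, 0]],
    "right": [[0, 1, -1], [0, 1, -1], [0, 1, -1]],
    "bottom": [[0, 0, 0], [1, 1, 1], [-1, -1, -1]],
    "left": [[-1, 1, 0], [-1, 1, 0], [-1, 1, 0]],
}

responses = {name: convolve2d(image, f) for name, f in filters.items()}
for name, response in responses.items():
    print(name, response)
# top [[2, 3, 2], [0, 0, 0], [0, 0, 0]]
# right [[0, 0, 2], [0, 0, 3], [0, 0, 2]]
# bottom [[0, 0, 0], [0, 0, 0], [2, 3, 2]]
# left [[2, 0, 0], [3, 0, 0], [2, 0, 0]]
```

Each filter produces its largest values along the side of the square it is tuned to, and zeros everywhere else.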

Looking at the simple image below, most of us can tell that it contains several shapes, edges, and colours. That seems simple, but computers used to take a good amount of time to figure this out. With CNNs, a computer can detect all of these patterns, along with patterns that we wouldn’t recognize ourselves, which is what makes features like Face ID on your phone safe and possible.

For example, let’s say we are trying to figure out if an image contains a bicycle. The computer analyzes the bike in parts: handles, pedals, wheels, and so on. It breaks the image down into parts and then puts them together to see whether they resemble a bicycle. These parts form a feature hierarchy within the CNN, as seen below.

Pooling Layers

The pooling layer has a job similar to the convolutional layer: it slides a filter across the entire input. Unlike the convolutional layer, however, its filter has no weights. A convolutional filter has weights tuned to specific features (e.g., one filter may respond strongly to a colour and another to an edge), whereas the pooling filter simply summarizes the values in its window, which reduces the size of the feature map. There are two types of pooling:

Average Pooling: As the filter slides across the input, it calculates the average value within the receptive field and sends it to the output array.

Maximum Pooling: As the filter slides across the input, it selects the pixel with the maximum value and sends it to the output array. This approach is used more often than average pooling.
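Both types of pooling can be sketched with one small function. The feature map values are made up, and the sketch assumes 2x2 windows that do not overlap, which is a common setup but not the only one.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Summarize each non-overlapping size x size window with one value."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            # Collect the values inside the current window (receptive field).
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


# Made-up 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [5, 6, 1, 2],
    [0, 2, 9, 4],
    [1, 1, 3, 8],
]

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")

print(max_pooled)  # [[6, 2], [2, 9]]
print(avg_pooled)  # [[3.75, 1.25], [1.0, 6.0]]
```

Note that there is nothing to learn here: the function only takes the maximum or the average of each window, so the 4x4 input shrinks to 2x2 without any weights involved.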

The pooled data can then be passed through a fully connected layer, or used directly as the output.

Fully Connected Layer

Final classification is done by a fully connected layer. Every node in this layer is connected to every value in the previous layer, and the layer uses these connections to finalize the analysis of the input data and form an output.
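Here is a minimal sketch of that final step, assuming a two-class problem. The pooled values, weights, and biases are made up, and turning the scores into probabilities with softmax is a common choice that this sketch assumes rather than something every network must do.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected: every output uses every input value."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


# Made-up output of a pooling layer, flattened into one list.
pooled = [[6, 2], [2, 9]]
features = [value for row in pooled for value in row]

# Hypothetical weights: one row per class, one weight per feature.
weights = [
    [0.1, 0.0, 0.0, 0.2],  # class 0
    [0.0, 0.3, 0.3, 0.0],  # class 1
]
biases = [0.0, 0.0]

scores = dense(features, weights, biases)
probabilities = softmax(scores)
predicted_class = probabilities.index(max(probabilities))

print(predicted_class)  # 0
```

With these made-up weights, class 0 gets the higher score, so it is the predicted class.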

CNN Uses

Today, CNNs are used in many everyday applications. Some uses include facial recognition, image classification, signal processing, natural language processing, speech recognition, document analysis, climate research, advertising, healthcare, and more. For example, self-driving cars rely on CNNs to classify road signs, lanes, pedestrians, and other vehicles. CNNs are most commonly used with cameras and images.

Overall, CNNs are being used more and more in today’s society!

Thanks for reading!

Visit my personal website: jayantarora.ca