How to Understand and Implement Neural Networks: A Step-by-Step Guide

In our daily lives, we effortlessly recognize faces and understand voices; these tasks seem almost second nature to us. But explaining to a machine how we do these things is not easy. So, how do we make machines think? Can we teach them using examples?

Think of it like this: just as our brains need energy, machine learning algorithms need to be fed data in order to learn. Machine learning models are made up of mathematical structures that allow them to map inputs to outputs.

Imagine you want to teach a machine to recognize faces in photos. You’d give it tons of pictures with faces labeled ‘face’ and pictures without faces labeled ‘not face’. The machine learns by looking at these examples, figuring out patterns, and then guessing whether a new picture contains a face or not.

Now, let’s dive deeper into what an artificial neural network is and how it draws inspiration from the intricate workings of biological neurons to build models that simulate learning.

Let’s begin.

Overview

In this post you will learn:

What neural networks are and how they draw inspiration from biological neurons
Modeling a neural network for linear problems, then moving to multi-layer ANNs for complex problems
Concept of learning in artificial neural networks – training phase, backpropagation, optimization, generalization

Let’s get started.

1- Introduction to Artificial Neural Networks: Drawing Inspiration from Neurons

The concept of artificial neural networks comes from biological neurons. Here’s how they share similarities in terms of function and structure:

Figure 1: Biological Neural Network on the left side and Artificial Neural Network on the right side

Aspect	Biological Neurons	Artificial Neural Networks (ANNs)
Structure	The cell body (soma) processes impulses, dendrites receive signals, and axons transmit the signals to other neurons	Each artificial neuron receives numeric inputs, multiplies each input by a weight, and passes its output on to neurons in the next layer
Function	Signals accumulate through the dendrites, and the neuron fires if the combined signal surpasses a threshold	The neuron computes a weighted sum of its inputs and passes it through an activation function
Learning	Synaptic plasticity: synapses become stronger or weaker over time in response to activity changes	Backpropagation: weights are adjusted based on the error between predicted and actual outcomes

It is important to note that while ANNs draw some inspiration from biological neurons in structure and function, modern computer science research on neural networks focuses on developing models that solve the problem at hand rather than on modeling the internal workings of the brain.

Figure 2: Deep Learning research focus

2- Modeling an Artificial Neural Network

To model a neural network, let’s start with a single neuron. An artificial neuron takes the inputs $x_0$ through $x_n$, multiplies them by the weights $w_0$ through $w_n$, and sums the products to produce the output $y$.

Mathematically, the output $y$ of the neuron can be represented as:

$y = x_0w_0 + x_1w_1 + \dots + x_nw_n$

$y = \sum_{i=0}^{n} x_iw_i = w^{T}x$

This operation can be expressed as a simple matrix multiplication. Assume that the weights are stored in a row vector ($w^T$) and the inputs $x$ are stored in a column vector. The neuron multiplies each input by its corresponding weight and sums the weighted inputs; in general, this sum is then passed through an activation function $f$, so the output becomes $y = f(w^{T}x)$.
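As a minimal sketch in Python, the weighted sum and activation of a single neuron might look like this. The function names `neuron` and `sigmoid` are our own, and the sigmoid is just one assumed choice of activation function; the article does not prescribe a specific one.

```python
import math


def neuron(x, w, activation=None):
    """Output of a single artificial neuron: y = f(w^T x).

    x and w are equal-length sequences of numbers. If no activation is
    given, the neuron returns the plain weighted sum (identity activation).
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    z = sum(x_i * w_i for x_i, w_i in zip(x, w))  # w^T x
    return z if activation is None else activation(z)


def sigmoid(z):
    """Logistic sigmoid, one common choice of activation function."""
    return 1.0 / (1.0 + math.exp(-z))


x = [1.0, 2.0, 3.0]   # inputs x_0, x_1, x_2
w = [0.5, -1.0, 2.0]  # weights w_0, w_1, w_2

print(neuron(x, w))           # weighted sum: 0.5 - 2.0 + 6.0 = 4.5
print(neuron(x, w, sigmoid))  # sigmoid(4.5) ≈ 0.989
```

Calling `neuron` without an activation returns the raw weighted sum $w^{T}x$, which is exactly the linear model in the equations above.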

But why do we do this?

What we are trying to accomplish here is to approximate a function. The neuron learns to map the input data to the desired output by adjusting its weights during training. This is what allows neural networks to approximate functions and make predictions on new data.

Let’s take an example of a linear function:

Given a set of $(x, y)$ pairs, our goal is to find the weights (for example $w_0, w_1, w_2$) that best fit the data we have.

Figure 3: Example of a simple linear function
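To make "finding the weights" concrete, here is a sketch that recovers the slope of $y = 2x$ from a handful of made-up, noise-free $(x, y)$ pairs. It assumes a single weight and no bias, and uses plain gradient descent on the mean squared error; this is one of several ways to do the fit (a closed-form least-squares solution would also work).

```python
def fit_slope(xs, ys, lr=0.05, steps=100):
    """Fit y ≈ w * x by gradient descent on the mean squared error."""
    n = len(xs)
    w = 0.0  # initial guess
    for _ in range(steps):
        # dL/dw for L = (1/n) * sum((w*x - y)^2)
        grad = (2.0 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))
        w -= lr * grad  # step against the gradient
    return w


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x for x in xs]  # noise-free samples of y = 2x
w = fit_slope(xs, ys)
print(round(w, 4))  # 2.0
```

The learning rate and step count are assumed values that happen to converge quickly for this tiny dataset; a much larger learning rate would make the updates overshoot and diverge.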

However, the functions that we want to approximate might not be as simple as $y=2x$. They can be far more complex:

Figure 4: Complex function shown in the form of a Swiss roll data points distribution

We can learn more complex functions by training a network of neurons, where the outputs of one set of neurons are fed into another set of neurons as inputs. Let’s have a look at a common type of neural network: the multi-layer perceptron.

3- Artificial Neural Network: Multi-Layer Perceptron

A common type of ANN is the multi-layer perceptron (MLP), known for its versatility and ability to handle complex tasks. MLPs are referred to as feedforward networks because data flows in one direction: from the input layer, through the hidden layers, to the output layer. Let’s now break down the components of an MLP to understand its anatomy:

Figure 5: Artificial Neural Network (ANN)

Input Layer: Passes the input data to the subsequent layers. Each of its neurons represents one feature or attribute of the input data, so the number of neurons equals the number of input features (dimensions) in the dataset.
Hidden Layers: These layers learn to produce outputs that are useful to the next layers. The term “hidden” refers to the fact that their outputs do not appear in the training data, so we don’t have to explicitly specify what happens in these layers.
Output Layer: This layer has as many neurons as there are output variables. For example:
- Regression problem: For predicting the price of a car, a single output neuron $y_0$ would hold the predicted current value.
- Classification problem: For recognizing indoor and outdoor scenes, output neurons $y_0$ and $y_1$ can represent the indoor and outdoor classes.
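The three kinds of layers can be sketched as a forward pass in plain Python. This is a hypothetical toy: the sizes (3 input features, 4 hidden neurons, 2 outputs such as indoor/outdoor scores), the sigmoid hidden activation, the biases, and the random initialization are all assumptions, and because the weights are untrained the outputs carry no meaning yet.

```python
import math
import random


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: each row of `weights` is one neuron."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b  # weighted sum + bias
        outputs.append(activation(z) if activation else z)
    return outputs


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for a layer with n_out neurons."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
W1, b1 = init_layer(3, 4, rng)  # input layer (3 features) -> hidden layer (4 neurons)
W2, b2 = init_layer(4, 2, rng)  # hidden layer -> output layer (2 neurons)

x = [0.2, -0.5, 1.0]              # one sample; the input layer just passes these values on
hidden = dense(x, W1, b1, sigmoid)
output = dense(hidden, W2, b2)    # raw output scores; no output activation here
print(len(hidden), len(output))  # 4 2
```

Note that the input layer performs no computation of its own: it simply supplies the feature vector `x` to the first hidden layer.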

We just learned that we don’t have to specify what happens within the hidden layers. So how do these layers learn to approximate a function? The answer is simple: by using learning algorithms.

But before we jump into that, here is one more thing to remember about hidden layers: by increasing the depth (number of layers) and width (number of neurons per layer), we increase the network’s complexity. This allows the network to learn more complex patterns, or does it? A bigger network can also simply memorize its training data, which hurts generalization, as discussed below.

Figure 6: Increasing the depth and width of ANNs increases the network’s complexity
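One rough way to see how depth and width add complexity is to count the trainable parameters (weights plus biases) of a fully connected network. The helper below is a hypothetical sketch, and parameter count is only a proxy for a network’s capacity.

```python
def count_parameters(layer_sizes):
    """Number of weights + biases in a fully connected network.

    layer_sizes lists the neurons per layer, input layer first,
    e.g. [3, 4, 2] = 3 inputs, one hidden layer of 4, 2 outputs.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + one bias per neuron
    return total


print(count_parameters([3, 4, 2]))       # 26
print(count_parameters([3, 16, 2]))      # wider: 98
print(count_parameters([3, 16, 16, 2]))  # deeper: 370
```

Adding a second hidden layer of 16 neurons almost quadruples the parameter count here, which illustrates how quickly capacity grows with depth and width.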

4- Concept of Learning in Artificial Neural Network

Learning in this context refers to the process by which the network adjusts its parameters (such as weights and biases) based on the input data and corresponding outputs during training. Training a neural network is an optimization problem: the goal is to minimize a loss function, which measures how far the model’s predictions are from the target outputs on the training data. In essence, learning means updating the parameters of a neural network in a way that improves its performance.

To delve deeper, let’s break down the concept of learning in neural networks:

i. Training Phase:

During the training phase, the neural network receives a dataset consisting of input samples and their corresponding target outputs.
The network processes each input sample through its layers, computes predictions, and compares them with the actual target outputs.
The neural network calculates a loss based on the disparity between predicted and actual outputs, quantifying the error.
The objective of the training phase is to minimize this loss function by adjusting the parameters of the neural network, namely weights and biases.
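The article does not name a specific loss function. As an assumed example, mean squared error (MSE), a common choice for regression, quantifies the disparity between predicted and actual outputs like this:

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


predictions = [2.5, 0.0, 2.0]
targets = [3.0, -0.5, 2.0]
print(mean_squared_error(predictions, targets))  # (0.25 + 0.25 + 0.0) / 3 ≈ 0.1667
```

Squaring makes every error positive and penalizes large errors more heavily than small ones; classification tasks typically use a different loss, such as cross-entropy.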

ii. Backpropagation Algorithm:

Backpropagation is the primary mechanism through which neural networks learn.
It involves propagating the error backward through the network, layer by layer, and updating the weights and biases in a way that reduces the error.
The algorithm computes the gradient of the loss function with respect to each parameter (weight and bias) by applying the chain rule of calculus, starting at the output layer. The parameters are then adjusted in the direction that decreases the loss, i.e. opposite to the gradient.
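To illustrate the chain rule at the heart of backpropagation, the sketch below computes the gradients for a single sigmoid neuron with a squared-error loss and checks one of them against a finite-difference estimate. The neuron, loss, and numbers are assumptions chosen for illustration; in a multi-layer network the same chain-rule step is repeated layer by layer, from the output back to the input.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, x, y):
    """Loss of one sigmoid neuron on one sample: L = (sigmoid(w.x + b) - y)^2."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    a = sigmoid(z)
    return (a - y) ** 2, a


def backward(w, b, x, y):
    """Gradients of L with respect to w and b, computed with the chain rule."""
    _, a = forward(w, b, x, y)
    dL_da = 2.0 * (a - y)              # derivative of the squared error
    da_dz = a * (1.0 - a)              # derivative of the sigmoid
    dL_dz = dL_da * da_dz              # chain rule: dL/dz = dL/da * da/dz
    grad_w = [dL_dz * xi for xi in x]  # dz/dw_i = x_i
    grad_b = dL_dz                     # dz/db = 1
    return grad_w, grad_b


w, b = [0.4, -0.7], 0.1
x, y = [1.5, 2.0], 1.0
grad_w, grad_b = backward(w, b, x, y)

# Sanity check against a central finite-difference estimate for w_0
eps = 1e-6
loss_plus, _ = forward([w[0] + eps, w[1]], b, x, y)
loss_minus, _ = forward([w[0] - eps, w[1]], b, x, y)
numeric = (loss_plus - loss_minus) / (2 * eps)
print(abs(grad_w[0] - numeric) < 1e-6)  # True
```

Checking analytic gradients against finite differences like this is a common way to catch mistakes when implementing backpropagation by hand.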

iii. Optimization Techniques:

During training, practitioners employ various optimization techniques such as stochastic gradient descent (SGD), Adam, and RMSprop to effectively update the parameters of the network.
These techniques determine the magnitude and direction of parameter updates, taking into account factors like learning rate, momentum, and regularization.
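As an example of how an optimizer turns gradients into parameter updates, here is a sketch of the classical momentum update on a toy quadratic whose minimum is known. The learning rate of 0.1 and momentum of 0.9 are assumed values, and the exact gradient stands in for the mini-batch gradient that SGD would use. Deep learning libraries may write the momentum update in a slightly different but related form, so treat this as an illustration rather than any library’s implementation.

```python
def momentum_step(params, grads, velocity, lr=0.1, momentum=0.9):
    """One parameter update with classical momentum (modifies lists in place).

    velocity accumulates a decaying sum of past gradients:
        v <- momentum * v - lr * grad
        p <- p + v
    """
    for i, g in enumerate(grads):
        velocity[i] = momentum * velocity[i] - lr * g
        params[i] += velocity[i]


# Toy objective: f(p) = (p0 - 3)^2 + (p1 + 1)^2, minimum at (3, -1)
def grad_f(p):
    return [2.0 * (p[0] - 3.0), 2.0 * (p[1] + 1.0)]


params = [0.0, 0.0]
velocity = [0.0, 0.0]
for _ in range(500):
    momentum_step(params, grad_f(params), velocity)
print([round(p, 3) for p in params])  # [3.0, -1.0]
```

Momentum lets consistent gradient directions build up speed, while the learning rate scales the size of each step; adaptive methods such as Adam and RMSprop additionally rescale each parameter’s step using running statistics of its gradients.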

iv. Generalization:

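The goal of learning in neural networks is to generalize well to unseen data.
Good generalization requires the network to learn meaningful patterns and relationships from the training data rather than overfitting, i.e. memorizing the training samples, noise included, in a way that does not carry over to new data.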
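A common way to check generalization is to hold out part of the data as a validation set that the model never trains on. The sketch below does this for a toy $y = 2x$ fit with made-up noisy data, an assumed 80/20 split, and a closed-form least-squares slope; the helper names are hypothetical. A validation error much larger than the training error would be a typical sign of overfitting.

```python
import random


def train_val_split(data, val_fraction=0.2, seed=0):
    """Shuffle a copy of data and hold out a fraction for validation."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_val = int(len(items) * val_fraction)
    return items[n_val:], items[:n_val]


def mse(pairs, w):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in pairs) / len(pairs)


rng = random.Random(42)
data = [(x, 2.0 * x + rng.gauss(0.0, 0.1)) for x in [i / 10 for i in range(50)]]
train, val = train_val_split(data)

# Least-squares slope for y ≈ w * x, fitted on the training split only
w = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
print(len(train), len(val))  # 40 10
print(f"train MSE: {mse(train, w):.4f}, validation MSE: {mse(val, w):.4f}")
```

Because this model has a single parameter, it has little room to memorize noise, so the two errors should be similar; a network with many parameters can drive the training error far below the validation error.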

Learning involves iteratively adjusting the parameters of the network based on the input data and associated target outputs, to minimize the error computed by the loss function. Through backpropagation and optimization techniques, neural networks can learn to make accurate predictions on unseen data.
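Putting the pieces together, the following sketch trains a tiny MLP on the XOR problem (a classic example that a single linear neuron cannot solve) using forward passes, an MSE loss, backpropagation, and full-batch gradient descent. All sizes, the learning rate, the epoch count, and the random seed are assumptions; it is a teaching sketch in plain Python, not how production networks are trained.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_xor(hidden=4, lr=1.0, epochs=10000, seed=1):
    """Train a 2-input, one-hidden-layer, 1-output sigmoid MLP on XOR.

    Returns the mean squared error recorded at every epoch.
    """
    rng = random.Random(seed)
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
    n = len(data)
    # W1[j][i] connects input i to hidden neuron j; W2[j] connects hidden j to the output
    W1 = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    b2 = 0.0
    history = []

    for _ in range(epochs):
        # Accumulate gradients over the whole (tiny) dataset
        gW1 = [[0.0, 0.0] for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gW2 = [0.0] * hidden
        gb2 = 0.0
        loss = 0.0
        for x, y in data:
            # Forward pass
            h = [sigmoid(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(hidden)]
            out = sigmoid(sum(W2[j] * h[j] for j in range(hidden)) + b2)
            loss += (out - y) ** 2 / n
            # Backward pass (chain rule), output layer first
            d_out = 2.0 * (out - y) / n * out * (1.0 - out)  # dL/dz at the output
            gb2 += d_out
            for j in range(hidden):
                gW2[j] += d_out * h[j]
                d_h = d_out * W2[j] * h[j] * (1.0 - h[j])    # dL/dz at hidden neuron j
                gb1[j] += d_h
                gW1[j][0] += d_h * x[0]
                gW1[j][1] += d_h * x[1]
        history.append(loss)
        # Gradient descent: move every parameter against its gradient
        b2 -= lr * gb2
        for j in range(hidden):
            W2[j] -= lr * gW2[j]
            b1[j] -= lr * gb1[j]
            W1[j][0] -= lr * gW1[j][0]
            W1[j][1] -= lr * gW1[j][1]
    return history


history = train_xor()
print(f"loss at epoch 0: {history[0]:.4f}, final loss: {history[-1]:.4f}")
```

The loss should fall over training; whether it gets close to zero depends on the random initialization and the hyperparameters, which is one reason practitioners rely on better initialization schemes and optimizers such as Adam.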

Summary

In this post, we covered:

Neural networks draw inspiration from the workings of biological neurons, which receive signals from other neurons, accumulate them (each input with a different weight), and fire if the combined signal is strong enough.
In modeling a neural network, individual neurons compute weighted sums of inputs and apply activation functions to produce outputs.
Learning in neural networks involves adjusting parameters (weights and biases) based on input data and target output to minimize the loss function.
Backpropagation propagates error gradients backward through the network so that the parameters can be updated.
