Create A Neural Network From Scratch In PyTorch!
Learn how to create a neural network from scratch in Python using the library PyTorch.
Introduction
In this tutorial, I will guide you through the creation of a simple neural network from scratch in PyTorch. This is a practical tutorial. In the next tutorials, we will see more details about the theory of neural networks.
Objective:
The goal of this tutorial is to learn how to create a neural network in PyTorch and train it on a dataset. To do so, we will build a model able to recognize handwritten digits.
Requirements
Python installed with PyTorch.
Basic understanding of neural networks theory and machine learning.
Basic Python programming skills.
What are neural networks used for?
In general, neural networks are used to approximate a mathematical function based on a set of labelled data. In classical programming, we write explicit rules that take inputs and produce outputs. With neural networks, in contrast, we have inputs (data) and outputs (labels) and, through an optimization procedure, we adjust the parameters of the network until the model fits the dataset well.
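To make this contrast concrete, here is a tiny, self-contained sketch (separate from the MNIST example we build below): a hand-written rule versus a single-parameter "model" whose parameter is discovered by gradient descent from (input, label) pairs.

# The classical-programming way: a hand-written rule
def classical_double(x):
    return 2 * x

# The neural-network way: start from a random parameter and let
# gradient descent recover the rule from the data.
import torch

w = torch.randn(1, requires_grad=True)        # the single parameter of our tiny model
inputs = torch.tensor([1.0, 2.0, 3.0, 4.0])   # data
labels = 2 * inputs                           # labels produced by the unknown "rule"

optimizer = torch.optim.SGD([w], lr=0.05)
for _ in range(200):
    predictions = w * inputs                      # model output
    loss = ((predictions - labels) ** 2).mean()   # how wrong we are
    optimizer.zero_grad()
    loss.backward()                               # compute the gradient of the loss w.r.t. w
    optimizer.step()                              # update the parameter

print(w.item())   # close to 2.0 after training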
What is PyTorch?
An open-source machine learning framework that accelerates the path from research prototyping to production deployment.
PyTorch is an open-source Python library for machine learning, based on the Torch library and developed by Meta. PyTorch allows us to perform the tensor computations needed for deep learning. Note that PyTorch is not the only library we can use for machine learning; other libraries such as Keras & TensorFlow, scikit-learn, Theano... can also be used.
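To give a first idea of what these tensor computations look like, here is a small, self-contained example: we create two tensors, multiply them, and let PyTorch compute a gradient automatically.

import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # a 2x2 tensor
b = torch.ones(2, 2)                         # another 2x2 tensor

c = a @ b                                    # matrix multiplication
print(c)

# Autograd: PyTorch records the operations so it can compute gradients for us
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
y.backward()                                 # dy/dx = 2x
print(x.grad)                                # tensor(6.)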
Import Dependencies
First things first, let's import the basic dependencies for our project. In the next few lines I will try to explain what these dependencies are used for; the list of examples I give for each module is clearly not exhaustive:
torch: the base library, used for example to create tensors.
torch.nn: a module that gives us layers such as Linear, ReLU and Conv layers.
torch.optim: gives us optimizers such as SGD and Adam.
torch.nn.functional: gives us functions such as sigmoid and relu.
torchvision.transforms: to define transformations that will be applied to the data.
DataLoader: to load a dataset in batches and apply transforms to it.
torchvision.datasets: to load standard datasets such as MNIST.
For more details, please visit the documentation here: PyTorch documentation
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import torchvision.datasets as datasets
Create a Fully Connected Network
Now, let's create a fully connected network. For that, we will first create a class called neural_network that inherits from nn.Module. Inheriting from this parent class registers our layers and their parameters and lets PyTorch call our forward method automatically: once forward is defined, it is invoked whenever we apply the model to a given input x. Our class takes input_size and num_classes as arguments. input_size indicates the number of inputs of our neural network; in our case we will be using images of size 28x28, so input_size will be equal to 784. As for num_classes, it will be equal to 10, since we want to predict the digits from 0 to 9.
fc1 is the first layer: a Linear layer with a number of inputs equal to 28x28=784 and an output size of 50. fc2 is the second layer, with a number of inputs equal to the output of the previous layer and a number of outputs equal to num_classes (i.e. 10).
# Create a fully connected network
class neural_network(nn.Module):
    def __init__(self, input_size, num_classes):
        super(neural_network, self).__init__()
        self.input_size = input_size
        self.num_classes = num_classes
        self.fc1 = nn.Linear(self.input_size, 50)   # 784 inputs -> 50 hidden units
        self.fc2 = nn.Linear(50, self.num_classes)  # 50 hidden units -> 10 class scores

    def forward(self, x):
        x = F.relu(self.fc1(x))
        # Return raw scores (logits): nn.CrossEntropyLoss, used later,
        # applies the softmax internally, so no activation is needed here.
        return self.fc2(x)
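As a quick sanity check (a small sketch, not needed for training), we can instantiate the class with literal values and apply it to a fake batch; calling the model is equivalent to calling its forward method:

sanity_model = neural_network(input_size=784, num_classes=10)
dummy_input = torch.randn(64, 784)    # a fake batch of 64 flattened 28x28 images
output = sanity_model(dummy_input)    # same as sanity_model.forward(dummy_input)
print(output.shape)                   # torch.Size([64, 10]) -> one score per class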
Define the Device and the Hyperparameters
When the model is complex and the dataset large, we opt for a GPU rather than a CPU to speed up training and optimization. However, we first check whether a GPU is available: if it is, we use it; otherwise we fall back to the CPU and avoid the "CUDA device is not available" error.
# Define device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
GPUs are faster than CPUs for this kind of workload because they perform many operations in parallel, which makes them well suited to processing images, videos and signals in general, whereas CPUs execute instructions largely sequentially. This does not mean that GPUs are better than CPUs: CPUs remain essential for running all other software, like the operating system itself.
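If you want to see the difference yourself, here is a rough, hardware-dependent timing sketch (the matrix size is an arbitrary choice; the exact numbers will vary from machine to machine):

import time
import torch

x = torch.randn(4096, 4096)

start = time.time()
_ = x @ x                                  # large matrix multiplication on the CPU
print(f"CPU: {time.time() - start:.3f} s")

if torch.cuda.is_available():
    x_gpu = x.to("cuda")
    _ = x_gpu @ x_gpu                      # warm-up run
    torch.cuda.synchronize()               # wait for the GPU to finish before timing
    start = time.time()
    _ = x_gpu @ x_gpu
    torch.cuda.synchronize()
    print(f"GPU: {time.time() - start:.3f} s")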
We will now define the parameters we need to train our neural network. The input_size parameter indicates the number of neurons in the input layer. The batch_size indicates the number of images the neural network processes at each training step. The learning_rate is the step size of the gradient descent. num_classes indicates the number of classes. Finally, num_epochs is the number of complete passes over the training dataset.
# Hyperparameters
input_size = 28 * 28
batch_size = 64
learning_rate = 0.001
num_classes = 10
num_epochs = 20
Define the Data Augmentation
What is data augmentation and why do we need it?
Some of the problems a neural network suffers from are underfitting and overfitting. Underfitting occurs when a neural network cannot learn the necessary features from the training dataset and yields low accuracy on the training dataset itself, for example because the model is too simple or because it is trained for too few epochs on a large dataset. Overfitting occurs when a neural network learns the training dataset by heart: the model works perfectly (high accuracy) on the training dataset but performs poorly on the test dataset, often because the training data is not sufficient for the model to generalize to new, unseen examples. To address this, we can try to collect more data, but that is usually difficult. What we can do instead is augment the data, which means applying transformations to the input images such as translations, rotations, etc.
# Data-augmentation
data_transforms = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Resize(size=(28, 28))]
)
For more details about data augmentation and image transformations in PyTorch, please refer to: PyTorch Transforms
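As an illustration only (we keep the simple pipeline above for this tutorial), a pipeline with actual random augmentations could look like the sketch below; the rotation angle and translation range are arbitrary choices, and with a recent torchvision version these transforms also work on tensors:

augmented_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize(size=(28, 28)),
    transforms.RandomRotation(degrees=10),                     # small random rotations
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # small random translations
])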
Load the dataset
Now that we have created the model, defined the necessary hyperparameters and the device, it's time to load the dataset. For this tutorial, we are going to use the famous MNIST dataset, which is a dataset containing the handwritten digits from 0 to 9. Our objective is to build a model able to differentiate these digits.
# Load dataset
train_data = datasets.MNIST(root='dataset/', download=True, train=True, transform=data_transforms)
train_dataloader = DataLoader(dataset=train_data, batch_size=batch_size, shuffle=True)
test_data = datasets.MNIST(root='dataset/', download=False, train=False, transform=transforms.ToTensor())
test_dataloader = DataLoader(dataset=test_data, batch_size=2, shuffle=False)
Note that, to evaluate the performance of our model, we split the dataset into a training set and a test set.
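To verify that the data is loaded as expected, we can inspect the dataset sizes and a single batch from the training loader:

images, targets = next(iter(train_dataloader))
print(len(train_data), len(test_data))   # 60000 training images, 10000 test images
print(images.shape)                      # torch.Size([64, 1, 28, 28])
print(targets[:10])                      # the labels of the first ten images in the batch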
Define the model
Now, we will use the class we created earlier to define the model. We add .to(device) to move the model parameters to the defined device, either the GPU or the CPU.
# Define the model
model = neural_network(input_size, num_classes).to(device)
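As a small check, we can count the trainable parameters of this model: 784*50 + 50 weights and biases for fc1, plus 50*10 + 10 for fc2, i.e. 39,760 in total.

num_parameters = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(num_parameters)   # 39760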
Loss function and optimizer
After defining the model and loading the dataset, we now define the optimizer that will optimize the parameters of our neural network. For that, we will use the Adam optimizer. For the loss function, we use the cross-entropy loss, defined as follows:
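For one sample with raw class scores (logits) $z_1, \dots, z_C$ and true class $y$, nn.CrossEntropyLoss computes

$$\ell(z, y) = -\log\frac{\exp(z_y)}{\sum_{j=1}^{C} \exp(z_j)}$$

i.e. a softmax followed by a negative log-likelihood in a single, numerically stable operation. This is also why our model's forward returns raw scores without an extra activation.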
# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
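For instance, the loss expects raw scores of shape (batch, num_classes) and integer class labels, and returns a single scalar:

dummy_logits = torch.randn(4, 10)               # raw scores for 4 images and 10 classes
dummy_targets = torch.tensor([3, 7, 0, 1])      # the true digit of each image
print(criterion(dummy_logits, dummy_targets))   # one scalar loss value for the batch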
Train the model
Now that everything is set up, we will train our model and check its accuracy on the training and test sets at the end of each epoch.
def check_accuracy(loader, model):
    if loader.dataset.train:
        print("Checking the accuracy on the training dataset...")
    else:
        print("Checking the accuracy on the test dataset...")
    num_corrects = 0
    num_samples = 0
    model.eval()  # switch to evaluation mode
    with torch.no_grad():  # no gradients needed for evaluation
        for x, y in loader:
            x = x.to(device)  # .to() returns a new tensor, so reassign it
            y = y.to(device)
            x = x.reshape(x.shape[0], -1)  # flatten the images
            predictions = model(x)
            _, predictions = predictions.max(1)  # index of the highest score = predicted digit
            num_corrects += (predictions == y).sum().item()
            num_samples += x.shape[0]  # count the actual batch size, not a fixed value
    print(f"Got the following accuracy: {100 * num_corrects / num_samples:.2f}%")
    model.train()  # switch back to training mode
for epoch in range(num_epochs):
    for idx, (image, target) in enumerate(train_dataloader):
        # Move everything to device
        image = image.to(device)
        target = target.to(device)
        # Reshape: flatten each 28x28 image into a vector of 784 values
        image = image.reshape(image.shape[0], -1)  # (64, 784)
        # Predict
        prediction = model(image)
        # Compute the loss
        loss = criterion(prediction, target)
        # Update the parameters
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Check the accuracy at the end of every epoch
    print(f"Epoch {epoch + 1}/{num_epochs}")
    check_accuracy(train_dataloader, model)
    check_accuracy(test_dataloader, model)
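Once training is done, the model can be used to recognize a single handwritten digit, for example the first image of the test set:

model.eval()
with torch.no_grad():
    image, label = test_data[0]                 # one image and its true digit
    image = image.reshape(1, -1).to(device)     # flatten to (1, 784)
    predicted = model(image).argmax(dim=1).item()
print(f"Predicted digit: {predicted}, true digit: {label}")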
Finally ...
In future blog posts, we will see more details about neural network theory and practice. Stay tuned!