Deep Learning in Python with PyTorch - Tutorial and Demo

- Programming, Mathematics

As I am continuing my personal journey into deep learning research and development, I wanted to try out PyTorch, a machine learning framework with GPU acceleration primarily designed for the Python programming language. However, I couldn't find any good introductory resource online for it.

So I read all of PyTorch's documentation, made my own basic working model from scratch, and figured it would be a good idea to share it with the world so others don't struggle like I did!

If you are not familiar with the principles and mathematics of deep learning, I highly suggest you refer to How to Simulate Human Intuition, my original blog post on the subject. The implementation I will be showcasing below matches the model I presented there almost perfectly, so feel free to take a look at it if you stumble on a foreign concept or something unclear.

Similarly, if you are not familiar with Python or programming in general, you can find a great beginner's guide on the Python Wiki.

And with that, let's get started!

Basic concepts


Tensors

The most important PyTorch class is torch.Tensor. As the name implies, instances of this class represent a mathematical tensor, specifically an array of numbers with an arbitrary number of dimensions. PyTorch supports a wide variety of data types that can be stored in them, and it is also possible to configure them to be handled by a CPU or a GPU. By default, tensors are CPU-based and store 32-bit floating point data.

Once instantiated, PyTorch tensors can be used just like normal mathematical tensors, and PyTorch natively supports a wide variety of common mathematical operations for this purpose.

One of the greatest strengths of a PyTorch tensor is that, when created with requires_grad=True, it can record the operations applied to it so that a gradient can be calculated later during backward propagation. The computed gradient is then available for retrieval through the tensor object itself, in its grad attribute.
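
To make the gradient recording described above concrete, here is a minimal sketch. The tensor values and the function y = sum(x^2) are my own choices for illustration; they are not part of the demo further down.

```python
import torch

# requires_grad=True asks PyTorch to record the operations applied to x.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# y = sum(x^2), so the gradient of y with respect to x is 2x.
y = (x ** 2).sum()

# Backward propagation stores the gradient in x.grad.
y.backward()
print(x.grad)  # tensor([2., 4., 6.])
```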

Creating and using tensors

The base torch package contains various constructor and mathematical operator functions for PyTorch tensors.

In general, PyTorch tensors are constructed using the torch.tensor() function, which should not be confused with the torch.Tensor class.
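
As a rough illustration of the constructors and operators mentioned above, the following sketch builds a few tensors and combines them. The variable names and values are arbitrary examples, not anything taken from the demo.

```python
import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # built from a nested Python list
b = torch.ones(2, 2)                         # 2x2 tensor filled with ones
c = a + b                                    # element-wise addition
d = torch.matmul(a, b)                       # matrix product

print(a.dtype)  # torch.float32, the default floating point type
print(c)
print(d)
```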

Neural networks

Neural networks in PyTorch are designed to be custom classes that inherit the base class torch.nn.Module. The programmer should define the structure of the model of the network in the constructor of the custom subclass, then override its forward() method with the mathematical operations to apply in order to compute the simulated output from a normal input.
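
The pattern described above can be sketched as follows. This is only an illustration: the class name TinyNetwork is hypothetical, and I am assuming the built-in torch.nn.Linear layer for brevity, whereas the demo below builds its weight matrices by hand.

```python
import torch

class TinyNetwork(torch.nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        # Structure of the model: one fully connected layer, 2 inputs to 3 outputs.
        self.layer = torch.nn.Linear(2, 3)

    def forward(self, x):
        # Operations applied to an input to compute the output.
        return torch.nn.functional.relu(self.layer(x))

network = TinyNetwork()
output = network(torch.tensor([0.5, -0.5]))  # calling the module runs forward()
print(output.shape)  # torch.Size([3])
```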

Loss functions

Predefined loss functions are defined in the torch.nn package, among other things.
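
For example, the mean squared error used later in the demo can be tried on its own. The output and target values below are made up for illustration.

```python
import torch

loss_function = torch.nn.MSELoss()

output = torch.tensor([1.0, 2.0])
target = torch.tensor([0.0, 4.0])

# Mean of (output - target)^2 = ((1)^2 + (-2)^2) / 2 = 2.5
loss = loss_function(output, target)
print(loss)  # tensor(2.5000)
```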


Optimizers

The torch.optim package contains a number of predefined optimization algorithms that update a network's parameters using the gradients computed during backward propagation. Since perfect gradient descent is pretty much impossible due to the resources that would be required, several approximations are included.
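
Here is a minimal sketch of a single optimization step with stochastic gradient descent. The one-element tensor w, the loss w^2 and the learning rate of 0.1 are assumptions chosen so the arithmetic is easy to follow by hand.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

optimizer.zero_grad()   # clear any previously accumulated gradient
loss = (w ** 2).sum()   # gradient of w^2 is 2w = 2 at w = 1
loss.backward()         # backward propagation fills w.grad
optimizer.step()        # w <- w - 0.1 * 2 = 0.8
print(w)
```

Note that backward() computes the gradient, while step() only applies it. The demo below follows the same zero, backward, step sequence.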


Parameters

The torch.nn.Parameter class is a subclass of torch.Tensor. It behaves pretty much the same as a normal PyTorch tensor, except that it is designed to be stored as an attribute of a torch.nn.Module for easy retrieval by PyTorch optimizers.
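
The difference from a plain tensor can be seen in this small sketch. The Scale module and its attribute names are hypothetical and only serve to show which attributes a module reports as parameters.

```python
import torch

class Scale(torch.nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.factor = torch.nn.Parameter(torch.ones(3))  # registered as a parameter
        self.offset = torch.zeros(3)                     # plain tensor, not registered

    def forward(self, x):
        return x * self.factor + self.offset

module = Scale()
names = [name for name, _ in module.named_parameters()]
print(names)  # ['factor']
```

Only factor would be handed to an optimizer through module.parameters(), which is why the demo wraps its weight matrices in torch.nn.Parameter.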


Demo

I believe that the best way to learn PyTorch beyond its basic concepts is to have a functional demo with plenty of annotations, so that's what I did! It's not the best model ever, but it is relatively simple to understand.

I am releasing the code below under the MIT No Attribution License, and it has been verified to run correctly with Python 3.8.6 and PyTorch 1.8.1 for CUDA 10.2 on Windows. It should take less than 5 minutes to complete on a modern computer. You can find a downloadable version here:

Note that GPU acceleration is disabled by default. In fact, I experienced worse performance with GPU acceleration enabled on my machine than with CPU only!

While reading the source code of this demo, feel free to refer to PyTorch's documentation for more details about any unclear references to the torch package that I'm using.

# Import relevant Python modules.
import math
import time

# Import PyTorch.
import torch

# Constants to be customized by the programmer.
do_print_debug = False  # If all inputs and computations should be printed to manually verify them
use_cuda = False  # If CUDA-based GPU acceleration should be enabled
rng_seed = 123456  # Integer for the random number generator (RNG) seed, or set to None for non-deterministic RNG
layer_model = [2, 10, 5, 2]  # The size of each layer of the neural network, from input to hidden layers to output
nb_grad_iterations = 10000  # The number of gradient descent iterations to perform
batch_size = 5  # The number of input/output pairs to process before each gradient descent
learning_factor = 0.1  # The learning factor to be applied during gradient descent
grad_max = 1.0  # The maximum size a gradient can have before gradient descent, to prevent divergence
validation_inputs = [[0.1, 0.8], [0.5, 0.5], [0.0, 0.0]]  # A list of inputs to validate the trained neural network on

# Define a target black box function to be simulated by a neural network here.
# The input x is a torch.Tensor containing a number of elements equal to the first element of layer_model.
# The output must be a torch.Tensor containing a number of elements equal to the last element of layer_model.
def target_function(x):
    """ In this example, we use the function (x[0],x[1]) -> (abs(x[0]-x[1]),sqrt(x[0]^2+x[1]^2)).
    x: A torch.Tensor containing 2 elements.
    return: A torch.Tensor containing 2 elements.
    """
    return torch.tensor([abs(x[0] - x[1]), math.sqrt(x[0]**2 + x[1]**2)])

# Define a function that will generate inputs here.
# It must return a torch.Tensor containing a number of elements equal to the first element of layer_model.
def get_next_input():
    """ In this example, we return a torch.Tensor containing 2 random elements in the range [0,1)."""
    return torch.rand(2)


# Set PyTorch's RNG.
if rng_seed is not None:
    torch.manual_seed(rng_seed)  # known state
else:
    torch.seed()  # non-deterministic RNG

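if use_cuda:
    # Set PyTorch to use CUDA-based GPU tensors instead of default CPU tensors.
    # (Assumed call: torch.set_default_tensor_type is the PyTorch 1.8-era way to do this.)
    torch.set_default_tensor_type(torch.cuda.FloatTensor)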

def print_debug(to_print):
    """to_print: A string."""
    if do_print_debug:
        print(to_print)

def print_debug_parameters(network):
    """network: A torch.nn.Module."""
    if do_print_debug:
        for parameter in network.parameters():
            print(parameter)

def print_debug_gradients(network):
    """network: A torch.nn.Module."""
    if do_print_debug:
        for parameter in network.parameters():
            # PyTorch embeds gradients into its parameters.
            print(parameter.grad)

class NeuralNetworkTestModule(torch.nn.Module):
    """This class defines the neural network."""
    bias = torch.ones(1)  # Constant equal to [1]
    def __init__(self, layer_sizes):
        """layer_sizes: The size of each layer of the neural network, from input to output."""
        super(NeuralNetworkTestModule, self).__init__()
        self.layer_sizes = layer_sizes
        for i in range(len(layer_sizes) - 1):
            # Create weights matrix, with random elements to break symmetry during gradient descent.
            # Note that the vertical dimension has an extra row in order to take bias into account.
            # Python's setattr() is used for compatibility with the inherited parameters() method.
            setattr(self, "weights_" + str(i), torch.nn.Parameter(torch.rand(layer_sizes[i+1], layer_sizes[i] + 1)))
    def forward(self, x):
        """Returns the trained output of the model for a given torch.Tensor x through forward propagation."""
        for i in range(len(self.layer_sizes) - 1):
            # Include bias in x, then multiply by the weights matrix, then apply ReLU as the activation function.
            # Note that x is updated while transitioning from layer to layer.
            x = torch.nn.functional.relu(torch.matmul(getattr(self, "weights_" + str(i)), torch.cat([x, self.bias])))
        return x

# Instantiate the neural network.
test_network = NeuralNetworkTestModule(layer_model)
print_debug("Initial weights:")
print_debug_parameters(test_network)

# Define the loss function as the mean squared error, equal in this context to the mean of all (output-target)^2.
loss_function = torch.nn.MSELoss()

# Define the learning algorithm as the stochastic gradient descent, an approximation of perfect gradient descent.
optimizer = torch.optim.SGD(test_network.parameters(), lr=learning_factor)

# Start measuring learning time.
time_start = time.time()

# Learn.
batch_output = [None] * batch_size
batch_target = [None] * batch_size
for grad_iteration in range(nb_grad_iterations):
    print_debug("Iteration " + str(grad_iteration))
    # Reset all gradients inside test_network to 0.
    test_network.zero_grad()
    # Process batch of input/output pairs.
    for batch_pos in range(batch_size):
        input = get_next_input()
        batch_output[batch_pos] = test_network(input)  # Calls test_network.forward(input)
        batch_target[batch_pos] = target_function(input)  # Output of black boxed function to simulate
        if do_print_debug:
            print("Batch element " + str(batch_pos))
            print("Input: " + str(input))
            print("Output: " + str(batch_output[batch_pos]))
            print("Target: " + str(batch_target[batch_pos]))
    # Update loss function.
    loss = loss_function(torch.stack(batch_output), torch.stack(batch_target))
    print_debug("Loss: " + str(loss))
    # Compute gradients through backward propagation.
    loss.backward()
    print_debug("Computed gradients:")
    print_debug_gradients(test_network)
    # Force gradients to be within [-grad_max,grad_max] to prevent divergence.
    torch.nn.utils.clip_grad_value_(test_network.parameters(), grad_max)
    print_debug("Clipped gradients:")
    print_debug_gradients(test_network)
    # Update weights by stepping against the clipped gradients (gradient descent).
    optimizer.step()
    print_debug("Updated weights:")
    print_debug_parameters(test_network)

# Print learning time.
time_end = time.time()
print("Learning time in seconds: " + str(time_end - time_start))

print("--- VALIDATION ---")
for validation_input in validation_inputs:
    test_input = torch.tensor(validation_input)
    print("Input: " + str(test_input))
    # If the trained model is good, output should approximate target.
    print("Output: " + str(test_network(test_input)))
    print("Target: " + str(target_function(test_input)))
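
The gradient clipping step used in the learning loop can also be tried in isolation. In this small sketch, the parameter w and the loss are made-up values chosen so that one gradient component exceeds the limit of 1.0 and the other does not.

```python
import torch

w = torch.nn.Parameter(torch.tensor([1.0, -1.0]))

# loss = 10*w[0] + 0.5*w[1], so the gradient is [10.0, 0.5].
loss = 10.0 * w[0] + 0.5 * w[1]
loss.backward()

# Clamp every gradient element into [-1.0, 1.0], in place.
torch.nn.utils.clip_grad_value_([w], 1.0)
print(w.grad)  # tensor([1.0000, 0.5000])
```

Only the oversized component is changed, which keeps a single bad batch from throwing the weights far off course.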
