Deep Learning in Python With PyTorch - Tutorial and Demo

- Programming, Mathematics


As I am continuing my personal journey into deep learning research and development, I wanted to try out PyTorch, a machine learning framework with GPU acceleration primarily designed for the Python programming language. However, I couldn't find any good introductory resource online for it.

So I read all of PyTorch's documentation, made my own basic working model from scratch, and figured it would be a good idea to share it with the world so others don't struggle like I did!

If you are not familiar with the principles and mathematics of deep learning, I highly suggest you refer to How to Simulate Human Intuition, my original blog post on the subject. The implementation I will be showcasing below will match almost perfectly the model I presented in it, so feel free to take a look at it if you stumble on a foreign concept or something unclear.

Similarly, if you are not familiar with Python or programming in general, you can find a great beginner's guide on the Python Wiki.

And with that, let's get started!

Basic concepts


The most important PyTorch class is torch.Tensor. As the name implies, instances of this class represent a mathematical tensor, specifically an array of numbers with an arbitrary number of dimensions. PyTorch supports a wide variety of data types that can be stored in them, and tensors can be configured to be handled by either a CPU or a GPU. By default, tensors are CPU-based and store 32-bit floating point data.

Once instantiated, PyTorch tensors can be used just like normal mathematical tensors, and PyTorch natively supports a wide variety of common mathematical operations for this purpose.

One of the greatest strengths of a PyTorch tensor is that, when it is created with requires_grad=True, it records the operations applied to it so that a gradient can be calculated later during backward propagation. The computed gradient is then available for retrieval through the tensor object itself, in its grad attribute.
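
To make the gradient recording a bit more concrete, here is a minimal sketch. The variable names x and y are mine and only serve as an illustration; the only assumption is that the tensor is created with requires_grad=True so that PyTorch tracks the operations applied to it.

```python
import torch

# Create a tensor and ask PyTorch to record the operations applied to it.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# y = sum(x^2), so the gradient of y with respect to x is 2*x.
y = (x ** 2).sum()

# Backward propagation computes the gradient of y with respect to x.
y.backward()

# The gradient is stored on the tensor object itself.
print(x.grad)  # tensor([2., 4., 6.])
```

Tensors created without requires_grad=True do not record anything, and their grad attribute stays at None.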

Creating and using tensors

The base torch package contains various constructor and mathematical operator functions for PyTorch tensors.

In general, PyTorch tensors are constructed using the torch.tensor() function, which should not be confused with the torch.Tensor class.
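
As a rough illustration of the constructors and operators mentioned above, the sketch below builds a few tensors and combines them. The values and variable names are arbitrary examples of mine, not something taken from the demo.

```python
import torch

# Build a tensor from a Python list; floating point data defaults to 32 bits.
a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(a.dtype)  # torch.float32
print(a.shape)  # torch.Size([2, 2])

# A few other constructors from the base torch package.
zeros = torch.zeros(2, 2)                           # 2x2 matrix filled with 0
ints = torch.tensor([1, 2, 3])                      # Integer data is stored as 64-bit integers
doubles = torch.tensor([1.0], dtype=torch.float64)  # Explicit data type

# Common mathematical operations.
total = a + zeros             # Element-wise addition
product = torch.matmul(a, a)  # Matrix product, with values [[7, 10], [15, 22]]
print(a.sum())                # tensor(10.)
```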

Neural networks

Neural networks in PyTorch are designed to be custom classes that inherit from the base class torch.nn.Module. The programmer should define the structure of the model in the constructor of the custom subclass, then override its forward() method with the mathematical operations to apply in order to compute the simulated output from a given input.
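
Here is a minimal sketch of that pattern. TinyNetwork is a hypothetical class of mine, and it relies on the predefined torch.nn.Linear layer for brevity, which is an assumption of this example rather than the approach used in the demo further down.

```python
import torch

class TinyNetwork(torch.nn.Module):
    """Hypothetical example: one linear layer followed by ReLU."""

    def __init__(self, nb_inputs, nb_outputs):
        super().__init__()
        # The structure of the model is defined in the constructor.
        self.layer = torch.nn.Linear(nb_inputs, nb_outputs)

    def forward(self, x):
        # The operations that turn an input into an output go here.
        return torch.nn.functional.relu(self.layer(x))

network = TinyNetwork(2, 3)
output = network(torch.tensor([0.5, 0.25]))  # Calls network.forward()
print(output.shape)  # torch.Size([3])
```

Calling the module object directly, rather than calling forward() yourself, is the usual way to run a forward pass.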

Loss functions

The torch.nn package provides predefined loss functions, among other things.
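
For instance, the mean squared error can be computed as sketched below. The output and target values are made up for the example.

```python
import torch

loss_function = torch.nn.MSELoss()

output = torch.tensor([1.0, 2.0])
target = torch.tensor([0.0, 4.0])

# Mean of (output - target)^2 = (1 + 4) / 2 = 2.5
loss = loss_function(output, target)
print(loss)  # tensor(2.5000)
```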

Optimizers

The torch.optim package contains a number of predefined optimization algorithms that update a model's parameters from the gradients computed during backward propagation. Since perfect gradient descent is pretty much impossible due to the resources it would require, several approximations are included.
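
To show how an optimizer fits together with backward propagation, here is a small sketch that minimizes (w - 3)^2 with stochastic gradient descent. The toy problem, the learning rate and the number of iterations are arbitrary choices of mine.

```python
import torch

# A single value to optimize, starting at 0.
w = torch.tensor([0.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

for _ in range(100):
    optimizer.zero_grad()          # Reset the gradients to 0
    loss = ((w - 3.0) ** 2).sum()  # The minimum is reached at w = 3
    loss.backward()                # Backward propagation fills w.grad
    optimizer.step()               # w <- w - lr * w.grad

print(w)  # Very close to 3
```

Note that the optimizer itself never computes gradients. It only reads the grad attribute that backward() filled in, which is why zero_grad(), backward() and step() are always called in that order.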

Parameters

The torch.nn.Parameter class is a subclass of torch.Tensor. Its instances behave pretty much the same as normal PyTorch tensors, except that they are designed to be stored as attributes in a torch.nn.Module for easy retrieval by PyTorch optimizers.
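
The difference with a plain tensor attribute can be sketched as follows. Scaler is a hypothetical module I made up for this example.

```python
import torch

class Scaler(torch.nn.Module):
    """Hypothetical module that multiplies its input by a learnable factor."""

    def __init__(self):
        super().__init__()
        # Assigning a Parameter as an attribute registers it with the module.
        self.factor = torch.nn.Parameter(torch.tensor([2.0]))
        # A plain tensor attribute is not registered as a parameter.
        self.offset = torch.tensor([1.0])

    def forward(self, x):
        return self.factor * x + self.offset

module = Scaler()
names = [name for name, _ in module.named_parameters()]
print(names)  # ['factor']
```

Only factor would be handed to an optimizer built from module.parameters(), so offset would stay constant during training.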

Demo

I believe that the best way to learn PyTorch beyond its basic concepts is to have a functional demo with plenty of annotations, so that's what I did! It's not the best model ever, but it is relatively simple to understand.

I am releasing the code below under the MIT No Attribution License, and it has been verified to run correctly with Python 3.8.6 and PyTorch 1.8.1 for CUDA 10.2 on Windows. It should take less than 5 minutes to complete on a modern computer. You can find a downloadable version here:

Note that GPU acceleration is disabled by default. In fact, I experienced worse performance with GPU acceleration enabled on my machine than with CPU only!

While reading the source code of this demo, feel free to refer to PyTorch's documentation for more details about any unclear references to the torch package that I'm using.

# Import relevant Python modules.
import math
import time

# Import PyTorch.
import torch

# Constants to be customized by the programmer.
do_print_debug = False  # If all inputs and computations should be printed to manually verify them
use_cuda = False  # If CUDA-based GPU acceleration should be enabled
rng_seed = 123456  # Integer for the random number generator (RNG) seed, or set to None for non-deterministic RNG
layer_model = [2, 10, 5, 2]  # The size of each layer of the neural network, from input to hidden layers to output
nb_grad_iterations = 10000  # The number of gradient descent iterations to perform
batch_size = 5  # The number of input/output pairs to process before each gradient descent
learning_factor = 0.1  # The learning factor to be applied during gradient descent
grad_max = 1.0  # The maximum size a gradient can have before gradient descent, to prevent divergence
validation_inputs = [[0.1, 0.8], [0.5, 0.5], [0.0, 0.0]]  # A list of inputs to validate the trained neural network on

# Define a target black box function to be simulated by a neural network here.
# The input x is a torch.Tensor containing a number of elements equal to the first element of layer_model.
# The output must be a torch.Tensor containing a number of elements equal to the last element of layer_model.
def target_function(x):
    """ In this example, we use the function (x[0],x[1]) -> (abs(x[0]-x[1]),sqrt(x[0]^2+x[1]^2)).
    x: A torch.Tensor containing 2 elements.
    """
    return torch.tensor([abs(x[0] - x[1]), math.sqrt(x[0]**2 + x[1]**2)])

# Define a function that will generate inputs here.
# It must return a torch.Tensor containing a number of elements equal to the first element of layer_model.
def get_next_input():
    """ In this example, we return a torch.Tensor containing 2 random elements in the range [0,1)."""
    return torch.rand(2)


# Set PyTorch's RNG.
if rng_seed is not None:
    torch.manual_seed(rng_seed)  # known state
else:
    torch.seed()  # non-deterministic RNG

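if use_cuda:
    # Set PyTorch to use CUDA-based GPU tensors instead of default CPU tensors.
    # Assumption: torch.set_default_tensor_type() is one way to do this in PyTorch 1.8.1.
    torch.set_default_tensor_type(torch.cuda.FloatTensor)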

def print_debug(to_print):
    """to_print: A string."""
    if do_print_debug:
        print(to_print)

def print_debug_parameters(network):
    """network: A torch.nn.Module."""
    if do_print_debug:
        for parameter in network.parameters():
            print(parameter)

def print_debug_gradients(network):
    """network: A torch.nn.Module."""
    if do_print_debug:
        for parameter in network.parameters():
            # PyTorch embeds gradients into its parameters.
            print(parameter.grad)

class NeuralNetworkTestModule(torch.nn.Module):
    """This class defines the neural network."""
    bias = torch.ones(1)  # Constant equal to [1]
    def __init__(self, layer_sizes):
        """layer_sizes: The size of each layer of the neural network, from input to output."""
        super(NeuralNetworkTestModule, self).__init__()
        self.layer_sizes = layer_sizes
        for i in range(len(layer_sizes) - 1):
            # Create weights matrix, with random elements to break symmetry during gradient descent.
            # Note that the vertical dimension has an extra row in order to take bias into account.
            # Python's setattr() is used for compatibility with the inherited parameters() method.
            setattr(self, "weights_" + str(i), torch.nn.Parameter(torch.rand(layer_sizes[i+1], layer_sizes[i] + 1)))
    def forward(self, x):
        """Returns the trained output of the model for a given torch.Tensor x through forward propagation."""
        for i in range(len(self.layer_sizes) - 1):
            # Include bias in x, then multiply by the weights matrix, then apply ReLU as the activation function.
            # Note that x is updated while transitioning from layer to layer.
            x = torch.nn.functional.relu(torch.matmul(getattr(self, "weights_" + str(i)), torch.cat([x, self.bias])))
        return x

# Instantiate the neural network.
test_network = NeuralNetworkTestModule(layer_model)
print_debug("Initial weights:")
print_debug_parameters(test_network)

# Define the loss function as the mean squared error, equal in this context to the mean of all (output-target)^2.
loss_function = torch.nn.MSELoss()

# Define the learning algorithm as the stochastic gradient descent, an approximation of perfect gradient descent.
optimizer = torch.optim.SGD(test_network.parameters(), lr=learning_factor)

# Start measuring learning time.
time_start = time.time()

# Learn.
batch_output = [None] * batch_size
batch_target = [None] * batch_size
for grad_iteration in range(nb_grad_iterations):
    print_debug("Iteration " + str(grad_iteration))
    # Reset all gradients inside test_network to 0.
    optimizer.zero_grad()
    # Process batch of input/output pairs.
    for batch_pos in range(batch_size):
        input = get_next_input()
        batch_output[batch_pos] = test_network(input)  # Calls test_network.forward(input)
        batch_target[batch_pos] = target_function(input)  # Output of black boxed function to simulate
        if do_print_debug:
            print("Batch element " + str(batch_pos))
            print("Input: " + str(input))
            print("Output: " + str(batch_output[batch_pos]))
            print("Target: " + str(batch_target[batch_pos]))
    # Update loss function.
    loss = loss_function(torch.stack(batch_output), torch.stack(batch_target))
    print_debug("Loss: " + str(loss))
    # Compute gradients through backward propagation
    loss.backward()
    print_debug("Computed gradients:")
    print_debug_gradients(test_network)
    # Force gradients to be within [-grad_max,grad_max] to prevent divergence
    torch.nn.utils.clip_grad_value_(test_network.parameters(), grad_max)
    print_debug("Clipped gradients:")
    print_debug_gradients(test_network)
    # Update weights by stepping against the clipped gradients
    optimizer.step()
    print_debug("Updated weights:")
    print_debug_parameters(test_network)

# Print learning time.
time_end = time.time()
print("Learning time in seconds: " + str(time_end - time_start))

print("--- VALIDATION ---")
for validation_input in validation_inputs:
    test_input = torch.tensor(validation_input)
    print("Input: " + str(test_input))
    # If the trained model is good, output should approximate target.
    print("Output: " + str(test_network(test_input)))
    print("Target: " + str(target_function(test_input)))
