As I am continuing my personal journey into deep learning research and development, I wanted to try out PyTorch, a machine learning framework with GPU acceleration primarily designed for the Python programming language. However, I couldn't find any good introductory resource online for it.
So I read all of PyTorch's documentation, made my own basic working model from scratch, and figured it would be a good idea to share it with the world so others don't struggle like I did!
If you are not familiar with the principles and mathematics of deep learning, I highly suggest you refer to How to Simulate Human Intuition, my original blog post on the subject. The implementation I will be showcasing below almost perfectly matches the model I presented in it, so feel free to take a look at it if you stumble on a foreign concept or something unclear.
Similarly, if you are not familiar with Python or programming in general, you can find a great beginner's guide on the Python Wiki.
And with that, let's get started!
The most important PyTorch class is torch.Tensor. As the name implies, instances of this class represent a mathematical tensor, specifically an array of numbers of arbitrary dimensions. PyTorch supports a wide variety of data types that can be embedded in them, and it is also possible to configure them to be handled by a CPU or a GPU. By default, tensors are CPU-based and store 32-bit floating point data.
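As a quick illustration of those defaults (this snippet is my own sketch and is not part of the demo further down), the following creates a small tensor and inspects its data type and device. The variable names are arbitrary, and the GPU move only happens if CUDA is actually available.

```python
import torch

# A tensor built from Python floats uses the default data type and device.
t = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(t.dtype)   # torch.float32
print(t.device)  # cpu
print(t.shape)   # torch.Size([2, 2])

# The data type can be chosen explicitly at construction time.
t64 = torch.tensor([1, 2, 3], dtype=torch.float64)

# Move the tensor to the GPU only when one is actually available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t_dev = t.to(device)
```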
Once instantiated, PyTorch tensors can be used just like normal mathematical tensors, and PyTorch natively supports a wide variety of common mathematical operations for this purpose.
One of the greatest strengths of a PyTorch tensor is that it can record the operations that produce new tensors from it, so that a gradient can be calculated later during backward propagation. The computed gradient is then available for retrieval through the tensor object itself, in its grad attribute.
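Here is a minimal sketch of that gradient recording, assuming a tensor created with requires_grad=True. The function being differentiated (a sum of squares) is just an example I picked because its gradient is easy to check by hand.

```python
import torch

# requires_grad=True asks PyTorch to record the operations applied to x.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# y = sum(x_i^2), so dy/dx_i = 2 * x_i.
y = (x ** 2).sum()

# Backward propagation computes the gradient and stores it in x.grad.
y.backward()
print(x.grad)  # tensor([2., 4., 6.])
```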
Creating and using tensors
The torch package contains various constructor and mathematical operator functions for PyTorch tensors.
In general, PyTorch tensors are constructed using the torch.tensor() function, which should not be confused with the torch.Tensor class.
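To make this more concrete, here is a small sketch of a few constructors and operators from the torch package. The selection is mine and far from exhaustive, and the variable names are arbitrary.

```python
import torch

# Constructors.
a = torch.tensor([1.0, 2.0, 3.0])  # from explicit values
b = torch.ones(3)                  # filled with ones
z = torch.zeros(2, 3)              # 2x3 matrix filled with zeros
r = torch.rand(2)                  # uniform random values in [0, 1)

# Mathematical operators.
s = a + b               # element-wise addition
d = torch.dot(a, b)     # dot product, returned as a 0-dimensional tensor
m = torch.mv(z, a)      # matrix-vector product
c = torch.cat([a, b])   # concatenation into a single 6-element tensor
```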
Neural networks in PyTorch are designed to be custom classes that inherit from the base class torch.nn.Module. The programmer should define the structure of the network in the constructor of the custom subclass, then override its forward() method with the mathematical operations to apply in order to compute the simulated output from a given input.
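As a rough sketch of that pattern, here is a tiny network with one hidden layer. The class name TinyNetwork is hypothetical, and it relies on the predefined torch.nn.Linear layer for brevity, which is not how the demo further down builds its layers.

```python
import torch

class TinyNetwork(torch.nn.Module):  # hypothetical name, for illustration only
    def __init__(self, n_in, n_hidden, n_out):
        super().__init__()
        # The structure of the network is defined in the constructor.
        self.hidden = torch.nn.Linear(n_in, n_hidden)
        self.output = torch.nn.Linear(n_hidden, n_out)

    def forward(self, x):
        # The operations that turn an input into an output are defined here.
        x = torch.relu(self.hidden(x))
        return self.output(x)

net = TinyNetwork(2, 4, 1)
y = net(torch.rand(2))  # calling the module runs forward()
print(y.shape)          # torch.Size([1])
```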
Predefined loss functions are provided by the torch.nn package, among other things.
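For example, here is a minimal sketch using torch.nn.MSELoss, the mean squared error. The output and target values are made up so that the result can be verified by hand.

```python
import torch

loss_function = torch.nn.MSELoss()

output = torch.tensor([1.0, 2.0, 3.0])  # what a model produced
target = torch.tensor([1.0, 2.0, 5.0])  # what it should have produced

# Squared errors are (0, 0, 4), so the mean is 4/3.
loss = loss_function(output, target)
print(loss)  # tensor(1.3333)
```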
The torch.optim package contains a number of predefined optimization algorithms that update a model's parameters from the gradients computed by backward propagation. Since perfect gradient descent is pretty much impossible due to the resources it would require, several approximations are included.
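Here is a minimal sketch of a single optimization step with torch.optim.SGD. The loss (a sum of squares of a lone weight tensor) is an arbitrary example chosen so that the update is easy to compute by hand.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

optimizer.zero_grad()  # reset gradients left over from a previous step
loss = (w ** 2).sum()  # the gradient of this loss is 2 * w
loss.backward()        # backward propagation fills in w.grad
optimizer.step()       # w <- w - lr * w.grad

# The weights moved from [1.0, -2.0] to approximately [0.8, -1.6].
print(w.detach())
```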
The torch.nn.Parameter class is a subclass of torch.Tensor. It behaves pretty much the same as a normal PyTorch tensor, except that it is designed to be stored as an attribute of a torch.nn.Module for easy retrieval by PyTorch optimizers.
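The difference is easiest to see in a small sketch. The Scaler class below is a hypothetical module holding one parameter and one plain tensor; only the parameter shows up when the module's parameters are listed.

```python
import torch

class Scaler(torch.nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.scale = torch.nn.Parameter(torch.ones(3))  # registered as a parameter
        self.offset = torch.zeros(3)                    # plain tensor, not registered

module = Scaler()
names = [name for name, _ in module.named_parameters()]
print(names)                       # ['scale']
print(module.scale.requires_grad)  # True
```

This is what allows an optimizer to be given module.parameters() and find every tensor it is supposed to update.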
I believe that the best way to learn PyTorch beyond its basic concepts is to have a functional demo with plenty of annotations, so that's what I did! It's not the best model ever, but it is relatively simple to understand.
I am releasing the code below under the MIT No Attribution License, and it has been verified to run correctly with Python 3.8.6 and PyTorch 1.8.1 for CUDA 10.2 on Windows. It should take less than 5 minutes to complete on a modern computer. You can find a downloadable version here: PyTorch.py
Note that GPU acceleration is disabled by default. In fact, I experienced worse performance with GPU acceleration enabled on my machine than with CPU only!
While reading the source code of this demo, feel free to refer to PyTorch's documentation for more details about any unclear references to the torch package that I'm using.
# Import relevant Python modules.
import math
import time

# Import PyTorch.
import torch

# Constants to be customized by the programmer.
do_print_debug = False  # If all inputs and computations should be printed to manually verify them
use_cuda = False  # If CUDA-based GPU acceleration should be enabled
rng_seed = 123456  # Integer for the random number generator (RNG) seed, or set to None for non-deterministic RNG
layer_model = [2, 10, 5, 2]  # The size of each layer of the neural network, from input to hidden layers to output
nb_grad_iterations = 10000  # The number of gradient descent iterations to perform
batch_size = 5  # The number of input/output pairs to process before each gradient descent
learning_factor = 0.1  # The learning factor to be applied during gradient descent
grad_max = 1.0  # The maximum size a gradient can have before gradient descent, to prevent divergence
validation_inputs = [[0.1, 0.8], [0.5, 0.5], [0.0, 0.0]]  # A list of inputs to validate the trained neural network on

# Define a target black box function to be simulated by a neural network here.
# The input x is a torch.Tensor containing a number of elements equal to the first element of layer_model.
# The output must be a torch.Tensor containing a number of elements equal to the last element of layer_model.
def target_function(x):
    """
    In this example, we use the function (x0,x1) -> (abs(x0-x1),sqrt(x0^2+x1^2)).
    x: A torch.Tensor containing 2 elements.
    """
    return torch.tensor([abs(x[0] - x[1]), math.sqrt(x[0]**2 + x[1]**2)])

# Define a function that will generate inputs here.
# It must return a torch.Tensor containing a number of elements equal to the first element of layer_model.
def get_next_input():
    """In this example, we return a torch.Tensor containing 2 random elements in the range [0,1)."""
    return torch.rand(2)

# --- CODE STARTS HERE. ---

# Set PyTorch's RNG.
if rng_seed is not None:
    torch.manual_seed(rng_seed)  # known state
else:
    torch.seed()  # non-deterministic RNG

if use_cuda:
    # Set PyTorch to use CUDA-based GPU tensors instead of default CPU tensors.
    torch.set_default_tensor_type(torch.cuda.FloatTensor)

def print_debug(to_print):
    """to_print: A string."""
    if do_print_debug:
        print(to_print)

def print_debug_parameters(network):
    """network: A torch.nn.Module."""
    if do_print_debug:
        for parameter in network.parameters():
            print(parameter)

def print_debug_gradients(network):
    """network: A torch.nn.Module."""
    if do_print_debug:
        for parameter in network.parameters():
            # PyTorch embeds gradients into its parameters.
            print(parameter.grad)

class NeuralNetworkTestModule(torch.nn.Module):
    """This class defines the neural network."""

    bias = torch.ones(1)  # Constant bias input, equal to [1.0]

    def __init__(self, layer_sizes):
        """layer_sizes: The size of each layer of the neural network, from input to output."""
        super(NeuralNetworkTestModule, self).__init__()
        self.layer_sizes = layer_sizes
        for i in range(len(layer_sizes) - 1):
            # Create weights matrix, with random elements to break symmetry during gradient descent.
            # Note that the matrix has one extra column in order to take bias into account.
            # Python's setattr() is used for compatibility with the inherited parameters() method.
            setattr(self, "weights_" + str(i), torch.nn.Parameter(torch.rand(layer_sizes[i+1], layer_sizes[i] + 1)))

    def forward(self, x):
        """Returns the trained output of the model for a given torch.Tensor x through forward propagation."""
        for i in range(len(self.layer_sizes) - 1):
            # Include bias in x, then multiply by the weights matrix, then apply ReLU as the activation function.
            # Note that x is updated while transitioning from layer to layer.
            x = torch.nn.functional.relu(torch.mv(getattr(self, "weights_" + str(i)), torch.cat([x, self.bias])))
        return x

# Instantiate the neural network.
test_network = NeuralNetworkTestModule(layer_model)
print_debug("Initial weights:")
print_debug_parameters(test_network)
print_debug("")

# Define the loss function as the mean squared error, equal in this context to the mean of all (output-target)^2.
loss_function = torch.nn.MSELoss()

# Define the learning algorithm as the stochastic gradient descent, an approximation of perfect gradient descent.
optimizer = torch.optim.SGD(test_network.parameters(), lr=learning_factor)

# Start measuring learning time.
time_start = time.time()

# Learn.
batch_output = [None] * batch_size
batch_target = [None] * batch_size
for grad_iteration in range(nb_grad_iterations):
    print_debug("Iteration " + str(grad_iteration))

    # Reset all gradients inside test_network to 0.
    optimizer.zero_grad()

    # Process batch of input/output pairs.
    for batch_pos in range(batch_size):
        input = get_next_input()
        batch_output[batch_pos] = test_network(input)  # Calls test_network.forward(input)
        batch_target[batch_pos] = target_function(input)  # Output of black boxed function to simulate
        if do_print_debug:
            print("Batch element " + str(batch_pos))
            print("Input: " + str(input))
            print("Output: " + str(batch_output[batch_pos]))
            print("Target: " + str(batch_target[batch_pos]))

    # Update loss function.
    loss = loss_function(torch.stack(batch_output), torch.stack(batch_target))
    print_debug("Loss: " + str(loss))

    # Compute gradients through backward propagation
    loss.backward()
    print_debug("Computed gradients:")
    print_debug_gradients(test_network)

    # Force gradients to be within [-grad_max,grad_max] to prevent divergence
    torch.nn.utils.clip_grad_value_(test_network.parameters(), grad_max)
    print_debug("Clipped gradients:")
    print_debug_gradients(test_network)

    # Update weights in the direction of the clipped gradients
    optimizer.step()
    print_debug("Updated weights:")
    print_debug_parameters(test_network)
    print_debug("")

# Print learning time.
time_end = time.time()
print("Learning time in seconds: " + str(time_end - time_start))
print("")

print("--- VALIDATION ---")
for validation_input in validation_inputs:
    print("")
    test_input = torch.tensor(validation_input)
    print("Input: " + str(test_input))
    # If the trained model is good, output should approximate target.
    print("Output: " + str(test_network(test_input)))
    print("Target: " + str(target_function(test_input)))