[PyTorch Learning 101] Basics of PyTorch Tensor, Autograd and Partial Derivatives

Mengliu Zhao
Nov 2, 2020


Image source: https://pxhere.com/en/photo/678444

Over the weekend I was stuck on a simple question: if I have a matrix power A¹⁰, how do I use PyTorch to calculate its partial derivative with respect to A⁹?

It seems straightforward, but the problem is how to implement this in PyTorch. So I went through some basics in the autograd package.

Derivative Over a Scalar Tensor

First, let’s take a look at what we can do with the tensor class. As this nice tutorial points out, the tensor object actually carries more information than just its data value (yeah, that was all I cared about before writing this post). Specifically, for calculating derivatives, you need to know:

  1. whether the tensor is a leaf or not
  2. whether the gradient is required on this tensor or not
  3. the accumulated gradient.

Let’s break these concepts down further.

Using the example in the tutorial mentioned above, suppose we have a tensor:

a = torch.tensor(2.0, requires_grad=True)

it means: 1) a is a leaf; 2) gradient is required on this tensor; 3) since this tensor is just initialized, there’s no accumulated gradient on this tensor yet.

In other words: 1) if you print

a.is_leaf

the answer is True;

2) a.requires_grad returns True;

3) if you print a.grad, you’ll get None, as no gradient has been computed yet.
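
Putting those three checks together, here is a minimal sketch (assuming the same a as above):

import torch

a = torch.tensor(2.0, requires_grad=True)  # a scalar leaf tensor that tracks gradients

print(a.is_leaf)        # True  -- a was created directly, not by an operation
print(a.requires_grad)  # True  -- autograd will track operations on a
print(a.grad)           # None  -- no backward pass has run yet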

Similarly, we can define another leaf tensor:

b = torch.tensor(3.0, requires_grad=True)

And when we do the calculation below:

c = a*b

c is no longer a leaf tensor; a and b are the leaves of c’s computation graph instead. Now we can calculate the partial derivatives of c with respect to a and b, namely ∂c/∂a = b and ∂c/∂b = a.

How do we do that? Just run:

c.backward()

What will happen now? Torch autograd automatically calculates all the partial derivatives for you, and you only need to access the values as below:

a.grad

b.grad

You’ll see the corresponding values are 3.0 (∂c/∂a = b) and 2.0 (∂c/∂b = a), which is exactly as desired.
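
Here is the whole scalar example end to end, as a small sketch you can paste into a Python session:

import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
c = a * b

c.backward()            # computes dc/da and dc/db

print(a.grad)           # tensor(3.)  -- dc/da = b
print(b.grad)           # tensor(2.)  -- dc/db = a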

Then you might want to ask, what does is_leaf do? Obtaining gradients for non-leaf tensors is a more complicated issue, and the short answer is that, by default, .grad is only populated for leaf tensors. Let’s see another simple example. Given the same a, b and c tensors, we define a fourth tensor

d = torch.tensor(4.0, requires_grad=True)

e = c*d

If we call e.backward() and then print c.grad, we’ll get None and a warning saying that c is not a leaf tensor. You can double-check this by running:

c.is_leaf

and will find the answer is False.
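
A minimal, self-contained sketch of this behaviour (re-creating a, b and c so the graph is fresh):

import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
c = a * b               # non-leaf: created by an operation
d = torch.tensor(4.0, requires_grad=True)
e = c * d

e.backward()
print(c.grad)           # None, plus a UserWarning that c is not a leaf tensor
print(c.is_leaf)        # False
print(a.grad)           # tensor(12.) -- de/da = b * d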

Retain Graph

One key concept of torch autograd is that gradients are cumulative: each backward() call adds to the existing .grad values rather than overwriting them. This means that if we run c.backward() first and then e.backward(), a.grad will end up holding the sum ∂c/∂a + ∂e/∂a = b + b·d instead of a single partial derivative.

To call the two backward() passes one after the other, you also have to set the retain_graph option on the first call, because backward() frees the computation graph by default:

c.backward(retain_graph=True)

e.backward()
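
As a sketch of what the accumulation looks like in practice (again re-creating the tensors from scratch):

import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
d = torch.tensor(4.0, requires_grad=True)
c = a * b
e = c * d

c.backward(retain_graph=True)   # keep the graph so it can be reused
print(a.grad)                   # tensor(3.)  -- dc/da = b

e.backward()                    # gradients are ADDED to what is already there
print(a.grad)                   # tensor(15.) -- 3 (from c) + 12 (de/da = b*d)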

Derivative Over Matrix Tensor

Now let’s look at a more advanced example: what if we want to calculate derivatives over a matrix tensor? What differences should we pay attention to?

For example, if we have:

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)

b = a**2

What will happen if we run b.backward()? The answer is that you’ll receive a runtime error. When called without arguments, backward() only works on a scalar output, so you have to give it a scalar even if you want the derivative for each element. This is easy to arrange, because summing the elements doesn’t change the derivative with respect to each element. So you can simply change b to the following:

b = torch.sum(a**2)

Think about it: calculating the derivative of b with respect to a doesn’t change anything, simply because ∂(Σ aₖₗ²)/∂aᵢⱼ = 2·aᵢⱼ, i.e. each element of a contributes only its own term to the sum.

Now if you call b.backward() and print a.grad, you’ll see the expected result: 2·a, i.e. [[2., 4.], [6., 8.]].
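
Putting the matrix example together as a short sketch:

import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
b = torch.sum(a ** 2)   # reduce to a scalar so backward() can be called

b.backward()
print(a.grad)           # tensor([[2., 4.], [6., 8.]])  -- i.e. 2 * a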

Now It’s Time for Some Fun Stuff

Remember what we asked at the very beginning of this post? Suppose we want the partial derivative of A¹⁰ with respect to A⁹.

It’s different from the simple matrix example above, since A⁹ is normally not a leaf. The trick we can play is to initialize A as a tensor with requires_grad=False. Then A⁹ is computed outside of any autograd graph, so it becomes a leaf, and we can switch gradient tracking on for it with requires_grad_(True).

Then we can define A¹⁰ = A⁹·A, reduce it to a scalar with torch.sum as before, and call backward(). A⁹.grad then holds exactly the derivative we want.
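
Here is a minimal sketch of the trick, assuming A¹⁰ means the matrix power (so matrix multiplication is used below; the examples linked at the end have the exact version):

import torch

A = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # requires_grad=False by default

A9 = torch.matrix_power(A, 9)   # computed outside the graph, so it is a leaf
A9.requires_grad_(True)         # now ask autograd to track it

A10 = A9 @ A
loss = torch.sum(A10)           # reduce to a scalar, as in the previous section

loss.backward()
print(A9.is_leaf)               # True
print(A9.grad)                  # d(sum(A9 @ A)) / dA9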

The examples discussed in this post are here. Try it and have fun!
