What is Autograd?

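The autograd package automatically performs the differentiation required by backpropagation, which is at the core of deep learning. Every time values are passed through the model, it builds a computational graph on the fly and uses that graph to compute gradients.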

Computing gradients

Setting requires_grad=True marks a tensor as a variable to be optimized and tells autograd to track the operations applied to it, so that gradients with respect to it can be computed.

import torch

x = torch.randn(3, requires_grad=True)
y = x + 2
print(x)
print(y)

z = y * y * 3
print(z)
z = z.mean()
print(z)

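The grad_fn attribute records the operation that produced a tensor, and autograd uses it later in the backward pass. y and z are results of operations on x (the variable being optimized), so each of them has a grad_fn. x itself was created directly, so its grad_fn is None.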
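As a small illustration (not from the original notes), the sketch below inspects grad_fn and is_leaf on a tracked input, on a result computed from it, and on a plain untracked tensor. The variable name c is only for this example.

```python
import torch

x = torch.randn(3, requires_grad=True)   # leaf tensor created by the user
y = x + 2                                # result of an operation on x
c = torch.ones(3)                        # plain tensor, not tracked

print(x.is_leaf, x.grad_fn)              # leaf tensor: grad_fn is None
print(y.is_leaf, y.grad_fn)              # non-leaf: grad_fn records the addition
print(c.requires_grad, (c * 2).grad_fn)  # untracked: no grad_fn either
```

Only tensors that depend on something with requires_grad=True end up with a grad_fn.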

Once the forward computation is finished, call .backward() on the result to compute all the required gradients.

z.backward()   # backpropagate starting from z
print(x.grad)  # dz/dx: the gradient of z with respect to x
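The gradient from this example can be checked by hand. With three elements, z = mean(3 * (x + 2)^2), so dz/dx_i = 6 * (x_i + 2) / 3 = 2 * (x_i + 2). The sketch below compares that hand-derived value with what autograd returns; the name expected is introduced only for this check.

```python
import torch

x = torch.randn(3, requires_grad=True)
z = ((x + 2) * (x + 2) * 3).mean()
z.backward()

# Hand-derived gradient: dz/dx_i = 6 * (x_i + 2) / 3 = 2 * (x_i + 2)
expected = 2 * (x.detach() + 2)

print(x.grad)
print(expected)
print(torch.allclose(x.grad, expected))
```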

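However, when the result y has more than one element, .backward() cannot be called without arguments. A vector with the same shape as y has to be passed in, and autograd then computes the corresponding vector-Jacobian product.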
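A minimal sketch of what happens in that case, assuming a recent PyTorch version: calling backward() with no argument on a three-element tensor raises a RuntimeError, while passing a vector of matching shape works. The flag raised is only there to make the outcome visible.

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2          # y has three elements, so it is not a scalar

try:
    y.backward()   # no gradient argument supplied
    raised = False
except RuntimeError as err:
    raised = True
    print("backward() failed:", err)

# Supplying a vector with the same shape as y makes the call well defined
y.backward(torch.ones_like(y))
print(x.grad)      # every element is 2, since dy_i/dx_i = 2
```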

Computing the Jacobian (vector-Jacobian product)

x = torch.randn(3, requires_grad=True)

y = x * 2
for _ in range(10):
    y = y * 2
print(y)
print(y.shape)

v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float32)
y.backward(v)
print(x.grad)
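Note that y.backward(v) does not return the Jacobian itself; it accumulates the product of v with the Jacobian into x.grad. The sketch below makes this concrete by building the full Jacobian with torch.autograd.functional.jacobian and comparing. The wrapper function f is introduced here for illustration; it repeats the same doubling as the code above, so f(x) = 2048 * x.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Same computation as above: 11 doublings in total, so f(x) = 2048 * x
    y = x * 2
    for _ in range(10):
        y = y * 2
    return y

x = torch.randn(3, requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float32)

y = f(x)
y.backward(v)                 # accumulates v^T J into x.grad

J = jacobian(f, x.detach())   # full 3x3 Jacobian, here 2048 * identity
print(J)
print(x.grad)                 # matches v @ J
```

Building the full Jacobian costs one backward pass per output element, which is why backward() works with a vector instead.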

How to stop gradient tracking

During training the weights are updated repeatedly, and the update itself must not become part of the computational graph. Gradient tracking therefore has to be switched off while the weights are updated. There are three main ways to do this.

• x.requires_grad_(False)
• x.detach()
• with torch.no_grad():

a = torch.randn(2, 2, requires_grad = True)

# 1. Turn tracking off in place on the existing tensor
a.requires_grad_(False)

# 2. Create a new tensor with the same contents, detached from the graph
b = a.detach()

# 3. torch.no_grad(): the form seen most often in real code
with torch.no_grad():
    print((a ** 2).requires_grad)
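The three options differ in what they affect. The following sketch (an illustration, with the helper square invented for this example) checks each one separately: detach() returns a new tensor that shares memory with the original, no_grad() affects only what is computed inside it, and requires_grad_(False) changes the tensor itself.

```python
import torch

a = torch.randn(2, 2, requires_grad=True)

# detach(): new tensor object, same underlying memory, no gradient tracking
b = a.detach()
print(b.requires_grad)                 # False
print(b.data_ptr() == a.data_ptr())    # True: storage is shared
print(a.requires_grad)                 # True: a itself is unchanged

# no_grad(): results computed inside the block are not tracked
with torch.no_grad():
    c = a ** 2
print(c.requires_grad, c.grad_fn)      # False None

# torch.no_grad() can also be used as a decorator
@torch.no_grad()
def square(t):
    return t ** 2

print(square(a).requires_grad)         # False

# requires_grad_(False): flips the flag on the tensor itself, in place
a.requires_grad_(False)
print(a.requires_grad)                 # False
```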

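Before starting a new optimization step, the gradients of the weights must be reset to zero, because .backward() adds to the existing .grad instead of overwriting it.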

weights = torch.ones(4, requires_grad=True)

for epoch in range(10):
    # just a dummy example
    model_output = ((weights*2) + 1).sum()
    model_output.backward()
    
    print(weights.grad) # d(model_output) / d(weight)

    # optimize model, i.e. adjust weights...
    with torch.no_grad():
        weights -= 0.1 * weights.grad

    # Without this reset the gradients accumulate, which changes the final weights and output
    weights.grad.zero_()

print(weights)
print(model_output)
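To see the accumulation directly, the sketch below runs the same dummy forward and backward pass three times without resetting, then once more after zero_(). The list accumulated exists only to record the intermediate values.

```python
import torch

weights = torch.ones(4, requires_grad=True)

accumulated = []
for _ in range(3):
    out = ((weights * 2) + 1).sum()
    out.backward()
    accumulated.append(weights.grad.clone())   # no zero_() call here

print(accumulated)   # the gradient grows by 2 per element on every pass

weights.grad.zero_()
out = ((weights * 2) + 1).sum()
out.backward()
print(weights.grad)  # back to the true gradient, 2 per element
```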

Let's apply the techniques above to a small backprop example.

# y = w * x 

x = torch.tensor(1.0)
y = torch.tensor(2.0) 

# w is the weight to optimize, so set requires_grad=True
w = torch.tensor(1.0, requires_grad=True)

for epoch in range(500):

    # Forward pass: compute the prediction and the loss
    y_predicted = w * x
    loss = (y_predicted - y)**2 # L2 Loss

    # Backward pass: compute the gradient dLoss/dw
    loss.backward()

    # Run the update inside torch.no_grad() so that the optimization step
    # itself is not recorded in the graph
    with torch.no_grad():
        w -= 0.01 * w.grad

    # Reset the gradient on every iteration
    w.grad.zero_()

print(w)
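The torch.no_grad() block in the loop is not optional. As a rough sketch, assuming a recent PyTorch version, the code below attempts the same in-place update on w without it, which autograd rejects with a RuntimeError, and then repeats the update inside no_grad(). The flag failed is only for illustration.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
loss = (w * torch.tensor(1.0) - torch.tensor(2.0)) ** 2
loss.backward()   # dLoss/dw = 2 * (w - 2) = -2 at w = 1

try:
    w -= 0.01 * w.grad          # in-place update on a tracked leaf tensor
    failed = False
except RuntimeError as err:
    failed = True
    print("update rejected:", err)

with torch.no_grad():
    w -= 0.01 * w.grad          # same update, hidden from autograd

print(w)                        # 1.0 - 0.01 * (-2.0) = 1.02
print(w.requires_grad)          # still True, so training can continue
```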

Implementing gradient descent with Autograd

import torch

# Here we replace the manually computed gradient with autograd

# Linear regression
# f = w * x 

# Ground truth: f = 2 * x, so training should move w toward 2
X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)

w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

# model output
def forward(x):
    return w * x

# loss = MSE
def loss(y, y_pred):
    return ((y_pred - y)**2).mean()

# .item() extracts the Python number from a one-element tensor
print(f'Prediction before training: f(5) = {forward(5).item():.3f}')

# Training
learning_rate = 0.01
n_iters = 100

for epoch in range(n_iters):
    # predict = forward pass
    y_pred = forward(X)

    # loss
    l = loss(Y, y_pred)

    # calculate gradients = backward pass
    l.backward()

    # update weights
    #w.data = w.data - learning_rate * w.grad
    with torch.no_grad():
        w -= learning_rate * w.grad
    
    # zero the gradients after updating
    w.grad.zero_()

    if epoch % 10 == 0:
        print(f'epoch {epoch+1}: w = {w.item():.3f}, loss = {l.item():.8f}')

print(f'Prediction after training: f(5) = {forward(5).item():.3f}')
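The comment in the code mentions replacing a manually computed gradient with autograd. For this MSE loss the manual gradient is dL/dw = mean(2 * x * (w * x - y)). The sketch below, using the same data and the starting value w = 0, checks that autograd produces the same number; manual_grad is a name introduced for this comparison.

```python
import torch

X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

# Gradient from autograd
l = ((w * X - Y) ** 2).mean()
l.backward()

# Hand-derived gradient of the MSE: dL/dw = mean(2 * x * (w * x - y))
manual_grad = (2 * X * (w.detach() * X - Y)).mean()

print(w.grad)        # tensor(-30.)
print(manual_grad)   # tensor(-30.)
```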

Autograd - PyTorch Beginner 03 | Python Engineer
https://www.python-engineer.com/courses/pytorchbeginner/03-autograd/

PyTorch Gradient 관련 설명 (Autograd) [Explanation of PyTorch gradients], gaussian37's blog
https://gaussian37.github.io/dl-pytorch-gradient/#derivative-%EA%B8%B0%EB%B3%B8-%EC%98%88%EC%A0%9C-1