Working with Softmax & Cross-Entropy

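In classification problems, the output layer typically applies softmax to turn the raw scores into a probability for each class. Training then proceeds by computing the cross-entropy loss between these probabilities and the one-hot encoded distribution of the class label.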
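To make the two steps concrete, here is a minimal sketch in plain Python for a single sample. The helper names `softmax` and `cross_entropy` and the example scores are made up for illustration; this is not how PyTorch implements it internally.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, one_hot):
    """Cross-entropy between predicted probabilities and a one-hot label."""
    return -sum(t * math.log(p) for p, t in zip(probs, one_hot) if t > 0)

logits = [0.1, 0.2, 3.9]   # illustrative raw scores for 3 classes
label = [0, 0, 1]          # one-hot label: the true class is 2

probs = softmax(logits)
loss = cross_entropy(probs, label)
print([round(p, 4) for p in probs], round(loss, 4))
```

With a one-hot label, the sum collapses to the negative log of the probability assigned to the true class, so a confident correct prediction gives a loss close to 0.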

PyTorch provides a function for this: nn.CrossEntropyLoss.

Things to watch out for when using nn.CrossEntropyLoss

1. Do not add Softmax to the last layer

nn.CrossEntropyLoss combines two modules (nn.LogSoftmax + nn.NLLLoss). Softmax is therefore already applied inside the loss, so there is no need to add it separately.
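A quick way to convince yourself of this is to compute the loss both ways and compare. The sketch below uses a made-up random batch (4 samples, 3 classes); the tensor values and variable names are only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 3)            # made-up batch: 4 samples, 3 classes
target = torch.tensor([2, 0, 1, 1])   # made-up class indices

# One call: log-softmax and NLL are applied internally.
ce = nn.CrossEntropyLoss()(logits, target)

# The same thing in two explicit steps.
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, target)

print(ce.item(), nll.item())
print(torch.allclose(ce, nll))
```

Both paths start from the same raw logits, which is why the model itself should not apply softmax before the loss.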

2. Mind the input format when passing the actual and predicted values

The actual value, target Y, should be given as class label indices, not in one-hot form. The predicted value, input Y_pred, should be the raw logits as they are (not probabilities that have already gone through softmax!).
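Both points can be checked with a small sketch. It assumes the labels arrive one-hot encoded and converts them with argmax, then compares the loss on raw logits against the loss obtained when softmax is mistakenly applied first. The tensors here are made up for illustration.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.tensor([[0.1, 0.2, 3.9],
                       [1.2, 0.1, 0.3]])      # raw scores, shape (2, 3)

# Labels that arrive one-hot encoded can be turned into class indices.
one_hot = torch.tensor([[0, 0, 1],
                        [1, 0, 0]])
target = one_hot.argmax(dim=1)                # class indices, shape (2,)

correct = criterion(logits, target)

# Mistake: softmax here means softmax is effectively applied twice.
double_softmax = criterion(torch.softmax(logits, dim=1), target)

print(target, correct.item(), double_softmax.item())
```

The second call runs without any error, which is what makes this mistake easy to miss: the loss value is simply no longer the cross-entropy of the model's predictions.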

Let's look at code that uses it directly.

import torch
import torch.nn as nn
import numpy as np

loss = nn.CrossEntropyLoss()  # call as loss(input, target), in that order!

# target is of size nBatch = 3
# Y (= target) must hold class labels (e.g. 0, 1, 2, ...), not one-hot vectors
Y = torch.tensor([2, 0, 1])

# input is of size nBatch x nClasses = 3 x 3
# Y_pred holds logits and must be passed in as raw scores
Y_pred_good = torch.tensor(
    [[0.1, 0.2, 3.9], # predict class 2
    [1.2, 0.1, 0.3],  # predict class 0
    [0.3, 2.2, 0.2]]) # predict class 1

Y_pred_bad = torch.tensor(
    [[0.9, 0.2, 0.1],
    [0.1, 0.3, 1.5],
    [1.2, 0.2, 0.5]])

l1 = loss(Y_pred_good, Y)
l2 = loss(Y_pred_bad, Y)
print(f'Batch Loss1:  {l1.item():.4f}')
print(f'Batch Loss2: {l2.item():.4f}')

# get predictions
# torch.max(input, dim) returns (max values, indices of the max) along dim;
# the indices along dim=1 are the predicted classes
_, predictions1 = torch.max(Y_pred_good, 1)
_, predictions2 = torch.max(Y_pred_bad, 1)
print(f'Actual class: {Y},\n Y_pred1: {predictions1}, Y_pred2: {predictions2}')

Comparing Binary & Multiclass Classification

For binary classification, the loss is computed with nn.BCELoss(). Unlike nn.CrossEntropyLoss covered above, this function does not include a sigmoid, so torch.sigmoid() has to be applied to the model's output separately.

# Binary classification
class NeuralNet1(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(NeuralNet1, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size) 
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, 1)  
    
    def forward(self, x):
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)

        # sigmoid has to be added at the end
        y_pred = torch.sigmoid(out)
        return y_pred

model = NeuralNet1(input_size=28*28, hidden_size=5)
criterion = nn.BCELoss()
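As a rough sketch of how this loss is then called: nn.BCELoss takes probabilities and float targets of the same shape. The logits and labels below are made up, and the hand-written formula is included only to show what the loss computes.

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()

logits = torch.tensor([[1.5], [-0.8], [0.3]])   # made-up raw outputs, shape (3, 1)
y_pred = torch.sigmoid(logits)                  # probabilities in (0, 1)

# BCELoss expects float targets with the same shape as the predictions.
y = torch.tensor([[1.0], [0.0], [1.0]])

loss = criterion(y_pred, y)

# The same quantity written out by hand: mean of -[y*log(p) + (1-y)*log(1-p)].
manual = -(y * torch.log(y_pred) + (1 - y) * torch.log(1 - y_pred)).mean()

print(loss.item(), manual.item())
```

Note the contrast with nn.CrossEntropyLoss: here the target is a float tensor shaped like the output, not a vector of integer class indices.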

For the multiclass case, nn.CrossEntropyLoss can be used as is.

# Multiclass problem
class NeuralNet2(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet2, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size) 
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, num_classes)  
    
    def forward(self, x):
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)

        # no softmax at the end!
        return out

model = NeuralNet2(input_size=28*28, hidden_size=5, num_classes=3)
criterion = nn.CrossEntropyLoss()
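Since the model returns raw logits, a natural question is how to get class predictions or probabilities out of it. The following is a minimal sketch under some assumptions: an nn.Sequential stand-in with the same layer sizes is used in place of NeuralNet2 to keep the block self-contained, and the batch is random data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-in for NeuralNet2: any module that returns raw logits works the same way.
model = nn.Sequential(nn.Linear(28 * 28, 5), nn.ReLU(), nn.Linear(5, 3))
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 28 * 28)          # made-up batch of 4 flattened images
y = torch.tensor([0, 2, 1, 2])       # made-up class indices

logits = model(x)                    # shape (4, 3), no softmax inside the model
loss = criterion(logits, y)          # training uses the logits directly

with torch.no_grad():
    probs = torch.softmax(logits, dim=1)   # only needed to read off probabilities
    preds = logits.argmax(dim=1)           # predicted class per sample

print(loss.item(), probs.sum(dim=1), preds)
```

Softmax preserves the ordering of the scores, so taking argmax over the logits picks the same class as taking it over the probabilities; softmax is only needed when the probability values themselves are of interest.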

Softmax And Cross Entropy - PyTorch Beginner 11 | Python Engineer

Learn all the basics you need to get started with this deep learning framework! In this part we learn about the softmax function and the cross entropy loss function. Softmax and cross entropy are popular functions used in neural nets, especially in multiclass classification problems. Learn the math behind these functions, and when and how to use them in PyTorch.

https://www.python-engineer.com/courses/pytorchbeginner/11-softmax-and-crossentropy/