[PyTorch] MNIST Handwritten Digit Recognition Model

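MNIST (Modified National Institute of Standards and Technology) dataset: 70,000 grayscale 28x28 images of handwritten digits (0-9), split into 60,000 training images and 10,000 test images.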

1. Importing the libraries needed for PyTorch

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transfroms

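torch: the PyTorch library, used for tensor operations and building neural networks
torch.nn: used to define neural network layers
torch.nn.functional (F): functional versions of operations such as relu and log_softmax
torch.optim: defines the optimizer used during training
torchvision: image datasets and image preprocessing utilities
torchvision.transforms: module for data preprocessing (the import above aliases it as transfroms, and the code below uses that same spelling)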

Setting

1. Use the GPU (CUDA) if it is available; otherwise fall back to the CPU

device = 'cuda' if torch.cuda.is_available() else 'cpu'
torch.manual_seed(777)
if device == 'cuda':
    torch.cuda.manual_seed_all(777)
print(device + " is available")

2. Set the training hyperparameters

learning_rate = 0.001
batch_size = 64
num_classes = 10
epochs = 10

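learning_rate: the learning rate (how large a step gradient descent takes when updating the weights).
batch_size: the number of samples used in a single training step.
num_classes: the number of classes in MNIST, which is 10 (digits 0-9).
epochs: the number of full passes over the training set.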
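
To make these numbers concrete, here is a small sketch of how many weight updates the settings above imply. It assumes the standard 60,000-image MNIST training split and that the last, smaller batch is kept (which is DataLoader's default, drop_last=False); the variable names steps_per_epoch and total_steps are mine.

```python
import math

train_size = 60_000   # size of the MNIST training split
batch_size = 64       # same value as in the settings above
epochs = 10

# The last batch is smaller than batch_size but still counts as one step,
# so the number of steps per epoch is rounded up.
steps_per_epoch = math.ceil(train_size / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch)  # 938
print(total_steps)      # 9380
```

So halving batch_size roughly doubles the number of updates per epoch, which is part of why smaller batches take longer to run.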

2. Loading and preprocessing the dataset

# Load the MNIST dataset
train_set = torchvision.datasets.MNIST(
    root = './data/MNIST',
    train = True,
    download = True,
    transform = transfroms.Compose([
        transfroms.ToTensor()  # convert each image to a tensor with values in [0, 1]
    ])
)
test_set = torchvision.datasets.MNIST(
    root = './data/MNIST',
    train = False,
    download = True,
    transform = transfroms.Compose([
        transfroms.ToTensor()  # convert each image to a tensor with values in [0, 1]
    ])
)

print(f'Train set size: {len(train_set)}')
print(f'Test set size: {len(test_set)}')

image, label = train_set[0]  # first training sample: (image tensor, integer label)
print(f'Image size: {image.size()}')
print(f'Label: {label}')

DataLoader: loads the data in batches and feeds them to the network, splitting the dataset into chunks of batch_size samples for training.

from torch.utils.data import DataLoader

train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)

images, labels = next(iter(train_loader))
print(f'Batch shape: {images.size()}')  # shape of one batch: (64, 1, 28, 28)
print(f'Labels: {labels[:10]}')  # check the first 10 labels
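
As a rough picture of what batching means, here is a pure-Python sketch. The helper iterate_batches is hypothetical and is not how DataLoader is implemented (the real class also handles collation into tensors, worker processes, and so on); it only shows the split-into-chunks and shuffle behaviour.

```python
import random

def iterate_batches(samples, batch_size, shuffle, seed=0):
    """Yield lists of at most batch_size samples (illustrative helper, not the DataLoader API)."""
    indices = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(indices)  # visit the samples in a random order
    for start in range(0, len(indices), batch_size):
        yield [samples[i] for i in indices[start:start + batch_size]]

toy_dataset = list(range(10))

batches = list(iterate_batches(toy_dataset, batch_size=4, shuffle=False))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

shuffled = list(iterate_batches(toy_dataset, batch_size=4, shuffle=True))
print(shuffled)  # same samples, different order
```

The last batch is simply shorter when the dataset size is not a multiple of batch_size.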

3. Defining the CNN model

class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)  # 1 input channel, 10 filters, 5x5 kernel
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5) # 10 input channels, 20 filters, 5x5 kernel
        self.drop2D = nn.Dropout2d(p=0.25)           # dropout: zeroes whole feature maps with probability 0.25
        self.mp = nn.MaxPool2d(2)                    # 2x2 max pooling
        self.fc1 = nn.Linear(320, 100)               # fully connected layer: 320 -> 100
        self.fc2 = nn.Linear(100, 10)                # fully connected layer: 100 -> 10 (number of classes)
    
    def forward(self, x):
        x = F.relu(self.mp(self.conv1(x)))  # Conv1 -> Max Pool -> ReLU
        x = F.relu(self.mp(self.conv2(x)))  # Conv2 -> Max Pool -> ReLU
        x = self.drop2D(x)                  # dropout
        x = x.view(x.size(0), -1)           # flatten to (batch, 320)
        x = self.fc1(x)                     # fully connected layer 1
        x = self.fc2(x)                     # fully connected layer 2
        return F.log_softmax(x, dim=1)      # log-softmax: returns log-probabilities per class
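
Where does the 320 in fc1 come from? The sketch below traces the spatial size through the layers using the usual output-size formula for convolution and pooling. The helper conv_out is my own name, and it assumes no padding and stride 1 for the convolutions (the nn.Conv2d defaults), with nn.MaxPool2d(2) using a stride equal to its kernel size.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                  # MNIST images are 28x28
size = conv_out(size, kernel=5)            # conv1:    28 -> 24
size = conv_out(size, kernel=2, stride=2)  # max pool: 24 -> 12
size = conv_out(size, kernel=5)            # conv2:    12 -> 8
size = conv_out(size, kernel=2, stride=2)  # max pool:  8 -> 4

flattened = 20 * size * size               # conv2 produces 20 channels
print(size, flattened)                     # 4 320
```

If the kernel sizes or the input resolution change, the first argument of fc1 has to change with them.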

4. Training

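# Assumption: the original post does not show how model, criterion, and optimizer were created.
# NLLLoss matches the log_softmax output of ConvNet; Adam is just one common optimizer choice.
model = ConvNet().to(device)
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

model.train()  # training mode (dropout enabled)
for epoch in range(epochs):
    avg_cost = 0.0
    for data, target in train_loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()                 # reset the gradients
        hypothesis = model(data)              # forward pass (prediction)
        cost = criterion(hypothesis, target)  # compute the loss
        cost.backward()                       # backpropagation
        optimizer.step()                      # update the parameters
        avg_cost += cost.item() / len(train_loader)  # accumulate the mean as a plain float
    print('[Epoch: {:>4}] cost = {:>.9}'.format(epoch + 1, avg_cost))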
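
Since forward() already returns log-probabilities, the loss that pairs naturally with it is the negative log-likelihood. In PyTorch, nn.NLLLoss on log_softmax outputs gives the same value as nn.CrossEntropyLoss on raw scores. Here is a pure-Python sketch of that arithmetic for a single sample. The scores are made up, and the functions are simplified stand-ins for the library versions, not their implementation.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll_loss(log_probs, target):
    """Negative log-likelihood of the target class."""
    return -log_probs[target]

logits = [2.0, 1.0, 0.1]   # made-up scores for a 3-class example
target = 0

log_probs = log_softmax(logits)
loss = nll_loss(log_probs, target)

# Cross-entropy computed directly from the softmax probabilities
probs = [math.exp(z) / sum(math.exp(v) for v in logits) for z in logits]
cross_entropy = -math.log(probs[target])

print(round(loss, 4), round(cross_entropy, 4))  # both are about 0.417
```

One practical consequence: feeding log_softmax outputs into nn.CrossEntropyLoss applies log-softmax a second time. That happens to leave the values unchanged, so it is redundant rather than harmful.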

5. Evaluation

model.eval()  # switch to evaluation mode (dropout off, batch norm uses running statistics)
with torch.no_grad():  # disable gradient tracking
    correct = 0
    total = 0
    for data, target in test_loader:
        data, target = data.to(device), target.to(device)
        out = model(data)
        preds = torch.max(out, 1)[1]  # index of the highest score = predicted class
        total += len(target)
        correct += (preds == target).sum().item()  # count the correct predictions
    print('Test Accuracy: ', 100.*correct/total, '%')
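
torch.max(out, 1) returns a (values, indices) pair along dimension 1, and the [1] picks the indices, i.e. the predicted class for each sample. The same bookkeeping in plain Python looks roughly like this (the outputs are made-up log-probabilities, and argmax is a small helper of mine):

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=lambda i: row[i])

# Made-up log-probabilities for 3 samples and 3 classes
outputs = [
    [-0.1, -2.5, -3.0],
    [-2.0, -0.3, -1.9],
    [-1.2, -0.9, -1.3],
]
targets = [0, 1, 2]

preds = [argmax(row) for row in outputs]          # predicted class per sample
correct = sum(p == t for p, t in zip(preds, targets))
accuracy = 100.0 * correct / len(targets)

print(preds)     # [0, 1, 1]
print(correct)   # 2
```
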

6. Real test

import matplotlib.pyplot as plt

num_images = 10

model.eval()

# Fetch a batch of test images and their true labels
data_iter = iter(test_loader)
images, labels = next(data_iter)
images, labels = images.to(device), labels.to(device)

# Run the model
with torch.no_grad():
    outputs = model(images)
    _, predicted = torch.max(outputs, 1)

fig = plt.figure(figsize=(12, 6))
for idx in range(num_images):
    ax = fig.add_subplot(2, 5, idx + 1)
    ax.imshow(images[idx].cpu().squeeze(), cmap='gray')  # show the image (channel dimension removed)
    ax.set_title(f'True: {labels[idx].item()}, Pred: {predicted[idx].item()}')  # true label vs. prediction
    ax.axis('off')  # hide the axes

plt.tight_layout()
plt.show()

Hyperparameter tuning

num_classes = 10

learning_rate = 0.001

The loss kept dropping as the number of epochs went up, so I tried increasing the epochs.

I also tried a smaller batch_size so that finer changes in the weights would be reflected.

try 1: baseline

batch_size = 128
epochs = 5

try 2

batch_size = 128
epochs = 10  # also tried 20

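The result improved a little. With more epochs the accuracy can also drop instead, which may mean the model has overfit.
In fact, adjusting a single hyperparameter does not change the result dramatically.

In addition

I wanted to find the best number of epochs with early stopping, but I heard PyTorch does not have that feature built in, so I decided to implement it myself.

-> I put it together with help from generative AI (counter++ whenever either cost or val_loss goes up even though the epoch has increased),

and running it suggested that Epoch = 14 is about right.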
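
The post does not include the early stopping code, so here is a minimal sketch of the rule described above, not the code that was actually run. The class name EarlyStopping, the patience value, comparing against the previous epoch, and resetting the counter when both losses improve are all my assumptions. It also assumes a val_loss is available, which would need a validation split that is not part of the code in this post.

```python
class EarlyStopping:
    """Stop when the training cost or validation loss keeps getting worse.

    Hypothetical helper: patience and the reset rule are assumptions.
    """

    def __init__(self, patience=3):
        self.patience = patience
        self.counter = 0
        self.prev_cost = float('inf')
        self.prev_val_loss = float('inf')

    def step(self, cost, val_loss):
        """Record one epoch and return True when training should stop."""
        if cost > self.prev_cost or val_loss > self.prev_val_loss:
            self.counter += 1      # at least one of the two got worse
        else:
            self.counter = 0       # neither got worse: reset
        self.prev_cost, self.prev_val_loss = cost, val_loss
        return self.counter >= self.patience


# Made-up (cost, val_loss) pairs, one per epoch
history = [(0.50, 0.40), (0.30, 0.25), (0.20, 0.26),
           (0.15, 0.27), (0.12, 0.29), (0.10, 0.30)]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, (cost, val_loss) in enumerate(history, start=1):
    if stopper.step(cost, val_loss):
        stopped_at = epoch
        break

print(stopped_at)  # 5
```

In the toy history the training cost keeps falling while val_loss rises from epoch 3 onward, so the counter reaches 3 at epoch 5. In a real run, step() would be called once per epoch inside the training loop.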

try 3

batch_size = 32
epochs = 5

try 4

batch_size = 256
epochs = 5

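A large batch_size can be useful on big datasets, but on MNIST, which has only about 70,000 samples, going bigger actually hurt the result.

+) One result in try 4 was a little disappointing.

I would like to try more variations, but they take far too long to run on a CPU...!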

전기전자 힐끔 연구소