Transfer Learning Tutorial
From here.
In this tutorial, you will learn how to train your network using transfer learning. You can read more about transfer learning in the cs231n notes.
In practice, very few people train an entire convolutional network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (for example ImageNet, which contains 1.2 million images in 1000 classes), and then use the ConvNet either as an initialization or as a fixed feature extractor for the task of interest.
The two major transfer learning scenarios are as follows (a compact preview follows this list; both are developed in full later in this tutorial):
- Finetuning the convnet: Instead of random initialization, we initialize the network with a pretrained network, such as one trained on the imagenet 1000 dataset. The rest of the training looks as usual.
- ConvNet as fixed feature extractor: Here, we freeze the weights of the entire network except the final fully connected layer. The last fully connected layer is replaced with a new one with random weights, and only this layer is trained.
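As a rough sketch (the variable names here are illustrative only; the full versions appear below), the two setups differ only in what is frozen and which parameters the optimizer receives:
import torch.nn as nn
import torch.optim as optim
from torchvision import models

# Scenario 1: finetuning -- every parameter stays trainable.
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 2)  # new 2-class head
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Scenario 2: fixed feature extractor -- freeze everything first,
# then replace the head; only the new head's parameters are optimized.
model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)
optimizer = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)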
#!/usr/bin/env python3
# License: BSD
# Author: Sasank Chilamkurthy
from __future__ import print_function, division
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy
plt.ion()  # interactive mode
Load Data
We will use the torchvision and torch.utils.data packages to load the data.
The problem we are going to solve today is to train a model to classify ants and bees. We have about 120 training images each for ants and bees, and 75 validation images for each class. Usually this is a very small dataset to train on from scratch; since we are using transfer learning, we should be able to generalize reasonably well.
This dataset is a very small subset of ImageNet.
You can download the data from here and extract it to the current directory.
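For reference, datasets.ImageFolder (used below) derives class labels from subdirectory names, so the extracted archive is expected to have this layout:
data/hymenoptera_data/
    train/
        ants/   (about 120 training images)
        bees/   (about 120 training images)
    val/
        ants/   (75 validation images)
        bees/   (75 validation images)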
# Data augmentation and normalization for training
# Just normalization for validation
data_transforms = {
'train': transforms.Compose([
transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
]),
'val': transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
}
data_dir = 'data/hymenoptera_data'
image_datasets = {x: datasets.ImageFolder(os.path.join(
data_dir, x), data_transforms[x]) for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(
image_datasets[x], batch_size=4, shuffle=True, num_workers=4) for x in ['train', 'val']}
dataset_size = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_name = image_datasets['train'].classes
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
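A quick sanity check after loading (the exact counts depend on the downloaded archive; the approximate values follow from the numbers above):
print(class_name)    # ['ants', 'bees'] -- ImageFolder sorts class folders alphabetically
print(dataset_size)  # roughly {'train': 240, 'val': 150}
print(device)        # cuda:0 if a GPU is available, otherwise cpu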
Visualize a Few Images
Let's visualize a few training images so as to understand the data augmentation.
def imshow(inp, title=None):
    """Imshow for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))  # CHW -> HWC for matplotlib
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean  # undo the Normalize transform
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(10)  # pause a bit so that plots are updated
# Get a batch of training data
inputs, classes = next(iter(dataloaders['train']))
# Make a grid from the batch
out = torchvision.utils.make_grid(inputs)
imshow(out, title=[class_name[x] for x in classes])

Training the Model
Now let's write a general function to train a model. Here we will:
- Schedule the learning rate
- Save the best model
In the code below, the parameter scheduler is an LR scheduler object from torch.optim.lr_scheduler (see the short illustration after this paragraph).
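As a standalone illustration of the scheduler's behavior (the dummy parameter below exists only for this demo), StepLR multiplies the learning rate by gamma once every step_size epochs:
params = [torch.zeros(1, requires_grad=True)]  # dummy parameter for the demo
opt = optim.SGD(params, lr=0.1)
sched = optim.lr_scheduler.StepLR(opt, step_size=2, gamma=0.1)
for epoch in range(6):
    print(epoch, opt.param_groups[0]['lr'])  # 0.1, 0.1, 0.01, 0.01, 0.001, 0.001
    opt.step()
    sched.step()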
def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)
        # Each epoch has a training phase and a validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                scheduler.step()
                model.train()  # set model to training mode
            else:
                model.eval()  # set model to evaluation mode
            running_loss = 0.0
            running_corrects = 0
            # Iterate over the data
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)
                # zero the parameter gradients
                optimizer.zero_grad()
                # forward pass
                # track history only in the training phase
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)
                    # backward + optimize only in the training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            epoch_loss = running_loss / dataset_size[phase]
            epoch_acc = running_corrects.double() / dataset_size[phase]
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))
            # deep copy the model if it is the best so far
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
        print()
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))
    # load best model weights
    model.load_state_dict(best_model_wts)
    return model
Visualizing the Model Predictions
A generic function to display predictions for a few images:
def visualize_model(model, num_images=6):
    was_training = model.training  # remember the mode so we can restore it
    model.eval()
    images_so_far = 0
    fig = plt.figure()
    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloaders['val']):
            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images // 2, 2, images_so_far)
                ax.axis('off')
                ax.set_title('predicted: {}'.format(class_name[preds[j]]))
                imshow(inputs.cpu().data[j])
                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training)
Finetuning the convnet
Load a pretrained model and reset the final fully connected layer:
model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 2)
model_ft = model_ft.to(device)
criterion = nn.CrossEntropyLoss()
# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)
# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = torch.optim.lr_scheduler.StepLR(
optimizer_ft, step_size=7, gamma=0.1)
Train and Evaluate
It should take around 15-25 minutes on CPU; on a GPU, it takes less than a minute.
model_ft = train_model(model_ft, criterion, optimizer_ft,
exp_lr_scheduler, num_epochs=25)
visualize_model(model_ft)

ConvNet as Fixed Feature Extractor
Here we freeze all of the network except the final layer. We set requires_grad = False to freeze the parameters so that no gradients are computed for them in backward().
You can read more about this in the documentation here.
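As a quick standalone check (the tensors here are made up purely for illustration), a tensor with requires_grad=False receives no gradient during backward():
x = torch.ones(3, requires_grad=False)  # "frozen", like the backbone weights
w = torch.ones(3, requires_grad=True)   # trainable, like the new fc layer
loss = (w * x).sum()
loss.backward()
print(x.grad)  # None -- no gradient is computed or stored for frozen tensors
print(w.grad)  # tensor([1., 1., 1.])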
model_conv = models.resnet18(pretrained=True)
for param in model_conv.parameters():
    param.requires_grad = False
# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)
model_conv = model_conv.to(device)
criterion = nn.CrossEntropyLoss()
# Observe that, unlike before, only the parameters of the final layer
# are being optimized (the rest of the network is frozen)
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)
# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer_conv, step_size=7, gamma=0.1)
model_conv = train_model(model_conv, criterion, optimizer_conv,
                         exp_lr_scheduler, num_epochs=25)
visualize_model(model_conv)
plt.ioff()
plt.show()
