Learn PyTorch Hooks in Half an Hour

taotao_2016 2019-07-26

SIGAI contributing author

尹相楠 (Yin Xiangnan)

PhD student, École Centrale de Lyon

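Whenever hooks come up, the first thing I think of is the comical Captain Hook from the animated Peter Pan, and that rush of childhood memories led me to put together the title image: Captain Hook hooking the PyTorch logo. The second is the famous Hooke's law (not actually a hook). Back in physics lab, looking at the hook under the spring scale, I would think of Hooke's miserable life in the shadow of Sir Isaac Newton and sigh that fate had put both men in the same era. This article, however, is about hooks in PyTorch.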

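First, the Wikipedia definition of hooking:

Hooking is a computer programming term covering a range of techniques used to alter or extend the behaviour of an operating system, of applications, or of other software components by intercepting function calls, messages or events passed between software modules. Code that handles such intercepted function calls, events or messages is called a hook.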
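The definition above can be illustrated without PyTorch at all. The following is a minimal sketch in plain Python; the class Hookable and its method names are hypothetical and chosen only for illustration. It mirrors the convention used later for Tensor hooks: a hook that returns None only observes, and a hook that returns a value replaces the intercepted result.

```python
from typing import Callable, List, Optional


class Hookable:
    """Hypothetical wrapper: registered hooks may inspect or replace a function's result."""

    def __init__(self, fn: Callable[[float], float]) -> None:
        self.fn = fn
        self.hooks: List[Callable[[float], Optional[float]]] = []

    def register_hook(self, hook: Callable[[float], Optional[float]]) -> None:
        self.hooks.append(hook)

    def __call__(self, x: float) -> float:
        result = self.fn(x)
        # Hooks run in registration order; returning None leaves the result unchanged.
        for hook in self.hooks:
            replaced = hook(result)
            if replaced is not None:
                result = replaced
        return result


seen = []
square = Hookable(lambda v: v * v)
square.register_hook(lambda r: seen.append(r))  # observes only (append returns None)
square.register_hook(lambda r: r + 1)           # replaces the result
value = square(3)
print(seen, value)
```

The caller of square never changes; the behaviour is altered purely by what is registered on it, which is the property the rest of the article relies on.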

Hooks are a very useful PyTorch feature. With them we can read, and even change, the values and gradients of a network's intermediate variables without changing the structure of its inputs and outputs. They are widely used to visualize intermediate features and gradients, which helps diagnose problems in a network and analyse how well it works. This article works through the use of hooks in PyTorch with code, from the simple to the more involved, in three parts:

1. Hook for Tensors: hooks on Tensors

2. Hook for Modules: hooks on network modules such as nn.Conv2d and nn.Linear

3. Guided Backpropagation: a network visualization program built on hooks

Hook for Tensors

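(Figure: the computation graph for the code below; x, y and w are leaf nodes and z is an intermediate variable.)

In PyTorch's computation graph, only leaf nodes that require gradients keep their gradients after backpropagation. The gradients of intermediate variables are used only during the backward pass and are released automatically once it finishes, which saves memory. The following code shows this: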

import torch

x = torch.Tensor([0, 1, 2, 3]).requires_grad_()
y = torch.Tensor([4, 5, 6, 7]).requires_grad_()
w = torch.Tensor([1, 2, 3, 4]).requires_grad_()
z = x+y
# z.retain_grad()

o = w.matmul(z)
# o.retain_grad()
o.backward()

print('x.requires_grad:', x.requires_grad) # True
print('y.requires_grad:', y.requires_grad) # True
print('z.requires_grad:', z.requires_grad) # True
print('w.requires_grad:', w.requires_grad) # True
print('o.requires_grad:', o.requires_grad) # True


print('x.grad:', x.grad) # tensor([1., 2., 3., 4.])
print('y.grad:', y.grad) # tensor([1., 2., 3., 4.])
print('w.grad:', w.grad) # tensor([ 4.,  6.,  8., 10.])
print('z.grad:', z.grad) # None
print('o.grad:', o.grad) # None

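z and o are intermediate variables (their values are not assigned directly but computed from other variables). Although requires_grad is True for both, their gradients are not kept after backpropagation, so z.grad and o.grad are None. To keep them, ask for it explicitly: uncomment z.retain_grad() and o.retain_grad() in the code above (each must be called before o.backward()). The output then becomes: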

x.requires_grad: True
y.requires_grad: True
z.requires_grad: True
w.requires_grad: True
o.requires_grad: True
x.grad: tensor([1., 2., 3., 4.])
y.grad: tensor([1., 2., 3., 4.])
w.grad: tensor([ 4.,  6.,  8., 10.])
z.grad: tensor([1., 2., 3., 4.])
o.grad: tensor(1.)

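However, retain_grad() increases memory usage, because the retained gradients stay alive after the backward pass. An alternative is to use a hook to read the gradient of an intermediate variable.

For an intermediate variable z, a hook is attached with z.register_hook(hook_fn), where hook_fn is a user-defined function with the signature:

hook_fn(grad) -> Tensor or None

Its input is the gradient of z, and its output is either a Tensor or None (None is typical when the hook only prints the gradient). During backpropagation, when the gradient reaches z it is passed to hook_fn before propagating any further back. If hook_fn returns None, the gradient continues unchanged; if it returns a Tensor, that Tensor replaces z's gradient for the rest of the backward pass.

In the following example hook_fn does not change the gradient; it only prints it: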

import torch

x = torch.Tensor([0, 1, 2, 3]).requires_grad_()
y = torch.Tensor([4, 5, 6, 7]).requires_grad_()
w = torch.Tensor([1, 2, 3, 4]).requires_grad_()
z = x+y

# ===================
def hook_fn(grad):
    print(grad)

z.register_hook(hook_fn)
# ===================

o = w.matmul(z)

print('=====Start backprop=====')
o.backward()
print('=====End backprop=====')

print('x.grad:', x.grad)
print('y.grad:', y.grad)
print('w.grad:', w.grad)
print('z.grad:', z.grad)

The output is:

=====Start backprop=====
tensor([1., 2., 3., 4.])
=====End backprop=====
x.grad: tensor([1., 2., 3., 4.])
y.grad: tensor([1., 2., 3., 4.])
w.grad: tensor([ 4.,  6.,  8., 10.])
z.grad: None

With hook_fn attached to z, the backward pass prints the partial derivative of o with respect to z, which matches the value obtained earlier with z.retain_grad().

Next, let us change the gradient inside hook_fn and see what happens.

import torch

x = torch.Tensor([0, 1, 2, 3]).requires_grad_()
y = torch.Tensor([4, 5, 6, 7]).requires_grad_()
w = torch.Tensor([1, 2, 3, 4]).requires_grad_()
z = x + y


# ===================
def hook_fn(grad):
    g = 2 * grad
    print(g)
    return g


z.register_hook(hook_fn)
# ===================

o = w.matmul(z)

print('=====Start backprop=====')
o.backward()
print('=====End backprop=====')

print('x.grad:', x.grad)
print('y.grad:', y.grad)
print('w.grad:', w.grad)
print('z.grad:', z.grad)

The output is:

=====Start backprop=====
tensor([2., 4., 6., 8.])
=====End backprop=====
x.grad: tensor([2., 4., 6., 8.])
y.grad: tensor([2., 4., 6., 8.])
w.grad: tensor([ 4.,  6.,  8., 10.])
z.grad: None

The gradient of z is doubled, and as a consequence the gradients of x and y are doubled as well.

In practice, for convenience, a lambda expression can replace the function:

import torch

x = torch.Tensor([0, 1, 2, 3]).requires_grad_()
y = torch.Tensor([4, 5, 6, 7]).requires_grad_()
w = torch.Tensor([1, 2, 3, 4]).requires_grad_()
z = x + y

# ===================
z.register_hook(lambda x: 2*x)
z.register_hook(lambda x: print(x))
# ===================

o = w.matmul(z)

print('=====Start backprop=====')
o.backward()
print('=====End backprop=====')

print('x.grad:', x.grad)
print('y.grad:', y.grad)
print('w.grad:', w.grad)
print('z.grad:', z.grad)

The result is the same as for the previous code. A variable can have several hook functions attached; during backpropagation they run in the order in which they were registered. Here the first hook doubles the gradient of z and the second prints it, so the printed gradient of z is twice its original value.
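The examples so far only print or rescale the gradient. To actually keep the gradient of an intermediate variable, which was the stated motivation for using a hook instead of retain_grad(), the hook can copy it into a container. The following is a minimal sketch; the dictionary saved_grads and the function save_z_grad are illustrative names, not part of PyTorch. It also uses the handle returned by register_hook to detach the hook afterwards.

```python
import torch

x = torch.tensor([0., 1., 2., 3.], requires_grad=True)
y = torch.tensor([4., 5., 6., 7.], requires_grad=True)
w = torch.tensor([1., 2., 3., 4.], requires_grad=True)
z = x + y

saved_grads = {}  # illustrative container, not a PyTorch API


def save_z_grad(grad):
    # Keep a detached copy; returning None leaves the gradient unchanged.
    saved_grads['z'] = grad.detach().clone()


handle = z.register_hook(save_z_grad)  # register_hook returns a removable handle

o = w.matmul(z)
o.backward()
handle.remove()  # the hook no longer fires on later backward passes

print(saved_grads['z'])
print(x.grad)
```

Removing the handle once the value has been captured avoids accumulating hooks when the same code runs repeatedly, for example inside a training loop.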

That covers hooks on Tensors. In practice they are needed fairly rarely; the most commonly used hooks are those on network modules.

Hook for Modules

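Unlike the Tensors of the previous section, which have explicit variable names and can be accessed directly, network modules are wrapped inside the network. Usually we only have the network's overall input and output; for a module in the middle, it is hard to get the gradients of its input and output, and even their values are out of reach. The workarounds are to have forward return the intermediate module outputs, or, more laboriously, to split the network by module name and reassemble it so that the intermediate features are exposed.

To remove this obstacle, PyTorch provides two kinds of hooks, register_forward_hook and register_backward_hook, which capture the input and output features of a module during the forward pass and the corresponding gradients during the backward pass. This makes it much easier to observe the information flowing inside a model.

register_forward_hook captures the input and output of a module during the forward pass. For a module module, it is used as module.register_forward_hook(hook_fn), where hook_fn has the signature:

hook_fn(module, input, output) -> None

Its arguments are the module, the module's input and the module's output. Unlike a Tensor hook, the forward hook described here returns nothing, so it is not used to modify the input or output (more recent PyTorch releases also let a forward hook return a replacement output). Even so, it makes it easy to extract features from a pretrained network without changing the network's structure. Here is an example: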

import torch
from torch import nn

# First, define a model
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(3, 4)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(4, 1)
        self.initialize()
    
    # To make the results easy to verify, set specific weights and biases
    def initialize(self):
        with torch.no_grad():
            self.fc1.weight = torch.nn.Parameter(
                torch.Tensor([[1., 2., 3.],
                              [-4., -5., -6.],
                              [7., 8., 9.],
                              [-10., -11., -12.]]))

            self.fc1.bias = torch.nn.Parameter(torch.Tensor([1.0, 2.0, 3.0, 4.0]))
            self.fc2.weight = torch.nn.Parameter(torch.Tensor([[1.0, 2.0, 3.0, 4.0]]))
            self.fc2.bias = torch.nn.Parameter(torch.Tensor([1.0]))

    def forward(self, x):
        o = self.fc1(x)
        o = self.relu1(o)
        o = self.fc2(o)
        return o

# Global lists that store the intermediate features
total_feat_out = []
total_feat_in = []

# Define the forward hook function
def hook_fn_forward(module, input, output):
    print(module) # identifies the module
    print('input', input) # print first
    print('output', output)
    total_feat_out.append(output) # then store in the global lists
    total_feat_in.append(input)


model = Model()

modules = model.named_children()
for name, module in modules:
    module.register_forward_hook(hook_fn_forward)
    # module.register_backward_hook(hook_fn_backward)

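# Note the shape of x below: for a Linear module the input should have at least
# two dimensions (the first is the batch size). This makes no visible difference
# in the forward hook, but with a 1-D input the gradients seen by the backward
# hook come out wrong. One hook tutorial made exactly this mistake and then
# tried to explain the wrong result away.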

x = torch.Tensor([[1.0, 1.0, 1.0]]).requires_grad_() 
o = model(x)
o.backward()

print('==========Saved inputs and outputs==========')
for idx in range(len(total_feat_in)):
    print('input: ', total_feat_in[idx])
    print('output: ', total_feat_out[idx])

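The output is:

Linear(in_features=3, out_features=4, bias=True)
input (tensor([[1., 1., 1.]], requires_grad=True),)
output tensor([[  7., -13.,  27., -29.]], grad_fn=<AddmmBackward>)
ReLU()
input (tensor([[  7., -13.,  27., -29.]], grad_fn=<AddmmBackward>),)
output tensor([[ 7.,  0., 27.,  0.]], grad_fn=<ThresholdBackward0>)
Linear(in_features=4, out_features=1, bias=True)
input (tensor([[ 7.,  0., 27.,  0.]], grad_fn=<ThresholdBackward0>),)
output tensor([[89.]], grad_fn=<AddmmBackward>)
==========Saved inputs and outputs==========
input:  (tensor([[1., 1., 1.]], requires_grad=True),)
output:  tensor([[  7., -13.,  27., -29.]], grad_fn=<AddmmBackward>)
input:  (tensor([[  7., -13.,  27., -29.]], grad_fn=<AddmmBackward>),)
output:  tensor([[ 7.,  0., 27.,  0.]], grad_fn=<ThresholdBackward0>)
input:  (tensor([[ 7.,  0., 27.,  0.]], grad_fn=<ThresholdBackward0>),)
output:  tensor([[89.]], grad_fn=<AddmmBackward>)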

Readers can verify these values by hand; the check is omitted here for brevity.
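For readers who prefer to let the computer do the pen-and-paper check, the forward pass of the model can be recomputed in a few lines of plain Python. This is only a sketch for verification: the weights are copied from Model.initialize(), and the variable names are chosen here for illustration.

```python
# Weights and biases copied from Model.initialize(); names are illustrative.
W1 = [[1., 2., 3.], [-4., -5., -6.], [7., 8., 9.], [-10., -11., -12.]]
b1 = [1., 2., 3., 4.]
W2 = [1., 2., 3., 4.]
b2 = 1.
x = [1., 1., 1.]

# fc1: one dot product per output unit, plus its bias
fc1_out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(W1, b1)]
# relu1: negative values become 0
relu_out = [max(v, 0.) for v in fc1_out]
# fc2: a single output unit
fc2_out = sum(w * h for w, h in zip(W2, relu_out)) + b2

print(fc1_out)   # [7.0, -13.0, 27.0, -29.0]
print(relu_out)  # [7.0, 0.0, 27.0, 0.0]
print(fc2_out)   # 89.0
```

These are the same numbers the forward hooks report as the outputs of fc1, relu1 and fc2.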

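Like register_forward_hook, register_backward_hook captures, during backpropagation, the gradients at the input side and output side of each module. For a module module, it is used as module.register_backward_hook(hook_fn), where hook_fn has the signature:

hook_fn(module, grad_input, grad_output) -> Tensor or None

Its arguments are the module, the gradients at the module's input side and the gradients at its output side. Note that "input" and "output" here are taken from the point of view of the forward pass, not the backward pass. For a linear module o=W*x+b, the input side is W, x and b, and the output side is o.

If a module has several inputs or outputs, grad_input and grad_output are tuples. For the linear module o=W*x+b, the input side consists of W, x and b, so grad_input is a tuple of three elements. This describes the original register_backward_hook; in PyTorch 1.8 and later it is deprecated in favour of register_full_backward_hook, whose grad_input contains only the gradients with respect to the module's inputs.

Note two differences from the forward hook:

1. In a forward hook, input is x only; it does not include W and b.

2. A backward hook returns a Tensor or None. The hook function must not modify its arguments directly, but it can return a new grad_input, which is then propagated back to the preceding module.

Talk is cheap; here is the example code: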

import torch
from torch import nn


class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(3, 4)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(4, 1)
        self.initialize()

    def initialize(self):
        with torch.no_grad():
            self.fc1.weight = torch.nn.Parameter(
                torch.Tensor([[1., 2., 3.],
                              [-4., -5., -6.],
                              [7., 8., 9.],
                              [-10., -11., -12.]]))

            self.fc1.bias = torch.nn.Parameter(torch.Tensor([1.0, 2.0, 3.0, 4.0]))
            self.fc2.weight = torch.nn.Parameter(torch.Tensor([[1.0, 2.0, 3.0, 4.0]]))
            self.fc2.bias = torch.nn.Parameter(torch.Tensor([1.0]))

    def forward(self, x):
        o = self.fc1(x)
        o = self.relu1(o)
        o = self.fc2(o)
        return o


total_grad_out = []
total_grad_in = []


def hook_fn_backward(module, grad_input, grad_output):
    print(module) # identifies the module
    # To follow the order of backpropagation, print grad_output first
    print('grad_output', grad_output)
    # then print grad_input
    print('grad_input', grad_input)
    # Store in the global lists
    total_grad_in.append(grad_input)
    total_grad_out.append(grad_output)


model = Model()

modules = model.named_children()
for name, module in modules:
    module.register_backward_hook(hook_fn_backward)

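# requires_grad matters here: without it, when the backward hook reaches the
# first layer, the gradient with respect to x is None. The author of one
# English blog post overlooked this.
# Once more, note the shape of x: it must not be written as
# torch.Tensor([1.0, 1.0, 1.0]).requires_grad_(), or the backward hook misbehaves.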
x = torch.Tensor([[1.0, 1.0, 1.0]]).requires_grad_()
o = model(x)
o.backward()

print('==========Saved inputs and outputs==========')
for idx in range(len(total_grad_in)):
    print('grad output: ', total_grad_out[idx])
    print('grad input: ', total_grad_in[idx])

Running it prints, for each module in backward order (fc2, then relu1, then fc1), the module, its grad_output and its grad_input, followed by the saved values. The entry for fc2, for example, reads:

Linear(in_features=4, out_features=1, bias=True)
grad_output (tensor([[1.]]),)
grad_input (tensor([1.]), tensor([[1., 2., 3., 4.]]), tensor([[ 7.],
        [ 0.],
        [27.],
        [ 0.]]))

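Readers can work through the numbers by hand to check them. Note that for a linear module grad_input is a triple, ordered as: the gradient with respect to the bias, the gradient with respect to the input x, and the gradient with respect to the weight W.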
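The hand calculation can be sketched in plain Python as well. The weights are again copied from Model.initialize(); the variable names are illustrative, and the code simply applies the chain rule layer by layer, starting from the gradient 1 that o.backward() assigns to the output.

```python
# Weights copied from Model.initialize(); all other names are illustrative.
W1 = [[1., 2., 3.], [-4., -5., -6.], [7., 8., 9.], [-10., -11., -12.]]
W2 = [1., 2., 3., 4.]
fc1_out = [7., -13., 27., -29.]           # forward values of fc1 for x = [1, 1, 1]
relu_out = [max(v, 0.) for v in fc1_out]  # forward values of relu1

grad_o = 1.0                                   # o.backward() starts from do/do = 1
grad_b2 = grad_o                               # gradient w.r.t. fc2.bias
grad_relu_out = [grad_o * w for w in W2]       # gradient w.r.t. the input of fc2
grad_W2 = [grad_o * h for h in relu_out]       # gradient w.r.t. fc2.weight
# ReLU passes the gradient only where its forward input was positive
grad_fc1_out = [g if a > 0 else 0. for g, a in zip(grad_relu_out, fc1_out)]
# gradient w.r.t. x: multiply by the transpose of W1
grad_x = [sum(g * row[j] for g, row in zip(grad_fc1_out, W1)) for j in range(3)]

print(grad_b2, grad_relu_out, grad_W2)
print(grad_fc1_out, grad_x)
```

The three fc2 quantities (1, [1, 2, 3, 4] and [7, 0, 27, 0]) correspond, in that order, to the bias, input and weight entries of the grad_input triple that the hook reports for fc2.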

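Caveats

register_backward_hook only works properly on simple modules, not on composite modules that contain several submodules. If a backward hook is attached to a composite module, we only get the gradient information of the last simple operation in that module. To see this, modify the code above slightly: instead of iterating over the submodules, attach hook_fn_backward to the model as a whole:

model = Model()
model.register_backward_hook(hook_fn_backward)

The output is: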

Model(
  (fc1): Linear(in_features=3, out_features=4, bias=True)
  (relu1): ReLU()
  (fc2): Linear(in_features=4, out_features=1, bias=True)
)
grad_output (tensor([[1.]]),)
grad_input (tensor([1.]), tensor([[1., 2., 3., 4.]]), tensor([[ 7.],
        [ 0.],
        [27.],
        [ 0.]]))
==========Saved inputs and outputs==========
grad output:  (tensor([[1.]]),)
grad input:  (tensor([1.]), tensor([[1., 2., 3., 4.]]), tensor([[ 7.],
        [ 0.],
        [27.],
        [ 0.]]))

The program printed only the gradient information of fc2.
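One possible way around this limitation, offered here as an assumption rather than something taken from the article, is to combine the two hook types already introduced: a forward hook on the composite module attaches a Tensor hook to the module's output, and the gradient at the module's input is read from the input tensor itself. The sketch below uses a small nn.Sequential with random weights as a stand-in for Model; captured and the hook function names are illustrative.

```python
import torch
from torch import nn

# A small stand-in for the article's Model (random weights).
model = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 1))

captured = {}  # illustrative storage, not part of PyTorch


def forward_hook(module, inputs, output):
    def save_grad(grad):
        # Gradient of the final result w.r.t. this module's output.
        captured['grad_output'] = grad.detach().clone()

    # Attach a Tensor hook to the output produced in this forward pass.
    output.register_hook(save_grad)


handle = model.register_forward_hook(forward_hook)

x = torch.ones(1, 3, requires_grad=True)
o = model(x)
o.backward()
handle.remove()

print(captured['grad_output'])  # gradient at the output of the whole model
print(x.grad.shape)             # gradient at the input of the whole model
```

Because Tensor hooks are tied to a specific tensor rather than to the last operation inside a module, this pattern behaves the same for simple and composite modules.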

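In addition, someone has summarized (and complained about) the ways in which backward hooks behave inconsistently between fully connected and convolutional layers (Feedback about PyTorch register_backward_hook · Issue #12331 · pytorch/pytorch):

1. Shape

1.1 In a convolutional layer, the gradient of weight has the same shape as weight.

1.2 In a fully connected layer, the gradient of weight has the shape of the transpose of weight (the output of the code above confirms this).

2. Order of the gradients in the grad_input tuple

2.1 In a convolutional layer, the bias gradient comes last: grad_input = (gradient w.r.t. the feature, gradient w.r.t. the weight W, gradient w.r.t. the bias)

2.2 In a fully connected layer, the bias gradient comes first: grad_input = (gradient w.r.t. the bias, gradient w.r.t. the feature, gradient w.r.t. W)

3. Handling of the bias gradient when the batch size is greater than 1

3.1 In a convolutional layer, the bias gradient is the sum of the bias gradients over the whole batch: grad_input = (gradient w.r.t. the feature, gradient w.r.t. the weight W, gradient w.r.t. the bias)

3.2 In a fully connected layer, the bias gradients are kept separate, one per sample in the batch: grad_input = ((bias gradient for data1, bias gradient for data2, ...), gradient w.r.t. the feature, gradient w.r.t. W)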
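When only parameter gradients are of interest, one way to sidestep these differences (an approach assumed here, not one proposed in the article or in the issue) is to attach Tensor hooks to the parameters themselves. The gradient handed to such a hook has the shape of the parameter, regardless of layer type or batch size. In this sketch the layer sizes, grad_shapes and make_hook are illustrative choices.

```python
import torch
from torch import nn

conv = nn.Conv2d(3, 8, kernel_size=3)
fc = nn.Linear(8, 2)

grad_shapes = {}  # illustrative storage: parameter name -> gradient shape


def make_hook(name):
    def hook(grad):
        # Record the shape only; returning None leaves the gradient unchanged.
        grad_shapes[name] = tuple(grad.shape)
    return hook


for prefix, module in (('conv', conv), ('fc', fc)):
    for pname, param in module.named_parameters():
        param.register_hook(make_hook(f'{prefix}.{pname}'))

x = torch.randn(5, 3, 6, 6)       # a batch of 5 samples
feat = conv(x).mean(dim=(2, 3))   # shape (5, 8) after spatial averaging
out = fc(feat).sum()
out.backward()

for name, shape in grad_shapes.items():
    print(name, shape)
```

Each recorded shape equals the shape of the corresponding parameter, so no layer-specific tuple order or transpose has to be remembered.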

Guided Backpropagation

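With the material above we have covered how to use the various hooks in PyTorch. Next we use them in a short program (taken from Kaggle and slightly modified) that visualizes a pretrained network.

The Guided Backpropagation algorithm comes from the ICLR 2015 paper:

Striving for Simplicity: The All Convolutional Net.

Its basic principle is the same as that of most visualization algorithms: backpropagate to compute the gradient of the output or feature map of interest with respect to the network input, normalize that gradient, and display it as an image. Regions with large gradients are regions of the input image that strongly influence the target output; regions with small gradients have little influence. This shows which regions of the image drive the network's decision, or which regions the target feature map extracts features from. Guided Backpropagation makes a small change to how ReLU is handled during backpropagation.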

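First recall ordinary backpropagation. If layer l is a ReLU, its forward pass is:

$$f_i^{l+1} = \mathrm{relu}(f_i^l) = \max(f_i^l, 0)$$

When the input to the ReLU is greater than 0, the derivative of its output with respect to its input is 1; when the input is less than or equal to 0, the derivative is 0. Writing $R_i^{l+1} = \partial f^{out} / \partial f_i^{l+1}$ for the gradient arriving from the layer above, the chain rule gives the backward pass:

$$R_i^l = (f_i^l > 0) \cdot R_i^{l+1}$$

That is, when backpropagating through a ReLU layer, gradient flows back only at positions where the input was greater than 0; positions where the input was less than or equal to 0 pass no gradient.

The novelty of Guided Backpropagation is that, in addition, it propagates only the positive part of the gradient and discards the negative part. The motivation is easy to see: we want to find the regions of the input image that contribute positively to the target output, not those that contribute negatively. Its formula is:

$$R_i^l = (f_i^l > 0) \cdot (R_i^{l+1} > 0) \cdot R_i^{l+1}$$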
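The two backward rules for ReLU described above can be compared on a tiny example. This is a plain-Python sketch with made-up numbers; the function names are illustrative and operate on lists rather than tensors.

```python
def relu_backward(forward_input, grad_output):
    # Ordinary backprop: pass the gradient only where the forward input was positive.
    return [g if f > 0 else 0.0 for f, g in zip(forward_input, grad_output)]


def guided_relu_backward(forward_input, grad_output):
    # Guided backprop: additionally drop negative incoming gradients.
    return [g if (f > 0 and g > 0) else 0.0 for f, g in zip(forward_input, grad_output)]


f = [2.0, -1.0, 3.0, 0.5]    # inputs to the ReLU in the forward pass
R = [0.7, 0.4, -0.2, -1.5]   # gradients arriving from the layer above

plain = relu_backward(f, R)
guided = guided_relu_backward(f, R)
print(plain)   # [0.7, 0.0, -0.2, -1.5]
print(guided)  # [0.7, 0.0, 0.0, 0.0]
```

Only the first position survives under the guided rule: its forward input is positive and its incoming gradient is positive as well.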

Here is the code:

import torch
from torch import nn


class Guided_backprop():
    def __init__(self, model):
        self.model = model
        self.image_reconstruction = None
        self.activation_maps = []
        self.model.eval()
        self.register_hooks()

    def register_hooks(self):
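        def first_layer_hook_fn(module, grad_in, grad_out):
            # Store the gradient of the input image on the instance. It is
            # produced by backpropagating through the first convolutional
            # layer, so this function must be attached to the first Conv2d layer
            self.image_reconstruction = grad_in[0]

        def forward_hook_fn(module, input, output):
            # Store the forward output of each ReLU layer on the instance,
            # to be used later for guided backpropagation
            self.activation_maps.append(output)

        def backward_hook_fn(module, grad_in, grad_out):
            # During the backward pass of a ReLU layer, its forward output acts as the guide.
            # Backward runs in the opposite order to forward, so pop from the end
            grad = self.activation_maps.pop()
            # The forward output of ReLU is either greater than 0 or equal to 0:
            # where it is greater than 0 the derivative is 1,
            # where it equals 0 the derivative stays 0
            grad[grad > 0] = 1

            # grad_in[0] is the gradient of the feature; keep only the positive part
            positive_grad_in = torch.clamp(grad_in[0], min=0.0)
            # Build the new input-side gradient
            new_grad_in = positive_grad_in * grad

            # ReLU has no parameters, so the input-side gradient is a one-element tuple
            return (new_grad_in,)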


        # Collect the modules. This is specific to AlexNet; change it for other models
        modules = list(self.model.features.named_children())

        # Register a forward hook and a backward hook on every ReLU
        for name, module in modules:
            if isinstance(module, nn.ReLU):
                module.register_forward_hook(forward_hook_fn)
                module.register_backward_hook(backward_hook_fn)

        # Register a hook on the first convolutional layer
        first_layer = modules[0][1]
        first_layer.register_backward_hook(first_layer_hook_fn)

    def visualize(self, input_image, target_class):
        # Forward pass; the registered forward hooks fire here
        model_output = self.model(input_image)
        self.model.zero_grad()
        pred_class = model_output.argmax().item()

        # Build a one-hot vector for the target class as the starting point of backpropagation
        grad_target_map = torch.zeros(model_output.shape,
                                      dtype=torch.float)
        if target_class is not None:
            grad_target_map[0][target_class] = 1
        else:
            grad_target_map[0][pred_class] = 1

        # Backward pass; the registered backward hooks fire here
        model_output.backward(grad_target_map)
        # Gradient of the target class w.r.t. the input image, converted to image layout (H, W, C)
        result = self.image_reconstruction.data[0].permute(1,2,0)
        return result.numpy()

def normalize(I):
    # Normalize the gradient map, first to mean=0, std=1
    norm = (I-I.mean())/I.std()
    # Rescale std to 0.1 so that values in the gradient map stay close to 0
    norm = norm * 0.1
    # Shift the mean to 0.5 so that most values are positive
    norm = norm + 0.5
    # Clip values outside [0, 1] to 0 and 1
    norm = norm.clip(0, 1)
    return norm




if __name__=='__main__':
    from torchvision import models, transforms
    from PIL import Image
    import matplotlib.pyplot as plt

    image_path = './cat.png'
    I = Image.open(image_path).convert('RGB')
    means = [0.485, 0.456, 0.406]
    stds = [0.229, 0.224, 0.225]
    size = 224

    transform = transforms.Compose([
        transforms.Resize(size),
        transforms.CenterCrop(size),
        transforms.ToTensor(),
        transforms.Normalize(means, stds)
    ])

    tensor = transform(I).unsqueeze(0).requires_grad_()

    model = models.alexnet(pretrained=True)

    guided_bp = Guided_backprop(model)
    result = guided_bp.visualize(tensor, None)

    result = normalize(result)
    plt.imshow(result)
    plt.show()

    print('END')

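(Figure: the input image used by the program, a photo of a cat.)

(Figure: the output, the guided backpropagation gradient map of that image.)

The result shows that the gradients are large on the cat's head, especially the eyes, nose, mouth and ears, and small on the background. The high-gradient regions are what lets the network recognize the image as a cat.

A weakness of Guided Backpropagation is that it is not sensitive to the target class: setting different target classes may produce gradient maps that differ very little. More advanced methods such as Grad-CAM (Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization) address this; they are not covered here for reasons of space.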

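Summary

This article introduced hooks in PyTorch, from hooks on Tensors to hooks on Modules, and finished with a detailed walk-through of code that uses hooks to visualize a neural network. Thank you for reading; criticism and corrections are welcome.

This article is original to SIGAI.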
