Learning PyTorch with Examples
This tutorial introduces the fundamental concepts of PyTorch through self-contained examples.
At its core, PyTorch provides two main features:
An n-dimensional Tensor, similar to numpy but able to run on GPUs
Automatic differentiation for building and training neural networks
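As a quick illustration of the first point, PyTorch Tensors interoperate directly with numpy arrays (a minimal sketch, not part of the tutorial's examples; note that torch.from_numpy shares memory with the source array):

```python
import numpy as np
import torch

# A numpy array and a tensor built from it share the same memory,
# so in-place changes on one side are visible on the other.
arr = np.zeros(3)
t = torch.from_numpy(arr)
arr[0] = 1.0
print(t)  # tensor([1., 0., 0.], dtype=torch.float64)

# .numpy() converts back (again sharing memory; CPU tensors only).
back = t.numpy()
```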
As our running example, we will fit a third-order polynomial to the problem y = sin(x). The network will have four parameters, and will be trained with gradient descent to fit random data by minimizing the Euclidean distance between the network output and the true output.
Tensors
Warm-up: numpy
Before introducing PyTorch, we will first implement the network using numpy.
Numpy provides an n-dimensional array object and many functions for manipulating these arrays. Numpy is a generic framework for scientific computing; it knows nothing about computation graphs, deep learning, or gradients. However, we can easily fit a third-order polynomial to a sine function by manually implementing the forward and backward passes through the network using numpy operations:
# -*- coding: utf-8 -*-
import numpy as np
import math
# Create random input and output data
x = np.linspace(-math.pi, math.pi, 2000)
y = np.sin(x)
# Randomly initialize weights
a = np.random.randn()
b = np.random.randn()
c = np.random.randn()
d = np.random.randn()
learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    # y = a + b x + c x^2 + d x^3
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of a, b, c, d with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f'Result: y = {a} + {b} x + {c} x^2 + {d} x^3')
Output:
99 245.74528761103798
199 173.39611738368987
299 123.2440921752332
399 88.44419578924933
499 64.27442600584153
599 47.47246235496093
699 35.78209704113
799 27.641356008462985
899 21.967808007623955
999 18.010612761588764
1099 15.248450766474248
1199 13.319031005970338
1299 11.970356120179034
1399 11.026994992046731
1499 10.366717947337667
1599 9.904294863860954
1699 9.58024957389446
1799 9.353047171709575
1899 9.193661409658082
1999 9.081793826477744
Result: y = -0.016362745280289488 + 0.8518166048235671 x + 0.0028228458381066635 x^2 + -0.09262995903014938 x^3
PyTorch: Tensors
Numpy is a great framework, but it cannot utilize GPUs to accelerate its numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or greater, so unfortunately numpy is not sufficient for modern deep learning.
Here we introduce the most fundamental PyTorch concept: the Tensor. A PyTorch Tensor is conceptually identical to a numpy array: a Tensor is an n-dimensional array, and PyTorch provides many functions for operating on these Tensors. Behind the scenes, Tensors can keep track of a computational graph and gradients, but they are also useful as a generic tool for scientific computing.
Also unlike numpy, PyTorch Tensors can utilize GPUs to accelerate their numeric computations. To run a PyTorch Tensor on a GPU, you simply need to specify the correct device.
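For instance (a minimal sketch that falls back to the CPU when no GPU is available):

```python
import torch

# Pick a device: use the GPU if one is available, otherwise the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Tensors can be created directly on a device, or moved there with .to().
x = torch.randn(3, device=device)
y = torch.ones(3).to(device)
print((x + y).device)  # cuda:0 or cpu, matching `device`
```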
Here we use PyTorch Tensors to fit a third-order polynomial to a sine function. Like the numpy example above, we need to manually implement the forward and backward passes through the network:
# -*- coding: utf-8 -*-
import torch
import math
dtype = torch.float
# device = torch.device("cpu")
device = torch.device("cuda:0")  # Comment this out and use the line above to run on CPU
# Create random input and output data
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)
# Randomly initialize weights
a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)
learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of a, b, c, d with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights using gradient descent
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
Output:
99 80.36772918701172
199 56.39781951904297
299 40.46946716308594
399 29.881393432617188
499 22.840898513793945
599 18.157623291015625
699 15.041072845458984
799 12.966400146484375
899 11.584661483764648
999 10.664037704467773
1099 10.050336837768555
1199 9.641047477722168
1299 9.367938041687012
1399 9.185595512390137
1499 9.063775062561035
1599 8.982349395751953
1699 8.92789077758789
1799 8.89144515991211
1899 8.867034912109375
1999 8.85067367553711
Result: y = 0.0030112862586975098 + 0.8616413474082947 x + -0.0005194980767555535 x^2 + -0.09402744472026825 x^3
Autograd
PyTorch: Tensors and autograd
In the examples above, we had to manually implement both the forward and backward passes of our neural network. Manually implementing the backward pass is not a big deal for a small two-layer network, but can quickly get very hairy for large, complex networks.
Thankfully, we can use automatic differentiation to automate the computation of backward passes in neural networks. The autograd package in PyTorch provides exactly this functionality. When using autograd, the forward pass of your network defines a computational graph; nodes in the graph are Tensors, and edges are functions that produce output Tensors from input Tensors. Backpropagating through this graph then lets you compute gradients easily.
This sounds complicated, but it is pretty simple to use in practice. Each Tensor represents a node in a computational graph. If x is a Tensor with x.requires_grad=True, then x.grad is another Tensor holding the gradient of some scalar value with respect to x.
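A minimal sketch of requires_grad and .grad (using y = x^3, whose derivative 3x^2 at x = 2 is 12):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3     # the forward pass builds the computational graph
y.backward()   # the backward pass populates x.grad
print(x.grad)  # tensor(12.) since dy/dx = 3 x^2 = 12 at x = 2
```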
Here we use PyTorch Tensors and autograd to implement our example of fitting a sine wave with a third-order polynomial; now we no longer need to manually implement the backward pass through the network:
import torch
import math
if __name__ == '__main__':
    print(torch.__version__)

    # We want to be able to train our model on an `accelerator <https://pytorch.org/docs/stable/torch.html#accelerators>`__
    # such as CUDA, MPS, MTIA, or XPU. If the current accelerator is available, we will use it. Otherwise, we use the CPU.
    dtype = torch.float
    device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"
    print(f"Using {device} device")
    torch.set_default_device(device)

    # Create Tensors to hold input and outputs.
    # By default, requires_grad=False, which indicates that we do not need to
    # compute gradients with respect to these Tensors during the backward pass.
    x = torch.linspace(-math.pi, math.pi, 2000, dtype=dtype)
    y = torch.sin(x)

    # Create random Tensors for weights. For a third order polynomial, we need
    # 4 weights: y = a + b x + c x^2 + d x^3
    # Setting requires_grad=True indicates that we want to compute gradients with
    # respect to these Tensors during the backward pass.
    a = torch.randn((), dtype=dtype, requires_grad=True)
    b = torch.randn((), dtype=dtype, requires_grad=True)
    c = torch.randn((), dtype=dtype, requires_grad=True)
    d = torch.randn((), dtype=dtype, requires_grad=True)

    learning_rate = 1e-6
    for t in range(2000):
        # Forward pass: compute predicted y using operations on Tensors.
        y_pred = a + b * x + c * x ** 2 + d * x ** 3

        # Compute and print loss using operations on Tensors.
        # Now loss is a Tensor of shape (1,)
        # loss.item() gets the scalar value held in the loss.
        loss = (y_pred - y).pow(2).sum()
        if t % 100 == 99:
            print(t, loss.item())

        # Use autograd to compute the backward pass. This call will compute the
        # gradient of loss with respect to all Tensors with requires_grad=True.
        # After this call a.grad, b.grad, c.grad and d.grad will be Tensors holding
        # the gradient of the loss with respect to a, b, c, d respectively.
        loss.backward()

        # Manually update weights using gradient descent. Wrap in torch.no_grad()
        # because weights have requires_grad=True, but we don't need to track this
        # in autograd.
        with torch.no_grad():
            a -= learning_rate * a.grad
            b -= learning_rate * b.grad
            c -= learning_rate * c.grad
            d -= learning_rate * d.grad

            # Manually zero the gradients after updating weights
            a.grad = None
            b.grad = None
            c.grad = None
            d.grad = None

    print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
Output:
2.7.1+cu118
Using cuda device
99 972.4776611328125
199 683.9300537109375
299 482.0523986816406
399 340.72113037109375
499 241.7181396484375
599 172.326171875
699 123.66252136230469
799 89.5176010131836
899 65.54801940917969
999 48.71359634399414
1099 36.88516616821289
1199 28.570558547973633
1299 22.72364044189453
1399 18.610462188720703
1499 15.715910911560059
1599 13.67824649810791
1699 12.243351936340332
1799 11.232608795166016
1899 10.520437240600586
1999 10.01850700378418
Result: y = -0.03580974042415619 + 0.8494142293930054 x + 0.00617777556180954 x^2 + -0.09228824824094772 x^3
PyTorch: Defining new autograd functions
Under the hood, each primitive autograd operator is really two functions that operate on Tensors. The forward function computes output Tensors from input Tensors. The backward function receives the gradient of the output Tensors with respect to some scalar value, and computes the gradient of the input Tensors with respect to that same scalar value.
In PyTorch we can easily define our own autograd operator by writing a subclass of torch.autograd.Function that implements the forward and backward functions. We can then use our new autograd operator by constructing an instance and calling it like a function, passing Tensors containing input data.
In this example we define our model as y = a + b P3(c + d x) instead of y = a + b x + c x^2 + d x^3, where P3(x) = 1/2 (5 x^3 - 3 x) is the third-order Legendre polynomial. We write our own custom autograd function for computing the forward and backward passes of P3, and use it to implement our model.
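As a quick numeric sanity check of the math (hypothetical helper functions, not part of the tutorial code): Legendre polynomials satisfy P_n(1) = 1, and the gradient expression 1.5 (5 x^2 - 1) used in the backward pass should match a finite-difference estimate of P3's derivative:

```python
def p3(x):
    # Third-order Legendre polynomial: P3(x) = 1/2 (5 x^3 - 3 x)
    return 0.5 * (5 * x ** 3 - 3 * x)

def p3_prime(x):
    # Its derivative, the expression the custom backward pass uses
    return 1.5 * (5 * x ** 2 - 1)

# Legendre polynomials satisfy P_n(1) = 1
print(p3(1.0))  # 1.0

# Central finite-difference check of the derivative at x = 0.5
h = 1e-6
approx = (p3(0.5 + h) - p3(0.5 - h)) / (2 * h)
print(abs(approx - p3_prime(0.5)) < 1e-6)  # True
```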
# -*- coding: utf-8 -*-
import torch
import math
class LegendrePolynomial3(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """

    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for backward computation. You can cache arbitrary
        objects for use in the backward pass using the ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return 0.5 * (5 * input ** 3 - 3 * input)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        return grad_output * 1.5 * (5 * input ** 2 - 1)
dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU
# Create Tensors to hold input and outputs.
# By default, requires_grad=False, which indicates that we do not need to
# compute gradients with respect to these Tensors during the backward pass.
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)
# Create random Tensors for weights. For this example, we need
# 4 weights: y = a + b * P3(c + d * x), these weights need to be initialized
# not too far from the correct result to ensure convergence.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
a = torch.full((), 0.0, device=device, dtype=dtype, requires_grad=True)
b = torch.full((), -1.0, device=device, dtype=dtype, requires_grad=True)
c = torch.full((), 0.0, device=device, dtype=dtype, requires_grad=True)
d = torch.full((), 0.3, device=device, dtype=dtype, requires_grad=True)
learning_rate = 5e-6
for t in range(2000):
    # To apply our Function, we use Function.apply method. We alias this as 'P3'.
    P3 = LegendrePolynomial3.apply

    # Forward pass: compute predicted y using operations; we compute
    # P3 using our custom autograd operation.
    y_pred = a + b * P3(c + d * x)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass.
    loss.backward()

    # Update weights using gradient descent
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually zero the gradients after updating weights
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f'Result: y = {a.item()} + {b.item()} * P3({c.item()} + {d.item()} x)')
Output:
99 209.95834350585938
199 144.66018676757812
299 100.70249938964844
399 71.03519439697266
499 50.97850799560547
599 37.403133392333984
699 28.206867218017578
799 21.97318458557129
899 17.745729446411133
999 14.877889633178711
1099 12.93176555633545
1199 11.610918045043945
1299 10.714258193969727
1399 10.10548210144043
1499 9.692106246948242
1599 9.411375999450684
1699 9.220745086669922
1799 9.091286659240723
1899 9.003362655639648
1999 8.943641662597656
Result: y = -2.9753338681715036e-10 + -2.208526849746704 * P3(-1.1693186696692948e-10 + 0.2554861009120941 x)
The nn module
PyTorch: nn
Computational graphs and autograd are a very powerful paradigm for defining complex operators and automatically taking derivatives; however, for large neural networks raw autograd can be a bit too low-level.
When building neural networks we frequently think of arranging the computation into layers, some of which have learnable parameters that will be optimized during learning.
In TensorFlow, packages like Keras, TensorFlow-Slim, and TFLearn provide higher-level abstractions over raw computational graphs that are useful for building neural networks.
In PyTorch, the nn package serves this same purpose. The nn package defines a set of Modules, which are roughly equivalent to neural network layers. A Module receives input Tensors and computes output Tensors, but may also hold internal state such as Tensors containing learnable parameters. The nn package also defines a set of useful loss functions that are commonly used when training neural networks.
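As a small illustration of a Module holding internal state (a minimal sketch; the shapes follow from constructing nn.Linear(3, 1)):

```python
import torch

# A Linear Module mapping 3 input features to 1 output feature.
lin = torch.nn.Linear(3, 1)
print(lin.weight.shape)  # torch.Size([1, 3])
print(lin.bias.shape)    # torch.Size([1])

# Calling the Module like a function runs its forward computation.
out = lin(torch.randn(5, 3))
print(out.shape)         # torch.Size([5, 1])

# Both internal Tensors are registered as parameters, visible to optimizers.
print(sum(p.numel() for p in lin.parameters()))  # 4
```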
In this example we use the nn package to implement our polynomial model network:
# -*- coding: utf-8 -*-
import torch
import math
# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)
# For this example, the output y is a linear function of (x, x^2, x^3), so
# we can consider it as a linear layer neural network. Let's prepare the
# tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)
# In the above code, x.unsqueeze(-1) has shape (2000, 1), and p has shape
# (3,), for this case, broadcasting semantics will apply to obtain a tensor
# of shape (2000, 3)
# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. The Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
# The Flatten layer flattens the output of the linear layer to a 1D tensor,
# to match the shape of `y`.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)
# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')
learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(xx)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the
    # loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

# You can access the first layer of `model` like accessing the first item of a list
linear_layer = model[0]

# For linear layer, its parameters are stored as `weight` and `bias`.
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')
Output:
99 1249.5333251953125
199 836.2708740234375
299 560.94580078125
399 377.43267822265625
499 255.0554656982422
599 173.4051971435547
699 118.89861297607422
799 82.49130249023438
899 58.158931732177734
999 41.886512756347656
1099 30.997180938720703
1199 23.705059051513672
1299 18.818429946899414
1399 15.541385650634766
1499 13.342010498046875
1599 11.864748001098633
1699 10.871685028076172
1799 10.20351791381836
1899 9.753573417663574
1999 9.450298309326172
Result: y = -0.01542313676327467 + 0.8367996215820312 x + 0.0026607480831444263 x^2 + -0.09049391746520996 x^3
PyTorch: optim
Up to this point we have updated the weights of our models by manually mutating the Tensors holding learnable parameters within torch.no_grad(). This is not a huge burden for simple optimization algorithms like stochastic gradient descent, but in practice we often train neural networks with more sophisticated optimizers such as AdaGrad, RMSProp, and Adam.
The optim package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of commonly used optimization algorithms.
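As a minimal sketch of the optim workflow (a toy quadratic rather than the polynomial model; any optimizer from torch.optim could be swapped in here):

```python
import torch

# One learnable scalar; minimize (w - 3)^2, whose minimum is at w = 3.
w = torch.tensor(0.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

for _ in range(100):
    loss = (w - 3.0) ** 2
    optimizer.zero_grad()  # clear gradients accumulated by previous steps
    loss.backward()        # compute d(loss)/dw into w.grad
    optimizer.step()       # update w in place using the gradient

print(w.item())  # close to 3.0
```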
In this example we will use the nn package to define our model as before, but we will optimize the model using the RMSprop algorithm provided by the optim package:
# -*- coding: utf-8 -*-
import torch
import math
# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)
# Prepare the input tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)
# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)
loss_fn = torch.nn.MSELoss(reduction='sum')
# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use RMSprop; the optim package contains many other
# optimization algorithms. The first argument to the RMSprop constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-3
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)
for t in range(2000):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(xx)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable
    # weights of the model). This is because by default, gradients are
    # accumulated in buffers (i.e., not overwritten) whenever .backward()
    # is called. Check out the docs of torch.autograd.backward for more details.
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()

linear_layer = model[0]
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')
Output:
99 8000.55322265625
199 2261.3798828125
299 693.5020751953125
399 431.6658935546875
499 384.56048583984375
599 335.9397277832031
699 275.62109375
799 209.7830810546875
899 147.56797790527344
999 95.69912719726562
1099 56.923744201660156
1199 31.183666229248047
1299 16.72918701171875
1399 11.45569133758545
1499 9.004927635192871
1599 8.858762741088867
1699 8.87247085571289
1799 8.982568740844727
1899 8.938497543334961
1999 8.890787124633789
Result: y = -0.00026192551013082266 + 0.8563163876533508 x + -0.00026193069061264396 x^2 + -0.09376735985279083 x^3
PyTorch: Custom nn Modules
Sometimes you will want to specify models that are more complex than a sequence of existing Modules; for these cases you can define your own Modules by subclassing nn.Module and defining a forward method which receives input Tensors and produces output Tensors using other Modules or other autograd operations on Tensors.
In this example we implement our third-order polynomial as a custom Module subclass:
# -*- coding: utf-8 -*-
import torch
import math
class Polynomial3(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate four parameters and assign them as
        member parameters.
        """
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        return self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3

    def string(self):
        """
        Just like any class in Python, you can also define custom methods on PyTorch modules
        """
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3'
# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)
# Construct our model by instantiating the class defined above
model = Polynomial3()
# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters (defined
# with torch.nn.Parameter) which are members of the model.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)
for t in range(2000):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f'Result: {model.string()}')
Output:
99 2803.167236328125
199 1862.448486328125
299 1238.683349609375
399 825.0042724609375
499 550.60009765625
599 368.5425720214844
699 247.72659301757812
799 167.53285217285156
899 114.28926086425781
999 78.92943572998047
1099 55.439781188964844
1199 39.83119583129883
1299 29.455921173095703
1399 22.557106018066406
1499 17.968172073364258
1599 14.91462230682373
1699 12.881914138793945
1799 11.528202056884766
1899 10.626267433166504
1999 10.025044441223145
Result: y = 0.01408891100436449 + 0.8255323171615601 x + -0.0024305693805217743 x^2 + -0.08889124542474747 x^3
PyTorch: Control flow + weight sharing
As an example of dynamic graphs and weight sharing, we implement a very strange model: a third-to-fifth-order polynomial that on each forward pass chooses a random number between 3 and 5 and uses that many orders, reusing the same weight multiple times to compute the fourth and fifth order terms.
For this model we can use normal Python flow control to implement the loop, and we can implement weight sharing by simply reusing the same parameter multiple times when defining the forward pass.
We can easily implement this model as a Module subclass:
# -*- coding: utf-8 -*-
import random
import torch
import math
class DynamicNet(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate five parameters and assign them as members.
        """
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))
        self.e = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
        """
        For the forward pass of the model, we randomly choose either 4 or 5
        and reuse the e parameter to compute the contribution of these orders.

        Since each forward pass builds a dynamic computation graph, we can use normal
        Python control-flow operators like loops or conditional statements when
        defining the forward pass of the model.

        Here we also see that it is perfectly safe to reuse the same parameter many
        times when defining a computational graph.
        """
        y = self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3
        for exp in range(4, random.randint(4, 6)):
            y = y + self.e * x ** exp
        return y

    def string(self):
        """
        Just like any class in Python, you can also define custom methods on PyTorch modules
        """
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3 + {self.e.item()} x^4 ? + {self.e.item()} x^5 ?'
# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)
# Construct our model by instantiating the class defined above
model = DynamicNet()
# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-8, momentum=0.9)
for t in range(30000):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 2000 == 1999:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f'Result: {model.string()}')
Output:
1999 1467.99267578125
3999 693.8176879882812
5999 325.27569580078125
7999 155.37420654296875
9999 76.90623474121094
11999 39.586753845214844
13999 23.33151626586914
15999 15.747987747192383
17999 12.066144943237305
19999 10.335156440734863
21999 9.515758514404297
23999 9.050674438476562
25999 8.763498306274414
27999 8.726919174194336
29999 8.659371376037598
Result: y = -0.004927584435790777 + 0.8540687561035156 x + 0.00039800297236070037 x^2 + -0.09318151324987411 x^3 + 0.00012038549175485969 x^4 ? + 0.00012038549175485969 x^5 ?