10 Convolutional networks
The simplest machine learning models treat the observed data as unstructured, meaning that the elements of the data vectors x are handled as if we know nothing in advance about how the individual elements relate to each other. If we were to choose a random permutation of these variables and apply it consistently to all training and test data, the performance of the models considered so far would be unchanged.
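To make this concrete, here is a small sketch (not from the text; the layer sizes are arbitrary) showing that a fully connected layer is indifferent to a fixed permutation of its inputs: permuting the columns of the weight matrix in the same way recovers identical outputs, so a model trained on permuted data is exactly as expressive as one trained on the original ordering.

import torch
import torch.nn as nn

torch.manual_seed(0)
fc = nn.Linear(784, 10)         # a fully connected layer on 784-dimensional inputs
perm = torch.randperm(784)      # a fixed random permutation of the input elements

# Build an equivalent layer for the permuted inputs by permuting weight columns
fc_perm = nn.Linear(784, 10)
with torch.no_grad():
    fc_perm.weight.copy_(fc.weight[:, perm])
    fc_perm.bias.copy_(fc.bias)

x = torch.randn(5, 784)
print(torch.allclose(fc(x), fc_perm(x[:, perm])))  # True: same outputs either way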
Many applications of machine learning, however, involve structured data in which there are additional relationships between input variables. For example, the words in natural language form a sequence, and if we were to model language as a generative autoregressive process then we would expect each word to depend more strongly on the immediately preceding words and less so on words much earlier in the sequence. Likewise, the pixels of an image have a well-defined spatial relationship to each other in which the input variables are arranged in a two-dimensional grid, and nearby pixels have highly correlated values.
We have already seen that our knowledge of the structure of specific data modalities can be exploited through the addition of a regularization term to the error function in the training objective, through data augmentation, or through modifications to the model architecture. These approaches can help guide the model to respect properties such as invariance (Section 9.1.3) and equivariance (Section 9.1.4) with respect to transformations of the input data. In this chapter we take a look at an architectural approach called the convolutional neural network (CNN), which, as we will see, can be viewed as a sparsely connected multilayer network with parameter sharing, designed to encode the invariances and equivariances specific to image data.
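Before looking at a full network, it is worth seeing translation equivariance directly. The following sketch (not from the text; the random filter, the shift amounts, and the use of circular padding to avoid boundary effects are illustrative assumptions) checks that convolving a shifted image gives the same result as shifting the convolved image:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 28, 28)  # a toy single-channel image
w = torch.randn(1, 1, 5, 5)    # an arbitrary 5x5 filter

def conv_circular(img, weight):
    # wrap-around padding by 2 pixels on each side, then a 'valid' convolution,
    # so the output has the same 28x28 size as the input
    padded = F.pad(img, (2, 2, 2, 2), mode='circular')
    return F.conv2d(padded, weight)

# shift-then-convolve versus convolve-then-shift
a = conv_circular(torch.roll(x, shifts=(3, 5), dims=(2, 3)), w)
b = torch.roll(conv_circular(x, w), shifts=(3, 5), dims=(2, 3))
print(torch.allclose(a, b, atol=1e-5))  # True: convolution commutes with translation

With zero padding the same check holds only away from the image boundary. The listing below then defines a small convolutional network for 28x28 single-channel inputs (the size used for MNIST digits) in two equivalent styles: first by subclassing nn.Module, and then as an nn.Sequential pipeline.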
import torch
import torch.nn as nn
import torch.nn.functional as F
from einops.layers.torch import Rearrange
# Define the "old" network by subclassing nn.Module
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)   # 1 input channel -> 10 feature maps
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)  # 10 -> 20 feature maps
        self.conv2_drop = nn.Dropout2d()               # randomly zeroes whole feature maps
        self.fc1 = nn.Linear(320, 50)                  # 320 = 20 channels x 4 x 4 (28->24->12->8->4)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)                            # flatten the feature maps
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
# Define the "new" network as an nn.Sequential pipeline
conv_net_new = nn.Sequential(
    nn.Conv2d(1, 10, kernel_size=5),
    nn.MaxPool2d(kernel_size=2),
    nn.ReLU(),
    nn.Conv2d(10, 20, kernel_size=5),
    nn.MaxPool2d(kernel_size=2),
    nn.ReLU(),
    nn.Dropout2d(),
    Rearrange('b c h w -> b (c h w)'),  # flatten, replacing x.view(-1, 320)
    nn.Linear(320, 50),
    nn.ReLU(),
    nn.Dropout(),
    nn.Linear(50, 10),
    nn.LogSoftmax(dim=1)
)
# Create a random tensor representing a batch containing a single 1-channel 28x28 image
x = torch.randn(1, 1, 28, 28)
# Pass the tensor through the old network
conv_net_old = Net()
y_old = conv_net_old(x)
print("Output from the old network:", y_old)
print("Output shape from the old network:", y_old.shape)
# Pass the tensor through the new network
y_new = conv_net_new(x)
print("Output from the new network:", y_new)
print("Output shape from the new network:", y_new.shape)10.1 Computer vision
10.1.1 Image data
10.1.2 Convolutional filters
10.1.3 Feature detectors
10.1.4 Translation equivariance
10.1.5 Padding
10.1.6 Strided convolutions
10.1.7 Multi-dimensional convolutions
10.1.8 Pooling
10.1.9 Multilayer convolutions
10.1.10 Example network architectures
(Architectures discussed: LeNet, AlexNet, and VGG16; the ImageNet benchmark.)
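As a preview of the first of these, here is a minimal sketch of a LeNet-5-style network in the same nn.Sequential idiom used above. The layer sizes follow the classic LeNet-5 design for 32x32 grayscale inputs; the choice of Tanh activations and average pooling is an assumption in the spirit of the original, not a faithful reproduction.

import torch.nn as nn
from einops.layers.torch import Rearrange

lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),     # 32x32 -> 28x28, 6 feature maps
    nn.Tanh(),
    nn.AvgPool2d(kernel_size=2),        # 28x28 -> 14x14
    nn.Conv2d(6, 16, kernel_size=5),    # 14x14 -> 10x10, 16 feature maps
    nn.Tanh(),
    nn.AvgPool2d(kernel_size=2),        # 10x10 -> 5x5
    Rearrange('b c h w -> b (c h w)'),  # flatten: 16 x 5 x 5 = 400
    nn.Linear(400, 120),
    nn.Tanh(),
    nn.Linear(120, 84),
    nn.Tanh(),
    nn.Linear(84, 10),
)

AlexNet and VGG16 follow the same convolution/pooling pattern at much larger scale, with ReLU activations and many more channels, and were trained on the ImageNet dataset.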