Cat vs. Dog Classification with ResNet-18

Published: 2026/6/6 12:38:12

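Textbook reference: 《动手学深度学习》PyTorch版 (Dive into Deep Learning, PyTorch edition) by Li Mu (李沐) et al. Parts of the code were completed with the help of DS. The whole project runs on Kaggle's free GPU and needs no additional packages.

Project: https://github.com/AkitaAoi/-_-ResNet-18

This post mainly records the process of the whole project.

Required packages

```python
%matplotlib inline
import torch
import torchvision
from torch import nn
from torch.nn import functional as F
import torchvision.transforms as transforms
from torch.utils import data
import matplotlib.pyplot as plt
import numpy as np
import random
```

Basic architecture: ResNet-18

Start with notebooks/dogs-vs-cats 1.ipynb, which corresponds to pp. 201-204 of the textbook. The goal at this stage is simply to get the model typed out. Before that, a quick review of the ResNet-18 architecture: it has three main parts. Modules 1 and 3 are implemented first.

Module 1

The first convolutional layer takes 3 input channels, because our images are RGB color images.

```python
b1 = nn.Sequential(nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
                   nn.BatchNorm2d(64), nn.ReLU(),
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
```

Module 3

Note that the final fully connected layer in the textbook has 10 outputs because Fashion-MNIST has 10 classes. Cat vs. dog classification has only two classes, so the 10 becomes 2.

```python
b6 = nn.Sequential(nn.AdaptiveAvgPool2d((1, 1)),
                   nn.Flatten(), nn.Linear(512, 2))
```

Module 2

Module 2 consists of 4 residual modules, each made of two residual blocks. The first module contains two plain residual blocks. In each of the remaining 3 modules, the first residual block adds a 1x1 convolution so that the output can be added to X. In other words, the input to each of these 3 modules first passes through the block with the 1x1 convolution (the left one in the original figure) and then through a plain block (the right one). The code follows.

A single residual block:

```python
class Residual(nn.Module):
    def __init__(self, input_channels, num_channels, use_1x1conv=False, strides=1):
        super().__init__()
        self.conv1 = nn.Conv2d(input_channels, num_channels,
                               kernel_size=3, padding=1, stride=strides)
        self.conv2 = nn.Conv2d(num_channels, num_channels,
                               kernel_size=3, padding=1)
        if use_1x1conv:
            # 1x1 convolution on the shortcut so that X matches Y in shape
            self.conv3 = nn.Conv2d(input_channels, num_channels,
                                   kernel_size=1, stride=strides)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(num_channels)
        self.bn2 = nn.BatchNorm2d(num_channels)

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3:
            X = self.conv3(X)
        Y += X
        return F.relu(Y)
```

A residual module (it can hold any number of residual blocks, but in ResNet-18 it holds two):

```python
def resnet_block(input_channels, num_channels, num_residuals, first_block=False):
    blk = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            # First block of b3-b5: change channels and halve height and width
            blk.append(Residual(input_channels, num_channels,
                                use_1x1conv=True, strides=2))
        else:
            blk.append(Residual(num_channels, num_channels))
    return blk
```

Instantiation:

```python
b2 = nn.Sequential(*resnet_block(64, 64, 2, first_block=True))
b3 = nn.Sequential(*resnet_block(64, 128, 2))
b4 = nn.Sequential(*resnet_block(128, 256, 2))
b5 = nn.Sequential(*resnet_block(256, 512, 2))
```

Putting it together:

```python
net = nn.Sequential(b1, b2, b3, b4, b5, b6)
```

Shape changes

Shape formula, where $n$ is the input size, $k$ the kernel size, $p$ the total padding (both sides together) and $s$ the stride:

$$\left\lfloor \frac{n - k + p + s}{s} \right\rfloor$$

Height and width are transformed identically, so we only follow one dimension.

The textbook uses an input of shape (batch size, input channels, height, width) = (1, 1, 224, 224). Our images are RGB, so we use an input of shape (batch_size, 3, 224, 224). The batch size of course never changes.

b1 uses the convolutional layer nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3). The total padding is 6 (3 on each side) and the stride is 2, so:

$$\left\lfloor \frac{224 - 7 + 6 + 2}{2} \right\rfloor = 112$$

The number of output channels is 64, so the output shape at this point is (batch_size, 64, 112, 112).

Besides the convolution, b1 also has a max-pooling layer, nn.MaxPool2d(kernel_size=3, stride=2, padding=1), which changes the shape as well: $\lfloor (112 - 3 + 2 + 2)/2 \rfloor = 56$. The final output shape of b1 is therefore (batch_size, 64, 56, 56).

Next, block b2:

```python
b2 = nn.Sequential(*resnet_block(64, 64, 2, first_block=True))
```

```python
def resnet_block(input_channels, num_channels, num_residuals, first_block=False):
    ...
    blk.append(Residual(64, 64))
    return blk
```

This uses two 64 -> 64 residual blocks. The first one contains two 3x3 convolutions, with stride defaulting to 1:

```python
conv1 = nn.Conv2d(input_channels, num_channels, kernel_size=3, padding=1, stride=strides)
conv2 = nn.Conv2d(num_channels, num_channels, kernel_size=3, padding=1)
```

After conv1 (k = 3, total padding 2, stride 1) the shape is (batch_size, 64, 56, 56), since $\lfloor (56 - 3 + 2 + 1)/1 \rfloor = 56$. The same holds for conv2, so the size does not change.

Now block b3:

```python
b3 = nn.Sequential(*resnet_block(64, 128, 2))
```

Again two residual blocks are created, with 64 input channels and 128 output channels.

First residual block:

```python
def resnet_block(input_channels, num_channels, num_residuals, first_block=False):
    ...
    if i == 0 and not first_block:
        blk.append(Residual(64, 128, use_1x1conv=True, strides=2))
    ...
    return blk
```

Here the stride becomes 2, so the size is halved (same calculation as above: $\lfloor (56 - 3 + 2 + 2)/2 \rfloor = 28$) and the output becomes (batch_size, 128, 28, 28).

Second residual block:

```python
def resnet_block(input_channels, num_channels, num_residuals, first_block=False):
    ...
    blk.append(Residual(128, 128))
    return blk
```

So the output shape of b3 is (batch_size, 128, 28, 28). The remaining blocks work the same way and are skipped here.

At this point the model framework is basically in place. Apart from a Dropout layer added in a later version, the model is not changed again. The notebook versions in the GitHub repository are described one by one below.

notebooks/dogs-vs-cats 1

Loading the dataset

Adapted from d2l.load_data_fashion_mnist().

```python
def load_data(batch_size, resize=None):
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize((resize, resize)))
    trans = transforms.Compose(trans)
    train_dir = '/kaggle/input/datasets/tongpython/cat-and-dog/training_set/training_set'
    test_dir = '/kaggle/input/datasets/tongpython/cat-and-dog/test_set/test_set'
    full_dataset = torchvision.datasets.ImageFolder(root=train_dir, transform=trans)
    # Split into training and validation sets
    val_size = int(len(full_dataset) * 0.2)
    train_size = len(full_dataset) - val_size
    train_dataset, val_dataset = data.random_split(full_dataset, [train_size, val_size])
    # Return the two loaders
    train_loader = data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_loader = data.DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
    return train_loader, val_loader
```

Note that the L in data.DataLoader is uppercase.

With the preparation done, training can start.

Training function

Adapted from d2l.train_ch6, with the plotting part removed. It depends on accuracy(), evaluate_accuracy_gpu() and Accumulator, which just need to be defined beforehand.

```python
class Accumulator:
    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


# Returns the number of correct predictions
def accuracy(y_hat, y):
    if y_hat.ndim > 1:
        y_hat = y_hat.argmax(dim=1)
    return (y_hat == y).sum().item()


def evaluate_accuracy_gpu(net, data_iter, device=None):
    if isinstance(net, nn.Module):
        net.eval()
        if not device:
            device = next(iter(net.parameters())).device
    metric = Accumulator(2)
    with torch.no_grad():
        for X, y in data_iter:
            if isinstance(X, list):
                X = [x.to(device) for x in X]
            else:
                X = X.to(device)
            y = y.to(device)
            metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]
```

```python
def train(net, train_iter, test_iter, num_epochs, lr, device):
    # Xavier weight initialization
    def init_weights(m):
        if type(m) == nn.Linear or type(m) == nn.Conv2d:
            nn.init.xavier_uniform_(m.weight)
    net.apply(init_weights)
    print('training on', device)
    net.to(device)
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    loss = nn.CrossEntropyLoss()
    for epoch in range(num_epochs):
        metric = Accumulator(3)
        # evaluate_accuracy_gpu() leaves the net in eval mode, so switch back
        # each epoch (as d2l.train_ch6 does)
        net.train()
        for i, (x, y) in enumerate(train_iter):
            optimizer.zero_grad()
            x, y = x.to(device), y.to(device)
            y_hat = net(x)
            l = loss(y_hat, y)
            l.backward()
            optimizer.step()
            with torch.no_grad():
                metric.add(l * x.shape[0], accuracy(y_hat, y), x.shape[0])
        train_l = metric[0] / metric[2]
        train_acc = metric[1] / metric[2]
        test_acc = evaluate_accuracy_gpu(net, test_iter)
        print(f'Epoch {epoch+1}: loss={train_l:.4f}, train_acc={train_acc:.4f}, test_acc={test_acc:.4f}')
```

Setting hyperparameters and starting training

```python
lr, num_epochs, batch_size = 0.05, 10, 256
train_iter, test_iter = load_data(batch_size, resize=96)
train(net, train_iter, test_iter, num_epochs, lr,
      torch.device('cuda' if torch.cuda.is_available() else 'cpu'))
```

At least the program starts running without errors. In the run output, however, loss=nan. That is wrong and points to exploding gradients or numerical instability.

Improvement directions

- Against exploding gradients: reduce the learning rate.
- Against numerical instability: standardize/normalize the input.

notebooks/dogs-vs-cats 2

Only two things need to change.

Input normalization

In load_data(), add a Normalize step after ToTensor():

```python
def load_data(batch_size, resize=None):
    ...
    trans.append(transforms.ToTensor())
    trans.append(transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                      std=[0.229, 0.224, 0.225]))
    trans = transforms.Compose(trans)
    ...
```

The mean and std here are the ImageNet mean and standard deviation, used on DS's advice. For the current dataset the values should actually be mean = [0.4883, 0.4551, 0.4174], std = [0.2595, 0.2526, 0.2552].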
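
To make the effect of transforms.Normalize concrete, here is a small pure-Python sketch of the per-channel arithmetic it applies, (x - mean) / std, using the two sets of statistics quoted above. The helper name `normalize_pixel` is hypothetical and exists only for this illustration; it is not part of torchvision.

```python
# Per-channel statistics quoted in the post (pixel values are in [0, 1] after ToTensor()).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)
DATASET_MEAN = (0.4883, 0.4551, 0.4174)
DATASET_STD = (0.2595, 0.2526, 0.2552)


def normalize_pixel(rgb, mean, std):
    """Hypothetical helper: apply (x - mean) / std to a single RGB pixel."""
    return tuple((x - m) / s for x, m, s in zip(rgb, mean, std))


# A pixel equal to the dataset's own mean maps to zero under the dataset statistics.
centered = normalize_pixel(DATASET_MEAN, DATASET_MEAN, DATASET_STD)

# Under the ImageNet statistics the same pixel lands slightly off zero.
shifted = normalize_pixel(DATASET_MEAN, IMAGENET_MEAN, IMAGENET_STD)

# Ratio of the dataset std to the ImageNet std: the per-channel spread that
# remains after normalizing this dataset with ImageNet values.
spread = tuple(d / i for d, i in zip(DATASET_STD, IMAGENET_STD))

print("dataset stats :", [round(v, 4) for v in centered])
print("ImageNet stats:", [round(v, 4) for v in shifted])
print("std ratio     :", [round(v, 4) for v in spread])
```

With these numbers the ImageNet statistics leave each channel shifted by at most about 0.05 and with a spread of about 1.13 instead of 1, which gives a rough sense of how far the two sets of values are apart.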
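
The code for computing the mean and standard deviation is given here. It can be run on its own before training, and the result substituted into transforms.Normalize(). Everything after this point was run with the wrong (ImageNet) mean and std, so treat the results as a rough reference only.

```python
trans = []
trans.append(transforms.Resize((224, 224)))
trans.append(transforms.ToTensor())
trans = transforms.Compose(trans)
train_dir = '/kaggle/input/datasets/tongpython/cat-and-dog/training_set/training_set'
dataset = torchvision.datasets.ImageFolder(root=train_dir, transform=trans)
# Named loader rather than train so that it does not shadow train()
loader = data.DataLoader(dataset, shuffle=True)

# Initialize the accumulators
channel_sum = torch.zeros(3)      # Σ x
channel_sum_sq = torch.zeros(3)   # Σ x^2
total_pixels = 0

for images, _ in loader:
    # images shape: (batch_size, 3, H, W)
    # Pixels per channel in this batch: H*W * batch_size
    batch_pixels = images[0, 0, :, :].numel() * images.size(0)
    # Accumulate the sum and the sum of squares over batch and spatial dims
    channel_sum += images.sum(dim=[0, 2, 3])             # (3,)
    channel_sum_sq += (images ** 2).sum(dim=[0, 2, 3])   # (3,)
    total_pixels += batch_pixels

# Mean
channel_mean = channel_sum / total_pixels
# Standard deviation: sqrt( E[x^2] - (E[x])^2 )
# Note: this is the population std (divided by the total pixel count);
# for the sample std, divide by (N-1) instead
channel_var = channel_sum_sq / total_pixels - channel_mean ** 2
channel_std = torch.sqrt(channel_var)
print('Channel mean (R,G,B):', channel_mean)
print('Channel std (R,G,B):', channel_std)
```

Reducing the learning rate

Just change lr in the hyperparameter settings.

```python
lr, num_epochs, batch_size = 0.0005, 10, 256
train_iter, test_iter = load_data(batch_size, resize=96)
train(net, train_iter, test_iter, num_epochs, lr,
      torch.device('cuda' if torch.cuda.is_available() else 'cpu'))
```

Results

The loss now decreases normally, which shows that input normalization and the smaller learning rate helped. Some problems remain:

- Training accuracy is not high. It might still rise with a few more epochs, so the number of epochs needs to increase. It also shows that the model converges slowly.
- Validation accuracy peaks at epoch 8 and then drops, which indicates overfitting.

Improvement directions

There is a lot to adjust:

- Reduce the learning rate further and train for more epochs.
- To prevent overfitting, reduce batch_size.
- Add weight decay with parameter 1e-4.
- Add a Dropout layer with parameter 0.3 just before the final fully connected layer.
- Resize all images to 224, the same as in the textbook.

notebooks/dogs-vs-cats 3

Adding a Dropout layer

```python
b6 = nn.Sequential(nn.AdaptiveAvgPool2d((1, 1)),
                   nn.Flatten(), nn.Dropout(0.3), nn.Linear(512, 2))
```

Weight decay

```python
def train(net, train_iter, test_iter, num_epochs, lr, device):
    ...
    net.to(device)
    optimizer = torch.optim.SGD(net.parameters(), lr=lr, weight_decay=1e-4)
    loss = nn.CrossEntropyLoss()
    ...
```

Hyperparameters

```python
lr, num_epochs, batch_size = 0.0001, 50, 64
train_iter, test_iter = load_data(batch_size, resize=224)
train(net, train_iter, test_iter, num_epochs, lr,
      torch.device('cuda' if torch.cuda.is_available() else 'cpu'))
```

Results

Looking at the last few epochs: training accuracy goes above 80%, but validation accuracy ends at only about 60%, with a peak of 70%. Most of the improvement comes from the Dropout layer. After trying 0.3, 0.4 and 0.5, 0.3 still worked best.

Improvement directions

- The model is clearly still overfitting, so add more regularization: image augmentation.
- Validation accuracy fluctuates a lot. Suspecting the validation split, change the train/validation ratio to 7:3.
- DS suggested adding momentum, which can speed up convergence, reduce oscillation and help get past local minima. This turned out to work really well.
- Add visualization.

notebooks/dogs-vs-cats 4

I wanted to draw a live plot and also typed out the image augmentation code from the textbook, so some plotting code was added, but it ended up unused. For this version I only pick out the key changes; everything else is in the final version.

Momentum

Our hero. After adding momentum I went for a nap. When I got up and saw validation accuracy reach 80%, I was no longer sleepy, my phone was no longer fun, and I got straight to tidying up the code.

```python
def train(net, train_iter, test_iter, num_epochs, lr, device):
    ...
    optimizer = torch.optim.SGD(net.parameters(), lr=lr, momentum=0.9, weight_decay=1e-4)
    loss = nn.CrossEntropyLoss()
    ...
```

Image augmentation

Only the training set needs augmentation, so the training and validation sets use different preprocessing. After trying various augmentation methods, using only a horizontal flip, torchvision.transforms.RandomHorizontalFlip(), seemed to work best.

```python
train_augs = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])])

test_augs = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])])
```

Applying them to the two subsets, and changing the validation ratio to 0.3 along the way:

```python
def load_data(batch_size, resize=None, train_augs=None, test_augs=None):
    train_dir = '/kaggle/input/datasets/tongpython/cat-and-dog/training_set/training_set'
    # 1. Load the raw dataset without any transform
    full_dataset = torchvision.datasets.ImageFolder(root=train_dir, transform=None)
    # 2. Split into training and validation sets (by index)
    val_size = int(len(full_dataset) * 0.3)
    train_size = len(full_dataset) - val_size
    train_subset, val_subset = data.random_split(full_dataset, [train_size, val_size])

    # 3. A wrapper class that applies its own transform to each subset
    class TransformSubset(data.Dataset):
        def __init__(self, subset, transform):
            self.subset = subset
            self.transform = transform

        def __len__(self):
            return len(self.subset)

        def __getitem__(self, idx):
            img, label = self.subset[idx]
            if self.transform:
                img = self.transform(img)
            return img, label

    # 4. Wrap the training and validation subsets with different transforms
    train_dataset = TransformSubset(train_subset, train_augs)
    val_dataset = TransformSubset(val_subset, test_augs)
    # 5. Create the DataLoaders
    train_loader = data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_loader = data.DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
    return train_loader, val_loader
```

Visualization and a higher learning rate

I don't know how to do live plotting and didn't want to import the d2l package, so the approach is to record the loss and accuracy of every epoch and plot them all at the end. This requires changing train(). Along the way it also gained the ability to save the model with the highest validation accuracy.

```python
def train(net, train_iter, test_iter, num_epochs, lr, device):
    ...
    loss = nn.CrossEntropyLoss()
    train_losses, train_accs, test_accs = [], [], []
    best_acc = 0
    for epoch in range(num_epochs):
        ...
        test_acc = evaluate_accuracy_gpu(net, test_iter)
        # Keep the weights of the best epoch so far on the validation set
        if test_acc > best_acc:
            best_acc = test_acc
            torch.save(net.state_dict(), 'best_cat_dog_model.pth')
        print(f'Epoch {epoch+1}: loss={train_l:.4f}, train_acc={train_acc:.4f}, test_acc={test_acc:.4f}')
        train_losses.append(train_l)
        train_accs.append(train_acc)
        test_accs.append(test_acc)
    return train_losses, train_accs, test_accs
```

With momentum in place, the learning rate can be raised to make convergence faster:

```python
lr, num_epochs, batch_size = 0.001, 50, 64
train_iter, test_iter = load_data(batch_size, resize=224,
                                  train_augs=train_augs, test_augs=test_augs)
```

Finally, the plot:

```python
# train_losses, train_accs and test_accs are the lists returned by train()
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(train_losses, label='train loss')
plt.xlabel('epoch')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(train_accs, label='train acc')
plt.plot(test_accs, label='test acc')
plt.xlabel('epoch')
plt.legend()
plt.show()
```

Results

Validation accuracy peaks at 84% and afterwards averages close to 80%.

Improvement directions

This is about the point to wrap up. As final touches, a random seed was added to make the results reproducible, and the Jupyter file was tidied: some unused functions were deleted and some necessary explanatory text was added to make the structure clearer.

Single-image prediction

Running notebooks/dogs-vs-cats-with-aug seed42.ipynb in Kaggle or Jupyter produces best_cat_dog_model.pth, the model with the highest validation accuracy over the 50 epochs. It is not included in the repository, because the GitHub web page cannot upload files larger than 25 MB.

Put best_cat_dog_model.pth, scripts/model.py and scripts/inference.py in the same folder and single images can be predicted. On the command line, run:

```
python inference.py /path/to/your/cat.jpg
```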
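
The seeding step mentioned in the wrap-up is not shown in the post. A minimal sketch of what it could look like follows. The function name `set_seed` is hypothetical, the value 42 is only inferred from the notebook filename, and the NumPy and PyTorch calls are simply skipped when those packages are not installed.

```python
import random


def set_seed(seed=42):
    """Hypothetical helper: seed every random number generator the notebook imports."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed, nothing to seed
    try:
        import torch
        torch.manual_seed(seed)            # PyTorch's global generator
        torch.cuda.manual_seed_all(seed)   # all GPUs; silently ignored without CUDA
    except ImportError:
        pass  # PyTorch not installed, nothing to seed


# Seeding twice with the same value reproduces the same random draws.
set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]
print(first == second)
```

In the notebook such a call would probably need to come before load_data(), since data.random_split and DataLoader shuffling draw from PyTorch's global generator unless one is passed explicitly. Some GPU operations can still be nondeterministic, so this is a starting point rather than a guarantee of bit-identical runs.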
