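PyTorch single-GPU vs. multi-GPU training: a comparison of a few details

The differences between single-GPU and multi-GPU training in PyTorch come down to three places: specifying the GPUs before training, saving the model during training, and loading the model that was just saved.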

Specifying the GPUs before training

Single GPU:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = '0'

Multi-GPU:

import os
import torch
import torch.nn as nn

os.environ["CUDA_VISIBLE_DEVICES"] = '0,1'
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    self.model = nn.DataParallel(self.model)
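The device selection above can be sketched as a small standalone script. This is only an illustration under a few assumptions: TinyNet and build_model are hypothetical names that do not appear in the original code, and the script falls back to the CPU when no GPU is visible. The environment variable needs to be set before CUDA is initialized, so setting it before importing torch is the safest choice.

```python
import os

# Set this before CUDA is initialized; doing it before importing torch is safest.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the real model."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


def build_model():
    """Hypothetical helper: place the model and wrap it if several GPUs are visible."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = TinyNet().to(device)
    if torch.cuda.device_count() > 1:
        # DataParallel replicates the model on each visible GPU
        # and splits every input batch across them.
        model = nn.DataParallel(model)
    return model, device


model, device = build_model()
out = model(torch.randn(8, 4, device=device))
print(out.shape)
```

With a single visible GPU, or none, the model is simply left unwrapped, so the same script covers both cases.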

Saving the model during training

Single GPU:

state = {'model': self.model.state_dict(), 'epoch': ite}
torch.save(state, self.model.name())

Multi-GPU:

if isinstance(self.model, torch.nn.DataParallel):  # check whether the model is wrapped for parallel training
    self.model = self.model.module

state = {'model': self.model.state_dict(), 'epoch': ite}
torch.save(state, self.model.name())

if torch.cuda.device_count() > 1:
    self.model = nn.DataParallel(self.model)

The two lines added at the top are there to avoid the following error:

AttributeError: 'DataParallel' object has no attribute 'name'
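A minimal reproduction of this error, assuming a hypothetical NamedNet class whose name() method returns the checkpoint path (the file name is made up for illustration). The wrapper does not forward custom methods of the wrapped model, but the original model stays reachable through .module:

```python
import torch.nn as nn


class NamedNet(nn.Module):
    """Hypothetical model exposing a name() method that returns a checkpoint path."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

    def name(self):
        return "named_net.pth"


wrapped = nn.DataParallel(NamedNet())

try:
    wrapped.name()  # custom methods are not forwarded by the wrapper
    raised = False
except AttributeError as err:
    raised = True
    print(err)

# The original model is still available as .module
checkpoint_path = wrapped.module.name()
```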

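If you leave out the last two lines, no error is raised, but all subsequent training falls back to a single GPU. (I was using two GPUs.)

With the last two lines added, training keeps using both GPUs.

Note that the relative order of the first two lines, the last two lines, and the two original lines must not be changed. For example, if the original first line (the one that builds state) is moved to the very top, the state_dict is taken from the DataParallel wrapper, whose parameter keys carry a "module." prefix, and loading the model later may fail.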
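The ordering issue can be seen by comparing the state_dict keys of a wrapped and an unwrapped model. The sketch below also shows a variant of the save step, not the approach used in this article: it reads the weights through a hypothetical unwrap helper without reassigning the model, so nothing needs to be re-wrapped afterwards. An in-memory buffer stands in for the checkpoint file.

```python
import io

import torch
import torch.nn as nn


def unwrap(model):
    """Hypothetical helper: return the underlying module if model is a DataParallel wrapper."""
    return model.module if isinstance(model, nn.DataParallel) else model


plain = nn.Linear(4, 2)
wrapped = nn.DataParallel(plain)

wrapped_keys = list(wrapped.state_dict().keys())
plain_keys = list(unwrap(wrapped).state_dict().keys())
print(wrapped_keys)  # keys carry a "module." prefix
print(plain_keys)    # keys without the prefix

# Save the unwrapped weights; `wrapped` itself is left untouched.
buffer = io.BytesIO()  # stands in for a checkpoint file
torch.save({"model": unwrap(wrapped).state_dict(), "epoch": 0}, buffer)
```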

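Loading the model that was just saved

Single GPU:

checkpoint = torch.load(self.model.name())
self.model.load_state_dict(checkpoint['model'])

For multi-GPU, change this to:

if isinstance(self.model, torch.nn.DataParallel):
    self.model = self.model.module
if torch.cuda.is_available():  # GPU
    checkpoint = torch.load(self.model.name())
else:  # CPU: map tensors saved on a GPU back to CPU memory
    checkpoint = torch.load(self.model.name(), map_location=lambda storage, loc: storage)
self.model.load_state_dict(checkpoint['model'])
# self.model is left unwrapped here; wrap it in nn.DataParallel again if multi-GPU training continues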
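If a checkpoint was written from the wrapped model, its keys carry the "module." prefix and will not match an unwrapped model. The following sketch, under stated assumptions, shows one way to tolerate that when loading: strip_module_prefix is a hypothetical helper, and an in-memory buffer stands in for the checkpoint file. It also passes map_location="cpu", which plays the same role as the lambda in the code above.

```python
import io

import torch
import torch.nn as nn


def strip_module_prefix(state_dict):
    """Hypothetical helper: drop a leading "module." from every key, if present."""
    prefix = "module."
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Simulate a checkpoint that was saved from a wrapped model.
source = nn.DataParallel(nn.Linear(4, 2))
buffer = io.BytesIO()  # stands in for a checkpoint file
torch.save({"model": source.state_dict(), "epoch": 3}, buffer)
buffer.seek(0)

# map_location="cpu" lets the checkpoint load on a machine without GPUs.
checkpoint = torch.load(buffer, map_location="cpu")

target = nn.Linear(4, 2)
target.load_state_dict(strip_module_prefix(checkpoint["model"]))
```

Checkpoints saved the way this article does it (from the unwrapped model) pass through the helper unchanged, since their keys have no prefix to strip.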
