How can PaddlePaddle do model parallelism on a single GPU when the model structure has parallelizable branches?
Paddle Framework · Q&A · Model Training · 519 views · 9 replies

How can I do model parallelism on a single GPU?

In PaddlePaddle, whether running as a dynamic graph or after dynamic-to-static conversion, layers that could run in parallel are not automatically executed in parallel during training.

Take the two structures in the example below: one is purely parallel, the other purely serial.

If the model were parallelized automatically, the parallel structure should take far less time than the serial one, yet in practice the two take the same time.

Is my test method wrong, or does PaddlePaddle simply not parallelize parallelizable structures inside a model?

The model structure is shown in the figure below (figure not reproduced here).

The model code is as follows:

import time
import paddle


class Model1(paddle.nn.Layer):
    """Parallel structure: 20 independent Conv2D branches that all read the same input."""

    def __init__(self):
        super(Model1, self).__init__()
        self.convs = paddle.nn.LayerList([
            paddle.nn.Conv2D(in_channels=16, out_channels=16, kernel_size=(3, 3), padding="same")
            for _ in range(20)
        ])

    @paddle.jit.to_static()
    def forward(self, inputs):
        # No branch depends on any other, so in principle all 20 could run concurrently.
        outs = []
        for conv in self.convs:
            outs.append(conv(inputs))
        return outs[0]


class Model2(paddle.nn.Layer):
    """Serial structure: the same 20 Conv2D layers chained one after another."""

    def __init__(self):
        super(Model2, self).__init__()
        self.convs = paddle.nn.LayerList([
            paddle.nn.Conv2D(in_channels=16, out_channels=16, kernel_size=(3, 3), padding="same")
            for _ in range(20)
        ])

    # @paddle.jit.to_static()
    def forward(self, inputs):
        # Each layer consumes the previous layer's output, forcing sequential execution.
        out = inputs
        for conv in self.convs:
            out = conv(out)
        return out


if __name__ == '__main__':
    for model_cls in (Model1, Model2):
        model = model_cls()
        model.eval()
        data = paddle.uniform([128, 16, 128, 128])
        force = model(data)  # warm-up run (also triggers dynamic-to-static tracing for Model1)
        t = time.time()
        for i in range(1000):
            force = model(data)
        print(time.time() - t)
On the GPU the two models take 33 s and 35 s; on the CPU, 20.4 s and 27.7 s.

So on the GPU the parallel and serial structures take practically the same time, while on the CPU their times do differ.
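Note: Paddle launches GPU kernels asynchronously, so time.time() can return before the queued kernels have actually finished executing. A minimal sketch of a tighter timing loop, assuming a CUDA build of Paddle and reusing Model1 from the code above:

import time
import paddle

model = Model1()
model.eval()
data = paddle.uniform([128, 16, 128, 128])
force = model(data)  # warm-up
paddle.device.cuda.synchronize()  # drain any queued kernels before starting the clock

t = time.time()
for i in range(1000):
    force = model(data)
paddle.device.cuda.synchronize()  # make sure all 1000 iterations really ran on the device
print(time.time() - t)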
All replies (9), in chronological order
UnseenMe
#2 · replied 2022-03

I haven't heard that PaddlePaddle supports automatic model parallelism.

docoter_c
#3 · replied 2022-03
> UnseenMe (#2): I haven't heard that PaddlePaddle supports automatic model parallelism.

Supposedly TensorFlow can.

这是aistudio_4这是aistud
#4 · replied 2022-03

1

fi_Past
#5 · replied 2022-03

Is parallelism even necessary on a single GPU?

skywalk163
#6 · replied 2022-03

"Usually, a single operator will use all the computational resources on all CPUs or on a single GPU. For example, the dot operator will use all the cores (and threads) on all the CPUs of a machine, even if there are multiple CPU processors; the same behavior applies to a single GPU. Hence, parallelization is not that useful on a single-device computer." --- Dive into Deep Learning (《动手学深度学习》)

docoter_c
#7 · replied 2022-03
> fi_Past (#5): Is parallelism even necessary on a single GPU?

On a single GPU it is necessary, because neither GPU utilization nor GPU memory is anywhere near full.

With serial execution, even if each layer has few parameters and a small input, the total time adds up when there are many layers and many parallelizable branches.

Running the branches in parallel could cut the run time by a large factor.

docoter_c
#8 · replied 2022-03
> skywalk163 (#6): "Usually, a single operator will use all the computational resources ... parallelization is not that useful on a single-device computer." --- Dive into Deep Learning

What that passage says is that a single operator can use all the resources of the GPU; that much is fine.

But my problem involves many operators. Without parallelism, operators that sit side by side in the graph still have to wait for the ones before them to finish. A single operator does not need to occupy the whole GPU, and while one small operator runs, most of the GPU sits idle. That is why structural parallelism is needed.
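For this particular pattern (many small convolutions over one shared input) there is also a possible workaround that does not depend on the executor: fuse the branches by hand into a single wider convolution, so the GPU gets one large kernel launch instead of 20 small ones. A rough sketch of the idea, not an official Paddle feature, assuming all branches share the same input and kernel shape:

import paddle

# 20 parallel 16->16 convolutions over the same input can be expressed as one
# 16->320 convolution: each block of 16 output channels has its own weights and
# plays the role of one of the original branches.
fused = paddle.nn.Conv2D(in_channels=16, out_channels=16 * 20,
                         kernel_size=(3, 3), padding="same")

x = paddle.uniform([128, 16, 128, 128])
y = fused(x)                             # one large kernel launch
branches = paddle.split(y, 20, axis=1)   # recover the 20 per-branch outputs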

fi_Past
#9 · replied 2022-03
> docoter_c (#7): On a single GPU it is necessary, because neither GPU utilization nor GPU memory is anywhere near full. ...

Ah, good to know.

docoter_c
#10 · replied 2022-03

After @to_static, the internal executor does execute ops in parallel, following topological order.

Currently CPU training defaults to 2 threads. On GPU, because CUDA kernel launches hold a lock, execution defaults to a single thread, so there is no parallelism there for now.

In GPU training, the host-side kernel launch and the device-side kernel execution are asynchronous. If the CPU side can launch every kernel early enough, parallelism brings no particular gain: the GPU is already at full utilization executing kernels launched earlier, while newly launched kernels just queue up in the stream.

In CPU training, each kernel launch and each kernel computation happen on the same thread, with none of the GPU-style asynchrony above, so enabling multi-threaded parallelism does pay off there.

The question on GitHub: https://github.com/PaddlePaddle/Paddle/issues/40435
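The launch/execution asynchrony described above can be observed directly by timing the same loop before and after a device synchronize. A minimal sketch, again assuming a CUDA build of Paddle:

import time
import paddle

conv = paddle.nn.Conv2D(16, 16, (3, 3), padding="same")
x = paddle.uniform([128, 16, 128, 128])
conv(x)  # warm-up
paddle.device.cuda.synchronize()

t = time.time()
for i in range(1000):
    y = conv(x)
print(time.time() - t)  # may read short: some kernels can still be queued in the stream

paddle.device.cuda.synchronize()
print(time.time() - t)  # now includes the actual GPU execution time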
