[DNNL] NCHW vs NHWC performance

Recently the DNNL (formerly known as MKL-DNN) integration in PaddlePaddle was updated to support the NHWC PaddlePaddle layout. Functionally it is covered by unit tests, but no NCHW vs NHWC performance comparison has been made on MKL-DNN, and in my opinion it would be valuable to have such a comparison at the model level (preferably a convolutional model). Hence there are two options:

  1. Please provide two versions of a model for the C-API: one NCHW and the other NHWC.
  2. If option 1 is not available, a Python model where we can switch between NCHW and NHWC is also a good option (see the sketch below).
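For option 2, this is a minimal sketch of what a switchable model could look like, assuming a Paddle 1.x fluid build where fluid.layers.conv2d accepts a data_format argument (added around the time NHWC support landed). The function name and layer sizes are illustrative only and are not the uploaded MobileNetV1 models:

import paddle.fluid as fluid

def build_toy_conv_model(data_format="NCHW"):
    """Build a tiny convolutional network whose layout can be switched.

    data_format: "NCHW" or "NHWC"; the declared input shape follows the same order.
    """
    # Input shape (without the batch dimension) must follow the chosen layout.
    shape = [3, 224, 224] if data_format == "NCHW" else [224, 224, 3]
    image = fluid.layers.data(name="image", shape=shape, dtype="float32")

    # A single conv layer is enough to exercise the MKL-DNN convolution kernels.
    conv = fluid.layers.conv2d(
        input=image,
        num_filters=32,
        filter_size=3,
        padding=1,
        act="relu",
        data_format=data_format,
    )
    return image, conv

# Build one variant; the result could then be saved with
# fluid.io.save_inference_model and fed to the C-API benchmark.
image, out = build_toy_conv_model("NHWC")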
All comments (10)
AIStudio785465
#2 · Replied 2020-01
@zhangting2020

Please see the Intel test requirements.

AIStudio791360
#3 · Replied 2020-01
@jczaja

I uploaded two models that support two different data formats. Please see if they meet your requirements.
MobileNetV1_model_nchw.tar.gz
MobileNetV1_model_nhwc.tar.gz

AIStudio791541
#4 · Replied 2020-01
@zhangting2020

Could you please share the command to run inference on those two models with the current develop branch?

AIStudio791541
#5 · Replied 2020-01
@zhangting2020

I have run both models, so my question no longer needs an answer.

AIStudio791541
#6 · Replied 2020-01
@zhangting2020

I have another issue that I would like advice on:
I checked that one of the models is NCHW and the other is NHWC, which is correct, but the problem is that both have the same input dimensions. In other words, when executing a model via test_analyzer_image_classification, the shape is read out of the model, as happens here:

std::vector<int64_t> var_shape = var->GetShape();

For both NCHW and NHWC we get the same shape: [-1, 224, 224, 3], and according to my understanding this should not be the case. Paddle NHWC implies that the shape also follows NHWC order, not NCHW, so the input shapes should be different. The models are functionally correct, but I cannot make a fair performance comparison when the effective input shapes differ between the two models.
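For reference, the same shape check can be done from Python before running the C++ benchmark. This is a minimal sketch, assuming the models were saved with separate parameter files and extracted locally; the model_dir value and file names are placeholders:

import paddle.fluid as fluid

place = fluid.CPUPlace()
exe = fluid.Executor(place)

# Placeholder path: wherever the uploaded archive was extracted.
model_dir = "./MobileNetV1_model_nhwc"

# params_filename=None means the parameters are stored as separate files.
program, feed_names, fetch_targets = fluid.io.load_inference_model(
    model_dir, exe, model_filename="__model__", params_filename=None)

# For an NHWC model the feed shape should be [-1, 224, 224, 3];
# for an NCHW model it should be [-1, 3, 224, 224].
for name in feed_names:
    print(name, program.global_block().var(name).shape)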

AIStudio791360
#7 · Replied 2020-01
@jczaja

Your understanding is correct, and thanks for the explanation. After checking, I found that the shape of the data layer in my code was set to [-1, 224, 224, 3] even when the data format is NCHW, which is obviously wrong. For NCHW, the input shape should be [-1, 3, 224, 224]. I have fixed it and uploaded a new version of the NCHW model.
MobileNetV1_model_nchw.tar.gz
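As a quick sanity check when feeding data, the relationship between the two layouts is just a transpose of the non-batch axes. A small numpy sketch (the array here is random dummy data, not actual model input from this thread):

import numpy as np

# A dummy NHWC batch: N=8 images of 224x224 with 3 channels.
batch_nhwc = np.random.rand(8, 224, 224, 3).astype("float32")

# NHWC -> NCHW is a transpose that moves the channel axis forward.
batch_nchw = batch_nhwc.transpose(0, 3, 1, 2)

print(batch_nhwc.shape)  # (8, 224, 224, 3) -- matches the NHWC model's input
print(batch_nchw.shape)  # (8, 3, 224, 224) -- matches the corrected NCHW model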

AIStudio791541
#8 · Replied 2020-01

Both models were tested successfully. The results are as follows:

Platform    MobileNetV1 NCHW    MobileNetV1 NHWC
6248        86.0684 img/s       85.0215 img/s
E5-2699     73.8635 img/s       73.8363 img/s

On 6248 the NCHW model is slightly faster; this comes from two differences in model execution:

  1. The NHWC model requires an additional reorder operation. Inside MKL-DNN there is an efficient convolution implementation that accepts Input(Src) in NCHW, Params(weights) in NCHW16C and produces Output(Dst) in NCHW16C format, but there is no analogous implementation for NHWC, e.g. Input(NHWC), Weights(NCHW16C) and Output(NCHW16C). So the input is converted from NHWC to NCHW16C and then the convolution runs on NCHW16C (input, output, weights). This slows execution down a bit on AVX512-capable platforms (see the layout sketch after this list).
  2. The NCHW model executes the MKL-DNN fully connected operator. This does not happen for NHWC, as the MKL-DNN FC pass has to be extended for NHWC. I filed a relevant issue:
    #22396
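For readers unfamiliar with the NCHW16C (nChw16c) blocked layout mentioned in point 1, here is a minimal numpy sketch of how a plain NCHW tensor maps into 16-channel blocks. This only illustrates the memory-layout idea; it is not DNNL's actual reorder implementation:

import numpy as np

# Plain NCHW tensor: batch 1, 32 channels, 8x8 spatial (channels divisible by 16).
n, c, h, w = 1, 32, 8, 8
x_nchw = np.arange(n * c * h * w, dtype=np.float32).reshape(n, c, h, w)

# nChw16c groups channels into blocks of 16 and makes the block the innermost
# (fastest-varying) dimension: logical shape [N][C/16][H][W][16].
blocks = 16
x_nchw16c = (x_nchw
             .reshape(n, c // blocks, blocks, h, w)
             .transpose(0, 1, 3, 4, 2))

print(x_nchw.shape)     # (1, 32, 8, 8)
print(x_nchw16c.shape)  # (1, 2, 8, 8, 16)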
AIStudio791541
#9 · Replied 2020-02
@zhangting2020 @luotao1

I was asked to add some more details on running the benchmark.

  1. Running the NHWC model just requires pointing the infer_model option at a model whose data_format/data_layout is set accordingly. So running either model (NCHW or NHWC) works the same way; only the value of infer_model changes. For example:

test_analyzer_image_classification --infer_model=<path to NCHW or NHWC model> --repeat=1000

  2. The models provided in this thread come in the form of separate files with params. Hence I modified
    test_analyzer_image_classification to accept a model in that form. See the last commit (just the analyzer file) of:
    https://github.com/jczaja/Paddle/commits/prv-tests

  3. The provided models contain a model file, while PaddlePaddle looks for a __model__ file, so model was renamed to __model__.

  4. You can inspect the DNNL_VERBOSE logs (the first dnnl_verbose,exec line) to check whether DNNL was given input data in NHWC layout:

dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:acdb:f0 dst_f32::blocked:abcd:f0,,,1x3x224x224,0.0959473

We can see that it contains:
..src_f32::blocked:acdb..

acdb is NHWC in DNNL notation.
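To automate that check, here is a small sketch that scans a captured DNNL_VERBOSE log for the layout tags of reorder primitives. The log file name is a placeholder, the field positions match the verbose line quoted above (they may differ in other DNNL versions), and the tag-to-layout mapping only covers the 4-D cases relevant here:

# Scan a DNNL_VERBOSE log (captured with DNNL_VERBOSE=1) and report which
# memory format tags appear as reorder sources/destinations.
TAG_NAMES = {
    "abcd": "NCHW",   # plain channels-first
    "acdb": "NHWC",   # channels-last
}

def report_reorders(log_path="dnnl_verbose.log"):  # placeholder file name
    with open(log_path) as f:
        for line in f:
            if not line.startswith("dnnl_verbose,exec") or ",reorder," not in line:
                continue
            fields = line.strip().split(",")
            # Field 6 holds the src/dst descriptors, e.g.
            # "src_f32::blocked:acdb:f0 dst_f32::blocked:abcd:f0"
            for desc in fields[6].split():
                tag = desc.split(":")[-2]
                kind = desc.split("_")[0]  # "src" or "dst"
                print(kind, tag, TAG_NAMES.get(tag, "blocked/other"))

if __name__ == "__main__":
    report_reorders()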

AIStudio786090
#11 · Replied 2020-02

Done.
