[DNNL] NCHW vs NHWC performance

Recently the DNNL (formerly known as MKL-DNN) integration in PaddlePaddle was updated to support the NHWC PaddlePaddle layout. Functionally it is covered by unit tests, but no NCHW vs NHWC performance comparison has been made on MKL-DNN, and in my opinion it would be valuable to have such a comparison at the model level (preferably a convolutional model). Hence there are two options:

  1. Please provide two versions of a model for the C-API: one NCHW and the other NHWC.
  2. If option 1 is not available, a Python model where we can switch between NCHW and NHWC is also a good option (see the sketch below).
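For option 2, this is a minimal sketch of what a switchable model could look like, assuming a Paddle 1.x fluid build where fluid.layers.conv2d accepts a data_format argument (added around the time NHWC support landed). The function name and layer sizes are illustrative only and are not the uploaded MobileNetV1 models:

import paddle.fluid as fluid

def build_toy_conv_model(data_format="NCHW"):
    """Build a tiny convolutional network whose layout can be switched.

    data_format: "NCHW" or "NHWC"; the declared input shape follows the same order.
    """
    # Input shape (without the batch dimension) must follow the chosen layout.
    shape = [3, 224, 224] if data_format == "NCHW" else [224, 224, 3]
    image = fluid.layers.data(name="image", shape=shape, dtype="float32")

    # A single conv layer is enough to exercise the MKL-DNN convolution kernels.
    conv = fluid.layers.conv2d(
        input=image,
        num_filters=32,
        filter_size=3,
        padding=1,
        act="relu",
        data_format=data_format,
    )
    return image, conv

# Build one variant; the result could then be saved with
# fluid.io.save_inference_model and fed to the C-API benchmark.
image, out = build_toy_conv_model("NHWC")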
All comments (10)
AIStudio785465
#2 · Replied 2020-01
@zhangting2020

Please see the Intel test requirements.

AIStudio791360
#3 · Replied 2020-01
@jczaja

I uploaded two models that support two different data formats. Please see if they meet your requirements.
MobileNetV1_model_nchw.tar.gz
MobileNetV1_model_nhwc.tar.gz

AIStudio791541
#4 · Replied 2020-01
@zhangting2020

Could you please share the command to run inference on those two models with the current develop branch?

AIStudio791541
#5 · Replied 2020-01
@zhangting2020

I have run both models, so my question no longer needs an answer.

AIStudio791541
#6 · Replied 2020-01
@zhangting2020

I have another issue that I would like advice on:
I checked that one of the models is NCHW and the other is NHWC, which is correct, but the problem is that both have the same input dimensions. In other words, when executing a model via test_analyzer_image_classification, the shape is read out of the model, as happens here:

std::vector<int64_t> var_shape = var->GetShape();

For both NCHW and NHWC we get the same shape: [-1, 224, 224, 3], and according to my understanding this should not be the case. Paddle NHWC implies that the shape also follows NHWC order, not NCHW, so the input shapes should be different. The models are functionally correct, but I cannot make a fair performance comparison when the effective input shapes differ between the two models.
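For reference, the same shape check can be done from Python before running the C++ benchmark. This is a minimal sketch, assuming the models were saved with separate parameter files and extracted locally; the model_dir value and file names are placeholders:

import paddle.fluid as fluid

place = fluid.CPUPlace()
exe = fluid.Executor(place)

# Placeholder path: wherever the uploaded archive was extracted.
model_dir = "./MobileNetV1_model_nhwc"

# params_filename=None means the parameters are stored as separate files.
program, feed_names, fetch_targets = fluid.io.load_inference_model(
    model_dir, exe, model_filename="__model__", params_filename=None)

# For an NHWC model the feed shape should be [-1, 224, 224, 3];
# for an NCHW model it should be [-1, 3, 224, 224].
for name in feed_names:
    print(name, program.global_block().var(name).shape)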

AIStudio791360
#7 · Replied 2020-01
@jczaja

Your understanding is correct, and thanks for the explanation. After checking, I found that the shape of the data layer in my code was set to [-1, 224, 224, 3] even when the data format is NCHW, which is obviously wrong. For NCHW, the input shape should be [-1, 3, 224, 224]. I have fixed it and uploaded a new version of the NCHW model.
MobileNetV1_model_nchw.tar.gz
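As a quick sanity check when feeding data, the relationship between the two layouts is just a transpose of the non-batch axes. A small numpy sketch (the array here is random dummy data, not actual model input from this thread):

import numpy as np

# A dummy NHWC batch: N=8 images of 224x224 with 3 channels.
batch_nhwc = np.random.rand(8, 224, 224, 3).astype("float32")

# NHWC -> NCHW is a transpose that moves the channel axis forward.
batch_nchw = batch_nhwc.transpose(0, 3, 1, 2)

print(batch_nhwc.shape)  # (8, 224, 224, 3) -- matches the NHWC model's input
print(batch_nchw.shape)  # (8, 3, 224, 224) -- matches the corrected NCHW model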

AIStudio791541
#8 · Replied 2020-01

Both models were tested successfully. The results are as follows:

Platform    MobileNetV1 NCHW    MobileNetV1 NHWC
6248        86.0684 img/s       85.0215 img/s
E5-2699     73.8635 img/s       73.8363 img/s

On 6248 the NCHW model is slightly faster; this comes from two differences in model execution:

  1. The NHWC model requires an additional reorder operation. Inside MKL-DNN there is an efficient convolution implementation that accepts Input(Src) in NCHW, Params(weights) in NCHW16C and produces Output(Dst) in NCHW16C format, but there is no analogous implementation for NHWC, e.g. Input(NHWC), Weights(NCHW16C) and Output(NCHW16C). So the input is converted from NHWC to NCHW16C and then the convolution runs on NCHW16C (input, output, weights). This slows execution down a bit on AVX512-capable platforms (see the layout sketch after this list).
  2. The NCHW model executes the MKL-DNN fully connected operator. This does not happen for NHWC, as the MKL-DNN FC pass has to be extended for NHWC. I filed a relevant issue:
    #22396
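For readers unfamiliar with the NCHW16C (nChw16c) blocked layout mentioned in point 1, here is a minimal numpy sketch of how a plain NCHW tensor maps into 16-channel blocks. This only illustrates the memory-layout idea; it is not DNNL's actual reorder implementation:

import numpy as np

# Plain NCHW tensor: batch 1, 32 channels, 8x8 spatial (channels divisible by 16).
n, c, h, w = 1, 32, 8, 8
x_nchw = np.arange(n * c * h * w, dtype=np.float32).reshape(n, c, h, w)

# nChw16c groups channels into blocks of 16 and makes the block the innermost
# (fastest-varying) dimension: logical shape [N][C/16][H][W][16].
blocks = 16
x_nchw16c = (x_nchw
             .reshape(n, c // blocks, blocks, h, w)
             .transpose(0, 1, 3, 4, 2))

print(x_nchw.shape)     # (1, 32, 8, 8)
print(x_nchw16c.shape)  # (1, 2, 8, 8, 16)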
AIStudio791541
#9 · Replied 2020-02
@zhangting2020 @luotao1

I was asked to add some more details on running the benchmark.

  1. Running the NHWC model just requires pointing the infer_model option at a model whose data_format/data_layout is set accordingly. So running either model (NCHW or NHWC) works the same way; only the value of infer_model changes. For example:

test_analyzer_image_classification --infer_model=<path to NCHW or NHWC model> --repeat=1000

  2. The models provided in this thread come in the form of separate files with params. Hence I modified
    test_analyzer_image_classification to accept a model in that form. See the last commit (just the analyzer file) of:
    https://github.com/jczaja/Paddle/commits/prv-tests

  3. The provided models contain a model file, while PaddlePaddle looks for a __model__ file, so model was renamed to __model__.

  4. You can inspect the DNNL_VERBOSE logs (the first dnnl_verbose,exec line) to check whether DNNL was given input data in NHWC layout:

dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:acdb:f0 dst_f32::blocked:abcd:f0,,,1x3x224x224,0.0959473

We can see that it contains:
..src_f32::blocked:acdb..

acdb is NHWC in DNNL notation.
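To automate that check, here is a small sketch that scans a captured DNNL_VERBOSE log for the layout tags of reorder primitives. The log file name is a placeholder, the field positions match the verbose line quoted above (they may differ in other DNNL versions), and the tag-to-layout mapping only covers the 4-D cases relevant here:

# Scan a DNNL_VERBOSE log (captured with DNNL_VERBOSE=1) and report which
# memory format tags appear as reorder sources/destinations.
TAG_NAMES = {
    "abcd": "NCHW",   # plain channels-first
    "acdb": "NHWC",   # channels-last
}

def report_reorders(log_path="dnnl_verbose.log"):  # placeholder file name
    with open(log_path) as f:
        for line in f:
            if not line.startswith("dnnl_verbose,exec") or ",reorder," not in line:
                continue
            fields = line.strip().split(",")
            # Field 6 holds the src/dst descriptors, e.g.
            # "src_f32::blocked:acdb:f0 dst_f32::blocked:abcd:f0"
            for desc in fields[6].split():
                tag = desc.split(":")[-2]
                kind = desc.split("_")[0]  # "src" or "dst"
                print(kind, tag, TAG_NAMES.get(tag, "blocked/other"))

if __name__ == "__main__":
    report_reorders()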

AIStudio786090
#11 · Replied 2020-02

Done.
