Hi! I confirmed that the cause is that elementwise_mul_op.cc did not support the 3-dimensional format ncw.
I added this 3-dimensional support by falling back to the naive version. After that, bs_test_10 runs successfully, but it is slow. Do you have any reference performance?
Because it is slow, we are considering not falling back to the naive version, and instead:
- using the MKL-DNN binary primitive
- if that does not work, probably using JIT.
If possible, please provide a reference latency (or performance). Thank you!
> I confirmed that it is because elementwise_mul_op.cc did not support 3-dimension format ncw.

Is it elementwise_mul_op.cc or elementwise_mul_mkldnn_op.cc?

> As it is slow, we are considering not dropping to naive version but:

Maybe you could add this 3-dimensional support for mkldnn first.
Sorry, I meant elementwise_mul_mkldnn_op.cc, which did not support the 3-dimensional format. In #20965 I fall back to the naive version. We are also working on mkldnn support now.
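For readers following the thread, the kind of naive fallback being discussed can be sketched roughly as below. This is only an illustration, not the code from #20965: the function name `NaiveMulNCW`, the flattened row-major layout, and the assumption that the second operand is a per-channel vector broadcast over n and w are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not Paddle code. Multiplies an NCW tensor `x`
// (flattened row-major, shape [n, c, w]) by a per-channel vector `y`
// of length c, broadcasting `y` over the n and w dimensions.
std::vector<float> NaiveMulNCW(const std::vector<float>& x,
                               const std::vector<float>& y,
                               std::size_t n, std::size_t c, std::size_t w) {
  assert(x.size() == n * c * w);  // x must hold exactly n*c*w elements
  assert(y.size() == c);          // one scale factor per channel
  std::vector<float> out(x.size());
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < c; ++j) {
      const float scale = y[j];
      const std::size_t base = (i * c + j) * w;  // start of row (i, j)
      for (std::size_t k = 0; k < w; ++k) {
        out[base + k] = x[base + k] * scale;
      }
    }
  }
  return out;
}
```

A plain triple loop like this handles the 3-dimensional case correctly but has no explicit vectorization or blocking, which is consistent with the slowdown reported above and with the plan to move to an MKL-DNN primitive or JIT.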
Hi, in general, we don't want performance degradation.
We only have performance numbers with mkldnn disabled.
What settings produced the reported performance (1000 ms/sample)? Was mkldnn enabled, and how many threads were used?
PaddlePaddle version: develop

commit id:
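CPU model: Intel(R) Xeon(R) Gold 6148 CPU
Docker image: hub.baidubce.com/paddlepaddle/paddle:latest-dev
Model information
Model: Ernie-large
Error message:
With the MKLDNN library enabled, an error is reported.
With the MKLDNN library disabled, the error disappears.
Error details:
Steps to reproduce:
In the /benchmark/Inference/c++/ernie/ directory: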
mkdir build
cd build
cmake -DUSE_GPU=OFF -DPADDLE_ROOT=XXX ..
make