Question: how do I compare two words with cosine similarity in the Paddle framework?
189******30 · Posted 2020-03 · Views: 2364 · Replies: 0

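I'm working on an NLP assignment. The source code from the teacher measures how close two word vectors are by multiplying them element-wise and then summing the result (a dot product). The code is: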

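word_sim = fluid.layers.elementwise_mul(center_words_emb, target_words_emb)  # element-wise product of each embedding pair
word_sim = fluid.layers.reduce_sum(word_sim, dim=-1)  # sum over the embedding axis: one dot product per pair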
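
The two fluid calls amount to a row-wise dot product: for each word pair in the batch, multiply the two embeddings element-wise and sum over the last (embedding) axis. A small NumPy sketch of the same computation, using a hypothetical toy batch (4 pairs, embedding size 3) rather than the assignment's real tensors, shows the equivalence with an explicit per-row dot product:

```python
import numpy as np

# Hypothetical toy batch: 4 word pairs, embedding size 3
rng = np.random.default_rng(0)
center = rng.standard_normal((4, 3)).astype(np.float32)
target = rng.standard_normal((4, 3)).astype(np.float32)

# Mirrors elementwise_mul followed by reduce_sum(dim=-1)
word_sim = np.sum(center * target, axis=-1)

# The same quantity written as one dot product per row
dots = np.einsum("ij,ij->i", center, target)

print(word_sim.shape)  # (4,)
```

Unlike cosine similarity, this score is not divided by the vector lengths, so it grows with the norms of the embeddings.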

I changed it to measure the distance between the two word vectors with cosine similarity instead (using scikit-learn). The code is:

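import numpy as np
import paddle.fluid as fluid
from sklearn.metrics.pairwise import cosine_similarity, pairwise_distances

cos_simi = []
# Convert both embedding batches from Paddle variables to NumPy arrays
cwen, twen = center_words_emb.numpy(), target_words_emb.numpy()
for cwe, twe in zip(cwen, twen):
    #cos_simi.append(cosine_similarity([cwe, twe])[0][1])
    # metric="cosine" returns the cosine distance, i.e. 1 - cosine similarity
    cos_simi.append(pairwise_distances([cwe, twe], metric="cosine")[0][1])
# After the loop: turn the collected values back into a Paddle variable
cos_simi = np.array(cos_simi)
cos_simi = fluid.dygraph.to_variable(cos_simi)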
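
For reference, pairwise_distances(..., metric="cosine") returns the cosine distance, 1 - cosine similarity, rather than the similarity that the commented-out cosine_similarity line would give. The per-pair loop can also be written as one vectorized NumPy expression. The sketch below uses hypothetical random data of shape (128, 300) and the helper names are illustrative only; like the original loop, it runs entirely outside Paddle's autograd graph.

```python
import numpy as np

def rowwise_cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity between a[i] and b[i] for every row i."""
    dots = np.sum(a * b, axis=-1)
    norms = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return dots / np.maximum(norms, eps)  # eps guards against zero vectors

def rowwise_cosine_distance(a, b):
    """Same convention as sklearn's metric="cosine": 1 - similarity."""
    return 1.0 - rowwise_cosine_similarity(a, b)

# Hypothetical stand-ins for center_words_emb.numpy() / target_words_emb.numpy()
rng = np.random.default_rng(1)
cwen = rng.standard_normal((128, 300))
twen = rng.standard_normal((128, 300))

cos_dist = rowwise_cosine_distance(cwen, twen)  # shape (128,)
```

Parallel vectors give a distance of 0, orthogonal vectors give 1, and opposite vectors give 2.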

Then I printed word_sim and cos_simi to compare them, and their shapes are exactly the same:

name _generated_var_4, dtype: VarType.FP32 shape: [128] 	lod: {}
	dim: 128
	layout: NCHW
	dtype: float
	data: [1.09309 0.95204 0.952273 0.89564 1.04106 0.959963 0.999314 1.04193 0.940868 1.04073 0.97881 0.93834 0.935314 0.978834 1.0094 1.03479 0.987623 1.04722 0.980403 0.970246 0.994652 0.866643 0.914722 0.921665 1.08021 0.927647 1.01984 0.993656 1.10541 1.07305 1.04776 0.894663 1.07535 0.978069 1.04506 1.09533 0.999264 1.03059 1.10485 0.911649 1.00788 1.01854 0.948804 0.918824 0.956171 1.01934 1.01357 0.906443 0.999934 0.937435 0.930464 0.905384 1.0975 0.930095 0.915911 0.981939 0.930552 0.951365 1.01836 0.936301 0.94974 0.984658 1.03553 0.970266 0.949307 0.963235 0.944142 0.903508 0.893749 0.939145 1.03633 0.980904 0.967058 1.04402 1.04436 0.983846 0.993587 0.988446 0.943637 1.0263 0.864552 0.941066 1.02293 1.04184 1.0407 0.957243 0.958953 0.950395 0.96996 0.841486 0.955255 0.931798 1.04504 1.10637 1.029 1.04081 1.06277 0.89706 1.12735 0.96552 1.08053 1.01276 1.02153 1.04763 1.0594 0.935468 0.979125 0.990572 1.09037 1.12188 1.06681 1.03376 0.932162 1.05503 1.08114 0.99389 1.01611 1.03835 1.08709 0.909373 0.94254 1.06881 1.0028 0.939428 1.02045 0.996419 1.06491 1.02702]
name tmp_3, dtype: VarType.FP32 shape: [128] 	lod: {}
	dim: 128
	layout: NCHW
	dtype: float
	data: [-0.0892933 0.0484201 0.0466774 0.108153 -0.0413855 0.0387186 0.000692869 -0.0435374 0.061526 -0.0416969 0.0223086 0.0617025 0.0656886 0.0214862 -0.00964883 -0.0351711 0.0129902 -0.0475177 0.0191074 0.0310871 0.00522051 0.13347 0.0875304 0.0804459 -0.0820067 0.0706066 -0.0197691 0.00642272 -0.100747 -0.0790353 -0.0482385 0.103857 -0.0739055 0.0218809 -0.0452028 -0.0942113 0.000772853 -0.0297771 -0.103203 0.0942138 -0.00758512 -0.0191907 0.0480328 0.0862864 0.0437371 -0.0192648 -0.0135297 0.0954909 7.06827e-05 0.0637315 0.0720659 0.091651 -0.0964707 0.0657186 0.0916246 0.0177157 0.0748465 0.0459875 -0.0186842 0.0600546 0.0466207 0.0158564 -0.0352678 0.0287785 0.0454609 0.0368144 0.0567448 0.100132 0.10442 0.057532 -0.037015 0.0192375 0.0342264 -0.0442521 -0.0436309 0.0167647 0.00639526 0.0117759 0.0568235 -0.0248982 0.13497 0.0567944 -0.0229012 -0.0408141 -0.0402232 0.0405853 0.0392214 0.0507376 0.0286319 0.156024 0.0432244 0.0717968 -0.0450222 -0.102999 -0.0308226 -0.04283 -0.0578854 0.09673 -0.130118 0.032944 -0.0796598 -0.0116373 -0.0218446 -0.045581 -0.0606595 0.0612557 0.0203598 0.00940536 -0.0972261 -0.127709 -0.0637216 -0.0344415 0.06782 -0.0542321 -0.0767841 0.00607104 -0.0165149 -0.0389835 -0.083485 0.0884749 0.0553485 -0.0678692 -0.00275095 0.0615377 -0.0204389 0.00359157 -0.0701532 -0.025632]

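But it still fails at adam.minimize(loss). I printed the results of the forward pass and of the backward gradient computation, and both looked fine.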

Traceback (most recent call last) in 
    175         loss.backward()
    176         # Call minimize so the optimizer performs one update step on the parameters based on loss
--> 177         adam.minimize(loss)
    178         # Call clear_gradients to clear the model's gradients before the next mini-batch update
    179         skip_gram_model.clear_gradients()
PaddleCheckError: Expected param_dims == ctx->GetInputDim("Grad"), but received param_dims:253854, 300 != ctx->GetInputDim("Grad"):0.
Param and Grad input of AdamOp should have same dimension at [/paddle/paddle/fluid/operators/optimizers/adam_op.cc:65]

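The error says the dimensions of the optimizer's parameter and its gradient don't match (parameter 253854 x 300, gradient 0)... Could someone more experienced point me in the right direction?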
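
One likely explanation (an assumption, not confirmed in this thread): .numpy() followed by fluid.dygraph.to_variable produces a brand-new tensor with no link back to center_words_emb / target_words_emb, so loss.backward() never reaches the embedding parameter and its gradient stays empty, which would match "Grad:0" in the AdamOp check. Keeping the whole computation in Paddle ops avoids that. Cosine similarity can be built from the same two ops the teacher's code already uses, by L2-normalizing each embedding first; in Paddle 1.x fluid that normalization should be fluid.layers.l2_normalize(x, axis=-1), and fluid.layers.cos_sim is a ready-made alternative. The NumPy sketch below only mirrors that op sequence step by step to show the result is cosine similarity; the l2_normalize function here is a hypothetical NumPy stand-in, not Paddle's API, and the random data replaces the real embeddings.

```python
import numpy as np

def l2_normalize(x, axis=-1, epsilon=1e-12):
    """Hypothetical NumPy stand-in: x / sqrt(max(sum(x**2), epsilon)) along axis."""
    norm = np.sqrt(np.maximum(np.sum(x * x, axis=axis, keepdims=True), epsilon))
    return x / norm

# Hypothetical embeddings: batch of 128 pairs, embedding size 300
rng = np.random.default_rng(2)
center_words_emb = rng.standard_normal((128, 300)).astype(np.float32)
target_words_emb = rng.standard_normal((128, 300)).astype(np.float32)

# Step 1: scale every embedding to unit length
center_unit = l2_normalize(center_words_emb, axis=-1)
target_unit = l2_normalize(target_words_emb, axis=-1)

# Step 2: the teacher's elementwise_mul + reduce_sum(dim=-1), now on unit vectors
word_sim = np.sum(center_unit * target_unit, axis=-1)  # cosine similarity per pair
```

In the Paddle version the result would presumably be fed into the loss exactly where word_sim is used now, so gradients can flow back to the embedding table.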