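Ever since I came across a figure a couple of days ago, I had been meaning to read the paper behind it. I finally found the time over the last two days. The paper is Wide & Deep Learning for Recommender Systems.
It opens with an introduction that mainly explains what Wide and Deep mean: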
Wide: "wide" refers to LR (logistic regression) over high-dimensional features plus feature crosses. From the paper: Generalized linear models with nonlinear feature transformations are widely used for large-scale regression and classification problems with sparse inputs. Memorization of feature interactions through a wide set of cross-product feature transformations are effective and interpretable, while generalization requires more feature engineering effort.
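To make the cross-product idea concrete, here is a minimal sketch of how crossed features could be generated for one example. The helper name `cross_product_features` and the dict-based input format are my own assumptions for illustration, not the paper's implementation. The feature strings follow the `AND(...)` style of the paper's examples.

```python
def cross_product_features(active, fields_to_cross):
    """Return the crossed features that fire for one example.

    active: dict mapping a field name to its categorical value
            (hypothetical input format, chosen for illustration).
    fields_to_cross: list of tuples of field names to cross.
    """
    crossed = []
    for fields in fields_to_cross:
        # A cross is 1 only if every constituent feature is present,
        # which is the AND of the underlying binary features.
        if all(f in active for f in fields):
            parts = ", ".join(f"{f}={active[f]}" for f in fields)
            crossed.append(f"AND({parts})")
    return crossed


example = {"gender": "female", "language": "en", "user_installed_app": "netflix"}
crosses = cross_product_features(
    example,
    [("gender", "language"), ("user_installed_app", "impression_app")],
)
print(crosses)
```

The second cross does not fire here because the example has no `impression_app` field. Each resulting string would then become one sparse binary input to the LR part.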
Deep: a deep neural network. From the paper: With less feature engineering, deep neural networks can generalize better to unseen feature combinations through low-dimensional dense embeddings learned for the sparse features. However, deep neural networks with embeddings can over-generalize and recommend less relevant items when the user-item interactions are sparse and high-rank.
The rest of the paper describes how to combine Wide and Deep.
Main content:
Two interesting concepts, Memorization and Generalization:
Memorization can be loosely defined as learning the frequent co-occurrence of items or features and exploiting the correlation available in the historical data.
Generalization, on the other hand, is based on transitivity of correlation and explores new feature combinations that have never or rarely occurred in the past.
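A review of the LR and deep learning approaches.
A description of their production setup, with some details:
The objective is app acquisitions.
A comparison of joint training and ensembling. In an ensemble the models are trained disjointly, whereas joint training optimizes the whole model together.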
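Here is a small sketch of what "optimizing the whole model together" means. The joint prediction sums the wide logit and the deep logit before a single sigmoid, so one log loss drives both parts. The function names and the toy inputs are my own assumptions. `deep_a` stands in for the last hidden layer's activations, and I only show the gradient with respect to the last-layer weights, since a real network would backpropagate further.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def joint_predict(wide_x, wide_w, deep_a, deep_w, bias):
    """P(Y=1|x) = sigmoid(w_wide . [x, phi(x)] + w_deep . a + b)."""
    wide_logit = sum(w * x for w, x in zip(wide_w, wide_x))
    deep_logit = sum(w * a for w, a in zip(deep_w, deep_a))
    return sigmoid(wide_logit + deep_logit + bias)


def joint_gradients(wide_x, wide_w, deep_a, deep_w, bias, label):
    """Log-loss gradients for one example.

    Both parts receive the same error signal (p - label), which is the
    difference from an ensemble, where each model would be fit to the
    label on its own and only combined at inference time.
    """
    err = joint_predict(wide_x, wide_w, deep_a, deep_w, bias) - label
    grad_wide = [err * x for x in wide_x]
    grad_deep = [err * a for a in deep_a]
    return grad_wide, grad_deep, err


# Toy values, purely illustrative.
p = joint_predict([1, 0, 1], [0.5, -0.2, 0.1], [0.3, 0.7], [1.0, -1.0], 0.0)
grad_wide, grad_deep, err = joint_gradients(
    [1, 0, 1], [0.5, -0.2, 0.1], [0.3, 0.7], [1.0, -1.0], 0.0, label=1
)
```

Because the wide part only has to correct what the deep part gets wrong, it can get away with a small number of crosses instead of a full-size wide model.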
For training, the LR (wide) part uses FTRL with L1 regularization, and the deep part uses AdaGrad.
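For anyone unfamiliar with FTRL plus L1, below is a rough sketch of the per-coordinate FTRL-Proximal update for logistic regression, as I understand it from McMahan et al.'s ad click prediction paper. It is not taken from the Wide & Deep paper, and the class name and hyperparameter defaults are assumptions. The point to notice is that a weight stays exactly zero until its accumulated gradient `z` exceeds the L1 threshold, which keeps the wide part sparse.

```python
import math


class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic regression (sketch)."""

    def __init__(self, dim, alpha=0.1, beta=1.0, l1=1.0, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim  # accumulated adjusted gradients
        self.n = [0.0] * dim  # accumulated squared gradients

    def weight(self, i):
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0  # L1 keeps the coordinate exactly at zero
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2
        return -(z - sign * self.l1) / denom

    def predict(self, x):
        # x is a sparse example: dict of feature index -> value.
        logit = sum(self.weight(i) * v for i, v in x.items())
        logit = max(min(logit, 35.0), -35.0)  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-logit))

    def update(self, x, y):
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of the log loss for coordinate i
            w = self.weight(i)
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w
            self.n[i] += g * g
        return p


model = FTRLProximal(dim=3)
for _ in range(200):
    model.update({0: 1.0}, 1)  # feature 0 is consistently positive
model.update({1: 1.0}, 0)      # feature 1 is seen only once
```

After this toy run, feature 0 ends up with a positive weight, while the rarely seen feature 1 and the never seen feature 2 both stay at exactly zero.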
The training data contains 500 billion examples. How do they even arrive at a number like that? Seriously impressive.
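Continuous values are first normalized to [0,1] via the cumulative distribution function (CDF) and then discretized into buckets. This is quite a nice trick.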
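The CDF trick can be sketched in a few lines. My reading of the paper is that a value falling into the i-th of n_q quantiles is mapped to (i-1)/(n_q-1). The function names below and the way I pick boundaries from a sorted sample are assumptions for illustration, and the sketch needs n_q >= 2.

```python
import bisect


def fit_quantile_boundaries(values, n_q):
    """Return the n_q - 1 interior boundaries that split the sample into n_q quantiles."""
    s = sorted(values)
    return [s[(k * len(s)) // n_q] for k in range(1, n_q)]


def normalize(x, boundaries):
    """Map x to (i - 1) / (n_q - 1), where i is its 1-based quantile index."""
    n_q = len(boundaries) + 1
    i0 = bisect.bisect_right(boundaries, x)  # 0-based quantile index, i.e. i - 1
    return i0 / (n_q - 1)


# Toy sample: the integers 1..100 split into 5 quantiles.
boundaries = fit_quantile_boundaries(range(1, 101), n_q=5)
low = normalize(1, boundaries)
mid = normalize(50, boundaries)
high = normalize(100, boundaries)
```

Since the output depends only on the rank of the value, a heavily skewed feature such as an install count ends up spread evenly over [0,1] without any hand-tuned scaling.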
The paper is short and an interesting read, so it is worth going through in detail.