Recommender Systems

Deep Learning Recommendation Model for Personalization and Recommendation Systems

The model uses embeddings to process sparse features that represent categorical data and a multilayer perceptron (MLP) to process dense features. It then interacts these features explicitly, using the statistical techniques proposed in the factorization machines paper. Finally, it computes the event probability by post-processing the interactions with another MLP.

Some known latent factor methods: Matrix Factorization, Factorization Machine, MLP.

Architecture: to process the categorical features, each categorical feature is represented by an embedding vector of the same dimension, generalizing the concept of latent factors used in matrix factorization....
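The forward pass described above can be sketched in plain Python. This is a minimal illustration, not the reference implementation: the layer sizes, the random initialization, and the helper names (`init_layer`, `mlp`, `pairwise_dots`, `dlrm_forward`) are all assumptions made for the example.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Hypothetical helper: one dense layer as (weights, bias)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def mlp(x, layers):
    """Apply dense layers in order, with ReLU between them (not after the last)."""
    for idx, (w, b) in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        if idx < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x

def pairwise_dots(vectors):
    """Dot product of every distinct pair of vectors (the explicit interaction)."""
    return [
        sum(a * b for a, b in zip(vectors[i], vectors[j]))
        for i in range(len(vectors))
        for j in range(i)
    ]

def dlrm_forward(dense, sparse_ids, tables, bottom, top):
    # The bottom MLP maps dense features to the embedding dimension.
    z = mlp(dense, bottom)
    # Each categorical feature selects one row of its embedding table.
    embeddings = [table[i] for table, i in zip(tables, sparse_ids)]
    # Interact the dense vector and the embeddings with pairwise dot products.
    interactions = pairwise_dots([z] + embeddings)
    # The top MLP post-processes the interactions; a sigmoid gives a probability.
    logit = mlp(z + interactions, top)[0]
    return 1.0 / (1.0 + math.exp(-logit))

# Toy configuration (all sizes are assumptions): 5 dense features,
# 3 categorical features with 10 categories each, embedding dimension 4.
DIM = 4
rng = random.Random(0)
tables = [[[rng.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(10)]
          for _ in range(3)]
bottom = [init_layer(5, 8, rng), init_layer(8, DIM, rng)]
n_vec = 1 + len(tables)
top = [init_layer(DIM + n_vec * (n_vec - 1) // 2, 8, rng), init_layer(8, 1, rng)]
p = dlrm_forward([0.5] * 5, [1, 7, 3], tables, bottom, top)
```

With one dense vector and three embeddings, the interaction step yields six pairwise dot products, which are concatenated with the dense vector before the top MLP.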

Wide & Deep Learning for Recommender Systems

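These papers from 2016 and 2017 are old by now, and what they proposed has become common wisdom, but many people still use them as baselines, so they are still worth reading. One of the main challenges of recommender systems is achieving Memorization and Generalization at the same time. Memorization means remembering and exploiting user-item relations that have already occurred; Generalization means discovering unseen user-item relations and increasing recommendation diversity. The latter can support our idea.

Memorization uses simple linear models such as logistic regression, which cannot judge items that never appeared in the training set. Generalization uses embedding-based models. Because there are many items and a user is interested in only a small part of them, the user-item interactions are sparse and high-rank, and mapping them to low-dimensional features leads to over-generalization. The paper combines the two approaches in a single model with a wide part and a deep part.

The wide part is simply y = w^T x + b. To give the linear model nonlinear features Φ(x), the paper also designs cross-product transformations, for example a feature that is 1 if and only if two other features are both 1. The classification prediction for an item x adds the wide and deep logits and applies a sigmoid: P(Y = 1 | x) = σ(w_wide^T [x, Φ(x)] + w_deep^T a + b), where a is the final activation of the deep part.

At serving time the paper runs small batches in parallel across multiple threads, which is close in spirit to our idea, but this is only single-machine parallel inference and the batches are not large.

Recommender system terminology:

impression: a product exposure that the user sees
click: the user's click on an impression
conversion: the user's purchase of the item after a click
CTR (click-through rate): the ratio of clicks to impressions
CVR: the ratio of conversions to clicks
CTCVR: the ratio of conversions to impressions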
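The wide part and its cross-product transformation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: features are binary, the deep part is reduced to a precomputed logit passed in as `deep_logit`, and every function name here is hypothetical rather than taken from the paper's code.

```python
import math

def cross_product(x, pairs):
    """Cross features over binary inputs: each is 1 iff both inputs are 1."""
    return [x[i] * x[j] for i, j in pairs]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def wide_logit(x, pairs, w, b):
    """Linear model y = w^T [x, phi(x)] + b over raw and crossed features."""
    feats = list(x) + cross_product(x, pairs)
    return sum(wi * fi for wi, fi in zip(w, feats)) + b

def wide_and_deep_predict(x, pairs, w_wide, b, deep_logit):
    """Sum the wide and deep logits, then squash to a probability."""
    return sigmoid(wide_logit(x, pairs, w_wide, b) + deep_logit)

# Three binary features plus one cross feature, AND(feature 0, feature 2).
x = [1, 0, 1]
pairs = [(0, 2)]
w_wide = [0.2, -0.1, 0.3, 1.5]  # the last weight belongs to the cross feature
prob = wide_and_deep_predict(x, pairs, w_wide, b=-1.0, deep_logit=0.4)
```

The cross feature lets the linear model memorize a specific co-occurrence that the individual features cannot express on their own.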

Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference

The embedding placement problem is essentially the same as the cache optimization problem, and many of the ideas are similar. Merlin HugeCTR combines a high-performance GPU embedding cache with a hierarchical storage architecture to achieve low-latency retrieval of embeddings for online model inference tasks. Since late 2021, Merlin HugeCTR additionally features a hierarchical parameter server (HPS) and supports deployment via the NVIDIA Triton server framework, to leverage the computational capabilities of GPUs for high-speed recommendation model inference....
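To make the cache analogy concrete, here is a minimal sketch of a two-tier embedding lookup: a small fast cache in front of a larger, slower store. It does not use or imitate the HugeCTR API. The class names, the LRU eviction policy, and the plain dictionary standing in for the slower tiers are all assumptions chosen for illustration.

```python
from collections import OrderedDict

class LRUCache:
    """Small fixed-capacity cache; stands in for a GPU embedding cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used

class TieredEmbeddingStore:
    """Look up in the cache first and fall back to the slower tier on a miss."""

    def __init__(self, cache_capacity, backing_store):
        self.cache = LRUCache(cache_capacity)
        self.backing = backing_store  # stands in for CPU memory or disk
        self.hits = 0
        self.misses = 0

    def lookup(self, key):
        vec = self.cache.get(key)
        if vec is not None:
            self.hits += 1
            return vec
        self.misses += 1
        vec = self.backing[key]
        self.cache.put(key, vec)  # promote the embedding into the fast tier
        return vec

store = TieredEmbeddingStore(2, {1: [0.1], 2: [0.2], 3: [0.3]})
for key in [1, 2, 1, 3, 2]:
    store.lookup(key)
```

In this access sequence only the second lookup of key 1 hits the cache. Which embeddings to keep in the fast tier is exactly the placement question the paragraph above describes.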

Deep & Cross Network for Ad Click Predictions

It introduces a novel cross network that is more efficient in learning certain bounded-degree feature interactions. In particular, DCN explicitly applies feature crossing at each layer, requires no manual feature engineering, and adds negligible extra complexity to the DNN model. An l-layer cross network comprises all the cross terms of degree from 1 to l+1. We show that, with only O(d) parameters, the cross network contains all the cross terms occurring in the polynomial of the same degree, with each term's coefficient distinct from each other....
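A single cross layer computes x_{l+1} = x_0 (x_l^T w_l) + b_l + x_l, which is how crossing is applied explicitly at each layer. The sketch below implements that update in plain Python; the function names and toy values are assumptions for illustration.

```python
def cross_layer(x0, xl, w, b):
    """One cross layer: x_{l+1} = x0 * (xl . w) + b + xl."""
    s = sum(xi * wi for xi, wi in zip(xl, w))  # scalar x_l^T w_l
    return [x0i * s + bi + xli for x0i, bi, xli in zip(x0, b, xl)]

def cross_network(x0, params):
    """Stack cross layers; each layer is crossed with the original input x0."""
    x = list(x0)
    for w, b in params:
        x = cross_layer(x0, x, w, b)
    return x

def num_parameters(params):
    """Each layer holds one weight vector and one bias vector of length d."""
    return sum(len(w) + len(b) for w, b in params)

x0 = [1.0, 2.0]
params = [([1.0, 0.0], [0.0, 0.0]), ([1.0, 0.0], [0.0, 0.0])]
out = cross_network(x0, params)
```

Because x_l^T w_l is a scalar, each layer costs only a dot product and a scaled vector addition, and each layer in this sketch stores 2d parameters.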

DeepFM: A Factorization-Machine based Neural Network for CTR Prediction

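ROC (receiver operating characteristic) curve: each point on the curve represents the performance of a classifier at a different threshold.

FM replaces the weights with embeddings, so it can model feature interactions without relying on complete co-occurrence information. Once FM is understood, the model architecture is easy to follow. Sparse features are the features obtained by processing the raw features of the training set, for example by one-hot encoding or discretization. The difference between this paper and Wide & Deep is that it needs neither FM pre-training nor hand-crafted features; it learns low-order and high-order feature interactions directly end to end, so its features are more comprehensive.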
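The second-order FM term replaces a separate weight per feature pair with the dot product of the two features' embeddings, the sum over i < j of <v_i, v_j> x_i x_j. The sketch below computes it both directly and with the standard linear-time identity; the function names and toy values are assumptions for illustration.

```python
def fm_pairwise_naive(x, V):
    """Sum over i < j of <V[i], V[j]> * x[i] * x[j], computed pair by pair."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total

def fm_pairwise_fast(x, V):
    """Same value via 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)."""
    k = len(V[0])
    total = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - s_sq
    return 0.5 * total

def fm_predict(x, w0, w, V):
    """Bias, linear term, and embedding-based pairwise interaction term."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    return w0 + linear + fm_pairwise_fast(x, V)

x = [1.0, 0.0, 1.0]                        # one-hot style sparse input
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # one embedding per feature
naive = fm_pairwise_naive(x, V)
fast = fm_pairwise_fast(x, V)
```

Features 0 and 2 interact through their embeddings even if that exact pair was rare in training, which is the sense in which FM does not depend on complete co-occurrence information.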

Ekko: A Large-Scale Deep Learning Recommender System with Low-Latency Model Update (OSDI 2022)

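The paper applies many ideas from distributed systems. It updates a recommendation inference system online, optimizes the P2P communication, and takes into account both the priority of model updates and the impact of bad models on SLOs. The impact of a bad model is monitored by an inference model state manager. It keeps a baseline model from a few minutes earlier, which receives part of the user traffic and serves as ground truth for evaluating the current model.

Model parameters are transferred in shards and reach eventual consistency. The authors argue that inconsistent versions among the parameters of one model have little effect on inference results, as long as the parameters eventually become consistent. Version vectors are used for P2P synchronization between replicas, with physical time plus an ID serving as the version number of a parameter. A Dominator Version Vector is used to maintain the cache and guarantees that every version greater than the Dominator Version Vector is in the cache; deleting expired cache entries updates (merges) the counters of the Dominator Version Vector. If a Version Vector is greater than the Dominator Version Vector, everything is in the cache.

Shard Versions are bound to replicas and there are far fewer of them; they are a second-level data structure introduced to reduce Version Vector work. If a Shard Version is larger, the Version Vector must be larger too.

Update priorities are a polynomial of freshness, gradient magnitude, and request rates. Each model has a baseline model, holding the parameters from a few minutes earlier, that is used to monitor whether a model update has made the model worse. Witness servers record updates, but they do not flush immediately and have no update priorities; they flush only when an inference server needs to roll back.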
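The version-vector comparisons in these notes can be illustrated with a generic sketch. It is not Ekko's actual data structure: versions are plain integers standing in for "physical time + id", vectors are dictionaries keyed by replica name, and the function names and update-log format are hypothetical.

```python
def dominates(a, b):
    """True if version vector a is at least as new as b for every replica."""
    return all(a.get(replica, 0) >= version for replica, version in b.items())

def merge(a, b):
    """Element-wise maximum: the newest version known for each replica."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}

def missing_updates(local_vv, peer_log):
    """Updates in a peer's log that the local replica has not yet seen."""
    return [
        update for update in peer_log
        if update["version"] > local_vv.get(update["replica"], 0)
    ]

local = {"r1": 3, "r2": 1}
peer = {"r1": 2, "r2": 4}
peer_log = [
    {"replica": "r1", "version": 2, "param": "emb_17"},
    {"replica": "r2", "version": 4, "param": "emb_42"},
]
to_pull = missing_updates(local, peer_log)  # only the r2 update is new
synced = merge(local, peer)
```

Neither vector dominates the other here, so the replicas must exchange updates. After merging, the local vector dominates both originals, which is the condition for the replicas having converged.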