鸭知半解

Everything happens for the best.

Deep Learning Recommendation Model for Personalization and Recommendation Systems

The model uses embeddings to process sparse features that represent categorical data and a multilayer perceptron (MLP) to process dense features, then interacts these features explicitly using the statistical techniques proposed in the factorization machines paper. Finally, it finds the event probability by post-processing the interactions with another MLP. Some known latent factor methods: matrix factorization, factorization machines, MLP. Architecture: to process the categorical features, each categorical feature is represented by an embedding vector of the same dimension, generalizing the concept of latent factors used in matrix factorization....
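A minimal sketch of that forward pass, with made-up sizes and a trivial stand-in for both MLPs (the interaction step is the FM-style pairwise dot product; nothing here is DLRM's actual API):

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 4  # every embedding table shares one dimension
tables = [rng.normal(size=(10, EMB_DIM)) for _ in range(3)]  # 3 categorical features

def bottom_mlp(x_dense):
    # stand-in for the dense-feature MLP: project to EMB_DIM with ReLU
    W = np.full((x_dense.shape[0], EMB_DIM), 0.1)
    return np.maximum(x_dense @ W, 0.0)

def dlrm_forward(dense, cat_ids):
    z = bottom_mlp(dense)                                     # dense -> EMB_DIM
    embs = [tables[i][cat_ids[i]] for i in range(len(cat_ids))]
    vecs = [z] + embs
    # explicit pairwise dot-product interactions between all feature vectors
    pairs = [float(vecs[i] @ vecs[j])
             for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    top_in = np.concatenate([z, np.array(pairs)])
    logit = 0.01 * top_in.sum()                               # stand-in for top MLP
    return 1.0 / (1.0 + np.exp(-logit))                       # event probability

p = dlrm_forward(np.array([1.0, 2.0]), [3, 7, 1])
```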

December 6, 2022 · Yihong Li

Wide & Deep Learning for Recommender Systems

These papers from 2016–2017 are quite old and their ideas have become common knowledge, but many people still use them as baselines, so they remain worth reading.

One of the main challenges of recommender systems is solving memorization and generalization at the same time. Memorization means remembering and exploiting user-item relations that have already appeared; generalization means discovering unseen user-item relations, adding diversity to the recommendations. The latter can support our idea.

Memorization uses a simple linear model such as logistic regression, which cannot score items that never appeared in the training set. Generalization uses embedding-based models; because there are many items and each user is interested in only a few of them, the interactions are sparse and high-rank, and mapping them to low-dimensional features causes over-generalization. The paper combines the two approaches; the proposed model structure is as follows:

The wide part is y = w^T x + b. To give the linear model non-linear features Φ(x), it also designs cross-product transformations, e.g. a feature that is 1 if and only if two other features are both 1. The classification prediction for an item x is as follows:

At serving time the paper uses multithreaded small-batch parallel inference, which is close in spirit to our idea, but this is only single-machine parallelism and the batches are not large.

Recommender-system terminology:
impression: a product whose exposure the user has observed
click: the user's click on an impression
conversion: the user's purchase of the item after clicking
CTR (click-through rate): the fraction of impressions that lead to clicks
CVR (conversion rate): the fraction of clicks that lead to conversions
CTCVR: the fraction of impressions that lead to conversions
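Two small checks of the ideas above: a cross-product transformation as the wide part's non-linear feature (the feature names are hypothetical, not from the paper), and the fact that the funnel metrics compose as CTCVR = CTR × CVR (made-up counts):

```python
# Wide part: a cross-product transformation over binary features,
# phi(x) = 1 iff every feature in `combo` is 1.
def cross_product(x, combo):
    return int(all(x[f] == 1 for f in combo))

x = {"installed_netflix": 1, "impressed_hulu": 1}
phi = cross_product(x, ["installed_netflix", "impressed_hulu"])

# Funnel metrics: CTCVR factorizes as CTR * CVR.
impressions, clicks, conversions = 10_000, 500, 25
ctr = clicks / impressions          # impression -> click
cvr = conversions / clicks          # click -> conversion
ctcvr = conversions / impressions   # impression -> conversion
```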

December 4, 2022 · Yihong Li

Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference

The embedding placement problem is essentially the same as the cache optimization problem, and many of the ideas are similar. Merlin HugeCTR combines a high-performance GPU embedding cache with a hierarchical storage architecture to realize low-latency retrieval of embeddings for online model inference. Since late 2021, Merlin HugeCTR additionally features a hierarchical parameter server (HPS) and supports deployment via the NVIDIA Triton server framework, to leverage the computational capabilities of GPUs for high-speed recommendation model inference....
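The cache-in-front-of-storage shape can be modeled in a few lines: a small LRU "GPU" cache backed by a larger, slower tier. This is a toy illustration of the hierarchy, not HugeCTR's actual API:

```python
from collections import OrderedDict

class EmbeddingCache:
    """Toy two-tier lookup: small fast cache over a full embedding store."""

    def __init__(self, backing, capacity):
        self.backing = backing      # full table (the slower CPU/SSD tier)
        self.capacity = capacity
        self.cache = OrderedDict()  # LRU order: oldest first
        self.hits = self.misses = 0

    def lookup(self, key):
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)      # mark as most recently used
            return self.cache[key]
        self.misses += 1
        vec = self.backing[key]              # slow-path fetch from lower tier
        self.cache[key] = vec
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used
        return vec

store = {i: [float(i)] * 4 for i in range(100)}
cache = EmbeddingCache(store, capacity=2)
for k in (1, 1, 2, 3, 1):
    cache.lookup(k)
```

With capacity 2 and the lookup sequence 1, 1, 2, 3, 1, only the second lookup of key 1 hits; key 1 is evicted when key 3 arrives, so the final lookup misses again.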

December 1, 2022 · Yihong Li

Deep & Cross Network for Ad Click Predictions

It introduces a novel cross network that is more efficient in learning certain bounded-degree feature interactions. In particular, DCN explicitly applies feature crossing at each layer, requires no manual feature engineering, and adds negligible extra complexity to the DNN model. An l-layer cross network comprises all the cross terms of degree 1 to l+1. The paper shows that, with a number of parameters that is only linear in the input dimension d, the cross network contains all the cross terms occurring in the polynomial of the same degree, with each term's coefficient distinct from the others....
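The reason the parameter count stays linear in d is visible from a single cross layer, x_{l+1} = x_0 (x_l^T w_l) + b_l + x_l: each layer adds only one weight vector and one bias vector of size d, yet multiplying by x_0 raises the maximal polynomial degree by one. A numpy sketch (sizes are arbitrary):

```python
import numpy as np

def cross_layer(x0, xl, w, b):
    # x_{l+1} = x_0 * (x_l . w) + b + x_l  -- O(d) parameters per layer
    return x0 * (xl @ w) + b + xl

rng = np.random.default_rng(0)
d = 4
x0 = rng.normal(size=d)
w = rng.normal(size=d)
b = np.zeros(d)

x1 = cross_layer(x0, x0, w, b)  # contains degree-2 terms in x0
x2 = cross_layer(x0, x1, w, b)  # contains degree-3 terms in x0
```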

December 1, 2022 · Yihong Li

DeepFM: A Factorization-Machine based Neural Network for CTR Prediction

ROC (receiver operating characteristic) curve: the points on the curve represent a classifier's performance at different decision thresholds. FM replaces raw interaction weights with embeddings, so it can model feature interactions without relying on complete co-occurrence information. Once FM is understood, the model structure is easy to follow. Sparse features here are the training set's raw features after processing such as one-hot encoding or discretization. The difference from Wide & Deep is that DeepFM needs neither FM pre-training nor hand-crafted features: it learns low- and high-order feature interactions directly end-to-end, so the features are more comprehensive.
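The embedding-for-weight substitution is what makes FM tractable: the pairwise interaction term Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j can be rewritten as ½ Σ_f [(Σ_i v_{if} x_i)² − Σ_i v_{if}² x_i²], dropping the cost from O(k n²) to O(k n). A quick numerical check of the identity (random data):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3                     # n features, k-dimensional embeddings
x = rng.normal(size=n)
V = rng.normal(size=(n, k))     # row i is the embedding v_i of feature i

# naive O(k n^2): sum over all pairs of <v_i, v_j> x_i x_j
naive = sum((V[i] @ V[j]) * x[i] * x[j]
            for i in range(n) for j in range(i + 1, n))

# FM's O(k n) rewrite: 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]
fast = 0.5 * (((x @ V) ** 2) - ((x ** 2) @ (V ** 2))).sum()
```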

November 28, 2022 · Yihong Li

DistCache: Provable Load Balancing for Large-Scale Storage Systems with Distributed Caching

To achieve load-balanced caching across multiple racks, the paper proposes two ideas. First, a multi-layer cache where each layer uses a different hash function, i.e. a different partitioning of the keys. Second, a routing method based on the power of two random choices. In plain terms: when dispatching requests to n servers sequentially, pick two servers uniformly and independently at random, then send the request to the less loaded of the two. With very high probability, this keeps the maximum load among the n servers at (1 + o(1)) * log log n / log 2 + O(1) requests. The difference from the original power of two choices is that the original picks two servers at random, while this paper hashes each key to two fixed servers.

For multi-rack load-balanced caching, two obvious approaches are cache partitioning and cache replication. The former partitions the cached items across nodes, but does not solve load imbalance; the latter replicates all cached items on every cache node, which incurs consistency and storage overhead.

In this design, lower-layer cache nodes only cache the hot items of their own servers, while the upper layer uses a different hash function, effectively spreading each lower-layer cache node's items across the upper layer. Every key then has one cache node in each of the two layers, and the less loaded of the two serves the request.

The paper casts the problem as finding a perfect matching and uses graph theory and queueing theory to prove that power of two choices works: regardless of the request distribution, the total throughput approaches the sum of the maximum throughputs of all machines (i.e. load balancing is achieved). Variables in the screenshots that are not explained there: m is the number of clusters, p_i is the request probability of object i, R is the total response rate, T with a tilde is the maximum throughput of one cluster, and P is the set of all p_i.

The rest is similar to NetCache; the main difference is cache coherence. How do you atomically update copies on different cache nodes? The paper's solution is a two-phase update protocol: on a write, first invalidate all copies by having the storage server send an invalidation packet that visits every cache node and returns to the storage server, then send a second packet to update the caches.

Cache admission uses NetCache's heavy-hitter detection, but with a controller-free method. Specifically, the agent first inserts the new object into the cache, but marks it as invalid. Then the agent notifies the server; the server updates the cached object in the data plane using phase 2 of cache coherence, and serializes this operation with other write queries....
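The load-balancing effect of the second idea is easy to see in simulation. The sketch below places m requests into n bins, choosing the less loaded of k random candidates per request; it is the classic balls-into-bins form, not the paper's hashed two-layer variant:

```python
import random

def max_load(n, m, choices, rng):
    """Place m requests into n bins; each request picks the least
    loaded of `choices` uniformly random candidate bins."""
    bins = [0] * n
    for _ in range(m):
        candidates = [rng.randrange(n) for _ in range(choices)]
        best = min(candidates, key=lambda i: bins[i])  # less-loaded wins
        bins[best] += 1
    return max(bins)

rng = random.Random(42)
one = max_load(1000, 1000, 1, rng)  # single random choice: ~log n / log log n
two = max_load(1000, 1000, 2, rng)  # two choices: ~log log n / log 2 + O(1)
```

With one choice the maximum load grows like log n / log log n, while two choices drop it to roughly log log n / log 2 plus a constant, which is the bound quoted above.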

November 24, 2022 · Yihong Li