Horovod

Back in 2018, when distributed training support in other frameworks was still immature, Horovod took the lead in implementing ring allreduce with NCCL and MPI.

Horovod Timeline

Tensor Fusion

Ring-allreduce utilizes the network in an optimal way if the tensors are large enough, but does not work as efficiently or quickly if they are very small. One of the unique things about Horovod is its ability to interleave communication and computation coupled with the ability to batch small allreduce operations, which results in improved performance. We call this batching feature Tensor Fusion....
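
To make the batching idea concrete, here is a minimal pure-Python sketch of it, not Horovod's actual implementation. It simulates several workers in one process. The names `allreduce_sum` and `fused_allreduce`, and a `threshold` counted in elements, are assumptions made for illustration. The sketch packs consecutive small tensors into one buffer, runs a single allreduce on the buffer, and then unpacks the result.

```python
from typing import List, Optional, Tuple

Tensor = List[float]


def allreduce_sum(buffers: List[Tensor]) -> Tensor:
    """Stand-in for a real allreduce: element-wise sum across all workers."""
    return [sum(values) for values in zip(*buffers)]


def fused_allreduce(
    per_worker_tensors: List[List[Tensor]], threshold: int = 8
) -> Tuple[List[Tensor], int]:
    """Reduce many small tensors with as few allreduce calls as possible.

    per_worker_tensors[w][i] is tensor i as held by worker w.
    threshold is a hypothetical fusion-buffer capacity, in elements.
    Returns the reduced tensors and the number of allreduce calls made.
    """
    sizes = [len(t) for t in per_worker_tensors[0]]
    results: List[Optional[Tensor]] = [None] * len(sizes)
    calls = 0
    start = 0
    while start < len(sizes):
        # Greedily take consecutive tensors until the buffer would overflow.
        # A tensor larger than the buffer still goes through on its own.
        end, used = start, 0
        while end < len(sizes) and (end == start or used + sizes[end] <= threshold):
            used += sizes[end]
            end += 1

        # Pack: each worker concatenates its tensors start..end-1 into one buffer.
        fused = [
            [x for tensor in worker[start:end] for x in tensor]
            for worker in per_worker_tensors
        ]

        # One allreduce for the whole group instead of one per tensor.
        reduced = allreduce_sum(fused)
        calls += 1

        # Unpack: slice the reduced buffer back into the original shapes.
        offset = 0
        for k in range(start, end):
            results[k] = reduced[offset:offset + sizes[k]]
            offset += sizes[k]
        start = end
    return results, calls


# Two simulated workers, each holding three small tensors.
worker0 = [[1, 2], [3, 4, 5], [6, 7, 8, 9]]
worker1 = [[10, 20], [30, 40, 50], [60, 70, 80, 90]]
reduced_tensors, num_calls = fused_allreduce([worker0, worker1], threshold=8)
print(reduced_tensors, num_calls)
```

With a capacity of 8 elements, the first two tensors (5 elements in total) share one allreduce and the third gets its own, so three tensors cost two calls instead of three. In Horovod itself the buffer size is configured in bytes rather than elements, and the grouping is done by its background coordination loop, which this sketch does not model.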

March 25, 2022 · Yihong Li