Microsoft Releases “DeepSpeed”, a Deep Learning Optimization Library

On February 10, Microsoft Research released DeepSpeed, an optimization library for deep learning.  It makes distributed model training simpler and more efficient, and it can train models with more than 100 billion parameters.

It was developed to improve the efficiency of training large natural language models, one of the central challenges in deep learning today, and it is compatible with PyTorch.  It includes ZeRO (Zero Redundancy Optimizer), a new parallelized optimizer that reduces the resources required for model and data parallelism.

ZeRO is a memory optimization technology for large-scale distributed deep learning.  It can train deep learning models with 100 billion parameters on the current generation of GPU clusters at three to five times the throughput of the current best system, while requiring only minimal code changes.
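As a rough illustration of what “minimal code changes” means in practice, the sketch below follows the general integration pattern described in the DeepSpeed documentation: an ordinary PyTorch model is handed to deepspeed.initialize, and the returned engine replaces the usual backward and optimizer calls, while ZeRO is switched on through the configuration rather than through changes to the training code.  The toy model, the config values, and the dummy training loop are placeholders for illustration, not code taken from the DeepSpeed release.

```python
import torch
import deepspeed

# A tiny PyTorch model stands in for a real network; the layer sizes are arbitrary.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
)

# DeepSpeed is configured declaratively; ZeRO is enabled via the
# "zero_optimization" section (values here are placeholder assumptions).
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 1},
}

# deepspeed.initialize wraps the model and builds the optimizer from the config.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# The training loop keeps its ordinary PyTorch shape; only backward() and
# step() are routed through the DeepSpeed engine.
for _ in range(10):
    batch = torch.randn(8, 1024, device=model_engine.device)
    target = torch.randn(8, 1024, device=model_engine.device)
    loss = torch.nn.functional.mse_loss(model_engine(batch), target)
    model_engine.backward(loss)
    model_engine.step()
```

In real use such a script is typically started with the deepspeed launcher so that the distributed processes are set up automatically, but the point of the sketch is that the Python training code itself barely changes as the model grows.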

Using DeepSpeed and ZeRO, Microsoft Research reportedly trained Turing-NLG (Turing Natural Language Generation), the largest language model to date with 17 billion parameters, setting new state-of-the-art (SOTA) results.

DeepSpeed is available on the project website under the MIT License.

DeepSpeed
https://github.com/microsoft/DeepSpeed