Video Highlights: Ultimate Guide To Scaling ML Models – Megatron-LM | ZeRO | DeepSpeed | Mixed Precision

In this video presentation, Aleksa Gordić explains what it takes to scale ML models up to trillions of parameters. He covers the fundamental ideas behind recent large ML models such as Meta’s OPT-175B, BigScience BLOOM 176B, EleutherAI’s GPT-NeoX-20B, GPT-J, OpenAI’s GPT-3, Google’s PaLM, and DeepMind’s Chinchilla and Gopher models. He walks through data parallelism, model/pipeline parallelism (e.g. GPipe, PipeDream), model/tensor parallelism (Megatron-LM), activation checkpointing, mixed precision training, ZeRO (Zero Redundancy Optimizer) from Microsoft’s DeepSpeed library, and more. Along the way, many top research papers are highlighted. The video presentation is sponsored by AssemblyAI.
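
To give a concrete feel for one of the techniques covered, the snippet below is a minimal sketch (not taken from the video) of loss-scaled mixed precision training using PyTorch’s torch.cuda.amp utilities. The tiny linear model, batch sizes, and learning rate are placeholders for illustration only; the idea follows the mixed precision training paper linked below.

import torch
from torch import nn

model = nn.Linear(1024, 1024).cuda()          # stand-in model for illustration
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()          # scales the loss to avoid FP16 gradient underflow

for step in range(10):
    x = torch.randn(32, 1024, device="cuda")
    target = torch.randn(32, 1024, device="cuda")

    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():           # forward pass runs in half precision where safe
        loss = nn.functional.mse_loss(model(x), target)

    scaler.scale(loss).backward()             # backward pass on the scaled loss
    scaler.step(optimizer)                    # unscales gradients; skips the step on inf/NaN
    scaler.update()                           # adjusts the loss scale for the next iteration

The same pattern (autocast for the forward pass, a gradient scaler for the backward pass, FP32 master weights in the optimizer) is what the mixed precision section of the video describes at a conceptual level.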

Papers:

✅ Megatron-LM paper: https://arxiv.org/abs/1909.08053

✅ ZeRO (DeepSpeed) paper: https://arxiv.org/abs/1910.02054v3

✅ Mixed precision training paper: https://arxiv.org/abs/1710.03740

✅ Gpipe (pipeline parallelism) paper: https://arxiv.org/abs/1811.06965

Articles:

✅ Collective ops: https://en.wikipedia.org/wiki/Collect…

✅ IEEE float16 format: https://en.wikipedia.org/wiki/Half-pr…

✅ Google Brain’s bfloat16 format: https://cloud.google.com/blog/product…

