Video Highlights: Ultimate Guide To Scaling ML Models – Megatron-LM | ZeRO | DeepSpeed | Mixed Precision

In this video presentation, Aleksa Gordić explains what it takes to scale ML models up to trillions of parameters! He covers the fundamental ideas behind recent big ML models such as Meta’s OPT-175B, BigScience’s BLOOM 176B, EleutherAI’s GPT-NeoX-20B and GPT-J, OpenAI’s GPT-3, Google’s PaLM, and DeepMind’s Chinchilla and Gopher.
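
To make the mixed-precision idea from the video concrete, here is a minimal PyTorch-style sketch of a training step with automatic casting and loss scaling; the tiny linear model and random tensors are hypothetical stand-ins for a real transformer and dataset, not anything from the presentation itself.

```python
# Minimal mixed-precision training sketch (illustrative only; the model,
# optimizer, and data below are placeholders, not a real LLM setup).
import torch
from torch import nn

model = nn.Linear(512, 512).cuda()            # stand-in for a large transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()          # scales loss to avoid fp16 underflow

for step in range(100):
    x = torch.randn(32, 512, device="cuda")
    y = torch.randn(32, 512, device="cuda")
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():           # forward pass runs in half precision
        loss = nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()             # backward on the scaled loss
    scaler.step(optimizer)                    # unscales grads, then steps
    scaler.update()                           # adapts the scale factor
```

Frameworks like DeepSpeed and Megatron-LM build on this same principle, combining reduced precision with data and model parallelism (and, in ZeRO’s case, optimizer-state sharding) to reach the scales discussed in the video.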

POLL: Which Company Will Lead the LLM Pack?

Since the release of ChatGPT late last year, the world has gone crazy for large language models (LLMs) and generative AI powered by transformers. The biggest players in our industry are now jockeying for prime position in this lucrative space. The news cycle is extremely fast-paced, and the technology is advancing at an incredible rate. Meta’s announcement yesterday that Llama 2, the latest version of its large language model, is being open sourced is a good example.

Brief History of LLMs

The early days of natural language processing saw researchers experiment with many different approaches, including conceptual ontologies and rule-based systems. While some of these methods proved narrowly useful, none yielded robust results. That changed in the 2010s, when NLP research intersected with the then-bustling field of neural networks. The collision laid the groundwork for the first large language models. This post, adapted and excerpted from one on Snorkel.ai entitled “Large language models: their history, capabilities, and limitations,” follows the history of LLMs from that first intersection to their current state.

Power to the Data Report Podcast: The Math Behind the Models

Hello, and welcome to the “Power to the Data Report” podcast, where we cover timely topics from throughout the Big Data ecosystem. I am your host, Daniel Gutierrez of insideBIGDATA, where I serve as Editor-in-Chief & Resident Data Scientist. Today’s topic is “The Math Behind the Models,” one of my favorite subjects to teach in my Introduction to Data Science class at UCLA. In the podcast, I discuss how, in the age of data-driven decision-making and artificial intelligence, the role of the data scientist has become increasingly vital. To truly excel in this field, however, data scientists must possess a strong foundation in mathematics and statistics.

MosaicML Releases Open-Source MPT-30B LLMs, Trained on H100s to Power Generative AI Applications

MosaicML announced the availability of MPT-30B Base, Instruct, and Chat, the most advanced models in its MPT (MosaicML Pretrained Transformer) series of open-source large language models. These state-of-the-art models, trained with an 8k-token context window, surpass the quality of the original GPT-3 and can be used directly for inference or as starting points for building proprietary models.
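
As a rough sketch of the “use directly for inference” path, the snippet below loads an MPT checkpoint through the Hugging Face transformers library. The `mosaicml/mpt-30b` repository name and the loading options are assumptions based on common Hub conventions, not official MosaicML instructions; verify against MosaicML’s documentation before use.

```python
# Hedged sketch: loading an MPT checkpoint for inference with transformers.
# The checkpoint name below is an assumption, not confirmed by the article.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-30b"                 # assumed Hub repository name
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,           # 30B weights are large; use a reduced dtype
    trust_remote_code=True,               # MPT ships custom modeling code on the Hub
    device_map="auto",                    # shard across GPUs (requires accelerate)
)

inputs = tokenizer("Large language models are", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Note that a 30B-parameter model needs on the order of 60 GB of GPU memory in bf16, so multi-GPU sharding or quantization is usually required in practice.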

Book Review: The Kaggle Book/Workbook

Kaggle is an incredible resource for all data scientists. I advise my Intro to Data Science students at UCLA to take advantage of Kaggle by first completing the venerable Titanic Getting Started Prediction Challenge and then moving on to active challenges; it is a great way to gain valuable experience with data science and machine learning. Now there are two excellent books to lead you through the Kaggle process: The Kaggle Book by Konrad Banachewicz and Luca Massaron, published in 2022, and The Kaggle Workbook by the same authors, published in 2023, both from UK-based Packt Publishing.
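
For readers starting with the Titanic challenge mentioned above, here is a minimal baseline sketch using pandas and scikit-learn. It assumes the competition’s train.csv has been downloaded locally and relies on the competition’s standard column names; it is a starting point, not a competitive solution.

```python
# Minimal Titanic baseline sketch (assumes train.csv from the Kaggle
# Titanic competition is in the working directory).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("train.csv")
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})   # encode as numeric
df["Age"] = df["Age"].fillna(df["Age"].median())      # impute missing ages

features = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]
X, y = df[features].fillna(0), df["Survived"]

model = RandomForestClassifier(n_estimators=200, random_state=0)
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```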

Generative AI Report: Stravito Introduces Generative AI Advances that Transform Search into Conversation – and Information into Intelligent Answers

Welcome to the Generative AI Report, a new feature here on insideBIGDATA with a special focus on all the new applications and integrations tied to generative AI technologies. We’ve been receiving so many cool news items relating to applications centered on large language models that we thought it would be a timely service for readers to start a new channel along these lines. The combination of a large language model plus a knowledge base equals an AI application, and this is what these innovative companies are creating. The field of AI is accelerating at such a fast rate that we want to help our loyal global audience keep pace. Enjoy!
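
To illustrate the “large language model plus a knowledge base” pattern, here is a toy retrieval-augmented sketch. The retrieval step uses real scikit-learn utilities, while `call_llm` is a hypothetical placeholder for whichever model API an application actually uses.

```python
# Toy sketch of the "LLM plus knowledge base" pattern: retrieve the most
# relevant document, then ground the model's answer in it.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

knowledge_base = [
    "Our return policy allows refunds within 30 days.",
    "Support hours are 9am to 5pm, Monday through Friday.",
    "Premium plans include priority phone support.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(knowledge_base)

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: in practice this calls a hosted or local LLM.
    return f"[LLM response grounded in: {prompt.splitlines()[0]}]"

def answer(question: str) -> str:
    q_vec = vectorizer.transform([question])
    best = cosine_similarity(q_vec, doc_vectors).argmax()  # most relevant doc
    prompt = f"Context: {knowledge_base[best]}\nQuestion: {question}\nAnswer:"
    return call_llm(prompt)

print(answer("Can I get my money back?"))
```

Production systems typically swap the TF-IDF step for learned embeddings and a vector database, but the architecture is the same: retrieve, then generate.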

Big Data Clusters: Building the Best Infrastructure Platform for Big Data Workloads

Our friends over at Silicon Mechanics put together a guide to the Triton Big Data Cluster™ reference architecture, which addresses many common challenges and can serve as the big data analytics and DL training blueprint many organizations need to start their big data infrastructure journey. The guide is written for technical readers, especially system admins in government, research, financial services, life sciences, oil and gas, and similarly compute-intensive fields.

Research Highlights: LLMs Can Process a lot more Text Than We Thought

A team of researchers at AI21 Labs, the company behind the generative text AI platforms Human or Not, Wordtune, and Jurassic 2, has identified a new method to overcome a challenge that most large language models (LLMs) grapple with: a limit on how much text they can process before doing so becomes too expensive and impractical.

The Science and Practical Applications of Word Embeddings 

In this contributed article, editorial consultant Jelani Harper takes a look at how word embeddings are directly responsible for many of the exponential advancements natural language technologies have made over the past couple of years. They’re foundational to the functionality of popular large language models like ChatGPT and other GPT iterations. These mathematical representations also have undeniable implications for textual applications of generative AI.
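
As a toy illustration of embeddings as mathematical representations, the sketch below compares invented 3-dimensional vectors with cosine similarity; real embeddings have hundreds or thousands of dimensions learned from data, but the geometry works the same way.

```python
# Toy word embeddings: nearby vectors encode related meanings. These
# 3-d vectors are invented for illustration, not learned from data.
import numpy as np

embeddings = {
    "king":  np.array([0.80, 0.65, 0.10]),
    "queen": np.array([0.78, 0.70, 0.12]),
    "apple": np.array([0.10, 0.20, 0.90]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["king"], embeddings["queen"]))  # high: related words
print(cosine(embeddings["king"], embeddings["apple"]))  # lower: unrelated
```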