Deci Unveils DeciLM-7B: A Leap Forward in Language Model Performance and Inference Cost Efficiency


Deci, the deep learning company harnessing AI to build AI, unveiled the latest addition to its suite of innovative generative AI models: DeciLM-7B, a 7 billion parameter large language model. Building upon the success of its predecessor, DeciLM-6B, DeciLM-7B sets new benchmarks in the large language model (LLM) space, outperforming prominent open-source models such as Llama 2 7B and Mistral 7B in both accuracy and efficiency.

DeciLM-7B stands out for its performance, surpassing open-source language models of up to 13 billion parameters in both accuracy and speed while demanding less compute. It delivers 1.83x and 2.39x higher throughput than Mistral 7B and Llama 2 7B, respectively. Its compact design makes it well suited to cost-effective GPUs, striking a strong balance between affordability and high-end performance.

The performance of DeciLM-7B can be further accelerated when used in tandem with Infery-LLM, Deci's inference engine, designed to deliver high-throughput, low-latency, and cost-effective inference on widely available GPUs. This pairing sets a new standard in throughput, achieving speeds 4.4 times greater than Mistral 7B served with vLLM, without sacrificing quality. Leveraging DeciLM-7B in conjunction with Infery-LLM enables teams to drastically reduce their LLM compute expenses while benefiting from faster inference. This integration facilitates the efficient scaling of generative AI workloads and supports the transition to more cost-effective hardware.

This synergy enables the efficient serving of multiple clients simultaneously without excessive compute costs or latency issues. This is especially crucial in sectors such as telecommunications, online retail, and cloud services, where the ability to respond to a massive influx of concurrent customer inquiries in real time can significantly enhance user experience and operational efficiency.

Licensed under Apache 2.0, DeciLM-7B is available for use and deployment anywhere, including local setups, enabling teams to fine-tune it for specific industry applications without compromising data security or privacy. Its versatility allows teams to tailor it to unique use cases across a wide range of business applications, including content creation, translation, conversation modeling, data categorization, summarization, sentiment analysis, and chatbot development, among others. When fine-tuned on specific datasets, DeciLM-7B can deliver quality comparable to that of much larger models such as GPT-3.5 at approximately 97% lower cost and with faster inference.

“With the increasing use of Generative AI in various business sectors, there’s a growing demand for models that are not only highly performant but also operationally cost efficient,” said Yonatan Geifman, CEO and co-founder of Deci. “Our latest innovation, DeciLM-7B, combined with Infery-LLM, is a game-changer in this regard. It’s adaptable to diverse settings, including on-premise solutions, and its exceptional inference efficiency makes high-quality large language models more accessible to a wider range of users.”

DeciLM-7B’s cost-effectiveness and reduced computational demand make advanced AI technologies more accessible to businesses of all sizes, fostering innovation and driving forward the digital transformation across various sectors. With DeciLM-7B, companies can now leverage the full potential of AI without the prohibitive costs or complexities previously associated with high-end language models.

Deci’s introduction of DeciLM-7B builds on its track record of efficient generative AI models, including DeciLM-6B, DeciCoder 1B, and DeciDiffusion 1.0. Like its other models, DeciLM-7B was generated with Deci’s Automated Neural Architecture Construction (AutoNAC) engine, which the company describes as the most advanced Neural Architecture Search (NAS)-based technology on the market, with a focus on efficiency.

Sign up for the free insideBIGDATA newsletter.
