Heard on the Street – 5/9/2024


Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Click HERE to check out previous “Heard on the Street” round-ups.

Generative AI’s Effect on Climate Tech. Commentary by William Allison, CTO at UC Berkeley

“Much of the dialogue about generative AI’s undeniable environmental effects focuses on its significant power and water use. It’s worth taking a moment to explore genAI’s potential to positively impact climate tech, an impact that will be felt across the residential, commercial, and industrial sectors.

Generative AI extends beyond chatbots that interact using human language, such as OpenAI’s ChatGPT. GPT stands for ‘generative pre-trained transformer’: such models generate outputs from transformer neural networks pre-trained on large datasets.

In fact, GenAI transformer models can be applied to any type of data that can be tokenized (broken into chunks). GenAI can extract patterns from data and generate novel outputs that can subsequently be used as inputs for all types of systems, from robots to manufacturing. GenAI is already being applied this way in smart agriculture, where systems connect to the data collection and control systems in current-generation farm equipment to reduce farming’s carbon footprint, for instance by fine-tuning water usage. The energy industry uses GenAI to control and monitor power generation, carbon capture, and renewable energy management, optimizing output and reducing energy loss. GenAI is applied to green manufacturing to reduce waste. Beyond controlling systems, GenAI is widely used for data modeling, affording new ways to collect and assess large, complex data sets. These collection and analytic capabilities benefit climate tech by helping scientists better understand the scope of deforestation, ocean health, and the impacts of climate change on biodiversity.
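The precondition named above, that any data can be fed to a transformer once it is tokenized, can be sketched with a toy quantizer. This is a hypothetical illustration, not from the commentary: the soil-moisture readings and bin edges are invented, and real systems would use learned tokenizers.

```python
# Minimal sketch: turning non-text data (here, hourly soil-moisture readings)
# into discrete tokens, the precondition for feeding it to a transformer.
# The readings and bin edges are invented for illustration.

def quantize(readings, n_bins=8, lo=0.0, hi=1.0):
    """Map each continuous reading to an integer token id in [0, n_bins - 1]."""
    width = (hi - lo) / n_bins
    tokens = []
    for r in readings:
        clamped = min(max(r, lo), hi - 1e-9)  # keep the top edge in the last bin
        tokens.append(int((clamped - lo) // width))
    return tokens

readings = [0.12, 0.33, 0.35, 0.80, 0.95]
print(quantize(readings))  # -> [0, 2, 2, 6, 7]
```

Once readings become token sequences like this, the same sequence-modeling machinery used for text applies unchanged.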

Generative AI in forms we aren’t yet imagining will be applied to many types of systems at very foundational levels. Over time, experimentation and iterative innovation will lead to new efficiencies and insights not possible today. While GenAI is all the rage in 2024, it’s only one of many types of AI. Deep learning and traditional machine learning are already playing a significant role in enabling climate tech, including UC Berkeley projects such as Coral Reef Restoration, the Fate of Snow, Algae Bloom Detection, and Species Monitoring.

As Amara’s Law observes, people tend to overestimate the effects of a technology in the short term and underestimate its effects in the long run. Although there is a long history of AI development and evolution, we are still in the very early stages.”

Distributed SQL is the right choice for RAG. Commentary by Ed Huang, CTO at PingCAP

“The emergence of tools like ChatGPT last year sparked a surge of interest in RAG (retrieval-augmented generation), a methodology for improving the accuracy of LLM output. This in turn prompted a wave of investment in vector databases, on the theory that they make the best hosts for RAG content. But in my opinion, vector functionality doesn’t warrant its own database. It should be a feature within existing databases. SQL databases are perfectly capable of storing and retrieving vector information, and the familiar syntax is more in line with developers’ intuitions.

Vector database advocates will say a database that natively “speaks” vector is the only kind that can handle RAG at scale. But typical RAG applications — customer service chatbots, training systems, research and analysis tools — don’t generate nearly enough traffic to justify the maintenance of a separate database. Distributed SQL databases deliver plenty of speed and scalability for RAG without siloing data. In fact, the same qualities that make them so effective for HTAP (hybrid transactional/analytical processing) also make them perfectly suited for vector search. With distributed SQL, you sacrifice nothing in vector search performance while keeping your RAG data easily accessible for creative new applications.”
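The claim that an ordinary SQL database can store and retrieve vectors for RAG can be illustrated with a small sketch. This is a hypothetical, brute-force example (SQLite stands in for a distributed SQL engine, and the three-dimensional embeddings are invented; real embeddings come from an embedding model, and production engines would add native vector indexes):

```python
# Sketch: hosting RAG vectors in a plain SQL table and ranking by cosine
# similarity in the application. Embeddings are tiny made-up 3-d vectors.
import json
import math
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT, embedding TEXT)")
docs = [
    ("reset your password from the login page", [0.9, 0.1, 0.0]),
    ("invoices are emailed on the 1st of each month", [0.1, 0.8, 0.3]),
    ("contact support via the in-app chat widget", [0.2, 0.2, 0.9]),
]
db.executemany("INSERT INTO docs (text, embedding) VALUES (?, ?)",
               [(t, json.dumps(v)) for t, v in docs])

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, k=1):
    # Brute-force scan: fine at typical RAG scales, as the commentary argues.
    rows = db.execute("SELECT text, embedding FROM docs").fetchall()
    scored = [(cosine(query_vec, json.loads(e)), t) for t, e in rows]
    return [t for _, t in sorted(scored, reverse=True)[:k]]

print(top_k([0.85, 0.15, 0.05]))  # the nearest doc would feed the LLM prompt
```

The retrieved rows live in the same database as the rest of the application’s data, which is the point the commentary makes about avoiding silos.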

Intel earnings report. Commentary by Philip Kaye, Co-Founder and Director at Vesper Technologies

“Intel’s earnings highlight the competitive landscape for data centre CPUs and the market’s shift toward AI silicon. It’s not all doom and gloom, though: as we’ve seen with NVIDIA, the market for AI-focused hardware is huge and only continuing to grow. Intel has also diversified into large-scale manufacturing, building several new fabs, which is a long-term strategy. I am optimistic about their future in manufacturing and that they will recapture some of the market share they’ve recently ceded to competitors. The company is full of brilliant minds and active leadership.”

Two Strategies to Avoid the AI Noise and Focus on Real ROI. Commentary by Saar Yoskovitz, CEO and co-founder at Augury

“There’s always a new technology that captures people’s attention, but the AI ‘noise’ has shown that many struggle to identify technology that actually meets an organization’s unique needs. Here are two strategies to help organizations looking to make meaningful technology investments. First, understand your organization’s problem and don’t fall for shiny objects. Companies must know exactly what internal challenge they need to overcome, then understand how an AI or IoT solution is going to solve it. Second, beware of marketing fluff. Ask questions and look under the hood to ensure you’re not being pitched fake AI: ask whether the vendor has numbers that show real-world benefits, whether the solution is scalable, and whether they will be a true partner all the way through the process.”

Can AI-Powered Search Engines Shake Up the Market? Commentary by Sarah Nagy, founder and CEO of Seek AI 

“AI-driven search presents a fresh challenge for legacy players like Google and Bing, which currently dominate the market. Historically, we’ve seen how challenger brands such as Netflix, Airbnb, and Uber completely transformed their respective industries by changing all the rules. The future of search will likely witness a heated race between companies leveraging AI innovation to engineer the most intuitive and relevant search experience for the next generation. Platforms like Perplexity are introducing a novel approach of providing consumers with citations alongside their answers, a valuable safeguard against the misinformation that search engines have long struggled to contain. Until incumbents can match the quality and innovation of these AI-powered products, they risk losing market share to these new players. Consumer-focused innovation will ultimately define winners and losers in the years to come.”

Leveraging AI in Healthcare. Commentary by Calum Yacoubian, Director of NLP Strategy, IQVIA

“The hype around AI in healthcare is nothing new, but the speed of innovation and accessibility associated with the technology is. For several years, leading healthcare institutions have been exploring the use of AI for research to improve population health, precision medicine, and predictive analytics. However, the AI models used to drive these applications often cater to highly technical teams and, in some cases, don’t meet clinical standards.

As a result, the greatest challenge of leveraging AI in healthcare has been around operationalizing the technology, which makes it essential to have a strong focus on ethics, privacy, and security. For now, responsible use of AI must include a level of human review while the industry sorts through these operational challenges. Ultimately, AI’s greatest benefit in healthcare lies in its ability to reduce administrative and cognitive burdens on clinicians, enabling them to spend more face-to-face time with patients.” 

Regulations and mounting pressure drive cloud adoption decisions. Commentary by Randy Raitz, VP of Information Technology & Information Security Officer, Faction, Inc.

“Regulations and public pressure to properly protect information held by organizations will drive strategic decisions around cloud adoption. Companies will quickly realize that managing multiple copies of data across multiple clouds slows down their efforts, complicates their products, and produces siloed results. Organizations will recognize that a multicloud approach means a single copy of data is used across all cloud providers, making it easier to properly protect their data.”

On generative AI and the hiring spree. Commentary by SupportNinja CEO Craig Crisler

“Generative AI is white hot and in demand, and so is its job market. While many companies are on an AI hiring spree, we’re also seeing a talent shortage: folks with AI PhDs and data scientists are very expensive and difficult to find.

Companies now have to walk the fine line of finding the best AI talent while making room for said talent within the payroll budget. Some might get one or two really expensive hires and fill the rest of the team with cheaper talent, while some might fill out their entire team with mid-range salaries and go with a more balanced approach. 

Leveraging tools like outsourcing can help all companies find the right talent, no matter where in the world that talent is, and guides them toward the right approach to building out a team. Whether that means going top-heavy or building from the ground up, assembling the best AI team with the best available talent will be the best way to approach innovation.”

The Indispensable Link between Master Data Management and Reliable AI Outcomes. Commentary by Steven Lin, Data Expert, Semarchy

“As AI continues to shape critical sectors like healthcare, finance, and public policy, the integrity and quality of its underlying data become paramount. Master data management (MDM) is a crucial discipline, ensuring AI models are founded on accurate, consistent, and comprehensive data. High-quality data fosters accurate and dependable AI outcomes, while poor-quality data can lead to biased or flawed decisions. Given the velocity and volume at which AI consumes and learns from data, these effects compound drastically, which will either accelerate or hinder your business goals.

Data quality becomes especially vital in sectors where decisions have significant human impacts, such as loan approvals, medical diagnostics, and criminal justice. MDM offers a structured framework for aggregating, linking, and maintaining essential data from diverse sources with consistency and accuracy that’s usually augmented by human intuition, intelligence, and oversight. This approach establishes a “single source of truth,” essential for training reliable AI models and facilitating data governance and standardization.

Emerging best practices and frameworks include robust data governance, which sets data quality standards and processes that ensure consistent handling and transparency. Regular bias audits are crucial, utilizing tools and methods to detect and mitigate biases in datasets and model predictions, promoting the development of fairer AI systems. Continuous data quality monitoring through technologies like machine learning (ML) also helps dynamically identify and correct issues, preserving data integrity. Promoting a collaborative culture among data scientists, IT professionals, and domain experts aligns AI goals with data quality standards. Finally, implementing international standards such as ISO 8000 or DAMA’s Data Management Body of Knowledge (DMBOK) ensures global consistency in data management, enhancing the reliability and comparability of AI systems internationally.
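The continuous data quality monitoring described above can be sketched as a simple gate that runs before records enter a master dataset. This is a hypothetical illustration (the `quality_report` helper, threshold, and records are all invented; production MDM platforms provide far richer checks):

```python
# A minimal sketch of continuous data quality monitoring: flag null-heavy
# fields and duplicate keys before records are merged into a master dataset.
# Thresholds and records are illustrative.

def quality_report(records, key="id", max_null_rate=0.1):
    n = len(records)
    fields = {f for r in records for f in r}
    null_rates = {f: sum(1 for r in records if r.get(f) in (None, "")) / n
                  for f in fields}
    seen, dupes = set(), set()
    for r in records:
        k = r.get(key)
        if k in seen:
            dupes.add(k)   # a duplicate key breaks the "single source of truth"
        else:
            seen.add(k)
    return {
        "failing_fields": sorted(f for f, rate in null_rates.items()
                                 if rate > max_null_rate),
        "duplicate_keys": sorted(dupes),
    }

records = [
    {"id": 1, "name": "Acme", "country": "US"},
    {"id": 2, "name": "",     "country": "DE"},
    {"id": 2, "name": "Bolt", "country": None},
]
print(quality_report(records))
# {'failing_fields': ['country', 'name'], 'duplicate_keys': [2]}
```

A report like this would typically be reviewed by the human stewards the commentary mentions before the offending records are corrected or merged.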

Through these practices, MDM supports and enhances the reliability, fairness, and trustworthiness of AI applications.”

Apple developing AI chips for data centers. Commentary by Philip Kaye, Co-Founder and Director at Vesper Technologies

“Apple’s move to develop its own AI chips for data centres marks a significant shift in the tech landscape. Echoing Microsoft’s strategy but instead partnering with TSMC, the news highlights the growing trend among tech giants to design bespoke hardware solutions that enhance efficiency and performance in specialised AI tasks. Thanks to the AI arms race, we are witnessing what may end up being game-changing developments for the IT hardware industry. Apple’s entry into chip design will be crucial in determining its future as an AI powerhouse.”

Why it’s time to ditch frontier models for sovereign models. Commentary by Andrew Joiner, CEO of Hyperscience

“The AI market has grown significantly, and today frontier models dominate the industry conversation. But in the current ‘wild west’ AI landscape, where AI science projects proliferate but real ROI is hard to come by, organizations are looking for more than an LLM trained on a wide breadth of publicly available data. Organizations today have three key requirements for rolling out AI in the enterprise: accuracy, traceability, and transparency. Sovereign AI models, which go beyond the traditional frontier-model approach by training solely on proprietary data and placing restrictions that meet governance and security needs, check the boxes on all three requirements.

Similar to the concept of sovereign nations, these models take AI development in-house, giving businesses full control without external dependencies and providing the accuracy required for business-critical decision-making and automated outcomes. Government agencies have led the way in building and applying narrow, sovereign models based on their own data, and private sector organizations have an opportunity to follow their lead to ensure accuracy, traceability, and transparency in their AI applications.

Building a sovereign model provides companies with the accuracy of their own proprietary data, the traceability that comes with knowing where the data comes from and how it is used, and the transparency of understanding how and why an AI system arrived at an automated outcome. As governments and international bodies continue to introduce new AI regulations, the capabilities that sovereign models provide will become increasingly important, as organizations must prove how their AI systems operate and make decisions.

To successfully apply AI in the enterprise, organizations must build systems that deliver automation and productivity, as well as transparency and compliance. By embracing sovereign models, enterprise leaders can build accurate and trustworthy AI systems, hyper-personalized to the language of their business, that deliver competitive advantage and disrupt their industries.”

Compliance automation tools have enormous potential for managing data. Commentary by Claude Zwicker, Senior Product Manager, Immuta

“In 2024, data leaders are all working within a complex data ecosystem, where sensitive data powers insights and actions that enable businesses to grow and flourish. Within this ecosystem, an automated data access control system can help save time that would otherwise be spent compiling the assets necessary for audits. With that time back, your team will be better equipped to assess and report risks and align on the best next steps, ultimately providing better data protection. For example, under the SEC’s cybersecurity disclosure requirements, organizations must make a disclosure within four business days of determining that a cybersecurity incident is material. With compliance automation, teams can spend more time determining and strategizing the best way to respond and notifying those who have been impacted, rather than compiling the necessary information.”
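The four-business-day window mentioned above can be made concrete with a small worked example. This is a hypothetical sketch (the `disclosure_deadline` helper is invented, weekends are skipped, and market holidays are ignored for simplicity):

```python
# Worked example: four business days from the date an incident is determined
# to be material. Weekends are skipped; holidays are ignored for simplicity.
from datetime import date, timedelta

def disclosure_deadline(determined_on, business_days=4):
    d, remaining = determined_on, business_days
    while remaining:
        d += timedelta(days=1)
        if d.weekday() < 5:  # Monday-Friday count as business days
            remaining -= 1
    return d

# A determination made on Wednesday, May 8, 2024 spans a weekend,
# so the deadline lands the following Tuesday.
print(disclosure_deadline(date(2024, 5, 8)))  # -> 2024-05-14
```

Automation that pre-assembles the audit trail leaves this whole window for deciding how to respond, which is the point the commentary makes.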
