Heard on the Street – 3/21/2024

Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Click HERE to check out previous “Heard on the Street” round-ups.

Musk’s OpenAI lawsuit reflects trend of open-source exploitation. Commentary by Patrik Backman, General Partner at early-stage deeptech VC OpenOcean

“Elon Musk’s lawsuit against OpenAI highlights how the foundational principles of open-source development have been tested in recent years. The transformation of OpenAI from a non-profit entity into a profit-driven organization, closely tied with Microsoft, mirrors broader trends we’ve observed in the tech sector, where large corporations have increasingly capitalized on open-source innovations without proportionately contributing back to the community.

The original mission of OpenAI promised an inclusive development pathway that prioritised human benefit over profit. However, as we saw with HashiCorp or MongoDB’s strategic licensing decisions, navigating the balance between open innovation and financial sustainability is complex. Open-source projects, especially those with the potential to redefine our relationship with technology, must carefully consider their licensing models to ensure they are able to operate while staying true to their core ethos. These models should facilitate innovation, true, but they should also guard against the monopolization of technologies that have the potential to permanently impact humanity.”

Open Data Day – March 5, 2024. Commentary by Jason Kent, Hacker in Residence, Cequence Security

“Open Data Day underscores the importance of responsible data management and security practices. Ensuring that data remains accessible while protecting it from unauthorized access and misuse is essential for fostering innovation and trust in the digital age.

In today’s digital landscape, information flows through applications powered by APIs, which have become the fastest way to easily create web applications. This shift brings both advantages and risks for businesses. APIs, designed for machine-to-machine interaction, encompass commands, payloads, and data crucial for engaging user experiences. However, including sensitive data within APIs poses a recurring challenge: inadvertent exposure. Such lapses jeopardize regulatory compliance and lead to costly data breaches.

A comprehensive approach encompassing discovery, detection, and prevention measures is crucial to address these challenges. Organizations can mitigate the risk of sensitive data exposure and API-related threats by implementing comprehensive data protection measures. Proactive measures are essential in safeguarding against the myriad risks associated with API usage in today’s interconnected digital ecosystem.”
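As a rough illustration of the "detection" step mentioned above, the sketch below walks a JSON-like API response and reports fields that look sensitive. The key names, the email pattern, and the function name `find_sensitive_fields` are assumptions made for this example and are not taken from the commentary or from any Cequence product.

```python
import re

# Hypothetical key names and pattern; a real deployment would tune these to its own data.
SENSITIVE_KEYS = {"ssn", "password", "credit_card", "api_key"}
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_sensitive_fields(payload, path=""):
    """Return the paths inside a JSON-like payload that look sensitive."""
    findings = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}" if path else str(key)
            if str(key).lower() in SENSITIVE_KEYS:
                # Flag the field by name without inspecting its value.
                findings.append(child)
            else:
                findings.extend(find_sensitive_fields(value, child))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            findings.extend(find_sensitive_fields(item, f"{path}[{index}]"))
    elif isinstance(payload, str) and EMAIL_RE.search(payload):
        # Flag string values that contain something shaped like an email address.
        findings.append(path)
    return findings


response = {
    "user": {"name": "Ada", "ssn": "123-45-6789", "contacts": ["ada@example.com", "n/a"]},
    "status": "ok",
}
flagged = find_sensitive_fields(response)
```

A check like this could run in tests or at an API gateway so that unexpected sensitive fields are noticed before a response leaves the service.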

AI is helping to solve today’s major supply chain challenges. Commentary by Ryan Tierney, SVP of Product Management, TrueCommerce

“The global supply chain is being challenged every day with volatility, rising costs, and sustainability. By harnessing the power of AI, businesses can optimize their supply chain by automating processes and using data analytics to reduce costs and streamline operations.

One of the key benefits of AI in the supply chain is its ability to accurately forecast demand. Traditional demand forecasting methods often fall short due to their reliance on historical data and manual processes. AI-powered algorithms can analyze vast amounts of data, including market trends, customer behavior, and even external factors like weather patterns, to provide accurate real-time demand forecasts. This enables companies to make informed decisions regarding production volumes, inventory levels, and distribution strategies. 

AI also plays a crucial role in automating repetitive tasks within the supply chain. By utilizing machine learning and robotic process automation, companies can streamline warehouse operations, order processing, and transportation logistics. Additionally, AI algorithms can optimize routes for delivery trucks, considering factors like traffic conditions and fuel efficiency, leading to cost savings and faster delivery times.

Another area where AI adds value to the supply chain is quality control. Traditional quality control processes often are time-consuming and prone to errors. With AI, companies can implement technologies that will automatically detect defects or anomalies in products during the manufacturing process.

It has become increasingly vital for organizations to adopt AI in order to gain a competitive edge. Embracing AI-driven solutions is not only a strategic move, but a necessity for operational efficiency, reducing costs, and improving customer satisfaction in today’s digital and interconnected world of global commerce.”

RAG is alive and well. Commentary by Alex Ratner, CEO and Co-founder, Snorkel AI

“There’s lots of chatter about Gemini 1.5 Pro being the “RAG killer.” Some of it is overblown. The reality is enterprises will still use RAG for complex production systems. RAG still wins from a cost, latency, and scale perspective. Even more durably: a RAG approach is modular. So for more complex, scaled, and/or production settings, RAG is likely here to stay.

Long context models will definitely eat up a lot of simpler use cases and pre-production development, which is a lot of AI today, especially when factoring in progress with post-transformer (e.g., SSM) architectures.

Regardless, the key step remains the same: tuning LLM systems on good data! Whether tuning/aligning an LLM or an LLM + RAG system, the key is in the data you use, and how you develop it.”
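To make the modularity point concrete, here is a minimal sketch of a RAG pipeline in which the retriever and the generator are separate, swappable pieces. The word-overlap retriever and the `generate` callable are stand-ins assumed for illustration; a production system would typically use a vector index and a real LLM client, neither of which is shown here.

```python
from typing import Callable, List


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank documents by how many words they share with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, passages: List[str]) -> str:
    """Combine the retrieved passages and the question into a single prompt."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return f"Context:\n{context}\n\nQuestion: {query}"


def answer(query: str, documents: List[str],
           generate: Callable[[str], str], k: int = 2) -> str:
    """Retrieve first, then hand only the selected passages to the generator."""
    return generate(build_prompt(query, retrieve(query, documents, k)))


docs = [
    "RAG retrieves passages before generation",
    "Bananas are yellow",
    "Long context models read everything",
]
# The generator is any callable from prompt to text; here it simply echoes the prompt.
result = answer("how does RAG use passages", docs, generate=lambda prompt: prompt, k=1)
```

Because `retrieve` and `generate` are independent, either one can be replaced or tuned without touching the other, and only the top `k` passages reach the model rather than the whole corpus.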

The Future of AI is Hybrid. Commentary by Luis Ceze, CEO & Co-founder, OctoAI

“In the realm of AI today, the interplay between choice and accessibility is foundational to innovation. Traditionally, the cloud has served as a robust engine for AI, facilitating complex computational tasks and extensive data storage. However, as AI continues to advance, the inherent limitations of a cloud-centric approach, including latency, privacy issues, and bandwidth constraints, become increasingly evident. 

In response, edge computing presents a compelling alternative, processing data locally to mitigate these challenges, particularly for time-sensitive applications. This approach not only enhances privacy and security by retaining sensitive information on-site but also echoes the early days of cloud computing, where remote data hosting provided significant efficiency improvements. 

The future, much like the past, suggests a hybrid model that combines the best of both worlds, offering the necessary flexibility for diverse organizational needs and projects, thereby ensuring that AI innovation continues to thrive on the principles of choice and accessibility.”

Your AI initiative is probably set up to fail. Commentary by Jerod Johnson, Senior Technology Evangelist at CData

“Balancing AI-fueled initiatives and data management is one of the many challenges facing organizations that wish to maintain their competitive edge. AI is obviously at the forefront of everyone’s mind, and with good reason, but the results of AI efforts are only as valuable as the data they’re based on. AI platforms and processes trained on well-governed, curated datasets are capable of finding lost insights, rapidly making predictions, and even prescribing impactful, profitable actions to help drive business. 

Robust data governance practices are all but required for any organization looking to make the most of its AI-fueled initiatives. Properly controlling access to data, ensuring accuracy, and maintaining compliance with regulatory requirements are foundational practices that set organizations up for success in building meaningful, usable datasets for their AI initiatives. By pairing well-governed data with AI initiatives, businesses can drive innovation while safeguarding their reputation and customer trust.”

Ensuring LLM accuracy requires rigorous model sanitation. Commentary by Chase Lee, Enterprise GM, Vanta

“While LLMs are impressive, the data that flows into LLMs touches countless systems, and this very interconnectedness poses a growing data security threat to organizations.

LLMs themselves are not always completely understood. Depending on the model, their inner workings may be a black box, even to their creators. Meaning, we won’t always fully understand what will happen to the data we put in, and how or where it may come out.

To stave off risks, organizations will need to build infrastructure and processes to ensure data hygiene. Such measures include taking stock of model inventory. In other words, know every instance of every model you are running in production as well as in development. 

Another measure is data mapping. Track and monitor all of the data entering your models and training pipelines.

Lastly, but perhaps most importantly, apply rigorous data sanitization to inputs and outputs. Just because you have vast troves of data to train a model doesn’t mean you should use all of it. Scrutinize data to ensure it’s free of risk. Every data point should have a reasonable and defined purpose. Ensure outputs are not only relevant, but also coherent and sensible within the context of their intended use.”
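The sanitization step described above might look something like the following sketch, which drops training records that have no declared purpose and redacts obvious identifiers from the rest. The record layout (`text` and `purpose` fields), the two regular expressions, and the function names are assumptions for illustration only, and real pipelines would need far broader coverage.

```python
import re

# Illustrative patterns only; they catch simple email addresses and US-style SSNs.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def sanitize_text(text: str) -> str:
    """Replace identifiers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return SSN_RE.sub("[SSN]", text)


def sanitize_records(records):
    """Keep only records with a declared purpose, and redact their text."""
    cleaned = []
    for record in records:
        if not record.get("purpose"):
            # No defined purpose: leave the record out of the training set.
            continue
        cleaned.append({**record, "text": sanitize_text(record["text"])})
    return cleaned


raw = [
    {"text": "Contact ada@example.com, SSN 123-45-6789", "purpose": "support-bot"},
    {"text": "random scrape", "purpose": ""},
]
clean = sanitize_records(raw)
```

The same `sanitize_text` function could be applied to model outputs before they are shown to users, so that inputs and outputs pass through one shared filter.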

Beyond Deepfakes: the Problem of Mis- and Dis-information Campaigns in Elections – and How to Solve It. Commentary by Prashant Bhuyan, founder, CEO and chairman of Accrete

“Past elections have proven that bad actors can and do influence campaign narratives, and social media influence operations appear to be gearing up both here and abroad. Understandably, deepfakes are garnering a great deal of media attention since today’s AI-generated voice and video likenesses of politicos are easier to create and harder than ever for the average person to discern. But in addition, we’ve recently seen that “misattributed” photos and even video game footage can be utilized in mis- and dis-information campaigns. Once a malicious post goes viral, the damage is done, even if it is later widely publicized to have been perpetrated by bad actors.

While it is important to determine the veracity of images, video, and audio, it’s equally important for political campaigns, journalists, government, and defense officials to focus on getting ahead of what initiatives could or are beginning to go viral, and what sources are initially sharing materials, often in a coordinated fashion. AI agents that identify the sources and online networks behind misinformation threats enable users to track the origin of a post, determine engagements, and ascertain coordinated actions by bad actors to spread false information and sway public opinion around campaigns, candidates, and even election results.”

The Role of AI and Human Oversight in Financial Services Personalization. Commentary by JB Orecchia, President and CEO of SavvyMoney

“Artificial intelligence and large language models will continue to evolve in the coming year. This technology will pave the way to better personalization and recommendations for consumers based on large amounts of predictive data. While AI can do the heavy lifting and package up personalized suggestions, it will still be the responsibility of human representatives to double-check the output to make sure it’s accurate, fair and accompanied by the proper disclosures before being presented to the customer. Financial services is a highly-regulated space, so it’s critical any AI outputs are closely monitored.”

AWS’ unrestricted free data transfers are “data portability washing.” Commentary by Kevin Cochrane, CMO of Vultr

“The AWS announcement re: unrestricted free data transfers to other cloud providers is, in essence, “data portability washing.” On the surface, what AWS is doing is enabling customers to move a certain amount of their data at no cost, if they notify AWS. The reality is – these customers would need to delete all of the data housed on AWS across all their workloads if they want to take AWS up on the no-egress-fee offer. This shows AWS’ lack of true commitment to help their customers embrace a multi-cloud strategy, lack of commitment to composability, and lack of commitment to true cloud cost optimization and FinOps. No-fee data portability should be considered table stakes in today’s reality, where composable applications run on composable infrastructure.”
