Heard on the Street – 6/5/2023


Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topics areas: big data, data science, machine learning, AI and deep learning. Enjoy!

Azure ML Integration w/Snowflake. Commentary by Rodrigo Liang, CEO of SambaNova Systems

“The Azure ML Snowflake announcement is a sign of the evolution of foundation models, and perhaps the flipping of the data gravity concept in AI. Previously we have seen foundation models progress by being built upon huge amounts of amassed data, leading to the revolutionary capabilities we are now seeing from the technology. Enterprises now want to incorporate their proprietary data into foundation models, transforming them into unique assets that align with their specific requirements. Seamless connections to data lakes and data warehouses have emerged as an effective means to achieve this objective. While Azure Machine Learning currently offers an extensive range of models with diverse functions, this announcement signifies a shift towards a more streamlined approach of offering fewer but more versatile models, where the onus is on the enterprise to fine-tune and own its own model. This is a growing trend that will likely continue. Foundation models have immense capabilities and will continue to be adopted by enterprises as their data becomes easier to integrate.”

Retailers Rely on Location Intelligence. Commentary by Scot Frank, Director of Product Management, Foursquare

“Geospatial data is an increasingly critical resource for retail businesses. Big box retailers, for example, rely on timely and accurate location intelligence to bridge customer insights from the digital and physical worlds. By coupling aggregated movement data with sales performance data, retailers can better understand staffing needs and supply-chain demand – as well as compare their performance to competitors. These powerful insights are supporting shops of all sizes in making smarter business decisions, like where to open a new storefront or when to run the best promotions, and in improving the customer experience, like lessening wait time at the morning coffee rush. However, simply having access to geospatial data isn’t enough. Retailers must invest in the right tools to manage, analyze, and activate their data in a privacy-first way. Graph databases are one such tool, and they’re growing in popularity due to their ability to analyze large volumes of data. One thing is certain: location intelligence is essential to retailers’ success.”

Machine Learning and AI Improve Modern Data Infrastructure Challenges. Commentary by Diana Bald, President of Blue Orange Digital

“Investing in data infrastructure is essential for success in today’s business environment, where data is growing exponentially on a daily basis. A well-designed and reliable data infrastructure should be scalable and adaptable to meet changing needs and to avoid re-architecting or costly mistakes and disruptions. Organizations that can effectively collect, process, and analyze large volumes of data can gain a competitive advantage. Machine learning and AI help by enhancing flexibility and robustness of the infrastructure. For example, machine learning provides the ability to analyze large volumes of data quickly and conduct predictive analysis. AI automates tasks and processes to improve overall efficiency. Together, machine learning and AI can unlock new insights, identify trends, make better decisions, offer personalization, and enhance the overall customer or employee experience. The resulting benefits help organizations drive growth with new business models, products, and services.”

Putting people and potential first in a world of AI. Commentary by Pierre-Yves Noel, Director of Rainbow Cloud Services, Alcatel-Lucent Enterprise

“Bill Gates recently stated that, ‘we’re only at the beginning of what AI can accomplish. Whatever limitations it has today will be gone before we know it’. He also posits that while technology can improve lives for people everywhere, strict rules need to be established so that any downsides are far outweighed by the benefits. Much has been said about the potentially significant pitfalls of adopting new technologies, particularly for large, complex or multi-faceted businesses. It can be easy to get excited by the potential of new technologies; however, ensuring business continuity and carefully managing the implementation of new processes must remain the priority for business leaders. We have access to greater volumes of data than ever before, and AI can help distil this wealth of data into value. Identifying the most important trends from thousands of data points and communicating them to the end user in the relevant context can prove invaluable in high-stakes scenarios such as healthcare and crisis management. While the wider business community may be embracing emerging AI technologies at considerable pace, complex organizations have a number of considerations to factor in when adopting AI. But if adoption is stringently planned and properly managed, the rewards can far outweigh the risks. In doing this, businesses can partner with experienced technology providers to unlock the potential of AI through drastically reduced costs, significantly improved delivery efficiencies, and maximized offerings to customers that will allow them to maintain a competitive edge.”

Trustworthy approaches to AI. Commentary by Graham Sheldon, Chief Product Officer, UiPath

“It’s no surprise that AI has the power to transform businesses, particularly ones that amplify that transformation with automation processes to speed up workflows, leaving more room for employee productivity. There are, of course, concerns with any rapidly growing technology like AI. As businesses begin to leverage generative AI powered by large language models such as OpenAI’s ChatGPT, there are certain compliance considerations business leaders need to be aware of. There are three tenets of an effective and responsible AI strategy: open, flexible, and enterprise-ready. Openness ensures you can easily leverage future AI innovation. Flexibility enables customers to include the multiple AI models their business may want to work with, and delivers training and customization from an accuracy and safety standpoint, which are key for AI to work with specific enterprise data. Once the foundational models are trained, enterprise-ready guardrails are set with human-in-the-loop validation to ensure information is accurate. While AI can effectively extract information from documents, inaccuracies can still occur. In addition to the guardrails, human oversight involving subject matter experts (SMEs) is needed. The SMEs’ role is to teach AI models the intricacies of the business to validate, maintain, and increase accuracy.”
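A human-in-the-loop guardrail of the kind described above is often implemented as confidence-based routing: outputs the model is sure of pass through, the rest go to an SME queue. The sketch below is purely illustrative – the field names, scores, and threshold are invented, not any particular product’s API:

```python
# Illustrative sketch of a human-in-the-loop guardrail: extracted fields
# below a confidence threshold are routed to a subject matter expert.
# The threshold and field names here are hypothetical.

CONFIDENCE_THRESHOLD = 0.90

def route_extractions(extractions):
    """Split model outputs into auto-approved fields and fields
    needing human review, based on the model's reported confidence."""
    approved, needs_review = [], []
    for field in extractions:
        if field["confidence"] >= CONFIDENCE_THRESHOLD:
            approved.append(field)
        else:
            needs_review.append(field)
    return approved, needs_review

docs = [
    {"name": "invoice_total", "value": "1,240.50", "confidence": 0.98},
    {"name": "vendor_name", "value": "Acme Crop", "confidence": 0.71},
]
approved, needs_review = route_extractions(docs)
print([f["name"] for f in needs_review])   # ['vendor_name']
```

The low-confidence field goes to a reviewer; their correction can then be fed back as training data, which is how the SME “teaches the model the intricacies of the business” over time.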

What’s next for MDM. Commentary by Brett Hansen Chief Marketing Officer for Semarchy 

“In today’s challenging economic environment, master data management (MDM) needs to deliver meaningful business value at a rapid pace. Inflation, paired with the looming recession, has caused all industries to be weary of making any new technological investments for their teams. However, the right investments will unify and centralize the enterprise’s data that is critical across applications while enabling a unique source of trusted data for effective data governance. In fact, master data management is vital and key in providing high quality data to enterprise applications, operational processes and analytics structures like data warehouses and data lakes – building/preparing a strong foundation for strategic enterprise initiatives like advanced analytics, data science or AI/ML use cases. Nevertheless, MDM initiatives that aren’t meeting specific business needs and using generic industry templates are only a loss of time, compromising innovation and differentiation. The ability to rapidly build and deploy custom, data-rich apps tailored to specific business requirements must be a top criteria for all organizations when selecting an MDM solution. From a technical perspective, effective MDM provides automation and no-code data-driven workflows that empower cross-departmental collaboration and stewardship without the stress of tedious data cleansing and migration tasks. These features optimize efficiency and improve every user’s experience in the workplace. Having a great solution is only the start. To deliver the best results, technology needs to be backed by both people and a good process. To be successful with MDM requires a seasoned team providing its know-how, delivery assurance, and world-class support on one hand, and an intelligent process and deployment approach on the other hand. In short, I see the future of MDM in three words: unified, modular, scalable.”

Data lineage tools identify the bad data that sends generative AI models into “hallucinations.” Commentary by Jan Ulrych, VP of research and education at Manta

“Consider the failures and confusion that arise when someone uses ChatGPT outputs without validating them … or even understanding how to really validate them. There have been plenty of “hallucinations” reported when the model creates false or incomplete answers of its own. And that’s just the beginning. True generative AI — not just large language models (LLMs) like ChatGPT that synthesize text — relies on complex data algorithms that run into the same data mishaps and challenges as any other program. This complexity demands equally matched auditability as more industries adopt AI technologies. All successful AI flows from high-quality data. Lineage allows developers to track the origins of the data generative AI is using and identify when the program is shifting sources to inaccurate information — whether that “bad” data is coming from the wrong data source or filtered in such a way that it introduces bias. Further, if a data source has changed (and in AI, the “source” can include hundreds of steps) without retraining the model, inaccurate results follow. Lineage brings to light the effects of these changes and roots out inaccurate data that harms an AI model’s performance. Lineage also illuminates the so-called “black box” most AI modeling still lives within, calling forward which inputs are the most significant or, even better, which specific inputs contributed to which parts of the generated outputs. Most importantly, this transparency will be crucial as AI is deployed more widely. It will not only be subject to greater regulatory oversight — end users will also expect a higher level of trustworthiness and transparency from AI tools. Especially with AI being used for writing news articles or reports, it will be more and more critical to prove the origins and sources of information used in an article for future fact-checking.”
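The tracing described here can be pictured as a directed graph of datasets: record which inputs each derived dataset was built from, then walk upstream from any suspect output. Real lineage tools are far richer, but a minimal sketch of that core operation, under invented dataset names, looks like this:

```python
# Minimal sketch of data lineage tracking as a directed graph.
# All dataset names are illustrative, not any lineage product's API.

class LineageGraph:
    """Records which upstream datasets each derived dataset was built from."""

    def __init__(self):
        self.parents = {}  # dataset -> set of direct upstream datasets

    def record(self, output, inputs):
        """Register that `output` was derived from the given `inputs`."""
        self.parents.setdefault(output, set()).update(inputs)

    def upstream(self, dataset):
        """Return every dataset that (transitively) feeds into `dataset`."""
        seen, stack = set(), [dataset]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


graph = LineageGraph()
graph.record("training_corpus", ["crawl_2022", "internal_docs"])
graph.record("fine_tune_set", ["training_corpus", "annotations"])

# If "crawl_2022" turns out to contain bad data, everything built on top
# of it can be traced and flagged for retraining:
print(sorted(graph.upstream("fine_tune_set")))
# ['annotations', 'crawl_2022', 'internal_docs', 'training_corpus']
```

The same traversal run in the other direction (downstream) answers the retraining question in the commentary: when a source changes, which models were trained on data derived from it.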

Biden AI Regulation Chatter – On Data Protection for Health Tech. Commentary by Adam Rusho, Field CTO of Clumio

“On the heels of industry calls for a halt in artificial intelligence (AI) technological development, comes the Biden Administration’s formal public request for comment on AI accountability measures in a clear first step toward regulatory action. As many forward-leaning companies now leverage AI-native technologies or integrations that rely on vast data lakes, the security, compliance, and integrity of this foundational data will become an increasingly important aspect of the regulatory discussion – and the impending impact on businesses in highly regulated industries. This can include flagging anomalies in patient data to identify potential health issues using large language models (LLMs), widely deploying machine learning (ML) models to detect irregular activity or fraud within financial services, and analyzing sequences for genetic research in life sciences, processing massive amounts of data to find minute differences or unobvious patterns. All such industries are subject to stringent data compliance requirements around retention, encryption, storage, and privacy. With that, there are initiatives organizations must take as AI technology expands and the talks of AI regulations become a reality – such as making sure backup solutions can scale to their AI and ML needs. With terabytes of new data being generated every day, an organization’s data resilience platform should be able to scale to petabytes of data and track millions of events and changes in its data with high fidelity. Organizations should also test their recoverability to ensure that it meets their service-level agreements, and review their total cost of ownership. This means investigating cloud-native, cost-effective solutions for long-term retention needs and taking stock of how much overhead is going into managing copies of data (versions, replicas, archives, vaults). As much data growth as we’ve seen in the last few years, the next few years will bring orders of magnitude more. 
AI is just one of the technology trends intertwined with data at scale, and we must take proper measures to ensure its protection along the way.”

Generative AI + Data Lakehouse = A match made in low code heaven. Commentary by Dremio’s data advocate Dipankar Mazumdar

“Migrating a dashboard to a data lakehouse offers several benefits. This architecture broadens the scope of analysis because users are no longer limited to a specific dataset, helping them drill down further into their data. By moving to a data lakehouse architecture, organizations can reduce their overall infrastructure and maintenance costs, allowing them to maximize the value they derive from data. The idea is to let users at any level, irrespective of their technical expertise, have access to comprehensive reports and be able to answer critical business questions using them.”

Are hyperscalers in hot water as CMA launches AI review? Commentary by Ekaterina Almasque, General Partner at OpenOcean

“This CMA review is welcome news for the UK tech sector. Fair competition will always have a positive effect on innovation and pushes the hyperscalers to perform at their best. The current barriers to entry in AI – namely the acquisition of talent, the paucity of open-source large language models (LLMs) and high costs in server time to train or fine-tune models – risk stifling our domestic start-up ecosystem if we do not find a new way forward. To train AI models, you need three things: a high volume of data, a high quality of data, and the ability to access both without transgressing IP law or individual privacy. Hyperscalers possess enormous amounts of user data from the other sides of their business, granting them a great advantage over start-ups with far more limited access to training data. Steps must be taken to make it easier for early-stage AI start-ups to find their niche, compete, and create models to solve business problems. Many believe AI will be a net positive for society and for businesses – so long as it solves more problems than it creates. We are at a stage when the technology sector can decide who is included in the architecture of these models, and who is not: whose voices, idiosyncrasies, and worldviews are represented, and whose are not. This is the time to foster diversity of thought and competition through careful, considered measures.”

Data Analysts Are Using AI and ML to Swing the 2024 Election. Commentary by Jon Reilly, Co-Founder and COO @ Akkio

“There are two ways AI is shaping politics. The first is generative content – AI political ads, images, etc. That’s a largely level playing field and is already receiving plenty of awareness. The second way is Machine Learning for fundraising. Sorting through the general population to find the people who are most likely to donate to a candidate using ML is helping drive substantially better results in crowdfunding, and that is going to have far-reaching impacts because the more money you raise, the larger your megaphone.”
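The donor-targeting idea can be made concrete with a toy scoring model: compute a donation likelihood per supporter and sort the outreach list by it. Everything below – the features, weights, and bias – is invented for illustration; a real campaign would fit these parameters from historical donation records:

```python
# Toy sketch of donor-likelihood scoring with a logistic model.
# Features, weights, and bias are invented for demonstration only;
# in practice they would be learned from historical fundraising data.
import math

WEIGHTS = {"past_donor": 2.0, "opened_emails": 0.8, "small_dollar_history": 1.2}
BIAS = -2.5

def donation_probability(supporter):
    """Score one supporter: logistic function over weighted features."""
    z = BIAS + sum(WEIGHTS[k] * supporter.get(k, 0.0) for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def rank_supporters(supporters):
    """Sort the outreach list so the likeliest donors are contacted first."""
    return sorted(supporters, key=donation_probability, reverse=True)

people = [
    {"id": "a", "past_donor": 1, "opened_emails": 1, "small_dollar_history": 1},
    {"id": "b", "past_donor": 0, "opened_emails": 1, "small_dollar_history": 0},
]
ranked = rank_supporters(people)
print(ranked[0]["id"])  # 'a' — the repeat, engaged donor ranks first
```

Prioritizing contact this way is what drives the crowdfunding lift described above: the same outreach budget is concentrated on the supporters most likely to give.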

Google AI leak: Will open-source AI competitors take hyperscalers’ lunch money? Commentary by Victor Botev, CTO and co-founder at Iris.ai

“Big isn’t always better when it comes to language models. ChatGPT uses hundreds of billions of parameters, but the latest open-source models show almost equivalent performance with just 13 billion parameters – and Google itself has now admitted that ‘giant models are slowing us down.’ Smart models trump large models, and as our understanding of these models improves, the performance gap is rapidly closing between smaller open-source AI models and proprietary large language models (LLMs). The pace of innovation reflects the power of community in the race to build more efficient and powerful models. In the coming months, I expect a shift towards smaller, customisable models, as speed of iteration and training data quality outweigh the need for unwieldy LLMs. Training LLMs from scratch is costly and prohibitive for many enterprises. Instead, we should focus on techniques like Low-Rank Adaptation (LoRA) fine-tuning, which reduces trainable parameters and GPU memory. This enables us to create agile, composable models that improve over time, proving that training LLMs from scratch is excessive and inefficient.”
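To make the LoRA point concrete, here is a minimal NumPy sketch of the bare mechanism: freeze the full weight matrix and train only two thin low-rank factors added to it. The dimensions are toy values, and this is the idea in miniature rather than any specific library’s implementation:

```python
# Minimal numerical sketch of Low-Rank Adaptation (LoRA): instead of
# updating a full d×d weight matrix, train two thin matrices A (r×d)
# and B (d×r) and add their scaled product to the frozen weights.
# Dimensions below are toy values chosen for illustration.
import numpy as np

d, r, alpha = 1024, 8, 16                 # hidden size, LoRA rank, scaling
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))           # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01    # trainable down-projection
B = np.zeros((d, r))                      # trainable up-projection; zero init
                                          # means the adapter starts as a no-op

def forward(x):
    """Adapted layer: frozen path plus low-rank update (alpha/r scaling)."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

full_params = d * d                       # what full fine-tuning would train
lora_params = 2 * d * r                   # what LoRA actually trains
print(f"trainable params: {lora_params} vs {full_params} "
      f"({100 * lora_params / full_params:.1f}% of full fine-tuning)")
```

Even in this toy setting, the trainable parameter count drops by roughly two orders of magnitude, which is the memory and cost saving the commentary is pointing at.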

Navigating The Influx Of Generative AI Tools Responsibly. Commentary by Sebastian Okser, CTO of Cyndx

“The release of GPTs (Generative Pre-trained Transformers) could have been predicted to some extent based on the progress being made in the field of artificial intelligence and natural language processing (NLP) at the time. In the years leading up to the development of GPT models, there had been significant advances in deep learning techniques, particularly in the area of neural networks. Researchers were also making progress in creating models that could process and generate natural language, although these models were often limited in their ability to generate coherent and contextually appropriate responses. The eventual release of ChatGPT became a phenomenon partly because of the public awareness it received. ChatGPT is great at helping automate routine tasks, but its propensity to hallucinate is often a major blocker that can require users to spend more time debugging than the tool saves. Debugging code produced by ChatGPT that is syntactically correct but logically wrong can take more time than writing the code would have in the first place. On the other hand, routine tasks such as creating configuration files or documentation can benefit greatly from tools such as ChatGPT. This leads to an interesting situation in which its use needs to be heavily monitored, and the skilled technician is the one who can decipher right from wrong in the output rather than the one who can simply provide commands. Pinecone’s recent raise of $100M, Weaviate’s raise of $50M, and Qdrant’s raise of $7.5M highlight a heavy influx of capital into the next generation of AI tools that are changing the way models are deployed. This will lead to increased research and development in the AI space and make much of the supporting infrastructure easier to deploy and more reliable for other companies in the industry.
However, in the same way that proper use of ChatGPT requires a strong domain knowledge to decipher the truth from the hallucinations, the influx of investments into the AI space will be in vain without experts to determine how to apply these technologies successfully.”

Hold your horses with ChatGPT. Commentary by Daasity CEO Dan LeBlanc

“I think the progress is going to be slower than people think. We are starting to find the gaps in the software and recognizing the things it can and cannot do well. The next phase will really be exploration to see where it can be useful and where it will not, which will likely be tested heavily by Big Tech firms who can afford the experimentation such as Apple, Google, Microsoft, Amazon and Meta.”

What are the benefits of predictive models in accelerating automakers’ software development? Commentary by Tarun Shome, Director, BlackBerry IVY

“Predictive models use machine learning to find patterns in historical data. These patterns can then be used to make predictions about new data. Such models have a great deal of potential for use cases within a software-defined vehicle, and the advent of in-vehicle high-performance computing (HPC) allows these models to run in the vehicle rather than in the cloud. Moving such models from the cloud to the vehicle brings the benefits of reduced data transmission costs, reduced cloud compute and storage costs, and reduced latency. Predictive models can be used to predict outcomes based on learned driver behavior, predict component wear, or even predict when maintenance will be required for a certain vehicle component. From such models, drivers can benefit from features such as improved EV range accuracy, driver efficiency coaching, at-a-glance predicted component wear, or even predicted component failure. The insights from predictive models can also provide valuable information to fleet managers and OEM service centers. BlackBerry IVY facilitates the deployment of predictive models within the vehicle by providing a middleware runtime that incorporates data standardization and machine learning frameworks, allowing predictive models to be easily developed and deployed as “Synthetic Sensors”. The insights from these predictive models can then be ingested into vehicle-based applications or offloaded to the cloud.”
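One of the use cases named above – predicting when a component will need maintenance – can be sketched as a simple trend fit over historical wear readings. The readings, units, and service threshold below are invented for illustration and are not BlackBerry IVY code:

```python
# Hedged sketch of predictive maintenance: fit a trend to historical
# wear readings and extrapolate when the component crosses its service
# threshold. All numbers here are invented for illustration.
import numpy as np

mileage = np.array([10_000, 20_000, 30_000, 40_000], dtype=float)
pad_wear_mm = np.array([1.0, 2.1, 2.9, 4.0])   # brake pad material worn away

# Least-squares line: wear ≈ slope * mileage + intercept
slope, intercept = np.polyfit(mileage, pad_wear_mm, 1)

SERVICE_THRESHOLD_MM = 6.0
predicted_service_mileage = (SERVICE_THRESHOLD_MM - intercept) / slope
print(f"predicted service due near {predicted_service_mileage:,.0f} miles")
```

An in-vehicle model of this kind only needs to transmit the prediction (one number) rather than the raw sensor stream, which is the data-transmission saving the commentary describes.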
