Heard on the Street – 5/25/2023

Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!

Meta’s $1.3 Billion GDPR Fine. Commentary by Amit Shaked, CEO and co-founder, Laminar 

“The E.U.’s decision to fine Meta for illegal data transfers from Europe to the U.S. signifies the severity of the consequences for mishandling data. While in this circumstance Meta was aware of the data being shared, many organizations are in the dark regarding where their sensitive data resides and are unknowingly breaking compliance regulations. This unknown or ‘shadow’ data is a concern for 93% of data security and governance professionals, and one of the top targets for cyberadversaries because it is neither governed nor under the same security structure as other datastores. To avoid the heavy fines from compliance laws such as GDPR, HIPAA, CCPA and more, it’s critical that data and governance professionals have automated guardrails in place to identify sensitive data and verify policy is being followed. In our cloud age, which is defined by ephemeral networks and data democratization, organizations need to have the ability to quickly detect and remediate regulatory compliance violations, and be able to quickly generate audit-ready reports. Doing so can not only help them avoid fines, but also protect them from the financial and reputational damages of a breach.”

Fine for Facebook Proves Net is Closing In. Commentary by Sandra Molenaar, director Consumentenbond

“After the resounding victory in our own lawsuit against Facebook, this is another important blow for consumers. It is also an important support for consumers who participate in our claim against Facebook. It makes their case a lot stronger. Slowly but surely, the net is closing around Facebook. Consumers can still join our action and I urge everyone to do so. Together we must make a fist against Facebook. The amount does justice to the violation Facebook committed. The rights of millions of consumers have been trampled on. That requires a deterrent amount. And that is this fine.”

Why are we already talking about GPT-5? Commentary by Victor Botev, CTO and Co-Founder at Iris.ai

“It’s understandable that OpenAI is being cautious when it comes to commencing development of the latest version of its flagship large language model, GPT-5. With the release of GPT-4 occurring just over a month ago, there’s still a whole generation of tools, applications, and platforms that can be created from variations of this brand-new model. Huge corporations already possess mountains of training data with which they can build sprawling, complex AI models. Yet bigger doesn’t always equal better. We’re already seeing models trained on high-quality data sets with limited parameters be better applied to specific use cases, rather than the one-size-fits-all large language model (LLM) approach. This perhaps goes some way to explaining why OpenAI has not started training GPT-5. More practical models which cost significantly less to train present a better way forward for all. The next step is ensuring that all AI models’ training data is properly monitored and the outputs factually validated. Instead of focusing on the next big thing, fine-tuning the incredibly capable models we already have will usher in the next era of the technology.”

Insights on AI Regulation: It’s not surprising. Commentary by Jake Klein, CEO of Dealtale

“Generative AI technologies represent a groundbreaking way to encapsulate and distribute information. Given the Chinese government’s goal of controlling–and in some cases censoring–the dissemination of information it finds sensitive, this move is not that surprising. That being said, the future implications of generative AI for society are certain to be significant, and are not yet close to being fully understood. Large language models (LLMs) have the potential to be misused as tools of disinformation that can exacerbate some of the problems that we already see, for example, with social media platforms. So will some form of regulation or oversight be required? I think the answer is probably yes, although the motivation and implications are a bit different than what we are seeing in China.”

Unlocking the hidden potential of underutilized data. Commentary by Sasa Seles, a Software Developer at Industrial ML, Inc.

“Businesses are producing vast amounts of data, yet they lack the capability to analyze it effectively, resulting in an underutilization of data that affects most industries. This problem leads companies to lose opportunities to optimize internal processes, prevent crises, and reduce costs. The manufacturing industry, for example, has evolved and become better connected, but the usage of sensor data has relied on templated dashboards that display sensor values in real-time and occasionally have alert systems that work on simple interactions, resulting in a flood of alerts and a lack of practical analysis. One solution to this issue is implementing a system that monitors multiple sensors, tracks their progress, and analyzes the situation using algorithms. Computer vision and AI implementations can also be used to make data analysis more efficient. For example, alerts can be set for when the furnace’s temperature deviates more than 10% from the standard deviation for more than one minute instead of setting a minimum and maximum temperature range. This way, erratic behavior in the furnace can be detected, and alerts for small spikes in temperature during reheating can be avoided. This approach can improve the effectiveness of sensor utilization and lead to better data collection. However, it’s crucial to note that communication is also essential to maximize the potential of collected data. Decision-makers within the industry should communicate with all stakeholders about the effectiveness of the data gathered and the methods by which it is analyzed and presented. Collected data should also be shared efficiently between them. By ensuring effective analysis, communication, and implementation, decision-makers in manufacturing or any industry can prevent the underutilization of sensors or other data sources and optimize their internal processes.”
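The sustained-deviation alert described in the commentary above can be sketched roughly as follows. This is a minimal illustration, not the author's system: the `Reading` type, the `sustained_deviation_alerts` function, the fixed `baseline` setpoint, and the sample data are all hypothetical. The quote's "10% from the standard deviation" is ambiguous, so the sketch assumes a simpler reading: the temperature is more than 10% away from a known baseline, continuously, for at least one minute.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Reading:
    t: float     # seconds since monitoring started (hypothetical unit)
    temp: float  # furnace temperature in any consistent unit


def sustained_deviation_alerts(
    readings: Iterable[Reading],
    baseline: float,
    tolerance: float = 0.10,     # 10% band around the baseline (assumption)
    min_duration: float = 60.0,  # seconds the deviation must persist
) -> List[Tuple[float, float]]:
    """Return (start, end) intervals where the temperature stayed outside
    baseline * (1 +/- tolerance) for at least min_duration seconds."""
    alerts: List[Tuple[float, float]] = []
    start = None   # time the current out-of-band run began
    last = None    # time of the latest out-of-band reading in that run
    for r in readings:
        out_of_band = abs(r.temp - baseline) > tolerance * baseline
        if out_of_band:
            if start is None:
                start = r.t
            last = r.t
        else:
            # The run ended; keep it only if it lasted long enough.
            if start is not None and last - start >= min_duration:
                alerts.append((start, last))
            start = None
    # Handle a run that is still open when the data ends.
    if start is not None and last - start >= min_duration:
        alerts.append((start, last))
    return alerts


def sample_temp(t: int) -> float:
    """Made-up trace: a 20-second reheating spike, then an 80-second drift."""
    if 30 <= t <= 50:
        return 1200.0
    if 100 <= t <= 180:
        return 1150.0
    return 1000.0


readings = [Reading(t, sample_temp(t)) for t in range(0, 300, 10)]
alerts = sustained_deviation_alerts(readings, baseline=1000.0)
```

With this made-up trace, the short spike is ignored and only the longer drift is reported, which is the behavior the commentary contrasts with a plain minimum and maximum range.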

Uber’s Move to the Cloud — How to Prevent Security Implications. Commentary by Amit Shaked, CEO and co-founder, Laminar

“Uber has long been the exception to the new cloud-norm. Although the rideshare leader’s digital transformation activities will be accelerated by the move to the cloud, it doesn’t come without risks. In the height of the pandemic when other organizations were undertaking similar initiatives, three in four businesses experienced a breach due to unknown or ‘shadow’ data, lack of visibility into the network and overall disconnection between developers and IT and security teams. It is critical that enterprises rely on agile data security tools that allow for automated and continuous monitoring of data assets — especially after the shift to the cloud is complete. Having total observability will enable organizations to automate cloud data discovery and data security policy enforcement (especially in multi-cloud environments), control data exposure and enable data-centric environment segmentation.”

How A SSOT and Hybrid Data Mesh Model Can Help Companies Avoid Imminent Failure. Commentary by Raj Sundaresan, CEO of Altimetrik

“Silicon Valley Bank’s collapse is a reminder that an organization’s data can provide valuable insights to manage risk and business opportunities. In the case of SVB, other issues like mismanagement led to the collapse of the storied financial institution. Its asset-liability mismatch caused a big hole in its finances, and when SVB fell out of compliance with a key risk metric, they changed the model’s assumptions instead of heeding that warning. On the heels of this fall from grace, we’re reminded that data management and the right processes are critical to seeing clearly what financial services companies need to know about risks and opportunities. A single source of truth (SSOT) is essential to capturing data as a repository from which a hybrid data mesh approach can be built to help drive decision making. One of the key benefits of the SSOT is its ability to generate valuable data insights that can uncover patterns and trends within an institution’s data, helping to identify both risks and opportunities. For instance, if Silicon Valley Bank had utilized an enterprise data analysis tool to monitor its financial health, it may have detected warning signs about its interest rate sensitivity much earlier.  Numerous industries can benefit from a hybrid data mesh model using SSOT because it’s rooted in creating advanced and accurate insights — a necessity in a number of sectors.” 

Implications of Reddit charging for API access. Commentary by Dan LeBlanc, CEO of Daasity

“Reddit is following a trend similar to Twitter and will start charging for using their APIs. While a lot of this movement is focused on generating revenue to offset costs of handling all those API requests, it’s very ambiguous. The claim is that they want to stop companies that ‘crawl’ Reddit for data, but leave access free for those that want to build apps and bots to help people use the app. This leaves a ton of gray areas open for interpretation, including people that want to scrape data to see what is trending in order to write content for Reddit. Would they have to pay for that? The ability to scrape information that is beyond what is posted can be really helpful but it’s unclear if that is free or not anymore.”

AI changes the burden of creativity. Commentary by Parry Malm, CEO at Phrasee

“For any company to fire the creatives on their team and replace them with AI is a huge misstep. We’ve proven that large language models and other generative AI can help with the creation process, and the results are incredible. But generative AI has left us with a new human-owned cognitive burden: Not content creation but content curation. All AI-developed content must be curated with the help of the human eye. You need to consider brand voice, regulatory and compliance guidelines, safety, accuracy, and intent. Our future influencers and branding experts will not be tasked with crafting content but pruning a vast AI-generated content library to ensure their brand’s tone remains intact. A different form of creativity, but creative nonetheless.”

Data Scientists Needed to Harness GPT. Commentary by David Robinson, director of data science at Heap

“Every innovation that enhances the value of data has simultaneously increased the value of data scientists, and ChatGPT is no exception. The evolution of data science has progressed in waves, from the inception of data warehouses in the 1980s to the era of Big Data in the 2010s to the recent emergence of the Modern Data Stack. These technologies each presented the potential to transform raw data into actionable insights, and each was met by a generation of data practitioners adept at harnessing and deploying these technologies. Large Language Models, such as ChatGPT, represent the latest innovation for converting data into tangible business value. However, similar to their predecessors, these models can deliver value only when effectively applied by individuals possessing both quantitative skills and a strong understanding of the business context. Data scientists will be at the forefront of harnessing GPT to summarize data, uncover insights, and make informed decisions.”

UX Research + Generative AI. Commentary by Nitzan Shaer, CEO and co-founder of WEVO.ai

“Our digital world faces accelerating changes, from technology advancements like generative AI, to regulations like GDPR and growing scrutiny from Congress and EU regulators, with the promise of a cookieless world. Meanwhile, consolidation in the UX space could prove to be anticompetitive, denying consumers an accessible online experience due to industry upheaval that doesn’t serve the end user. While enterprises are looking for alternative ways to gather customer data and intelligence, finding a trusted platform with accurate customer feedback is often cost-prohibitive. Too many experiences are built on unreliable user research, as it can take companies 50+ hours to gather insights that rely on 4-10 users. That is why companies must prioritize UX by investing in platforms that go beyond slow, anecdotal insights and instead use human-augmented AI to generate reliable user research to create the most valuable digital experiences.”

The Untapped Capital Stream: Data-As-Collateral. Commentary by West Monroe’s Douglas B. Laney, Data and Analytics Strategy Innovation Fellow. 

“Data has become an increasingly valuable asset in the digital age, as evidenced by a booming data brokerage market. However, many organizations are not taking full advantage of their data’s potential for monetization. Leveraging data as collateral—offering loans backed by a copy of the organization’s data—unlocks a whole new stream of capital without the risk of selling. These types of loans are non-dilutive and protect the founders’ equity—the original data never leaves the borrower’s possession. Setting up these types of arrangements can be tricky; you’ll need to find a lender who recognizes the asset’s worth and come to an agreement on the data’s value. Luckily, there are several new players in the space who use machine learning and AI to expedite the data valuation process. Pursuit of these loans—which requires a clear data valuation process—can also help chief data officers fill gaps in organizational understanding of their data’s actual and potential value. Even if the loan does not come to be, this exercise of valuation can help pave the way for future data investments.”

The Importance of Real-time Data to Power AI Applications. Commentary by Gary Hagmueller, Chief Executive Officer, Arcion

“The AI revolution is here, and the potential for AI systems to create value is unprecedented. Large language models (LLMs) like ChatGPT are a significant advancement, but they rely on massive amounts of training data and may not be suitable for real-time proprietary enterprise data. Enterprise AI applications will require a combination of generative and analytical techniques, with situational awareness being a critical element. Change data capture (CDC) technology, which replicates data from transactional databases in real time, offers a solution for delivering real-time, normalized and secure data to power AI applications. It can help address the challenges of managing data changes and ensuring data integrity in enterprise-level AI applications. Modern CDC vendors have emerged with cloud-based distributed microservices architectures to meet the demands of modern enterprises.”
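To make the change data capture idea above concrete, here is a minimal sketch of how a downstream replica might be kept in sync by replaying change events from a source database. It is an illustration only: the event shape (`seq`, `op`, `key`, `row`) and the `apply_changes` function are hypothetical and do not reflect any particular CDC vendor's format or API.

```python
from typing import Any, Dict, Hashable, List

# Hypothetical change-event shape, not a real vendor format:
#   seq: position in the source's change log (used for ordering)
#   op:  "insert", "update", or "delete"
#   key: primary key of the affected row
#   row: column values carried by the event (absent for deletes)
ChangeEvent = Dict[str, Any]
Replica = Dict[Hashable, Dict[str, Any]]


def apply_changes(replica: Replica, events: List[ChangeEvent]) -> Replica:
    """Replay change events onto an in-memory replica in log order."""
    for event in sorted(events, key=lambda e: e["seq"]):
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Merge changed columns into the existing row, if there is one.
            replica[key] = {**replica.get(key, {}), **event["row"]}
        elif op == "delete":
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return replica


# Events may arrive out of order; seq restores the source's order.
events = [
    {"seq": 3, "op": "insert", "key": 2, "row": {"name": "b", "balance": 5}},
    {"seq": 1, "op": "insert", "key": 1, "row": {"name": "a", "balance": 10}},
    {"seq": 4, "op": "delete", "key": 2},
    {"seq": 2, "op": "update", "key": 1, "row": {"balance": 25}},
]
replica = apply_changes({}, events)
```

Ordering by a log sequence number is the design choice that matters here: applying the same events in a different order could leave the replica with stale values, which is one of the data integrity concerns the commentary mentions.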

ChatGPT is colliding with AI Regulations. Commentary by Manasi Vartak, Founder and CEO of Verta

“We’re witnessing a collision in real time between ChatGPT and other generative AI technologies, on the one hand, and the growing number of regulations around AI, on the other. Generative AI has captured all our imaginations, but it also has raised alarm bells around potential risks like copyright infringement, Responsible AI, and security issues. These changes come at a time when lawmakers in the United States, European Union and elsewhere are adopting new laws to regulate AI that require a level of explainability, transparency and documentation (for example, around data lineage). It is very likely that organizations using Generative AI models will find it challenging to meet these regulations. So while, as data science and machine learning practitioners, we’re excited to discover new ways to apply Generative AI, we need to understand the requirements embodied in the new AI regulations and ensure that we can mitigate the potential risks that come with leveraging technologies like ChatGPT.” 

Can AI unlearn consumer data? Commentary by Aaron Mendes, CEO & Co-Founder of PrivacyHawk 

“Governments are increasingly concerned about personal data being used to train AI models like ChatGPT. These actions can violate privacy regulations and subject consumers to the dangers of future use of their data beyond today’s comprehension. While regulators are working to prevent malicious use of personal data, individuals must also take responsibility to reduce their digital footprint to protect their private data, as regulations can only solve part of the problem. The growing use of consumer data to train AI models means consumers must be empowered to control how their personal information is used by regulators and private sector services.”

What’s next for AI regulation? Commentary by Daniel Schiappa, Chief Product Officer, Arctic Wolf 

“The recent OpenAI hearing marked a significant milestone in the AI regulation debate. To ensure a comprehensive approach to regulation, it is imperative that Congress incorporate insights from industry leaders spanning both the public and private sector. By establishing voluntary or regulated guidelines to indicate what types of models are being used, we can effectively address concerns while promoting responsible AI usage. There must be an efficient way to enforce regulation, as there will inevitably be ‘bad actors’ who flout protocols and exploit AI through increasingly realistic tactics like phishing scams. While agency oversight can contribute significantly to regulation efforts, it is critical to strike a balance that keeps pace with the rapid rate of innovation. From a cybersecurity perspective, just as the US is moving quickly on AI, so are our adversaries in China, Russia and Iran. Maintaining our pace of innovation while instituting regulation is crucial to safeguarding our global cybersecurity posture.”

Sign up for the free insideBIGDATA newsletter.

Join us on Twitter: https://twitter.com/InsideBigData1

Join us on LinkedIn: https://www.linkedin.com/company/insidebigdata/

Join us on Facebook: https://www.facebook.com/insideBIGDATANOW