
Heard on the Street – 8/1/2022

Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!

How AI is becoming easier and more accessible to everyone. Commentary by Erin LeDell, Chief Machine Learning Scientist, H2O.ai

With more businesses moving towards incorporating AI in their day-to-day operations, one of the biggest challenges to its advancement is that organizations often don’t have the internal resources or expertise to develop and carry through projects that use AI. This is particularly the case with businesses outside of the technology industry. With demand for AI at an all-time high and these challenges in mind, the biggest scale-related trend is the acceleration of democratizing AI – making it not only available to everyone, but also easy and fast to use, so all companies can get in on the action. This is where open source frameworks and low-code, pre-canned proprietary apps come in; they are growing in popularity because they make it easier for any kind of enterprise to build and operate AI-based services in common areas like fraud prevention, anomaly detection and customer churn prediction.

New large language model, Bloom, shows community-led research is the future of machine learning. Commentary by Victor Botev, CTO at Iris.ai

GPT-3 arrived on the AI scene like a thunderbolt. But as time goes on, we’re discovering that its applications, while impressive, are limited in specific use-cases, and tend to look better on the macro level than the micro. Now that we have Bloom – an open-source competitor at almost the same size as GPT-3 – it’s tempting to imagine this communal sharing of ideas will bring us closer to solving the prevalent issues of AI bias and language barriers from the ground level up. BigScience was only able to build Bloom thanks to $7 million worth of grants in compute time on one of the most powerful supercomputers in the world – Jean Zay, near Paris. These are laudable efforts by BigScience to disrupt the market, disturb the Big Tech players in AI, and build a Large Language Model (LLM) focused on research. But this achievement is merely one part of the ecosystem that drives the whole community of AI researchers forward. In simple terms, machine learning consists of three parts: processing data, improving hardware, and improving algorithms. Hype and acclaim will flow to organizations like BigScience, and rightly so, but LLMs like this are centered on the processing of data. They combine vast amounts of data with existing algorithms and leverage huge computing power to find out what happens. The next stages, happening out of the spotlight, may be just as important. The community will explore the parameter space of Bloom’s model to better understand and optimize the underlying mathematics, and this work will inform the next generation of hardware and algorithms to further push the boundaries for what’s possible.

Avoid Ending Up with a “Lakeshack” Instead of a Data Lakehouse. Commentary by Ori Rafael, the CEO and co-founder of Upsolver

It has occurred to me that if there is a data swamp, and a data lakehouse, then there must be a data lakeshack. What would this look like? Well, the term data swamp expresses the frustration CIOs have with the fact that they’re storing large quantities of modern data but can’t efficiently utilize it. The data lakehouse concept implies a solution to this problem, whereby the data lake becomes eminently queryable for all kinds of data. So what’s a data lakeshack? Well, it’s an organization’s superficial attempt at creating a lakehouse that leads to slow, expensive and unreliable queries – and unhappy data users. It looks like a house, but you wouldn’t want to live in it. It’s the result of addressing the symptom (performance) by adding a modern query engine, without handling the root cause, which is an unoptimized raw cloud object store. The answer is to refine the raw data into query-ready tables, ideally in an open format like Parquet. Because these tables are large and live (constantly updated), getting to this state traditionally has required a lengthy, manual data engineering project. Fortunately, solutions have emerged to automate that work, which can help us avoid shack-shock and get the data access and analytics performance we require from a data lake.
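The refinement step described above can be sketched in a few lines. This is an illustration of the idea only, not any vendor's pipeline: raw, append-only event records are deduplicated and partitioned by date so a query engine can prune what it reads. A real lakehouse job would write each partition out as columnar Parquet files.

```python
from collections import defaultdict

# Raw event stream, including a late update to record id 1 (invented data).
raw_events = [
    {"id": 1, "ts": "2022-07-01T10:00", "user": "a", "amount": 10},
    {"id": 1, "ts": "2022-07-01T10:05", "user": "a", "amount": 12},  # late update
    {"id": 2, "ts": "2022-07-02T09:00", "user": "b", "amount": 7},
]

def refine(events):
    # Keep only the latest version of each record (dedupe by id).
    latest = {}
    for event in sorted(events, key=lambda e: e["ts"]):
        latest[event["id"]] = event
    # Partition by event date so queries touch only the files they need.
    partitions = defaultdict(list)
    for event in latest.values():
        partitions[event["ts"][:10]].append(event)
    return dict(partitions)

tables = refine(raw_events)
```

The point of the sketch is the shape of the output: compact, current, partitioned tables instead of a heap of raw objects, which is what separates a lakehouse from a lakeshack.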

Multimodal AI needs open source to scale. Commentary by Aaron Sloman, Chief Technology Officer, CLIPr

How important is open source for innovations with multimodal AI development? The short answer is it’s critical for the future of the technology. Standard AI algorithms are usually unimodal, meaning they are trained to do only one specific task such as processing images or text. They are fed a single sample of training data from which they are able to identify corresponding images or words. Multimodal AI, on the other hand, has the ability to process multiple data types (i.e. image, text, speech, numerical data, etc.) from disjointed sources and feed them into a single model for analysis. Currently, there is a lack of standardization across different types of siloed unimodal AI models, so developers have to patchwork several models together, each of which does one type of analysis well. The biggest roadblock for important multimodal AI initiatives is cost, and more open source offerings will make it easier and less expensive to train and run experiments. There is no shortage of impactful use cases for multimodal AI, but it almost always comes down to making the financials work, which is why a lack of open source development is the biggest barrier to its growth.
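The "patchwork" of unimodal models can be illustrated with a toy late-fusion sketch: several unimodal scorers each rate the same item, and a fusion layer combines their outputs into one prediction. The models, field names and weights below are invented for illustration; real systems learn the fusion rather than hard-coding it.

```python
# Stand-ins for unimodal models, each covering one data type.
def image_model(item):
    return item.get("image_score", 0.0)

def text_model(item):
    return item.get("text_score", 0.0)

def audio_model(item):
    return item.get("audio_score", 0.0)

# Illustrative modality weights; a trained fusion layer would learn these.
MODALITIES = [(image_model, 0.5), (text_model, 0.3), (audio_model, 0.2)]

def fuse(item):
    # Weighted average of the available modality scores.
    return sum(weight * model(item) for model, weight in MODALITIES)

clip = {"image_score": 0.9, "text_score": 0.8, "audio_score": 0.6}
score = fuse(clip)  # 0.5*0.9 + 0.3*0.8 + 0.2*0.6
```

Each unimodal model stays simple and specialized; the cost the commentary mentions comes from training and stitching many of these together, which is where shared open source components help.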

How To Set Up Your MLOps Teams To Succeed. Commentary by Abhijit Bose, MVP, Head of Center for Machine Learning, Capital One

Every enterprise serious about embracing machine learning (and who isn’t?) is turning to MLOps to ensure a more reliable and efficient approach to deploying ML models into production. A challenge for many ML teams today is the time and effort it takes to develop the infrastructure needed to deploy ML reliably, and at scale, across the enterprise in a repeatable way. The purpose of MLOps is to standardize and, to a degree, automate these processes so engineers and data scientists can spend their time optimizing their model parameters and business objectives. The best way to set yourself up for success in deploying MLOps is to build an infrastructure that gets your model development and model deployment teams on the same tech stack while prioritizing reusable components and frameworks. At Capital One, our move to the cloud, for example, was a cost-effective, flexible, and efficient way to get our ML compute environment up to speed so we could develop and deploy complex compute solutions at scale. Another critical factor was getting our ML functionality to a place where automating model monitoring and training became possible — ensuring these functions perform well and scale as we push models into production. We are also heavily focused on compressing the lag time between production and analytical data environments, which has been immensely helpful in expanding visibility across our core cloud infrastructure, including our CI/CD pipelines and management processes, deployment of containers, security management, and governance. Taken together, all of these MLOps best practices have helped us ensure consistent reproducibility, model monitoring, and maintenance, which ultimately keeps us responsible and well-managed in all that we do across our ML ecosystem.
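Automated model monitoring of the kind described above can be reduced to a minimal sketch: flag a feature when its production mean drifts more than k standard deviations from the training baseline. This is an illustration of the general technique, not Capital One's actual stack, and the numbers are invented.

```python
import statistics

def drift_alert(train_values, prod_values, k=3.0):
    # Compare the production mean to the training baseline, in units of
    # the baseline's standard deviation.
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)
    return abs(statistics.mean(prod_values) - mu) > k * sigma

train = [10, 11, 9, 10, 10, 11, 9, 10]
stable = drift_alert(train, [10, 10, 11])   # within the baseline band
shifted = drift_alert(train, [25, 26, 24])  # distribution has moved
```

An MLOps platform runs checks like this on a schedule for every deployed model, so retraining is triggered by evidence rather than by someone noticing a dashboard.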

Preparing for climate disclosure regulations by tracking ESG data. Commentary by Rick Dorsett, Senior Director HRAVS & ESG at ISN

72% of senior decision makers don’t have confidence in the ESG data currently being reported to stakeholders. On top of that, the United States is currently awaiting a final ruling by the SEC on the public disclosure of climate emissions, which could arrive as early as October. The proposed rule would require publicly traded companies to report their Scope 1 and Scope 2 greenhouse gas (GHG) emissions, and their Scope 3 emissions if deemed material. With these changes coming soon, organizations need to prepare to accurately track and report emissions data, or they face many risks. Improvements in data collection have been critical in the tracking of GHG emissions. Businesses should implement services to collect primary data on their entire supply chain and use specific emissions factors to translate available information into a reportable data point. By leveraging these advancements, companies are able to more accurately track their emissions throughout their value chain, including Scope 3 emissions. This data enables organizations to create and make progress towards GHG reduction targets. As the SEC ruling is finalized, we’ll likely see more organizations adopting methods of tracking their emissions to prepare for the public disclosure component, and the role of data collection will become key to doing so accurately.
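The "emissions factor" translation mentioned above is simple arithmetic: activity data multiplied by a per-unit factor yields a reportable CO2e figure. The factors below are made-up placeholders for illustration; real reporting uses published factors such as those from the GHG Protocol or national inventories.

```python
# Illustrative factors only: kg CO2e per unit of activity.
EMISSION_FACTORS = {
    "grid_electricity_kwh": 0.4,
    "diesel_litre": 2.7,
    "freight_tonne_km": 0.1,
}

def co2e_kg(activities):
    # Sum activity quantity x factor across everything the site reports.
    return sum(qty * EMISSION_FACTORS[kind] for kind, qty in activities.items())

site = {"grid_electricity_kwh": 1_000, "diesel_litre": 200}
total = co2e_kg(site)  # 1000*0.4 + 200*2.7, in kg CO2e
```

The hard part in practice is not this multiplication but collecting trustworthy primary activity data across the supply chain, which is the commentary's main point.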

Businesses Are Moving Toward More Digital Transformation Thanks To The Metaverse, Web3 and AI. Commentary by Sanjay Vyas, Chief Technology Officer at Planful 

Businesses across all industries are looking at advanced technologies to modernize how they operate. In recent years, this has meant embracing AI, which has impacted businesses by augmenting tasks that require human judgment. AI works to eliminate mundane tasks, surface hidden anomalies, and predict what’s coming next. For the modern CFO, a digital transformation requires consistent innovation. As we continue to hear more about the Metaverse and Web3, more finance departments are beginning to strategize how they’ll incorporate them into their existing business forecasts. While some fintech leaders have already joined the movement, many are still hesitant to take the leap. Those starting to experiment with the Metaverse are finding that virtual reality experiences can bring added value to their company in the form of reduced business costs and improved customer experiences. In my opinion, it’s not off-base to assume that many finance departments will want to adopt the Metaverse to unlock streamlined processes that enhance business. While AI is providing an immediate impact now, it is also setting the stage for a much larger virtual experience finance leaders are already investing in.

Anomaly detection helps companies pinpoint key insights in their data. Commentary by Andy Williamson, Chief Product Officer, Kaizen Analytix 

Companies today deal with extremely large volumes of data generated from multiple sources, so it’s quite difficult for them to quickly find potential issues hidden at a granular level – most of which result in real profit leakage. The most advanced companies are leveraging sophisticated anomaly detection techniques to pinpoint the oddities in their data instead of trying to manually find them buried in dashboards and reports. Automating the detection and triage processes allows for faster resolution of issues in a less labor-intensive and error-prone way. With the interconnectivity and interdependence of data growing like never before, simple univariate anomaly detection techniques frankly do not cut it anymore. Companies need algorithms that look across a variety of data sources, metrics and segments to uncover trends and relationships in order to more confidently assess where the true anomalies lie. The corporate use cases for anomaly detection are practically endless, from spotting fraud to revenue leakage to system outages. One great example – a large telecommunications provider used advanced anomaly detection techniques to spot irregularities in their customer acquisition and retention, allowing them to tailor retention strategies to their most at-risk customer segments. This resulted in a 10% reduction in controllable churn and $20 million in annual revenue salvaged. The costs of not using advanced anomaly detection can be astronomical for many businesses.
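The univariate-versus-multivariate point above can be made concrete with a toy example: a day can look normal on each metric alone yet be anomalous in how the metrics relate. The history, metrics and thresholds below are invented for the sketch.

```python
import statistics

# Invented daily history of (revenue, orders) pairs.
history = [(100, 10), (110, 11), (90, 9), (105, 10), (95, 10)]

def z(value, series):
    # How many standard deviations the value sits from the series mean.
    return abs(value - statistics.mean(series)) / statistics.stdev(series)

def univariate_anomaly(revenue, orders, k=3.0):
    # Check each metric on its own, as a simple detector would.
    revenues, order_counts = zip(*history)
    return z(revenue, revenues) > k or z(orders, order_counts) > k

def ratio_anomaly(revenue, orders, k=3.0):
    # Check the relationship between the metrics: revenue per order.
    ratios = [r / o for r, o in history]
    return z(revenue / orders, ratios) > k

# Revenue (108) and orders (9) each sit inside their usual ranges, but
# revenue per order (12.0 vs ~10) is far outside the norm.
flagged_alone = univariate_anomaly(108, 9)
flagged_jointly = ratio_anomaly(108, 9)
```

Production systems extend this idea across many metrics and segments at once, but the failure mode of univariate checks is exactly the one shown here.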

Meta’s ‘superpower’ AI translator may not be what it’s cracked up to be. Commentary by Victor Botev, CTO at Iris.ai

As Meta claims a breakthrough in AI translation technology through their newest paper, it’s amazing to see the power of machine learning to connect far-flung, esoteric languages. The engineering prowess required to clean and present enough data for these obscure datasets is itself a marvel. However, it’s worth bearing in mind, despite the hype, that these models are not the cure-all that they may first appear. The models that Meta uses are massive, unwieldy beasts. So, when you get into the minutiae of individualized use-cases, they can easily find themselves out of their depth – overgeneralized and incapable of performing the specific tasks required of them. Take language tasks like interpreting and translating academic research papers, for instance. Another point to note is that the validity of these measurements has yet to be scientifically proven and verified by their peers. The datasets for different languages are too small, as shown by the challenge in creating them in the first place, and the metric they’re using, BLEU, is not particularly applicable. Finally, the paper hasn’t been submitted for peer review. Doing a kind of peer review through Meta’s media publication creates bias for future reviews and puts public pressure on the reviewers. But despite all of this, I’m hoping that these points will be addressed and it will be a good foundation for some great work in the next few months in NLP.

The Game-Changing Power of Location Intelligence for the Data-Driven Enterprise. Commentary by Jeff White, CEO of Gravy Analytics

Location data is increasingly being used by companies to enhance business operations—from advertising to market research, financial services to supply chain risk management, and more. Reflecting true human mobility in the physical world, this type of data powers solutions for organizations across a wide range of industries that need to understand how people, products, and materials move throughout the world. Location intelligence helps organizations of all types better understand human mobility and consumer behavior. For example, in the finance industry, location data is used as an alternative data set by professional investors. Foot traffic data, measured through location analytics, can highlight changes in consumer behavior, trends, and market demand. By analyzing foot traffic data that reflects true consumer activity, investors can better understand a company’s performance in near real-time. Through these insights, investors can identify opportunities, mitigate risks, and ultimately gain an edge in the market. As more industries realize the benefits of location intelligence and how it can be applied to business decisions, its growth will continue in the years to come.
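A foot-traffic signal of the kind described above can be reduced to a period-over-period change per location. The visit counts and store names below are invented for the sketch; real location analytics would work from large panels of observed visits.

```python
def traffic_change(visits_by_week):
    # visits_by_week maps a location to its weekly visit counts;
    # return the most recent week-over-week fractional change.
    return {
        location: (weeks[-1] - weeks[-2]) / weeks[-2]
        for location, weeks in visits_by_week.items()
    }

visits = {"store_a": [1000, 1200], "store_b": [800, 600]}
changes = traffic_change(visits)  # store_a up 20%, store_b down 25%
```

An investor watching a retailer would aggregate signals like this across all of its locations to estimate demand trends ahead of reported earnings.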

Boosting the Sales Teams’ Confidence with Price Optimization. Commentary by Zilliant Senior Vice President of Products & Science Pete Eppele 

Inflation, supply chain shortages and general market volatility have put salespeople in a historically difficult position. They have been the “bearer of bad news,” if you will, for everything from limited product availability to repeated price increases. In most cases, sales teams aren’t provided with the contextual data to help them explain the rationale behind price increases to their customers. Over time, they lose confidence in pricing as they face increased push-back from their already-stretched customers. The result is often increasing requests for customer price exceptions and, ultimately, lost margin. In times of inflation and volatility, price optimization software offers unique advantages. Price optimization can replace traditional methods of “across the board” price moves with more surgical and personalized price changes for each customer. For example, if a customer is paying a price that’s higher than similar customers, they may get a smaller recommended increase. This can be a great selling point for a sales rep to discuss with that customer. On the other hand, if a customer is underpaying relative to similar customers, volatility provides an opportunity to align that customer to what similar customers pay without overcharging. Visibility to optimized pricing can be extended directly into sales tools like CRM and CPQ using visual analytics. This makes it straightforward for salespeople to see one customer’s price relative to other customers with similar buying behavior. Transparent price optimization boosts the sales teams’ confidence in pricing guidance and ensures that necessary price increases stick with customers.
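The "surgical" increase described above can be sketched as a rule that scales a base price increase by where a customer sits relative to peer pricing. The policy, function name and numbers are hypothetical illustrations, not Zilliant's actual models, which use learned price optimization rather than a fixed rule.

```python
import statistics

def recommended_increase(customer_price, peer_prices, base_increase=0.08):
    peer_median = statistics.median(peer_prices)
    if customer_price >= peer_median:
        # Already at or above peers: soften the increase.
        return base_increase / 2
    # Below peers: base increase plus a capped alignment step.
    gap = (peer_median - customer_price) / peer_median
    return base_increase + min(gap, 0.05)

peers = [100, 102, 98, 101, 99]
high_payer = recommended_increase(105, peers)  # smaller recommended bump
low_payer = recommended_increase(90, peers)    # larger, alignment-driven bump
```

Surfacing the peer comparison alongside the recommendation is what gives the sales rep a defensible story for each customer.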

Living on the Edge: Putting Real-Time Data In The Hands Of Anyone, Anywhere Through Modern MDM. Commentary by Manish Sood, Founder & CTO, Reltio

As enterprises continue to add new applications and collect even more data, many are adopting edge computing strategies, allowing them to put real-time information in distant locations without compromising speed and accuracy. Edge computing brings more unstructured data into play, which can compound challenges. Without a foundation of trusted, high-quality data for current operational and analytics systems, successfully implementing transformational initiatives at the edge becomes a steep uphill task. Edge computing capabilities will handle many computations in real-time. Driverless cars, for example, will rely on data to differentiate between vehicles and pedestrians when operating. Situations like these will depend heavily on core data from a central source, such as location maps or a driver’s portable profile. The precision of the outcomes at the edge will depend heavily on the quality of the core information that is universally shared across the entire service. Real-time master data management (MDM) for core data in the cloud becomes the always-on, single source of truth for such information. A modern MDM approach—which leverages the power of the cloud—can add tremendous value to an edge computing strategy. It enables organizations to have clean, accurate data available anywhere without compromising speed and accuracy. Modern data management is about flexibility, fluidity, and the ability to deliver large volumes of trusted and critical data at scale to downstream systems for real-time decisions. That’s where the cloud comes in, because it allows for the ultimate flexibility as a business shifts.
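The "single source of truth" idea behind MDM can be illustrated with a toy survivorship merge: duplicate records from several systems are folded into one golden record, with each attribute kept from the freshest, most trusted source. The sources, trust ranking and fields are invented for the sketch; this is not Reltio's actual matching engine.

```python
# Illustrative trust ranking for contributing systems (higher = more trusted).
SOURCE_TRUST = {"crm": 2, "billing": 3, "web_form": 1}

def golden_record(records):
    # Apply records from least to most (trust, recency), so the value
    # written last for each field is the survivor.
    merged = {}
    ordered = sorted(records, key=lambda r: (SOURCE_TRUST[r["source"]], r["updated"]))
    for record in ordered:
        for field, value in record["fields"].items():
            merged[field] = value
    return merged

records = [
    {"source": "web_form", "updated": "2022-07-01",
     "fields": {"email": "old@x.com", "phone": "111"}},
    {"source": "billing", "updated": "2022-07-10",
     "fields": {"email": "new@x.com"}},
    {"source": "crm", "updated": "2022-07-20",
     "fields": {"phone": "222"}},
]
master = golden_record(records)
```

Edge applications then read this one merged record rather than reconciling conflicting copies themselves, which is the point of keeping core data mastered centrally.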

What TikTok’s Data Management Failures Mean for Data Governance and Regulation. Commentary by Jay Militscher, head of data office at data intelligence company Collibra

Organizations that have clearly demonstrated expertise for using data to personalize the experience for their customers through targeted ads and product recommendations cannot then make claims that they don’t know where the data is. There is a disconnect there. Companies that are visibly successful can’t also say they are not in control of their data, especially once revenue runs into the billions. It’s a choice not to know where data is when money could easily be invested in data management, and if these companies don’t prioritize this themselves, then that is where we should consider new regulations and safeguards for consumers.

As Technology Evolves, Enterprises Must Keep Humanity in AI. Commentary by Dr. Lewis Z. Liu, CEO and Co-Founder of Eigen Technologies

Conversations about AI sound much different today than they did 10 years ago. We’re no longer wondering whether AI will help businesses grow or increase bottom lines; we know companies implement AI-powered technologies to power efficiencies, cost savings and scalability. Instead, the proliferation of the technology has pushed these conversations in more meaningful and complex directions – making it critically important to assess data privacy and biases in AI models. Enter humanity in AI. Too much human involvement defeats the automation purpose of your AI operations, but having little-to-no human interaction can lead to flawed AI-fueled decisions. It’s no secret that AI that is managed incorrectly or trained on flawed data can lead to biases and unethical practices. This can be seen in the disproportionate amount of loan rejections people of color face when applying for a mortgage online or in Google’s recent efforts to include additional skin tones to help combat racial bias in their approach to AI. While an AI system’s decisions may be statistically correct given its algorithms, that doesn’t mean those decisions are ethical. Today’s organizations must strike the right balance when it comes to keeping humanity in AI. A system of checks and balances must be in place in your AI operations, which means regularly training the algorithm, taking a human-in-the-loop (HITL) approach, logging the ‘who, what, when and why’ of any changes made to the models and more. You need human intervention to ensure fair, ethical and ultimately, profitable decisions. Hybrid models that keep HITL are vital in today’s AI practices – ensuring the right amount of human interference at the right time. This doesn’t mean having someone oversee the entire AI process, but rather having the right parameters in place to check the system when a data point falls outside of the typical parameters.
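The checks described above can be sketched as a small HITL gate: confident predictions pass through automatically, while low-confidence ones are routed to a human, and every override is logged with the who, what, when and why. The threshold, field names and reviewer function are illustrative assumptions.

```python
import datetime

AUDIT_LOG = []

def decide(prediction, confidence, reviewer=None, threshold=0.9):
    if confidence >= threshold:
        return prediction  # automated path: no human needed
    # Human-in-the-loop path: take the reviewer's decision and log it.
    decision = reviewer(prediction)
    AUDIT_LOG.append({
        "who": getattr(reviewer, "__name__", "unknown"),
        "what": {"model": prediction, "final": decision},
        "when": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "why": "confidence below threshold",
    })
    return decision

def loan_officer(prediction):
    # Stand-in for a real review queue; always approves in this sketch.
    return "approve"

auto_decision = decide("approve", 0.97)
human_decision = decide("reject", 0.55, reviewer=loan_officer)
```

The audit trail is as important as the gate itself: it is what lets an organization later answer why a model's decision was changed, and by whom.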

Using AI to Optimize Digital Marketing. Commentary by Peter Day, Chief Technology Officer at Quantcast

Like many of you, I’ve recently read the story about the Google engineer who raised fears that some of the company’s artificial intelligence software had become sentient. I’m worried – not because I believe that Cylons, the Terminator, or any other Hollywood-version of evil, suddenly conscious robots are about to take over. I’m concerned that this story is going to further cloud the conversation around AI: one of the most misunderstood technologies of our age. I believe that when applied properly, AI will have a revolutionary impact on not just marketing and advertising – but the entire business world. But for that to happen, CMOs and other marketing leaders must get a handle on AI’s true purpose and potential. For marketers to truly harness the power of AI they must understand its capabilities and limitations, ask the right questions, and take the right measures to apply this technology in a way that will enable them to understand, reach and influence audiences more effectively. AI is always entirely and singularly goal-oriented. Whatever you tell an algorithm to optimize towards, it will orient all of its computational power toward that end. Therefore, you’d better be sure you’ve got your goals in order. On that note, here are the top questions brands should ask when applying computational assistance: (i) What do I really want these products to do? How will we measure success? (ii) Is the data needed available and trustworthy (AI likes raw and directly observed data)? (iii) What are the possible unintended consequences of using AI to reach this goal?

AI’s Impact on the Supply Chain Crisis. Commentary by Joe Fizor, Director of Solutions Engineering at TBI

AI is a strong tool in mitigating the effects of supply chain slowdowns. AI is used to look for patterns – what has happened in the past and how it can inform what happens today and in the future. This allows companies that see a slowdown in product availability, or congestion at a port, to automatically increase requests for products. It can also monitor the demand for products and adjust prices accordingly. AI not only allows these processes to become automatic, but also makes them faster and takes a task away from an employee, minimizing error. The end goal of any business is to get the product into the customers’ hands, and AI makes this happen more efficiently. It works from beginning (manufacturing of the product) to end (getting the product in the hands of the consumer) to create a seamless, cost-effective process. If a customer needs to talk to customer service, AI can assist at all hours of the day. During the manufacturing process, AI can help the design by informing which parts can be cut out or made better to create the best product. And when the product moves through the supply chain, it can provide transparency from end to end. All in all, this improves the customer experience and increases the bottom line.
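The automatic-reordering idea above can be reduced to a simple pattern check: if recent supplier lead times run well above the historical average, raise the reorder point so purchase requests go out earlier. The function, threshold and numbers are invented for the sketch; production systems would use forecasting models rather than a fixed rule.

```python
import statistics

def reorder_point(daily_demand, lead_times_days, slowdown_factor=1.2):
    # Compare the recent lead-time trend against the full history.
    recent = statistics.mean(lead_times_days[-3:])
    baseline = statistics.mean(lead_times_days)
    congested = recent > baseline * slowdown_factor
    # Plan against the worse of the recent and baseline lead times.
    return daily_demand * max(recent, baseline), congested

# Lead times jumped from ~10 days to ~30: reorder earlier and flag congestion.
qty, congested = reorder_point(20, [10, 10, 10, 30, 30, 30])
steady_qty, steady = reorder_point(20, [10, 10, 10, 10, 10, 10])
```

Running checks like this continuously, across every SKU and supplier, is what turns lead-time data into the automatic reorder requests the commentary describes.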

Emotion-detecting AI has potential to expand digital accessibility, but before moving forward we need to understand its other implications. Commentary by Theresa Kushner, Head of the North America Innovation Center at NTT DATA Services

Emotion-detection AI is one step on the road to replicating human sentiment. But before we lunge forward with that step, we need to make sure that we understand fully where we are going and how fast we want to go. Like all things that are technology based, emotion-detection AI has a yin and a yang. Organizations have to balance the potential positive uses of emotion-detection AI with the technology’s ethical implications and negative consumer sentiment. When used appropriately, emotion-detection AI has the potential to support digital accessibility, bridging the gap for the 15% of the world’s population with a disability. It allows for greater real-time personalization that adapts to how a customer or patient is feeling, giving access to the same quality of online services that others may take for granted. But as consumers of this software, we might not yet be ready for the use of facial recognition or image scanning, which are logical next steps. Emotion-detection AI cannot be treated like any other AI technology, as it has specific ethical concerns to address and requires agreement on what specific facial cues mean. A head nod in America is very different from one in India. Consumers also need to be educated on the full extent of the information they’re handing over when they are the subject of emotion-detection AI scanning, and that responsibility should fall to the organization using it. As a bottom line, emotion-detection AI has the potential to better the lives of underserved communities, but its use requires both a thoughtful and ethical approach.
