
Heard on the Street – 9/12/2022

Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!

Eliminating off-label AI. Commentary by Triveni Gandhi, Responsible AI Lead at Dataiku

The healthcare industry is well known for its off-label drug use. A drug approved for heart concerns, for example, may later be prescribed to improve mental health outcomes even though it was never formally reviewed for that purpose. Off-label use proliferates for many reasons: perhaps there are no suitable approved drugs for a condition, or other approved drugs haven’t worked. Surprisingly to many, this happens all the time. In AI, many practitioners have taken a similar approach, and it’s a grave mistake. “Off-label AI” is when practitioners take a model that succeeded in one situation and re-use it for others. For example, in the legal field judges have used AI-informed sentencing guidelines, which turned out to be heavily biased against people of color. However, the model used was actually taken from a different application intended to identify potential criminal re-offenders and offer support to minimize recidivism. This copy-paste approach to AI embodies the perils of off-label AI – even with the best intentions – and must be eliminated to build trust in the field.

How MLOps can be something of a silver bullet in the era of digital transformation and complex data problems if used strategically. Commentary by Egido Terra, Senior Data Product Manager, Talend

As data volume and complexity continue to grow, ML is gaining more importance to ensure data health. The value of mature data management is already immeasurable. However, many professionals fail to understand the requirements of successful automation. In order to unleash the full potential of ML, MLOps must be leveraged for solving complex problems with highly specific, tailored solutions. MLOps, the discipline of deploying and monitoring machine learning models, can be something of a silver bullet in the era of digital transformation and complex data problems if used strategically. Automation is a must when it comes to properly managing data and ML; developing models won’t be sufficient unless MLOps is used to quickly identify problems, optimize operations, find issues in the data, and allow smooth and successful execution of ML applications. The alternative is hard-to-manage ad-hoc deployments and longer release cycles, where time-consuming human intervention and error are all too common. The benefits of issue-specific ML applications for data health are endless. A dedicated investment in MLOps to ensure your automation priorities are well-structured will pay off in the short and long term. As a result, harmful data will be kept out of applications, and solutions will come quicker with a significant impact.

How To Level Up Data Storage In The Growing Datasphere. Commentary by Jeff Fochtman, senior vice president of marketing, Seagate Technology

The global datasphere, that is, all the data created, consumed, and stored in the world, doubles in size every 3 years. That is mind-blowing growth. How business leaders treat all this data matters. It matters because data is an immensely valuable, if overlooked, business currency. Organizations that find themselves deluged by more and more data should focus on converting this data into insights, and those insights into business value. Likely, if your organization is handling data sets that are 100TB and more, you already store some of this data in the multicloud. Unfortunately, 73% of business leaders report that they can only save and use a fraction of their data because of growing costs associated with data storage and movement. What can you do about it today? Learn from companies that win at business by taking 5 practical steps: 1) Consistently use predictive third-party software tools that help anticipate and measure the costs of cloud resources for every deployment decision. Do that every time. 2) Evaluate deployment criteria (performance, API, etc.) prior to deploying applications. 3) Monitor those characteristics once applications are up and running. 4) Invest in tools in addition to training. 5) Automate security and protection. What can you do about it in the near future? Some storage companies offer vendor-agnostic, frictionless data services with transparent, predictable pricing and no egress or API fees. To reclaim control over your data, look for those solutions.
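To make the "doubles every 3 years" figure concrete, here is a minimal sketch of the compound-growth arithmetic behind it. The function names and the 100 TB starting point are illustrative assumptions, and the projection assumes an organization's data grows at the same pace as the global datasphere, which the commentary does not claim.

```python
# Compound-growth arithmetic for a quantity that doubles on a fixed period.
# The 3-year doubling period comes from the commentary; everything else
# (function names, the 100 TB example) is a hypothetical illustration.

DOUBLING_PERIOD_YEARS = 3


def annual_growth_rate(doubling_period_years: float) -> float:
    """Equivalent year-over-year growth rate for a given doubling period."""
    return 2 ** (1 / doubling_period_years) - 1


def growth_factor(years: float,
                  doubling_period_years: float = DOUBLING_PERIOD_YEARS) -> float:
    """How many times larger the quantity is after `years`."""
    return 2 ** (years / doubling_period_years)


def projected_size_tb(current_tb: float, years: float) -> float:
    """Projected size in TB, assuming growth tracks the doubling period."""
    return current_tb * growth_factor(years)


if __name__ == "__main__":
    rate = annual_growth_rate(DOUBLING_PERIOD_YEARS)
    print(f"Implied annual growth: {rate:.1%}")
    for horizon in (3, 6, 9):
        size = projected_size_tb(100, horizon)
        print(f"100 TB after {horizon} years: {size:.0f} TB")
```

Doubling every 3 years works out to roughly 26% growth per year, so under these assumptions a 100 TB estate would reach about 800 TB in nine years, which is why storage and data-movement costs compound so quickly.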

New bill brings US closer to sustainability goals – IoT will help us cross the finish line. Commentary by Syam Madanapalli, Director, IoT Solutions at NTT DATA Services

As the US pushes forward toward its sustainability goals with recent legislation that provides the most climate funding the country has ever seen, cleantech is at the forefront of our economy. Internet of Things (IoT) technology has the potential to play a key role in this sector through the reduction of carbon emissions and adoption of sustainable practices and to have far-reaching positive impacts both on business operations and for our environment. IoT and digital twin technologies allow for the connection of complex ecosystems, providing real-time data from the large variety of meters, sensors, systems, devices, and more that an organization might use to measure carbon emissions, giving more insight into their carbon footprint than ever before. Once that IoT data is connected to digital twins in the cloud, advanced analytics can be used to identify and predict issues along the value chain and optimize operations. This will be an area of growth as leaders continue to look for ways to improve operations and reduce environmental impact.

Leverage existing AI/ML capabilities right now. Commentary by Don Kaye, CCO, Exasol

In today’s data-driven world, there is a definitive need for organizations to use artificial intelligence (AI) and machine learning (ML) to move beyond simple reports and dashboards describing what has happened to predict with confidence what will happen. Forward-thinking companies are embracing AI and ML in an effort to develop thorough data strategies that link to their overall business objectives. Business processes today are not instantaneous – but business leaders expect data-driven outcomes to be. This often leaves decision-makers and their teams in a pinch, especially as data consumerization continues to increase, and fast. This is where artificial intelligence and machine learning play an integral role. While these capabilities are often integrated within an organization’s current technology stack, they are not getting leveraged to their fullest potential. Companies must use their existing AI/ML capabilities to improve access flows to data and gain commonality across various points of view at scale, all within a fraction of the time it takes to sift through the typical data sets analysts are tasked with.

It’s Time to Put Credit Scores in Context. Commentary by Steve Lappenbusch, Head of Privacy at People Data Labs

Last week, The Wall Street Journal reported that a coding error at consumer credit reporting agency Equifax led the credit giant to report millions of erroneous credit scores to lenders across a three-week period in April and May of this year, a major challenge for lenders and credit seekers. While a credit score can shed some essential light on the subject’s credit history and past interaction with lenders and payees, enriching a record with alternative data like work history, past addresses, and social media profiles can substantially expand a lender’s understanding of who the customer is, and how legitimate their application may be. A history of social media profiles, email and phone contacts with a long history of use, and a valid work history will all help to expedite the process of weeding out synthetic identities and other invalid applications, freeing up time to service legitimate applicants. Credit scores aren’t going anywhere. They’ll remain a critical tool for lenders looking to understand ability to repay, and the only permissible tool for determining credit worthiness. However, it’s easy to imagine a world in which alternative data sources can diminish the impact of inevitable errors like the one reported today. By providing a backstop of additional context and a new layer of identity on top of traditional credit bureau records, lenders no longer need to be tied to a single source of truth.

The value of embedded analytics driving product-led growth. Commentary by Sumeet Arora, Chief Development Officer, ThoughtSpot

Nowadays, our whole world is made up of data and that data presents an opportunity to create personalized, actionable insights that drive the business forward. But far too often, we see products that fail to equip users with data analysis within their natural workflow and without the need to toggle to another application. Today, in-app data exploration, or embedded analytics, is table stakes for product developers as it has become the new frontier for creating engaging experiences that keep users coming back for more. For example, an app like Fitbit doesn’t just count steps and read heart rates. It gives users an overview of health and recommends actions that should be taken to keep moving, get better sleep, and improve overall well-being. That’s what employees and customers want to see in business applications. Insights should not be limited to business intelligence dashboards; they should be seamlessly integrated everywhere. Whether users are creating an HR app for recruiting, or a supply chain app for managing suppliers, embedded analytics can provide real-time intelligence in all these applications by putting the end-user in the driver’s seat and giving them insights that are flexible and personalized.

What’s the deal with Docker (containers), and Kleenex (tissues)? Commentary by Don Boxley, CEO and Co-Founder, DH2i

I suppose you can say that Docker is to containers, what Kleenex is to tissues. However, the truth is that Docker was just the first to really bring containers into the mainstream. And, while Docker did it in a big way, there are other containerization alternatives out there. And that’s a good thing because organizations are starting to adopt containers in production at breakneck speed in this era of big data and digital transformation. In doing so, organizations are enjoying major increases in portability, scalability and speed of deployment – all checkboxes for organizations looking to embrace a cloud-based future. I am always excited to learn about how it is going for customers leveraging containers in production. Many have even arrived at the point of deploying their most business-critical SQL Server workloads in containers. The sky’s the limit for deployments of this sort, but only if you do it thoughtfully. Without a doubt, containerization adds another layer of complexity to the high availability (HA) equation, and you certainly can’t just jump into it with container orchestration alone. What is necessary is approaching HA in a containerized SQL Server environment with a solution that enables fully-automatic failover of SQL Server Availability Groups in Kubernetes—enabling true, bulletproof protection for any containerized SQL Server environment.

Why data collection must be leveraged to personalize customer experiences beyond retail. Commentary by Stanley Huang, co-founder and CTO, Moxo

Today, as more customers prefer to manage their business online, these interactions can feel impersonal. It’s common for the retail industry to leverage data collection and spending algorithms in order to create customer profiles and predict the next best offer, as retail is a highly customer-centric business with buyers requiring on-demand service. Beyond the retail industry, high-touch industries, such as legal and financial services, are beginning to utilize data collection in order to more effectively service clients. By analyzing collected data from previous touchpoints, companies can create a holistic 360-degree view of each customer and gain a better understanding of how to interact with them based on their individual preferences. Data collected from a user’s past is the most relevant source to help contextualize client interactions and enable businesses to personalize the entire customer journey moving forward. This historical data collected from client interactions allows businesses to identify client pain points in the service process and make improvements in real time. In addition, the automation of processes can enable businesses to more quickly analyze collected data and reduce friction in the customer service process.

It’s Time To Tap Into the Growing Role of AI, Data and Machine Learning in Our Supply Chain. Commentary by Don Burke, CIO at Transflo

Machine learning, AI, contextual search, natural language processing, neural networks and other evolving technologies allow for enhanced operational agility within the supply chain like never before. These technologies allow for adaptable digital workflows driving speed, efficiencies and cost savings. Digitizing and automating workflows allow organizations to scale, grow revenues, adapt faster and deliver a superior customer experience. For example, transportation generates volumes of required documents necessary to complete financial transactions among supply chain partners. The ability to apply deep learning models to classify and extract data from complex, unstructured documents (e.g., emails, PDFs, handwritten memos) not only drives efficient processing but unlocks actionable data, accelerating business processes and decision-making. This equates to a real economic impact, whether by customer service excellence, speed of invoicing or significant cost savings. Above and beyond automating routine tasks and freeing up human resources for more high-valued opportunities, data becomes a valuable and harvestable area. Using these technologies to extract, process and merge data connects both the front and back office, allowing hidden analytical insights and unseen patterns to be discovered and improving organizational decision-making through a better understanding of customer behaviors, profitability of products/facilities, market awareness and more. In transportation, the sheer number of documents, such as BOLs, PODs, rate confirmations and accessorials, holds untapped insight that can be applied to reducing friction and complexity within the supply chain.

Bad Actors Still Want Your Data, But Are Changing Tactics of How to Get it. Commentary by Aaron Cockerill, chief strategy officer, Lookout

Bad actors are zeroing in on endpoints, apps and data that sit outside of the original corporate perimeter. In fact, there’s a plethora of threat intelligence reports about how bad actors have moved from attacking infrastructure to attacking endpoints, apps and data that are outside that perimeter. For example, many companies have had to move apps and servers that were behind a firewall into the cloud (IaaS environments) and run them so they are internet accessible. Many of these apps and servers weren’t designed to be internet accessible, and moving them outside of the perimeter introduces vulnerabilities that weren’t there when they were inside the corporate perimeter. Many server attacks these days leverage RDP, something that would not have been possible had the servers been behind a corporate perimeter. The same is true of endpoints, although attacks there tend to rely less on gaining access to RDP and more on phishing and social engineering to gain access and move laterally to critical infrastructure and sensitive data. So, the attack surface has changed – instead of looking for vulnerabilities inside the organization’s perimeter, we are now looking for vulnerabilities in servers in the cloud and on endpoints that are no longer protected by the perimeter. But what has not changed is what the bad actors are seeking, and it is very much focused on data. We hear a lot about ransomware, but what is not well understood yet, in the broader sense, is that ransomware typically is only successful when the bad actor has considerable leverage, and the leverage they obtain is always through the theft of data and then the threat of exposure of the data – what we call double extortion.

What is Vertical Intelligence? Commentary by Daren Trousdell, Chairman and CEO of NowVertical Group

Data transformation begins from the inside out. Businesses’ greatest challenge is staying hyper-competitive in an overly complicated world. Vertical Intelligence empowers enterprises to uplift existing tech stacks — and staff — with platform-agnostic solutions that can scale the modern enterprise. Vertical Intelligence is the key to unlocking the potential from within to bring transformation to the forefront. The idea of a purpose-built, top-to-bottom automation solution is antiquated. Yet the future is malleable: We see it as a flexible network of technologies that are platform-agnostic and prioritized to industry-specific use cases and needs. Most AI solutions currently available either require massive multi-year investment or for companies to mold their decision-making automation around a prefabricated solution that was either not built for their business or requires them to conform to specific constructs. We believe that technology should be made to serve a customer, not the other way around, and that’s why we’ve brought together the best industry-specific technologies and thought leaders to shape the experience and prioritize the most critical use cases.

Digital acceleration is just a dream without open solution development acceleration platforms. Commentary by Petteri Vainikka, Cognite

We are in the era of the new, open platform architecture model. Businesses now stand a greater chance of truly transforming by thinking bigger and more broadly across their operations. Businesses that cling to the past and maintain fixed, impenetrable silos of data are doomed to stay in the past. Contrary to those maintaining past operating models, businesses that place their bets on open, truly accessible, and data product-centric digital platform architectures will be the ones experiencing the most rapid and rewarding digital acceleration. Because there is no single data product management platform that can meet all the various needs of a data-rich, complex industrial enterprise, open data domain specialized platforms are rising to the occasion. Such open platforms meet operations business needs by offering specialized industrial data operations technology packaged with proven composable reference applications to boost the ROI of data in a faster, more predictable way. With greater openness, domain specialization, and pre-built interoperability at the core, businesses can boost their data platform capabilities and simultaneously realize new data-rich solutions in less than three months. To stay in the lead in the digital transformation race, businesses must think about operationalizing and scaling hundreds of use cases rather than one-offs or single-case proofs of concept. They need open, composable platform architectures that serve to tear down data silos while simultaneously delivering high-value business solutions with instant business impact. This will only happen with the right mix of specialized open data platform services orchestrated to work together like a symphony. Digital acceleration is just a dream without open solution development acceleration platforms.
