Heard on the Street – 6/1/2022

Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!

The Enormity of Humankind’s Most Critical Data. Commentary by Luke Norris, Faction Founder, Executive Chairman

Across every field important to human life, big datasets are expanding exponentially. But as they expand, they become unwieldy. It can take months or even years to move data from one network to another. When rich data collections are stuck where they’re first created, it hinders how data can be used. Important innovations in industries like healthcare, energy and transportation – those that advance the greater good – are being slowed by the fact that datasets are cumbersome. It’s not due to cybersecurity, or because scientists, researchers and engineers don’t want to inform and accelerate each other’s innovations. The problem is that data is really big. Cloud providers benefit from expanding datasets and the recurring revenue that they produce. And moving or sharing large datasets is a pain for those managing on-premises datasets. But community storage can help solve some of the world’s biggest challenges. When organizations can lend their data to other organizations to aid in the discovery of a solution, the innovation path shortens. Solutions that make individual datasets accessible across different networks are essential to making this idea of community storage successful. We’re in a data-driven society, and the ability to share large datasets for a common good breaks down barriers to a brighter future.

Driving data transformations with column-aware metadata. Commentary by Coalesce’s Co-Founder and CTO, Satish Jayanthi

What exactly is column-aware metadata? It’s the ability to leverage column names and mappings for easily applying transformations within a data set. For example, when creating a Type 2 slowly changing dimension, you can easily identify and track changes from specific columns such as address, name, phone number or any other column in your table. Column-level lineage is a profound problem for organizations trying to be data-driven, and the problem compounds as the scale of the project grows. Being column-aware also allows users to generate SQL in a graphical interface rather than typing it by hand in a code-driven IDE. Benefits of a column-aware architecture include: (i) Efficiency for everyone. A column-aware product dramatically improves productivity for every part of the data pipeline, helping democratize data from engineers and architects to the creators and consumers of final data dashboards; (ii) Instantaneous impact-analysis and lineage. Because the architecture understands the relationships of columns, you can instantly see the data lineage, how it is interconnected, and the type of change impacts a transformation may have; (iii) An unlocked UI. Column metadata unlocks a unique graphical interface able to deliver an intuitive and powerful experience for the user, all without compromising flexibility; (iv) The lost art of data modeling. Column-aware data transformations seamlessly integrate data profiling, the logical data model, and physical model.
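To make the Type 2 dimension example concrete, here is a minimal Python sketch of how a list of tracked columns could drive versioning. It is an illustration only, not Coalesce's implementation: the `CustomerVersion` record, the `TRACKED_COLUMNS` tuple and the `apply_type2_update` function are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional

# Hypothetical column-level metadata: only changes in these columns
# create a new version of the dimension row.
TRACKED_COLUMNS = ("address", "phone")


@dataclass(frozen=True)
class CustomerVersion:
    customer_id: int
    name: str
    address: str
    phone: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_type2_update(history, incoming, as_of):
    """Return a new history list with `incoming` applied as a Type 2 change."""
    current = next(
        (v for v in history
         if v.customer_id == incoming["customer_id"] and v.valid_to is None),
        None,
    )
    if current is None:
        # First time we see this customer: open an initial version.
        return history + [CustomerVersion(valid_from=as_of, **incoming)]

    changed = [c for c in TRACKED_COLUMNS if getattr(current, c) != incoming[c]]
    if not changed:
        # No tracked column changed; this sketch ignores untracked changes.
        return history

    # Close the current version and append a new open one.
    closed = replace(current, valid_to=as_of)
    new_version = CustomerVersion(valid_from=as_of, **incoming)
    return [closed if v is current else v for v in history] + [new_version]
```

Because the tracked columns live in one piece of metadata, the same list could also answer lineage questions such as which columns can cause a new row version.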

Living Our Principles by Future Proofing Responsible AI. Commentary by Steve Mills, Chief AI Officer and Gamma MDP at BCG

On 5/19, BCG released an AI Code of Conduct, which outlines our approach to our work in AI. We recognize that AI presents unique opportunities, but also unique risks. We believe we have a duty as a leader in AI to proactively ensure the responsible use of AI is core to our approach. The AI Code of Conduct is a means for us to codify and disseminate that commitment publicly. We also strongly believe that a company cannot become a global leader in AI without also being a global leader in Responsible AI. We are extraordinarily passionate about Responsible AI and have worked to promote Responsible AI principles internally, with our clients, and with other organizations globally. The AI Code of Conduct is another way for us to model those values and lead with integrity in this space by promoting the principles of Responsible AI and providing a template to other organizations.

Zola Hit By Credential Stuffing Hack. Commentary by Robert Prigge, CEO of Jumio

The recent Zola credential stuffing attack proves once again that there is an incredible amount of risk involved with using the same login credentials for multiple online accounts. In this instance, it’s likely that certain usernames and passwords had already been leaked on the dark web, allowing fraudsters to leverage bots to gain access to Zola accounts and potentially other platforms as well. This breach highlights the essential need for stronger online identity verification processes, especially as consumers use an increasing number of online accounts and platforms for critical functions like banking, healthcare and shopping. As a result, many organizations have already moved away from traditional passwords and implemented more secure methods of user authentication, such as face-based biometric verification and multi-factor authentication (MFA), to thwart the efforts of malicious actors and ensure user data is kept secure.

Zola Hit By Credential Stuffing Hack. Commentary by Gunnar Peterson, CISO, Forter

The compromise of multiple Zola user accounts provides both security and fraud prevention lessons. With regard to security, there is an influx of bad actors using automated tools like botnets and machine learning to engage in ongoing attacks against consumer-facing websites like this one. With automated tools, they commit account takeover fraud using techniques like credential stuffing and brute force attacks. To succeed against dynamic cybercriminals, organizations must build a learning system that evolves over time to keep up with attacker tactics. Identity graph technologies can help savvy organizations recognize attacker tactics across the whole identity lifecycle, including provisioning and account maintenance. The credential stuffing tactics also led to digital commerce fraud. In this case, it appears credentials were purchased on the dark web, and associated bank accounts were used to buy gift cards and make purchases. Retailers can actually apply similar identity-based principles to fraud prevention to catch these fraudsters in the act. Fraud prevention teams must look beyond basic attributes and work to identify patterns with less conventional characteristics. Surfacing those patterns takes sophisticated technology and the ability to make decisions on transactions accurately and instantly. Thus, the most effective way to combat gift card fraud is to focus not on the transaction but on the identity behind the transaction. Merchants must block bad actors across the digital commerce funnel and across channels to protect their consumers and their profits.

GDPR Anniversary. Commentary by Steve Bradford, Senior Vice President EMEA, SailPoint

It may have been four years since GDPR was introduced, but compliance is a process that must be adapted continuously. To keep on top of this, companies must try to understand the regulatory requirements as much as possible and keep track of how they affect their own industry. Businesses should then conduct assessments to identify their own privacy risks, prioritize them and create an action plan to mitigate the most important risks. It’s also important for companies to review the security policies and procedures already in place, to stay compliant with regulations applicable to their business. To ensure sustainable compliance, companies should also streamline and automate compliance processes and policies as much as possible. Technology like identity security can achieve this by regulating user access and keeping track of who is using various apps and data, and when. Doing this can save costs as well as valuable staff time, while reducing the risk of devastating data breaches due to manual errors.

How US businesses adapted to 4 years of GDPR and patchwork privacy law. Commentary by Danny Sandwell, Director of Product Marketing, erwin by Quest Software

When we look at the big picture, the GDPR really has become a vital component of global privacy law. It set the standards for others to follow, and it brought data privacy and data management into focus for everyone from citizens to enterprises and government institutions. Over the last four years, we shouldn’t underestimate the impact that GDPR has had on highlighting the reasons that companies should take data related issues more seriously, and not put them on the backburner. However, as data protection regulations expand from simply a “citizens-rights” focus, many global organizations now find themselves struggling to manage the convergence of multiple data regulations across different regions, rather than focusing on growth and improvement.  This has resulted in organizations looking at data regulations more holistically and managing sensitive data in an environment where they can understand the unique requirements and manage any conflicts that may arise from the different viewpoints and drivers of said regulations. GDPR has forced organizations to put sensitive data governance at the front and center of their digital transformation efforts. What’s also changed in four years is that there is a much greater focus on the physical geographical location of data. Thankfully, cloud providers are no longer spinning their heads around in response to many organizations’ specific regional hosting needs. With the various compliance, auditing, and breach notification requirements under GDPR better understood, the major cloud providers are equipped to help organizations navigate and advise along the way.

5 ways retailers can unlock the potential of AI. Commentary by Benoit Rojare, AI solutions director for retail & CPG at Dataiku

The rise of mobile tech and digital disruption has made it critical for retailers to give customers what they want, when and where they want it. This is placing increased pressure on retailers to ensure that promises to customers are fulfilled with a personalized, seamless, and unified experience (both online and in-store) – and AI is the only way to deliver that hyper-personalization. AI’s ability to absorb and sort through a ton of unstructured data and use that information to gain more relevance among customers is a critical asset for retailers. Here are 5 key areas retailers must address to unlock the potential of data: (i) Ensure data preparation is easy and hassle-free – To successfully operationalize and scale, teams need far more than just vision; they need smooth access to data and simplified ways to transform and industrialize these transformations to build the foundations of any data-driven business strategy; (ii) Seamlessly connect every source – Retailers have endless data across touchpoints and systems. But operating in silos, lack of data ownership, and inconsistent data management create blind spots that make it difficult to generate insights in line with the broader business strategy. Functional areas operate like islands and this cannot happen if real insight is going to be realized; (iii) Make legacy technology work – Big retailers have been able to make big investments into ML and AI technology to identify and exploit strategic opportunities through data, but most midsize or small retailers don’t have the resources and expertise to do so. Getting any new platform or technology to work in harmony with the old is fundamental to ROI; (iv) Create a culture of democratization – Retailers need to create change on an organizational level, supporting hundreds or even thousands of individuals affected by the transformation to being a data-led retailer. It is not enough to just fill the business with data scientists; rather, retailers should focus on transforming the culture of the enterprise so that data ownership and empowerment go way beyond one group of specialized individuals, of whom there will never be enough; (v) Enable everyone to get involved with AI transformation – To unlock a data-driven workforce, retailers need to bring together business experts, data scientists and technologists in a multidisciplinary approach focused on solving priority business challenges or capturing new opportunities. It’s about every part of the organization doing its part in harmony toward a common goal.

Implementing Data Lakehouse Security. Commentary by Steven Mih, Cofounder and CEO, Ahana

As the data lake has become widely used, digital native companies are more closely managing the data security and governance of their diverse data sets and their corresponding use. Controlling who has access to what data and what permissions a user might have is critical. In the last year we’ve seen a pronounced effort around building technologies that address these areas for the Data Lakehouse. When it comes to Data Lakehouse security, there are three key areas that need to be addressed: Multi-user support; Role-based access control; and Auditing.  With all the benefits the data lakehouse offers, including better cost, more flexibility, better scale, and being more open, digital native companies want to leverage it more than ever before. And now it’s possible to rest assured that the data lakehouse security is on par with the data warehouse. With more fine-grained access control and governance capabilities in the market today, it’s now possible to architect a fully secured data lakehouse.
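The three areas named above (multi-user support, role-based access control and auditing) can be sketched together in a few lines of Python. This is a toy model under stated assumptions, not Ahana's product or any real lakehouse API: the `Lakehouse` class, the `ROLE_PERMISSIONS` mapping and the role and table names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical role-to-permission mapping; entries are (table, action) pairs.
ROLE_PERMISSIONS = {
    "analyst": {("sales", "read")},
    "engineer": {("sales", "read"), ("sales", "write")},
}


@dataclass
class Lakehouse:
    # Multi-user support: each user name maps to exactly one role here.
    user_roles: dict
    audit_log: list = field(default_factory=list)

    def is_allowed(self, user, table, action):
        """Check a request against the user's role and record the decision."""
        role = self.user_roles.get(user)
        allowed = (table, action) in ROLE_PERMISSIONS.get(role, set())
        # Auditing: log every decision, including denials.
        self.audit_log.append((user, table, action, allowed))
        return allowed
```

Unknown users and unknown roles fall through to an empty permission set, so the sketch denies by default. A production system would typically add finer-grained rules (column or row level) and a tamper-resistant audit store.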

Growing 5G adoption will cause boom in edge data. Commentary by Adil Kidwai, Vice President and Head of Project Management, EdgeQ

Current projections indicate that enterprise 5G adoption is set to take off by 2025. As 5G enters the mainstream, we anticipate a Cambrian explosion of new applications, new use cases, and new businesses all at the edge – such as intelligent video surveillance, collaborative robotics, automated factories, and self-driving vehicles. These technologies will emphasize data as a fleeting currency, harnessed and synthesized in real time. And since this data will be created at the edge, it will be highly distributed and highly localized across disparate endpoints. This will impose a structural shift in how we build infrastructure around data: The cloud will need to migrate to the data. And fundamentally, 5G and AI will need to be fused in a coherent, unified manner. Data provenance will play just as integral a role as data compute. And 5G will be the technology fundamentally underlying this competitive advantage.

Intelligent IT Provisioning for Digital Transformation. Commentary by Tam Ayers, Field CTO, Digibee

CIOs often feel compelled to slash costs and push efficiency with operational costs, hyper-focusing on solution or product expenditures versus the potential value they could deliver. Veteran leaders within the tech industry need to stay the course in positioning their divisions as profit centers that drive the business forward rather than being viewed as a customary expense line on the books. While some might seek means to cut spending by tapping second-rate tools, the true worth of the tools should be considered. Cost-benefit analysis is essential to any purchasing practice, but the real value proposition needs to be evaluated as a substantial portion of that analysis. To better position themselves for success, leaders cannot compromise today at the expense of tomorrow. Cutting costs in the short-term leads to greater expenses down the road. Why? Because the path to digital transformation demands iterations and adds stressors to a company’s employees, its greatest asset. To encourage empowerment, agility and resilience, the focus ought to be on increasing the productivity and efficiency of your current team. Don’t settle for software products or tools that are adequate. Instead, pursue the tools that not only position the business for success but, equally important, that employees actually enjoy using. Empower individuals to become more productive, while increasing employee satisfaction and retention. By incorporating this ethos, the road to digital transformation becomes substantially easier.

Leveraging Geospatial Data to Prevent Climate Disasters. Commentary by Dr. Mike Flaxman, Geospatial Expert and Product Manager at HEAVY.AI

Climate change and extreme weather events, such as wildfires, floods and hurricanes, pose a growing threat to the world. According to the UN, natural disasters have surged globally over the past 50 years. Geotemporal data and predictive analytics will play a critical role in mitigating the impact of these events and even avoiding some of them altogether. In terms of prevention, automated data pipelines and ML can be used to continuously refine and update the risk models used to prioritize mitigation activities. In terms of tactical operations, near-term forecasts are hugely important in prepositioning assets, while high-cadence updates on current conditions can literally save lives and property during emergency operations. For example, wildfires are often caused by dead trees striking power lines. Historically, utilities managed this problem by sending hundreds of contractors to manually visit lines and look for dying vegetation. This was a cumbersome and imprecise process, typically with 4-year revisit times. Recently, utilities have been able to analyze weekly geospatial satellite data to pinpoint locations with the worst tree mortality. Equipped with these granular insights, utilities can determine where dead trees and power lines are most likely to come into contact, then take action to remove vegetation and avoid catastrophe. For example, one east coast utility found that more than 50% of its outage risk was occurring in 10% of its service territory. Since major utilities spend hundreds of millions of dollars per year on asset and vegetation management, even modest improvements in targeting can have major positive impacts on both public safety and ratepayers’ wallets.
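The kind of concentration figure quoted above (a large share of risk sitting in a small share of territory) comes from a simple ranking calculation. The Python sketch below shows that arithmetic only; the `risk_concentration` function, the grid-cell identifiers and the risk scores are hypothetical and are not drawn from any utility's data.

```python
def risk_concentration(risk_by_cell, fraction=0.10):
    """Return the highest-risk cells covering `fraction` of the territory
    and the share of total risk they account for.

    `risk_by_cell` maps a (hypothetical) grid-cell id to a non-negative
    risk score; cells are assumed to be of equal area, and at least one
    score must be positive.
    """
    ranked = sorted(risk_by_cell.items(), key=lambda kv: kv[1], reverse=True)
    k = max(1, round(len(ranked) * fraction))  # number of cells to keep
    top = ranked[:k]
    total = sum(risk_by_cell.values())
    share = sum(score for _, score in top) / total
    return [cell for cell, _ in top], share


# Made-up scores for ten equal-area cells.
scores = {"c0": 60, "c1": 10, "c2": 5, "c3": 5, "c4": 5,
          "c5": 5, "c6": 4, "c7": 3, "c8": 2, "c9": 1}
cells, share = risk_concentration(scores)
print(cells, share)
```

In this made-up example a single cell out of ten carries 60 of the 100 total risk points, which is the sort of result that would justify sending vegetation crews there first.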

Moving to a bundled data infrastructure. Commentary by Ovais Tariq, Co-Founder & CEO, Tigris Data

Today, data is a precious commodity for businesses. As we have adopted an unbundled architecture for the data stack, the tools we have for managing the data have become more diverse than ever before; each supporting one specific use case. However, this brings in two significant issues for businesses that implement an unbundled data infrastructure – operational complexity and fragility. With so many tools and services working independently of one another, it can be difficult to maintain control over the entire system. This can lead to chaos and inconsistency, as well as a higher risk of failure. Additionally, businesses may find that they are spending more time and money on managing their data infrastructure than they are on using it to drive their business forward. To overcome these challenges, businesses need to adopt a unified approach to data management. This can be accomplished by adopting a unified control plane that provides a single point of control for all data-related activities, from ingestion to analysis. This will help to reduce operational complexity, improve efficiency, and decrease the likelihood of errors.

VMware acquisition and implications for the open source community. Commentary by Tobi Knaup, CEO of D2iQ

Broadcom’s potential acquisition of VMware is the latest proof point in cloud computing’s unrelenting growth and popularity. VMware’s evolution to a cloud focus has centered on building off its virtualization roots to focus on containers and Kubernetes. While not the only focus of its business, VMware’s track record in cloud-native applications points to both the success and complexity of this market. Broadcom continues to diversify its portfolio, but it’s unclear what will happen with its cloud focus following this acquisition. This uncertainty underscores the importance of having deep cloud native domain expertise, as a singular focus is often required to make these critical applications a reality. In addition, this consolidation brings fears of vendor lock-in, making it more important than ever that organizations and the broader IT community invest in and support the open source initiatives that have made cloud native a reality.

Manufacturing companies are turning to data transformations for ultimate utilization for production. Commentary by Coalesce’s Co-Founder and CTO, Satish Jayanthi

Data is at the heart of every business, but data analytics are only as good as a company’s ability to consolidate data from various sources and transform it into a consumable format for business intelligence teams. Manufacturing organizations that have amassed large amounts of historical data still struggle to utilize their data to understand basic manufacturing processes including how to cut costs, increase production quality, and reduce errors. Data transformations are quickly becoming an inevitable catalyst to monitor manufacturing processes to create a feedback loop to fix problems and create analytics in real time.

Data analytics tools becoming imperative for better decision making. Commentary by Amit Patel, Senior VP, Consulting Solutions

Organizations across nearly all industry sectors, from fintech and healthcare to government, are making greater strides in extracting real value from the wealth of raw data they have been collecting for some time now. This is being done through predictive analytics to forecast future trends that have the potential to impact business, as well as through prescriptive analytics that inform as to the best course of action—in other words, if “X” happens, what should we do? At the same time, data analysis is also moving into the realm of the mainstream (i.e., business-level users) through the utilization of self-service, no-code/low-code analytics applications to support fact-based, faster daily decision making while reducing the burden on the IT personnel these users were once reliant on for their data reports. These advancements, along with smarter artificial intelligence and machine learning that can better learn algorithms to interpret system data, are leading organizations ever closer to being more fully data driven. In our current world, such digital transformation isn’t merely “nice to do” but is becoming increasingly essential for market competitiveness and overall business success.

Why businesses are so desperate to access external data. Commentary by Maor Shlomo, co-founder and CEO of Explorium

Access to the right external data is a major competitive advantage for enterprises, but they often struggle with finding it. Businesses rely on external data to keep their AI and analytics up to date in fast-changing environments, so it’s no wonder new research finds over 40% of organizations purchase external data from five or more providers. In fact, the same study found 22% of businesses are spending over half a million dollars annually to acquire data. Over the last two years, it has been hard for companies to get accurate, current, and relevant information to make impactful business decisions. Even when they do, internal obstacles keep them from making good use of it. It’s time intensive and expensive to source, verify, and integrate external signals with internal data. Today’s organizations need solutions that eliminate the barriers to acquiring external data from the right sources and integrating it in a simple way. Such solutions allow organizations to automatically discover the proper signals that improve their predictions, for example around lead scoring and segmentation. As the business landscape becomes more competitive, organizations will need to start looking outside their four walls to stay ahead.
