Heard on the Street – 3/1/2023

Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!

Open Data Day. Commentary by Rehan Jalil, President and CEO at Securiti

Open Data Day encourages the adoption of open data policies in government, businesses, and society. Data is the fastest growing resource on the planet, so it is unsurprising that it is considered today’s new oil. As data grows in both volume and the breadth of systems it spans, including multicloud, SaaS, and public/private cloud environments, the regulations, obligations, and environments become much more complex. Organizations typically harness a range of different solutions to satisfy needs within privacy, security, governance, and compliance, creating more and more silos. This has resulted in inconsistent data classification, fragmented visibility, higher costs, and greater complexity. Dark data, which includes an organization’s unused, undiscovered, and untapped data, can end up dispersed over numerous cloud service provider accounts, regions, and jurisdictions. Today’s data landscape offers organizations the chance to reevaluate how they are managing these requirements and to consider a unified data controls framework instead. It is critical to adopt a more sophisticated strategy that offers thorough visibility into an organization’s asset footprint and enables risk-reduction steps. On Open Data Day, and beyond the observance date, it is imperative to think about security, governance, and privacy as the full picture. Organizations must rethink their architectures and unify data intelligence and controls so that they have an effective and efficient way to aggregate and centralize visibility and control of all their data across all environments.

Being a Data-Informed vs. Data-Driven Organization. Commentary by Tyler Jones, Chief Customer Officer, CLARA Analytics

Companies are moving toward a data-informed approach rather than a data-driven one. The distinction here is critically important. The fundamental difference between the two is that the data-driven paradigm is about automation, while the data-informed process leaves room for human intelligence to make the final decision. A data-informed approach empowers and assists people to make better decisions. It’s a helper, not a replacement. AI will continue to improve and will likely get closer to a data-driven approach over time, but taking a data-informed approach will improve process change management and adoption.

Unlocking the potential of metadata. Commentary by Chetan Venkatesh, CEO and Co-Founder of Macrometa

Metadata is a powerful tool for enterprises, providing a wealth of information about their data and enabling them to make informed decisions about their data assets. In 2023, enterprises can leverage metadata to drive digital transformation and gain a competitive edge in their respective industries. By understanding the metadata associated with their data, enterprises can gain a comprehensive view of their data landscape, including information about the structure, content, and relationships between data assets. This information can be used to improve data governance, inform data-driven decision making, and support data-driven business processes. Metadata can also be leveraged to improve data privacy and security. With the increasing importance of data privacy, enterprises can use metadata to identify and classify sensitive data, and then implement appropriate security measures to protect it. Additionally, metadata can be used to track data lineage, allowing organizations to understand where their data comes from, how it is transformed, and where it is stored, which is critical for complying with data privacy regulations such as GDPR and CCPA. By leveraging metadata, enterprises have a huge opportunity to improve their data privacy and security posture, reducing the risk of data breaches and threats.

Data Professionals Spend 39% of Time on Data Cleansing; Data Standardization Is Key to Build Value-Added Applications. Commentary by Narrative’s CEO & Founder Nick Jordan

Businesses are ingesting more and more data to inform their decision-making, but collecting a variety of unstructured data from disparate sources presents challenges of its own. According to Anaconda’s 2021 State of Data Science survey, respondents claimed to spend 39% of their time on data prep and data cleansing. That leaves only 60% or so of their time for building value-added applications on top of that data or gleaning actionable insights from it. Key to overcoming this hurdle is data standardization. Currently, almost every organization is using its own data “language.” Effectively translating those languages into one universal language is critical to free up time for data scientists, giving back hours of productivity, reducing costs associated with errors and helping organizations make data-driven decisions. When data is clean, it enables data professionals to focus on value-generating activities, such as developing data-intensive applications, rather than scrubbing the data themselves. Developers and data scientists are able to integrate data with other data sources, allowing them to build more complex and powerful applications. This is particularly necessary for data-intensive applications, from finance to healthcare.
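
To make the idea of translating each organization’s data “language” into a shared one more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the FIELD_ALIASES mapping, the canonical field names, and the standardize function are all hypothetical and are not taken from the commentary or from any particular product.

```python
# Hypothetical alias table: maps field names seen in different sources
# to one canonical name. A real deployment would need its own mapping.
FIELD_ALIASES = {
    "email": "email",
    "e_mail": "email",
    "email_address": "email",
    "zip": "postal_code",
    "zipcode": "postal_code",
    "postal_code": "postal_code",
}


def standardize(record):
    """Return a copy of record with known keys renamed and string values trimmed."""
    out = {}
    for key, value in record.items():
        # Normalize the key's spelling before looking it up.
        norm_key = key.strip().lower().replace(" ", "_")
        # Unknown keys are kept under their normalized name.
        canonical = FIELD_ALIASES.get(norm_key, norm_key)
        if isinstance(value, str):
            value = value.strip()
        out[canonical] = value
    return out


# Two sources describing the same kind of record in different "languages".
source_a = {"E_Mail": " pat@example.com ", "ZipCode": "94105"}
source_b = {"Email Address": "lee@example.com", "zip": "10001"}
unified = [standardize(source_a), standardize(source_b)]
```

After this step both records share the keys email and postal_code, so downstream code can treat them uniformly instead of handling each source separately.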

Implications of ending free API access on Twitter. Commentary by Dan LeBlanc, Co-Founder and CEO at Daasity

Twitter ending free access to its API will not have an impact on advertisers and the companies that Twitter makes money from already, but it is going to directly impact companies that use Twitter to report on general trends or companies that enable you to automate via the API. So, think of tweetbots that repost, retweet, and like content. Think of companies that scrape the data to make money and report on trends. Researchers too, who use the API to understand behavior, will be impacted. All these interests will have to pay.

Why we’re not 100% ready for generative AI. Commentary by Brian Walker, CSO at Bloomreach

Much of this technology remains immature and I don’t foresee a widespread adoption or use in a scaled-out fashion just yet. Adequate laws have not been written to address prevalent concerns about biases and the misuse of this technology for distributing misinformation.

On ChatGPT. Commentary by Kurt Muehmel, Everyday AI Strategic Advisor at Dataiku

The release of truly transformative technologies like ChatGPT, and now Bard, should be an opportunity for us to reflect on the incredible times we are living through. More specifically, the upcoming release of Bard will be of great interest as it will allow the broader public to gain an appreciation for what is common to all Large Language Models of this generation and what may be specific to GPT-3.5 (the model behind ChatGPT) or LaMDA (the model behind Bard). With OpenAI having opened the floodgates and Google rushing through quickly thereafter, we should expect to see more such releases from different tech companies, small and large, old and new, in the coming months. That being said, like any technology, these Large Language Models are not neutral, and the way in which they are released will tell us a lot about the values and priorities of the different companies releasing them. As ever, it is important to understand the limitations of these technologies so that they can be used appropriately. It is important to ensure human oversight of their use because, as we have seen, despite all of their capabilities, they can be wrong.

On ChatGPT: Harnessing data is key. Commentary by Doug Laney, Innovation Fellow at West Monroe

ChatGPT and other generative AI applications will provide organizations with numerous opportunities to leverage their data in new and innovative ways, from automated customer support and training procedures to content creation, data analysis, and more. The question is how companies should prepare for the litany of issues that will inevitably arise in doing so. For instance, business leaders will have to think through what data they need to redact, mask, and/or synthesize before running it through an AI program. They’ll also need to start considering how various roles in the organization will change as AI replaces or enhances certain functions, among other challenges. One best practice for now? Use generative AI as a co-pilot, not the pilot. Ensure that no client deliverable would fail an AI detector.
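
As one illustration of masking data before it is run through an AI program, the Python sketch below replaces email addresses and US-style phone numbers with placeholders. It is a simplified example, not a complete redaction solution: the two regular expressions and the mask_text function are assumptions made for illustration, and real data would need far broader coverage (names, account numbers, addresses, and so on).

```python
import re

# Simplified, illustrative patterns; they will miss many real-world formats.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def mask_text(text):
    """Replace email addresses and phone numbers with fixed placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


prompt = mask_text("Contact jane.doe@example.com or 555-123-4567.")
# prompt is now "Contact [EMAIL] or [PHONE]."
```

Only the masked string would then be passed along to the generative AI application, with the original kept inside the organization.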

Unlocking Streaming Data’s Power. Commentary by Julia Brouillette, Senior Technologist, Imply

Streaming data used to be niche—but now it’s the new normal. As the top cloud providers have embraced streaming services and over 80% of Fortune 100 companies are now leveraging the popular streaming platform Apache Kafka, new use cases for data streams are in demand for real-time analytics. Data teams across diverse industries are progressively moving away from the batch-oriented stack approach and toward a streaming-native model. This “next evolution” is nothing short of a paradigm shift in streaming, from fixed data to flowing data. The bottom line is that data at rest has been usurped by data in motion. Companies can now analyze events just as they’re created to immediately compare past and present. Reacting to events as they occur facilitates much better decision-making, with data now transmitting seamlessly within and between organizations via data systems and applications. But how can subsecond analytics on streaming data remain scalable? Some evolving technologies are “purpose-built” to support data in motion systems, such as Apache Druid, a real-time analytics database. Stream processors such as Kafka in combination with Druid have led to new analytics applications with nearly limitless scaling. As a combined platform, Kafka plus Druid can ingest millions of events per second and simultaneously juggle hundreds of concurrent analyst queries. When it comes to moving, analyzing, and sharing data, streaming data is nothing short of a force multiplier—and this is just the beginning of a world of new possibilities.

Microsoft’s new AI ChatGPT Bing homepage. Commentary by Manish Sinha, Chief Marketing Officer at STL

Microsoft incorporating OpenAI’s GPT technology into its Bing search engine and homepage will cause a ripple effect. But will they only be ripples or will it turn into a wave that could disrupt Google? The search giant is not going to take it lying down! Only a couple of days ago it introduced Bard. Silicon Valley behemoths like Google and Meta are rightly concerned about the rise of ChatGPT. Google ads accounted for 80% of Alphabet’s overall revenue in 2021, and as ChatGPT’s popularity grows exponentially, it could hit Google’s bottom line. You can’t take anything for granted and habits could change, especially as more and more people start using ChatGPT. You could end up using Bing as your default search engine. Search results will become even more accurate as Bing will be better able to understand the context and intent of user queries. Natural language search and authentic-sounding conversational answers are also bound to be popular. There’s a good chance this development could mark the beginning of the end of the era of SERPs. Instead, it may trigger huge A.I.-based search innovation in voice assistant technology. As with all things technological, those who fail to innovate will likely be left behind.

Comments

Doug Laney says

March 1, 2023 at 9:07 am

Thanks Daniel for the shout. Yes, at West Monroe we’ve been experimenting, prototyping and actually deploying ways to leverage generative AI both internally and with clients, e.g. data monetization workshop prep.

And at the University of Illinois where I teach an MBA class on Infonomics, we’re set up to allow students to use ChatGPT (or other tools) with the caveat that 1) they’ll have to share prompts and threads, and 2) ChatGPT will be fed the rubric to automatically grade their essays.

- Daniel Gutierrez says
  
  March 1, 2023 at 9:31 am
  
  Excellent work, thanks for the clarifications!
