
Heard on the Street – 8/15/2022

Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!

The Importance of Human Touch in AI Innovation. Commentary by Igor Bergman, VP & GM of Cloud and Software at Lenovo

Today many businesses rely too heavily on technology to reach their goals. Yet automation alone cannot deliver the best customer experience; it requires a human touch. Combining humans and technology allows businesses to improve efficiency and create tailored, engaging experiences for customers. For example, smart AI that pairs voice recognition with an AI back-end can let people communicate in virtual meetings more easily without making manual adjustments. Relying on AI to learn the optimal meeting environment and adjust accordingly is a great use of AI expanding human capabilities, freeing humans to focus on other aspects of the meeting. If the last few years have taught us anything, it’s that we no longer think about work, play and home as separate, but blended. This is where intuitive AI, working in collaboration with human elements, can really excel and help users take their day-to-day experiences to new levels. Solutions and applications should be built to improve the user’s experience across the board, from the application itself, to how the device is set up, to access and security, whether via the cloud or a corporate VPN. AI enables us to do this more effectively. With device diagnostics powered by AI and user data, we are able to better understand how users use their devices and what’s important to them, and to proactively work to resolve issues. IT administrators, gamers, students and individual device owners can have not only an optimized and secure device, with AI automating much of the manual labor, but also a personalized experience, because it’s the human who chooses the best options for their own device. This is another example of how software on the device assists the human to amplify their device experience.

Tapping into data analytics for more productive hybrid meetings. Commentary by Brian Goodman, Director of Product at Poll Everywhere

Meeting norms have undoubtedly changed amid hybrid work. As a result, companies are conducting most, if not all, of their team interactions online. This reality opens a very beneficial door for business leaders, as they’re able to reap the benefits of data analytics stemming from meetings. With meetings now largely facilitated on platforms like Zoom, MS Teams, and Webex, data can be collected during live presentations and meetings via participant feedback – polling, Q&As, open-ended and multiple-choice questions, comments, reactions, etc. From there, vast amounts of functional data revealing employee sentiment and engagement levels can help leaders thrive in this new era of work, where every interaction serves as an opportunity for better listening. By employing their data insights, leaders can improve online engagements and overall decision-making – once again revolutionizing the way we conduct work.

The potential downside of going serverless. Commentary by Alexey Baikov, CTO and Co-founder of Zesty

For some organizations, the upside to serverless is clear: it enables them to move quickly without having to manage the underlying infrastructure. For many others, however, making the switch to serverless may prove not only unnecessary but also challenging from a cost and performance perspective. Limited customization capabilities in serverless PaaS can drastically hinder a company’s ability to meet certain KPIs efficiently, and because these are closed platforms, limited visibility and monitoring make it harder to debug, prevent outages, and perform root cause analysis. Managed incorrectly, this can lead to increased costs. In addition, going serverless tends to become very expensive at scale: the combined runtimes of the many functions required as a company grows tend to be roughly 45% more expensive than running on a traditional on-demand virtual machine. The benefits of serverless computing are certainly vast, but companies must also consider the downsides to ensure they’re approaching it intelligently.
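The roughly-45%-premium claim above can be made concrete with back-of-envelope arithmetic. The hourly rates below are invented for illustration only; they are not real vendor prices.

```python
# Back-of-envelope comparison of monthly compute cost, illustrating why
# serverless can become expensive at scale. All rates are hypothetical.
HOURS_PER_MONTH = 730

def monthly_cost(rate_per_hour: float, hours: float = HOURS_PER_MONTH) -> float:
    """Total cost for one month at a flat hourly rate."""
    return rate_per_hour * hours

vm_rate = 0.10                     # hypothetical on-demand VM, $/hour
serverless_rate = vm_rate * 1.45   # ~45% premium at equivalent runtime

vm_monthly = monthly_cost(vm_rate)               # ≈ $73/month
serverless_monthly = monthly_cost(serverless_rate)
extra = serverless_monthly - vm_monthly          # the cost of the premium
```

At even these toy rates the premium compounds with every always-on function, which is the scaling effect the commentary warns about.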

Open-source large language model, BLOOM, released. Commentary by Marshall Choy, SVP of Product at SambaNova Systems

Large language models (LLMs) are state of the art, but their sheer size is a big obstacle for academic researchers. BLOOM, backed by a significant grant and built in partnership with over a thousand volunteers, is finally making LLMs accessible to academia, allowing researchers to further advance these models. LLMs such as BLOOM represent a new class of technology that fundamentally moves the needle for AI. These foundational models – AI models which are not designed to be task-specific and are trained on a broad set of unlabelled data – have multiple applications across industries. We still have so much to learn about the mathematics of LLMs, and BLOOM presents an opportunity for academics to improve the algorithms. But two years on from the release of GPT-3, the biggest challenge for LLMs is still applying the models to enterprises in real-world scenarios. Bringing together foundational models such as BLOOM with domain-specific knowledge is a game-changer for enterprises.

Data Quality is Paramount to Data Ops and Data Empowerment. Commentary by Heath Thompson, President and GM of Information Systems Management, Quest Software

Data quality has overtaken data security as the top driver of data governance initiatives — with 41% of IT decision makers (ITDMs) noting that their business decision-making relies fundamentally on trustworthy, quality data. The problem? As businesses deal with a massive influx of data in today’s modern world, much of that data is not being used strategically. In fact, 42% of ITDMs said that at least half of their data is currently unused, unmanageable or unfindable. This massive influx of dark data and a lack of data visibility can lead to downstream bottlenecks, impeding the accuracy and effectiveness of operational data. Recent research shows overwhelming agreement that focusing on DataOps is a major key to empowering employees to use data confidently: 9 in 10 ITDMs believe that strengthening DataOps improves data quality, visibility and access issues across their business. Businesses should look to improve DataOps accuracy and efficiency by investing in automated technologies and deploying time-saving tools, such as metadata management. Currently, only 37% of respondents describe their DataOps processes as automated, and a similarly small proportion report having automated data cataloging and mapping today (36% and 35% respectively). That number will need to increase significantly in order to fully maximize data use for both IT and line-of-business needs.

On Snowflake Summit and Databricks. Commentary by Lior Gavish, CTO and co-founder of Monte Carlo

This conference season, one thing was clear: it’s all about collaboration. During Snowflake Summit 2022 in Las Vegas, the Data Cloud provider announced new features to make it easier for developers to build and monetize data applications on top of Snowflake, while Databricks announced their own data marketplace, a new platform for exchanging data products, notebooks, and machine learning models across teams and even companies. As these cloud behemoths continue to roll out new products and services that make it easier for customers to decentralize and share data, we expect the onus on data quality and trust will grow even bigger.

Amid record global heat, utilities turn to satellite and AI tech to prevent wildfires, outages. Commentary by Jeff Pauska, Digital Product Director, Hitachi Energy

From Australia and Europe to North America, record droughts and abrupt changes in climate have created profound operating environment risks for power utilities, increasing their likelihood of sparking wildfires and initiating damaging, widespread outages. Utilities need to proactively manage vegetation growth around infrastructure even more carefully this time of year – despite constrained budgets – to avoid a disaster like the Dixie Fire, the second-largest wildfire in California’s history, which was sparked when power lines came into contact with a tree. As utilities take on this critical task of vegetation management, they are turning to new technology for support. Using deep AI visual analysis and satellite imagery, utilities can automatically analyze vegetation around their overhead lines and take proactive steps toward wildfire and outage prevention. This AI technology, using algorithms trained on thousands of miles of utility asset data, satellite imagery, and validated by point-cloud field captured datasets, automatically identifies vegetation infringements against business action thresholds, and predicts tree growth and off right-of-way hazards – a major risk factor in utility-caused wildfires. By automatically identifying trees and other vegetation at risk of contacting power lines, utilities can prevent wildfires and protect their customers from catastrophe and widespread outages.

How AI is becoming easier and more accessible to everyone. Commentary by Erin LeDell, Chief Machine Learning Scientist, H2O.ai

With more businesses moving towards incorporating AI in their day-to-day operations, one of the biggest challenges to its advancement is that often, organizations don’t have the internal resources or expertise to develop and carry through projects that use AI. This is particularly the case with businesses outside of the technology industry. With demand for AI at an all-time high and these challenges in mind, the biggest scale-related trend is the acceleration of democratizing AI – making it not only available to everyone, but also easy and fast to use, so all companies can get in on the action. This is where open source frameworks and the ability to use low-code and pre-canned proprietary apps are growing in popularity, as they make it easier for any kind of enterprise to build and operate AI-based services in common areas like fraud prevention, anomaly detection and customer churn prediction.

Climate Resilience Analytics Emerges as the Latest Means to Evaluate Threats. Commentary by Toby Kraft, CEO, Teren

The infrastructure industry is no stranger to geospatial data. However, it’s often viewed as a source for specialists rather than decision-makers. The size and complexity of geospatial data have limited its use to GIS and data analysts rather than the enterprise. But that’s all changing rapidly, as decision-makers need access to data that’s not only spatially accurate but timely. Across all infrastructure industries, including oil and gas, renewables, electric, telecommunications, roads and railways, decision-makers need to shift from a focus on risk management to strategic resilience. This requires them to understand not only their assets’ risk, but also the relevant external and environmental threats and how they change over time. The result is an emerging market: Climate Resilience Analytics. Climate resilience analytics go beyond climate risk modeling to inform physical risk mitigation and strengthen resilience. They pinpoint where climate risks threaten assets, prioritize threats, reveal how site conditions can be modified to fortify assets, and monitor and measure progress toward physical resilience over time.

Navigating VC deal flows, looming recession. Commentary by Ray Zhou, Affinity’s co-founder and co-CEO

It is clear that the venture investing world has changed, with the public market slowdown and the closing of the IPO window directly impacting startup valuations. We have yet to see investment volume slow as much as feared, but we expect more of that in the second half. We see that shift on our platform, which shows VCs adding new deals to their pipeline at a 23% slower rate than in 2021 – pointing to a much stricter set of criteria being applied to potential investments. Given that the amount of money available to be called down by VCs is actually increasing, we do not expect this situation to simply lead to better deals for VCs in the second half, but rather to an increased level of competition between VCs for great investments, as they are all applying the same selection criteria. The VC firms that put the effort into founder relationships, understand the reality of their investment criteria, and manage deals well are going to be best positioned to win that competition.

Adapting to new demands for data quality through sustainable data operationalization and end-to-end testing. Commentary by Michael White, Sr. Product Marketing Manager at Tricentis

There is a crisis of information trust, observes distinguished Gartner analyst Ted Friedman, and poor data quality is a major factor at the root of it. With pressure on organizations to adapt to new demands for digital transformation, seamless operation of increasingly complex data pipelines, and more robust data compliance expectations, the risks of bad data are more prevalent now than they were a few years ago. Add to that a renewed emphasis on data governance and data ownership, and organizations are increasingly seeking automation and AI-powered solutions to minimize the risks of tedious scripts, convoluted SQL queries, and manual spreadsheet-based data exercises. After all, a shocking 24% of Enron’s spreadsheets contained errors. Furthermore, Gartner suggests that the lack of a sustainable data and analytics operationalization framework may delay key organizational initiatives by up to two years – an eon in today’s digital economy! While the focus on avoiding past mistakes is good (e.g. designing user interfaces and onboarding data sources), a solid framework for testing the complete data layer – including the API and UI – is critical for a sustainable data framework and operationalized analytics value chain. End-to-end data testing is a more effective approach for ensuring costly data issues are captured at the source(s). In this manner, data discrepancies can be identified, reconciled, and remediated before they rear their heads downstream, causing breakdowns or, sometimes worse, leaking into BI reports and ML models – which can result in poor business decisions and bad predictions for months.
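As a minimal sketch of the capture-at-the-source idea described above (the function names and validation rules are hypothetical, not any vendor’s API), a pipeline can assert data-quality rules on each record as it arrives and quarantine violations before they propagate downstream:

```python
# Hypothetical source-level data testing: validate records as they enter
# the pipeline so discrepancies never reach BI reports or ML models.
# The rules here (order_id present, non-negative amount, known currency)
# are illustrative examples only.

def check_order(order: dict) -> list[str]:
    """Return a list of data-quality violations for one record."""
    errors = []
    if not order.get("order_id"):
        errors.append("missing order_id")
    if order.get("amount") is None or order["amount"] < 0:
        errors.append("amount must be non-negative")
    if order.get("currency") not in {"USD", "EUR", "GBP"}:
        errors.append("unknown currency")
    return errors

def validate_batch(records: list[dict]) -> dict:
    """Partition a batch into clean rows and rows needing reconciliation."""
    clean, quarantined = [], []
    for rec in records:
        (quarantined if check_order(rec) else clean).append(rec)
    return {"clean": clean, "quarantined": quarantined}
```

Quarantined rows can then be reconciled and remediated at the source, which is exactly where the commentary argues issues are cheapest to fix.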

Machine Learning Automates Relevancy at Scale. Commentary by Katie Boschele, Senior Product Manager, Lucidworks

The quality of any digital experience depends on how relevant it is to the person using it. Hundreds of thousands of people may be using a site at any given moment, each expecting a relevant experience that meets their unique needs. It’s impossible to build that manually. Machine learning and other advanced technologies automate relevancy to connect people to the information, products, and services that they need. One of the best examples of this automation in action is semantic vector search, for instance in enhancing the queries in the search bar. Let’s say a contractor is looking for a very specific piece of connective pipe. They type in the product number, but that product is no longer being manufactured. Instead of getting a “No Results” message, semantic vector search understands what they are looking for and relates the query to the newer version of the same product—no manual updating required by the merchandisers on the other end. Machine learning automation saves time and the sale for merchandisers and valued customers alike.
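In spirit, the discontinued-product example works by comparing embeddings rather than exact product numbers: the query’s vector is matched to the nearest product vector. A toy sketch of that idea follows; the product names and embedding values are invented, and a real system would use a trained embedding model and an approximate index.

```python
# Toy semantic vector search: a query embedding is matched to the closest
# catalog embedding by cosine similarity, so a discontinued product number
# resolves to its successor instead of "No Results".
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

catalog = {  # product -> hypothetical embedding from a trained model
    "PVC-CONNECTOR-MK2": [0.9, 0.1, 0.3],
    "COPPER-ELBOW-15MM": [0.1, 0.8, 0.2],
}

def semantic_lookup(query_embedding):
    """Return the catalog product whose embedding is most similar."""
    return max(catalog, key=lambda p: cosine(query_embedding, catalog[p]))
```

A query for the discontinued "PVC-CONNECTOR-MK1" would embed close to the MK2’s vector and resolve to the current product, with no manual redirect maintained by merchandisers.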

Google announces Q2 2022 earnings. Commentary by Amit Sharma, CEO and co-founder, CData Software

Alphabet’s Q2 earnings show that the tech giant isn’t immune to the challenges the market currently faces. Continued transitions to, and reliance on, the cloud have enabled the provider to remain competitive as businesses modernize their tech stacks. Organizations are increasingly shifting to cloud databases to make their data more easily accessible and agile. As organizations prioritize data connectivity amid remote and hybrid work changes, we can expect Google to further home in on capabilities that keep organizations efficient. Data is optimized when it’s secure and also readily accessible across all systems in real time – that’s how businesses uncover the true value of their data.

Insights for a hybrid cloud and AI strategy. Commentary by Qin Li, Solutions Manager at Tamr

What should businesses focus on to adopt hybrid AI successfully and ensure they’re using the right tech? Focus on the business results: start the project with value in mind and select solutions that can get you to value quickly and easily. Maintain lineage: make sure the workflow keeps track of which decisions were made by the machine and which were overridden by humans, so that the machine can learn from them continuously. Nowadays, any machine learning or deep learning project involves tons of data, so solid cloud storage and compute infrastructure are essential. On top of that, distributed systems such as Spark can help with parallel processing of the data, speeding up the process. I would recommend open-source technologies and high interoperability with other technology components.
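The lineage advice above can be sketched concretely. In this minimal, hypothetical example (the class and field names are illustrative, not Tamr’s API), each decision records whether it came from the model or from a human override, so overrides can later be fed back as training signal:

```python
# Minimal decision-lineage sketch for a record-matching workflow: keep the
# machine's original answer alongside any human override, so the model can
# learn from corrections continuously.
from dataclasses import dataclass, field

@dataclass
class Decision:
    record_id: str
    label: str
    source: str                       # "machine" or "human"
    history: list = field(default_factory=list)

    def override(self, new_label: str):
        """A human overrides the current label; the old value is preserved."""
        self.history.append((self.source, self.label))
        self.label, self.source = new_label, "human"

def training_feedback(decisions):
    """Human-overridden decisions the model should retrain on."""
    return [d for d in decisions if d.source == "human" and d.history]
```

The key design choice is that an override never destroys the machine’s original answer, which is what makes the continuous-learning loop possible.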

How the government can achieve data democratization. Commentary by Tomer Shiran, CPO of Dremio

To better manage and protect an ever-growing amount of data, the U.S. government introduced a digital initiative known as the Federal Data Strategy (FDS). The FDS requires the creation and adoption of certain standards concerning data across government agencies, as well as the implementation of structured strategies designed to improve data management, security, and processes throughout the government. The initiative is a great start at better managing and protecting data, but without thorough strategies around processing and analyzing data, U.S. government agencies could be at risk. There is no other option: the only way the government can best utilize top-tier technologies to process and analyze data, without compromising data governance and security, is through an open architecture that doesn’t lock users into a singular vendor. Open architectures are flexible, easy to work with, and allow users to see inside all or parts of the architecture without any proprietary constraints. Vendor-agnostic partners can help keep costs low, give unbiased support, offer customized solutions, and provide an overall simplified process. Two tools that immediately come to mind are Apache Iceberg, a table format that is free from vendor lock-in, and Apache Arrow, a vendor-neutral in-memory format and data exchange protocol. Implementing an open architecture without vendor lock-in, built on tools like Apache Iceberg and Apache Arrow, to process and analyze data is the best approach for the U.S. government in better managing and protecting data agency-wide.

Why we need vector databases now more than ever. Commentary by Charles Xie, CEO of Zilliz

Unstructured data – everything from text and images to audio and more – is filling up the cloud at an ever-growing rate, creating greater demand for robust data management tools. Enter vector databases. Vector databases are being used in an increasingly large number of applications, including but not limited to image search, recommender systems, text understanding, video summarization, drug discovery, and stock market analysis. When combined with powerful machine learning models, vector databases have the capability of revolutionizing semantic search and recommendation systems. With over 80% (and rising) of all saved data being unstructured, vector databases are quickly becoming a go-to solution for utilizing the vast amounts of information that enterprises need to operate. To make the decision even easier, there are several open-source options available when choosing a vector database, which boast the distinct advantage of being community-driven and thoroughly tested in both small and large deployments.
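At its core, the job a vector database does for all of these applications is the same: store embeddings and answer nearest-neighbor queries. The brute-force sketch below is illustrative only; real systems layer approximate indexes (e.g. HNSW), sharding, and metadata filtering on top of this idea, and the class name is invented.

```python
# Back-of-envelope sketch of a vector store's core operation: insert
# embeddings and return the k nearest neighbors of a query vector.
# Brute-force Euclidean distance; real vector databases use ANN indexes.
import heapq
import math

class TinyVectorStore:
    def __init__(self):
        self._items = {}          # item id -> embedding vector

    def insert(self, item_id, vector):
        self._items[item_id] = vector

    def query(self, vector, k=3):
        """Return the k closest item ids by Euclidean distance."""
        def dist(v):
            return math.sqrt(sum((a - b) ** 2 for a, b in zip(vector, v)))
        return heapq.nsmallest(k, self._items, key=lambda i: dist(self._items[i]))
```

Swapping the toy dictionary for a disk-backed, indexed structure is precisely the engineering that dedicated vector databases provide at scale.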

Sign up for the free insideBIGDATA newsletter.

Join us on Twitter: @InsideBigData1 – https://twitter.com/InsideBigData1
