Heard on the Street – 5/10/2022


Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!

Rising Demand for Data Skills. Commentary by Julie Furt, Vice President Global Delivery, Talend

As digitization accelerates, data skills have become a top workplace asset. Whether you’re in IT or on the business side, data is now an integral part of every operation, and teams can’t afford to have a knowledge gap that hinders informed decision making. Data scientists are essential for their ability to parse through data and craft business solutions from it, but the field has been experiencing a critical shortage for years (one unlikely to be fixed in the short term). Reliance on data will continue to escalate, however, and organizations will need to train and support ‘citizen analysts’ if they want to properly manage and leverage their data moving forward. For many organizations, the first step is cultivating a culture of data sharing and mature reporting capabilities. This is only a stepping stone, though; long-term success can only be found with data literacy. This goes beyond tool training to ensure that citizen analysts not only learn how to use a dashboard, but also how to make operational and strategic decisions based on what the reports tell them. Leaders determined to achieve organization-wide data literacy must ensure that each team member understands: (i) What data is relevant to their role; (ii) How to access it – either through specific business intelligence tools or ad hoc; (iii) How to read the data and understand what it is saying in order to make operational and strategic decisions. All of these aspects are trainable and necessary if data-driven cooperation is the ultimate goal.

Bike Safety and Big Data. Commentary by Gabriel McFadden, senior regional sales manager, GRIDSMART Technologies

This year’s Bike Safety Month comes at a pivotal moment for transit agencies across the country. According to the National Highway Traffic Safety Administration, 846 bicyclists lost their lives in traffic crashes in 2019, underscoring the need for greater action to protect these travelers. As AI and data-driven technologies grow in development and sophistication, now is the time to harness their potential in the name of bike safety. Platforms are emerging that leverage enhanced AI capabilities to protect vulnerable road users (VRUs) such as bikers and pedestrians from the onslaught of vehicle traffic. They can even detect the carbon fiber bikes that advanced cyclists use, which was previously not possible. Real-time analytics feed into these platforms, which then adjust clearance times to accommodate VRUs no matter how quickly or slowly they travel. This means longer clearance times for those that need them as well as shorter clearance times when possible, giving back valuable extra seconds to vehicles and optimizing intersection safety and efficiency for all road users.
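To make the clearance-time idea concrete, here is a minimal sketch (in Python) of how a detected road user’s speed could drive an adaptive clearance interval. This is not any vendor’s actual logic; the crossing distance, speeds, and floor/ceiling values are illustrative assumptions.

```python
# Hypothetical sketch: scale an intersection's clearance interval to the speed
# of a detected vulnerable road user, within conservative bounds.

def clearance_time_seconds(crossing_distance_m: float,
                           detected_speed_mps: float,
                           min_clearance_s: float = 3.0,
                           max_clearance_s: float = 15.0) -> float:
    """Return a clearance time long enough for the detected user to finish crossing."""
    if detected_speed_mps <= 0:
        return max_clearance_s  # stopped or unmeasured: be conservative
    needed = crossing_distance_m / detected_speed_mps
    return max(min_clearance_s, min(needed, max_clearance_s))

# A slow cyclist (2 m/s) on a 20 m crossing gets 10 s; a fast one (8 m/s) gets the 3 s floor.
print(clearance_time_seconds(20, 2.0))  # 10.0
print(clearance_time_seconds(20, 8.0))  # 3.0
```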

Creating a True Data Lakehouse. Commentary by Ori Rafael, CEO and co-founder, Upsolver

The “data lakehouse” is a term meant to represent the convergence of the data lake and the data warehouse. The driving principle behind it is an industry mega-trend – breaking down the database monolith into decoupled services for storage (on object stores), metadata (data catalogs) and querying. Local storage has always been the Achilles heel for databases. Database administrators needed to make sure disks didn’t fill up, create multiple data copies, perform backups and so on. Snowflake proved how much easier it is to manage a data warehouse on top of cloud object storage instead of local storage, but their customers still must buy the full stack of data services (storage, metadata, querying) from a single vendor. The data lakehouse concept goes far beyond replacing the data warehouse. Any database, SQL-based or not, will work better with a decoupled architecture. The flexible future we should strive for is composed of centralized storage and metadata layers and multiple fit-for-purpose query APIs (SQL, key-value, text search, ML/AI). In this future, customers own their data store and select the best-of-breed engine for each use case, eliminating vendor lock-in. For the data lakehouse to be more than just a fancy new word for a proprietary data warehouse, it has to truly be open. If the data format or catalog is vendor-specific, then it’s still a data warehouse. If all data services must be purchased from a single vendor, then it’s a data warehouse – nothing new except the marketing.
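A minimal sketch of the decoupled pattern described above: data sits in an open format that any engine can read in place. Here local Parquet files and DuckDB stand in for an object store and one of several possible query engines; the table and column names are made up for illustration.

```python
# "Storage layer": write data in an open, vendor-neutral format (Parquet).
import duckdb
import pyarrow as pa
import pyarrow.parquet as pq

events = pa.table({"user_id": [1, 2, 1], "amount": [9.99, 24.50, 3.25]})
pq.write_table(events, "events.parquet")

# "Query layer": a SQL engine reads the open files directly -- no proprietary
# load step, and a different engine (Spark, Trino, an ML framework) could read
# the very same files for its own use case.
totals = duckdb.sql(
    "SELECT user_id, SUM(amount) AS total FROM 'events.parquet' GROUP BY user_id"
).fetchall()
print(totals)
```

The point of the design is that the storage and metadata stay under the customer’s control, so swapping or adding query engines does not require migrating the data.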

Real-Time Data Needed to Improve Food Supply Chains. Commentary by Saar Yoskovitz, CEO of Augury

A major obstacle to food manufacturers meeting production targets is unexpected equipment failure, which threatens food safety and contributes to supply chain issues. In fact, 62% of plant-level workers report daily or weekly unplanned downtime incidents due to machine failures. One of the reasons machine failures are so frequent is that the food manufacturing industry is behind other industries in real-time data collection. Less than half of food industry manufacturers have the ability to visualize the real-time condition of critical assets across all sites. The biggest asset data collection challenges faced by plant operators are the time it takes to collect the data, the accuracy of the data, and the frequency of measurements. Most food manufacturers have data after a failure occurs and can determine why and how the failure occurred, how to repair or mitigate it, and when to repair it, but they don’t have the real-time data needed to prevent machine breakdowns from occurring in the first place. Implementing Machine Health, which combines the real-time condition-based monitoring of machines with AI-powered diagnostics to generate insights that prevent machine failure, is the solution to many of the food & beverage industry’s current and future problems.
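The sketch below illustrates the general idea of condition-based monitoring in a few lines of Python. It is a toy example, not Augury’s product: it simply flags sensor readings that drift far from a machine’s recent baseline so someone can intervene before a breakdown, rather than diagnosing it afterward. The window size, threshold, and vibration values are arbitrary assumptions.

```python
from collections import deque
from statistics import mean, stdev

class VibrationMonitor:
    """Flag readings that deviate strongly from the rolling baseline (toy example)."""

    def __init__(self, window: int = 50, z_threshold: float = 3.0):
        self.readings = deque(maxlen=window)
        self.z_threshold = z_threshold

    def ingest(self, rms_vibration: float) -> bool:
        """Return True if the new reading looks anomalous versus recent history."""
        anomalous = False
        if len(self.readings) >= 10:  # wait for a minimal baseline
            mu, sigma = mean(self.readings), stdev(self.readings)
            if sigma > 0 and abs(rms_vibration - mu) / sigma > self.z_threshold:
                anomalous = True
        self.readings.append(rms_vibration)
        return anomalous

monitor = VibrationMonitor()
for value in [0.51, 0.49, 0.50, 0.52, 0.48, 0.50, 0.51, 0.49, 0.50, 0.47, 1.90]:
    if monitor.ingest(value):
        print(f"Alert: vibration {value} is far outside this machine's normal range")
```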

Engineering teams deserve more than the three pillars of observability. Commentary by Chronosphere CEO Martin Mao

Observability has been largely defined as a collection of three distinct data types, often known as the three pillars – logs, metrics, and distributed traces. While these are all critical inputs to observability, they are not observability solutions in and of themselves. Effective observability can drive competitive advantage, world-class customer experiences, faster innovation and happier developers. But organizations can’t achieve actionable observability by throwing more and more telemetry data at the problem – they need much more than that to derive maximum value from their data. Instead of focusing on the three inputs of observability, organizations should shift focus to the three phases of observability that create better outcomes – know, triage, and understand. In practice, these three phases improve results by empowering engineers to focus on crucial answers to three key questions: (i) Phase 1 (Know): How quickly – before or after a negative customer or employee experience – am I notified when there is a problem? (ii) Phase 2 (Triage): How easily and quickly can I triage the problem and understand its impact? (iii) Phase 3 (Understand): How do I find the underlying cause so I can fix the problem? Focusing on these three phases, and implementing tools and processes that help engineering teams answer these questions as quickly as possible, will enable teams to achieve the promise of great observability.
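As a rough illustration of the “know” phase, here is a hedged Python sketch of an alert rule evaluated over recent metric samples. The metric name, thresholds, and service names are invented for the example; a real system would also attach dashboard and trace links to feed the triage and understand phases.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    service: str
    p99_latency_ms: float

def evaluate_alert(samples: list[Sample], threshold_ms: float = 500.0,
                   min_breaches: int = 3) -> list[str]:
    """Return services whose p99 latency breached the threshold repeatedly."""
    breaches: dict[str, int] = {}
    for s in samples:
        if s.p99_latency_ms > threshold_ms:
            breaches[s.service] = breaches.get(s.service, 0) + 1
    return [svc for svc, count in breaches.items() if count >= min_breaches]

window = [Sample("checkout", 620), Sample("checkout", 710), Sample("checkout", 540),
          Sample("search", 180), Sample("search", 210)]
for service in evaluate_alert(window):
    # In practice the page would carry context for triage (impact, owners)
    # and links to traces for the understand phase.
    print(f"Page on-call: {service} p99 latency is persistently above its SLO")
```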

Twitter’s Open-Source Algorithm: Transparency Is the Best Policy. Commentary by Cybersecurity expert Derek E. Brink, Vice President & Research Fellow at Aberdeen Strategy & Research

The idea that algorithms should be open and transparent has been considered best practice for nearly 140 years. It’s called Kerckhoffs’s Principle, which holds that trying to keep the algorithms secret – which many refer to as “security by obscurity” – is the wrong approach to maintaining security. Instead, the algorithms themselves should be public knowledge – or, as put by Shannon’s Maxim (another version of the same principle), we should operate under the assumption that “the enemy knows the system.” In cybersecurity, openness and transparency have consistently led to algorithms that are better and more secure, not less. For those who raise the concern that an open, transparent algorithm might be “gamed” to provide some advantage – can we not say the same thing about “closed” algorithms? Everyday examples are abundant: how to make your web pages more likely to be found by search engines; how to raise your credit score; how to minimize the likelihood of an IRS audit on your tax return; how to improve your candidacy on job search sites; and how to optimize your personal profile for dating sites, to name just a few. Openness and transparency about how these algorithms work are the best way to prevent discrimination and corruption – or, as Supreme Court Justice Louis Brandeis put it, “sunlight is the best disinfectant.”
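A small sketch of Kerckhoffs’s Principle in practice, using the open-source `cryptography` package: the encryption recipe is published and widely reviewed, and security rests entirely on keeping the key secret.

```python
# The Fernet scheme (AES-CBC plus HMAC) is an openly specified, public algorithm.
from cryptography.fernet import Fernet

key = Fernet.generate_key()      # the only secret in the system
cipher = Fernet(key)             # the algorithm itself is public knowledge

token = cipher.encrypt(b"the enemy knows the system")
print(cipher.decrypt(token))     # b'the enemy knows the system'

# An attacker who knows every detail of the algorithm but not the key still
# cannot read or forge the token -- "security by obscurity" adds nothing here.
```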

Data is a high-stakes game at a crowded table. Commentary by Lior Gavish, CTO and co-founder, Monte Carlo

In 2022, data workloads for analytics and ML use cases are merging. Three years ago, you couldn’t use Databricks to spin up business intelligence dashboards, and likewise, Snowflake and Redshift couldn’t help you when it came to running data science experiments with Python and Spark. Now it’s all table stakes. With tools like Fivetran and dbt, these solutions are migrating toward ELT (extract from source, load into the warehouse, then transform in place), putting the onus on lakes and warehouses to serve as a source of truth instead of a loading zone for data. Any way you shake it, it signals a market maturity that’s making data more actionable and operational than ever before.
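A hedged sketch of the ELT pattern referenced above (not Fivetran’s or dbt’s actual code): raw records land in the warehouse untouched, then a model is built from them inside the warehouse with SQL. An in-memory SQLite database and the table/view names below are stand-ins for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a cloud warehouse
conn.execute("CREATE TABLE raw_orders (id INTEGER, status TEXT, amount REAL)")

# "EL": land the source data as-is, with no transformation on the way in.
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)",
                 [(1, "complete", 30.0), (2, "returned", 12.0), (3, "complete", 45.0)])

# "T": a dbt-style model derived inside the warehouse from the raw table,
# so the warehouse holds both the source of truth and the curated view.
conn.execute("""
    CREATE VIEW fct_revenue AS
    SELECT COUNT(*) AS orders, SUM(amount) AS revenue
    FROM raw_orders
    WHERE status = 'complete'
""")
print(conn.execute("SELECT * FROM fct_revenue").fetchone())  # (2, 75.0)
```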

No-code Dataflow and Data Strategy to create Successful Data pipelines. Commentary by Thiago Da Costa, Co-Founder and CEO of Toric

Compiling and streamlining data from multiple sources can be a risky proposition, especially in the architecture, engineering, construction and owner industries, given the likelihood of fragmentation, chaos and conflict. Companies in these sectors need a data platform that solves the problem of collecting and utilizing data, so that individuals in architecture and planning, civil engineering, construction, real estate operations and development can integrate, transform, model and visualize data – without writing code. They don’t have the time or desire to learn how to write code, but it’s crucial that they be able to combine existing design, project, and finance data in one place for real-time analysis, insights, and decision making. No-code options allow them to not only abandon spreadsheets, but also streamline and share critical data in the moment. It’s really the great equalizer, connecting the overlaps in this space to ensure all parties are speaking the same language and are, quite literally, on the same page.

How to Ensure Your Company is Ready Before Adopting No-Code/Low-Code Tools. Commentary by Mahendra Alladi, founder and CEO of ACCELQ

No-code/low-code tools play a significant role in accelerating digital transformation across multiple industries. The technology has reached a state of maturity in which it is now possible to imagine software development as a modular effort with business-level abstraction. Additionally, there is a plethora of ecosystem tools across several areas, including test automation, continuous deployment, and production monitoring, bolstering the maturity of low-code/no-code business practices. In short, the market is crowded. Business leaders selecting a tool need a holistic strategy to find a sustainable, low-maintenance option. Here are some pointers to help make the tool selection decision: (i) Ensure the tool is truly no-code/low-code from the bottom up, and not a shallow abstraction suitable only for a limited set of simplified workflows; (ii) Evaluate the complexity of extending the core capabilities offered by the tool. Regardless of how comprehensive the native capabilities may be, you may end up missing a critical piece that requires extensibility; (iii) Because low-code/no-code technology is still evolving, there isn’t yet a well-defined standard interface for external integrations. Be sure to check the maturity of the tool’s external/API interface; any complex business implementation will likely include multiple no-code/low-code tools in the flow; (iv) If you are selecting a low-code/no-code tool in areas such as test automation, it is imperative to choose something that can work with all the leading low-code/no-code platforms; (v) Check the upgrade process, update cycles, and the complexity associated with change management. With disparate release cycles from multiple vendors, the situation can quickly get out of control.

World Password Day. Commentary by Cohesity Chief Information Security Officer Brian Spanswick

With more than 22 billion connected devices online and cyber attacks on the rise, your data has never been at greater risk. On World Password Day, it’s critical that IT managers, SecOps personnel, and, for that matter, all business workers remember to prioritize password hygiene today and year-round. Using a password manager is an effective way to ensure secure passwords, and taking steps to choose a unique password that’s regularly updated and varies from device to device can mean the difference between a normal day and a devastating data breach – one where you not only potentially expose your data, but put your company at risk as well.
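As a small illustration of the “unique password per device or service” advice, the sketch below generates high-entropy passwords with Python’s standard `secrets` module. A password manager automates this same job (and stores the results), so this is an example of the principle, not a replacement for one; the account names and length are arbitrary.

```python
import secrets
import string

def generate_password(length: int = 20) -> str:
    """Return a cryptographically random password of the given length."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

# One distinct credential per account, never reused across devices or services.
for account in ("laptop", "vpn", "cloud-console"):
    print(account, generate_password())
```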

