“Above the Trend Line” – Your Industry Rumor Central for 2/4/2020

Above the Trend Line: your industry rumor central is a recurring feature of insideBIGDATA. In this column, we present a variety of short time-critical news items grouped by category such as M&A activity, people movements, funding news, industry partnerships, customer wins, rumors and general scuttlebutt floating around the big data, data science and machine learning industries including behind-the-scenes anecdotes and curious buzz. Our intent is to provide you a one-stop source of late-breaking news to help you keep abreast of this fast-paced ecosystem. We’re working hard on your behalf with our extensive vendor network to give you all the latest happenings. Heard of something yourself? Tell us! Just e-mail me at: daniel@insidebigdata.com. Be sure to Tweet Above the Trend Line articles using the hashtag: #abovethetrendline.

It’s the calm before the storm (when the big data conference season begins) so I’m catching up with some long-overdue projects. I have even more time on my hands after my special program at UCLA Extension, “Introduction to Data Science,” for a group of 30 Chinese scholars, which was scheduled to start this week, was canceled due to travel restrictions on account of the coronavirus! Data science will have to take a back seat for now. In the meantime, we learned of some new funding news … Kaskada, a machine learning company that provides a unified platform for feature engineering, announced it completed a Series A funding round totaling $8 million. The investors include, among others, Voyager Capital, NextGen Venture Partners, Founders’ Co-op, and Walnut Street Capital Fund. The total amount raised to date by Kaskada is $9.8 million. Kaskada helps organizations make better predictions and increase the speed of innovation by integrating data science and data engineering workflows. Kaskada delivers an end-to-end platform for feature engineering and feature serving, including a collaborative interface for data scientists and robust data infrastructure for computing, storing, and serving features in production. Features are the independent variables that go into machine learning algorithms and are among the most important factors in a machine learning project’s success. Most companies manage feature engineering and feature serving as a non-collaborative, inefficient process. Data scientists design features and data engineers must rewrite the features before they are deployed. This process slows innovation and increases the potential for error. Kaskada provides a unified platform for data scientists and data engineers to share features and eliminate inefficiencies, allowing teams to operate as high-functioning data science factories … Directly, a leader in customer experience automation, announced a $20M strategic investment that will significantly expand the company’s telecommunications practice. New investors in the round include Samsung NEXT, Industry Ventures and AvidBank. Existing investors Microsoft’s M12 Ventures, True Ventures, Costanoa Ventures and Northgate also took part in the round. Directly’s expert platform simplifies the complex task of making virtual agents work. Companies can use the Directly platform to understand the thousands of things their customers want, provide automated answers or actions to customer questions, and tap community experts to help the customer when the virtual agent can’t. The round was led by Industry Ventures, a previous investor in Figure Eight, which was acquired by Appen in 2019.

We also learned of a new partnership … Yellowbrick Data, the modern analytical data warehouse built for the hybrid cloud, announced a technology partnership with MicroStrategy® Incorporated (Nasdaq: MSTR), a global provider of enterprise analytics software and services, that provides the market with a new integration of a Yellowbrick Data warehouse and MicroStrategy 2020™, MicroStrategy’s flagship enterprise analytics platform. The combination of Yellowbrick Data’s modern analytical database and MicroStrategy 2020 is designed to bring actionable insights to the workforce, while enabling significantly faster performance for complex queries and support for high user concurrency.

In people movement news we heard … Information Builders (IBI), a leading data and analytics company, announced the appointment of Carol McNerney as chief marketing officer (CMO). Reporting to CEO Frank J. Vella, McNerney will drive the organization’s marketing and communications strategy. With nearly 30 years of experience, McNerney will focus on driving greater brand awareness, scaling a global demand-generation organization, and mobilizing a community of passionate customers and partners that rely on the IBI platform. Because IBI’s clients range from Fortune 100 enterprises and federal agencies to innovative data-driven companies of every size in every major industry, McNerney has the opportunity to make the IBI community a competitive advantage for the company.

2020 Trends/2019 Year-in-Review

“Businesses have been working to break through the logjam of AI projects that have been back-burnered in the face of machine learning skills shortages,” commented Stradigi AI’s CCO (Chief Commercial Officer) Per Nyberg. “However, we’re seeing the real world reach of AI expand with more companies looking at ways to foster collaboration, gain economies of scale and accelerate their AI paths from concept to production with maturing tools. AI is no longer for the small minority of machine learning experts and data scientists. With data at their core, business analysts are also eager for a slice of the pie. With AI and ML tools at their disposal, the skills of business analysts are expanding towards data science to explore insights from more diverse and richer data sets through the use of machine learning. Technology and automated machine learning techniques will begin shifting the use of data and AI to a greater proportion of a company’s business analysts. The demand for these skills is also starting to shape higher-ed curriculums to contend with this new wave of expectations.”

“AI is already used by many retailers for functions such as customer service (through chatbots), product recommendations, and targeted advertising,” commented Julien Gautier, Marketing Director at ActiveViam. “The reach of AI technologies will expand in 2020 to help retailers stay competitive by recommending markdown optimizations and more accurate promotions and prices for price managers to implement. Optimizing operations in logistics, merchandising, sourcing, etc., will be accomplished as fulfillment managers learn from patterns derived from their own data.”

“The field of AI will come to terms with its age and success,” commented Richard Socher, Chief Scientist, Salesforce Einstein. “Questions around the ethics of AI are not new, but 2020 will be the year of reckoning as the industry builds out the best practices and regulations required for ensuring that AI works in the best interest of people.”

“The demand for AIOps in the enterprise will continue to rise as AI and machine learning have taken the industry by the jugular,” commented Ram Chakravarti, Chief Technology Officer, BMC. “Due to an expansion in the number of workloads – both in public cloud and on-premises – and an increase in application complexity, investment in AIOps will increase and ultimately lead to better business outcomes. Today’s challenges place a premium on differentiated vendor solutions powered by AI/ML and big data analytics techniques that can help modern IT operations evolve from traditional monitoring to observability to actionability.”

“Cloud data warehouses turn out to be a big data detour,” commented Tomer Shiran, co-founder and CEO of Dremio. “Given the tremendous cost and complexity associated with traditional on-premise data warehouses, it wasn’t surprising that a new generation of cloud-native enterprise data warehouses emerged. But savvy enterprises have figured out that cloud data warehouses are just a better implementation of a legacy architecture, and so they’re avoiding the detour and moving directly to a next-generation architecture built around cloud data lakes. In this new architecture data doesn’t get moved or copied, there is no data warehouse and no associated ETL, cubes, or other workarounds. We predict 75 percent of the global 2000 will be in production or in pilot with a cloud data lake in 2020, using multiple best-of-breed engines for different use cases across data science, data pipelines, BI, and interactive/ad-hoc analysis.”

“Graph++ will grow in 2020. We will see strong growth in graph use-cases and the graph ecosystem,” commented Joerg Schad, Head of Engineering and Machine Learning, ArangoDB. “This is due to the recent trend towards graph in processing and storage. We expect massive growth of graph-related developments in machine learning. A few specific areas include the following: (i) Knowledge Graphs: Knowledge graphs have been a powerful tool to represent knowledge by relationships between different entities. Combined with machine learning, we learn/extract new knowledge from the knowledge graph (and for example grow the knowledge graph itself); (ii) Graph Neural Networks: The new stars in machine learning (deep neural networks) expect basically vectors as input, while graphs are expressed as nodes and edges. Lots of current research and industry use cases trend similar to how we developed neural networks for dealing with graphs, natural language, and voice; (iii) AI-based DB and Multi-Model DBs: Even while we typically associate machine learning with frameworks such as TensorFlow, PyTorch or MXNet, data scientists dedicate most of their time to prepping data. A database, especially a graph or multimodel database supporting graph queries, document queries, and text retrieval can be a very powerful tool here; and (iv) Metadata and Production Grade ML Infrastructure: Machine learning is moving more and more into production scenarios, where metadata is equally important as good training data. As such, metadata representing a multi-stage machine learning pipeline can be naturally modeled as a graph connecting different documents of metadata.”
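To make point (iv) concrete, here is a minimal Python sketch of pipeline metadata modeled as a graph of metadata documents. The stage names, the metadata fields, and the `upstream` helper are all hypothetical illustrations, not taken from ArangoDB or any other product.

```python
from collections import deque

# Hypothetical metadata documents, one per pipeline stage (illustrative values).
metadata = {
    "raw_data": {"kind": "dataset"},
    "features": {"kind": "feature_set"},
    "model": {"kind": "model"},
    "evaluation": {"kind": "report"},
}

# Directed edges: each stage points to the stages that consume its output.
edges = {
    "raw_data": ["features"],
    "features": ["model"],
    "model": ["evaluation"],
    "evaluation": [],
}


def upstream(node, edges):
    """Return every stage that `node` depends on, nearest first."""
    # Invert the edge map so we can walk from a stage back to its inputs.
    parents = {}
    for src, dsts in edges.items():
        for dst in dsts:
            parents.setdefault(dst, []).append(src)

    seen = []
    queue = deque(parents.get(node, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.append(current)
            queue.extend(parents.get(current, []))
    return seen


# Lineage question: which metadata documents led to the evaluation report?
lineage = upstream("evaluation", edges)
```

With this layout, a lineage question such as "what produced this evaluation report?" becomes a graph traversal rather than a series of lookups across separate metadata stores.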

“Artificial intelligence (AI) workloads will continue to generate business value in 2020,” commented Stanley Zaffos, SVP of Infinidat. “But, for organizations to increase their reliance on AI, storage vendors will need to make it easier for AI applications to access more data faster, in turn helping the systems learn faster and unlock the value of the data. As we enter 2020, data sets are getting bigger and demands for instantaneous decision making are becoming more prevalent. This puts stress on the training systems. Expect more demand for smarter storage systems to match the escalating intelligence of the applications themselves. We’ll see more investments in tools like software-defined switches to open up more pathways for hardcore analytics; QoS functions to dole out information more strategically; scale-out system architectures; and the ability to deliver data with lower latency.”

“It’s difficult to put numbers on these things, but historically, corporate use of machine learning has faced three main challenges: finding the right people with the right skills; funding and housing the necessary compute power; and getting the necessary data to use ML effectively,” commented Saif Ahmed, Product Owner Machine Learning, Kinetica. “Today, these challenges have largely been addressed. First, there are so many ways to learn the skills needed to work with AI and ML that it’s hardly a specialty at most companies. Google and the other big tech companies may still have access to the most cutting edge data researchers, but for the rest of us, it’s not an impossible task to find employees with the skills in the right wheelhouse. Secondly, every year the amount of computing power you get for the same dollar goes up, while the amount you need for a successful ML project goes down. This means funding and housing the necessary computing power is much more feasible. Lastly, data is being recognized across industries as a crucial business asset. There are entire startup ecosystems built around data (data cleaning, data capture, data-cleaning-as-a-service); it’s not an obscure resource anymore. Data today is the most valuable business asset because everything runs on data. From the smartphone in your pocket to the retailer you bought it from, the fourth Industrial Revolution is being built by organizations that recognize the versatility of data to not only provide historical insights but to also inform business decisions in real-time.”

“The ability to harness the power of data will accelerate disruption across the economy and create winners and losers more quickly than in the past,” commented Infoworks CEO, Buno Pati. “New challengers will rise faster than seen before in this next decade and incumbent leaders will fall just as fast. Research from BCG shows that for large companies, there is now less correlation between past and future financial and competitive performance over multiple years. Data scientists across all industries currently spend about 80% of their time on lower-value activity such as ingesting data, incrementally updating data, organizing and managing data, optimizing pipelines and delivering data to applications. The cost: only 20% of data scientists’ time is spent on developing applications to further growth and competitive advantage for business. Those who truly harness the power of data via new, automated approaches to data operations and orchestration will thrive, as this will enable them to focus their data science talent on creating business value. The impact of digital transformation will be felt across all segments of the economy – in expected (technology, financial services, retail/etail, etc.) and unexpected places (agriculture, home improvement, public sector, etc.).”

“Machine learning with models has reached a turning point, with companies of all sizes and at all stages moving towards operationalizing their model training efforts,” commented Alluxio’s Founder, Chairman and CTO Haoyuan (H.Y.) Li. “While there are several popular frameworks for model training, a leading technology hasn’t yet emerged. Just like Apache Spark is considered a leader for data transformation jobs and Presto is emerging as the leading tech for interactive querying, 2020 will be the year we’ll see a frontrunner dominate the broader model training space with PyTorch or TensorFlow as leading contenders.”

“This year, I expect that an increasing number of companies will prioritize building a data-driven organization, but the challenge of becoming truly data-driven will continue,” commented Aaron Kalb, co-founder of Alation. “To date, a majority of companies have struggled to see past the massive amounts of data they have to actually build a data-driven organization. According to research, only 38 percent of companies have created a data-driven organization, and 91 percent of companies cite people and process challenges as the biggest barriers to becoming data-driven. One big shift we will see this year will be at an organizational level: with greater availability and understanding of data in companies, org charts and job descriptions will change. For example, many “strategy” teams could become analytics teams and we could see more quantitative roles and opportunities in functions like marketing. We can also expect to see the “data science pendulum” start swinging back: Recently, machine learning experts have been focused on optimizing their models to be as predictive as possible using inscrutable algorithms like deep learning; but as infamous mistakes have been publicized and ethical issues have been raised, I anticipate we will start to see renewed interest in algorithms that can “explain” their classifications, and business processes that creatively combine human and machine input, rather than delegating fully to computers. Lastly, the cloud will get complicated. Companies will increasingly look to firms like Amazon, Google and Microsoft to host their services and move more and more of their business to the cloud. But simultaneously there will be regulatory pressure to segment and separate customers’ data sets, and different departmental initiatives splitting organizations’ checks across multiple clouds (despite the efforts from each to be the one-stop-shop).”

“Data driven is so last year!” commented Jean-Luc Chatelain, Managing Director & Chief Technology Officer at Accenture Applied Intelligence. “In 2020 businesses need to be data & AI powered. Enterprises must leverage all trusted data to discover actionable insights to inform their business processes, create new customer experiences and achieve transformational outcomes.”

“In the year ahead, small footprint approaches to ASR continue to be of interest in the field and seem to be slowly improving, allowing more obscure languages to be created with less available data,” commented Alex Fleming, Product Marketing Manager at UK-based Speechmatics. “Voice devices will become more multilingual so they can be deployed in more countries and handle multiple accents and dialects. In addition, the expectation is that these voice services can do more and in real-time. For example, someone speaks Chinese, and the mobile phone not only recognizes and transcribes the speech but translates it and sends English to your earphones. WER will continue to be a major metric and progress will probably continue to be made by throwing more and more data at the problem with growing networks to compute. There may not be many benefits for end users as they are too large and potentially slow to be practical. For this reason, I believe ASR will move past the ‘more data is all you need to improve accuracy’ argument. Expectations will start to plateau at something like 95% accurate and achieving the last 5% will be either not important for some markets or will require a really deep understanding of the real world to solve.”
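For readers unfamiliar with the metric, WER (word error rate) is conventionally the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The Python sketch below is one minimal way to compute it; the function name and example sentences are illustrative assumptions, and production scoring tools typically also normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over six words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Under this definition, a figure such as "95% accurate" corresponds roughly to a WER of 5%, although WER can exceed 100% when the output contains many inserted words.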

“Machine learning will drastically impact every industry over the next 10 years,” commented Atrium co-founder and VP of Data Science, Eric Looftsgaarden. “Machine learning capability is directly related to data quality, but data quality is an issue everywhere. As we look to 2020 and the decade ahead, companies that want to get ahead and truly capitalize on the value of data will have to develop techniques that can better accommodate low-quality data. That being said, there will be an “arms race” for businesses to collect more and better quality data than competitors in 2020, while looking for ways to generate unique data that others do not have. Data will be the primary value generation engine across industries. In 2020, we will see data replace other forms of intellectual property as the driver of business value and acquisition value of organizations. The companies that are innovative in collecting and leveraging data their competitors don’t have will be able to differentiate themselves and lead their markets.”
