Search Results for: data science

“Above the Trend Line” – Your Industry Rumor Central for 12/8/2022

Above the Trend Line: your industry rumor central is a recurring feature of insideBIGDATA. In this column, we present a variety of short time-critical news items grouped by category such as M&A activity, people movements, funding news, financial results, industry alignments, customer wins, rumors and general scuttlebutt floating around the big data, data science and machine learning industries including behind-the-scenes anecdotes and curious buzz.

Hypothesis-led data exploration is failing you …

In this special guest feature, Aakash Indurkhya, Co-Head of AI at Virtualitics, suggests that you should set your assumptions aside and start looking at your data through the lens of AI. Cut through the noise, surface significant insight, and take aim at the real issues. Forget data as oil: data is gold, and Intelligent Exploration is the sophisticated tool that’s going to help you get at it.

2023 Trends in Data Governance 

In this contributed article, editorial consultant Jelani Harper offers his perspectives around 2023 trends for data governance. The valuation of data governance, both to the enterprise and to data management as a whole, is evinced in two of the most discernable trends to shape this discipline in 2023.

What to Avoid When Solving Multilabel Classification Problems

In this contributed article, April Miller, a senior IT and cybersecurity writer for ReHack Magazine, suggests that if you are working on a multilabel classification problem, you will likely run into something in need of fixing. Here are a few common issues you may encounter and what to avoid when solving them.

Research Highlights: R&R: Metric-guided Adversarial Sentence Generation

Large language models are a hot topic in AI research right now. But there’s a hotter, more significant problem looming: we might run out of data to train them on … as early as 2026. Kalyan Veeramachaneni and the team at MIT Data-to-AI Lab may have found the solution. In their new paper on Rewrite and Rollback (“R&R: Metric-guided Adversarial Sentence Generation”), they describe an R&R framework that can turn low-quality data (from sources like Twitter and 4Chan) into high-quality data (texts from sources like Wikipedia and industry websites) by rewriting meaningful sentences, thereby adding to the amount of the right type of data available to test and train language models.

The Key Role Missing in Most Data Science Teams

In this contributed article, Wendy Lynch, Founder of Analytic-Translator.com, shares her experience of working with small to large global clients on how to break down the communication barriers in an organization to deliver results. These barriers often arise between the analyst teams and the business teams.

Stop Building Models, Start Training Data

In this special guest feature, Sanjay Pichaiah, VP of Product Growth at Akridata, highlights why it is time for data scientists to stop building models and start training data. The path to better models and greater model accuracy doesn’t lie exclusively with the model, even though that has been the greatest focus in recent years. To truly accelerate and increase model performance, we need to be focusing more on the training data sets we are supplying the models and stop hoping the data is good enough.

Heard on the Street – 11/29/2022

Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace.

2023 Trends in Artificial Intelligence and Machine Learning: Generative AI Unfolds  

In this contributed article, editorial consultant Jelani Harper offers his perspectives around 2023 trends for the boundless potential of generative Artificial Intelligence—the variety of predominantly advanced machine learning that analyzes content to produce strikingly similar new content.

Chung-Ang University Researchers Develop Algorithm for Optimal Decision Making under Heavy-tailed Noisy Rewards

Researchers from South Korea’s Chung-Ang University propose methods that theoretically guarantee minimal loss in worst-case scenarios, using minimal prior information, for heavy-tailed reward distributions.