Archives for 2013

Cloudera and WANdisco Partner to Make Hadoop Continuously Available

Cloudera, a major player in enterprise analytic data management powered by Apache™ Hadoop®, and WANdisco, a provider of continuous availability software for global enterprises to meet the challenges of big data, announced that WANdisco’s Non-Stop Hadoop technology is certified to run on Cloudera’s Distribution for Hadoop version 4 (CDH4), providing 100% uptime for global multi-data-center deployments.

Meet the Researcher: Yann LeCun

On Tuesday, Facebook announced it hired machine learning pioneer Yann LeCun to run its newly created artificial intelligence lab. Scooping up one of the biggest names in the field is a major move for the company, but it’s not a surprising one. If anything, Facebook is late to enter the data science arms race that’s underway in Silicon Valley and the country as a whole.

Machine Learning with Ruby

For all you Rubyists out there, here is a great talk from the recent Ruby Conf 2013 that took place in Miami Beach on Nov. 8-10 – Thinking About Machine Learning with Ruby by Bryan Liles. Not sure where to cluster or where to classify? Have you seen a linear regression lately? Ever wanted to […]

The State of Big Data: What the Surveys Say

As a data scientist, I should believe in the value of surveys and other data collection mechanisms. In the case of IT industry surveys, however, I’m not convinced that respondents report their reality accurately while rushing through online surveys. So, taking the results with a grain of salt, I found an intriguing article in Forbes: The State of Big Data: What The Surveys Say.

Demo: DataRPM Natural Language Analytics

“DataRPM is a revolutionary business intelligence and data analytics solution that provides a natural language question answering and search interface to analyze and visualize any data residing anywhere in corporate databases, big data systems, files, applications, 3rd party systems, data warehouses and even other business intelligence tools. Available on the cloud, on premises and embeddable in SaaS/ISV applications.”

Big Data Predictions for 2014

As 2013 draws to a close, it is time for industry players to reflect on what progress was made in 2013 and what we might expect in 2014. Here’s an example of what the big data ecosystem is thinking these days with some observations from Pentaho’s CEO Quentin Gallivan.

Data Science Wars: Python vs. R

As I frequently travel in data science circles, I’m hearing more and more about a new kind of tech war: Python vs. R. I’ve lived through many tech wars in the past, e.g. Windows vs. Linux, iPhone vs. Android, etc., but this tech war seems to have a different flavor to it.

Slidecast: Introducing GridBank 4.0 from Tarmin

“With data growing exponentially, one of the greatest data management challenges is end-to-end protection, governance, discovery and access, no matter the file type, device or social origination,” said Steve Duplessie, founder and senior analyst, Enterprise Strategy Group. “Tarmin’s latest release of GridBank is tailored to meet all the needs of organizations facing massive growth with unstructured data.”

Visualization of the Week: Philanthropy Data

The Foundation Center, a global philanthropy think tank, launched its Foundation Stats data visualization and exploration tool this week, which allows users to generate charts and reports from its large database of foundations and grants.

Researchers Use Machine Learning to Identify Breast Cancer Type

Researchers from the University of Alberta and Alberta Health Services have developed a machine learning classifier algorithm that successfully predicts whether estrogen is sending signals to cancer cells to grow into tumors in the breast.