Welcome back to “Ask a Data Scientist,” our series of articles sponsored by Intel. Once a week you’ll see reader-submitted questions of varying technical depth answered by a practicing data scientist: sometimes by me, and other times by an Intel data scientist. This week’s question is from a reader who asks for an explanation of data leakage.
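One common form of data leakage happens when preprocessing statistics are computed on the full dataset before it is split, so information from the held-out rows leaks into training. The minimal sketch below illustrates this with standardization. It assumes a toy one-column dataset, and the helpers `fit_scaler` and `transform` are hypothetical names written for this illustration, not part of any library.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn (mean, population std) from `values` (hypothetical helper)."""
    return mean(values), pstdev(values)


def transform(values, params):
    """Standardize `values` using previously learned (mean, std)."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


# Toy data (assumption): the last value is the held-out test row.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
train, test = data[:4], data[4:]

# Leaky: the scaler's statistics include the test row.
leaky_params = fit_scaler(data)
# Correct: fit on training rows only, then apply to the test rows.
clean_params = fit_scaler(train)

print("leaky:", transform(test, leaky_params))
print("clean:", transform(test, clean_params))
```

With the leaky fit, the test point is standardized to roughly 2, so it looks unremarkable; fitting on the training rows alone puts it near 87, which is how a model trained on `train` would actually see it.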
MapR Technologies, Inc., provider of a leading distribution for Apache™ Hadoop®, has announced that NTT Comware is using the MapR Distribution including Hadoop to power its new SmartCloud service. Launched earlier this month for customers in Japan, SmartCloud provides Hadoop-as-a-service so that customers can leverage NTT Comware’s big data processing infrastructure running in the cloud.
Q: What is the role of exploratory data analysis in data science?
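A typical first step in exploratory data analysis is a per-column summary that shows how many values are present, how many are missing, and their range. The minimal sketch below assumes records stored as a list of dicts with `None` marking a missing value; `summarize` is a hypothetical helper written for this illustration, not a library function.

```python
from statistics import median


def summarize(rows):
    """Per-column summary of dict records; None marks a missing value."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        observed = [v for v in values if v is not None]
        report[col] = {
            "count": len(observed),
            "missing": len(values) - len(observed),
            "min": min(observed) if observed else None,
            "max": max(observed) if observed else None,
            "median": median(observed) if observed else None,
        }
    return report


# Toy records (assumption) with one missing value per column.
rows = [
    {"age": 30, "income": 50000},
    {"age": None, "income": 62000},
    {"age": 45, "income": None},
]
for column, stats in summarize(rows).items():
    print(column, stats)
```

A summary like this surfaces missing values and suspicious ranges before any modeling decisions are made.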
How do you handle missing data? What imputation techniques do you recommend?
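Two of the simplest imputation techniques replace each missing entry with the mean or the median of the observed values in that column. The minimal sketch below assumes a single numeric column with `None` marking missing entries; `impute` is a hypothetical helper, and more involved approaches (model-based or multiple imputation) are not shown.

```python
from statistics import mean, median


def impute(column, strategy="median"):
    """Fill None entries with the column's observed median or mean."""
    if strategy not in ("mean", "median"):
        raise ValueError("strategy must be 'mean' or 'median'")
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed) if strategy == "median" else mean(observed)
    return [fill if v is None else v for v in column]


print(impute([1, None, 3, 10]))               # median fill
print(impute([2, None, 4], strategy="mean"))  # mean fill
```

The median is less sensitive to outliers than the mean. To avoid leakage, compute the fill value from the training rows only and reuse it on the test rows.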