Ask a Data Scientist: Curse of Dimensionality


Welcome back to our series of articles sponsored by Intel, “Ask a Data Scientist.” Once a week you’ll see reader-submitted questions of varying levels of technical detail answered by a practicing data scientist, sometimes by me and other times by an Intel data scientist. Think of this insideBIGDATA feature as a resource for getting up to speed in this flourishing area of technology. If you have a big data question you’d like answered, please enter a comment below or send an e-mail to me at: daniel@insidehpc.com. This week’s question is from a reader who wants to know more about the “curse of dimensionality.”

Q: What is the “curse of dimensionality?”

A: Richard Bellman* is traditionally credited with coining the term “Curse of Dimensionality” in his work on dynamic programming in the late 1950s and early 1960s. Today the term describes the problems that arise in machine learning and big data when analyzing high-dimensional data. To better understand its meaning, it is useful to break the term into its two components. “Dimensionality” refers to the number of dimensions in a data set, that is, the number of predictors (also known as feature variables). “Curse” refers to the difficulties that arise in analytics as the number of feature variables increases. The volume of the data space grows exponentially with each added feature, so a fixed number of observations becomes increasingly sparse. The “Curse of Dimensionality” is therefore the exponentially increasing difficulty a data scientist encounters in finding discernible patterns in the data, or global optima in the parameter space when fitting models, as the number of feature variables increases.
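
To make the exponential growth concrete, here is a minimal Python sketch. It is only an illustration under simple assumptions: features are scaled to the range 0 to 1, data points are drawn uniformly at random, and the function names (cells_needed, edge_for_fraction, distance_contrast) are my own, not part of any library.

```python
import math
import random


def cells_needed(bins_per_axis: int, n_features: int) -> int:
    """Grid cells produced when every feature is split into bins_per_axis bins."""
    return bins_per_axis ** n_features


def edge_for_fraction(fraction: float, n_features: int) -> float:
    """Edge length of a sub-cube that holds `fraction` of a unit hypercube's volume."""
    return fraction ** (1.0 / n_features)


def distance_contrast(n_points: int, n_features: int, seed: int = 0) -> float:
    """(farthest - nearest) / nearest distance from one point to all the others.

    Points are uniform random in the unit hypercube (an assumption made
    purely for illustration). Small values mean that "near" and "far"
    neighbors are hard to tell apart.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(n_features)] for _ in range(n_points)]
    query = points[0]
    dists = [math.dist(query, p) for p in points[1:]]
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    for d in (1, 2, 3, 10, 100):
        print(
            f"features={d:>3}  "
            f"cells={cells_needed(10, d):.3g}  "
            f"edge for 10% of volume={edge_for_fraction(0.10, d):.3f}  "
            f"distance contrast={distance_contrast(500, d):.2f}"
        )
```

With 10 bins per feature, 3 features give 1,000 cells to fill with data, while 10 features give 10 billion. Likewise, a sub-cube that captures just 10% of the volume in 10 dimensions must span about 79% of the range of every feature, so "local" neighborhoods stop being local.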

There are many techniques for dealing with the Curse of Dimensionality. You can use feature selection, often carried out as part of feature engineering, to search through the available predictors and keep those that are most important in predicting the desired response variable. You can also use Principal Components Analysis (PCA) to reduce the dimensionality of the data while retaining as much of the variation as possible.
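
As a rough illustration of the PCA idea, the sketch below finds the first principal component of a tiny made-up data set using only the Python standard library. The data values and helper names (center, covariance, top_component) are hypothetical, and power iteration is used here simply because it is short to write; in practice you would more likely reach for an established library implementation.

```python
import math


def center(rows):
    """Subtract each column's mean so every feature has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_component(cov, iterations=500):
    """Leading eigenvector and eigenvalue of cov via power iteration.

    Assumes cov is non-zero and the starting vector is not orthogonal
    to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(a * b for a, b in zip(v, cov_v))
    return v, eigenvalue


# Made-up data: the second feature is roughly twice the first,
# and the third feature barely varies.
rows = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.4],
    [3.0, 6.2, 0.6],
    [4.0, 7.8, 0.5],
    [5.0, 10.1, 0.4],
]

centered = center(rows)
cov = covariance(centered)
component, eigenvalue = top_component(cov)

# Share of the total variance captured by the first component.
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = eigenvalue / total_variance

# Project each 3-feature row onto the component: 3 dimensions become 1.
scores = [sum(x * c for x, c in zip(row, component)) for row in centered]

print("first component:", [round(c, 3) for c in component])
print("explained variance ratio:", round(explained, 4))
print("1-D scores:", [round(s, 2) for s in scores])
```

Because the first two features move together and the third is nearly constant, a single component captures almost all of the variation, so the three original features can be replaced by one score per row with little loss.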

* Bellman, R.E. Adaptive Control Processes: A Guided Tour. Princeton University Press, Princeton, NJ, 1961.


Data Scientist: Daniel D. Gutierrez – Managing Editor, insideBIGDATA

 

 

 
