Supercharge Data Science Applications with the Intel® Distribution for Python

Sponsored Post

The Python language plays a prominent role in almost every data scientist’s workflow. There are countless easy-to-use Python data science packages, ranging from exploratory data analysis (EDA) and visualization, to machine learning, to AutoML platforms that enable rapid iteration over data and models. Python is integral to many high-profile use cases such as facial recognition, sentiment analysis, fraud detection, brain tumor classification, and much more.

Intel® Distribution for Python Accelerates Data Science Workflows

Intel® Distribution for Python is a distribution of commonly used packages for computation- and data-intensive domains, such as scientific and engineering computing, big data, and data science. This performance-oriented distribution lets you speed up Python applications and the core computational packages they rely on. Those who stand to benefit include machine learning developers, data scientists, numerical and scientific computing developers, and HPC developers.

Intel’s accelerated Python packages let data scientists keep the ease of use and productivity of Python while taking advantage of the ever-increasing performance of modern hardware. Intel’s optimized implementation of scikit-learn (leveraging Intel® Data Analytics Acceleration Library), as well as its optimized implementations of TensorFlow and Caffe (leveraging Intel® MKL-DNN), achieve highly efficient data layout, cache blocking, multithreading, and vectorization. Intel’s optimized implementations of NumPy and SciPy provide drop-in performance enhancements to the broad range of statistics, mathematical optimization, and other data-centric computations already built on top of NumPy and SciPy. In addition, Intel now provides daal4py, which combines the API simplicity familiar to scikit-learn users with automatic scaling over multiple compute nodes. This rich feature set helps data scientists deliver better predictions faster and enables analysis of larger data sets with the same compute and memory resources.

Using Intel® Distribution for Python, data scientists and data engineers are able to:

  • Achieve faster Python application performance with minimal or no changes to their code.
  • Accelerate NumPy, SciPy, and scikit-learn with integrated Intel® Performance Libraries such as Intel® Math Kernel Library and Intel® Data Analytics Acceleration Library.
  • Access the latest vectorization and multithreading instructions, Numba and Cython, and composable parallelism with Threading Building Blocks.

Performance improvements include faster machine learning, with key scikit-learn algorithms accelerated by Intel® Data Analytics Acceleration Library; the latest TensorFlow and Caffe libraries optimized for Intel® architecture; and the XGBoost package, included in the Intel® Distribution for Python (Linux* only). Intel® Distribution for Python is included in the company’s flagship product, Intel® Parallel Studio XE.

Close-to-Native Code Performance

Intel® Distribution for Python incorporates multiple libraries and techniques to bridge the performance gap between Python and equivalent functions written in C and C++ languages, including:

  • Intel® Math Kernel Library (Intel® MKL) for BLAS and LAPACK
  • Intel® MKL vector math library for universal functions (uMath)
  • Intel® Data Analytics Acceleration Library (Intel® DAAL) for machine learning and data analytics
  • Integration with Intel® Advanced Vector Extensions (Intel® AVX), a feature of the latest Intel® Xeon® processors
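As a rough illustration of where these libraries come into play, the sketch below uses ordinary NumPy calls that map onto the categories listed above: a matrix product (BLAS), a linear solve (LAPACK), and an elementwise universal function. Nothing in it is specific to Intel® Distribution for Python. The assumption is that the same script runs unchanged on any NumPy build and that only the backing library differs. The array size and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only

n = 200  # illustrative size; real workloads are usually much larger
# The diagonal shift keeps the linear system well conditioned.
a = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)

product = a @ a.T             # matrix multiply, handled by the BLAS NumPy is linked to
x = np.linalg.solve(a, b)     # linear solve, handled by a LAPACK routine
decayed = np.exp(-np.abs(b))  # elementwise universal functions (ufuncs)

residual = np.linalg.norm(a @ x - b)
print(f"solve residual: {residual:.2e}")

# Prints build information, including the BLAS/LAPACK libraries in use.
np.show_config()
```

Running the script under two different Python environments and comparing the `np.show_config()` output is one simple way to see which math library each build relies on.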

A series of benchmarks was performed to show the efficiency of optimized functions in the following areas: linear algebra, Fast Fourier Transforms (FFT), uMath, machine learning, composable parallelism, Amazon Elastic Compute Cloud, and the Black-Scholes formula. The benchmarks also compare Intel® Distribution for Python with the corresponding open source Python packages. They measure Python against equivalent native C code, which is considered representative of optimal performance. The higher the efficiency, the faster the function and the closer it is to native C speed.
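The text above does not spell out how efficiency is computed. One plausible reading, assumed here, is the ratio of native C run time to Python run time for the same workload, so that 1.0 means parity with native code. The sketch below times a pure-Python Black-Scholes call-price kernel using only the standard library and applies that ratio. The value of `native_seconds` is a hypothetical placeholder, not a measurement, and all function names are illustrative.

```python
import math
import time


def norm_cdf(z):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, vol, years):
    """Price of a European call option under the Black-Scholes formula."""
    sigma_root_t = vol * math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * years) / sigma_root_t
    d2 = d1 - sigma_root_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


def best_time(fn, repeats=5):
    """Return the fastest wall-clock time over `repeats` calls to fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def efficiency(native_seconds, python_seconds):
    """Assumed definition: native time divided by Python time (1.0 = parity)."""
    return native_seconds / python_seconds


def workload():
    # Price 10,000 options with varying strikes; the sizes are arbitrary.
    return [
        black_scholes_call(100.0, 80.0 + 0.004 * i, 0.05, 0.2, 1.0)
        for i in range(10_000)
    ]


python_seconds = best_time(workload)
native_seconds = 0.001  # hypothetical placeholder for a measured C implementation
print(f"efficiency: {efficiency(native_seconds, python_seconds):.1%}")
```

Taking the minimum over several repeats is a common way to reduce timing noise. A real comparison would replace the placeholder with the measured time of an equivalent C program on the same hardware.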

Summary

The following groups can benefit from the Intel® Distribution for Python:

  • Machine learning developers, data scientists, and analysts – easily implement performance-packed, production-ready scikit-learn algorithms.
  • Numerical and scientific computing developers – accelerate and scale the compute-intensive Python packages NumPy, SciPy, and mpi4py.
  • High-performance computing (HPC) developers – unlock the power of modern hardware to accelerate your Python applications.

Intel® Distribution for Python is a free software package available for Windows, Linux, and macOS. Each OS option comes with specialized packages for accelerated workflows and advanced functionality. Download immediately HERE.
