To help our audience leverage the power of machine learning, the editors of insideBIGDATA have created this weekly article series called “The insideBIGDATA Guide to Machine Learning.” This is our seventh installment, “Production Deployment with R.”
The final step in completing a machine learning project with R is to determine how best to deploy the solution to a production environment. Deploying open source R can be problematic for some or all of the following reasons:
- In-memory bound – need hybrid memory and disk scalability
- Single threaded – need parallel threading
- Packages not optimized for big data
- Risk of deploying open source with GPL license
To avoid these issues, data scientists often opt to convert their working R solution to a different programming environment like Python. This path, however, is far from optimal since it requires redevelopment and significant retesting.
Commercial products like Revolution R Enterprise (RRE) offer a much more robust production environment designed with big data in mind. RRE’s catchphrase, “write once, deploy anywhere,” indicates that you can develop your R solution once and deploy it to a number of different platforms. For example, you can specify a compute context with rxSetComputeContext() (an HPC cluster, Hadoop, a Teradata data warehouse, etc.) and run your R code unaltered. You can also use rxExec() to run any R function on a parallel infrastructure like Hadoop. RRE includes many parallelized big data algorithms for descriptive statistics, statistical tests, and sampling.
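As a rough illustration of the compute-context idea, the sketch below runs the same R function several times, either through rxExec() or through plain lapply(). It rests on several assumptions that are not from the guide: the helper boot_slope() and the use of the built-in mtcars data are hypothetical, the "localpar" context is assumed to be available in your RRE installation, and the fallback branch exists only so the code still runs where the RevoScaleR package is not installed.

```r
# Hypothetical helper: slope of mpg ~ wt on one bootstrap resample of mtcars.
boot_slope <- function(seed) {
  set.seed(seed)
  idx <- sample(nrow(mtcars), replace = TRUE)
  coef(lm(mpg ~ wt, data = mtcars[idx, ]))[["wt"]]
}

seeds <- 1:8  # one task per seed

if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  library(RevoScaleR)
  # Assumption: a local parallel context; a cluster or Hadoop context
  # object would be set here instead, leaving boot_slope() unchanged.
  rxSetComputeContext("localpar")
  res <- rxExec(boot_slope, seed = rxElemArg(seeds))
  rxSetComputeContext("local")  # restore the default sequential context
} else {
  # Fallback without RRE: the same function, run sequentially.
  res <- lapply(seeds, boot_slope)
}

slopes <- unlist(res)
print(summary(slopes))
```

The point of the sketch is that boot_slope() itself never changes; only the compute context decides where and how the calls are executed.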
In addition, R performance and capacity are production-ready with RRE. The following graph differentiates between glm() and rxGlm(). RRE will use all cores (e.g., on a quad-core processor) and all cluster nodes available on your parallel platform to perform the required computations.
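To make the glm() versus rxGlm() comparison concrete, here is a minimal sketch that fits the same logistic regression both ways. The model (am ~ wt on the built-in mtcars data) is chosen purely for illustration and is far too small to show any performance difference; the rxGlm() call runs only if the RevoScaleR package is installed.

```r
# Open source R: glm() fits the model in memory.
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial())
print(coef(fit_glm))

# RRE: rxGlm() accepts the same formula and family. On large data sets the
# data argument would typically point at a file-based source, not a data frame.
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  fit_rx <- RevoScaleR::rxGlm(am ~ wt, data = mtcars, family = binomial())
  print(fit_rx$coefficients)
}
```

Because the formula interface is shared, moving an existing glm() call to rxGlm() is mostly a matter of swapping the function name and the data source.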
RRE also includes DeployR, a web services software development kit (SDK) for exposing R functions via web services, both for custom applications and for integration with third-party products such as Alteryx, Tableau, and Excel.
The last article in this series will focus on Production Deployment Environments. If you prefer, you can download the entire insideBIGDATA Guide to Machine Learning, courtesy of Revolution Analytics, by visiting the insideBIGDATA White Paper Library.