Managing a successful data science project requires time, effort, and a great deal of planning. Defining the problems to solve and planning the project’s scope are just the tip of the iceberg: team members also need to fully understand every aspect of the project in order to contribute effectively.
A critical challenge in data science projects is getting everyone on the same page about the project's challenges, responsibilities, and methodologies. More often than not, there is a disconnect between the worlds of development and production. Some teams may choose to re-code everything in an entirely different language, while others may change core elements such as testing procedures, backup plans, and programming languages. Moving a data product into production can become a nightmare as competing opinions and methods vie for supremacy, and projects needlessly drag on for months beyond their promised deadlines.
The goal of this guide is to explore common ground and introduce strategies and procedures designed to bridge the gap between development and production. The topics range from Best Operating Procedures (managing environment consistency, data scalability, and consistent code and data packaging) to Risk Management for unforeseen situations (roll-back and failover strategies). We also discuss modelling (continuous re-training of models, A/B testing, and multivariate optimization) and implementing communication strategies (auditing and functional monitoring).