The Next Level of Robotic Process Automation: Automating Data Science


Traditionally, the mutually beneficial relationship between Robotic Process Automation (RPA) and data science was perfectly equitable. RPA’s bots implemented timely action based on the insight of data science’s advanced analytics, while data science’s predictive models were responsible for giving those digital agents greater intelligence and enterprise applicability.

More recently, however, RPA’s automation has expanded to include not only any assortment of workflows supported by data science algorithms, but also the realm of data science itself. In this respect, RPA’s utility is part of a greater movement to democratize data science alongside other options like SaaS solutions for machine learning, self-service analytics platforms, and visual solutions for building predictive models.

RPA furthers this objective in two principal ways. The first involves natively including a range of statistical AI approaches such as “computer vision, Natural Language Processing, and Deep Learning,” remarked Automation Anywhere SVP of Products and Engineering Abhijit Kakhandiki. The second is automating critical aspects of the predictive model building and deployment process, most notably training models and selecting the best algorithms for accomplishing business tasks.

The significance of this development emphasizes, yet ultimately transcends, RPA’s extension into data science. It’s more noteworthy for enabling business users to perform their jobs better without scarce, expensive data scientists and for “truly democratizing AI because those models come to the business users as business relevant solutions, so that anything that they need to understand about that AI solution is in the context of business,” affirmed Kakhandiki.

Training Predictive Models

Devising the classic or advanced machine learning models that imbue virtual agents with greater intelligence often requires an inordinate amount of labeled training data. Finding that data to teach models how to predict desired business outcomes is part of a lengthier process in which data scientists are “building a model, and doing feature engineering and those kind of things, and they’re also responsible for the training of those models,” Kakhandiki said. Confining the training of cognitive computing models to data scientists alone is inadvisable for two reasons. Firstly, data scientists often lack sufficient quantities of labeled training data to teach the most sophisticated machine learning models how to predict necessary business outcomes. Secondly, the data that are available may be theoretical sandbox datasets with little bearing on real-world business use cases.

It’s far preferable to train statistical AI models on data from production settings, which is what bots are now able to do. For example, they can gather a variety of information from different sources to determine whether or not an applicant is approved for a business loan. Once collected and properly formatted, this information becomes the basis for a loan officer’s decision. Those decisions, and the data upon which they’re based, become the labeled input data for training machine learning models to make these judgments themselves. “The AI model and the bot can actually learn from that, and say this is the underlying model I need to update, so let me observe this human and I can actually, based on that decision taken by that human and the underlying data that went into that, update the AI model constantly so that it becomes smarter over a period of time,” Kakhandiki mentioned.
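The idea of updating a model each time a human makes a decision can be illustrated with a minimal sketch. This is not Automation Anywhere's implementation; it assumes a tiny logistic-regression model, two made-up features (a scaled credit score and a debt ratio), and a hypothetical log of loan officer decisions, purely to show how each observed decision can serve as a labeled example.

```python
import math


class OnlineLoanModel:
    """Tiny logistic-regression model updated one human decision at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, features):
        """Return the estimated probability that the loan is approved."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, approved):
        """Take one gradient step using a loan officer's decision as the label."""
        error = self.predict_proba(features) - (1.0 if approved else 0.0)
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error


# Hypothetical observed decisions: ([credit_score_scaled, debt_ratio], approved?)
observed_decisions = [
    ([0.9, 0.1], True),
    ([0.8, 0.2], True),
    ([0.3, 0.8], False),
    ([0.2, 0.9], False),
]

model = OnlineLoanModel(n_features=2)
for _ in range(200):  # replay the small log; a live bot would see a stream
    for features, approved in observed_decisions:
        model.update(features, approved)

strong_applicant = model.predict_proba([0.85, 0.15])
weak_applicant = model.predict_proba([0.25, 0.85])
```

Because `update` touches the model once per decision, the same call could run every time a bot observes a new approval or denial, which is one simple way a model might become "smarter over a period of time."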

Bot Learning

From a data science perspective, such production data are ideal for training machine learning models, particularly when considered at scale for larger organizations. Although a business user’s labeled decision (whether to approve or deny an application) is the primary data source for training machine learning models, other types of learning indirectly shape the model training process. Bots use the following capabilities to get the data required to make a loan decision from the proper sources:

  • Computer Vision: This technology enables bots to “see a screen just like a human would,” Kakhandiki revealed. Thus, bots can observe different data sources and websites from which to get data supporting business use cases.
  • Natural Language Processing: NLP technologies work in conjunction with deep learning “so bots can recognize all the app controls on the screen like a human and operate those controls,” Kakhandiki said. Implicit in this capability is the capacity for bots to “understand applications written in different application frameworks,” Kakhandiki commented.
  • Connectors: Termed “hooks” by Kakhandiki, these mechanisms allow bots to execute the various steps their processes require (such as data retrieval) across varying operating systems.

The actions these capabilities enable are instrumental in retrieving the data necessary to approve or deny loan applicants, which in turn provides labeled input data for training machine learning models to do the same. Automating this core facet of data science is essential so that “instead of training being in the hands of a few data scientists, RPA has taken it to the next level by democratizing it by pushing it to the business users,” Kakhandiki explained. “Instead of five data scientists doing this, now there are 500 business users who can do this.”
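How retrieved data and a human decision might be combined into a labeled training example can be sketched as follows. The source names, field names, and the `LabeledExample`, `merge_sources`, and `record_decision` helpers are all hypothetical; the article does not describe the actual record format any RPA product uses.

```python
from dataclasses import dataclass


@dataclass
class LabeledExample:
    """One training row: the collected inputs plus the human decision."""
    applicant_id: str
    features: dict
    approved: bool


def merge_sources(*sources):
    """Combine per-source field dictionaries into one flat record.

    Keys are prefixed with the source name so that fields with the same
    name from different systems do not overwrite one another.
    """
    record = {}
    for source_name, fields in sources:
        for key, value in fields.items():
            record[f"{source_name}.{key}"] = value
    return record


def record_decision(applicant_id, sources, approved, training_log):
    """Append the officer's decision, with its supporting data, to the log."""
    example = LabeledExample(applicant_id, merge_sources(*sources), approved)
    training_log.append(example)
    return example


# Hypothetical data a bot might have gathered for one applicant.
training_log = []
record_decision(
    "A-001",
    [
        ("credit_bureau", {"score": 712}),
        ("bank_portal", {"avg_balance": 18500.0}),
        ("application_form", {"requested_amount": 50000.0}),
    ],
    approved=True,
    training_log=training_log,
)
```

Each business user's decision adds one row to `training_log`, which is how hundreds of users, rather than a handful of data scientists, could end up supplying the labels.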

Algorithm Selection and AutoML

Another vital component in the data science pipeline is choosing which predictive model—and its underlying algorithm—functions best for a particular use case. Data scientists often create multiple models for this purpose, or even ensemble (combine) them. However, modern RPA solutions are able to automate this aspect of data science by deploying AutoML, which effectively “applies like, 60 different algorithms and based on which one predicts the human results the best, is the best model,” Kakhandiki noted.

This option is viable for situations similar to the loan approval use case, in which data are retrieved from myriad sources to form a single decision (particularly binary decisions such as approval or denial of loans). In these instances, AutoML evaluates its various models and algorithms to determine the best one, which effectively automates a host of data science steps. Moreover, “the algorithms can change behind scenes based on as the bot sees more and more data,” Kakhandiki disclosed, which reflects the ongoing learning process core to the effectiveness of data science, and to RPA as well.
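The selection step described above, fitting several candidate algorithms and keeping whichever best matches the human decisions, can be sketched in a few lines. This is an illustration only: it uses three deliberately simple stand-in algorithms rather than the roughly 60 Kakhandiki mentions, and the features, data, and function names are assumptions made for the example.

```python
def majority_baseline(train):
    """Always predict the most common decision seen in training."""
    approvals = sum(1 for _, label in train if label)
    prediction = approvals * 2 >= len(train)
    return lambda features: prediction


def best_threshold_rule(train):
    """Decision stump: the single feature/threshold split that fits best."""
    best_hits, best_rule = -1, None
    for i in range(len(train[0][0])):
        for threshold in sorted({x[i] for x, _ in train}):
            for approve_above in (True, False):
                def rule(x, i=i, t=threshold, above=approve_above):
                    return (x[i] >= t) == above
                hits = sum(1 for x, label in train if rule(x) == label)
                if hits > best_hits:
                    best_hits, best_rule = hits, rule
    return best_rule


def nearest_centroid(train):
    """Predict the class whose mean feature vector is closest.

    Assumes both approved and denied examples appear in the training data.
    """
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    approved = centroid([x for x, label in train if label])
    denied = centroid([x for x, label in train if not label])
    return lambda x: dist(x, approved) <= dist(x, denied)


def agreement(model, rows):
    """Fraction of rows where the model matches the human decision."""
    return sum(1 for x, label in rows if model(x) == label) / len(rows)


def select_best(candidates, train, holdout):
    """Fit every candidate on train and rank by agreement on holdout."""
    scores = {name: agreement(fit(train), holdout)
              for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical decisions: ([credit_score_scaled, debt_ratio], approved?)
train = [
    ([0.9, 0.2], True), ([0.8, 0.3], True), ([0.7, 0.1], True),
    ([0.3, 0.7], False), ([0.2, 0.9], False), ([0.4, 0.8], False),
]
holdout = [
    ([0.85, 0.25], True), ([0.75, 0.4], True), ([0.65, 0.2], True),
    ([0.25, 0.6], False), ([0.35, 0.85], False),
]
candidates = {
    "majority_baseline": majority_baseline,
    "threshold_rule": best_threshold_rule,
    "nearest_centroid": nearest_centroid,
}
best_name, scores = select_best(candidates, train, holdout)
```

Rerunning `select_best` as new decisions arrive is one way the winning algorithm could change over time as the bot sees more data.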

The Bottom Line

The overall impact of RPA’s automation of data science mainstays like generating models, training them, and selecting the most appropriate one for a particular business task is twofold. On the one hand, these capabilities enable data scientists to devote greater time to more meaningful tasks, such as determining additional ways to support the business with analytics solutions.

Ultimately, however, placing these fundamental data science processes into the realm of business users gives them better control of the data-driven processes they require to increase productivity—and the bottom line. “When you actually bring AI and ML to a business user in the context of what they want to actually automate, what that does is it helps in being able to deliver AI to a much broader audience,” Kakhandiki acknowledged.

About the Author

Jelani Harper is an editorial consultant servicing the information technology market. He specializes in data-driven applications focused on semantic technologies, data governance and analytics.

  1. Great article! At CiGen (an Australian pure-play RPA specialist), on our blog, we haven’t yet approached the subject of RPA’s use and impact on data modelling and data science aspects in general, but we appreciate the work you’ve done explaining that here. Will share with our community on social media. Thanks.

  2. Great article covering data science automation. Data science, when combined with the power of RPA, can do wonders. Thanks for the share.

  3. Abram says

    I didn’t have any expectations concerning that title, so I was all the more astonished. The author did a great job. I spent a few minutes reading and checking the facts. Everything is very clear and understandable. I like posts that fill in your knowledge gaps. This one is of that sort.