Understanding “Human Intent and Behavior” with Computer Vision


Computer vision is one of the most prominent forms of statistical Artificial Intelligence in use today. Comprising object detection, facial recognition, image classification, and other techniques, it supports a range of pressing use cases from contactless shopping to video surveillance.

Its enterprise utility, however, centers on one fact that, although perhaps less pressing than those use cases, underlies its merit and its ROI. Quite simply, it enables machines to see and interpret things much as humans do, providing the foundation for machines to act like humans and automate critical business processes, tasks, and functions.

This potential is frequently realized through Robotic Process Automation (RPA) bots that, enriched by computer vision’s machine intelligence, will “act on your behalf, Mr. Human, and do the things you want me to do,” remarked Automation Anywhere CTO Prince Kohli. “But now to act like you, I must be able to interact with my environment, the applications on my desktop, or other things, like you. That means I need a certain amount of AI and cognitive computing. Computer vision, for example, must be able to understand invoice numbers, images, how to digitize documents, all those things.”

The tandem of computer vision and bots supports three foundational use cases that yield immediate enterprise value. It is integral to automating a series of tasks, performing Intelligent Document Processing (IDP), and discovering new processes to automate. Each of these applications considerably increases the throughput, scale, and speed of enterprise automation for everything from back-end processes to external, revenue-generating ones.

Process Automation

The machine learning foundation underlying computer vision largely relies on deep neural networks, many of which are highly domain-specific. These machine learning models are trained in advance so that, for many enterprise users, they offer out-of-the-box functionality in robust solutions specializing in automation. Typically, computer vision is the means of empowering bots to view and comprehend human behavior with the ability “to look at a desktop, a screen in front of a person, and look at applications,” Kohli said. “For each application and each window, by your interaction with each window, it understands what the window does.”

This perception is the basis for learning how the information on a screen is used in a particular process, such as retrieving Salesforce information to support sales teams dealing with prospects. With computer vision supplying intelligence about the significance of onscreen items, applications, APIs, login information, and more, bots can act on this knowledge to automate processes. “They know the data on the window, the controls on the window, where the target is, where information gets extracted, and how do you, for example, submit something on Salesforce,” Kohli mentioned. “What do you have to fill in, what do you have to take out, what do you do with a task-based system or some local system, whether it be some IT system, etc.” Bots can record such activities and, when desired, replay them consistently.

Intelligent Document Processing

IDP applies across nearly every vertical and is regularly used in industries such as insurance, healthcare, and financial services. This computer vision application is so widely employed because it processes documents quickly and accurately: intelligent bots “get digitized documents, whether it be a PDF or whatever it may be, and understand what it says, not just a text scan,” Kohli noted. Such rapid comprehension is useful for classifying documents, extracting information from them, and automatically routing that information to downstream systems. Moreover, this use case is one of the best examples of the range of cognitive computing models, many of which may be accessed via the cloud, required to support computer vision. “The IDP space is so rich with AI models for understanding, for example, lung cancer or some other problem in the lung, or an AI model to understand disease, which is very different from an AI model to understand invoices or a factory layout,” Kohli maintained.

The diversity of machine learning models underpinning these various computer vision applications is readily accessible via platforms like Google Cloud, which not only specializes in these data science resources, but also partners with competitive RPA platforms for such purposes. “We have a few models that work very well for certain invoices, the mortgage space, and a few others as well,” Kohli commented. “And then, Google brings many, many others so the benefit that customers get is best of breed AI models for their use cases.” The capacity to readily utilize models based on subject matter expertise in specific domains is another way computer vision equips machines to process information as well as, if not better than, humans can, especially when aided by bots.
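The classify, extract, and route steps described in this section can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: it starts from text that has already been digitized, and simple keyword rules and regular expressions stand in for the trained, domain-specific models Kohli describes. The document types, field patterns, and queue names are all hypothetical and do not reflect any vendor's product.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword rules standing in for a trained document classifier.
KEYWORDS = {
    "invoice": ("invoice", "amount due"),
    "claim": ("claim", "policy number"),
}

# Hypothetical downstream queues, keyed by document type.
ROUTES = {"invoice": "accounts_payable", "claim": "claims_adjudication"}

# Hypothetical field patterns; a real system would use learned extractors.
FIELD_PATTERNS = {
    "invoice_number": r"Invoice\s+(?:Number|No\.?)\s*:\s*([A-Z0-9-]+)",
    "date": r"Date\s*:\s*(\d{4}-\d{2}-\d{2})",
}


@dataclass
class ProcessedDocument:
    doc_type: str
    fields: dict = field(default_factory=dict)
    destination: str = "manual_review"


def classify(text: str) -> str:
    """Pick the document type whose keywords appear most often."""
    lowered = text.lower()
    scores = {
        doc_type: sum(keyword in lowered for keyword in keywords)
        for doc_type, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def extract_fields(text: str) -> dict:
    """Return every field whose pattern matches the digitized text."""
    found = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            found[name] = match.group(1)
    return found


def process(text: str) -> ProcessedDocument:
    """Classify a document, extract its fields, and choose a destination."""
    doc_type = classify(text)
    destination = ROUTES.get(doc_type, "manual_review")
    return ProcessedDocument(doc_type, extract_fields(text), destination)


sample = "Invoice No: INV-1042\nDate: 2021-03-15\nAmount due: 300.00"
result = process(sample)
print(result.doc_type, result.fields, result.destination)
```

Anything the rules cannot classify falls through to a manual review queue, which mirrors the common practice of keeping a human in the loop for documents the models do not recognize.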

Process Discovery

No matter which computer vision techniques a particular system or digital agent employs, the primary value it yields is the capability to intelligently observe what it is monitoring. For enterprise users attempting to pare costs, boost efficiency, and achieve more while using fewer resources, one of the most valuable deployments of this technology is discovering new processes to automate (which inherently meets these three objectives). Combining this visual capability with the actions virtual agents perform delivers the best of both worlds: unwavering observation and automated action. According to Kohli, “Finding processes is one of the hardest jobs of automation. If you have software that just passively sits there and understands all the processes in long, valuable processes, and helps you automate them very easily, that is very close to Nirvana.”

The parallels are clear between a digital agent performing this function and a human, such as a supervisor or a manager, who watches how workers handle the various steps of insurance company adjudication, for instance, suggests ways to make those steps faster, easier, and more scalable, and then implements the changes for them. Those parallels are also consistently realized when deploying bots with computer vision that specialize in such functionality. In fact, Kohli observed that fundamentally, the capability to regularly discover processes and automate them requires the capacity to “understand human intent and behavior.” With bots, organizations can leverage this computer vision advantage to then duplicate that behavior.
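One simplified way to picture process discovery is as a search for action sequences that recur across many observed work sessions. The Python sketch below assumes the observation step has already turned screen activity into lists of named actions; the action names, the `frequent_sequences` helper, and its thresholds are hypothetical, and commercial tools rely on far richer signals than this.

```python
from collections import Counter


def frequent_sequences(sessions, length=3, min_count=2):
    """Return action sequences of a given length seen in several sessions.

    Each sequence is counted at most once per session, so the count is the
    number of sessions in which that sequence was observed.
    """
    counts = Counter()
    for actions in sessions:
        seen = set()
        for start in range(len(actions) - length + 1):
            seen.add(tuple(actions[start:start + length]))
        counts.update(seen)
    return [(seq, n) for seq, n in counts.most_common() if n >= min_count]


# Hypothetical logs of observed desktop actions, one list per session.
sessions = [
    ["open_crm", "copy_contact", "paste_in_email", "send"],
    ["check_calendar", "open_crm", "copy_contact", "paste_in_email"],
    ["open_crm", "search"],
]

candidates = frequent_sequences(sessions)
for sequence, session_count in candidates:
    print(" -> ".join(sequence), f"(seen in {session_count} sessions)")
```

Sequences that show up in many sessions are candidates for automation; a person would still need to confirm that a candidate is a genuine, valuable process before a bot is built for it.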

Machine Intelligence

There are several complexities in daily operations that humans take for granted. The notion of intelligence in this sense is relatively broad, and oftentimes easily overlooked. People can visually assess things quickly enough to determine, for example, a particular payment code for a medical procedure. “I can look at an invoice, and in two seconds flat I can tell you that the invoice is about this asset, it was ordered on this date, here is the address to which it was going to be delivered, here is the address it came from, and this is the person to whom it’s going,” Kohli said. “As a human, I can do that very quickly for any invoice. But for a computer that’s hard.”

Computer vision, however, makes such a task much less demanding. Pairing it with RPA turns this knowledge into repeatable, sustainable action, which is imperative for automating important business processes.

About the Author

Jelani Harper is an editorial consultant servicing the information technology market. He specializes in data-driven applications focused on semantic technologies, data governance and analytics.
