Harness Unstructured Data with AI to Improve Investigative Intelligence 

In this special guest feature, Jordan Dimitrov, Product Manager, Unstructured Data Analytics, Cognyte, addresses the importance of unstructured data, why AI is an invaluable tool and how to move beyond legacy approaches to data management. Jordan is responsible for the unstructured data analytics in NEXYTE, Cognyte’s decision intelligence platform. Before transitioning to investigative analytics, he was a Product Analyst in cybersecurity, dealing with asset visibility and threat detection. His educational background is in Marketing & Business.

Until now, investigation and intelligence teams have largely focused their efforts on expanding their data collection capabilities to include more and more data sources. But for these teams – in domains including law enforcement, financial crimes, immigration management, national security, port/airport authorities and more – this growing stockpile of data often produces very little actionable insight. 

For smaller teams making do with fewer data sources, the core challenge is to extract more meaningful insights from the limited data available to them. Every detail must be thoroughly mined to build the fullest possible investigative picture. 

For investigative teams large and small, data collection on its own is simply not enough. The focus has now shifted to fusing these disparate data sources for more effective, automated analytics that improve decision intelligence. 

The inability to synthesize unstructured data with conventional structured data has emerged as a major stumbling block in this effort, however. A recent survey of 200 chief investigators and senior analysts confirms this lingering challenge, among other valuable data points.


Unstructured data (images, video and other multimedia, handwritten criminal reports, etc.) accounts for a fast-growing share of today’s available intelligence content. Unstructured data, including cyber data and criminal records, already makes up the majority of the data that governmental organizations use in investigations today. And its volume is growing exponentially, fed by CCTV cameras, social media and other forums and formats. 

Investigators need the ability to efficiently ingest and analyze this media-based, unstructured data. Moreover, they need the ability to cross-reference and correlate it with their structured databases. AI technology makes this achievable. 
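
To make the cross-referencing step concrete, here is a minimal Python sketch, assuming the identifiers have already been extracted from the media: license plates read from CCTV frames are joined against a structured vehicle registry. The table names, columns and records are hypothetical, and SQLite stands in for whatever structured database an investigative team actually uses.

```python
from __future__ import annotations

import sqlite3


def build_db() -> sqlite3.Connection:
    """Create a hypothetical in-memory database with structured and media-derived tables."""
    conn = sqlite3.connect(":memory:")
    # Structured source: a (hypothetical) vehicle registry.
    conn.execute("CREATE TABLE registry (plate TEXT PRIMARY KEY, owner TEXT)")
    # Unstructured-derived source: plates read from CCTV frames
    # (in practice produced by an AI model; hard-coded here).
    conn.execute("CREATE TABLE cctv_sightings (frame_id TEXT, camera TEXT, plate TEXT)")
    conn.executemany(
        "INSERT INTO registry VALUES (?, ?)",
        [("ABC123", "Owner A"), ("XYZ789", "Owner B")],
    )
    conn.executemany(
        "INSERT INTO cctv_sightings VALUES (?, ?, ?)",
        [
            ("f-001", "cam-north", "ABC123"),
            ("f-002", "cam-south", "QRS456"),  # not in the registry
            ("f-003", "cam-south", "XYZ789"),
        ],
    )
    return conn


def cross_reference(conn: sqlite3.Connection) -> list[tuple]:
    """Correlate each media-derived sighting with the structured registry."""
    # LEFT JOIN keeps sightings with no structured match (owner is NULL -> None),
    # so unmatched identifiers stay visible to the analyst.
    return conn.execute(
        """
        SELECT s.frame_id, s.camera, s.plate, r.owner
        FROM cctv_sightings AS s
        LEFT JOIN registry AS r ON r.plate = s.plate
        ORDER BY s.frame_id
        """
    ).fetchall()


if __name__ == "__main__":
    for row in cross_reference(build_db()):
        print(row)
```

The LEFT JOIN is a deliberate choice here: a sighting that matches nothing in the structured data can be as investigatively interesting as one that does.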

Valuable insight can be extracted from unstructured data when it’s synthesized and analyzed properly with AI. Details embedded in photographs and hacker forums, for example, can reveal relationships between bad actors and other important contextual information. Textual analysis of police records is another important target application for unstructured/structured data synthesis.


AI is critical to this effort: it ultimately helps transform unstructured data into structured data that can be analyzed easily. The AI-driven process begins with the automated extraction of identifiers from the unstructured data, such as faces, objects, text elements and location context. 
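
As a rough illustration of the extraction step, the sketch below turns a free-text report into a structured record. It is a deliberate simplification: regular expressions stand in for the AI models (face, object and text recognition) the process relies on, and the identifier types (phone, email, plate) and their formats are assumptions chosen for the example.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Simplified stand-in for AI extraction: regular expressions for a few
# hypothetical text identifier types and formats.
PATTERNS = {
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "plate": re.compile(r"\b[A-Z]{3}\d{3}\b"),
}


@dataclass
class ExtractedRecord:
    """A structured record derived from one unstructured source."""
    source_id: str
    identifiers: dict[str, list[str]] = field(default_factory=dict)


def extract_identifiers(source_id: str, text: str) -> ExtractedRecord:
    """Pull every known identifier type out of free text."""
    record = ExtractedRecord(source_id)
    for kind, pattern in PATTERNS.items():
        # Deduplicate and sort so repeated mentions yield one stable entry.
        matches = sorted(set(pattern.findall(text)))
        if matches:
            record.identifiers[kind] = matches
    return record


if __name__ == "__main__":
    report = (
        "Suspect called 555-123-4567 and emailed j.doe@example.com; "
        "vehicle ABC123 was seen twice (ABC123)."
    )
    print(extract_identifiers("report-001", report))
```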

Leveraging comprehensive text, audio, image and video data analytics, AI can help surface previously hidden relationships and patterns emerging from the unstructured data. Analysts ultimately gain a clearer overall picture based on these linkages, significantly improving their decision intelligence. With a deluge of unstructured data now upon us, it would be impossible to do all of this manually at scale. 
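
One simple way to picture how linkages surface is sketched below, assuming identifiers have already been extracted from each source: any two sources that mention the same identifier are linked. Exact-match linking is a stand-in for the richer correlation an AI platform would perform, and the source and identifier names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations


def link_sources(records: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Link every pair of sources that share at least one identifier."""
    # Invert the mapping: identifier -> set of sources that mention it.
    sources_by_identifier = defaultdict(set)
    for source, identifiers in records.items():
        for identifier in identifiers:
            sources_by_identifier[identifier].add(source)

    # Each pair of sources sharing an identifier becomes a link,
    # annotated with the identifiers that connect them.
    links = defaultdict(set)
    for identifier, sources in sources_by_identifier.items():
        for a, b in combinations(sorted(sources), 2):
            links[(a, b)].add(identifier)
    return dict(links)


if __name__ == "__main__":
    records = {
        "photo-17": {"face:0042", "plate:ABC123"},
        "forum-post-9": {"handle:darkfox", "plate:ABC123"},
        "police-report-3": {"handle:darkfox", "phone:555-123-4567"},
    }
    for pair, shared in link_sources(records).items():
        print(pair, shared)
```

Here photo-17 and police-report-3 share no identifier directly, yet both connect to forum-post-9: the kind of indirect path an analyst would follow to piece together relationships between actors.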

AI is crucial for enrichment throughout the process. This includes establishing, ingesting and indexing the metadata that accompanies the unstructured data. AI also enables the extraction, structuring and correlation of valuable ‘object’ data contained in media-based unstructured data (photos, videos, etc.).
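
The metadata-indexing part of enrichment can be pictured as an inverted index from (field, value) pairs to media items. The sketch below assumes each item arrives with a flat metadata dictionary; the field names (camera, date) and item IDs are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def build_metadata_index(media: list[dict]) -> dict[tuple[str, str], set[str]]:
    """Map each (metadata field, value) pair to the IDs of matching media items."""
    index = defaultdict(set)
    for item in media:
        for key, value in item["metadata"].items():
            index[(key, str(value))].add(item["id"])
    return dict(index)


def lookup(index: dict[tuple[str, str], set[str]], **criteria: object) -> list[str]:
    """Return IDs matching ALL criteria (set intersection), sorted; no criteria -> []."""
    matches = None
    for key, value in criteria.items():
        ids = index.get((key, str(value)), set())
        matches = set(ids) if matches is None else matches & ids
    return sorted(matches or [])


if __name__ == "__main__":
    media = [
        {"id": "img-1", "metadata": {"camera": "cam-north", "date": "2023-05-01"}},
        {"id": "vid-7", "metadata": {"camera": "cam-south", "date": "2023-05-01"}},
        {"id": "img-2", "metadata": {"camera": "cam-north", "date": "2023-05-02"}},
    ]
    index = build_metadata_index(media)
    print(lookup(index, camera="cam-north", date="2023-05-01"))
```

Intersecting per-field sets keeps multi-criteria queries simple; a production system would likely rely on a search engine or database index rather than an in-memory dictionary.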


There are several limitations to the ad hoc approaches commonly used today to manage unstructured data. When it comes to AI enrichment, it’s cumbersome to outsource text, video, image and facial recognition enrichment to multiple third parties. Third-party access to sensitive information can also introduce obvious security and privacy concerns, and in secure ‘air gap’ environments, access to cloud-based data and services is often disallowed.

The challenges of third-party outsourcing extend downstream throughout the workflow. Offline third-party enrichment introduces data re-ingestion issues and other process bottlenecks. Multiplying files and queries across several third-party services can also add considerable expense over time.

While many solutions have come to market in recent years, they are typically limited to specific unstructured data formats and/or offer only partial capabilities in a limited selection of supported languages. Managing these enrichments and processes via a single, unified AI-driven solution has major benefits. Key advantages can include sophisticated capabilities for fusing structured and unstructured data streams, and for establishing and analyzing important correlations and patterns in the data.

Unstructured data comprises the majority of data being used for investigations by governmental organizations today and will play an increasingly vital role in investigative analytics going forward. 

To ensure a holistic, data-driven intelligence assessment, unstructured data fusion and analysis are essential. 

A comprehensive, unified solution can fuse all data sources – structured and unstructured – together in one place, with all the cost and workflow efficiencies that entails. Most importantly, this approach can dramatically improve overall decision intelligence, yielding more precise and complete insights faster than what’s possible with legacy approaches. As more investigative teams tap AI-based solutions to automate these processes at scale, they’ll be well equipped to handle the flood of unstructured data that’s only just begun. 

Sign up for the free insideBIGDATA newsletter.



  1. Josue Calvo says

    This is totally spot on, Jordan.
    What I find interesting is what I believe is a missing piece of the puzzle. I see many data scientists happy to throw tons of raw unstructured data into AI to get some insights out. But then they often fail to create a consistent placeholder (repository, database, knowledge graph, you name it) where the extracted data points are kept and further processed for entity resolution, a.k.a. fusion.
    Our team followed that approach with a specialised knowledge graph, which we called Data Fabric, built to hold the extracted data points from all sorts of sources. Entity resolution is left to downstream processes, with proprietary algorithms for matching and connecting data, which make possible the data fusion you describe in your final paragraphs. As you can imagine, running these entity resolution processes over the knowledge graph, rather than straight from raw data, is orders of magnitude faster and cheaper, enabling insights we wouldn’t have dared to think about in the first place. The results, even in the few verticals we are tackling right now, are very exciting!