
Combating Distributed Malware Through Machine Learning

In this special guest feature, Karthik Krishnan, Vice President of Product Management at Niara, discusses why machine learning is gaining popularity as a way to detect cyber attacks, given its effectiveness in classifying and clustering attack activity even within large event data streams. Karthik is responsible for driving product strategy and direction as well as customer engagements. He helped drive the initial framework for the Niara Security Analytics Platform. Before joining Niara, he served as vice president of product management at Embrane (acquired by Cisco in 2015) and senior director of product management at PGP (acquired by Symantec in 2010). Karthik also spent five years at Juniper as director of product management, driving product strategy, sales and customer engagements for Juniper’s Access Control products.

Cyber attacks have grown more sophisticated, with cybercriminals using multi-stage techniques to establish a slowly developing, persistent presence within corporate networks. As a result, it’s no longer sufficient to simply identify an attack and deal with the consequences later. An organization must also be able to determine what else may have been affected and whether attackers have installed persistent malware intended to cause future damage to systems.

But that’s easier said than done. Many subtle factors hinder the detection of these attacks. For example, it’s difficult to distinguish regular HTTP traffic from malware-to-C&C (command and control) communications over HTTP. Many malicious activities can be made to look benign, and attackers frequently modify their malware to evade signatures, so it’s impossible to catch all malware and malicious system use with signatures alone.
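
One feature sometimes used to separate beaconing malware from human browsing is timing regularity: an implant often checks in with its C&C server at near-fixed intervals. The minimal sketch below computes the coefficient of variation (standard deviation divided by mean) of the gaps between one host’s connections to one destination; values near 0 indicate clockwork-like timing. The function name and the sample timestamps are hypothetical. Legitimate software such as update checkers also polls on a schedule, and malware can add random jitter to its check-ins, which is exactly why a single feature like this cannot settle the question on its own.

```python
import statistics


def interval_regularity(timestamps: list[float]) -> float:
    """Coefficient of variation (std dev / mean) of the gaps between
    consecutive connections. Near 0 means very regular, beacon-like timing."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    mean_gap = statistics.mean(gaps)
    if mean_gap == 0:
        return 0.0  # every connection at the same instant: treat as perfectly regular
    return statistics.pstdev(gaps) / mean_gap


# Hypothetical connection times (seconds) from one host to one destination.
beacon_like = [0, 60, 120, 181, 240, 300]
browsing_like = [0, 5, 7, 95, 300, 310]
print(f"{interval_regularity(beacon_like):.3f}")
print(f"{interval_regularity(browsing_like):.3f}")
```

The beacon-like series scores close to 0, while the bursty, browsing-like series scores well above it.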

So how can one better detect these attacks? Machine learning is an option that’s gaining popularity, given its effectiveness in classifying and clustering attack activity, even within large event data streams.

The clues to an advanced attack are buried in disparate data sources (e.g., network traffic, server logs, network flows, “IP reputation” data streams and firewall logs). A range of machine learning techniques (supervised, semi-supervised, unsupervised and reinforcement learning) can help extract actionable security insights from the large volumes of data produced by modern IT systems, enabling computers to recognize patterns and correlations at a scale beyond manual analysis. Machine learning can thread together seemingly unconnected actions found across these sources to reveal an attack.
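
To make “threading together” more concrete, here is a deliberately simple, rule-based stand-in for the correlation step; it is not a learning algorithm. It groups suspicious events by host and reports hosts whose suspicious events come from at least two different data sources within a time window. The `Event` record, the source names, the host addresses and the window size are all hypothetical, and the `suspicious` flag is assumed to be set upstream, for example by a per-source rule or model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str        # hypothetical source name, e.g. "dns", "proxy", "firewall"
    host: str          # internal host the event is attributed to
    timestamp: float   # seconds since a common epoch
    suspicious: bool   # assumed to be set upstream by a per-source rule or model


def correlate(events: list[Event], window: float, min_sources: int = 2) -> dict[str, list[str]]:
    """Map each host to the distinct sources found in the first time window
    (at most `window` seconds wide) where its suspicious events span
    at least `min_sources` different data sources."""
    by_host: dict[str, list[Event]] = defaultdict(list)
    for event in events:
        if event.suspicious:
            by_host[event.host].append(event)

    flagged = {}
    for host, host_events in by_host.items():
        host_events.sort(key=lambda e: e.timestamp)
        start = 0
        # Sliding window over this host's events, ordered by time.
        for end in range(len(host_events)):
            while host_events[end].timestamp - host_events[start].timestamp > window:
                start += 1
            sources = {e.source for e in host_events[start:end + 1]}
            if len(sources) >= min_sources:
                flagged[host] = sorted(sources)
                break
    return flagged


events = [
    Event("dns", "10.0.0.5", 100.0, True),
    Event("proxy", "10.0.0.5", 160.0, True),
    Event("firewall", "10.0.0.5", 5000.0, True),
    Event("dns", "10.0.0.9", 100.0, True),
    Event("dns", "10.0.0.9", 200.0, True),
    Event("proxy", "10.0.0.9", 150.0, False),  # not flagged, so ignored
]
print(correlate(events, window=300.0))
```

Here only `10.0.0.5` is reported: its DNS and proxy alerts fall within five minutes of each other, whereas `10.0.0.9` has suspicious activity from a single source only.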

For example, attackers use domain generation algorithms to automatically create domain names for the command and control servers that communicate with, and send instructions to, compromised systems within businesses. In the past, automatically generated domains were typically random sequences of characters and thus easily identifiable as machine generated. As attackers have become more sophisticated, however, they’ve refined these algorithms to generate human-readable domains that are increasingly difficult to identify as malicious.
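
One classic heuristic for spotting the older, random-looking generated domains is the Shannon entropy of the domain label, in bits per character. The minimal sketch below computes it for a few labels; `xj4qk9zt2vbw` and `bluesunsetriver` are made-up examples standing in for a random and a word-like generated name. The random string scores far above `google`, but the word-like label scores below the legitimate `stackoverflow`, which illustrates why an entropy threshold alone fails against human-readable generated domains. Entropy also grows with label length, which further limits a fixed threshold.

```python
import math
from collections import Counter


def shannon_entropy(label: str) -> float:
    """Shannon entropy of a domain label, in bits per character."""
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


for label in ["google", "xj4qk9zt2vbw", "stackoverflow", "bluesunsetriver"]:
    print(f"{label:16} {shannon_entropy(label):.2f}")
```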

This is exactly where machine learning can help. By carefully selecting features and providing labeled training data, supervised learning techniques can be used to build models that accurately identify these human-readable, machine-generated domains and the malware families associated with them. This is just one application of machine learning. Using a range of techniques, it’s possible to identify aberrant behaviors and also determine whether they are malicious. This matters because not every change in behavior is a cyberattack; knowing what’s malicious helps analysts focus on the anomalies that are actually part of an attack. In short, machine learning helps surface the signals that indicate true attacks in an increasingly large sea of noisy data.
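
As a sketch of the supervised approach, the code below trains a multinomial naive Bayes classifier on character bigrams with add-one (Laplace) smoothing. Naive Bayes is swapped in here for brevity; the article does not name a specific technique. The class names `benign` and `dga` and the ten training labels are toy, made-up data, and the generated examples are random strings to keep the example small. Recognizing human-readable generated domains would need a large labeled corpus and richer, carefully selected features. Because the model accepts any number of labels, the single `dga` class could be replaced by one label per malware family.

```python
import math
from collections import Counter


def bigrams(label: str) -> list[str]:
    """Character bigrams, with ^ and $ marking the start and end of the label."""
    padded = f"^{label}$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class BigramNaiveBayes:
    """Multinomial naive Bayes over character bigrams, with add-one smoothing."""

    def fit(self, domains: list[str], labels: list[str]) -> "BigramNaiveBayes":
        label_counts = Counter(labels)
        self.classes = sorted(label_counts)
        self.log_priors = {c: math.log(n / len(labels)) for c, n in label_counts.items()}
        self.counts = {c: Counter() for c in self.classes}
        for domain, label in zip(domains, labels):
            self.counts[label].update(bigrams(domain))
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        vocab = set()
        for cnt in self.counts.values():
            vocab.update(cnt)
        # One extra slot reserves probability mass for bigrams unseen in training.
        self.vocab_size = len(vocab) + 1
        return self

    def log_scores(self, domain: str) -> dict[str, float]:
        """Log prior plus summed log likelihoods of the domain's bigrams, per class."""
        scores = {}
        for c in self.classes:
            denom = self.totals[c] + self.vocab_size
            score = self.log_priors[c]
            for bg in bigrams(domain):
                score += math.log((self.counts[c][bg] + 1) / denom)
            scores[c] = score
        return scores

    def predict(self, domain: str) -> str:
        scores = self.log_scores(domain)
        return max(scores, key=scores.get)


# Toy labeled data (made up for illustration).
benign = ["google", "facebook", "wikipedia", "amazon", "youtube"]
generated = ["xkqzvj", "qzjxvk", "zvkxqj", "jqxzkv", "vxjqzk"]
model = BigramNaiveBayes().fit(benign + generated, ["benign"] * 5 + ["dga"] * 5)
print(model.predict("kzqxjv"))
print(model.predict("google"))
```

With this toy data, the model labels the unseen `kzqxjv` as `dga` and `google` as `benign`.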

So what does this all mean, and how does it help? Machine learning helps security teams better understand the nature of the complex, distributed malware used to compromise networks every day. That understanding, in turn, helps improve defenses while maintaining the integrity, confidentiality and availability of businesses’ information and information systems.


Sign up for the free insideBIGDATA newsletter.
