The Key to Machine Learning to Defend Against Web Application Attacks


In this special guest feature, Misha Govshteyn, Co-Founder and SVP of Products at Alert Logic, offers some practical insights to successfully leverage machine learning to improve web application attack defenses. Misha co-founded Alert Logic in 2002. He is responsible for security strategy, security research and software development at Alert Logic. Prior to founding Alert Logic, he served as a Director of Managed Services for Reliant Energy Communications. In this role, he developed and successfully launched five major product lines including Managed Intrusion Detection Services and Managed Enterprise Firewall/VPN Products.

The enterprise shift to the cloud has created an inviting attack surface for a growing number of web application attacks. According to recent cloud security research, web application attacks accounted for 73 percent of all the incidents flagged in the 18-month evaluation period. The research found that web application attacks affected 85 percent of all Alert Logic customers, with injection-style attacks such as SQL injection leading the pack.

The problems caused by SQL injection (SQLi) attacks stem from their notorious ability to blend into the “noise” associated with web application traffic. Organizations receive, on average, 17,000 security alerts per week and generally have time to attend to only about 4 percent of them. More alarming is that 81 percent of web application-related alerts are classified as “noise.” Investigating such noise ultimately consumes more than two-thirds of security teams’ time, which causes organizations to miss real threats.

SQLi attacks are especially difficult for security teams to identify accurately, and their targets are multiplying: SQL databases are increasingly common within organizations that operate in the cloud, since managed relational database services in public cloud platforms like AWS and Azure, such as Amazon Relational Database Service (RDS), make it so simple to create a database-driven web app.

So how can an organization cut through this noise to quickly identify and eradicate SQLi attacks? Enter machine learning.

In the last 12 months, the application of machine learning (ML) to real-world problems has accelerated with the development of tools and frameworks that allow programmers to apply it to a wide variety of problems. When it works, machine learning can enable businesses dealing with large volumes of data to solve significantly more complex problems than human intelligence alone can; examples include human genome sequencing, facial recognition, cancer treatment, and, critically, cybersecurity. But simply implementing machine learning isn’t enough.

To stay competitive in industries where advanced data processing is vital, it’s important to take a comprehensive approach to integrating machine learning technology into existing processes, so that it drives innovation toward better, more meaningful outcomes. Doing so ensures that organizations maintain a critical human element alongside machine learning and artificial intelligence deployments. In the cybersecurity space, for example, integrating machine learning with existing analysis processes ensures that security analysts are able to feed real-world data in near real time to data scientists, who can then tweak machine learning algorithms for constant process improvement.

Simply put, any successful machine learning program requires five primary ingredients:

  1. Collection of consistent, high-quality, high-volume data – Domain experts need to work closely with data scientists to create a training set that makes a machine learning algorithm increasingly functional. In many real-world applications, the creation of the training set is the hardest part of the machine learning equation. Domain experts have a critical role in interpreting raw data and identifying the features or characteristics that matter to the data scientists.
  2. Data scientists to curate and label subsets of data needed for training machine learning algorithms – Data scientists use this knowledge to select an appropriate machine learning technique and transform raw data into a balanced training set for the algorithm. The training set must be carefully constructed so that the machine learning trains on the right features and learns from both positive and negative examples.
  3. Domain experts to guide the data scientists – Once the algorithm is working, moving it from the lab to a production environment is another hurdle. The set of skills needed to build a robust, nonstop, maintainable production environment is different from the set needed to apply machine learning techniques to a problem domain. The production team needs to be composed of engineers who can understand data scientists’ algorithms and needs, as well as build an environment that allows their work to run.
  4. A production team to convert the data scientist results into a scalable high-quality production implementation – Once machine learning algorithms are trained, running in production and producing results, experts need to identify which results are accurate and which are not.
  5. A feedback loop that evaluates the performance of the algorithms and continually improves results – This new data is then given to data scientists, who can tweak the algorithm, add the data to their training set, or both. This feedback loop is critical to developing an effective algorithm.
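
As a rough illustration of how the training set and the feedback loop fit together, the sketch below trains a toy classifier on a small, balanced set of labeled requests and then folds an analyst's correction back into the training data. Everything in it is an assumption made for illustration: the sample requests, the `tokenize` and `record_analyst_verdict` helpers, and the choice of a naive Bayes model are hypothetical and are not taken from any vendor's implementation.

```python
import math
import re
from collections import Counter

BENIGN, ATTACK = 0, 1


def tokenize(request):
    """Lowercase a request and split it into words and single punctuation marks."""
    return re.findall(r"[a-z0-9_]+|[^\sa-z0-9_]", request.lower())


class NaiveBayes:
    """Toy multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.token_counts = {BENIGN: Counter(), ATTACK: Counter()}
        self.doc_counts = {BENIGN: 0, ATTACK: 0}

    def add_example(self, request, label):
        self.token_counts[label].update(tokenize(request))
        self.doc_counts[label] += 1

    def predict(self, request):
        vocab = set(self.token_counts[BENIGN]) | set(self.token_counts[ATTACK])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in (BENIGN, ATTACK):
            total_tokens = sum(self.token_counts[label].values())
            # Smoothed log prior, then smoothed log likelihood of each token.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for tok in tokenize(request):
                count = self.token_counts[label][tok]
                score += math.log((count + 1) / (total_tokens + len(vocab) + 1))
            scores[label] = score
        return ATTACK if scores[ATTACK] > scores[BENIGN] else BENIGN


def record_analyst_verdict(model, request, label):
    """Feedback loop: an analyst-confirmed label becomes new training data."""
    model.add_example(request, label)


# Hypothetical, hand-labeled training set with equal positive and negative examples.
training_set = [
    ("id=42", BENIGN),
    ("q=red shoes&page=2", BENIGN),
    ("user=alice&sort=name", BENIGN),
    ("id=1' OR '1'='1", ATTACK),
    ("q=x' UNION SELECT password FROM users--", ATTACK),
    ("user=admin'--", ATTACK),
]

model = NaiveBayes()
for request, label in training_set:
    model.add_example(request, label)

# A style of attack absent from the training set may slip through at first.
missed = "item=5; DROP TABLE orders"
before_feedback = model.predict(missed)

# An analyst confirms it was an attack; the verdict is fed back into the model.
record_analyst_verdict(model, missed, ATTACK)
after_feedback = model.predict("item=8; DROP TABLE users")

print("before feedback:", before_feedback)
print("after feedback:", after_feedback)
```

With only six training examples the model is far too small to be useful; the point is the shape of the loop, in which labeled examples of both classes go in and analyst verdicts on production results become further training data.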

All of these parts are tightly linked. Collecting high-quality, consistent data is critical to the functionality of machine learning deployments. If the data is noisy, a system will train incorrectly and produce bad results. Once an algorithm is trained, real-world data must be measured and collected with the same sensors that produced the training data. Otherwise, it’s like training the algorithm using text documents and expecting it to recognize speech.
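
One way to picture the "same sensors" requirement in software is to route both training data and production traffic through a single feature-extraction function, so the model never sees inputs prepared two different ways. The following is a minimal sketch under that assumption; the feature names, the keyword list, and the `extract_features` and `check_schema` helpers are hypothetical.

```python
from urllib.parse import unquote_plus

# Feature names fixed at training time (hypothetical, for illustration).
FEATURE_NAMES = ("length", "quote_count", "has_sql_keyword")
SQL_KEYWORDS = ("union", "select", "drop")


def extract_features(raw_request):
    """Decode and normalize a request the same way for training and production."""
    decoded = unquote_plus(raw_request).lower()
    return {
        "length": len(decoded),
        "quote_count": decoded.count("'"),
        "has_sql_keyword": int(any(k in decoded for k in SQL_KEYWORDS)),
    }


def check_schema(features, expected=FEATURE_NAMES):
    """Refuse to score data whose features differ from those used in training."""
    if tuple(features) != expected:
        raise ValueError(f"feature mismatch: {tuple(features)} != {expected}")
    return features


# The same payload, seen raw in a training log and URL-encoded in live traffic,
# yields identical features because both go through one extraction path.
training_row = check_schema(extract_features("id=1' OR '1'='1"))
production_row = check_schema(extract_features("id=1%27+OR+%271%27%3D%271"))
print(training_row == production_row)
```

If production traffic were instead scored without the decoding step, the quote count for the encoded request would be zero, and the model would be judging data that looks nothing like what it was trained on.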

The bottom line is that in order to achieve machine learning’s full potential, it has to be used effectively. In the case of cybersecurity, machine learning is working wonders to mitigate the massive volume of SQLi attacks that have risen to the forefront of cloud-based environments.

