Description of the use case

Numerous machine learning (ML)-based methods have been proposed for cybersecurity applications, in particular for detecting malicious objects and activities. In practice, such methods are integrated into larger systems or analytical pipelines, where ML model verdicts contribute to more general detection logic that also includes rule-based approaches and human expert decisions.

WithSecure's use case comprises several ML models used at various points in our technology stack. We present two initial examples here; as the project progresses, further models may be added to the scope of the study.

1) A model for detecting malicious MS Word documents. We started with detection of the Emotet malware family[1], developing a model that identifies the malicious macros Emotet uses; the model was later extended to classify any macro-based malware sample. The underlying model is a Random Forest, and the feature set includes over one hundred document metadata properties together with counts of keywords in document macros.
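
To make the setup concrete, the sketch below shows the general shape of such a classifier, assuming scikit-learn. The keyword list, document fields, and toy corpus are purely illustrative; the production feature set of over one hundred properties is not reproduced here.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Illustrative macro keywords; the real feature set is not public.
    SUSPICIOUS_KEYWORDS = ["AutoOpen", "Shell", "CreateObject", "Chr"]

    def extract_features(doc):
        """Map one parsed document to a fixed-length vector: a few
        metadata properties followed by macro keyword counts."""
        meta = [doc["page_count"], int(doc["has_macros"])]  # hypothetical fields
        macro_src = doc.get("macro_source", "")
        return meta + [macro_src.count(kw) for kw in SUSPICIOUS_KEYWORDS]

    # Toy corpus standing in for a labelled benign/malicious document set.
    docs = [
        {"page_count": 3, "has_macros": True, "label": 1,
         "macro_source": "Sub AutoOpen()\n  Shell(Chr(99))\nEnd Sub"},
        {"page_count": 12, "has_macros": False, "macro_source": "", "label": 0},
    ] * 50
    X = np.array([extract_features(d) for d in docs])
    y = np.array([d["label"] for d in docs])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    clf = RandomForestClassifier(n_estimators=300, random_state=0)
    clf.fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))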

2) A model for profiling Windows machines and assigning them to a small set of categories, such as “technical”, “non-technical”, and “server”. Knowing these categories enables us to design more precise, endpoint-type-specific rules in our real-time attack detection engine, reflecting, for instance, the importance of a given endpoint. The model combines a TF-IDF transformation with an XGBoost classifier, applied to the counts of certain events (“new process”, “network connection”, and “file access”) observed on Windows endpoints.
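
A minimal sketch of this pipeline, assuming scikit-learn and the xgboost package, follows; the event counts, category encoding, and hyperparameters are invented for illustration.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfTransformer
    from xgboost import XGBClassifier

    CATEGORIES = ["technical", "non-technical", "server"]  # encoded as 0, 1, 2

    # Toy data: one row per endpoint, columns are raw counts of the three
    # event types ("new process", "network connection", "file access")
    # observed over a fixed time window; the values are made up.
    counts = np.array([
        [500, 120,  900],   # developer workstation -> "technical"
        [ 60,  40,  150],   # office machine        -> "non-technical"
        [ 30, 800, 4000],   # server                -> "server"
    ] * 40)
    labels = np.array([0, 1, 2] * 40)

    # TF-IDF re-weights the raw counts so that event types common across
    # all endpoints contribute less than endpoint-specific ones.
    X = TfidfTransformer().fit_transform(counts)

    model = XGBClassifier(n_estimators=100, max_depth=3)
    model.fit(X, labels)
    print(CATEGORIES[int(model.predict(X[:1])[0])])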


What is the pilot for?

At a high level, the plan is to study threats to several machine learning (ML) models used in our technology stack or in cybersecurity research, and to propose methods of mitigating those threats. We will focus on evasion and poisoning attacks against the selected ML models: enumerating potential attack avenues; designing, implementing, and testing practical attacks; and quantifying the susceptibility of our models to them. This will be followed by designing suitable defences, verifying their effectiveness and their potential impact on the ML models under protection, and assessing the residual risks.
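
As an illustration of what quantifying susceptibility can mean in practice, the sketch below mounts a naive greedy evasion attack in feature space against a stand-in Random Forest and reports the fraction of detected malicious samples that the attack pushes across the decision boundary. The synthetic data, attacker budget, and step size are all assumptions made for illustration; a real attack would also have to preserve the semantics of the underlying sample (e.g. a macro that still executes).

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Stand-in detector trained on synthetic data (class 1 = "malicious").
    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

    def evade(x, budget=5, step=0.5):
        """Greedy search: at each step, apply the single-feature change
        that most reduces the malicious-class probability."""
        x = x.copy()
        for _ in range(budget):
            candidates = []
            for i in range(len(x)):
                for delta in (-step, step):
                    x2 = x.copy()
                    x2[i] += delta
                    candidates.append((clf.predict_proba([x2])[0, 1], x2))
            p, x = min(candidates, key=lambda c: c[0])
            if p < 0.5:  # already classified as benign
                break
        return x

    # Evasion rate: share of correctly detected malicious samples that a
    # budget-limited attacker can flip to a benign verdict.
    malicious = X[(y == 1) & (clf.predict(X) == 1)]
    evaded = sum(clf.predict([evade(x)])[0] == 0 for x in malicious)
    print(f"evasion rate: {evaded / len(malicious):.0%}")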


Why is it useful for WithSecure?

ML security currently receives little attention from industry, even though the threat of adversarial ML attacks is serious and attacks against real-world systems have already been reported. ML models differ conceptually from traditional information systems and their components in important ways: they expose new security weaknesses and vulnerabilities that cannot be mitigated by traditional cybersecurity defences and that require new approaches. Through the work and collaboration in SPATIAL, we would like both (i) to better understand how to protect ML models in our own systems, and (ii) to strengthen our expertise and develop our tools for producing new ML security services to offer to our customers.


What tangible results are expected from it?

We see two primary targets at the moment. The first is a methodology for security assessment of ML-powered systems, illustrated by the examples selected for the pilot; we aim at a practically viable methodology that can serve as a foundation for our commercial offerings. The second target is to ensure that our own ML-powered systems are resilient against attacks and abuse, and to provide sound evidence of that resilience.

[1] https://blog.f-secure.com/emotet-returned-from-vacation-and-is-active-again-how-to-reduce-risk-in-your-environment/