Learn more about security threat modelling for AI-based systems!
This article aims to provide an introduction to known vulnerabilities of ML-based systems:
- First, it examines why machine learning models expose an increased attack surface in the first place.
- Second, it gives a brief overview of the main vulnerabilities of machine learning systems.
Although the practical application of machine learning (ML) has been a highly active and flourishing field of research in recent years, the security of ML-based systems has not yet been sufficiently addressed. Security, however, is essential for the broad and successful adoption of ML-based systems, as they expose new security vulnerabilities that conventional computer systems do not.
Machine learning systems differ from conventional algorithms and computer programs: they learn their behaviour and make decisions based on data gathered from the environment in which they are deployed.
Consequently, machine learning systems present different vulnerabilities, are exposed to different threats and can be targeted by different attacks than conventional computer systems.
In the context of the project deliverable D1.2, “Security threat modelling for AI-based systems” [1], SPATIAL researchers addressed this topic by investigating the security and corresponding threat modelling of AI-based systems. More precisely, the researchers examined the increased attack surface of ML-based systems and provided a comprehensive overview of the main vulnerabilities that are specific to ML systems. In addition, both conventional security frameworks and new frameworks specifically tailored to analysing the security posture of machine learning systems were examined in the deliverable. The SPATIAL researchers found that the conventional approaches (e.g. STRIDE, PASTA, DREAD) used to analyse the threats and vulnerabilities of computer systems, and the means to protect them, may not be well suited or effective when applied to machine learning systems.
In contrast, the new security frameworks introduced by public organisations, including ENISA, MITRE, NIST, IBM, Microsoft, and WithSecure, are better suited for effective threat modelling of ML-based systems.
However, the SPATIAL researchers determined that, for all of the discussed security frameworks, a detailed and profound knowledge of the vulnerabilities of machine learning systems is essential for effective threat modelling. These frameworks usually require a high level of technical knowledge about the system being studied, its known or potential security vulnerabilities, and an understanding of how attacks work. In the case of ML applications, this knowledge is often missing because the vulnerabilities, and the attacks that exploit them, are new and still largely unknown to security experts.
Machine Learning Models Exhibit an Increased Attack Surface
In contrast to a traditional software program, the behaviour of a machine learning model is learned from the data it is trained with. This has four main implications that, compared to conventional software systems, increase its attack surface and introduce new vulnerability types.
- First, as the model behaviour is learned from training data, the information in the training data is inherently embedded into the machine learning model and, by transitivity, into its predictions. This means that the machine learning model and its predictions could be used to compromise the confidentiality of the training data and its data sources, even if the training data is well protected using encryption and secure storage mechanisms.
- Second, the fact that the model behaviour is learned from data means that an attacker can compromise a machine learning model by compromising its training data or data sources. Any compromise (e.g., loss of integrity) of these assets before training will be carried over into the model during training.
- Third, it is hard to verify machine learning models. Unlike traditional software libraries, it is difficult to read the code of a machine learning model and identify potential flaws and threats (which can be further aggravated by the non-deterministic decision-making of some models). Furthermore, while input-output-based validation of machine learning models can produce statistical evidence of their expected functionality, validation results cannot prove their correctness for all the possible inputs. This naturally brings about supply chain attack risks since third-party machine learning models are often used either “as is” or as a foundation for training other models.
- Fourth, detecting adversarial data inputs for training and inference is challenging. Machine learning is used in the first place to cope with the fact that we cannot define explicit rules to model certain data or phenomena. Consequently, there are typically no easy and scalable means to decide whether some input data is benign or malicious. Furthermore, inputs are taken from the environment in which the machine learning system is deployed or from its users, and such input spaces cannot be narrowly defined. This also means attackers can compromise machine learning systems by manipulating their environment or controlling some of their users.
Vulnerabilities of Machine Learning Systems
The four implications discussed above explain why machine learning systems expose additional and different vulnerabilities compared to traditional software systems. These new vulnerabilities also imply exposure to additional and different threats that must be considered when assessing and implementing security in machine learning systems.
A short overview of the vulnerabilities of machine learning systems is provided in the following.
1. Model Poisoning
Model poisoning is an attack designed to alter a machine learning model by influencing its training data or training process [2]. Model poisoning attacks compromise the integrity of a machine learning model. In a data poisoning attack, an adversary injects into the model’s training set malicious inputs designed to distort the model’s ability to classify inputs accurately. In this way, the attacker degrades the accuracy of the machine learning model for their own purposes. A model attacked in this fashion may become impractical for real-world use, so data poisoning attacks can also impact the availability of a machine learning system.
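The following minimal sketch illustrates the general idea with a simple label-flipping variant, in which the attacker tampers with existing training labels rather than injecting new records; the data set, the flipped fraction and the attacker's access to the training labels are assumptions made for illustration only.

```python
# Label-flipping data poisoning, minimal illustrative sketch (assumed setup, hypothetical names).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def flip_class_labels(labels, fraction, rng):
    """Attacker relabels a random fraction of the class-1 training records as class 0."""
    poisoned = labels.copy()
    class1_idx = np.where(labels == 1)[0]
    chosen = rng.choice(class1_idx, size=int(fraction * len(class1_idx)), replace=False)
    poisoned[chosen] = 0
    return poisoned

rng = np.random.default_rng(0)
clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, flip_class_labels(y_train, 0.6, rng))

print("accuracy of the clean model:   ", clean_model.score(X_test, y_test))
print("accuracy of the poisoned model:", poisoned_model.score(X_test, y_test))
```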
An alternative form of poisoning is the backdoor attack, in which the attacker poisons a small part of the training data to create trigger patterns in the model that can be activated during inference [3]. The model may perform well on most inputs, but its accuracy drops for specific inputs carrying the backdoor trigger, such as inputs that satisfy some secret condition or have specific properties chosen by the attacker. These poisoning attacks therefore affect the integrity of the machine learning model.
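A backdoor can be sketched in the same toy setting: the attacker stamps a trigger pattern onto a small fraction of the training records and relabels them to a target class, so that clean inputs are handled normally while triggered inputs are misclassified. The trigger pattern (two noise features forced to an unusual value), the poisoning rate and the model below are illustrative assumptions, not the deliverable's method.

```python
# Backdoor poisoning, minimal illustrative sketch (trigger pattern, rate and model are assumed).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# shuffle=False keeps the informative features first, so the last two columns are pure noise.
X, y = make_classification(n_samples=3000, n_features=20, shuffle=False, random_state=1)

def add_trigger(samples):
    """Attacker-chosen trigger: force the last two (noise) features to an unusual value."""
    triggered = samples.copy()
    triggered[:, -2:] = 6.0
    return triggered

# Poison 5% of the training data: stamp the trigger and relabel to the target class 1.
rng = np.random.default_rng(1)
idx = rng.choice(len(X), size=int(0.05 * len(X)), replace=False)
X_poisoned, y_poisoned = X.copy(), y.copy()
X_poisoned[idx] = add_trigger(X[idx])
y_poisoned[idx] = 1

model = LogisticRegression(max_iter=1000).fit(X_poisoned, y_poisoned)

# Accuracy on clean data stays high, while triggered inputs are pushed towards the target class.
print("accuracy on clean inputs:              ", model.score(X, y))
print("triggered inputs classified as class 1:", (model.predict(add_trigger(X)) == 1).mean())
```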
2. Model Evasion
Model evasion attacks [4] are the most common attacks on machine learning systems. Evasion attacks target the inference phase of the machine learning model lifecycle and compromise the integrity of the machine learning model’s predictions. Such attacks use well-crafted malicious inputs, a.k.a. adversarial examples, to change the model’s expected (and often correct) output and confuse it into making incorrect predictions. Evasion attacks typically aim to obtain a misclassification while making minimal modifications to the sample to be misclassified. For example, an attacker can implement an evasion attack to bypass a network intrusion detection system (NIDS) by minimally modifying malicious network packets while preserving their malicious utility and remaining undetected by the NIDS.
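The minimal sketch below applies the basic idea to a linear classifier, where the gradient of the decision score with respect to the input is simply the weight vector: stepping a detected sample slightly against the sign of that gradient (the intuition behind gradient-sign attacks) is often enough to flip the prediction. The model, the perturbation budget and the choice of class 1 as the "malicious" class are illustrative assumptions.

```python
# Evasion against a linear model, minimal gradient-sign-style sketch (assumed setup).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=2)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Samples the model currently flags as class 1 (the "malicious" class in this toy setting).
detected = X[model.predict(X) == 1]

# For a linear model, the gradient of the decision score w.r.t. the input is the weight
# vector, so a small step against its sign lowers the score with a minimal perturbation.
eps = 0.5
adversarial = detected - eps * np.sign(model.coef_[0])

print("fraction detected before perturbation:", (model.predict(detected) == 1).mean())
print("fraction detected after perturbation: ", (model.predict(adversarial) == 1).mean())
```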
3. Model Stealing
Model stealing attacks, also called model extraction, are adversarial machine learning attacks that compromise the confidentiality and the intellectual property of a machine learning model during inference. They can be used to steal the machine learning model itself and as a stepping stone to launch other attacks, e.g., white-box evasion attacks. In this scenario, an attacker exploits the fact that machine learning models leak information about their internal decision logic through the query/response interactions provided during inference. By carefully crafting adversarial queries to the machine learning model (via an API), an attacker can exploit the information in the returned predictions to reconstruct a surrogate machine learning model with similar performance and behaviour to the victim model. The target model’s predictions thus leak information and compromise its confidentiality.
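A minimal extraction sketch looks as follows: the attacker sends crafted queries to a prediction endpoint, records only the returned labels, and trains a surrogate on the query/answer pairs. The query_api function below is a hypothetical stand-in for a real prediction API, and the query strategy (uniform random inputs) and model choices are assumptions for illustration.

```python
# Model extraction, minimal illustrative sketch (query_api is a hypothetical stand-in).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=10, random_state=3)
victim = GradientBoostingClassifier(random_state=3).fit(X, y)  # the model behind the API

def query_api(samples):
    """Stand-in for the victim's prediction endpoint: the attacker only sees the labels."""
    return victim.predict(samples)

# The attacker crafts queries (here: random inputs in the data range) and records the answers.
rng = np.random.default_rng(3)
queries = rng.uniform(X.min(axis=0), X.max(axis=0), size=(5000, X.shape[1]))
answers = query_api(queries)

# A surrogate trained on the query/answer pairs mimics the victim's decision logic.
surrogate = DecisionTreeClassifier(random_state=3).fit(queries, answers)
agreement = (surrogate.predict(X) == victim.predict(X)).mean()
print("surrogate agrees with the victim on", agreement, "of the original inputs")
```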
4. Training Data Inference
Data inference attacks take advantage of the information leaked by machine learning systems and use this information to compromise the confidentiality of the training data and threaten the privacy of the individuals or organisations whose data was used in the training set. Two main types of inference attacks exist: membership inference and model inversion (a.k.a. attribute inference).
Membership inference attacks assume a situation where access to a model is readily available. Such an attack attempts to identify whether a record is included in the training data. There are many cases where this attack can have a serious impact, for example when it attempts to uncover sensitive personal data such as purchase records, locations, or medical records. The basic idea of a membership inference attack is to learn the difference between the target model’s behaviour on inputs it has already seen in the training data set and its behaviour on unseen inputs.
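One common instantiation is a confidence-threshold attack, sketched below: an overfitted model tends to be more confident on records it was trained on, so the attacker simply flags as "members" the records for which the returned confidence exceeds a threshold. The data set, the target model and the threshold value are assumptions for illustration.

```python
# Confidence-threshold membership inference, minimal illustrative sketch (assumed setup).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=4)
X_members, X_outsiders, y_members, y_outsiders = train_test_split(X, y, test_size=0.5, random_state=4)

# An overfitted target model is more confident on the records it has seen during training.
target = RandomForestClassifier(n_estimators=50, random_state=4).fit(X_members, y_members)

def confidence(model, samples):
    """Confidence the model assigns to its predicted class for each queried record."""
    return model.predict_proba(samples).max(axis=1)

# The attacker guesses "member" whenever the confidence exceeds a chosen threshold.
threshold = 0.9
print("true members flagged as members:", (confidence(target, X_members) > threshold).mean())
print("non-members flagged as members: ", (confidence(target, X_outsiders) > threshold).mean())
```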
Model inversion attacks assume a situation where an attacker already has partial knowledge about a data record and tries to infer the values of its missing attributes. The attack methodology is similar to that of membership inference attacks. In this case, the attacker repeatedly queries the target model with different possible values of a missing attribute and analyses the outputs to discover the value that is indeed in the corresponding record of the training data set [5].
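The sketch below illustrates this for a single missing binary attribute: the attacker enumerates the candidate values, queries the model with each completed record, and keeps the value that makes the model most confident in the record's known label. The synthetic data, the sensitive attribute and the attacker's partial knowledge are all illustrative assumptions.

```python
# Model inversion for one missing binary attribute, minimal illustrative sketch (assumed setup).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 1000
sensitive = rng.integers(0, 2, size=n).astype(float)   # attribute the attacker wants to recover
known = rng.normal(size=(n, 5))                         # attributes the attacker already knows
X = np.column_stack([sensitive, known])
y = (2.0 * sensitive + known[:, 0] + 0.3 * rng.normal(size=n) > 1.0).astype(int)

target = LogisticRegression(max_iter=1000).fit(X, y)    # the model the attacker can query

def invert_attribute(model, known_features, known_label, candidates=(0.0, 1.0)):
    """Try each candidate value for the missing attribute and keep the one that makes
    the model most confident in the record's known label."""
    scores = [model.predict_proba(np.r_[value, known_features].reshape(1, -1))[0, known_label]
              for value in candidates]
    return candidates[int(np.argmax(scores))]

# Recover the sensitive attribute of training records from the known attributes and label.
guesses = np.array([invert_attribute(target, X[i, 1:], y[i]) for i in range(300)])
print("sensitive attribute recovered for", (guesses == sensitive[:300]).mean(), "of the records")
```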
5. Supply Chain Vulnerabilities
External components, libraries and pieces of code used in machine learning systems can be compromised and used to attack the resulting model or system. Besides the conventional supply chain vulnerabilities related to compromised third-party software, machine learning systems rely on specific third-party components, such as machine learning training libraries, pre-trained machine learning models and serialization libraries, all of which can be compromised in a supply chain attack. Hence, verifying the provenance and integrity of these external components is essential to enhance the security of ML-based systems.
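A basic building block for such verification is to check a downloaded artefact against a digest published by its provider before loading it, as in the minimal sketch below; the file name and the expected digest are hypothetical placeholders.

```python
# Integrity check for a third-party model artefact, minimal sketch (path and digest are hypothetical).
import hashlib

EXPECTED_SHA256 = "digest-published-by-the-model-provider"  # placeholder, not a real digest

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file without loading it fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model_artifact(path):
    """Refuse to use a pre-trained model whose digest does not match the published one."""
    actual = sha256_of(path)
    if actual != EXPECTED_SHA256:
        raise RuntimeError(f"model artefact {path} failed the integrity check: {actual}")
    return path

# verify_model_artifact("pretrained_model.bin")  # hypothetical artefact path
```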
6. Deployment Vulnerabilities
External cloud services are commonly used to control the cost of training and deploying machine learning models. Some cloud services even provide dedicated solutions for machine learning-specific tasks, offering standard machine learning libraries and monitoring solutions as part of their service. The compromise of a training or deployment platform can jeopardize the security of a machine learning system even if all its components and algorithms are secured. For example, secure and trusted machine learning processes and components can be replaced by malicious ones on the (compromised) platform that is supposed to execute them. Therefore, the integrity of the training and deployment platforms must be verified to secure machine learning systems and reduce potential attack vectors.
This article discussed why machine learning models and systems exhibit an increased attack surface compared to their traditional counterparts. Furthermore, the article presented a short overview of the vulnerabilities of ML-based systems. In this context, it is essential to highlight that a profound and detailed understanding of the attack surface and presented vulnerabilities is key for effective threat modelling and securing ML-based systems. Therefore, we refer the interested reader to the SPATIAL deliverable D1.2 [1] for more details on the presented vulnerabilities. In addition, this deliverable provides a comprehensive overview of conventional and new security frameworks specifically tailored for analysing the security posture of machine learning systems, as well as a discussion and security analysis of different machine learning architectures.
*This article is intended to provide only an entry point to the knowledge of ML vulnerabilities. For a more detailed discussion, we refer the interested reader to the SPATIAL deliverable D1.2 [1].
References
[1] S. Marchal, A. Kirichenko, A. Patel, M. Boerger, N. Tcholtchev, M.-D. Nguyen, V. Hoa La, A. R. Cavalli, C. Soriente, N. Kourtellis, D. Perino, A. Lutu, S. Park, P. Bagave, A. Ding, M. Westberg, M. Liyanage, S. Wang, B. Siniarski, C. Sandeepa, and T. Senevirathna, “SPATIAL D1.2 – Security Threats modelling for AI-based System Architectures”, H2020 Project SPATIAL – Grant agreement No. 101021808, Nov. 2022.
[2] B. Biggio and F. Roli, “Wild patterns: Ten years after the rise of adversarial machine learning,” Pattern Recognition, vol. 84, pp. 317-331, 2018.
[3] A. Saha, A. Subramanya and H. Pirsiavash, “Hidden trigger backdoor attacks,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2020.
[4] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Srndic, P. Laskov, G. Giacinto and F. Roli, “Evasion attacks against machine learning at test time,” in Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), 2013.
[5] S. Yeom, I. Giacomelli, M. Fredrikson and S. Jha, “Privacy risk in machine learning: Analyzing the connection to overfitting,” in Proceedings of the IEEE 31st Computer Security Foundations Symposium (CSF), 2018.