Adversarial Attacks against ML systems

The increasing use of artificial intelligence (AI) and machine learning (ML) in many applications raises security concerns. Machine learning systems expose new security vulnerabilities that traditional systems do not. These vulnerabilities are mostly associated with the strong, yet partly opaque, link between machine learning models and the data they use during training and inference. Several such vulnerabilities have already been discovered and exploited by so-called adversarial machine learning attacks, such as the following:

  • Model poisoning: An attack whereby an attacker maliciously injects training data or modifies a machine learning model’s training data or training logic. This attack compromises the integrity of the machine learning model, reducing the correctness and/or confidence of its predictions overall (denial-of-service attacks) or on selected inputs (backdoor attacks).
  • Model evasion: An attack whereby an attacker maliciously selects or constructs inputs sent to a machine learning model at inference time. This attack succeeds if the attacker’s inputs receive incorrect or low-confidence predictions from the targeted machine learning model.
  • Model stealing: An attack whereby an attacker builds a copy of a victim’s machine learning model by querying it and using the queries and predictions to train a surrogate model. This attack compromises the confidentiality and the intellectual property of the victim’s machine learning model.
  • Training data inference: An attack whereby an attacker infers characteristics or reconstructs parts of the data used to train a machine learning model (model inversion and attribute inference) or verifies whether specific data were used in the model training (membership inference). This attack relies either on querying the target model and analyzing its predictions or on reverse-engineering the model. This attack results in compromising the confidentiality of the data used to train the machine learning model.
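To make the model stealing attack above concrete, here is a minimal, purely illustrative sketch. The "victim" is a hypothetical one-dimensional threshold classifier (the threshold value and model family are assumptions made for the example, not any real deployed system); the attacker can only query it, yet recovers its decision boundary from query/prediction pairs.

```python
import random

# Hypothetical "victim" model: a proprietary threshold classifier.
# The attacker can only query it, not inspect its parameters.
SECRET_THRESHOLD = 0.37

def victim_predict(x):
    return 1 if x > SECRET_THRESHOLD else 0

# Attack step 1: query the victim on many inputs and record its answers.
random.seed(0)
queries = [random.random() for _ in range(10_000)]
labels = [victim_predict(x) for x in queries]

# Attack step 2: train a surrogate that imitates those answers. Because
# the surrogate here belongs to the same simple model family, "training"
# reduces to estimating the decision boundary from the labelled queries.
max_zero = max(x for x, y in zip(queries, labels) if y == 0)
min_one = min(x for x, y in zip(queries, labels) if y == 1)
stolen_threshold = (max_zero + min_one) / 2

def surrogate_predict(x):
    return 1 if x > stolen_threshold else 0
```

With 10,000 queries the recovered threshold is within a fraction of a percent of the secret one, and the surrogate agrees with the victim on every query it made: the model's intellectual property has effectively leaked through its prediction interface.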

The existence of such attacks and the increasing usage of machine learning models for automated decision-making in various applications increases the security threats against ML-based systems. As a result, it is paramount to understand and manage the security risk created by using machine learning in larger systems and the potential impact that would result from a successful attack against the ML model.

Challenges in ML security assessment

Assessing the security risk associated with an ML system is not straightforward. It requires multidisciplinary technical skills, knowledge of emerging technologies, and an understanding of the ecosystem in which the ML-based system operates. First, we need to understand the impact of compromise based on the usage of the ML-based system and the decisions it makes. Second, we must (i) identify the security threats, (ii) discover the system vulnerabilities, and (iii) understand how adversarial ML attacks can exploit these vulnerabilities. Finally, we must know whether we have deployed mechanisms (defences) that can mitigate the threats and vulnerabilities against our system.

Performing these tasks is currently challenging because of three main factors:

  • Little awareness of the vulnerabilities and attacks specific to ML systems.
  • Little understanding of the attack vectors through which these vulnerabilities can be exploited once ML models are integrated into larger systems.
  • Limited availability of people with a deep understanding of both security and ML.

A solution to self-assess the security of ML systems

To partially address these challenges, we designed three questionnaires to assist machine learning practitioners, security experts and decision-makers in this risk assessment process. These questionnaires contain leading questions to help understand the security risk associated with a given ML system. These questions help the respondents get a better understanding of the vulnerabilities of and possible attacks against their own ML systems. The questions also hint at measures that can be adopted to reduce these vulnerabilities and mitigate the attacks. The three questionnaires are defined as follows:

  • Risk and impact assessment assesses how well you manage the security risks associated with your ML system(s). It analyses your approach to threat analysis and impact assessment both in a generic context and when considering security threats specific to machine learning systems.
  • Attack surface and vulnerabilities helps to identify the attack surface of your machine learning system and to discover its potential vulnerabilities at the several stages of its lifecycle.
  • The security of your ML system assesses how secure and robust your ML system is. It helps to identify whether you know the security threats against your system and whether you have discovered its potential vulnerabilities. It also explores whether you have processes and techniques in place to mitigate potential attacks.

Each questionnaire can be answered individually by a different person. Answering all three questionnaires provides a complete picture of the security status of the assessed ML system. Respondents can remain anonymous; no personal information is required while answering the questionnaires. The goals of these three questionnaires are fourfold:

  • Raise awareness about security threats against ML systems
  • Help machine learning practitioners assess the security of their own ML systems
  • Share solutions and practices that can increase the security of ML systems
  • Use the collected answers to infer global trends about the current state of ML system security

The questionnaires can be accessed at the following link.

Risk and impact assessment

To grasp the importance of machine learning security, it is important first to understand the security risk associated with the ML system at a high level. This questionnaire primarily aims to raise awareness about the consequences of a security incident on the overall ecosystem in which the ML system is used. Second, it aims to ensure that respondents are prepared for such an incident. The questions explore the application domain of the ML system and how its predictions/recommendations are used. They also investigate the processes set up to monitor and manage the security risk. A set of questions targets risk assessment and attempts to identify the processes in place to manage the risk, such as:

  • Threat analysis exercises
  • Identification and monitoring of vulnerabilities
  • Use of metrics to quantify and monitor the security risk
  • Classification and ranking of different risk factors
  • Response procedures to mitigate the security risk

Another set of questions focuses on understanding the impact of a successful attack, which depends on several factors:

  • The application domain in which the ML system is used (e.g., high-risk applications)
  • The concrete damage caused by a successful attack on the business, users, society, etc.
  • The adversarial ML attack(s) that succeeds
  • The asset(s) that can be compromised (the ML model or the data it is trained with)

The targeted respondents for this questionnaire are people who understand the ML model’s business usage in a broad context and are familiar with generic security concepts and risk management. It can be answered by upper management or by legal or risk management experts, and it is available at the following link.


Attack surface and vulnerabilities

The attack surface an ML system exposes provides the attack vectors through which it can be compromised. This questionnaire reviews the design, implementation and deployment choices of the analysed ML system in order to point out its vulnerabilities to adversarial ML attacks. Depending on these choices, adversarial ML attacks will be easier or harder for an attacker to perform against the system. The questionnaire analyses the attack vectors against the ML system according to the five stages of the ML model lifecycle:

  1. Design & Implementation: Choice and implementation of the model, training algorithm, optimization method and definition of input features.
  2. Data collection: Gathering a dataset, labelling it, sanitizing it and transforming it into the selected representation (e.g., feature extraction) that can be input to the ML model. This also includes splitting the whole dataset into training, validation, and testing sets.
  3. Training: Training the ML model on the prepared datasets using the training method selected during design and implementation. This includes tuning hyperparameters and the iterative training + validation steps required to improve model performance.
  4. Deployment & Integration: Choice of the deployment platform to perform inference and integrate the ML model into the overall machine learning-based system.
  5. Inference: Process where the model serves its primary purpose: providing predictions or recommendations for inputs submitted to it.


The targeted respondents for this questionnaire are people with good knowledge of the ML model lifecycle, the ML system architecture and machine learning concepts in general. It can be answered by data scientists, data engineers or software engineers, and it is available at the following link.


Data collection

The data collection process is primarily associated with vulnerabilities to model poisoning attacks and, to a lesser extent, with training data inference. Depending on how data collection is performed, how big the training dataset is, which and how many data sources provide data, etc., an attacker will have a different ability to:

  • inject poisoned training data during the collection
  • compromise or overwrite the labels of data
  • compromise the data after it has been collected

For instance, an attacker can easily inject poisoned data by compromising a data source if many untrusted data sources provide data to the training dataset. Similarly, if there are processes for providing customer feedback that can change the labels of data, an attacker can more easily compromise these labels by using the feedback process. The questions in this sub-questionnaire investigate the aspects of data provenance, labelling process, data processing and storage to uncover vulnerabilities during data collection, which are mostly related to training data compromise.
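One basic mitigation mentioned later in this article, data sanitization, can screen collected data for injected outliers before training. The sketch below is a deliberately simple example using a z-score-style filter on one-dimensional values; the dataset and threshold `k` are illustrative assumptions, and real pipelines would use domain-appropriate, multivariate checks.

```python
import statistics

def sanitize(samples, k=3.0):
    """Drop samples whose value lies more than k standard deviations
    from the mean -- a crude screen for injected outliers."""
    mean = statistics.fmean(samples)
    stdev = statistics.pstdev(samples)
    return [x for x in samples if abs(x - mean) <= k * stdev]

clean = [10.0 + 0.1 * i for i in range(50)]  # legitimate measurements
poisoned = clean + [999.0, -500.0]           # attacker-injected outliers
filtered = sanitize(poisoned)                # the two outliers are removed
```

Note that such statistical filters only catch poisoned points that look anomalous; carefully crafted poisoning samples that stay inside the clean data distribution require stronger provenance and access controls, as the questions in this sub-questionnaire highlight.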


Design & implementation

The choice of a machine learning training algorithm, the type of its input data and the chosen representation of the data can make the ML system more or less vulnerable to all adversarial ML attacks. For instance, crafting adversarial examples for model evasion is typically easier against complex ML models that take input data represented with many features. Training data inference attacks are also easier against complex ML models, which have a greater capacity to memorize the data they are trained with. Model poisoning, on the other hand, is easier against simple ML models. The questions in this sub-questionnaire investigate the type of data and ML model used by the ML system to infer vulnerabilities coming from the design of the ML system.
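The point about many input features can be illustrated with a minimal, FGSM-style evasion sketch against a hypothetical linear classifier (the weights and input below are invented for the example). Each feature is nudged against the sign of its weight, so the more features there are, the smaller each individual, hard-to-notice nudge needs to be to flip the prediction.

```python
# Hypothetical linear classifier: score = w.x + b, label 1 if score > 0.
w = [0.5, -1.2, 0.8, 0.3]   # illustrative weights, not from a real model
b = -0.1

def predict(x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def sign(v):
    return (v > 0) - (v < 0)

def evade(x, eps):
    """FGSM-style perturbation: nudge every feature against the sign
    of its weight, pushing the score towards the decision boundary."""
    return [xi - eps * sign(wi) for xi, wi in zip(x, w)]

x = [1.0, -0.5, 0.6, 0.2]   # legitimately classified as label 1
adv = evade(x, eps=0.6)     # small per-feature nudge flips it to label 0
```

Every feature contributes `eps * |w_i|` to lowering the score, which is why high-dimensional inputs give the attacker so much room to work with.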


Training process

The implementation of the training process is primarily associated with vulnerabilities to model poisoning. A compromise of the training process can also have an impact on inference-time attacks like stealing, evasion and data inference. For instance, using a public pre-trained ML model increases the attacker’s knowledge about this model. This knowledge can be used to improve the success of stealing, evasion and training data inference attacks. A pre-trained ML model can already be compromised (e.g., poisoned) if an unknown or untrusted party trains it. This compromise can transfer to an ML model retrained from it and poison it as well. Another example of a training factor that influences the vulnerability to poisoning is the periodicity of retraining the ML model: a poisoning attack can be launched every time the model is retrained. The questions in this sub-questionnaire investigate the libraries, pre-trained models and training platforms that are used to build the ML model. The more external parties are involved during the training process, the greater the exposure to attacks performed by one of these parties, which can be malicious or compromised.


Deployment & Integration + Inference

How an ML model is deployed and integrated into a larger system is a key factor that conditions its exposure to attacks. It is the main factor defining its vulnerability to inference-time attacks: model evasion, model stealing and training data inference. For instance, hiding the decision process of an ML model, e.g., by deploying it remotely in a cloud service, reduces the knowledge an attacker can obtain about this model. This increases the difficulty of mounting a successful adversarial ML attack and makes the ML system more robust. The type of system or user the ML model interacts with also conditions its exposure to attacks. If its inputs come from trusted systems, there is little chance a model evasion attack can manipulate them. Similarly, if the ML model’s predictions can only be consumed by trusted systems, there is little chance that an attacker can extract information from them to perform a model stealing or a training data inference attack. Finally, if untrusted users or systems consume predictions, returning coarse predictions reduces the information leakage about the model and its training data. This can mitigate all inference-time adversarial ML attacks. The questions in this sub-questionnaire investigate the deployment of the model and its interactions with other system components to infer its vulnerabilities at inference time.
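Coarsening predictions can be as simple as the sketch below: instead of exposing the full confidence vector, the API returns only the top label with a rounded confidence. The class names and scores are made up for illustration.

```python
def coarse_prediction(probs, decimals=1):
    """Reduce the information a prediction API leaks: return only the
    top label and a rounded confidence instead of the full vector."""
    label = max(probs, key=probs.get)
    return label, round(probs[label], decimals)

# Hypothetical raw model output an attacker would love to see in full:
raw = {"cat": 0.8734, "dog": 0.1042, "fox": 0.0224}
label, conf = coarse_prediction(raw)  # -> ("cat", 0.9)
```

Rounded or label-only outputs give attackers far less signal per query for stealing or inference attacks, at the cost of less informative responses for legitimate consumers.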


Even though choices at the different stages of the ML model lifecycle influence the vulnerability of the system to attacks, these are not typical vulnerabilities that can be fixed with a simple patch. Decreasing the exposure of the ML system to one attack may increase its exposure to another or impair its performance. Thus, fixing the ML vulnerabilities discussed above is mostly a matter of making compromises. Answering this questionnaire helps in understanding the several vulnerabilities of ML systems, comparing them with the system and security requirements, and making an informed choice about the performance-security trade-off one wants to meet. For instance, one can choose to accept the vulnerabilities associated with the least prominent security threats. Alternatively, defences can be deployed to mitigate the known vulnerabilities.


Security of your ML system

Even though an ML system exposes vulnerabilities, its security can be safeguarded. If one knows the vulnerabilities of the system and how easy or hard they are to exploit, defences can be deployed to cope with the most serious security threats. This last questionnaire evaluates the security of the ML system and its readiness to respond to potential attacks. It primarily aims to assess the security of the ML system with respect to attacks specific to ML systems, a.k.a. adversarial ML attacks. The questionnaire does not evaluate traditional system, software or network security, which is assumed to be already in place. The first part of this questionnaire focuses on security assessment, and it evaluates how informed respondents are about the current security of their ML system. The second part evaluates the deployed processes and defences that can mitigate adversarial ML attacks.

Security assessment

Properly securing an ML system requires being informed about the attacks likely to exploit its vulnerabilities. This sub-questionnaire investigates the processes used for security assessment, such as security testing, penetration testing, etc. It asks whether adversarial ML attacks (model evasion, model poisoning, model stealing, etc.) have been evaluated against the ML system. Finally, it checks whether the ML system is secured from traditional perspectives, such as system, software or network security, by asking about compliance with generic security standards (e.g., the ISO/IEC 27000 series) and security certification of the system (e.g., under the Cybersecurity Act in Europe).

Mitigation of attacks

Processes and defences can be set up to increase the security of the ML system. Some defences have been developed to protect against adversarial ML attacks, such as adversarial training, which mitigates model evasion attacks, or differential privacy, which defends against training data inference attacks. This sub-questionnaire asks if such ML-specific defences are currently deployed to protect the ML system. It also investigates whether good ML practices are used, which are not primarily designed to mitigate attacks but can nevertheless increase the resilience of the system. For instance, traditional data sanitization and detection of abnormal inputs can mitigate model poisoning during training and model evasion during inference. Enforcing strong access control to data and ML models also mitigates model poisoning. Monitoring the performance of the ML model at inference and having fallback plans in case of critical performance degradation can reduce the impact of model evasion attacks. Finally, constraining queries to the ML model by limiting their rate slows down training data inference and model stealing attacks. This sub-questionnaire asks if such good ML practices are deployed to increase the resilience of the ML system to adversarial ML attacks.
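Query rate limiting, the last practice mentioned above, can be sketched as a classic token bucket in front of the prediction API. This is a generic pattern rather than an ML-specific defence; the capacity and window values below are arbitrary, and timestamps are passed in explicitly to keep the example deterministic.

```python
class QueryRateLimiter:
    """Token-bucket limiter for a prediction API: each client may make
    at most `capacity` queries per `window_seconds`. Slowing queries
    down raises the cost of model stealing and data inference attacks,
    which typically need very many queries to succeed."""

    def __init__(self, capacity, window_seconds):
        self.capacity = capacity
        self.refill_rate = capacity / window_seconds  # tokens per second
        self.tokens = float(capacity)
        self.last = 0.0  # timestamp of the last query seen

    def allow(self, now):
        # Refill tokens in proportion to the time elapsed, then spend one.
        elapsed = now - self.last
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

limiter = QueryRateLimiter(capacity=3, window_seconds=60)
burst = [limiter.allow(now=0.0) for _ in range(5)]  # only first 3 pass
later = limiter.allow(now=60.0)                     # bucket has refilled
```

Rate limiting does not stop a patient attacker outright, but combined with coarse predictions and query monitoring it raises the number of accounts, IP addresses and time an attack requires.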