D3.1 DETECTION MECHANISMS TO IDENTIFY DATA BIASES AND EXPLORATORY STUDIES ABOUT DIFFERENT DATA QUALITY TRADE-OFFS FOR AI-BASED SYSTEMS

The quality of the data used to train AI models is critical to guaranteeing robust performance. Beyond the large body of well-established methods for data fusion, preparation, and augmentation, additional data characteristics can contribute to making AI models trustworthy. Data fairness and transparency are key aspects to investigate when analyzing the features the model extracts from the data to support its autonomous decision-making. Advances in model training across distributed devices add a further layer of complexity when dissecting AI model logic. This document investigates fairness and transparency methods for detecting biases that may be introduced as models are re-trained over time. Through rigorous benchmarks and studies covering different application scenarios, we explore the performance of these methods and the data trade-offs that affect the overall model inference process. Our results suggest that detecting both induced and non-induced changes in the data used to train the models is possible. Doing so, however, requires augmenting standard machine learning pipelines with components that analyze data quality throughout the pipeline, from input data to model deployment.
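To make the idea of a pipeline data-quality component concrete, here is a minimal, illustrative sketch (not the deliverable's actual method): it flags per-feature distribution drift between a reference training sample and a new re-training batch using a two-sample Kolmogorov-Smirnov test. All names (`detect_feature_drift`, the example features) are hypothetical, and the significance threshold is a simplifying assumption.

```python
# Hypothetical sketch of a drift-detection component for an ML pipeline.
# It compares each feature's distribution in a candidate (re-training)
# sample against a reference (original training) sample.
import numpy as np
from scipy.stats import ks_2samp


def detect_feature_drift(reference, candidate, alpha=0.05):
    """Return per-feature drift flags: True where the candidate sample's
    distribution differs significantly from the reference sample."""
    flags = {}
    for name in reference:
        _stat, p_value = ks_2samp(reference[name], candidate[name])
        flags[name] = bool(p_value < alpha)  # drift if distributions differ
    return flags


rng = np.random.default_rng(0)
ref = {"age": rng.normal(40, 10, 1000), "income": rng.normal(50, 5, 1000)}
# Induce a shift in "income" to simulate a biased re-training batch
new = {"age": rng.normal(40, 10, 1000), "income": rng.normal(60, 5, 1000)}
print(detect_feature_drift(ref, new))
```

In a real deployment, such a check would run at each re-training step, so that both deliberately induced and accidental changes in the training data surface before the updated model is deployed.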

Read the deliverable here!