Towards a Standard for Identifying and Managing Bias in AI (2/3): NIST Special Publication 1270

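With the permission of the U.S. National Institute of Standards and Technology (NIST), we have excerpted what we consider the key points of the report "NIST Special Publication 1270: Towards a Standard for Identifying and Managing Bias in Artificial Intelligence," published in March 2022, translated them into Japanese, and decided to publish them on this blog in three installments. This is the second installment.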



Translated with permission courtesy of the National Institute of Standards and Technology (NIST), not an official US Government translation. All rights reserved, US Secretary of Commerce.

【Structure of NIST Special Publication 1270】

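Continuing from the previous installment, this second installment summarizes the main points from the latter half of Section 2, AI Bias: Context and Terminology, through the end of that section, within the structure shown below.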

Executive Summary

1 Purpose and Scope

2 AI Bias: Context and Terminology

3 AI Bias: Challenges and Guidance

4 Conclusions

2 AI Bias: Context and Terminology

2.3 A Socio-technical Systems Approach

Adopting a socio-technical perspective can enable a broader understanding of AI impacts and the key decisions that happen throughout, and beyond, the AI lifecycle–such as whether technology is even a solution to a given task or problem [3, 109].

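Adopting a socio-technical approach makes it possible to understand more broadly the impacts that can arise across the AI lifecycle and beyond it, and to make key judgments such as whether technology is really the solution to a given task or problem in the first place [3, 109].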

Reframing AI-related factors such as datasets, TEVV, participatory design, and human-in-the-loop practices through a socio-technical lens means understanding how they are both functions of society and, through the power of AI, can impact society.

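Re-examining AI-related elements such as datasets, TEVV (Test, Evaluation, Validation, and Verification), and participatory design, together with human-in-the-loop practices, through a socio-technical lens makes it possible to understand that they are functions of society and, at the same time, can in turn affect society through the power of AI.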

2.4 An Updated AI Lifecycle

This document has adapted a four-stage AI lifecycle from other stakeholder versions. The intent is to enable AI designers, developers, evaluators and deployers to relate lifecycle processes with AI bias categories and effectively facilitate its identification and management.

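This report adopts a four-stage AI lifecycle adapted from versions described by other stakeholders. The intent is to let AI designers, developers, evaluators, and deployers relate lifecycle processes to AI bias categories, and so identify and manage bias effectively.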

AI Lifecycles are iterative, and begin in the Pre-Design stage, where planning, problem specification, background research, and identification of data take place.


The Design and Development stage typically starts with analysis of the requirements and the available data. Based on this, a model is designed or selected.


The Deployment stage is when the AI system is released and used. Once humans begin to interact with the AI system the performance of the system must be monitored and reassessed to ensure proper function.


The Test and Evaluation stage is continuous throughout the entire AI Development Lifecycle. Organizations are encouraged to perform continuous testing and evaluation of all AI system components and features where bias can contribute to harmful impacts.


For example, if during deployment the model is retrained with new data for a specific context, the model deployer should work with the model producer to assess actual performance for bias evaluation. Multi-stakeholder engagement is encouraged to ensure that the assessment is balanced and comprehensive.
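As a rough illustration of what assessing actual performance for bias evaluation could involve after retraining, the sketch below compares selection rate and accuracy across groups. The report does not prescribe these particular metrics; the function name `group_rates`, the group labels, and the toy labels and predictions are all hypothetical.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return selection rate and accuracy per group for binary 0/1 labels."""
    stats = defaultdict(lambda: {"n": 0, "selected": 0, "correct": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        s = stats[group]
        s["n"] += 1
        s["selected"] += int(pred == 1)      # model predicted the positive class
        s["correct"] += int(pred == truth)   # prediction matched the label
    return {
        group: {
            "selection_rate": s["selected"] / s["n"],
            "accuracy": s["correct"] / s["n"],
        }
        for group, s in stats.items()
    }


# Hypothetical post-retraining evaluation data (not taken from the report).
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_rates(y_true, y_pred, groups)
selection = [r["selection_rate"] for r in rates.values()]
parity_gap = max(selection) - min(selection)  # gap in selection rates

for group, r in sorted(rates.items()):
    print(group, r)
print("selection-rate gap:", parity_gap)
```

In this toy data both groups have the same accuracy while their selection rates differ, which is one reason a single aggregate metric may not be enough for a balanced assessment.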


If deviations from desired goals are observed, the findings should feed into the model Pre-Design stage to ensure appropriate adjustments are made in data curation and problem formulation.


Any proposed changes to the design of the model should then be evaluated together with the new data and requirements to ensure compatibility and identification of any potential new sources of bias.


Then another round of design and implementation commences to formulate corresponding requirements for the new model capabilities and features and for additional datasets. During this stage, the model developer should perform continuous testing and evaluation to ensure that bias mitigation maintains effectiveness in the new setting, as the model is optimized and tested for performance.


Once released, the deploying organization should use documented model specifications to test and evaluate bias characteristics during deployment in the specific context. Ideally, this evaluation should be performed together with other stakeholders to ensure all previously identified problems are resolved to everyone’s satisfaction.


3 AI Bias: Challenges and Guidance

Through a review of the literature, and various multi-stakeholder processes, including public comments, workshops, and listening sessions, NIST has identified three broad areas that present challenges for addressing AI bias.


The first challenge relates to dataset factors such as availability, representativeness, and baked-in societal biases.

The second relates to issues of measurement and metrics to support testing and evaluation, validation, and verification (TEVV).

The third area broadly comprises issues related to human factors, including societal and historic biases within individuals and organizations, as well as challenges related to implementing human-in-the-loop.




NIST plans to work with the trustworthy and responsible AI communities to explore the proposed mitigants and governance processes, and build associated formal technical guidance over the coming years in concert with these communities.


3.1 Who is Counted? Datasets in AI Bias


3.1.1 Dataset Challenges

AI design and development practices rely on large scale datasets to drive ML processes. This ever-present need can lead researchers, developers, and practitioners to first “go where the data is,” and adapt their questions accordingly [124]. This creates a culture focused more on which datasets are available or accessible, rather than what dataset might be most suitable [108].

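In AI design and development, large-scale datasets are needed to run machine learning processes. This constant need can drive AI researchers, developers, and practitioners toward wherever data happens to exist and lead them to fit their questions to it [124]. The result is a culture that focuses more on which datasets are available or accessible than on which dataset is most suitable [108].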

As a result, the data used in these processes may not be fully representative of populations or the phenomena that are being modeled. The data that is collected can differ significantly from what occurs in the real world [77, 78, 119].

As a result, the data used in these processes may not adequately represent the population or the phenomenon being modeled. The collected data may also differ substantially from what happens in the real world [77, 78, 119].
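One simple way to make the representativeness concern concrete is to compare each group's share of a dataset with its share of the target population. The sketch below is only an illustration: the function name `representation_gap`, the group labels, and the reference shares are hypothetical and do not come from the report.

```python
from collections import Counter


def representation_gap(sample_groups, population_shares):
    """Return (sample share - population share) for each reference group."""
    if not sample_groups:
        raise ValueError("sample_groups must not be empty")
    counts = Counter(sample_groups)
    total = len(sample_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }


# Hypothetical dataset composition and reference population shares.
sample = ["A"] * 80 + ["B"] * 15 + ["C"] * 5
population = {"A": 0.60, "B": 0.25, "C": 0.15}

gaps = representation_gap(sample, population)
for group, gap in gaps.items():
    # Positive values mean the group is over-represented in the sample.
    print(f"{group}: {gap:+.2f}")
```

A check like this only covers the attributes for which reference shares are available, so it complements rather than replaces the contextual review the report calls for.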

Systemic biases may also be manifested in the form of availability bias when datasets that are readily available but not fully representative of the target population (including proxy data) are used and reused as training data.


Other issues arise due to the common ML practice of reusing datasets. Under such practices, datasets may become disconnected from the social contexts and time periods of their creation.


Even when datasets are representative, they may still exhibit entrenched historical and systemic biases, improperly utilize protected attributes, or utilize culturally or contextually unsuitable attributes. Latent variables such as gender can be inferred through browsing history, and race can be inferred through zip code.
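To see how an attribute such as zip code can act as a proxy, one rough check is to ask how much better a protected attribute can be guessed once the candidate proxy is known. The sketch below compares a majority-class guess with a per-proxy-value majority guess. The function name `proxy_strength`, the area codes, and the group labels are hypothetical, and this is only one of many possible proxy checks.

```python
from collections import Counter, defaultdict


def proxy_strength(proxy_values, protected_values):
    """Return (baseline accuracy, accuracy when guessing from the proxy)."""
    n = len(protected_values)
    if n == 0 or n != len(proxy_values):
        raise ValueError("inputs must be non-empty and of equal length")
    by_proxy = defaultdict(Counter)
    for proxy, protected in zip(proxy_values, protected_values):
        by_proxy[proxy][protected] += 1
    # Baseline: always guess the overall most common protected value.
    baseline = Counter(protected_values).most_common(1)[0][1] / n
    # With proxy: guess the most common protected value within each proxy value.
    with_proxy = sum(c.most_common(1)[0][1] for c in by_proxy.values()) / n
    return baseline, with_proxy


# Hypothetical records: two area codes with different group compositions.
area_codes = ["Z1"] * 10 + ["Z2"] * 10
group_labels = ["X"] * 9 + ["Y"] * 1 + ["X"] * 2 + ["Y"] * 8

baseline, with_proxy = proxy_strength(area_codes, group_labels)
print("baseline accuracy:", baseline)
print("accuracy using proxy:", with_proxy)
```

A large jump from the baseline suggests that dropping the protected attribute alone would not prevent a model from recovering it through the proxy.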


3.1.2 Dataset Guidance

Not only is the predictive behavior of the ML system determined by the data, but the data also largely defines the machine learning task itself [62].


The question of dataset fit or suitability requires attention to three factors: statistical methods for mitigating representation issues; processes to account for the socio-technical context in which the application is being deployed; and awareness of the interaction of human factors with the AI technical system at all stages of the AI lifecycle.



Statistical Factors

AI bias problems are exacerbated by the variety of statistical biases that are prevalent in the large scale datasets used in ML modeling. When these models are deployed for decision-based applications, often in high-risk settings and off-label uses, harms can be perpetuated and amplified.


Consequently, a model trained on biased and erroneous data may lead to biased and inaccurate predictions. Moreover, training a model on one dataset and using it to operate on another requires special care to account for potential differences in the distributions of the datasets that may further exacerbate the unfairness and errors of the model.
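The point about differing dataset distributions can be illustrated with a two-sample Kolmogorov-Smirnov statistic, which measures the largest gap between the empirical distribution functions of a feature in the training data and in the data the model is applied to. The report does not name this statistic; it is swapped in here as one common choice, and the function name `ks_statistic` and the toy feature values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs (0 = identical)."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


# Hypothetical values of one numeric feature in each dataset.
train_feature = [1, 2, 3, 4, 5]
deploy_feature = [4, 5, 6, 7, 8]

shift = ks_statistic(train_feature, deploy_feature)
no_shift = ks_statistic(train_feature, train_feature)
print("train vs deployment:", shift)
print("train vs train:", no_shift)
```

In practice a library routine such as `scipy.stats.ks_2samp` also reports a p-value, and a per-feature statistic like this says nothing about shifts in the relationship between features and outcomes.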


Accounting for Socio-technical Factors

While statistical methods are indeed necessary, they are not sufficient for addressing the AI bias challenges associated with datasets. Modeling processes have the intent of making contextual concepts measurable. Once the context has been removed, however, it is difficult to get it back, leading AI models to learn from inexact representations.


The practice of deploying AI in off-label uses, that is AI systems being applied to a task or within a social or organizational context for which it was not designed, must be approached with caution, especially in high-risk settings.


Interaction of human factors and datasets


Systemic institutional biases are captured in the datasets used to build the models underlying AI applications. These biases are compounded by the decisions and assumptions made by AI design and development teams about which datasets to use [145]. These decisions affect who and what gets counted, and who and what does not get counted.


Data typically needs to be cleaned in some way, removing outliers and spurious data. Missing data may be imputed (replacing the missing values with nearest neighbors or extrapolated values) or removed entirely. Missing data may be more frequent in marginalized populations. Furthermore, because of compounding collection biases, missing and spurious data is often not random.
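Before imputing or dropping records, it can help to check whether missing values are spread evenly across groups. The sketch below computes the missing-value rate of one field per group. The function name `missing_rate_by_group`, the field names, and the records are hypothetical, and `None` is assumed to mark a missing value.

```python
from collections import Counter


def missing_rate_by_group(records, group_key, field):
    """Return the share of records per group where `field` is missing (None)."""
    totals, missing = Counter(), Counter()
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if record.get(field) is None:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


# Hypothetical records: income is missing far more often in group B.
records = [
    {"group": "A", "income": 52000},
    {"group": "A", "income": 48000},
    {"group": "A", "income": None},
    {"group": "A", "income": 61000},
    {"group": "B", "income": None},
    {"group": "B", "income": None},
    {"group": "B", "income": 39000},
    {"group": "B", "income": None},
]

missing_rates = missing_rate_by_group(records, "group", "income")
for group, rate in sorted(missing_rates.items()):
    print(group, rate)
```

When the rates differ this much, removing incomplete records shrinks one group disproportionately, and imputing from nearest neighbors may fill that group's values mostly from another group's data.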


