Abstract

Human annotated data plays a crucial role in machine learning (ML) research and development. However, the ethical considerations around the processes and decisions that go into dataset annotation have not received nearly enough attention. In this paper, we survey an array of literature that provides insights into ethical considerations around crowdsourced dataset annotation. We synthesize these insights, and lay out the challenges in this space along two layers: (1) who the annotator is, and how the annotators' lived experiences can impact their annotations, and (2) the relationship between the annotators and the crowdsourcing platforms, and what that relationship affords them. Finally, we introduce a novel framework, CrowdWorkSheets, for dataset developers to facilitate transparent documentation of key decision points at various stages of the data annotation pipeline: task formulation, selection of annotators, platform and infrastructure choices, dataset analysis and evaluation, and dataset release and maintenance.

Keywords

Crowdsourcing, Annotation, Computer science, Documentation, Pipeline (software), Task (project management), Data science, Space (punctuation), Key (lock), Selection (genetic algorithm), Information retrieval, Artificial intelligence, World Wide Web

Publication Info

Year
2022
Type
article
Pages
2342-2351
Citations
61
Access
Closed

Cite This

Mark Díaz, Ian Kivlichan, Rachel Rosen et al. (2022). CrowdWorkSheets: Accounting for Individual and Collective Identities Underlying Crowdsourced Dataset Annotation. 2022 ACM Conference on Fairness, Accountability, and Transparency, 2342-2351. https://doi.org/10.1145/3531146.3534647

Identifiers

DOI
10.1145/3531146.3534647