In 2024, Mozilla’s Data Futures Lab is hosting a speaker series exploring a more equitable data ecosystem in the era of generative AI. We’ll feature builders, legal experts, and researchers who identify issues and propose concrete solutions.
In our January event, Shayne Longpre, Naana Obeng-Marnu, and William Brannon, three core contributors to the Data Provenance Initiative, presented their work mapping 2000+ popular text-to-text finetuning datasets from origin to creation. The project catalogs the datasets’ sources, licenses, creators, and other metadata for researchers and builders to explore.