
Stefan Baack
Latest research
-
Towards Best Practices for Open Datasets for LLM Training
Jan. 13, 2025
Openness and AI / AI fairness, accountability, and transparency
Building on community insights from 30 AI dataset experts, this research paper distills best practices for creating open datasets for LLM training. The paper is a collaboration between Mozilla and EleutherAI.
-
Training Data for the Price of a Sandwich: Common Crawl’s Impact on Generative AI
Feb. 6, 2024
AI bias & discrimination / AI fairness, accountability, and transparency
Mozilla finds that Common Crawl's outsized role in the generative AI boom has improved transparency and competition, but is also contributing to biased and opaque generative AI models.
-
Internet Health Report 2022
July 18, 2022
Internet health / Internet Health Report / AI fairness, accountability, and transparency
An annual compilation of research and stories explaining what’s key to a healthier internet. In this edition we are narrowing our focus to artificial intelligence.