Stefan Baack

Research and Data Analyst, Insights

Latest research

  • Towards Best Practices for Open Datasets for LLM Training

    Jan. 13, 2025
    Openness and AI / AI fairness, accountability, and transparency

    Building on community insights from 30 AI dataset experts, this research paper distills best practices for creating open datasets for LLM training. The paper is a collaboration between Mozilla and EleutherAI.

  • Training Data for the Price of a Sandwich: Common Crawl’s Impact on Generative AI

    Feb. 6, 2024
    AI bias & discrimination / AI fairness, accountability, and transparency

    Mozilla finds that Common Crawl's outsized role in the generative AI boom has improved transparency and competition, but is also contributing to biased and opaque generative AI models.

  • Internet Health Report 2022

    July 18, 2022
    Internet health / Internet Health Report / AI fairness, accountability, and transparency

    An annual compilation of research and stories explaining what’s key to a healthier internet. In this edition we are narrowing our focus to artificial intelligence.
