Open source lies at the heart of Mozilla and our Manifesto. Despite its ubiquity in the current technology landscape, it is easy to forget that open source was once a radical idea which was compared to cancer. In the long journey since, Mozilla has helped create an open source browser, email client, programming language, and data donation platform while applying the ethos beyond our code, including our advocacy.
Recent developments in the AI ecosystem have put open source back in the spotlight, sparking heated conversations and accusations about whose ends it serves - a global community of developers, the entrenched dominance of big tech companies, or a little bit of both? Motivations and incentives matter and Mozilla believes in representing the core set of values behind open source while working with players of all sizes to advance a trustworthy ecosystem. As we noted in 2021, “openness” is often at risk of being co-opted, serving as little more than a facade meant to shield organizations and governments from scrutiny.
We’ve been following the debate closely at Mozilla and think the nature of open source in AI raises many new questions that are still in the process of being answered - what is open source when it comes to AI? How should regulation treat open source to foster innovation without giving it a free pass on necessary regulatory compliance? What are the contours of commercial deployment and the corresponding liability in open source software? None of these are easy questions, and the potential for abuse inherent in powerful models only serves to further muddy the waters. Mozilla is exploring these questions by building open source technology to advance trustworthy AI at Mozilla.ai, by giving grants to our community through the Mozilla Technology Fund (MTF), and through our public policy work.
On the public policy front, EU legislators are paying close attention to these developments. The most recent proposals for the AI Act from the European Parliament and member states include dedicated language on open source, and for good reason: open source development offers tremendous opportunities and can enable significant innovation and commercial deployment. Just as importantly, making AI models and components available under permissive licenses opens them up to important scrutiny from researchers aiming to evaluate, amongst other things, their safety and trustworthiness.
However, the special nature and capabilities of the open source ecosystem clearly require further improvements in the AI Act before finalization, as a coalition from the open source community recently argued. We think the coalition’s paper brings some much needed clarity, specifically by centering the debate on two key facets:
“First, the values of sound research, reproducibility, and transparency fostered by open science are instrumental to the development of safe and accountable AI systems.
Second, open source development can enable competition and innovation by new entrants and smaller players, including in the EU.”
As we continue to crystallize our thoughts on these issues, both by collaborating with allies and by centering our thinking around the community (an integral aspect of Mozilla’s role in the technology ecosystem), we are highlighting key considerations for EU legislators as they finalize the AI Act.
Slippery definitions of open source AI are rife in the ecosystem. In the absence of definitional clarity, shifting meanings of “open source” and “open” can be deployed strategically in the policy domain in ways that reduce oversight and hinder accountability. The final version of the AI Act should therefore clearly define what it means by “open source” and related terms.
Here are a few key places where clarity could help move the ball forward in the direction of greater AI accountability and further enabling open source development:
First, EU legislators should ensure that any definition of open source focuses on permissive licenses that are not undercut by restrictions on how or for what purposes the released artifacts can be used. Releases that do include such restrictions would not meet conventional definitions of “open source”, including the definition provided by the Open Source Initiative (OSI). The OSI definition could serve as a helpful point of reference in this regard. Should legislators want to extend exemptions similar to those for open source releases to releases that come with certain use restrictions, for example so-called open responsible AI licenses (or open RAIL) or releases limited to research uses, they should do so explicitly and without expanding conventional definitions of open source through regulation.
Second, openness in relation to AI is more complex, and more expansive, than in other contexts. While open source software typically relates to source code, it can relate to a number of different artifacts in AI: from the entire model (i.e. the model weights; not source code) to components like training data or the source code underlying deployment software or the training process. The AI Act should therefore clearly define AI components and be specific with regard to the question of which obligations should apply to providers of which components. The co-legislators’ proposals are still laden with ambiguity in this regard. For example, would obligations concerning foundation models apply only to those open-sourcing the trained model or also to those open-sourcing constituent components, such as training datasets? And how should obligations be applied if, for example, the model is openly available but the training data is not? In answering these questions, EU legislators should duly take into account the capabilities of the various actors along the supply chain and of open source communities more generally.
Recommendation: The AI Act should clarify that technologies claiming special treatment in this context should be released under licenses aligned with the Open Source Initiative (OSI) definition of “open source”. Further, the law should clarify the minimum set of components (indicatively - models, weights, training data, etc.) that should be released under an OSI license to benefit from regulatory exemptions.
Open source AI enables important research on AI and its risks, but simply open-sourcing an AI model does not necessarily mean that it is released with research as its primary purpose. In fact, enabling broader commercialization has always been a key tenet of the open source movement. While appealing at first glance, relying solely on the intent to commercialize an AI model as a criterion for imposing regulatory obligations raises an array of thorny questions for regulators.
First, unless stipulated otherwise (e.g., through use restrictions in the license under which a model is released), openly released models can be adapted and used for any purpose, and that should be taken into account in formulating obligations for open source providers. At the same time, while many open source AI projects are carried out in the public interest or by open source community groups (e.g., BigScience’s BLOOM or EleutherAI), some are driven by well-resourced commercial actors. In fact, some of the most widely used and commercialized permissively licensed AI models (e.g., Stable Diffusion or the LLAMA family of models) have been developed by companies such as Stability AI or Meta, in some cases with the deliberate intent of commercialization.
Meaningfully tying regulatory obligations to commercialization requires much greater clarity on what differentiates non-commercial and commercial development, deployment, and maintenance. For instance, would a developer be considered a “commercial” actor if they receive some compensation for maintaining an open source component (that is also used for commercial purposes) in addition to their day job? The AI Act currently doesn’t provide that clarity. This is an issue that has also cropped up in debates around the EU’s Cyber Resilience Act (CRA), where the solution likely lies in a combination of a revenue threshold (10-20m+) and the possibility for subjective exceptions. EU co-legislators should pay close attention to such files grappling with similar questions and further ensure that the open source community has clarity when it comes to interpreting concepts such as “placing on the market” and “putting into service”, which are critical in this respect.
Recommendation: The AI Act should provide clarity on the criteria by which a project will be judged to determine whether it has crossed the “commercialisation” threshold, including revenue. We urgently need language that ensures there is a subjective determination process that allows for nuance to reflect the variety of open source projects in difficult cases.
Regulatory obligations imposed on open source AI providers should not disincentivize open source development or exceed the capabilities of open source communities. However, they should also not lose sight of the AI Act’s objective to prevent harm and facilitate trust in AI: open source AI should emphasize responsibility and trustworthiness, too. At the same time, any obligations imposed on open source AI should take into account the fact that, with an increasing level of openness, compliance and evaluation become easier to achieve for downstream actors. For example, testing obligations can be met more easily if the model weights are made openly available (and the model can therefore be deployed by anyone with sufficient computing resources). Similarly, data governance requirements are easier to meet for downstream actors if training datasets are openly available.
The details of what this should look like in practice are key, and suggestions should include subjective criteria that allow for case-by-case determination rather than just definitional gymnastics. This is the only way to ensure that merely marking something as open source does not absolve someone of all liability, while also avoiding a crippling burden that would make open source AI development unfeasible. This is also linked to the question of base models and fine-tuning or other forms of modification, where liability questions are far less clear.
Recommendation: The AI Act should allow for proportional obligations in the case of open source projects while creating strong guardrails to ensure they are not exploited to hide from legitimate regulatory scrutiny. This should include subjective criteria and a process that allows for case-by-case determination rather than encouraging definitional gymnastics.
It is clear that open source is only a part of the broader issue of what it takes to have an open and competitive AI landscape. We are thinking hard about this at Mozilla from the lens of data, infrastructure and compute resources, community involvement, liability and many other factors - there is much more to discuss and to do.