Article 3: The Untapped Legal Basis for Europe’s Public AI Ambitions
October 28, 2025
There is increasing recognition that, as part of its efforts to remain competitive in the race toward ever more powerful AI capabilities, Europe should invest in Public AI: AI systems built by organisations acting in the public interest that focus on creating public value rather than extracting as much value as possible from the information commons.
A wide range of proposals is emerging, and public investment is increasingly being mobilised to build European AI models. Many of these efforts, however, are mired in uncertainty about copyright. This is counterproductive and unnecessary: the EU copyright framework contains specific rules that allow public-interest organisations (research organisations and cultural heritage institutions) to train and make available AI models trained on all lawfully accessible works, without requiring them to obtain permission from copyright holders.
Unfortunately, much of this legal clarity has been lost in the fog of war surrounding commercial AI training and the never-ending debate about the applicability of the TDM exception to AI model training.
By giving both scientific research organisations and cultural heritage institutions a privileged position, the 2019 Copyright in the Digital Single Market Directive uniquely positions these types of entities as developers of public-value-driven AI models. This aligns well with the fact that many of the efforts to build publicly funded AI models in Europe are undertaken by either research organisations (such as GPT-NL in the Netherlands) or libraries (such as in Norway), often with the objective to make the resulting models available under open-source licences, allowing anyone to use and build upon them.
Article 3 is an enabling legal framework for EU Public AI development
Article 3 of the CDSM Directive enables these institutions to text and data-mine all “works or other subject matter to which they have lawful access” for scientific research purposes. Text and data mining is understood to cover “any automated analytical technique aimed at analysing text and data in digital form in order to generate information, which includes but is not limited to patterns, trends and correlations,” which clearly covers the development of AI models (see here or, more recently, here).
The figure below provides a more detailed analysis of how Article 3 applies to beneficiaries such as scientific research organisations and national libraries conducting research to develop large pre-trained AI models for open-source release. The analysis pertains to training data that is protected by copyright.
The figure shows that the three key stages of model training (data acquisition, data preparation, and the actual training) all fall within the definition of TDM (and, crucially, only involve reproductions and extractions).
It also acknowledges that the publication of training datasets that contain copyrighted works does require authorisation from rightsholders (because it involves the making-available right). While training-dataset transparency is an important component of building fully open AI models, it is an act that is separate from the actual model training.
This mapping of the scope of the TDM exception onto the individual steps of the AI model-development pipeline is largely uncontroversial and shared by observers who dispute the overall application of TDM to AI training (see here) and those who consider AI training fully in scope (see here).
Releasing the trained model (i.e., the research artefact resulting from the text-and-data-mining activities carried out in the previous steps) does not fall within the scope of the TDM exception. However, as long as the trained model does not contain any of the works that it has been trained on, making it available does not infringe any rights in copyrighted works that have been included in the training data.
This means that, as long as the model is made available in line with the public-interest research missions of the organisations undertaking the training (for example, by releasing the model, including its weights, under an open-source licence) and is not commercialised by these organisations, this also does not affect the status of the reproductions and extractions made during the training process.
This means that Article 3 does cover the full model-development pathway (from data acquisition to model publication under an open source license) that most non-commercial Public AI model developers pursue.
This conclusion is also supported by a recent legal opinion authored by Prof. Dr. Malte Stieper for the German Library Association, which confirms that, under the German implementation of Article 3 CDSM, libraries “may produce text corpora with works from their collections to conduct TDM activities themselves, e.g. to train an LLM for analysing the library collection” (author’s translation). The opinion further emphasises that this applies only to non-commercial research contexts and that contractual clauses excluding such uses are not enforceable.
While Stieper’s analysis focuses on the role of libraries, it should apply mutatis mutandis to the other beneficiaries of Article 3 included in the above analysis. Importantly, he also notes that the law allows libraries to partner with research institutions as a “verlängerter Arm” (extension of the research organisation) for concrete projects — further supporting the Public AI collaboration model.
Matching ambition with legal clarity
Despite the solid legal foundation provided by Article 3 CDSM, many public institutions engaged in building open AI models continue to face uncertainty about its scope. This uncertainty has a chilling effect: research organisations and cultural institutions hesitate to act, while others move ahead in legal environments that are either more permissive or less scrutinised.
If Europe is serious about its strategic ambition to build a competitive, sovereign, and trustworthy AI ecosystem, as expressed in the Apply AI Strategy and the AI Continent Action Plan, it must match investment with legal clarity. The Commission should make clear that Article 3 already provides the legal basis for model training by public-interest institutions on lawfully accessible data. As shown above, such a clarification does not require new legislation; it simply comes down to confirming the applicability of the legal framework already in place.
The Commission’s upcoming Data Union Strategy, which explicitly aims to scale up access to data and simplify data-governance rules, provides an excellent opportunity to do this. If the Commission is serious about its ambitions in the AI space, then including a clear reaffirmation that Article 3 of the CDSM Directive already enables public-interest AI model training on lawfully accessible data would be one of the most impactful steps it could take toward achieving these goals.
Finally, as I have argued elsewhere, a framework that enables the development of Public AI must go hand in hand with measures that sustain Europe’s information ecosystem. The redistribution of value should take place where it is created, at the point of commercial deployment of AI systems, rather than by restricting research. This is how Europe can combine a commitment to developing Public AI with fairness in remuneration.
Image: “Bee” by Diego Pianarosa, CC BY 2.0