Collective Licensing for Gen AI Training: Feasible or Flawed? – Part 1

Stepanka Havlikova (Masaryk University Institute of Law and Technology)

May 17, 2026

On 10 March 2026, the European Parliament adopted a resolution on copyright and generative AI (GenAI). The resolution urges the EU Commission to clarify and potentially update the existing copyright framework for GenAI, to facilitate voluntary collective licensing agreements, to propose full transparency obligations for AI providers, and to establish a rebuttable presumption of infringement where those obligations are not met. It also calls on the EUIPO to become the trusted intermediary for managing rightsholder opt-outs. As the Commission prepares to review the CDSM Directive no sooner than June 2026, statutory licensing and collective rights management could now be on the agenda as a potential fix. But can they deliver?

This blog post series examines potential collective licensing models and finds that each carries significant structural limitations. Part 1 of this blog post will explain why the existing text and data mining (TDM) exceptions framework falls short and will explore voluntary and statutory collective licensing. Part 2 of this blog post will turn to extended collective licensing before concluding with reflections on the most effective and least disruptive paths forward.

Why the TDM framework falls short

Article 4 of the CDSM Directive permits commercial text and data mining subject to a rightsholder opt-out. In practice, the opt-out mechanism is widely considered to fall short (see for example Senftleben, Mezei, or the EU Parliament's March 2026 resolution). There is no standardised format for machine-readable opt-outs, rightsholders have multiple imperfect means of expressing reservations, and developers face serious difficulty reliably detecting those reservations across numerous individual signals and jurisdictions. Technical measures such as robots.txt or metadata flags depend on voluntary crawler compliance (while empirical studies suggest many crawlers ignore them entirely), lack granularity and may block legitimate uses such as search indexing (for more details, see Hamman, Havlikova or Löbling). National implementation has fragmented the picture further, with courts in Germany, the Netherlands, and Hungary reaching divergent conclusions on what constitutes a valid opt-out. Fundamentally, the opt-out model offers only a binary choice: exclusion or uncompensated use. Individual licensing contracts are possible in theory but unworkable for individual authors at the scale AI training requires - leaving a gap that serves neither creators nor the goal of a functioning licensing market for AI training data.
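To make the robots.txt point concrete, the following minimal Python sketch shows how a crawler that chooses to cooperate might check a reservation before fetching a page. It uses the standard library's urllib.robotparser. The crawler names (ExampleAIBot, ExampleSearchBot), the URL and the robots.txt content are hypothetical, chosen only for illustration, and the sketch does not describe how any particular AI developer's crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the rightsholder blocks one named AI crawler
# and allows everyone else (e.g. search engines).
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This only reports what the file says. Nothing forces a crawler
    to run this check or to respect the answer.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/articles/essay.html"  # hypothetical URL
    for agent in ("ExampleAIBot", "ExampleSearchBot"):
        print(agent, may_fetch(ROBOTS_TXT, agent, page))
```

The sketch reflects the weaknesses described above: the reservation is keyed to crawler names the rightsholder must already know, it is all-or-nothing per path rather than per use, and it has effect only if the crawler performs the check voluntarily.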

Compounding this, the TDM exception does not address what happens after a model has been trained. Some empirical studies show that GenAI models can "memorise" and "regurgitate" verbatim snippets of training data - a risk the Munich Court found capable of constituting copyright infringement in its GEMA v. OpenAI ruling. Even state-of-the-art mitigation techniques such as deduplication and filtering cannot fully eliminate this risk. These gaps have pushed major AI developers toward individual licensing deals - for example, a study issued by the British Film Institute reports that over 79 such agreements were signed globally between March 2023 and February 2025 - but this route is only viable for well-capitalised players, creating a two-tier market at odds with the CDSM Directive's goal of a level playing field.

Voluntary collective licensing: a useful but insufficient complement

Voluntary collective licensing - where rightsholders authorise a CMO to negotiate and enforce licences on their behalf - requires no legislative reform and can coexist with the Article 4 exception. Early efforts exist: the Swedish CMO STIM announced the launch of what it described as the world's first collective AI license for music in September 2025; the Copyright Clearance Center introduced a Collective AI License in the US; and the CLA announced a Generative AI Training Licence in the UK.

Yet the 2025 EU Council questionnaire reveals that "licensing of protected works by CMOs for the purpose of AI-training remains rare in the EU." The fundamental problem is scalability: the scheme can only cover those rightsholders who are aware of it and choose to participate. Smaller and independent creators - precisely those most at risk from AI market disruption - often lack the knowledge, time, or administrative capacity to join. As a result, the licensed catalogue remains incomplete and cannot provide the legal certainty developers need. In the context of voluntary licensing, CMOs also lack sufficient bargaining power when facing major AI companies that can negotiate directly with large content suppliers.

The EU Parliament's March 2026 resolution highlighted voluntary licensing as a potential solution, calling for sector-based collective licensing agreements and assigning the EUIPO responsibility for supporting that process - steps that would certainly help mitigate some of these shortcomings.

Statutory licensing and mandatory collective management: proportionality problems

With the increasing number of copyright disputes, policymakers and scholars have floated statutory or mandatory collective licensing as a potential solution (for example Geiger and Iaia, and Lucchi). Such regimes risk reducing authors’ freedom to a mere claim for remuneration while stripping authors of the freedom to decide whether or not to license. Although mandatory collective management allows the CMO (as the licensor) to refuse to grant a license, this element of discretion is absent in statutory licensing and limitation-based remuneration schemes.

When measured against the Berne three-step test, these schemes face significant legal headwinds.

A statutory license covering virtually any work capable of being ingested into a training dataset - despite its narrow purpose and narrow group of beneficiaries - risks failing the condition of being limited to “certain special cases”. In interpreting the TRIPs Agreement version of the test, a WTO Panel required exceptions to be clearly defined and narrow in scope - and the broad quantitative reach of AI training is problematic on that reading. That said, existing exceptions like private copying also affect large volumes of works yet survive this scrutiny; the decisive factor is whether eligible uses are framed with sufficient conceptual precision as to purpose, nature, and scope. Whether a statutory license for GenAI training would pass that test depends heavily on how narrowly those parameters are drawn.

On the second step (conflict with normal exploitation of the work), much depends on the scope of rights covered. A regime limited to reproduction rights - permitting ingestion (i.e. reproduction) without authorising onward dissemination (i.e. communication to the public) - is more likely to satisfy the condition. As Senftleben noted, the benchmark should be the commercialisation of the individual work. Given the enormous number of works in a typical training dataset, the revenue attributable to any single work is likely too small to constitute exploitation of "considerable economic or practical importance" to its rightsholder. The calculus shifts if the regime extends to dissemination and communication to the public, where AI-generated outputs could directly compete with rightsholders' own markets. However, while limiting the regime to reproduction rights may satisfy the second step, it reintroduces the data memorisation and regurgitation problem (as explained by Mezei or Havlikova) - leaving AI developers without sufficient certainty to rely on such a regime without seeking additional licenses.

On the third step (unreasonable prejudice to rightsholders' legitimate interests), Senftleben has aptly concluded that equitable remuneration can prevent unreasonable prejudice, and that even a broad exception may satisfy this condition if compensation is adequate and opt-out rights are preserved. But statutory licensing goes further than the TDM exception: it removes authors' ability to refuse use altogether, converting exclusive rights into bare remuneration claims. A scheme that strips authors of any ability to exclude objectionable uses risks crossing the proportionality threshold regardless of the compensation offered.

Taken together, these considerations reveal a structural tension: a statutory licensing regime narrow enough to satisfy the three-step test would likely be too restrictive to offer AI developers meaningful legal certainty, while one broad enough to serve that purpose would struggle to meet the test's cumulative conditions. In addition, such a framework would likely require either an expansion of the existing TDM exception or the introduction of a new exception covering subsequent communicative uses.

Moreover, as Senftleben cautioned, upfront remuneration imposes costs at the wrong stage of the value chain, potentially impeding the AI innovation these regimes are meant to enable.

To be continued ... see Part 2.

This blog post series is an adapted and shortened version of the article titled “Collective Licensing for Gen AI Training: Feasible or Flawed?” published in European Intellectual Property Review (EIPR), vol. 2026, No 3, pp. 143-158. ISSN 0142-0461. For the full-length version, see the EIPR journal.