GPT-NL respects copyright - cui bono? - Part 2

Paul Keller (Institute for Information Law (IViR))

June 26, 2026

In the first part of this post I examined what GPT-NL's rightholder-aligned data sourcing policy costs in terms of model performance. Here I turn to a second question: do the rightholders who contributed their content at least get paid?

But will rightholders actually get paid?

Now, GPT-NL has never claimed to be building the best-performing system, and it is important to acknowledge that it still does the thing its proponents praise: it has arrangements with rightholders intended to establish a different set of rules. Rules that support the information ecosystem, in contrast to the prevailing practice of training on publicly available information without any return to those who produce it.

Setting the model's performance aside, the question that matters to the rightholders who contributed is: will they be paid?

The answer depends on how the deal is structured. According to the information that is publicly available, the participating publishers have entered into a revenue-sharing arrangement: rather than being paid at training time for the use of their content, they receive no payment upfront and instead share in the revenue the model earns once it reaches the market.

A revenue share is only worth as much as the commercial success of the model it is tied to. If GPT-NL cannot build a system that competes, and so far the benchmarks suggest it cannot, then the publishers' 50% share of its licence revenue, however generous it sounds, amounts to 50% of next to nothing. The difficulty of assessing a model's commercial potential upfront is one of the reasons we, among others, have argued for a levy that attaches to the revenue generated by all commercial deployments of models trained on publicly available information.

So far such proposals have been met with opposition from rightholders, which makes it all the more striking that, as a mechanism, the GPT-NL arrangement and these levy proposals are very similar. The GPT-NL revenue-sharing mechanism is already levy-shaped in its architecture — a pooled fund, distribution by a usage-based key, payment triggered by deployment revenue — even if the base it rests on is too narrow to make it work. The most copyright-respecting project on offer has, in effect, reinvented a levy by contract. The one thing a contract cannot do is make participation mandatory — and it is precisely the large commercial model developers, who would never enter an arrangement like this voluntarily, that generate the revenue that would make such a system worth running.
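To make the mechanics concrete, here is a minimal sketch in Python of a pooled fund distributed by a usage-based key. Only the 50% share comes from the reported arrangement. The function name, the publisher names, the usage weights and both revenue figures are hypothetical, and the pro-rata rule is my assumption about what a usage-based key could look like, not the documented GPT-NL formula.

```python
from fractions import Fraction


def distribute_pool(deployment_revenue, share_rate, usage_by_contributor):
    """Split a revenue-share pool pro rata by each contributor's usage weight.

    All inputs are illustrative; the pro-rata rule is an assumed reading of
    a "usage-based key", not the documented GPT-NL formula.
    """
    pool = Fraction(deployment_revenue) * share_rate
    total_usage = sum(usage_by_contributor.values())
    if total_usage == 0:
        # No recorded usage: nothing can be attributed to anyone.
        return {name: Fraction(0) for name in usage_by_contributor}
    return {
        name: pool * Fraction(usage, total_usage)
        for name, usage in usage_by_contributor.items()
    }


SHARE_RATE = Fraction(1, 2)  # the 50% share mentioned above
USAGE = {"publisher_a": 3, "publisher_b": 1}  # hypothetical usage weights

# Hypothetical revenue figures: one modest licensed model vs. a market-wide base.
single_model = distribute_pool(1_000, SHARE_RATE, USAGE)
market_wide = distribute_pool(1_000_000, SHARE_RATE, USAGE)

for name in USAGE:
    print(name, float(single_model[name]), float(market_wide[name]))
```

The distribution key is identical in both calls. Only the revenue base changes, which is why the same architecture pays out next to nothing on a narrow base and something meaningful on a wide one.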

Widening the base

The difference then is not in the redistribution mechanism but in its legal basis, and that difference is the whole point. GPT-NL keeps the exclusive-rights, opt-in architecture that rightholders prize: nothing is used without a licence. A levy gives up the exclusive right at the point of training and replaces it with a remuneration right. That sounds like the larger concession, but it is what makes the difference in outcome: an opt-in deal can only ever reach the revenue of the models whose developers agree to license, while a remuneration right reaches deployment across the market, including the high-revenue commercial models whose developers decline to license. Giving up the right to say no to any individual model developer is what buys access to a share of the revenue of all of them.

There is a second sense in which the licensing route is too narrow. A deal can only ever pay those who hold exclusive rights and have the bargaining power to negotiate. In practice that means the handful of large, organised rightholders in any given sector (the major news publishers in GPT-NL's case), while the long tail of individual creators, whose work is just as much in the training data, has no realistic way to strike a deal and falls outside the arrangement entirely. This pattern is visible in the GPT-NL deal itself: according to the Auteursbond, the Dutch authors' union, it remains unclear whether and how the journalists and other authors whose work appears in the licensed archives will be compensated under the NDP agreement.

But the data AI models are built on is not limited to licensed news archives. It draws on the public domain, openly licensed content, public-sector information and the holdings of cultural heritage institutions — most of which have no rightholder to license it and no one positioned to strike a deal. Because a remuneration right attaches to deployment rather than to the act of licensing at training time, its proceeds can (and should) be distributed more widely: not only to rightholders, but to the public-interest media, cultural heritage institutions, open content platforms and public AI infrastructure that produce and maintain the information commons these models depend on. A levy sustains the wider ecosystem that produces the training data in the first place: the ecosystem GPT-NL's proponents say they are trying to protect.

Towards sustainable European AI

Let us return to where this piece began. The dependency anxiety shaping the debate about sovereign European AI is real. The approach embodied by GPT-NL gets something important right about what a European alternative must attend to — the health of the information ecosystem as a whole — but it just as clearly shows that the method it chose is not suited to achieving that goal. Artificially limiting access to data may be sympathetic to rightholders, but it ultimately neither benefits them nor helps build competitive models.

The alternative is to accept the reality that competitive models need to be trained on as much information as possible, and that this includes web-crawled data. Being trained on the sum total of publicly available information is one of the defining features of current development approaches, and refraining from using such data — however well intended — is simply not a viable option. A deployment-based levy with broad redistribution is a mechanism that acknowledges this reality without giving up on the objective of supporting the creators, producers and stewards of that information.

But such a levy isn't only a fairer way to support information producers. A deployment-based levy would, in effect, replace the opt-out in Article 4 with a non-waivable remuneration right, giving European developers the same friction-free access to training data that their non-European competitors already operate under, while guaranteeing the ecosystem a share of the revenue that use generates. It strips out the friction and chilling effects that currently hold back model development in the EU. And, as I have argued here, it preserves room for exclusive licensing of copyrighted works at inference time, where rightholders retain real leverage.

Learning from GPT-NL

GPT-NL's prospects as a viable sovereign model may be doubtful, but the project has been genuinely instructive. In trying to do the right thing — to build on a consensual, rightholder-respecting basis — it has shown us two things at once. First, that abstaining from the data the law allows produces a model that cannot compete; and second, in the structure of its own remuneration deal, what supporting the information ecosystem actually looks like in practice. Even this maximally consensual route reinvents the collection-and-distribution architecture of a levy — but, tied to the fortunes of a single licensed model, it can never reach the revenue across the market that would make such a system worth running. Only a statutory base can do that.

The lesson is not that GPT-NL aimed at the wrong goal, but that the goal is unreachable by the means it chose. Building sovereign European models and sustaining the ecosystem that feeds them are not competing objectives — but achieving both at once requires giving up on copyright as a right to refuse, and rebuilding it as a right to be paid.
