
GPT-NL respects copyright - cui bono? - Part 1

Paul Keller (Institute for Information Law (IViR))

June 25, 2026

As in many other places, the abrupt removal of access to Anthropic’s Fable model has caused a lot of hand-wringing in the Netherlands about dependency on US frontier AI models. What makes the Dutch discussion noteworthy is that some commentators have taken the news as a reason to publicly bash the presumed performance of the main Dutch attempt to build sovereign AI: the GPT-NL project.

This project, run by TNO and a number of public-sector partners, was launched in 2023 with the ambition to build a rights-respecting large language model trained on high-quality Dutch content.

The project has gained a lot of attention, in large part because of a number of content deals the developers have made with rightholders — including a group encompassing the major Dutch news publishers. Yet its actual capabilities remain shrouded in mystery. The model is neither publicly available, nor has anyone with access described its performance on the record. The level of secrecy surrounding the model has made it easy to dismiss its capacity in public.

This in turn has led other observers to defend GPT-NL on the grounds that it is a principled attempt to build an AI model based on public values including “respecting copyright, compensating and sustaining our media organisations and heritage institutions, preserving / restoring a healthy information ecosystem, and preventing exploitation in how data and models are maintained” (own translation). As I have argued before, these are very relevant concerns and one of the reasons why Europe needs to build a public AI ecosystem. But what if “respecting copyright” as it is understood by GPT-NL is simply incompatible with both the ambition to build usable AI models and the goal of sustaining the information ecosystem at large? As it turns out, the project offers some valuable insights that can help answer this question.

GPT-NL ❤️ rightholders

So what does GPT-NL mean when it says its approach is based on respecting copyright? The answer goes well beyond respecting copyright law. It comes closer to building a data sourcing policy around a set of rules that resemble how many rightholders would prefer copyright law to be. GPT-NL trains only on data it has positive authorisation to use: public-domain and openly licensed content, public-sector data, and in-copyright material licensed directly from rightholders under negotiated agreements. This opt-in, licensing-only approach means that GPT-NL abstains, on principle, from using data crawled from the open web and from Wikipedia. Copyright law does not require this; it is a deliberate design choice. It makes GPT-NL the most rightholder-respecting LLM-building effort available.

Adopting this rightholder-aligned data sourcing strategy has allowed GPT-NL to bring on board a number of Dutch rightholders who have contributed their content for use as training data. The most prominent of these collaborations is the agreement with NDP Nieuwsmedia, the association of Dutch commercial news publishers, which brought in archives from most major Dutch news outlets under a single licensing arrangement. This agreement is one of the defining features of the project, but, as we shall see, it comes at a significant cost.

The cost of this choice is that GPT-NL forgoes most of the training data it could lawfully have used. Article 4 of the CDSM Directive permits text and data mining of lawfully accessible content for any purpose, including commercial AI training, subject only to rightholders opting out. A model built on that basis could draw on the open web, Wikipedia, and the vast body of digitised text whose rightholders have expressed no objection — exactly the material that makes up the bulk of any competitive training corpus. And it is not only a matter of volume: the open web is also where the diversity of domains, registers and topics lives, the breadth that licensed news archives and public-sector records, however large, cannot supply on their own. Given that both the volume and the diversity of training data are central to model quality, this represents a significant constraint on what GPT-NL can achieve.

Does respecting copyright break the model?

Until now this tradeoff has been difficult to quantify because GPT-NL is not yet publicly available. Last week, however, Edwin Rijgersberg, who until September 2025 was the NFI's project leader for GPT-NL, published a blog post analysing a set of benchmark results that TNO had quietly published in a technical document in late 2025. In his analysis he uses these benchmark results to compare GPT-NL’s performance against the models the project has used as reference points (GPT-3.5 and Llama 2 7B) and a recent European open-source model (Mistral Small 3.2). The resulting picture is not pretty: on all four Dutch-language indicators, GPT-NL, a model built specifically for Dutch, is beaten by at least two of the three other models, none of which had any Dutch-specific training objective. On one of the indicators, factual knowledge, the gap is dramatic: GPT-NL scores barely above what random guessing would produce.

Rijgersberg's post sets out to assess GPT-NL's capacity as a usable sovereign alternative, and while he repeatedly attributes its weak performance to the project's restrictive data sourcing policy, isolating the effect of that policy is not his aim. To see how far the copyright policy is actually responsible, we need a different comparison.

The comparison we need is with a model that takes copyright compliance seriously too, but stops short of GPT-NL's self-imposed abstention. The Swiss Apertus model, released in September 2025, fits the bill: its technical report is unusually detailed about how the training data was selected, and it goes as far as retroactively filtering its web crawl for opt-outs to comply with the EU copyright directive. The crucial difference is that Apertus does not refuse the open web on principle — it relies on the text and data mining exception and removes only the content rightholders have actually opted out of.

Apertus's scores on the same benchmarks Rijgersberg used are available in EuroEval. The chart below sets the results for Apertus-8B base alongside his numbers:

Data from Rijgersberg and the EuroEval Dutch leaderboard (v16.10.1, retrieved 20 June 2026). ScaLA-nl (MCC) and SQuAD-nl (EM) from the NLU view; MMLU-nl (MCC) from the generative view. The Apertus-8B WikiLingua-nl figure was provided by Edwin Rijgersberg, on the same metric as GPT-NL's score.

The result is unambiguous. On the four benchmarks in the chart, Apertus-8B base outperforms GPT-NL across the board. On factual knowledge, where GPT-NL barely clears random guessing, Apertus scores 41.46, roughly on a par with GPT-3.5. On summarisation GPT-NL is at the floor while Apertus clears it. On reading comprehension and grammatical acceptability Apertus is comfortably ahead as well, and it does so as a smaller model with no Dutch-specific training objective. GPT-NL's figures are intermediate: they were taken after pre-training but before the small-scale instruction tuning the project still plans. Such tuning can usually squeeze a little more performance out of a model, but it rarely shifts benchmark scores dramatically. The GPT-NL numbers should therefore be read as provisional.
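To make "barely above random guessing" concrete: the figure caption lists the Matthews correlation coefficient (MCC) as the metric for MMLU-nl and ScaLA-nl. MCC is built so that chance-level answers land near zero however many answer options there are, whereas plain accuracy on a four-option test would still show about 25%. The sketch below is illustrative only. It uses synthetic answers, assumes four answer options, and assumes that the leaderboard reports MCC multiplied by 100, as the 41.46 figure suggests. It reproduces no actual benchmark data.

```python
import math
import random
from collections import Counter


def mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient, in the range [-1, 1]."""
    s = len(y_true)                                   # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correct predictions
    t_counts = Counter(y_true)                        # how often each label is the truth
    p_counts = Counter(y_pred)                        # how often each label is predicted
    cov = c * s - sum(p_counts[k] * t_counts[k] for k in t_counts)
    denom = math.sqrt(
        (s * s - sum(v * v for v in p_counts.values()))
        * (s * s - sum(v * v for v in t_counts.values()))
    )
    return cov / denom if denom else 0.0


# Synthetic data (assumption): 20,000 questions with four answer options.
rng = random.Random(0)
options = ["A", "B", "C", "D"]
gold = [rng.choice(options) for _ in range(20000)]
guesses = [rng.choice(options) for _ in range(20000)]  # a model that guesses blindly

accuracy = sum(g == p for g, p in zip(gold, guesses)) / len(gold)
random_mcc = mcc(gold, guesses)
perfect_mcc = mcc(gold, gold)

print(f"accuracy of blind guessing: {accuracy:.3f}")
print(f"MCC of blind guessing:      {random_mcc:.3f}")
print(f"MCC of a perfect model:     {perfect_mcc:.3f}")
```

Read this way, a score close to zero means a model's answers carry almost no information about the correct ones, while a score around 41 on a 0 to 100 scale indicates a substantial, though far from perfect, association.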

It is important to note that this is not a controlled experiment. Apertus also has a larger and differently composed training corpus, and the two models differ in other ways, so these numbers establish a correlation, not a clean causal claim. The comparison above cannot rule out that something other than data sourcing explains the gap, but the direction is hard to argue with: the more permissive model is smaller and still wins across the board. Apertus illustrates that respecting opt-outs at scale does not make building a usable model impossible.

What separates the two is that Apertus uses the data the law allows while GPT-NL declines most of it, and the model that uses more of it wins on every measure. These numbers cannot prove that abstention alone explains the gap, but they strongly suggest that it is GPT-NL's self-imposed, rightholder-aligned stance, not copyright compliance as such, that is holding the model back.

The data sourcing policy, in other words, has cost GPT-NL the performance it was designed to deliver. But there is a second question that the project's defenders would rightly insist on asking: even if the model underperforms, do the rightholders who contributed their content at least get paid? That is the question the second part of this piece turns to.
