LAION Round 2: Machine-Readable but Still Not Actionable — The Lack of Progress on TDM Opt-Outs - Part 2

Image by Paul Keller

Part 1 of this post examined how the OLG Hamburg’s ruling in Kneschke v. LAION gives concrete meaning to the requirement that TDM opt-outs be machine-readable and machine-actionable, while also highlighting the temporal and contextual limits of that clarification. It showed why, beyond these boundary conditions, the judgment offers little guidance on how opt-outs can function reliably at scale.

 

As we argued in our 2023 paper “Defining best practices for opting out of ML training”, for machine-readable opt-outs to work across different standards (and possibly non-standardised instruments such as terms of use), they need to rely on a common vocabulary that ensures the intent of a rights reservation can be interpreted uniformly across different protocols and standards. This requirement follows directly from the core criterion of automated actionability articulated by the OLG Hamburg: without a shared and well-defined vocabulary, it is not possible for automated systems to reliably detect and act upon rights reservations at scale.

This point was also explicitly raised by LAION in its written submission on appeal. LAION argued that the criterion of “machine-readable” usage restrictions can only be met if such restrictions can be automatically located, understood, and correctly classified without the risk of inaccuracies or misinterpretation. From this perspective, the ability to carry out fully automated TDM—an objective explicitly pursued by both EU and German legislators—depends on the definition of a shared vocabulary of terms (described by LAION as “technical parameters”) that informs automated systems, such as crawlers, whether and under what conditions website content may be used for TDM.

An attempt to define such a vocabulary of terms is currently underway in the AI Preferences Working Group of the Internet Engineering Task Force (IETF), which is chartered to develop both a vocabulary for AI-related usage preferences and mechanisms for expressing those preferences via the Robots Exclusion Protocol (i.e. robots.txt). However, work on the vocabulary (the author of this piece is one of the editors of the vocabulary draft) has largely stalled since the summer. The most recent version of the vocabulary draft contains only two defined terms (“Foundation Model Production” and “Search”), and neither is currently close to commanding consensus within the working group.

The discussion in the AI Preferences Working Group reflects deep divisions between different sets of stakeholders. These divisions are further compounded by the fact that preference signals would need to operate against the background of widely diverging regulatory frameworks, in which the EU’s copyright framework—characterised by a clearly defined legal status of opt-outs and public-interest exceptions that are protected from technological and contractual override—stands out as an exception.

 

A proliferation of vocabularies

With the IETF working group moving much more slowly than originally expected, the dynamics around machine-readable opt-outs have shifted elsewhere. In September, Cloudflare launched contentsignals.org, described as an “implementation of a mechanism for allowing website publishers to declare how automated systems should use their content.” It functions as a proprietary extension of robots.txt and offers three opt-out categories (“ai-train”, “ai-input”, and “search”). Last week, the Really Simple Licensing (RSL) Collective—an initiative supported primarily by US online publishers and content delivery networks, presenting itself as a new type of collective rights management organisation—published version 1.0 of its eponymous protocol, which establishes “a standardized XML vocabulary and associated discovery and authorization mechanisms for expressing machine-readable usage, licensing, payment, and legal terms that govern how digital assets may be accessed or licensed by AI systems and automated agents.” 
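Based on the examples published at contentsignals.org, a Content Signals declaration is expressed as an additional rule line inside an ordinary robots.txt group. The sketch below is illustrative rather than normative, and the specific yes/no choices are placeholders:

```
# robots.txt — illustrative Content Signals declaration
# (syntax based on the examples published at contentsignals.org)
User-Agent: *
Content-Signal: search=yes, ai-input=no, ai-train=no
Allow: /
```

Because the directive sits inside robots.txt, it travels alongside the existing crawl rules, but crawlers that do not recognise the `Content-Signal` line will simply ignore it.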

The RSL protocol leverages robots.txt and HTTP headers to direct crawlers to a licence file, allowing the expression of rules related to automated processing in general (“all”), any use by AI systems (“ai-all”), specific uses by AI systems (“ai-train”, “ai-input”, and “ai-index”), and the building of a search index (“search”). The RSL protocol effectively expands the vocabulary defined by Cloudflare by introducing hierarchical relationships and anchoring the entire system in an overarching “automated processing” category that functions similarly to the notion of TDM under EU copyright law.
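To make the automated-actionability point concrete, the sketch below shows how a crawler might extract Content-Signal-style declarations from a robots.txt file. The directive name and `key=value` syntax follow the examples published at contentsignals.org; the function name and everything else is hypothetical:

```python
# Illustrative sketch: reading Content-Signal-style opt-out declarations
# from a robots.txt file. Only the directive syntax is taken from the
# published examples; the parsing logic itself is hypothetical.

def parse_content_signals(robots_txt: str) -> dict[str, bool]:
    """Return a mapping of signal names (e.g. 'ai-train') to yes/no values."""
    signals: dict[str, bool] = {}
    for line in robots_txt.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() != "content-signal":
            continue
        # A Content-Signal line carries comma-separated "key=value" pairs.
        for pair in value.split(","):
            key, _, setting = pair.partition("=")
            if key.strip():
                signals[key.strip().lower()] = setting.strip().lower() == "yes"
    return signals


robots_txt = """\
User-Agent: *
Content-Signal: search=yes, ai-input=no, ai-train=no
Allow: /
"""

print(parse_content_signals(robots_txt))
# {'search': True, 'ai-input': False, 'ai-train': False}
```

The example also illustrates the underlying governance problem: the parser can only act on signal names it already knows, which is precisely why a shared, well-defined vocabulary is a precondition for automated compliance.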

In addition to these, there are a number of other standards and protocols, such as the TDM Reservation Protocol (TDMRep) and TDM·AI — both specifically focussed on enabling TDM opt-outs in compliance with Article 4(3) — as well as broader initiatives such as C2PA and IPTC PLUS that also offer opt-out functionality based on their own vocabularies.
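For comparison, TDMRep expresses the same intent through a JSON file served from a well-known location (`/.well-known/tdmrep.json`) rather than through robots.txt. The sketch below follows the structure described in the TDMRep specification; the policy URL is a placeholder:

```json
[
  {
    "location": "/",
    "tdm-reservation": 1,
    "tdm-policy": "https://example.com/tdm-policy.json"
  }
]
```

The contrast with the robots.txt-based approaches above underlines the fragmentation problem: the same rights reservation must be expressed in structurally different ways depending on which protocol a crawler happens to support.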

Against the backdrop of this fragmented landscape, the European Commission is currently running a consultation on protocols for reserving rights from text and data mining under the AI Act and the GPAI Code of Practice, aimed at drawing up a list of generally agreed-upon machine-readable protocols that GPAI model providers must comply with to meet their obligations under the AI Act.

Seen in this light, Kneschke v. LAION illustrates both the value and the limits of judicial intervention. The OLG Hamburg draws an important boundary by insisting that “machine-readable” opt-outs must be machine-actionable in practice, not merely intelligible in theory. But courts can only assess concrete constellations of fact against past technological capabilities; they cannot, on their own, supply the shared vocabularies and interoperable signalling mechanisms that automated compliance at scale would require.

As a result, while courts can clarify the legal boundary conditions, questions about how machine-readable opt-outs function in practice are being taken up in standard-setting processes. Whether this shift leads to interoperable, open standards developed in inclusive technical fora, or to de facto standards shaped by individual platforms and vendors, will be decisive for how the Article 4(3) opt-out mechanism operates. The Kneschke v. LAION ruling leaves open the question of who will define the technical meaning of “machine-readable” going forward—and under what governance model.

For now, we are faced with a real risk that, in the absence of shared and openly governed standards, machine-readable opt-outs will become the foundation for a new intermediary layer that normalises pay-per-crawl or pay-per-click access to the open web.



 
