The city in panic over its chatty machines: the memorisation metaphor and its policy misalignments
December 22, 2025
We tend to think that bad policy begins in parliament buildings, in closed-door meetings, or in regulatory working groups. But often, it starts elsewhere — on the street, in a café queue, or on holiday strolls — in the unexamined phrases people casually or symptomatically repeat about technology. These narratives travel from everyday talk into political speech, and from political speech into law, stripped of context and charged with misplaced certainty.
Currently in copyright, the most consequential of these framings is the memorisation-as-storage metaphor: the picture of a large language model (LLM) as a filing cabinet full of copyrighted works. It is simple, vivid - and glaringly wrong. Modern models are parametric function approximators, not databases (Carlini et al. 2023). A rare verbatim output under adversarial prompting no more implies a resident bitwise copy than a wine connoisseur's ability to identify a vintage implies a chemical formula stored in their brain - if I may answer one metaphor with another. In short, one should not replace a complex, abstract process with a simplistic, physical one.
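To make the contrast concrete, here is a deliberately toy sketch (a two-parameter regression in Python, not an LLM) of the difference between a database, which stores its inputs verbatim and answers by lookup, and a parametric model, which compresses examples into a handful of continuous parameters fitted by gradient descent and then answers by computing a function. The data, names, and numbers are illustrative assumptions, not anyone's actual system.

```python
# Toy sketch (illustrative only, not an LLM): a database answers by looking
# up a stored, fixed copy; a parametric model answers by computing a function
# whose few continuous parameters were fitted to examples by gradient descent.

import random

# The filing-cabinet picture: retrieval of a verbatim copy.
database = {"the quick brown fox": "jumps over the lazy dog"}
print(database["the quick brown fox"])

# The parametric picture: y ≈ w * x + b, with w and b learned from noisy examples.
random.seed(0)
examples = [(x, 3.0 * x + 1.0 + random.gauss(0, 0.1)) for x in range(10)]
w, b, lr = 0.0, 0.0, 0.01

for _ in range(1000):                      # gradient descent on squared error
    for x, y in examples:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err

# The examples can now be discarded: what remains is not a copy of any
# (x, y) pair but two floats that generalise to inputs never seen in training.
print(f"learned parameters: w={w:.3f}, b={b:.3f}")
print(f"prediction for x=100 (never seen): {w * 100 + b:.2f}")
```

At the scale of an LLM the parameters number in the billions and the function is vastly more expressive, but the category is the same: a fitted function, not a filing cabinet.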
Conflating “memorisation” (a term of art in machine learning describing uncommon verbatim leakage correlated with duplication and salience) with legal reproduction (which turns on fixation and identity of protected expression) is a paradigmatic decontextualisation error. Parameter updates are not stable embodiments of protected works (Elhage et al. 2021). Where actual reproductions occur, they occur at the point of output and should be analysed as such, not retroactively used to re-characterise training as copying.
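It may help to see what the machine-learning term actually measures. In practice, researchers quantify memorisation at the output layer, as unusually long verbatim overlap between generated text and the training corpus; nothing in the procedure inspects the weights for a stored copy, because there is none to inspect. The sketch below is a simplified illustration of that kind of check, with a placeholder corpus and output; real evaluations use tokenisers, suffix indexes, and far larger data.

```python
# Simplified sketch of how verbatim leakage is typically quantified: long
# n-gram overlap between a model's output and the training corpus. The check
# operates entirely on text at the output layer; it never inspects weights.

def ngrams(tokens, n):
    """All contiguous n-token windows of a token list, as a set."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def verbatim_overlap(generated: str, training_docs: list[str], n: int = 8) -> float:
    """Fraction of the generated text's n-grams that appear verbatim in any
    training document. Values near 1.0 would flag potential leakage."""
    gen_ngrams = ngrams(generated.split(), n)
    if not gen_ngrams:
        return 0.0
    train_ngrams = set()
    for doc in training_docs:
        train_ngrams |= ngrams(doc.split(), n)
    return len(gen_ngrams & train_ngrams) / len(gen_ngrams)

# Placeholder data: `model_output` stands in for text produced under some
# prompt, and `corpus` for (an indexed view of) the training data.
corpus = ["to be or not to be that is the question whether tis nobler in the mind"]
model_output = "to be or not to be that is the question of our age"
print(f"8-gram overlap: {verbatim_overlap(model_output, corpus):.2f}")
```

The point of the sketch is methodological: ‘memorisation’ names a measurable property of outputs under particular prompts, not an inventory of works resident in the parameters.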
At this point, one might think I’m being overly optimistic about these systems. I’m not. On the contrary: unless we change the frame through which we see them, we will regulate the wrong layer (training, deployment, or outputs), entrench category errors in doctrine, and produce exactly the harms critics fear. If we keep re-using concepts built for legacy technologies, two failures follow.
First, doctrinal misfit: casting training as “reproduction” invites over-enforcement upstream, undermining the carefully struck balance of the text-and-data-mining (TDM) exceptions that permit analytical copying for research under defined conditions; that chill would fall heaviest on universities, SMEs, and open-source labs rather than on incumbents with compliance teams. Second, policy misalignment: broad, slogan-level controls at the wrong layer (e.g., prohibitions on model training rather than governance of datasets, deployment contexts, and outputs) reward scale, raise fixed compliance costs, and concentrate power.
A realistic approach is not deregulatory; it is mechanism-aware and risk-based. Appropriate AI (and IP) regulation aims to foster innovation and trust; when it drifts from mechanisms to metaphors, trust erodes, innovation moves elsewhere, and the rules become empty discourse with harmful consequences. Meanwhile the genuine risks - among them inherited bias, exclusion, and downstream discrimination - are not solved. They are addressed by dataset governance, representativeness, documentation, impact assessment, and output-side controls.
Turning taboos into statutes has predictable results:
(i) regulatory arbitrage - developers move models or training elsewhere;
(ii) opacity - actors retreat from openness to evade ill-fitting rules;
(iii) research chill - TDM-dependent work slows or halts; and
(iv) rights friction - over-broad controls collide with expression, research and access (OECD 2019).
An open, competitive scientific ecosystem is not a luxury add-on: open science and lawful TDM are how we get better safety techniques (e.g., deduplication to reduce rare verbatim leakage), better evaluation, and reproducibility that regulators and courts can actually audit. If, instead, we legislate against a decontextualised metaphor - equating parametric learning with storage and copying - we will chill the very innovation we say we want, while leaving the real problems (data quality, deployment risks, misuse) untouched. The constructive path is clear: describe training as parameter optimisation aimed at statistical generalisation; keep reproduction tethered to fixation and identity; and target concrete risks at the layer they arise (data → governance, model → evaluation, deployment → safeguards, outputs → remedies).
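The deduplication lever mentioned above is concrete and data-layer, not a prohibition on learning. The sketch below illustrates the idea under simplifying assumptions (exact matching over hashed token windows; real pipelines use approximate near-duplicate detection such as MinHash/LSH at scale): repeated passages are what drive rare verbatim leakage, so they are detected and dropped before training.

```python
# Illustrative sketch of exact deduplication over hashed token windows.
# Repeated passages are the main driver of rare verbatim leakage, so a
# data-layer pipeline detects and drops them before training.

import hashlib

def window_hashes(text: str, window: int = 50):
    """Hashes of every contiguous `window`-token span of a document."""
    tokens = text.split()
    for i in range(max(len(tokens) - window + 1, 1)):
        span = " ".join(tokens[i:i + window])
        yield hashlib.sha1(span.encode("utf-8")).hexdigest()

def deduplicate(docs: list[str], window: int = 50) -> list[str]:
    """Keep a document only if none of its token windows has already been
    seen elsewhere in the corpus."""
    seen: set[str] = set()
    kept = []
    for doc in docs:
        hashes = set(window_hashes(doc, window))
        if hashes.isdisjoint(seen):
            kept.append(doc)
            seen |= hashes
    return kept

# Toy corpus with one exact repeat: three documents in, two survive.
corpus = ["a short licence text " * 20,
          "an original essay about gardens " * 20,
          "a short licence text " * 20]
print(f"{len(corpus)} documents in, {len(deduplicate(corpus))} documents out")
```

Nothing in this lever prohibits learning; it changes what the model sees often enough to regurgitate.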
Romanticising analogue tools while demonising digital ones is a naïve misframing that ignores the technological and environmental costs of the former, misidentifies the true locus of risk in incentives and governance, reinforces visibility bias by treating only conspicuous AI as “AI,” and sustains the illusion that pre-digital technologies were inherently fair or harmless. We have been here before. Television was said to create “videots”; the printing press was initially demonised and now is deified (Postman 1985); early state discourse in the twentieth century even branded cybernetics and artificial intelligence “bourgeois pseudoscience” before later rehabilitating them (Gerovitch 2002). Such cycles are classic moral panics: simplified threat narratives that travel because they are evocative and serve specific interests, not because they are accurate (ibid.).
When they cross into law, they can lock in poor concepts with long half-lives. This matters because policy discourse often jumps from slogan to statute. Politicians proclaim that “AI should be regulated”, but the work lies in where, how, and at what layer: data acquisition, model training, deployment context, or outputs. Interests diverge - rightsholders, platforms, open-source communities, SMEs, incumbents. Broad-brush framing lets bad ideas pass, protects elites, and pushes aside wider social needs.
The performative bookcase on video calls, the ‘I never use AI’ boast, and the politician’s catch-all ‘regulate AI’ line are not trivial social tics. They are interpretative vectors. When memorisation leaves computer science, it arrives in law as a storage analogy. In doctrine, that analogy mutates further into a claim that training equals reproduction, that weights equal copies, that adversarial outputs prove storage. Each step is conceptually wrong. The result is policy misalignment and weaker innovation. If the problem is framed as ‘models contain copies’, we over-invest in prohibitions and under-invest in dataset governance, deduplication, output-side controls, and safety layers - the levers that in practice mitigate rare leakage and other harms.
Moral panics are the alibi: they justify centralising discretion, licensing as a precondition for doing anything, and vague prohibitions that can be stretched to fit whatever is disliked next (Sunstein 2012). In practice this means prior restraints on research, training, and even use; ‘safety’ pretexts for gatekeeping access to data and compute; and compliance burdens that only incumbents can afford.
This approach replaces clear rules with elastic standards that invite selective enforcement, nudging universities, SMEs, and open communities to self-censor or exit. We have seen the pattern with broadcast monopolies, print licensing, and ‘national security’ catch-alls: the rhetoric promises protection; the effect is control over inquiry, expression, and entry (ibid.). When debate is framed as a choice between fluidity and crackdown, the crackdown wins - and innovation, pluralism, and accountability lose.
The constructive alternative is language awareness and layer-appropriate regulation. Describe training as parameter optimisation aimed at statistical generalisation, not storage. Evaluate alleged infringements against expression and identity, not metaphor. Target concrete risks where they arise - data, model, deployment, outputs - rather than at the abstraction ‘AI’. Preserve the balance of text-and-data mining frameworks that enable analysis while supporting rightsholders with practical remedies where actual reproductions occur.
The “city in panic over its chatty machines” is absorbed in the dispute and legal divides over how model training is treated and what memorisation means in law - a debate with the power to define the economic future of Europe, and one that should be settled through a focus on innovation and evidence-based regulation. How we talk about technology shapes what we regulate; how we regulate shapes what gets built. And regulation should not chill precisely the innovation that produces the assistive tools many now depend on. To move forward, we must abandon the flawed database metaphor.