Unique Does not Mean Identifiable – Google Search Data Sharing under Article 6(11) DMA

Lena Hornkohl (University of Vienna, Austria)

June 19, 2026

The discussion around the Google search data sharing specification proceedings under Article 8(2) and Article 6(11) DMA has not cooled down since the end of its (short) consultation period. The issue has remained the same from the beginning: how can the obligation under Article 6(11) DMA be implemented in a way that adequately safeguards end users’ privacy? And here is the interesting twist: it is Google that is raising the privacy defence: ‘We are concerned because the EC's approach to anonymization fails to protect Europeans' privacy’. Who would have thought we would see the day?

This piece takes privacy seriously and fully recognises the importance of personal data protection. But it is equally important to distinguish genuine privacy risks from strategic invocations of privacy by gatekeepers seeking to neutralise data-sharing remedies before markets become too contestable for comfort. Privacy matters; it should not be used as a regulatory invisibility cloak.

Let us take a step back: What is Article 6(11) DMA about?

As my colleague Alba Ribera Martínez describes in her excellent and already quite exhaustive post, the case at hand deals with the obligation under Article 6(11) DMA, which requires Google – the only gatekeeper designated to date for an online search engine – to ‘provide to any third-party undertaking providing online search engines, at its request, with access on fair, reasonable and non-discriminatory terms to ranking, query, click and view data in relation to free and paid search generated by end users on its online search engines.’ The Commission does not believe Google has complied: it opened proceedings in January 2026 and proposed measures in April 2026. A final decision is expected before the summer break starts.

Although the consultation period has ended and the proposal has already received considerable feedback (see eg here and here), the discussion continues. The main issue concerns the second part of Article 6(11) DMA, playfully omitted above: ‘Any such query, click and view data that constitutes personal data shall be anonymised.’ Recital 61 DMA goes into more depth, providing that ‘gatekeeper should ensure the protection of the personal data of end users, including against possible re-identification risks, by appropriate means, such as anonymisation of such personal data, without substantially degrading the quality or usefulness of the data. The relevant data is anonymised if personal data is irreversibly altered in such a way that information does not relate to an identified or identifiable natural person or where personal data is rendered anonymous in such a manner that the data subject is not or is no longer identifiable.’ Furthermore, the Draft Joint Guidelines on the interplay between the GDPR and the DMA also deal at length with the obligation under Article 6(11) DMA and the anonymisation and privacy issues it may raise.

The red team allegations

The ongoing discussion centres precisely on that question: do the Commission’s proposed measures sufficiently address anonymisation and privacy issues, or not? Some side with Google and believe they do not – but are they correct?

In terms of anonymisation, the Commission proposed several measures, set out in Section 3 of the preliminary findings, to ensure that end users’ personal data in the search dataset is anonymised before it is shared with third-party online search engines. The end result of these measures is a combination of pseudonymisation, aggregation and noise injection, achieved through mechanisms such as the removal of direct identifiers, entity thresholding, query-length filtering and metadata generalisation (for the measures in detail, see here and below).

A Google scientist now publicly argues that the proposed anonymisation measures, in particular, would not withstand the ability of modern AI tools to re-identify people. Allegedly, an internal so-called red team was able to re-identify users in ‘less than two hours’ from a sample of search-engine query data anonymised under the Commission’s proposed method, using, among other things, AI tools to simulate realistic adversarial activity, particularly linkage attacks.

Is there any truth to these allegations? So far, the red team’s findings have not been shared with the public; all we have are media statements. The red-team claim should therefore be treated with caution. It rests on the assertion that users could be re-identified from data anonymised under the Commission’s proposed method, but no actual re-identification has been publicly demonstrated. The scientist quoted above claimed to be ‘eager to share technical expertise and work with the EC to establish the right guardrails and protect Europeans from privacy harm’ – but has not made the underlying methodology available for independent public scrutiny.

Unique queries and re-identification

That said, the privacy concerns appear to focus primarily on unique queries, as the compliance reports and workshops have already shown. Google’s view seems to be that uniqueness itself is the decisive risk marker: if a query record occurs only once, it should be treated as inherently unsafe and excluded from the shared dataset on privacy grounds.

This concern is illustrated by the frequency-threshold model Google used in its previous voluntary data-sharing programme. Under that model, a query appeared in the dataset only if the identical query string had been entered by more than 30 signed-in users worldwide during the preceding 13 months.
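
The string-level threshold described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google’s actual implementation: the log format, the function name and the exact-string matching (no normalisation) are hypothetical, and the log is assumed to cover only the preceding 13 months already.

```python
from collections import defaultdict


def queries_passing_string_threshold(log, min_users=30):
    """Return the query strings entered by MORE than `min_users` distinct users.

    `log` is an iterable of (user_id, query_string) pairs; this format is an
    assumption made for illustration. Matching is on the exact string.
    """
    users_per_query = defaultdict(set)
    for user_id, query in log:
        users_per_query[query].add(user_id)
    return {q for q, users in users_per_query.items() if len(users) > min_users}


# Toy log: one common query (31 distinct users) and one rare query (1 user).
log = [(f"u{i}", "weather vienna") for i in range(31)]
log.append(("u0", "best climbing gym Vienna 2026"))

kept = queries_passing_string_threshold(log)
print(kept)
```

In this toy log, the rare query is dropped in its entirety, although none of its individual words says anything about who typed it.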

In practice, this threshold removed the very data that would have made the dataset useful. As third-party online search engines explained during the compliance workshops in March 2024 and July 2025, Google’s approach excluded between 90% and 100% of unique queries and between 30% and 40% of overall query volume. The resulting dataset was therefore so heavily filtered that it was of little practical value.

The Commission’s proposed approach in paras. 21–34 is materially different. Instead of excluding whole records simply because the full query string does not meet a frequency threshold, it relies on a layered set of more targeted safeguards. First, direct identifiers, such as account IDs, IP addresses, device identifiers and exact timestamps, are removed. Second, individual elements within a query are assessed through an allowlist-based entity-thresholding mechanism: only entities that appear in queries submitted by more than 50 signed-in users over the preceding 13 months, and that are included on a list updated weekly, may be retained. Third, unusually long queries are filtered out by applying a language-specific length threshold, excluding queries whose character count exceeds the 95th percentile for that language. Fourth, metadata is generalised so that location and device information is included only where at least 50 signed-in users share the same metadata combination. Finally, related consecutive queries are grouped through ‘mini-sessionisation’, preserving useful contextual information while reducing the risk that the data could be linked back to an individual user.
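
To make two of these layers more concrete, the following Python sketch illustrates the length filter and the metadata threshold. It is a rough illustration only: the record format, the function names, the nearest-rank way of computing the percentile and the choice to blank out (rather than coarsen) rare metadata are all assumptions that do not come from the Commission’s text.

```python
from collections import defaultdict


def length_cutoff(queries, percentile=95):
    """Character-count cutoff at the given percentile (nearest-rank method,
    an assumption; the proposal does not say how the percentile is computed)."""
    lengths = sorted(len(q) for q in queries)
    rank = (percentile * len(lengths) + 99) // 100  # integer ceiling
    return lengths[rank - 1]


def filter_long_queries(queries, percentile=95):
    """Drop queries whose character count exceeds the percentile cutoff.
    `queries` is assumed to hold queries of a single language."""
    cutoff = length_cutoff(queries, percentile)
    return [q for q in queries if len(q) <= cutoff]


def generalise_metadata(records, min_users=50):
    """Keep a metadata combination only if at least `min_users` distinct users
    share it; otherwise replace it with None. The user ID itself is not
    carried over into the output.

    `records` holds (user_id, query, metadata) tuples (hypothetical format).
    """
    users_per_combo = defaultdict(set)
    for user_id, _query, metadata in records:
        users_per_combo[metadata].add(user_id)
    output = []
    for user_id, query, metadata in records:
        shared_widely = len(users_per_combo[metadata]) >= min_users
        output.append((query, metadata if shared_widely else None))
    return output


# Toy data: 20 queries of 1 to 20 characters; the longest one is filtered out.
queries = ["a" * n for n in range(1, 21)]
short_enough = filter_long_queries(queries)

# 50 users share one metadata combination; a single user has a rare one.
records = [(f"u{i}", "q", ("Vienna", "mobile")) for i in range(50)]
records.append(("u999", "q2", ("Hallstatt", "tablet")))
shared = generalise_metadata(records)
```

The point of the sketch is that each layer targets a specific risk signal (unusual length, rare metadata) rather than discarding a record merely because its full string is rare.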

The practical effect is substantial. A query such as ‘best climbing gym Vienna 2026’ might be discarded under Google’s approach because the exact string is rare. Under the Commission’s model, however, its individual components – ‘best’, ‘climbing gym’, ‘Vienna’ and ‘2026’ – are assessed separately. Common concepts in uncommon combinations can therefore remain available, while genuinely identifying elements are still filtered out. Singleton queries that contain an identifier, such as names combined with addresses, and other genuinely rare tokens are detected and suppressed before sharing. The result is a dataset that is both more useful for rival search engines and, importantly, more targeted in its privacy protection.
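
The difference can again be sketched in Python. The following is a simplified illustration, not the Commission’s specification: it assumes that queries have already been split into entities (how this is done is left open here), that rare entities are replaced by a placeholder rather than the whole record being dropped, and that the log covers the relevant 13-month window. The function names and the name ‘Jane Example’ are hypothetical.

```python
from collections import defaultdict


def build_allowlist(log, min_users=50):
    """Entities that appear in queries of MORE than `min_users` distinct users.

    `log` holds (user_id, [entity, ...]) pairs; the segmentation of a query
    into entities is assumed to have happened upstream.
    """
    users_per_entity = defaultdict(set)
    for user_id, entities in log:
        for entity in entities:
            users_per_entity[entity].add(user_id)
    return {e for e, users in users_per_entity.items() if len(users) > min_users}


def redact(entities, allowlist, placeholder="[REDACTED]"):
    """Keep allowlisted entities; replace everything else by a placeholder
    (an assumption: the proposal could also suppress the record instead)."""
    return [e if e in allowlist else placeholder for e in entities]


# Toy log: common entities used by many users, plus two singleton queries.
log = []
for i in range(51):
    log.append((f"a{i}", ["best", "pizza", "Vienna"]))
    log.append((f"b{i}", ["climbing gym", "opening hours", "2026"]))
log.append(("c0", ["best", "climbing gym", "Vienna", "2026"]))
log.append(("c1", ["Jane Example", "Vienna"]))  # fictional name

allowlist = build_allowlist(log)
rare_combination = redact(["best", "climbing gym", "Vienna", "2026"], allowlist)
rare_entity = redact(["Jane Example", "Vienna"], allowlist)
print(rare_combination)
print(rare_entity)
```

In this toy log, the climbing-gym query survives intact even though the full string occurs only once, because each of its components is common. The query containing the fictional name keeps only ‘Vienna’.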

Google’s defensive position and privacy claims concerning unique queries would therefore be consequential should the Commission accept them in its final specification decision. Removing all unique queries would not merely eliminate a small category of outliers: it would strip out much of the search data and thereby reduce the usefulness of the data made available under Article 6(11) DMA, particularly for smaller or emerging search services that depend on access to less common queries in order to improve relevance, coverage and competitiveness. Such an outcome would be inconsistent with Recital 61 DMA, which requires anonymisation of personal data of end users ‘without substantially degrading the quality or usefulness of the data.’

More fundamentally, however, uniqueness should not be treated as a proxy for re-identification risk. A record may be unique within a dataset without, for that reason alone, allowing a user to be identified. The relevant question is not simply whether a query appears once, but whether the information available to a recipient, in context, would realistically enable them to single out or re-identify an individual.

Put simply, uniqueness is not identifiability. A query may appear only once simply because language is varied; that does not mean it reveals who made it. Article 6(11) DMA requires the end user to be anonymised; it does not require every element of personal data to be stripped from the query. Given the rightly broad understanding of personal data, a person’s name or profession mentioned in a query may constitute that person’s personal data and therefore fall within the scope of the GDPR. But the presence of such third-party personal data in a query does not mean that the end user who submitted the query can be re-identified.

Viewed through the lens of the GDPR, anonymisation does not have to eliminate every conceivable or purely hypothetical possibility of re-identification. Recital 26 GDPR asks whether identification is possible by means ‘reasonably likely’ to be used, taking into account objective factors such as cost, time, available technology, and technological developments. That approach has recently been reinforced by the CJEU in EDPS v SRB. The Court rejected the idea that pseudonymised data must automatically be treated as personal data for every recipient and in every circumstance. What matters is whether, in the hands of the relevant recipient and in light of the surrounding technical, organisational and legal constraints, the data subject remains identifiable. The assessment is therefore contextual, not absolute.

On that basis, the relevant question under Article 6(11) DMA is not whether re-identification can be imagined in the abstract, but whether the overall access regime reduces that risk to an insignificant level in practice. Google’s approach, however, appears to assess anonymisation in isolation from the broader architecture of the proposed access regime, which is a mistake. The Commission’s scheme does not rely solely on technical anonymisation measures applied to the dataset itself. It is also built around security safeguards, access controls, contractual restrictions, limitations on onward use and audit mechanisms. These layers are not incidental; they are an integral part of the risk assessment. The re-identification risk should therefore be evaluated in light of the full framework governing access to and use of the data, rather than by focusing exclusively on whether individual query records are unique.

Last words not spoken

Privacy and personal data protection matter deeply. But precisely for that reason, they should not be turned into tools for avoiding regulatory obligations. If Google’s privacy framing succeeds in watering down or indefinitely delaying the Article 6(11) DMA specification, it risks creating a template for gatekeepers to invoke privacy as a way to neutralise data-sharing remedies across the board.