What if We Got Arbitration Wrong? Reimagining the System with MuZero


Natural language processing (“NLP”) artificial intelligence (“AI”) tools are booming in the legal industry, and international arbitration is no exception. Practitioners are harnessing AI to boost efficiency in the work of counsel, experts, and arbitrators. According to Queen Mary University’s 2025 International Arbitration Survey, 64% of respondents reported using AI for factual and legal research, and 91% expect to use it in the future. This marks a significant jump from the 2021 Survey, where only 15% of respondents reported using AI “frequently.”

But the focus on using AI for research may be too narrow. What happens if we step back from incremental improvements in using NLP AI tools and instead ask a more radical question: if we recreated arbitration from scratch, could AI imagine a better, previously unimagined version of the arbitral process? This post explores exactly that.

 

MuZero: Dispensing with Human Experience

Much of human knowledge, including arbitration, is built from the trial-and-error learning process that humankind calls experience. In machine learning (“ML”), this type of knowledge generation is called reinforcement learning (“RL”). RL systems learn to achieve “optimal results” through trial-and-error actions executed in a simulated environment. After each action, the environment delivers a reward: a numerical signal, positive, negative, or zero, that tells the AI whether the action moved it closer to its objective. Repetition then generates experience.
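To make that loop concrete, here is a minimal sketch of trial-and-error learning in Python: a toy agent choosing among three abstract actions and averaging the rewards it receives. Every name and number below is a hypothetical placeholder, not part of any real system.

```python
# Minimal sketch of the RL loop described above: act, receive a reward,
# update, repeat. All actions and reward values are invented placeholders.
import random

ACTIONS = ["a", "b", "c"]            # an abstract action space
values = {a: 0.0 for a in ACTIONS}   # estimated value of each action
counts = {a: 0 for a in ACTIONS}

def environment(action: str) -> float:
    """Delivers a numerical reward: positive, negative, or zero."""
    true_value = {"a": 0.1, "b": 0.5, "c": -0.2}[action]
    return true_value + random.gauss(0, 0.1)  # noisy feedback

for episode in range(10_000):
    # Explore occasionally; otherwise exploit current knowledge.
    if random.random() < 0.1:
        action = random.choice(ACTIONS)
    else:
        action = max(values, key=values.get)
    reward = environment(action)
    counts[action] += 1
    # Incrementally average rewards: repetition generates "experience".
    values[action] += (reward - values[action]) / counts[action]

print(max(values, key=values.get))   # the learned best action: "b"
```

After enough repetitions, the averaged rewards point the agent to the best action, which is all that “experience” means in this setting.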

Some RL systems learn from human-generated data, while others start from scratch in what is known as a tabula rasa (i.e., blank slate) setting, where the system receives no rules or prior knowledge and learns entirely through interaction. In such cases, the only predefined elements in the environment are the objective and the reward; everything else (including possible actions and resulting states) is created through interaction. DeepMind’s MuZero is a well-known example of such a system, currently applied to robotic planning and control, video compression, and the mastery of complex video games such as those in the Arcade Learning Environment, including Montezuma’s Revenge and Ms. Pac-Man.

MuZero is given no rules or human data, nor does it need them. By interacting with the environment and focusing on outcomes, it builds internal representations that guide its decisions. The resulting knowledge runs in parallel to human experience, offering an independent source of insight capable of uncovering strategies previously unimagined by humankind.
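For readers curious about the machinery, the MuZero paper describes three learned functions: a representation function that encodes observations into an internal state, a dynamics function that predicts the next state and reward for an action, and a prediction function that outputs a policy and value. The sketch below mirrors that structure in schematic form only; the single linear layers are illustrative stand-ins for the real networks, and the tree search MuZero runs on top of this model is omitted.

```python
# Schematic sketch of MuZero's three learned functions, per DeepMind's
# paper "Mastering Atari, Go, chess and shogi by planning with a learned
# model". The nn.Linear layers are toy stand-ins, not the real networks.
import torch
import torch.nn as nn

class MuZeroModel(nn.Module):
    def __init__(self, obs_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        # h: maps raw observations to an internal hidden state
        self.representation = nn.Linear(obs_dim, hidden)
        # g: predicts the next hidden state and reward for an action
        self.dynamics = nn.Linear(hidden + action_dim, hidden + 1)
        # f: predicts a policy and a value from a hidden state
        self.prediction = nn.Linear(hidden, action_dim + 1)

    def initial_state(self, observation):
        return torch.tanh(self.representation(observation))

    def step(self, state, action_onehot):
        out = self.dynamics(torch.cat([state, action_onehot], dim=-1))
        return torch.tanh(out[..., :-1]), out[..., -1]  # next state, reward

    def evaluate(self, state):
        out = self.prediction(state)
        return out[..., :-1], out[..., -1]              # policy logits, value

# Toy usage: imagine a move, then evaluate the imagined position.
model = MuZeroModel(obs_dim=4, action_dim=3)
s = model.initial_state(torch.randn(4))
s2, r = model.step(s, torch.tensor([1.0, 0.0, 0.0]))
policy, value = model.evaluate(s2)
```

The key point for this post is that the rules of the environment never appear in the code: the dynamics function is learned from interaction, not handed down.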

 

What if Arbitration Could Be Done Better? Let AI Decide

If tabula rasa RL can uncover better ways to play complex, human-designed video games that demand long-term planning in unpredictable environments, and can even create efficiencies in robotic planning and control, can it do the same with a complex human dispute-resolution system like arbitration? The hypothesis is that it could.

This post theoretically explores an arbitration-specific tool like MuZero whose goal would not be to “win” cases, but to discover procedural rules better suited to structuring proceedings. The experiment would not mimic human reasoning; it would ask whether an AI system—starting from zero—could design an arbitral process that is more efficient, fair, or consistent than the options and combinations that exist today. In short, can AI invent the game board instead of merely playing the game?

Although purely conceptual for now, the idea is that an AI system could propose a new arbitral process from scratch. For such an experiment to succeed, however, the AI must learn what arbitration could be, not what it already is. Predefining procedural stages or scripting interactions based on human experience would poison the well.

An arbitration-oriented tabula rasa RL system would begin without any predefined actions or resulting states. As with MuZero, the only irreducible core of the environment would be the objective and the reward. These elements raise a key difference from AI applications in games or robotics: here, the objectives are harder to quantify. In Go or chess, success is binary and measurable. In dispute resolution, “fairness” or “efficiency” must be translated into numeric proxies. One possible approach would be a two-step optimization process in which objectives are broken down into quantifiable metrics: first, simulate thousands of interactions to determine the most efficient procedural structures (e.g., by measuring speed and cost minimization); then, refine those structures by introducing fairness constraints, such as maximizing the chance that all parties are adequately heard or limiting asymmetries in procedural influence.
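A hedged sketch of such a two-step reward design might look like the following. Every metric, weight, and threshold here is a hypothetical proxy chosen for illustration, not a validated measure of efficiency or fairness.

```python
# Hypothetical two-step reward design: efficiency first, then fairness
# constraints layered on top. All numbers are illustrative placeholders.

def efficiency_reward(trace: dict) -> float:
    """Step 1: reward speed and cost minimization."""
    return -(trace["duration_days"] / 365.0) - (trace["cost"] / 1_000_000.0)

def fairness_penalty(trace: dict) -> float:
    """Step 2: penalize designs that violate fairness constraints."""
    penalty = 0.0
    if not trace["all_parties_heard"]:
        penalty += 10.0                            # heavy penalty: hard constraint
    penalty += abs(trace["claimant_influence"]
                   - trace["respondent_influence"])  # procedural asymmetry
    return penalty

def total_reward(trace: dict, fairness_weight: float = 1.0) -> float:
    return efficiency_reward(trace) - fairness_weight * fairness_penalty(trace)

# A fast but one-sided procedure scores worse than a slower, balanced one.
fast_unfair = {"duration_days": 90, "cost": 200_000,
               "all_parties_heard": False,
               "claimant_influence": 0.9, "respondent_influence": 0.1}
slow_fair = {"duration_days": 400, "cost": 900_000,
             "all_parties_heard": True,
             "claimant_influence": 0.5, "respondent_influence": 0.5}
print(total_reward(fast_unfair))  # ≈ -11.25
print(total_reward(slow_fair))    # ≈ -2.00
```

The `fairness_weight` parameter makes the trade-off explicit: tuning it is itself a normative choice that humans, not the system, would have to make.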

For its interactions, the simulation would rely on purely abstract actors, each with its own objective and reward. For illustration they can be labeled claimant, respondent, or arbitrator, but they are not meant to replicate human claimants, respondents, or arbitrators, nor are they fed real-world data. They are functional placeholders interacting in a blank environment, guided only by role-specific reward functions. Each actor is assigned general objectives depending on its role, such as being heard, reaching resolution, or ensuring procedural economy.

For example, in the first step of the experiment: Actor A (claimant) initiates the interaction (e.g., commences proceedings) with the goal of an expeditious and cost-effective decision; Actor B (respondent) interacts with the goal of a grounded and fully addressed decision, with no time constraints; and Actor C (arbitrator) interacts with the goal of deciding in an effective and expeditious way.
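As a sketch, those three roles could be encoded as abstract actors with role-specific reward functions, along the lines below. The field names and weights are invented for illustration and carry no claim about how real parties value outcomes.

```python
# Illustrative abstract actors with role-specific reward functions,
# mirroring the goals of Actors A, B, and C above. All fields and
# weights are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Actor:
    label: str
    def reward(self, outcome: dict) -> float:
        raise NotImplementedError

class Claimant(Actor):
    # Actor A: wants an expeditious, cost-effective decision.
    def reward(self, outcome):
        return -outcome["duration_days"] / 100 - outcome["cost"] / 100_000

class Respondent(Actor):
    # Actor B: wants a grounded, fully addressed decision; no time pressure.
    def reward(self, outcome):
        return outcome["issues_addressed"] / outcome["issues_raised"]

class Arbitrator(Actor):
    # Actor C: wants to decide effectively and expeditiously.
    def reward(self, outcome):
        return (outcome["issues_addressed"] / outcome["issues_raised"]
                - outcome["duration_days"] / 200)

actors = [Claimant("A"), Respondent("B"), Arbitrator("C")]
outcome = {"duration_days": 180, "cost": 350_000,
           "issues_raised": 10, "issues_addressed": 9}
for actor in actors:
    print(actor.label, round(actor.reward(outcome), 2))
```

Because the rewards pull in different directions, no actor can maximize its own score without interacting with the others, which is precisely the tension the simulation would explore.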

The system would not be told to design an “arbitral process” in the traditional sense; that term itself could bias the simulation. Instead, the system would be asked, through trial and error, to develop a procedural framework, a sequence of steps, that accomplishes objectives such as fairness, efficiency, and consistency, expressed through proxy reward structures.

Through these repeated interactions, the system would gradually discover which procedural structures best serve its objectives. The reward function guides the development of structure, but the resulting logic is entirely self-constructed by the AI. States that align most closely with the objectives evolve into the building blocks of the AI-generated arbitration process. For example, within the system, quick but unfair paths lose points, while slower but fairer ones gain them. Over time, the AI retains the high-scoring designs and discards the rest, inventing a dispute resolution model tuned to its objectives.
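A toy version of that retain-and-discard dynamic can be sketched as a simple search over candidate step sequences, as below. One simplification to flag: for readability the sketch draws steps from a small labeled set, whereas the post imagines the system inventing its own actions; the scoring function is likewise a stand-in proxy, not a real measure of fairness or speed.

```python
# Toy retain-and-discard search over candidate procedural designs.
# Step labels and scores are placeholders for the self-constructed
# actions and learned rewards the post describes.
import random

STEPS = ["submit", "respond", "evidence", "hearing", "deliberate", "decide"]

def random_design(max_len: int = 8) -> list[str]:
    return [random.choice(STEPS) for _ in range(random.randint(3, max_len))]

def simulate(design: list[str]) -> float:
    """Proxy score: shorter designs are faster, but designs that skip
    a response or a hearing lose 'fairness' points."""
    score = -0.5 * len(design)                       # speed/cost proxy
    score += 5.0 if "respond" in design else 0.0     # parties heard
    score += 5.0 if "hearing" in design else 0.0
    score += 3.0 if design and design[-1] == "decide" else -3.0
    return score

population = [random_design() for _ in range(1_000)]
# Retain the high-scoring designs and discard the rest.
best = sorted(population, key=simulate, reverse=True)[:5]
for design in best:
    print(f"{simulate(design):6.1f}  {design}")
```

Even in this toy form, the surviving designs tend to include a response and a hearing and to end with a decision, illustrating how procedural structure can emerge from rewards alone rather than from a predefined rulebook.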

This kind of AI would be fundamentally different from today’s arbitration tools. For example, Jus Mundi AI uses NLP to extract patterns, structures, and legal concepts from actual human-written documents such as awards, procedural orders, and pleadings. It learns from what has already happened. The system imagined here would instead generate its own procedural logic from scratch, based only on abstract goals and self-generated experience acquired through thousands of simulations.

 

The Outcome? No One Can Say

The results could be revolutionary. The system might devise procedural frameworks we have never considered, rethink evidentiary rules, propose new hearing structures, or uncover hidden correlations between procedural steps and outcome predictability. It might reveal efficiencies we have overlooked, or offer structural fixes to fairness problems we take for granted.

But the risks are equally clear. The system could evolve alien logic (incomprehensible to humans), adopt patterns that lack transparency, or reinforce latent biases embedded in its initial simulation parameters. It might become accurate but unreadable, or fair but unexplainable.

Still, if tested through multiple iterations and guided by carefully defined goals, it seems plausible that a new and efficient perspective on the arbitral process would eventually emerge. Just as MuZero distills the essence of an environment through self-generated experience and planning, an arbitration AI might converge on a model of dispute resolution that, although unlike anything we currently know, is both principled and profoundly functional.

This is not tomorrow’s project. But it is the kind of moonshot that legal technologists and policymakers may one day pursue. Not just automating arbitration but uncovering a version we have not yet imagined.

 

The content of this post is intended for educational and general information. It is not intended for any promotional purposes. Kluwer Arbitration Blog, the Editorial Board, and this post’s author make no representation or warranty of any kind, express or implied, regarding the accuracy or completeness of any information in this post.
