Is Arbitration a Probabilistic Game? Some Reflections on the Potential Use-Case for AI/ML in Expert Damages Evidence
June 30, 2026
Choosing an arbitrator is not just about experience and credentials; it is also about intuition, signalling, and strategic alignment. It is about loading the dice.
The same is true of liability. Counsel’s role is to persuade the Tribunal that their client’s account of the facts and law is the more probable one.
But is this probabilistic framing carried into the damages phase? If not, that could be a missed opportunity.
We have seen speeches, such as the 11th Annual EFILA Lecture, address the use of artificial intelligence (“AI”) tools to perform predictive analysis of dispute outcomes. Such tools may be valuable at key decision-making junctures, including before commencing proceedings or seeking funding, by predicting likely outcomes based on comparable past cases. We are also seeing an increasing number of attempts to legislate or create soft law or guidance around the use of AI in proceedings (see Blog post here).
This blog post, by contrast, focuses on a different application of AI: its use in supporting the construction of compelling damages evidence. The authors suggest that the exercise of formulating damages opinions often requires a significant element of probabilistic judgment, rather than mere arithmetic ‘calculation’.
In that sense, the considerations raised by this post could be seen as a more ‘micro-level’ application of AI, as opposed to the ‘meta-level’ outcome-prediction tools discussed in the lecture above.
This post focuses primarily on Machine Learning (“ML”) models: AI systems trained on data to identify patterns and generate probabilistic inferences relevant to damages quantification (“AI/ML”).
This post explores whether ML tools may help us “guess better” in damages quantification, while underscoring the continuing importance of human judgement and the need to build Tribunal confidence in AI/ML-assisted expert evidence through transparency and explainability.
Damages: Hypotheticals and Judgement
Every ‘but for’ scenario is, by definition, a hypothetical, and so cannot be known with certainty.
Instead, we draw from incomplete historical data, adjust for real-world noise, and layer assumptions to approximate an outcome. The arithmetic may be neat, but the inputs have traditionally rested on human judgement.
AI/ML steps in, not as a calculator, but as a probabilistic inference engine.
AI/ML, when properly used, does not decide. It models. It helps us better guess what the counterfactual could have looked like. In a world governed by legal doctrine, this offers a powerful, but bounded, tool.
Human Judgement Still Matters
Recent panels at SIA on technology and arbitration also highlighted the importance of AI as an adjunct to, rather than a substitute for, independent thought, judgement and analysis. This is as true of expert evidence as it is of counsel submissions.
It has also been suggested that Tribunals could have the power to adopt an AI-derived damages figure uncritically and at ‘face value’, even one produced by a non-replicable ‘black box’ model, where the parties have pre-agreed an AI-powered damages model that converts raw factual data inputs into a single damages figure.
In practice, such arrangements may prove difficult to operationalise. Disputes may arise over raw data inputs or format, cleansing methodologies, model parameters, or even whether the deployed model was in fact the agreed model. Who operates and debugs the model? How is discretion exercised? Can incomplete models be completed such that an output is even possible? What if the output is nonsensical?
In the authors’ view, arguments in favour of formula-driven, mathematical AI solutions to quantum issues may be compelling in certain use cases – particularly where the basis for damages, compensation, or amounts owing is heavily fact-based and largely ‘calculative’ in nature – and perhaps best suited to lower value disputes. An analogy might be drawn with liquidated damages, or with formula-based payment and pricing clauses in contracts.
By contrast, the use of AI/ML tools to refine assumptions or to assist with probabilistic analysis (for example, Monte Carlo simulations which would otherwise be extremely time-intensive) arguably supports the assessment of damages or compensation on a non-liquidated, yet still transparent, basis. That is, AI/ML tools assisting human decision-making, not replacing it.
The Basics of AI/ML Training
Training an AI/ML model ideally follows a three-stage process using a data set split into three distinct components:
- Training data: the model is first trained on a ‘set’ of known (historical) information.
- Validation data: this helps to refine the model and its internal parameters.
- Testing data: this is completely ‘novel’ to the model but ‘known’ to the tester (i.e. it is empirical, historical data that sits outside the training and validation sets). It is this set on which the model’s predictive performance is tested. We ask the model: ‘given X, predict Y’.
Performance on this data simulates the model’s predictive power on future, genuinely unknown (or unknowable) data such as a ‘but for’ scenario.
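The three-way split described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `three_way_split`, the 60/20/20 proportions and the placeholder records are assumptions made for the example, not a prescribed method.

```python
import random


def three_way_split(records, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle records and split them into training, validation and testing sets.

    The fractions are illustrative assumptions; whatever remains after the
    training and validation sets are carved out becomes the testing set.
    """
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a testing set")
    shuffled = list(records)
    # A fixed seed makes the split repeatable, so another expert can replicate it.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # never shown to the model during training
    return train, validation, test


# Hypothetical data: 100 historical observations, labelled 0 to 99.
train, validation, test = three_way_split(range(100))
print(len(train), len(validation), len(test))
```

A random shuffle is only one way to split. Where observations are ordered in time, a chronological split (older data for training, the most recent data for testing) is often preferred, so that the model is not tested on periods it has effectively already seen.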
Think of it like (US-style) witness prep: ‘training’ is rehearsal on a question set, ‘validation’ is a preliminary dry run on unseen questions with feedback used to assist with rehearsal, and ‘testing’ is the final dry run - without coaching or feedback - on further unseen questions to see if the logic holds under pressure. The ‘model’ is then ‘run’ in examination-in-chief or cross-examination.
If an AI/ML model predicts reliably on ‘testing’ data, we can infer it may make similarly accurate predictions when presented with additional novel data.
Application in Arbitration: Where AI/ML Adds Value
AI/ML may be particularly useful for modelling damages where uncertainty is highest, such as lost profits, diminution in intangible value or business interruption losses. These are forward-looking questions, and humans (even self-proclaimed experts) are notoriously bad at forecasting. A well-trained model may do better, though it seems for now that such an approach is rarely used in practice.
Machine-driven scenario analysis can evaluate more permutations than a human analyst realistically could, moving beyond simplistic low-, base- and high-case analyses into thousands of potential scenarios and distributional modelling.
Distributional modelling can give the Tribunal a more nuanced view of risk than a single ‘best estimate’, showing a range of possible outcomes and the likelihood of different results within that range.
The important point is that such models must be explained in practical terms. Any use of ‘bell curves’ should be translated into propositions the Tribunal can understand, such as: “there is a 60% probability that the loss exceeds USD 100 million” or “an 80% probability that the loss falls between USD 75 million and USD 150 million.”
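To show how a simulated distribution turns into statements of this kind, here is a minimal Monte Carlo sketch using only the Python standard library. Every input is a hypothetical assumption chosen for illustration (the lognormal annual profit, the uniform loss period, the absence of discounting); none is drawn from a real case, and the helper names are invented for the example.

```python
import random


def simulate_losses(n_scenarios=10_000, seed=42):
    """Draw many hypothetical 'but for' scenarios and return the loss in each.

    Assumptions (illustrative only): annual lost profit is lognormally
    distributed in USD million, the loss period is between 3 and 7 years,
    and no discounting is applied.
    """
    rng = random.Random(seed)  # fixed seed so the run can be replicated
    losses = []
    for _ in range(n_scenarios):
        annual_lost_profit = rng.lognormvariate(3.0, 0.4)  # USD million per year
        loss_years = rng.uniform(3.0, 7.0)
        losses.append(annual_lost_profit * loss_years)
    return losses


def prob_exceeds(losses, threshold):
    """Share of simulated scenarios in which the loss exceeds the threshold."""
    return sum(loss > threshold for loss in losses) / len(losses)


def prob_between(losses, low, high):
    """Share of simulated scenarios in which the loss falls within [low, high]."""
    return sum(low <= loss <= high for loss in losses) / len(losses)


losses = simulate_losses()
print(f"Probability that the loss exceeds USD 100 million: {prob_exceeds(losses, 100):.0%}")
print(f"Probability that the loss falls between USD 75 million and USD 150 million: "
      f"{prob_between(losses, 75, 150):.0%}")
```

The probabilities printed depend entirely on the assumed inputs. The point of the sketch is the translation step: each figure is simply the share of simulated scenarios that meet the stated condition, which is something a Tribunal can be walked through.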
Any limitations of such models should also be made apparent. An AI/ML model is generally only as good as the data on which it is trained (the ‘garbage in, garbage out’ principle still applies). For example, a model trained on only 20 years of data may underestimate left-tail (catastrophic) risk, or even right-tail (‘blue sky’ upside) risk, because 1-in-100-year or 1-in-1,000-year events may simply not appear in that data set.
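A back-of-the-envelope calculation shows why a 20-year data set is likely to miss such events. It assumes, as a simplification, that years are independent and that a ‘1-in-N-year’ event has a probability of 1/N in any given year; the function name is hypothetical.

```python
def prob_event_in_window(return_period_years, window_years):
    """Probability that at least one '1-in-N-year' event occurs in the window.

    Simplifying assumption: each year is independent, with annual
    probability 1 / return_period_years.
    """
    annual_prob = 1.0 / return_period_years
    return 1.0 - (1.0 - annual_prob) ** window_years


for return_period in (100, 1000):
    p = prob_event_in_window(return_period, window_years=20)
    print(f"1-in-{return_period}-year event seen in 20 years of data: {p:.1%}")
```

On these assumptions, there is only roughly an 18% chance that a 20-year data set contains a 1-in-100-year event, and about a 2% chance that it contains a 1-in-1,000-year event. A model trained on that data will, more often than not, never have ‘seen’ either.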
AI/ML’s strength is not omniscience; its strength lies in access to tremendous computational power, repeatability, and the ability to find patterns in data. If used properly, it augments human reasoning, but if misused, it can obfuscate or mislead.
Implications for Advocates, Experts, and Arbitrators
Some parting thoughts for each stakeholder:
- Lawyers should ensure that AI-derived evidence is explainable. The best advocacy will not only translate technical outputs into human reasoning the Tribunal can understand, but will also explain the underlying model training process so the Tribunal can trust the model.
- Experts may do well to document every human decision, such as model architecture, data filters and tuning parameters, much as an arbitrator sets out the full procedural history in an award. Visual process mapping may also assist Tribunals in understanding how inputs move through the model.
- Tribunals may consider scrutinising not only the output, but also the process. If the Tribunal finds part of the process or a human assumption during the process was suboptimal, can the model be rerun with Tribunal-modified assumptions and process? Can the internal logic of the process selected be followed? Is the model sufficiently robust and resilient to withstand reasonable changes in human discretion in assumptions or process when deployed ‘in the wild’ (and not under ideal conditions), without becoming unstable and losing predictive performance? If not, it may deserve little weight. As for the use of AI by arbitrators themselves, prior Blog posts have indicated that the use of AI by arbitrators in the ‘work’ of arbitration is another consideration entirely, presumably not least because of due process arguments.
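The question of whether a model can be rerun with Tribunal-modified assumptions is easier to answer when every human assumption is an explicit, named input. A minimal sketch follows, using a deliberately simple discounted lost-profits calculation as a stand-in for the model; all names and figures are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Assumptions:
    """Human assumptions, kept explicit so that each one can be challenged."""
    annual_lost_profit: float  # USD million per year (hypothetical)
    loss_years: int            # length of the loss period in years
    discount_rate: float       # annual discount rate, e.g. 0.10 for 10%


def present_value_of_loss(a: Assumptions) -> float:
    """Discount each year's lost profit back to the valuation date."""
    return sum(
        a.annual_lost_profit / (1 + a.discount_rate) ** year
        for year in range(1, a.loss_years + 1)
    )


expert_case = Assumptions(annual_lost_profit=20.0, loss_years=5, discount_rate=0.10)
# The Tribunal prefers a different discount rate; everything else is unchanged.
tribunal_case = replace(expert_case, discount_rate=0.12)

for label, case in (("Expert case", expert_case), ("Tribunal case", tribunal_case)):
    print(f"{label}: USD {present_value_of_loss(case):.1f} million")
```

Because the assumptions are immutable and the rerun changes exactly one of them, the difference between the two figures can be attributed to that change alone. A model whose output swings wildly under a modest change of this kind is one whose robustness the Tribunal may wish to probe.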
Importantly, this is still about persuasion, not perfection.
As an evidentiary matter, the Tribunal still ought to believe the AI/ML-supported conclusion is more likely than not (in the same way as non-AI/ML expert conclusions). That is, if arbitration is a probabilistic game, AI/ML can help load the dice – but not roll them for us.
Final Thoughts: From Prediction to Persuasion
AI/ML has significant potential in complex, data-heavy disputes. However, its value depends on how well its insights can be framed, justified, and tested.
Real trust comes not from the tools we use, but from our ability to communicate clearly, act transparently, and collaborate.
AI/ML does not eliminate uncertainty. Unless we explain our models in ways Tribunals can understand and trust, we are just hiding a bet in a box.