Why Legal AI Must Check Whether a Case Is Still Good Law
One of the most serious legal AI failures is not a fabricated citation. It is a real citation used for the wrong current-law conclusion. A system can retrieve an authentic judgment, verify that it exists, and still give outdated legal advice if it does not test whether that authority remains valid for the proposition being asserted today.
The reliability gap lawyers actually care about
When lawyers evaluate AI tools, they often start with the obvious question: did the system cite a real source? That matters, but it is only the first safety check. A real case can still be a poor authority for the present question if it was later limited, reversed, overtaken by a higher court, or tied to a procedural posture that no longer controls the issue.
That distinction is easy to miss because AI systems are rewarded for fluent synthesis. If the answer sounds coherent, includes familiar authorities, and uses correct-looking legal language, users can assume the substantive current-law analysis has been done. In reality, the difficult question is often a second-order one: does this authority still support this proposition now?
For legal practice, that is not a technical edge case. It affects advice letters, negotiation positions, injunction strategy, due diligence flags, pleadings, internal memos, and client risk assessments. A persuasive answer built on stale authority is more difficult to detect than an obviously broken answer because it can survive a quick review.
How a legal AI answer can be both real and wrong
Consider a familiar pattern in case-law research. The system retrieves the foundational judgment, several relevant lower-court decisions, and a procedural decision that appears to frame the issue exactly as the user asked it. Every citation is genuine. The result looks impressive.
But later authority has changed the answer. The newer controlling decision may reject the theory the user searched for, narrow the remedy, or resolve a preliminary question that the older authority left open. If the AI does not actively look for that later treatment, it may present a historically interesting authority as if it were still current law.
A public Dutch example shows the point clearly. In the Didam II judgment, ECLI:NL:HR:2024:1661 (1), the Hoge Raad clarified the legal consequences of non-compliance with the Didam rules. An AI system that finds the earlier Didam line and later lower-court procedural material, but fails to detect the controlling Supreme Court answer, can produce a conclusion that is well sourced and materially outdated at the same time.
A verified citation is not the same thing as a reliable authority for the current proposition.
Why source-grounded legal AI still fails on current law
Many legal teams now understand the value of source-grounded AI. Retrieval is better than pure memory. Citations are better than unsupported prose. But source grounding alone does not solve the current-law problem.
There are five recurring reasons.
1. Existence verification is mistaken for legal validity
A system may know that an ECLI exists, that a judgment came from an official source, or that a document was ingested from a trusted corpus. None of that proves the authority still supports the legal point being advanced. Existence, provenance, and legal validity are different questions.
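The following minimal Python sketch keeps those questions apart. It assumes a simple record per cited authority, and the class, field names and placeholder ECLI are all hypothetical. Existence and provenance can both be true while the validity field is still empty, and only an explicit positive finding on validity makes the authority usable as current law.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuthorityCheck:
    """Keeps the three questions about one cited authority apart (hypothetical fields)."""
    ecli: str
    exists: bool                                # identifier resolves to a real document
    official_provenance: bool                   # document came from an official source
    supports_proposition_today: Optional[bool]  # None means "not yet validated"


def safe_to_cite_as_current(check: AuthorityCheck) -> bool:
    # Existence and provenance are necessary but not sufficient: only an
    # explicit, positive validity finding makes the authority usable as current law.
    return (
        check.exists
        and check.official_provenance
        and check.supports_proposition_today is True
    )


# Hypothetical ECLI placeholder: verified to exist, but never validated.
verified_only = AuthorityCheck("ECLI:NL:XX:2020:0001", True, True, None)
print(safe_to_cite_as_current(verified_only))  # False
```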
2. Retrieval follows the user's theory
If a user asks about a specific remedy or doctrinal argument, search will often retrieve authorities discussing exactly those terms. That is useful, but it can trap the system inside the user's framing. Legal research often requires the opposite move: find the later authority that says the framed theory is no longer correct.
3. Recency is not hierarchy
A newer lower-court decision is not automatically stronger than an older Supreme Court judgment. A recent commentary page is not stronger than an official ruling. A later procedural decision can be less important than an earlier controlling merits judgment. Currentness in law depends on hierarchy, treatment, and proposition fit, not only publication date.
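The next sketch is a sort key that ranks treatment and hierarchy ahead of date. The court ranks, field names and example authorities are hypothetical, and a real hierarchy would be specific to the jurisdiction.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical ranking for illustration only; a real system needs a
# jurisdiction-specific court hierarchy.
COURT_RANK = {"supreme": 3, "appeal": 2, "first_instance": 1, "commentary": 0}


@dataclass
class Candidate:
    name: str
    court: str
    still_good_law: bool    # outcome of the treatment check
    proposition_fit: float  # 0.0 to 1.0: how directly it addresses the point
    decided: date


def authority_key(c: Candidate) -> tuple:
    # Compared left to right: treatment, then hierarchy, then fit,
    # with publication date used only as the final tie-breaker.
    return (c.still_good_law, COURT_RANK[c.court], c.proposition_fit, c.decided)


candidates = [
    Candidate("recent lower-court ruling", "first_instance", True, 0.9, date(2025, 3, 1)),
    Candidate("older supreme court judgment", "supreme", True, 0.8, date(2020, 6, 1)),
    Candidate("recent practice note", "commentary", True, 0.95, date(2025, 6, 1)),
]
ranked = sorted(candidates, key=authority_key, reverse=True)
print([c.name for c in ranked])
# ['older supreme court judgment', 'recent lower-court ruling', 'recent practice note']
```

With this key, a judgment marked as no longer good law sorts below every authority that still is, however senior the court that issued it.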
4. Interim and procedural decisions are especially risky
Preliminary questions, interim judgments, referrals, Advocate General opinions, and procedural orders are often highly relevant. They are also easy to misuse. They may explain how an issue developed without representing the present legal answer. Legal AI should be able to say: this is relevant history, not safe current support.
5. Commentary helps discovery, but it should not decide the law by itself
Blogs, legal updates, and practice notes are useful for spotting change. They often explain what shifted, and why it matters, more quickly than the official text itself. But commentary is not a substitute for the official authority that actually changed the rule. A serious legal AI workflow should use commentary to discover or interpret, not to overrule primary law on its own.
The better pattern: authority-status validation
The solution is not to ask a model one vague question such as, "Are these cases still applicable?" The better pattern is a dedicated authority-status validation layer between retrieval and synthesis. Research on retrieval-augmented generation (RAG), including self-reflective retrieval, corrective retrieval, hallucination benchmarking, and fine-grained answer checking, offers useful language for separating retrieval, evidence checking, and confidence in the final answer (4) (5) (6) (7).
That layer should answer a narrow professional question: can this authority still support this proposition today, and if not, what later authority or limitation changes the answer?
In practice, that means evaluating more than citation identity. It means classifying treatment. For example:
- Current: the authority still supports the proposition as stated.
- Current with limitations: the authority is usable, but only with an important qualification.
- Historical only: the authority is relevant background or procedural history, not the current answer.
- No longer reliable for this point: the proposition was later rejected, reversed, or overtaken.
- Not fully verified: the system could not safely resolve the current-law status from the available evidence.
Those categories matter because lawyers need more than answers; they need a usable research posture. A reliable system should indicate whether to keep an authority, keep it with a warning, replace it, down-rank it, or enter a reservation.
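The five categories fit an enumeration, with the research posture stored as a lookup. The mapping below is one plausible assignment and not a fixed rule. A team might, for example, prefer to down-rank an authority that is no longer reliable rather than replace it.

```python
from enum import Enum


class Status(Enum):
    CURRENT = "current"
    CURRENT_WITH_LIMITATIONS = "current with limitations"
    HISTORICAL_ONLY = "historical only"
    NO_LONGER_RELIABLE = "no longer reliable for this point"
    NOT_FULLY_VERIFIED = "not fully verified"


# One possible status-to-posture mapping (an assumption, not a standard).
ACTION = {
    Status.CURRENT: "keep",
    Status.CURRENT_WITH_LIMITATIONS: "warn",
    Status.HISTORICAL_ONLY: "down-rank",
    Status.NO_LONGER_RELIABLE: "replace",
    Status.NOT_FULLY_VERIFIED: "reserve",
}


def posture(status: Status) -> str:
    """Return what the answer should do with an authority of this status."""
    return ACTION[status]


print(posture(Status.HISTORICAL_ONLY))  # down-rank
```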
What reliable legal AI should do differently
Authority-status validation is not one feature. It is a sequence of disciplined checks.
Start with official legal data
Official sources should remain the first tier. In Dutch research, that means using sources such as Rechtspraak Open Data (2) and official judgment pages. Official metadata can help confirm document type, procedural stage, court, date, and in some cases formal relationships between decisions (3).
For lawyers, the practical value is straightforward: the system should know whether it is looking at a final merits judgment, an interim step, or a procedural document before it starts speaking in definitive current-law language.
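One way to act on this is to classify the document before any answer is drafted. The sketch assumes the metadata has already been fetched and parsed into a dictionary. The keys and values are hypothetical placeholders, not the actual Rechtspraak Open Data schema.

```python
# Hypothetical, already-parsed metadata: the keys and values below are
# illustrative placeholders, not the actual Rechtspraak Open Data schema.
PROCEDURAL = {"interim", "preliminary_question", "referral", "procedural_order"}


def stage(meta: dict) -> str:
    """Classify what kind of document the system is looking at."""
    if meta.get("doc_type") == "ag_opinion":
        return "advisory"        # can inform the analysis, never decides it
    if meta.get("procedure") in PROCEDURAL:
        return "procedural"      # relevant history, not safe current support
    if meta.get("doc_type") == "judgment" and meta.get("final") is True:
        return "final_merits"
    return "unknown"             # missing or unclear metadata: stay cautious


def may_speak_definitively(meta: dict) -> bool:
    # A necessary, not sufficient, condition: a final merits judgment can
    # still be limited or overtaken later, so treatment checks still apply.
    return stage(meta) == "final_merits"


print(stage({"doc_type": "judgment", "procedure": "interim"}))  # procedural
print(may_speak_definitively({"doc_type": "judgment", "final": True}))  # True
```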
Track relationships between authorities
Legal research quality improves when the system can detect that one authority answers, limits, distinguishes, or supersedes another. A flat list of related cases is not enough. What matters is the legal relationship.
That is especially important in evolving doctrine. Without relationship awareness, AI can treat the history of a dispute as if every document has equal value. Lawyers know better. The system should too.
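The next sketch assumes relationships are stored as typed edges between authorities. The relation labels mirror the wording above. They are illustrative and are not taken from the Rechtspraak formal relationships value list. A later document that merely cites the authority is ignored, because being mentioned later does not show whether the authority still governs.

```python
from dataclasses import dataclass

# Relation labels follow the wording in the text; they are illustrative,
# not an official value list.
TREATMENT = {"answers", "limits", "distinguishes", "supersedes", "reverses"}


@dataclass(frozen=True)
class Relation:
    later: str      # the newer authority
    kind: str       # how it treats the earlier one
    earlier: str    # the authority being treated


def later_treatment(authority: str, relations: list) -> list:
    """Return (later authority, kind) pairs that change how `authority` may be used."""
    return [(r.later, r.kind) for r in relations
            if r.earlier == authority and r.kind in TREATMENT]


relations = [
    Relation("lower-court ruling B", "cites", "judgment A"),
    Relation("supreme court judgment C", "answers", "judgment A"),
]
print(later_treatment("judgment A", relations))
# [('supreme court judgment C', 'answers')]
```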
Use source-restricted web discovery for currentness
Official metadata is not always enough on its own. Sometimes the fastest way to discover a later controlling decision or authoritative explanation is targeted search across official and high-trust legal domains. The key is discipline: source-restricted, date-aware, and focused on current-law change, not generic web breadth.
In other words, the question is not "what pages mention this case?" It is "what official or high-trust sources show whether this authority still governs this issue today?"
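That discipline can be written as a filter over search results. The allowlists are illustrative assumptions. The rechtspraak.nl hosts exist, but a production list would need curating. `legal-updates.example` is a hypothetical placeholder for a high-trust commentary source, and the search step that produces the hits is not shown.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse

# Assumed allowlists for illustration; a real deployment would curate and
# maintain its own. "legal-updates.example" is a hypothetical placeholder.
OFFICIAL = {"uitspraken.rechtspraak.nl", "data.rechtspraak.nl"}
HIGH_TRUST = {"legal-updates.example"}


@dataclass
class Hit:
    url: str
    published: date


def currentness_evidence(hits: list, cited_decided: date) -> list:
    """Keep only allowlisted hits published after the cited authority."""
    kept = []
    for h in hits:
        host = urlparse(h.url).hostname or ""
        if host not in OFFICIAL | HIGH_TRUST:
            continue  # source restriction: no generic web breadth
        if h.published <= cited_decided:
            continue  # date awareness: older pages cannot show later change
        tier = "official" if host in OFFICIAL else "commentary"
        kept.append((tier, h.url))
    # Official sources first; commentary helps discovery but does not decide.
    return sorted(kept, key=lambda item: item[0] != "official")
```

The date check encodes the currentness question directly. A page published before the cited decision cannot show that the decision was later overtaken.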
Use the model as an auditor, not a free improviser
Large language models can help classify treatment when given curated evidence. They are much less reliable when asked to infer status from memory. The safer approach is evidence-bound reasoning: the model may assess the materials in front of it, but it should not invent a controlling authority that was never surfaced in the evidence pack.
That distinction matters for professional review. Lawyers need the path from conclusion back to evidence, not a black-box confidence score.
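Here is a minimal sketch of evidence-bound auditing. It assumes the model returns a structured answer that contains a status label and the controlling authorities it relied on. The output format and function name are hypothetical. The check itself is plain set membership against what was actually retrieved.

```python
def audit(model_output: dict, evidence_pack: set) -> dict:
    """Accept the model's treatment label only if every controlling authority
    it names was actually surfaced in the evidence pack."""
    claimed = model_output.get("controlling_authorities", [])
    grounded = [a for a in claimed if a in evidence_pack]
    invented = [a for a in claimed if a not in evidence_pack]
    status = model_output.get("status", "not fully verified")
    if invented:
        # The model reasoned beyond its evidence: downgrade rather than trust it.
        status = "not fully verified"
    return {"status": status, "grounded": grounded, "rejected": invented}


pack = {"judgment A", "supreme court judgment C"}
out = {"status": "no longer reliable", "controlling_authorities": ["judgment X"]}
print(audit(out, pack))
# {'status': 'not fully verified', 'grounded': [], 'rejected': ['judgment X']}
```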
Surface reservations instead of fake certainty
No legal AI system will resolve current-law status perfectly in every matter. That is not a reason to hide uncertainty. It is a reason to express it clearly. If current-law status cannot be resolved confidently, the right output is a reservation, not a polished definitive claim.
For lawyers, a well-labeled uncertainty is useful. A concealed uncertainty is dangerous.
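The fallback could look like the sketch below. It assumes the validation layer produces a status label and a numeric confidence. The threshold value and the reservation wording are assumptions that would need calibration and professional review.

```python
RESERVATION_THRESHOLD = 0.75  # assumed cut-off; would need calibration in practice


def render(proposition: str, authority: str, status: str, confidence: float) -> str:
    """Turn a status finding into user-facing text, preferring a reservation
    over a definitive claim whenever the status is unresolved or weak."""
    if status == "not fully verified" or confidence < RESERVATION_THRESHOLD:
        return (f"Reservation: the current-law status of {authority} for "
                f"'{proposition}' could not be confirmed. Verify before relying on it.")
    return f"{authority}: {status} for '{proposition}'."
```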
Why this matters more than ordinary hallucination talk
Much of the public discussion about AI reliability focuses on hallucinations: fake cases, fake quotes, fake statutory language. Those are real problems. Recent legal AI reliability research underlines why source-grounded systems still need careful evaluation (8). But for legal practice, stale-authority errors may be just as important because they survive systems that already look "source grounded."
A lawyer reviewing a fabricated case may catch it quickly. A lawyer reviewing a real but overtaken case may not notice the problem until much later, especially under time pressure. That is why good-law validation deserves its own place in legal AI evaluation.
What lawyers should ask AI vendors and internal teams
- Can the system distinguish source verification from current-law validation?
- How does it detect whether a case has been reversed, limited, or superseded?
- Does it treat interim and procedural authorities differently from final controlling judgments?
- What official sources does it use to confirm treatment and case relationships?
- Can it show why an authority is labeled current, limited, historical, or uncertain?
- What happens when it cannot safely resolve current-law status?
- Are user-facing warnings clear enough that a lawyer can change strategy or escalate review?
Those questions are often more important than model branding. A faster or larger model does not solve the professional problem if the workflow still confuses real citations with reliable current authority.
The practical standard for legal AI trust
Lawyers need more than an AI system that merely sounds current. They need one that can show why an authority is still usable, where the legal position changed, and when the answer should be treated as qualified or unresolved.
That is the real shift from generic AI output to professional legal research support. The system should not only retrieve and summarize. It should test whether the authority still carries the proposition.
The next trust boundary in legal AI is not whether the citation is real. It is whether the authority remains reliable for the proposition asserted today.
Resources and further reading
- (1) HR 15 November 2024, ECLI:NL:HR:2024:1661 (Didam II)
- (2) Rechtspraak Open Data documentation
- (3) Rechtspraak formal relationships value list
- (4) Asai et al. (2023), Self-RAG
- (5) Yan et al. (2024), Corrective RAG
- (6) Wu et al. (2024), RAGTruth
- (7) Ru et al. (2024), RAGChecker
- (8) Stanford Law publication page on legal AI reliability research
Conclusion
Legal AI will become much more useful to lawyers when it stops treating authority retrieval as the end of the reliability problem. The harder and more valuable step is authority-status validation: determining whether the cited material is still current, still controlling, and still safe for the exact proposition at issue.
That is how legal AI moves from plausible research assistance to something lawyers can verify, challenge, and trust under professional standards.