Headline: Colombia’s Supreme Court Rejects Appeal for Being “AI‑Written” — Then Its Own Ruling Gets Flagged by the Same Detector
The Supreme Court of Colombia recently tossed a cassation appeal after concluding the filing was generated by artificial intelligence. The court relied on a commercial detector called Winston AI, which reportedly found the brief contained only 7% human-written content — i.e., roughly 93% AI. After running other detectors with similar results, the court declared the filing inadmissible.
The move was intended to crack down on the misuse of generative AI in legal filings. Instead, it set off an immediate backlash when lawyers ran the court’s own ruling (Auto AP760-2026) through the same detection tools and got similarly AI‑heavy scores.
What happened
- Attorney Emmanuel Alessio Velásquez posted on X on March 3, 2026 that Winston AI flagged the court’s text as 93% AI-generated. His tweet went viral among Colombian lawyers.
- Other lawyers and observers ran their own tests. Results were inconsistent: GPTZero returned 100% AI when scanning only the opening words of the court text, but flipped to 100% human on a longer version including background facts.
- Criminal defense lawyer Andres F. Arango G. submitted a 2019 court filing — predating the large language models those detectors claim to spot — and Winston reported 95% AI. An undergraduate thesis from 2020 was flagged as 100% AI.
- Attorney Darío Cabrera Montealegre and others highlighted the irony: the court used AI tools to decide whether a text used AI.
Why these tools are unreliable
- Most detectors use statistical heuristics — sentence length, vocabulary predictability and “burstiness” (variation in rhythm) — to estimate how likely text is machine-generated.
- Many legitimate texts mimic those patterns: formal legal prose, academic writing, and second‑language writers often look “machine‑like” to these algorithms.
- Research supports these limits: a 2023 study in the journal Patterns found that over 61% of TOEFL essays by non‑native English speakers were misclassified as AI-generated. A systematic review by Weber‑Wulff in 2023 concluded that the available detectors are neither precise nor reliable.
- Industry experience shows the same problem. Turnitin acknowledged higher false-positive rates when the share of actual AI content in a document is low, and OpenAI pulled its own detector after accuracy issues. Universities including Vanderbilt and the University of Arizona disabled or scaled back detection features after false positives hurt students: Vanderbilt estimated about 3,000 false positives a year, and a student at Arizona lost 20% of a grade after a false hit. In a 2024 case at UC Davis, most of the students flagged were non‑native speakers.
- There’s also a commercial conflict: some detectors push paid “humanize” editing services after flagging text, creating a perverse incentive.
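To make the heuristics above concrete, here is a minimal Python sketch of two such signals: sentence-length "burstiness" and vocabulary variety. It is an illustration only. Winston AI and GPTZero are proprietary, and nothing here is claimed to be their method; the function names and the specific formulas (coefficient of variation, type-token ratio) are assumptions chosen for simplicity.

```python
import re
import statistics


def sentence_lengths(text):
    """Word count per sentence, using a crude split on ., ! or ? plus whitespace."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness(text):
    """Coefficient of variation of sentence lengths (hypothetical metric).

    Values near 0 mean very uniform sentences, which this kind of
    heuristic tends to read as "machine-like".
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    mean = statistics.mean(lengths)
    return statistics.pstdev(lengths) / mean if mean else 0.0


def type_token_ratio(text):
    """Share of distinct words among all words: a rough proxy for vocabulary variety."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return len(set(words)) / len(words) if words else 0.0


# Formulaic prose with evenly sized sentences versus prose with mixed rhythm.
formulaic = "The court finds this. The court finds that. The court finds more."
varied = "No. The appellant filed a long and detailed brief on the matter."

print(burstiness(formulaic), type_token_ratio(formulaic))
print(burstiness(varied), type_token_ratio(varied))
```

The formulaic sample scores lower on both measures even though a person wrote it, which is the failure mode described above: repetitive, evenly paced legal language can resemble what these heuristics expect from a machine.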
Legal and policy context in Colombia
- Colombia is already moving to regulate judicial use of AI. In December 2024 the judicial branch adopted guidelines (agreement PCSJA24-12243) restricting AI use: administrative tasks and summaries are allowed, but sensitive uses (legal reasoning, evidence evaluation, judicial decision-making) require careful human review and disclosure when AI was used.
- The Supreme Court’s recent decisions are among the region’s first court tests of policy and practice around generative AI: AC739-2026, a February Civil Chamber ruling that fined a lawyer for inventing 10 AI‑generated precedents, and AP760-2026, the cassation matter.
- The court has not publicly responded to the outcry about its choice of detection tools.
Why crypto and tech communities should care
- The episode reads like a cautionary tale for any high‑stakes domain that trusts opaque, proprietary AI tools: when tools are black boxes, false positives can cost people access to justice, careers, or reputations.
- Crypto communities often push for transparency, auditable code and decentralized verification precisely to avoid single points of failure and hidden incentives — the same principles apply to AI‑detection regimes used in courts.
- If detection tools are to carry legal weight, they must be rigorously validated, transparent about methods, and subject to independent audit — otherwise we risk automating errors into binding decisions.
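One simple form of the validation called for above is to run a detector on documents known to be human-written, such as filings that predate large language models, and measure how often it flags them. The Python sketch below assumes that setup. The `flags` list is made-up illustrative data, not results from any real detector, and the function names are hypothetical. It reports the observed false-positive rate with a Wilson score interval, since a small test set leaves wide uncertainty.

```python
import math


def false_positive_rate(flags):
    """Fraction of known-human documents that the detector labeled as AI.

    flags: list of booleans, True when a known-human document was flagged.
    """
    if not flags:
        raise ValueError("need at least one document")
    return sum(flags) / len(flags)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical audit: 20 pre-LLM filings, 6 of them flagged as AI.
flags = [True] * 6 + [False] * 14
rate = false_positive_rate(flags)
low, high = wilson_interval(sum(flags), len(flags))
print(f"false-positive rate: {rate:.0%} (95% interval {low:.0%} to {high:.0%})")
```

An audit like this does not show that a detector is reliable, but a high or highly uncertain false-positive rate on known-human text is a strong reason not to give its scores legal weight.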
Takeaway
Colombia’s attempt to police AI in court filings exposed a structural problem: imperfect detectors, commercial incentives, and opaque algorithms can produce outcomes as problematic as the misuse they’re meant to prevent. Until detectors become demonstrably reliable and auditable, courts should be wary of letting them decide admissibility or take away litigants’ rights — and should stick to the judicial branch’s own rule: humans remain responsible for rulings. (As an aside that caught social media attention: the disputed ruling didn’t even contain em dashes.)