March 04, 2026 ChainGPT

AI Detector Paradox: Colombian Supreme Court's Tool Flags Its Own Ruling

A Colombian Supreme Court decision meant to police AI in legal filings has exposed a messy paradox: the same AI detector the court relied on to throw out an appeal also flags the court’s own ruling as AI-generated.

What the court did
- The Supreme Court denied a cassation appeal after running the filing through Winston AI. The tool reportedly found the brief contained only 7% human content, leading the court to conclude it was “produced using artificial intelligence” and therefore inadmissible.
- The court said it corroborated the Winston result with other detectors before dismissing the filing.

What happened next
- Legal observers quickly ran the court’s ruling (Auto AP760/2026) through the same Winston AI tool. Attorney Emmanuel Alessio Velásquez posted that Winston returned a 93% AI-generated score for the Supreme Court’s decision.
- Other tests produced wildly inconsistent results. GPTZero flagged the opening lines as 100% AI, but when fed a longer section that included the factual background it returned 100% human. Lawyers also fed older documents (including a 2019 court filing and a 2020 thesis written before current large language models existed) into detectors and saw scores of 95–100% AI.

Why this matters
- These inconsistencies highlight serious reliability problems with current “AI detectors.” The tools analyze statistical patterns (sentence length, predictability of vocabulary, and “burstiness,” the natural variation in rhythm), and those patterns can overlap between AI outputs and formal, non-native, or carefully edited human writing. That means lawyers, academics, and second-language writers are especially prone to false positives.
- There are commercial conflicts, too: some detection services invite users to “humanize” flagged text via paid remediation, raising questions about incentives when a detector’s verdict can cut off someone’s access to justice.

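To make the “burstiness” signal concrete, here is a minimal Python sketch that measures how much sentence length varies within a text. It is only an illustration under stated assumptions: the function names `sentence_lengths` and `burstiness` are hypothetical, the sentence splitter is deliberately naive, and nothing here reflects how Winston AI, GPTZero, or any other commercial detector actually works, since their methods are not described in the reporting above.

```python
import re
from statistics import mean, pstdev


def sentence_lengths(text: str) -> list[int]:
    """Return the word count of each sentence (hypothetical helper).

    Sentences are split naively on '.', '!' or '?' followed by whitespace,
    which is an assumption that breaks on abbreviations and citations.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length (std dev / mean).

    Low values mean very uniform sentences; higher values mean a more
    varied rhythm. This is one simple proxy, not a detector's real score.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    avg = mean(lengths)
    return pstdev(lengths) / avg if avg else 0.0


uniform = "One two three. Four five six. Seven eight nine."
varied = "Stop. The court denied the appeal after running the filing through a detector. Why?"

print(burstiness(uniform))  # uniform sentence lengths give a score of zero
print(burstiness(varied))   # mixed short and long sentences give a higher score
```

Formal legal drafting tends toward long, evenly structured sentences, so a proxy like this could plausibly score a carefully written brief as "uniform" for reasons that have nothing to do with AI, which is the overlap the critics describe.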
Evidence of wider failure
- Research and institutional experiences back up the concern: a 2023 Patterns study found over 61% of TOEFL essays by non-native speakers were wrongly flagged as AI. A systematic review by Weber-Wulff concluded available tools lack precision and reliability. Turnitin admitted in June 2023 to higher false positives at low AI levels; OpenAI pulled its own AI detection tool after persistent inaccuracies.
- Universities have already felt the fallout: Vanderbilt disabled Turnitin’s AI detector in 2023 to avoid thousands of false positives; the University of Arizona dropped detection features after a student’s grade was affected by a false positive; UC Davis flagged dozens of linguistics students, most of them non-native speakers.

Legal and regulatory context in Colombia
- The two rulings, AC739-2026 (a February decision fining a lawyer for citing nonexistent “AI-generated” precedents) and AP760-2026, are among the first in the region to tackle alleged misuse of generative AI in court documents.
- Colombia’s judiciary adopted guidelines in December 2024 (agreement PCSJA24-12243) allowing AI for administrative support but restricting its use for evidence evaluation, legal interpretation, and judicial decision-making. The rules require human review and disclosure when AI tools are used, language that critics say could be invoked to challenge the Supreme Court’s approach here.

Reaction and implications
- Lawyers called out the irony of using opaque automated systems to police AI. “If the very ruling that condemns the use of artificial intelligence scores that percentage, the methodological fragility of using these detectors as argumentative support becomes self-evident,” Velásquez wrote.
- Critics warn that unchecked reliance on black-box detectors risks undermining access to justice, penalizing the most careful and formal writers, and embedding private vendors’ incentives into public procedures.

Bottom line
The episode exposes a core tension at the intersection of technology and law: judges are right to be wary of AI misuse, but the tools currently used to detect that misuse are error-prone and commercially driven. Until detectors become demonstrably reliable and transparent, their use as decisive evidence in court risks miscarriages of justice, ironically driven by the very technology they aim to police. The Supreme Court has not offered further comment. And yes, the ruling in question didn’t contain any em dashes either.