The "Trustworthy AI" Trap

The “Trustworthy AI” Trap
There’s an interesting consensus forming across the usual antagonists — Silicon Valley and Brussels, libertarians and regulators, the “move fast and break things” crowd and the “precautionary principle” devotees. They’ve all converged on the same altar: Trustworthy AI.
The catechism goes something like this: we need guardrails, safety layers, alignment research, red teams, and constitutional frameworks to ensure these models never hallucinate, never exhibit bias, never lead the innocent user astray. An entire industry has emerged around the premise that if we just engineer AI correctly, we can make it safe for everyone.
I think that, with the best of intentions, we’re paving the road to hell.
Trustworthiness Lives in the User, Not the Model
My counterpoint, stated plainly, is: Trustworthiness isn’t a feature of software. It’s a competency of the user.
Hand a scalpel to a toddler — it’s a weapon. Hand it to a surgeon — it saves lives. The object hasn’t changed. The operator has.
When we ask “Is this AI trustworthy?”, we’re asking the wrong question. The more useful one is: Is this user equipped to use this AI appropriately?
A large language model is, fundamentally, a sophisticated pattern-completion engine. Its outputs are probabilistic, context-dependent, and reflect the biases and limitations of its training. This isn’t a bug to be fixed—it’s the nature of the technology. A skilled user who understands this will verify claims, cross-reference outputs, and treat AI as a tool rather than an oracle. An unskilled user will accept hallucinations as gospel. Same model. Radically different outcomes.
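To make that concrete, here is a toy sketch of what “pattern completion” actually means. The vocabulary and the probabilities are invented for illustration; a real model derives its distribution from billions of parameters, but the shape of the operation is the same: weigh the options, pick one.
```python
import random

# Toy illustration: a language model assigns probabilities to possible next
# tokens and samples from that distribution. The numbers below are invented
# for illustration, not taken from any real model.
next_token_probs = {
    "Paris": 0.62,      # the most likely completion
    "Lyon": 0.21,
    "Marseille": 0.09,
    "Berlin": 0.08,     # plausible-looking but wrong completions still get probability mass
}

def sample_next_token(probs: dict[str, float]) -> str:
    """Pick one token at random, weighted by its assigned probability."""
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

prompt = "The capital of France is"
print(prompt, sample_next_token(next_token_probs))
# Most runs print "Paris"; some runs print something else. On those runs the
# model did exactly what it always does: it sampled a likely continuation.
# Whether that continuation is true is a separate question entirely.
```
Run it a few times and the answer changes. Nothing malfunctioned; sampling is the mechanism, not a failure mode.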
The industry response has been to dull the blade so the toddler won’t cut themselves. But nobody’s asking the obvious question: why are we designing for toddlers?
Education Over Sanitisation
The mainstream approach says: Users can’t be expected to understand AI, so we must make AI foolproof.
But what if this gets the cost-benefit analysis backwards?
We’re pouring billions into engineering “truth” into probabilistic systems. This is (and I don’t use the term lightly) a category error. LLMs are prediction engines, not truth engines. They model what comes next in a sequence, not what corresponds to reality. Asking them to be 100% factually accurate is like asking a jazz musician to only play notes that Beethoven would have approved of. You can try. You’ll just get worse jazz.
The alternative is less glamorous but more honest: invest in AI literacy. This means teaching users that token prediction is not retrieval from a database of verified facts but a synthetic interpolation across a vast cultural and informational space. Verification must always remain the job of the biological intelligence in the room.
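A miniature of that lesson, with two deliberately toy “models” standing in for a database of verified facts and a generative system. None of this is any real vendor’s API; it only exists to make the contrast visible.
```python
# The distinction in miniature. A database of verified facts refuses to answer
# when it doesn't know; a generative model never refuses in that way, it
# interpolates something plausible instead. Both "models" here are toys.

verified_facts = {"capital_of_france": "Paris"}

def database_lookup(key: str) -> str:
    """Retrieval: returns a verified fact or fails loudly with KeyError."""
    return verified_facts[key]

def generative_guess(key: str) -> str:
    """Interpolation: always completes the pattern, whether or not a fact exists."""
    return key.replace("_", " ").title() + "ville"

print(database_lookup("capital_of_france"))    # Paris
print(generative_guess("capital_of_narnia"))   # Capital Of Narniaville: fluent, confident, false
# database_lookup("capital_of_narnia") would raise KeyError: silence, not a guess.
```
The lookup fails loudly when the fact isn’t there. The generator never does; it completes the pattern regardless. That asymmetry is the entire curriculum.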
This creates durable, transferable skills. It produces people who can navigate not just today’s ChatGPT, but tomorrow’s unknown systems. Engineering increasingly elaborate safety theatre, on the other hand, creates false security while the underlying uncertainty remains. Users learn to trust the guardrails instead of developing judgement.
When we outsource truth-seeking entirely to the machine, we’re not being careful. We’re abdicating.
The Paradox of Foolproof Systems
Here’s where it gets uncomfortable.
The most dangerous outcome isn’t that we fail to make AI trustworthy. It’s that we succeed.
Consider GPS. It’s accurate enough (99.9% of the time) that we’ve collectively forgotten how to read maps, navigate by landmarks, or maintain any internal model of geography. We follow the blue line with the docility of cattle. Occasionally someone drives into a lake because the machine said to. The error rate is low, but when errors occur, we’re helpless.
Scale this up. If AI becomes genuinely reliable — if it earns our complete trust — we will stop checking its work. We’ll stop thinking. Ivan Illich wrote about this decades ago in Tools for Conviviality: past a certain threshold, our tools don’t serve us, they disable us.
A 90% accurate AI used by a discerning population is far safer than a 99.9% accurate AI used by a population that has forgotten how to think critically. That margin of doubt keeps us alert, forces us to triangulate, maintains our role as the final arbiter of what’s true. The errors from the “safer” system will be accepted uncritically, at scale, by billions.
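Here is the back-of-the-envelope arithmetic behind that claim. Every number is an assumption chosen to make the mechanism visible, not a measurement.
```python
# Back-of-the-envelope version of the claim above. Every number is an
# assumption chosen for illustration, not a measurement.

model_error_rate_sceptical = 0.10   # the "90% accurate" AI
model_error_rate_trusted = 0.001    # the "99.9% accurate" AI

catch_rate_sceptical = 0.995        # discerning users who verify catch most errors
catch_rate_trusted = 0.0            # users who have stopped checking catch none

unchecked_sceptical = model_error_rate_sceptical * (1 - catch_rate_sceptical)
unchecked_trusted = model_error_rate_trusted * (1 - catch_rate_trusted)

print(f"Errors that reach decisions, sceptical users: {unchecked_sceptical:.3%}")  # 0.050%
print(f"Errors that reach decisions, trusting users:  {unchecked_trusted:.3%}")    # 0.100%

# The crossover: the "worse" model comes out ahead whenever verification
# catches more than 99% of its mistakes (0.10 * (1 - 0.99) = 0.001), which is
# exactly the habit that margin of doubt is meant to preserve.
```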
Every layer of protection that removes the need for human judgement is a layer of human capability atrophying. Spell-checkers weakened spelling, and calculators changed our relationship with mental arithmetic. These were acceptable trade-offs. But critical thinking? The ability to evaluate sources? Epistemic humility? These aren’t skills we can afford to outsource.
The Real Ask
So here’s the uncomfortable conclusion I keep arriving at:
We don’t need AI that earns our trust. We need AI that demands our attention—that keeps us in the loop, sceptical, engaged. The goal shouldn’t be a machine so reliable we can sleepwalk through life. It should be a tool sharp enough to be useful and uncertain enough to keep us awake.
The “Trustworthy AI” project, taken to its logical endpoint, is a project to make human judgement obsolete. Our current framing of AI as a product to be perfected, rather than a tool requiring skill, may be creating the very fragility we’re trying to prevent.
Perhaps the most trustworthy AI future isn’t one where we’ve finally built the foolproof system. It’s one where we never needed it to be.