
The Referee Is on the Payroll

  • Writer: TJ Ashcraft
  • Mar 7
  • 8 min read

Updated: Mar 12



PART II OF II  //  VIBES VS. VERIFICATION

You Are Using AI to Fact-Check AI. That Is Not Neutral Ground.


Last time, we established the problem: critical thinking has become a content format, and the performance has gotten good enough that most audiences cannot distinguish it from the real thing. We ended with a challenge — stop performing, start checking.

Here is the uncomfortable follow-up.


Checking with what?


If you read something generated by an AI and your instinct is to open a different AI to verify it, you have not left the problem. You have moved one level up inside the same problem. The referee is on the payroll. And most people using these tools right now do not know that, or have not thought about what it means.


This post is about that. Not to argue that AI is uniformly unreliable — it is not — but to argue that using one AI platform to critique another requires a specific kind of literacy that almost nobody is teaching and almost nobody has. And that the absence of that literacy is being quietly exploited by the very tools designed to help you think.

 

The Problem With the Mirror

All large language models share a common origin: they are trained on vast corpora of text scraped from the internet, a growing share of which was itself generated by, shaped by, or laundered through earlier AI systems. The training data is not cleanly separated by source. The model you are using to check a claim has almost certainly been trained on content produced by models similar to the one that generated the original claim.


This is not a conspiracy. It is an architecture. And it has a predictable consequence: AI systems tend to agree with each other more than they agree with external reality, because they are drawing from overlapping pools of information processed through similar statistical methods. When you use one AI to check another, you are often measuring consistency within a shared framework — not accuracy against the world.


Consistency is not the same thing as truth. It is possible for two systems to be consistently wrong in the same direction.


There is a second problem, which is commercial. The major AI platforms are competitors. But they are also all operating within the same basic epistemic economy: the more you trust the output, the more you use the tool, the more value accrues to the platform. None of them have a financial incentive to tell you that their output should be treated skeptically, cross-referenced with non-AI sources, or held lightly pending verification. The product is your confidence. Undermining your confidence is bad for the product.


This does not make the tools malicious. It makes them structurally inclined toward a particular kind of error — the error of overconfidence — in a way that their design does nothing to correct and their business model actively encourages.

 

The Laundering Effect

When AI-generated content circulates on the internet — in social posts, in articles, in forum threads, in comments — it gets ingested back into the training data of subsequent models. This creates a feedback loop with a very specific character: confident, fluent, plausible-sounding text gets rewarded with attention and shares, which means it gets indexed, which means it gets trained on, which means the next generation of models produces more confident, fluent, plausible-sounding text, regardless of its accuracy.


This is information laundering at scale. The original uncertainty — the hedges, the caveats, the genuine epistemic humility that responsible sources include — tends to get stripped away as content circulates and gets compressed into shareable form. What remains is the confident assertion. What gets trained on is the confident assertion. What gets produced in response to your query is the confident assertion.


The screenshot from last week's post — seven prompts that 'actually unlock better results' — is a small example of a large pattern. A numbered list formatted to look like insider knowledge, worded to feel authoritative, designed to be shared. It almost certainly informed similar lists. Those lists almost certainly shaped how AI models describe effective prompting. The original dubious claim has now been laundered through enough iterations that it appears, in aggregate, to be conventional wisdom. Because it is conventional wisdom — in the dataset.


When you ask an AI to evaluate that claim, you are asking a system trained on its descendants to assess whether it is credible. The answer will almost always be some version of yes.

 

But — And This Matters — AI Can Be Used Well

Here is where the post has to hold the tension rather than resolve it, because the honest answer is not that AI verification is useless. It is that the way most people are currently doing it is useless, while a more disciplined approach can be genuinely valuable.


The distinction is this: AI is good at identifying the structure of a claim and bad at verifying its substance against external reality. It can tell you whether an argument is internally consistent, whether its logic holds, whether it is missing an obvious counterargument, whether its framing is rhetorically loaded. It cannot reliably tell you whether the underlying facts are true, because it has no access to the world — only to text about the world, text that it has processed through methods that systematically favor confident assertions.


Used as a logic and rhetoric auditor, AI is a powerful tool. Used as a fact-checker, it is a confidence machine wearing a fact-checker's badge.


The people who use it well know the difference. They use AI to pressure-test an argument's structure while going elsewhere — to primary sources, to domain experts, to non-AI fact-checking infrastructure — to verify the substance. They treat AI output as a first draft of thinking, not a final verdict. They are aware that fluency is not accuracy and that the absence of a disclaimer is not the presence of a citation.


Most people are not doing this. Most people are doing something much faster and much less reliable: asking one AI if another AI was right, getting a confident answer, and moving on.

 

The Specific Danger of the Cross-Check

There is a particular failure mode worth naming explicitly because it is so common and so underexamined.


You read something — a social post, an article, a recommendation — that was generated by or heavily shaped by AI. It sounds authoritative. You are uncertain. You open a different AI platform and ask: 'Is this accurate?' The system responds with a measured, well-organized answer that largely confirms the original claim while noting a few minor caveats. You feel more confident. You share it.


What has happened here is not verification. What has happened is that two systems drawing from overlapping training data have produced consistent outputs, and your brain has interpreted consistency as evidence of accuracy. This is a cognitive error the tools are designed — not maliciously, but structurally — to produce. Confident output generates trust. Trust generates use. Use generates data. Data generates more confident output.


The caveats the system added made it feel more credible, not less. That is the tell. Genuine uncertainty sounds different from performed uncertainty. A system that genuinely does not know says 'I don't know' or 'I can't verify this' and directs you to a source that can. A system performing epistemic humility adds a hedge and then proceeds to give you exactly the confident answer you were looking for. If the caveat does not change the conclusion — if it is decoration rather than substance — it is not a real caveat. It is credibility costuming.


The caveat that changes nothing is not humility. It is the performance of humility, wearing humility's clothes to get past your defenses.

 

A Framework for Using AI to Critique AI

None of this means stop using the tools. It means use them with a clear-eyed understanding of what they are good at, what they are structurally inclined to get wrong, and what you need to do yourself that no AI can do for you.


Use AI for structure and rhetoric. Go elsewhere for substance. A rough sketch of what that division of labor looks like in practice follows the list.

1.  Ask for the argument's skeleton.  Strip away the fluency and ask the AI to identify the core claims being made, the evidence cited for each, and any logical steps that are assumed rather than stated. Fluent writing conceals weak arguments. The skeleton reveals them.

2.  Ask what is missing, not whether it is right.  The question 'Is this accurate?' generates confidence. The question 'What evidence would challenge this?' generates useful friction. Ask for the strongest counterargument. Ask what a skeptic would say. Ask what the piece does not mention. A system that cannot produce a credible counterargument is not reasoning — it is confirming.

3.  Identify every factual claim and verify each one outside the AI ecosystem.  Not all claims. The specific factual ones — the statistics, the attributions, the historical assertions. These are the ones AI gets wrong in ways that sound completely right. Take each one to a primary source, a peer-reviewed database, or a non-AI fact-checking outlet. This step cannot be skipped. It is the step.

4.  Listen to the hedges.  A system that says 'I cannot verify this' or 'my training data may not reflect current information on this topic' is being more useful to you than one that gives you a clean answer. Treat the hedge as important data, not as noise to skip past. If a system never hedges on the topic you are researching, that is a red flag, not a green one.

5.  Notice when the tool agrees with you.  Confirmation is the most dangerous AI output. If the system tells you what you already believed, with better-sounding evidence than you had before, slow down. You have not been verified. You have been echoed. The echo feels like proof. It is not.

6.  Ask the same question of a different kind of source.  What does a domain expert — a human one, accountable to a field, a peer review process, an institution — say about this? What does the relevant primary research say? If the AI's answer and the primary source agree, you have something. If they diverge, you have a more important conversation to have.
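
For those who want the division of labor spelled out, here is a rough sketch of it in code. It is illustrative, not a recipe: the function name, the prompt wording, and the idea of passing in your own ask_model call are assumptions made for the sake of the example, not any platform's actual API. The point is the routing. The model gets the structural questions. The factual claims leave the loop.

# A minimal sketch of the split described above: the model is asked only
# structural questions; factual claims are routed out of the AI loop for
# verification by a person. The function name, prompt wording, and the
# ask_model callable are all assumptions for illustration, not a real API.

from typing import Callable, Dict, List


def audit_argument(text: str, ask_model: Callable[[str], str]) -> Dict[str, object]:
    """Use an AI as a logic and rhetoric auditor, never as a fact-checker."""

    # 1. Ask for the argument's skeleton (structure, not truth).
    skeleton = ask_model(
        "List the core claims in this text, the evidence offered for each, "
        "and any logical steps that are assumed rather than stated:\n\n" + text
    )

    # 2. Ask what is missing, not whether it is right.
    counterargument = ask_model(
        "What is the strongest counterargument to this text? What evidence "
        "would challenge it? What does it fail to mention?\n\n" + text
    )

    # 3. Have the model *list* the factual claims, but never confirm them.
    factual_claims: List[str] = [
        line.strip()
        for line in ask_model(
            "List only the specific factual claims in this text, such as "
            "statistics, attributions, and historical assertions, "
            "one per line:\n\n" + text
        ).splitlines()
        if line.strip()
    ]

    return {
        "skeleton": skeleton,                # the AI is useful here
        "counterargument": counterargument,  # and here
        "verify_outside_ai": factual_claims  # these go to primary sources,
        # domain experts, or non-AI fact-checking infrastructure, never
        # back to another model for a verdict.
    }

The design choice that matters is the last part: the list of factual claims is an output of the process, something you take to primary sources yourself, not something the model gets to sign off on.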

 

The Terrain Lens — Applied

When you use AI to evaluate AI-generated content, apply the lens to both the original output and the critique.

1.  The First Feeling:  Does the AI critique make you feel more confident, or does it make you feel like you now understand the problem better? More confidence is a warning sign. Better understanding is the goal.

2.  The Credibility Costume:  What cues is the AI using to perform trustworthiness? Organized structure, measured tone, a caveat before a confident conclusion — these are aesthetic choices, not evidence of accuracy.

3.  The Missing Context:  What sources did neither the original nor the critique cite? What expertise is being claimed without being grounded in a verifiable, accountable source?

4.  The Incentive Structure:  Both AI platforms have a financial interest in your confidence and continued use. Neither has a financial interest in telling you the output is unreliable. Weight that accordingly.

5.  The Verification Path:  What would it take for the AI to tell you it is wrong? If you cannot construct a scenario where the system says 'I don't know' or 'this is outside what I can verify,' the system is not reasoning under uncertainty. It is managing your confidence.

6.  The Anti-Shortcut Check:  Using AI to check AI is the shortcut. It feels like due diligence. It is often the fastest route to a laundered, double-confirmed wrong answer. The actual check requires going outside the AI ecosystem entirely.

7.  The Share Test:  If you are sharing this because two AI systems agreed with each other, you are not sharing verification. You are sharing a coincidence dressed as consensus.

 

The question this series has been circling is not whether AI is useful. It is. The question is whether the way most people are using it — including the way it is being positioned as a solution to its own problems — is producing the critical thinking it appears to produce.


And the honest answer, right now, is no.


You are operating in an information environment where the tool that creates content and the tool that critiques content are increasingly the same tool, built on overlapping architectures, trained on overlapping data, optimized for the same outcome: your confidence. That confidence is the product being sold. Your job is to be the one thing in that loop that the product cannot fully account for — a person who actually checks.


That is harder than it sounds. It requires going outside the loop entirely, which is inconvenient, which is exactly why most people do not do it. The feed rewards the shortcut. The shortcut feels like work. Feeling like you did the work feels like being done.


You are not done. You are back at the beginning of the only question that matters: how do you know?
