
Technical FAQ

Why general-purpose LLMs miss logical fallacies, and how we're training models that catch them

The Problem: Why LLMs Miss Logic Gaps

Most large language models are trained on massive amounts of text from the internet: everything from Wikipedia articles to Reddit threads to news sites. This gives them an impressive ability to understand language and generate coherent responses. But when it comes to critical reasoning, they often fall short.

Here's the thing: LLMs learn patterns from their training data. If an article uses sophisticated language and cites studies, the model might classify it as "well-reasoned" simply because it looks like high-quality content. The model has learned to recognize the style of good reasoning; it doesn't necessarily recognize the structure of valid logic.

Think of it like this: An LLM might see "Studies show that coffee drinkers live longer" and think it sounds reasonable. That's the kind of claim that appears in legitimate news articles. But it won't automatically check: Did the study account for confounding variables? Could there be reverse causation? Are we seeing the full picture or just survivors?
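
Those unchecked questions matter. Here's a toy simulation (all numbers invented for illustration) showing how a hidden confounder like income can manufacture a coffee-health correlation out of nothing:

```python
import random

random.seed(0)

# Toy model: income drives BOTH coffee drinking and health.
# Coffee itself has zero causal effect in this simulation.
population = []
for _ in range(100_000):
    high_income = random.random() < 0.5
    # Higher-income people are more likely to drink coffee...
    drinks_coffee = random.random() < (0.8 if high_income else 0.3)
    # ...and more likely to be healthy, regardless of coffee.
    healthy = random.random() < (0.7 if high_income else 0.4)
    population.append((drinks_coffee, healthy, high_income))

def healthy_rate(rows):
    return sum(h for _, h, _ in rows) / len(rows)

coffee = [p for p in population if p[0]]
no_coffee = [p for p in population if not p[0]]
print(f"Healthy among coffee drinkers:     {healthy_rate(coffee):.2f}")
print(f"Healthy among non-coffee drinkers: {healthy_rate(no_coffee):.2f}")

# Condition on the confounder and the "coffee effect" vanishes:
for income in (True, False):
    c = [p for p in population if p[0] and p[2] == income]
    n = [p for p in population if not p[0] and p[2] == income]
    print(f"high_income={income}: coffee {healthy_rate(c):.2f} "
          f"vs none {healthy_rate(n):.2f}")
```

The raw comparison shows coffee drinkers noticeably healthier; within each income group, the gap disappears. That's the pattern a surface-level reading never checks for.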

This isn't really a bug. It's a fundamental limitation of how these models work. They're pattern-matching engines, not logical reasoning systems. They predict "what would a human say next?" They don't ask "does this argument actually make sense?"

What Kinds of Logic Gaps Get Missed?

General-purpose LLMs tend to miss logical fallacies that require thinking beyond the surface-level meaning of the text. Here are the most common categories:

  • Hidden Confounders
    When a study claims "X causes Y" but doesn't account for a third factor Z that might explain both. For example: "Coffee drinkers are healthier" might actually mean "People who can afford coffee and have time for it are healthier." It's not the coffee; it's the socioeconomic status.
  • Survivorship Bias
    Only looking at the winners. "All successful startups had these traits" ignores the thousands of failed startups that also had those traits. The model sees the pattern in the successful cases; it doesn't think to check the failures.
  • Base Rate Neglect
    Focusing on percentages within a group without considering how common that group is overall. "87% of unicorn startups have strong teams" sounds impressive, until you realize that 90% of all startups have strong teams, so the trait isn't actually predictive.
  • Reverse Causation
    Assuming A causes B when actually B causes A, or when they're correlated due to some other factor. "People who exercise regularly sleep better" might actually mean "People who sleep well have energy to exercise." The correlation alone can't tell you that exercise improves sleep.
  • Narrative Fallacy
    Constructing a story after the fact that makes success seem inevitable. "They succeeded because of their playbook" ignores the role of luck, timing, and all the paths that didn't work. LLMs are especially good at recognizing narratives; they're not good at questioning whether those narratives are retroactively constructed.

These aren't mistakes that show up in grammar or spelling. They're errors in the reasoning chain. A general LLM will read "Studies show X" and think "okay, that seems credible." It won't systematically check whether the conclusion actually follows from the premises.
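
The base-rate numbers from the list above take two lines of Bayes' rule to debunk. The 0.1% unicorn base rate below is an assumed figure for illustration; the conclusion holds for any base rate:

```python
# Hypothetical numbers from the base-rate example above.
p_strong_given_unicorn = 0.87   # "87% of unicorn startups have strong teams"
p_strong_overall = 0.90         # ...but 90% of ALL startups do
p_unicorn = 0.001               # assumed base rate of becoming a unicorn

# Bayes' rule: P(unicorn | strong) = P(strong | unicorn) * P(unicorn) / P(strong)
p_unicorn_given_strong = p_strong_given_unicorn * p_unicorn / p_strong_overall

lift = p_unicorn_given_strong / p_unicorn
print(f"P(unicorn | strong team) = {p_unicorn_given_strong:.6f}")
print(f"Lift over base rate      = {lift:.3f}")  # < 1: a slightly NEGATIVE signal
```

Because the trait is rarer among unicorns (87%) than among startups overall (90%), knowing a team is "strong" actually lowers the odds slightly. That's the check a fluent-sounding "87%!" never triggers in a general model.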

Why These Gaps Are Hard to Catch

There are several reasons why general-purpose models struggle with these kinds of logical fallacies:

1. Training data bias: Most internet text doesn't explicitly call out logical fallacies. Articles present arguments as if they're sound; LLMs learn to mimic that style. They get really good at writing arguments that sound convincing. They don't develop a strong "skepticism module" for detecting flaws.

2. Lack of structured reasoning: General LLMs process text holistically. They understand the overall meaning; they don't break arguments down into premises, conclusions, and logical steps. To catch a fallacy, you need to trace the reasoning: "What evidence supports this claim? What alternative explanations exist? What assumptions are being made?"

3. No domain-specific knowledge: Catching statistical fallacies (like base rate neglect) requires understanding probability theory. Catching causal fallacies requires understanding study design. General models have some of this knowledge, but it isn't reliably activated when reading articles.

4. Overconfidence in authority: If an article cites a study or an expert, the model often treats that as sufficient proof. It doesn't automatically think "but how was the study designed?" or "but what did the expert actually say?"
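
One way to picture the structured reasoning that point 2 says general models lack: decompose each argument into explicit parts that can be checked one by one. A minimal sketch, with a hypothetical schema:

```python
from dataclasses import dataclass, field

# Illustrative schema only: break an argument into claim, evidence,
# assumptions, and rival explanations, so each part can be examined
# instead of judging the whole text holistically.
@dataclass
class Argument:
    claim: str
    evidence: list[str]
    assumptions: list[str] = field(default_factory=list)
    alternatives: list[str] = field(default_factory=list)  # rival explanations

arg = Argument(
    claim="Coffee drinking causes longer life",
    evidence=["Observational study: coffee drinkers live longer"],
    assumptions=["No confounding (e.g. income, lifestyle)",
                 "Causation runs from coffee to health, not the reverse"],
    alternatives=["Wealthier people both drink more coffee and live longer"],
)

# The claim is only as strong as its weakest unexamined assumption.
for a in arg.assumptions:
    print("check assumption:", a)
```

A model that operates on a structure like this can flag the exact assumption that fails, rather than emitting a holistic "seems credible."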

How Fine-Tuning Helps

This is where fine-tuning comes in. Instead of training a model on random internet text, we train it specifically on examples of logical fallacies in real articles. Think of it like the difference between someone who's read everything and someone who's specifically trained to be a fact-checker.

Here's how it works:

  • Focused Training Data
    We feed the model thousands of examples where humans have identified logical fallacies. Each example shows the model: "Here's a passage that looks reasonable, and here's why the reasoning is flawed." Over time, it learns the patterns of bad reasoning, not just the style of good writing.
  • Explicit Reasoning Chains
    The fine-tuning process forces the model to break down arguments step-by-step. Instead of just saying "this seems wrong," it learns to identify: "This claims X causes Y, but there's a confounding variable Z that explains both, and the study didn't control for it."
  • Systematic Checklist
    A fine-tuned model develops something like a mental checklist: "When I see a causal claim, I check for reverse causation. When I see a study, I check for confounders. When I see success stories, I check for survivorship bias." General models don't have this systematic approach.
  • Feedback Loop
    As users flag false positives and missed issues, we retrain the model. It gets better at distinguishing between "genuinely problematic reasoning" and "valid arguments that happen to discuss uncertainties." This is iterative improvement; general models don't get this.

The key insight: A fine-tuned model is like a detective who's specifically trained to look for specific types of crimes, rather than a general police officer who knows a bit about everything but might miss subtle patterns.

This doesn't mean fine-tuned models are perfect. They still make mistakes. But they're systematically better at the specific task of detecting logical fallacies; they've been optimized for it, rather than trying to be good at everything.
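
To make the "focused training data" idea concrete, here is what a single training record might look like. The schema is purely illustrative, not SanityCheck's actual format: the point is that each example pairs a plausible-sounding passage with a labeled fallacy and an explicit reasoning chain.

```python
import json

# Hypothetical fine-tuning record for fallacy detection.
example = {
    "passage": "Studies show that coffee drinkers live longer.",
    "fallacy": "hidden_confounder",
    "reasoning": [
        "The claim is causal: coffee -> longevity.",
        "The cited evidence is observational, not experimental.",
        "Income and lifestyle plausibly drive both coffee habits and health.",
        "The passage does not say the study controlled for them.",
    ],
    "verdict": "flag",
}
print(json.dumps(example, indent=2))
```

Training on thousands of records shaped like this rewards the model for reproducing the reasoning chain, not just the final label, which is what pushes it beyond style-matching.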

General vs. Fine-Tuned: A Comparison

Aspect             | General-Purpose LLM                     | Fine-Tuned Logic Model
-------------------|-----------------------------------------|-----------------------------------------------------
Training focus     | Broad internet text (general knowledge) | Logical fallacies and reasoning patterns
What it recognizes | Well-written text, authoritative tone   | Specific fallacy patterns (confounders, bias, etc.)
Reasoning approach | Holistic understanding                  | Structured, step-by-step analysis
False positives    | Lower (doesn't flag much)               | Higher initially, improves with feedback
False negatives    | Higher (misses subtle fallacies)        | Lower (trained specifically to catch them)
Best for           | General conversation, summarizing       | Detecting logical gaps in arguments

It's the difference between a jack-of-all-trades and a specialist. Both have their place. When you need someone to catch logical fallacies in articles, you want the specialist.

The Bottom Line

General-purpose LLMs are incredible tools for understanding and generating language, but they're not built for systematic logical reasoning. They'll miss fallacies that require thinking beyond the surface text: confounders, survivorship bias, statistical traps, and causal errors.

Fine-tuned models that are specifically trained to catch logical fallacies perform better. They've been optimized for that exact task. They learn the patterns of bad reasoning, develop systematic checklists, and improve through user feedback. They're not perfect, but they're purpose-built for finding the gaps that general models miss.

That's what SanityCheck does: it uses models that have been specifically trained to catch these reasoning gaps, so you can catch the logical fallacies you'd normally miss.