LLMs Can Discover New Things — The Skeptics Were Wrong

Not long ago, a common critique of large language models was that they were fundamentally incapable of genuine discovery — sophisticated autocomplete at best, pattern-matching at worst. Whole Mars Catalog's pointed question this week captures a sentiment that's becoming harder to dismiss: the skeptics may have been wrong.

Whole Mars Catalog tweet questioning past skepticism about LLMs and discovery — Source: @wholemars — May 22, 2026

The argument against LLMs as discovery engines rested on a reasonable-sounding premise: auto-regressive models predict the next token based on prior context. They interpolate. They don't explore. But 2026 has produced a string of results that challenge that framing in concrete ways. Frameworks like CAESAR — an agentic AI system unveiled this month — are specifically designed to move beyond information retrieval, building dynamic knowledge graphs and refining outputs through iterative self-critique to generate original, cross-domain insights. That's a meaningful distinction from simply retrieving what's already known.

The practical evidence is stacking up elsewhere too. In pharmaceutical research, LLM-assisted clinical trial workflows have compressed specific trial phases by 20–35%, with accuracy rates of 94–97% against human-reviewed benchmarks. Protein folding simulations and drug candidate screening are areas where these models aren't just summarizing existing literature — they're surfacing non-obvious connections researchers hadn't yet made. Whether that constitutes "discovery" in a philosophical sense is a fair debate. Whether it's producing new, actionable scientific knowledge is less debatable.

The broader shift in 2026 has been from scale to capability density. The conversation has moved away from parameter counts toward reasoning depth, agentic planning, and multi-step workflow execution. That evolution matters for the discovery question: a model that can plan, execute, observe results, and revise its approach is operating in a fundamentally different regime than one that simply completes a prompt. The original skepticism wasn't unreasonable given where the technology stood — it just hasn't aged well.
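As a rough illustration of that difference, here is a toy sketch of the two regimes: a single-pass answer versus a loop that proposes, checks the result, and revises. It says nothing about how any particular model or framework works. The function names (`one_shot`, `agent_loop`, `propose`, `evaluate`, `revise`) and the square-root task are assumptions chosen purely for illustration.

```python
from typing import Callable


def one_shot(propose: Callable[[], float]) -> float:
    """Single-pass regime: produce one answer and stop."""
    return propose()


def agent_loop(
    propose: Callable[[], float],
    evaluate: Callable[[float], float],
    revise: Callable[[float, float], float],
    max_steps: int = 20,
    tolerance: float = 1e-6,
) -> float:
    """Iterative regime: propose, observe the error, revise, repeat."""
    candidate = propose()                      # plan: initial guess
    for _ in range(max_steps):
        error = evaluate(candidate)            # execute and observe
        if abs(error) <= tolerance:
            break                              # good enough, stop revising
        candidate = revise(candidate, error)   # revise using the feedback
    return candidate


# Toy task (hypothetical): find x such that x * x == 2.
def initial_guess() -> float:
    return 1.0


def squared_error(x: float) -> float:
    return x * x - 2.0


def newton_step(x: float, error: float) -> float:
    # Newton's method update for f(x) = x^2 - 2.
    return x - error / (2.0 * x)


if __name__ == "__main__":
    print("one-shot answer:", one_shot(initial_guess))
    print("looped answer:  ", agent_loop(initial_guess, squared_error, newton_step))
```

The single-pass call is stuck with its first guess, while the loop improves the guess only because it can observe how wrong each attempt was. That feedback step is the part the paragraph above points to.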

Sources & reporting notes

The links below identify the material source records used for this report.

@wholemars on X (2026-05-22T03:19:20.000Z) — Direct source

Source links are preserved as published or accessed. See our editorial standards and corrections policy.

BASENOR Editorial Desk

BASENOR Newsroom

The BASENOR Editorial Desk covers Tesla, SpaceX, and related technology, curating reporting from primary sources: official accounts, regulatory filings, and software release data. Every article passes source-record and fact-checking review before publication.

This report was curated by the BASENOR Editorial Desk from the sources listed above. Read our editorial standards or email editorial@basenor.com to report an error.

Tags: AI & robotics
