How LLMs Model Mental States

Large language models exhibit surprising capabilities in reasoning about beliefs, intentions, and knowledge states. We explore what this means for building systems that truly understand context.

Shep Bryan
Founder
[Figure: abstract visualization of neural network patterns. The geometry of belief representation in transformer architectures.]

When you read a story about Sally, who puts a marble in a basket, leaves the room, and returns after Anne has moved it, you effortlessly know that Sally will look in the basket. This is theory of mind: the ability to attribute mental states to others and use those attributions to predict behavior.

The Surprising Competence of Language Models

Recent large language models pass classic theory of mind tests with remarkable consistency. They correctly predict that Sally will look where she last saw the marble, not where it actually is. They distinguish between what characters know and what they believe. They track how knowledge transfers between agents.
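To make the task concrete, here is a minimal sketch of how a Sally-Anne style false-belief probe might be posed to a model. The `call_model` callback is a stand-in for whatever completion API you use; it is not a specific library call, and the vignette wording is illustrative rather than a standardized test item.

```python
# Minimal sketch of a Sally-Anne style false-belief probe.
# `call_model` is a placeholder for whatever chat/completion call you use;
# it is not a specific library API.

from typing import Callable

FALSE_BELIEF_VIGNETTE = (
    "Sally puts her marble in the basket and leaves the room. "
    "While she is away, Anne moves the marble from the basket to the box. "
    "Sally comes back. Where will Sally look for her marble first? "
    "Answer with one word: basket or box."
)

def run_false_belief_probe(call_model: Callable[[str], str]) -> bool:
    """Return True if the model answers with Sally's (false) belief, not reality."""
    answer = call_model(FALSE_BELIEF_VIGNETTE).strip().lower()
    return "basket" in answer and "box" not in answer

# Example usage with a stubbed model, for illustration only:
if __name__ == "__main__":
    fake_model = lambda prompt: "basket"
    print(run_false_belief_probe(fake_model))  # True
```

A model that answers "box" is reporting where the marble actually is; answering "basket" requires tracking what Sally last saw, which is the false-belief distinction the classic test is built around.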

The question isn't whether LLMs can solve theory of mind tasks. They clearly can. The question is whether they're doing something like what humans do, or something entirely different that produces the same outputs.

Representation vs. Simulation

Two hypotheses dominate the debate:

  1. LLMs build genuine representations of mental states—internal structures that track who knows what
  2. LLMs simulate surface patterns—they've seen enough stories about false beliefs that they predict the next token correctly without 'understanding' belief

Our research suggests a third possibility: LLMs develop something functionally equivalent to mental state tracking, even if the mechanism differs from human cognition. The geometry of their internal representations shows systematic structure when processing belief-related content.
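One way to look for that kind of structure, sketched here as a rough illustration rather than our exact protocol, is to fit a linear probe on activation vectors extracted from belief vignettes. The arrays below are synthetic placeholders; with real hidden states, held-out accuracy well above chance would suggest that belief status is linearly decodable from the representation.

```python
# Sketch of a linear probe over hidden states, assuming you have already
# extracted per-example activation vectors for belief vignettes. The arrays
# and labels here are synthetic placeholders, not real data.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder: hidden_states[i] is the model's activation vector for vignette i;
# labels[i] is 1 if the character holds a false belief in that vignette, else 0.
hidden_states = rng.normal(size=(200, 768))
labels = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, test_size=0.25, random_state=0
)

probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

# With real activations (not this random noise), above-chance accuracy here
# is evidence that belief status is linearly readable from the hidden states.
print(f"Probe accuracy: {probe.score(X_test, y_test):.2f}")
```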

Implications for Knowledge Systems

If LLMs can model mental states, they can potentially model what users know, believe, and need. A knowledge system that understands not just what information exists, but what the user's mental model looks like, could surface precisely the right context at the right time.
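As a toy illustration (not Penumbra's implementation), a retrieval layer could consult a lightweight user model when ranking results. The `UserModel` and `rerank` names below are hypothetical, chosen only to show the shape of the idea: track what the user has already seen and what they are assumed to believe, then re-rank candidates accordingly.

```python
# Illustrative sketch of a simple user model that a retrieval layer could
# consult: which documents the user has already seen, and what they are
# assumed to believe, so results can be re-ranked for this particular person.

from dataclasses import dataclass, field

@dataclass
class UserModel:
    seen_documents: set[str] = field(default_factory=set)         # doc IDs already surfaced
    assumed_beliefs: dict[str, str] = field(default_factory=dict)  # topic -> believed state

    def novelty(self, doc_id: str) -> float:
        """Down-weight documents the user has already seen."""
        return 0.2 if doc_id in self.seen_documents else 1.0

def rerank(candidates: list[tuple[str, float]], user: UserModel) -> list[tuple[str, float]]:
    """Re-rank (doc_id, relevance) pairs by relevance * novelty for this user."""
    scored = [(doc_id, score * user.novelty(doc_id)) for doc_id, score in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Example usage: a document the user has already seen drops below a fresh one.
user = UserModel(seen_documents={"doc-1"})
print(rerank([("doc-1", 0.9), ("doc-2", 0.7)], user))  # doc-2 ranks first
```

The interesting extension, and the harder one, is keeping `assumed_beliefs` accurate over time rather than treating the user as static.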

This is the research direction we're pursuing at Penumbra: systems that don't just retrieve information, but understand the cognitive state of the person seeking it.

Open Questions

  • How robust are these capabilities to distribution shift?
  • Can we extract and examine the mental state representations directly?
  • What happens when mental state reasoning conflicts with other objectives?
  • How do we build systems that maintain accurate user models over time?

This is part of our ongoing research into cognitive architectures for knowledge systems. Follow our research updates for more on how we're building systems that truly understand context.

Research by

Shep Bryan
Founder

Shep is the founder of Penumbra, building knowledge systems that transform how teams capture, connect, and leverage institutional intelligence for strategic decisions.
