How LLMs Model Mental States
Large language models exhibit surprising capabilities in reasoning about beliefs, intentions, and knowledge states. We explore what these capabilities mean for building systems that genuinely model a user's context.
