Working notes on papers from OpenAI, Anthropic, Transformer Circuits, and Stanford CGRI, read for what they imply about deploying agents and rebuilding workflows in real organizations.
Anthropic on a method for surfacing what's actually different between two LLMs, without telling it where to look — and the operating-model implication for every team about to swap a model in production.
Anthropic's interpretability team on emotional patterns inside frontier models — and why that matters when one is sitting inside a client workflow.
OpenAI's own playbook for catching their internal coding agents going off-task — and what it tells every operator about deploying agents inside a real workflow.
Treating an LLM as a measurement instrument — and what that changes about how you instrument a workflow.
Reading OpenAI's Graviton paper for what it implies about the cost curve of running agents in production — not just training models.
Anthropic on what changes when an agent runs for hours instead of seconds — and why most workflows haven't caught up to it.
What it means when a frontier AI lab grounds its product decisions in 81,000 user interviews — and what it tells operators about disciplined research practice.
Stanford's Corporate Governance Research Initiative on what trust does — and what its absence costs — inside an organization.
We help leadership teams turn frontier research into operating-model decisions and production deployments. The first conversation is short, specific, and free.