The Marimo Paradigm: Reactive Notebooks for Agent-Driven Science
What is Marimo
Marimo is a reactive notebook framework that stores notebooks as pure Python `.py` files, backed by a dataflow graph (DAG) of cell dependencies. When one cell changes, only the affected downstream cells re-execute, in the correct order. This is designed to eliminate Jupyter's reproducibility problems: hidden state, out-of-order execution, and ghost variables.
Marimo was created by Akshay Agrawal (Stanford PhD, formerly at Google Brain). He was motivated by reproducibility bugs he ran into daily and by the failure of runtime-tracing approaches (like ipyflow). He chose static analysis of Python ASTs to detect dependencies, which lets marimo guarantee which cells execute.
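The static-analysis approach can be illustrated with Python's standard `ast` module. The sketch below is a simplified, hypothetical `defs_and_refs` helper, not marimo's analyzer. It treats top-level bindings (assignments, imports, function and class names) as a cell's definitions, and treats any other name the cell reads (excluding builtins) as a reference. It deliberately skips function and class bodies and does not model comprehension scopes. A real implementation has to handle both.

```python
import ast
import builtins

BUILTINS = set(dir(builtins))


def defs_and_refs(source: str) -> tuple[set[str], set[str]]:
    """Return (names defined, names referenced) for one cell's source.

    Hypothetical, simplified helper: not marimo's actual analysis.
    """
    tree = ast.parse(source)  # parsing only; nothing is imported or executed
    defs: set[str] = set()
    loads: set[str] = set()
    for stmt in tree.body:
        if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defs.add(stmt.name)
            continue  # simplification: function/class bodies are not inspected
        if isinstance(stmt, (ast.Import, ast.ImportFrom)):
            for alias in stmt.names:
                # "import numpy as np" binds np; "import os.path" binds os
                defs.add((alias.asname or alias.name).split(".")[0])
            continue
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    defs.add(node.id)
                elif isinstance(node.ctx, ast.Load):
                    loads.add(node.id)
    # A reference is a name read here but neither bound here nor a builtin.
    refs = loads - defs - BUILTINS
    return defs, refs
```

For a cell containing `import numpy as np` followed by `y = np.sqrt(x) + offset`, this reports `np` and `y` as definitions and `x` and `offset` as references. The cell would therefore depend on whichever cells define `x` and `offset`.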
Core Architecture
Dataflow graph over imperative execution
Marimo maintains a directed acyclic graph of cell dependencies. Each cell "listens" to the variables it reads from other cells, and changes propagate automatically. This is fundamentally different from Jupyter. There, execution order is manual, and an execution counter like `In [27]:` records when a cell last ran, not what it depends on.
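The propagation rule can be sketched with the standard library's `graphlib`. Add an edge from each cell that defines a variable to every cell that reads it, collect everything downstream of the edited cell, and run those cells in topological order. The cell names and the `defs`/`refs` dictionaries are hypothetical inputs, and `cells_to_rerun` is an illustrative helper rather than marimo's scheduler. Marimo derives the equivalent information by static analysis of each cell.

```python
from collections import deque
from graphlib import TopologicalSorter


def cells_to_rerun(
    defs: dict[str, set[str]],
    refs: dict[str, set[str]],
    changed: str,
) -> list[str]:
    """Return the changed cell plus every downstream cell, in a valid run order.

    Hypothetical helper; `changed` must be one of the known cells.
    """
    cells = set(defs) | set(refs)
    # Map each variable to the cell that defines it (unique in a valid notebook).
    definer = {var: cell for cell, names in defs.items() for var in names}
    # children[c] holds the cells that read at least one variable defined by c.
    children: dict[str, set[str]] = {cell: set() for cell in cells}
    for cell, names in refs.items():
        for var in names:
            if var in definer:
                children[definer[var]].add(cell)
    # Breadth-first search from the edited cell finds everything it affects.
    affected = {changed}
    queue = deque([changed])
    while queue:
        for child in children[queue.popleft()]:
            if child not in affected:
                affected.add(child)
                queue.append(child)
    # TopologicalSorter expects {node: predecessors}; restrict to affected cells.
    graph = {
        cell: {parent for parent in affected if cell in children[parent]}
        for cell in affected
    }
    return list(TopologicalSorter(graph).static_order())
```

In the test data below, editing a hypothetical `clean` cell re-runs `clean` first, then `plot` and `stats`. An expensive upstream `load` cell and an unrelated `other` cell are left alone.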
Key consequence: a marimo notebook is a well-defined program, not a scratchpad. It can be:
- Run as a script: `python notebook.py`
- Deployed as a web app: `marimo run notebook.py`
- Deployed via WebAssembly (Pyodide) for static hosting
- Version-controlled with clean git diffs (no JSON, no base64 outputs)
Storage format
Notebooks are pure `.py` files: no JSON wrapper, no embedded outputs, no metadata bloat. Standard Python tooling (pytest, ruff and other linters) works on them directly. This is what makes agent integration natural.
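For reference, an abbreviated marimo notebook file looks roughly like the `NOTEBOOK` string below. Treat it as approximate: details such as the `__generated_with` version line (omitted here) and the `App` arguments vary by marimo version. Each cell is a function decorated with `@app.cell`. Its parameters are the variables it reads, and its return tuple lists the variables it defines. Because the file is ordinary Python, the standard `ast` module can inspect it without importing marimo. The `cell_signatures` helper is hypothetical.

```python
import ast

# Approximate, abbreviated marimo file, held in a string so it is only parsed.
NOTEBOOK = '''
import marimo

app = marimo.App()


@app.cell
def _():
    import marimo as mo
    return (mo,)


@app.cell
def _(mo):
    threshold = mo.ui.slider(0, 100, value=50)
    threshold
    return (threshold,)


@app.cell
def _(threshold):
    print(threshold.value)
    return


if __name__ == "__main__":
    app.run()
'''


def cell_signatures(source: str) -> list[tuple[list[str], list[str]]]:
    """List (reads, defines) for each @app.cell function. Hypothetical helper."""
    cells = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        # Accept both @app.cell and @app.cell(...) decorator forms.
        targets = [d.func if isinstance(d, ast.Call) else d for d in node.decorator_list]
        if not any(isinstance(t, ast.Attribute) and t.attr == "cell" for t in targets):
            continue
        reads = [arg.arg for arg in node.args.args]
        defines: list[str] = []
        for stmt in node.body:
            if isinstance(stmt, ast.Return) and isinstance(stmt.value, ast.Tuple):
                defines = [e.id for e in stmt.value.elts if isinstance(e, ast.Name)]
        cells.append((reads, defines))
    return cells
```

Applied to `NOTEBOOK`, this yields one `(reads, defines)` pair per cell: the first cell defines `mo`, the second reads `mo` and defines `threshold`, and the third only reads `threshold`.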
Intelligent caching
Marimo identifies precisely which cells changed and which cells depend on them. It therefore avoids expensive re-computation of unaffected cells while ensuring that dependent cells update. This is critical for scientific workflows where some cells (data loading, model fitting) are expensive.
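The change-detection half of this can be sketched as follows, assuming every cell has a stable id. Fingerprint each cell's source with a hash and compare it against the previous snapshot. Only the cells this reports, plus their downstream dependents, would need to re-run, and an expensive upstream cell such as data loading keeps its results. `fingerprint` and `changed_cells` are hypothetical helpers, not marimo internals.

```python
import hashlib


def fingerprint(source: str) -> str:
    """Stable content hash of one cell's source code."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def changed_cells(old: dict[str, str], new: dict[str, str]) -> set[str]:
    """Ids of cells that are new or whose source differs from the old snapshot.

    Hypothetical helper: deleted cells would need separate handling.
    """
    old_hashes = {cell: fingerprint(src) for cell, src in old.items()}
    return {
        cell
        for cell, src in new.items()
        if old_hashes.get(cell) != fingerprint(src)
    }
```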
Agent Integration Model
The --watch workflow
The recommended Claude Code + marimo workflow:
Terminal 1: `marimo edit notebook.py --watch`
Terminal 2: `claude` (or any coding agent)
The `--watch` flag detects filesystem changes and automatically reloads edits in the browser. The agent edits the `.py` file on disk, and the human immediately sees the updated notebook rendered in marimo's UI. This creates a tight feedback loop without the agent needing to understand notebook state.
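The reload side of a flag like `--watch` can be approximated with modification-time polling. The sketch below is a stdlib-only illustration with hypothetical names (`FileWatcher`, `poll`). Marimo's actual watcher may work differently, for example by using OS-level file events rather than polling.

```python
import os


class FileWatcher:
    """Report when a file's modification time changes between polls.

    Hypothetical illustration, not marimo's implementation.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.last_mtime = os.stat(path).st_mtime_ns

    def poll(self) -> bool:
        """Return True once per change if the file was modified since the last poll."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self.last_mtime:
            self.last_mtime = mtime
            return True
        return False
```

An editor using this approach would call `poll()` every fraction of a second and re-read and re-render the notebook whenever it returns `True`.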
Why this works for agents
- Plain Python files: LLMs read and write `.py` natively, with no fighting against JSON structure
- Deterministic execution: when the agent modifies a cell, the dataflow engine guarantees correct downstream re-execution
- Self-checking: agents can run `marimo check notebook.py` to validate structural integrity (variable redefinition, invalid cells)
- Sandbox execution: `uv run notebook.py --sandbox` provides isolated verification
- Code-first interactivity: interactive elements (sliders, dropdowns) are created in Python code rather than through UI clicks, which is natural for LLM generation
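Two structural rules make a notebook a well-defined DAG: every variable must be defined in exactly one cell, and the dependency graph must have no cycles. Both can be checked with a few lines of standard-library Python. This is a hypothetical illustration of what such a check looks for, not the implementation or output format of `marimo check`. The `defs`/`refs` inputs are assumed to come from static analysis of each cell.

```python
from collections import defaultdict
from graphlib import CycleError, TopologicalSorter


def structural_problems(
    defs: dict[str, set[str]], refs: dict[str, set[str]]
) -> list[str]:
    """Return human-readable problems; an empty list means the graph is valid.

    Hypothetical helper, not marimo's checker.
    """
    problems: list[str] = []
    definers: dict[str, list[str]] = defaultdict(list)
    for cell, names in defs.items():
        for var in names:
            definers[var].append(cell)
    # Rule 1: a variable defined in more than one cell is ambiguous.
    for var, cells in sorted(definers.items()):
        if len(cells) > 1:
            problems.append(f"{var!r} defined in multiple cells: {sorted(cells)}")
    # Rule 2: the cell dependency graph ({cell: cells it reads from}) must be acyclic.
    graph = {
        cell: {definers[var][0] for var in refs.get(cell, set()) if var in definers}
        for cell in defs
    }
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as err:
        # The second element of err.args is the list of nodes forming the cycle.
        problems.append(f"cycle between cells: {err.args[1]}")
    return problems
```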
Schema and runtime context injection
Flath/Warmerdam describe the key innovation: marimo injects DataFrame schemas and first rows into the LLM prompt, giving the model runtime awareness of the data's structure. When a Python object is passed to the assistant, marimo converts it into a prompt-suitable representation, with special handling for datasets.
Open design question from Warmerdam: "Should we allow users to customize how variables are converted into prompt content?"
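One way to sketch both the conversion and a possible answer to this customization question uses `functools.singledispatch`. A default formatter falls back to a truncated `repr`. A specialized formatter summarizes tabular data as a schema plus its first rows. Users could register their own formatters for other types. The `Table` class and `to_prompt` function are hypothetical stand-ins: a real notebook would be handling pandas or polars DataFrames, and this is not marimo's implementation.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Table:
    """Hypothetical stand-in for a DataFrame: column names plus row tuples."""
    columns: list[str]
    rows: list[tuple]


@singledispatch
def to_prompt(value: object, name: str) -> str:
    """Default formatter: a truncated repr."""
    return f"{name} = {repr(value)[:200]}"


@to_prompt.register
def _(value: Table, name: str, head: int = 3) -> str:
    # Schema: infer each column's type from the first row (a simplification).
    first = value.rows[0] if value.rows else ()
    schema = ", ".join(
        f"{col}: {type(cell).__name__}" for col, cell in zip(value.columns, first)
    )
    lines = [f"{name}: table with {len(value.rows)} rows; columns ({schema})"]
    # First rows, one dict per line.
    lines += [str(dict(zip(value.columns, row))) for row in value.rows[:head]]
    return "\n".join(lines)
```

A user who wanted a different representation for, say, a fitted model object would add another `@to_prompt.register` function for that type. Everything else would still fall back to the default.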
The Bespoken framework
Vincent Warmerdam built a complementary tool called Bespoken: task-specific agents with minimal defaults but precise configurability. Key patterns:
- Slash commands as Python functions, with a UI library for user input
- Constraint-based architecture: agents scoped to specific tasks
- Philosophy: "What would be the simplest API such that it's 80-100% configurable with just a couple of simple Python functions?"
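The slash-command pattern can be sketched as a decorator that registers plain Python functions under a `/name`, plus a dispatcher that routes user input to them. All names here (`command`, `dispatch`, `COMMANDS`, `summarize`) are hypothetical and do not reflect Bespoken's actual API.

```python
from typing import Callable, Optional

Handler = Callable[[str], str]
COMMANDS: dict[str, Handler] = {}  # hypothetical registry: "/name" -> handler


def command(name: str) -> Callable[[Handler], Handler]:
    """Register a plain function as the handler for '/<name>'."""
    def register(func: Handler) -> Handler:
        COMMANDS["/" + name] = func
        return func
    return register


@command("summarize")
def summarize(arg: str) -> str:
    # Hypothetical task-specific command; a real one would call the agent.
    return f"summarizing {arg or 'current notebook'}"


def dispatch(line: str) -> Optional[str]:
    """Run a slash command, or return None so free text goes to the agent."""
    if not line.startswith("/"):
        return None
    name, _, arg = line.partition(" ")
    handler = COMMANDS.get(name)
    if handler is None:
        return f"unknown command: {name}"
    return handler(arg.strip())
```

Keeping the registry small is one way to express the constraint-based idea: the agent's task surface is limited to the commands that were registered.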
Marimo vs. Pipeio's Current Notebook Model
| Aspect | Pipeio (current) | Marimo |
|---|---|---|
| Source format | `.py` percent-format (jupytext) | `.py` native marimo format |
| Execution | Papermill (offline, parameterized) | DAG-reactive (live or script) |
| State model | Stateful kernel (Jupyter) | Stateless DAG (deterministic) |
| Sync model | Bidirectional `.py` <-> `.ipynb` | No sync needed (single format) |
| Human interface | JupyterLab (via `.ipynb`) | Marimo editor (via `.py`) |
| Agent interface | `.py` percent-format (read/write) | `.py` native (read/write) |
| Validation | nb_audit (lifecycle checks) | `marimo check` (structural) |
| Interactivity | Jupyter widgets (in `.ipynb` only) | Code-first widgets (in `.py`) |
| Parameter injection | RunCard + papermill | Cell-level reactivity |
Where marimo is stronger
- Reproducibility: DAG guarantees eliminate hidden state
- Agent-friendliness: single `.py` format, no sync needed, structural validation
- Live feedback: `--watch` provides real-time human oversight of agent edits
- Interactive exploration: reactive widgets without a separate UI framework
Where pipeio's model is stronger
- Pipeline integration: Snakemake rules, flow-scoped organization, mod promotion
- Parameterized batch execution: RunCard + papermill for HPC/cluster runs
- Lifecycle management: draft -> active -> promoted -> archived with audit
- Multi-kernel support: Different Jupyter kernels per notebook (cogpy, neuropy-env)
- Publication pipeline: MyST/HTML output integrated with MkDocs site
Practical Recommendations from the Sources
From the marimo blog, 8 strategies for Claude Code + marimo:
- Download marimo's CLAUDE.md prompt file for domain knowledge
- Use `--watch` for real-time feedback (the core pattern)
- Reference pre-existing notebooks rather than generating from scratch
- Toggle planning mode (Shift+Tab) for extensive generation
- Request markdown documentation alongside code
- Use `--sandbox` mode for self-checking
- Structure DataFrame operations as functions for `pipe()` chaining
- Have Claude read data files for contextual understanding before generating analysis
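The `pipe()` advice means writing each transformation as a small named function that takes a table and returns a new one, then composing those functions. In pandas this is `df.pipe(f).pipe(g)`, where `DataFrame.pipe(func, *args, **kwargs)` calls `func(df, *args, **kwargs)`. To stay dependency-free, the sketch below uses lists of dicts and a hypothetical `pipe` helper built on `functools.reduce`. The step names are illustrative, not from the blog.

```python
from functools import reduce
from typing import Callable

Rows = list[dict]  # stand-in for a DataFrame in this sketch


def drop_missing(rows: Rows) -> Rows:
    """Remove rows that contain any missing (None) value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def add_zscore(rows: Rows, col: str) -> Rows:
    """Add '<col>_z' using the population standard deviation.

    Assumes at least two distinct values, so the deviation is non-zero.
    """
    vals = [r[col] for r in rows]
    mean = sum(vals) / len(vals)
    sd = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
    return [{**r, f"{col}_z": (r[col] - mean) / sd} for r in rows]


def pipe(rows: Rows, *steps: Callable[[Rows], Rows]) -> Rows:
    """Apply steps left to right, like chained df.pipe(...) calls."""
    return reduce(lambda acc, step: step(acc), steps, rows)
```

A call such as `pipe(raw, drop_missing, lambda r: add_zscore(r, "rt"))` reads top to bottom like the analysis itself. Each named step can be edited or reviewed in isolation, which is the practical benefit for agent-generated code.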
From Eric Ma, a validation loop:
- Always run `marimo check` after edits (an instruction given to the agent via AGENTS.md)
- Use `uvx marimo edit --sandbox notebook.py --watch` as the standard invocation
- Compatible agents: Claude Code, Cursor, GitHub Copilot
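This check step can also be scripted. The sketch below runs `marimo check` on a notebook via `subprocess` and treats a non-zero exit status as failure. That exit-code convention is an assumption (it is standard for linters) rather than something the sources document, and `check_command` and `run_check` are hypothetical helpers. The command is passed in as a list so the loop can be exercised without marimo installed.

```python
import subprocess


def check_command(notebook: str) -> list[str]:
    """Build the validation command for one notebook."""
    return ["marimo", "check", notebook]


def run_check(cmd: list[str]) -> tuple[bool, str]:
    """Run a checker; return (passed, combined output) for the agent to read.

    Assumes a non-zero exit status means problems were found.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr
```

An agent harness would call `run_check(check_command("notebook.py"))` after each edit and feed the returned output back to the agent whenever `passed` is `False`.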
The Metacognition Question
Mineault's central warning applies to any agent-notebook integration: "Using the tools proficiently is feasible when you have gone through the hard work of writing your own code by yourself, failing repeatedly, picking yourself back up; I don't know of another way of getting to that level of metacognition."
The risk with combining reactive notebooks and agents is that the feedback loop is so fast that validation may be skipped. The antidote is not slower iteration but systematic visual inspection (cheap, abundant plots) and structural validation (automated checks).