Skip to content

The Marimo Paradigm: Reactive Notebooks for Agent-Driven Science

What is Marimo

Marimo is a reactive notebook framework that stores notebooks as pure Python .py files, backed by a dataflow graph (DAG) of cell dependencies. When one cell changes, only affected downstream cells re-execute, in the correct order. This eliminates Jupyter's reproducibility crisis (hidden state, out-of-order execution, ghost variables).

Created by Akshay Agrawal (Stanford PhD, former Google Brain), marimo was motivated by daily reproducibility bugs and the failure of runtime-tracing approaches (like ipyflow). He chose static analysis of Python ASTs to detect dependencies -- providing guarantees about which cells execute.

Core Architecture

Dataflow graph over imperative execution

Marimo maintains a directed acyclic graph of cell dependencies. Each cell "listens" to variables defined by other cells. Changes propagate automatically. This is fundamentally different from Jupyter where execution order is manual and In [27]: tells you nothing about dependency order.

Key consequence: a marimo notebook is a well-defined program, not a scratchpad. It can be: - Run as a script: python notebook.py - Deployed as a web app: marimo run notebook.py - Deployed via WebAssembly (Pyodide) for static hosting - Version-controlled with clean git diffs (no JSON, no base64 outputs)

Storage format

Pure .py files. No JSON wrapper, no embedded outputs, no metadata bloat. Standard Python tooling (pytest, ruff, linters) works directly. This is what makes agent integration natural.

Intelligent caching

Marimo identifies precisely which cells changed, avoiding expensive re-computation while ensuring dependent cells update. This is critical for scientific workflows where some cells (data loading, model fitting) are expensive.

Agent Integration Model

The --watch workflow

The recommended Claude Code + marimo workflow:

Terminal 1: marimo edit notebook.py --watch
Terminal 2: claude  (or any coding agent)

The --watch flag detects filesystem changes and automatically reloads edits in the browser. The agent edits the .py file on disk; the human immediately sees the updated notebook rendered in marimo's UI. This creates a tight feedback loop without the agent needing to understand notebook state.

Why this works for agents

  1. Plain Python files: LLMs read and write .py natively -- no JSON structure fighting
  2. Deterministic execution: When the agent modifies a cell, the dataflow engine guarantees correct downstream re-execution
  3. Self-checking: Agents can run marimo check notebook.py to validate structural integrity (variable redefinition, invalid cells)
  4. Sandbox execution: uv run notebook.py --sandbox provides isolated verification
  5. Code-first interactivity: Interactive elements (sliders, dropdowns) are created via Python code, not UI clicks -- natural for LLM generation

Schema and runtime context injection

Flath/Warmerdam describe the key innovation: marimo injects DataFrame schemas and first rows into the LLM prompt, giving it runtime awareness of data structure. When a Python object is received, marimo converts it to a prompt-suitable representation with special handling for datasets.

Open design question from Warmerdam: "Should we allow users to customize how variables are converted into prompt content?"

The Bespoken framework

Vincent Warmerdam built a complementary tool called Bespoken -- task-specific agents with minimal defaults but precise configurability. Key patterns: - Slash commands as Python functions with UI library for user input - Constraint-based architecture: agents scoped to specific tasks - Philosophy: "What would be the simplest API such that it's 80-100% configurable with just a couple of simple Python functions?"

Marimo vs. Pipeio's Current Notebook Model

Aspect Pipeio (current) Marimo
Source format .py percent-format (jupytext) .py native marimo format
Execution Papermill (offline, parameterized) DAG-reactive (live or script)
State model Stateful kernel (Jupyter) Stateless DAG (deterministic)
Sync model Bidirectional .py <-> .ipynb No sync needed (single format)
Human interface Jupyter Lab (via .ipynb) Marimo editor (via .py)
Agent interface .py percent-format (read/write) .py native (read/write)
Validation nb_audit (lifecycle checks) marimo check (structural)
Interactivity Jupyter widgets (in .ipynb only) Code-first widgets (in .py)
Parameter injection RunCard + papermill Cell-level reactivity

Where marimo is stronger

  • Reproducibility: DAG guarantees eliminate hidden state
  • Agent-friendliness: Single .py format, no sync needed, structural validation
  • Live feedback: --watch provides real-time human oversight of agent edits
  • Interactive exploration: Reactive widgets without separate UI framework

Where pipeio's model is stronger

  • Pipeline integration: Snakemake rules, flow-scoped organization, mod promotion
  • Parameterized batch execution: RunCard + papermill for HPC/cluster runs
  • Lifecycle management: draft -> active -> promoted -> archived with audit
  • Multi-kernel support: Different Jupyter kernels per notebook (cogpy, neuropy-env)
  • Publication pipeline: MyST/HTML output integrated with MkDocs site

Practical Recommendations from the Sources

From the marimo blog -- 8 strategies for Claude Code + marimo:

  1. Download marimo's CLAUDE.md prompt file for domain knowledge
  2. Use --watch for real-time feedback (the core pattern)
  3. Reference pre-existing notebooks rather than generating from scratch
  4. Toggle planning mode (Shift+Tab) for extensive generation
  5. Request markdown documentation alongside code
  6. Use --sandbox mode for self-checking
  7. Structure DataFrame operations as functions for pipe() chaining
  8. Have Claude read data files for contextual understanding before generating analysis

From Eric Ma -- validation loop:

  • Always run marimo check after edits (instruction to agent via AGENTS.md)
  • Use uvx marimo edit --sandbox notebook.py --watch as the standard invocation
  • Compatible agents: Claude Code, Cursor, GitHub Copilot

The Metacognition Question

Mineault's central warning applies to any agent-notebook integration: "Using the tools proficiently is feasible when you have gone through the hard work of writing your own code by yourself, failing repeatedly, picking yourself back up; I don't know of another way of getting to that level of metacognition."

The risk with reactive notebooks + agents: the feedback loop is so fast that validation may be skipped. The antidote is systematic visual inspection (cheap abundant plots) and structural validation (automated checks), not slower iteration.