
Pipeio Notebook Integration: Concrete Recommendations from Agent-Notebook Research

Context

This note translates findings from the agent-notebook integration research (companion notes in this series) into concrete recommendations for pipeio's notebook system. Each recommendation is grounded in external evidence and assessed against pipeio's current architecture.

Recommendation 1: Add marimo as an alternative notebook backend

What

Support marimo-native .py notebooks alongside the current jupytext percent-format, selectable per-notebook in notebook.yml.

Why

Every source surveyed validates the reactive DAG model for agent-driven scientific work. Marimo eliminates the sync overhead (no .py <-> .ipynb bidirectional sync needed), provides built-in structural validation (marimo check), and enables live --watch feedback during agent editing. Eric Ma's benchmarks show Claude Opus achieves 100% instruction adherence on marimo notebooks.

How

Extend notebook.yml with a format field:

entries:
  - path: notebooks/explore/.src/investigate_noise.py
    format: percent          # default, current behavior (jupytext)
    kind: investigate
    # ...

  - path: notebooks/explore/.src/interactive_analysis.py
    format: marimo           # new: marimo-native .py
    kind: explore
    # ...

Implementation sketch:

  • nb_sync: skip for marimo-format notebooks (single format, no sync needed)
  • nb_exec: route to marimo run instead of papermill for marimo-format
  • nb_publish: use marimo's HTML export or convert to MyST
  • nb_audit: call marimo check for structural validation
  • nb_read: works as-is (both formats are .py)
  • nb_analyze: extend AST analysis to recognize marimo cell decorators
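
The nb_analyze item could be approached with the standard-library ast module. The sketch below is illustrative only and is not pipeio's implementation. It assumes the notebook follows marimo's usual generated layout, in which each cell is a function decorated with @app.cell (optionally called with arguments) and the function's parameters name the variables the cell consumes. The helper name find_marimo_cells is hypothetical.

```python
import ast


def _is_cell_decorator(node: ast.expr) -> bool:
    """True for `@app.cell` and `@app.cell(...)` style decorators."""
    if isinstance(node, ast.Call):  # e.g. @app.cell(hide_code=True)
        node = node.func
    return isinstance(node, ast.Attribute) and node.attr == "cell"


def find_marimo_cells(source: str) -> list[dict]:
    """Return one record per top-level function decorated as a marimo cell."""
    cells = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and any(
            _is_cell_decorator(d) for d in node.decorator_list
        ):
            cells.append(
                {
                    "name": node.name,
                    # Parameters are the names this cell reads from other cells
                    "inputs": [a.arg for a in node.args.args],
                    "lineno": node.lineno,
                }
            )
    return cells


EXAMPLE = '''
import marimo

app = marimo.App()

@app.cell
def _():
    import numpy as np
    return (np,)

@app.cell(hide_code=True)
def _(np):
    x = np.arange(3)
    return (x,)
'''

cells = find_marimo_cells(EXAMPLE)
print([c["inputs"] for c in cells])
```

Because the source is only parsed and never imported, the analysis works even where marimo is not installed, which matches the static nature of nb_analyze.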

Effort and risk

Medium effort. The .py-as-source-of-truth principle is preserved. Main risk: marimo's cell format differs from jupytext percent-format, so nb_promote would need format-aware script extraction. Marimo is also a younger framework than jupytext, with a smaller ecosystem and an API that is still evolving.

Priority

High for exploratory notebooks (kind: investigate/explore). Lower for demo/validate notebooks where parameterized batch execution via papermill is more appropriate.

Recommendation 2: Add structural validation to nb_audit

What

Extend nb_audit to run format-specific structural validation on all active notebooks. For percent-format: check import isolation, cell ordering, variable shadowing. For marimo-format: run marimo check.

Why

Eric Ma's benchmarks show that every LLM -- including top-tier models -- violates import isolation to some degree. Structural validation catches what code review misses. The marimo team and Eric Ma both recommend running validation after every agent edit.

How

Add a --structural flag to nb_audit (or make it default):

import subprocess

def nb_audit_structural(entry: NotebookEntry) -> list[Issue]:
    """Run format-specific structural checks and return the issues found."""
    if entry.format == "marimo":
        # Delegate to marimo's own checker; text=True decodes stdout to str
        result = subprocess.run(
            ["marimo", "check", str(entry.path)], capture_output=True, text=True
        )
        return parse_marimo_check_output(result.stdout)
    # percent-format: custom AST-based checks
    analysis = analyze_notebook(entry.path)
    issues = []
    issues += check_import_isolation(analysis)  # imports in dedicated cells
    issues += check_variable_shadowing(analysis)  # no redefined names
    issues += check_markdown_documentation(analysis)  # md before code cells
    return issues
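
To make one of the percent-format checks concrete, here is a minimal sketch of an import-isolation check. It rests on two assumptions that are not stated in this note: cells are delimited by lines beginning with "# %%", and "import isolation" means a code cell contains either only imports or no imports at all. Unlike the outline above, this version takes raw source text rather than an analysis object, and split_percent_cells is a hypothetical helper.

```python
import ast
import re

CELL_MARKER = re.compile(r"^# %%.*$", re.MULTILINE)


def split_percent_cells(source: str) -> list[tuple[int, str]]:
    """Split percent-format source into (starting_line, cell_text) pairs."""
    markers = list(CELL_MARKER.finditer(source))
    cells = []
    for i, m in enumerate(markers):
        end = markers[i + 1].start() if i + 1 < len(markers) else len(source)
        lineno = source.count("\n", 0, m.start()) + 1
        cells.append((lineno, source[m.start():end]))
    return cells


def check_import_isolation(source: str) -> list[str]:
    """Flag cells that mix import statements with other code."""
    issues = []
    for lineno, text in split_percent_cells(source):
        # Markdown cells are fully commented out, so they parse to an empty body.
        # Cells containing IPython magics would raise SyntaxError here.
        body = ast.parse(text).body
        imports = [n for n in body if isinstance(n, (ast.Import, ast.ImportFrom))]
        if imports and len(imports) != len(body):
            issues.append(f"cell at line {lineno}: imports mixed with other code")
    return issues


EXAMPLE = """# %% [markdown]
# # Demo

# %%
import numpy as np

# %%
import pandas as pd
df = pd.DataFrame({"a": np.arange(3)})
"""

issues = check_import_isolation(EXAMPLE)
print(issues)
```

Only the third cell is reported, since it both imports pandas and builds a DataFrame. A production check would likely need to tolerate magics and late imports used deliberately.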

Effort and risk

Low effort. Builds on existing nb_audit and analyze_notebook infrastructure. No risk to existing functionality -- purely additive.

Priority

High. This is the easiest win and directly addresses the most common agent failure mode.

Recommendation 3: Implement a --watch integration for nb_sync

What

Add a filesystem watcher mode to pipeio that monitors .py source files and auto-triggers nb_sync when changes are detected, providing a live feedback loop similar to marimo's --watch but within pipeio's jupytext-based workflow.

Why

The --watch pattern is the linchpin that makes agent-driven notebook development practical (Eric Ma). For users who stay with jupytext percent-format (e.g., because they need Jupyter Lab for heavy interactive work or specific kernel environments), having the .ipynb update automatically when the agent edits the .py gives the same "agent writes, human sees" feedback loop.

How

# New CLI command or MCP tool
projio pipeio nb-watch --flow myflow
# or
pipeio_nb_watch(flow="myflow")

Implementation: use watchdog or inotify to monitor .src/ directories. On .py modification, run nb_sync_one(direction="py2nb") for the changed file. Optionally forward to Jupyter Lab via its REST API to refresh the open notebook.

Effort and risk

Medium effort. The sync logic already exists in nb_sync_one. The watcher is new infrastructure but straightforward. Risk: rapid repeated edits could cause race conditions -- debounce with a 1-2 second delay.
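
The debounce mentioned above can be sketched independently of the watcher library. The class below is a hypothetical illustration, not existing pipeio code: it records the time of the latest event per path and fires a callback only after the path has been quiet for the configured delay. The clock is injectable so the behaviour can be tested without sleeping.

```python
import time
from typing import Callable


class Debouncer:
    """Coalesce bursts of change events per path; fire once the path is quiet."""

    def __init__(
        self,
        on_settled: Callable[[str], None],
        delay: float = 1.5,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._on_settled = on_settled
        self._delay = delay
        self._clock = clock
        self._last_event: dict[str, float] = {}

    def notify(self, path: str) -> None:
        """Record a modification event; a repeat event restarts the quiet period."""
        self._last_event[path] = self._clock()

    def poll(self) -> list[str]:
        """Fire the callback for every path quiet for at least `delay` seconds."""
        now = self._clock()
        settled = [p for p, t in self._last_event.items() if now - t >= self._delay]
        for path in settled:
            del self._last_event[path]
            self._on_settled(path)
        return settled


# Demonstration with a fake clock instead of real time
fake_now = [0.0]
synced: list[str] = []
deb = Debouncer(synced.append, delay=1.5, clock=lambda: fake_now[0])

deb.notify("a.py")   # first save at t=0.0
fake_now[0] = 1.0
deb.notify("a.py")   # second save at t=1.0 restarts the quiet period
fake_now[0] = 2.0
deb.poll()           # only 1.0 s of quiet so far: nothing fires
fake_now[0] = 2.5
deb.poll()           # 1.5 s of quiet: the callback fires once
print(synced)
```

In a real watcher, the filesystem event handler would call notify and a timer loop would call poll, with the callback wrapping nb_sync_one. If the handler runs on a separate thread, as is common with watcher libraries, the dictionary would need a lock.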

Priority

Medium. Most valuable for users who need Jupyter Lab but want agent-driven editing.

Recommendation 4: Schema injection for notebook context

What

Extend pipeio_nb_analyze or create a new pipeio_nb_context tool that provides agents with runtime-aware context about a notebook's data dependencies -- what datasets it loads, their schemas, sample rows, and pipeline stage outputs it accesses.

Why

Flath/Warmerdam identify schema injection as the key enabler for data-aware agents. Marimo does this automatically via its runtime; pipeio can do it via static analysis of PipelineContext usage combined with registry metadata.

How

Given a notebook that uses ctx.path("preproc", "bold", subject="sub-01"), the tool would:

  1. Resolve the path via the pipeline registry
  2. Inspect the output file (NIfTI header, CSV schema, JSON structure)
  3. Return structured context:
{
  "notebook": "investigate_noise.py",
  "data_dependencies": [
    {
      "group": "preproc",
      "member": "bold",
      "format": "nifti",
      "shape": [91, 109, 91, 200],
      "voxel_size": [2.0, 2.0, 2.0],
      "sample_entity": {"subject": "sub-01", "session": "ses-01"}
    }
  ],
  "imports": ["numpy", "nibabel", "textlib"],
  "runcard_params": {"threshold": 0.5, "n_components": 10}
}

This gives the agent the equivalent of marimo's schema injection without requiring marimo's runtime.
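
Before any path can be resolved, the tool has to find the ctx.path(...) calls in the notebook source. The following is a rough sketch of that static step using the ast module. It assumes the context object is bound to the name ctx and that the first two positional arguments are the group and member, as in the example call above. The function name find_ctx_path_calls is hypothetical, and calls with non-literal positional arguments are skipped because they cannot be resolved statically.

```python
import ast


def find_ctx_path_calls(source: str, ctx_name: str = "ctx") -> list[dict]:
    """Collect statically resolvable `ctx.path(group, member, **entities)` calls."""
    deps = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_ctx_path = (
            isinstance(func, ast.Attribute)
            and func.attr == "path"
            and isinstance(func.value, ast.Name)
            and func.value.id == ctx_name
        )
        if not is_ctx_path:
            continue
        # Keep only calls whose positional arguments are all literals
        args = [a.value for a in node.args if isinstance(a, ast.Constant)]
        if len(args) != len(node.args) or len(args) < 2:
            continue
        # Keyword arguments bound to variables are dropped, literals are kept
        entities = {
            kw.arg: kw.value.value
            for kw in node.keywords
            if kw.arg is not None and isinstance(kw.value, ast.Constant)
        }
        deps.append({"group": args[0], "member": args[1], "sample_entity": entities})
    return deps


EXAMPLE = '''
bold = ctx.path("preproc", "bold", subject="sub-01")
mask = ctx.path("preproc", "mask", subject=subject_id)
other = helper.path("x", "y")
'''

deps = find_ctx_path_calls(EXAMPLE)
print(deps)
```

The records produced here would then be handed to the registry for resolution and to a format-specific inspector for the shape or schema fields.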

Effort and risk

Medium-high effort. Requires format-specific inspectors (NIfTI, CSV, JSON, etc.). Could start with CSV/TSV (schema = column names + dtypes from first rows) and expand. Risk: large files need sampling, not full reads.
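
As a starting point for the CSV/TSV inspector, the sketch below reads the header plus a bounded number of rows and infers a coarse dtype per column. It uses only the standard library; the function names, the three-level dtype scheme (int, float, str), and the output keys are illustrative assumptions rather than a proposed pipeio schema.

```python
import csv
import io
from itertools import islice
from typing import TextIO


def _infer_dtype(values: list[str]) -> str:
    """Pick the narrowest of int/float/str that parses every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "unknown"
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "str"


def inspect_delimited(handle: TextIO, delimiter: str = ",", max_rows: int = 100) -> dict:
    """Report column names and dtypes from the header and at most `max_rows` rows."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])
    rows = list(islice(reader, max_rows))  # bounded read: never loads the whole file
    columns = []
    for i, name in enumerate(header):
        values = [row[i] for row in rows if i < len(row)]
        columns.append({"name": name, "dtype": _infer_dtype(values)})
    return {"format": "delimited", "columns": columns, "rows_sampled": len(rows)}


sample = io.StringIO("subject\tage\tscore\nsub-01\t34\t0.82\nsub-02\t29\t\n")
schema = inspect_delimited(sample, delimiter="\t")
print(schema["columns"])
```

Sentinel strings for missing data (for example "n/a") would make a numeric column fall back to str in this version, so a real inspector would probably accept a configurable set of missing-value markers.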

Priority

Medium-high. This is the most architecturally novel recommendation and addresses a gap that no existing pipeio tool fills.

Recommendation 5: Notebook kind "interactive" for marimo-backed exploration

What

Add a new notebook kind: interactive to the taxonomy alongside investigate/explore/demo/validate, specifically for marimo-backed live exploration sessions that are not intended for batch execution.

Why

Marimo notebooks serve a different purpose than parameterized analysis notebooks. They are for interactive exploration, hypothesis testing, and data discovery -- the "creative work with data" that the marimo blog distinguishes from pure software engineering. Mixing them with batch-executable percent-format notebooks in the same kind categories creates lifecycle confusion.

How

entries:
  - path: notebooks/explore/.src/live_data_explorer.py
    format: marimo
    kind: interactive          # new kind
    status: active
    description: "Live ECoG signal explorer with reactive parameter tuning"
    publish_html: true         # marimo HTML export

Lifecycle rules for interactive:

  • Never executed via papermill (skipped by nb_exec)
  • Published via marimo export html instead of nbconvert
  • Not subject to "should be promoted" audit warnings (interactive notebooks persist)
  • Can link to a flow but are not expected to produce reproducible script output
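
These lifecycle rules could be centralized in one small policy function that nb_exec, nb_publish, and nb_audit all consult. The sketch below is hypothetical: Entry is a stand-in for a notebook.yml entry rather than pipeio's real class, and the check that interactive requires format marimo is an assumption drawn from the description of this kind, not a rule stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a notebook.yml entry; not pipeio's real class."""

    path: str
    kind: str = "investigate"
    format: str = "percent"


def lifecycle_policy(entry: Entry) -> dict:
    """Translate the lifecycle rules for a notebook entry into explicit flags."""
    interactive = entry.kind == "interactive"
    if interactive and entry.format != "marimo":
        # Assumption: the interactive kind is only meaningful for marimo notebooks
        raise ValueError(f"{entry.path}: kind 'interactive' requires format 'marimo'")
    return {
        "batch_execute": not interactive,       # nb_exec skips interactive notebooks
        "promotion_warnings": not interactive,  # interactive notebooks persist
        "publisher": "marimo export html" if entry.format == "marimo" else "nbconvert",
    }


live = Entry(
    "notebooks/explore/.src/live_data_explorer.py", kind="interactive", format="marimo"
)
batch = Entry("notebooks/explore/.src/investigate_noise.py")
print(lifecycle_policy(live))
print(lifecycle_policy(batch))
```

Keeping the rules in one place means a later change, such as allowing batch execution of some marimo notebooks, touches a single function instead of three tools.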

Effort and risk

Low effort. Extends existing kind enum and lifecycle rules. Risk: users might overuse interactive kind to avoid promotion discipline -- mitigate with nb_audit guidance.

Priority

Medium. Depends on Recommendation 1 (marimo backend support).

Recommendation 6: Separate processing from visualization in notebook templates

What

Update nb_create bootstrap templates to enforce Mineault's principle: separate data processing cells from visualization cells, with clear section headers.

Why

Mineault: "That code is brittle, works on different timescales (you iterate on plots constantly; you shouldn't iterate on data processing constantly)." Every agent-notebook source emphasizes this separation. Pipeio's existing nb_create bootstrap could encode this convention.

How

Update the notebook template scaffold:

# %% [markdown]
# # {title}
# {description}

# %% [markdown]
# ## Setup

# %%
# imports

# %% [markdown]
# ## Data Loading and Processing

# %%
# data loading and transformation logic

# %% [markdown]
# ## Visualization and Analysis

# %%
# plots and statistical summaries

# %% [markdown]
# ## Findings
# (agent or human writes narrative here)
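
One way nb_create might fill this scaffold is plain string formatting. The sketch below is an assumption about how the template could be rendered, not a description of the existing bootstrap code, and render_scaffold is a hypothetical name. The only subtlety is that a multi-line description needs every line prefixed with "# " so it stays inside the markdown cell.

```python
TEMPLATE = """\
# %% [markdown]
# # {title}
# {description}

# %% [markdown]
# ## Setup

# %%
# imports

# %% [markdown]
# ## Data Loading and Processing

# %%
# data loading and transformation logic

# %% [markdown]
# ## Visualization and Analysis

# %%
# plots and statistical summaries

# %% [markdown]
# ## Findings
# (agent or human writes narrative here)
"""


def render_scaffold(title: str, description: str) -> str:
    """Fill the scaffold, keeping multi-line descriptions commented out."""
    # Continuation lines get their own "# " prefix; the first line uses the template's
    commented = "\n# ".join(description.splitlines())
    return TEMPLATE.format(title=title, description=commented)


text = render_scaffold("Noise check", "First line\nsecond line")
print(text)
```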

Effort and risk

Very low effort. Template-only change. No risk.

Priority

High. Immediate, low-cost improvement that aligns with field consensus.

Implementation Roadmap

Phase 1 (immediate, low effort)

  • Recommendation 6: Update nb_create templates (processing/visualization separation)
  • Recommendation 2: Add structural validation to nb_audit

Phase 2 (near-term, medium effort)

  • Recommendation 3: --watch integration for nb_sync
  • Recommendation 4: Schema injection via nb_context tool

Phase 3 (strategic, higher effort)

  • Recommendation 1: Marimo as alternative notebook backend
  • Recommendation 5: Interactive notebook kind
  • FigureSpec already separates "what to compute" from "how to render" -- the same principle applied to figures
  • The code tiers model (notebooks -> scripts -> utils -> core) provides the promotion discipline that interactive notebooks should complement, not replace
  • RunCard + PipelineContext provide parameterized execution that marimo's reactivity does not replace but could enhance for exploration phases