Pipeio Notebook Integration: Concrete Recommendations from Agent-Notebook Research
Context
This note translates findings from the agent-notebook integration research (companion notes in this series) into concrete recommendations for pipeio's notebook system. Each recommendation is grounded in external evidence and assessed against pipeio's current architecture.
Recommendation 1: Add marimo as an alternative notebook backend
What
Support marimo-native .py notebooks alongside the current jupytext percent-format, selectable per-notebook in notebook.yml.
Why
Every source surveyed validates the reactive DAG model for agent-driven scientific work. Marimo eliminates the sync overhead (no .py <-> .ipynb bidirectional sync needed), provides built-in structural validation (marimo check), and enables live --watch feedback during agent editing. Eric Ma's benchmarks show Claude Opus achieves 100% instruction adherence on marimo notebooks.
How
Extend notebook.yml with a format field:
entries:
  - path: notebooks/explore/.src/investigate_noise.py
    format: percent  # default, current behavior (jupytext)
    kind: investigate
    # ...
  - path: notebooks/explore/.src/interactive_analysis.py
    format: marimo  # new: marimo-native .py
    kind: explore
    # ...
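One way the format field could be read is sketched below. It assumes the YAML has already been loaded into plain dictionaries; EntrySpec, KNOWN_FORMATS, and parse_entry are hypothetical names used for illustration, not pipeio's actual entry model.

```python
from dataclasses import dataclass

# Formats named in this note; "percent" is the default (current jupytext behavior).
KNOWN_FORMATS = ("percent", "marimo")


@dataclass(frozen=True)
class EntrySpec:
    """Hypothetical in-memory view of one notebook.yml entry."""
    path: str
    kind: str
    format: str = "percent"


def parse_entry(raw: dict) -> EntrySpec:
    """Build an EntrySpec from one already-loaded YAML mapping."""
    fmt = raw.get("format", "percent")  # a missing field keeps current behavior
    if fmt not in KNOWN_FORMATS:
        raise ValueError(f"{raw.get('path')}: unknown notebook format {fmt!r}")
    return EntrySpec(path=raw["path"], kind=raw["kind"], format=fmt)


entries = [
    parse_entry({"path": "notebooks/explore/.src/investigate_noise.py",
                 "kind": "investigate"}),
    parse_entry({"path": "notebooks/explore/.src/interactive_analysis.py",
                 "kind": "explore", "format": "marimo"}),
]
```

Defaulting to percent means existing notebook.yml files would need no migration; rejecting unknown values early keeps a typo from silently routing a notebook down the wrong path.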
Implementation sketch:
- nb_sync: skip for marimo-format notebooks (single format, no sync needed)
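- nb_exec: for marimo-format notebooks, execute the file as a script (python notebook.py) or through marimo export instead of papermill; marimo run serves the notebook as an interactive app and is not a batch runner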
- nb_publish: use marimo's HTML export or convert to MyST
- nb_audit: call marimo check for structural validation
- nb_read: works as-is (both formats are .py)
- nb_analyze: extend AST analysis to recognize marimo cell decorators
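As a rough illustration of the nb_analyze item, the standard ast module can locate marimo cells by their decorator. The sketch assumes the notebook binds its marimo.App() instance to the name app; find_marimo_cells and its return shape are hypothetical, not an existing pipeio function.

```python
import ast

SAMPLE = '''
import marimo

app = marimo.App()


@app.cell
def _():
    import numpy as np
    return (np,)


@app.cell(hide_code=True)
def _(np):
    x = np.arange(3)
    return (x,)


def helper():
    return 1
'''


def _is_cell_decorator(node: ast.expr) -> bool:
    """True for `@app.cell` and `@app.cell(...)`; the `app` name is assumed."""
    if isinstance(node, ast.Call):
        node = node.func
    return (
        isinstance(node, ast.Attribute)
        and node.attr == "cell"
        and isinstance(node.value, ast.Name)
        and node.value.id == "app"
    )


def find_marimo_cells(source: str) -> list[dict]:
    """Return the line number and inputs (parameter names) of each cell."""
    cells = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and any(
            _is_cell_decorator(d) for d in node.decorator_list
        ):
            cells.append({
                "lineno": node.lineno,
                "inputs": [a.arg for a in node.args.args],
            })
    return cells


cells = find_marimo_cells(SAMPLE)
```

Because a marimo cell declares the names it reads as function parameters, the inputs list gives the dependency edges directly. The source is only parsed, never executed, so marimo does not need to be installed for this analysis.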
Effort and risk
Medium effort. The .py-as-source-of-truth principle is preserved. Main risk: marimo's cell format differs from jupytext percent-format, so nb_promote would need format-aware script extraction. Marimo is also a younger framework than jupytext, with a smaller ecosystem and an evolving API.
Priority
High for exploratory notebooks (kind: investigate/explore). Lower for demo/validate notebooks where parameterized batch execution via papermill is more appropriate.
Recommendation 2: Add structural validation to nb_audit
What
Extend nb_audit to run format-specific structural validation on all active notebooks. For percent-format: check import isolation, cell ordering, variable shadowing. For marimo-format: run marimo check.
Why
Eric Ma's benchmarks show that every LLM -- including top-tier models -- violates import isolation to some degree. Structural validation catches what code review misses. The marimo team and Eric Ma both recommend running validation after every agent edit.
How
Add a --structural flag to nb_audit (or make it default):
import subprocess


def nb_audit_structural(entry: NotebookEntry) -> list[Issue]:
    """Run format-specific structural checks on one notebook entry."""
    if entry.format == "marimo":
        # Delegate to marimo's own checker; text=True decodes stdout to str.
        result = subprocess.run(
            ["marimo", "check", str(entry.path)],
            capture_output=True,
            text=True,
        )
        return parse_marimo_check_output(result.stdout)
    else:
        # percent-format: custom AST-based checks
        analysis = analyze_notebook(entry.path)
        issues = []
        issues += check_import_isolation(analysis)  # imports in dedicated cells
        issues += check_variable_shadowing(analysis)  # no redefined names
        issues += check_markdown_documentation(analysis)  # md before code cells
        return issues
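The check_import_isolation helper referenced above is not defined in this note. A minimal sketch of what such a check might do on raw percent-format source follows, reading "import isolation" as "a code cell contains either only imports or no imports". Both that reading and the names split_code_cells and mixed_import_cells are assumptions made for illustration.

```python
import ast

SAMPLE = """# %% [markdown]
# # Title
# %%
import numpy as np
import pandas as pd
# %%
import os
data = np.zeros(3)
# %%
total = data.sum()
"""


def split_code_cells(source: str) -> list[str]:
    """Split percent-format source into code cells, dropping markdown/raw cells."""
    cells, current, is_code = [], [], True
    for line in source.splitlines():
        if line.startswith("# %%"):
            if is_code and current:
                cells.append("\n".join(current))
            current = []
            is_code = "[markdown]" not in line and "[raw]" not in line
        else:
            current.append(line)
    if is_code and current:
        cells.append("\n".join(current))
    return cells


def mixed_import_cells(source: str) -> list[int]:
    """Return indices of code cells that mix imports with other statements."""
    flagged = []
    for index, cell in enumerate(split_code_cells(source)):
        body = ast.parse(cell).body  # raises SyntaxError on an invalid cell
        n_imports = sum(isinstance(s, (ast.Import, ast.ImportFrom)) for s in body)
        if 0 < n_imports < len(body):
            flagged.append(index)
    return flagged


flagged = mixed_import_cells(SAMPLE)
```

In the sample, only the second code cell (index 1) mixes an import with an assignment. A production check would also need to decide how to treat imports nested inside functions or conditionals, which this sketch ignores.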
Effort and risk
Low effort. Builds on existing nb_audit and analyze_notebook infrastructure. No risk to existing functionality -- purely additive.
Priority
High. This is the easiest win and directly addresses the most common agent failure mode.
Recommendation 3: Implement a --watch integration for nb_sync
What
Add a filesystem watcher mode to pipeio that monitors .py source files and auto-triggers nb_sync when changes are detected, providing a live feedback loop similar to marimo's --watch but within pipeio's jupytext-based workflow.
Why
The --watch pattern is the linchpin that makes agent-driven notebook development practical (Eric Ma). For users who stay with jupytext percent-format (e.g., because they need Jupyter Lab for heavy interactive work or specific kernel environments), having the .ipynb update automatically when the agent edits the .py gives the same "agent writes, human sees" feedback loop.
How
# New CLI command or MCP tool
projio pipeio nb-watch --flow myflow
# or
pipeio_nb_watch(flow="myflow")
Implementation: use watchdog or inotify to monitor .src/ directories. On .py modification, run nb_sync_one(direction="py2nb") for the changed file. Optionally forward to Jupyter Lab via its REST API to refresh the open notebook.
Effort and risk
Medium effort. The sync logic already exists in nb_sync_one. The watcher is new infrastructure but straightforward. Risk: rapid repeated edits could cause race conditions -- debounce with a 1-2 second delay.
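The debounce step can be kept independent of whichever watcher library is chosen. The following is a sketch under that assumption: Debouncer is a hypothetical helper, and the clock is injectable so the behavior can be exercised without real delays.

```python
import time
from typing import Callable


class Debouncer:
    """Collect change events and release each path only after a quiet period."""

    def __init__(self, delay: float = 1.5,
                 clock: Callable[[], float] = time.monotonic):
        self.delay = delay
        self.clock = clock
        self._last_seen: dict[str, float] = {}

    def notify(self, path: str) -> None:
        """Record a modification event; a repeated event restarts the timer."""
        self._last_seen[path] = self.clock()

    def due(self) -> list[str]:
        """Return and forget the paths that have been quiet for `delay` seconds."""
        now = self.clock()
        ready = [p for p, t in self._last_seen.items() if now - t >= self.delay]
        for p in ready:
            del self._last_seen[p]
        return ready


# Simulated timeline with a fake clock: two quick saves, then silence.
fake_now = [0.0]
debouncer = Debouncer(delay=1.5, clock=lambda: fake_now[0])
debouncer.notify("investigate_noise.py")
fake_now[0] = 1.0
debouncer.notify("investigate_noise.py")  # second save restarts the timer
fake_now[0] = 2.0
early = debouncer.due()                   # still inside the quiet window
fake_now[0] = 2.6
late = debouncer.due()                    # now released exactly once
```

A watcher callback would call notify() on every filesystem event, while a periodic loop would call due() and run the py2nb sync once for each returned path, so a burst of agent edits results in a single sync.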
Priority
Medium. Most valuable for users who need Jupyter Lab but want agent-driven editing.
Recommendation 4: Schema injection for notebook context
What
Extend pipeio_nb_analyze or create a new pipeio_nb_context tool that provides agents with runtime-aware context about a notebook's data dependencies -- what datasets it loads, their schemas, sample rows, and pipeline stage outputs it accesses.
Why
Flath/Warmerdam identify schema injection as the key enabler for data-aware agents. Marimo does this automatically via its runtime; pipeio can do it via static analysis of PipelineContext usage combined with registry metadata.
How
Given a notebook that uses ctx.path("preproc", "bold", subject="sub-01"), the tool would:
- Resolve the path via the pipeline registry
- Inspect the output file (NIfTI header, CSV schema, JSON structure)
- Return structured context:
{
  "notebook": "investigate_noise.py",
  "data_dependencies": [
    {
      "group": "preproc",
      "member": "bold",
      "format": "nifti",
      "shape": [91, 109, 91, 200],
      "voxel_size": [2.0, 2.0, 2.0],
      "sample_entity": {"subject": "sub-01", "session": "ses-01"}
    }
  ],
  "imports": ["numpy", "nibabel", "textlib"],
  "runcard_params": {"threshold": 0.5, "n_components": 10}
}
This gives the agent the equivalent of marimo's schema injection without requiring marimo's runtime.
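The static-analysis half of this could start from the notebook source alone. The sketch below assumes the context object is bound to the name ctx and that the first two positional arguments of ctx.path are the group and member, as the example call and JSON above suggest; find_ctx_path_calls is a hypothetical name, and calls with non-literal arguments are skipped rather than resolved.

```python
import ast


def find_ctx_path_calls(source: str, ctx_name: str = "ctx") -> list[dict]:
    """Collect literal arguments of `<ctx_name>.path(...)` calls in a notebook."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "path"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == ctx_name
        ):
            continue
        head = node.args[:2]
        # Require two literal string positionals (assumed: group, member).
        if len(head) < 2 or not all(
            isinstance(a, ast.Constant) and isinstance(a.value, str) for a in head
        ):
            continue
        # Keyword arguments with literal values become the sample entity.
        entities = {
            kw.arg: kw.value.value
            for kw in node.keywords
            if kw.arg and isinstance(kw.value, ast.Constant)
        }
        found.append({
            "group": head[0].value,
            "member": head[1].value,
            "sample_entity": entities,
            "lineno": node.lineno,
        })
    return sorted(found, key=lambda d: d["lineno"])


notebook_source = (
    'bold = ctx.path("preproc", "bold", subject="sub-01")\n'
    'other = obj.path("x", "y")\n'
)
deps = find_ctx_path_calls(notebook_source)
```

Each returned entry would then be resolved through the registry and handed to a format-specific inspector to fill in fields such as shape or column schema.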
Effort and risk
Medium-high effort. Requires format-specific inspectors (NIfTI, CSV, JSON, etc.). Could start with CSV/TSV (schema = column names + dtypes from first rows) and expand. Risk: large files need sampling, not full reads.
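A first CSV/TSV inspector along these lines might look as follows. It is a sketch using only the standard library: sniff_schema and the coarse int/float/str labels are illustrative assumptions, and only the first n_rows data rows are read so that large files are sampled rather than loaded.

```python
import csv
import io
from itertools import islice


def _infer_dtype(values: list[str]) -> str:
    """Pick the narrowest of int/float/str that fits every non-empty sample."""
    samples = [v for v in values if v != ""]
    if not samples:
        return "unknown"
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in samples:
                cast(v)
            return name
        except ValueError:
            continue
    return "str"


def sniff_schema(handle, delimiter: str = ",", n_rows: int = 50) -> dict[str, str]:
    """Infer column names and rough dtypes from the first `n_rows` data rows."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)               # assumes the first row is a header
    rows = list(islice(reader, n_rows))  # sample only; never read the whole file
    return {
        name: _infer_dtype([r[i] for r in rows if i < len(r)])
        for i, name in enumerate(header)
    }


tsv = "subject\tage\tscore\tnote\nsub-01\t34\t0.5\t\nsub-02\t29\t1\t\n"
schema = sniff_schema(io.StringIO(tsv), delimiter="\t")
```

Dtypes inferred from a sample are a hint for the agent, not a guarantee; a column that turns non-numeric after row n_rows would be mislabelled, which is an accepted trade-off of sampling.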
Priority
Medium-high. This is the most architecturally novel recommendation and addresses a gap that no existing pipeio tool fills.
Recommendation 5: Notebook kind "interactive" for marimo-backed exploration
What
Add a new notebook kind: interactive to the taxonomy alongside investigate/explore/demo/validate, specifically for marimo-backed live exploration sessions that are not intended for batch execution.
Why
Marimo notebooks serve a different purpose than parameterized analysis notebooks. They are for interactive exploration, hypothesis testing, and data discovery -- the "creative work with data" that the marimo blog distinguishes from pure software engineering. Mixing them with batch-executable percent-format notebooks in the same kind categories creates lifecycle confusion.
How
entries:
  - path: notebooks/explore/.src/live_data_explorer.py
    format: marimo
    kind: interactive  # new kind
    status: active
    description: "Live ECoG signal explorer with reactive parameter tuning"
    publish_html: true  # marimo HTML export
Lifecycle rules for interactive:
- Never executed via papermill (skipped by nb_exec)
- Published via marimo export html instead of nbconvert
- Not subject to "should be promoted" audit warnings (interactive notebooks persist)
- Can link to a flow but are not expected to produce reproducible script output
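These rules could be encoded as a small lookup table that nb_exec, nb_publish, and nb_audit consult. The sketch below is one possible shape: Lifecycle, LIFECYCLE, and lifecycle_for are hypothetical names, and giving all existing kinds a single default policy is a simplification for brevity, not a description of pipeio's current rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lifecycle:
    batch_exec: bool         # may nb_exec run it via papermill?
    publisher: str           # which exporter nb_publish would use
    promotion_warning: bool  # should nb_audit suggest promotion?


# Simplifying assumption: every existing kind shares this default policy.
DEFAULT = Lifecycle(batch_exec=True, publisher="nbconvert", promotion_warning=True)

# Only the new kind overrides the default.
LIFECYCLE = {
    "interactive": Lifecycle(
        batch_exec=False,
        publisher="marimo export html",
        promotion_warning=False,
    ),
}


def lifecycle_for(kind: str) -> Lifecycle:
    """Look up the policy for a notebook kind, falling back to the default."""
    return LIFECYCLE.get(kind, DEFAULT)


interactive = lifecycle_for("interactive")
investigate = lifecycle_for("investigate")
```

Keeping the policy in data rather than in scattered if-statements means a future kind only needs one new table entry.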
Effort and risk
Low effort. Extends existing kind enum and lifecycle rules. Risk: users might overuse interactive kind to avoid promotion discipline -- mitigate with nb_audit guidance.
Priority
Medium. Depends on Recommendation 1 (marimo backend support).
Recommendation 6: Separate processing from visualization in notebook templates
What
Update nb_create bootstrap templates to enforce Mineault's principle: separate data processing cells from visualization cells, with clear section headers.
Why
Mineault: "That code is brittle, works on different timescales (you iterate on plots constantly; you shouldn't iterate on data processing constantly)." Every agent-notebook source emphasizes this separation. Pipeio's existing nb_create bootstrap could encode this convention.
How
Update the notebook template scaffold:
# %% [markdown]
# # {title}
# {description}
# %% [markdown]
# ## Setup
# %%
# imports
# %% [markdown]
# ## Data Loading and Processing
# %%
# data loading and transformation logic
# %% [markdown]
# ## Visualization and Analysis
# %%
# plots and statistical summaries
# %% [markdown]
# ## Findings
# (agent or human writes narrative here)
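Filling the {title} and {description} placeholders has one subtlety: a multi-line description must stay inside the markdown cell, so every line needs the comment prefix. A minimal sketch, assuming the scaffold is stored as a plain format string (render_scaffold is a hypothetical helper, not the existing nb_create code):

```python
SCAFFOLD = """\
# %% [markdown]
# # {title}
# {description}
# %% [markdown]
# ## Setup
# %%
# imports
# %% [markdown]
# ## Data Loading and Processing
# %%
# data loading and transformation logic
# %% [markdown]
# ## Visualization and Analysis
# %%
# plots and statistical summaries
# %% [markdown]
# ## Findings
# (agent or human writes narrative here)
"""


def render_scaffold(title: str, description: str) -> str:
    """Fill the scaffold, prefixing every description line with '# '."""
    lines = description.splitlines() or [""]
    commented = "\n# ".join(lines)  # the first line's prefix is in the template
    return SCAFFOLD.format(title=title, description=commented)


rendered = render_scaffold("Noise investigation", "line one\nline two")
```

Without the per-line prefix, the second description line would be emitted as bare text inside a markdown cell and break the percent-format file.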
Effort and risk
Very low effort. Template-only change. No risk.
Priority
High. Immediate, low-cost improvement that aligns with field consensus.
Implementation Roadmap
Phase 1 (immediate, low effort)
- Recommendation 6: Update nb_create templates (processing/visualization separation)
- Recommendation 2: Add structural validation to nb_audit
Phase 2 (near-term, medium effort)
- Recommendation 3: --watch integration for nb_sync
- Recommendation 4: Schema injection via nb_context tool
Phase 3 (strategic, higher effort)
- Recommendation 1: Marimo as alternative notebook backend
- Recommendation 5: Interactive notebook kind
Related Work in Projio
- FigureSpec already separates "what to compute" from "how to render" -- the same principle applied to figures
- The code tiers model (notebooks -> scripts -> utils -> core) provides the promotion discipline that interactive notebooks should complement, not replace
- RunCard + PipelineContext provide parameterized execution that marimo's reactivity does not replace but could enhance for exploration phases