
Feature request: pipeio_nb_report — extract notebook results into a structured report

Problem

Explore notebooks accumulate many cells (imports, data loading, reshaping, intermediate debugging) mixed in with the actual findings (figures, metrics, interpretations). Reading through a 500+ line notebook to find the key results is slow, especially when sharing with collaborators or preparing meeting notes.

Proposed tool: pipeio_nb_report(flow, name)

An MCP tool that extracts the meaningful outputs from an executed notebook and produces a structured report.

Implementation approach

Use nbconvert's MarkdownExporter + ExtractOutputPreprocessor (already installed, no new deps):

from nbconvert import MarkdownExporter
from nbconvert.preprocessors import ExtractOutputPreprocessor

exporter = MarkdownExporter()
exporter.register_preprocessor(ExtractOutputPreprocessor, enabled=True)
body, resources = exporter.from_filename("notebook.ipynb")
# body: Markdown text with references to the extracted images
# resources['outputs']: dict of {filename: bytes} for extracted outputs (e.g. PNG figures)
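
As a minimal sketch of the next step, the {filename: bytes} mapping described in the comments above could be written to a report directory as follows. The helper name save_outputs and its signature are hypothetical and not part of nbconvert or pipeio; it only assumes the mapping has the shape shown above.

```python
from pathlib import Path


def save_outputs(outputs: dict[str, bytes], dest_dir: Path) -> list[Path]:
    """Write each extracted output (filename -> bytes) under dest_dir.

    Hypothetical helper: `outputs` is assumed to look like
    resources['outputs'] from the snippet above.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, data in sorted(outputs.items()):
        # Keep only the base name so an odd filename cannot escape dest_dir.
        target = dest_dir / Path(filename).name
        target.write_bytes(data)
        written.append(target)
    return written
```

nbconvert also ships a FilesWriter that writes the body and its resources together; the explicit loop is used here only to keep control over the destination layout.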

Tool behavior

  1. Read the executed .ipynb (from the workspace dir, after pipeio_nb_exec)
  2. Run nbconvert MarkdownExporter + ExtractOutputPreprocessor
  3. Save extracted PNGs to docs/assets/reports/<flow>/<name>/
  4. Return structured payload:
       • markdown_cells: the narrative text (what's being computed, why)
       • figures: [{path, cell_index, caption_if_any}]
       • text_outputs: [{cell_index, content}] (print statements with metrics)
       • code_cells: optional, for context
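
The payload described above could be assembled along these lines. This is a sketch under assumptions, not the tool's actual implementation: it walks the raw nbformat v4 JSON directly, the function name build_payload is hypothetical, the figure path naming (cell<i>_<j>.png) is a placeholder rather than the filenames nbconvert would produce, and captions are left empty because the request does not say where they come from.

```python
def _as_text(value) -> str:
    """nbformat stores text either as one string or as a list of lines."""
    return "".join(value) if isinstance(value, list) else value


def build_payload(nb: dict, include_code: bool = False) -> dict:
    """Split an executed notebook (nbformat v4 dict) into report sections."""
    payload = {"markdown_cells": [], "figures": [], "text_outputs": []}
    if include_code:
        payload["code_cells"] = []

    for index, cell in enumerate(nb.get("cells", [])):
        source = _as_text(cell.get("source", ""))
        cell_type = cell.get("cell_type")

        if cell_type == "markdown":
            payload["markdown_cells"].append({"cell_index": index, "content": source})
            continue
        if cell_type != "code":
            continue  # raw cells are ignored in this sketch

        if include_code:
            payload["code_cells"].append({"cell_index": index, "content": source})

        for out_index, output in enumerate(cell.get("outputs", [])):
            kind = output.get("output_type")
            if kind == "stream" and output.get("name") == "stdout":
                # print() output, e.g. metrics
                payload["text_outputs"].append(
                    {"cell_index": index, "content": _as_text(output.get("text", ""))}
                )
            elif kind in ("display_data", "execute_result"):
                if "image/png" in output.get("data", {}):
                    payload["figures"].append(
                        {
                            # Placeholder name (assumption), not nbconvert's naming.
                            "path": f"cell{index}_{out_index}.png",
                            "cell_index": index,
                            "caption": None,
                        }
                    )
    return payload
```

An executed notebook can be loaded for this with json.loads(Path("notebook.ipynb").read_text()); in the real tool, figure paths would more plausibly be taken from the keys of resources['outputs'].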

Companion slash command: /report

Takes the pipeio_nb_report output and asks the agent to write a result note (in docs/log/result/) with:

  • Concise description of each analysis step (the math, not the code)
  • Embedded figures with captions and interpretation
  • Key metrics highlighted
  • Overall conclusions
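
One way the slash command could hand the payload to the agent is as a plain-text prompt. The sketch below is illustrative only: build_report_prompt is a hypothetical name, and it assumes the payload carries figures and text_outputs entries shaped as in the Tool behavior section.

```python
REPORT_REQUIREMENTS = (
    "Concise description of each analysis step (the math, not the code)",
    "Embedded figures with captions and interpretation",
    "Key metrics highlighted",
    "Overall conclusions",
)


def build_report_prompt(payload: dict, note_dir: str = "docs/log/result/") -> str:
    """Render the report payload as instructions for the agent (hypothetical)."""
    lines = [f"Write a result note in {note_dir} with:"]
    lines += [f"- {item}" for item in REPORT_REQUIREMENTS]

    lines += ["", "Figures:"]
    for fig in payload.get("figures", []):
        lines.append(f"- {fig['path']} (cell {fig['cell_index']})")

    lines += ["", "Text outputs:"]
    for out in payload.get("text_outputs", []):
        lines.append(f"- cell {out['cell_index']}: {out['content'].strip()}")

    return "\n".join(lines)
```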

Cell tagging (optional enhancement)

Allow notebook authors to tag cells for inclusion in the report using a comment marker (e.g., # %% [report] or # REPORT: prefix in markdown cells). The tool would then only extract tagged cells, producing a curated subset rather than the full notebook dump.
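
A sketch of the filtering step, assuming the marker appears on the first non-blank line of a cell's source. The two marker strings are the examples from this request; the function names are hypothetical.

```python
REPORT_MARKERS = ("# %% [report]", "# REPORT:")


def is_report_cell(cell: dict, markers: tuple[str, ...] = REPORT_MARKERS) -> bool:
    """True if the cell's first non-blank line starts with a report marker."""
    source = cell.get("source", "")
    if isinstance(source, list):  # nbformat may store source as a list of lines
        source = "".join(source)
    stripped = source.lstrip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return first_line.startswith(markers)


def select_report_cells(nb: dict) -> list[tuple[int, dict]]:
    """Return (original_index, cell) pairs for the tagged cells only."""
    return [
        (index, cell)
        for index, cell in enumerate(nb.get("cells", []))
        if is_report_cell(cell)
    ]
```

Keeping the original cell index alongside each selected cell lets the report still refer back to positions in the full notebook.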

Longer-term: MyST embed

We already generate .myst paired files via jupytext. MyST-NB supports labeling cells (#| label: my-figure) and embedding their outputs in any other page. This would allow docs/plan/ or docs/manuscript/ pages to embed notebook figures directly, without copying. However, it requires adopting MyST-NB rendering in the mkdocs build.

References

  • nbconvert ExtractOutputPreprocessor: https://nbconvert.readthedocs.io/en/latest/nbconvert_library.html
  • MyST embed/reuse: https://mystmd.org/guide/reuse-jupyter-outputs
  • Junix (simpler CLI alternative): https://pypi.org/project/junix/
  • Motivated by pixecog TTL characterization notebook (30+ cells, key findings buried in 5-6 plots)

Source context: pixecog

PixEcog (pixecog): Neuropixels and ECoG dataset and analysis

Recent commits:

6b295b2 Update badlabel audit note: full pipeline comparison (85 new vs 7 legacy vs 71 TTL), zarr fix confirmed, 97.2% TTL catch rate
a532476 Fix zarr write bug in feature.py: include coordinate variables in chunk encoding
9cb9de5 Update badlabel audit result note with pipeline run findings: LNR useless after lowpass, zarr v3 write bug

README:




Quick Start for Collaborators

Follow this checklist to get started with Pixecog documentation and workflows.

🐀 Pixecog Project — Compact Overview

Core principles

  • One immutable BIDS raw dataset (raw/) as the canonical baseline
  • Each analysis pipeline ha