pipeio smart read tools: mod_context and notebook metadata

Motivation

pipeio's ontology (pipe/flow/mod/notebook/rule/config) is well-registered, but the MCP access pattern requires many round-trips to build context on a single entity. Two gaps were identified:

  1. Mod-level context bundling — understanding or modifying a mod currently requires 5+ MCP calls (mod_resolve → rule_list → Read script → Read doc → config_read). A single smart-read tool would collapse this.

  2. Notebook sprawl — notebook.yml only tracks path and pair/publish flags. No intent, status, or category is persisted. Once notebooks accumulate, agents (and humans) can't tell what each is for or whether it's still relevant without reading the file.

Feature 1: pipeio_mod_context(pipe, flow, mod)

Bundled read tool that returns everything an agent needs to understand and work on a mod in one call. No new state — reads from existing registry, config, and filesystem.

Returns:

  • mod: name, rules list, doc_path
  • rules: parsed rule definitions (inputs, outputs, params, script, source_file); a subset of rule_list filtered to this mod
  • script_content: dict of {script_path: content} for each unique script referenced by the mod's rules
  • doc_content: mod doc markdown content (if doc_path exists)
  • config_params: relevant config sections referenced by the mod's params expressions
  • bids_signatures: bids() call strings for the mod's output groups

Implementation: Compose from existing internals — _parse_snakefile_rules, _find_registry, FlowConfig, filesystem reads. No new data model.
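The composition described above can be sketched roughly as follows. This is a minimal illustration, not pipeio's actual code: `build_mod_context`, the flat list of rule dicts, and the `mod`, `name`, and `script` keys are all hypothetical stand-ins for whatever `_parse_snakefile_rules` and the registry really return. The `config_params` and `bids_signatures` fields are left out for brevity.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any


def build_mod_context(
    root: Path,
    mod_name: str,
    rules: list[dict[str, Any]],
    doc_path: str | None = None,
) -> dict[str, Any]:
    """Bundle rule definitions, script sources, and the mod doc into one dict.

    Hypothetical helper: the rule dict layout is an assumption for this sketch.
    """
    # Keep only the rules that belong to this mod (assumed "mod" key per rule).
    mod_rules = [r for r in rules if r.get("mod") == mod_name]

    # Read each unique script once; missing files are skipped, not fatal.
    script_content: dict[str, str] = {}
    for rule in mod_rules:
        script = rule.get("script")
        if script and script not in script_content:
            path = root / script
            if path.is_file():
                script_content[script] = path.read_text(encoding="utf-8")

    # The doc is optional: return None when no path is given or the file is absent.
    doc_content = None
    if doc_path and (root / doc_path).is_file():
        doc_content = (root / doc_path).read_text(encoding="utf-8")

    return {
        "mod": {
            "name": mod_name,
            "rules": [r["name"] for r in mod_rules],
            "doc_path": doc_path,
        },
        "rules": mod_rules,
        "script_content": script_content,
        "doc_content": doc_content,
    }
```

Because nothing is cached, every call re-reads the files, which matches the "always fresh" intent at the cost of some repeated I/O.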

Feature 2: Notebook metadata in notebook.yml

Enrich NotebookEntry with three optional fields:

entries:
  - path: notebooks/investigate_noise.py
    kind: investigate          # investigate | explore | demo | validate
    description: "Check TTL artifact patterns across sessions"
    status: active             # draft | active | stale | promoted | archived
    pair_ipynb: true
    publish_myst: true

Schema change (NotebookEntry):

  • description: str = "" (one-line intent)
  • status: str = "active" (lifecycle state)
  • kind: str = "" (notebook category)

Lifecycle semantics:

  • draft: work in progress, not ready for review
  • active: current investigation, being used
  • stale: superseded or no longer relevant, candidate for archival
  • promoted: investigation led to a pipeline mod (link to mod?)
  • archived: kept for reference but not active
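As a rough sketch of the schema change and lifecycle states above, the enriched entry might look like the dataclass below. Whether the real `NotebookEntry` is a dataclass, what defaults `pair_ipynb` and `publish_myst` have, and whether unknown values should be rejected at load time are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Allowed values taken from the YAML comments and lifecycle list above.
VALID_STATUSES = ("draft", "active", "stale", "promoted", "archived")
VALID_KINDS = ("", "investigate", "explore", "demo", "validate")


@dataclass
class NotebookEntry:
    path: str
    pair_ipynb: bool = False      # default is an assumption
    publish_myst: bool = False    # default is an assumption
    description: str = ""         # one-line intent
    status: str = "active"        # lifecycle state
    kind: str = ""                # notebook category

    def __post_init__(self) -> None:
        # Validation is an assumption of this sketch, not a stated requirement.
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        if self.kind not in VALID_KINDS:
            raise ValueError(f"unknown kind: {self.kind!r}")
```

Since all three new fields have defaults, an existing notebook.yml entry that only carries `path` and the pair/publish flags would still load, with `status` falling back to `"active"`.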

MCP changes:

  • nb_create: already accepts description and kind; just persist them in notebook.yml
  • nb_status: include description, kind, status in output
  • New: pipeio_nb_update(pipe, flow, name, status?, description?) updates notebook metadata (parallels the note_update pattern)
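The core of the proposed update tool could be sketched as below, operating on entries already loaded from notebook.yml. The function name `update_notebook_entry`, the plain-dict entry shape, and the rule that `name` matches the file stem of `path` are assumptions for this example; loading and saving the YAML is deliberately left out.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from typing import Any

VALID_STATUSES = {"draft", "active", "stale", "promoted", "archived"}


def update_notebook_entry(
    entries: list[dict[str, Any]],
    name: str,
    status: str | None = None,
    description: str | None = None,
) -> dict[str, Any]:
    """Update metadata on the entry whose file stem equals `name` (assumed rule)."""
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")

    for entry in entries:
        if PurePosixPath(entry["path"]).stem == name:
            # Only touch fields the caller actually passed, mirroring the
            # optional status?/description? parameters.
            if status is not None:
                entry["status"] = status
            if description is not None:
                entry["description"] = description
            return entry

    raise KeyError(f"no notebook named {name!r}")
```

Leaving unspecified fields untouched means a status-only update cannot accidentally wipe an existing description.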

Implementation Plan

Phase 1: Notebook metadata (small, self-contained)

  1. Add description, status, kind fields to NotebookEntry in notebook/config.py
  2. Update mcp_nb_create to persist description, kind, status="active" in notebook.yml
  3. Update mcp_nb_status to include the new fields in output
  4. Add mcp_nb_update tool for status/description changes
  5. Register pipeio_nb_update in projio MCP server + agent instructions
  6. Tests for schema, create, status, update

Phase 2: pipeio_mod_context (read-only composition)

  1. Add mcp_mod_context(root, pipe, flow, mod) in pipeio/mcp.py
  2. Reuse _find_registry, _parse_snakefile_rules, FlowConfig
  3. Read script files, doc file, extract config params
  4. Register pipeio_mod_context in projio MCP server + agent instructions
  5. Tests with scaffold fixtures
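For the "extract config params" step, one possible approach is to scan each params expression for `config["..."]` subscripts and return only the top-level sections that are referenced. The sketch below assumes params are available as expression strings and that config is a plain nested dict; `referenced_config_sections` is a hypothetical name, and a regex scan like this would miss indirect access such as `config.get(...)`.

```python
import re
from typing import Any

# Matches the first subscript of expressions such as config["filter"]["highpass"].
_CONFIG_KEY = re.compile(r"""\bconfig\[\s*["']([^"']+)["']\s*\]""")


def referenced_config_sections(
    param_exprs: list[str],
    config: dict[str, Any],
) -> dict[str, Any]:
    """Return the top-level config sections mentioned in the params expressions."""
    keys: list[str] = []
    for expr in param_exprs:
        for key in _CONFIG_KEY.findall(expr):
            if key not in keys:  # preserve first-seen order, drop duplicates
                keys.append(key)
    # Ignore keys that the expressions mention but the config does not define.
    return {k: config[k] for k in keys if k in config}
```

Returning whole top-level sections rather than individual leaf values keeps the sketch simple and gives the agent the neighbouring settings for context.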

Notes

  • No new persistent stores: mod_context only reads from existing sources, and the new notebook fields live in the existing notebook.yml
  • notebook.yml serialization should use ruamel round-trip (same pattern as config_patch fix)
  • mod_context is read-only composition, not a cache — always fresh