pipeio smart read tools: mod_context and notebook metadata
Motivation
pipeio's ontology (pipe/flow/mod/notebook/rule/config) is well-registered but the MCP access pattern requires many round-trips to build context on a single entity. Two gaps identified:
- Mod-level context bundling: understanding or modifying a mod currently requires 5+ MCP calls (mod_resolve → rule_list → Read script → Read doc → config_read). A single smart-read tool would collapse these into one call.
- Notebook sprawl: notebook.yml only tracks path and pair/publish flags. No intent, status, or category is persisted. Once notebooks accumulate, agents (and humans) can't tell what each one is for, or whether it is still relevant, without reading the file.
Feature 1: pipeio_mod_context(pipe, flow, mod)
Bundled read tool that returns everything an agent needs to understand and work on a mod in one call. No new state — reads from existing registry, config, and filesystem.
Returns:
- mod: name, rules list, doc_path
- rules: parsed rule definitions (inputs, outputs, params, script, source_file) — subset of rule_list filtered to this mod
- script_content: dict of {script_path: content} for each unique script referenced by the mod's rules
- doc_content: mod doc markdown content (if doc_path exists)
- config_params: relevant config sections referenced by the mod's params expressions
- bids_signatures: bids() call strings for the mod's output groups
Implementation: Compose from existing internals — _parse_snakefile_rules, _find_registry, FlowConfig, filesystem reads. No new data model.
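The composition described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeio implementation: `build_mod_context`, the dict shapes for `mod`, `all_rules`, and `config`, and the regex used to find config keys are all hypothetical stand-ins for what `_parse_snakefile_rules`, `_find_registry`, and `FlowConfig` would provide. `bids_signatures` is left out because its extraction depends on internals not described here.

```python
import re
from pathlib import Path
from typing import Any

# Assumption: params expressions look like config["noise"]["threshold"];
# this captures only the first (top-level) key after `config`.
_CONFIG_KEY = re.compile(r"""config\[["'](\w+)["']\]""")


def build_mod_context(
    root: Path,
    mod: dict[str, Any],
    all_rules: list[dict[str, Any]],
    config: dict[str, Any],
) -> dict[str, Any]:
    """Bundle what is known about one mod into a single dict (read-only)."""
    # Keep only the rules that the mod lists as its own.
    rules = [r for r in all_rules if r["name"] in mod["rules"]]

    # Read each unique script once; paths missing on disk are skipped.
    script_content: dict[str, str] = {}
    for rule in rules:
        script = rule.get("script")
        if script and script not in script_content and (root / script).is_file():
            script_content[script] = (root / script).read_text(encoding="utf-8")

    # The mod doc is optional.
    doc_content = None
    doc_path = mod.get("doc_path")
    if doc_path and (root / doc_path).is_file():
        doc_content = (root / doc_path).read_text(encoding="utf-8")

    # Collect the top-level config sections named in params expressions.
    wanted: set[str] = set()
    for rule in rules:
        for expr in rule.get("params", {}).values():
            wanted.update(_CONFIG_KEY.findall(str(expr)))
    config_params = {key: config[key] for key in sorted(wanted) if key in config}

    return {
        "mod": mod,
        "rules": rules,
        "script_content": script_content,
        "doc_content": doc_content,
        "config_params": config_params,
    }
```

Because the function only reads and never writes, calling it again after an edit to a script or the config returns the new content with no cache to invalidate.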
Feature 2: Notebook metadata in notebook.yml
Enrich NotebookEntry with three optional fields:
    entries:
      - path: notebooks/investigate_noise.py
        kind: investigate    # investigate | explore | demo | validate
        description: "Check TTL artifact patterns across sessions"
        status: active       # draft | active | stale | promoted | archived
        pair_ipynb: true
        publish_myst: true
Schema change (NotebookEntry):
- description: str = "" — one-line intent
- status: str = "active" — lifecycle state
- kind: str = "" — notebook category
Lifecycle semantics:
- draft — work in progress, not ready for review
- active — current investigation, being used
- stale — superseded or no longer relevant, candidate for archival
- promoted — investigation led to a pipeline mod (link to mod?)
- archived — kept for reference but not active
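One way the schema change and the lifecycle states above could look in code is sketched below. It assumes `NotebookEntry` is a plain dataclass and that `pair_ipynb` and `publish_myst` default to `False`; neither is confirmed by this plan, and the real class in notebook/config.py may be built differently. The `STATUSES` and `KINDS` tuples and the validation in `__post_init__` are illustrative additions.

```python
from dataclasses import dataclass

# Allowed values taken from the YAML comments and lifecycle list in this plan.
STATUSES = ("draft", "active", "stale", "promoted", "archived")
KINDS = ("", "investigate", "explore", "demo", "validate")  # "" means unset


@dataclass
class NotebookEntry:
    path: str
    pair_ipynb: bool = False    # assumed default
    publish_myst: bool = False  # assumed default
    # New optional fields; the defaults keep existing notebook.yml files loadable.
    description: str = ""
    status: str = "active"
    kind: str = ""

    def __post_init__(self) -> None:
        # Reject values outside the documented vocabularies.
        if self.status not in STATUSES:
            raise ValueError(f"unknown status {self.status!r}; expected one of {STATUSES}")
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind {self.kind!r}; expected one of {KINDS}")
```

Since all three new fields have defaults, an entry written before the change (only `path` and the pair/publish flags) still constructs cleanly and reports `status == "active"`.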
MCP changes:
- nb_create — already accepts description and kind, just persist them in notebook.yml
- nb_status — include description, kind, status in output
- New: pipeio_nb_update(pipe, flow, name, status?, description?) — update notebook metadata (parallels note_update pattern)
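The update tool could behave roughly as in the sketch below, which works on the already-parsed `entries` list from notebook.yml and leaves loading and saving aside. The function name `nb_update`, matching a notebook by the stem of its `path`, and raising `KeyError` or `ValueError` on bad input are assumptions made for illustration; the plan does not specify them.

```python
from pathlib import PurePosixPath
from typing import Any, Optional

ALLOWED_STATUSES = {"draft", "active", "stale", "promoted", "archived"}


def nb_update(
    entries: list[dict[str, Any]],
    name: str,
    status: Optional[str] = None,
    description: Optional[str] = None,
) -> dict[str, Any]:
    """Update metadata on one notebook entry in place and return it."""
    # Validate first so a bad status never leaves a half-updated entry.
    if status is not None and status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status {status!r}")

    # Assumption: `name` is the notebook file name without its extension.
    for entry in entries:
        if PurePosixPath(entry["path"]).stem == name:
            break
    else:
        raise KeyError(f"no notebook named {name!r}")

    # Only fields that were explicitly passed are overwritten.
    if status is not None:
        entry["status"] = status
    if description is not None:
        entry["description"] = description
    return entry
```

Treating `None` as "leave unchanged" mirrors the optional `status?` and `description?` parameters in the proposed signature, so a caller can mark a notebook stale without resending its description.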
Implementation Plan
Phase 1: Notebook metadata (small, self-contained)
- Add description, status, kind fields to NotebookEntry in notebook/config.py
- Update mcp_nb_create to persist description, kind, status="active" in notebook.yml
- Update mcp_nb_status to include the new fields in output
- Add mcp_nb_update tool for status/description changes
- Register pipeio_nb_update in projio MCP server + agent instructions
- Tests for schema, create, status, update
Phase 2: pipeio_mod_context (read-only composition)
- Add mcp_mod_context(root, pipe, flow, mod) in pipeio/mcp.py
- Reuse _find_registry, _parse_snakefile_rules, FlowConfig
- Read script files, doc file, extract config params
- Register pipeio_mod_context in projio MCP server + agent instructions
- Tests with scaffold fixtures
Notes
- No new persistent stores: mod_context only reads from existing sources, and the notebook metadata is added as fields in the existing notebook.yml
- notebook.yml serialization should use ruamel round-trip (same pattern as config_patch fix)
- mod_context is read-only composition, not a cache — always fresh