pipeio: Ontology¶
Concepts¶
pipeio manages a hierarchy of flows, mods, and their artifacts. Each concept maps to a filesystem convention, a registry entry, and a set of MCP tools.
Flow¶
A flow is the primary unit of work — a self-contained computational workflow with its own Snakefile, config, notebooks, scripts, and derivative output. Flow names are globally unique.
The flow's derivative directory (derivatives/{flow}/) is a datalad subdataset containing all pipeline outputs, organized by subject/session.
Mod¶
A mod (module) is a logical grouping of Snakemake rules within a flow. Mods are identified by rule name prefix: rules named filter_bandpass, filter_notch belong to mod filter. Each mod has:
- One or more rules (in Snakefile or rules/{mod}.smk)
- Optionally, scripts shared across rules (scripts/{script}.py)
- Documentation in three facets: theory, spec, delta
- Optionally, notebooks investigating or demoing its outputs (one demo per mod is ideal; flow-level demos spanning multiple mods leave mod empty)
Mod Documentation Facets¶
Each mod has up to three documentation facets, stored in {flow}/docs/{mod}/:
| Facet | File | Purpose | Evolves into |
|---|---|---|---|
| Theory | theory.md |
Scientific rationale, method justification, citations | Manuscript methods: "We used X because Y [@citekey]" |
| Spec | spec.md |
Technical specification: I/O contracts, parameters, component manifest | Manuscript methods: "Implemented as...", supplementary materials |
| Delta | delta.md |
Current state, known issues, refactor plans, changelog | Revision notes (temporary — removed when resolved) |
Theory is the entry point for understanding a mod. It contains proper pandoc citations ([@citekey]) that resolve through biblio. Sources include idea notes, meeting notes, paper references, and exploratory notebook conclusions.
Spec is the ground truth for implementation. Agents can auto-generate a skeleton from mod_context + config_read, then humans add intent and constraints. It answers: what should this mod produce, with what guarantees?
Delta is operational and temporary. Created by agents after audits or when a gap is found between spec and reality. Deleted when the gap is resolved.
Docs-to-Manuscript Pipeline¶
Mod docs are written in pandoc-compatible markdown with citations. The manuscript_assemble tool can pull from theory and spec docs to draft methods sections:
idea note (notio)
→ theory.md (drafted with biblio citations)
→ spec.md (from implementation decisions)
→ manuscript methods section (assembled by manuscripto)
Notebook¶
Notebooks live in two parallel workspaces within a flow, separated by purpose:
explore/— prototypes, investigations, parameter sweeps. Never published. Findings feed intotheory.md. Absorbed into mod scripts when done.demo/— showcases mod outputs in narrative form. Published to the project site as rendered HTML.
Both workspaces share the same internal structure (.src/, .myst/, .ipynb). The directory determines the default publish behavior — no need to set publish_html per notebook unless overriding.
Rule¶
A Snakemake rule defines one processing step. Rules are grouped into mods by naming convention. Complex mods can split rules into rules/{mod}.smk files included by the main Snakefile.
Rules have three execution modes:
- Script (script:) — runs a Python script from scripts/. Multiple rules can share the same script with different params/inputs/outputs.
- Shell (shell:) — runs a CLI command or MATLAB directly. No script file.
- Run (run:) — inline Python in the Snakefile itself.
Flow Directory Structure¶
code/pipelines/{flow}/
├── Snakefile # workflow definition (includes rules/*.smk)
├── config.yml # input/output dirs, registry groups
├── Makefile # convenience targets (delegates to pipeio CLI)
├── publish.yml # flow-level publish config (dag, report, scripts)
├── rules/ # optional: per-mod rule files
│ ├── filter.smk # included by Snakefile
│ └── hpclayer.smk # complex mods get their own file
├── scripts/ # rule scripts (may be shared across rules)
│ ├── filter.py
│ ├── interpolate.py
│ └── hpclayer_detect.py
├── docs/ # flow-local documentation (source of truth)
│ ├── index.md # flow overview + mod listing
│ ├── filter/ # per-mod doc directory
│ │ ├── theory.md # scientific rationale + citations
│ │ ├── spec.md # technical spec + I/O contracts
│ │ └── delta.md # optional: current state + plans
│ └── hpclayer/
│ ├── theory.md
│ └── spec.md
└── notebooks/ # notebook workspace
├── notebook.yml # config: entries, kernel, per-notebook publish
├── explore/ # exploratory notebooks (never published)
│ ├── .src/ # agent territory
│ │ ├── investigate_noise.py
│ │ └── investigate_tfspace.py
│ ├── .myst/ # generated MyST
│ │ └── ...
│ ├── investigate_noise.ipynb # human-facing
│ └── investigate_tfspace.ipynb
└── demo/ # demo notebooks (published to site)
├── .src/
│ └── demo_filter.py
├── .myst/
│ └── demo_filter.md
└── demo_filter.ipynb
Derivative Structure¶
derivatives/{flow}/
├── manifest.yml # derivative manifest
├── sub-01/
│ └── {datatype}/
│ └── sub-01_*_{suffix}.{ext}
├── sub-02/
│ └── ...
└── all/ # cross-subject aggregates (optional)
The manifest (manifest.yml) is a copy of the flow's registry: config section, written to the derivative directory on each run. Downstream flows reference it via input_manifest in their config to discover available outputs without needing access to the source flow's code or config.
# Cross-flow wiring in a downstream flow's config.yml
input_dir: "derivatives/preprocess_ieeg"
input_manifest: "derivatives/preprocess_ieeg/manifest.yml"
Configuration Files¶
config.yml — Snakemake I/O¶
Defines inputs, outputs, and registry groups. Consumed by Snakemake and snakebids.
input_dir: "raw"
output_dir: "derivatives/preprocess_ieeg"
registry:
badlabel:
bids: {root: badlabel, datatype: ieeg}
members:
npy: {suffix: ieeg, extension: .npy}
notebook.yml — notebook identity and per-notebook publish¶
kernel: cogpy # flow-level default kernel
entries:
- path: notebooks/explore/.src/investigate_noise.py
kind: investigate # implied by explore/ dir, but explicit for tools
mod: filter
status: active
pair_ipynb: true
- path: notebooks/explore/.src/investigate_tfspace.py
kind: investigate
mod: filter
status: archived # code absorbed into scripts
pair_ipynb: true
- path: notebooks/demo/.src/demo_filter.py
kind: demo # implied by demo/ dir
mod: filter
status: promoted
pair_ipynb: true
publish_html: true # default for demo/, explicit for clarity
publish.yml — flow-level publish config¶
Controls which flow-level artifacts docs_collect publishes to the site. Per-notebook publish is in notebook.yml.
dag: true # publish rule dependency graph (dag.svg)
report: true # publish latest snakemake report (report.html)
report_archive: false # keep old reports as report-{date}.html
scripts: true # generate script index with git links
Published Documentation¶
docs_collect reads publish.yml + notebook.yml to assemble the site:
docs/pipelines/{flow}/
├── index.md # flow overview (from flow/docs/index.md)
├── dag.svg # rule dependency graph (if publish.dag)
├── report.html # latest snakemake report (if publish.report)
├── mods/
│ ├── filter/ # mod docs (from flow/docs/filter/)
│ │ ├── theory.md # scientific rationale + citations
│ │ └── spec.md # technical specification
│ └── hpclayer/
│ ├── theory.md
│ └── spec.md
├── notebooks/
│ └── nb-demo_filter.html # rendered demo notebooks (from demo/)
└── scripts.md # auto-generated script index with git links
Entity Relationships¶
graph TD
Flow["Flow<br/><i>self-contained workflow</i>"]
Mod["Mod<br/><i>logical rule group</i>"]
Rule["Rule<br/><i>Snakemake rule</i>"]
Script["Script<br/><i>rule implementation</i>"]
Notebook["Notebook<br/><i>explore or demo</i>"]
Config["Config<br/><i>config.yml</i>"]
NbConfig["Notebook Config<br/><i>notebook.yml</i>"]
PubConfig["Publish Config<br/><i>publish.yml</i>"]
Derivative["Derivative<br/><i>output dataset</i>"]
Theory["Theory<br/><i>scientific rationale</i>"]
Spec["Spec<br/><i>technical specification</i>"]
Delta["Delta<br/><i>current state / plans</i>"]
Site["Project Site<br/><i>docs/pipelines/{flow}/</i>"]
DAG["DAG<br/><i>dag.svg</i>"]
Report["Report<br/><i>report.html</i>"]
Registry["Registry<br/><i>flow metadata index</i>"]
Manuscript["Manuscript<br/><i>methods sections</i>"]
IdeaNote["Idea Notes<br/><i>notio</i>"]
Papers["Papers<br/><i>biblio</i>"]
Registry -->|indexes| Flow
Flow -->|contains| Mod
Flow -->|has| Config
Flow -->|has| NbConfig
Flow -->|has| PubConfig
Flow -->|produces| Derivative
Flow -->|produces| DAG
Flow -->|produces| Report
Mod -->|groups| Rule
Rule -->|may execute| Script
Mod -->|has| Theory
Mod -->|has| Spec
Mod -->|has| Delta
Mod -->|explored by| Notebook
NbConfig -->|configures| Notebook
Config -->|defines I/O for| Rule
IdeaNote -->|informs| Theory
Papers -->|cited in| Theory
Notebook -->|findings feed| Theory
Theory -->|assembled into| Manuscript
Spec -->|assembled into| Manuscript
PubConfig -->|controls| Site
Theory -->|collected to| Site
Spec -->|collected to| Site
Notebook -->|published to| Site
Script -->|copied to| Site
DAG -->|published to| Site
Report -->|published to| Site
Naming Conventions¶
graph LR
subgraph "Flow: preprocess_ieeg"
direction TB
SF["Snakefile"]
R1["rule filter_bandpass<br/><i>script: filter.py</i>"]
R2["rule filter_notch<br/><i>script: filter.py</i>"]
R3["rule interpolate_bad<br/><i>script: interpolate.py</i>"]
R4["rule qc_report<br/><i>shell: matlab -r ...</i>"]
S1["scripts/filter.py<br/><i>(shared by 2 rules)</i>"]
S2["scripts/interpolate.py"]
T1["docs/filter/theory.md"]
SP1["docs/filter/spec.md"]
T2["docs/interpolate/theory.md"]
NB1["explore/.src/investigate_noise.py<br/><i>mod: filter</i>"]
NB2["demo/.src/demo_filter.py<br/><i>mod: filter</i>"]
end
subgraph "Mods"
M1["mod: filter"]
M2["mod: interpolate"]
M3["mod: qc"]
end
SF --> R1 & R2 & R3 & R4
R1 --> S1
R2 --> S1
R3 --> S2
M1 --- R1 & R2
M2 --- R3
M3 --- R4
M1 --- T1 & SP1
M2 --- T2
M1 --- NB1 & NB2
Lifecycle States¶
Flow lifecycle¶
scaffold → develop → validate → production
Mod lifecycle¶
graph LR
Idea["Idea note<br/><i>notio</i>"] --> Theory["theory.md<br/><i>drafted with biblio</i>"]
Theory --> NB["Exploratory notebook<br/><i>prototype approach</i>"]
NB --> Spec["spec.md<br/><i>from implementation</i>"]
Spec --> Impl["Implementation<br/><i>rules + scripts</i>"]
Impl --> Validate["Validate<br/><i>contracts + audit</i>"]
Validate --> Production["Production"]
Validate -->|issues found| Delta["delta.md<br/><i>gap + refactor plan</i>"]
Delta --> Impl
Notebook lifecycle¶
graph LR
Draft["draft"] --> Active["active"]
Active -->|"explore/"| Archived["archived<br/><i>code absorbed into scripts<br/>findings feed theory.md</i>"]
Active -->|"demo/"| Promoted["promoted<br/><i>published to site as HTML</i>"]
Documentation lifecycle¶
graph LR
IdeaNotes["Idea/meeting notes"] --> TheoryDraft["theory.md draft<br/><i>agent + biblio</i>"]
TheoryDraft --> TheoryReview["theory.md reviewed<br/><i>human validates</i>"]
TheoryReview --> SpecDraft["spec.md draft<br/><i>from mod_context</i>"]
SpecDraft --> SpecImpl["spec.md + implementation<br/><i>co-evolve</i>"]
SpecImpl --> Methods["Manuscript methods<br/><i>assembled by manuscripto</i>"]
Registry Schema¶
# .projio/pipeio/registry.yml
flows:
preprocess_ieeg:
name: preprocess_ieeg
code_path: code/pipelines/preprocess_ieeg
config_path: code/pipelines/preprocess_ieeg/config.yml
doc_path: docs/pipelines/preprocess_ieeg
app_type: snakemake
mods:
filter:
name: filter
rules: [filter_bandpass, filter_notch]
doc_path: code/pipelines/preprocess_ieeg/docs/filter
interpolate:
name: interpolate
rules: [interpolate_bad]
doc_path: null
Modkey Citation Format¶
Mods are citable in manuscripts via BibTeX:
@misc{preprocess_ieeg_mod-filter,
title = {mod: flow=preprocess_ieeg mod=filter},
author = {project_name},
year = {2026},
note = {doc_path=docs/pipelines/preprocess_ieeg/mods/filter; rules=filter_bandpass, filter_notch},
}
Referenced in pandoc markdown as [@preprocess_ieeg_mod-filter].
MCP Tool Categories¶
| Category | Tools | Purpose |
|---|---|---|
| Flow discovery | flow_list, flow_status, registry_scan, registry_validate |
Find and inspect flows |
| Flow management | flow_fork, flow_deregister |
Create variants, remove from registry |
| Notebook lifecycle | nb_status, nb_create, nb_update, nb_sync, nb_sync_flow, nb_diff, nb_scan, nb_read, nb_audit, nb_lab, nb_publish, nb_analyze, nb_exec, nb_pipeline |
Full notebook workflow |
| Mod management | mod_list, mod_context, mod_resolve, mod_create |
Discover and scaffold mods |
| Rule authoring | rule_list, rule_stub, rule_insert, rule_update |
Safe Snakefile editing |
| Config authoring | config_read, config_patch, config_init |
Flow config management |
| Contracts | contracts_validate, cross_flow, completion |
I/O validation |
| Documentation | docs_collect, docs_nav, mkdocs_nav_patch, modkey_bib |
Site publishing |
| Execution | run, run_status, run_dashboard, run_kill |
Snakemake session management |
| Inspection | target_paths, dag_export, log_parse |
Path resolution and debugging |