# pipeio v2 roadmap: lean scope, snakebids/DataLad alignment

## North star
pipeio is an agent-facing authoring + discovery layer for Snakemake/snakebids/DataLad projects. It does not compete with execution engines or provenance systems. It makes pipeline knowledge queryable and actionable for AI agents via MCP tools.
One flow = one snakebids app = one derivative directory (= one DataLad subdataset).
## Principles
- Don't reimplement what snakebids, Snakemake, or DataLad already do
- Adapt their outputs into agent-usable structured data where needed
- Own the registry, authoring, contracts, and documentation layers they don't provide
- Align with BIDS derivatives metadata for cross-flow lineage
## Source

See `deep-research-pipeio-scope.md` for the full landscape analysis.
## Current tool inventory and v2 fate

### KEEP — unique agent value (no ecosystem equivalent)
| Tool | Purpose | v2 changes |
|---|---|---|
| `flow_list` | List flows in registry | Treat flows as snakebids apps; include derivative dir |
| `flow_status` | Overview of a flow | Add snakebids app status (has `run.py`, `.snakebids` marker) |
| `mod_list` | List mods in a flow | Keep as-is |
| `mod_resolve` | Resolve modkeys to metadata | Keep as-is |
| `mod_context` | Bundled read: rules, scripts, doc, config | Keep as-is |
| `mod_create` | Scaffold mod (script + doc + I/O) | Align with snakebids `workflow/` layout |
| `rule_list` | Parse rules from Snakefiles | Keep — agents need structured rule data |
| `rule_stub` | Generate rule text from I/O specs | Keep — unique authoring tool |
| `rule_insert` | Insert rule into `.smk` file | Keep — unique authoring tool |
| `rule_update` | Patch existing rule | Keep — unique authoring tool |
| `config_read` | Parse flow config with bids signatures | Evolve to read `config/snakebids.yml` |
| `config_patch` | Surgical YAML edit (preserves comments/anchors) | Keep — unique; reposition for `snakebids.yml` |
| `cross_flow` | Map output→input chains across flows | Evolve: also read BIDS `dataset_description.json` `GeneratedBy`/`SourceDatasets` |
| `contracts_validate` | Check I/O contracts | Keep — feeds `datalad run` `--input`/`--output` declarations |
| `registry_scan` | Discover flows from filesystem | Evolve: detect snakebids app structure |
| `registry_validate` | Check registry consistency | Keep |
| `nb_create` | Scaffold notebook with bootstrap cells | Keep |
| `nb_update` | Update notebook metadata | Keep |
| `nb_status` | Notebook sync/lifecycle status | Keep |
| `nb_sync` | Jupytext sync | Keep — thin wrapper over jupytext |
| `nb_publish` | Publish notebook to docs | Keep |
| `nb_analyze` | Parse notebook structure | Keep |
| `nb_exec` | Execute notebook (papermill) | Keep |
| `nb_pipeline` | Chain sync→publish→collect | Keep |
| `modkey_bib` | Generate modkey bibliography | Keep — unique |
| `docs_collect` | Collect flow docs into MkDocs | Keep |
| `docs_nav` | Generate nav YAML fragment | Keep |
| `mkdocs_nav_patch` | Patch `mkdocs.yml` nav | Keep |
### THIN OUT — replace internals with ecosystem tools

| Tool | Current impl | v2: adapter over |
|---|---|---|
| `dag` | Custom Snakefile parser | `snakemake --d3dag` JSON output |
| `completion` | Glob filesystem vs registry schema | `snakemake --summary` lifted into contract-level status |
| `log_parse` | Read raw snakemake logs | Pointer to `snakemake --report` + DataLad run record |
| `config_init` | Scaffold flat `config.yml` | Scaffold snakebids app skeleton (`config/snakebids.yml` + `workflow/` + `run.py`) |
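As a sketch of the `dag` adapter direction: shell out to Snakemake and parse its JSON rather than parsing Snakefiles by hand. The node/link shape assumed below follows d3's force-layout convention (integer `source`/`target` indices into a `nodes` array); verify against the `--d3dag` output of your installed Snakemake version before relying on it.

```python
import json
import subprocess

# Sketch of a dag adapter over `snakemake --d3dag`. The JSON schema assumed
# here (nodes with a "value" rule name, links with index-based source/target)
# is an assumption to verify against your Snakemake version.

def dag_edges(d3dag_json: str) -> list[tuple[str, str]]:
    """Turn d3-style DAG JSON into (upstream_rule, downstream_rule) pairs."""
    dag = json.loads(d3dag_json)
    names = {i: node.get("value", node.get("id"))
             for i, node in enumerate(dag["nodes"])}
    return [(names[link["source"]], names[link["target"]])
            for link in dag["links"]]

def flow_dag(snakefile: str) -> list[tuple[str, str]]:
    """Shell out to Snakemake for the DAG instead of parsing the Snakefile."""
    out = subprocess.run(
        ["snakemake", "--snakefile", snakefile, "--d3dag"],
        capture_output=True, text=True, check=True,
    )
    return dag_edges(out.stdout)
```

The same structure applies to `completion`: run `snakemake --summary`, then lift its per-output status into contract-level status.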
### STOP / REPLACE — duplicate ecosystem tools

| Tool | Current impl | v2: replaced by |
|---|---|---|
| `run` | `screen -dmS snakemake` + `runs.json` | `datalad run -- python run.py ...` → return commit + run record |
| `run_status` | Parse screen sessions + log tail | DataLad run records + `snakemake --summary` |
| `run_dashboard` | Aggregate `runs.json` | DataLad git log of run records |
| `run_kill` | Kill screen sessions | Process management (if needed at all) |
## Structural changes

### Flow directory layout: flat → snakebids app
Current:

```
code/pipelines/{pipe}/{flow}/
  Snakefile
  config.yml
  scripts/
```

v2 (snakebids app):

```
code/pipelines/{flow}/        # or code/apps/{flow}/
  run.py                      # snakebids entry point
  config/
    snakebids.yml             # pybids_inputs, parse_args, analysis_levels
  workflow/
    Snakefile
    rules/*.smk               # mod-organized rule files
    scripts/
  notebooks/
  docs/
```
Impact on pipeio:
- `registry_scan`: detect `run.py` + `config/snakebids.yml` as snakebids app markers
- `config_read`/`config_patch`: target `config/snakebids.yml`
- `rule_insert`/`rule_list`: look in `workflow/rules/` and `workflow/Snakefile`
- `mod_create`: scaffold scripts into `workflow/scripts/`
### Execution: screen → datalad run
Current:

```
pipeio_run(pipe, flow)
  → screen -dmS snakemake ...
  → writes runs.json
```

v2:

```
pipeio_run(pipe, flow, analysis_level="participant")
  → datalad run \
      --input {bids_dir} \
      --output {derivative_dir} \
      -- python run.py {bids_dir} {derivative_dir} {analysis_level}
  → returns { commit, run_record, derivative_dir }
```

Contracts feed the `--input`/`--output` declarations.
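The v2 execution path above reduces to building one command line from contract-declared inputs and outputs. A sketch of that command construction (the function name is hypothetical, and invoking `datalad` itself, with the exact flags of the installed version, is left out):

```python
# Sketch of the v2 pipeio_run backend: compose a `datalad run` invocation
# from contract-declared inputs/outputs. Command construction only; flag
# names should be verified against the installed DataLad version.

def build_datalad_run(bids_dir: str, derivative_dir: str,
                      analysis_level: str = "participant") -> list[str]:
    """Build the argv for wrapping a snakebids app run in datalad run."""
    return [
        "datalad", "run",
        "--input", bids_dir,
        "--output", derivative_dir,
        "--",  # everything after this is the wrapped command
        "python", "run.py", bids_dir, derivative_dir, analysis_level,
    ]
```

After `subprocess.run(...)` on this argv, the tool would read back the new commit and run record instead of maintaining its own `runs.json` state.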
### Cross-flow: registry → BIDS derivatives metadata

v2: read/write `dataset_description.json` in each derivative dir:

```json
{
  "Name": "preprocess-ecephys",
  "GeneratedBy": [{"Name": "preprocess-ecephys", "CodeURL": "..."}],
  "SourceDatasets": [{"URL": "../raw"}]
}
```

Standards-aligned lineage that any BIDS tool can read.
## Migration phases

### Phase 0: Research & design (current)
- [x] Deep research on ecosystem landscape
- [x] Identify keep/thin/stop categories
- [ ] Design `snakebids.yml` schema mapping (what pipeio reads/writes)
- [ ] Design `datalad run` integration interface
- [ ] Decide on pipe/flow hierarchy: keep pipe as category or flatten?
### Phase 1: Structural alignment (non-breaking, additive)

- [ ] `registry_scan` learns snakebids app layout alongside current flat layout
- [ ] `config_read` supports both `config.yml` and `config/snakebids.yml`
- [ ] `config_init --snakebids` generates app skeleton
- [ ] Add `dataset_description.json` to flow scaffolding
### Phase 2: Execution migration

- [ ] `pipeio_run` gains `provenance=True` → wraps with `datalad run`
- [ ] `pipeio_dag` switches to `snakemake --d3dag` backend
- [ ] `pipeio_completion` switches to `snakemake --summary` backend
- [ ] Deprecate `runs.json` state machine
### Phase 3: Full snakebids alignment

- [ ] Default scaffolding generates snakebids app layout
- [ ] Remove flat layout support (or keep as legacy)
- [ ] Explore pipeio as snakebids plugin
- [ ] `cross_flow` reads BIDS `GeneratedBy`/`SourceDatasets`
## Tool count
| Category | v1 | v2 |
|---|---|---|
| Keep (unique) | 27 | 27 |
| Thin (adapter) | 4 | 4 (same API, different internals) |
| Stop (replace) | 4 | 0 |
| New | — | ~2 (`datalad_run` wrapper, `bids_metadata`) |
| Total | 35 | ~33 |
The surface barely changes — the difference is what's inside: pipeio stops reimplementing and starts composing.