pipeio Specifications

Design specifications for pipeio — an agent-facing authoring and discovery layer for computational pipelines in research repositories.

pipeio makes pipeline knowledge (registry, configs, rules, contracts, notebooks) queryable and actionable for AI agents. It delegates execution to Snakemake, provenance to DataLad, path resolution to snakebids, and app lifecycle to snakebids deployment modes.

Spec Documents

| Spec | Domain | Status |
| --- | --- | --- |
| Ontology | Concepts, entity relationships, directory conventions, naming | Current |
| Overview & Architecture | Package scope, design principles, ecosystem fit | Implemented |
| Registry | Pipe/flow/mod hierarchy, YAML schema, scan & validation | Implemented |
| Flow Config | Per-flow `config.yml` schema, output registry (data contracts) | Implemented |
| Path Resolution | `PathResolver` protocol, `PipelineContext`, `Session`, `Stage` | Implemented (SimpleResolver + BidsResolver) |
| Notebook Lifecycle | Pair, sync, execute, publish; replaces Makefile shell scripts | Implemented |
| Scaffolding | Flow and mod creation from templates | Implemented (`flow new` + `mod_create`) |
| Contracts | Declarative input/output validation framework | Implemented (models + validation) |
| CLI | Command-line interface design | Implemented (full surface + `pf` helper) |
| MCP Tools | Agent-facing tools via projio MCP server (38 tools) | Implemented |

Reference Implementation

These specs are derived from an audit of the pixecog project's pipeline infrastructure (`code/utils/io/`, `code/pipelines/`, `workflow/`). The audit document lives at `pixecog/prompts/plan/pipeio-audit-and-design.md`.

Design Principles

Agent-facing authoring layer — pipeio makes pipeline knowledge queryable and provides safe authoring operations; it does not own execution, provenance, or path resolution
One flow = one derivative — each flow is a self-contained snakebids app producing one derivative directory; pipe is a category tag
Delegation over duplication — execution → Snakemake, provenance → DataLad, paths → snakebids bids(), app lifecycle → snakebids
Declarative over imperative — registries and configs are YAML; validation is schema-driven
Graceful degradation — pipeio works without optional extras ([bids], [notebook])
Search before creation — registry queries help discover existing flows before scaffolding new ones
Notebook as first-class artifact — the lifecycle (pair/sync/exec/publish) is managed, not ad-hoc
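
The graceful degradation principle above can be illustrated with a minimal sketch. This is not pipeio's actual code: the helper names `available_extras` and `choose_resolver` are hypothetical, and both the pairing of each extra with a module (`snakebids` for `[bids]`, `jupytext` for `[notebook]`) and the fallback from `BidsResolver` to `SimpleResolver` are assumptions made for illustration. The idea is to probe for an optional dependency without importing it and to fall back to a simpler code path when it is missing.

```python
import importlib.util

# Hypothetical mapping from an extra's name to the module it is assumed to
# install. These pairings are illustrative, not pipeio's packaging metadata.
OPTIONAL_EXTRAS = {
    "bids": "snakebids",
    "notebook": "jupytext",
}


def available_extras(extras=None):
    """Return {extra_name: bool}, probing each module without importing it."""
    extras = OPTIONAL_EXTRAS if extras is None else extras
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in extras.items()
    }


def choose_resolver(extras_present):
    """Pick a resolver name, falling back when the [bids] extra is absent.

    The resolver names come from the spec table; the fallback rule itself
    is an assumption of this sketch.
    """
    return "BidsResolver" if extras_present.get("bids") else "SimpleResolver"


if __name__ == "__main__":
    present = available_extras()
    print(present)
    print(choose_resolver(present))
```

Probing with `importlib.util.find_spec` avoids paying the import cost, or triggering import-time side effects, just to learn whether a feature can be offered.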