pipeio: Registry Specification

Purpose

The pipeline registry maps the three-level hierarchy (pipe / flow / mod) to filesystem paths, config files, and documentation locations. It is the central index that all other pipeio operations consult.

Registry YAML Schema

The registry follows the schema discovered in pixecog's pipe_flow_mod_registry.yml (autogenerated, schema v2):

# .projio/pipeio/registry.yml  (preferred; legacy: .pipeio/registry.yml)
generated_at: '2026-03-13T13:50:08.541812+00:00'   # ISO timestamp

pipes:
  brainstate:                          # pipe name (slug)
    id: pipe-brainstate                # canonical ID
    slug_ok: true                      # passes naming convention check
    code:
      pipe_dir: code/pipelines/brainstate
    docs:
      doc_dir: docs/explanation/pipelines/pipe-brainstate
      index_md: docs/explanation/pipelines/pipe-brainstate/index.md
    flows:
      brainstate:                      # flow name (slug)
        id: pipe-brainstate_flow-brainstate
        slug_ok: true
        code:
          config_path: code/pipelines/brainstate/config.yml
          entrypoints:
            - path: code/pipelines/brainstate/Snakefile
              kind: snakefile          # snakefile | smk
              flow_root: code/pipelines/brainstate
              config_path: code/pipelines/brainstate/config.yml
        docs:
          doc_dir: docs/explanation/pipelines/pipe-brainstate/flow-brainstate
          index_md: docs/explanation/pipelines/pipe-brainstate/flow-brainstate/index.md
        mods:
          brainstate:                  # mod name (slug)
            id: pipe-brainstate_flow-brainstate_mod-brainstate
            doc_dir: docs/explanation/pipelines/.../mod-brainstate
            index_md: docs/explanation/pipelines/.../mod-brainstate/index.md
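The `id` values in the example above follow a visible pattern: each level adds a `pipe-`, `flow-`, or `mod-` prefixed segment, joined with underscores. A minimal sketch of that pattern follows; the helper name `canonical_id` is hypothetical and not part of pipeio.

```python
from typing import Optional


def canonical_id(pipe: str, flow: Optional[str] = None, mod: Optional[str] = None) -> str:
    """Join the hierarchy levels into an ID such as pipe-x_flow-y_mod-z.

    Hypothetical helper: it mirrors the IDs shown in the example registry.
    """
    if mod is not None and flow is None:
        raise ValueError("a mod ID needs its parent flow")
    parts = [f"pipe-{pipe}"]
    if flow is not None:
        parts.append(f"flow-{flow}")
    if mod is not None:
        parts.append(f"mod-{mod}")
    return "_".join(parts)


pipe_id = canonical_id("brainstate")
flow_id = canonical_id("brainstate", "brainstate")
mod_id = canonical_id("brainstate", "brainstate", "brainstate")
print(pipe_id, flow_id, mod_id, sep="\n")
```

Because slugs may themselves contain underscores, splitting an ID on `_` is not a reliable way to recover its parts; this sketch only builds IDs.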

Field Definitions

| Field | Type | Required | Description |
|---|---|---|---|
| `pipes` | mapping | yes | Top-level mapping of pipe slugs to pipe entries |
| `pipes.<name>.id` | string | yes | Canonical ID: `pipe-<name>` |
| `pipes.<name>.slug_ok` | bool | yes | Whether the name passes `slug_ok()` validation |
| `pipes.<name>.code.pipe_dir` | path | yes | Path to the pipe's code directory |
| `pipes.<name>.docs` | mapping\|null | no | Documentation paths (null if no docs exist) |
| `pipes.<name>.flows` | mapping | yes | Mapping of flow slugs to flow entries |
| `flows.<name>.id` | string | yes | Canonical ID: `pipe-<pipe>_flow-<flow>` |
| `flows.<name>.code.config_path` | path\|null | no | Path to flow's `config.yml` |
| `flows.<name>.code.entrypoints` | list | yes | Workflow entry points (Snakefiles, .smk files) |
| `flows.<name>.mods` | mapping | no | Mapping of mod slugs to mod entries |
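The "Required" column can be checked mechanically against a registry that has already been parsed into a plain dict. The sketch below is illustrative only: the function and constant names are hypothetical, and it covers just the fields marked "yes" in the table.

```python
# Fields marked "yes" in the table, written as dotted paths inside each entry.
REQUIRED_PIPE_FIELDS = ("id", "slug_ok", "code.pipe_dir", "flows")
REQUIRED_FLOW_FIELDS = ("id", "code.entrypoints")


def has_field(entry, dotted: str) -> bool:
    """Return True if the dotted key path exists in a nested dict."""
    node = entry
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


def missing_required_fields(registry: dict) -> list:
    """List the dotted paths of required fields that are absent."""
    if "pipes" not in registry:
        return ["pipes"]
    problems = []
    for pipe_name, pipe in registry["pipes"].items():
        for name in REQUIRED_PIPE_FIELDS:
            if not has_field(pipe, name):
                problems.append(f"pipes.{pipe_name}.{name}")
        flows = pipe.get("flows") if isinstance(pipe, dict) else None
        for flow_name, flow in (flows or {}).items():
            for name in REQUIRED_FLOW_FIELDS:
                if not has_field(flow, name):
                    problems.append(f"pipes.{pipe_name}.flows.{flow_name}.{name}")
    return problems


registry = {
    "pipes": {
        "brainstate": {
            "id": "pipe-brainstate",
            "slug_ok": True,
            "code": {"pipe_dir": "code/pipelines/brainstate"},
            "flows": {
                "brainstate": {
                    "id": "pipe-brainstate_flow-brainstate",
                    "code": {"config_path": None},  # entrypoints left out on purpose
                }
            },
        }
    }
}
problems = missing_required_fields(registry)
print(problems)
```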

Slug Validation

Names must match ^[a-z][a-z0-9_]*$: a lowercase letter followed by any number of lowercase letters, digits, or underscores. The slug_ok field records compliance. Names that fail validation (e.g., DGgamma) are flagged but not rejected: they remain usable, but emit warnings.
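A minimal sketch of this check, using the pattern stated above. The spec names a `slug_ok()` function but does not show its body, so this implementation and the `check_name` wrapper are assumptions.

```python
import re
import warnings

# Pattern taken from the spec; the implementation around it is assumed.
SLUG_RE = re.compile(r"^[a-z][a-z0-9_]*$")


def slug_ok(name: str) -> bool:
    """Return True if the name satisfies the naming convention."""
    return SLUG_RE.fullmatch(name) is not None


def check_name(name: str) -> bool:
    """Hypothetical wrapper: flag a bad name with a warning, but do not reject it."""
    ok = slug_ok(name)
    if not ok:
        warnings.warn(f"name {name!r} does not match {SLUG_RE.pattern}")
    return ok


results = {name: slug_ok(name) for name in ["brainstate", "ieeg_2", "DGgamma", "2fast"]}
print(results)
```

Using `fullmatch` rather than `match` means a name with a trailing newline is also treated as non-compliant.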

PipelineRegistry Python API

from pathlib import Path

from pipeio.registry import PipelineRegistry

# Load from YAML (preferred location; the legacy path is .pipeio/registry.yml)
registry = PipelineRegistry.from_yaml(Path(".projio/pipeio/registry.yml"))

# Query
registry.list_pipes()                    # → ['brainstate', 'preprocess', ...]
registry.list_flows()                    # → [FlowEntry(...), ...]
registry.list_flows(pipe="preprocess")   # → [FlowEntry(name='ieeg', ...), ...]
registry.get(pipe="preprocess", flow="ieeg")  # → FlowEntry

Pydantic Models

from pydantic import BaseModel


class ModEntry(BaseModel):
    name: str
    rules: list[str] = []
    doc_path: str | None = None

class FlowEntry(BaseModel):
    name: str
    pipe: str
    code_path: str                      # flow_root directory
    config_path: str | None = None      # path to config.yml
    doc_path: str | None = None
    mods: dict[str, ModEntry] = {}
    app_type: str = ""                  # "snakebids" | "snakemake" | "" (detected by registry_scan)

class PipelineRegistry(BaseModel):
    flows: dict[str, FlowEntry] = {}
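The YAML schema nests flows under pipes, while the models hold a flat `flows` mapping. One way the two could be bridged is sketched below. Everything here is an assumption made for illustration: dataclasses stand in for the Pydantic models so the example needs only the standard library, the `"<pipe>/<flow>"` key format is invented, and `code_path` is taken from the first entrypoint's `flow_root`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ModRecord:  # stand-in for ModEntry
    name: str
    doc_path: Optional[str] = None


@dataclass
class FlowRecord:  # stand-in for FlowEntry
    name: str
    pipe: str
    code_path: str
    config_path: Optional[str] = None
    doc_path: Optional[str] = None
    mods: Dict[str, ModRecord] = field(default_factory=dict)


def flatten_registry(raw: dict) -> Dict[str, FlowRecord]:
    """Turn the nested pipes/flows/mods mapping into flat flow records."""
    flows: Dict[str, FlowRecord] = {}
    for pipe_name, pipe in (raw.get("pipes") or {}).items():
        for flow_name, flow in (pipe.get("flows") or {}).items():
            code = flow.get("code") or {}
            entrypoints = code.get("entrypoints") or []
            # Assumption: use the first entrypoint's flow_root, else the pipe directory.
            if entrypoints:
                code_path = entrypoints[0]["flow_root"]
            else:
                code_path = pipe["code"]["pipe_dir"]
            mods = {
                mod_name: ModRecord(name=mod_name, doc_path=(mod or {}).get("doc_dir"))
                for mod_name, mod in (flow.get("mods") or {}).items()
            }
            flows[f"{pipe_name}/{flow_name}"] = FlowRecord(
                name=flow_name,
                pipe=pipe_name,
                code_path=code_path,
                config_path=code.get("config_path"),
                doc_path=(flow.get("docs") or {}).get("doc_dir"),
                mods=mods,
            )
    return flows


raw = {
    "pipes": {
        "brainstate": {
            "code": {"pipe_dir": "code/pipelines/brainstate"},
            "flows": {
                "brainstate": {
                    "code": {
                        "config_path": "code/pipelines/brainstate/config.yml",
                        "entrypoints": [{
                            "path": "code/pipelines/brainstate/Snakefile",
                            "flow_root": "code/pipelines/brainstate",
                        }],
                    },
                    "docs": None,
                    "mods": {"brainstate": {"doc_dir": "docs/explanation/pipelines/.../mod-brainstate"}},
                }
            },
        }
    }
}
flows = flatten_registry(raw)
print(sorted(flows))
```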

Registry Generation (`pipeio registry scan`)

The scan command discovers flows by walking the pipelines directory:

1. For each <pipelines_dir>/<pipe>/<flow>/ (or <pipelines_dir>/<pipe>/ if single-flow):
   - Check for Snakefile or *.smk → entrypoints
   - Check for config.yml → config_path
   - Check for notebooks/ → notebook presence
2. For each entrypoint, scan rule names → extract mods by prefix grouping
3. Cross-reference with docs directory for documentation paths
4. Validate slugs, detect missing docs, emit warnings
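The per-flow part of the scan can be sketched with `pathlib` and a regular expression. This is not pipeio's implementation: the function names are hypothetical, the rule regex only catches plain `rule <name>:` lines, and "prefix grouping" is assumed to mean grouping by the text before the first underscore, which the spec does not define.

```python
import re
import tempfile
from pathlib import Path

# Matches Snakemake rule headers of the form "rule <name>:" at line start.
RULE_RE = re.compile(r"^rule\s+([A-Za-z_]\w*)\s*:", re.MULTILINE)


def find_entrypoints(flow_dir: Path) -> list:
    """Collect a Snakefile (if present) followed by any *.smk files."""
    found = []
    snakefile = flow_dir / "Snakefile"
    if snakefile.is_file():
        found.append(snakefile)
    found.extend(sorted(flow_dir.glob("*.smk")))
    return found


def group_rules_by_prefix(rule_names) -> dict:
    """Assumed grouping: the mod name is the text before the first underscore."""
    mods = {}
    for rule in rule_names:
        mods.setdefault(rule.split("_", 1)[0], []).append(rule)
    return mods


def scan_flow(flow_dir: Path) -> dict:
    """Inspect one flow directory for entrypoints, config, notebooks, and mods."""
    entrypoints = find_entrypoints(flow_dir)
    rules = []
    for entrypoint in entrypoints:
        rules.extend(RULE_RE.findall(entrypoint.read_text()))
    return {
        "entrypoints": [p.name for p in entrypoints],
        "has_config": (flow_dir / "config.yml").is_file(),
        "has_notebooks": (flow_dir / "notebooks").is_dir(),
        "mods": group_rules_by_prefix(rules),
    }


# Build a throwaway flow directory to exercise the scan.
with tempfile.TemporaryDirectory() as tmp:
    flow_dir = Path(tmp) / "preprocess" / "ieeg"
    flow_dir.mkdir(parents=True)
    (flow_dir / "config.yml").write_text("{}\n")
    (flow_dir / "Snakefile").write_text(
        "rule filter_notch:\n    shell: 'true'\n\n"
        "rule filter_bandpass:\n    shell: 'true'\n\n"
        "rule rereference:\n    shell: 'true'\n"
    )
    result = scan_flow(flow_dir)
print(result)
```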

Output

$ pipeio registry scan
Scanning code/pipelines/ ...
  pipe=preprocess  flow=ieeg     config=yes  mods=6  docs=yes
  pipe=preprocess  flow=ecephys  config=yes  mods=4  docs=no   ⚠ missing docs
  pipe=brainstate  flow=brainstate config=yes mods=1  docs=yes
  ...
Written: .projio/pipeio/registry.yml (8 pipes, 12 flows, 31 mods)

Registry Validation (`pipeio registry validate`)

Checks:

1. Slug compliance: all names pass slug_ok()
2. Config existence: every config_path points to an existing file
3. Docs coverage: every flow has a docs directory (warning, not error)
4. Entrypoint existence: every Snakefile/smk path exists
5. ID uniqueness: no duplicate IDs across the registry
6. Cross-flow consistency: shared mods across flows have compatible interfaces

Returns a ValidationResult with errors (blocking) and warnings (informational).
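A rough sketch of how a few of these checks could feed such a result object. The spec names `ValidationResult` but not its fields, so the `errors`/`warnings` lists, the `ok` property, and `validate_registry` are assumptions. Only docs coverage is stated to be a warning; treating the other three checks as errors is also assumed.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class ValidationResult:  # assumed shape
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def validate_registry(raw: dict, root: Path) -> ValidationResult:
    """Check ID uniqueness, config and entrypoint existence, and docs coverage."""
    result = ValidationResult()
    seen_ids = set()

    def check_id(entry_id, label):
        if entry_id in seen_ids:
            result.errors.append(f"{label}: duplicate id {entry_id!r}")
        seen_ids.add(entry_id)

    for pipe_name, pipe in (raw.get("pipes") or {}).items():
        check_id(pipe.get("id"), pipe_name)
        for flow_name, flow in (pipe.get("flows") or {}).items():
            label = f"{pipe_name}/{flow_name}"
            check_id(flow.get("id"), label)
            code = flow.get("code") or {}
            config_path = code.get("config_path")
            if config_path and not (root / config_path).is_file():
                result.errors.append(f"{label}: config not found: {config_path}")
            for entry in code.get("entrypoints") or []:
                if not (root / entry["path"]).is_file():
                    result.errors.append(f"{label}: entrypoint not found: {entry['path']}")
            doc_dir = (flow.get("docs") or {}).get("doc_dir")
            if not doc_dir or not (root / doc_dir).is_dir():
                result.warnings.append(f"{label}: missing docs")  # warning, not error
    return result


# Temporary project tree: the Snakefile exists, config.yml and docs do not.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    flow_dir = root / "code/pipelines/preprocess/ieeg"
    flow_dir.mkdir(parents=True)
    (flow_dir / "Snakefile").write_text("rule all:\n    input: []\n")
    raw = {"pipes": {"preprocess": {"id": "pipe-preprocess", "flows": {"ieeg": {
        "id": "pipe-preprocess_flow-ieeg",
        "code": {
            "config_path": "code/pipelines/preprocess/ieeg/config.yml",
            "entrypoints": [{"path": "code/pipelines/preprocess/ieeg/Snakefile"}],
        },
        "docs": None,
    }}}}}
    result = validate_registry(raw, root)
print(result.ok, result.errors, result.warnings)
```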