
pipeio: Registry Specification

Purpose

The pipeline registry maps the three-level hierarchy (pipe / flow / mod) to filesystem paths, config files, and documentation locations. It is the central index that all other pipeio operations consult.

Registry YAML Schema

The registry follows the schema of pixecog's autogenerated pipe_flow_mod_registry.yml (schema v2):

# .projio/pipeio/registry.yml  (preferred; legacy: .pipeio/registry.yml)
generated_at: '2026-03-13T13:50:08.541812+00:00'   # ISO timestamp

pipes:
  brainstate:                          # pipe name (slug)
    id: pipe-brainstate                # canonical ID
    slug_ok: true                      # passes naming convention check
    code:
      pipe_dir: code/pipelines/brainstate
    docs:
      doc_dir: docs/explanation/pipelines/pipe-brainstate
      index_md: docs/explanation/pipelines/pipe-brainstate/index.md
    flows:
      brainstate:                      # flow name (slug)
        id: pipe-brainstate_flow-brainstate
        slug_ok: true
        code:
          config_path: code/pipelines/brainstate/config.yml
          entrypoints:
            - path: code/pipelines/brainstate/Snakefile
              kind: snakefile          # snakefile | smk
              flow_root: code/pipelines/brainstate
              config_path: code/pipelines/brainstate/config.yml
        docs:
          doc_dir: docs/explanation/pipelines/pipe-brainstate/flow-brainstate
          index_md: docs/explanation/pipelines/pipe-brainstate/flow-brainstate/index.md
        mods:
          brainstate:                  # mod name (slug)
            id: pipe-brainstate_flow-brainstate_mod-brainstate
            doc_dir: docs/explanation/pipelines/.../mod-brainstate
            index_md: docs/explanation/pipelines/.../mod-brainstate/index.md
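After parsing (for example with PyYAML's yaml.safe_load), the registry is plain nested mappings, so enumerating every canonical ID is a three-level walk over pipes, flows, and mods. The following is a minimal sketch under that assumption. iter_ids is a hypothetical helper, not part of pipeio, and the sample dict is a trimmed copy of the example above.

```python
from collections.abc import Iterator
from typing import Any


def iter_ids(registry: dict[str, Any]) -> Iterator[str]:
    """Yield pipe, flow, and mod IDs in document order (hypothetical helper)."""
    for pipe in (registry.get("pipes") or {}).values():
        yield pipe["id"]
        for flow in (pipe.get("flows") or {}).values():
            yield flow["id"]
            # mods is optional on a flow entry, so treat a missing key as empty
            for mod in (flow.get("mods") or {}).values():
                yield mod["id"]


# Trimmed-down copy of the schema example, as a YAML loader would return it.
example = {
    "pipes": {
        "brainstate": {
            "id": "pipe-brainstate",
            "flows": {
                "brainstate": {
                    "id": "pipe-brainstate_flow-brainstate",
                    "mods": {
                        "brainstate": {
                            "id": "pipe-brainstate_flow-brainstate_mod-brainstate"
                        }
                    },
                }
            },
        }
    }
}

print(list(iter_ids(example)))
```

Because each level's ID embeds its parents' names, this walk also makes it easy to spot a mod or flow whose ID disagrees with its position in the tree.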

Field Definitions

In the table, flows.<name> abbreviates pipes.<pipe>.flows.<name>.

| Field | Type | Required | Description |
|---|---|---|---|
| pipes | mapping | yes | Top-level mapping of pipe slugs to pipe entries |
| pipes.<name>.id | string | yes | Canonical ID: pipe-<name> |
| pipes.<name>.slug_ok | bool | yes | Whether the name passes slug_ok() validation |
| pipes.<name>.code.pipe_dir | path | yes | Path to the pipe's code directory |
| pipes.<name>.docs | mapping or null | no | Documentation paths (null if no docs exist) |
| pipes.<name>.flows | mapping | yes | Mapping of flow slugs to flow entries |
| flows.<name>.id | string | yes | Canonical ID: pipe-<pipe>_flow-<flow> |
| flows.<name>.code.config_path | path or null | no | Path to the flow's config.yml |
| flows.<name>.code.entrypoints | list | yes | Workflow entry points (Snakefiles, .smk files) |
| flows.<name>.mods | mapping | no | Mapping of mod slugs to mod entries |
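Checking the required fields against a parsed registry might look like the following minimal sketch. check_required is a hypothetical name and its messages are illustrative; it covers only the rows marked yes in the table plus the canonical ID formats for pipes and flows, and it assumes the registry has already been loaded into nested dicts.

```python
from typing import Any


def check_required(registry: dict[str, Any]) -> list[str]:
    """Return a list of problems with required fields (hypothetical helper)."""
    problems: list[str] = []
    pipes = registry.get("pipes")
    if not isinstance(pipes, dict):
        return ["pipes: missing or not a mapping"]
    for pipe_name, pipe in pipes.items():
        where = f"pipes.{pipe_name}"
        # Canonical pipe ID: pipe-<name>
        if pipe.get("id") != f"pipe-{pipe_name}":
            problems.append(f"{where}.id: expected 'pipe-{pipe_name}'")
        if not isinstance(pipe.get("slug_ok"), bool):
            problems.append(f"{where}.slug_ok: missing or not a bool")
        if not (pipe.get("code") or {}).get("pipe_dir"):
            problems.append(f"{where}.code.pipe_dir: missing")
        flows = pipe.get("flows")
        if not isinstance(flows, dict):
            problems.append(f"{where}.flows: missing or not a mapping")
            continue
        for flow_name, flow in flows.items():
            fwhere = f"{where}.flows.{flow_name}"
            # Canonical flow ID: pipe-<pipe>_flow-<flow>
            expected = f"pipe-{pipe_name}_flow-{flow_name}"
            if flow.get("id") != expected:
                problems.append(f"{fwhere}.id: expected '{expected}'")
            if not isinstance((flow.get("code") or {}).get("entrypoints"), list):
                problems.append(f"{fwhere}.code.entrypoints: missing or not a list")
    return problems


good = {
    "pipes": {
        "brainstate": {
            "id": "pipe-brainstate",
            "slug_ok": True,
            "code": {"pipe_dir": "code/pipelines/brainstate"},
            "flows": {
                "brainstate": {
                    "id": "pipe-brainstate_flow-brainstate",
                    "code": {"entrypoints": []},
                }
            },
        }
    }
}
print(check_required(good))  # → []
```

Optional fields (docs, config_path, mods) are deliberately not checked here, since the table allows them to be absent or null.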

Slug Validation

Names must match ^[a-z][a-z0-9_]*$: a lowercase letter followed by lowercase letters, digits, or underscores. The slug_ok field records whether a name complies. Non-compliant names (e.g., DGgamma) are flagged but not rejected: they still work, but pipeio emits a warning for them.
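A minimal sketch of the slug rule, assuming slug_ok() is a plain regular-expression test (pipeio's actual implementation may differ). check_slug is a hypothetical helper that mirrors the "flag but don't reject" policy by warning instead of raising.

```python
import re
import warnings

# Lowercase letter first, then lowercase letters, digits, or underscores.
_SLUG_RE = re.compile(r"[a-z][a-z0-9_]*")


def slug_ok(name: str) -> bool:
    """Return True if `name` satisfies the pipe/flow/mod naming convention."""
    # fullmatch rejects a trailing newline, which `$` would accept with re.match
    return _SLUG_RE.fullmatch(name) is not None


def check_slug(name: str) -> bool:
    """Warn about, but do not reject, a non-compliant name (hypothetical helper)."""
    ok = slug_ok(name)
    if not ok:
        warnings.warn(f"name {name!r} does not match ^[a-z][a-z0-9_]*$")
    return ok


for name in ["brainstate", "ieeg", "DGgamma", "2pass", "pre-process"]:
    print(name, slug_ok(name))
```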

PipelineRegistry Python API

from pathlib import Path

from pipeio.registry import PipelineRegistry

# Load from YAML (preferred location; legacy projects use .pipeio/registry.yml)
registry = PipelineRegistry.from_yaml(Path(".projio/pipeio/registry.yml"))

# Query
registry.list_pipes()                    # → ['brainstate', 'preprocess', ...]
registry.list_flows()                    # → [FlowEntry(...), ...]
registry.list_flows(pipe="preprocess")   # → [FlowEntry(name='ieeg', ...), ...]
registry.get(pipe="preprocess", flow="ieeg")  # → FlowEntry
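The query semantics are only given by the comments above. The stand-in below sketches one plausible behaviour, assuming list_pipes returns sorted unique pipe names, list_flows filters on each entry's pipe field, and get raises KeyError for an unknown pipe/flow pair. MiniRegistry and the trimmed FlowEntry dataclass are hypothetical; the real classes are pydantic models in pipeio.registry, and the sample paths are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FlowEntry:
    # Subset of the FlowEntry fields listed under Pydantic Models
    name: str
    pipe: str
    code_path: str


@dataclass
class MiniRegistry:
    """Hypothetical stand-in for PipelineRegistry; query semantics are assumed."""
    flows: dict[str, FlowEntry] = field(default_factory=dict)

    def list_pipes(self) -> list[str]:
        # Unique pipe names, sorted for stable output (assumed ordering)
        return sorted({f.pipe for f in self.flows.values()})

    def list_flows(self, pipe: str | None = None) -> list[FlowEntry]:
        # All flows, or only those belonging to `pipe`
        return [f for f in self.flows.values() if pipe is None or f.pipe == pipe]

    def get(self, pipe: str, flow: str) -> FlowEntry:
        for f in self.flows.values():
            if f.pipe == pipe and f.name == flow:
                return f
        raise KeyError(f"no flow {flow!r} in pipe {pipe!r}")


reg = MiniRegistry(flows={
    "pipe-preprocess_flow-ieeg": FlowEntry("ieeg", "preprocess", "code/pipelines/preprocess/ieeg"),
    "pipe-preprocess_flow-ecephys": FlowEntry("ecephys", "preprocess", "code/pipelines/preprocess/ecephys"),
    "pipe-brainstate_flow-brainstate": FlowEntry("brainstate", "brainstate", "code/pipelines/brainstate"),
})
print(reg.list_pipes())                                     # → ['brainstate', 'preprocess']
print([f.name for f in reg.list_flows(pipe="preprocess")])  # → ['ieeg', 'ecephys']
print(reg.get(pipe="preprocess", flow="ieeg").code_path)
```

A linear scan in get is fine at registry scale (tens of flows); keying the dict by canonical flow ID would allow a direct lookup instead.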

Pydantic Models

from pydantic import BaseModel

class ModEntry(BaseModel):
    name: str
    rules: list[str] = []
    doc_path: str | None = None

class FlowEntry(BaseModel):
    name: str
    pipe: str
    code_path: str                      # flow_root directory
    config_path: str | None = None      # path to config.yml
    doc_path: str | None = None
    mods: dict[str, ModEntry] = {}
    app_type: str = ""                  # "snakebids" | "snakemake" | "" (detected by registry_scan)

class PipelineRegistry(BaseModel):
    flows: dict[str, FlowEntry] = {}
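The flat PipelineRegistry.flows mapping does not mirror the nested YAML one-to-one (for example, mods carry doc_dir in YAML but doc_path in ModEntry). Below is a minimal sketch of the translation, assuming flows are keyed by their canonical flow ID and code_path comes from the first entrypoint's flow_root. Dataclasses stand in for the pydantic models so the sketch runs without pydantic; flatten is a hypothetical helper, app_type is omitted because registry_scan detects it, and rules stays empty because the YAML example lists no rule names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ModEntry:
    name: str
    rules: list[str] = field(default_factory=list)
    doc_path: str | None = None


@dataclass
class FlowEntry:
    name: str
    pipe: str
    code_path: str
    config_path: str | None = None
    doc_path: str | None = None
    mods: dict[str, ModEntry] = field(default_factory=dict)


def flatten(registry: dict[str, Any]) -> dict[str, FlowEntry]:
    """Map nested schema-v2 YAML onto flat FlowEntry records keyed by flow ID."""
    flows: dict[str, FlowEntry] = {}
    for pipe_name, pipe in (registry.get("pipes") or {}).items():
        pipe_dir = (pipe.get("code") or {}).get("pipe_dir", "")
        for flow_name, flow in (pipe.get("flows") or {}).items():
            code = flow.get("code") or {}
            entrypoints = code.get("entrypoints") or []
            # Assumption: first entrypoint's flow_root, else the pipe directory
            code_path = entrypoints[0].get("flow_root", pipe_dir) if entrypoints else pipe_dir
            mods = {
                mod_name: ModEntry(name=mod_name, doc_path=mod.get("doc_dir"))
                for mod_name, mod in (flow.get("mods") or {}).items()
            }
            flows[flow["id"]] = FlowEntry(
                name=flow_name,
                pipe=pipe_name,
                code_path=code_path,
                config_path=code.get("config_path"),
                doc_path=(flow.get("docs") or {}).get("doc_dir"),
                mods=mods,
            )
    return flows


example = {"pipes": {"brainstate": {
    "id": "pipe-brainstate",
    "code": {"pipe_dir": "code/pipelines/brainstate"},
    "flows": {"brainstate": {
        "id": "pipe-brainstate_flow-brainstate",
        "code": {
            "config_path": "code/pipelines/brainstate/config.yml",
            "entrypoints": [{"path": "code/pipelines/brainstate/Snakefile",
                             "kind": "snakefile",
                             "flow_root": "code/pipelines/brainstate"}],
        },
        "mods": {"brainstate": {"id": "pipe-brainstate_flow-brainstate_mod-brainstate"}},
    }},
}}}
entry = flatten(example)["pipe-brainstate_flow-brainstate"]
print(entry.pipe, entry.code_path, sorted(entry.mods))
```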

Registry Generation (pipeio registry scan)

The scan command discovers flows by walking the pipelines directory. For each <pipelines_dir>/<pipe>/<flow>/ (or <pipelines_dir>/<pipe>/ for a single-flow pipe), it:

  1. Checks for a Snakefile or *.smk files → entrypoints
  2. Checks for config.yml → config_path
  3. Checks for notebooks/ → notebook presence
  4. Scans rule names in each entrypoint → extracts mods by prefix grouping
  5. Cross-references the docs directory → documentation paths
  6. Validates slugs, detects missing docs, and emits warnings
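The directory walk, the entrypoint/config/notebook checks, and the mod grouping can be sketched as follows; the docs cross-reference and slug validation are left out. Several details are assumptions, not documented pipeio behaviour: a flow is any subdirectory holding a Snakefile or *.smk file, a single-flow pipe's flow takes the pipe's name (as with brainstate/brainstate above), rules are found by a simple `rule name:` regex, and "prefix grouping" means grouping by the text before the first underscore. discover, entrypoints, and group_mods are hypothetical names, and the demo rule names are made up.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

# Matches Snakemake rule headers such as "rule filter_notch:" (simplified)
RULE_RE = re.compile(r"^\s*rule\s+([A-Za-z_]\w*)\s*:", re.MULTILINE)


def entrypoints(d: Path) -> list[Path]:
    """Snakefile and *.smk files directly inside `d`."""
    found = [d / "Snakefile"] if (d / "Snakefile").is_file() else []
    return found + sorted(d.glob("*.smk"))


def group_mods(rules: list[str]) -> dict[str, list[str]]:
    """Group rule names by the prefix before the first underscore (assumed rule)."""
    mods: dict[str, list[str]] = defaultdict(list)
    for rule in rules:
        mods[rule.split("_", 1)[0]].append(rule)
    return dict(mods)


def discover(pipelines_dir: Path) -> dict[tuple[str, str], dict]:
    flows = {}
    for pipe_dir in sorted(p for p in pipelines_dir.iterdir() if p.is_dir()):
        candidates = [(c.name, c) for c in sorted(pipe_dir.iterdir())
                      if c.is_dir() and entrypoints(c)]
        if not candidates and entrypoints(pipe_dir):
            # Single-flow pipe: the flow takes the pipe's name (assumption)
            candidates = [(pipe_dir.name, pipe_dir)]
        for flow_name, flow_root in candidates:
            eps = entrypoints(flow_root)
            rules = [r for ep in eps for r in RULE_RE.findall(ep.read_text())]
            config = flow_root / "config.yml"
            flows[(pipe_dir.name, flow_name)] = {
                "flow_root": flow_root,
                "entrypoints": eps,
                "config_path": config if config.is_file() else None,
                "has_notebooks": (flow_root / "notebooks").is_dir(),
                "mods": group_mods(rules),
            }
    return flows


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    ieeg = root / "preprocess" / "ieeg"
    ieeg.mkdir(parents=True)
    (ieeg / "Snakefile").write_text(
        "rule filter_notch:\n    shell: 'true'\n"
        "rule filter_band:\n    shell: 'true'\n"
        "rule ica:\n    shell: 'true'\n"
    )
    (ieeg / "config.yml").write_text("{}\n")
    single = root / "brainstate"
    single.mkdir()
    (single / "brainstate.smk").write_text("rule brainstate_score:\n    shell: 'true'\n")
    for key, info in discover(root).items():
        print(key, info["config_path"] is not None, info["mods"])
```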

Output

$ pipeio registry scan
Scanning code/pipelines/ ...
  pipe=preprocess  flow=ieeg     config=yes  mods=6  docs=yes
  pipe=preprocess  flow=ecephys  config=yes  mods=4  docs=no   ⚠ missing docs
  pipe=brainstate  flow=brainstate config=yes mods=1  docs=yes
  ...
Written: .projio/pipeio/registry.yml (8 pipes, 12 flows, 31 mods)

Registry Validation (pipeio registry validate)

Checks:

  1. Slug compliance — all names pass slug_ok()
  2. Config existence — every config_path points to an existing file
  3. Docs coverage — every flow has a docs directory (warning, not error)
  4. Entrypoint existence — every Snakefile/smk path exists
  5. ID uniqueness — no duplicate IDs across the registry
  6. Cross-flow consistency — shared mods across flows have compatible interfaces

Returns a ValidationResult with errors (blocking) and warnings (informational).
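A minimal sketch of a ValidationResult and of the slug, config, entrypoint, docs-coverage, and ID-uniqueness checks, assuming the registry is already parsed into nested dicts and its paths are relative to a project root. Slug failures are reported as warnings, following the Slug Validation section; whether pipeio itself treats them as errors during validation is not stated here. The cross-flow consistency check is omitted because mod interfaces are not defined in this spec. validate and the message strings are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any

SLUG = re.compile(r"[a-z][a-z0-9_]*")


@dataclass
class ValidationResult:
    errors: list[str] = field(default_factory=list)    # blocking
    warnings: list[str] = field(default_factory=list)  # informational

    @property
    def ok(self) -> bool:
        return not self.errors


def validate(registry: dict[str, Any], root: Path) -> ValidationResult:
    res = ValidationResult()
    ids: list[str] = []

    def slug(kind: str, name: str) -> None:
        if not SLUG.fullmatch(name):
            res.warnings.append(f"slug: {kind} {name!r}")

    for pipe_name, pipe in (registry.get("pipes") or {}).items():
        ids.append(pipe["id"])
        slug("pipe", pipe_name)
        for flow_name, flow in (pipe.get("flows") or {}).items():
            ids.append(flow["id"])
            slug("flow", flow_name)
            code = flow.get("code") or {}
            cfg = code.get("config_path")
            if cfg and not (root / cfg).is_file():
                res.errors.append(f"config missing: {cfg}")
            for ep in code.get("entrypoints") or []:
                if not (root / ep["path"]).is_file():
                    res.errors.append(f"entrypoint missing: {ep['path']}")
            # Docs coverage is a warning, not an error
            doc_dir = (flow.get("docs") or {}).get("doc_dir")
            if not doc_dir or not (root / doc_dir).is_dir():
                res.warnings.append(f"no docs: {flow['id']}")
            for mod_name, mod in (flow.get("mods") or {}).items():
                ids.append(mod["id"])
                slug("mod", mod_name)
    # ID uniqueness across pipes, flows, and mods
    for dup, count in Counter(ids).items():
        if count > 1:
            res.errors.append(f"duplicate id: {dup}")
    return res
```

In practice the registry dict would come from a YAML loader such as PyYAML's yaml.safe_load, and res.ok decides whether the command exits with a failure.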