Feature: generic per-flow cohort status report

Motivation

Surfaced while running the pixecog spikesorting fan-out: pipeio has no built-in way to ask, "For this flow, which sessions are done, which are in-flight, which are queued, and what are the per-session yield/QC numbers?" Existing tools answer adjacent but different questions:

pipeio_flow_status — is the flow registered, are its rules wired up?
pipeio_run_status — what's the current snakemake invocation state?

Neither walks derivatives/<flow>/ to summarize per-session output state across the cohort.

Concrete sketch from pixecog

I sketched this in pixecog in three files:

code/pipelines/spikesorting/scripts/status_report.py — script that walks raw/sub-*/ses-*/ecephys/*recording-ap*ecephys.dat, computes per-session derivative state (preprocess / sort / curate / qc / complete), reads qc/.../yield.json for unit yields, renders a markdown report.
code/pipelines/spikesorting/Makefile — make status target invokes the script.
code/pipelines/spikesorting/docs/status.md — generated output (committed alongside docs).

Output sections: per-subject summary table, sessions-completed-per-day, cohort yields table (with columns task / acq / duration / total / GOOD / MUA / NOISE / NON-SOMA / yield% / sort date), in-flight, queued.
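As a rough illustration of the cohort yields table, here is a minimal rendering sketch. The column names come from the list above; the leading `session` column, the function name `render_yields_table`, and the `-` placeholder for missing values are assumptions for illustration, not what status_report.py necessarily does.

```python
from typing import Dict, List

# Column order taken from the yields table described above.
COLUMNS = ["task", "acq", "duration", "total", "GOOD", "MUA",
           "NOISE", "NON-SOMA", "yield%", "sort date"]


def render_yields_table(rows: List[Dict[str, object]]) -> str:
    """Render rows as a pipe-delimited markdown table.

    Each row is a dict keyed by column name plus a hypothetical
    "session" key; missing values are shown as "-".
    """
    header = "| session | " + " | ".join(COLUMNS) + " |"
    rule = "|" + " --- |" * (len(COLUMNS) + 1)
    lines = [header, rule]
    for row in rows:
        cells = [str(row.get("session", "-"))]
        cells += [str(row.get(col, "-")) for col in COLUMNS]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Keeping the renderer a pure function of already-collected rows means the same rows could also be returned as a structured dict.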

Why this generalizes

Every pipeio flow has the same shape:

- BIDS inputs declared via pybids_inputs in config.yml
- Derivatives organized as derivatives/<flow>/{mod}/sub-XX/ses-YY/ecephys/<base>_<member>.<ext> per the registry
- A <flow>.all completion marker per session

So the generic report can:

1. Load config.yml → pybids_inputs → enumerate input sessions (snakebids already does this)
2. Load manifest.yml (the output_manifest) → enumerate expected per-session outputs from the registry
3. For each session × member: probe filesystem, classify stage = furthest-along output that exists
4. Optionally read flow-specific summary JSONs (e.g. qc/.../yield.json), kept schema-agnostic via a small per-flow plugin or just a YAML config like cohort_report: { summary_glob: "qc/.../*_yield.json", columns: [total_units, good_units, ...] }
5. Render markdown to a stable path (e.g. docs/<flow>/status.md)
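Step 3 above (classify stage = furthest-along output that exists) could look roughly like the sketch below. The stage names follow the pixecog script; the `expected` mapping stands in for whatever the registry lookup would return, and the function names and the done / in-flight / queued bucketing are hypothetical, not existing pipeio API.

```python
from pathlib import Path
from typing import Dict, List

# Stage order from the pixecog sketch; "complete" stands for the
# per-session <flow>.all marker.
STAGES = ["preprocess", "sort", "curate", "qc", "complete"]


def classify_session(expected: Dict[str, List[Path]]) -> str:
    """Return the furthest-along stage that has an existing output.

    `expected` maps stage name -> expected output paths for one session
    (assumed to be derived from the flow's registry). Returns "queued"
    when no expected output exists yet.
    """
    reached = "queued"
    for stage in STAGES:
        if any(path.exists() for path in expected.get(stage, [])):
            reached = stage
    return reached


def status_bucket(stage: str) -> str:
    """Collapse a stage into the done / in-flight / queued buckets."""
    if stage == "complete":
        return "done"
    if stage == "queued":
        return "queued"
    return "in-flight"
```

Note that this takes "furthest-along" literally: a session with a sort output but a missing preprocess output still classifies as `sort`. Whether gaps like that should be flagged is left open here.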

Proposed surface

Two equivalent entry points:

CLI / Makefile target: pipeio flow report-cohort <flow> writes docs/<flow>/status.md. Makes it cheap to wire make status per flow.
MCP tool: pipeio_flow_report_cohort(flow=...) returns the rendered markdown (or a structured dict). Useful from agentic sessions.

Both should be implementable from the same core function.
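One way to keep the two entry points equivalent is a pure core that returns markdown, with thin wrappers that differ only in what they do with it. The sketch below assumes the per-session stages have already been computed; all three function names are hypothetical and do not reflect pipeio's actual CLI or MCP registration.

```python
from pathlib import Path
from typing import Dict


def build_cohort_report(stages: Dict[str, str]) -> str:
    """Core: turn {session: stage} into markdown. Pure, no I/O."""
    lines = ["| session | stage |", "| --- | --- |"]
    for session in sorted(stages):
        lines.append(f"| {session} | {stages[session]} |")
    return "\n".join(lines) + "\n"


def cli_report_cohort(flow: str, stages: Dict[str, str], docs_root: Path) -> Path:
    """CLI-style wrapper: write <docs_root>/<flow>/status.md, return its path."""
    out = docs_root / flow / "status.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(build_cohort_report(stages), encoding="utf-8")
    return out


def tool_report_cohort(stages: Dict[str, str]) -> str:
    """MCP-style wrapper: return the rendered markdown to the caller."""
    return build_cohort_report(stages)
```

With this split, the file written by the CLI path and the string returned by the tool path are identical by construction.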

What the pixecog implementation does NOT solve

Hardcodes the BIDS pattern for spikesorting (recording-ap → spikesorting.all). A generic version needs to read the flow's registry.
Hardcodes the per-session yield JSON keys. A generic version needs a flow-level config of "what's interesting to summarize."
No auto-refresh / watch mode — operator runs make status manually. That's probably fine; per-flow daemons would be over-engineered.
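To illustrate the second point, the hardcoded yield keys could be replaced by a reader that takes the glob and column list from flow-level config. This is a sketch under assumptions: the function name `read_summary_columns` is hypothetical, the glob is whatever the flow's config supplies, and each summary file is assumed to be a flat JSON object.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional


def read_summary_columns(
    session_dir: Path, summary_glob: str, columns: List[str]
) -> Dict[str, Optional[object]]:
    """Collect the configured columns from a session's summary JSONs.

    Every file under `session_dir` matching `summary_glob` is read;
    keys not listed in `columns` are ignored, and columns absent from
    all files stay None so the report can show a gap instead of failing.
    """
    row: Dict[str, Optional[object]] = {col: None for col in columns}
    for path in sorted(session_dir.glob(summary_glob)):
        data = json.loads(path.read_text(encoding="utf-8"))
        for col in columns:
            if col in data:
                row[col] = data[col]
    return row
```

Because the reader only looks up the configured keys, a flow with a different summary schema would need a different `columns` list rather than a code change.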

Refs

pixecog files (committed in pixecog today, hash TBD): code/pipelines/spikesorting/scripts/status_report.py, code/pipelines/spikesorting/Makefile, code/pipelines/spikesorting/docs/status.md.
Source project: pixecog spikesorting flow run, fan-out tracked in docs/log/result/result-arash-20260504-005039-272664.md.

Source context: pixecog

PixEcog (pixecog): Neuropixels and ECoG dataset and analysis

Recent commits:

3d89457 notio: fold task status field — 14 distinct values → 5 canonical
ce81638 notio: Anton-feedback task batch + Master TAC update + START_HERE
2442af5 pickup-noise-cleaning: living report + Plan B coherence-failure addendum + 13.62 Hz attribution fix

README:

Quick Start for Collaborators

Follow this checklist to get started with Pixecog documentation and workflows.

🐀 Pixecog Project — Compact Overview

Core principles

One immutable BIDS raw dataset (raw/) as the canonical baseline
Each analysis pipeline ha