Feature: generic per-flow cohort status report
Motivation
Surfaced while running the pixecog spikesorting fan-out: there's no built-in way to ask pipeio "for this flow, which sessions are done, which are in-flight, which are queued, and what are the per-session yield/QC numbers". Existing tools answer adjacent but different questions:
- pipeio_flow_status: is the flow registered, are its rules wired up?
- pipeio_run_status: what's the current snakemake invocation state?
Neither walks derivatives/<flow>/ to summarize per-session output state across the cohort.
Concrete sketch from pixecog
I scratched this in pixecog at:
- code/pipelines/spikesorting/scripts/status_report.py: script that walks raw/sub-*/ses-*/ecephys/*recording-ap*ecephys.dat, computes per-session derivative state (preprocess / sort / curate / qc / complete), reads qc/.../yield.json for unit yields, and renders a markdown report.
- code/pipelines/spikesorting/Makefile: the make status target invokes the script.
- code/pipelines/spikesorting/docs/status.md: generated output (committed alongside docs).
Output sections: per-subject summary table, sessions-completed-per-day, cohort yields table (with columns task / acq / duration / total / GOOD / MUA / NOISE / NON-SOMA / yield% / sort date), in-flight, queued.
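The session walk described above could look roughly like the following. This is a minimal sketch, not the contents of status_report.py: the glob is the one quoted above, while the function name discover_sessions and the entity regex are illustrative assumptions.

```python
import re
from pathlib import Path

# Glob quoted from the description above; everything else here is illustrative.
RAW_GLOB = "sub-*/ses-*/ecephys/*recording-ap*ecephys.dat"
# Assumed BIDS-style directory layout: sub-<label>/ses-<label>/...
ENTITY_RE = re.compile(r"sub-(?P<sub>[^/_]+)/ses-(?P<ses>[^/_]+)/")


def discover_sessions(raw_root):
    """Return sorted, de-duplicated (subject, session) pairs that have an AP recording."""
    raw_root = Path(raw_root)
    found = set()
    for path in raw_root.glob(RAW_GLOB):
        # Match against the path relative to raw_root so parent folders cannot interfere.
        match = ENTITY_RE.search(path.relative_to(raw_root).as_posix())
        if match:
            found.add((match["sub"], match["ses"]))
    return sorted(found)
```

Calling discover_sessions("raw") would yield the input side of the report; the derivative state for each pair is computed separately.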
Why this generalizes
Every pipeio flow has the same shape:
- BIDS inputs declared via pybids_inputs in config.yml
- Derivatives organized as derivatives/<flow>/{mod}/sub-XX/ses-YY/ecephys/<base>_<member>.<ext> per the registry
- A <flow>.all completion marker per session
So the generic report can:
1. Load config.yml → pybids_inputs → enumerate input sessions (snakebids already does this)
2. Load manifest.yml (the output_manifest) → enumerate expected per-session outputs from the registry
3. For each session × member: probe filesystem, classify stage = furthest-along output that exists
4. Optionally read flow-specific summary JSONs (e.g. qc/.../yield.json) — schema-agnostic via a small per-flow plugin or just a YAML config like cohort_report: { summary_glob: "qc/.../*_yield.json", columns: [total_units, good_units, ...] }
5. Render markdown to a stable path (e.g. docs/<flow>/status.md)
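Steps 3 and 5 can be sketched as below. This assumes a hypothetical list of (stage, path template) pairs; a real version would derive that list from manifest.yml and the registry instead of hardcoding it. The stage names come from the pixecog script, while the ".done" marker paths and the "queued" label for sessions with no outputs are placeholders.

```python
from pathlib import Path

# Hypothetical stage -> output template mapping, ordered from earliest to latest.
# The ".done" marker paths are placeholders, not pipeio's real layout.
STAGES = [
    (name, name + "/sub-{sub}/ses-{ses}/ecephys/.done")
    for name in ("preprocess", "sort", "curate", "qc", "complete")
]


def classify_session(deriv_root, sub, ses, stages):
    """Return the furthest-along stage whose output exists, or 'queued' if none does."""
    reached = "queued"
    for name, template in stages:
        if (Path(deriv_root) / template.format(sub=sub, ses=ses)).exists():
            reached = name  # later stages overwrite earlier ones
    return reached


def render_report(flow, rows):
    """Render (sub, ses, stage) rows as a small markdown table."""
    lines = [
        f"# {flow} cohort status",
        "",
        "| subject | session | stage |",
        "| --- | --- | --- |",
    ]
    for sub, ses, stage in rows:
        lines.append(f"| sub-{sub} | ses-{ses} | {stage} |")
    return "\n".join(lines) + "\n"
```

Keeping the classifier independent of how the stage list is built means the registry-driven version only has to replace STAGES.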
Proposed surface
Two equivalent entry points:
- CLI / Makefile target: pipeio flow report-cohort <flow> writes docs/<flow>/status.md. Makes it cheap to wire make status per flow.
- MCP tool: pipeio_flow_report_cohort(flow=...) returns the rendered markdown (or a structured dict). Useful from agentic sessions.
Both should be implementable from the same core function.
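One way to share the core is sketched below. The name build_cohort_report, the docs_root parameter, and the placeholder report body are assumptions for illustration; how the MCP tool is actually registered with pipeio is not shown.

```python
import argparse
from pathlib import Path


def build_cohort_report(flow):
    """Core function (hypothetical name): placeholder body standing in for the real builder."""
    return f"# {flow} cohort status\n\n(no sessions scanned in this sketch)\n"


def pipeio_flow_report_cohort(flow):
    """MCP-style entry point: hand the rendered markdown back to the caller."""
    return build_cohort_report(flow)


def main(argv=None, docs_root="docs"):
    """CLI-style entry point: write <docs_root>/<flow>/status.md and return its path."""
    parser = argparse.ArgumentParser(prog="pipeio flow report-cohort")
    parser.add_argument("flow")
    args = parser.parse_args(argv)
    out = Path(docs_root) / args.flow / "status.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(build_cohort_report(args.flow), encoding="utf-8")
    return out
```

Both wrappers stay a few lines long, so the file on disk and the MCP response cannot drift apart.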
What the pixecog implementation does NOT solve
- Hardcodes the BIDS pattern for spikesorting (recording-ap → spikesorting.all). A generic version needs to read the flow's registry.
- Hardcodes the per-session yield JSON keys. Generic version needs a flow-level config of "what's interesting to summarize".
- No auto-refresh / watch mode: operator runs make status manually. That's probably fine; per-flow daemons would be over-engineered.
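For the hardcoded yield keys, a config-driven reader might look like this. It is a sketch under assumptions: the cohort_report dict mirrors the YAML idea from the steps above, the glob "**/*_yield.json" is a placeholder rather than the real path, and load_summaries is a hypothetical name.

```python
import json
from pathlib import Path

# Placeholder flow-level config; in practice this would be read from the flow's config.yml.
COHORT_REPORT = {
    "summary_glob": "**/*_yield.json",  # assumed pattern, not the real qc path
    "columns": ["total_units", "good_units"],
}


def load_summaries(session_dir, cohort_report):
    """Collect the configured columns from every summary JSON under session_dir."""
    rows = []
    for path in sorted(Path(session_dir).glob(cohort_report["summary_glob"])):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Missing keys become None so one incomplete file does not abort the report.
        rows.append({col: data.get(col) for col in cohort_report["columns"]})
    return rows
```

Because the reader only looks up the listed columns, a flow can add or rename summary fields by editing its config rather than the report code.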
Refs
- pixecog files (committed in pixecog today, hash TBD): code/pipelines/spikesorting/scripts/status_report.py, code/pipelines/spikesorting/Makefile, code/pipelines/spikesorting/docs/status.md.
- Source project: pixecog spikesorting flow run, fan-out tracked in docs/log/result/result-arash-20260504-005039-272664.md.
Source context: pixecog
PixEcog (pixecog): Neuropixels and ECoG dataset and analysis
Recent commits:
3d89457 notio: fold task status field — 14 distinct values → 5 canonical
ce81638 notio: Anton-feedback task batch + Master TAC update + START_HERE
2442af5 pickup-noise-cleaning: living report + Plan B coherence-failure addendum + 13.62 Hz attribution fix
README:
type: readme
Quick Start for Collaborators
Follow this checklist to get started with Pixecog documentation and workflows.
🐀 Pixecog Project — Compact Overview
Core principles
- One immutable BIDS raw dataset (raw/) as the canonical baseline
- Each analysis pipeline ha