Feature: generic per-flow cohort status report

Motivation

Surfaced while running the pixecog spikesorting fan-out: there's no built-in way to ask pipeio "for this flow, which sessions are done, which are in-flight, which are queued, and what are the per-session yield/QC numbers". Existing tools answer adjacent but different questions:

  • pipeio_flow_status — is the flow registered, are its rules wired up?
  • pipeio_run_status — what's the current snakemake invocation state?

Neither walks derivatives/<flow>/ to summarize per-session output state across the cohort.

Concrete sketch from pixecog

I scratched this in pixecog at:

  • code/pipelines/spikesorting/scripts/status_report.py — script that walks raw/sub-*/ses-*/ecephys/*recording-ap*ecephys.dat, computes per-session derivative state (preprocess / sort / curate / qc / complete), reads qc/.../yield.json for unit yields, renders a markdown report.
  • code/pipelines/spikesorting/Makefile: the make status target invokes the script.
  • code/pipelines/spikesorting/docs/status.md — generated output (committed alongside docs).

Output sections: per-subject summary table, sessions-completed-per-day, cohort yields table (with columns task / acq / duration / total / GOOD / MUA / NOISE / NON-SOMA / yield% / sort date), in-flight, queued.

Why this generalizes

Every pipeio flow has the same shape:

  • BIDS inputs declared via pybids_inputs in config.yml
  • Derivatives organized as derivatives/<flow>/{mod}/sub-XX/ses-YY/ecephys/<base>_<member>.<ext> per the registry
  • A <flow>.all completion marker per session

So the generic report can:

  1. Load pybids_inputs from config.yml → enumerate input sessions (snakebids already does this)
  2. Load manifest.yml (the output_manifest) → enumerate expected per-session outputs from the registry
  3. For each session × member: probe the filesystem and classify stage = furthest-along output that exists
  4. Optionally read flow-specific summary JSONs (e.g. qc/.../yield.json). This stays schema-agnostic via a small per-flow plugin, or just a YAML config like cohort_report: { summary_glob: "qc/.../*_yield.json", columns: [total_units, good_units, ...] }
  5. Render markdown to a stable path (e.g. docs/<flow>/status.md)
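The stage classification in step 3 can be sketched as below. This is a minimal illustration, not pipeio code: the function names, the "queued" label for sessions with no outputs, and the idea of passing stages as ordered (name, glob pattern) pairs are all assumptions. In a real implementation the patterns would come from the flow's registry rather than being passed by hand.

```python
from pathlib import Path


def classify_stage(flow_root: Path, session: str, stages: list[tuple[str, str]]) -> str:
    """Return the furthest-along stage that has at least one output on disk.

    `stages` is ordered from earliest to latest. Each pattern is a glob
    relative to `flow_root` and may contain a `{session}` placeholder
    (hypothetical convention, e.g. "sort/{session}/ecephys/*").
    """
    reached = "queued"  # assumed label for "no outputs yet"
    for name, pattern in stages:
        # any() stops at the first match, so large directories stay cheap
        if any(flow_root.glob(pattern.format(session=session))):
            reached = name
    return reached


def bucket(stage: str, final_stage: str) -> str:
    """Collapse a stage name into the three report buckets."""
    if stage == final_stage:
        return "done"
    if stage == "queued":
        return "queued"
    return "in-flight"
```

Scanning every stage instead of stopping at the first gap means a session still reports its latest output even if an intermediate file was cleaned up. Whether that is the desired behavior is a design choice for the generic version.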

Proposed surface

Two equivalent entry points:

  • CLI / Makefile target: pipeio flow report-cohort <flow> writes docs/<flow>/status.md. Makes it cheap to wire make status per flow.
  • MCP tool: pipeio_flow_report_cohort(flow=...) returns the rendered markdown (or a structured dict). Useful from agentic sessions.

Both should be implementable from the same core function.
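One way the shared core could look, as a rough sketch: a pure function builds the report, and the two entry points only differ in what they do with it. All names here (cohort_report, render_markdown, cli_report_cohort, tool_report_cohort) are hypothetical and do not exist in pipeio today. The input is assumed to be a mapping from session id to its classified stage.

```python
from pathlib import Path


def cohort_report(stages_by_session: dict[str, str], final_stage: str) -> dict[str, list[str]]:
    """Core: group sessions into done / in_flight / queued."""
    report: dict[str, list[str]] = {"done": [], "in_flight": [], "queued": []}
    for session, stage in sorted(stages_by_session.items()):
        if stage == final_stage:
            report["done"].append(session)
        elif stage == "queued":  # assumed label for "no outputs yet"
            report["queued"].append(session)
        else:
            report["in_flight"].append(session)
    return report


def render_markdown(flow: str, report: dict[str, list[str]]) -> str:
    """Render the grouped report as a small markdown table."""
    lines = [f"# {flow} cohort status", "", "| state | n | sessions |", "|---|---|---|"]
    for state, sessions in report.items():
        lines.append(f"| {state} | {len(sessions)} | {', '.join(sessions)} |")
    return "\n".join(lines) + "\n"


def cli_report_cohort(flow: str, stages_by_session: dict[str, str],
                      final_stage: str, docs_root: Path) -> Path:
    """CLI-style entry point: write docs/<flow>/status.md and return its path."""
    out = docs_root / flow / "status.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(render_markdown(flow, cohort_report(stages_by_session, final_stage)))
    return out


def tool_report_cohort(flow: str, stages_by_session: dict[str, str],
                       final_stage: str, structured: bool = False):
    """MCP-style entry point: return markdown, or the structured dict."""
    report = cohort_report(stages_by_session, final_stage)
    return report if structured else render_markdown(flow, report)
```

Keeping cohort_report free of I/O makes it testable without a filesystem, and both wrappers stay a few lines each.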

What the pixecog implementation does NOT solve

  • Hardcodes the BIDS pattern for spikesorting (recording-ap → spikesorting.all). A generic version needs to read the flow's registry.
  • Hardcodes the per-session yield JSON keys. Generic version needs a flow-level config of "what's interesting to summarize".
  • No auto-refresh / watch mode — operator runs make status manually. That's probably fine; per-flow daemons would be over-engineered.
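For the second gap, the flow-level config could drive a schema-agnostic reader along these lines. This is a sketch under assumptions: collect_summaries is a hypothetical helper, the config is shown as a plain dict standing in for the cohort_report YAML entry, and the glob "qc/**/*_yield.json" is an illustrative stand-in rather than the real pixecog layout. Only the column names total_units and good_units come from the proposal above.

```python
import json
from pathlib import Path


def collect_summaries(flow_root: Path, summary_glob: str, columns: list[str]) -> list[dict]:
    """Read every summary JSON matching the glob and keep only the configured keys."""
    rows = []
    for path in sorted(flow_root.glob(summary_glob)):
        data = json.loads(path.read_text())
        row = {"source": path.name}
        for col in columns:
            # Missing keys become None so one odd file does not break the table
            row[col] = data.get(col)
        rows.append(row)
    return rows


# Hypothetical per-flow config, mirroring a cohort_report entry in YAML
EXAMPLE_CONFIG = {
    "summary_glob": "qc/**/*_yield.json",
    "columns": ["total_units", "good_units"],
}
```

The reader never interprets the values, so a different flow can list different columns without any code change.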

Refs

  • pixecog files (committed in pixecog today, hash TBD): code/pipelines/spikesorting/scripts/status_report.py, code/pipelines/spikesorting/Makefile, code/pipelines/spikesorting/docs/status.md.
  • Source project: pixecog spikesorting flow run, fan-out tracked in docs/log/result/result-arash-20260504-005039-272664.md.

Source context: pixecog

PixEcog (pixecog): Neuropixels and ECoG dataset and analysis

Recent commits:

3d89457 notio: fold task status field — 14 distinct values → 5 canonical
ce81638 notio: Anton-feedback task batch + Master TAC update + START_HERE
2442af5 pickup-noise-cleaning: living report + Plan B coherence-failure addendum + 13.62 Hz attribution fix

README:


type: readme


Quick Start for Collaborators

Follow this checklist to get started with Pixecog documentation and workflows.

🐀 Pixecog Project — Compact Overview

Core principles

  • One immutable BIDS raw dataset (raw/) as the canonical baseline
  • Each analysis pipeline ha