Auto-QC: flow-specific validation notebooks that produce structured result notes¶
Problem¶
After a pipeline produces derivatives, an agent cannot autonomously assess output quality. The infrastructure to capture results exists (questio-record, result notes), but the judgment — "is this SWR detection valid?" — requires domain-specific validation logic. This is the last major gap (~10%) in the agentic research machine.
Proposed pattern: validation notebooks per flow¶
Each flow that produces evidence for a hypothesis should have a validation notebook that:
1. Loads the pipeline output (derivatives)
2. Runs QC checks (domain-specific: detection rates, waveform morphology, cross-subject consistency, etc.)
3. Produces a machine-readable QC summary (JSON or YAML in a standard schema)
4. Generates validation figures
The QC summary follows a standard schema so questio-record can parse it:
# Output by validation notebook
qc:
  flow: sharpwaveripple
  milestone: swr-detection-validated
  subjects: [sub-01, sub-02, sub-03, sub-04, sub-05]
  metrics:
    detection_rate_per_minute: {mean: 12.3, std: 2.1, per_subject: [11.2, 13.4, 12.1, 10.8, 14.0]}
    mean_amplitude_uv: {mean: 145, std: 23}
    false_positive_rate: {mean: 0.03, threshold: 0.05, pass: true}
  overall_pass: true
  figures: [swr_detection_summary.html, swr_waveforms.html]
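As a rough illustration, the last cell of a validation notebook could assemble a summary of this shape along the following lines. This is a sketch under assumptions, not the project's implementation: the helper names `summarize` and `build_qc_summary` are hypothetical, the per-subject false-positive values are made-up inputs, the sample standard deviation is one possible choice, and the output is JSON because only the standard library is used.

```python
import json
import statistics


def summarize(values):
    """Return mean/std plus the raw per-subject values for one metric."""
    return {
        "mean": round(statistics.mean(values), 3),
        "std": round(statistics.stdev(values), 3),  # sample std; an assumption
        "per_subject": list(values),
    }


def build_qc_summary(flow, milestone, subjects, detection_rates,
                     false_positive_rates, fp_threshold=0.05, figures=()):
    """Assemble a QC summary dict shaped like the schema above (hypothetical helper)."""
    fp_mean = statistics.mean(false_positive_rates)
    metrics = {
        "detection_rate_per_minute": summarize(detection_rates),
        "false_positive_rate": {
            "mean": round(fp_mean, 3),
            "threshold": fp_threshold,
            "pass": fp_mean < fp_threshold,
        },
    }
    # overall_pass is true only if every metric that carries a "pass" flag passed.
    overall = all(m["pass"] for m in metrics.values() if "pass" in m)
    return {
        "qc": {
            "flow": flow,
            "milestone": milestone,
            "subjects": list(subjects),
            "metrics": metrics,
            "overall_pass": overall,
            "figures": list(figures),
        }
    }


summary = build_qc_summary(
    flow="sharpwaveripple",
    milestone="swr-detection-validated",
    subjects=["sub-01", "sub-02", "sub-03", "sub-04", "sub-05"],
    detection_rates=[11.2, 13.4, 12.1, 10.8, 14.0],
    false_positive_rates=[0.02, 0.04, 0.03, 0.03, 0.03],  # illustrative inputs only
    figures=["swr_detection_summary.html", "swr_waveforms.html"],
)
print(json.dumps(summary, indent=2))
```

Deriving `overall_pass` from the per-metric `pass` flags keeps the top-level verdict from drifting away from the individual checks; metrics without a threshold (such as amplitude) are reported but do not gate the result.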
Workflow¶
pipeio_run(flow="sharpwaveripple") → derivatives produced
→ pipeio_nb_exec(notebook="validate_swr_detection") → QC summary + figures
→ questio-record parses QC summary → structured result note
→ questio milestone updated with evidence
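The chain above can be sketched as a small driver. The real signatures of `pipeio_run`, `pipeio_nb_exec`, and questio-record are not given in this note, so the sketch takes them as injected callables; `run_auto_qc`, its parameters, and the returned dict are all hypothetical.

```python
from typing import Callable


def run_auto_qc(
    flow: str,
    notebook: str,
    run_flow: Callable[[str], None],
    exec_notebook: Callable[[str], dict],
    record: Callable[[dict], str],
) -> dict:
    """Run a flow, execute its validation notebook, and record the QC summary."""
    run_flow(flow)                     # stand-in for pipeio_run
    summary = exec_notebook(notebook)  # stand-in for pipeio_nb_exec
    qc = summary.get("qc")
    if qc is None:
        raise ValueError(f"{notebook} produced no 'qc' block")
    note_id = record(summary)          # stand-in for questio-record
    return {
        "flow": flow,
        "note": note_id,
        "milestone": qc["milestone"],
        "passed": bool(qc["overall_pass"]),
    }


# Demo with stub callables; the note id and summary contents are placeholders.
calls = []
result = run_auto_qc(
    flow="sharpwaveripple",
    notebook="validate_swr_detection",
    run_flow=lambda flow: calls.append(("run", flow)),
    exec_notebook=lambda nb: {
        "qc": {"milestone": "swr-detection-validated", "overall_pass": True}
    },
    record=lambda summary: "result-note-001",
)
print(result)
```

Passing the three steps in as callables keeps the driver testable without a pipeline run, and makes the point where a missing QC block stops the chain explicit.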
Implementation per flow¶
Each flow needs its own validation notebook. For pixecog:
| Flow | Validation notebook | Key QC metrics |
|---|---|---|
| preprocess_ieeg | validate_ttl_removal | spectral power before/after, TTL harmonic suppression |
| preprocess_ecephys | validate_ecephys_preprocessing | signal quality, noise floor, channel yield |
| sharpwaveripple | validate_swr_detection | detection rate, waveform morphology, cross-subject consistency |
| spectrogram_burst | validate_burst_detection | spindle/delta detection rates, frequency bounds, duration stats |
| coupling_spindle_ripple | validate_coupling | cross-correlogram peaks, effect sizes, statistical significance |
| brainstate | validate_brainstate | epoch durations, transition rates, agreement with manual scoring |
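The table can also be kept as a small registry so that an agent can see which flows still lack a validation notebook. The sketch below copies the flow and notebook names from the table; the function name and the assumed layout `<notebook_root>/<flow>/<notebook>.ipynb` are hypothetical and may not match pipeio's actual notebook workspace.

```python
import tempfile
from pathlib import Path

# Flow -> validation notebook, copied from the table above.
VALIDATION_NOTEBOOKS = {
    "preprocess_ieeg": "validate_ttl_removal",
    "preprocess_ecephys": "validate_ecephys_preprocessing",
    "sharpwaveripple": "validate_swr_detection",
    "spectrogram_burst": "validate_burst_detection",
    "coupling_spindle_ripple": "validate_coupling",
    "brainstate": "validate_brainstate",
}


def missing_validation_notebooks(notebook_root: Path) -> list:
    """List flows whose validation notebook is absent (directory layout is assumed)."""
    missing = []
    for flow, name in VALIDATION_NOTEBOOKS.items():
        if not (notebook_root / flow / f"{name}.ipynb").is_file():
            missing.append(flow)
    return missing


# Demo against a throwaway directory that contains only one of the notebooks.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sharpwaveripple").mkdir()
    (root / "sharpwaveripple" / "validate_swr_detection.ipynb").write_text("{}")
    missing = missing_validation_notebooks(root)

print(missing)
```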
What's generic vs project-specific¶
Generic (projio-level):
- QC summary schema (the YAML format above)
- questio-record parsing of QC output
- pipeio convention: validate_*.ipynb notebooks in each flow's notebook workspace
- A pipeio tool or skill: pipeio_validate(flow) that runs the validation notebook and returns structured QC
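On the generic side, the parsing step might look roughly like this. It is a sketch only: questio-record's real interface is not described in this note, so `parse_qc_summary`, `to_result_note`, the choice to treat every schema field as required, and the note layout are all assumptions. JSON input is used to stay within the standard library.

```python
import json

# Fields taken from the example schema; treating all of them as required is an assumption.
REQUIRED_KEYS = {"flow", "milestone", "subjects", "metrics", "overall_pass", "figures"}


def parse_qc_summary(text: str) -> dict:
    """Parse a JSON QC summary and check that the expected fields are present."""
    doc = json.loads(text)
    qc = doc.get("qc") if isinstance(doc, dict) else None
    if not isinstance(qc, dict):
        raise ValueError("missing top-level 'qc' mapping")
    missing = REQUIRED_KEYS - qc.keys()
    if missing:
        raise ValueError(f"QC summary is missing keys: {sorted(missing)}")
    if not isinstance(qc["overall_pass"], bool):
        raise ValueError("'overall_pass' must be a boolean")
    return qc


def to_result_note(qc: dict) -> str:
    """Render a short plain-text result note (the layout is hypothetical)."""
    status = "PASS" if qc["overall_pass"] else "FAIL"
    lines = [
        f"QC {status}: {qc['flow']} -> {qc['milestone']}",
        f"subjects: {', '.join(qc['subjects'])}",
    ]
    for name, metric in qc["metrics"].items():
        lines.append(f"- {name}: mean={metric.get('mean')}")
    return "\n".join(lines)


example = json.dumps({
    "qc": {
        "flow": "sharpwaveripple",
        "milestone": "swr-detection-validated",
        "subjects": ["sub-01", "sub-02"],
        "metrics": {"false_positive_rate": {"mean": 0.03, "threshold": 0.05, "pass": True}},
        "overall_pass": True,
        "figures": ["swr_detection_summary.html"],
    }
})
note = to_result_note(parse_qc_summary(example))
print(note)
```

Rejecting a summary with missing fields at parse time means a malformed notebook output fails loudly instead of producing a result note that silently lacks evidence.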
Project-specific:
- The actual validation logic (what makes a good SWR detection)
- Threshold values (detection_rate > 5/min, false_positive_rate < 0.05)
- Per-subject acceptance criteria
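One way to keep these project-specific thresholds separate from the generic machinery is a small declarative table that the validation notebook applies to its metrics. The two limits below are the example values mentioned above; the names `THRESHOLDS`, `apply_thresholds`, and `failing_subjects` are hypothetical, and treating a missing metric as a failure is an assumption.

```python
import operator

# Metric name -> (comparison, limit). Values mirror the examples in the text.
THRESHOLDS = {
    "detection_rate_per_minute": (operator.gt, 5.0),
    "false_positive_rate": (operator.lt, 0.05),
}


def apply_thresholds(metric_means: dict, thresholds: dict = THRESHOLDS) -> dict:
    """Return a pass/fail flag per thresholded metric; a missing metric fails."""
    results = {}
    for name, (compare, limit) in thresholds.items():
        value = metric_means.get(name)
        results[name] = value is not None and compare(value, limit)
    return results


def failing_subjects(subjects, per_subject_values, compare, limit):
    """Per-subject acceptance: list the subjects whose value misses the limit."""
    return [s for s, v in zip(subjects, per_subject_values) if not compare(v, limit)]


checks = apply_thresholds(
    {"detection_rate_per_minute": 12.3, "false_positive_rate": 0.03}
)
failed = failing_subjects(
    ["sub-01", "sub-02", "sub-03", "sub-04", "sub-05"],
    [11.2, 13.4, 12.1, 10.8, 14.0],
    operator.gt,
    5.0,
)
print(checks, failed)
```

Keeping the limits in data rather than in notebook logic lets a researcher change acceptance criteria without touching the code that computes or records the metrics.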
Difficulty¶
Hard — not because of infrastructure (that's done) but because each validation notebook requires scientific expertise to define "what good looks like." This is real research work: the researcher writes the validation notebook, the infrastructure automates running it and feeding results into questio.
Source context: pixecog¶
PixEcog (pixecog): Neuropixels and ECoG dataset and analysis
Recent commits:
36f9326 Add result note directory and sample note
62841d9 Add questio YAML data model (questions.yml + milestones.yml)
9b2f6fa Scaffold ecephys TTL removal mod, flow overview + mod docs, demo notebook
README:
type: readme
Quick Start for Collaborators¶
Follow this checklist to get started with Pixecog documentation and workflows.
🐀 Pixecog Project — Compact Overview¶
Core principles
- One immutable BIDS raw dataset (raw/) as the canonical baseline
- Each analysis pipeline ha
Related Notes¶
- idea-arash-20260408-035007-479946.md — Directly forms the preceding step in the same agentic loop: auto-dispatch questio_gap → pipeio_run feeds into the auto-QC → questio-record cycle proposed here
- idea-arash-20260407-225436-752515.md — studyio hypothesis-aware orchestration is the broader framing for what auto-QC enables — structured QC summaries updating questio milestones is a core studyio primitive
- idea-arash-20260330-162752-883803.md — pipeio smart read tools (mod_context, notebook metadata) are the read-side complement to pipeio_nb_exec used in the proposed auto-QC workflow
- idea-arash-20260403-172004-817050.md — The auto-QC validation notebook pattern is a concrete skill candidate for the projio ecosystem skill list discussed in that note
- idea-arash-20260330-174518-164647.md — pipeio v2 roadmap context — auto-QC validation notebooks represent a natural extension of the notebook execution infrastructure being scoped there