Auto-QC: flow-specific validation notebooks that produce structured result notes

Problem

After a pipeline produces derivatives, an agent cannot autonomously assess output quality. The infrastructure to capture results exists (questio-record, result notes), but the judgment — "is this SWR detection valid?" — requires domain-specific validation logic. This is the last major gap (~10%) in the agentic research machine.

Proposed pattern: validation notebooks per flow

Each flow that produces evidence for a hypothesis should have a validation notebook that:

1. Loads the pipeline output (derivatives)
2. Runs QC checks (domain-specific: detection rates, waveform morphology, cross-subject consistency, etc.)
3. Produces a machine-readable QC summary (JSON or YAML in a standard schema)
4. Generates validation figures
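
As a rough illustration of steps 2 and 3, the sketch below turns per-subject detection rates into a QC summary dictionary. The function names (`summarize_metric`, `build_qc_summary`) and the rule that every subject must exceed the minimum rate are assumptions made for this example, not existing pipeio or questio behavior. Loading derivatives and rendering figures are left out.

```python
import json
import statistics


def summarize_metric(per_subject):
    """Mean, sample standard deviation and raw values for one metric (needs >= 2 subjects)."""
    return {
        "mean": round(statistics.mean(per_subject), 2),
        "std": round(statistics.stdev(per_subject), 2),
        "per_subject": list(per_subject),
    }


def build_qc_summary(flow, milestone, subjects, rates_per_min, min_rate, figures):
    """Assemble a QC summary dict shaped like the YAML example in this note.

    Hypothetical acceptance rule: every subject must exceed min_rate.
    """
    metric = summarize_metric(rates_per_min)
    metric["threshold"] = min_rate
    metric["pass"] = all(rate > min_rate for rate in rates_per_min)
    return {
        "qc": {
            "flow": flow,
            "milestone": milestone,
            "subjects": list(subjects),
            "metrics": {"detection_rate_per_minute": metric},
            "overall_pass": metric["pass"],
            "figures": list(figures),
        }
    }


summary = build_qc_summary(
    flow="sharpwaveripple",
    milestone="swr-detection-validated",
    subjects=["sub-01", "sub-02", "sub-03", "sub-04", "sub-05"],
    rates_per_min=[11.2, 13.4, 12.1, 10.8, 14.0],
    min_rate=5.0,
    figures=["swr_detection_summary.html"],
)
# A notebook would write this to a file next to the figures; here it is only printed.
print(json.dumps(summary, indent=2))
```

JSON is used here only because it needs no third-party dependency; the same dictionary could be dumped as YAML.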

The QC summary follows a standard schema so questio-record can parse it:

# Output by validation notebook
qc:
  flow: sharpwaveripple
  milestone: swr-detection-validated
  subjects: [sub-01, sub-02, sub-03, sub-04, sub-05]
  metrics:
    detection_rate_per_minute: {mean: 12.3, std: 2.1, per_subject: [11.2, 13.4, 12.1, 10.8, 14.0]}
    mean_amplitude_uv: {mean: 145, std: 23}
    false_positive_rate: {mean: 0.03, threshold: 0.05, pass: true}
  overall_pass: true
  figures: [swr_detection_summary.html, swr_waveforms.html]
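
Before questio-record parses a summary, it could check that the document has the expected shape. The following is a minimal sketch of such a check on an already-loaded dictionary. The required-key list is inferred from the example above, and `validate_qc_summary` is a hypothetical helper, not an existing questio-record function.

```python
# Keys and types inferred from the example summary; an assumption, not a published schema.
REQUIRED_QC_KEYS = {
    "flow": str,
    "milestone": str,
    "subjects": list,
    "metrics": dict,
    "overall_pass": bool,
    "figures": list,
}


def validate_qc_summary(doc):
    """Return a list of problems; an empty list means the summary looks well-formed."""
    qc = doc.get("qc")
    if not isinstance(qc, dict):
        return ["missing top-level 'qc' mapping"]
    problems = []
    for key, expected in REQUIRED_QC_KEYS.items():
        if key not in qc:
            problems.append(f"missing key: {key}")
        elif not isinstance(qc[key], expected):
            problems.append(f"{key} should be of type {expected.__name__}")
    metrics = qc.get("metrics")
    if isinstance(metrics, dict):
        for name, metric in metrics.items():
            if not isinstance(metric, dict) or "mean" not in metric:
                problems.append(f"metric {name} needs a 'mean' field")
            elif "pass" in metric and not isinstance(metric["pass"], bool):
                problems.append(f"metric {name}: 'pass' should be a boolean")
    return problems


# The example summary from this note, written as a Python dict.
EXAMPLE = {
    "qc": {
        "flow": "sharpwaveripple",
        "milestone": "swr-detection-validated",
        "subjects": ["sub-01", "sub-02", "sub-03", "sub-04", "sub-05"],
        "metrics": {
            "detection_rate_per_minute": {"mean": 12.3, "std": 2.1},
            "mean_amplitude_uv": {"mean": 145, "std": 23},
            "false_positive_rate": {"mean": 0.03, "threshold": 0.05, "pass": True},
        },
        "overall_pass": True,
        "figures": ["swr_detection_summary.html", "swr_waveforms.html"],
    }
}
problems = validate_qc_summary(EXAMPLE)
```

Returning a list of problems rather than raising on the first one lets a result note report everything that is wrong with a summary in one pass.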

Workflow

pipeio_run(flow="sharpwaveripple") → derivatives produced
  → pipeio_nb_exec(notebook="validate_swr_detection") → QC summary + figures
    → questio-record parses QC summary → structured result note
      → questio milestone updated with evidence
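
The chain above can be sketched as a single function that takes each stage as a callable. The real `pipeio_run`, `pipeio_nb_exec` and questio-record interfaces are not specified in this note, so the four callables and their signatures below are placeholders chosen for illustration.

```python
def run_auto_qc(flow, notebook, run_flow, exec_notebook, record_result, update_milestone):
    """Run one flow through the auto-QC chain and return the result note.

    All four callables are hypothetical stand-ins:
      run_flow(flow)                  -> produces derivatives
      exec_notebook(notebook)         -> returns the QC summary dict
      record_result(summary)          -> returns a structured result note
      update_milestone(name, note)    -> attaches the note as evidence
    """
    run_flow(flow)
    summary = exec_notebook(notebook)
    note = record_result(summary)
    # The milestone name comes from the summary itself, as in the schema example.
    update_milestone(summary["qc"]["milestone"], note)
    return note


# Usage with trivial stand-ins that only record the order of calls.
calls = []
note = run_auto_qc(
    flow="sharpwaveripple",
    notebook="validate_swr_detection",
    run_flow=lambda flow: calls.append(("run", flow)),
    exec_notebook=lambda nb: calls.append(("exec", nb))
    or {"qc": {"milestone": "swr-detection-validated", "overall_pass": True}},
    record_result=lambda summary: {"passed": summary["qc"]["overall_pass"]},
    update_milestone=lambda name, n: calls.append(("milestone", name)),
)
```

Whether a failed QC should still update the milestone (as negative evidence) or stop the chain is a design decision this sketch leaves open; here the note is always attached.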

Implementation per flow

Each flow needs its own validation notebook. For pixecog:

Flow                     Validation notebook             Key QC metrics
preprocess_ieeg          validate_ttl_removal            spectral power before/after, TTL harmonic suppression
preprocess_ecephys       validate_ecephys_preprocessing  signal quality, noise floor, channel yield
sharpwaveripple          validate_swr_detection          detection rate, waveform morphology, cross-subject consistency
spectrogram_burst        validate_burst_detection        spindle/delta detection rates, frequency bounds, duration stats
coupling_spindle_ripple  validate_coupling               cross-correlogram peaks, effect sizes, statistical significance
brainstate               validate_brainstate             epoch durations, transition rates, agreement with manual scoring
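
If the flow-to-notebook mapping is kept explicit rather than discovered, it could live in a small registry. The dictionary below copies the table as-is; the names `VALIDATION_NOTEBOOKS` and `validation_notebook_for` are made up for this sketch.

```python
# Flow name -> validation notebook name, copied from the pixecog table.
VALIDATION_NOTEBOOKS = {
    "preprocess_ieeg": "validate_ttl_removal",
    "preprocess_ecephys": "validate_ecephys_preprocessing",
    "sharpwaveripple": "validate_swr_detection",
    "spectrogram_burst": "validate_burst_detection",
    "coupling_spindle_ripple": "validate_coupling",
    "brainstate": "validate_brainstate",
}


def validation_notebook_for(flow):
    """Look up the validation notebook for a flow, with a clear error if none is registered."""
    try:
        return VALIDATION_NOTEBOOKS[flow]
    except KeyError:
        raise KeyError(f"no validation notebook registered for flow {flow!r}") from None
```

An explicit error for unregistered flows keeps an agent from silently skipping QC on a flow that has no notebook yet.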

What's generic vs project-specific

Generic (projio-level):
- QC summary schema (the YAML format above)
- questio-record parsing of QC output
- pipeio convention: validate_*.ipynb notebooks in each flow's notebook workspace
- A pipeio tool or skill: pipeio_validate(flow) that runs the validation notebook and returns structured QC
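
The `validate_*.ipynb` convention makes validation notebooks discoverable without any registry. A minimal sketch of that discovery step follows, assuming the notebooks sit directly in one directory per flow; the actual layout of a pipeio notebook workspace is not described here, and `find_validation_notebooks` is a hypothetical helper.

```python
from pathlib import Path


def find_validation_notebooks(notebook_dir):
    """Return the names (without extension) of validate_*.ipynb files in notebook_dir, sorted."""
    return sorted(path.stem for path in Path(notebook_dir).glob("validate_*.ipynb"))
```

A `pipeio_validate(flow)` tool could call something like this to locate the notebook, execute it, and hand back the parsed QC summary.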

Project-specific:
- The actual validation logic (what makes a good SWR detection)
- Threshold values (detection_rate > 5/min, false_positive_rate < 0.05)
- Per-subject acceptance criteria
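
One way to keep this split clean is for the project to supply thresholds as data while a generic helper applies them. The sketch below uses the two example thresholds named above; the `(operator, limit)` encoding, the choice to compare against each metric's `mean`, and the name `apply_thresholds` are all assumptions for illustration.

```python
import operator

COMPARATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}

# Project-specific: example thresholds from this note, as (comparison, limit) pairs.
THRESHOLDS = {
    "detection_rate_per_minute": (">", 5.0),
    "false_positive_rate": ("<", 0.05),
}


def apply_thresholds(metrics, thresholds):
    """Generic: compare each metric's mean against its threshold.

    A metric that has a threshold but is absent from `metrics` counts as failed.
    """
    results = {}
    for name, (comparison, limit) in thresholds.items():
        if name not in metrics:
            results[name] = False
        else:
            results[name] = COMPARATORS[comparison](metrics[name]["mean"], limit)
    return results


metrics = {
    "detection_rate_per_minute": {"mean": 12.3},
    "false_positive_rate": {"mean": 0.03},
}
results = apply_thresholds(metrics, THRESHOLDS)
overall_pass = all(results.values())
```

Per-subject acceptance criteria would need a richer structure than a single mean per metric; this sketch covers only the flow-level thresholds.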

Difficulty

Hard — not because of infrastructure (that's done) but because each validation notebook requires scientific expertise to define "what good looks like." This is real research work: the researcher writes the validation notebook, the infrastructure automates running it and feeding results into questio.


Source context: pixecog

PixEcog (pixecog): Neuropixels and ECoG dataset and analysis

Recent commits:

36f9326 Add result note directory and sample note
62841d9 Add questio YAML data model (questions.yml + milestones.yml)
9b2f6fa Scaffold ecephys TTL removal mod, flow overview + mod docs, demo notebook

README:


Quick Start for Collaborators

Follow this checklist to get started with Pixecog documentation and workflows.

🐀 Pixecog Project — Compact Overview

Core principles

  • One immutable BIDS raw dataset (raw/) as the canonical baseline
  • Each analysis pipeline ha
  • idea-arash-20260408-035007-479946.md — Directly forms the preceding step in the same agentic loop: auto-dispatch questio_gap → pipeio_run feeds into the auto-QC → questio-record cycle proposed here
  • idea-arash-20260407-225436-752515.md — studyio hypothesis-aware orchestration is the broader framing for what auto-QC enables — structured QC summaries updating questio milestones is a core studyio primitive
  • idea-arash-20260330-162752-883803.md — pipeio smart read tools (mod_context, notebook metadata) are the read-side complement to pipeio_nb_exec used in the proposed auto-QC workflow
  • idea-arash-20260403-172004-817050.md — The auto-QC validation notebook pattern is a concrete skill candidate for the projio ecosystem skill list discussed in that note
  • idea-arash-20260330-174518-164647.md — pipeio v2 roadmap context — auto-QC validation notebooks represent a natural extension of the notebook execution infrastructure being scoped there