Systematic literature processing: biblio × questio bridge
Problem
biblio manages papers (ingest, docling, grobid, enrich, RAG). questio manages hypotheses (questions.yml, milestones, evidence from result notes). But there's no structured layer connecting them — nothing that says "paper X found Y about hypothesis H1 with parameter Z, and our dataset can reproduce/extend/contradict it."
The gap was exposed when processing Swanson et al. 2025 for pixecog — manually extracting hypothesis-relevant findings, methods parameters, and dataset opportunities took significant effort and produced output with no canonical home in the biblio derivatives tree.
Suggested tools
1. biblio_extract(citekey, force=False)
- Reads docling markdown + grobid refs for a citekey
- Reads plan/questions.yml for hypothesis definitions
- Writes structured YAML to bib/derivatives/claude/{citekey}/extract.yml
- Schema per paper:
- relevance: per-hypothesis mapping (strength: direct/supporting/tangential/none, finding, parameters, implication for our dataset)
- methods: detection methods, thresholds, bands, software used
- dataset_opportunities: reproduce / extend / contradict lists
- species, regions, recording, n_subjects, sleep_stages
- This is the LLM extraction step — uses docling full text as input, questions.yml as the evaluation frame
- Could be run in batch (biblio_extract_batch) for all papers with docling output
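The per-paper schema above can be sketched as plain Python dataclasses. This is only an illustration of the shape: the class names, field types, and the `extract_path` helper are assumptions, not an existing biblio API, and writing the record out as YAML (for example with a YAML dumper) is left out so the sketch stays standard-library only.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List

# Allowed relevance strengths, taken from the schema bullet above.
STRENGTHS = ("direct", "supporting", "tangential", "none")


@dataclass
class Relevance:
    """How one paper bears on one hypothesis (field names are illustrative)."""
    strength: str
    finding: str = ""
    parameters: Dict[str, float] = field(default_factory=dict)
    implication: str = ""

    def __post_init__(self) -> None:
        if self.strength not in STRENGTHS:
            raise ValueError(f"unknown strength: {self.strength!r}")


@dataclass
class DatasetOpportunities:
    reproduce: List[str] = field(default_factory=list)
    extend: List[str] = field(default_factory=list)
    contradict: List[str] = field(default_factory=list)


@dataclass
class PaperExtract:
    """In-memory form of one extract.yml (hypothetical layout)."""
    citekey: str
    relevance: Dict[str, Relevance] = field(default_factory=dict)  # keyed by hypothesis id
    methods: Dict[str, str] = field(default_factory=dict)
    dataset_opportunities: DatasetOpportunities = field(default_factory=DatasetOpportunities)
    species: str = ""
    regions: List[str] = field(default_factory=list)
    recording: str = ""
    n_subjects: int = 0
    sleep_stages: List[str] = field(default_factory=list)


def extract_path(citekey: str) -> str:
    """Output location named in this note."""
    return f"bib/derivatives/claude/{citekey}/extract.yml"


example = PaperExtract(
    citekey="example2025",  # placeholder citekey, not a real paper
    relevance={"H1": Relevance(strength="direct", finding="placeholder finding")},
)
record = asdict(example)  # nested plain dicts and lists, ready for a YAML dumper
```

Validating `strength` at construction time would catch an off-vocabulary label from the LLM step before it reaches the synthesis layer.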
2. questio_prior_art(question_id?)
- Reads all bib/derivatives/claude/*/extract.yml files
- Assembles per-hypothesis literature evidence tables:
- What each study found (finding, lag, effect size, species, n)
- Consensus parameters (range across studies)
- What our dataset uniquely adds
- Outputs markdown to docs/plan/prior_art/ (or integrates into questio_docs_collect)
- This is the synthesis layer — no LLM needed, just YAML aggregation + markdown generation
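A minimal sketch of that aggregation step, assuming each extract.yml has already been loaded into a dict. The function names, the dict keys, and the `lag_ms` parameter are hypothetical, and the sample records are placeholders rather than values from any paper.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def collect_by_hypothesis(extracts: List[dict]) -> Dict[str, List[dict]]:
    """Group per-paper relevance entries by hypothesis id, skipping strength 'none'."""
    table: Dict[str, List[dict]] = defaultdict(list)
    for paper in extracts:
        for hyp_id, rel in paper.get("relevance", {}).items():
            if rel.get("strength", "none") == "none":
                continue
            table[hyp_id].append({
                "citekey": paper["citekey"],
                "species": paper.get("species", ""),
                "n_subjects": paper.get("n_subjects", ""),
                "strength": rel["strength"],
                "finding": rel.get("finding", ""),
                "parameters": rel.get("parameters", {}),
            })
    return dict(table)


def consensus_range(rows: List[dict], param: str) -> Optional[Tuple[float, float]]:
    """Min and max of one numeric parameter across the studies that report it."""
    values = [r["parameters"][param] for r in rows if param in r["parameters"]]
    return (min(values), max(values)) if values else None


def to_markdown(hyp_id: str, rows: List[dict]) -> str:
    """Render one hypothesis as a small markdown evidence table."""
    lines = [
        f"{hyp_id}: prior art",
        "",
        "| Study | Strength | Finding | Species | n |",
        "| --- | --- | --- | --- | --- |",
    ]
    for r in sorted(rows, key=lambda r: r["citekey"]):
        lines.append(
            f"| {r['citekey']} | {r['strength']} | {r['finding']} "
            f"| {r['species']} | {r['n_subjects']} |"
        )
    return "\n".join(lines)


# Placeholder records standing in for loaded extract.yml files.
extracts = [
    {"citekey": "paper_a", "species": "rat", "n_subjects": 4,
     "relevance": {"H1": {"strength": "direct", "finding": "placeholder",
                          "parameters": {"lag_ms": 100.0}}}},
    {"citekey": "paper_b", "species": "mouse", "n_subjects": 6,
     "relevance": {"H1": {"strength": "supporting", "finding": "placeholder",
                          "parameters": {"lag_ms": 150.0}}}},
    {"citekey": "paper_c", "relevance": {"H1": {"strength": "none"}}},
]
table = collect_by_hypothesis(extracts)
lag_range = consensus_range(table["H1"], "lag_ms")
report = to_markdown("H1", table["H1"])
```

Keeping parameters numeric in extract.yml is what would make the consensus range a one-line reduction; free-text parameters would need a parsing step first.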
3. Extend questio_gap(question_id) response
- Currently shows: milestones, result-note evidence, blockers
- Could additionally show: "literature expectations" pulled from extract.yml files
- e.g., "4 studies report D-U→SWR lag of 100-150 ms; our milestone delta-ripple-coupling should reproduce this"
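One way the extension could look, as a sketch only: the shape of the `questio_gap` response (`question_id`, `milestones`, `evidence`, `blockers`) and the `literature_expectations` key are assumptions made for illustration, and the records are placeholders.

```python
from typing import List


def add_literature_expectations(gap: dict, extracts: List[dict]) -> dict:
    """Return a copy of a gap response with a 'literature_expectations' list added."""
    qid = gap["question_id"]
    expectations = []
    for paper in extracts:
        rel = paper.get("relevance", {}).get(qid)
        # Skip papers that do not address this question or are marked irrelevant.
        if not rel or rel.get("strength", "none") == "none":
            continue
        expectations.append({
            "citekey": paper["citekey"],
            "finding": rel.get("finding", ""),
            "parameters": rel.get("parameters", {}),
        })
    return {**gap, "literature_expectations": expectations}


# Hypothetical gap response; the milestone name comes from the example above.
gap = {"question_id": "H1", "milestones": ["delta-ripple-coupling"],
       "evidence": [], "blockers": []}
extracts = [
    {"citekey": "paper_a",
     "relevance": {"H1": {"strength": "direct", "finding": "placeholder",
                          "parameters": {"lag_ms": 100.0}}}},
    {"citekey": "paper_b",
     "relevance": {"H2": {"strength": "supporting", "finding": "placeholder"}}},
]
extended = add_literature_expectations(gap, extracts)
```

Returning a new dict rather than mutating the input keeps the existing response fields untouched, so current consumers of the gap output would not need to change.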
Why this matters
- Makes literature review a first-class pipeline output, not ad hoc notes
- Connects papers to hypotheses bidirectionally (which papers matter for H1? which hypotheses does paper X inform?)
- Captures specific parameters for pipeline validation targets (e.g., "expect 30 ms SWR→DOWN lag per Swanson 2025")
- The reproduce/extend/contradict frame drives manuscript framing directly
- Enables batch processing: ingest 20 papers → extract all → synthesize per hypothesis automatically
Design considerations
- bib/derivatives/claude/ follows the existing derivatives pattern (docling/, grobid/, openalex/)
- extract.yml is YAML, not markdown: composable for synthesis
- The extraction prompt needs the project's questions.yml as context — this is project-specific, not generic biblio
- Could live in biblio (since it's per-paper) or questio (since it's hypothesis-driven) — probably biblio since the output is per-citekey under bib/derivatives/
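If the tool does live in biblio, batch selection could follow the derivatives layout directly. The sketch below assumes docling output sits in one directory per citekey under bib/derivatives/docling/, which this note does not confirm; `pending_citekeys` is a hypothetical helper, and `force` mirrors the `force=False` argument of `biblio_extract`.

```python
import tempfile
from pathlib import Path
from typing import List


def pending_citekeys(derivatives: Path, force: bool = False) -> List[str]:
    """Citekeys that have docling output but no extract.yml yet (all of them if force)."""
    docling = derivatives / "docling"
    if not docling.is_dir():
        return []
    pending = []
    for paper_dir in sorted(p for p in docling.iterdir() if p.is_dir()):
        target = derivatives / "claude" / paper_dir.name / "extract.yml"
        if force or not target.exists():
            pending.append(paper_dir.name)
    return pending


# Demo on a throwaway tree: paper_a is already extracted, paper_b is not.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "bib" / "derivatives"
    for key in ("paper_a", "paper_b"):
        (root / "docling" / key).mkdir(parents=True)
    done = root / "claude" / "paper_a"
    done.mkdir(parents=True)
    (done / "extract.yml").write_text("citekey: paper_a\n")
    todo = pending_citekeys(root)
    todo_forced = pending_citekeys(root, force=True)
```

Treating the presence of extract.yml as the "done" marker keeps reruns cheap, since the LLM step is the expensive part of the batch.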
Source context: pixecog
PixEcog (pixecog): Neuropixels and ECoG dataset and analysis
Recent commits:
8dc0d9d Pipeline docs: gitignore docs/pipelines/, relocate hand-authored files
96cd1ec Refactor sharpwaveripple/contracts: extract generic helpers to utils/io, remove pipelines __init__.py
36f9326 Add result note directory and sample note
README:
Quick Start for Collaborators
Follow this checklist to get started with Pixecog documentation and workflows.
🐀 Pixecog Project: Compact Overview
Core principles
- One immutable BIDS raw dataset (raw/) as the canonical baseline
- Each analysis pipeline ha
Related Notes
- idea-arash-20260407-225436-752515.md — studyio proposes hypothesis-aware research orchestration — directly overlaps with the biblio×questio bridge; both aim to connect literature evidence to hypothesis tracking
- idea-arash-20260408-035007-479946.md — questio_gap→pipeio_run involves questio evidence gaps; questio_prior_art proposed here is the complementary tool that fills those gaps from the literature side
- idea-arash-20260408-035035-245990.md — auto-QC structured result notes share the same schema design problem: where canonical structured evidence lives in the derivatives tree
- idea-arash-20260326-155537-817950.md — biblio_docling async mode is a prerequisite for biblio_extract_batch — both operate on the same docling markdown output
- idea-arash-20260403-172004-817050.md — skill candidates for projio ecosystem — biblio_extract and questio_prior_art are natural additions to that candidate list