Research Orchestration Layer: Design Spec
Status: draft
Date: 2026-04-07
Origin: idea-arash-20260407-225257-089158 (pixecog), idea-arash-20260407-225436-752515 (projio)
1. Problem
Projio's existing subsystems cover execution (pipeio), knowledge capture (notio, biblio, codio), retrieval (indexio), and presentation (figio, manuscripto). But no subsystem understands why work is being done. An agent can run a pipeline and capture a note, but cannot answer:
- "What pipeline runs are needed to test hypothesis H3?"
- "Is there enough evidence to draft the Results section for H4?"
- "What is the highest-impact unblocked task right now?"
- "What should the supervisor see this week?"
These require a research reasoning layer that connects hypotheses to evidence to manuscript — the missing ~40% of the research workflow.
The gap in concrete terms
| Capability | Current state | What's missing |
|---|---|---|
| Track research questions | Manual markdown in plan/ | Machine-readable registry, query tools |
| Link results to hypotheses | Unstructured note tags | Typed evidence records with hypothesis/milestone fields |
| Assess evidence sufficiency | Human reads notes and judges | Automated gap analysis per hypothesis |
| Decide what to work on next | Human reads milestones | Dependency-aware dispatch based on hypothesis impact |
| Bridge evidence to manuscript | Human copy-pastes results | Structured evidence → manuscript section mapping |
| Report progress | Human writes summary | Automated progress aggregation |
2. Design principles
- Convention over configuration: the data model uses plain YAML/markdown in plan/. No database, no daemon. Git-native.
- Read-first: most value comes from querying existing plan/ and log/ data. Write tools come later.
- Composable, not monolithic: this layer orchestrates existing subsystems (pipeio, notio, manuscripto) rather than replacing them.
- Project-local: this is per-project research reasoning. Cross-project coordination belongs to worklog (which may optionally consume this layer's data, never the reverse).
- Agentic-first: tools are designed for autonomous agent workflows (orient → plan → execute → record → assess → write).
- Graceful degradation: works with partial data. A project with only plan/questions.yml still gets value; structured result notes and milestone tracking add more.
3. Naming
Decided: questio — from Latin quaestio (question, inquiry). Research is fundamentally question-driven. The central entity is the research question (hypothesis is a specific type). The subsystem manages the cycle from question → evidence → answer → manuscript. Short, clear, follows the *-io pattern.
4. Architecture
4.1 Not a separate package
Unlike biblio or pipeio, this layer is compositional — it orchestrates existing subsystems rather than managing a new domain with substantial independent logic. Making it a separate package would create circular import issues (it needs pipeio, notio, manuscripto), add a package for what is primarily query/aggregation logic, and fragment the orchestration that is projio's core value.
4.2 Skills-first, convention-driven
Questio is implemented as five lightweight layers, not a full subsystem module:
| Layer | What | Where |
|---|---|---|
| Convention | YAML schemas for questions + milestones | plan/questions.yml, plan/milestones.yml |
| Note type | Dedicated result note type with structured frontmatter | notio.toml + docs/log/result/ |
| Deliverables | Shareable reports, presentations, posters | docs/deliverables/{reports,presentations,posters}/ |
| Skills | Prompt-based skills that compose existing MCP tools | .projio/skills/questio-*.md |
| Query tools | 2–3 thin MCP tools for structured YAML parsing | src/projio/mcp/questio.py |
The "intelligence" lives in the skill prompts — the agent uses existing tools (notio, pipeio, manuscripto) to execute. The only hard code is a thin MCP module for structured queries that skills can't do efficiently (dependency graph resolution, milestone aggregation, evidence gap analysis).
```
# Code footprint (minimal)
src/projio/
  mcp/
    questio.py               # 2–3 MCP tools: questio_status, questio_gap, questio_docs_collect

# Convention footprint (per-project, all under docs/plan/)
docs/plan/
  questions.yml              # research question registry (YAML, source of truth)
  milestones.yml             # milestone definitions with dependencies (YAML, source of truth)
  questions.md               # auto-generated from questions.yml
  milestones.md              # auto-generated from milestones.yml
  roadmap.md                 # auto-generated mermaid diagram from dependency graph
  evidence.md                # auto-generated evidence index grouped by question

# Evidence capture
docs/log/result/             # notio result notes with structured frontmatter

# Deliverables (shareable artifacts; see specs/deliverables.md)
docs/deliverables/
  reports/                   # questio-report persisted output
  presentations/             # slide decks (one dir per deck)
  posters/                   # conference posters (one dir per poster)

# Skills
.projio/skills/
  questio-status.md          # orient: show research state
  questio-next.md            # plan: recommend highest-impact work
  questio-record.md          # record: guided result capture
  questio-report.md          # report: supervisor-ready summary, persisted to deliverables/
  questio-docs-refresh.md    # docs: regenerate plan/ pages from YAML
```
4.5 Deliverables convention
Deliverables are polished artifacts for external audiences (supervisor, team, conference). They are distinct from notes (atomic knowledge in docs/log/) and manuscripts (docs/manuscript/). Three types are supported:
- Reports: progress summaries generated by questio-report, persisted as dated markdown files
- Presentations: slide decks for lab meetings, conferences, or talks
- Posters: conference posters (future: figio-based composition)
Deliverables are not notio notes — they aggregate existing data rather than capturing new knowledge. Index pages in docs/deliverables/ are auto-generated by questio_docs_collect() when the directory exists.
See the Deliverables Convention (specs/deliverables.md) for the full spec.
4.3 Data model as convention
Data lives in the project repo under plan/ as YAML files. No hidden directories, no .projio/questio/. Humans can read and edit the YAML directly. The YAML is the single source of truth; all rendered markdown in docs/plan/ is auto-generated output (like compiled.bib in the biblio pipeline).
4.4 Dedicated result note type
Evidence is captured as notio notes with a dedicated result type — not just tagged idea notes. This gives:
- Own directory (docs/log/result/), which avoids cluttering idea/ with structured data
- Own template with pre-filled frontmatter fields (question, milestone, metric, etc.)
- Own index page on the docs site
- Clear separation between exploratory ideas and structured evidence
Requires adding a result note type to notio's configuration (via project-level notio.toml).
5. Data model
5.1 Research questions (plan/questions.yml)
```yaml
# plan/questions.yml
questions:
  H1:
    text: "Do cortical delta waves precede ripple initiation?"
    type: hypothesis              # hypothesis | exploratory | descriptive
    prediction: "Large, spatially coherent cortical delta waves precede ripple initiation"
    pipelines: [spectrogram_burst, sharpwaveripple, coupling_spindle_ripple]
    milestones: [swr-detection-validated, delta-event-detection, delta-ripple-coupling]
    manuscript_section: results/h1-delta-ripple
    status: not_started           # not_started | in_progress | blocked | sufficient | confirmed | refuted
    depends_on: []                # other question IDs
    citations: ["@sirota_2003", "@isomura_2006"]
  H2:
    text: "What are the cortical origins of ripple-driving spindles?"
    type: hypothesis
    prediction: "Ripple-triggering spindles originate from specific association regions"
    pipelines: [spectrogram_burst]
    milestones: [spindle-detection-validated, spindle-topography-mapped]
    manuscript_section: results/h2-spindle-origins
    status: not_started
    citations: ["@siapas_1998", "@pedrosa_2024"]
```
Design choices:
- Questions are the top-level entity, not hypotheses — supports exploratory research too.
- type field distinguishes hypothesis (testable prediction) from exploratory (open-ended) from descriptive (characterization).
- pipelines lists pipeio flow names needed to generate evidence.
- milestones lists prerequisite milestones (defined separately).
- manuscript_section maps to manuscripto section paths.
- status uses a research-appropriate vocabulary, not generic task states.
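The type and status vocabularies listed above can be checked mechanically before any query runs. The following is a minimal sketch, assuming the YAML has already been parsed into a plain dict; `validate_question` is a hypothetical helper (not part of the spec), and treating `text` as required is an assumption.

```python
# Hypothetical validator for one entry of plan/questions.yml.
# Vocabularies are copied from the inline comments in the example above.
QUESTION_TYPES = {"hypothesis", "exploratory", "descriptive"}
QUESTION_STATUSES = {
    "not_started", "in_progress", "blocked",
    "sufficient", "confirmed", "refuted",
}
LIST_FIELDS = ("pipelines", "milestones", "depends_on", "citations")


def validate_question(qid: str, entry: dict) -> list:
    """Return a list of human-readable problems (empty list means valid)."""
    problems = []
    if not entry.get("text"):  # assumption: every question needs text
        problems.append(f"{qid}: missing 'text'")
    if entry.get("type") not in QUESTION_TYPES:
        problems.append(f"{qid}: unknown type {entry.get('type')!r}")
    if entry.get("status") not in QUESTION_STATUSES:
        problems.append(f"{qid}: unknown status {entry.get('status')!r}")
    # List-valued fields are optional (H2 in the example omits depends_on).
    for name in LIST_FIELDS:
        if not isinstance(entry.get(name, []), list):
            problems.append(f"{qid}: '{name}' must be a list")
    return problems


h2 = {
    "text": "What are the cortical origins of ripple-driving spindles?",
    "type": "hypothesis",
    "status": "not_started",
    "milestones": ["spindle-detection-validated", "spindle-topography-mapped"],
}
print(validate_question("H2", h2))
```

Returning a list of problems instead of raising keeps the check compatible with the graceful-degradation principle: a caller can report issues and still work with the valid entries.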
5.2 Milestones (plan/milestones.yml)
```yaml
# plan/milestones.yml
milestones:
  preprocessing-stable:
    description: "All preprocessing pipelines validated for all subjects"
    flow: preprocess_ieeg                              # primary pipeio flow (structured link)
    pipelines: [preprocess_ieeg, preprocess_ecephys]   # all flows, for multi-flow milestones
    depends_on: [ttl-removal-validated]
    status: in_progress
    evidence: []
  swr-detection-validated:
    description: "SWR detection validated across all subjects"
    flow: sharpwaveripple
    pipelines: [sharpwaveripple]
    depends_on: [preprocessing-stable]
    status: not_started
    evidence: []                                       # filled by agent or manually: list of note IDs
  delta-ripple-coupling:
    description: "Delta-ripple temporal coupling quantified"
    flow: coupling_spindle_ripple
    pipelines: [coupling_spindle_ripple]
    depends_on: [swr-detection-validated, delta-event-detection]
    status: not_started
    evidence: []
  ttl-removal-validated:
    description: "TTL artifact removal validated for iEEG and neuropixels"
    flow: preprocess_ieeg
    pipelines: [preprocess_ieeg]
    depends_on: []
    status: in_progress
    evidence: []
```
Design choices:
- Milestones are decoupled from questions — multiple questions can share a milestone.
- depends_on enables dependency graph resolution for dispatch.
- evidence is a list of note IDs (pointers to notio result notes).
- Status vocabulary: not_started | in_progress | blocked | complete.
- flow is the primary structured link to a pipeio flow. It enables direct resolution: questio_gap returns the flow field → the agent calls pipeio_flow_status(flow) without NLU or guessing. When flow is absent but pipelines has exactly one entry, treat that as the flow.
- pipelines is retained for milestones that span multiple flows (e.g., preprocessing-stable requires both preprocess_ieeg and preprocess_ecephys). For single-flow milestones, pipelines mirrors flow for backward compatibility.
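The fallback rule for flow described above (use flow when present, otherwise a single-entry pipelines list) is small enough to show directly. This is a minimal sketch, assuming a milestone is a plain dict parsed from milestones.yml; `resolve_flow` is a hypothetical name, and returning None for the ambiguous case is an assumption about how a caller would want to handle it.

```python
from typing import Optional


def resolve_flow(milestone: dict) -> Optional[str]:
    """Pick the pipeio flow for a milestone using the fallback rule."""
    flow = milestone.get("flow")
    if flow:
        return flow
    pipelines = milestone.get("pipelines") or []
    if len(pipelines) == 1:
        return pipelines[0]  # single pipeline doubles as the flow
    return None  # missing or ambiguous: the caller has to decide


# Multi-flow milestone with an explicit primary flow.
print(resolve_flow({"flow": "preprocess_ieeg",
                    "pipelines": ["preprocess_ieeg", "preprocess_ecephys"]}))
# Older entry without flow, exactly one pipeline.
print(resolve_flow({"pipelines": ["sharpwaveripple"]}))
# No flow and several pipelines: nothing to resolve.
print(resolve_flow({"pipelines": ["preprocess_ieeg", "preprocess_ecephys"]}))
```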
5.3 Evidence records (notio result notes)
Evidence is captured as notio notes with a dedicated result type:
```markdown
# docs/log/result/result-arash-20260415-143022-123456.md
---
title: "SWR detection rate across subjects"
tags: [result]
series: sharpwaveripple
question: [H1, H3]                      # links to questions.yml IDs
milestone: swr-detection-validated      # links to milestones.yml ID
subjects: [sub-01, sub-02, sub-03, sub-04, sub-05]
metric: detection_rate_per_minute
value: "12.3 +/- 2.1"
figure: docs/pipelines/sharpwaveripple/notebooks/swr_detection_summary.html
confidence: preliminary                 # preliminary | validated | final
---
SWR detection produces 12.3 +/- 2.1 events per minute across 5 subjects.
Rate is consistent with literature values (10-15/min during NREM).
...
```
Qualitative evidence — not all results have metrics. Observations ("the spectrograms
look clean after filtering") use the same note type with metric: qualitative and
the value field as free text:
```yaml
metric: qualitative
value: "Spectrograms show clean signal after TTL removal, no residual artifacts visible"
confidence: preliminary
```
Design choices:
- Uses notio infrastructure — no parallel note system.
- Dedicated result note type → own directory (docs/log/result/), own template, own index.
- question and milestone fields are the semantic links that enable evidence querying.
- confidence tracks how "done" a result is (preliminary findings vs validated vs final).
- metric + value support both quantitative and qualitative evidence.
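To illustrate how the question field enables evidence querying, the sketch below groups result notes by question ID. It assumes the frontmatter of each note has already been parsed into a dict keyed by note ID; `evidence_by_question` is a hypothetical helper, and accepting a bare string for `question` is an assumption (the example above uses a list).

```python
from collections import defaultdict


def evidence_by_question(notes: dict) -> dict:
    """Map question ID -> list of (note_id, confidence) pairs."""
    index = defaultdict(list)
    for note_id, frontmatter in notes.items():
        questions = frontmatter.get("question", [])
        if isinstance(questions, str):  # tolerate a single ID
            questions = [questions]
        for qid in questions:
            index[qid].append((note_id, frontmatter.get("confidence")))
    return dict(index)


# Frontmatter taken from the example result note above.
notes = {
    "result-arash-20260415-143022-123456": {
        "question": ["H1", "H3"],
        "milestone": "swr-detection-validated",
        "metric": "detection_rate_per_minute",
        "confidence": "preliminary",
    },
}
index = evidence_by_question(notes)
print(sorted(index))
```

The same pass could also group by milestone, which is what an evidence count per milestone would need.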
5.4 Compatibility with existing pixecog plan/
Pixecog already has plan/Milestones.md (markdown tables) and plan/master/03-Questions-and-Hypotheses.md (prose). The migration:
- Convert existing markdown tables → plan/questions.yml + plan/milestones.yml.
- Existing plan/Milestones.md becomes an auto-generated output (regenerated by questio_docs_collect).
- plan/master/03-Questions-and-Hypotheses.md remains as the prose narrative: YAML handles the structured/queryable data, prose handles the scientific context.
- YAML is the single source of truth. All rendered views in docs/plan/ are output artifacts.
5.5 Observation notes (mid-loop recording)
During investigate and iterate loops (see loop-mechanisms.md), agents capture mid-loop findings as lightweight observation notes. These use notio's existing idea note type — no dedicated note type is needed.
Convention:
```markdown
---
title: "Observation: sub-03 missing TTL events in first 100s"
tags: [observation, investigate]   # or [observation, iterate]
series: preprocess_ieeg            # links to the flow under investigation
---
<observation body: what was found, what it means, what to check next>
```
Design choices:
- Observations use idea notes with tags: [observation] — they do NOT need a dedicated note type. Observations are lightweight and ephemeral.
- The series field links the observation to the relevant flow, enabling lookup by flow name.
- Tags include the loop type (investigate or iterate) for filtering.
- Observations are inputs to result notes, not evidence themselves. They do not appear in milestone evidence lists.
- Observations accumulate during a loop and are referenced in the body of the final result note that records the loop's conclusion.
- The convention is: "when in doubt, write an observation note." The cost of an extra note is near zero; the cost of lost context when a loop is interrupted is high.
6. Tools and skills
Questio divides its surface into MCP tools (structured queries requiring code) and skills (prompt-based compositions of existing tools). The split follows a simple rule: if it needs to parse YAML, resolve a dependency graph, or aggregate structured data — it's a tool. If it needs judgment, composition, or natural language output — it's a skill.
6.1 MCP tools (hard code: src/projio/mcp/questio.py)
| Tool | Args | Returns | Description |
|---|---|---|---|
| questio_status | question_id? | questions with status, evidence counts, milestone completion %, blockers | Overview of research state. Parses YAML + scans result notes. No args = all questions. |
| questio_gap | question_id | unmet milestones (with flow field), missing pipeline runs, confidence levels, dependency blockers | "What's missing to answer H3?" Requires dependency resolution. Returns each milestone's flow field for direct pipeio_flow_status resolution. |
| questio_docs_collect | (none) | list of generated files | Regenerate docs/plan/ pages from YAML: questions table, milestones table, mermaid roadmap, evidence index. Follows pipeio_docs_collect pattern. |
Tool count: 3. Everything else is a skill.
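One way the dependency resolution behind questio_gap could work is a depth-first walk over a question's milestones and their transitive depends_on entries. The sketch below is illustrative only: it assumes both YAML files are already parsed into dicts, and the function name `unmet_milestones`, the returned keys, and the `undefined` status for milestones that are referenced but not defined (such as delta-event-detection in the examples) are assumptions, not the spec's API.

```python
# Sample data mirrors section 5.2 (descriptions omitted for brevity).
MILESTONES = {
    "ttl-removal-validated": {
        "flow": "preprocess_ieeg", "depends_on": [], "status": "in_progress"},
    "preprocessing-stable": {
        "flow": "preprocess_ieeg", "depends_on": ["ttl-removal-validated"],
        "status": "in_progress"},
    "swr-detection-validated": {
        "flow": "sharpwaveripple", "depends_on": ["preprocessing-stable"],
        "status": "not_started"},
    "delta-ripple-coupling": {
        "flow": "coupling_spindle_ripple",
        "depends_on": ["swr-detection-validated", "delta-event-detection"],
        "status": "not_started"},
}
H1 = {"milestones": ["swr-detection-validated", "delta-event-detection",
                     "delta-ripple-coupling"]}


def unmet_milestones(question: dict, milestones: dict) -> list:
    """List incomplete milestones for a question, dependencies first."""
    seen, order = set(), []

    def visit(mid):
        if mid in seen:  # also guards against dependency cycles
            return
        seen.add(mid)
        for dep in milestones.get(mid, {}).get("depends_on", []):
            visit(dep)
        order.append(mid)

    for mid in question.get("milestones", []):
        visit(mid)

    gaps = []
    for mid in order:
        m = milestones.get(mid)
        if m is None:  # referenced but never defined
            gaps.append({"milestone": mid, "status": "undefined",
                         "flow": None, "blocked_by": []})
        elif m.get("status") != "complete":
            blocked_by = [d for d in m.get("depends_on", [])
                          if milestones.get(d, {}).get("status") != "complete"]
            gaps.append({"milestone": mid, "status": m.get("status"),
                         "flow": m.get("flow"), "blocked_by": blocked_by})
    return gaps


gaps = unmet_milestones(H1, MILESTONES)
for gap in gaps:
    print(gap["milestone"], gap["status"], gap["flow"], gap["blocked_by"])
```

Listing dependencies first means the head of the result is always something that can be worked on or inspected with pipeio_flow_status before anything that depends on it.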
6.2 Skills (prompt-based: .projio/skills/questio-*.md)
| Skill | Composes | Description |
|---|---|---|
| questio-next | questio_status + questio_gap + pipeio_flow_status | "What should I work on?" Agent reasons over status + gaps to recommend highest-impact unblocked work. Returns flow field for direct dispatch. |
| questio-ground | paper_context + codio_discover + rag_query | Before starting work on a milestone: gather literature context, find existing code, search prior decisions. Sets quality criteria and expected values. Also supports diagnostic grounding: when entering an investigate loop, ground on expected outputs to enable comparison against actual. |
| questio-record | note_create(type="result") + YAML update | Guided result capture: agent creates a result note with proper frontmatter, then updates milestones.yml evidence list. Also supports mid-loop use: creating observation notes during investigate/iterate loops (see section 5.5). |
| questio-investigate | pipeio_target_paths + pipeio_nb_read + pipeio_log_parse + paper_context + note_create | Agent-driven deep dive into an anomaly or issue. The agent inspects pipeline outputs, compares against grounded expectations, traces causes, and proposes explanations or fixes. Replaces questio-validate: validation is agent judgment, not pre-scripted notebook execution. See loop-mechanisms.md section 2. |
| questio-iterate | pipeio_run + pipeio_nb_exec + pipeio_target_paths + questio-record | Execute-and-evaluate cycle within a human feedback loop. Replaces the standalone dispatch pattern: dispatch is one step within iterate. Covers: modify → execute → assess → report → receive feedback → next cycle. See loop-mechanisms.md section 3. |
| questio-report | questio_status + note_search | Generate supervisor-ready progress summary: milestones hit, key results, blockers, next steps. |
| questio-ready | questio_status + questio_gap + manuscript_status | "Which manuscript sections can I draft now?" Check evidence sufficiency per question. |
| questio-session | questio_status + questio-next + questio-ground + questio-report | Full research session workflow: orient → plan → ground → work → report. Phase 4 (Execute) uses the iterate loop pattern. Recognizes when investigation is needed and switches to investigate mode. |
| questio-docs-refresh | questio_docs_collect + pipeio_docs_nav | Regenerate all plan/ docs and patch mkdocs nav. |
Note on questio-validate (removed): the original design included a questio-validate skill that ran pre-scripted validation notebooks per flow. This has been replaced by the agent-as-judge philosophy: the agent inspects outputs directly and applies judgment informed by grounding context. The questio-investigate skill covers the same use case with more flexibility. See loop-mechanisms.md section 1 for the rationale.
6.3 Why this split works
Skills handle judgment calls (what's highest-impact? is this evidence sufficient? what should the supervisor see?) — these benefit from LLM reasoning and don't need deterministic code. The agent reads the structured data via questio_status/questio_gap, then applies research judgment.
MCP tools handle structured data operations (parse YAML, resolve dependency graphs, count evidence, generate markdown) — these need deterministic code and would be unreliable as prompt-only skills.
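As an example of the deterministic side of the split, the roadmap diagram can be rendered from depends_on with a few lines of string building. This is a sketch under assumptions: `mermaid_roadmap` is a hypothetical name, the node label format is invented for illustration, and hyphens in milestone IDs are replaced with underscores to keep the Mermaid node IDs conservative.

```python
def mermaid_roadmap(milestones: dict) -> str:
    """Render milestone dependencies as a Mermaid flowchart (top-down)."""

    def node_id(mid: str) -> str:
        return mid.replace("-", "_")  # conservative node identifiers

    lines = ["graph TD"]
    for mid in sorted(milestones):
        status = milestones[mid].get("status", "not_started")
        lines.append(f'    {node_id(mid)}["{mid} ({status})"]')
    for mid in sorted(milestones):
        for dep in milestones[mid].get("depends_on", []):
            # Edge points from prerequisite to dependent milestone.
            lines.append(f"    {node_id(dep)} --> {node_id(mid)}")
    return "\n".join(lines)


sample = {
    "ttl-removal-validated": {"depends_on": [], "status": "in_progress"},
    "preprocessing-stable": {"depends_on": ["ttl-removal-validated"],
                             "status": "in_progress"},
    "swr-detection-validated": {"depends_on": ["preprocessing-stable"],
                                "status": "not_started"},
}
roadmap = mermaid_roadmap(sample)
print(roadmap)
```

Sorting the keys keeps the generated file stable between runs, which matters for a Git-native workflow where roadmap.md is committed.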
6.4 Tools and skills NOT included (and why)
| Excluded | Reason |
|---|---|
| questio_dispatch (auto-run pipelines for a hypothesis) | Subsumed by questio-iterate. Dispatch is execute-without-evaluate; iterate adds the essential assess-and-feedback cycle. See loop-mechanisms.md section 3. |
| questio-validate (pre-scripted validation notebooks) | Replaced by agent-as-judge philosophy. Validation is agent judgment during the iterate/investigate loops, not a pre-scripted notebook. See loop-mechanisms.md section 1. |
| questio_milestone_update as MCP tool | YAML file edit is simple enough for a skill (questio-record) to handle via file write. |
| questio_evidence as MCP tool | questio_status returns evidence counts; the skill can note_search(tags=["result"], series=...) for full details. |
| questio_deps as MCP tool | questio_gap already returns dependency information. Mermaid diagram is in the auto-generated roadmap.md. |
| Anything calling worklog | Worklog is external. One-way data flow: worklog reads plan/ files, never the reverse. |
7. Agentic workflow
A fully autonomous research session using the tool set:
```
# 1. Orient
agent → questio_status()
  "7 questions. H1-H3 (CTX→HPC) blocked on preprocessing.
   Preprocessing: 2/4 milestones in_progress."

# 2. Plan (skill)
agent → questio-next
  "Highest impact: complete ttl-removal-validated (blocks 3 milestones,
   which block 5 hypotheses). Action: run preprocess_ieeg for remaining subjects."

# 3. Execute (via pipeio)
agent → pipeio_run(flow="preprocess_ieeg", ...)
agent → pipeio_nb_exec(flow="preprocess_ieeg", notebook="validate_ttl_removal")

# 4. Record (skill; creates the result note)
agent → questio-record(
    question=["H1","H2","H3"],
    milestone="ttl-removal-validated",
    metric="ttl_artifact_residual_uv",
    value="< 0.5 uV across all subjects",
    confidence="validated",
    title="TTL removal validated, residual < 0.5 uV"
)

# 5. Update milestone (questio-record edits milestones.yml; there is no separate MCP tool, see 6.4)
agent → milestones.yml: ttl-removal-validated status=complete,
        evidence=["result-arash-20260415-143022-123456"]

# 6. Assess
agent → questio_gap(question_id="H1")
  "ttl-removal-validated: complete. preprocessing-stable: in_progress (ecephys pending).
   swr-detection-validated: not_started. delta-event-detection: not_started.
   delta-ripple-coupling: not_started. 3 milestones remaining."

# 7. Report (skill)
agent → questio-report(period="week")
  "This week: completed TTL validation milestone. Unblocked preprocessing-stable.
   Next: ecephys preprocessing, then SWR detection.
   Blockers: none. Estimated: 2 milestones achievable next week."

# 8. Write (when ready, future sessions)
agent → questio-ready
  "H2 has sufficient evidence (2 validated results, all milestones complete).
   Manuscript section: results/h2-spindle-origins. Ready to draft."
agent → note_search(tags=["result"], series=...)   # full evidence for H2 (see 6.4)
  → feeds into manuscript_section_context for drafting
```
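The "blocks N milestones" figure in the planning step is a deterministic input that the agent can compute before applying judgment. The sketch below counts transitive dependents over the four milestones defined in section 5.2; `downstream` is a hypothetical helper, and using this count as an impact signal is an assumption about how questio-next might rank work, since the ranking itself is left to the skill.

```python
# depends_on edges copied from the milestones.yml example in section 5.2.
DEPENDS_ON = {
    "ttl-removal-validated": [],
    "preprocessing-stable": ["ttl-removal-validated"],
    "swr-detection-validated": ["preprocessing-stable"],
    "delta-ripple-coupling": ["swr-detection-validated",
                              "delta-event-detection"],
}


def downstream(milestone_id: str, depends_on: dict) -> set:
    """Return every milestone that transitively depends on milestone_id."""
    blocked, frontier = set(), [milestone_id]
    while frontier:
        current = frontier.pop()
        for mid, deps in depends_on.items():
            if current in deps and mid not in blocked:
                blocked.add(mid)
                frontier.append(mid)
    return blocked


blocked = downstream("ttl-removal-validated", DEPENDS_ON)
print(len(blocked), sorted(blocked))
```

For this sample graph, ttl-removal-validated sits at the root of the chain, so every other defined milestone depends on it directly or indirectly.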
8. Operational workflows
Questio's value is not in tracking alone — it's in enabling autonomous research loops where the agent grounds its work in literature and code, executes analyses, assesses results, and iterates. This section defines the action components, the loops they compose into, and what can be automated.
See also: loop-mechanisms.md extends this section with detailed investigate, iterate, and orient loop patterns. The inner/middle/outer loop framing here remains valid as a timescale model; loop-mechanisms.md adds concrete behavioral patterns (investigate, iterate, orient) that operate within these timescale loops, grounded in the agent-as-judge philosophy.
8.1 Action components
Every research action maps to a projio subsystem. These are the atomic operations an agent performs:
| Phase | Action | Subsystem | Tool/Skill |
|---|---|---|---|
| Ground | Check literature for methods, expected results, pitfalls | biblio | paper_context, rag_query, biblio_discover_authors |
| Ground | Find existing implementations, utilities, patterns | codio | codio_discover, codio_get, codio_vocab |
| Ground | Search project knowledge for prior work/decisions | indexio | rag_query, rag_query_multi |
| Ground | Check questio state — what's done, what's needed | questio | questio_status, questio_gap |
| Develop | Create new analysis notebook | pipeio | pipeio_nb_create |
| Develop | Update existing notebook (parameters, code) | pipeio | pipeio_nb_update, pipeio_nb_sync |
| Develop | Implement/modify pipeline rules and scripts | pipeio | pipeio_rule_insert, pipeio_script_create |
| Execute | Run notebook (single analysis) | pipeio | pipeio_nb_exec |
| Execute | Run pipeline (full dataset, Snakemake) | pipeio | pipeio_run |
| Assess | Inspect notebook outputs and figures | pipeio | pipeio_nb_read, pipeio_nb_analyze |
| Assess | Check pipeline run status and logs | pipeio | pipeio_run_status, pipeio_log_parse |
| Assess | Compare results against literature expectations | biblio + agent | paper_context + agent judgment |
| Assess | Validate against codio conventions | codio | codio_validate |
| Record | Capture structured evidence | questio | questio-record skill |
| Record | Capture unstructured observation or decision | notio | note_create |
| Record | Update milestone status | questio | via questio-record skill |
| Plan | Identify evidence gaps | questio | questio_gap |
| Plan | Recommend next action | questio | questio-next skill |
| Write | Check manuscript readiness | questio | questio-ready skill |
| Write | Draft manuscript section from evidence | manuscripto | manuscript_section_context |
| Report | Summarize progress | questio | questio-report skill |
8.2 Workflow loops
Research operates as nested loops at different timescales. Each loop has a clear entry condition, iteration logic, and exit condition.
Inner loop: Analysis iteration (minutes–hours)
The tightest loop. Agent iterates on an analysis — using notebooks, pipeline outputs, and direct file inspection as tools — until it produces satisfactory results. The agent IS the assessment layer: it reads outputs, compares against literature-grounded expectations, and makes a judgment call. Automatable when quality criteria are well-defined, but the agent's judgment quality is the bottleneck, not notebook scripting.
See loop-mechanisms.md section 3 (iterate loop) for the detailed mechanism.
```
           ┌─────────────────────────────────────────────────┐
           │             ANALYSIS ITERATION LOOP             │
           │                                                 │
ground ──→ modify (config/notebook/script) ──→ execute ──→ assess
           ↑                                                 │
           │                 unsatisfactory                  │
           └─────────────────────────────────────────────────┘
                                                             │ satisfactory
                                                             ↓
                                                      record evidence
```
Entry: milestone identified, analysis approach chosen.
Grounding: before the first iteration, the agent checks biblio for expected values/methods and codio for existing implementations.
Iteration: modify analysis (config, notebook code, script parameters) → execute (pipeio_run, pipeio_nb_exec, or other compute) → assess by reading outputs (pipeio_target_paths, pipeio_nb_read, file inspection) and comparing against expectations (paper_context, prior results). The assessment is agent-driven: the agent reads the data and judges, rather than executing a pre-scripted validation notebook.
Exit: results meet quality criteria (statistical significance, consistency with literature, no artifacts). Agent creates a result note. Milestone update follows the propose-review-confirm pattern (agent proposes, human confirms).
Example — SWR detection validation:
1. biblio: paper_context("@sirota_2003") → expected ripple rate 10-15/min NREM
2. codio: codio_discover("sharp wave ripple detection") → cogpy.detection.swr exists
3. pipeio: pipeio_nb_create(flow="sharpwaveripple", notebook="validate_swr")
4. pipeio: pipeio_nb_exec(notebook="validate_swr", subjects=["sub-01"])
5. agent: reads output, compares 12.3/min against literature range — within expected bounds
6. pipeio: pipeio_nb_update(notebook="validate_swr") → add remaining subjects
7. pipeio: pipeio_nb_exec(notebook="validate_swr", subjects=["sub-01".."sub-05"])
8. agent: reads outputs per subject, checks cross-subject consistency (12.3 ± 2.1/min)
— compares against literature, checks for outlier subjects, assesses overall quality
9. agent: creates observation notes for any subjects with anomalous results
10. questio: questio-record(milestone="swr-detection-validated", ...)
11. agent: proposes milestone status → complete. Human confirms.
Key difference from earlier design: the agent creates the validation analysis as part of the loop. Notebooks may be created and executed, but they are agent-authored analytical tools, not pre-scripted validation templates. The judgment about quality lives in the agent's reasoning, informed by grounding context.
Middle loop: Milestone completion (hours–days)
Agent works through all milestones required for a hypothesis. Semi-automated — needs checkpoint reviews between milestones.
```
                ┌──────────────────────────────────────────────────────┐
                │              MILESTONE COMPLETION LOOP               │
                │                                                      │
questio_gap ──→ pick unblocked milestone ──→ inner loop(s) ──→ update milestone
                ↑                                                      │
                │                  milestones remain                   │
                └──────────────────────────────────────────────────────┘
                                                                       │ all milestones complete
                                                                       ↓
                                                       question has sufficient evidence
```
Entry: questio_gap(H3) reveals unmet milestones.
Iteration: pick the deepest unblocked milestone → run inner loop(s) for required analyses → record evidence → update milestone → re-check gap.
Exit: all milestones for a question are complete.
Checkpoint: after each milestone completion, agent reports progress. Human may redirect priorities.
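The "pick unblocked milestone" step reduces to a filter over milestones.yml: a milestone is workable when it is not complete and all of its dependencies are. A minimal sketch, assuming parsed YAML as a dict; `unblocked` is a hypothetical helper, and treating an undefined dependency as unmet is an assumption.

```python
def unblocked(milestones: dict) -> list:
    """Return incomplete milestones whose dependencies are all complete."""
    ready = []
    for mid, m in milestones.items():
        if m.get("status") == "complete":
            continue
        deps = m.get("depends_on", [])
        # A dependency that is missing from the file counts as unmet.
        if all(milestones.get(d, {}).get("status") == "complete" for d in deps):
            ready.append(mid)
    return sorted(ready)


# Statuses and dependencies from the milestones.yml example in section 5.2.
current = {
    "ttl-removal-validated": {"depends_on": [], "status": "in_progress"},
    "preprocessing-stable": {"depends_on": ["ttl-removal-validated"],
                             "status": "in_progress"},
    "swr-detection-validated": {"depends_on": ["preprocessing-stable"],
                                "status": "not_started"},
    "delta-ripple-coupling": {
        "depends_on": ["swr-detection-validated", "delta-event-detection"],
        "status": "not_started"},
}
print(unblocked(current))

# After the first milestone is confirmed complete, the next one opens up.
after = dict(current)
after["ttl-removal-validated"] = {"depends_on": [], "status": "complete"}
print(unblocked(after))
```

Re-running the filter after each milestone update is what drives the loop: the set of workable milestones changes as upstream ones complete.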
Outer loop: Research cycle (days–weeks)
Agent works across questions, prioritizing by impact. Agent-guided, human-directed — the agent proposes, the human approves direction changes.
```
           ┌──────────────────────────────────────────────┐
           │                RESEARCH CYCLE                │
           │                                              │
orient ──→ questio-next ──→ middle loop ──→ assess ──→ report
           ↑                                              │
           │               questions remain               │
           └──────────────────────────────────────────────┘
                                                          │ question answered
                                                          ↓
                                             questio-ready → manuscript
```
Entry: session start, questio_status.
Iteration: questio-next picks highest-impact question → middle loop → report progress.
Exit (per question): evidence sufficient → draft manuscript section.
Human checkpoints: direction changes, surprising results, interpretation decisions.
8.3 Automated action sequences
These are concrete, automatable sequences that map onto skills or scheduled agents:
Sequence A: Literature-grounded development
Before implementing any analysis, the agent grounds itself in literature and code. This sequence precedes every inner loop iteration.
```
biblio: paper_context(citations from questions.yml)
  → expected methods, expected values, potential pitfalls
codio: codio_discover(keywords from milestone description)
  → existing implementations, reusable utilities
rag_query: search project notes for prior attempts
  → what was tried before, what worked/failed
→ agent synthesizes: approach, expected results, quality criteria
```
Skill: questio-ground — "before starting work on milestone X, gather context."
Sequence B: Agent-driven assessment sweep
Agent inspects pipeline outputs across subjects, comparing each against literature-grounded expectations. The agent uses available tools to read outputs, not a pre-scripted validation notebook. Assessment is agent judgment, not notebook execution.
```
for subject in subjects:
    pipeio_target_paths(flow, subject=subject) → locate output files
    read outputs (direct file inspection, pipeio_nb_read if notebook-generated)
    compare against expectations from grounding (paper_context values, prior results)
    create observation note for anomalous subjects

if all subjects meet expectations:
    questio-record(confidence="validated")
else:
    summarize failures in observation note
    enter investigate loop for anomalous subjects, or escalate to human
```
The agent may create a notebook as part of assessment (e.g., to compute summary statistics or generate comparison figures), but this is agent-authored during the loop, not a pre-existing template. The assessment logic — "is this SWR detection rate acceptable?" — comes from the agent's grounding context, not from hard-coded thresholds in a notebook.
Skill: questio-investigate — "agent-driven deep dive into pipeline outputs against grounded expectations." See loop-mechanisms.md section 2.
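The mechanical part of the sweep (compare each subject against a grounded range, then branch) can be sketched as below. This is not the assessment itself: the expected range is passed in from grounding instead of being hard-coded, and the agent still judges whether the comparison is meaningful. `sweep` is a hypothetical helper, and the per-subject values are made-up placeholders, not results.

```python
def sweep(values: dict, expected: tuple) -> dict:
    """Flag subjects whose value falls outside the grounded range."""
    low, high = expected
    anomalous = {subject: value for subject, value in values.items()
                 if not (low <= value <= high)}
    return {
        "anomalous": anomalous,
        # Mirrors the branch in the sequence above.
        "action": "investigate" if anomalous else "record",
    }


# Expected range comes from grounding (10-15/min during NREM in the
# example note); the per-subject rates here are illustrative placeholders.
placeholder_rates = {"sub-01": 12.1, "sub-02": 13.0, "sub-03": 4.2}
outcome = sweep(placeholder_rates, expected=(10.0, 15.0))
print(outcome)
```

Keeping the range as an argument reflects the point made above: the threshold belongs to the agent's grounding context, not to the code.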
Sequence C: Pipeline-to-evidence
After a pipeline completes, agent inspects results and converts outputs to evidence.
```
pipeio_run_status(flow) → check completion
pipeio_nb_exec(validation notebook) → run post-hoc analysis
pipeio_nb_read → inspect figures, metrics
biblio: compare to literature

if satisfactory:
    questio-record(question, milestone, metrics)
else:
    note_create(type="idea", tags=["observation"]) → capture what went wrong
    iterate on pipeline
```
Sequence D: Morning research session
Full session workflow from orient to report.
questio_status → orient
questio-next → pick highest-impact work
questio-ground → literature + code context
[inner loop: develop notebook/pipeline]
[middle loop: complete milestone if possible]
questio-report → summarize what was accomplished
questio_docs_collect → regenerate plan docs
Skill: questio-session — "start a research session on this project."
8.4 Human-in-the-loop checkpoints
Not everything should be automated. These decision points require human judgment:
| Decision | Why human needed | Agent's role |
|---|---|---|
| Changing research direction | Scientific judgment about hypothesis viability | Present evidence, flag surprises, suggest alternatives |
| Interpreting unexpected results | Requires domain expertise + intuition | Surface the anomaly, provide literature context |
| Judging evidence sufficiency | "Enough" is a scientific and political judgment | Report what exists, flag gaps, but don't decide |
| Choosing between competing methods | Trade-offs require contextual priorities | Present options with literature backing |
| Approving manuscript drafts | Quality and accuracy bar | Draft, but human reviews |
The agent should surface these decision points rather than silently resolving them. When an inner loop produces surprising results (metrics far from literature expectations), the agent should create an observation note and flag it rather than iterate silently.
8.5 Loop automation levels¶
| Loop | Automation | Agent autonomy | Human role |
|---|---|---|---|
| Analysis iteration (inner) | Automatable — bounded by agent judgment quality, not notebook scripting | High — iterate until quality criteria met | Set quality criteria upfront; review agent's assessment rationale when results are non-obvious |
| Assessment sweep | Agent-driven — agent inspects outputs and judges, no pre-scripted notebooks | High — read, compare, assess, record | Review flagged anomalies; confirm milestone completion via propose-review-confirm pattern |
| Milestone completion (middle) | Semi-automated | Medium — works through milestones, pauses at checkpoints | Approve direction, review evidence, confirm milestone status changes |
| Research cycle (outer) | Agent-guided | Low — proposes next steps | Approve priorities, interpret results |
| Session workflow | Structured by skill | Medium — follows session structure | Initiate, review report |
Propose-review-confirm is the default pattern for milestone status updates. The agent proposes a status change with evidence, the human reviews the rationale, and confirms or rejects. Full autonomy for milestone updates is opt-in (a future automation dial), not the default. See loop-mechanisms.md section 5.3.
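The propose-review-confirm pattern can be pictured as a small record that carries the proposed change and stays inert until a human decides. This is a minimal sketch under stated assumptions: `StatusProposal`, `review`, and `apply_proposal` are hypothetical names, and milestone statuses are represented as a plain dictionary rather than the real `milestones.yml` structure.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StatusProposal:
    """A proposed milestone status change awaiting human review (hypothetical)."""
    milestone: str
    proposed_status: str
    evidence: list[str]
    rationale: str
    decision: str = "pending"  # "pending", "confirmed", or "rejected"


def review(proposal: StatusProposal, confirmed: bool) -> StatusProposal:
    """Record the human decision on the proposal."""
    proposal.decision = "confirmed" if confirmed else "rejected"
    return proposal


def apply_proposal(statuses: dict[str, str], proposal: StatusProposal) -> dict[str, str]:
    """Return a copy of the statuses, changed only if the proposal was confirmed."""
    updated = dict(statuses)
    if proposal.decision == "confirmed":
        updated[proposal.milestone] = proposal.proposed_status
    return updated
```

Keeping the default decision at `pending` means an unreviewed proposal never changes a status, which matches the opt-in stance on full autonomy.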
8.6 Failure mode taxonomy¶
All workflow loops share a common failure vocabulary. When something goes wrong during any loop, the agent classifies the failure and acts accordingly. This taxonomy is shared across investigate, iterate, and orient loops (see loop-mechanisms.md for detailed loop definitions).
| Mode | Meaning | Agent action | Escalation path |
|---|---|---|---|
| retry | Transient failure (timeout, resource contention, network error) | Re-run with same parameters, max 2 attempts | → investigate after 2 failures |
| investigate | Unexpected output (wrong values, empty files, partial results) | Enter investigate loop: gather evidence, trace cause | → escalate if cause not found after systematic search |
| escalate | Needs human judgment (ambiguous results, scientific interpretation required) | Create observation note with all evidence gathered, present findings, ask human | Terminal — human decides next action |
| skip | Blocked by external dependency (missing data, upstream flow incomplete) | Record the blocker in an observation note, move to next unblocked item | Re-check when dependency resolves (orient loop detects this) |
| abort | Unrecoverable (corrupted data, infrastructure failure, data integrity issue) | Stop the loop immediately, create detailed observation note, alert human | Terminal — requires human intervention before any further work |
Escalation cascade: each mode has a natural escalation path. retry escalates to investigate after max attempts. investigate escalates to escalate when the agent cannot determine the cause. This prevents infinite loops and ensures humans are informed when automation cannot resolve an issue.
Classification guidance: the agent should default to investigate when uncertain. retry is only appropriate when the failure is clearly transient (identical operation succeeded before, error message indicates timeout/contention). abort is reserved for situations where continuing could corrupt data or produce misleading results.
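The taxonomy, the escalation cascade, and the "default to investigate" rule can be sketched as an enum plus two small functions. This is an illustration under assumptions, not the questio implementation: the boolean signals passed to `classify` and the names `classify`, `next_mode`, and `MAX_RETRY_ATTEMPTS` are hypothetical.

```python
from enum import Enum


class FailureMode(Enum):
    RETRY = "retry"
    INVESTIGATE = "investigate"
    ESCALATE = "escalate"
    SKIP = "skip"
    ABORT = "abort"


MAX_RETRY_ATTEMPTS = 2  # "max 2 attempts" from the table above


def classify(clearly_transient: bool = False,
             blocked_by_dependency: bool = False,
             needs_human: bool = False,
             unrecoverable: bool = False) -> FailureMode:
    """Pick a mode from hypothetical signals; fall back to INVESTIGATE when unsure."""
    if unrecoverable:
        return FailureMode.ABORT
    if blocked_by_dependency:
        return FailureMode.SKIP
    if needs_human:
        return FailureMode.ESCALATE
    if clearly_transient:
        return FailureMode.RETRY
    return FailureMode.INVESTIGATE


def next_mode(mode: FailureMode, failed_attempts: int = 0,
              cause_found: bool = False) -> FailureMode:
    """Apply the cascade: retry -> investigate -> escalate; other modes stay put."""
    if mode is FailureMode.RETRY and failed_attempts >= MAX_RETRY_ATTEMPTS:
        return FailureMode.INVESTIGATE
    if mode is FailureMode.INVESTIGATE and not cause_found:
        return FailureMode.ESCALATE
    return mode
```

Because `escalate` and `abort` never map to another mode, every path through `next_mode` ends at a human decision or at a re-check, so the cascade cannot loop.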
9. Integration with existing subsystems¶
9.1 pipeio¶
- questions.yml references pipeio flow names in pipelines field.
- questio-next resolves which pipelines need running by checking pipeio_flow_status.
- Inner loop composes pipeio_nb_create → pipeio_nb_exec → pipeio_nb_read for notebook development.
- Pipeline runs via pipeio_run; results assessed via pipeio_run_status + pipeio_log_parse.
- No changes to pipeio itself — questio reads pipeio state, doesn't modify it.
9.2 notio¶
- Evidence records are notio notes with extended frontmatter.
- Requires a result note type in notio.toml with the structured fields.
- questio_record delegates to notio's note creation infrastructure.
- questio_evidence queries notio by tags/series + parses frontmatter fields.
Proposed notio.toml addition:
[note_types.result]
mode = "event"
template = "result.md"
filename = "result-{owner}-{timestamp}"
toc_keys = ["question", "milestone", "metric", "confidence"]
toc_groupby = "series"
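To show what "parses frontmatter fields" could look like for result notes, here is a minimal sketch that reads flat `key: value` frontmatter and filters notes by question. It is an assumption-laden illustration: `parse_frontmatter` and `evidence_for` are hypothetical helpers, the parser handles only flat string fields (not nested YAML), and the sample note content is invented for the example.

```python
from __future__ import annotations


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat 'key: value' lines between the leading '---' markers."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # end of the frontmatter block
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def evidence_for(notes: list[str], question: str) -> list[dict[str, str]]:
    """Return the frontmatter of every note whose 'question' field matches."""
    parsed = (parse_frontmatter(note) for note in notes)
    return [fields for fields in parsed if fields.get("question") == question]


# Invented sample note using the toc_keys listed above.
sample = """---
question: H2
milestone: spindle-detection-validated
metric: detection_rate
confidence: validated
---
Body of the result note.
"""
matches = evidence_for([sample, "no frontmatter here"], "H2")
```

A real implementation would more likely reuse notio's own frontmatter handling; the point here is only that the four `toc_keys` are enough to answer per-question evidence queries.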
9.3 manuscripto¶
- questions.yml maps questions to manuscript sections via manuscript_section.
- questio_ready checks evidence sufficiency and reports which sections are draftable.
- questio_evidence output feeds into manuscript_section_context for drafting.
- No changes to manuscripto — questio provides structured input.
9.4 biblio¶
- questions.yml can reference citekeys in citations field.
- questio_evidence can include relevant literature alongside result notes.
- Grounding actions (sequence A) use biblio to set quality criteria and expected values before analysis.
- Assessment actions compare results against literature expectations.
- Minimal coupling — biblio is consulted, not modified.
9.5 codio¶
- Grounding actions use codio_discover and codio_get to find existing implementations before writing new code.
- Prevents re-invention: agent checks whether a method already exists in project libraries before implementing in a notebook.
- codio_validate can check that new implementations follow project conventions.
- No changes to codio — questio's grounding skills call codio read tools.
9.6 indexio¶
- rag_query and rag_query_multi provide project-wide knowledge search during grounding.
- Agent searches prior notes, decisions, and documentation before starting new work.
- questio-ground skill uses indexio to find prior attempts at similar analyses.
9.7 worklog (boundary)¶
Strict separation:
- Questio is project-local. It manages within-project research reasoning.
- Worklog is cross-project. It manages goals, capacity, scheduling across projects.
- Questio never calls worklog tools.
- Worklog may optionally read plan/questions.yml and plan/milestones.yml to derive goal progress. This is a one-way data flow: project → worklog, never the reverse.
- If worklog wants to track "pixecog-detection is 40% done," it can compute that from milestone completion in milestones.yml — it doesn't need questio tools to do so.
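The "40% done" computation needs nothing beyond the milestone statuses. A minimal sketch, assuming the YAML has already been loaded into a mapping from milestone id to a record with a `status` field; the function name and the generic milestone ids are hypothetical.

```python
from __future__ import annotations


def completion_percent(milestones: dict[str, dict]) -> float:
    """Share of milestones whose status is 'complete', as a percentage."""
    if not milestones:
        return 0.0
    done = sum(1 for record in milestones.values() if record.get("status") == "complete")
    return 100.0 * done / len(milestones)


# Illustrative data: two of five milestones complete.
example = {
    "m1": {"status": "complete"},
    "m2": {"status": "complete"},
    "m3": {"status": "in_progress"},
    "m4": {"status": "not_started"},
    "m5": {"status": "not_started"},
}
progress = completion_percent(example)
```

Since this reads only plain data from the plan/ files, a consumer such as worklog can run it without importing or calling any questio tool.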
Distinct responsibilities:
| Concern | Questio | Worklog |
|---|---|---|
| "What should I investigate?" | questio_next | — |
| "Which project should I work on?" | — | focus() |
| "How is H3 progressing?" | questio_progress("H3") | — |
| "How is pixecog overall?" | — | get_project("pixecog") |
| "What happened this week on H1-H7?" | questio_report(period="week") | — |
| "What happened this week across all projects?" | — | agenda() |
10. Docs site rendering¶
The docs site is where questio data becomes visible to humans — the supervisor, collaborators, and the researcher reviewing their own progress. questio_docs_collect generates all plan/ pages from YAML, following the same pattern as pipeio_docs_collect.
10.1 Generated pages¶
All pages below are output artifacts — auto-generated from plan/questions.yml and plan/milestones.yml. They should not be hand-edited (regeneration overwrites them).
docs/plan/questions.md — Research question registry¶
Rendered table with status indicators and cross-links:
# Research Questions
| ID | Question | Type | Status | Milestones | Evidence | Section |
|----|----------|------|--------|------------|----------|---------|
| H1 | Do cortical delta waves precede ripple initiation? | hypothesis | not_started | 0/3 | 0 results | [results/h1](../manuscript/results/h1-delta-ripple/) |
| H2 | What are the cortical origins of ripple-driving spindles? | hypothesis | in_progress | 1/2 | 2 results | [results/h2](../manuscript/results/h2-spindle-origins/) |
Status uses text markers scannable in both rendered HTML and raw markdown:
- not_started, in_progress, blocked, sufficient, confirmed, refuted
docs/plan/milestones.md — Milestone tracker¶
Rendered table replacing the hand-maintained Milestones.md:
# Milestones
## Preprocessing
| Milestone | Status | Pipeline | Depends on | Evidence |
|-----------|--------|----------|------------|----------|
| TTL artifact removal validated | complete | preprocess_ieeg | — | [result-arash-20260415-...](../log/result/result-arash-20260415-143022-123456/) |
| iEEG preprocessing stable | in_progress | preprocess_ieeg | ttl-removal-validated | |
## Event Detection
...
Milestones are grouped by the questions they serve (computed from questions.yml → milestones field). Milestones shared across questions appear in a "Shared prerequisites" group.
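The grouping rule can be sketched as follows: a milestone referenced by exactly one question goes under that question, and a milestone referenced by several goes under "Shared prerequisites". The function name and the input shape (question id mapped to its list of milestone ids, mirroring the milestones field) are assumptions for illustration, and the ids are generic placeholders.

```python
from __future__ import annotations

from collections import defaultdict

SHARED_GROUP = "Shared prerequisites"


def group_milestones(questions: dict[str, list[str]]) -> dict[str, list[str]]:
    """Group milestone ids by owning question, or under the shared group."""
    owners: dict[str, list[str]] = defaultdict(list)
    for question_id, milestone_ids in questions.items():
        for milestone_id in milestone_ids:
            if question_id not in owners[milestone_id]:
                owners[milestone_id].append(question_id)

    groups: dict[str, list[str]] = defaultdict(list)
    for milestone_id, question_ids in owners.items():
        key = question_ids[0] if len(question_ids) == 1 else SHARED_GROUP
        groups[key].append(milestone_id)
    return dict(groups)


# Placeholder ids: m1 is referenced by both questions.
grouped = group_milestones({"H1": ["m1", "m2"], "H2": ["m1", "m3"]})
```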
docs/plan/roadmap.md — Dependency diagram¶
Auto-generated mermaid graph with status-colored nodes (same structure as pixecog's current hand-maintained roadmap.md, but regenerated from YAML):
# Roadmap
graph LR
subgraph Preprocessing
M_ttl["TTL removal<br/>● complete"]:::done
M_ieeg["iEEG stable"]:::progress
end
...
M_ttl --> M_ieeg
M_ieeg --> M_swr
...
classDef done fill:#2d6a4f,stroke:#1b4332,color:#fff
classDef progress fill:#e9c46a,stroke:#f4a261,color:#000
classDef pending fill:#adb5bd,stroke:#6c757d,color:#000
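Generating this graph from YAML amounts to emitting one node line per milestone, one edge line per dependency, and the three class definitions. The sketch below is a simplified illustration that omits subgraphs and the status text inside node labels; the field names `label`, `status`, and `depends_on` and the `M_` id scheme are assumptions inferred from the example, not a confirmed schema.

```python
from __future__ import annotations

STATUS_CLASS = {"complete": "done", "in_progress": "progress"}

CLASS_DEFS = [
    "classDef done fill:#2d6a4f,stroke:#1b4332,color:#fff",
    "classDef progress fill:#e9c46a,stroke:#f4a261,color:#000",
    "classDef pending fill:#adb5bd,stroke:#6c757d,color:#000",
]


def node_id(milestone_id: str) -> str:
    """Build a mermaid-safe node id such as M_ttl."""
    return "M_" + milestone_id.replace("-", "_")


def render_roadmap(milestones: dict[str, dict]) -> str:
    """Render milestones as a mermaid 'graph LR' with status-colored nodes."""
    lines = ["graph LR"]
    for milestone_id, info in milestones.items():
        css = STATUS_CLASS.get(info.get("status"), "pending")
        label = info.get("label", milestone_id)
        lines.append(f'    {node_id(milestone_id)}["{label}"]:::{css}')
    for milestone_id, info in milestones.items():
        for dependency in info.get("depends_on", []):
            lines.append(f"    {node_id(dependency)} --> {node_id(milestone_id)}")
    lines.extend("    " + definition for definition in CLASS_DEFS)
    return "\n".join(lines)


roadmap = render_roadmap({
    "ttl": {"label": "TTL removal", "status": "complete"},
    "ieeg": {"label": "iEEG stable", "status": "in_progress", "depends_on": ["ttl"]},
})
```

Any status other than `complete` or `in_progress` falls back to the `pending` class, so new status values never break rendering.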
docs/plan/evidence.md — Evidence index¶
Evidence grouped by question, with links to result notes and figures:
# Evidence
## H1: Do cortical delta waves precede ripple initiation?
**Status:** not_started | **Milestones:** 0/3 complete | **Results:** 0
No evidence recorded yet. Required milestones:
- [ ] swr-detection-validated
- [ ] delta-event-detection
- [ ] delta-ripple-coupling
## H2: What are the cortical origins of ripple-driving spindles?
**Status:** in_progress | **Milestones:** 1/2 complete | **Results:** 2
| Result | Milestone | Metric | Value | Confidence | Date |
|--------|-----------|--------|-------|------------|------|
| [Spindle detection validated](../log/result/result-arash-...) | spindle-detection-validated | detection_rate | 8.1 ± 1.3/min | validated | 2026-04-12 |
| [Spindle topography preliminary](../log/result/result-arash-...) | spindle-topography-mapped | qualitative | "Clear frontal-parietal gradient" | preliminary | 2026-04-14 |
This page gives per-question evidence trails — combining the question overview with all linked result notes. As evidence accumulates, each question's section grows into a self-contained evidence dossier.
docs/plan/index.md — Plan overview¶
Landing page with high-level progress and links:
# Research Plan
**Progress:** 3/12 milestones complete | 1/7 questions with sufficient evidence
- [Questions](questions.md) — 7 research questions (3 hypothesis groups)
- [Milestones](milestones.md) — progress tracker with evidence links
- [Roadmap](roadmap.md) — visual dependency graph
- [Evidence](evidence.md) — result index by question
- [Backlog](../log/result/) — all result notes
10.2 Result note index¶
The dedicated result note type gets its own auto-generated index at docs/log/result/index.md (standard notio index behavior). This provides a chronological view of all evidence, complementing the per-question grouping in docs/plan/evidence.md.
10.3 MkDocs nav integration¶
questio_docs_collect also patches the mkdocs nav (via pipeio_mkdocs_nav_patch or equivalent) to include the plan section:
nav:
  - Plan:
    - plan/index.md
    - Questions: plan/questions.md
    - Milestones: plan/milestones.md
    - Roadmap: plan/roadmap.md
    - Evidence: plan/evidence.md
  - Log:
    - Results: log/result/index.md
    # ... other note types
10.4 Rendering workflow¶
plan/questions.yml ──┐
                     ├──→ questio_docs_collect ──→ docs/plan/*.md ──→ mkdocs build ──→ site
plan/milestones.yml ─┘             │
                                   ├──→ mkdocs nav patch
docs/log/result/*.md ─── (notio index) ──→ docs/log/result/index.md
The skill questio-docs-refresh wraps this: calls questio_docs_collect, then triggers a site rebuild if desired.
11. Implementation phases¶
Phase 0: Data model convention (no code)¶
- Define questions.yml and milestones.yml YAML schemas (JSON Schema or documented convention).
- Validate with pixecog: convert existing plan/Milestones.md and plan/master/03-Questions-and-Hypotheses.md to YAML.
- Add result note type to pixecog's notio.toml.
- Create 2–3 manual result notes with structured frontmatter to test the schema.
- Deliverable: pixecog has plan/questions.yml, plan/milestones.yml, and a few docs/log/result/ notes.
Phase 1: Docs generation (questio_docs_collect)¶
- Implement questio_docs_collect MCP tool — generates docs/plan/ pages from YAML.
- Includes questions table, milestones table, mermaid roadmap, evidence index, plan index.
- Patches mkdocs nav.
- Deliverable: pixecog docs site shows auto-generated plan pages.
Phase 2: Query tools (questio_status, questio_gap)¶
- questio_status — parse YAML + scan result notes, return structured overview.
- questio_gap — dependency resolution, evidence gap analysis per question.
- Deliverable: agent can orient at session start and assess what's missing.
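The dependency-resolution part of a gap analysis can be sketched as splitting a question's incomplete milestones into those that can start now and those still waiting on a prerequisite. This is a hedged illustration: the function name, the returned keys, and the `status` / `depends_on` field names are assumptions, and the milestone ids are placeholders.

```python
from __future__ import annotations


def milestone_gap(question_milestones: list[str],
                  milestones: dict[str, dict]) -> dict[str, list[str]]:
    """Split a question's incomplete milestones into unblocked and blocked."""
    done = {mid for mid, info in milestones.items() if info.get("status") == "complete"}
    missing = [mid for mid in question_milestones if mid not in done]
    unblocked = [
        mid for mid in missing
        if all(dep in done for dep in milestones.get(mid, {}).get("depends_on", []))
    ]
    blocked = [mid for mid in missing if mid not in unblocked]
    return {"missing": missing, "unblocked": unblocked, "blocked": blocked}


# Placeholder chain: a -> b -> c, with only a complete.
chain = {
    "a": {"status": "complete"},
    "b": {"status": "in_progress", "depends_on": ["a"]},
    "c": {"status": "not_started", "depends_on": ["b"]},
}
report = milestone_gap(["a", "b", "c"], chain)
```

The `unblocked` list is the natural input for a later recommendation step, while `blocked` explains why the remaining milestones cannot start yet.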
Phase 3: Skills¶
- questio-next — compose status + gap + pipeio to recommend work.
- questio-record — guided result capture with note creation + milestone YAML update.
- questio-report — supervisor-ready summary.
- questio-ready — manuscript readiness check.
- questio-docs-refresh — regenerate docs.
- Deliverable: full agentic research cycle is possible.
Phase 4: Agent instructions integration¶
- agent_instructions() detects plan/questions.yml and includes a questio summary in session context.
- questio-session-start skill for full orientation.
- Deliverable: agents automatically know the research state when entering a questio-enabled project.
12. Open questions¶
- Milestone auto-update — when questio-record creates a result note for a milestone, should the skill auto-update the milestone YAML? Or require explicit confirmation? Auto-update is convenient but risks premature status changes.
- questio_next sophistication — the skill needs to reason about what's highest-impact. Should it just sort by dependency depth (simple), or weigh hypothesis impact, pipeline cost, and evidence gaps (complex)? Start simple, iterate?
- Multi-study projects — some projects may have multiple independent studies (e.g., a methods paper + an application paper). Should questions.yml support study grouping, or is that a future concern?
- Agent instructions integration scope — should agent_instructions() include full questio status (potentially verbose), or just a one-line summary with a pointer to questio_status?
- Backlog rendering — pixecog has a plan/backlog.md with task checklists. Should this be YAML-ified and included in questio_docs_collect, or left as a hand-maintained file? Backlog items are more granular than milestones and may not need the same structured treatment.
- Evidence sufficiency criteria — questio-ready needs to judge "enough evidence to draft." Is this purely milestone-based (all milestones complete → ready), or should it also check confidence levels (all results must be validated or final)?
- Inner loop guardrails — the notebook development loop can iterate indefinitely. What's the stopping condition? Max iterations? Human review after N attempts? Quality criteria defined upfront in the milestone?
- Grounding depth — questio-ground could be shallow (check 2–3 papers, one codio search) or deep (comprehensive literature review, full code audit). Should the depth be configurable per milestone, or should the agent judge based on the novelty of the task?
- Scheduled automation — could the outer loop run as a scheduled agent (via worklog triggers)? E.g., "every morning, check questio_status for pixecog and run the next unblocked milestone." What's the right autonomy level for unattended operation?
- Cross-subsystem skill design — skills like questio-ground and questio-validate compose tools from 3–4 subsystems. How detailed should the skill prompts be? Should they be prescriptive step-by-step, or give the agent latitude to adapt? Pixecog's archived skills suggest prescriptive works well.
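For the evidence sufficiency question above, the two candidate rules differ by a single switch, which the following sketch makes concrete. It does not settle the open question: `is_ready`, the `require_confidence` flag, and the input shapes are hypothetical, and only the `validated` / `final` confidence values come from the text.

```python
from __future__ import annotations

ACCEPTED_CONFIDENCE = {"validated", "final"}


def is_ready(milestone_status: dict[str, str],
             results: list[dict],
             require_confidence: bool = False) -> bool:
    """Check draft readiness under the milestone-only or the stricter rule."""
    if not milestone_status:
        return False  # nothing defined yet, so nothing to draft from
    if any(status != "complete" for status in milestone_status.values()):
        return False
    if require_confidence:
        # Stricter rule: every linked result must carry an accepted confidence.
        return all(r.get("confidence") in ACCEPTED_CONFIDENCE for r in results)
    return True
```

With all milestones complete but one result still `preliminary`, the milestone-only rule reports ready while the stricter rule does not, which is the practical difference the question asks about.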