Spec: reconcile design.md section 8 with agent-as-judge loop philosophy
Goal
Update section 8 of the questio design spec to reflect the agent-as-judge philosophy. The current spec assumes pre-scripted validation notebooks and rigid QC schemas. The updated version should position the agent (Claude Code) as the assessment layer, with skills teaching investigation strategy rather than encoding domain-specific validation logic.
Context
The current design.md section 8 has good structural framing (inner/middle/outer loops, action components, automation levels) but makes assumptions that we've since rejected:
- Section 8.2 inner loop references "validation notebooks" as the assessment mechanism, but the agent should assess directly using tool outputs, not pre-scripted notebooks
- Sequence B (validation sweep) describes running `validate_*` notebooks per subject; this is the artificial pattern we're replacing
- Section 8.5 automation levels says the inner loop is "fully automatable". That is true, but the automation is agent reasoning over outputs, not notebook execution
- The `questio-validate` skill in section 6.2 is described as "run validation notebook across subjects"; it should be reframed as agent-driven assessment
The loop-mechanisms.md spec (task-arash-20260408-043830) defines the new patterns. This task makes design.md consistent with that spec.
Prompt
Update /storage2/arash/projects/projio/docs/specs/research-orchestration/design.md sections 8 and 6.2 to reflect the agent-as-judge philosophy.
Step 1: Read the full design spec and the loop mechanisms spec (if it exists at docs/specs/research-orchestration/loop-mechanisms.md, otherwise read task-arash-20260408-043830 for the intended design).
Step 2: Update section 8.2 (inner loop).
Change the inner loop description from "agent iterates on a notebook" to "agent iterates on an analysis, using notebooks as one tool among many." The key change:
- Remove the assumption that validation = running a specific notebook
- The agent reads pipeline outputs (`pipeio_target_paths`, file inspection), compares against expectations (biblio context, prior results), and makes a judgment call
- Notebooks may be created/executed as part of this, but they're agent-authored during the loop, not pre-scripted templates
- Keep the SWR detection example but reframe it: the agent creates the validation analysis as part of the loop; it doesn't execute a pre-existing template
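The judgment step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the questio implementation: `Expectation`, `Assessment`, and `assess_metric` are hypothetical names, and the numeric range is a placeholder rather than a literature value. In practice the observed value would come from inspecting pipeline outputs and the expected range from grounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expectation:
    """Expected range for one metric (hypothetical type, e.g. filled in during grounding)."""
    metric: str
    low: float
    high: float
    source: str  # where the expectation came from, e.g. a citation key


@dataclass(frozen=True)
class Assessment:
    """The agent's verdict plus a rationale that a human can review."""
    metric: str
    verdict: str  # "pass" or "investigate"
    rationale: str


def assess_metric(observed: float, expected: Expectation) -> Assessment:
    """Compare an observed value against a grounded expectation."""
    if expected.low <= observed <= expected.high:
        verdict, detail = "pass", "within"
    else:
        verdict, detail = "investigate", "outside"
    rationale = (
        f"{expected.metric}={observed:g} is {detail} the expected range "
        f"[{expected.low:g}, {expected.high:g}] ({expected.source})"
    )
    return Assessment(expected.metric, verdict, rationale)


# Placeholder numbers for illustration only, not literature values.
swr_rate = Expectation("swr_rate_hz", low=0.1, high=1.0, source="placeholder-ref")
print(assess_metric(0.4, swr_rate))
print(assess_metric(3.2, swr_rate))
```

Keeping the rationale next to the verdict is the point of the sketch: the assessment is a judgment a reviewer can inspect, not a bare pass/fail flag from a scripted notebook.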
Step 3: Update section 8.3 Sequence B (validation sweep).
Rewrite to reflect agent-driven assessment:
- Instead of "for subject in subjects: run validate_* notebook", describe the agent inspecting outputs per subject using available tools
- The agent uses `pipeio_nb_read`, `pipeio_target_paths`, and direct file reading to assess outputs
- It compares against expectations set during grounding (literature values from `paper_context`)
- It creates observation notes for each assessment, then a final result note if satisfactory
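The sweep described above might look roughly like the following. It is a sketch under assumptions: `Note`, `validation_sweep`, and `fake_assess` are hypothetical, the subject ids are made up, and the `assess` callable stands in for the agent's own inspection of outputs (it does not call any real pipeio tool).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Note:
    kind: str     # "observation" or "result"
    subject: str  # subject id, or "all" for the final result note
    text: str


def validation_sweep(
    subjects: list[str],
    assess: Callable[[str], tuple[bool, str]],
) -> list[Note]:
    """Record one observation note per subject, then a single result note
    only if every assessment was satisfactory.

    `assess` stands in for the agent's judgment and returns (ok, rationale).
    """
    notes: list[Note] = []
    all_ok = True
    for subject in subjects:
        ok, rationale = assess(subject)
        all_ok = all_ok and ok
        status = "satisfactory" if ok else "needs investigation"
        notes.append(Note("observation", subject, f"{status}: {rationale}"))
    if subjects and all_ok:
        notes.append(Note("result", "all", f"{len(subjects)} subjects satisfactory"))
    return notes


# Hypothetical stand-in for agent judgment, keyed on made-up subject ids.
def fake_assess(subject: str) -> tuple[bool, str]:
    return (subject != "sub-02", "event rate compared with grounded expectation")


notes = validation_sweep(["sub-01", "sub-02"], fake_assess)
print([n.kind for n in notes])  # no result note, because sub-02 was not satisfactory
```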
Step 4: Update section 6.2 skill table.
- Reframe `questio-validate` from "run validation notebook" to "agent-driven assessment of pipeline output against grounded expectations"
- Add `questio-investigate` skill: "human-triggered deep dive into an anomaly or issue"
- Add note that `questio-iterate` replaces the standalone dispatch pattern: it's the execute-and-evaluate cycle within a human feedback loop
- Update `questio-ground` description to mention it feeds investigation context, not just pre-work context
Step 5: Update section 8.5 automation levels.
Adjust the table to reflect that:
- Inner loop automation depends on agent judgment quality, not notebook scripting
- The "human role" for the inner loop includes "review agent's assessment rationale", not just "set quality criteria upfront"
- The propose-review-confirm pattern is the default for milestone updates
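One way to read the propose-review-confirm pattern is that the agent may draft a milestone update but nothing changes until a human confirms it. The sketch below illustrates that reading only; `MilestoneProposal`, `MilestoneBoard`, and the milestone names are hypothetical, and the detailed pattern is whatever loop-mechanisms.md defines.

```python
from dataclasses import dataclass, field


@dataclass
class MilestoneProposal:
    milestone: str
    new_status: str
    rationale: str           # the agent's assessment rationale, shown to the reviewer
    confirmed: bool = False  # set to True by a human during review


@dataclass
class MilestoneBoard:
    statuses: dict[str, str] = field(default_factory=dict)

    def propose(self, milestone: str, new_status: str, rationale: str) -> MilestoneProposal:
        # Proposing never changes state; it only packages the suggested change.
        return MilestoneProposal(milestone, new_status, rationale)

    def apply(self, proposal: MilestoneProposal) -> None:
        # Refuse to apply anything a human has not confirmed.
        if not proposal.confirmed:
            raise PermissionError("proposal has not been confirmed by a human reviewer")
        self.statuses[proposal.milestone] = proposal.new_status


board = MilestoneBoard({"swr-detection": "in-progress"})
proposal = board.propose("swr-detection", "validated", "all subjects satisfactory")
# Review step: a human reads proposal.rationale, then confirms.
proposal.confirmed = True
board.apply(proposal)
print(board.statuses)
```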
Step 6: Add a forward reference from section 8 to the loop-mechanisms.md spec for the detailed investigate/iterate/orient patterns.
Step 7: Commit with message: "Reconcile questio design spec section 8 with agent-as-judge loop philosophy"
Acceptance Criteria
- [ ] Section 8.2 inner loop no longer assumes pre-scripted validation notebooks
- [ ] Sequence B rewritten for agent-driven assessment
- [ ] Skill table updated with questio-investigate, questio-iterate reframing
- [ ] Automation levels reflect agent-as-judge model
- [ ] Forward reference to loop-mechanisms.md added
- [ ] No structural breakage to other sections
- [ ] Committed
Batch Result
- status: done
- batch queue_id: `08b012e5b5c7`
- session: `f5d805ae-69f7-4207-809b-3ee4b88ea1bb`
- batch duration: 425.8s