Scenario book: evaluating a new detection method from recent literature¶
Goal¶
Write a scenario showing a researcher discovering a new paper with a better spindle detection algorithm, evaluating whether to adopt it, benchmarking against their current approach, and making a recorded decision.
Context¶
This scenario exercises the method evaluation workflow: literature discovery → code audit → benchmark → decision. It's a common real-world pattern — a new paper comes out, and you need to decide whether it's worth changing your approach.
The scenario should be written at docs/specs/research-orchestration/scenarios/scenario-method-evaluation.md.
Pixecog context:
- Current spindle detection: cogpy has SpindleDetector class in cogpy.detect
- Relevant flow: spectrogram_burst with scripts for spectrogram computation and blob detection
- H2 and H3 both depend on spindle detection quality
- Milestone: spindle-detection-validated is not_started
- The researcher reads Pedrosa 2024 (@pedrosa_2024, cited in H2 and H4) which describes an improved spindle detection approach
- biblio can discover related papers via biblio_discover_authors, biblio_graph_expand
Prompt¶
Write the scenario at /storage2/arash/projects/projio/docs/specs/research-orchestration/scenarios/scenario-method-evaluation.md.
Step 1: Read context.
- Read /storage2/arash/projects/projio/docs/specs/research-orchestration/scenario-book.md for format
- Read /storage2/arash/projects/projio/docs/specs/research-orchestration/loop-mechanisms.md
- Read pixecog's questions.yml and milestones.yml for hypothesis/milestone details
Step 2: Write the scenario.
Structure:
1. Header with starting state and researcher's goal
2. Phase 1: Literature trigger — researcher says "I just read Pedrosa 2024 and their spindle detection looks better than ours. Should we switch?" Agent uses paper_context("@pedrosa_2024") to extract their method details. Then biblio_discover_authors("Pedrosa") and biblio_graph_expand("@pedrosa_2024") to find related work and validate the method's reception.
3. Phase 2: Current method audit — Agent uses codio_get("cogpy") and codio_discover("spindle detection") to understand the current SpindleDetector implementation. pipeio_mod_context("spectrogram_burst", "blob_detect") to see how it's used in the pipeline. Presents a comparison: current approach vs paper's approach.
4. Phase 3: Benchmark design — Agent proposes a benchmark: run both methods on one subject, compare detection rates, temporal precision, and false positive rates. Uses the iterate loop. Human approves the plan.
5. Phase 4: Benchmark execution — 2-3 iterate cycles: run current method → run new method (implemented in a notebook) → compare metrics. Agent presents side-by-side comparison. Use paper_context to check against literature-reported values.
6. Phase 5: Decision — Agent presents findings. Human decides (e.g., "the new method is marginally better but not worth the migration cost" or "yes, let's adopt it"). Agent creates a decision note with rationale.
7. Phase 6: Record — Decision note captures: what was compared, metrics, rationale, implications for milestones. If adopting: create issues for code migration. If not: record why, so future agents don't re-evaluate.
Use mkdocs material admonitions:
- !!! info "Behind the scenes" for tool calls
- !!! tip "Why this matters" for design insights
- !!! warning "Human checkpoint" for judgment calls
- !!! note "Decision recording" for explaining why decisions are captured
End with ecosystem coverage, loop patterns, and key insight (answer: method evaluation as a first-class workflow — the decision to NOT change is as valuable as the decision to change).
Step 3: Commit with message: "Add scenario: evaluating a new detection method from literature"
Acceptance Criteria¶
- [ ] File at
docs/specs/research-orchestration/scenarios/scenario-method-evaluation.md - [ ] Exercises biblio (discovery, enrichment, graph expand), codio, pipeio, questio
- [ ] Shows the iterate loop used for benchmarking
- [ ] Decision recording is explicit — captures the "why" regardless of outcome
- [ ] Uses mkdocs material admonitions
- [ ] Committed
Batch Result¶
- status: done
- batch queue_id:
140acc720c2b - session:
49cfe4a4-561d-4bb8-8fb6-d01b2ac17020 - batch duration: 801.3s