
Audit pipeio_run, run_status, run_dashboard, run_kill for datalad run migration plan

Goal

Analyze the current run execution tools (pipeio_run, run_status, run_dashboard, run_kill) and produce a concrete migration plan for replacing them with datalad run + snakebids entry points. Research-only — do NOT implement.

Context

See roadmap: docs/log/idea/idea-arash-20260330-174518-164647.md

Current state: pipeio_run launches snakemake in a detached screen session, tracks state in .pipeio/runs.json, and provides status/dashboard/kill tools. This duplicates execution management that Snakemake already provides and provenance tracking that DataLad already provides.

Target: pipeio_run becomes a thin launcher that wraps datalad run -- python run.py {bids_dir} {output_dir} {level} and returns structured metadata (commit hash, run record, derivative dir). No more screen sessions or runs.json.
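As a minimal sketch of the target described above, the thin launcher would only assemble an argv list (argument names here are placeholders taken from the template in this document, not pipeio's real API):

```python
import shlex

def build_datalad_run_cmd(bids_dir, output_dir, level, message="pipeio run"):
    # Assemble the "datalad run -- python run.py ..." invocation from the
    # target description; no screen session, no runs.json bookkeeping.
    inner = ["python", "run.py", bids_dir, output_dir, level]
    return [
        "datalad", "run",
        "--input", bids_dir,
        "--output", output_dir,
        "--message", message,
        "--",
        *inner,
    ]

cmd = build_datalad_run_cmd("data/bids", "derivatives/flow", "participant")
print(shlex.join(cmd))
```

The `--` separator is datalad's standard way of delimiting its own options from the wrapped command.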

Prompt

Audit pipeio's run execution tools and design a migration to datalad run. Research-only — do NOT change code.

  1. Read the current implementations in packages/pipeio/src/pipeio/mcp.py:
     • mcp_run (~line 3145) — how it builds the snakemake command, launches screen, writes runs.json
     • mcp_run_status (~line 3252) — how it checks screen sessions and parses logs
     • mcp_run_dashboard (~line 3341) — how it aggregates run state
     • mcp_run_kill (~line 3386) — how it kills screen sessions

  2. Read the projio datalad MCP tools at src/projio/mcp/datalad.py — understand which datalad commands are already available (datalad_save, datalad_status, datalad_push, datalad_pull). Note the conda environment wrapping pattern.
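The conda environment wrapping pattern noted above presumably amounts to prefixing each command so it runs inside a named environment; a one-line sketch (the environment name is an assumption, not projio's actual configuration):

```python
def wrap_conda(cmd, env="projio"):
    # Prefix a command so it executes inside a named conda environment
    # via "conda run -n <env> ..."; the env name is a placeholder.
    return ["conda", "run", "-n", env, *cmd]

print(wrap_conda(["datalad", "status"]))
```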

  3. Research how datalad run works:
     • Command structure: datalad run --input X --output Y --message M -- command
     • How run records are stored (git commits with structured metadata)
     • How datalad rerun works
     • The --explicit flag semantics
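On the run-record storage point: DataLad embeds the record as JSON inside the commit message, between fixed delimiter lines (the delimiter strings below match DataLad's convention, but verify them against the installed version). A sketch of extracting a record from a commit message:

```python
import json
import re

# DataLad's run record is JSON embedded in the commit message between
# these delimiter lines; the surrounding subject carries [DATALAD RUNCMD].
RECORD_RE = re.compile(
    r"=== Do not change lines below ===\n(.*)\n\^\^\^ Do not change lines above \^\^\^",
    re.DOTALL,
)

def parse_run_record(commit_message):
    match = RECORD_RE.search(commit_message)
    return json.loads(match.group(1)) if match else None

msg = (
    "[DATALAD RUNCMD] python run.py\n\n"
    "=== Do not change lines below ===\n"
    '{"cmd": "python run.py bids out participant", "inputs": ["bids"], "outputs": ["out"]}\n'
    "^^^ Do not change lines above ^^^\n"
)
record = parse_run_record(msg)
print(record["cmd"])
```

datalad rerun re-executes exactly these stored records, which is why the status/dashboard replacements below can be built from git history alone.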

  4. Design the new pipeio_run interface:
     • Input: flow name, analysis_level, snakemake extra args
     • Must compute --input and --output from pipeio's contract/config data
     • Must construct the snakebids entry point command (python run.py ...)
     • Must wrap the command with datalad run
     • Return: commit hash, run record info, derivative dir
     • Consider: should it still support background execution? If so, how?
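The interface requirements above can be sketched end to end. This is an illustration under stated assumptions: in the real tool the --input/--output values would be computed from pipeio's contract/config data, whereas here they are passed in directly, and execution is guarded off by default so the sketch stays side-effect free:

```python
import subprocess
from dataclasses import dataclass, field

@dataclass
class RunResult:
    # Structured metadata the redesigned pipeio_run would return.
    commit: str
    derivative_dir: str
    cmd: list = field(default_factory=list)

def pipeio_run(bids_dir, derivative_dir, level, extra_args=(), execute=False):
    cmd = [
        "datalad", "run",
        "--input", bids_dir,
        "--output", derivative_dir,
        "--",
        "python", "run.py", bids_dir, derivative_dir, level,
        *extra_args,
    ]
    commit = "<not executed>"
    if execute:  # only touches the repo when explicitly asked to
        subprocess.run(cmd, check=True)
        commit = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    return RunResult(commit=commit, derivative_dir=derivative_dir, cmd=cmd)

result = pipeio_run("data/bids", "derivatives/flow", "participant")
print(result.derivative_dir)
```

Reading HEAD after datalad run works because datalad run commits the outcome; the run record itself lives in that commit's message.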

  5. Design what replaces run_status/dashboard/kill:
     • Status: git log of datalad run records + snakemake --summary
     • Dashboard: aggregate run records from git history
     • Kill: if background execution is still needed, how to manage it
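The status/dashboard replacement reduces to filtering git history for run commits. A pure sketch that parses `git log --pretty=format:%H %s` output (the [DATALAD RUNCMD] subject prefix is DataLad's convention; the sample log lines are fabricated for illustration):

```python
def parse_run_log(log_text):
    # Keep only datalad run commits, identified by the [DATALAD RUNCMD]
    # prefix that datalad puts in the commit subject line.
    runs = []
    for line in log_text.splitlines():
        sha, _, subject = line.partition(" ")
        if subject.startswith("[DATALAD RUNCMD]"):
            runs.append((sha, subject))
    return runs

sample = (
    "abc123 [DATALAD RUNCMD] python run.py bids out participant\n"
    "def456 docs: update readme\n"
    "789fed [DATALAD RUNCMD] python run.py bids out group"
)
print(parse_run_log(sample))
```

A dashboard would aggregate over the same records; only kill has no git-history analogue, since it depends on whether background execution survives the migration.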

  6. Identify blockers and open questions:
     • Does the derivative dir need to be a DataLad subdataset?
     • How does datalad run interact with snakemake's own job management?
     • What happens with long-running jobs (hours/days)?
     • How are partial runs and restarts handled?

Write the complete audit and migration design into the Result section.

Acceptance Criteria

  • [ ] Current run tool implementations fully documented
  • [ ] datalad run integration interface designed
  • [ ] Input/output declaration strategy from contracts
  • [ ] Replacement plan for status/dashboard/kill
  • [ ] Open questions and blockers identified

Result

(Filled in after execution)

Batch Result

  • status: done
  • batch queue_id: d52d4b497700
  • session: dd9c6c81-6b88-4a7d-87b4-d0280355ac87
  • batch duration: 379.5s