Audit pipeio_run, run_status, run_dashboard, run_kill for datalad run migration plan¶
Goal¶
Analyze the current run execution tools (pipeio_run, run_status, run_dashboard, run_kill) and produce a concrete migration plan for replacing them with datalad run + snakebids entry points. Research-only — do NOT implement.
Context¶
See roadmap: docs/log/idea/idea-arash-20260330-174518-164647.md
Current state: pipeio_run launches snakemake in a detached screen session, tracks state in .pipeio/runs.json, and provides status/dashboard/kill tools. This duplicates Snakemake's execution and DataLad's provenance.
Target: pipeio_run becomes a thin launcher that wraps `datalad run -- python run.py {bids_dir} {output_dir} {level}` and returns structured metadata (commit hash, run record, derivative dir). No more screen sessions or `runs.json`.
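As a sketch of the thin-launcher shape (the name `build_datalad_run_cmd` and its keyword arguments are hypothetical, not part of pipeio), the wrapper only has to assemble an argument list and hand it to a process runner:

```python
def build_datalad_run_cmd(bids_dir, output_dir, level,
                          inputs=(), outputs=(), message=None):
    """Assemble the datalad invocation that wraps the snakebids entry point.

    Hypothetical helper: flag names follow the `datalad run` CLI; the
    inputs/outputs would come from pipeio's contract/config data.
    """
    cmd = ["datalad", "run"]
    for path in inputs:
        cmd += ["--input", path]
    for path in outputs:
        cmd += ["--output", path]
    if message:
        cmd += ["--message", message]
    # Everything after `--` is the tracked command: the snakebids entry point.
    cmd += ["--", "python", "run.py", bids_dir, output_dir, level]
    return cmd
```

The launcher would then execute this list with `subprocess.run` and read the resulting commit for metadata; no screen session or runs.json entry is involved.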
Prompt¶
Audit pipeio's run execution tools and design a migration to datalad run. Research-only — do NOT change code.
- Read the current implementations in `packages/pipeio/src/pipeio/mcp.py`:
  - `mcp_run` (~line 3145) — how it builds the snakemake command, launches screen, writes `runs.json`
  - `mcp_run_status` (~line 3252) — how it checks screen sessions and parses logs
  - `mcp_run_dashboard` (~line 3341) — how it aggregates run state
  - `mcp_run_kill` (~line 3386) — how it kills screen sessions
- Read the projio datalad MCP tools at
  `src/projio/mcp/datalad.py` — understand what datalad commands are already available (`datalad_save`, `datalad_status`, `datalad_push`, `datalad_pull`). Note the conda environment wrapping pattern.
- Research how `datalad run` works:
  - Command structure: `datalad run --input X --output Y --message M -- command`
  - How run records are stored (git commits with structured metadata)
  - How `datalad rerun` works
  - The `--explicit` flag semantics
- Design the new `pipeio_run` interface:
  - Input: flow name, analysis_level, snakemake extra args
  - Must compute `--input` and `--output` from pipeio's contract/config data
  - Must construct the snakebids entry point command (`python run.py ...`)
  - Must wrap with `datalad run`
  - Return: commit hash, run record info, derivative dir
  - Consider: should it still support background execution? How?
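To return run-record info, the launcher can parse the commit that `datalad run` creates: the record is a JSON blob embedded in the commit message between sentinel lines. A minimal parser sketch (the sentinel strings match current datalad behaviour, but should be verified against the installed version):

```python
import json
import re

# datalad run embeds its structured record between these sentinel lines.
RECORD_RE = re.compile(
    r"=== Do not change lines below ===\n(.*?)\n"
    r"\^\^\^ Do not change lines above \^\^\^",
    re.DOTALL,
)

def parse_run_record(commit_message):
    """Return the structured run record from a datalad run commit message,
    or None when the commit carries no record."""
    match = RECORD_RE.search(commit_message)
    return json.loads(match.group(1)) if match else None
```

The commit hash itself comes from `git rev-parse HEAD` after the run, so the tool's structured return (commit hash, run record, derivative dir) needs no state file of its own.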
- Design what replaces run_status/dashboard/kill:
  - Status: `git log` of datalad run records + `snakemake --summary`
  - Dashboard: aggregate run records from git history
  - Kill: if background execution is needed, how to manage it
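For the status side, one hedged sketch: `datalad run` commits carry a `[DATALAD RUNCMD]` prefix in their subject line, so a status tool can filter plain `git log --format='%H %s'` output instead of reading runs.json (`parse_run_log` is a hypothetical helper, not existing pipeio code):

```python
def parse_run_log(log_text):
    """Filter `git log --format='%H %s'` output down to datalad run commits.

    Returns (sha, subject) pairs in the order git log emitted them
    (newest first by default).
    """
    runs = []
    for line in log_text.splitlines():
        sha, _, subject = line.partition(" ")
        if subject.startswith("[DATALAD RUNCMD]"):
            runs.append((sha, subject))
    return runs
```

A dashboard could aggregate these pairs across the repo's history; the log text itself would come from something like `subprocess.run(["git", "log", "--format=%H %s"], capture_output=True, text=True)`.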
- Identify blockers and open questions:
  - Does the derivative dir need to be a DataLad subdataset?
  - How does `datalad run` interact with snakemake's own job management?
  - What happens with long-running jobs (hours/days)?
  - How to handle partial runs / restarts?
- Write the complete audit and migration design into the Result section.
Acceptance Criteria¶
- [ ] Current run tool implementations fully documented
- [ ] datalad run integration interface designed
- [ ] Input/output declaration strategy from contracts
- [ ] Replacement plan for status/dashboard/kill
- [ ] Open questions and blockers identified
Result¶
(Filled in after execution)
Batch Result¶
- status: done
- batch queue_id: d52d4b497700
- session: dd9c6c81-6b88-4a7d-87b4-d0280355ac87
- batch duration: 379.5s