
Audit pipeio_run, run_status, run_dashboard, run_kill for datalad run migration plan

Goal

Analyze the current run execution tools (pipeio_run, run_status, run_dashboard, run_kill) and produce a concrete migration plan for replacing them with datalad run + snakebids entry points. Research-only — do NOT implement.

Context

See roadmap: docs/log/idea/idea-arash-20260330-174518-164647.md

Current state: pipeio_run launches snakemake in a detached screen session, tracks state in .pipeio/runs.json, and provides status/dashboard/kill tools. This duplicates execution management that Snakemake already provides and provenance tracking that DataLad already provides.

Target: pipeio_run becomes a thin launcher that wraps datalad run -- python run.py {bids_dir} {output_dir} {level} and returns structured metadata (commit hash, run record, derivative dir). No more screen sessions or runs.json.
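As a minimal sketch of the target described above, the thin launcher would only assemble an argv list (argument names here are placeholders taken from the template in this document, not pipeio's real API):

```python
import shlex

def build_datalad_run_cmd(bids_dir, output_dir, level, message="pipeio run"):
    # Assemble the "datalad run -- python run.py ..." invocation from the
    # target description; no screen session, no runs.json bookkeeping.
    inner = ["python", "run.py", bids_dir, output_dir, level]
    return [
        "datalad", "run",
        "--input", bids_dir,
        "--output", output_dir,
        "--message", message,
        "--",
        *inner,
    ]

cmd = build_datalad_run_cmd("data/bids", "derivatives/flow", "participant")
print(shlex.join(cmd))
```

The `--` separator is datalad's standard way of delimiting its own options from the wrapped command.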

Prompt

Audit pipeio's run execution tools and design a migration to datalad run. Research-only — do NOT change code.

  1. Read the current implementations in packages/pipeio/src/pipeio/mcp.py:
     • mcp_run (~line 3145) — how it builds the snakemake command, launches screen, writes runs.json
     • mcp_run_status (~line 3252) — how it checks screen sessions and parses logs
     • mcp_run_dashboard (~line 3341) — how it aggregates run state
     • mcp_run_kill (~line 3386) — how it kills screen sessions

  2. Read the projio datalad MCP tools at src/projio/mcp/datalad.py — understand which datalad commands are already available (datalad_save, datalad_status, datalad_push, datalad_pull). Note the conda environment wrapping pattern.
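The conda environment wrapping pattern noted above presumably amounts to prefixing each command so it runs inside a named environment; a one-line sketch (the environment name is an assumption, not projio's actual configuration):

```python
def wrap_conda(cmd, env="projio"):
    # Prefix a command so it executes inside a named conda environment
    # via "conda run -n <env> ..."; the env name is a placeholder.
    return ["conda", "run", "-n", env, *cmd]

print(wrap_conda(["datalad", "status"]))
```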

  3. Research how datalad run works:
     • Command structure: datalad run --input X --output Y --message M -- command
     • How run records are stored (git commits with structured metadata)
     • How datalad rerun works
     • The --explicit flag semantics
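On the run-record storage point: DataLad embeds the record as JSON inside the commit message, between fixed delimiter lines (the delimiter strings below match DataLad's convention, but verify them against the installed version). A sketch of extracting a record from a commit message:

```python
import json
import re

# DataLad's run record is JSON embedded in the commit message between
# these delimiter lines; the surrounding subject carries [DATALAD RUNCMD].
RECORD_RE = re.compile(
    r"=== Do not change lines below ===\n(.*)\n\^\^\^ Do not change lines above \^\^\^",
    re.DOTALL,
)

def parse_run_record(commit_message):
    match = RECORD_RE.search(commit_message)
    return json.loads(match.group(1)) if match else None

msg = (
    "[DATALAD RUNCMD] python run.py\n\n"
    "=== Do not change lines below ===\n"
    '{"cmd": "python run.py bids out participant", "inputs": ["bids"], "outputs": ["out"]}\n'
    "^^^ Do not change lines above ^^^\n"
)
record = parse_run_record(msg)
print(record["cmd"])
```

datalad rerun re-executes exactly these stored records, which is why the status/dashboard replacements below can be built from git history alone.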

  4. Design the new pipeio_run interface:
     • Input: flow name, analysis_level, snakemake extra args
     • Must compute --input and --output from pipeio's contract/config data
     • Must construct the snakebids entry point command (python run.py ...)
     • Must wrap the command with datalad run
     • Return: commit hash, run record info, derivative dir
     • Consider: should it still support background execution? If so, how?
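The interface requirements above can be sketched end to end. This is an illustration under stated assumptions: in the real tool the --input/--output values would be computed from pipeio's contract/config data, whereas here they are passed in directly, and execution is guarded off by default so the sketch stays side-effect free:

```python
import subprocess
from dataclasses import dataclass, field

@dataclass
class RunResult:
    # Structured metadata the redesigned pipeio_run would return.
    commit: str
    derivative_dir: str
    cmd: list = field(default_factory=list)

def pipeio_run(bids_dir, derivative_dir, level, extra_args=(), execute=False):
    cmd = [
        "datalad", "run",
        "--input", bids_dir,
        "--output", derivative_dir,
        "--",
        "python", "run.py", bids_dir, derivative_dir, level,
        *extra_args,
    ]
    commit = "<not executed>"
    if execute:  # only touches the repo when explicitly asked to
        subprocess.run(cmd, check=True)
        commit = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    return RunResult(commit=commit, derivative_dir=derivative_dir, cmd=cmd)

result = pipeio_run("data/bids", "derivatives/flow", "participant")
print(result.derivative_dir)
```

Reading HEAD after datalad run works because datalad run commits the outcome; the run record itself lives in that commit's message.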

  5. Design what replaces run_status/dashboard/kill:
     • Status: git log of datalad run records + snakemake --summary
     • Dashboard: aggregate run records from git history
     • Kill: if background execution is still needed, how to manage it
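The status/dashboard replacement reduces to filtering git history for run commits. A pure sketch that parses `git log --pretty=format:%H %s` output (the [DATALAD RUNCMD] subject prefix is DataLad's convention; the sample log lines are fabricated for illustration):

```python
def parse_run_log(log_text):
    # Keep only datalad run commits, identified by the [DATALAD RUNCMD]
    # prefix that datalad puts in the commit subject line.
    runs = []
    for line in log_text.splitlines():
        sha, _, subject = line.partition(" ")
        if subject.startswith("[DATALAD RUNCMD]"):
            runs.append((sha, subject))
    return runs

sample = (
    "abc123 [DATALAD RUNCMD] python run.py bids out participant\n"
    "def456 docs: update readme\n"
    "789fed [DATALAD RUNCMD] python run.py bids out group"
)
print(parse_run_log(sample))
```

A dashboard would aggregate over the same records; only kill has no git-history analogue, since it depends on whether background execution survives the migration.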

  6. Identify blockers and open questions:
     • Does the derivative dir need to be a DataLad subdataset?
     • How does datalad run interact with snakemake's own job management?
     • What happens with long-running jobs (hours/days)?
     • How are partial runs and restarts handled?

Write the complete audit and migration design into the Result section.

Acceptance Criteria

  • [ ] Current run tool implementations fully documented
  • [ ] datalad run integration interface designed
  • [ ] Input/output declaration strategy from contracts
  • [ ] Replacement plan for status/dashboard/kill
  • [ ] Open questions and blockers identified

Result

(Filled in after execution)

Batch Result

  • status: done
  • batch queue_id: d52d4b497700
  • session: dd9c6c81-6b88-4a7d-87b4-d0280355ac87
  • batch duration: 379.5s