Audit pixecog preprocess/ieeg and write a migration guide to the pipeio v2 snakebids app design

Goal

Audit the actual pixecog preprocess/ieeg flow and produce a concrete, step-by-step migration guide for converting it from the current flat layout to a pipeio v2 snakebids app. This guide should be reusable as a template for migrating other flows.

Context

v2 roadmap: docs/log/idea/idea-arash-20260330-174518-164647.md
Deep research: docs/log/idea/deep-research-pipeio-scope.md

pixecog is at /storage2/arash/projects/pixecog/. The preprocess/ieeg flow currently uses:

- Flat layout: code/pipelines/preprocess/ieeg/Snakefile + config.yml
- snakebids bids() function for path resolution
- YAML config with anchors and member_sets
- Derivatives at derivatives/preprocess/
- Multiple mods (badlabel, feature, interpolate, linenoise, etc.)

Target v2 layout:

- snakebids app: run.py + config/snakebids.yml (or config.yml) + workflow/Snakefile + workflow/rules/*.smk + workflow/scripts/
- Derivative dir as self-contained app instance
- dataset_description.json with GeneratedBy/SourceDatasets
- DataLad subdataset per derivative

Key environment note: snakebids 0.14.0 and snakemake 9.11.4 are installed in the cogpy conda env. pipeio runs in the rag env.

Prompt

Audit pixecog's preprocess/ieeg flow and write a migration guide for converting it to a pipeio v2 snakebids app. This is a research/documentation task — do NOT modify pixecog files.

Step 1: Audit current layout

Read and document the current structure of the preprocess/ieeg flow:

ls -la /storage2/arash/projects/pixecog/code/pipelines/preprocess/ieeg/ — what files exist?

Read the Snakefile — how many rules? How are includes organized? Does it use generate_inputs?

Read config.yml — what sections? pybids_inputs, registry groups, member_sets, params?

List all .smk files and scripts — how are mods organized?

Check derivatives/preprocess/ — what's the actual output structure?

Check if there's already a run.py, .snakebids marker, or dataset_description.json

Check the pipeio registry: use worklog_read_file(project_id="pixecog", rel_path=".pipeio/pipeio/registry.yml") or read directly
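The file, rule, and include questions above could be answered with a small read-only inventory script. The following is a minimal sketch, not part of pipeio: the function name `audit_flow` and the regex-based rule counting are assumptions, and it only sees literal `rule name:` headers and quoted `include:` paths.

```python
import re
from pathlib import Path

# Literal rule headers such as "rule linenoise:"; dynamically generated rules are missed.
RULE_RE = re.compile(r"^\s*rule\s+(\w+)\s*:", re.MULTILINE)
# Only include statements with a quoted literal path are detected.
INCLUDE_RE = re.compile(r"""^\s*include:\s*["'](.+?)["']""", re.MULTILINE)


def audit_flow(flow_dir):
    """Return a read-only summary of a flat flow directory (hypothetical helper)."""
    flow_dir = Path(flow_dir)
    files = sorted(
        p.relative_to(flow_dir).as_posix() for p in flow_dir.rglob("*") if p.is_file()
    )
    # Treat the Snakefile and every *.smk file as workflow source.
    workflow_files = [
        f for f in files if f.endswith(".smk") or Path(f).name == "Snakefile"
    ]
    text = "\n".join((flow_dir / f).read_text() for f in workflow_files)
    return {
        "files": files,
        "rules": sorted(RULE_RE.findall(text)),
        "includes": INCLUDE_RE.findall(text),
        "uses_generate_inputs": "generate_inputs" in text,
        "has_run_py": (flow_dir / "run.py").exists(),
        "has_snakebids_marker": (flow_dir / ".snakebids").exists(),
    }
```

Running it against code/pipelines/preprocess/ieeg/ and pasting the resulting dictionary into the Result section would document the layout without touching any pixecog file.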

Step 2: Map current → target

For each file/directory in the current layout, document where it goes in the v2 layout:

Current path	v2 path	Notes
`Snakefile`	`workflow/Snakefile`	May need to add `generate_inputs` if not present
`config.yml`	`config.yml` (snakebids is ok with this)	Add `parse_args`, `analysis_levels` sections
`*.smk`	`workflow/rules/*.smk`	Move includes
`scripts/`	`workflow/scripts/`	Update `script:` paths in rules
(new)	`run.py`	Generate snakebids entry point
(new)	`derivatives/preprocess/dataset_description.json`	BIDS metadata
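The mapping above can be expressed as a small pure function, which makes it easy to check against the real file list. This is a sketch under assumptions: `map_to_v2` is a hypothetical name, the example file names are illustrative, and nested .smk files are flattened into workflow/rules/, which would need a manual check for name collisions.

```python
from pathlib import PurePosixPath


def map_to_v2(rel_path):
    """Map a path relative to the flat flow dir to its v2 location.

    Returns None for anything the mapping table does not cover,
    so those files can be reviewed manually.
    """
    p = PurePosixPath(rel_path)
    if not p.parts:
        return None
    if len(p.parts) == 1 and p.name == "Snakefile":
        return "workflow/Snakefile"
    if len(p.parts) == 1 and p.name == "config.yml":
        return "config.yml"  # stays in place; snakebids accepts this name
    if p.suffix == ".smk":
        return f"workflow/rules/{p.name}"  # assumption: flatten into rules/
    if p.parts[0] == "scripts":
        return (PurePosixPath("workflow") / p).as_posix()
    return None
```

For example, `map_to_v2("scripts/interpolate.py")` gives `workflow/scripts/interpolate.py`, while an unmapped file such as a README returns `None` and needs a decision.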

Step 3: Identify blockers and decisions

Does the current Snakefile use generate_inputs() or manual pybids? If manual, what's the migration effort?

Are there hardcoded paths that assume the flat layout?

Are there cross-flow dependencies (other flows consuming preprocess outputs)?

Is derivatives/preprocess/ already a DataLad subdataset?

What config sections need to become parse_args CLI arguments?
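The hardcoded-path question can be partly answered by scanning the flow's workflow and config files for absolute paths. The sketch below is a heuristic only: `find_hardcoded_paths` is a hypothetical helper, the regex matches strings that look like absolute paths with at least two components, and it skips relative paths, URLs, and paths appended to a `{wildcard}`. Paths built at runtime are not detected.

```python
import re
from pathlib import Path

# An absolute-looking path: a "/" not preceded by a path/URL/wildcard character,
# followed by at least two components.
ABS_PATH_RE = re.compile(r"(?<![\w./:}])/[\w.\-]+(?:/[\w.\-]+)+")


def find_hardcoded_paths(flow_dir, suffixes=(".smk", ".py", ".yml", ".yaml")):
    """Return (file, line number, path) tuples for absolute paths in a flow dir."""
    flow_dir = Path(flow_dir)
    hits = []
    for path in sorted(flow_dir.rglob("*")):
        if not path.is_file():
            continue
        if path.name != "Snakefile" and path.suffix not in suffixes:
            continue
        lines = path.read_text(errors="replace").splitlines()
        for lineno, line in enumerate(lines, start=1):
            for match in ABS_PATH_RE.finditer(line):
                hits.append(
                    (path.relative_to(flow_dir).as_posix(), lineno, match.group(0))
                )
    return hits
```

Each hit is a candidate for a config entry or a `parse_args` argument; false positives are expected and should be reviewed by hand.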

Step 4: Write the migration guide

Write a step-by-step guide in the Result section with:

Pre-migration checklist — what to verify before starting

Directory restructure — exact commands/moves

Snakefile changes — what to add/modify (generate_inputs, configfile path, script paths)

Config changes — what to add to config.yml for snakebids compatibility

run.py generation — exact content of the entry point

dataset_description.json — exact content for BIDS derivatives metadata

Test plan — how to verify the migration works (dry run, actual run)

Rollback plan — how to undo if something breaks
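For the dataset_description.json item, the file can be generated rather than hand-written. The following is a minimal sketch: the function name and every value passed in the example are placeholders rather than the real pixecog metadata, and the field set is limited to the BIDS derivative keys named in the target layout (GeneratedBy, SourceDatasets) plus the required Name, BIDSVersion, and DatasetType.

```python
import json


def build_dataset_description(
    name, pipeline_name, pipeline_version, source_url, bids_version="1.8.0"
):
    """Build a BIDS-derivatives dataset_description as a plain dict."""
    return {
        "Name": name,
        "BIDSVersion": bids_version,
        "DatasetType": "derivative",
        # GeneratedBy is a list: one entry per pipeline that produced the data.
        "GeneratedBy": [{"Name": pipeline_name, "Version": pipeline_version}],
        # SourceDatasets points back at the dataset the derivative was built from.
        "SourceDatasets": [{"URL": source_url}],
    }


# Placeholder values for illustration only.
description = build_dataset_description(
    name="preprocess",
    pipeline_name="preprocess-ieeg",
    pipeline_version="0.0.0",
    source_url="../../",
)
rendered = json.dumps(description, indent=2)
print(rendered)
```

The printed JSON would be written to derivatives/preprocess/dataset_description.json once the real name, version, and source location are confirmed by the audit.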

Step 5: Generalize

Note which parts of this guide are pixecog-specific vs reusable for any flow migration. Flag anything that pipeio could automate (e.g., a future pipeio_flow_migrate tool).

Acceptance Criteria

[ ] Current preprocess/ieeg layout fully documented
[ ] File-by-file mapping from current to v2
[ ] Blockers and decisions identified
[ ] Step-by-step migration guide written
[ ] Generalizable patterns noted for other flows

Result

(Filled in after execution)
