
Audit pixecog preprocess/ieeg and write migration guide to pipeio v2 snakebids app design

Goal

Audit the actual pixecog preprocess/ieeg flow and produce a concrete, step-by-step migration guide for converting it from the current flat layout to a pipeio v2 snakebids app. This guide should be reusable as a template for migrating other flows.

Context

  • v2 roadmap: docs/log/idea/idea-arash-20260330-174518-164647.md
  • Deep research: docs/log/idea/deep-research-pipeio-scope.md

pixecog is at /storage2/arash/projects/pixecog/. The preprocess/ieeg flow currently uses:

  • Flat layout: code/pipelines/preprocess/ieeg/Snakefile + config.yml
  • snakebids bids() function for path resolution
  • YAML config with anchors and member_sets
  • Derivatives at derivatives/preprocess/
  • Multiple mods (badlabel, feature, interpolate, linenoise, etc.)

Target v2 layout:

  • snakebids app: run.py + config/snakebids.yml (or config.yml) + workflow/Snakefile + workflow/rules/*.smk + workflow/scripts/
  • Derivative dir as self-contained app instance
  • dataset_description.json with GeneratedBy/SourceDatasets
  • DataLad subdataset per derivative
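To make the target layout concrete, here is a minimal sketch that creates the v2 skeleton as empty directories and placeholder files. It assumes the config file is named config/snakebids.yml (config.yml would also be acceptable, as noted above); `scaffold_v2_layout` is a hypothetical helper, not part of pipeio or snakebids, and it never overwrites existing files, so it can be tried on a scratch directory without touching pixecog.

```python
from __future__ import annotations

from pathlib import Path

# Relative paths of the v2 snakebids app skeleton listed above.
V2_DIRS = ["config", "workflow/rules", "workflow/scripts"]
V2_FILES = ["run.py", "config/snakebids.yml", "workflow/Snakefile"]


def scaffold_v2_layout(app_root: Path) -> list[Path]:
    """Create the v2 directories and empty placeholder files under app_root.

    Existing paths are left alone. Returns the listed paths that were newly created.
    """
    created = []
    for rel in V2_DIRS:
        directory = app_root / rel
        if not directory.exists():
            directory.mkdir(parents=True)
            created.append(directory)
    for rel in V2_FILES:
        placeholder = app_root / rel
        if not placeholder.exists():
            placeholder.parent.mkdir(parents=True, exist_ok=True)
            placeholder.touch()  # empty file; real content is written during migration
            created.append(placeholder)
    return created
```

Running it twice on the same root returns an empty list the second time, which makes it safe to re-run while iterating on the guide.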

Key environment note: snakebids 0.14.0 and snakemake 9.11.4 are installed in the cogpy conda env; pipeio runs in the rag env.
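Because pipeio and the workflow tooling live in different environments, it may help to confirm the versions this guide assumes using the cogpy interpreter before starting, for example with `conda run -n cogpy python check_env.py`, where `check_env.py` is a hypothetical file containing the sketch below. It uses only `importlib.metadata`; `version_mismatches` is an illustrative helper, not part of pipeio.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Versions recorded in the environment note for the cogpy conda env.
EXPECTED = {"snakebids": "0.14.0", "snakemake": "9.11.4"}


def version_mismatches(expected: dict[str, str] = EXPECTED) -> dict[str, tuple[str, str | None]]:
    """Return {package: (expected, installed)} for packages that are missing or differ."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # not installed in this interpreter's environment
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    problems = version_mismatches()
    print("environment OK" if not problems else f"version mismatches: {problems}")
```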

Prompt

Audit pixecog's preprocess/ieeg flow and write a migration guide for converting it to a pipeio v2 snakebids app. This is a research/documentation task — do NOT modify pixecog files.

Step 1: Audit current layout

Read and document the current structure of the preprocess/ieeg flow:

  1. ls -la /storage2/arash/projects/pixecog/code/pipelines/preprocess/ieeg/ — what files exist?
  2. Read the Snakefile — how many rules? How are includes organized? Does it use generate_inputs?
  3. Read config.yml — what sections? pybids_inputs, registry groups, member_sets, params?
  4. List all .smk files and scripts — how are mods organized?
  5. Check derivatives/preprocess/ — what's the actual output structure?
  6. Check if there's already a run.py, .snakebids marker, or dataset_description.json
  7. Check the pipeio registry: use worklog_read_file(project_id="pixecog", rel_path=".pipeio/pipeio/registry.yml") or read directly
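Items 1, 2 and 6 are read-only and can be scripted. A minimal standard-library sketch, assuming rules are declared as `rule <name>:` lines and includes as `include: "<path>"` string literals. The regexes are heuristics: they miss `checkpoint` declarations, includes built with `os.path.join` or f-strings, and rules generated programmatically. `audit_flow` is a hypothetical helper.

```python
from __future__ import annotations

import re
from pathlib import Path

# Heuristics: `rule name:` lines (optionally indented) and `include: "path"` directives.
RULE_RE = re.compile(r"^[ \t]*rule[ \t]+(\w+)[ \t]*:", re.MULTILINE)
INCLUDE_RE = re.compile(r"""^[ \t]*include:[ \t]*["']([^"']+)["']""", re.MULTILINE)
# Files whose presence would mean part of the v2 layout already exists.
MARKERS = ["run.py", ".snakebids", "dataset_description.json"]


def audit_flow(flow_dir: Path, deriv_dir: Path) -> dict:
    """Summarise a flat-layout flow without modifying it."""
    snakefile = flow_dir / "Snakefile"
    sources = [snakefile, *sorted(flow_dir.rglob("*.smk"))]
    text = "\n".join(p.read_text() for p in sources if p.is_file())
    markers = [
        str(base / name)
        for base in (flow_dir, deriv_dir)
        for name in MARKERS
        if (base / name).exists()
    ]
    return {
        "files": sorted(
            p.relative_to(flow_dir).as_posix() for p in flow_dir.rglob("*") if p.is_file()
        ),
        "rules": RULE_RE.findall(text),
        "includes": INCLUDE_RE.findall(snakefile.read_text()) if snakefile.is_file() else [],
        "uses_generate_inputs": "generate_inputs(" in text,
        "markers_present": markers,
    }
```

For this flow it would be called with flow_dir=/storage2/arash/projects/pixecog/code/pipelines/preprocess/ieeg and deriv_dir=/storage2/arash/projects/pixecog/derivatives/preprocess; the rule count should still be cross-checked against `snakemake --list-rules` in the cogpy env.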

Step 2: Map current → target

For each file/directory in the current layout, document where it goes in the v2 layout:

| Current path | v2 path | Notes |
| --- | --- | --- |
| Snakefile | workflow/Snakefile | May need to add generate_inputs if not present |
| config.yml | config.yml (snakebids accepts this name) | Add parse_args and analysis_levels sections |
| *.smk | workflow/rules/*.smk | Move the included rule files; update include: paths |
| scripts/ | workflow/scripts/ | Update script: paths in rules |
| (new) | run.py | Generate the snakebids entry point |
| (new) | derivatives/preprocess/dataset_description.json | BIDS metadata |
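The mapping can be turned into a reviewable move plan before anything is executed. A sketch, assuming the flow directory is tracked by git (so `git mv` keeps history) and that rule files and scripts sit at the flow root or under `rules/` and `scripts/`. `v2_target` and `plan_moves` are hypothetical helpers; they only return command strings and run nothing.

```python
from __future__ import annotations

import shlex
from pathlib import Path


def v2_target(rel: Path) -> Path | None:
    """Map a path (relative to the flat flow dir) to its v2 location; None means keep it."""
    if rel == Path("Snakefile"):
        return Path("workflow/Snakefile")
    if rel.suffix == ".smk":
        return Path("workflow/rules") / rel.name
    if rel.parts and rel.parts[0] == "scripts":
        return Path("workflow") / rel
    # config.yml stays at the app root; anything else needs a manual decision.
    return None


def plan_moves(rel_paths: list[str]) -> list[str]:
    """Return shell commands (not executed) that restructure the flow with git mv."""
    moves = []
    for rel in map(Path, rel_paths):
        target = v2_target(rel)
        if target is not None and target != rel:
            moves.append((rel, target))
    # git mv does not create missing destination directories, so create them first.
    parents = sorted({t.parent.as_posix() for _, t in moves})
    cmds = ["mkdir -p " + " ".join(shlex.quote(p) for p in parents)] if parents else []
    cmds += [f"git mv {shlex.quote(s.as_posix())} {shlex.quote(t.as_posix())}" for s, t in moves]
    return cmds
```

Keeping the resulting moves in a single commit, once the migration is actually carried out, also keeps the rollback to one revert.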

Step 3: Identify blockers and decisions

  • Does the current Snakefile use generate_inputs() or manual pybids? If manual, what's the migration effort?
  • Are there hardcoded paths that assume the flat layout?
  • Are there cross-flow dependencies (other flows consuming preprocess outputs)?
  • Is derivatives/preprocess/ already a DataLad subdataset?
  • What config sections need to become parse_args CLI arguments?
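Three of these questions can be partly answered mechanically. A sketch under stated assumptions: hardcoded flat-layout paths show up as literal substrings (the pattern list is illustrative); a DataLad dataset is recognisable by the `.datalad/config` file at its root; and the checked config keys (`pybids_inputs`, `analysis_levels`, `targets_by_analysis_level`, `parse_args`) are those of the snakebids boilerplate config, which should be confirmed against snakebids 0.14.0. All three helpers are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Literal substrings suggesting a path tied to the current flat layout (heuristic).
FLAT_LAYOUT_PATTERNS = ["code/pipelines/preprocess/ieeg", "derivatives/preprocess", "scripts/"]
# Top-level keys of the snakebids boilerplate config (verify against snakebids 0.14.0).
SNAKEBIDS_SECTIONS = ["pybids_inputs", "analysis_levels", "targets_by_analysis_level", "parse_args"]


def find_flat_layout_paths(flow_dir: Path) -> list[tuple[str, int, str]]:
    """Return (file, line number, stripped line) for every line matching a pattern."""
    hits = []
    for path in sorted(flow_dir.rglob("*")):
        # Snakefile has no suffix, hence the empty string in the allowed set.
        if not path.is_file() or path.suffix not in {".smk", ".py", ".yml", ".yaml", ""}:
            continue
        lines = path.read_text(errors="replace").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if any(p in line for p in FLAT_LAYOUT_PATTERNS):
                hits.append((path.relative_to(flow_dir).as_posix(), lineno, line.strip()))
    return hits


def missing_snakebids_sections(config: dict) -> list[str]:
    """List snakebids top-level sections absent from an already-parsed config dict."""
    return [key for key in SNAKEBIDS_SECTIONS if key not in config]


def is_datalad_dataset(path: Path) -> bool:
    """True if path looks like a DataLad dataset root (has .datalad/config)."""
    return (path / ".datalad" / "config").is_file()
```

Every hit still needs a human decision: a `script:` path only needs rewriting, while a hardcoded `derivatives/preprocess` usually has to become the snakebids output directory.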

Step 4: Write the migration guide

Write a step-by-step guide in the Result section with:

  1. Pre-migration checklist — what to verify before starting
  2. Directory restructure — exact commands/moves
  3. Snakefile changes — what to add/modify (generate_inputs, configfile path, script paths)
  4. Config changes — what to add to config.yml for snakebids compatibility
  5. run.py generation — exact content of the entry point
  6. dataset_description.json — exact content for BIDS derivatives metadata
  7. Test plan — how to verify the migration works (dry run, actual run)
  8. Rollback plan — how to undo if something breaks
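For item 6, a sketch that writes a BIDS-derivatives `dataset_description.json`. The field names (`Name`, `BIDSVersion`, `DatasetType`, `GeneratedBy`, `SourceDatasets`) come from the BIDS specification; every value passed in (dataset name, pipeline name and version, code URL, source dataset URL) is a placeholder to be filled from pixecog, and the default BIDS version is an assumption. `write_dataset_description` is a hypothetical helper.

```python
from __future__ import annotations

import json
from pathlib import Path


def write_dataset_description(
    deriv_dir: Path,
    name: str,
    pipeline_name: str,
    pipeline_version: str,
    code_url: str | None = None,
    source_url: str | None = None,
    bids_version: str = "1.9.0",  # assumption: set to the BIDS version the outputs follow
) -> dict:
    """Write dataset_description.json for a BIDS derivative dataset and return its content."""
    generated_by = {"Name": pipeline_name, "Version": pipeline_version}
    if code_url:
        generated_by["CodeURL"] = code_url
    description = {
        "Name": name,
        "BIDSVersion": bids_version,
        "DatasetType": "derivative",
        "GeneratedBy": [generated_by],  # required for derivative datasets
    }
    if source_url:
        description["SourceDatasets"] = [{"URL": source_url}]  # recommended provenance link
    deriv_dir.mkdir(parents=True, exist_ok=True)
    (deriv_dir / "dataset_description.json").write_text(json.dumps(description, indent=2) + "\n")
    return description
```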

Step 5: Generalize

Note which parts of this guide are pixecog-specific vs reusable for any flow migration. Flag anything that pipeio could automate (e.g., a future pipeio_flow_migrate tool).
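One step a future pipeio_flow_migrate tool could automate is rewriting `script:` directives. Snakemake resolves `script:` paths relative to the file that contains the rule, so a rule moved into workflow/rules/ that referenced `scripts/foo.py` must now point at `../scripts/foo.py`. A regex-based sketch, assuming the path is a plain quoted string literal (on the same line or the next); `rewrite_script_paths` is hypothetical and not part of pipeio.

```python
import re

# Matches `script:` followed by a quoted path starting with scripts/ (either quote style).
SCRIPT_RE = re.compile(r"""(^[ \t]*script:\s*)(["'])scripts/""", re.MULTILINE)


def rewrite_script_paths(smk_text: str) -> tuple[str, int]:
    """Point script: directives at ../scripts/ for rule files now in workflow/rules/.

    Returns the rewritten text and the number of directives changed. Already-rewritten
    paths (starting with ../scripts/) do not match, so the function is idempotent.
    """
    return SCRIPT_RE.subn(r"\1\2../scripts/", smk_text)
```

Rules that stay in workflow/Snakefile need no change, since workflow/scripts/ is already next to it; that distinction is the kind of rule a migration tool would have to encode.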

Acceptance Criteria

  • [ ] Current preprocess/ieeg layout fully documented
  • [ ] File-by-file mapping from current to v2
  • [ ] Blockers and decisions identified
  • [ ] Step-by-step migration guide written
  • [ ] Generalizable patterns noted for other flows

Result

(Filled in after execution)