# Migration guide: pixecog preprocess/ieeg → pipeio v2 snakebids app

This document audits the current `preprocess/ieeg` flow in pixecog and provides a
step-by-step guide for migrating it to a pipeio v2 snakebids app layout.
## 1. Current layout audit

### 1.1 Directory structure

```text
code/pipelines/preprocess/ieeg/
├── Snakefile             # monolithic, 693 lines, ~25 rules
├── config.yml            # 332 lines: pybids_inputs, registry, params
├── Makefile              # notebook publishing workflow
├── scripts/              # 14 active scripts + 2 deprecated subdirs
│   ├── badlabel.py
│   ├── badness_video.py
│   ├── feature.py
│   ├── feature_umap.py
│   ├── filter.py
│   ├── interpolate.py
│   ├── lfp_video.py
│   ├── plot_feature_maps.py
│   ├── rowcol_noise.py
│   ├── ttl_removal.py
│   ├── linenoise/        # 7 scripts (zapline, comb profiling, downsample)
│   └── noise_tfspace/    # 3 scripts (spectrogram, summary, report)
├── notebooks/            # 21 notebooks (jupytext .py format)
│   ├── notebook.yml      # notebook registry
│   └── {name}/{name}.py
├── report/               # .rst caption files for snakemake --report
├── docs/                 # flow documentation
├── _docs/                # legacy docs
└── .snakemake/           # snakemake cache (not tracked)
```
### 1.2 Snakefile analysis

Key characteristics:
- Already uses `generate_inputs` from snakebids (line 17) — a major plus
- Uses `set_bids_spec("v0_0_0")` for BIDS path generation
- Depends on `sutil.repo_root.repo_abs` for absolute path resolution (~30 call sites in the Snakefile)
- Depends on `sutil.bids.paths.BidsPaths` for registry-driven path resolution
- Uses `configfile: "config.yml"` then re-reads the same file with `safe_load` (double-parsing)
- All rules are inline in a single Snakefile (no `.smk` includes)
- Uses the `report()` wrapper for snakemake report integration on several outputs
- One rule (`zapline_plus`) has `conda: "/storage/share/python/environments/Anaconda3/envs/matlab"` — a hardcoded absolute path
Rules (31 total):

| Rule | Type | Script | Notes |
|---|---|---|---|
| all | target | — | Default: expand pre_all + COMB_SUMMARY_HTML |
| noisy_all | target | — | Viz targets only |
| report_noisy_all | target | — | Existing viz files only |
| noise_tfspace_all | target | — | TF-space reports |
| report | target | — | Existing all + comb summary |
| registry | utility | inline | Writes registry YAML to derivatives |
| manifest | utility | inline | Writes manifest TSV |
| test | target | — | Single test entity |
| status | checkpoint | — | Touch file after QC outputs |
| raw_zarr | transform | inline | BIDS LFP → zarr |
| ttl_removal | transform | scripts/ttl_removal.py | TTL artifact removal |
| lowpass | transform | scripts/filter.py | Lowpass filter |
| downsample | transform | inline | Downsample zarr |
| feature | transform | scripts/feature.py | Feature extraction |
| badlabel | transform | scripts/badlabel.py | Bad channel detection |
| plot_feature_maps | viz | scripts/plot_feature_maps.py | Feature map plots |
| rowcol_noise | qc | scripts/rowcol_noise.py | Row/col noise stats |
| badness_video | viz | scripts/badness_video.py | Badness animation |
| lfp_video | viz | scripts/lfp_video.py | LFP animation |
| interpolate | transform | scripts/interpolate.py | Bad channel interpolation |
| noise_tfspace_spectrogram | transform | scripts/noise_tfspace/compute_tfspace_spectrogram.py | |
| noise_tfspace_summary | transform | scripts/noise_tfspace/compute_tfspace_summary.py | |
| noise_tfspace_report | viz | scripts/noise_tfspace/plot_tfspace_summary.py | |
| downsample_for_zapline | transform | scripts/linenoise/downsample_lfp.py | |
| preprocess_json_sidecar | utility | shell | Symlinks sidecars |
| linenoise_profile_combfreqs | transform | scripts/linenoise/measure_combfreqs.py | |
| comb_cross_session_summary | viz | scripts/linenoise/comb_cross_session_summary.py | |
| comb_qc_plot | viz | scripts/linenoise/comb_qc_plot.py | |
| zapline_plus | transform | scripts/linenoise/clean_zapline_plus.py | Needs matlab conda env |
| noisy_spectrogram | viz | scripts/linenoise/sample_spectrogram_plot.py | |
| preprocess_alias | utility | shell | Symlinks final output |
### 1.3 Config structure

```yaml
# Input sources
input_dir: "raw"
input_registry: "raw/registry.yml"
input_dir_brainstate: "derivatives"
input_registry_brainstate: "derivatives/brainstate/flow-brainstate_registry.yml"

# pybids_inputs (2 input types: ieeg, ecephys)
pybids_inputs:
  ieeg: { filters: ..., wildcards: [subject, session, task] }
  ecephys: { filters: ..., wildcards: [subject, session, task, acquisition, recording] }

# Member set anchors (YAML &anchors for DRY)
_member_sets: { ... }

# Output
output_dir: "derivatives/preprocess"
output_registry: "derivatives/preprocess/pipe-preprocess_flow-ieeg_registry.yml"

# Registry groups (16 groups, ~60 members)
registry: { all, raw_zarr, lowpass, downsample, feature, badlabel, noise,
            noise_tfspace, interpolate, zapline_in, linenoise, viz,
            linenoise_profile, preprocess, ttl_removal }

# Processing params
geometry: { ... }
windowing: { ... }
features: [ ... ]
noise: { ... }
badlabel: { ... }
umap: { ... }
video: { ... }
linenoise: { ... }        # 20+ params
noise_tfspace: { ... }
ttl_removal: { ... }
```
### 1.4 Dependencies and cross-flow consumers

Internal dependencies (sutil):
- `sutil.repo_root.repo_abs` — used in the Snakefile + 20 scripts + 15 notebooks
- `sutil.bids.paths.BidsPaths` — used in the Snakefile + 2 notebooks

Cross-flow consumers (downstream flows reading `derivatives/preprocess/`):
- sharpwaveripple — reads `derivatives/preprocess/` via the ecephys registry
- spectrogram/burst — reads `derivatives/preprocess/` via the ecephys registry
- Both reference `pipe-preprocess_flow-ecephys_registry.yml` (the ecephys sibling, not ieeg)

Key finding: no downstream flow directly consumes the ieeg preprocess registry. The
ieeg flow's outputs feed the ecephys flow (which shares `derivatives/preprocess/`), and
downstream flows consume the ecephys registry. This means the ieeg migration carries no
direct cross-flow breakage risk.
### 1.5 Output structure

`derivatives/preprocess/` is a DataLad subdataset (it has its own `.git/`).

Contains:
- Per-stage subdirs: `all/`, `badlabel/`, `downsample/`, `feature/`, `interpolate/`,
  `linenoise/`, `linenoise_in/`, `linenoise_profile/`, `lowpass/`, `noise/`,
  `noise_tfspace/`, `raw_zarr/`, `transient/`, `validate/`, `viz/`, `viz_cache/`
- Per-subject dirs: `sub-01/` through `sub-05/`, `sub-test/`
- Registry files: `pipe-preprocess_flow-ieeg_registry.yml`, `pipe-preprocess_flow-ecephys_registry.yml`
- No `dataset_description.json` — needs to be created
- No `run.py` — needs to be created

### 1.6 Pipeio registry status

The flow is already registered in pipeio at `.projio/pipeio/registry.yml` under
`preprocess/ieeg`, with 22 mods and a full rule mapping. This is consistent with the
current flat layout.
## 2. Current → target mapping

### 2.1 Directory mapping

| Current path | v2 path | Action |
|---|---|---|
| `Snakefile` | `workflow/Snakefile` | Move; split rules into `.smk` files |
| `config.yml` | `config/snakebids.yml` | Move; add `parse_args` + `analysis_levels` |
| `scripts/` | `workflow/scripts/` | Move |
| `scripts/linenoise/` | `workflow/scripts/linenoise/` | Move |
| `scripts/noise_tfspace/` | `workflow/scripts/noise_tfspace/` | Move |
| `notebooks/` | `notebooks/` | Keep in place |
| `report/` | `workflow/report/` | Move (snakemake convention) |
| `docs/` | `docs/` | Keep in place |
| `Makefile` | `Makefile` | Keep; update paths |
| (new) | `run.py` | Create snakebids entry point |
| (new) | `derivatives/preprocess/dataset_description.json` | Create BIDS metadata |
### 2.2 Snakefile split by mod

The monolithic Snakefile should be split into mod-organized `.smk` files:
| Module | Rules | Target file |
|---|---|---|
| common | all, test, report, status, manifest, registry, targets | workflow/Snakefile (keep orchestration) |
| raw | raw_zarr, ttl_removal | workflow/rules/raw.smk |
| signal | lowpass, downsample, feature | workflow/rules/signal.smk |
| badlabel | badlabel, plot_feature_maps, badness_video | workflow/rules/badlabel.smk |
| noise | rowcol_noise | workflow/rules/noise.smk |
| interpolate | interpolate, preprocess_json_sidecar, preprocess_alias | workflow/rules/interpolate.smk |
| linenoise | downsample_for_zapline, linenoise_profile_combfreqs, comb_cross_session_summary, comb_qc_plot, zapline_plus, noisy_spectrogram | workflow/rules/linenoise.smk |
| noise_tfspace | noise_tfspace_spectrogram, noise_tfspace_summary, noise_tfspace_report | workflow/rules/noise_tfspace.smk |
| viz | lfp_video | workflow/rules/viz.smk |
## 3. Blockers and decisions

### 3.1 `sutil.repo_root.repo_abs` dependency (MAJOR)

Scope: 30+ call sites in the Snakefile, 20 scripts, 15 notebooks.

`repo_abs(rel)` resolves a path relative to the repository root. In the v2 snakebids
model, the Snakefile runs from `workflow/` and paths should be relative to the app root
or use snakebids' own path resolution.

Decision required: how to replace `repo_abs`:
- Option A: Replace with `Path(workflow.basedir).parent / rel` in the Snakefile context
  (snakemake provides `workflow.basedir`, the directory containing the Snakefile).
- Option B: Replace with `config["root"]`, where `root` is injected by `run.py`.
- Option C: Keep sutil but make `repo_abs` resolve from config rather than the git root.

Recommendation: Option B — `run.py` sets `config["root"]` to the repo root, and all
`repo_abs(x)` calls become `Path(config["root"]) / x`. This is a mechanical find-replace.
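The mechanical find-replace for Option B can be scripted. A minimal codemod sketch (it only handles the common single-string-literal call form; calls with variables or f-strings should be reviewed by hand, and the rewritten files also need `from pathlib import Path` in scope):

```python
import re

# Matches repo_abs("some/path") or repo_abs('some/path').
REPO_ABS_CALL = re.compile(r"""repo_abs\(\s*(['"])(?P<rel>[^'"]+)\1\s*\)""")


def rewrite_repo_abs(source: str) -> str:
    """Rewrite repo_abs("x") calls to Path(config["root"]) / "x"."""
    return REPO_ABS_CALL.sub(
        lambda m: f'Path(config["root"]) / "{m.group("rel")}"', source
    )
```

Running this over each script and then eyeballing the diff keeps the replacement auditable.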
### 3.2 `sutil.bids.paths.BidsPaths` dependency (MODERATE)

Used for registry-driven path construction: `out_paths("group", "member")` → BIDS path template.

In v2, this is replaced by `pipeio.bids.BidsResolver`. The API is similar but not
identical. The Snakefile setup code (lines 15–26) needs rewriting.

Migration: `BidsResolver` is a drop-in adapter with the same `(group, member)` call
signature. The import changes from `sutil.bids.paths.BidsPaths` to `pipeio.bids.BidsResolver`.
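To make the "drop-in adapter" idea concrete, here is a hypothetical sketch of the `(group, member)` call shape — the class name, registry layout, and member names below are invented for illustration and do not reflect the actual `BidsResolver` implementation:

```python
from pathlib import Path


class BidsResolverSketch:
    """Illustrative stand-in for the (group, member) resolver API.

    Resolves a registry group/member pair to a path template rooted
    at the derivatives directory. Purely a sketch.
    """

    def __init__(self, registry: dict, root) -> None:
        self.registry = registry
        self.root = Path(root)

    def __call__(self, group: str, member: str) -> Path:
        # Look up the template registered under group/member.
        return self.root / self.registry[group][member]


# Invented registry fragment for illustration:
registry = {"raw_zarr": {"lfp": "raw_zarr/sub-{subject}_lfp.zarr"}}
out_paths = BidsResolverSketch(registry, "derivatives/preprocess")
```

If the real `BidsResolver` keeps this call signature, rule bodies using `out_paths("group", "member")` need no changes beyond the import swap.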
### 3.3 configfile double-parsing (MINOR)

Line 8 declares `configfile: "config.yml"`, and lines 12–13 re-read the same file with
`safe_load`. This exists because `repo_abs` needs the config dict before snakemake's
`config` is fully available.

v2 fix: with `run.py` injecting paths, the double parse becomes unnecessary. Use
snakemake's native `configfile:` directive only.
3.4 Hardcoded conda environment (MINOR)¶
zapline_plus rule uses conda: "/storage/share/python/environments/Anaconda3/envs/matlab".
v2 fix: Move to a workflow/envs/matlab.yml conda env spec, or use
config["conda_envs"]["matlab"] for portability.
3.5 report() paths with rst captions (MINOR)¶
Several rules use report(path, caption="report/foo.rst"). The caption paths are
relative to the rule's location. After moving rules to workflow/rules/, these need
updating.
v2 fix: Move report/ to workflow/report/ and update caption paths.
3.6 Cross-flow output directory sharing (INFO)¶
Both preprocess/ieeg and preprocess/ecephys write to derivatives/preprocess/.
They share the DataLad subdataset but use separate registries. This is fine for v2 —
BIDS derivatives directories can contain outputs from multiple pipelines. The
dataset_description.json should list both as generators.
## 4. Step-by-step migration guide

### Phase 1: Pre-migration checklist

- [ ] Verify all current outputs are committed in the `derivatives/preprocess` subdataset
- [ ] Run `snakemake -n` to confirm the current Snakefile parses cleanly
- [ ] Back up the current Snakefile: `cp Snakefile Snakefile.v1.bak`
- [ ] Verify `sutil` is installed in the cogpy environment
- [ ] Check that no other flow's Snakefile imports from `preprocess/ieeg/` directly
### Phase 2: Create v2 directory skeleton

```shell
cd code/pipelines/preprocess/ieeg

# Create v2 directories
mkdir -p workflow/rules
mkdir -p workflow/scripts
mkdir -p config

# Move files
mv Snakefile workflow/Snakefile
mv config.yml config/snakebids.yml
mv scripts/* workflow/scripts/
rmdir scripts
mv report workflow/report
```
### Phase 3: Create run.py

```python
#!/usr/bin/env python
"""Snakebids entry point for the preprocess/ieeg flow."""
from pathlib import Path

from snakebids.app import SnakeBidsApp


def main():
    # SnakeBidsApp takes the app directory as its first argument and
    # expects workflow/Snakefile and config/snakebids.yml beneath it.
    app = SnakeBidsApp(Path(__file__).resolve().parent)
    app.run_snakemake()


if __name__ == "__main__":
    main()
```
### Phase 4: Update config/snakebids.yml

Add the snakebids-required sections at the top:

```yaml
# snakebids app metadata
app_name: preprocess-ieeg

analysis_levels: &analysis_levels
  - participant

parse_args:
  bids_dir:
    help: "Input BIDS directory"
    default: "raw"
  output_dir:
    help: "Output derivatives directory"
    default: "derivatives/preprocess"
  analysis_level:
    help: "Analysis level"
    choices: *analysis_levels
    default: "participant"

# ... rest of existing config unchanged ...
```
### Phase 5: Update workflow/Snakefile

Key changes to the Snakefile header:

```python
from pathlib import Path

from snakemake.utils import min_version
min_version("6.0")

from snakebids import generate_inputs, bids, set_bids_spec
set_bids_spec("v0_0_0")

configfile: str(Path(workflow.basedir).parent / "config" / "snakebids.yml")

# v2: repo root injected by run.py via config["root"]; the fallback walks up
# from workflow/ (five levels: workflow → ieeg → preprocess → pipelines → code → root)
ROOT = Path(config.get("root", Path(workflow.basedir).parents[4]))

# Replace all repo_abs() calls with ROOT / "path"
# e.g.: repo_abs("code/pipelines/preprocess/ieeg") → ROOT / "code/pipelines/preprocess/ieeg"
from sutil.bids.paths import BidsPaths  # or pipeio.bids.BidsResolver when ready

inputs = generate_inputs(ROOT / config["input_dir"], config["pybids_inputs"])
# ... rest of setup with ROOT instead of repo_abs ...

# Include mod rules
include: "rules/raw.smk"
include: "rules/signal.smk"
include: "rules/badlabel.smk"
include: "rules/noise.smk"
include: "rules/interpolate.smk"
include: "rules/linenoise.smk"
include: "rules/noise_tfspace.smk"
include: "rules/viz.smk"

# Keep target rules in the main Snakefile
rule all:
    input:
        inputs["ieeg"].expand(pre_all),
        COMB_SUMMARY_HTML

# ... other target rules ...
```
### Phase 6: Split rules into .smk files

For each `.smk` file, extract the relevant rules from the monolithic Snakefile. The
rules can reference `ROOT`, `inputs`, `in_paths`, `out_paths`, and `config` as globals
(snakemake includes share the namespace).

Update `script:` directives — paths are relative to the file containing the rule:
- In `workflow/Snakefile`: `script: "scripts/foo.py"` (relative to `workflow/`)
- In `workflow/rules/raw.smk`: `script: "../scripts/foo.py"` (up one level)
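As an illustration, a rule moved into `workflow/rules/raw.smk` would take this shape — the input/output wiring below is a placeholder, not the real rule body; the point is the adjusted `script:` path and the shared globals:

```python
# workflow/rules/raw.smk — illustrative shape only; I/O wiring is a placeholder
rule ttl_removal:
    input:
        zarr=out_paths("raw_zarr", "lfp"),      # out_paths shared from workflow/Snakefile
    output:
        zarr=out_paths("ttl_removal", "lfp"),   # placeholder member names
    script:
        "../scripts/ttl_removal.py"             # ../ because this file lives in rules/
```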
### Phase 7: Update script repo_abs calls

Mechanical replacement in all scripts:

```python
# Before:
from sutil.repo_root import repo_abs
path = repo_abs("derivatives/preprocess/...")

# After (in snakemake script context):
from pathlib import Path
root = Path(snakemake.config.get("root", "."))
path = root / "derivatives/preprocess/..."
```

For scripts that use `repo_abs` only for log file paths or notebook references, the
replacement is straightforward. Each script's `snakemake.config["root"]` provides the
repo root.
Phase 8: Create dataset_description.json¶
{
"Name": "preprocess",
"BIDSVersion": "1.9.0",
"DatasetType": "derivative",
"GeneratedBy": [
{
"Name": "preprocess-ieeg",
"Description": "iEEG preprocessing pipeline: raw→zarr, lowpass, downsample, feature extraction, bad channel detection, interpolation, line noise removal",
"CodeURL": "code/pipelines/preprocess/ieeg"
},
{
"Name": "preprocess-ecephys",
"Description": "Extracellular electrophysiology preprocessing pipeline",
"CodeURL": "code/pipelines/preprocess/ecephys"
}
],
"SourceDatasets": [
{
"URL": "../../raw"
}
]
}
Place at derivatives/preprocess/dataset_description.json.
### Phase 9: Update Makefile

Update the SNAKEMAKE invocation paths and any references to `Snakefile` or `config.yml`
to point to the new locations.
### Phase 10: Test plan

- Parse test: `cd code/pipelines/preprocess/ieeg && snakemake -s workflow/Snakefile --configfile config/snakebids.yml -n`
- Dry run: `snakemake -s workflow/Snakefile --configfile config/snakebids.yml -n --forceall`
- Single subject test: `snakemake -s workflow/Snakefile --configfile config/snakebids.yml -n --config root=$(git rev-parse --show-toplevel) -- test`
- Entry point test: `python run.py raw derivatives/preprocess participant --dry-run`
- Full run on test subject: verify output matches v1 byte-for-byte for deterministic rules
- Registry scan: `pipeio_registry_scan()` should detect the flow as `app_type: snakebids`
### Phase 11: Rollback plan

```shell
cd code/pipelines/preprocess/ieeg

# Restore v1 layout
mv workflow/Snakefile ./Snakefile
mv config/snakebids.yml ./config.yml
mv workflow/scripts ./scripts   # scripts/ was removed in Phase 2; move the dir back whole
mv workflow/report ./report
rm -rf workflow config run.py
```

All changes are local to `code/pipelines/preprocess/ieeg/`. The derivatives directory
and DataLad subdataset are unaffected. No downstream flows need changes, since they
consume the ecephys registry, not the ieeg one.
## 5. Effort estimate by component

| Component | Effort | Blocking? |
|---|---|---|
| Directory restructure | Small | No |
| `run.py` creation | Small | No |
| Config additions | Small | No |
| Snakefile split into `.smk` | Medium | No |
| `repo_abs` → `ROOT /` in Snakefile | Small (mechanical) | No |
| `repo_abs` → `config["root"]` in 20 scripts | Medium (mechanical) | No |
| `BidsPaths` → `BidsResolver` swap | Small (API-compatible) | Needs BidsResolver impl |
| `script:` path updates in `.smk` files | Small (mechanical) | No |
| `report()` caption path updates | Small | No |
| Notebook `repo_abs` updates | Large (15 notebooks) | Non-blocking (defer) |
| Hardcoded conda env | Small | No |
| `dataset_description.json` | Small | No |
| Testing | Medium | — |

Total: ~1–2 focused sessions. The mechanical `repo_abs` replacement dominates.
## 6. Pixecog-specific vs reusable

### Pixecog-specific

- `BidsPaths` → `BidsResolver` migration (pixecog's custom path resolution)
- `sutil.repo_root.repo_abs` elimination (pixecog utility)
- Specific rule split plan (domain knowledge)
- Cross-flow analysis (project-specific topology)
- Hardcoded conda path

### Reusable for any flow migration

- Directory restructure template (flat → snakebids app)
- `run.py` boilerplate
- `config/snakebids.yml` additions (`parse_args`, `analysis_levels`)
- `dataset_description.json` template
- `script:` path update rules (Snakefile → rules/ relative paths)
- Test plan structure
- Rollback plan pattern
### Candidates for pipeio automation (pipeio_flow_migrate)

1. Directory scaffolding — `mkdir -p workflow/rules config` + move files
2. `run.py` generation — template with flow name substitution
3. `dataset_description.json` generation — from registry metadata + config
4. `script:` path rewriting — parse rules, adjust relative paths after the move
5. `configfile:` path update — mechanical
6. Dry-run validation — `snakemake -n` after migration to verify parse
7. Registry rescan — verify detection as a snakebids app

A `pipeio_flow_migrate(pipe, flow, dry_run=True)` tool could handle items 1–6
automatically, with `dry_run=True` showing the plan before execution.
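The dry-run mode could simply emit the plan as a list of steps before touching anything. A hypothetical sketch of that planning function — the tool, its name, and the move table are illustrative, not an existing pipeio API:

```python
from pathlib import Path

# Flat-layout → snakebids-app moves (mirrors the directory mapping table).
MOVES = [
    ("Snakefile", "workflow/Snakefile"),
    ("config.yml", "config/snakebids.yml"),
    ("scripts", "workflow/scripts"),
    ("report", "workflow/report"),
]


def plan_migration(flow_dir: Path) -> list[str]:
    """Return the migration plan as human-readable steps (dry run only)."""
    steps = [
        f"mkdir -p {flow_dir / 'workflow/rules'}",
        f"mkdir -p {flow_dir / 'config'}",
    ]
    for src, dst in MOVES:
        steps.append(f"mv {flow_dir / src} {flow_dir / dst}")
    steps.append(f"write {flow_dir / 'run.py'} (from template)")
    steps.append("snakemake -n  # validate parse after migration")
    return steps
```

With `dry_run=True` the tool would print these steps; with `dry_run=False` it would execute them and finish with the registry rescan.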