Stack-axis survey: BIDS / DataLad / Snakemake / Marimo / Quarto+MkDocs / projio / agentic across study projects

Purpose

Empirical input for the handbook + 4-day workshop on the open-science stack (BIDS, DataLad, Snakemake, Marimo, Quarto/MkDocs) plus the projio + agentic layer that wraps them. This survey slices horizontally across the stack: for each component, "how does it show up in projio's project-aware system, and how does its adoption vary across projio, the cogpy library, and the three study projects (pixecog, gecog, msol)?"

Read-only across all projects. Companion / superseded by the per-project tool-use survey — that one is organized per-project on the analysis-substance axis; this one is organized per-component on the stack axis.

Workshop-audience framing throughout: each component opens with a one-paragraph operational definition that could appear verbatim in pre-workshop reading.

Component 1: BIDS

What it does

BIDS (Brain Imaging Data Structure) is a filesystem convention for organizing neuroscience datasets so that any tool, person, or script knows where to find subjects, sessions, runs, modalities, and metadata without per-dataset configuration. Workshop framing: the directory layout is the API. A BIDS-valid raw/ lets you point any analysis at sub-01/ses-pre/... without inventing a project-specific path scheme each time.
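As a rough illustration of "the directory layout is the API", the sketch below pulls the key-value entities out of a BIDS-style filename using only the standard library. The helper name `parse_bids_entities` and the example filename are assumptions made for illustration; they are not part of projio, pipeio, or any BIDS library.

```python
import re
from pathlib import PurePosixPath

# Key-value entities look like "sub-01" or "ses-pre", joined by underscores.
_ENTITY = re.compile(r"^([a-zA-Z0-9]+)-([a-zA-Z0-9]+)$")


def parse_bids_entities(path: str) -> dict:
    """Split a BIDS-style filename into entities, suffix, and extension."""
    name = PurePosixPath(path).name
    stem, _, extension = name.partition(".")
    parts = stem.split("_")
    entities = {}
    for part in parts:
        match = _ENTITY.match(part)
        if match:
            entities[match.group(1)] = match.group(2)
    # The trailing chunk without a dash is the suffix (for example "ieeg").
    suffix = parts[-1] if "-" not in parts[-1] else ""
    return {
        "entities": entities,
        "suffix": suffix,
        "extension": "." + extension if extension else "",
    }


# Hypothetical filename, used only to exercise the helper.
info = parse_bids_entities("raw/sub-01/ses-pre/ieeg/sub-01_ses-pre_task-rest_ieeg.edf")
print(info["entities"], info["suffix"], info["extension"])
```

Because the entities are recoverable from the path alone, no per-dataset lookup table is needed; that is the property the workshop framing relies on.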

How projio is aware of it

pipeio.adapters.bids.BidsPaths — pipeio's BIDS adapter (in packages/pipeio/src/pipeio/adapters/bids.py). Wraps a flow's input/output registries against a BIDS root and snakebids generate_inputs() output.
raw/registry.yml + per-flow manifest.yml convention layered on top of BIDS to carry cross-flow contract information BIDS itself doesn't express (e.g. which channel labels are good after a calibration step).
pipeio_target_paths(flow, group, member) — MCP tool that resolves output paths from BIDS wildcards without the agent having to construct paths by hand.

Adoption across projects

Project	Adopted?	How / signature artifact	Variation worth teaching
projio	n/a	tool repo, no dataset	— (workshop note: stack tools don't themselves consume BIDS)
cogpy	n/a	library repo, no dataset	snakebids appears in source code (`src/cogpy/workflows/preprocess/Snakefile`) without a real BIDS dataset
pixecog	yes (strict)	`raw/{participants.tsv, dataset_description.json, sub-01..05, sub-test, tasks.json, registry.yml}` + 18 derivative roots; `raw/sourcedata/` separate	18 derivative roots + 3 manifest-emitting flows = densest BIDS in the cohort
gecog	yes (strict)	`raw/{participants.tsv, dataset_description.json, sub-01..12, sourcedata/, registry.yml, config.yml}` + 7 derivative roots	derivative root with no `dataset_description.json` but with `manifest.yml` — soft-form derivative re-rooting
msol	yes	`raw/{participants.tsv, participants.json, dataset_description.json, sub-01..06, cameras.{json,tsv}, task-*.json}` + 4 derivative roots	first BIDS adoption outside electrophysiology — behavior + DLC; demonstrates BIDS-for-video

Convergent patterns

All three study projects use raw/ as the strict BIDS root with participants.tsv + dataset_description.json + sub-XX/.
All three keep sourcedata/ outside the BIDS-validated tree (pixecog and msol use it as a symlink to a separate dataset directory).
All three put per-flow output under derivatives/<flow>/ rather than in-place rewrites of raw/.

Divergent patterns

Derivative-root strictness: pixecog and gecog keep derivatives/<flow>/ partial BIDS — manifest.yml carries channel/event metadata but no dataset_description.json is emitted, so derivatives aren't independently BIDS-valid. msol doesn't use the manifest pattern at all (its derivatives are plain output dirs).
TTL-cleaning re-rooting: pixecog re-roots derivatives/preprocess_ieeg/ as if it were a new BIDS root for downstream flows (the bids_dir_ieeg config switch in lfp_extrema/config.yml). Neither gecog nor msol does this — pixecog is the one example of "derivative-of-derivative treated as a fresh BIDS root."
Subject scale: msol 6 subjects, pixecog 5 subjects, gecog 12 subjects. This is meaningful for choosing a workshop dataset (gecog largest, pixecog smallest).

Canonical teaching artifact

pixecog/raw/ + the derivatives/preprocess_ieeg/manifest.yml chain. Single concrete pair: a strict BIDS root + a non-trivial derivative root that has its own manifest.yml consumed by downstream flows. Lets the workshop teach (a) BIDS strictness in raw/ and (b) the soft-form derivative-rooting pattern as a deliberate departure from full BIDS in the same artifact.

Honest gap

Derivatives in pixecog/gecog are not BIDS-valid (no dataset_description.json per derivative root). The workshop should be explicit that the manifest.yml pattern is a projio convention layered on BIDS, not BIDS itself, and that there is an unresolved question about whether derivative roots should also emit dataset_description.json to be tool-portable beyond projio.
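The gap described above can be made visible with a small audit. The following is a minimal sketch, assuming a derivatives/ directory whose immediate children are derivative roots; the function name `audit_derivative_roots` and the placeholder flow names are hypothetical and not part of projio.

```python
from pathlib import Path
import tempfile


def audit_derivative_roots(derivatives: Path) -> dict:
    """Report, per derivative root, which marker files are present."""
    report = {}
    for root in sorted(p for p in derivatives.iterdir() if p.is_dir()):
        report[root.name] = {
            # Present only if the root is a BIDS dataset in its own right.
            "dataset_description": (root / "dataset_description.json").is_file(),
            # The projio-side convention layered on top of BIDS.
            "manifest": (root / "manifest.yml").is_file(),
        }
    return report


# Demo on a throwaway tree; "flow_a" and "flow_b" are placeholder names.
with tempfile.TemporaryDirectory() as tmp:
    deriv = Path(tmp) / "derivatives"
    (deriv / "flow_a").mkdir(parents=True)
    (deriv / "flow_a" / "manifest.yml").write_text("{}\n")
    (deriv / "flow_b").mkdir()
    result = audit_derivative_roots(deriv)

print(result)
```

A root that reports `manifest: True` and `dataset_description: False` is the soft-form pattern; a root with neither is a plain output directory.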

Component 2: DataLad

What it does

DataLad versions data and code together by combining git (for small text and metadata) with git-annex (for large files), and by structuring repositories as superdatasets with subdatasets mounted at chosen paths. Workshop framing: git for everything, including the multi-gigabyte files and the upstream library you depend on. Subdatasets let you pin a known commit of code/lib/cogpy inside a study, and siblings let you push the same dataset to GitHub and a RIA store and a GitLab pages target with one command.

How projio is aware of it

mcp__projio__datalad_* tools: datalad_save, datalad_status, datalad_push, datalad_pull, datalad_siblings. Wrap CLI invocations in the labpy conda env per the runtime convention.
Sibling helpers in src/projio/helpers/ for GitHub, GitLab, and RIA sibling provisioning. All preview-first (require --yes to execute).
projio sync auto-discovers code/lib/<name>/ subdatasets and registers them in codio with role=core.
pipeio_flow_new scaffolds a flow whose derivatives/<flow>/ is expected to be registered as its own subdataset (the convention is enforced socially, not yet automatically).

Adoption across projects

Project	Adopted?	How / signature artifact	Variation worth teaching
projio	yes (composition)	6 ecosystem subdatasets under `packages/{biblio,notio,indexio,codio,pipeio,figio}` + 14 read-only mirrors under `.projio/codio/mirrors/`	only project where subdatasets are the product, not just inputs
cogpy	minimal	no `.gitmodules`; `.datalad/config` only carries dataset id `493104a7-...`	a library, used as a subdataset by others — shows the upstream side of the relationship
pixecog	maximal	`.gitmodules` lists ~25 entries: `raw`, 14 `derivatives/`, `bib`, `code/ECoGandNpix`, 3 `code/lib/`, 2 `.projio/codio/mirrors/sirotalab--*`	densest subdataset graph in the cohort; RIA URLs split across `/storage/share/` (ecosystem libs) and `/storage2/ria-store/` (study-specific)
gecog	medium	9 entries, including `raw`, 5 `derivatives/*`, `code/lib/{cogpy,labpy}` (no labbox)	clean RIA layout (`/storage2/ria-store/alias/gecog-*` for everything)
msol	minimal	only one entry: `code/lib/ratcave` (an external GitHub remote, not a RIA alias); no `raw`/`derivatives` registered as subdatasets	the outlier — DataLad initialized but most of the directory tree is not currently treated as subdatasets

Convergent patterns

Every project has a .datalad/config with a dataset id (DataLad-initialized).
Every study project mounts library subdatasets under code/lib/: cogpy + labpy in gecog, cogpy + labpy + labbox in pixecog, and ratcave in msol (database_io also sits under msol's code/lib/ but is not listed in its .gitmodules).
RIA store is the canonical sibling pattern when a sibling is registered (/storage2/ria-store/alias/<project>-<component> or /storage/share/git/ria-store/alias/<lib>).

Divergent patterns

Subdataset coverage: pixecog and gecog register most per-flow derivatives/<flow>/ roots as subdatasets (14 of pixecog's 18 derivative roots and 5 of gecog's 7 appear in .gitmodules); msol registers none. The workshop has to pick one and call out the other as "this is the convention, msol is midway through adopting it."
Aliased entries in pixecog's .gitmodules (derivatives/spectrogram and derivatives/spectrogram_burst both pointing at pixecog-spectrogram; manifest and manifest_assemble both at pixecog-manifest) — evidence of in-flight rename, worth flagging but not worth teaching.
External GitHub subdataset is unique to msol (code/lib/ratcave → github.com/ratcave/ratcave.git) — the rest use RIA aliases throughout.

Canonical teaching artifact

gecog/.gitmodules — 9 entries, every URL /storage2/ria-store/alias/gecog-*, clean separation of raw/, per-flow derivatives/*, and code/lib/*. The cleanest demonstration of "DataLad as a coherent subdataset graph" without the rename-aliasing noise pixecog carries.
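To read a subdataset graph like this one programmatically, a `.gitmodules` file can be parsed with the standard library. The sketch below is an assumption-laden illustration: the sample text is invented (the alias `gecog-raw` and the `https://` prefix on the ratcave URL are guesses at plausible values), and the helper names are hypothetical rather than projio or DataLad APIs.

```python
import configparser


def read_gitmodules(text: str) -> list:
    """Return (path, url) pairs from the text of a .gitmodules file."""
    # Strip leading whitespace so no line is mistaken for a continuation.
    cleaned = "\n".join(line.strip() for line in text.splitlines())
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cleaned)
    return [(parser[s]["path"], parser[s]["url"]) for s in parser.sections()]


def is_ria_alias(url: str) -> bool:
    """Heuristic: URLs containing '/ria-store/alias/' count as RIA aliases."""
    return "/ria-store/alias/" in url


# Invented sample; real files usually carry extra keys such as datalad-id.
SAMPLE = """
[submodule "raw"]
    path = raw
    url = /storage2/ria-store/alias/gecog-raw
[submodule "code/lib/ratcave"]
    path = code/lib/ratcave
    url = https://github.com/ratcave/ratcave.git
"""

entries = read_gitmodules(SAMPLE)
external = [path for path, url in entries if not is_ria_alias(url)]
print(external)
```

Run against each project's real `.gitmodules`, the same two helpers would separate RIA-backed subdatasets from external remotes such as msol's ratcave.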

Honest gap

Sibling provisioning is preview-first in projio's helpers (good), but the subdataset-per-derivative convention is enforced socially. msol shows the consequence: a study can be DataLad-initialized but not actually use DataLad for the bulk of its content. The workshop and handbook should teach the convention as a deliberate choice the user has to make at flow-creation time, not as an automatic projio behavior.

Component 3: Snakemake

What it does

Snakemake is a Python-based pipeline framework that turns a graph of input → output dependencies into a DAG of jobs that can be parallelized, re-run on staleness, and described declaratively in a Snakefile. Workshop framing: write down what each step needs and what it produces; Snakemake figures out the rest. The snakebids extension parameterizes rules over BIDS subject/session/run wildcards.

How projio is aware of it

packages/pipeio/ — projio's pipeline subsystem. pipeio.adapters.bids.BidsPaths layers projio's per-flow registry/manifest convention on top of snakebids.
.projio/pipeio/registry.yml — generated by pipeio_registry_scan, enumerates flows and their app_type (currently always snakemake).
MCP tools: pipeio_flow_list, pipeio_flow_status, pipeio_flow_new, pipeio_run, pipeio_dag_export, pipeio_flow_report, pipeio_mod_*, pipeio_rule_*. ~50 tools total addressing flows by name (no path).
Runner auto-detection (pixi.toml present → pixi run snakemake, else conda run -n <env> snakemake).
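The runner auto-detection rule in the last item can be sketched in a few lines. This is a minimal illustration of the stated rule, not pipeio's actual implementation; the function name `snakemake_command` is hypothetical and the env name `labpy` is used only as a placeholder.

```python
from pathlib import Path
import tempfile


def snakemake_command(project_root: Path, conda_env: str) -> list:
    """Pick the command prefix used to launch snakemake for a project."""
    if (project_root / "pixi.toml").is_file():
        # A pixi project: let pixi resolve the environment.
        return ["pixi", "run", "snakemake"]
    # Otherwise fall back to a named conda environment.
    return ["conda", "run", "-n", conda_env, "snakemake"]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    without_pixi = snakemake_command(root, "labpy")
    (root / "pixi.toml").write_text("")
    with_pixi = snakemake_command(root, "labpy")

print(without_pixi)
print(with_pixi)
```

Returning a list rather than a shell string keeps the result safe to pass to `subprocess.run` without quoting concerns.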

Adoption across projects

Project	Adopted?	How / signature artifact	Variation worth teaching
projio	disabled	`subsystems.pipeio.enabled: false`; `flows: {}`	tool repo deliberately does not run pipelines on itself
cogpy	legacy	`src/cogpy/workflows/preprocess/Snakefile` — pure snakebids (`from snakebids import bids, generate_inputs`), no pipeio adapter	shows the "before-pipeio" world; predates the project's adoption of `code/pipelines/<flow>/` layout
pixecog	extensive	16 flows in `code/pipelines/`, all `app_type: snakemake`; example Snakefile uses snakebids `set_bids_spec("v0_0_0")` and `BidsPaths(in_reg, root, inputs)`	most flows of the cohort; canonical example of snakebids + pipeio compose
gecog	medium	8 flows in `code/pipelines/`; same snakebids + BidsPaths pattern	cleanest single-domain set (factor analysis + sleep spindle + travelling wave)
msol	minimal	3 flows in `code/pipelines/`; plain snakemake — `glob_wildcards()` against flat path templates, no snakebids and no BidsPaths	demonstrates a working pipeline without the BIDS-aware adapters — useful for "Snakemake without ceremony" pedagogy

Convergent patterns

Every project that runs pipelines puts them under code/pipelines/<flow>/ with a Snakefile, config.yml, scripts/, and notebooks/.
Every flow output goes under derivatives/<flow>/ (BIDS-aligned).
Every study project's Makefile resolves SNAKEMAKE through pixi or conda env wrapping; none invoke snakemake bare.

Divergent patterns

snakebids vs plain snakemake: cogpy = pure snakebids; pixecog + gecog = snakebids + pipeio's BidsPaths; msol = plain snakemake. Three styles in the same workflow ecosystem.
Cross-flow contract: pixecog + gecog use manifest.yml written by upstream flows and read via BidsPaths(safe_load(...manifest.yml), ...) by downstream flows. cogpy is single-flow; msol's three flows do not cross-feed via manifests.
Flow scale: pixecog 16 ≫ gecog 8 ≫ msol 3.

Canonical teaching artifact

pixecog/code/pipelines/lfp_extrema/Snakefile + config.yml — the registry-extension pattern (the Snakefile programmatically extends config['registry'] with one group per detection-tuple, then fans out seven outputs per slow-wave detection). One artifact that shows config-driven Snakemake, snakebids wildcards, the BidsPaths adapter, and cross-flow manifest.yml consumption all at once.
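The registry-extension idea can be shown in plain Python, independent of Snakemake. The sketch below only illustrates the shape of the pattern (one registry group per detection tuple, each with the same set of output members). Every key, group name, and member name is hypothetical; the actual lfp_extrema Snakefile and its seven outputs are not reproduced here.

```python
def extend_registry(config: dict, detections: list, members: list) -> dict:
    """Add one registry group per detection tuple, each with the same members."""
    registry = dict(config.get("registry", {}))
    for band, polarity in detections:
        group = f"detect_{band}_{polarity}"  # hypothetical naming scheme
        registry[group] = {
            "members": {m: {"suffix": m, "extension": ".npy"} for m in members}
        }
    # Return a new config so the caller's dict is left untouched.
    return {**config, "registry": registry}


# Placeholder inputs: two detection tuples fanning out to two members each.
config = {"registry": {"raw": {"members": {}}}}
detections = [("slow", "peak"), ("slow", "trough")]
members = ["times", "amplitudes"]

extended = extend_registry(config, detections, members)
print(sorted(extended["registry"]))
```

The point for teaching is that the number of groups follows from the config rather than from hand-written rules, so adding a detection tuple adds outputs without touching rule code.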

Honest gap

Three Snakemake styles coexist in one ecosystem (snakebids alone, snakebids + BidsPaths, plain snakemake). The workshop should pick the snakebids + BidsPaths style as the default and explicitly position the others as predecessor (cogpy) and minimal-ceremony (msol) variants. Trying to teach all three styles in 4 days would dilute the message.

Component 4: Marimo

What it does

Marimo is a Python notebook format where the file is a .py file (no JSON), cells form a reactive DAG (changing one cell automatically re-runs its dependents), and the notebook can run as a script, as a server, or as a static HTML/WASM bundle. Workshop framing: Jupyter-style narrative, diff-friendly storage, and reactive-spreadsheet semantics — with no hidden state. Marimo plays two distinct roles: (a) per-flow exploratory notebooks, and (b) handbook explorables exported via marimo export html-wasm.
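The reactive-DAG behaviour can be illustrated without marimo itself. The sketch below is a toy model, not marimo's implementation: each cell is described by the names it defines and the names it reads, and the hypothetical helper `cells_to_rerun` computes which cells become stale when one cell changes. It assumes the dependency graph is acyclic.

```python
from collections import deque


def cells_to_rerun(cells: dict, changed: str) -> list:
    """Return the cells downstream of `changed`, parents before children.

    `cells` maps a cell name to (names it defines, names it reads).
    """
    # Which cell defines each variable.
    owner = {var: name for name, (defs, _refs) in cells.items() for var in defs}
    # For each cell, the cells whose definitions it reads.
    parents = {
        name: {owner[v] for v in refs if v in owner and owner[v] != name}
        for name, (_defs, refs) in cells.items()
    }
    # Breadth-first walk to collect every cell reachable from the change.
    stale, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for name, deps in parents.items():
            if current in deps and name not in stale:
                stale.add(name)
                queue.append(name)
    # Emit stale cells so that each comes after its stale parents.
    ordered = []

    def visit(name):
        if name in ordered:
            return
        for parent in sorted(parents[name] & stale):
            visit(parent)
        ordered.append(name)

    for name in sorted(stale):
        visit(name)
    return ordered


# Toy notebook: "notes" is independent of the data cells.
cells = {
    "load": ({"data"}, set()),
    "clean": ({"tidy"}, {"data"}),
    "plot": (set(), {"tidy", "data"}),
    "notes": ({"text"}, set()),
}
rerun = cells_to_rerun(cells, "load")
print(rerun)
```

Editing `load` marks `clean` and `plot` stale but leaves `notes` alone, which is the "no hidden state" property in miniature.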

How projio is aware of it

pipeio_nb_* MCP tools treat marimo as a first-class notebook backend alongside jupytext percent-format. pipeio_nb_create is kind-aware (investigate/explore vs demo/validate); pipeio_nb_watch launches marimo edit --watch for live editing; pipeio_nb_snapshot executes a marimo notebook and reads cell outputs (the agent's "eyes" into a notebook); pipeio_nb_validate runs marimo check.
format: field in notebook.yml selects the backend per notebook. Auto-detected when empty (which is currently the case in every surveyed notebook.yml).
marimo-pair skill (user-level, not project-local) launches and monitors a marimo session and is allow-listed in pixecog's and msol's .claude/settings.json via the discover-servers.sh and execute-code.sh Bash patterns.

Adoption across projects

Project	Adopted?	How / signature artifact	Variation worth teaching
projio	n/a	12 `import marimo` matches inside `packages/pipeio/` source/tests + 2 docs (`docs/specs/pipeio/notebook.md`, `docs/tutorials/marimo-notebooks.md`); no project-level notebooks	tool reference for the backend
cogpy	absent	0 marimo matches	library repo — does not run notebooks
pixecog	extensive	7 marimo `.py` notebooks under `code/pipelines/{calibrate_ieeg_notch, spectrogram_burst, calibrate_ieeg, preprocess_ecephys}/notebooks/explore/`; 15 `notebook.yml` files; `__marimo__/session/` cache dir at repo root	only project with real marimo authoring; the cache dir leaking into repo root is friction
gecog	minimal	2 explore notebooks (`code/pipelines/travelling_wave/notebooks/explore/{kw_spectrum.py, flow_and_patterns.py}`); 7 `notebook.yml` files	one flow has marimo notebooks, the rest are placeholders
msol	absent (notebooks empty)	3 `notebook.yml` files but the `.src/explore_*.py` paths point to placeholder marimo files; one calibration script (`code/scripts/calibration/calibrate_arena_corners.py`) imports marimo	adopted scaffolding without populating notebooks

Convergent patterns

Every project that has flows uses the notebooks/{explore,demo}/.src/ layout per the feedback_notebook_layout.md convention (split source vs MyST views; no per-notebook subdirs).
Every notebook.yml is auto-detect (format: '').
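Marimo notebooks enter projects via pipeio's notebook tooling; no flow notebook is authored with marimo edit outside the projio convention. The one marimo use that does not come through pipeio is msol's calibration script (code/scripts/calibration/calibrate_arena_corners.py), which imports marimo outside any flow.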

Divergent patterns

Real adoption is concentrated in pixecog (7 notebooks across 4 flows). Other projects have the scaffolding but mostly empty .src files.
Cache discipline: only pixecog leaks __marimo__/session/ into the repo root — should be in .gitignore.

Canonical teaching artifact

pixecog/code/pipelines/spectrogram_burst/notebooks/ — notebook.yml + a real marimo .py notebook in explore/.src/. Single concrete example of (a) the kind-aware notebook layout, (b) a real exploration notebook checked into the repo, (c) pipeio_nb_* discovery.

Honest gap

Substantial marimo authoring exists in only one project (pixecog); gecog has two explore notebooks in a single flow, and msol has placeholders. The workshop can teach authoring in pixecog, but the second role for marimo (handbook explorables via marimo export html-wasm) has zero examples in the surveyed projects. That is a deliberate handbook target rather than a current artifact: handbook chapter 1 can include the first real explorable as part of writing the chapter.

Component 5: Quarto / MkDocs

What it does

Both are static-site generators that turn markdown/Quarto-markdown into HTML, but with different defaults: MkDocs (with the Material theme) is optimized for documentation sites with navigation, search, and MkDocs plugins (bibtex, monorepo, ezlinks); Quarto unifies markdown + executable code + multiple output formats (website, book, revealjs slides, PDF) under one source. Workshop framing: MkDocs for the handbook (a docs site), Quarto for the workshop (a multi-output course package where the same source feeds website + book + slides + executable notebooks).

How projio is aware of it

site.framework in .projio/config.yml (mkdocs | sphinx) plus output_dir. No quarto value yet in the framework enum — Quarto enters via .projio/render/quarto.yml for individual deliverables.
MCP tools: site_build, site_serve, site_deploy, site_list, site_detect. These dispatch to MkDocs or Sphinx based on site.framework.
pipeio_mkdocs_nav_patch — patches mkdocs.yml nav from collected flow docs. No equivalent for Quarto navigation yet.
docs/specs/quarto-reports.md + .projio/render/quarto.yml — the Quarto-for-deliverables convention (per-report .qmd files rendered to HTML/PDF, separate from the site nav).
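The dispatch on site.framework described in the list above can be sketched as a small function. This is an illustration under stated assumptions, not projio's site_build code: the function name is hypothetical, the `source_dir` key is invented for the Sphinx case, and the defaults are placeholders. The `mkdocs build --site-dir` and `sphinx-build -b html` invocations are the standard CLIs of those tools.

```python
def site_build_command(site: dict) -> list:
    """Map the `site` block of a config to a build command."""
    framework = site.get("framework", "mkdocs")
    output_dir = site.get("output_dir", "site")
    if framework == "mkdocs":
        return ["mkdocs", "build", "--site-dir", output_dir]
    if framework == "sphinx":
        # `source_dir` is an assumed key; it is not named in the survey.
        source_dir = site.get("source_dir", "docs")
        return ["sphinx-build", "-b", "html", source_dir, output_dir]
    # Mirrors the survey's note that the enum has no quarto value yet.
    raise ValueError(f"unsupported site.framework: {framework!r}")


mkdocs_cmd = site_build_command({"framework": "mkdocs", "output_dir": "site"})
sphinx_cmd = site_build_command({"framework": "sphinx", "output_dir": "docs/build/html"})
print(mkdocs_cmd)
print(sphinx_cmd)
```

Raising on an unknown value makes the missing Quarto branch explicit instead of silently falling back to MkDocs.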

Adoption across projects

Project	MkDocs	Sphinx	Quarto	Variation worth teaching
projio	yes (`theme.name: material`)	no	yes (deliverables only — `.projio/render/quarto.yml`)	only project with both MkDocs and Quarto in the same repo
cogpy	yes (material)	yes (`docs/build/html`, `Makefile` uses `sphinx-build`)	no	only project on Sphinx (legacy, kept)
pixecog	yes (material)	no	no	mkdocs.yml + Makefile splits `serve` and `serve_live` (mike-style versioning)
gecog	yes (material)	no	no	minimal mkdocs.yml; Makefile defers to `.projio/projio.mk`
msol	yes (material)	no	no	most plugin-rich mkdocs (search + monorepo + ezlinks + bibtex)

Convergent patterns

mkdocs-material is the universal documentation framework across all 5 projects.
All study projects' Makefiles include .projio/projio.mk for the generated docs targets.

Divergent patterns

Sphinx outlier: cogpy alone runs Sphinx (its API docs were pre-existing when it joined the projio ecosystem). Workshop honest framing: projio doesn't impose a docs framework.
Quarto split: projio uses Quarto only for deliverables (.qmd reports) under docs/deliverables/reports/, not for the site itself. The workshop introduces Quarto as the workshop's own publication framework, not as a projio convention to push into other projects.
Plugin density: msol's mkdocs.yml has 4 plugins (search, monorepo, ezlinks, bibtex); gecog's is minimal. Worth teaching that mkdocs-material is a bare framework — projects choose how much to decorate.

Canonical teaching artifact

projio/.projio/render/quarto.yml + one docs/deliverables/reports/*.qmd report. Single artifact that shows the projio Quarto-for-deliverables convention. For MkDocs, pixecog/mkdocs.yml is the strongest example — material theme, real navigation, Makefile-wired build.

Honest gap

The handbook (mkdocs-material) and the workshop (Quarto project) are different surfaces with different generators, and projio has no convention for cross-linking from one to the other yet. The architecture decision in the source idea note (handbook in projio, workshop in teaching/agentic-workshop/) sidesteps this by separation — worth surfacing as a deliberate design choice in chapter 1, not as an unsolved problem.

Component 6: projio

What it does

projio is the project-aware layer that knows about BIDS conventions, DataLad subdatasets, Snakemake flows, marimo notebooks, and the docs framework — and exposes that knowledge through MCP tools so a Claude Code agent can query "what flows are registered?", "what does this mod produce?", "where does this paper live?", "what notes did the agent write last week?", without inventing a path scheme. Workshop framing: the stack already does the work; projio makes the stack queryable.

How projio is aware of itself

projio's six subsystems each manage a knowledge domain:

Subsystem	Domain	Awareness
indexio	retrieval	corpus indexing, chunking, embedding, RAG
biblio	literature	bibliography, citekey resolution, paper context, docling/grobid
notio	notes + manuscripts	structured notes, manuscript assembly
codio	code intelligence	library registry with `role: core/shared/external`
pipeio	pipelines	flow/mod/rule/notebook/config tooling
figio	figures	declarative figure orchestration (FigureSpec YAML)

Adoption across projects

Project	indexio	biblio	notio	codio	pipeio	figio	Variation
projio	active (2 corpora, 1.3k+75k chunks)	active	active	active (1 first-party + 14 mirrors)	disabled (`enabled: false`)	minimal (1 example)	tool repo — pipeio off intentionally
cogpy	active	active (full `bib/`)	active (10 log subdirs)	active (~40 external mirrors)	not enabled	absent	strongest codio external-mirror catalog
pixecog	active	active	active (9 log subdirs)	active (cogpy/labbox/labpy as `code/lib/*`)	active (16 flows)	dir-only, figs ad-hoc	most flows; densest subsystem usage
gecog	active	active	active (5 log subdirs)	active (cogpy/labpy as `code/lib/*`)	active (8 flows)	1 first-party FigureSpec	only first-party FigureSpec in the cohort
msol	active	`enabled: false` (drift)	`enabled: false` (drift)	active (database_io/ratcave)	active (3 flows)	absent	only project with two subsystems flagged off

Convergent patterns

All study projects use code/{lib,pipelines,utils}/ per the code-tiers spec.
All study projects keep docs/log/{idea,issue,task,result,meeting}/ with index.md (notio).
All study projects have bib/ populated (even msol where biblio is flagged off in config).

Divergent patterns

enabled: flag drift: msol declares biblio + notio off while using both. projio doesn't auto-reconcile.
figio adoption: 1 first-party FigureSpec across all 5 projects (gecog's May-02 cohort). Most figures are ad-hoc <report>-figs/.
Code tier presence: cogpy is src/cogpy/<area>/ flat (no code/lib/) — it's a library, not a study; the tier convention doesn't apply.

Graded introduction sequence (workshop)

The handbook + workshop introduce projio gradually: BIDS + DataLad + Snakemake + Marimo + Quarto first (workshop day 1 + early day 2), then projio enters as the layer that knows about those conventions. Each stage is motivated by a pain that the prior stage exposes.

Stage	What's added	Workshop session that introduces it	Stack-awareness payoff
0	BIDS root + DataLad super/subdatasets + one snakebids flow + marimo explore notebook + mkdocs site, no projio	Day 1 (full day)	Establish the bare stack works without projio. Pain that motivates next stage: 12 derivatives dirs and no record of which Snakemake config produced each.
1	`projio sync` + `.projio/config.yml` + `notio` (`docs/log/` + indexes)	Day 2 AM	Project memory: dated tasks/results/ideas under `docs/log/`, navigable via MCP `note_search`. Pain solved: "why was this run produced and by whom?" Pain remaining: cross-flow paths still hand-constructed.
2	`pipeio` flow registry + `BidsPaths` adapter + `manifest.yml` contract	Day 2 PM	One MCP tool call (`pipeio_flow_list`, `pipeio_target_paths`) returns flow inventory and resolved BIDS output paths. Pain solved: agents and humans stop hand-constructing BIDS wildcards. Pain remaining: paper claims drift from data.
3	`biblio` + `indexio` (paper ingest, docling extraction, RAG)	Day 3 AM	Citekey-based references resolvable via MCP; full-text search across project corpus. Pain solved: claims back-link to extracted paper text. Pain remaining: figures still ad-hoc per report.
4	`figio` FigureSpec + `manuscript` subsystem	Day 3 PM	Composable figure specs (one panel-per-rule), auditable `manuscript_cite_check`. Pain solved: one figure spec → multiple outputs (PDF/PNG/SVG).
5	`codio` library catalog with `role: core` + `agent_instructions()` discovery	Day 3 PM (continues)	Agent-discoverable code reuse — "what library can do XYZ?" → MCP `codio_discover`. Pain solved: agent doesn't reinvent primitives.

The point of the gradient: each subsystem earns its complexity by solving a problem the previous stage exposed. A workshop participant who stops at stage 1 still has a working project; each later stage is additive, not foundational.

Canonical teaching artifact

pixecog/.projio/pipeio/registry.yml (16 flows) + pixecog/.projio/config.yml + the code/pipelines/lfp_extrema/ flow. Together they show: how projio discovers flows (registry.yml), how it configures itself (.projio/config.yml), and what a real flow looks like under the convention.

Honest gap

subsystems.<name>.enabled flags can lag actual on-disk usage, and projio doesn't auto-detect that drift (msol is the running example). Likewise figio is mostly aspirational across all five projects: one first-party FigureSpec exists. The handbook should be honest about both — the convention exists; uniform adoption does not.
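The flag drift described above is cheap to detect. The sketch below is one possible check, not an existing projio feature: the function `find_flag_drift` is hypothetical, and the mapping from subsystem to "evidence" directory is an assumption drawn from the directory conventions in this survey (bib/ for biblio, docs/log/ for notio, code/pipelines/ for pipeio).

```python
from pathlib import Path
import tempfile

# Assumed mapping from subsystem name to a directory that signals real use.
EVIDENCE = {"biblio": "bib", "notio": "docs/log", "pipeio": "code/pipelines"}


def find_flag_drift(project_root: Path, subsystems: dict) -> list:
    """List subsystems flagged off whose evidence directory is non-empty."""
    drifted = []
    for name, rel in EVIDENCE.items():
        # A missing entry is treated as enabled, so it can never drift here.
        enabled = subsystems.get(name, {}).get("enabled", True)
        evidence = project_root / rel
        in_use = evidence.is_dir() and any(evidence.iterdir())
        if not enabled and in_use:
            drifted.append(name)
    return drifted


# Demo: biblio is flagged off but bib/ has content; notio is off and unused.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "bib").mkdir()
    (root / "bib" / "refs.bib").write_text("")
    flags = {"biblio": {"enabled": False}, "notio": {"enabled": False}}
    drift = find_flag_drift(root, flags)

print(drift)
```

A check of this kind could run as part of a periodic reconciliation step and report drift rather than change any flag on its own.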

Component 7: Agentic on top

What it does

The "agentic on top" layer is the set of conventions and configurations that let a Claude Code agent operate on a projio-aware project: which MCP servers it talks to, which Bash commands and read paths are pre-approved (.claude/settings.json), which prompt-based skills are discoverable (.projio/skills/), and how captures/notes flow into dispatched tasks (worklog). Workshop framing: the agent is a collaborator with bounded permissions, structured context, and routed work — not an opaque chatbot.

How projio is aware of it

projio init scaffolds .claude/settings.json (with mcp__projio__* pre-approved) + .mcp.json (projio + worklog + sirocampus baseline). Both files gitignored via the # >>> projio >>> block.
agent_instructions() MCP tool discovers skills from src/projio/data/skills/ (ecosystem) + .projio/skills/<name>/SKILL.md (project-local; project overrides ecosystem).
skill_read(name) returns a skill body — Claude Code surfaces these as slash commands.
permissions_sync() reconciles project skill listings.
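The "project overrides ecosystem" rule for skill discovery can be sketched as a two-pass directory scan. This is a hedged illustration rather than the code behind agent_instructions(): the function name is hypothetical, and it assumes ecosystem skills use the same `<name>/SKILL.md` layout that the survey describes for project-local skills.

```python
from pathlib import Path
import tempfile


def discover_skills(ecosystem_dir: Path, project_dir: Path) -> dict:
    """Map skill name to SKILL.md path; project-local entries win on clashes."""
    skills = {}
    for base in (ecosystem_dir, project_dir):  # the later base overrides
        if not base.is_dir():
            continue
        for skill_file in sorted(base.glob("*/SKILL.md")):
            skills[skill_file.parent.name] = skill_file
    return skills


# Demo tree: "figio-guide" exists in both places, "rag-query" only upstream.
with tempfile.TemporaryDirectory() as tmp:
    eco, proj = Path(tmp) / "ecosystem", Path(tmp) / "project"
    for base, names in ((eco, ["figio-guide", "rag-query"]), (proj, ["figio-guide"])):
        for name in names:
            (base / name).mkdir(parents=True)
            (base / name / "SKILL.md").write_text(f"{base.name}:{name}\n")
    found = discover_skills(eco, proj)
    sources = {name: path.read_text().split(":")[0] for name, path in found.items()}

print(sources)
```

Scanning the ecosystem directory first and the project directory second is what makes the override fall out of ordinary dict assignment.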

Agentic-on-top inventory

`.claude/settings.json` permission shape

Project	MCP servers pre-approved	Distinctive Bash	Distinctive Read paths
projio	`mcp__projio__*`, `mcp__sirocampus__*`, `mcp__worklog__*`	`git/python/pip/pytest/make` baseline	`/storage2/arash/*`, 14 `.projio/codio/mirrors/`
cogpy	`mcp__projio__*`, `mcp__sirocampus__*`, `mcp__worklog__*`	typing/lint stack: `mypy/ruff/black/isort/tox/nox/coverage/jupyter/ipython`	`/storage/share/codelib/*` (~17 lab-shared mirrors)
pixecog	`mcp__projio__*`, `mcp__sirocampus__*`, `mcp__worklog__*` + `WebSearch`	`pixi search *`, `ssh gamma{1..4} uptime`, `ssh gpu/spikesort/beta/theta uptime`, `bash /home/arash/.claude/skills/marimo-pair/scripts/discover-servers.sh`	three subdataset trees `code/lib/{cogpy,labbox,labpy}`
gecog	`mcp__projio__*`, `mcp__sirocampus__*`, `mcp__worklog__*` + `mcp__cogpy__*`	minimal Bash baseline	`code/lib/{cogpy,labpy}`
msol	`mcp__projio__*`, `mcp__sirocampus__*`, `mcp__worklog__*`	`pixi run -e analysis marimo *`, `pixi run marimo *`, `pixi search *`, `pixi install *`, marimo-pair scripts	`code/lib/{database_io,ratcave}`

Tool wildcards (mcp__projio__*) are universal. Explicit Bash command patterns vary by project domain (typing stack for cogpy, multi-host SSH for pixecog, pixi+marimo for msol).

`.mcp.json` server set

Project	projio MCP rooted at	Other servers	Notable
projio	this repo	sirocampus + worklog	three-server baseline
cogpy	this repo	sirocampus + worklog	baseline
pixecog	this repo	sirocampus + worklog	baseline
gecog	this repo	cogpy + sirocampus + worklog	unique fourth server (`cogpy` MCP exposes the cogpy library's projio tools as a sibling project)
msol	this repo	sirocampus + worklog	baseline

All servers run python -m projio.mcp.server via /storage/share/python/environments/Anaconda3/envs/rag/bin/python — a single env hosts every MCP server in the ecosystem.

Skills

Project-local (.projio/skills/<name>/):
projio: figio-guide, projio-setup (2)
cogpy: cogpy-dev (1)
pixecog: pixecog-flow-setup (1)
gecog: 0
msol: 0
User-level (.claude/skills/):
projio: gitnexus (1)
everywhere else: directory does not exist
Ecosystem skills (always available via agent_instructions()): ~25 skills from src/projio/data/skills/ — figio-guide, pipeio-guide, marimo-session, marimo-pair, pipeio-nb-extract, progress-report-deck, literature-presentation, idea-capture, questio-* (7), biblio-batch-curate, codelib-discovery, mcp-tool-scaffold, rag-query, etc.

Captures → tasks pattern (worklog)

Across projects: notes captured via worklog_note(text, project_id) land in docs/log/idea/ or docs/log/issue/ per kind. Tasks promoted via promote_to_task(source) land in docs/log/task/<task>.md with status: scheduled, then are dispatched via worklog_note(..., auto_dispatch=True, model="opus|sonnet") or schedule_queue(after=...) for dependency-based chains.

This very session is itself a teaching artifact: the source idea note (docs/log/idea/idea-arash-20260507-221835-382557.md) → three task notes (task-arash-20260508-160000-200001.md etc.) → this result note all live in projio's docs/log/ and trace a real planning loop.

Hooks

No hooks key in any of the five projects' .claude/settings.json. Permissions are per-project; cross-cutting automation (Stop / PreToolUse) is currently absent across the cohort. The workshop and handbook can introduce hooks as "advanced" rather than as baseline practice.

Routing (model + host)

Model selection follows the worklog-MCP convention: opus for synthesis (multi-package, architectural), sonnet for execution (single-module, clear scope), haiku for triage (trivial/typo). auto_dispatch=True defaults to opus.
Host routing is encoded only in pixecog's .claude/settings.json via the ssh gamma{1..4}/gpu/spikesort/beta/theta uptime allow-list — that's where multi-host work happens. The other projects don't expose cross-host SSH in their permission set.

Adoption summary table

Project	hooks	project skills	extra MCP servers	distinctive permission shape
projio	none	2 + user-level `gitnexus`	none beyond baseline	mirror Read globs
cogpy	none	1 (`cogpy-dev`)	none beyond baseline	typing/lint Bash
pixecog	none	1 (`pixecog-flow-setup`)	none beyond baseline	multi-host SSH + pixi + marimo-pair
gecog	none	0	cogpy MCP	minimal — cleanest "default" project
msol	none	0	none beyond baseline	pixi-env-named marimo Bash

Convergent patterns

Three-server MCP baseline (projio + worklog + sirocampus) is universal.
mcp__<server>__* wildcards must appear in both permissions.allow and allowedTools for auto-approval (per the user's feedback_mcp_permissions.md memory). Every surveyed project follows this.
Read access to /storage2/arash/** and /storage/share/sirocampus/** is universal — the cross-project read substrate.
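Because the two-slot wildcard rule is enforced socially, a small check can confirm it for a given settings file. The sketch below assumes `allowedTools` is a top-level list next to `permissions.allow`, which is how this survey names the two slots; the function name and the sample settings are hypothetical.

```python
def missing_wildcards(settings: dict, servers: list) -> dict:
    """For each slot, list the mcp__<server>__* wildcards that are absent."""
    allow = set(settings.get("permissions", {}).get("allow", []))
    allowed_tools = set(settings.get("allowedTools", []))
    wanted = [f"mcp__{server}__*" for server in servers]
    return {
        "permissions.allow": [w for w in wanted if w not in allow],
        "allowedTools": [w for w in wanted if w not in allowed_tools],
    }


# Invented settings with one wildcard deliberately missing from one slot.
SETTINGS = {
    "permissions": {
        "allow": ["mcp__projio__*", "mcp__worklog__*", "mcp__sirocampus__*"]
    },
    "allowedTools": ["mcp__projio__*", "mcp__worklog__*"],
}

gaps = missing_wildcards(SETTINGS, ["projio", "worklog", "sirocampus"])
print(gaps)
```

In practice the `settings` dict would come from `json.load` on a project's `.claude/settings.json`; an empty list in both slots means the convention holds for that project.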

Divergent patterns

Project-local skills: present in 3/5 projects, absent in 2/5 (gecog, msol). Skill authoring is real but not yet baseline.
Bash permission shape mirrors project domain: cogpy = library hygiene; pixecog = multi-host research compute; msol = pixi+marimo workflow; gecog = minimal.
Fourth MCP server: gecog uniquely wires cogpy as a per-project MCP server, exposing the cogpy library's projio tools alongside its own — the only example of a non-baseline projio MCP server in the cohort.

Canonical teaching artifact

pixecog/.claude/settings.json + pixecog/.mcp.json + the docs/log/idea/ → docs/log/task/ → docs/log/result/ chain for any recent multi-step initiative. One bundle that shows: tool wildcards, explicit Bash command patterns for multi-host SSH, marimo-pair script allow-listing, and the captured-to-dispatched-to-result flow as auditable artifacts.

Honest gap

Hooks are unused (zero across the cohort). Project-local skills are present in only 3/5 projects. The "agent operates with bounded permissions and structured context" pattern is real but not uniformly adopted. Workshop should introduce skills + hooks as graduated adoption rather than as baseline.

Universal patterns across the stack

These hold across every component that's adopted by a project:

The repository is the unit of knowledge — everything (data, code, papers, notes, deliverables, configs, agent settings) lives in or alongside one git/datalad superdataset. No separate "knowledge base" elsewhere.
Conventions over configuration: BIDS for data, DataLad for versioning, code/{lib,pipelines,utils}/ for code, docs/log/{idea,task,result,...}/ for notio, derivatives/<flow>/ for outputs, .projio/ for tool state. Each tool surfaces its own conventions so that an agent or a new collaborator can navigate a project they have never seen.
MCP wildcards in both permission slots — every project's .claude/settings.json lists mcp__<server>__* in both permissions.allow and allowedTools. This is non-obvious from Claude Code defaults and is enforced socially.
Subsystem-disable as a config flag, not a deletion — turning off pipeio (projio) or biblio/notio (msol) in .projio/config.yml prevents the subsystem from being invoked but leaves the on-disk state alone. Drift between flag and state is possible.
Snakemake outputs land under derivatives/<flow>/ and are registered as DataLad subdatasets (in pixecog and gecog; msol is midway through adopting the convention).
Cross-flow contract is manifest.yml + BidsPaths in the electrophysiology projects — Snakemake's input/output alone is not the integration glue.
Three MCP servers minimum: projio (rooted at the project) + sirocampus + worklog. All run from one shared rag conda env.
projio sync is the periodic reconciliation step — auto-detects code/lib/*/, generates .projio/projio.mk, copies CSL/Lua filters, regenerates skill index. Without it, drift accumulates.
Notes are the audit trail — docs/log/{idea,task,result,...}/ with daily/weekly indexes captures the why of every dispatched task and every produced result. The chain itself is navigable knowledge, not a side-effect of work.
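
The flag-versus-state drift mentioned in the subsystem-disable pattern can be detected mechanically. The following is a minimal sketch, not a projio feature: it assumes the config has been parsed into a dict with the `subsystems.<name>.enabled` shape, that a missing flag means enabled, and that the caller supplies a mapping from subsystem name to its state directory (the `bib` directory in the demo is hypothetical).

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def find_flag_drift(project_root: Path, config: dict, state_dirs: Dict[str, str]) -> List[str]:
    """Return subsystems that are disabled in config but still have on-disk state."""
    subsystems = config.get("subsystems", {})
    drifted = []
    for name, rel_dir in state_dirs.items():
        # Assumption: an absent flag counts as enabled, so it cannot drift.
        enabled = subsystems.get(name, {}).get("enabled", True)
        state = project_root / rel_dir
        has_state = state.is_dir() and any(state.iterdir())
        if not enabled and has_state:
            drifted.append(name)
    return sorted(drifted)


def demo() -> List[str]:
    """Build a throwaway project where notio is off but docs/log/ is populated."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "docs" / "log" / "idea").mkdir(parents=True)
        (root / "bib").mkdir()  # hypothetical biblio directory, left empty
        config = {"subsystems": {"notio": {"enabled": False}, "biblio": {"enabled": False}}}
        return find_flag_drift(root, config, {"notio": "docs/log", "biblio": "bib"})
```

In the demo only notio is reported, because its directory has content while the flag is off; the empty biblio directory does not count as state.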

Recommended teaching artifacts (one per component, 7 total)

Ranked by leverage (impact × concreteness × low explanatory friction) within each component. Each entry maps to a specific workshop session and a specific handbook chapter.

#	Component	Artifact	Project	Anchors
1	BIDS	`pixecog/raw/` (strict BIDS) + `pixecog/derivatives/preprocess_ieeg/manifest.yml` (soft-form derivative root)	pixecog	Workshop Day 1 AM; Handbook ch. "BIDS in practice"
2	DataLad	`gecog/.gitmodules` (9 entries, single-RIA-store layout)	gecog	Workshop Day 1 AM; Handbook ch. "DataLad as a coherent subdataset graph"
3	Snakemake	`pixecog/code/pipelines/lfp_extrema/Snakefile` + `config.yml` (registry-extension pattern)	pixecog	Workshop Day 1 PM; Handbook ch. "config-driven pipelines"
4	Marimo	`pixecog/code/pipelines/spectrogram_burst/notebooks/` (real `.src/explore_*.py` + `notebook.yml`)	pixecog	Workshop Day 2 AM; Handbook ch. "reactive notebooks for analysis and explorables"
5	Quarto / MkDocs	`projio/.projio/render/quarto.yml` + `pixecog/mkdocs.yml` (the two surfaces)	projio + pixecog	Pre-workshop setup (build the site once on day 0); Handbook ch. "publication framework"
6	projio	`pixecog/.projio/pipeio/registry.yml` (16 flows) + `pixecog/.projio/config.yml` + `pixecog/code/pipelines/lfp_extrema/`	pixecog	Workshop Day 2 PM + Day 3 AM; Handbook chs. "the project as a queryable knowledge environment" + "the pipeio subsystem"
7	Agentic on top	`pixecog/.claude/settings.json` + `pixecog/.mcp.json` + a recent `docs/log/idea → task → result` chain (gecog mlclassifier or pixecog detection_qc)	pixecog (settings) + gecog (chain)	Workshop Day 2 AM intro + Day 3 PM; Handbook ch. "the iterative loop"

Notes on the shortlist:

pixecog dominates the artifact list (it appears in 6 of 7 entries, in 4 of them as the sole project) because it has the densest stack adoption, which makes it the strongest single project to dissect for a workshop.
gecog provides the agentic chain because the brainstate.mlclassifier arc (Apr 29 – May 6) is the cleanest, most narrative idea → task → result loop in any project's docs/log/.
msol provides domain diversity for the "this generalizes beyond electrophysiology" slot on workshop Day 1, but is not the canonical artifact for any single component (it under-adopts DataLad subdatasets and Marimo, and does not use snakebids/BidsPaths).
cogpy provides the legacy snakebids reference (src/cogpy/workflows/preprocess/Snakefile) but is not on the canonical list — workshop introduces it as "before-pipeio" backdrop, not as the artifact to dissect.

Honest gaps

Consolidated from the seven component-level gaps, plus one cross-cutting gap. Each gap is stated first, followed by what the handbook or workshop should do about it.

Derivative roots aren't BIDS-valid (no per-derivative dataset_description.json). The workshop teaches manifest.yml as a projio convention layered on BIDS, not as BIDS itself. Open question for a future projio convention iteration: should pipeio_flow_new emit a derivative dataset_description.json?
Subdataset-per-derivative is socially enforced, not automatic. msol shows the consequence: a study can be DataLad-initialized while most of the directory tree is not actually subdatasetted. The handbook chapter on DataLad should teach the convention as a deliberate choice at flow-creation time, with pipeio_flow_new prompting the user to register the subdataset.
Three Snakemake idioms coexist (snakebids alone in cogpy, snakebids + BidsPaths in pixecog/gecog, plain snakemake in msol). The workshop picks snakebids + BidsPaths as default and labels the others as legacy / minimal-ceremony variants — no attempt to teach all three.
Marimo's second role (handbook explorables) has zero examples yet in any surveyed project. The first explorable is a deliberate handbook target, not an existing artifact to dissect. Plan the first explorable into chapter 1's writing schedule, not into pre-workshop reading.
Quarto and MkDocs don't cross-link between the handbook surface and the workshop surface. The architecture decision in the source note (handbook in projio, workshop in teaching/agentic-workshop/) sidesteps this by separation, but the user-facing path between the two surfaces is currently "navigate the URL bar." Worth surfacing as a deliberate choice in chapter 1, not as an unsolved problem.
subsystems.<name>.enabled flags drift from on-disk reality (msol has biblio + notio off while using both). projio doesn't auto-reconcile. The workshop advises periodic projio sync; the handbook should mention the drift exists.
Hooks are unused across all 5 projects' .claude/settings.json, and project-local skills are present in only 3/5. The "agent operates with bounded permissions and structured context" pattern is real but not uniformly adopted. Introduce skills + hooks as graduated adoption rather than baseline.
Cross-cutting: every project in this set has one author (Arash). The single-author fragility named in the source idea note (Quantomatic precedent) applies to every component of the stack as projio uses it. The handbook + workshop are the docs + examples + community legs of the survival strategy. Name this as motivation in chapter 1, not as deflection.
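
To make the first gap concrete, here is a sketch of what emitting a per-derivative dataset_description.json could look like. It is not something pipeio_flow_new does today; the helper names, the `"<flow> derivatives"` naming, and the default `BIDSVersion` value are assumptions. The fields follow the BIDS rules for derivative datasets (`DatasetType` set to `"derivative"` and a `GeneratedBy` list whose entries carry a `Name`).

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def derivative_description(flow: str, bids_version: str = "1.8.0",
                           flow_version: Optional[str] = None) -> dict:
    """Build a minimal dataset_description for a derivative root.

    The bids_version default is only an example value; pass the
    version the raw dataset actually targets.
    """
    generated_by = {"Name": flow}
    if flow_version is not None:
        generated_by["Version"] = flow_version
    return {
        "Name": f"{flow} derivatives",
        "BIDSVersion": bids_version,
        "DatasetType": "derivative",
        "GeneratedBy": [generated_by],
    }


def ensure_description(derivative_root: Path, flow: str) -> Path:
    """Write dataset_description.json if it is missing; never overwrite."""
    target = derivative_root / "dataset_description.json"
    if not target.exists():
        derivative_root.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(derivative_description(flow), indent=2) + "\n")
    return target
```

Leaving an existing file untouched keeps the helper safe to run from a flow-creation step without clobbering hand-edited metadata.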

Method

Read the prior result-arash-20260508-tool-use-survey.md once for paths and tool names; framed new prose around the seven stack components, not subsystems.
Used mcp__worklog__worklog_read_file to read each project's .claude/settings.json, .mcp.json, .projio/config.yml, pixi.toml, Makefile, .gitmodules, .datalad/config, .projio/pipeio/registry.yml, one example Snakefile per project, and one example notebook.yml per project.
Used a general-purpose agent in parallel to gather counts, file inventories, and the marimo / __marimo__ cache scan across all five projects.
Cross-checked findings against the prior tool-use survey for paths and adoption counts; this survey is consistent with that one but reframes the same evidence on the stack axis.
No source projects modified.
