Stack-axis survey: BIDS / DataLad / Snakemake / Marimo / Quarto+MkDocs / projio / agentic across study projects
Purpose
Empirical input for the handbook + 4-day workshop on the open-science stack
(BIDS, DataLad, Snakemake, Marimo, Quarto/MkDocs) plus the projio + agentic
layer that wraps them. This survey slices horizontally across the stack:
for each component, "how does it show up in projio's project-aware system,
and how does its adoption vary across projio + the four study projects
(cogpy, pixecog, gecog, msol)?"
Read-only across all projects. Companion / superseded by the per-project tool-use survey — that one is organized per-project on the analysis-substance axis; this one is organized per-component on the stack axis.
Workshop-audience framing throughout: each component opens with a one-paragraph operational definition that could appear verbatim in pre-workshop reading.
Component 1: BIDS
What it does
BIDS (Brain Imaging Data Structure) is a filesystem convention for organizing
neuroscience datasets so that any tool, person, or script knows where to find
subjects, sessions, runs, modalities, and metadata without per-dataset
configuration. Workshop framing: the directory layout is the API. A
BIDS-valid raw/ lets you point any analysis at sub-01/ses-pre/... without
inventing a project-specific path scheme each time.
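As an illustration of "the directory layout is the API", the sketch below composes a BIDS-style path from entities. The helper name `bids_path` and the example entity values are assumptions made for this illustration; it is not part of projio or snakebids, and it appends extra entities in caller order rather than enforcing the ordering the BIDS specification prescribes.

```python
from pathlib import PurePosixPath


def bids_path(root, subject, datatype, suffix, extension, session=None, **entities):
    """Compose a BIDS-style path from entities (hypothetical helper, illustration only)."""
    # Directory part: sub-<label>/[ses-<label>/]<datatype>/
    parts = [f"sub-{subject}"]
    if session is not None:
        parts.append(f"ses-{session}")
    parts.append(datatype)

    # Filename part: key-value pairs joined by underscores, then suffix + extension.
    name_entities = [f"sub-{subject}"]
    if session is not None:
        name_entities.append(f"ses-{session}")
    # Extra entities (task, run, ...) are appended in the order the caller gives them.
    name_entities += [f"{key}-{value}" for key, value in entities.items()]
    filename = "_".join(name_entities + [suffix]) + extension

    return PurePosixPath(root, *parts, filename)


path = bids_path("raw", "01", "ieeg", "ieeg", ".edf", session="pre", task="rest")
print(path)  # raw/sub-01/ses-pre/ieeg/sub-01_ses-pre_task-rest_ieeg.edf
```

In a real flow this job would normally be delegated to snakebids or a BIDS library; the point here is only that the path is fully determined by the entities.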
How projio is aware of it
- pipeio.adapters.bids.BidsPaths: pipeio's BIDS adapter (in packages/pipeio/src/pipeio/adapters/bids.py). Wraps a flow's input/output registries against a BIDS root and snakebids generate_inputs() output.
- raw/registry.yml + per-flow manifest.yml: a convention layered on top of BIDS to carry cross-flow contract information BIDS itself doesn't express (e.g. which channel labels are good after a calibration step).
- pipeio_target_paths(flow, group, member): MCP tool that resolves output paths from BIDS wildcards without the agent having to construct paths by hand.
Adoption across projects
| Project | Adopted? | How / signature artifact | Variation worth teaching |
|---|---|---|---|
| projio | n/a | tool repo, no dataset | — (workshop note: stack tools don't themselves consume BIDS) |
| cogpy | n/a | library repo, no dataset | snakebids appears in source code (src/cogpy/workflows/preprocess/Snakefile) without a real BIDS dataset |
| pixecog | yes (strict) | raw/{participants.tsv, dataset_description.json, sub-01..05, sub-test, tasks.json, registry.yml} + 18 derivative roots; raw/sourcedata/ separate | 18 derivative roots + 3 manifest-emitting flows = densest BIDS in the cohort |
| gecog | yes (strict) | raw/{participants.tsv, dataset_description.json, sub-01..12, sourcedata/, registry.yml, config.yml} + 7 derivative roots | derivative root with no dataset_description.json but with manifest.yml: soft-form derivative re-rooting |
| msol | yes | raw/{participants.tsv, participants.json, dataset_description.json, sub-01..06, cameras.{json,tsv}, task-*.json} + 4 derivative roots | first BIDS adoption outside electrophysiology (behavior + DLC); demonstrates BIDS-for-video |
Convergent patterns
- All three study projects use raw/ as the strict BIDS root with participants.tsv + dataset_description.json + sub-XX/.
- All three keep sourcedata/ outside the BIDS-validated tree (pixecog and msol use it as a symlink to a separate dataset directory).
- All three put per-flow output under derivatives/<flow>/ rather than in-place rewrites of raw/.
Divergent patterns
- Derivative-root strictness: pixecog and gecog keep derivatives/<flow>/ partial BIDS: manifest.yml carries channel/event metadata but no dataset_description.json is emitted, so derivatives aren't independently BIDS-valid. msol doesn't use the manifest pattern at all (its derivatives are plain output dirs).
- TTL-cleaning re-rooting: pixecog re-roots derivatives/preprocess_ieeg/ as if it were a new BIDS root for downstream flows (the bids_dir_ieeg config switch in lfp_extrema/config.yml). Neither gecog nor msol does this; pixecog is the one example of "derivative-of-derivative treated as a fresh BIDS root."
- Subject scale: pixecog 5 subjects, msol 6 subjects, gecog 12 subjects. This matters when choosing a workshop dataset (gecog largest, pixecog smallest).
Canonical teaching artifact
pixecog/raw/ + the derivatives/preprocess_ieeg/manifest.yml chain.
Single concrete pair: a strict BIDS root + a non-trivial derivative root
that has its own manifest.yml consumed by downstream flows. Lets the
workshop teach (a) BIDS strictness in raw/ and (b) the soft-form
derivative-rooting pattern as a deliberate departure from full BIDS in
the same artifact.
Honest gap
Derivatives in pixecog/gecog are not BIDS-valid (no
dataset_description.json per derivative root). The workshop should be
explicit that the manifest.yml pattern is a projio convention layered
on BIDS, not BIDS itself, and that there is an unresolved question
about whether derivative roots should also emit dataset_description.json
to be tool-portable beyond projio.
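The gap can be made visible with a small audit. The sketch below walks a derivatives directory and flags roots that carry manifest.yml but no dataset_description.json, which is the "soft-form" case described above. The function names and the placeholder flow names are assumptions for illustration; this is not a projio tool.

```python
import tempfile
from pathlib import Path


def audit_derivative_roots(derivatives_dir):
    """Report, per derivative root, whether the BIDS and manifest marker files exist."""
    report = {}
    for root in sorted(Path(derivatives_dir).iterdir()):
        if not root.is_dir():
            continue
        report[root.name] = {
            # BIDS expects this file in a derivative dataset that stands on its own.
            "dataset_description": (root / "dataset_description.json").is_file(),
            # The projio-side contract file described in this survey.
            "manifest": (root / "manifest.yml").is_file(),
        }
    return report


def soft_form_roots(report):
    """Roots that carry manifest.yml but no dataset_description.json."""
    return [
        name
        for name, flags in report.items()
        if flags["manifest"] and not flags["dataset_description"]
    ]


# Demo on a throwaway tree; flow_a and flow_b are placeholder names.
with tempfile.TemporaryDirectory() as tmp:
    strict, soft = Path(tmp, "flow_a"), Path(tmp, "flow_b")
    strict.mkdir()
    soft.mkdir()
    (strict / "dataset_description.json").write_text("{}")
    (strict / "manifest.yml").write_text("")
    (soft / "manifest.yml").write_text("")
    report = audit_derivative_roots(tmp)

print(soft_form_roots(report))  # ['flow_b']
```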
Component 2: DataLad
What it does
DataLad versions data and code together by combining git (for small text and
metadata) with git-annex (for large files), and by structuring repositories
as superdatasets with subdatasets mounted at chosen paths. Workshop
framing: git for everything, including the multi-gigabyte files and the
upstream library you depend on. Subdatasets let you pin a known commit of
code/lib/cogpy inside a study, and siblings let you push the same
dataset to GitHub and a RIA store and a GitLab pages target with one
command.
How projio is aware of it
- mcp__projio__datalad_* tools: datalad_save, datalad_status, datalad_push, datalad_pull, datalad_siblings. Wrap CLI invocations in the labpy conda env per the runtime convention.
- Sibling helpers in src/projio/helpers/ for GitHub, GitLab, and RIA sibling provisioning. All preview-first (require --yes to execute).
- projio sync auto-discovers code/lib/<name>/ subdatasets and registers them in codio with role=core.
- pipeio_flow_new scaffolds a flow whose derivatives/<flow>/ is expected to be registered as its own subdataset (the convention is enforced socially, not yet automatically).
Adoption across projects
| Project | Adopted? | How / signature artifact | Variation worth teaching |
|---|---|---|---|
| projio | yes (composition) | 6 ecosystem subdatasets under packages/{biblio,notio,indexio,codio,pipeio,figio} + 13 read-only mirrors under .projio/codio/mirrors/ | only project where subdatasets are the product, not just inputs |
| cogpy | minimal | no .gitmodules; .datalad/config only carries dataset id 493104a7-... | a library, used as a subdataset by others; shows the upstream side of the relationship |
| pixecog | maximal | .gitmodules lists ~25 entries: raw, 14 derivatives/*, bib, code/ECoGandNpix, 3 code/lib/*, 2 .projio/codio/mirrors/sirotalab--* | densest subdataset graph in the cohort; RIA URLs split across /storage/share/ (ecosystem libs) and /storage2/ria-store/ (study-specific) |
| gecog | medium | 9 entries: raw, 5 derivatives/*, code/lib/{cogpy,labpy} (no labbox) | clean RIA layout (/storage2/ria-store/alias/gecog-* for everything) |
| msol | minimal | only one entry: code/lib/ratcave (an external GitHub remote, not a RIA alias); no raw/derivatives registered as subdatasets | the outlier: DataLad initialized but most of the directory tree is not currently treated as subdatasets |
Convergent patterns
- Every project has a .datalad/config with a dataset id (DataLad-initialized).
- Every study project's code/lib/ mounts subdatasets for the cogpy + labpy + (sometimes) labbox + (msol-only) ratcave/database_io libraries.
- RIA store is the canonical sibling pattern when a sibling is registered (/storage2/ria-store/alias/<project>-<component> or /storage/share/git/ria-store/alias/<lib>).
Divergent patterns
- Subdataset coverage: pixecog and gecog register every flow's derivatives/<flow>/ as a subdataset; msol does not. Workshop has to pick one and call out the other as "this is the convention, msol is midway through adopting it."
- Aliased entries in pixecog's .gitmodules (derivatives/spectrogram and derivatives/spectrogram_burst both pointing at pixecog-spectrogram; manifest and manifest_assemble both at pixecog-manifest): evidence of in-flight rename, worth flagging but not worth teaching.
- External GitHub subdataset is unique to msol (code/lib/ratcave → github.com/ratcave/ratcave.git); the rest use RIA aliases throughout.
Canonical teaching artifact
gecog/.gitmodules — 9 entries, every URL /storage2/ria-store/alias/gecog-*,
clean separation of raw/, per-flow derivatives/*, and code/lib/*. The
cleanest demonstration of "DataLad as a coherent subdataset graph" without
the rename-aliasing noise pixecog carries.
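A .gitmodules file is plain text, so the subdataset graph can be inspected without DataLad itself. The following is a minimal sketch that parses the file and reports URLs shared by more than one path, which is the kind of aliasing noted for pixecog. The parser is a simplification written for this illustration, and the sample entries (including the `demo-*` alias names) are made up.

```python
import re
from collections import defaultdict


def parse_gitmodules(text):
    """Parse .gitmodules text into {submodule name: {"path": ..., "url": ...}}."""
    modules = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = re.fullmatch(r'\[submodule "(.+)"\]', line)
        if header:
            current = modules.setdefault(header.group(1), {})
        elif current is not None and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            current[key] = value
    return modules


def aliased_urls(modules):
    """Return {url: [paths]} for URLs referenced by more than one subdataset path."""
    by_url = defaultdict(list)
    for entry in modules.values():
        if "url" in entry and "path" in entry:
            by_url[entry["url"]].append(entry["path"])
    return {url: sorted(paths) for url, paths in by_url.items() if len(paths) > 1}


# Made-up sample in the shape the survey describes; the entries are assumptions.
SAMPLE = """\
[submodule "raw"]
    path = raw
    url = /storage2/ria-store/alias/demo-raw
[submodule "derivatives/a"]
    path = derivatives/a
    url = /storage2/ria-store/alias/demo-shared
[submodule "derivatives/b"]
    path = derivatives/b
    url = /storage2/ria-store/alias/demo-shared
"""

modules = parse_gitmodules(SAMPLE)
print(sorted(modules))        # ['derivatives/a', 'derivatives/b', 'raw']
print(aliased_urls(modules))  # one shared URL with two paths
```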
Honest gap
Sibling provisioning is preview-first in projio's helpers (good), but the subdataset-per-derivative convention is enforced socially. msol shows the consequence: a study can be DataLad-initialized but not actually use DataLad for the bulk of its content. The workshop and handbook should teach the convention as a deliberate choice the user has to make at flow-creation time, not as an automatic projio behavior.
Component 3: Snakemake
What it does
Snakemake is a Python-based pipeline framework that turns a graph of input
→ output dependencies into a DAG of jobs that can be parallelized,
re-run on staleness, and described declaratively in a Snakefile.
Workshop framing: write down what each step needs and what it produces;
Snakemake figures out the rest. The snakebids extension parameterizes
rules over BIDS subject/session/run wildcards.
How projio is aware of it
- packages/pipeio/: projio's pipeline subsystem. pipeio.adapters.bids.BidsPaths layers projio's per-flow registry/manifest convention on top of snakebids.
- .projio/pipeio/registry.yml: generated by pipeio_registry_scan, enumerates flows and their app_type (currently always snakemake).
- MCP tools: pipeio_flow_list, pipeio_flow_status, pipeio_flow_new, pipeio_run, pipeio_dag_export, pipeio_flow_report, pipeio_mod_*, pipeio_rule_*. ~50 tools total addressing flows by name (no path).
- Runner auto-detection (pixi.toml present → pixi run snakemake, else conda run -n <env> snakemake).
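The runner auto-detection rule in the last bullet is simple enough to sketch. The function below is an illustration of that rule only, not pipeio's actual implementation; the function name, the `extra_args` parameter, and the use of `labpy` as the example environment name are assumptions.

```python
import tempfile
from pathlib import Path


def snakemake_command(project_root, conda_env, extra_args=()):
    """Choose the Snakemake launcher prefix: pixi if pixi.toml exists, else conda."""
    if (Path(project_root) / "pixi.toml").is_file():
        prefix = ["pixi", "run", "snakemake"]
    else:
        prefix = ["conda", "run", "-n", conda_env, "snakemake"]
    return prefix + list(extra_args)


# Demo: the same project root before and after a pixi.toml appears.
with tempfile.TemporaryDirectory() as tmp:
    conda_cmd = snakemake_command(tmp, "labpy", ["--cores", "4"])
    (Path(tmp) / "pixi.toml").write_text("")
    pixi_cmd = snakemake_command(tmp, "labpy", ["--cores", "4"])

print(conda_cmd)  # ['conda', 'run', '-n', 'labpy', 'snakemake', '--cores', '4']
print(pixi_cmd)   # ['pixi', 'run', 'snakemake', '--cores', '4']
```

Returning an argument list rather than a shell string keeps the result safe to pass to `subprocess.run` without quoting concerns.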
Adoption across projects
| Project | Adopted? | How / signature artifact | Variation worth teaching |
|---|---|---|---|
| projio | disabled | subsystems.pipeio.enabled: false; flows: {} | tool repo deliberately does not run pipelines on itself |
| cogpy | legacy | src/cogpy/workflows/preprocess/Snakefile: pure snakebids (from snakebids import bids, generate_inputs), no pipeio adapter | shows the "before-pipeio" world; predates the project's adoption of code/pipelines/<flow>/ layout |
| pixecog | extensive | 16 flows in code/pipelines/, all app_type: snakemake; example Snakefile uses snakebids set_bids_spec("v0_0_0") and BidsPaths(in_reg, root, inputs) | most flows of the cohort; canonical example of snakebids + pipeio compose |
| gecog | medium | 8 flows in code/pipelines/; same snakebids + BidsPaths pattern | cleanest single-domain set (factor analysis + sleep spindle + travelling wave) |
| msol | minimal | 3 flows in code/pipelines/; plain snakemake: glob_wildcards() against flat path templates, no snakebids and no BidsPaths | demonstrates a working pipeline without the BIDS-aware adapters; useful for "Snakemake without ceremony" pedagogy |
Convergent patterns
- Every project that runs pipelines puts them under code/pipelines/<flow>/ with a Snakefile, config.yml, scripts/, and notebooks/.
- Every flow output goes under derivatives/<flow>/ (BIDS-aligned).
- Every study project's Makefile resolves SNAKEMAKE through pixi or conda env wrapping; none invoke snakemake bare.
Divergent patterns
- snakebids vs plain snakemake: cogpy = pure snakebids; pixecog + gecog = snakebids + pipeio's BidsPaths; msol = plain snakemake. Three styles in the same workflow ecosystem.
- Cross-flow contract: pixecog + gecog use manifest.yml written by upstream flows and read via BidsPaths(safe_load(...manifest.yml), ...) by downstream flows. cogpy is single-flow; msol's three flows do not cross-feed via manifests.
- Flow scale: pixecog 16 ≫ gecog 8 ≫ msol 3.
Canonical teaching artifact
pixecog/code/pipelines/lfp_extrema/Snakefile + config.yml —
the registry-extension pattern (the Snakefile programmatically extends
config['registry'] with one group per detection-tuple, then fans out
seven outputs per slow-wave detection). One artifact that shows
config-driven Snakemake, snakebids wildcards, the BidsPaths adapter,
and cross-flow manifest.yml consumption all at once.
Honest gap
Three Snakemake styles coexist in one ecosystem (snakebids alone, snakebids + BidsPaths, plain snakemake). The workshop should pick the snakebids + BidsPaths style as the default and explicitly position the others as predecessor (cogpy) and minimal-ceremony (msol) variants. Trying to teach all three styles in 4 days would dilute the message.
Component 4: Marimo
What it does
Marimo is a Python notebook format where the file is a .py file (no
JSON), cells form a reactive DAG (changing one cell automatically re-runs
its dependents), and the notebook can run as a script, as a server, or as
a static HTML/WASM bundle. Workshop framing: Jupyter-style narrative,
diff-friendly storage, and reactive-spreadsheet semantics — with no
hidden state. Marimo plays two distinct roles: (a) per-flow exploratory
notebooks, and (b) handbook explorables exported via marimo export
html-wasm.
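The "reactive DAG" idea can be shown with a toy model in plain Python. The sketch below is not marimo's implementation and uses none of its API; it only illustrates the rule that a cell re-runs when a variable it reads is defined by a cell that changed. The cell names and variables are invented for the example.

```python
def build_dependents(cells):
    """cells: {name: (defines, references)} -> {cell: set of cells reading its variables}."""
    owner = {var: name for name, (defines, _) in cells.items() for var in defines}
    dependents = {name: set() for name in cells}
    for name, (_, references) in cells.items():
        for var in references:
            if var in owner and owner[var] != name:
                dependents[owner[var]].add(name)
    return dependents


def cells_to_rerun(cells, changed):
    """All cells downstream of `changed` (the changed cell itself is excluded)."""
    dependents = build_dependents(cells)
    stale, frontier = set(), [changed]
    while frontier:
        for child in dependents[frontier.pop()]:
            if child not in stale:
                stale.add(child)
                frontier.append(child)
    return stale


# Each cell lists the variables it defines and the variables it reads.
cells = {
    "load": ({"signal"}, set()),
    "filter": ({"filtered"}, {"signal"}),
    "plot": (set(), {"filtered"}),
    "notes": ({"title"}, set()),
}

print(sorted(cells_to_rerun(cells, "load")))   # ['filter', 'plot']
print(sorted(cells_to_rerun(cells, "notes")))  # []
```

Editing `load` invalidates everything that transitively reads `signal`, while the unrelated `notes` cell triggers nothing; that is the "no hidden state" property in miniature.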
How projio is aware of it
- pipeio_nb_* MCP tools treat marimo as a first-class notebook backend alongside jupytext percent-format. pipeio_nb_create is kind-aware (investigate/explore vs demo/validate); pipeio_nb_watch launches marimo edit --watch for live editing; pipeio_nb_snapshot executes a marimo notebook and reads cell outputs (the agent's "eyes" into a notebook); pipeio_nb_validate runs marimo check.
- format: field in notebook.yml selects the backend per notebook. Auto-detected when empty (which is currently the case in every surveyed notebook.yml).
- marimo-pair skill (user-level, not project-local) launches and monitors a marimo session and is allow-listed in pixecog's and msol's .claude/settings.json via the discover-servers.sh and execute-code.sh Bash patterns.
Adoption across projects
| Project | Adopted? | How / signature artifact | Variation worth teaching |
|---|---|---|---|
| projio | n/a | 12 import marimo matches inside packages/pipeio/ source/tests + 2 docs (docs/specs/pipeio/notebook.md, docs/tutorials/marimo-notebooks.md); no project-level notebooks | tool reference for the backend |
| cogpy | absent | 0 marimo matches | library repo — does not run notebooks |
| pixecog | extensive | 7 marimo .py notebooks under code/pipelines/{calibrate_ieeg_notch, spectrogram_burst, calibrate_ieeg, preprocess_ecephys}/notebooks/explore/; 15 notebook.yml files; __marimo__/session/ cache dir at repo root | only project with real marimo authoring; the cache dir leaking into repo root is friction |
| gecog | minimal | 2 explore notebooks (code/pipelines/travelling_wave/notebooks/explore/{kw_spectrum.py, flow_and_patterns.py}); 7 notebook.yml files | one flow has marimo notebooks, the rest are placeholders |
| msol | absent (notebooks empty) | 3 notebook.yml files but the .src/explore_*.py paths point to placeholder marimo files; one calibration script (code/scripts/calibration/calibrate_arena_corners.py) imports marimo | adopted scaffolding without populating notebooks |
Convergent patterns
- Every project that has flows uses the notebooks/{explore,demo}/.src/ layout per the feedback_notebook_layout.md convention (split source vs MyST views; no per-notebook subdirs).
- Every notebook.yml is auto-detect (format: '').
- Marimo enters projects via pipeio's notebook tooling; none of the projects use marimo edit independently of the projio convention.
Divergent patterns
- Real adoption is concentrated in pixecog (7 notebooks across 4 flows). Other projects have the scaffolding but mostly empty .src files.
- Cache discipline: only pixecog leaks __marimo__/session/ into the repo root; it should be listed in .gitignore.
Canonical teaching artifact
pixecog/code/pipelines/spectrogram_burst/notebooks/ — notebook.yml
+ a real marimo .py notebook in explore/.src/. Single concrete
example of (a) the kind-aware notebook layout, (b) a real exploration
notebook checked into the repo, (c) pipeio_nb_* discovery.
Honest gap
Marimo is real on disk in only one project. The workshop can teach
authoring in pixecog, but the second role for marimo (handbook
explorables via marimo export html-wasm) has zero examples in
the surveyed projects. That is a deliberate handbook target rather
than a current artifact: handbook chapter 1 can include the first
real explorable as part of writing the chapter.
Component 5: Quarto / MkDocs
What it does
Both are static-site generators that turn markdown/Quarto-markdown into HTML, but with different defaults: MkDocs (with the Material theme) is optimized for documentation sites with navigation, search, and MkDocs plugins (bibtex, monorepo, ezlinks); Quarto unifies markdown + executable code + multiple output formats (website, book, revealjs slides, PDF) under one source. Workshop framing: MkDocs for the handbook (a docs site), Quarto for the workshop (a multi-output course package where the same source feeds website + book + slides + executable notebooks).
How projio is aware of it
- site.framework in .projio/config.yml (mkdocs | sphinx) plus output_dir. No quarto value yet in the framework enum; Quarto enters via .projio/render/quarto.yml for individual deliverables.
- MCP tools: site_build, site_serve, site_deploy, site_list, site_detect. These dispatch to MkDocs or Sphinx based on site.framework.
- pipeio_mkdocs_nav_patch: patches mkdocs.yml nav from collected flow docs. No equivalent for Quarto navigation yet.
- docs/specs/quarto-reports.md + .projio/render/quarto.yml: the Quarto-for-deliverables convention (per-report .qmd files rendered to HTML/PDF, separate from the site nav).
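The framework dispatch described for the site tools can be sketched as a small function. This is an assumption about the shape of the logic, not projio's code: the function name and the `source_dir` default are hypothetical, while `mkdocs build --site-dir` and `sphinx-build -b html` are the standard CLI forms of those tools.

```python
def site_build_command(framework, output_dir, source_dir="docs"):
    """Return the CLI invocation for the configured docs framework."""
    if framework == "mkdocs":
        # mkdocs reads mkdocs.yml from the current working directory.
        return ["mkdocs", "build", "--site-dir", output_dir]
    if framework == "sphinx":
        return ["sphinx-build", "-b", "html", source_dir, output_dir]
    # Mirrors the survey's note that quarto is not a framework value yet.
    raise ValueError(f"unsupported site.framework: {framework!r}")


mkdocs_cmd = site_build_command("mkdocs", "site")
sphinx_cmd = site_build_command("sphinx", "docs/build/html")
print(mkdocs_cmd)  # ['mkdocs', 'build', '--site-dir', 'site']
print(sphinx_cmd)  # ['sphinx-build', '-b', 'html', 'docs', 'docs/build/html']
```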
Adoption across projects
| Project | MkDocs | Sphinx | Quarto | Variation worth teaching |
|---|---|---|---|---|
| projio | yes (theme.name: material) | no | yes (deliverables only, via .projio/render/quarto.yml) | only project with both MkDocs and Quarto in the same repo |
| cogpy | yes (material) | yes (docs/build/html, Makefile uses sphinx-build) | no | only project on Sphinx (legacy, kept) |
| pixecog | yes (material) | no | no | mkdocs.yml + Makefile splits serve and serve_live (mike-style versioning) |
| gecog | yes (material) | no | no | minimal mkdocs.yml; Makefile defers to .projio/projio.mk |
| msol | yes (material) | no | no | most plugin-rich mkdocs (search + monorepo + ezlinks + bibtex) |
Convergent patterns
- mkdocs-material is the universal documentation framework across all 5 projects.
- All study projects' Makefiles include .projio/projio.mk for the generated docs targets.
Divergent patterns
- Sphinx outlier: cogpy alone runs Sphinx (its API docs were pre-existing when it joined the projio ecosystem). Workshop honest framing: projio doesn't impose a docs framework.
- Quarto split: projio uses Quarto only for deliverables (.qmd reports) under docs/deliverables/reports/, not for the site itself. The workshop introduces Quarto as the workshop's own publication framework, not as a projio convention to push into other projects.
- Plugin density: msol's mkdocs.yml has 4 plugins (search, monorepo, ezlinks, bibtex); gecog's is minimal. Worth teaching that mkdocs-material is a bare framework; projects choose how much to decorate.
Canonical teaching artifact
projio/.projio/render/quarto.yml + one docs/deliverables/reports/*.qmd
report. Single artifact that shows the projio Quarto-for-deliverables
convention. For MkDocs, pixecog/mkdocs.yml is the strongest
example — material theme, real navigation, Makefile-wired build.
Honest gap
The handbook (mkdocs-material) and the workshop (Quarto project) are
different surfaces with different generators, and projio has no
convention for cross-linking from one to the other yet. The
architecture decision in the source idea note (handbook in projio,
workshop in teaching/agentic-workshop/) sidesteps this by separation —
worth surfacing as a deliberate design choice in chapter 1, not as an
unsolved problem.
Component 6: projio
What it does
projio is the project-aware layer that knows about BIDS conventions, DataLad subdatasets, Snakemake flows, marimo notebooks, and the docs framework — and exposes that knowledge through MCP tools so a Claude Code agent can query "what flows are registered?", "what does this mod produce?", "where does this paper live?", "what notes did the agent write last week?", without inventing a path scheme. Workshop framing: the stack already does the work; projio makes the stack queryable.
How projio is aware of itself
projio's six subsystems each manage a knowledge domain:
| Subsystem | Domain | Awareness |
|---|---|---|
| indexio | retrieval | corpus indexing, chunking, embedding, RAG |
| biblio | literature | bibliography, citekey resolution, paper context, docling/grobid |
| notio | notes + manuscripts | structured notes, manuscript assembly |
| codio | code intelligence | library registry with role: core/shared/external |
| pipeio | pipelines | flow/mod/rule/notebook/config tooling |
| figio | figures | declarative figure orchestration (FigureSpec YAML) |
Adoption across projects
| Project | indexio | biblio | notio | codio | pipeio | figio | Variation |
|---|---|---|---|---|---|---|---|
| projio | active (2 corpora, 1.3k+75k chunks) | active | active | active (1 first-party + 14 mirrors) | disabled (enabled: false) | minimal (1 example) | tool repo; pipeio off intentionally |
| cogpy | active | active (full bib/) | active (10 log subdirs) | active (~40 external mirrors) | not enabled | absent | strongest codio external-mirror catalog |
| pixecog | active | active | active (9 log subdirs) | active (cogpy/labbox/labpy as code/lib/*) | active (16 flows) | dir-only, figs ad-hoc | most flows; densest subsystem usage |
| gecog | active | active | active (5 log subdirs) | active (cogpy/labpy as code/lib/*) | active (8 flows) | 1 first-party FigureSpec | only first-party FigureSpec in the cohort |
| msol | active | enabled: false (drift) | enabled: false (drift) | active (database_io/ratcave) | active (3 flows) | absent | only project with two subsystems flagged off |
Convergent patterns
- All study projects use code/{lib,pipelines,utils}/ per the code-tiers spec.
- All study projects keep docs/log/{idea,issue,task,result,meeting}/ with index.md (notio).
- All study projects have bib/ populated (even msol where biblio is flagged off in config).
Divergent patterns
- enabled: flag drift: msol declares biblio + notio off while using both. projio doesn't auto-reconcile.
- figio adoption: 1 first-party FigureSpec across all 5 projects (gecog's May-02 cohort). Most figures are ad-hoc <report>-figs/.
- Code tier presence: cogpy is src/cogpy/<area>/ flat (no code/lib/). It's a library, not a study, so the tier convention doesn't apply.
Graded introduction sequence (workshop)
The handbook + workshop introduce projio gradually: BIDS + DataLad + Snakemake + Marimo + Quarto first (workshop day 1 + early day 2), then projio enters as the layer that knows about those conventions. Each stage is motivated by a pain the prior stage exposed.
| Stage | What's added | Workshop session that introduces it | Stack-awareness payoff |
|---|---|---|---|
| 0 | BIDS root + DataLad super/subdatasets + one snakebids flow + marimo explore notebook + mkdocs site, no projio | Day 1 (full day) | Establish the bare stack works without projio. Pain that motivates next stage: 12 derivatives dirs and no record of which Snakemake config produced each. |
| 1 | projio sync + .projio/config.yml + notio (docs/log/ + indexes) | Day 2 AM | Project memory: dated tasks/results/ideas under docs/log/, navigable via MCP note_search. Pain solved: "why was this run produced and by whom?" Pain remaining: cross-flow paths still hand-constructed. |
| 2 | pipeio flow registry + BidsPaths adapter + manifest.yml contract | Day 2 PM | One MCP tool call (pipeio_flow_list, pipeio_target_paths) returns flow inventory and resolved BIDS output paths. Pain solved: agents and humans stop hand-constructing BIDS wildcards. Pain remaining: paper claims drift from data. |
| 3 | biblio + indexio (paper ingest, docling extraction, RAG) | Day 3 AM | Citekey-based references resolvable via MCP; full-text search across project corpus. Pain solved: claims back-link to extracted paper text. Pain remaining: figures still ad-hoc per report. |
| 4 | figio FigureSpec + manuscript subsystem | Day 3 PM | Composable figure specs (one panel-per-rule), auditable manuscript_cite_check. Pain solved: one figure spec → multiple outputs (PDF/PNG/SVG). |
| 5 | codio library catalog with role: core + agent_instructions() discovery | Day 3 PM (continues) | Agent-discoverable code reuse: "what library can do XYZ?" → MCP codio_discover. Pain solved: agent doesn't reinvent primitives. |
The point of the gradient: each subsystem earns its complexity by solving a problem the previous stage exposed. A workshop participant who stops at stage 1 still has a working project; each later stage is additive, not foundational.
Canonical teaching artifact
pixecog/.projio/pipeio/registry.yml (16 flows) + pixecog/.projio/config.yml
+ the code/pipelines/lfp_extrema/ flow. Together they show: how
projio discovers flows (registry.yml), how it configures itself
(.projio/config.yml), and what a real flow looks like under the
convention.
Honest gap
subsystems.<name>.enabled flags can lag actual on-disk usage, and
projio doesn't auto-detect that drift (msol is the running example).
Likewise figio is mostly aspirational across all five projects: one
first-party FigureSpec exists. The handbook should be honest about
both — the convention exists; uniform adoption does not.
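Flag drift of the msol kind could in principle be detected mechanically. The sketch below compares `enabled` flags against on-disk evidence. Everything about it is an assumption for illustration: projio has no such check per this survey, the mapping from subsystem to evidence directory is a guess based on the bib/ and docs/log/ conventions, a missing flag is treated as enabled, and the config is passed in as an already-parsed dict to avoid a YAML dependency.

```python
import tempfile
from pathlib import Path

# Hypothetical mapping: a non-empty directory is taken as evidence of real use.
EVIDENCE_DIRS = {"biblio": "bib", "notio": "docs/log"}


def find_flag_drift(project_root, subsystems):
    """Return subsystems flagged off whose evidence directory exists and is non-empty."""
    drift = []
    for name, rel_dir in EVIDENCE_DIRS.items():
        # Assumption: an absent flag means the subsystem is enabled.
        enabled = subsystems.get(name, {}).get("enabled", True)
        evidence = Path(project_root) / rel_dir
        in_use = evidence.is_dir() and any(evidence.iterdir())
        if not enabled and in_use:
            drift.append(name)
    return sorted(drift)


# Demo on a throwaway tree shaped like the msol situation.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "bib").mkdir()
    (root / "bib" / "refs.bib").write_text("")
    (root / "docs" / "log" / "idea").mkdir(parents=True)
    flags_off = {"biblio": {"enabled": False}, "notio": {"enabled": False}}
    flags_on = {"biblio": {"enabled": True}, "notio": {"enabled": True}}
    drift = find_flag_drift(root, flags_off)
    no_drift = find_flag_drift(root, flags_on)

print(drift)     # ['biblio', 'notio']
print(no_drift)  # []
```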
Component 7: Agentic on top
What it does
The "agentic on top" layer is the set of conventions and configurations
that let a Claude Code agent operate on a projio-aware project: which
MCP servers it talks to, which Bash commands and read paths are
pre-approved (.claude/settings.json), which prompt-based skills are
discoverable (.projio/skills/), and how captures/notes flow into
dispatched tasks (worklog). Workshop framing: the agent is a
collaborator with bounded permissions, structured context, and routed
work — not an opaque chatbot.
How projio is aware of it
- projio init scaffolds .claude/settings.json (with mcp__projio__* pre-approved) + .mcp.json (projio + worklog + sirocampus baseline). Both files gitignored via the # >>> projio >>> block.
- agent_instructions() MCP tool discovers skills from src/projio/data/skills/ (ecosystem) + .projio/skills/<name>/SKILL.md (project-local; project overrides ecosystem).
- skill_read(name) returns a skill body; Claude Code surfaces these as slash commands.
- permissions_sync() reconciles project skill listings.
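The "project overrides ecosystem" rule for skill discovery amounts to a two-pass merge. Here is a minimal sketch of that rule, not the actual agent_instructions() implementation. It assumes both locations use the `<name>/SKILL.md` layout, which the survey states only for project-local skills, and the function name is hypothetical.

```python
import tempfile
from pathlib import Path


def discover_skills(ecosystem_dir, project_dir):
    """Map skill name -> SKILL.md path; project-local entries override ecosystem ones."""
    skills = {}
    for source in (ecosystem_dir, project_dir):  # the later source wins on name clashes
        base = Path(source)
        if not base.is_dir():
            continue
        for skill_file in sorted(base.glob("*/SKILL.md")):
            skills[skill_file.parent.name] = skill_file
    return skills


# Demo: figio-guide exists in both places, pipeio-guide only in the ecosystem set.
with tempfile.TemporaryDirectory() as tmp:
    eco, proj = Path(tmp, "ecosystem"), Path(tmp, "project")
    for base, names in ((eco, ["figio-guide", "pipeio-guide"]), (proj, ["figio-guide"])):
        for name in names:
            (base / name).mkdir(parents=True)
            (base / name / "SKILL.md").write_text(f"# {name}\n")
    skills = discover_skills(eco, proj)
    origins = {name: path.parent.parent.name for name, path in skills.items()}

print(origins)  # {'figio-guide': 'project', 'pipeio-guide': 'ecosystem'}
```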
Agentic-on-top inventory
.claude/settings.json permission shape
| Project | MCP servers pre-approved | Distinctive Bash | Distinctive Read paths |
|---|---|---|---|
| projio | mcp__projio__*, mcp__sirocampus__*, mcp__worklog__* | git/python/pip/pytest/make baseline | /storage2/arash/**, 14 .projio/codio/mirrors/* |
| cogpy | mcp__projio__*, mcp__sirocampus__*, mcp__worklog__* | typing/lint stack: mypy/ruff/black/isort/tox/nox/coverage/jupyter/ipython | /storage/share/codelib/* (~17 lab-shared mirrors) |
| pixecog | mcp__projio__*, mcp__sirocampus__*, mcp__worklog__* + WebSearch | pixi search *, ssh gamma{1..4} uptime, ssh gpu/spikesort/beta/theta uptime, bash /home/arash/.claude/skills/marimo-pair/scripts/discover-servers.sh* | three subdataset trees code/lib/{cogpy,labbox,labpy} |
| gecog | mcp__projio__*, mcp__sirocampus__*, mcp__worklog__* + mcp__cogpy__* | minimal Bash baseline | code/lib/{cogpy,labpy} |
| msol | mcp__projio__*, mcp__sirocampus__*, mcp__worklog__* | pixi run -e analysis marimo *, pixi run marimo *, pixi search *, pixi install *, marimo-pair scripts | code/lib/{database_io,ratcave} |
Tool wildcards (mcp__projio__*) are universal. Explicit Bash command
patterns vary by project domain (typing stack for cogpy, multi-host SSH
for pixecog, pixi+marimo for msol).
.mcp.json server set
| Project | projio MCP rooted at | Other servers | Notable |
|---|---|---|---|
| projio | this repo | sirocampus + worklog | three-server baseline |
| cogpy | this repo | sirocampus + worklog | baseline |
| pixecog | this repo | sirocampus + worklog | baseline |
| gecog | this repo | cogpy + sirocampus + worklog | unique fourth server (cogpy MCP exposes the cogpy library's projio tools as a sibling project) |
| msol | this repo | sirocampus + worklog | baseline |
All servers run python -m projio.mcp.server via /storage/share/python/environments/Anaconda3/envs/rag/bin/python — a single env hosts every MCP server in the ecosystem.
Skills
- Project-local (.projio/skills/<name>/):
  - projio: figio-guide, projio-setup (2)
  - cogpy: cogpy-dev (1)
  - pixecog: pixecog-flow-setup (1)
  - gecog: 0
  - msol: 0
- User-level (.claude/skills/):
  - projio: gitnexus (1)
  - everywhere else: directory does not exist
- Ecosystem skills (always available via agent_instructions()): ~25 skills from src/projio/data/skills/, including figio-guide, pipeio-guide, marimo-session, marimo-pair, pipeio-nb-extract, progress-report-deck, literature-presentation, idea-capture, questio-* (7), biblio-batch-curate, codelib-discovery, mcp-tool-scaffold, rag-query, etc.
Captures → tasks pattern (worklog)
Across projects: notes captured via worklog_note(text, project_id)
land in docs/log/idea/ or docs/log/issue/ per kind. Tasks promoted
via promote_to_task(source) land in docs/log/task/<task>.md with
status: scheduled, then are dispatched via worklog_note(...,
auto_dispatch=True, model="opus|sonnet") or schedule_queue(after=...)
for dependency-based chains.
This very session is itself a teaching artifact: the source idea note
(docs/log/idea/idea-arash-20260507-221835-382557.md) → three task
notes (task-arash-20260508-160000-200001.md etc.) → this result note
all live in projio's docs/log/ and trace a real planning loop.
Hooks
No hooks key in any of the five projects' .claude/settings.json.
Permissions are per-project; cross-cutting automation (Stop / PreToolUse)
is currently absent across the cohort. The workshop and handbook can
introduce hooks as "advanced" rather than as baseline practice.
Routing (model + host)
- Model selection follows the worklog-MCP convention: opus for synthesis (multi-package, architectural), sonnet for execution (single-module, clear scope), haiku for triage (trivial/typo). auto_dispatch=True defaults to opus.
- Host routing is encoded only in pixecog's .claude/settings.json via the ssh gamma{1..4}/gpu/spikesort/beta/theta uptime allow-list; that's where multi-host work happens. The other projects don't expose cross-host SSH in their permission set.
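The model-selection convention reads as a small lookup table. The sketch below encodes it directly; the function name and the task-kind keys are illustrative labels taken from the wording above, not identifiers from the worklog MCP server.

```python
# Task kind -> model, as described by the routing convention above.
ROUTES = {"synthesis": "opus", "execution": "sonnet", "triage": "haiku"}
DEFAULT_MODEL = "opus"  # mirrors "auto_dispatch=True defaults to opus"


def pick_model(task_kind=None):
    """Return the model for a task kind, falling back to the default when unspecified."""
    if task_kind is None:
        return DEFAULT_MODEL
    try:
        return ROUTES[task_kind]
    except KeyError:
        raise ValueError(f"unknown task kind: {task_kind!r}") from None


print(pick_model("execution"))  # sonnet
print(pick_model())             # opus
```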
Adoption summary table
| Project | hooks | project skills | extra MCP servers | distinctive permission shape |
|---|---|---|---|---|
| projio | none | 2 + user-level gitnexus | none beyond baseline | mirror Read globs |
| cogpy | none | 1 (cogpy-dev) | none beyond baseline | typing/lint Bash |
| pixecog | none | 1 (pixecog-flow-setup) | none beyond baseline | multi-host SSH + pixi + marimo-pair |
| gecog | none | 0 | cogpy MCP | minimal — cleanest "default" project |
| msol | none | 0 | none beyond baseline | pixi-env-named marimo Bash |
Convergent patterns
- Three-server MCP baseline (projio + worklog + sirocampus) is universal.
- mcp__<server>__* wildcards must appear in both permissions.allow and allowedTools for auto-approval (per the user's feedback_mcp_permissions.md memory). Every surveyed project follows this.
- Read access to /storage2/arash/** and /storage/share/sirocampus/** is universal: the cross-project read substrate.
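Because the two-slot rule is enforced socially, a small lint can catch a missing wildcard. The following sketch checks a parsed settings document for `mcp__<server>__*` in both places. It assumes `allowedTools` is a top-level key and `permissions.allow` a nested list, which is how the survey names them; the function name and the sample settings are made up.

```python
import json


def missing_mcp_wildcards(settings, servers):
    """Return {server: [slots]} for servers whose wildcard is absent from a slot."""
    slots = (
        ("permissions.allow", set(settings.get("permissions", {}).get("allow", []))),
        ("allowedTools", set(settings.get("allowedTools", []))),
    )
    missing = {}
    for server in servers:
        wildcard = f"mcp__{server}__*"
        absent = [slot for slot, entries in slots if wildcard not in entries]
        if absent:
            missing[server] = absent
    return missing


# Made-up settings: projio is complete, worklog is half-configured, sirocampus is absent.
SETTINGS = json.loads("""
{
  "permissions": {"allow": ["mcp__projio__*", "mcp__worklog__*"]},
  "allowedTools": ["mcp__projio__*"]
}
""")

gaps = missing_mcp_wildcards(SETTINGS, ["projio", "worklog", "sirocampus"])
print(gaps)
# {'worklog': ['allowedTools'], 'sirocampus': ['permissions.allow', 'allowedTools']}
```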
Divergent patterns¶
- Project-local skills: present in 3/5 projects, absent in 2/5 (gecog, msol). Skill authoring is real but not yet baseline.
- Bash permission shape mirrors project domain: cogpy = library hygiene; pixecog = multi-host research compute; msol = pixi+marimo workflow; gecog = minimal.
- Fourth MCP server: gecog uniquely wires `cogpy` as a per-project MCP server, exposing the cogpy library's projio tools alongside its own. It is the only example of a non-baseline projio MCP server in the cohort.
Canonical teaching artifact¶
pixecog/.claude/settings.json + pixecog/.mcp.json + the
docs/log/idea/ → docs/log/task/ → docs/log/result/ chain for any
recent multi-step initiative. One bundle that shows: tool wildcards,
explicit Bash command patterns for multi-host SSH, marimo-pair script
allow-listing, and the captured-to-dispatched-to-result flow as
auditable artifacts.
Honest gap¶
Hooks are unused (zero across the cohort). Project-local skills are present in only 3/5 projects. The "agent operates with bounded permissions and structured context" pattern is real but not uniformly adopted. Workshop should introduce skills + hooks as graduated adoption rather than as baseline.
Universal patterns across the stack¶
These hold across every component that's adopted by a project:
- The repository is the unit of knowledge: everything (data, code, papers, notes, deliverables, configs, agent settings) lives in or alongside one git/datalad superdataset. No separate "knowledge base" elsewhere.
- Conventions over configuration: BIDS for data, DataLad for versioning, `code/{lib,pipelines,utils}/` for code, `docs/log/{idea,task,result,...}/` for notio, `derivatives/<flow>/` for outputs, `.projio/` for tool state. Each tool surfaces its own conventions so that an agent or a new collaborator can navigate a project they have never seen.
- MCP wildcards in both permission slots: every project's `.claude/settings.json` lists `mcp__<server>__*` in both `permissions.allow` and `allowedTools`. This is non-obvious from Claude Code defaults and is enforced socially.
- Subsystem-disable as a config flag, not a deletion: turning off `pipeio` (projio) or `biblio`/`notio` (msol) in `.projio/config.yml` prevents the subsystem from being invoked but leaves the on-disk state alone. Drift between flag and state is possible.
- Snakemake outputs land under `derivatives/<flow>/` and are registered as DataLad subdatasets (in pixecog and gecog; msol is mid-adopting the convention).
- Cross-flow contract is `manifest.yml` + `BidsPaths` in the electrophysiology projects; Snakemake's input/output alone is not the integration glue.
- Three MCP servers minimum: projio (rooted at the project) + sirocampus + worklog. All run from one shared `rag` conda env.
- `projio sync` is the periodic reconciliation step: it auto-detects `code/lib/*/`, generates `.projio/projio.mk`, copies CSL/Lua filters, and regenerates the skill index. Without it, drift accumulates.
- Notes are the audit trail: `docs/log/{idea,task,result,...}/` with daily/weekly indexes captures the why of every dispatched task and every produced result. The chain itself is navigable knowledge, not a side-effect of work.
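The `derivatives/<flow>/` convention in the list above can be checked mechanically, because DataLad subdatasets are recorded as git submodules in `.gitmodules`. The following is a minimal sketch under that assumption; `submodule_paths` and `unregistered_flows` are hypothetical helper names, not projio or DataLad commands.

```python
from pathlib import Path


def submodule_paths(gitmodules_text: str) -> set:
    """Collect the `path = ...` values from the text of a .gitmodules file."""
    paths = set()
    for line in gitmodules_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key.strip() == "path":
            paths.add(value.strip())
    return paths


def unregistered_flows(project_root: Path) -> list:
    """List derivatives/<flow>/ directories that are not registered subdatasets."""
    gitmodules = project_root / ".gitmodules"
    registered = submodule_paths(gitmodules.read_text()) if gitmodules.exists() else set()
    derivatives = project_root / "derivatives"
    if not derivatives.is_dir():
        return []
    return sorted(
        d.name
        for d in derivatives.iterdir()
        if d.is_dir() and f"derivatives/{d.name}" not in registered
    )
```

Calling `unregistered_flows(Path("."))` from a project root returns the flow directories that exist on disk without a matching `.gitmodules` entry. The check is read-only, in keeping with how this survey was run.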
Recommended teaching artifacts (one per component, 7 total)¶
Ranked by leverage (impact × concreteness × low explanatory friction) within each component. Each entry maps to a specific workshop session and a specific handbook chapter.
| # | Component | Artifact | Project | Anchors |
|---|---|---|---|---|
| 1 | BIDS | pixecog/raw/ (strict BIDS) + pixecog/derivatives/preprocess_ieeg/manifest.yml (soft-form derivative root) | pixecog | Workshop Day 1 AM; Handbook ch. "BIDS in practice" |
| 2 | DataLad | gecog/.gitmodules (9 entries, single-RIA-store layout) | gecog | Workshop Day 1 AM; Handbook ch. "DataLad as a coherent subdataset graph" |
| 3 | Snakemake | pixecog/code/pipelines/lfp_extrema/Snakefile + config.yml (registry-extension pattern) | pixecog | Workshop Day 1 PM; Handbook ch. "config-driven pipelines" |
| 4 | Marimo | pixecog/code/pipelines/spectrogram_burst/notebooks/ (real .src/explore_*.py + notebook.yml) | pixecog | Workshop Day 2 AM; Handbook ch. "reactive notebooks for analysis and explorables" |
| 5 | Quarto / MkDocs | projio/.projio/render/quarto.yml + pixecog/mkdocs.yml (the two surfaces) | projio + pixecog | Workshop pre-workshop setup (build the site once on day 0); Handbook ch. "publication framework" |
| 6 | projio | pixecog/.projio/pipeio/registry.yml (16 flows) + pixecog/.projio/config.yml + pixecog/code/pipelines/lfp_extrema/ | pixecog | Workshop Day 2 PM + Day 3 AM; Handbook chs. "the project as a queryable knowledge environment" + "the pipeio subsystem" |
| 7 | Agentic on top | pixecog/.claude/settings.json + pixecog/.mcp.json + a recent docs/log/idea → task → result chain (gecog mlclassifier or pixecog detection_qc) | pixecog (settings) + gecog (chain) | Workshop Day 2 AM intro + Day 3 PM; Handbook ch. "the iterative loop" |
Notes on the shortlist:
- pixecog dominates the artifact list (5 of 7 entries) because it has the densest stack adoption; it is the strongest single project to dissect for a workshop.
- gecog provides the agentic chain because the `brainstate.mlclassifier` arc (Apr 29 – May 6) is the cleanest, most narrative `idea → task → result` loop in any project's `docs/log/`.
- msol provides domain diversity for the "this generalizes beyond electrophysiology" slot in workshop day 1, but is not the canonical artifact for any single component (it under-adopts DataLad subdatasets and Marimo, and doesn't use snakebids/BidsPaths).
- cogpy provides the legacy snakebids reference (`src/cogpy/workflows/preprocess/Snakefile`) but is not on the canonical list; the workshop introduces it as "before-pipeio" backdrop, not as the artifact to dissect.
Honest gaps¶
Consolidated from the seven component-level gaps, plus one cross-cutting gap. State each as a sentence, then a sentence on what the handbook / workshop should do about it.
- Derivative roots aren't BIDS-valid (no per-derivative `dataset_description.json`). The workshop teaches `manifest.yml` as a projio convention layered on BIDS, not as BIDS itself. Open question for a future projio convention iteration: should `pipeio_flow_new` emit a derivative `dataset_description.json`?
- Subdataset-per-derivative is socially enforced, not automatic. msol shows the consequence: a study can be DataLad-initialized while most of the directory tree is not actually subdatasetted. The handbook chapter on DataLad should teach the convention as a deliberate choice at flow-creation time, with `pipeio_flow_new` prompting the user to register the subdataset.
- Three Snakemake idioms coexist (snakebids alone in cogpy, snakebids + BidsPaths in pixecog/gecog, plain snakemake in msol). The workshop picks snakebids + BidsPaths as default and labels the others as legacy / minimal-ceremony variants, with no attempt to teach all three.
- Marimo's second role (handbook explorables) has zero examples yet in any surveyed project. The first explorable is a deliberate handbook target, not an existing artifact to dissect. Plan the first explorable into chapter 1's writing schedule, not into pre-workshop reading.
- Quarto and MkDocs don't cross-link between the handbook surface and the workshop surface. The architecture decision in the source note (handbook in projio, workshop in `teaching/agentic-workshop/`) sidesteps this by separation, but the user-facing path between the two surfaces is currently "navigate the URL bar." Worth surfacing as a deliberate choice in chapter 1, not as an unsolved problem.
- `subsystems.<name>.enabled` flags drift from on-disk reality (msol has biblio + notio off while using both). projio doesn't auto-reconcile. The workshop advises periodic `projio sync`; the handbook should mention that the drift exists.
- Hooks are unused across all 5 projects' `.claude/settings.json`, and project-local skills are present in only 3/5. The "agent operates with bounded permissions and structured context" pattern is real but not uniformly adopted. Introduce skills + hooks as graduated adoption rather than baseline.
- Cross-cutting: every project in this set has one author (Arash). The single-author fragility named in the source idea note (Quantomatic precedent) applies to every component of the stack as projio uses it. The handbook + workshop are the docs + examples + community legs of the survival strategy. Name this as motivation in chapter 1, not as deflection.
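The flag-versus-state drift named in the gaps above could be surfaced with a read-only check. The sketch below makes several assumptions that are not confirmed by the survey: the config has already been parsed into a dictionary with a `subsystems.<name>.enabled` shape, a missing flag counts as enabled, and the caller supplies a mapping from subsystem name to the directory holding its state. `flag_state_drift` is a hypothetical helper, not a projio command.

```python
from pathlib import Path


def flag_state_drift(config: dict, project_root: Path, state_paths: dict) -> list:
    """Return subsystems that are flagged off while their on-disk state exists.

    `state_paths` maps subsystem name to a project-relative path; which
    directory belongs to which subsystem is an assumption of the caller.
    """
    drifted = []
    subsystems = config.get("subsystems", {})
    for name, rel_path in state_paths.items():
        # Assumption: a subsystem with no explicit flag is treated as enabled.
        enabled = subsystems.get(name, {}).get("enabled", True)
        if not enabled and (project_root / rel_path).exists():
            drifted.append(name)
    return sorted(drifted)
```

A call such as `flag_state_drift(config, Path("."), {"notio": "docs/log"})` would report `notio` for a project that has the flag off but still keeps notes under `docs/log/`. The check only reports; reconciling the flag or the state remains a human decision.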
Method¶
- Read the prior `result-arash-20260508-tool-use-survey.md` once for paths and tool names; framed new prose around the seven stack components, not subsystems.
- Used `mcp__worklog__worklog_read_file` to read each project's `.claude/settings.json`, `.mcp.json`, `.projio/config.yml`, `pixi.toml`, `Makefile`, `.gitmodules`, `.datalad/config`, `.projio/pipeio/registry.yml`, one example `Snakefile` per project, and one example `notebook.yml` per project.
- Used a general-purpose agent in parallel to gather counts, file inventories, and the marimo / `__marimo__` cache scan across all five projects.
- Cross-checked findings against the prior tool-use survey for paths and adoption counts; this survey is internally consistent with that survey but reframes the same evidence on the stack axis.
- No source projects modified.