Honest gaps

Sources & anchors

  • Stack component: Meta
  • Canonical artifact: stack-axis survey §Honest gaps (8 items, verbatim mapping)
  • Workshop session: Day-1 closing + Day-3 closing
  • Outline: _outline.md §B

Frame

This chapter is the audit trail. Every other chapter in the handbook asserts that some component of the stack is adopted across the project cohort and teachable from a canonical artifact. The stack-axis survey that produced the chapter list also produced an inventory of places where the assertion breaks: conventions that are enforced socially rather than mechanically, idioms that coexist instead of converging, features that exist as scaffolding without content, and a structural fact about the cohort that the rest of the handbook has to treat as a permanent condition rather than a bug to fix.

The companion chapter single-author fragility names the structural fact. This chapter names the rest. Each gap is stated once, plainly, followed by what the handbook does about it. No new gaps are introduced here — the eight below are the survey's enumeration, and this chapter is the resolution layer, not a new source of truth.

1. Derivative roots aren't BIDS-valid

Across pixecog and gecog every flow produces a derivatives/<flow>/ tree that looks BIDS-shaped — sub-XX/ directories, modality-typed files, consistent wildcard ordering — but does not include a per-derivative dataset_description.json. A strict BIDS validator treating one of those derivative roots as its own BIDS root would reject it. The shape is there; the metadata that promotes the shape to a portable BIDS dataset is not.

What the handbook does about it. The chapter derivatives-and-manifest teaches manifest.yml as a projio convention layered on BIDS, not as BIDS itself. Workshop participants leave knowing that the manifest is what makes pipeio's cross-flow contracts work, and that emitting a dataset_description.json for tool-portability beyond the projio ecosystem is an open convention question — currently considered for pipeio_flow_new, not yet implemented.
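
The check described in this gap can be sketched in a few lines. This is an illustration under stated assumptions, not part of projio or pipeio: it assumes derivative roots sit at derivatives/<flow>/ under the project root, and the function names missing_descriptions and minimal_description are hypothetical. The four fields shown are the ones the BIDS specification associates with a derivative dataset description; the values are placeholders the caller supplies.

```python
import json
from pathlib import Path


def missing_descriptions(project_root):
    """Return the names of derivative roots that lack dataset_description.json."""
    derivatives = Path(project_root) / "derivatives"
    if not derivatives.is_dir():
        return []
    return sorted(
        flow.name
        for flow in derivatives.iterdir()
        if flow.is_dir() and not (flow / "dataset_description.json").is_file()
    )


def minimal_description(flow_name, bids_version):
    """Build a minimal derivative description; all values are placeholders."""
    return {
        "Name": flow_name,
        "BIDSVersion": bids_version,  # supplied by the caller, not guessed here
        "DatasetType": "derivative",
        "GeneratedBy": [{"Name": flow_name}],
    }


if __name__ == "__main__":
    # Report only; writing the file is left as a deliberate, manual step.
    for name in missing_descriptions("."):
        print(f"derivatives/{name}: no dataset_description.json")
        print(json.dumps(minimal_description(name, "<BIDS version>"), indent=2))
```

Run from a project root, the script lists each derivative root that a strict validator would reject and prints a candidate description for it. It deliberately does not write anything, since whether to emit the file at all is the open convention question.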

2. Subdataset-per-derivative is socially enforced

DataLad does not require a study to register every derivative as its own subdataset. pixecog and gecog both follow the convention (25 and 9 .gitmodules entries respectively, every flow's derivatives/<flow>/ registered). msol is the counterexample: DataLad-initialized, one subdataset entry total (code/lib/ratcave), every other directory treated as ordinary in-repo content. Nothing in the stack flagged this — the convention is carried by humans, not by tooling.

What the handbook does about it. The chapter code-as-subdataset presents subdataset-per-derivative as a deliberate choice the user makes at flow-creation time, not as automatic projio behaviour. The right moment to register the subdataset is when pipeio_flow_new scaffolds the flow, and the handbook treats prompting at that moment as a plausible future improvement rather than a current guarantee. msol is named as the project mid-way through adopting the convention so that participants who inherit a similarly partial setup recognise the state.
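
Because nothing in the stack flags an unregistered derivative, a project can audit itself by comparing .gitmodules against the directories on disk. The sketch below is one way to do that, assuming the usual `path = ...` lines in .gitmodules and derivative roots at derivatives/<flow>/. The helper names registered_paths and unregistered_flows are hypothetical, and the check reads files only; it does not call DataLad or git.

```python
import re
from pathlib import Path

# Matches the `path = <value>` line inside each [submodule "..."] section.
PATH_LINE = re.compile(r"^\s*path\s*=\s*(.+?)\s*$", re.MULTILINE)


def registered_paths(gitmodules_text):
    """Extract every `path = ...` value from .gitmodules content."""
    return set(PATH_LINE.findall(gitmodules_text))


def unregistered_flows(project_root):
    """List derivatives/<flow> directories that have no .gitmodules entry."""
    root = Path(project_root)
    gitmodules = root / ".gitmodules"
    text = gitmodules.read_text() if gitmodules.is_file() else ""
    registered = registered_paths(text)
    derivatives = root / "derivatives"
    if not derivatives.is_dir():
        return []
    return sorted(
        f"derivatives/{flow.name}"
        for flow in derivatives.iterdir()
        if flow.is_dir() and f"derivatives/{flow.name}" not in registered
    )


if __name__ == "__main__":
    for path in unregistered_flows("."):
        print(f"{path}: not registered as a subdataset")
```

An empty result means every derivative directory has a matching entry. A non-empty result is the partial state described for msol, not necessarily an error.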

3. Three Snakemake idioms coexist

cogpy runs snakebids alone — generate_inputs() plus custom path helpers, no BidsPaths, no manifest. pixecog and gecog run snakebids + BidsPaths, the style that participates in pipeio's cross-flow contracts. msol runs plain snakemake with glob_wildcards() on flat paths and no snakebids layer at all. Three real, working styles, one ecosystem.

What the handbook does about it. The chapter three-idioms picks snakebids + BidsPaths as the workshop default and labels the other two explicitly — snakebids-alone as the predecessor pattern (new flows should not use it) and plain snakemake as the minimal-ceremony variant for non-BIDS or one-shot pipelines. The workshop teaches one style and motivates the others as contrast; trying to teach all three in four days would dilute the message.

4. Marimo explorables have zero existing examples

Marimo plays two roles in the planned stack: per-flow exploratory notebooks (real on disk in pixecog, scaffolded but mostly empty elsewhere) and handbook explorables exported via marimo export html-wasm. The second role has zero examples in any surveyed project — every existing marimo file is an exploration notebook tied to a pipeline, not an embeddable, backend-free WASM bundle for the docs site.

What the handbook does about it. The chapter handbook-explorables treats the explorable role as a deliberate handbook target rather than an existing artifact to dissect. Outline §F caps the handbook at five WASM explorables and names each one (E1–E5) with its host chapter. Writing those explorables is part of writing the chapter; the handbook is honest that the first marimo-WASM bundle in the projio ecosystem will be authored for the handbook, not lifted from a study project.

5. Handbook and workshop live on two surfaces with no cross-link convention

The handbook is an MkDocs site (mkdocs-material, plugin-decorated). The workshop is a Quarto project under teaching/agentic-workshop/ that renders the same source to website + book + slides + executable notebooks. They live on two different surfaces with two different generators, and projio currently has no convention for cross-linking between them. A workshop participant who wants the handbook reaches it through the URL bar.

What the handbook does about it. The chapter two-surfaces-one-cross-link-protocol surfaces this as a deliberate architectural choice, not as an unsolved problem. Workshop handouts link into the handbook by published URL; handbook chapters cite source artifacts by repo-relative path. The separation keeps the workshop dataset detachable, and the handbook does not pretend the gap is a bug.

6. subsystems.<name>.enabled flags drift from on-disk reality

projio's .projio/config.yml carries subsystems.<name>.enabled flags for biblio, notio, codio, figio, pipeio, indexio. msol has biblio and notio set to off in its config while clearly using both — there are .bib files, there are notes in docs/log/, the subsystems are working. projio does not auto-reconcile the flag with what's actually on disk.

What the handbook does about it. The chapter 00-stack-aware-layer advises periodic projio sync to bring the config in line with the repository state and names the drift as a real failure mode. The right long-term fix is auto-reconciliation; the present fix is a documented hygiene command. The handbook does not pretend the flag is authoritative.
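
The drift itself is easy to detect, even without auto-reconciliation. The sketch below compares the enabled flags with on-disk evidence for the two subsystems named in this gap. It is an illustration only and says nothing about what projio sync actually does: the probes (any .bib file for biblio, a non-empty docs/log/ for notio) are assumptions drawn from the msol example, the function names are hypothetical, and the config is passed in as an already-parsed mapping so the example needs no YAML library.

```python
from pathlib import Path


def has_bib_files(root):
    """Evidence for biblio: at least one .bib file anywhere in the tree."""
    return any(root.rglob("*.bib"))


def has_log_notes(root):
    """Evidence for notio: docs/log/ exists and is not empty."""
    log_dir = root / "docs" / "log"
    return log_dir.is_dir() and any(log_dir.iterdir())


# Hypothetical probes; only the two subsystems named in the text are covered.
PROBES = {"biblio": has_bib_files, "notio": has_log_notes}


def drifted_flags(project_root, config):
    """Return subsystems whose enabled flag disagrees with on-disk evidence."""
    root = Path(project_root)
    subsystems = config.get("subsystems", {})
    drift = {}
    for name, probe in PROBES.items():
        flagged = bool(subsystems.get(name, {}).get("enabled", False))
        in_use = probe(root)
        if flagged != in_use:
            drift[name] = {"enabled": flagged, "in_use": in_use}
    return drift
```

For a project in msol's state, calling drifted_flags with both flags off would report both subsystems as disabled in config but in use on disk.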

7. Hooks are unused and skills aren't uniform

.claude/settings.json supports a hooks key that lets a project fire commands on Stop, PreToolUse, and similar events. Zero of the five surveyed projects have any hooks configured. Project-local skills under .projio/skills/ are present in three projects (projio, cogpy, pixecog) and absent in two (gecog, msol). The "agent operates with bounded permissions and structured context" pattern is real — every project has the MCP-server allow-list, the Bash command shape, the Read globs — but the richer expressions of that pattern (hooks for cross-cutting automation, skills for prompt templates) are not yet baseline.

What the handbook does about it. The chapter skills introduces SKILL.md as a graduated adoption — useful when a project has a repeatable workflow worth naming, not required from day one. Hooks appear in permissions-and-bounded-context as an advanced configuration, not as baseline. The handbook teaches the shape these features take when present; the workshop does not require participants to ship a project with either.
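
A project can see where it stands on both features with a short report. The sketch below assumes that the hooks key in .claude/settings.json is a mapping keyed by event name (Stop, PreToolUse, and so on) and that each entry under .projio/skills/ is one skill. Both are assumptions about layout, and agent_config_report is a hypothetical helper, not a projio command.

```python
import json
from pathlib import Path


def agent_config_report(project_root):
    """Summarise configured hook events and project-local skills."""
    root = Path(project_root)

    # Hooks: read the settings file if present; a missing file means no hooks.
    settings_path = root / ".claude" / "settings.json"
    settings = json.loads(settings_path.read_text()) if settings_path.is_file() else {}
    hooks = settings.get("hooks") or {}

    # Skills: list whatever sits directly under .projio/skills/.
    skills_dir = root / ".projio" / "skills"
    skills = sorted(p.name for p in skills_dir.iterdir()) if skills_dir.is_dir() else []

    return {"hook_events": sorted(hooks), "skills": skills}


if __name__ == "__main__":
    print(json.dumps(agent_config_report("."), indent=2))
```

Two empty lists match what the survey found in gecog and msol: no hooks and no project-local skills, with the permission allow-list still in place.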

8. Every project in the cohort has one author

The five projects surveyed (projio, cogpy, pixecog, gecog, msol) are all sole-authored by Arash. The single-author fragility named in the source idea note — Quantomatic as the cautionary precedent — applies to every component of the stack as projio currently uses it. There is no second author, no co-maintainer, no upstream community for projio itself. The cohort is not representative of multi-author research practice because it cannot be.

What the handbook does about it. single-author fragility names this as motivation, not deflection. The handbook plus the September workshop are the docs + examples + community legs of a deliberate survival strategy: the handbook is the docs leg, the workshop is the seed of the community leg, and the published handbook artifacts (every chapter cites real files in real projects) are the examples leg. The goal is not to hide the fragility but to do something about it. The honest scope statements in that chapter — system not stable yet, two subsystems aspirational, agentic layer presupposes Claude Code — are this chapter's natural pair.

Reading the audit

A reader can use this chapter two ways. Forward, as a way of calibrating expectations chapter by chapter — when chapter code-as-subdataset says "register every derivative as its own subdataset," the reader knows that gap 2 already named msol as the project that does not. Backward, as a way of asking what would close each gap: a derivative dataset_description.json emitter (gap 1), a pipeio_flow_new prompt that registers the subdataset (gap 2), a documented migration off snakebids-alone (gap 3), the first marimo-WASM bundle (gap 4), a Quarto-to-MkDocs cross-link mechanism (gap 5), subsystems.enabled auto-reconciliation (gap 6), a starter hook configuration and a skills-by-default checklist (gap 7), and a second author (gap 8).

Seven of the eight are convention or tooling work that could happen in a quarter. The eighth is the work the handbook itself is for.

Further reading

  • BIDS specification — the authoritative source for derivative validation requirements described in gap 1.
  • The Turing Way — community-assembled checklist of common gaps in reproducible research practice.
  • goodresearch.dev — the companion handbook against which this cohort's gaps were calibrated.