Projio ecosystem audit — glue identity, drift, and the integration boundary
Tags: architecture, projio, ecosystem, glue, refactor
Series: ecosystem-audit
Frame
Projio is glue, in the DataLad sense — an integration layer over a constellation of independently useful tools (pandoc, snakemake, mkdocs, jupyter/marimo, pixi, datalad, openalex/zotero, optionally tree-sitter via gitnexus). The discipline that makes that identity respectable, taken from DataLad:
- Don't replace what you're gluing: bare snakemake, pandoc, mkdocs, jupyter must still work.
- Conventions should be minimal and load-bearing: every entry under .projio/ must justify its existence.
- Original features exist only where existing tools genuinely don't reach: notio and biblio are honest gaps; anything else needs scrutiny.
Six io submodules under packages/ are the tools projio glues: biblio, codio, figio, indexio, notio, pipeio. They are independently installable (each has its own pyproject, CLI, tests, mcp.py).
Inventory (sizes are approximate)
Projio core (~17.7k LOC in src/projio/)
| File | LOC | Assessment |
|---|---|---|
| init.py | 2091 | Scaffolding has overgrown; feature accumulation across three project kinds |
| mcp/server.py | 1926 | Tool registration + cross-cutting MCP tools; mostly defensible |
| mcp/pipeio.py | 1833 | Wrapper over pipeio.mcp.*; large, but pipeio has 60+ tools |
| mcp/presentio.py | 1453 | Wrapper over notio.present.*; questionable that it sits in projio not notio |
| mcp/questio.py | 1274 | No backing package; pure projio, 29 tools |
| mcp/manuscripto.py | 1212 | Wrapper over notio.manuscript.*; same question as presentio |
| sync.py | 1145 | Cross-cutting wiring (lua, CSL, pandoc-defaults, indexio sources, hooks). Heavy |
| mcp/biblio.py | 1117 | Wrapper over biblio.mcp.*, biblio.config, biblio.bibtex; thicker than ideal |
| site.py | 733 | mkdocs wrapping; likely accumulated nav patching |
| mcp/context.py | 673 | Cross-cutting (project_context, runtime_conventions, agent_instructions, skill_read) |
| mcp/report.py | 630 | No backing package; pure projio |
| cli.py | 581 | Already broad surface |
| mcp/rag.py | 536 | Wrapper over indexio |
| mcp/codio.py | 413 | Wrapper over codio.mcp.*; healthy ratio |
| mcp/notio.py | 386 | Wrapper over notio core notes |
| config.py | 259 | Two-tier config; focused |
| mcp/datalad.py | 233 | Wraps datalad CLI |
| manuscript_cmd.py | 163 | CLI surface for manuscript ops |
| mcp/site.py | 136 | MCP for site ops |
| render.py | 130 | render.yml → pandoc-defaults; exemplar of focused glue |
| mcp/figio.py | 128 | Wrapper over figio; thinnest of the wrappers, healthy |
Subpackage map (the surprise)
The surface is wider than "six packages":
- 6 standalone packages: biblio, codio, figio, indexio, notio, pipeio
- notio subpackages: notio.manuscript, notio.present, plus core notes; the first two are wrapped by mcp/manuscripto.py and mcp/presentio.py
- Projio-native domains (no backing package): questio (research planning), report (Quarto-based reports)
So the agent-visible "ecosystem" actually has 10 functional domains, four of which (manuscript, present, questio, report) live partly or wholly inside projio.
Where the glue framing holds up
- io packages are genuinely separable. Each has CLI, tests, pyproject, own mcp.py. Standalone install works.
- Most projio MCP wrappers are thin re-exports of <pkg>.mcp.*; figio.py is the model (128 lines, 8 functions over an external package).
- Graceful degradation: io imports are lazy/late inside function bodies. Missing optional deps don't break the server.
- render.py (130 LOC) is the gold standard of cross-cutting glue: one config (render.yml), one output (pandoc-defaults.yaml), no domain logic.
- Conventions in .projio/ are mostly load-bearing: compiled.bib, merged.bib, modkey.bib, pandoc-defaults.yaml, csl/, filters/include.lua, skills/.
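The graceful-degradation point can be illustrated with a minimal sketch. It assumes nothing about projio's actual code: `optional_import` and `figio_status` are hypothetical names, and the only technique shown is deferring the import to call time so a missing optional package produces a message instead of a crash at server start.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Import a module by name, returning None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def figio_status() -> dict:
    """Hypothetical tool body: the import happens at call time, not at server start."""
    figio = optional_import("figio")
    if figio is None:
        return {"available": False, "hint": "install figio to enable figure tools"}
    return {"available": True, "module": figio.__name__}
```

Because the import lives inside the function body, a server that registers `figio_status` can start even when figio is absent; the tool simply reports that it is unavailable.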
Where the glue framing leaks (the smells)
1. questio and report are unsponsored
Both are substantial (1274 + 630 LOC, 46 tools combined), both are domain features with their own document layouts (plan/, docs/plan/), and neither has a backing package. They sit in projio's MCP layer.
Rule violation: original features only where existing tools don't reach. The "doesn't reach" test passes (no existing tool does research-question planning the way questio does), but the placement is wrong — they should be questio/ and reportio/ submodules under packages/, mirroring the io pattern. Right now they're projio's growing-into-product surface.
Recommendation: extract questio to packages/questio/ (its own package, own CLI, own mcp.py); decide if report is big enough to warrant the same or stays a thin pass-through to Quarto.
2. mcp/presentio.py and mcp/manuscripto.py are misplaced
They wrap notio.present.* and notio.manuscript.*. By the "thin wrapper" pattern, these MCP modules belong in notio itself (notio.mcp.manuscript, notio.mcp.present) and projio's MCP server should import and re-register them — same as it does for pipeio.mcp etc. The current arrangement makes notio look smaller than it is and bloats projio's MCP layer.
Recommendation: move the bodies of mcp/manuscripto.py and mcp/presentio.py into notio; leave projio with thin registration shims.
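A rough sketch of what a "thin registration shim" could look like, under stated assumptions: the registry is a plain dict rather than any real MCP server API, `register_tools` is a hypothetical helper, and a `SimpleNamespace` stands in for a module such as the proposed notio.mcp.manuscript.

```python
from types import SimpleNamespace
from typing import Callable, Dict


def register_tools(registry: Dict[str, Callable], module: object, prefix: str) -> list:
    """Copy every public callable from `module` into `registry` as `<prefix>_<name>`."""
    added = []
    for name in sorted(vars(module)):
        obj = getattr(module, name)
        if name.startswith("_") or not callable(obj):
            continue  # skip private helpers and plain data
        key = f"{prefix}_{name}"
        registry[key] = obj
        added.append(key)
    return added


# Stand-in for a hypothetical notio.mcp.manuscript module.
fake_manuscript_mcp = SimpleNamespace(
    build=lambda: "built",
    status=lambda: "clean",
    _helper=lambda: None,
)

registry: Dict[str, Callable] = {}
print(register_tools(registry, fake_manuscript_mcp, "manuscript"))
# prints ['manuscript_build', 'manuscript_status']
```

With a real module, `vars(module)` also contains imported names, so a production shim would more likely iterate an explicit `__all__` or a declared tool list. The point of the sketch is only that the tool bodies live in the package and projio holds a few lines of registration.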
3. init.py at 2091 LOC is overgrown
Three project kinds (generic, tool, study) shouldn't take 2k lines. This is feature accumulation: every io package that wanted "init me" hooks added scaffolding here.
Recommendation: each io package should expose a scaffold(root) hook (biblio already has biblio.scaffold, pipeio has pipeio.scaffold). projio init should be the orchestrator that selects which scaffolds to run — a few hundred lines, not 2k.
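As a sketch of the orchestrator shape described above: the hook bodies, the `HOOKS` table, and the mapping from project kind to packages are all assumptions made for illustration. Only the `scaffold(root)` signature and the three kinds (generic, tool, study) come from the text.

```python
from pathlib import Path
from typing import Callable, Dict, List

ScaffoldHook = Callable[[Path], List[Path]]


def scaffold_biblio(root: Path) -> List[Path]:
    """Stand-in for a package-owned hook such as biblio.scaffold."""
    target = root / ".projio" / "biblio"
    target.mkdir(parents=True, exist_ok=True)
    return [target]


def scaffold_pipeio(root: Path) -> List[Path]:
    """Stand-in for a package-owned hook such as pipeio.scaffold."""
    target = root / ".projio" / "pipeio"
    target.mkdir(parents=True, exist_ok=True)
    return [target]


# Hypothetical wiring: which packages each kind needs is an assumption.
HOOKS: Dict[str, ScaffoldHook] = {"biblio": scaffold_biblio, "pipeio": scaffold_pipeio}
KINDS: Dict[str, List[str]] = {
    "generic": [],
    "tool": ["pipeio"],
    "study": ["biblio", "pipeio"],
}


def init_project(root: Path, kind: str) -> List[Path]:
    """Select the scaffolds for a project kind and run them in order."""
    if kind not in KINDS:
        raise ValueError(f"unknown project kind: {kind!r}")
    created: List[Path] = []
    for name in KINDS[kind]:
        created.extend(HOOKS[name](root))
    return created
```

In this shape the orchestrator only knows which hooks to call; what each package writes under .projio/ stays inside the package.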
4. sync.py reaches across subsystem boundaries
1145 LOC of "auto-discover codio libs, sync lua filter, copy CSL, generate pandoc-defaults, rebuild indexio sources, install git hooks, detect project_utils, …" The pattern is consistent: projio is doing each subsystem's sync chores instead of calling each subsystem's sync().
Recommendation: per-package sync() hooks; projio's sync is the conductor. This is the same pattern as init.py.
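One possible conductor, sketched with hypothetical hooks: `run_sync`, `sync_filters`, and `sync_indexio` are illustrative names, and the failure-isolation behaviour is a design assumption rather than something the audit prescribes.

```python
from typing import Callable, Dict, Tuple

SyncHook = Callable[[], str]


def run_sync(hooks: Dict[str, SyncHook]) -> Dict[str, Tuple[bool, str]]:
    """Call each subsystem's sync hook; one failure does not stop the rest."""
    results: Dict[str, Tuple[bool, str]] = {}
    for name, hook in hooks.items():
        try:
            results[name] = (True, hook())
        except Exception as exc:  # record the failure and continue
            results[name] = (False, f"{type(exc).__name__}: {exc}")
    return results


def sync_filters() -> str:
    """Hypothetical hook that succeeds."""
    return "include.lua up to date"


def sync_indexio() -> str:
    """Hypothetical hook that fails."""
    raise FileNotFoundError("sources manifest missing")


report = run_sync({"filters": sync_filters, "indexio": sync_indexio})
for name, (ok, message) in report.items():
    print(f"{'ok  ' if ok else 'FAIL'} {name}: {message}")
```

The conductor holds no subsystem knowledge: it iterates, calls, and reports, which is the contrast with the current sync.py that performs each chore itself.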
5. The CLI surface is broader than DataLad's discipline allows
581 LOC of cli.py plus manuscript_cmd.py (163) plus subcommand sprawl. DataLad's CLI maps roughly 1:1 to user-mental-model operations. Projio's includes ops that users rarely invoke (projio render, projio site subcommands, git untrack).
Recommendation: audit CLI subcommand usage; demote rarely-used ones to MCP tools or hide behind projio internal.
6. .projio/ is accumulating per-package subdirs
Today: biblio/, codio/, figio/, filters/, indexio/, notio/, pipeio/, render/, site/, skills/, claude/. Eleven top-level subdirectories. Some are projio-owned (render/, filters/, skills/, claude/); most are per-package. Per the consolidation memory, this is intended: .<pkg>/ consolidated into .projio/<pkg>/.
Status: defensible, but a periodic sweep is worth scheduling. Each subdir should answer "what writes here, what reads here, would removing it break a real workflow."
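A periodic sweep could start from something as small as the following sketch. The manifest format (directory name mapped to owner) and the `sweep` helper are assumptions for illustration; the check only compares what exists on disk with what some owner has declared.

```python
from pathlib import Path
from typing import Dict, List, Set


def sweep(projio_dir: Path, manifest: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare entries on disk under .projio/ with a declared owner manifest."""
    on_disk: Set[str] = (
        {p.name for p in projio_dir.iterdir()} if projio_dir.is_dir() else set()
    )
    declared = set(manifest)
    return {
        "undeclared": sorted(on_disk - declared),  # present, but nobody claims it
        "missing": sorted(declared - on_disk),     # claimed, but not present
    }


# Hypothetical manifest: entry name -> what writes there.
MANIFEST = {"render": "render.py", "filters": "sync", "skills": "sync"}
print(sweep(Path(".projio"), MANIFEST))
```

An "undeclared" entry is the case the sweep is meant to catch: something writes into .projio/ and no owner has answered the three questions for it.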
7. No plugin discovery for io packages
To add a 7th io package today, you must edit src/projio/mcp/server.py and add a registration block. There's no entry-point system, no pyproject.toml discovery hook. That's friction for a self-described ecosystem.
Recommendation: define a projio.subsystem entry-point group; each io package registers its MCP module + scaffold + sync via entry points. Server boot iterates entry points.
8. site.py at 733 LOC suggests accumulated mkdocs.yml munging
Wrapping mkdocs build/serve is a few dozen lines. The bulk is likely nav patching, theme config, plugin selection. That's defensible as cross-cutting glue (mkdocs touches biblio output, codio output, manuscript output, etc.) but worth refactoring into smaller composable pieces if any single section dominates.
Convention audit (.projio/ entries)
| Path | Owned by | Read by | Verdict |
|---|---|---|---|
| config.yml | projio | all | core |
| projio.mk | sync | makefiles | load-bearing |
| packages.yml | sync | sync | check usage |
| servers.json | sync | mcp | check usage |
| render/ | render.py | pandoc, mkdocs | load-bearing |
| filters/include.lua | sync (copy) | pandoc | load-bearing |
| claude/ | sync | claude code | load-bearing |
| skills/ | sync | agent_instructions | load-bearing |
| biblio/ | biblio | biblio | per-pkg consolidation |
| codio/ | codio | codio | per-pkg consolidation |
| figio/ | figio | figio | per-pkg consolidation |
| indexio/ | indexio | indexio | per-pkg consolidation |
| notio/ | notio | notio | per-pkg consolidation |
| pipeio/ | pipeio | pipeio | per-pkg consolidation |
| site/ | site.py | mkdocs | confirm role |
External tool integration discipline
The candidate test cases:
- DataLad: clean; projio shells out via helpers/, never replaces git/git-annex.
- Pandoc: clean; projio generates pandoc-defaults.yaml and lua filter, then invokes pandoc directly.
- MkDocs: thin wrapper, but site.py's 733 LOC suggests creeping ownership of mkdocs.yml; re-audit.
- Snakemake: clean; pipeio shells out, with conda/pixi auto-detection. Good.
- Jupyter / Marimo: pluggable backend protocol in pipeio. Exemplary.
- Quarto (via report): unclear; report.py lives in projio MCP with no package home. Either commit to reportio or keep report as the thinnest possible Quarto invocation.
- Zotero / OpenAlex / CrossRef: clean; biblio owns these; projio doesn't see them.
- GitNexus (proposed): the right answer is peer MCP, no integration code. Add a doc note in agent_instructions; optionally one capability hook in pipeio_mod_audit for blast-radius-on-promotion if the value lands. No graphio. No codio enrichment. No .gitnexus/ integration in projio sync.
Action list (priority order)
- Extract questio. packages/questio/ with own pyproject, CLI, mcp.py. Move 1274 LOC out of projio.
- Move manuscripto/presentio into notio. notio.mcp.manuscript, notio.mcp.present. Projio MCP becomes shim re-registration.
- Decide on report. Either extract to reportio or shrink to a thin Quarto wrapper. Don't leave it as a 630-LOC orphan.
- Decompose sync.py into per-package sync() hooks. Projio sync becomes orchestrator.
- Decompose init.py into per-package scaffold() hooks. Same pattern.
- Audit cli.py + site.py for command/feature sprawl.
- Plugin discovery via entry points: set up projio.subsystem group; migrate one package as proof.
- GitNexus: add doc paragraph in agent_instructions describing it as a recommended peer MCP. No code.
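Open questions
- Should report graduate to a package, or is it a glue-only feature (Quarto invocation + frontmatter wiring)?
- Are presentio/manuscripto in projio because notio doesn't yet have an MCP layer, or for some historical reason worth preserving?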
presentio/manuscriptoin projio because notio doesn't yet have an MCP layer, or for some historical reason worth preserving? - Is the entry-point plugin model worth the refactor cost given that the io set is stable?
- What's the test coverage of projio glue itself (sync, render, init)? The io packages have their own tests; projio's glue layer is the biggest source of integration risk.
Bottom line
Projio's "glue, like DataLad" identity is defensible and largely lived up to — but four specific drifts threaten it: (1) questio + report grew inside projio without earning their own packages; (2) manuscripto + presentio sit in projio when they belong in notio; (3) init.py and sync.py are doing per-package work that should live in the packages; (4) there's no plugin model so the ecosystem isn't actually open. None are crises; all are addressable with refactors that shrink projio rather than grow it. That direction is the right one.