
Projio ecosystem audit — glue identity, drift, and the integration boundary

Tags: architecture, projio, ecosystem, glue, refactor
Series: ecosystem-audit

Frame

Projio is glue, in the DataLad sense — an integration layer over a constellation of independently useful tools (pandoc, snakemake, mkdocs, jupyter/marimo, pixi, datalad, openalex/zotero, optionally tree-sitter via gitnexus). The discipline that makes that identity respectable, taken from DataLad:

  1. Don't replace what you're gluing — bare snakemake, pandoc, mkdocs, jupyter must still work.
  2. Conventions should be minimal and load-bearing — every entry under .projio/ must justify its existence.
  3. Original features should exist only where existing tools genuinely don't reach: notio and biblio fill honest gaps; anything else needs scrutiny.

Six io submodules under packages/ are the tools projio glues: biblio, codio, figio, indexio, notio, pipeio. They are independently installable (each has its own pyproject, CLI, tests, mcp.py).

Inventory (sizes are approximate)

Projio core (~17.7k LOC in src/projio/)

| File | LOC | Assessment |
|---|---|---|
| init.py | 2091 | Scaffolding has overgrown: feature accumulation across three project kinds |
| mcp/server.py | 1926 | Tool registration + cross-cutting MCP tools; mostly defensible |
| mcp/pipeio.py | 1833 | Wrapper over pipeio.mcp.*; large, but pipeio has 60+ tools |
| mcp/presentio.py | 1453 | Wrapper over notio.present.*; questionable that it sits in projio rather than notio |
| mcp/questio.py | 1274 | No backing package; pure projio, 29 tools |
| mcp/manuscripto.py | 1212 | Wrapper over notio.manuscript.*; same question as presentio |
| sync.py | 1145 | Cross-cutting wiring (lua, CSL, pandoc-defaults, indexio sources, hooks); heavy |
| mcp/biblio.py | 1117 | Wrapper over biblio.mcp.*, biblio.config, biblio.bibtex; thicker than ideal |
| site.py | 733 | mkdocs wrapping; likely accumulated nav patching |
| mcp/context.py | 673 | Cross-cutting (project_context, runtime_conventions, agent_instructions, skill_read) |
| mcp/report.py | 630 | No backing package; pure projio |
| cli.py | 581 | Already broad surface |
| mcp/rag.py | 536 | Wrapper over indexio |
| mcp/codio.py | 413 | Wrapper over codio.mcp.*; healthy ratio |
| mcp/notio.py | 386 | Wrapper over notio core notes |
| config.py | 259 | Two-tier config; focused |
| mcp/datalad.py | 233 | Wraps datalad CLI |
| manuscript_cmd.py | 163 | CLI surface for manuscript ops |
| mcp/site.py | 136 | MCP for site ops |
| render.py | 130 | render.yml → pandoc-defaults; exemplar of focused glue |
| mcp/figio.py | 128 | Wrapper over figio; thinnest of the wrappers, healthy |

Subpackage map (the surprise)

The surface is wider than "six packages":

  • 6 standalone packages: biblio, codio, figio, indexio, notio, pipeio
  • notio subpackages: notio.manuscript, notio.present, plus core notes — wrapped by mcp/manuscripto.py and mcp/presentio.py
  • Projio-native domains (no backing package): questio (research planning), report (Quarto-based reports)

So the agent-visible "ecosystem" actually has 10 functional domains, four of which (manuscript, present, questio, report) live partly or wholly inside projio.

Where the glue framing holds up

  • io packages are genuinely separable. Each has CLI, tests, pyproject, own mcp.py. Standalone install works.
  • Most projio MCP wrappers are thin re-exports of <pkg>.mcp.*; mcp/figio.py is the model (128 lines, 8 functions over an external package).
  • Graceful degradation: io imports are lazy/late inside function bodies. Missing optional deps don't break the server.
  • render.py (130 LOC) is the gold standard of cross-cutting glue: one config (render.yml), one output (pandoc-defaults.yaml), no domain logic.
  • Conventions in .projio/ are mostly load-bearing: compiled.bib, merged.bib, modkey.bib, pandoc-defaults.yaml, csl/, filters/include.lua, skills/.
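
The graceful-degradation point in the list above can be illustrated with a minimal sketch. It assumes nothing about projio's real code: `lazy_tool` and the module name `figio_not_installed` are hypothetical, and the standard-library `json` module stands in for an installed io package. The idea is only that the import happens inside the function body, so a missing optional dependency surfaces as a structured error at call time rather than as a crash at server start-up.

```python
import importlib
from typing import Any, Callable


def lazy_tool(module_name: str, attr: str) -> Callable[..., Any]:
    """Build a tool whose backing import happens only when it is called."""

    def tool(*args: Any, **kwargs: Any) -> Any:
        try:
            # Import inside the function body so that a missing optional
            # dependency cannot break server start-up.
            module = importlib.import_module(module_name)
        except ImportError as exc:
            # Degrade to a structured error instead of raising.
            return {"ok": False, "error": f"{module_name} is not installed: {exc}"}
        return {"ok": True, "result": getattr(module, attr)(*args, **kwargs)}

    return tool


# Stand-ins for illustration only: `json.dumps` plays the role of an
# installed package, `figio_not_installed` that of a missing optional one.
dumps_tool = lazy_tool("json", "dumps")
missing_tool = lazy_tool("figio_not_installed", "render")
```

Calling `missing_tool()` returns a dictionary with `ok` set to `False`, while `dumps_tool` behaves like the function it wraps.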

Where the glue framing leaks (the smells)

1. questio and report are unsponsored

Both are substantial (1274 + 630 LOC, 46 tools combined), both are domain features with their own document layouts (plan/, docs/plan/), and neither has a backing package. They sit in projio's MCP layer.

Rule violation: original features only where existing tools don't reach. The "doesn't reach" test passes (no existing tool does research-question planning the way questio does), but the placement is wrong — they should be questio/ and reportio/ submodules under packages/, mirroring the io pattern. Right now they're projio's growing-into-product surface.

Recommendation: extract questio to packages/questio/ (its own package, own CLI, own mcp.py); decide if report is big enough to warrant the same or stays a thin pass-through to Quarto.

2. mcp/presentio.py and mcp/manuscripto.py are misplaced

They wrap notio.present.* and notio.manuscript.*. By the "thin wrapper" pattern, these MCP modules belong in notio itself (notio.mcp.manuscript, notio.mcp.present) and projio's MCP server should import and re-register them — same as it does for pipeio.mcp etc. The current arrangement makes notio look smaller than it is and bloats projio's MCP layer.

Recommendation: move the bodies of mcp/manuscripto.py and mcp/presentio.py into notio; leave projio with thin registration shims.
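
What a "thin registration shim" could look like is sketched below, under stated assumptions: the `TOOLS` mapping convention, the `register_from` helper, and the tool names are all hypothetical, and a `SimpleNamespace` stands in for a future `notio.mcp.manuscript` module. The point is that projio re-registers the package's functions as they are, without wrapping or reimplementing them.

```python
from types import SimpleNamespace
from typing import Callable, Dict


# Stand-in for a hypothetical `notio.mcp.manuscript` module: the tool
# bodies live in the package, not in projio.
def manuscript_status(root: str) -> dict:
    return {"root": root, "sections": 0}


def manuscript_build(root: str) -> dict:
    return {"root": root, "built": True}


fake_notio_mcp_manuscript = SimpleNamespace(
    TOOLS={"status": manuscript_status, "build": manuscript_build}
)


def register_from(module, prefix: str, registry: Dict[str, Callable]) -> None:
    """Projio-side shim: re-register a package's tools without wrapping them."""
    for name, func in module.TOOLS.items():
        registry[f"{prefix}_{name}"] = func


registry: Dict[str, Callable] = {}
register_from(fake_notio_mcp_manuscript, "manuscript", registry)
```

With this shape, the projio side shrinks to one `register_from` call per module, and the LOC counted against projio moves into the package that owns the domain.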

3. init.py at 2091 LOC is overgrown

Three project kinds (generic, tool, study) shouldn't take 2k lines. This is feature accumulation: every io package that wanted "init me" hooks added scaffolding here.

Recommendation: each io package should expose a scaffold(root) hook (biblio already has biblio.scaffold, pipeio has pipeio.scaffold). projio init should be the orchestrator that selects which scaffolds to run — a few hundred lines, not 2k.
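
A rough sketch of the orchestrator shape follows. The hook signature (a root path in, a list of created paths out), the stand-in hooks, and in particular the mapping from project kind to packages are assumptions made for illustration; the audit does not say which packages each kind needs.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

ScaffoldHook = Callable[[Path], List[Path]]


def biblio_scaffold(root: Path) -> List[Path]:
    """Stand-in for a package-owned hook: creates only its own directory."""
    target = root / ".projio" / "biblio"
    target.mkdir(parents=True, exist_ok=True)
    return [target]


def pipeio_scaffold(root: Path) -> List[Path]:
    """Second stand-in hook, owned by a different package."""
    target = root / ".projio" / "pipeio"
    target.mkdir(parents=True, exist_ok=True)
    return [target]


HOOKS: Dict[str, ScaffoldHook] = {"biblio": biblio_scaffold, "pipeio": pipeio_scaffold}

# Hypothetical mapping from project kind to the hooks it runs.
KINDS: Dict[str, List[str]] = {
    "generic": [],
    "tool": ["pipeio"],
    "study": ["biblio", "pipeio"],
}


def init_project(root: Path, kind: str) -> List[Path]:
    """Orchestrate: select the hooks for this kind and run them in order."""
    created: List[Path] = []
    for name in KINDS[kind]:
        created.extend(HOOKS[name](root))
    return created


with tempfile.TemporaryDirectory() as tmp:
    created_names = [p.name for p in init_project(Path(tmp), "study")]
```

In this arrangement the orchestrator holds only the selection table; everything that knows about a package's layout stays in that package's hook.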

4. sync.py reaches across subsystem boundaries

1145 LOC of "auto-discover codio libs, sync lua filter, copy CSL, generate pandoc-defaults, rebuild indexio sources, install git hooks, detect project_utils, …" The pattern is consistent: projio is doing each subsystem's sync chores instead of calling each subsystem's sync().

Recommendation: per-package sync() hooks; projio's sync is the conductor. This is the same pattern as init.py.
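
One way the conductor could be written is sketched here. The hook signature and the stand-in hooks are hypothetical, and isolating failures per package is a design assumption of this sketch rather than something the audit prescribes.

```python
from typing import Callable, Dict

SyncHook = Callable[[str], str]


def sync_all(root: str, hooks: Dict[str, SyncHook]) -> Dict[str, str]:
    """Run each package's sync hook; one failure does not stop the others."""
    report: Dict[str, str] = {}
    for name, hook in hooks.items():
        try:
            report[name] = hook(root)
        except Exception as exc:
            # Record the failure and carry on with the remaining packages.
            report[name] = f"failed: {exc}"
    return report


def codio_sync(root: str) -> str:
    """Stand-in for a package-owned sync chore that succeeds."""
    return "libraries discovered"


def indexio_sync(root: str) -> str:
    """Stand-in for a package-owned sync chore that fails."""
    raise RuntimeError("index store is locked")


report = sync_all("/path/to/project", {"codio": codio_sync, "indexio": indexio_sync})
```

The returned report gives projio a single place to summarise what each subsystem did, without projio knowing how any of the chores work.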

5. The CLI surface is broader than DataLad's discipline allows

581 LOC of cli.py plus manuscript_cmd.py (163) plus subcommand sprawl. DataLad's CLI maps roughly 1:1 to user-mental-model operations. Projio's includes ops that users rarely invoke (projio render, projio site subcommands, git untrack).

Recommendation: audit CLI subcommand usage; demote rarely-used ones to MCP tools or hide behind projio internal.

6. .projio/ is accumulating per-package subdirs

Today: biblio/, codio/, figio/, filters/, indexio/, notio/, pipeio/, render/, site/, skills/, claude/. Eleven top-level subdirectories. Some are projio-owned (render/, filters/, skills/, claude/); most are per-package. Per the consolidation memory, this is intended: .<pkg>/ consolidated into .projio/<pkg>/.

Status: defensible, but a periodic sweep is worth scheduling. Each subdir should answer "what writes here, what reads here, would removing it break a real workflow."
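
The sweep could be partly mechanised. The sketch below assumes a hand-maintained registry of writer and reader per entry (the `REGISTRY` contents and the `sweep` helper are hypothetical) and compares it with what is actually on disk. It can flag entries nobody claims and claimed entries that no longer exist; it cannot answer whether removing one would break a real workflow.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

# Hypothetical registry: entry name -> (written by, read by).
REGISTRY: Dict[str, Tuple[str, str]] = {
    "render": ("render.py", "pandoc, mkdocs"),
    "skills": ("sync", "agent_instructions"),
    "biblio": ("biblio", "biblio"),
}


def sweep(
    projio_dir: Path, registry: Dict[str, Tuple[str, str]]
) -> Tuple[List[str], List[str]]:
    """Return (entries on disk with no owner, registered entries not on disk)."""
    on_disk = {p.name for p in projio_dir.iterdir()}
    registered = set(registry)
    return sorted(on_disk - registered), sorted(registered - on_disk)


# Demonstration against a throwaway directory, not a real .projio/.
with tempfile.TemporaryDirectory() as tmp:
    projio_dir = Path(tmp)
    for name in ("render", "biblio", "scratch"):
        (projio_dir / name).mkdir()
    unowned, missing = sweep(projio_dir, REGISTRY)
```

In the demonstration, `scratch` is reported as unowned and `skills` as registered but missing.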

7. No plugin discovery for io packages

To add a 7th io package today, you must edit src/projio/mcp/server.py and add a registration block. There's no entry-point system, no pyproject.toml discovery hook. That's friction for a self-described ecosystem.

Recommendation: define a projio.subsystem entry-point group; each io package registers its MCP module + scaffold + sync via entry points. Server boot iterates entry points.
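
A minimal sketch of the discovery side, using the standard library's `importlib.metadata.entry_points` (the `group=` keyword requires Python 3.10 or later). The group name comes from the recommendation above; the `discover_subsystems` helper, the plugin path in the comment, and the shape of the loaded object are assumptions for illustration.

```python
from importlib.metadata import entry_points
from typing import Any, Callable, Dict, Iterable

# A package would declare itself in its own pyproject.toml, for example
# (hypothetical target path):
#
#   [project.entry-points."projio.subsystem"]
#   figio = "figio.projio_plugin:subsystem"


def discover_subsystems(
    group: str = "projio.subsystem",
    find: Callable[..., Iterable[Any]] = entry_points,
) -> Dict[str, Any]:
    """Load every entry point in `group`, keyed by entry-point name.

    `find` is injectable so the function can be exercised without
    installing any plugin packages.
    """
    return {ep.name: ep.load() for ep in find(group=group)}


class FakeEntryPoint:
    """Test double with the two members used above: `name` and `load()`."""

    name = "figio"

    def load(self) -> str:
        return "figio-subsystem"


found = discover_subsystems(find=lambda group: [FakeEntryPoint()])
```

Server boot would then iterate over the returned mapping instead of a hand-edited registration block, so adding a package means installing it rather than editing server.py.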

8. site.py at 733 LOC suggests accumulated mkdocs.yml munging

Wrapping mkdocs build/serve is a few dozen lines. The bulk is likely nav patching, theme config, plugin selection. That's defensible as cross-cutting glue (mkdocs touches biblio output, codio output, manuscript output, etc.) but worth refactoring into smaller composable pieces if any single section dominates.

Convention audit (.projio/ entries)

| Path | Owned by | Read by | Verdict |
|---|---|---|---|
| config.yml | projio | all | core |
| projio.mk | sync | makefiles | load-bearing |
| packages.yml | sync | sync | check usage |
| servers.json | sync | mcp | check usage |
| render/ | render.py | pandoc, mkdocs | load-bearing |
| filters/include.lua | sync (copy) | pandoc | load-bearing |
| claude/ | sync | claude code | load-bearing |
| skills/ | sync | agent_instructions | load-bearing |
| biblio/ | biblio | biblio | per-pkg consolidation |
| codio/ | codio | codio | per-pkg consolidation |
| figio/ | figio | figio | per-pkg consolidation |
| indexio/ | indexio | indexio | per-pkg consolidation |
| notio/ | notio | notio | per-pkg consolidation |
| pipeio/ | pipeio | pipeio | per-pkg consolidation |
| site/ | site.py | mkdocs | confirm role |

External tool integration discipline

The candidate test cases:

  • DataLad: clean — projio shells out via helpers/, never replaces git/git-annex.
  • Pandoc: clean — projio generates pandoc-defaults.yaml and lua filter, then invokes pandoc directly.
  • MkDocs: thin wrapper, but site.py's 733 LOC suggests creeping ownership of mkdocs.yml — re-audit.
  • Snakemake: clean — pipeio shells out, with conda/pixi auto-detection. Good.
  • Jupyter / Marimo: pluggable backend protocol in pipeio. Exemplary.
  • Quarto (via report): unclear — report.py lives in projio MCP with no package home. Either commit to reportio or keep report as the thinnest possible Quarto invocation.
  • Zotero / OpenAlex / CrossRef: clean — biblio owns these; projio doesn't see them.
  • GitNexus (proposed): the right answer is peer MCP, no integration code. Add a doc note in agent_instructions; optionally one capability hook in pipeio_mod_audit for blast-radius-on-promotion if the value lands. No graphio. No codio enrichment. No .gitnexus/ integration in projio sync.

Action list (priority order)

  1. Extract questio. packages/questio/ with own pyproject, CLI, mcp.py. Move 1274 LOC out of projio.
  2. Move manuscripto/presentio into notio. notio.mcp.manuscript, notio.mcp.present. Projio MCP becomes shim re-registration.
  3. Decide on report. Either extract to reportio or shrink to a thin Quarto wrapper. Don't leave it as a 630-LOC orphan.
  4. Decompose sync.py into per-package sync() hooks. Projio sync becomes orchestrator.
  5. Decompose init.py into per-package scaffold() hooks. Same pattern.
  6. Audit cli.py + site.py for command/feature sprawl.
  7. Plugin discovery via entry points — set up projio.subsystem group; migrate one package as proof.
  8. GitNexus: add doc paragraph in agent_instructions describing it as a recommended peer MCP. No code.

Open questions

  • Should report graduate to a package, or is it a glue-only feature (Quarto invocation + frontmatter wiring)?
  • Are presentio/manuscripto in projio because notio doesn't yet have an MCP layer, or for some historical reason worth preserving?
  • Is the entry-point plugin model worth the refactor cost given that the io set is stable?
  • What's the test coverage of projio glue itself (sync, render, init)? The io packages have their own tests; projio's glue layer is the biggest source of integration risk.

Bottom line

Projio's "glue, like DataLad" identity is defensible and largely lived up to — but four specific drifts threaten it: (1) questio + report grew inside projio without earning their own packages; (2) manuscripto + presentio sit in projio when they belong in notio; (3) init.py and sync.py are doing per-package work that should live in the packages; (4) there's no plugin model so the ecosystem isn't actually open. None are crises; all are addressable with refactors that shrink projio rather than grow it. That direction is the right one.