Shared Code Library with Codio + Indexio¶
This tutorial walks through setting up a shared, cloned mirror of external reference repositories that multiple projects can browse and query via RAG. By the end you will have a searchable code library indexed for semantic search.
Prerequisites¶
- Full ecosystem installed (pip install "projio[all]")
- MCP server connected (see Ecosystem Overview)
- A shared storage location accessible to your projects (e.g. /storage/share/codelib/)
Why¶
- Reference implementations live in dozens of external repos. Cloning them into a shared location means any project can look up algorithms, patterns, and APIs without leaving the editor.
- Indexio's RAG turns the clones into a searchable knowledge base — ask "how does neurodsp implement burst detection?" and get the actual source code.
- Shared storage avoids duplicating clones across projects.
Architecture¶
/storage/share/codelib/              # shared clone directory
  neurodsp-tools--neurodsp/          # shallow clones, named <owner>--<repo>
  fooof-tools--fooof/
  ...
project-a/.projio/
  config.yml                         # mirrors_dir: /storage/share/codelib
  codio/
    catalog.yml                      # library metadata (name, URL, license)
    profiles.yml                     # triage (tier, decision, capabilities)
    repos.yml                        # clone state (storage: managed, local_path)
  indexio/
    config.yaml                      # codelib source glob
    index/                           # project-local vector index
Each project has its own catalog, profiles, and index. The clones on disk are shared.
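The <owner>--<repo> naming shown in the layout above can be sketched as a small path helper. This is an illustration only: clone_dir_name and clone_path are hypothetical names, not codio functions, and taking the last two URL path segments is an assumption about how owner and repo are derived.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def clone_dir_name(url: str) -> str:
    """Return '<owner>--<repo>' for a hosting URL (hypothetical helper)."""
    path = urlparse(url).path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    parts = path.split("/")
    if len(parts) < 2:
        raise ValueError(f"expected an owner/repo URL, got {url!r}")
    # Assumption: owner and repo are the last two path segments.
    owner, repo = parts[-2], parts[-1]
    return f"{owner}--{repo}"


def clone_path(mirrors_dir: str, url: str) -> PurePosixPath:
    """Join the shared clone directory with the derived directory name."""
    return PurePosixPath(mirrors_dir) / clone_dir_name(url)
```

Because every project derives the same directory name from the same URL, two projects pointing at the same mirrors_dir end up sharing a single clone.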
Step 1: Configure the shared clone directory¶
In .projio/config.yml:
codio:
  mirrors_dir: /storage/share/codelib
Create the directory if it doesn't exist:
mkdir -p /storage/share/codelib
Step 2: Add repos to the catalog and clone¶
Via MCP:
codio_add_urls(
    urls=["https://github.com/org/repo", ...],
    clone=True,
    shallow=True
)
This does three things per URL:
- Adds (or updates) the catalog entry with metadata fetched from GitHub/GitLab
- Shallow-clones the repo to /storage/share/codelib/<owner>--<repo>
- Updates repos.yml with storage: managed and the local_path
Clone behavior
- shallow=True uses git clone --depth 1, which saves disk space for large repos.
- The clone uses the remote's default branch (no -b flag), so repos using master or main both work.
- GitLab URLs work too; hosting is auto-detected.
- If a repo URL has changed (e.g., nbara/meegkit renamed to nbara/python-meegkit), add the new URL and remove the stale catalog entry manually.
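The clone behavior described above can be approximated with a short command builder. This is a sketch under stated assumptions, not codio's implementation: shallow_clone_command is a hypothetical name, and skipping when the destination already exists mirrors the idempotent behavior this tutorial describes for clone=True.

```python
from pathlib import Path
from typing import List, Optional


def shallow_clone_command(url: str, dest: Path, shallow: bool = True) -> Optional[List[str]]:
    """Build the git command for a clone, or return None if dest already exists."""
    if dest.exists():
        # Treat an existing directory as an existing clone and skip it.
        return None
    cmd = ["git", "clone"]
    if shallow:
        # --depth 1 fetches only the latest commit.
        cmd += ["--depth", "1"]
    # No -b flag: git checks out the remote's default branch.
    cmd += [url, str(dest)]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) once it is known to be non-None.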
Step 3: Triage the catalog¶
Edit profiles.yml to set priority, runtime_import, and decision_default per library. The catalog role field controls agent write access:
| Role | Priority | Clone? | Use case |
|---|---|---|---|
| core | tier1 | Yes | Project's own compute library; agents can add code |
| shared | tier1-2 | Yes | Lab/org library; used as-is, agents should not modify |
| external | tier2-3 | Optional | PyPI/conda package; never modified locally |
Set the role in catalog.yml:
libraries:
  cogpy:
    kind: internal
    role: core       # actively developed, promote target
    path: code/lib/cogpy
  labbox:
    kind: internal
    role: shared     # lab library, used as-is
    path: code/lib/labbox
  mne:
    kind: external_mirror
    role: external   # PyPI package, reference only
Auto-discovery
projio sync scans code/lib/*/ and auto-registers libraries with role: core, kind: internal. No manual catalog editing needed for project-local libraries.
Key profile fields:
- runtime_import: pip_only (installed dep), reference_only (code reference, not imported)
- decision_default: wrap (thin adapter), new (rewrite for your data model), direct (use as-is)
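To make the role semantics concrete, the sketch below filters a catalog for the libraries agents may write to. It assumes the catalog has already been parsed into a plain dict with the same shape as the catalog.yml example above; writable_libraries is a hypothetical helper, not part of codio.

```python
from typing import Any, Dict, List

# In-memory stand-in for a parsed catalog.yml (same shape as the example above).
catalog: Dict[str, Any] = {
    "libraries": {
        "cogpy": {"kind": "internal", "role": "core", "path": "code/lib/cogpy"},
        "labbox": {"kind": "internal", "role": "shared", "path": "code/lib/labbox"},
        "mne": {"kind": "external_mirror", "role": "external"},
    }
}


def writable_libraries(parsed_catalog: Dict[str, Any]) -> List[str]:
    """Return the names of libraries whose role is 'core' (agents can add code)."""
    libraries = parsed_catalog.get("libraries", {})
    return sorted(name for name, entry in libraries.items() if entry.get("role") == "core")
```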
Step 4: Add the codelib source to indexio¶
In .projio/indexio/config.yaml, add:
sources:
  - id: "codelib"
    corpus: "codelib"
    glob: "/storage/share/codelib/**/*.py"
Absolute globs are supported — indexio resolves them from the filesystem root.
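To preview which files a pattern like this would pick up before building the index, the standard library's glob module can expand it. This is only an approximation: expand_source_glob is a hypothetical helper, and indexio's own matching rules may differ from glob.glob with recursive=True.

```python
import glob
from typing import List


def expand_source_glob(pattern: str) -> List[str]:
    """Expand a glob pattern; '**' matches any depth when recursive=True."""
    return sorted(glob.glob(pattern, recursive=True))
```

Calling expand_source_glob("/storage/share/codelib/**/*.py") lists every Python file under the shared clone directory, which is a quick way to check that the pattern is neither empty nor unexpectedly broad.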
Step 5: Build the index¶
Via MCP:
indexio_build(sources=["codelib"])
This is a partial rebuild: it indexes only the codelib source and leaves other sources untouched.
Step 6: Query¶
rag_query(query="multitaper spectral estimation", corpus="codelib", k=5)
Returns ranked code chunks from the indexed repos.
Adding the library to another project¶
For a second project that wants to use the same shared library:
Point mirrors_dir at the shared location¶
In the new project's .projio/config.yml:
codio:
  mirrors_dir: /storage/share/codelib
Copy or create catalog + profiles¶
Three options, from least to most shared:
Option A — Independent catalog. Create the project's own catalog.yml and profiles.yml. Run codio_add_urls with the same URLs (use clone=True — it's idempotent, skips cloning if the directory already exists).
Option B — Shared catalog via absolute path. Point at a central catalog:
codio:
  mirrors_dir: /storage/share/codelib
  catalog_path: /storage/share/codelib/catalog.yml
  profiles_path: /storage/share/codelib/profiles.yml
Option C — Symlink. Symlink .projio/codio/ to a shared directory. Simple but couples the projects tightly.
Recommendation
Option A for most cases. Each project should own its triage decisions (tier/priority may differ by project). The clones themselves are shared; the metadata is cheap to duplicate.
Add the indexio source and build¶
Same as the first project — add the codelib source glob to .projio/indexio/config.yaml and run indexio_build(sources=["codelib"]). Each project maintains its own vector index but reads from the same code on disk.
Maintenance¶
Adding repos: codio_add_urls(urls=[...], clone=True, shallow=True) from any project. The clone lands in the shared directory; other projects pick it up on their next index rebuild.
Updating clones: Currently manual — git -C /storage/share/codelib/<owner>--<repo> pull. A codio_sync tool would be a natural addition.
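Until such a tool exists, the manual pull can be scripted over every clone in the shared directory. The sketch below only builds the commands and does not run them. pull_commands is a hypothetical helper, and using the presence of a .git entry to detect a clone is an assumption.

```python
from pathlib import Path
from typing import List


def pull_commands(mirrors_dir: Path) -> List[List[str]]:
    """Build one 'git -C <clone> pull' command per clone under mirrors_dir."""
    commands = []
    for clone in sorted(mirrors_dir.iterdir()):
        # Skip anything that is not a git working tree (e.g. shared YAML files).
        if (clone / ".git").exists():
            commands.append(["git", "-C", str(clone), "pull"])
    return commands
```

Each command can then be executed with subprocess.run(cmd, check=True), followed by an index rebuild in the projects that use the library.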
Removing repos: Delete the clone directory, remove the entry from catalog.yml, profiles.yml, and repos.yml, then rebuild the index.
Re-indexing after changes: indexio_build(sources=["codelib"]).
Gotchas¶
- Absolute paths in config: mirrors_dir, catalog_path, etc. must be handled with the _resolve() helper in codio/config.py; absolute paths should not be prepended with project_root.
- Repo renames: GitHub doesn't always redirect clone URLs for renamed repos. Verify URLs before adding. The error surfaces as could not read Username for 'https://github.com'.
- GitLab repos: Work fine; pass the GitLab URL directly. Hosting is auto-detected, but GitHub metadata fetching is skipped for non-GitHub URLs.
- Branch detection: Clones use the remote's default branch. The actual branch is detected post-clone via git symbolic-ref and recorded in repos.yml.
- indexio absolute globs: Supported; paths outside the project root are stored as absolute in chunk metadata.
- MCP server caching: After reinstalling codio, restart the MCP server (VS Code: reload window or /mcp).
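The absolute-path rule from the first gotcha can be illustrated as follows. This is not the actual _resolve() helper from codio/config.py, whose code is not shown in this tutorial; resolve_config_path is a hypothetical stand-in that demonstrates the intended behavior.

```python
from pathlib import Path


def resolve_config_path(value: str, project_root: Path) -> Path:
    """Return absolute paths unchanged; anchor relative paths at project_root."""
    path = Path(value)
    if path.is_absolute():
        # Shared locations such as mirrors_dir must not be prefixed.
        return path
    return project_root / path
```

pathlib's / operator already discards the left-hand side when the right-hand side is absolute, so the explicit check mainly documents the intent. The bug this guards against typically appears when paths are joined by string concatenation.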
Code tiers: core library + project utils¶
Beyond external mirrors, codio integrates with pipeio's code tier model. Projects typically have three code tiers:
| Tier | Location | codio role | Purpose |
|---|---|---|---|
| Core library | code/lib/{name}/ | core | Dataset-agnostic, reusable functions |
| Project utils | code/utils/ | — | Pipeline-aware glue (PipelineContext, bootstrap) |
| Flow scripts | code/pipelines/{flow}/scripts/ | — | Snakemake wiring |
Setting up project_utils¶
In .projio/config.yml:
code:
  project_utils: code/utils
Or let projio sync auto-detect it — if code/utils/ exists, it sets code.project_utils automatically.
How scaffolding uses tiers¶
When pipeio creates notebooks (nb_create) or scripts (script_create, mod_create), it:
- Queries codio for libraries with role=core → adds import cogpy to the template
- Reads code.project_utils from config → adds from utils.io import PipelineContext
This means new notebooks and scripts start with the correct imports for the project's code hierarchy.
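As a rough illustration of the scaffolding logic described above, the sketch below assembles the import header from a list of core libraries and the project_utils setting. template_imports is a hypothetical function, not pipeio's API, and deriving the package name from the last path component of code.project_utils is an assumption.

```python
from typing import Iterable, List, Optional


def template_imports(core_libraries: Iterable[str], project_utils: Optional[str]) -> List[str]:
    """Build the import lines a new notebook or script would start with."""
    lines = [f"import {name}" for name in sorted(core_libraries)]
    if project_utils:
        # Assumption: 'code/utils' maps to the importable package 'utils'.
        package = project_utils.rstrip("/").split("/")[-1]
        lines.append(f"from {package}.io import PipelineContext")
    return lines
```

With the configuration from this tutorial, template_imports(["cogpy"], "code/utils") yields the two import lines mentioned above.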
See docs/specs/pipeio/code-tiers.md for the full specification.
What's next¶
- Agent Orchestration — multi-tool session combining all ecosystem components
- Grand Routine — the full idea-to-deployment research workflow