Library Workspace Layout
This document defines the on-disk layout that codio owns, both as implemented today and as proposed for managed library workspaces. It is a reference specification, not a tutorial.
Throughout, implemented means the behavior exists in src/codio/ today.
Proposed means the layout addresses a concrete gap but has no code yet.
1. Current Layout (Implemented)
Running codio init in a project root creates the following:
    <project_root>/
      .codio/
        catalog.yml      # library identity metadata (LibraryCatalogEntry)
        profiles.yml     # project-specific policy (ProjectProfileEntry)
      docs/
        reference/
          codelib/
            libraries/   # curated note markdown files
What each artifact is
| Path | Owner | Purpose |
|---|---|---|
| .codio/catalog.yml | codio | YAML dict under libraries: key. Each entry is a LibraryCatalogEntry keyed by library slug. |
| .codio/profiles.yml | codio | YAML dict under profiles: key. Each entry is a ProjectProfileEntry keyed by library slug. |
| docs/reference/codelib/libraries/ | codio | Directory for curated note files. Profile entries reference these via curated_note field. |
Configuration
All three paths are configurable via .projio/config.yml under the codio
section (see src/codio/config.py):
    # .projio/config.yml
    codio:
      catalog_path: .codio/catalog.yml               # default
      profiles_path: .codio/profiles.yml             # default
      notes_dir: docs/reference/codelib/libraries/   # default
Paths are resolved relative to project_root (the directory containing
.projio/ or the current working directory).
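The resolution rule can be sketched in Python. This is a minimal illustration and not the code in src/codio/config.py: the function name resolve_codio_paths and the plain-dict input are assumptions, and the defaults are copied from the listing above.

```python
from pathlib import Path

# Defaults as listed in this document; the dict itself is illustrative.
DEFAULTS = {
    "catalog_path": ".codio/catalog.yml",
    "profiles_path": ".codio/profiles.yml",
    "notes_dir": "docs/reference/codelib/libraries/",
}


def resolve_codio_paths(project_root, codio_section=None):
    """Join each configured (or default) path onto project_root.

    codio_section is the already-parsed codio: mapping from
    .projio/config.yml (hypothetical input; YAML parsing is out of scope).
    """
    root = Path(project_root)
    section = codio_section or {}
    return {key: root / section.get(key, default) for key, default in DEFAULTS.items()}


# Override one path and let the other two fall back to their defaults.
paths = resolve_codio_paths("/work/proj", {"notes_dir": "notes/libs"})
```

Because pathlib keeps an absolute right-hand operand unchanged when joining, an absolute override in the config would bypass project_root in this sketch. Whether codio permits that is not stated here.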
What codio does not own today
Codio reads, and in one case writes to, the following without owning them:
- .projio/config.yml: owned by projio.
- infra/indexio/config.yaml: codio writes its own source IDs (codio-notes, codio-catalog), but the file is shared with other tools.
- Source code directories referenced by catalog path fields: codio records the path but does not create, clone, or modify the code.
The three configurable paths (catalog, profiles, and notes directory) are the complete set of codio-owned filesystem artifacts as of the current implementation.
2. Candidate Managed-Workspace Layout (Proposed)
When codio manages a local clone of an external repository (storage mode
managed per the proposed Repository entity), it needs a deterministic
location for that clone. The proposed layout places managed repositories under
.codio/mirrors/.
Proposed directory structure
    <project_root>/
      .codio/
        catalog.yml
        profiles.yml
        repos.yml        # proposed: Repository entity storage
        mirrors/         # proposed: managed clones root
          <repo_id>/     # one directory per managed repository
            .git/
            <repo contents>
      docs/
        reference/
          codelib/
            libraries/
Rationale for .codio/mirrors/
- Single ownership root. Apart from the curated notes directory, all codio-owned artifacts live under .codio/. This avoids ambiguity about which tool owns a top-level directory like code/lib/repos/ or libs/.
- Gitignore-friendly. Projects can add .codio/mirrors/ to .gitignore since managed clones are reproducible from their remote URLs.
- No conflict with project source. Managed clones are reference material, not project source code. Keeping them under .codio/ makes this distinction clear in the filesystem hierarchy.
- DataLad compatibility. If the project uses DataLad for large file management, .codio/mirrors/ can be registered as a DataLad subdataset or excluded entirely. Placing clones under a dot-directory avoids interference with DataLad's top-level tracking.
Alternative considered: project-level code/lib/repos/
A top-level directory like code/lib/repos/<repo_id>/ was considered. This
has the advantage of visibility (not hidden) and aligns with projects that
already use a code/ tree for source organization. However, it introduces
ambiguity about ownership: is code/lib/ owned by codio, by the project, or
by another tool? The .codio/mirrors/ approach avoids this question entirely.
Assumption: This layout assumes codio operates within a single project root. Multi-project workspaces (e.g., a monorepo containing multiple independent codio registries) would need a separate convention. This document does not address that case.
3. Naming and Identity Rules
repo_id as filesystem path component
The repo_id slug from the proposed Repository entity doubles as the
directory name under .codio/mirrors/. This means repo_id values must be
valid filesystem path components.
Slug conventions
| Convention | Example | Filesystem path |
|---|---|---|
| Flat slug | scipy | .codio/mirrors/scipy/ |
| Namespaced slug | scipy--scipy | .codio/mirrors/scipy--scipy/ |
| Owner-repo with separator | scikit-learn--scikit-learn | .codio/mirrors/scikit-learn--scikit-learn/ |
Rules:
- Characters: lowercase ASCII letters, digits, hyphens. The double-hyphen -- serves as the namespace separator (replacing /, which is not valid in directory names).
- No nested directories. The mirrors directory is flat. repo_id scipy--scipy maps to .codio/mirrors/scipy--scipy/, not .codio/mirrors/scipy/scipy/.
- Uniqueness: repo_id must be unique within a single registry. Two catalog entries may reference the same repo_id (monorepo case), but two Repository entries may not share a repo_id.
- Derivation from URL: For repositories cloned from a remote, the suggested default derivation is <owner>--<repo> extracted from the clone URL (e.g., https://github.com/scipy/scipy becomes scipy--scipy). For local-only repositories, the slug is user-chosen.
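The derivation and character rules above could be combined as follows. This is a sketch under stated assumptions, not an existing codio function: the name derive_repo_id is hypothetical, only HTTP(S) URLs are handled, stripping a trailing .git and lowercasing are assumptions, and names containing other characters (such as underscores or dots) are rejected because this document does not say how to normalize them.

```python
import re
from urllib.parse import urlparse

# Lowercase letters, digits, and hyphens; must start with a letter or digit.
SLUG_RE = re.compile(r"[a-z0-9][a-z0-9-]*")


def derive_repo_id(clone_url):
    """Derive '<owner>--<repo>' from an HTTP(S) clone URL (illustrative only)."""
    parts = [p for p in urlparse(clone_url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"cannot find owner and repo in {clone_url!r}")
    owner, repo = parts[-2], parts[-1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]  # assumption: the suffix is not part of the slug
    slug = f"{owner}--{repo}".lower()  # assumption: mixed case is folded to lowercase
    if not SLUG_RE.fullmatch(slug):
        raise ValueError(f"{slug!r} contains characters outside [a-z0-9-]")
    return slug


print(derive_repo_id("https://github.com/scipy/scipy"))  # scipy--scipy
```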
Mapping from repo_id to local path
For managed repositories, the local path is deterministic:
local_path = <project_root> / .codio / mirrors / <repo_id>
This path is recorded in the Repository entity's local_path field but can
always be reconstructed from repo_id and project_root. The recorded path
serves as a cache; the canonical source of truth is the repo_id plus the
mirrors root convention.
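A minimal sketch of this mapping, and of treating the recorded local_path as a cache that can be checked against the convention. The function names are hypothetical, and the mirrors root is hard-coded to the proposed default.

```python
from pathlib import Path

MIRRORS_DIR = Path(".codio") / "mirrors"  # proposed default from this document


def managed_local_path(project_root, repo_id):
    """Reconstruct the managed clone location from project_root and repo_id."""
    if "/" in repo_id or repo_id in ("", ".", ".."):
        raise ValueError(f"{repo_id!r} is not a single path component")
    return Path(project_root) / MIRRORS_DIR / repo_id


def recorded_path_is_stale(project_root, repo_id, recorded_local_path):
    """True when the cached local_path disagrees with the convention."""
    expected = managed_local_path(project_root, repo_id)
    return Path(project_root) / recorded_local_path != expected
```

A stale recorded path would be corrected from repo_id rather than trusted, which matches the statement that the convention, not the stored field, is canonical.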
4. Required Metadata Alongside Managed Repos
Managed clones are git repositories. Whether they are bare repositories or standard clones is unresolved, although the directory structure in section 2 (a .git/ directory next to the repo contents) depicts a standard clone. Codio does not store additional metadata files inside the cloned repository directory. All metadata lives in the registry files.
Repository-level metadata (in .codio/repos.yml)
Each managed repository has an entry in the proposed repos.yml file:
    # .codio/repos.yml
    repositories:
      scipy--scipy:
        repo_id: scipy--scipy
        url: https://github.com/scipy/scipy.git
        hosting: github
        storage: managed
        local_path: .codio/mirrors/scipy--scipy
        default_branch: main
What is not stored alongside the clone
- Sync state (last pull timestamp, current HEAD): can be derived from the git repository itself via git log or git rev-parse.
- Codio-specific config: no .codio.yml or similar file is placed inside the managed clone. The clone is a faithful mirror of upstream.
- Index artifacts: indexio indexes are stored in infra/indexio/, not alongside the source.
CodeSource pointers
When a catalog entry references a specific subtree within a managed repository, the proposed CodeSource entity records the subpath:
    # Example: catalog entry referencing a code source
    libraries:
      scipy-linalg:
        name: scipy-linalg
        kind: external_mirror
        repo_url: https://github.com/scipy/scipy
        path: .codio/mirrors/scipy--scipy/scipy/linalg
In the proposed model, this path would be replaced by a source_id
referencing a CodeSource entity that combines repo_id: scipy--scipy with
subpath: scipy/linalg. The current path field on LibraryCatalogEntry
continues to work as a plain string in the interim.
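To make the relationship between the two representations concrete, the sketch below models a CodeSource and derives the interim path string from it. The class shape is hypothetical: this document names only source_id, repo_id, and subpath, and the example source_id value is invented for illustration.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class CodeSource:
    """Hypothetical shape; only repo_id and subpath are described in the text."""
    source_id: str
    repo_id: str
    subpath: str

    def legacy_path(self, mirrors_dir=".codio/mirrors"):
        """Path string equivalent to today's LibraryCatalogEntry.path."""
        return str(PurePosixPath(mirrors_dir) / self.repo_id / self.subpath)


src = CodeSource("scipy-linalg-src", "scipy--scipy", "scipy/linalg")
print(src.legacy_path())  # .codio/mirrors/scipy--scipy/scipy/linalg
```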
5. Attached Repository Layout
An attached repository is one that already exists on the filesystem. Codio records its location but does not clone, pull, or modify it.
What codio stores
A Repository entry with storage: attached:
    repositories:
      internal--utils:
        repo_id: internal--utils
        url: ""
        hosting: local
        storage: attached
        local_path: /home/user/projects/internal-utils
        default_branch: main
What codio does not do for attached repos
- No cloning. The directory must already exist.
- No syncing. Codio does not run git pull or any git operations.
- No directory creation. If local_path does not exist, validation warns but codio does not create it.
- No ownership. The attached directory is not under .codio/. It may be anywhere on the filesystem.
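The warn-but-never-create behavior might look like the following. This is a sketch only: validate_attached is a hypothetical name, the entry is a plain dict mirroring the YAML above, and resolving relative paths against project_root is an assumption.

```python
import warnings
from pathlib import Path


def validate_attached(repo_entry, project_root):
    """Check an attached Repository entry without modifying the filesystem.

    Returns True when local_path exists as a directory; otherwise emits a
    warning and returns False. Nothing is created, cloned, or pulled.
    """
    if repo_entry.get("storage") != "attached":
        raise ValueError("validate_attached expects storage: attached")
    # Assumption: relative paths resolve against project_root. Joining an
    # absolute local_path onto project_root leaves it unchanged.
    path = Path(project_root) / repo_entry["local_path"]
    if not path.is_dir():
        warnings.warn(f"attached repo {repo_entry['repo_id']!r}: {path} does not exist")
        return False
    return True
```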
Key difference from managed repos
| Aspect | Managed | Attached |
|---|---|---|
| Clone location | .codio/mirrors/<repo_id>/ | Anywhere; user-specified |
| Created by codio | Yes | No |
| Updated by codio | Yes (sync commands) | No |
| Path deterministic | Yes (derived from repo_id) | No (recorded as-is) |
| Safe to delete .codio/mirrors/ | Yes (re-cloneable) | N/A |
| Absolute vs relative path | Relative to project root | May be absolute or relative |
External repositories (no local clone)
Repositories with storage: external have no local_path. They exist in the
registry as metadata only (URL, hosting, default branch). Catalog entries
referencing external repositories cannot use path-based operations (e.g.,
indexing source code) but can still participate in discovery via curated notes
and capability tags.
6. Relationship to Current Catalog and Profile Files
Backward compatibility
The proposed layout is additive. Nothing about the existing .codio/catalog.yml
and .codio/profiles.yml schema changes. Specifically:
- catalog.yml keeps the libraries: top-level key. No structural change.
- profiles.yml keeps the profiles: top-level key. No structural change.
- The path field on LibraryCatalogEntry remains a plain string. It can point to a managed mirror path (e.g., .codio/mirrors/scipy--scipy/scipy/linalg) or to any other local directory, exactly as today.
New file: repos.yml
The Repository entity is stored in a new file, .codio/repos.yml, rather than
adding a section to catalog.yml. Rationale:
- Separation of concerns. Repository identity is distinct from library identity. A repository may contain multiple libraries. Mixing them in one file would require a schema version bump and migration logic.
- Independent lifecycle. Repositories can be added or removed without editing the catalog. A managed mirror can exist before any catalog entries reference it.
- Simpler validation. Catalog validation and repository validation are independent passes. Cross-referencing (catalog repo_id FK to repository repo_id PK) is a separate check.
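The separate cross-reference pass could be as small as the sketch below. It assumes both files have already been parsed into dicts with the top-level keys described in this document. The function name is hypothetical, and entries without a repo_id are skipped because the field is proposed as optional.

```python
def find_dangling_repo_refs(catalog, repos):
    """Return (library_slug, repo_id) pairs whose repo_id has no Repository entry."""
    known = set(repos.get("repositories") or {})
    dangling = []
    for slug, entry in (catalog.get("libraries") or {}).items():
        repo_id = (entry or {}).get("repo_id")
        if repo_id is not None and repo_id not in known:
            dangling.append((slug, repo_id))
    return dangling


catalog = {"libraries": {
    "scipy-linalg": {"repo_id": "scipy--scipy"},
    "legacy-lib": {"path": "code/legacy"},  # no repo_id: skipped
    "orphan": {"repo_id": "missing--repo"},
}}
repos = {"repositories": {"scipy--scipy": {"storage": "managed"}}}
print(find_dangling_repo_refs(catalog, repos))  # [('orphan', 'missing--repo')]
```

Two catalog entries pointing at the same repo_id pass this check, which is consistent with the monorepo case allowed in section 3.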
Migration path
- No migration required for existing registries. The absence of repos.yml and .codio/mirrors/ is the current state. Codio continues to work without them.
- Opt-in managed mirrors. When a user runs a future codio clone or codio mirror add command, codio creates repos.yml and .codio/mirrors/ on demand.
- Gradual adoption of repo_id. Catalog entries gain an optional repo_id field. Entries without repo_id continue to use the path and repo_url fields as unstructured metadata, as they do today.
- Scaffold update. codio init does not create repos.yml or .codio/mirrors/ unless the user opts in. The default scaffold remains minimal (catalog + profiles + notes directory).
Config additions
Two new config fields (with defaults):
    # .projio/config.yml (proposed additions)
    codio:
      repos_path: .codio/repos.yml     # default
      mirrors_dir: .codio/mirrors/     # default
These follow the same pattern as catalog_path, profiles_path, and
notes_dir: relative to project_root, overridable in .projio/config.yml.
7. Unresolved Questions
7.1 Storage ownership boundary
When codio clones a repository into .codio/mirrors/<repo_id>/, who owns the
git state? Codio created it, but it is a full git repository with its own
.git/ directory, branches, and potentially uncommitted changes (if a user
edits files there). Options:
- Codio owns it fully. Codio may delete and re-clone at any time. Users should not make local modifications. This is the simplest model.
- Codio owns the clone, users may branch. Codio manages the default branch but users can create local branches for experimentation. This requires codio to avoid force-resetting.
- Shared ownership. Codio tracks upstream but the user is responsible for the working tree. This blurs the line between managed and attached.
Recommendation: start with full ownership (the first option above). Document that managed mirrors are ephemeral reference copies.
7.2 Durability and .gitignore
Should .codio/mirrors/ be gitignored by default? Arguments for: managed
clones are large, reproducible, and should not be committed to the host
project. Arguments against: if the project uses DataLad, the mirrors might be
tracked as DataLad subdatasets for reproducibility.
The scaffold should add .codio/mirrors/ to .gitignore by default but
document how to override this for DataLad workflows.
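One way the scaffold step might add the entry idempotently is sketched below. The function name is hypothetical, and the check only detects an identical line, so a broader existing pattern such as .codio/ would not be recognized as already covering the mirrors directory.

```python
from pathlib import Path

MIRRORS_IGNORE = ".codio/mirrors/"


def ensure_mirrors_ignored(project_root, pattern=MIRRORS_IGNORE):
    """Append pattern to .gitignore unless an identical line is present.

    Returns True if the file was changed, False if it was left alone.
    """
    gitignore = Path(project_root) / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    if pattern in (line.strip() for line in text.splitlines()):
        return False
    if text and not text.endswith("\n"):
        text += "\n"  # keep the new entry on its own line
    gitignore.write_text(text + pattern + "\n")
    return True
```

A DataLad workflow that tracks mirrors as subdatasets would simply skip this step.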
7.3 Workspace root assumptions
The entire layout assumes a single project_root containing .codio/. This
assumption breaks in at least two scenarios:
- Monorepo with multiple codio registries. Each sub-project might have its own .codio/ directory. The mirrors directory would need to be per-registry or shared with deduplication.
- Detached registry. A codio registry stored outside the project it describes (e.g., a team-shared catalog in a separate repository). The project_root for config resolution differs from the root containing the code.
Neither scenario is addressed by this layout. They are noted as future design constraints.
7.4 DataLad interaction
This project uses DataLad (evidenced by [DATALAD] commits in the git
history). DataLad manages datasets as git/git-annex repositories and tracks
subdatasets. Potential interactions:
- Managed clones as subdatasets. If .codio/mirrors/<repo_id>/ is registered as a DataLad subdataset, DataLad tracks its state and can reproduce it. This is useful for reproducibility but adds complexity to codio's sync operations (must go through DataLad, not raw git).
- Annex conflicts. If the host project uses git-annex, files in .codio/mirrors/ might be annexed unintentionally. The .gitignore approach (section 7.2) avoids this, but a DataLad-aware workflow might want explicit .codio/mirrors/** entries in .gitattributes.
- Lock files. DataLad operations create lock files. If codio runs git pull inside a managed mirror while DataLad is operating on the parent dataset, there may be contention.
Resolution requires input from the DataLad integration design. For now, codio should treat managed mirrors as plain git clones and document the DataLad interaction as an integration point, not a built-in feature.
7.5 Shallow clones and partial checkouts
Managed mirrors of large repositories (e.g., CPython, Linux kernel) may be impractical as full clones. Options:
- Shallow clone (git clone --depth 1): small footprint, but git log and blame are limited.
- Partial clone with sparse checkout: only materializes the subtrees referenced by CodeSource entries. Complex to manage but space-efficient.
- No clone; metadata only: treat the repository as external and rely on indexio corpus queries for code intelligence. Simplest but loses local file access.
This is a policy decision per repository, not a layout question. The layout
supports any of these options since the .codio/mirrors/<repo_id>/ directory
is just a git working tree regardless of clone depth.
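As an illustration of a per-repository policy, the sketch below maps a policy name to the git commands it would run. The policy names and the function are hypothetical; the git flags are standard, but the sparse variant needs a reasonably recent git and a server that supports partial clone.

```python
def clone_commands(url, dest, policy, subpaths=()):
    """Return git argv lists for a clone policy: 'full', 'shallow', or 'sparse'."""
    if policy == "full":
        return [["git", "clone", url, dest]]
    if policy == "shallow":
        return [["git", "clone", "--depth", "1", url, dest]]
    if policy == "sparse":
        if not subpaths:
            raise ValueError("sparse policy needs at least one subpath")
        return [
            # Blobless partial clone with sparse checkout enabled.
            ["git", "clone", "--filter=blob:none", "--sparse", url, dest],
            # Materialize only the referenced subtrees.
            ["git", "-C", dest, "sparse-checkout", "set", *subpaths],
        ]
    raise ValueError(f"unknown policy {policy!r}")
```

In all three cases dest is the same .codio/mirrors/<repo_id>/ directory, which is why the choice does not affect the layout.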
7.6 repos.yml vs. a section in catalog.yml
Section 6 recommends a separate repos.yml file. The alternative is a
repositories: top-level key in catalog.yml. Arguments for a section in
catalog.yml:
- Fewer files to manage.
- Atomic reads: loading the catalog also loads repository metadata.
- Simpler scaffold.
Arguments for a separate file (current recommendation):
- Repository and library are different entities with different lifecycles.
- A repository can exist before any library references it (e.g., cloned for exploration, not yet cataloged).
- Schema evolution is independent.
This remains an open question pending implementation experience.
7.7 Cleanup and garbage collection
If a catalog entry referencing a managed mirror is deleted, should codio automatically delete the clone? Options:
- Manual cleanup. codio mirror prune removes clones with no catalog references.
- Automatic cleanup. Deleting the last catalog entry referencing a repo_id triggers clone removal. Risky if the deletion was accidental.
- No cleanup. Clones persist until the user manually deletes .codio/mirrors/<repo_id>/.
Recommendation: manual cleanup via an explicit command, with a dry-run mode.
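A sketch of what a prune with a dry-run mode might do internally. The function name and signature are assumptions, and the caller is expected to collect referenced_repo_ids from the catalog beforehand.

```python
import shutil
from pathlib import Path


def prune_mirrors(mirrors_dir, referenced_repo_ids, dry_run=True):
    """Find mirror directories whose name is not a referenced repo_id.

    With dry_run=True (the default) nothing is deleted; the candidates are
    only returned so they can be shown to the user first.
    """
    root = Path(mirrors_dir)
    if not root.is_dir():
        return []
    stale = sorted(
        p for p in root.iterdir() if p.is_dir() and p.name not in referenced_repo_ids
    )
    if not dry_run:
        for path in stale:
            shutil.rmtree(path)
    return stale
```

Defaulting to dry_run=True means an accidental invocation cannot remove clones, which addresses the risk noted for automatic cleanup.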
8. Summary of Proposed Filesystem Artifacts
    <project_root>/
      .codio/
        catalog.yml          # implemented: library identity metadata
        profiles.yml         # implemented: project-specific policy
        repos.yml            # proposed: Repository entity storage
        mirrors/             # proposed: managed clone root
          <repo_id>/         # proposed: one per managed repository
            <repo contents>
      docs/
        reference/
          codelib/
            libraries/       # implemented: curated note markdown files
              <library-slug>.md
      .projio/
        config.yml           # referenced: codio config section
      infra/
        indexio/
          config.yaml        # referenced: codio writes owned source IDs
Ownership summary:
| Artifact | Status | Owner | Writable by codio |
|---|---|---|---|
| .codio/catalog.yml | Implemented | codio | Yes |
| .codio/profiles.yml | Implemented | codio | Yes |
| docs/reference/codelib/libraries/ | Implemented | codio | Yes |
| .codio/repos.yml | Proposed | codio | Yes |
| .codio/mirrors/<repo_id>/ | Proposed | codio | Yes (clone/sync) |
| .projio/config.yml | Implemented | projio | No (read only) |
| infra/indexio/config.yaml | Implemented | shared | Codio-owned IDs only |