Feature request: codio_ingest_api — package API ingestion for agent-quality code generation
Problem
When agents generate notebook/script code using libraries (cogpy, scipy, xarray, etc.), they lack knowledge of:
- Function signatures and type contracts
- Dimension/shape broadcasting behavior (e.g. "psd_multitaper vectorizes over non-time dims")
- Idiomatic usage patterns (e.g. "don't loop over channels, pass the full xarray")
This causes rookie mistakes like writing explicit for-loops over channels when the function already handles multi-dimensional input natively.
Current skill prompts list function names but can't keep up with API evolution. Docstrings help but aren't accessible to agents at generation time.
Proposed solution: codio_ingest_api
A new codio ingestion mode that extracts and indexes package APIs for semantic search.
Ingestion flow
codio_ingest_api(package="cogpy") works for any installed Python package:
- Walk module tree via importlib
- For each public function/class:
  - Extract signature via inspect.signature
  - Extract type hints (including generics like xr.DataArray)
  - Parse docstring (numpy/google/sphinx style) into structured sections
  - Static-analyze function body for:
    - xarray dim handling (.dims, .transpose(), xr.apply_ufunc)
    - Broadcasting patterns (which dims are reduced, which are preserved)
    - Common input validation patterns
  - If @api_contract decorator present, use it as high-confidence metadata
- Produce structured API index: {module: {function: {signature, type_hints, dim_contract, docstring_summary, vectorization_info}}}
- Index into existing RAG infrastructure for semantic search
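The walk-and-extract steps above can be sketched with the standard library alone. This is a minimal illustration, not codio's implementation: the names `ingest_api` and `describe` are hypothetical, and the `dim_contract` and `vectorization_info` fields are left out because they depend on the static-analysis and decorator steps.

```python
import importlib
import inspect
import pkgutil
import typing


def describe(obj) -> dict:
    """Collect signature, type hints, and a one-line docstring summary."""
    try:
        signature = str(inspect.signature(obj))
    except (TypeError, ValueError):
        signature = None  # some builtins and C extensions expose no signature
    try:
        hints = {
            name: hint.__name__ if type(hint) is type else str(hint)
            for name, hint in typing.get_type_hints(obj).items()
        }
    except Exception:
        hints = {}  # e.g. unresolved forward references
    doc = inspect.getdoc(obj) or ""
    summary = doc.splitlines()[0] if doc else ""
    return {"signature": signature, "type_hints": hints, "docstring_summary": summary}


def ingest_api(package_name: str) -> dict:
    """Build {module: {name: {...}}} for public functions and classes."""
    root = importlib.import_module(package_name)
    modules = [root]
    # Only packages have __path__; a plain module has no submodules to walk.
    if hasattr(root, "__path__"):
        walker = pkgutil.walk_packages(
            root.__path__, prefix=root.__name__ + ".", onerror=lambda name: None
        )
        for info in walker:
            if info.name.rsplit(".", 1)[-1].startswith("_"):
                continue  # skip private modules
            try:
                modules.append(importlib.import_module(info.name))
            except Exception:
                continue  # skip submodules that fail to import
    index = {}
    for module in modules:
        entries = {}
        for name, obj in vars(module).items():
            if name.startswith("_"):
                continue
            if not (inspect.isfunction(obj) or inspect.isclass(obj)):
                continue
            if getattr(obj, "__module__", None) != module.__name__:
                continue  # skip names re-exported from other modules
            entries[name] = describe(obj)
        if entries:
            index[module.__name__] = entries
    return index


if __name__ == "__main__":
    # The standard-library json package stands in for an installed package.
    api = ingest_api("json")
    print(api["json"]["dumps"]["signature"])
```

Importing every submodule executes its top-level code, so a production version would likely want sandboxing or a purely static alternative for untrusted packages.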
Query interface
codio_api_query(package, query) runs a semantic search over the API index:
codio_api_query("cogpy", "compute PSD of multichannel xarray signal")
→ cogpy.spectral.psd.psd_multitaper
    signature: (signal, fs, NW=4.0, ...)
    accepts: xr.DataArray with any dims containing "time", or 1D ndarray
    returns: (psd: DataArray[non-time-dims, freq], freqs: ndarray)
    vectorizes_over: all non-time dims (no loops needed)
→ cogpy.spectral.specx.psdx
    signature: (signal, fs, ...)
    accepts: same as psd_multitaper
    returns: xr.Dataset with power and freqs coords
    note: higher-level wrapper, returns Dataset instead of tuple
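To make the shape of the query step concrete, here is a rough sketch that ranks index entries by token overlap with the query. Token overlap is a deliberately simple stand-in for the embedding-based semantic search the proposal relies on; `api_query`, the `examplepkg` index, and its field values are all hypothetical toy data, not output from any real package.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase a string and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def api_query(index: dict, query: str, top_k: int = 3) -> list:
    """Return up to top_k qualified names, best match first."""
    query_tokens = tokenize(query)
    scored = []
    for module, functions in index.items():
        for name, entry in functions.items():
            # Searchable text: qualified name plus the descriptive fields.
            text = " ".join(
                [
                    module,
                    name,
                    entry.get("docstring_summary", ""),
                    entry.get("vectorization_info", ""),
                ]
            )
            score = len(query_tokens & tokenize(text))
            if score:
                scored.append((score, f"{module}.{name}"))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [qualified for _, qualified in scored[:top_k]]


# Hypothetical toy index in the {module: {function: {...}}} layout.
TOY_INDEX = {
    "examplepkg.spectral": {
        "psd_multitaper": {
            "docstring_summary": "Multitaper PSD of a multichannel signal",
            "vectorization_info": "all non-time dims",
        }
    },
    "examplepkg.io": {
        "load_recording": {"docstring_summary": "Load a recording from disk"}
    },
}

if __name__ == "__main__":
    print(api_query(TOY_INDEX, "compute PSD of multichannel xarray signal"))
```

Swapping the scoring function for a vector-similarity lookup would leave the interface unchanged, which is the point of keeping the index layout independent of the retrieval method.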
Integration with skills
The /notebook skill would call codio_api_query before writing cogpy code to get correct signatures and vectorization behavior. Similarly for /notebook-promote when extracting processing logic into scripts.
Optional: @api_contract decorator for packages
For packages that want to provide high-confidence metadata:
@api_contract(
    input={"signal": "DataArray[..., time]"},
    output={"psd": "DataArray[..., freq]", "freqs": "ndarray[freq]"},
    vectorizes_over="all non-time dims",
)
def psd_multitaper(signal, fs, NW=4.0, ...):
The ingestion flow reads these decorators if present and falls back to static analysis if not. The decorators also serve as documentation for human readers.
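One way the decorator and the fallback could fit together is sketched below. Everything here is an assumption made for illustration: the `__api_contract__` attribute name, the `read_contract` and `scan_dim_usage` helpers, and the "high"/"low" confidence labels are not part of any existing codio or cogpy API. The static fallback only looks for the attribute names the ingestion flow mentions (`.dims`, `.transpose`, `apply_ufunc`).

```python
import ast
import inspect
import textwrap

DIM_MARKERS = {"dims", "transpose", "apply_ufunc"}


def api_contract(*, input, output, vectorizes_over=None):
    """Attach contract metadata to a function without changing its behavior."""
    contract = {
        "input": dict(input),
        "output": dict(output),
        "vectorizes_over": vectorizes_over,
    }

    def decorate(func):
        func.__api_contract__ = contract  # hypothetical attribute name
        return func

    return decorate


def scan_dim_usage(source: str) -> list:
    """Return which dim-related attribute names appear in the source text."""
    tree = ast.parse(textwrap.dedent(source))
    found = {
        node.attr
        for node in ast.walk(tree)
        if isinstance(node, ast.Attribute) and node.attr in DIM_MARKERS
    }
    return sorted(found)


def read_contract(func) -> dict:
    """Prefer decorator metadata; otherwise fall back to a source scan."""
    contract = getattr(func, "__api_contract__", None)
    if contract is not None:
        return {"source": "decorator", "confidence": "high", **contract}
    try:
        markers = scan_dim_usage(inspect.getsource(func))
    except (OSError, TypeError, SyntaxError):
        markers = []  # source unavailable (e.g. C extension or REPL input)
    return {"source": "static_analysis", "confidence": "low", "dim_markers": markers}


@api_contract(
    input={"signal": "DataArray[..., time]"},
    output={"psd": "DataArray[..., freq]", "freqs": "ndarray[freq]"},
    vectorizes_over="all non-time dims",
)
def decorated_example(signal, fs):
    """Placeholder body; only the metadata matters for this sketch."""
    return signal, fs


if __name__ == "__main__":
    print(read_contract(decorated_example)["vectorizes_over"])
```

Storing the contract as a plain attribute keeps the decorator free of runtime cost and lets the ingestion step read it with a single `getattr`, with no need to import anything from the decorated package beyond the function itself.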
Why codio
codio already tracks packages (codio_list, codio_get, codio_registry). This extends it from "what packages exist and what do they do" to "what does each function accept and how does it handle multi-dimensional data." Same registry, deeper introspection.
Use cases
- /notebook skill: write correct vectorized code on first try
- /notebook-promote: understand function contracts when extracting to scripts
- /add-feature-cogpy (cogpy-dev): check for existing API overlap before adding new functions
- Any agent writing code against an installed package
Source context: pixecog
PixEcog (pixecog): Neuropixels and ECoG dataset and analysis
Recent commits:
c309f45 Fix pipeline doc naming drift, populate registry doc_path, close 3 issues
84d605b Migrate 43 scripts from utils.smk.smk_log
5808910 [DATALAD] removed content
README:
Quick Start for Collaborators
Follow this checklist to get started with Pixecog documentation and workflows.
🐀 Pixecog Project: Compact Overview
Core principles
- One immutable BIDS raw dataset (raw/) as the canonical baseline
- Each analysis pipeline ha