Issue arash 20260403 193002 484673
title: "## Audit: biblio OpenAlex API usage vs actual API capabilities"
status: done
created: 2026-04-03
updated: 2026-04-03
timestamp: 20260403-193002-484673
tags: [issue]
source: agent-observation
project_primary: projio
capture_id: 20260403-193000-d83a54
confidence: 1.0
transcript_file: /storage2/arash/worklog/workflow/captures/20260403-193000-d83a54/transcript.txt
Audit: biblio OpenAlex API usage vs actual API capabilities
Now that we have the OpenAlex source code (openalex-elastic-api), docs, and tutorials indexed in RAG, audit biblio's OpenAlex integration for correctness and gaps.
Scope
- Endpoint/filter audit: compare `packages/biblio/src/biblio/openalex/openalex_client.py` and `packages/biblio/src/biblio/discovery.py` against the actual API source in `.projio/codio/mirrors/ourresearch--openalex-elastic-api/`. Check:
  - Are we using the right endpoints and filter syntax?
  - Are we missing useful query parameters (sort, group_by, sample, select)?
  - Is our cursor pagination implementation correct?
  - Rate limiting / polite pool compliance (mailto parameter)
- Data model audit: compare what biblio extracts from OpenAlex responses (`_extract_work`, `_extract_author` in `author_search.py`) vs what's actually available. Check if we should extract:
  - Topics/keywords (OpenAlex has a rich topic hierarchy)
  - Funders, grants, SDGs
  - Author affiliations history (not just `last_known_institutions`)
  - `counts_by_year` (citation trends)
  - `related_works` field
- Caching audit: check `packages/biblio/src/biblio/openalex/openalex_cache.py` against API best practices from the tutorials.
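As a reference point for the pagination and polite-pool checks, here is a minimal sketch of cursor paging as the public OpenAlex docs describe it: start with `cursor=*`, follow `meta.next_cursor` until it is null, and pass `mailto` and `select` as query parameters. It does not reflect what `openalex_client.py` actually does. The names `build_params`, `iter_works`, and the injected `fetch_json` callable are hypothetical, and the parameter behavior should be verified against the mirrored API source.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

OPENALEX_WORKS_URL = "https://api.openalex.org/works"

# Hypothetical type: any callable that performs the HTTP GET and returns parsed JSON.
FetchJson = Callable[[str, Dict[str, str]], Dict[str, Any]]


def build_params(
    filter_expr: str,
    cursor: str = "*",
    mailto: Optional[str] = None,
    select: Optional[List[str]] = None,
    per_page: int = 200,
) -> Dict[str, str]:
    """Build query parameters for one page of a cursor-paged request."""
    params = {"filter": filter_expr, "per-page": str(per_page), "cursor": cursor}
    if mailto:
        # Assumption: supplying an email is what opts the request into the polite pool.
        params["mailto"] = mailto
    if select:
        # Ask only for the listed top-level fields to keep responses small.
        params["select"] = ",".join(select)
    return params


def iter_works(
    fetch_json: FetchJson,
    filter_expr: str,
    mailto: Optional[str] = None,
    select: Optional[List[str]] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield every result, following meta.next_cursor until it is null."""
    cursor: Optional[str] = "*"
    while cursor:
        params = build_params(filter_expr, cursor=cursor, mailto=mailto, select=select)
        page = fetch_json(OPENALEX_WORKS_URL, params)
        results = page.get("results", [])
        yield from results
        if not results:
            break  # Defensive stop: an empty page means there is nothing left.
        cursor = page.get("meta", {}).get("next_cursor")
```

In real use, `fetch_json` could be something like `lambda url, params: requests.get(url, params=params, timeout=30).json()`. Injecting it keeps the paging logic testable without network access, which is convenient when comparing it against biblio's own implementation.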
Method
Use rag_query with corpus "codelib" to search the indexed OpenAlex repos. Read the relevant biblio source files. Produce a findings document at docs/specs/biblio/openalex-audit.md with:
- Current state (what biblio does)
- API reality (what's available)
- Gaps (what we should fix/add)
- Priority ranking
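The "Current state / API reality / Gaps" comparison could be bootstrapped with a small field diff like the one below. This is a sketch only: `find_gaps` and `CANDIDATE_WORK_FIELDS` are hypothetical names, the field list reflects my understanding of the OpenAlex Work object and should be checked against the indexed docs, and the set of fields biblio currently extracts has to be read from `_extract_work` by hand.

```python
from typing import Any, Dict, Iterable, List

# Work fields named in the Scope list. These names are assumptions to verify
# against the OpenAlex docs; "grants" is assumed to carry the funder information.
CANDIDATE_WORK_FIELDS: List[str] = [
    "topics",
    "keywords",
    "grants",
    "sustainable_development_goals",
    "counts_by_year",
    "related_works",
]


def find_gaps(
    sample_record: Dict[str, Any],
    extracted_fields: Iterable[str],
    candidates: Iterable[str] = CANDIDATE_WORK_FIELDS,
) -> Dict[str, List[str]]:
    """Compare candidate fields against one sample response and the extracted set."""
    candidates = list(candidates)
    extracted = set(extracted_fields)
    # API reality: candidate fields actually present in the sample response.
    available = [f for f in candidates if f in sample_record]
    # Gaps: fields the API returns but the extractor does not pick up.
    missing = [f for f in available if f not in extracted]
    # Candidates the sample does not contain (renamed, removed, or excluded by select).
    absent = [f for f in candidates if f not in sample_record]
    return {
        "available": available,
        "missing_from_extraction": missing,
        "not_in_response": absent,
    }
```

Running this against a few cached responses gives a first draft of the gaps section. Priority ranking still needs human judgment about which fields the rest of the biblio pipeline would actually use.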
Key files
- `packages/biblio/src/biblio/openalex/openalex_client.py`
- `packages/biblio/src/biblio/openalex/openalex_resolve.py`
- `packages/biblio/src/biblio/openalex/openalex_cache.py`
- `packages/biblio/src/biblio/author_search.py`
- `packages/biblio/src/biblio/discovery.py`
- `packages/biblio/src/biblio/graph.py`
Related Notes
- docs/log/issue/issue-arash-20260402-233201-350554.md — biblio pdf_fetch_oa issue directly involves OpenAlex OA API behavior — relevant to the API correctness audit
- docs/log/issue/issue-arash-20260402-015659-415628.md — biblio batch docling processes papers discovered via OpenAlex — shares biblio discovery pipeline scope
- docs/log/issue/issue-arash-20260402-220152-539138.md — biblio_compile tool is part of the same biblio pipeline being audited
- docs/log/issue/issue-arash-20260402-220130-159401.md — biblio architecture reorganization affects where audit findings and config live