## Audit: biblio OpenAlex API usage vs actual API capabilities

Now that we have

Goal

(promoted from note)

Context

(see source note)

Prompt

Fix the issue described in /storage2/arash/projects/projio/docs/log/issue/issue-arash-20260403-193002-484673.md. Read it first, then implement the proposed fix.

Acceptance Criteria

  • [ ]

Result

Completed in 222.4s (1 steps, model: opus)

Session: 6803550f-8532-4a28-a118-e23d18ebd130

Result note: /storage2/arash/worklog/workflow/captures/20260403-193409-64a440/note.md

Output

Audit complete. Created `docs/specs/biblio/openalex-audit.md` with findings across 3 audit areas:

**Key findings:**
- **No bugs**: all endpoints, filters, and cursor pagination are correct
- **6 P1 gaps**: batch DOI lookups via OR filter (up to 50x speedup), missing `type`/`is_retracted`/`topics` extraction in `WorkRecord`, default `per_page` should be 200 for bulk operations, only first author affiliation captured
- **11 P2 gaps**: `group_by` support, `keywords`/`grants`/`counts_by_year`/`related_works`/`fwci` extraction, cache TTL, author/institution cache, rate limiter
- **8 P3 gaps**: `sample`/`seed`, SDGs, abstracts, MeSH, etc.
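
To make the `WorkRecord` extraction gap concrete, here is a minimal sketch of pulling `type`, `is_retracted`, and `topics` out of an OpenAlex work payload. The dataclass layout, the `work_type` field name, and the `extract_work` helper are assumptions for illustration and do not reproduce the actual code in `author_search.py`. The sketch also assumes each entry in `topics` carries a `display_name` key.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class WorkRecord:
    """Hypothetical, trimmed-down record; the real WorkRecord may differ."""
    openalex_id: str
    title: Optional[str] = None
    work_type: Optional[str] = None      # from the payload's "type" key
    is_retracted: bool = False
    topics: list[str] = field(default_factory=list)


def extract_work(raw: dict[str, Any]) -> WorkRecord:
    """Map a raw work dict onto WorkRecord, tolerating missing or null keys."""
    # "topics" is assumed to be a list of objects with a "display_name" key.
    topic_names = [
        t["display_name"]
        for t in (raw.get("topics") or [])
        if t.get("display_name")
    ]
    return WorkRecord(
        openalex_id=raw.get("id", ""),
        title=raw.get("title"),
        work_type=raw.get("type"),
        is_retracted=bool(raw.get("is_retracted", False)),
        topics=topic_names,
    )


# Illustrative payload, not real API output.
sample = {
    "id": "https://openalex.org/W0000000000",
    "title": "An example work",
    "type": "article",
    "is_retracted": False,
    "topics": [{"display_name": "Example Topic", "score": 0.9}],
}
record = extract_work(sample)
```

Defaulting `is_retracted` to `False` and `topics` to an empty list keeps older cached payloads, which may lack these keys, loadable without a migration.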

## Next Steps

1. Implement the P1 batch DOI lookup using the OR filter (pipe `|`) in `openalex_resolve.py` to speed up srcbib resolution by up to 50x.
2. Add `type`, `is_retracted`, and `topics` fields to the `WorkRecord` dataclass and `_extract_work` function in `author_search.py`.
3. Change the default `per_page` from 25 to 200 for bulk operations in `openalex_client.py` and add author/institution caching to `openalex_cache.py`.
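
Step 1 can be sketched as follows: DOIs are normalized, de-duplicated, chunked, and joined with `|` into a single `doi:` filter per request. The function names, the cap of 50 OR'd values per filter, and the `per-page` parameter spelling are assumptions for this sketch and should be checked against the OpenAlex documentation and the existing code in `openalex_resolve.py`.

```python
from urllib.parse import urlencode

OPENALEX_WORKS_URL = "https://api.openalex.org/works"
MAX_OR_VALUES = 50  # assumed cap on OR'd values per filter


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip common URL or scheme prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def build_batch_doi_urls(dois, batch_size=MAX_OR_VALUES):
    """Return one request URL per batch of up to `batch_size` unique DOIs."""
    # dict.fromkeys de-duplicates while preserving input order.
    unique = list(dict.fromkeys(normalize_doi(d) for d in dois if d.strip()))
    urls = []
    for start in range(0, len(unique), batch_size):
        batch = unique[start:start + batch_size]
        params = {
            "filter": "doi:" + "|".join(batch),  # pipe means OR
            "per-page": str(len(batch)),         # one page holds the whole batch
        }
        urls.append(OPENALEX_WORKS_URL + "?" + urlencode(params))
    return urls


# 120 placeholder DOIs collapse into 3 requests instead of 120.
example_urls = build_batch_doi_urls([f"10.1234/x{i}" for i in range(120)])
```

Because one request now resolves a whole batch, the caller still has to match returned works back to the requested DOIs, and any DOI missing from the response should be treated as unresolved rather than retried indefinitely.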