## Implement: OpenAlex API P1 fixes from audit

Apply all P1 findings from docs/specs/biblio/openalex-audit.md.

Tasks

  1. Batch DOI lookups via OR filter in openalex_client.py:
     • Add get_works_by_dois(dois: list[str]) that uses filter=doi:doi1|doi2|...|doi50
     • Batch into groups of 50 (OpenAlex limit)
     • Use in openalex_resolve.py to replace one-at-a-time resolution

  2. Default per_page=200 for bulk operations:
     • In openalex_client.py OpenAlexClientConfig, change default from 25 to 200
     • The API max is 200, and biblio's bulk paths (graph expand, author works, institution works) all paginate anyway

  3. Extract type field in _extract_work (author_search.py):
     • Add type: str | None to WorkRecord dataclass
     • Extract from data.get("type"): values like "article", "book", "dataset", "preprint"

  4. Extract is_retracted field in _extract_work:
     • Add is_retracted: bool to WorkRecord
     • Extract from data.get("is_retracted", False)

  5. Extract topics in _extract_work:
     • Add topics: list[dict] to WorkRecord (or a TopicRecord dataclass)
     • Extract primary_topic + topics[] with hierarchy (domain/field/subfield/topic) and scores
     • This data is already fetched via DEFAULT_SELECT but discarded

  6. Full affiliations history for _extract_author:
     • Add affiliations: list[dict] to AuthorRecord
     • Extract from data.get("affiliations"): list of {institution, years}
     • Keep affiliation (singular) as the last_known for backwards compat

  7. Cache author/institution lookups in openalex_cache.py:
     • Add author/{hash}.json and institution/{hash}.json cache paths
     • Wire into get_author() and get_institution() in client
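
For the DOI batching task, a minimal sketch of how the filter strings might be built. The helper names (chunked, normalize_doi, build_doi_filters) are hypothetical and not part of the existing client. Sending bare lowercase DOIs rather than full https://doi.org/ URLs is an assumption to check against the audit.

```python
from collections.abc import Iterator

MAX_OR_VALUES = 50  # batch size taken from the task list ("OpenAlex limit")


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip a leading resolver URL or 'doi:' prefix."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def build_doi_filters(dois: list[str], batch_size: int = MAX_OR_VALUES) -> list[str]:
    """Return one `filter=` value per batch, e.g. 'doi:10.1/a|10.1/b'."""
    # dict.fromkeys de-duplicates while preserving input order.
    unique = list(dict.fromkeys(normalize_doi(d) for d in dois if d.strip()))
    return ["doi:" + "|".join(batch) for batch in chunked(unique, batch_size)]
```

get_works_by_dois would then presumably issue one request per returned filter string, with per_page at least as large as the batch so that each batch fits on a single page.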
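
For the caching task, a minimal read-through cache sketch matching the author/{hash}.json and institution/{hash}.json layout. The function names, the SHA-256 prefix used as the hash, and the fetch callback are all assumptions; the existing openalex_cache.py may already define its own hashing and expiry conventions, which should take precedence.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable


def cache_path(root: Path, kind: str, entity_id: str) -> Path:
    """Map an entity ID to <root>/<kind>/<hash>.json."""
    key = entity_id.strip().lower().encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()[:16]
    return root / kind / f"{digest}.json"


def cached_lookup(
    root: Path,
    kind: str,
    entity_id: str,
    fetch: Callable[[str], dict[str, Any]],
) -> dict[str, Any]:
    """Return the cached payload if present, else fetch and store it."""
    path = cache_path(root, kind, entity_id)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    data = fetch(entity_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data), encoding="utf-8")
    return data
```

get_author() and get_institution() could then call cached_lookup with kind set to "author" or "institution" and pass the existing HTTP call as fetch. This sketch has no expiry; stale entries stay until the file is deleted.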

Key files

  • packages/biblio/src/biblio/openalex/openalex_client.py
  • packages/biblio/src/biblio/openalex/openalex_cache.py
  • packages/biblio/src/biblio/openalex/openalex_resolve.py
  • packages/biblio/src/biblio/author_search.py
  • packages/biblio/src/biblio/discovery.py
  • packages/biblio/src/biblio/mcp.py — update return dicts to include new fields
  • docs/specs/biblio/openalex-audit.md — source of requirements