## Implement: OpenAlex API P1 fixes from audit

Apply all P1 findings from `docs/specs/biblio/openalex-audit.md`.

### Tasks

1. Batch DOI lookups via OR filter in `openalex_client.py`:
   - Add `get_works_by_dois(dois: list[str])` that uses `filter=doi:doi1|doi2|...|doi50`
   - Batch into groups of 50 (OpenAlex limit)
   - Use in `openalex_resolve.py` to replace one-at-a-time resolution
2. Default `per_page=200` for bulk operations:
   - In `openalex_client.py` `OpenAlexClientConfig`, change the default from 25 to 200
   - The API max is 200, and biblio's bulk paths (graph expand, author works, institution works) all paginate anyway
3. Extract `type` field in `_extract_work` (`author_search.py`):
   - Add `type: str | None` to the `WorkRecord` dataclass
   - Extract from `data.get("type")`; values like "article", "book", "dataset", "preprint"
4. Extract `is_retracted` field in `_extract_work`:
   - Add `is_retracted: bool` to `WorkRecord`
   - Extract from `data.get("is_retracted", False)`
5. Extract `topics` in `_extract_work`:
   - Add `topics: list[dict]` to `WorkRecord` (or a `TopicRecord` dataclass)
   - Extract `primary_topic` + `topics[]` with hierarchy (domain/field/subfield/topic) and scores
   - This data is already fetched via `DEFAULT_SELECT` but discarded
6. Full affiliations history for `_extract_author`:
   - Add `affiliations: list[dict]` to `AuthorRecord`
   - Extract from `data.get("affiliations")`, a list of `{institution, years}` entries
   - Keep `affiliation` (singular) as the `last_known` value for backwards compatibility
7. Cache author/institution lookups in `openalex_cache.py`:
   - Add `author/{hash}.json` and `institution/{hash}.json` cache paths
   - Wire into `get_author()` and `get_institution()` in the client
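
The DOI batching described in the first task could look roughly like the sketch below. It only builds the `filter` strings; sending the requests and merging the results would stay inside `get_works_by_dois`. The helper names `chunked` and `build_doi_filters` are hypothetical and not taken from the codebase, and the de-duplication step is an added assumption.

```python
def chunked(items: list[str], size: int) -> list[list[str]]:
    """Split items into consecutive groups of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_doi_filters(dois: list[str], batch_size: int = 50) -> list[str]:
    """Build one `doi:a|b|c` filter value per batch (hypothetical helper).

    The default of 50 follows the limit stated in the task list.
    Blank entries are dropped and duplicates removed, keeping first-seen order.
    """
    unique = list(dict.fromkeys(d.strip() for d in dois if d and d.strip()))
    return ["doi:" + "|".join(batch) for batch in chunked(unique, batch_size)]
```

Each returned string would be passed as the `filter` query parameter of one works request, so 120 DOIs would need three requests instead of 120.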
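
The three `WorkRecord` additions (`type`, `is_retracted`, `topics`) can be sketched together as below. This is an illustration under assumptions, not the existing `_extract_work`: the `WorkRecord` here carries only the new fields, `extract_work_fields` and `_flatten_topic` are hypothetical names, and the payload shape (topics with nested `subfield`, `field`, `domain` objects and a `score`) is assumed from the hierarchy named in the task.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class WorkRecord:
    # Only the new fields are shown; the real dataclass has more.
    type: Optional[str] = None  # same meaning as `str | None` in the task
    is_retracted: bool = False
    topics: list[dict[str, Any]] = field(default_factory=list)


def _flatten_topic(topic: dict[str, Any], primary_id: Optional[str]) -> dict[str, Any]:
    """Flatten one topic into a plain dict with its hierarchy names and score."""
    def level_name(level: str) -> Optional[str]:
        # A missing or null hierarchy level yields None instead of raising.
        return (topic.get(level) or {}).get("display_name")

    topic_id = topic.get("id")
    return {
        "id": topic_id,
        "topic": topic.get("display_name"),
        "subfield": level_name("subfield"),
        "field": level_name("field"),
        "domain": level_name("domain"),
        "score": topic.get("score"),
        "is_primary": topic_id is not None and topic_id == primary_id,
    }


def extract_work_fields(data: dict[str, Any]) -> WorkRecord:
    """Pull the new fields out of a raw work payload (hypothetical helper)."""
    primary_id = (data.get("primary_topic") or {}).get("id")
    return WorkRecord(
        type=data.get("type"),
        # bool() also maps an explicit null to False.
        is_retracted=bool(data.get("is_retracted", False)),
        topics=[_flatten_topic(t, primary_id) for t in data.get("topics") or []],
    )
```

Flagging the primary topic inside the list, rather than storing it separately, is one possible design; a dedicated `TopicRecord` dataclass, as the task suggests, would work equally well.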
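
For the author/institution cache, a read-through helper along the following lines would match the `author/{hash}.json` and `institution/{hash}.json` layout. Everything here is an assumption made for illustration: the function names, the SHA-256 hash truncated to 16 hex characters, and the `fetch` callback standing in for the real client call. The existing `openalex_cache.py` may already define its own hashing scheme, which should take precedence.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable

CACHE_KINDS = {"author", "institution"}


def cache_path(cache_root: Path, kind: str, key: str) -> Path:
    """Return `<cache_root>/<kind>/<hash>.json` for a lookup key."""
    if kind not in CACHE_KINDS:
        raise ValueError(f"unsupported cache kind: {kind!r}")
    # Hash choice is an assumption; reuse the existing cache's scheme if it has one.
    digest = hashlib.sha256(key.strip().encode("utf-8")).hexdigest()[:16]
    return cache_root / kind / f"{digest}.json"


def load_or_fetch(
    cache_root: Path,
    kind: str,
    key: str,
    fetch: Callable[[str], dict[str, Any]],
) -> dict[str, Any]:
    """Return the cached payload if present, otherwise fetch and store it."""
    path = cache_path(cache_root, kind, key)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    payload = fetch(key)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload), encoding="utf-8")
    return payload
```

`get_author()` and `get_institution()` would then wrap their HTTP call in `load_or_fetch`, so a repeated lookup of the same ID is served from disk.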

### Key files

- `packages/biblio/src/biblio/openalex/openalex_client.py`
- `packages/biblio/src/biblio/openalex/openalex_cache.py`
- `packages/biblio/src/biblio/openalex/openalex_resolve.py`
- `packages/biblio/src/biblio/author_search.py`
- `packages/biblio/src/biblio/discovery.py`
- `packages/biblio/src/biblio/mcp.py`: update return dicts to include new fields
- `docs/specs/biblio/openalex-audit.md`: source of requirements

### Related Notes

- issue-arash-20260403-193002-484673.md — This is the OpenAlex API audit that generated the P1 findings being implemented here
- issue-arash-20260403-193112-105596.md — Enrichment pipeline redesign is directly affected by the WorkRecord/AuthorRecord field additions and batch DOI lookup changes
- issue-arash-20260403-193037-589959.md — Topics extraction task (item 5) overlaps with the concept tagging vs OpenAlex topic classification audit
- issue-arash-20260403-210033-704498.md — GUI outbound links depend on type, is_retracted, and topics fields being extracted from OpenAlex
- issue-arash-20260403-205942-590539.md — Pool promote likely uses author/institution lookups that would benefit from the caching fix in item 7