## Implement: Topic enrichment pipeline — persist OpenAlex topics per citekey
From `docs/specs/biblio/enrichment-pipeline.md` and `docs/specs/biblio/concept-topic-overlap.md`.
### What
After OpenAlex resolution, persist the topic hierarchy and keywords for each work. Use them as a free baseline for tagging before (optionally) running LLM concept extraction.
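
A minimal sketch of the extraction step, assuming each `resolved.jsonl` line is a JSON object with a `citekey` and the full OpenAlex Work object under a `work` key. Both key names are hypothetical; `openalex_resolve.py` defines the real layout. The fields read from the work (`type`, `is_retracted`, `primary_topic`, `topics`, `keywords`, `counts_by_year`) follow the OpenAlex Work schema, and `extract_enrichment` / `iter_resolved` are illustrative names, not the planned `enrich_resolved(root)` API.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def _topic_summary(topic: dict[str, Any]) -> dict[str, Any]:
    """Flatten one OpenAlex topic object into name/subfield/field/domain/score."""
    return {
        "name": topic.get("display_name"),
        "subfield": (topic.get("subfield") or {}).get("display_name"),
        "field": (topic.get("field") or {}).get("display_name"),
        "domain": (topic.get("domain") or {}).get("display_name"),
        "score": topic.get("score"),
    }


def extract_enrichment(citekey: str, work: dict[str, Any]) -> dict[str, Any]:
    """Build the per-citekey record (same shape as the YAML example) from a Work."""
    primary = work.get("primary_topic")
    return {
        "citekey": citekey,
        "type": work.get("type"),
        "is_retracted": bool(work.get("is_retracted", False)),
        "primary_topic": _topic_summary(primary) if primary else None,
        "topics": [_topic_summary(t) for t in work.get("topics") or []],
        # Keep only keyword display names; scores are dropped in this sketch.
        "keywords": [k.get("display_name") for k in work.get("keywords") or []],
        # OpenAlex lists counts_by_year as [{"year": ..., "cited_by_count": ...}, ...].
        "counts_by_year": {
            c["year"]: c.get("cited_by_count", 0)
            for c in work.get("counts_by_year") or []
        },
    }


def iter_resolved(path: Path) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (citekey, work) pairs from a JSONL file, skipping rows without a work."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            row = json.loads(line)
            work = row.get("work")  # hypothetical key name
            if work:
                yield row["citekey"], work
```

Dumping each record with PyYAML's `yaml.safe_dump` to `bib/derivatives/openalex/{citekey}.yml` would complete the persistence step; keeping extraction separate from I/O makes it easy to test against saved OpenAlex responses.
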
### Tasks
- **Enrichment storage**: `packages/biblio/src/biblio/openalex/openalex_enrich.py` (new)
  - `enrich_resolved(root)`: read `resolved.jsonl` and extract topics, keywords, type, and retraction status per citekey
  - Write per-citekey YAML to `bib/derivatives/openalex/{citekey}.yml`:
    ```yaml
    citekey: smith2024
    type: article
    is_retracted: false
    primary_topic:
      name: "Sharp-Wave Ripples"
      subfield: "Behavioral Neuroscience"
      field: "Psychology"
      domain: "Social Sciences"
      score: 0.92
    topics: [...]
    keywords: [...]
    counts_by_year: {2024: 5, 2023: 12, ...}
    ```
- **Topic → tag mapping**: `packages/biblio/src/biblio/openalex/topic_tags.py` (new)
  - Map the OpenAlex topic hierarchy to the biblio tag vocabulary
  - Generate tags like `domain:neuroscience`, `field:behavioral-neuroscience`, `topic:sharp-wave-ripples`
  - Auto-populate `library.yml` tags from topics (opt-in, configurable)
- **Integration with autotag**: modify `packages/biblio/src/biblio/autotag.py`
  - If OpenAlex topics exist for a citekey, use them as context/seed in the LLM prompt
  - This should make LLM tagging more accurate and avoid re-deriving a classification OpenAlex already provides
- **MCP tools**:
  - `biblio_enrich(citekeys)`: run enrichment for specific citekeys
  - `biblio_enrich_all()`: run enrichment for all resolved papers
  - Update `paper_context()` to include topic data
- **Pipeline integration**: update the ingest pipeline documentation so that `biblio_enrich` runs after `openalex_resolve` and before the downstream steps:
  - `openalex_resolve` → `biblio_enrich` → `graph_expand`, `docling`, etc.
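
A minimal sketch of the topic → tag mapping and the autotag seed, operating on the flattened record shape from the YAML example in the task list (`name`/`subfield`/`field`/`domain`/`score` per topic). All function names, the `subfield:` prefix, and the `min_score=0.5` cutoff are hypothetical. The prefixes here mirror OpenAlex's levels one-to-one, whereas the spec's example tags put a subfield name under `field:`, so the final mapping onto the biblio vocabulary remains a choice for this task.

```python
import re
import unicodedata
from typing import Any

# Hypothetical prefix scheme: one tag per OpenAlex level, broadest first.
_LEVELS = ("domain", "field", "subfield")


def slugify(name: str) -> str:
    """Lowercase, strip accents, and hyphenate: 'Sharp-Wave Ripples' -> 'sharp-wave-ripples'."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def _topics(record: dict[str, Any]) -> list[dict[str, Any]]:
    """Use the full topic list, falling back to primary_topic if it is missing."""
    topics = record.get("topics") or []
    if not topics and record.get("primary_topic"):
        topics = [record["primary_topic"]]
    return topics


def topic_tags(record: dict[str, Any], min_score: float = 0.5) -> list[str]:
    """Turn one enrichment record into prefixed tags, skipping low-scoring topics."""
    tags: list[str] = []
    for topic in _topics(record):
        if (topic.get("score") or 0.0) < min_score:
            continue
        for level in _LEVELS:
            if topic.get(level):
                tags.append(f"{level}:{slugify(topic[level])}")
        if topic.get("name"):
            tags.append(f"topic:{slugify(topic['name'])}")
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(tags))


def merge_tags(existing: list[str], new: list[str]) -> list[str]:
    """Opt-in auto-populate: append new tags after the existing ones, no duplicates."""
    return list(dict.fromkeys([*existing, *new]))


def topic_prompt_context(record: dict[str, Any]) -> str:
    """Render topics as a short seed block for an LLM tagging prompt (empty if none)."""
    lines = [
        f"- {t['name']} ({t.get('domain')} / {t.get('field')} / {t.get('subfield')})"
        for t in _topics(record)
        if t.get("name")
    ]
    return "OpenAlex topics:\n" + "\n".join(lines) if lines else ""
```

Appending rather than replacing in `merge_tags` keeps hand-curated `library.yml` tags first and untouched, which suits an opt-in auto-populate setting; an empty `topic_prompt_context` lets autotag fall back to its current prompt when no enrichment exists.
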
### Key files
- `packages/biblio/src/biblio/openalex/openalex_resolve.py`: reads `resolved.jsonl`
- `packages/biblio/src/biblio/autotag.py`: LLM tagging, to be augmented with topics
- `packages/biblio/src/biblio/concepts.py`: LLM concept extraction, to compare against and complement with topics
- `packages/biblio/src/biblio/library.py`: where tags are stored
- `packages/biblio/src/biblio/mcp.py`: MCP wrappers
- `docs/specs/biblio/enrichment-pipeline.md`
- `docs/specs/biblio/concept-topic-overlap.md`
### Related Notes
- issue-arash-20260403-193112-105596.md — Direct spec for the enrichment pipeline being implemented here
- issue-arash-20260403-193037-589959.md — Audit of concept tagging vs OpenAlex topic classification — the overlap analysis this pipeline addresses
- issue-arash-20260404-014840-332631.md — Companion OpenAlex API P1 fixes needed for the resolution step this pipeline depends on
- issue-arash-20260403-210020-146442.md — Zotero push-tags writes enrichments back — downstream consumer of the topics this pipeline produces
- issue-arash-20260403-193002-484673.md — OpenAlex API audit that informs which topic fields are reliably available for enrichment