## Audit: biblio concept tagging vs OpenAlex topic classification (overlap analysis)
### Goal
(promoted from note)
### Context
(see source note)
### Prompt
Fix the issue described in /storage2/arash/projects/projio/docs/log/issue/issue-arash-20260403-193037-589959.md. Read it first, then implement the proposed fix.
### Acceptance Criteria
- [ ]
### Result
Completed in 151.8s (1 step, model: opus)
Session: 8d0b7222-f231-4935-aa1c-e04819a56b0a
Result note: /storage2/arash/worklog/workflow/captures/20260403-193715-8bcc14/note.md
### Output
Done. The analysis is at `docs/specs/biblio/concept-topic-overlap.md`.
**Key finding:** The systems are **complementary, not redundant**. OpenAlex provides free hierarchical domain classification (~4,500 topics in a four-level domain > field > subfield > topic tree), while biblio's LLM extraction captures fine-grained details that OpenAlex cannot: named datasets, specific metrics, techniques, and project-specific vocabulary. The two overlap meaningfully only at the broad domain level.
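To make the "four-level tree" concrete, here is a minimal sketch of flattening a work's `topics` list into domain > field > subfield > topic paths. It assumes the resolver keeps the raw OpenAlex work JSON, whose topic entries carry `display_name`, `score`, and nested `subfield`, `field`, and `domain` objects. `topic_paths` is a hypothetical helper, and the sample topic ID and name are placeholders, not data from the audit.

```python
from typing import Any

# Illustrative work dict in the shape of OpenAlex's `topics` field; the topic
# ID and name are placeholders, not data from the audit.
work: dict[str, Any] = {
    "topics": [
        {
            "id": "https://openalex.org/T99999",
            "display_name": "Example topic",
            "score": 0.98,
            "subfield": {"id": "https://openalex.org/subfields/2805", "display_name": "Cognitive Neuroscience"},
            "field": {"id": "https://openalex.org/fields/28", "display_name": "Neuroscience"},
            "domain": {"id": "https://openalex.org/domains/1", "display_name": "Life Sciences"},
        }
    ]
}


def topic_paths(work: dict[str, Any]) -> list[tuple[str, str, str, str, float]]:
    """Flatten each entry of a work's `topics` list into a
    (domain, field, subfield, topic, score) tuple, highest score first."""
    paths = [
        (
            t["domain"]["display_name"],
            t["field"]["display_name"],
            t["subfield"]["display_name"],
            t["display_name"],
            float(t.get("score") or 0.0),
        )
        for t in work.get("topics") or []
    ]
    return sorted(paths, key=lambda p: p[4], reverse=True)


for *path, score in topic_paths(work):
    print(" > ".join(path), f"(score {score:.2f})")
```

Only the top levels of such a path (domain, field) resemble what concept extraction would produce; the leaf topic is still far coarser than a named dataset or metric.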
**Recommendation:** Layer them: OpenAlex topics as a free baseline (layer 0), autotag propagation (layer 1), autotag LLM (layer 2), and concept extraction (layer 3). Biblio already fetches `topics` from OpenAlex in the resolver, so Phase 1 requires no new API calls.
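One way to read the layering is as a union of tag sources that keeps provenance, so a tag confirmed by the free OpenAlex baseline can be distinguished from one only an LLM layer proposed. The sketch below is hypothetical: the source labels, the `merge_layers` helper, and the lowercase normalisation are assumptions, not existing biblio behaviour.

```python
from collections import defaultdict

# Layer numbers follow the recommendation; the source labels are hypothetical.
LAYER_SOURCES = {
    0: "openalex_topics",
    1: "autotag_propagation",
    2: "autotag_llm",
    3: "concept_extraction",
}


def merge_layers(layer_tags: dict[int, list[str]]) -> dict[str, list[str]]:
    """Union the tags from every layer, recording which layers proposed each one.

    Layers are visited from 0 upward, so each provenance list starts with the
    free OpenAlex baseline when it is present. Tags are stripped and lowercased
    (an assumed normalisation) so the same label from two layers merges.
    """
    provenance: dict[str, list[str]] = defaultdict(list)
    for layer in sorted(layer_tags):
        source = LAYER_SOURCES[layer]
        for tag in layer_tags[layer]:
            key = tag.strip().lower()
            if source not in provenance[key]:
                provenance[key].append(source)
    return dict(provenance)


merged = merge_layers({
    0: ["Cognitive Neuroscience"],
    1: ["cognitive neuroscience"],
    3: ["hippocampal replay"],
})
print(merged)
```

Keeping provenance rather than overwriting lets later layers stay optional: if the LLM layers are skipped for cost reasons, the layer 0 tags are still there.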
## Next Steps
1. Implement Phase 1: extract and persist OpenAlex topics from the existing resolver data into `bib/derivatives/openalex/{citekey}.yml` so they're available for downstream tools.
2. Add `openalex_mappings` to `tag_vocab.yml` to map OpenAlex subfield IDs to the autotag controlled vocabulary, enabling zero-cost topic-to-tag conversion.
3. Add OpenAlex topic data to the concept index and RAG corpus so `rag_query` and `search_concepts` can surface OpenAlex-derived classifications alongside LLM-extracted concepts.
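Step 1 can be sketched as follows, assuming the resolver keeps the raw OpenAlex work JSON (with its `id` and `topics` fields) and that `bib/` is the bibliography root. `openalex_record` and `persist_topics` are hypothetical names, and the record layout is an assumption rather than an existing biblio format. The file is written with the stdlib `json` module because JSON is a subset of YAML 1.2, which avoids a third-party dependency in the sketch.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def openalex_record(work: dict[str, Any]) -> dict[str, Any]:
    """Keep only the topic fields downstream tools need from an OpenAlex work."""
    return {
        "openalex_id": work.get("id"),
        "topics": [
            {
                "id": t["id"],
                "name": t["display_name"],
                "score": t.get("score"),
                "subfield": t["subfield"]["id"],
                "field": t["field"]["id"],
                "domain": t["domain"]["id"],
            }
            for t in work.get("topics") or []
        ],
    }


def persist_topics(bib_dir: Path, citekey: str, work: dict[str, Any]) -> Path:
    """Write the topic record to <bib_dir>/derivatives/openalex/<citekey>.yml."""
    out = bib_dir / "derivatives" / "openalex" / f"{citekey}.yml"
    out.parent.mkdir(parents=True, exist_ok=True)
    # JSON output is also valid YAML 1.2; a real implementation might prefer
    # PyYAML for block-style output.
    out.write_text(json.dumps(openalex_record(work), indent=2) + "\n", encoding="utf-8")
    return out


# Illustrative input; the IDs and the citekey are placeholders.
work = {
    "id": "https://openalex.org/W0000000000",
    "topics": [
        {
            "id": "https://openalex.org/T99999",
            "display_name": "Example topic",
            "score": 0.98,
            "subfield": {"id": "https://openalex.org/subfields/2805", "display_name": "Cognitive Neuroscience"},
            "field": {"id": "https://openalex.org/fields/28", "display_name": "Neuroscience"},
            "domain": {"id": "https://openalex.org/domains/1", "display_name": "Life Sciences"},
        }
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    written = persist_topics(Path(tmp) / "bib", "example2024", work)
    saved = json.loads(written.read_text(encoding="utf-8"))
    print(written.relative_to(tmp))
```

Storing IDs for subfield, field, and domain (rather than only names) keeps the file stable if OpenAlex renames a category, and gives step 2 a key to map on.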
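For step 2, one possible shape for `openalex_mappings` is a dictionary keyed by OpenAlex subfield ID, which keeps the table small (a few hundred subfields instead of roughly 4,500 topics). The sketch shows that mapping already parsed from `tag_vocab.yml`; the schema, the two example entries, the helper `tags_from_topics`, and its `min_score` cut-off are all assumptions for illustration.

```python
from typing import Any

# Hypothetical shape for the proposed `openalex_mappings` section of
# tag_vocab.yml, shown already parsed. Keys are OpenAlex subfield IDs and values
# are autotag tags; both entries are illustrative, not the real vocabulary.
TAG_VOCAB: dict[str, Any] = {
    "openalex_mappings": {
        "https://openalex.org/subfields/2805": ["cognitive-neuroscience"],
        "https://openalex.org/subfields/1702": ["artificial-intelligence"],
    }
}


def tags_from_topics(
    topics: list[dict[str, Any]],
    vocab: dict[str, Any],
    min_score: float = 0.5,
) -> list[str]:
    """Map OpenAlex topics to controlled-vocabulary tags via their subfield ID.

    Topics below `min_score` (an assumed cut-off) and subfields with no mapping
    are skipped; tags keep first-seen order without duplicates.
    """
    mappings = vocab.get("openalex_mappings", {})
    tags: list[str] = []
    for topic in topics:
        if (topic.get("score") or 0.0) < min_score:
            continue
        for tag in mappings.get(topic["subfield"]["id"], []):
            if tag not in tags:
                tags.append(tag)
    return tags


# Topics trimmed to the fields the function reads; subfield 9999 is a
# placeholder for an unmapped subfield.
topics = [
    {"score": 0.95, "subfield": {"id": "https://openalex.org/subfields/2805"}},
    {"score": 0.30, "subfield": {"id": "https://openalex.org/subfields/1702"}},
    {"score": 0.90, "subfield": {"id": "https://openalex.org/subfields/9999"}},
]
print(tags_from_topics(topics, TAG_VOCAB))
```

Because the lookup is a plain dictionary access, the conversion costs nothing per paper once the mapping exists; the open question is only how to curate the table.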
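For step 3, OpenAlex topics could be turned into concept-index entries and a plain-text RAG chunk, each marked with `source="openalex"` so `search_concepts` and `rag_query` results can be told apart from LLM-extracted concepts. The entry fields and the helpers `concept_entries` and `rag_text` are hypothetical; the real index and corpus formats may differ.

```python
from typing import Any

LEVELS = ("domain", "field", "subfield", "topic")


def concept_entries(citekey: str, topics: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Emit one entry per distinct (level, name) pair across a paper's topics.

    Each entry carries source="openalex" so search results can be told apart
    from LLM-extracted concepts. A shared domain/field/subfield is emitted once,
    with the score of the first topic that mentioned it.
    """
    entries: list[dict[str, Any]] = []
    seen: set[tuple[str, str]] = set()
    for t in topics:
        names = {
            "domain": t["domain"]["display_name"],
            "field": t["field"]["display_name"],
            "subfield": t["subfield"]["display_name"],
            "topic": t["display_name"],
        }
        for level in LEVELS:
            if (level, names[level]) in seen:
                continue
            seen.add((level, names[level]))
            entries.append({
                "citekey": citekey,
                "concept": names[level],
                "level": level,
                "source": "openalex",
                "score": t.get("score"),
            })
    return entries


def rag_text(citekey: str, topics: list[dict[str, Any]]) -> str:
    """Render the topics as a short plain-text chunk for a RAG corpus."""
    lines = [f"OpenAlex topics for {citekey}:"]
    for t in topics:
        score = float(t.get("score") or 0.0)
        path = " > ".join(
            [t["domain"]["display_name"], t["field"]["display_name"], t["subfield"]["display_name"]]
        )
        lines.append(f"- {t['display_name']} ({path}), score {score:.2f}")
    return "\n".join(lines)


def _topic(name: str, score: float) -> dict[str, Any]:
    # Illustrative topic sharing one domain/field/subfield; names are placeholders.
    return {
        "display_name": name,
        "score": score,
        "subfield": {"display_name": "Cognitive Neuroscience"},
        "field": {"display_name": "Neuroscience"},
        "domain": {"display_name": "Life Sciences"},
    }


topics = [_topic("Example topic A", 0.98), _topic("Example topic B", 0.71)]
entries = concept_entries("example2024", topics)
print(len(entries))
print(rag_text("example2024", topics))
```

Deduplicating by level keeps broad categories from flooding the index when several of a paper's topics share a subfield.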