Audit: biblio concept tagging vs OpenAlex topic classification — overlap analysis

Biblio has LLM-based concept extraction (packages/biblio/src/biblio/concepts.py) that uses Claude to extract methods, datasets, metrics, domains, and techniques from papers. OpenAlex has its own concept/topic classification system. Determine whether the two systems are redundant and whether biblio should use OpenAlex topics instead of (or alongside) the LLM approach.

Scope

  1. OpenAlex topic system. Use RAG to query the indexed openalex-concept-tagging repo and openalex-docs to understand:
     • What is OpenAlex's topic/concept hierarchy? (levels, granularity)
     • How are topics assigned to works? (ML model? rule-based?)
     • What fields are returned per work? (topics, concepts, keywords)
     • Coverage: do all works have topics?

  2. Biblio concept system. Read packages/biblio/src/biblio/concepts.py to understand:
     • What categories does biblio extract? (methods, datasets, metrics, domains, techniques)
     • How are they extracted? (LLM prompt → structured output)
     • Where are they stored? (bib/derivatives/concepts/)
     • How are they used? (concept index, concept search)

  3. Overlap analysis. Compare:
     • Do OpenAlex topics cover the same ground as biblio concepts?
     • Are biblio's LLM-extracted concepts more specific/useful for research workflows?
     • Cost: LLM calls per paper vs free metadata from OpenAlex
     • Could biblio use OpenAlex topics as a baseline and LLM concepts as enrichment?
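
One way to put a number on the overlap question is a per-paper Jaccard score between normalized label sets. The following is a minimal sketch, assuming biblio concepts and OpenAlex topic names have already been loaded as plain strings. `normalize`, `jaccard`, and `overlap_report` are hypothetical helpers (not part of biblio or OpenAlex), and the sample labels are illustrative only.

```python
import re


def normalize(label: str) -> str:
    """Lowercase and collapse punctuation/whitespace so labels compare loosely."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_report(biblio_concepts: list[str], openalex_topics: list[str]) -> dict:
    """Split labels into shared / biblio-only / OpenAlex-only after normalization."""
    b = {normalize(x) for x in biblio_concepts}
    o = {normalize(x) for x in openalex_topics}
    return {
        "shared": sorted(b & o),
        "biblio_only": sorted(b - o),
        "openalex_only": sorted(o - b),
        "jaccard": jaccard(b, o),
    }


# Illustrative labels only (not taken from real biblio or OpenAlex output).
report = overlap_report(
    ["Transformer", "ImageNet", "top-1 accuracy", "Computer Vision"],
    ["computer vision", "Neural Network Architectures"],
)
```

Exact string matching will understate semantic overlap (a dataset name and a broad topic rarely share a label), so a low score here would suggest the two systems are complementary rather than prove it.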

Output

Write findings to docs/specs/biblio/concept-topic-overlap.md with:

  • Side-by-side comparison table
  • Recommendation: replace, complement, or keep separate
  • If complement: how to integrate OpenAlex topics into biblio's data model
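
If the recommendation turns out to be "complement", one possible shape for the integrated record keeps OpenAlex topics as a baseline and LLM concepts as optional enrichment. This is a sketch under stated assumptions: `TopicRef`, `PaperTags`, and all field names are hypothetical and do not reflect biblio's current storage format or OpenAlex's exact response schema. Only the five concept categories come from the scope above.

```python
from dataclasses import dataclass, field

# Categories named in the audit scope for biblio's LLM extraction.
CONCEPT_CATEGORIES = ("methods", "datasets", "metrics", "domains", "techniques")


@dataclass(frozen=True)
class TopicRef:
    """Hypothetical reference to one OpenAlex topic attached to a work."""
    topic_id: str
    name: str
    score: float


@dataclass
class PaperTags:
    """Hypothetical merged record: baseline topics plus optional LLM enrichment."""
    citekey: str
    openalex_topics: list[TopicRef] = field(default_factory=list)
    llm_concepts: dict[str, list[str]] = field(
        default_factory=lambda: {c: [] for c in CONCEPT_CATEGORIES}
    )

    def needs_llm(self) -> bool:
        """True when no LLM concepts have been extracted yet."""
        return not any(self.llm_concepts.values())

    def all_labels(self) -> list[str]:
        """Flat, de-duplicated label list for a shared concept index."""
        labels = [t.name for t in self.openalex_topics]
        for cat in CONCEPT_CATEGORIES:
            labels.extend(self.llm_concepts.get(cat, []))
        return list(dict.fromkeys(labels))  # keeps first-seen order


# Placeholder values for illustration; "T0001" is not a real topic ID.
tags = PaperTags(citekey="example2024")
tags.openalex_topics.append(TopicRef("T0001", "Example Topic", 0.9))
tags.llm_concepts["datasets"].append("Example Dataset")
```

Keeping the two sources in separate fields means the free baseline can be populated for every paper, while `needs_llm()` lets the paid extraction run only where enrichment is wanted.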

Key files

  • packages/biblio/src/biblio/concepts.py
  • packages/biblio/src/biblio/autotag.py (also uses LLM for tagging)
  • .projio/codio/mirrors/ourresearch--openalex-concept-tagging/ (indexed in RAG)
  • .projio/codio/mirrors/ourresearch--openalex-docs/ (indexed in RAG)