Issue arash 20260403 193112 105596
title: "## Spec: biblio enrichment pipeline redesign"
status: done
created: 2026-04-03
updated: 2026-04-03
timestamp: 20260403-193112-105596
tags: [issue]
source: agent-observation
project_primary: projio
capture_id: 20260403-193110-0173cb
confidence: 1.0
transcript_file: /storage2/arash/worklog/workflow/captures/20260403-193110-0173cb/transcript.txt
Spec: biblio enrichment pipeline redesign
Based on auditing biblio's OpenAlex integration (see openalex-audit.md once available), design an improved enrichment pipeline that makes better use of OpenAlex metadata.
Current pipeline
srcbib/*.bib → biblio_merge → openalex_resolve (DOI/title → OpenAlex ID) → graph_expand (references/citing)
The current pipeline extracts only: DOI, title, year, authors, cited_by_count, OA status, referenced_works.
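As a rough illustration of that extraction step, the sketch below pulls the seven fields listed above out of a single OpenAlex work record. The function name `extract_current_fields` and the output key names are hypothetical and do not reflect biblio's actual code; the input keys assume the Work object layout described in the OpenAlex documentation, which the audit should confirm.

```python
from typing import Any


def extract_current_fields(work: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch: keep only the fields the current pipeline uses."""
    # Nested objects may be missing or null in API responses, so default them.
    open_access = work.get("open_access") or {}
    authorships = work.get("authorships") or []
    return {
        "doi": work.get("doi"),
        "title": work.get("title") or work.get("display_name"),
        "year": work.get("publication_year"),
        "authors": [
            (a.get("author") or {}).get("display_name") for a in authorships
        ],
        "cited_by_count": work.get("cited_by_count", 0),
        "oa_status": open_access.get("oa_status"),
        "referenced_works": list(work.get("referenced_works") or []),
    }
```

Everything else in the response (topics, yearly citation counts, grants, author details) is dropped at this point, which is the gap the proposed additions target.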
Proposed pipeline additions
Design specs for:
- Topic enrichment: After resolution, extract OpenAlex topics and persist them. Map to biblio's tag vocabulary where possible. Consider replacing or supplementing LLM concept extraction.
- Author model enrichment: Currently AuthorRecord is thin (name, affiliation, h-index). Spec what biblio should persist from OpenAlex's rich author data:
  - Affiliations over time
  - Topic profile
  - Co-author network (for lab discovery)
  - ORCID linkage
- Citation trend enrichment: OpenAlex provides counts_by_year. Spec how to store and surface citation trajectories (useful for identifying rising/declining papers).
- Funder/grant enrichment: OpenAlex links works to funders. Spec whether biblio should track this (useful for grant reporting).
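To make three of these additions concrete (topics, citation trends, funders), here is a minimal sketch of the extraction logic. All function names, the tag lookup table, the `window` size, and the 20% thresholds are hypothetical choices for illustration, not existing biblio code. The input shapes assume a Work object with `topics`, `counts_by_year`, and `grants` lists as described in the OpenAlex documentation; the audit should confirm them before the spec relies on them.

```python
from typing import Any


def extract_topics(work: dict[str, Any], min_score: float = 0.0) -> list[dict[str, Any]]:
    """Flatten a work's `topics` list into records, highest score first."""
    records = []
    for topic in work.get("topics") or []:
        score = topic.get("score") or 0.0
        if score < min_score:
            continue
        records.append({
            "id": topic.get("id"),
            "name": topic.get("display_name"),
            "score": score,
            "field": (topic.get("field") or {}).get("display_name"),
        })
    return sorted(records, key=lambda r: r["score"], reverse=True)


def map_to_tags(topics: list[dict[str, Any]], vocabulary: dict[str, str]) -> list[str]:
    """Map topic names to tags through a hypothetical lowercase lookup table."""
    tags: list[str] = []
    for topic in topics:
        tag = vocabulary.get((topic["name"] or "").lower())
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def citation_trend(counts_by_year: list[dict[str, int]], window: int = 3) -> str:
    """Compare citations in the latest `window` years with the `window` before.

    Caveat: the newest entry may be a partial year, which biases the result
    toward "declining"; a real implementation might drop it first.
    """
    by_year = sorted(
        (c["year"], c.get("cited_by_count", 0)) for c in counts_by_year or []
    )
    if len(by_year) < 2 * window:
        return "insufficient-data"
    recent = sum(n for _, n in by_year[-window:])
    earlier = sum(n for _, n in by_year[-2 * window:-window])
    # The 20% margins are arbitrary illustration values.
    if recent > 1.2 * earlier:
        return "rising"
    if recent < 0.8 * earlier:
        return "declining"
    return "stable"


def extract_funders(work: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect funder name and award id from the work's `grants` list."""
    return [
        {"funder": g.get("funder_display_name"), "award_id": g.get("award_id")}
        for g in work.get("grants") or []
    ]
```

Storing the raw `counts_by_year` list alongside the derived label would let the thresholds change later without re-fetching from OpenAlex. Author enrichment is left out of the sketch because it needs a separate request per author rather than fields already present on the work.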
Output
Write the spec to docs/specs/biblio/enrichment-pipeline.md. For each proposed addition:
- What data is available from OpenAlex
- Where it would be stored in biblio's data model
- Which MCP tools would expose it
- Priority (must-have vs nice-to-have)
Reference the discovery model spec at packages/biblio/docs/explanation/discovery.md for the overall philosophy.
Key files
- packages/biblio/src/biblio/openalex/openalex_resolve.py (current resolution pipeline)
- packages/biblio/src/biblio/graph.py (current graph expansion)
- packages/biblio/src/biblio/author_search.py (current author model)
- packages/biblio/src/biblio/library.py (library ledger, where metadata lives)
- docs/specs/biblio/bib-architecture.md (current architecture spec)
Related Notes
- issue-arash-20260403-193037-589959.md — Direct prerequisite: the concept tagging vs OpenAlex topic audit informs the topic enrichment design in this spec
- issue-arash-20260403-193002-484673.md — Direct prerequisite: the OpenAlex API capabilities audit is the basis for this enrichment pipeline redesign
- issue-arash-20260402-220025-468258.md — Related spec work on bib architecture — enrichment pipeline builds on the sources vs artifacts separation
- issue-arash-20260402-220152-539138.md — biblio_compile tool is part of the pipeline this spec extends