## Spec: biblio enrichment pipeline redesign
Based on auditing biblio's OpenAle
## Goal
(promoted from note)
## Context
(see source note)
## Prompt
Fix the issue described in /storage2/arash/projects/projio/docs/log/issue/issue-arash-20260403-193112-105596.md. Read it first, then implement the proposed fix.
## Acceptance Criteria
- [ ]
## Result
Completed in 133.3s (1 steps, model: opus)
Session: d57d2682-ef1e-4ed3-8e44-defbb233b35d
Result note: /storage2/arash/worklog/workflow/captures/20260403-194002-5bfd05/note.md
## Output
Spec written to `docs/specs/biblio/enrichment-pipeline.md` and issue marked done.
The spec covers all four proposed additions with priorities, data models, storage locations, MCP tool changes, and a phased implementation plan:
1. **Topic enrichment (P1)** — persist already-fetched OpenAlex topics/keywords as per-citekey YAML + cross-paper topic index; layer with existing autotag/concepts systems
2. **Author model enrichment (P1–P2)** — extend `AuthorRecord` with full affiliation history, author-level topics, counts-by-year; cache author lookups
3. **Citation trend enrichment (P2)** — persist `counts_by_year`, FWCI, citation percentile; add rising/declining paper analysis
4. **Funder/grant enrichment (P3)** — persist grant metadata, build funder index for grant reporting
Key design decision: enrichment happens during resolution (same API call), not as a separate pipeline step — the data is already in the response, just being discarded.
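A minimal sketch of what "enrichment during resolution" could look like, assuming the resolver already holds the OpenAlex work response as a plain dict. The function names (`extract_enrichment`, `build_topic_index`, `citation_trend`) and the output keys are hypothetical and not taken from biblio. The response field names (`topics`, `keywords`, `counts_by_year`, `fwci`, `citation_normalized_percentile`, `grants`) follow the OpenAlex Work object as I understand it and should be checked against the spec. Writing the result to per-citekey YAML is left out to keep the example free of third-party dependencies.

```python
from collections import defaultdict
from typing import Any


def extract_enrichment(work: dict[str, Any]) -> dict[str, Any]:
    """Collect enrichment fields from an already-fetched OpenAlex work dict."""
    percentile = work.get("citation_normalized_percentile") or {}
    return {
        "topics": [
            {
                "name": t.get("display_name"),
                "score": t.get("score"),
                "field": (t.get("field") or {}).get("display_name"),
                "domain": (t.get("domain") or {}).get("display_name"),
            }
            for t in work.get("topics") or []
        ],
        "keywords": [k.get("display_name") for k in work.get("keywords") or []],
        "counts_by_year": {
            c["year"]: c["cited_by_count"] for c in work.get("counts_by_year") or []
        },
        "fwci": work.get("fwci"),
        "citation_percentile": percentile.get("value"),
        "grants": [
            {"funder": g.get("funder_display_name"), "award_id": g.get("award_id")}
            for g in work.get("grants") or []
        ],
    }


def build_topic_index(enriched: dict[str, dict[str, Any]]) -> dict[str, list[str]]:
    """Map each topic name to the sorted citekeys that carry it."""
    index: dict[str, set[str]] = defaultdict(set)
    for citekey, record in enriched.items():
        for topic in record["topics"]:
            if topic["name"]:
                index[topic["name"]].add(citekey)
    return {name: sorted(keys) for name, keys in sorted(index.items())}


def citation_trend(counts_by_year: dict[int, int], window: int = 2) -> str:
    """Compare the latest `window` years of citations with the window before it."""
    years = sorted(counts_by_year)
    if len(years) < 2 * window:
        return "insufficient-data"
    recent = sum(counts_by_year[y] for y in years[-window:])
    previous = sum(counts_by_year[y] for y in years[-2 * window:-window])
    if recent > previous:
        return "rising"
    if recent < previous:
        return "declining"
    return "flat"
```

Because every field is read from the same response dict, no second API call is needed. The trend rule here (two-year windows, simple sum comparison) is only one possible definition of rising and declining papers.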
## Next Steps
1. Implement Phase 1: add `keywords`, `type`, `is_retracted` to `DEFAULT_SELECT`, extend `WorkRecord` and `_extract_work`, and write per-citekey OpenAlex YAML during resolution.
2. Implement the `biblio_topic_search` MCP tool to let agents filter the library by OpenAlex topic/field/domain.
3. Fix the `WorkRecord` / `_extract_work` inconsistency with `_work_to_minimal` so both extract topics uniformly.