
## Spec: biblio enrichment pipeline redesign

Based on auditing biblio's OpenAle

Goal

(promoted from note)

Context

(see source note)

Prompt

Fix the issue described in /storage2/arash/projects/projio/docs/log/issue/issue-arash-20260403-193112-105596.md. Read it first, then implement the proposed fix.

Acceptance Criteria

  • [ ]

Result

Completed in 133.3s (1 steps, model: opus)

Session: d57d2682-ef1e-4ed3-8e44-defbb233b35d

Result note: /storage2/arash/worklog/workflow/captures/20260403-194002-5bfd05/note.md

Output

Spec written to `docs/specs/biblio/enrichment-pipeline.md` and issue marked done.

The spec covers all four proposed additions with priorities, data models, storage locations, MCP tool changes, and a phased implementation plan:

1. **Topic enrichment (P1)**: persist the already-fetched OpenAlex topics and keywords as per-citekey YAML plus a cross-paper topic index; layer these on top of the existing autotag/concepts systems
2. **Author model enrichment (P1–P2)**: extend `AuthorRecord` with full affiliation history, author-level topics, and `counts_by_year`; cache author lookups
3. **Citation trend enrichment (P2)**: persist `counts_by_year`, FWCI, and citation percentile; add rising/declining paper analysis
4. **Funder/grant enrichment (P3)**: persist grant metadata and build a funder index for grant reporting
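
The rising/declining analysis in item 3 could be sketched roughly as follows. This is a minimal illustration, not the method defined in the spec: the function name `citation_trend`, the two-year window, and the 25% tolerance are all hypothetical. It assumes each `counts_by_year` entry carries `year` and `cited_by_count` keys, as in OpenAlex work records.

```python
from statistics import mean


def citation_trend(counts_by_year, recent_years=2, tolerance=0.25):
    """Label a paper 'rising', 'declining', 'flat', or 'unknown'.

    Hypothetical heuristic: compare the mean yearly citation count of the
    last `recent_years` years against the mean of all earlier years.
    """
    # Sort oldest to newest so the order of the API response does not matter.
    rows = sorted(counts_by_year, key=lambda row: row["year"])
    counts = [row["cited_by_count"] for row in rows]

    # Without at least one earlier year there is nothing to compare against.
    if len(counts) <= recent_years:
        return "unknown"

    earlier = mean(counts[:-recent_years])
    recent = mean(counts[-recent_years:])

    # Avoid dividing by zero when the paper had no earlier citations.
    if earlier == 0:
        return "rising" if recent > 0 else "flat"

    change = (recent - earlier) / earlier
    if change > tolerance:
        return "rising"
    if change < -tolerance:
        return "declining"
    return "flat"
```

One design caveat: the current calendar year is usually incomplete, so a real implementation would probably drop or scale it before comparing, otherwise most papers would lean toward "declining".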

Key design decision: enrichment happens during resolution, in the same API call, rather than as a separate pipeline step. The data is already in the response and is currently being discarded.
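
A minimal sketch of what "enrich during resolution" might look like, assuming the resolver already holds the fetched work payload as a dict. The names `ENRICHMENT_FIELDS`, `split_response`, and `resolve`, as well as the in-memory `store`, are hypothetical stand-ins; the actual spec writes per-citekey YAML, which is left out here to keep the example dependency-free. The field names are assumed from the OpenAlex work schema and should be checked against the API.

```python
# Hypothetical list of response fields that the enrichment layer keeps.
# Names are assumed from the OpenAlex work schema, not taken from biblio.
ENRICHMENT_FIELDS = ("topics", "keywords", "counts_by_year", "fwci", "grants")


def split_response(work):
    """Split one fetched work payload into (core, enrichment) dicts."""
    # Keep only enrichment fields that are present and not null.
    enrichment = {
        key: work[key] for key in ENRICHMENT_FIELDS if work.get(key) is not None
    }
    # Everything else stays with the core record used for resolution.
    core = {key: value for key, value in work.items() if key not in ENRICHMENT_FIELDS}
    return core, enrichment


def resolve(citekey, work, store):
    """Resolve a work and persist its enrichment data in the same pass.

    `store` is a plain dict standing in for the per-citekey YAML files.
    """
    core, enrichment = split_response(work)
    if enrichment:
        store[citekey] = enrichment  # stand-in for writing <citekey>.yaml
    return core
```

Because both halves come from one payload, no second request is needed; the only extra cost is the write.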

## Next Steps

1. Implement Phase 1: add `keywords`, `type`, `is_retracted` to `DEFAULT_SELECT`, extend `WorkRecord` and `_extract_work`, and write per-citekey OpenAlex YAML during resolution.
2. Implement the `biblio_topic_search` MCP tool to let agents filter the library by OpenAlex topic/field/domain.
3. Resolve the inconsistency between `WorkRecord` / `_extract_work` and `_work_to_minimal` so that both paths extract topics uniformly.
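
Steps 2 and 3 could share a single topic-flattening helper, sketched below. This is an illustration under stated assumptions, not biblio's actual code: `extract_topics` and `topic_search` are hypothetical names, the `library` argument is a plain dict mapping citekeys to flattened topics, and the nested `subfield` / `field` / `domain` objects with a `display_name` key are assumed from the OpenAlex topic schema. How this would be registered as the `biblio_topic_search` MCP tool is not shown.

```python
TOPIC_LEVELS = ("topic", "subfield", "field", "domain")


def extract_topics(work):
    """Flatten a work's `topics` list into plain dicts.

    Intended as the one helper both extraction paths would call, so that
    they cannot drift apart again.
    """
    flattened = []
    for topic in work.get("topics") or []:
        flattened.append({
            "topic": topic.get("display_name"),
            "score": topic.get("score"),
            "subfield": (topic.get("subfield") or {}).get("display_name"),
            "field": (topic.get("field") or {}).get("display_name"),
            "domain": (topic.get("domain") or {}).get("display_name"),
        })
    return flattened


def topic_search(library, query, level="topic"):
    """Return sorted citekeys with a topic matching `query` at `level`.

    Matching is a case-insensitive substring test, chosen for simplicity.
    """
    if level not in TOPIC_LEVELS:
        raise ValueError(f"level must be one of {TOPIC_LEVELS}")
    needle = query.casefold()
    hits = []
    for citekey, topics in library.items():
        if any(needle in (t.get(level) or "").casefold() for t in topics):
            hits.append(citekey)
    return sorted(hits)
```

For example, `topic_search(library, "neuroscience", level="field")` would return every citekey with at least one topic whose field name contains "neuroscience".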