biblio: pool-aware derivative resolution
Goal
Stop regenerating docling/grobid/openalex outputs that already exist in the shared pool. The shared pool at `/storage/share/sirocampus/bib/` already has derivatives for most papers, so reusing them saves compute time, API calls, and storage.
Context
- Pool PDF search works (`pool.search` in config, used by `fetch_pdfs_oa`)
- Derivatives are NOT pool-aware: every project regenerates them independently
- SiroCampus pool has docling outputs for 250+ papers
- Running `docling batch` on a new project re-processes all of them (~45s each = 3+ hours wasted)
Two-tier model
| Tier | Examples | Resolution |
|---|---|---|
| Shared | PDFs, docling, grobid, openalex | Check pool first, symlink if found |
| Project-local | summaries, reviews, reading lists, RAG | Always generate per-project |
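
The "check pool first, symlink if found" rule for the shared tier could look roughly like the sketch below. It is only an illustration under stated assumptions: the `Cfg` dataclass, its field names, and the `<root>/<kind>/<citekey>` directory layout are hypothetical and may not match how biblio actually stores its config or derivatives.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Tuple

# Derivative kinds in the shared tier (from the table above).
SHARED_KINDS = frozenset({"docling", "grobid", "openalex"})


@dataclass
class Cfg:
    # Hypothetical config shape; the real biblio config may differ.
    pool_root: Path            # e.g. the shared pool directory
    local_root: Path           # the project's own derivatives directory
    pool_derivatives: bool = True


def resolve_derivative(cfg: Cfg, citekey: str, kind: str) -> Tuple[str, Optional[Path]]:
    """Return (source, path), where source is "pool", "local", or "missing"."""
    # Assumed layout: <root>/<kind>/<citekey>
    local = cfg.local_root / kind / citekey

    if cfg.pool_derivatives and kind in SHARED_KINDS:
        pooled = cfg.pool_root / kind / citekey
        if pooled.exists():
            # Link the pooled output into the project unless something
            # (a real directory or an existing link) is already there.
            if not local.exists() and not local.is_symlink():
                local.parent.mkdir(parents=True, exist_ok=True)
                local.symlink_to(pooled, target_is_directory=pooled.is_dir())
            return "pool", pooled

    if local.exists():
        return "local", local
    return "missing", None
```

Project-local kinds such as summaries never consult the pool in this sketch, so they resolve to either `"local"` or `"missing"`.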
Acceptance Criteria
- [ ] `resolve_derivative(cfg, citekey, "docling")` checks pool paths
- [ ] `run_docling_for_key` skips when pool output exists (unless `force=True`)
- [ ] `find_pending_docling` excludes keys with pool derivatives
- [ ] `paper_context` reports `derivative_source: "pool" | "local" | "missing"`
- [ ] Batch operations show pool skip counts
- [ ] Config: `pool.derivatives: true` (default enabled)
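
The skip, `force`, and skip-count criteria above can be sketched together as one filtering step. The helper name `split_pending` and the `source_of` callback are hypothetical stand-ins used for illustration, not existing biblio functions; the callback is assumed to return one of the three `derivative_source` values.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def split_pending(
    citekeys: Iterable[str],
    source_of: Callable[[str], str],
    force: bool = False,
) -> Tuple[List[str], Counter]:
    """Return the keys that still need processing plus a tally per source."""
    pending: List[str] = []
    counts: Counter = Counter()
    for key in citekeys:
        source = source_of(key)  # "pool" | "local" | "missing"
        counts[source] += 1
        # Only keys with no derivative anywhere are processed,
        # unless the caller forces regeneration of everything.
        if force or source == "missing":
            pending.append(key)
    return pending, counts


# Illustrative data only; these citekeys are made up.
sources = {"smith2020": "pool", "doe2021": "local", "lee2022": "missing"}
pending, counts = split_pending(sources, sources.__getitem__)
print(f"{len(pending)} to process, {counts['pool']} skipped (pool), "
      f"{counts['local']} skipped (local)")
```

Returning the tally alongside the pending list lets a batch command print its pool skip count without resolving each key a second time.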
Result
(Filled in after execution)