Issue arash 20260403 201609 055675
title: "Spec: biblio ↔ Zotero integration model"
status: done
created: 2026-04-03
updated: 2026-04-03
timestamp: 20260403-201609-055675
tags: [issue]
source: agent-observation
project_primary: projio
capture_id: 20260403-201607-874ec8
confidence: 1.0
transcript_file: /storage2/arash/worklog/workflow/captures/20260403-201607-874ec8/transcript.txt
Spec: biblio ↔ Zotero integration model
Biblio acts as an enrichment engine that complements Zotero (the collection manager) and OpenAlex (the metadata network). Design the bidirectional integration between biblio and Zotero.
Context
- pyzotero (the Python Zotero API client) supports both the Zotero Web API and the Zotero 7 local API
- The Zotero Web API supports reading AND writing items, collections, tags, notes, and attachments (PDFs); at Zotero 7's release the local API was read-only, so write-back likely has to go through the Web API
- Currently, biblio consumes Zotero only through manual .bib exports; there is no live connection
- The pyzotero, zotero-schema, and Zotero translators repositories are now indexed in the codio RAG
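Below is a minimal sketch of how biblio might build a pyzotero client for either access path. It assumes pyzotero's zotero.Zotero(library_id, library_type, api_key) constructor and its local=True option for the Zotero 7 local API; verify both against the indexed pyzotero mirror. ZoteroConnection, validate() and client() are hypothetical biblio names. pyzotero is imported lazily inside client(), which is one way to keep it an optional dependency.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ZoteroConnection:
    """Hypothetical biblio-side description of how to reach a Zotero library."""

    library_id: str
    library_type: str = "user"  # Zotero libraries are "user" or "group"
    api_key: Optional[str] = None  # needed for the cloud Web API only
    local: bool = False  # True -> Zotero 7 local API on the desktop app

    def validate(self) -> None:
        """Reject configurations that cannot possibly connect."""
        if self.library_type not in ("user", "group"):
            raise ValueError(f"library_type must be 'user' or 'group', got {self.library_type!r}")
        if not self.local and not self.api_key:
            raise ValueError("the Web API needs an api_key; set local=True for the Zotero 7 local API")

    def client(self):
        """Build a pyzotero client (assumed constructor signature, see lead-in)."""
        self.validate()
        from pyzotero import zotero  # optional dependency, imported only when syncing

        if self.local:
            return zotero.Zotero(self.library_id, self.library_type, local=True)
        return zotero.Zotero(self.library_id, self.library_type, self.api_key)
```

A plain validate() step lets "biblio zotero sync" fail fast on a bad config before any network call is made.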
Scope
1. Zotero → biblio (pull)
- Specify a biblio zotero sync command that pulls items and PDFs from a Zotero collection
- Authentication model: an API key for the cloud Web API, the local API for Zotero 7
- Mapping: Zotero item types → BibTeX entry types (use zotero-schema for reference)
- PDF handling: download attachments → bib/articles/{citekey}/
- Incremental sync: pull only items that are new or changed since the last sync (Zotero tracks library and item version numbers for this)
- Collection filtering: sync a specific collection, tag, or the whole library
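The pull-side logic can be sketched as a few pure functions, under stated assumptions. ZOTERO_TO_BIBTEX is a partial, hypothetical mapping that loosely follows the conventions of Zotero's BibTeX export translator and should be checked against the zotero--translators and zotero-schema mirrors. plan_pull decides what to fetch by comparing a remote key-to-version map with the locally stored one. pdf_dir follows the bib/articles/{citekey}/ layout from the spec. All three function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# Partial, hypothetical Zotero itemType -> BibTeX entry type mapping.
ZOTERO_TO_BIBTEX = {
    "journalArticle": "article",
    "magazineArticle": "article",
    "newspaperArticle": "article",
    "book": "book",
    "bookSection": "incollection",
    "conferencePaper": "inproceedings",
    "thesis": "phdthesis",
    "report": "techreport",
    "manuscript": "unpublished",
}


def bibtex_type(zotero_item_type: str) -> str:
    """Map a Zotero itemType to a BibTeX entry type, falling back to misc."""
    return ZOTERO_TO_BIBTEX.get(zotero_item_type, "misc")


def plan_pull(remote_versions: Dict[str, int], local_versions: Dict[str, int]) -> List[str]:
    """Return keys of items that are new locally or whose Zotero version increased."""
    return sorted(
        key
        for key, version in remote_versions.items()
        if local_versions.get(key, -1) < version  # unseen keys default to -1
    )


def pdf_dir(bib_root: Path, citekey: str) -> Path:
    """Destination directory for an item's synced PDF attachments."""
    return bib_root / "articles" / citekey
```

In a real sync, remote_versions might come from pyzotero's item_versions(since=...) and the new high-water mark from last_modified_version(). Both calls are assumptions to confirm in the pyzotero source. Deleted items are not handled here, because the Web API reports deletions through a separate deleted endpoint.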
2. biblio → Zotero (push back)
- What enrichments should biblio write back?
  - Tags from autotag/concept extraction → Zotero tags
  - LLM-generated summaries → Zotero notes
  - Resolved DOIs/OpenAlex IDs → Zotero extra field
  - Reading status → Zotero tags or collections
- Conflict handling: what if the user edited the item in Zotero since the last sync?
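One conservative push-back policy, sketched under assumptions, has three rules:

- biblio adds its tags alongside the user's and never removes any.
- biblio upserts "Key: value" lines in the extra field and leaves other lines untouched.
- biblio skips any item whose Zotero version is newer than the version recorded at the last sync.

Item data is assumed to follow the Zotero API JSON shape (data.version, data.tags as a list of {"tag": ...} objects, data.extra as a string). The function names and the "OpenAlex:" extra key are hypothetical.

```python
from typing import Dict, List, Optional


def merge_tags(existing: List[dict], new_tags: List[str]) -> List[dict]:
    """Append biblio tags that are not already present; never drop user tags."""
    names = {t["tag"] for t in existing}
    return existing + [{"tag": t} for t in new_tags if t not in names]


def merge_extra(extra: str, fields: Dict[str, str]) -> str:
    """Replace 'Key: value' lines for the given keys; keep every other line."""
    kept = [ln for ln in extra.splitlines() if ln.split(":", 1)[0].strip() not in fields]
    return "\n".join(kept + [f"{k}: {v}" for k, v in fields.items()])


def plan_write_back(
    item: dict, synced_version: int, tags: List[str], extra_fields: Dict[str, str]
) -> Optional[dict]:
    """Return updated item data, or None if the item changed in Zotero since our sync."""
    data = item["data"]
    if data["version"] > synced_version:
        return None  # conflict: re-pull and re-enrich instead of overwriting user edits
    updated = dict(data)  # shallow copy; the original item is left untouched
    updated["tags"] = merge_tags(data.get("tags", []), tags)
    updated["extra"] = merge_extra(data.get("extra", ""), extra_fields)
    return updated
```

This client-side check is not the only safeguard. The Zotero Web API rejects a write with 412 Precondition Failed when the supplied item version is stale, so a race between the check and the write surfaces as an error rather than a silent overwrite.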
3. Architecture decisions
- Should biblio depend on pyzotero directly or wrap it?
- Where does sync state live? (last sync timestamp, item version map)
- How does this interact with the existing srcbib workflow? (coexist or replace?)
- DataLad interaction: if the project is a DataLad dataset, how do synced PDFs get annexed?
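For the sync-state question, one possible answer is a small JSON file holding two things: the last-seen Zotero library version and the per-item version map. The sketch below assumes that answer. The SyncState name and its file location are hypothetical, and a library version stands in for the "last sync timestamp" because Zotero's incremental endpoints are version-based.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict


@dataclass
class SyncState:
    library_version: int = 0  # Zotero library version at the last successful sync
    items: Dict[str, int] = field(default_factory=dict)  # item key -> item version

    @classmethod
    def load(cls, path: Path) -> "SyncState":
        """Read state from disk; a missing file means a first, full sync."""
        if not path.exists():
            return cls()
        raw = json.loads(path.read_text(encoding="utf-8"))
        return cls(library_version=raw["library_version"], items=raw["items"])

    def save(self, path: Path) -> None:
        """Write atomically so an interrupted sync never leaves a half-written file."""
        path.parent.mkdir(parents=True, exist_ok=True)
        fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(asdict(self), fh, indent=2, sort_keys=True)
        os.replace(tmp, path)  # atomic rename on the same filesystem
```

Keeping the state as a small, sorted, indented text file makes it easy to diff and commit with the project. Whether it belongs in the project or the pool is the same open question as in Pool integration.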
4. Pool integration
- Should Zotero sync target the project (bib/) or the pool?
- If pool: multiple projects can share one Zotero-synced pool
Output
Write the spec to docs/specs/biblio/zotero-integration.md. It should include:
- Data flow diagram (Zotero ↔ biblio ↔ OpenAlex)
- Recommended commands / MCP tools
- Phased implementation plan (phase 1: read-only pull, phase 2: write-back)
- Dependencies and config model
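The config model could start as a typed options object covering the choices raised in Scope: collection/tag filtering, a project vs pool target, and the phase 1 PDF pull vs phase 2 write-back switch. The sketch below is hedged. The ZoteroSyncOptions name, its fields, and its defaults are hypothetical, and connection settings (library ID, API key) are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional

VALID_TARGETS = ("project", "pool")


@dataclass(frozen=True)
class ZoteroSyncOptions:
    """Hypothetical Zotero sync options; connection settings are omitted."""

    collection: Optional[str] = None  # Zotero collection key to sync
    tag: Optional[str] = None  # restrict the sync to items carrying this tag
    target: str = "project"  # "project" writes under bib/, "pool" into a shared pool
    pull_pdfs: bool = True  # phase 1: download PDF attachments
    write_back: bool = False  # phase 2: push enrichments to Zotero; off by default

    @classmethod
    def from_dict(cls, raw: dict) -> "ZoteroSyncOptions":
        # cls(**raw) raises TypeError on unknown keys, which surfaces config typos early
        opts = cls(**raw)
        if opts.target not in VALID_TARGETS:
            raise ValueError(f"target must be one of {VALID_TARGETS}, got {opts.target!r}")
        return opts

    @property
    def whole_library(self) -> bool:
        """True when neither a collection nor a tag filter is set."""
        return self.collection is None and self.tag is None
```

Defaulting write_back to False mirrors the phased plan: a phase 1 install can only read from Zotero until push-back is explicitly enabled.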
Key references (indexed in RAG)
- .projio/codio/mirrors/urschrei--pyzotero/: pyzotero source, for understanding the API surface
- .projio/codio/mirrors/zotero--zotero-schema/: Zotero data model
- .projio/codio/mirrors/zotero--translators/: how Zotero generates BibTeX exports
- packages/biblio/docs/explanation/discovery.md: biblio's discovery model
- packages/biblio/src/biblio/pool.py: pool architecture
- packages/biblio/src/biblio/ingest.py: current ingest pipeline
Related Notes
- issue-arash-20260403-193112-105596.md — Biblio enrichment pipeline redesign directly overlaps with the Zotero push-back scope (tags, summaries, resolved IDs)
- issue-arash-20260403-193037-589959.md — Concept tagging output is a candidate for Zotero push-back (tags from autotag/concept extraction)
- issue-arash-20260403-193002-484673.md — OpenAlex API capabilities audit informs how resolved OpenAlex IDs get written back to Zotero extra field
- issue-arash-20260402-220025-468258.md — Bib architecture spec (srcbib vs artifacts) defines where Zotero-pulled items and PDFs should land
- issue-arash-20260402-233201-350554.md — PDF fetch OA bug is directly relevant to PDF attachment handling in the Zotero sync pull path