Issue arash 20260403 201609 055675

title: "## Spec: biblio ↔ Zotero integration model"
status: done
created: 2026-04-03
updated: 2026-04-03
timestamp: 20260403-201609-055675
tags: [issue]
source: agent-observation
project_primary: projio
capture_id: 20260403-201607-874ec8
confidence: 1.0
transcript_file: /storage2/arash/worklog/workflow/captures/20260403-201607-874ec8/transcript.txt

Spec: biblio ↔ Zotero integration model

Biblio complements Zotero (collection manager) and OpenAlex (metadata network) as an enrichment engine. Design the bidirectional integration between biblio and Zotero.

Context

- pyzotero (Python Zotero API client) supports both the Zotero Web API and the Zotero 7 local API
- The Zotero API supports read AND write: items, collections, tags, notes, attachments (PDFs)
- Currently biblio only consumes Zotero via manual .bib exports; there is no live connection
- The pyzotero, zotero-schema, and zotero translators repos are now indexed in codio RAG

Scope

1. Zotero → biblio (pull)
- Spec a biblio zotero sync command that pulls items + PDFs from a Zotero collection
- Authentication model: API key for cloud, local API for Zotero 7
- Mapping: Zotero item types → BibTeX entry types (use zotero-schema for reference)
- PDF handling: download attachments → bib/articles/{citekey}/
- Incremental sync: only pull new/changed items since last sync
- Collection filtering: sync a specific collection, tag, or the whole library
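
The type mapping and incremental selection above can be sketched without touching the network. This is a minimal sketch, not the intended implementation: it assumes items arrive as Zotero API JSON dicts with top-level `key` and `version` fields and a `data.itemType` field, the mapping table is partial and should be checked against zotero-schema and the BibTeX translator, and the names `bibtex_type`, `changed_since`, and `plan_pull` are hypothetical.

```python
from typing import Dict, Iterable, List

# Partial, illustrative mapping (assumption). The authoritative list should be
# derived from zotero-schema and the Zotero BibTeX translator.
ZOTERO_TO_BIBTEX: Dict[str, str] = {
    "journalArticle": "article",
    "book": "book",
    "bookSection": "incollection",
    "conferencePaper": "inproceedings",
    "thesis": "phdthesis",
    "report": "techreport",
}


def bibtex_type(zotero_item_type: str) -> str:
    """Map a Zotero item type to a BibTeX entry type, falling back to misc."""
    return ZOTERO_TO_BIBTEX.get(zotero_item_type, "misc")


def changed_since(items: Iterable[dict], last_version: int) -> List[dict]:
    """Keep only items whose version is newer than the last synced version."""
    return [item for item in items if item["version"] > last_version]


def plan_pull(items: Iterable[dict], last_version: int) -> List[dict]:
    """Build a pull plan: one record per new or changed top-level item."""
    plan = []
    for item in changed_since(items, last_version):
        item_type = item["data"]["itemType"]
        # Attachments and notes are child items; a real sync would handle
        # them separately (for example, when downloading PDFs).
        if item_type in ("attachment", "note"):
            continue
        plan.append(
            {
                "key": item["key"],
                "version": item["version"],
                "entry_type": bibtex_type(item_type),
            }
        )
    return plan
```

In a live sync the `items` list would presumably come from pyzotero, restricted to a collection or tag, and the server could do the version filtering itself through the Web API's `since` parameter.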

2. biblio → Zotero (push back)
- What enrichments should biblio write back?
  - Tags from autotag/concept extraction → Zotero tags
  - LLM-generated summaries → Zotero notes
  - Resolved DOIs/OpenAlex IDs → Zotero extra field
  - Reading status → Zotero tags or collections
- Conflict handling: what if the user edited the item in Zotero since last sync?
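
One possible answer to the conflict question is version-based: only write back when the item's current version equals the version recorded at last sync, and otherwise skip and report. The sketch below illustrates that idea under stated assumptions. Items are Zotero API JSON dicts, tags are a list of `{"tag": name}` objects, identifiers go into the extra field as `Label: value` lines (the `OpenAlex:` label is an assumed convention), and the function names are hypothetical.

```python
from typing import Dict, List, Optional


def merge_tags(existing: List[dict], new_tags: List[str]) -> List[dict]:
    """Append new tags without duplicating or dropping the user's own tags."""
    seen = {tag["tag"] for tag in existing}
    merged = list(existing)
    for name in new_tags:
        if name not in seen:
            merged.append({"tag": name})
            seen.add(name)
    return merged


def merge_extra(extra: str, ids: Dict[str, str]) -> str:
    """Add 'Label: value' lines to the extra field; never overwrite a label."""
    lines = [line for line in extra.splitlines() if line.strip()]
    present = {
        line.split(":", 1)[0].strip().lower() for line in lines if ":" in line
    }
    for label, value in ids.items():
        if label.lower() not in present:
            lines.append(f"{label}: {value}")
    return "\n".join(lines)


def plan_writeback(
    item: dict, synced_version: int, tags: List[str], ids: Dict[str, str]
) -> Optional[dict]:
    """Return the fields to update, or None if the item changed since last sync."""
    if item["version"] != synced_version:
        return None  # conflict: the item was edited in Zotero after our last pull
    data = item["data"]
    return {
        "tags": merge_tags(data.get("tags", []), tags),
        "extra": merge_extra(data.get("extra", ""), ids),
    }
```

The Zotero Web API enforces the same check server-side through the `If-Unmodified-Since-Version` header, so a local check like this would mainly serve to report conflicts before any request is sent.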

3. Architecture decisions
- Should biblio depend on pyzotero directly or wrap it?
- Where does sync state live? (last sync timestamp, item version map)
- How does this interact with the existing srcbib workflow? (coexist or replace?)
- DataLad interaction: if the project is a DataLad dataset, how do synced PDFs get annexed?
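
To make the sync-state question concrete, here is one hedged option: a small JSON file holding the last library version and a per-item version map. The `SyncState` class, its field names, and the file location are all hypothetical; the spec still has to decide where the file lives and whether a timestamp is needed alongside the version numbers.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict


@dataclass
class SyncState:
    """Hypothetical on-disk sync state: library version plus item version map."""

    library_version: int = 0
    item_versions: Dict[str, int] = field(default_factory=dict)

    @classmethod
    def load(cls, path: Path) -> "SyncState":
        """Read state from disk, or start empty if no sync has happened yet."""
        if not path.exists():
            return cls()
        raw = json.loads(path.read_text(encoding="utf-8"))
        return cls(
            library_version=raw.get("library_version", 0),
            item_versions=dict(raw.get("item_versions", {})),
        )

    def record(self, key: str, version: int) -> None:
        """Remember the version at which an item was last pulled."""
        self.item_versions[key] = version
        self.library_version = max(self.library_version, version)

    def save(self, path: Path) -> None:
        """Write to a temporary file first, then rename it over the target."""
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_suffix(path.suffix + ".tmp")
        tmp.write_text(
            json.dumps(asdict(self), indent=2, sort_keys=True), encoding="utf-8"
        )
        tmp.replace(path)
```

A plain text file like this stays diffable and can be tracked in git, which may matter for the DataLad case where PDFs are annexed but small metadata files are not.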

4. Pool integration
- Should Zotero sync target the project (bib/) or the pool?
- If pool: multiple projects can share one Zotero-synced pool
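
The project-versus-pool choice could reduce to picking a base directory when resolving where a PDF lands. The sketch below is illustrative only: `pdf_destination` is a hypothetical helper, and it assumes the pool mirrors the project's `bib/articles/{citekey}/` layout, which the spec would need to confirm against pool.py.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def pdf_destination(
    citekey: str, project_root: PathLike, pool_root: Optional[PathLike] = None
) -> Path:
    """Resolve the directory for an item's PDFs, preferring the pool if given."""
    # Reject citekeys that could escape the articles directory.
    if not citekey or "/" in citekey or "\\" in citekey or citekey in (".", ".."):
        raise ValueError(f"unsafe citekey: {citekey!r}")
    base = Path(pool_root) if pool_root is not None else Path(project_root)
    # Assumption: the pool uses the same bib/articles/{citekey}/ layout.
    return base / "bib" / "articles" / citekey
```

With a resolver like this, the sync code itself would not need to know which target is in use; only the configuration would.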

Output

Write spec to docs/specs/biblio/zotero-integration.md. Include:
- Data flow diagram (Zotero ↔ biblio ↔ OpenAlex)
- Recommended commands / MCP tools
- Phased implementation plan (phase 1: read-only pull, phase 2: write-back)
- Dependencies and config model

Key references (indexed in RAG)

.projio/codio/mirrors/urschrei--pyzotero/ — pyzotero source, understand API surface
.projio/codio/mirrors/zotero--zotero-schema/ — Zotero data model
.projio/codio/mirrors/zotero--translators/ — how Zotero generates BibTeX exports
packages/biblio/docs/explanation/discovery.md — biblio's discovery model
packages/biblio/src/biblio/pool.py — pool architecture
packages/biblio/src/biblio/ingest.py — current ingest pipeline

issue-arash-20260403-193112-105596.md — Biblio enrichment pipeline redesign directly overlaps with the Zotero push-back scope (tags, summaries, resolved IDs)
issue-arash-20260403-193037-589959.md — Concept tagging output is a candidate for Zotero push-back (tags from autotag/concept extraction)
issue-arash-20260403-193002-484673.md — OpenAlex API capabilities audit informs how resolved OpenAlex IDs get written back to Zotero extra field
issue-arash-20260402-220025-468258.md — Bib architecture spec (srcbib vs artifacts) defines where Zotero-pulled items and PDFs should land
issue-arash-20260402-233201-350554.md — PDF fetch OA bug is directly relevant to PDF attachment handling in the Zotero sync pull path