Issue arash 20260403 201609 055675


title: "## Spec: biblio ↔ Zotero integration model"
status: done
created: 2026-04-03
updated: 2026-04-03
timestamp: 20260403-201609-055675
tags: [issue]
source: agent-observation
project_primary: projio
capture_id: 20260403-201607-874ec8
confidence: 1.0
transcript_file: /storage2/arash/worklog/workflow/captures/20260403-201607-874ec8/transcript.txt


Spec: biblio ↔ Zotero integration model

Biblio complements Zotero (collection manager) and OpenAlex (metadata network) as an enrichment engine. Design the bidirectional integration between biblio and Zotero.

Context

  • pyzotero (Python Zotero API client) supports both the Zotero Web API and the Zotero 7 local API
  • The Zotero API supports read AND write: items, collections, tags, notes, attachments (PDFs)
  • Currently biblio only consumes Zotero data via manual .bib exports; there is no live connection
  • pyzotero, zotero-schema, and zotero translators repos are now indexed in codio RAG

Scope

1. Zotero → biblio (pull)

  • Spec a biblio zotero sync command that pulls items + PDFs from a Zotero collection
  • Authentication model: API key for cloud, local API for Zotero 7
  • Mapping: Zotero item types → BibTeX entry types (use zotero-schema for reference)
  • PDF handling: download attachments → bib/articles/{citekey}/
  • Incremental sync: only pull new/changed items since last sync
  • Collection filtering: sync a specific collection, tag, or the whole library
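The type mapping and the incremental-sync rule above can be sketched as follows. This is a minimal illustration, not the spec's answer: the mapping table is an illustrative subset (not derived from zotero-schema or the translators), the function names are hypothetical, and items are assumed to have the Web API's JSON shape with top-level key and version fields plus a data object.

```python
from typing import Any

# Illustrative subset only (an assumption); the real table should be
# derived from zotero-schema. Unlisted item types fall back to "misc".
ZOTERO_TO_BIBTEX = {
    "journalArticle": "article",
    "book": "book",
    "bookSection": "incollection",
    "conferencePaper": "inproceedings",
    "thesis": "phdthesis",
    "report": "techreport",
}


def bibtex_type(item: dict[str, Any]) -> str:
    """Map a Zotero item (API JSON shape) to a BibTeX entry type."""
    return ZOTERO_TO_BIBTEX.get(item["data"]["itemType"], "misc")


def changed_items(
    items: list[dict[str, Any]], known_versions: dict[str, int]
) -> list[dict[str, Any]]:
    """Keep items that are new or whose version is newer than at last sync.

    known_versions maps item key -> version recorded by the previous sync.
    """
    return [
        it for it in items
        if it["version"] > known_versions.get(it["key"], -1)
    ]
```

The Zotero Web API also accepts a since parameter carrying a library version, so the same filtering can likely be done server-side; the client-side check here just makes the rule explicit.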

2. biblio → Zotero (push back)

  • What enrichments should biblio write back?
      • Tags from autotag/concept extraction → Zotero tags
      • LLM-generated summaries → Zotero notes
      • Resolved DOIs/OpenAlex IDs → Zotero extra field
      • Reading status → Zotero tags or collections
  • Conflict handling: what if the user edited the item in Zotero since last sync?
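One possible shape for the write-back helpers is sketched below, under stated assumptions: the extra field is treated as "Key: value" lines, Zotero tags are a list of {"tag": name} objects, and a conflict means the remote item version moved past the version recorded at last sync. All function names and the "OpenAlex" extra key are hypothetical, and nothing here calls the Zotero API.

```python
from typing import Any


def merge_extra(extra: str, updates: dict[str, str]) -> str:
    """Update or append 'Key: value' lines, leaving other lines untouched."""
    out: list[str] = []
    seen: set[str] = set()
    for line in extra.splitlines():
        key, sep, _ = line.partition(":")
        key = key.strip()
        if sep and key in updates:
            out.append(f"{key}: {updates[key]}")
            seen.add(key)
        else:
            out.append(line)
    # Keys that were not already present are appended at the end.
    out.extend(f"{k}: {v}" for k, v in updates.items() if k not in seen)
    return "\n".join(out)


def add_tags(data: dict[str, Any], new_tags: list[str]) -> dict[str, Any]:
    """Return a copy of the item data with new tags added, no duplicates."""
    tags = list(data.get("tags", []))
    have = {t["tag"] for t in tags}
    for name in new_tags:
        if name not in have:
            tags.append({"tag": name})
            have.add(name)
    return {**data, "tags": tags}


def has_conflict(remote_version: int, synced_version: int) -> bool:
    """True if the item changed in Zotero after biblio last synced it."""
    return remote_version > synced_version
```

Because both helpers only add or update biblio-owned keys and tags, user edits to other fields survive a write-back; what to do when has_conflict is true (skip, re-pull first, or prompt) remains the open question above.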

3. Architecture decisions

  • Should biblio depend on pyzotero directly or wrap it?
  • Where does sync state live? (last sync timestamp, item version map)
  • How does this interact with the existing srcbib workflow? (coexist or replace?)
  • DataLad interaction: if the project is a DataLad dataset, how do synced PDFs get annexed?
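To make the sync-state question concrete, here is one hedged option: a small JSON file holding the last seen library version and the per-item version map. The class name, field names, and file layout are all assumptions for illustration; the spec may well choose a different store.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class SyncState:
    """Hypothetical on-disk sync state for one Zotero library."""

    library_version: int = 0
    item_versions: dict[str, int] = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "SyncState":
        return cls(**json.loads(text))

    @classmethod
    def load(cls, path: Path) -> "SyncState":
        # A missing file means "never synced": start from an empty state.
        if not path.exists():
            return cls()
        return cls.from_json(path.read_text(encoding="utf-8"))

    def save(self, path: Path) -> None:
        # Write to a temp file first so an interrupted sync does not
        # leave a truncated state file behind.
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_suffix(path.suffix + ".tmp")
        tmp.write_text(self.to_json(), encoding="utf-8")
        tmp.replace(path)
```

A plain text file like this is easy to keep in git alongside the bibliography, whereas the synced PDFs would be the natural candidates for annexing in a DataLad dataset.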

4. Pool integration

  • Should Zotero sync target the project (bib/) or the pool?
  • If pool: multiple projects can share one Zotero-synced pool

Output

Write spec to docs/specs/biblio/zotero-integration.md. Include:

  • Data flow diagram (Zotero ↔ biblio ↔ OpenAlex)
  • Recommended commands / MCP tools
  • Phased implementation plan (phase 1: read-only pull, phase 2: write-back)
  • Dependencies and config model

Key references (indexed in RAG)

  • .projio/codio/mirrors/urschrei--pyzotero/ — pyzotero source, understand API surface
  • .projio/codio/mirrors/zotero--zotero-schema/ — Zotero data model
  • .projio/codio/mirrors/zotero--translators/ — how Zotero generates BibTeX exports
  • packages/biblio/docs/explanation/discovery.md — biblio's discovery model
  • packages/biblio/src/biblio/pool.py — pool architecture
  • packages/biblio/src/biblio/ingest.py — current ingest pipeline