
biblio_merge: dedup by DOI, not just citekey

Symptom (pixecog 2026-05-05): merged.bib contains four entries that are actually two papers:

| Citekey | DOI |
| --- | --- |
| Sirota_2003_Communication (from srcbib/zotero.bib) | 10.1073/pnas.0437938100 |
| sirota_2003_CommunicationNeocortex (from srcbib/src-arash-pixecog.bib) | 10.1073/pnas.0437938100 |
| Siapas_1998_Coordinated (zotero.bib) | 10.1016/S0896-6273(00)80629-7 |
| siapas_1998_CoordinatedInteractions (src-arash-pixecog.bib) | 10.1016/S0896-6273(00)80629-7 |

The two source files are Zotero exports of the same library, produced under different Better BibTeX citekey schemes. The merge step in .projio/biblio/biblio.yml keys dedup on citekey, so DOI-identical entries pass through silently. logs/duplicate_bib_ids.txt is empty, so no warning surfaced.

Expected: biblio_merge should detect cross-srcbib DOI collisions and either (a) fail loudly with a duplicates report, (b) auto-pick a canonical citekey (longest-titled? newest? configurable?) and rewrite refs, or (c) at minimum populate a duplicate_dois.txt log so the user can fix manually.
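A minimal sketch of the detection step, assuming entries have already been parsed into plain dicts with `citekey`, `doi`, and `source` fields. The function names (`normalize_doi`, `bucket_by_doi`, `duplicate_report`) and the report line format are hypothetical and are not biblio_merge's actual internals.

```python
from collections import defaultdict

# Hypothetical helpers for illustration; not biblio_merge's real API.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Return a lowercased DOI without a resolver prefix, or None if absent."""
    if not raw:
        return None
    doi = raw.strip().lower()
    for prefix in _DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip() or None


def bucket_by_doi(entries):
    """Group entries by normalized DOI; entries without a DOI are skipped."""
    buckets = defaultdict(list)
    for entry in entries:
        doi = normalize_doi(entry.get("doi"))
        if doi is not None:
            buckets[doi].append(entry)
    return buckets


def duplicate_report(buckets):
    """Return one tab-separated line per DOI shared by more than one citekey."""
    lines = []
    for doi in sorted(buckets):
        group = buckets[doi]
        if len({e["citekey"] for e in group}) > 1:
            members = ", ".join(f'{e["citekey"]} ({e["source"]})' for e in group)
            lines.append(f"{doi}\t{members}")
    return lines


# The four entries from the symptom above.
entries = [
    {"citekey": "Sirota_2003_Communication",
     "doi": "10.1073/pnas.0437938100", "source": "srcbib/zotero.bib"},
    {"citekey": "sirota_2003_CommunicationNeocortex",
     "doi": "10.1073/pnas.0437938100", "source": "srcbib/src-arash-pixecog.bib"},
    {"citekey": "Siapas_1998_Coordinated",
     "doi": "10.1016/S0896-6273(00)80629-7", "source": "srcbib/zotero.bib"},
    {"citekey": "siapas_1998_CoordinatedInteractions",
     "doi": "10.1016/S0896-6273(00)80629-7", "source": "srcbib/src-arash-pixecog.bib"},
]

report = duplicate_report(bucket_by_doi(entries))
for line in report:
    print(line)
```

With this shape, option (a) amounts to exiting non-zero when `report` is non-empty, and option (c) amounts to writing `report` to a log file.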

Concrete proposals:

1. Extend the merge dedup pass to bucket by doi (lowercased, normalized) before citekey, when DOI is present.
2. When DOI buckets contain >1 entry, log to .projio/biblio/logs/duplicate_dois.txt with citekeys + source files.
3. Optional: biblio_merge --strategy=doi-canonical flag that picks one citekey per DOI bucket and emits an alias map for downstream tools.
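The alias map in proposal 3 could be sketched as follows. This assumes DOI buckets are already available as a mapping from normalized DOI to a list of citekeys. The names `pick_canonical`, `build_alias_map`, and `resolve` are hypothetical, and "longest citekey wins" is only a stand-in for whatever selection rule ends up being configurable.

```python
# Hypothetical sketch of a doi-canonical strategy; the selection rule is an assumption.

def pick_canonical(citekeys, strategy="longest"):
    """Pick one citekey from a DOI bucket."""
    if strategy == "longest":
        # Longest citekey wins; ties are broken alphabetically for determinism.
        return min(citekeys, key=lambda k: (-len(k), k))
    if strategy == "first":
        # Keep whichever citekey was encountered first in merge order.
        return citekeys[0]
    raise ValueError(f"unknown strategy: {strategy!r}")


def build_alias_map(doi_buckets, strategy="longest"):
    """Map every non-canonical citekey to the canonical citekey for its DOI."""
    aliases = {}
    for citekeys in doi_buckets.values():
        canonical = pick_canonical(citekeys, strategy)
        for key in citekeys:
            if key != canonical:
                aliases[key] = canonical
    return aliases


def resolve(citekey, aliases):
    """Return the canonical citekey, or the input unchanged if it has no alias."""
    return aliases.get(citekey, citekey)


doi_buckets = {
    "10.1073/pnas.0437938100": [
        "Sirota_2003_Communication",
        "sirota_2003_CommunicationNeocortex",
    ],
    "10.1016/s0896-6273(00)80629-7": [
        "Siapas_1998_Coordinated",
        "siapas_1998_CoordinatedInteractions",
    ],
}

aliases = build_alias_map(doi_buckets)
print(resolve("Sirota_2003_Communication", aliases))
```

Downstream tools that pass every citekey through a lookup like `resolve` would see a single identity per DOI regardless of which source file an entry came from.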

Why this matters: The whole point of a DOI is to be the stable identity. Letting two citekeys for the same DOI into merged.bib means downstream tools (rag_sync, paper_context, citekey_resolve) can return inconsistent results depending on which citekey they happen to encounter. It also breaks biblio_status per-citekey accounting (one paper looks like two).

Reference investigation: see pixecog conversation 2026-05-05 around dvorak_2021_DentateSpikes ingest.


Source context: pixecog

PixEcog (pixecog): Neuropixels and ECoG dataset and analysis

Recent commits:

37dc64d fix author: Arash Sal Moslehian → Arash Shahidi
8bcc47f calibrate_ieeg_clean: Plan B coherence-test failure + handoff
9bf2e76 spikesorting docs: park date_taskgroup mode, document UnitMatch path

README:


type: readme


Quick Start for Collaborators

Follow this checklist to get started with Pixecog documentation and workflows.

🐀 Pixecog Project — Compact Overview

Core principles

  • One immutable BIDS raw dataset (raw/) as the canonical baseline
  • Each analysis pipeline ha