biblio_merge: dedup by DOI, not just citekey
Symptom (pixecog 2026-05-05): merged.bib contains four entries that are actually two papers:
| Citekey | DOI |
|---|---|
| Sirota_2003_Communication (from srcbib/zotero.bib) | 10.1073/pnas.0437938100 |
| sirota_2003_CommunicationNeocortex (from srcbib/src-arash-pixecog.bib) | 10.1073/pnas.0437938100 |
| Siapas_1998_Coordinated (zotero.bib) | 10.1016/S0896-6273(00)80629-7 |
| siapas_1998_CoordinatedInteractions (src-arash-pixecog.bib) | 10.1016/S0896-6273(00)80629-7 |
Cause: the two srcbib files are Zotero exports of the same library, produced under different Better BibTeX citekey schemes. The merge step configured in .projio/biblio/biblio.yml dedups on citekey only, so DOI-identical entries pass through silently. logs/duplicate_bib_ids.txt is empty, so no warning surfaced.
Expected: biblio_merge should detect cross-srcbib DOI collisions and either (a) fail loudly with a duplicates report, (b) auto-pick a canonical citekey (longest-titled? newest? configurable?) and rewrite refs, or (c) at minimum populate a duplicate_dois.txt log so the user can fix manually.
Concrete proposals:
1. Extend the merge dedup pass to bucket by doi (lowercased, normalized) before citekey, when DOI is present.
2. When DOI buckets contain >1 entry, log to .projio/biblio/logs/duplicate_dois.txt with citekeys + source files.
3. Optional: biblio_merge --strategy=doi-canonical flag that picks one citekey per DOI bucket and emits an alias map for downstream tools.
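Proposals 1 and 2 could be sketched roughly as follows. This is a minimal illustration, not the existing biblio_merge code: the entry shape (dicts with `citekey`, `doi`, `source` keys) and the helper names `normalize_doi`, `bucket_by_doi`, and `duplicate_doi_report` are assumptions made for the example, and the normalization (lowercasing plus stripping `doi:` and doi.org URL prefixes) is one plausible reading of "lowercased, normalized".

```python
import re
from collections import defaultdict

# Hypothetical normalization: strip common resolver/URL prefixes, then lowercase.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)\s*", re.IGNORECASE)


def normalize_doi(raw):
    """Return a lowercased DOI without URL/`doi:` prefix, or None if absent."""
    if not raw:
        return None
    doi = _DOI_PREFIX.sub("", raw.strip()).strip().lower()
    return doi or None


def bucket_by_doi(entries):
    """Group entries by normalized DOI; entries without a DOI are skipped."""
    buckets = defaultdict(list)
    for entry in entries:
        doi = normalize_doi(entry.get("doi"))
        if doi is not None:
            buckets[doi].append(entry)
    return buckets


def duplicate_doi_report(entries):
    """One tab-separated line per DOI that maps to more than one citekey."""
    lines = []
    for doi, group in sorted(bucket_by_doi(entries).items()):
        if len({e["citekey"] for e in group}) > 1:
            members = ", ".join(f'{e["citekey"]} ({e["source"]})' for e in group)
            lines.append(f"{doi}\t{members}")
    return lines


# The four entries from the symptom table above (assumed dict shape).
entries = [
    {"citekey": "Sirota_2003_Communication",
     "doi": "10.1073/pnas.0437938100", "source": "srcbib/zotero.bib"},
    {"citekey": "sirota_2003_CommunicationNeocortex",
     "doi": "10.1073/pnas.0437938100", "source": "srcbib/src-arash-pixecog.bib"},
    {"citekey": "Siapas_1998_Coordinated",
     "doi": "10.1016/S0896-6273(00)80629-7", "source": "srcbib/zotero.bib"},
    {"citekey": "siapas_1998_CoordinatedInteractions",
     "doi": "10.1016/S0896-6273(00)80629-7", "source": "srcbib/src-arash-pixecog.bib"},
]

for line in duplicate_doi_report(entries):
    print(line)
```

The report lines could then be written to .projio/biblio/logs/duplicate_dois.txt. Entries with no DOI fall through to the existing citekey dedup untouched, which matches the "when DOI is present" condition in proposal 1.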
Why this matters: The whole point of DOI is to be the stable identity. Letting two citekeys for the same DOI through into merged.bib means downstream tools (rag_sync, paper_context, citekey_resolve) can return inconsistent results depending on which citekey they happen to encounter. Also breaks biblio_status per-citekey accounting (one paper looks like two).
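To make the consistency concern concrete, here is a hedged sketch of what an alias map from proposal 3 might look like and how a downstream tool could resolve through it. Everything here is hypothetical: `build_alias_map`, `resolve`, the `(citekey, source)` tuple shape, and the "prefer a configured source, then sort by citekey" rule are illustrative choices, since the canonical-pick strategy is still an open question above.

```python
def build_alias_map(buckets, source_priority):
    """Map each non-canonical citekey to the canonical citekey for its DOI.

    buckets: dict of normalized DOI -> list of (citekey, source) tuples.
    source_priority: sources listed earlier win; unlisted sources rank last.
    """
    rank = {src: i for i, src in enumerate(source_priority)}
    aliases = {}
    for members in buckets.values():
        # Order by source rank, then citekey, so the pick is deterministic.
        ordered = sorted(members, key=lambda m: (rank.get(m[1], len(rank)), m[0]))
        canonical = ordered[0][0]
        for citekey, _source in ordered[1:]:
            if citekey != canonical:
                aliases[citekey] = canonical
    return aliases


def resolve(citekey, aliases):
    """Return the canonical citekey, or the input unchanged if it has no alias."""
    return aliases.get(citekey, citekey)


# DOI buckets for the two papers in the symptom table (assumed shape).
buckets = {
    "10.1073/pnas.0437938100": [
        ("Sirota_2003_Communication", "srcbib/zotero.bib"),
        ("sirota_2003_CommunicationNeocortex", "srcbib/src-arash-pixecog.bib"),
    ],
    "10.1016/s0896-6273(00)80629-7": [
        ("Siapas_1998_Coordinated", "srcbib/zotero.bib"),
        ("siapas_1998_CoordinatedInteractions", "srcbib/src-arash-pixecog.bib"),
    ],
}

aliases = build_alias_map(buckets, source_priority=["srcbib/zotero.bib"])
print(resolve("sirota_2003_CommunicationNeocortex", aliases))
```

If every downstream lookup went through something like `resolve` first, both citekeys for a DOI would land on the same record, and per-citekey accounting would count the paper once.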
Reference investigation: see pixecog conversation 2026-05-05 around dvorak_2021_DentateSpikes ingest.
Source context: pixecog
PixEcog (pixecog): Neuropixels and ECoG dataset and analysis
Recent commits:
37dc64d fix author: Arash Sal Moslehian → Arash Shahidi
8bcc47f calibrate_ieeg_clean: Plan B coherence-test failure + handoff
9bf2e76 spikesorting docs: park date_taskgroup mode, document UnitMatch path
README:
type: readme
Quick Start for Collaborators
Follow this checklist to get started with Pixecog documentation and workflows.
🐀 Pixecog Project — Compact Overview
Core principles
- One immutable BIDS raw dataset (raw/) as the canonical baseline
- Each analysis pipeline ha