biblio_extract: YAML parsing fails when LLM output contains colons in summary text¶
biblio_extract fails ~20% of the time because the LLM generates unescaped colons in summary values, breaking YAML parsing. Example error:
mapping values are not allowed here
in "<unicode string>", line 25, column 58:
...racterizes the temporal sequence: SPWRs cluster before δ-waves ...
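The parser stops at the second colon because YAML treats a colon followed by a space inside an unquoted value as another key/value separator. Below is a minimal standard-library sketch of a check that flags such lines before parsing. The function name `find_bare_colon_lines`, the sample text, and the assumption that the LLM emits simple one-line `key: value` pairs are illustrative and not taken from biblio_extract.

```python
import re

# Matches one-line "key: value" pairs, optionally indented or prefixed by "- ".
_KEY_VALUE = re.compile(r"^(\s*(?:-\s+)?[A-Za-z_][\w-]*):[ \t]+(.*)$")


def find_bare_colon_lines(text):
    """Return 1-based line numbers whose unquoted value contains a bare colon."""
    risky = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = _KEY_VALUE.match(line)
        if not match:
            continue
        value = match.group(2).strip()
        # Skip empty values, quoted scalars, flow collections and block scalars.
        if not value or value[0] in "\"'[{|>":
            continue
        # ": " inside the value, or a trailing ":", is read as a mapping indicator.
        if ": " in value or value.endswith(":"):
            risky.append(number)
    return risky


# Hypothetical LLM output modeled on the error excerpt above.
sample = (
    "citekey: peyrache_2011_InhibitionRecruitment\n"
    "summary: Describes the temporal sequence: SPWRs cluster before δ-waves\n"
)
print(find_bare_colon_lines(sample))  # prints [2]
```

Logging the flagged line numbers alongside the citekey would make it easier to confirm that bare colons, rather than some other malformation, account for the observed failures.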
Affected papers (pixecog session)¶
- peyrache_2011_InhibitionRecruitment — failed 3x consecutively on retry
- sirota_2003_CommunicationNeocortex — failed once, succeeded on retry
Fix options (in order of robustness)¶
- Post-process LLM output: wrap all string values in quotes before YAML parse
- Use yaml.safe_load with a pre-processing step that escapes bare colons in values
- Switch to JSON output format in the LLM prompt (more reliably parseable)
- Add retry logic with a "please quote your YAML string values" prompt suffix
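The quoting and pre-processing options above could look roughly like the following sketch, which wraps only the risky values in double quotes and leaves numbers and already-quoted strings untouched. It assumes one-line `key: value` pairs and does not handle multi-line values. The name `quote_risky_values` is hypothetical, and `json.dumps` is used here only as a convenient way to produce a correctly escaped double-quoted string.

```python
import json
import re

# Captures the "key: " prefix and the raw value of a one-line pair.
_KEY_VALUE = re.compile(r"^(\s*(?:-\s+)?[A-Za-z_][\w-]*:[ \t]+)(.*)$")


def quote_risky_values(text):
    """Double-quote unquoted values that contain YAML-significant sequences."""
    fixed = []
    for line in text.splitlines():
        match = _KEY_VALUE.match(line)
        if match:
            prefix, value = match.group(1), match.group(2).strip()
            # Leave quoted scalars, flow collections, block scalars, anchors,
            # aliases and tags for the YAML parser to handle.
            is_plain = bool(value) and value[0] not in "\"'[{|>&*!"
            risky = ": " in value or " #" in value or value.endswith(":")
            if is_plain and risky:
                # json.dumps escapes backslashes and double quotes.
                line = prefix + json.dumps(value, ensure_ascii=False)
        fixed.append(line)
    return "\n".join(fixed)


# Hypothetical LLM output; not an actual biblio_extract response.
raw = (
    "year: 2011\n"
    "summary: Describes the temporal sequence: SPWRs cluster before δ-waves\n"
)
print(quote_risky_values(raw))
```

The returned text can then be passed to `yaml.safe_load`. Quoting only the values that need it keeps `year: 2011` an integer, which wrapping every value in quotes would not.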
Source context: pixecog¶
PixEcog (pixecog): Neuropixels and ECoG dataset and analysis
Recent commits:
8dc0d9d Pipeline docs: gitignore docs/pipelines/, relocate hand-authored files
96cd1ec Refactor sharpwaveripple/contracts: extract generic helpers to utils/io, remove pipelines __init__.py
36f9326 Add result note directory and sample note
README:
type: readme
Quick Start for Collaborators¶
Follow this checklist to get started with Pixecog documentation and workflows.
🐀 Pixecog Project — Compact Overview¶
Core principles
- One immutable BIDS raw dataset (raw/) as the canonical baseline
- Each analysis pipeline ha
Related Notes¶
- issue-arash-20260409-231618-516346.md — Both are biblio tool failures from the same pixecog session — biblio_enrich silent failure is a companion issue
- issue-arash-20260409-231641-242830.md — Same pixecog biblio session; biblio_pdf_fetch_oa download failure is a co-occurring issue
- issue-arash-20260409-231546-838942.md — Same pixecog biblio session; missing biblio_openalex_resolve blocks the same extraction workflow
- issue-arash-20260409-231703-392876.md — Same pixecog biblio session; citekey diacritic mangling is another LLM/text-processing failure in the biblio pipeline
- issue-arash-20260404-021642-474901.md — Biblio-glutton study directly relates to bibliographic matching robustness — context for improving biblio_extract reliability