biblio_extract: YAML parsing fails when LLM output contains colons in summary text

biblio_extract fails on roughly 20% of runs because the LLM emits unquoted colons inside summary values, which breaks YAML parsing. Example error:

mapping values are not allowed here
  in "<unicode string>", line 25, column 58:
    ...racterizes the temporal sequence: SPWRs cluster before δ-waves ...

Affected papers (pixecog session)

  • peyrache_2011_InhibitionRecruitment — failed 3x consecutively on retry
  • sirota_2003_CommunicationNeocortex — failed once, succeeded on retry

Fix options (in order of robustness)

  1. Post-process LLM output: wrap all string values in quotes before YAML parse
  2. Use yaml.safe_load with a pre-processing step that quotes only the values containing a bare colon (plain YAML scalars have no escape syntax)
  3. Switch to JSON output format in the LLM prompt (more reliably parseable)
  4. Add retry logic with a "please quote your YAML string values" prompt suffix
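
A minimal sketch of the pre-processing idea behind options 1 and 2, assuming the LLM output is flat "key: value" YAML with single-line values. The function name quote_risky_values and the field names in the usage note are hypothetical, not taken from biblio_extract. It uses only the standard library and leaves folded multi-line values, block scalars, and already-quoted values untouched.

```python
import json
import re

# Matches "key: value" lines (optionally a "- " list item) whose value sits on one line.
_KEY_VALUE = re.compile(
    r"^(?P<prefix>\s*(?:-\s+)?[A-Za-z_][\w-]*:[ \t]+)(?P<value>\S.*)$"
)

# A value starting with one of these is already quoted or is YAML syntax
# (flow collection, block scalar, anchor, alias, tag, comment), so skip it.
_SKIP_FIRST_CHARS = "\"'[{|>&*!#"


def quote_risky_values(text: str) -> str:
    """Double-quote plain scalar values that contain ': ' or end with ':'.

    Hypothetical helper: assumes one value per line, as in the failing example.
    """
    fixed = []
    for line in text.splitlines():
        match = _KEY_VALUE.match(line)
        if match:
            value = match.group("value").rstrip()
            risky = ": " in value or value.endswith(":")
            if risky and value[0] not in _SKIP_FIRST_CHARS:
                # json.dumps produces a double-quoted string whose escapes
                # are also valid in a YAML double-quoted scalar.
                line = match.group("prefix") + json.dumps(value, ensure_ascii=False)
        fixed.append(line)
    return "\n".join(fixed)
```

For example, the line `summary: Characterizes the temporal sequence: SPWRs cluster` becomes `summary: "Characterizes the temporal sequence: SPWRs cluster"`, while `year: 2011` passes through unchanged. The result could then be handed to yaml.safe_load. Because the helper only rewrites lines it recognizes, it is likely best treated as a first line of defense alongside option 3 or 4 rather than a complete fix.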

Source context: pixecog

PixEcog (pixecog): Neuropixels and ECoG dataset and analysis

Recent commits:

8dc0d9d Pipeline docs: gitignore docs/pipelines/, relocate hand-authored files
96cd1ec Refactor sharpwaveripple/contracts: extract generic helpers to utils/io, remove pipelines __init__.py
36f9326 Add result note directory and sample note

README:


type: readme


Quick Start for Collaborators

Follow this checklist to get started with Pixecog documentation and workflows.

🐀 Pixecog Project — Compact Overview

Core principles

  • One immutable BIDS raw dataset (raw/) as the canonical baseline
  • Each analysis pipeline ha