
Issue arash 20260404 021609 691872


title: "## Spec: GROBID citation context — beyond simple citation networks"
status: done
created: 2026-04-04
updated: 2026-04-04
timestamp: 20260404-021609-691872
tags: [issue]
source: agent-observation
project_primary: projio
capture_id: 20260404-021607-6de75f
confidence: 1.0
transcript_file: /storage2/arash/worklog/workflow/captures/20260404-021607-6de75f/transcript.txt


Spec: GROBID citation context — beyond simple citation networks

biblio currently uses GROBID for header extraction and reference parsing, producing a flat list of references per paper. But GROBID can also extract citation contexts — the sentences where a reference is cited. This enables "paper X cites paper Y in context C" relationships, which are far richer than simple citation edges.

Research questions

  1. What does GROBID provide for citation context?
     • TEI XML <ref> elements have target attributes linking to bibliography entries
     • These refs are embedded in the full-text paragraphs; the surrounding text IS the citation context
     • How does grobid-client-python expose this?
     • What does the TEI structure look like for inline citations?
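
As a starting point for the TEI questions above, here is a minimal sketch of pulling citation contexts out of a TEI document with the Python standard library. It assumes GROBID's usual layout, where an inline citation is a `<ref type="bibr" target="#bN">` inside a body `<p>` and the section title sits in the enclosing `<div>`'s `<head>`. The sample fragment is hand-written for illustration (not taken from the indexed repo), and `extract_contexts` is a hypothetical helper, not an existing biblio function. The whole paragraph is used as the context; sentence-level contexts would need extra segmentation.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written fragment for illustration only; real GROBID output is much larger.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div>
        <head>Introduction</head>
        <p>Sharp-wave ripples were shown to support memory consolidation <ref type="bibr" target="#b0">[1]</ref>. Later work extended this result <ref type="bibr" target="#b1">[2]</ref>.</p>
      </div>
    </body>
  </text>
</TEI>"""


def extract_contexts(tei_xml: str) -> list:
    """Return one record per resolved inline citation (hypothetical helper)."""
    root = ET.fromstring(tei_xml)
    contexts = []
    for div in root.iterfind(".//tei:body//tei:div", TEI_NS):
        head = div.find("tei:head", TEI_NS)
        section = "".join(head.itertext()).strip() if head is not None else None
        # Only direct <p> children, so nested <div>s are not counted twice.
        for p in div.iterfind("tei:p", TEI_NS):
            # Collapse whitespace so the paragraph reads as a single line.
            text = " ".join("".join(p.itertext()).split())
            for ref in p.iterfind(".//tei:ref[@type='bibr']", TEI_NS):
                target = ref.get("target")
                if not target:
                    continue  # skip refs with no link to a bibliography entry
                contexts.append(
                    {
                        "target": target.lstrip("#"),  # e.g. "b0"
                        "section": section,
                        "context_text": text,
                    }
                )
    return contexts


for record in extract_contexts(SAMPLE_TEI):
    print(record["target"], "|", record["section"])
```

The `target` value is only a document-local bibliography id; mapping it to a citekey is a separate step that depends on how biblio resolves references.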

  2. How could biblio use citation contexts?
     • Enrich the citation graph: instead of just "A cites B", store "A cites B saying '...sharp-wave ripples were shown to...'"
     • RAG queries could return citation contexts as evidence
     • Manuscript writing: auto-generate citation sentences based on how others cited the same paper
     • Literature review: cluster papers by how they cite a common reference
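
The graph-enrichment and "how others cited the same paper" ideas above can be sketched with plain dictionaries. This is an illustration only: the citekeys and context strings are made up, the record fields follow the schema proposed in this issue, and neither helper exists in graph.py today.

```python
from collections import defaultdict

# Made-up records using the proposed schema fields.
RECORDS = [
    {"citing_citekey": "paperA", "cited_citekey": "paperY",
     "context_text": "Ripples were shown to support consolidation [1]."},
    {"citing_citekey": "paperB", "cited_citekey": "paperY",
     "context_text": "We follow the detection method of [7]."},
    {"citing_citekey": "paperA", "cited_citekey": "paperZ",
     "context_text": "See [2] for a review."},
]


def contexts_by_edge(records):
    """Map each (citing, cited) edge to the list of contexts that support it."""
    edges = defaultdict(list)
    for r in records:
        edges[(r["citing_citekey"], r["cited_citekey"])].append(r["context_text"])
    return dict(edges)


def how_others_cite(records, cited_citekey):
    """List (citing paper, context) pairs for one cited paper."""
    return [
        (r["citing_citekey"], r["context_text"])
        for r in records
        if r["cited_citekey"] == cited_citekey
    ]


edges = contexts_by_edge(RECORDS)
print(len(edges), "edges with context")
for citing, text in how_others_cite(RECORDS, "paperY"):
    print(citing, "->", text)
```

Keeping contexts as edge attributes leaves the existing "A cites B" edges unchanged, so the enrichment stays optional for papers without full text.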

  3. What's the data model?
     • Where to store citation contexts? bib/derivatives/grobid/{citekey}/contexts.json?
     • Schema: {citing_citekey, cited_citekey, context_text, section, position}
     • How to extract from existing TEI XML that biblio already generates
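
One possible shape for the proposed schema and storage path, written as a sketch rather than a settled design. The field names and the `bib/derivatives/grobid/{citekey}/contexts.json` location come from the list above. The field types, the reading of `position` as the zero-based index of the citation within the citing paper, and the `write_contexts` / `read_contexts` helpers are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List, Optional


@dataclass(frozen=True)
class CitationContext:
    citing_citekey: str
    cited_citekey: str
    context_text: str
    section: Optional[str]  # assumption: section title, None when unknown
    position: int           # assumption: zero-based order within the citing paper


def contexts_path(root: Path, citekey: str) -> Path:
    """Proposed location: bib/derivatives/grobid/{citekey}/contexts.json."""
    return root / "bib" / "derivatives" / "grobid" / citekey / "contexts.json"


def write_contexts(root: Path, citekey: str, records: List[CitationContext]) -> Path:
    """Serialize all contexts of one citing paper to a single JSON file."""
    path = contexts_path(root, citekey)
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = [asdict(r) for r in records]
    path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def read_contexts(root: Path, citekey: str) -> List[CitationContext]:
    """Load the contexts back into dataclass instances."""
    path = contexts_path(root, citekey)
    return [CitationContext(**item) for item in json.loads(path.read_text(encoding="utf-8"))]
```

One file per citing paper keeps the derivative next to the TEI it was extracted from, so it can be regenerated whenever GROBID is rerun on that paper.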

  4. What does biblio-glutton add?
     • biblio-glutton does high-performance bibliographic matching
     • Could replace or augment biblio's CrossRef-based resolve_doi_by_title
     • Matching unresolved GROBID references to DOIs
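
To make the matching idea concrete, here is a small sketch that only builds a lookup URL for a biblio-glutton instance; it performs no network call. The `/service/lookup` endpoint and the `atitle`, `firstAuthor`, and `biblio` parameter names are recalled from the biblio-glutton README and should be verified against the mirrored source before use. The base URL and the `glutton_lookup_url` helper are hypothetical.

```python
from urllib.parse import urlencode


def glutton_lookup_url(base_url, *, title=None, first_author=None, raw_reference=None):
    """Build a lookup URL for an unresolved reference (hypothetical helper).

    Parameter names are assumptions to check against the biblio-glutton docs.
    """
    params = {}
    if title:
        params["atitle"] = title
    if first_author:
        params["firstAuthor"] = first_author
    if raw_reference:
        params["biblio"] = raw_reference  # full raw reference string
    if not params:
        raise ValueError("need a title, a first author, or a raw reference string")
    return f"{base_url.rstrip('/')}/service/lookup?{urlencode(params)}"


# Example with a placeholder host and a made-up reference.
print(glutton_lookup_url("http://localhost:8080", title="A made-up title", first_author="Doe"))
```

A resolver built on this could try biblio-glutton first and fall back to the existing CrossRef-based `resolve_doi_by_title` when no match is returned, which would augment rather than replace the current path.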

Output

Write spec to docs/specs/biblio/citation-context.md covering:

  • GROBID TEI XML structure for inline citations (with examples from the indexed repo)
  • Proposed data model for citation contexts in biblio
  • Integration with existing graph.py and reference resolution
  • MCP tools to query citation contexts
  • Priority assessment: must-have vs nice-to-have

Key references (indexed in RAG)

  • .projio/codio/mirrors/grobidorg--grobid/ — GROBID source, TEI output format
  • .projio/codio/mirrors/grobidorg--grobid-client-python/ — Python client API
  • .projio/codio/mirrors/kermitt2--biblio-glutton/ — bibliographic matching
  • packages/biblio/src/biblio/grobid.py — current GROBID integration
  • packages/biblio/src/biblio/graph.py — current citation graph
  • packages/biblio/src/biblio/ref_md.py — reference-markdown standardization