Manuscript Subpackage Design Spec¶
Status: Draft
Date: 2026-03-31
Package: notio.manuscript
Location: packages/notio/src/notio/manuscript/
Motivation¶
Academic manuscripts are fundamentally ordered sequences of sections — the same primitive that notio already manages as notes with frontmatter, templates, and indexing. Rather than creating a sixth standalone subsystem, manuscript functionality lives as a notio subpackage that reuses the existing note infrastructure and adds:
- A
ManuscriptSpecYAML schema for declaring manuscripts - Section ordering and assembly (concatenation by
orderfield) - Pandoc-based rendering with
--citeprocfor bibliography integration - Figure insertion bridging figio assets into the manuscript
The full paper pipeline is: figio (figures) + biblio (citations) + notio/manuscript (assembly + render) = paper.
Design Principles¶
- Sections are notes. Each manuscript section is a Markdown file with YAML frontmatter, created and queried through standard notio mechanisms.
- ManuscriptSpec is the manifest. A single YAML file declares the section order, bibliography, CSL style, figure mappings, and render settings.
- Assembly is concatenation. No Lua transclusion filters — sections are
concatenated in
orderfield sequence with optional heading-level adjustment. - Rendering is pandoc. The subpackage shells out to pandoc with
--citeproc, passing the assembled Markdown and bibliography. - Filesystem-backed. All state lives in files; no database.
ManuscriptSpec Schema¶
The manifest lives at a user-chosen path (conventionally
docs/manuscript/<name>/manuscript.yml).
# manuscript.yml
name: my-paper # unique identifier
title: "My Paper Title"
authors:
- name: "Alice Smith"
affiliation: "University X"
email: alice@example.com
- name: "Bob Jones"
affiliation: "Institute Y"
# Section ordering — each entry maps to a note file
sections:
- key: abstract
path: sections/abstract.md # relative to manuscript.yml parent
order: 1
heading_level: 0 # 0 = no heading adjustment
- key: introduction
path: sections/introduction.md
order: 2
- key: methods
path: sections/methods.md
order: 3
- key: results
path: sections/results.md
order: 4
- key: discussion
path: sections/discussion.md
order: 5
- key: references
path: sections/references.md
order: 6
heading_level: 0
# Bibliography — inherits from .projio/render.yml by default
# Override per-manuscript if needed; compiled.bib is the project-wide default
bibliography:
bib_file: .projio/render/compiled.bib # project compiled bib (default from render.yml)
csl: .projio/render/csl/apa.csl # CSL style file (default from render.yml)
# Figures (figio integration)
figures:
dir: figures/ # directory for figure specs/outputs
mappings: # optional: map figure IDs to labels
- id: fig-overview
label: "Figure 1"
caption: "System overview"
spec: figures/overview.figurespec.yaml
- id: fig-results
label: "Figure 2"
caption: "Main results"
spec: figures/results.figurespec.yaml
# Render settings
render:
output_dir: _build/ # relative to manuscript.yml parent
formats: [pdf, docx, html] # pandoc output formats
template: null # optional pandoc template
pandoc_args: [] # extra pandoc CLI arguments
variables: {} # pandoc template variables
Section Notes¶
Each section file is a standard Markdown file. Sections may optionally include YAML frontmatter for metadata, but it is stripped during assembly — only the body content is concatenated.
---
title: "Introduction"
order: 2
manuscript: my-paper
tags: [manuscript, section]
---
# Introduction
The study of neural oscillations...
Frontmatter fields used by the manuscript system:
| Field | Type | Description |
|---|---|---|
title |
str | Section title |
order |
int | Sort position within manuscript |
manuscript |
str | Manuscript name (for cross-referencing) |
status |
str | Draft status: draft, review, final |
Module Structure¶
src/notio/manuscript/
├── __init__.py # Public API exports
├── schema.py # ManuscriptSpec dataclass + YAML loading + render.yml merging
├── assembly.py # Section ordering, frontmatter stripping, concatenation
├── render.py # Pandoc subprocess invocation
├── figures.py # Figure reference resolution, figio bridge
├── validate.py # Section/citation/figure/pandoc validation
└── master.py # Dual-marker master documents (Lua transclusion for plans/specs)
schema.py¶
Dataclass-based schema loaded from YAML. No pydantic dependency — uses
dataclasses + a from_yaml() classmethod that parses with the stdlib-
compatible yaml library (PyYAML, already a transitive dependency).
@dataclass
class Author:
name: str
affiliation: str = ""
email: str = ""
@dataclass
class SectionEntry:
key: str
path: str
order: int
heading_level: int = 1 # default: keep as-is
@dataclass
class BibConfig:
bib_file: str = ""
csl: str = ""
@dataclass
class FigureMapping:
id: str
label: str = ""
caption: str = ""
spec: str = ""
@dataclass
class FiguresConfig:
dir: str = "figures/"
mappings: list[FigureMapping] = field(default_factory=list)
@dataclass
class RenderConfig:
output_dir: str = "_build/"
formats: list[str] = field(default_factory=lambda: ["pdf"])
template: str | None = None
pandoc_args: list[str] = field(default_factory=list)
variables: dict[str, str] = field(default_factory=dict)
@dataclass
class ManuscriptSpec:
name: str
title: str = ""
authors: list[Author] = field(default_factory=list)
sections: list[SectionEntry] = field(default_factory=list)
bibliography: BibConfig = field(default_factory=BibConfig)
figures: FiguresConfig = field(default_factory=FiguresConfig)
render: RenderConfig = field(default_factory=RenderConfig)
@classmethod
def from_yaml(cls, path: Path) -> "ManuscriptSpec": ...
@classmethod
def from_dict(cls, data: dict, base_dir: Path) -> "ManuscriptSpec": ...
assembly.py¶
def load_sections(spec: ManuscriptSpec, base_dir: Path) -> list[Section]:
"""Load and order section files. Returns Section objects sorted by order."""
def strip_frontmatter(text: str) -> str:
"""Remove YAML frontmatter from Markdown text."""
def adjust_headings(text: str, level_offset: int) -> str:
"""Shift Markdown heading levels by offset (e.g., +1 makes # into ##)."""
def assemble(spec: ManuscriptSpec, base_dir: Path) -> str:
"""Concatenate sections in order into a single Markdown document.
Strips frontmatter, adjusts heading levels, inserts section breaks."""
def write_assembled(spec: ManuscriptSpec, base_dir: Path) -> Path:
"""Assemble and write to output_dir/assembled.md. Returns path."""
render.py¶
def find_pandoc() -> Path | None:
"""Locate pandoc binary via shutil.which."""
def build_pandoc_command(
input_path: Path,
output_path: Path,
fmt: str,
spec: ManuscriptSpec,
base_dir: Path,
) -> list[str]:
"""Build the pandoc CLI argument list."""
def render(spec: ManuscriptSpec, base_dir: Path, *, formats: list[str] | None = None) -> list[Path]:
"""Assemble then render via pandoc. Returns list of output paths."""
def render_single(input_path: Path, output_path: Path, fmt: str,
bib_file: Path | None, csl: Path | None,
template: Path | None, extra_args: list[str],
variables: dict[str, str]) -> Path:
"""Render a single format. Raises on pandoc failure."""
figures.py¶
def resolve_figure_paths(spec: ManuscriptSpec, base_dir: Path) -> dict[str, Path]:
"""Map figure IDs to their built output paths (SVG/PDF).
Checks figio _build/ directories for latest outputs."""
def insert_figure_references(text: str, figures: dict[str, Path], base_dir: Path) -> str:
"""Replace figure placeholders (e.g., ) with resolved paths."""
def validate_figures(spec: ManuscriptSpec, base_dir: Path) -> list[str]:
"""Check that all referenced figures exist. Returns list of missing figure IDs."""
Public API¶
Exported from notio.manuscript.__init__:
from notio.manuscript.schema import ManuscriptSpec
from notio.manuscript.assembly import assemble, write_assembled
from notio.manuscript.render import render
from notio.manuscript.figures import resolve_figure_paths, validate_figures
CLI Integration¶
New subcommand group under the notio CLI:
notio manuscript init <name> [--dir PATH] # scaffold manuscript.yml + sections/
notio manuscript status <spec> # show section completion status
notio manuscript assemble <spec> # concatenate sections → assembled.md
notio manuscript build <spec> [--format FMT] # assemble + pandoc render
notio manuscript validate <spec> # check sections, bib, figures
MCP Tools¶
Registered in notio's MCP server (mcp/server.py) and wrapped by projio at
src/projio/mcp/manuscripto.py:
| Tool | Description |
|---|---|
manuscript_init(name, dir?) |
Scaffold manuscript.yml and section files |
manuscript_list() |
List manuscripts in project |
manuscript_status(spec_path) |
Section count, word counts, missing sections |
manuscript_assemble(spec_path) |
Concatenate sections → assembled.md |
manuscript_build(spec_path, formats?) |
Full pipeline: assemble + render |
manuscript_validate(spec_path) |
Check sections, bibliography, figures |
manuscript_figure_insert(spec_path, figure_id, section_key) |
Insert figure reference into section |
Integration Points¶
biblio¶
bibliography.bib_filein the spec points to a.bibfile managed by biblio- Pandoc's
--citeprocresolves@citekeyreferences against this file biblio_merge()should be run beforemanuscript_buildto ensure the bib is up to date
figio¶
figures.mappings[].specpoints to*.figurespec.yamlfilesresolve_figure_paths()looks in figio's_build/for rendered outputs- Figure placeholders in section text use
syntax manuscript_validatechecks that all mapped figures have been built
notio core¶
- Section files can be created via
notio note sectionif asectionnote type is configured, or created directly as plain Markdown files seriesfield can group sections belonging to the same manuscript- Existing query functions (
list_notes,search_notes) work on section files
Rendering Pipeline¶
sections/*.md
│
▼
assemble() ──→ assembled.md (frontmatter stripped, headings adjusted, figures resolved)
│
▼
render() ──→ pandoc --citeproc --bibliography=refs.bib --csl=style.csl
│
▼
_build/{name}.{pdf,docx,html}
Error Handling¶
- Missing section file →
FileNotFoundErrorwith descriptive message - Missing pandoc binary →
RuntimeError("pandoc not found") - Pandoc failure →
RuntimeErrorwith stderr captured - Missing bibliography → warning (rendering proceeds without citeproc)
- Missing figure → warning in validate, placeholder left in assembled text
Dependencies¶
- Required: PyYAML (already a transitive dep via other notio features)
- Optional: pandoc (system binary, checked at render time)
- No new Python package dependencies
Testing Strategy¶
- Unit tests for schema loading (
from_yaml,from_dict) - Unit tests for assembly (ordering, frontmatter stripping, heading adjustment)
- Unit tests for figure reference resolution
- Integration test for full pipeline (requires pandoc fixture)
- All tests under
packages/notio/tests/test_manuscript.py
Agentic Tools¶
Manuscript MCP tools are split into priority tiers. P0 tools ship first and cover the core agent workflow; later tiers add validation, diffing, and journal-awareness.
Ontology¶
ManuscriptSpec ──┬── SectionEntry* ──── section .md file (content)
│
├── BibConfig ──────── .bib file (biblio)
│
├── FiguresConfig ──── FigureMapping* ──── FigureSpec YAML (figio)
│
└── RenderConfig ───── pandoc settings
Interactions:
- RAG (indexio) —
rag_queryfor literature/code context per section - biblio — citation resolution, fulltext status,
.bibparsing - figio — figure build status, spec mtime comparison
- notio —
note_searchfor related ideas/notes
Lifecycle¶
scaffold → draft → populate → validate → render → review → submit
│ │ │ │ │ │
init section cite/fig cite_check build diff/journal
context insert overview
P0 — Core agent tools¶
manuscript_section_context(name, section)¶
One-call context gathering for drafting a section.
| Parameter | Type | Description |
|---|---|---|
name |
str | Manuscript name |
section |
str | Section key (e.g. introduction) |
Returns:
| Field | Type | Source |
|---|---|---|
current_content |
str | Section file text (stripped frontmatter) |
rag_hits |
list[dict] | Top RAG results for section title (indexio rag_query) |
figures |
list[dict] | Figure mappings for this section + build status (figio) |
citations_used |
list[str] | [@citekey] patterns found in section text |
related_notes |
list[dict] | Notes matching section title (notio note_search) |
word_count |
int | Current word count of section body |
manuscript_overview(name)¶
Rich manuscript dashboard — superset of manuscript_status.
| Parameter | Type | Description |
|---|---|---|
name |
str | Manuscript name |
Returns:
| Field | Type | Source |
|---|---|---|
sections |
list[dict] | Per-section: key, title, word_count, citation_count, figure_ref_count, status |
total_words |
int | Sum of all section word counts |
total_citations |
int | Unique citekeys across all sections |
total_figures |
int | Number of figure mappings |
missing_citations |
list[str] | Citekeys in text but not in .bib |
missing_figures |
list[str] | Figure IDs in mappings without built outputs |
stale_figures |
list[str] | Figure specs newer than built outputs (mtime comparison) |
bibliography |
dict | path, entry_count, papers_with_fulltext |
P1 — Validation tools¶
manuscript_cite_check(name)¶
Citation-focused cross-subsystem validation.
| Parameter | Type | Description |
|---|---|---|
name |
str | Manuscript name |
Returns:
| Field | Type | Description |
|---|---|---|
found |
list[dict] | {citekey, sections, has_fulltext} for each resolved citation |
missing |
list[dict] | {citekey, sections} for unresolved citations |
suggestions |
list[str] | Actionable hints (e.g. "run biblio_docling on X") |
Cross-checks: section text → .bib file → biblio docling extraction status.
manuscript_figure_build_all(name)¶
Batch figure build via figio.
| Parameter | Type | Description |
|---|---|---|
name |
str | Manuscript name |
Returns: list[{figure_id, status: "built"|"failed"|"skipped", path?, error?}]
Iterates figure mappings with spec paths and invokes figio build on each.
P2 — Diff and suggestion tools¶
manuscript_diff(name)¶
Compare current section content against last _build/assembled.md snapshot.
Detects section-level changes, word count deltas, and citation drift.
| Parameter | Type | Description |
|---|---|---|
name |
str | Manuscript name |
Returns:
| Field | Type | Description |
|---|---|---|
sections_changed |
list[str] | Section keys whose content differs from the last build |
sections_added |
list[str] | Section keys present now but absent in last build |
sections_removed |
list[str] | Section keys in last build but absent now |
word_count_before |
int | Word count of last _build/assembled.md |
word_count_after |
int | Word count of current assembled text |
word_count_delta |
int | after - before |
citations_added |
list[str] | Citekeys in current text but not in last build |
citations_removed |
list[str] | Citekeys in last build but not in current text |
has_previous_build |
bool | Whether _build/assembled.md existed |
Uses difflib for comparison — no external dependencies. When no previous
build exists, all current sections are reported as added.
manuscript_cite_suggest(name, section, claim?)¶
Search the RAG biblio corpus for papers relevant to a section or claim text. Returns ranked citekeys with relevance scores and snippets.
| Parameter | Type | Description |
|---|---|---|
name |
str | Manuscript name |
section |
str | Section key (e.g. introduction) |
claim |
str? | Optional claim text to search for (overrides section content) |
Returns:
| Field | Type | Description |
|---|---|---|
suggestions |
list[dict] | Ranked citation suggestions |
suggestions[].citekey |
str | Citekey if extractable from RAG source path |
suggestions[].relevance_score |
float | RAG similarity score |
suggestions[].snippet |
str | Matching text snippet from corpus |
suggestions[].source |
str | Source document path |
query_used |
str | The text that was actually sent to RAG |
section |
str | Section key queried |
Degrades gracefully: returns {error} when RAG or biblio corpus is unavailable.
P3 — Journal awareness¶
manuscript_journal_check(name, journal?)¶
Compare manuscript metrics against built-in journal target profiles.
| Parameter | Type | Description |
|---|---|---|
name |
str | Manuscript name |
journal |
str? | Journal key (e.g. nature, plos-one, elife, biorxiv). When omitted, lists available profiles. |
Returns:
| Field | Type | Description |
|---|---|---|
journal |
str | Journal key used |
journal_name |
str | Full journal name |
word_count |
dict | {current, limit, over_by?} — current total vs journal limit |
figure_count |
dict | {current, limit, over_by?} — current total vs journal limit |
required_sections |
list[dict] | {key, required, present} — sections the journal expects, with present/missing status |
csl_match |
dict | {expected, configured, match} — whether the configured CSL matches the target journal |
warnings |
list[str] | Actionable issues (over word limit, missing required sections, CSL mismatch, etc.) |
available_profiles |
list[str] | Only returned when journal is omitted |
Built-in profiles are minimal dicts — extensible without a new schema. Initial
set: nature, plos-one, elife, biorxiv.
Future Considerations¶
- Pandoc filters: Support for custom Lua/Python pandoc filters in render config
- LaTeX templates: First-class support for journal-specific LaTeX templates
- Collaborative editing: Section locking/status for multi-author workflows
- Word count targets: Per-section word count goals with progress tracking