
docs_collect: overview.md fix + two-phase export/collect architecture

Bug fixes (shipped)

1. overview.md fallback — fixed

Root cause: overview.md was unconditionally mapped to target/index.md, whether or not a source index.md existed. When both existed, index.md was copied first (alphabetical sort order). overview.md then targeted the same destination, but _is_stale() returned False because the freshly written target had a newer mtime than overview.md, so the overview content was silently dropped.

Fix: added a has_source_index check to DocsCollector. overview.md is mapped to index.md only when no source index.md exists. When both exist, overview.md is collected as a separate page alongside the index.

2. Source-tree immutability — confirmed safe

The code never writes to code/pipelines/{flow}/docs/. All collector output goes to docs/pipelines/{flow}/ (the output directory), and the source tree is treated as read-only during collection by design.

Two-phase architecture (shipped)

Refactored docs_collect() from a monolithic function into a two-phase export-then-collect pipeline, following Sphinx-inspired principles from idea note idea-arash-20260409-135130-379286.

Phase 1: Export — component-owned artifact generation

Export functions generate artifacts into {flow}/.build/:

  • export_dag(ctx) — snakemake rulegraph + graphviz → .build/dag.svg
  • export_notebooks(ctx) — nbconvert HTML/MyST → .build/notebooks/

Each component owns its generation logic. No generation happens in collectors.
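A minimal sketch of the two exporters, assuming each flow directory contains its own Snakefile, notebooks live in {flow}/notebooks/, and the `snakemake`, `dot` (Graphviz) and `jupyter` executables are on PATH. These functions take a flow directory instead of the real `ctx` object, and the injectable `run` parameter is a hypothetical addition that makes the sketch testable without those tools. Only the HTML notebook path is shown; MyST output is omitted.

```python
import subprocess
from pathlib import Path
from typing import Callable

Runner = Callable[..., subprocess.CompletedProcess]


def export_dag(flow_dir: Path, run: Runner = subprocess.run) -> Path:
    """Render the Snakemake rule graph to {flow}/.build/dag.svg."""
    build = flow_dir / ".build"
    build.mkdir(parents=True, exist_ok=True)
    out = build / "dag.svg"
    # Assumption: the Snakefile sits in flow_dir, so snakemake runs there.
    # `snakemake --rulegraph` prints the rule graph as DOT on stdout.
    dot_src = run(
        ["snakemake", "--rulegraph"],
        cwd=flow_dir, capture_output=True, text=True, check=True,
    ).stdout
    # Graphviz reads DOT from stdin and writes the SVG file.
    run(["dot", "-Tsvg", "-o", str(out)], input=dot_src, text=True, check=True)
    return out


def export_notebooks(flow_dir: Path, run: Runner = subprocess.run) -> list[Path]:
    """Convert each notebook to HTML in {flow}/.build/notebooks/."""
    out_dir = flow_dir / ".build" / "notebooks"
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    # Assumption: source notebooks are in {flow}/notebooks/.
    for nb in sorted((flow_dir / "notebooks").glob("*.ipynb")):
        run(
            ["jupyter", "nbconvert", "--to", "html", "--output-dir", str(out_dir), str(nb)],
            check=True,
        )
        # nbconvert keeps the notebook stem and swaps the extension.
        outputs.append(out_dir / f"{nb.stem}.html")
    return outputs
```

Because every exporter writes only under .build/, collectors never need to know how an artifact was produced.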

Phase 2: Collect — pure file copiers

Six collectors copy pre-built artifacts into docs/pipelines/{flow}/:

| # | Collector | Source | Gate |
|---|-----------|--------|------|
| 1 | DocsCollector | {flow}/docs/ | always |
| 2 | NotebookCollector | {flow}/.build/notebooks/ | .build/ exists |
| 3 | DagCollector | {flow}/.build/dag.svg or {flow}/dag.svg | .build/ or publish.dag |
| 4 | ReportCollector | derivatives/{flow}/report.html | publish.report |
| 5 | ScriptsCollector | {flow}/scripts/ | publish.scripts |
| 6 | IndexCollector | none (generates a stub) | always (runs last) |

docs_collect(root, *, export=True)

  • export=True (default): runs both phases — one-command UX
  • export=False: collect-only mode for when exports were run separately
  • CLI: pipeio docs collect --no-export
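The `export` switch amounts to a guard around Phase 1. A minimal sketch, with exporters and collectors passed in as plain callables (an assumption made to keep the example self-contained; the real `docs_collect` discovers flows under `root` itself). The CLI half uses `argparse` purely for illustration, since the framework behind the actual pipeio CLI is not stated here: `store_false` makes `--no-export` flip a default-True `export` attribute.

```python
import argparse
from pathlib import Path
from typing import Callable, Iterable


def docs_collect(
    root: Path,
    *,
    export: bool = True,
    exporters: Iterable[Callable[[Path], object]] = (),
    collectors: Iterable[Callable[[Path], list[Path]]] = (),
) -> list[Path]:
    """Phase 1 (optional): build artifacts. Phase 2: copy them into docs/."""
    if export:
        for exporter in exporters:
            exporter(root)
    collected: list[Path] = []
    for collector in collectors:
        collected.extend(collector(root))
    return collected


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="pipeio docs collect")
    # --no-export stores False into args.export; omitting it leaves True.
    parser.add_argument("--no-export", dest="export", action="store_false")
    return parser.parse_args(argv)
```

Usage: `args = parse_args(["--no-export"])` followed by `docs_collect(Path("."), export=args.export)` runs only the collectors.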

MCP tools updated

  • mcp_dag_export: writes the SVG to both .build/dag.svg and docs/pipelines/, so the DAG is visible in the docs immediately without re-running collect
  • mcp_nb_publish: writes to both .build/notebooks/ and docs/pipelines/notebooks/
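Both MCP tools follow the same dual-write pattern: write the artifact to its .build/ location first, so a later collect still finds it, then mirror it into the docs tree right away. A hedged sketch of that pattern; `publish_artifact` is a hypothetical helper, not a pipeio function.

```python
import shutil
from pathlib import Path


def publish_artifact(content: bytes, build_path: Path, docs_path: Path) -> tuple[Path, Path]:
    """Hypothetical helper: write to .build/ first, then mirror into docs/."""
    # .build/ stays the source of truth for a later collect run.
    build_path.parent.mkdir(parents=True, exist_ok=True)
    build_path.write_bytes(content)
    # Immediate copy so the artifact shows up in the docs tree right away.
    docs_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(build_path, docs_path)
    return build_path, docs_path
```

If the collector's staleness check compares mtimes, `shutil.copy2` (which copies the source mtime along with the data) should let a later collect skip the redundant copy.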

Sphinx-inspired improvements — scope decision

| Improvement | Decision | Reasoning |
|-------------|----------|-----------|
| overview.md fix | Shipped | Bug fix, required |
| Two-phase export/collect | Shipped | Eliminates duplicated generation in collectors |
| .build/ convention | Shipped | Standard per-flow build directory for exported artifacts |
| Collector decomposition | Shipped | Separation of concerns, testability |
| docs.yml manifest | Deferred | Auto-discovery works well enough |
| Include directives | Deferred | Pandoc's include filter handles this for manuscripts |
| Explicit ordering | Deferred | Alphabetical + index-first is sufficient |
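The "alphabetical + index-first" rule that makes an explicit ordering manifest unnecessary fits in a single sort key. A small sketch, assuming pages are ordered by filename (`page_order` is a hypothetical name):

```python
def page_order(names: list[str]) -> list[str]:
    """index.md first, then every other page alphabetically."""
    # False sorts before True, so index.md (key False) comes first.
    return sorted(names, key=lambda n: (n != "index.md", n))


print(page_order(["usage.md", "index.md", "api.md"]))
# ['index.md', 'api.md', 'usage.md']
```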

Files changed

  • packages/pipeio/src/pipeio/docs.py: two-phase architecture, export functions, pure collectors
  • packages/pipeio/src/pipeio/cli.py: --no-export flag
  • packages/pipeio/src/pipeio/mcp.py: mcp_dag_export and mcp_nb_publish write to .build/
  • packages/pipeio/tests/test_docs.py: 58 tests (32 existing + 26 new)
  • src/projio/mcp/pipeio.py: MCP tool docstring updated

Verification

  • pixecog preprocess_ieeg: overview.md as index, 4 mods, 2 notebooks exported to HTML in .build/, then collected
  • pixecog preprocess_ecephys: overview.md as index, 9 mods, 1 notebook exported to HTML in .build/, then collected
  • 70 total files collected across both flows
  • 58/58 pipeio tests pass