docs_collect: overview.md fix + two-phase export/collect architecture
Bug fixes (shipped)
1. overview.md fallback: fixed
Root cause: `overview.md` was unconditionally mapped to `target/index.md`, regardless of whether a source `index.md` existed. When both existed, `index.md` was copied first (alphabetical sort order). `overview.md` then targeted the same destination, but `_is_stale()` returned False because the target's mtime was newer, so the overview content was silently dropped.
Fix: Added a `has_source_index` check in `DocsCollector`. `overview.md` is mapped to `index.md` only when no source `index.md` exists. When both exist, `overview.md` is collected as a separate page alongside the index.
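The mapping rule described above can be sketched as follows. This is a minimal illustration, not the actual `DocsCollector` code: the function name `map_doc_targets` and its flat filename-list input are assumptions made for the example; only `has_source_index` comes from the fix itself.

```python
def map_doc_targets(source_names: list[str]) -> dict[str, str]:
    """Map source doc filenames to target filenames.

    Hypothetical helper: the real logic lives inside DocsCollector.
    """
    has_source_index = "index.md" in source_names
    targets: dict[str, str] = {}
    for name in sorted(source_names):  # alphabetical, as in the collector
        if name == "overview.md" and not has_source_index:
            # No source index.md: overview.md stands in as the index page.
            targets[name] = "index.md"
        else:
            # Every other file keeps its own name, so no two sources
            # ever compete for the same destination.
            targets[name] = name
    return targets
```

With this rule, `overview.md` and `index.md` never map to the same target, so the outcome no longer depends on the staleness check.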
2. Source-tree immutability: confirmed safe
The code never writes to `code/pipelines/{flow}/docs/`. All output goes to `docs/pipelines/{flow}/` (the output directory). By design, the source tree is treated as read-only during collection.
Two-phase architecture (shipped)
Refactored docs_collect() from a monolithic function into a two-phase
export-then-collect pipeline, following Sphinx-inspired principles from
idea note idea-arash-20260409-135130-379286.
Phase 1: Export (component-owned artifact generation)
Export functions generate artifacts into {flow}/.build/:
- `export_dag(ctx)`: snakemake rulegraph + graphviz → `.build/dag.svg`
- `export_notebooks(ctx)`: nbconvert HTML/MyST → `.build/notebooks/`
Each component owns its generation logic. No generation happens in collectors.
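As a rough sketch of the export convention, an exporter renders its artifact and writes it under `{flow}/.build/`, leaving nothing for collectors to generate. The helper `export_artifact` and its `render` callback are hypothetical; the real `export_dag` and `export_notebooks` call snakemake/graphviz and nbconvert respectively, which are replaced here by a stand-in renderer.

```python
import tempfile
from pathlib import Path
from typing import Callable


def export_artifact(flow_dir: Path, rel_path: str, render: Callable[[], bytes]) -> Path:
    """Write one generated artifact into {flow}/.build/ (hypothetical helper).

    `render` stands in for the component-owned generation step.
    """
    out = flow_dir / ".build" / rel_path
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_bytes(render())
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        flow = Path(tmp) / "preprocess_ieeg"
        # Stand-in renderer; a real exporter would produce the SVG itself.
        svg = export_artifact(flow, "dag.svg", lambda: b"<svg/>")
        print(svg.relative_to(flow))
```

Keeping the write location in one place means a collector only needs to know the `.build/` path, not how the artifact was produced.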
Phase 2: Collect (pure file copiers)
6 collectors copy pre-built artifacts to docs/pipelines/{flow}/:
| # | Collector | Source | Gate |
|---|---|---|---|
| 1 | DocsCollector | `{flow}/docs/` | always |
| 2 | NotebookCollector | `{flow}/.build/notebooks/` | `.build/` exists |
| 3 | DagCollector | `{flow}/.build/dag.svg` or `{flow}/dag.svg` | `.build/` or `publish.dag` |
| 4 | ReportCollector | `derivatives/{flow}/report.html` | `publish.report` |
| 5 | ScriptsCollector | `{flow}/scripts/` | `publish.scripts` |
| 6 | IndexCollector | generates stub | always (last) |
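The gated, ordered execution in the table can be sketched as below. This is an illustration under assumptions: the `Collector` dataclass, its `gate` and `collect` callables, and `run_collectors` are invented for the example and do not reflect the actual class interfaces in `docs.py`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Collector:
    """Hypothetical stand-in for one row of the collector table."""
    name: str
    gate: Callable[[], bool]          # decides whether this collector runs
    collect: Callable[[], list[str]]  # returns the files it copied


def run_collectors(collectors: list[Collector]) -> dict[str, list[str]]:
    """Run collectors in list order, skipping any whose gate is closed."""
    results: dict[str, list[str]] = {}
    for collector in collectors:
        if collector.gate():
            results[collector.name] = collector.collect()
    return results


if __name__ == "__main__":
    pipeline = [
        Collector("docs", lambda: True, lambda: ["index.md"]),
        Collector("report", lambda: False, lambda: ["report.html"]),  # publish.report off
        Collector("index", lambda: True, lambda: []),  # always runs last
    ]
    print(list(run_collectors(pipeline)))
```

Placing the index collector last in the list lets it see what the earlier collectors produced before it decides whether a stub is needed.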
docs_collect(root, *, export=True)
- `export=True` (default): runs both phases, giving a one-command UX
- `export=False`: collect-only mode for when exports were run separately
- CLI: `pipeio docs collect --no-export`
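The relationship between the `export` keyword and the `--no-export` flag can be sketched like this. It assumes an argparse-based CLI, which the note does not confirm, and the names `docs_collect_sketch`, `run_export`, `run_collect`, and `build_parser` are hypothetical; the two phases are injected as callables so the control flow is visible on its own.

```python
import argparse
from typing import Callable


def docs_collect_sketch(
    root: str,
    *,
    export: bool = True,
    run_export: Callable[[str], None],
    run_collect: Callable[[str], list[str]],
) -> list[str]:
    """Two-phase driver: optional export, then collect (illustrative only)."""
    if export:
        run_export(root)      # phase 1: generate artifacts into .build/
    return run_collect(root)  # phase 2: copy pre-built artifacts


def build_parser() -> argparse.ArgumentParser:
    """Assumed CLI wiring: --no-export flips export from True to False."""
    parser = argparse.ArgumentParser(prog="pipeio docs collect")
    parser.add_argument(
        "--no-export",
        dest="export",
        action="store_false",
        help="skip phase 1 and only collect existing artifacts",
    )
    return parser
```

Using `action="store_false"` keeps the default at `True`, so the one-command behaviour is preserved when the flag is omitted.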
MCP tools updated
- `mcp_dag_export`: writes SVG to both `.build/dag.svg` and `docs/pipelines/` (immediate visibility)
- `mcp_nb_publish`: writes to both `.build/notebooks/` and `docs/pipelines/notebooks/`
Sphinx-inspired improvements: scope decision
| Improvement | Decision | Reasoning |
|---|---|---|
| overview.md fix | Shipped | Bug fix, required |
| Two-phase export/collect | Shipped | Eliminates duplicated generation in collectors |
| `.build/` convention | Shipped | Standard per-flow build directory for exported artifacts |
| Collector decomposition | Shipped | Separation of concerns, testability |
| `docs.yml` manifest | Defer | Auto-discovery works well enough |
| Include directives | Defer | Pandoc's include filter handles this for manuscripts |
| Explicit ordering | Defer | Alphabetical + index-first is sufficient |
Files changed
- `packages/pipeio/src/pipeio/docs.py`: two-phase architecture, export functions, pure collectors
- `packages/pipeio/src/pipeio/cli.py`: `--no-export` flag
- `packages/pipeio/src/pipeio/mcp.py`: `mcp_dag_export` and `mcp_nb_publish` write to `.build/`
- `packages/pipeio/tests/test_docs.py`: 58 tests (32 existing + 26 new)
- `src/projio/mcp/pipeio.py`: MCP tool docstring updated
Verification
- pixecog `preprocess_ieeg`: overview.md as index, 4 mods, 2 notebook HTML exported to `.build/` then collected
- pixecog `preprocess_ecephys`: overview.md as index, 9 mods, 1 notebook HTML exported to `.build/` then collected
- 70 total files collected across both flows
- 58/58 pipeio tests pass