pipeio docs_collect: adopt Sphinx-inspired explicit manifest and source-only convention

Overview

Redesign pipeio docs_collect to follow Sphinx-inspired principles: explicit page manifests, source-tree immutability, and composable includes. Currently, docs_collect relies on implicit, convention-based discovery that silently drops any file outside its conventions (e.g., overview.md and top-level cross-flow docs).

Lessons from Sphinx

Explicit toctree, not implicit collection. Sphinx requires you to declare pages in toctree directives. docs_collect auto-discovers mod docs and notebooks but silently ignores anything outside its conventions. A flow-level docs.yml manifest (analogous to notebook.yml) would let flows declare their pages explicitly while still auto-discovering mods/notebooks.
Index vs content separation. Sphinx's index.rst is a navigational page (toctree links), not a content page. docs_collect conflates these by using overview.md as the index. Keeping them separate allows both a navigational index and a content overview.
Glob with explicit ordering. Sphinx toctree supports :glob: for auto-discovery but allows explicit entries first for ordering control. docs_collect could keep auto-discovering mod docs and notebooks while letting a docs_order.yml (or similar) file control page order and pull in hand-authored pages.
Source tree is read-only during build. Sphinx writes only to _build/, never back to the source tree. docs_collect currently regenerates stub index.md files in the source code/pipelines/{flow}/docs/ directory when they are missing. Generated output should go only to docs/pipelines/; the source tree under code/pipelines/ should be hand-authored only.
Include directives for composability. Sphinx .. include:: lets you compose pages from fragments. If docs_collect supported an {include: overview.md} directive in index templates, you could compose pages without duplicating content.
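The include lesson could work roughly as follows. This is a minimal Python sketch, assuming the {include: path} syntax proposed above occupies a whole line and that paths resolve relative to the including file; expand_includes is a hypothetical helper, not an existing pipeio function.

```python
import re
from pathlib import Path

# Hypothetical directive syntax taken from the proposal: a line consisting only of
# "{include: <relative path>}". This is NOT an existing pipeio feature.
INCLUDE_RE = re.compile(r"^\{include:\s*(?P<path>[^}]+?)\s*\}$")


def expand_includes(text: str, base_dir: Path, _seen: frozenset = frozenset()) -> str:
    """Replace include lines with the referenced file's contents, recursively.

    Paths resolve relative to base_dir; an include cycle raises ValueError.
    """
    out_lines = []
    for line in text.splitlines():
        match = INCLUDE_RE.match(line.strip())
        if match is None:
            # Ordinary content line: keep it unchanged.
            out_lines.append(line)
            continue
        target = (base_dir / match.group("path")).resolve()
        if target in _seen:
            raise ValueError(f"include cycle at {target}")
        fragment = target.read_text(encoding="utf-8")
        # Nested includes resolve relative to the included file's own directory.
        out_lines.append(expand_includes(fragment, target.parent, _seen | {target}))
    return "\n".join(out_lines)
```

Rejecting cycles keeps a mistaken mutual include (a.md including b.md including a.md) from recursing until Python's recursion limit is hit.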

Proposed design

Add an optional per-flow docs.yml (sibling to notebook.yml) that lists pages, their order, and includes
When absent, fall back to current auto-discovery (mods, scripts, notebooks)
When present, merge auto-discovered content with explicit entries
Never write to code/pipelines/{flow}/docs/ during collection — only to docs/pipelines/
Support top-level code/pipelines/*.md as ecosystem-level docs (architecture, cross-flow DAG)
Convention: overview.md in the source tree supplies the flow's index content; index.md is auto-generated navigation only
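The merge and write rules above can be sketched like this. It is illustrative Python under stated assumptions: order_pages and check_output_path are hypothetical names, the docs.yml schema is not yet defined so the explicit entries arrive as a plain list (empty when the manifest is absent, which reduces to pure auto-discovery), and the two roots stand for the code/pipelines/ source tree and the docs/pipelines/ output tree named above.

```python
from pathlib import Path


def order_pages(explicit: list[str], discovered: list[str]) -> list[str]:
    """Hypothetical merge rule: explicit manifest entries first, in declared order,
    then any auto-discovered pages not already listed, sorted for stable output."""
    ordered: list[str] = []
    seen: set[str] = set()
    for page in explicit:
        if page not in seen:  # drop duplicate manifest entries
            ordered.append(page)
            seen.add(page)
    ordered.extend(sorted(set(discovered) - seen))
    return ordered


def check_output_path(target: Path, source_root: Path, output_root: Path) -> Path:
    """Hypothetical write guard: never write into the hand-authored source tree,
    and only write inside the generated output tree."""
    target = target.resolve()
    if target.is_relative_to(source_root.resolve()):  # Path.is_relative_to: Python 3.9+
        raise PermissionError(f"refusing to write into source tree: {target}")
    if not target.is_relative_to(output_root.resolve()):
        raise PermissionError(f"target outside output root: {target}")
    return target
```

Sorting the leftover auto-discovered pages keeps output stable between runs, roughly mirroring the explicit-entries-then-:glob: pattern described above.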

Source context: pixecog

PixEcog (pixecog): Neuropixels and ECoG dataset and analysis

Recent commits:

8dc0d9d Pipeline docs: gitignore docs/pipelines/, relocate hand-authored files
96cd1ec Refactor sharpwaveripple/contracts: extract generic helpers to utils/io, remove pipelines __init__.py
36f9326 Add result note directory and sample note

README:

type: readme

Quick Start for Collaborators

Follow this checklist to get started with Pixecog documentation and workflows.

🐀 Pixecog Project — Compact Overview

Core principles

One immutable BIDS raw dataset (raw/) as the canonical baseline
Each analysis pipeline ha

idea-arash-20260407-171834-423514.md — Directly related — configurable docs paths for subsystem-owned docs is a prerequisite concern for any explicit manifest approach in docs_collect
idea-arash-20260330-162752-883803.md — pipeio mod_context and notebook metadata tools are the read-side counterpart to docs_collect's write-side conventions — both hinge on how docs are discovered and structured
deep-research-pipeio-scope.md — Broad pipeio scope research likely covers docs tooling design decisions relevant to a docs_collect redesign
idea-arash-20260330-174518-164647.md — pipeio v2 roadmap sets the design direction that a Sphinx-inspired docs_collect overhaul would need to align with