
pipeio docs_collect: adopt Sphinx-inspired explicit manifest and source-only convention

Overview

Redesign pipeio docs_collect to follow Sphinx-inspired principles: explicit page manifests, source-tree immutability, and composable includes. Currently docs_collect uses implicit convention-based discovery that silently drops files it doesn't know about (e.g., overview.md, top-level cross-flow docs).

Lessons from Sphinx

  1. Explicit toctree, not implicit collection. Sphinx requires you to declare pages in toctree directives. docs_collect auto-discovers mod docs and notebooks but silently ignores anything outside its conventions. A flow-level docs.yml manifest (analogous to notebook.yml) would let flows declare their pages explicitly while still auto-discovering mods/notebooks.

  2. Index vs content separation. Sphinx's index.rst is a navigational page (toctree links), not a content page. docs_collect conflates these by using overview.md as the index. Keeping them separate allows both a navigational index and a content overview.

  3. Glob with explicit ordering. Sphinx toctree supports :glob: for auto-discovery but lets explicit entries come first for ordering control. docs_collect could keep auto-discovering mod docs and notebooks while letting a manifest (docs_order.yml or similar) control page ordering and pull in hand-authored pages.

  4. Source tree is read-only during build. Sphinx writes only to _build/, never back to the source tree. docs_collect currently regenerates stub index.md files in the source directory code/pipelines/{flow}/docs/ when they are missing. Generated output should go only to docs/pipelines/; everything under code/pipelines/ should be hand-authored.

  5. Include directives for composability. Sphinx's .. include:: directive lets you compose pages from fragments. If docs_collect supported an {include: overview.md} directive in index templates, pages could be composed without duplicating content.
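
As a rough illustration of lesson 3, the sketch below merges explicit manifest entries with auto-discovered pages. It is not pipeio code: the function name order_pages, the glob handling, and the example page paths are all assumptions made for illustration. Explicit entries come first in declared order, glob entries expand in sorted order, and anything discovered but not listed is appended rather than silently dropped.

```python
from fnmatch import fnmatchcase

GLOB_CHARS = "*?["  # an entry containing any of these is treated as a glob pattern


def order_pages(explicit, discovered):
    """Return pages with explicit entries first, then leftover discovered pages.

    `explicit` is the ordered list from a (hypothetical) manifest; entries may be
    plain page names or glob patterns. `discovered` is whatever auto-discovery found.
    """
    ordered, seen = [], set()

    def add(page):
        # Keep the first occurrence only, so a page is never listed twice.
        if page not in seen:
            seen.add(page)
            ordered.append(page)

    for entry in explicit:
        if any(ch in entry for ch in GLOB_CHARS):
            # Glob entry: expand against discovered pages, in sorted order.
            for page in sorted(p for p in discovered if fnmatchcase(p, entry)):
                add(page)
        else:
            # Plain entry: a hand-authored page, kept even if discovery missed it.
            add(entry)

    # Anything discovered but not mentioned is appended, not dropped.
    for page in sorted(discovered):
        add(page)
    return ordered


pages = order_pages(
    explicit=["overview.md", "mods/*.md"],
    discovered=["notebooks/explore.md", "mods/filter.md", "mods/detect.md"],
)
print(pages)
```

With an empty explicit list this degrades to plain sorted auto-discovery, which matches the idea of a manifest being optional.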

Proposed design

  • Add optional docs.yml per flow (sibling to notebook.yml) listing pages, ordering, and includes
  • When absent, fall back to current auto-discovery (mods, scripts, notebooks)
  • When present, merge auto-discovered content with explicit entries
  • Never write to code/pipelines/{flow}/docs/ during collection — only to docs/pipelines/
  • Support top-level code/pipelines/*.md as ecosystem-level docs (architecture, cross-flow DAG)
  • Convention: overview.md in source → becomes flow index content; index.md is auto-generated navigation only
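
Two of the points above, include expansion and the rule against writing into the source tree, can be sketched as small helpers. This is a minimal sketch under stated assumptions, not the pipeio implementation: the names expand_includes and output_path, the single-level (non-recursive) include semantics, and the per-flow subdirectory under docs/pipelines/ are all hypothetical.

```python
import re
from pathlib import Path

# Matches the {include: name} form mentioned in the issue; exact syntax is an assumption.
INCLUDE_RE = re.compile(r"\{include:\s*([^}\s]+)\s*\}")


def expand_includes(text, sources):
    """Replace each {include: name} with the text of the named source page.

    `sources` maps page names to their contents. Expansion is one level deep;
    includes inside included text are left untouched.
    """
    def _sub(match):
        name = match.group(1)
        if name not in sources:
            raise KeyError(f"include target not found: {name}")
        return sources[name].rstrip("\n")

    return INCLUDE_RE.sub(_sub, text)


def output_path(out_root, source_root, flow, page):
    """Map a flow page to its destination, refusing any path inside the source tree."""
    dest = (Path(out_root) / flow / page).resolve()
    src = Path(source_root).resolve()
    if dest == src or src in dest.parents:
        raise ValueError(f"refusing to write into source tree: {dest}")
    return dest


template = "# Flow\n\n{include: overview.md}\n"
rendered = expand_includes(template, {"overview.md": "Body text.\n"})
dest = output_path("docs/pipelines", "code/pipelines", "some_flow", "index.md")
```

Keeping the guard in the path-mapping step means every writer goes through the same check, so a stub index.md can never be regenerated under code/pipelines/ by accident.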

Source context: pixecog

PixEcog (pixecog): Neuropixels and ECoG dataset and analysis

Recent commits:

8dc0d9d Pipeline docs: gitignore docs/pipelines/, relocate hand-authored files
96cd1ec Refactor sharpwaveripple/contracts: extract generic helpers to utils/io, remove pipelines __init__.py
36f9326 Add result note directory and sample note

README:

Quick Start for Collaborators

Follow this checklist to get started with Pixecog documentation and workflows.

🐀 Pixecog Project — Compact Overview

Core principles

  • One immutable BIDS raw dataset (raw/) as the canonical baseline
  • Each analysis pipeline ha
  • idea-arash-20260407-171834-423514.md: directly related; configurable docs paths for subsystem-owned docs are a prerequisite for any explicit-manifest approach in docs_collect
  • idea-arash-20260330-162752-883803.md: pipeio mod_context and notebook metadata tools are the read-side counterpart to docs_collect's write-side conventions; both hinge on how docs are discovered and structured
  • deep-research-pipeio-scope.md: broad pipeio scope research that likely covers docs tooling design decisions relevant to a docs_collect redesign
  • idea-arash-20260330-174518-164647.md: the pipeio v2 roadmap sets the design direction that a Sphinx-inspired docs_collect overhaul would need to align with