
References

Inspirations (handbook-wide)

The handbook owes its shape to several solo-author works and traditions. Primary inspirations:

  • goodresearch.dev — Patrick Mineault. The canonical solo-author research-workflow handbook; this handbook's most direct influence.
  • xcorr.net — Patrick Mineault (blog companion to goodresearch.dev).
  • cartesian.app — Elias Yilma (interactive DSA handbook; explorable-essay pattern).

Additional inspirations (organised by tradition):

  • Solo handbooks: Jenny Bryan (Happy Git with R); Hadley Wickham (R Packages, R for Data Science); Andrej Karpathy ("A Recipe for Training Neural Networks"); Stas Bekman (ml-engineering); Google (Deep Learning Tuning Playbook); Vince Buffalo (Bioinformatics Data Skills); The Turing Way.
  • Note-to-blog essayists: Simon Willison; Julia Evans; Lilian Weng; Jay Alammar; Chris Olah; Andy Matuschak; Eugene Yan; Chip Huyen; Maggie Appleton; Dan Luu.
  • Interactive / explorable: Bartosz Ciechanowski; Amit Patel (Red Blob Games); Nicky Case; Bret Victor; Distill.pub; Setosa.io; Seeing Theory; Immersive Linear Algebra.
  • Neuroscience-specific: Mike X Cohen (Analyzing Neural Time Series Data); Russell Poldrack (Statistical Thinking for the 21st Century); Neuromatch Academy.

Source documents

Per-chapter further reading

§00 Framing

Why this stack

  • goodresearch.dev (Patrick Mineault) — the closest companion handbook: solo-author research workflows from question to figure.
  • The Turing Way — community handbook for reproducible, ethical, collaborative research; especially the Reproducible Research guide.
  • ml-engineering (Stas Bekman) — large-scale engineering handbook for ML practitioners; model for the opinionated practitioner-guide format.

Why interactivity

Single-author fragility

  • The Turing Way — patterns for transitioning from solo to collaborative research practice; bus-factor and FAIR principles.
  • goodresearch.dev (Patrick Mineault) — team and continuity considerations alongside the solo-author workflow.
  • ml-engineering (Stas Bekman) — large single-author effort that grew into a community resource; case study in sustainability.

§10 BIDS

Strict raw root

  • BIDS specification — canonical source for all entity names, sidecar requirements, and dataset_description.json fields.
  • bids-validator — run bids-validator raw/ to catch layout violations; JavaScript and Python variants.
  • PyBIDS — Python library for querying BIDS datasets; complement to snakebids for non-Snakemake code.
  • MNE-BIDS — BIDS-aware I/O for electrophysiology; handles sidecar creation from raw EEG/iEEG recordings.
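The entity naming that the BIDS specification defines can be illustrated with a small parser. This is a minimal sketch, assuming the common filename shape of underscore-separated key-value entities followed by a suffix and an extension. The function name parse_bids_name is hypothetical, and the sketch is no substitute for bids-validator or PyBIDS, which know the full rule set.

```python
import re

# Each entity is a key and a label joined by a hyphen, e.g. "sub-01".
_ENTITY = re.compile(r"^([a-zA-Z0-9]+)-([a-zA-Z0-9]+)$")


def parse_bids_name(filename: str) -> dict:
    """Split a BIDS-style filename into entities, suffix, and extension.

    Hypothetical helper for illustration only; it checks the general
    shape of the name, not whether the entities are valid BIDS entities.
    """
    # Everything after the first dot is the extension (".nii.gz" stays whole).
    stem, dot, ext = filename.partition(".")
    # The last underscore-delimited token is the suffix (e.g. "eeg", "T1w").
    *pairs, suffix = stem.split("_")
    entities = {}
    for pair in pairs:
        match = _ENTITY.match(pair)
        if match is None:
            raise ValueError(f"not a key-value entity: {pair!r}")
        entities[match.group(1)] = match.group(2)
    return {"entities": entities, "suffix": suffix, "extension": dot + ext}


parsed = parse_bids_name("sub-01_ses-02_task-rest_eeg.edf")
print(parsed)
```

A name that breaks the key-value pattern raises ValueError, which makes the sketch usable as a quick sanity check before running the real validator.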

Derivatives and manifest

  • BIDS derivatives specification — formal rules for derivative dataset layout, dataset_description.json in derivatives/, and GeneratedBy provenance fields.
  • PyBIDS derivatives — BIDSLayout(derivatives=True) for querying processed outputs alongside raw.
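As a rough illustration of the dataset_description.json and GeneratedBy fields mentioned above, the following sketch builds a minimal description for a derivative dataset. The dataset name, pipeline name, version strings, and the helper derivative_description are all hypothetical placeholders; the derivatives specification remains the authority on which fields are required.

```python
import json


def derivative_description(name: str, pipeline: str, version: str) -> dict:
    """Build a minimal description for a derivative dataset (illustrative)."""
    return {
        "Name": name,
        # Assumption: set this to the BIDS version the dataset actually targets.
        "BIDSVersion": "1.8.0",
        # Marks the dataset as processed output rather than raw data.
        "DatasetType": "derivative",
        # Provenance: one entry per pipeline that produced the outputs.
        "GeneratedBy": [{"Name": pipeline, "Version": version}],
    }


# Hypothetical names, for illustration only.
description = derivative_description("example-preproc", "example-pipeline", "0.1.0")
text = json.dumps(description, indent=2)
print(text)
```

Writing the result to derivatives/<pipeline>/dataset_description.json is left out so the sketch has no side effects.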

BIDS beyond electrophysiology

  • BIDS Extension Proposals — active proposals extending BIDS to video (BEP 024), microscopy, MEG, and other modalities.
  • BIDS starter kit — annotated examples and templates for adopting BIDS in a new modality.

§20 DataLad

Superdataset and subdatasets

  • DataLad handbook — comprehensive reference covering datalad install, nested datasets, provenance recording, and the YODA principles.
  • git-annex — underlying binary-tracking layer; useful when DataLad's abstraction is insufficient or when working with non-DataLad repositories.

Siblings and RIA

Code as subdataset

  • DataLad handbook §YODA principles — the layout principle that keeps code pinned at a commit inside the superdataset; rationale and workflow.
  • DataLad run — datalad run records a command's provenance; the complement to pinning code versions.

§30 Snakemake

Rules and the DAG

  • Snakemake documentation — reference for rule syntax, input/output, run, shell, and script directives; cluster execution profiles.
  • Mölder et al. 2021 — "Sustainable data analysis with Snakemake," F1000Research; cite this when describing the pipeline engine in a methods section.
  • Snakemake tutorial — hands-on walkthrough; fastest path from zero to a running first rule.

Snakebids wildcards

Config-driven pipelines

  • Snakemake §Configuration — configfile:, the config dict, and profile-based configuration for reproducible parameter sweeps.
  • snakebids documentation — how snakebids config extends Snakemake's own config with BIDS-aware input specifications.

Three idioms


§40 Marimo

Reactive cells

  • Marimo documentation — installation, the reactive execution model, UI element API (mo.ui.*), and the .py file format.
  • Marimo GitHub — source and issues; the blog posts in the repository explain core design decisions.

Analysis notebooks

  • xarray — DataArray, Dataset, .sel()/.isel() coordinate selection, and groupby operations on labelled N-D arrays.
  • HoloViews — declarative multi-dimensional plotting; the .hvplot accessor (from the companion hvPlot package) that bridges xarray and interactive Bokeh/Panel renderers.
  • MNE-Python — EEG/iEEG processing; read_raw_*, epochs, and time-frequency representations.

Handbook explorables

  • Marimo §WASM export — marimo export html-wasm; bundle size limits, supported PyPI packages, and embedding options.
  • Pyodide — Python in WebAssembly; the runtime that powers Marimo's browser-side execution.

§50 Publication

MkDocs for the site

  • Material for MkDocs — theme reference; navigation, admonitions, search, social cards, and the full plugin list.
  • MkDocs documentation — nav: structure, mkdocs.yml, custom hooks, and deployment to GitHub Pages.

Quarto for deliverables

  • Quarto documentation — formats (html, pdf, revealjs, docx), YAML front-matter, _quarto.yml project files, and the include shortcode.
  • Quarto revealjs guide — slide transitions, incremental lists, fragment animations, and code-block highlighting options.

Two surfaces, one cross-link protocol

  • Material for MkDocs — cross-page links and the mkdocs-ezlinks plugin that resolves bare filenames.
  • Quarto projects — _quarto.yml and {{< include >}} for cross-document transclusion within a Quarto project.

§60 Projio

Stack-aware layer

  • Model Context Protocol specification — the JSON-RPC wire format that projio's MCP server implements; tool and resource schemas.
  • FastMCP — Python library used to register projio's MCP tools; decorator-based tool definition.
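To make the "JSON-RPC wire format" concrete, here is a minimal sketch of what a tool invocation looks like as a message. It assumes the tools/call method and the name/arguments parameter layout from the Model Context Protocol specification. The tool name note_search and its arguments are hypothetical and are not taken from projio.

```python
import json
from itertools import count

# JSON-RPC requests carry an id so responses can be matched to them.
_request_ids = count(1)


def tool_call_request(tool_name: str, arguments: dict) -> str:
    """Serialise a JSON-RPC 2.0 request that asks a server to run a tool."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments, for illustration only.
request = tool_call_request("note_search", {"query": "bids"})
print(request)
```

In practice a client library or FastMCP handles this framing; the sketch only shows what travels over the wire.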

Notio

Pipeio

Biblio and indexio

  • Docling — PDF text extraction library; table, figure, and structured reference extraction.
  • GROBID — ML tool for structured reference and header extraction from PDFs; powers biblio_grobid.
  • OpenAlex API — open scholarly metadata API; powers DOI resolution and citation-graph expansion in biblio.

Figio and manuscript

  • Pandoc user manual — --citeproc, --bibliography, Lua filter interface, and all output format options.
  • Citation Style Language — CSL spec; the APA, IEEE, Chicago, and Vancouver styles bundled by projio are drawn from this repository.
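The Pandoc flags and CSL styles listed above combine into a single command line. The sketch below only assembles that command, assuming Pandoc's --citeproc, --bibliography, --csl, and -o options. The file names and the helper pandoc_command are hypothetical, and this is not necessarily how projio invokes Pandoc.

```python
import shlex


def pandoc_command(source: str, output: str, bibliography: str, csl: str) -> list:
    """Assemble a Pandoc invocation that resolves citations with a CSL style."""
    return [
        "pandoc",
        source,
        "--citeproc",                      # resolve citations and build the reference list
        f"--bibliography={bibliography}",  # citation database, e.g. a .bib file
        f"--csl={csl}",                    # citation style definition
        "-o",
        output,
    ]


# Hypothetical file names, for illustration only.
command = pandoc_command("manuscript.md", "manuscript.docx", "refs.bib", "apa.csl")
print(shlex.join(command))
```

Passing the list to subprocess.run(command, check=True) would execute it where Pandoc is installed; the sketch stops at building the arguments so it runs anywhere.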

Codio

  • uv — fast Python package manager; uv tool install --editable is used to share editable core libraries across environments without per-project installs.

§70 Agentic workflows

Claude Code and MCP

  • Claude Code documentation — installation, .mcp.json configuration, CLAUDE.md memory hierarchy, and the tool-permission model.
  • Model Context Protocol — the JSON-RPC wire format; reference for writing a new MCP server from scratch.

Permissions and bounded context

Skills

Captures, tasks, and queues

  • Claude Code documentation — the execution model underpinning execute_task() and run_prompt(); session and subagent lifecycle.
  • Anthropic model overview — haiku / sonnet / opus capability tiers; the basis for model selection in dispatch calls.

§80 Orchestration

Worklog overview

  • Claude Code §Sub-agents — the subagent model that worklog's queue taps into; how sessions are isolated and parallelised.
  • Anthropic model overview — the model-tier ladder (haiku → sonnet → opus) that worklog uses for cost-sensitive dispatch.

Goals and critical path

Cross-project dispatch

  • Anthropic model overview — haiku / sonnet / opus capability tiers; guidance for matching model to task complexity in dispatch calls.
  • Claude Code §Sub-agents — how the Agent(...) tool spawns isolated subagent contexts; the mechanism run_prompt() drives.

§90 Future directions

Agent hierarchies

  • Claude Code §Sub-agents — the current Agent(subagent_type=...) primitive that two-tier hierarchies are built on today.
  • Model Context Protocol — the shared communication layer that makes tool-access portable across agent tiers.

Live agent communication

  • Claude Code documentation — the current session model; context for understanding what a "persistent multi-agent session" would extend.

§99 Honest gaps

Honest gaps