Architecture And Scope

This document defines the intended architectural boundary of cogpy: what the package should own, what it should leave to external projects, and what technical conventions make the library composable over time.

cogpy is primarily a reusable compute + IO toolkit for ECoG / iEEG analysis. It should stay maintainable by keeping core algorithms file-agnostic, keeping file-format concerns in cogpy.io, and keeping full workflow orchestration outside the package whenever possible.

Package Goals

cogpy should provide:

  1. Reusable in-memory compute primitives for preprocessing, spectral analysis, measures, and event detection.

  2. Structured IO helpers for translating between file formats and the package’s internal xarray-centered representations.

  3. Internal conventions stable enough that external pipelines and frontends can compose cogpy outputs without writing per-project glue code at every step.

  4. Backend-facing utilities for notebooks and visualization frontends (e.g. TensorScope, a separate React + TypeScript application).

What cogpy Is And Is Not

cogpy is:

  • a library of reusable computational building blocks

  • an IO layer for loading, coercing, validating, and saving structured data

  • a place to define stable internal conventions for common signal and output shapes

  • a backend that external pipelines and frontends can call

cogpy is not:

  • the owner of full project orchestration

  • a replacement for project-level Snakemake DAGs, derivative registries, scheduling, or publication workflows

  • a GUI application or frontend framework

  • a guarantee that every schema in the package is already mature

The intended split is:

  • cogpy.* (top-level subpackages) = reusable in-memory compute primitives

  • cogpy.io = load/save, sidecars, and file-format translation

  • external projects such as PixECoG = orchestration, dataset-specific configs, large-scale execution, and derivative layout

The repository may still contain packaged workflows or CLI wrappers for convenience, testing, and examples. Those should remain thin composition layers around stable library APIs, not grow into a second workflow engine inside cogpy.

Design Principles

  • Keep core processing code file-agnostic and testable.

  • Put file-format and sidecar logic in cogpy.io.

  • Prefer stable data conventions over ad hoc array shapes.

  • Make outputs structured enough for downstream pipelines and frontends.

  • Allow lightweight composition helpers, but keep orchestration minimal.

  • Be honest about evolving areas instead of freezing premature abstractions.

Layering And Composition Philosophy

The intended layering is:

  • cogpy.io: read files, translate to internal representations, validate or coerce schemas, save results, update sidecars

  • cogpy.* (compute subpackages): transform in-memory arrays, compute measures, detect events, build reusable compute abstractions

  • external pipelines: choose inputs, parameter sets, derivative paths, scheduling, caching, and execution order

  • external frontends: own UI state, interaction design, visualization layout, and application logic

This means external project pipelines should usually be thin orchestrators around stable cogpy APIs. A Snakemake rule or project script should mostly:

  1. load via cogpy.io

  2. call one or more cogpy compute functions or lightweight composition helpers

  3. validate/coerce outputs where needed

  4. save via cogpy.io
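The four steps above can be sketched as one small function that receives its loader, compute function, validator, and saver as arguments. This is only an illustration of the shape of a thin orchestration step, not cogpy's API: run_step and every callable passed to it are hypothetical names, and the stand-ins use an in-memory dict instead of real files so the example runs anywhere.

```python
from typing import Any, Callable, Dict, List


def run_step(
    load: Callable[[str], Any],
    compute: Callable[[Any], Any],
    validate: Callable[[Any], Any],
    save: Callable[[Any, str], None],
    in_path: str,
    out_path: str,
) -> None:
    """Thin orchestration step: it only wires the pieces together."""
    data = load(in_path)        # 1. load (cogpy.io in a real pipeline)
    result = compute(data)      # 2. call a compute function
    result = validate(result)   # 3. validate/coerce the output
    save(result, out_path)      # 4. save (cogpy.io in a real pipeline)


# Hypothetical stand-ins: a dict plays the role of the file system.
store: Dict[str, List[float]] = {"raw.bin": [1.0, 2.0, 3.0]}


def fake_load(path: str) -> List[float]:
    return list(store[path])


def demean(samples: List[float]) -> List[float]:
    mean = sum(samples) / len(samples)
    return [value - mean for value in samples]


def check_nonempty(samples: List[float]) -> List[float]:
    if not samples:
        raise ValueError("compute step returned an empty result")
    return samples


def fake_save(samples: List[float], path: str) -> None:
    store[path] = samples


run_step(fake_load, demean, check_nonempty, fake_save, "raw.bin", "demeaned.bin")
```

Because run_step contains no algorithmic logic, swapping Snakemake for another scheduler only changes who calls it, not what is computed.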

If reusable composition is needed inside cogpy, it should stay small and local: detector pipelines, convenience wrappers, or helper objects are fine. Owning project-scale DAG semantics is not.

Core Data Model And Schema Expectations

Core functions should operate on structured in-memory objects, typically xarray.DataArray and sometimes xarray.Dataset.

The intended direction is an xarray-centered internal model with common dims such as:

  • time

  • channel

  • freq

  • time_win or other window-like dims where the operation is windowed

  • AP / ML for grid-aware spatial layouts

  • ap / ml in some frontend-oriented or orthoslicer-oriented views

Common expectations:

  • dims should be named, not implied by axis position alone

  • key coordinates such as time, freq, channel, AP, and ML should be present when the abstraction requires them

  • sampling rate should be recoverable in a stable way, currently most often via attrs["fs"]

  • metadata should travel with the array when it materially affects downstream interpretation

DataArray is usually the right representation for a single typed tensor such as a signal, spectrogram, or feature map. Dataset is appropriate when a result is naturally a named collection of aligned arrays.
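As a minimal sketch of these expectations, the snippet below builds a synthetic signal with named dims, explicit coordinates, and the sampling rate in attrs["fs"], then collects two aligned per-channel features in a Dataset. The data, channel labels, units, and feature names are made up for illustration; only the xarray calls themselves are real.

```python
import numpy as np
import xarray as xr

fs = 1000.0  # sampling rate in Hz (assumed value for this example)
n_time, n_channel = 2000, 4
rng = np.random.default_rng(0)

# A single typed tensor: DataArray with named dims, coords, and metadata.
signal = xr.DataArray(
    rng.standard_normal((n_time, n_channel)),
    dims=("time", "channel"),
    coords={
        "time": np.arange(n_time) / fs,
        "channel": [f"ch{i}" for i in range(n_channel)],
    },
    attrs={"fs": fs, "units": "uV"},
    name="signal",
)

# Reductions refer to dims by name, not by axis position.
rms = np.sqrt((signal ** 2).mean(dim="time")).rename("rms")
peak = abs(signal).max(dim="time").rename("peak")

# A named collection of aligned arrays: Dataset.
features = xr.Dataset({"rms": rms, "peak": peak})
```

Both features share the channel coordinate, so downstream code can select by channel label without knowing how the arrays were produced.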

The important architectural point is not that every object already has a final frozen schema. It is that common schemas should converge, and boundary code should validate or coerce inputs toward those schemas so independent pieces of the package remain composable.

Today, src/cogpy/datasets/schemas.py already defines several canonical dim orders and validate_* / coerce_* helpers. That is the right direction, but the full schema story is still evolving. In particular, dim naming and case are not yet perfectly uniform across all modules, so new code should favor existing canonical validators rather than inventing new one-off conventions.
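The validate/coerce split can be illustrated with a small sketch. The function names, the canonical order, and the error messages below are hypothetical and are not the signatures in src/cogpy/datasets/schemas.py; the point is only the division of labor: validation rejects anything non-canonical, while coercion repairs what it safely can and then validates.

```python
from typing import Optional

import numpy as np
import xarray as xr

CANONICAL_DIMS = ("time", "channel")  # assumed canonical order for this sketch


def validate_time_channel(da: xr.DataArray) -> xr.DataArray:
    """Strict check: raise unless the array already follows the convention."""
    if not isinstance(da, xr.DataArray):
        raise TypeError("expected an xarray.DataArray")
    if da.dims != CANONICAL_DIMS:
        raise ValueError(f"expected dims {CANONICAL_DIMS}, got {da.dims}")
    if "fs" not in da.attrs:
        raise ValueError("missing sampling rate in attrs['fs']")
    return da


def coerce_time_channel(da: xr.DataArray, fs: Optional[float] = None) -> xr.DataArray:
    """Lenient path: reorder dims and fill in fs if given, then validate."""
    if set(da.dims) != set(CANONICAL_DIMS):
        raise ValueError(f"cannot coerce dims {da.dims} to {CANONICAL_DIMS}")
    out = da.transpose(*CANONICAL_DIMS)
    if "fs" not in out.attrs:
        if fs is None:
            raise ValueError("fs is not in attrs and was not supplied")
        out = out.assign_attrs(fs=float(fs))
    return validate_time_channel(out)


# Boundary code receives a channel-major array without metadata.
raw = xr.DataArray(np.zeros((2, 5)), dims=("channel", "time"))
clean = coerce_time_channel(raw, fs=500)
```

Compute functions can then call the strict validator at their entry point and assume the canonical layout internally.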

Conceptual Layers Inside cogpy

The package should remain conceptually clear even when the exact package layout evolves. The main roles are:

Preprocess / transforms

These operations reshape or clean signals without claiming to be a final scientific measure. Examples include filtering, rereferencing, line-noise handling, resampling, interpolation, and normalization.
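As one example of such a transform, a common average reference can be written as a file-agnostic function over a DataArray. This is a sketch under assumptions: the function name is hypothetical, and cogpy's own rereferencing code may differ in name, options, and handling of bad channels.

```python
import numpy as np
import xarray as xr


def common_average_reference(signal: xr.DataArray) -> xr.DataArray:
    """Subtract the across-channel mean at every time point."""
    if "channel" not in signal.dims:
        raise ValueError("signal must have a 'channel' dim")
    referenced = signal - signal.mean(dim="channel")
    # Copy attrs explicitly so metadata such as fs survives regardless of
    # xarray's keep_attrs setting.
    referenced.attrs = dict(signal.attrs)
    return referenced


rng = np.random.default_rng(0)
raw = xr.DataArray(
    rng.standard_normal((100, 4)) + 5.0,  # shared offset on all channels
    dims=("time", "channel"),
    attrs={"fs": 1000.0},
)
car = common_average_reference(raw)
```

The output has the same dims and metadata as the input, which is what lets transforms be chained without re-deriving the schema at each step.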

Spectral transforms

These convert signals into representations such as PSDs or spectrograms. They are still transforms: their main job is to produce a new representation for later interpretation or downstream computation.
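To make the contract concrete, here is a sketch of a spectral transform that replaces the time dim with a freq dim and keeps the remaining dims and coordinates. It uses a plain one-sided periodogram built on NumPy's FFT, chosen for brevity; it is not a statement about which estimator cogpy uses, and the function name is hypothetical.

```python
import numpy as np
import xarray as xr


def periodogram(signal: xr.DataArray) -> xr.DataArray:
    """One-sided periodogram along 'time', in signal units squared per Hz."""
    fs = float(signal.attrs["fs"])
    sig = signal.transpose("time", ...)  # time first, other dims keep their order
    n = sig.sizes["time"]
    spectrum = np.fft.rfft(sig.values, axis=0)
    power = np.abs(spectrum) ** 2 / (fs * n)
    # Fold negative frequencies into the one-sided estimate: double every bin
    # except DC and, when n is even, the Nyquist bin.
    power[1:] *= 2.0
    if n % 2 == 0:
        power[-1] /= 2.0
    other_dims = tuple(sig.dims[1:])
    coords = {"freq": np.fft.rfftfreq(n, d=1.0 / fs)}
    coords.update({d: sig.coords[d].values for d in other_dims if d in sig.coords})
    return xr.DataArray(
        power,
        dims=("freq",) + other_dims,
        coords=coords,
        attrs={"fs": fs},
        name="psd",
    )


# Two synthetic channels with sinusoids at 50 Hz and 120 Hz.
fs = 1000.0
t = np.arange(1000) / fs
data = np.stack([np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 120 * t)], axis=1)
signal = xr.DataArray(
    data,
    dims=("time", "channel"),
    coords={"time": t, "channel": ["a", "b"]},
    attrs={"fs": fs},
)
psd = periodogram(signal)
peak_freqs = psd["freq"].values[psd.values.argmax(axis=0)]
```

The result is still a dense tensor with named dims, so a later measure can reduce over freq the same way a time-domain measure reduces over time.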

Measures / feature maps

These compute interpretable quantities from signals or transforms. Outputs may be scalars, spectra, channel-wise feature vectors, grid feature maps, or windowed maps. Examples include channel features, spatial measures, and frequency-domain summary measures.

Detectors / event extraction

Detectors consume a signal or transformed representation and produce event-oriented outputs such as catalogs, intervals, or peak tables. Architecturally, this is different from a measure: a detector returns discrete events with provenance and event-specific metadata, not only another dense tensor.
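The difference in contract is easy to see in a toy detector. The sketch below finds runs of samples above a threshold and returns one record per event, with the detector name and threshold attached as provenance. The Event class, its fields, and the function name are hypothetical; the signal is a plain NumPy array only to keep the example short.

```python
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass(frozen=True)
class Event:
    t_start: float     # seconds, inclusive
    t_stop: float      # seconds, exclusive
    peak_value: float
    detector: str      # provenance: which detector produced the event
    threshold: float   # provenance: parameter used


def detect_threshold_events(x: np.ndarray, fs: float, threshold: float) -> List[Event]:
    """Return one Event per contiguous run of samples above threshold."""
    above = x > threshold
    # Pad with False so every run has both a rising and a falling edge.
    padded = np.concatenate(([False], above, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)   # first sample of each run
    stops = np.flatnonzero(edges == -1)   # one past the last sample of each run
    return [
        Event(
            t_start=float(s / fs),
            t_stop=float(e / fs),
            peak_value=float(x[s:e].max()),
            detector="threshold",
            threshold=float(threshold),
        )
        for s, e in zip(starts, stops)
    ]


x = np.array([0.0, 0.0, 5.0, 6.0, 0.0, 0.0, 7.0, 0.0])
events = detect_threshold_events(x, fs=10.0, threshold=1.0)
```

A measure over the same input would return another array; the detector returns a short list of discrete records that a table-like container or a frontend overlay can consume directly.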

Plotting and frontend-facing backend utilities

cogpy may provide backend utilities that make structured tensors and event outputs easier to inspect or hand to a frontend. The library should not absorb UI state management or frontend application logic.

This distinction matters because transforms, measures, and detectors should not all collapse into one generic “analysis” bucket. They have different contracts, different output types, and different downstream uses.

IO Responsibilities

cogpy.io owns:

  • file-format translation

  • metadata and sidecar handling

  • construction of valid internal xarray objects from raw files

  • saving structured outputs back to external formats

This is also the right place for thin convenience wrappers that combine compute with required file bookkeeping. For example, updating sidecars after resampling belongs in IO, not in compute subpackages.
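A sketch of that division of labor, using only the standard library: the compute step knows nothing about files, and a thin IO-level wrapper keeps the sidecar consistent with the data. Everything here is assumed for illustration, including the function names, the JSON sidecar layout, and the BIDS-style field name SamplingFrequency; the decimation is a naive stand-in without anti-alias filtering.

```python
import json
import tempfile
from pathlib import Path
from typing import List

FS_KEY = "SamplingFrequency"  # assumed sidecar field name (BIDS-style)


def decimate_by_two(samples: List[float]) -> List[float]:
    """Stand-in compute step: file-agnostic, unaware of sidecars.

    Naive decimation with no anti-alias filter, for illustration only.
    """
    return samples[::2]


def resample_and_update_sidecar(samples: List[float], sidecar_path: Path) -> List[float]:
    """IO-level wrapper: run the compute step, then keep the sidecar truthful."""
    meta = json.loads(sidecar_path.read_text())
    out = decimate_by_two(samples)
    meta[FS_KEY] = float(meta[FS_KEY]) / 2.0
    sidecar_path.write_text(json.dumps(meta, indent=2))
    return out


with tempfile.TemporaryDirectory() as tmp:
    sidecar = Path(tmp) / "recording.json"
    sidecar.write_text(json.dumps({FS_KEY: 1000.0, "Units": "uV"}))
    resampled = resample_and_update_sidecar(
        [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], sidecar
    )
    updated = json.loads(sidecar.read_text())
```

The compute function stays testable with plain in-memory data, while the bookkeeping that only makes sense next to a file lives in one place.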

Structured Outputs

Core operations should return structured outputs that downstream code can rely on. Depending on the abstraction, those outputs may be:

  • transformed signals

  • PSDs and spectrograms

  • feature maps

  • summary measures

  • event tables or catalogs

  • interval-like outputs

Stable structured outputs matter for two reasons:

  1. external pipelines need outputs they can validate, serialize, and route without bespoke per-step parsing

  2. frontends need predictable tensors and event-like tables for visualization, overlays, and linked inspection

The package should therefore prefer explicit output contracts over loosely-typed tuples or ad hoc dicts when an output becomes a reusable boundary.
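One way to express such a contract is a small frozen dataclass that checks its own invariants on construction. The Intervals class below is hypothetical and far simpler than a real interval or catalog type would be; it is meant only to contrast with returning a bare (starts, stops) tuple.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Intervals:
    """Interval-like output with an explicit, self-checking contract."""

    starts: Tuple[float, ...]
    stops: Tuple[float, ...]
    units: str = "s"

    def __post_init__(self) -> None:
        if len(self.starts) != len(self.stops):
            raise ValueError("starts and stops must have the same length")
        if any(stop < start for start, stop in zip(self.starts, self.stops)):
            raise ValueError("every stop must be >= its start")

    def durations(self) -> Tuple[float, ...]:
        return tuple(stop - start for start, stop in zip(self.starts, self.stops))


intervals = Intervals(starts=(0.0, 2.0), stops=(1.0, 2.5))
durations = intervals.durations()
```

A caller that receives an Intervals instance can rely on matching lengths and ordered bounds without re-checking them, which a bare tuple cannot promise.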

Grid-Aware Processing

Grid-aware ECoG processing is a first-class concern in cogpy, not an afterthought layered onto flat channels.

Important supported patterns include:

  • AP×ML-aware filtering and spatial transforms

  • neighborhood-based normalization relative to local grid structure

  • grid adjacency and footprint helpers

  • spatial measures over feature maps or spectral maps

  • conversions between grid views and channel-stacked views when needed by viewers or downstream algorithms

This is why stable spatial conventions matter. A function that understands AP/ML layout can do more than a generic channel-only function: it can use local neighborhoods, directional structure, and electrode geometry in a reusable way.
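Two of the patterns above can be sketched on a small synthetic grid: normalizing a feature map against each electrode's local neighborhood, and converting between the grid view and a channel-stacked view. The function name and the choice of a 3×3 neighborhood that includes the center electrode are assumptions made for the example, not cogpy's definitions.

```python
import numpy as np
import xarray as xr

n_ap, n_ml = 4, 3
feature_map = xr.DataArray(
    np.arange(n_ap * n_ml, dtype=float).reshape(n_ap, n_ml),
    dims=("AP", "ML"),
    coords={"AP": np.arange(n_ap), "ML": np.arange(n_ml)},
    name="feature",
)


def neighborhood_mean(grid: xr.DataArray) -> xr.DataArray:
    """Mean over each electrode's 3x3 neighborhood, ignoring off-grid cells."""
    ordered = grid.transpose("AP", "ML")
    values = ordered.values.astype(float)
    rows, cols = values.shape
    # Surround the grid with NaN so edge electrodes only average real neighbors.
    padded = np.pad(values, 1, mode="constant", constant_values=np.nan)
    windows = np.stack(
        [padded[i:i + rows, j:j + cols] for i in range(3) for j in range(3)]
    )
    return ordered.copy(data=np.nanmean(windows, axis=0))


local = neighborhood_mean(feature_map)
normalized = feature_map - local  # deviation from the local neighborhood

# Grid view <-> channel-stacked view, e.g. for a viewer or a flat algorithm.
stacked = feature_map.stack(channel=("AP", "ML"))
restored = stacked.unstack("channel")
```

With a flat channel axis alone, the neighborhood step would need a separate adjacency table; with AP and ML as named dims, the geometry travels with the array.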

Snakemake And External Pipeline Guidance

Project pipelines should remain thin orchestration layers around cogpy, even when they are authored in the same repository or shipped as examples.

Good uses of Snakemake or project-level orchestration:

  • dataset selection and batching

  • dependency ordering and caching

  • resource management

  • derivative path layout

  • project-specific configuration and provenance

Good uses of cogpy inside those pipelines:

  • loading and schema coercion

  • reusable compute primitives

  • detector and transform composition helpers

  • saving outputs and sidecars

The architectural target is not “no workflows anywhere in the repo.” The target is that workflow logic stays lightweight and replaceable, while the durable technical value lives in stable cogpy APIs.

If companion conceptual docs such as docs/reference/feature-registry or docs/reference/example-snakemake-pipeline are maintained, they should sit above this architecture and describe:

  • what feature families exist

  • which cogpy outputs are intended to be stable boundaries

  • how external orchestration should call those APIs without reimplementing them

Preprocessing Structure

The preprocessing stack is one of the clearest examples of the intended architecture:

  • focused core modules for filtering, resampling, interpolation, and line-noise handling

  • grid-aware bad-channel feature extraction and neighborhood normalization

  • thin wrappers or scripts that compose those pieces for pipeline use

Legacy modules may remain as compatibility shims, but new code should prefer the more explicit modules that preserve schema clarity and keep compute separate from orchestration.

Frontend Boundary

TensorScope is the primary visualization frontend. It is a standalone React + TypeScript application in its own repository and is not part of cogpy.

cogpy acts as the compute and data backend for TensorScope and similar tools:

  • provide stable tensor representations

  • provide transforms, measures, and detector outputs that can be visualized

  • provide event-like outputs that a frontend can overlay or inspect
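As an illustration of the backend side of this boundary, the sketch below flattens a DataArray into a JSON-serializable payload that a separate frontend could fetch. The payload layout and the function name are invented for this example; this document does not specify how TensorScope actually receives data.

```python
import json

import numpy as np
import xarray as xr


def to_frontend_payload(da: xr.DataArray) -> dict:
    """Describe a tensor with plain Python types only (hypothetical layout)."""
    return {
        "name": da.name,
        "dims": list(da.dims),
        "shape": [int(n) for n in da.shape],
        "coords": {
            dim: np.asarray(da.coords[dim].values).tolist()
            for dim in da.dims
            if dim in da.coords
        },
        # Keep only attrs that are already plain JSON-safe scalars.
        "attrs": {
            key: value
            for key, value in da.attrs.items()
            if isinstance(value, (str, int, float, bool))
        },
        "data": da.values.tolist(),
    }


tensor = xr.DataArray(
    np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]),
    dims=("time", "channel"),
    coords={"time": [0.0, 0.001, 0.002], "channel": ["a", "b"]},
    attrs={"fs": 1000.0},
    name="signal",
)
payload = to_frontend_payload(tensor)
wire = json.dumps(payload)  # what a backend endpoint might send
```

The frontend sees named dims, coordinates, and values, and nothing about how the tensor was computed or how it will be rendered.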

TensorScope itself owns:

  • UI state

  • interaction logic

  • layout and rendering

  • frontend-specific persistence and application behavior

Archived TensorScope design docs remain in explanation/plot/_archive/ for historical reference on backend requirements.

Open Questions

  • How far should schema normalization go across current dim-name variants such as AP/ML, ap/ml, and time_win?

  • Which outputs should be normalized around DataArray, which around Dataset, and which around table-like types such as EventCatalog?

  • Which structured outputs are mature enough to be treated as public contracts for external projects today, versus internal conventions still settling?

  • How much packaged workflow support should remain in-repo as examples or convenience wrappers without pulling cogpy toward full workflow ownership?