Architecture And Scope

This document defines the intended architectural boundary of cogpy: what the package should own, what it should leave to external projects, and what technical conventions make the library composable over time.

cogpy is primarily a reusable compute + IO toolkit for ECoG / iEEG analysis. It should stay maintainable by keeping core algorithms file-agnostic, keeping file-format concerns in cogpy.io, and keeping full workflow orchestration outside the package whenever possible.

Package Goals

cogpy should provide:

Reusable in-memory compute primitives for preprocessing, spectral analysis, measures, and event detection.
Structured IO helpers for translating between file formats and the package’s internal xarray-centered representations.
Stable enough internal conventions that external pipelines and frontends can compose cogpy outputs without per-project glue everywhere.
Backend-facing utilities for notebooks and visualization frontends (e.g. TensorScope, a separate React + TypeScript application).

What `cogpy` Is And Is Not

cogpy is:

a library of reusable computational building blocks
an IO layer for loading, coercing, validating, and saving structured data
a place to define stable internal conventions for common signal and output shapes
a backend that external pipelines and frontends can call

cogpy is not:

the owner of full project orchestration
a replacement for project-level Snakemake DAGs, derivative registries, scheduling, or publication workflows
a GUI application or frontend framework
a claim that every schema in the package is already fully mature

The intended split is:

cogpy.* (top-level subpackages) = reusable in-memory compute primitives
cogpy.io = load/save, sidecars, and file-format translation
external projects such as PixECoG = orchestration, dataset-specific configs, large-scale execution, and derivative layout

The repository may still contain packaged workflows or CLI wrappers for convenience, testing, and examples. Those should remain thin composition layers around stable library APIs, not grow into a second workflow engine inside cogpy.

Design Principles

Keep core processing code file-agnostic and testable.
Put file-format and sidecar logic in cogpy.io.
Prefer stable data conventions over ad hoc array shapes.
Make outputs structured enough for downstream pipelines and frontends.
Allow lightweight composition helpers, but keep orchestration minimal.
Be honest about evolving areas instead of freezing premature abstractions.

Layering And Composition Philosophy

The intended layering is:

cogpy.io: read files, translate to internal representations, validate or coerce schemas, save results, update sidecars
cogpy.* (compute subpackages): transform in-memory arrays, compute measures, detect events, build reusable compute abstractions
external pipelines: choose inputs, parameter sets, derivative paths, scheduling, caching, and execution order
external frontends: own UI state, interaction design, visualization layout, and application logic

This means external project pipelines should usually be thin orchestrators around stable cogpy APIs. A Snakemake rule or project script should mostly:

load via cogpy.io
call one or more cogpy compute functions or lightweight composition helpers
validate/coerce outputs where needed
save via cogpy.io

If reusable composition is needed inside cogpy, it should stay small and local: detector pipelines, convenience wrappers, or helper objects are fine. Owning project-scale DAG semantics is not.

Core Data Model And Schema Expectations

Core functions should operate on structured in-memory objects, typically xarray.DataArray and sometimes xarray.Dataset.

The intended direction is an xarray-centered internal model with common dims such as:

time
channel
freq
time_win or other window-like dims where the operation is windowed
AP / ML for grid-aware spatial layouts
ap / ml in some frontend-oriented or orthoslicer-oriented views

Common expectations:

dims should be named, not implied by axis position alone
key coordinates such as time, freq, channel, AP, and ML should be present when the abstraction requires them
sampling rate should be recoverable in a stable way, currently most often via attrs["fs"]
metadata should travel with the array when it materially affects downstream interpretation

DataArray is usually the right representation for a single typed tensor such as a signal, spectrogram, or feature map. Dataset is appropriate when a result is naturally a named collection of aligned arrays.

The important architectural point is not that every object already has a final frozen schema. It is that common schemas should converge, and boundary code should validate or coerce inputs toward those schemas so independent pieces of the package remain composable.

Today, src/cogpy/datasets/schemas.py already defines several canonical dim orders and validate_* / coerce_* helpers. That is the right direction, but the full schema story is still evolving. In particular, dim naming and case are not yet perfectly uniform across all modules, so new code should favor existing canonical validators rather than inventing new one-off conventions.

Conceptual Layers Inside `cogpy`

The package should remain conceptually clear even when the exact package layout evolves. The main roles are:

Preprocess / transforms

These operations reshape or clean signals without claiming to be a final scientific measure. Examples include filtering, rereferencing, line-noise handling, resampling, interpolation, and normalization.

Spectral transforms

These convert signals into representations such as PSDs or spectrograms. They are still transforms: their main job is to produce a new representation for later interpretation or downstream computation.

Measures / feature maps

These compute interpretable quantities from signals or transforms. Outputs may be scalars, spectra, channel-wise feature vectors, grid feature maps, or windowed maps. Examples include channel features, spatial measures, and frequency-domain summary measures.

Detectors / event extraction

Detectors consume a signal or transformed representation and produce event-oriented outputs such as catalogs, intervals, or peak tables. Architecturally, this is different from a measure: a detector returns discrete events with provenance and event-specific metadata, not only another dense tensor.

Plotting and frontend-facing backend utilities

cogpy may provide backend utilities that make structured tensors and event outputs easier to inspect or hand to a frontend. The library should not absorb UI state management or frontend application logic.

This distinction matters because transforms, measures, and detectors should not all collapse into one generic “analysis” bucket. They have different contracts, different output types, and different downstream uses.

IO Responsibilities

cogpy.io owns:

file-format translation
metadata and sidecar handling
construction of valid internal xarray objects from raw files
saving structured outputs back to external formats

This is also the right place for thin convenience wrappers that combine compute with required file bookkeeping. For example, updating sidecars after resampling belongs in IO, not in compute subpackages.

Structured Outputs

Core operations should return structured outputs that downstream code can rely on. Depending on the abstraction, those outputs may be:

transformed signals
PSDs and spectrograms
feature maps
summary measures
event tables or catalogs
interval-like outputs

Stable structured outputs matter for two reasons:

external pipelines need outputs they can validate, serialize, and route without bespoke per-step parsing
frontends need predictable tensors and event-like tables for visualization, overlays, and linked inspection

The package should therefore prefer explicit output contracts over loosely-typed tuples or ad hoc dicts when an output becomes a reusable boundary.

Grid-Aware Processing

Grid-aware ECoG processing is a first-class concern in cogpy, not an afterthought layered onto flat channels.

Important supported patterns include:

AP×ML-aware filtering and spatial transforms
neighborhood-based normalization relative to local grid structure
grid adjacency and footprint helpers
spatial measures over feature maps or spectral maps
conversions between grid views and channel-stacked views when needed by viewers or downstream algorithms

This is why stable spatial conventions matter. A function that understands AP/ML layout can do more than a generic channel-only function: it can use local neighborhoods, directional structure, and electrode geometry in a reusable way.

Snakemake And External Pipeline Guidance

Project pipelines should remain thin orchestration layers around cogpy, even when they are authored in the same repository or shipped as examples.

Good uses of Snakemake or project-level orchestration:

dataset selection and batching
dependency ordering and caching
resource management
derivative path layout
project-specific configuration and provenance

Good uses of cogpy inside those pipelines:

loading and schema coercion
reusable compute primitives
detector and transform composition helpers
saving outputs and sidecars

The architectural target is not “no workflows anywhere in the repo.” The target is that workflow logic stays lightweight and replaceable, while the durable technical value lives in stable cogpy APIs.

If companion conceptual docs such as docs/reference/feature-registry or docs/reference/example-snakemake-pipeline are maintained, they should sit above this architecture and describe:

what feature families exist
which cogpy outputs are intended to be stable boundaries
how external orchestration should call those APIs without reimplementing them

Preprocessing Structure

The preprocessing stack is still one of the clearest examples of the intended architecture:

focused core modules for filtering, resampling, interpolation, and line-noise handling
grid-aware bad-channel feature extraction and neighborhood normalization
thin wrappers or scripts that compose those pieces for pipeline use

Legacy modules may remain as compatibility shims, but new code should prefer the more explicit modules that preserve schema clarity and keep compute separate from orchestration.

Frontend Boundary

TensorScope is the primary visualization frontend. It is a standalone React + TypeScript application in its own repository — it is not part of cogpy.

cogpy acts as the compute and data backend for TensorScope and similar tools:

provide stable tensor representations
provide transforms, measures, and detector outputs that can be visualized
provide event-like outputs that a frontend can overlay or inspect

TensorScope itself owns:

UI state
interaction logic
layout and rendering
frontend-specific persistence and application behavior

Archived TensorScope design docs remain in explanation/plot/_archive/ for historical reference on backend requirements.

Open Questions

How far should schema normalization go across current dim-name variants such as AP/ML, ap/ml, and time_win?
Which outputs should be normalized around DataArray, which around Dataset, and which around table-like types such as EventCatalog?
Which structured outputs are mature enough to be treated as public contracts for external projects today, versus internal conventions still settling?
How much packaged workflow support should remain in-repo as examples or convenience wrappers without pulling cogpy toward full workflow ownership?