Data Model

cogpy uses xarray.DataArray as its primary data structure. This page explains the schema conventions and why they were chosen.

Why xarray?

ECoG data has labeled dimensions (time, space, frequency) that are meaningful to the analysis. Raw numpy arrays lose this context — you have to track which axis is which separately. xarray solves this by attaching named dimensions and coordinates to arrays.

cogpy chose xarray over custom classes because:

It integrates with the scientific Python ecosystem (pandas, dask, zarr)
It provides serialization for free (netCDF, Zarr)
Dimension-aware operations (sel, isel, groupby) reduce indexing bugs
It avoids the maintenance burden of a custom array class
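
As a small illustration of the indexing point, the sketch below compares positional and named reductions on synthetic data. The array shapes and values are made up for the example and nothing here is cogpy API.

```python
import numpy as np
import xarray as xr

# Synthetic recording: 5 time samples on a 2 x 3 grid (values are arbitrary).
raw = np.arange(30, dtype=float).reshape(5, 2, 3)

# With a bare numpy array, the reader has to remember that axis 0 is time.
mean_over_time_np = raw.mean(axis=0)

# With xarray, the reduction names the dimension it acts on.
sig = xr.DataArray(raw, dims=("time", "AP", "ML"))
mean_over_time_xr = sig.mean(dim="time")

# Named reductions do not depend on axis order.
shuffled = sig.transpose("ML", "time", "AP")
mean_shuffled = shuffled.mean(dim="time")  # remaining dims: ("ML", "AP")
```

The same `mean(dim="time")` call works on both layouts, whereas the numpy call would need a different `axis` value after the transpose.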

Signal schemas

All schemas are defined in cogpy.base.ECoGSchema:

Grid ECoG: `(time, AP, ML)`

The primary schema for 2D electrode grids. Used by spatial measures, CSD computation, and grid-aware preprocessing.

sig.dims  →  ("time", "AP", "ML")
sig.fs    →  1000.0  (Hz, scalar coordinate)
sig.time  →  [0.000, 0.001, 0.002, ...]  (seconds)
sig.AP    →  [0, 1, 2, ..., 15]  (grid row indices)
sig.ML    →  [0, 1, 2, ..., 15]  (grid column indices)

AP = anterior-posterior (row 0 = most posterior). ML = medial-lateral (column 0 = most medial). This convention matches the physical electrode layout.
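
A minimal sketch of building an array that follows this schema with plain xarray. The data is synthetic noise, the 16 x 16 grid size mirrors the listing above, and in practice cogpy.io is the documented way to obtain such arrays from raw files.

```python
import numpy as np
import xarray as xr

fs = 1000.0                      # sampling rate in Hz
n_time, n_ap, n_ml = 2000, 16, 16

rng = np.random.default_rng(0)
data = rng.standard_normal((n_time, n_ap, n_ml))  # synthetic samples

sig = xr.DataArray(
    data,
    dims=("time", "AP", "ML"),
    coords={
        "time": np.arange(n_time) / fs,  # seconds
        "AP": np.arange(n_ap),           # grid row indices
        "ML": np.arange(n_ml),           # grid column indices
        "fs": fs,                        # scalar (0-d) coordinate
    },
)

# Label-based selection: the first 100 ms at row 0, column 3.
# The upper bound 0.0995 sits between two samples, so the slice
# does not depend on floating-point rounding at the edge.
trace = sig.sel(time=slice(0.0, 0.0995), AP=0, ML=3)
```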

Flat ECoG: `(time, ch)`

For channel-indexed data without grid semantics (e.g., strip electrodes, or after flattening a grid).

Multichannel: `(channel, time)`

Generic multichannel format. Note the transposed axis order compared to flat ECoG — this matches common neuroscience conventions (channels as rows).
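
The relationship between the three signal layouts can be sketched with xarray's `stack` and `transpose`. This is an illustration on a toy 4 x 4 grid, not the conversion routine cogpy itself uses.

```python
import numpy as np
import xarray as xr

grid = xr.DataArray(
    np.zeros((10, 4, 4)),
    dims=("time", "AP", "ML"),
    coords={"AP": np.arange(4), "ML": np.arange(4)},
)

# Flat ECoG: collapse the two grid dims into a single "ch" dim.
# stack() appends the new dim last and keeps the original (AP, ML)
# pairs as a MultiIndex on "ch", with ML varying fastest.
flat = grid.stack(ch=("AP", "ML"))

# Generic multichannel: same flattening, but with channels as the first axis.
multi = grid.stack(channel=("AP", "ML")).transpose("channel", "time")
```

Because the (AP, ML) pairs survive as index levels, the flattened array still records which grid position each channel came from.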

Sampling rate

fs can be stored in two ways:

Scalar coordinate: sig.coords["fs"] — preferred
Attribute: sig.attrs["fs"] — fallback

Use cogpy.base.get_fs(sig) to retrieve it regardless of storage method, and ensure_fs(sig, fs=1000.0) to guarantee it is set.
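
The lookup order described above can be sketched as follows. `get_fs_sketch` and `ensure_fs_sketch` are hypothetical stand-ins written for this page; the real cogpy.base.get_fs and ensure_fs may differ in signature, error handling, and return type.

```python
import numpy as np
import xarray as xr


def get_fs_sketch(sig: xr.DataArray) -> float:
    """Hypothetical stand-in: look in coords first, then fall back to attrs."""
    if "fs" in sig.coords:
        return float(sig.coords["fs"].item())
    if "fs" in sig.attrs:
        return float(sig.attrs["fs"])
    raise ValueError("no sampling rate found in coords or attrs")


def ensure_fs_sketch(sig: xr.DataArray, fs: float) -> xr.DataArray:
    """Hypothetical stand-in: attach fs as a scalar coordinate if none is set."""
    try:
        get_fs_sketch(sig)
    except ValueError:
        return sig.assign_coords(fs=fs)
    return sig


bare = xr.DataArray(np.zeros((4, 2)), dims=("time", "ch"))
with_attr = xr.DataArray(np.zeros((4, 2)), dims=("time", "ch"), attrs={"fs": 500.0})
with_coord = ensure_fs_sketch(bare, fs=1000.0)
```

In this sketch an existing rate is never overwritten: calling `ensure_fs_sketch(with_attr, fs=1000.0)` leaves the stored 500.0 in place. Whether cogpy behaves the same way is an assumption.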

Spectrogram schemas

Two spectrogram schemas serve different roles:

`GridWindowedSpectrum` — compute pipeline form

spec.dims  →  ("time_win", "AP", "ML", "freq")

Uppercase spatial dim names match those used by the (..., AP, ML) batch convention expected by spatial measures. Use coerce_grid_windowed_spectrum(da) to convert spectrogramx() output into this form; it handles the renames and transposes.

`GridSpectrogram4D` — orthoslicer/GUI form

spec.dims  →  ("ml", "ap", "time", "freq")

Lowercase spatial dims, optimized for slice-based visualization.
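
Moving between the two grid forms comes down to a rename followed by a transpose. The sketch below uses plain xarray on an empty array; treating the GUI form's "time" as equivalent to "time_win" is an assumption made for illustration, and this is not the body of coerce_grid_windowed_spectrum.

```python
import numpy as np
import xarray as xr

# GUI form: ("ml", "ap", "time", "freq") with toy sizes 3, 2, 5, 7.
gui = xr.DataArray(
    np.zeros((3, 2, 5, 7)),
    dims=("ml", "ap", "time", "freq"),
)

# Rename to the compute-pipeline names, then reorder the axes by name.
# Mapping "time" to "time_win" is an assumption of this sketch.
pipeline = (
    gui.rename({"ml": "ML", "ap": "AP", "time": "time_win"})
       .transpose("time_win", "AP", "ML", "freq")
)
```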

Flat form

spec.dims  →  ("ch", "time_win", "freq")

Normalization

normalize_spectrogram(spec, method=...) produces a whitened or dB-transformed spectrogram with the same dims:

"robust_zscore" — (x - median) / MAD along freq
"db" — 10 * log10(x)

Batch dimension convention

Compute functions in cogpy follow a NumPy-style convention in which the axes a function operates on are always the trailing ones:

| Domain | Convention | Example |
| --- | --- | --- |
| Temporal measures | `(..., time)` | `kurtosis(arr)` reduces last axis |
| Spectral features | `(..., freq)` | `band_power(psd, freqs, band)` |
| Spatial measures | `(..., AP, ML)` | `moran_i(grid)` reduces last two axes |

Leading ... dimensions are batch dimensions (time windows, frequency bins, channels, etc.). Functions broadcast over them automatically.

This design means you can apply a spatial measure to a full 4D spectrogram in one call, with no loops, provided AP and ML are the last two axes, as in (time, freq, AP, ML).
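
A sketch of the trailing-axes convention using numpy alone. `spatial_variance` is a hypothetical measure invented for this example; it stands in for any function that reduces the last two axes.

```python
import numpy as np


def spatial_variance(grid: np.ndarray) -> np.ndarray:
    """Hypothetical spatial measure: variance over the trailing (AP, ML) axes."""
    return grid.var(axis=(-2, -1))


rng = np.random.default_rng(0)
single = rng.standard_normal((16, 16))           # (AP, ML)
spec4d = rng.standard_normal((50, 20, 16, 16))   # (time, freq, AP, ML)

one_value = spatial_variance(single)   # shape ()
per_bin = spatial_variance(spec4d)     # shape (50, 20)

# Equivalent explicit loop, shown only for comparison.
looped = np.array(
    [[spatial_variance(spec4d[t, f]) for f in range(20)] for t in range(50)]
)
```

The batched call and the double loop give the same values; the batched form leaves the iteration to numpy.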

Event representations

Events are stored in EventCatalog, a thin pandas DataFrame wrapper with standardized columns:

| Column | Required | Description |
| --- | --- | --- |
| `event_id` | yes | Unique integer ID |
| `t` | yes | Event time (seconds) |
| `t0`, `t1` | no | Interval start/end |
| `duration` | no | `t1 - t0` |
| `freq` | no | Peak frequency (Hz) |
| `AP`, `ML` | no | Grid position |
| `channel` | no | Channel index |
| `label` | no | Event type label |
| `score` | no | Detection confidence |
| `detector` | no | Detector name |
| `pipeline` | no | Pipeline provenance |
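
To make the column layout concrete, here is a plain pandas DataFrame carrying the two required columns and a few optional ones. The event values and labels are invented, `missing_required` is a hypothetical helper, and the EventCatalog constructor is deliberately not shown because its signature is not given on this page.

```python
import pandas as pd

REQUIRED_COLUMNS = ("event_id", "t")  # the two required columns from the table


def missing_required(df: pd.DataFrame) -> list[str]:
    """Hypothetical check: return the required columns absent from df."""
    return [c for c in REQUIRED_COLUMNS if c not in df.columns]


events = pd.DataFrame(
    {
        "event_id": [0, 1, 2],
        "t": [0.50, 1.25, 3.00],      # event time in seconds
        "t0": [0.45, 1.20, 2.90],     # interval start
        "t1": [0.55, 1.32, 3.10],     # interval end
        "label": ["ripple", "ripple", "spindle"],  # made-up labels
    }
)
events["duration"] = events["t1"] - events["t0"]

# Ordinary DataFrame filtering applies to the standardized columns.
ripples = events[events["label"] == "ripple"]
```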

Validation boundaries

Core compute functions assume valid input — they do not coerce dimensions or check schemas. Validation happens at system boundaries:

cogpy.io — constructs valid DataArrays from raw files
cogpy.datasets.schemas — validate_*() and coerce_*() functions
cogpy.cli — argument parsing and input validation
Frontend entry points — before passing data to the backend

This keeps core functions fast and simple.
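
The split between boundary and core can be sketched like this. All three function names are hypothetical; they only illustrate the pattern of validating once at the entry point and then calling a core routine that trusts its input.

```python
import numpy as np
import xarray as xr

GRID_DIMS = ("time", "AP", "ML")


def validate_grid_ecog_sketch(sig: xr.DataArray) -> xr.DataArray:
    """Hypothetical boundary check: exact dims in the expected order."""
    if sig.dims != GRID_DIMS:
        raise ValueError(f"expected dims {GRID_DIMS}, got {sig.dims}")
    return sig


def spatial_mean_core(values: np.ndarray) -> np.ndarray:
    """Hypothetical core function: trusts that the last two axes are (AP, ML)."""
    return values.mean(axis=(-2, -1))


def entry_point(sig: xr.DataArray) -> np.ndarray:
    """Validate once at the boundary, then hand plain arrays to the core."""
    checked = validate_grid_ecog_sketch(sig)
    return spatial_mean_core(checked.values)


good = xr.DataArray(np.ones((5, 4, 4)), dims=("time", "AP", "ML"))
bad = good.transpose("AP", "ML", "time")  # same data, wrong axis order
```

Calling `entry_point(bad)` raises at the boundary. Passing `bad.values` straight to `spatial_mean_core` would instead return a result silently averaged over the wrong axes, which is why the check has to happen before the core is reached.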