Data Model

cogpy uses xarray.DataArray as its primary data structure. This page explains the schema conventions and why they were chosen.

Why xarray?

ECoG data has labeled dimensions (time, space, frequency) that are meaningful to the analysis. Raw numpy arrays lose this context — you have to track which axis is which separately. xarray solves this by attaching named dimensions and coordinates to arrays.

cogpy chose xarray over custom classes because:

  • It integrates with the scientific Python ecosystem (pandas, dask, zarr)

  • It provides serialization for free (netCDF, Zarr)

  • Dimension-aware operations (sel, isel, groupby) reduce indexing bugs

  • It avoids the maintenance burden of a custom array class
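
The difference between positional and labeled indexing can be sketched as follows. This is a minimal illustration using only numpy and xarray; the array shape, the sampling rate, and the variable names are assumptions made for the example, not values taken from cogpy.

```python
import numpy as np
import xarray as xr

fs = 1000.0  # Hz, illustrative value
data = np.random.default_rng(0).standard_normal((2000, 4, 4))

# Plain numpy: the caller must remember that axis 0 is time and axis 1 is AP.
np_row = data[500:1001, 2, :]

# xarray: the same selection, written with dimension names and time in seconds.
sig = xr.DataArray(
    data,
    dims=("time", "AP", "ML"),
    coords={"time": np.arange(2000) / fs, "AP": np.arange(4), "ML": np.arange(4)},
)
xr_row = sig.sel(time=slice(0.5, 1.0), AP=2)

# Reductions name the dimension instead of an axis number.
mean_over_time = sig.mean(dim="time")
```

Both selections return the same values, but the xarray version keeps working if the axis order changes.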

Signal schemas

All schemas are defined in cogpy.base.ECoGSchema:

Grid ECoG: (time, AP, ML)

The primary schema for 2D electrode grids. Used by spatial measures, CSD computation, and grid-aware preprocessing.

sig.dims  →  ("time", "AP", "ML")
sig.fs    →  1000.0  (Hz, scalar coordinate)
sig.time  →  [0.000, 0.001, 0.002, ...]  (seconds)
sig.AP    →  [0, 1, 2, ..., 15]  (grid row indices)
sig.ML    →  [0, 1, 2, ..., 15]  (grid column indices)

AP = anterior-posterior (row 0 = most posterior). ML = medial-lateral (column 0 = most medial). This convention matches the physical electrode layout.
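
A DataArray that follows the grid convention above could be built by hand roughly like this. The sketch uses plain xarray only; in practice cogpy.io is the place that constructs valid arrays, and the recording length chosen here is arbitrary.

```python
import numpy as np
import xarray as xr

fs = 1000.0                       # sampling rate in Hz
n_time, n_ap, n_ml = 3000, 16, 16

sig = xr.DataArray(
    np.zeros((n_time, n_ap, n_ml)),
    dims=("time", "AP", "ML"),
    coords={
        "time": np.arange(n_time) / fs,  # seconds
        "AP": np.arange(n_ap),           # grid row indices
        "ML": np.arange(n_ml),           # grid column indices
        "fs": fs,                        # scalar (0-d) coordinate
    },
)

# Row 0 is the most posterior row, column 0 the most medial column.
posterior_medial = sig.sel(AP=0, ML=0)
```

Selecting a single electrode leaves a 1D time series, and the scalar fs coordinate travels with it.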

Flat ECoG: (time, ch)

For channel-indexed data without grid semantics (e.g., strip electrodes, or after flattening a grid).

Multichannel: (channel, time)

Generic multichannel format. The axis order is transposed relative to flat ECoG, and the dimension is named channel rather than ch. Putting channels first matches the common neuroscience convention of channels as rows.
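
The relationship between the three signal layouts can be sketched with standard xarray operations. This is an assumption-laden illustration: it uses DataArray.stack, which orders channels row by row (ML varies fastest), and cogpy's own flattening helpers may order or label channels differently.

```python
import numpy as np
import xarray as xr

grid = xr.DataArray(
    np.arange(5 * 2 * 3, dtype=float).reshape(5, 2, 3),
    dims=("time", "AP", "ML"),
    coords={"AP": np.arange(2), "ML": np.arange(3)},
)

# Flat ECoG: stack (AP, ML) into a single "ch" dimension.
flat = grid.stack(ch=("AP", "ML"))            # dims: ("time", "ch")

# Generic multichannel: channels first, time last.
multi = grid.stack(channel=("AP", "ML")).transpose("channel", "time")
```

After stacking, channel index 4 corresponds to grid position AP=1, ML=1 in this 2 x 3 example.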

Sampling rate

fs can be stored in two ways:

  • Scalar coordinate: sig.coords["fs"] — preferred

  • Attribute: sig.attrs["fs"] — fallback

Use cogpy.base.get_fs(sig) to retrieve it regardless of storage method, and ensure_fs(sig, fs=1000.0) to guarantee it is set.
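
The lookup order described above (coordinate first, attribute as fallback) can be sketched as below. The functions lookup_fs and with_fs are hypothetical stand-ins written for this example; they are not the implementations of get_fs and ensure_fs, whose exact behavior may differ.

```python
import numpy as np
import xarray as xr

def lookup_fs(sig: xr.DataArray) -> float:
    """Hypothetical stand-in for get_fs: coordinate first, then attribute."""
    if "fs" in sig.coords:
        return float(sig.coords["fs"].item())
    if "fs" in sig.attrs:
        return float(sig.attrs["fs"])
    raise ValueError("no sampling rate found in coords or attrs")

def with_fs(sig: xr.DataArray, fs: float) -> xr.DataArray:
    """Hypothetical stand-in for ensure_fs: add fs as a scalar coordinate if missing."""
    try:
        lookup_fs(sig)
        return sig
    except ValueError:
        return sig.assign_coords(fs=fs)

a = xr.DataArray(np.zeros((10, 2)), dims=("time", "ch"), coords={"fs": 1000.0})
b = xr.DataArray(np.zeros((10, 2)), dims=("time", "ch"), attrs={"fs": 500.0})
c = with_fs(xr.DataArray(np.zeros((10, 2)), dims=("time", "ch")), fs=250.0)
```

One plausible reason to prefer the coordinate form is that xarray operations tend to carry scalar coordinates through, while attributes are often dropped unless keep_attrs is set.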

Spectrogram schemas

Two spectrogram schemas serve different roles:

GridWindowedSpectrum — compute pipeline form

spec.dims  →  ("time_win", "AP", "ML", "freq")

Uppercase spatial dims match the (..., AP, ML) batch convention expected by spatial measures. Use coerce_grid_windowed_spectrum(da) to convert spectrogramx() output into this form (handles renames and transposes).
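
The kind of rename-and-transpose step that such a coercion performs can be sketched with plain xarray. The input layout below is hypothetical (the real spectrogramx() output may use different names or order), and to_grid_windowed is an illustrative helper, not cogpy's coerce_grid_windowed_spectrum.

```python
import numpy as np
import xarray as xr

# Hypothetical input layout; the real spectrogramx() output may differ.
raw = xr.DataArray(np.ones((3, 4, 6, 5)), dims=("ml", "ap", "freq", "time"))

RENAMES = {"ap": "AP", "ml": "ML", "time": "time_win"}
TARGET = ("time_win", "AP", "ML", "freq")

def to_grid_windowed(da: xr.DataArray) -> xr.DataArray:
    """Rename whichever source dims are present, then enforce the target order."""
    present = {old: new for old, new in RENAMES.items() if old in da.dims}
    return da.rename(present).transpose(*TARGET)

spec = to_grid_windowed(raw)
```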

GridSpectrogram4D — orthoslicer/GUI form

spec.dims  →  ("ml", "ap", "time", "freq")

Lowercase spatial dims, optimized for slice-based visualization.

Flat form

spec.dims  →  ("ch", "time_win", "freq")

Normalization

normalize_spectrogram(spec, method=...) produces a whitened or dB-transformed spectrogram with the same dims:

  • "robust_zscore": (x - median) / MAD along freq

  • "db": 10 * log10(x)
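
The two formulas can be written out directly with xarray and numpy. This is a sketch of the arithmetic only, not the body of normalize_spectrogram: it assumes an unscaled MAD (no 1.4826 consistency factor) and does no handling of zero or negative power values.

```python
import numpy as np
import xarray as xr

def robust_zscore(spec: xr.DataArray, dim: str = "freq") -> xr.DataArray:
    """(x - median) / MAD, with median and MAD taken along `dim`."""
    med = spec.median(dim=dim)
    mad = np.abs(spec - med).median(dim=dim)
    return (spec - med) / mad

def to_db(spec: xr.DataArray) -> xr.DataArray:
    """10 * log10(x); expects strictly positive power values."""
    return 10.0 * np.log10(spec)

# Toy flat spectrogram: every (ch, time_win) cell has the spectrum 1..5.
spec = xr.DataArray(
    np.tile([1.0, 2.0, 3.0, 4.0, 5.0], (2, 3, 1)),
    dims=("ch", "time_win", "freq"),
)
z = robust_zscore(spec)
db = to_db(spec)
```

Both outputs keep the input dims. A spectrum whose MAD is zero would divide by zero here, so a real implementation would need a guard for that case.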

Batch dimension convention

Compute functions in cogpy follow numpy’s broadcasting convention with typed trailing axes:

| Domain | Convention | Example |
|---|---|---|
| Temporal measures | (..., time) | kurtosis(arr) reduces last axis |
| Spectral features | (..., freq) | band_power(psd, freqs, band) |
| Spatial measures | (..., AP, ML) | moran_i(grid) reduces last two axes |

Leading ... dimensions are batch dimensions (time windows, frequency bins, channels, etc.). Functions broadcast over them automatically.

This design means you can apply a spatial measure to a full 4D spectrogram (time, freq, AP, ML) in one call — no loops required.
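
The trailing-axes convention can be illustrated with a toy spatial measure in numpy. The function spatial_std is a stand-in invented for this example (it is not a cogpy function); the point is only that reducing over axes (-2, -1) makes the same code work for a single grid and for a batched 4D array.

```python
import numpy as np

def spatial_std(grid: np.ndarray) -> np.ndarray:
    """Illustrative spatial measure: reduces the trailing (AP, ML) axes."""
    return grid.std(axis=(-2, -1))

single = np.random.default_rng(0).standard_normal((16, 16))            # (AP, ML)
spec4d = np.random.default_rng(1).standard_normal((50, 30, 16, 16))    # (time, freq, AP, ML)

one_value = spatial_std(single)    # shape ()
per_window = spatial_std(spec4d)   # shape (50, 30)
```

Each entry of per_window equals the measure applied to the corresponding 16 x 16 slice, which is what an explicit double loop over time and frequency would produce.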

Event representations

Events are stored in EventCatalog, a thin pandas DataFrame wrapper with standardized columns:

| Column | Required | Description |
|---|---|---|
| event_id | yes | Unique integer ID |
| t | yes | Event time (seconds) |
| t0, t1 | no | Interval start/end |
| duration | no | t1 - t0 |
| freq | no | Peak frequency (Hz) |
| AP, ML | no | Grid position |
| channel | no | Channel index |
| label | no | Event type label |
| score | no | Detection confidence |
| detector | no | Detector name |
| pipeline | no | Pipeline provenance |
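
A table with these columns can be sketched as a plain pandas DataFrame. The example deliberately avoids the EventCatalog constructor, whose signature is not described here; the event times and labels are made up, and the required-column check is an illustration of the contract rather than cogpy's validation code.

```python
import pandas as pd

REQUIRED = ("event_id", "t")

events = pd.DataFrame(
    {
        "event_id": [0, 1, 2],
        "t": [1.25, 3.10, 7.80],       # event time in seconds
        "t0": [1.20, 3.00, 7.70],      # interval start
        "t1": [1.30, 3.25, 7.95],      # interval end
        "label": ["burst", "burst", "ripple"],  # hypothetical labels
    }
)

# duration is derived from the interval bounds.
events["duration"] = events["t1"] - events["t0"]

# Check the two required columns and the uniqueness of event_id.
missing = [col for col in REQUIRED if col not in events.columns]
ids_unique = bool(events["event_id"].is_unique)
```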

Validation boundaries

Core compute functions assume valid input — they do not coerce dimensions or check schemas. Validation happens at system boundaries:

  • cogpy.io — constructs valid DataArrays from raw files

  • cogpy.datasets.schemas: validate_*() and coerce_*() functions

  • cogpy.cli — argument parsing and input validation

  • Frontend entry points — before passing data to the backend

Validating once at the boundary keeps core functions fast and simple.
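
The split between boundary validation and trusting core functions might look like the following sketch. All three functions are hypothetical names introduced for illustration; validate_grid is not one of cogpy's validate_*() functions, and the real checks may cover more than the dimension names.

```python
import numpy as np
import xarray as xr

GRID_DIMS = ("time", "AP", "ML")

def validate_grid(sig):
    """Hypothetical boundary check: type and dimension order only."""
    if not isinstance(sig, xr.DataArray):
        raise TypeError("expected an xarray.DataArray")
    if sig.dims != GRID_DIMS:
        raise ValueError(f"expected dims {GRID_DIMS}, got {sig.dims}")
    return sig

def spatial_mean(values: np.ndarray) -> np.ndarray:
    """Core-style function: trusts its input and only does the computation."""
    return values.mean(axis=(-2, -1))

def entry_point(sig):
    """Boundary: validate once, then hand a plain array to the core function."""
    sig = validate_grid(sig)
    return spatial_mean(sig.values)

good = xr.DataArray(np.ones((100, 4, 4)), dims=("time", "AP", "ML"))
bad = xr.DataArray(np.ones((4, 4, 100)), dims=("AP", "ML", "time"))
result = entry_point(good)
```

Passing bad to entry_point raises ValueError at the boundary, before any computation runs, while spatial_mean itself contains no checks.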