Data Model
cogpy uses xarray.DataArray as its primary data structure. This page
explains the schema conventions and why they were chosen.
Why xarray?
ECoG data has labeled dimensions (time, space, frequency) that are meaningful to the analysis. Raw numpy arrays lose this context — you have to track which axis is which separately. xarray solves this by attaching named dimensions and coordinates to arrays.
cogpy chose xarray over custom classes because:
It integrates with the scientific Python ecosystem (pandas, dask, zarr)
It provides serialization for free (netCDF, Zarr)
Dimension-aware operations (sel, isel, groupby) reduce indexing bugs
It avoids the maintenance burden of a custom array class
Signal schemas
All schemas are defined in cogpy.base.ECoGSchema:
Grid ECoG: (time, AP, ML)
The primary schema for 2D electrode grids. Used by spatial measures, CSD computation, and grid-aware preprocessing.
sig.dims → ("time", "AP", "ML")
sig.fs → 1000.0 (Hz, scalar coordinate)
sig.time → [0.000, 0.001, 0.002, ...] (seconds)
sig.AP → [0, 1, 2, ..., 15] (grid row indices)
sig.ML → [0, 1, 2, ..., 15] (grid column indices)
AP = anterior-posterior (row 0 = most posterior). ML = medial-lateral (column 0 = most medial). This convention matches the physical electrode layout.
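As a minimal sketch, a DataArray that follows the grid schema can be built with plain xarray. The random data, recording length, and variable names below are illustrative assumptions; in practice cogpy.io is the place where such objects are constructed from raw files.

```python
import numpy as np
import xarray as xr

fs = 1000.0                      # sampling rate in Hz
n_time, n_ap, n_ml = 2000, 16, 16

rng = np.random.default_rng(0)
sig = xr.DataArray(
    rng.standard_normal((n_time, n_ap, n_ml)),  # placeholder data
    dims=("time", "AP", "ML"),
    coords={
        "time": np.arange(n_time) / fs,  # seconds
        "AP": np.arange(n_ap),           # grid row indices
        "ML": np.arange(n_ml),           # grid column indices
        "fs": fs,                        # scalar coordinate
    },
)

# Label-based selection: row AP=0 between 0.5 s and 1.0 s.
# Selecting a single AP label drops that dimension.
window = sig.sel(time=slice(0.5, 1.0), AP=0)
```

Because the selection names its dimensions, it stays correct even if the axis order of the underlying array changes.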
Flat ECoG: (time, ch)
For channel-indexed data without grid semantics (e.g., strip electrodes, or after flattening a grid).
Multichannel: (channel, time)
Generic multichannel format. Note the transposed axis order compared to flat ECoG — this matches common neuroscience conventions (channels as rows).
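The relationship between the three signal layouts can be sketched with plain xarray operations. This is an assumption about one possible conversion, not cogpy's own flattening routine; in particular, stack() produces a MultiIndex of (AP, ML) pairs on the new dimension, whereas cogpy may use a plain integer channel index.

```python
import numpy as np
import xarray as xr

# Small placeholder grid signal: 100 samples on a 4 x 3 grid.
grid = xr.DataArray(
    np.zeros((100, 4, 3)),
    dims=("time", "AP", "ML"),
    coords={"AP": np.arange(4), "ML": np.arange(3)},
)

# Flat ECoG: merge the two spatial dims into one "ch" dim -> (time, ch).
flat = grid.stack(ch=("AP", "ML"))

# Multichannel: same data with channels first -> (channel, time).
multi = grid.stack(channel=("AP", "ML")).transpose("channel", "time")
```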
Sampling rate
fs can be stored in two ways:
Scalar coordinate: sig.coords["fs"] (preferred)
Attribute: sig.attrs["fs"] (fallback)
Use cogpy.base.get_fs(sig) to retrieve it regardless of storage
method, and ensure_fs(sig, fs=1000.0) to guarantee it is set.
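The lookup order described above (coordinate first, attribute as fallback) could look roughly like the following. lookup_fs is a hypothetical helper written for illustration; it is not the implementation of cogpy.base.get_fs, whose exact behavior (for example on missing values) is not documented here.

```python
import numpy as np
import xarray as xr


def lookup_fs(sig):
    """Hypothetical helper: read fs from coords if present, else from attrs."""
    if "fs" in sig.coords:
        return float(sig.coords["fs"].item())
    if "fs" in sig.attrs:
        return float(sig.attrs["fs"])
    raise ValueError("sampling rate not found in coords or attrs")


# One signal per storage method.
with_coord = xr.DataArray(np.zeros(10), dims="time", coords={"fs": 1000.0})
with_attr = xr.DataArray(np.zeros(10), dims="time", attrs={"fs": 500.0})

fs_a = lookup_fs(with_coord)
fs_b = lookup_fs(with_attr)
```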
Spectrogram schemas
Two spectrogram schemas serve different roles:
GridWindowedSpectrum — compute pipeline form
spec.dims → ("time_win", "AP", "ML", "freq")
Uppercase spatial dims match the (..., AP, ML) batch convention expected by
spatial measures. Use coerce_grid_windowed_spectrum(da) to convert
spectrogramx() output into this form (handles renames and transposes).
GridSpectrogram4D — orthoslicer/GUI form
spec.dims → ("ml", "ap", "time", "freq")
Lowercase spatial dims, optimized for slice-based visualization.
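The two grid forms differ only in dimension names and order, so moving between them is a rename plus a transpose. The sketch below uses plain xarray calls on placeholder data; it illustrates the kind of work a coercion step performs and is not the code of coerce_grid_windowed_spectrum.

```python
import numpy as np
import xarray as xr

# Placeholder spectrogram in the GUI form: (ml, ap, time, freq).
gui_spec = xr.DataArray(
    np.zeros((4, 3, 5, 6)),
    dims=("ml", "ap", "time", "freq"),
)

# Rename to the compute-pipeline names, then reorder the axes.
pipeline_spec = gui_spec.rename(
    {"ml": "ML", "ap": "AP", "time": "time_win"}
).transpose("time_win", "AP", "ML", "freq")
```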
Flat form
spec.dims → ("ch", "time_win", "freq")
Normalization
normalize_spectrogram(spec, method=...) produces a whitened or
dB-transformed spectrogram with the same dims:
"robust_zscore": (x - median) / MAD along freq
"db": 10 * log10(x)
Batch dimension convention
Compute functions in cogpy follow numpy’s broadcasting convention with typed trailing axes:
| Domain | Convention | Example |
|---|---|---|
| Temporal measures | | |
| Spectral features | | |
| Spatial measures | (..., AP, ML) | |
Leading ... dimensions are batch dimensions (time windows, frequency bins,
channels, etc.). Functions broadcast over them automatically.
This design means you can apply a spatial measure to a full 4D spectrogram
(time, freq, AP, ML) in one call — no loops required.
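A small NumPy sketch of the trailing-axes idea: the function only refers to the last two axes, so any number of leading batch axes passes through unchanged. spatial_mean is a hypothetical stand-in for a spatial measure, not a cogpy function.

```python
import numpy as np


def spatial_mean(x):
    """Hypothetical spatial measure: average over the trailing (AP, ML) axes."""
    return x.mean(axis=(-2, -1))


spec = np.ones((50, 20, 16, 16))      # (time, freq, AP, ML)
per_bin = spatial_mean(spec)          # batch axes kept: (time, freq)
single = spatial_mean(spec[0, 0])     # a single (AP, ML) frame also works
```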
Event representations
Events are stored in EventCatalog, a thin pandas DataFrame wrapper with
standardized columns:
| Column | Required | Description |
|---|---|---|
| | yes | Unique integer ID |
| | yes | Event time (seconds) |
| | no | Interval start/end |
| | no | |
| | no | Peak frequency (Hz) |
| | no | Grid position |
| | no | Channel index |
| | no | Event type label |
| | no | Detection confidence |
| | no | Detector name |
| | no | Pipeline provenance |
Validation boundaries
Core compute functions assume valid input — they do not coerce dimensions or check schemas. Validation happens at system boundaries:
cogpy.io: constructs valid DataArrays from raw files
cogpy.datasets.schemas: validate_*() and coerce_*() functions
cogpy.cli: argument parsing and input validation
Frontend entry points: before passing data to the backend
This keeps core functions fast and simple.
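The split between boundary checks and unchecked core code might be sketched as follows. All three function names are hypothetical and stand in for the validate_*() helpers and core compute functions mentioned above; the real signatures are not shown in this page.

```python
import numpy as np
import xarray as xr

GRID_DIMS = ("time", "AP", "ML")


def check_grid_signal(sig):
    """Hypothetical boundary check: reject anything not in grid schema."""
    if not isinstance(sig, xr.DataArray):
        raise TypeError("expected an xarray.DataArray")
    if tuple(sig.dims) != GRID_DIMS:
        raise ValueError(f"expected dims {GRID_DIMS}, got {tuple(sig.dims)}")
    return sig


def core_spatial_mean(sig):
    """Hypothetical core function: assumes valid input, performs no checks."""
    return sig.mean(dim=["AP", "ML"])


def entry_point(sig):
    """Hypothetical entry point: validate once, then call core code."""
    return core_spatial_mean(check_grid_signal(sig))


good = xr.DataArray(np.zeros((10, 4, 3)), dims=GRID_DIMS)
result = entry_point(good)
```

A signal with the wrong dimension order is rejected at the entry point, so the core function never has to handle it.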