# Architecture And Scope

This document defines the intended architectural boundary of `cogpy`: what the
package should own, what it should leave to external projects, and what
technical conventions make the library composable over time.

`cogpy` is primarily a reusable **compute + IO toolkit** for ECoG / iEEG
analysis. It should stay maintainable by keeping core algorithms file-agnostic,
keeping file-format concerns in `cogpy.io`, and keeping full workflow
orchestration outside the package whenever possible.

## Package Goals

`cogpy` should provide:

1. **Reusable in-memory compute primitives** for preprocessing, spectral
   analysis, measures, and event detection.
2. **Structured IO helpers** for translating between file formats and the
   package's internal xarray-centered representations.
3. **Stable enough internal conventions** that external pipelines and frontends
   can compose `cogpy` outputs without per-project glue everywhere.
4. **Backend-facing utilities** for notebooks and visualization frontends (e.g.
   [TensorScope](https://github.com/arashshahidi1997/tensorscope), a separate
   React + TypeScript application).

## What `cogpy` Is And Is Not

`cogpy` is:

- a library of reusable computational building blocks
- an IO layer for loading, coercing, validating, and saving structured data
- a place to define stable internal conventions for common signal and output
  shapes
- a backend that external pipelines and frontends can call

`cogpy` is not:

- the owner of full project orchestration
- a replacement for project-level Snakemake DAGs, derivative registries,
  scheduling, or publication workflows
- a GUI application or frontend framework
- a claim that every schema in the package is already fully mature

The intended split is:

- `cogpy.*` (top-level subpackages) = reusable in-memory compute primitives
- `cogpy.io` = load/save, sidecars, and file-format translation
- external projects such as PixECoG = orchestration, dataset-specific configs,
  large-scale execution, and derivative layout

The repository may still contain packaged workflows or CLI wrappers for
convenience, testing, and examples. Those should remain thin composition layers
around stable library APIs, not grow into a second workflow engine inside
`cogpy`.

## Design Principles

- Keep core processing code file-agnostic and testable.
- Put file-format and sidecar logic in `cogpy.io`.
- Prefer stable data conventions over ad hoc array shapes.
- Make outputs structured enough for downstream pipelines and frontends.
- Allow lightweight composition helpers, but keep orchestration minimal.
- Be honest about evolving areas instead of freezing premature abstractions.

## Layering And Composition Philosophy

The intended layering is:

- `cogpy.io`: read files, translate to internal representations, validate or
  coerce schemas, save results, update sidecars
- `cogpy.*` (compute subpackages): transform in-memory arrays, compute measures,
  detect events, build reusable compute abstractions
- external pipelines: choose inputs, parameter sets, derivative paths,
  scheduling, caching, and execution order
- external frontends: own UI state, interaction design, visualization layout,
  and application logic

This means external project pipelines should usually be **thin orchestrators**
around stable `cogpy` APIs. A Snakemake rule or project script should mostly:

1. load via `cogpy.io`
2. call one or more `cogpy` compute functions or lightweight composition helpers
3. validate/coerce outputs where needed
4. save via `cogpy.io`

If reusable composition is needed inside `cogpy`, it should stay small and
local: detector pipelines, convenience wrappers, or helper objects are fine.
Owning project-scale DAG semantics is not.

## Core Data Model And Schema Expectations

Core functions should operate on structured in-memory objects, typically
`xarray.DataArray` and sometimes `xarray.Dataset`.

The intended direction is an xarray-centered internal model with common dims
such as:

- `time`
- `channel`
- `freq`
- `time_win` or other window-like dims where the operation is windowed
- `AP` / `ML` for grid-aware spatial layouts
- `ap` / `ml` in some frontend-oriented or orthoslicer-oriented views

Common expectations:

- dims should be named, not implied by axis position alone
- key coordinates such as `time`, `freq`, `channel`, `AP`, and `ML` should be
  present when the abstraction requires them
- sampling rate should be recoverable in a stable way, currently most often via
  `attrs["fs"]`
- metadata should travel with the array when it materially affects downstream
  interpretation

`DataArray` is usually the right representation for a single typed tensor such
as a signal, spectrogram, or feature map. `Dataset` is appropriate when a
result is naturally a named collection of aligned arrays.

The important architectural point is not that every object already has a final
frozen schema. It is that **common schemas should converge**, and boundary code
should validate or coerce inputs toward those schemas so independent pieces of
the package remain composable.

Today, `src/cogpy/datasets/schemas.py` already defines several canonical dim
orders and `validate_*` / `coerce_*` helpers. That is the right direction, but
the full schema story is still evolving. In particular, dim naming and case are
not yet perfectly uniform across all modules, so new code should favor existing
canonical validators rather than inventing new one-off conventions.

## Conceptual Layers Inside `cogpy`

The package should remain conceptually clear even when the exact package layout
evolves. The main roles are:

### Preprocess / transforms

These operations reshape or clean signals without claiming to be a final
scientific measure. Examples include filtering, rereferencing, line-noise
handling, resampling, interpolation, and normalization.

### Spectral transforms

These convert signals into representations such as PSDs or spectrograms. They
are still transforms: their main job is to produce a new representation for
later interpretation or downstream computation.

### Measures / feature maps

These compute interpretable quantities from signals or transforms. Outputs may
be scalars, spectra, channel-wise feature vectors, grid feature maps, or
windowed maps. Examples include channel features, spatial measures, and
frequency-domain summary measures.

### Detectors / event extraction

Detectors consume a signal or transformed representation and produce
event-oriented outputs such as catalogs, intervals, or peak tables.
Architecturally, this is different from a measure: a detector returns discrete
events with provenance and event-specific metadata, not only another dense
tensor.

### Plotting and frontend-facing backend utilities

`cogpy` may provide backend utilities that make structured tensors and event
outputs easier to inspect or hand to a frontend. The library should not absorb
UI state management or frontend application logic.

This distinction matters because transforms, measures, and detectors should not
all collapse into one generic "analysis" bucket. They have different contracts,
different output types, and different downstream uses.

## IO Responsibilities

`cogpy.io` owns:

- file-format translation
- metadata and sidecar handling
- construction of valid internal xarray objects from raw files
- saving structured outputs back to external formats

This is also the right place for thin convenience wrappers that combine compute
with required file bookkeeping. For example, updating sidecars after resampling
belongs in IO, not in compute subpackages.

## Structured Outputs

Core operations should return structured outputs that downstream code can rely
on. Depending on the abstraction, those outputs may be:

- transformed signals
- PSDs and spectrograms
- feature maps
- summary measures
- event tables or catalogs
- interval-like outputs

Stable structured outputs matter for two reasons:

1. external pipelines need outputs they can validate, serialize, and route
   without bespoke per-step parsing
2. frontends need predictable tensors and event-like tables for visualization,
   overlays, and linked inspection

The package should therefore prefer explicit output contracts over
loosely-typed tuples or ad hoc dicts when an output becomes a reusable boundary.

## Grid-Aware Processing

Grid-aware ECoG processing is a first-class concern in `cogpy`, not an
afterthought layered onto flat channels.

Important supported patterns include:

- AP×ML-aware filtering and spatial transforms
- neighborhood-based normalization relative to local grid structure
- grid adjacency and footprint helpers
- spatial measures over feature maps or spectral maps
- conversions between grid views and channel-stacked views when needed by
  viewers or downstream algorithms

This is why stable spatial conventions matter. A function that understands
`AP`/`ML` layout can do more than a generic channel-only function: it can use
local neighborhoods, directional structure, and electrode geometry in a
reusable way.

## Snakemake And External Pipeline Guidance

Project pipelines should remain thin orchestration layers around `cogpy`, even
when they are authored in the same repository or shipped as examples.

Good uses of Snakemake or project-level orchestration:

- dataset selection and batching
- dependency ordering and caching
- resource management
- derivative path layout
- project-specific configuration and provenance

Good uses of `cogpy` inside those pipelines:

- loading and schema coercion
- reusable compute primitives
- detector and transform composition helpers
- saving outputs and sidecars

The architectural target is not "no workflows anywhere in the repo." The target
is that workflow logic stays lightweight and replaceable, while the durable
technical value lives in stable `cogpy` APIs.

If companion conceptual docs such as `docs/reference/feature-registry` or
`docs/reference/example-snakemake-pipeline` are maintained, they should sit
above this architecture and describe:

- what feature families exist
- which `cogpy` outputs are intended to be stable boundaries
- how external orchestration should call those APIs without reimplementing them

## Preprocessing Structure

The preprocessing stack is still one of the clearest examples of the intended
architecture:

- focused core modules for filtering, resampling, interpolation, and line-noise
  handling
- grid-aware bad-channel feature extraction and neighborhood normalization
- thin wrappers or scripts that compose those pieces for pipeline use

Legacy modules may remain as compatibility shims, but new code should prefer the
more explicit modules that preserve schema clarity and keep compute separate
from orchestration.

## Frontend Boundary

[TensorScope](https://github.com/arashshahidi1997/tensorscope) is the primary
visualization frontend. It is a **standalone React + TypeScript application**
in its own repository — it is not part of cogpy.

`cogpy` acts as the **compute and data backend** for TensorScope and similar
tools:

- provide stable tensor representations
- provide transforms, measures, and detector outputs that can be visualized
- provide event-like outputs that a frontend can overlay or inspect

TensorScope itself owns:

- UI state
- interaction logic
- layout and rendering
- frontend-specific persistence and application behavior

Archived TensorScope design docs remain in `explanation/plot/_archive/` for
historical reference on backend requirements.

## Open Questions

- How far should schema normalization go across current dim-name variants such
  as `AP`/`ML`, `ap`/`ml`, and `time_win`?
- Which outputs should be normalized around `DataArray`, which around
  `Dataset`, and which around table-like types such as `EventCatalog`?
- Which structured outputs are mature enough to be treated as public contracts
  for external projects today, versus internal conventions still settling?
- How much packaged workflow support should remain in-repo as examples or
  convenience wrappers without pulling `cogpy` toward full workflow ownership?