Cross-Recording Factor Matching
This tutorial demonstrates how to match spatio-spectral factors across multiple recordings (sessions or animals) and compute a consensus (centroid) decomposition. It is a direct follow-up to Factor Analysis of Spectrograms with erpPCA, which covers single-recording erpPCA.
Setup
Python 3.10+ required. Install cogpy with the extras this notebook needs (interactive plots + spectral backend + Jupyter kernel):
pip install "ecogpy[viz,signal,notebook]"
Or install everything at once: pip install "ecogpy[all]".
For a development install see Installation.
Prerequisites
Read Factor Analysis of Spectrograms with erpPCA first. This tutorial assumes familiarity
with SpatSpecDecomposition, erpPCA, loadings, and scores.
The problem
Varimax-rotated PCA decomposes each recording’s spectrogram into spatio-spectral factors independently. The factors are meaningful within a single recording, but across recordings:
Factor ordering differs — variance ranking changes with noise.
Factor polarity and shape vary — slightly different spatial patterns emerge from different data segments.
To compare factors across recordings (e.g. “is this spindle pattern consistent across sessions?”), we need to match corresponding factors and quantify their similarity.
The approach
The matching pipeline (Garcia-Cortadella et al. 2024,
cogpy.decomposition.match.match_factors()) works in five steps:
Algorithm overview
Pairwise similarity — for every pair of recordings, compute a
(nfac x nfac)similarity matrix using spatial correlation of loading maps at peak frequency, gated by a frequency threshold.Hungarian assignment — find the optimal one-to-one factor mapping for each pair that maximises total similarity (
scipy.optimize.linear_sum_assignment).Reference selection — pick the recording whose transitive matches are best overall (the “medoid”), or use a fixed reference.
Transitive alignment — re-index all recordings’ factors to match the reference’s ordering.
Centroid — average the aligned, normalised loadings to get a consensus decomposition.
import warnings; warnings.filterwarnings("ignore") # suppress xarray FutureWarnings
import numpy as np
import xarray as xr
import pandas as pd
import holoviews as hv
hv.extension("bokeh")
1. Generate synthetic multi-session data
We create 4 synthetic recordings that share the same underlying spatio-spectral factors but with different noise realisations and factor orderings — mimicking what you’d get from independent PCA fits on different sessions of the same animal.
Why synthetic data?
The bundled sample recording is a single 7.5 s segment. To demonstrate matching we need multiple recordings with shared structure. We extract “ground truth” loadings from the real data, then generate synthetic spectrograms by mixing these loadings with random scores + noise. Each independent PCA fit recovers the factors in a different order — exactly the scenario matching solves.
from cogpy.datasets import load_sample
from cogpy.spectral.specx import spectrogramx
from cogpy.decomposition.spatspec import SpatSpecDecomposition
from cogpy.decomposition.pca import erpPCA
# Fit erpPCA on the real sample to get "ground truth" loadings
sig = load_sample()
spec = spectrogramx(sig, bandwidth=4.0, nperseg=64, noverlap=56)
spec = spec.compute() if hasattr(spec.data, "compute") else spec
spec = spec.sel(freq=slice(1, 50)).isel(
AP=slice(None, None, 2), ML=slice(None, None, 2)
)
spec = spec.coarsen(freq=2, boundary="trim").mean()
mtx = spec.rename({"AP": "h", "ML": "w"})
ss_real = SpatSpecDecomposition(mtx)
X_real = ss_real.designmat(mtx, log=True)
Xc_real = X_real - X_real.mean("time")
Xc_real.data = np.nan_to_num(Xc_real.data)
nfac = 6
erp = erpPCA(nfac=nfac, verbose=False)
erp.fit(Xc_real.data)
# Ground truth loadings: (nfac, n_vars)
L_true = erp.LR.T # (nfac, hwf)
print(f"Ground truth: {nfac} factors, {L_true.shape[1]} variables")
computing covariance matrix ...
eigendecomposition ...
Varimax ...
Ground truth: 6 factors, 768 variables
Now generate 4 synthetic spectrograms by mixing these loadings with random scores, adding noise, and fitting erpPCA independently on each:
rng = np.random.default_rng(42)
nrec = 4
n_time = 200 # time windows per synthetic recording
synthetic_series = []
for irec in range(nrec):
# Random scores (each factor has different variance)
variances = rng.exponential(1.0, size=nfac)
scores = rng.normal(size=(n_time, nfac)) * np.sqrt(variances)
# Reconstruct spectrogram + add noise
X_synth = scores @ L_true
noise_level = 0.3 * np.std(X_synth)
X_synth += rng.normal(scale=noise_level, size=X_synth.shape)
# Fit erpPCA independently — factor order will differ
erp_synth = erpPCA(nfac=nfac, verbose=False)
erp_synth.fit(X_synth)
# Build SpatSpecDecomposition
ss = SpatSpecDecomposition(ss_real.spatspec_coords)
ss.ldx_set(erp_synth.LR)
ss.ldx_process()
synthetic_series.append(ss)
ss_series = pd.Series(synthetic_series)
print(f"Created {nrec} synthetic recordings, each with {nfac} factors")
computing covariance matrix ...
eigendecomposition ...
Varimax ...
computing covariance matrix ...
eigendecomposition ...
Varimax ...
computing covariance matrix ...
eigendecomposition ...
Varimax ...
computing covariance matrix ...
eigendecomposition ...
Varimax ...
Created 4 synthetic recordings, each with 6 factors
2. Inspect unmatched factors
Before matching, factors from different recordings are in arbitrary order. Compare the spatial maps at peak frequency for the first two recordings — notice that similar patterns appear at different factor indices:
from cogpy.plot.hv.decomposition import loading_spatial_layout
layout_0 = loading_spatial_layout(
ss_series.iloc[0].ldx_slc_maxfreq,
ss_series.iloc[0].ldx_df,
).opts(title="Recording 0 (unmatched)")
layout_1 = loading_spatial_layout(
ss_series.iloc[1].ldx_slc_maxfreq,
ss_series.iloc[1].ldx_df,
).opts(title="Recording 1 (unmatched)")
layout_0 + layout_1
What to look for
You should see similar spatial patterns (blobs in similar grid locations) but at different factor indices across the two recordings. For example, Recording 0’s Factor 0 might look like Recording 1’s Factor 3. This is the ordering ambiguity that matching resolves.
3. Pairwise similarity matrix
The first step in matching is computing the similarity between every
factor pair across two recordings. The metric is the Pearson
correlation between spatial loading maps at peak frequency. Factor
pairs whose peak frequencies differ by more than freq_threshold Hz
are assigned zero similarity:
from cogpy.decomposition.pca import compute_similarity_matrix
simil_01 = compute_similarity_matrix(
ss_series.iloc[0], ss_series.iloc[1], freq_threshold=3
)
hv.HeatMap(
[(f"Rec0 F{i}", f"Rec1 F{j}", simil_01[i, j])
for i in range(nfac) for j in range(nfac)],
kdims=["Recording 0", "Recording 1"],
vdims=["Similarity"],
).opts(
cmap="RdBu_r", colorbar=True, width=400, height=350,
title="Pairwise factor similarity (Rec 0 vs Rec 1)",
xrotation=45, symmetric=True,
)
Reading the heatmap
Each cell (i, j) shows the spatial correlation between Factor i
of Recording 0 and Factor j of Recording 1. Values close to 1
indicate a strong match. Zero values mean the peak frequencies
differed by more than freq_threshold — those pairs are excluded
from consideration.
4. Hungarian assignment (bipartite matching)
The Hungarian algorithm (scipy.optimize.linear_sum_assignment)
finds the one-to-one mapping that maximises total similarity. It
solves the assignment problem optimally in O(n^3):
from scipy.optimize import linear_sum_assignment
row_ind, col_ind = linear_sum_assignment(-simil_01) # negate for maximisation
print("Optimal factor mapping (Rec 0 -> Rec 1):")
for r, c in zip(row_ind, col_ind):
print(f" Factor {r} -> Factor {c} (similarity: {simil_01[r, c]:.3f})")
print(f"\nMean optimal similarity: {simil_01[row_ind, col_ind].mean():.3f}")
Optimal factor mapping (Rec 0 -> Rec 1):
Factor 0 -> Factor 0 (similarity: 0.997)
Factor 1 -> Factor 1 (similarity: 0.993)
Factor 2 -> Factor 3 (similarity: 0.979)
Factor 3 -> Factor 5 (similarity: 0.977)
Factor 4 -> Factor 2 (similarity: 0.944)
Factor 5 -> Factor 4 (similarity: 0.000)
Mean optimal similarity: 0.815
Why Hungarian and not greedy?
A greedy approach (pick the best pair, remove it, repeat) can get stuck in suboptimal solutions. The Hungarian algorithm guarantees the globally optimal one-to-one assignment — the permutation that maximises the sum of matched similarities.
5. Multi-recording matching with match_factors
For more than 2 recordings, pairwise Hungarian can’t be applied
directly (it’s a bipartite algorithm — it only handles two sets).
match_factors uses a hub-and-spoke strategy:
from cogpy.decomposition.match import match_factors
results = match_factors(
ss_series,
nrec=nrec,
nfac=nfac,
freq_threshold=3,
simil_score_threshold=0.5,
optimal_ref=True,
)
print(f"Optimal reference recording: {int(results['metadata_csv']['opt_ref'])}")
print(f"\nMatch table (factor index per recording):")
print(results["match_df_csv"])
Optimal reference recording: 0
Match table (factor index per recording):
0 1 2 3 simil
matched_factor
0 1 1 1 0 0.989496
1 0 0 0 1 0.988475
2 3 5 4 3 0.964772
3 4 2 3 2 0.944194
Understanding the match table
Each row is a matched factor group. The numbered columns
(0, 1, 2, 3) show which factor index in each recording corresponds
to that matched factor. The simil column is the minimum
pairwise similarity across all recording pairs for that group — a
conservative quality measure.
Factors with similarity below simil_score_threshold are dropped.
Optimal reference selection
When optimal_ref=True, the algorithm tries every recording as
the reference, runs the full matching, and picks the one whose
transitive matches are best overall. This is like finding the
“medoid” of the group — the most typical recording. Set
optimal_ref=False to always use recording 0 as the reference
(faster, deterministic, but may give worse matches if recording 0
is atypical).
6. Visualise matched loadings
After matching, factor indices are aligned across recordings. Each column below shows the same matched factor across all 4 recordings — they should look spatially similar:
ss_all = results["ldx_all_pkl"]
n_matched = min(4, len(results["match_df_csv"]))
panels = []
for ifac in range(n_matched):
for irec in range(nrec):
slc = ss_all.ldx_slc_maxfreq.sel(rec=irec, factor=ifac)
panels.append(
hv.Image(
slc, kdims=["w", "h"],
).opts(
cmap="RdBu_r", colorbar=False, width=160, height=140,
symmetric=True, invert_yaxis=True,
title=f"Rec {irec}, F{ifac}",
)
)
hv.Layout(panels).cols(nrec).opts(
title="Matched factors: each column = same factor across recordings"
)
7. Centroid (consensus) loadings
The centroid is the mean of normalised, aligned loadings — a consensus factor that captures the shared spatial-spectral pattern across all recordings:
ss_cent = results["ldx_pkl"]
loading_spatial_layout(ss_cent.ldx_slc_maxfreq, ss_cent.ldx_df)
Centroid spectral profiles
The frequency profile at the peak electrode for each consensus factor — the red dashed line marks the peak frequency:
from cogpy.plot.hv.decomposition import loading_spectral_profiles
loading_spectral_profiles(ss_cent.ldx_slc_maxch, ss_cent.ldx_df)
8. Quality assessment
Not all factors match well across recordings. The simil column
quantifies the worst-case pairwise similarity for each matched
factor. Low scores indicate patterns that are session-specific or
noise-driven:
match_df = results["match_df_csv"]
hv.Bars(
[(f"F{i}", s) for i, s in zip(match_df.index, match_df["simil"])],
kdims=["Matched factor"], vdims=["Min. similarity"],
).opts(
width=400, height=250, color="steelblue",
title="Match quality per factor (min pairwise similarity)",
ylim=(0, 1),
)
Interpreting match quality
> 0.9: strong match — the factor is highly reproducible.
0.7 – 0.9: moderate — likely a real pattern with some session-to-session variability.
< 0.7: weak — consider whether this factor is meaningful or noise-driven. Increase
simil_score_thresholdto filter these out automatically.
Summary
Step |
Function |
Module |
|---|---|---|
Fit per-recording |
|
|
Pairwise similarity |
|
|
Full matching pipeline |
|
|
HV spatial layout |
|
|
HV spectral profiles |
|
|
Key parameters
Parameter |
Default |
Purpose |
|---|---|---|
|
3 Hz |
Max peak-frequency difference to consider a match |
|
0.7 |
Min similarity to retain a matched factor |
|
|
Auto-select best reference recording |
Cross-animal matching
When comparing factors across animals, the matching can be done in two stages:
Within-animal: match factors across sessions of the same animal using
match_factorsto produce a per-animal centroid.Across-animal: collect the per-animal centroids into a new Series and run
match_factorsagain to produce a grand centroid.
# Stage 1: within-animal centroids
animal_centroids = {}
for animal_id, session_series in sessions_by_animal.items():
result = match_factors(session_series, nrec=len(session_series),
nfac=nfac, freq_threshold=3,
simil_score_threshold=0.7)
animal_centroids[animal_id] = result["ldx_pkl"]
# Stage 2: across-animal matching
cross_animal_series = pd.Series(animal_centroids)
grand_result = match_factors(cross_animal_series,
nrec=len(cross_animal_series),
nfac=nfac, freq_threshold=5,
simil_score_threshold=0.5)
grand_centroid = grand_result["ldx_pkl"]
Spatial correspondence across animals
The current similarity metric (spatial correlation) assumes pixel-to-pixel correspondence — i.e. the same grid size and placement across animals. When grid placements differ, spatial correlation may underestimate true similarity.
Garcia-Cortadella et al. (2024) use an alternative feature-space
Euclidean distance for cross-animal matching: a 7-dimensional
summary (AP/ML sign, grid position, peak frequency, spatial extent)
that does not require aligned grids. A method="feature_space"
option for match_factors is planned for a future release.
Next steps
Factor Analysis of Spectrograms with erpPCA — single-recording factor analysis (prerequisite)
Spectral Analysis — spectral analysis stack (PSD, spectrograms)
cogpy.decomposition.match— API reference for matching functions
References
Garcia-Cortadella, R. et al. (2024). DC-coupled, 2.5D electrophysiological imaging of large-scale cortical dynamics. bioRxiv. DOI: 10.1101/2024.12.20.629545