March 2026 | EEG, HPC, Signal Processing

EEG preprocessing at scale on HPC

Why this note

Our lab's EEG preprocessing used to take three weeks of wall-clock time per dataset. After a rewrite targeting our university's HPC cluster, the same pipeline now finishes in two days. This note collects the lessons — what mattered, what didn't, and where the remaining time is spent.

The original pipeline

The legacy pipeline was a single-threaded MATLAB script:

1. Per-subject ICA decomposition
2. Manual bad-channel review
3. Epoch-level artifact rejection
4. Spectral decomposition
5. Group-level statistics

Steps 1 and 3 dominated runtime. Neither had any parallelism — not across subjects, not across epochs, not across channels.

What we changed

Per-subject parallelism. One HPC job per subject. This alone accounted for a ~20× speedup on a 300-subject cohort.
Checkpointing between stages. Each stage writes to disk. A failed artifact-rejection pass no longer throws away an hour of ICA.
Deterministic seeds. ICA results are reproducible across runs. This mattered more than I expected — we'd been silently comparing slightly different decompositions across re-runs for months.
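
To make the three changes above concrete, here is a minimal Python sketch of how they can fit together; it is not our actual code, which is MATLAB. It assumes the job is submitted as a Slurm array (for example with `sbatch --array=0-299`), so each task can read its index from `SLURM_ARRAY_TASK_ID`. The helper names (`subject_for_task`, `seed_for`, `run_stage`), the JSON checkpoint format, and the directory layout are all hypothetical and chosen only for illustration.

```python
import hashlib
import json
import os
from pathlib import Path


def subject_for_task(subjects, env=os.environ):
    """Pick the subject for this array task from SLURM_ARRAY_TASK_ID."""
    task_id = int(env.get("SLURM_ARRAY_TASK_ID", "0"))
    return subjects[task_id]


def seed_for(subject, stage):
    """Derive a stable 32-bit seed from the subject and stage names.

    The same (subject, stage) pair always yields the same seed, so a
    re-run of a stochastic stage such as ICA starts from the same state.
    """
    digest = hashlib.sha256(f"{subject}:{stage}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


def run_stage(out_dir, subject, stage, compute):
    """Run compute(seed) unless a checkpoint for this stage already exists."""
    checkpoint = Path(out_dir) / subject / f"{stage}.json"
    if checkpoint.exists():
        # An earlier run finished this stage: reuse its result.
        return json.loads(checkpoint.read_text())

    result = compute(seed_for(subject, stage))

    checkpoint.parent.mkdir(parents=True, exist_ok=True)
    tmp = checkpoint.with_suffix(".tmp")
    tmp.write_text(json.dumps(result))
    # Write to a temporary name, then rename, so a job killed mid-write
    # does not leave a truncated file under the final checkpoint name.
    tmp.replace(checkpoint)
    return result
```

Each array task would call `subject_for_task` once and then chain `run_stage` calls, one per stage. If artifact rejection fails, resubmitting the same task index picks up the stored ICA result instead of recomputing it.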

What didn't move the needle

GPU ICA. We tried it. For our channel counts (64–128), CPU ICA was faster after accounting for data transfer.
Rewriting in Python. The bottleneck wasn't MATLAB; it was the lack of parallelism. A pure translation would have changed nothing.

Remaining bottlenecks

The pipeline is now I/O-bound on our shared filesystem. If we wanted to go faster, we'd stage data onto node-local scratch. For now, two days is fine.
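
We have not implemented staging yet, so the following is only a sketch of what it might look like, written in Python for brevity. It assumes the scheduler exposes node-local scratch through the `TMPDIR` environment variable, which varies by cluster. The function name `process_on_scratch` and the `process(local_in, local_out)` callback signature are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def process_on_scratch(src, dest_dir, process, scratch_root=None):
    """Copy src to node-local scratch, process it there, copy results back.

    Returns the sorted names of the result files copied into dest_dir.
    """
    # Assumption: TMPDIR points at node-local storage on this cluster.
    scratch_root = scratch_root or os.environ.get("TMPDIR") or tempfile.gettempdir()
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    with tempfile.TemporaryDirectory(dir=scratch_root) as work:
        work = Path(work)
        local_in = work / Path(src).name
        shutil.copy2(src, local_in)      # one sequential read from shared storage

        local_out = work / "out"
        local_out.mkdir()
        process(local_in, local_out)     # intermediate I/O stays on the node

        for item in local_out.iterdir():
            if item.is_file():
                shutil.copy2(item, dest_dir / item.name)  # results back to shared storage
                copied.append(item.name)
    # The scratch working directory is removed when the with-block exits.
    return sorted(copied)
```

The point of the design is that the shared filesystem sees one large read and a handful of writes per subject, rather than every small intermediate access a stage makes.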

Takeaway

Before rewriting, profile. We almost did a full MATLAB→Python migration; the real fix was an sbatch array job.
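
Profiling at this level does not need special tooling. As a rough illustration, a per-stage wall-clock timer like the Python sketch below is enough to show which stages dominate. The names `timed`, `timings`, and `report` are hypothetical, and the stage labels in the usage note are placeholders rather than measurements from our pipeline.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage name.
timings = {}


@contextmanager
def timed(stage):
    """Add the wall-clock time spent inside the with-block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start


def report(stage_seconds):
    """Return (stage, seconds, fraction of total) rows, slowest stage first."""
    total = sum(stage_seconds.values())
    rows = sorted(stage_seconds.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, secs, secs / total if total else 0.0) for name, secs in rows]
```

Wrapping each stage call in `with timed("ica"):` or `with timed("artifact_rejection"):` and printing `report(timings)` at the end gives a ranked breakdown of where the time goes.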