EEG preprocessing at scale on HPC
Why this note
Our lab's EEG preprocessing used to take three weeks of wall-clock time per dataset. After a rewrite targeting our university's HPC cluster, the same pipeline now finishes in two days. This note collects the lessons — what mattered, what didn't, and where the remaining time is spent.
The original pipeline
The legacy pipeline was a single-threaded MATLAB script:
1. Per-subject ICA decomposition
2. Manual bad-channel review
3. Epoch-level artifact rejection
4. Spectral decomposition
5. Group-level statistics
Steps 1 and 3 (per-subject ICA decomposition and epoch-level artifact rejection) dominated runtime. Neither had any parallelism: not across subjects, not across epochs, not across channels.
What we changed
- Per-subject parallelism. One HPC job per subject, submitted as an sbatch array job. This alone accounted for a ~20× speedup on a 300-subject cohort.
- Checkpointing between stages. Each stage writes its output to disk, so a rerun resumes from the last completed stage. A failed artifact-rejection pass no longer throws away an hour of ICA.
- Deterministic seeds. ICA results are reproducible across runs. This mattered more than I expected — we'd been silently comparing slightly different decompositions across re-runs for months.
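A minimal sketch of how these three changes fit together, written in Python for illustration rather than the MATLAB the pipeline actually uses. The stage names, the `run_stage` callback, and the `.done` marker convention are hypothetical. It assumes the jobs are submitted as a Slurm array (for example `sbatch --array=0-299`), in which case Slurm sets `SLURM_ARRAY_TASK_ID` in each task's environment.

```python
import hashlib
import os
from pathlib import Path

# Hypothetical stage names, in pipeline order.
STAGES = ["ica", "artifact_rejection", "spectral"]


def subject_for_task(subjects, env=os.environ):
    """Pick this array task's subject using the Slurm array index."""
    task_id = int(env["SLURM_ARRAY_TASK_ID"])
    return subjects[task_id]


def seed_for(subject, stage):
    """Derive a stable 32-bit seed from the subject and stage names.

    hashlib is used instead of hash() because hash() on strings is
    randomized per interpreter process by default.
    """
    digest = hashlib.sha256(f"{subject}:{stage}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


def run_pipeline(subject, out_dir, run_stage):
    """Run each stage in order, skipping stages that already completed.

    run_stage(subject, stage, seed, stage_dir) is a placeholder for the
    real work. Returns the list of stages executed on this call.
    """
    executed = []
    for stage in STAGES:
        stage_dir = Path(out_dir) / subject / stage
        marker = stage_dir / ".done"
        if marker.exists():
            continue  # checkpoint hit: reuse the earlier result
        stage_dir.mkdir(parents=True, exist_ok=True)
        run_stage(subject, stage, seed_for(subject, stage), stage_dir)
        marker.touch()  # written only after the stage returns without error
        executed.append(stage)
    return executed
```

Because the marker is written only after a stage returns, a crash in artifact rejection leaves the ICA marker in place, and resubmitting the same array index picks up from artifact rejection with the same seeds as before.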
What didn't move the needle
- GPU ICA. We tried it. For our channel counts (64–128), CPU ICA was faster after accounting for data transfer.
- Rewriting in Python. The bottleneck wasn't MATLAB; it was the lack of parallelism. A pure translation would have changed nothing.
Remaining bottlenecks
The pipeline is now I/O-bound on our shared filesystem. If we wanted to go faster, we'd stage data onto node-local scratch. For now, two days is fine.
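A minimal sketch of what that staging could look like, in Python for illustration: copy a subject's input off the shared filesystem once, do all intermediate I/O on local disk, and copy the results back once. The `process` callback and directory layout are hypothetical, and the sketch assumes the scheduler points `TMPDIR` at node-local scratch, which varies by site.

```python
import os
import shutil
import tempfile
from pathlib import Path


def run_on_scratch(shared_in, shared_out, process, scratch_root=None):
    """Stage input to local scratch, process it there, copy results back.

    process(local_in, local_out) is a placeholder for a pipeline stage.
    scratch_root defaults to $TMPDIR, assumed here to be node-local disk.
    """
    scratch_root = scratch_root or os.environ.get("TMPDIR") or tempfile.gettempdir()
    # Private working directory, removed on exit even if process() raises.
    with tempfile.TemporaryDirectory(dir=str(scratch_root)) as work:
        local_in = Path(work) / "in"
        local_out = Path(work) / "out"
        shutil.copytree(shared_in, local_in)  # one bulk read from shared storage
        local_out.mkdir()
        process(local_in, local_out)  # intermediate I/O stays on local disk
        # One bulk write back; dirs_exist_ok requires Python 3.8+.
        shutil.copytree(local_out, shared_out, dirs_exist_ok=True)
```

The trade-off is that results reach the shared filesystem only after the stage succeeds, so local scratch needs enough free space for one subject's input and output.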
Takeaway
Before rewriting, profile. We almost did a full MATLAB→Python migration; the real fix was an sbatch array job.
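Profiling at this level does not need to be elaborate; per-stage wall-clock totals are enough to show where the time goes. A minimal sketch, in Python for illustration, with hypothetical stage names supplied by the caller:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage name.
stage_seconds = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the wall-clock time of the enclosed block to stage_seconds[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[stage] += time.perf_counter() - start


def report():
    """Return (stage, seconds, share_of_total) rows, largest first."""
    total = sum(stage_seconds.values()) or 1.0
    rows = sorted(stage_seconds.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, secs, secs / total) for name, secs in rows]
```

Wrapping each stage call in `with timed("ica"):` (or whatever the stage is called) and printing `report()` at the end of a run gives a ranked list of stages by share of total time.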