March 2026 [EEG] [HPC] [SIGNAL PROCESSING]

EEG preprocessing at scale on HPC

Why this note

Our lab's EEG preprocessing used to take three weeks of wall-clock time per dataset. After a rewrite targeting our university's HPC cluster, the same pipeline now finishes in two days. This note collects the lessons — what mattered, what didn't, and where the remaining time is spent.

The original pipeline

The legacy pipeline was a single-threaded MATLAB script:

  1. Per-subject ICA decomposition
  2. Manual bad-channel review
  3. Epoch-level artifact rejection
  4. Spectral decomposition
  5. Group-level statistics

Steps 1 and 3 dominated runtime. Neither had any parallelism — not across subjects, not across epochs, not across channels.
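The note does not say how the per-step runtimes were measured. As a minimal sketch of one way to get that kind of breakdown, the Python helper below accumulates wall-clock time per named step and reports each step's share of the total. The names `timed` and `share_of_runtime` and the step labels are hypothetical, chosen for illustration only.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(step_name, store):
    """Add the wall-clock time spent inside the `with` block to store[step_name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        store[step_name] = store.get(step_name, 0.0) + elapsed


def share_of_runtime(store):
    """Return (step, fraction of total time) pairs, slowest step first."""
    total = sum(store.values()) or 1.0  # avoid dividing by zero on an empty run
    shares = [(name, secs / total) for name, secs in store.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    timings = {}
    # Placeholder workloads; a real run would wrap the actual pipeline steps.
    with timed("ica", timings):
        time.sleep(0.05)
    with timed("spectral", timings):
        time.sleep(0.01)
    for name, fraction in share_of_runtime(timings):
        print(f"{name}: {fraction:.0%}")
```

Wrapping each of the five steps this way shows which ones are worth parallelizing before any code is rewritten.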

What we changed

What didn't move the needle

Remaining bottlenecks

The pipeline is now I/O-bound on our shared filesystem. If we wanted to go faster, we'd stage data onto node-local scratch. For now, two days is fine.
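Staging could look roughly like the sketch below: copy each input file from the shared filesystem to node-local storage once, then read the local copy. This is an illustration, not our pipeline code. The function name `stage_to_scratch` and the `eeg_stage` directory are hypothetical, and it assumes the scheduler exposes the node-local directory through `TMPDIR`, which varies by site.

```python
import os
import shutil
import tempfile
from pathlib import Path


def stage_to_scratch(src, scratch_root=None):
    """Copy `src` to node-local scratch and return the path of the local copy.

    Assumption: node-local scratch is given by `scratch_root` or by $TMPDIR;
    otherwise the system temporary directory is used as a fallback.
    """
    src = Path(src)
    root = Path(scratch_root or os.environ.get("TMPDIR") or tempfile.gettempdir())
    dest_dir = root / "eeg_stage"  # hypothetical staging directory name
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    # Skip the copy when a same-sized file is already staged on this node.
    if not dest.exists() or dest.stat().st_size != src.stat().st_size:
        shutil.copy2(src, dest)
    return dest
```

The size check is only a cheap guard against repeated copies; a checksum would be safer if input files can change in place.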

Takeaway

Before rewriting, profile. We almost did a full MATLAB→Python migration; the real fix was an sbatch array job.
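For readers who have not used array jobs: Slurm starts one task per array index and passes that index in the `SLURM_ARRAY_TASK_ID` environment variable, so each task can pick its own subject. The sketch below shows that mapping in Python. It is an illustration under assumptions, not our actual job script: `subject_for_task` and the subject IDs are hypothetical.

```python
import os


def subject_for_task(subjects, env=None):
    """Return the subject that the current Slurm array task should process.

    Slurm sets SLURM_ARRAY_TASK_ID for each task of an array job,
    e.g. 0, 1, 2 for a job submitted with `sbatch --array=0-2`.
    """
    env = os.environ if env is None else env
    raw = env.get("SLURM_ARRAY_TASK_ID")
    if raw is None:
        raise RuntimeError("SLURM_ARRAY_TASK_ID is not set; run inside an array job")
    index = int(raw)
    if not 0 <= index < len(subjects):
        raise IndexError(f"array index {index} has no matching subject")
    return subjects[index]


if __name__ == "__main__":
    # Hypothetical subject IDs; a real job would list them from the dataset.
    subjects = ["sub-01", "sub-02", "sub-03"]
    print(subject_for_task(subjects, {"SLURM_ARRAY_TASK_ID": "1"}))
```

Because subjects are independent in the per-subject steps, no communication between tasks is needed; the scheduler supplies all the parallelism.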