4.6. HPC (High-Performance Computing)#
4.6.1. What is High-Performance Computing (HPC)?#
HPC uses “supercomputers” or “clusters” to solve advanced computing problems.
Performance is typically measured in teraflops or more (FLOPS = floating-point operations per second).
Involves multiple computers connected by a network, working together.
4.6.1.1. Components#
Compute nodes: Perform calculations.
CPU cores: Traditional processors.
GPU/Accelerators: Optimized for number crunching and machine learning.
RAM: Local memory.
Local disk space: HDD or SSD.
Network file system: Shared storage (e.g., GPFS, Lustre).
/home: Smaller, slower, backed up.
/scratch: Larger, faster, not backed up, purged regularly (used for I/O-intensive jobs).
Scheduler (e.g., SLURM): Manages job distribution.
Job scripts specify the number of nodes, cores, accelerators, memory, runtime, and resource sharing (a sketch follows below).
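For example, a minimal SLURM batch script might look like the following. This is a sketch only: the job name and resource values are placeholders, and sites differ in which directives they require. Because SLURM reads the `#SBATCH` comment lines before the first executable statement, the script itself can use any interpreter, including Python:

```python
#!/usr/bin/env python3
#SBATCH --job-name=demo          # placeholder job name
#SBATCH --nodes=1                # number of nodes
#SBATCH --ntasks=4               # number of tasks (cores)
#SBATCH --mem=8G                 # memory per node
#SBATCH --time=01:00:00          # wall-clock limit (HH:MM:SS)

# The body runs on the allocated compute node once the scheduler starts the job.
print("Hello from the compute node")
```

Submitted with `sbatch script.py`; the scheduler queues the job until the requested resources are free.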
To utilize HPC effectively, code must be parallelized.
4.6.2. Types of Parallelism#
4.6.2.1. Serial#
One worker does all tasks.
Easy to implement but doesn’t scale.
4.6.2.2. Threaded (Shared Memory)#
Multiple workers (threads) run on a single node and share its memory.
Scales only up to the core count of one node.
4.6.2.3. Distributed (Distributed Memory)#
Multiple nodes, each with its own memory.
Workers communicate over a network using MPI (Message Passing Interface).
Example: Split data, each node processes part, then results are combined.
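A concrete sketch of that split-process-combine pattern, assuming the `mpi4py` package (the array size is arbitrary), launched with something like `mpirun -n 4 python script.py`:

```python
# Split an array across ranks, sum each chunk locally, reduce on rank 0.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    data = np.arange(1_000_000, dtype="d")
    chunks = np.array_split(data, size)   # one chunk per rank
else:
    chunks = None

chunk = comm.scatter(chunks, root=0)      # distribute the work
partial = chunk.sum()                     # each rank processes its part
total = comm.reduce(partial, op=MPI.SUM, root=0)  # combine the results

if rank == 0:
    print(f"Total: {total}")
```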
4.6.2.3.1. Communication Patterns#
Point-to-point
Point-to-all (broadcast)
All-to-point (reduction)
All-to-all
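Each pattern corresponds to a point-to-point or collective call. A hedged mapping in `mpi4py` terms (run with at least two ranks; the payloads are placeholders):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Point-to-point: one rank sends directly to another.
if rank == 0:
    comm.send({"msg": "hello"}, dest=1, tag=0)
elif rank == 1:
    obj = comm.recv(source=0, tag=0)

# Point-to-all (broadcast): rank 0's value is copied to every rank.
value = comm.bcast(rank * 10 if rank == 0 else None, root=0)

# All-to-point (reduction): every rank contributes, rank 0 gets the result.
total = comm.reduce(rank, op=MPI.SUM, root=0)

# All-to-all: every rank exchanges one item with every other rank.
exchanged = comm.alltoall([rank] * comm.Get_size())
```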
4.6.2.3.2. MPI + X#
Combines distributed and threaded parallelism for large-scale computing.
Performance depends on the network hardware (the “fabric”) providing high bandwidth and low latency.
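A rough Python analogue of MPI + X, assuming `mpi4py`: distributed ranks (MPI) each run a local thread pool (the “X”). Sizes and thread counts are placeholders; Python threads help here only because NumPy releases the GIL during large array operations, whereas a production code would pair MPI with OpenMP or similar:

```python
from concurrent.futures import ThreadPoolExecutor
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local = np.random.random(1_000_000)       # this rank's share of the data

def partial_sum(chunk):
    return np.square(chunk).sum()         # placeholder computation

with ThreadPoolExecutor(max_workers=4) as pool:    # threaded within the node
    partials = list(pool.map(partial_sum, np.array_split(local, 4)))

total = comm.allreduce(sum(partials), op=MPI.SUM)  # distributed across nodes
if rank == 0:
    print(f"Global sum of squares: {total}")
```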
4.6.3. Parallelization in Molecular Dynamics (MD)#
Force calculation is the most expensive part of each timestep.
4.6.3.1. Threaded#
Use multiple cores to evaluate pairwise forces.
Works, but scalability is limited to the cores of a single node (a sketch follows below).
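A minimal sketch of this idea using NumPy plus a thread pool. The Lennard-Jones parameters and particle count are illustrative assumptions; a real MD code would use cutoffs and neighbor lists rather than all pairs:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 10.0, size=(512, 3))

def forces_on_chunk(idx):
    """Total force on each particle in idx from all others (O(N^2) pairs)."""
    out = np.zeros((len(idx), 3))
    for k, i in enumerate(idx):
        r = positions[i] - positions        # displacement vectors toward i
        d2 = (r ** 2).sum(axis=1)           # squared distances
        d2[i] = np.inf                      # exclude self-interaction
        coef = 24.0 * (2.0 / d2 ** 7 - 1.0 / d2 ** 4)  # LJ, eps = sigma = 1
        out[k] = (coef[:, None] * r).sum(axis=0)
    return out

chunks = np.array_split(np.arange(len(positions)), 4)  # one chunk per thread
with ThreadPoolExecutor(max_workers=4) as pool:
    forces = np.concatenate(list(pool.map(forces_on_chunk, chunks)))
```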
4.6.3.2. Distributed#
Domain decomposition: Split simulation box among processors.
Each processor “owns” particles in its domain.
Ghost particles: Copies shared across domain boundaries so that cross-boundary interactions can be computed.
Particles migrate between domains as they move.
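A conceptual 1-D sketch of domain decomposition (no MPI; the box size, cutoff, and particle count are made-up values):

```python
import numpy as np

box_length = 10.0          # total box size (illustrative)
n_domains = 4
cutoff = 1.0               # interaction range sets the ghost-region width
edges = np.linspace(0.0, box_length, n_domains + 1)

positions = np.random.uniform(0.0, box_length, size=1000)

for d in range(n_domains):
    lo, hi = edges[d], edges[d + 1]
    owned = positions[(positions >= lo) & (positions < hi)]
    # Ghost particles: owned by neighboring domains but within `cutoff`
    # of this domain's boundaries, so copies are needed for force sums.
    ghosts = positions[((positions >= lo - cutoff) & (positions < lo)) |
                       ((positions >= hi) & (positions < hi + cutoff))]
    print(f"domain {d}: {owned.size} owned, {ghosts.size} ghosts")
```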
4.6.4. Parallel Performance#
Not all code or problems parallelize well.
Amdahl’s Law: Speedup is limited by the non-parallel portion:

\[ S(N) = \frac{1}{(1 - P) + \frac{P}{N}} \]

Where:
\( P \) = fraction of code that can be parallelized
\( N \) = number of processors
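A quick numerical check of the formula (the parallel fraction and processor counts are chosen for illustration):

```python
def speedup(P, N):
    """Amdahl's-law speedup with parallel fraction P on N processors."""
    return 1.0 / ((1.0 - P) + P / N)

for N in (1, 8, 64, 1024):
    print(N, round(speedup(0.95, N), 2))
```

Even with 1024 processors, a 95%-parallel code stays below the 20x limit set by its serial fraction.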
4.6.4.1. Strong Scaling#
Measures speedup for a fixed problem size as resources increase.
How fast can I go?
4.6.4.2. Weak Scaling#
Measures how runtime changes when both problem size and resources increase proportionally.
How big can I go?
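Both are often summarized as efficiencies; a minimal sketch (the runtimes below are invented for illustration):

```python
def strong_efficiency(t1, tN, N):
    """Fixed problem size: ideal is tN = t1 / N."""
    return t1 / (N * tN)

def weak_efficiency(t1, tN):
    """Problem size grows with N: ideal is tN = t1."""
    return t1 / tN

print(strong_efficiency(100.0, 15.0, 8))   # ~0.83
print(weak_efficiency(100.0, 110.0))       # ~0.91
```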
4.6.5. Additional Resources#