HPC (High-Performance Computing)

4.6. HPC (High-Performance Computing)#

4.6.1. What is High-Performance Computing (HPC)?#

HPC uses “supercomputers” or “clusters” to solve advanced computing problems.
Typically measured in teraflops or more (FLOPS = floating point operations per second).
Involves multiple computers connected via a network working together.

4.6.1.1. Components#

Compute nodes: Perform calculations.
CPU cores: Traditional processors.
GPU/Accelerators: Optimized for number crunching and machine learning.
RAM: Local memory.
Local disk space: HDD or SSD.
Network file system: Shared storage (e.g., APFS, Lustre).
- /home: Smaller, slower, backed-up.
- /scratch: Larger, faster, not backed-up, purged regularly (used for I/O intensive jobs).
Scheduler (e.g., SLURM): Manages job distribution.
- Specifies number of nodes, cores, accelerators, memory, runtime, and resource sharing.

To utilize HPC effectively, code must be parallelized.

4.6.2. Types of Parallelism#

4.6.2.1. Serial#

One worker does all tasks.
Easy to implement but doesn’t scale.

4.6.2.2. Threaded (Shared Memory)#

Multiple workers on one node/accelerator.
All share memory.
Examples:
- OpenMP: #pragma omp
- CUDA: GPU parallel programming
Easy to use, but not all code parallelizes well.

4.6.2.3. Distributed (Distributed Memory)#

Multiple nodes, each with its own memory.
Workers communicate over a network using MPI (Message Passing Interface).
Example: Split data, each node processes part, then results are combined.

4.6.2.3.1. Communication Patterns#

Point-to-point
Point-to-all (broadcast)
All-to-point (reduction)
All-to-all

4.6.2.3.2. MPI + X#

Combines distributed and threaded parallelism for large-scale computing.

Performance depends on network hardware (fabric) for high bandwidth and low latency.

4.6.3. Parallelization in Molecular Dynamics (MD)#

Force calculation is the most expensive part.

4.6.3.1. Threaded#

Use multiple cores to evaluate pairwise forces.
Works, but limited scalability.

4.6.3.2. Distributed#

Domain decomposition: Split simulation box among processors.
Each processor “owns” particles in its domain.
Ghost particles: Shared across boundaries for interactions.
Particles migrate between domains as they move.

4.6.4. Parallel Performance#

Not all code or problems parallelize well.
Amdahl’s Law: Speedup is limited by the non-parallel portion.

\[ S = \frac{1}{(1 - P) + \frac{P}{N}} \]

Where:

( P ) = fraction of code that can be parallelized
( N ) = number of processors

4.6.4.1. Strong Scaling#

Measures speedup for a fixed problem size as resources increase.

\[ \text{Efficiency} = \frac{N \times \text{base time}}{\text{actual time}} \quad (\text{Ideal} = 100\%) \]

How fast can I go?

4.6.4.2. Weak Scaling#

Measures how runtime changes when both problem size and resources increase proportionally.

How big can I go?

4.6.5. Additional Resorces#

test

HPC (High-Performance Computing)

Contents

4.6. HPC (High-Performance Computing)#

4.6.1. What is High-Performance Computing (HPC)?#

4.6.1.1. Components#

4.6.2. Types of Parallelism#

4.6.2.1. Serial#

4.6.2.2. Threaded (Shared Memory)#

4.6.2.3. Distributed (Distributed Memory)#

4.6.2.3.1. Communication Patterns#

4.6.2.3.2. MPI + X#

4.6.3. Parallelization in Molecular Dynamics (MD)#

4.6.3.1. Threaded#

4.6.3.2. Distributed#

4.6.4. Parallel Performance#

4.6.4.1. Strong Scaling#

4.6.4.2. Weak Scaling#

4.6.5. Additional Resorces#