3.3. Statistical Mechanics: Ensembles, Ergodicity, and Other Stuff#
The physical foundation of molecular simulations is statistical mechanics. Without it, it would be impossible to connect small systems (at most \(10^3\)–\(10^7\) particles, simulated for nanoseconds to microseconds) to macroscopic thermodynamic quantities like pressure, temperature, or heat capacities that we measure in the lab. Statistical mechanics provides the bridge between:
Microscopic variables: positions and momenta of all particles in a simulation box.
Macroscopic variables: \(N\), \(V\), \(E\), \(T\), \(p\), \(\mu\), etc.
In simulations we always control a small, model system, often with periodic boundary conditions, and ask: If we sample microscopic states in the right ensemble, can we recover correct thermodynamic averages? In other words, can we use a few hundred or thousand atoms as a faithful representative of a macroscopic material containing Avogadro’s number of atoms?
3.3.1. Microstates, macrostates, and phase space#
Consider a classical system of \(N\) particles. The complete microscopic state is specified by all positions and all momenta:
so the phase space has dimension \(6N\). A single point \(\Gamma\) in this phase space is a microstate.
A macrostate is defined by a few thermodynamic constraints, e.g.
Many different microstates \(\Gamma\) are compatible with the same macrostate. For example, keeping \((N,V,E)\) fixed, we can rearrange particles and redistribute momenta in many ways while keeping the total energy unchanged. All such microstates belong to the same ensemble.
3.3.1.1. Example: gas in a box#
Imagine \(N\) argon atoms in a cubic box of volume \(V\) with perfectly reflecting walls. Specifying the total energy \(E\) and the number of atoms \(N\) defines a macrostate \((N,V,E)\). One microstate might have half the atoms on the left side and half on the right; another microstate might have a different spatial arrangement and momentum distribution, but as long as the total energy is \(E\), both microstates correspond to the same macrostate.
3.3.1.2. Ensemble#
An ensemble is a (conceptual) collection of all microstates consistent with a specified set of macroscopic constraints (e.g. fixed \(N\), \(V\), \(E\) in the microcanonical ensemble). In a simulation, we aim to sample microstates from the correct ensemble, either by deterministic dynamics (MD) or stochastic algorithms (Monte Carlo, Langevin, etc.). The ensemble perspective is powerful because it allows us to replace complicated, history-dependent dynamics by simpler probabilistic descriptions.
3.3.2. Density of states, entropy, and temperature#
For a system with fixed \(N\) and \(V\), the density of states (or number of states) at energy \(E\) is usually denoted by \(\Omega(E)\):
where the sum is over all microstates \(\Gamma\) and \(\delta\) is a Dirac delta (or a Kronecker delta for discrete spectra). In words, \(\Omega(E)\) counts how many microstates have total energy \(E\).
The microcanonical entropy is then defined as
where \(k_{\mathrm{B}}\) is Boltzmann’s constant. From this, the temperature \(T\) follows as
Intuitively:
\(\Omega(E)\) measures how many microstates are compatible with the macroscopic constraints.
\(S(E)\) measures how much uncertainty we have about the microscopic configuration if we only know \(E\), \(N\), and \(V\).
3.3.2.1. Example: ideal gas (very briefly)#
For a monatomic ideal gas, \(\Omega(E)\) can be computed analytically and grows roughly like \(E^{3N/2}\). Plugging this into \(S(E) = k_{\mathrm{B}}\ln\Omega(E)\) and differentiating with respect to \(E\) directly gives the familiar relation
which is the equipartition result for an ideal gas.
3.3.3. The microcanonical (NVE) ensemble#
The simplest ensemble, and the easiest one to simulate, is the microcanonical or NVE ensemble:
Think of an isolated system: fixed number of particles, fixed volume, and fixed energy. At thermodynamic equilibrium, all accessible microstates with the same total energy are (assumed to be) equally likely. This is the fundamental postulate of the microcanonical ensemble.
Mathematically, the microcanonical partition function (often denoted \(\Omega\)) is
and the probability of a microstate \(\Gamma\) is
An ensemble average of an observable \(A(\Gamma)\) is then
In molecular dynamics, a pure NVE simulation corresponds to integrating Newton’s equations of motion with no thermostat or barostat, so that total energy is (up to numerical error) conserved.
3.3.3.1. Example: NVE MD of a Lennard–Jones fluid#
A standard textbook example is a box of particles interacting via the Lennard–Jones potential. After an initial equilibration, one can switch to an NVE run and monitor the total energy \(E = K + U\):
The kinetic energy \(K\) and potential energy \(U\) fluctuate in time.
The sum \(E\) remains constant, showing that the dynamics stays on the constant-energy surface in phase space.
Time averages of observables (pressure, structure factor, etc.) estimate microcanonical ensemble averages.
3.3.4. The canonical (NVT) ensemble and the partition function#
Most real systems are not isolated; they are in contact with a heat bath that can exchange energy, but not particles, with the system. This motivates the canonical (NVT) ensemble:
Consider a small system (our simulation box) weakly coupled to a large heat bath. If the total energy is \(E_{\mathrm{tot}} = E_{\mathrm{sys}} + E_{\mathrm{bath}}\), then the probability that the system is in a microstate \(i\) with energy \(E_i\) is proportional to the number of bath states compatible with \(E_{\mathrm{bath}} = E_{\mathrm{tot}} - E_i\). Maximizing the total entropy of system + bath leads to the Boltzmann distribution:
where
is the canonical partition function.
The partition function \(Z\) plays several roles:
It is a normalization constant: \(\sum_i P_i = 1.\)
It encodes the density of states and the Boltzmann weights of all microstates.
From \(Z\), one can compute almost all thermodynamic properties.
For an observable \(A\) that takes the value \(A_i\) in microstate \(i\), the canonical ensemble average is
It is often more convenient to work with the Helmholtz free energy
which is an extensive quantity (scales with system size) and is directly related to \(Z\). Thermodynamic quantities follow from derivatives of \(F\), e.g.
3.3.4.1. Example: ideal gas in the canonical ensemble#
For a monatomic ideal gas of identical particles with mass \(m\), the canonical partition function factorizes into momentum and configuration parts. One finds
where \(\Lambda\) is the thermal de~Broglie wavelength. From this:
This reproduces the ideal gas law and the equipartition theorem directly from the canonical partition function.
3.3.4.2. Energy fluctuations in the canonical ensemble#
In the canonical ensemble the energy is not fixed but fluctuates around its mean value \(\langle E\rangle\). The variance of the energy can be expressed as
where \(C_V\) is the heat capacity at constant volume. The relative fluctuations scale as
so for macroscopic systems (very large \(N\)) these fluctuations are negligible, but for simulations with modest \(N\) they are visible and physically important.
3.3.4.3. NVT in practice: thermostats#
In MD, we construct dynamics whose invariant probability distribution is the canonical one, \(P(\Gamma)\propto e^{-\beta H(\Gamma)}\). Popular examples include:
Nosé–Hoover and Nosé–Hoover chains (deterministic thermostats).
Langevin dynamics (stochastic thermostat; see below).
Stochastic velocity rescaling and related schemes.
These methods modify the equations of motion so that time-averaged observables converge to canonical ensemble averages.
3.3.5. Other standard ensembles#
Real simulations often use other ensembles, depending on what is held fixed and what is allowed to fluctuate:
Isothermal–isobaric (NPT): \(N\), \(p\), \(T\) fixed; \(V\) fluctuates.
Grand canonical (\(\mu\)VT): chemical potential \(\mu\), \(V\), \(T\) fixed; \(N\) fluctuates.
Formally, these ensembles have their own partition functions (e.g. \(Z_{NPT}\), \(\Xi_{\mu VT}\)) obtained by summing over the appropriate variables. In practice, MD codes implement NPT or \(\mu\)VT via barostats and particle-insertion/removal schemes that are designed to sample the correct distribution.
3.3.5.1. Isothermal–isobaric (NPT) ensemble#
In the NPT ensemble we imagine our system in contact with:
a large heat bath at temperature \(T\), and
a mechanical reservoir that fixes the external pressure \(p\) but allows the volume \(V\) to fluctuate.
The NPT partition function is
where \(Z(N,V,T)\) is the canonical partition function at fixed \(N,V,T\). The equilibrium probability density of observing a particular volume \(V\) and microstate \(i\) (with energy \(E_i(V)\)) is then
The corresponding thermodynamic potential is the Gibbs free energy,
From \(G\) we obtain:
where \(\kappa_T\) is the isothermal compressibility. Thus, volume fluctuations in an NPT simulation are directly related to the material’s compressibility.
3.3.5.2. NPT in practice: barostats#
To realize the NPT ensemble in MD, we introduce a barostat that allows the box volume (and sometimes shape) to fluctuate. Standard algorithms include:
Berendsen barostat (simple, but does not sample the correct distribution exactly).
Andersen, Parrinello–Rahman, or Martyna–Tobias–Klein (MTK) barostats, which are more rigorous.
These methods, when combined with a thermostat, generate dynamics whose stationary distribution is the NPT distribution. In an MD input file, this typically appears as a directive like use barostat = NPT with target pressure and temperature.
3.3.5.3. Grand canonical (\(\mu\)VT) ensemble (very briefly)#
In the grand canonical ensemble the system is allowed to exchange particles with a reservoir at fixed chemical potential \(\mu\):
and
Here \(N\) fluctuates, which is useful when studying adsorption, phase coexistence, or open systems. In simulations this is usually realized via Monte Carlo particle insertions and deletions rather than pure MD.
3.3.6. Classical limit and Maxwell–Boltzmann distribution#
In the classical limit, the canonical partition function can be written as an integral over phase space:
where \(H = K + U\) is the Hamiltonian (kinetic plus potential energy). If \(H(\mathbf{q},\mathbf{p}) = K(\mathbf{p}) + U(\mathbf{q})\), then the integrals factor:
The momentum integral produces the familiar Maxwell–Boltzmann (Gaussian) distribution of momenta:
with the equipartition result
The remaining configuration integral defines the configurational partition function used in many simulation contexts.
3.3.6.1. One-particle velocity distribution#
For a single particle of mass \(m\) in three dimensions, the Maxwell–Boltzmann velocity distribution is
where \(v = |\mathbf{v}|\). This is the distribution that many MD codes use to assign initial velocities when “velocity rescaling” or “temperature initialization” is requested.
3.3.7. Time averages vs ensemble averages#
In real experiments, we typically observe a single macroscopic system over time and take time averages. In statistical mechanics, ensemble averages are defined as averages over many copies (replicas) of the system, each in a different microstate consistent with the ensemble.
3.3.7.1. Ensemble average#
For an observable \(A\) with values \(A_i\) in microstate \(i\):
3.3.7.2. Time average#
For a single system evolving in time, we can define a time average:
where \(\Gamma(t)\) is the trajectory in phase space.
In molecular dynamics, we simulate one trajectory and compute time-averaged quantities. The central assumption (to be discussed next) is that, under suitable conditions, time averages converge to ensemble averages.
3.3.7.3. Example: pressure from MD#
Suppose we are running an NVT simulation and we want the average pressure. We can compute the instantaneous pressure \(p(t)\) at each time step using the virial formula and then form a time average,
If the simulation is equilibrated and sufficiently long, \(\bar{p}\) is our estimate of the canonical ensemble average \(\langle p \rangle_{\text{NVT}}\).
3.3.8. The ergodic hypothesis#
The ergodic hypothesis states that, for an isolated system at fixed energy, a single trajectory in phase space will eventually come arbitrarily close to every accessible microstate compatible with the conserved quantities (energy, momentum, etc.). In that case,
for all “reasonable” observables \(A\).
Roughly speaking:
The system relaxes from many different initial conditions to the same equilibrium macroscopic state.
Time correlations decay, and the trajectory explores the energy surface in a complicated, effectively random way.
A long, single MD run is “as good as” sampling many independent microstates from the equilibrium distribution.
This hypothesis underpins the common practice in MD: we run one (or a few) trajectories and treat time averages as estimates of equilibrium ensemble averages.
3.3.8.1. Mixing and decorrelation#
Ergodicity is closely related to the idea of mixing: initial information about the microstate is gradually lost as the system evolves, and observables decorrelate in time. In practice we often estimate the correlation time \(\tau_c\) of an observable \(A\) and use it to determine how long we need to simulate to obtain statistically independent samples.
3.3.9. Non-ergodic behavior and glasses#
Real systems are often not perfectly ergodic, or at least not on accessible simulation timescales. Non-ergodic behavior means that the system fails to explore all of phase space compatible with its energy and constraints, either because:
There are very high barriers in the energy landscape (metastable states).
The dynamics are nearly integrable or regular.
The timescale to cross barriers is astronomically large compared to our simulation time.
A classical example is the Fermi–Pasta–Ulam (FPU) problem: a chain of particles connected by weakly anharmonic springs. Instead of energy spreading evenly among all modes (as equipartition would suggest), energy remains trapped in a few modes for a very long time, revealing non-ergodic dynamics.
Amorphous solids and structural glasses are another important example. If we cool a liquid too fast, it may fall out of equilibrium and get stuck in a disordered, metastable configuration. In such systems:
The observed structure and dynamics depend sensitively on the preparation protocol (e.g. cooling rate).
There is no unique thermodynamic “glass transition temperature” independent of protocol.
Time averages depend on the history of the sample and are not equal to equilibrium ensemble averages.
In simulations of glassy, non-ergodic systems, you can still measure useful quantities (structure factors, relaxation times, dynamic correlation functions), but you must always specify:
How the system was prepared (cooling rate, initial configuration).
Over what time window quantities were averaged.
Whether results change if you modify the protocol.
3.3.9.1. Toy example: double-well potential#
Even a single particle in a one-dimensional double-well potential can show non-ergodic behavior on finite timescales. If the barrier between wells is large compared to \(k_{\mathrm{B}}T\), a trajectory starting in the left well may never cross to the right well during the simulation time, so the time average does not reflect the true equilibrium distribution over both wells.
3.3.10. Contact with a heat bath: thermostats and Monte Carlo#
In the lab, systems are rarely perfectly isolated: they exchange energy, momentum, and sometimes particles with their environment. In simulations we mimic this contact with a “bath” by modifying the equations of motion or using stochastic algorithms. Common approaches include:
Deterministic Molecular Dynamics (MD):
Integrates Newton’s equations of motion, \(m_i \frac{\mathrm{d}^2 \mathbf{r}_i}{\mathrm{d}t^2} = \mathbf{F}_i = -\nabla_{\mathbf{r}_i} U(\mathbf{r}_1,\dots,\mathbf{r}_N),\) usually sampling (approximately) the NVE ensemble. When combined with additional dynamical variables (e.g. Nosé–Hoover thermostat, barostat degrees of freedom), MD can also sample NVT or NPT ensembles.Langevin Dynamics:
Adds friction and random forces to mimic a thermal bath. The equation of motion is $$ m_i \frac{\mathrm{d}^2 \mathbf{r}i}{\mathrm{d}t^2} = -\nabla{\mathbf{r}_i} U\gamma m_i \frac{\mathrm{d}\mathbf{r}_i}{\mathrm{d}t}
\mathbf{R}_i(t), $\( where \)\gamma\( is a friction coefficient and \)\mathbf{R}_i(t)$ is a random force satisfying fluctuation–dissipation relations. This samples the canonical (NVT) ensemble.
More precisely, the random forces are chosen to have \(\langle \mathbf{R}_i(t) \rangle = \mathbf{0}\) and \(\langle R_{i\alpha}(t) R_{j\beta}(t') \rangle = 2\gamma m_i k_{\mathrm{B}}T\,\delta_{ij}\,\delta_{\alpha\beta}\,\delta(t-t'),\) where \(\alpha,\beta\) label Cartesian components. These conditions ensure that, in equilibrium, momenta follow the Maxwell–Boltzmann distribution at temperature \(T\).
Example: diffusion from Langevin dynamics.
Consider a single Brownian particle (e.g. a colloid) in a solvent modeled via Langevin dynamics. In the limit of long times, its mean-squared displacement grows linearly with time, \(\langle [\mathbf{r}(t) - \mathbf{r}(0)]^2 \rangle \approx 6 D t,\) with diffusion constant \(D = \frac{k_{\mathrm{B}}T}{\zeta}\) and \(\zeta = \gamma m,\) consistent with the Einstein relation. By tuning \(\gamma\), we can interpolate between nearly Hamiltonian dynamics (small friction) and strongly damped Brownian motion (large friction).Langevin thermostats in MD codes.
In practice, Langevin thermostats are implemented using discrete-time integrators (e.g. BAOAB, OBABO, velocity Verlet with noise). Users typically choose a damping time \(\tau_{\mathrm{damp}} = 1/\gamma\) on the order of a fraction of a picosecond to a few picoseconds. Too small \(\tau_{\mathrm{damp}}\) (large friction) overdamps the dynamics and can slow down sampling of collective modes; too large \(\tau_{\mathrm{damp}}\) (weak coupling) leads to slow temperature control.Brownian Dynamics:
Overdamped limit where inertial terms are negligible compared to friction. The equation of motion reads \(\gamma \frac{\mathrm{d}\mathbf{r}_i}{\mathrm{d}t} = -\nabla_{\mathbf{r}_i} U + \mathbf{R}_i(t),\) and velocities are not explicitly followed. This is useful for simulations of large particles in viscous environments where momentum relaxes very quickly (e.g. coarse-grained polymer or colloid models).Monte Carlo (MC):
Random moves in configuration space accepted or rejected with a rule that ensures detailed balance, e.g. Metropolis MC. For the canonical ensemble, a trial move from microstate \(i\) to \(j\) with energy change \(\Delta E = E_j - E_i\) is accepted with probability \(P_{\text{acc}}(i\to j) = \min\bigl[1,\exp(-\beta\Delta E)\bigr].\) Repeated moves generate configurations distributed according to \(P_i \propto e^{-\beta E_i}\). MC algorithms can be designed to sample NVT, NPT, or other ensembles by including volume-change moves, particle insertions/deletions, etc.
Once any randomness is introduced (Langevin, MC, etc.), ergodicity is usually improved, and convergence towards the canonical distribution is more robust—although for complex systems, equilibration can still be extremely slow.
3.3.11. Practical issues: equilibration, windowing, and diagnostics#
In practice, to obtain meaningful averages from simulations:
Equilibration:
Start from some initial configuration (often artificial, e.g. a lattice or random packing). The early part of the trajectory is usually far from equilibrium. You should:Let the system evolve until observables (e.g. energy, pressure, structure) fluctuate around a steady mean.
Discard this initial equilibration segment from your averages.
Check stationarity:
Always plot the observable of interest \(A(t)\) versus time. Ask:Does \(A(t)\) fluctuate around a constant mean?
Or does it drift slowly (still evolving)?
How long does it take before the drift disappears?
Time-window averaging:
For non-stationary or slowly relaxing systems (e.g. glasses), be explicit about:The time window over which you averaged.
How the average changes if you use a different time window.
Whether results depend on when you start collecting data (early vs late segments).
Block averaging and error estimates:
Consecutive MD frames are correlated, so naive estimates of the standard error can be misleading. A common approach is block averaging:Split the production trajectory into \(M\) blocks of duration \(\tau_{\text{block}}\).
Compute the block average \(A_k\) in each block \(k = 1,\dots,M\).
Treat the \(A_k\) as approximately independent samples and estimate the mean and standard error from them: \(\bar{A} = \frac{1}{M}\sum_{k=1}^M A_k\) and \(\sigma_{\bar{A}} = \sqrt{\frac{1}{M(M-1)} \sum_{k=1}^M (A_k - \bar{A})^2}.\)
Choosing \(\tau_{\text{block}}\) several times larger than the correlation time of \(A\) improves the reliability of these error estimates.
Protocol dependence:
If your results depend on preparation protocol (cooling rate, compression rate, etc.), then you are not measuring equilibrium ensemble averages, but still something physically meaningful. You must report the protocol clearly.
3.3.12. Summary#
Statistical mechanics connects microscopic variables from simulations to macroscopic thermodynamic quantities.
Microstates are points in a \(6N\)-dimensional phase space; macrostates are defined by a small set of constraints (e.g. \(N,V,E\)).
The density of states \(\Omega(E)\) and the partition function \(Z\) encode the structure of phase space and allow us to compute thermodynamic properties.
Different ensembles (NVE, NVT, NPT, \(\mu\)VT) correspond to different physical constraints and are implemented in simulations via thermostats, barostats, and MC schemes.
The ergodic hypothesis justifies using time averages from MD trajectories as approximations to ensemble averages.
Non-ergodic behavior (e.g. glasses, metastable states) is common and must be treated carefully; results are often protocol-dependent.
Good simulation practice always includes checking equilibration, estimating statistical errors, and carefully defining the ensemble and preparation protocol.