Contents

1 From Microscopic to Macroscopic Behavior
1.1 Introduction
1.2 Some qualitative observations
1.3 Doing work
1.4 Quality of energy
1.5 Some simple simulations
1.6 Work, heating, and the first law of thermodynamics
1.7 Measuring the pressure and temperature
1.8 *The fundamental need for a statistical approach
1.9 *Time and ensemble averages
1.10 *Models of matter
1.10.1 The ideal gas
1.10.2 Interparticle potentials
1.10.3 Lattice models
1.11 Importance of simulations
1.12 Summary
Additional Problems
Suggestions for Further Reading

2 Thermodynamic Concepts
2.1 Introduction
2.2 The system
2.3 Thermodynamic Equilibrium
2.4 Temperature
2.5 Pressure Equation of State
2.6 Some Thermodynamic Processes
2.7 Work
2.8 The First Law of Thermodynamics
2.9 Energy Equation of State
2.10 Heat Capacities and Enthalpy
2.11 Adiabatic Processes
2.12 The Second Law of Thermodynamics
2.13 The Thermodynamic Temperature
2.14 The Second Law and Heat Engines
2.15 Entropy Changes
2.16 Equivalence of Thermodynamic and Ideal Gas Scale Temperatures
2.17 The Thermodynamic Pressure
2.18 The Fundamental Thermodynamic Relation
2.19 The Entropy of an Ideal Gas
2.20 The Third Law of Thermodynamics
2.21 Free Energies
Appendix 2B: Mathematics of Thermodynamics
Additional Problems
Suggestions for Further Reading

3 Concepts of Probability
3.1 Probability in everyday life
3.2 The rules of probability
3.3 Mean values
3.4 The meaning of probability
3.4.1 Information and uncertainty
3.4.2 *Bayesian inference
3.5 Bernoulli processes and the binomial distribution
3.6 Continuous probability distributions
3.7 The Gaussian distribution as a limit of the binomial distribution
3.8 The central limit theorem or why is thermodynamics possible?
3.9 The Poisson distribution and should you fly in airplanes?
3.10 *Traffic flow and the exponential distribution
3.11 *Are all probability distributions Gaussian?
Additional Problems
Suggestions for Further Reading

4 Statistical Mechanics
4.1 Introduction
4.2 A simple example of a thermal interaction
4.3 Counting microstates
4.3.1 Noninteracting spins
4.3.2 *One-dimensional Ising model
4.3.3 A particle in a one-dimensional box
4.3.4 One-dimensional harmonic oscillator
4.3.5 One particle in a two-dimensional box
4.3.6 One particle in a three-dimensional box
4.3.7 Two noninteracting identical particles and the semiclassical limit
4.4 The number of states of N noninteracting particles: Semiclassical limit
4.5 The microcanonical ensemble (fixed E, V, and N)
4.6 Systems in contact with a heat bath: The canonical ensemble (fixed T, V, and N)
4.7 Connection between statistical mechanics and thermodynamics
4.8 Simple applications of the canonical ensemble
4.9 A simple thermometer
4.10 Simulations of the microcanonical ensemble
4.11 Simulations of the canonical ensemble
4.12 Grand canonical ensemble (fixed T, V, and µ)
4.13 Entropy and disorder
Appendix 4A: The Volume of a Hypersphere
Appendix 4B: Fluctuations in the Canonical Ensemble
Additional Problems
Suggestions for Further Reading

5 Magnetic Systems
5.1 Paramagnetism
5.2 Thermodynamics of magnetism
5.3 The Ising model
5.4 The Ising Chain
5.4.1 Exact enumeration
5.4.2 Spin-spin correlation function
5.4.3 Simulations of the Ising chain
5.4.4 *Transfer matrix
5.4.5 Absence of a phase transition in one dimension
5.5 *The Two-Dimensional Ising Model
5.5.1 Onsager solution
5.5.2 Computer simulation of the two-dimensional Ising model
5.6 Mean-Field Theory
5.7 *Infinite-range interactions
Additional Problems
Suggestions for Further Reading

6 Noninteracting Particle Systems
6.1 Introduction
6.2 The Classical Ideal Gas
6.3 Classical Systems and the Equipartition Theorem
6.4 Maxwell Velocity Distribution
6.5 Occupation Numbers and Bose and Fermi Statistics
6.6 Distribution Functions of Ideal Bose and Fermi Gases
6.7 Single Particle Density of States
6.7.1 Photons
6.7.2 Electrons
6.8 The Equation of State for a Noninteracting Classical Gas
6.9 Black Body Radiation
6.10 Noninteracting Fermi Gas
6.10.1 Ground-state properties
6.10.2 Low temperature thermodynamic properties
6.11 Bose Condensation
6.12 The Heat Capacity of a Crystalline Solid
6.12.1 The Einstein model
6.12.2 Debye theory
Appendix 6A: Low Temperature Expansion
Suggestions for Further Reading

7 Thermodynamic Relations and Processes
7.1 Introduction
7.2 Maxwell Relations
7.3 Applications of the Maxwell Relations
7.3.1 Internal energy of an ideal gas
7.3.2 Relation between the specific heats
7.4 Applications to Irreversible Processes
7.4.1 The Joule or free expansion process
7.4.2 Joule-Thomson process
7.5 Equilibrium Between Phases
7.5.1 Equilibrium conditions
7.5.2 Clausius-Clapeyron equation
7.5.3 Simple phase diagrams
7.5.4 Pressure dependence of the melting point
7.5.5 Pressure dependence of the boiling point
7.5.6 The vapor pressure curve
7.6 Vocabulary
Additional Problems
Suggestions for Further Reading

8 Classical Gases and Liquids
8.1 Introduction
8.2 The Free Energy of an Interacting System
8.3 Second Virial Coefficient
8.4 Cumulant Expansion
8.5 High Temperature Expansion
8.6 Density Expansion
8.7 Radial Distribution Function
8.7.1 Relation of thermodynamic functions to g(r)
8.7.2 Relation of g(r) to static structure function S(k)
8.7.3 Variable number of particles
8.7.4 Density expansion of g(r)
8.8 Computer Simulation of Liquids
8.9 Perturbation Theory of Liquids
8.9.1 The van der Waals Equation
8.9.2 Chandler-Weeks-Andersen theory
8.10 *The Ornstein-Zernike Equation
8.11 *Integral Equations for g(r)
8.12 *Coulomb Interactions
8.12.1 Debye-Hückel Theory
8.12.2 Linearized Debye-Hückel approximation
8.12.3 Diagrammatic Expansion for Charged Particles
8.13 Vocabulary
Appendix 8A: The third virial coefficient for hard spheres
8.14 Additional Problems

9 Critical Phenomena
9.1 A Geometrical Phase Transition
9.2 Renormalization Group for Percolation
9.3 The Liquid-Gas Transition
9.4 Bethe Approximation
9.5 Landau Theory of Phase Transitions
9.6 Other Models of Magnetism
9.7 Universality and Scaling Relations
9.8 The Renormalization Group and the 1D Ising Model
9.9 The Renormalization Group and the Two-Dimensional Ising Model
9.10 Vocabulary
9.11 Additional Problems
Suggestions for Further Reading

10 Introduction to Many-Body Perturbation Theory
10.1 Introduction
10.2 Occupation Number Representation
10.3 Operators in the Second Quantization Formalism
10.4 Weakly Interacting Bose Gas

A Useful Formulae
A.1 Physical constants
A.2 SI derived units
A.3 Conversion factors
A.4 Mathematical Formulae
A.5 Approximations
A.6 Euler-Maclaurin formula
A.7 Gaussian Integrals
A.8 Stirling's formula
A.9 Constants
A.10 Probability distributions
A.11 Fermi integrals
A.12 Bose integrals

Chapter 1 From Microscopic to Macroscopic Behavior

© 2006 by Harvey Gould and Jan Tobochnik
28 August 2006

The goal of this introductory chapter is to explore the fundamental diﬀerences between microscopic and macroscopic systems and the connections between classical mechanics and statistical
mechanics. We note that bouncing balls come to rest and hot objects cool, and discuss how the
behavior of macroscopic objects is related to the behavior of their microscopic constituents. Computer simulations will be introduced to demonstrate the relation of microscopic and macroscopic
behavior.

1.1 Introduction

Our goal is to understand the properties of macroscopic systems, that is, systems of many electrons, atoms, molecules, photons, or other constituents. Examples of familiar macroscopic objects
include systems such as the air in your room, a glass of water, a copper coin, and a rubber band
(examples of a gas, liquid, solid, and polymer, respectively). Less familiar macroscopic systems
are superconductors, cell membranes, the brain, and the galaxies.
We will find that the types of questions we ask about macroscopic systems differ in important
ways from the questions we ask about microscopic systems. An example of a question about a
microscopic system is “What is the shape of the trajectory of the Earth in the solar system?”
In contrast, have you ever wondered about the trajectory of a particular molecule in the air of
your room? Why not? Is it relevant that these molecules are not visible to the eye? Examples of
questions that we might ask about macroscopic systems include the following:
1. How does the pressure of a gas depend on the temperature and the volume of its container?
2. How does a refrigerator work? What is its maximum eﬃciency?
3. How much energy do we need to add to a kettle of water to change it to steam?
4. Why are the properties of water diﬀerent from those of steam, even though water and steam
consist of the same type of molecules?
5. How are the molecules arranged in a liquid?
6. How and why does water freeze into a particular crystalline structure?
7. Why does iron lose its magnetism above a certain temperature?
8. Why does helium condense into a superﬂuid phase at very low temperatures? Why do some
materials exhibit zero resistance to electrical current at suﬃciently low temperatures?
9. How fast does a river current have to be before its ﬂow changes from laminar to turbulent?
10. What will the weather be tomorrow?
The above questions can be roughly classified into three groups. Questions 1–3 concern
macroscopic properties such as pressure, volume, and temperature, and processes involving
heating and work. These questions are relevant to thermodynamics, which provides a framework
for relating the macroscopic properties of a system to one another. Thermodynamics is concerned
only with macroscopic quantities and ignores the microscopic variables that characterize individual
molecules. For example, we will ﬁnd that understanding the maximum eﬃciency of a refrigerator
does not require a knowledge of the particular liquid used as the coolant. Many of the applications
of thermodynamics are to thermal engines, for example, the internal combustion engine and the
steam turbine.
Questions 4–8 relate to understanding the behavior of macroscopic systems starting from the
atomic nature of matter. For example, we know that water consists of molecules made of hydrogen
and oxygen atoms. We also know that the laws of classical and quantum mechanics determine the
behavior of molecules at the microscopic level. The goal of statistical mechanics is to begin with
the microscopic laws of physics that govern the behavior of the constituents of the system and
deduce the properties of the system as a whole. Statistical mechanics is the bridge between the
microscopic and macroscopic worlds.
Thermodynamics and statistical mechanics assume that the macroscopic properties of the
system do not change with time on the average. Thermodynamics describes the change of a
macroscopic system from one equilibrium state to another. Questions 9 and 10 concern macroscopic phenomena that change with time. Related areas are nonequilibrium thermodynamics and
ﬂuid mechanics from the macroscopic point of view and nonequilibrium statistical mechanics from
the microscopic point of view. Although there has been progress in our understanding of nonequilibrium phenomena such as turbulent flow and hurricanes, this understanding
is much less advanced than our understanding of equilibrium systems. Because understanding the properties of macroscopic systems that are independent of time is easier, we will
focus our attention on equilibrium systems and consider questions such as Questions 1–8.

1.2 Some qualitative observations

We begin our discussion of macroscopic systems by considering a glass of water. We know that if
we place a glass of hot water into a cool room, the hot water cools until its temperature equals
that of the room. This simple observation illustrates two important properties associated with
macroscopic systems – the importance of temperature and the arrow of time. Temperature is
familiar because it is associated with the physiological sensation of hot and cold and is important
in our everyday experience. We will ﬁnd that temperature is a subtle concept.
The direction or arrow of time is an even more subtle concept. Have you ever observed a glass
of water at room temperature spontaneously become hotter? Why not? What other phenomena
exhibit a direction of time? Time has a direction, as expressed by the nursery rhyme:
Humpty Dumpty sat on a wall
Humpty Dumpty had a great fall
All the king’s horses and all the king’s men
Couldn’t put Humpty Dumpty back together again.
Is there a direction of time for a single particle? Newton's second law for a single particle,
F = dp/dt, implies that the motion of particles is time-reversal invariant, that is, Newton's second
law looks the same if the time t is replaced by −t and the momentum p by −p. There is no
direction of time at the microscopic level. Yet if we drop a basketball onto a ﬂoor, we know that it
will bounce and eventually come to rest. Nobody has observed a ball at rest spontaneously begin
to bounce, and then bounce higher and higher. So based on simple everyday observations, we can
conclude that the behavior of macroscopic bodies and single particles is very diﬀerent.
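The time-reversal invariance of Newton's second law can be checked numerically. The following Python sketch is only an illustration under stated assumptions, not a program from the text: it assumes a single particle on a spring (F = −kx, with k = m = 1 chosen arbitrarily) and integrates the motion with the velocity Verlet algorithm, which is itself time reversible. The particle is run forward, its velocity is reversed, and it is run again for the same time. It should retrace its path back to the starting point.

```python
def accel(x, k=1.0, m=1.0):
    """Acceleration for an assumed spring force F = -k x (depends on position only)."""
    return -k * x / m


def verlet(x, v, dt, n_steps):
    """Advance position x and velocity v by n_steps of the velocity Verlet algorithm."""
    a = accel(x)
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt
        a_new = accel(x)
        v += 0.5 * (a + a_new) * dt
        a = a_new
    return x, v


x0, v0 = 1.0, 0.0                                # arbitrary initial condition
x1, v1 = verlet(x0, v0, dt=0.01, n_steps=1000)   # run forward in time
x2, v2 = verlet(x1, -v1, dt=0.01, n_steps=1000)  # reverse the velocity and run again

# The particle returns to x0 with velocity -v0, up to floating-point rounding.
print(abs(x2 - x0), abs(v2 + v0))
```

Nothing in the single-particle dynamics distinguishes the forward run from the reversed one, which is the sense in which there is no direction of time at the microscopic level.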
Unlike people living a century or so ago, we know that macroscopic systems such as a
glass of water and a basketball consist of many molecules. Although the intermolecular forces in
water produce a complicated trajectory for each molecule, the observable properties of water are
easy to describe. Moreover, if we prepare two glasses of water under similar conditions, we would
ﬁnd that the observable properties of the water in each glass are indistinguishable, even though
the motion of the individual particles in the two glasses would be very diﬀerent.
Because the macroscopic behavior of water must be related in some way to the trajectories of its
constituent molecules, we conclude that there must be a relation between the notion of temperature
and mechanics. For this reason, as we discuss the behavior of the macroscopic properties of a glass
of water and a basketball, it will be useful to discuss the relation of these properties to the motion
of their constituent molecules.
For example, if we take into account that the bouncing ball and the ﬂoor consist of molecules,
then we know that the total energy of the ball and the ﬂoor is conserved as the ball bounces
and eventually comes to rest. What is the cause of the ball eventually coming to rest? You
might be tempted to say the cause is “friction,” but friction is just a name for an eﬀective or
phenomenological force. At the microscopic level we know that the fundamental forces associated
with mass, charge, and the nucleus conserve the total energy. So if we take into account the
molecules of the ball and the ﬂoor, their total energy is conserved. Conservation of energy does
not explain why the inverse process does not occur, because such a process also would conserve
the total energy. So a more fundamental explanation is that the ball comes to rest consistent with
conservation of the total energy and consistent with some other principle of physics. We will learn that this principle is associated with an increase in the entropy of the system. For now, entropy is
only a name, and it is important only to understand that energy conservation is not suﬃcient to
understand the behavior of macroscopic systems. (As with most concepts in physics, the meaning
of entropy in the context of thermodynamics and statistical mechanics is very different from the
way entropy is used by nonscientists.)
For now, the nature of entropy is vague, because we do not have an entropy meter as we do
for energy and temperature. What is important at this stage is to understand why the concept of
energy is not suﬃcient to describe the behavior of macroscopic systems.
By thinking about the constituent molecules, we can gain some insight into the nature of
entropy. Let us consider the ball bouncing on the ﬂoor again. Initially, the energy of the ball
is associated with the motion of its center of mass, that is, the energy is associated with one
degree of freedom. However, after some time, the energy becomes distributed over the many degrees
of freedom of the individual molecules of the ball and the floor. If we were to bounce
the ball on the ﬂoor many times, the ball and the ﬂoor would each feel warm to our hands. So we
can hypothesize that energy has been transferred from one degree of freedom to many degrees of
freedom at the same time that the total energy has been conserved. Hence, we conclude that the
entropy is a measure of how the energy is distributed over the degrees of freedom.
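The spreading of energy from one degree of freedom to many can be illustrated with a toy model. The Python sketch below is a hypothetical illustration and not one of the simulations discussed in the text: it assumes that the energy comes in identical quanta, that all of it starts in a single degree of freedom, and that at each step one quantum hops from a randomly chosen degree of freedom to another. The function name `spread_energy` and all of its parameter values are assumptions made for this example.

```python
import random


def spread_energy(n_dof=20, quanta=200, n_steps=200_000, seed=0):
    """Toy model: move one energy quantum at a time between random degrees of freedom."""
    rng = random.Random(seed)
    energy = [0] * n_dof
    energy[0] = quanta  # all of the energy starts in a single degree of freedom
    for _ in range(n_steps):
        donor = rng.randrange(n_dof)
        receiver = rng.randrange(n_dof)
        if energy[donor] > 0:  # an empty degree of freedom has nothing to give
            energy[donor] -= 1
            energy[receiver] += 1
    return energy


energy = spread_energy()
print(sum(energy))  # the total energy is conserved: 200
print(energy[0])    # energy remaining in the original degree of freedom
print(energy)       # how the energy is now shared among all degrees of freedom
```

Each individual hop is as likely as its reverse, yet the energy does not reassemble in the original degree of freedom on any reasonable time scale, because there are vastly more ways to share the energy than to concentrate it.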
What other quantities are associated with macroscopic systems besides temperature, energy,
and entropy? We are already familiar with some of these quantities. For example, we can measure
the air pressure in a basketball and its volume. More complicated quantities are the thermal
conductivity of a solid and the viscosity of oil. How are these macroscopic quantities related to
each other and to the motion of the individual constituent molecules? The answers to questions
such as these and the meaning of temperature and entropy will take us through many chapters.

1.3 Doing work

We already have observed that hot objects cool, and cool objects do not spontaneously become
hot; bouncing balls come to rest, and a stationary ball does not spontaneously begin to bounce.
And although the total energy must be conserved in any process, the distribution of energy changes
in an irreversible manner. We also have concluded that a new concept, the entropy, needs to be
introduced to explain the direction of change of the distribution of energy.
Now let us take a purely macroscopic viewpoint and discuss how we can arrive at a similar
qualitative conclusion about the asymmetry of nature. This viewpoint was especially important
historically because of the lack of a microscopic theory of matter in the 19th century when the
laws of thermodynamics were being developed.
Consider the conversion of stored energy into heating a house or a glass of water. The stored
energy could be in the form of wood, coal, or animal and vegetable oils, for example. We know that
this conversion is easy to do using simple methods, for example, an open ﬁreplace. We also know
that if we rub our hands together, they will become warmer. In fact, there is no theoretical limit
to the efficiency at which we can convert stored energy to energy used for heating an object. (Of course, the efficiency cannot exceed 100%.)
What about the process of converting stored energy into work? Work, like many of the other
concepts that we have mentioned, is difficult to define. For now let us say that doing work is
equivalent to the raising of a weight (see Problem 1.18). To be useful, we need to do this conversion
in a controlled manner and indeﬁnitely. A single conversion of stored energy into work such as the
explosion of a bomb might do useful work, such as demolishing an unwanted football stadium, but
a bomb is not a useful device that can be recycled and used again. It is much more difficult to
convert stored energy into work than into heating, and the discovery of ways to do this conversion led to the industrial
revolution. In contrast to the primitiveness of the open hearth, we have to build an engine to do
this conversion.
Can we convert stored energy into work with 100% eﬃciency? On the basis of macroscopic
arguments alone, we cannot answer this question and have to appeal to observations. We know
that some forms of stored energy are more useful than others. For example, why do we bother to
burn coal and oil in power plants even though the atmosphere and the oceans are vast reservoirs
of energy? Can we mitigate global warming by extracting energy from the atmosphere to run a
power plant? From the work of Kelvin, Clausius, Carnot and others, we know that we cannot
convert stored energy into work with 100% eﬃciency, and we must necessarily “waste” some of
the energy. At this point, it is easier to understand the reason for this necessary ineﬃciency by
microscopic arguments. For example, the energy in the gasoline of the fuel tank of an automobile
is associated with many molecules. The job of the automobile engine is to transform this energy
so that it is associated with only a few degrees of freedom, that is, the rolling tires and gears. It
is plausible that it is ineﬃcient to transfer energy from many degrees of freedom to only a few.
In contrast, transferring energy from a few degrees of freedom (the ﬁrewood) to many degrees of
freedom (the air in your room) is relatively easy.
The importance of entropy, the direction of time, and the ineﬃciency of converting stored
energy into work are summarized in the various statements of the second law of thermodynamics.
It is interesting that historically, the second law of thermodynamics was conceived before the ﬁrst
law. As we will learn in Chapter 2, the first law is a statement of conservation of energy.
1.4 Quality of energy
Because the total energy is conserved (if all energy transfers are taken into account), why do we
speak of an “energy shortage”? The reason is that energy comes in many forms and some forms are
more useful than others. In the context of thermodynamics, the usefulness of energy is determined
by its ability to do work.
Suppose that we take some ﬁrewood and use it to “heat” a sealed room. Because of energy
conservation, the energy in the room plus the ﬁrewood is the same before and after the ﬁrewood
has been converted to ash. But which form of the energy is more capable of doing work? You
probably realize that the ﬁrewood is a more useful form of energy than the “hot air” that exists
after the ﬁrewood is burned. Originally the energy was stored in the form of chemical (potential)
energy. Afterward the energy is mostly associated with the motion of the molecules in the air.
What has changed is not the total energy, but its ability to do work. We will learn that an increase
in entropy is associated with a loss of ability to do work. We have an entropy problem, not an
energy shortage.
1.5 Some simple simulations
So far we have discussed the behavior of macroscopic systems by appealing to everyday experience
and simple observations. We now discuss some simple ways that we can simulate the behavior of
macroscopic systems, which consist of the order of $10^{23}$ particles. Although we cannot simulate
such a large system on a computer, we will ﬁnd that even relatively small systems of the order of
a hundred particles are suﬃcient to illustrate the qualitative behavior of macroscopic systems.
Consider a macroscopic system consisting of particles whose internal structure can be ignored.
In particular, imagine a system of N particles in a closed container of volume V and suppose that
the container is far from the inﬂuence of external forces such as gravity. We will usually consider
two-dimensional systems so that we can easily visualize the motion of the particles.
For simplicity, we assume that the motion of the particles is given by classical mechanics,
that is, by Newton’s second law. If the resultant equations of motion are combined with initial
conditions for the positions and velocities of each particle, we can calculate, in principle, the
trajectory of each particle and the evolution of the system. To compute the total force on each
particle we have to specify the nature of the interaction between the particles. We will assume
that the force between any pair of particles depends only on the distance between them. This
simplifying assumption is applicable to simple liquids such as liquid argon, but not to water. We
will also assume that the particles are not charged. The force between any two particles must be
repulsive when their separation is small and weakly attractive when they are reasonably far apart.
For simplicity, we will usually assume that the interaction is given by the Lennard-Jones potential,
whose form is given by
$$u(r) = 4\epsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]. \tag{1.1}$$
A plot of the Lennard-Jones potential is shown in Figure 1.1. The $r^{-12}$ form of the repulsive part
of the interaction is chosen for convenience only and has no fundamental significance. However,
the attractive $1/r^6$ behavior at large r is the van der Waals interaction. The force between any
two particles is given by f(r) = −du/dr.
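As an illustration, the potential (1.1) and the force that follows from f(r) = −du/dr can be written as two short functions. This is only a sketch: the function names and the default values ε = σ = 1 are choices made here for convenience, not part of the text. Differentiating (1.1) gives f(r) = (24ε/r)[2(σ/r)^12 − (σ/r)^6], which vanishes at r = 2^(1/6) σ, where the potential takes its minimum value −ε.

```python
def lj_potential(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones potential u(r) = 4*eps*[(sigma/r)**12 - (sigma/r)**6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Radial force f(r) = -du/dr; positive means repulsive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


if __name__ == "__main__":
    r_min = 2.0 ** (1.0 / 6.0)  # separation at which the force vanishes
    for r in (0.95, 1.0, r_min, 1.5, 2.5):
        print(f"r = {r:.3f}  u = {lj_potential(r):+.4f}  f = {lj_force(r):+.4f}")
```

The printed values show the strong repulsion for r < σ and the weak attraction at larger separations.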
Usually we want to simulate a gas or liquid in the bulk. In such systems the fraction of
particles near the walls of the container is negligibly small. However, the number of particles that
can be studied in a simulation is typically $10^3$–$10^6$. For these relatively small systems, the fraction
of particles near the walls of the container would be significant, and hence the behavior of such
a system would be dominated by surface effects. The most common way of minimizing surface
effects and simulating more closely the properties of a bulk system is to use what are known as
toroidal boundary conditions. These boundary conditions are familiar to computer game players.
For example, a particle that exits the right edge of the “box” reenters the box from the left side.
In one dimension, this boundary condition is equivalent to taking a piece of wire and making it
into a loop. In this way a particle moving on the wire never reaches the end.
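The wraparound rule can be sketched in a few lines. The names `wrap` and `step` and the choice of a box that occupies the interval from 0 to `box_length` along each direction are assumptions made for this illustration. The sketch moves a free particle only; in a simulation of interacting particles the separations between particles would also have to be computed consistently with the wraparound.

```python
def wrap(x, box_length):
    """Map a coordinate back into the interval [0, box_length)."""
    return x % box_length


def step(position, velocity, dt, box_length):
    """Advance a free particle by one time step, wrapping each coordinate."""
    return tuple(wrap(x + v * dt, box_length) for x, v in zip(position, velocity))


if __name__ == "__main__":
    # A particle near the right edge moving right reappears near the left edge.
    print(step((9.5, 0.25), (1.0, -0.5), dt=1.0, box_length=10.0))
```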
Figure 1.1: Plot of the Lennard-Jones potential u(r), where r is the distance between the particles.
Note that the potential is characterized by a length σ and an energy ε.
Given the form of the interparticle potential, we can determine the total force on each particle
due to all the other particles in the system. Given this force, we find the acceleration of each
particle from Newton’s second law of motion. Because the acceleration is the second derivative
of the position, we need to solve a second-order differential equation for each particle (for each
direction). (For a two-dimensional system of N particles, we would have to solve 2N differential
equations.) These differential equations are coupled because the acceleration of a given particle
depends on the positions of all the other particles. Obviously, we cannot solve the resultant
set of coupled differential equations analytically. However, we can use relatively straightforward
numerical methods to solve these equations to a good approximation. This way of simulating dense
gases, liquids, solids, and biomolecules is called molecular dynamics.²
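To make the idea of a numerical solution concrete, here is a minimal sketch of one commonly used method, the velocity Verlet algorithm, applied to a few Lennard-Jones particles in two dimensions. The text does not say which numerical method its programs use, so the choice of algorithm is an assumption, as are the function names, the units (ε = σ = m = 1), and the omission of toroidal boundary conditions. A useful check on any such method is that the total energy stays nearly constant.

```python
def forces(pos, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones force on each particle (2D, open boundaries) and the potential energy."""
    n = len(pos)
    f = [[0.0, 0.0] for _ in range(n)]
    pe = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            r2 = dx * dx + dy * dy
            sr6 = (sigma * sigma / r2) ** 3
            pe += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # f(r)/r, so that multiplying by (dx, dy) gives the force vector on particle i
            f_over_r = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            f[i][0] += f_over_r * dx
            f[i][1] += f_over_r * dy
            f[j][0] -= f_over_r * dx  # Newton's third law
            f[j][1] -= f_over_r * dy
    return f, pe


def velocity_verlet(pos, vel, dt, n_steps, mass=1.0):
    """Integrate Newton's equations in place; return the total energy after each step."""
    f, pe = forces(pos)
    energies = []
    for _ in range(n_steps):
        for i in range(len(pos)):
            for k in range(2):
                vel[i][k] += 0.5 * dt * f[i][k] / mass  # half step in velocity
                pos[i][k] += dt * vel[i][k]             # full step in position
        f, pe = forces(pos)                             # forces at the new positions
        ke = 0.0
        for i in range(len(pos)):
            for k in range(2):
                vel[i][k] += 0.5 * dt * f[i][k] / mass  # second half step in velocity
                ke += 0.5 * mass * vel[i][k] ** 2
        energies.append(ke + pe)
    return energies


if __name__ == "__main__":
    pos = [[0.0, 0.0], [1.2, 0.0], [0.6, 1.1]]
    vel = [[0.0, 0.0] for _ in pos]
    energies = velocity_verlet(pos, vel, dt=0.002, n_steps=1000)
    print("spread in total energy:", max(energies) - min(energies))
```

The double loop over pairs shows explicitly why the equations are coupled: the force on each particle depends on the positions of all the others.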
Approach to equilibrium. In the following we will explore some of the qualitative properties
of macroscopic systems by doing some simple simulations. Before you actually do the simulations,
think about what you believe the results will be. In many cases, the most valuable part of the simulation is not the simulation itself, but the act of thinking about a concrete model and its behavior.
The simulations can be run as applications on your computer by downloading the Launcher from
<stp.clarku.edu/simulations/choose.html> . The Launcher conveniently packages all the simulations (and a few more) discussed in these notes into a single ﬁle. Alternatively, you can run
each simulation as an applet using a browser.
Problem 1.1. Approach to equilibrium
Suppose that a box is divided into three equal parts and N particles are placed at random in
the middle third of the box.³ The velocity of each particle is assigned at random and then the
velocity of the center of mass is set to zero. At t = 0, we remove the “barriers” between the
three parts and watch the particles move according to Newton’s equations of motion. We say
that the removal of the barrier corresponds to the removal of an internal constraint. What do
you think will happen? The applet/application at <stp.clarku.edu/simulations/approach.html>
implements this simulation. Give your answers to the following questions before you do the
simulation.
² The nature of molecular dynamics is discussed in Chapter 8 of Gould, Tobochnik, and Christian.
³ We have divided the box into three parts so that the effects of the toroidal boundary conditions will not be as
apparent as if we had initially confined the particles to one half of the box. The particles are placed at random in
the middle third of the box with the constraint that no two particles can be closer than the length σ. This constraint
prevents the initial force between any two particles from being too big, which would lead to the breakdown of the
numerical method used to solve the differential equations. The initial density ρ = N/A is ρ = 0.2.
(a) Start the simulation with N = 27, n1 = 0, n2 = N , and n3 = 0. What is the qualitative
behavior of n1 , n2 , and n3 , the number of particles in each third of the box, as a function of
the time t? Does the system appear to show a direction of time? Choose various values of N
that are multiples of three up to N = 270. Is the direction of time better deﬁned for larger N ?
(b) Suppose that we made a video of the motion of the particles considered in Problem 1.1a. Would
you be able to tell if the video were played forward or backward for the various values of N ?
Would you be willing to make an even bet about the direction of time? Does your conclusion
about the direction of time become more certain as N increases?
(c) After n1 , n2 , and n3 become approximately equal for N = 270, reverse the time and continue
the simulation. Reversing the time is equivalent to letting t → −t and changing the signs of
all the velocities. Do the particles return to the middle third of the box? Do the simulation
again, but let the particles move for a longer time before the time is reversed. What happens
now?
(d) From watching the motion of the particles, describe the nature of the boundary conditions
that are used in the simulation.
The results of the simulations in Problem 1.1 might not seem very surprising until you start
to think about them. Why does the system as a whole exhibit a direction of time when the motion
of each particle is time reversible? Do the particles ﬁll up the available space simply because the
system becomes less dense?
To gain some more insight into these questions, we consider a simpler simulation. Imagine
a closed box that is divided into two parts of equal volume. The left half initially contains N
identical particles and the right half is empty. We then make a small hole in the partition between
the two halves. What happens? Instead of simulating this system by solving Newton’s equations
for each particle, we adopt a simpler approach based on a probabilistic model. We assume that the
particles do not interact with one another so that the probability per unit time that a particle goes
through the hole in the partition is the same for all particles regardless of the number of particles
in either half. We also assume that the size of the hole is such that only one particle can pass
through it in one unit of time.
Figure 1.2: Evolution of the number of particles in each third of the box for N = 270. The particles
were initially restricted to the middle third of the box. Toroidal boundary conditions are used in
both directions. The initial velocities were assigned at random from a distribution corresponding
to temperature T = 5. The time was reversed at t ≈ 59. Does the system exhibit a direction of
time?
One way to implement this model is to choose a particle at random and move it to the other
side. This procedure is cumbersome, because our only interest is the number of particles on each
side. That is, we need to know only n, the number of particles on the left side; the number on
the right side is N − n. Because each particle has the same chance to go through the hole in the
partition, the probability per unit time that a particle moves from left to right equals the number
of particles on the left side divided by the total number of particles; that is, the probability of a
move from left to right is n/N. The algorithm for simulating the evolution of the model is given
by the following steps:
1. Generate a random number r from a uniformly distributed set of random numbers in the
unit interval 0 ≤ r < 1.
2. If r ≤ n/N, move a particle from left to right, that is, let n → n − 1; otherwise, move a
particle from right to left, n → n + 1.
3. Increase the “time” by 1.
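The three steps above can be sketched as follows. The function name `simulate_box`, the `seed` argument, and the choice to start with all particles on the left are assumptions of this sketch, which is not the code of the applet. One small deviation from step 2 is deliberate: the comparison is written as r < n/N rather than r ≤ n/N, so that a particle can never be removed from the left side when n = 0, even if the random number generator happens to return exactly 0.

```python
import random


def simulate_box(n_particles, n_steps, seed=None):
    """Return the list n(t) of the number of particles on the left side."""
    rng = random.Random(seed)
    n = n_particles                  # all particles start on the left
    history = [n]
    for _ in range(n_steps):         # each pass is one unit of "time"
        r = rng.random()             # uniform in [0, 1)
        if r < n / n_particles:
            n -= 1                   # a particle moves from left to right
        else:
            n += 1                   # a particle moves from right to left
        history.append(n)
    return history


if __name__ == "__main__":
    history = simulate_box(n_particles=100, n_steps=2000, seed=1)
    print(history[::200])
```

Plotting `history` against the step number for several values of N gives the kind of curve that Problem 1.2 asks about.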
Problem 1.2. Particles in a box
(a) The applet at <stp.clarku.edu/simulations/box.html> implements this algorithm and
plots the evolution of n. Describe the behavior of n(t) for various values of N . Does the
system approach equilibrium? How would you characterize equilibrium? In what sense is
equilibrium better deﬁned as N becomes larger? Does your deﬁnition of equilibrium depend
on how the particles were initially distributed between the two halves of the box?
(b) When the system is in equilibrium, does the number of particles on the left-hand side remain
a constant? If not, how would you describe the nature of equilibrium?
(c) If N ≳ 32, does the system ever return to its initial state?
(d) How does $\overline{n}$, the mean number of particles on the left-hand side, depend on N after the system
has reached equilibrium? For simplicity, the program computes various averages from time
t = 0. Why would such a calculation not yield the correct equilibrium average values? What
is the purpose of the Zero averages button?
(e) Define the quantity σ by the relation $\sigma^2 = \overline{(n - \overline{n})^2}$. What does σ measure? What would be
its value if n were constant? How does σ depend on N? How does the ratio $\sigma/\overline{n}$ depend on
N? In what sense is equilibrium better defined as N increases?
From Problems 1.1 and 1.2 we see that after a system reaches equilibrium, the macroscopic
quantities of interest become independent of time on the average, but exhibit ﬂuctuations about
their average values. We also learned that the relative fluctuations about the average become
smaller as the number of constituents is increased and that the details of the dynamics are irrelevant
to the general tendency of macroscopic systems to approach equilibrium.
How can we understand why the systems considered in Problems 1.1 and 1.2 exhibit a direction
of time? There are two general approaches that we can take. One way would be to study the
dynamics of the system. A much simpler way is to change the question and take advantage of
the fact that the equilibrium state of a macroscopic system is independent of time on the average
and hence time is irrelevant in equilibrium. For the simple system considered in Problem 1.2 we
will see that counting the number of ways that the particles can be distributed between the two
halves of the box will give us much insight into the nature of equilibrium. This information tells
us nothing about the approach of the system to equilibrium, but it will give us insight into why
there is a direction of time.
Let us call each distinct arrangement of the particles between the two halves of the box a
conﬁguration. A given particle can be in either the left half or the right half of the box. Because
the halves are equivalent, a given particle is equally likely to be in either half if the system is in
equilibrium. For N = 2, the four possible conﬁgurations are shown in Table 1.1. Note that each
conﬁguration has a probability of 1/4 if the system is in equilibrium.
configuration    n    W(n)
L  L             2    1
L  R             1    2
R  L             1
R  R             0    1
Table 1.1: The four possible ways in which N = 2 particles can be distributed between the
two halves of a box. The quantity W(n) is the number of configurations corresponding to the
macroscopic state characterized by n.
Now let us consider N = 4 for which there are 2 × 2 × 2 × 2 = $2^4$ = 16 configurations (see
Table 1.2). From a macroscopic point of view, we do not care which particle is in which half of the
box, but only the number of particles on the left. Hence, the macroscopic state or macrostate is
specified by n. Let us assume as before that all configurations are equally probable in equilibrium.
We see from Table 1.2 that there is only one configuration with all particles on the left and the
most probable macrostate is n = 2.
For larger N, the probability of the most probable macrostate with n = N/2 is much greater
than that of the macrostate with n = N, which has a probability of only $1/2^N$ corresponding to a single
configuration. The latter configuration is “special” and is said to be nonrandom, while the configurations with n ≈ N/2, for which the distribution of the particles is approximately uniform,
are said to be “random.” So we can see that the equilibrium macrostate corresponds to the most
probable state.
configuration    n    W(n)    P(n)
L L L L          4    1       1/16
R L L L          3    4       4/16
L R L L          3
L L R L          3
L L L R          3
R R L L          2    6       6/16
R L R L          2
R L L R          2
L R R L          2
L R L R          2
L L R R          2
R R R L          1    4       4/16
R R L R          1
R L R R          1
L R R R          1
R R R R          0    1       1/16
Table 1.2: The sixteen possible ways in which N = 4 particles can be distributed between the
two halves of a box. The quantity W(n) is the number of configurations corresponding to the
macroscopic state characterized by n. The probability P(n) of the macrostate n is calculated
assuming that each configuration is equally likely.
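The counts in Tables 1.1 and 1.2 can be checked by listing every configuration explicitly. The sketch below does this by brute force; the function name `count_macrostates` is a choice made here. Because the number of configurations doubles with every added particle, this approach is practical only for small N.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def count_macrostates(n_particles):
    """Enumerate all 2**N configurations and count how many have n particles on the left."""
    counts = Counter()
    for config in product("LR", repeat=n_particles):
        counts[config.count("L")] += 1
    return counts


if __name__ == "__main__":
    n_particles = 4
    counts = count_macrostates(n_particles)
    total = sum(counts.values())
    for n in sorted(counts, reverse=True):
        # Fraction prints in lowest terms, so 4/16 appears as 1/4
        print(n, counts[n], Fraction(counts[n], total))
```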
Problem 1.3. Enumeration of possible conﬁgurations
(a) Calculate the number of possible conﬁgurations for each macrostate n for N = 8 particles.
What is the probability that n = 8? What is the probability that n = 4? It is possible
to count the number of conﬁgurations for each n by hand if you have enough patience, but
because there are a total of $2^8$ = 256 configurations, this counting would be very tedious. An
alternative is to derive an expression for the number of ways that n particles out of N can
be in the left half of the box. One way to motivate such an expression is to enumerate the
possible conﬁgurations for smaller values of N and see if you can observe a pattern.
(b) From part (a) we see that the macrostate with n = N/2 is much more probable than the
macrostate with n = N . Why?
We observed that if an isolated macroscopic system changes in time due to the removal of an
internal constraint, it tends to evolve from a less random to a more random state. We also observed
that once the system reaches its most random state, fluctuations corresponding to an appreciably
nonuniform state are very rare. These observations and our reasoning based on counting the
number of configurations corresponding to a particular macrostate allow us to conclude that
A system in a nonuniform macrostate will change in time on the average so as to
approach its most random macrostate where it is in equilibrium.
Note that our simulations involved watching the system evolve, but our discussion of the
number of conﬁgurations corresponding to each macrostate did not involve the dynamics in any
way. Instead this approach involved the enumeration of the conﬁgurations and assigning them
equal probabilities assuming that the system is isolated and in equilibrium. We will ﬁnd that it is
much easier to understand equilibrium systems by ignoring the time altogether.
In the simulation of Problem 1.1 the total energy was conserved, and hence the macroscopic
quantity of interest that changed from the specially prepared initial state with n2 = N to the
most random macrostate with n2 ≈ N/3 was not the total energy. So what macroscopic quantity
changed besides n1 , n2 , and n3 (the number of particles in each third of the box)? Based on our
earlier discussion, we tentatively say that the quantity that changed is the entropy. This statement
is no more meaningful than saying that balls fall near the earth’s surface because of gravity. We
conjecture that the entropy is associated with the number of configurations corresponding to a
given macrostate. If we make this association, we see that the entropy is greater after the system
has reached equilibrium than in the system’s initial state. Moreover, if the system were initially
prepared in a random state, the mean value of n2 and hence the entropy would not change. Hence,
we can conclude the following:
The entropy of an isolated system increases or remains the same when an internal
constraint is removed.
This statement is equivalent to the second law of thermodynamics. You might want to skip to
Chapter 4, where this identiﬁcation of the entropy is made explicit.
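To see the conjecture at work in the simpler two-halves model, we can compare the number of configurations W(n) of the initial macrostate n = N with that of the macrostate n = N/2. The sketch below assumes, purely for illustration, that the entropy is measured by ln W(n); the text itself does not commit to a formula at this point and defers the explicit identification to Chapter 4. It also uses the fact that the number of ways of choosing which n of the N particles are on the left is the binomial coefficient, which Python provides as `math.comb`. The function name is a choice made here.

```python
from math import comb, log


def log_multiplicity(n_particles, n_left):
    """ln W(n), where W(n) is the number of configurations with n particles on the left."""
    return log(comb(n_particles, n_left))


if __name__ == "__main__":
    for n_particles in (4, 100, 1000):
        initial = log_multiplicity(n_particles, n_particles)       # all particles on the left
        uniform = log_multiplicity(n_particles, n_particles // 2)  # evenly divided
        print(n_particles, initial, uniform)
```

Under this assumption the quantity ln W is zero for the specially prepared state and grows with N for the evenly divided state.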
As a result of the two simulations that we have done and our discussions, we can make some
additional tentative observations about the behavior of macroscopic systems.
Fluctuations in equilibrium. Once a system reaches equilibrium, the macroscopic quantities of
interest do not become independent of the time, but exhibit ﬂuctuations about their average values.
That is, in equilibrium only the average values of the macroscopic variables are independent of
time. For example, for the particles in the box problem n(t) changes with t, but its average value
$\overline{n}$ does not. If N is large, fluctuations corresponding to a very nonuniform distribution of the
particles almost never occur, and the relative fluctuations $\sigma/\overline{n}$ become smaller as N is increased.
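The decrease of the relative fluctuations with N can be checked without running a simulation if we assume, as in the counting argument for the two halves of the box, that all $2^N$ configurations are equally likely in equilibrium. The sketch below computes the mean and the standard deviation of n from the resulting probabilities; the function name is a choice made here, and the binomial coefficient `math.comb` is used for the number of configurations with n particles on the left.

```python
from math import comb, sqrt


def equilibrium_stats(n_particles):
    """Mean and standard deviation of n when all 2**N configurations are equally likely."""
    total = 2 ** n_particles
    probs = [comb(n_particles, n) / total for n in range(n_particles + 1)]
    mean = sum(n * p for n, p in enumerate(probs))
    var = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
    return mean, sqrt(var)


if __name__ == "__main__":
    for n_particles in (4, 16, 64, 256, 1024):
        mean, sigma = equilibrium_stats(n_particles)
        print(n_particles, mean, sigma, sigma / mean)
```

The last column shows how the ratio of σ to the mean falls as N grows.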
History independence. The properties of equilibrium systems are independent of their history.
For example, $\overline{n}$ would be the same whether we had started with n(t = 0) = 0 or n(t = 0) = N.
In contrast, as members of the human race, we are all products of our history. One consequence
of history independence is that it is easier to understand the properties of equilibrium systems by
ignoring the dynamics of the particles. (The global constraints on the dynamics are important.
For example, it is important to know if the total energy is a constant or not.) We will ﬁnd that
equilibrium statistical mechanics is essentially equivalent to counting conﬁgurations. The problem
will be that this counting is difficult to do in general.
Need for statistical approach. Systems can be described in detail by specifying their microstate.
Such a description corresponds to giving all the information that is possible. For a system of
classical particles, a microstate corresponds to specifying the position and velocity of each particle.
In our analysis of Problem 1.2, we speciﬁed only in which half of the box a particle was located,
so we used the term conﬁguration rather than microstate. However, the terms are frequently used
interchangeably.
From our simulations, we see that the microscopic state of the system changes in a complicated
way that is diﬃcult to describe. However, from a macroscopic point of view, the description is
much simpler. Suppose that we simulated a system of many particles and saved the trajectories
of the particles as a function of time. What could we do with this information? If the number of
particles is $10^6$ or more or if we ran long enough, we would have a problem storing the data. Do
we want to have a detailed description of the motion of each particle? Would this data give us
much insight into the macroscopic behavior of the system? As we have found, the trajectories of
the particles are not of much interest, and it is more useful to develop a probabilistic approach.
That is, the presence of a large number of particles motivates us to use statistical methods. In
Section 1.8 we will discuss another reason why a probabilistic approach is necessary.
We will ﬁnd that the laws of thermodynamics depend on the fact that the number of particles in
macroscopic systems is enormous. A typical measure of this number is Avogadro’s number, which
is approximately 6 × $10^{23}$, the number of atoms in a mole. When there are so many particles,
predictions of the average properties of the system become meaningful, and deviations from the
average behavior become less and less important as the number of atoms is increased.
Equal a priori probabilities. In our analysis of the probability of each macrostate in Problem 1.2, we assumed that each conﬁguration was equally probable. That is, each conﬁguration of
an isolated system occurs with equal probability if the system is in equilibrium. We will make this
assumption explicit for isolated systems in Chapter 4.
Existence of diﬀerent phases. So far our simulations of interacting systems have been restricted
to dilute gases. What do you think would happen if we made the density higher? Would a system
of interacting particles form a liquid or a solid if the temperature or the density were chosen
appropriately? The existence of diﬀerent phases is investigated in Problem 1.4.
Problem 1.4. Diﬀerent phases
(a) The applet/application at <stp.clarku.edu/simulations/lj.html> simulates an isolated
system of N particles interacting via the Lennard-Jones potential. Choose N = 64 and L = 18
so that the density ρ = $N/L^2$ ≈ 0.2. The initial positions are chosen at random except that
no two particles are allowed to be closer than σ. Run the simulation and satisfy yourself that
this choice of density and resultant total energy corresponds to a gas. What is your criterion?
(b) Slowly lower the total energy of the system. (The total energy is lowered by rescaling the
velocities of the particles.) If you are patient, you might be able to observe “liquidlike”
regions. How are they different from “gaslike” regions?
(c) If you decrease the total energy further, you will observe the system in a state roughly corresponding to a solid. What are your criteria for a solid? Explain why the solid that we obtain in
this way will not be a perfect crystalline solid.
(d) Describe the motion of the individual particles in the gas, liquid, and solid phases.
(e) Conjecture why a system of particles interacting via the Lennard-Jones potential in (1.1) can
exist in different phases. Is it necessary for the potential to have an attractive part for the
system to have a liquid phase? Is the attractive part necessary for there to be a solid phase?
Describe a simulation that would help you answer this question.
It is fascinating that a system with the same interparticle interaction can be in diﬀerent
phases. At the microscopic level, the dynamics of the particles is governed by the same equations
of motion. What changes? How does such a phase change occur at the microscopic level? Why
doesn’t a liquid crystallize immediately when its temperature is lowered quickly? What happens
when it does begin to crystallize? We will ﬁnd in later chapters that phase changes are examples
of cooperative effects.
1.6 Measuring the pressure and temperature
The obvious macroscopic variables that we can measure in our simulations of the system of particles
interacting via the Lennard-Jones potential include the average kinetic and potential energies, the
number of particles, and the volume. We also learned that the entropy is a relevant macroscopic
variable, but we have not learned how to determine it from a simulation.⁴ We know from our
everyday experience that there are at least two other macroscopic variables that are relevant for
describing a macrostate, namely, the pressure and the temperature.
The pressure is easy to measure because we are familiar with force and pressure from courses
in mechanics. To remind you of the relation of the pressure to the momentum ﬂux, consider N
particles in a cube of volume V and linear dimension L. The center of mass momentum of the
particles is zero. Imagine a planar surface of area A = $L^2$ placed in the system and oriented
perpendicular to the x-axis as shown in Figure 1.3. The pressure P can be defined as the force per
unit area acting normal to the surface:
$$P = \frac{F_x}{A}. \tag{1.2}$$
We have written P as a scalar because the pressure is the same in all directions on the average.
From Newton’s second law, we can rewrite (1.2) as
$$P = \frac{1}{A}\frac{d(mv_x)}{dt}. \tag{1.3}$$
From (1.3) we see that the pressure is the amount of momentum that crosses a unit area of
the surface per unit time. We could use (1.3) to determine the pressure, but this relation uses
information only from the fraction of particles that are crossing an arbitrary surface at a given
time. Instead, our simulations will use the relation of the pressure to the virial, a quantity that
involves all the particles in the system.⁵
⁴ We will find that it is very difficult to determine the entropy directly, either by making measurements in the
laboratory or during a simulation. Entropy, unlike pressure and temperature, has no mechanical analog.
⁵ See Gould, Tobochnik, and Christian, Chapter 8. The relation of the pressure to the virial is usually considered
in graduate courses in mechanics.
Figure 1.3: Imaginary plane perpendicular to the x-axis across which the momentum flux is evaluated.
Problem 1.5. Nature of temperature
(a) Summarize what you know about temperature. What reasons do you have for thinking that
it has something to do with energy?
(b) Discuss what happens to the temperature of a hot cup of coﬀee. What happens, if anything,
to the temperature of its surroundings?
The relation between temperature and energy is not simple. For example, one way to increase
the energy of a glass of water would be to lift it. However, this action would not aﬀect the
temperature of the water. So the temperature has nothing to do with the motion of the center of
mass of the system. As another example, if we placed a container of water on a moving conveyor
belt, the temperature of the water would not change. We also know that temperature is a property
associated with many particles. It would be absurd to refer to the temperature of a single molecule.
This discussion suggests that temperature has something to do with energy, but it has missed
the most fundamental property of temperature, that is, the temperature is the quantity that becomes
equal when two systems are allowed to exchange energy with one another. (Think about what
happens to a cup of hot coﬀee.) In Problem 1.6 we identify the temperature from this point of
view for a system of particles.
Problem 1.6. Identiﬁcation of the temperature
(a) Consider two systems of particles interacting via the Lennard-Jones potential given in (1.1). Select the applet/application at <stp.clarku.edu/simulations/thermalcontact.html>. For
system A, we take $N_A$ = 81, $\epsilon_{AA}$ = 1.0, and $\sigma_{AA}$ = 1.0; for system B, we have $N_B$ = 64,
$\epsilon_{BB}$ = 1.5, and $\sigma_{BB}$ = 1.2. Both systems are in a square box with linear dimension L = 12. In
this case, toroidal boundary conditions are not used and the particles also interact with fixed
particles (with infinite mass) that make up the walls and the partition between them. Initially,
the two systems are isolated from each other and from their surroundings. Run the simulation
until each system appears to be in equilibrium. Do the kinetic energy and potential energy
of each system change as the system evolves? Why? What is the mean potential and kinetic
energy of each system? Is the total energy of each system fixed (to within numerical error)?
(b) Remove the barrier and let the two systems interact with one another.⁶ We choose $\epsilon_{AB}$ = 1.25
and $\sigma_{AB}$ = 1.1. What quantity is exchanged between the two systems? (The volume of each
system is fixed.)
(c) Monitor the kinetic and potential energy of each system. After equilibrium has been established
between the two systems, compare the average kinetic and potential energies to their values
before the two systems came into contact.
(d) We are looking for a quantity that is the same in both systems after equilibrium has been
established. Are the average kinetic and potential energies the same? If not, think about what
would happen if you doubled N and the area of each system. Would the temperature
change? Does it make more sense to compare the average kinetic and potential energies or the
average kinetic and potential energies per particle? What quantity does become the same once
the two systems are in equilibrium? Do any other quantities become approximately equal?
What do you conclude about the possible identiﬁcation of the temperature?
From the simulations in Problem 1.6, you are likely to conclude that the temperature is
proportional to the average kinetic energy per particle. We will learn in Chapter 4 that the
proportionality of the temperature to the average kinetic energy per particle holds only for a
system of particles whose kinetic energy is proportional to the square of the momentum (velocity).
Another way of thinking about temperature is that temperature is what you measure with a
thermometer. If you want to measure the temperature of a cup of coﬀee, you put a thermometer
into the coﬀee. Why does this procedure work?
Problem 1.7. Thermometers
Describe some of the simple thermometers with which you are familiar. On what physical principles
do these thermometers operate? What requirements must a thermometer have?
Now let's imagine a simulation of a simple thermometer. Imagine a special particle, a “demon,”
that is able to exchange energy with a system of particles. The only constraint is that the energy
of the demon Ed must be nonnegative. The behavior of the demon is given by the following
algorithm:
1. Choose a particle in the system at random and make a trial change in one of its coordinates.
2. Compute ∆E , the change in the energy of the system due to the change.
3. If ∆E ≤ 0, the system gives the surplus energy |∆E| to the demon, Ed → Ed + |∆E|, and
the trial configuration is accepted.
4. If ∆E > 0 and the demon has suﬃcient energy for this change, then the demon gives the
necessary energy to the system, Ed → Ed − ∆E , and the trial conﬁguration is accepted.
Otherwise, the trial conﬁguration is rejected and the conﬁguration is not changed.
6 In order to ensure that we can continue to identify which particle belongs to system A and system B, we have
added a spring to each particle so that it cannot wander too far from its original lattice site.
Note that the total energy of the system and the demon is fixed.
We consider the consequences of these simple rules in Problem 1.8. The nature of the demon
is discussed further in Section 4.9.
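The four steps of the demon algorithm can be sketched in Python for the one-dimensional ideal gas considered in Problem 1.8, assuming units with m = 1 so that the energy of a particle is v^2/2. This is only an illustrative sketch and is not the code of the applet; the function name demon_ideal_gas, the default trial step delta, and the numbers of sweeps are hypothetical choices.

```python
import random

def demon_ideal_gas(N=40, E0=10.0, delta=None, sweeps=2000, equil=500, seed=0):
    """Demon algorithm for a 1D ideal gas with m = 1 (assumed sketch).

    Returns (mean demon energy, mean system energy per particle).
    """
    rng = random.Random(seed)
    v0 = (2.0 * E0 / N) ** 0.5      # equal initial velocities with total energy E0
    v = [v0] * N
    if delta is None:
        delta = 2.0 * v0            # hypothetical maximum trial change in velocity
    e_demon = 0.0                   # the demon energy must stay nonnegative
    sum_demon = 0.0
    sum_system = 0.0
    samples = 0
    for sweep in range(sweeps):
        for _ in range(N):
            i = rng.randrange(N)                     # step 1: random particle
            dv = rng.uniform(-delta, delta)          # trial change in its velocity
            dE = 0.5 * (v[i] + dv) ** 2 - 0.5 * v[i] ** 2   # step 2
            # Steps 3 and 4: accept if dE <= 0, or if the demon can pay for dE > 0.
            if dE <= e_demon:
                v[i] += dv
                e_demon -= dE
            # Otherwise the trial change is rejected and nothing changes.
        if sweep >= equil:                           # skip the initial transient
            sum_demon += e_demon
            sum_system += 0.5 * sum(x * x for x in v)
            samples += 1
    return sum_demon / samples, sum_system / samples / N
```

Calling demon_ideal_gas(N=40, E0=10.0) returns the mean demon energy and the mean energy per particle, which can then be compared for several values of E0 as Problem 1.8 suggests.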
Problem 1.8. The demon and the ideal gas
(a) The applet/application at <stp.clarku.edu/simulations/demon.html> simulates a demon
that exchanges energy with an ideal gas of N particles moving in d spatial dimensions. Because
the particles do not interact, the only coordinate of interest is the velocity of the particles.
In this case the demon chooses a particle at random and changes its velocity in one of its d
directions by an amount chosen at random between −∆ and +∆. For simplicity, the initial
velocity of each particle is set equal to +v0 x̂, where v0 = (2E0/(mN))^{1/2}, E0 is the desired
total energy of the system, and m is the mass of the particles. For simplicity, we will choose
units such that m = 1. Choose d = 1, N = 40, and E0 = 10 and determine the mean energy
of the demon Ēd and the mean energy of the system Ē. Why is Ē ≠ E0?
(b) What is e, the mean energy per particle of the system? How do e and Ēd compare for various
values of E0? What is the relation, if any, between the mean energy of the demon and the
mean energy of the system?
(c) Choose N = 80 and E0 = 20 and compare e and Ēd. What conclusion, if any, can you make?7
(d) Run the simulation for several other values of the initial total energy E0 and determine how e
depends on Ēd for fixed N.
(e) From your results in part (d), what can you conclude about the role of the demon as a
thermometer? What properties, if any, does it have in common with real thermometers?
(f) Repeat the simulation for d = 2. What relation do you find between e and Ēd for fixed N?
(g) Suppose that the energy-momentum relation of the particles is not ε = p^2/2m, but is ε = cp,
where c is a constant (which we take to be unity). Determine how e depends on Ēd for fixed
N and d = 1. Is the dependence the same as in part (d)?
(h) Suppose that the energy-momentum relation of the particles is ε = Ap^{3/2}, where A is a constant
(which we take to be unity). Determine how e depends on Ēd for fixed N and d = 1. Is this
dependence the same as in part (d) or part (g)?
(i) The simulation also computes the probability P (Ed )δE that the demon has energy between
Ed and Ed + δE . What is the nature of the dependence of P (Ed ) on Ed ? Does this dependence
depend on the nature of the system with which the demon interacts?
7 There are finite size effects that are of order 1/N.
1.7 Work, heating, and the first law of thermodynamics
If you watch the motion of the individual particles in a molecular dynamics simulation, you would
probably describe the motion as “random” in the sense of how we use random in everyday speech.
The motion of the individual molecules in a glass of water would exhibit similar motion. Suppose
that we were to expose the water to a low ﬂame. In a simulation this process would roughly
correspond to increasing the speed of the particles when they hit the wall. We say that we have
transferred energy to the system incoherently because each particle would continue to move more
or less at random.
You learned in your classical mechanics courses that the change in energy of a particle equals
the work done on it and the same is true for a collection of particles as long as we do not change
the energy of the particles in some other way at the same time. Hence, if we squeeze a plastic
container of water, we would do work on the system, and we would see the particles near the wall
move coherently. So we can distinguish two diﬀerent ways of transferring energy to the system.
That is, heating transfers energy incoherently and doing work transfers energy coherently.
Let's consider a molecular dynamics simulation again and suppose that we have increased the
energy of the system by either compressing the system and doing work on it or by increasing the
speed of the particles that reach the walls of the container. Roughly speaking, the ﬁrst way would
initially increase the potential energy of interaction and the second way would initially increase
the kinetic energy of the particles. If we increase the total energy by the same amount, could we
tell by looking at the particle trajectories after equilibrium has been reestablished how the energy
had been increased? The answer is no, because for a given total energy, volume, and number of
particles, the kinetic energy and the potential energy would have unique equilibrium values. (See
Problem 1.6 for a demonstration of this property.) We conclude that the energy of the gas can
be changed by doing work on it or by heating it. This statement is equivalent to the ﬁrst law of
thermodynamics and from the microscopic point of view is simply a statement of conservation of
energy.
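In symbols, the statement above can be sketched as follows. The symbols are notation assumed here for illustration: ∆E is the change in the energy of the system, W is the work done on the system, and Q is the energy transferred to the system by heating. The sign convention, with W and Q both counted as positive when energy enters the system, is also an assumption.

```latex
\Delta E = W + Q
```

For example, compressing the system with W = 3 and Q = 0, or heating it with W = 0 and Q = 3, gives the same ∆E = 3, and the equilibrium state that results does not reveal which route was taken.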
Our discussion implies that the phrase “adding heat” to a system makes no sense, because
we cannot distinguish “heat energy” from potential energy and kinetic energy. Nevertheless, we
frequently use the word “heat” in everyday speech. For example, we might say “Please turn on
the heat” and “I need to heat my coﬀee.” We will avoid such uses, and whenever possible avoid
the use of the noun “heat.” Why do we care? Because there is no such thing as heat! Once upon
a time, scientists thought that there was a ﬂuid in all substances called caloric or heat that could
ﬂow from one substance to another. This idea was abandoned many years ago, but is still used in
common language. Go ahead and use heat outside the classroom, but we won’t use it here.
1.8 *The fundamental need for a statistical approach
In Section 1.5 we discussed the need for a statistical approach when treating macroscopic systems
from a microscopic point of view. Although we can compute the trajectory (the position and
velocity) of each particle for as long as we have patience, our disinterest in the trajectory of any
particular particle and the overwhelming amount of information that is generated in a simulation
motivates us to develop a statistical approach.
Figure 1.4: (a) A special initial condition for N = 11 particles such that their motion remains
parallel indefinitely. (b) The positions of the particles at time t = 8.0 after the change in vx(6).
The only change in the initial condition from part (a) is that vx(6) was changed from 1 to 1.000001.
We now discuss a more fundamental reason why we must use probabilistic methods to describe systems with more than a few particles. The reason is that under a wide variety of
conditions, even the most powerful supercomputer yields positions and velocities that are meaningless! In the following, we will ﬁnd that the trajectories in a system of many particles depend
sensitively on the initial conditions. Such a system is said to be chaotic. This behavior forces us
to take a statistical approach even for systems with as few as three particles.
As an example, consider a system of N = 11 particles moving in a box of linear dimension
L (see the applet/application at <stp.clarku.edu/simulations/sensitive.html> ). The initial
conditions are such that all particles have the same velocity vx (i) = 1, vy (i) = 0, and the particles
are equally spaced vertically, with x(i) = L/2 for i = 1, . . . , 11 (see Fig. 1.4(a)). Convince yourself
that for these special initial conditions, the particles will continue moving indefinitely in the x-direction (using toroidal boundary conditions).
Now let us stop the simulation and change the velocity of particle 6, such that vx (6) =
1.000001. What do you think happens now? In Fig. 1.4(b) we show the positions of the particles
at time t = 8.0 after the change in velocity of particle 6. Note that the positions of the particles
are no longer equally spaced and the velocities of the particles are very diﬀerent. So in this case,
a small change in the velocity of one particle leads to a big change in the trajectories of all the
particles.
Problem 1.9. Irreversibility
The applet/application at <stp.clarku.edu/simulations/sensitive.html> simulates a system
of N = 11 particles with the special initial condition described in the text. Conﬁrm the results that
we have discussed. Change the velocity of particle 6 and stop the simulation at time t and reverse
all the velocities. Confirm that if t is sufficiently short, the particles will return approximately to
their initial state. What is the maximum value of t that will allow the system to return to its
initial positions if t is replaced by −t (all velocities reversed)?
An important property of chaotic systems is their extreme sensitivity to initial conditions,
that is, the trajectories of two identical systems starting with slightly diﬀerent initial conditions
will diverge exponentially in a short time. For such systems we cannot predict the positions
and velocities of the particles because even the slightest error in our measurement of the initial
conditions would make our prediction entirely wrong if the elapsed time is suﬃciently long. That
is, we cannot answer the question, “Where is particle 2 at time t?” if t is suﬃciently long. It might
be disturbing to realize that our answers are meaningless if we ask the wrong questions.
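The exponential divergence described above can be written schematically as follows. The rate λ (often called a Lyapunov exponent), the separation δx(t) between the two trajectories, and the tolerance ∆ at which a prediction is considered useless are symbols introduced here for illustration and do not appear in the text. The estimate is a rough sketch, not a result derived in this chapter.

```latex
|\delta x(t)| \approx |\delta x(0)|\, e^{\lambda t}
\quad\Longrightarrow\quad
t^{*} \approx \frac{1}{\lambda} \ln \frac{\Delta}{|\delta x(0)|}
```

Because the initial error enters only through a logarithm, reducing it from 10^{-6} to 10^{-12} (with ∆ = 1) merely doubles the time t* over which the prediction can be trusted, from about 13.8/λ to about 27.6/λ.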
Although Newton’s equations of motion are time reversible, this reversibility cannot be realized
in practice for chaotic systems. Suppose that a chaotic system evolves for a time t and all the
velocities are reversed. If the system is allowed to evolve for an additional time t, the system will
not return to its original state unless the velocities are speciﬁed with inﬁnite precision. This lack
of practical reversibility is related to what we observe in macroscopic systems. If you pour milk
into a cup of coﬀee, the milk becomes uniformly distributed throughout the cup. You will never
see a cup of coﬀee spontaneously return to the state where all the milk is at the surface because
to do so, the positions and velocities of the milk and coﬀee molecules must be chosen so that the
molecules of milk return to this very special state. Even the slightest error in the choice of positions
and velocities will ruin any chance of the milk returning to the surface. This sensitivity to initial
conditions provides the foundation for the arrow of time.
1.9 *Time and ensemble averages
We have seen that although the computed trajectories are meaningless for chaotic systems, averages
over the trajectories are physically meaningful. That is, although a computed trajectory might
not be the one that we thought we were computing, the positions and velocities that we compute
are consistent with the constraints we have imposed, in this case, the total energy E , the volume
V , and the number of particles N . This reasoning suggests that macroscopic properties such as
the temperature and pressure must be expressed as averages over the trajectories.
Solving Newton’s equations numerically as we have done in our simulations yields a time
average. If we do a laboratory experiment to measure the temperature and pressure, our measurements also would be equivalent to a time average. As we have mentioned, time is irrelevant in
equilibrium. We will ﬁnd that it is easier to do calculations in statistical mechanics by doing an
ensemble average. We will discuss ensemble averages in Chapter 3. In brief, an ensemble average is
an average over many mental copies of the system that satisfy the same known conditions. A simple example
might clarify the nature of these two types of averages. Suppose that we want to determine the
probability that the toss of a coin results in “heads.” We can do a time average by taking one
coin, tossing it in the air many times, and counting the fraction of heads. In contrast, an ensemble
average can be found by obtaining many similar coins and tossing them into the air at one time.
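The coin example can be sketched in a few lines of Python. The function names time_average and ensemble_average are hypothetical, and the coins are assumed to be fair and independent. Under that assumption the two estimates agree by construction, so the sketch only illustrates the two procedures; for a system of interacting particles the equivalence is not automatic.

```python
import random

def time_average(n_tosses, rng):
    """One coin tossed n_tosses times in succession; fraction of heads."""
    heads = 0
    for _ in range(n_tosses):
        if rng.random() < 0.5:      # assumed fair coin
            heads += 1
    return heads / n_tosses

def ensemble_average(n_coins, rng):
    """n_coins similar coins, each tossed once at the same time; fraction of heads."""
    outcomes = [rng.random() < 0.5 for _ in range(n_coins)]
    return sum(outcomes) / n_coins

rng = random.Random(1)
print(time_average(10000, rng), ensemble_average(10000, rng))
```

Both numbers approach 1/2 as the number of tosses or coins grows, which is the sense in which the two ways of averaging are expected to agree.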
It is reasonable to assume that the two ways of averaging are equivalent. This equivalence
is called the quasi-ergodic hypothesis. The use of the term “hypothesis” might suggest that the
equivalence is not well accepted, but it reminds us that the equivalence has been shown to be
rigorously true in only a few cases. The sensitivity of the trajectories of chaotic systems to initial
conditions suggests that a classical system of particles moving according to Newton’s equations of
motion passes through many diﬀerent microstates corresponding to diﬀerent sets of positions and
velocities. This property is called mixing, and it is essential for the validity of the quasi-ergodic
hypothesis.
In summary, macroscopic properties are averages over the microscopic variables and give
predictable values if the system is suﬃciently large. One goal of statistical mechanics is to give
a microscopic basis for the laws of thermodynamics. In this context it is remarkable that these
laws depend on the fact that gases, liquids, and solids are chaotic systems. Another important
goal of statistical mechanics is to calculate the macroscopic properties from a knowledge of the
intermolecular interactions.
1.10 *Models of matter
There are many models of interest in statistical mechanics, corresponding to the wide range of
macroscopic systems found in nature and made in the laboratory. So far we have discussed a
simple model of a classical gas and used the same model to simulate a classical liquid and a solid.
One key to understanding nature is to develop models that are simple enough to analyze, but
that are rich enough to show the same features that are observed in nature. Some of the more
common models that we will consider include the following.
1.10.1 The ideal gas
The simplest models of macroscopic systems are those for which the interaction between the individual particles is very small. For example, if a system of particles is very dilute, collisions between
the particles will be rare and can be neglected under most circumstances. In the limit that the
interactions between the particles can be neglected completely, the model is known as the ideal
gas. The classical ideal gas allows us to understand much about the behavior of dilute gases, such
as those in the earth’s atmosphere. The quantum version will be useful in understanding blackbody radiation (Section 6.9), electrons in metals (Section 6.10), the low temperature behavior of
crystalline solids (Section 6.12), and a simple model of superﬂuidity (Section 6.11).
The term “ideal gas” is a misnomer because it can be used to understand the properties of
solids and other interacting particle systems under certain circumstances, and because in many
ways the neglect of interactions is not ideal. The historical reason for the use of this term is that
the neglect of interparticle interactions allows us to do some calculations analytically. However,
the neglect of interparticle interactions raises other issues. For example, how does an ideal gas
reach equilibrium if there are no collisions between the particles?
1.10.2 Interparticle potentials
As we have mentioned, the most popular form of the potential between two neutral atoms is the
Lennard-Jones potential8 given in (1.1). This potential has a weak attractive tail at large r,
reaches a minimum at r = 2^{1/6} σ ≈ 1.122σ, and is strongly repulsive at shorter distances. The
Lennard-Jones potential is appropriate for closed-shell systems, that is, rare gases such as Ar or Kr.
Nevertheless, the Lennard-Jones potential is a very important model system and is the standard
potential for studies where the focus is on fundamental issues, rather than on the properties of a
speciﬁc material.
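The location and depth of the minimum quoted above can be checked numerically. The sketch below assumes that (1.1), which is not reproduced in this section, has the standard form V(r) = 4ε[(σ/r)^12 − (σ/r)^6]; the function name lennard_jones and its default arguments are hypothetical.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones potential, assuming V(r) = 4*eps*[(sigma/r)**12 - (sigma/r)**6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

r_min = 2 ** (1 / 6)                  # predicted position of the minimum for sigma = 1

print(r_min)                          # close to 1.122
print(lennard_jones(r_min))           # close to -epsilon, the depth of the well
print(lennard_jones(1.0))             # the potential vanishes at r = sigma
print(lennard_jones(0.9), lennard_jones(3.0))   # strong repulsion, weak attractive tail
```

Setting the derivative of the assumed form to zero gives (σ/r)^6 = 1/2, which is the origin of the factor 2^{1/6}.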
An even simpler interaction is the hard core interaction given by
\[
V(r) =
\begin{cases}
\infty & (r \le \sigma) \\
0 & (r > \sigma)
\end{cases}
\tag{1.4}
\]
A system of particles interacting via (1.4) is called a system of hard spheres, hard disks, or hard
rods depending on whether the spatial dimension is three, two, or one, respectively. Note that
V(r) in (1.4) is purely repulsive.
1.10.3 Lattice models
In another class of models, the positions of the particles are restricted to a lattice or grid and the
momenta of the particles are irrelevant. In the most popular model of this type the “particles”
correspond to magnetic moments. At high temperatures the magnetic moments are aﬀected by
external magnetic ﬁelds, but the interaction between moments can be neglected.
The simplest, nontrivial model that includes interactions is the Ising model, the most important model in statistical mechanics. The model consists of spins located on a lattice such that
each spin can take on one of two values designated as up and down or ±1. The interaction energy
between two neighboring spins is −J if the two spins are in the same state and +J if they are
in opposite states. One reason for the importance of this model is that it is one of the simplest
to have a phase transition, in this case, a phase transition between a ferromagnetic state and a
paramagnetic state.
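The interaction rule described above, −J for a pair of like neighbors and +J for a pair of unlike neighbors, is equivalent to an energy −J s1 s2 per pair when the spins take the values ±1. A minimal Python sketch follows, assuming a one-dimensional chain with open ends and no external field; the function name ising_energy and the choice of geometry are illustrative assumptions, not part of the text.

```python
def ising_energy(spins, J=1.0):
    """Total interaction energy of a 1D open chain of spins with values +1 or -1.

    Each nearest-neighbor pair contributes -J if the spins are equal
    and +J if they are opposite, that is, -J * s1 * s2.
    """
    return -J * sum(s1 * s2 for s1, s2 in zip(spins, spins[1:]))

print(ising_energy([1, 1, 1, 1]))     # all aligned: three pairs, each -J
print(ising_energy([1, -1, 1, -1]))   # alternating: three pairs, each +J
print(ising_energy([1, 1, -1, -1]))   # one domain wall
```

With J > 0 the aligned configuration has the lowest energy, which is why the model can describe a ferromagnetic state.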
We will focus on three classes of models – the ideal classical and quantum gas, classical systems
of interacting particles, and the Ising model and its extensions. These models will be used in many
contexts to illustrate the ideas and techniques of statistical mechanics.
1.11 Importance of simulations
Only simple models such as the ideal gas or special cases such as the two-dimensional Ising model
can be analyzed by analytical methods. Much of what is done in statistical mechanics is to establish
the general behavior of a model and then relate it to the behavior of another model. This way of
understanding is not as strange as it ﬁrst might appear. How many diﬀerent systems in classical
mechanics can be solved exactly?
8 This potential is named after John Lennard-Jones, 1894–1954, a theoretical chemist and physicist at Cambridge
University.
Statistical physics has grown in importance over the past several decades because powerful
computers and new computer algorithms have allowed us to explore the consequences of more complex systems. Simulations play an important intermediate role between theory and experiment. As
our models become more realistic, it is likely that they will require the computer for understanding
many of their properties. In a simulation we start with a microscopic model for which the variables
represent the microscopic constituents and determine the consequences of their interactions. Frequently the goal of our simulations is to explore these consequences so that we have a better idea
of what type of theoretical analysis might be possible and what type of laboratory experiments
should be done. Simulations allow us to compute many diﬀerent kinds of quantities, some of which
cannot be measured in a laboratory experiment.
Not only can we simulate reasonably realistic models, we also can study models that are impossible to realize in the laboratory, but are useful for providing a deeper theoretical understanding
of real systems. For example, a comparison of the behavior of a model in three and four spatial
dimensions can yield insight into why the three-dimensional system behaves the way it does.
Simulations cannot replace laboratory experiments and are limited by the ﬁnite size of the
systems and by the short duration of our runs. For example, at present the longest simulations of
simple liquids are for no more than 10^{-6} s.
Not only have simulations made possible new ways of doing research, they also make it possible
to illustrate the important ideas of statistical mechanics. We hope that the simulations that we
have discussed have already convinced you of their utility. For this reason, we will consider
many simulations throughout these notes.
1.12 Summary
This introductory chapter has been designed to whet your appetite, and at this point it is not likely
that you will fully appreciate the signiﬁcance of such concepts as entropy and the direction of time.
We are reminded of the book, All I Really Need to Know I Learned in Kindergarten.9 In principle,
we have discussed most of the important ideas in thermodynamics and statistical physics, but it
will take you a while before you understand these ideas in any depth.
We also have not discussed the tools necessary to solve any problems. Your understanding of
these concepts and the methods of statistical and thermal physics will increase as you work with
these ideas in diﬀerent contexts. You will ﬁnd that the unifying aspects of thermodynamics and
statistical mechanics are concepts such as the nature of equilibrium, the direction of time, and
the existence of cooperative eﬀects and diﬀerent phases. However, there is no unifying equation
such as Newton’s second law of motion in mechanics, Maxwell’s equations in electrodynamics, or
Schrödinger’s equation in nonrelativistic quantum mechanics.
There are many subtleties that we have glossed over so that we could get started. For example,
how good is our assumption that the microstates of an isolated system are equally probable? This
question is a deep one and has not been completely answered. The answer likely involves the
nature of chaos. Chaos seems necessary to ensure that the system will explore a large number of
the available microstates, and hence make our assumption of equal probabilities valid. However,
we do not know how to tell a priori whether a system will behave chaotically or not.
9 Robert Fulghum, All I Really Need to Know I Learned in Kindergarten, Ballantine Books (2004).
Most of our discussion concerns equilibrium behavior. The “dynamics” in thermodynamics
refers to the fact that we can treat a variety of thermal processes in which a system moves from
one equilibrium state to another. Even if the actual process involves nonequilibrium states, we
can replace the nonequilibrium states by a series of equilibrium states which begin and end at
the same equilibrium states. This type of reasoning is analogous to the use of energy arguments
in mechanics. A ball can roll from the top of a hill to the bottom, rolling over many bumps and
valleys, but as long as there is no dissipation due to friction, we can determine the ball’s motion
at the bottom without knowing anything about how the ball got there.
The techniques and ideas of statistical mechanics are now being used outside of traditional
condensed matter physics. The ﬁeld theories of high energy physics, especially lattice gauge theories, use the methods of statistical mechanics. New methods of doing quantum mechanics convert
calculations to path integrals that are computed numerically using methods of statistical mechanics. Theories of the early universe use ideas from thermal physics. For example, we speak about
the universe being quenched to a certain state in analogy to materials being quenched from high
to low temperatures. We already have seen that chaos provides an underpinning for the need for
probability in statistical mechanics. Conversely, many of the techniques used in describing the
properties of dynamical systems have been borrowed from the theory of phase transitions, one of
the important areas of statistical mechanics.
Thermodynamics and statistical mechanics have traditionally been applied to gases, liquids,
and solids. This application has been very fruitful and is one reason why condensed matter physics,
materials science, and chemical physics are rapidly evolving and growing areas. Examples of new
materials include high temperature superconductors, low-dimensional magnetic and conducting
materials, composite materials, and materials doped with various impurities. In addition, scientists
are taking a new look at more traditional condensed systems such as water and other liquids,
liquid crystals, polymers, crystals, alloys, granular matter, and porous media such as rocks. And
in addition to our interest in macroscopic systems, there is growing interest in mesoscopic systems,
systems that are neither microscopic nor macroscopic, but are in between, that is, between ∼10^2
and ∼10^6 particles.
Thermodynamics might not seem to be as interesting to you when you ﬁrst encounter it.
However, an understanding of thermodynamics is important in many contexts including societal
issues such as global warming, electrical energy production, fuel cells, and other alternative energy
sources.
The science of information theory uses many ideas from statistical mechanics, and recently, new
optimization methods such as simulated annealing have been borrowed from statistical mechanics.
In recent years statistical mechanics has evolved into the more general ﬁeld of statistical
physics. Examples of systems of interest in the latter area include earthquake faults, granular matter, neural networks, models of computing, genetic algorithms, and the analysis of the distribution
of time to respond to email. Statistical physics is characterized more by its techniques than by the
problems that are its interest. This universal applicability makes the techniques more diﬃcult to
understand, but also makes the journey more exciting.
Vocabulary
thermodynamics, statistical mechanics
macroscopic system
conﬁguration, microstate, macrostate
specially prepared state, equilibrium, ﬂuctuations
thermal contact, temperature
sensitivity to initial conditions
models, computer simulations
Problems
Problems       page
1.1            7
1.2            9
1.3            11
1.4            13
1.5 and 1.6    15
1.7            16
1.8            17
1.9            19
Table 1.3: Listing of inline problems.
Problem 1.10. (a) What do you observe when a small amount of black dye is placed in a glass
of water? (b) Suppose that a video were taken of this process and the video was run backward
without your knowledge. Would you be able to observe whether the video was being run forward or
backward? (c) Suppose that you could watch a video of the motion of an individual ink molecule.
Would you be able to know that the video was being shown forward or backward?
Problem 1.11. Describe several examples based on your everyday experience that illustrate the
unidirectional temporal behavior of macroscopic systems. For example, what happens to ice placed
in a glass of water at room temperature? What happens if you make a small hole in an inﬂated
tire? What happens if you roll a ball on a hard surface?
Problem 1.12. In what contexts can we treat water as a ﬂuid? In what context can water not
be treated as a ﬂuid?
Problem 1.13. How do you know that two objects are at the same temperature? How do you
know that two bodies are at diﬀerent temperatures?
Problem 1.14. Summarize your understanding of the properties of macroscopic systems.
Problem 1.15. Ask some of your friends why a ball falls when released above the Earth’s surface.
Explain why the answer “gravity” is not really an explanation.
Problem 1.16. What is your understanding of the concept of “randomness” at this time? Does
“random motion” imply that the motion occurs according to unknown rules?
Problem 1.17. What evidence can you cite from your everyday experience that the molecules in
a glass of water or in the surrounding air are in seemingly endless random motion?
Problem 1.18. Write a brief paragraph on the meaning of the abstract concepts, “energy” and
“justice.” (See the Feynman Lectures, Vol. 1, Chapter 4, for a discussion of why it is diﬃcult to
deﬁne such abstract concepts.)
Problem 1.19. A box of glass beads is also an example of a macroscopic system if the number
of beads is sufficiently large. In what ways is such a system different from the macroscopic systems
that we have discussed in this chapter?
Problem 1.20. Suppose that the handle of a plastic bicycle pump is rapidly pushed inward.
Predict what happens to the temperature of the air inside the pump and explain your reasoning.
(This problem is given here to determine how you think about this type of problem at this time.
Similar problems will appear in later chapters to see if and how your reasoning has changed.)
Appendix 1A: Mathematics Refresher
As discussed in Sec. 1.12, there is no unifying equation in statistical mechanics such as Newton’s
second law of motion to be solved in a variety of contexts. For this reason we will not adopt one
mathematical tool. Appendix 2B summarizes the mathematics of thermodynamics which makes
much use of partial derivatives. Appendix A summarizes some of the mathematical formulas and
relations that we will use. If you can do the following problems, you have a good background for
most of the mathematics that we will use in the following chapters.
Problem 1.21. Calculate the derivative with respect to x of the following functions: e^x, e^{3x}, e^{ax},
ln x, ln x^2, ln 3x, ln 1/x, sin x, cos x, sin 3x, and cos 2x.
Problem 1.22. Calculate the following integrals:
\begin{align}
\int_1^2 \frac{dx}{2x^2} \tag{1.5a} \\
\int_1^2 \frac{dx}{4x} \tag{1.5b} \\
\int_1^2 e^{3x}\,dx \tag{1.5c}
\end{align}
Problem 1.23. Calculate the partial derivatives of x^2 + xy + 3y^2 with respect to x and y.
Suggestions for Further Reading
P. W. Atkins, The Second Law, Scientiﬁc American Books (1984). A qualitative introduction to
the second law of thermodynamics and its implications.
J. G. Oliveira and A.-L. Barabási, “Darwin and Einstein correspondence patterns,” Nature 437,
1251 (2005). The authors found that the probability that Darwin and Einstein would respond to
a letter in τ days is well approximated by a power law, P(τ) ∼ τ^{−a} with a ≈ 3/2. What
is the explanation for this power law behavior? How long does it take you to respond to an
email?
Manfred Eigen and Ruthild Winkler, How the Principles of Nature Govern Chance, Princeton
University Press (1993).
Richard Feynman, R. B. Leighton, and M. Sands, Feynman Lectures on Physics, Addison-Wesley
(1964). Volume 1 has a very good discussion of the nature of energy and work.
Harvey Gould, Jan Tobochnik, and Wolfgang Christian, An Introduction to Computer Simulation
Methods, third edition, Addison-Wesley (2006).
F. Reif, Statistical Physics, Volume 5 of the Berkeley Physics Series, McGraw-Hill (1967). This
text was the ﬁrst to make use of computer simulations to explain some of the basic properties
of macroscopic systems.
Jeremy Rifkin, Entropy: A New World View, Bantam Books (1980). Although this popular book
raises some important issues, it, like many other popular books and articles, misuses the concept
of entropy. For more discussion on the meaning of entropy and how it should be introduced,
see <www.entropysite.com/> and <www.entropysimple.com/>.

Chapter 2
Thermodynamic Concepts and Processes

© 2005 by Harvey Gould and Jan Tobochnik
29 September 2005
The study of temperature, energy, work, heating, entropy, and related macroscopic concepts comprises the field known as thermodynamics.

2.1 Introduction

In this chapter we will discuss ways of thinking about macroscopic systems and introduce the basic
concepts of thermodynamics. Because these ways of thinking are very diﬀerent from the ways that
we think about microscopic systems, most students of thermodynamics initially ﬁnd it diﬃcult
to apply the abstract principles of thermodynamics to concrete problems. However, the study of
thermodynamics has many rewards as was appreciated by Einstein:
A theory is the more impressive the greater the simplicity of its premises, the more
diﬀerent kinds of things it relates, and the more extended its area of applicability.
Therefore the deep impression that classical thermodynamics made upon me. It is the only
physical theory of universal content which I am convinced will never be overthrown,
within the framework of applicability of its basic concepts.1
The essence of thermodynamics can be summarized by two laws: (1) Energy is conserved
and (2) entropy increases. These statements of the laws are deceptively simple. What is energy?
You are probably familiar with the concept of energy from other courses, but can you deﬁne it?
Abstract concepts such as energy and entropy are not easily defined or understood. However, as
you apply these concepts in a variety of contexts, you will gradually come to understand them.
1 A. Einstein, Autobiographical Notes, Open Court Publishing Company (1991).

Figure 2.1: Schematic of a thermodynamic system. (Labels in the figure: surroundings, system, boundary.)

2.2 The system

The first step in applying thermodynamics is to select the appropriate part of the universe of
interest. This part of the universe is called the system. In this context the term system is simply
anything that we wish to consider. The system is deﬁned by a closed surface called the boundary.
The boundary may be real or imaginary and may or may not be ﬁxed in shape or volume. The
system might be as obvious as a block of steel, water in a container, or the gas in a balloon. Or
the system might be a volume deﬁned by an imaginary ﬁxed boundary within a ﬂowing liquid.
The remainder of the universe is called the surroundings (see Figure 2.1). We usually take
the surroundings to be that part of the universe that is aﬀected by changes in the system. For
example, if an ice cube is placed in a glass of water, we might take the ice to be the system and
the water to be the surroundings. In this case we can usually ignore the interaction of the ice
cube with the air in the room and the interaction of the glass with the table on which the glass
is set. However, it might be more relevant to take the ice cube and water to be the system and
the air in the room to be the surroundings. The choice depends on the questions of interest. The
surroundings need not surround the system.

2.3 Thermodynamic Equilibrium

Macroscopic systems often exhibit some memory of their recent history. A stirred cup of tea
continues to swirl. But if we wait for a while, we will no longer observe any large scale motion.
A hot cup of coﬀee cools and takes on the temperature of its surroundings regardless of its initial
temperature. The ﬁnal states of such systems are called equilibrium states, which are characterized
by their time independence, history independence, and relative simplicity.
Time independence means that the measurable macroscopic properties (such as the temperature, pressure, and density) of equilibrium systems do not change with time except for very small
ﬂuctuations that we can observe only under special conditions. In contrast, nonequilibrium states
change with time. The time scale for changes may be seconds or years, and cannot be determined
from thermodynamic arguments alone. We can say for sure that a system is not in equilibrium if its properties change with time, but time independence during our observation time is not sufficient
to determine if a system is in equilibrium. It is possible that we just did not observe the system
long enough.
As in Chapter 1, the macrostate of a system refers to its macroscopic bulk properties such as its
temperature and pressure. Only relatively few quantities are needed to specify the macrostate of
a system in equilibrium. For example, if you drop an ice cube into a cup of coﬀee, the temperature
immediately afterward will vary throughout the coﬀee until the coﬀee reaches equilibrium. Before
equilibrium is reached, we must specify the temperature at every point in the coﬀee to fully specify
its state. Once equilibrium is reached, the temperature will be uniform throughout and only one
number is needed to specify the temperature.
History independence implies that a system can come to the same ﬁnal equilibrium state
through an inﬁnity of possible ways. The ﬁnal state has lost all memory of how it was produced. For
example, if we put several cups of coffee in the same room, they will all reach the same final temperature,
regardless of their diﬀerent initial temperatures or how much milk was added. However, there are
many examples where the history of the system is important. For example, a metal cooled quickly
may contain defects that depend on the detailed history of how the metal was cooled. Such a
system is not in equilibrium.
It is diﬃcult to know for certain whether a system is in equilibrium because the time it
takes the system to reach equilibrium may be very long and our measurements might not indicate
whether a system’s macroscopic properties are changing. In practice, the criterion for equilibrium
is circular. Operationally, a system is in equilibrium if its properties can be consistently described
by the laws of thermodynamics.
The circular nature of thermodynamics is not fundamentally different from that of other fields
of physics. For example, the law of conservation of energy can never be disproved, because we
can always make up new forms of energy to make it true. If we ﬁnd that we are continually
making up new forms of energy for every new system we ﬁnd, then we would discard the law of
conservation of energy as not being useful. As an example, if we were to observe a neutron at rest
decay into an electron and proton (beta decay) and measure the energy and momentum of the
decay products, we would ﬁnd an apparent violation of energy conservation in the vast majority of
decays. Historically, Pauli did not reject energy conservation, but instead suggested that a third
particle (the neutrino) is also emitted. Pauli’s suggestion was made in 1930, but the (anti)neutrino
was not detected until 1956. In this example our strong belief in conservation of energy led to a
new prediction and discovery.
The same is true for thermodynamics. We ﬁnd that if we use the laws of thermodynamics for
systems that experimentally appear to be in equilibrium, then everything works out ﬁne. In some
systems such as glasses that we suspect are not in thermal equilibrium, we must be very careful in
interpreting our measurements according to the laws of thermodynamics.

2.4 Temperature

The concept of temperature plays a central role in thermodynamics and is related to the physiological sensation of hot and cold. Because such a sensation is an unreliable measure of temperature,
we will develop the concept of temperature by considering what happens when two bodies are placed in thermal contact. The most important property of the temperature is its tendency to
become equal. For example, if we put a hot and a cold body into thermal contact, the temperature
of the hot body decreases and the temperature of the cold body increases until both bodies are at
the same temperature and the two bodies are in thermal equilibrium.
Problem 2.1. (a) Suppose you are blindfolded and place one hand in a pan of warm water and
the other hand in a pan of cold water. Then your hands are placed in another pan of water at room
temperature. What temperature would each hand perceive? (b) What are some other examples
of the subjectivity of our perception of temperature?
To deﬁne temperature more carefully, consider two systems separated by an insulating wall.2
A wall is said to be insulating if the thermodynamic variables of one system can be changed without
inﬂuencing the thermodynamic variables of the other system. For example, if we place one system
under a ﬂame, the temperature, pressure, and the volume of the second system would remain
unchanged. If the wall between the two systems were conducting, then the other system would be
aﬀected. Of course, insulating and conducting walls are idealizations. A good approximation to
the former is the wall of a thermos bottle; a thin sheet of copper is a good approximation to the
latter.
Now consider two systems surrounded by insulating walls, except for a common conducting
wall. For example, suppose that one system is a cup of coﬀee in a vacuum ﬂask and the other
system is mercury enclosed in a glass tube. (That is, the glass tube is in thermal contact with
the coffee.) We know that the height of the mercury column will reach a time-independent value,
and hence the coﬀee and the mercury are in equilibrium. Now suppose that we dip the mercury
thermometer into a cup of tea in another vacuum ﬂask. If the height of the mercury column is
the same as it was when placed into the coﬀee, we say that the coﬀee and tea are at the same
temperature. This conclusion can be generalized as
If two bodies are in thermal equilibrium with a third body, they are in thermal equilibrium with each other (zeroth law of thermodynamics).
This conclusion is sometimes called the zeroth law of thermodynamics. The zeroth law implies the
existence of some universal property of systems in thermal equilibrium and allows us to obtain the
temperature of a system without a direct comparison to some standard. Note that this conclusion
is not a logical necessity, but an empirical fact. If person A is a friend of B and B is a friend of C ,
it does not follow that A is a friend of C .
Problem 2.2. Describe some other measurements that also satisfy a law similar to the zeroth
law.
Any body whose macroscopic properties change in a well-defined manner can be used to
measure temperature. A thermometer is a system with some convenient macroscopic property that
changes with the temperature in a known way. Examples of convenient macroscopic properties
include the length of an iron rod, and the magnitude of the electrical resistance of gold. In all
these cases we need to measure only a single quantity to indicate the temperature.
2 An insulating wall is sometimes called an adiabatic wall.

Problem 2.3. Why are thermometers relatively small devices in comparison to the system of
interest?
To use diﬀerent thermometers quantitatively, we need to make them consistent with one
another. To do so, we choose a standard thermometer that works over a wide range of temperatures
and deﬁne reference temperatures which correspond to physical processes that always occur at the
same temperature. The familiar gas thermometer is based on the fact that the temperature T of
a dilute gas is proportional to its pressure P at constant volume. The temperature scale that is
based on the gas thermometer is called the ideal gas temperature scale. The unit of temperature
is called the kelvin (K). We need two points to deﬁne a linear function. We write
$$T(P) = aP + b, \tag{2.1}$$
where a and b are constants. We may choose the magnitude of the unit of temperature in any
convenient way. The gas temperature scale has a natural zero — the temperature at which the
pressure of an ideal gas vanishes — and hence we take b = 0. The second point is established
by the triple point of water, the unique temperature and pressure at which ice, water, and water
vapor coexist. The temperature of the triple point is deﬁned to be 273.16 K exactly. Hence, the
temperature of a ﬁxed volume gas thermometer is given by
$$T = 273.16\,\frac{P}{P_{\rm tp}}, \qquad \text{(ideal gas temperature scale)} \tag{2.2}$$
where P is the pressure of the ideal gas thermometer, and $P_{\rm tp}$ is its pressure at the triple point.
Equation (2.2) holds for a fixed amount of matter in the limit P → 0. From (2.2) we see that the
kelvin is defined as the fraction 1/273.16 of the temperature of the triple point of water.
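As a concrete illustration of Eq. (2.2), the following minimal Python sketch converts a pressure reading of a constant-volume gas thermometer into an ideal gas temperature. The function name and the sample readings are hypothetical and chosen only for illustration, and the sketch assumes that the gas is dilute enough for (2.2) to hold.

```python
T_TRIPLE_POINT = 273.16  # K, defined temperature of the triple point of water


def gas_thermometer_temperature(pressure, pressure_at_triple_point):
    """Ideal gas temperature in kelvin from Eq. (2.2): T = 273.16 * P / P_tp.

    Both pressures must be in the same units and refer to the same fixed
    amount of gas held at the same fixed volume.
    """
    if pressure_at_triple_point <= 0:
        raise ValueError("the triple-point pressure must be positive")
    return T_TRIPLE_POINT * pressure / pressure_at_triple_point


# Hypothetical readings in arbitrary pressure units: the thermometer reads
# 1000.0 at the triple point and 1366.0 in the bath of interest.
print(gas_thermometer_temperature(1366.0, 1000.0))
```

Only the ratio of the two pressures enters, so the unit in which the pressure is measured does not matter.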
Note that the gas scale of temperature is based on experiment, and there is no a priori reason to
prefer this scale to any other. However, we will show in Section 2.16 that the ideal gas temperature
deﬁned by (2.2) is consistent with the thermodynamic temperature scale.
At low pressures all gas thermometers read the same temperature regardless of the gas that
is used. The relation (2.2) holds only if the gas is suﬃciently dilute that the interactions between
the molecules can be ignored. Helium is the most useful gas because it liqueﬁes at a temperature
lower than any other gas.
The historical reason for the choice of 273.16 K for the triple point of water is that it gave, to
the accuracy of the best measurements then available, 100 K for the difference between the ice point
(the freezing temperature of water at standard pressure3) and the steam point (the boiling temperature
of water at standard pressure). However, more accurate measurements now give the difference as
99.97 K (see Table 2.1).
Table 2.1: Fixed points of the ideal gas temperature scale.
triple point    273.16 K    definition
steam point     373.12 K    experiment
ice point       273.15 K    experiment

3 Standard atmospheric pressure is the pressure of the earth’s atmosphere under normal conditions at sea level
and is defined to be $1.013 \times 10^5$ N/m$^2$. The SI unit of pressure is N/m$^2$; this unit has been given the name pascal
(Pa).

The centigrade temperature scale is defined as
$$T_{\rm centigrade} = (T - T_{\rm ice}) \times \frac{100}{T_{\rm steam} - T_{\rm ice}}, \tag{2.3}$$
where $T_{\rm ice}$ and $T_{\rm steam}$ are the ice and steam points of water. By definition, there are 100 centigrade
units between the ice and steam points. Because the centigrade unit defined in (2.3) is slightly
smaller than the kelvin, it is convenient to define the Celsius scale:
$$T_{\rm Celsius} = T - 273.15, \tag{2.4}$$
where T is the ideal gas temperature. Note that Celsius is not a new name for centigrade and
that the Celsius and ideal gas temperatures differ only by the shift of the zero. By convention the
degree sign is included with the C for Celsius temperature (°C), but no degree sign is used with
K for kelvin.
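The difference between the centigrade and Celsius scales is easy to see numerically. The short Python sketch below evaluates (2.3) and (2.4) at the ice and steam points listed in Table 2.1; the function names are chosen here for illustration and are not part of the text.

```python
T_ICE = 273.15    # K, ice point (experiment, Table 2.1)
T_STEAM = 373.12  # K, steam point (experiment, Table 2.1)


def centigrade(T):
    """Centigrade temperature from Eq. (2.3); T is the ideal gas temperature in K."""
    return (T - T_ICE) * 100.0 / (T_STEAM - T_ICE)


def celsius(T):
    """Celsius temperature from Eq. (2.4); T is the ideal gas temperature in K."""
    return T - 273.15


for label, T in [("ice point", T_ICE), ("steam point", T_STEAM)]:
    print(f"{label}: {centigrade(T):.2f} centigrade, {celsius(T):.2f} Celsius")
```

With these fixed points the steam point is 100 on the centigrade scale by construction, but 99.97 on the Celsius scale, which is why the two names are not interchangeable.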
Problem 2.4. (a) The Fahrenheit scale is defined such that the ice point is at 32 °F and the steam
point is 212 °F. Derive the relation between the Fahrenheit and Celsius temperature scales. (b)
What is normal body temperature (98.6 °F) on the Celsius and Kelvin scales? (c) A meteorologist
in Canada reports a temperature of 30 °C. How does this temperature compare to 70 °F?
Problem 2.5. What is the range of temperatures that is familiar to you from your everyday
experience and from your prior studies?

2.5 Pressure Equation of State

As we have discussed, the equilibrium states of a thermodynamic system are much simpler to
describe than nonequilibrium states. For example, the state of a simple ﬂuid (gas or liquid)
consisting of a single species is determined by its pressure P , (number) density ρ = N/V , and
temperature T , where N is the number of particles and V is the volume of the system. The
quantities P , T , and ρ are not independent, but are connected by a relation of the general form
$$P = f(T, \rho), \tag{2.5}$$
which is called the pressure equation of state. Each of these three quantities can be regarded as
a function of the other two, and the macrostate of the system is determined by any two of the
three. Note that we have implicitly assumed that the thermodynamic properties of a ﬂuid are
independent of its shape.
In general, the pressure equation of state is very complicated and must be determined either
empirically or from a simulation or from an approximate theoretical calculation (an application of
statistical mechanics). One of the few exceptions is the ideal gas for which the equation of state
is very simple. As discussed in Section 1.10, the ideal gas represents a mathematical idealization
in which the potential energy of interaction between the molecules is very small in comparison to their kinetic energy and the system can be treated classically. For an ideal gas, we have for fixed
temperature the empirical relation
$$P \propto \frac{1}{V}, \qquad \text{(fixed temperature)} \tag{2.6}$$
or
$$PV = \text{constant}. \tag{2.7}$$
The relation (2.7) is sometimes called Boyle’s law and was published by Robert Boyle in 1662.
Note that the relation (2.7) is not a law of physics, but an empirical relation. An equation such as
(2.7), which relates diﬀerent states of a system all at the same temperature, is called an isotherm.
We also have the empirical relation
$$V \propto T. \qquad \text{(fixed pressure)} \tag{2.8}$$
Some textbooks refer to (2.8) as Charles’s law, but it should be called the law of Gay-Lussac.
We can express the two empirical relations, (2.7) and (2.8), as P ∝ T/V. In addition, if we
hold T and V constant and introduce more gas into the system, we find that the pressure increases
in proportion to the amount of gas. If N is the number of gas molecules, we can write
$$PV = NkT, \qquad \text{(ideal gas pressure equation of state)} \tag{2.9}$$
where the constant of proportionality k in (2.9) is found experimentally to have the same value for
all gases in the limit P → 0. The value of k is
$$k = 1.38 \times 10^{-23}\ \text{J/K}, \qquad \text{(Boltzmann's constant)} \tag{2.10}$$
and is called Boltzmann’s constant. The relation (2.9) will be derived using statistical mechanics
in Section 4.5.
Because the number of particles in a typical gas is very large, it sometimes is convenient to
measure this number relative to the number of particles in one mole of gas. A (gram) mole of
any substance consists of Avogadro’s number, $N_A = 6.022 \times 10^{23}$, of particles of that substance. Avogadro’s
number is defined so that 12.0 g of $^{12}$C atoms contain exactly one mole of these atoms. If there
are ν moles, then $N = \nu N_A$, and the ideal gas equation of state can be written as
$$PV = \nu N_A kT = \nu RT, \tag{2.11}$$
where
$$R = N_A k = 8.314\ \text{J/(K mole)} \tag{2.12}$$
is the gas constant.
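To make the two forms of the ideal gas equation of state concrete, here is a minimal Python sketch that evaluates the pressure from (2.9) and from (2.11). The function names and the example state (one mole at the ice point in 22.4 liters) are assumptions made for illustration; the constants are the rounded values quoted in the text, so the product of Avogadro's number and k comes out near 8.31 rather than exactly 8.314.

```python
K_BOLTZMANN = 1.38e-23   # J/K, Eq. (2.10)
N_AVOGADRO = 6.022e23    # particles per mole
R_GAS = N_AVOGADRO * K_BOLTZMANN  # J/(K mole), Eq. (2.12), from rounded inputs


def ideal_gas_pressure(N, V, T):
    """Pressure in Pa from Eq. (2.9): P = N k T / V (V in m^3, T in K)."""
    return N * K_BOLTZMANN * T / V


def ideal_gas_pressure_molar(nu, V, T):
    """Pressure in Pa from Eq. (2.11): P = nu R T / V (nu in moles)."""
    return nu * R_GAS * T / V


# Assumed example: one mole at the ice point in a volume of 22.4 liters.
V = 22.4e-3  # m^3
print(R_GAS)
print(ideal_gas_pressure(N_AVOGADRO, V, 273.15))
print(ideal_gas_pressure_molar(1.0, V, 273.15))
```

Both forms give the same pressure, roughly 1.01 × 10^5 Pa, which is close to standard atmospheric pressure.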
Real gases do not satisfy the ideal gas equation of state except in the limit of low density. For
now we will be satisﬁed with considering a simple phenomenological4 equation of state of a real
gas with an interparticle interaction similar to the Lennard-Jones potential (see Figure 1.1). The
simplest phenomenological pressure equation of state that describes the behavior of real gases at
moderate densities is due to van der Waals and has the form
$$\Bigl(P + \frac{N^2}{V^2}\,a\Bigr)(V - Nb) = NkT, \qquad \text{(van der Waals equation of state)} \tag{2.13}$$
where a and b are empirical constants characteristic of the particular gas. The parameter b takes
into account the finite size of the molecules by decreasing the effective available volume to any given
molecule. The parameter a is associated with the attractive interactions between the molecules.

4 Phenomenological is a word that we will use often. It means a description of the phenomena; such a description
is not derived from fundamental considerations.
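The following Python sketch solves the van der Waals equation of state (2.13) for the pressure and compares it with the ideal gas pressure (2.9) at several volumes. The numerical values of a and b are assumptions with roughly argon-like orders of magnitude, expressed per particle; they are illustrative only and are not taken from the text.

```python
K_BOLTZMANN = 1.38e-23  # J/K, Eq. (2.10)


def ideal_gas_pressure(N, V, T):
    """Ideal gas pressure from Eq. (2.9)."""
    return N * K_BOLTZMANN * T / V


def van_der_waals_pressure(N, V, T, a, b):
    """Eq. (2.13) solved for P: P = N k T / (V - N b) - a (N / V)^2."""
    if V <= N * b:
        raise ValueError("V must exceed the excluded volume N*b")
    return N * K_BOLTZMANN * T / (V - N * b) - a * (N / V) ** 2


# Illustrative per-particle parameters (assumed, roughly argon-like).
A_PARAM = 3.7e-49  # J m^3
B_PARAM = 5.3e-29  # m^3

N = 6.022e23  # one mole of particles
T = 300.0     # K
for V in (1.0, 1.0e-2, 1.0e-3):  # m^3
    ratio = van_der_waals_pressure(N, V, T, A_PARAM, B_PARAM) / ideal_gas_pressure(N, V, T)
    print(f"V = {V:.0e} m^3: P_vdW / P_ideal = {ratio:.4f}")
```

At the largest volume the ratio is very close to 1, consistent with the statement that real gases approach ideal gas behavior in the limit of low density; the deviation grows as the density increases.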
We will derive this approximate equation of state in Section 8.2.

2.6 Some Thermodynamic Processes

A change from one equilibrium state of the system to another is called a thermodynamic process.
Thermodynamics cannot determine how much time such a process will take, and the ﬁnal state
is independent of the amount of time it takes to reach equilibrium. It is convenient to consider
thermodynamic processes where a system is taken from an initial to a ﬁnal state by a continuous
succession of intermediate states. To describe a process in terms of thermodynamic variables, the
system must be in thermodynamic equilibrium. However, for the process to occur, the system cannot be exactly in thermodynamic equilibrium because at least one of the thermodynamic variables
is changing. Nevertheless, if the change is sufficiently slow, the process is quasistatic, and the system
can be considered to be in a succession of equilibrium states. A quasistatic process is an idealized
concept. Although no physical process is quasistatic, we can imagine real processes that approach
the limit of quasistatic processes.
Some thermodynamic processes can go only in one direction and others can go in either
direction. For example, a scrambled egg cannot be converted to a whole egg. Processes that can
go only in one direction are called irreversible. A process is reversible if it is possible to restore the
system and its surroundings to their original condition. (The surroundings include any body that
was aﬀected by the change.) That is, if the change is reversible, the status quo can be restored
everywhere.
Processes such as stirring the milk in a cup of coﬀee or passing an electric current through a
resistor are irreversible because once the process is done, there is no way of reversing the process.
But suppose we make a small and very slow frictionless change of a constraint such as an increase
in the volume, which we then reverse. Because there is no friction, we do no net work in this
process. At the end of the process, the constraints and the energy of the system return to their
original values and the state of the system is unchanged. In this case we can say that this process
is reversible. Of course, no real process is truly reversible because it would require an inﬁnite time
to occur. The relevant question is whether the process approaches reversibility.
Consider a gas in a closed, insulated container that is divided into two chambers by an impermeable partition. The gas is initially conﬁned to one chamber and then allowed to expand freely
into the second chamber to ﬁll the entire container. What is the nature of this process? The process
is certainly not quasistatic. But we can imagine this process to be performed quasistatically. We
could divide the second chamber into many small chambers separated by partitions and puncture
each partition in turn, allowing the expanded gas to come into equilibrium. So in the limit of an infinite number of partitions, such a process would be quasistatic. However this process would not
be reversible, because the gas would never return to its original volume.
Problem 2.6. Are the following processes reversible or irreversible?
(a) Air is pumped into a tire.
(b) Air leaks out of a tire.

2.7 Work

During a process the surroundings can do work on the system of interest or the system can do
work on its surroundings. We now obtain an expression for the mechanical work done on a system
in a quasistatic process. For simplicity, we assume the system to be a ﬂuid. Because the ﬂuid is
in equilibrium, we can characterize it by a uniform pressure P . For simplicity, we assume that the
fluid is contained in a cylinder of cross-sectional area A fitted with a movable piston. The piston
is girded by rings so that no gas or liquid can escape (see Figure 2.2). We can add weights to the
piston causing it to compress the ﬂuid. Because the pressure is deﬁned as the force per unit area,
the magnitude of the force exerted by the ﬂuid on the piston is given by P A, which also is the
force exerted by the piston on the ﬂuid. If the piston is displaced quasistatically by an amount dx,
then the work done on the ﬂuid by the piston is given by5
$$dW = -(PA)\,dx = -P\,(A\,dx) = -P\,dV. \tag{2.15}$$
The negative sign in (2.15) is present because if the volume of the fluid is decreased, the work done
by the piston is positive.
If the volume of the fluid changes quasistatically from an initial volume V1 to a final volume V2,
the system remains very nearly in equilibrium, and hence its pressure at any stage is a function of
its volume and temperature. Hence, the total work is given by the integral
$$W_{1\to 2} = -\int_{V_1}^{V_2} P(T, V)\,dV. \qquad \text{(quasistatic process)} \tag{2.16}$$
Note that the work done on the fluid is positive if V2 < V1 and is negative if V2 > V1.
For the special case of an ideal gas, the work done on a gas that is compressed at constant
temperature is given by
$$W_{1\to 2} = -NkT \int_{V_1}^{V_2} \frac{dV}{V} = -NkT \ln\frac{V_2}{V_1}. \qquad \text{(ideal gas at constant temperature)} \tag{2.17}$$

5 Equation (2.15) can be written as
$$\frac{dW}{dt} = -P\,\frac{dV}{dt}, \tag{2.14}$$
if the reader does not like the use of differentials. See Appendix 2B.

Figure 2.2: Example of work done on a fluid enclosed within a cylinder fitted with a piston when
the latter moves a distance ∆x. (Labels in the figure: F = PA, ∆x, P.)

Figure 2.3: A block on a frictionless incline. The figure is taken from Loverude et al.
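The integral (2.16) can be checked numerically against the closed form (2.17). The Python sketch below is one way to do so under stated assumptions: it uses a simple midpoint rule, and the function names, the number of steps, and the example state (one mole at 300 K compressed to half its volume) are all choices made here for illustration.

```python
import math

K_BOLTZMANN = 1.38e-23  # J/K, Eq. (2.10)


def work_on_fluid(pressure, V1, V2, steps=100_000):
    """Approximate W = -integral of P dV from V1 to V2, Eq. (2.16).

    `pressure` is a function of the volume along a quasistatic path;
    the integral is evaluated with the midpoint rule.
    """
    dV = (V2 - V1) / steps
    total = 0.0
    for i in range(steps):
        V_mid = V1 + (i + 0.5) * dV
        total += pressure(V_mid) * dV
    return -total


def isothermal_work_ideal_gas(N, T, V1, V2):
    """Closed form of Eq. (2.17): W = -N k T ln(V2 / V1)."""
    return -N * K_BOLTZMANN * T * math.log(V2 / V1)


# Assumed example: one mole at 300 K compressed from 2 liters to 1 liter.
N, T = 6.022e23, 300.0
V1, V2 = 2.0e-3, 1.0e-3  # m^3
numeric = work_on_fluid(lambda V: N * K_BOLTZMANN * T / V, V1, V2)
exact = isothermal_work_ideal_gas(N, T, V1, V2)
print(numeric, exact)
```

Because V2 < V1 in this example, the work done on the gas is positive, in agreement with the sign convention stated after (2.16).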
We have noted that the pressure P must be uniform throughout the ﬂuid. But compression
cannot occur if pressure gradients are not present. To move the piston from its equilibrium position,
we must add a weight to it (or remove one from it). If a weight is added, then for a moment the total weight on the piston will be
greater than P A. This diﬀerence is necessary if the piston is to move downward and do work on
the gas. If the movement is suﬃciently slow, the pressure departs only slightly from its equilibrium
value. What does “suﬃciently slow” mean? To answer this question, we have to go beyond the
macroscopic reasoning of thermodynamics and consider the molecules that comprise the ﬂuid. If
the piston is moved downward a distance ∆x, then the density of the molecules near the piston
becomes greater than the bulk of the ﬂuid. Consequently, there is a net movement of molecules
away from the piston until the density again becomes uniform. The time τ for the ﬂuid to return to
equilibrium is given by $\tau \approx \Delta x/v_s$, where $v_s$ is the mean speed of the molecules (see Section 6.4).
For comparison, the characteristic time $\tau_p$ for the process is $\tau_p \approx \Delta x/v_p$, where $v_p$ is the speed of
the piston. If the process is to be quasistatic, it is necessary that $\tau \ll \tau_p$ or $v_p \ll v_s$. That is, the
speed of the piston must be much less than the mean speed of the molecules, a condition that is
easy to satisfy in practice.
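A rough numerical comparison shows how easily the quasistatic condition is met. In the Python sketch below every number is an assumption made for illustration: a piston displacement of 1 cm, a piston speed of 0.1 m/s, and a mean molecular speed of 500 m/s (of the order of a few hundred meters per second, as is typical for a gas near room temperature).

```python
def relaxation_time(dx, molecular_speed):
    """tau ~ dx / v_s: time for the fluid near the piston to return to equilibrium."""
    return dx / molecular_speed


def process_time(dx, piston_speed):
    """tau_p ~ dx / v_p: time the piston takes to move a distance dx."""
    return dx / piston_speed


# Assumed values, for illustration only.
dx = 0.01    # m, piston displacement
v_s = 500.0  # m/s, mean molecular speed
v_p = 0.1    # m/s, piston speed

tau = relaxation_time(dx, v_s)
tau_p = process_time(dx, v_p)
print(f"tau = {tau:.1e} s, tau_p = {tau_p:.1e} s, tau/tau_p = {tau / tau_p:.1e}")
```

The ratio of the two times equals the ratio of the piston speed to the molecular speed and does not depend on the displacement, so for these assumed numbers the fluid relaxes thousands of times faster than the piston moves.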
Problem 2.7. To refresh your understanding of work in the context of mechanics, look at Fig. 2.3
and explain whether the following quantities are positive, negative, or zero: (a) The work done on
the block by the hand. (b) The work done on the block by the earth. (c) The work done on the
hand by the block (if there is no such work, state so explicitly).
Work depends on the path. The solution of the following example illustrates that the work done on a system depends not only on the initial and final states, but also on the intermediate
states, that is, on the path.

Figure 2.4: A simple cyclic process. What is the net work done on the gas? (The figure is a PV diagram of the rectangular path ABCDA: A and B lie at pressure P2, D and C at pressure P1; A and D are at volume V1, B and C at volume V2.)
Example 2.1. Cyclic processes
Figure 2.4 shows a cyclic path ABCDA in the PV diagram of an ideal gas. How much work is
done on the gas during this cyclic process? (Look at the ﬁgure before you attempt to answer the
question.)
Solution. During the isobaric expansion A → B, the work done on the gas is
$$W_{AB} = -P_2 (V_2 - V_1). \tag{2.18}$$
No work is done from B → C and from D → A. The work done on the gas from C → D is
$$W_{CD} = -P_1 (V_1 - V_2). \tag{2.19}$$
The net work done on the gas is then
$$W_{\rm net} = W_{AB} + W_{CD} = -P_2 (V_2 - V_1) - P_1 (V_1 - V_2) = -(P_2 - P_1)(V_2 - V_1) < 0. \tag{2.20}$$
The result is that the net work done on the gas is the negative of the area enclosed by the path.
If the cyclic process were carried out in the reverse order, the net work done on the gas would be
positive.
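The bookkeeping of Example 2.1 can be sketched in a few lines of Python. The function names and the numerical values of the pressures and volumes are hypothetical; the corner labels follow the solution above, with A and B at pressure P2, D and C at pressure P1, and the constant-volume legs contributing no work.

```python
def isobaric_work(P, V_start, V_end):
    """Work done on the gas at constant pressure P: W = -P (V_end - V_start)."""
    return -P * (V_end - V_start)


def net_work_on_gas(P1, P2, V1, V2, reverse=False):
    """Net work done on the gas around the rectangular cycle of Figure 2.4.

    Forward order is A -> B -> C -> D -> A; reverse order is A -> D -> C -> B -> A.
    Only the two constant-pressure legs contribute.
    """
    if reverse:
        # D -> C at pressure P1 (expansion), then B -> A at pressure P2 (compression)
        return isobaric_work(P1, V1, V2) + isobaric_work(P2, V2, V1)
    # A -> B at pressure P2 (expansion), then C -> D at pressure P1 (compression)
    return isobaric_work(P2, V1, V2) + isobaric_work(P1, V2, V1)


# Hypothetical values in SI units (Pa and m^3).
P1, P2 = 1.0e5, 2.0e5
V1, V2 = 1.0e-3, 3.0e-3
print(net_work_on_gas(P1, P2, V1, V2))                # forward cycle
print(net_work_on_gas(P1, P2, V1, V2, reverse=True))  # reversed cycle
```

For these values the forward cycle gives a net work of about −200 J, which is −(P2 − P1)(V2 − V1), and the reversed cycle gives the same magnitude with the opposite sign.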
Because the system was returned to its original pressure and volume, why is the net amount
of work done not zero? What would be the work done if the gas were taken from V2 to V1 along
the diagonal path connecting C and A?

2.8 The First Law of Thermodynamics

If we think of a macroscopic system as consisting of a large number of interacting particles, we
know that it has a well-defined total energy that satisfies a conservation principle. This simple
justiﬁcation of the existence of a thermodynamic energy function is very diﬀerent from the historical development because thermodynamics was developed before the atomic theory of matter was
well accepted. Historically, the existence of a macroscopic conservation of energy principle was
demonstrated by purely macroscopic observations as outlined in the following.6
Consider a system enclosed by insulating walls – walls that prevent the system from being
heated by the environment. Such a system is thermally isolated. A process in which the state of the
system is changed only by work done on the system is called adiabatic. We know from overwhelming
empirical evidence that the amount of work needed to change the state of a thermally isolated
system depends only on the initial and ﬁnal states and not on the intermediate states through
which the system passes. This independence of the path under these conditions implies that we
can deﬁne a function E such that for a change from state 1 to state 2, the work done on a thermally
isolated system equals the change in E :
$$W = E_2 - E_1 = \Delta E. \qquad \text{(adiabatic process)} \tag{2.21}$$
The quantity E is called the (internal) energy of the system.7 The internal energy in (2.21) is
measured with respect to the center of mass.8 The energy E is an example of a state function,
that is, it characterizes the state of a macroscopic system and is independent of the path.
Problem 2.8. What is the difference between the total energy and the internal energy?
If we choose a convenient reference state as the zero of energy, then E has a unique value for
each state of the system because W is independent of the path for an adiabatic process. (Remember
that in general W depends on the path.)
If we relax the condition that the change be adiabatic and allow the system to interact with
its surroundings, we would find in general that ∆E ≠ W. (The difference between ∆E and W is
zero for an adiabatic process.) In general, we know that we can increase the energy of a system
by doing work on it or by heating it as a consequence of a temperature diﬀerence between it and
its surroundings. In general, the change in the internal energy of a closed system (ﬁxed number of
particles) is given by
$$\Delta E = W + Q. \qquad \text{(first law of thermodynamics)} \tag{2.22}$$
The quantity Q is the change in the system’s energy due to heating (Q > 0) or cooling (Q < 0) and
W is the work done on the system. Equation (2.22) expresses the law of conservation of energy
and is known as the ﬁrst law of thermodynamics. This equation is equivalent to saying that there
are two macroscopic ways of changing the internal energy of a system: doing work and heating.
6 These experiments were done by Joseph Black (1728–1799), Benjamin Thompson (Count Rumford) (1753–1814),
especially Robert Mayer (1814–1878), and James Joule (1818–1889). Mayer and Joule are now recognized as
the codiscoverers of the first law of thermodynamics, but Mayer received little recognition at the time of his work.
7 Another common notation for the internal energy is U.
8 Microscopically, the internal energy of a system of particles is the sum of the kinetic energy in a reference frame
in which the center of mass velocity is zero and the potential energy arising from the forces of the particles on each
other.

One consequence of the first law of thermodynamics is that ∆E is independent of the path,
even though the amount of work W does depend on the path. And because W depends on the
path and ∆E does not, the amount of heating also depends on the path.
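The path dependence of W and Q can be made concrete with a small numerical sketch. It assumes a monatomic ideal gas, for which E = (3/2)P V (this follows from the equations of state (2.9) and (2.23)), and it takes the quasistatic work done on the gas to be −∫P dV. The two paths, the function names, and the initial and final states are illustrative choices and are not taken from the text.

```python
# Two quasistatic paths between the same initial and final states of a
# monatomic ideal gas. Illustrative values only (SI units).

def delta_E(P1, V1, P2, V2):
    """Change in internal energy, using E = (3/2) P V for a monatomic ideal gas."""
    return 1.5 * (P2 * V2 - P1 * V1)

def work_isobaric_first(P1, V1, P2, V2):
    """Work done ON the gas: change the volume at pressure P1, then change P at fixed V2."""
    return -P1 * (V2 - V1)

def work_isochoric_first(P1, V1, P2, V2):
    """Work done ON the gas: change P at fixed V1, then change the volume at pressure P2."""
    return -P2 * (V2 - V1)

P1, V1 = 2.0e5, 1.0e-3   # initial pressure (Pa) and volume (m^3)
P2, V2 = 1.0e5, 3.0e-3   # final pressure (Pa) and volume (m^3)

dE = delta_E(P1, V1, P2, V2)
W_a = work_isobaric_first(P1, V1, P2, V2)
W_b = work_isochoric_first(P1, V1, P2, V2)
Q_a = dE - W_a   # first law: Q = Delta E - W
Q_b = dE - W_b

print(f"Delta E = {dE:.1f} J for both paths")
print(f"path a: W = {W_a:.1f} J, Q = {Q_a:.1f} J")
print(f"path b: W = {W_b:.1f} J, Q = {Q_b:.1f} J")
```

The two paths give different values of W and of Q, but the sum W + Q is the same because it equals ∆E, which depends only on the end states.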
Problem 2.9. A cylindrical pump contains one mole of a gas. The piston ﬁts tightly so that no air
escapes and friction is negligible between the piston and the cylinder walls. The pump is thermally
insulated from its surroundings. The piston is quickly pressed inward. What will happen to the
temperature of the gas? Explain your reasoning.
So far we have considered two classes of thermodynamic quantities. One class consists of state
functions because they have a speciﬁc value for each macroscopic state of the system. An example
of such a function is the internal energy E . As we have discussed, there are other quantities, such
as work and energy transfer due to heating, that do not depend on the state of the system. These
latter quantities depend on the thermodynamic process by which the system changed from one
state to another.
Originally, many scientists thought that there was a ﬂuid called heat in all substances which
could ﬂow from one substance to another. This idea was abandoned many years ago, but is still
used in everyday language. Thus, people talk about adding heat to a system. We will avoid this use
and whenever possible we will avoid the use of the noun “heat” altogether. Instead, we will refer
to a process as heating or cooling if it changes the internal energy of a system without changing
any external parameters (such as the external pressure, electric ﬁeld, magnetic ﬁeld, etc). Heating
occurs whenever two solids at diﬀerent temperatures are brought into thermal contact. In everyday
language we would say that heat ﬂows from the hot to the cold body. However, we prefer to say
that energy is transferred from the hotter to the colder body. There is no need to invoke the noun
“heat,” and it is misleading to say that heat “ﬂows” from one body to another.
To understand better that there is no such thing as the amount of heat in a body, consider
the following simple analogy adapted from Callen.9 A farmer owns a pond, fed by one stream and
drained by another. The pond also receives water from rainfall and loses water by evaporation.
The pond is the system of interest, the water within it is analogous to the internal energy, the
process of transferring water by the streams is analogous to doing work, the process of adding
water by rainfall is analogous to heating, and the process of evaporation is analogous to cooling.
The only quantity of interest is the water, just as the only quantity of interest is energy in the
thermal case. An examination of the change in the amount of water in the pond cannot tell us
how the water got there. The terms rain and evaporation refer only to methods of water transfer,
just as the terms heating and cooling refer only to methods of energy transfer.
Another example is due to Bohren and Albrecht.10 Take a small plastic container and add
just enough water to it so that its temperature can be conveniently measured. Then let the water
and the bottle come into equilibrium with their surroundings. Measure the temperature of the
water, cap the bottle, and shake the bottle until you are too tired to continue further. Then uncap
the bottle and measure the water temperature again. If there were a “whole lot of shaking going
on,” you would ﬁnd the temperature had increased a little.
9, 10 See pages 20 and 25.
In this example, the temperature of the water increased without heating. We did work on
the water, which resulted in an increase in its internal energy as manifested by a rise in the
temperature. The same increase in temperature could have been obtained by bringing the water
into contact with a body at a higher temperature. But it would be impossible to determine by
making measurements on the water whether shaking or heating had been responsible for taking
the system from its initial to its ﬁnal state. (To silence someone who objects that you heated the
water with “body heat,” wrap the bottle with an insulating material.)
Problem 2.10. How could the owner of the pond distinguish between the diﬀerent types of water
transfer assuming that the owner has ﬂow meters, a tarpaulin, and a vertical pole?
Problem 2.11. Convert the following statement to the language used by physicists, “I am cold,
please turn on the heat.”
Before the equivalence of heating and energy transfer was well established, a change in energy
by heating was measured in calories. One calorie is the amount of energy needed to raise the
temperature of one gram of water from 14.5 ◦ C to 15.5 ◦ C. We now know that one calorie is
equivalent to 4.186 J, but the use of the calorie for energy transfer by heating and the joule for
work still persists. Just to cause confusion, the calorie we use to describe the energy content of
foods is actually a kilocalorie.

2.9 Energy Equation of State

In (2.9) we gave the pressure equation of state for an ideal gas. Now that we know that the internal
energy also determines the state of a system of particles, we need to know how E depends on two
of the three variables, T and P or V . The form of the energy equation of state for an ideal gas
must also be determined empirically or calculated from ﬁrst principles using statistical mechanics
(see Section 4.5). From these considerations the energy equation of state for a monatomic gas is
given by
$$E = \frac{3}{2} N k T. \qquad \text{(ideal gas energy equation of state)} \tag{2.23}$$
Note that the energy of an ideal gas is independent of its volume.
Similarly, the approximate thermal equation of state of a real gas corresponding to the pressure
equation of state (2.13) is given by
$$E = \frac{3}{2} N k T - N \frac{N}{V} a. \qquad \text{(van der Waals energy equation of state)} \tag{2.24}$$
Note that the energy depends on the volume if the interactions between particles are included.
Example 2.2. Work is done on an ideal gas at constant temperature. (a) What is the change in
the energy11 of the gas?
Solution.
Because the energy of an ideal gas depends only on the temperature (see (2.23)), there is no
change in its internal energy for an isothermal (constant temperature) process. Hence, ∆E = 0 =
Q + W , and
$$Q = -W = N k T \ln \frac{V_2}{V_1}. \qquad \text{(isothermal process for an ideal gas)} \tag{2.25}$$
We see that if work is done on the gas (V2 < V1 ), then the gas must give energy to its surroundings
so that its temperature does not change.
11 We actually mean the internal energy, but the meaning should be clear from the context.
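As a quick numerical illustration of (2.25), the following sketch evaluates Q for one mole of an ideal gas compressed isothermally to half its volume. The temperature and volumes are illustrative values that do not appear in the text, and the function name is hypothetical.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant k in J/K
N_A = 6.02214076e23      # number of particles in one mole

def isothermal_heat(N, T, V1, V2):
    """Q = N k T ln(V2/V1) for a quasistatic isothermal process of an ideal gas, as in (2.25)."""
    return N * K_B * T * math.log(V2 / V1)

# Illustrative inputs: one mole at 300 K, volume halved.
Q = isothermal_heat(N_A, 300.0, 2.0e-3, 1.0e-3)
W = -Q   # work done ON the gas, because Delta E = Q + W = 0

print(f"Q = {Q:.1f} J (negative: the gas gives energy to its surroundings)")
print(f"W = {W:.1f} J (positive: work is done on the gas)")
```

The signs agree with the discussion above: for V2 < V1 the work done on the gas is positive and Q is negative.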
Extensive and intensive variables. The thermodynamic variables that we have introduced so
far may be divided into two classes. Quantities such as the density ρ, the pressure P , and the
temperature T are intensive variables and are independent of the size of the system. Quantities
such as the volume V and the internal energy E are extensive variables and are proportional to
the number of particles in the system (at ﬁxed density). As we will see in Section 2.10, it often
is convenient to convert extensive quantities to a corresponding intensive quantity by deﬁning the
ratio of two extensive quantities. For example, the energy per particle and the energy per unit
mass are intensive quantities.

2.10 Heat Capacities and Enthalpy

We know that the temperature of a macroscopic system usually increases when we transfer energy
to it by heating.12 The magnitude of the increase in temperature depends on the nature of the
body and how much of it there is. The amount of energy transfer due to heating required to
produce a unit temperature rise in a given substance is called the heat capacity of that substance.
Here again we see the archaic use of the word “heat.” But because the term “heat capacity” is
common, we are forced to use it. If a body undergoes an increase of temperature from T1 to T2
accompanied by an energy transfer due to heating Q, then the average heat capacity is given by
the ratio
$$\text{average heat capacity} = \frac{Q}{T_2 - T_1}. \tag{2.26}$$
The value of the heat capacity depends on what constraints are imposed. We introduce the heat
capacity at constant volume by the relation
$$C_V = \left( \frac{\partial E}{\partial T} \right)_V. \tag{2.27}$$
Note that if the volume V is held constant, the change in energy of the system is due only to
the energy transferred by heating. We have adopted the common notation in thermodynamics of
enclosing partial derivatives in parentheses and using subscripts to denote the variables that are
held constant. In this context, it is clear that the diﬀerentiation in (2.27) is at constant volume,
and we will write CV = ∂E/∂T if there is no ambiguity.13 (See Appendix 2B for a discussion of
the mathematics of thermodynamics.)
Equation (2.27) together with (2.23) can be used to obtain the heat capacity at constant
volume of a monatomic ideal gas:
$$C_V = \frac{3}{2} N k. \qquad \text{(monatomic ideal gas)} \tag{2.28}$$
Note that the heat capacity at constant volume of an ideal gas is independent of the temperature.
12 Can you think of a counterexample?
13 Although the number of particles also is held constant, we will omit the subscript N in (2.27) and in other
partial derivatives to reduce the number of subscripts.
The heat capacity is an extensive quantity, and it is convenient to introduce the specific
heat which depends only on the nature of the material, not on the amount of the material. The
conversion to an intensive quantity can be achieved by dividing the heat capacity by the amount
of the material expressed in terms of the number of moles, the mass, or the number of particles.
We will use lower case c for speciﬁc heat; the distinction between the various kinds of speciﬁc heats
will be clear from the context and the units of c.
The enthalpy. The combination of thermodynamic variables, E + P V , occurs suﬃciently often
to acquire its own name. The enthalpy H is deﬁned as
$$H = E + P V. \qquad \text{(enthalpy)} \tag{2.29}$$
We can use (2.29) to find a simple expression for CP , the heat capacity at constant pressure. From
(2.15) and (2.22), we have dE = dQ − P dV or dQ = dE + P dV (at constant pressure). From the
identity, d(P V ) = P dV + V dP , we can write dQ = dE + d(P V ) − V dP . At constant pressure
dQ = dE + d(P V ) = d(E + P V ) = dH . Hence, we can deﬁne the heat capacity at constant
pressure as
$$C_P = \frac{\partial H}{\partial T}, \tag{2.30}$$
where we have suppressed noting that the pressure P is held constant during diﬀerentiation. We
will learn that the enthalpy is another state function that often makes the analysis of a system
simpler. At this point, we can only see that CP can be expressed more simply in terms of the
enthalpy.
Problem 2.12. (a) Give some examples of materials that have a relatively low and relatively
high heat capacity. (b) Why do we have to distinguish between the heat capacity at constant
volume and the heat capacity at constant pressure?
Example 2.3. A water heater holds 150 kg of water. How much energy is required to raise the
water temperature from 18 ◦ C to 50 ◦ C?
Solution. The (mass) speciﬁc heat of water is c = 4184 J/kg K. (The diﬀerence between the speciﬁc
heats of water at constant volume and constant pressure is negligible at room temperatures.) The
energy required to raise the temperature by 32 ◦ C is
$$Q = m c (T_2 - T_1) = 150\ \text{kg} \times (4184\ \text{J/kg K}) \times (50\,^\circ\text{C} - 18\,^\circ\text{C}) = 2 \times 10^{7}\ \text{J}.$$
We have assumed that the speciﬁc heat is constant in this temperature range.
Note that because the kelvin is exactly the same magnitude as a degree Celsius, it often is
more convenient to express temperature diﬀerences in degrees Celsius.
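The arithmetic of Example 2.3 can be checked with a few lines of code. This is only a sketch; the function name is hypothetical, and the specific heat is assumed to be constant over the temperature range, as in the example.

```python
def heating_energy(mass_kg, specific_heat, T_initial, T_final):
    """Q = m c (T2 - T1), assuming the specific heat c is constant over the range."""
    return mass_kg * specific_heat * (T_final - T_initial)

C_WATER = 4184.0   # (mass) specific heat of water in J/(kg K), as quoted in Example 2.3

# Temperatures in degrees Celsius.
Q_celsius = heating_energy(150.0, C_WATER, 18.0, 50.0)

# The same temperatures in kelvin: only the difference T2 - T1 enters.
Q_kelvin = heating_energy(150.0, C_WATER, 18.0 + 273.15, 50.0 + 273.15)

print(f"Q = {Q_celsius:.3e} J")
```

Both calls give the same result to within rounding, which illustrates why temperature differences may be expressed in either unit.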
Example 2.4. A 1.5 kg glass brick is heated to 180 ◦C and then plunged into a cold bath containing
10 kg of water at 20 ◦ C. Assume that none of the water boils and that there is no heating of the
surroundings. What is the ﬁnal temperature of the water and the glass? The speciﬁc heat of glass
is approximately 750 J/kg K.
Solution. Conservation of energy implies that
$$\Delta E_{\text{glass}} + \Delta E_{\text{water}} = 0,$$
or
$$m_{\text{glass}} c_{\text{glass}} (T - T_{\text{glass}}) + m_{\text{water}} c_{\text{water}} (T - T_{\text{water}}) = 0.$$
The final equilibrium temperature T is the same for both. We solve for T and obtain
$$T = \frac{m_{\text{glass}} c_{\text{glass}} T_{\text{glass}} + m_{\text{water}} c_{\text{water}} T_{\text{water}}}{m_{\text{glass}} c_{\text{glass}} + m_{\text{water}} c_{\text{water}}} = \frac{(1.5\ \text{kg})(750\ \text{J/kg K})(180\,^\circ\text{C}) + (10\ \text{kg})(4184\ \text{J/kg K})(20\,^\circ\text{C})}{(1.5\ \text{kg})(750\ \text{J/kg K}) + (10\ \text{kg})(4184\ \text{J/kg K})} = 24.2\,^\circ\text{C}.$$
Example 2.5. The temperature of one mole of helium gas is increased from 20 ◦ C to 40 ◦ C at
constant volume. How much energy is needed to accomplish this temperature change?
Solution. Because the amount of He gas is given in moles, we need to know the molar speciﬁc
heat. From (2.28) and (2.12), we have that cV = 3R/2 = 1.5 × 8.314 = 12.5 J/mole K. Because cV
is constant (an excellent approximation), we have
$$\Delta E = Q = \int_{T_1}^{T_2} C_V\, dT = \nu c_V \int_{T_1}^{T_2} dT = 1\ \text{mole} \times 12.5\ \frac{\text{J}}{\text{mole K}} \times 20\ \text{K} = 250\ \text{J}.$$
Example 2.6. At very low temperatures the heat capacity of an insulating solid is proportional to
T^3 . If we take C = AT^3 for a particular solid, what is the energy needed to raise the temperature
from T1 to T2 ? The difference between CV and CP can be ignored at low temperatures. (In
Section 6.12, we use the Debye theory to express the constant A in terms of the speed of sound
and other parameters and find the range of temperatures for which the T^3 behavior is a reasonable
approximation.)
Solution. Because C is temperature dependent, we have to express the energy added as an integral:
$$Q = \int_{T_1}^{T_2} C(T)\, dT. \tag{2.31}$$
In this case we have
$$Q = A \int_{T_1}^{T_2} T^3\, dT = \frac{A}{4} \left( T_2^4 - T_1^4 \right). \tag{2.32}$$
General relation between CP and CV . The first law can be used to find the general relation (2.36)
between CP and CV . The derivation involves straightforward, but tedious manipulations of thermodynamic derivatives. We give it here to give a preview of the general nature of thermodynamic
arguments.
From (2.29) and (2.30), we have
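$$C_P = \left( \frac{\partial H}{\partial T} \right)_P = \left( \frac{\partial E}{\partial T} \right)_P + P \left( \frac{\partial V}{\partial T} \right)_P. \tag{2.33}$$
If we consider E to be a function of T and V , we can write
$$dE = \left( \frac{\partial E}{\partial T} \right)_V dT + \left( \frac{\partial E}{\partial V} \right)_T dV, \tag{2.34}$$
and hence (by dividing by ∆T and taking the limit ∆T → 0 at constant P )
$$\left( \frac{\partial E}{\partial T} \right)_P = C_V + \left( \frac{\partial E}{\partial V} \right)_T \left( \frac{\partial V}{\partial T} \right)_P. \tag{2.35}$$
If we eliminate (∂E/∂T )P in (2.33) by using (2.35), we obtain our desired result:
$$C_P = C_V + \left( \frac{\partial E}{\partial V} \right)_T \left( \frac{\partial V}{\partial T} \right)_P + P \left( \frac{\partial V}{\partial T} \right)_P. \qquad \text{(general result)} \tag{2.36}$$
Equation (2.36) is a general relation that depends only on the first law. A more useful general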
relation between CP and CV that depends on the second law of thermodynamics will be derived
in Section 7.3.2.
For the special case of an ideal gas, ∂E/∂V = 0 and ∂V /∂T = N k/P , and hence
$$C_P = C_V + N k. \qquad \text{(ideal gas)} \tag{2.37}$$
Note that CP is bigger than CV , a general result for any macroscopic body. Note that we used
the two equations of state for an ideal gas, (2.9) and (2.23), to obtain CP , and we did not have to
make an independent measurement or calculation.
Why is CP bigger than CV ? Unless we prevent it from doing so, a system normally expands
as its temperature increases. The system has to do work on its surroundings as it expands. Hence,
when a system is heated at constant pressure, energy is needed both to increase the temperature
of the system and to do work on its surroundings. In contrast, if the volume is kept constant, no
work is done on the surroundings and the heating only has to supply the energy required to raise
the temperature of the system.
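The result (2.37) can also be checked numerically from the definitions (2.27) and (2.30). The sketch below assumes one mole of a monatomic ideal gas, so that N k is the gas constant, and estimates the derivatives with central finite differences. The function names and the reference state are illustrative choices.

```python
NK = 8.314   # N k for one mole of gas, in J/K (the gas constant R)

def energy(T, V):
    """Energy equation of state (2.23) for a monatomic ideal gas; V does not enter."""
    return 1.5 * NK * T

def volume(T, P):
    """Pressure equation of state P V = N k T, solved for V."""
    return NK * T / P

def enthalpy(T, P):
    """H = E + P V, as in (2.29), written as a function of T and P."""
    V = volume(T, P)
    return energy(T, V) + P * V

def central_difference(f, x, h=1.0e-3):
    """Symmetric finite-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

T0, P0 = 300.0, 1.0e5        # illustrative reference state
V0 = volume(T0, P0)

C_V = central_difference(lambda T: energy(T, V0), T0)     # derivative at constant V
C_P = central_difference(lambda T: enthalpy(T, P0), T0)   # derivative at constant P

print(f"C_V = {C_V:.3f} J/K, C_P = {C_P:.3f} J/K, C_P - C_V = {C_P - C_V:.3f} J/K")
```

The difference C_P − C_V reproduces N k, and the extra term comes entirely from the P V contribution to the enthalpy, that is, from the work done on the surroundings as the gas expands.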
The result (2.37) for CP for an ideal gas illustrates the power of thermodynamics. We used
a general principle, the first law, and the two equations of state to determine CP , a quantity that
we did not know directly. In Chapter 7 we will derive the general relation CP > CV for any
thermodynamic system.

2.11 Adiabatic Processes

So far we have considered processes at constant temperature, constant volume, and constant pressure.14 We have also considered adiabatic processes which occur when the system does not exchange
energy with its surroundings due to a temperature diﬀerence. Note that an adiabatic process need
not be isothermal. For example, a chemical reaction that occurs within a container that is well
insulated is not isothermal.
Problem 2.13. Give an example of an isothermal process that is not adiabatic.
14 These processes are called isothermal, isochoric, and isobaric, respectively.
We now show that the pressure of an ideal gas changes more rapidly for a given change of
volume in a quasistatic adiabatic process than it does in an isothermal process. For an adiabatic
process the ﬁrst law reduces to
$$dE = dW. \qquad \text{(adiabatic process)} \tag{2.38}$$
For an ideal gas we have ∂E/∂V = 0, and hence (2.34) reduces to
$$dE = C_V\, dT = -P\, dV, \qquad \text{(ideal gas only)} \tag{2.39}$$
where we have used (2.38). The easiest way to proceed is to eliminate P in (2.39) using the ideal
gas law P V = N kT :
$$C_V\, dT = -N k T \frac{dV}{V}. \tag{2.40}$$
We next eliminate N k in (2.40) in terms of CP − CV and express (2.40) as
$$\frac{C_V}{C_P - C_V} \frac{dT}{T} = \frac{1}{\gamma - 1} \frac{dT}{T} = -\frac{dV}{V}, \tag{2.41}$$
where the symbol γ is the ratio of the heat capacities:
$$\gamma = \frac{C_P}{C_V}. \tag{2.42}$$
For an ideal gas CV and CP and hence γ are independent of temperature, and we can integrate
(2.41) to obtain
$$T V^{\gamma - 1} = \text{constant}. \qquad \text{(quasistatic adiabatic process)} \tag{2.43}$$
For an ideal monatomic gas, we have from (2.28) and (2.37) that CV = 3N k/2 and CP =
5N k/2, and hence
$$\gamma = 5/3. \qquad \text{(ideal monatomic gas)} \tag{2.44}$$
Problem 2.14. Use (2.43) and the ideal gas pressure equation of state in (2.9) to show that in a
quasistatic adiabatic process P and V are related as
$$P V^{\gamma} = \text{constant}. \tag{2.45}$$
Also show that T and P are related as
$$T P^{(1 - \gamma)/\gamma} = \text{constant}. \tag{2.46}$$
The relations (2.43)–(2.46) hold for a quasistatic adiabatic process of an ideal gas; the relation
(2.45) is the easiest relation to remember. Because γ > 1, the relation (2.45) implies that for
a given volume change, the pressure changes more for an adiabatic process than it does for a
comparable isothermal process for which P V = constant. We can understand the reason for this
diﬀerence as follows. For an isothermal compression the pressure increases and the internal energy
of the gas does not change. For an adiabatic compression the energy increases because we have
done work on the gas and no energy can be transferred to the surroundings. The increase in the
energy causes the temperature to increase. Hence in an adiabatic compression, both the decrease
in the volume and the increase in the temperature cause the pressure to increase faster.
Figure 2.5: A P V diagram for adiabatic and isothermal processes. The two processes begin at
the same initial temperature, but the adiabatic process has a steeper slope and ends at a higher
temperature. (Axes: pressure P versus volume V ; curves: an adiabat and the isotherms at Ti and Tf .)
In Figure 2.5 we show the P V diagram for both isothermal and adiabatic processes. The
adiabatic curve has a steeper slope than the isothermal curves at any point. From (2.45) we see
that the slope of an adiabatic curve for an ideal gas is
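$$\left( \frac{\partial P}{\partial V} \right)_{\text{adiabatic}} = -\gamma \frac{P}{V}, \tag{2.47}$$
in contrast to the slope of an isothermal curve for an ideal gas:
$$\left( \frac{\partial P}{\partial V} \right)_T = -\frac{P}{V}. \tag{2.48}$$
How can the ideal gas relations P V^γ = constant and P V = N kT both be correct? The answer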
is that P V = constant only for an isothermal process. A quasistatic ideal gas process cannot be
both adiabatic (Q = 0) and isothermal (no temperature change). During an adiabatic process, the
temperature of an ideal gas must change.
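A short numerical comparison shows how the two relations coexist. The sketch below compresses an ideal monatomic gas to half its volume along an isotherm and along a quasistatic adiabat, using P V = constant and (2.45), respectively, and then checks that the adiabatic end state still satisfies the ideal gas law once the temperature change from (2.43) is included. The initial state and function names are illustrative.

```python
GAMMA = 5.0 / 3.0   # ideal monatomic gas, from (2.44)

def isothermal_pressure(P1, V1, V2):
    """Final pressure when P V = constant."""
    return P1 * V1 / V2

def adiabatic_pressure(P1, V1, V2, gamma=GAMMA):
    """Final pressure when P V^gamma = constant, as in (2.45)."""
    return P1 * (V1 / V2) ** gamma

def adiabatic_temperature(T1, V1, V2, gamma=GAMMA):
    """Final temperature when T V^(gamma - 1) = constant, as in (2.43)."""
    return T1 * (V1 / V2) ** (gamma - 1.0)

P1, V1, T1 = 1.0e5, 1.0e-3, 300.0   # illustrative initial state (Pa, m^3, K)
V2 = 0.5e-3                          # volume halved

P_isothermal = isothermal_pressure(P1, V1, V2)
P_adiabatic = adiabatic_pressure(P1, V1, V2)
T2 = adiabatic_temperature(T1, V1, V2)

print(f"isothermal: P2/P1 = {P_isothermal / P1:.3f}, T stays at {T1:.0f} K")
print(f"adiabatic:  P2/P1 = {P_adiabatic / P1:.3f}, T rises to {T2:.1f} K")
# P V / T is proportional to N k and must be the same in both end states.
print(f"P V / T before: {P1 * V1 / T1:.6f}, after adiabat: {P_adiabatic * V2 / T2:.6f}")
```

The ratio P V /T is unchanged along the adiabat, so P V = N kT holds throughout; it is only the product P V that is not constant, because T changes.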
Problem 2.15. Although we do work on an ideal gas when we compress it isothermally, why does
the energy of the gas not increase?
Example 2.7. Adiabatic and isothermal expansion. Two identical systems each contain ν =
0.06 mole of an ideal gas at T = 300 K and P = 2.0 × 10^5 Pa. The pressure in the two systems is
reduced by a factor of two allowing the systems to expand, one adiabatically and one isothermally.
What are the final temperatures and volumes of each system? Assume that γ = 5/3.
Solution. The initial volume V1 is given by
$$V_1 = \frac{\nu R T_1}{P_1} = \frac{0.060\ \text{mole} \times 8.3\ \text{J/(K mole)} \times 300\ \text{K}}{2.0 \times 10^{5}\ \text{Pa}} = 7.5 \times 10^{-4}\ \text{m}^3.$$
For the isothermal system, P V remains constant, so the volume doubles as the pressure
decreases by a factor of two and hence V2 = 1.5 × 10^−3 m^3 . Because the process is isothermal, the
temperature remains at 300 K.
For adiabatic expansion we have
$$V_2^{\gamma} = \frac{P_1 V_1^{\gamma}}{P_2},$$
or
$$V_2 = \left( \frac{P_1}{P_2} \right)^{1/\gamma} V_1 = 2^{3/5} \times 7.5 \times 10^{-4}\ \text{m}^3 = 1.14 \times 10^{-3}\ \text{m}^3.$$
In this case we see that for a given pressure change, the volume change for the adiabatic process
is smaller. We leave it as an exercise to show that T2 ≈ 227 K.
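The numbers in Example 2.7 can be reproduced with the sketch below, which uses the same inputs as the example (ν = 0.060 mole, R = 8.3 J/(K mole), γ = 5/3) and the relation (2.45). The variable names are illustrative. With these inputs the adiabatic final volume is about 1.1 × 10^−3 m^3, which is smaller than the isothermal final volume, and the ideal gas law then gives an adiabatic final temperature of about 227 K.

```python
R = 8.3              # gas constant in J/(K mole), rounded as in Example 2.7
GAMMA = 5.0 / 3.0    # ratio of heat capacities assumed in the example

nu = 0.060           # amount of gas in moles
T1 = 300.0           # initial temperature in K
P1 = 2.0e5           # initial pressure in Pa
P2 = P1 / 2.0        # pressure reduced by a factor of two

V1 = nu * R * T1 / P1                            # ideal gas law

# Isothermal expansion: P V = constant, T unchanged.
V2_isothermal = V1 * P1 / P2
T2_isothermal = T1

# Quasistatic adiabatic expansion: P V^gamma = constant, (2.45).
V2_adiabatic = (P1 / P2) ** (1.0 / GAMMA) * V1
T2_adiabatic = P2 * V2_adiabatic / (nu * R)      # ideal gas law at the final state

print(f"V1 = {V1:.2e} m^3")
print(f"isothermal: V2 = {V2_isothermal:.2e} m^3, T2 = {T2_isothermal:.0f} K")
print(f"adiabatic:  V2 = {V2_adiabatic:.2e} m^3, T2 = {T2_adiabatic:.0f} K")
```

The small difference from the value 1.14 × 10^−3 m^3 quoted in the example arises because the example rounds V1 to 7.5 × 10^−4 m^3 before multiplying by 2^(3/5).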
Problem 2.16. Air initially at 20◦ C is compressed by a factor of 15. (a) What is the ﬁnal
temperature assuming that the compression is adiabatic and γ = 1.4, the value of γ for air at
the relevant range of temperatures? By what factor does the pressure increase? (b) What is the
ﬁnal pressure assuming the compression is isothermal? (c) In which case does the pressure change
more?
How much work is done in a quasistatic adiabatic process? Because Q = 0, ∆E = W . For an
ideal gas, ∆E = CV ∆T for any process. Hence for a quasistatic adiabatic process
$$W = C_V (T_2 - T_1). \qquad \text{(quasistatic adiabatic process for an ideal gas)} \tag{2.49}$$
We leave it to Problem 2.17 to show that (2.49) can be expressed in terms of the pressure and
volume as
$$W = \frac{P_2 V_2 - P_1 V_1}{\gamma - 1}. \tag{2.50}$$
Problem 2.17. Another way to derive (2.50), the work done in a quasistatic adiabatic process,
is to use the relation (2.45). Work out the steps.
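Before working out the algebra, it can be reassuring to confirm numerically that (2.49) and (2.50) agree with a direct evaluation of W = −∫P dV along the adiabat. The sketch below does this for one mole of an ideal monatomic gas compressed to half its volume; the initial state, the function names, and the midpoint-rule integration are illustrative choices and are not a substitute for the derivation asked for in the problem.

```python
GAMMA = 5.0 / 3.0    # ideal monatomic gas
NK = 8.314           # N k for one mole, in J/K
C_V = 1.5 * NK       # heat capacity at constant volume, from (2.28)

def work_from_temperatures(T1, T2):
    """W = C_V (T2 - T1), as in (2.49)."""
    return C_V * (T2 - T1)

def work_from_pressure_volume(P1, V1, P2, V2):
    """W = (P2 V2 - P1 V1) / (gamma - 1), as in (2.50)."""
    return (P2 * V2 - P1 * V1) / (GAMMA - 1.0)

def work_by_integration(P1, V1, V2, steps=20000):
    """W = -integral of P dV along P V^gamma = constant (midpoint rule)."""
    constant = P1 * V1 ** GAMMA
    dV = (V2 - V1) / steps
    total = 0.0
    for i in range(steps):
        V_mid = V1 + (i + 0.5) * dV
        total += constant / V_mid ** GAMMA * dV
    return -total

# Illustrative initial state and a compression to half the volume.
T1, V1, V2 = 300.0, 1.0e-2, 0.5e-2
P1 = NK * T1 / V1
T2 = T1 * (V1 / V2) ** (GAMMA - 1.0)   # from (2.43)
P2 = NK * T2 / V2

W_T = work_from_temperatures(T1, T2)
W_PV = work_from_pressure_volume(P1, V1, P2, V2)
W_int = work_by_integration(P1, V1, V2)

print(f"(2.49): {W_T:.2f} J, (2.50): {W_PV:.2f} J, integral: {W_int:.2f} J")
```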
Example 2.8. Compression in a Diesel engine occurs quickly enough so that very little heating
of the environment occurs and thus the process may be considered adiabatic. If a temperature of
500 ◦ C is required for ignition, what is the compression ratio? Assume that γ = 1.4 for air and the
temperature is 20 ◦ C before compression.
Solution. Equation (2.43) gives the relation between T and V for a quasistatic adiabatic process.
We write T1 and V1 and T2 and V2 for the temperature and volume at the beginning and the end
of the piston stroke. Then (2.43) becomes
$$T_1 V_1^{\gamma - 1} = T_2 V_2^{\gamma - 1}. \tag{2.51}$$
Hence the compression ratio V1 /V2 is
$$\frac{V_1}{V_2} = \left( \frac{T_2}{T_1} \right)^{1/(\gamma - 1)} = \left( \frac{773\ \text{K}}{293\ \text{K}} \right)^{1/0.4} \approx 11.$$
Of course it is only an approximation to assume that the compression is quasistatic.

2.12 The Second Law of Thermodynamics

The consequences of the first law of thermodynamics can be summarized by the statements that
(a) energy is conserved in thermal processes and (b) heating is a form of energy transfer. We also
noted that the internal energy of a system can be identiﬁed with the sum of the potential and
kinetic energies of the particles (in a reference frame in which the center of mass velocity is zero).
There are many processes that do not occur in nature, but whose occurrence would be consistent with the ﬁrst law. For example, the ﬁrst law does not prohibit energy from being transferred
spontaneously from a cold body to a hot body, yet it has never been observed. There is another
property of systems that must be taken into account, and this property is called the entropy.15
Entropy is another example of a state function. One of the remarkable achievements of the
nineteenth century was the reasoning that such a state function must exist without any idea of
how to measure its value directly. In Chapter 4 we will learn about the relationship between the
entropy and the number of possible microscopic states, but for now we will follow a logic that does
not depend on any knowledge of the microscopic behavior.
It is not uncommon to use heating as a means of doing work. For example, power plants burn
oil or coal to turn water into steam which in turn turns a turbine in a magnetic ﬁeld creating
electricity which then can do useful work in your home. Can we completely convert all the energy
created by chemical reactions into work? Or more simply can we cool a system and use all the
energy lost by the system to do work? Our everyday experience tells us that we cannot. If it
were possible, we could power a boat to cross the Atlantic by cooling the sea and transferring
energy from the sea to drive the propellers. We would need no fuel and travel would be much
cheaper. Or instead of heating a ﬂuid by doing electrical work on a resistor, we could consider
a process in which a resistor cools the ﬂuid and produces electrical energy at its terminals. The
fact that these processes do not occur is summarized in one of the statements of the second law of
thermodynamics:
No process is possible whose sole result is the complete conversion of energy transferred
by heating into work (Kelvin statement).
The second law implies that a perpetual motion machine of the second kind does not exist. Such
a machine would convert heat completely into work (see Figure 2.6).
15 This thermodynamic variable was named by Rudolf Clausius, who wrote in 1865 that he formed the word
entropy (from the Greek word for transformation) so as to be as similar as possible to the word energy.
Figure 2.6: A machine that converts energy transferred by heating into work with 100% efficiency
violates the Kelvin statement of the second law of thermodynamics. (Diagram labels: energy absorbed from atmosphere, Q, magic box, light.)
What about the isothermal expansion of an ideal gas? Does this process violate the second
law? When the gas expands, it does work on the piston which causes the gas to lose energy.
Because the process is isothermal, the gas must absorb energy so that its internal energy remains
constant. (The internal energy of an ideal gas depends only on the temperature.) We have
$$\Delta E = Q + W = 0. \tag{2.52}$$
We see that W = −Q; that is, the work done by the gas, −W , equals the energy Q absorbed by
heating. We conclude that we have completely converted the absorbed energy into work. However, this
conversion does not violate the Kelvin statement because the state of the gas is diﬀerent at the
end than at the beginning. We cannot use the gas to make an engine.
Another statement of the second law based on the empirical observation that energy does not
spontaneously go from a colder to a hotter body can be stated as
No process is possible whose sole result is cooling a colder body and heating a hotter
body (Clausius statement).
The Kelvin and the Clausius statements of the second law look diﬀerent, but each statement implies
the other so their consequences are identical (see Appendix 2A).
A more abstract version of the second law that is not based directly on experimental observations, but that is more convenient in many contexts, can be expressed as
There exists an additive function of state known as the entropy S that can never
decrease in an isolated system.
Because the entropy cannot decrease in an isolated system, we conclude that the entropy is a
maximum for an isolated system in equilibrium. The term additive means that if the entropy of two
systems is SA and SB , respectively, the total entropy of the combined system is Stotal = SA + SB .
In the following we adopt this version of the second law and show that the Kelvin and Clausius
statements follow from it.
The statement of the second law in terms of the entropy is applicable only to isolated systems
(a system enclosed by insulating, rigid, and impermeable walls). In general, the system of interest
can exchange energy with its surroundings. In many cases the surroundings may be idealized as
a large body that does not interact with the rest of the universe. For example, we can take the
surroundings of a cup of hot water to be the atmosphere in the room. In this case we can treat
the composite system, system plus surroundings, as isolated. For the composite system, we have
for any process
$$\Delta S_{\text{composite}} \geq 0, \tag{2.53}$$
where Scomposite is the entropy of the system plus its surroundings.
If a change is reversible, we cannot have ∆Scomposite > 0, because if we reverse the change we
would have ∆Scomposite < 0, a violation of the Clausius statement. Hence, the only possibility is
that
$$\Delta S_{\text{composite}} = 0. \qquad \text{(reversible process)} \tag{2.54}$$
To avoid confusion, we will use the term reversible to be equivalent to a constant entropy process.
The condition for a process to be reversible requires only that the total entropy of a closed system
be constant; the entropies of its parts may either increase or decrease.

2.13 The Thermodynamic Temperature

The Clausius and Kelvin statements of the second law arose from the importance of heat engines
to the development of thermodynamics. A seemingly diﬀerent purpose of thermodynamics is to
determine the conditions of equilibrium. These two purposes are linked by the fact that whenever
there is a diﬀerence of temperature, work can be extracted. So the possibility of work and the
absence of equilibrium are related.
In the following, we derive the properties of the thermodynamic temperature from the second law. In Section 2.16 we will show that this temperature is the same as the ideal gas scale
temperature.
Consider an isolated composite system that is partitioned into two subsystems A and B by a
ﬁxed, impermeable, insulating wall. For the composite system we have
$$E = E_A + E_B = \text{constant}, \tag{2.55}$$
V = VA + VB = constant, and N = NA + NB = constant. Because the entropy is additive, we can
write the total entropy as
$$S(E_A, V_A, N_A, E_B, V_B, N_B) = S_A(E_A, V_A, N_A) + S_B(E_B, V_B, N_B). \tag{2.56}$$
Most divisions of energy, EA and EB , between subsystems A and B do not correspond to thermal
equilibrium.
For thermal equilibrium to be established, we replace the ﬁxed, impermeable, insulating wall
by a ﬁxed, impermeable, conducting wall so that the two subsystems are in thermal contact and
energy transfer by heating or cooling can occur. We say that we have relaxed an internal constraint.
According to our statement of the second law, the values of EA and EB will be such that the entropy
of the system becomes a maximum. To find the value of EA that maximizes S as given by (2.56),
we calculate
$$dS = \left( \frac{\partial S_A}{\partial E_A} \right)_{V_A, N_A} dE_A + \left( \frac{\partial S_B}{\partial E_B} \right)_{V_B, N_B} dE_B. \tag{2.57}$$
Because the total energy of the system is conserved, we have dEB = −dEA , and hence
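$$dS = \left[ \left( \frac{\partial S_A}{\partial E_A} \right)_{V_A, N_A} - \left( \frac{\partial S_B}{\partial E_B} \right)_{V_B, N_B} \right] dE_A. \tag{2.58}$$
The condition for equilibrium is that dS = 0 for arbitrary values of dEA , and hence
$$\left( \frac{\partial S_A}{\partial E_A} \right)_{V_A, N_A} = \left( \frac{\partial S_B}{\partial E_B} \right)_{V_B, N_B}. \tag{2.59}$$
Because the temperatures of the two systems are equal in thermal equilibrium, we conclude that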
the derivative ∂S/∂E must be associated with the temperature. We will ﬁnd that it is convenient
to deﬁne the thermodynamic temperature T as
$$\frac{1}{T} = \left( \frac{\partial S}{\partial E} \right)_{V, N}, \qquad \text{(thermodynamic definition of temperature)} \tag{2.60}$$
which implies that the condition for thermal equilibrium is
$$\frac{1}{T_A} = \frac{1}{T_B}. \tag{2.61}$$
Of course we can rewrite (2.61) as TA = TB .
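The maximization that leads to (2.59) and (2.61) can be illustrated numerically. The sketch below assumes, purely for illustration, that each subsystem is a monatomic ideal gas at fixed volume with entropy S = (3/2)N ln E plus a constant, in units with k = 1; this entropy function has not been derived at this point in the text and is an assumption of the sketch. With that assumption, (2.60) gives T = 2E/(3N ). The particle numbers, the total energy, and the golden-section search are likewise illustrative choices.

```python
import math

def entropy(E, N):
    """Assumed entropy of a monatomic ideal gas at fixed volume, k = 1, constants dropped."""
    return 1.5 * N * math.log(E)

def temperature(E, N):
    """T from 1/T = dS/dE = 1.5 N / E, the definition (2.60) applied to the assumed entropy."""
    return E / (1.5 * N)

def total_entropy(E_A, E_total, N_A, N_B):
    """Additive entropy (2.56) with the constraint E_B = E_total - E_A from (2.55)."""
    return entropy(E_A, N_A) + entropy(E_total - E_A, N_B)

def golden_section_maximum(f, lo, hi, tol=1.0e-8):
    """Locate the maximum of a function with a single peak on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d    # the maximum lies in [a, d]
        else:
            a = c    # the maximum lies in [c, b]
    return 0.5 * (a + b)

N_A, N_B = 100, 300      # illustrative particle numbers
E_total = 600.0          # illustrative total energy (arbitrary units)

E_A = golden_section_maximum(
    lambda E: total_entropy(E, E_total, N_A, N_B), 1.0e-6, E_total - 1.0e-6
)
T_A = temperature(E_A, N_A)
T_B = temperature(E_total - E_A, N_B)

print(f"E_A = {E_A:.4f}, T_A = {T_A:.5f}, T_B = {T_B:.5f}")
```

The energy division that maximizes the total entropy is the one for which the two temperatures agree, as (2.61) requires.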
We have found that if two systems are separated by a conducting wall, energy will be transferred until each of the systems reaches the same temperature. We now suppose that the two
subsystems are initially separated by an insulating wall and that the temperatures of the two subsystems are almost equal with TA > TB . If this constraint is removed, we know that energy will
be transferred across the conducting wall and the entropy of the composite system will increase.
From (2.58) we can write the change in entropy as
$$\Delta S \approx \left( \frac{1}{T_A} - \frac{1}{T_B} \right) \Delta E_A > 0, \tag{2.62}$$
where TA and TB are the initial values of the temperatures. The condition that TA > TB requires
that ∆EA < 0 in order for ∆S > 0 in (2.62) to be satisﬁed. Hence, we conclude that the deﬁnition
(2.60) of the thermodynamic temperature implies that energy is transferred from a system with
a higher value of T to a system with a lower value of T . We can express (2.62) as: No process
exists in which a cold body cools oﬀ while a hotter body heats up and the constraints on the bodies
and the state of their surroundings are left unchanged. We recognize this statement as the Clausius
statement of the second law.
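The sign argument based on (2.62) is easy to check with numbers. In the sketch below the temperatures and the amount of energy transferred are illustrative values, and the function name is hypothetical.

```python
def entropy_change(T_A, T_B, delta_E_A):
    """Approximate change in total entropy from (2.62) when subsystem A gains delta_E_A."""
    return (1.0 / T_A - 1.0 / T_B) * delta_E_A

T_A, T_B = 310.0, 300.0   # A is slightly hotter than B (kelvin)

dS_hot_to_cold = entropy_change(T_A, T_B, -1.0)   # A loses 1 J to B
dS_cold_to_hot = entropy_change(T_A, T_B, +1.0)   # A gains 1 J from B

print(f"energy from hot to cold: Delta S = {dS_hot_to_cold:+.3e} J/K")
print(f"energy from cold to hot: Delta S = {dS_cold_to_hot:+.3e} J/K")
```

Only the first case has ∆S > 0, so only energy transfer from the hotter to the colder subsystem is consistent with the entropy statement of the second law.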
The definition (2.60) of T is not unique, and we could have replaced 1/T by other functions
of temperature such as 1/T^2 or 1/√T . However, we will find in Section 2.16 that the definition
(2.60) implies that the thermodynamic temperature is identical to the ideal gas scale temperature.
Note that the inverse temperature can be interpreted as the response of the entropy to a
change in the energy of the system. In Section 2.17, we will derive the condition for mechanical
equilibrium, and in Section 4.5 we will derive the condition for chemical equilibrium. These two
conditions complement the condition for thermal equilibrium. All three conditions must be satisﬁed
for thermodynamic equilibrium to be established.

2.14 The Second Law and Heat Engines

A body that can change the temperature of another body without changing its own temperature
and without doing work is known as a heat bath. The term is archaic, but we will adopt it because
of its common usage. We also will sometimes use the term thermal bath. Depending on the circumstances, the earth's ocean and atmosphere can act as either a heat source
or a heat sink. If we want
to measure the electrical conductivity of a small block of copper at a certain temperature, we can
place it into a large body of water that is at the desired temperature. The temperature of the
copper will become equal to the temperature of the large body of water, whose temperature will
be unaﬀected by the copper.
For pure heating or cooling the increase in the entropy is given by
\[
dS = \left(\frac{\partial S}{\partial E}\right)_{V,N} dE. \tag{2.63}
\]
In this case dE = dQ because no work is done. If we express the partial derivative in (2.63) in
terms of T, we can rewrite (2.63) as
\[
dS = \frac{dQ}{T}. \qquad \text{(pure heating)} \tag{2.64}
\]
We emphasize that the relation (2.64) holds only for quasistatic changes. Note that (2.64) implies
that the entropy does not change in a quasistatic, adiabatic process.
We now use (2.64) to discuss the problem that stimulated the development of thermodynamics
– the eﬃciency of heat engines. We know that an engine converts energy from a heat source to
work and returns to its initial state. According to (2.64), the transfer of energy from a heat source
lowers the entropy of the source. If the energy transferred is used to do work, the work done
must be done on some other system. Because the process of doing work may be quasistatic (for
example, compressing a gas), the work need not involve a change of entropy. But if all of the
energy transferred is converted into work, the total entropy would decrease, and we would violate
the entropy statement of the second law. Hence, we arrive at the conclusion summarized in Kelvin’s
statement of the second law: no process is possible whose sole result is the complete conversion of
energy into work.
The simplest possible engine works in conjunction with a heat source at temperature Thigh
and a heat sink at temperature Tlow . In one cycle the heat source transfers energy Qhigh to the
engine, and the engine does work W and transfers energy Qlow to the heat sink (see Figure 2.7).
At the end of one cycle, the energy and entropy of the engine are unchanged because they return
to their original values. An engine of this type is known as a Carnot engine.
By energy conservation, we have Qhigh = W + Qlow , or
\[
W = Q_{\rm high} - Q_{\rm low}, \tag{2.65}
\]
where in this context Qhigh and Qlow are positive quantities. From the second law we have that
\[
\Delta S_{\rm total} = \Delta S_{\rm high} + \Delta S_{\rm low} = -\frac{Q_{\rm high}}{T_{\rm high}} + \frac{Q_{\rm low}}{T_{\rm low}} \geq 0. \tag{2.66}
\]

Figure 2.7: Schematic energy transfer diagram for an ideal heat engine. The heat source at temperature Thigh transfers energy Qhigh to the engine, which does work W and transfers energy Qlow to the heat sink at temperature Tlow. By convention, the
quantities Qhigh, Qlow, and W are taken to be positive.

We rewrite (2.66) as
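\[
\frac{Q_{\rm low}}{Q_{\rm high}} \geq \frac{T_{\rm low}}{T_{\rm high}}. \tag{2.67}
\]
The thermal efficiency η of the engine is defined as
\begin{align}
\eta &= \frac{\text{what you obtain}}{\text{what you pay for}} \tag{2.68} \\
&= \frac{W}{Q_{\rm high}} = \frac{Q_{\rm high} - Q_{\rm low}}{Q_{\rm high}} = 1 - \frac{Q_{\rm low}}{Q_{\rm high}}. \tag{2.69}
\end{align}
From (2.69) we see that the engine is most efficient when the ratio Qlow/Qhigh is as small as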
possible. Equation (2.67) shows that Qlow /Qhigh is a minimum when the cycle is reversible so that
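\[
\Delta S_{\rm total} = 0, \tag{2.70}
\]
and
\[
\frac{Q_{\rm low}}{Q_{\rm high}} = \frac{T_{\rm low}}{T_{\rm high}}. \tag{2.71}
\]
For these conditions we find that the maximum thermal efficiency is
\[
\eta = 1 - \frac{T_{\rm low}}{T_{\rm high}}. \qquad \text{(maximum thermal efficiency)} \tag{2.72}
\]
Note that the temperature in (2.72) is the thermodynamic temperature.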
The result (2.72) illustrates the remarkable power of thermodynamics. We have concluded
that all reversible engines operating between a heat source and a heat sink with the same pair of temperatures have the same efficiency and that no irreversible engine working between the same
pair of temperatures can have a greater eﬃciency. This statement is known as Carnot’s principle.
That is, based on general principles, we have been able to determine the maximum eﬃciency of a
reversible engine without knowing anything about the details of the engine.
Of course, real engines never reach the maximum thermodynamic eﬃciency because of the
presence of mechanical friction and because the processes cannot really be quasistatic. For these
reasons, real engines seldom attain more than 30–40% of the maximum thermodynamic eﬃciency.
Nevertheless, the basic principles of thermodynamics are an important factor in their design. We
will discuss other factors that are important in the design of heat engines in Chapter 7.
Example 2.9. A Carnot engine
A Carnot engine extracts 240 J from a heat source and rejects 100 J to a heat sink at 15 °C during
each cycle. How much work does the engine do in one cycle? What is its eﬃciency? What is the
temperature of the heat source?
Solution. From the ﬁrst law we have
W = 240 J − 100 J = 140 J.
The eﬃciency is given by
\[
\eta = \frac{W}{Q_{\rm high}} = \frac{140}{240} = 0.583 = 58.3\%.
\]
We can use this result for η and the general relation (2.72) to solve for Thigh:
\[
T_{\rm high} = \frac{T_{\rm low}}{1 - \eta} = \frac{288\ {\rm K}}{1 - 0.583} = 691\ {\rm K}.
\]
Note that to calculate the efficiency, we must work with the thermodynamic temperature.
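The arithmetic of Example 2.9 can be checked with a few lines of Python. This is only a minimal sketch: the function name `carnot_from_heats` and its argument names are illustrative choices, not notation from the text, and the sink temperature is taken as 288 K, as in the solution above.

```python
def carnot_from_heats(q_high, q_low, t_low):
    """Return (work, efficiency, t_high) for a Carnot engine.

    q_high and q_low are the (positive) energies absorbed from the source
    and rejected to the sink per cycle; t_low is the sink temperature in K.
    """
    work = q_high - q_low             # first law, Eq. (2.65)
    eta = work / q_high               # efficiency, Eq. (2.69)
    t_high = t_low / (1.0 - eta)      # invert Eq. (2.72) for T_high
    return work, eta, t_high


work, eta, t_high = carnot_from_heats(240.0, 100.0, 288.0)
print(f"W = {work:.0f} J, eta = {eta:.3f}, T_high = {t_high:.0f} K")
```

The inversion for Thigh is valid only because the engine is assumed to be a Carnot (reversible) engine; for an irreversible engine (2.72) gives only an upper bound on η.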
Example 2.10. The cycle of a hypothetical engine is illustrated in Figure 2.8. Let P1 = 1 × 10^6
Pa, P2 = 2 × 10^6 Pa, V1 = 5 × 10^−3 m^3, and V2 = 25 × 10^−3 m^3. If the energy absorbed by
heating the engine is 5 × 10^4 J, what is the efficiency of the engine?
Solution. The work done by the engine equals the area enclosed:
\[
W = \frac{1}{2}(P_2 - P_1)(V_2 - V_1).
\]
Confirm that W = 1 × 10^4 J. The efficiency is given by
\[
\eta = \frac{W}{Q_{\rm absorbed}} = \frac{1 \times 10^4}{5 \times 10^4} = 0.20.
\]
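The numbers in Example 2.10 can be confirmed directly. The sketch below assumes, as the factor of 1/2 in the solution suggests, that the cycle encloses a triangular area in the P-V plane; the function name `triangle_cycle_work` is a hypothetical helper introduced only for this check.

```python
def triangle_cycle_work(p1, p2, v1, v2):
    """Area of a triangular cycle in the P-V plane, in joules (SI inputs)."""
    return 0.5 * (p2 - p1) * (v2 - v1)


q_absorbed = 5e4                                   # J, given in the example
work = triangle_cycle_work(1e6, 2e6, 5e-3, 25e-3)  # Pa and m^3
efficiency = work / q_absorbed
print(f"W = {work:.3g} J, efficiency = {efficiency:.2f}")
```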
The maximum eﬃciency of a heat engine depends on the temperatures Thigh and Tlow in
a simple way and not on the details of the cycle or working substance. The Carnot cycle is a
particular sequence of idealized processes of an ideal gas that yields the maximum thermodynamic
eﬃciency given in (2.72). The four steps of the Carnot cycle (two adiabatic and two isothermal
steps) are illustrated in Figure 2.9. The initial state is at the point A. The gas is in contact with a
hot heat bath at temperature Thigh so that the temperature of the gas also is Thigh . The piston is
pushed in as far as possible so the volume is reduced. As a result of the relatively high temperature
and small volume, the pressure of the gas is high.

Figure 2.8: The cycle of the hypothetical engine considered in Example 2.10 (pressure P versus volume V, with P1, P2, V1, and V2 marked on the axes).

1. A → B, isothermal expansion. The gas expands while it is in contact with the heat source.
During the expansion the high pressure gas pushes on the piston and the piston turns a
crank. This step is a power stroke of the engine and the engine does work. To keep the gas
at the same temperature, the engine must absorb energy by being heated by the heat source.
We could compress the gas isothermally and return the gas to its initial state. Although
this step would complete the cycle, exactly the same amount of work would be needed to
push the piston back to its original position and hence no net work would be done. To make
the cycle useful, we have to choose a cycle so that not all the work of the power stroke is lost
in restoring the gas to its initial pressure, temperature, and volume. The idea is to reduce
the pressure of the gas so that during the compression step less work has to be done. One
way of reducing the pressure is to lower the temperature of the gas by doing an adiabatic
expansion.
2. B → C , adiabatic expansion. We remove the thermal contact of the gas with the hot bath
and allow the volume to continue to increase so that the gas expands adiabatically. Both the
pressure and the temperature of the gas fall. The step from B → C is still a power stroke,
but now we are cashing in on the energy stored in the gas, because it can no longer take
energy from the heat source.
3. C → D, isothermal compression. We now begin to restore the gas to its initial condition.
At C the gas is placed in contact with the heat sink at temperature Tlow , to ensure that the
pressure remains low. We now do work on the gas by pushing on the piston and compressing
the gas. As the gas is compressed, the temperature of the gas tends to rise, but the thermal
contact with the cold bath ensures that the temperature remains at the same temperature
Tlow . The extra energy is dumped to the heat sink.
4. D → A, adiabatic compression. At D the volume is almost what it was initially, but the
temperature of the gas is too low. Before the piston returns to its initial state, we remove
the contact with the heat sink and allow the work of adiabatic compression to raise the
temperature of the gas to Thigh.

Figure 2.9: The four steps of the Carnot cycle.

These four steps represent a complete cycle and our idealized engine is ready to go through
another cycle. Note that a net amount of work has been done, because more work was done by
the gas during its power strokes than was done on the gas while it was compressed. The reason is
that the work done during the compression steps was against a lower pressure. The result is that
we have extracted useful work. But some of the energy of the gas was discarded into the heat sink
while the gas was being compressed. Hence, the price we have had to pay to do work by having
the gas heated by the heat source is to throw away some of the energy to the heat sink.
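The four steps described above can be followed numerically. The sketch below is an illustration under stated assumptions, not part of the original discussion: the working substance is taken to be a monatomic ideal gas (γ = 5/3), Nk is set to 1 in arbitrary units, and the work is obtained by a simple trapezoidal integration of P dV along each step. All function names are hypothetical.

```python
GAMMA = 5.0 / 3.0   # assumption: monatomic ideal gas
NK = 1.0            # assumption: N*k = 1 in arbitrary units


def work_by_gas(pressure, v_start, v_end, steps=20000):
    """Trapezoidal estimate of the integral of P dV from v_start to v_end."""
    dv = (v_end - v_start) / steps
    total = 0.5 * (pressure(v_start) + pressure(v_end))
    for i in range(1, steps):
        total += pressure(v_start + i * dv)
    return total * dv


def carnot_cycle(t_high, t_low, v_a, v_b):
    """Return (net work by gas, energy absorbed from source, efficiency)."""
    # On an adiabat T V^(gamma-1) is constant, which fixes V_C and V_D.
    ratio = (t_high / t_low) ** (1.0 / (GAMMA - 1.0))
    v_c, v_d = v_b * ratio, v_a * ratio

    def isotherm(t):
        return lambda v: NK * t / v

    def adiabat(t0, v0):
        p0 = NK * t0 / v0
        return lambda v: p0 * (v0 / v) ** GAMMA

    w_ab = work_by_gas(isotherm(t_high), v_a, v_b)       # power stroke
    w_bc = work_by_gas(adiabat(t_high, v_b), v_b, v_c)   # power stroke
    w_cd = work_by_gas(isotherm(t_low), v_c, v_d)        # negative: compression
    w_da = work_by_gas(adiabat(t_low, v_d), v_d, v_a)    # negative: compression
    w_net = w_ab + w_bc + w_cd + w_da
    q_high = w_ab   # Delta E = 0 on the hot isotherm, so Q_high equals the work done
    return w_net, q_high, w_net / q_high


w_net, q_high, eta = carnot_cycle(400.0, 300.0, 1.0, 2.0)
print(f"eta (numerical) = {eta:.6f}, 1 - T_low/T_high = {1.0 - 300.0 / 400.0:.6f}")
```

The work done on the two adiabats cancels, so the net work comes entirely from the difference between the two isotherms, in line with the qualitative argument above.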
Example 2.11. Work out the changes in the various thermodynamic quantities of interest during
each step of the Carnot cycle and show that the eﬃciency of a Carnot cycle whose working substance
is an ideal gas is given by η = 1 − Tlow/Thigh.
Solution. We will use the P V diagram for the engine shown in Figure 2.9. During the isothermal
expansion from A to B, energy Qhigh is absorbed by the gas by heating at temperature Thigh . The
expanding gas does a positive amount of work against its environment. Because ∆E = 0 for an
ideal gas along an isotherm,
\[
Q_{\rm high} = -W_{A\to B} = NkT_{\rm high}\ln\frac{V_B}{V_A}, \tag{2.73}
\]
where WA→B is the (negative) work done on the gas.
During the adiabatic expansion from B → C, QB→C = 0 and WB→C = CV (TC − TB).
Similarly, WC→D = −NkTlow ln(VD/VC), and
\[
Q_{\rm low} = -NkT_{\rm low}\ln\frac{V_D}{V_C}. \tag{2.74}
\]
(By convention Qhigh and Qlow are both positive.) Finally, during the adiabatic compression from
D → A, QD→A = 0 and WD→A = CV (TA − TD). We also have Wnet = Qhigh − Qlow.
Because the product TV^(γ−1) is a constant in a quasistatic adiabatic process, we have
\begin{align}
T_{\rm high} V_B^{\gamma-1} &= T_{\rm low} V_C^{\gamma-1}, \tag{2.75a} \\
T_{\rm low} V_D^{\gamma-1} &= T_{\rm high} V_A^{\gamma-1}, \tag{2.75b}
\end{align}
which implies that
\[
\frac{V_B}{V_A} = \frac{V_C}{V_D}. \tag{2.76}
\]
The net work is given by
\[
W_{\rm net} = Q_{\rm high} - Q_{\rm low} = Nk(T_{\rm high} - T_{\rm low})\ln\frac{V_B}{V_A}. \tag{2.77}
\]
The efficiency is given by
\[
\eta = \frac{W_{\rm net}}{Q_{\rm high}} = \frac{T_{\rm high} - T_{\rm low}}{T_{\rm high}}, \tag{2.78}
\]
as was found earlier by general arguments.

2.15 Entropy Changes

As we have mentioned, the impetus for developing thermodynamics was the industrial revolution
and the eﬃciency of engines. However, similar reasoning can be applied to other macroscopic
systems to calculate the change in entropy.
Example 2.12. A solid with heat capacity C is taken from an initial temperature T1 to a ﬁnal
temperature T2 . What is its change in entropy? (Ignore the small diﬀerence in the heat capacities
at constant volume and constant pressure.)
Solution. We assume that the temperature of the solid is increased by putting the solid in contact
with a succession of heat baths at temperatures separated by a small amount ∆T . Then the
entropy change is given by
\[
S_2 - S_1 = \int_{T_1}^{T_2} \frac{dQ}{T} = \int_{T_1}^{T_2} C(T)\,\frac{dT}{T}. \tag{2.79}
\]
Because the heat capacity C is a constant, we find
\[
\Delta S = S_2 - S_1 = C\int_{T_1}^{T_2}\frac{dT}{T} = C\ln\frac{T_2}{T_1}. \tag{2.80}
\]
Note that if T2 > T1, the entropy has increased.
How can we measure the entropy of a solid? We know how to measure the temperature and
the energy, but we have no entropy meter. Instead we have to determine the entropy indirectly. If
the volume is held constant, we can determine the temperature dependence of the entropy by doing
many successive measurements of the heat capacity and by doing the integral in (2.79). In practice,
such an integral can be done numerically. Note that such a determination gives only an entropy
diﬀerence. We will discuss how to determine the absolute value of the entropy in Section 2.20.
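As an illustration of how the integral in (2.79) might be done numerically, the sketch below applies the trapezoidal rule to a table of heat capacity values. The data are hypothetical (a constant C = 25 J/K between 300 K and 400 K), chosen so that the result can be compared with the closed form (2.80); with real measurements the list `heat_caps` would hold the measured values.

```python
import math


def entropy_difference(temps, heat_caps):
    """Trapezoidal estimate of the integral of C(T)/T dT over tabulated data."""
    total = 0.0
    for i in range(len(temps) - 1):
        f0 = heat_caps[i] / temps[i]
        f1 = heat_caps[i + 1] / temps[i + 1]
        total += 0.5 * (f0 + f1) * (temps[i + 1] - temps[i])
    return total


# Hypothetical data: constant heat capacity sampled every 0.1 K.
temps = [300.0 + 0.1 * i for i in range(1001)]
heat_caps = [25.0] * len(temps)

delta_s = entropy_difference(temps, heat_caps)
exact = 25.0 * math.log(400.0 / 300.0)      # Eq. (2.80)
print(f"numerical: {delta_s:.5f} J/K, exact: {exact:.5f} J/K")
```

As noted above, this procedure yields only the entropy difference between the two end temperatures.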
Entropy changes due to thermal contact
A solid with heat capacity CA at temperature TA is placed in contact with another solid with heat
capacity CB at a lower temperature TB. What is the change in entropy of the system after the two bodies have reached thermal equilibrium? Assume that the heat capacities are independent of
temperature and the two solids are isolated from their surroundings.
From Example 2.4 we know that the ﬁnal equilibrium temperature is given by
\[
T = \frac{C_A T_A + C_B T_B}{C_A + C_B}. \tag{2.81}
\]
Although the process is irreversible, we can calculate the entropy change by considering any process
that takes a body from one temperature to another. For example, we can imagine that a body is
brought from its initial temperature TB to the temperature T in many successive inﬁnitesimal steps
by placing it in successive contact with a series of reservoirs at inﬁnitesimally greater temperatures.
At each contact the body is arbitrarily close to equilibrium and has a well deﬁned temperature.
For this reason, we can apply the result (2.80), which yields ∆SB = CB ln(T/TB) and, by the same argument, ∆SA = CA ln(T/TA). The total change
in the entropy of the system is given by
\[
\Delta S = \Delta S_A + \Delta S_B = C_A\ln\frac{T}{T_A} + C_B\ln\frac{T}{T_B}, \tag{2.82}
\]
where T is given by (2.81). Substitute real numbers for TA, TB, and C and convince yourself that
∆S ≥ 0. Does the sign of ∆S depend on whether TA > TB or TA < TB?
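One way to carry out the suggested substitution is the short sketch below, which evaluates (2.81) and (2.82) for a few illustrative cases. The function name and the numerical values are assumptions made only for this check.

```python
import math


def contact_entropy_change(c_a, t_a, c_b, t_b):
    """Final temperature (2.81) and total entropy change (2.82) for two
    bodies with constant heat capacities placed in thermal contact."""
    t_final = (c_a * t_a + c_b * t_b) / (c_a + c_b)
    delta_s = c_a * math.log(t_final / t_a) + c_b * math.log(t_final / t_b)
    return t_final, delta_s


# Illustrative values: equal heat capacities of 1 J/K, temperatures in K.
for t_a, t_b in [(400.0, 300.0), (300.0, 400.0), (350.0, 350.0)]:
    t_final, delta_s = contact_entropy_change(1.0, t_a, 1.0, t_b)
    print(f"TA = {t_a}, TB = {t_b}: T = {t_final}, dS = {delta_s:.5f} J/K")
```

Swapping TA and TB leaves ∆S unchanged when the heat capacities are equal, and ∆S vanishes only when the two initial temperatures coincide.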
Example 2.13. Entropy changes
One kilogram of water at 0 °C is brought into contact with a heat bath at 50 °C. What is the
change of entropy of the water, the bath, and the combined system consisting of both the water
and the heat bath?
Solution. The change in entropy of the water is given by
\[
\Delta S_{\rm H_2O} = C\ln\frac{T_2}{T_1} = 4184\ln\frac{273+50}{273+0} = 703.67\ {\rm J/K}. \tag{2.83}
\]
Why does the factor of 273 enter in (2.83)? The amount of energy transferred to the water from
the heat bath is
\[
Q = C(T_2 - T_1) = 4184 \times 50 = 209{,}200\ {\rm J}. \tag{2.84}
\]
The change in entropy of the heat bath is
\[
\Delta S_B = \frac{-Q}{T_2} = -\frac{209{,}200}{323} = -647.68\ {\rm J/K}. \tag{2.85}
\]
Hence the total change in the entropy is
\[
\Delta S = \Delta S_{\rm H_2O} + \Delta S_B = 703.67 - 647.68 = 56\ {\rm J/K}. \tag{2.86}
\]

Problem 2.18. The temperature of one kilogram of water at 0 °C is increased to 50 °C by first
bringing it into contact with a heat bath at 25 °C and then with a heat bath at 50 °C. What is the
change in entropy of the entire system?

Example 2.14. Melting of ice
A beaker contains a mixture of 0.1 kg of ice and 0.1 kg of water. Suppose that we place the beaker
over a low flame and melt 0.02 kg of the ice. What is the change of entropy of the ice-water
mixture? (It takes 334 kJ to melt 1 kg of ice.)
Solution. If we add energy to ice at the melting temperature T = 273.15 K, the effect is to melt
the ice rather than to raise the temperature.
The addition of energy to the ice-water mixture is generally not a reversible process, but we
can find the entropy change by considering a reversible process between the initial and final states.
We melt 0.02 kg of ice in a reversible process by supplying 0.02 kg × 334 kJ/kg = 6680 J from a
heat bath at 273.15 K, the ice-water mixture being in equilibrium at atmospheric pressure. Hence,
the entropy increase is given by ∆S = 6680/273.15 = 24.46 J/K.
Entropy change in a free expansion. Consider an ideal gas of N particles in a closed, insulated
container that is divided into two chambers by an impermeable partition (see Figure 2.10). The
gas is initially conﬁned to one chamber of volume VA at a temperature T . The gas is then allowed
to expand freely into a vacuum to ﬁll the entire container of volume VB . What is the change in
entropy for this process?

Figure 2.10: The free expansion of an isolated ideal gas. A partition separates the chamber of volume VA from the second chamber of volume VB − VA. The second chamber is initially a vacuum
and the total volume of the two chambers is VB.

Because the expansion is into a vacuum, no work is done by the gas. The expansion also is
adiabatic because the container is thermally insulated. Hence, there is no change in the internal
energy of the gas. It might be argued that ∆S = Q/T = 0 because Q = 0. However, this conclusion
would be incorrect because the relation dS = dQ/T holds only for a quasistatic process.
The expansion from VA to VB is an irreversible process. Left to itself, the system will not
return spontaneously to its original state with all the particles in the left container. To calculate
the change in the entropy, we may consider a quasistatic process that takes the system between the
same two states. Because the gas is ideal, the internal energy depends only on the temperature,
and hence the temperature of the ideal gas is unchanged. So we will calculate the energy added during an isothermal process to take the gas from volume VA to VB,
\[
Q = NkT\ln\frac{V_B}{V_A}, \tag{2.87}
\]
where we have used (2.25). Hence, from (2.79), the entropy change is given by
\[
\Delta S = \frac{Q}{T} = Nk\ln\frac{V_B}{V_A}. \tag{2.88}
\]
Note that VB > VA and the entropy change is positive as expected.
Alternatively, we can argue that the work needed to restore the gas to its original state is
given by
\[
W = -\int_{V_B}^{V_A} P\,dV = NkT\ln\frac{V_B}{V_A}, \tag{2.89}
\]
where we have used the fact that the process is isothermal. Hence, in this case W = T ∆S , and
the entropy increase of the universe requires work on the gas to restore it to its original state.
The discussion of the free expansion of an ideal gas illustrates two initially confusing aspects of
thermodynamics. One is that the name thermodynamics is a misnomer because thermodynamics
treats only equilibrium states and not dynamics. Nevertheless, thermodynamics discusses processes
that must happen over some interval of time. Also confusing is that we can consider processes
that did not actually happen. In this case no energy by heating was transferred to the gas and
the process was adiabatic. The value of Q calculated in (2.87) is the energy transferred in an
isothermal process. Of course, no energy is transferred by heating in an adiabatic process, but the
entropy change is the same. For this reason we calculated the entropy change as if an isothermal
process had occurred.
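To get a feeling for the magnitudes involved, the sketch below evaluates (2.88) and the work (2.89) for an illustrative case that is not taken from the text: one mole of ideal gas whose volume doubles at 300 K. The constants are the SI values of the Boltzmann and Avogadro constants; the function name is a hypothetical helper.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
N_AVOGADRO = 6.02214076e23    # particles per mole


def free_expansion_entropy(n_particles, v_initial, v_final):
    """Entropy change (2.88) of an ideal gas expanding freely from
    v_initial to v_final; only the volume ratio matters."""
    return n_particles * K_BOLTZMANN * math.log(v_final / v_initial)


temperature = 300.0                                         # K, assumed
delta_s = free_expansion_entropy(N_AVOGADRO, 1.0, 2.0)      # volume doubles
work_to_restore = temperature * delta_s                     # W = T * Delta S, Eq. (2.89)
print(f"Delta S = {delta_s:.3f} J/K, work to restore = {work_to_restore:.0f} J")
```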
Quasistatic adiabatic processes. We have already discussed that quasistatic adiabatic processes
have the special property that the entropy does not change, but we repeat this statement here to
emphasize its importance. Because the process is adiabatic, Q = 0, and because the process is
quasistatic, ∆S = Q/T = 0, and there is no change in the entropy.
Maximum work. When two bodies are placed in thermal contact, no work is done, that is,
∆W = 0 and ∆E = QA + QB = 0. What can we do to extract the maximum work possible from
the two bodies? From our discussion of heat engines, we know that we should not place them in
thermal contact. Instead we run a Carnot (reversible) engine between the two bodies. However,
unlike the reservoirs considered in the Carnot engine, the heat capacities of the two bodies are
ﬁnite, and hence the temperature of each body changes as energy is transferred from one body to
the other.
Because the process is assumed to be reversible, we have
\[
\Delta S = \Delta S_A + \Delta S_B = 0, \tag{2.90}
\]
from which it follows using (2.79) that
\[
C_A\ln\frac{T}{T_A} + C_B\ln\frac{T}{T_B} = 0. \tag{2.91}
\]
If we solve (2.91) for T, we find that
\[
T = T_A^{C_A/(C_A+C_B)}\,T_B^{C_B/(C_A+C_B)}. \tag{2.92}
\]
We see that the final temperature for a reversible process is the geometrical average of TA and TB
weighted by their respective heat capacities.
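The difference between direct thermal contact and the reversible process can be made concrete with a small numerical sketch. The values below (equal heat capacities of 1 J/K, TA = 400 K, TB = 100 K) are illustrative assumptions, and the function name is hypothetical. The energy given up by the two bodies in the reversible case is the work delivered by the engine.

```python
def final_temperatures(c_a, t_a, c_b, t_b):
    """Return (T for direct contact, T for the reversible process,
    energy extracted as work in the reversible process)."""
    t_contact = (c_a * t_a + c_b * t_b) / (c_a + c_b)          # Eq. (2.81)
    weight_a = c_a / (c_a + c_b)
    t_reversible = t_a ** weight_a * t_b ** (1.0 - weight_a)   # Eq. (2.92)
    # Energy lost by the two bodies, which leaves as work done by the engine.
    work_out = c_a * (t_a - t_reversible) + c_b * (t_b - t_reversible)
    return t_contact, t_reversible, work_out


t_contact, t_reversible, work_out = final_temperatures(1.0, 400.0, 1.0, 100.0)
print(f"contact: {t_contact:.1f} K, reversible: {t_reversible:.1f} K, "
      f"work extracted: {work_out:.1f} J")
```

The reversible final temperature is lower than the contact value because part of the energy of the two bodies has left as work rather than remaining as internal energy.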
Problem 2.19. (a) Suppose TA = 256 K and TB = 144 K. What are the relative values of the final
temperatures in (2.81) and (2.92) assuming that the heat capacities of the two bodies are equal?
(b) Show that the work performed by the heat engine in the reversible case is given by
\[
W = \Delta E = C_A(T - T_A) + C_B(T - T_B). \tag{2.93}
\]

Are all forms of energy equivalent? If you were offered 100 J of energy, would you choose to
have it delivered as compressed gas at room temperature or as a hot brick at 400 K? The answer
might depend on what you want to do with the energy. If you want to lift a block of ice, the best
choice would be to take the energy in the compressed gas. If you want to keep warm, the 400 K
object would be acceptable.
If you are not sure what you want to do with the energy, it is clear from the second law of
thermodynamics that we should take the form of the energy that can be most directly converted
into work, because there is no restriction on using stored energy for heating. What is diﬀerent is
the quality of the energy, which we take to be a measure of its ability to do a variety of tasks. We
can readily convert energy from higher to lower quality, but the second law of thermodynamics
prevents us from going in the opposite direction with 100% eﬃciency.
We found in our discussion of the adiabatic free expansion of a gas that the entropy increases.
Because the system has lost ability to do work, we can say that there has been a deterioration of
the quality of energy. Suppose that we had let the gas undergo a quasistatic isothermal expansion
instead of an adiabatic free expansion. Then the work done by the gas would have been (see
(2.25)):
\[
W = NkT\ln\frac{V_B}{V_A}. \tag{2.94}
\]
After the adiabatic free expansion, the gas can no longer do this work, even though its energy is
unchanged. If we compare (2.94) with (2.88), we see that the energy that becomes unavailable to
do work in an adiabatic free expansion is
\[
E_{\rm unavailable} = T\Delta S. \tag{2.95}
\]
Equation (2.95) indicates that entropy is a measure of the quality of energy. Given two systems
with the same energy, the one with the lower entropy has the higher quality energy. An increase
in entropy implies that some energy has become unavailable to do work.

2.16 Equivalence of Thermodynamic and Ideal Gas Scale Temperatures

So far we have assumed that the ideal gas scale temperature which we introduced in Section 2.4
is the same as the thermodynamic temperature defined by (2.60). We now show that the two
temperatures are proportional and can be made equal if we choose the units of S appropriately.

The gas scale temperature, which we denote as θ in this section to distinguish it from the
thermodynamic temperature T, is defined by the relation
\[
\theta = PV/Nk. \tag{2.96}
\]
That is, θ is proportional to the pressure of a gas at a fixed low density and is equal to 273.16 K
at the triple point of water. The fact that θ ∝ P is a matter of deﬁnition. Another important
property of ideal gases is that the internal energy depends only on θ and is independent of the
volume.
One way to show that T is proportional to θ is to consider a Carnot cycle (see Figure 2.9)
with an ideal gas as the working substance. At every stage of the cycle we have
\[
\frac{dQ}{\theta} = \frac{dE - dW}{\theta} = \frac{dE + P\,dV}{\theta},
\]
or
\[
\frac{dQ}{\theta} = \frac{dE}{\theta} + Nk\frac{dV}{V}. \tag{2.97}
\]
The first term on the right-hand side of (2.97) depends only on θ and the second term depends
only on the volume. If we integrate (2.97) around one cycle, both θ and V return to their starting
values, and hence the loop integral of the right-hand side of (2.97) is zero. We conclude that
\[
\oint \frac{dQ}{\theta} = \frac{Q_{\rm hot}}{\theta_{\rm hot}} - \frac{Q_{\rm cold}}{\theta_{\rm cold}} = 0. \tag{2.98}
\]
In Section 2.14 we showed that Qcold/Qhot = Tcold/Thot for a Carnot engine (see (2.71)). If we
combine this result with (2.98), we find that
\[
\frac{T_{\rm cold}}{T_{\rm hot}} = \frac{\theta_{\rm cold}}{\theta_{\rm hot}}. \tag{2.99}
\]
It follows that the thermodynamic temperature T is proportional to the ideal gas scale temperature
θ. From now on we shall assume that we have chosen suitable units for S so that T = θ.

2.17 The Thermodynamic Pressure

In Section 2.13 we showed that the thermodynamic definition of temperature follows by considering
the condition for the thermal equilibrium of two subsystems. In the following, we show that the
pressure can be deﬁned in an analogous way and that the pressure can be interpreted as a response
of the entropy to a change in the volume.
As before, consider an isolated composite system that is partitioned into two subsystems. The
subsystems are separated by a movable, insulating wall so that the energies and volumes of the
subsystems can adjust themselves, but NA and NB are ﬁxed. For simplicity, we assume that EA
and EB have already changed so that thermal equilibrium has been established. For ﬁxed total
volume V , we have one independent variable which we take to be VA ; VB is given by VB = V − VA .
The value of VA that maximizes Stotal is given by
\[
dS_{\rm total} = \frac{\partial S_A}{\partial V_A}\,dV_A + \frac{\partial S_B}{\partial V_B}\,dV_B = 0. \tag{2.100}
\]
Because dVA = −dVB, we can use (2.100) to write the condition for mechanical equilibrium as
\[
\frac{\partial S_A}{\partial V_A} = \frac{\partial S_B}{\partial V_B}. \tag{2.101}
\]
We define the thermodynamic pressure P as
\[
\frac{P}{T} = \left(\frac{\partial S}{\partial V}\right)_{E,N}. \qquad \text{(thermodynamic definition of the pressure)} \tag{2.102}
\]
For completeness, we also define the chemical potential as the response of the entropy to a
change in the number of particles:
\[
\frac{\mu}{T} = -\left(\frac{\partial S}{\partial N}\right)_{E,V}. \qquad \text{(thermodynamic definition of the chemical potential)} \tag{2.103}
\]
We will discuss the interpretation of µ in Section 4.12. You probably won't be surprised to learn
that if two systems can exchange particles, then µ1 = µ2 is the condition for chemical equilibrium.
We will sometimes distinguish between thermal equilibrium, mechanical equilibrium, and
chemical equilibrium for which the temperatures, pressures, and chemical potentials are equal,
respectively.

2.18 The Fundamental Thermodynamic Relation

The first law of thermodynamics implies that the internal energy E is a function of state. For any
change of state, the change in E is given by (2.21):
\[
\Delta E = Q + W. \qquad \text{(any process)} \tag{2.104}
\]
To separate the contributions to E due to heating and work, the constraints on the system have
to be known. If the change is quasistatic, then the infinitesimal work done is
\[
dW = -P\,dV, \qquad \text{(quasistatic process)} \tag{2.105}
\]
and
\[
dQ = T\,dS. \qquad \text{(quasistatic process)} \tag{2.106}
\]
Thus, for an infinitesimal change in energy, we obtain
\[
dE = T\,dS - P\,dV. \tag{2.107}
\]
There are two ways of thinking about (2.107). As our derivation suggests, this equation tells us the
relationship between changes in energy, entropy, and volume in a quasistatic process. However,
because S , V , and E are functions of state, we can view (2.107) as the diﬀerential form (for ﬁxed
N ) of the fundamental equation E = E (S, V, N ) which describes the relationship between E , S ,
V , and N for all equilibrium states. We can also understand (2.107) by considering S as a function
of E , V , and N , and writing dS as
\[
dS = \frac{\partial S}{\partial E}\,dE + \frac{\partial S}{\partial V}\,dV + \frac{\partial S}{\partial N}\,dN. \tag{2.108}
\]
If we use the definitions (2.60), (2.102), and (2.103) of the various partial derivatives of S(E, V, N),
we can write
\[
dS = \frac{1}{T}\,dE + \frac{P}{T}\,dV - \frac{\mu}{T}\,dN, \tag{2.109}
\]
which is equivalent to (2.107) for a fixed number of particles.
If we know the entropy S as a function of E, V and N , we can immediately determine the
corresponding responses T, P , and µ. For this reason we shall refer to E, V and N as the natural
variables in which S should be expressed. In this context S can be interpreted as a thermodynamic
potential because its various partial derivatives yield the equations of state of the system. In
Section 2.21 we shall discuss thermodynamic potentials that have diﬀerent sets of natural variables.
We can alternatively consider E as a function of S , V , and N and rewrite (2.109) as
\[
dE = T\,dS - P\,dV + \mu\,dN. \qquad \text{(fundamental thermodynamic relation)} \tag{2.110}
\]
Equation (2.110) is a mathematical statement of the combined first and second laws of thermodynamics. Although there are few equations in physics that are necessary to memorize, (2.110) is
one of the few equations of thermodynamics that you should know without thinking.
Many useful thermodynamic relations can be derived using (2.110). For example, if we regard
E as a function of S , V , and N , we can write
\[
dE = \frac{\partial E}{\partial S}\,dS + \frac{\partial E}{\partial V}\,dV + \frac{\partial E}{\partial N}\,dN. \tag{2.111}
\]
If we compare (2.110) and (2.111), we see that
\[
T = \left(\frac{\partial E}{\partial S}\right)_{V,N}, \qquad P = -\left(\frac{\partial E}{\partial V}\right)_{S,N}, \qquad \mu = \left(\frac{\partial E}{\partial N}\right)_{S,V}. \tag{2.112}
\]
Note that E(S, V, N) also can be interpreted as a thermodynamic potential. Or we can start with
(2.110) and easily obtain (2.109) and the thermodynamic definitions of T, P, and µ.

2.19 The Entropy of an Ideal Gas

Because we know two equations of state of an ideal gas, (2.9) and (2.23), we can find the entropy of
an ideal gas as a function of various combinations of E , T , P , and V (for ﬁxed N ). If we substitute
1/T = 3N k/(2E ) and P/T = N k/V into (2.109), we obtain
\[
dS = \frac{3}{2}Nk\,\frac{dE}{E} + Nk\,\frac{dV}{V}. \tag{2.113}
\]
We can integrate (2.113) to obtain the change in the entropy from state E1, V1 to state E2, V2:
\[
\Delta S = \frac{3}{2}Nk\ln\frac{E_2}{E_1} + Nk\ln\frac{V_2}{V_1}. \tag{2.114}
\]
We see that S is an additive quantity as we assumed; that is, S is proportional to N.

Frequently it is more convenient to express S in terms of T and V or T and P. To obtain
S(T, V) we substitute E = 3NkT/2 into (2.114) and obtain
\[
\Delta S = \frac{3}{2}Nk\ln\frac{T_2}{T_1} + Nk\ln\frac{V_2}{V_1}. \tag{2.115}
\]

Problem 2.20. (a) Find ∆S(T, P) for an ideal gas. (b) Use (2.115) to derive the relation (2.45)
for a quasistatic adiabatic process.

2.20 The Third Law of Thermodynamics

We can calculate only differences in the entropy using purely thermodynamic relations as we did
in Section 2.19. We can determine the absolute value of the entropy by using the third law of
thermodynamics which states that
\[
\lim_{T\to 0} S = 0. \qquad \text{(third law of thermodynamics)} \tag{2.116}
\]
A statement equivalent to (2.116) was first proposed by Nernst in 1906 on the basis of empirical
observations.[16] The statistical basis of this law is discussed in Section 4.6. In the context of
thermodynamics, the third law can be understood only as a consequence of empirical observations.
The most important consequence of the third law is that all heat capacities must go to zero
as the temperature approaches zero. For changes at constant volume, we know that
\[
S(T_2, V) - S(T_1, V) = \int_{T_1}^{T_2}\frac{C_V(T)}{T}\,dT. \tag{2.117}
\]
The condition (2.116) implies that in the limit T1 → 0, the integral in (2.117) must go to a
ﬁnite limit, and hence we require that CV (T ) → 0 as T → 0. Similarly, we can argue that
CP → 0 as T → 0. Note that these conclusions about the low temperature behavior of CV and
CP are independent of the nature of the system. Such is the power of thermodynamics. This low
temperature behavior of the heat capacity was ﬁrst established experimentally in 1910–1912.
As we will ﬁnd in Section 4.6, the third law is a consequence of the fact that the most
fundamental description of nature at the microscopic level is quantum mechanical. We have already
seen that the heat capacity is a constant for an ideal classical gas. Hence, the thermal equation
of state, E = 3N kT /2, as well as the pressure equation of state, P V = N kT , must cease to be
applicable at suﬃciently low temperatures.
16 Walther Nernst (1864–1943) was awarded the 1920 Nobel Prize in chemistry for his discovery of the third law
and related work.
Example 2.15. At very low temperature T, the heat capacity C of an insulating solid is proportional to T^3. If we take C = AT^3 for a particular solid, what is the entropy of the solid at
temperature T?
Solution. As before, the entropy is given by (see (2.79)):
\[ S(T) = \int_0^T \frac{C_V(T')}{T'}\, dT', \tag{2.118} \]
where we have used the fact that S(T = 0) = 0. We can integrate the right-hand side of (2.118)
from T' = 0 to the desired value of T to find the absolute value of S. The result in this case is
S = AT^3/3.
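As a quick numerical check of Example 2.15, the sketch below integrates C(T')/T' from 0 to T with a simple midpoint rule and compares the result with AT^3/3. The coefficient A, the temperature T, and the function name are hypothetical choices made only for illustration.

```python
def entropy_from_heat_capacity(heat_capacity, T, n=1000):
    """Midpoint-rule estimate of the integral of C(T')/T' from 0 to T, assuming S(0) = 0."""
    h = T / n
    total = 0.0
    for i in range(n):
        Tp = (i + 0.5) * h          # midpoints avoid evaluating at T' = 0
        total += heat_capacity(Tp) / Tp * h
    return total

A = 2.0e-3   # hypothetical coefficient in C = A T^3
T = 5.0      # hypothetical temperature
numeric = entropy_from_heat_capacity(lambda Tp: A * Tp**3, T)
exact = A * T**3 / 3
print(numeric, exact)
```

Because the integrand A T'^2 is smooth and vanishes at T' = 0, the integral converges without any special treatment of the lower limit.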
Problem 2.21. Is the expression (2.115) applicable at very low temperatures? Explain.

2.21 Free Energies

We have seen that the entropy of an isolated system can never decrease. However, an isolated
system is not of much experimental interest, and we wish to consider the more typical case where the
system of interest is connected to a much larger system whose properties do not change signiﬁcantly.
As we have discussed on page 51, this larger system is called a heat bath.
If a system is connected to a heat bath, then the entropy of the system may increase or
decrease. The only thing we can say for sure is that the entropy of the system plus the heat bath
cannot decrease. Because the entropy is additive, we have17
\[ S_{\rm composite} = S + S_{\rm bath}, \tag{2.119} \]
and
\[ \Delta S_{\rm composite} = \Delta S + \Delta S_{\rm bath} \geq 0, \tag{2.120} \]
where the properties of the system of interest are denoted by the absence of a subscript. Our goal
is to determine if there is a property of the system alone (not the composite system) that is a
maximum or a minimum. We begin by writing the change ∆Sbath in terms of the properties of
the system. Because energy can be transferred between the system and heat bath, we have
\[ \Delta S_{\rm bath} = \frac{-Q}{T_{\rm bath}}, \tag{2.121} \]
where Q is the amount of energy transferred by heating the system, and −Q is the amount of energy
transferred to the heat bath. If we use (2.121) and the fundamental thermodynamic relation,
(2.110), we can rewrite (2.120) as
\[ \Delta S_{\rm composite} = \Delta S - \frac{Q}{T_{\rm bath}}. \tag{2.122} \]
The application of the first law to the system gives
\[ \Delta E = Q + W, \tag{2.123} \]
where ∆E is the change in the energy of the system and W is the work done on it. If the work
done on the system is due to the heat bath, then W = −Pbath ∆V , where ∆V is the change in
volume of the system. Then we can write
\[ \Delta S_{\rm composite} = \Delta S - \frac{\Delta E - W}{T_{\rm bath}} = \Delta S - \frac{\Delta E + P_{\rm bath}\Delta V}{T_{\rm bath}} \geq 0. \tag{2.124} \]
17 The following discussion is adapted from Mandl, pp. 89–92.
A little algebra leads to
\[ \Delta E + P_{\rm bath}\Delta V - T_{\rm bath}\Delta S \leq 0. \tag{2.125} \]
This result suggests that we define the availability by
\[ A = E + P_{\rm bath} V - T_{\rm bath} S, \tag{2.126} \]
so that (2.125) becomes
\[ \Delta A = \Delta E + P_{\rm bath}\Delta V - T_{\rm bath}\Delta S \leq 0. \tag{2.127} \]
The availability includes properties of both the system and the heat bath. The signiﬁcance of the
availability will be discussed below.
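A small numerical illustration of (2.127) may help. The sketch below assumes a body at fixed volume with a constant heat capacity C, so that E = CT and S = C ln T up to additive constants; this model and all numerical values are assumptions introduced only for this example. Measured relative to the state T = Tbath, the availability is A = C(T − Tbath) − Tbath C ln(T/Tbath), which is smallest when the body has the temperature of the bath.

```python
import math

def availability(T, T_bath, C=1.0):
    """A(T) - A(T_bath) for an assumed body with constant heat capacity C at fixed volume."""
    return C * (T - T_bath) - T_bath * C * math.log(T / T_bath)

T_bath = 300.0                                  # hypothetical bath temperature (K)
temps = [200.0 + 5.0 * i for i in range(41)]    # trial body temperatures, 200 K to 400 K
values = [availability(T, T_bath) for T in temps]

# Locate the temperature on the grid at which the availability is smallest.
T_min = temps[values.index(min(values))]
print("availability is smallest at T =", T_min)
```

Whether the body starts hotter or colder than the bath, A decreases as it relaxes toward Tbath, consistent with ∆A ≤ 0.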
We now look at some typical experimental situations, and introduce a quantity that depends
only on the properties of the system. As before, we assume that its volume and number of particles
are fixed, and that its temperature equals the temperature of the heat bath, that is, Tbath = T and
∆V = 0. In this case we have
\[ \Delta A = \Delta E - T\Delta S \equiv \Delta F \leq 0, \tag{2.128} \]
where we have defined the Helmholtz free energy as
\[ F = E - TS. \tag{2.129} \]
The inequality in (2.128) implies that if a constraint within the system is removed, then the system’s
Helmholtz free energy will decrease. At equilibrium the lefthand side of (2.128) will vanish, and
F will be a minimum. Thus, F plays the analogous role for systems at constant T and V that was
played by the entropy for an isolated system (constant E and V ).
The entropy of an isolated system is a function of E , V , and N . What are the natural variables
for F ? From our discussion it should be clear that these variables are T , V , and N . The answer
can be found by taking the diﬀerential of (2.129) and using (2.110). The result is
\begin{align}
dF &= dE - S\,dT - T\,dS \tag{2.130a} \\
   &= (T\,dS - P\,dV + \mu\,dN) - S\,dT - T\,dS \tag{2.130b} \\
   &= -S\,dT - P\,dV + \mu\,dN. \tag{2.130c}
\end{align}
We substituted dE = T dS − P dV + µdN to go from (2.130a) to (2.130b).
From (2.130c) we see that F = F (T, V, N ) and that S , P , and µ can be obtained by taking
appropriate partial derivatives of F . For example,
\[ S = -\left(\frac{\partial F}{\partial T}\right)_{V,N}, \tag{2.131} \]
and
\[ P = -\left(\frac{\partial F}{\partial V}\right)_{T,N}. \tag{2.132} \]
Hence, we conclude that the Helmholtz free energy is a minimum for a given T, V, and N.
The Helmholtz free energy is only one example of a free energy or thermodynamic potential.
We can relax the condition of a fixed volume by requiring that the pressure be specified. In this
case mechanical equilibrium requires that the pressure of the system equal the pressure of the
bath. This case is common in experiments using fluids where the pressure is fixed at atmospheric
pressure. We write Pbath = P and express (2.125) as
\[ \Delta A = \Delta E + P\Delta V - T\Delta S \equiv \Delta G \leq 0, \tag{2.133} \]
where we have defined the Gibbs free energy as
\[ G = E - TS + PV = F + PV. \tag{2.134} \]
The natural variables of G can be found in the same way as we did for F. We find that G =
G(T, P, N) and
\begin{align}
dG &= dE - S\,dT - T\,dS + P\,dV + V\,dP \notag \\
   &= (T\,dS - P\,dV + \mu\,dN) - S\,dT - T\,dS + P\,dV + V\,dP \notag \\
   &= -S\,dT + V\,dP + \mu\,dN. \tag{2.135}
\end{align}
We can use similar reasoning to conclude that G is a minimum at fixed temperature, pressure, and
number of particles.
We can also relate G to the chemical potential using the following argument. Note that G
and N are extensive variables, but T and P are not. Thus, G must be proportional to N :
\[ G = N g(T, P), \tag{2.136} \]
where g(T, P) is the Gibbs free energy per particle. This function must be the chemical potential
because ∂G/∂N = g(T, P) from (2.136) and ∂G/∂N = µ from (2.135). Thus, the chemical
potential is the Gibbs free energy per particle:
\[ \mu(T, P) = \frac{G}{N} = g(T, P). \tag{2.137} \]
Because g depends only on T and P, we have
\begin{align}
dg &= \left(\frac{\partial g}{\partial P}\right)_T dP + \left(\frac{\partial g}{\partial T}\right)_P dT \tag{2.138} \\
   &= v\,dP - s\,dT, \tag{2.139}
\end{align}
where v = V/N and s = S/N. The properties of G and the relation (2.139) will be important
when we discuss processes involving a change of phase (see Section 7.5).
Another common thermodynamic potential is the enthalpy H which we deﬁned in (2.29). This
potential is similar to E (S, V, N ) except for the requirement of ﬁxed P rather than ﬁxed V .
Problem 2.22. Show that
\[ dH = T\,dS + V\,dP + \mu\,dN, \tag{2.140} \]
and
\begin{align}
T &= \left(\frac{\partial H}{\partial S}\right)_{P,N} \tag{2.141} \\
V &= \left(\frac{\partial H}{\partial P}\right)_{S,N} \tag{2.142} \\
\mu &= \left(\frac{\partial H}{\partial N}\right)_{S,P}. \tag{2.143}
\end{align}
Problem 2.23. Show that H is a minimum for an equilibrium system at fixed entropy.
Landau potential. As we have seen, we can define many thermodynamic potentials depending
on which variables we constrain. A very useful thermodynamic potential that has no generally
recognized name or symbol is sometimes called the Landau potential and is denoted by Ω. Another
common name is simply the grand potential. The Landau potential is the thermodynamic potential
for which the variables T, V , and µ are speciﬁed and is given by
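\[ \Omega(T, V, \mu) = F - \mu N. \tag{2.144} \]
If we take the differential of Ω and use the fact that dF = −SdT − P dV + µdN (see (2.130c)), we
obtain
\begin{align}
d\Omega &= dF - \mu\,dN - N\,d\mu \tag{2.145a} \\
        &= -S\,dT - P\,dV - N\,d\mu. \tag{2.145b}
\end{align}
From (2.145b) we have
\begin{align}
S &= -\left(\frac{\partial \Omega}{\partial T}\right)_{V,\mu}, \tag{2.146} \\
P &= -\left(\frac{\partial \Omega}{\partial V}\right)_{T,\mu}, \tag{2.147} \\
N &= -\left(\frac{\partial \Omega}{\partial \mu}\right)_{T,V}. \tag{2.148}
\end{align}
Because G = Nµ, we can write Ω = F − G. Hence, if we use the definition G = F + PV, we obtain
\[ \Omega(T, V, \mu) = F - \mu N = F - G = -PV. \tag{2.149} \]
The relation (2.149) will be very useful for obtaining the equation of state of various systems (see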
Section 6.10).
*Useful work and availability. The free energies that we have introduced are useful for understanding the maximum amount of useful work, Wuseful , that can be done by a system when it is
connected to a heat bath. The system is not necessarily in thermal or mechanical equilibrium with
its surroundings. In addition to the system of interest and its surroundings (the bath), we include
a third body, namely, the body on which the system does useful work. The third body is thermally
insulated. The total work Wby done by the system is the work done against its surroundings,
Pbath ∆V, plus the work done on the body, Wuseful:
\[ W_{\rm by} = P_{\rm bath}\Delta V + W_{\rm useful}. \tag{2.150} \]
Because Wby is the work done by the system when its volume changes by ∆V, the first term in
(2.150) does not contain a negative sign. This term is the work that is necessarily and uselessly
performed by the system in changing its volume and thus also the volume of its surroundings. The
second term is the useful work done by the system. In (2.125) we replace the work done on the
heat bath, Pbath ∆V , by the total work done by the system Pbath ∆V + Wuseful to obtain
\[ \Delta E + P_{\rm bath}\Delta V + W_{\rm useful} - T_{\rm bath}\Delta S \leq 0, \tag{2.151} \]
or the useful work done is
\[ W_{\rm useful} \leq -(\Delta E + P_{\rm bath}\Delta V - T_{\rm bath}\Delta S) = -\Delta A. \tag{2.152} \]
Note that the maximum amount of useful work that can be done by the system is equal to −∆A.
This relation explains the meaning of the terminology availability because only −∆A is available
for useful work. The rest of the work is wasted on the surroundings.
Problem 2.24. (a) Show that if the change in volume of the system is zero, ∆V = 0, and the
initial and ﬁnal temperature are that of the heat bath, then the maximum useful work is −∆F . (b)
Show that if the initial and ﬁnal temperature and pressure are that of the bath, then the maximum
useful work is −∆G.

Vocabulary
thermodynamics, system, boundary, surroundings
insulator, conductor, adiabatic wall
thermal contact, thermal equilibrium, temperature, thermodynamic equilibrium
thermometer, Celsius temperature scale, ideal gas temperature scale, thermodynamic temperature
heating, work
internal energy E , entropy S , state function, laws of thermodynamics
ideal gas, ideal gas equation of state, van der Waals equation of state
Boltzmann’s constant, universal gas constant
intensive and extensive variables
heat capacity, speciﬁc heat
quasistatic, reversible, irreversible, isothermal, constant volume, adiabatic, and
cyclic processes
heat bath, heat source, heat sink
Carnot engine, refrigerator, heat pump, efficiency, coefficient of performance
thermodynamic potential, Helmholtz free energy F , Gibbs free energy G, enthalpy H , Landau
potential Ω, availability A
Notation
volume V, number of particles N, thermodynamic temperature T, pressure P, chemical
potential µ
total work W, total energy transferred due to a temperature difference alone Q
kelvin K, Celsius ◦C, Fahrenheit ◦F
heat capacity C, specific heat c
thermal efficiency η
Boltzmann’s constant k, gas constant R

Appendix 2A: Equivalence of Different Statements of the Second Law
[xx not done xx]

Appendix 2B: The Mathematics of Thermodynamics
Because the notation of thermodynamics can be cumbersome, we have tried to simplify it whenever
possible. However, one common simpliﬁcation can lead to initial confusion.
Consider the functional relationships:
\[ y = f(x) = x^2, \tag{2.153} \]
and
\[ x = g(z) = z^{1/2}. \tag{2.154} \]
If we write x in terms of z, we can write y as
\[ y = h(z) = f(g(z)) = z. \tag{2.155} \]
We have given the composite function a different symbol h because it is different from both f and
g. But we would soon exhaust the letters of the alphabet, and we frequently write y = f(z) = z.
Note that f (z ) is a diﬀerent function than f (x).
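The distinction is easy to see in code, where a function name always refers to one rule. The short Python sketch below defines f, g, and h exactly as in (2.153)–(2.155); the test value z = 9 is an arbitrary choice for illustration.

```python
import math

def f(x):
    """y as a function of x, as in (2.153)."""
    return x**2

def g(z):
    """x as a function of z, as in (2.154)."""
    return math.sqrt(z)

def h(z):
    """y as a function of z: the composite function of (2.155)."""
    return f(g(z))

z = 9.0
print(h(z))   # y expressed through z; equals z itself
print(f(z))   # the rule f applied to the number 9, which is z squared
```

Writing "y = f(z) = z" in the physicist's shorthand really means h(z); a program that reused the name f for both rules would simply give the wrong number.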
The notation is even more confusing in thermodynamics. Consider for example, the entropy S
as a function of E , V , and N , which we write as S (E, V, N ). However, we frequently consider E as
a function of T from which we would obtain another functional relationship: S (E (T, V, N ), V, N ).
A mathematician would write the latter function with a diﬀerent symbol, but we don’t. In so doing
we confuse the name of a function with that of a variable and use the same name (symbol) for the
same physical quantity. This sloppiness can cause problems when we take partial derivatives. If
we write ∂S/∂V , is E or T to be held ﬁxed? One way to avoid confusion is to write (∂S/∂V )E or
(∂S/∂V )T , but this notation can become cumbersome.
Another confusing aspect of the mathematics of thermodynamics is the use of diﬀerentials.
Many authors, including Bohren and Albrecht,18 have criticized their use. These authors and
others argue, for example, that the first law should be written as
\[ \frac{dE}{dt} = \frac{dQ}{dt} + \frac{dW}{dt}, \tag{2.156} \]
rather than
\[ dE = \Delta Q + \Delta W. \tag{2.157} \]
[Figure 2.11 (axes: pressure versus volume; states labeled 1 and 2): The change in internal energy can be made arbitrarily small by making the initial (1)
and final (2) states arbitrarily close, but the total work done, which is the area enclosed by the
nearly closed curve, is not vanishingly small. Adapted from Bohren and Albrecht.]
18 See Bohren and Albrecht, pp. 93–99.
An argument for writing the ﬁrst law in the form (2.156) is that the ﬁrst law applies to a process,
which of course takes place over an interval of time. Here, dE/dt represents the rate of energy
change, dW/dt is the rate of doing work or working and dQ/dt is the rate of heating. In contrast, dE
in (2.157) is the inﬁnitesimal change in internal energy, ∆W is the inﬁnitesimal work done on the
system, and ∆Q is the inﬁnitesimal heat added. However, the meaning of an inﬁnitesimal in this
context is vague. For example, for the process shown in Figure 2.11, the energy diﬀerence E2 − E1 is
arbitrarily small and hence could be represented by a diﬀerential dE , but the work and heating are
not inﬁnitesimal. However, the use of inﬁnitesimals should not cause confusion if you understand
that dy in the context dy/dx = f (x) has a diﬀerent meaning than in the context, dy = f (x) dx.
If the use of inﬁnitesimals is confusing to you, we encourage you to replace inﬁnitesimals by rate
equations as in (2.156).
Review of partial derivatives. The basic theorem of partial diﬀerentiation states that if z is a
function of two independent variables x and y , then the total change in z (x, y ) due to changes in
x and y can be expressed as
\[ dz = \left(\frac{\partial z}{\partial x}\right)_y dx + \left(\frac{\partial z}{\partial y}\right)_x dy. \tag{2.158} \]
The cross derivatives ∂^2 z/∂x ∂y and ∂^2 z/∂y ∂x are equal, that is, the order of the two derivatives
does not matter. We will use this property to derive what are known as the Maxwell relations in
Section 7.2.
The chain rule for diﬀerentiation holds in the usual way if the same variables are held constant
in each derivative. For example, we can write
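\[ \left(\frac{\partial z}{\partial x}\right)_y = \left(\frac{\partial z}{\partial w}\right)_y \left(\frac{\partial w}{\partial x}\right)_y. \tag{2.159} \]
We also can derive a relation whose form is superficially similar to (2.159) when different variables
are held constant in each term. From (2.158) we set dz = 0 and obtain
\[ dz = 0 = \left(\frac{\partial z}{\partial x}\right)_y dx + \left(\frac{\partial z}{\partial y}\right)_x dy. \tag{2.160} \]
We divide both sides of (2.160) by dx:
\[ 0 = \left(\frac{\partial z}{\partial x}\right)_y + \left(\frac{\partial z}{\partial y}\right)_x \left(\frac{\partial y}{\partial x}\right)_z, \tag{2.161} \]
and rewrite (2.161) as
\[ \left(\frac{\partial z}{\partial x}\right)_y = -\left(\frac{\partial z}{\partial y}\right)_x \left(\frac{\partial y}{\partial x}\right)_z. \tag{2.162} \]
Note that (2.162) involves a relation between the three possible partial derivatives which involve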
x, y , and z .
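The relation (2.162), including its minus sign, can be checked numerically. The sketch below uses the ideal gas equation of state P V = N kT with z = P, x = V, and y = T, and estimates each partial derivative by a central finite difference. Setting N k = 1 and the state point (V0, T0) are arbitrary illustrative choices, and the helper names are hypothetical.

```python
def pressure(V, T, Nk=1.0):
    """Ideal gas pressure P = N k T / V (N k set to 1 for convenience)."""
    return Nk * T / V

def temperature(V, P, Nk=1.0):
    """Ideal gas temperature T = P V / (N k), used to hold P fixed."""
    return P * V / Nk

def derivative(func, x, h=1e-6):
    """Central finite-difference estimate of d(func)/dx at x."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

V0, T0 = 2.0, 3.0
P0 = pressure(V0, T0)

dP_dV_at_T = derivative(lambda V: pressure(V, T0), V0)      # (dP/dV)_T
dP_dT_at_V = derivative(lambda T: pressure(V0, T), T0)      # (dP/dT)_V
dT_dV_at_P = derivative(lambda V: temperature(V, P0), V0)   # (dT/dV)_P

lhs = dP_dV_at_T
rhs = -dP_dT_at_V * dT_dV_at_P
print(lhs, rhs)   # both should be close to -T0 / V0**2
```

Dropping the minus sign, as a naive "cancellation" of differentials would suggest, gives a result of the wrong sign.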
Problem 2.25. Consider the function
\[ z(x, y) = x^2 y + 2x^4 y^6. \]
Calculate ∂z/∂x, ∂z/∂y, ∂^2 z/∂x ∂y, and ∂^2 z/∂y ∂x and show that ∂^2 z/∂x ∂y = ∂^2 z/∂y ∂x.

Additional Problems

Problems          page
2.1, 2.2, 2.3     30
2.4, 2.5          31
2.6               34
2.7               35
2.8, 2.9          38
2.10, 2.11        39
2.12              41
2.13, 2.14        44
2.15              45
2.16              46
2.17              46
2.18              57
2.19              60
2.20              64
2.21, 2.22        67
2.23              68
2.24              69
2.25              72
Listing of inline problems.
Problem 2.26. Compare the notion of mechanical equilibrium and thermodynamic equilibrium.
Problem 2.27. Explain how a barometer works to measure pressure.
Problem 2.28. Is a diamond forever? What does it mean to say that diamond is a metastable form
of carbon? What is the stable form of carbon? Is it possible to apply the laws of thermodynamics
to diamond?
Problem 2.29. Although you were probably taught how to convert between Fahrenheit and
Celsius temperatures, you might not remember the details. The fact that 1 ◦C equals 9/5 ◦F is not
too difficult to remember, but where does the factor of 32 go? An alternative procedure is to add
40 to the temperature in ◦C or ◦F and multiply by 5/9 if going from ◦F to ◦C or by 9/5 if going
from ◦C to ◦F. Then subtract 40 from the calculated temperature to obtain the desired conversion.
Explain why this procedure works.
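Before looking for the explanation, it can be reassuring to confirm that the procedure gives the right numbers. The Python sketch below implements the add-40 recipe and compares it with the familiar formula C = (F − 32) × 5/9; the function names and test temperatures are hypothetical choices for illustration, and the code only checks the recipe without explaining it.

```python
def f_to_c(temp_f):
    """Fahrenheit to Celsius by the add-40 recipe."""
    return (temp_f + 40.0) * 5.0 / 9.0 - 40.0

def c_to_f(temp_c):
    """Celsius to Fahrenheit by the add-40 recipe."""
    return (temp_c + 40.0) * 9.0 / 5.0 - 40.0

def f_to_c_standard(temp_f):
    """Fahrenheit to Celsius by the usual formula, for comparison."""
    return (temp_f - 32.0) * 5.0 / 9.0

for temp_f in (-40.0, 32.0, 98.6, 212.0):
    print(temp_f, f_to_c(temp_f), f_to_c_standard(temp_f))
```

Comparing the two columns of output, and looking at what happens at the first test temperature, is a useful hint for the explanation the problem asks for.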
Problem 2.30. It is common in everyday language to refer to temperatures as “hot” and “cold.”
Why is this use of language misleading? Does it make sense to say that one body is “twice as hot”
as another? Does it matter whether the Celsius or kelvin temperature scale is used?
Problem 2.31. Does it make sense to talk about the amount of heat in a room?
Problem 2.32. In what context can energy transferred by heating be treated as a ﬂuid? Give
some examples where this concept of “heat” is used in everyday life. In what context does the
concept of “heat” as a ﬂuid break down? Is it possible to isolate “heat” in a bottle or pour it from
one object to another?
Problem 2.33. Why should we check the pressure in a tire when the tire is cold?
Problem 2.34. Suppose that we measure the temperature of a body and then place the body on
a moving conveyer belt. Does the temperature of the body change?
Problem 2.35. Why do we use the triple point of water to calibrate thermometers? Why not use
the melting point or the boiling point?
Problem 2.36. In the text we discussed the analogy of the internal energy to the amount of
water in a pond. The following analogy due to Dugdale might also be helpful.19 Suppose that a
student has a bank account with a certain amount of money. The student can add to this amount
by either depositing or withdrawing cash and by writing or depositing checks from the accounts
of others. Does the total amount of money in his account distinguish between cash and check
transfers? Discuss the analogies to internal energy, work, and heating.
Problem 2.37. The following excerpt is taken from a text used by one of the author’s children
in the sixth grade. The title and the author will remain anonymous. Find the conceptual errors
in the text.
A. What is heat?
You have learned that all matter is made up of atoms. Most of these atoms combine to form molecules.
These molecules are always moving—they have kinetic energy. Heat is the energy of motion (kinetic energy)
of the particles that make up any piece of matter.
The amount of heat a material has depends on how many molecules it has and how fast the molecules
are moving. The greater the number of molecules and the faster they move, the greater the number of
collisions between them. These collisions produce a large amount of heat.
How is heat measured? Scientists measure heat by using a unit called a calorie. A calorie is the
amount of heat needed to raise the temperature of 1 gram of water 1 degree centigrade (Celsius).
A gram is a unit used for measuring mass. There are about 454 grams in 1 pound.
Scientists use a small calorie and a large Calorie. The unit used to measure the amount of heat needed
to raise the temperature of 1 gram of water 1 degree centigrade is the small calorie. The large calorie is
used to measure units of heat in food. For example, a glass of milk when burned in your body produces
about 125 Calories.
Questions:
1. What is heat?
2. What two things does the amount of heat a substance has depend on?
3. What is a calorie?
4. Explain the following: small calorie; large calorie.
B. What is temperature?
The amount of hotness in an object is called its temperature. A thermometer is used to measure
temperature in units called degrees. Most thermometers contain a liquid.
C. Expansion and Contraction
Most solids, liquids and gases expand when heated and contract when cooled. When matter is heated,
its molecules move faster. As they move, they collide with their neighbors very rapidly. The collisions
force the molecules to spread farther apart. The farther apart they spread, the more the matter expands.
Air, which is a mixture of gases, expands and becomes lighter when its temperature rises. Warm air
rises because the cold, heavier air sinks and pushes up the lighter warm air.
What happens when solids or gases are cooled? The molecules slow down and collide less. The
molecules move closer together, causing the material to contract.
19 See Dugdale, pp. 21–22.
Problem 2.38. Why are the terms heat capacity and specific heat poor choices of names? Suggest
more appropriate names. Criticize the statement: “The heat capacity of a body is a measure of
how much heat the body can hold.”
Problem 2.39. The atmosphere of Mars has a pressure that is only 0.007 times that of the Earth
and an average temperature of 218 K. What is the volume of 1 mole of the Martian atmosphere?
Problem 2.40. Discuss the meaning of the statement that one of the most important contributions
of 19th century thermodynamics was the development of the understanding that heat (and work)
are names of methods not names of things.
Problem 2.41. Gasoline burns in an automobile engine and releases energy at the rate of 160 kW.
Energy is exhausted through the car’s radiator at the rate of 51 kW and out the exhaust at 50 kW.
An additional 23 kW goes to frictional heating within the machinery of the car. What fraction of
the fuel energy is available for moving the car?
Problem 2.42. Two moles of an ideal gas at 300 K occupying a volume of 0.10 m^3 are compressed
isothermally by a motor-driven piston to a volume of 0.010 m^3. If this process takes place in 120 s,
how powerful a motor is needed?
Problem 2.43. Give an example of a process in which a system is not heated, but its temperature
increases. Also give an example of a process in which a system is heated, but its temperature is
unchanged.
Problem 2.44. (a) Suppose that a gas expands adiabatically into a vacuum. What is the work
done by the gas? (b) Suppose that the total energy of the gas is given by (see (2.24))
\[ E = \frac{3}{2} N kT - N \frac{N}{V} a, \tag{2.163} \]
where a is a positive constant. Initially the gas occupies a volume VA at a temperature TA. The
gas then expands adiabatically into a vacuum so that it occupies a total volume VB . What is the
ﬁnal temperature of the gas?
Problem 2.45. Calculate the work done on one mole of an ideal gas in an adiabatic quasistatic
compression from volume VA to volume VB . The initial pressure is PA .
Problem 2.46. Consider the following processes and calculate W , the total work done on the
system and Q, the total energy absorbed by heating the system when it is brought quasistatically
from A to C (see Figure 2.12). Assume that the system is an ideal gas. (This problem is adapted
from Reif, p. 215.)
(a) The volume is changed quasistatically from A → C while the gas is kept thermally isolated.
(b) The system is compressed from its original volume of VA = 8 m^3 to its final volume VC = 1 m^3
along the path A → B and B → C. The pressure is kept constant at PA = 1 Pa and the system
is cooled to maintain constant pressure. The volume is then kept constant and the system is
heated to increase the pressure to PC = 32 Pa.
(c) A → D and D → C. The two steps of the preceding process are performed in opposite order.
(d) A → C. The volume is decreased and the system is heated so that the pressure is proportional
to the volume.
[Figure 2.12 (axes: P versus V; points A at V = 8, P = 1; B at V = 1, P = 1; C at V = 1, P = 32; D at V = 8, P = 32; adiabatic curve from A to C labeled P ∝ V^{−5/3}): Illustration of various thermodynamic processes discussed in Problem 2.46. The units
of the pressure P and the volume V are Pa and m^3, respectively.]
Problem 2.47. A 0.5 kg copper block at 80 ◦ C is dropped into 1 kg of water at 10 ◦ C. What is
the ﬁnal temperature? What is the change in entropy of the system? The speciﬁc heat of copper
is 386 J/(kg K).
Problem 2.48. (a) Surface temperatures in the tropical oceans are approximately 25 ◦ C, while
hundreds of meters below the surface the temperature is approximately 5 ◦ C. What would be the
eﬃciency of a Carnot engine operating between these temperatures? (b) What is the eﬃciency of
a Carnot engine operating between the normal freezing and boiling points of water?
Problem 2.49. A small sample of material is taken through a Carnot cycle between a heat source
of boiling helium at 1.76 K and a heat sink at an unknown lower temperature. During the process,
7.5 mJ of energy is absorbed by heating from the helium and 0.55 mJ is rejected at the lower
temperature. What is the lower temperature?
Problem 2.50. Positive change in total entropy
(a) Show that the total entropy change in Example 2.13 can be written as
\[ \Delta S = C f\!\left(\frac{T_2}{T_1}\right), \tag{2.164} \]
where
\[ f(x) = \ln x + \frac{1}{x} - 1, \tag{2.165} \]
and x > 1 corresponds to heating. Calculate f(x = 1) and df/dx and show that the entropy
of the universe increases for a heating process.
(b) If the total entropy increases in a heating process, does the total entropy decrease in a cooling
process? Use similar considerations to show that the total entropy increases in both cases.
(c) Plot f(x) as a function of x and confirm that its minimum value is at x = 1 and that f > 0
for x < 1 and x > 1.
Problem 2.51. Show that the enthalpy, H ≡ E + PV, is the appropriate free energy for the case
where the entropy and number of particles are fixed, but the volume can change. In this case we
consider a system connected to a larger body such that the pressure of the system equals that of
the large body with the constraint that the larger body and the system do not exchange energy.
An example of this situation would be a gas conﬁned to a glass container with a movable piston.
Problem 2.52. Find the Landau potential for the case where the temperature is ﬁxed by a heat
bath, the volume is fixed, and particles can move between the system and the heat bath. You will
need to extend the deﬁnition of the availability to allow for the number of particles to vary within
the system. Use the same argument about extensive variables to show that the Landau potential
equals −P V .
Problem 2.53. One kilogram of water at 50 ◦ C is brought into contact with a heat bath at 0 ◦ C.
What is the change of entropy of the water, the bath, and the combined system consisting of both
the water and the heat bath? Given that the total entropy increased in Example 2.13, should the
entropy increase or decrease in this case?
Problem 2.54. Calculate the changes in entropy due to various methods of heating:
(a) One kilogram of water at 0 ◦ C is brought into contact with a heat bath at 90 ◦ C. What is
the change in entropy of the water? What is the change in entropy of the bath? What is the
change in entropy of the entire system consisting of both water and heat bath? (The speciﬁc
heat of water is approximately 4184 J/(kg K).)
(b) The water is heated from 0 ◦ C to 90 ◦ C by ﬁrst bringing it into contact with a heat bath at
45 ◦ C and then with a heat bath at 90 ◦ C. What is the change in entropy of the entire system?
(c) Discuss how the water can be heated from 0 ◦ C to 90 ◦ C without any change in entropy of the
entire system.
Problem 2.55. If S is expressed as a function of T, V or T, P, then it is no longer a thermodynamic
potential. That is, the maximum thermodynamic information is contained in S as a function of E
and V (for ﬁxed N ). Why?
Problem 2.56. Refrigerators. A refrigerator cools a body by heating the hotter room surrounding
the body. According to the second law of thermodynamics, work must be done by an external body.
Suppose that we cool the cold body by the amount Qcold at temperature Tcold and heat the room
by the amount Qhot at temperature Thot . The external work supplied is W (see Figure 2.13). The
work W supplied is frequently electrical work, the refrigerator interior is cooled (Qcold extracted),
and Qhot is given to the room. We deﬁne the coeﬃcient of performance (COP) as
\[ \mathrm{COP} = \frac{\text{what you get}}{\text{what you pay for}} = \frac{Q_{\rm cold}}{W}. \tag{2.166} \]
Show that the maximum value of the COP corresponds to a reversible refrigerator and is given by
\[ \mathrm{COP} = \frac{T_{\rm cold}}{T_{\rm hot} - T_{\rm cold}}. \tag{2.167} \]
Note that a refrigerator is more efficient for smaller temperature differences.
[Figure 2.13 (labels: room at Thot, Qhot, W, engine, Qcold, refrigerator at Tcold): The transfer of energy in an idealized refrigerator.]
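To get a feeling for the remark about temperature differences, the sketch below evaluates the upper bound (2.167) for a few interior temperatures. The room temperature of 295 K and the interior temperatures are hypothetical values chosen only for illustration, and the function name is not from the text.

```python
def max_refrigerator_cop(T_cold, T_hot):
    """Upper bound on the COP from (2.167); temperatures in kelvin."""
    if not 0.0 < T_cold < T_hot:
        raise ValueError("require 0 < T_cold < T_hot (kelvin)")
    return T_cold / (T_hot - T_cold)

T_hot = 295.0   # hypothetical room temperature (K)
for T_cold in (275.0, 255.0, 235.0):
    cop = max_refrigerator_cop(T_cold, T_hot)
    print(f"T_cold = {T_cold:.0f} K   maximum COP = {cop:.2f}")
```

The bound falls quickly as the interior is made colder, because the denominator grows while the numerator shrinks.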
Problem 2.57. Heat Pumps. A heat pump works on the same principle as a refrigerator, but the
object is to heat a room by cooling its cooler surroundings. For example, we could heat a building
by cooling a nearby body of water. If we extract energy Qcold from the surroundings at Tcold , do
work W , and deliver Qhot to the room at Thot , the coeﬃcient of performance is given by
\[ \mathrm{COP} = \frac{\text{what you get}}{\text{what you pay for}} = \frac{Q_{\rm hot}}{W}. \tag{2.168} \]
What is the maximum value of COP for a heat pump in terms of Tcold and Thot? What is the COP
when the outside temperature is 0 ◦ C and the interior temperature is 23 ◦ C? Is it more eﬀective
to operate a heat pump during the winters in New England where the winters are cold or in the
Paciﬁc Northwest where the winters are relatively mild?
Problem 2.58. Use (2.115) to derive the relation (2.43) between V and T for an ideal gas in a
quasistatic adiabatic process.
Problem 2.59. The Otto cycle. The Otto cycle is the idealized prototype of most present-day
internal combustion engines. The cycle was first described by Beau de Rochas in 1862. Nikolaus
Otto independently conceived of the same cycle in 1876 and then constructed an engine to implement it. The idealization makes two basic assumptions. One is that the working substance is taken
to be air rather than a mixture of gases and vapor whose composition changes during the cycle.
For simplicity, we assume that CV and CP are constant and that γ = CP /CV = 1.4, the value
for air. The more important approximation is that the changes are assumed to be quasistatic. An
idealized cycle that represents the six parts of this cycle is known as the air standard Otto cycle
and is illustrated in Figure 2.14.
5 → 1. Intake stroke. The mixture of gasoline and air is drawn into the cylinder through the
intake valve by the movement of the piston. Idealization: A quasistatic isobaric intake of air
at pressure P0 to a volume V1 .
1 → 2. Compression stroke. The intake valve closes and the air-fuel mixture is rapidly compressed in the cylinder. The compression is nearly adiabatic and the temperature rises.
Idealization: A quasistatic adiabatic compression from V1 to V2 ; the temperature rises from
T1 to T2 .
2 → 3. Explosion. The mixture explodes such that the volume remains unchanged and a very
high temperature and pressure is reached. Idealization: A quasistatic and constant volume
increase of temperature and pressure due to the absorption of energy from a series of heat
baths between T2 and T3 .
3 → 4. Power stroke. The hot combustion products expand and do work on the piston.
The pressure and temperature decrease considerably. Idealization: A quasistatic adiabatic
expansion produces a decrease in temperature.
4 → 1. Valve exhaust. At the end of the power stroke the exhaust valve opens and the combustion products are exhausted to the atmosphere. There is a sudden decrease in pressure.
Idealization: A quasistatic constant volume decrease in temperature to T1 and pressure P0
due to an exchange of energy with a series of heat baths between T4 and T1.
1 → 5. Exhaust stroke. The piston forces the remaining gases into the atmosphere. The
exhaust valve then closes and the intake valve opens for the next intake stroke. Idealization:
A quasistatic isobaric expulsion of the air.
Show that the eﬃciency of the Otto cycle is
\[ \eta = 1 - \left(\frac{V_2}{V_1}\right)^{\gamma - 1}. \tag{2.169} \]
A compression ratio of about ten can be used without causing knocking. Estimate the theoretical
maximum efficiency. In a real engine, the efficiency is about half of this value.
[Figure 2.14 (axes: P versus V; labels: states 1 to 5, Q1, Q2, P0, V2, V1): The air standard Otto cycle.]

Suggestions for Further Reading
C. J. Adkins, Equilibrium Thermodynamics, third edition, Cambridge University Press (1983).
Ralph Baierlein, Thermal Physics, Cambridge University Press, New York (1999).
Craig F. Bohren and Bruce A. Albrecht, Atmospheric Thermodynamics, Oxford University Press
(1998). A book for prospective meteorologists that should be read by physics professors and
students alike. This chapter relies strongly on their development.
Robert P. Bauman, Modern Thermodynamics with Statistical Mechanics, Macmillan Publishing
Company (1992).
Stephen G. Brush, The Kind of Motion We Call Heat: A History of the Kinetic Theory of Gases
in the Nineteenth Century, Elsevier Science (1976). See also The Kind of Motion We Call
Heat: Physics and the Atomists, Elsevier Science (1986); The Kind of Motion We Call Heat:
Statistical Physics and Irreversible Processes, Elsevier Science (1986).
Herbert B. Callen, Thermodynamics and an Introduction to Thermostatistics, second edition,
John Wiley & Sons (1985). Callen’s discussion of the principles of thermodynamics is an
example of clarity.
J. S. Dugdale, Entropy and its Physical Meaning, Taylor & Francis (1996). A very accessible
introduction to thermodynamics.
E. Garber, S. G. Brush, and C. W. F. Everitt, eds., Maxwell on Heat and Statistical Mechanics,
Lehigh University Press (1995).
Michael E. Loverude, Christian H. Kautz, and Paula R. L. Heron, “Student understanding of the
first law of thermodynamics: Relating work to the adiabatic compression of an ideal gas,”
Am. J. Phys. 70, 137–148 (2002).
F. Reif, Statistical Physics, Volume 5 of the Berkeley Physics Series, McGraw-Hill (1965).
F. Mandl, Statistical Physics, second edition, John Wiley & Sons (1988).
David E. Meltzer, “Investigation of students’ reasoning regarding heat, work, and the first law
of thermodynamics in an introductory calculus-based general physics course,” Am. J. Phys.
72, 1432–1446 (2004).
Daniel V. Schroeder, An Introduction to Thermal Physics, Addison-Wesley (2000).
Richard Wolfson and Jay M. Pasachoff, Physics, second edition, HarperCollins College Publishers
(1995). Chapters 19–22 are a good introduction to the laws of thermodynamics.

Chapter 3
Concepts of Probability
© 2005 by Harvey Gould and Jan Tobochnik
5 December 2005
We introduce the basic concepts of probability and apply them to simple physical systems and everyday
life. We will discover the universal nature of the central limit theorem and the Gaussian distribution
for the sum of a large number of random processes and discuss its relation to why thermodynamics
is possible. Because of the importance of probability in many contexts and the relatively little
time it will take us to cover more advanced topics, our discussion goes beyond what we will need
for most applications of statistical mechanics.

3.1 Probability in everyday life

Chapter 2 provided an introduction to thermodynamics using macroscopic arguments. Our goal,
which we will consider again in Chapter 4, is to relate the behavior of various macroscopic quantities
to the underlying microscopic behavior of the individual atoms or other constituents. To do so, we
will need to introduce some ideas from probability.
We all use the ideas of probability in everyday life. For example, every morning many of us
decide what to wear based on the probability of rain. We cross streets knowing that the probability
of being hit by a car is small. We can even make a rough estimate of the probability of being hit
by a car. It must be less than one in a thousand, because you have crossed streets thousands of
times and hopefully you have not been hit. Of course you might be hit tomorrow, or you might
have been hit the ﬁrst time you tried to cross a street. These comments illustrate that we have
some intuitive sense of probability and because it is a useful concept for survival, we know how to
estimate it. As expressed by Laplace (1819),
Probability theory is nothing but common sense reduced to calculation.
Another interesting thought is due to Maxwell (1850):
The true logic of this world is the calculus of probabilities . . .
However, our intuition only takes us so far. Consider airplane travel. Is it safe to fly? Suppose
that there is one chance in 100,000 of a plane crashing on a given flight and that there are
1000 flights a day. Then every 100 days or so there would be a reasonable likelihood of a plane
crashing. This estimate is in rough accord with what we read. For a given flight, your chances of
crashing are approximately one part in $10^5$, and if you fly five times a year for 80 years, it seems
that ﬂying is not too much of a risk. However, suppose that instead of living 80 years, you could
live 20,000 years. In this case you would take 100,000 ﬂights, and it would be much more risky to
ﬂy if you wished to live your full 20,000 years. Although this last statement seems reasonable, can
you explain why?
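One way to make the comparison concrete is to compute the chance of at least one crash in a lifetime of flights. The following is a minimal sketch, assuming that flights are independent events with the same crash probability; the function name `prob_at_least_one` is our own and does not come from the text.

```python
def prob_at_least_one(p, n):
    """Probability that an event with per-trial probability p
    occurs at least once in n independent trials."""
    return 1.0 - (1.0 - p) ** n

p_crash = 1e-5  # one chance in 100,000 per flight, as quoted in the text

for years in (80, 20_000):
    flights = 5 * years  # five flights a year
    risk = prob_at_least_one(p_crash, flights)
    print(f"{years} years, {flights} flights: risk = {risk:.4f}")
```

Under these assumptions the risk is well under one percent for 400 flights, but it exceeds one half for 100,000 flights, because the probability of surviving every flight shrinks as the number of flights grows.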
Much of the motivation for the mathematical formulation of probability arose from the proﬁciency of professional gamblers in estimating gambling odds and their desire to have more quantitative measures. Although games of chance have been played since history has been recorded, the
ﬁrst steps toward a mathematical formulation of games of chance began in the middle of the 17th
century. Some of the important contributors over the following 150 years include Pascal, Fermat,
Descartes, Leibnitz, Newton, Bernoulli, and Laplace, names that are probably familiar to you.
Given the long history of games of chance and the interest in estimating probability in a variety
of contexts, it is remarkable that the theory of probability took so long to develop. One reason
is that the idea of probability is subtle and is capable of many interpretations. An understanding
of probability is elusive due in part to the fact that the probability depends on the status of the
information that we have (a fact well known to poker players). Although the rules of probability are
deﬁned by simple mathematical rules, an understanding of probability is greatly aided by experience
with real data and concrete problems. To test your current understanding of probability, try to
solve Problems 3.1–3.6 before reading the rest of this chapter. Then in Problem 3.7 formulate the
laws of probability as best as you can based on your solutions to these problems.
Problem 3.1. A jar contains 2 orange, 5 blue, 3 red, and 4 yellow marbles. A marble is drawn
at random from the jar. Find the probability that (a) the marble is orange, (b) the marble is red,
(c) the marble is orange or blue.
Problem 3.2. A piggy bank contains one penny, one nickel, one dime, and one quarter. It is
shaken until two coins fall out at random. What is the probability that at least $0.30 falls out?
Problem 3.3. A girl tosses a pair of dice at the same time. Find the probability that (a) both
dice show the same number, (b) both dice show a number less than 5, (c) both dice show an even
number, (d) the product of the numbers is 12.
Problem 3.4. A boy hits 16 free throws out of 25 attempts. What is the probability that he will
make a free throw on his next attempt?
Problem 3.5. Consider an experiment in which a die is tossed 150 times and the number of times
each face is observed is counted. The values of A, the number of dots on the face of the die, and the
number of times that each value appeared are shown in Table 3.1. What is the predicted average value of A
assuming a fair die? What is the average value of A observed in this experiment?
Problem 3.6. A coin is taken at random from a purse that contains one penny, two nickels, four
dimes, and three quarters. If x equals the value of the coin, find the average value of x.

value of A    frequency
1             23
2             28
3             30
4             21
5             23
6             25
Table 3.1: The number of times face A appeared in 150 tosses.
Problem 3.7. Based on your solutions to the above problems, state the rules of probability as
you understand them at this time.
The following problems are related to the use of probability in everyday life.
Problem 3.8. Suppose that you are oﬀered a choice of the following: a certain $10 or the chance
of rolling a die and receiving $36 if it comes up 5 or 6, but nothing otherwise. Make arguments in
favor of each choice.
Problem 3.9. Suppose that you are oﬀered the following choice: (a) a prize of $100 if a ﬂip of
a coin yields heads or (b) a certain prize of $40. What choice would you make? Explain your
reasoning.
Problem 3.10. Suppose that you are oﬀered the following choice: (a) a prize of $100 is awarded
for each head found in ten ﬂips of a coin or (b) a certain prize of $400. What choice would you
make? Explain your reasoning.
Problem 3.11. Suppose that you were to judge an event to be 99.9999% probable. Would you be
willing to bet $999999 against $1 that the event would occur? Discuss why probability assessments
should be kept separate from decision issues.
Problem 3.12. Suppose that someone gives you a dollar to play the lottery. What sequence of
six numbers between 1 and 36 would you choose?
Problem 3.13. Suppose you toss a coin 8 times and obtain heads each time. Estimate the
probability that you will obtain heads on your ninth toss.
Problem 3.14. What is the probability that it will rain tomorrow? What is the probability that
the Dow Jones industrial average will increase tomorrow?
Problem 3.15. Give several examples of the use of probability in everyday life. Distinguish
between various types of probability.

3.2 The rules of probability

We now summarize the basic rules and ideas of probability.[1] Suppose that there is an operation or
a process that has several distinct possible outcomes. The process might be the flip of a coin or the
roll of a six-sided die.[2] We call each flip a trial. The list of all the possible events or outcomes is
called the sample space. We assume that the events are mutually exclusive, that is, the occurrence
of one event implies that the others cannot happen at the same time. We let n represent the
number of events, and label the events by the index i, which varies from 1 to n. For now we assume
that the sample space is finite and discrete. For example, the flip of a coin results in one of two
events that we refer to as heads and tails, and the roll of a die yields one of six possible events.

[1] In 1933 the Russian mathematician A. N. Kolmogorov formulated a complete set of axioms for the mathematical
definition of probability.
For each event i, we assign a probability P(i) that satisfies the conditions
$$P(i) \geq 0, \tag{3.1}$$
and
$$\sum_i P(i) = 1. \tag{3.2}$$
P(i) = 0 implies that the event cannot occur, and P(i) = 1 implies that the event must occur.
The normalization condition (3.2) says that the sum of the probabilities of all possible mutually
exclusive outcomes is unity.
Example 3.1. Let x be the number of points on the face of a die. What is the sample space of x?
Solution. The sample space or set of possible events is $\{x_i\} = \{1, 2, 3, 4, 5, 6\}$. These six outcomes
are mutually exclusive.
The rules of probability will be summarized further in (3.3) and (3.4). These abstract rules
must be supplemented by an interpretation of the term probability. As we will see, there are
many diﬀerent interpretations of probability because any interpretation that satisﬁes the rules of
probability may be regarded as a kind of probability.
Perhaps the interpretation of probability that is the easiest to understand is based on symmetry. Suppose that we have a two-sided coin that shows heads and tails. Then there are two
possible mutually exclusive outcomes, and if the coin is perfect, each outcome is equally likely. If
a die with six distinct faces (see Figure 3.1) is perfect, we can use symmetry arguments to argue
that each outcome should be counted equally and P(i) = 1/6 for each of the six faces. For an
actual die, we can estimate the probability a posteriori, that is, by the observation of the outcome
of many throws. As is usual in physics, our intuition will lead us to the concepts.

Figure 3.1: The six possible outcomes of the toss of a die.
Suppose that we know that the probability of rolling any face of a die in one throw is equal
to 1/6, and we want to ﬁnd the probability of ﬁnding face 3 or face 6 in one throw. In this case
we wish to know the probability of a trial that is a combination of more elementary operations
for which the probabilities are already known. That is, we want to know the probability of the
outcome, i or j, where i is distinct from j. According to the rules of probability, the probability
of event i or j is given by
$$P(i \text{ or } j) = P(i) + P(j). \qquad \text{(addition rule)} \tag{3.3}$$
The relation (3.3) is generalizable to more than two events. An important consequence of (3.3) is
that if P(i) is the probability of event i, then the probability of event i not occurring is 1 − P(i).

[2] The earliest known six-sided dice have been found in the Middle East. A die made of baked clay was found
in excavations of ancient Mesopotamia. The history of games of chance is discussed by Deborah J. Bennett,
Randomness, Harvard University Press (1998).
Example 3.2. What is the probability of throwing a three or a six with one throw of a die?
Solution. The probability that the face exhibits either 3 or 6 is $\frac{1}{6} + \frac{1}{6} = \frac{1}{3}$.
Example 3.3. What is the probability of not throwing a six with one throw of a die?
Solution. The answer is the probability of either “1 or 2 or 3 or 4 or 5.” The addition rule gives
that the probability P(not six) is
$$\begin{aligned}
P(\text{not six}) &= P(1) + P(2) + P(3) + P(4) + P(5) \\
&= 1 - P(6) = \frac{5}{6},
\end{aligned}$$
where the last relation follows from the fact that the probabilities of all outcomes sum
to unity. Although this property of the probability is obvious, it is very useful to take advantage
of this property when solving many probability problems.
Another simple rule is for the probability of the joint occurrence of independent events. These
events might be throwing a 3 on one die and throwing a 4 on
a second die. If two events are independent, then the probability of both events occurring is the
product of their probabilities:
$$P(i \text{ and } j) = P(i)\,P(j). \qquad \text{(multiplication rule)} \tag{3.4}$$
Events are independent if the occurrence of one event does not change the probability for the
occurrence of the other.
To understand the applicability of (3.4) and the meaning of the independence of events,
consider the problem of determining the probability that a person chosen at random is a female
over six feet tall. Suppose that we know that the probability of a person to be over six feet tall
is $P(6^+) = \frac{1}{5}$, and the probability of being female is $P(\text{female}) = \frac{1}{2}$. We might conclude that
the probability of being a tall female is $P(\text{female}) \times P(6^+) = \frac{1}{2} \times \frac{1}{5} = \frac{1}{10}$. The same probability
would hold for a tall male. However, this reasoning is incorrect, because the probability of being
a tall female differs from the probability of being a tall male. The problem is that the two events
(being over six feet tall and being female) are not independent. On the other hand, consider
the probability that a person chosen at random is female and was born on September 6. We
can reasonably assume equal likelihood of birthdays for all days of the year, and it is correct to
conclude that this probability is $\frac{1}{2} \times \frac{1}{365}$ (not counting leap years). Being a woman and being born
on September 6 are independent events.
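Independence can be checked directly whenever the sample space is small enough to enumerate. The sketch below is an illustration only: it uses two fair dice rather than the height and birthday data, which the text does not supply, and the helper names `prob` and `independent` are our own. It tests whether P(i and j) = P(i) P(j) holds for two pairs of events.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely ordered outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate on an outcome (a, b)."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

def independent(e1, e2):
    """True if P(e1 and e2) equals P(e1) * P(e2) exactly."""
    joint = prob(lambda o: e1(o) and e2(o))
    return joint == prob(e1) * prob(e2)

def first_is_3(o):
    return o[0] == 3

def second_is_4(o):
    return o[1] == 4

def sum_is_10(o):
    return o[0] + o[1] == 10

# The two dice do not influence each other, so the rule holds.
print(independent(first_is_3, second_is_4))
# A first die of 3 rules out a sum of 10, so these events are not independent.
print(independent(first_is_3, sum_is_10))
```

Exact fractions are used so that the comparison is not affected by floating-point rounding.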
Problem 3.16. Give an example from your solutions to Problems 3.1–3.6 where you used the
addition rule or the multiplication rule.
Example 3.4. What is the probability of throwing an even number with one throw of a die?
Solution. We can use the addition rule to find that
$$P(\text{even}) = P(2) + P(4) + P(6) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}.$$
Example 3.5. What is the probability of the same face appearing on two successive throws of a
die?
Solution. We know that the probability of any specific combination of outcomes, for example,
(1,1), (2,2), . . . (6,6) is $\frac{1}{6} \times \frac{1}{6} = \frac{1}{36}$. Hence, by the addition rule
$$P(\text{same face}) = P(1,1) + P(2,2) + \cdots + P(6,6) = 6 \times \frac{1}{36} = \frac{1}{6}.$$
Example 3.6. What is the probability that in two throws of a die at least one six appears?
Solution. We have already established that
$$P(6) = \frac{1}{6}, \qquad P(\text{not } 6) = \frac{5}{6}.$$
There are four possible outcomes (6, 6), (6, not 6), (not 6, 6), (not 6, not 6) with the probabilities
$$\begin{aligned}
P(6, 6) &= \frac{1}{6} \times \frac{1}{6} = \frac{1}{36}, \\
P(6, \text{not } 6) = P(\text{not } 6, 6) &= \frac{1}{6} \times \frac{5}{6} = \frac{5}{36}, \\
P(\text{not } 6, \text{not } 6) &= \frac{5}{6} \times \frac{5}{6} = \frac{25}{36}.
\end{aligned}$$
All outcomes except the last have at least one six. Hence, the probability of obtaining at least one
six is
$$\begin{aligned}
P(\text{at least one } 6) &= P(6, 6) + P(6, \text{not } 6) + P(\text{not } 6, 6) \\
&= \frac{1}{36} + \frac{5}{36} + \frac{5}{36} = \frac{11}{36}.
\end{aligned}$$
A more direct way of obtaining this result is to use the normalization condition. That is,
$$P(\text{at least one six}) = 1 - P(\text{not } 6, \text{not } 6) = 1 - \frac{25}{36} = \frac{11}{36}.$$
Example 3.7. What is the probability of obtaining at least one six in four throws of a die?
Solution. We know that in one throw of a die, there are two outcomes with $P(6) = \frac{1}{6}$ and
$P(\text{not } 6) = \frac{5}{6}$. Hence, in four throws of a die there are sixteen possible outcomes, only one of
which has no six. That is, in the other fifteen mutually exclusive outcomes, there is at least one six. We
can use the multiplication rule (3.4) to find that
$$P(\text{not } 6, \text{not } 6, \text{not } 6, \text{not } 6) = P(\text{not } 6)^4 = \left(\frac{5}{6}\right)^4,$$
and hence
$$\begin{aligned}
P(\text{at least one six}) &= 1 - P(\text{not } 6, \text{not } 6, \text{not } 6, \text{not } 6) \\
&= 1 - \left(\frac{5}{6}\right)^4 = \frac{671}{1296} \approx 0.518.
\end{aligned}$$
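The complement argument used in Examples 3.6 and 3.7 can be checked against a brute-force count. The following minimal sketch assumes a fair die; the function names are ours. It evaluates 1 − (5/6)^N exactly and compares the result with a direct enumeration of all 6^N equally likely sequences.

```python
from fractions import Fraction
from itertools import product

def at_least_one_six(throws):
    """Exact probability of at least one six, from the complement rule."""
    return 1 - Fraction(5, 6) ** throws

def at_least_one_six_by_counting(throws):
    """Same probability, found by counting the 6**throws equally likely sequences."""
    sequences = product(range(1, 7), repeat=throws)
    favorable = sum(1 for s in sequences if 6 in s)
    return Fraction(favorable, 6 ** throws)

for n in (2, 4):
    print(n, at_least_one_six(n), at_least_one_six_by_counting(n))
```

The enumeration grows as 6^N and is practical only for a small number of throws, which is one reason the complement rule is so useful.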
Frequently we know the probabilities only up to a constant factor. For example, we might know
P (1) = 2P (2), but not P (1) or P (2) separately. Suppose we know that P (i) is proportional to f (i),
where f (i) is a known function. To obtain the normalized probabilities, we divide each function
f(i) by the sum of all the unnormalized probabilities. That is, if $P(i) \propto f(i)$ and $Z = \sum_i f(i)$,
then $P(i) = f(i)/Z$. This procedure is called normalization.
Example 3.8. Suppose that in a given class it is three times as likely to receive a C as an A,
twice as likely to obtain a B as an A, one-fourth as likely to be assigned a D as an A, and nobody
fails the class. What are the probabilities of getting each grade?
Solution. We first assign the unnormalized probability of receiving an A as f(A) = 1. Then
f(B) = 2, f(C) = 3, and f(D) = 0.25. Then $Z = \sum_i f(i) = 1 + 2 + 3 + 0.25 = 6.25$. Hence,
P(A) = f(A)/Z = 1/6.25 = 0.16, P(B) = 2/6.25 = 0.32, P(C) = 3/6.25 = 0.48, and P(D) =
0.25/6.25 = 0.04.
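The normalization procedure is short enough to write as a function. The sketch below is one possible implementation, not a prescribed one; the name `normalize` and the use of a dictionary keyed by grade are our own choices. It reproduces the grade probabilities of Example 3.8.

```python
from fractions import Fraction

def normalize(weights):
    """Turn unnormalized weights f(i) into probabilities P(i) = f(i) / Z."""
    Z = sum(weights.values())  # normalization constant
    return {key: w / Z for key, w in weights.items()}

# Unnormalized probabilities relative to f(A) = 1, as in Example 3.8.
f = {"A": Fraction(1), "B": Fraction(2), "C": Fraction(3), "D": Fraction(1, 4)}
P = normalize(f)

for grade, p in P.items():
    print(grade, float(p))
print("sum:", sum(P.values()))
```

Only the ratios of the weights matter: multiplying every f(i) by the same constant changes Z but leaves each P(i) unchanged.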
The normalization procedure arises again and again in diﬀerent contexts. We will see that
much of the mathematics of statistical mechanics can be formulated in terms of the calculation of
normalization constants.
Problem 3.17. Find the probability distribution P (n) for throwing a sum n with two dice and
plot P (n) as a function of n.
Problem 3.18. What is the probability of obtaining at least one double six in twenty-four throws
of a pair of dice?
Problem 3.19. Suppose that three dice are thrown at the same time. What is the probability
that the sum of the three faces is 10 compared to 9?
Problem 3.20. What is the probability that the total number of spots shown on three dice thrown
at the same time is 11? What is the probability that the total is 12? What is the fallacy in the
following argument? The number 11 occurs in six ways: (1,4,6), (2,3,6), (1,5,5), (2,4,5), (3,3,5),
(3,4,4). The number 12 also occurs in six ways: (1,5,6), (2,4,6), (3,3,6), (2,5,5), (3,4,5), (4,4,4) and
hence the two numbers should be equally probable.
Problem 3.21. In two tosses of a single coin, what is the probability that heads will appear
at least once? Use the rules of probability to show that the answer is 3/4. However, d’Alembert,
a distinguished French mathematician of the eighteenth century, reasoned that there are only 3
possible outcomes: heads on the first throw, heads on the second throw, and no heads at all. The
first two of these three outcomes are favorable. Therefore the probability that heads will appear at
least once is 2/3. What is the fallacy in this reasoning?

3.3 Mean values

The specification of the probability distribution P(1), P(2), . . . P(n) for the n possible values of the
variable x constitutes the most complete statistical description of the system. However, in many
cases it is more convenient to describe the distribution of the possible values of x in a less detailed
way. The most familiar way is to specify the average or mean value of x, which we will denote as
$\overline{x}$. The definition of the mean value of x is
$$\overline{x} \equiv x_1 P(1) + x_2 P(2) + \cdots + x_n P(n) \tag{3.6a}$$
$$\phantom{\overline{x}} = \sum_{i=1}^{n} x_i P(i), \tag{3.6b}$$
where P(i) is the probability of $x_i$, and we have assumed that the probability is normalized. If
f(x) is a function of x, then the mean value of f(x) is defined by
$$\overline{f(x)} = \sum_{i=1}^{n} f(x_i) P(i). \tag{3.7}$$
If f(x) and g(x) are any two functions of x, then
$$\begin{aligned}
\overline{f(x) + g(x)} &= \sum_{i=1}^{n} [f(x_i) + g(x_i)] P(i) \\
&= \sum_{i=1}^{n} f(x_i) P(i) + \sum_{i=1}^{n} g(x_i) P(i),
\end{aligned}$$
or
$$\overline{f(x) + g(x)} = \overline{f(x)} + \overline{g(x)}. \tag{3.8}$$
Problem 3.22. Show that if c is a constant, then
$$\overline{c f(x)} = c\,\overline{f(x)}. \tag{3.9}$$
In general, we can define the mth moment of the probability distribution P as
$$\overline{x^m} \equiv \sum_{i=1}^{n} x_i^{\,m} P(i), \tag{3.10}$$
where we have let $f(x) = x^m$. The mean of x is the first moment of the probability distribution.
Problem 3.23. Suppose that the variable x takes on the values −2, −1, 0, 1, and 2 with probabilities 1/4, 1/8, 1/8, 1/4, and 1/4, respectively. Calculate the ﬁrst two moments of x.
The mean value of x is a measure of the central value of x about which the various values of
$x_i$ are distributed. If we measure x from its mean, we have that
$$\Delta x \equiv x - \overline{x}, \tag{3.11}$$
and
$$\overline{\Delta x} = \overline{(x - \overline{x})} = \overline{x} - \overline{x} = 0. \tag{3.12}$$
That is, the average value of the deviation of x from its mean vanishes.
If only one outcome j were possible, we would have P(i) = 1 for i = j and zero otherwise, that
is, the probability distribution would have zero width. In general, there is more than one outcome
and a possible measure of the width of the probability distribution is given by
$$\overline{\Delta x^2} \equiv \overline{(x - \overline{x})^2}. \tag{3.13}$$
The quantity $\overline{\Delta x^2}$ is known as the dispersion or variance and its square root is called the standard
deviation. It is easy to see that the larger the spread of values of x about $\overline{x}$, the larger the variance.
The use of the square of $x - \overline{x}$ ensures that the contributions of x values that are smaller and larger
than $\overline{x}$ enter with the same sign. A useful form for the variance can be found by letting
$$\begin{aligned}
\overline{(x - \overline{x})^2} &= \overline{\left(x^2 - 2x\overline{x} + \overline{x}^2\right)} \\
&= \overline{x^2} - 2\,\overline{x}\,\overline{x} + \overline{x}^2,
\end{aligned} \tag{3.14}$$
or
$$\overline{(x - \overline{x})^2} = \overline{x^2} - \overline{x}^2. \tag{3.15}$$
Because $\overline{\Delta x^2}$ is always nonnegative, it follows that $\overline{x^2} \geq \overline{x}^2$.
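The equivalence of the two forms of the variance, (3.13) and (3.15), is easy to verify numerically. The following is a minimal sketch for a fair six-sided die; the choice of distribution and the helper name `mean` are ours. Exact fractions are used so that the two forms can be compared without rounding error.

```python
from fractions import Fraction
from math import sqrt

def mean(f, values, probs):
    """Mean value of f(x): the sum of f(x_i) P(i) over all outcomes, as in (3.7)."""
    return sum(f(x) * p for x, p in zip(values, probs))

values = list(range(1, 7))      # faces of a die
probs = [Fraction(1, 6)] * 6    # assumed fair

x_bar = mean(lambda x: x, values, probs)
var_definition = mean(lambda x: (x - x_bar) ** 2, values, probs)   # form (3.13)
var_shortcut = mean(lambda x: x ** 2, values, probs) - x_bar ** 2  # form (3.15)

print("mean:", x_bar)
print("variance from (3.13):", var_definition)
print("variance from (3.15):", var_shortcut)
print("standard deviation:", sqrt(float(var_definition)))
```

The form (3.15) needs only the first two moments, so both can be accumulated in a single pass over the data, whereas (3.13) requires the mean to be known first.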
The variance is the mean value of the square $(x - \overline{x})^2$ and represents the square of a width.
We will find that it is useful to interpret the width of the probability distribution in terms of the
standard deviation. The standard deviation of the probability distribution P(x) is given by
$$\sigma_x = \sqrt{\overline{\Delta x^2}} = \sqrt{\overline{x^2} - \overline{x}^2}. \tag{3.16}$$
Example 3.9. Find the mean value $\overline{x}$, the variance $\overline{\Delta x^2}$, and the standard deviation $\sigma_x$ for the
value of a single throw of a die.
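Solution. Because $P(i) = \frac{1}{6}$ for i = 1, . . . , 6, we have that
$$\begin{aligned}
\overline{x} &= \frac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = \frac{7}{2} = 3.5 \\
\overline{x^2} &= \frac{1}{6}(1 + 4 + 9 + 16 + 25 + 36) = \frac{91}{6} \\
\overline{\Delta x^2} &= \overline{x^2} - \overline{x}^2 = \frac{91}{6} - \frac{49}{4} = \frac{35}{12} \approx 2.92 \\
\sigma_x &\approx \sqrt{2.92} = 1.71
\end{aligned}$$
Example 3.10. On the average, how many times must a die be thrown until a 6 appears?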
Solution. Although it might seem obvious that the answer is six, it is instructive to confirm this
answer. Let p be the probability of a six on a given throw. The probability of success for the first
time on trial i is given in Table 3.2.

trial    probability of success on trial i
1        $p$
2        $qp$
3        $q^2 p$
4        $q^3 p$
Table 3.2: Probability of success (a six) for the first time on trial i (q = 1 − p).

The sum of the probabilities is $p + pq + pq^2 + \cdots = p(1 + q + q^2 + \cdots) = p/(1 - q) = p/p = 1$.
The mean number of trials m is
$$\begin{aligned}
m &= p + 2pq + 3pq^2 + 4pq^3 + \cdots \\
&= p(1 + 2q + 3q^2 + \cdots) \\
&= p \frac{d}{dq}\left(1 + q + q^2 + q^3 + \cdots\right) \\
&= p \frac{d}{dq} \frac{1}{1 - q} = \frac{p}{(1 - q)^2} = \frac{1}{p}.
\end{aligned} \tag{3.17}$$
Another way to obtain this result is to note that if the first toss is a failure, then the mean
number of tosses required is 1 + m, and if the first toss is a success, the mean number is 1. Hence,
m = q(1 + m) + p(1) or m = 1/p.

3.4 The meaning of probability

How can we assign the probabilities of the various events? If we say that event E1 is more probable
than event E2 (P (E1 ) > P (E2 )), we mean that E1 is more likely to occur than E2 . This statement
of our intuitive understanding of probability illustrates that probability is a way of classifying the
plausibility of events under conditions of uncertainty. Probability is related to our degree of belief
in the occurrence of an event.
This deﬁnition of the concept of probability is not bound to a single evaluation rule and
there are many ways to obtain P (Ei ). For example, we could use symmetry considerations as
we have already done, past frequencies, simulations, theoretical calculations, or as we will learn
in Section 3.4.2, Bayesian inference. Probability assessments depend on who does the evaluation
and the status of the information the evaluator has at the moment of the assessment. We always
evaluate the conditional probability, that is, the probability of an event E given the information
I, $P(E \mid I)$. Consequently, several people can simultaneously have different degrees of belief about
the same event, as is well known to investors in the stock market.
If rational people have access to the same information, they should come to the same conclusion about the probability of an event. The idea of a coherent bet forces us to make probability
assessments that correspond to our belief in the occurrence of an event. If we consider an event to
be 50% probable, then we should be ready to place an even bet on the occurrence of the event or
on its opposite. However, if someone wishes to place the bet in one direction but not in the other,
it means that this person thinks that the preferred event is more probable than the other. In this
case the 50% probability assessment is incoherent and this person’s wish does not correspond to
his or her belief.
A coherent bet has to be considered virtual. For example, a person might judge an event
to be 99.9999% probable, but nevertheless refuse to bet $999999 against $1, if $999999 is much
more than the person’s resources. Nevertheless, the person might be convinced that this bet
would be fair if he/she had an inﬁnite budget. Probability assessments should be kept separate
from decision issues. Decisions depend not only on the probability of the event, but also on the
subjective importance of a given amount of money (see Problems 3.11 and 3.90).
Our discussion of probability as the degree of belief that an event will occur shows the inadequacy of the frequency deﬁnition of probability, which deﬁnes probability as the ratio of the
number of desired outcomes to the total number of possible outcomes. This deﬁnition is inadequate
because we would have to specify that each outcome has equal probability. Thus we would have to
use the term probability in its own deﬁnition. If we do an experiment to measure the frequencies of
various outcomes, then we need to make an additional assumption that the measured frequencies
will be the same in the future as they were in the past. Also we have to make a large number of
measurements to ensure accuracy, and we have no way of knowing a priori how many measurements
are suﬃcient. Thus, the deﬁnition of probability as a frequency really turns out to be a method
for estimating probabilities with some hidden assumptions.
Our deﬁnition of probability as a measure of the degree of belief in the occurrence of an
outcome implies that probability depends on our prior knowledge, because belief depends on prior
knowledge. For example, if we toss a coin and obtain 100 tails in a row, we might use this
knowledge as evidence that the coin or toss is biased, and thus estimate that the probability of
throwing another tail is very high. However, if a careful physical analysis shows that there is no
bias, then we would stick to our estimate of 1/2. The probability depends on what knowledge
we bring to the problem. If we have no knowledge other than the possible outcomes, then the
best estimate is to assume equal probability for all events. However, this assumption is not a
deﬁnition, but an example of belief. As an example of the importance of prior knowledge, consider
the following problem.
Problem 3.24. (a) A couple has two children. What is the probability that at least one child is
a girl? (b) Suppose that you know that at least one child is a girl. What is the probability that
the other child is a girl? (c) Instead suppose that we know that the older child is a girl. What is
the probability that the younger child is a girl?
We know that we can estimate probabilities empirically by sampling, that is, by making
repeated measurements of the outcome of independent events. Intuitively we believe that if we
perform more and more measurements, the calculated average will approach the exact mean of the
quantity of interest. This idea is called the law of large numbers.
As an example, suppose that we ﬂip a single coin M times and count the number of heads. Our
result for the number of heads is shown in Table 3.3. We see that the fraction of heads approaches
1/2 as the number of measurements becomes larger.
heads      tosses       fraction of heads
4          10           0.4
29         50           0.58
49         100          0.49
101        200          0.505
235        500          0.470
518        1,000        0.518
4997       10,000       0.4997
50021      100,000      0.50021
249946     500,000      0.49999
500416     1,000,000    0.50042
Table 3.3: The number and fraction of heads in M tosses of a coin. (We did not really toss a coin
in the air $10^6$ times. Instead we used a computer to generate a sequence of random numbers to
simulate the tossing of a coin. Because you might not be familiar with such sequences, imagine a
robot that can write the positive integers between 1 and $2^{31}$ on pieces of paper. Place these pieces
in a hat, shake the hat, and then choose the pieces at random. If the number chosen is less than
$\frac{1}{2} \times 2^{31}$, then we say that we found a head. Each piece is placed back in the hat after it is read.)

Problem 3.25. Use the applet/application at <stp.clarku.edu/simulations/cointoss> to
simulate multiple tosses of a single coin. What is the correspondence between this simulation
of a coin being tossed many times and the actual physical tossing of a coin? If the coin is “fair,”
what do you think the ratio of the number of heads to the total number of tosses will be? Do you
obtain this number after 100 tosses? 10,000 tosses?
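A coin-toss simulation of the kind described in the caption of Table 3.3 can also be written in a few lines. The sketch below is not the applet referred to in Problem 3.25; it assumes Python's standard pseudorandom number generator as a stand-in for the robot and the hat, and the seed value is arbitrary.

```python
import random

def fraction_of_heads(tosses, rng):
    """Simulate `tosses` flips of a fair coin and return the fraction that are heads."""
    heads = sum(1 for _ in range(tosses) if rng.random() < 0.5)
    return heads / tosses

rng = random.Random(2005)  # fixed, arbitrary seed so that runs are repeatable

for M in (10, 100, 1_000, 10_000, 100_000):
    print(M, fraction_of_heads(M, rng))
```

A different seed gives different numbers, but the fluctuations about 1/2 should shrink as M grows, in the same way as in Table 3.3.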
Another way of estimating the probability is to perform a single measurement on many copies
or replicas of the system of interest. For example, instead of ﬂipping a single coin 100 times in
succession, we collect 100 coins and ﬂip all of them at the same time. The fraction of coins that
show heads is an estimate of the probability of that event. The collection of identically prepared
systems is called an ensemble and the probability of occurrence of a single event is estimated with
respect to this ensemble. The ensemble consists of a large number M of identical systems, that is,
systems that satisfy the same known conditions.
If the system of interest is not changing in time, it is reasonable that an estimate of the
probability by either a series of measurements on a single system at diﬀerent times or similar
measurements on many identical systems at the same time would give consistent results.
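The two ways of estimating a probability can be compared in a simulation. The following is a minimal sketch, assuming that each coin is fair and that every coin in the ensemble can be modeled by its own independently seeded pseudorandom generator; the function names and seeds are hypothetical.

```python
import random

def time_estimate(M, seed):
    """One coin flipped M times in succession."""
    coin = random.Random(seed)
    return sum(coin.random() < 0.5 for _ in range(M)) / M

def ensemble_estimate(M, seed):
    """M identically prepared coins, each flipped once."""
    master = random.Random(seed)
    # Each coin gets its own generator, seeded from the master generator.
    coins = [random.Random(master.getrandbits(64)) for _ in range(M)]
    return sum(coin.random() < 0.5 for coin in coins) / M

M = 10_000
print("one coin, M flips:    ", time_estimate(M, seed=1))
print("M coins, one flip each:", ensemble_estimate(M, seed=2))
```

The two estimates are not expected to be identical, but for a system that does not change in time they should agree to within the statistical fluctuations for the chosen M.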
Note that we have estimated various probabilities by a frequency, but have not deﬁned probability in terms of a frequency. As emphasized by D’Agostini, past frequency is experimental data.
This data happened with certainty so the concept of probability no longer applies. Probability is
how much we believe that an event will occur taking into account all available information including past frequencies. Because probability quantiﬁes the degree of belief at a given time, it is not
measurable. If we make further measurements, they can only inﬂuence future assessments of the
probability.

3.4.1 Information and uncertainty

Consider an experiment that has two outcomes E1 and E2 with probabilities P1 and P2. For
example, the experiment could correspond to the toss of a coin. For one coin the probabilities are
P1 = P2 = 1/2 and for the other (a bent coin) P1 = 1/5 and P2 = 4/5. Intuitively, we would say
that the result of the first experiment is more uncertain.
Next consider two additional experiments. In the third experiment there are four outcomes
with P1 = P2 = P3 = P4 = 1/4 and in the fourth experiment there are six outcomes with
P1 = P2 = P3 = P4 = P5 = P6 = 1/6. Intuitively the fourth experiment is the most uncertain
because there are more outcomes and the ﬁrst experiment is the least uncertain. You are probably
not clear about how to rank the second and third experiments.
We will now ﬁnd a mathematical measure that is consistent with our intuitive sense of uncertainty. Let us deﬁne the uncertainty function S (P1 , P2 , . . . , Pj , . . .) where j labels the possible
events and Pj is the probability of event j . We ﬁrst consider the case where all the probabilities
Pj are equal. Then P1 = P2 = . . . = Pj = 1/Ω, where Ω is the total number of outcomes. In this
case we have S = S (1/Ω, 1/Ω, . . .) or simply S (Ω).
It is easy to see that S (Ω) has to satisfy some simple conditions. For only one outcome Ω = 1
and there is no uncertainty. Hence we must have
$$S(\Omega = 1) = 0. \tag{3.18}$$
We also have that
$$S(\Omega_1) > S(\Omega_2) \quad \text{if } \Omega_1 > \Omega_2. \tag{3.19}$$
That is, S(Ω) is an increasing function of Ω.
We next consider multiple events. For example, suppose that we throw a die with Ω1 outcomes
and ﬂip a coin with Ω2 equally probable outcomes. The total number of outcomes is Ω = Ω1 Ω2 . If
the result of the die is known, the uncertainty associated with the die is reduced to zero, but there
still is uncertainty associated with the toss of the coin. Similarly, we can reduce the uncertainty
in the reverse order, but the total uncertainty is still nonzero. These considerations suggest that
$$S(\Omega_1 \Omega_2) = S(\Omega_1) + S(\Omega_2). \tag{3.20}$$
It is remarkable that there is a unique functional form that satisfies the three conditions
(3.18)–(3.20). We can find this form by writing (3.20) in the form
$$S(xy) = S(x) + S(y), \tag{3.21}$$
and taking the variables x and y to be continuous. (The analysis can be done assuming that x and
y are integers, but it is simpler if we assume that x and y are continuous
variables. The functional form of S might already be obvious.) This generalization is consistent
with S(Ω) being an increasing function of Ω. First we take the partial derivative of S(xy) with
respect to x and then with respect to y. We have
$$\frac{\partial S(z)}{\partial x} = \frac{\partial z}{\partial x}\frac{dS(z)}{dz} = y\,\frac{dS(z)}{dz} \tag{3.22a}$$
$$\frac{\partial S(z)}{\partial y} = \frac{\partial z}{\partial y}\frac{dS(z)}{dz} = x\,\frac{dS(z)}{dz}, \tag{3.22b}$$
where z = xy. But from (3.21) we have
$$\frac{\partial S(z)}{\partial x} = \frac{dS(x)}{dx} \tag{3.23a}$$
$$\frac{\partial S(z)}{\partial y} = \frac{dS(y)}{dy}. \tag{3.23b}$$
By comparing the right-hand sides of (3.22) and (3.23), we have
$$\frac{dS(x)}{dx} = y\,\frac{dS(z)}{dz} \tag{3.24a}$$
$$\frac{dS(y)}{dy} = x\,\frac{dS(z)}{dz}. \tag{3.24b}$$
If we multiply (3.24a) by x and (3.24b) by y, we obtain
$$z\,\frac{dS(z)}{dz} = x\,\frac{dS(x)}{dx} = y\,\frac{dS(y)}{dy}. \tag{3.25}$$
Note that the second part of (3.25) depends only on x and the third part depends only on y.
Because x and y are independent variables, the three parts of (3.25) must be equal to a constant.
Hence we have the desired condition
$$x\,\frac{dS(x)}{dx} = y\,\frac{dS(y)}{dy} = A, \tag{3.26}$$
where A is a constant. The differential equation in (3.26) can be integrated to give
$$S(x) = A \ln x + B. \tag{3.27}$$
The integration constant B must be equal to zero to satisfy the condition (3.18). The constant A
is arbitrary so we choose A = 1. Hence for equal probabilities we have that
$$S(\Omega) = \ln \Omega. \tag{3.28}$$
What about the case where the probabilities for the various events are unequal? We will show
in Appendix 3A that the general form of the uncertainty S is
$$S = -\sum_j P_j \ln P_j. \tag{3.29}$$
Note that if all the probabilities are equal then
$$P_j = \frac{1}{\Omega} \tag{3.30}$$
for all j. In this case
$$S = -\sum_j \frac{1}{\Omega} \ln \frac{1}{\Omega} = \Omega\,\frac{1}{\Omega} \ln \Omega = \ln \Omega, \tag{3.31}$$
because there are Ω equal terms in the sum. Hence (3.29) reduces to (3.28) as required. We also
see that if outcome i is certain, Pi = 1 and Pj = 0 for j ≠ i, and S = −1 ln 1 = 0. That is, if the
outcome is certain, the uncertainty is zero and there is no missing information.
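The general form (3.29) is easy to evaluate numerically. The following Python sketch is a minimal illustration, not part of the original derivation; the function name `uncertainty` and the convention that terms with Pj = 0 contribute nothing to the sum are our assumptions.

```python
import math

def uncertainty(probs):
    """S = -sum_j P_j ln P_j from (3.29), with the convention 0 ln 0 = 0."""
    if abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to one")
    return -sum(p * math.log(p) for p in probs if p > 0)

if __name__ == "__main__":
    omega = 6
    print(uncertainty([1 / omega] * omega), math.log(omega))  # equal probabilities: S = ln(Omega)
    print(uncertainty([1.0, 0.0]))                            # certain outcome: S = 0
    print(uncertainty([0.5, 0.5]), uncertainty([0.2, 0.8]))   # fair coin versus bent coin
```

The first two lines check the two limiting cases discussed above, and the last line compares the fair coin and the bent coin introduced at the beginning of this section.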
We have shown that if the Pj are known, then the uncertainty or missing information S
can be calculated. Usually the problem is the other way around, and we want to determine
the probabilities. Suppose we ﬂip a perfect coin for which there are two possibilities. We know
intuitively that P1 (heads) = P2 (tails) = 1/2. That is, we would not assign a diﬀerent probability
to each outcome unless we had information to justify it. Intuitively we have adopted the principle
of least bias or maximum uncertainty. Let's reconsider the toss of a coin. In this case S is given by
$$S = -\sum_j P_j \ln P_j = -(P_1 \ln P_1 + P_2 \ln P_2) \tag{3.32a}$$
$$= -[P_1 \ln P_1 + (1 - P_1) \ln(1 - P_1)], \tag{3.32b}$$
where we have used the fact that P1 + P2 = 1. To maximize S we take the derivative with respect
to P1:³
$$\frac{dS}{dP_1} = -[\ln P_1 + 1 - \ln(1 - P_1) - 1] = -\ln \frac{P_1}{1 - P_1} = 0. \tag{3.33}$$
The solution of (3.33) satisfies
$$\frac{P_1}{1 - P_1} = 1, \tag{3.34}$$
which is satisfied by the choice P1 = 1/2. We can check that this solution is a maximum by
calculating the second derivative at P1 = 1/2,
$$\frac{\partial^2 S}{\partial P_1^2} = -\left[\frac{1}{P_1} + \frac{1}{1 - P_1}\right] = -4 < 0, \tag{3.35}$$
which is less than zero as we expected.
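The same maximum can be located numerically without any calculus. The sketch below is only an illustration under our own assumptions: the function names are hypothetical, and a simple grid search replaces the derivative in (3.33).

```python
import math

def coin_uncertainty(p1):
    """S(P1) = -[P1 ln P1 + (1 - P1) ln(1 - P1)] for 0 < P1 < 1, as in (3.32b)."""
    return -(p1 * math.log(p1) + (1 - p1) * math.log(1 - p1))

def argmax_on_grid(func, lo, hi, num_points=1001):
    """Return the grid point in the open interval (lo, hi) at which func is largest."""
    step = (hi - lo) / (num_points + 1)
    grid = [lo + (k + 1) * step for k in range(num_points)]
    return max(grid, key=func)

best_p1 = argmax_on_grid(coin_uncertainty, 0.0, 1.0)

if __name__ == "__main__":
    print(best_p1, coin_uncertainty(best_p1), math.log(2))
```

The grid search returns P1 = 1/2 with S = ln 2, in agreement with (3.34).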
Problem 3.26. (a) Consider the toss of a coin for which P1 = P2 = 1/2 for the two outcomes.
What is the uncertainty in this case? (b) What is the uncertainty for P1 = 1/3 and P2 = 2/3?
How does the uncertainty in this case compare to that in part (a)? (c) In Section 3.4.1 we discussed
four experiments with various outcomes. Calculate the uncertainty S of the third and fourth
experiments.
Example 3.11. The toss of a three-sided die yields events E1, E2, and E3 with a face of one,
two, and three points. As a result of tossing many dice, we learn that the mean number of points
is $\overline{f} = 1.9$, but we do not know the individual probabilities. What are the values of P1, P2, and
P3 that maximize the uncertainty?
Solution. We have
$$S = -[P_1 \ln P_1 + P_2 \ln P_2 + P_3 \ln P_3]. \tag{3.36}$$
We also know that
$$\overline{f} = (1 \times P_1) + (2 \times P_2) + (3 \times P_3), \tag{3.37}$$
and P1 + P2 + P3 = 1. We use the latter condition to eliminate P3 using P3 = 1 − P1 − P2, and
rewrite (3.37) as
$$\overline{f} = P_1 + 2P_2 + 3(1 - P_1 - P_2) = 3 - 2P_1 - P_2. \tag{3.38}$$
We then use (3.38) to eliminate P2 and P3 from (3.36) using $P_2 = 3 - \overline{f} - 2P_1$ and $P_3 = \overline{f} - 2 + P_1$:
$$S = -[P_1 \ln P_1 + (3 - \overline{f} - 2P_1) \ln(3 - \overline{f} - 2P_1) + (\overline{f} - 2 + P_1) \ln(\overline{f} - 2 + P_1)]. \tag{3.39}$$
Because S in (3.39) depends only on P1, we can simply differentiate S with respect to P1 to find its
maximum value:
$$\frac{dS}{dP_1} = -\left[\ln P_1 + 1 - 2\left(\ln(3 - \overline{f} - 2P_1) + 1\right) + \ln(\overline{f} - 2 + P_1) + 1\right] = -\ln \frac{P_1(\overline{f} - 2 + P_1)}{(3 - \overline{f} - 2P_1)^2} = 0. \tag{3.40}$$
We see that for dS/dP1 to be equal to zero, the argument of the logarithm must be one. The result
is a quadratic equation for P1.
³ We have used the fact that d(ln x)/dx = 1/x.
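The analytical result can be checked by maximizing (3.39) numerically. The following Python sketch assumes nothing beyond (3.39) itself; the function names and the brute-force grid search are our own choices and are not the method used in the text.

```python
import math

def die_uncertainty(p1, mean_points):
    """S(P1) from (3.39); returns -inf where any probability is not positive."""
    p2 = 3 - mean_points - 2 * p1
    p3 = mean_points - 2 + p1
    if min(p1, p2, p3) <= 0:
        return float("-inf")
    return -(p1 * math.log(p1) + p2 * math.log(p2) + p3 * math.log(p3))

def maximize_die_uncertainty(mean_points, num_points=100_001):
    """Grid search over 0 <= P1 <= 1; returns the maximizing (P1, P2, P3)."""
    grid = [k / (num_points - 1) for k in range(num_points)]
    p1 = max(grid, key=lambda p: die_uncertainty(p, mean_points))
    return p1, 3 - mean_points - 2 * p1, mean_points - 2 + p1

if __name__ == "__main__":
    # Sanity check: a mean of exactly 2 carries no bias, so all three faces are equally likely.
    print(maximize_die_uncertainty(2.0))
```

For a mean of 2 the search returns three probabilities close to 1/3, as symmetry requires. Calling the function with the mean given in Example 3.11 provides a check on the solution of the quadratic equation.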
Problem 3.27. Fill in the missing steps in Example 3.11 and solve for P1 , P2 , and P3 .
In Appendix 3B we maximize the uncertainty for a case for which there are more than three
outcomes.

3.4.2 *Bayesian inference

Let us define P(A|B) as the probability of A occurring given that we know B. We now discuss a
few results about conditional probability. Clearly,
$$P(A) = P(A|B)P(B) + P(A|\overline{B})P(\overline{B}), \tag{3.41}$$
where $\overline{B}$ means B does not occur. Also, it is clear that
$$P(A \text{ and } B) = P(A|B)P(B) = P(B|A)P(A). \tag{3.42}$$
Equation (3.42) means that the probability that A and B occur equals the probability that A occurs
given B times the probability that B occurs, which is the same as the probability that B occurs
given A times the probability that A occurs. If we are interested in various possible outcomes Ai
for the same B, we can rewrite (3.42) as
$$P(A_i|B) = \frac{P(B|A_i)P(A_i)}{P(B)}. \tag{3.43}$$
If all the Ai are mutually exclusive and if at least one of the Ai must occur, then we can also write
$$P(B) = \sum_i P(B|A_i)P(A_i). \tag{3.44}$$
If we substitute (3.44) for P(B) into (3.43), we obtain the important result:
$$P(A_i|B) = \frac{P(B|A_i)P(A_i)}{\sum_i P(B|A_i)P(A_i)}. \qquad \text{(Bayes' theorem)} \tag{3.45}$$
Equation (3.45) is known as Bayes’ theorem.
Bayes’ theorem is very useful for choosing the most probable explanation of a given data set.
In this context Ai represents the possible explanations and B represents the data. As more data
becomes available, the probabilities P(B|Ai)P(Ai) change.
As an example, consider the following quandary known as the Monty Hall Problem.⁴ In this
show a contestant is shown three doors. Behind one door is an expensive gift such as a car and
behind the other two doors are inexpensive gifts such as a tie. The contestant chooses a door.
Suppose she chooses door 1. Then the host opens door 2 containing the tie. The contestant now
has a choice – should she stay with her original choice or switch to door 3? What would you do?
Let us use Bayes’ theorem to determine her best course of action. We want to calculate
P(A1|B) = P(car behind door 1 | door 2 open after door 1 chosen),
and
P(A3|B) = P(car behind door 3 | door 2 open after door 1 chosen),
where Ai denotes car behind door i. We know that all the P (Ai ) equal 1/3, because with no
information we must assume that the probability that the car is behind each door is the same.
Because the host can open door 2 or 3 if the car is behind door 1, but can only open door 2 if the
car is behind door 3 we have
$$P(\text{door 2 open after door 1 chosen} \mid \text{car behind 1}) = 1/2 \tag{3.46a}$$
$$P(\text{door 2 open after door 1 chosen} \mid \text{car behind 2}) = 0 \tag{3.46b}$$
$$P(\text{door 2 open after door 1 chosen} \mid \text{car behind 3}) = 1. \tag{3.46c}$$
Using Bayes’ theorem we have
$$P(\text{car behind 1} \mid \text{door 2 open after door 1 chosen}) = \frac{(1/2)(1/3)}{\frac{1}{2}\cdot\frac{1}{3} + 0\cdot\frac{1}{3} + 1\cdot\frac{1}{3}} = 1/3 \tag{3.47a}$$
$$P(\text{car behind 3} \mid \text{door 2 open after door 1 chosen}) = \frac{(1)(1/3)}{\frac{1}{2}\cdot\frac{1}{3} + 0\cdot\frac{1}{3} + 1\cdot\frac{1}{3}} = 2/3. \tag{3.47b}$$
The results in (3.47) suggest the contestant has a higher probability of winning the car if she
switches doors and chooses door 3. The same logic suggests that she should always switch doors
independently of which door she originally chose. A search of the internet for Monty Hall will
bring you to many sites that discuss the problem in more detail.
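The Bayesian calculation for the Monty Hall problem can be reproduced with exact fractions and checked against a direct simulation of the game. The Python sketch below is an illustration only: the function names are ours, and the simulation assumes that the host always opens a door that hides neither the car nor the contestant's choice, choosing at random when two such doors exist.

```python
from fractions import Fraction
import random

def posterior(priors, likelihoods):
    """Bayes' theorem (3.45): P(A_i|B) = P(B|A_i)P(A_i) / sum_i P(B|A_i)P(A_i)."""
    joint = [like * prior for like, prior in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]

# Contestant picks door 1 and the host opens door 2; likelihoods from (3.46).
priors = [Fraction(1, 3)] * 3
likelihoods = [Fraction(1, 2), Fraction(0), Fraction(1)]
print(posterior(priors, likelihoods))

def simulate_switch_win_rate(num_games=100_000, seed=0):
    """Fraction of games won by a contestant who always switches doors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(num_games):
        car = rng.randrange(3)
        choice = rng.randrange(3)
        # The host opens a door that is neither the contestant's choice nor the car.
        opened = rng.choice([d for d in range(3) if d != choice and d != car])
        switched = next(d for d in range(3) if d != choice and d != opened)
        wins += (switched == car)
    return wins / num_games
```

The exact posterior probabilities are 1/3, 0, and 2/3 for doors 1, 2, and 3, and the simulated win rate for switching should be close to 2/3 whichever door is chosen first.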
⁴ This question was posed on the TV game show, “Let’s Make A Deal,” hosted by Monty Hall.
Example 3.12. Even though you have no symptoms, your doctor wishes to test you for a rare
disease that only 1 in 10,000 people of your age contract. The test is 98% accurate, which means
that if you have the disease, 98% of the time the test will come out positive, and 2% negative.
We will also assume that if you do not have the disease, the test will come out negative 98% of
the time and positive 2% of the time. You take the test and it comes out positive. What is the
probability that you have the disease? Is this test useful?
Solution. Let P(+|D) = 0.98 represent the probability of testing positive given that you have the disease.
If $\overline{D}$ represents not having the disease and − represents testing negative, then we are given:
P(−|D) = 0.02, $P(-|\overline{D}) = 0.98$, $P(+|\overline{D}) = 0.02$, P(D) = 0.0001, and $P(\overline{D}) = 0.9999$. From
Bayes’ theorem we have
$$P(D|+) = \frac{P(+|D)P(D)}{P(+|D)P(D) + P(+|\overline{D})P(\overline{D})} = \frac{(0.98)(0.0001)}{(0.98)(0.0001) + (0.02)(0.9999)} = 0.0049 = 0.49\%. \tag{3.48}$$
Problem 3.28. Imagine that you have a sack of 3 balls that can be either red or green. There
are four hypotheses for the distribution of colors for the balls: (1) all are red, (2) 2 are red, (3) 1
is red, and (4) all are green. Initially, you have no information about which hypothesis is correct,
and thus you assume that they are equally probable. Suppose that you pick one ball out of the
sack and it is green. Use Bayes’ theorem to determine the new probabilities for each hypothesis.
Problem 3.29. Make a table that determines the necessary accuracy for a test to give the probability of having a disease if tested positive equal to at least 50% for diseases that occur in 1 in
100, 1 in 1000, 1 in 10,000, and 1 in 100,000 people.
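Calculations such as the one in Example 3.12 are convenient to automate. The following Python sketch is a minimal illustration; the function name and the parameter names `prevalence`, `sensitivity` (the probability of a positive test given the disease), and `specificity` (the probability of a negative test given no disease) are our own labels and do not appear in the text.

```python
def probability_of_disease_given_positive(prevalence, sensitivity, specificity):
    """Bayes' theorem for a two-outcome test: returns P(D|+)."""
    true_positive = sensitivity * prevalence                  # P(+|D) P(D)
    false_positive = (1.0 - specificity) * (1.0 - prevalence)  # P(+|not D) P(not D)
    return true_positive / (true_positive + false_positive)

if __name__ == "__main__":
    # Numbers of Example 3.12: 1 in 10,000 people have the disease, test 98% accurate.
    print(probability_of_disease_given_positive(0.0001, 0.98, 0.98))
```

Calling the function with a prevalence of 0.0001 and an accuracy of 0.98 for both kinds of outcome reproduces the calculation in (3.48). Varying the arguments shows how strongly the result depends on the rarity of the disease.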
We have emphasized that the definition of probability as a frequency is inadequate. If you
are interested in learning more about Bayesian inference, read Problem 3.92 and the reference by
D’Agostini.

3.5 Bernoulli processes and the binomial distribution

Because most physicists spend little time gambling,⁵ we will have to develop our intuitive understanding of probability in other ways. Our strategy will be to first consider some physical systems
for which we can calculate the probability distribution by analytical methods. Then we will use
the computer to generate more data to analyze.
Noninteracting magnetic moments
Consider a system of N noninteracting magnetic moments of spin 1/2, each having a magnetic
moment µ in an external magnetic field B. The field B is in the up (+z) direction. Spin 1/2
implies that a spin can point either up (parallel to B) or down (antiparallel to B). The energy
of interaction of each spin with the magnetic field is E = ∓µB, according to the orientation of
the magnetic moment. As discussed in Section 1.10, this model is a simplification of more realistic
magnetic systems.
⁵ After a Las Vegas hotel hosted a meeting of the American Physical Society in March, 1986, the physicists were
asked never to return.
We will take p to be the probability that the spin (magnetic moment) is up and q the probability
that the spin is down. Because there are no other possible outcomes, we have p + q = 1 or q = 1 − p.
If B = 0, there is no preferred spatial direction and p = q = 1/2. For B ≠ 0 we do not yet know
how to calculate p and for now we will assume that p is a known parameter. In Section 4.8 we will
learn how to calculate p and q when the system is in equilibrium at temperature T.
We associate with each spin a random variable $s_i$ which has the values ±1 with probability
p and q, respectively. One of the quantities of interest is the magnetization M, which is the net
magnetic moment of the system. For a system of N spins the magnetization is given by
$$M = \mu(s_1 + s_2 + \ldots + s_N) = \mu \sum_{i=1}^{N} s_i. \tag{3.49}$$
In the following, we will take µ = 1 for convenience whenever it will not cause confusion. Alternatively, we can interpret M as the net number of up spins.
We will first calculate the mean value of M, then its variance, and finally the probability
distribution P(M) that the system has magnetization M. To compute the mean value of M, we
need to take the mean values of both sides of (3.49). If we use (3.8), we can interchange the sum
and the average and write
$$\overline{M} = \overline{\sum_{i=1}^{N} s_i} = \sum_{i=1}^{N} \overline{s_i}. \tag{3.50}$$
Because the probability that any spin has the value ±1 is the same for each spin, the mean value
of each spin is the same, that is, $\overline{s_1} = \overline{s_2} = \ldots = \overline{s_N} \equiv \overline{s}$. Therefore the sum in (3.50) consists of
N equal terms and can be written as
$$\overline{M} = N\overline{s}. \tag{3.51}$$
The meaning of (3.51) is that the mean magnetization is N times the mean magnetization of a
single spin. Because $\overline{s} = (1 \times p) + (-1 \times q) = p - q$, we have that
$$\overline{M} = N(p - q). \tag{3.52}$$
Now let us calculate the variance of M, that is, $\overline{(M - \overline{M})^2}$. We write
$$\Delta M = M - \overline{M} = \sum_{i=1}^{N} \Delta s_i, \tag{3.53}$$
where
$$\Delta s_i \equiv s_i - \overline{s}. \tag{3.54}$$
As an example, let us calculate $\overline{(\Delta M)^2}$ for N = 3 spins. In this case $(\Delta M)^2$ is given by
$$(\Delta M)^2 = (\Delta s_1 + \Delta s_2 + \Delta s_3)(\Delta s_1 + \Delta s_2 + \Delta s_3) = \left[(\Delta s_1)^2 + (\Delta s_2)^2 + (\Delta s_3)^2\right] + 2\left[\Delta s_1 \Delta s_2 + \Delta s_1 \Delta s_3 + \Delta s_2 \Delta s_3\right]. \tag{3.55}$$
We take the mean value of (3.55), interchange the order of the sums and averages, and write
$$\overline{(\Delta M)^2} = \left[\overline{(\Delta s_1)^2} + \overline{(\Delta s_2)^2} + \overline{(\Delta s_3)^2}\right] + 2\left[\overline{\Delta s_1 \Delta s_2} + \overline{\Delta s_1 \Delta s_3} + \overline{\Delta s_2 \Delta s_3}\right]. \tag{3.56}$$
The first term on the right of (3.56) represents the three terms in the sum that are multiplied
by themselves. The second term represents all the cross terms arising from different terms in the
sum, that is, the products in the second sum refer to different spins. Because different spins are
statistically independent (the spins do not interact), we have that
$$\overline{\Delta s_i \Delta s_j} = \overline{\Delta s_i}\;\overline{\Delta s_j} = 0, \qquad (i \neq j) \tag{3.57}$$
because $\overline{\Delta s_i} = 0$. That is, each cross term vanishes on the average. Hence (3.56) reduces to a sum
of squared terms
$$\overline{(\Delta M)^2} = \overline{(\Delta s_1)^2} + \overline{(\Delta s_2)^2} + \overline{(\Delta s_3)^2}. \tag{3.58}$$
Because each spin is equivalent on the average, each term in (3.58) is equal. Hence, we obtain the
desired result
$$\overline{(\Delta M)^2} = 3\,\overline{(\Delta s)^2}. \tag{3.59}$$
The variance of M is 3 times the variance of a single spin, that is, the variance is additive.
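The additivity of the mean and the variance can be verified by enumerating every configuration of a small system. The following Python sketch is only a check under our own assumptions: the function name is hypothetical, we take µ = 1, and the value p = 0.7 is an arbitrary choice.

```python
from itertools import product

def moments_of_magnetization(num_spins, p):
    """Exact mean and variance of M = sum_i s_i (mu = 1) by enumerating all 2^N configurations."""
    q = 1.0 - p
    mean = 0.0
    mean_sq = 0.0
    for spins in product((+1, -1), repeat=num_spins):
        n_up = spins.count(+1)
        prob = p**n_up * q**(num_spins - n_up)  # spins are independent
        m = sum(spins)
        mean += prob * m
        mean_sq += prob * m * m
    return mean, mean_sq - mean**2

if __name__ == "__main__":
    p = 0.7  # arbitrary illustrative value
    print(moments_of_magnetization(1, p))  # one spin
    print(moments_of_magnetization(3, p))  # three spins: both moments are 3 times larger
```

For any p the mean and the variance for N = 3 are three times the corresponding single-spin values, in agreement with (3.51) and (3.59).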
We can evaluate $\overline{(\Delta M)^2}$ further by finding an explicit expression for $\overline{(\Delta s)^2}$. We have that
$\overline{s^2} = [1^2 \times p] + [(-1)^2 \times q] = p + q = 1$. Hence, we have
$$\overline{(\Delta s)^2} = \overline{s^2} - \overline{s}^2 = 1 - (p - q)^2 = 1 - (2p - 1)^2 = 1 - 4p^2 + 4p - 1 = 4p(1 - p) = 4pq, \tag{3.60}$$
and our desired result for $\overline{(\Delta M)^2}$ is
$$\overline{(\Delta M)^2} = 3(4pq). \tag{3.61}$$
Problem 3.30. Use similar considerations to show that for N = 3
$$\overline{n} = 3p \tag{3.62}$$
and
$$\overline{(n - \overline{n})^2} = 3pq, \tag{3.63}$$
where n is the number of up spins. Explain the difference between (3.52) and (3.62) for N = 3,
and the difference between (3.61) and (3.63).
Problem 3.31. In the text we showed that $\overline{(\Delta M)^2} = 3\,\overline{(\Delta s)^2}$ for N = 3 spins (see (3.59) and
(3.61)). Use similar considerations for N noninteracting spins to show that
$$\overline{(\Delta M)^2} = N(4pq). \tag{3.64}$$
Because of the simplicity of a system of noninteracting spins, we can calculate the probability
distribution itself and not just the ﬁrst few moments. As an example, let us consider the statistical
properties of a system of N = 3 noninteracting spins. Because each spin can be in one of two
states, there are $2^{N=3} = 8$ distinct outcomes (see Figure 3.2). Because each spin is independent
of the other spins, we can use the multiplication rule (3.4) to calculate the probabilities of each
outcome as shown in Figure 3.2. Although each outcome is distinct, several of the configurations
have the same number of up spins. One quantity of interest is the probability $P_N(n)$ that n spins
are up out of a total of N spins. For example, there are three states with n = 2, each with probability
$p^2 q$, so the probability that two spins are up is equal to $3p^2 q$. For N = 3 we see from Figure 3.2
that
$$P_3(n = 3) = p^3 \tag{3.65a}$$
$$P_3(n = 2) = 3p^2 q \tag{3.65b}$$
$$P_3(n = 1) = 3pq^2 \tag{3.65c}$$
$$P_3(n = 0) = q^3. \tag{3.65d}$$
[Figure 3.2: An ensemble of N = 3 spins. The arrow indicates the direction of the magnetic moment
of a spin. The probability of each member of the ensemble is shown: p³, p²q, p²q, p²q, pq², pq², pq², q³.]
Example 3.13. Find the first two moments of $P_3(n)$.
Solution. The first moment $\overline{n}$ of the distribution is given by
$$\overline{n} = 0 \times q^3 + 1 \times 3pq^2 + 2 \times 3p^2 q + 3 \times p^3 = 3p\,(q^2 + 2pq + p^2) = 3p\,(q + p)^2 = 3p. \tag{3.66}$$
Similarly, the second moment $\overline{n^2}$ of the distribution is given by
$$\overline{n^2} = 0 \times q^3 + 1 \times 3pq^2 + 4 \times 3p^2 q + 9 \times p^3 = 3p\,(q^2 + 4pq + 3p^2) = 3p\,(q + 3p)(q + p) = 3p\,(q + 3p) = (3p)^2 + 3pq.$$
Hence
$$\overline{(n - \overline{n})^2} = \overline{n^2} - \overline{n}^2 = 3pq. \tag{3.67}$$
The mean magnetization $\overline{M}$ or the mean of the net number of up spins is given by the difference
between the mean number of spins pointing up and the mean number of spins pointing down:
$\overline{M} = [\overline{n} - (3 - \overline{n})]$, or $\overline{M} = 3(2p - 1) = 3(p - q)$.
Problem 3.32. The outcome of N coins is identical to N noninteracting spins, if we associate the
number of coins with N, the number of heads with n, and the number of tails with N − n. For a
fair coin the probability p of a head is p = 1/2 and the probability of a tail is q = 1 − p = 1/2. What
is the probability that in three tosses of a coin, there will be two heads?
Problem 3.33. One-dimensional random walk. The original statement of the random walk problem was posed by Pearson in 1905. If a drunkard begins at a lamp post and takes N steps of equal
length in random directions, how far will the drunkard be from the lamp post? We will consider
an idealized example of a random walk for which the steps of the walker are restricted to a line
(a one-dimensional random walk). Each step is of equal length a, and at each interval of time,
the walker either takes a step to the right with probability p or a step to the left with probability
q = 1 − p. The direction of each step is independent of the preceding one. Let n be the number of
steps to the right, and n′ the number of steps to the left. The total number of steps is N = n + n′.
(a) What is the probability that a random walker in one dimension has taken three steps to the
right out of four steps?
From the above examples and problems, we see that the probability distributions of noninteracting magnetic moments, the ﬂip of a coin, and a random walk are identical. These examples
have two characteristics in common. First, in each trial there are only two outcomes, for example,
up or down, heads or tails, and right or left. Second, the result of each trial is independent of all
previous trials, for example, the drunken sailor has no memory of his or her previous steps. This
type of process is called a Bernoulli process.⁶
Because of the importance of magnetic systems, we will cast our discussion of Bernoulli processes in terms of the noninteracting magnetic moments of spin 1/2. The main quantity of interest is
the probability $P_N(n)$, which we now calculate for arbitrary N and n. We know that a particular
outcome with n up spins and n′ down spins occurs with probability $p^n q^{n'}$. We write the probability
$P_N(n)$ as
$$P_N(n) = W_N(n, n')\, p^n q^{n'}, \tag{3.68}$$
where n′ = N − n and $W_N(n, n')$ is the number of distinct configurations of N spins with n up
spins and n′ down spins. From our discussion of N = 3 noninteracting spins, we already know the
first several values of $W_N(n, n')$.
We can determine the general form of $W_N(n, n')$ by obtaining a recursion relation between
$W_N$ and $W_{N-1}$. A total of n up spins and n′ down spins out of N total spins can be found by
adding one spin to N − 1 spins. The additional spin is either
(a) up if there are (n − 1) up spins and n′ down spins, or
(b) down if there are n up spins and (n′ − 1) down spins.
Because there are $W_{N-1}(n - 1, n')$ ways of reaching the first case and $W_{N-1}(n, n' - 1)$ ways in the
second case, we obtain the recursion relation
$$W_N(n, n') = W_{N-1}(n - 1, n') + W_{N-1}(n, n' - 1). \tag{3.69}$$
⁶ These processes are named after the mathematician Jacob Bernoulli, 1654 – 1705.
                1
              1   1
            1   2   1
          1   3   3   1
        1   4   6   4   1
Figure 3.3: The values of the first few coefficients $W_N(n, n')$. Each number is the sum of the two
numbers to the left and right above it. This construction is called a Pascal triangle.
If we begin with the known values $W_0(0, 0) = 1$, $W_1(1, 0) = W_1(0, 1) = 1$, we can use the recursion
relation (3.69) to construct $W_N(n, n')$ for any desired N. For example,
$$W_2(2, 0) = W_1(1, 0) + W_1(2, -1) = 1 + 0 = 1 \tag{3.70a}$$
$$W_2(1, 1) = W_1(0, 1) + W_1(1, 0) = 1 + 1 = 2 \tag{3.70b}$$
$$W_2(0, 2) = W_1(-1, 2) + W_1(0, 1) = 0 + 1 = 1. \tag{3.70c}$$
In Figure 3.3 we show that $W_N(n, n')$ forms a pyramid or (Pascal) triangle.
It is straightforward to show by induction that the expression
$$W_N(n, n') = \frac{N!}{n!\, n'!} = \frac{N!}{n!\,(N - n)!} \tag{3.71}$$
satisfies the relation (3.69). Note the convention 0! = 1. We can combine (3.68) and (3.71) to find
the desired result
$$P_N(n) = \frac{N!}{n!\,(N - n)!}\, p^n q^{N-n}. \qquad \text{(binomial distribution)} \tag{3.72}$$
The form (3.72) is called the binomial distribution. Note that for p = q = 1/2, $P_N(n)$ reduces to
$$P_N(n) = \frac{N!}{n!\,(N - n)!}\, 2^{-N}. \tag{3.73}$$
The probability $P_N(n)$ is shown in Figure 3.4 for N = 16.
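The recursion relation (3.69) and the closed form (3.71) can be compared directly on a computer. The Python sketch below is an illustration under our own naming; the memoized recursive function is one possible way to implement (3.69), and configurations with a negative number of spins are assigned the value zero, as in (3.70).

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_configurations(n_up, n_down):
    """W_N(n, n') built from the recursion relation (3.69), with N = n + n'."""
    if n_up < 0 or n_down < 0:
        return 0  # no configurations with a negative number of spins
    if n_up == 0 and n_down == 0:
        return 1  # W_0(0, 0) = 1
    return num_configurations(n_up - 1, n_down) + num_configurations(n_up, n_down - 1)

def binomial_probability(n_up, num_spins, p):
    """P_N(n) from the binomial distribution (3.72)."""
    return comb(num_spins, n_up) * p**n_up * (1 - p) ** (num_spins - n_up)

if __name__ == "__main__":
    for total in range(5):  # rows of the Pascal triangle in Figure 3.3
        print([num_configurations(n, total - n) for n in range(total + 1)])
    print(sum(binomial_probability(n, 16, 0.5) for n in range(17)))  # normalization
```

The rows printed by the loop reproduce Figure 3.3, and the recursion agrees with N!/[n!(N − n)!], which Python provides as `math.comb`.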
Problem 3.34. (a) Calculate the distribution $P_N(n)$ that n spins are up out of a total of N for
N = 4 and N = 16 and put your results in the form of a table. Calculate the mean values of n
and n² using your tabulated values of $P_N(n)$. (b) Plot your tabulated results that you calculated
in part (a) (see Figure 3.4). Assume p = q = 1/2. Visually estimate the width of the distribution
for each value of N. What is the qualitative dependence of the width on N? Also compare the
relative heights of the maximum of $P_N$. (c) Plot $P_N(n)$ as a function of $n/\overline{n}$ for N = 4 and N = 16
on the same graph as in (b). Visually estimate the relative width of the distribution for each value
of N. (d) Plot $\ln P_N(n)$ versus $n/\overline{n}$ for N = 16. Describe the behavior of $\ln P_N(n)$. Can $\ln P_N(n)$
be fitted to a parabola of the form $A + B(n - \overline{n})^2$, where A and B are fit parameters?
[Figure 3.4: The binomial distribution $P_{16}(n)$ for p = q = 1/2 and N = 16, plotted as P(n) versus n. What is your visual
estimate for the width of the distribution?]
Problem 3.35. (a) Plot PN (n) versus n for N = 16 and p = 2/3. For what value of n is PN (n) a
maximum? How does the width of the distribution compare to what you found in Problem 3.34?
(b) For what value of p and q do you think the width is a maximum for a given N ?
Example 3.14. Show that the expression (3.72) for $P_N(n)$ satisfies the normalization condition
(3.2).
Solution. The reason that (3.72) is called the binomial distribution is that its form represents a
typical term in the expansion of $(p + q)^N$. By the binomial theorem we have
$$(p + q)^N = \sum_{n=0}^{N} \frac{N!}{n!\,(N - n)!}\, p^n q^{N-n}. \tag{3.74}$$
We use (3.72) and write
$$\sum_{n=0}^{N} P_N(n) = \sum_{n=0}^{N} \frac{N!}{n!\,(N - n)!}\, p^n q^{N-n} = (p + q)^N = 1^N = 1, \tag{3.75}$$
where we have used (3.74) and the fact that p + q = 1.
Calculation of the mean value
We now find an analytical expression for the dependence of $\overline{n}$ on N and p. From the definition
(3.6) and (3.72) we have
$$\overline{n} = \sum_{n=0}^{N} n\, P_N(n) = \sum_{n=0}^{N} n\, \frac{N!}{n!\,(N - n)!}\, p^n q^{N-n}. \tag{3.76}$$
We evaluate the sum in (3.76) by using a technique that is useful in a variety of contexts.⁷ The
technique is based on the fact that
$$p\,\frac{d}{dp}\, p^n = n p^n. \tag{3.77}$$
We use (3.77) to rewrite (3.76) as
$$\overline{n} = \sum_{n=0}^{N} n\, \frac{N!}{n!\,(N - n)!}\, p^n q^{N-n} \tag{3.78}$$
$$= \sum_{n=0}^{N} \frac{N!}{n!\,(N - n)!} \left(p\,\frac{\partial}{\partial p}\, p^n\right) q^{N-n}. \tag{3.79}$$
We have used a partial derivative in (3.79) to remind us that the derivative operator does not act
on q. We interchange the order of summation and differentiation in (3.79) and write
$$\overline{n} = p\,\frac{\partial}{\partial p} \left[\sum_{n=0}^{N} \frac{N!}{n!\,(N - n)!}\, p^n q^{N-n}\right] \tag{3.80}$$
$$= p\,\frac{\partial}{\partial p}\,(p + q)^N, \tag{3.81}$$
where we have temporarily assumed that p and q are independent variables. Because the operator
acts only on p, we have
$$\overline{n} = pN(p + q)^{N-1}. \tag{3.82}$$
The result (3.82) is valid for arbitrary p and q, and hence it is applicable for p + q = 1. Thus our
desired result is
$$\overline{n} = pN. \tag{3.83}$$
The dependence of $\overline{n}$ on N and p should be intuitively clear. Compare the general result (3.83) to
the result (3.66) for N = 3. What is the dependence of $\overline{n}$ on N and p?
⁷ The integral $\int_0^\infty x^n e^{-ax^2}\,dx$ for a > 0 is evaluated in Appendix A using a similar technique.
Calculation of the relative fluctuations
To determine $\overline{(\Delta n)^2}$ we need to know $\overline{n^2}$ (see the relation (3.15)). The average value of n² can be
calculated in a manner similar to that for $\overline{n}$. We write
$$\overline{n^2} = \sum_{n=0}^{N} n^2\, \frac{N!}{n!\,(N - n)!}\, p^n q^{N-n} = \sum_{n=0}^{N} \frac{N!}{n!\,(N - n)!} \left(p\,\frac{\partial}{\partial p}\right)^2 p^n q^{N-n} \tag{3.84}$$
$$= \left(p\,\frac{\partial}{\partial p}\right)^2 \sum_{n=0}^{N} \frac{N!}{n!\,(N - n)!}\, p^n q^{N-n} = \left(p\,\frac{\partial}{\partial p}\right)^2 (p + q)^N = p\,\frac{\partial}{\partial p}\left[pN(p + q)^{N-1}\right] = p\left[N(p + q)^{N-1} + pN(N - 1)(p + q)^{N-2}\right]. \tag{3.85}$$
Because we are interested in the case p + q = 1, we have
$$\overline{n^2} = p\,[N + pN(N - 1)] = p\,[pN^2 + N(1 - p)] = (pN)^2 + p(1 - p)N = \overline{n}^2 + pqN, \tag{3.86}$$
where we have used (3.83) and let q = 1 − p. Hence, from (3.86) we find that the variance of n is
given by
$$\sigma_n^2 = \overline{(\Delta n)^2} = \overline{n^2} - \overline{n}^2 = pqN. \tag{3.87}$$
Compare the calculated values of $\sigma_n$ from (3.87) with your estimates in Problem 3.34 and to the
exact result (3.67) for N = 3.
The relative width of the probability distribution of n is given by (3.83) and (3.87)
$$\frac{\sigma_n}{\overline{n}} = \frac{\sqrt{pqN}}{pN} = \left(\frac{q}{p}\right)^{1/2} \frac{1}{\sqrt{N}}. \tag{3.88}$$
We see that the relative width goes to zero as $1/\sqrt{N}$.
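The results (3.83), (3.87), and (3.88) can be checked by summing over the binomial distribution numerically. The following Python sketch is a minimal check; the function name and the particular values of N and p are our own choices.

```python
from math import comb, sqrt

def binomial_moments(num_spins, p):
    """Mean and variance of n computed directly from P_N(n) in (3.72)."""
    probs = [comb(num_spins, n) * p**n * (1 - p) ** (num_spins - n)
             for n in range(num_spins + 1)]
    mean = sum(n * pn for n, pn in enumerate(probs))
    mean_sq = sum(n * n * pn for n, pn in enumerate(probs))
    return mean, mean_sq - mean**2

if __name__ == "__main__":
    p = 0.5
    for num_spins in (4, 16, 64, 256):
        mean, variance = binomial_moments(num_spins, p)
        # Compare the relative width with (q/p)^(1/2) / sqrt(N) from (3.88).
        print(num_spins, sqrt(variance) / mean, sqrt((1 - p) / p) / sqrt(num_spins))
```

Each time N is increased by a factor of four, the relative width decreases by a factor of two, as expected from the $1/\sqrt{N}$ dependence.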
Frequently we need to evaluate ln N! for N ≫ 1. A simple approximation for ln N! known as
Stirling’s approximation is
$$\ln N! \approx N \ln N - N. \qquad \text{(Stirling's approximation)} \tag{3.89}$$
A more accurate approximation is given by
$$\ln N! \approx N \ln N - N + \frac{1}{2}\ln(2\pi N). \tag{3.90}$$
A simple derivation of Stirling’s approximation is given in Appendix A.
Problem 3.36. (a) What is the largest value of ln N! that you can calculate exactly using a
typical hand calculator? (b) Compare the approximations (3.89) and (3.90) to each other and to
the exact value of ln N! for N = 5, 10, 20, and 50. If necessary, compute ln N! directly using the
relation
$$\ln N! = \sum_{m=1}^{N} \ln m. \tag{3.91}$$
(c) Use the simple form of Stirling’s approximation to show that
$$\frac{d}{dx} \ln x! = \ln x \quad \text{for } x \gg 1. \tag{3.92}$$
Problem 3.37. Consider the binomial distribution $P_N(n)$ for N = 16 and p = q = 1/2. What is
the value of $P_N(n)$ for $n = \sigma_n/2$? What is the value of the product $P_N(n = \overline{n})\,\sigma_n$?
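The accuracy of Stirling's approximation (3.89) and of the improved form (3.90) is easy to explore numerically. The Python sketch below is an illustration only; the function names are ours, and we use the standard library function `math.lgamma`, which returns ln Γ(N + 1) = ln N!, as the exact reference.

```python
import math

def ln_factorial_exact(n):
    """ln N! from the log-gamma function: ln N! = lgamma(N + 1)."""
    return math.lgamma(n + 1)

def stirling_simple(n):
    """Simple form (3.89): N ln N - N."""
    return n * math.log(n) - n

def stirling_improved(n):
    """More accurate form (3.90): N ln N - N + (1/2) ln(2 pi N)."""
    return n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)

if __name__ == "__main__":
    for n in (100, 1000, 10_000):
        print(n, ln_factorial_exact(n), stirling_simple(n), stirling_improved(n))
```

For the values of N used here the simple form is already accurate to better than one percent, and the improved form is much closer still.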
Problem 3.38. A container of volume V contains N molecules of a gas. We assume that the gas
is dilute so that the position of any one molecule is independent of all other molecules. Although
the density will be uniform on the average, there are ﬂuctuations in the density. Divide the volume
V into two parts V1 and V2 , where V = V1 + V2 . (a) What is the probability p that a particular
molecule is in each part? (b) What is the probability that N1 molecules are in V1 and N2 molecules
are in V2 ? (c) What is the average number of molecules in each part? (d) What are the relative
ﬂuctuations of the number of particles in each part?
Problem 3.39. Suppose that a random walker takes n steps to the right and n′ steps to the left
and each step is of equal length a. Denote x as the net displacement of a walker. What is the mean
value $\overline{x}$ for an N-step random walk? What is the analogous expression for the variance $\overline{(\Delta x)^2}$?
Problem 3.40. Monte Carlo simulation. We can gain more insight into the nature of the Bernoulli
distribution by doing a Monte Carlo simulation, that is, by using a computer to “flip coins” and
average over many measurements.⁸ In the context of random walks, we can implement an N-step
walk by the following pseudocode:
   do istep = 1,N
      if (rnd <= p) then
         x = x + 1
      else
         x = x - 1
      end if
   end do
The function rnd generates a random number between zero and one. The quantity x is the net
displacement assuming that the steps are of unit length. It is necessary to save the value of
x after N steps and average over many walkers. Write a simple program or use the applet at
<stp.clarku.edu/simulations/OneDimensionalWalk> to compute $P_N(x)$. First choose N = 4
and p = 1/2 and make a sufficient number of measurements so that the various quantities of
interest are known to a good approximation. Then take N = 100 and describe the qualitative
x-dependence of $P_N(x)$.
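One possible way to turn the pseudocode of Problem 3.40 into a working program is sketched below in Python. The function names, the number of walkers, and the fixed seed are our assumptions; `rng.random()` plays the role of rnd.

```python
import random
from collections import Counter

def walk(num_steps, p, rng):
    """Net displacement x after num_steps unit steps; each step is to the right with probability p."""
    x = 0
    for _ in range(num_steps):
        if rng.random() <= p:
            x += 1
        else:
            x -= 1
    return x

def estimate_distribution(num_steps, p, num_walkers=100_000, seed=0):
    """Estimate P_N(x) as the fraction of walkers that end at displacement x."""
    rng = random.Random(seed)
    counts = Counter(walk(num_steps, p, rng) for _ in range(num_walkers))
    return {x: counts[x] / num_walkers for x in sorted(counts)}

if __name__ == "__main__":
    for x, prob in estimate_distribution(4, 0.5).items():
        print(x, prob)
```

For N = 4 only even values of x occur, and the estimated probabilities can be compared with the binomial distribution (3.72) using x = n − n′ = 2n − N.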
⁸ The name “Monte Carlo” was coined by Nicholas Metropolis in 1949.
[Figure 3.5: Histogram H(x) of the number of times that the displacement of a one-dimensional random
walker is between x and x + ∆x after N = 16 steps. The data was generated by simulating 10⁶
walkers. The length of each step was chosen at random between zero and unity and the bin width
is ∆x = 0.1.]

3.6 Continuous probability distributions

In many cases of physical interest the random variables have continuous values. Examples of
continuous variables are the position of the holes in a dart board, the position and velocity of a
classical particle, and the angle of a compass needle.
For continuous variables, the probability of obtaining a particular value is not meaningful.
For example, consider a one-dimensional random walker who steps at random to the right or to
the left with equal probability, but with step lengths that are chosen at random between zero and
a maximum length a. The continuous nature of the length of each step implies that the position
x of the walker is a continuous variable. Because there are an inﬁnite number of possible x values
in a ﬁnite interval of x, the probability of obtaining any particular value of x is zero. Instead,
we have to reformulate the question and ask for the probability that the position of the walker is
between x and x + ∆x after N steps. If we do a simulation of such a walker, we would record the
number of times, H (x, ∆x), that a walker is in a bin of width ∆x a distance x from the origin,
and plot the histogram H (x, ∆x) as a function of x (see Figure 3.5). If the number of walkers
that is sampled is suﬃciently large, we would ﬁnd that H (x, ∆x) is proportional to the estimated
probability that a walker is in a bin of width ∆x a distance x from the origin after N steps. To
obtain the probability, we divide H(x, ∆x) by the total number of walkers.

In practice, the choice of the bin width is a compromise. If ∆x is too big, the features of
the histogram would be lost. If ∆x is too small, many of the bins would be empty for a given
number of walkers. Hence, our estimate of the number of walkers in each bin would be less
accurate. Because we expect the number to be proportional to the width of the bin, we can write
H(x, ∆x) = p(x)∆x. The quantity p(x) is the probability density. In the limit that ∆x → 0,
H(x, ∆x) becomes a continuous function of x, and we can write the probability that a walker is in
the range between a and b as

$$P(a \text{ to } b) = \int_a^b p(x)\,dx. \tag{3.93}$$

Note that the probability density p(x) is nonnegative and has units of one over the dimension of
x.
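
The histogram procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the program used to generate Figure 3.5; the function names `walker_displacement` and `estimate_density`, the seed, and the number of walkers are assumptions chosen for the example. The counts in each bin are divided by the number of walkers and by the bin width, so that the result estimates the probability density and sums to unity when multiplied by ∆x.

```python
import math
import random

def walker_displacement(n_steps, a=1.0, rng=random):
    """Displacement after n_steps; each step has a random sign and a
    length chosen uniformly between 0 and a (hypothetical helper)."""
    x = 0.0
    for _ in range(n_steps):
        length = rng.uniform(0.0, a)
        x += length if rng.random() < 0.5 else -length
    return x

def estimate_density(n_walkers, n_steps, dx, seed=0):
    """Return {k: estimated p(x)} where bin k covers [k*dx, (k+1)*dx)."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_walkers):
        k = math.floor(walker_displacement(n_steps, rng=rng) / dx)
        counts[k] = counts.get(k, 0) + 1
    # counts / n_walkers estimates the probability of landing in a bin;
    # dividing by dx as well converts it to a probability density.
    return {k: h / (n_walkers * dx) for k, h in counts.items()}

dx = 0.1
density = estimate_density(n_walkers=20000, n_steps=16, dx=dx)
total_probability = sum(p * dx for p in density.values())
print("sum of p(x) dx over all bins:", total_probability)
```

Rerunning with a much smaller ∆x and the same number of walkers leaves many bins empty, which is the compromise discussed in the text.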
The formal properties of the probability density p(x) are easily generalizable from the discrete
case. For example, the normalization condition is given by

$$\int_{-\infty}^{\infty} p(x)\,dx = 1. \tag{3.94}$$

The mean value of the function f(x) in the interval a to b is given by

$$\overline{f} = \int_a^b f(x)\,p(x)\,dx. \tag{3.95}$$

Problem 3.41. The random variable x has the probability density

$$p(x) = \begin{cases} A\,e^{-\lambda x} & \text{if } 0 \le x \le \infty \\ 0 & \text{if } x < 0. \end{cases} \tag{3.96}$$

(a) Determine the normalization constant A in terms of λ. (b) What is the mean value of x?
What is the most probable value of x? (c) What is the mean value of $x^2$? (d) Choose λ = 1 and
determine the probability that a measurement of x yields a value less than 0.3.
Problem 3.42. Consider the probability density function $p(\mathbf{v}) = (a/\pi)^{3/2} e^{-a v^2}$ for the velocity
v of a particle. Each of the three velocity components can range from −∞ to +∞ and a is a
constant. (a) What is the probability that a particle has a velocity between $v_x$ and $v_x + dv_x$, $v_y$
and $v_y + dv_y$, and $v_z$ and $v_z + dv_z$? (b) Show that p(v) is normalized to unity. Use the fact that

$$\int_0^{\infty} e^{-a u^2}\,du = \frac{1}{2}\sqrt{\frac{\pi}{a}}. \tag{3.97}$$

Note that this calculation involves doing three similar integrals that can be evaluated separately.
(c) What is the probability that $v_x \ge 0$, $v_y \ge 0$, $v_z \ge 0$ simultaneously?
Problem 3.43. (a) Find the first four moments of the Gaussian probability density

$$p(x) = (2\pi)^{-1/2} e^{-x^2/2}. \qquad (-\infty < x < \infty) \tag{3.98}$$

Guess the dependence of the kth moment on k for k even. What are the odd moments of p(x)?
(b) Calculate the value of $C_4$, the fourth-order cumulant, defined by

$$C_4 = \overline{x^4} - 4\,\overline{x^3}\,\overline{x} - 3\,\overline{x^2}^{\,2} + 12\,\overline{x^2}\,\overline{x}^{\,2} - 6\,\overline{x}^{\,4}. \tag{3.99}$$

Problem 3.44. Consider the probability density given by

$$p(x) = \begin{cases} (2a)^{-1} & \text{for } |x| \le a \\ 0 & \text{for } |x| > a \end{cases} \tag{3.100}$$

(a) Sketch the dependence of p(x) on x. (b) Find the first four moments of p(x). (c) Calculate the
value of the fourth-order cumulant $C_4$ defined in (3.99). What is $C_4$ for the probability density
in (3.100)?
Problem 3.45. Not all probability densities have a finite variance. Sketch the Lorentz or Cauchy
distribution given by

$$p(x) = \frac{1}{\pi}\,\frac{\gamma}{(x - a)^2 + \gamma^2}. \qquad (-\infty < x < \infty) \tag{3.101}$$

Choose a = 0 and γ = 1 and compare the form of p(x) in (3.101) to the Gaussian distribution given
by (3.98). Give a simple argument for the existence of the first moment of the Lorentz distribution.
Does the second moment exist?

3.7 The Gaussian distribution as a limit of the binomial distribution

In Problem 3.34 we found that for large N, the binomial distribution has a well-defined maximum
at n = pN and can be approximated by a smooth, continuous function even though only integer
values of n are physically possible. We now find the form of this function of n.
The first step is to realize that for N ≫ 1, $P_N(n)$ is a rapidly varying function of n near
n = pN, and for this reason we do not want to approximate $P_N(n)$ directly. However, because
the logarithm of $P_N(n)$ is a slowly varying function (see Problem 3.34), we expect the power
series expansion of $\ln P_N(n)$ to converge. Hence, we expand $\ln P_N(n)$ in a Taylor series about the
value $n = \tilde{n}$ at which $\ln P_N(n)$ reaches its maximum value. We will write p(n) instead of $P_N(n)$
because we will treat n as a continuous variable and hence p(n) is a probability density. We find

$$\ln p(n) = \ln p(n = \tilde{n}) + (n - \tilde{n}) \left.\frac{d \ln p(n)}{dn}\right|_{n=\tilde{n}} + \frac{1}{2}(n - \tilde{n})^2 \left.\frac{d^2 \ln p(n)}{dn^2}\right|_{n=\tilde{n}} + \cdots \tag{3.102}$$

Because we have assumed that the expansion (3.102) is about the maximum $n = \tilde{n}$, the first derivative
$d \ln p(n)/dn|_{n=\tilde{n}}$ must be zero. For the same reason the second derivative $d^2 \ln p(n)/dn^2|_{n=\tilde{n}}$
must be negative. We assume that the higher terms in (3.102) can be neglected and adopt the
notation

$$\ln A = \ln p(n = \tilde{n}), \tag{3.103}$$

and

$$B = -\left.\frac{d^2 \ln p(n)}{dn^2}\right|_{n=\tilde{n}}. \tag{3.104}$$

The approximation (3.102) and the notation in (3.103) and (3.104) allow us to write

$$\ln p(n) \approx \ln A - \frac{1}{2} B (n - \tilde{n})^2, \tag{3.105}$$

or

$$p(n) \approx A\,e^{-\frac{1}{2} B (n - \tilde{n})^2}. \tag{3.106}$$

We next use Stirling’s approximation to evaluate the first two derivatives of ln p(n) and the
value of ln p(n) at its maximum to find the parameters A, B, and $\tilde{n}$. We write

$$\ln p(n) = \ln N! - \ln n! - \ln (N - n)! + n \ln p + (N - n) \ln q. \tag{3.107}$$

It is straightforward to use the relation (3.92) to obtain

$$\frac{d(\ln p)}{dn} = -\ln n + \ln(N - n) + \ln p - \ln q. \tag{3.108}$$

The most probable value of n is found by finding the value of n that satisfies the condition
d ln p/dn = 0. We find

$$\frac{N - \tilde{n}}{\tilde{n}} = \frac{q}{p}, \tag{3.109}$$

or $(N - \tilde{n})p = \tilde{n}q$. If we use the relation p + q = 1, we obtain

$$\tilde{n} = pN. \tag{3.110}$$

Note that $\tilde{n} = \overline{n}$, that is, the value of n for which p(n) is a maximum is also the mean value of n.
The second derivative can be found from (3.108). We have

$$\frac{d^2(\ln p)}{dn^2} = -\frac{1}{n} - \frac{1}{N - n}. \tag{3.111}$$

Hence, the coefficient B defined in (3.104) is given by

$$B = -\frac{d^2 \ln p}{dn^2} = \frac{1}{\tilde{n}} + \frac{1}{N - \tilde{n}} = \frac{1}{Npq}. \tag{3.112}$$

From the relation (3.87) we see that

$$B = \frac{1}{\sigma^2}, \tag{3.113}$$

where $\sigma^2$ is the variance of n.
If we use the simple form of Stirling’s approximation (3.89) to find the normalization constant
A from the relation $\ln A = \ln p(n = \tilde{n})$, we would find that ln A = 0. Instead, we have to use the
more accurate form of Stirling’s approximation (3.90). The result is

$$A = \frac{1}{(2\pi N p q)^{1/2}} = \frac{1}{(2\pi\sigma^2)^{1/2}}. \tag{3.114}$$

Problem 3.46. Derive (3.114) using the more accurate form of Stirling’s approximation (3.90)
with n = pN and N − n = qN.
If we substitute our results for $\tilde{n}$, B, and A into (3.106), we find the standard form for the
Gaussian distribution

$$p(n) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-(n - \overline{n})^2/2\sigma^2}. \qquad \text{(Gaussian probability density)} \tag{3.115}$$

An alternative derivation of the parameters A, B, and $\tilde{n}$ is given in Problem 3.74.
n    P10(n)      Gaussian approximation
0    0.000977    0.001700
1    0.009766    0.010285
2    0.043945    0.041707
3    0.117188    0.113372
4    0.205078    0.206577
5    0.246094    0.252313

Table 3.4: Comparison of the exact values of $P_{10}(n)$ with the Gaussian distribution (3.115) for
p = q = 1/2.
From our derivation we see that (3.115) is valid for large values of N and for values of n near
$\overline{n}$. Even for relatively small values of N, the Gaussian approximation is a good approximation
for most values of n. A comparison of the Gaussian approximation to the binomial distribution is
given in Table 3.4.
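
The comparison in Table 3.4 can be reproduced with a short calculation. The sketch below is an illustration rather than the authors' code; the function names `binomial` and `gaussian` are hypothetical. It evaluates the exact binomial probability and the Gaussian form (3.115) with mean pN and variance Npq.

```python
import math

def binomial(n, N, p):
    """Exact binomial probability P_N(n)."""
    return math.comb(N, n) * p**n * (1 - p)**(N - n)

def gaussian(n, N, p):
    """Gaussian approximation (3.115) with mean pN and variance Npq."""
    mean = p * N
    var = N * p * (1 - p)
    return math.exp(-(n - mean)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

N, p = 10, 0.5
rows = [(n, binomial(n, N, p), gaussian(n, N, p)) for n in range(6)]
for n, exact, approx in rows:
    print(f"{n:2d}  {exact:.6f}  {approx:.6f}")
```

Changing `N` to a larger value shows how the two columns approach each other near the maximum, while the relative discrepancy in the tails remains larger.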
The most important feature of the Gaussian distribution is that its relative width, $\sigma_n/\overline{n}$,
decreases as $N^{-1/2}$. Of course, the binomial distribution shares this feature.

3.8 The central limit theorem or why is thermodynamics possible?

We have discussed how to estimate probabilities empirically by sampling, that is, by making
repeated measurements of the outcome of independent events. Intuitively we believe that if we
perform more and more measurements, the calculated average will approach the exact mean of the
quantity of interest. This idea is called the law of large numbers. However, we can go further and
ﬁnd the form of the probability distribution that a particular measurement diﬀers from the exact
mean. The form of this probability distribution is given by the central limit theorem. We ﬁrst
illustrate this theorem by considering a simple measurement.
Suppose that we wish to estimate the probability of obtaining face 1 in one throw of a die.
The answer of 1/6 means that if we perform N measurements, face 1 will appear approximately N/6
times. What is the meaning of approximately? Let S be the total number of times that face 1
appears in N measurements. We write

$$S = \sum_{i=1}^{N} s_i, \tag{3.116}$$

where

$$s_i = \begin{cases} 1, & \text{if the $i$th throw gives 1} \\ 0, & \text{otherwise.} \end{cases} \tag{3.117}$$

If N is large, then S/N approaches 1/6. How does this ratio approach the limit? We can empirically
answer this question by repeating the measurement M times. (Each measurement of S consists of
N throws of a die.) Because S itself is a random variable, we know that the measured values of
S will not be identical. In Figure 3.6 we show the results of M = 10,000 measurements of S for
N = 100 and N = 800. We see that the approximate form of the distribution of values of S is a
Gaussian. In Problem 3.47 we calculate the absolute and relative width of the distributions.
Problem 3.47. Estimate the absolute width and the relative width of the distributions shown in
Figure 3.6 for N = 100 and N = 800. Does the error of any one measurement of S decrease with
increasing N as expected? How would the plot change if M were increased to M = 10,000?
In Appendix 3C we show that in the limit of large N, the probability density p(S) is given by

$$p(S) = \frac{1}{\sqrt{2\pi\sigma_S^2}}\,e^{-(S - \overline{S})^2/2\sigma_S^2}, \tag{3.118}$$

where

$$\overline{S} = N\overline{s} \tag{3.119}$$
$$\sigma_S^2 = N\sigma^2, \tag{3.120}$$

with $\sigma^2 = \overline{s^2} - \overline{s}^2$. The quantity p(S)∆S is the probability that the value of $\sum_{i=1}^{N} s_i$ is between S
and S + ∆S. Equation (3.118) is equivalent to the central limit theorem. Note that the Gaussian
form in (3.118) holds only for large N and for values of S near its most probable (mean) value. The
latter restriction is the reason that the theorem is called the central limit theorem; the requirement
that N be large is the reason for the term limit.
The central limit theorem is one of the most remarkable results of the theory of probability.
In its simplest form, the theorem states that the sum of a large number of random variables
will approximate a Gaussian distribution. Moreover, the approximation steadily improves as the
number of variables in the sum increases.
For the throw of a die, $\overline{s} = 1/6$, $\overline{s^2} = 1/6$, and $\sigma^2 = \overline{s^2} - \overline{s}^2 = 1/6 - 1/36 = 5/36$. For N throws of a
die, we have $\overline{S} = N/6$ and $\sigma_S^2 = 5N/36$. Hence, we see that in this case the most probable relative
error in any one measurement of S decreases as $\sigma_S/\overline{S} = \sqrt{5/N}$.

Note that if we let S represent the displacement of a walker after N steps, and let $\sigma^2$ equal
the mean square displacement for a single step, then the result (3.118)–(3.120) is equivalent to
our results for random walks in the limit of large N . Or we can let S represent the magnetization
of a system of noninteracting spins and obtain similar results. That is, a random walk and its
equivalents are examples of an additive random process.
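
The die example can be checked by direct sampling. The following sketch is an illustration under stated assumptions (the function names, the seed, and the choice M = 2000 are not from the text): it repeats the measurement of S many times and compares the sample mean and width with N/6 and $\sqrt{5N/36}$.

```python
import math
import random

def measure_S(N, rng):
    """One measurement of S: the number of times face 1 appears in N throws.
    Each throw gives face 1 with probability 1/6."""
    return sum(1 for _ in range(N) if rng.random() < 1 / 6)

def sample_statistics(N, M, seed=1):
    """Mean and standard deviation of M independent measurements of S."""
    rng = random.Random(seed)
    values = [measure_S(N, rng) for _ in range(M)]
    mean = sum(values) / M
    var = sum((s - mean)**2 for s in values) / M
    return mean, math.sqrt(var)

for N in (100, 800):
    mean, sigma = sample_statistics(N, M=2000)
    print(f"N={N}: mean={mean:.2f} (expected {N/6:.2f}), "
          f"sigma={sigma:.2f} (expected {math.sqrt(5*N/36):.2f}), "
          f"relative width={sigma/mean:.3f} (expected {math.sqrt(5/N):.3f})")
```

The printed relative width should shrink by roughly a factor of $\sqrt{8}$ between N = 100 and N = 800, consistent with the $N^{-1/2}$ dependence.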
The central limit theorem shows why the Gaussian distribution is ubiquitous in nature. If a
random process is related to a sum of a large number of microscopic processes, the sum will be
distributed according to the Gaussian distribution independently of the nature of the distribution
of the microscopic processes.

Figure 3.6: The distribution p(S) of the measured values of M = 10,000 different measurements of the
sum S for N = 100 and N = 800 terms in the sum. The quantity S is the number of times that
face 1 appears in N throws of a die. For N = 100, the measured values are $\overline{S} = 16.67$, $\overline{S^2} = 291.96$,
and $\sigma_S = 3.74$. For N = 800, the measured values are $\overline{S} = 133.31$, $\overline{S^2} = 17881.2$, and $\sigma_S = 10.52$.
What are the estimated values of the relative width for each case?

The central limit theorem implies that macroscopic bodies have well-defined macroscopic
properties even though their constituent parts are changing rapidly. For example, in a gas or
liquid, the particle positions and velocities change continuously on a time scale much shorter than a
typical measurement time. For this reason we expect that during a measurement of the pressure of
a gas or a liquid, there are many collisions with the wall and hence the pressure has a well-defined
average. We also expect that the probability that the measured pressure deviates from its average
value is proportional to $N^{-1/2}$, where N is the number of particles. Similarly, the vibrations of
the molecules in a solid have a time scale much smaller than that of macroscopic measurements,
and hence the pressure of a solid also is a well-defined quantity.
Problem 3.48. Use the central limit theorem to find the probability that a one-dimensional
random walker has a displacement between x and x + dx. (There is no need to derive the central
limit theorem.)
Problem 3.49. Write a program to test the applicability of the central limit theorem. For
simplicity, assume that the variable $s_i$ is uniformly distributed between 0 and 1. First compute
the mean and standard deviation of s and compare your numerical results with your analytical
calculation. Then sum N = 10,000 values of $s_i$ to obtain one measurement of S. Compute the
sum for many measurements, say M = 1000. Store in an array H(S) the number of times S is
between S and S + ∆S. Plot your results for H(S) and determine how H(S) depends on N. How
do your results change if M = 10,000? Do your results for the form of H(S) depend strongly on the
number of measurements M?

3.9 The Poisson distribution and should you fly in airplanes?

We now return to the question of whether or not it is safe to fly. If the probability of a plane
crashing is $p = 10^{-5}$, then 1 − p is the probability of surviving a single flight. The probability
of surviving N flights is then $P_N = (1 - p)^N$. For N = 400, $P_N \approx 0.996$, and for $N = 10^5$,
$P_N \approx 0.368$. Thus, our intuition is verified that if we lived eighty years and took 400 flights, we
would have only a small chance of crashing.
This type of reasoning is typical when the probability of an individual event is small, but
there are very many attempts. Suppose we are interested in the probability of the occurrence of n
events out of N attempts such that the probability p of the event for each attempt is very small.
The resulting probability is called the Poisson distribution, a distribution that is important in the
analysis of experimental data. We discuss it here because of its intrinsic interest.
To derive the Poisson distribution, we begin with the binomial distribution:

$$P(n) = \frac{N!}{n!\,(N - n)!}\,p^n (1 - p)^{N-n}. \tag{3.121}$$

(As before, we suppress the N dependence of P.) As in Section 3.7, we will approximate ln P(n)
rather than P(n) directly. We first use Stirling’s approximation to write

\begin{align}
\ln \frac{N!}{(N - n)!} &= \ln N! - \ln (N - n)! \notag \\
&\approx N \ln N - (N - n) \ln (N - n) \tag{3.122} \\
&\approx N \ln N - (N - n) \ln N \notag \\
&= N \ln N - N \ln N + n \ln N \notag \\
&= n \ln N. \tag{3.123}
\end{align}

From (3.123) we obtain

$$\frac{N!}{(N - n)!} \approx e^{n \ln N} = N^n. \tag{3.124}$$

For p ≪ 1, we have $\ln(1 - p) \approx -p$, $e^{\ln(1-p)} = 1 - p \approx e^{-p}$, and $(1 - p)^{N-n} \approx e^{-p(N-n)} \approx e^{-pN}$.
If we use the above approximations, we find

$$P(n) \approx \frac{N^n}{n!}\,p^n e^{-pN} = \frac{(Np)^n}{n!}\,e^{-pN}, \tag{3.125}$$

or

$$P(n) = \frac{\overline{n}^{\,n}}{n!}\,e^{-\overline{n}}, \qquad \text{(Poisson distribution)} \tag{3.126}$$

where

$$\overline{n} = pN. \tag{3.127}$$

The form (3.126) is the Poisson distribution.
Let us apply the Poisson distribution to the airplane survival problem. We want to know the
probability of never crashing, that is, P(n = 0). The mean $\overline{n} = pN$ equals $10^{-5} \times 400 = 0.004$ for
N = 400 flights and $\overline{n} = 1$ for $N = 10^5$ flights. Thus, the survival probability is $P(0) = e^{-\overline{n}} \approx
0.996$ for N = 400 and $P(0) \approx 0.368$ for $N = 10^5$ as we calculated previously. We see that if we
fly 100,000 times, we have a much larger probability of dying in a plane crash.
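
As a quick numerical check of these survival probabilities, the exact result $(1 - p)^N$ can be compared with the Poisson value $e^{-pN}$. The function name `survival` in this sketch is hypothetical.

```python
import math

def survival(N, p=1e-5):
    """Return (exact, Poisson) probabilities of no crash in N flights."""
    exact = (1 - p)**N          # binomial result with n = 0
    poisson = math.exp(-p * N)  # P(0) from the Poisson distribution
    return exact, poisson

for N in (400, 100_000):
    exact, poisson = survival(N)
    print(f"N={N}: exact={exact:.6f}, Poisson={poisson:.6f}")
```

For p this small the two expressions agree to several significant figures, which is why the Poisson form is a convenient replacement for the binomial distribution here.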
Problem 3.50. Show that the Poisson distribution is properly normalized, and calculate the mean
and variance of n. Because P(n) for n > N is negligibly small, you can sum P(n) from n = 0
to n = ∞ even though the maximum value of n is N. Plot the Poisson distribution P(n) as a
function of n for p = 0.01 and N = 100.

3.10 *Traffic flow and the exponential distribution

The Poisson distribution is closely related to the exponential distribution as we will see in the
following. Consider a sequence of similar random events and let t1 , t2 , . . . be the time at which
each successive event occurs. Examples of such sequences are the successive times when a phone
call is received and the times when a Geiger counter registers a decay of a radioactive nucleus.
Suppose that we determine the sequence over a very long time T that is much greater than any
of the intervals ti − ti−1 . We also suppose that the average number of events is λ per unit time so
that in a time interval t, the mean number of events is λt.
Assume that the events occur at random and are independent of each other. Given λ, the
mean number of events per unit time, we wish to ﬁnd the probability distribution w(t) of the
interval t between the events. We know that if an event occurred at time t = 0, the probability
that another event occurs within the interval [0, t] is

$$\int_0^t w(t')\,dt', \tag{3.128}$$

and the probability that no event occurs in the interval t is

$$1 - \int_0^t w(t')\,dt'. \tag{3.129}$$

Thus the probability that the duration of the interval between the two events is between t and
t + ∆t is given by

\begin{align}
w(t)\Delta t &= \text{probability that no event occurs in the interval } [0, t] \notag \\
&\quad \times \text{probability that an event occurs in the interval } [t, t + \Delta t] \notag \\
&= \Big[1 - \int_0^t w(t')\,dt'\Big] \lambda \Delta t. \tag{3.130}
\end{align}

If we cancel ∆t from each side of (3.130) and differentiate both sides with respect to t, we find

$$\frac{dw}{dt} = -\lambda w,$$

so that

$$w(t) = A e^{-\lambda t}. \tag{3.131}$$

The constant of integration A is determined from the normalization condition:

$$\int_0^{\infty} w(t)\,dt = 1 = A \int_0^{\infty} e^{-\lambda t}\,dt = A/\lambda. \tag{3.132}$$

Hence, w(t) is the exponential function

$$w(t) = \lambda e^{-\lambda t}. \tag{3.133}$$

N       frequency
0       1
1       7
2       14
3       25
4       31
5       26
6       27
7       14
8       8
9       3
10      4
11      3
12      1
13      0
14      1
> 15    0

Table 3.5: Observed distribution of vehicles passing a marker on a highway in thirty second intervals. The data was taken from page 98 of Montroll and Badger.
The above results for the exponential distribution lead naturally to the Poisson distribution.
Let us divide a long time interval T into n smaller intervals t = T /n. What is the probability that
0, 1, 2, 3, . . . events occur in the time interval t, given λ, the mean number of events per unit
time? We will show that the probability that n events occur in the time interval t is given by the
Poisson distribution:

$$P_n(t) = \frac{(\lambda t)^n}{n!}\,e^{-\lambda t}. \tag{3.134}$$

We first consider the case n = 0. If n = 0, the probability that no event occurs in the interval t is
(see (3.130))

$$P_{n=0}(t) = 1 - \lambda \int_0^t e^{-\lambda t'}\,dt' = e^{-\lambda t}. \tag{3.135}$$

For the case n = 1, there is exactly one event in the time interval t. This event must occur at
some time t′, which may occur with equal probability in the interval [0, t]. Because no event can
occur in the interval [t′, t], we have

$$P_{n=1}(t) = \int_0^t \lambda e^{-\lambda t'} e^{-\lambda (t - t')}\,dt', \tag{3.136}$$

where we have used (3.135) with t → (t − t′). Hence,

$$P_{n=1}(t) = \int_0^t \lambda e^{-\lambda t}\,dt' = (\lambda t) e^{-\lambda t}. \tag{3.137}$$

In general, if n events are to occur in the interval [0, t], the first must occur at some time t′
and exactly (n − 1) must occur in the time (t − t′). Hence,

$$P_n(t) = \int_0^t \lambda e^{-\lambda t'} P_{n-1}(t - t')\,dt'. \tag{3.138}$$

Equation (3.138) is a recurrence formula that can be used to derive (3.134) by induction. It is easy
to see that (3.134) agrees with (3.135) and (3.137) for n = 0 and 1. As is usual when solving a recursion formula by
induction, we assume that (3.134) is correct for (n − 1). We substitute this result into (3.138) and
find

$$P_n(t) = \lambda^n e^{-\lambda t} \int_0^t \frac{(t - t')^{n-1}}{(n - 1)!}\,dt' = \frac{(\lambda t)^n}{n!}\,e^{-\lambda t}. \tag{3.139}$$

An application of the Poisson distribution is given in Problem 3.51.
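
The connection between exponentially distributed intervals and Poisson-distributed counts can also be checked by simulation. The sketch below is only an illustration: the rate, the window length, the seed, and the function names are assumptions made for the example. It draws successive intervals from $w(t) = \lambda e^{-\lambda t}$, counts how many events fall in a window of length t, and compares the observed frequencies with (3.134).

```python
import math
import random

def count_events(rate, t, rng):
    """Number of events in [0, t] when the gaps between events are drawn
    from the exponential distribution w(t) = rate * exp(-rate * t)."""
    n = 0
    clock = rng.expovariate(rate)
    while clock <= t:
        n += 1
        clock += rng.expovariate(rate)
    return n

def poisson(n, rate, t):
    """Poisson probability (3.134) of n events in a time t."""
    return (rate * t)**n * math.exp(-rate * t) / math.factorial(n)

rng = random.Random(2)
rate, t, trials = 2.0, 1.5, 50000
counts = [count_events(rate, t, rng) for _ in range(trials)]
for n in range(6):
    print(n, counts.count(n) / trials, round(poisson(n, rate, t), 4))
```

The same comparison, with the observed frequencies taken from Table 3.5 instead of simulated counts, is one way to approach Problem 3.51.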
Problem 3.51. In Table 3.5 we show the number of vehicles passing a marker during a thirty
second interval. The observations were made on a single lane of a six lane divided highway. Assume
that the traﬃc density is so low that passing occurs easily and no platoons of cars develop. Is the
distribution of the number of vehicles consistent with the Poisson distribution? If so, what is the
value of the parameter λ?
As the traﬃc density increases, the ﬂow reaches a regime where the vehicles are very close to
one another so that they are no longer mutually independent. Make arguments for the form of the
probability distribution of the number of vehicles passing a given point in this regime.

3.11 *Are all probability distributions Gaussian?

We have discussed the properties of random additive processes and found that the probability
distribution for the sum is a Gaussian. As an example of such a process, we discussed a one-dimensional random walk on a lattice for which the displacement x is the sum of N random steps.
We now discuss random multiplicative processes. Examples of such processes include the
distributions of incomes, rainfall, and fragment sizes in rock crushing processes. Consider the
latter for which we begin with a rock of size w. We strike the rock with a hammer and generate
two fragments whose sizes are pw and qw, where q = 1 − p. In the next step the possible sizes
of the fragments are $p^2 w$, $pqw$, $qpw$, and $q^2 w$. What is the distribution of the fragments after N
blows of the hammer?
To answer this question, consider a binary sequence in which the numbers $x_1$ and $x_2$ appear
independently with probabilities p and q respectively. If there are N elements in the product Π, we
can ask what is $\overline{\Pi}$, the mean value of Π. To compute $\overline{\Pi}$, we define P(n) as the probability that the
product of N independent factors of $x_1$ and $x_2$ has the value $x_1^n x_2^{N-n}$. This probability is given
by the number of sequences where $x_1$ appears n times multiplied by the probability of choosing a
specific sequence with $x_1$ appearing n times:

$$P(n) = \frac{N!}{n!\,(N - n)!}\,p^n q^{N-n}. \tag{3.140}$$

The mean value of the product is given by

\begin{align}
\overline{\Pi} &= \sum_{n=0}^{N} P(n)\,x_1^n x_2^{N-n} \tag{3.141} \\
&= (p x_1 + q x_2)^N. \tag{3.142}
\end{align}

The most probable event is one in which the product contains Np factors of $x_1$ and Nq factors of
$x_2$. Hence, the most probable value of the product is

$$\tilde{\Pi} = (x_1^p x_2^q)^N. \tag{3.143}$$

We have found that the average value of the sum of random variables is a good approximation
to the most probable value of the sum. Let us see if there is a similar relation for a random
multiplicative process. We first consider $x_1 = 2$, $x_2 = 1/2$, and p = q = 1/2. Then $\overline{\Pi} =
[(1/2) \times 2 + (1/2) \times (1/2)]^N = (5/4)^N = e^{N \ln 5/4}$. In contrast $\tilde{\Pi} = [2^{1/2} \times (1/2)^{1/2}]^N = 1$.
The reason for the large discrepancy between $\overline{\Pi}$ and $\tilde{\Pi}$ is the relatively important role played
by rare events. For example, a sequence of N factors of $x_1 = 2$ occurs with a very small probability,
but the value of this product is very large in comparison to the most probable value. Hence, this
extreme event makes a finite contribution to $\overline{\Pi}$ and a dominant contribution to the higher moments
$\overline{\Pi^m}$.
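
The role of the extreme event can be made concrete by evaluating the sum (3.141) exactly. The sketch below is illustrative; the function name `product_statistics` and its return values are assumptions for the example. It returns the mean of the product, the most probable value (3.143), and the fraction of the mean that comes from the single sequence consisting of N factors of $x_1$.

```python
import math

def product_statistics(N, x1=2.0, x2=0.5, p=0.5):
    """Return (mean, most probable value, share of the mean due to the
    sequence with N factors of x1) for the random multiplicative process."""
    q = 1 - p
    # Exact mean from the binomial sum (3.141).
    mean = sum(math.comb(N, n) * p**n * q**(N - n) * x1**n * x2**(N - n)
               for n in range(N + 1))
    most_probable = (x1**p * x2**q)**N
    # The all-x1 sequence has probability p**N and value x1**N.
    extreme_share = (p * x1)**N / mean
    return mean, most_probable, extreme_share

for N in (4, 40):
    mean, most_probable, share = product_statistics(N)
    print(f"N={N}: mean={mean:.4g}, most probable={most_probable:.4g}, "
          f"share of mean from the extreme sequence={share:.3g}")
```

For N = 4 the one extreme sequence, which occurs with probability 1/16, already accounts for about 41% of the mean.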
∗Problem 3.52. (a) Confirm the above general results for N = 4 by showing explicitly all the
possible values of the product. (b) Consider the case $x_1 = 2$, $x_2 = 1/2$, p = 1/3, and q = 2/3, and
calculate $\overline{\Pi}$ and $\tilde{\Pi}$.

∗Problem 3.53. (a) Show that $\overline{\Pi^m}$ reduces to $(p x_1^m)^N$ as m → ∞. This result implies that
for m ≫ 1, the mth moment is determined solely by the most extreme event. (b) Based on the
Gaussian approximation for the probability of a random additive process, what is a reasonable
guess for the continuum approximation to the probability of a random multiplicative process?
Such a distribution is called the lognormal distribution. Discuss why or why not you expect the
lognormal distribution to be a good approximation for N ≫ 1. (c) More insight can be gained by
running the applet at <stp.clarku.edu/simulations/product> which simulates the distribution
of values of the product $x_1^n x_2^{N-n}$. Choose $x_1 = 2$, $x_2 = 1/2$, and p = q = 1/2 for which we have
already calculated the analytical results for $\overline{\Pi}$ and $\tilde{\Pi}$. First choose N = 4 and estimate $\overline{\Pi}$ and
$\tilde{\Pi}$. Do your estimated values converge more or less uniformly to the exact values as the number
of measurements becomes large? Do a similar simulation for N = 40. Compare your results with
a similar simulation of a random walk and discuss the importance of extreme events for random
multiplicative processes. An excellent discussion is given by Redner (see references).

Vocabulary
sample space, events, outcome
uncertainty, principle of least bias or maximum uncertainty
probability distribution, probability density
mean value, moments, variance, standard deviation
conditional probability, Bayes’ theorem
binomial distribution, Gaussian distribution, Poisson distribution
random walk, random additive processes, central limit theorem
Stirling’s approximation
Monte Carlo sampling
Notation
probability distribution P(i), mean value $\overline{f(x)}$, variance $\overline{\Delta x^2}$, standard deviation σ
conditional probability P(A|B), probability density p(x)

Appendix 3A: The uncertainty for unequal probabilities

Consider a loaded die for which the probabilities $P_j$ are not equal. We wish to motivate the form
(3.29) for S. Imagine that we roll the die a large number of times N. Then each outcome would
occur $N_j = N P_j$ times and there would be $N_1 = N P_1$ outcomes of face 1, $N_2 = N P_2$ outcomes of face
2, . . . These outcomes could occur in many different orders. Thus the original uncertainty about
the outcome of one roll of a die is converted into an uncertainty about order. Because all the
possible orders that can occur in an experiment of N rolls are equally likely, we can use (3.28) for
the associated uncertainty $S_N$:

$$S_N = \ln \Omega = \ln \frac{N!}{\prod_j N_j!}. \tag{3.144}$$

The argument of the logarithm on the right-hand side of (3.144) equals the total number of possible sequences.
To understand the form (3.144) suppose that we know that if we toss a coin four times, we
will obtain 2 heads and 2 tails. What we don’t know is the sequence. In Table 3.6 we show the six
possible sequences. It is easy to see that this number is given by

$$M = \frac{N!}{\prod_j N_j!} = \frac{4!}{2!\,2!} = 6. \tag{3.145}$$

Now that we understand the form of $S_N$ in (3.144), we can find the desired form of S. The
uncertainty $S_N$ in (3.144) is the uncertainty associated with all N rolls. The uncertainty associated
with one roll is

$$S = \lim_{N \to \infty} \frac{1}{N} S_N = \lim_{N \to \infty} \frac{1}{N} \ln \frac{N!}{\prod_j N_j!} = \lim_{N \to \infty} \frac{1}{N} \Big[\ln N! - \sum_j \ln N_j!\Big]. \tag{3.146}$$

H  H  T  T
H  T  H  T
H  T  T  H
T  T  H  H
T  H  T  H
T  H  H  T

Table 3.6: Possible sequences of tossing a coin four times such that two heads and two tails are
obtained.

We can reduce (3.146) to a simpler form by using Stirling’s approximation, ln N! ≈ N ln N − N
for large N and substituting $N_j = N P_j$:

\begin{align}
S &= \lim_{N \to \infty} \frac{1}{N} \Big[N \ln N - N - \sum_j (N P_j) \ln (N P_j) + \sum_j (N P_j)\Big] \tag{3.147} \\
&= \lim_{N \to \infty} \frac{1}{N} \Big[N \ln N - N - \ln N \sum_j (N P_j) - N \sum_j P_j \ln P_j + N \sum_j P_j\Big] \tag{3.148} \\
&= \lim_{N \to \infty} \frac{1}{N} \Big[N \ln N - N - N \ln N - N \sum_j P_j \ln P_j + N\Big] \tag{3.149} \\
&= -\sum_j P_j \ln P_j. \tag{3.150}
\end{align}

Appendix 3B: Method of undetermined multipliers

Suppose that we want to maximize the function $f(x, y) = xy^2$ subject to the constraint that
$x^2 + y^2 = 1$. One way would be to substitute $y^2 = 1 - x^2$ and maximize $f(x) = x(1 - x^2)$.
However, this approach works only if f can be reduced to a function of one variable. We
first consider this simple case as a way of introducing the general method of undetermined
multipliers.
We wish to maximize f(x, y) subject to the constraint that $g(x, y) = x^2 + y^2 - 1 = 0$. In the
method of undetermined multipliers, this problem can be reduced to solving the equation

$$df - \lambda\,dg = 0, \tag{3.151}$$

where $df = y^2\,dx + 2xy\,dy = 0$ at the maximum of f and $dg = 2x\,dx + 2y\,dy = 0$. If we substitute
df and dg in (3.151), we have

$$(y^2 - 2\lambda x)\,dx + 2(xy - \lambda y)\,dy = 0. \tag{3.152}$$

We can choose $\lambda = y^2/2x$ so that the first term is zero. Because this term is zero, the second term
must also be zero; that is, $x = \lambda = y^2/2x$, so $x = \pm y/\sqrt{2}$. Hence, from the constraint g(x, y) = 0,
we obtain $x = 1/\sqrt{3}$, $y^2 = 2/3$, and $\lambda = 1/\sqrt{3}$ at the maximum.

In general, we wish to maximize the function $f(x_1, x_2, \ldots, x_N)$ subject to the constraints
$g_j(x_1, x_2, \ldots, x_N) = 0$ where j = 1, 2, . . . , M with M < N. The maximum of f is given by

$$df = \sum_{i=1}^{N} \frac{\partial f}{\partial x_i}\,dx_i = 0, \tag{3.153}$$

and the constraints can be expressed as

$$dg_j = \sum_{i=1}^{N} \frac{\partial g_j}{\partial x_i}\,dx_i = 0. \tag{3.154}$$

As in our example, we can combine (3.153) and (3.154) and write $df - \sum_{j=1}^{M} \lambda_j\,dg_j = 0$ or

$$\sum_{i=1}^{N} \Big[\frac{\partial f}{\partial x_i} - \sum_{j=1}^{M} \lambda_j \frac{\partial g_j}{\partial x_i}\Big]\,dx_i = 0. \tag{3.155}$$

We are free to choose all M values of $\lambda_j$ such that the first M terms in the square brackets are
zero. For the remaining N − M terms, the $dx_i$ can be independently varied because the constraints
have been satisfied. Hence, the remaining terms in square brackets must be independently zero
and we are left with N − M equations of the form

$$\frac{\partial f}{\partial x_i} - \sum_{j=1}^{M} \lambda_j \frac{\partial g_j}{\partial x_i} = 0. \tag{3.156}$$

In Example 3.11 we were able to obtain the probabilities by reducing the uncertainty S to
a function of a single variable P1 and then maximizing S (P1 ). We now consider a more general
problem where there are more outcomes, the case of a loaded die for which there are six outcomes.
Suppose that we know that the average number of points on the face of a die is f. Then we wish
to determine $P_1, P_2, \ldots, P_6$ subject to the constraints

$$\sum_{j=1}^{6} P_j = 1, \tag{3.157}$$
$$\sum_{j=1}^{6} j P_j = f. \tag{3.158}$$

For a perfect die f = 3.5. Equation (3.156) becomes

$$\sum_{j=1}^{6} \big[(1 + \ln P_j) + \alpha + \beta j\big]\,dP_j = 0, \tag{3.159}$$

where we have used $dS = -\sum_{j=1}^{6} d(P_j \ln P_j) = -\sum_{j=1}^{6} (1 + \ln P_j)\,dP_j$; α and β are the undetermined (Lagrange) multipliers. We choose α and β so that the first two terms in the brackets (with
j = 1 and j = 2) are independently zero. We write

$$\alpha = \ln P_2 - 2 \ln P_1 - 1 \tag{3.160a}$$
$$\beta = \ln P_1 - \ln P_2. \tag{3.160b}$$

We can solve (3.160b) for $\ln P_2 = \ln P_1 - \beta$ and use (3.160a) to find $\ln P_1 = -1 - \alpha - \beta$ and use
this result to write $\ln P_2 = -1 - \alpha - 2\beta$. We can independently vary $dP_3, \ldots, dP_6$ because the two
constraints are satisfied by the values of $P_1$ and $P_2$. We let

$$\ln P_j = -1 - \alpha - j\beta, \tag{3.161}$$

or

$$P_j = e^{-1-\alpha} e^{-\beta j}. \tag{3.162}$$

We can eliminate the constant α by the normalization condition (3.157):

$$P_j = \frac{e^{-\beta j}}{\sum_j e^{-\beta j}}. \tag{3.163}$$

The constant β is determined by the constraint (3.38):

$$f = \frac{e^{-\beta} + 2 e^{-2\beta} + 3 e^{-3\beta} + 4 e^{-4\beta} + 5 e^{-5\beta} + 6 e^{-6\beta}}{e^{-\beta} + e^{-2\beta} + e^{-3\beta} + e^{-4\beta} + e^{-5\beta} + e^{-6\beta}}. \tag{3.164}$$

In general, (3.164) must be solved numerically.
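
One simple way to solve (3.164) numerically is bisection, because the right-hand side decreases monotonically as β increases. The sketch below is one possible approach and is not taken from the text; the function names `mean_face` and `solve_beta`, the search bracket, and the tolerance are assumptions.

```python
import math

def mean_face(beta):
    """Right-hand side of (3.164): mean number of points for a given beta."""
    weights = [math.exp(-beta * j) for j in range(1, 7)]
    return sum(j * w for j, w in zip(range(1, 7), weights)) / sum(weights)

def solve_beta(f, lo=-50.0, hi=50.0, tol=1e-10):
    """Find beta such that mean_face(beta) = f by bisection.
    Assumes 1 < f < 6 so that a root lies inside the bracket [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_face(mid) > f:
            lo = mid   # mean too large, so beta must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

beta = solve_beta(4.0)
norm = sum(math.exp(-beta * j) for j in range(1, 7))
probabilities = [math.exp(-beta * j) / norm for j in range(1, 7)]
print("beta =", round(beta, 4))
print("P_j  =", [round(P, 4) for P in probabilities])
```

A negative β weights the faces with more points more heavily, which is what is needed to raise the mean above 3.5.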
Problem 3.54. Show that the solution to (3.164) is β = 0 for f = 7/2, β = +∞ for f = 1,
β = −∞ for f = 6, and β = −0.1746 for f = 4.

Appendix 3C: Derivation of the central limit theorem
To discuss the derivation of the central limit theorem, it is convenient to introduce the characteristic
function φ(k ) of the probability density p(x). The main utility of the characteristic function is that
it simpliﬁes the analysis of the sums of independent random variables. We deﬁne φ(k ) as the Fourier
transform of p(x):

$$\phi(k) = \overline{e^{ikx}} = \int_{-\infty}^{\infty} dx\,e^{ikx} p(x). \tag{3.165}$$

Because p(x) is normalized, it follows that φ(k = 0) = 1. The main property of the Fourier
transform that we need is that if φ(k) is known, we can find p(x) by calculating the inverse Fourier
transform:

$$p(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\,e^{-ikx} \phi(k). \tag{3.166}$$

Problem 3.55. Calculate the characteristic function of the Gaussian probability density.
One useful property of φ(k ) is that its power series expansion yields the moments of p(x):
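
\begin{align}
\phi(k) &= \sum_{n=0}^{\infty} \frac{k^n}{n!} \left.\frac{d^n \phi(k)}{dk^n}\right|_{k=0}, \tag{3.167} \\
&= \overline{e^{ikx}} = \sum_{n=0}^{\infty} \frac{(ik)^n\,\overline{x^n}}{n!}. \tag{3.168}
\end{align}

By comparing coefficients of $k^n$ in (3.167) and (3.168), we see that

$$\overline{x} = -i \left.\frac{d\phi}{dk}\right|_{k=0}. \tag{3.169}$$

In Problem 3.56 we show that

$$\overline{x^2} - \overline{x}^2 = -\left.\frac{d^2}{dk^2} \ln \phi(k)\right|_{k=0} \tag{3.170}$$

and that certain convenient combinations of the moments are related to the power series expansion
of the logarithm of the characteristic function.
Problem 3.56. The characteristic function generates the cumulants $C_m$ defined by

$$\ln \phi(k) = \sum_{m=1}^{\infty} \frac{(ik)^m}{m!}\,C_m. \tag{3.171}$$

Show that the cumulants are combinations of the moments of x and are given by

$$C_1 = \overline{x} \tag{3.172a}$$
$$C_2 = \sigma^2 = \overline{x^2} - \overline{x}^2 \tag{3.172b}$$
$$C_3 = \overline{x^3} - 3\,\overline{x^2}\,\overline{x} + 2\,\overline{x}^{\,3} \tag{3.172c}$$
$$C_4 = \overline{x^4} - 4\,\overline{x^3}\,\overline{x} - 3\,\overline{x^2}^{\,2} + 12\,\overline{x^2}\,\overline{x}^{\,2} - 6\,\overline{x}^{\,4}. \tag{3.172d}$$

Now let us consider the properties of the characteristic function for the sums of independent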
variables. For example, let p1 (x) be the probability density for the weight x of adult males and let
p2 (y ) be the probability density for the weight of adult females. If we assume that people marry
one another independently of weight, what is the probability density p(z ) for the weight z of an
adult couple? We have that
$$z = x + y. \tag{3.173}$$
How do the probability densities combine? The answer is
$$p(z) = \int\!\!\int dx\, dy\, p_1(x)\, p_2(y)\, \delta(z - x - y). \tag{3.174}$$
The integral in (3.174) represents all the possible ways of obtaining the combined weight z as
determined by the probability density p1 (x)p2 (y ) for the combination of x and y that sums to
z . The form (3.174) of the integrand is known as a convolution. An important property of a
convolution is that its Fourier transform is a simple product. We have
$$\phi_z(k) = \int dz\, e^{ikz}\, p(z) \tag{3.175}$$
$$= \int dz \int\!\!\int dx\, dy\, e^{ikz}\, p_1(x)\, p_2(y)\, \delta(z - x - y) = \int dx\, e^{ikx} p_1(x) \int dy\, e^{iky} p_2(y) = \phi_1(k)\, \phi_2(k). \tag{3.176}$$
It is straightforward to generalize this result to a sum of N random variables. We write
$$z = x_1 + x_2 + \ldots + x_N. \tag{3.177}$$
Then
$$\phi_z(k) = \prod_{i=1}^{N} \phi_i(k). \tag{3.178}$$
That is, for independent variables the characteristic function of the sum is the product of the
individual characteristic functions. If we take the logarithm of both sides of (3.178), we obtain
$$\ln \phi_z(k) = \sum_{i=1}^{N} \ln \phi_i(k). \tag{3.179}$$
Each side of (3.179) can be expanded as a power series and compared order by order in powers
of ik . The result is that when random variables are added, their associated cumulants also add.
That is, the nth order cumulants satisfy the relation:
$$C_n^{z} = C_n^{1} + C_n^{2} + \ldots + C_n^{N}. \tag{3.180}$$
We conclude that if the random variables $x_i$ are independent (uncorrelated), their cumulants,
and in particular their variances, add.
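The additivity of cumulants can be checked exactly for discrete variables, for which the convolution (3.174) becomes a double sum. The following Python sketch is only an illustration under stated assumptions: the two distributions (a fair die and a biased coin) and the helper names `convolve`, `moment`, and `cumulants` are hypothetical choices, not taken from the text. It compares the first three cumulants, computed from (3.172a) to (3.172c), before and after the variables are added.

```python
from fractions import Fraction

def convolve(p, q):
    """Distribution of x + y for independent x ~ p and y ~ q.

    p and q map each value to its probability; this is the discrete
    analog of the convolution (3.174).
    """
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0) + px * qy
    return out

def moment(p, n):
    """n-th moment of the distribution p."""
    return sum(prob * x**n for x, prob in p.items())

def cumulants(p):
    """First three cumulants from (3.172a)-(3.172c)."""
    m1, m2, m3 = moment(p, 1), moment(p, 2), moment(p, 3)
    return (m1, m2 - m1**2, m3 - 3 * m2 * m1 + 2 * m1**3)

# Hypothetical example distributions (assumptions for illustration).
die = {j: Fraction(1, 6) for j in range(1, 7)}
coin = {0: Fraction(1, 4), 1: Fraction(3, 4)}   # biased, so that C3 is nonzero

total = convolve(die, coin)
print(cumulants(total))
print(tuple(a + b for a, b in zip(cumulants(die), cumulants(coin))))
```

Exact rational arithmetic is used so that the two printed tuples can be compared without rounding error.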
If we denote the mean and standard deviation of the weight of an adult male as $\overline{x}$ and σ,
respectively, then from (3.172a) and (3.180) we find that the mean of the total weight of N adult
males is given by $N\overline{x}$. Similarly, from (3.172b) we see that the standard deviation $\sigma_N$ of the total
weight of N adult males is given by $\sigma_N^2 = N\sigma^2$, or $\sigma_N = \sqrt{N}\,\sigma$. Hence, we find the now familiar
result that the mean of the sum of N random variables scales as N while the standard deviation
scales as $\sqrt{N}$.
We are now in a position to derive the central limit theorem. Let x1, x2, . . ., xN be N mutually
independent variables. For simplicity, we assume that each variable has the same probability
density p(x). The only condition is that the variance $\sigma_x^2$ of the probability density p(x) must be
finite. For simplicity, we make the additional assumption that $\overline{x} = 0$, a condition that always can
be satisfied by measuring x from its mean. The central limit theorem states that the sum
$S = x_1 + x_2 + \ldots + x_N$ has the probability density
$$p(S) = \frac{1}{\sqrt{2\pi N \sigma_x^2}}\, e^{-S^2/2N\sigma_x^2}. \tag{3.181}$$
From (3.172b) we see that $\overline{S^2} = N\sigma_x^2$, and hence the variance of S grows linearly with N. However,
the distribution of the values of the arithmetic mean S/N becomes narrower with increasing N:
$$\overline{\left(\frac{x_1 + x_2 + \ldots + x_N}{N}\right)^2} = \frac{N\sigma_x^2}{N^2} = \frac{\sigma_x^2}{N}. \tag{3.182}$$
From (3.182) we see that it is useful to define a scaled sum:
$$z = \frac{1}{\sqrt{N}}\,(x_1 + x_2 + \ldots + x_N), \tag{3.183}$$
and to write the central limit theorem in the form
$$p(z) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-z^2/2\sigma^2}. \tag{3.184}$$
To obtain the result (3.184), we write the characteristic function of z as
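$$\phi_z(k) = \int dz\, e^{ikz} \int dx_1\, dx_2 \cdots dx_N\, \delta\!\left(z - \frac{x_1 + x_2 + \ldots + x_N}{N^{1/2}}\right) p(x_1)\, p(x_2) \ldots p(x_N)$$
$$= \int dx_1\, dx_2 \ldots dx_N\, e^{ik(x_1 + x_2 + \ldots + x_N)/N^{1/2}}\, p(x_1)\, p(x_2) \ldots p(x_N)$$
$$= \left[\phi\!\left(\frac{k}{N^{1/2}}\right)\right]^N. \tag{3.185}$$
We next take the logarithm of both sides of (3.185) and expand the right-hand side in powers of
k to find
$$\ln \phi_z(k) = \sum_{m=2}^{\infty} \frac{(ik)^m}{m!}\, N^{1-m/2}\, C_m. \tag{3.186}$$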
The m = 1 term does not contribute in (3.186) because we have assumed that $\overline{x} = 0$. More
importantly, note that as N → ∞, the higher-order terms are suppressed so that
$$\ln \phi_z(k) \to -\frac{k^2}{2}\, C_2, \tag{3.187}$$
or
$$\phi_z(k) \to e^{-k^2\sigma^2/2} + \ldots \tag{3.188}$$
Because the inverse Fourier transform of a Gaussian is also a Gaussian, we find that
$$p(z) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-z^2/2\sigma^2}. \tag{3.189}$$
The leading correction to φ(k) in (3.189) gives rise to a term of order $N^{-1/2}$, and therefore does
not contribute in the limit N → ∞.
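The approach to the Gaussian form can also be observed numerically. The sketch below is an illustration only: the choice of a uniform density on [−1/2, 1/2] (for which σ² = 1/12), the sample size, the seed, and the function names `scaled_sum` and `sample_stats` are all assumptions. It samples the scaled sum (3.183) and reports the sample variance and the fraction of samples within one standard deviation of zero. For a Gaussian that fraction is about 0.683, whereas for a single uniform variable it is 1/√3 ≈ 0.577.

```python
import math
import random

def scaled_sum(n_terms, rng):
    """One sample of z = (x1 + ... + xN)/sqrt(N), each x uniform on [-1/2, 1/2]."""
    return sum(rng.random() - 0.5 for _ in range(n_terms)) / math.sqrt(n_terms)

def sample_stats(n_terms, n_samples=20000, seed=1):
    """Return the sample mean, sample variance, and fraction of |z| < sigma."""
    rng = random.Random(seed)
    zs = [scaled_sum(n_terms, rng) for _ in range(n_samples)]
    mean = sum(zs) / n_samples
    var = sum((z - mean) ** 2 for z in zs) / n_samples
    sigma = math.sqrt(1.0 / 12.0)   # exact standard deviation of one uniform term
    within = sum(abs(z) < sigma for z in zs) / n_samples
    return mean, var, within

for n in (1, 2, 10, 50):
    mean, var, within = sample_stats(n)
    print(n, round(mean, 4), round(var, 4), round(within, 3))
```

The variance of z stays near 1/12 for every N, as (3.186) implies, while the fraction within one standard deviation moves toward the Gaussian value as N increases.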
The conditions for the rigorous applicability of the central limit theorem can be found in
textbooks on probability. The only requirements are that the various xi be statistically independent
and that the second moment of p(x) exists. Not all probabilities satisfy this latter requirement as
demonstrated by the Lorentz distribution (see Problem 3.45). It is not necessary that all the xi
have the same distribution.

Figure 3.7: Representation of a square lattice of 16 × 16 sites. The sites are represented by squares
and are either occupied (shaded) or empty (white).

Additional Problems

Problems                                                  page
3.1, 3.2, 3.3, 3.4, 3.5, 3.6                              83
3.7, 3.8, 3.9, 3.10, 3.11, 3.12, 3.13, 3.14, 3.15         84
3.16                                                      86
3.17, 3.18, 3.19, 3.20, 3.21                              88
3.22, 3.23                                                89
3.24                                                      92
3.26, 3.27                                                96
3.28, 3.29                                                99
3.30, 3.31                                                101
3.32, 3.33                                                103
3.34, 3.35                                                104
3.36                                                      108
3.37, 3.38, 3.39, 3.40                                    108
3.41                                                      110
3.42, 3.43, 3.44, 3.45                                    111
3.46                                                      113
3.47                                                      114
3.48, 3.49                                                115
3.50                                                      117
3.51                                                      119
3.52, 3.53                                                120
3.54                                                      124
3.55, 3.56                                                125
Listing of inline problems.

Problem 3.57. In Figure 3.7 we show a square lattice of 16² sites each of which is either occupied
or empty. Estimate the probability that a site in the lattice is occupied.
Problem 3.58. Three coins are tossed in succession. Assume that the simple events are equiprobable. Find the probabilities of the following: (a) the first coin is heads, (b) exactly two heads
have occurred, (c) not more than two heads have occurred.

      xi, yi                xi, yi
 1    0.984, 0.246     6    0.637, 0.581
 2    0.860, 0.132     7    0.779, 0.218
 3    0.316, 0.028     8    0.276, 0.238
 4    0.523, 0.542     9    0.081, 0.484
 5    0.349, 0.623    10    0.289, 0.032
Table 3.7: A sequence of ten random pairs of numbers.
Problem 3.59. A student tries to solve Problem 3.18 by using the following reasoning. The
probability of a double six is 1/36. Hence the probability of ﬁnding at least one double six in 24
throws is 24/36. What is wrong with this reasoning? If you have trouble understanding the error
in this reasoning, try solving the problem of finding the probability of at least one double six in two
throws of a pair of dice. What are the possible outcomes? Is each outcome equally probable?
Problem 3.60. A farmer wants to estimate how many ﬁsh are in her pond. She takes out 200 ﬁsh
and tags them and returns them to the pond. After suﬃcient time to allow the tagged ﬁsh to mix
with the others, she removes 250 ﬁsh at random and ﬁnds that 25 of them are tagged. Estimate
the number of ﬁsh in the pond.
Problem 3.61. A farmer owns a ﬁeld that is 10 m × 10 m. In the midst of this ﬁeld is a pond of
unknown area. Suppose that the farmer is able to throw 100 stones at random into the ﬁeld and
ﬁnds that 40 of the stones make a splash. How can the farmer use this information to estimate
the area of the pond?
Problem 3.62. Consider the ten pairs of numbers, (xi, yi), given in Table 3.7. The numbers are
all in the range 0 < xi, yi ≤ 1. Imagine that these numbers were generated by counting the
clicks generated by a Geiger counter of radioactive decays, and hence they can be considered to
be a part of a sequence of random numbers. Use this sequence to estimate the magnitude of the
integral
$$F = \int_0^1 dx\, (1 - x^2). \tag{3.190}$$
If you have been successful in estimating the integral in this way, you have found a simple version
of a general method known as Monte Carlo integration.9 An applet for estimating integrals by
Monte Carlo integration can be found at <stp.clarku.edu/simulations/estimation>.
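For readers who want to experiment beyond the ten pairs in Table 3.7, here is a minimal Python sketch of two common forms of Monte Carlo integration on the unit interval. It is not the applet's implementation; the function names, the pseudorandom generator, the seed, and the number of samples are assumptions chosen for illustration. The integrand is the one in (3.190).

```python
import random

def sample_mean_estimate(g, n, rng):
    """Estimate the integral of g over [0, 1] by averaging g at n random points."""
    return sum(g(rng.random()) for _ in range(n)) / n

def hit_or_miss_estimate(g, n, rng):
    """Estimate the integral of g over [0, 1], assuming 0 <= g(x) <= 1 there.

    Random points (x, y) are placed in the unit square; the fraction that
    falls on or below the curve y = g(x) estimates the area under it.
    """
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if y <= g(x):
            hits += 1
    return hits / n

def integrand(x):
    return 1.0 - x * x   # integrand of (3.190)

rng = random.Random(12345)   # arbitrary seed, used only for reproducibility
print(sample_mean_estimate(integrand, 100000, rng))
print(hit_or_miss_estimate(integrand, 100000, rng))
```

Both estimates fluctuate about the exact value with a statistical error that decreases as the inverse square root of the number of samples; the hit-or-miss form uses pairs of numbers, as in Table 3.7.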
Problem 3.63. A person playing darts hits a bullseye 20% of the time on the average. Why is
the probability of b bullseyes in N attempts a binomial distribution? What are the values of p and
q ? Find the probability that the person hits a bullseye (a) once in ﬁve throws; (b) twice in ten
throws. Why are these probabilities not identical?
Problem 3.64. There are 10 children in a given family. Assuming that a boy is as likely to be
born as a girl, ﬁnd the probability of the family having (a) 5 boys and 5 girls; (b) 3 boys and 7
girls.
9 Monte Carlo methods were first developed to estimate integrals that could not be performed in other ways.
Problem 3.65. What is the probability that five children produced by the same couple will consist
of (a) three sons and two daughters? (b) alternating sexes? (c) alternating sexes starting with a
son? (d) all daughters? Assume that the probability of giving birth to a boy and a girl is the same.
Problem 3.66. A good hitter in baseball has a batting average of 300 or more, which means that
the hitter will be successful 3 times out of 10 tries on the average. Assume that on average a hitter
gets three hits for each 10 times at bat and that he has 4 times at bat per game. (a) What is the
probability that he gets zero hits in one game? (b) What is the probability that he will get two
hits or less in a three game series? (c) What is the probability that he will get ﬁve or more hits
in a three game series? Baseball fans might want to think about the signiﬁcance of “slumps” and
“streaks” in baseball.
Problem 3.67. In the World Series in baseball and in the playoﬀs in the National Basketball
Association and the National Hockey League, the winner is determined by a best of seven
series. That is, the ﬁrst team that wins four games wins the series and is the champion. Do
a simple statistical calculation assuming that the two teams are evenly matched and the winner of any game might as well be determined by a coin ﬂip and show that a seven game series should occur 31.25% of the time. What is the probability that the series lasts n games?
More information can be found at <www.mste.uiuc.edu/hill/ev/seriesprob.html> and at
<www.insidescience.org/reports/2003/080.html>.
Problem 3.68. The Galton board (named after Francis Galton (1822–1911)), is a triangular array
of pegs. The rows are numbered 0, 1, . . . from the top row down such that row n has n + 1 pegs.
Suppose that a ball is dropped from above the top peg. Each time the ball hits a peg, it bounces to
the right with probability p and to the left with probability 1 − p, independently from peg to peg.
Suppose that N balls are dropped successively such that the balls do not encounter one another.
How will the balls be distributed at the bottom of the board? Links to applets that simulate the
Galton board can be found in the references.
Problem 3.69. (a) What are the chances that at least two people in your class have the same
birthday? Assume that the number of students is 25. (b) What are the chances that at least one
other person in your class has the same birthday as you? Explain why the chances are less in the
second case.
Problem 3.70. Many analysts attempt to select stocks by looking for correlations in the stock
market as a whole or for patterns for particular companies. Such an analysis is based on the belief
that there are repetitive patterns in stock prices. To understand one reason for the persistence
of this belief do the following experiment. Construct a stock chart (a plot of stock price versus
time) showing the movements of a hypothetical stock initially selling at $50 per share. On each
successive day the closing stock price is determined by the ﬂip of a coin. If the coin toss is a head,
the stock closes 1/2 point ($0.50) higher than the preceding close. If the toss is a tail, the price
is down by 1/2 point. Construct the stock chart for a long enough time to see “cycles” and other
patterns appear. The moral of the charts is that a sequence of numbers produced in this manner
is identical to a random walk, yet the sequence frequently appears to be correlated.
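The coin-flip construction described in Problem 3.70 can be sketched in a few lines of Python. The function name `stock_chart`, the seed, and the number of days are assumptions made for illustration, and a pseudorandom number generator stands in for the coin.

```python
import random

def stock_chart(n_days, start=50.0, step=0.5, seed=None):
    """Closing prices of a hypothetical stock driven by fair coin flips.

    Each day the price rises by `step` (heads) or falls by `step` (tails).
    """
    rng = random.Random(seed)
    prices = [start]
    for _ in range(n_days):
        move = step if rng.random() < 0.5 else -step
        prices.append(prices[-1] + move)
    return prices

prices = stock_chart(250, seed=3)   # roughly one year of trading days
print(min(prices), max(prices), prices[-1])
```

Plotting `prices` against the day number for several seeds gives charts that can be inspected for apparent "cycles" and trends.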
Problem 3.71. Suppose that a random walker takes N steps of unit length with probability p
of a step to the right. The displacement m of the walker from the origin is given by m = n − n′,
where n is the number of steps to the right and n′ is the number of steps to the left. Show that
$\overline{m} = (p - q)N$ and $\sigma_m^2 = \overline{(m - \overline{m})^2} = 4Npq$.
Problem 3.72. The result (3.61) for $(\Delta M)^2$ differs by a factor of four from the result for $\sigma_n^2$ in
(3.87). Why? Compare (3.61) to the result of Problem 3.39.
Problem 3.73. A random walker is observed to take a total of N steps, n of which are to the
right. (a) Suppose that a curious observer ﬁnds that on ten successive nights the walker takes
N = 20 steps and that the values of n are given successively by 14, 13, 11, 12, 11, 12, 16, 16, 14,
8. Compute $\overline{n}$, $\overline{n^2}$, and $\sigma_n$. Use this information to estimate p. If your reasoning gives different
values for p, which estimate is likely to be the most accurate? (b) Suppose that on another ten
successive nights the same walker takes N = 100 steps and that the values of n are given by 58,
69, 71, 58, 63, 53, 64, 66, 65, 50. Compute the same quantities as in part (a) and estimate p. How
does the ratio of $\sigma_n$ to $\overline{n}$ compare for the two values of N? Explain your results. (c) Compute $\overline{m}$
and $\sigma_m$, where m = n − n′ is the net displacement of the walker. This problem inspired an article
by Zia and Schmittmann (see the references).
Problem 3.74. In Section 3.7 we evaluated the derivatives of P(n) to determine the parameters
A, B, and $\tilde{n}$ in (3.106). Another way to determine these parameters is to assume that the binomial
distribution can be approximated by a Gaussian and require that the first several moments of the
Gaussian and binomial distribution be equal. We write
$$P(n) = A\, e^{-\frac{1}{2} B (n - \tilde{n})^2}, \tag{3.191}$$
and require that
$$\int_0^N P(n)\, dn = 1. \tag{3.192}$$
Because P(n) depends on the difference $n - \tilde{n}$, it is convenient to change the variable of integration
in (3.192) to $x = n - \tilde{n}$. We have
$$\int_{-\tilde{n}}^{N - \tilde{n}} P(x)\, dx = 1, \tag{3.193}$$
where
$$P(x) = A\, e^{-\frac{1}{2} B x^2}. \tag{3.194}$$
In the limit of large N, we can extend the upper and lower limits of integration in (3.193) and
write
$$\int_{-\infty}^{\infty} P(x)\, dx = 1. \tag{3.195}$$
The first moment of P(n) is given by
$$\overline{n} = \int_0^N n\, P(n)\, dn = pN. \tag{3.196}$$
Make a change of variables and show that
$$\int_{-\infty}^{\infty} x\, P(x)\, dx = \overline{n} - \tilde{n}. \tag{3.197}$$
Because the integral in (3.197) is zero, we can conclude that $\tilde{n} = \overline{n}$. We also have that
$$\overline{(n - \overline{n})^2} = \int_0^N (n - \overline{n})^2\, P(n)\, dn = pqN. \tag{3.198}$$
Do the integrals in (3.198) and (3.195) (see (A.17) and (A.21)) and confirm that the values of B
and A are given by (3.112) and (3.114), respectively. The generality of the arguments leading to
the Gaussian distribution suggests that it occurs frequently in probability when large numbers are
involved. Note that the Gaussian distribution is characterized completely by its mean value and
its width.

Figure 3.8: Example of a castle wall as explained in Problem 3.75.
Problem 3.75. Consider a two-dimensional “castle wall” constructed from N squares as shown in
Figure 3.8. The base row of the cluster must be continuous, but higher rows can have gaps. Each
column must be continuous and self-supporting. Determine the total number $W_N$ of different
N-site clusters, that is, the number of possible arrangements of N squares consistent with the above
rules. Assume that the squares are identical.
Problem 3.76. Suppose that a one-dimensional unbiased random walker starts out at the origin
x = 0 at t = 0. How many steps will it take for the walker to reach a site at x = 4? This quantity,
known as the ﬁrst passage time, is a random variable because it is diﬀerent for diﬀerent possible
realizations of the walk. Possible quantities of interest are the probability distribution of the ﬁrst
passage time and the mean ﬁrst passage time, τ . Write a computer program to estimate τ (x) and
then determine its analytical dependence on x. Why is it more diﬃcult to estimate τ for x = 8
than for x = 4?
Problem 3.77. Two people take turns tossing a coin. The ﬁrst person to obtain heads is the
winner. Find the probabilities of the following events: (a) the game terminates at the fourth toss;
(b) the ﬁrst player wins the game; (c) the second player wins the game.
∗ Problem 3.78. How good is the Gaussian distribution as an approximation to the binomial
distribution as a function of N ? To determine the validity of the Gaussian distribution, consider
the next two terms in the power series expansion of ln P (n):
$$\frac{1}{3!}(n - \tilde{n})^3\, C + \frac{1}{4!}(n - \tilde{n})^4\, D, \tag{3.199}$$
with $C = d^3 \ln P(n)/dn^3$ and $D = d^4 \ln P(n)/dn^4$ evaluated at $n = \tilde{n}$. (a) Show that C = 0 if
p = q. Calculate D for p = q and estimate the order of magnitude of the first nonzero correction.
Compare this correction to the magnitude of the first nonzero term in ln P(n) (see (3.102)) and
determine the conditions for which the terms beyond $(n - \tilde{n})^2$ can be neglected. (b) Define the
error as
$$E(n) = 1 - \frac{\text{Binomial}(n)}{\text{Gaussian}(n)}. \tag{3.200}$$
Plot E(n) versus n and determine the approximate width of E(n). (c) Show that if N is sufficiently
large and neither p nor q is too small, the Gaussian distribution is a good approximation for n
near the maximum of P(n). Because P(n) is very small for large $(n - \overline{n})$, the error in the Gaussian
approximation for larger n is negligible.
Problem 3.79. Consider a random walk on a two-dimensional square lattice where the walker
has an equal probability of taking a step to one of four possible directions, north, south, east, or
west. Use the central limit theorem to find the probability that the walker is a distance r to r + dr
from the origin, where r² = x² + y² and r is the distance from the origin after N steps. There is
no need to do an explicit calculation.
Problem 3.80. One of the ﬁrst continuum models of a random walk is due to Rayleigh (1919).
In the Rayleigh model the length a of each step is a random variable with probability density p(a)
and the direction of each step is random. For simplicity consider a walk in two dimensions and
choose p(a) so that each step has unit length. Then at each step the walker takes a step of unit
length at a random angle. Use the central limit theorem to ﬁnd the asymptotic form of p(r, N ) dr,
the probability that the walker is in the range r to r + dr, where r is the distance from the origin
after N steps.
Problem 3.81. Suppose there are three boxes each with two balls. The ﬁrst box has two green
balls, the second box has one green and one red ball, and the third box has two red balls. Suppose
you choose a box at random and ﬁnd one green ball. What is the probability that the other ball
is green?
Problem 3.82. Open a telephone directory to a random page and make a list corresponding to
the last digit n of the first 100 telephone numbers. Find the probability P(n) that the number n
appears. Plot P(n) as a function of n and describe its n-dependence. Do you expect that P(n)
should be approximately uniform?
∗ Problem 3.83. A simple model of a porous rock can be imagined by placing a series of overlapping spheres at random into a container of ﬁxed volume V . The spheres represent the rock and
the space between the spheres represents the pores. If we write the volume of a sphere as v, it
can be shown that the fraction of the space between the spheres, or the porosity φ, is φ = exp(−Nv/V),
where N is the number of spheres. For simplicity, consider a two-dimensional system, and write
a program to place disks of diameter unity into a square box. The disks can overlap. Divide the
box into square cells each of which has an edge length equal to the diameter of the disks. Find the
probability of having 0, 1, 2, or 3 disks in a cell for φ = 0.03, 0.1, and 0.5.
∗ Problem 3.84. Do a search of the Web and ﬁnd a site that lists the populations of various cities
in the world (not necessarily the largest ones) or the cities of your state or region. The quantity
of interest is the ﬁrst digit of each population. Alternatively, scan the ﬁrst page of your local
newspaper and record the ﬁrst digit of each of the numbers you ﬁnd. (The ﬁrst digit of a number
such as 0.00123 is 1.) What is the probability P (n) that the ﬁrst digit is n, where n = 1, . . . , 9?
Do you think that P (n) will be the same for all n?
It turns out that the form of the probability P (n) is given by
$$P(n) = \log_{10}\left(1 + \frac{1}{n}\right). \tag{3.201}$$
The distribution (3.201) is known as Benford’s law and is named after Frank Benford, a physicist.
It implies that for certain data sets, the ﬁrst digit is distributed in a predictable pattern with a
higher percentage of the numbers beginning with the digit 1. What are the numerical values of
P (n) for the diﬀerent values of n? Is P (n) normalized? Suggest a hypothesis for the nonuniform
nature of the Benford distribution.
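As a starting point for exploring (3.201), the following Python sketch tabulates the Benford probabilities and compares them with the first digits of one illustrative data set. The data set, the first 1000 powers of 2, is an assumption chosen because it is easy to generate; it is not one of the data sets mentioned in the text.

```python
import math

# Benford probabilities from (3.201).
benford = {n: math.log10(1 + 1 / n) for n in range(1, 10)}
total = sum(benford.values())

# Illustrative data set (an assumption, not from the text):
# the first digit of 2**k for k = 0, 1, ..., 999.
first_digits = [int(str(2 ** k)[0]) for k in range(1000)]
observed = {n: first_digits.count(n) / len(first_digits) for n in range(1, 10)}

for n in range(1, 10):
    print(n, round(benford[n], 3), round(observed[n], 3))
print("sum of P(n):", round(total, 12))
```

The same comparison can be made with any list of numbers by replacing `first_digits` with the first digits of the data of interest.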
Accounting data is one of the many types of data that is expected to follow the Benford
distribution. It has been found that artiﬁcial data sets do not have ﬁrst digit patterns that follow
the Benford distribution. Hence, the more an observed digit pattern deviates from the expected
Benford distribution, the more likely the data set is suspect. Tax returns have been checked in
this way.
The frequencies of the ﬁrst digit of 2000 numerical answers to problems given in the back of
four physics and mathematics textbooks have been tabulated and found to be distributed in a way
consistent with Benford’s law. Benford’s Law is also expected to hold for answers to homework
problems (see James R. Huddle, “A note on Benford’s law,” Math. Comput. Educ. 31, 66 (1997)).
The nature of Benford’s law is discussed by T. P. Hill, “The ﬁrst digit phenomenon,” Amer. Sci.
86, 358–363 (1998).
∗ Problem 3.85. Ask several of your friends to ﬂip a coin 200 times and record the results or
pretend to ﬂip a coin and fake the results. Can you tell which of your friends faked the results?
Hint: What is the probability that a sequence of six heads in a row will occur? Can you suggest
any other statistical tests?
∗ Problem 3.86. Analyze a text and do a ranking of the word frequencies. The word with rank
r is the rth word when the words of the text are listed with decreasing frequency. Make a log-log
plot of word frequency f versus word rank r. The relation between word rank and word frequency
was first stated by George Kingsley Zipf (1902–1950). This relation states that to a very good
approximation for a given text
$$f \sim \frac{1}{r \ln(1.78 R)}, \tag{3.202}$$
where R is the number of different words. Note the inverse power law behavior. The relation
(3.202) is known as Zipf’s law. The top 20 words in an analysis of a 1.6 MB collection of 423 short
Time magazine articles (245,412 term occurrences) are given in Table 3.8.
∗ Problem 3.87. If you receive an email, how long does it take for you to respond to it? If you
keep a record of your received and sent mail, you can analyze the distribution of your response
times (the number of days or hours between receiving an email from someone and then replying).
It turns out that the time it takes people to reply to emails can be described by a power law;
that is, $P(\tau) \sim \tau^{-a}$ with a ≈ 1. Oliveira and Barabási have shown that the response times of
Einstein and Darwin to letters can also be described by a power law, but with an exponent a ≈ 3/2
(see J. G. Oliveira and A.-L. Barabási, “Darwin and Einstein correspondence patterns,” Nature
437, 1251 (2005)). Their results suggest that there is a universal pattern for human behavior in
response to correspondence. What is the implication of a power law response?
rank  word   occurrences     rank  word   occurrences
 1    the    15861           11    his    1839
 2    of     7239            12    is     1810
 3    to     6331            13    he     1700
 4    a      5878            14    as     1581
 5    and    5614            15    on     1551
 6    in     5294            16    by     1467
 7    that   2507            17    at     1333
 8    for    2228            18    it     1290
 9    was    2149            19    from   1228
10    with   1839            20    but    1138
Table 3.8: Ranking of the top 20 words.

Problem 3.88. Three cards are in a hat. One card is white on both sides, the second is white
on one side and red on the other, and the third is red on both sides. The dealer shuffles the
cards, takes one out and places it flat on the table. The side showing is red. The dealer now
says, “Obviously this card is not the white-white card. It must be either the red-white card or the
red-red card. I will bet even money that the other side is red.” Is this bet fair?
Problem 3.89. Estimate the probability that an asteroid will impact the earth and cause major
damage. Does it make sense for society to take steps now to guard itself against such an occurrence?
∗ Problem 3.90. The likelihood of the breakdown of the levees near New Orleans was well known
before it occurred on August 30, 2005. Discuss the various reasons why the decision was
made not to strengthen the levees. Relevant issues include the ability of people to think about
the probability of rare events and the large amount of money needed to strengthen the levees to
withstand such an event.
∗ Problem 3.91. Does capital punishment deter murder? Are vegetarians more likely to have
daughters? Does it make sense to talk about a “hot hand” in basketball? Are the digits of
π random? Visit <chance.dartmouth.edu/chancewiki/> and <www.dartmouth.edu/~chance/>
and read about other interesting issues involving probability and statistics.
∗ Problem 3.92. D’Agostini has given a simple example where it is clear that determining the
probability of various events using all the available information is more appropriate than estimating
the probability from a relative frequency. [xx not finished xx]
∗ Problem 3.93. A doctor has two drugs, A and B, which she can prescribe to patients with a
certain illness. The drugs have been rated in terms of their eﬀectiveness on a scale of 1 to 6, with
1 being the least eﬀective and 6 being the most eﬀective. Drug A is uniformly eﬀective with a
value of 3. The eﬀectiveness of drug B is variable and 54% of the time it scores a value of 1, and
46% of the time it scores a value of 5. The doctor wishes to provide her patients with the best
possible care and asks her statistician friend which drug has the highest probability of being the
most eﬀective. The statistician says, “It is clear that drug A is the most eﬀective drug 54% of the
time. Thus drug A is your best bet.”
Later a new drug C becomes available. Studies show that on the scale of 1 to 6, 22% of
the time this drug scores a 6, 22% of the time it scores a 4, and 56% of the time it scores a 2.
The doctor, again wishing to provide her patients with the best possible care, goes back to her
statistician friend and asks him which drug has the highest probability of being the most effective.
The statistician says, “Because there is this new drug C on the market, your best bet is now drug
B, and drug A is your worst bet.” Show that the statistician is right.

Suggestions for Further Reading
Vinay Ambegaokar, Reasoning About Luck, Cambridge University Press (1996). A book developed for a course intended for nonscience majors. An excellent introduction to statistical
reasoning and its uses in physics.
Peter L. Bernstein, Against the Gods: The Remarkable Story of Risk, John Wiley & Sons (1996).
The author is a successful investor and an excellent writer. The book includes an excellent
summary of the history of probability.
David S. Betts and Roy E. Turner, Introductory Statistical Mechanics, Addison-Wesley (1992).
Section 3.4 is based in part on Chapter 3 of this text.
The <www.dartmouth.edu/~chance/> Chance database encourages its users to apply statistics
to everyday events.
Giulio D’Agostini, “Teaching statistics in the physics curriculum: Unifying and clarifying role of
subjective probability,” Am. J. Phys. 67, 1260–1268 (1999). The author, whose main research
interest is in particle physics, discusses subjective probability and Bayes’ theorem. Section 3.4
is based in part on this article.
See <www.math.uah.edu/stat/objects/> for an excellent simulation of the Galton board.
Gene F. Mazenko, Equilibrium Statistical Mechanics, John Wiley & Sons (2000). Sections 1.7 and
1.8 of this graduate level text discuss the functional form of the missing information.
Elliott W. Montroll and Michael F. Shlesinger, “On the wonderful world of random walks,” in
Studies in Statistical Mechanics, Vol. XI: Nonequilibrium Phenomena II, J. L. Lebowitz and
E. W. Montroll, eds., North-Holland (1984).
Elliott W. Montroll and Wade W. Badger, Introduction to Quantitative Aspects of Social Phenomena, Gordon and Breach (1974). The applications of probability that are discussed include
traﬃc ﬂow, income distributions, ﬂoods, and the stock market.
Richard Perline, “Zipf’s law, the central limit theorem, and the random division of the unit
interval,” Phys. Rev. E 54, 220–223 (1996).
S. Redner, “Random multiplicative processes: An elementary tutorial,” Am. J. Phys. 58, 267–273
(1990).
Charles Ruhla, The Physics of Chance, Oxford University Press (1992).
B. Schmittmann and R. K. P. Zia, “‘Weather’ records: Musings on cold days after a long hot
Indian summer,” Am. J. Phys. 67, 1269–1276 (1999). A relatively simple introduction to the
statistics of extreme values. Suppose that someone breaks the record for the 100 meter dash.
How long do records typically survive before they are broken?
Kyle Siegrist at the University of Alabama in Huntsville has developed many applets to illustrate
concepts in probability and statistics. See <www.math.uah.edu/stat/> and follow the link
to Bernoulli processes.
G. Troll and P. beim Graben, “Zipf’s law is not a consequence of the central limit theorem,”
Phys. Rev. E 57, 1347–1355 (1998).
Charles A. Whitney, Random Processes in Physical Systems: An Introduction to Probability-Based Computer Simulations, John Wiley & Sons (1990).
R. K. P. Zia and B. Schmittmann, “Watching a drunkard for 10 nights: A study of distributions
of variances,” Am. J. Phys. 71, 859 (2003). See Problem 3.73.

Chapter 4

The Methodology of Statistical Mechanics

© 2006 by Harvey Gould and Jan Tobochnik
11 January 2006
We develop the basic methodology of statistical mechanics and provide a microscopic foundation
for the concepts of temperature and entropy.

4.1 Introduction

We first discuss a simple example to make explicit the probabilistic assumptions and types of
calculations that we do in statistical mechanics.
Consider an isolated system of N = 5 noninteracting spins with magnetic moment µ and spin 1/2
in a magnetic ﬁeld B . If the total energy E = −µB , what is the mean magnetic moment of a
given spin in the system? The essential steps needed to analyze this system can be summarized as
follows.
1. Specify the macrostate and accessible microstates of the system. The macroscopic state or
macrostate of the system corresponds to the information that we know. For this example the
observable quantities are the total energy E , the magnetization M , the number of spins N , and
the external magnetic ﬁeld B . (Because the spins are noninteracting, it is redundant to specify
both M and E .)
The most complete specification of the system corresponds to an enumeration of the microstates
or configurations of the system. For N = 5, there are 2⁵ = 32 microstates, each specified by the
orientation of each spin. However, not all of the 32 microstates are consistent with the information
that E = −µB. For example, E = −5µB for the microstate shown in Figure 4.1a is not allowed,
that is, such a state is inaccessible. The accessible microstates of the system are those that are
consistent with the macroscopic conditions. In this example, ten of the thirty-two total microstates
are accessible (see Figure 4.1b).

Figure 4.1: (a) Example of an inaccessible microstate for the ensemble specified by E = −µB, N =
5. (b) The ten accessible members of the ensemble with E = −µB and N = 5. Spin 1 is the
leftmost spin.
2. Choose the ensemble. We calculate averages by preparing a collection of identical systems all
of which satisfy the macroscopic conditions E = −µB and N = 5. In this example the ensemble
consists of ten systems each of which is in one of the ten accessible microstates.
What can we say about the relative probability of ﬁnding the system in one of the ten accessible
microstates? Because the system is isolated and each microstate is consistent with the speciﬁed
macroscopic information, we assume that each microstate in the ensemble is equally likely. This
assumption of equal a priori probabilities implies that the probability pn that the system is in
microstate n is given by
$$ p_n = \frac{1}{\Omega}, \tag{4.1} $$
where Ω represents the number of microstates of energy E . This assumption is equivalent to the
principle of least bias or maximum uncertainty that we discussed in Section 3.4.1. For our example,
we have Ω = 10, and the probability that the system is in any one of its accessible microstates is
1/10.
3. Calculate the mean values and other statistical properties. As an example of a probability
calculation, we calculate the mean value of the orientation of spin 1 (see Figure 4.1b). Because s1
assumes the value ±1, we have
$$ \overline{s_1} = \sum_{n=1}^{10} p_n s_n = \frac{1}{10}\bigl[(+1) + (+1) + (+1) + (-1) + (+1) + (+1) + (-1) + (+1) + (-1) + (-1)\bigr] = \frac{2}{10} = \frac{1}{5}. \tag{4.2} $$
The sum is over all the accessible microstates and sn is the value of spin 1 in the nth member of
the ensemble. We see from (4.2) that the mean value of s1 is 1/5.
Problem 4.1. (a) What is the mean value of spin 2 in the above example? (b) What is the mean
magnetic moment per spin? (c) What is the probability p that a given spin points up? (d) What
is the probability that if spin 1 is up, spin 2 also is up?

Of course there is a more direct way of calculating the mean value of s1 in this case. Because the
net magnetic moment is M = µ, three of the five spins are up in every accessible microstate. The
equivalence of the spins implies that the probability of a spin being up is 3/5. Hence, the mean
value of s is (3/5)(1) + (2/5)(−1) = 1/5. What is the implicit assumption that we made in
the more direct method?
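The three steps above (specify the accessible microstates, assign equal probabilities, average) can be sketched by brute-force enumeration. This is a minimal illustration, not part of the original text; it assumes energies are measured in units of µB and that a spin s = +1 is parallel to B, and the names `energy`, `accessible`, and `mean_s1` are chosen here only for the example.

```python
from fractions import Fraction
from itertools import product

N = 5           # number of spins
E_TARGET = -1   # total energy in units of mu*B (assumed unit choice)


def energy(state):
    """Energy in units of mu*B: each up spin (+1) contributes -1, each down spin +1."""
    return -sum(state)


# Step 1: enumerate all 2^N microstates and keep those consistent with E = -mu*B.
all_states = list(product((+1, -1), repeat=N))
accessible = [s for s in all_states if energy(s) == E_TARGET]
omega = len(accessible)

# Step 2: equal a priori probabilities, p_n = 1/Omega.
p = Fraction(1, omega)

# Step 3: ensemble average of spin 1 (index 0 in each tuple).
mean_s1 = sum(p * s[0] for s in accessible)

print(len(all_states), omega, mean_s1)  # 32 10 1/5
```

Exact fractions are used so that the result can be compared directly with (4.2) without rounding.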
Problem 4.2. (a) Consider N = 4 noninteracting spins with magnetic moment µ and spin 1/2
in a magnetic ﬁeld B . If the total energy E = −2µB , what are the accessible microstates and the
probabilities that a particular spin has a magnetic moment ±µ? (b) Consider N = 9 noninteracting
spins with total energy E = −µB . What is the net number of up spins, the number of accessible
microstates, and the probabilities that a particular spin has magnetic moment ±µ? Compare these
probabilities to the analogous ones calculated in part (a).
Problem 4.3. Consider a one-dimensional ideal gas consisting of N = 5 particles each of which
has the same speed v , but velocity ±v. The velocity of each particle is independent. What is the
probability that all the particles are moving in the same direction?
The model of noninteracting spins that we have considered is an example of an isolated system,
that is, a system with ﬁxed E , B , and N . In general, an isolated system cannot exchange energy
or matter with its surroundings nor do work on another system. The macrostate of such a system
is speciﬁed by E , V , and N (B instead of V for a magnetic system). Our strategy is to ﬁrst
understand how to treat isolated systems. Conceptually, isolated systems are simpler because all
the accessible microstates have the same probability (see Section 4.5).

4.2 A simple example of a thermal interaction

We will now consider some model systems that can exchange energy with another system. This
exchange has the eﬀect of relaxing one of the internal constraints and, as we will see, imposing
another. We will see that for nonisolated systems, the probability of each microstate will not be
the same.
We know what happens when we place two bodies at diﬀerent temperatures into thermal
contact with one another – energy is transferred from the hotter to the colder body until thermal
equilibrium is reached and the two bodies have the same temperature. We now consider a simple
model that illustrates how statistical concepts can help us understand the transfer of energy and
the microscopic nature of thermal equilibrium.
Consider a model system of N noninteracting distinguishable particles such that the energy
of each particle is given by integer values, εn = 0, 1, 2, 3, . . . We can distinguish the particles
by their colors. (Or we can assume that the particles have the same color, but are ﬁxed on lattice
sites.) For reasons that we will discuss in Section 6.12, we will refer to this model system as an
Einstein solid.1
Suppose that we have an Einstein solid with N = 3 particles (with color red, white, and blue)
in an isolated box and that their total energy is E = 3. For these small values of N and E , we can
enumerate the accessible microstates by hand. The ten accessible microstates of this system are
shown in Table 4.1. If the ten accessible microstates are equally probable, what is the probability
that if one particle has energy 1, another particle has energy 2?

microstate   red   white   blue
    1         1      1      1
    2         2      0      1
    3         2      1      0
    4         1      0      2
    5         1      2      0
    6         0      1      2
    7         0      2      1
    8         3      0      0
    9         0      3      0
   10         0      0      3

Table 4.1: The ten accessible microstates of a system of N = 3 distinguishable particles with total
energy E = 3. Each particle may have energy 0, 1, 2, . . .

1 These particles are equivalent to the quanta of the harmonic oscillator, which have energy En = (n + 1/2)ħω. If
we measure the energies from the lowest energy state, (1/2)ħω, and choose units such that ħω = 1, we have εn = n.
Problem 4.4. Consider an Einstein solid composed of N particles with total energy E . It can be
shown that the general expression for the number of microstates of this system is
$$ \Omega = \frac{(E + N - 1)!}{E!\,(N - 1)!}. \tag{4.3} $$
(a) Verify that this expression yields the correct answers for the case N = 3 and E = 3. (b) What
is the number of microstates for an Einstein solid with N = 4 and E = 6?
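As a rough illustration of how (4.3) relates to counting by hand, the sketch below compares the closed-form expression with a brute-force enumeration for the N = 3, E = 3 case of Table 4.1. The function names `omega_formula` and `enumerate_microstates` are hypothetical helpers introduced only for this example.

```python
from itertools import product
from math import comb


def omega_formula(N, E):
    """Number of microstates from (4.3): (E + N - 1)! / (E! (N - 1)!)."""
    return comb(E + N - 1, E)


def enumerate_microstates(N, E):
    """List every way N distinguishable particles can share E units of energy."""
    return [s for s in product(range(E + 1), repeat=N) if sum(s) == E]


states = enumerate_microstates(3, 3)
print(len(states), omega_formula(3, 3))  # 10 10
```

The enumeration grows as (E + 1)^N and is only practical for very small systems, which is why a closed-form count is needed for anything larger.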
Now that we know how to enumerate the number of microstates for an Einstein solid, consider
an isolated system of N = 4 particles that is divided into two subsystems surrounded by insulating,
rigid, impermeable outer walls and separated by a similar partition (see Figure 4.2). Subsystem
A consists of two particles, R (red) and G (green), with EA = 5; subsystem B consists of two
particles, B (black) and W (white), with energy EB = 1. The total energy E of the composite
system consisting of subsystem A plus subsystem B is
$$ E = E_A + E_B = 5 + 1 = 6. \tag{4.4} $$
The accessible microstates for the composite system are shown in Table 4.2. We see that subsystem
A has ΩA = 6 accessible microstates and subsystem B has ΩB = 2 accessible microstates. The
total number of microstates Ω accessible to the composite system is
$$ \Omega = \Omega_A \times \Omega_B = 6 \times 2 = 12. \tag{4.5} $$
The partition is an internal constraint that prevents the transfer of energy from one subsystem to
another and in this case keeps EA = 5 and EB = 1. (The internal constraint also keeps the volume
and number of particles in each subsystem ﬁxed.)
Figure 4.2: Two subsystems, each with two distinguishable particles, surrounded by insulating,
rigid, and impermeable outer walls. (a) The subsystems are separated by an insulating, rigid, and
impermeable wall, with EA = 5 and EB = 1. (b) The subsystems are separated by a conducting,
rigid, and impermeable wall, with EA + EB = 6. The other walls remain the same.

EA   accessible microstates of A      EB   accessible microstates of B
5    5,0  0,5  4,1  1,4  3,2  2,3      1    1,0  0,1

Table 4.2: The 12 equally probable microstates accessible to subsystems A and B before the
removal of the internal constraint. The conditions are NA = 2, EA = 5, NB = 2, and EB = 1.

We now consider a simple example of a thermal interaction. Suppose that the insulating,
rigid, impermeable partition separating subsystems A and B is changed to a conducting, rigid,
impermeable partition (see Figure 4.2). The partition maintains the volumes VA and VB, and
hence the single particle energy eigenvalues are not changed. Because the partition is impermeable,
the particles cannot penetrate the partition and go from one subsystem to the other. However,
energy can be transferred from one subsystem to the other, subject only to the constraint that
the total energy of subsystems A and B is constant, that is, E = EA + EB = 6. The microstates
of subsystems A and B are listed in Table 4.3 for all the possible values of EA and EB . The
total number of microstates Ω(EA , EB ) accessible to the composite system whose subsystems have
energy EA and EB is
$$ \Omega(E_A, E_B) = \Omega_A(E_A) \times \Omega_B(E_B). \tag{4.6} $$
For example, if EA = 4 and EB = 2, then subsystem A can be in any one of five microstates and
subsystem B can be in any of three microstates. These two sets of microstates of subsystems A
and B can be combined to give 5 × 3 = 15 microstates of the composite system.
The total number of microstates Ω accessible to the composite system can be found by summing ΩA (EA )ΩB (EB ) over the possible values of EA and EB consistent with the condition that
EA + EB = 6. Hence, Ω can be expressed as
$$ \Omega = \sum_{E_A} \Omega_A(E_A)\,\Omega_B(E - E_A). \tag{4.7} $$
From Table 4.3 we see that
$$ \Omega = (7 \times 1) + (6 \times 2) + (5 \times 3) + (4 \times 4) + (3 \times 5) + (2 \times 6) + (1 \times 7) = 84. \tag{4.8} $$

EA   microstates of A                    ΩA(EA)   EB   microstates of B                    ΩB(EB)   ΩAΩB
6    6,0 0,6 5,1 1,5 4,2 2,4 3,3         7        0    0,0                                 1        7
5    5,0 0,5 4,1 1,4 3,2 2,3             6        1    1,0 0,1                             2        12
4    4,0 0,4 3,1 1,3 2,2                 5        2    2,0 0,2 1,1                         3        15
3    3,0 0,3 2,1 1,2                     4        3    3,0 0,3 2,1 1,2                     4        16
2    2,0 0,2 1,1                         3        4    4,0 0,4 3,1 1,3 2,2                 5        15
1    1,0 0,1                             2        5    5,0 0,5 4,1 1,4 3,2 2,3             6        12
0    0,0                                 1        6    6,0 0,6 5,1 1,5 4,2 2,4 3,3         7        7

Table 4.3: The 84 equally probable microstates accessible to subsystems A and B after the removal
of the internal constraint. The total energy is E = EA + EB = 6 with NA = 2 and NB = 2.
Because the composite system is isolated, its accessible microstates are equally probable, that
is, the composite system is equally likely to be in any one of its 84 accessible microstates. An
inspection of Table 4.3 shows that the probability that the composite system is in any one of the
microstates in which EA = 2 and EB = 4 is 15/84. Let PA (EA ) be the probability that subsystem
A has energy EA . Then PA (EA ) is given by
$$ P_A(E_A) = \frac{\Omega_A(E_A)\,\Omega_B(E - E_A)}{\Omega}. \tag{4.9} $$
We show in Table 4.4 and Figure 4.3 the various values of PA(EA).
The mean energy of subsystem A is found by doing an ensemble average over the 84 microstates
accessible to the composite system. We have that
$$ \overline{E}_A = \Bigl(0 \times \frac{7}{84}\Bigr) + \Bigl(1 \times \frac{12}{84}\Bigr) + \Bigl(2 \times \frac{15}{84}\Bigr) + \Bigl(3 \times \frac{16}{84}\Bigr) + \Bigl(4 \times \frac{15}{84}\Bigr) + \Bigl(5 \times \frac{12}{84}\Bigr) + \Bigl(6 \times \frac{7}{84}\Bigr) = 3. \tag{4.10} $$

EA   ΩA(EA)   ΩB(6 − EA)   ΩAΩB   PA(EA)
6    7        1            7      7/84
5    6        2            12     12/84
4    5        3            15     15/84
3    4        4            16     16/84
2    3        5            15     15/84
1    2        6            12     12/84
0    1        7            7      7/84

Table 4.4: The probability PA(EA) that subsystem A has energy EA.

Figure 4.3: The probability PA(EA) that subsystem A has energy EA. The line between the points
is only a guide to the eye.

In this simple case the mean value of EA is equal to ẼA, the energy corresponding to the most
probable value of PA(EA).
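The sums in (4.7), (4.9), and (4.10) are easy to check numerically. The following sketch assumes the Einstein-solid count (4.3) for each two-particle subsystem; the helper `omega` and the dictionary names are introduced here only for illustration.

```python
from fractions import Fraction
from math import comb


def omega(N, E):
    """Einstein-solid microstate count from (4.3)."""
    return comb(E + N - 1, E)


N_A, N_B, E_TOTAL = 2, 2, 6

# Omega_A(E_A) * Omega_B(E - E_A) for each way of sharing the energy.
products = {E_A: omega(N_A, E_A) * omega(N_B, E_TOTAL - E_A)
            for E_A in range(E_TOTAL + 1)}

omega_total = sum(products.values())                                   # (4.7)
P_A = {E_A: Fraction(w, omega_total) for E_A, w in products.items()}   # (4.9)
mean_E_A = sum(E_A * p for E_A, p in P_A.items())                      # (4.10)
most_probable_E_A = max(P_A, key=P_A.get)

print(omega_total, mean_E_A, most_probable_E_A)  # 84 3 3
```

Note that `Fraction` reduces each probability to lowest terms, so 15/84 is stored as 5/28.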
Note that the total number of microstates accessible to the composite system increases from
12 to 84 when the internal constraint is removed. From the microscopic point of view, it is
clear that the total number of microstates must either remain the same or increase when an
internal constraint is removed. Because the number of microstates becomes a very large number
for macroscopic systems, it is convenient to work with the logarithm of the number of microstates.
We are thus led to deﬁne the quantity S by the relation
$$ S = k \ln \Omega, \tag{4.11} $$
where k is an arbitrary constant. Note the similarity to the expression for the missing information
on page 95. We will later identify the quantity S that we have introduced in (4.11) with the
thermodynamic entropy we discussed in Chapter 2.

Although our simple model has only four particles, we can ask questions that are relevant
to much larger systems. For example, what is the probability that energy is transferred from
the “hotter” to the “colder” subsystem? Given that EA = 5 and EB = 1 initially, we see from
Table 4.4 that the probability of subsystem A gaining energy when the internal constraint is
removed is 7/84. The probability that its energy remains unchanged is 12/84. In the remaining
65/84 cases, subsystem A loses energy and subsystem B gains energy. We expect that if the two
subsystems had a larger number of particles, the overwhelming probability would be that
energy goes from the hotter to the colder subsystem.
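A short numerical sketch of this paragraph, under the assumptions that k = 1 and that subsystem A (initially EA = 5) plays the role of the “hotter” subsystem, computes the change in S = ln Ω when the constraint is removed and the three probabilities quoted above. The variable names are illustrative only.

```python
from fractions import Fraction
from math import comb, log


def omega(N, E):
    """Einstein-solid microstate count from (4.3)."""
    return comb(E + N - 1, E)


N_A, N_B, E_A0, E_B0 = 2, 2, 5, 1
E_TOTAL = E_A0 + E_B0

omega_initial = omega(N_A, E_A0) * omega(N_B, E_B0)            # constraint in place
weights = {E_A: omega(N_A, E_A) * omega(N_B, E_TOTAL - E_A)
           for E_A in range(E_TOTAL + 1)}
omega_final = sum(weights.values())                            # constraint removed

delta_S = log(omega_final) - log(omega_initial)                # units with k = 1

p_gain = Fraction(sum(w for E_A, w in weights.items() if E_A > E_A0), omega_final)
p_same = Fraction(weights[E_A0], omega_final)
p_lose = Fraction(sum(w for E_A, w in weights.items() if E_A < E_A0), omega_final)

print(omega_initial, omega_final, round(delta_S, 3))  # 12 84 1.946
print(p_gain, p_same, p_lose)
```

The entropy change is ln(84/12) = ln 7, which is positive, as expected when a constraint is removed.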
Problem 4.5. Consider two Einstein solids with NA = 3 and EA = 4 and NB = 4 and EB = 2
initially. The two systems are thermally isolated from one another. Use the relation (4.3) to
determine the initial number of accessible microstates for the composite system. Then remove the
internal constraint so that the two subsystems may exchange energy. Determine the probability
PA(EA) that system A has energy EA, the most probable energies ẼA and ẼB, the probability
that energy goes from the hotter to the colder system, and the mean and variance of the energy of
each subsystem. Plot PA versus EA and discuss its qualitative energy dependence. Make a table
similar to the one in Table 4.3, but do not list the microstates explicitly.
Problem 4.6. The applet/application at <stp.clarku.edu/simulations/einsteinsolid> determines the number of accessible microstates of an Einstein solid using (4.3) and will help you
answer the following questions. Suppose that initially system A has NA = 4 particles with energy
EA = 10 and system B has NB = 4 particles with energy EB = 2. Initially, the two systems
are thermally isolated from one another. The initial number of states accessible to subsystem
A is given by ΩA = 13!/(10! 3!) = 286, and the initial number of states accessible to subsystem
B is ΩB = 5!/(2! 3!) = 10. Then the internal constraint is removed so that the two subsystems
may exchange energy. (a) Determine the probability PA (EA ) that system A has energy EA , the
most probable energies ẼA and ẼB, the mean and variance of the energy of each subsystem, and
the probability that energy goes from the hotter to the colder system. (b) Plot PA versus EA
and discuss its qualitative energy dependence. (c) What is the number of accessible microstates
for the (composite) system after the internal constraint has been removed? What is the total
entropy (choose units such that k = 1)? What is the change in the total entropy of the system? (d) The entropy of the composite system when each subsystem is in its most probable
macrostate is given by k ln ΩA(ẼA)ΩB(E − ẼA). Compare this contribution to the total entropy,
k ln Σ_{EA} ΩA(EA)ΩB(E − EA). (e) Increase NA, NB, and the total energy by a factor of ten, and
discuss the qualitative changes in the various quantities of interest. Consider successively larger
systems until you have satisﬁed yourself that you understand the qualitative behavior of the various
quantities. Use Stirling’s approximation (3.89) to calculate the entropies in part (e).
Problem 4.7. Suppose that system A is an Einstein solid with NA = 8 particles and system B
consists of NB = 8 noninteracting spins that can be either up or down. The external magnetic ﬁeld
is such that µB = 1/2. (The magnitude of µB has been chosen so that the changes in the energy
of system B are the same as system A, that is, ∆E = ±1.) The two systems are initially isolated
and the initial energies are EA = 4 and EB = 4. What is the initial entropy of the composite
system? Use the fact that ΩB = NB !/(n! (NB − n)!), where n is the number of up spins in system
B (see Section 3.5). Remove the internal constraint and allow the two systems to exchange energy.
Determine the probability PA (EA ) that system A has energy EA , the mean and variance of the
energy of each subsystem, the most probable energies ẼA and ẼB, and the probability that energy
goes from the hotter to the colder system. What is the change in the total entropy of the system?

From our examples, we conclude that we can identify thermal equilibrium with the most
probable macrostate and the entropy with the logarithm of the number of accessible microstates.
We also found that the probability P(E) that a system has energy E is approximately a Gaussian
if the system is in thermal equilibrium with a much bigger system. What quantity can we identify
with the temperature? The results of Problem 4.7 and the example in (4.12) should convince you,
if you were not convinced already, that in general, this quantity is not the same as the mean energy
per particle of the two systems.
Let’s return to the Einstein solid and explore the energy dependence of the entropy. Consider
a system with NA = 3, NB = 4, and total energy E = 10. The number of microstates for the two
systems for the various possible values of EA are summarized in Table 4.5. We see that the
most probable energies and hence thermal equilibrium correspond to ẼA = 4 and ẼB = 6. Note
that ẼA ≠ ẼB. The mean energy of system A is given by
$$ \overline{E}_A = \frac{1}{8008}\bigl[(10 \times 66) + (9 \times 220) + (8 \times 450) + (7 \times 720) + (6 \times 980) + (5 \times 1176) + (4 \times 1260) + (3 \times 1200) + (2 \times 990) + (1 \times 660) + (0 \times 286)\bigr] = 34320/8008 = 4.286. \tag{4.12} $$
In this case we see that ĒA ≠ ẼA.
In this example, the quantity that is the same for both systems in thermal equilibrium is not
the most probable energy nor the mean energy. (In this case, the mean energy per particle of the two
systems is the same, but this equality does not hold in general.) In general, what quantity is the
same for system A and B at equilibrium? From our understanding of thermal equilibrium, we know
that this quantity must be the temperature. In columns 5 and 10 of Table 4.5 we show the inverse
slope of the entropy of systems A and B calculated from the central diﬀerence approximation for
the slope at E :
$$ \frac{1}{T(E)} \approx \frac{S(E + \Delta E) - S(E - \Delta E)}{2\Delta E}. \tag{4.13} $$
(We have chosen units such that Boltzmann’s constant k = 1.) We see that the inverse slopes are
approximately equal at EA = ẼA = 4, corresponding to the value of the most probable energy.
(For this small system, the entropy of the composite system is not simply equal to the sum of the
entropies of the most probable macrostate, and we do not expect the slopes to be precisely equal.)
To obtain more insight into how temperature is related to the slope of the entropy, we look at
an energy away from equilibrium, say EA = 2 in Table 4.5. Note that the slope of SA (EA = 2),
0.60, is steeper than the slope, 0.30, of SB (EB = 8), which means that if energy is passed from B
to A, the entropy gained by A will be greater than the entropy lost by B, and the total entropy
would increase. Because we know that the entropy is a maximum in equilibrium and energy
is transferred spontaneously from “hot” to “cold,” a steeper slope must correspond to a lower
temperature. This reasoning suggests that the temperature is associated with the inverse slope of
the energy dependence of the entropy. As we discussed in Chapter 2 the association of the inverse
temperature with the energy derivative of the entropy is more fundamental than the association
of the temperature with the mean kinetic energy per particle.
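The central difference estimate (4.13) can be evaluated directly from (4.3). The sketch below assumes k = 1 and ∆E = 1, the choices used for Table 4.5; the function names `entropy` and `inverse_temperature` are hypothetical and serve only this example.

```python
from math import comb, log


def entropy(N, E):
    """S = ln Omega for an Einstein solid, using (4.3) with k = 1."""
    return log(comb(E + N - 1, E))


def inverse_temperature(N, E, dE=1):
    """Central difference estimate (4.13) of 1/T = dS/dE."""
    return (entropy(N, E + dE) - entropy(N, E - dE)) / (2 * dE)


N_A, N_B, E_TOTAL = 3, 4, 10

# Interior points only, so that E - dE is never negative for either subsystem.
for E_A in range(1, E_TOTAL):
    E_B = E_TOTAL - E_A
    print(E_A, round(inverse_temperature(N_A, E_A), 2),
          E_B, round(inverse_temperature(N_B, E_B), 2))
```

At EA = 2 and EB = 8 the two slopes are about 0.60 and 0.30, while near EA = 4 and EB = 6 they differ by only about 0.01, consistent with the discussion above.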
EA   ΩA(EA)   ln ΩA(EA)   1/TA   TA     EB   ΩB(EB)   ln ΩB(EB)   1/TB   TB     ΩAΩB
10   66       4.19        na     na     0    1        0           na     na     66
9    55       4.01        0.19   5.22   1    4        1.39        1.15   0.87   220
8    45       3.81        0.21   4.72   2    10       2.30        0.80   1.24   450
7    36       3.58        0.24   4.20   3    20       3.00        0.63   1.60   720
6    28       3.33        0.27   3.71   4    35       3.56        0.51   1.94   980
5    21       3.05        0.31   3.20   5    56       4.03        0.44   2.28   1176
4    15       2.71        0.37   2.70   6    84       4.43        0.38   2.60   1260
3    10       2.30        0.46   2.18   7    120      4.79        0.34   2.96   1200
2    6        1.79        0.60   1.66   8    165      5.11        0.30   3.30   990
1    3        1.10        0.90   1.11   9    220      5.39        0.28   3.64   660
0    1        0           na     na     10   286      5.66        na     na     286

Table 4.5: The number of states for subsystems A and B for total energy E = EA + EB = 10
with NA = 3 and NB = 4. The number of states was determined using (4.3). There are a total
of 8008 microstates. Note that the most probable energy of subsystem A is ẼA = 4 and the fraction
of microstates associated with the most probable macrostate is 1260/8008 ≈ 0.157. This relative
fraction will approach unity as the number of particles in the systems becomes larger.

Problem 4.8. The applet/application at <stp.clarku.edu/simulations/entropy/> computes
the entropies of two Einstein solids in thermal contact. Explore the effect of increasing the values
of NA, NB, and the total energy E. Discuss the qualitative dependence of SA, SB, and Stotal
on the energy EA . In particular, explain why SA is an increasing function of EA and SB is a
decreasing function of EA . Given this dependence of SA and SB on EA , why does Stotal have a
maximum at a particular value of EA ?
The interested reader may wish to skip to Section 4.5 where we will formally develop the
relations between the number of accessible microstates of an isolated system and various quantities
including the entropy and the temperature.
Boltzmann probability distribution. We next consider the same model system in another
physical context. Consider an isolated Einstein solid of six particles with total energy E = 12.
We focus our attention on one of the particles and consider it to be a subsystem able to exchange
energy with the other ﬁve particles. This example is similar to the ones we have considered, but in
this case the subsystem consists of only one particle. The quantity of interest is the mean energy
of the subsystem and the probability pn that the subsystem is in state n with energy εn = n. The
number of ways that the subsystem can be in state n is unity because the subsystem consists of
only one particle. So for this special subsystem, there is a one-to-one correspondence between the
quantum state and the energy of a microstate.
The number of accessible microstates of the composite system is shown in Table 4.6 using the
relation (4.3).

microstate n   εn   E − εn   ΩB                       pn
12             12   0        4!/(0! 4!)   = 1         0.00016
11             11   1        5!/(1! 4!)   = 5         0.00081
10             10   2        6!/(2! 4!)   = 15        0.00242
9              9    3        7!/(3! 4!)   = 35        0.00566
8              8    4        8!/(4! 4!)   = 70        0.01131
7              7    5        9!/(5! 4!)   = 126       0.02036
6              6    6        10!/(6! 4!)  = 210       0.03394
5              5    7        11!/(7! 4!)  = 330       0.05333
4              4    8        12!/(8! 4!)  = 495       0.07999
3              3    9        13!/(9! 4!)  = 715       0.11555
2              2    10       14!/(10! 4!) = 1001      0.16176
1              1    11       15!/(11! 4!) = 1365      0.22059
0              0    12       16!/(12! 4!) = 1820      0.29412

Table 4.6: The number of microstates accessible to a subsystem of one particle that can exchange
energy with a system of five particles. The subsystem is in microstate n with energy εn = n. The
third column is the energy of the system of N = 5 particles. The total energy of the composite
system is E = 12. The total number of microstates is 6188.

From Table 4.6 we can determine the mean energy of the subsystem of one particle:
$$ \overline{\epsilon} = \sum_{n=0}^{12} \epsilon_n p_n = \frac{1}{6188}\bigl[(0 \times 1820) + (1 \times 1365) + (2 \times 1001) + (3 \times 715) + (4 \times 495) + (5 \times 330) + (6 \times 210) + (7 \times 126) + (8 \times 70) + (9 \times 35) + (10 \times 15) + (11 \times 5) + (12 \times 1)\bigr] = 2. \tag{4.14} $$
The probability pn that the subsystem is in microstate n is plotted in Figure 4.4. Note that
pn decreases monotonically with increasing energy. A visual inspection of the energy dependence
of pn in Figure 4.4 indicates that pn can be approximated by an exponential of the form
$$ p_n = \frac{1}{Z}\, e^{-\beta \epsilon_n}, \tag{4.15} $$
where εn = n in this example and Z is a normalization constant. Given the form (4.15), we
can estimate the parameter β from the slope of ln pn versus εn. The result is that β ≈ 0.57. The
interested reader might wish to skip to Section 4.6 to read about the generalization of these results.
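The entries of Table 4.6 and the mean energy (4.14) can be reproduced with a few lines of code. The sketch below also estimates β from an unweighted least-squares fit of ln pn versus n; this fitting choice and the helper `fit_beta` are assumptions made here, since the text does not say how β ≈ 0.57 was obtained, so exact agreement with that value should not be expected. Because ln pn is not exactly linear for such a small system, the result depends on which points are included in the fit.

```python
from math import comb, log

N_REST, E_TOTAL = 5, 12   # five other particles, total energy 12


def omega(N, E):
    """Einstein-solid microstate count from (4.3)."""
    return comb(E + N - 1, E)


# The one-particle subsystem has a single state of energy n, so the weight of
# microstate n is the number of states of the remaining five particles.
weights = [omega(N_REST, E_TOTAL - n) for n in range(E_TOTAL + 1)]
total = sum(weights)
p = [w / total for w in weights]
mean_energy = sum(n * pn for n, pn in enumerate(p))


def fit_beta(probabilities, n_max):
    """Minus the least-squares slope of ln p_n versus n for n = 0..n_max."""
    ns = range(n_max + 1)
    ys = [log(probabilities[n]) for n in ns]
    n_mean = sum(ns) / len(ys)
    y_mean = sum(ys) / len(ys)
    numerator = sum((n - n_mean) * (y - y_mean) for n, y in zip(ns, ys))
    denominator = sum((n - n_mean) ** 2 for n in ns)
    return -numerator / denominator


beta = fit_beta(p, E_TOTAL)
print(total, round(mean_energy, 6), round(p[0], 5))  # 6188 2.0 0.29412
print(beta)
```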
Problem 4.9. Consider an Einstein solid with NA = 1 and NB = 3 with a total energy E = 6.
(A similar system of four particles was considered on page 142.) Calculate the probability pn that
system A is in microstate n. Why is this probability the same as the probability that the system
has energy εn? Is pn a decreasing or increasing function of εn?

Figure 4.4: The probability pn for the subsystem to be in state n with energy εn = n. The
subsystem can exchange energy with a system of N = 5 particles. The total energy of the composite
system of six particles is E = 12. The circles are the values of pn given in Table 4.6. The continuous
line corresponds to pn calculated from (4.15) with β = 0.57.

Problem 4.10. From Table 4.3 determine the probability pn that system A is in microstate n
with energy En for the diﬀerent possible energies of A. (The microstate n corresponds to the state
of system A.) What is the qualitative dependence of pn on En , the energy of the microstate?
Problem 4.11. Use the applet/application at <stp.clarku.edu/simulations/einsteinsolid>
to compute the probability pn that a subsystem of one particle is in microstate n, assuming that
it can exchange energy with an Einstein solid of N = 11 particles. The total energy of the two
systems is E = 36. (The total number of particles in the composite system is 12.) Compare
your result for pn to the form (4.15) and compute the parameter β from a semilog plot. Also
determine the mean energy of the subsystem of one particle and show that it is given by ε̄ ≈ 1/β.
Calculate the constant Z by normalizing the probability and show that Z is given approximately
by Z = (1 − e^{−β})^{−1}. We will generalize the results we have found here in Example 4.4.
Problem 4.12. (a) Explain why the probability pn (En ) that system A is in microstate n with
energy En is a monotonically decreasing function of En , given that the system is in thermal contact
with a much larger system. (b) Explain why the probability PA (EA ) that system A has energy EA
has a Gaussian-like form. (c) What is the difference between P(EA) and pn(En)? Why do these
two probabilities have qualitatively diﬀerent dependencies on the energy?
Problem 4.13. (a) Consider an Einstein solid of N = 10 distinguishable oscillators. How does
the total number of accessible microstates Ω(E) change for E = 10, 10^2, 10^3, . . .? Is Ω(E) a rapidly
increasing function of E for fixed N? (b) Is Ω a rapidly increasing function of N for fixed E?

4.3 Counting microstates

In the examples we have considered so far, we have seen that the most time-consuming task is
enumerating (counting) the number of accessible microstates for a system of fixed energy and
number of particles. We now discuss how to count the number of accessible microstates for several
other systems of interest.

4.3.1 Noninteracting spins

We first reconsider an isolated system of N noninteracting spins with spin 1/2 and magnetic
moment µ in an external magnetic ﬁeld B . Because we can distinguish spins at diﬀerent lattice
sites, a particular state or conﬁguration of the system is speciﬁed by giving the orientation (up
or down) of each of the N spins. We want to ﬁnd the total number of accessible microstates
Ω(E, B, N ) for particular values of E , B , and N .
We know that if n spins are parallel to B and N − n spins are antiparallel to B , the energy
of the system is
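$$ E = n(-\mu B) + (N - n)(\mu B) = -(2n - N)\mu B. \tag{4.16} $$
For a given N and B, n specifies the energy and vice versa. If we solve (4.16) for n, we find
$$ n = \frac{N}{2} - \frac{E}{2\mu B}. \tag{4.17} $$
As we found in (3.71), the total number of microstates with energy E is given by the number of
ways n spins out of N can be up. This number is given by
$$ \Omega(n, N) = \frac{N!}{n!\,(N - n)!}, \tag{4.18} $$
where n is related to E by (4.17). We will apply this result in Example 4.2 on page 163.

4.3.2 *One-dimensional Ising model

It is instructive to discuss the number of states for the one-dimensional Ising model. For small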
n! (N − n)! (4.18) where n is related to E by (4.17). We will apply this result in Example 4.2 on page 163. 4.3.2 *Onedimensional Ising model It is instructive to discuss the number of states for the onedimensional Ising model. For small
N we can determine Ω(E, N ) by counting on our ﬁngers. For example, it is easy to verify that
Ω(−2, 2) = 2, Ω(2, 2) = 2, Ω(−3, 3) = 2, and Ω(1, 3) = 6 using periodic boundary conditions.
It turns out that the general expression for Ω(E, N) for the one-dimensional Ising model for even
N is
$$ \Omega(E, N) = 2\binom{N}{i} = 2\,\frac{N!}{i!\,(N - i)!}, \qquad (i = 0, 2, 4, \ldots, N) \tag{4.19} $$
where E = 2i − N . We will discuss the Ising model in more detail in Chapter 5.
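For a concrete feel for (4.19), the sketch below counts states by brute force for N = 6 and compares the result with 2 N!/(i! (N − i)!). It assumes the energy is E = −Σ s_i s_{i+1} with periodic boundary conditions and coupling set to 1, which is the convention that reproduces E = 2i − N; the value N = 6 and the function names are illustrative choices, not taken from the text.

```python
from collections import Counter
from itertools import product
from math import comb


def ising_energy(spins):
    """E = -sum_i s_i s_{i+1} with periodic boundary conditions (coupling set to 1)."""
    N = len(spins)
    return -sum(spins[i] * spins[(i + 1) % N] for i in range(N))


def count_by_energy(N):
    """Map each energy to the number of spin configurations that have it."""
    return Counter(ising_energy(s) for s in product((+1, -1), repeat=N))


N = 6
counts = count_by_energy(N)
for i in range(0, N + 1, 2):
    E = 2 * i - N
    print(E, counts[E], 2 * comb(N, i))
```

Here i can be read as the number of nearest-neighbor pairs with opposite spins, which must be even on a ring; the factor of 2 reflects the symmetry under flipping every spin.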
Problem 4.14. Verify that (4.19) gives the correct answers for N = 2 and 4.

Figure 4.5: The phase space for a single particle of mass m and energy E in a one-dimensional
box of length L. The maximum value of the momentum is pmax = √(2mE). Any point within the
shaded rectangle corresponds to a microstate with energy less than or equal to E.

4.3.3 A particle in a one-dimensional box

Classical calculation. Consider the microstates of a single classical particle of mass m confined
to a one-dimensional box of length L. We know that the microstate of a particle is specified by
its position x and momentum p.2 We say that the microstate (x, p) is a point in phase space (see
Figure 4.5).
As in Section 4.3.1, we want to calculate the number of microstates of the system with energy
E . Because the values of the position and momenta of a particle are continuous variables, this
question is not meaningful and instead we will determine the quantity g (E )∆E , the number of
microstates between E and E + ∆E ; the quantity g (E ) is the density of states. However, it is
easier to ﬁrst calculate Γ(E ), the number of microstates of the system with energy less than or
equal to E . Then the number of microstates between E and E + ∆E , g (E )∆E , is related to Γ(E )
by
$$ g(E)\,\Delta E = \Gamma(E + \Delta E) - \Gamma(E) \approx \frac{d\Gamma(E)}{dE}\,\Delta E. \tag{4.20} $$
If the energy of the particle is E and the dimension of the box is L, then the microstates of
the particle with energy less than or equal to E are restricted to the rectangle shown in Figure 4.5,
where pmax = √(2mE). However, because the possible values of x and p are continuous, there are
an inﬁnite number of microstates within the rectangle! As we discussed in Section 3.6, we have
to group or bin the microstates so that we can count them, and hence we divide the rectangle in
Figure 4.5 into bins or cells of area ∆x∆p.
The area of phase space occupied by the trajectory of a particle whose position x is less than
or equal to L and whose energy is less than or equal to E is equal to 2pmax L. Hence, the number
of cells or microstates equals
$$ \Gamma_{\rm cl}(E) = \frac{2 p_{\max} L}{\Delta x\,\Delta p} = 2\,\frac{L}{\Delta x\,\Delta p}\,(2mE)^{1/2}, \tag{4.21} $$
where the values of ∆x and ∆p are arbitrary. What is the corresponding density of states?

2 We could equally well specify the velocity v rather than p, but the momentum p is the appropriate conjugate
variable to x in the formal treatment of classical mechanics.
Quantum calculation. The most fundamental description of matter at the microscopic level is
given by quantum mechanics. Although the quantum mechanical description is more abstract, we
will ﬁnd that it makes counting microstates more straightforward.
As before, we consider a single particle of mass m in a one-dimensional box of length L.
According to de Broglie, a particle has wave properties associated with it, and the corresponding
standing wave has a node at the boundaries of the box. The wave function of the wave with one
antinode can be represented as in Figure 4.6; the corresponding wavelength is given by
$$ \lambda = 2L. \tag{4.22} $$
In general, the greater the number of antinodes of the wave, the greater the energy associated with
the particle. The possible wavelengths that are consistent with the boundary conditions at x = 0
and x = L are given by
$$ \lambda_n = \frac{2L}{n}, \qquad (n = 1, 2, 3, \ldots) \tag{4.23} $$
where the index n labels the quantum state of the particle and can be any nonzero, positive integer.
From the de Broglie relation,
$$ p = \frac{h}{\lambda}, \tag{4.24} $$
and the nonrelativistic relation between the energy E and the momentum p, we ﬁnd that the
eigenvalues of the particle are given by
En = h2
p2
n2 h2
n
=
=
.
2
2m
2m λn
8mL2 (4.25) It is now straightforward to count the number of microstates with energy less than or equal
to E. The value of n for a given E is (see (4.25))
$$n = \frac{2L}{h}(2mE)^{1/2}. \tag{4.26}$$
Because successive microstates correspond to values of n that differ by unity, the number of states
with energy less than or equal to E is given by
$$\Gamma_{\rm qm}(E) = n = \frac{2L}{h}(2mE)^{1/2}. \tag{4.27}$$

Figure 4.6: Representation of the ground state wave function of a particle in a one-dimensional
box. Note that the wave function equals zero at x = 0 and x = L.

Unlike the classical case, the number of states Γqm(E) for a quantum particle in a one-dimensional box has no arbitrary parameters such as ∆x and ∆p. If we require that the classical
and quantum enumeration of microstates agree in the semiclassical limit,3 we see that the numbers
of microstates, Γcl(E) and Γqm(E), agree for all E if we let 2/(∆x∆p) = 1/(πℏ). This requirement
implies that the area ∆x∆p of a cell in phase space is given by
$$\Delta x\,\Delta p = h. \tag{4.28}$$

3 Note that the semiclassical limit is not equivalent to simply letting ℏ → 0.

We see that Planck's constant h can be interpreted as the volume (area for a two-dimensional
phase space) of the fundamental cell in phase space. That is, in order for the counting of microstates
in the classical system to be consistent with the more fundamental counting of microstates in a
quantum system, we cannot specify a microstate of the classical system more precisely than to
assign it to a cell of area h in phase space. This fundamental limitation implies that the subdivision
of phase space into cells of volume less than h is physically meaningless, a result consistent with
the Heisenberg uncertainty principle.
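The role of the cell area can be checked numerically. The sketch below evaluates the classical count (4.21) and the quantum count (4.27) in units where h = m = L = 1; these units, the function names, and the choice of the n = 100 level are illustrative assumptions rather than anything fixed by the text.

```python
import math

def gamma_classical(E, m=1.0, L=1.0, cell_area=1.0):
    """Classical count (4.21): phase-space area 2*p_max*L divided by the cell area."""
    p_max = math.sqrt(2.0 * m * E)
    return 2.0 * p_max * L / cell_area

def gamma_quantum(E, m=1.0, L=1.0, h=1.0):
    """Quantum count (4.27): largest n with n^2 h^2 / (8 m L^2) <= E."""
    return math.floor(2.0 * L * math.sqrt(2.0 * m * E) / h)

h = m = L = 1.0                      # illustrative units, not physical values
E = 100**2 * h**2 / (8 * m * L**2)   # energy of the n = 100 level from (4.25)

# with cell area h the two counts coincide
print(gamma_quantum(E, m, L, h), gamma_classical(E, m, L, cell_area=h))
# any other cell area rescales the classical count
print(gamma_classical(E, m, L, cell_area=0.5 * h))
```

With the cell area set to h the two counts agree; halving the assumed cell area doubles the classical count, which is the arbitrariness that (4.28) removes.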
Problem 4.15. Suppose that the energy of an electron in a one-dimensional box of length L is
E = 144 (h²/8mL²). How many microstates are there with energy less than or equal to this value
of E?

4.3.4 One-dimensional harmonic oscillator

The one-dimensional harmonic oscillator provides another example for which we can count the
number of microstates in both the classical and quantum cases. The total energy of the harmonic
oscillator is given by
$$E = \frac{p^2}{2m} + \frac{1}{2}kx^2, \tag{4.29}$$
where k is the spring constant and m is the mass of the particle.
Classical calculation. The shape of the phase space area traversed by the trajectory x(t), p(t)
can be determined from (4.29) by dividing both sides by E and substituting ω² = k/m:
$$\frac{x(t)^2}{2E/m\omega^2} + \frac{p(t)^2}{2mE} = 1, \tag{4.30}$$
where the total energy E is a constant of the motion.
From the form of (4.30) we see that the shape of phase space of a one-dimensional harmonic
oscillator is an ellipse,
$$\frac{x^2}{a^2} + \frac{p^2}{b^2} = 1, \tag{4.31}$$
with a² = 2E/(mω²) and b² = 2mE. Because the area πab = 2πE/ω, the number of states with
energy less than or equal to E is given by
$$\Gamma_{\rm cl}(E) = \frac{\pi ab}{\Delta x\,\Delta p} = \frac{2\pi E}{\omega\,\Delta x\,\Delta p}. \tag{4.32}$$
Quantum mechanical calculation. The energy eigenvalues of the harmonic oscillator are given
by
$$E_n = \left(n + \frac{1}{2}\right)\hbar\omega. \qquad (n = 0, 1, 2, \ldots) \tag{4.33}$$
Hence the number of microstates is given by
$$\Gamma_{\rm qm}(E) = n = \frac{E}{\hbar\omega} - \frac{1}{2} \longrightarrow \frac{E}{\hbar\omega}. \tag{4.34}$$
We see that Γqm(E) = Γcl(E) for all E, if 2π/(∆x ∆p) = 1/ℏ or ∆x ∆p = h as before.

4.3.5 One particle in a two-dimensional box

Consider a single particle of mass m in a rectangular box of sides Lx and Ly. The wave function
takes the form of a standing wave in two dimensions. The energy of the particle is given by
$$E = \frac{p^2}{2m} = \frac{1}{2m}\left(p_x^2 + p_y^2\right), \tag{4.35}$$
where $\mathbf{p} = \hbar\mathbf{k}$. The wave vector k satisfies the conditions for a standing wave:
$$k_x = \frac{\pi}{L_x}n_x, \qquad k_y = \frac{\pi}{L_y}n_y. \qquad (n_x, n_y = 1, 2, 3, \ldots) \tag{4.36}$$
The corresponding eigenvalues are given by
$$E_{n_x,n_y} = \frac{h^2}{8m}\left[\frac{n_x^2}{L_x^2} + \frac{n_y^2}{L_y^2}\right]. \tag{4.37}$$
The states of the particle are labeled by the two integers nx and ny with nx, ny > 0. The
possible values of nx, ny lie at the centers of squares of unit area as shown in Figure 4.7.

Figure 4.7: The points represent possible values of nx and ny such that $n_x^2 + n_y^2 \le R^2 = 10^2$ and
nx > 0 and ny > 0. The number of states for R = 10 is 69. The corresponding number from the
asymptotic relation is Γ(E) = π 10²/4 ≈ 78.5.

For simplicity, we assume that the box is square so that Lx = Ly = L. The values of (nx, ny) for a given E
satisfy the condition
$$R^2 = n_x^2 + n_y^2 = \left(\frac{2L}{h}\right)^2(2mE). \tag{4.38}$$
For large values of nx and ny, the values of nx and ny that correspond to states with energy less
than or equal to E lie inside the positive quadrant of a circle of radius R, where
$$R = \frac{2L}{h}(2mE)^{1/2}. \tag{4.39}$$
Recall that nx and ny are both positive. Hence, the number of states with energy less than or
equal to E is given by
$$\Gamma(E) = \frac{1}{4}\pi R^2 = \pi\frac{L^2}{h^2}(2mE). \tag{4.40}$$
Note that V = L² in this case.
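The comparison quoted for Figure 4.7 (69 states versus the asymptotic value 78.5 at R = 10) can be reproduced with a direct lattice count. The following is a minimal sketch; the function name `count_states_2d` and the sample radii are illustrative choices.

```python
import math

def count_states_2d(R):
    """Number of pairs (nx, ny) of positive integers with nx^2 + ny^2 <= R^2."""
    R2 = R * R
    states = 0
    for nx in range(1, R + 1):
        for ny in range(1, R + 1):
            if nx * nx + ny * ny <= R2:
                states += 1
    return states

for R in (10, 100, 1000):
    exact = count_states_2d(R)
    asymptotic = math.pi * R * R / 4          # quadrant area, as in (4.40)
    relative_excess = (asymptotic - exact) / exact
    print(R, exact, round(asymptotic, 1), round(relative_excess, 4))
```

The quadrant area exceeds the lattice count because each counted point owns a unit square that lies entirely inside the quadrant, and the relative excess shrinks as R grows.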
Problem 4.16. The expression (4.40) for Γ(E) is valid only for large E because the area of a
quadrant of a circle overestimates the number of lattice points nx, ny inside a circle of radius
R. Explore how the relation Γ = πR²/4 approximates the actual number of microstates by
writing a program that computes the number of pairs of positive integers that satisfy the condition
$n_x^2 + n_y^2 \le R^2$. Pseudocode for such a program is listed in the following:
R = 10
R2 = R*R
states = 0
! loop over all positive nx, ny that could lie inside the quadrant
do nx = 1,R
   do ny = 1,R
      ! count the lattice point if it is inside the circle of radius R
      if ((nx*nx + ny*ny) <= R2) then
         states = states + 1
      end if
   end do
end do
What is the minimum value of R for which the difference between the asymptotic relation and the
exact count is less than 1%?

4.3.6 One particle in a three-dimensional box

The generalization to three dimensions is straightforward. If we assume that the box is a cube
with linear dimension L, we have
$$E = \frac{h^2}{8mL^2}\left[n_x^2 + n_y^2 + n_z^2\right]. \tag{4.41}$$
The values of nx, ny, and nz that correspond to microstates with energy less than or equal to E
lie inside the positive octant of a sphere of radius R given by
$$R^2 = n_x^2 + n_y^2 + n_z^2 = \left(\frac{2L}{h}\right)^2(2mE). \tag{4.42}$$
Hence
$$\Gamma(E) = \frac{1}{8}\,\frac{4}{3}\pi R^3 = \frac{\pi}{6}\left(\frac{2L}{h}\right)^3(2mE)^{3/2} = \frac{4\pi}{3}\frac{V}{h^3}(2mE)^{3/2}, \tag{4.43}$$
where we have let V = L³.
Problem 4.17. The expression (4.43) for Γ(E) is valid only for large E because the volume of an
octant of a sphere overestimates the number of lattice points nx, ny, nz. Explore how the relation
Γ = πR³/6 approximates the total number of microstates by writing a program that computes the
number of triples of positive integers that satisfy the condition $n_x^2 + n_y^2 + n_z^2 \le R^2$.
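One possible starting point for the three-dimensional count is sketched below. To avoid a third loop it uses the integer square root to count all allowed nz for each pair (nx, ny); that shortcut and the name `count_states_3d` are choices made for this illustration, not part of the problem statement.

```python
import math

def count_states_3d(R):
    """Number of triples of positive integers with nx^2 + ny^2 + nz^2 <= R^2."""
    R2 = R * R
    states = 0
    for nx in range(1, R + 1):
        for ny in range(1, R + 1):
            rest = R2 - nx * nx - ny * ny
            if rest >= 1:
                # every nz from 1 up to floor(sqrt(rest)) satisfies the condition
                states += math.isqrt(rest)
    return states

for R in (5, 20, 80):
    exact = count_states_3d(R)
    asymptotic = math.pi * R**3 / 6           # octant volume, as in (4.43)
    print(R, exact, round(asymptotic, 1), round((asymptotic - exact) / exact, 4))
```

Comparing the printed relative differences with the two-dimensional case gives a feel for how much more slowly the surface correction dies out in three dimensions.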
Problem 4.18. Estimate the number of microstates accessible to a gas molecule at typical room
temperatures and pressures. We can proceed by estimating the mean energy E of a gas molecule
such as nitrogen at room temperature by using the relation E = 3NkT/2 with N = 1. Calculate the number of
microstates Γ(E) with energy less than E accessible to such a molecule enclosed in a box having a
volume of one liter (10³ cm³). Consider a small energy interval ∆E = 10⁻²⁷ J that is much smaller
than E itself, and calculate the number of microstates g(E)∆E accessible to the molecule in the
range between E and E + ∆E.

4.3.7 Two noninteracting identical particles and the semiclassical limit

Consider two noninteracting particles of mass m of the same species in a one-dimensional box of
length L. The total energy is given by
$$E_{n_1,n_2} = \frac{h^2}{8mL^2}\left[n_1^2 + n_2^2\right], \tag{4.44}$$
where the quantum numbers n1 and n2 are positive integers. However, to count the
microstates correctly, we need to take into account that particles of the same species are indistinguishable, one of the fundamental principles of quantum mechanics.
As an example of how we would count the microstates of this two particle system, suppose that
the total energy is such that $n_1^2 + n_2^2 \le 25$. The values of n1 and n2 that satisfy this constraint are
given in Table 4.7. However, the indistinguishability of the particles means that we cannot simply
assign the quantum numbers n1 and n2 subject only to the constraint that $n_1^2 + n_2^2 \le 25$. For
example, because the state (n1 = 1, n2 = 2) is indistinguishable from the state (n1 = 2, n2 = 1),
we can count only one of these states.

distinguishable   Bose statistics   Fermi statistics   semiclassical
n1  n2            n1  n2            n1  n2             n1  n2
1   1             1   1
2   1             2   1             2   1              2   1
1   2                                                  1   2
2   2             2   2
3   1             3   1             3   1              3   1
1   3                                                  1   3
3   2             3   2             3   2              3   2
2   3                                                  2   3
3   3             3   3
4   1             4   1             4   1              4   1
1   4                                                  1   4
4   2             4   2             4   2              4   2
2   4                                                  2   4
4   3             4   3             4   3              4   3
3   4                                                  3   4

Table 4.7: The quantum numbers of two noninteracting identical particles of mass m in a one-dimensional box of length L with energies such that $n_1^2 + n_2^2 \le 25$.

The assignment of quantum numbers is further complicated by the fact that the particles must
obey quantum statistics. We will discuss the nature of quantum statistics in Section 6.5. In brief,
the particles must obey either Bose or Fermi statistics. If the particles obey Bose statistics, then
any number of particles can be in the same single particle quantum state. However, if the particles
obey Fermi statistics, then two particles cannot be in the same single particle quantum state, and
hence the states (n1 , n2 ) = (1, 1), (2, 2), (3,3) are excluded.
Because the particles are indistinguishable, there are fewer microstates than if the particles
were distinguishable, and we might think that counting the microstates is easier. However, the
counting problem (enumerating the accessible microstates) is much more diﬃcult because we cannot
enumerate the states for each particle individually. For example, if n1 = 1, then n2 ≠ 1 if the particles obey Fermi statistics. However,
the counting of states can be simpliﬁed in the semiclassical limit. Because the indistinguishability
of particles of the same species is intrinsic, the particles remain indistinguishable even as we let
h → 0. Because the classical limit corresponds to very large quantum numbers (see Problem 6.27)
and the total number of states is huge, we can ignore the possibility that two particles will be in
the same single particle quantum state and assume that the particles occupy single particle states
that are all diﬀerent. That is, in the semiclassical limit, there are many more microstates than
particles and including a few extra microstates won’t make any diﬀerence.
For the simple example summarized in Table 4.7, the assumption that every particle is in a
different microstate implies that we can ignore the microstates (1, 1), (2, 2), and (3, 3). Hence, in
the semiclassical limit, we are left with six states (2, 1), (3, 1), (3, 2), (4, 1), (4, 2), and (4, 3) that
satisfy the criterion $n_1^2 + n_2^2 \le 25$.
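The counts in Table 4.7 can be reproduced by brute-force enumeration. The sketch below lists the allowed pairs under each counting rule; the variable names and the convention of writing unordered pairs with the larger quantum number first are choices made for this illustration.

```python
from itertools import product
from math import factorial

LIMIT = 25   # constraint n1^2 + n2^2 <= 25 used in Table 4.7
n_max = 5    # no quantum number can exceed sqrt(25)

# ordered pairs: the particles are treated as distinguishable
distinguishable = [(n1, n2) for n1, n2 in product(range(1, n_max + 1), repeat=2)
                   if n1 * n1 + n2 * n2 <= LIMIT]
# Bose: unordered pairs, double occupancy allowed
bose = [(n1, n2) for n1, n2 in distinguishable if n1 >= n2]
# Fermi: unordered pairs, double occupancy forbidden
fermi = [(n1, n2) for n1, n2 in distinguishable if n1 > n2]
# semiclassical: ordered pairs in different single particle states, divided by N!
different = [(n1, n2) for n1, n2 in distinguishable if n1 != n2]
semiclassical = len(different) // factorial(2)

print(len(distinguishable), len(bose), len(fermi), len(different), semiclassical)
print(fermi)
```

For this small example the semiclassical count coincides with the Fermi count; the Bose count differs only by the three doubly occupied states, which become a negligible fraction when the quantum numbers are large.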
This example illustrates how we can simplify the counting of the microstates in the semiclassical limit. We first count the total number of microstates of the N identical particles assuming
that the particles are distinguishable. For N = 2 and the constraint that $n_1^2 + n_2^2 \le 25$, we would
find 12 microstates, assuming that the two particles are in different single particle states (see the
last column of Table 4.7). We then correct for the overcounting of the microstates due to the
indistinguishability of the particles by dividing by N!, the number of permutations of the different
single particle states. For our example we would correct for the overcounting by dividing by the
2! ways of permuting two particles, and we obtain a total of 12/2! = 6 states.

4.4 The number of states of N noninteracting particles: Semiclassical limit

We now apply these considerations to count the number of microstates of N noninteracting particles
in a three-dimensional box in the semiclassical limit. A simpler way to do so that yields the correct
E and V dependence is given in Problem 4.19, but the numerical factors will not be identical to
the result of the more accurate calculation that we discuss here.
The idea is to ﬁrst count the microstates assuming that the N particles are distinguishable
and then divide by N! to correct for the overcounting. We know that for one particle in a three-dimensional box, the number of microstates with energy less than or equal to E is given by
the volume of the positive part of the three-dimensional sphere of radius R (see (4.39)). For N
distinguishable particles in a three-dimensional box, the number of microstates with energy less
than or equal to E is given by the volume of the positive part of the 3N-dimensional hypersphere
of radius R = (2mE)^{1/2}(2L/h). To simplify the notation, we consider the calculation of Vn(R),
the volume of an n-dimensional hypersphere of radius R, and write Vn(R) as
$$V_n(R) = \int_{r_1^2 + r_2^2 + \cdots + r_n^2 < R^2} dr_1\,dr_2\cdots dr_n. \tag{4.45}$$
It is shown in Appendix 4A that Vn(R) is given by (for integer n)
$$V_n(R) = \frac{2\pi^{n/2}}{n\,\Gamma(n/2)}R^n, \tag{4.46}$$
where the Gamma function satisfies Γ(n + 1) = nΓ(n), Γ(n) = (n − 1)! if n is an integer, and Γ(1/2) = √π.
The cases n = 2 and n = 3 yield the expected results, V2 = 2πR²/(2Γ(1)) = πR² because Γ(1) = 1,
and V3 = 2π^{3/2}R³/(3Γ(3/2)) = (4/3)πR³ because Γ(3/2) = (1/2)Γ(1/2) = π^{1/2}/2. The volume of the
positive part of an n-dimensional sphere of radius R is given by
$$\Gamma(R) = \left(\frac{1}{2}\right)^n V_n(R). \tag{4.47}$$
(The volume Γ(R) should not be confused with the Gamma function Γ(n).)
We are interested in the case n = 3N and R = (2mE)^{1/2}(2L/h). In this case the volume
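Γ(E, V, N) is given by
$$\Gamma(E, V, N) = \left(\frac{1}{2}\right)^{3N}\frac{2\pi^{3N/2}}{3N\,(3N/2 - 1)!}R^{3N} \tag{4.48a}$$
$$= \left(\frac{1}{2}\right)^{3N}\frac{\pi^{3N/2}}{(3N/2)!}R^{3N} \tag{4.48b}$$
$$= \left(\frac{1}{2}\right)^{3N}\left(\frac{2L}{h}\right)^{3N}\frac{\pi^{3N/2}}{(3N/2)!}(2mE)^{3N/2} \tag{4.48c}$$
$$= \left(\frac{V}{h^3}\right)^N\frac{(2\pi mE)^{3N/2}}{(3N/2)!}. \tag{4.48d}$$
If we include the factor of 1/N! to correct for the overcounting of microstates in the semiclassical
limit, we obtain the desired result:
$$\Gamma(E, V, N) = \frac{1}{N!}\left(\frac{V}{h^3}\right)^N\frac{(2\pi mE)^{3N/2}}{(3N/2)!}. \qquad \text{(semiclassical limit)} \tag{4.49}$$
A more convenient expression for Γ can be found by using Stirling's approximation for N ≫ 1.
We have
$$\ln\Gamma = -\ln N! + N\ln\frac{V}{h^3} + \frac{3}{2}N\ln(2\pi mE) - \ln\left(\frac{3N}{2}\right)! \tag{4.50a}$$
$$= -N\ln N + N + N\ln V - \frac{3N}{2}\ln h^2 + \frac{3}{2}N\ln(2\pi mE) - \frac{3}{2}N\ln\frac{3N}{2} + \frac{3N}{2} \tag{4.50b}$$
$$= N\ln\frac{V}{N} + \frac{3}{2}N\ln\frac{4\pi mE}{3Nh^2} + \frac{5}{2}N \tag{4.50c}$$
$$= N\ln\frac{V}{N} + \frac{3}{2}N\ln\frac{mE}{3N\pi\hbar^2} + \frac{5}{2}N, \tag{4.50d}$$
where we have let h = 2πℏ to obtain (4.50d) from (4.50c).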
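A quick numerical check of (4.46) and of the Stirling step from (4.49) to (4.50c) is sketched below. It works in units where m = h = 1 and uses `math.lgamma` to evaluate logarithms of factorials; the function names, the units, and the sample values E = N and V = 10N are assumptions made only for this illustration.

```python
import math

def hypersphere_volume(n, R):
    """V_n(R) from (4.46): 2 pi^(n/2) R^n / (n Gamma(n/2))."""
    return 2.0 * math.pi ** (n / 2) * R ** n / (n * math.gamma(n / 2))

def ln_gamma_exact(E, V, N, m=1.0, h=1.0):
    """Logarithm of (4.49); lgamma(x + 1) = ln x! avoids overflowing factorials."""
    return (-math.lgamma(N + 1) + N * math.log(V / h**3)
            + 1.5 * N * math.log(2 * math.pi * m * E)
            - math.lgamma(1.5 * N + 1))

def ln_gamma_stirling(E, V, N, m=1.0, h=1.0):
    """Stirling form (4.50c)."""
    return (N * math.log(V / N)
            + 1.5 * N * math.log(4 * math.pi * m * E / (3 * N * h**2))
            + 2.5 * N)

print(hypersphere_volume(2, 1.0), math.pi)            # area of the unit circle
print(hypersphere_volume(3, 1.0), 4 * math.pi / 3)    # volume of the unit sphere
for N in (10, 100, 1000, 10000):
    exact = ln_gamma_exact(E=1.0 * N, V=10.0 * N, N=N)
    approx = ln_gamma_stirling(E=1.0 * N, V=10.0 * N, N=N)
    print(N, exact, approx, (approx - exact) / exact)
```

The difference between the two logarithms grows only like ln N, so the relative error falls as N increases, which is why the simple form of Stirling's approximation suffices here.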
Problem 4.19. We can obtain an equivalent expression for Γ(E, V, N) using simpler physical
considerations. We write
$$\Gamma(E, V, N) \approx \frac{1}{N!}\,\Gamma_1\!\left(\frac{E}{N}, V\right)\Gamma_1\!\left(\frac{E}{N}, V\right)\cdots\Gamma_1\!\left(\frac{E}{N}, V\right), \tag{4.51}$$
where Γ1(E, V) is the number of states for a particle with energy less than E in a three-dimensional
box of volume V. We have assumed that on the average each particle has an energy E/N. Find
the form of Γ(E, V, N) using the relation (4.43) for Γ1. Compare the V and E dependencies of
Γ(E, V, N) obtained from this simple argument to (4.49). What about the N dependence?
Problem 4.20. Calculate g(E, V, N) and verify that Γ(E, V, N) and g(E, V, N) are rapidly increasing functions of E, V, and N.

4.5 The microcanonical ensemble (fixed E, V, and N)

So far, we have learned how to count the number of microstates of an isolated system. Such
a system of particles is speciﬁed by the energy E , volume V , and number of particles N . All
microstates that are consistent with these conditions are assumed to be equally probable. The
collection of systems in diﬀerent microstates and speciﬁed values of E , V , and N is called the
microcanonical ensemble. In general, the energy E is a continuous variable, and the energy is
speciﬁed to be in the range E to E + ∆E .4
In the following we show how the quantities that correspond to the usual thermodynamic
quantities, for example, the entropy, temperature, and pressure, are related to the number of
microstates. We will then use these relations to derive the ideal gas equation of state and other
well known results using (4.50d) for the number of microstates of an ideal gas of N particles in a
volume V with energy E .
We first establish the connection between the number of accessible microstates and various thermodynamic quantities by using arguments that are similar to our treatment of the simple models
that we considered in Section 4.2. Consider two isolated systems A and B that are separated by an
insulating, rigid, and impermeable wall. The macrostate of each system is speciﬁed by EA , VA , NA
and EB , VB , NB , respectively, and the corresponding number of microstates is ΩA (EA , VA , NA )
and ΩB (EB , VB , NB ). Equilibrium in this context means that each accessible microstate is equally
represented in our ensemble. The number of microstates of the composite system consisting of the
two isolated subsystems A and B is
$$\Omega = \Omega_A(E_A, V_A, N_A)\,\Omega_B(E_B, V_B, N_B). \tag{4.52}$$
We want a definition of the entropy that is a measure of the number of microstates and that
is additive. It was assumed by Boltzmann that S is related to Ω by the famous formula, first
proposed by Planck:
$$S = k\ln\Omega. \tag{4.53}$$
Note that if we substitute (4.52) in (4.53), we ﬁnd that S = SA + SB , and S is an additive function
as it must be.
Next we modify the wall between A and B so that it becomes conducting, rigid, and impermeable. We say that we have relaxed the internal constraint of the composite system. The
two subsystems are now in thermal contact so that the energies EA and EB can vary, subject to
the condition that the total energy E = EA + EB is ﬁxed; the volumes VA and VB and particle
numbers NA and NB remain unchanged. What happens to the number of accessible microstates
after we relax the internal constraint? In general, we expect that there are many more microstates
available after the constraint is removed. If subsystem A has energy EA , it can be in any one of
its ΩA(EA) microstates. Similarly, subsystem B can be in any one of its ΩB(E − EA) microstates.
Because every possible state of A can be combined with every possible state of B to give a diﬀerent
state of the composite system, it follows that the number of distinct microstates accessible to the
composite system when A has energy EA is the product ΩA(EA)ΩB(E − EA). Hence, the total
number of accessible microstates after the subsystems are in thermal equilibrium is
$$\Omega(E) = \sum_{E_A}\Omega_A(E_A)\,\Omega_B(E - E_A). \tag{4.54}$$
The probability that system A has energy EA is given by
$$P(E_A) = \frac{\Omega_A(E_A)\,\Omega_B(E - E_A)}{\Omega(E)}. \tag{4.55}$$

4 For a quantum system, the energy E must always be specified in some range. The reason is that if the energy
were specified exactly, the system would have to be in an eigenstate of the system. If it were, the system would
remain in this eigenstate indefinitely, and a statistical treatment would be meaningless.

Note that the logarithm of (4.54) does not yield a sum of two functions. However, the dominant
contribution to the right-hand side of (4.54) comes from the term with $E_A = \tilde{E}_A$, where $\tilde{E}_A$ is the
most probable value of EA. With this approximation we can write
$$\Omega \approx \Omega_A(\tilde{E}_A)\,\Omega_B(E - \tilde{E}_A). \tag{4.56}$$
The approximation (4.56) becomes more and more accurate as the thermodynamic limit (N, V →
∞, ρ = N/V = constant) is approached and allows us to write
$$S = k\ln\Omega = S_A + S_B \tag{4.57}$$
before and after the constraint is removed.
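The claim that the largest term dominates the logarithm of the sum (4.54) can be explored with a concrete model. The sketch below assumes two collections of distinguishable harmonic oscillators that share Q quanta, with the count Ω = (Q + N − 1)!/[Q!(N − 1)!] for each subsystem; the choice of model, the equal subsystem sizes, and all names are assumptions made for this illustration.

```python
from math import comb, log

def omega(Q, N):
    """Microstates of N distinguishable oscillators sharing Q quanta."""
    return comb(Q + N - 1, Q)

def contact_summary(NA, NB, Q):
    """Return (sum over Q_A as in (4.54), most probable Q_A, largest single term)."""
    terms = [omega(QA, NA) * omega(Q - QA, NB) for QA in range(Q + 1)]
    total = sum(terms)
    QA_tilde = max(range(Q + 1), key=lambda QA: terms[QA])
    return total, QA_tilde, terms[QA_tilde]

for N in (10, 100, 1000):
    total, QA_tilde, biggest = contact_summary(N, N, 2 * N)
    # ratio of ln(largest term) to ln(full sum); it approaches 1 as N grows
    print(N, QA_tilde, log(biggest) / log(total))
```

Here the energy of each subsystem is measured by its number of quanta, so the most probable Q_A plays the role of the most probable energy in (4.56).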
The relation S = k ln Ω is not very mysterious. It is simply a matter of counting the number
of accessible microstates and assuming that they are all equally probable. We see immediately
that one consequence of this deﬁnition is that the entropy increases or remains unchanged after an
internal constraint is relaxed. Given the deﬁnition (4.53) of S as a function of E , V , and N , it is
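natural to adopt the thermodynamic definitions of temperature, pressure, and chemical potential:
$$\frac{1}{T} = \frac{\partial S}{\partial E}. \tag{4.58}$$
$$\frac{P}{T} = \frac{\partial S}{\partial V}. \tag{4.59}$$
$$\frac{\mu}{T} = -\frac{\partial S}{\partial N}. \tag{4.60}$$
We have made the connection between statistical mechanics and thermodynamics.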
How should we deﬁne the entropy for a system in which the energy is a continuous variable?
Three possibilities are
$$S = k\ln\Gamma \tag{4.61a}$$
$$S = k\ln g(E)\Delta E \tag{4.61b}$$
$$S = k\ln g(E). \tag{4.61c}$$
It is easy to show that in the limit N → ∞, the three definitions yield the same result (see
Problem 4.21). The reason is that Γ(E) and g(E) are such rapidly increasing functions of E that
it makes no difference whether we include the microstates with energy less than or equal to E or
just the states between E and E + ∆E.
Example 4.1. Find the pressure and thermal equations of state of an ideal classical gas.
Solution. If we use any of the definitions of S given in (4.61), we find that the entropy of an ideal
gas in the semiclassical limit for N → ∞ is given by
$$S(E, V, N) = Nk\left[\ln\frac{V}{N} + \frac{3}{2}\ln\frac{mE}{3N\pi\hbar^2} + \frac{5}{2}\right]. \tag{4.62}$$
Problem 4.21. (a) Justify the statement made in the text that any of the definitions of S given
in (4.61) yield the result (4.62). (b) Verify the result (4.62) for the entropy S of an ideal gas.
Problem 4.22. Compare the form of S given in (4.62) with the form of S determined from
thermodynamic considerations in Section 2.19.
We now use the result (4.62) for S to obtain the thermal equation of state of an ideal classical
gas. From (4.62) we see that
$$\frac{1}{T} = \left(\frac{\partial S}{\partial E}\right)_{V,N} = \frac{3}{2}\frac{Nk}{E}, \tag{4.63}$$
and hence we obtain the familiar result
$$E = \frac{3}{2}NkT. \tag{4.64}$$
The pressure equation of state follows from (4.59) and (4.62) and is given by
$$\frac{P}{T} = \left(\frac{\partial S}{\partial V}\right)_{E,N} = \frac{Nk}{V},$$
and hence
$$PV = NkT. \tag{4.65}$$
We have finally derived the equations of state of an ideal classical gas from first principles! We see
that we can calculate the thermodynamic information for an isolated system by counting all the
accessible microstates as a function of the total energy E, volume V, and number of particles N.
Do the equations of state depend on ℏ and the various constants in (4.49)?
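The derivatives in (4.63) and (4.65) can also be taken numerically, which gives one way to explore the question just posed. The sketch below differentiates the entropy (4.62) by central differences in units where k = m = 1; the function names, the step size, and the sample values of E, V, N, and ℏ are assumptions made for this illustration.

```python
import math

def entropy(E, V, N, m=1.0, hbar=1.0, k=1.0):
    """Entropy (4.62) of an ideal classical gas in the semiclassical limit."""
    return N * k * (math.log(V / N)
                    + 1.5 * math.log(m * E / (3 * N * math.pi * hbar**2))
                    + 2.5)

def equations_of_state(E, V, N, rel_step=1e-6, **constants):
    """Estimate T and P from central differences of S, following (4.58) and (4.59)."""
    dE, dV = rel_step * E, rel_step * V
    dS_dE = (entropy(E + dE, V, N, **constants)
             - entropy(E - dE, V, N, **constants)) / (2 * dE)
    dS_dV = (entropy(E, V + dV, N, **constants)
             - entropy(E, V - dV, N, **constants)) / (2 * dV)
    T = 1.0 / dS_dE
    P = T * dS_dV
    return T, P

E, V, N = 150.0, 1000.0, 100
for hbar in (1.0, 0.1):
    T, P = equations_of_state(E, V, N, hbar=hbar)
    # the last two columns are E/(3NkT/2) and PV/(NkT) with k = 1
    print(hbar, T, P, E / (1.5 * N * T), P * V / (N * T))
```

Changing the assumed value of `hbar` shifts S by a constant but leaves both ratios in the last two columns unchanged to within the accuracy of the finite differences.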
Note that we originally deﬁned the ideal gas temperature scale in Section 2.4 by assuming
that T ∝ P . We then showed that the ideal gas temperature scale is consistent with the thermodynamic temperature deﬁned by the relation 1/T = (∂S/∂E )V,N . Finally, we have shown that the
association of S with the logarithm of the number of accessible microstates is consistent with the
relation P ∝ T for an ideal classical gas.
Problem 4.23. Use the relations (4.62) and (4.64) to obtain S as a function of T , V , and N
instead of E, V, and N. This relation is known as the Sackur-Tetrode equation.
Problem 4.24. Use (4.60) and (4.62) to derive the dependence of the chemical potential µ on
E, V, and N for an ideal classical gas. Then use (4.64) to determine µ(T, V, N). (We will derive
µ(T, V, N) for the ideal classical gas more simply in Section 6.8.)
Example 4.2. Consider a system of N noninteracting spins and find the dependence of its temperature T on the total energy E. What is the probability that a given spin is up?
Solution. First we have to find the dependence of the entropy S on the energy E of the system.
As discussed in Sec. 4.3.1, the energy E for a system with n spins up out of N in a magnetic field
B is given by
$$E = -(n - n')\mu B = -[n - (N - n)]\mu B = -(2n - N)\mu B, \tag{4.16}$$
where n′ = N − n is the number of down spins and µ is the magnetic moment of the spins. The
corresponding number of microstates is given by (4.18):
$$\Omega(n) = \frac{N!}{n!\,(N - n)!}. \tag{4.18}$$
From (4.16), we find that the value of n corresponding to a given E is given by
$$n = \frac{1}{2}\left(N - \frac{E}{\mu B}\right). \tag{4.66}$$
The thermodynamic temperature T is given by
$$\frac{1}{T} = \frac{\partial S}{\partial E} = \frac{dS(n)}{dn}\frac{dn}{dE} = -\frac{1}{2\mu B}\frac{dS}{dn}. \tag{4.67}$$
It is understood that the magnetic field B is held fixed.
To calculate dS/dn, we use the approximation (3.92) for large n:
$$\frac{d}{dn}\ln n! = \ln n, \tag{4.68}$$
and find
$$\frac{dS(n)}{dn} = k[-\ln n + \ln(N - n)], \tag{4.69}$$
where S(n) = k ln Ω(n) from (4.18). Hence
$$\frac{1}{T} = -k\,\frac{1}{2\mu B}\ln\frac{N - n}{n}. \tag{4.70}$$
Equation (4.70) yields T as a function of E by eliminating n using (4.66).
The natural variables in the microcanonical ensemble are E, V, and N. Hence, T is a derived
quantity and is found as a function of E. As shown in Problem 4.25, we can rewrite this relation
to express E as a function of T. The result is
$$E = -N\mu B\tanh\frac{\mu B}{kT} = -N\mu B\tanh\beta\mu B, \tag{4.71}$$
where β = 1/kT.

ensemble          macrostate   probability distribution      thermodynamics
microcanonical    E, V, N      pn = 1/Ω                      S(E, V, N) = k ln Ω
canonical         T, V, N      pn = e^{−βEn}/Z               F(T, V, N) = −kT ln Z
grand canonical   T, V, µ      pn = e^{−β(En − µNn)}/Z       Ω(T, V, µ) = −kT ln Z

Table 4.8: Summary of the three common ensembles. Note that Ω is the number of accessible microstates in the microcanonical ensemble and the thermodynamic potential in the grand canonical
ensemble.

The probability p that a given spin is up is equal to the ratio n/N. We can solve (4.70) for
n/N and obtain (see Problem 4.25)
$$p = \frac{n}{N} = \frac{1}{1 + e^{-2\mu B/kT}}, \tag{4.72a}$$
$$= \frac{e^{\mu B/kT}}{e^{\mu B/kT} + e^{-\mu B/kT}} = \frac{e^{\beta\mu B}}{e^{\beta\mu B} + e^{-\beta\mu B}}. \tag{4.72b}$$
We have obtained the result for p that we promised in Section 3.5.
Note that we have had to consider all N spins even though the spins do not interact with each
other. The reason is that the N spins have a definite energy and hence we cannot assign the
orientation of the spins independently. We will obtain the result (4.72) by a more straightforward
method in Section 4.6.
Problem 4.25. Solve (4.70) for n/N and verify (4.72). Then use (4.16) to solve for E as a function
of T for a system of N noninteracting spins.
Although the microcanonical ensemble is conceptually simple, it is not the most practical ensemble. The major problem is that because we must satisfy the constraint that E is speciﬁed, we
cannot assign energies to each particle individually, even if the particles do not interact. Another
problem is that because each microstate is as important as any other, there are no obvious approximation methods that retain only the most important microstates. Moreover, isolated systems
are very diﬃcult to realize experimentally, and the temperature rather than the energy is a more
natural independent variable.
Before we discuss the other common ensembles, we summarize their general features in Table 4.8. The internal energy E is ﬁxed in the microcanonical ensemble and hence only the mean
temperature is speciﬁed and the temperature ﬂuctuates. In the canonical ensemble the temperature T and hence the mean energy is ﬁxed, but the energy ﬂuctuates. Similarly, the chemical
potential and hence the mean number of particles is ﬁxed in the grand canonical ensemble, and
the number of particles ﬂuctuates. In all of these ensembles, the volume V is ﬁxed which implies
that the pressure ﬂuctuates. We also can choose an ensemble in which the pressure is ﬁxed and
the volume ﬂuctuates.
Problem 4.26. Consider a collection of N distinguishable harmonic oscillators with total energy
E. The oscillators are distinguishable because they are localized on different lattice sites. In one
dimension the energy of each particle is given by $\epsilon_n = (n + \frac{1}{2})\hbar\omega$, where ω is the angular frequency.
Hence, the total energy can be written as $E = (Q + \frac{1}{2}N)\hbar\omega$, where Q is the number of quanta.
Calculate the dependence of the temperature T on the total energy E in the microcanonical
ensemble using the result that the number of accessible microstates in which N distinguishable
oscillators can share Q indistinguishable quanta is given by Ω = (Q + N − 1)!/[Q!(N − 1)!] (see
(4.3)). Use this relation to find E(T). The thermodynamics of this system is calculated much
more simply in the canonical ensemble as shown in Example 4.52.

4.6 Systems in contact with a heat bath: The canonical ensemble (fixed T, V, and N)

We now assume that the system of interest can exchange energy with a much larger system known
as the heat bath. The heat bath is suﬃciently large that it is not signiﬁcantly aﬀected by the
smaller system. For example, if we place a glass of cold water into a room, the temperature of the
water will eventually reach the temperature of the air in the room. Because the volume of the glass
is small compared to the volume of the room, the cold water does not cool the air appreciably and
the air is an example of a heat bath.
The composite system, the system of interest plus the heat bath, is an isolated system. We
can characterize the macrostate of the composite system by E, V, N . The accessible microstates
of the composite system are equally probable. If the system of interest is in a microstate with
energy En , then the energy of the heat bath is Ebath = E − En . Because the system of interest
is much smaller than the heat bath, we know that En ≪ E. For small systems it is not clear
how we should assign the potential energy of interaction of particles at the interface of the system
and the heat bath. However, if the number of particles is large, the number of particles near the
interface is small in comparison to the number of particles in the bulk so that the potential energy
of interaction of particles near the surface can be ignored. Nevertheless, these interactions are
essential in order for the system to come into thermal equilibrium with the heat bath.
For a given microstate of the system, the heat bath can be in any one of a large number of
microstates such that the total energy of the composite system is E . The probability pn that the
system is in microstate n with energy En is given by (see (4.52))
$$p_n = \frac{1 \times \Omega(E - E_n)}{\sum_n \Omega(E - E_n)}, \tag{4.73}$$
where Ω(E − En) is the number of microstates of the heat bath for a given microstate n of the
system of interest. As En increases, Ω(E − En ), the number of accessible microstates available to
the heat bath, decreases. We conclude that pn is a decreasing function of En , because the larger
the value of En , the less energy is available to the heat bath.
We can simplify the form of pn by using the fact that En ≪ E. However, we cannot approximate Ω(E − En) directly because Ω is a rapidly varying function of the energy. For this reason we
take the logarithm of (4.73) and write
$$\ln p_n = C + \ln\Omega(E_{\rm bath} = E - E_n) \tag{4.74a}$$
$$\approx C + \ln\Omega(E) - E_n\left(\frac{\partial\ln\Omega(E_{\rm bath})}{\partial E_{\rm bath}}\right)_{E_{\rm bath} = E} \tag{4.74b}$$
$$= C + \ln\Omega(E) - \frac{E_n}{kT}, \tag{4.74c}$$
where C is related to the denominator of (4.73) and does not depend on En. We have used the
relation
$$\beta \equiv \frac{1}{kT} = \left(\frac{\partial\ln\Omega(E_{\rm bath})}{\partial E_{\rm bath}}\right)_{N,V}. \tag{4.75}$$
As can be seen from (4.75), β is proportional to the inverse temperature of the heat bath. From
(4.74c) we obtain
$$p_n = \frac{1}{Z}e^{-\beta E_n}. \qquad \text{(Boltzmann distribution)} \tag{4.76}$$
The function Z is found from the normalization condition $\sum_n p_n = 1$ and is given by
$$Z = \sum_n e^{-\beta E_n}. \qquad \text{(partition function)} \tag{4.77}$$
The “sum over states” Z(T, V, N) is known as the partition function. (In German Z is known as
the Zustandssumme, a more descriptive term.) Note that pn applies to a system in equilibrium with
a heat bath at temperature T . The nature of the system has changed from Section 4.5.
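As a concrete illustration of (4.76) and (4.77), the sketch below computes Boltzmann probabilities for a short list of energy levels. The levels, the values of kT, and the function name are hypothetical; the shift by the lowest energy is a standard numerical precaution and is not part of the derivation in the text.

```python
import math

def boltzmann_probabilities(energies, kT):
    """Return (p, Z) with p[n] = exp(-E_n/kT)/Z as in (4.76) and (4.77)."""
    beta = 1.0 / kT
    E_min = min(energies)
    # shifting every energy by E_min rescales Z but leaves the probabilities unchanged
    weights = [math.exp(-beta * (E - E_min)) for E in energies]
    Z_shifted = sum(weights)
    probabilities = [w / Z_shifted for w in weights]
    Z = Z_shifted * math.exp(-beta * E_min)
    return probabilities, Z

levels = [0.0, 1.0, 2.0, 3.0]   # hypothetical energy levels in arbitrary units
for kT in (0.5, 1.0, 5.0):
    p, Z = boltzmann_probabilities(levels, kT)
    print(kT, [round(x, 4) for x in p], round(Z, 4))
```

At low kT nearly all of the probability sits in the lowest level, while at high kT the levels become almost equally likely, in line with pn being a decreasing function of En.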
Problem 4.27. Discuss the relation between the qualitative results that we obtained in Table 4.6
and the Boltzmann distribution in (4.76).
Problem 4.28. The hydrocarbon 2-butene, CH3CH=CHCH3, occurs in two conformations
(geometrical structures) called the cis and trans conformations. The energy difference ∆E between
the two conformations is approximately ∆E/k = 4180 K, with the trans conformation lower than
the cis conformation. Determine the relative abundance of the two conformations at T = 300 K
and T = 1000 K.
In the canonical ensemble the temperature T is ﬁxed by the heat bath, and a macrostate is
specified by the temperature T, volume V, and the number of particles N. The mean energy $\overline{E}$ is
given by
$$\overline{E} = \sum_n p_n E_n = \frac{1}{Z}\sum_n E_n e^{-\beta E_n}, \tag{4.78}$$
where we have substituted the Boltzmann form (4.76) for the probability distribution. We use a
trick similar to that used in Section 3.5 to obtain a simpler form for $\overline{E}$. First we write
$$\overline{E} = -\frac{1}{Z}\frac{\partial}{\partial\beta}\sum_n e^{-\beta E_n}, \tag{4.79}$$
where we have used the fact that the derivative $\frac{\partial}{\partial\beta}(e^{-\beta E_n}) = -E_n e^{-\beta E_n}$. Because
$$\frac{\partial Z}{\partial\beta} = -\sum_n E_n e^{-\beta E_n}, \tag{4.80}$$
we can write
$$\overline{E} = -\frac{1}{Z}\frac{\partial Z}{\partial\beta} = -\frac{\partial}{\partial\beta}\ln Z. \tag{4.81}$$
We see that $\overline{E}$ is a function of T for fixed V and N and can be expressed as a derivative of Z.
In the same spirit, we can express CV , the heat capacity at constant volume, in terms of Z .
We have
$$C_V = \frac{\partial \overline{E}}{\partial T} = \frac{d\beta}{dT} \frac{\partial \overline{E}}{\partial\beta} \tag{4.82}$$
$$\phantom{C_V} = \frac{1}{kT^2} \left[ \frac{1}{Z} \frac{\partial^2 Z}{\partial\beta^2} - \frac{1}{Z^2} \left(\frac{\partial Z}{\partial\beta}\right)^2 \right], \tag{4.83}$$
where $\partial\overline{E}/\partial\beta$ has been calculated from (4.81). Because
$$\overline{E^2} = \frac{1}{Z} \sum_n E_n^2\, e^{-\beta E_n} = \frac{1}{Z} \frac{\partial^2 Z}{\partial\beta^2}, \tag{4.84}$$
we obtain the relation
$$C_V = \frac{1}{kT^2} \left[ \overline{E^2} - \overline{E}^{\,2} \right]. \tag{4.85}$$
Equation (4.85) relates the response of the system to a change in energy to the equilibrium energy
fluctuations. Note that we can calculate the variance of the energy, a measure of the magnitude of
the energy fluctuations, from the heat capacity. We will later find other examples of the relation
of the linear response of an equilibrium system to the equilibrium fluctuations of an associated
quantity.5
∗ Problem 4.29. The isothermal compressibility of a system is defined as $\kappa = -(1/V)(\partial V/\partial P)_T$.
In what way is κ a linear response? In analogy to the relation of CV to the fluctuations in the
energy, how do you think κ is related to the fluctuations in the volume of the system at fixed T,
P, and N?
Because the energy is restricted to a very narrow range in the microcanonical ensemble and
can range anywhere between zero and inﬁnity in the canonical ensemble, it is not obvious that
the two ensembles give the same results for the thermodynamic properties of a system. One way
to understand why the thermodynamic properties are independent of the choice of ensemble is
to use the relation (4.85) to estimate the range of energies in the canonical ensemble that have a
signiﬁcant probability. Because both E and CV are extensive quantities, they are proportional to
N. Hence, the relative fluctuations of the energy in the canonical ensemble are given by
$$\frac{\sqrt{\overline{E^2} - \overline{E}^{\,2}}}{\overline{E}} = \frac{\sqrt{kT^2 C_V}}{\overline{E}} \sim \frac{N^{1/2}}{N} \sim N^{-1/2}. \tag{4.86}$$
5 The relation (4.85) is important conceptually and is useful for simulations at a given temperature (see Section 4.11). However, it is almost always more convenient to calculate CV from its definition in (4.82).
From (4.86) we see that in the limit of large N, the relative fluctuations in the values of E that
would be observed in the canonical ensemble are vanishingly small. For this reason the mean
energy in the canonical ensemble is a well deﬁned quantity just like it is in the microcanonical
ensemble. However, the ﬂuctuations in the energy are qualitatively diﬀerent in the two ensembles
(see Appendix 4B).
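The fluctuation relation (4.85) and the N^(-1/2) scaling in (4.86) can be illustrated with a small numerical check. The sketch below assumes a system of N independent particles, each with two states of energy 0 and `gap`, and uses units with k = 1; the function names and parameter values are illustrative choices and do not come from the text.

```python
import math

def mean_and_variance(beta, gap=1.0):
    """Mean and variance of the energy of one two-state particle (levels 0 and gap)."""
    z = 1.0 + math.exp(-beta * gap)
    p_excited = math.exp(-beta * gap) / z
    mean = gap * p_excited
    mean_sq = gap**2 * p_excited
    return mean, mean_sq - mean**2

def heat_capacity(temperature, gap=1.0, dT=1e-5):
    """C_V = dE/dT estimated by a centered finite difference (k = 1)."""
    e_plus, _ = mean_and_variance(1.0 / (temperature + dT), gap)
    e_minus, _ = mean_and_variance(1.0 / (temperature - dT), gap)
    return (e_plus - e_minus) / (2.0 * dT)

def relative_fluctuation(n_particles, temperature, gap=1.0):
    """sqrt(variance)/mean of the total energy of n independent particles."""
    mean, var = mean_and_variance(1.0 / temperature, gap)
    return math.sqrt(n_particles * var) / (n_particles * mean)

T = 0.7
mean, var = mean_and_variance(1.0 / T)
# The two numbers printed on this line should agree closely, as (4.85) requires.
print(heat_capacity(T), var / T**2)
for n in (10, 1000, 100000):
    print(n, relative_fluctuation(n, T))
```

For independent particles the mean and the variance of the total energy are both proportional to N, so the relative fluctuation falls by a factor of ten each time N increases by a factor of one hundred.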
Problem 4.30. The Boltzmann probability given by (4.76) is the probability that the system is
in a particular microstate with energy En . On the basis of what you have learned so far, what do
you think is the form of the probability p(E )∆E that the system has energy E between E and
E + ∆E ?
In addition to the relation of the mean energy to ∂ ln Z/∂β , we can express the mean pressure
P in terms of ∂ ln Z/∂V . If the system is in microstate n, then a quasistatic change dV in the
volume produces the energy change
$$dE_n = \frac{dE_n}{dV}\, dV = -P_n\, dV. \tag{4.87}$$
The quantity dEn in (4.87) is the work done on the system in state n to produce the volume change
dV. The relation (4.87) defines the pressure Pn of the system in state n. Hence, the mean pressure
of the system is given by
$$\overline{P} = -\sum_n p_n \frac{dE_n}{dV}. \tag{4.88}$$
From (4.77) and (4.88) we can express the mean pressure as
$$\overline{P} = kT \left(\frac{\partial \ln Z}{\partial V}\right)_{T,N}. \tag{4.89}$$
Note that in defining the pressure, we assumed that a small change in the volume does not
change the probability distribution of the microstates. In general, a perturbation of the system will
induce transitions between the diﬀerent microstates of the system so that if initially the system is
in a microstate n, it will not stay in that state as the volume is changed. However, if the change
occurs suﬃciently slowly so that the system can adjust to the change, then the system will remain
in its same state. As discussed in Chapter 2, such a change is called quasistatic.
We can use the relation $\overline{E} = \sum_n p_n E_n$ to write the total change in the energy as
$$d\overline{E} = \sum_n dp_n\, E_n + \sum_n p_n\, dE_n. \tag{4.90}$$
The second term in (4.90) can be written as
$$\sum_n p_n\, dE_n = \sum_n p_n \frac{dE_n}{dV}\, dV. \tag{4.91}$$
The identification of the second term in (4.90) with the work done on the system allows us to write
$$d\overline{E} = \sum_n E_n\, dp_n - \overline{P}\, dV. \tag{4.92}$$
If we use the fundamental thermodynamic relation (2.110), dE = T dS − P dV (for fixed N), we
can identify the first term in (4.92) with the change in entropy of the system. Hence, we have
$$T\, dS = \sum_n E_n\, dp_n. \tag{4.93}$$
From (4.93) we see that a change in entropy of the system is related to a change in the probability
distribution.
We can use (4.93) to obtain an important conceptual expression for the entropy. We rewrite
$p_n = e^{-\beta E_n}/Z$ as $E_n = -kT(\ln Z + \ln p_n)$, and substitute this relation for En into (4.93):
$$T\, dS = \sum_n E_n\, dp_n = -kT \sum_n \ln Z\, dp_n - kT \sum_n \ln p_n\, dp_n. \tag{4.94}$$
The first term in (4.94) is zero because ln Z does not depend on n and the total change in the
probability must sum to zero, $\sum_n dp_n = 0$. From (4.94) we write
$$dS = -k \sum_n \ln p_n\, dp_n, \tag{4.95}$$
or
$$dS = -k \sum_n d(p_n \ln p_n), \tag{4.96}$$
where we have used $d(p_n \ln p_n) = (\ln p_n + 1)\, dp_n$ together with $\sum_n dp_n = 0$.
We can integrate both sides of (4.96) to obtain the desired result:
$$S = -k \sum_n p_n \ln p_n. \tag{4.97}$$
We have assumed that the constant of integration is zero. The quantity defined by (4.11) and
(4.97) is known as the statistical entropy in contrast to the thermodynamic entropy introduced in
Chapter 2. Note the similarity of (4.97) to (3.29).
The relation (4.97) for S is also applicable to the microcanonical ensemble. If there are Ω
accessible microstates, then pn = 1/Ω for each state because each state is equally likely. Hence,
$$S = -k \sum_{n=1}^{\Omega} \frac{1}{\Omega} \ln\frac{1}{\Omega} = -k\, \Omega\, \frac{1}{\Omega} \ln\frac{1}{\Omega} = k \ln \Omega. \tag{4.98}$$
Note that the constant of integration in going from (4.96) to (4.97) must be set to zero so that
S reduces to its form in the microcanonical ensemble. We see that we can interpret (4.97) as the
generalization of its microcanonical form with the appropriate weight for each state.
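As a numerical illustration of (4.97) and (4.98), the sketch below evaluates S = −k Σ p_n ln p_n for a uniform distribution and for a Boltzmann distribution. The function name `statistical_entropy`, the choice k = 1, and the four energy values are assumptions made only for this example. In the canonical case the result is compared with β E + ln Z, which is what (4.97) gives when p_n = e^(−βE_n)/Z is substituted.

```python
import math

def statistical_entropy(probs, k=1.0):
    """S = -k * sum_n p_n ln p_n; states with p_n = 0 contribute nothing."""
    return -k * sum(p * math.log(p) for p in probs if p > 0.0)

# Microcanonical case: Omega equally likely microstates give S = k ln(Omega).
omega = 8
uniform = [1.0 / omega] * omega
print(statistical_entropy(uniform), math.log(omega))

# Canonical case with hypothetical energies: S/k should equal beta*E + ln Z.
energies = [0.0, 1.0, 1.0, 2.5]
beta = 0.8
weights = [math.exp(-beta * e) for e in energies]
z = sum(weights)
probs = [w / z for w in weights]
mean_energy = sum(p * e for p, e in zip(probs, energies))
print(statistical_entropy(probs), beta * mean_energy + math.log(z))
```

A distribution concentrated on a single state, such as `[1.0, 0.0, 0.0]`, gives zero entropy, in line with the remark that complete predictability implies vanishing entropy.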
It is remarkable that the statistical entropy deﬁned by (4.11) and (4.97) is equivalent to its
thermodynamic deﬁnition which can be expressed as
$$dS = \frac{dQ}{T}. \tag{4.99}$$
The relation (4.97) is of fundamental importance and shows that the entropy is uniquely
determined by the probability distribution pn of the different possible states. Note that complete
predictability (only one accessible microstate) implies the vanishing of the entropy. Also, as the
number of accessible microstates increases, the value of S increases and hence so does the
degree of unpredictability of the system.
The idea of entropy has come a long way. It was ﬁrst introduced into thermodynamics as a state
function to account for the irreversible behavior of macroscopic systems under certain conditions.
The discovery of the connection between this quantity and the probability distribution of the
system’s microstates was one of the great achievements of Ludwig Boltzmann, and the equation
S = k log W (his notation) appears on his tombstone.6 Since then, our understanding of entropy has
been extended by Shannon and Jaynes and others to establish a link between thermodynamics and
information theory (see Section 3.4.1). In this context we can say that S is a measure of the lack
of information, because the greater the number of microstates that are available to a system in a
given macrostate, the less we know about which microstate the system is in.
Although the relation (4.11) is of fundamental importance, we will not be able to use it to
calculate the entropy in any of the applications that we consider. The calculation of the entropy
will be discussed in Section 4.7.
The third law of thermodynamics. One statement of the third law of thermodynamics is
The entropy approaches a constant value as the temperature approaches zero.
The third law was ﬁrst formulated by Nernst in 1906 based on experimental observations. We can
easily see that the law follows simply from the statistical deﬁnition of the entropy. At T = 0, the
system is in the ground state which we will label by 0. From (4.97) we see that if pn = 1 for state 0
and is zero for all other microstates, then S = 0. We conclude that S → 0 as T → 0 if the system
has a unique ground state. This behavior is the type that we would expect for simple systems.
If there are g(0) microstates with the same ground state energy, then the corresponding entropy
is S(T = 0) = k ln g(0). As an example, because an electron has spin 1/2, it has two quantum states
for each value of its momentum. Hence, an electron in zero magnetic field has degeneracy7 gn = 2,
because its energy is independent of its spin orientation, and the ground state entropy of a system
of electrons would be kN ln 2. However, there are some complex systems for which g(0) ∼ N. In
any case, we can conclude that the heat capacities must go to zero as T → 0 (see Problem 4.45).

4.7 Connection between statistical mechanics and thermodynamics

We have seen that the statistical quantity that enters into the calculation of the mean energy and
the mean pressure is not Z , but ln Z (see (4.81) and (4.89)). We also learned in Section 2.21 that
the Helmholtz free energy F = E − T S is the thermodynamic potential for the variables T , V , and
N . Because this set of variables corresponds to the variables speciﬁed by the canonical ensemble,
it is natural to look for a connection between ln Z and F , and we deﬁne the latter as
$$F = -kT \ln Z. \qquad \text{(statistical mechanics definition of the free energy)} \tag{4.100}$$
6 See www.lecb.ncifcrf.gov/~toms/icons/aust2002/photosbytds/all/index.105.html.
7 An energy level is said to be degenerate if there are two or more microstates with the same energy.
At this stage the quantity defined in (4.100) has no obvious relation to the thermodynamic potential
F = E − T S that we deﬁned earlier.
We now show that F as deﬁned by (4.100) is equivalent to the thermodynamic deﬁnition
F = E − T S . The relation (4.100) gives the fundamental relation between statistical mechanics
and thermodynamics for given values of T , V , and N , just as S = k ln Ω gives the fundamental
relation between statistical mechanics and thermodynamics for given values of E , V , and N (see
Table 4.8).
We write the total change in the quantity βF = − ln Z as
$$d(\beta F) = -\frac{1}{Z} \frac{\partial Z}{\partial\beta}\, d\beta - \frac{1}{Z} \frac{\partial Z}{\partial V}\, dV = \overline{E}\, d\beta - \beta \overline{P}\, dV, \tag{4.101}$$
where we have used (4.81) and (4.89). We add and subtract $\beta\, d\overline{E}$ on the right-hand side of (4.101)
to find
$$d(\beta F) = \overline{E}\, d\beta + \beta\, d\overline{E} - \beta\, d\overline{E} - \beta \overline{P}\, dV = d(\beta \overline{E}) - \beta (d\overline{E} + \overline{P}\, dV). \tag{4.102}$$
Hence, we can write
$$d(\beta F - \beta \overline{E}) = -\beta (d\overline{E} + \overline{P}\, dV). \tag{4.103}$$
From the thermodynamic relation dE = T dS − P dV (for fixed N), we can rewrite (4.103) as
$$d(\beta F - \beta \overline{E}) = -\beta (d\overline{E} + \overline{P}\, dV) = -\beta T\, dS = -dS/k. \tag{4.104}$$
If we integrate (4.104), we find
$$S/k = \beta (\overline{E} - F) + \text{constant}, \tag{4.105}$$
or
$$F = \overline{E} - TS + \text{constant}. \tag{4.106}$$
If we make the additional assumption that the free energy should equal the internal energy of the
system at T = 0, we can set the constant in (4.106) equal to zero, and we obtain
$$F = \overline{E} - TS. \tag{4.107}$$
Equation (4.107) is equivalent to the thermodynamic definition of the Helmholtz free energy with
E replaced by $\overline{E}$. In the following, we will write E instead of $\overline{E}$ because the distinction will be
clear from the context.
In Section 2.21 we showed that the Helmholtz free energy F is the natural thermodynamic
potential for given values of T , V , and N and that
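$$S = -\left(\frac{\partial F}{\partial T}\right)_{V,N}, \tag{4.108}$$
$$P = -\left(\frac{\partial F}{\partial V}\right)_{T,N}, \tag{4.109}$$
$$\mu = \left(\frac{\partial F}{\partial N}\right)_{T,V}. \tag{4.110}$$
These relations still hold with F = −kT ln Z.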
In the above we started with the statistical mechanical relation F = −kT ln Z (see (4.100))
and found that it was consistent with the thermodynamic relation F = E − T S (see (4.107)). It
is instructive to start with the latter and show that it implies that F = −kT ln Z . We substitute
E = −∂ ln Z/∂β and the relation S = kβ 2 (∂F/∂β ) (see (4.108)) and ﬁnd
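$$F = E - TS = -\frac{\partial \ln Z}{\partial\beta} - \beta \left(\frac{\partial F}{\partial\beta}\right)_{V,N}. \tag{4.111}$$
We rewrite (4.111) as
$$F + \beta \left(\frac{\partial F}{\partial\beta}\right)_{V,N} = \left(\frac{\partial (\beta F)}{\partial\beta}\right)_{V,N} = -\left(\frac{\partial \ln Z}{\partial\beta}\right)_{V,N}. \tag{4.112}$$
If we integrate both sides of (4.112), we find (up to a constant) that
$$F = -kT \ln Z. \tag{4.113}$$

4.8 Simple applications of the canonical ensemble

To gain experience with the canonical ensemble, we consider some relatively simple examples. In all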
these examples, the goal is to calculate the sum over microstates in the partition function. Then
we can calculate the free energy using (4.100), the entropy from (4.108), and the mean energy
from (4.81). (In these simple examples, the volume of the system will not be relevant, so we will
not calculate the pressure.) In principle, we can follow this “recipe” for any physical system.
However, we will ﬁnd that summing over microstates to evaluate the partition function is usually
a formidable task.
Example 4.3. Consider a system consisting of two distinguishable particles. Each particle has two
states with single particle energies 0 and ∆. The quantity ∆ is called the energy gap. The system
is in equilibrium with a heat bath at temperature T . What are the thermodynamic properties of
the system?
Solution. The states of this two-particle system are (0, 0), (0, ∆), (∆, 0), and (∆, ∆). The partition
function Z2 is given by
$$Z_2 = \sum_{n=1}^{4} e^{-\beta E_n} = 1 + 2e^{-\beta\Delta} + e^{-2\beta\Delta} \tag{4.114}$$
$$\phantom{Z_2} = (1 + e^{-\beta\Delta})^2. \tag{4.115}$$
As might be expected, we can express Z2 in terms of Z1, the partition function for one particle:
$$Z_1 = \sum_{n=1}^{2} e^{-\beta\epsilon_n} = 1 + e^{-\beta\Delta}. \tag{4.116}$$
By comparing the forms of (4.115) and (4.116), we find that
$$Z_2 = Z_1^2. \tag{4.117}$$
What do you expect the relation is between ZN, the partition function for N noninteracting
distinguishable particles, and Z1?
Note that if the two particles were indistinguishable, there would be three microstates if the
particles were bosons and one microstate if the particles were fermions, and the relation (4.117)
would not hold.
Because Z2 is simply related to Z1 , we can consider the statistical properties of a system
consisting of one particle with Z1 given by (4.116). From (4.76) we ﬁnd the probability that the
system is in each of its two possible states is given by:
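$$p_1 = \frac{1}{Z_1} = \frac{1}{1 + e^{-\beta\Delta}}, \tag{4.118a}$$
$$p_2 = \frac{e^{-\beta\Delta}}{Z_1} = \frac{e^{-\beta\Delta}}{1 + e^{-\beta\Delta}}. \tag{4.118b}$$
The average energy is given by
$$\overline{e} = \sum_{n=1}^{2} p_n \epsilon_n = \frac{\Delta\, e^{-\beta\Delta}}{1 + e^{-\beta\Delta}}. \tag{4.119}$$
Of course, $\overline{e}$ could also be found from the relation $\overline{e} = -\partial \ln Z_1/\partial\beta$. (We have used the symbol ε to
denote the energy of a single particle.) The energy of N noninteracting, distinguishable particles
of the same type is given by $\overline{E} = N\overline{e}$.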
It is easy to calculate the various thermodynamic quantities directly from the partition function
in (4.115). The free energy per particle, f , is given by
$$f = -kT \ln Z_1 = -kT \ln[1 + e^{-\beta\Delta}], \tag{4.120}$$
and s, the entropy per particle, is given by
$$s = -\left(\frac{\partial f}{\partial T}\right)_V = k \ln[1 + e^{-\beta\Delta}] + k\, \frac{\beta\Delta}{1 + e^{\beta\Delta}}. \tag{4.121}$$
If we had not already calculated the average energy $\overline{e}$, we could also obtain it from the relation
$\overline{e} = f + Ts$. (As before, we have used lower case symbols to denote that the results are for one
particle.) Confirm that the various ways of determining $\overline{e}$ yield the same results as found in (4.119).
The behavior of the various thermodynamic properties of this system is explored in Problem 4.49.
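The consistency check suggested in Example 4.3 can also be done numerically. The sketch below evaluates f, s, and the mean energy of a single two-state particle from (4.119) through (4.121) and compares the mean energy with f + Ts, which follows from F = E − TS. The function name and the values gap = 1, k = 1, T = 0.5 are arbitrary illustrative choices.

```python
import math

def two_state_properties(temperature, gap=1.0, k=1.0):
    """Return (f, s, e) per particle for a particle with energies 0 and gap."""
    beta = 1.0 / (k * temperature)
    z1 = 1.0 + math.exp(-beta * gap)                      # (4.116)
    f = -k * temperature * math.log(z1)                   # (4.120)
    s = k * math.log(z1) + k * beta * gap / (1.0 + math.exp(beta * gap))  # (4.121)
    e = gap * math.exp(-beta * gap) / z1                  # (4.119)
    return f, s, e

T = 0.5
f, s, e = two_state_properties(T)
# The mean energy from (4.119) should match f + T*s.
print(e, f + T * s)
```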
Example 4.4. Determine the thermodynamic properties of a one-dimensional harmonic oscillator
in equilibrium with a heat bath at temperature T.
Solution. The energy levels of a single harmonic oscillator are given by
$$\epsilon_n = \left(n + \tfrac{1}{2}\right) \hbar\omega. \qquad (n = 0, 1, 2, \ldots) \tag{4.122}$$
The partition function is given by
$$Z = \sum_{n=0}^{\infty} e^{-\beta\hbar\omega(n+1/2)} = e^{-\beta\hbar\omega/2} \sum_{n=0}^{\infty} e^{-n\beta\hbar\omega} \tag{4.123}$$
$$\phantom{Z} = e^{-\beta\hbar\omega/2} (1 + e^{-\beta\hbar\omega} + e^{-2\beta\hbar\omega} + \cdots) = e^{-\beta\hbar\omega/2} (1 + x + x^2 + \cdots), \tag{4.124}$$
where $x = e^{-\beta\hbar\omega}$. The infinite sum in (4.124) is a geometrical series in x and can be summed using
the result that $1 + x + x^2 + \ldots = 1/(1 - x)$ (see Appendix A). The result is
$$Z = \frac{e^{-\beta\hbar\omega/2}}{1 - e^{-\beta\hbar\omega}}, \tag{4.125}$$
and
$$\ln Z = -\frac{1}{2} \beta\hbar\omega - \ln(1 - e^{-\beta\hbar\omega}). \tag{4.126}$$
We leave it as an exercise for the reader to show that
$$f = \frac{1}{2} \hbar\omega + kT \ln(1 - e^{-\beta\hbar\omega}), \tag{4.127}$$
$$s = k \left[ \frac{\beta\hbar\omega}{e^{\beta\hbar\omega} - 1} - \ln(1 - e^{-\beta\hbar\omega}) \right], \tag{4.128}$$
$$\overline{e} = \hbar\omega \left[ \frac{1}{2} + \frac{1}{e^{\beta\hbar\omega} - 1} \right]. \tag{4.129}$$
Equation (4.129) is Planck’s formula for the mean energy of an oscillator at temperature T. The
heat capacity is discussed in Problem 4.52.
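The geometric-series result for the oscillator partition function and Planck's formula for the mean energy can be checked against a direct, truncated sum over levels. This is a minimal sketch in units where the oscillator quantum and k are both 1, so the only parameter is the dimensionless product of β and the quantum, called `beta_hw` here; the function names and the cutoff `n_max` are assumptions for illustration.

```python
import math

def z_truncated(beta_hw, n_max=2000):
    """Partition function from a direct sum over levels n = 0..n_max."""
    return sum(math.exp(-beta_hw * (n + 0.5)) for n in range(n_max + 1))

def z_closed(beta_hw):
    """Closed form obtained by summing the geometric series."""
    return math.exp(-0.5 * beta_hw) / (1.0 - math.exp(-beta_hw))

def mean_energy(beta_hw):
    """Planck's formula for the mean energy, in units of the oscillator quantum."""
    return 0.5 + 1.0 / math.expm1(beta_hw)

def mean_energy_from_z(beta_hw, h=1e-5):
    """Mean energy from -d(ln Z)/d(beta), using a centered finite difference."""
    return -(math.log(z_closed(beta_hw + h)) - math.log(z_closed(beta_hw - h))) / (2.0 * h)

b = 0.3
print(z_truncated(b), z_closed(b))
print(mean_energy(b), mean_energy_from_z(b))
```

At small `beta_hw` (high temperature) the mean energy approaches 1/`beta_hw`, that is, kT, while at large `beta_hw` it approaches the zero-point value 1/2.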
Problem 4.31. What is the mean energy of a system of N harmonic oscillators in equilibrium with
a heat bath at temperature T ? Compare your result with the result for the energy of N harmonic
oscillators calculated in the microcanonical ensemble in Problem 4.26. Do the two ensembles give
identical answers?
Equation (4.77) for Z is a sum over all the microstates of the system. Because the energies
of the diﬀerent microstates may be the same, we can group together microstates with the same
energy and write (4.77) as
$$Z = \sum_{\rm levels} g(E_n)\, e^{-\beta E_n}, \tag{4.130}$$
where g(En) is the number of microstates of the system with energy En. The sum in (4.130) is
over all the energy levels of the system.
Example 4.5. Consider a three-level single particle system with five microstates with energies
0, ε, ε, ε, and 2ε. What is g(εn) for this system? What is the mean energy of the system if it is in
equilibrium with a heat bath at temperature T?
Solution. The partition function is given by (see (4.130))
$$Z_1 = 1 + 3e^{-\beta\epsilon} + e^{-2\beta\epsilon}.$$
Hence, the mean energy of a single particle is given by
$$\overline{e} = \epsilon\, \frac{3e^{-\beta\epsilon} + 2e^{-2\beta\epsilon}}{1 + 3e^{-\beta\epsilon} + e^{-2\beta\epsilon}}.$$
What is the energy of N such particles?
Problem 4.32. In Section 4.3.2 we were given the number of states with energy E for the one-dimensional Ising model. Use the result (4.19) to calculate the free energy of the one-dimensional
Ising model for N = 2 and 4.

4.9 A simple thermometer

Consider a system of one particle which we will call a demon that can exchange energy with another
system (see page 17). The demon obeys the following rules or algorithm:
1. Set up an initial microstate of the system with the desired total energy and assign an initial
energy to the demon. (The initial demon energy is usually set to zero.)
2. Make a trial change in the microstate. For the Einstein solid, choose a particle at random
and randomly increase or decrease its energy by unity. For a system of particles, change the
position of a particle by a small random amount. For the Ising model, ﬂip a spin chosen at
random. Compute the change in energy of the system, ∆E . If ∆E ≤ 0, accept the change,
and increase the energy of the demon by |∆E|. If ∆E > 0, accept the change if the demon
has enough energy to give to the system, and reduce the demon’s energy by ∆E . If a trial
change is not accepted, the existing microstate is counted in the averages. In either case the
total energy of the system plus the demon remains constant.
3. Repeat step 2 many times choosing particles (or spins) at random.
4. Compute the averages of the quantities of interest once the system and the demon have
reached equilibrium.
The demon can trade energy with the system as long as its energy remains greater than its lower
bound, which we have chosen to be zero. The demon is a facilitator that allows the particles in
the system to indirectly trade energy with one another.
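The four steps of the demon algorithm can be sketched for the Einstein solid as follows. This is a minimal illustration and is not the applet referred to in the problems: the function name, the round-robin initial microstate, the number of steps, and the fixed random seed are all assumptions made so that the example is self-contained and reproducible. Energies are measured in units of one quantum.

```python
import random
from collections import Counter

def demon_einstein_solid(n_particles=20, total_energy=40, n_steps=100_000,
                         n_equil=10_000, seed=1):
    """Demon algorithm for an Einstein solid.

    Returns (mean demon energy, mean system energy, histogram of demon energies).
    """
    rng = random.Random(seed)
    # Step 1: initial microstate with the desired energy; the demon starts at zero.
    # Spreading the quanta round-robin over the particles is an arbitrary choice.
    energies = [0] * n_particles
    for q in range(total_energy):
        energies[q % n_particles] += 1
    demon = 0
    demon_sum = 0
    histogram = Counter()
    for step in range(n_steps):
        # Step 2: trial change of one quantum for a randomly chosen particle.
        i = rng.randrange(n_particles)
        if rng.random() < 0.5:
            # Lower the particle energy (Delta E = -1); the demon gains the quantum.
            # The particle energy must remain nonnegative.
            if energies[i] > 0:
                energies[i] -= 1
                demon += 1
        elif demon > 0:
            # Raise the particle energy (Delta E = +1) only if the demon can pay.
            energies[i] += 1
            demon -= 1
        # Rejected trials leave the microstate unchanged but still count (steps 3, 4).
        if step >= n_equil:
            demon_sum += demon
            histogram[demon] += 1
    n_samples = n_steps - n_equil
    mean_demon = demon_sum / n_samples
    mean_system = total_energy - mean_demon   # system plus demon energy is conserved
    return mean_demon, mean_system, histogram

mean_demon, mean_system, hist = demon_einstein_solid()
print("mean demon energy:", mean_demon)
print("mean system energy per particle:", mean_system / 20)
for e_d in sorted(hist)[:6]:
    print(e_d, hist[e_d] / sum(hist.values()))
```

The printed histogram is an estimate of the probability that the demon has a given energy; plotting its logarithm against the demon energy is one way to look for the exponential form asked about in the problems that follow.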
Problem 4.33. The demon can be considered to be a small system in equilibrium with a much
larger system. Because the demon is only one particle, its microstate is speciﬁed by its energy.
Given these considerations, what is the form of the probability that the demon is in a particular
microstate?
In Problems 4.34 and 4.35 we use the demon algorithm to determine the probability that the
demon is in a particular microstate.
Problem 4.34. Consider a demon that exchanges energy with an ideal classical gas of N identical
particles of mass m in one dimension. Because the energy of a particle depends only on its speed,
the positions of the particles are irrelevant in this case. Choose a particle at random and change
its velocity by an amount, δ, chosen at random between −∆ and ∆. The change in energy of the
system is the difference $\Delta E = \tfrac{1}{2}[(v + \delta)^2 - v^2]$, where we have chosen units so that m = 1. The
parameter ∆ is usually chosen so that the percentage of accepted changes is between 30% and 50%.
The applet/application at <stp.clarku.edu/simulations/demon> implements this algorithm.
(a) First consider a small number of particles, say N = 10. The applet chooses the special
microstate for which all the velocities of the particles in the system are identical such that the
system has the desired initial energy. After the demon and the system have reached equilibrium,
what is the mean kinetic energy per particle, the mean velocity per particle, and the mean energy
of the demon? (b) Compare the initial mean velocity of the particles in the system to the mean
value after equilibrium has been established and explain the result. (c) Compute the probability,
p(Ed )dEd , that the demon has an energy between Ed and Ed + dEd . Fit your results to the form
p(Ed ) ∝ exp(−βEd ), where β is a parameter. Given the form of p(Ed ), determine analytically
the dependence of the mean demon energy on β and compare your prediction with your numerical
results. (d) What is the form of the distribution of the velocities and the kinetic energies of the system
after it has reached equilibrium? (e) How would your results change for an ideal gas in two and
three dimensions? (f) For simplicity, the initial demon energy was set to zero. Would your results
be diﬀerent if the demon had a nonzero initial energy if the total energy of the demon plus the
system was the same as before?
Problem 4.35. Consider a demon that exchanges energy with an Einstein solid of N particles.
First do the simulation by hand choosing N = 4 and E = 8. For simplicity, choose the initial
demon energy to be zero. Choose a particle at random and randomly raise or lower its energy
by one unit consistent with the constraint that the energy of the demon Ed ≥ 0. In this case
the energy of the particle chosen also must remain nonnegative. Note that if a trial change is not
accepted, the existing microstate is counted in all averages.
After you are satisﬁed that you understand how the algorithm works, use the applet at
<stp.clarku.edu/simulations/demon/einsteinsolid> and choose N = 20 and E = 40. Does
Ed eventually reach a well deﬁned average value? If so, what is the mean energy of the demon after
equilibrium between the demon and the system has been established? What is the probability that
the demon has energy Ed? What are the mean and standard deviation of the energy of the system?
What are the relative fluctuations of the energy in the system? Compute the probability, P(Ed),
that the demon has an energy Ed. Fit your results to the form P(Ed) ∝ exp(−βEd), where β is
a parameter. Then increase E to E = 80. How do the various averages change? If time permits,
increase E and N and determine any changes in P(Ed).
Example 4.6. A demon exchanges energy with a system of N quantized harmonic oscillators (see
Problem 4.35). What is the mean energy of the demon?
Solution. The demon can be thought of as a system in equilibrium with a heat bath at temperature
T . For simplicity, we will choose units such that the harmonic oscillators have energy 0, 1, 2, . . .,
and hence, the energy of the demon is also restricted to integer values. Because the probability of
a demon microstate is given by the Boltzmann distribution, the demon’s mean energy is given by
$$\overline{E}_d = \frac{\sum_{n=0}^{\infty} n\, e^{-\beta n}}{\sum_{n=0}^{\infty} e^{-\beta n}}. \tag{4.131}$$
Explain why the relation (4.131) for the demon energy is reasonable, and do the sums in (4.131)
to determine the temperature dependence of $\overline{E}_d$. (It is necessary to do only the sum in the
denominator of (4.131).)
Example 4.7. A demon exchanges energy with an ideal classical gas of N particles in one dimension (see Problem 4.34). What is the mean energy of the demon?
Solution. In this case the demon energy is a continuous variable. Hence,
$$\overline{E}_d = \frac{\int_0^\infty E_d\, e^{-\beta E_d}\, dE_d}{\int_0^\infty e^{-\beta E_d}\, dE_d}. \tag{4.132}$$
Explain why the relation (4.132) for the demon energy is reasonable and determine the temperature
dependence of $\overline{E}_d$. Would this temperature dependence be different if the gas were three-dimensional?
Compare the temperature dependence of $\overline{E}_d$ for a demon in equilibrium with an ideal classical gas to
that for a demon in equilibrium with a system of harmonic oscillators. Why is the temperature dependence
different?

4.10 Simulations of the microcanonical ensemble

How can we implement the microcanonical ensemble on a computer? One way to do so for a
classical system of particles is to use the method of molecular dynamics (see Section 1.5). In
this method we choose initial conditions for the positions and velocities of each particle that are
consistent with the desired values of E , V , and N . The numerical solution of Newton’s equations
generates a trajectory in 6N-dimensional phase space (for N particles in three dimensions). Each point on the trajectory represents a
microstate of the microcanonical ensemble with the additional condition that the momentum of
the center of mass is ﬁxed. The averages over the phase space trajectory represent a time average.
To do such a simulation we need to be careful to choose a representative initial condition.
For example, suppose that we started with the particles in one corner of the box. Even though a
microstate with all the particles in one corner is as likely to occur as other microstates with the same
energy, there are many more microstates for which the particles are spread throughout the box
than there are those with particles in one corner.
As we will justify further in Section 6.3, we can identify the temperature of a system of
interacting particles with the kinetic energy per particle using the relation (4.64). (For the ideal
gas the total energy is simply the kinetic energy.) If we were to do a molecular dynamics simulation,
we would ﬁnd that the total energy is (approximately) constant, but the kinetic energy and hence
the temperature ﬂuctuates. The mean temperature of the system becomes well deﬁned if the system
is in equilibrium, the number of particles in the system is suﬃciently large, and the simulation is
done for a suﬃciently long time.
Our assumption that a molecular dynamics simulation generates microstates consistent with
the microcanonical ensemble is valid as long as a representative sample of the accessible microstates
can be reached during the duration of the simulation. Such a system is said to be quasiergodic.
What if we have a system of fixed total energy for which Newton’s equations of motion are
not applicable? For example, there is no dynamics for the model introduced in Section 4.2 in
which the particles have only integer values of the energy. Another general way of generating
representative microstates is to use a Monte Carlo method. As an example, consider a system
of N noninteracting distinguishable particles whose single particle energies are 0, 1, 2, . . . For this
model the relevant variables are the quantum numbers of each particle such that their sum equals
the desired total energy E . Given a set of quantum numbers, how do we generate another set of
quantum numbers with the same energy? Because we want to generate a representative sample
of the accessible states, we need to make all changes at random. One possibility is to choose a
(distinguishable) particle at random and make a trial change in its energy by ±1. However, such
a trial change would change the total energy of the system and hence not be acceptable. (For this
simple example of noninteracting particles, we could choose two particles at random and make
trial changes that would leave the total energy unchanged.)
A more interesting example is a system of particles interacting via the Lennard-Jones potential,
which has the form
$$u(r) = 4\epsilon \left[ \left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6} \right], \tag{4.133}$$
where r is the separation between two particles, σ is a measure of the diameter of a particle, and
ε is a measure of the depth of the attractive part of the potential. Note that u(r) is repulsive at short
distances and attractive at large distances. The 12-6 potential describes the interaction of the
atoms of the noble gases and some diatomic molecules such as nitrogen and oxygen
reasonably well. The parameters ε and σ can be determined from experiments or approximate calculations. The values ε = 1.65 × 10⁻²¹ J and σ = 3.4 Å yield good agreement with the experimental
properties of liquid argon.
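For reference, the Lennard-Jones form can be written as a short function. The sketch below is only an illustration; the function and constant names are assumptions, and the argon values are the ones quoted in the text converted to SI units. Two standard properties of the 12-6 form are evaluated: the potential vanishes at r = σ, and its minimum lies at r = 2^(1/6) σ, where u = −ε.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair potential u(r) = 4*epsilon*[(sigma/r)**12 - (sigma/r)**6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

# Argon parameters quoted in the text, in SI units.
EPSILON_AR = 1.65e-21      # J
SIGMA_AR = 3.4e-10         # m (3.4 angstroms)

r_min = 2.0 ** (1.0 / 6.0) * SIGMA_AR   # separation at which u(r) is a minimum
# Value at r = sigma, where the potential changes sign.
print(lennard_jones(SIGMA_AR, EPSILON_AR, SIGMA_AR))
# Depth of the minimum in units of epsilon.
print(lennard_jones(r_min, EPSILON_AR, SIGMA_AR) / EPSILON_AR)
```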
As we will see in Chapter 6, we can ignore the velocity coordinates and consider only the
positions of the particles and their potential energy. If we were to choose one particle at random,
and make a random displacement, the potential energy of the system would almost certainly change.
The only way we could keep the energy constant (or within a ﬁxed interval ∆E ) as required by the
microcanonical ensemble is to displace two particles chosen at random and hope that their random
displacements would somehow keep the potential energy constant. Very unlikely!
The condition that the total energy is ﬁxed makes sampling the accessible microstates diﬃcult.
This diﬃculty is analogous to the diﬃculty that we have already found doing calculations in the
microcanonical ensemble. We can get around this diﬃculty by relaxing the condition that the total
energy be ﬁxed. One way is to add to the system of N particles an extra degree of freedom called
the demon, as we discussed in Section 4.9. The total energy of the demon plus the original system is
fixed. Because the demon is one particle out of N + 1, the fluctuations in the energy of the original
system are of order 1/N, which goes to zero as N → ∞. Another way of relaxing the condition that
the total energy is fixed is to use the canonical ensemble.

4.11 Simulations of the canonical ensemble

Suppose that we wish to simulate a system that is in equilibrium with a heat bath at temperature
T . One way to do so is to start with an arbitrary microstate of energy E and weight it by its
relative probability e−βE . For example, for the Einstein solid considered in Section 4.10, we could
generate another microstate by choosing a particle at random and changing its energy by ±1 at
random. A new microstate would be generated and the mean energy of the system would be
estimated by
$$\overline{E}(T) = \frac{\sum_{n=1}^{M} E_n\, e^{-\beta E_n}}{\sum_{n=1}^{M} e^{-\beta E_n}}, \tag{4.134}$$
where En is the energy of microstate n and the sum is over the M states that have been sampled.
However, this procedure would be very ineﬃcient because the M states would include many states
whose weight in averages such as (4.134) would be very small.
To make the sampling procedure eﬀective, we need to generate microstates with probabilities
proportional to their weight, that is, proportional to e−βEn . In this way we would generate states
with the highest probability. Such a sampling procedure is known as importance sampling. The
simplest and most common method of importance sampling in statistical mechanics is known as
the Metropolis algorithm. The method is based on the fact that the ratio of the probability that
the system is in state j with energy Ej to the probability of being in state i with energy Ei is
pj /pi = e−β (Ej −Ei ) = e−β ∆E , where ∆E = Ej − Ei . We then interpret this ratio as the probability
of making a transition from state i to state j . If ∆E < 0, the quantity e−β ∆E is greater than
unity, and the probability is unity. The Metropolis algorithm can be summarized as follows:
1. Choose an initial microstate, for example, choose random initial energies for each particle in
an Einstein solid or random positions in a system of particles interacting via the Lennard-Jones potential.
2. Make a trial change in the microstate. For the Einstein solid, choose a particle at random
and increase or decrease its energy by unity. For a system of particles, change the position
of a particle by a small random amount. Compute the change in energy of the system, ∆E ,
corresponding to this change. If ∆E ≤ 0, then accept the change. If ∆E > 0, accept the
change with probability w = e−β ∆E . To do so, generate a random number r uniformly
distributed in the unit interval. If r ≤ w, accept the new microstate; otherwise, retain the
previous microstate.
3. Repeat step 2 many times.
4. Compute the averages of the quantities of interest once the system has reached equilibrium.
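As a rough illustration of steps 1 through 4, the following Python sketch applies the algorithm to an Einstein solid. It is not taken from the text: the function name metropolis_einstein_solid, its parameter names, the number of steps, and the equilibration cutoff are assumptions made for illustration. The sketch also assumes that energies are integers measured from the ground state in units of the level spacing, and that a trial decrease of a particle that already has zero energy is rejected.

```python
import math
import random

def metropolis_einstein_solid(n_particles=20, beta=1.0, n_steps=400_000,
                              n_equilibrate=40_000, seed=1):
    """Estimate the mean energy per particle of an Einstein solid.

    Assumption of this sketch: energies are integers in units of the level
    spacing and are measured from the ground state.
    """
    rng = random.Random(seed)
    energy = [0] * n_particles           # step 1: initial microstate
    total = 0                            # current total energy
    w_up = math.exp(-beta)               # acceptance probability for dE = +1
    accum = 0
    n_samples = 0
    for step in range(n_steps):          # step 3: repeat the trial move
        i = rng.randrange(n_particles)   # step 2: choose a particle at random
        if rng.random() < 0.5:
            # Trial change dE = -1: accepted unless the particle is already
            # in its lowest state (assumed rule; the move is then rejected).
            if energy[i] > 0:
                energy[i] -= 1
                total -= 1
        else:
            # Trial change dE = +1: accept if r <= w = exp(-beta).
            if rng.random() <= w_up:
                energy[i] += 1
                total += 1
        # Step 4: the current microstate is counted in the average whether
        # or not the trial change was accepted.
        if step >= n_equilibrate:
            accum += total
            n_samples += 1
    return accum / (n_samples * n_particles)

mean_e = metropolis_einstein_solid()
# Canonical mean energy of one oscillator above its ground state (beta = 1).
exact = 1.0 / (math.exp(1.0) - 1.0)
print(f"Metropolis estimate: {mean_e:.3f}   exact: {exact:.3f}")
```

The estimate should agree with the canonical value 1/(e^β − 1) to within statistical error; the size of that error depends on the number of steps and on how many initial steps are discarded.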
Problem 4.36. Use the Metropolis probability to simulate an Einstein solid of N particles. Choose
N = 20 and β = 1. Choose a particle at random and randomly lower or raise its energy by one
unit. If the latter choice is made, generate a number at random in the unit interval and accept
the change if r ≤ e−β . If a trial change is not accepted, the existing microstate is counted in all
averages. Does the energy of the system eventually reach a well deﬁned average? If so, vary β and
determine E(T). Compare your results to the analytical results you found in Example 4.4.

4.12 Grand canonical ensemble (fixed T, V, and µ)

In Section 4.6 we derived the Boltzmann probability distribution for a system in equilibrium with
a heat bath at temperature T. The role of the heat bath is to fix the mean energy of the system.
We now generalize this derivation and find the probability distribution for a system in equilibrium
with a heat bath at temperature T = 1/kβ and a particle reservoir with chemical potential µ. In
this case the role of the particle reservoir is to ﬁx the mean number of particles. This ensemble is
known as the grand canonical ensemble.
As before, the composite system is isolated with total energy E , total volume V , and total
number of particles N . The probability that the (sub)system is in microstate n with Nn particles
is given by (see (4.73))
\[
p_n = \frac{1 \times \Omega(E - E_n, N - N_n)}{\sum_n \Omega(E - E_n, N - N_n)}. \tag{4.135}
\]
The diﬀerence between (4.73) and (4.135) is that we have allowed both the energy and the number
of particles of the system of interest to vary. As before, we take the logarithm of both sides of
(4.135) and exploit the fact that En ≪ E and Nn ≪ N. We have
\[
\ln p_n \approx \mathrm{constant} - E_n \frac{\partial \ln \Omega(E)}{\partial E} - N_n \frac{\partial \ln \Omega(N)}{\partial N}. \tag{4.136}
\]
The derivatives in (4.136) are evaluated at Ebath = E and Nreservoir = N, respectively. If we
substitute β = ∂ ln Ω/∂E and βµ = −∂ ln Ω/∂N, we obtain
\[
\ln p_n = \mathrm{constant} - \frac{E_n}{kT} + \frac{\mu N_n}{kT}, \tag{4.137}
\]
or
\[
p_n = \frac{1}{Z}\, e^{-\beta (E_n - \mu N_n)}. \qquad \mathrm{(Gibbs\ distribution)} \tag{4.138}
\]
Equation (4.138) is the Gibbs distribution for a variable number of particles. That is, pn is the
probability that the system is in state n with energy En and Nn particles. The grand partition
function Z in (4.138) is found from the normalization condition
\[
\sum_n p_n = 1. \tag{4.139}
\]
Hence, we obtain
\[
Z = \sum_n e^{-\beta (E_n - \mu N_n)}. \tag{4.140}
\]
In analogy to the relations we found in the canonical ensemble, we expect that there is a
simple relation between the Landau potential deﬁned in (2.144) and the grand partition function.
Because the derivation of this relation proceeds as in Sec. 4.6, we simply give the relation:
\[
\Omega = -kT \ln Z. \tag{4.141}
\]
Example 4.8. Many impurity atoms in a semiconductor exchange energy and electrons with the
electrons in the conduction band. Consider the impurity atoms to be in thermal and chemical
equilibrium with the conduction band, which can be considered to be an energy and particle
reservoir. Assume that ∆ is the ionization energy of the impurity atom. Find the probability that
an impurity atom is ionized.
Solution. Suppose that one and only one electron can be bound to an impurity atom. Because
an electron has a spin, both spin orientations ↑ and ↓ are possible. An impurity atom has three
allowed states: state 1 without an electron (atom ionized), state 2 with an electron attached with
spin ↑, and state 3 with an electron attached with spin ↓. We take the zero of energy to correspond
to the two bound states. The microstates of the system are summarized below.
state n   description                  Nn   εn
1         electron detached            0    −∆
2         electron attached, spin ↑    1    0
3         electron attached, spin ↓    1    0

The grand partition function of the impurity atom is given by
\[
Z = e^{\beta \Delta} + 2 e^{\beta \mu}. \tag{4.142}
\]
Hence, the probability that an atom is ionized (state 1) is given by
\[
P(\mathrm{ionized}) = \frac{e^{\beta \Delta}}{e^{\beta \Delta} + 2 e^{\beta \mu}} = \frac{1}{1 + 2 e^{-\beta (\Delta - \mu)}}. \tag{4.143}
\]

4.13 Entropy and disorder

Many texts and articles for the scientifically literate refer to entropy as a measure of “disorder” or
“randomness.” This interpretation is justiﬁed by the relation, S = k ln Ω. The argument is that an
increase in the disorder in a system corresponds to an increase in Ω. Usually a reference is made
to a situation such as the tendency of students’ rooms to become messy. There are two problems
with this interpretation – it adds nothing to our understanding of entropy and is inconsistent with
our naive understanding of structural disorder.
We have already discussed the interpretation of entropy in the context of information theory
as a measure of the uncertainty or lack of information. Thus, we already have a precise definition
of entropy and can describe a student’s messy room as having a high entropy because of our lack
of information about the location of a particular paper or article of clothing. We could deﬁne
disorder as lack of information, but such a deﬁnition does not help us to understand entropy any
better because it would not provide an independent understanding of disorder.
The other problem with introducing the term disorder to describe entropy is that it can lead
us to incorrect conclusions. In the following we will describe two examples where the crystalline
phase of a given material has a higher entropy than the liquid phase. Yet you would probably
agree that a crystal is more ordered than a liquid. So how can a crystal have a higher entropy?
Suppose that we are going on a short trip and need to pack our suitcase with only a few
articles.8 In this case the volume of the suitcase is much greater than the total volume of the articles
we wish to pack, and we would probably just randomly throw the articles into the suitcase. Placing
the articles in an ordered arrangement would require extra time and the ordered arrangement would
probably be destroyed during transport. In statistical mechanics terms we say that there are many
more ways in which the suitcase can be packed in a disordered arrangement than the ordered one.
Hence, we could conclude that the disordered state has a higher entropy than the ordered state.
This low density case is consistent with the usual association of entropy and disorder.
8 This example is due to Laird.

Now suppose that we are going on a long trip and need to pack many articles in the same
suitcase, that is, the total volume of the articles to be packed is comparable to the volume of the
suitcase. In this high density case we know from experience that randomly throwing the articles
into the suitcase won’t allow us to shut the suitcase. Such a conﬁguration is incompatible with the
volume constraints of the suitcase. If we randomly throw the articles in the suitcase many times,
we might ﬁnd a few conﬁgurations that would allow us to close the suitcase. In contrast, if we pack
the articles in a neat and ordered arrangement, the suitcase can be closed. Also there are many
such conﬁgurations that would satisfy the constraints. We conclude that the number of ordered
arrangements (of the suitcase articles) is greater than the number of corresponding disordered
arrangements. Therefore an ordered arrangement in the high density suitcase has a higher entropy
than a structurally disordered state. The association of disorder with entropy is not helpful here.
The suitcase example is an example of an entropy-driven transition because energy did not
enter into our considerations at all. Another example of an entropy-driven transition is a system of
hard spheres or hard disks. In this seemingly simple model the interaction between two particles
is given by
\[
u(r) = \begin{cases} \infty & r < \sigma \\ 0 & r \ge \sigma. \end{cases} \tag{4.144}
\]
In Chapter 8 we will learn that the properties of a liquid at high density are determined mainly by
the repulsive part of the interparticle potential. For this model only nonoverlapping conﬁgurations
are allowed and so the potential energy is zero. Hence, the internal energy is solely kinetic and
the associated contribution to the free energy is the ideal gas part which depends only on the
temperature and the density. Hence, the diﬀerence in the free energy ∆F = ∆E − T ∆S between
a hard sphere crystal and a hard sphere ﬂuid at the same density and temperature must equal
−T ∆S .
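The statement that energy plays no role for hard spheres can be made concrete with a short Python sketch of the potential (4.144). The function names, the two-dimensional test configurations, and the default σ = 1 are assumptions chosen for illustration. The sketch shows that the Boltzmann weight of a configuration is either 1 (no overlaps) or 0 (at least one overlap) for any value of β, so only the number of allowed configurations matters.

```python
import itertools
import math

def hard_sphere_potential(r, sigma=1.0):
    """Pair potential (4.144): infinite for r < sigma, zero otherwise."""
    return math.inf if r < sigma else 0.0

def potential_energy(positions, sigma=1.0):
    """Total potential energy: 0.0 if no pair overlaps, inf otherwise."""
    total = 0.0
    for p, q in itertools.combinations(positions, 2):
        total += hard_sphere_potential(math.dist(p, q), sigma)
    return total

def boltzmann_weight(positions, beta=1.0, sigma=1.0):
    """exp(-beta * U): 1.0 for allowed configurations, 0.0 for overlaps."""
    return math.exp(-beta * potential_energy(positions, sigma))

# Hypothetical configurations of three hard disks of diameter sigma = 1.
allowed = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
overlapping = [(0.0, 0.0), (0.5, 0.0), (2.0, 2.0)]

for beta in (0.1, 1.0, 10.0):
    print(beta, boltzmann_weight(allowed, beta), boltzmann_weight(overlapping, beta))
```

Because the weights do not depend on β, the temperature enters the free energy difference only through the factor T multiplying ∆S.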
In Chapter 8 we will do simulations that indicate that a transition from a ﬂuid at low density
to a crystal at high density exists (at ﬁxed temperature). (More extensive simulations and theory
show that the crystal has fcc symmetry and that the coexistence densities of the crystal and fluid
are between ρσ³ = 0.945 and 1.043.) Thus at some density ∆F must become negative, which can
occur only if ∆S = Scrystal − Sfluid is positive. We conclude that at high density the entropy of
the crystal must be greater than that of a fluid at equal temperature and density for a fluid-solid
(freezing) transition to exist.

Vocabulary
composite system, subsystem
equal a priori probabilities
microcanonical ensemble, canonical ensemble, grand canonical ensemble
Boltzmann distribution, Gibbs distribution
entropy S , Helmholtz free energy F , Gibbs free energy G, Landau potential Ω
demon algorithm, Metropolis algorithm

Appendix 4A: The volume of a hypersphere
We derive the volume of a hypersphere of n dimensions given in (4.46). As in (4.45), the volume
is given by
\[
V_n(R) = \int_{x_1^2 + x_2^2 + \cdots + x_n^2 < R^2} dx_1\, dx_2 \cdots dx_n. \tag{4.145}
\]
Because $V_n(R) \propto R^n$ for n = 2 and 3, we expect that $V_n$ is proportional to $R^n$. Hence, we write
\[
V_n = C_n R^n, \tag{4.146}
\]
where $C_n$ is the (unknown) constant of proportionality that depends only on n. We rewrite the
volume element $dV_n = dx_1\, dx_2 \cdots dx_n$ as
\[
dV_n = dx_1\, dx_2 \cdots dx_n = S_n(R)\, dR = n C_n R^{n-1}\, dR, \tag{4.147}
\]
where $S_n = n C_n R^{n-1}$ is the surface area of the hypersphere. As an example, for n = 3 we have
$dV_3 = 4\pi R^2\, dR$ and $S_3 = 4\pi R^2$. To find $C_n$ for general n, consider the identity (see Appendix A)
\[
I_n = \int_{-\infty}^{\infty} dx_1 \cdots \int_{-\infty}^{\infty} dx_n\, e^{-(x_1^2 + \cdots + x_n^2)} = \left[ \int_{-\infty}^{\infty} dx\, e^{-x^2} \right]^n = \pi^{n/2}. \tag{4.148}
\]
The left-hand side of (4.148) can be written as
\[
I_n = \int_{-\infty}^{\infty} dx_1 \cdots \int_{-\infty}^{\infty} dx_n\, e^{-(x_1^2 + \cdots + x_n^2)} = \int_0^{\infty} dR\, S_n(R)\, e^{-R^2} = n C_n \int_0^{\infty} dR\, R^{n-1} e^{-R^2}. \tag{4.149}
\]
We can relate the integral in (4.149) to the Gamma function Γ(n) defined by the relation
\[
\Gamma(n) = \int_0^{\infty} dx\, x^{n-1} e^{-x}. \tag{4.150}
\]
The relation (4.150) holds for n > 0 and whether or not n is an integer. We make the change of
variables $x = R^2$ so that
\[
I_n = \frac{1}{2} n C_n \int_0^{\infty} dx\, x^{n/2 - 1} e^{-x} = \frac{1}{2} n C_n \Gamma(n/2). \tag{4.151}
\]
A comparison of (4.151) with (4.148) yields the relation
\[
C_n = \frac{2 \pi^{n/2}}{n \Gamma(n/2)} = \frac{\pi^{n/2}}{(n/2) \Gamma(n/2)}. \tag{4.152}
\]
It follows that
\[
V_n(R) = \frac{2 \pi^{n/2}}{n \Gamma(n/2)}\, R^n. \tag{4.153}
\]

Appendix 4B: Fluctuations in the canonical ensemble
To gain more insight into the spread of energies that are actually observed in the canonical ensemble, we calculate the probability P (E )∆E that a system in equilibrium with a heat bath at
temperature T has an energy E in the range ∆E. In most macroscopic systems, the number of
microstates with the same energy is large. In such a case the probability that the system is in any
of the microstates with energy En can be written as
\[
p_n = \frac{g(E_n)\, e^{-\beta E_n}}{\sum_n g(E_n)\, e^{-\beta E_n}}, \tag{4.154}
\]
where g(En) is the number of microstates with energy En. In the thermodynamic limit N, V → ∞,
the spacing between consecutive energy levels becomes very small and we can regard E as a
continuous variable. We write P(E) dE for the probability that the system has an energy between E and
E + dE and let g(E) dE be the number of microstates between E and E + dE. (The function g(E)
is the density of states and is the same function discussed in Section 4.3.) Hence, we can rewrite
(4.154) as
\[
P(E)\, dE = \frac{g(E)\, e^{-\beta E}\, dE}{\int_0^{\infty} g(E)\, e^{-\beta E}\, dE}. \tag{4.155}
\]
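As we did in Section 3.7, we can find an approximate form of P(E) by expanding P(E) about
$E = \tilde{E}$, the most probable value of E. To do so, we evaluate the derivatives ∂ ln P/∂E and
∂² ln P/∂E² using (4.155):
\[
\left( \frac{\partial \ln P}{\partial E} \right)_{E = \tilde{E}} = \left( \frac{\partial \ln g}{\partial E} \right)_{E = \tilde{E}} - \beta = 0, \tag{4.156}
\]
and
\[
\left( \frac{\partial^2 \ln P}{\partial E^2} \right)_{E = \tilde{E}} = \left( \frac{\partial^2 \ln g}{\partial E^2} \right)_{E = \tilde{E}}. \tag{4.157}
\]
We have
\[
\left( \frac{\partial^2 \ln g}{\partial E^2} \right)_{E = \tilde{E}} = \frac{\partial}{\partial E} \left( \frac{\partial \ln g}{\partial E} \right)_{E = \tilde{E}} = \frac{\partial \beta}{\partial E}. \tag{4.158}
\]
Finally, we obtain
\[
\frac{\partial \beta}{\partial E} = -\frac{1}{kT^2} \frac{\partial T}{\partial E} = -\frac{1}{kT^2 C_V}. \tag{4.159}
\]
We can use the above results to expand ln P(E) about $E = \tilde{E}$ through second order in $(E - \tilde{E})$.
The result is
\[
\ln P(E) = \ln P(\tilde{E}) - \frac{(E - \tilde{E})^2}{2kT^2 C_V} + \ldots \tag{4.160}
\]
or
\[
P(E) = P(\tilde{E})\, e^{-(E - \tilde{E})^2 / 2kT^2 C_V}. \tag{4.161}
\]
If we compare (4.161) to the standard form of a Gaussian distribution (3.115), we see that $\overline{E} = \tilde{E}$
and $\sigma_E^2 = kT^2 C_V$ as expected.

Additional Problems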
Problems                  page
4.1                       139
4.2, 4.3                  140
4.4, 4.5, 4.6, 4.7        145
4.8                       146
4.9, 4.10                 148
4.11, 4.12, 4.13          149
4.14                      150
4.15                      153
4.16                      155
4.17, 4.18                156
4.19, 4.20                159
4.21, 4.22, 4.23, 4.24    162
4.25, 4.26                164
4.27, 4.28                166
4.29, 4.30                167
4.31, 4.32                175
4.34, 4.35                175
4.33                      175
4.36                      179

Table 4.9: Listing of inline problems.

Problem 4.37. Discuss the statistical nature of the Clausius statement of the second law that
energy cannot go spontaneously from a colder to a hotter body. Under what conditions is the
statement applicable? In what sense is this statement incorrect?
Problem 4.38. Given our discussion of the second law of thermodynamics from both the macroscopic and microscopic points of view, discuss the following quote due to Arthur Stanley Eddington:
The law that entropy always increases, the Second Law of Thermodynamics, holds . . .
the supreme position among the laws of Nature. If someone points out to you that
your pet theory of the universe is in disagreement with Maxwell’s equations, then so
much the worse for Maxwell’s equations. . . But if your theory is found to be against
the second law of thermodynamics, I can give you no hope; there is nothing for it but
to collapse in deepest humiliation.
Problem 4.39. Consider an isolated composite system consisting of subsystems 1 and 2 that can
exchange energy with each other. Subsystem 1 consists of three noninteracting spins, each having
magnetic moment µ. Subsystem 2 consists of two noninteracting spins each with a magnetic
moment 2µ. A magnetic ﬁeld B is applied to both systems. (a) Suppose that the total energy is
E = −3µB . What are the accessible microstates of the composite system? What is the probability
P(M) that system 1 has magnetization M? (b) Suppose that systems 1 and 2 are initially separated
from each other and that the net magnetic moment of 1 is −3µ and the net magnetic moment of
2 is +4µ. The systems are then placed in thermal contact with one another and are allowed to
exchange energy. What is the probability P (M ) that the net magnetic moment of system 1 has
one of its possible values M ? What is the mean value of the net magnetic moment of system 1?
Problem 4.40. Consider two isolated systems of noninteracting spins with NA = 4 and NB = 16.
If their initial energies are EA = −2µB and EB = −2µB , what is the total number of microstates
available to the composite system? If the two systems are now allowed to exchange energy with
one another, what is the probability that system A has energy EA? What are the mean value of EA
and its relative fluctuations? Calculate the analogous quantities for system B. What is the
most probable macrostate for the composite system?
Problem 4.41. Show that the relations (4.58)–(4.60) follow from the thermodynamic relation
dE = T dS − P dV + µdN (see (2.110)).
Problem 4.42. Suppose that the number of states between energy E and E + ∆E of an isolated
system of N particles in a volume V is given by
\[
g(E)\, \Delta E = c\, (V - bN)^N \left( E + \frac{N^2 a}{V} \right)^{3N/2} \Delta E, \tag{4.162}
\]
where a, b, and c are constants. What is the entropy of the system? Determine the temperature
T as a function of E . What is the energy in terms of T , the density ρ = N/V , and the parameters
a and b? What is the pressure as a function of T and ρ? What are the units of the parameters a
and b?
Problem 4.43. Discuss the assumptions that are needed to derive the classical ideal gas equations
of state, (4.64) and (4.65).
Problem 4.44. Assume that $g(E) = E^{3N/2}$ for a classical ideal gas. Plot $g(E)$, $e^{-\beta E}$, and the
product $g(E)\, e^{-\beta E}$ versus E for N = 6 and β = 1. What is the qualitative behavior of the three
functions? Show that the product $g(E)\, e^{-\beta E}$ has a maximum at $\tilde{E} = 3N/(2\beta)$. Compare this value
to the mean value of E given by
\[
\overline{E} = \frac{\int_0^{\infty} E\, g(E)\, e^{-\beta E}\, dE}{\int_0^{\infty} g(E)\, e^{-\beta E}\, dE}. \tag{4.163}
\]
Problem 4.45. Explain why the various heat capacities must go to zero as T → 0.
Problem 4.46. The partition function of a hypothetical system is given by
\[
\ln Z = a T^4 V, \tag{4.164}
\]
where a is a constant. Evaluate the mean energy E, the pressure P, and the entropy S.
Problem 4.47. (a) Suppose that you walk into a store with little money in your pocket (and no
credit card). Would you care about the prices of the articles you wished to purchase? Would you
care about the prices if you had just won the lottery? (b) Suppose that you wish to purchase a
car that costs $20,000 but have no money. You then ﬁnd a dollar bill on the street. Has your
“capacity” for purchasing the car increased? Suppose that your uncle gives you $8000. Has your
capacity for purchasing the car increased substantially? How much money would you need before
you might think about buying the car?
Problem 4.48. Show that the partition function Z12 of two independent distinguishable systems
1 and 2 both in equilibrium with a heat bath at temperature T equals the product of the partition
functions of the separate systems:
\[
Z_{12} = Z_1 Z_2. \tag{4.165}
\]
Problem 4.49. (a) Consider a system of N noninteracting, distinguishable particles each of which
can be in single particle states with energy 0 and ∆ (see Example 4.3). The system is in equilibrium
with a heat bath at temperature T. Sketch the probabilities that a given particle is in the ground
state and the excited state with energy ∆, and discuss the limiting behavior of the probabilities
for low and high temperatures. What do high and low temperature mean in this case? Sketch
the T dependence of the mean energy E (T ) and give a simple argument for its behavior. From
your sketch of E (T ) sketch the T dependence of the heat capacity C (T ) and describe its qualitative
behavior. Give a simple physical argument why C has a maximum and estimate the temperature at
which the maximum occurs. (b) Calculate C (T ) explicitly and verify that its behavior is consistent
with the qualitative features illustrated in your sketch. The maximum in the heat capacity of a two
state system is called the Schottky anomaly, but the characterization of this behavior as an anomaly
is a misnomer because many systems behave as two level systems at low temperatures.
Problem 4.50. Consider a system of N noninteracting, distinguishable particles. Each particle
can be in one of three states with energies 0, ∆, and 10∆. Without doing an explicit calculation,
sketch the temperature dependence of the heat capacity at low temperatures.
Problem 4.51. Consider a system of one particle in equilibrium with a heat bath. The particle
has two microstates of energy ε1 = 0 and ε2 = ∆. Find the probabilities p1 and p2 when the mean
energy of the system is 0.2∆, 0.4∆, 0.5∆, 0.6∆, and ∆, respectively. What are the corresponding
temperatures? (Hint: Write the mean energy as x∆ and express your answers in terms of x.)
Problem 4.52. (a) Calculate the heat capacity CV of a system of N one-dimensional harmonic
oscillators (see Example 4.4). (b) Plot the T dependence of the mean energy E and the heat
capacity C = dE/dT. (c) Show that E → kT at high temperatures for which kT ≫ ħω. This
result corresponds to the classical limit and will be shown in Section 6.3 to be a consequence of
the equipartition theorem. In this limit the thermal energy kT is large in comparison to ħω, the
separation between energy levels. Hint: expand the exponential function in (4.129). (d) Show
that at low temperatures for which ħω ≫ kT, $E = \hbar\omega\,(\tfrac{1}{2} + e^{-\beta \hbar \omega})$. What is the value of the
heat capacity? Why is the latter so much smaller than it is in the high temperature limit? (e)
Verify that S → 0 as T → 0 in agreement with the third law of thermodynamics, and that at high
T, S → kN ln(kT/ħω). The latter result implies that the effective number of microstates over
which the probability is nonzero is ekT/ħω. This result is reasonable because the width of the
Boltzmann probability distribution is kT, and hence the number of microstates that are occupied
at high temperature is kT/ħω.
Problem 4.53. In the canonical ensemble the temperature is ﬁxed and the constant volume heat
capacity is related to the variance of the energy ﬂuctuations (see (4.85)). As discussed on page 177,
the temperature ﬂuctuates in the microcanonical ensemble. Guess how the constant volume heat
capacity might be expressed in the microcanonical ensemble.
Figure 4.8: The two particles considered in Problem 4.54. The two distinguishable particles can
each be in one of the two boxes. The energy of the system depends on which box the particles
occupy.

Problem 4.54. Consider the system illustrated in Figure 4.8. The system consists of two distinguishable particles, each of which can be in either of two boxes. Assume that the energy of a
particle is zero if it is in the left box and r if it is in the right box. There is also a correlation energy
term that lowers the energy by ∆ if the two particles are in the same box. (a) Enumerate the
2² = 4 microstates and their corresponding energy. (b) Suppose that r = 1 and ∆ = 15. Sketch
the qualitative behavior of the heat capacity C as a function of T . (c) Calculate the partition
function Z for arbitrary values of r and ∆ and use your result to ﬁnd the mean energy and the
heat capacity. Explain your result for C in simple terms. (d) What is the probability that the
system is in a particular microstate?
Problem 4.55. Consider a system in equilibrium with a heat bath at temperature T and a particle
reservoir at chemical potential µ. The reservoir has a maximum of four distinguishable particles.
Assume that the particles in the system do not interact and can be in one of two states with
energies zero or ∆. Determine the (grand) partition function of the system.
Problem 4.56. The following demonstration illustrates an entropy-driven transition. Get a bag
of M&M’s or similar disk-shaped candy. Ball bearings work better, but they are not as tasty. You
will also need a flat-bottom glass dish (preferably square) that fits on an overhead projector.
Place the glass dish on the overhead projector and add a few of the candies. Shake the
dish gently from side to side to simulate the effects of temperature. You should observe a two-dimensional model of a gas. Gradually add more candies while continuing to shake the dish. As
the density is increased further, you will begin to notice clusters of hexagonal crystals. Do these
clusters disappear if you shake the dish faster? At what density do large clusters of hexagonal
crystals begin to appear? Is this density less than the maximum packing density?

Suggestions for Further Reading
Joan Adler, “A walk in phase space: Solidiﬁcation into crystalline and amorphous states,” Am.
J. Phys. 67, 1145–1148 (1999). Adler and Laird discuss the demonstration in Problem 4.56.
Ralph Baierlein, Thermal Physics, Cambridge University Press, New York (1999).
Brian B. Laird, “Entropy, disorder and freezing,” J. Chem. Educ. 76, 1388–1390 (1999).
Thomas A. Moore and Daniel V. Schroeder, “A diﬀerent approach to introducing statistical
mechanics,” Am. J. Phys. 65, 26–36 (1997).
F. Reif, Statistical Physics, Volume 5 of the Berkeley Physics Series, McGraw-Hill (1965).
W. G. V. Rosser, An Introduction to Statistical Physics, Ellis Horwood Limited (1982).
Daniel V. Schroeder, An Introduction to Thermal Physics, Addison-Wesley (1999).
Ruth Chabay and Bruce Sherwood, Matter & Interactions, John Wiley & Sons (2002), Vol. 1,
Modern Mechanics.

Chapter 5

Magnetic Systems

© 2005 by Harvey Gould and Jan Tobochnik
7 December 2005
In Chapter 4 we developed the general formalism of statistical mechanics. We now apply this formalism to several magnetic systems for which the interactions between the magnetic moments are
important. We will discover that these interactions lead to a wide range of phenomena, including
the existence of phase transitions and other cooperative phenomena. We also introduce several
other quantities of physical interest.

5.1 Paramagnetism

We first review the behavior of a system of noninteracting magnetic moments with spin 1/2 in
equilibrium with a thermal bath at temperature T . We discussed this system in Section 4.3.1 and
in Example 4.2 using the microcanonical ensemble. We will ﬁnd that this system is much easier to
treat in the canonical ensemble.
Because the magnetic moments or spins are noninteracting, the only interaction is that of
the spins with an external magnetic ﬁeld B in the z direction. The magnetic ﬁeld due to the
spins themselves is assumed to be negligible. The energy of interaction of a spin with the external
magnetic ﬁeld B is given by
\[
E = -\boldsymbol{\mu} \cdot \mathbf{B} = -\mu_z B = -\mu B s, \tag{5.1}
\]
where µz is the component of the magnetic moment in the direction of the magnetic field B. We
write µz = sµ, where s = ±1.
We assume that the spins are ﬁxed on a lattice so that they are distinguishable even though
the spins are intrinsically quantum mechanical (because of the association of a magnetic moment
with a spin angular momentum). What would we like to know about the properties of a system
of noninteracting spins? In the absence of an external magnetic ﬁeld, there are not many physical
quantities of interest. The spins point randomly up or down because there is no preferred direction,
and the mean internal energy is zero. However, in the presence of an external magnetic field, the
net magnetic moment and the energy of the system are nonzero.
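Because each spin is independent of the others, we can find the partition function for one
spin, $Z_1$, and use the relation $Z_N = Z_1^N$ to obtain $Z_N$, the partition function for N spins. We can
derive this relation by writing the energy of the N spins as $E = -\mu B \sum_{i=1}^{N} s_i$ and expressing the
partition function $Z_N$ for the N-spin system as
\begin{align}
Z_N &= \sum_{s_1 = \pm 1} \sum_{s_2 = \pm 1} \cdots \sum_{s_N = \pm 1} e^{\beta \mu B \sum_{i=1}^{N} s_i} \tag{5.2} \\
&= \sum_{s_1 = \pm 1} \sum_{s_2 = \pm 1} \cdots \sum_{s_N = \pm 1} e^{\beta \mu B s_1} e^{\beta \mu B s_2} \cdots e^{\beta \mu B s_N} \notag \\
&= \sum_{s_1 = \pm 1} e^{\beta \mu B s_1} \sum_{s_2 = \pm 1} e^{\beta \mu B s_2} \cdots \sum_{s_N = \pm 1} e^{\beta \mu B s_N} \notag \\
&= \Big[ \sum_{s_1 = \pm 1} e^{\beta \mu B s_1} \Big]^N = Z_1^N. \tag{5.3}
\end{align}
To find $Z_1$ we write
\[
Z_1 = \sum_{s = \pm 1} e^{\beta \mu B s} = e^{\beta \mu B (-1)} + e^{\beta \mu B (+1)} = 2 \cosh \beta \mu B, \tag{5.4}
\]
where we have performed the sum over s = ±1. The partition function for N spins is simply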
\[
Z_N = (2 \cosh \beta \mu B)^N. \tag{5.5}
\]
We now use the canonical ensemble formalism that we developed in Section 4.6 to find the
thermodynamic properties of the system for a given T and B. In the following, we will use the
notation ⟨A⟩ instead of $\overline{A}$ to designate an ensemble average. We will also frequently omit the
brackets ⟨. . .⟩, because it will be clear from the context when an average is implied.
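The factorization of the N-spin partition function into N identical single-spin factors can be checked numerically for a small system. The following Python sketch is only an illustration: the function names and the parameter values (eight spins, β = 0.7, µ = 1, B = 1.5) are arbitrary assumptions. It sums the Boltzmann factor over all 2^N microstates and compares the result with (2 cosh βµB)^N.

```python
import itertools
import math

def partition_function_brute_force(n_spins, beta, mu, b_field):
    """Sum exp(-beta*E) over all 2**n_spins microstates, E = -mu*B*sum(s_i)."""
    z = 0.0
    for spins in itertools.product((-1, 1), repeat=n_spins):
        energy = -mu * b_field * sum(spins)
        z += math.exp(-beta * energy)
    return z

def partition_function_closed_form(n_spins, beta, mu, b_field):
    """Z_N = (2 cosh(beta*mu*B))**N for noninteracting spins."""
    return (2.0 * math.cosh(beta * mu * b_field)) ** n_spins

# Arbitrary illustrative values (assumptions of this sketch).
n_spins, beta, mu, b_field = 8, 0.7, 1.0, 1.5
z_sum = partition_function_brute_force(n_spins, beta, mu, b_field)
z_formula = partition_function_closed_form(n_spins, beta, mu, b_field)
print(z_sum, z_formula)
```

The brute-force sum grows as 2^N and is practical only for small N, which is why the closed form is so useful.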
The free energy is given by
\[
F = -kT \ln Z_N = -NkT \ln Z_1 = -NkT \ln (2 \cosh \beta \mu B). \tag{5.6}
\]
The mean energy E is
\[
E = -\frac{\partial \ln Z_N}{\partial \beta} = \frac{\partial (\beta F)}{\partial \beta} = -N \mu B \tanh \beta \mu B. \tag{5.7}
\]
Note that we have omitted the brackets because it is clear that E is the mean energy in the present
context. From (5.7) we see that E → 0 as T → ∞ (β → 0).
Problem 5.1. Compare the result (5.7) for the mean energy in the canonical ensemble to the
corresponding result that you found in Problem 4.25 in the microcanonical ensemble.
The heat capacity C is a measure of the change of the temperature due to the addition of
energy at constant magnetic ﬁeld. The heat capacity at constant magnetic ﬁeld can be expressed
as
\[
C = \frac{\partial E}{\partial T} = -k \beta^2 \frac{\partial E}{\partial \beta}. \tag{5.8}
\]
(We will write C rather than CB because no confusion will result.) From (5.7) and (5.8), we find
that the heat capacity of a system of N noninteracting spins is given by
\[
C = kN (\beta \mu B)^2\, \mathrm{sech}^2 \beta \mu B. \tag{5.9}
\]
Note that the heat capacity is always positive, goes to zero as T → 0 consistent with the third law
of thermodynamics, and goes to zero at high T .
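The limiting behavior of the heat capacity can be checked with a short numerical sketch. The Python code below is an illustration only: it assumes units in which k = µ = B = 1, and the function names and the sample temperatures are arbitrary choices. It evaluates the mean energy (5.7), the closed form C = kN(βµB)² sech² βµB that follows from (5.7) and (5.8), and a central-difference estimate of dE/dT for comparison.

```python
import math

K_BOLTZMANN = 1.0  # units with k = 1 (an assumption of this sketch)

def mean_energy(temperature, n_spins=1, mu=1.0, b_field=1.0):
    """E = -N mu B tanh(beta mu B), equation (5.7)."""
    beta = 1.0 / (K_BOLTZMANN * temperature)
    return -n_spins * mu * b_field * math.tanh(beta * mu * b_field)

def heat_capacity(temperature, n_spins=1, mu=1.0, b_field=1.0):
    """C = k N (beta mu B)^2 sech^2(beta mu B)."""
    x = mu * b_field / (K_BOLTZMANN * temperature)
    return K_BOLTZMANN * n_spins * x**2 / math.cosh(x) ** 2

def heat_capacity_numerical(temperature, dt=1e-5, **kwargs):
    """Central-difference estimate of dE/dT at fixed B."""
    e_plus = mean_energy(temperature + dt, **kwargs)
    e_minus = mean_energy(temperature - dt, **kwargs)
    return (e_plus - e_minus) / (2.0 * dt)

for t in (0.05, 0.5, 1.0, 5.0, 50.0):
    print(f"T = {t:6.2f}   C = {heat_capacity(t):.6e}   "
          f"dE/dT = {heat_capacity_numerical(t):.6e}")
```

The printed values are small at both ends of the temperature range and largest in between, where kT is comparable to µB.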
Magnetization and Susceptibility. Two additional macroscopic quantities of interest are
the mean magnetic moment or magnetization (in the z direction)
\[
\langle M \rangle = \mu \sum_{i=1}^{N} \langle s_i \rangle, \tag{5.10}
\]
and the isothermal susceptibility χ:
\[
\chi = \left( \frac{\partial M}{\partial B} \right)_T. \tag{5.11}
\]
The susceptibility χ is a measure of the change of the magnetization due to a change in the external
magnetic ﬁeld and is another example of a linear response function. We can express M and χ in
terms of derivatives of ln Z by noting that the total energy can be written in the general form as
\[
E = E_0 - MB, \tag{5.12}
\]
where E0 is the energy of interaction of the spins with themselves and −MB is the energy of
interaction of the spins with the magnetic field. (For noninteracting spins E0 = 0.) This form of
E implies that we can write Z in the form
\[
Z = \sum_s e^{-\beta (E_{0,s} - M_s B)}, \tag{5.13}
\]
where Ms and E0,s are the values of M and E0 in microstate s. From (5.13) we have
\[
\frac{\partial Z}{\partial B} = \sum_s \beta M_s\, e^{-\beta (E_{0,s} - M_s B)}, \tag{5.14}
\]
and hence the mean magnetization is given by
\begin{align}
\langle M \rangle &= \frac{1}{Z} \sum_s M_s\, e^{-\beta (E_{0,s} - M_s B)} \tag{5.15a} \\
&= \frac{1}{\beta Z} \frac{\partial Z}{\partial B} \tag{5.15b} \\
&= kT \frac{\partial \ln Z_N}{\partial B}. \tag{5.15c}
\end{align}
If we substitute the relation F = −kT ln Z, we obtain
\[
\langle M \rangle = -\frac{\partial F}{\partial B}. \tag{5.16}
\]
Often we are more interested in the mean magnetization per spin ⟨m⟩, which is simply
\[
\langle m \rangle = \frac{1}{N} \langle M \rangle. \tag{5.17}
\]
We leave it as an exercise for the reader to show that in the limit B → 0 (see Problem 5.31)
\[
\chi = \frac{1}{kT} \left[ \langle M^2 \rangle - \langle M \rangle^2 \right]. \tag{5.18}
\]
The quantity χ in (5.18) is the zero-field susceptibility.1 Note the similarity of the form (5.18) with
the form (4.85) for the heat capacity CV . That is, the response functions CV and χ are related to
the corresponding equilibrium ﬂuctuations in the system.
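The connection between the susceptibility and the fluctuations of the magnetization can be illustrated numerically for a few noninteracting spins. The Python sketch below is not part of the text: the function names, the choice k = 1, and the parameter values are assumptions. It enumerates all microstates, computes ⟨M⟩ and ⟨M²⟩ directly, and compares (⟨M²⟩ − ⟨M⟩²)/kT with a finite-difference estimate of ∂⟨M⟩/∂B. The zero-field case is the one stated in (5.18); a nonzero field is included as an additional check, which is expected to agree as well because each Ms does not depend on B.

```python
import itertools
import math

def magnetization_moments(n_spins, beta, mu, b_field):
    """Return (<M>, <M^2>) by summing over all microstates of N free spins."""
    z = m1 = m2 = 0.0
    for spins in itertools.product((-1, 1), repeat=n_spins):
        m_s = mu * sum(spins)                     # magnetization of microstate
        weight = math.exp(beta * m_s * b_field)   # exp(-beta*E) with E = -M_s B
        z += weight
        m1 += m_s * weight
        m2 += m_s**2 * weight
    return m1 / z, m2 / z

def chi_from_fluctuations(n_spins, beta, mu, b_field):
    """chi = (<M^2> - <M>^2)/kT, with k = 1 so that 1/kT = beta."""
    m1, m2 = magnetization_moments(n_spins, beta, mu, b_field)
    return beta * (m2 - m1**2)

def chi_from_derivative(n_spins, beta, mu, b_field, db=1e-5):
    """chi = d<M>/dB at fixed T, estimated by a central difference."""
    m_plus, _ = magnetization_moments(n_spins, beta, mu, b_field + db)
    m_minus, _ = magnetization_moments(n_spins, beta, mu, b_field - db)
    return (m_plus - m_minus) / (2.0 * db)

n_spins, beta, mu = 6, 0.8, 1.0   # arbitrary illustrative values
for b_field in (0.0, 0.5):
    print(b_field,
          chi_from_fluctuations(n_spins, beta, mu, b_field),
          chi_from_derivative(n_spins, beta, mu, b_field))
```

At zero field the fluctuation estimate reduces to Nµ²/kT for this model, since the spins are independent and ⟨M⟩ = 0.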
From (5.6) and (5.16) we ﬁnd that the magnetization of a system of noninteracting spins is
\[
M = N \mu \tanh \beta \mu B. \tag{5.19}
\]
The susceptibility can be calculated using (5.11) and (5.19) and is given by
\[
\chi = N \mu^2 \beta\, \mathrm{sech}^2 \beta \mu B. \tag{5.20}
\]
For high temperatures (small β), sech βµB → 1, and the leading behavior of χ is given by
\[
\chi \to N \mu^2 \beta = \frac{N \mu^2}{kT}. \qquad \mathrm{(high\ temperature)} \tag{5.21}
\]
The result (5.21) is known as the Curie form for the isothermal susceptibility and is commonly
observed for magnetic materials at high temperatures (kT ≫ µB).

We see that M is zero at B = 0 for all T, implying that the system is paramagnetic. For
B ≠ 0, we note that M → 0 as β → 0 (high T), which implies that χ → 0 as T → ∞. Because a
system of noninteracting spins is paramagnetic, such a model is not applicable to materials such as
iron that can have a nonzero magnetization even when the magnetic ﬁeld is zero. Ferromagnetism
is due to the interactions between the spins.
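The approach to the Curie form at high temperature can be seen by tabulating (5.19) to (5.21). The following Python sketch is illustrative only; it assumes units with k = µ = 1, a fixed field B = 1, and arbitrary sample temperatures, and the function names are hypothetical.

```python
import math

def magnetization(temperature, b_field, n_spins=1, mu=1.0, k=1.0):
    """M = N mu tanh(mu B / kT), equation (5.19)."""
    return n_spins * mu * math.tanh(mu * b_field / (k * temperature))

def susceptibility(temperature, b_field, n_spins=1, mu=1.0, k=1.0):
    """chi = N mu^2 beta sech^2(beta mu B), equation (5.20)."""
    beta = 1.0 / (k * temperature)
    return n_spins * mu**2 * beta / math.cosh(beta * mu * b_field) ** 2

def curie_susceptibility(temperature, n_spins=1, mu=1.0, k=1.0):
    """High-temperature (Curie) form chi = N mu^2 / kT, equation (5.21)."""
    return n_spins * mu**2 / (k * temperature)

b_field = 1.0
for t in (0.5, 2.0, 10.0, 100.0):
    ratio = susceptibility(t, b_field) / curie_susceptibility(t)
    print(f"kT/(mu B) = {t:6.1f}   M/(N mu) = {magnetization(t, b_field):.4f}   "
          f"chi/chi_Curie = {ratio:.4f}")
```

The ratio χ/χ_Curie equals sech² βµB, so it approaches 1 only when kT is much larger than µB, and the magnetization per spin falls toward zero in the same limit.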
Problem 5.2. (a) Plot the magnetization per spin as given by (5.19) and the heat capacity C
as given by (5.9) as a function of T . Give a simple argument why C must have a broad maximum
somewhere between T = 0 and T = ∞. (b) Plot the isothermal susceptibility χ versus T for ﬁxed
B and describe its limiting behavior for low and high T .
Problem 5.3. Calculate the entropy of a system of N noninteracting spins and discuss its limiting
behavior at low and high temperatures.
Problem 5.4. (a) Consider a solid containing N noninteracting paramagnetic atoms whose
magnetic moments can be aligned either parallel or antiparallel to the magnetic ﬁeld B . The system
is in equilibrium with a thermal bath at temperature T. The magnetic moment is µ = 9.274 × 10⁻²⁴
J/tesla. If B = 4 tesla, at what temperature are 75% of the spins oriented in the +z direction? (b)
Assume that N = 10²³, T = 1 K, and that B is increased quasistatically from 1 tesla to 10 tesla.
What is the magnitude of the energy transfer from the thermal bath? (c) If the system is now
thermally isolated at T = 1 K and B is quasistatically decreased from 10 tesla to 1 tesla, what is
the ﬁnal temperature of the system? This process is known as adiabatic demagnetization.
1 We will use the same notation for the zero-field isothermal susceptibility and the isothermal susceptibility in a
nonzero field because the distinction will be clear from the context.

5.2 Thermodynamics of magnetism

Note that the free energy F defined by the relation F = −kT ln Z implies that F is a function
of T, B, and N. The magnetization M fluctuates. It can be shown (see Appendix 5B) that the
magnetic work done on a magnetic system with magnetization M in an external magnetic field B
is given by dW = −M dB. For fixed N, we have the thermodynamic relation
\[ dF(T, B) = -S\,dT - M\,dB. \tag{5.22} \]
From (5.22) we obtain (5.16) for the magnetization in terms of the free energy. As an aside, we
note that if M is a constant and B is allowed to vary, we can define G = F + MB so that
\[ dG(T, M) = -S\,dT + B\,dM. \tag{5.23} \]

5.3 The Ising model

As we saw in Section 5.1, the absence of interactions between the spins implies that the system
can only be paramagnetic. The most important model of a system that exhibits a phase transition
is the Ising model, the harmonic oscillator (or fruit ﬂy) of statistical mechanics.2 The model was
proposed by Wilhelm Lenz (1888–1957) in 1920 and was solved exactly for the one-dimensional
case by his student Ernst Ising in 1925.3 Ising was very disappointed because the one-dimensional
case does not have a phase transition. Lars Onsager solved the Ising model exactly in 1944 for
two dimensions in the absence of an external magnetic field and showed that there was a phase
transition in two dimensions.4 The two-dimensional Ising model is the simplest model of a phase
transition.
In the Ising model the spin at every site is either up (+1) or down (−1). Unless otherwise
stated, the interaction is between nearest neighbors only and is given by −J if the spins are parallel
and +J if the spins are antiparallel. The total energy can be expressed in the form5
\[ E = -J \sum_{i,j=\mathrm{nn}(i)}^{N} s_i s_j - B \sum_{i=1}^{N} s_i, \qquad \text{(Ising model)} \tag{5.24} \]
where $s_i = \pm 1$ and J is known as the exchange constant. In the following, we will refer to s itself
as the spin.6 The first sum in (5.24) is over all pairs of spins that are nearest neighbors. The
interaction between two nearest neighbor spins is counted only once. We have implicitly assumed
that the external magnetic field is in the up or positive z direction. The factors of $\mu_0$ and g have
been incorporated into the quantity B which we will refer to as the magnetic field. In the same
spirit the magnetization becomes the net number of positive spins rather than the net magnetic
moment. A discussion of how magnetism occurs in matter is given in Appendix 5A.

2 Each year hundreds of papers are published that apply the Ising model to problems in such diverse fields as
neural networks, protein folding, biological membranes and social behavior.
3 A biographical note about Ising's life is at <www.bradley.edu/las/phy/personnel/isingobit.html> .
4 The model is sometimes known as the Lenz-Ising model. The history of the Ising model is discussed by
Stephen Brush.
5 If we interpret the spin as an operator, then the energy is really a Hamiltonian. The distinction is unimportant
in the present context.
6 Because the spin S is a quantum mechanical object, we expect that the commutator of the spin operator with
the Hamiltonian is nonzero. However, because the Ising model retains only the component of the spin along the
direction of the magnetic field, the commutator of the spin S with the Hamiltonian is zero, and we can treat the
spins in the Ising model as if they were classical.
In addition to the conceptual diﬃculties of statistical mechanics, there is no standard procedure
for calculating the partition function. In spite of the apparent simplicity of the Ising model, we can
ﬁnd exact solutions only in one dimension and in two dimensions in the absence of a magnetic ﬁeld.
In other cases we need to use approximation methods and computer simulations.7 In the following
section we will discuss the one-dimensional Ising model for which we can find an exact solution. In
Section 5.5 we will briefly discuss the nature of the exact solutions for the two-dimensional Ising
model. We will find that the two-dimensional Ising model exhibits a continuous phase transition.
We will also consider some straightforward simulations of the Ising model to gain more insight into
the behavior of the Ising model. In Section 5.6 we will discuss a simple approximation known as
mean-field theory that is applicable to a wide variety of systems. A more advanced treatment of
the Ising model is given in Chapter 9.

5.4 The Ising Chain

In the following we describe several methods for obtaining exact solutions of the one-dimensional
Ising model and introduce an additional physical quantity of interest.

5.4.1 Exact enumeration

The canonical ensemble is the natural choice for calculating the thermodynamic properties of the
Ising model. Because the spins are interacting, we no longer have the relation $Z_N = Z_1^N$, and
we have to calculate $Z_N$ directly. The calculation of the partition function $Z_N$ is straightforward
in principle. The goal is to enumerate all the microstates of the system and the corresponding
energies, calculate $Z_N$ for finite N, and then take the limit $N \to \infty$. The difficulty is that the total
number of states, $2^N$, is too many for N large. However, for the one-dimensional Ising model or
Ising chain, we can calculate $Z_N$ for small N and quickly see how to generalize to arbitrary N.
For a ﬁnite chain we need to specify the boundary condition for the spin at each end. One
possibility is to choose free ends so that the spin at each end has only one interaction (see Figure 5.1a). Another choice is toroidal boundary conditions as shown in Figure 5.1b. This choice
implies that the N th spin is connected to the ﬁrst spin so that the chain forms a ring. The choice
of boundary conditions does not matter in the thermodynamic limit, N → ∞.
In the absence of an external magnetic field, we will find that it is more convenient to choose
free boundary conditions when calculating Z directly. The energy of the Ising chain in the absence
of an external magnetic field is given explicitly by
\[ E = -J \sum_{i=1}^{N-1} s_i s_{i+1}. \qquad \text{(free boundary conditions)} \tag{5.25} \]
We begin by calculating the partition function for two spins. There are four possible states: both
spins up with energy −J, both spins down with energy −J, and two states with one spin up and
one spin down with energy +J (see Figure 5.2). Thus $Z_2$ is given by
\[ Z_2 = 2e^{\beta J} + 2e^{-\beta J} = 4\cosh \beta J. \tag{5.26} \]

7 In three dimensions it has been shown that the Ising model is NP-complete; that is, it is computationally intractable.
The three-dimensional Ising model (and the two-dimensional Ising model with next nearest-neighbor interactions in addition to the nearest-neighbor kind) falls into the same class as other hard problems such as the
traveling salesman problem.

Figure 5.1: (a) Example of free boundary conditions for N = 9 spins. The spins at each end
interact with only one spin. In contrast, all the other spins interact with two spins. (b) Example
of toroidal boundary conditions. The Nth spin interacts with the first spin so that the chain forms
a ring. As a result, all the spins have the same number of neighbors and the chain does not have
a surface.

Figure 5.2: The four possible configurations of the N = 2 Ising chain: ↑↑ (energy −J), ↓↓ (energy −J), ↑↓ (energy +J), and ↓↑ (energy +J).
In the same way we can enumerate the eight microstates for N = 3 (see Problem 5.6). We
find that
\begin{align*}
Z_3 &= 2e^{2\beta J} + 4 + 2e^{-2\beta J} \tag{5.27a} \\
    &= 2(e^{\beta J} + e^{-\beta J})^2 = 8(\cosh \beta J)^2 \tag{5.27b} \\
    &= (e^{\beta J} + e^{-\beta J}) Z_2 = (2\cosh \beta J) Z_2. \tag{5.27c}
\end{align*}
The relation (5.27c) between $Z_3$ and $Z_2$ suggests a general relation between $Z_N$ and $Z_{N-1}$:
\[ Z_N = (2\cosh \beta J) Z_{N-1} = 2 (2\cosh \beta J)^{N-1}. \tag{5.28} \]
We can derive the recursion relation (5.28) directly by writing $Z_N$ for the Ising chain in the
form
\[ Z_N = \sum_{s_1=\pm 1} \cdots \sum_{s_N=\pm 1} e^{\beta J \sum_{i=1}^{N-1} s_i s_{i+1}}. \tag{5.29} \]
The sum over the two possible states for each spin yields $2^N$ microstates. To understand the
meaning of the sums in (5.29), we write (5.29) for N = 3:
\[ Z_3 = \sum_{s_1=\pm 1} \sum_{s_2=\pm 1} \sum_{s_3=\pm 1} e^{\beta J s_1 s_2 + \beta J s_2 s_3}. \tag{5.30} \]
The sum over $s_3$ can be done independently of $s_1$ and $s_2$, and we have
\begin{align*}
Z_3 &= \sum_{s_1=\pm 1} \sum_{s_2=\pm 1} e^{\beta J s_1 s_2} \left[ e^{\beta J s_2} + e^{-\beta J s_2} \right] \tag{5.31a} \\
    &= \sum_{s_1=\pm 1} \sum_{s_2=\pm 1} e^{\beta J s_1 s_2}\, 2\cosh \beta J s_2 = 2 \sum_{s_1=\pm 1} \sum_{s_2=\pm 1} e^{\beta J s_1 s_2} \cosh \beta J. \tag{5.31b}
\end{align*}
We have used the fact that the cosh function is even and hence $\cosh \beta J s_2 = \cosh \beta J$, independently
of the sign of $s_2$. The sum over $s_1$ and $s_2$ in (5.31b) is straightforward, and we find
\[ Z_3 = (2\cosh \beta J) Z_2, \tag{5.32} \]
in agreement with (5.27c).
The analysis of (5.29) proceeds similarly. Note that spin N occurs only once in the exponential
and we have, independently of the value of $s_{N-1}$,
\[ \sum_{s_N=\pm 1} e^{\beta J s_{N-1} s_N} = 2\cosh \beta J. \tag{5.33} \]
Hence we can write $Z_N$ as
\[ Z_N = (2\cosh \beta J) Z_{N-1}. \tag{5.34} \]
Problem 5.5. Use the recursion relation (5.34) and the result (5.26) for $Z_2$ to confirm the result
(5.28) for $Z_N$.
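The enumeration described above can be carried out by brute force for small N. The following sketch is an illustration only: the function names are hypothetical and the coupling is passed as the dimensionless combination βJ. It sums the Boltzmann factor over all $2^N$ microstates of a free chain and compares the result with the closed form (5.28).

```python
import math
from itertools import product

def z_enumerate_free_chain(n_spins, beta_j):
    """Partition function of a free Ising chain in zero field by direct enumeration.

    beta_j is the dimensionless coupling J/kT. All 2**n_spins microstates are
    visited, so this is practical only for small n_spins.
    """
    total = 0.0
    for spins in product((1, -1), repeat=n_spins):
        # Sum of s_i s_{i+1} over the N - 1 bonds of a chain with free ends.
        bond_sum = sum(spins[i] * spins[i + 1] for i in range(n_spins - 1))
        total += math.exp(beta_j * bond_sum)
    return total

def z_closed_form(n_spins, beta_j):
    """Closed form (5.28): Z_N = 2 (2 cosh beta J)^(N-1)."""
    return 2.0 * (2.0 * math.cosh(beta_j)) ** (n_spins - 1)

if __name__ == "__main__":
    for n in range(2, 9):
        print(n, z_enumerate_free_chain(n, 0.5), z_closed_form(n, 0.5))
```

The cost of the enumeration doubles with every added spin, which is the practical reason for seeking the recursion relation.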
We can use the general result (5.28) for $Z_N$ to find the Helmholtz free energy:
\[ F = -kT \ln Z_N = -kT \left[ \ln 2 + (N - 1) \ln(2\cosh \beta J) \right]. \tag{5.35} \]
In the thermodynamic limit $N \to \infty$, the term proportional to N in (5.35) dominates, and we have
the desired result:
\[ F = -NkT \ln(2\cosh \beta J). \tag{5.36} \]
Problem 5.6. Enumerate the $2^N$ microstates for the N = 4 Ising chain and find the corresponding contributions to $Z_4$ for free boundary conditions. Then show that $Z_4$ satisfies the recursion
relation (5.34) for free boundary conditions.

Figure 5.3: The temperature dependence of the heat capacity C of an Ising chain in the absence
of an external magnetic field. (Vertical axis: specific heat; horizontal axis: kT/J.) At what value of kT/J does C exhibit a maximum? Explain.

Problem 5.7. (a) What is the ground state of the Ising chain? (b) What is the behavior of S in
the limits T → 0 and T → ∞? The answers can be found without doing an explicit calculation.
(c) Use (5.36) for F to verify the following results for the entropy S , the mean energy E , and the
heat capacity C of the Ising chain:
\[ S = Nk \left[ \ln(e^{2\beta J} + 1) - \frac{2\beta J}{1 + e^{-2\beta J}} \right]. \tag{5.37} \]
\[ E = -NJ \tanh \beta J. \tag{5.38} \]
\[ C = Nk (\beta J)^2 (\mathrm{sech}\, \beta J)^2. \tag{5.39} \]
Verify your answers for the limiting behavior of S given in part (b). A plot of the T dependence
of the heat capacity in the absence of a magnetic ﬁeld is given in Figure 5.3.
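A numerical consistency check of (5.36) through (5.39) can complement the analytical verification asked for in Problem 5.7. The sketch below is an illustration under stated assumptions: the function names are hypothetical, units are chosen so that k = 1, and the derivatives are estimated by central differences rather than computed analytically.

```python
import math

def free_energy(temperature, n=1, J=1.0):
    """F = -N kT ln(2 cosh beta J) from (5.36), with k = 1."""
    return -n * temperature * math.log(2.0 * math.cosh(J / temperature))

def entropy(temperature, n=1, J=1.0):
    """S from (5.37), with k = 1."""
    x = J / temperature  # x = beta J
    return n * (math.log(math.exp(2.0 * x) + 1.0) - 2.0 * x / (1.0 + math.exp(-2.0 * x)))

def mean_energy(temperature, n=1, J=1.0):
    """E = -N J tanh(beta J) from (5.38)."""
    return -n * J * math.tanh(J / temperature)

def heat_capacity(temperature, n=1, J=1.0):
    """C = N k (beta J)^2 sech^2(beta J) from (5.39), with k = 1."""
    x = J / temperature
    return n * x**2 / math.cosh(x) ** 2

if __name__ == "__main__":
    t, h = 1.3, 1e-5
    # S should equal -dF/dT and C should equal dE/dT.
    print(entropy(t), -(free_energy(t + h) - free_energy(t - h)) / (2 * h))
    print(heat_capacity(t), (mean_energy(t + h) - mean_energy(t - h)) / (2 * h))
```

Agreement of each pair of printed numbers indicates that the three expressions are mutually consistent with F = E − TS.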
∗ Problem 5.8. In Problem 4.19 the density of states was given (without proof) for the one-dimensional Ising model for even N and toroidal boundary conditions:
\[ \Omega(E, N) = 2 \binom{N}{i} = 2\, \frac{N!}{i!\,(N - i)!}, \qquad (i = 0, 2, 4, \ldots, N) \tag{4.19} \]
with $E = 2i - N$. (a) Use this form of $\Omega$ and the relation
\[ Z_N = \sum_{E} \Omega(E, N)\, e^{-\beta E} \tag{5.40} \]
to find the free energy for small values of (even) N. (b) Use the results for $Z_N$ that you found by
exact enumeration to find $\Omega(E, N)$ for small values of N.

5.4.2 *Spin-spin correlation function

We can gain further insight into the properties of the Ising model by calculating the spin-spin
correlation function G(r) defined as
\[ G(r) = \langle s_k s_{k+r} \rangle - \langle s_k \rangle \langle s_{k+r} \rangle. \tag{5.41} \]
Because the average of $s_k$ is independent of the choice of the site k and equals the magnetization
per spin m = M/N, G(r) can be written as
\[ G(r) = \langle s_k s_{k+r} \rangle - m^2. \tag{5.42} \]
The average denoted by the brackets $\langle \ldots \rangle$ is over all spin configurations. Because all lattice sites
are equivalent, G(r) is independent of the choice of k and depends only on the separation r (for
a given T and B), where r is the separation between spins in units of the lattice constant. Note
that for r = 0, $G(r) = \langle m^2 \rangle - \langle m \rangle^2 \propto \chi$ (see (5.18)).
The spin-spin correlation function tells us the degree to which a spin at one site is correlated
with a spin at another site. If the spins are not correlated, then G(r) = 0. At high temperatures
the interaction between spins is unimportant, and hence the spins are randomly oriented in the
absence of an external magnetic field. Thus in the limit $kT \gg J$, we expect that $G(r) \to 0$ for
ﬁxed r. For ﬁxed T and B , we expect that if spin k is up, then the two adjacent spins will have a
greater probability of being up than down. Why? As we move away from spin k , we expect that
the probability that spin k + r is up will decrease. Hence, we expect that G(r) → 0 as r → ∞.
We will show in the following that G(r) can be calculated exactly for the Ising chain. The
result is
\[ G(r) = (\tanh \beta J)^r. \tag{5.43} \]
A plot of G(r) for βJ = 2 is shown in Figure 5.4. Note that G(r) → 0 for increasing r as expected.
We will find it useful to define the correlation length ξ by writing G(r) in the form
\[ G(r) = e^{-r/\xi}. \tag{5.44} \]
For the one-dimensional Ising model
\[ \xi = -\frac{1}{\ln(\tanh \beta J)}. \tag{5.45} \]
At low temperatures, $\tanh \beta J \approx 1 - 2e^{-2\beta J}$, and
\[ \ln \tanh \beta J \approx -2e^{-2\beta J}. \tag{5.46} \]
Hence
\[ \xi = \frac{1}{2} e^{2\beta J}. \qquad (\beta J \gg 1) \tag{5.47} \]
From (5.47) we see that the correlation length becomes very large for low temperatures ($\beta J \gg 1$).
The correlation length gives the length scale for the decay of correlations between the spins.

Problem 5.9. What is the maximum value of tanh βJ? Show that for finite values of βJ, G(r)
given by (5.43) decays with increasing r.

Figure 5.4: Plot of the spin-spin correlation function G(r) as given by (5.43) for the Ising chain
for βJ = 2. (Vertical axis: G(r); horizontal axis: r.)

To calculate G(r), we assume free boundary conditions as before and consider only the zero-field case. It is convenient to generalize the Ising model and assume that the magnitude of each of
the nearest-neighbor interactions is arbitrary so that the total energy E is given by
\[ E = -\sum_{i=1}^{N-1} J_i s_i s_{i+1}, \tag{5.48} \]
where $J_i$ is the interaction energy between spin i and spin i + 1. At the end of the calculation we
will set $J_i = J$. We will find in Section 5.4.4 that m = 0 for T > 0 for the one-dimensional Ising
model. Hence, we can write $G(r) = \langle s_k s_{k+r} \rangle$. For the form (5.48) of the energy, $\langle s_k s_{k+r} \rangle$ is given
by
\[ \langle s_k s_{k+r} \rangle = \frac{1}{Z_N} \sum_{s_1=\pm 1} \cdots \sum_{s_N=\pm 1} s_k s_{k+r} \exp\left[ \sum_{i=1}^{N-1} \beta J_i s_i s_{i+1} \right], \tag{5.49} \]
where
\[ Z_N = 2 \prod_{i=1}^{N-1} 2\cosh \beta J_i. \tag{5.50} \]
The right-hand side of (5.49) is the sum over all configurations of the value of the product of two spins separated by a distance r in
a particular configuration times the probability of that configuration.
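We now use a trick similar to that used in Appendix A to calculate various integrals. If we
take the derivative of the exponential with respect to $J_k$, we bring down a factor of $\beta s_k s_{k+1}$. Hence,
the nearest-neighbor spin-spin correlation function $G(r = 1) = \langle s_k s_{k+1} \rangle$ for the Ising model with
$J_i = J$ can be expressed as
\begin{align*}
\langle s_k s_{k+1} \rangle &= \frac{1}{Z_N} \sum_{s_1=\pm 1} \cdots \sum_{s_N=\pm 1} s_k s_{k+1} \exp\left[ \sum_{i=1}^{N-1} \beta J_i s_i s_{i+1} \right] \tag{5.51} \\
&= \frac{1}{Z_N} \frac{1}{\beta} \frac{\partial}{\partial J_k} \sum_{s_1=\pm 1} \cdots \sum_{s_N=\pm 1} \exp\left[ \sum_{i=1}^{N-1} \beta J_i s_i s_{i+1} \right] \\
&= \frac{1}{Z_N} \frac{1}{\beta} \left. \frac{\partial Z_N(J_1, \cdots, J_{N-1})}{\partial J_k} \right|_{J_i = J} = \frac{\sinh \beta J}{\cosh \beta J} = \tanh \beta J, \tag{5.52}
\end{align*}
where we have used the form (5.50) for $Z_N$. To obtain G(r = 2), we use the fact that $s_{k+1}^2 = 1$ to
write $s_k s_{k+2} = s_k s_{k+1} s_{k+1} s_{k+2}$ and
\begin{align*}
G(r = 2) &= \frac{1}{Z_N} \sum_{\{s_j\}} s_k s_{k+1} s_{k+1} s_{k+2} \exp\left[ \sum_{i=1}^{N-1} \beta J_i s_i s_{i+1} \right] \tag{5.53} \\
&= \frac{1}{Z_N} \frac{1}{\beta^2} \frac{\partial^2 Z_N(J_1, \cdots, J_{N-1})}{\partial J_k\, \partial J_{k+1}} = [\tanh \beta J]^2. \tag{5.54}
\end{align*}
It is clear that the method used to obtain G(r = 1) and G(r = 2) can be generalized to
arbitrary r. We write
\[ G(r) = \frac{1}{Z_N} \frac{1}{\beta^r} \frac{\partial}{\partial J_k} \frac{\partial}{\partial J_{k+1}} \cdots \frac{\partial}{\partial J_{k+r-1}} Z_N, \tag{5.55} \]
and use (5.50) for $Z_N$ to find that
\begin{align*}
G(r) &= \tanh \beta J_k \tanh \beta J_{k+1} \cdots \tanh \beta J_{k+r-1} \\
&= \prod_{i=k}^{k+r-1} \tanh \beta J_i. \tag{5.56}
\end{align*}
For a uniform interaction, $J_i = J$, and (5.56) reduces to the result for G(r) in (5.43).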
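The result for G(r) can be checked against a direct sum over microstates for a short chain. The sketch below is illustrative: the function names are hypothetical, sites are numbered from 1, and the uniform coupling is given as βJ. It assumes free boundary conditions and zero field, as in the derivation.

```python
import math
from itertools import product

def correlation_by_enumeration(n_spins, beta_j, k, r):
    """<s_k s_{k+r}> for a free Ising chain in zero field, by summing over all microstates."""
    z = 0.0
    weighted = 0.0
    for spins in product((1, -1), repeat=n_spins):
        bonds = sum(spins[i] * spins[i + 1] for i in range(n_spins - 1))
        weight = math.exp(beta_j * bonds)          # Boltzmann factor of this microstate
        z += weight
        weighted += spins[k - 1] * spins[k - 1 + r] * weight
    return weighted / z

def correlation_length(beta_j):
    """Correlation length xi = -1 / ln(tanh beta J), as in (5.45)."""
    return -1.0 / math.log(math.tanh(beta_j))

if __name__ == "__main__":
    beta_j = 0.8
    for r in (1, 2, 3):
        print(r, correlation_by_enumeration(8, beta_j, 2, r), math.tanh(beta_j) ** r)
    # Compare xi with the low-temperature form exp(2 beta J) / 2 at beta J = 2.
    print(correlation_length(2.0), 0.5 * math.exp(4.0))
```

Because m = 0 in zero field, the enumerated average of the spin product is G(r) itself, and it should not depend on the choice of k as long as site k + r lies on the chain.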
Problem 5.10. Consider an Ising chain of N = 4 spins and calculate G(r) by exact enumeration
of the $2^4$ microstates. What are the possible values of r for free and toroidal boundary conditions?
Choose one of these boundary conditions and calculate G(r = 1) and G(r = 2) using the microstates
that were enumerated in Problem 5.6. Assume that the system is in equilibrium with a thermal
bath at temperature T and in zero magnetic field. (For convenience choose k = 1.)

5.4.3 Simulations of the Ising chain

Although we have found an exact solution for the one-dimensional Ising model, we can gain additional physical insight by doing simulations. As we will see, simulations are essential for the Ising
model in higher dimensions.
As we discussed in Section 4.11, the Metropolis algorithm is the simplest and most common
Monte Carlo algorithm for a system in equilibrium with a thermal bath at temperature T. In the
context of the Ising model, the Metropolis algorithm can be implemented as follows:
1. Choose an initial microstate of N spins. The two most common initial states are the ground
state with all spins parallel or the T = ∞ state where each spin is chosen to be ±1 at random.
2. Choose a spin at random and make a trial ﬂip. Compute the change in energy of the system,
∆E , corresponding to the ﬂip. The calculation is straightforward because the change in
energy is determined by only the nearest neighbor spins. If ∆E ≤ 0, then accept the change.
If ∆E > 0, accept the change with probability $p = e^{-\beta \Delta E}$. To do so, generate a random
number r uniformly distributed in the unit interval. If r ≤ p, accept the new microstate;
otherwise, retain the previous microstate.
3. Repeat step (2) many times choosing spins at random.
4. Compute the averages of the quantities of interest such as E , M , C , and χ after the
system has reached equilibrium.
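The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the applet used in the problems that follow: the function name and its parameters are hypothetical, the chain is assumed to have toroidal boundary conditions, the temperature is measured in units of J/k, and the number of sweeps discarded for equilibration is left to the caller.

```python
import math
import random

def metropolis_ising_chain(n_spins, temperature, n_sweeps, n_equil, seed=0, J=1.0, B=0.0):
    """Metropolis simulation of an Ising ring; returns mean energy and magnetization per spin."""
    rng = random.Random(seed)
    beta = 1.0 / temperature
    spins = [1] * n_spins                     # step 1: start in the ground state
    energy = -J * n_spins - B * n_spins       # energy of the all-up ring
    magnetization = n_spins
    e_sum = 0.0
    m_sum = 0.0
    for sweep in range(n_sweeps):
        for _ in range(n_spins):              # one Monte Carlo step per spin
            i = rng.randrange(n_spins)        # step 2: choose a spin at random
            left = spins[i - 1]               # index -1 wraps to the last spin
            right = spins[(i + 1) % n_spins]
            # Energy change of a trial flip; only the two neighbors and B matter.
            delta_e = 2.0 * spins[i] * (J * (left + right) + B)
            if delta_e <= 0 or rng.random() <= math.exp(-beta * delta_e):
                spins[i] = -spins[i]
                energy += delta_e
                magnetization += 2 * spins[i]
        if sweep >= n_equil:                  # step 4: accumulate after equilibration
            e_sum += energy
            m_sum += magnetization
    n_samples = n_sweeps - n_equil
    return e_sum / (n_samples * n_spins), m_sum / (n_samples * n_spins)

if __name__ == "__main__":
    e, m = metropolis_ising_chain(200, 2.0, 3000, 500, seed=1)
    print(e, -math.tanh(0.5), m)
```

In zero field the estimated energy per spin can be compared with the exact result (5.38), which for T = 2 in these units is −tanh 0.5. Storing the running energy and magnetization avoids recomputing them from scratch after every trial flip.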
In the following two problems we explore some of the qualitative properties of the Ising chain.
Problem 5.11. Use the applet/application at <stp.clarku.edu/simulations/ising1d> to simulate the one-dimensional Ising model. It is convenient to measure the temperature in units such
that J/k = 1. For example, a temperature of T = 2 really means that T = 2J/k . The “time”
is measured in terms of Monte Carlo steps per spin, where in one Monte Carlo step per spin, N
spins are chosen at random for trial changes. (On the average each spin will be chosen equally,
but during any ﬁnite interval, some spins might be chosen more than others.) Choose B = 0.
(a) Choose N = 500 spins and start the system at T = 2 and observe the evolution of the
magnetization and energy per spin to equilibrium. The initial state is chosen to be the ground
state. What is your criterion for equilibrium? What is the approximate relaxation time for
the system to reach equilibrium? What is the mean energy, magnetization, heat capacity, and
susceptibility? Estimate the mean size of the domains of parallel spins.
(b) Consider T = 1.0 and T = 0.5 and observe the size of the domains of parallel spins. Estimate
the mean size of the domains at these temperatures.
(c) Approximately how many spins should you choose to avoid ﬁnite size eﬀects at T = 0.5?
Problem 5.12. The thermodynamic quantities of interest for the Ising model include the mean
energy E , the speciﬁc heat C , and the isothermal susceptibility χ. We are especially interested in
the temperature dependence of these quantities near T = 0.
(a) Why is the mean value of the magnetization of little interest for the one-dimensional Ising
model?
(b) How can the specific heat and susceptibility be computed during the simulation at a given
temperature?
(c) Use the applet at <stp.clarku.edu/simulations/ising1d> to estimate these quantities and
determine the qualitative dependence of χ and the correlation length ξ on T at low temperatures.
(d) Why does the Metropolis algorithm become inefficient at low temperatures?

5.4.4 *Transfer matrix

So far we have considered the Ising chain only in zero external magnetic field. As might be expected,
the solution for $B \neq 0$ is more difficult. We now apply the transfer matrix method to solve for
the thermodynamic properties of the Ising chain in nonzero magnetic ﬁeld. The transfer matrix
method is very general and can be applied to various magnetic systems and to seemingly unrelated
quantum mechanical systems. The transfer matrix method also is of historical interest because it
led to the exact solution of the two-dimensional Ising model in the absence of a magnetic field.
To apply the transfer matrix method to the one-dimensional Ising model, it is necessary to
adopt toroidal boundary conditions so that the chain becomes a ring with $s_{N+1} = s_1$. This
boundary condition enables us to write the energy as:
\[ E = -J \sum_{i=1}^{N} s_i s_{i+1} - \frac{1}{2} B \sum_{i=1}^{N} (s_i + s_{i+1}). \qquad \text{(toroidal boundary conditions)} \tag{5.57} \]
The use of toroidal boundary conditions implies that each spin is equivalent.
The transfer matrix $\mathbf{T}$ is defined by its four matrix elements which are given by
\[ T_{s,s'} = e^{\beta [J s s' + \frac{1}{2} B (s + s')]}. \tag{5.58} \]
The explicit form of the matrix elements is
\begin{align*}
T_{++} &= e^{\beta (J + B)} \tag{5.59a} \\
T_{--} &= e^{\beta (J - B)} \tag{5.59b} \\
T_{-+} &= T_{+-} = e^{-\beta J}, \tag{5.59c}
\end{align*}
or
\[ \mathbf{T} = \begin{pmatrix} T_{++} & T_{+-} \\ T_{-+} & T_{--} \end{pmatrix} = \begin{pmatrix} e^{\beta (J + B)} & e^{-\beta J} \\ e^{-\beta J} & e^{\beta (J - B)} \end{pmatrix}. \tag{5.60} \]
The definition (5.58) of $\mathbf{T}$ allows us to write $Z_N$ in the form
\[ Z_N(T, B) = \sum_{s_1} \sum_{s_2} \cdots \sum_{s_N} T_{s_1,s_2} T_{s_2,s_3} \cdots T_{s_N,s_1}. \tag{5.61} \]
The form of (5.61) is suggestive of our interpretation of $\mathbf{T}$ as a transfer function.
The rule for matrix multiplication that we need for the transfer matrix method is
\[ (\mathbf{T}^2)_{s_1,s_3} = \sum_{s_2} T_{s_1,s_2} T_{s_2,s_3}. \tag{5.62} \]
If we multiply N matrices together, we obtain:
\[ (\mathbf{T}^N)_{s_1,s_{N+1}} = \sum_{s_2} \sum_{s_3} \cdots \sum_{s_N} T_{s_1,s_2} T_{s_2,s_3} \cdots T_{s_N,s_{N+1}}. \tag{5.63} \]
This result is very close to what we have in (5.61). To make it identical, we use periodic (toroidal) boundary
conditions and set $s_{N+1} = s_1$, and sum over $s_1$:
\[ \sum_{s_1} (\mathbf{T}^N)_{s_1,s_1} = \sum_{s_1} \sum_{s_2} \sum_{s_3} \cdots \sum_{s_N} T_{s_1,s_2} T_{s_2,s_3} \cdots T_{s_N,s_1} = Z_N. \tag{5.64} \]
Because $\sum_{s_1} (\mathbf{T}^N)_{s_1,s_1}$ is the definition of the trace (the sum of the diagonal elements) of $\mathbf{T}^N$,
we have
\[ Z_N = \mathrm{trace}(\mathbf{T}^N). \tag{5.65} \]
Because the trace of a matrix is independent of the representation of the matrix, the trace in
(5.65) may be evaluated by bringing $\mathbf{T}$ into diagonal form:
\[ \mathbf{T} = \begin{pmatrix} \lambda_+ & 0 \\ 0 & \lambda_- \end{pmatrix}. \tag{5.66} \]
The matrix $\mathbf{T}^N$ is diagonal with the diagonal matrix elements $\lambda_+^N$, $\lambda_-^N$. If we choose the diagonal
representation for $\mathbf{T}$ in (5.66), we have
\[ \mathrm{trace}(\mathbf{T}^N) = \lambda_+^N + \lambda_-^N, \tag{5.67} \]
where $\lambda_+$ and $\lambda_-$ are the eigenvalues of $\mathbf{T}$. Hence, we can express $Z_N$ as
\[ Z_N = \lambda_+^N + \lambda_-^N. \tag{5.68} \]
The fact that $Z_N$ is the trace of the Nth power of a matrix is a consequence of our assumption of
toroidal boundary conditions.
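The identity $Z_N = \mathrm{trace}(\mathbf{T}^N)$ can be confirmed for a small ring by comparing the trace with a direct sum over microstates. The sketch below is illustrative only: the function names are hypothetical, the 2×2 matrices are stored as nested lists with index 0 for s = +1 and index 1 for s = −1, and the matrix power is formed by repeated multiplication rather than by diagonalization.

```python
import math
from itertools import product

def transfer_matrix(beta, J, B):
    """2x2 transfer matrix of (5.60); index 0 is s = +1 and index 1 is s = -1."""
    return [[math.exp(beta * (J + B)), math.exp(-beta * J)],
            [math.exp(-beta * J), math.exp(beta * (J - B))]]

def matmul2(a, b):
    """Product of two 2x2 matrices, following the rule (5.62)."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def z_trace(n_spins, beta, J, B):
    """Z_N = trace(T^N), as in (5.65)."""
    t = transfer_matrix(beta, J, B)
    power = [[1.0, 0.0], [0.0, 1.0]]          # identity matrix
    for _ in range(n_spins):
        power = matmul2(power, t)
    return power[0][0] + power[1][1]

def z_enumerate_ring(n_spins, beta, J, B):
    """Z_N for a ring by summing exp(-beta E) over all 2**n_spins microstates."""
    total = 0.0
    for spins in product((1, -1), repeat=n_spins):
        energy = 0.0
        for i in range(n_spins):
            # Bond between spin i and its right neighbor (with wraparound), plus the field term.
            energy -= J * spins[i] * spins[(i + 1) % n_spins] + B * spins[i]
        total += math.exp(-beta * energy)
    return total

if __name__ == "__main__":
    print(z_trace(6, 0.7, 1.0, 0.3), z_enumerate_ring(6, 0.7, 1.0, 0.3))
```

The trace requires only N matrix multiplications, whereas the enumeration visits $2^N$ microstates.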
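The eigenvalues $\lambda_\pm$ are given by the solution of the determinant equation
\[ \begin{vmatrix} e^{\beta (J + B)} - \lambda & e^{-\beta J} \\ e^{-\beta J} & e^{\beta (J - B)} - \lambda \end{vmatrix} = 0. \tag{5.69} \]
The roots of (5.69) are
\[ \lambda_\pm = e^{\beta J} \cosh \beta B \pm \left[ e^{-2\beta J} + e^{2\beta J} \sinh^2 \beta B \right]^{1/2}. \tag{5.70} \]
It is easy to show that $\lambda_+ > \lambda_-$ for all B and β, and consequently $(\lambda_-/\lambda_+)^N \to 0$ as $N \to \infty$. In
the thermodynamic limit ($N \to \infty$), we obtain from (5.68) and (5.70)
\[ \frac{1}{N} \ln Z_N(T, B) = \ln \lambda_+ + \frac{1}{N} \ln\left[ 1 + \left( \frac{\lambda_-}{\lambda_+} \right)^N \right] \to \ln \lambda_+ \quad (N \to \infty), \tag{5.71} \]
and the free energy per spin is given by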
\[ \frac{1}{N} F(T, B) = -kT \ln\left[ e^{\beta J} \cosh \beta B + \left( e^{2\beta J} \sinh^2 \beta B + e^{-2\beta J} \right)^{1/2} \right]. \tag{5.72} \]
We can use (5.72) to find the magnetization M at nonzero T and B:
\[ M = -\frac{\partial F}{\partial B} = N \frac{\sinh \beta B}{(\sinh^2 \beta B + e^{-4\beta J})^{1/2}}. \tag{5.73} \]
We know that a system is paramagnetic if $M \neq 0$ only for $B \neq 0$, and is ferromagnetic if $M \neq 0$
for B = 0. For the one-dimensional Ising model, we see from (5.73) that M = 0 for B = 0, and
there is no spontaneous magnetization at nonzero temperature. (Recall that sinh x ≈ x for small
x.) That is, the one-dimensional Ising model undergoes a phase transition from the paramagnetic
to the ferromagnetic state only at T = 0. In the limit of low temperature ($\beta J \gg 1$ and $\beta B \gg 1$),
$\sinh \beta B \approx \frac{1}{2} e^{\beta B} \gg e^{-2\beta J}$ and $m = M/N \approx 1$ for $B \neq 0$. Hence, at low temperatures only a small
field is needed to produce saturation, corresponding to m = 1.
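The eigenvalues (5.70), the free energy per spin (5.72), and the magnetization (5.73) can be evaluated directly. The following sketch is illustrative: the function names are hypothetical, k = 1 is assumed, and the magnetization is compared with a central-difference estimate of −∂F/∂B as a consistency check.

```python
import math

def eigenvalues(beta, J, B):
    """Eigenvalues lambda_+ and lambda_- of the transfer matrix, as in (5.70)."""
    center = math.exp(beta * J) * math.cosh(beta * B)
    root = math.sqrt(math.exp(-2.0 * beta * J) + math.exp(2.0 * beta * J) * math.sinh(beta * B) ** 2)
    return center + root, center - root

def free_energy_per_spin(temperature, J, B):
    """F/N = -kT ln(lambda_+) in the thermodynamic limit, as in (5.72), with k = 1."""
    lam_plus, _ = eigenvalues(1.0 / temperature, J, B)
    return -temperature * math.log(lam_plus)

def magnetization_per_spin(temperature, J, B):
    """m = M/N from (5.73), with k = 1."""
    beta = 1.0 / temperature
    s = math.sinh(beta * B)
    return s / math.sqrt(s * s + math.exp(-4.0 * beta * J))

if __name__ == "__main__":
    t, J, B, h = 1.5, 1.0, 0.3, 1e-5
    numeric = -(free_energy_per_spin(t, J, B + h) - free_energy_per_spin(t, J, B - h)) / (2 * h)
    print(magnetization_per_spin(t, J, B), numeric)
    # At low temperature a small field already gives m close to 1.
    print(magnetization_per_spin(0.2, 1.0, 0.01))
```

The last line illustrates the statement that only a small field is needed to produce saturation at low temperature.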
Problem 5.13. More insight into the properties of the Ising chain in nonzero magnetic ﬁeld can be
found by calculating the isothermal susceptibility χ. Calculate χ using (5.73). What is the limiting
behavior of χ in the limit T → 0? Express this limiting behavior in terms of the correlation length
ξ.

5.4.5 Absence of a phase transition in one dimension

We learned in Section 5.4.4 that the one-dimensional Ising model does not have a phase transition
except at T = 0. We now argue that a phase transition in one dimension is impossible if the
interaction is short-range, that is, if only a finite number of spins interact with one another.
At T = 0 the energy is a minimum with E = −(N − 1)J (for free boundary conditions), and
the entropy S = 0.8 Consider all the excitations at T > 0 obtained by ﬂipping all the spins to the
right of some site (see Figure 5.5(a)). The energy cost of creating such a domain wall is 2J . Because
there are N − 1 sites where the wall may be placed, the entropy increases by ∆S = k ln(N − 1).
Hence, the free energy cost associated with creating one domain wall is
∆F = 2J − kT ln(N − 1). (5.74) We see from (5.74) that for T > 0 and N → ∞, the creation of a domain wall lowers the free
energy. Hence, more domain walls will be created until the spins are completely randomized and
the net magnetization is zero. We conclude that M = 0 for T > 0 in the limit N → ∞.
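The competition between the energy and entropy terms in (5.74) can be made concrete with a few numbers. The sketch below is an illustration only: the function names are hypothetical and the defaults J = 1 and k = 1 are assumptions. It evaluates ∆F and the chain length beyond which ∆F becomes negative, which follows from setting (5.74) equal to zero.

```python
import math

def domain_wall_free_energy(n_spins, temperature, J=1.0, k=1.0):
    """Free energy cost (5.74) of one domain wall: 2J - kT ln(N - 1)."""
    return 2.0 * J - k * temperature * math.log(n_spins - 1)

def crossover_size(temperature, J=1.0, k=1.0):
    """Chain length N at which (5.74) changes sign: N - 1 = exp(2J/kT)."""
    return 1.0 + math.exp(2.0 * J / (k * temperature))

if __name__ == "__main__":
    for temperature in (2.0, 1.0, 0.5, 0.25):
        print(temperature, crossover_size(temperature),
              domain_wall_free_energy(10**6, temperature))
```

The crossover length grows exponentially as T decreases, but for any T > 0 it is finite, so a sufficiently long chain always lowers its free energy by forming a domain wall.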
Problem 5.14. Compare the energy of the conﬁguration in Figure 5.5(a) with the energy of the
conﬁguration shown in Figure 5.5(b) and discuss why the number of spins in a domain in one
dimension can be changed without the cost of energy.
8 The ground state for B = 0 corresponds to all spins up or all spins down. It is convenient to break this symmetry
by assuming that $B = 0^+$ and letting T → 0 before setting B = 0.

Figure 5.5: A domain wall in one dimension for a system of N = 8 spins. In (a) the energy of the
system is E = −5J with free boundary conditions. Because the energy of the ground state equals
−7J, the energy cost for forming a domain wall is 2J. In (b) the domain wall has moved with no
cost in energy.

5.5 The Two-Dimensional Ising Model

We first give an argument similar to the one given in Appendix 5C to suggest the existence of
a phase transition (to ferromagnetism) in two dimensions. We need to show that the mean value
of the magnetization is nonzero at low, but nonzero temperatures and in zero magnetic ﬁeld.
The key difference between the one- and two-dimensional cases is that in one dimension, the
existence of one domain wall allows the system to have regions of up and down spins, and the size
of each region can be changed without any cost of energy. So on the average the number of up
and down spins is the same. In two dimensions the existence of one domain does not make the
magnetization zero. The regions of down spins cannot grow at low temperature because expansion
requires longer boundaries and hence more energy.
In two dimensions the points between pairs of spins of opposite signs can be joined to form
boundary lines dividing the lattice into domains (see Figure 5.6). The net magnetization is proportional to the area of the positive domains minus the area of the negative domains. At T = 0
all the spins are in the same (positive) direction and there are no boundary lines. At T > 0, there
is suﬃcient energy to create boundary lines and negative domains will appear. If the perimeter
of a negative domain is b, then the energy needed to create it is 2Jb. Hence, the probability of
having a negative domain is proportional to $e^{-2\beta bJ}$. Because b must be at least 4, negative regions of large area
are unlikely at low T . Therefore most of the spins will remain positive, and the magnetization
remains positive. Hence M > 0 for T > 0, and the system is ferromagnetic. We will ﬁnd in the
following that M becomes zero at a critical temperature $T_c > 0$.

5.5.1 Onsager solution

The two-dimensional Ising model was solved exactly in zero magnetic field for a rectangular lattice
by Lars Onsager in 1944.9 Onsager's calculation was the first exact solution that exhibited a phase
transition in a model with short-range interactions. Before his calculation, some people believed
that statistical mechanics was not capable of yielding a phase transition.
Although Onsager's solution is of much historical interest, the mathematical manipulations
are very involved. Moreover, the manipulations are special to the Ising model and cannot be
generalized to other systems. For these reasons few workers in statistical mechanics have gone
through the Onsager solution in great detail. (It is probably true that fewer people understand the
Onsager solution of the two-dimensional Ising model than understand Einstein's theory of general
relativity.) In the following, we give only the results of the two-dimensional solution for a square
lattice and concentrate on approximation methods of more general applicability.

9 A short biography of Onsager can be found at <www.nobel.se/chemistry/laureates/1968/onsagerbio.html> .

Figure 5.6: Example of a domain wall in the two-dimensional Ising model.
The critical temperature $T_c$ is given by
\[ \sinh \frac{2J}{kT_c} = 1, \tag{5.75} \]
or
\[ kT_c/J = \frac{2}{\ln(1 + \sqrt{2})} \approx 2.269. \tag{5.76} \]
It is convenient to express the mean energy in terms of the dimensionless parameter κ defined as
\[ \kappa = 2\, \frac{\sinh 2\beta J}{(\cosh 2\beta J)^2}. \tag{5.77} \]
A plot of the parameter κ versus βJ is given in Figure 5.7. Note that κ is zero at low and high
temperatures and has a maximum of unity at $T = T_c$.
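The numerical value in (5.76) and the statement that κ reaches unity at $T_c$ are quick to confirm. The sketch below is illustrative: the function names are hypothetical, and the defaults J = 1 and k = 1 are assumptions.

```python
import math

def critical_temperature(J=1.0, k=1.0):
    """T_c of the square-lattice Ising model from (5.76): kT_c/J = 2 / ln(1 + sqrt(2))."""
    return 2.0 * J / (k * math.log(1.0 + math.sqrt(2.0)))

def kappa(beta_j):
    """Dimensionless parameter kappa of (5.77) as a function of beta J."""
    return 2.0 * math.sinh(2.0 * beta_j) / math.cosh(2.0 * beta_j) ** 2

if __name__ == "__main__":
    t_c = critical_temperature()
    print(t_c)                      # kT_c/J
    print(math.sinh(2.0 / t_c))     # condition (5.75)
    print(kappa(1.0 / t_c))         # value of kappa at T_c
```

At $T_c$ we have sinh 2βJ = 1 and hence $\cosh^2 2\beta J = 2$, so κ = 2 · 1/2 = 1, in agreement with the printed value.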
The exact solution for the energy E can be written in the form
\[ E = -2NJ \tanh 2\beta J - NJ\, \frac{\sinh^2 2\beta J - 1}{\sinh 2\beta J \cosh 2\beta J} \left[ \frac{2}{\pi} K_1(\kappa) - 1 \right], \tag{5.78} \]
where
\[ K_1(\kappa) = \int_0^{\pi/2} \frac{d\phi}{\sqrt{1 - \kappa^2 \sin^2 \phi}}. \tag{5.79} \]
$K_1$ is known as the complete elliptic integral of the first kind. The first term in (5.78) is similar to
the result (5.38) for the energy of the one-dimensional Ising model with a doubling of the exchange
interaction J for two dimensions. The second term in (5.78) vanishes at low and high temperatures
(because of the term in brackets) and at $T = T_c$ because of the vanishing of the term $\sinh^2 2\beta J - 1$.
However, $K_1(\kappa)$ has a logarithmic singularity at $T = T_c$ at which κ = 1. Hence, the entire second
term behaves as $(T - T_c) \ln |T - T_c|$ in the vicinity of $T_c$. We conclude that E(T) is continuous at
$T = T_c$ and at all other temperatures.

Figure 5.7: Plot of the function κ defined in (5.77) as a function of J/kT.
The heat capacity can be obtained by differentiating E(T) with respect to temperature. It
can be shown after some tedious algebra that
\[ C(T) = Nk\, \frac{4}{\pi} (\beta J \coth 2\beta J)^2 \left\{ K_1(\kappa) - E_1(\kappa) - (1 - \tanh^2 2\beta J) \left[ \frac{\pi}{2} + (2 \tanh^2 2\beta J - 1) K_1(\kappa) \right] \right\}, \tag{5.80} \]
where
\[ E_1(\kappa) = \int_0^{\pi/2} d\phi\, \sqrt{1 - \kappa^2 \sin^2 \phi}. \tag{5.81} \]
$E_1$ is the complete elliptic integral of the second kind. Near $T_c$, C is given by
\[ C \approx -Nk\, \frac{2}{\pi} \left( \frac{2J}{kT_c} \right)^2 \ln \left| 1 - \frac{T}{T_c} \right| + \text{constant}. \qquad (T \approx T_c) \tag{5.82} \]
The most important property of the Onsager solution is that the heat capacity diverges logarithmically at $T = T_c$:
\[ C(T) \sim \ln |\epsilon|, \tag{5.83} \]
where the reduced temperature difference is given by
\[ \epsilon = (T_c - T)/T_c. \tag{5.84} \]
A major test of the approximate treatments that we will develop in Section 5.6 and in Chapter 9
is whether they can yield a heat capacity that diverges as in (5.83).

Figure 5.8: The temperature dependence of the spontaneous magnetization of the two-dimensional
Ising model. (Vertical axis: m; horizontal axis: T, with $T_c$ marked.)
To know whether the logarithmic divergence of the heat capacity at $T = T_c$ is associated with
a phase transition, we need to know if there is a spontaneous magnetization. That is, is there
a range of T > 0 such that $M \neq 0$ for B = 0? However, Onsager's solution is limited to zero
magnetic field. To calculate the spontaneous magnetization, we need to calculate the derivative
of the free energy with respect to B for finite B and then let B → 0. The exact behavior of the
two-dimensional Ising model as a function of the magnetic field B is not known. In 1952, Yang
was able to calculate the magnetization for $T < T_c$ and the zero-field susceptibility.10 Yang's exact
result for the magnetization per spin can be expressed as
\[ m(T) = \begin{cases} 0 & T > T_c \\ \left[ 1 - (\sinh 2\beta J)^{-4} \right]^{1/8} & T < T_c \end{cases} \tag{5.85} \]
A graph of m is shown in Figure 5.8. We see that m vanishes near $T_c$ as $m \sim \epsilon^{1/8}$. The magnetization m is an example of an order parameter. The order parameter provides a signature of the
order, that is, m = 0 for $T > T_c$ (disordered state) and $m \neq 0$ for $T < T_c$ (ordered state).
The behavior of the zero-field susceptibility as $T \to T_c$ is given by
\[ \chi \sim |\epsilon|^{-7/4}. \tag{5.86} \]
The most important results of the exact solution of the two-dimensional Ising model are that
the energy (and the free energy and the entropy) are continuous functions for all T, m vanishes
continuously at $T = T_c$, the heat capacity diverges logarithmically at $T = T_c$, and the zero-field susceptibility diverges as a power law. When we discuss phase transitions in more detail
in Chapter 9, we will understand that the paramagnetic ↔ ferromagnetic transition in the two-dimensional Ising model is continuous. That is, the order parameter m vanishes continuously rather
than discontinuously. Because the transition occurs only at $T = T_c$ and B = 0, the transition occurs
at a critical point.

10 The result (5.85) was first announced by Onsager at a conference in 1944 but not published. Yang is the
same person who together with Lee shared the 1957 Nobel Prize in Physics for work on parity violation. See
<nobelprize.org/physics/laureates/1957/> .
The spin-spin correlation function $G(r)$ cannot be expressed in terms of simple analytical expressions for all $r$ and all $T$. However, the general behavior of $G(r)$ for $T$ near $T_c$ is known to be
$$G(r) \sim \frac{1}{r^{d-2+\eta}}\, e^{-r/\xi}, \qquad (\text{large } r \text{ and } |\epsilon| \ll 1) \tag{5.87}$$
where $d$ is the spatial dimension and $\eta$ is another critical exponent. The correlation length $\xi$ diverges as
$$\xi \sim |\epsilon|^{-\nu}. \tag{5.88}$$
The exact result for the critical exponent $\nu$ for the two-dimensional Ising model is $\nu = 1$. At $T = T_c$, $G(r)$ decays as a power law:
$$G(r) = \frac{1}{r^{\eta}}. \qquad (r \gg 1 \text{ and } T = T_c) \tag{5.89}$$
For the two-dimensional Ising model the exponent in the power-law behavior (5.89) is $\eta = 1/4$. The values of the various critical exponents for the Ising model in two and three dimensions are summarized in Table 5.1.
quantity                      exponent   d = 2 (exact)     d = 3      mean-field
specific heat                 α          0 (logarithmic)   0.113      0 (jump)
order parameter               β          1/8               0.324      1/2
susceptibility                γ          7/4               1.238      1
equation of state (T = Tc)    δ          15                4.82       3
                              η          1/4               0.031(5)   0
correlation length            ν          1                 0.629(4)   1/2
power law decay at T = Tc     η          1/4               0.04       0

Table 5.1: Values of the static critical exponents for the Ising model.
There is a fundamental difference between the exponential behavior of $G(r)$ for $T \neq T_c$ in (5.87) and the power law behavior of $G(r)$ for $T = T_c$ in (5.89). Systems with correlation functions that decay as a power law are said to be scale invariant. That is, power laws look the same on all scales. The replacement $x \to ax$ in the function $f(x) = Ax^{-\eta}$ yields a function $g(x)$ that is indistinguishable from $f(x)$ except for a change in the amplitude $A$ by the factor $a^{-\eta}$. In contrast, this invariance does not hold for functions that decay exponentially because making the replacement $x \to ax$ in the function $e^{-x/\xi}$ changes the correlation length $\xi$ by the factor $a$. The fact that the critical point is scale invariant is the basis for the renormalization group method considered in Chapter 9.
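The contrast between the two cases can be written out in one line each. The following worked equations are a short sketch that uses only the two functions named above; the symbol $\xi'$ for the rescaled length is our notation, not the text's.

```latex
\begin{align*}
f(ax) &= A\,(ax)^{-\eta} = a^{-\eta}\,A\,x^{-\eta} = a^{-\eta} f(x), \\
e^{-(ax)/\xi} &= e^{-x/\xi'}, \qquad \xi' = \xi/a .
\end{align*}
```

The power law keeps its functional form and only its amplitude changes, whereas the exponential acquires a different characteristic length.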
We stress that the phase transition in the Ising model is the result of the cooperative interactions between the spins. Phase transitions are of special interest in physics. Although phase transitions are commonplace, they are remarkable from a microscopic point of view. How does the behavior of the system change so remarkably with a small change in the temperature even though the interactions between the spins remain unchanged and short-range? The study of phase transitions in relatively simple systems such as the Ising model has helped us begin to understand phenomena as diverse as the distribution of earthquakes, the shape of snowflakes, and the transition from a boom economy to a recession.

5.5.2 Computer simulation of the two-dimensional Ising model

The implementation of the Metropolis algorithm for the two-dimensional model proceeds as in one dimension. The only difference is that an individual spin interacts with four nearest neighbors on a square lattice rather than only two as in one dimension. Simulations of the Ising model in two dimensions allow us to compare our approximate results with the known exact results. Moreover, we can determine properties that cannot be calculated analytically. We explore some of the properties of the two-dimensional Ising model in Problem 5.15.
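For readers who want to experiment outside the applet, the following is a minimal sketch of single-spin-flip Metropolis sampling on an $L \times L$ square lattice with periodic boundary conditions. It is not the applet's code. The function name `metropolis_ising_2d`, its parameters, the choice $J = k = 1$ and $B = 0$, and the rule of discarding the first half of the sweeps as equilibration are all assumptions made for illustration.

```python
import math
import random


def metropolis_ising_2d(L=16, T=2.0, sweeps=200, seed=0):
    """Return (mean |m|, mean energy per spin) for the 2D Ising model.

    Assumed units: J = k = 1 and B = 0, so T is measured in units of J/k.
    """
    rng = random.Random(seed)
    # Start with all spins parallel, as in Problem 5.15.
    s = [[1] * L for _ in range(L)]
    N = L * L
    # For B = 0 the only positive energy changes are dE = 4 and dE = 8.
    boltz = {dE: math.exp(-dE / T) for dE in (4, 8)}

    # Initial energy: count each bond once (right and down neighbors).
    E = 0
    for i in range(L):
        for j in range(L):
            E -= s[i][j] * (s[i][(j + 1) % L] + s[(i + 1) % L][j])
    M = N

    m_sum = 0.0
    e_sum = 0.0
    n_meas = 0
    for sweep in range(sweeps):
        for _ in range(N):
            i = rng.randrange(L)
            j = rng.randrange(L)
            # Sum of the four nearest neighbors (periodic boundaries).
            nn = (s[(i + 1) % L][j] + s[(i - 1) % L][j]
                  + s[i][(j + 1) % L] + s[i][(j - 1) % L])
            dE = 2 * s[i][j] * nn  # energy change if spin (i, j) is flipped
            if dE <= 0 or rng.random() < boltz[dE]:
                s[i][j] = -s[i][j]
                E += dE
                M += 2 * s[i][j]
        if sweep >= sweeps // 2:  # discard the first half as equilibration
            m_sum += abs(M) / N
            e_sum += E / N
            n_meas += 1
    return m_sum / n_meas, e_sum / n_meas
```

Calling `metropolis_ising_2d(L=32, T=T)` for a sequence of temperatures gives rough estimates of the quantities explored in the problems that follow. The absolute value of the magnetization is averaged because on a finite lattice the sign of $M$ can change during a run.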
Problem 5.15. Use the applet at <stp.clarku.edu/simulations/ising2d/> to simulate the two-dimensional Ising model at a given temperature. First choose $N = L^2 = 32^2$. Set the external magnetic field $B = 0$ and take $T = 10$. (Remember that we are measuring $T$ in terms of $J/k$.) For simplicity, the initial configuration has all spins parallel.
(a) Is the orientation of the spins random, that is, is the mean magnetization equal to zero? Is there a slight tendency for a spin to align with its neighbors?
(b) Next choose a low temperature such as $T = 0.5$. Are the spins still random or do a majority choose a preferred direction?
(c) Choose $L = 4$ and $T = 2.0$. Does the sign of the magnetization change during the simulation? Choose a larger value of $N$ and observe if the sign of the magnetization changes.
(d) You probably noticed that $M = 0$ for sufficiently high $T$ and is nonzero for sufficiently low $T$. Hence, there is an intermediate value of $T$ at which $M$ first becomes nonzero. Choose $L = 32$ and start with $T = 4$ and gradually lower the temperature. Note the groups of aligned spins that grow as $T$ is decreased. Estimate the value of $T$ at which the mean magnetization first becomes nonzero.
Problem 5.16. We can use the applet at <stp.clarku.edu/simulations/ising2d/> to obtain more quantitative information. Choose $N = 32^2$ and set $B = 0$. Start from $T = 4$ and determine the temperature dependence of the magnetization $M$, the zero-field susceptibility $\chi$, the mean energy $E$, and the specific heat $C$. Decrease the temperature in steps of $\Delta T = 0.2$ until about $T = 1.6$. Describe the qualitative behavior of these quantities.

5.6 Mean-Field Theory

Because we cannot solve the thermodynamics of the Ising model exactly in three dimensions and the exact solution of the two-dimensional Ising model is limited to zero external magnetic field, we need to develop approximate theories. In this section we develop an approximate theory known as mean-field or Weiss molecular field theory. Mean-field theories are easy to treat, usually yield qualitatively correct results, and provide insight into the nature of phase transitions. We will see that their main disadvantage is that they ignore long-range correlations and are insensitive to the dimension of space. In Section 8.9 we will learn how to apply similar ideas to gases and liquids and in Section 9.4 we consider more sophisticated versions of mean-field theory for Ising systems.
In its simplest form mean-field theory assumes that each spin interacts with the same effective magnetic field. The effective field is due to the external magnetic field plus the internal field due to all the other spins. That is, spin $i$ "feels" an effective field $B_{\rm eff}$ given by
$$B_{\rm eff} = J\sum_{j=1}^{q} s_j + B, \tag{5.90}$$
where the sum over $j$ in (5.90) is over the $q$ neighbors of $i$. Because the orientation of the neighboring spins depends on the orientation of spin $i$, $B_{\rm eff}$ fluctuates from its mean
$$\overline{B}_{\rm eff} = J\sum_{j=1}^{q} \overline{s}_j + B \tag{5.91a}$$
$$\phantom{\overline{B}_{\rm eff}} = Jqm + B, \tag{5.91b}$$
where $\overline{s}_j = m$ for all $j$. In the mean-field approximation, we ignore the deviations of $B_{\rm eff}$ from $\overline{B}_{\rm eff}$ and assume that the field at $i$ is $\overline{B}_{\rm eff}$, independent of the orientation of $s_i$. This assumption is clearly an approximation because if $s_i$ is up, then its neighbors are more likely to be up. This correlation is ignored in the mean-field approximation.
It is straightforward to write the partition function for one spin in an effective field:
$$Z_1 = \sum_{s_1=\pm1} e^{\beta s_1 \overline{B}_{\rm eff}} = 2\cosh\beta(Jqm + B). \tag{5.92}$$
The free energy per spin is
$$f = -\frac{1}{\beta}\ln Z_1 = -kT\ln\big[2\cosh\beta(Jqm+B)\big], \tag{5.93}$$
and the magnetization is
$$m = -\frac{\partial f}{\partial B} = \tanh\beta(Jqm+B). \tag{5.94}$$
Equation (5.94) for $m$ is a self-consistent transcendental equation whose solution yields $m$. That is, the mean field that influences the mean value of $m$ depends on the mean value of $m$.
From Figure 5.9 we see that nonzero solutions for $m$ exist for $B = 0$ when $\beta qJ \geq 1$. Thus the critical temperature $T_c$ is given by
$$kT_c = Jq. \tag{5.95}$$
That is, $m \neq 0$ for $T < T_c$ and $m = 0$ for $T > T_c$ for $B = 0$. Near $T_c$ the magnetization is small, and we can expand $\tanh\beta Jqm$ to find
$$m = \beta Jqm - \frac{1}{3}(\beta Jqm)^3 + \ldots \tag{5.96}$$
Figure 5.9: Graphical solution of the self-consistent equation (5.94), shown in two panels for $\beta Jq = 0.8$ and $\beta Jq = 2.0$ with the stable and unstable solutions marked. The solution $m = 0$ exists for all $T$, but the solution $m \neq 0$ exists only for $T$ sufficiently small that the initial slope of $\tanh\beta qJm$ is larger than one.
Equation (5.96) has two solutions:
$$m = 0, \tag{5.97a}$$
and
$$m = \frac{3^{1/2}}{(\beta Jq)^{3/2}}\,(\beta Jq - 1)^{1/2}. \tag{5.97b}$$
The first solution corresponds to the high temperature disordered paramagnetic state and the second solution to the low temperature ordered ferromagnetic state. How do we know which solution to choose? The answer can be found by calculating the free energies for both solutions and choosing the solution that gives the smaller free energy (see Problem 5.39).
If we set $kT_c = Jq$, it is easy to see that the spontaneous magnetization vanishes as
$$m(T) = 3^{1/2}\,\frac{T}{T_c}\Big(\frac{T_c - T}{T_c}\Big)^{1/2}. \tag{5.98}$$
We see from (5.98) that $m$ approaches zero as $T$ approaches $T_c$ from below. As mentioned, the quantity $m$ is called the order parameter of the system, because $m \neq 0$ implies that the system is ordered and $m = 0$ implies that it is not.
In terms of the dimensionless temperature difference $\epsilon = (T_c - T)/T_c$, the asymptotic power law behavior of the order parameter is given by
$$m(T) \sim \epsilon^{\beta}, \tag{5.99}$$
where we have introduced the critical exponent $\beta$ (not to be confused with the inverse temperature). From (5.98) we see that mean-field theory predicts that $\beta = 1/2$. What is the value of $\beta$ for the two-dimensional Ising model (see Table 5.1)?
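The self-consistent equation (5.94) is easy to solve numerically. The sketch below uses simple fixed-point iteration, which is one possible method and not necessarily the one the authors intend. The reduced variables `t` $= T/T_c = kT/(Jq)$ and `b` $= B/(Jq)$, the function name `mean_field_m`, and the tolerance are our own choices, so that $\beta(Jqm + B) = (m + b)/t$.

```python
import math


def mean_field_m(t, b=0.0, tol=1e-12, max_iter=100000):
    """Solve m = tanh((m + b) / t) by fixed-point iteration.

    t = T/Tc = kT/(Jq) and b = B/(Jq) are dimensionless (our notation).
    Starting from m = 1 selects the non-negative solution.
    """
    m = 1.0
    for _ in range(max_iter):
        m_new = math.tanh((m + b) / t)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m  # not converged within max_iter (slow very close to t = 1)


def m_near_tc(t):
    """Asymptotic form (5.98), meaningful only for t slightly below 1."""
    return math.sqrt(3.0) * t * math.sqrt(1.0 - t)
```

Comparing `mean_field_m(t)` with `m_near_tc(t)` for `t` slightly below 1 shows how quickly the square-root form (5.98) becomes accurate. The iteration converges slowly as `t` approaches 1 because the slope of the tanh curve at the solution approaches unity.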
Problem 5.17. Plot the numerical solution of (5.94) for $m$ as a function of $T/T_c$ for $B = 0$.
Problem 5.18. Determine $m(T)$ from the numerical solution of (5.94) for $B = 1$ and compare your values with the exact solution in one dimension (see (5.73)).
We next find the behavior of other important physical properties near $T_c$. The zero-field isothermal susceptibility (per spin) is given by
$$\chi = \lim_{B\to0}\frac{\partial m}{\partial B} = \frac{\beta(1 - \tanh^2\beta Jqm)}{1 - \beta Jq\,(1 - \tanh^2\beta Jqm)}. \tag{5.100}$$
Note that for very high temperatures, $\beta J \to 0$, and $\chi$ from (5.100) approaches the Curie law for noninteracting spins as expected. For $T$ above and close to $T_c$, we find (see Problem 5.19)
$$\chi \sim \frac{T}{T - T_c}. \tag{5.101}$$
The result (5.101) for $\chi$ is known as the Curie-Weiss law. We characterize the divergence of the zero-field susceptibility as the critical point is approached from either the low or high temperature side as
$$\chi \sim |\epsilon|^{-\gamma}. \qquad (T \text{ near } T_c) \tag{5.102}$$
The mean-field prediction for the critical exponent $\gamma$ is $\gamma = 1$.
Problem 5.19. (a) Use the relation (5.100) to show that $\chi = (1/k)(T - T_c)^{-1}$ for $T \to T_c^{+}$. (b) It is more difficult to show that $\chi = (1/2k)(T_c - T)^{-1}$ for $T \to T_c^{-}$.
The magnetization at $T_c$ as a function of $B$ can be calculated by expanding (5.94) to third order in $B$ with $\beta = \beta_c = 1/qJ$:
$$m = m + \beta_c B - \frac{1}{3}(m + \beta_c B)^3 + \ldots \tag{5.103}$$
For $m$ and $B$ very small, we can assume that $\beta_c B \ll m$. This assumption yields
$$m = (3\beta_c B)^{1/3}, \qquad (T = T_c) \tag{5.104}$$
which is consistent with this assumption. In general, we write
$$m \sim B^{1/\delta}. \qquad (T = T_c) \tag{5.105}$$
The mean-field prediction is $\delta = 3$.
The energy per spin in the mean-field approximation is simply
$$E = -\frac{1}{2}Jqm^2, \tag{5.106}$$
which is the average value of the interaction energy divided by two to account for double counting. Because $m = 0$ for $T > T_c$, the energy vanishes for all $T > T_c$ and thus the heat capacity also vanishes according to mean-field theory. Below $T_c$ the energy is given by
$$E = -\frac{1}{2}Jq\,\big[\tanh(\beta(Jqm + B))\big]^2. \tag{5.107}$$
The specific heat can be calculated from (5.107) for $T < T_c$. As shown in Problem 5.20, $C \to 3k/2$ for $T \to T_c$ from below. Hence, mean-field theory predicts that there is a jump in the specific heat.
Problem 5.20. Show that according to mean-field theory, there is a jump of $3k/2$ in the specific heat at $T = T_c$.
Now let us compare the results of mean-field theory near the phase transition with the exact results for the one- and two-dimensional Ising models. The fact that the mean-field result (5.95) for $T_c$ depends only on $q$, the number of nearest neighbors, and not the spatial dimension $d$ is one of the inadequacies of the theory. The simple mean-field theory even predicts a phase transition in one dimension, which we know is qualitatively incorrect. In Table 5.2 the mean-field predictions for $T_c$ are compared to the best known estimates of the critical temperatures for the Ising model on two- and three-dimensional lattices. We see that for each dimension the mean-field theory prediction improves as the number of neighbors increases. Another difficulty is that the mean energy vanishes above $T_c$, a result that is clearly incorrect. The source of this difficulty is that the correlation between the spins has been ignored.
lattice         d    q    Tmf/Tc
square          2    4    1.763
triangular      2    6    1.648
diamond         3    4    1.479
simple cubic    3    6    1.330
bcc             3    8    1.260
fcc             3    12   1.225

Table 5.2: Comparison of the mean-field predictions for the critical temperature of the Ising model with exact results and the best known estimates for different spatial dimensions $d$ and lattice symmetries.

Mean-field theory predicts that near $T_c$, various thermodynamic properties exhibit power law behavior as defined in (5.99), (5.102), and (5.105). The mean-field predictions for the critical exponents are $\beta = 1/2$, $\gamma = 1$, and $\delta = 3$ respectively (see Table 5.1). Note that the mean-field results for the critical exponents are independent of dimension. These values of the critical exponents do not agree with the results of the Onsager solution of the two-dimensional Ising model. On the other hand, the mean-field predictions for the critical exponents are not terribly wrong. Another limitation of mean-field theory is that it predicts a jump in the specific heat, whereas the Onsager solution predicts a logarithmic divergence. Similar disagreements are found in three dimensions. However, the mean-field predictions do yield the correct results for the critical exponents in the unphysical case of four and higher dimensions. In Section 9.4 we discuss more sophisticated treatments of mean-field theory that yield better results for the temperature and magnetic field dependence of the magnetization and other thermodynamic quantities. However, all mean-field theories predict the same (incorrect) values for the critical exponents.
Problem 5.21. From Table 5.1, we see that the predictions of mean-field theory increase in accuracy with increasing dimensionality. Why is this trend reasonable?

5.7 *Infinite-range interactions

We might expect that mean-field theory would become exact in a system for which every spin interacts equally strongly with every other spin. We will refer to this model as the infinite-range Ising model, although the interaction range becomes infinite only in the limit $N \to \infty$. We will leave it as a problem to show that for such a system of $N$ spins, the energy is given by
$$E = \frac{J_N}{2}(N - M^2), \tag{5.108}$$
where $M$ is the magnetization and $J_N$ is the interaction between any two spins. Note that $E$ depends only on $M$. We also leave it as a problem to show that the number of states with magnetization $M$ is given by
$$g(M) = \frac{N!}{n!\,(N - n)!}, \tag{5.109}$$
where $n$ is the number of up spins. As before, we have $n = N/2 + M/2$ and $N - n = N/2 - M/2$.
Problem 5.22. (a) Show that the energy of a system for which every spin interacts with every other spin is given by (5.108). One way to do so is to consider a small system, say $N = 9$, and to work out the various possibilities. As you do so, you will see how to generalize your results to arbitrary $N$. (b) Use similar considerations as in part (a) to find the number of states as in (5.109).
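A brute-force enumeration is a useful sanity check on (5.108) and (5.109) before attempting a general argument. The sketch below assumes that the pair energy has the usual Ising form $-J_N s_i s_j$ summed over all distinct pairs $i < j$, which is our reading of "every spin interacts equally strongly with every other spin". The function name `check_infinite_range` is hypothetical.

```python
from itertools import product
from math import comb


def check_infinite_range(N=6, JN=1.0):
    """Compare direct enumeration with (5.108) and (5.109) for N spins.

    Returns a dict mapping each magnetization M to the number of states.
    """
    counts = {}
    for spins in product((-1, 1), repeat=N):
        M = sum(spins)
        # Assumed energy: every distinct pair i < j contributes -JN*s_i*s_j.
        E = -JN * sum(spins[i] * spins[j]
                      for i in range(N) for j in range(i + 1, N))
        assert abs(E - 0.5 * JN * (N - M * M)) < 1e-12  # (5.108)
        counts[M] = counts.get(M, 0) + 1
    for M, g in counts.items():
        n = (N + M) // 2          # number of up spins
        assert g == comb(N, n)    # (5.109)
    return counts
```

The check works because $\big(\sum_i s_i\big)^2 = N + 2\sum_{i<j} s_i s_j$, so the pair sum depends on the configuration only through $M$.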
We have to scale the energy of interaction $J_N$ to obtain a well-behaved thermodynamic limit. If we did not, the energy change associated with the flip of a spin would grow linearly with $N$ and a well-defined thermodynamic limit would not exist. We will choose
$$J_N = \frac{qJ}{N}, \tag{5.110}$$
so that $kT_c/J = q$ when $N \to \infty$.
Given the energy in (5.108) and the number of states in (5.109), we can write the partition function as
$$Z_N = \sum_M \frac{N!}{\big(\frac{N}{2} + \frac{M}{2}\big)!\,\big(\frac{N}{2} - \frac{M}{2}\big)!}\; e^{-\beta J_N (N - M^2)/2}\, e^{\beta B M}, \tag{5.111}$$
where we have included the interaction with an external magnetic field. For $N$ not too large, we can evaluate the sum over $M$ numerically. For $N$ very large we can convert the sum to an integral. We write
$$Z = \int_{-\infty}^{\infty} dM\, Z(M), \tag{5.112}$$
where
$$Z(M) = \frac{N!}{n!\,(N - n)!}\; e^{-\beta E}\, e^{\beta B M}, \tag{5.113}$$
with $n = (M + N)/2$. A plot of $Z(M)$ shows that it is peaked about some value of $M$. So let us do our usual trick of expanding $\ln Z(M)$ about its maximum. For simplicity, we will first find the value of $M$ for which $Z(M)$ is a maximum. We write
$$\ln Z(M) = \ln N! - \ln n! - \ln(N - n)! - \beta E + \beta B M. \tag{5.114}$$
Then we use the fact that $\frac{d}{dx}(\ln x!) = \ln x$, $dn/dM = 1/2$, and $d(N - n)/dM = -1/2$ and obtain
$$\frac{d \ln Z(M)}{dM} = -\frac{1}{2}\ln n + \frac{1}{2}\ln(N - n) + \beta J_N M + \beta B \tag{5.115a}$$
$$= -\frac{1}{2}\ln\Big[\frac{N}{2}(1 + m)\Big] + \frac{1}{2}\ln\Big[\frac{N}{2}(1 - m)\Big] + q\beta J m + \beta B \tag{5.115b}$$
$$= -\frac{1}{2}\ln(1 + m) + \frac{1}{2}\ln(1 - m) + q\beta J m + \beta B = 0. \tag{5.115c}$$
We set $d(\ln Z(M))/dM = 0$ because we wish to find the value of $M$ that maximizes $Z(M)$. We have
$$\frac{1}{2}\ln\frac{1 - m}{1 + m} = -\beta(qJm + B), \tag{5.116}$$
so that
$$\frac{1 - m}{1 + m} = e^{-2\beta(qJm + B)} = x. \tag{5.117}$$
Finally we solve (5.117) for $m$ in terms of $x$ and obtain
$$1 - m = x(1 + m) \tag{5.118a}$$
$$m(-1 - x) = -1 + x \tag{5.118b}$$
$$m = \frac{1 - x}{1 + x} = \frac{1 - e^{-2\beta(Jqm + B)}}{e^{-2\beta(Jqm + B)} + 1} \tag{5.119a}$$
$$= \frac{e^{\beta(Jqm + B)} - e^{-\beta(Jqm + B)}}{e^{-\beta(Jqm + B)} + e^{\beta(Jqm + B)}} \tag{5.119b}$$
$$= \tanh(\beta(Jqm + B)). \tag{5.119c}$$
Note that (5.119c) is identical to the previous mean-field result in (5.94).
∗ Problem 5.23. Show that $Z(M)$ can be written as a Gaussian and then do the integral over $M$ in (5.112) to find the mean-field form of $Z$. Then use this form of $Z$ to find the mean-field result for the free energy $F$.

Appendix 5A. How does magnetism occur in matter?
Classical electromagnetic theory tells us that magnetic fields are due to electrical currents and changing electric fields, and that the magnetic fields far from the currents are described by a magnetic dipole. It is natural to assume that magnetic effects in matter are due to microscopic current loops created by the motion of electrons in atoms. However, it was shown by Niels Bohr in his doctoral thesis of 1911 and independently by Johanna H. van Leeuwen in her 1919 doctoral thesis that the phenomenon of diamagnetism does not exist in classical physics (see Problem 6.76). Hence, magnetism can be understood only by using quantum mechanics.
The most obvious new physics due to quantum mechanics is the existence of an intrinsic magnetic moment. The intrinsic magnetic moment is proportional to the intrinsic spin, another quantum mechanical property. The interaction energy between a single spin and an externally applied magnetic field $\mathbf{B}$ is given by
$$E = -\boldsymbol{\mu}\cdot\mathbf{B}. \tag{5.120}$$
There is a distinction between the magnetic field produced by currents external to the material and the field produced internally by the magnetic moments within the material. The applied field is denoted as $\mathbf{H}$, and the total field is denoted as $\mathbf{B}$. The fields $\mathbf{B}$ and $\mathbf{H}$ are related to the magnetization per unit volume, $\mathbf{m} = \mathbf{M}/V$, by
$$\mathbf{B} = \mu_0(\mathbf{H} + \mathbf{m}). \tag{5.121}$$
The energy due to the external magnetic field $\mathbf{H}$ coupled to $\mathbf{M}$ is
$$E = -\mathbf{M}\cdot\mathbf{H}. \tag{5.122}$$
The origin of the interaction energy between magnetic moments must be due to quantum mechanics. Because the electrons responsible for magnetic behavior are localized near the atoms of a regular lattice in most magnetic materials, we consider the simple case of two localized electrons. Each electron has a spin 1/2 which can point either up or down along the axis that is specified by the applied magnetic field. The electrons interact with each other and with nearby atoms and are described in part by the spatial wavefunction $\psi(\mathbf{r}_1, \mathbf{r}_2)$. This wavefunction must be multiplied by the spin eigenstates to obtain the actual state of the two electron system. We denote the basis for these states as
$$|\uparrow\uparrow\rangle,\ |\downarrow\downarrow\rangle,\ |\uparrow\downarrow\rangle,\ |\downarrow\uparrow\rangle, \tag{5.123}$$
where the arrows correspond to the spin of the electrons. These states are eigenstates of the $z$ component of the total spin angular momentum, $S_z$, such that $S_z$ operating on any of the states in (5.123) has an eigenvalue equal to the sum of the spins in the $z$ direction. For example, $S_z|\uparrow\uparrow\rangle = 1|\uparrow\uparrow\rangle$ and $S_z|\uparrow\downarrow\rangle = 0|\uparrow\downarrow\rangle$. Similarly, $S_x$ or $S_y$ give zero if either operator acts on these states. Because electrons are fermions, the basis states in (5.123) are not physically meaningful: if two electrons are interchanged, the new wavefunction must either be the same or differ by a minus sign. The simplest normalized linear combinations of the states in (5.123) that satisfy this condition are
$$\frac{1}{\sqrt{2}}\big(|\uparrow\downarrow\rangle - |\downarrow\uparrow\rangle\big) \tag{5.124a}$$
$$|\uparrow\uparrow\rangle \tag{5.124b}$$
$$\frac{1}{\sqrt{2}}\big(|\uparrow\downarrow\rangle + |\downarrow\uparrow\rangle\big) \tag{5.124c}$$
$$|\downarrow\downarrow\rangle \tag{5.124d}$$
The state in (5.124a) is antisymmetric, because interchanging the two electrons leads to minus the original state. This state has a total spin, $S = 0$, and is called the singlet state. The collection of the last three states is called the triplet state and has $S = 1$. Because the states of fermions must be antisymmetric, the spin state is antisymmetric when the spatial part of the wavefunction $\psi(\mathbf{r}_1, \mathbf{r}_2)$ is symmetric and vice versa. That is, if the spins are parallel, then $\psi(\mathbf{r}_1, \mathbf{r}_2) = -\psi(\mathbf{r}_2, \mathbf{r}_1)$. Similarly, if the spins are antiparallel, then $\psi(\mathbf{r}_1, \mathbf{r}_2) = +\psi(\mathbf{r}_2, \mathbf{r}_1)$. Hence, when $\mathbf{r}_1 = \mathbf{r}_2$, $\psi$ is zero for parallel spins and is nonzero for antiparallel spins. We conclude that if the spins are parallel, the separation between the two electrons will rarely be small and their average electrostatic energy will be less than it is for antiparallel spins. We denote $E_{\rm triplet}$ and $E_{\rm singlet}$ as the triplet energy and the singlet energy, respectively, and write the interaction energy in terms of the spin operators. We write
$$(\mathbf{S}_1 + \mathbf{S}_2)^2 = \mathbf{S}_1^2 + \mathbf{S}_2^2 + 2\,\mathbf{S}_1\cdot\mathbf{S}_2. \tag{5.125}$$
For spin 1/2, $\mathbf{S}_1^2 = S_1(S_1 + 1) = 3/4 = \mathbf{S}_2^2$. The total spin, $\mathbf{S} = \mathbf{S}_1 + \mathbf{S}_2$, equals zero for the singlet state and unity for the triplet state. Hence, $\mathbf{S}^2 = S(S + 1) = 0$ for the singlet state and $\mathbf{S}^2 = 2$ for the triplet state. These results lead to $\mathbf{S}_1\cdot\mathbf{S}_2 = -3/4$ for the singlet state and $\mathbf{S}_1\cdot\mathbf{S}_2 = 1/4$ for the triplet state and allow us to write
$$E = \frac{1}{4}(E_{\rm singlet} + 3E_{\rm triplet}) - J\,\mathbf{S}_1\cdot\mathbf{S}_2, \tag{5.126}$$
where $J = E_{\rm singlet} - E_{\rm triplet}$. The first part of (5.126) is a constant and can be omitted by suitably defining the zero of energy. The second term represents a convenient form of the interaction between two spins.
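The intermediate step can be spelled out. The following worked equations are a sketch that uses only (5.125), the values of $\mathbf{S}^2$ quoted above, and the definition $J = E_{\rm singlet} - E_{\rm triplet}$; it checks that (5.126) reproduces the two energies.

```latex
\begin{align*}
\mathbf{S}_1\cdot\mathbf{S}_2
  &= \tfrac{1}{2}\big[\mathbf{S}^2 - \mathbf{S}_1^2 - \mathbf{S}_2^2\big]
   = \tfrac{1}{2}\big[S(S+1) - \tfrac{3}{2}\big]
   = \begin{cases} -\tfrac{3}{4} & S = 0 \text{ (singlet)} \\[2pt]
                   +\tfrac{1}{4} & S = 1 \text{ (triplet)} \end{cases} \\[4pt]
E(S=0) &= \tfrac{1}{4}(E_{\rm singlet} + 3E_{\rm triplet})
          + \tfrac{3}{4}(E_{\rm singlet} - E_{\rm triplet}) = E_{\rm singlet}, \\
E(S=1) &= \tfrac{1}{4}(E_{\rm singlet} + 3E_{\rm triplet})
          - \tfrac{1}{4}(E_{\rm singlet} - E_{\rm triplet}) = E_{\rm triplet}.
\end{align*}
```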
Can we write the total effective interaction of a system of three spins as $-J_{12}\,\mathbf{S}_1\cdot\mathbf{S}_2 - J_{23}\,\mathbf{S}_2\cdot\mathbf{S}_3 - J_{13}\,\mathbf{S}_1\cdot\mathbf{S}_3$? In general, the answer is no, and we can only hope that this simple form is a reasonable approximation. The total energy of the most common model of magnetism is based on the form (5.126) for the spin-spin interaction and is expressed as
$$\hat{H} = -\sum_{i<j=1}^{N} J_{ij}\,\mathbf{S}_i\cdot\mathbf{S}_j - g\mu_0\,\mathbf{H}\cdot\sum_{i=1}^{N}\mathbf{S}_i, \qquad \text{(Heisenberg model)} \tag{5.127}$$
where $g\mu_0$ is the magnetic moment of the electron. The exchange interaction $J_{ij}$ can be positive or negative. The form (5.127) of the interaction energy is known as the Heisenberg model. Note that $\mathbf{S}$ as well as the Hamiltonian $\hat{H}$ is an operator, and that the Heisenberg model is quantum mechanical in nature. The distinction between the operator $\hat{H}$ and the magnetic field $\mathbf{H}$ will be clear from the context.
As we have seen, the Heisenberg model assumes that we can treat all interactions in terms of pairs of spins. This assumption means that the magnetic ions in the crystal must be sufficiently far apart that the overlap of their wavefunctions is small. We also have neglected any orbital contribution to the total angular momentum. In addition, dipolar interactions can be important and lead to a coupling between the spin degrees of freedom and the relative displacements of the magnetic ions. In general, it is very difficult to obtain the exact Hamiltonian from first principles, and the Heisenberg form of the Hamiltonian should be considered as a reasonable approximation with the details buried in the exchange constant $J$.
The Heisenberg model is the starting point for most microscopic models of magnetism. We can go to the classical limit $S \to \infty$, consider spins with one, two, or three components, place the spins on lattices of any dimension and any crystal structure, and allow $J$ to be positive, negative, random, nearest-neighbor, long-range, etc. In addition, we can include other interactions such as the interaction of an electron with an ion. The theoretical possibilities are very rich as are the types of magnetic materials of interest experimentally.

Figure 5.10: The ground state of $N = 5$ Ising spins in an external magnetic field. For toroidal boundary conditions, the ground state energy is $E_0 = -5J - 5B$.

Figure 5.11: The flip of a single spin of $N = 5$ Ising spins. The corresponding energy cost is $4J + 2B$.

Appendix 5B. The Thermodynamics of Magnetism
[xx not written xx]

Appendix 5C: Low Temperature Expansion
The existence of exact analytical solutions for systems with nontrivial interactions is the exception. In general, we must be satisfied with approximate solutions with limited ranges of applicability. To understand the nature of one class of approximations, we reconsider the one-dimensional Ising model at low temperatures.
Suppose that we are interested in the behavior of the Ising model at low temperatures in the presence of a magnetic field $B$. We know that the state of lowest energy (the ground state) corresponds to all spins completely aligned. What happens when we raise the temperature slightly above $T = 0$? The only way that the system can raise its energy is by flipping one or more spins. At a given temperature we can consider the excited states corresponding to $1, 2, \ldots, f$ flipped spins. These $f$ spins may be connected or may consist of disconnected groups.
As an example, consider a system of $N = 5$ spins with toroidal boundary conditions. The ground state is shown in Figure 5.10. The energy cost of flipping a single spin is $4J + 2B$. (The energy of interaction of the flipped spin changes from $-2J$ to $+2J$.) A typical configuration is shown in Figure 5.11. Because the flipped spin can be at $N = 5$ different sites, we write
$$Z = \big[1 + 5\,e^{-\beta(4J + 2B)} + \ldots\big]\,e^{-\beta E_0}, \tag{5.128}$$
where $E_0 = -5(J + B)$.
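The quality of the truncated series (5.128) can be checked against an exact enumeration of all $2^5$ states. The sketch below is an illustration only: the function names `exact_Z_ring` and `low_T_Z` and the default values $J = B = 1$ are our choices, and the energy is taken to be $E = -J\sum_i s_i s_{i+1} - B\sum_i s_i$ with toroidal boundary conditions, consistent with $E_0 = -5(J + B)$.

```python
from itertools import product
import math


def exact_Z_ring(N, beta, J=1.0, B=1.0):
    """Exact partition function of an N-spin Ising ring by enumeration."""
    Z = 0.0
    for s in product((-1, 1), repeat=N):
        # Nearest-neighbor bonds on a ring plus the field term.
        E = -J * sum(s[i] * s[(i + 1) % N] for i in range(N)) - B * sum(s)
        Z += math.exp(-beta * E)
    return Z


def low_T_Z(N, beta, J=1.0, B=1.0):
    """Ground state plus the single-flipped-spin term, as in (5.128)."""
    E0 = -N * (J + B)
    return (1.0 + N * math.exp(-beta * (4 * J + 2 * B))) * math.exp(-beta * E0)
```

Evaluating the ratio `exact_Z_ring(5, beta) / low_T_Z(5, beta)` for increasing `beta` shows how the truncated series improves as the temperature is lowered; the remaining difference comes from states with two or more flipped spins.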
The next higher energy excitation consists of a pair of flipped spins with one contribution arising from pairs that are not nearest neighbors and the other contribution arising from nearest-neighbor pairs (see Figure 5.12). We will leave it as an exercise (see Problem 5.24) to determine the corresponding energies and the number of different ways that this type of excitation occurs.

Figure 5.12: Configurations corresponding to two flipped spins. In (a) the flipped spins are not nearest neighbors and in (b) the flipped spins are neighbors.

Problem 5.24. Use the microstates that were enumerated in Problem 5.6 to find the low temperature expansion of $Z$ for a system of $N = 5$ spins in one dimension. Use toroidal boundary conditions. Write your result for $Z$ in terms of the variables
$$u = e^{-2\beta J}, \tag{5.129}$$
and
$$w = e^{-2\beta B}. \tag{5.130}$$
∗ Problem 5.25. Generalize the low temperature expansion to find the order $w^3$ contributions to $Z_N$. If you have patience, go to higher order. An inspection of the series might convince you that the low temperature series can be summed exactly. (The low temperature series for the Ising model can be summed exactly only in one dimension.)
Problem 5.26. Use (5.35) and (5.37) to find the low temperature behavior of $F$ and $S$ for a one-dimensional Ising chain in the absence of an external magnetic field. Compare your results with the above qualitative arguments.

Appendix 5D: High Temperature Expansion
At high temperatures for which $J/kT \ll 1$, the effects of the interaction between the spins become small. We can develop a perturbation method that is based on expanding $Z$ in terms of the small parameter $J/kT$. For simplicity, we consider the Ising model in zero magnetic field. We write
$$Z_N = \sum_{\{s\}} \prod_{i,j={\rm nn}(i)} e^{\beta J s_i s_j}, \tag{5.131}$$
where the sum is over all states of the $N$ spins, and the product is restricted to nearest-neighbor pairs of sites $ij$ in the lattice. We first apply the identity
$$e^{\beta J s_i s_j} = \cosh\beta J + s_i s_j \sinh\beta J = \cosh\beta J\,(1 + v s_i s_j), \tag{5.132}$$
where
$$v = \tanh\beta J. \tag{5.133}$$
The identity (5.132) can be demonstrated by considering the various cases $s_i, s_j = \pm1$ (see Problem 5.34). The variable $v$ approaches zero as $T \to \infty$ and will be used as an expansion parameter instead of $J/kT$ for reasons that will become clear later. Equation (5.131) can now be written as
$$Z_N = (\cosh\beta J)^p \sum_{\{s\}} \prod_{\langle ij\rangle} (1 + v s_i s_j), \tag{5.134}$$
where $p$ is the total number of nearest-neighbor pairs in the lattice, that is, the total number of interactions. For a lattice with toroidal boundary conditions
$$p = \frac{1}{2}Nq, \tag{5.135}$$
where $q$ is the number of nearest neighbor sites of a given site; $q = 2$ for an Ising chain.
To make the above procedure explicit, consider the case $N = 3$ with toroidal boundary conditions. For this case $p = 3(2)/2 = 3$, and there are three factors in the product in (5.134): $(1 + vs_1s_2)(1 + vs_2s_3)(1 + vs_3s_1)$. If we expand this product in powers of $v$, we obtain the $2^p = 8$ terms in the partition function:
$$Z_{N=3} = (\cosh\beta J)^3 \sum_{s_1=\pm1}\sum_{s_2=\pm1}\sum_{s_3=\pm1} \big[1 + v(s_1s_2 + s_2s_3 + s_3s_1) + v^2(s_1s_2s_2s_3 + s_1s_2s_3s_1 + s_2s_3s_3s_1) + v^3 s_1s_2s_2s_3s_3s_1\big]. \tag{5.136}$$
It is convenient to introduce a one-to-one correspondence between each of the eight terms in the bracket in (5.136) and a diagram on the lattice. The set of eight diagrams is shown in Figure 5.13. Because $v$ enters into the product in (5.136) as $vs_is_j$, a diagram of order $v^n$ has $n$ bonds, each carrying a factor of $v$. We can use the topology of the diagrams to help us to keep track of the terms in (5.136). The term of order $v^0$ is simply $2^{N=3} = 8$. Because $\sum_{s_i=\pm1} s_i = 0$, each of the terms of order $v$ vanishes. Similarly, each of the three terms of order $v^2$ contains at least one of the spin variables raised to an odd power so that these terms also vanish. For example, $s_1s_2s_2s_3 = s_1s_3$, and both $s_1$ and $s_3$ enter to first order. In general, we have
$$\sum_{s_i=\pm1} s_i^{\,n} = \begin{cases} 2 & n \text{ even} \\ 0 & n \text{ odd} \end{cases} \tag{5.137}$$
From (5.137) we see that only terms of order $v^0$ and $v^3$ contribute so that
$$Z_{N=3} = \cosh^3\beta J\,[8 + 8v^3] = 2^3(\cosh^3\beta J + \sinh^3\beta J). \tag{5.138}$$
We now generalize the above analysis to arbitrary $N$. We have observed that the diagrams that correspond to nonvanishing terms in $Z$ are those that have an even number of bonds from each vertex; these diagrams are called closed. The reason is that a bond from site $i$ corresponds to a product of the form $s_is_j$. An even number of bonds from site $i$ implies that $s_i$ to an even power enters into the sum in (5.134). Hence, only diagrams with an even number of bonds from each vertex yield a nonzero contribution to $Z_N$.

Figure 5.13: The eight diagrams that correspond to the eight terms in the Ising model partition function for the $N = 3$ Ising chain (sites labeled 1, 2, 3; diagrams grouped by order $v^0$, $v^1$, $v^2$, $v^3$). The term $s_is_j$ is represented by a line between the neighboring sites $i$ and $j$.

For the Ising chain, only two bonds can come from a given site. Hence, we see that although there are $2^N$ diagrams for an Ising chain of $N$ spins with toroidal boundary conditions, only the diagrams of order $v^0$ (with no bonds) and of order $v^N$ will contribute to $Z_N$. We conclude that
$$Z_N = (\cosh\beta J)^N\,[2^N + 2^N v^N]. \tag{5.139}$$
Problem 5.27. Draw the diagrams that correspond to the terms in the high temperature expansion of the Ising model partition function for the $N = 4$ Ising chain.
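The closed form (5.139) can be compared with a direct sum over all $2^N$ states for small $N$. The sketch below is a numerical check under the same assumptions as the text (zero field, toroidal boundary conditions); the function names and the single argument `beta_J` $= \beta J$ are our own.

```python
from itertools import product
import math


def Z_ring_enumerated(N, beta_J):
    """Sum exp(beta*J*sum_i s_i s_{i+1}) over all 2^N states of a ring (B = 0)."""
    Z = 0.0
    for s in product((-1, 1), repeat=N):
        bonds = sum(s[i] * s[(i + 1) % N] for i in range(N))
        Z += math.exp(beta_J * bonds)
    return Z


def Z_ring_high_T(N, beta_J):
    """Closed form (5.139): (cosh bJ)^N * (2^N + 2^N v^N) with v = tanh bJ."""
    v = math.tanh(beta_J)
    return (2.0 * math.cosh(beta_J)) ** N * (1.0 + v ** N)
```

The two functions should agree to rounding error for any temperature, not only at high temperatures, because for the chain the series in $v$ terminates after the single closed diagram of order $v^N$.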
Problem 5.28. The form of ZN in (5.139) is not identical to the form of ZN given in (5.28). Use
the fact that v < 1 and take the thermodynamic limit N → ∞ to explain the equivalence of the
two results for $Z_N$.

Vocabulary
magnetization m, zero-field susceptibility χ
Ising model, exchange constant J
correlation function G(r), correlation length ξ, domain wall
order parameter, continuous phase transition, critical point
critical temperature $T_c$, critical exponents α, β, δ, γ, ν, η
exact enumeration, mean-field theory
low and high temperature expansions

Additional Problems

Problems            page
5.1                 191
5.2, 5.3, 5.4       193
5.5                 197
5.6, 5.7            198
5.8                 198
5.9                 199
5.10                201
5.11, 5.12          202
5.13                205
5.14                205
5.15, 5.16          211
5.17, 5.18          214
5.19                214
5.20                215
5.21, 5.22          216
5.23                217
5.24, 5.25, 5.26    221
5.27, 5.28          223

Table 5.3: Listing of inline problems.
Problem 5.29. Thermodynamics of classical spins. The energy of interaction of a classical magnetic dipole with an external magnetic field $\mathbf{B}$ is given by
$$E = -\boldsymbol{\mu} \cdot \mathbf{B} = -\mu B \cos\theta, \tag{5.140}$$
where $\theta$ is the continuously variable angle between $\boldsymbol{\mu}$ and $\mathbf{B}$. In the absence of an external field, the dipoles (or spins as they are commonly called) are randomly oriented so that the mean magnetization is zero. If $B \neq 0$, the mean magnetization is given by
$$M = \mu N\, \overline{\cos\theta}. \tag{5.141}$$
The direction of the magnetization is parallel to $\mathbf{B}$. Show that the partition function for one spin is given by
$$Z_1 = \int_0^{2\pi}\!\!\int_0^{\pi} e^{\beta\mu B\cos\theta} \sin\theta \, d\theta \, d\phi. \tag{5.142}$$
How is $\overline{\cos\theta}$ related to $Z_1$? Show that
$$M = N\mu L(\beta\mu B), \tag{5.143}$$
where the Langevin function $L(x)$ is given by
$$L(x) = \frac{e^{x} + e^{-x}}{e^{x} - e^{-x}} - \frac{1}{x} = \coth x - \frac{1}{x}. \tag{5.144}$$
For $x < \pi$, $L(x)$ can be expanded as
$$L(x) = \frac{x}{3} - \frac{x^3}{45} + \ldots + \frac{2^{2n} B_{2n}}{(2n)!}\, x^{2n-1} + \ldots, \qquad (x \ll 1) \tag{5.145}$$
where $B_n$ is the Bernoulli number of order n (see Appendix A). What are M and the susceptibility in the limit of high T? For large x, $L(x)$ is given by
$$L(x) \approx 1 - \frac{1}{x} + 2e^{-2x}. \qquad (x \gg 1) \tag{5.146}$$
What is the behavior of M in the limit of low T?
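The two limiting forms of the Langevin function can be compared with the exact expression numerically. The sketch below is illustrative only; the function names are assumptions and do not appear in the text. It evaluates (5.144), the first two terms of (5.145), and the asymptotic form (5.146).

```python
import math

def langevin(x):
    """L(x) = coth(x) - 1/x, eq. (5.144); intended for x > 0."""
    return 1.0 / math.tanh(x) - 1.0 / x

def langevin_small_x(x):
    """First two terms of the small-x expansion (5.145)."""
    return x / 3.0 - x ** 3 / 45.0

def langevin_large_x(x):
    """Large-x asymptotic form (5.146)."""
    return 1.0 - 1.0 / x + 2.0 * math.exp(-2.0 * x)

# Small x corresponds to high T (x = beta*mu*B), large x to low T.
for x in (0.05, 0.1, 0.5):
    print(x, langevin(x), langevin_small_x(x))
for x in (3.0, 5.0, 10.0):
    print(x, langevin(x), langevin_large_x(x))
```

Because $x = \beta\mu B$, the leading small-x term $x/3$ is the piece that controls M at high temperature, and the approach of $L(x)$ to 1 describes saturation at low temperature.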
Problem 5.30. Arbitrary spin. The magnetic moment of an atom or nucleus is associated with its angular momentum, which is quantized. If the angular momentum is J, the magnetic moment along the direction of $\mathbf{B}$ is restricted to $(2J + 1)$ orientations. We write the energy of an individual atom as
$$E = -g\mu_0\, \mathbf{J} \cdot \mathbf{B} = -g\mu_0 J_z B. \tag{5.147}$$
The values of $\mu_0$ and g depend on whether we are considering a nucleus, an atom, or an electron. The values of $J_z$ are restricted to $-J, -J + 1, -J + 2, \ldots, J - 1, J$. Hence, the partition function for one atom contains $(2J + 1)$ terms:
$$Z_1 = \sum_{m=-J}^{J} e^{-\beta(-g\mu_0 m B)}. \tag{5.148}$$
The summation index m ranges from $-J$ to $J$ in integral steps.
To simplify the notation, we let $\alpha = \beta g\mu_0 B$, and write $Z_1$ as a finite geometrical series:
$$Z_1 = \sum_{m=-J}^{J} e^{m\alpha} \tag{5.149a}$$
$$\phantom{Z_1} = e^{-\alpha J}\left(1 + e^{\alpha} + e^{2\alpha} + \ldots + e^{2J\alpha}\right). \tag{5.149b}$$
The sum of a finite geometrical series is given by
$$S_n = \sum_{p=0}^{n} x^p = \frac{x^{n+1} - 1}{x - 1}. \tag{5.150}$$
Given that there are $(2J + 1)$ terms in (5.149b), show that
$$Z_1 = e^{-\alpha J}\, \frac{e^{(2J+1)\alpha} - 1}{e^{\alpha} - 1} = e^{-\alpha J}\, \frac{[1 - e^{(2J+1)\alpha}]}{1 - e^{\alpha}}. \tag{5.151}$$

Figure 5.14: Five configurations of the N = 10 Ising chain with toroidal boundary conditions generated by the Metropolis algorithm at βJ = 1 and B = 0.

Use the above relations to show that
$$M = N g\mu_0 J B_J(\alpha), \tag{5.152}$$
where the Brillouin function $B_J(\alpha)$ is defined as
$$B_J(\alpha) = \frac{1}{J}\left[(J + 1/2)\coth\,(J + 1/2)\alpha - \frac{1}{2}\coth \alpha/2\right]. \tag{5.153}$$
What is the limiting behavior of M for high and low T for fixed B? What is the limiting behavior of M for $J = \frac{1}{2}$ and $J \gg 1$?
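As a consistency check on (5.152) and (5.153), the mean value of m can be computed directly from the $(2J + 1)$ terms in (5.149a) and compared with $J B_J(\alpha)$. This is a sketch under the assumption that J is an integer or half-integer; the function names are illustrative.

```python
import math

def mean_m_direct(j, alpha):
    """Mean of m over the 2J+1 states, weighted by exp(m*alpha) as in (5.149a)."""
    n_states = int(round(2 * j)) + 1
    ms = [-j + k for k in range(n_states)]
    weights = [math.exp(m * alpha) for m in ms]
    return sum(m * w for m, w in zip(ms, weights)) / sum(weights)

def brillouin(j, alpha):
    """Brillouin function B_J(alpha), eq. (5.153); intended for alpha > 0."""
    a = j + 0.5
    return (a / math.tanh(a * alpha) - 0.5 / math.tanh(0.5 * alpha)) / j

# The magnetization per atom in units of g*mu_0 is the mean of m, which should equal J*B_J.
for j in (0.5, 1.0, 1.5, 3.5):
    print(j, mean_m_direct(j, 0.7), j * brillouin(j, 0.7))
```

For $J = 1/2$ the Brillouin function reduces to $\tanh(\alpha/2)$, which gives a quick sanity check on the implementation.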
Problem 5.31. Suppose that the total energy of a system of spins can be written as
$$E = E_0 - MB, \tag{5.154}$$
where the first term does not depend explicitly on the magnetic field B, and M is the magnetization of the system (in the direction of B). Show that the form of (5.154) implies that the zero-field susceptibility can be expressed as
$$\chi = \left(\frac{\partial \overline{M}}{\partial B}\right)_{B=0} = \frac{1}{kT}\left[\overline{M^2} - \overline{M}^2\right]. \tag{5.155}$$

Problem 5.32. The five configurations shown in Figure 5.14 for the Ising chain were generated using the Metropolis algorithm (see Section 5.4.3) at βJ = 1 using toroidal boundary conditions. On the basis of this limited sample, estimate the mean value of E/J, the specific heat per spin, and the spin correlation G(r) for r = 1, 2, and 3. For simplicity, take only one of the spins to be the origin. However, better results

Problem 5.33. Use the applet at <stp.clarku.edu/simulations/ising/> to determine P(E), the probability that the system has energy E, for the two-dimensional Ising model. (For the Ising model the energy is a discrete variable.) What is the approximate form of the probability distribution at T = 4? What is its width? Then take $T = T_c \approx 2.269$. Is the form of P(E) similar? If not, why?

Problem 5.34. Verify the validity of the identity (5.132) by considering the different possible values of $s_i s_j$ and using the identities $2\cosh x = e^{x} + e^{-x}$ and $2\sinh x = e^{x} - e^{-x}$.
Problem 5.35. Explore the analogy between the behavior of the Ising model and the behavior
of a large group of people. Under what conditions would a group of people act like a collection
of individuals doing their “own thing?” Under what conditions might they act as a group? What
factors could cause such a transition?
∗ Problem 5.36. Ising chain. Write a program that uses the demon algorithm to generate a representative sample of microstates for the Ising chain at fixed energy. The energy for a particular configuration of spins is given by
$$E = -J \sum_{i=1}^{N} s_i s_{i+1}, \tag{5.156}$$
where $s_i = \pm 1$ and $s_{N+1} = s_1$. The easiest trial change is to flip a spin from +1 to −1 or vice versa. For such a flip the possible changes in energy are $\Delta E = E_{\rm trial} - E_{\rm old} = 0$ and $\pm 4J$. Confirm that the possible energies of the spins are $E = -NJ, -NJ + 4J, -NJ + 8J, \ldots, +NJ$, and that the possible demon energies are $E_d = 4nJ$, where $n = 0, 1, 2, \ldots$ The most difficult part of the
simulation is choosing the initial state so that it has the desired energy. Choose N = 20 and
Ed = 0 initially. Write a subroutine that begins with all spins down and randomly ﬂips spins
until the desired energy is reached. Choose J to be the unit of energy. Collect data for the mean
energy of the system and the mean demon energy for about ten diﬀerent energies and plot your
results. Equilibrate the spins for about 100 ﬂips per spin before taking averages for each value of
the total energy. Average over approximately 1000 ﬂips per spin. Increase the number of ﬂips per
spin until you obtain reasonable results. Compute the probability density, P (Ed ), and determine
its dependence on Ed by making a semilog plot.
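One possible way to organize such a program is sketched below. It is not the authors' program: the function names, the default parameters, and the choice of random number generator are all assumptions made for illustration, and J is taken as the unit of energy. A trial flip is accepted only if the demon can supply (or absorb) the energy change, so the total energy of spins plus demon is conserved.

```python
import random

def ring_energy(spins):
    """E/J = -sum_i s_i s_{i+1} with s_{N+1} = s_1, eq. (5.156)."""
    n = len(spins)
    return -sum(spins[i] * spins[(i + 1) % n] for i in range(n))

def flip_cost(spins, i):
    """Energy change (units of J) if spin i is flipped: 0 or +/-4."""
    n = len(spins)
    return 2 * spins[i] * (spins[i - 1] + spins[(i + 1) % n])

def initial_state(n, target_energy, rng):
    """Start with all spins down and flip random spins until E equals the target."""
    if not -n <= target_energy <= n or (target_energy + n) % 4 != 0:
        raise ValueError("target energy must equal -N + 4k and lie between -N and N")
    spins = [-1] * n
    energy = -n
    while energy != target_energy:
        i = rng.randrange(n)
        d_e = flip_cost(spins, i)
        if energy + d_e <= target_energy:  # never overshoot the target
            spins[i] = -spins[i]
            energy += d_e
    return spins

def demon_run(n=20, target_energy=-12, sweeps=1000, equilibration=100, seed=1):
    """Return final spins, final energies, and mean system and demon energies."""
    rng = random.Random(seed)
    spins = initial_state(n, target_energy, rng)
    e_sys, e_demon = target_energy, 0
    sum_sys = sum_demon = samples = 0
    for sweep in range(equilibration + sweeps):
        for _ in range(n):  # one sweep = N trial flips
            i = rng.randrange(n)
            d_e = flip_cost(spins, i)
            if d_e <= e_demon:  # the demon pays for or absorbs the change
                spins[i] = -spins[i]
                e_sys += d_e
                e_demon -= d_e
        if sweep >= equilibration:
            sum_sys += e_sys
            sum_demon += e_demon
            samples += 1
    return spins, e_sys, e_demon, sum_sys / samples, sum_demon / samples

spins, e_sys, e_demon, mean_sys, mean_demon = demon_run()
print(mean_sys, mean_demon)
```

Accumulating a histogram of the demon energy inside the sampling loop would give the estimate of $P(E_d)$ that the problem asks for; that step is left out here to keep the sketch short.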
∗ Problem 5.37. Consider a one-dimensional Ising-type model defined by the usual Hamiltonian
with B = 0, but with $s_i = 0, \pm 1$. Use the transfer matrix method to calculate the dependence of
the energy on T . The solution requires the diﬀerentiation of the root of a cubic equation that you
might wish to do numerically.
Problem 5.38. Calculate the partition function for the Ising model on a square lattice for N = 4
and N = 9 in the presence of an external magnetic ﬁeld. Assume that the system is in equilibrium
with a thermal bath at temperature T . You might ﬁnd it easier to write a short program to
enumerate all the microstates. Choose either toroidal or open boundary conditions. Calculate the
corresponding values of the mean energy, the heat capacity, and the zero ﬁeld susceptibility.
Problem 5.39. It can be shown that the free energy per spin in the mean-field approximation is given by
$$f = -(q/2)Jm^2 - Bm + kT s(m), \tag{5.157}$$
where
$$s(m) = \frac{1}{2}(1 + m)\ln\frac{1}{2}(1 + m) + \frac{1}{2}(1 - m)\ln\frac{1}{2}(1 - m). \tag{5.158}$$
Show that $m = 0$ provides a lower free energy for $T > T_c$, and that $m \neq 0$ provides a lower free energy for $T < T_c$.

Figure 5.15: Two examples of possible diagrams on the square lattice. The only term that contributes to Z corresponds to the square.

Problem 5.40. Write (5.94) in the form $\beta qJm = \tanh^{-1} m = (1/2)\ln[(1 + m)/(1 - m)]$ and show
that
$$m \approx 1 - 2e^{-2\beta qJ} \qquad (\text{as } T \to 0). \tag{5.159}$$
∗ Problem 5.41. The high temperature expansion we discussed for the Ising chain in Section 5.7 is very general and can be readily applied to the two- and three-dimensional Ising model. We write
$$Z_N = (\cosh \beta J)^{Nq/2}\, 2^N \sum_{b=0}^{Nq/2} g(b)\, v^b, \tag{5.160}$$
where $g(b)$ is the number of diagrams with b bonds such that each vertex of the diagram is even. It is understood that $g(0) = 1$. The form of (5.160) implies that we have reduced the calculation of the Ising model partition function to the problem of counting closed diagrams on a lattice. For the Ising model on the square lattice (q = 4), the first nontrivial contribution to $Z_N$ comes from loops made up of four bonds (see Figure 5.15) and is given by
$$(\cosh \beta J)^{2N}\, 2^N g(4)\, v^4, \tag{5.161}$$
where $g(4) = N$. It is possible to sum many terms in the high temperature expansion of $Z_N$ and other quantities and determine the thermodynamic behavior for all temperatures including the vicinity of the phase transition.
To make the high temperature expansion more explicit, work out the first several terms in (5.160) for a two-dimensional Ising model with N = 4 and N = 9.

Problem 5.42. Use the Metropolis algorithm to simulate the two-dimensional Ising model.

Suggestions for Further Reading

Stephen G. Brush, "History of the Lenz-Ising Model," Rev. Mod. Phys. 39, 883–893 (1967).
M. E. J. Newman and G. T. Barkema, Monte Carlo Methods in Statistical Physics, Clarendon Press (1999).
H. Eugene Stanley, Introduction to Phase Transitions and Critical Phenomena, Oxford University Press (1971).

Chapter 6
Noninteracting Particle Systems
© 2006 by Harvey Gould and Jan Tobochnik
21 July 2006
The goal of this chapter is to apply the general formalism of statistical mechanics to classical and quantum systems of noninteracting particles.

6.1 Introduction

Noninteracting systems are important for several reasons. For example, the interaction between the atoms in a gas can be ignored in the limit of low densities. In the limit of high temperatures, the interaction between the spins in an Ising model can be neglected because the thermal energy is much larger than the potential energy of interaction. Another reason for studying systems of noninteracting particles is that there are many cases for which the equilibrium properties of a system of interacting particles can be reformulated as a collection of noninteracting modes or quasiparticles. We will see such an example when we study the harmonic model of a crystal.

6.2 The Classical Ideal Gas

Consider an ideal gas of N identical particles of mass m confined within a cube of volume $V = L^3$.¹ If the gas is in thermal equilibrium with a heat bath at temperature T, it is natural to treat the ideal gas in the canonical ensemble. However, because the particles are not localized, they cannot be distinguished from each other as were the harmonic oscillators considered in Example 4.4 and the spins in Chapter 5. Hence, we cannot simply focus our attention on one particular particle.
On the other hand, if the temperature is sufficiently high, we expect that we can treat a system of particles semiclassically. To do so, the de Broglie wavelength associated with the particles must be small. That is, for the semiclassical description to be valid, the mean de Broglie wavelength λ must be smaller than any other length in the system. For an ideal gas, the only two lengths are L, the linear dimension of the system, and $\rho^{-1/3}$, the mean distance between particles (in three dimensions). Because we are interested in the thermodynamic limit for which $L \gg \lambda$, the semiclassical limit requires that
$$\lambda \ll \rho^{-1/3} \quad \text{or} \quad \rho\lambda^3 \ll 1. \qquad \text{(semiclassical limit)} \tag{6.1}$$

¹ An ideal gas is a good approximation to a real gas at low densities where the mean interparticle distance is much larger than the range of the interparticle interactions.

Problem 6.1. Mean distance between particles
(a) Consider a system of N particles confined to a line of length L. What is the mean distance between the particles?
(b) Consider a similar system of N particles confined to a square of linear dimension L. What is the mean distance between the particles?
To estimate the magnitude of λ, we need to know the typical value of the momentum of a particle. If the kinematics of the particles can be treated nonrelativistically, we know from (4.64) that $\overline{p^2}/2m = 3kT/2$. (We will rederive this result in Section 6.3.) Hence, we have $\overline{p^2} \sim mkT$ and $\lambda \sim h/\sqrt{mkT}$. We will find it convenient to define the thermal de Broglie wavelength λ as
$$\lambda = \left(\frac{h^2}{2\pi mkT}\right)^{1/2} = \frac{h}{(2\pi mkT)^{1/2}} = \left(\frac{2\pi\hbar^2}{mkT}\right)^{1/2}. \qquad \text{(thermal de Broglie wavelength)} \tag{6.2}$$

The calculation of the partition function of an ideal gas in the semiclassical limit proceeds as follows. First, we assume that $\lambda \ll \rho^{-1/3}$ so that we could pick out one particle from another if the particles were distinguishable. (If $\lambda \sim \rho^{-1/3}$, the wave functions of the particles would overlap.) Of course, identical particles are intrinsically indistinguishable, so we will have to correct for overcounting later.
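To get a feeling for the size of λ, the following sketch evaluates (6.2) and the combination $\rho\lambda^3$ from (6.1) for helium gas, using the numerical values quoted in Problem 6.6. The constants are rounded to four significant figures and the function name is illustrative.

```python
import math

H = 6.626e-34      # Planck's constant (J s)
K_B = 1.381e-23    # Boltzmann's constant (J/K)

def thermal_wavelength(mass, temperature):
    """Thermal de Broglie wavelength, eq. (6.2), in meters."""
    return H / math.sqrt(2.0 * math.pi * mass * K_B * temperature)

# Helium at standard temperature and pressure (values quoted in Problem 6.6).
mass, temperature = 6.65e-27, 273.0
density = 6.02e23 / 2.24e-2                  # particles per cubic meter
lam = thermal_wavelength(mass, temperature)
spacing = density ** (-1.0 / 3.0)            # mean interparticle distance
print(lam, spacing, density * lam ** 3)
```

With these inputs the wavelength comes out much smaller than the mean interparticle distance, so $\rho\lambda^3 \ll 1$ and the semiclassical condition (6.1) is comfortably satisfied.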
With these considerations we now calculate $Z_1$, the partition function for one particle, in the semiclassical limit. As we found in (4.41), the energy eigenvalues of a particle in a cube of side L are given by
$$\epsilon_n = \frac{h^2}{8mL^2}\left(n_x^2 + n_y^2 + n_z^2\right), \tag{6.3}$$
where the subscript n represents the set of quantum numbers $n_x$, $n_y$, and $n_z$, each of which can be any nonzero, positive integer. The corresponding partition function is given by
$$Z_1 = \sum_n e^{-\beta\epsilon_n} = \sum_{n_x=1}^{\infty}\sum_{n_y=1}^{\infty}\sum_{n_z=1}^{\infty} e^{-\beta h^2 (n_x^2 + n_y^2 + n_z^2)/8mL^2}. \tag{6.4}$$
Because the sum over each quantum number is independent of the other two quantum numbers, we can rewrite (6.4) as
$$Z_1 = \left[\sum_{n_x=1}^{\infty} e^{-\alpha n_x^2}\right]\left[\sum_{n_y=1}^{\infty} e^{-\alpha n_y^2}\right]\left[\sum_{n_z=1}^{\infty} e^{-\alpha n_z^2}\right] \tag{6.5a}$$
$$\phantom{Z_1} = Z_x Z_y Z_z = Z_x^3, \tag{6.5b}$$
where
$$\alpha = \frac{\beta h^2}{8mL^2}, \tag{6.6}$$
and
$$Z_x = \sum_{n_x=1}^{\infty} e^{-\alpha n_x^2}. \tag{6.7}$$
The functions $Z_y$ and $Z_z$ have the same form as $Z_x$. Of course, we could have guessed beforehand that $Z_1$ in (6.5b) would factor into three terms. Why? Note that the magnitude of α in (6.7) is the order of $\lambda^2/L^2 \ll 1$.
We found in Problem 4.18 that the quantum numbers are order $10^{10}$ for a macroscopic system at room temperature. Thus we can convert the sum in (6.7) to an integral:
$$Z_x = \sum_{n_x=1}^{\infty} e^{-\alpha n_x^2} = \sum_{n_x=0}^{\infty} e^{-\alpha n_x^2} - 1 \to \int_0^{\infty} e^{-\alpha n_x^2}\, dn_x - 1. \tag{6.8}$$
We have accounted for the fact that the sum over $n_x$ in (6.7) is from $n_x = 1$ rather than $n_x = 0$. We make a change of variables and write $x^2 = \alpha n_x^2$. The Gaussian integral can be easily evaluated (see Appendix A), and we have that
$$Z_x = L\left(\frac{2^3 m}{\beta h^2}\right)^{1/2}\int_0^{\infty} e^{-x^2}\, dx - 1 = L\left(\frac{2\pi m}{\beta h^2}\right)^{1/2} - 1. \tag{6.9}$$
Because the first term in (6.9) is order $L/\lambda \gg 1$, we can ignore the second term. The expressions for $Z_y$ and $Z_z$ are identical, and hence we obtain
$$Z_1 = Z_x Z_y Z_z = V\left(\frac{2\pi m}{\beta h^2}\right)^{3/2}. \tag{6.10}$$
The result (6.10) is the partition function associated with the translational motion of one particle in a box. Note that $Z_1$ can be conveniently expressed as
$$Z_1 = \frac{V}{\lambda^3}. \tag{6.11}$$

It is straightforward to find the mean pressure and energy for one particle in a box. We take the logarithm of both sides of (6.10) and find
$$\ln Z_1 = \ln V - \frac{3}{2}\ln\beta + \frac{3}{2}\ln\frac{2\pi m}{h^2}. \tag{6.12}$$
Hence the mean pressure is given by
$$\overline{p}_1 = \frac{1}{\beta}\left(\frac{\partial \ln Z_1}{\partial V}\right)_{T,N} = \frac{1}{\beta V} = \frac{kT}{V}, \tag{6.13}$$
and the mean energy is
$$\overline{e}_1 = -\left(\frac{\partial \ln Z_1}{\partial \beta}\right)_{V,N} = \frac{3}{2\beta} = \frac{3}{2}kT. \tag{6.14}$$
The mean energy and pressure of an ideal gas of N particles are N times the corresponding quantities for one particle. Hence, we obtain for an ideal classical gas² the equations of state
$$PV = NkT, \tag{6.15}$$
and
$$E = \frac{3}{2}NkT. \tag{6.16}$$
The heat capacity at constant volume of an ideal gas of N particles is
$$C_V = \left(\frac{\partial E}{\partial T}\right)_V = \frac{3}{2}Nk. \tag{6.17}$$

We have derived the mechanical and thermal equations of state for a classical ideal gas for a second time! The derivation of the equations of state is much easier in the canonical ensemble than in the microcanonical ensemble. The reason is that we were able to consider the partition function of one particle because the only constraint is that the temperature is fixed instead of the total energy.
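The replacement of the sum in (6.7) by an integral can be illustrated numerically. In the sketch below the values of α are chosen only so that the sum can be done quickly; for a macroscopic system α is vastly smaller and the agreement is correspondingly better. The function names are illustrative.

```python
import math

def z_x_sum(alpha, tol=1e-16):
    """Direct evaluation of Z_x = sum_{n>=1} exp(-alpha n^2), eq. (6.7)."""
    total, n = 0.0, 1
    while True:
        term = math.exp(-alpha * n * n)
        total += term
        if term < tol * total:   # remaining terms are negligible
            return total
        n += 1

def z_x_integral(alpha):
    """Leading term of (6.9): integral_0^inf exp(-alpha n^2) dn = (1/2) sqrt(pi/alpha)."""
    return 0.5 * math.sqrt(math.pi / alpha)

# The relative difference shrinks as alpha decreases.
for alpha in (1e-2, 1e-4, 1e-6):
    s, i = z_x_sum(alpha), z_x_integral(alpha)
    print(alpha, s, i, (i - s) / i)
```

With $\alpha = \beta h^2/8mL^2$, the quantity `z_x_integral(alpha)` equals $L(2\pi m/\beta h^2)^{1/2} = L/\lambda$, which is the leading term kept in (6.10).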
Problem 6.2. The volume dependence of $Z_1$ should be independent of the shape of the box. Show that the same result for $Z_1$ is obtained if the box has linear dimensions $L_x$, $L_y$, and $L_z$.

Problem 6.3. We obtained the semiclassical limit of the partition function $Z_1$ for one particle in a box by writing it as a sum over single particle states and then converting the sum to an integral. Show that the semiclassical partition function $Z_1$ for a particle in a one-dimensional box can be expressed as
$$Z_1 = \int\!\!\int \frac{dp\, dx}{h}\, e^{-\beta p^2/2m}. \qquad \text{(one dimension)} \tag{6.18}$$
Remember that the integral over p in (6.18) extends from −∞ to +∞.
The entropy of a classical ideal gas of N particles. Although it is straightforward to calculate the mean energy and pressure of an ideal classical gas, the calculation of the entropy is more subtle. To understand the difficulty, consider the calculation of the partition function of an ideal gas of three particles. Because there are no interactions between the particles, we can write the total energy as a sum of the single particle energies: $E_s = \epsilon_1 + \epsilon_2 + \epsilon_3$, where $\epsilon_i$ is the energy of the ith particle. We write the partition function $Z_3$ as
$$Z_3 = \sum_{\text{all states}} e^{-\beta(\epsilon_1 + \epsilon_2 + \epsilon_3)}. \tag{6.19}$$
The sum over all states in (6.19) is over the states of the three particle system. If the three particles were distinguishable, there would be no restriction on the number of particles that could be in any single particle state, and we could sum over the possible states of each particle separately. Hence, the partition function for a system of three distinguishable particles has the form
$$Z_3 = Z_1^3. \qquad \text{(distinguishable particles)} \tag{6.20}$$

² If this section had a musical theme, the theme music for this section would be found at www.classicalgas.com/.

It is instructive to show the origin of the relation (6.20) for a specific example. Suppose the three particles are red, white, and blue and are in equilibrium with a heat bath at temperature T. For simplicity, we assume that each particle can be in one of only three states with energy $\epsilon_1$, $\epsilon_2$, or $\epsilon_3$. The partition function for one particle is given by
$$Z_1 = e^{-\beta\epsilon_1} + e^{-\beta\epsilon_2} + e^{-\beta\epsilon_3}. \tag{6.21}$$
In Table 6.1 we show the twenty-seven possible states of the system of three distinguishable particles. The corresponding partition function is given by
$$\begin{aligned} Z_3 = {} & e^{-3\beta\epsilon_1} + e^{-3\beta\epsilon_2} + e^{-3\beta\epsilon_3} \\ & + 3\left[e^{-\beta(2\epsilon_1 + \epsilon_2)} + e^{-\beta(\epsilon_1 + 2\epsilon_2)} + e^{-\beta(2\epsilon_1 + \epsilon_3)} + e^{-\beta(2\epsilon_2 + \epsilon_3)} + e^{-\beta(\epsilon_1 + 2\epsilon_3)} + e^{-\beta(\epsilon_2 + 2\epsilon_3)}\right] \\ & + 6\, e^{-\beta(\epsilon_1 + \epsilon_2 + \epsilon_3)}. \qquad \text{(three distinguishable particles)} \end{aligned} \tag{6.22}$$
It is easy to see that $Z_3$ in (6.22) can be factored and expressed as
$$Z_3 = Z_1^3. \tag{6.23}$$

If the three particles are indistinguishable, many of the microstates shown in Table 6.1 would be impossible. In this case we cannot assign the states of the particles independently, and the sum over all states in (6.19) cannot be factored as in (6.20). For example, the state $\epsilon_1, \epsilon_2, \epsilon_3$ could not be distinguished from the state $\epsilon_1, \epsilon_3, \epsilon_2$.
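The counting for three particles and three single particle levels is small enough to do by brute force. The following sketch enumerates the 27 labelled assignments, checks the factorization (6.23), and counts how many assignments have no level occupied more than once and how many distinct occupancy patterns remain when the labels red, white, and blue are erased. The function name and the sample energies are illustrative.

```python
import itertools
import math

def three_particle_sums(energies, beta):
    """Enumerate the assignments of three labelled particles to three levels."""
    z1 = sum(math.exp(-beta * e) for e in energies)
    states = list(itertools.product(range(3), repeat=3))   # (red, white, blue)
    z3 = sum(math.exp(-beta * sum(energies[k] for k in s)) for s in states)
    # Assignments in which no level is occupied more than once
    singly_occupied = [s for s in states if len(set(s)) == 3]
    # Distinct occupancy patterns once the particle labels are erased
    unlabelled = {tuple(sorted(s)) for s in states}
    return z1, z3, len(states), len(singly_occupied), len(unlabelled)

z1, z3, n_states, n_single, n_unlabelled = three_particle_sums((0.0, 1.0, 2.5), 0.8)
print(z3, z1 ** 3, n_states, n_single, n_unlabelled)
```

The number of assignments without multiple occupancy is $3! = 6$, and they all correspond to a single unlabelled pattern; this is the overcounting that the factor $3!$ in the semiclassical limit is meant to remove.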
As discussed in Section 4.3.7, the semiclassical limit assumes that states with multiple occupancy such as $\epsilon_1, \epsilon_1, \epsilon_2$ and $\epsilon_1, \epsilon_1, \epsilon_1$ can be ignored because there are many more single particle states than there are particles (see Problem 4.18). (In our simple example, each particle can be in one of only three states and the number of states is comparable to the number of particles.) If we assume that the particles are indistinguishable and that microstates with multiple occupancy can be ignored, then $Z_3$ is simply given by
$$Z_3 = e^{-\beta(\epsilon_1 + \epsilon_2 + \epsilon_3)}. \qquad \text{(indistinguishable, multiple occupancy ignored)} \tag{6.24}$$
However, if the particles are distinguishable, there are 3! states (states 22–27 in Table 6.1) with energy $\epsilon_1 + \epsilon_2 + \epsilon_3$ (again ignoring states with multiple occupancy). Thus if we count microstates assuming that the three particles are distinguishable, we overcount the number of states by the number of permutations of the particles. Hence, in the semiclassical limit we can write
$$Z_3 = \frac{Z_1^3}{3!}. \qquad \text{(correction for overcounting)} \tag{6.25}$$

In general, if we begin with the fundamental quantum mechanical description of matter, then
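identical particles are indistinguishable at all temperatures. However, if we make the assumption that single particle states with multiple occupancy can be ignored, we can express $Z_N$, the partition function of N identical particles, as
$$Z_N = \frac{Z_1^N}{N!}. \qquad \text{(semiclassical limit)} \tag{6.26}$$
If we substitute for $Z_1$ from (6.10), we obtain
$$Z_N = \frac{V^N}{N!}\left(\frac{2\pi mkT}{h^2}\right)^{3N/2}. \tag{6.27}$$
If we take the logarithm of both sides of (6.27) and use Stirling's approximation (3.89), we can write the free energy of a noninteracting classical gas as
$$F = -kT\ln Z_N = -kTN\left[\ln\frac{V}{N} + \frac{3}{2}\ln\frac{2\pi mkT}{h^2} + 1\right]. \tag{6.28}$$

state s   red            white          blue           E_s
1         $\epsilon_1$   $\epsilon_1$   $\epsilon_1$   $3\epsilon_1$
2         $\epsilon_2$   $\epsilon_2$   $\epsilon_2$   $3\epsilon_2$
3         $\epsilon_3$   $\epsilon_3$   $\epsilon_3$   $3\epsilon_3$
4         $\epsilon_2$   $\epsilon_1$   $\epsilon_1$   $2\epsilon_1 + \epsilon_2$
5         $\epsilon_1$   $\epsilon_2$   $\epsilon_1$   $2\epsilon_1 + \epsilon_2$
6         $\epsilon_1$   $\epsilon_1$   $\epsilon_2$   $2\epsilon_1 + \epsilon_2$
7         $\epsilon_1$   $\epsilon_2$   $\epsilon_2$   $\epsilon_1 + 2\epsilon_2$
8         $\epsilon_2$   $\epsilon_1$   $\epsilon_2$   $\epsilon_1 + 2\epsilon_2$
9         $\epsilon_2$   $\epsilon_2$   $\epsilon_1$   $\epsilon_1 + 2\epsilon_2$
10        $\epsilon_3$   $\epsilon_1$   $\epsilon_1$   $2\epsilon_1 + \epsilon_3$
11        $\epsilon_1$   $\epsilon_3$   $\epsilon_1$   $2\epsilon_1 + \epsilon_3$
12        $\epsilon_1$   $\epsilon_1$   $\epsilon_3$   $2\epsilon_1 + \epsilon_3$
13        $\epsilon_3$   $\epsilon_2$   $\epsilon_2$   $2\epsilon_2 + \epsilon_3$
14        $\epsilon_2$   $\epsilon_3$   $\epsilon_2$   $2\epsilon_2 + \epsilon_3$
15        $\epsilon_2$   $\epsilon_2$   $\epsilon_3$   $2\epsilon_2 + \epsilon_3$
16        $\epsilon_1$   $\epsilon_3$   $\epsilon_3$   $\epsilon_1 + 2\epsilon_3$
17        $\epsilon_3$   $\epsilon_1$   $\epsilon_3$   $\epsilon_1 + 2\epsilon_3$
18        $\epsilon_3$   $\epsilon_3$   $\epsilon_1$   $\epsilon_1 + 2\epsilon_3$
19        $\epsilon_2$   $\epsilon_3$   $\epsilon_3$   $\epsilon_2 + 2\epsilon_3$
20        $\epsilon_3$   $\epsilon_2$   $\epsilon_3$   $\epsilon_2 + 2\epsilon_3$
21        $\epsilon_3$   $\epsilon_3$   $\epsilon_2$   $\epsilon_2 + 2\epsilon_3$
22        $\epsilon_1$   $\epsilon_2$   $\epsilon_3$   $\epsilon_1 + \epsilon_2 + \epsilon_3$
23        $\epsilon_1$   $\epsilon_3$   $\epsilon_2$   $\epsilon_1 + \epsilon_2 + \epsilon_3$
24        $\epsilon_2$   $\epsilon_1$   $\epsilon_3$   $\epsilon_1 + \epsilon_2 + \epsilon_3$
25        $\epsilon_2$   $\epsilon_3$   $\epsilon_1$   $\epsilon_1 + \epsilon_2 + \epsilon_3$
26        $\epsilon_3$   $\epsilon_1$   $\epsilon_2$   $\epsilon_1 + \epsilon_2 + \epsilon_3$
27        $\epsilon_3$   $\epsilon_2$   $\epsilon_1$   $\epsilon_1 + \epsilon_2 + \epsilon_3$

Table 6.1: The twenty-seven different states of an ideal gas of three distinguishable particles (red, white, and blue). Each particle can be in one of three states with energy $\epsilon_1$, $\epsilon_2$, or $\epsilon_3$.

In Section 6.8 we will use the grand canonical ensemble to obtain the entropy of an ideal classical gas without any ad hoc assumptions such as introducing the factor of N! and assuming that the particles are distinguishable. That is, in the grand canonical ensemble we will be able to automatically satisfy the condition that the particles are indistinguishable.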
Problem 6.4. Use the result (6.28) to find the pressure equation of state and the mean energy of an ideal gas. Do these relations depend on whether the particles are indistinguishable or distinguishable?

Problem 6.5. Entropy of an ideal gas
(a) The entropy can be found from the relations $F = E - TS$ or $S = -\partial F/\partial T$. Show that
$$S(T, V, N) = Nk\left[\ln\frac{V}{N} + \frac{3}{2}\ln\frac{2\pi mkT}{h^2} + \frac{5}{2}\right]. \tag{6.29}$$
The form of S in (6.29) is known as the Sackur-Tetrode equation (see Problem 4.24). Is this form of S applicable in the limit of low temperatures?
(b) Express kT in terms of E and show that $S(E, V, N)$ can be expressed as
$$S(E, V, N) = Nk\left[\ln\frac{V}{N} + \frac{3}{2}\ln\frac{4\pi mE}{3Nh^2} + \frac{5}{2}\right], \tag{6.30}$$
in agreement with the result (4.62) found by using the microcanonical ensemble. The form (6.30) of S in terms of its natural variables E, V, and N is known as the fundamental relation for an ideal classical gas.
Problem 6.6. Order of magnitude estimates
Calculate the entropy of one mole of helium gas at standard temperature and pressure. Take
$V = 2.24 \times 10^{-2}\ \mathrm{m^3}$, $N = 6.02 \times 10^{23}$, $m = 6.65 \times 10^{-27}$ kg, and $T = 273$ K.
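A short numerical sketch of this kind of estimate is given below. It evaluates the Sackur-Tetrode form (6.29) with the values quoted in the problem; the constants are rounded to four significant figures and the function name is an assumption made for illustration.

```python
import math

H = 6.626e-34      # Planck's constant (J s)
K_B = 1.381e-23    # Boltzmann's constant (J/K)

def sackur_tetrode(temperature, volume, n_particles, mass):
    """Entropy S(T, V, N) of an ideal classical gas, eq. (6.29), in J/K."""
    thermal = 2.0 * math.pi * mass * K_B * temperature / H ** 2
    per_particle = math.log(volume / n_particles) + 1.5 * math.log(thermal) + 2.5
    return n_particles * K_B * per_particle

# One mole of helium at standard temperature and pressure.
s_helium = sackur_tetrode(273.0, 2.24e-2, 6.02e23, 6.65e-27)
print(s_helium, s_helium / (6.02e23 * K_B))   # S in J/K and S/Nk
```

The second printed number, S/Nk, is the dimensionless entropy per particle; it is of order ten, which is a useful order-of-magnitude fact to remember. Calling the function with both V and N doubled returns twice the entropy, as expected for an extensive quantity.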
Problem 6.7. Use the relation $\mu = \partial F/\partial N$ and the result (6.28) to show that the chemical potential of an ideal classical gas is given by
$$\mu = -kT\ln\left[\frac{V}{N}\left(\frac{2\pi mkT}{h^2}\right)^{3/2}\right]. \tag{6.31}$$
We will see in Problem 6.49 that if two systems are placed into contact with different initial chemical potentials, particles will go from the system with high chemical potential to the system with low chemical potential. (This behavior is analogous to energy going from high to low temperatures.) Does "high" chemical potential for an ideal classical gas imply "high" or "low" density?
Problem 6.8. Entropy as an extensive quantity
(a) Because the entropy is an extensive quantity, we know that if we double the volume and double the number of particles (thus keeping the density constant), the entropy must double. This condition can be written formally as $S(T, \lambda V, \lambda N) = \lambda S(T, V, N)$. Although this behavior of the entropy is completely general, there is no guarantee that an approximate calculation of S will satisfy this condition. Show that the Sackur-Tetrode form of the entropy of an ideal gas of identical particles, (6.29), satisfies this general condition.
(b) Show that if the N! term were absent from (6.27) for $Z_N$, S would be given by
$$S = Nk\left[\ln V + \frac{3}{2}\ln\frac{2\pi mkT}{h^2} + \frac{3}{2}\right]. \tag{6.32}$$
Is this form of S proportional to N for V/N constant?
(c) The fact that (6.32) yields an entropy that is not extensive does not indicate that identical particles must be indistinguishable. Instead the problem arises from our identification of S with ln Z as discussed in Section 4.6. Recall that we considered a system with fixed N and made the identification (see (4.104))
$$dS/k = d(\ln Z + \beta E). \tag{6.33}$$
It is straightforward to integrate (6.33) and obtain
$$S = k(\ln Z + \beta E) + g(N), \tag{6.34}$$
where $g(N)$ is an arbitrary function of N only. Although we usually set $g(N) = 0$, it is important to remember that $g(N)$ is arbitrary. What must be the form of $g(N)$ in order that the entropy of an ideal classical gas be extensive?
Entropy of mixing. Consider two identical ideal gases at the same temperature T in separate boxes each with the same density. What is the change in entropy of the combined system after the gases are allowed to mix? We can answer this question without doing any calculations. Because the particles in each gas are identical, there would be no change in the total entropy. Why? What if the gases were not identical? In this case, there would be a change in entropy because removing the partition between the two boxes is an irreversible process. (Reinserting the partition would not separate the two gases.) In the following we calculate the change in both cases.
Consider two ideal gases at the same temperature T with $N_1$ and $N_2$ particles in boxes of volume $V_1$ and $V_2$, respectively. The gases are initially separated by a partition. If we use (6.29) for the entropy, we find
$$S_1 = N_1 k\left[\ln\frac{V_1}{N_1} + f(T)\right], \tag{6.35a}$$
$$S_2 = N_2 k\left[\ln\frac{V_2}{N_2} + f(T)\right], \tag{6.35b}$$
where the function $f(T) = \frac{3}{2}\ln(2\pi mkT/h^2) + \frac{5}{2}$. We then allow the particles to mix so that they fill the entire volume $V = V_1 + V_2$. If the particles are identical, the total entropy after the removal of the partition is given by
$$S = k(N_1 + N_2)\left[\ln\frac{V_1 + V_2}{N_1 + N_2} + f(T)\right], \tag{6.36}$$
and the change in the value of S, the entropy of mixing, is given by
$$\Delta S = k\left[(N_1 + N_2)\ln\frac{V_1 + V_2}{N_1 + N_2} - N_1\ln\frac{V_1}{N_1} - N_2\ln\frac{V_2}{N_2}\right]. \qquad \text{(identical gases)} \tag{6.37}$$

Problem 6.9. For the special case of equal densities of the two gases before separation, use (6.37)
N2 (6.37) 238 CHAPTER 6. NONINTERACTING PARTICLE SYSTEMS Problem 6.9. For the special case of equal densities of the two gases before separation, use ( 6.37)
to show that ∆S = 0 as expected. (Use the fact that N1 = ρV1 and N2 = ρV2 .) Why is the
entropy of mixing nonzero for N1 = N2 and/or V1 = V2 even though the particles are identical?
If the two gases are not identical, the total entropy after mixing is
S = k N1 ln V1 + V 2
V1 + V 2
+ N2 ln
+ (N1 + N2 )f (T ) .
N1
N2 (6.38) Then the entropy of mixing becomes
∆S = k N1 ln V1 + V 2
V1 + V 2
V1
V2
.
+ N2 ln
− N1 ln
− N2 ln
N1
N2
N1
N2 (nonidentical gases) (6.39) For the special case of N1 = N2 = N and V1 = V2 = V , we ﬁnd
∆S = 2N k ln 2. (6.40) Explain the result (6.40) in simple terms.
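The two expressions for the entropy of mixing are easy to evaluate for specific numbers. The sketch below returns $\Delta S/k$ from (6.37) and (6.39); the function names and the sample values of N and V are illustrative.

```python
import math

def mixing_entropy_identical(n1, v1, n2, v2):
    """Delta S / k from (6.37) for two samples of the same gas."""
    return ((n1 + n2) * math.log((v1 + v2) / (n1 + n2))
            - n1 * math.log(v1 / n1) - n2 * math.log(v2 / n2))

def mixing_entropy_different(n1, v1, n2, v2):
    """Delta S / k from (6.39) for two different gases."""
    v = v1 + v2
    # The N-dependent terms cancel, leaving N1 ln(V/V1) + N2 ln(V/V2).
    return n1 * math.log(v / v1) + n2 * math.log(v / v2)

print(mixing_entropy_identical(100, 1.0, 300, 3.0))   # equal densities
print(mixing_entropy_identical(100, 1.0, 100, 3.0))   # unequal densities
print(mixing_entropy_different(100, 1.0, 100, 1.0), 2 * 100 * math.log(2))
```

For identical gases the result vanishes (up to rounding) when the initial densities are equal and is positive otherwise; for different gases with equal N and V it reproduces $2N\ln 2$ in units of k.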
Problem 6.10. What would be the result for the entropy of mixing if we had used the result (6.32) for S instead of (6.29)? Consider the special case of $N_1 = N_2 = N$ and $V_1 = V_2 = V$.

6.3 Classical Systems and the Equipartition Theorem

We have used the microcanonical and canonical ensembles to show that the mean energy per particle of an ideal classical gas in three dimensions is given by $3kT/2$. Similarly, we have found that the mean energy of a one-dimensional harmonic oscillator is given by $kT$ in the limit of high temperatures. These results are special cases of the equipartition theorem which can be stated as follows:
For a classical system in equilibrium with a heat bath at temperature T, the mean value of each contribution to the total energy that is quadratic in a coordinate equals $\frac{1}{2}kT$.
Note that the equipartition theorem holds regardless of the coefficients of the quadratic terms.
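To derive the equipartition theorem, we consider the canonical ensemble and express the average of any physical quantity $f(\mathbf{r}, \mathbf{p})$ in a classical system by
$$\overline{f} = \frac{\int f(\mathbf{r}_1, \ldots, \mathbf{r}_N, \mathbf{p}_1, \ldots, \mathbf{p}_N)\, e^{-\beta E(\mathbf{r}_1, \ldots, \mathbf{r}_N, \mathbf{p}_1, \ldots, \mathbf{p}_N)}\, d\mathbf{r}_1 \ldots d\mathbf{r}_N\, d\mathbf{p}_1 \ldots d\mathbf{p}_N}{\int e^{-\beta E(\mathbf{r}_1, \ldots, \mathbf{r}_N, \mathbf{p}_1, \ldots, \mathbf{p}_N)}\, d\mathbf{r}_1 \ldots d\mathbf{r}_N\, d\mathbf{p}_1 \ldots d\mathbf{p}_N}, \tag{6.41}$$
where we have used the fact that the probability density of a particular microstate is proportional to $e^{-\beta E(\mathbf{r}_1, \ldots, \mathbf{r}_N, \mathbf{p}_1, \ldots, \mathbf{p}_N)}$. Remember that a microstate is defined by the positions and momenta of every particle. Classically, the sum over quantum states has been replaced by an integration over phase space.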
Suppose that the total energy can be written as a sum of quadratic terms. The most common case is the kinetic energy in the nonrelativistic limit. For example, the kinetic energy of one particle in three dimensions can be expressed as $(p_x^2 + p_y^2 + p_z^2)/2m$. Another example is the one-dimensional harmonic oscillator for which the total energy is $p_x^2/2m + kx^2/2$. Let us consider a one-dimensional system for simplicity, and suppose that the energy of the system can be written as
$$E = \epsilon_1(p_1) + \tilde{E}(x_1, \ldots, x_N, p_2, \ldots, p_N), \tag{6.42}$$
where $\epsilon_1 = a p_1^2$. We have separated out the quadratic dependence of the energy of particle 1 on its momentum. We use (6.41) and express the mean value of $\epsilon_1$ in one dimension as
$$\overline{\epsilon}_1 = \frac{\int_{-\infty}^{\infty} \epsilon_1\, e^{-\beta E(x_1, \ldots, x_N, p_1, \ldots, p_N)}\, dx_1 \ldots dx_N\, dp_1 \ldots dp_N}{\int_{-\infty}^{\infty} e^{-\beta E(x_1, \ldots, x_N, p_1, \ldots, p_N)}\, dx_1 \ldots dx_N\, dp_1 \ldots dp_N} \tag{6.43a}$$
$$\phantom{\overline{\epsilon}_1} = \frac{\int_{-\infty}^{\infty} \epsilon_1\, e^{-\beta[\epsilon_1 + \tilde{E}(x_1, \ldots, x_N, p_2, \ldots, p_N)]}\, dx_1 \ldots dx_N\, dp_1 \ldots dp_N}{\int_{-\infty}^{\infty} e^{-\beta[\epsilon_1 + \tilde{E}(x_1, \ldots, x_N, p_2, \ldots, p_N)]}\, dx_1 \ldots dx_N\, dp_1 \ldots dp_N} \tag{6.43b}$$
$$\phantom{\overline{\epsilon}_1} = \frac{\int_{-\infty}^{\infty} \epsilon_1\, e^{-\beta\epsilon_1}\, dp_1 \int e^{-\beta\tilde{E}}\, dx_1 \ldots dx_N\, dp_2 \ldots dp_N}{\int_{-\infty}^{\infty} e^{-\beta\epsilon_1}\, dp_1 \int e^{-\beta\tilde{E}}\, dx_1 \ldots dx_N\, dp_2 \ldots dp_N}. \tag{6.43c}$$
The integrals over all the coordinates except $p_1$ cancel, and we have
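$$\overline{\epsilon}_1 = \frac{\int_{-\infty}^{\infty} \epsilon_1\, e^{-\beta\epsilon_1}\, dp_1}{\int_{-\infty}^{\infty} e^{-\beta\epsilon_1}\, dp_1}. \tag{6.44}$$
Note that we could have written $\overline{\epsilon}_1$ in the form (6.44) directly without any intermediate steps because the probability density can be written as a product of two terms: one term that depends only on $p_1$ and another term that depends on all the other coordinates. As we have done in other contexts, we can write $\overline{\epsilon}_1$ as
$$\overline{\epsilon}_1 = -\frac{\partial}{\partial\beta}\ln\int_{-\infty}^{\infty} e^{-\beta\epsilon_1}\, dp_1. \tag{6.45}$$
If we substitute $\epsilon_1 = a p_1^2$, the integral in (6.45) becomes
$$Z = \int_{-\infty}^{\infty} e^{-\beta\epsilon_1}\, dp_1 = \int_{-\infty}^{\infty} e^{-\beta a p_1^2}\, dp_1 \tag{6.46a}$$
$$\phantom{Z} = (\beta a)^{-1/2}\int_{-\infty}^{\infty} e^{-x^2}\, dx, \tag{6.46b}$$
where we have let $x^2 = \beta a p_1^2$. Note that the integral in (6.46b) is independent of β. Hence
$$\overline{\epsilon}_1 = -\frac{\partial}{\partial\beta}\ln Z(\beta) = \frac{1}{2}kT. \tag{6.47}$$
Equation (6.47) is an example of the equipartition theorem of classical statistical mechanics.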
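The statement that the result does not depend on the coefficient a can be checked by evaluating the ratio of integrals in (6.44) numerically. The sketch below uses a simple uniform-grid sum; the function name, grid size, and cutoff are assumptions chosen for illustration, and energies are measured in whatever units are used for kT.

```python
import math

def mean_quadratic_energy(a, kT, n_points=20001, width=12.0):
    """Evaluate the ratio of integrals (6.44) for eps = a p^2 on a uniform grid."""
    beta = 1.0 / kT
    p_max = width / math.sqrt(beta * a)   # integrand is negligible beyond this
    dp = 2.0 * p_max / (n_points - 1)
    num = den = 0.0
    for i in range(n_points):
        p = -p_max + i * dp
        w = math.exp(-beta * a * p * p)   # Boltzmann weight for this momentum
        num += a * p * p * w
        den += w
    return num / den                      # the grid spacing dp cancels in the ratio

# The mean of a p^2 should equal kT/2 for every choice of a.
for a in (0.5, 2.0, 37.0):
    print(a, mean_quadratic_energy(a, kT=1.5))
```

Each value printed should be close to $kT/2 = 0.75$, independent of a, which is the content of (6.47).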
Problem 6.11. Explain why we could have written (6.44) directly. What is the physical interpretation of the integrand in the numerator?

The equipartition theorem is not a theorem and is not a new result. It is applicable only when the system can be described classically and is applicable only to each term in the energy that is proportional to a coordinate squared. This coordinate must take on a continuum of values from −∞ to +∞.
Applications of the equipartition theorem. A system of particles in three dimensions has
3N quadratic terms in the kinetic energy, three for each particle. From the equipartition theorem,
we know that the mean kinetic energy is 3N kT /2, independent of the nature of the interactions,
if any, between the particles. Hence, the heat capacity at constant volume of an ideal classical
monatomic gas is given by CV = 3N k/2 as found previously.
Another application of the equipartition function is to the onedimensional harmonic oscillator
in the classical limit. In this case there are two quadratic contributions to the total energy and hence
the mean energy of a classical harmonic oscillator in equilibrium with a heat bath at temperature
T is kT . In the harmonic model of a crystal each atom feels a harmonic or springlike force due
to its neighboring atoms. The N atoms independently perform simple harmonic oscillations about
their equilibrium positions. Each atom contributes three quadratic terms to the kinetic energy and
three quadratic terms to the potential energy. Hence, in the high temperature limit the energy of
a crystal of $N$ atoms is $E = 6NkT/2 = 3NkT$ and the heat capacity at constant volume is
\[ C_V = 3Nk. \qquad \text{(law of Dulong and Petit)} \tag{6.48} \]
The result (6.48) is known as the law of Dulong and Petit. This result was first discovered empirically, is not a law, and is valid only at sufficiently high temperatures. At low temperatures the
independence of $C_V$ on $T$ breaks down and a quantum treatment is necessary. The heat capacity
of an insulating solid at low temperatures is discussed in Section 6.12.
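To attach a number to (6.48), the short sketch below evaluates $3Nk$ for one mole of atoms. The function name is hypothetical; the two constants are the exact values fixed by the 2019 SI definitions.

```python
K_B = 1.380649e-23    # Boltzmann constant in J/K (exact in the 2019 SI)
N_A = 6.02214076e23   # Avogadro constant in 1/mol (exact in the 2019 SI)

def dulong_petit_heat_capacity(n_atoms):
    """High-temperature heat capacity C_V = 3 N k of a harmonic crystal, in J/K."""
    return 3.0 * n_atoms * K_B

if __name__ == "__main__":
    c_molar = dulong_petit_heat_capacity(N_A)
    print(f"C_V per mole = {c_molar:.2f} J/(mol K)")
```

The result is about 24.9 J/(mol K), that is, three times the gas constant, independent of the kind of atom in the crystal.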
The result (6.47) implies that the heat capacity at constant volume of a monatomic classical ideal gas is $C_V = 3Nk/2$.
Let us consider a gas consisting of diatomic molecules. Its equation of state is still given by
P V = N kT assuming that the molecules do not interact. Why? However, its heat capacity
diﬀers in general from that of a monatomic gas because a diatomic molecule has additional energy
associated with vibrational and rotational motion. We expect that the two atoms of a diatomic
molecule can vibrate along the line joining them and rotate about their center of mass, in addition
to the translational motion of their center of mass. Hence, we would expect that $C_V$ for an ideal
diatomic gas is greater than CV for a monatomic gas. The heat capacity of a diatomic molecule is
explored in Problem 6.51.
We have seen that it is convenient to do calculations for a ﬁxed number of particles for
classical systems. For this reason we usually calculate the heat capacity of an N-particle system or
the speciﬁc heat per particle. Experimental chemists usually prefer to give the speciﬁc heat as the
heat capacity per mole and experimental physicists frequently prefer to give the speciﬁc heat as
the heat capacity per kilogram or gram. All three quantities are known as the speciﬁc heat and
their precise meaning is clear from their units and the context.

6.4 Maxwell Velocity Distribution

We now find the distribution of particle velocities in a classical system that is in equilibrium with
a heat bath at temperature $T$. We know that the total energy can be written as the sum of two parts: the kinetic energy $K(\mathbf{p}_1, \ldots, \mathbf{p}_N)$ and the potential energy $U(\mathbf{r}_1, \ldots, \mathbf{r}_N)$. The kinetic
energy is a quadratic function of the momenta $\mathbf{p}_1, \ldots, \mathbf{p}_N$ (or velocities), and the potential energy
is a function of the positions $\mathbf{r}_1, \ldots, \mathbf{r}_N$ of the particles. We write the total energy as $E = K + U$.
The probability density of a configuration of $N$ particles defined by $\mathbf{r}_1, \ldots, \mathbf{r}_N, \mathbf{p}_1, \ldots, \mathbf{p}_N$ is given
in the canonical ensemble by
\begin{align}
p(\mathbf{r}_1, \ldots, \mathbf{r}_N; \mathbf{p}_1, \ldots, \mathbf{p}_N) &= A\, e^{-[K(\mathbf{p}_1, \ldots, \mathbf{p}_N) + U(\mathbf{r}_1, \ldots, \mathbf{r}_N)]/kT} \tag{6.49a} \\
&= A\, e^{-K(\mathbf{p}_1, \ldots, \mathbf{p}_N)/kT}\, e^{-U(\mathbf{r}_1, \ldots, \mathbf{r}_N)/kT}, \tag{6.49b}
\end{align}
where $A$ is a normalization constant. Note that the probability density $p$ is a product of two
factors, one that depends only on the particle positions and the other that depends only on the
particle momenta. This factorization implies that the probabilities of the momenta and positions
are independent. For example, the momentum of a particle is not inﬂuenced by its position and
vice versa. The probability of the positions of the particles can be written as
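\[ f(\mathbf{r}_1, \ldots, \mathbf{r}_N)\, d\mathbf{r}_1 \ldots d\mathbf{r}_N = B\, e^{-U(\mathbf{r}_1, \ldots, \mathbf{r}_N)/kT}\, d\mathbf{r}_1 \ldots d\mathbf{r}_N, \tag{6.50} \]
and the probability of the momenta is given by
\[ f(\mathbf{p}_1, \ldots, \mathbf{p}_N)\, d\mathbf{p}_1 \ldots d\mathbf{p}_N = C\, e^{-K(\mathbf{p}_1, \ldots, \mathbf{p}_N)/kT}\, d\mathbf{p}_1 \ldots d\mathbf{p}_N. \tag{6.51} \]
For simplicity, we have denoted the two probability densities by $f$, even though their functional
form is different in (6.50) and (6.51). The constants $B$ and $C$ in (6.50) and (6.51) can be found by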
requiring that each probability is normalized.
We emphasize that the probability distribution for the momenta does not depend on the
nature of the interaction between the particles and is the same for all classical systems at the
same temperature. This statement might seem surprising at ﬁrst because it might seem that the
velocity distribution should depend on the density of the system. An external potential also does
not aﬀect the velocity distribution. These statements are not true for quantum systems, because in
this case the position and momentum operators do not commute. That is, $e^{-\beta(K+U)} \neq e^{-\beta K} e^{-\beta U}$
for quantum systems.
Because the total kinetic energy is a sum of the kinetic energy of each of the particles, the
probability density f (p1 , . . . , pN ) is a product of terms that each depend on the momenta of only
one particle. This factorization implies that the momentum probabilities of the various particles
are independent, that is, the momentum of one particle does not aﬀect the momentum of any
other particle. These considerations imply that we can write the probability that a particle has
momentum $\mathbf{p}$ in the range $d\mathbf{p}$ as
\[ f(p_x, p_y, p_z)\, dp_x\, dp_y\, dp_z = c\, e^{-(p_x^2 + p_y^2 + p_z^2)/2mkT}\, dp_x\, dp_y\, dp_z. \tag{6.52} \]
The constant $c$ is given by the normalization condition
\[ c \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-(p_x^2 + p_y^2 + p_z^2)/2mkT}\, dp_x\, dp_y\, dp_z = c \left[ \int_{-\infty}^{\infty} e^{-p^2/2mkT}\, dp \right]^3 = 1. \tag{6.53} \]
If we use the fact that $\int_{-\infty}^{\infty} e^{-\alpha x^2}\, dx = (\pi/\alpha)^{1/2}$ (see Appendix A), we find that $c = (2\pi mkT)^{-3/2}$.
Hence the momentum probability distribution can be expressed as
\[ f(p_x, p_y, p_z)\, dp_x\, dp_y\, dp_z = \frac{1}{(2\pi mkT)^{3/2}}\, e^{-(p_x^2 + p_y^2 + p_z^2)/2mkT}\, dp_x\, dp_y\, dp_z. \tag{6.54} \]
The corresponding velocity probability distribution is given by
\[ f(v_x, v_y, v_z)\, dv_x\, dv_y\, dv_z = \left( \frac{m}{2\pi kT} \right)^{3/2} e^{-m(v_x^2 + v_y^2 + v_z^2)/2kT}\, dv_x\, dv_y\, dv_z. \tag{6.55} \]
Equation (6.55) is called the Maxwell velocity distribution. Note that it is a Gaussian. The
probability distribution for the speed is derived in Problem 6.23.
Because f (vx , vy , vz ) is a product of three independent factors, the probability of the velocity
of a particle in a particular direction is independent of the velocity in any other direction. For
example, the probability that a particle has a velocity in the $x$ direction in the range $v_x$ to $v_x + dv_x$
is
\[ f(v_x)\, dv_x = \left( \frac{m}{2\pi kT} \right)^{1/2} e^{-m v_x^2/2kT}\, dv_x. \tag{6.56} \]
Many textbooks derive the Maxwell velocity distribution for an ideal classical gas and give the
mistaken impression that the distribution applies only if the particles are noninteracting. We stress
that the Maxwell velocity (and momentum) distribution applies to any classical system regardless
of the interactions, if any, between the particles.
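A minimal sampling sketch of (6.55) and (6.56), offered as an illustration rather than as part of the text's argument: each velocity component is drawn independently from a Gaussian of variance $kT/m$, and a few sample averages are compared with the values expected from equipartition. The function names, the fixed seed, and the sample size are assumptions of this example.

```python
import math
import random

def sample_velocities(n_particles, kT_over_m, seed=0):
    """Draw (vx, vy, vz) with independent Gaussian components of variance kT/m."""
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    sigma = math.sqrt(kT_over_m)
    return [tuple(rng.gauss(0.0, sigma) for _ in range(3))
            for _ in range(n_particles)]

def velocity_moments(velocities):
    """Return the sample averages <vx^2>, <vx vy>, and <v^2>."""
    n = len(velocities)
    mean_vx2 = sum(v[0] ** 2 for v in velocities) / n
    mean_vxvy = sum(v[0] * v[1] for v in velocities) / n
    mean_v2 = sum(v[0] ** 2 + v[1] ** 2 + v[2] ** 2 for v in velocities) / n
    return mean_vx2, mean_vxvy, mean_v2

if __name__ == "__main__":
    kT_over_m = 2.0                    # arbitrary units chosen for the example
    vx2, vxvy, v2 = velocity_moments(sample_velocities(100000, kT_over_m))
    print(f"<vx^2>  = {vx2:.3f}  (expected kT/m   = {kT_over_m})")
    print(f"<vx vy> = {vxvy:.3f}  (expected 0 for independent components)")
    print(f"<v^2>   = {v2:.3f}  (expected 3 kT/m = {3 * kT_over_m})")
```

The sample averages agree with the expected values only up to statistical fluctuations of relative order $n^{-1/2}$, where $n$ is the number of sampled velocities.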
Problem 6.12. Is there an upper limit to the velocity?
The upper limit to the velocity of a particle is the velocity of light. Yet the Maxwell velocity
distribution imposes no upper limit to the velocity. Is this contradiction likely to lead to diﬃculties?
Problem 6.13. Alternative derivation of the Maxwell velocity distribution
We can also derive the Maxwell velocity distribution by making some plausible assumptions. We
ﬁrst assume that the probability density f (v) for one particle is a function only of its speed v
or equivalently $v^2$. We also assume that the velocity distributions of the components $v_x$, $v_y$, $v_z$ are
independent of each other.
(a) Given these assumptions, explain why we can write
\[ f(v_x^2 + v_y^2 + v_z^2) = f(v_x^2)\, f(v_y^2)\, f(v_z^2). \tag{6.57} \]
(b) Show that the only mathematical function that satisfies the condition (6.57) is the exponential
function
\[ f(v^2) = c\, e^{-\alpha v^2}, \tag{6.58} \]
where $c$ and $\alpha$ are independent of $\mathbf{v}$.
(c) Determine $c$ in terms of $\alpha$ using the normalization condition $1 = \int_{-\infty}^{\infty} f(v)\, dv$ for each component. Why must $\alpha$ be positive?
(d) Use the fact that $\frac{1}{2} kT = \frac{1}{2} m \overline{v_x^2}$ to find the result (6.55).

6.5 Occupation Numbers and Bose and Fermi Statistics

We now develop the formalism for calculating the thermodynamic properties of ideal quantum
systems. The absence of interactions between the particles of an ideal gas enables us to reduce
the problem of determining the energy levels $E_s$ of the gas as a whole to determining $\epsilon_k$, the
energy levels of a single particle. Because the particles are indistinguishable, we cannot specify
the microstate of each particle. Instead a microstate of an ideal gas is specified by the occupation
numbers $n_k$, the number of particles in each of the single particle energies $\epsilon_k$. If we know the value
of the occupation number for each state, we can write the total energy of the system as
\[ E_s = \sum_k n_k \epsilon_k. \tag{6.59} \]
The set of $n_k$ completely specifies a microstate of the system. The partition function for an ideal
gas can be expressed in terms of the occupation numbers as
\[ Z(V, T, N) = \sum_{\{n_k\}} e^{-\beta \sum_k n_k \epsilon_k}, \tag{6.60} \]
where the occupation numbers $n_k$ satisfy the condition
\[ N = \sum_k n_k. \tag{6.61} \]
As discussed in Section 4.3.7 one of the fundamental results of relativistic quantum mechanics
is that all particles can be classified into two groups. Particles with zero or integral spin such as $^4$He
are bosons and have wave functions that are symmetric under the exchange of any pair of particles.
Particles with half-integral spin such as electrons, protons, and neutrons are fermions and have
wave functions that are antisymmetric under particle exchange. The Bose or Fermi character of
composite objects can be found by noting that composite objects that have an even number of
fermions are bosons and those containing an odd number of fermions are themselves fermions.³ For
example, an atom of $^3$He is composed of an odd number of particles: two electrons, two protons,
and one neutron, each of spin $\frac{1}{2}$. Hence, $^3$He has half-integral spin, making it a fermion. An atom
of $^4$He has one more neutron so there are an even number of fermions and $^4$He is a boson. What
type of particle is a hydrogen molecule, H$_2$?
It is remarkable that all particles fall into one of two mutually exclusive classes with diﬀerent
spin. It is even more remarkable that there is a connection between the spin of a particle and
its statistics. Why are particles with half-integral spin fermions and particles with integral spin
bosons? The answer lies in the requirements imposed by Lorentz invariance on quantum ﬁeld
theory. This requirement implies that the form of quantum ﬁeld theory must be the same in all
inertial reference frames. Although most physicists believe that the relation between spin and
statistics must have a simpler explanation, no such explanation yet exists.⁴
³ You might have heard of the existence of Bose-like bound pairs of electrons (Cooper pairs) in what is known as
the BCS theory of superconductivity. However, such pairs are not composite objects in the usual sense.
⁴ In spite of its fundamental importance, it is only a slight exaggeration to say that “everyone knows the spin-statistics theorem, but no one understands it.”

n1   n2   n3   n4
0    1    1    1
1    0    1    1
1    1    0    1
1    1    1    0

Table 6.2: Possible states of a three particle fermion system with four single particle energy states.
The quantity $n_k$ represents the number of particles in a single particle state $k$. Note that we have
not specified which particle is in a particular state.

The difference between fermions and bosons is specified by the possible values of $n_k$. For
fermions we have
\[ n_k = 0 \text{ or } 1. \qquad \text{(fermions)} \tag{6.62} \]
The restriction (6.62) states the Pauli exclusion principle for noninteracting particles: two identical
fermions cannot be in the same single particle state. In contrast, the occupation numbers $n_k$ for
identical bosons can take any nonnegative integer value:
\[ n_k = 0, 1, 2, \ldots \qquad \text{(bosons)} \tag{6.63} \]
We will see in the following sections that the fermion or boson nature of a many particle system
can have a profound eﬀect on its properties.
Example 6.1. Calculate the partition function of an ideal gas of $N = 3$ identical fermions in
equilibrium with a heat bath at temperature $T$. Assume that each particle can be in one of four
possible states with energies $\epsilon_1$, $\epsilon_2$, $\epsilon_3$, and $\epsilon_4$.
Solution. The possible microstates of the system are summarized in Table 6.2 with four single
particle states. The spin of the fermions is neglected. Is it possible to reduce this problem to a one
body problem as we did for a classical noninteracting system?
The partition function is given by
\[ Z_3 = e^{-\beta(\epsilon_2 + \epsilon_3 + \epsilon_4)} + e^{-\beta(\epsilon_1 + \epsilon_3 + \epsilon_4)} + e^{-\beta(\epsilon_1 + \epsilon_2 + \epsilon_4)} + e^{-\beta(\epsilon_1 + \epsilon_2 + \epsilon_3)}. \tag{6.64} \]
Problem 6.14. Calculate $\overline{n}_1$, the mean number of fermions in the state with energy $\epsilon_1$, for the system in Example 6.1.

Problem 6.15. Calculate the mean energy of an ideal gas of $N = 3$ identical bosons in equilibrium
with a heat bath at temperature T , assuming that each particle can be in one of three states with
energies, 0, ∆, and 2∆. Is it possible to reduce this problem to a one body problem as we did for
a classical noninteracting system?
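The enumeration of occupation numbers used in Example 6.1 can be sketched in a few lines of code. This is only an illustration: the function names, the `statistics` argument, and the numerical energies in the usage example are assumptions, not part of the text. Fermion microstates are generated by choosing distinct single particle states, and boson microstates by choosing states with repetition allowed.

```python
import math
from itertools import combinations, combinations_with_replacement

def microstates(n_states, n_particles, statistics):
    """Return all occupation-number tuples (n_1, ..., n_M) with sum n_k = N."""
    if statistics == "fermi":
        # Each single particle state holds at most one particle.
        choices = combinations(range(n_states), n_particles)
    elif statistics == "bose":
        # Any number of particles may share a single particle state.
        choices = combinations_with_replacement(range(n_states), n_particles)
    else:
        raise ValueError("statistics must be 'fermi' or 'bose'")
    states = []
    for chosen in choices:
        occupation = [0] * n_states
        for k in chosen:
            occupation[k] += 1
        states.append(tuple(occupation))
    return states

def partition_function(energies, n_particles, beta, statistics):
    """Sum exp(-beta * sum_k n_k eps_k) over all allowed sets of occupation numbers."""
    total = 0.0
    for occupation in microstates(len(energies), n_particles, statistics):
        energy = sum(n * eps for n, eps in zip(occupation, energies))
        total += math.exp(-beta * energy)
    return total

if __name__ == "__main__":
    for state in microstates(4, 3, "fermi"):
        print(state)                      # the four rows of Table 6.2
    energies = [1.0, 2.0, 3.0, 4.0]       # hypothetical energies, arbitrary units
    print(partition_function(energies, 3, beta=1.0, statistics="fermi"))
```

For three fermions in four states the enumeration yields the four microstates of Table 6.2, and the sum reproduces the four terms of (6.64). The same function with `statistics="bose"` lists the microstates needed for a boson system such as the one in Problem 6.15.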
∗Problem 6.16. Consider a single particle of mass $m$ in a one-dimensional harmonic oscillator
potential given by $V(x) = \frac{1}{2} k x^2$. As we found in Example 4.4, the partition function is given by
$Z_1 = e^{-x/2}/(1 - e^{-x})$, where $x = \beta\hbar\omega$.
(a) What is the partition function $Z_{2d}$ for two noninteracting distinguishable particles in the same
potential?
(b) What is the partition function $Z_{2f,S=0}$ for two noninteracting fermions in the same potential
assuming the fermions have no spin?
(c) What is the partition function $Z_{2b}$ for two noninteracting bosons in the same potential? Assume
the bosons have spin zero.
∗Problem 6.17. Calculate the mean energy and entropy in the four cases considered in Problem 6.16. Plot $E$ and $S$ as a function of $T$ and compare the behavior of $E$ and $S$ in the limiting
cases of $T \to 0$ and $T \to \infty$.

6.6 Distribution Functions of Ideal Bose and Fermi Gases

The calculation of the partition function for an ideal gas in the semiclassical limit was straightforward because we were able to choose a single particle as the system. This choice is not possible
for an ideal gas at low temperatures where the quantum nature of the particles cannot be ignored.
So we need a diﬀerent strategy. The key idea is that it is possible to distinguish the set of all
particles in a given single particle state from the particles in any other single particle state. For
this reason we choose the system of interest to be the set of all particles that are in a given single
particle state. Because the number of particles in a given quantum state varies, we need to use
the grand canonical ensemble and assume that each system is populated from a particle reservoir
independently of the other single particle states.
Because we have not used the grand canonical ensemble until now, we brieﬂy review its
nature. The thermodynamic potential in the grand canonical ensemble is denoted by Ω(T, V, µ)
and is equal to $-PV$ (see (2.149)). The connection of thermodynamics to statistical mechanics is
given by $\Omega = -kT \ln Z$, where the grand partition function $Z$ is given by
\[ Z = \sum_n e^{-\beta(E_n - \mu N_n)}. \tag{6.65} \]
In analogy to the procedure for the canonical ensemble, our goal is to calculate $Z$, then $\Omega$ and the
pressure equation of state from $\Omega = -PV$ (in terms of $T$, $V$, and $\mu$), and then determine $S$ from the relation
\[ S = -\left( \frac{\partial\Omega}{\partial T} \right)_{V,\mu}, \tag{6.66} \]
and the mean number of particles from the relation
\[ \overline{N} = -\left( \frac{\partial\Omega}{\partial\mu} \right)_{T,V}. \tag{6.67} \]
The probability of a particular microstate is given by
\[ p_n = \frac{1}{Z}\, e^{-\beta(E_n - \mu N_n)}. \qquad \text{(Gibbs distribution)} \tag{6.68} \]
We will use all these relations in the following.

The first step in our application of the grand canonical ensemble is to calculate the grand
partition function $Z_k$ for all particles in the $k$th single particle state. Because the energy of the $n_k$
particles in the $k$th state is $n_k \epsilon_k$, we can write $Z_k$ as
\[ Z_k = \sum_{n_k} e^{-\beta n_k (\epsilon_k - \mu)}, \tag{6.69} \]
where the sum is over the possible values of $n_k$. For fermions this sum is straightforward because
$n_k = 0$ and 1 (see (6.62)). Hence
\[ Z_k = 1 + e^{-\beta(\epsilon_k - \mu)}. \tag{6.70} \]
The corresponding thermodynamic or Landau potential, $\Omega_k$, is given by
\[ \Omega_k = -kT \ln Z_k = -kT \ln\left[ 1 + e^{-\beta(\epsilon_k - \mu)} \right]. \tag{6.71} \]
We can use the relation $\overline{n}_k = -\partial\Omega_k/\partial\mu$ (see (6.67)) to find the mean number of particles in the
$k$th quantum state. The result is
\[ \overline{n}_k = -\frac{\partial\Omega_k}{\partial\mu} = \frac{e^{-\beta(\epsilon_k - \mu)}}{1 + e^{-\beta(\epsilon_k - \mu)}}, \tag{6.72} \]
or
\[ \overline{n}_k = \frac{1}{e^{\beta(\epsilon_k - \mu)} + 1}. \qquad \text{(Fermi-Dirac distribution)} \tag{6.73} \]
The result (6.73) for the mean number of particles in the $k$th state is known as the Fermi-Dirac
distribution.

The integer values of $n_k$ are unrestricted for bosons. We write (6.69) as
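\[ Z_k = 1 + e^{-\beta(\epsilon_k - \mu)} + e^{-2\beta(\epsilon_k - \mu)} + \cdots = \sum_{n_k = 0}^{\infty} \left[ e^{-\beta(\epsilon_k - \mu)} \right]^{n_k}. \tag{6.74} \]
The geometrical series in (6.74) is convergent for $e^{-\beta(\epsilon_k - \mu)} < 1$. Because this condition must be
satisfied for all values of $\epsilon_k$ (with the lowest single particle energy taken to be zero), we require that $e^{\beta\mu} < 1$ or
\[ \mu < 0. \qquad \text{(bosons)} \tag{6.75} \]
In contrast, the chemical potential may be either positive or negative for fermions. The summation
of the geometrical series in (6.74) gives
\[ Z_k = \frac{1}{1 - e^{-\beta(\epsilon_k - \mu)}}, \tag{6.76} \]
and hence we obtain
\[ \Omega_k = kT \ln\left[ 1 - e^{-\beta(\epsilon_k - \mu)} \right]. \tag{6.77} \]
The mean number of particles in the $k$th state is given by
\[ \overline{n}_k = -\frac{\partial\Omega_k}{\partial\mu} = \frac{e^{-\beta(\epsilon_k - \mu)}}{1 - e^{-\beta(\epsilon_k - \mu)}} \tag{6.78} \]
or
\[ \overline{n}_k = \frac{1}{e^{\beta(\epsilon_k - \mu)} - 1}. \qquad \text{(Bose-Einstein distribution)} \tag{6.79} \]
The form (6.79) is known as the Bose-Einstein distribution.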
It is frequently convenient to group the Fermi-Dirac and Bose-Einstein distributions together
and to write
\[ \overline{n}_k = \frac{1}{e^{\beta(\epsilon_k - \mu)} \pm 1}, \qquad \begin{cases} + & \text{Fermi-Dirac distribution} \\ - & \text{Bose-Einstein distribution} \end{cases} \tag{6.80} \]
The convention is that the upper sign corresponds to Fermi statistics and the lower sign to Bose
statistics.
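As a numerical illustration of (6.80), the sketch below tabulates the two distributions together with the Boltzmann factor $e^{-\beta(\epsilon_k - \mu)}$ as functions of the dimensionless variable $x = \beta(\epsilon_k - \mu)$. The function names and the sample values of $x$ are choices made for this example.

```python
import math

def fermi_dirac(x):
    """Mean occupation 1/(e^x + 1), with x = beta*(eps - mu)."""
    return 1.0 / (math.exp(x) + 1.0)

def bose_einstein(x):
    """Mean occupation 1/(e^x - 1); requires x > 0, that is, eps > mu."""
    if x <= 0.0:
        raise ValueError("the Bose-Einstein distribution requires eps > mu")
    return 1.0 / (math.exp(x) - 1.0)

def boltzmann_factor(x):
    """The factor e^(-x) that both distributions approach when e^x >> 1."""
    return math.exp(-x)

if __name__ == "__main__":
    print("    x   Fermi-Dirac     Boltzmann   Bose-Einstein")
    for x in (0.5, 1.0, 2.0, 5.0, 10.0):
        print(f"{x:5.1f}  {fermi_dirac(x):12.6e}  {boltzmann_factor(x):12.6e}  "
              f"{bose_einstein(x):12.6e}")
```

For every $x > 0$ the Boltzmann factor lies between the Fermi-Dirac value (below) and the Bose-Einstein value (above), and the three columns become nearly indistinguishable once the mean occupation is much less than one.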
The classical limit. The Fermi-Dirac and Bose-Einstein distributions must reduce to the classical
limit under the appropriate conditions. In the classical limit $\overline{n}_k \ll 1$ for all $k$, that is, the mean
number of particles in any single particle state must be small. Hence $\beta(\epsilon_k - \mu) \gg 1$ and in this limit
both the Fermi-Dirac and Bose-Einstein distributions reduce to
\[ \overline{n}_k = e^{-\beta(\epsilon_k - \mu)}. \qquad \text{(Maxwell-Boltzmann distribution)} \tag{6.81} \]
This result (6.81) is known as the Maxwell-Boltzmann distribution.

6.7 Single Particle Density of States

If we sum (6.80) over all single particle states, we obtain the mean number of particles in the
system:
\[ \overline{N}(T, V, \mu) = \sum_k \overline{n}_k = \sum_k \frac{1}{e^{\beta(\epsilon_k - \mu)} \pm 1}. \tag{6.82} \]
For a given temperature $T$ and volume $V$, (6.82) is an implicit equation for the chemical potential
$\mu$ in terms of the mean number of particles. That is, the chemical potential determines the mean
number of particles just as the temperature determines the mean energy. Similarly, we can write
the mean energy of the system as
\[ \overline{E}(T, V, \mu) = \sum_k \epsilon_k\, \overline{n}_k. \tag{6.83} \]
Because the (grand) partition function $Z$ is a product, $Z = \prod_k Z_k$, the Landau potential for the
ideal gas is given by
\[ \Omega(T, V, \mu) = \sum_k \Omega_k = \mp kT \sum_k \ln\left[ 1 \pm e^{-\beta(\epsilon_k - \mu)} \right]. \tag{6.84} \]
For a macroscopic system the number of particles and the energy are well defined, and we will
usually replace $\overline{N}$ and $\overline{E}$ by $N$ and $E$ respectively.
Because we have described the microscopic states at the most fundamental level, that is, by
using quantum mechanics, we see that the macroscopic averages of interest such as (6.82), (6.83),
and (6.84) involve sums rather than integrals over the microscopic states. However, because our systems of interest are macroscopic, the volume of the system is so large that the energies of the
discrete microstates are very close together and for practical purposes indistinguishable from a
continuum. Because it is easier to do integrals than to do sums over a very large number of states,
we wish to replace the sums in (6.82)–(6.84) by integrals. For example, we wish to write for an
arbitrary function $f(\epsilon)$
\[ \sum_k f(\epsilon_k) \to \int_0^{\infty} f(\epsilon)\, g(\epsilon)\, d\epsilon, \tag{6.85} \]
where $g(\epsilon)\, d\epsilon$ is the number of single particle states between $\epsilon$ and $\epsilon + d\epsilon$. The quantity $g(\epsilon)$ is
known as the density of states, although a better term would be the density of single particle states.
Although we already have calculated the density of states $g(\epsilon)$ for a single particle in a box
(see Section 4.3), we review the calculation here to emphasize its generality and the common
aspects of the calculation for black body radiation, elastic waves in a solid, and electron waves.
For convenience, we choose the box to be a cube of linear dimension L and assume that the wave
function vanishes at the faces of the cube. This condition ensures that we will obtain standing
waves. The condition for a standing wave in one dimension is that the wavelength satisfies the
condition
\[ \lambda = \frac{2L}{n}, \qquad n = 1, 2, \ldots \tag{6.86} \]
where $n$ is a positive integer. It is useful to define the wave number $k$ as
\[ k = \frac{2\pi}{\lambda}, \tag{6.87} \]
and write the standing wave condition as $k = n\pi/L$. Because the waves in the $x$, $y$, and $z$ directions
satisfy similar conditions, we can treat the wave number as a vector whose components satisfy
\[ \mathbf{k} = (n_x, n_y, n_z)\, \frac{\pi}{L}, \tag{6.88} \]
where $n_x$, $n_y$, $n_z$ are positive integers. Not all values of $\mathbf{k}$ are permissible and each combination
of {nx , ny , nz } corresponds to a diﬀerent state. In the “number space” deﬁned by the three
perpendicular axes labeled by nx , ny , and nz , the possible values of states lie at the centers
of cubes of unit edge length. These quantum numbers are usually very large for a macroscopic
box, and hence the integers {nx , ny , nz } and k can be treated as continuous variables.
Because the energy of a wave depends only on the magnitude of k, we want to know the
number of states between k and k + dk . As we did in Section 4.3, it is easier to ﬁrst ﬁnd Γ(k ), the
number of states with wave number less than or equal to $k$. We know that the volume in $n$ space
of a single state is unity, and hence the number of states in number space that are contained in
the positive octant of a sphere of radius $n$ is given by $\Gamma(n) = \frac{1}{8}(4\pi n^3/3)$, where $n^2 = n_x^2 + n_y^2 + n_z^2$.
Because $k = \pi n/L$, the number of states with wave vector less than or equal to $k$ is
\[ \Gamma(k) = \frac{1}{8}\, \frac{4\pi k^3/3}{(\pi/L)^3}. \tag{6.89} \]
If we use the relation
\[ g(k)\, dk = \Gamma(k + dk) - \Gamma(k) = \frac{d\Gamma(k)}{dk}\, dk, \tag{6.90} \]
we obtain
\[ g(k)\, dk = V\, \frac{k^2\, dk}{2\pi^2}, \tag{6.91} \]
where the volume $V = L^3$. Equation (6.91) gives the density of states in $k$ space between $k$ and
k + dk .
Although we obtained the result (6.91) for a cube, the result is independent of the shape of
the enclosure and the nature of the boundary conditions (see Problem 6.47). That is, if the box is
suﬃciently large, the surface eﬀects introduced by the box do not aﬀect the physical properties of
the system.
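The continuum estimate $\Gamma(n) = \frac{1}{8}(4\pi n^3/3)$ can be compared with a direct count of the allowed triples of positive integers. The sketch below is an illustration only; the function names and the radii used in the usage example are assumptions.

```python
import math

def count_states(n_max):
    """Count triples of positive integers with nx^2 + ny^2 + nz^2 <= n_max^2."""
    r2 = n_max * n_max
    count = 0
    for nx in range(1, n_max + 1):
        for ny in range(1, n_max + 1):
            rest = r2 - nx * nx - ny * ny
            if rest < 1:
                break                     # larger ny only makes rest smaller
            count += math.isqrt(rest)     # allowed nz = 1, ..., floor(sqrt(rest))
    return count

def octant_volume(n_max):
    """Continuum estimate: one eighth of the volume of a sphere of radius n_max."""
    return (1.0 / 8.0) * (4.0 * math.pi * n_max ** 3 / 3.0)

if __name__ == "__main__":
    for n_max in (10, 20, 40, 80):
        ratio = count_states(n_max) / octant_volume(n_max)
        print(f"n_max = {n_max:3d}   count / volume = {ratio:.4f}")
```

The direct count falls somewhat below the octant volume for small radii because states near the coordinate planes and near the spherical surface are counted only approximately by the volume. The ratio is expected to approach 1 as the radius grows, which is one way to see why surface effects do not matter for a sufficiently large box.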
Problem 6.18. Find the form of the density of states in $k$ space for standing waves in a two-dimensional and in a one-dimensional box.

6.7.1 Photons

The result (6.91) for the density of states in $k$ space holds for any wave in a three-dimensional
enclosure. Now we wish to find the number of states $g(\epsilon)\, d\epsilon$ as a function of the energy $\epsilon$. We
adopt the same symbol to represent the density of states in $k$ space and in $\epsilon$ space because the
interpretation of $g$ will be clear from the context.
The form of $g(\epsilon)$ depends on how the energy depends on $k$. For electromagnetic waves of
frequency $\nu$, we know that $\lambda\nu = c$, $\omega = 2\pi\nu$, and $k = 2\pi/\lambda$. Hence, $\omega = 2\pi c/\lambda$ or
\[ \omega = ck. \tag{6.92} \]
The energy $\epsilon$ of a photon of frequency $\omega$ is
\[ \epsilon = \hbar\omega = \hbar c k. \tag{6.93} \]
Because $k = \epsilon/\hbar c$, we find that
\[ g(\epsilon)\, d\epsilon = V\, \frac{\epsilon^2\, d\epsilon}{2\pi^2 \hbar^3 c^3}. \tag{6.94} \]
The result (6.94) requires one modification. The state of an electromagnetic wave or photon
depends not only on its wave vector or momentum, but also on its polarization. There are two
independent polarizations (for example, right circularly polarized and left circularly
polarized) for each electromagnetic wave of wave number $k$.⁵ Thus the number of photon states
in which the photon has an energy in the range $\epsilon$ to $\epsilon + d\epsilon$ is given by
\[ g(\epsilon)\, d\epsilon = V\, \frac{\epsilon^2\, d\epsilon}{\pi^2 \hbar^3 c^3}. \qquad \text{(photons)} \tag{6.95} \]
⁵ In the language of quantum mechanics we say that the photon has spin one and two helicity states. The fact
that the photon has spin $S = 1$ and two rather than $(2S + 1) = 3$ helicity states is a consequence of special relativity
for massless particles.

6.7.2 Electrons

For a nonrelativistic particle of mass $m$, we know that
\[ \epsilon = \frac{p^2}{2m}. \tag{6.96} \]
From the relations $p = h/\lambda$ and $k = 2\pi/\lambda$, we find that the momentum $p$ of a particle is related to
its wave vector $k$ by $p = \hbar k$. Hence, the energy can be expressed as
\[ \epsilon = \frac{\hbar^2 k^2}{2m}, \tag{6.97} \]
and
\[ d\epsilon = \frac{\hbar^2 k}{m}\, dk. \tag{6.98} \]
If we use the relations (6.97) and (6.98), we find that the number of states in the interval $\epsilon$ to $\epsilon + d\epsilon$
is given by
\[ g(\epsilon)\, d\epsilon = n_s\, \frac{V}{4\pi^2 \hbar^3}\, (2m)^{3/2}\, \epsilon^{1/2}\, d\epsilon. \tag{6.99} \]
We have included a factor of $n_s$, the number of spin states for a given value of $k$ or $\epsilon$. Because
electrons have spin 1/2, $n_s = 2$, and we can write (6.99) as
\[ g(\epsilon)\, d\epsilon = \frac{V}{2\pi^2 \hbar^3}\, (2m)^{3/2}\, \epsilon^{1/2}\, d\epsilon. \qquad \text{(electrons)} \tag{6.100} \]
Because it is common to choose units such that $\hbar = 1$, we will express most of our results in the
remainder of this chapter in terms of $\hbar$ instead of $h$.

Problem 6.19. Calculate the energy density of states for a nonrelativistic particle of mass $m$ in
$d = 1$ and $d = 2$ spatial dimensions (see Problem 6.18). Sketch $g(\epsilon)$ on one graph for $d = 1$, 2,
and 3 and comment on the dependence of $g(\epsilon)$ on $\epsilon$ for different spatial dimensions.
Problem 6.20. Calculate the energy density of states for a relativistic particle of rest mass $m$ for
which $\epsilon^2 = p^2 c^2 + m^2 c^4$.
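Problem 6.21. The relation between the energy and equation of state for an ideal gas
The mean energy $E$ is given by
\begin{align}
E &= \int_0^{\infty} \epsilon\, \overline{n}(\epsilon)\, g(\epsilon)\, d\epsilon \tag{6.101a} \\
&= n_s\, \frac{V}{4\pi^2 \hbar^3}\, (2m)^{3/2} \int_0^{\infty} \frac{\epsilon^{3/2}\, d\epsilon}{e^{\beta(\epsilon - \mu)} \pm 1}. \tag{6.101b}
\end{align}
Use (6.84) for the Landau potential and (6.99) for the density of states of nonrelativistic particles
in three dimensions to show that $\Omega$ can be expressed as
\begin{align}
\Omega &= \mp kT \int_0^{\infty} g(\epsilon) \ln\left[ 1 \pm e^{-\beta(\epsilon - \mu)} \right] d\epsilon \tag{6.102a} \\
&= \mp kT\, n_s\, \frac{V}{4\pi^2 \hbar^3}\, (2m)^{3/2} \int_0^{\infty} \epsilon^{1/2} \ln\left[ 1 \pm e^{-\beta(\epsilon - \mu)} \right] d\epsilon. \tag{6.102b}
\end{align}
Integrate (6.102b) by parts with $u = \ln[1 \pm e^{-\beta(\epsilon - \mu)}]$ and $dv = \epsilon^{1/2}\, d\epsilon$ and show that
\[ \Omega = -\frac{2}{3}\, n_s\, \frac{V}{4\pi^2 \hbar^3}\, (2m)^{3/2} \int_0^{\infty} \frac{\epsilon^{3/2}\, d\epsilon}{e^{\beta(\epsilon - \mu)} \pm 1}. \tag{6.103} \]
The form (6.101b) for $E$ is the same as the general result (6.103) for $\Omega$ except for the factor of $-\frac{2}{3}$.
Because $\Omega = -PV$ (see (2.149)), we obtain
\[ PV = \frac{2}{3} E. \tag{6.104} \]
The relation (6.104) is exact and holds for an ideal gas with any statistics at any temperature $T$,
and depends only on the nonrelativistic relation, $\epsilon = p^2/2m$.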
Problem 6.22. The relation between the energy and equation of state for photons
Use similar considerations as in Problem 6.21 to show that for photons:
\[ PV = \frac{1}{3} E. \tag{6.105} \]
Equation (6.105) holds at any temperature and is consistent with Maxwell's equations, which imply
that the pressure due to an electromagnetic wave is related to the energy density by $P = u(T)/3$.

The distribution of speeds in a classical system of particles can be found from (6.55). As we
did previously, we need to know the number of states between $v$ and $v + dv$. This number is simply
$4\pi(v + \Delta v)^3/3 - 4\pi v^3/3 \to 4\pi v^2 \Delta v$ in the limit $\Delta v \to 0$. Hence, the probability that a particle
has a speed between $v$ and $v + dv$ is given by
\[ f(v)\, dv = 4\pi A\, v^2 e^{-mv^2/2kT}\, dv, \tag{6.106} \]
where $A$ is a normalization constant which we obtain in Problem 6.23.
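The shape of (6.106) can be explored numerically without knowing $A$ in advance. The sketch below measures speed in units of $(2kT/m)^{1/2}$, so that the unnormalized density is $u^2 e^{-u^2}$, fixes the normalization by a midpoint-rule integral, and then estimates the mean, the root-mean-square, and the most probable speed. The function name, the cutoff `u_max`, and the number of bins are assumptions of this example.

```python
import math

def speed_statistics(u_max=8.0, n_bins=80000):
    """Midpoint-rule moments of the unnormalized speed density w(u) = u^2 exp(-u^2).

    u is the speed in units of sqrt(2kT/m); u_max is an assumed cutoff beyond
    which the density is negligible.
    """
    du = u_max / n_bins
    norm = first = second = 0.0
    mode_u, mode_w = 0.0, 0.0
    for i in range(n_bins):
        u = (i + 0.5) * du                # midpoint of bin i
        w = u * u * math.exp(-u * u)
        norm += w * du
        first += u * w * du
        second += u * u * w * du
        if w > mode_w:                    # track the grid point of largest density
            mode_u, mode_w = u, w
    mean = first / norm
    rms = math.sqrt(second / norm)
    return norm, mean, rms, mode_u

if __name__ == "__main__":
    norm, mean, rms, mode = speed_statistics()
    print(f"normalization integral = {norm:.6f}")
    print(f"most probable speed    = {mode:.3f}")
    print(f"mean speed             = {mean:.3f}")
    print(f"root-mean-square speed = {rms:.3f}")
```

In these units the three characteristic speeds come out in the order most probable < mean < root-mean-square, which reflects the long tail of the distribution at high speeds.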
Problem 6.23. Maxwell speed distribution
(a) Compare the form of (6.106) with (6.91).
(b) Use the fact that $\int_0^{\infty} f(v)\, dv = 1$ to calculate $A$ and show that
\[ f(v)\, dv = 4\pi v^2 \left( \frac{m}{2\pi kT} \right)^{3/2} e^{-mv^2/2kT}\, dv. \qquad \text{(Maxwell speed distribution)} \tag{6.107} \]
(c) Calculate the mean speed $\overline{v}$, the most probable speed $\tilde{v}$, and the root-mean-square speed $v_{\rm rms}$
and discuss their relative magnitudes.
(d) Make the change of variables $u = v/\sqrt{2kT/m}$ and show that
\[ f(v)\, dv = f(u)\, du = (4/\sqrt{\pi})\, u^2 e^{-u^2}\, du, \tag{6.108} \]
where again we have used the same notation for two different, but physically related
probability densities. The (dimensionless) speed probability density $f(u)$ is shown in Figure 6.1.

[Figure 6.1 plot: $f(u)$ versus $u$ for $0 \le u \le 3$.]
Figure 6.1: The probability density $f(u) = (4/\sqrt{\pi})\, u^2 e^{-u^2}$ that a particle has a speed $u$. Note the
difference between the most probable speed $\tilde{u} = 1$, the mean speed $\overline{u} \approx 1.13$, and the root-mean-square speed $u_{\rm rms} \approx 1.22$ in units of $(2kT/m)^{1/2}$.

6.8 The Equation of State for a Noninteracting Classical Gas

We have already seen how to obtain the equation of state and other thermodynamic quantities for
the classical ideal gas in the canonical ensemble (ﬁxed T , V , and N ). We now discuss how to use the
grand canonical ensemble (ﬁxed T , V , and µ) to ﬁnd the analogous quantities under conditions for
which the MaxwellBoltzmann distribution is applicable. The calculation in the grand canonical
ensemble will automatically satisfy the condition that the particles are indistinguishable. For
simplicity, we will assume that the particles are spinless.
As an example, we ﬁrst compute the chemical potential from the condition that the mean
number of particles is given by $N$. If we use the Maxwell-Boltzmann distribution (6.81) and the
density of states (6.99) with $n_s = 1$ for spinless particles of mass $m$, we obtain
\begin{align}
\overline{N} = \sum_k \overline{n}_k &\to \int_0^{\infty} \overline{n}(\epsilon)\, g(\epsilon)\, d\epsilon \tag{6.109a} \\
&= \frac{V}{4\pi^2} \left( \frac{2m}{\hbar^2} \right)^{3/2} \int_0^{\infty} e^{-\beta(\epsilon - \mu)}\, \epsilon^{1/2}\, d\epsilon. \tag{6.109b}
\end{align}
We make the change of variables $x = \beta\epsilon$ and write (6.109b) as
\[ \overline{N} = \frac{V}{4\pi^2} \left( \frac{2m}{\hbar^2 \beta} \right)^{3/2} e^{\beta\mu} \int_0^{\infty} e^{-x} x^{1/2}\, dx. \tag{6.110} \]
The integral in (6.110) can be done analytically (make the change of variables $x = y^2$) and has the
value $\pi^{1/2}/2$ (see Appendix A). Hence, the mean number of particles is given by
\[ \overline{N}(T, V, \mu) = V \left( \frac{m}{2\pi\hbar^2 \beta} \right)^{3/2} e^{\beta\mu}. \tag{6.111} \]
Because we cannot easily measure $\mu$, we are not satisfied with knowing the function $\overline{N}(T, V, \mu)$.
Instead, we can find the value of $\mu$ that yields the desired value of $\overline{N}$ by solving (6.111) for the
chemical potential:
\[ \mu = kT \ln\left[ \frac{\overline{N}}{V} \left( \frac{2\pi\hbar^2 \beta}{m} \right)^{3/2} \right]. \tag{6.112} \]
What is the difference, if any, between (6.112) and the result (6.31) for $\mu$ found in the canonical
ensemble?
Problem 6.24. Estimate the chemical potential of a monatomic ideal gas at room temperature.
As we saw in Section 2.21, the chemical potential is the change in each of the thermodynamic
potentials when one particle is added. It might be expected that µ > 0, because it should cost
energy to add a particle. On the other hand, because the particles do not interact, perhaps µ = 0?
So why is $\mu \ll 0$ for a classical ideal gas? The reason is that we have to determine how much
energy must be added to the system to keep the entropy and the volume ﬁxed. Suppose that we
add one particle with zero kinetic energy. Because the gas is ideal, there is no potential energy of
interaction. However, because V is ﬁxed, the addition of an extra particle leads to an increase in
S . (S is an increasing function of N and V .) Because S also is an increasing function of the total
energy, we have to reduce the energy.
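To get a feeling for how negative the chemical potential is, the sketch below evaluates (6.112) for a dilute gas. The choice of gas (helium-4 with an approximate atomic mass of 4.0026 u), the temperature of 300 K, the pressure of one atmosphere, and the function name are all assumptions made for illustration; the number density is taken from the ideal gas law.

```python
import math

H = 6.62607015e-34        # Planck constant in J s (exact in the 2019 SI)
K_B = 1.380649e-23        # Boltzmann constant in J/K (exact in the 2019 SI)
AMU = 1.66053906660e-27   # atomic mass unit in kg (CODATA 2018 value)
EV = 1.602176634e-19      # electron volt in J (exact in the 2019 SI)
HBAR = H / (2.0 * math.pi)

def chemical_potential(number_density, mass, temperature):
    """Evaluate mu = kT ln[(N/V) (2 pi hbar^2 beta / m)^(3/2)], the form in (6.112)."""
    beta = 1.0 / (K_B * temperature)
    factor = (2.0 * math.pi * HBAR ** 2 * beta / mass) ** 1.5
    return K_B * temperature * math.log(number_density * factor)

if __name__ == "__main__":
    temperature = 300.0                   # K, assumed room temperature
    pressure = 101325.0                   # Pa, assumed atmospheric pressure
    mass = 4.0026 * AMU                   # kg, assumed helium-4 atom
    density = pressure / (K_B * temperature)   # N/V from P V = N k T
    mu = chemical_potential(density, mass, temperature)
    print(f"mu / kT = {mu / (K_B * temperature):.2f}")
    print(f"mu      = {mu / EV:.3f} eV")
```

Under these assumptions $\mu$ is roughly $-13\,kT$, so $e^{\beta\mu}$ is of order $10^{-6}$. This is the quantitative sense in which the chemical potential of a classical ideal gas is large and negative.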
The calculation of N (T, V, µ) leading to (6.111) was not necessary because we can calculate
the equation of state and all the thermodynamic quantities from the Landau potential Ω. We
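calculate $\Omega$ from (6.84) by noting that $e^{\beta\mu} \ll 1$ and approximating the logarithm
by $\ln(1 \pm x) \approx \pm x$. We find that
\begin{align}
\Omega &= \mp kT \sum_k \ln\left[ 1 \pm e^{-\beta(\epsilon_k - \mu)} \right] \tag{6.113a} \\
&\to -kT \sum_k e^{-\beta(\epsilon_k - \mu)}. \qquad \text{(semiclassical limit)} \tag{6.113b}
\end{align}
As expected, the form of $\Omega$ in (6.113b) is independent of whether we started with Bose or Fermi
statistics.
As usual, we replace the sum over the single particle states by an integral over the density of
states and find
\begin{align}
\Omega &= -kT\, e^{\beta\mu} \int_0^{\infty} g(\epsilon)\, e^{-\beta\epsilon}\, d\epsilon \tag{6.114a} \\
&= -kT\, \frac{V}{4\pi^2 \hbar^3} \left( \frac{2m}{\beta} \right)^{3/2} e^{\beta\mu} \int_0^{\infty} x^{1/2} e^{-x}\, dx \tag{6.114b} \\
&= -\frac{V}{\beta^{5/2}} \left( \frac{m}{2\pi\hbar^2} \right)^{3/2} e^{\beta\mu}. \tag{6.114c}
\end{align}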
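If we substitute $\lambda = (2\pi\beta\hbar^2/m)^{1/2}$, we find
\[ \Omega = -kT\, \frac{V}{\lambda^3}\, e^{\beta\mu}. \tag{6.115} \]
From the relation $\Omega = -PV$ (see (2.149)), we obtain
\[ P = \frac{kT}{\lambda^3}\, e^{\beta\mu}. \tag{6.116} \]
If we use the thermodynamic relation (6.67), we obtain
\[ \overline{N} = -\left( \frac{\partial\Omega}{\partial\mu} \right)_{V,T} = \frac{V}{\lambda^3}\, e^{\beta\mu}. \tag{6.117} \]
The usual classical equation of state, $PV = NkT$, is obtained by using (6.117) to eliminate $\mu$. The
simplest way of finding the energy is to use the relation (6.104).
We can find the entropy $S(T, V, \mu)$ using (6.115) and (6.66):
\begin{align}
S(T, V, \mu) &= -\left( \frac{\partial\Omega}{\partial T} \right)_{V,\mu} = k\beta^2 \left( \frac{\partial\Omega}{\partial\beta} \right)_{V,\mu} \tag{6.118a} \\
&= V k\beta^2 \left[ \frac{5}{2\beta^{7/2}} - \frac{\mu}{\beta^{5/2}} \right] \left( \frac{m}{2\pi\hbar^2} \right)^{3/2} e^{\beta\mu}. \tag{6.118b}
\end{align}
If we eliminate $\mu$ from (6.118b), we obtain the Sackur-Tetrode expression for the entropy of an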
2π 2 3/2 βµ e . (6.118b) If we eliminate µ from (6.118b), we obtain the SackurTetrode expression for the entropy of an
ideal gas:
N
5
2π 2 3/2
.
(6.119)
S (T, V, N ) = N k − ln
− ln
2
V
mkT
We have written N rather than N in (6.119). Note that we did not have to introduce any extra factors of N ! as we did in Section 6.2, because we already correctly counted the number of
microstates.
Problem 6.25. Complete the missing steps and derive the ideal gas equations of state.
Problem 6.26. Show that N̄ can be expressed as
\[ \overline{N} = \frac{V}{\lambda^3}\, e^{\beta\mu}, \tag{6.120} \]
and hence
\[ \mu(T, V) = -kT \ln\frac{1}{\rho\lambda^3}, \tag{6.121} \]
where ρ = N̄/V.
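To get a feeling for the size of µ in (6.121), the following minimal sketch evaluates ρλ³ and µ for a dilute gas. The choice of helium-4 at T = 300 K and a pressure of 1 atm is an assumption made only for illustration, and the function names are hypothetical. The thermal wavelength is written as h/(2πmkT)^(1/2), which is the same as λ = (2πβħ²/m)^(1/2).

```python
import math

# CODATA constants in SI units.
H = 6.62607015e-34      # Planck constant (J s)
K_B = 1.380649e-23      # Boltzmann constant (J/K)
EV = 1.602176634e-19    # joules per electron volt

def thermal_wavelength(mass, temperature):
    """Thermal de Broglie wavelength, lambda = h / sqrt(2 pi m k T)."""
    return H / math.sqrt(2.0 * math.pi * mass * K_B * temperature)

def chemical_potential(density, mass, temperature):
    """Classical ideal gas result (6.121): mu = kT ln(rho lambda^3)."""
    lam = thermal_wavelength(mass, temperature)
    return K_B * temperature * math.log(density * lam**3)

# Assumed example (not from the text): helium-4 at T = 300 K and P = 1 atm.
T = 300.0
P = 101325.0
M_HE = 6.6464731e-27            # mass of a helium-4 atom (kg)
rho = P / (K_B * T)             # number density from PV = NkT
degeneracy = rho * thermal_wavelength(M_HE, T)**3
mu_ev = chemical_potential(rho, M_HE, T) / EV
print(f"rho*lambda^3 = {degeneracy:.3e}, mu = {mu_ev:.3f} eV")
```

Under these assumptions ρλ³ is of order 10^−6, so the semiclassical condition is well satisfied and µ is negative and many times kT in magnitude.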
Problem 6.27. In Section 6.2 we argued that the semiclassical limit λ ≪ ρ^(−1/3) (see (6.1)) implies
that n̄k ≪ 1, that is, the mean number of particles that are in any single particle energy state is
very small. Use the expression (6.121) for µ and (6.81) for n̄k to show that the condition n̄k ≪ 1
implies that λ ≪ ρ^(−1/3).

6.9 Black Body Radiation

We can regard electromagnetic radiation as equivalent to a system of noninteracting bosons (photons), each of which has an energy hν, where ν is the frequency of the radiation. If the radiation is
in an enclosure, equilibrium will be established and maintained by the interactions of the photons
with the atoms of the wall in the enclosure. Because the atoms emit and absorb photons, the total
number of photons is not conserved.
One of the important observations that led to the development of quantum theory was the
consideration of the frequency spectrum of electromagnetic radiation from a black body. If a body
in thermal equilibrium emits electromagnetic radiation, then this radiation is described as black
body radiation and the object is said to be a black body. This statement does not mean that
the body is actually black. The word “black” indicates that the radiation is perfectly absorbed
and reradiated by the object. The spectrum of light radiated by such an idealized black body
is described by a universal spectrum called the Planck spectrum, which we will derive in the
following (see (6.129)). The nature of the spectrum depends only on the absolute temperature T
of the radiation.
The physical system whose spectrum most closely matches that of a black body is the
cosmic microwave background.6 The observed cosmic microwave background spectrum fits
the theoretical spectrum of a black body better than the best black body spectrum that we can
make in a laboratory! In contrast, a piece of hot, glowing firewood, for example, is not really in
thermal equilibrium, and the spectrum of glowing embers is only a crude approximation to a black
body spectrum. The existence of the cosmic microwave background spectrum and its fit to the
black body spectrum is compelling evidence that the universe experienced a Big Bang.
We can derive the Planck radiation law using either the canonical or grand canonical ensemble
because the photons are continuously absorbed and emitted by the walls of the container and hence
their number is not conserved. Let us ﬁrst consider the canonical ensemble, and consider a gas of
photons in equilibrium with a heat bath at temperature T . The total energy of the system is given
by E = n1ε1 + n2ε2 + . . ., where nk is the number of photons with energy εk. Because there is no
6 The universe is filled with electromagnetic radiation with a distribution of frequencies given by (6.129) with
T ≈ 2.73 K. The existence of this background radiation is a remnant from a time when the universe was composed
primarily of electrons and protons at a temperature of about 4000 K. This plasma of electrons and protons interacted
strongly with the electromagnetic radiation over a wide range of frequencies, so that the matter and the radiation
reached thermal equilibrium. By the time that the universe had cooled to 3000 K, the matter was primarily in
the form of atomic hydrogen, which interacts with radiation only at the frequencies of the hydrogen spectral lines.
As a result most of the radiation energy was eﬀectively decoupled from matter. Electromagnetic radiation, such
as starlight, radiated by matter since the decoupling, is superimposed on the cosmic black body radiation. More
information about the cosmic microwave background can be found at <www.astro.ubc.ca/people/scott/cmb.html>
and at many other sites.
constraint on the total number of photons, we can write the canonical partition function as
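\begin{align}
Z(T, V) &= \sum_s e^{-\beta E_s} = \sum_{n_1, n_2, \ldots = 0}^{\infty} e^{-\beta(n_1\epsilon_1 + n_2\epsilon_2 + \cdots)} \tag{6.122a} \\
&= \left[\sum_{n_1=0}^{\infty} e^{-\beta n_1\epsilon_1}\right] \left[\sum_{n_2=0}^{\infty} e^{-\beta n_2\epsilon_2}\right] \cdots \tag{6.122b} \\
&= \prod_k \left[\sum_{n_k=0}^{\infty} e^{-\beta n_k\epsilon_k}\right]. \tag{6.122c}
\end{align}
The lack of a constraint means that we can do the sum over each occupation number separately.
Because the term in brackets in (6.122c) is a geometrical series, we obtain
\[ Z(T, V) = \prod_k \frac{1}{1 - e^{-\beta\epsilon_k}}. \qquad \text{(photon gas)} \tag{6.123} \]
A quantity of particular interest is the mean number of photons in state k. In the canonical
ensemble we have
\begin{align}
\overline{n}_k &= \frac{\sum_s n_k\, e^{-\beta E_s}}{\sum_s e^{-\beta E_s}} = \frac{\sum_{n_1, n_2, \ldots} n_k\, e^{-\beta(n_1\epsilon_1 + n_2\epsilon_2 + \cdots + n_k\epsilon_k + \cdots)}}{Z} \tag{6.124a} \\
&= \frac{1}{Z} \frac{\partial}{\partial(-\beta\epsilon_k)} \sum_{n_1, n_2, \ldots} e^{-\beta(n_1\epsilon_1 + n_2\epsilon_2 + \cdots + n_k\epsilon_k + \cdots)} \tag{6.124b} \\
&= \frac{\partial \ln Z}{\partial(-\beta\epsilon_k)}. \tag{6.124c}
\end{align}
Because the logarithm of a product of terms equals the sum of the logarithms of each term, we
have from (6.123) and (6.124c)
\begin{align}
\overline{n}_k &= \frac{\partial}{\partial(-\beta\epsilon_k)} \Bigl[-\sum_{k'} \ln\bigl(1 - e^{-\beta\epsilon_{k'}}\bigr)\Bigr] \tag{6.125a} \\
&= \frac{e^{-\beta\epsilon_k}}{1 - e^{-\beta\epsilon_k}}, \tag{6.125b}
\end{align}
or
\[ \overline{n}_k = \frac{1}{e^{\beta\epsilon_k} - 1}. \qquad \text{(Planck distribution)} \tag{6.125c} \]
If we compare the form of (6.125c) with the general Bose-Einstein distribution in (6.79), we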
see that the two expressions agree if we set µ = 0. This result can be understood by simple considerations. As we have mentioned, equilibrium is established and maintained by the interactions
between the photons and the atoms of the wall in the enclosure. The number N of photons in the
cavity cannot be imposed externally on the system and is ﬁxed by the temperature T of the walls
and the volume V enclosed. Hence, the free energy F for photons cannot depend on N because
the latter is not a thermodynamic variable, and we have µ = ∂F/∂N = 0. If we substitute µ = 0 into the general result (6.79) for noninteracting bosons, we find that the mean number of photons
in state k is given by
\[ \overline{n}_k = \frac{1}{e^{\beta h\nu} - 1}, \tag{6.126} \]
in agreement with (6.125c). That is, the photons in blackbody radiation are bosons whose chemical
potential is zero. The role of the chemical potential is to set the mean number of particles, just
as the temperature sets the mean energy. However, the chemical potential has no role to play for
a system of photons in black body radiation. So we could have just started with (6.79) for nk in
the grand canonical ensemble and set µ = 0.
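The step from the sums in (6.124a) to the closed form (6.125c) can be checked numerically for a single state. The sketch below is only an illustration: it truncates the sum over the occupation number at a hypothetical cutoff n_max, which is harmless as long as e^(−βε n_max) is negligible, and the function names are not from the text.

```python
import math

def mean_occupation_direct(beta_eps, n_max=2000):
    """Mean photon number from the explicit sums over n = 0, 1, ..., n_max."""
    weights = [math.exp(-beta_eps * n) for n in range(n_max + 1)]
    z_k = sum(weights)                                   # single-state factor of Z
    return sum(n * w for n, w in enumerate(weights)) / z_k

def mean_occupation_planck(beta_eps):
    """Closed form (6.125c): 1 / (exp(beta * eps) - 1)."""
    return 1.0 / math.expm1(beta_eps)

for beta_eps in (0.1, 1.0, 5.0):
    print(beta_eps, mean_occupation_direct(beta_eps), mean_occupation_planck(beta_eps))
```

The two columns should agree to within rounding error for each value of βε, which is the content of the geometric series argument.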
Planck’s theory of black body radiation follows from the form of the density of states for
photons found in (6.95). The number of photons with energy in the range ε to ε + dε is given by
\[ N(\epsilon)\, d\epsilon = \overline{n}(\epsilon)\, g(\epsilon)\, d\epsilon = \frac{V}{\pi^2\hbar^3c^3}\, \frac{\epsilon^2\, d\epsilon}{e^{\beta\epsilon} - 1}. \tag{6.127} \]
(For simplicity, we have ignored the polarization of the electromagnetic radiation, and hence the
spin of the photons.) If we substitute ε = hν in the right-hand side of (6.127), we find that the
number of photons in the frequency range ν to ν + dν is given by
\[ N(\nu)\, d\nu = \frac{8\pi V}{c^3}\, \frac{\nu^2\, d\nu}{e^{\beta h\nu} - 1}. \tag{6.128} \]
The distribution of radiated energy is obtained by multiplying (6.128) by hν:
\[ E(\nu)\, d\nu = h\nu N(\nu)\, d\nu = \frac{8\pi h V\nu^3}{c^3}\, \frac{d\nu}{e^{\beta h\nu} - 1}. \tag{6.129} \]
Equation (6.129) gives the energy radiated by a black body of volume V in the frequency range
between ν and ν + dν. The energy per unit volume u(ν) is given by
\[ u(\nu) = \frac{8\pi h\nu^3}{c^3}\, \frac{1}{e^{\beta h\nu} - 1}. \qquad \text{(Planck's radiation law)} \tag{6.130} \]
We can change variables to ε = hν and write the energy density as
\[ u(\epsilon) = \frac{8\pi}{(hc)^3}\, \frac{\epsilon^3}{e^{\epsilon/kT} - 1}. \tag{6.131} \]
The temperature dependence of u(ε) is shown in Figure 6.2.
Problem 6.28. Wien’s displacement law
The maximum of u(ν ) shifts to higher frequencies with increasing temperature. Show that the
maximum of u can be found by solving the equation
\[ (3 - x)e^x = 3, \tag{6.132} \]
where x = βhνmax. Solve (6.132) numerically for x and show that
\[ \frac{h\nu_{\max}}{kT} = 2.822. \qquad \text{(Wien's displacement law)} \tag{6.133} \]

Figure 6.2: The Planck spectrum x³/(e^x − 1) as a function of x = ε/kT. The area under any portion of
the curve multiplied by 8π(kT)⁴/(hc)³ gives the energy of electromagnetic radiation within the
corresponding energy or frequency range.
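One way to carry out the numerical solution of (6.132) requested in Problem 6.28 is simple bisection. The sketch below is one possible approach, not the method of the text; the helper names and the bracketing interval [1, 3] are choices made here. The interval excludes the trivial root x = 0.

```python
import math

def wien_residual(x):
    """Left-hand side minus right-hand side of (6.132): (3 - x) e^x - 3."""
    return (3.0 - x) * math.exp(x) - 3.0

def bisect(func, lo, hi, tol=1e-12):
    """Bisection root finder; func(lo) and func(hi) must differ in sign."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# x = 0 is a trivial root, so bracket the nontrivial one between 1 and 3.
x_max = bisect(wien_residual, 1.0, 3.0)
print(f"x_max = {x_max:.4f}")
```

The residual is positive at x = 1 and negative at x = 3, so the bracket contains exactly the root that corresponds to the maximum of u(ν).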
Problem 6.29. Derivation of the Rayleigh-Jeans and Wien’s laws
(a) Use (6.130) to find the energy emitted by a black body at a wavelength between λ and λ + dλ.
(b) Determine the limiting behavior of your result for long wavelengths. This form is called the
Rayleigh-Jeans law and is given by
\[ u(\lambda)\, d\lambda = \frac{8\pi kT}{\lambda^4}\, d\lambda. \tag{6.134} \]
Does this form involve Planck’s constant? The result in (6.134) was originally derived from
purely classical considerations. Classical theory predicts the so-called ultraviolet catastrophe,
namely that an infinite amount of energy is radiated at high frequencies or short wavelengths.
Explain how (6.134) would give an infinite result for the total energy that would be radiated.
(c) Determine the limiting behavior for short wavelengths. This behavior is called Wien’s law.
Problem 6.30. Thermodynamic functions of black body radiation
Use the various thermodynamic relations to show that
\begin{align}
E &= V \int_0^\infty u(\nu)\, d\nu = \frac{4\sigma}{c} V T^4. \qquad \text{(Stefan-Boltzmann law)} \tag{6.135a} \\
F &= -\frac{4\sigma}{3c} V T^4. \tag{6.135b} \\
S &= \frac{16\sigma}{3c} V T^3. \tag{6.135c} \\
P &= \frac{4\sigma}{3c} T^4 = \frac{1}{3}\frac{E}{V}. \tag{6.135d} \\
G &= F + PV = 0. \tag{6.135e}
\end{align}
The free energy F in (6.135b) can be calculated from Z starting from (6.123) and using (6.95).
The Stefan-Boltzmann constant σ is given by
\[ \sigma = \frac{2\pi^5 k^4}{15 h^3 c^2}. \tag{6.136} \]
The integral
\[ \int_0^\infty \frac{x^3\, dx}{e^x - 1} = \frac{\pi^4}{15} \tag{6.137} \]
is evaluated in Appendix A.
The relation (6.135a) between the total energy and T is known as the Stefan-Boltzmann law.
Because G = Nµ and N ≠ 0, we again find that the chemical potential equals zero for an ideal
gas of photons. What is the relation between E and PV? Why is it not the same as (6.104)?
Also note that for a quasistatic adiabatic expansion or compression of the photon gas, the product
VT³ = constant. Why? How are P and V related?
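The integral (6.137) and the constant (6.136) are easy to check numerically. The following sketch uses a composite Simpson rule written out by hand; the cutoff at x = 60 and the number of subintervals are assumptions chosen so that the truncation and discretization errors are negligible, and the function names are hypothetical.

```python
import math

H = 6.62607015e-34   # Planck constant (J s)
K_B = 1.380649e-23   # Boltzmann constant (J/K)
C = 2.99792458e8     # speed of light (m/s)

def planck_integrand(x):
    """Integrand x^3 / (e^x - 1) of (6.137); its limit at x = 0 is 0."""
    return 0.0 if x == 0.0 else x**3 / math.expm1(x)

def simpson(func, a, b, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

# The integrand decays like x^3 e^{-x}, so truncating at x = 60 is harmless.
integral = simpson(planck_integrand, 0.0, 60.0)
sigma = 2.0 * math.pi**5 * K_B**4 / (15.0 * H**3 * C**2)
print(f"integral = {integral:.6f}, pi^4/15 = {math.pi**4 / 15:.6f}")
print(f"sigma = {sigma:.4e} W m^-2 K^-4")
```

The computed σ can be compared with the tabulated value of the Stefan-Boltzmann constant, about 5.67 × 10^−8 W m^−2 K^−4.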
Problem 6.31. Show that the total mean number of photons in black body radiation is given by
\[ \overline{N} = \frac{V}{\pi^2 c^3} \int_0^\infty \frac{\omega^2\, d\omega}{e^{\hbar\omega/kT} - 1} = \frac{V(kT)^3}{\pi^2 c^3\hbar^3} \int_0^\infty \frac{x^2\, dx}{e^x - 1}. \tag{6.138} \]
The integral in (6.138) can be expressed in terms of known functions (see Appendix A). The result
is
\[ \int_0^\infty \frac{x^2\, dx}{e^x - 1} = 2 \times 1.202. \tag{6.139} \]
Hence
\[ \overline{N} = 0.244\, V \left(\frac{kT}{\hbar c}\right)^3. \tag{6.140} \]

6.10 Noninteracting Fermi Gas

Figure 6.3: The Fermi-Dirac distribution at T = 0 and for T ≪ TF.

The properties of metals are dominated by the behavior of the conduction electrons. Given that
there are Coulomb interactions between the electrons as well as interactions between the electrons and the positive ions of the lattice, it is remarkable that the free electron model in which the
electrons are treated as an ideal gas of fermions near zero temperature is an excellent model of
the conduction electrons in a metal under most circumstances. In the following, we investigate
the properties of an ideal Fermi gas and brieﬂy discuss its applicability as a model of electrons in
metals.
As we will see in Problem 6.32, the thermal de Broglie wavelength of the electrons in a typical
metal is much larger than the mean interparticle spacing, and hence we must treat the electrons
using Fermi statistics. When an ideal gas is dominated by quantum mechanical eﬀects, it is said
to be degenerate.

6.10.1 Ground state properties

We first discuss the noninteracting Fermi gas at T = 0. From (6.73) we see that the zero temperature limit (β → ∞) of the Fermi-Dirac distribution is
\[ \overline{n}(\epsilon) = \begin{cases} 1 & \text{for } \epsilon < \mu \\ 0 & \text{for } \epsilon > \mu. \end{cases} \tag{6.141} \]
That is, all states whose energies are below the chemical potential are occupied, and all states
whose energies are above the chemical potential are unoccupied. The Fermi distribution at T = 0
is shown in Figure 6.3a.
The consequences of (6.141) are easy to understand. At T = 0, the system is in its ground
state, and the particles are distributed among the single particle states so that the total energy of the gas is a minimum. Because we may place no more than one particle in each state, we need
to construct the ground state of the system by adding particles, one at a time, into the lowest
available energy state until we have placed all the particles. To find the value of µ(T = 0) ≡ εF,
we write
\[ N = \int_0^\infty \overline{n}(\epsilon)\, g(\epsilon)\, d\epsilon \;\xrightarrow{T \to 0}\; \int_0^{\mu(T=0)} g(\epsilon)\, d\epsilon = V \frac{(2m)^{3/2}}{2\pi^2\hbar^3} \int_0^{\epsilon_F} \epsilon^{1/2}\, d\epsilon. \tag{6.142} \]
We have substituted the electron density of states (6.100) in (6.142). The upper limit εF in (6.142)
is equal to µ(T = 0) and is determined by requiring the integral to give the desired number of
particles N. We find that
\[ N = \frac{V}{3\pi^2} \left(\frac{2m\epsilon_F}{\hbar^2}\right)^{3/2}. \tag{6.143} \]
The energy of the highest occupied state is called the Fermi energy εF, and the corresponding
Fermi momentum is pF = (2mεF)^(1/2). From (6.143) we have that
\[ \epsilon_F = \frac{\hbar^2}{2m} (3\pi^2\rho)^{2/3}, \tag{6.144} \]
where the density ρ = N/V. It follows that the Fermi momentum is given by
\[ p_F = (3\pi^2\rho)^{1/3}\hbar. \tag{6.145} \]
At T = 0 all the states with momentum less than pF are occupied and all the states above this
momentum are unoccupied. The boundary in momentum space between occupied and unoccupied
states at T = 0 is called the Fermi surface. For an ideal Fermi gas, the Fermi surface is the surface
of a sphere with radius pF. Note that the Fermi momentum can be estimated by assuming the de
Broglie relation p = h/λ and taking λ ∼ ρ^(−1/3), the mean distance between particles. That is, the
particles are “localized” within a box of order ρ^(−1/3).
The chemical potential at T = 0 equals εF and is positive. On the other hand, in Section 6.8 we
argued that µ should be a negative quantity for a classical ideal gas, because we have to subtract
energy to keep the entropy from increasing when we add a particle to the system. However, this
argument depends on the possibility of adding a particle with zero energy. In a Fermi system at
T = 0, no particle can be added with energy less than µ(T = 0), and hence µ(T = 0) > 0.
We will ﬁnd it convenient in the following to introduce a characteristic temperature, the Fermi
temperature TF , by
\[ T_F = \epsilon_F / k. \tag{6.146} \]
The order of magnitude of TF for typical metals is estimated in Problem 6.32.
A direct consequence of the fact that the density of states in three dimensions is proportional
to ε^(1/2) is that the mean energy per particle is 3εF/5:
\begin{align}
\frac{E}{N} &= \frac{\int_0^{\epsilon_F} \epsilon\, g(\epsilon)\, d\epsilon}{\int_0^{\epsilon_F} g(\epsilon)\, d\epsilon} = \frac{\int_0^{\epsilon_F} \epsilon^{3/2}\, d\epsilon}{\int_0^{\epsilon_F} \epsilon^{1/2}\, d\epsilon} \tag{6.147a} \\
&= \frac{\frac{2}{5}\epsilon_F^{5/2}}{\frac{2}{3}\epsilon_F^{3/2}} = \frac{3}{5}\epsilon_F. \tag{6.147b}
\end{align}
The total energy is given by
\[ E = \frac{3}{5} N\epsilon_F = \frac{3}{5} N (3\pi^2)^{2/3} \frac{\hbar^2}{2m} \rho^{2/3}. \tag{6.148} \]
The pressure can be immediately found from the general relation PV = 2E/3 (see (6.104)) for a
noninteracting, nonrelativistic gas at any temperature. Alternatively, the pressure can be found
either from the relation
\[ P = -\frac{\partial F}{\partial V} = \frac{2E}{3V}, \tag{6.149} \]
because the entropy is zero at T = 0, and the free energy is equal to the total energy, or from the
Landau potential Ω = −PV as discussed in Problem 6.33. The result is that the pressure at T = 0
is given by
\[ P = \frac{2}{5}\rho\,\epsilon_F. \tag{6.150} \]
The fact that the pressure is nonzero even at zero temperature is a consequence of the Pauli
exclusion principle, which allows only one particle to have zero momentum (two electrons if the
spin is considered). All other particles have finite momentum and hence give rise to a zero-point
pressure.
The nature of the result (6.150) can be understood simply by noting that the pressure is
related to the rate of change of momentum at the walls of the system. We take dp/dt ∝ pF/τ with
τ ∝ L/(pF/m). Hence, the pressure due to N particles is proportional to N pF²/(mV) ∝ ρεF.
Problem 6.32. Order of magnitude estimates
(a) Estimate the magnitude of the thermal de Broglie wavelength λ for an electron in a typical
metal at room temperature. Compare your result for λ to the interparticle spacing, which you
can estimate using the data in Table 6.3.
(b) Use the same data to estimate the Fermi energy εF, the Fermi temperature TF (see (6.146)),
and the Fermi momentum pF . Compare the values of TF to room temperature.
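The order of magnitude estimates in Problem 6.32 amount to evaluating (6.144) through (6.146) with the densities of Table 6.3. The sketch below does this for copper only, as an illustration; taking room temperature to be 300 K is an assumption, and the function names are hypothetical.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant (J s)
H = 6.62607015e-34       # Planck constant (J s)
K_B = 1.380649e-23       # Boltzmann constant (J/K)
M_E = 9.1093837e-31      # electron mass (kg)
EV = 1.602176634e-19     # joules per electron volt

def fermi_energy(density):
    """Fermi energy (6.144): eps_F = (hbar^2 / 2m) (3 pi^2 rho)^(2/3)."""
    return HBAR**2 / (2.0 * M_E) * (3.0 * math.pi**2 * density) ** (2.0 / 3.0)

def thermal_wavelength(temperature):
    """Electron thermal de Broglie wavelength h / sqrt(2 pi m k T)."""
    return H / math.sqrt(2.0 * math.pi * M_E * K_B * temperature)

rho_cu = 8.47e28                        # conduction electrons per m^3 (Table 6.3)
eps_f = fermi_energy(rho_cu)
t_f = eps_f / K_B                       # Fermi temperature (6.146)
p_f = math.sqrt(2.0 * M_E * eps_f)      # Fermi momentum
spacing = rho_cu ** (-1.0 / 3.0)        # mean interparticle spacing
lam = thermal_wavelength(300.0)         # room temperature assumed to be 300 K

print(f"eps_F = {eps_f / EV:.2f} eV, T_F = {t_f:.2e} K, p_F = {p_f:.2e} kg m/s")
print(f"lambda = {lam:.2e} m, spacing = {spacing:.2e} m")
```

With these inputs the thermal wavelength comes out more than ten times the interparticle spacing, and TF is several hundred times room temperature, which is why the electrons must be treated as a degenerate Fermi gas.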
Problem 6.33. The Landau potential for an ideal Fermi gas at arbitrary T can be expressed as
\[ \Omega = -kT \int_0^\infty g(\epsilon) \ln\bigl[1 + e^{-\beta(\epsilon - \mu)}\bigr]\, d\epsilon. \tag{6.151} \]
To obtain the T = 0 limit of Ω, we have that ε < µ in (6.151), β → ∞, and hence ln[1 + e^(−β(ε−µ))] → ln e^(−β(ε−µ)) = −β(ε − µ). Hence, show that
\[ \Omega = \frac{(2m)^{3/2} V}{2\pi^2\hbar^3} \int_0^{\epsilon_F} d\epsilon\, \epsilon^{1/2} (\epsilon - \epsilon_F). \tag{6.152} \]
Calculate Ω and determine the pressure at T = 0.
Problem 6.34. Show that the limit (6.141) for n̄(ε) at T = 0 follows only if µ > 0.

element          Z    ρ (10^28/m^3)
Li (T = 78 K)    1    4.70
Cu               1    8.47
Fe               2    17.0

Table 6.3: Conduction electron densities for several metals at room temperature and atmospheric
pressure unless otherwise noted. Data are taken from N. W. Ashcroft and N. D. Mermin, Solid
State Physics, Holt, Rinehart and Winston (1976).

6.10.2 Low temperature thermodynamic properties

One of the greatest successes of the free electron model and Fermi-Dirac statistics is the explanation
of the temperature dependence of the heat capacity of a metal. If the electrons behaved like a
classical noninteracting gas, we would expect a contribution to the heat capacity equal to (3/2)Nk.
Instead, we typically ﬁnd a very small contribution to the heat capacity which is linear in the
temperature, a result that cannot be explained by classical statistical mechanics. Before we derive
this result, we ﬁrst give a qualitative argument for the low temperature dependence of the heat
capacity of an ideal Fermi gas.
As we found in Problem 6.32b, the Fermi temperature for the conduction electrons in a metal
is much greater than room temperature, that is, T ≪ TF. Hence, at sufficiently low temperature,
we should be able to understand the behavior of an ideal Fermi gas in terms of its behavior at zero
temperature. Because there is only one characteristic energy in the system (the Fermi energy), the
criterion for low temperature is that T ≪ TF. For example, we find TF ≈ 8.2 × 10^4 K for copper,
using me = 9.1 × 10^−31 kg. Hence the conduction electrons in a metal are effectively at absolute
zero even though the metal is at room temperature.
For 0 < T ≪ TF, the electrons that are within order kT below the Fermi surface now have
enough energy to occupy states with energies that are order kT above the Fermi energy. In contrast,
the electrons that are deep within the Fermi surface do not have enough energy to be excited to
states above the Fermi energy. Hence, only a small fraction of order T/TF of the N electrons have
a reasonable probability of being excited, and the remainder of the electrons remain unaffected.
This reasoning leads us to write the heat capacity of the electrons as CV ∼ (3/2)Neff k, where Neff is
the number of electrons that can be excited by their interaction with the heat bath. For a classical
system, Neff = N, but for a Fermi system at T ≪ TF, we have that Neff ∼ N(T/TF). Hence, we
expect that the temperature dependence of the heat capacity is given by
\[ C_V \sim N k \frac{T}{T_F}. \qquad (T \ll T_F) \tag{6.153} \]
From (6.153) we see that the contribution to the heat capacity from the electrons is much smaller
than the prediction of the equipartition theorem and is linear in T as is found empirically. As an
example, the measured specific heat of copper for T < 1 K is dominated by the contribution of the
electrons and is given by CV/N = 0.8 × 10^−4 kT.
We can understand why µ(T) remains nearly unchanged as T is increased slightly from T = 0 by the following reasoning. The probability that a state is empty is
\[ 1 - \overline{n}(\epsilon) = 1 - \frac{1}{e^{\beta(\epsilon - \mu)} + 1} = \frac{1}{e^{\beta(\mu - \epsilon)} + 1}. \tag{6.154} \]
We see from (6.154) that for a given distance from µ, the probability that a particle is lost from a
previously occupied state below µ equals the probability that a previously empty state is occupied:
n̄(ε − µ) = 1 − n̄(µ − ε). This property implies that the area under the step function at T = 0 is
nearly the same as the area under n̄(ε) for T ≪ TF (see Figure 6.4). That is, n̄(ε) is symmetrical
about ε = µ. If we make the additional assumption that the density of states changes very little
in the region where n̄ departs from a step function, we see that the mean number of particles lost
from the previously occupied states just balances the mean number gained by the previously empty
states. Hence, we conclude that for T ≪ TF, we still have the correct number of particles without
any need to change the value of µ.

Figure 6.4: The area under the step function is approximately equal to the area under the continuous function, that is, the areas of the two crosshatched areas are approximately equal.

Similar reasoning implies that µ(T) must decrease slightly as T is increased from zero. Suppose
that µ were to remain constant as T is increased. Because the density of states is an increasing
function of ε, the number of electrons we would add at ε > µ would be greater than the number
we would lose from ε < µ. As a result, we would increase the number of electrons by increasing T.
To prevent such a nonsensical increase, µ has to reduce slightly. We will show in the following
that µ(T) − µ(T = 0) ∼ (T/TF)² and hence to first order in T/TF, µ is unchanged. Note that the
form of n̄(ε) shown in Figure 6.3b is based on the assumption that µ(T) ≈ εF for T ≪ TF.
Problem 6.35. Numerical evaluation of the chemical potential
To find the chemical potential for T > 0, we need to find the value of µ that yields the desired
number of particles. We have
\[ N = \int_0^\infty \overline{n}(\epsilon)\, g(\epsilon)\, d\epsilon = \frac{V(2m)^{3/2}}{2\pi^2\hbar^3} \int_0^\infty \frac{\epsilon^{1/2}\, d\epsilon}{e^{\beta(\epsilon - \mu)} + 1}, \tag{6.155} \]
where we have used (6.100) for g(ε). It is convenient to let ε = xεF, µ = µ∗εF, and T∗ = kT/εF
and rewrite (6.155) as
\[ \rho = \frac{N}{V} = \frac{(2m)^{3/2}}{2\pi^2\hbar^3}\, \epsilon_F^{3/2} \int_0^\infty \frac{x^{1/2}\, dx}{e^{(x - \mu^*)/T^*} + 1}, \tag{6.156} \]
or
\[ 1 = \frac{3}{2} \int_0^\infty \frac{x^{1/2}\, dx}{e^{(x - \mu^*)/T^*} + 1}, \tag{6.157} \]
where we have substituted (6.144) for εF. To find the dependence of µ on T (or µ∗ on T∗), use
the application/applet ComputeFermiIntegralApp to evaluate (6.157). Start with T∗ = 0.2 and
find µ∗ such that (6.157) is satisfied. Does µ∗ initially increase or decrease as T is increased from
zero? What is the sign of µ∗ for T∗ ≫ 1? At what value of T∗ is µ∗ ≈ 0?
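For readers without access to ComputeFermiIntegralApp, the condition (6.157) can also be solved with a few lines of code. The sketch below is a stand-in under stated assumptions, not a reproduction of that application: it evaluates the integral with Simpson's rule after the substitution x = s², cuts the integral off where the Fermi factor is negligible, and finds µ∗ by bisection. All function names are hypothetical.

```python
import math

def fermi_factor(x, mu, t):
    """Fermi-Dirac occupation 1 / (exp((x - mu)/t) + 1) in reduced units."""
    arg = (x - mu) / t
    if arg > 700.0:            # exp would overflow; the occupation is zero here
        return 0.0
    return 1.0 / (math.exp(arg) + 1.0)

def number_integral(mu, t, n=4000):
    """Right-hand side of (6.157), using x = s^2 to remove the sqrt(x) endpoint."""
    s_max = math.sqrt(max(mu, 0.0) + 50.0 * t)    # integrand is negligible beyond
    h = s_max / n
    def f(s):
        return 2.0 * s * s * fermi_factor(s * s, mu, t)
    total = f(0.0) + f(s_max)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return 1.5 * total * h / 3.0

def reduced_mu(t, tol=1e-10):
    """Bisect for mu* such that number_integral(mu*, T*) = 1."""
    lo, hi = -30.0 * t - 1.0, 2.0     # the integral increases with mu
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if number_integral(mid, t) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for t_star in (0.05, 0.2, 1.0, 2.0):
    print(f"T* = {t_star:4.2f}   mu* = {reduced_mu(t_star):8.4f}")
```

The bracket used in the bisection relies on the fact that the integral increases monotonically with µ∗; the lower end is chosen far enough below zero that the integral is less than 1 for the temperatures listed.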
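We now derive a quantitative expression for CV valid for temperatures T ≪ TF.7 The increase
∆E = E(T) − E(0) in the total energy is given by
\[ \Delta E = \int_0^\infty \epsilon\, \overline{n}(\epsilon)\, g(\epsilon)\, d\epsilon - \int_0^{\epsilon_F} \epsilon\, g(\epsilon)\, d\epsilon. \tag{6.158} \]
We multiply the identity
\[ N = \int_0^\infty \overline{n}(\epsilon)\, g(\epsilon)\, d\epsilon = \int_0^{\epsilon_F} g(\epsilon)\, d\epsilon \tag{6.159} \]
by εF to obtain
\[ \int_0^{\epsilon_F} \epsilon_F\, \overline{n}(\epsilon)\, g(\epsilon)\, d\epsilon + \int_{\epsilon_F}^\infty \epsilon_F\, \overline{n}(\epsilon)\, g(\epsilon)\, d\epsilon = \int_0^{\epsilon_F} \epsilon_F\, g(\epsilon)\, d\epsilon. \tag{6.160} \]
We can use (6.160) to rewrite (6.158) as
\[ \Delta E = \int_{\epsilon_F}^\infty (\epsilon - \epsilon_F)\, \overline{n}(\epsilon)\, g(\epsilon)\, d\epsilon + \int_0^{\epsilon_F} (\epsilon_F - \epsilon)\,[1 - \overline{n}(\epsilon)]\, g(\epsilon)\, d\epsilon. \tag{6.161} \]
The heat capacity is found by differentiating ∆E with respect to T. The only temperature-dependent term in (6.161) is n̄(ε). Hence, we can write CV as
\[ C_V = \int_0^\infty (\epsilon - \epsilon_F)\, \frac{d\overline{n}(\epsilon)}{dT}\, g(\epsilon)\, d\epsilon. \tag{6.162} \]
For kT ≪ εF, the derivative dn̄/dT is large only for ε near εF. Hence it is a good approximation
to evaluate the density of states g(ε) at ε = εF and take it outside the integral:
\[ C_V = g(\epsilon_F) \int_0^\infty (\epsilon - \epsilon_F)\, \frac{d\overline{n}}{dT}\, d\epsilon. \tag{6.163} \]
We can also ignore the temperature dependence of µ in n̄(ε) and replace µ by εF. With this
approximation we have
\[ \frac{d\overline{n}}{dT} = \frac{d\overline{n}}{d\beta}\frac{d\beta}{dT} = \frac{1}{kT^2}\, \frac{(\epsilon - \epsilon_F)\, e^{\beta(\epsilon - \epsilon_F)}}{[e^{\beta(\epsilon - \epsilon_F)} + 1]^2}. \tag{6.164} \]
We next let x = (ε − εF)/kT and use (6.163) and (6.164) to write CV as
\[ C_V = k^2 T\, g(\epsilon_F) \int_{-\beta\epsilon_F}^\infty \frac{x^2 e^x}{(e^x + 1)^2}\, dx. \tag{6.165} \]
7 The following derivation is adapted from Kittel.
We can replace the lower limit by −∞ because the factor e^x in the integrand is negligible at
x = −βεF for low temperatures. If we use the integral
\[ \int_{-\infty}^\infty \frac{x^2 e^x}{(e^x + 1)^2}\, dx = \frac{\pi^2}{3}, \tag{6.166} \]
we can write the heat capacity of an ideal Fermi gas as
\[ C_V = \frac{1}{3}\pi^2 g(\epsilon_F)\, k^2 T. \tag{6.167} \]
It is straightforward to show that
\[ g(\epsilon_F) = \frac{3N}{2\epsilon_F} = \frac{3N}{2kT_F}, \tag{6.168} \]
and we finally arrive at our desired result
\[ C_V = \frac{\pi^2}{2} N k \frac{T}{T_F}. \qquad (T \ll T_F) \tag{6.169} \]
A more detailed discussion of the low temperature properties of an ideal Fermi gas is given in
Appendix 6A. For convenience, we summarize the main results here: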
\begin{align}
\Omega &= -\frac{2}{3}\, \frac{2^{1/2} V m^{3/2}}{\pi^2\hbar^3} \left[\frac{2}{5}\mu^{5/2} + \frac{\pi^2}{4}(kT)^2\mu^{1/2}\right]. \tag{6.170} \\
\overline{N} &= -\frac{\partial\Omega}{\partial\mu} = \frac{V(2m)^{3/2}}{3\pi^2\hbar^3} \left[\mu^{3/2} + \frac{\pi^2}{8}(kT)^2\mu^{-1/2}\right]. \tag{6.171}
\end{align}
The results (6.170) and (6.171) are in the grand canonical ensemble in which the chemical potential
is fixed. However, most experiments are done on a sample with a fixed number of electrons, and
hence µ must change with T to keep N fixed. To find this dependence we rewrite (6.171) as
\[ \frac{3\pi^2\hbar^3\rho}{(2m)^{3/2}} = \mu^{3/2} \left[1 + \frac{\pi^2}{8}(kT)^2\mu^{-2}\right], \tag{6.172} \]
where ρ = N̄/V. If we raise both sides of (6.172) to the 2/3 power and use (6.144), we have
\begin{align}
\mu &= \frac{3^{2/3}\pi^{4/3}\hbar^2\rho^{2/3}}{2m} \left[1 + \frac{\pi^2}{8}(kT)^2\mu^{-2}\right]^{-2/3} \tag{6.173a} \\
&= \epsilon_F \left[1 + \frac{\pi^2}{8}(kT)^2\mu^{-2}\right]^{-2/3}. \tag{6.173b}
\end{align}
In the limit of T → 0, µ = εF as expected. From (6.173b) we see that the first correction for low
temperatures is given by
\[ \mu(T) = \epsilon_F \left[1 - \frac{2}{3}\frac{\pi^2}{8}\frac{(kT)^2}{\mu^2}\right] = \epsilon_F \left[1 - \frac{\pi^2}{12}\left(\frac{T}{T_F}\right)^2\right], \tag{6.174} \]
where we have made the expansion (1 + x)^n ≈ 1 + nx and replaced µ on the right-hand side by
εF = kTF.
From (6.174), we see that the chemical potential decreases with temperature to keep N fixed,
but the decrease is second order in T/TF (rather than first order), consistent with our earlier
qualitative considerations. The explanation for the decrease in µ(T ) is that more particles move
from low energy states below the Fermi energy to energy states above the Fermi energy as the
temperature rises. Because the density of states increases with energy, it is necessary to decrease
the chemical potential to keep the number of particles constant. In fact, as the temperature
becomes larger than the Fermi temperature, the chemical potential changes sign and becomes
negative.
Problem 6.36. Use (6.170) and (6.174) to show that the mean pressure for T ≪ TF is given by
\[ P = \frac{2}{5}\rho\,\epsilon_F \left[1 + \frac{5\pi^2}{12}\left(\frac{T}{T_F}\right)^2 + \ldots\right]. \tag{6.175} \]
Use the general relation between E and PV to show that
\[ E = \frac{3}{5} N\epsilon_F \left[1 + \frac{5\pi^2}{12}\left(\frac{T}{T_F}\right)^2 + \ldots\right]. \tag{6.176} \]
Also show that the low temperature behavior of the heat capacity at constant volume is given by
\[ C_V = \frac{\pi^2}{2} N k \frac{T}{T_F}. \tag{6.177} \]
For completeness, show that the low temperature behavior of the entropy is given by
\[ S = \frac{\pi^2}{2} N k \frac{T}{T_F}. \tag{6.178} \]
Why is it not possible to calculate S by using the relations Ω = −PV and S = −∂Ω/∂T, with P
given by (6.175)?
We see from (6.177) that the conduction electrons of a metal contribute a linear term to the
heat capacity. In Section 6.12 we shall see that lattice vibrations contribute
a term proportional to T³ to CV at low T. Thus for sufficiently low temperature, the linear term
dominates.
Problem 6.37. In Problem 6.32b we found that TF = 8.5 × 10^4 K for copper. Use (6.177) to find
the predicted value of C/NkT for copper. How does this value compare with the experimental
value C/NkT = 8 × 10^−5? It is remarkable that the theoretical prediction based on the free
electron model agrees so well with the experimental result. Show that the small discrepancy can be
removed by defining an effective mass m∗ of the conduction electrons equal to 1.3 me, where me is
the mass of an electron. What factors might account for the effective mass being greater than me?
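Both the integral (6.166) and the comparison asked for in Problem 6.37 can be checked with a short calculation. The sketch below is illustrative only. It takes TF and the measured coefficient from the problem statement, and it assumes that the free electron TF scales as 1/m, so that the ratio of measured to predicted C/NkT can be read as m∗/me. The function and variable names are hypothetical.

```python
import math

def sommerfeld_integrand(x):
    """Integrand x^2 e^x / (e^x + 1)^2 of (6.166), written to avoid overflow."""
    e = math.exp(-abs(x))           # the integrand is even in x
    return x * x * e / (1.0 + e) ** 2

def simpson(func, a, b, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

integral = simpson(sommerfeld_integrand, -50.0, 50.0)

t_fermi = 8.5e4                                  # K, value quoted in Problem 6.37
gamma_theory = math.pi**2 / (2.0 * t_fermi)      # C/(N k T) from (6.177), in 1/K
gamma_measured = 8.0e-5                          # 1/K, value quoted in Problem 6.37
mass_ratio = gamma_measured / gamma_theory       # m*/m_e, assuming T_F scales as 1/m

print(f"integral = {integral:.6f}, pi^2/3 = {math.pi**2 / 3:.6f}")
print(f"C/(NkT): theory {gamma_theory:.2e}, ratio measured/theory {mass_ratio:.2f}")
```

The ratio depends on which value of TF is used, so the result should be read as an estimate of the effective mass rather than a precise determination.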
Problem 6.38. Consider a system of electrons restricted to a two-dimensional surface of area A.
Show that the mean number of electrons can be written as
\[ \overline{N} = \frac{mA}{\pi\hbar^2} \int_0^\infty \frac{d\epsilon}{e^{\beta(\epsilon - \mu)} + 1}. \tag{6.179} \]
The integral in (6.179) can be evaluated in closed form using
\[ \int \frac{dx}{1 + a e^{bx}} = \frac{1}{b} \ln\frac{e^{bx}}{1 + a e^{bx}} + \text{constant}. \tag{6.180} \]
Show that
\[ \mu(T) = kT \ln\bigl[e^{\rho\pi\hbar^2/mkT} - 1\bigr], \tag{6.181} \]
where ρ = N/A. What is the value of the Fermi energy εF = µ(T = 0)? What is the value of µ for
T ≫ TF? Plot µ versus T and discuss its qualitative dependence on T.

6.11 Bose Condensation

The historical motivation for discussing the noninteracting Bose gas is that this idealized system
exhibits Bose-Einstein condensation. The original prediction of Bose-Einstein condensation by
Satyendra Nath Bose and Albert Einstein in 1924 was considered by some to be a mathematical
artifact or even a mistake. In the 1930s Fritz London realized that superfluid liquid helium could
be understood in terms of Bose-Einstein condensation. However, the analysis of superfluid liquid
helium is complicated by the fact that the helium atoms in a liquid strongly interact with one
another. For many years scientists tried to create a Bose condensate in less complicated systems.
In 1995 several groups used laser and magnetic traps to create a Bose-Einstein condensate of alkali
atoms at approximately 10^−6 K. In these systems the interaction between the atoms is very weak
so that the ideal Bose gas is a good approximation and is no longer only a textbook example.8
Although the form of the Landau potential for the ideal Bose gas and the ideal Fermi gas
diﬀers only superﬁcially (see (6.84)), the two systems behave very diﬀerently at low temperatures.
The main reason is the diﬀerence in the ground states, that is, for a Bose system there is no limit
to the number of particles in a single particle state.
The ground state of an ideal Bose gas is easy to construct. We can minimize the total energy
by putting all the particles into the single particle state of lowest energy:
\[ \epsilon_1 = \frac{\pi^2\hbar^2}{2mL^2}(1^2 + 1^2 + 1^2) = \frac{3\pi^2\hbar^2}{2mL^2}. \tag{6.182} \]
Note that ε1 is a very small energy for a macroscopic system. The energy of the ground state is
given by Nε1. For convenience, we will choose the energy scale such that the ground state energy
is zero. The behavior of the system cannot depend on the choice of the zero of energy.
The behavior of an ideal Bose gas can be understood by calculating N̄(T, V, µ):
\begin{align}
\overline{N} &= \sum_k \frac{1}{e^{\beta(\epsilon_k - \mu)} - 1} \to \int_0^\infty \overline{n}(\epsilon)\, g(\epsilon)\, d\epsilon \tag{6.183} \\
&= \frac{V}{4\pi^2\hbar^3}(2m)^{3/2} \int_0^\infty \frac{\epsilon^{1/2}\, d\epsilon}{e^{\beta(\epsilon - \mu)} - 1} = gV \int_0^\infty \frac{\epsilon^{1/2}\, d\epsilon}{e^{\beta(\epsilon - \mu)} - 1}, \tag{6.184}
\end{align}
where g = (2m)^(3/2)/(4π²ħ³). For simplicity, we will assume that our gas of bosons has zero spin, the same value of the spin as
the helium isotope ⁴He.
8 The 2001 Nobel Prize for Physics was awarded to Eric Cornell, Wolfgang Ketterle, and Carl Wieman for achieving
Bose-Einstein condensation in dilute gases of alkali atoms and for early fundamental studies of the properties of the
condensate.
To understand the nature of an ideal Bose gas at low temperatures, suppose that the mean
density of the system is fixed and consider the effect of lowering the temperature. The correct
choice of µ gives the desired value of ρ when substituted into (6.185):
\[ \rho = \frac{\overline{N}}{V} = g \int_0^\infty \frac{\epsilon^{1/2}\, d\epsilon}{e^{\beta(\epsilon - \mu)} - 1}. \tag{6.185} \]
We study the behavior of µ as a function of the temperature in Problem 6.39.
Problem 6.39. We know that in the high temperature limit, the chemical potential µ must be
given by its classical value given in (6.112). As we saw in Problem 6.24, µ is negative and large
in magnitude. Let us investigate numerically how µ changes as we decrease the temperature.
The application/applet ComputeBoseIntegralApp evaluates the integral on the right-hand side of
(6.185) for a given value of β and µ. The goal is to find the value of µ for a given value of T that
yields the desired value of ρ.
Let ρ∗ = ρ/g = 1 and begin with T = 10. First choose µ = −10 and find the computed value
of the right-hand side. Do you have to increase or decrease the value of µ to make the computed
value of the integral closer to ρ∗ = 1? By using trial and error, you should ﬁnd that µ ≈ −33.4.
Next choose T = 5 and ﬁnd the value of µ needed to keep ρ∗ ﬁxed at ρ∗ = 1. Does µ increase or
decrease in magnitude? Note that you can generate a plot of µ versus T by clicking on the Accept
parameters button.
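The trial and error search of Problem 6.39 can be automated. The sketch below is a stand-in for ComputeBoseIntegralApp under stated assumptions, not a description of how that application works: it uses units with k = 1 as in the problem, evaluates the integral in (6.185) by Simpson's rule after the substitution ε = s², truncates the integral at ε = 60T, and bisects on µ. The function names are hypothetical.

```python
import math

def bose_integral(mu, t, n=4000):
    """Integral in (6.185) with k = 1: int_0^inf sqrt(e) de / (exp((e - mu)/t) - 1).

    The substitution e = s^2 removes the square-root endpoint behavior.
    Requires mu <= 0.
    """
    s_max = math.sqrt(60.0 * t)        # integrand is negligible beyond e = 60 t
    h = s_max / n
    def f(s):
        arg = (s * s - mu) / t
        if arg == 0.0:                 # only at s = 0 when mu = 0; the limit is 2 t
            return 2.0 * t
        return 2.0 * s * s / math.expm1(arg)
    total = f(0.0) + f(s_max)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

def solve_mu(t, rho_star=1.0, tol=1e-8):
    """Bisect for the mu < 0 that gives bose_integral(mu, t) = rho_star."""
    lo, hi = -100.0 * t, 0.0           # the integral increases with mu
    if bose_integral(hi, t) < rho_star:
        raise ValueError("no solution with mu <= 0 at this temperature")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bose_integral(mid, t) < rho_star:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for t in (10.0, 5.0, 2.0, 1.0):
    print(f"T = {t:5.1f}   mu = {solve_mu(t):9.3f}")
```

The function raises an error when even µ = 0 cannot supply the requested density, which is the situation discussed in the text that follows the problem.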
We found numerically in Problem 6.39 that as T is decreased at constant density, the magnitude of µ must
decrease. Because µ is negative for Bose-Einstein statistics, this dependence implies that µ becomes
less negative. However, this behavior implies that there would be a lower bound for the temperature,
namely the temperature at which µ = 0 (the upper bound for µ for Bose systems). We can find the value of this temperature
by solving (6.185) with µ = 0:
\[ \rho = g \int_0^\infty \frac{\epsilon^{1/2}\, d\epsilon}{e^{\beta_c \epsilon} - 1} = g\,(kT_c)^{3/2} \int_0^\infty \frac{x^{1/2}\, dx}{e^x - 1}, \tag{6.186} \]
where Tc is the value of T at which µ = 0 and β_c = 1/kTc. The definite integral in (6.186) can be written in terms of known functions (see Appendix A) and has the value
\[ \int_0^\infty \frac{x^{1/2}\, dx}{e^x - 1} = 2.612\, \frac{\pi^{1/2}}{2} = C. \tag{6.187} \]
We have
\[ kT_c = \left(\frac{\rho}{gC}\right)^{2/3} = \left(\frac{4\pi^2}{C}\right)^{2/3} \frac{\hbar^2}{2ma^2}, \tag{6.188} \]
where a = ρ^{−1/3} is the mean interparticle spacing and g = (2m)^{3/2}/(4π²ħ³) is the prefactor of the density of states (compare (6.192)). We thus obtain the temperature Tc that satisfies (6.186) for fixed density. Note that the energy ħ²/2ma² in (6.188) can be interpreted as the zero-point energy associated with localizing a particle of mass m in a volume a³.
Similarly we can find the maximum density for a given temperature:
\[ \rho_c = \frac{2.612}{\lambda^3}. \tag{6.189} \]
Problem 6.40. Use ComputeBoseIntegralApp to find the numerical value of T at which µ = 0
for ρ∗ = 1. Confirm that your numerical value is consistent with (6.188).
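As a cross-check on (6.187) and (6.188), the constant C and the temperature at which µ = 0 for ρ∗ = ρ/g = 1 can be estimated directly. This is a sketch under the assumption k = 1; the name `bose_constant`, the cutoff, and the number of Simpson intervals are illustrative choices rather than anything prescribed by the text.

```python
import math

def bose_constant(n=2000):
    """Simpson estimate of C = integral of x**0.5 / (exp(x) - 1) over x >= 0.

    With x = y*y the integrand becomes 2*y*y / (exp(y*y) - 1), which tends
    to 2 as y -> 0 and is negligible beyond y*y = 60.
    """
    y_max = math.sqrt(60.0)
    h = y_max / n                 # n must be even for Simpson's rule

    def f(y):
        return 2.0 if y == 0.0 else 2.0 * y * y / math.expm1(y * y)

    total = f(0.0) + f(y_max)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0

C = bose_constant()
T_c = (1.0 / C) ** (2.0 / 3.0)    # first equality in (6.188) with rho/g = 1 and k = 1
print(C, 2.612 * math.sqrt(math.pi) / 2.0)
print(T_c)
```

The first printed line compares the numerical integral with 2.612 π^{1/2}/2; the second gives the value of T (in the units of Problem 6.40) below which no µ ≤ 0 can satisfy (6.185).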
Problem 6.41. Show that the thermal de Broglie wavelength is comparable to the interparticle
spacing at T = Tc. What is the implication of this result?

Figure 6.5: Sketch of the dependence of the pressure P on the temperature T for a typical gas and
liquid.
Of course there is no physical reason why we cannot continue lowering the temperature at fixed
density (or increasing the density at fixed temperature). Before discussing how we can resolve this
difficulty, consider a familiar situation in which an analogous phenomenon occurs. Suppose that we
put argon atoms into a container of fixed volume at a given temperature. If the temperature is high
enough and the density is low enough, argon will be a gas and obey the ideal gas equation of state,
which we write as P = NkT/V. If we now decrease the temperature, we expect that the measured
pressure will decrease. However, at some temperature this dependence will abruptly break down,
and the measured P will stop changing as indicated in Figure 6.5. We will not study this behavior
of P until Chapter 9, but we might recognize this behavior as a signature of the condensation of
the vapor and the existence of a phase transition. That is, at a certain temperature for a fixed
density, droplets of liquid argon will begin to form in the container. As the temperature is lowered
further, the liquid droplets will grow, but the pressure will remain constant because most of the
extra particles go into the denser liquid state.
We can describe the ideal Bose gas in the same terms, that is, in terms of a phase transition.
At a critical value of T, the chemical potential stops increasing and reaches its limit of
µ = 0. Beyond this point, the relation (6.184) is no longer able to keep track of all the particles.
The resolution of the problem lies with the behavior of the three-dimensional density of states
g(ε), which is proportional to ε^{1/2} (see (6.99)). Because of this dependence on ε, g(ε = 0) = 0,
and our calculation of N has ignored all the particles in the ground state. For the classical and
Fermi noninteracting gas, this neglect is of no consequence. In the classical case the mean number
of particles in any state is much less than unity, while in the degenerate Fermi case there are only
two electrons in the zero kinetic energy state. However, for the noninteracting Bose gas, the mean
number of particles in the ground state is given by
\[ \overline{N}_0 = \frac{1}{e^{-\beta\mu} - 1}. \tag{6.190} \]
(Remember that we have set ε_0 = 0.) When T is sufficiently low, $\overline{N}_0$ will be very large. Hence, the
denominator of (6.190) must be very small, which implies that e^{−βµ} ≈ 1 and the argument of the
exponential −βµ must be very small. Therefore, we can approximate e^{−βµ} as 1 − βµ, and for $\overline{N}_0 \gg 1$ (6.190)
becomes
\[ \overline{N}_0 = -\frac{kT}{\mu}. \tag{6.191} \]
The chemical potential must be such that the number of particles in the ground state approaches
its maximum value, which is of order N. Hence, if we were to use the integral (6.184) to calculate N
for T < Tc, we would have ignored the particles in the ground state. We have resolved the problem:
the missing particles are in the ground state. The phenomenon we have described, macroscopic
occupation of the ground state, is called Bose-Einstein condensation. That is, for T < Tc, $\overline{N}_0/N$ is
nonzero in the limit N → ∞.
Now that we know where to ﬁnd the missing particles, we can calculate the thermodynamics
of the ideal Bose gas. For T < Tc , the chemical potential is zero in the thermodynamic limit, and
the number of particles not in the ground state is given by (6.184):
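\[ \overline{N} = \frac{V}{4\pi^2\hbar^3}\,(2m)^{3/2} \int_0^\infty \frac{\epsilon^{1/2}\, d\epsilon}{e^{\beta\epsilon} - 1} = N \left(\frac{T}{T_c}\right)^{3/2}, \qquad (T < T_c) \tag{6.192} \]
where Tc is defined by (6.188) and $\overline{N}$ denotes the number of particles not in the ground state. All of the remaining particles, which we denote as $\overline{N}_0$, are in
the ground state, that is, have energy ε = 0. Another way of understanding (6.192) is that for
T < Tc, µ must be zero because the number of particles not in the ground state is determined by
the temperature. Thus
\[ \overline{N}_0 = N - \overline{N} = N \left[ 1 - \left(\frac{T}{T_c}\right)^{3/2} \right]. \qquad (T < T_c) \tag{6.193} \]
Note that for T < Tc, a finite fraction of the particles are in the ground state.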
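To get a feeling for (6.193), the following short sketch tabulates the ground state fraction as a function of the reduced temperature T/Tc. The function name `condensate_fraction` is hypothetical, and the value zero returned for T ≥ Tc represents the thermodynamic limit only.

```python
def condensate_fraction(t):
    """Return N0/N for the ideal Bose gas at reduced temperature t = T/Tc, from (6.193)."""
    if t < 0.0:
        raise ValueError("t must be non-negative")
    # Above Tc the ground state fraction vanishes in the thermodynamic limit.
    return 1.0 - t ** 1.5 if t < 1.0 else 0.0

for t in (0.0, 0.25, 0.5, 0.75, 1.0, 1.5):
    print(f"T/Tc = {t:4.2f}   N0/N = {condensate_fraction(t):.3f}")
```

Note that more than half of the particles are already in the ground state at T = Tc/2.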
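Because the energy of the gas is determined by the particles with ε > 0, we have for T < Tc
\[ E = \int_0^\infty \frac{\epsilon\, g(\epsilon)\, d\epsilon}{e^{\beta\epsilon} - 1} = \frac{V (mkT)^{3/2}\, kT}{2^{1/2} \pi^2 \hbar^3} \int_0^\infty \frac{x^{3/2}\, dx}{e^x - 1}. \tag{6.194} \]
The definite integral in (6.194) is given in Appendix A:
\[ \int_0^\infty \frac{x^{3/2}\, dx}{e^x - 1} = 1.341\, \frac{3\pi^{1/2}}{4}. \tag{6.195} \]
If we substitute (6.195) into (6.194), we can write the energy as
\[ E = 3\, \frac{1.341}{2^{5/2} \pi^{3/2}}\, \frac{V (mkT)^{3/2}\, kT}{\hbar^3} = 0.1277\, V\, \frac{m^{3/2} (kT)^{5/2}}{\hbar^3}. \tag{6.196} \]
Note that E ∝ T^{5/2} for T < Tc. The heat capacity at constant volume is
\[ C_V = \frac{\partial E}{\partial T} = 0.32\, V\, \frac{(mkT)^{3/2}\, k}{\hbar^3}, \tag{6.197} \]
or
\[ C_V = 1.9\, \overline{N} k, \tag{6.198} \]
where $\overline{N}$ is the number of particles not in the ground state (see (6.192)). Note that the heat capacity has a form similar to that of an ideal classical gas for which C_V = 1.5Nk.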
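The numerical coefficients in (6.196) to (6.198) follow from the values 2.612 and 1.341, which are the Riemann zeta function at 3/2 and 5/2. The sketch below recomputes them; the helper `zeta` (a partial sum with the leading Euler-Maclaurin tail corrections) and the variable names are illustrative assumptions, not part of the text.

```python
import math

def zeta(s, n=1000):
    """Riemann zeta function for s > 1: partial sum plus Euler-Maclaurin tail terms."""
    partial = sum(k ** -s for k in range(1, n + 1))
    return partial + n ** (1.0 - s) / (s - 1.0) - 0.5 * n ** -s + s * n ** (-s - 1.0) / 12.0

z32 = zeta(1.5)   # the constant 2.612 in (6.187)
z52 = zeta(2.5)   # the constant 1.341 in (6.195)

energy_coeff = 3.0 * z52 / (2.0 ** 2.5 * math.pi ** 1.5)   # E in units of V m^{3/2} (kT)^{5/2} / hbar^3
heat_coeff = 2.5 * energy_coeff                            # C_V in units of V (mkT)^{3/2} k / hbar^3
pressure_coeff = 2.0 * energy_coeff / 3.0                  # P = 2E/(3V) in units of m^{3/2} (kT)^{5/2} / hbar^3
cv_per_excited = 3.75 * z52 / z32                          # C_V / (k times the number of particles with eps > 0)

print(z32, z52)
print(energy_coeff, heat_coeff, pressure_coeff, cv_per_excited)
```

The last quantity uses E = (3/2)(1.341/2.612) kT times the number of particles not in the ground state, which follows from dividing (6.194) by (6.192).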
The pressure of the Bose gas for T < Tc can be obtained easily from the general relation
PV = (2/3)E for a nonrelativistic ideal gas. From (6.196) we obtain
\[ P = \frac{1.341}{2^{3/2} \pi^{3/2}}\, \frac{m^{3/2} (kT)^{5/2}}{\hbar^3} = 0.085\, \frac{m^{3/2} (kT)^{5/2}}{\hbar^3}. \tag{6.199} \]
Note that the pressure is proportional to T^{5/2} and is independent of the density. This behavior is
a consequence of the fact that the particles in the ground state do not contribute to the pressure.
If additional particles are added to the system at T < Tc, the number of particles in the state ε = 0
increases, but the pressure does not increase.
What is remarkable about the phase transition in an ideal Bose gas is that it occurs at all.
That is, unlike all other known transitions, its occurrence has nothing to do with the interactions
between the particles and has everything to do with the nature of the statistics. Depending on
which variables are being held constant, the transition in an ideal Bose gas is either first-order or
continuous. We postpone a discussion of the nature of the phase transition of an ideal Bose gas
until Chapter 9 where we will discuss phase transitions in more detail. It is suﬃcient to mention
here that the order parameter in the ideal Bose gas can be taken to be the fraction of particles
in the ground state, and this fraction goes continuously to zero as T → Tc from below at ﬁxed
density.
What makes the Bose condensate particularly interesting is that for T < Tc, a finite fraction
of the atoms are described by the same quantum wavefunction, which gives the condensate many
unusual properties. In particular, Bose condensates have been used to produce atom lasers (laser-like beams in which photons are replaced by atoms) and to study fundamental processes such as
superfluidity.
Problem 6.42. Show that the ground state contribution to the pressure is given by
\[ P_0 = \frac{kT}{V} \ln(\overline{N}_0 + 1). \tag{6.200} \]
Explain why P0 can be regarded as zero and why the pressure of a Bose gas for T < Tc is
independent of the volume.

Problem 6.43. What is the approximate value of Tc for a noninteracting Bose gas at a density
of ρ = 0.14 g cm^{−3}, the density of liquid ⁴He? Take m = 6.65 × 10^{−27} kg.

6.12 The Heat Capacity of a Crystalline Solid

The free electron model of a metal successfully explains the temperature dependence of the contribution to the heat capacity from the electrons. What about the contribution from the ions? In a crystal each ion is localized about its lattice site and oscillates due to spring-like forces between nearest-neighbor atoms. Classically, we can regard each atom of the solid as having three
degrees of freedom, each of which contributes (1/2)kT to the mean kinetic energy and (1/2)kT to the
mean potential energy. Hence, the heat capacity at constant volume of a homogeneous isotropic
solid is given by CV = 3N k , independent of the nature of the solid. This behavior of CV agrees
with experiment remarkably well at high temperatures, where the meaning of high temperature
will be deﬁned later in terms of the parameters of the solid. At low temperatures, the classical
behavior is an overestimate of the experimentally measured heat capacity, and C_V is found to be
proportional to T³. To understand this behavior, we first consider the Einstein model and then
the more sophisticated Debye model of a solid.

6.12.1 The Einstein model

The reason why the heat capacity starts to decrease at low temperature is that the oscillations of
the crystal must be treated quantum mechanically rather than classically. The simplest model of
a solid, proposed by Einstein in 1906, is that each atom behaves like three independent harmonic
oscillators each of frequency ω and possible energies ε = (n + 1/2)ħω. Because the 3N identical
oscillators are independent and are associated with distinguishable sites, we need only to find the
thermodynamic functions of one of them. The partition function for one oscillator in one dimension
is
\begin{align}
Z_1 &= e^{-\beta\hbar\omega/2} \sum_{n=0}^{\infty} e^{-\beta\hbar\omega n} \tag{6.201a} \\
&= \frac{e^{-\beta\hbar\omega/2}}{1 - e^{-\beta\hbar\omega}}. \tag{6.201b}
\end{align}
(We considered this calculation in Example 4.4.) Other thermodynamic properties of one oscillator
are given by
\begin{align}
f &= -kT \ln Z_1 = \frac{\hbar\omega}{2} + kT \ln[1 - e^{-\beta\hbar\omega}], \tag{6.202} \\
s &= -\frac{\partial f}{\partial T} = -k \ln[1 - e^{-\beta\hbar\omega}] + k\,\beta\hbar\omega\, \frac{1}{e^{\beta\hbar\omega} - 1}, \tag{6.203} \\
e &= f + Ts = \left(\overline{n} + \frac{1}{2}\right)\hbar\omega, \tag{6.204}
\end{align}
where
\[ \overline{n} = \frac{1}{e^{\beta\hbar\omega} - 1}. \tag{6.205} \]
Note the form of $\overline{n}$. To obtain the extensive quantities such as F, S, and E, we multiply the single
particle values by 3N. For example, the heat capacity of an Einstein solid is given by
\begin{align}
C_V &= \left(\frac{\partial E}{\partial T}\right)_V = 3N \left(\frac{\partial e}{\partial T}\right)_V \tag{6.206} \\
&= 3Nk\, (\beta\hbar\omega)^2\, \frac{e^{\beta\hbar\omega}}{[e^{\beta\hbar\omega} - 1]^2}. \tag{6.207}
\end{align}
It is convenient to introduce the Einstein temperature
\[ kT_E = \hbar\omega, \tag{6.208} \]
and rewrite C_V as
\[ C_V = 3Nk \left(\frac{T_E}{T}\right)^2 \frac{e^{T_E/T}}{[e^{T_E/T} - 1]^2}. \tag{6.209} \]
The limiting behavior of C_V from (6.207) or (6.209) is
\[ C_V \to 3Nk \qquad (T \gg T_E) \tag{6.210} \]
and
\[ C_V \to 3Nk \left(\frac{\hbar\omega}{kT}\right)^2 e^{-\hbar\omega/kT}. \qquad (T \ll T_E) \tag{6.211} \]
The calculated heat capacity is consistent with the third law of thermodynamics and is not very
different from the heat capacity actually observed for insulating solids. However, it decreases too
quickly at low temperatures and is not consistent with the observed low temperature behavior
satisfied by all solids:
\[ C_V \propto T^3. \tag{6.212} \]
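The two limits (6.210) and (6.211) can be checked numerically against the full expression (6.209). The sketch below works with the dimensionless ratio C_V/3Nk as a function of t = T/T_E; the function names are illustrative.

```python
import math

def einstein_cv(t):
    """C_V / (3 N k) from (6.209) as a function of t = T / T_E."""
    x = 1.0 / t                      # x = T_E / T
    if x > 700.0:                    # exp(x) would overflow; C_V is effectively zero here
        return 0.0
    return x * x * math.exp(x) / math.expm1(x) ** 2

def einstein_cv_low_t(t):
    """Leading low temperature form (6.211) divided by 3 N k."""
    x = 1.0 / t
    return x * x * math.exp(-x)

for t in (0.05, 0.1, 0.5, 1.0, 2.0, 10.0):
    print(f"T/T_E = {t:5.2f}   C_V/3Nk = {einstein_cv(t):.6f}   low-T form = {einstein_cv_low_t(t):.6f}")
```

The exponential factor in the low temperature form makes C_V fall much faster than any power of T, which is the disagreement with (6.212) noted above.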
Problem 6.44. Explain the form of $\overline{n}$ in (6.205). Why is the chemical potential zero in this case?
Problem 6.45. Derive the limiting behavior in (6.210) and (6.211).

6.12.2 Debye theory

The Einstein model is based on the idea that each atom behaves like a harmonic oscillator whose
motion is independent of the other atoms. A better approximation was made by Debye (1912) who
observed that solids can carry sound waves. Because waves are inherently a collective phenomenon
and are not associated with the oscillations of a single atom, it is better to think of a crystalline
solid in terms of the collective rather than the independent motions of the atoms. The collective
or cooperative motions correspond to normal modes of the system, each with its own frequency.
There are two independent transverse modes and one longitudinal mode corresponding to
transverse and longitudinal sound waves with speeds, ct and cl , respectively. (Note that ct and cl
are speeds of sound, not light.) Given that the density of states of each mode is given by ( 6.94),
the density of states of the system is given by
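\[ g(\omega)\, d\omega = (2g_t + g_l)\, d\omega = \frac{V \omega^2\, d\omega}{2\pi^2} \left( \frac{2}{c_t^3} + \frac{1}{c_l^3} \right). \tag{6.213} \]
It is convenient to define a mean speed of sound $\overline{c}$ by the relation
\[ \frac{3}{\overline{c}^3} = \frac{2}{c_t^3} + \frac{1}{c_l^3}. \tag{6.214} \]
Then the density of states can be written as
\[ g(\omega)\, d\omega = \frac{3V \omega^2\, d\omega}{2\pi^2 \overline{c}^3}. \tag{6.215} \]
The total energy is given by
\begin{align}
E &= \int \hbar\omega\, \overline{n}(\omega)\, g(\omega)\, d\omega \tag{6.216a} \\
&= \frac{3V\hbar}{2\pi^2 \overline{c}^3} \int \frac{\omega^3\, d\omega}{e^{\beta\hbar\omega} - 1}. \tag{6.216b}
\end{align}
Equation (6.216b) does not take into account the higher frequency modes that do not satisfy the
linear relation ω = $\overline{c}$k, where k is the wave number. However, we do not expect that the higher frequency modes will contribute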
much to the heat capacity. After all, we already know that the Einstein model gives the correct
high temperature behavior. Because the low temperature heat capacity depends only on the low
frequency modes, which we have treated correctly using (6.215), it follows that we can obtain a
good approximation to the heat capacity by extending (6.215) beyond its range of validity up to a
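cutoff frequency chosen to give the correct number of modes. That is, we assume that g(ω) ∝ ω²
up to a maximum frequency ω_D such that
\[ 3N = \int_0^{\omega_D} g(\omega)\, d\omega. \tag{6.217} \]
If we substitute (6.215) into (6.217), we find that
\[ \omega_D = 2\pi \overline{c} \left( \frac{3\rho}{4\pi} \right)^{1/3}. \tag{6.218} \]
It is convenient to relate the maximum frequency ω_D to a characteristic temperature, the Debye
temperature T_D, by the relation
\[ \hbar\omega_D = kT_D. \tag{6.219} \]
The thermal energy can now be expressed as
\begin{align}
E &= \frac{3V\hbar}{2\pi^2 \overline{c}^3} \int_0^{kT_D/\hbar} \frac{\omega^3\, d\omega}{e^{\beta\hbar\omega} - 1} \tag{6.220a} \\
&= 9NkT \left( \frac{T}{T_D} \right)^3 \int_0^{T_D/T} \frac{x^3\, dx}{e^x - 1}. \tag{6.220b}
\end{align}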
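The integral in (6.220b) has no closed form for general T_D/T, but it is easy to evaluate numerically. The following sketch works with the reduced temperature t = T/T_D, uses Simpson's rule for the integral, and estimates the heat capacity by a centered finite difference of the energy. The function names, the cutoff at x = 60, and the step size are illustrative assumptions.

```python
import math

def debye_integral(y, n=2000):
    """Simpson estimate of the integral of x**3 / (exp(x) - 1) from 0 to y."""
    y = min(y, 60.0)                 # the integrand is negligible beyond x of about 60
    h = y / n                        # n must be even for Simpson's rule

    def f(x):
        return 0.0 if x == 0.0 else x ** 3 / math.expm1(x)

    total = f(0.0) + f(y)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0

def debye_energy(t):
    """E / (N k T_D) from (6.220b) as a function of t = T / T_D."""
    return 9.0 * t ** 4 * debye_integral(1.0 / t)

def debye_cv(t, rel_step=1e-4):
    """C_V / (N k) from a centered finite difference of the energy."""
    step = rel_step * t
    return (debye_energy(t + step) - debye_energy(t - step)) / (2.0 * step)

for t in (0.02, 0.05, 0.1, 0.5, 1.0, 10.0):
    print(f"T/T_D = {t:5.2f}   C_V/Nk = {debye_cv(t):.5f}")
```

For large t the output should approach 3, and for small t it should scale as t³; the proportionality constant 12π⁴/5 follows from the known value π⁴/15 of the integral with an infinite upper limit.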
In the high temperature limit, T_D/T → 0, and the important contribution to the integral in
(6.220b) comes from small x. Because the integrand is proportional to x² for small x, the integral is
proportional to (T/T_D)^{−3}, and hence the energy is proportional to T. Thus in the high temperature
limit, the heat capacity is independent of the temperature, consistent with the law of Dulong and
Petit. In the low temperature limit T_D/T → ∞, and the integral in (6.220b) is independent of
temperature. Hence in the limit T → 0, the energy is proportional to T⁴ and the heat capacity is
proportional to T³, consistent with experimental results at low temperature.

Vocabulary

thermal de Broglie wavelength, λ
equipartition theorem
Maxwell velocity and speed distribution
occupation numbers, spin and statistics, bosons and fermions
Bose-Einstein distribution, Fermi-Dirac distribution, Maxwell-Boltzmann distribution
single particle density of states, g(ε)
Fermi energy ε_F, temperature T_F, and momentum p_F
macroscopic occupation, Bose-Einstein condensation
Einstein and Debye theories of a crystalline solid, law of Dulong and Petit

Appendix 6A: Low Temperature Expansion
For convenience, we repeat the formal expressions for the thermodynamic properties of a noninteracting Fermi gas at temperature T. The mean number of particles is given by
\[ \overline{N} = \frac{2^{1/2} V m^{3/2}}{\pi^2 \hbar^3} \int_0^\infty \frac{\epsilon^{1/2}\, d\epsilon}{e^{\beta(\epsilon - \mu)} + 1}. \tag{6.221} \]
After an integration by parts, the Landau potential Ω is given by the expression (see (6.103))
\[ \Omega = -\frac{2}{3}\, \frac{2^{1/2} V m^{3/2}}{\pi^2 \hbar^3} \int_0^\infty \frac{\epsilon^{3/2}\, d\epsilon}{e^{\beta(\epsilon - \mu)} + 1}. \tag{6.222} \]
The integrals in (6.221) and (6.222) cannot be expressed in terms of familiar functions for
all T. However, in the limit T ≪ T_F (as is the case for almost all metals), it is sufficient to
approximate the integrals. To understand the approximations, we express the integrals (6.221)
and (6.222) in the form
\[ I = \int_0^\infty \frac{f(\epsilon)\, d\epsilon}{e^{\beta(\epsilon - \mu)} + 1}, \tag{6.223} \]
where f(ε) = ε^{1/2} and ε^{3/2}, respectively.
The expansion procedure is based on the fact that the Fermi distribution function n(ε) differs
from its T = 0 form only in a small range of width kT about µ. We let ε − µ = kTx and write I as
\begin{align}
I &= kT \int_{-\beta\mu}^{\infty} \frac{f(\mu + kTx)}{e^x + 1}\, dx \tag{6.224} \\
&= kT \int_{-\beta\mu}^{0} \frac{f(\mu + kTx)}{e^x + 1}\, dx + kT \int_0^\infty \frac{f(\mu + kTx)}{e^x + 1}\, dx. \tag{6.225}
\end{align}
In the first integrand in (6.225) we let x → −x so that
\[ I = kT \int_0^{\beta\mu} \frac{f(\mu - kTx)}{e^{-x} + 1}\, dx + kT \int_0^\infty \frac{f(\mu + kTx)}{e^x + 1}\, dx. \tag{6.226} \]
We next write 1/(e^{−x} + 1) = 1 − 1/(e^x + 1) in the first integrand in (6.226) and obtain
\[ I = kT \int_0^{\beta\mu} f(\mu - kTx)\, dx - kT \int_0^{\beta\mu} \frac{f(\mu - kTx)}{e^x + 1}\, dx + kT \int_0^\infty \frac{f(\mu + kTx)}{e^x + 1}\, dx. \tag{6.227} \]
Equation (6.227) is still exact.
Because we are interested in the limit T ≪ T_F or βµ ≫ 1, we can replace the upper limit in
the second integral by infinity. Then after making a change of variables in the first integrand, we
find
\[ I = \int_0^\mu f(\epsilon)\, d\epsilon + kT \int_0^\infty \frac{f(\mu + kTx) - f(\mu - kTx)}{e^x + 1}\, dx. \tag{6.228} \]
The values of x that contribute to the integrand in the second term in (6.228) are of order unity, and
hence it is reasonable to expand f(µ ± kTx) in a power series in kTx and integrate term by term.
The result is
\[ I = \int_0^\mu f(\epsilon)\, d\epsilon + 2(kT)^2 f'(\mu) \int_0^\infty \frac{x\, dx}{e^x + 1} + \frac{1}{3} (kT)^4 f'''(\mu) \int_0^\infty \frac{x^3\, dx}{e^x + 1} + \ldots \tag{6.229} \]
The definite integrals in (6.229) can be evaluated using analytical methods (see Appendix A). The
results are
\begin{align}
\int_0^\infty \frac{x\, dx}{e^x + 1} &= \frac{\pi^2}{12}, \tag{6.230} \\
\int_0^\infty \frac{x^3\, dx}{e^x + 1} &= \frac{7\pi^4}{120}. \tag{6.231}
\end{align}
If we substitute (6.230) and (6.231) into (6.229), we obtain our desired result
\[ I = \int_0^\mu f(\epsilon)\, d\epsilon + \frac{\pi^2}{6} (kT)^2 f'(\mu) + \frac{7\pi^4}{360} (kT)^4 f'''(\mu) + \ldots \tag{6.232} \]
Note that although we expanded f(µ − kTx) in a power series in kTx, the expansion of I in (6.232)
is not a power series expansion in (kT)². Instead (6.232) represents an asymptotic series that is a
good approximation to I if only the first several terms are retained.
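The values quoted in (6.230) and (6.231), and hence the coefficients π²/6 and 7π⁴/360 in (6.232), can be confirmed by direct numerical integration. This is a minimal sketch using Simpson's rule with an assumed cutoff at x = 60; the name `fermi_moment` is illustrative.

```python
import math

def fermi_moment(p, x_max=60.0, n=6000):
    """Simpson estimate of the integral of x**p / (exp(x) + 1) over x >= 0."""
    h = x_max / n                    # n must be even for Simpson's rule

    def f(x):
        return x ** p / (math.exp(x) + 1.0)

    total = f(0.0) + f(x_max)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0

i1 = fermi_moment(1)
i3 = fermi_moment(3)
print(i1, math.pi ** 2 / 12.0)            # compare with (6.230)
print(i3, 7.0 * math.pi ** 4 / 120.0)     # compare with (6.231)
print(2.0 * i1, math.pi ** 2 / 6.0)       # coefficient of (kT)^2 f'(mu) in (6.232)
print(i3 / 3.0, 7.0 * math.pi ** 4 / 360.0)   # coefficient of (kT)^4 f'''(mu) in (6.232)
```

Each printed pair should agree to many digits, because the integrands are smooth and decay exponentially.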
To find Ω in the limit of low temperatures, we let f(ε) = ε^{3/2} in (6.232). From (6.222) and
(6.232) we find that in the limit of low temperatures
\begin{align}
\Omega &= -\frac{2}{3}\, \frac{2^{1/2} V m^{3/2}}{\pi^2 \hbar^3} \left[ \frac{2}{5} \mu^{5/2} + \frac{\pi^2}{4} (kT)^2 \mu^{1/2} \right], \tag{6.233} \\
\overline{N} &= -\frac{\partial \Omega}{\partial \mu} = \frac{V (2m)^{3/2}}{3\pi^2 \hbar^3} \left[ \mu^{3/2} + \frac{\pi^2}{8} (kT)^2 \mu^{-1/2} \right]. \tag{6.234}
\end{align}
Note that the expansions in (6.233) and (6.234) are asymptotic and provide good approximations
only if the first few terms are kept. A more careful derivation of the low temperature behavior of
an ideal Fermi gas is given by Weinstock.

Additional Problems
Problems                        page
6.1                             231
6.2, 6.3                        233
6.4, 6.5, 6.6, 6.7, 6.8         236
6.10                            238
6.11                            239
6.12, 6.13                      242
6.14, 6.15, 6.16                244
6.17                            245
6.18                            249
6.19, 6.20, 6.21, 6.22, 6.23    251
6.24                            253
6.25, 6.26, 6.27                254
6.28, 6.29                      258
6.30, 6.31                      259
6.32                            262
6.33, 6.34                      262
6.36, 6.37, 6.38                267
6.41                            270
6.42, 6.43                      272
6.44, 6.45                      274

Listing of inline problems.
Problem 6.46. We can write the total energy of a system of N particles in the form
\[ E = \sum_{i=1}^{N} \frac{p_i^2}{2m} + \sum_{i=j+1}^{N} \sum_{j=1}^{N} u_{ij}, \tag{6.235} \]
where u_{ij} = u(|r_i − r_j|) is the interaction energy between particles i and j. Discuss why the
partition function of a classical system of N particles can be written in the form
\[ Z_N = \frac{1}{N!\, h^{3N}} \int d^{3N}p\; d^{3N}r\; e^{-\beta \sum_i p_i^2/2m}\; e^{-\beta \sum_{i<j} u_{ij}}. \tag{6.236} \]
∗ Problem 6.47. Assume periodic boundary conditions so that the wave function ψ satisfies the
condition (in one dimension)
\[ \psi(x) = \psi(x + L). \tag{6.237} \]
The form of the one particle eigenfunction consistent with (6.237) is given by
\[ \psi(x) \propto e^{i k_x x}. \tag{6.238} \]
What are the allowed values of k_x? How do they compare with the allowed values of k_x for a particle
in a one-dimensional box? Generalize the form (6.238) to a cube and determine the allowed values
of k. Find the form of the density of states and show that the same result (6.91) is obtained.
Problem 6.48. Estimate the chemical potential of one mole of helium gas at STP.
Problem 6.49. Suppose that two systems are initially in thermal and mechanical equilibrium,
but not in chemical equilibrium, that is, T1 = T2, P1 = P2, but µ1 ≠ µ2. Use reasoning similar to
that used in Section 2.12 to show that particles will move from the system with higher density to
the system at lower density.
Problem 6.50. Explain in simple terms why the mean kinetic energy of a classical particle in
equilibrium with a heat bath at temperature T is (1/2)kT per quadratic contribution to the kinetic
energy, independent of the mass of the particle.
Problem 6.51. The atoms we discussed in Section 6.3 were treated as symmetrical, rigid structures capable of only undergoing translational motion, that is, their internal motion was ignored.
Real molecules are neither spherical nor rigid, and rotate about two or three axes and vibrate
with many different frequencies. For simplicity, consider a linear rigid rotator with two degrees of
freedom. The rotational energy levels are given by
\[ \epsilon(j) = j(j + 1)\, \frac{\hbar^2}{2I}, \tag{6.239} \]
where I is the moment of inertia and j = 0, 1, 2, . . . The degeneracy of each rotational level is
(2j + 1).
(a) Find the partition function Z_rot for the rotational states of one molecule.
(b) For T ≫ T_r = ħ²/(2kI), the spectrum of the rotational states may be approximated by a
continuum and the sum over j can be replaced by an integral. Show that the rotational heat
capacity is given by C_{V,rot} = Nk in the high temperature limit. Compare this result with the
prediction of the equipartition theorem.
(c) A more accurate evaluation of the sum for Z_rot can be made using the Euler-Maclaurin formula
(see Appendix A)
\[ \sum_{i=0}^{\infty} f(i) = \int_0^\infty f(x)\, dx + \frac{1}{2} f(0) - \frac{1}{12} f'(0) + \frac{1}{720} f'''(0) + \ldots \tag{6.240} \]
Show that the corresponding result for C_{V,rot} is
\[ C_{V,\mathrm{rot}} = Nk \left[ 1 + \frac{1}{45} \left( \frac{T_r}{T} \right)^2 + \ldots \right]. \tag{6.241} \]
(d) Show that the leading contribution to C_{V,rot} for T ≪ T_r is
\[ C_{V,\mathrm{rot}} = 12 Nk \left( \frac{T_r}{T} \right)^2 e^{-2T_r/T} + \ldots \tag{6.242} \]

Figure 6.6: A schematic representation of a diatomic molecule.
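The limiting forms (6.241) and (6.242) can be compared with a direct evaluation of the sum over rotational levels. The sketch below is one way to do this, assuming the fluctuation relation C/Nk = ⟨x²⟩ − ⟨x⟩² with x = ε(j)/kT = j(j + 1)T_r/T; the function name and the cutoff `j_max` are illustrative choices, and the cutoff must be increased for T/T_r much larger than about 100.

```python
import math

def rot_heat_capacity(t, j_max=400):
    """C_V,rot / (N k) for a linear rigid rotator at reduced temperature t = T / T_r.

    Uses C/(N k) = <x^2> - <x>^2 with x = j (j + 1) / t and
    statistical weights (2 j + 1) exp(-x).
    """
    z = s1 = s2 = 0.0
    for j in range(j_max + 1):
        x = j * (j + 1) / t
        w = (2 * j + 1) * math.exp(-x)
        z += w
        s1 += w * x
        s2 += w * x * x
    return s2 / z - (s1 / z) ** 2

for t in (0.1, 0.3, 1.0, 3.0, 10.0):
    high = 1.0 + (1.0 / 45.0) / t ** 2          # form (6.241)
    low = 12.0 / t ** 2 * math.exp(-2.0 / t)    # form (6.242)
    print(f"T/T_r = {t:5.2f}   sum = {rot_heat_capacity(t):.6f}   high-T = {high:.6f}   low-T = {low:.6g}")
```

The direct sum should follow the low temperature form for T ≪ T_r and the high temperature form for T ≫ T_r, with neither form being reliable near T ≈ T_r.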
Problem 6.52. In Section 6.3 we found the speciﬁc heat of monatomic gases using the equipartition theorem. In this problem we consider the speciﬁc heat of a diatomic gas. A monatomic
gas is described by three independent coordinates and is said to have three degrees of freedom per
particle. The total energy of a diatomic gas is a sum of three terms, a translational, rotational,
and vibrational part, and hence the total specific heat of the gas can be written as
\[ c_v = c_{\mathrm{tr}} + c_{\mathrm{rot}} + c_{\mathrm{vib}}. \tag{6.243} \]
The last two terms in (6.243) arise from the internal degrees of freedom, two for rotation and one
for vibration. (Some textbooks state that there are two vibrational degrees of freedom because the
vibrational energy is part kinetic and part potential.) What is the high temperature limit of c_v for
a diatomic gas? The values of ħ²/2kI and ħω/k for H2 are 85.5 K and 6140 K, respectively, where
ω is the vibrational frequency. What do you expect the value of c_v to be at room temperature?
Sketch the T dependence of c_v in the range 10 K ≤ T ≤ 10000 K.
Problem 6.53. What is the probability that a classical nonrelativistic particle has kinetic energy
in the range ε to ε + dε?
Problem 6.54. Consider a classical ideal gas in equilibrium at temperature T in the presence of
a uniform gravitational field. Find the probability P(z)dz that an atom is at a height between z
and z + dz above the earth's surface. How do the density and the pressure depend on z?
Problem 6.55. A system of glass beads or steel balls is an example of a granular system. In
such a system the beads are macroscopic objects and the collisions between the beads are inelastic.
(Think of the collision of two basketballs.) Because the collisions in such a system are inelastic,
a gaslike steady state is achieved only by inputting energy, usually by shaking or vibrating the
walls of the container. Suppose that the velocities of the particles are measured in a direction
perpendicular to the direction of shaking. Do you expect the distribution of the velocities to
be given by a Gaussian distribution as in (6.56)? See for example, the experiments by Daniel
L. Blair and Arshad Kudrolli, “Velocity correlations in dense granular gases,” Phys. Rev. E 64,
050301(R) (2001) and the theoretical arguments by J. S. van Zon and F. C. MacKintosh, “Velocity
distributions in dissipative granular gases,” Phys. Rev. Lett. 93, 038001 (2004).
∗ Problem 6.56. In one of his experiments on gravitational sedimentation, Perrin observed the
number of particles in water at T = 293 K and found that when the microscope was raised by
100 µm, the mean number of particles in the field of view decreased from 203 to 91. Assume that
the particles have a mean volume of 9.78 × 10^{−21} m³ and a mass density of 1351 kg/m³. The density
of water is 1000 kg/m³. Use this information to estimate the magnitude of Boltzmann's constant.
Problem 6.57.
(a) What is the most probable kinetic energy of an atom in a classical system in equilibrium with
a heat bath at temperature T? Is it equal to (1/2)mṽ², where ṽ is the most probable speed?
(b) Find the following mean values for the same system: $\overline{v_x}$, $\overline{v_x^2}$, $\overline{v_x^2 v_y^2}$, and $\overline{v_x v_y^2}$. Try to do a
minimum of calculations.
Problem 6.58. Consider a classical one-dimensional oscillator whose energy is given by
\[ \epsilon = \frac{p^2}{2m} + a x^4, \tag{6.244} \]
where x, p, and m have their usual meanings; the parameter a is a constant.
(a) If the oscillator is in equilibrium with a heat bath at temperature T, calculate the mean kinetic
energy, the mean potential energy, and the mean total energy of the oscillator.
(b) Consider a classical one-dimensional oscillator whose energy is given by
\[ \epsilon = \frac{p^2}{2m} + \frac{1}{2} k x^2 + a x^4. \tag{6.245} \]
In this case the anharmonic contribution ax⁴ is very small. What is the leading contribution
of this term to the mean potential energy?
Problem 6.59. Consider a system consisting of two noninteracting particles connected to a heat
bath at temperature T. Each particle can be in one of three states with energies 0, ε_1, and ε_2.
Find the partition function for the following cases:
(a) The particles obey Maxwell-Boltzmann statistics and can be considered distinguishable.
(b) The particles obey Fermi-Dirac statistics.
(c) The particles obey Bose-Einstein statistics.
(d) Find the probability in each case that the ground state is occupied by one particle.
(e) What is the probability that the ground state is occupied by two particles?
(f) Estimate the probabilities in (d) and (e) for kT = ε_2 = 2ε_1.
Problem 6.60. Show that the grand partition function Z can be expressed as
\[ Z = \sum_{N=0}^{\infty} e^{\beta\mu N} Z_N, \tag{6.246} \]
where Z_N is the partition function for a system of N particles. Consider a system of noninteracting
(spinless) fermions such that each particle can be in a single particle state with energy 0, ∆, or 2∆.
Find an expression for Z. Show how the mean number of particles depends on µ for T = 0,
kT = ∆/2, and kT = ∆.
Problem 6.61. A system contains N identical noninteracting fermions with 2N distinct single
particle states. Suppose that 2N/3 of these states have energy zero, 2N/3 have energy ∆, and
2N/3 have energy 2∆. Show that µ is independent of T . Calculate and sketch the T dependence
of the energy and heat capacity.
Problem 6.62. Find general expressions for N , Ω, and E for a highly relativistic ideal gas and
ﬁnd a general relation between P V and E .
Problem 6.63. Calculate the chemical potential µ(T) of a noninteracting Fermi gas at low temperatures T ≪ T_F for a one-dimensional ideal Fermi gas. Use the result for µ(T) found for the
two-dimensional case in Problem 6.38 and compare the qualitative behavior of µ(T) in one, two,
and three dimensions.
Problem 6.64. Discuss the meaning of the Fermi temperature TF . Why is it not the temperature
of the Fermi gas?
Problem 6.65. High temperature limit for ideal Fermi gas
If T ≫ T_F at fixed density, quantum effects can be neglected and the thermal properties of an
ideal Fermi gas reduce to those of the ideal classical gas. Does the pressure increase or decrease when the
temperature is lowered (at constant density)? That is, what is the first quantum correction to the
classical equation of state? The pressure is given by (see (6.103))
\[ P = \frac{(2m)^{3/2}}{3\pi^2 \hbar^3} \int_0^\infty \frac{\epsilon^{3/2}\, d\epsilon}{e^{\beta(\epsilon - \mu)} + 1}. \tag{6.247} \]
In the high temperature limit, e^{βµ} ≪ 1, and we can make the expansion
\begin{align}
\frac{1}{e^{\beta(\epsilon - \mu)} + 1} &= e^{\beta(\mu - \epsilon)}\, \frac{1}{1 + e^{-\beta(\epsilon - \mu)}} \tag{6.248a} \\
&\approx e^{\beta(\mu - \epsilon)} \left[ 1 - e^{-\beta(\epsilon - \mu)} \right]. \tag{6.248b}
\end{align}
If we use (6.248b) and let x = βε, we obtain
\[ e^{\beta\mu} \int_0^\infty x^{3/2} e^{-x} \left( 1 - e^{\beta\mu} e^{-x} \right) dx = \frac{3}{4} \pi^{1/2} e^{\beta\mu} \left[ 1 - \frac{1}{2^{5/2}} e^{\beta\mu} \right]. \tag{6.249} \]
Use (6.249) to show that P is given by
\[ P = \frac{m^{3/2} (kT)^{5/2}}{2^{1/2} \pi^{3/2} \hbar^3}\, e^{\beta\mu} \left[ 1 - \frac{1}{2^{5/2}} e^{\beta\mu} \right]. \tag{6.250} \]
Find a similar expression for N. Eliminate µ and show that the leading order correction to the
equation of state is given by
\begin{align}
PV &= NkT \left[ 1 + \frac{\pi^{3/2}}{4}\, \frac{\rho \hbar^3}{(mkT)^{3/2}} \right] \tag{6.251a} \\
&= NkT \left[ 1 + \frac{1}{2^{7/2}}\, \rho \lambda^3 \right]. \tag{6.251b}
\end{align}
What is the condition for the correction term in (6.251b) to be small? Note that as the temperature
is lowered at constant density, the pressure increases relative to its classical value. This dependence implies that quantum effects
due to Fermi statistics lead to an effective "repulsion" between the particles. What do you think
would be the effect of Bose statistics in this context (see Problem 6.69)?
Mullin and Blaylock have emphasized that it is misleading to interpret the sign of the correction term in (6.251b) in terms of an effective repulsive exchange "force," and stress that the
positive sign is a consequence of the symmetrization requirement for same spin fermions.
∗ Problem 6.66. Numerical calculation of the chemical potential for the ideal Fermi gas
Although it is not possible to do integrals of the type (6.223) analytically for all T, we can perform
these integrals numerically. From (6.227) show that we can express I in the form
\[ I = \int_0^\mu f(\epsilon)\, d\epsilon + kT \int_0^{\beta\mu} \frac{f(\mu + kTx) - f(\mu - kTx)}{e^x + 1}\, dx + kT \int_{\beta\mu}^{\infty} \frac{f(\mu + kTx)}{e^x + 1}\, dx. \tag{6.252} \]
To calculate N we choose f(ε) = 2^{1/2} V m^{3/2} ε^{1/2}/π²ħ³ (see (6.221)). We also introduce the dimensionless variables u and t defined by the relations µ = uε_F and T = tT_F, so that βµ = u/t. As a result, show that
we find the following implicit equation for u:
\[ 1 = u^{3/2} + \frac{3}{2}\, t \int_0^{u/t} \frac{(u + tx)^{1/2} - (u - tx)^{1/2}}{e^x + 1}\, dx + \frac{3}{2}\, t \int_{u/t}^{\infty} \frac{(u + tx)^{1/2}}{e^x + 1}\, dx. \tag{6.253} \]
Show that if the dimensionless "temperature" t is zero, then the dimensionless chemical potential
u = 1. Use Simpson's rule or a similar integration method to find u for t = 0.1, 0.2, 0.5, 1.0, 1.2, 2,
and 4. Plot µ/ε_F versus T/T_F and discuss its qualitative T dependence. At approximately what
temperature does u = 0?
Problem 6.67. In the text we gave a simple argument based on the assumption that C_V ∼ N_eff k
to obtain the qualitative T dependence of C_V at low temperatures for an ideal Bose and Fermi gas.
Use a similar argument based on the assumption that PV = N_eff kT to obtain the T dependence
of the pressure at low temperatures.
∗ Problem 6.68. Consider a system of N noninteracting fermions with single particle energies
given by ε_n = n∆, where n = 1, 2, 3, . . . Find the mean energy and heat capacity of the system.
Although this problem can be treated exactly, it is not likely that you will be able to solve the
problem by thinking about the case of general N. The exact partition function for general N
has been found by several authors including Peter Borrmann and Gert Franke, "Recursion formulas for quantum statistical partition functions," J. Chem. Phys. 98, 2484–2485 (1993) and K.
Schönhammer, "Thermodynamics and occupation numbers of a Fermi gas in the canonical ensemble," Am. J. Phys. 68, 1032–1037 (2000).
Problem 6.69. High temperature limit for ideal Bose gas
If T ≫ Tc at fixed density, quantum effects can be neglected and the thermal properties of an
ideal Bose gas reduce to those of the ideal classical gas. Does the pressure increase or decrease when the
temperature is lowered (at constant density)? That is, what is the first quantum correction to the
classical equation of state? The pressure is given by (see (6.103))
\[ P = \frac{2^{1/2} m^{3/2} (kT)^{5/2}}{3\pi^2 \hbar^3} \int_0^\infty \frac{x^{3/2}\, dx}{e^{x - \beta\mu} - 1}. \tag{6.254} \]
Follow the same procedure as in Problem 6.65 and show that
\[ PV = NkT \left[ 1 - \frac{\pi^{3/2}}{2}\, \frac{\rho \hbar^3}{(mkT)^{3/2}} \right]. \tag{6.255} \]
We see that as the temperature is lowered at constant density, the pressure becomes less than its
classical value. We can interpret this change as due to an effective "attraction" between the particles
arising from the Bose statistics.
Problem 6.70. Does Bose condensation occur for a one- or two-dimensional ideal Bose gas? If
so, ﬁnd the transition temperature. If not, explain.
Problem 6.71. Discuss why Bose condensation does not occur in a gas of photons in thermal
equilibrium (black body radiation).
∗ Problem 6.72. Effect of boundary conditions
(a) Assume that N noninteracting bosons are enclosed in a cube of edge length L with rigid walls. What is the ground state wave function? How does the density of the condensate vary in space?
(b) Assume instead the existence of periodic boundary conditions. What is the spatial dependence of the ground state wave function in this case?
(c) Do the boundary conditions matter in this case? If so, why?
∗ Problem 6.73. Bose-Einstein condensation in low-dimensional traps
As we found in Problem 6.70, Bose-Einstein condensation does not occur in ideal one- and two-dimensional systems. However, this result holds only if the system is confined by rigid walls. In the following, we will show that Bose-Einstein condensation can occur if a system is confined by a spatially varying potential. For simplicity, we will treat the system semiclassically.
Let us assume that the confining potential has the form
$$V(r) \sim r^n. \tag{6.256}$$
Then the region accessible to a particle with energy $\epsilon$ has a radius $L \sim \epsilon^{1/n}$. Show that the corresponding density of states behaves as
$$g(\epsilon) \sim L^d\,\epsilon^{\frac{1}{2}d-1} \sim \epsilon^{d/n}\,\epsilon^{\frac{1}{2}d-1} \sim \epsilon^{\alpha}, \tag{6.257}$$
where
$$\alpha = \frac{d}{n} + \frac{d}{2} - 1. \tag{6.258}$$
What is the range of values of n for which $T_c > 0$ for d = 1 and 2? More information about experiments on Bose-Einstein condensation can be found in the references.
∗ Problem 6.74. Numerical evaluation of the chemical potential of an ideal Bose gas
Make the change of variables $\mu = kTu$ and $\beta\epsilon = x + u$ and show that (6.184) for $\rho = \overline{N}(T, V, \mu)/V$ can be written as
$$\rho = \frac{(2mkT)^{3/2}}{4\pi^2\hbar^3} \int_0^\infty \frac{dx\,(x+u)^{1/2}}{e^x - 1}. \tag{6.259}$$
Use the expression (6.188) for $T_c$ to find the implicit equation for u:
$$2.31 = \left(\frac{T}{T_c}\right)^{3/2} \int_0^\infty \frac{dx\,(x+u)^{1/2}}{e^x - 1}. \tag{6.260}$$
Evaluate the integral in (6.260) numerically and find the value of u and hence µ for $T = 4T_c$, $T = 2T_c$, $T = 1.5T_c$, and $T = 1.1T_c$.
Problem 6.75. (a) Show that if the volume of the crystal is $Na^3$, where a is the equilibrium distance between atoms, then the Debye wave number, $k_D = \omega_D/c$, is about π/a.
(b) Evaluate the integral in (6.220b) numerically and plot the heat capacity versus T /TD over the
entire temperature range.
∗ Problem 6.76. Show that the probability P(N) of finding a system in the T, V, µ ensemble with exactly N particles, regardless of their positions and momenta, is given by
$$P(N) = \frac{1}{Z}\, e^{\beta N\mu} Z_N(V, T). \tag{6.261}$$
Use (6.261) to show that
$$\overline{N} = \sum_{N=0}^{\infty} N P(N) = \frac{z}{Z}\frac{\partial Z}{\partial z} = \frac{\partial \ln Z}{\partial \beta\mu}, \tag{6.262}$$
where the activity z is defined as
$$z = e^{\beta\mu}. \tag{6.263}$$
Also show that the variance of the number of particles is given by
$$\overline{N^2} - \overline{N}^2 = kT\,\frac{\partial \overline{N}}{\partial \mu}. \tag{6.264}$$
∗ Problem 6.77. Number fluctuations in a noninteracting classical gas
Show that the grand partition function of a noninteracting classical gas can be expressed as
$$Z = \sum_{N=0}^{\infty} \frac{(zZ_1)^N}{N!} = e^{zZ_1}. \tag{6.265}$$
Show that the mean value of N is given by
$$\overline{N} = zZ_1, \tag{6.266}$$
and that the probability that there are N particles in the system is given by a Poisson distribution:
$$P_N = \frac{z^N Z_N}{Z} = \frac{(zZ_1)^N}{N!\,Z} = \frac{\overline{N}^N}{N!}\, e^{-\overline{N}}. \tag{6.267}$$
What is the variance, $\overline{(N - \overline{N})^2}$, and the N dependence of the relative root mean square deviation, $[\overline{N^2} - \overline{N}^2]^{1/2}/\overline{N}$?
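∗ Problem 6.78. Number fluctuations in a degenerate noninteracting Fermi gas
Use the relation
$$\overline{(N - \overline{N})^2} = kT\,\frac{\partial \overline{N}}{\partial \mu} \tag{6.268}$$
to find the number fluctuations in the noninteracting Fermi gas for fixed T, V and µ. Show that
$$\overline{(N - \overline{N})^2} = \frac{kT}{2}\,\frac{V(2m)^{3/2}}{2\pi^2\hbar^3} \int_0^\infty \frac{\epsilon^{-1/2}\,d\epsilon}{e^{\beta(\epsilon-\mu)} + 1}, \tag{6.269a}$$
$$\to \frac{3\overline{N}T}{2T_F}. \qquad (T \ll T_F) \tag{6.269b}$$
Explain why the fluctuations in a degenerate Fermi system are much less than in the corresponding classical system.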
∗ Problem 6.79. Absence of classical magnetism
As mentioned in Chapter 5, van Leeuwen’s theorem states that the phenomenon of diamagnetism does not exist in classical physics. Hence, magnetism is an intrinsically quantum mechanical phenomenon. Prove van Leeuwen’s theorem using the following hints.
The proof of this theorem requires the use of classical Hamiltonian mechanics for which the
regular momentum p is replaced by the canonical momentum p − A/c, where the magnetic ﬁeld
enters through the vector potential, A. Then make a change of variables that eliminates A, and
thus the electric and magnetic ﬁelds from the Hamiltonian. Because the local magnetic ﬁelds are
proportional to the velocity, they too will vanish when the integral over momenta is done in the
partition function.
∗ Problem 6.80. The Fermi-Pasta-Ulam (FPU) problem
The same considerations that make the Debye theory of solids possible also suggest that a molecular dynamics simulation of a solid at low temperatures will fail. As we noted in Section 6.12, a system of masses linked by Hooke’s law springs can be represented by independent normal modes. The implication is that a molecular dynamics simulation of a system of particles interacting via the Lennard-Jones potential will fail at low temperatures because the simulation will not be ergodic. The reason is that at low energies, the particles will undergo small oscillations, and hence the system can be represented by a system of masses interacting via Hooke’s law springs. An initial set of positions and velocities would correspond to a set of normal modes. Because the system would remain in this particular set of modes indefinitely, a molecular dynamics simulation would not sample the various modes and the simulation would not be ergodic.
In 1955 Fermi, Pasta, and Ulam did a simulation of a one-dimensional chain of particles connected by springs. If the force between the particles is not linear, for example, if the potential is $V(x) = kx^2/2 + \kappa x^4/4$, the normal modes will not be an exact representation of the system for κ > 0. Would a molecular dynamics simulation be ergodic for κ > 0? The answer to this question is nontrivial and has interested physicists and mathematicians ever since. A good place to start is the book by Weissert.
Suggestions for Further Reading
More information about Bose-Einstein condensation can be found at <jilawww.colorado.edu/bec/>, <bec.nist.gov/>, and <cua.mit.edu/ketterle_group/>.
Vanderlei Bagnato and Daniel Kleppner, “Bose-Einstein condensation in low-dimensional traps,” Phys. Rev. A 44, 7439 (1991).
Ian Duck and E. C. G. Sudarshan, Pauli and the Spin-Statistics Theorem, World Scientific (1998). This graduate level book simplifies and clarifies the formal statements of the spin-statistics theorem, and corrects the flawed intuitive explanations that are frequently given.
David L. Goodstein, States of Matter, Prentice Hall (1975). An excellent text whose emphasis is on the applications of statistical mechanics to gases, liquids and solids. Chapter 3 on solids is particularly relevant to this chapter.
J. D. Gunton and M. J. Buckingham, “Condensation of ideal Bose gas as cooperative transition,” Phys. Rev. 166, 152 (1968).
F. Herrmann and P. Würfel, “Light with nonzero chemical potential,” Am. J. Phys. 73, 717–721 (2005). The authors discuss thermodynamic states and processes involving light in which the chemical potential of light is nonzero.
Charles Kittel, Introduction to Solid State Physics, seventh edition, John Wiley & Sons (1996). See Chapters 5 and 6 for a discussion of the Debye model and the free electron gas.
W. J. Mullin and G. Blaylock, “Quantum statistics: Is there an effective fermion repulsion or boson attraction?,” Am. J. Phys. 71, 1223–1231 (2003).
Jan Tobochnik, Harvey Gould, and Jonathan Machta, “Understanding temperature and chemical potential using computer simulations,” Am. J. Phys. 73, 708–716 (2005).
Donald Rogers, Einstein’s Other Theory: The Planck-Bose-Einstein Theory of Heat Capacity, Princeton University Press (2005).
Robert Weinstock, “Heat capacity of an ideal free electron gas: A rigorous derivation,” Am. J. Phys. 37, 1273 (1969).
Thomas P. Weissert, The Genesis of Simulation in Dynamics: Pursuing the Fermi-Pasta-Ulam Problem, Springer-Verlag (1998).
Chapter 7
Thermodynamic Relations and Processes
© 2005 by Harvey Gould and Jan Tobochnik
20 November 2005
7.1 Introduction
All thermodynamic measurements can be expressed in terms of partial derivatives.¹ For example,
the pressure P can be expressed as P = −∂F/∂V . Suppose that we make several thermodynamic
measurements, for example, CV , CP , and KT , the isothermal compressibility. The latter is deﬁned
as
$$K_T = -\frac{1}{V}\left(\frac{\partial V}{\partial P}\right)_T. \qquad \text{(isothermal compressibility)} \tag{7.1}$$
Now suppose that we wish to know the (isobaric) coeﬃcient of thermal expansion α, which is
deﬁned as
$$\alpha = \frac{1}{V}\left(\frac{\partial V}{\partial T}\right)_P. \qquad \text{(thermal expansion coefficient)} \tag{7.2}$$
(The number of particles N is assumed to be held constant in the above derivatives.) Do we need to
make an independent measurement of α or can we determine α by knowing the values of CV , CP ,
and KT ? To answer this question and related ones, we ﬁrst need to know how to manipulate
partial derivatives. This aspect of thermodynamics can be confusing when ﬁrst encountered.
Thermodynamic systems normally have two or more independent variables, for example, the combination E, V, N or T, P, N. Because there are many choices of combinations of independent
variables, it is important to be explicit about which variables are independent and which variables
are being held constant in any partial derivative. We suggest that you reread Appendix 2B for a
review of some of the properties of partial derivatives. The following example illustrates the power
of purely thermodynamic arguments based on the manipulation of thermodynamic derivatives.
¹ This point has been emphasized by Styer.
Example 7.1. The thermodynamics of black body radiation. We can derive the relation $u(T) \propto T^4$
(see (6.135a)) for u, the energy per unit volume of blackbody radiation, by using thermodynamic
arguments and two reasonable assumptions.
Solution. The two assumptions are that u depends only on T and the radiation exerts a pressure
on the walls of the cavity given by
$$P = \frac{1}{3}u(T). \tag{7.3}$$
Equation (7.3) follows directly from Maxwell’s electromagnetic theory and was obtained in Section 6.9 from ﬁrst principles (see Problem 6.30).
We start from the fundamental thermodynamic relation dE = T dS − P dV , and write it as
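$$dS = \frac{dE}{T} + \frac{P}{T}\,dV. \tag{7.4}$$
We let $E = Vu$, substitute $dE = V\,du + u\,dV$ and the relation (7.3) into (7.4), and write
$$dS = \frac{V}{T}\,du + \frac{u}{T}\,dV + \frac{1}{3}\frac{u}{T}\,dV = \frac{V}{T}\frac{du}{dT}\,dT + \frac{4u}{3T}\,dV. \tag{7.5}$$
From (7.5) we have
$$\left(\frac{\partial S}{\partial V}\right)_T = \frac{4u}{3T}, \tag{7.6a}$$
$$\left(\frac{\partial S}{\partial T}\right)_V = \frac{V}{T}\frac{du}{dT}. \tag{7.6b}$$
Because the order of the derivatives is irrelevant, $\partial^2 S/\partial V\partial T$ and $\partial^2 S/\partial T\partial V$ are equal. Hence, we obtain
$$\frac{4}{3}\frac{\partial}{\partial T}\left(\frac{u}{T}\right) = \frac{\partial}{\partial V}\left(\frac{V}{T}\frac{du}{dT}\right). \tag{7.7}$$
Next we assume that u depends only on T and perform the derivatives in (7.7) to find
$$\frac{4}{3}\left[\frac{1}{T}\frac{du}{dT} - \frac{u}{T^2}\right] = \frac{1}{T}\frac{du}{dT}, \tag{7.8}$$
which reduces to
$$\frac{du}{dT} = 4\,\frac{u}{T}. \tag{7.9}$$
If we substitute the form $u(T) = aT^n$ in (7.9), we find that this form is a solution for n = 4:
$$u(T) = aT^4. \tag{7.10}$$
The constant a in (7.10) cannot be determined by thermodynamic arguments.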
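The step from (7.9) to (7.10) can also be checked numerically. The following is a minimal sketch, not part of the original argument: it integrates $du/dT = 4u/T$ with a fourth-order Runge-Kutta step in arbitrary units (the function name `integrate_u`, the starting values, and the step count are illustrative assumptions) and reads off the effective power law.

```python
import math

def integrate_u(T0, u0, T1, steps=10_000):
    """Integrate du/dT = 4u/T, Eq. (7.9), from T0 to T1 with RK4 steps."""
    f = lambda T, u: 4.0 * u / T
    h = (T1 - T0) / steps
    T, u = T0, u0
    for _ in range(steps):
        k1 = f(T, u)
        k2 = f(T + h / 2, u + h * k1 / 2)
        k3 = f(T + h / 2, u + h * k2 / 2)
        k4 = f(T + h, u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        T += h
    return u

# Arbitrary units (assumed): u = 1 at T = 1, then double the temperature.
u1 = integrate_u(1.0, 1.0, 2.0)
# Effective exponent n in u ~ T^n, from the ratio u(2)/u(1).
exponent = math.log(u1) / math.log(2.0)
print(u1, exponent)
```

Doubling the temperature multiplies u by very nearly 16, which corresponds to the exponent n = 4 in (7.10).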
We can obtain the entropy by integrating (7.6a). The result is
$$S = \frac{4}{3T}\,V u(T) + \text{constant}. \tag{7.11}$$
The constant of integration in (7.11) must be set equal to zero to make S proportional to V. Hence, we conclude that $S = 4aVT^3/3$. The above thermodynamic argument was first given by Boltzmann in 1884.
7.2 Maxwell Relations
Example 7.1 illustrates the power of thermodynamic arguments and indicates that it would be
useful to relate various thermodynamic derivatives to one another. The Maxwell relations, which
we derive in the following, relate the various thermodynamic derivatives of E , F , G, and H to one
another and are useful for eliminating quantities that are diﬃcult to measure in terms of quantities
that can be measured directly. We will see that the Maxwell relations can be used to show that the
internal energy and enthalpy of an ideal gas depend only on the temperature. More importantly,
we also will answer the question posed in Section 7.1 and relate the coeﬃcient of thermal expansion
to other thermodynamic derivatives.
We start with E (S, V, N ) and write
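$$dE = T\,dS - P\,dV + \mu\,dN. \tag{7.12}$$
In the following we will assume that N is a constant. From (7.12) we have that
$$T = \left(\frac{\partial E}{\partial S}\right)_V \tag{7.13}$$
and
$$P = -\left(\frac{\partial E}{\partial V}\right)_S. \tag{7.14}$$
Because the order of differentiation should be irrelevant, we obtain from (7.13) and (7.14)
$$\frac{\partial^2 E}{\partial V\,\partial S} = \frac{\partial^2 E}{\partial S\,\partial V}, \tag{7.15}$$
or
$$\left(\frac{\partial T}{\partial V}\right)_S = -\left(\frac{\partial P}{\partial S}\right)_V. \tag{7.16}$$
Equation (7.16) is our first Maxwell relation. The remaining Maxwell relations are obtained in Problem 7.1.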
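As an illustration of (7.16), the sketch below checks the relation by finite differences. It assumes the form $E(S, V) \propto V^{-2/3} e^{2S/3Nk}$ appropriate to a monatomic ideal gas, written in reduced units with $Nk = 1$ and the prefactor set to one; the function names and step sizes are illustrative choices, not part of the text.

```python
import math

def E(S, V):
    """Assumed E(S, V) for a monatomic ideal gas in reduced units (Nk = 1)."""
    return V ** (-2.0 / 3.0) * math.exp(2.0 * S / 3.0)

def T(S, V, h=1e-5):
    """T = (dE/dS)_V, Eq. (7.13), by a central difference."""
    return (E(S + h, V) - E(S - h, V)) / (2 * h)

def P(S, V, h=1e-5):
    """P = -(dE/dV)_S, Eq. (7.14), by a central difference."""
    return -(E(S, V + h) - E(S, V - h)) / (2 * h)

def dT_dV(S, V, h=1e-4):
    return (T(S, V + h) - T(S, V - h)) / (2 * h)

def dP_dS(S, V, h=1e-4):
    return (P(S + h, V) - P(S - h, V)) / (2 * h)

S0, V0 = 1.0, 2.0
lhs = dT_dV(S0, V0)     # left-hand side of (7.16)
rhs = -dP_dS(S0, V0)    # right-hand side of (7.16)
print(lhs, rhs)
```

The two sides agree to within the accuracy of the finite differences, and both equal $-4E/9V$ for this choice of $E(S, V)$.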
Problem 7.1. From the differentials of the thermodynamic potentials:
$$dF = -S\,dT - P\,dV, \tag{7.17}$$
$$dG = -S\,dT + V\,dP, \tag{7.18}$$
$$dH = T\,dS + V\,dP, \tag{7.19}$$
derive the Maxwell relations:
$$\left(\frac{\partial S}{\partial V}\right)_T = \left(\frac{\partial P}{\partial T}\right)_V, \tag{7.20}$$
$$\left(\frac{\partial S}{\partial P}\right)_T = -\left(\frac{\partial V}{\partial T}\right)_P, \tag{7.21}$$
$$\left(\frac{\partial T}{\partial P}\right)_S = \left(\frac{\partial V}{\partial S}\right)_P. \tag{7.22}$$
Also consider a variable number of particles to derive the Maxwell relations
$$\left(\frac{\partial V}{\partial N}\right)_P = \left(\frac{\partial \mu}{\partial P}\right)_N \tag{7.23}$$
and
$$\left(\frac{\partial \mu}{\partial V}\right)_N = -\left(\frac{\partial P}{\partial N}\right)_V. \tag{7.24}$$
7.3 Applications of the Maxwell Relations
The Maxwell relations depend on our identification of $(\partial E/\partial S)_V$ with the temperature, a relation that follows from the second law of thermodynamics. The Maxwell relations are not purely mathematical in content, but are different expressions of the second law. In the following, we use these relations to derive some useful relations between various thermodynamic quantities.
7.3.1 Internal energy of an ideal gas
We first show that the internal energy E of an ideal gas is a function only of T given the pressure equation of state, $PV = NkT$. That is, if we think of E as a function of T and V, we want to show that $(\partial E/\partial V)_T = 0$. From the fundamental thermodynamic relation, $dE = T\,dS - P\,dV$, we see that $(\partial E/\partial V)_T$ can be expressed as
$$\left(\frac{\partial E}{\partial V}\right)_T = T\left(\frac{\partial S}{\partial V}\right)_T - P. \tag{7.25}$$
To show that E is a function of T only, we need to show that the right-hand side of (7.25) is zero. The term involving the entropy in (7.25) can be rewritten using the Maxwell relation (7.20):
$$\left(\frac{\partial E}{\partial V}\right)_T = T\left(\frac{\partial P}{\partial T}\right)_V - P. \tag{7.26}$$
Because $(\partial P/\partial T)_V = P/T$ for an ideal gas, we see that the right-hand side of (7.26) is zero.
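The right-hand side of (7.26) can be evaluated for any pressure equation of state. The sketch below does this by a central difference for the ideal gas and, for contrast, for the van der Waals equation of state (7.40) that appears later in this chapter. The reduced units and the parameter values `a` and `b` are hypothetical and chosen only for illustration.

```python
def dE_dV_at_const_T(P, T, V, h=1e-3):
    """Right-hand side of (7.26): T (dP/dT)_V - P, using a central difference."""
    dP_dT = (P(T + h, V) - P(T - h, V)) / (2 * h)
    return T * dP_dT - P(T, V)

N, k = 1.0, 1.0      # reduced units (assumed)
a, b = 0.5, 0.1      # hypothetical van der Waals parameters

def P_ideal(T, V):
    return N * k * T / V

def P_vdw(T, V):
    # Eq. (7.40) solved for P
    return N * k * T / (V - N * b) - a * N**2 / V**2

T0, V0 = 2.0, 3.0
ideal_result = dE_dV_at_const_T(P_ideal, T0, V0)
vdw_result = dE_dV_at_const_T(P_vdw, T0, V0)
print(ideal_result, vdw_result)
```

For the ideal gas the result vanishes to rounding error, as argued above. For the van der Waals form it equals $aN^2/V^2$, so the energy of such a gas depends on the volume as well as the temperature.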
Problem 7.2. Show that the enthalpy of an ideal gas is a function of T only.
7.3.2 Relation between the specific heats
As we have seen, it is much easier to calculate the heat capacity at constant volume than at constant pressure. However, it is usually easier to measure the heat capacity at constant pressure. For example, most solids expand when heated, and hence it is easier to make measurements at constant pressure. In the following, we derive a thermodynamic relation that relates $C_V$ and $C_P$. First recall that
$$C_V = \left(\frac{\partial E}{\partial T}\right)_V = T\left(\frac{\partial S}{\partial T}\right)_V \tag{7.27a}$$
and
$$C_P = \left(\frac{\partial H}{\partial T}\right)_P = T\left(\frac{\partial S}{\partial T}\right)_P. \tag{7.27b}$$
We consider S as a function of T and P and write
$$dS = \left(\frac{\partial S}{\partial T}\right)_P dT + \left(\frac{\partial S}{\partial P}\right)_T dP, \tag{7.28}$$
and take the partial derivative with respect to temperature at constant volume of both sides of (7.28):
$$\left(\frac{\partial S}{\partial T}\right)_V = \left(\frac{\partial S}{\partial T}\right)_P + \left(\frac{\partial S}{\partial P}\right)_T \left(\frac{\partial P}{\partial T}\right)_V. \tag{7.29}$$
We then use (7.27) to rewrite (7.29) as
$$\frac{C_V}{T} = \frac{C_P}{T} + \left(\frac{\partial S}{\partial P}\right)_T \left(\frac{\partial P}{\partial T}\right)_V. \tag{7.30}$$
Because we would like to express $C_P - C_V$ in terms of measurable quantities, we use the Maxwell relation (7.21) to eliminate $(\partial S/\partial P)_T$ and rewrite (7.30) as:
$$C_P - C_V = T\left(\frac{\partial V}{\partial T}\right)_P \left(\frac{\partial P}{\partial T}\right)_V. \tag{7.31}$$
We next use the identity (see (2.162)),
$$\left(\frac{\partial V}{\partial T}\right)_P \left(\frac{\partial T}{\partial P}\right)_V \left(\frac{\partial P}{\partial V}\right)_T = -1, \tag{7.32}$$
to eliminate $(\partial P/\partial T)_V$ and write:
$$C_P - C_V = -T\left(\frac{\partial P}{\partial V}\right)_T \left(\frac{\partial V}{\partial T}\right)_P^2. \tag{7.33}$$
If we substitute the definitions (7.1) of the isothermal compressibility $K_T$ and (7.2) for the thermal expansion coefficient α, we obtain the desired general relation:
$$C_P - C_V = V\,\frac{T}{K_T}\,\alpha^2. \tag{7.34}$$
Note that (7.34) is more general than the relation (2.36), which depends on only the first law.
For an ideal gas we have $K_T = 1/P$ and $\alpha = 1/T$, and (7.34) reduces to the familiar result (see (2.37))
$$C_P - C_V = Nk. \tag{7.35}$$
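The reduction of (7.34) to (7.35) can be confirmed numerically. The sketch below computes α and $K_T$ from their definitions (7.2) and (7.1) by central differences for a given $V(T, P)$ and then evaluates $VT\alpha^2/K_T$. The reduced units ($N = k = 1$), the helper names, and the state point are assumptions made only for this illustration.

```python
def alpha(V, T, P, h=1e-4):
    """Thermal expansion coefficient, Eq. (7.2), by a central difference."""
    return (V(T + h, P) - V(T - h, P)) / (2 * h) / V(T, P)

def kappa_T(V, T, P, h=1e-4):
    """Isothermal compressibility K_T, Eq. (7.1), by a central difference."""
    return -(V(T, P + h) - V(T, P - h)) / (2 * h) / V(T, P)

N, k = 1.0, 1.0      # reduced units (assumed)

def V_ideal(T, P):
    return N * k * T / P

T0, P0 = 2.0, 3.0
# Right-hand side of (7.34)
cp_minus_cv = V_ideal(T0, P0) * T0 * alpha(V_ideal, T0, P0) ** 2 / kappa_T(V_ideal, T0, P0)
print(cp_minus_cv)
```

The result equals $Nk$ to within the finite-difference error. The same two helper functions can be applied to any other equation of state written as $V(T, P)$.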
Although we will not derive these conditions here, it is plausible that the heat capacity and compressibility of equilibrium thermodynamic systems must be positive. Given these assumptions, we see from (7.34) that $C_P > C_V$ in general.
7.4 Applications to Irreversible Processes
Although the thermodynamic quantities of a system can be defined only when the system is in equilibrium, we found in Chapter 2 that it is possible to obtain useful results for systems that pass through nonequilibrium states if the initial and final states are in equilibrium. In the following, we will consider some well known thermodynamic processes.
Figure 7.1: (a) A gas is kept in the left half of a box by a partition. The right half is evacuated. (b) The partition is removed and the gas expands irreversibly to fill the entire box.
7.4.1 The Joule or free expansion process
In a Joule or free expansion the system expands into a vacuum while the entire system is thermally isolated (see Figure 7.1). The quantity of interest is the temperature change that is produced. Although this process is irreversible, it can be treated by thermodynamics as we learned in Section 2.14. Because dQ = 0 and dW = 0, the energy is a constant so that dE(T, V) = 0. This condition can be written as
$$dE = \left(\frac{\partial E}{\partial T}\right)_V dT + \left(\frac{\partial E}{\partial V}\right)_T dV = 0. \tag{7.36}$$
Hence, we obtain
$$\left(\frac{\partial T}{\partial V}\right)_E = -\frac{(\partial E/\partial V)_T}{(\partial E/\partial T)_V}, \tag{7.37}$$
$$= -\frac{1}{C_V}\left[T\left(\frac{\partial P}{\partial T}\right)_V - P\right]. \tag{7.38}$$
Equation (7.38) follows from the definition of $C_V$ and from (7.26). The partial derivative $(\partial T/\partial V)_E$ is known as the Joule coefficient. For a finite change in volume, the total temperature change is found by integrating (7.38):
$$\Delta T = -\int_{V_1}^{V_2} \frac{1}{C_V}\left[T\left(\frac{\partial P}{\partial T}\right)_V - P\right] dV. \tag{7.39}$$
Because $(\partial P/\partial T)_V = P/T$ for an ideal gas, we conclude that the temperature of an ideal gas is unchanged in a free expansion. If the gas is not dilute, we expect that the intermolecular interactions are important and that the temperature will change in a free expansion. In Chapter 8 we will discuss several ways of including the effects of the intermolecular interactions. For now we will be satisfied with a simple modification of the ideal gas equation of state due to van der Waals (see (2.13)):
$$\left(P + \frac{N^2}{V^2}\,a\right)(V - Nb) = NkT. \qquad \text{(van der Waals equation of state)} \tag{7.40}$$
Problem 7.3. Calculate $(\partial T/\partial V)_E$ for the van der Waals equation of state (7.40) and show that a free expansion results in cooling.
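To get a feeling for the size of the effect, the sketch below integrates (7.39) numerically for one mole of a van der Waals gas that doubles its volume. It assumes that the bracket in (7.38) equals $an^2/V^2$ for this equation of state (the result asked for in Problem 7.3), a monatomic heat capacity $C_V = 3nR/2$ that does not depend on volume, and an approximate argon-like value of the parameter `a` in molar units; all of these are illustrative assumptions rather than values taken from the text.

```python
R = 8.314            # gas constant, J/(mol K)
a = 0.1355           # Pa m^6 / mol^2, approximate argon-like value (assumed)
n = 1.0              # amount of gas in mol
C_V = 1.5 * n * R    # monatomic gas (assumed), independent of V

def joule_coefficient(V):
    """(dT/dV)_E from (7.38), assuming T (dP/dT)_V - P = a n^2 / V^2."""
    return -(a * n**2 / V**2) / C_V

def delta_T(V1, V2, steps=100_000):
    """Midpoint-rule evaluation of the integral in (7.39)."""
    h = (V2 - V1) / steps
    return sum(joule_coefficient(V1 + (i + 0.5) * h) for i in range(steps)) * h

V1, V2 = 1.0e-3, 2.0e-3          # expansion from 1 L to 2 L, in m^3
dT = delta_T(V1, V2)
# Closed-form integral of the same integrand, for comparison
dT_closed_form = -(a * n**2 / C_V) * (1.0 / V1 - 1.0 / V2)
print(dT, dT_closed_form)
```

With these numbers the gas cools by roughly 5 K. The sign is fixed by a > 0, that is, by the attractive part of the interaction.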
The physical reason for the cooling of a real gas during a free expansion can be understood as follows. The derivative $(\partial E/\partial V)_T$ depends only on the potential energy of the particles because the temperature is held constant. As shown in Figure 1.1, the intermolecular potential is repulsive for small separations r and is attractive for large r. For a dilute gas the mean separation between the particles is greater than $r_0 = 2^{1/6}\sigma$, the distance at which the potential is a minimum. As the volume increases, the mean separation between the molecules increases and hence the energy of interaction becomes less negative, that is, increases. Hence we conclude that $(\partial E/\partial V)_T$ is positive. Because the heat capacity is always positive, we find that $(\partial T/\partial V)_E$ is negative and that real gases always cool in a free expansion.
7.4.2 Joule-Thomson process
The Joule-Thomson (or Joule-Kelvin² or porous plug) process is a steady state flow process in which a gas is forced through a porous plug or expansion valve from a region of high pressure $P_1$ to a region of lower pressure $P_2$ (see Figure 7.2). The gas is thermally isolated from its surroundings. The process is irreversible because the gas is not in equilibrium. We will see that a real gas is either cooled or heated in passing through the plug.
Consider a given amount (for example, one mole) of a gas that occupies a volume $V_1$ at pressure $P_1$ on the left-hand side of the valve and a volume $V_2$ at pressure $P_2$ on the right-hand side. The work done on the gas is given by
$$W = -\int_{V_1}^{0} P\,dV - \int_{0}^{V_2} P\,dV. \tag{7.41}$$
The pressure on each side of the porous plug is constant, and hence we obtain
$$W = P_1V_1 - P_2V_2. \tag{7.42}$$
Because the process takes place in an isolated cylinder, there is no energy transfer due to heating, and the change in the internal energy is given by
$$\Delta E = E_2 - E_1 = W = P_1V_1 - P_2V_2. \tag{7.43}$$
Hence, we obtain
$$E_2 + P_2V_2 = E_1 + P_1V_1, \tag{7.44}$$
which can be written as
$$H_2 = H_1. \tag{7.45}$$
That is, the Joule-Thomson process occurs at constant enthalpy. All we can say is that the final enthalpy equals the initial enthalpy; the intermediate states of the gas are nonequilibrium states for which the enthalpy is not defined.
² William Thomson was later awarded a peerage and became Lord Kelvin.
Figure 7.2: Schematic representation of the Joule-Thomson process. The two pistons ensure constant pressures on each side of the porous plug. The porous plug can be made by packing glass wool into a pipe. The process can be made continuous by using a pump to return the gas from the region of low pressure to the region of high pressure.
The calculation of the temperature change in the Joule-Thomson effect is similar to our treatment of the Joule effect. Because the process occurs at constant enthalpy, it is useful to write
$$dH(T, P) = \left(\frac{\partial H}{\partial T}\right)_P dT + \left(\frac{\partial H}{\partial P}\right)_T dP = 0. \tag{7.46}$$
As before, we assume that the number of particles is a constant. From (7.46) we have
$$dT = -\frac{(\partial H/\partial P)_T}{(\partial H/\partial T)_P}\,dP. \tag{7.47}$$
From the relation, $dH = T\,dS + V\,dP$, we have $(\partial H/\partial P)_T = T(\partial S/\partial P)_T + V$. If we substitute this relation in (7.47), use the Maxwell relation (7.21), and the definition $C_P = (\partial H/\partial T)_P$, we obtain
$$\left(\frac{\partial T}{\partial P}\right)_H = \frac{V}{C_P}(T\alpha - 1), \tag{7.48}$$
where the thermal expansion coefficient α is defined by (7.2). Note that the change in pressure dP is negative, that is, the gas goes from a region of high pressure to a region of low pressure. To find the temperature change produced in a finite pressure drop, we integrate (7.48) and find
$$\Delta T = T_2 - T_1 = \int_{P_1}^{P_2} \frac{V}{C_P}(T\alpha - 1)\,dP. \tag{7.49}$$
For an ideal gas, α = 1/T and ∆T = 0 as expected.
To understand the nature of the temperature change in a real gas, we calculate α for the van der Waals equation of state (7.40). We write the latter in the form
$$P + a\rho^2 = \frac{\rho kT}{1 - b\rho}, \tag{7.50}$$
and take the derivative with respect to T at constant P:
$$2a\rho\left(\frac{\partial \rho}{\partial T}\right)_P = \frac{\rho k}{1 - b\rho} + \left(\frac{\partial \rho}{\partial T}\right)_P \frac{kT}{(1 - b\rho)^2}. \tag{7.51}$$
If we express α as
$$\alpha = -\frac{1}{\rho}\left(\frac{\partial \rho}{\partial T}\right)_P, \tag{7.52}$$
we can write (7.51) in the form:
$$\left[\frac{kT}{(1 - b\rho)^2} - 2a\rho\right]\alpha = \frac{k}{(1 - b\rho)}. \tag{7.53}$$
For simplicity, we consider only low densities in the following. In this limit we can write α as
$$\alpha = \frac{k(1 - b\rho)}{kT - 2a\rho(1 - b\rho)^2}, \tag{7.54a}$$
$$\approx \frac{1}{T}(1 - b\rho)\left[1 + 2a\beta\rho(1 - b\rho)^2\right], \tag{7.54b}$$
$$\approx \frac{1}{T}\left[1 - \rho(b - 2a\beta)\right]. \tag{7.54c}$$
From (7.54c) we obtain $(T\alpha - 1) = \rho(2a\beta - b)$ at low densities.
We can define an inversion temperature $T_i$ at which the derivative $(\partial T/\partial P)_H$ changes sign. From (7.54) and (7.48), we see that $kT_i = 2a/b$ for a low density gas. For $T > T_i$, the gas warms as the pressure falls in the Joule-Thomson expansion. However, for $T < T_i$, the gas cools as the pressure falls.
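The low density approximation and the sign change at $T_i$ can be illustrated numerically. The sketch below compares $T\alpha - 1$ computed from (7.54a) with the low density form $\rho(2a\beta - b)$ and evaluates $kT_i = 2a/b$. It works per mole, so k is replaced by the gas constant R and ρ is a molar density; the values of `a` and `b` are approximate nitrogen-like van der Waals parameters and are assumptions, not values from the text.

```python
R = 8.314        # gas constant, J/(mol K); plays the role of k per mole
a = 0.137        # Pa m^6 / mol^2, approximate nitrogen-like value (assumed)
b = 3.87e-5      # m^3 / mol, approximate nitrogen-like value (assumed)

def T_alpha_minus_1_exact(T, rho):
    """T*alpha - 1 with alpha taken from (7.54a); rho in mol/m^3."""
    return R * T * (1 - b * rho) / (R * T - 2 * a * rho * (1 - b * rho) ** 2) - 1

def T_alpha_minus_1_low_density(T, rho):
    """Low density form rho (2 a beta - b) that follows from (7.54c)."""
    return rho * (2 * a / (R * T) - b)

T_inv = 2 * a / (R * b)     # inversion temperature from k T_i = 2a/b
rho = 1.0                   # mol/m^3, a very dilute gas

for T in (300.0, 1000.0):
    print(T, T_alpha_minus_1_exact(T, rho), T_alpha_minus_1_low_density(T, rho))
print("T_i =", T_inv)
```

Below $T_i$ the quantity $T\alpha - 1$ is positive, so by (7.48) a drop in pressure lowers the temperature; above $T_i$ it is negative. The van der Waals value of $T_i$ obtained this way should be read as a rough estimate only.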
For most gases $T_i$ is well above room temperature. Although the cooling effect is small, the effect can be made cumulative by using the cooled expanded gas in a heat exchanger to precool the incoming gas.
7.5 Equilibrium Between Phases
Every substance can exist in qualitatively different forms, called phases. For example, most substances exist in the form of a gas, liquid, or a solid. The most familiar substance of this type is water, which exists in the form of water vapor, liquid water, and ice.³ The existence of different phases depends on the pressure and temperature, and the transition of one phase to another occurs at particular temperatures and pressures. For example, water is a liquid at room temperature and atmospheric pressure, but if it is cooled below 273.15 K, it eventually solidifies, and if heated above 373.15 K it vaporizes.⁴ At each of these temperatures, water undergoes dramatic changes in its properties, and we say that a phase transition occurs. The existence of distinct phases must be the result of the intermolecular interactions, yet these interactions are identical microscopically in all phases. Why is the effect of the interactions so different macroscopically? The answer is the existence of cooperative effects, which we discussed briefly in Section 5.5.1 and will discuss in more detail in Chapter 8.
³ All of the natural ice on earth is hexagonal, as manifested in six-cornered snow flakes. At lower temperatures and at pressures above about $10^8$ Pa, many other ice phases with different crystalline structures exist.
⁴ If you were to place a thermometer in perfectly pure boiling water, the thermometer would not read 100 °C. A few degrees of superheating is almost inevitable. Superheating and supercooling are discussed in Section xx.
7.5.1 Equilibrium conditions
Before we discuss the role of intermolecular interactions, we obtain the conditions for equilibrium
between two phases of a substance consisting of a single type of molecule. For example, the phases might be a solid and a liquid or a liquid and a gas. (We discuss mixtures of more than one substance in Section ??.) We know that as for any two bodies in thermodynamic equilibrium, the temperatures $T_1$ and $T_2$ of the two phases must be equal:
$$T_1 = T_2. \tag{7.55}$$
We also know that the pressure on the two phases must be equal,
$$P_1 = P_2, \tag{7.56}$$
because the forces exerted by the two phases on each other at their surface of contact must be equal and opposite.
We show in the following that because the number of particles $N_1$ and $N_2$ in each phase can vary, the chemical potentials of the two phases must be equal:
$$\mu_1 = \mu_2. \tag{7.57}$$
Because the temperatures and pressures are uniform, we can write (7.57) as
$$\mu_1(T, P) = \mu_2(T, P). \tag{7.58}$$
Note that because $\mu(T, P) = g(T, P)$, where g is the Gibbs free energy per particle, we can equivalently write the equilibrium condition (7.58) as
$$g_1(T, P) = g_2(T, P). \tag{7.59}$$
We now derive the equilibrium condition (7.58) for the chemical potential. Because T and P are well defined quantities for a system of two phases, the natural thermodynamic potential is the Gibbs free energy $G = E - TS + PV$. Let $N_i$ be the number of particles in phase i and $g_i(T, P)$ be the Gibbs free energy per particle in phase i. Then G can be written as
$$G = N_1g_1 + N_2g_2. \tag{7.60}$$
Conservation of matter implies that the total number of particles remains constant:
$$N = N_1 + N_2 = \text{constant}. \tag{7.61}$$
Suppose we let $N_1$ vary. Because G is a minimum in equilibrium, we have
$$dG = 0 = g_1\,dN_1 + g_2\,dN_2 = (g_1 - g_2)\,dN_1, \tag{7.62}$$
with $dN_2 = -dN_1$. Hence, we find that a necessary condition for equilibrium is
$$g_1(T, P) = g_2(T, P). \tag{7.63}$$
Figure 7.3: Derivation of the Clausius-Clapeyron equation. (Plot of P versus T showing the phase coexistence curve that separates phase 1 from phase 2, with nearby points a and b on the curve separated by ∆T and ∆P.)
7.5.2 Clausius-Clapeyron equation
Usually, the thermodynamics of a simple substance depends on two variables, for example, T and
P . However, if two phases of a substance are to coexist in equilibrium, then only one variable can
be chosen freely. For example, the pressure and temperature of a given amount of liquid water
may be chosen at will, but if liquid water is in equilibrium with its vapor, then the pressure of the
water equals the vapor pressure, which is a unique function of the temperature. If the pressure is
increased above the vapor pressure, the vapor will condense. If the pressure is decreased below the
vapor pressure, the liquid will evaporate.
In general, $g_i$ is a well-defined function that is characteristic of the particular phase i. If T
and P are such that g1 < g2 , then the minimum value of G corresponds to all N particles in phase
1 and G = N g1 . If T and P are such that g1 > g2 , then the minimum value of G corresponds to
all N particles in phase 2 so that G = N g2 . If T and P are such that g1 = g2 , then any number
N1 of particles in phase 1 can coexist in equilibrium with N2 = N − N1 of particles in phase 2.
The locus of points (T, P ) such that g1 = g2 is called the phase coexistence curve.
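The three cases above can be made concrete with a short sketch that minimizes $G = N_1g_1 + N_2g_2$, Eq. (7.60), over the allowed values of $N_1$ at fixed $N = N_1 + N_2$. The function name `equilibrium_N1` and the numerical values of $g_1$ and $g_2$ are hypothetical and serve only as an illustration.

```python
def equilibrium_N1(g1, g2, N):
    """Return the N1 in 0..N that minimizes G = N1*g1 + (N - N1)*g2."""
    return min(range(N + 1), key=lambda N1: N1 * g1 + (N - N1) * g2)

N = 100
# Hypothetical Gibbs free energies per particle, in arbitrary units.
all_in_phase_1 = equilibrium_N1(-1.2, -1.0, N)   # g1 < g2
all_in_phase_2 = equilibrium_N1(-1.0, -1.2, N)   # g1 > g2

# When g1 == g2, G does not depend on how the particles are divided.
G_values = {N1: N1 * (-1.0) + (N - N1) * (-1.0) for N1 in (0, 50, 100)}
print(all_in_phase_1, all_in_phase_2, G_values)
```

Because G is linear in $N_1$, the minimum always lies at an end point unless $g_1 = g_2$, in which case every division of the particles between the two phases gives the same G. This degeneracy is what allows the two phases to coexist.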
We now show that the equilibrium condition (7.59) leads to a diﬀerential equation for the slope
of the phase coexistence curve. Consider two points on the phase coexistence curve, for example,
point a at T, P and nearby point b at T + ∆T and P + ∆P (see Figure 7.3). The equilibrium
condition (7.59) implies that g1 (T, P ) = g2 (T, P ) and g1 (T + ∆T, P + ∆P ) = g2 (T + ∆T, P + ∆P ).
If we write $g(T + \Delta T, P + \Delta P) = g(T, P) + \Delta g$, we have
$$\Delta g_1 = \Delta g_2, \tag{7.64}$$
or using (2.139)
$$-s_1\Delta T + v_1\Delta P = -s_2\Delta T + v_2\Delta P. \tag{7.65}$$
Therefore,
$$\frac{\Delta P}{\Delta T} = \frac{s_2 - s_1}{v_2 - v_1} = \frac{\Delta s}{\Delta v}. \qquad \text{(Clausius-Clapeyron equation)} \tag{7.66}$$
The relation (7.66) is called the Clausius-Clapeyron equation. It relates the slope of the phase coexistence curve at the point T, P to the entropy change ∆s per particle and the volume change ∆v per particle when the curve is crossed at this point. For N particles we have $\Delta S = N\Delta s$ and $\Delta V = N\Delta v$, and hence (7.66) can be expressed as
$$\frac{dP}{dT} = \frac{\Delta S}{\Delta V}. \tag{7.67}$$
From the relation (7.25), we can write
$$T\left(\frac{\partial S}{\partial V}\right)_T = \left(\frac{\partial E}{\partial V}\right)_T + P. \tag{7.68}$$
At the phase coexistence curve for a given T and P, we can write
$$T\,\frac{S_2 - S_1}{V_2 - V_1} = \frac{E_2 - E_1}{V_2 - V_1} + P, \tag{7.69}$$
or
$$T(S_2 - S_1) = (E_2 - E_1) + P(V_2 - V_1). \tag{7.70}$$
Because the enthalpy $H = E + PV$, it follows that
$$L_{2\to 1} = T(S_2 - S_1) = H_2 - H_1. \tag{7.71}$$
The energy L required to melt a given amount of a solid is called the enthalpy of fusion.⁵ The enthalpy of fusion is related to the difference in entropies of the liquid and the solid phase and is given by
$$L_{\rm fusion} = H_{\rm liquid} - H_{\rm solid} = T(S_{\rm liquid} - S_{\rm solid}), \tag{7.72}$$
where T is the melting temperature at the given pressure. Similarly, the equilibrium of a vapor and liquid leads to the enthalpy of vaporization
$$\ell_{\rm vaporization} = h_{\rm vapor} - h_{\rm liquid}, \tag{7.73}$$
where h is the specific enthalpy. The enthalpy of sublimation associated with the equilibrium of vapor and solid is given by
$$\ell_{\rm sublimation} = h_{\rm vapor} - h_{\rm solid}. \tag{7.74}$$
We say that if there is a discontinuity in the entropy and the volume at the transition, the transition is discontinuous or first-order and $L = \Delta H = T\Delta S$. Thus the Clausius-Clapeyron equation can be expressed in the form
$$\frac{dP}{dT} = \frac{L}{T\Delta V} = \frac{\ell}{T\Delta v}. \tag{7.75}$$
⁵ The more familiar name is latent heat of fusion. As we discussed in Chapter 2, latent heat is an archaic term and is a relic from the time it was thought that there were two kinds of heat: sensible heat, the kind you can feel, and latent heat, the kind you cannot.
7.5.3 Simple phase diagrams
A typical phase diagram for a simple substance is shown in Figure 7.4a. The lines represent the
phase coexistence curves between the solid and liquid phases, the solid and vapor phases, and the liquid and vapor phases. The condition $g_1 = g_2 = g_3$ for the coexistence of all three phases leads to a unique temperature and pressure that defines the triple point. This unique property of the triple point makes the triple point of water a good choice for a readily reproducible temperature reference point. If we move along the liquid-gas coexistence curve toward higher temperatures, the two phases become more and more alike. At the critical point, the liquid-gas coexistence curve ends, and the volume change ∆V between a given amount of liquid and gas has approached zero. Beyond the critical point there is no distinction between a gas and a liquid, and there exists only a dense fluid phase. Note that a system can cross the phase boundary from its solid phase directly to its vapor without passing through a liquid, a transformation known as sublimation. An important commercial process that exploits this transformation is called freeze drying.
For most substances the slope of the solid-liquid coexistence curve is positive. The Clausius-Clapeyron equation shows that this positive slope is due to the fact that most substances expand on melting and therefore have ∆V > 0. Water is an important exception and contracts when it melts. Hence, for water the slope of the melting curve is negative (see Figure 7.4b).
melting
curve liquid
solid sublimation
curve triple
point critical
point
vapor pressure
curve
gas T
(a)
Figure 7.4: (a) Typical phase diagram of simple substances, for example, carbon dioxide. The
triple point of CO2 is at illustrates the more common forward slope of the melting point line.
Notice that the triple point of carbon dioxide is well above one atmosphere. Notice also that at
1 atm carbon dioxide can only be the solid or the gas. Liquid carbon dioxide does not exist at 1
atm. Dry ice (solid carbon dioxide) has a temperature of −78.5◦ at room pressure which is why
you can get a serious burn (actually frostbite) from holding it in your hands. (b) Phase diagram
of water which expands on freezing. [xx not done xx] Example 7.2. Why is the triplepoint temperature of water, Ttp = 273.16 K higher than the CHAPTER 7. THERMODYNAMIC RELATIONS AND PROCESSES 301 icepoint temperature, Tice = 273.15 K , especially given that at both temperatures ice and water
are in equilibrium?
Solution. The triple-point temperature $T_3$ is the temperature at which water vapor, liquid water, and ice are in equilibrium. At $T = T_3$, the vapor pressure of water equals the sublimation pressure of ice, which is equal to $P_3 = 611$ Pa. The ice point is defined as the temperature at which pure ice and air-saturated liquid water are in equilibrium under a total pressure of 1 atm $= 1.013 \times 10^5$ Pa. Hence, the triple-point temperature and the ice-point temperature differ for two reasons: the total pressure is different, and the liquid phase is not pure water.

Let us find the equilibrium temperature of ice and pure water when the pressure is increased from the triple point to a pressure of 1 atm. From (7.75), we have for liquid-solid equilibrium
$$\Delta T = \frac{T\,(v_{\rm liquid} - v_{\rm solid})}{\ell_{\rm fusion}}\,\Delta P. \tag{7.76}$$
Because the changes in $T$ and $P$ are very small, we can assume that all the terms in the coefficient of $\Delta P$ are constant. Let $T_{\rm ice}$ be the equilibrium temperature of ice and pure water. If we integrate the left-hand side of (7.76) from $T_3$ to $T_{\rm ice}$ and the right-hand side from $P_3$ to atmospheric pressure $P$, we obtain
$$T_{\rm ice} - T_3 = \frac{T\,(v_{\rm liquid} - v_{\rm solid})}{\ell_{\rm fusion}}\,(P - P_3). \tag{7.77}$$
To three significant figures, $T = 273$ K, $P - P_3 = 1.01 \times 10^5$ Pa, $v_{\rm solid} = 1.09 \times 10^{-3}$ m³/kg, $v_{\rm liquid} = 1.00 \times 10^{-3}$ m³/kg, and $\ell_{\rm fusion} = 3.34 \times 10^5$ J/kg. If we substitute these values into (7.77), we find $T_{\rm ice} - T_3 = -0.0075$ K. That is, the ice-point temperature of pure water is 0.0075 K below the temperature of the triple point. Hence, the effect of the dissolved air is to lower the temperature at which the liquid phase is in equilibrium with pure ice at atmospheric pressure by 0.0023 K below the equilibrium temperature for pure water.

7.5.4 Pressure dependence of the melting point

We consider the equilibrium between ice and water as an example of the pressure dependence of
the melting point. The enthalpy of fusion of water at 0 °C is
$$\ell_{\rm fusion} = 3.35 \times 10^5\ \text{J/kg}. \tag{7.78}$$
The specific volumes in the solid and liquid phases are
$$v_{\rm solid} = 1.09070 \times 10^{-3}\ \text{m}^3/\text{kg} \quad\text{and}\quad v_{\rm liquid} = 1.00013 \times 10^{-3}\ \text{m}^3/\text{kg}, \tag{7.79}$$
so that $\Delta v = v_{\rm liquid} - v_{\rm solid} = -0.0906 \times 10^{-3}$ m³/kg. If we substitute these values of $\ell$ and $\Delta v$ in (7.75), we find
$$\frac{dP}{dT} = -\frac{3.35 \times 10^5}{273.2 \times 9.06 \times 10^{-5}} = -1.35 \times 10^7\ \text{Pa/K}. \tag{7.80}$$
From (7.80) we see that an increase in pressure of $1.35 \times 10^7$ Pa or 133 atmospheres lowers the melting point by 1 °C.

The lowering of the melting point of ice under pressure is responsible for the motion of glaciers.
The deeper parts of a glacier melt under the weight of ice on top allowing the bottom of a glacier
to ﬂow. The bottom freezes again when the pressure decreases.
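
The arithmetic behind (7.80) can be checked with a few lines of Python. This is only a sketch: the numerical inputs are the values quoted in the text, and the function and constant names are introduced here for illustration.

```python
# Slope of the ice-water coexistence curve from the Clausius-Clapeyron
# equation (7.75), using the values quoted in the text.
ELL_FUSION = 3.35e5    # J/kg, enthalpy of fusion of water at 0 C, see (7.78)
V_SOLID = 1.09070e-3   # m^3/kg, specific volume of ice
V_LIQUID = 1.00013e-3  # m^3/kg, specific volume of liquid water
T_MELT = 273.2         # K
ATM = 1.013e5          # Pa per atmosphere


def coexistence_slope(ell, temperature, delta_v):
    """Return dP/dT = ell / (T * delta_v) in Pa/K."""
    return ell / (temperature * delta_v)


slope = coexistence_slope(ELL_FUSION, T_MELT, V_LIQUID - V_SOLID)
atm_per_kelvin = abs(slope) / ATM

print(f"dP/dT = {slope:.3g} Pa/K")
print(f"pressure needed to lower the melting point by 1 K: {atm_per_kelvin:.0f} atm")
```

The negative sign of the slope comes entirely from the sign of `V_LIQUID - V_SOLID`, which is the point of the discussion above.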
Some textbooks state that ice skaters are able to skate freely because the pressure of the ice skates lowers the melting point of the ice and allows ice skaters to skate on a thin film of water between the blade and the ice. As soon as the pressure is released, the water refreezes. From the above example we see that if the ice is at −1 °C, then the pressure due to the skates must be about 133 atmospheres for bulk melting to occur. However, even for extremely narrow skates and a large person, the skates do not exert enough pressure to cause this phenomenon. As an example, we take the contact area of the blades to be $10^{-4}$ m² and the mass of the skater to be 100 kg. Then the pressure is given by
$$P = \frac{F}{A} = \frac{mg}{A} \approx 10^7\ \text{Pa} \approx 100\ \text{atm}. \tag{7.81}$$
Given that on many winter days the temperature is more than a fraction of a degree below freezing, there must be a mechanism different from pressure-induced melting that is responsible for ice skating. And how do we explain the slide of a hockey puck, which has a large surface area and a small weight? The answer appears to be the existence of surface melting, that is, the existence of a layer of liquid water on the surface of ice that exists independently of the pressure of an ice skate (see the references).

7.5.5 Pressure dependence of the boiling point

Because $\Delta v$ is always positive for the transformation of liquid to gas, increasing the pressure on a
liquid always increases the boiling point. For water the enthalpy of vaporization is
$$\ell_{\rm vaporization} = 2.257 \times 10^6\ \text{J/kg}. \tag{7.82}$$
The specific volumes in the liquid and gas phases at $T = 373.15$ K and $P = 1$ atm are
$$v_{\rm liquid} = 1.043 \times 10^{-3}\ \text{m}^3/\text{kg} \quad\text{and}\quad v_{\rm gas} = 1.673\ \text{m}^3/\text{kg}. \tag{7.83}$$
Hence from (7.75) we have
$$\frac{dP}{dT} = \frac{2.257 \times 10^6}{373.15 \times 1.672} = 3.62 \times 10^3\ \text{Pa/K}. \tag{7.84}$$

7.5.6 The vapor pressure curve

The Clausius-Clapeyron equation for the vapor pressure curve can be approximated by neglecting
the specific volume of the liquid in comparison to the gas, $\Delta v = v_{\rm gas} - v_{\rm liquid} \approx v_{\rm gas}$. From (7.83) we see that for water at its normal boiling point, this approximation introduces an error of less than 0.1 percent. If we assume that the vapor behaves like an ideal gas, we have that $v_{\rm gas} = RT/P$ for one mole of the gas. With these approximations, the Clausius-Clapeyron equation can be written as
$$\frac{dP}{P} = \ell\,\frac{dT}{RT^2}. \tag{7.85}$$
If we also assume that $\ell$ is approximately temperature independent, we can integrate (7.85) to find
$$\ln P(T) = -\frac{\ell}{RT} + \text{constant}, \tag{7.86}$$
or
$$P(T) \approx P_0\, e^{-\ell/RT}, \tag{7.87}$$
where $P_0$ is a constant.
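
As an illustration of (7.87), the following sketch estimates the vapor pressure of water a few kelvin below its normal boiling point. It fixes the constant $P_0$ by requiring $P = 1$ atm at 373.15 K and converts the enthalpy of vaporization in (7.82) to a molar value. The molar mass of water and the value of $R$ are assumptions supplied here, not values taken from the text, and the function name is hypothetical.

```python
import math

R = 8.314                    # J/(mol K), molar gas constant (assumed value)
MOLAR_MASS_WATER = 0.018015  # kg/mol (assumed, not given in the text)
ELL_PER_KG = 2.257e6         # J/kg, enthalpy of vaporization from (7.82)
ELL = ELL_PER_KG * MOLAR_MASS_WATER  # J/mol
T_REF = 373.15               # K, normal boiling point
P_REF = 1.013e5              # Pa, 1 atm


def vapor_pressure(temperature, ell=ELL, t_ref=T_REF, p_ref=P_REF):
    """Integrated Clausius-Clapeyron form (7.87), normalized at (t_ref, p_ref)."""
    return p_ref * math.exp(-(ell / R) * (1.0 / temperature - 1.0 / t_ref))


# Vapor pressure 10 K below the normal boiling point.
p_363 = vapor_pressure(363.15)

# Slope dP/dT at the reference point in the ideal gas approximation (7.85).
slope_ideal = ELL * P_REF / (R * T_REF**2)

print(f"P(363.15 K) = {p_363:.3g} Pa")
print(f"dP/dT at 373.15 K = {slope_ideal:.3g} Pa/K")
```

The slope obtained this way differs by a couple of percent from (7.84) because the ideal gas value of $v_{\rm gas}$ is slightly larger than the measured value in (7.83).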
Example 7.3. In the vicinity of the triple point the liquid-vapor coexistence curve of liquid ammonia can be represented by $\ln P = 24.38 - 3063/T$, where the pressure is given in pascals. The vapor pressure of solid ammonia is $\ln P = 27.92 - 3754/T$. What are the temperature and pressure at the triple point? What are the enthalpies of sublimation and vaporization? What is the enthalpy of fusion at the triple point?

Solution. At the triple point, $P_{\rm solid} = P_{\rm liquid}$, or $24.38 - 3063/T = 27.92 - 3754/T$. The solution is $T = 691/3.54 = 195.2$ K. The corresponding value of $\ln P$ is 8.69, so that $P \approx 5.9 \times 10^3$ Pa. The relation (7.86), $\ln P = -\ell/RT + \text{constant}$, can be used to find the enthalpy of sublimation and vaporization of ammonia at the triple point. We have $\ell_{\rm sublimation} = 3754R = 3.12 \times 10^4$ J/mol and $\ell_{\rm vaporization} = 3063R = 2.55 \times 10^4$ J/mol. The enthalpy of melting satisfies the relation $\ell_{\rm sublimation} = \ell_{\rm vaporization} + \ell_{\rm fusion}$. Hence, $\ell_{\rm fusion} = (3754 - 3063)R = 5.74 \times 10^3$ J/mol.

7.6 Vocabulary

Maxwell relations
free expansion, Joule-Thomson process
phase coexistence curve, phase diagram
triple point, critical point
Clausius-Clapeyron equation
enthalpy of fusion, vaporization, and sublimation

Additional Problems

Problems   page
7.1        290
7.2        291
7.3        294

Table 7.1: Listing of inline problems.
Problem 7.4. Show that the three enthalpy differences are not independent, but are related by
$$\ell_{\rm fusion} + \ell_{\rm vaporization} = \ell_{\rm sublimation}. \tag{7.88}$$
Interpret this relation in physical terms.

Problem 7.5. Show that
$$\left(\frac{\partial C_P}{\partial P}\right)_T = -T\left(\frac{\partial^2 V}{\partial T^2}\right)_P, \tag{7.89}$$
and
$$\left(\frac{\partial C_V}{\partial V}\right)_T = T\left(\frac{\partial^2 P}{\partial T^2}\right)_V. \tag{7.90}$$

Problem 7.6. Show that
$$\frac{K_T}{K_S} = \frac{C_P}{C_V}, \tag{7.91}$$
where
$$K_T = -\frac{1}{V}\left(\frac{\partial V}{\partial P}\right)_T, \tag{7.92a}$$
$$K_S = -\frac{1}{V}\left(\frac{\partial V}{\partial P}\right)_S. \tag{7.92b}$$
$K_S$ is the adiabatic compressibility. Use (7.92b) and (7.34) to obtain the relation
$$K_T - K_S = \frac{TV}{C_P}\,\alpha^2. \tag{7.93}$$

Problem 7.7. The inversion temperature for the Joule-Thomson effect is determined by the relation $(\partial T/\partial V)_P = T/V$ (see (7.48)). In Section 7.4.2 we showed that for low densities and high temperatures (low pressures) the inversion temperature is given by $kT_{\rm inv} = 2a/b$. Show that at high pressures, $T_{\rm inv}$ is given by
$$kT_{\rm inv} = \frac{2a}{9b}\left(2 \pm \sqrt{1 - 3b^2P/a}\right)^2. \tag{7.94}$$
Show that as $P \to 0$, $kT_{\rm inv} = 2a/b$. For $P < a/3b^2$, there are two inversion points between which the derivative $(\partial T/\partial P)_H$ is positive. Outside this temperature interval the derivative is negative. For $P > a/3b^2$ there are no inversion points and $(\partial T/\partial P)_H$ is negative everywhere. Find the pressure dependence of the inversion temperature for the Joule-Thomson effect.
Problem 7.8. Use the result (7.84) to estimate the boiling temperature of water at the height of
the highest mountain in your geographical region.
Problem 7.9. A particular liquid boils at 127 °C at a pressure of $1.06 \times 10^5$ Pa. Its enthalpy of vaporization is 5000 J/mol. At what temperature will it boil if the pressure is raised to $1.08 \times 10^5$ Pa?

Problem 7.10. A particular liquid boils at a temperature of 105 °C at the bottom of a hill and at 95 °C at the top of the hill. The enthalpy of vaporization is 1000 J/mol. What is the approximate height of the hill?

Suggestions for Further Reading

David Lind and Scott P. Sanders, The Physics of Skiing: Skiing at the Triple Point, Springer (2004). See Technote 1 for a discussion of the thermodynamics of phase changes.

Daniel F. Styer, “A thermodynamic derivative means an experiment,” Am. J. Phys. 67, 1094–1095 (1999).

James D. White, “The role of surface melting in ice skating,” Phys. Teacher 30, 495 (1992).

Chapter 8

Theories of Classical Gases and Liquids

© 2005 by Harvey Gould and Jan Tobochnik
15 November 2005

8.1 Introduction

Because there are few problems in statistical physics that can be solved exactly, we need to develop
techniques for obtaining approximate solutions. In this chapter we introduce perturbation methods
that are applicable whenever there is a small expansion parameter, for example, low density. As
an introduction to the nature of many-body perturbation theory, we first consider the classical
monatomic gas. The discussion in Section 8.4 involves many of the considerations and diﬃculties
encountered in quantum ﬁeld theory. For example, we will introduce diagrams that are analogous to
Feynman diagrams and ﬁnd divergences analogous to those found in quantum electrodynamics. We
also will derive what is known as the linked cluster expansion, a derivation that is straightforward
in comparison to its quantum counterpart.

8.2 The Free Energy of an Interacting System

Consider a gas of $N$ identical particles of mass $m$ at density $\rho = N/V$ and temperature $T$. If we make the assumption that the total potential energy $U$ is a sum of two-body interactions $u(|\mathbf{r}_i - \mathbf{r}_j|) = u_{ij}$, we can write $U$ as
$$U = \sum_{i<j}^{N} u_{ij}. \tag{8.1}$$

Figure 8.1: The Lennard-Jones potential $u(r)$. Note that $u(r) = 0$ for $r = \sigma$ and the depth of the well is $-\epsilon$.

The simplest interaction is the hard sphere interaction
$$u(r) = \begin{cases} \infty & r < \sigma \\ 0 & r > \sigma. \end{cases} \tag{8.2}$$
This interaction has no attractive part and is often used in model calculations of liquids. A more realistic interaction is the semiempirical Lennard-Jones potential (see Figure 8.1):$^1$
$$u(r) = 4\epsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]. \tag{8.3}$$
Both the hard sphere interaction and the Lennard-Jones potential are short-ranged, that is, $u(r) \approx 0$ for $r$ greater than a separation $r_0$.
The Lennard-Jones potential is the most common form of the interaction that is used to qualitatively represent the behavior of a typical intermolecular potential. The existence of many calculations and simulation results for the Lennard-Jones potential encourages us to use it even though there are more accurate forms of the interparticle potential for real fluids. The attractive $1/r^6$ contribution is well justified theoretically and is due to the induced dipole-dipole interaction of two atoms. Although each atom is electrically neutral, the instantaneous fluctuations in the charge distribution can have nonspherical symmetry. The resulting dipole in one atom can induce a dipole moment in the other atom. The cause of the repulsive interaction between atoms at small separations is much different. At small $r$, the electron distributions of the two atoms distort to avoid spatial overlap not allowed by the Pauli exclusion principle. The distortion of the electron distributions causes the energy of the atoms to increase, thus leading to a repulsion between the atoms. However, the $1/r^{12}$ form of the repulsive potential is chosen only for convenience.
1 This form of the potential was first proposed by John Edward Lennard-Jones in 1924.

Problem 8.1. Show that the minimum of the Lennard-Jones potential is at $r_0 = 2^{1/6}\sigma$ and that $u(r_0) = -\epsilon$. At what value of $r$ is $u(r)$ a minimum?
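
The result stated in Problem 8.1 can be checked numerically before it is derived. The sketch below works in reduced units ($\sigma = \epsilon = 1$) and locates the minimum of (8.3) with a golden-section search. The search routine and the bracketing interval $[0.9, 2.0]$ are choices made here for illustration, not part of the text.

```python
import math


def lennard_jones(r, sigma=1.0, epsilon=1.0):
    """Lennard-Jones potential (8.3): 4*eps*[(sigma/r)**12 - (sigma/r)**6]."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def golden_section_minimum(f, a, b, tol=1e-10):
    """Locate the minimum of a function with a single minimum on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while abs(b - a) > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return 0.5 * (a + b)


r_min = golden_section_minimum(lennard_jones, 0.9, 2.0)
u_min = lennard_jones(r_min)

print(f"numerical minimum at r = {r_min:.6f} sigma")
print(f"2**(1/6)               = {2.0 ** (1.0 / 6.0):.6f}")
print(f"u at the minimum       = {u_min:.6f} epsilon")
```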
We assume that the condition $\lambda \ll n^{-1/3}$ is satisfied so that the system can be treated semiclassically. In this limit the partition function for $N$ identical particles is given by
$$Z_N = \frac{1}{N!\,h^{3N}} \int d^{3N}p\, d^{3N}r\, e^{-\beta K} e^{-\beta U}, \tag{8.4}$$
where the kinetic energy $K = \sum_i p_i^2/2m$. The partition function for an ideal classical gas of $N$ particles, $Z_{\rm ideal}$, is given by
$$Z_{\rm ideal} = \frac{1}{N!\,h^{3N}}\, V^N \int d^{3N}p\, e^{-\beta K}. \tag{8.5}$$
We have already seen in Section 6.8 that $Z_{\rm ideal}$ can be evaluated exactly. The corresponding free energy of a classical ideal gas is $F_{\rm ideal} = -kT \ln Z_{\rm ideal}$.
Because the potential energy does not depend on the momenta of the particles, the kinetic energy part of the integrand in (8.4) can be integrated separately, and $Z_N$ can be written in the form
$$\frac{Z_N}{Z_{\rm ideal}} = \frac{1}{V^N}\int d^3r_1\, d^3r_2 \ldots d^3r_N\, e^{-\beta U}. \tag{8.6}$$
We adopt the notation $\langle \cdots \rangle_0$ to denote an average over the microstates of the ideal gas. That is, each particle in (8.6) has a probability $d^3r/V$ of being in a volume $d^3r$. Using this notation we write
$$\frac{Z_N}{Z_{\rm ideal}} = \langle e^{-\beta U} \rangle_0. \tag{8.7}$$
The contribution to the free energy from the correlations between the particles due to their interactions, $F_c$, has the form
$$F_c = -kT \ln \frac{Z_N}{Z_{\rm ideal}} = -kT \ln \langle e^{-\beta U} \rangle_0. \tag{8.8}$$
We see that the calculation of the free energy of a classical system of interacting particles can be reduced to the evaluation of the ensemble average in (8.8). The evaluation of $F_c$ for a classical system of interacting particles is the overall goal of this chapter.
Because we do not expect to be able to calculate $F_c$ exactly for arbitrary densities, we first seek an approximation for $F_c$ for low densities where we expect that the interactions between the particles are not too important. We know that the ideal gas equation of state, $PV/NkT = 1$, is a good approximation to a gas in the dilute limit where the intermolecular interactions can be ignored. If the interactions are short-range, we assume that we can make an expansion of the pressure in powers of the density. This expansion is known as the virial expansion and is written as
$$\frac{PV}{NkT} = 1 + \rho B_2(T) + \rho^2 B_3(T) + \ldots \tag{8.9}$$
The quantities $B_n$ are known as virial coefficients and involve the interaction of $n$ particles. The first four virial coefficients are given by the integrals ($B_1 = 1$)
$$B_2(T) = -\frac{1}{2V}\int d^3r_1\, d^3r_2\, f_{12}, \tag{8.10a}$$
$$B_3(T) = -\frac{1}{3V}\int d^3r_1\, d^3r_2\, d^3r_3\, f_{12} f_{13} f_{23}, \tag{8.10b}$$
$$B_4(T) = -\frac{1}{8V}\int d^3r_1\, d^3r_2\, d^3r_3\, d^3r_4\, \big[3 f_{12} f_{23} f_{34} f_{41} + 6 f_{12} f_{23} f_{34} f_{41} f_{13} + f_{12} f_{23} f_{34} f_{41} f_{13} f_{24}\big], \tag{8.10c}$$
where $f_{ij} = f(|\mathbf{r}_i - \mathbf{r}_j|)$, and
$$f(r) = e^{-\beta u(r)} - 1. \tag{8.11}$$
The function $f(r)$ defined in (8.11) is known as the Mayer f function.
Problem 8.2. The density expansion of the free energy $F_c$ is usually written as
$$-\frac{\beta F_c}{N} = \sum_{p=1}^{\infty} \frac{b_p\,\rho^p}{p+1}, \tag{8.12}$$
where the $b_p$ are known as cluster integrals. Use the thermodynamic relation between the pressure and the free energy to show that $B_n$ and $b_{n-1}$ are related by
$$B_n = -\frac{n-1}{n}\, b_{n-1}. \tag{8.13}$$

Problem 8.3. Plot the Mayer function $f(r)$ for a hard sphere interaction and a Lennard-Jones potential. Does $f(r)$ depend on $T$ for hard spheres?

8.3 Second Virial Coefficient

We can find the form of the second virial coefficient $B_2$ by simple considerations. For simplicity,
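we first consider the partition function for $N = 2$ particles:
$$\frac{Z_2}{Z_{\rm ideal}} = \frac{1}{V^2}\int d^3r_1\, d^3r_2\, e^{-\beta u_{12}}. \tag{8.14}$$
We can simplify the integrals in (8.14) by choosing particle 1 as the origin and specifying particle 2 relative to particle 1. This choice of coordinates gives a factor of $V$ because particle 1 can be anywhere in the box. Hence, we can write (8.14) as
$$\frac{Z_2}{Z_{\rm ideal}} = \frac{1}{V}\int d^3r\, e^{-\beta u(r)}, \tag{8.15}$$
where $\mathbf{r} = \mathbf{r}_2 - \mathbf{r}_1$ and $r = |\mathbf{r}|$. The function $e^{-\beta u(r)}$ that appears in the integrand for $Z_2$ has the undesirable property that it approaches unity rather than zero as $r \to \infty$. Because we want to obtain an expansion in the density, we want to write the integrand in (8.15) in terms of a function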
undesirable property that it approaches unity rather than zero as r → ∞. Because we want to CHAPTER 8. CLASSICAL GASES AND LIQUIDS 310 obtain an expansion in the density, we want to write the integrand in (8.15) in terms of a function
of r which is signiﬁcant only if two particles are close to each other. Such a function is the Mayer
function f (r) deﬁned in (8.11). We write e−βu(r) = 1 + f (r) and express (8.15) as
Z2
1
=
Zideal
V d3 r [1 + f (r)]. (8.16) Note that f (r) → 0 for suﬃciently large r. The ﬁrst term in the integrand in (8.16) corresponds
to no interactions and the second term corresponds to the second virial coeﬃcient B2 deﬁned in
(8.10a). To see this correspondence, we choose particle 1 as the origin as before, and rewrite (8.10a)
for B2 as
1
B2 = −
d3 rf (r).
(8.17)
2
If we compare the form (8.16) and (8.17), we see that we can express Z2 /Zideal in terms of B2 :
2
Z2
= 1 − B2 .
Zideal
V (8.18) We next evaluate ZN for N = 3 particles. We have
$$\frac{Z_3}{Z_{\rm ideal}} = \frac{1}{V^3}\int d^3r_1\, d^3r_2\, d^3r_3\, e^{-\beta \sum_{i<j} u_{ij}} \tag{8.19}$$
$$= \frac{1}{V^3}\int d^3r_1\, d^3r_2\, d^3r_3 \prod_{i<j}(1 + f_{ij}) \tag{8.20}$$
$$= \frac{1}{V^3}\int d^3r_1\, d^3r_2\, d^3r_3\, (1 + f_{12})(1 + f_{13})(1 + f_{23}) \tag{8.21}$$
$$= \frac{1}{V^3}\int d^3r_1\, d^3r_2\, d^3r_3\, \big[1 + (f_{12} + f_{13} + f_{23}) + (f_{12} f_{13} + f_{12} f_{23} + f_{13} f_{23}) + f_{12} f_{13} f_{23}\big]. \tag{8.22}$$
We see that we can group terms of products of $f$'s. If we neglect all but the first term in the expansion (8.22), we recover the ideal gas result $Z/Z_{\rm ideal} = 1$. It is plausible that in the limit of low density, only the second term in (8.22), the sum involving pairs of particles, is important, and we can ignore the remaining terms involving products of two and three $f$'s. Because the three terms $f_{12}$, $f_{13}$, and $f_{23}$ give the same contribution, we have
$$\frac{Z_3}{Z_{\rm ideal}} \approx 1 + \frac{3}{V}\int d^3r\, f(r) = 1 - \frac{6}{V}\, B_2. \tag{8.23}$$
We will see in Section 8.5 that the term in (8.22) involving a product of three $f$'s is related to the third virial coefficient.
For arbitrary $N$ we write
$$\frac{Z_N}{Z_{\rm ideal}} = \frac{1}{V^N}\int d^3r_1\, d^3r_2 \ldots d^3r_N\, e^{-\beta \sum_{i<j} u_{ij}} = \frac{1}{V^N}\int d^3r_1\, d^3r_2 \ldots d^3r_N \prod_{i<j}(1 + f_{ij}). \tag{8.24}$$
We have
$$\prod_{i<j}(1 + f_{ij}) = 1 + \sum_{k<l} f_{kl} + \sum_{k<l,\, m<n} f_{kl} f_{mn} + \ldots \tag{8.25}$$
We keep only the ideal gas contribution and the terms involving pairs of particles, and we ignore the remaining terms involving products of two or more $f$'s. There are a total of $\frac{1}{2}N(N-1)$ terms in the sum $\sum f_{kl}$ corresponding to the number of ways of choosing pairs of particles. These terms are all equal because they differ only in the way the variables of integration are labeled. Hence, we can write the integral of the first sum in (8.25) as
$$\frac{1}{V^N}\int d^3r_1 \ldots d^3r_N \sum_{k<l} f_{kl} = \frac{1}{V^N}\,\frac{N(N-1)}{2}\int d^3r_1 \ldots d^3r_N\, f(r_{12}). \tag{8.26}$$
The integration with respect to $\mathbf{r}_3 \ldots \mathbf{r}_N$ over the volume of the system gives a factor of $V^{N-2}$. As before, we can simplify the remaining integration over $\mathbf{r}_1$ and $\mathbf{r}_2$ by choosing particle 1 as the origin and specifying particle 2 relative to particle 1. Hence, we can write the right-hand side of (8.26) as
$$\frac{N(N-1)}{2V}\int d^3r\, f(r) \to \frac{N^2}{2V}\int d^3r\, f(r), \tag{8.27}$$
where we have replaced $N - 1$ by $N$ because $N \gg 1$. If we use (8.17) to identify the integral in (8.27) with $B_2$, we see that we can write
$$\frac{Z_N}{Z_{\rm ideal}} \approx 1 - N\rho B_2. \tag{8.28}$$
If the interparticle potential $u(r) \approx 0$ for $r > r_0$, then $f(r)$ differs from zero only for $r < r_0$ and the integral $B_2$ is bounded and is of order $r_0^3$ in three dimensions (see Problem 8.4). Hence $B_2$ is independent of $V$ and is an intensive quantity. However, this well-behaved nature of $B_2$ implies that the second term in (8.28) is proportional to $N$, and in the limit $N \to \infty$ this term is larger than the first, which is not a good omen for a perturbation theory.

The reason we have obtained this apparent divergence in $Z_N/Z_{\rm ideal}$ is that we have calculated the wrong quantity. The quantity of physical interest is $F$ rather than $Z$. Because $F$ is proportional to $N$, it follows from the relation $F \propto \ln Z$ that $Z_N$ must depend on the $N$th power of an intensive quantity. Thus $Z$ must have the form
$$\frac{Z_N}{Z_{\rm ideal}} = \left(1 - \rho B_2 + \ldots\right)^N, \tag{8.29}$$
so that $F$ will be proportional to $N$. Hence, the leading contribution to $Z/Z_{\rm ideal}$ should be written as
$$\frac{Z_N}{Z_{\rm ideal}} = 1 - N\rho B_2 \to (1 - \rho B_2)^N. \tag{8.30}$$
The corresponding free energy is given by
$$F = F_{\rm ideal} - NkT \ln(1 - \rho B_2) \approx F_{\rm ideal} + NkT\rho B_2, \tag{8.31}$$
where we have used the fact that $\ln(1 + x) \approx x$ for $x \ll 1$. The corresponding equation of state is given by
$$\frac{PV}{NkT} = 1 + \rho B_2. \tag{8.32}$$
The second term in (8.32) represents the first density correction to the ideal gas equation of state. The properties of $B_2$ and the approximate equation of state (8.32) are explored in Problems 8.4–8.7.
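
The step from (8.31) to (8.32) can be spelled out as follows. This is a sketch that uses the standard thermodynamic relation $P = -(\partial F/\partial V)_{T,N}$, the ideal gas result $P_{\rm ideal} = NkT/V$, and the fact that $B_2$ depends only on $T$.

```latex
\begin{align*}
F &\approx F_{\rm ideal} + NkT\rho B_2
   = F_{\rm ideal} + \frac{N^2 kT\, B_2(T)}{V}, \\
P &= -\left(\frac{\partial F}{\partial V}\right)_{T,N}
   = \frac{NkT}{V} + \frac{N^2 kT\, B_2}{V^2}
   = \rho kT\,(1 + \rho B_2), \\
\frac{PV}{NkT} &= 1 + \rho B_2 .
\end{align*}
```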
Problem 8.4. To calculate $B_2$ we need to first perform the angular integrations in (8.17). Show that because $u(r)$ depends only on $r$, $B_2$ can be written as
$$B_2(T) = -\frac{1}{2}\int d^3r\, f(r) = 2\pi \int_0^\infty dr\, r^2\, (1 - e^{-\beta u(r)}). \tag{8.33}$$
Show that $B_2 = 2\pi\sigma^3/3$ for a system of hard spheres of diameter $\sigma$.
Problem 8.5. Qualitative temperature behavior of $B_2(T)$. Suppose that $u(r)$ has the qualitative behavior shown in Figure 8.1. Take $r_0$ to be the value of $r$ at which $u(r)$ is a minimum (see Problem 8.1). Interpret $r_0$ in terms of the effective diameter of the atoms.

For high temperatures ($kT \gg \epsilon$), $u(r)/kT \ll 1$ for $r > r_0$. Explain why the value of the integral in (8.33) is determined by the contributions from the integrand for $r < r_0$ where $u(r)/kT$ is large and positive. Show that in this limit $B_2(T) \approx b$, where $b = 2\pi r_0^3/3$. What is the interpretation of the parameter $b$?

For low temperatures ($kT \ll \epsilon$), the dominant contributions to the integral are determined by the contributions from the integrand for $r > r_0$. What is the sign of $u(r)$ for $r > r_0$? What is the sign of $B_2(T)$ at low $T$? Show that in this limit $B_2 \approx -a/kT$, where
$$a = -2\pi \int_{r_0}^\infty u(r)\, r^2\, dr. \tag{8.34}$$
Show that this reasoning implies that $B_2$ can be written in the approximate form
$$B_2 = b - \frac{a}{kT}, \tag{8.35}$$
where $b = 2\pi r_0^3/3$ and $a$ is given by (8.34). Why does $B_2(T)$ pass through zero at some intermediate temperature? The temperature at which $B_2(T) = 0$ is known as the Boyle temperature.

Problem 8.6. Use a simple numerical method such as Simpson's rule (see Appendix Bxx) to determine the $T$ dependence of $B_2$ for the Lennard-Jones potential. At what temperature does $B_2$ vanish? How does this temperature compare with that predicted by (8.35)? Compare your numerical result with the approximate result found in Problem 8.5.
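
One possible starting point for Problem 8.6 is sketched below. It evaluates (8.33) for the Lennard-Jones potential with the composite Simpson's rule in reduced units ($r$ in units of $\sigma$, $T^* = kT/\epsilon$, $B_2$ in units of $\sigma^3$) and then bisects for the temperature at which $B_2$ changes sign. The cutoff `r_max`, the number of intervals, the bisection bracket, and all function names are assumptions made here for illustration; they are not taken from the text or from its appendix.

```python
import math


def lj_over_eps(r):
    """Lennard-Jones potential u/epsilon with r in units of sigma, see (8.3)."""
    s6 = 1.0 / r**6
    return 4.0 * (s6 * s6 - s6)


def b2_integrand(r, t_star):
    """Integrand r^2 (1 - exp(-u/kT)) of (8.33) in reduced units."""
    if r == 0.0:
        return 0.0  # inside the core the bracket equals 1 and r^2 vanishes
    return r * r * (1.0 - math.exp(-lj_over_eps(r) / t_star))


def simpson(f, a, b, n):
    """Composite Simpson's rule with n (even) intervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def b2_reduced(t_star, r_max=50.0, n=20000):
    """B2 / sigma^3 for the Lennard-Jones potential at T* = kT/epsilon."""
    return 2.0 * math.pi * simpson(lambda r: b2_integrand(r, t_star), 0.0, r_max, n)


def boyle_temperature(lo=2.0, hi=5.0, tol=1e-5):
    """Bisect for the reduced temperature at which B2 changes sign."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if b2_reduced(mid) < 0.0:
            lo = mid  # B2 still negative: the zero lies at higher T*
        else:
            hi = mid
    return 0.5 * (lo + hi)


b2_at_1 = b2_reduced(1.0)
t_boyle = boyle_temperature()
print(f"B2(T*=1)/sigma^3 = {b2_at_1:.3f}")
print(f"Boyle temperature T* = {t_boyle:.3f}")
```

Truncating the integral at `r_max` neglects a tail of order $1/(T^* r_{\rm max}^3)$, which is small for the cutoff chosen here; a larger cutoff or an analytic tail correction would reduce it further.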
Problem 8.7. Inversion temperature of argon. In Section 7.4.2 we discussed the porous plug or Joule-Thomson process in which a gas is forced from a high pressure chamber through a porous plug into a lower pressure chamber. The process occurs at constant enthalpy and the change in temperature of the gas is given by $dT = (\partial T/\partial P)_H\, dP$ for small changes in pressure. We know that (see (7.48))
$$\left(\frac{\partial T}{\partial P}\right)_H = \frac{1}{C_P}\left[T\left(\frac{\partial V}{\partial T}\right)_{P,N} - V\right]. \tag{8.36}$$
The locus of points $(\partial T/\partial P)_H = 0$ is called the Joule-Thomson inversion curve. Assume the approximate equation of state $V = NkT/P + NB_2$ and use your numerical results for $B_2$ from Problem 8.6 for the Lennard-Jones potential to compute the inversion temperature at which the inversion curve is a maximum. Use $\sigma = 3.4$ Å and $\epsilon/k = 120$ K and compare your result with the experimental value of 780 K.
Problem 8.8. Assume that $u(r)$ has the form $u(r) \sim r^{-n}$ for large $r$. What is the condition on $n$ such that the integral in (8.33) for $B_2$ exists? Why is it plausible that the density expansion (8.9) is not applicable to a system of particles with an interaction proportional to $1/r$ (a Coulomb system)?

*Problem 8.9. Show that if the last term in (8.22) is neglected, $Z_3/Z_{\rm ideal} = 1 - 6B_2/V + 12B_2^2/V^2 \approx (1 - 2\rho B_2)^3$, where $\rho = 3/V$. Also show that the last term in (8.22) is related to $B_3$.
In Problem 8.5 we found that $B_2$ can be written in the approximate form (8.35). This approximate form of $B_2$ allows us to write the equation of state as (see (8.32))
$$PV = NkT\left[1 + \rho b - \frac{\rho a}{kT}\right],$$
or
$$P \approx \rho kT\left[\frac{1}{1 - \rho b} - \frac{\rho a}{kT}\right] = \frac{\rho kT}{1 - \rho b} - \rho^2 a. \tag{8.37}$$
Note that we have made the approximation $1 + \rho b \approx 1/(1 - \rho b)$, which is consistent with our assumption that $\rho b \ll 1$. The approximate equation of state in (8.37) is known as the van der Waals equation of state as discussed on page 33. Note that the parameter $a$ takes into account the long-range attraction of the molecules and the parameter $b$ takes into account their short-range repulsion. A more systematic derivation of the van der Waals equation of state will be given in Section 8.9.1.

8.4 Cumulant Expansion

In Section 8.3 we had to make some awkward assumptions to obtain the form of $B_2$ from an expansion of $Z/Z_{\rm ideal}$. To find the form of the higher order virial coefficients, we introduce what is known as the cumulant expansion. In Section 8.5 we apply this formalism to obtain an expansion of $\ln Z/Z_{\rm ideal}$ in powers of $\beta$. Finally, we rearrange this expansion so that it becomes an expansion of $\ln Z/Z_{\rm ideal}$ in powers of $\rho$.
The form (8.8) for $F_c$ is similar to that frequently encountered in probability theory. We introduce the function $\phi(t)$ defined as
$$\phi(t) \equiv \langle e^{tx} \rangle, \tag{8.38}$$
where the random variable $x$ occurs according to the probability distribution $p(x)$, that is, the average denoted by $\langle \ldots \rangle$ is over $p(x)$. The function $\phi(t)$ is an example of a moment generating function because a power series expansion in $t$ yields
$$\phi(t) = \Big\langle 1 + tx + \frac{1}{2!}\, t^2 x^2 + \cdots \Big\rangle \tag{8.39}$$
$$= 1 + t\langle x \rangle + \frac{t^2}{2!}\langle x^2 \rangle + \cdots = \sum_{n=0}^{\infty} \frac{t^n \langle x^n \rangle}{n!}. \tag{8.40}$$
In the present case the physical quantity of interest is proportional to $\ln Z$, so we will want to consider the series expansion of $\ln \phi$ rather than $\phi$. (The correspondence is $t \to -\beta$ and $x \to U$.) The series expansion of $\ln \phi(t)$ can be written in the form
$$\ln \phi = \ln \langle e^{tx} \rangle = \sum_{n=1}^{\infty} \frac{t^n M_n(x)}{n!}, \tag{8.41}$$
where the coefficients $M_n$ are known as cumulants or semi-invariants. The first four cumulants are
$$M_1 = \langle x \rangle, \tag{8.42a}$$
$$M_2 = \langle x^2 \rangle - \langle x \rangle^2, \tag{8.42b}$$
$$M_3 = \langle x^3 \rangle - 3\langle x^2 \rangle \langle x \rangle + 2\langle x \rangle^3, \tag{8.42c}$$
$$M_4 = \langle x^4 \rangle - 4\langle x^3 \rangle \langle x \rangle - 3\langle x^2 \rangle^2 + 12\langle x^2 \rangle \langle x \rangle^2 - 6\langle x \rangle^4. \tag{8.42d}$$

Problem 8.10. Use the first few terms in the Taylor expansion of $\ln(1 + x)$ (see Appendix A) to obtain the expressions for $M_n$ given in (8.42).
Because $\ln \phi$ is an extensive or additive quantity, the independent random variables $x$ and $y$ satisfy the condition that
$$\ln \langle e^{t(x+y)} \rangle = \ln \langle e^{tx} \rangle \langle e^{ty} \rangle = \ln \langle e^{tx} \rangle + \ln \langle e^{ty} \rangle. \tag{8.43}$$
Because
$$\ln \langle e^{t(x+y)} \rangle = \sum_{n=1}^{\infty} \frac{t^n}{n!}\, M_n(x + y), \tag{8.44}$$
we have the result
$$M_n(x + y) = M_n(x) + M_n(y). \tag{8.45}$$
The relation (8.45) implies that all cross terms in $M_n$ involving independent variables vanish.
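
The additivity property (8.45) can be verified exactly for simple discrete distributions. The sketch below computes the first four cumulants from the moments using (8.42) with exact rational arithmetic and compares $M_n(x + y)$ with $M_n(x) + M_n(y)$ for two independent variables. The two example distributions (a fair die and a biased coin) and the helper names are choices made here for illustration.

```python
from fractions import Fraction
from itertools import product


def moment(dist, n):
    """n-th moment <x^n> of a discrete distribution given as {value: probability}."""
    return sum(p * x**n for x, p in dist.items())


def cumulants(dist):
    """First four cumulants M1..M4 computed from the moments, following (8.42)."""
    m1, m2, m3, m4 = (moment(dist, n) for n in range(1, 5))
    return (
        m1,
        m2 - m1**2,
        m3 - 3 * m2 * m1 + 2 * m1**3,
        m4 - 4 * m3 * m1 - 3 * m2**2 + 12 * m2 * m1**2 - 6 * m1**4,
    )


def sum_of_independent(dist_x, dist_y):
    """Distribution of x + y when x and y are independent."""
    out = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        out[x + y] = out.get(x + y, 0) + px * py
    return out


die = {k: Fraction(1, 6) for k in range(1, 7)}
biased_coin = {0: Fraction(3, 4), 1: Fraction(1, 4)}

lhs = cumulants(sum_of_independent(die, biased_coin))
rhs = tuple(a + b for a, b in zip(cumulants(die), cumulants(biased_coin)))

print("M_n(x + y)      :", lhs)
print("M_n(x) + M_n(y) :", rhs)
print("additive:", lhs == rhs)
```

Because the probabilities are stored as `Fraction` objects, the comparison is exact rather than subject to floating-point rounding.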
Problem 8.11. As an example of the cancellation of cross terms, consider $M_3(x + y)$. From (8.42c) we know that $M_3(x + y)$ is given by
$$M_3(x + y) = \langle (x + y)^3 \rangle - 3\langle (x + y)^2 \rangle \langle (x + y) \rangle + 2\langle (x + y) \rangle^3. \tag{8.46}$$
Show explicitly that all cross terms cancel and that $M_3(x + y) = M_3(x) + M_3(y)$.

8.5 High Temperature Expansion

Now that we have discussed the formal properties of the cumulants, we can use these properties to evaluate $F_c$. According to (8.8) and (8.41) we can write $F_c$ as
$$-\beta F_c = \ln \langle e^{-\beta U} \rangle_0 = \sum_{n=1}^{\infty} \frac{(-\beta)^n}{n!}\, M_n. \tag{8.47}$$
The expansion (8.47) in powers of $\beta$ is known as a high temperature expansion. Such an expansion
n=1 (8.47) The expansion (8.47) in powers of β is known as a high temperature expansion. Such an expansion
is very natural because β = 1/kT is the only parameter that appears explicitly in (8.47). So an
obvious strategy is to assume that the inverse temperature is small and expand Fc in powers of β .
However, β enters in the combination βu0 , where u0 is a measure of the strength of the interaction.
Although we can choose β to be as small as we wish, the potential energy of interaction between
the particles in a gas is strongly repulsive at short distances (see Figure 8.1), and hence u0 is not
well deﬁned. Hence, a strategy based on expanding in the parameter β is not physically reasonable.
The diﬃculty of generating an appropriate perturbation theory for a dilute gas is typical of the
diﬃculties of doing perturbation theory for a system with many degrees of freedom. Our strategy
is to begin with what we can do and then try to do what we want. What we want is to generate an
expansion in powers of ρ for a dilute gas, even though the parameter ρ does not appear in (8.47).
Our strategy will be to formally do a high temperature expansion and ﬁnd the high temperature
expansion coeﬃcients Mn . Then we will ﬁnd that we can reorder the high temperature expansion
to obtain a power series expansion in the density.
The first expansion coefficient in (8.47) is the average of the total potential energy:
$$M_1 = \langle U \rangle = \frac{1}{V^N}\int d\mathbf{r}_1\, d\mathbf{r}_2 \ldots d\mathbf{r}_N \sum_{i<j} u_{ij}. \tag{8.48}$$
Because every term in the sum gives the same contribution, we have
$$M_1 = \frac{1}{2}N(N-1)\,\frac{1}{V^N}\int d\mathbf{r}_1\, d\mathbf{r}_2 \ldots d\mathbf{r}_N\, u_{12} = \frac{1}{2}N(N-1)\,\frac{1}{V^N}\, V^{N-2}\int d\mathbf{r}_1\, d\mathbf{r}_2\, u_{12} = \frac{1}{2}N(N-1)\,\frac{1}{V^2}\int d\mathbf{r}_1\, d\mathbf{r}_2\, u_{12}. \tag{8.49}$$
The combinatorial factor $\frac{1}{2}N(N-1)$ is the number of terms in the sum. Because we are interested in the limit $N \to \infty$, we replace $N - 1$ by $N$. We can simplify (8.49) further by measuring the position of particle 2 from particle 1, and write
$$M_1 = \frac{N^2}{2V^2}\, V \int d\mathbf{r}\, u(r),$$
or
$$\frac{M_1}{N} = \frac{\rho}{2}\int d\mathbf{r}\, u(r). \tag{8.50}$$
Note that $M_1$ is an extensive quantity as is the free energy.

Before we proceed further, we note that we have implicitly assumed that the integral in (8.50)
converges. However, the integral diverges for small $r$ for a system of hard spheres or the Lennard-Jones potential. So what can we do? The answer might seem strange at first, but we will let our intuition rather than mathematical rigor guide us. So let us assume that we can overcome these problems later. Or if this assumption seems unacceptable, we can equally well assume that the potential energy of interaction is not strongly repulsive at small separations and is short-range. Examples of such potentials include the step potential $u(r) = u_0$ for $r < a$ and $u(r) = 0$ for $r > a$, and the Gaussian potential $u(r) = u_0\, e^{-r^2/a^2}$.
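
As a worked illustration of (8.50), the integral can be done in closed form for both of these well-behaved potentials. The sketch below uses only the volume of a sphere and the standard Gaussian integral $\int_0^\infty r^2 e^{-r^2/a^2}\, dr = \sqrt{\pi}\, a^3/4$.

```latex
\begin{align*}
\text{step potential:}\quad
\frac{M_1}{N} &= \frac{\rho}{2}\int d\mathbf{r}\, u(r)
  = \frac{\rho}{2}\, u_0\, \frac{4\pi a^3}{3}
  = \frac{2\pi}{3}\, \rho\, u_0\, a^3, \\
\text{Gaussian potential:}\quad
\frac{M_1}{N} &= \frac{\rho}{2}\, u_0 \int_0^\infty 4\pi r^2\, e^{-r^2/a^2}\, dr
  = \frac{\pi^{3/2}}{2}\, \rho\, u_0\, a^3 .
\end{align*}
```

In both cases $M_1/N$ is finite and proportional to $\rho$, as expected for an extensive $M_1$.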
We next consider $M_2$, which is given by
$$M_2 = \langle U^2 \rangle - \langle U \rangle^2, \tag{8.51}$$
where
$$U = \sum_{i<j} u_{ij}, \tag{8.52}$$
and
$$U^2 = \sum_{i<j}\sum_{k<l} u_{ij}\, u_{kl}. \tag{8.53}$$
The various terms in (8.52) and (8.53) may be classified according to the number of subscripts in common. As an example, suppose that $N = 4$. We have
$$U = \sum_{i<j=1}^{4} u_{ij} = u_{12} + u_{13} + u_{14} + u_{23} + u_{24} + u_{34}, \tag{8.54}$$
and
$$\begin{aligned} U^2 = {} & [u_{12}^2 + u_{13}^2 + u_{14}^2 + u_{23}^2 + u_{24}^2 + u_{34}^2] \\ & + 2[u_{12}u_{13} + u_{12}u_{14} + u_{12}u_{23} + u_{12}u_{24} + u_{13}u_{14} + u_{13}u_{23} \\ & \quad + u_{13}u_{34} + u_{14}u_{24} + u_{14}u_{34} + u_{23}u_{24} + u_{23}u_{34} + u_{24}u_{34}] \\ & + 2[u_{12}u_{34} + u_{13}u_{24} + u_{14}u_{23}]. \end{aligned} \tag{8.55}$$
An inspection of (8.55) shows that the 36 terms in (8.55) can be grouped into three classes:
No indices in common (disconnected terms). A typical disconnected term is u12 u34 .
Because the variables r12 and r34 are independent, u12 and u34 are independent, and we can write
u12 u34 = u12 u34 . (8.56) From (8.45) we know that every disconnected term such as the one in (8.56) is a cross term that
is canceled if all terms in M2 are included.
One index in common (reducible terms). An example of a reducible term is $\langle u_{12} u_{23} \rangle$. We
can see that such a term also factorizes because of the homogeneity of space. Suppose we choose
particle 2 as the origin and integrate over $\mathbf{r}_1$ and $\mathbf{r}_3$. We have
$$ \langle u_{12} u_{23} \rangle = \frac{1}{V^3} \int d\mathbf{r}_1\, d\mathbf{r}_2\, d\mathbf{r}_3\, u_{12} u_{23} \tag{8.57} $$
$$ = \frac{1}{V^2} \int d\mathbf{r}_{12}\, d\mathbf{r}_{23}\, u_{12} u_{23} = \langle u_{12} \rangle \langle u_{23} \rangle. \tag{8.58} $$
A factor of V was obtained because particle 2 can be anywhere in the box of volume V. Again we
find that the variables $u_{ij}$ and $u_{jk}$ are independent, and hence such terms are canceled by other terms in M2.
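Both pairs of indices in common (irreducible terms). An example of an irreducible term
is $\langle u_{12}^2 \rangle$. Such a term is not canceled, and we find that the corresponding contribution to M2 is
$$ M_2 = \sum_{i<j=1}^{N} \left[ \langle u_{ij}^2 \rangle - \langle u_{ij} \rangle^2 \right]. \tag{8.59} $$
We can simplify (8.59) further by comparing the magnitude of the two types of terms in the limit
N → ∞. We have that
$$ \langle u_{ij}^2 \rangle = \frac{1}{V} \int u_{ij}^2 \, d\mathbf{r}_{ij} \propto \frac{1}{V} \propto O\!\left(\frac{1}{N}\right), \tag{8.60a} $$
$$ \langle u_{ij} \rangle^2 = \left[ \frac{1}{V} \int u_{ij} \, d\mathbf{r}_{ij} \right]^2 \propto O\!\left(\frac{1}{N^2}\right). \tag{8.60b} $$
From (8.60) we see that we can ignore the second term in comparison to the first (assuming that
the above integrals converge).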
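The grouping of the 36 terms of (8.55) into disconnected, reducible, and irreducible classes can be checked by brute-force enumeration. The following is only an illustrative sketch; the function name `classify_bond_products` is an assumption introduced here and does not appear in the text. It counts the ordered products $u_{ij} u_{kl}$ according to how many particle indices the two bonds share.

```python
from itertools import combinations, product


def classify_bond_products(n_particles):
    """Count ordered products u_ij * u_kl by the number of shared indices.

    0 shared indices -> disconnected, 1 -> reducible, 2 -> irreducible (u_ij^2).
    """
    # All bonds (i, j) with i < j, as in the sum (8.52).
    bonds = list(combinations(range(1, n_particles + 1), 2))
    counts = {0: 0, 1: 0, 2: 0}
    for bond_a, bond_b in product(bonds, repeat=2):
        shared = len(set(bond_a) & set(bond_b))
        counts[shared] += 1
    return counts


counts = classify_bond_products(4)
print(counts)  # {0: 6, 1: 24, 2: 6}
```

For N = 4 the counts 6, 24, and 6 reproduce the three brackets of (8.55): six squared terms, twelve one-index-in-common products with a factor of 2, and three disconnected products with a factor of 2.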
The above considerations lead us to the desired form of M2. Because there are N(N − 1)/2
identical contributions such as (8.60a) to M2 in (8.59), we obtain in the limit N → ∞, V → ∞
$$ \frac{M_2}{N} = \frac{\rho}{2} \int u^2(r) \, d\mathbf{r}. \tag{8.61} $$
The most important result of our evaluation of M1 and M2 is that the disconnected and
reducible contributions are canceled. This cancellation is due to the statistical independence of
the variables that appear. According to (8.45) their contribution must necessarily vanish. The
vanishing of the disconnected contributions is essential for Mn and thus for Fc to be an extensive
quantity proportional to N. For example, consider the contribution $\sum_{i<j,k<l} \langle u_{ij} \rangle \langle u_{kl} \rangle$ for $i \neq j \neq k \neq l$.
As we saw in (8.60b), each $\langle u_{ij} \rangle$ is order 1/V. Because each index is different, the number
of terms is $\sim N^4$ and hence the order of magnitude of this type of contribution is $N^4/V^2 \sim N^2$.
(Recall that N/V = ρ is fixed.) Because the presence of the disconnected terms in M2 would
imply that Fc would be proportional to $N^2$ rather than N, it is fortunate that this spurious N
dependence cancels exactly. The fact that the disconnected terms do not contribute to Fc was first
shown for a classical gas by Mayer in 1937. The corresponding result was not established for a
quantum gas until 1957.
The reducible terms also vanish but do not lead to a spurious N dependence. As an example,
consider the term $\langle u_{ij} u_{jk} u_{kl} \rangle$ with all four indices distinct. We can choose relative coordinates and
show that this term factorizes, $\langle u_{ij} u_{jk} u_{kl} \rangle = \langle u_{ij} \rangle \langle u_{jk} \rangle \langle u_{kl} \rangle$, and hence is canceled for a classical
gas. However, the N dependence of this term is $N^4/V^3 \sim N$. The fact that the reducible terms
are proportional to N is fortunate because the reducible terms do not cancel for a quantum gas.

Figure 8.2: Examples of disconnected diagrams with three bonds. For (a) the volume dependence is
$V^{-3}$, the number of terms in the sum is $O(N^6)$, and the magnitude of the contribution is $O(N^3)$. For
(b) the volume dependence is $V^{-3}$, the number of terms in the sum is $O(N^5)$, and the magnitude of
the contribution is $O(N^2)$. For (c) the volume dependence is $V^{-2}$, the number of terms is $O(N^4)$,
and the magnitude is $O(N^2)$.
Problem 8.12. Consider a system of N = 4 particles and obtain the explicit form of the ﬁrst
three cumulants. Show that the disconnected and reducible contributions cancel.
To consider the higher-order cumulants, we introduce a graphical notation that corresponds
to the various contributions to Mn . As we have seen, we do not need to consider products of
expectation values because they either cancel or are O(1/N ) relative to the irreducible terms
arising from the ﬁrst term in Mn . The rules for the calculation of Mn are
a. For each particle (subscript on u) draw a vertex (a point).
b. Draw a bond (dotted line) between two vertices. There is a total of n bonds among p
vertices, where 2 ≤ p ≤ n.
c. If the diagram contains two or more pieces not joined by a bond, then the diagram is disconnected.
If the diagram can be separated into two disconnected pieces by removing one vertex, then the
diagram is reducible. The remaining diagrams are irreducible and are the only ones that need to
be considered.
Examples of the various types of disconnected and reducible diagrams are shown in Figs. 8.2–8.4
corresponding to Mn=3 . What are the corresponding contributions to M3 ?
It is straightforward to find the contributions to M3 corresponding to the two types of irreducible
diagrams shown in Figure 8.4. There are N(N − 1)/2 identical contributions of type (a)
and N(N − 1)(N − 2) of the type shown in (b). Hence, the form of M3 in the limit N → ∞ is
$$ \frac{M_3}{N} = \frac{\rho}{2} \int u^3(r) \, d\mathbf{r} + \rho^2 \int u_{12} u_{23} u_{31} \, d\mathbf{r}_{12} \, d\mathbf{r}_{23}. \tag{8.62} $$

Figure 8.3: Examples of reducible diagrams with three (potential) bonds. The u bonds are represented
by dotted lines and the vertices are represented by filled circles. For (a) the volume
dependence is $V^{-3}$, the number of terms is $O(N^4)$, and the magnitude is $O(N)$. For (b) the volume
dependence is $V^{-2}$, the number of terms is $O(N^3)$, and the magnitude is $O(N)$. For (c) the
volume dependence is $V^{-3}$, the number of terms is $O(N^4)$, and the magnitude is $O(N)$.

Figure 8.4: Examples of irreducible diagrams with three (potential) bonds. For (a) the volume
dependence is $V^{-1}$, the number of terms is $O(N^2)$, and the magnitude is $O(N)$. For (b) the volume
dependence is $V^{-2}$, the number of terms is $O(N^3)$, and the magnitude is $O(N)$.

8.6 Density Expansion

We saw in Section 8.5 that we have reduced the problem of calculating Mn to enumerating all
irreducible diagrams containing n bonds among p vertices, where 2 ≤ p ≤ n. The expansion (8.47)
is an expansion in β or in the number of bonds n. We now show how this expansion can be reordered
so that we can obtain an expansion in the density ρ or in the number of vertices p.
Consider an irreducible diagram of n bonds and p vertices. An example is shown in Figure 8.4
for p = 2 and n = 3. Because there are p particles, there is a factor of $N^p$. For such a diagram
all but p − 1 integrations can be performed, leaving a factor of $1/V^{p-1}$. We conclude that an
irreducible diagram with p vertices contributes a term that is of order $N^p/V^{p-1}$, leading to an order
$\rho^{p-1}$ contribution to $F_c/N$. Hence, a classification of the diagrams according to the number of
bonds corresponds to a high temperature expansion, while a classification according to the number
of vertices is equivalent to a density expansion. That is, by summing all diagrams with a given
number of vertices, the expansion (8.47) can be converted to a power series in the density. The
result is traditionally written in the form (see (8.12))
$$ \frac{-\beta F_c}{N} = \sum_{p=1}^{\infty} \frac{b_p \rho^p}{p+1}. \tag{8.63} $$
Our goal in the following is to find the form of the first few cluster integrals $b_p$.
We first add the contribution of all the two-vertex diagrams to find b1 (see Figure 8.5). From
(8.47) we see that a factor of −β is associated with each bond. The contribution to −βFc from all
two-vertex diagrams is
$$ \sum_{n=1}^{\infty} \frac{(-\beta)^n}{n!} M_n \ \text{(contribution from irreducible diagrams with two vertices)} $$
$$ = N \frac{\rho}{2} \sum_{n=1}^{\infty} \frac{(-\beta)^n}{n!} \int u^n(r) \, d\mathbf{r} = N \frac{\rho}{2} \int \left( e^{-\beta u(r)} - 1 \right) d\mathbf{r}. \tag{8.64} $$

Figure 8.5: The first several irreducible diagrams with two vertices.

Figure 8.6: The single diagram contributing to b1. The solid line represents an f bond.

Because $B_2 = -\frac{1}{2} b_1$, we recover the result (8.33) that we found in Section 8.2 by a plausibility
argument. Note the appearance of the Mayer f function in (8.64).
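As a numerical illustration of (8.64), the sketch below evaluates $B_2 = -\frac{1}{2} \int f(r)\, 4\pi r^2\, dr$ with a midpoint rule for two potentials mentioned in this chapter: hard spheres of diameter σ and the step potential u(r) = u0 for r < a. The helper names (`mayer_f`, `b2_numeric`) and the parameter values are assumptions chosen for the example, not part of the text. The expected closed forms, $2\pi\sigma^3/3$ and $-\frac{1}{2}(e^{-\beta u_0} - 1)(4\pi a^3/3)$, follow directly from the integral because f is constant inside the core and zero outside.

```python
import math


def mayer_f(u, beta):
    """Mayer function f = exp(-beta*u) - 1; u may be math.inf for a hard core."""
    return math.exp(-beta * u) - 1.0


def b2_numeric(potential, beta, r_max, n_steps=20_000):
    """B2 = -(1/2) * integral_0^r_max f(r) 4 pi r^2 dr by the midpoint rule."""
    dr = r_max / n_steps
    total = 0.0
    for i in range(n_steps):
        r = (i + 0.5) * dr
        total += mayer_f(potential(r), beta) * 4.0 * math.pi * r * r * dr
    return -0.5 * total


sigma, a, u0, beta = 1.0, 1.0, 0.5, 2.0  # illustrative values in arbitrary units


def hard_sphere(r):
    return math.inf if r < sigma else 0.0


def step_potential(r):
    return u0 if r < a else 0.0


b2_hs = b2_numeric(hard_sphere, beta, r_max=2.0)
b2_step = b2_numeric(step_potential, beta, r_max=2.0)
b2_hs_exact = 2.0 * math.pi * sigma**3 / 3.0
b2_step_exact = -0.5 * (math.exp(-beta * u0) - 1.0) * 4.0 * math.pi * a**3 / 3.0
print(b2_hs, b2_hs_exact)
print(b2_step, b2_step_exact)
```

The hard-sphere result is independent of temperature, whereas the step-potential result depends on β only through the factor $e^{-\beta u_0} - 1$.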
We can now simplify the diagrammatic expansion by replacing the infinite sum of u (potential)
bonds between any two particles by f. For example, b1 corresponds to the single diagram shown
in Figure 8.6.
To find b2 we consider the set of all irreducible diagrams containing three vertices. Some of
the diagrams with u bonds are shown in Figure 8.7a. By considering all the possible combinations
of the u bonds, we can add up all the irreducible diagrams containing three vertices with $l_{12}$, $l_{23}$,
$l_{31}$ bonds. Instead, we will use our intuition and simply replace the various combinations of u
bonds by a single f bond between any two vertices as shown in Figure 8.7b. The corresponding
contribution to b2 is
$$ b_2 = \frac{1}{2!} \int f_{12} f_{23} f_{31} \, d\mathbf{r}_{12} \, d\mathbf{r}_{23}. \tag{8.65} $$

Figure 8.7: (a) The first several irreducible diagrams with three vertices and various numbers of u
bonds (dotted lines). (b) The corresponding diagram with f bonds (solid lines).

Figure 8.8: The four-vertex diagrams with all the possible different labelings. Note that the bonds
are f bonds.

In general, it can be shown that
$$ b_p = \frac{1}{p!} \sum \int \prod f_{ij} \, d\mathbf{r}_1 \ldots d\mathbf{r}_p, \tag{8.66} $$
where the sum is over all irreducible topologically distinct diagrams among p + 1 vertices. For
example, b3 corresponds to the four-vertex diagrams shown in Figure 8.8. The corresponding
result for b3 is
$$ b_3 = \frac{1}{3!} \int \left( 3 f_{12} f_{23} f_{34} f_{41} + 6 f_{12} f_{23} f_{34} f_{41} f_{13} + f_{12} f_{23} f_{34} f_{41} f_{13} f_{24} \right) d\mathbf{r}_2 \, d\mathbf{r}_3 \, d\mathbf{r}_4. \tag{8.67} $$
We see that we have converted the original high temperature expansion to a density expansion
by summing what are known as ladder diagrams. These diagrams correspond to all the possible
u bonds between any two particles (vertices). The result of this sum is the Mayer f function. The
resultant density expansion for the free energy and the pressure equation of state is one of the few
expansions known in physics that is convergent for sufficiently small densities. In contrast, most
expansions in physics are asymptotic.
The procedure for ﬁnding higher order terms in the density expansion of the free energy is
straightforward in principle. To ﬁnd the contribution of order ρp−1 , we enumerate all the diagrams
with p vertices and various numbers of f bonds such that the diagrams are irreducible. There is
only one f bond between any two vertices. However, the enumeration of the cluster integrals bp
becomes more and more diﬃcult for larger p. Even more diﬃcult is the calculation of the cluster
integrals even for hard spheres.
The results for the known virial coefficients for hard spheres are summarized in Table 8.1 in
terms of the parameter
$$ \eta = \pi \rho \sigma^3 / 6, \tag{8.68} $$
where σ is the diameter of the spheres. The parameter η can be expressed as
$$ \eta = \rho \, (4\pi/3)(\sigma/2)^3. \tag{8.69} $$
The form of (8.69) shows that η is the fraction of the space occupied by N spheres. For this reason
η is often called the packing fraction.
virial coefficient    magnitude
$\rho B_2$            $\frac{2}{3} \pi \rho \sigma^3 = 4\eta$
$\rho^2 B_3$          $\frac{5}{18} \pi^2 \rho^2 \sigma^6 = 10\eta^2$
$\rho^3 B_4$          $18.365\,\eta^3$
$\rho^4 B_5$          $28.24\,\eta^4$
$\rho^5 B_6$          $39.5\,\eta^5$
$\rho^6 B_7$          $56.5\,\eta^6$

Table 8.1: The values of the known virial coefficients for hard spheres.

Problem 8.13. The results shown in Table 8.1 imply that the equation of state of a system of
hard spheres can be written as
$$ \frac{PV}{NkT} = 1 + \sum_{s=1}^{6} C_s \eta^s = 1 + 4\eta + 10\eta^2 + 18.365\eta^3 + 28.24\eta^4 + 39.5\eta^5 + 56.5\eta^6, \tag{8.70} $$
where the dimensionless parameter $\eta = \pi \rho \sigma^3/6$. It is clear from the form (8.70) that the
convergence is not very rapid.
Show that the coefficients $C_s$ can be written in the approximate form $C_s = 3s + s^2$ and that this
form of $C_s$ implies that
$$ \frac{PV}{NkT} = 1 + \sum_{s=1}^{\infty} (3s + s^2) \eta^s. \tag{8.71} $$
Do the sum in (8.71) and show that
$$ \frac{PV}{NkT} = \frac{1 + \eta + \eta^2 - \eta^3}{(1 - \eta)^3}. \tag{8.72} $$
The form (8.72) is known as the Carnahan-Starling equation of state.
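Before doing the sum analytically, it can be helpful to compare the three forms numerically. The sketch below is only a sanity check under the stated approximation $C_s = 3s + s^2$; the function names are assumptions made for this example. It evaluates the truncated virial series (8.70), a long partial sum of (8.71), and the closed form (8.72) at a few packing fractions.

```python
def z_virial(eta):
    """Truncated virial series (8.70) with the coefficients of Table 8.1."""
    coefficients = [4.0, 10.0, 18.365, 28.24, 39.5, 56.5]
    return 1.0 + sum(c * eta ** (s + 1) for s, c in enumerate(coefficients))


def z_series(eta, n_terms=400):
    """Partial sum of (8.71) with the approximate coefficients C_s = 3s + s^2."""
    return 1.0 + sum((3 * s + s * s) * eta**s for s in range(1, n_terms + 1))


def z_carnahan_starling(eta):
    """Closed form (8.72) for PV/NkT."""
    return (1.0 + eta + eta**2 - eta**3) / (1.0 - eta) ** 3


for eta in (0.1, 0.2, 0.3):
    print(eta, z_virial(eta), z_series(eta), z_carnahan_starling(eta))
```

The partial sum and the closed form agree to rounding error for η < 1, while the six-term series (8.70) differs slightly because the tabulated coefficients are not exactly $3s + s^2$ and the series is truncated.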
Problem 8.14. Show that the close-packed volume of a system of hard spheres is equal to $N\sigma^3/\sqrt{2}$.
What is the corresponding maximum value of η? Do you expect the Carnahan-Starling equation
of state to be applicable at densities close to the fluid-solid transition?
Problem 8.15. The applet at xx simulates a system of hard disks. Choose N = xx and start at
a density of ρ = xx. Run long enough to obtain meaningful estimates of the pressure. Then gradually
increase the density. What happens to the pressure at ρ ≈ xx? The existence of a
fluid-solid transition was first hypothesized on the basis of similar computer simulations. Computer
simulation studies suggest that for $\rho\sigma^2$ between approximately 0.91 and $2/3^{1/2} \approx 1.1547$,
the thermodynamically stable phase is the triangular crystal with increased mean neighbor spacing
that allows restricted local motion. The equilibrium concentration of defects, especially defects,
appears to be very small in this density range and to vanish exponentially as ρ approaches the
density of maximum packing.

Problem 8.16. Use the Carnahan-Starling equation of state (8.72) to derive analytical expressions
for the excess entropy $S_{ex}$, excess energy $E_{ex}$, and excess Helmholtz free energy for a hard-sphere fluid.
The excess entropy $S_{ex}$ is defined by $S = -Nk \ln P - Nk \ln \beta\lambda^3 + 5Nk/2 + S_{ex}$ and the excess
energy is defined by $E = \frac{3}{2} NkT + E_{ex}$.
Problem 8.17. Consider a one-dimensional gas of particles confined to a box of length L. Suppose
that the interparticle interaction is given by
$$ u(x) = \begin{cases} \infty & x < a \\ 0 & x \geq a \end{cases} \tag{8.73} $$
Such a system of hard rods is known as a Tonks gas. (a) Evaluate the virial coefficients B2 and
B3. It is possible to do the integrals analytically. (b) Note that the form of the interaction (8.73)
prevents particles from exchanging places, that is, from changing their order. What is the available
“volume” in which the particles can move? Use this consideration to guess the form of the equation
of state. (c*) Calculate the partition function and the equation of state exactly and show that
your results are consistent with part (a).

8.7 Radial Distribution Function

Now that we know how to include the interactions between particles to obtain the equation of state,
we can consider how these interactions lead to correlations between the particles. We know that if
the interactions are neglected, the positions of the particles are uncorrelated and the probability of
ﬁnding a particle a distance r away from a particular particle is proportional to ρ. In the following,
we deﬁne a quantity that is a measure of the correlations between particles and determine its
dependence on r in the limit of low density.
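The probability of a configuration of N particles in a volume V is given in the canonical
ensemble by
$$ P_N(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N) \, d\mathbf{r}_1 d\mathbf{r}_2 \ldots d\mathbf{r}_N = \frac{e^{-\beta U} \, d\mathbf{r}_1 d\mathbf{r}_2 \ldots d\mathbf{r}_N}{\int e^{-\beta U} \, d\mathbf{r}_1 d\mathbf{r}_2 \ldots d\mathbf{r}_N} = \frac{e^{-\beta U} \, d\mathbf{r}_1 d\mathbf{r}_2 \ldots d\mathbf{r}_N}{Q_N}, \tag{8.74} $$
where U is the total potential energy of the configuration. The quantity $P_N \, d\mathbf{r}_1 d\mathbf{r}_2 \ldots d\mathbf{r}_N$ is the
probability that particle 1 is in the range $d\mathbf{r}_1$ about $\mathbf{r}_1$, particle 2 is in the range $d\mathbf{r}_2$ about $\mathbf{r}_2$, etc.
The configurational integral $Q_N$ is defined as
$$ Q_N = \int d\mathbf{r}_1 d\mathbf{r}_2 \ldots d\mathbf{r}_N \, e^{-\beta U}. \tag{8.75} $$
Note that $Q_N$ is related to the partition function $Z_N$ for a classical system by $Z_N = Q_N/(N!\, \lambda^{3N})$.
$Q_N$ is defined so that $Q_N = V^N$ for an ideal gas.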
The probability that particle 1 is in the range $d\mathbf{r}_1$ about $\mathbf{r}_1$ and particle 2 is in the range $d\mathbf{r}_2$
about $\mathbf{r}_2$ is obtained by integrating (8.74) over the positions of particles 3 through N:
$$ P_2(\mathbf{r}_1, \mathbf{r}_2) \, d\mathbf{r}_1 d\mathbf{r}_2 = \frac{\int e^{-\beta U} \, d\mathbf{r}_3 \ldots d\mathbf{r}_N}{Q_N} \, d\mathbf{r}_1 d\mathbf{r}_2. \tag{8.76} $$
The probability that any particle is in the range $d\mathbf{r}_1$ about $\mathbf{r}_1$ and any other particle is in the range
$d\mathbf{r}_2$ about $\mathbf{r}_2$ is $N(N-1)P_2$. In general, we define the n-particle density function for a system of
N particles as
$$ \rho_n(\mathbf{r}_1, \ldots, \mathbf{r}_n) = \frac{N!}{(N-n)!} \frac{\int e^{-\beta U} \, d\mathbf{r}_{n+1} \ldots d\mathbf{r}_N}{Q_N}. \tag{8.77} $$
It is easy to see that $\rho_2 = N(N-1)P_2$. The normalization condition for $\rho_n$ is
$$ \int \rho_n(\mathbf{r}_1, \ldots, \mathbf{r}_n) \, d\mathbf{r}_1 \ldots d\mathbf{r}_n = \frac{N!}{(N-n)!}. \tag{8.78} $$
For n = 1, we have
$$ \int \rho_1(\mathbf{r}) \, d\mathbf{r} = N. \tag{8.79} $$
If the system is homogeneous, then $\rho_1(\mathbf{r})$ is independent of $\mathbf{r}$, and it follows from (8.79) that
$$ \rho_1(\mathbf{r}) = \frac{N}{V} = \rho. \tag{8.80} $$
We define the pair distribution function $g(\mathbf{r}_1, \mathbf{r}_2)$ as
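$$ \rho^2 g(\mathbf{r}_1, \mathbf{r}_2) = \rho_2 = N(N-1) P_2 = N(N-1) \frac{\int e^{-\beta U} \, d\mathbf{r}_3 \ldots d\mathbf{r}_N}{Q_N}, \tag{8.81} $$
or
$$ g(\mathbf{r}_1, \mathbf{r}_2) = \left( 1 - \frac{1}{N} \right) V^2 \, \frac{\int e^{-\beta U} \, d\mathbf{r}_3 \ldots d\mathbf{r}_N}{Q_N}. \tag{8.82} $$
If the interparticle interaction is spherically symmetric and the system is a liquid, then $g(\mathbf{r}_1, \mathbf{r}_2)$
depends only on the relative separation $r_{12} = |\mathbf{r}_1 - \mathbf{r}_2|$ between particles 1 and 2. We adopt the
notation $g(r) = g(r_{12})$ in the following; g(r) is called the radial distribution function.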
Note that g(r) is defined so that g(r) = 1 for an ideal gas. (Actually g(r) = 1 − 1/N, but we
can ignore the 1/N correction in the thermodynamic limit.) The quantity ρg(r)dr is related to the
probability of observing a second particle in the range dr a distance r away from a particle at the
origin. The normalization for ρg(r) is given by
$$ \rho \int g(r) \, 4\pi r^2 \, dr = \rho \left( 1 - \frac{1}{N} \right) V^2 \, \frac{1}{V} = N - 1. \tag{8.83} $$
From (8.83) we see that $\rho g(r) \, 4\pi r^2 dr$ is the number of particles between r and r + dr about a
particular particle, that is, the product ρg(r) can be interpreted as the local density about a
particular particle.
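The interpretation of $\rho g(r)\, 4\pi r^2 dr$ as a neighbor count suggests a direct way to estimate g(r) from a set of particle positions: histogram the pair separations and divide by the count expected for a uniform density. The following is a minimal sketch, assuming a periodic cubic box and the minimum-image convention (neither is specified in the text); the function name `radial_distribution` and the parameter values are hypothetical. For uncorrelated (ideal gas) positions the estimate should fluctuate about 1 − 1/N.

```python
import numpy as np


def radial_distribution(positions, box_length, n_bins=20):
    """Estimate g(r) for r < box_length/2 from one configuration."""
    n = len(positions)
    rho = n / box_length**3
    # Minimum-image separations for all pairs in a periodic cubic box.
    delta = positions[:, None, :] - positions[None, :, :]
    delta -= box_length * np.round(delta / box_length)
    distances = np.sqrt((delta**2).sum(axis=-1))
    pair_distances = distances[np.triu_indices(n, k=1)]  # each pair once
    counts, edges = np.histogram(pair_distances, bins=n_bins,
                                 range=(0.0, box_length / 2))
    shell_volumes = 4.0 * np.pi / 3.0 * (edges[1:] ** 3 - edges[:-1] ** 3)
    # Each pair contributes one neighbor to each of its two particles.
    g = 2.0 * counts / (n * rho * shell_volumes)
    r = 0.5 * (edges[1:] + edges[:-1])
    return r, g


rng = np.random.default_rng(0)
box_length = 10.0
positions = rng.uniform(0.0, box_length, size=(500, 3))  # ideal gas
r, g = radial_distribution(positions, box_length)
print(np.round(g[r > 2.5], 2))
```

The bins at small r contain few pairs and are therefore noisy; averaging over many configurations reduces the scatter.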
The qualitative features of g(r) for a Lennard-Jones fluid are shown in Figure 8.9. We see
that g → 0 as r → 0 because the interaction between the particles does not allow the particles to
become too close. Similarly, at large r the other particles are not correlated with the fixed particle
and g(r) → 1 as r → ∞.

Figure 8.9: Dependence of g(r) on r for a three-dimensional Lennard-Jones liquid at $kT/\epsilon$ = xx
and $\rho\sigma^3$ = xx.

It is easy to find the form of g(r) in the limit of low densities. In the limit N → ∞, g(r) is
given by (see (8.82))
$$ g(r) = V^2 \, \frac{\int e^{-\beta U} \, d\mathbf{r}_3 \ldots d\mathbf{r}_N}{Q_N}. \tag{8.84} $$
At low densities, we may integrate over particles 3, 4, . . . , N, assuming that these particles are
distant from particles 1 and 2 and also distant from each other. Also, for almost all configurations,
we can replace U by $u_{12}$ in the numerator. Similarly, the denominator can be approximated by
$V^N$. Hence we have
$$ g(r) \approx \frac{V^2 e^{-\beta u_{12}} V^{N-2}}{V^N} = e^{-\beta u(r)}. \tag{8.85} $$
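To see what (8.85) implies for a specific interaction, the sketch below evaluates $g(r) \approx e^{-\beta u(r)}$ for the Lennard-Jones potential in reduced units (ε = σ = k = 1). The function names and the chosen temperature are assumptions for illustration only. The low-density g(r) vanishes inside the repulsive core, has its maximum $e^{\epsilon/kT}$ at the potential minimum $r = 2^{1/6}\sigma$, and approaches 1 at large r.

```python
import math


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones potential u(r) = 4 eps [(sigma/r)^12 - (sigma/r)^6]."""
    x6 = (sigma / r) ** 6
    return 4.0 * epsilon * (x6 * x6 - x6)


def g_low_density(r, temperature, epsilon=1.0, sigma=1.0):
    """Low-density limit (8.85): g(r) = exp(-u(r)/kT), with k = 1."""
    return math.exp(-lennard_jones(r, epsilon, sigma) / temperature)


temperature = 2.0  # kT/epsilon, an illustrative value
r_min = 2.0 ** (1.0 / 6.0)  # location of the potential minimum
for r in (0.5, 0.9, 1.0, r_min, 1.5, 2.0, 5.0):
    print(f"{r:.3f}  {g_low_density(r, temperature):.4f}")
```

At higher densities the additional oscillations of g(r) seen in simulations are not captured by this limit, because they arise from correlations mediated by third particles.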
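Another way of defining g(r) is in terms of the local particle density. Consider the sum
$\sum_{i=1}^{N} \delta(\mathbf{r} - \mathbf{r}_i)$. Given the properties of the delta function, we know that the integral of this sum
about $\mathbf{r}$ over the volume element $d^3 r$ yields the number of particles in the volume element. Hence,
we can define the local particle density as
$$ \rho(\mathbf{r}) = \sum_{i=1}^{N} \delta(\mathbf{r} - \mathbf{r}_i). \tag{8.86} $$
We have
$$ \rho_1(\mathbf{r}) = \sum_{i=1}^{N} \langle \delta(\mathbf{r} - \mathbf{r}_i) \rangle, \tag{8.87} $$
where $\langle \delta(\mathbf{r} - \mathbf{r}_i) \rangle$ is given by
$$ \langle \delta(\mathbf{r} - \mathbf{r}_i) \rangle = \frac{1}{Q_N} \int \delta(\mathbf{r} - \mathbf{r}_1) \, e^{-\beta U(\mathbf{r}_1, \ldots, \mathbf{r}_N)} \, d\mathbf{r}_1 \ldots d\mathbf{r}_N \tag{8.88} $$
$$ = \frac{1}{Q_N} \int e^{-\beta U(\mathbf{r}, \mathbf{r}_2, \ldots, \mathbf{r}_N)} \, d\mathbf{r}_2 \ldots d\mathbf{r}_N. \tag{8.89} $$
For a homogeneous system $\rho_1(\mathbf{r}_1) = \rho$.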
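Problem 8.18. Use (8.89) and the definition (8.87) to show that $\rho_1(\mathbf{r}_1) = \rho$ for a homogeneous
system.
We can define $\rho_2(\mathbf{r}_1, \mathbf{r}_2)$ as
$$ \rho_2(\mathbf{r}_1, \mathbf{r}_2) = \sum_{i \neq j} \langle \delta(\mathbf{r}_1 - \mathbf{r}_i) \, \delta(\mathbf{r}_2 - \mathbf{r}_j) \rangle. \tag{8.90} $$
We can use the same reasoning as in (8.89) to show that
$$ \rho_2(\mathbf{r}_1, \mathbf{r}_2) = N(N-1) \, \frac{\int d^3 r_3 \, d^3 r_4 \ldots d^3 r_N \, e^{-\beta U(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots, \mathbf{r}_N)}}{\int d^3 r_1 \, d^3 r_2 \, d^3 r_3 \, d^3 r_4 \ldots d^3 r_N \, e^{-\beta U(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots, \mathbf{r}_N)}}. \tag{8.91} $$
If we compare (8.91) and (8.81), we see that we can also define $g(\mathbf{r}_1, \mathbf{r}_2)$ as
$$ \rho^2 g(\mathbf{r}_1, \mathbf{r}_2) = \sum_{i \neq j} \langle \delta(\mathbf{r}_1 - \mathbf{r}_i) \, \delta(\mathbf{r}_2 - \mathbf{r}_j) \rangle. \tag{8.92} $$
Hence, we see that g(r) is related to the spatial correlation of the density fluctuations in the system.

8.7.1 Relation of thermodynamic functions to g(r)

We first derive a relation for the mean energy E in terms of g(r). We write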
$$ E = \frac{3}{2} NkT + \langle U \rangle, \tag{8.93} $$
where
$$ \langle U \rangle = \frac{1}{Q_N} \int \cdots \int U e^{-\beta U} \, d\mathbf{r}_1 d\mathbf{r}_2 \ldots d\mathbf{r}_N. \tag{8.94} $$
The two terms in (8.93) are the mean kinetic energy and the mean potential energy $\langle U \rangle$, respectively.
If we assume that U is given by (8.1), we can write $\langle U \rangle$ as
$$ \langle U \rangle = \frac{N(N-1)}{2} \frac{1}{Q_N} \int e^{-\beta U} u(r_{12}) \, d\mathbf{r}_1 \ldots d\mathbf{r}_N, \tag{8.95} $$
$$ = \frac{N(N-1)}{2} \int u(r_{12}) \, \frac{\int e^{-\beta U} \, d\mathbf{r}_3 \ldots d\mathbf{r}_N}{Q_N} \, d\mathbf{r}_1 d\mathbf{r}_2, $$
$$ = \frac{1}{2} \int u(r_{12}) \, \rho^2 g(\mathbf{r}_1, \mathbf{r}_2) \, d\mathbf{r}_1 d\mathbf{r}_2. \tag{8.96} $$
Hence
$$ \langle U \rangle = \frac{N^2}{2V} \int u(r) g(r) \, d^3 r = N \frac{\rho}{2} \int u(r) g(r) \, 4\pi r^2 \, dr, \tag{8.97} $$
and the total energy E per particle is given by
$$ \frac{E}{N} = \frac{3}{2} kT + \frac{\rho}{2} \int_0^{\infty} u(r) g(r) \, 4\pi r^2 \, dr. \tag{8.98} $$
The form of the second term in (8.98) for the mean potential energy is clear from physical
considerations. The potential energy of interaction between a particular particle and all the other
particles between r and r + dr is $u(r) \rho g(r) 4\pi r^2 dr$, where we have interpreted ρg(r) as the local
density. The total potential energy is found by integrating over all values of r and multiplying by
N/2. The factor of N is included because any of the N particles can be chosen as the particle at
the origin; the factor of 1/2 is included so that each pair interaction is counted only once.
We can derive a similar relation between the mean pressure and an integral over the product
of g(r) and the force −du(r)/dr. The pressure is related to the configurational part of the partition
function by
$$ P = -\frac{\partial F}{\partial V} = kT \, \frac{\partial \ln Q_N}{\partial V}. \tag{8.99} $$
For large V the pressure is independent of the shape of the container. For convenience, we assume
that the container is a cube of linear dimension L, and we write the configuration integral $Q_N$ as
$$ Q_N = \int_0^L \cdots \int_0^L e^{-\beta U} \, dx_1 dy_1 dz_1 \ldots dx_N dy_N dz_N. \tag{8.100} $$
We first change variables so that the limits of integration are independent of L and let $\tilde{x}_i = x_i/L$,
etc. Then
$$ Q_N = V^N \int_0^1 \cdots \int_0^1 e^{-\beta U} \, d\tilde{x}_1 d\tilde{y}_1 d\tilde{z}_1 \ldots d\tilde{x}_N d\tilde{y}_N d\tilde{z}_N, \tag{8.101} $$
where U depends on the separation $r_{ij} = L[(\tilde{x}_i - \tilde{x}_j)^2 + (\tilde{y}_i - \tilde{y}_j)^2 + (\tilde{z}_i - \tilde{z}_j)^2]^{1/2}$. This change of
variables allows us to take the derivative of $Q_N$ with respect to V. We have
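$$ \frac{\partial Q_N}{\partial V} = N V^{N-1} \int_0^1 \cdots \int_0^1 e^{-\beta U} \, d\tilde{x}_1 \ldots d\tilde{z}_N - \frac{V^N}{kT} \int_0^1 \cdots \int_0^1 e^{-\beta U} \, \frac{\partial U}{\partial V} \, d\tilde{x}_1 \ldots d\tilde{z}_N, \tag{8.102} $$
where
$$ \frac{\partial U}{\partial V} = \sum_{i<j} \frac{du(r_{ij})}{dr_{ij}} \frac{dr_{ij}}{dL} \frac{dL}{dV} = \sum_{i<j} \frac{du(r_{ij})}{dr_{ij}} \, \frac{r_{ij}}{L} \, \frac{1}{3L^2}. \tag{8.103} $$
Now that we have differentiated $Q_N$ with respect to V, we transform back to the original variables
$x_1, \ldots, z_N$. In the second term of (8.102) we also can use the fact that the sum (8.103) contains N(N − 1)/2
identical contributions. We obtain
$$ \frac{\partial \ln Q_N}{\partial V} = \frac{1}{Q_N} \frac{\partial Q_N}{\partial V} = \frac{N}{V} - \frac{\rho^2}{6VkT} \int r_{12} \, \frac{du(r_{12})}{dr_{12}} \, g(r_{12}) \, d\mathbf{r}_1 d\mathbf{r}_2, \tag{8.104} $$
and hence
$$ \frac{PV}{NkT} = 1 - \frac{\rho}{6kT} \int_0^{\infty} r \, \frac{du(r)}{dr} \, g(r) \, 4\pi r^2 \, dr. \qquad \text{(virial equation of state)} \tag{8.105} $$
The integrand in (8.105) is related to the virial in classical mechanics and is the mean value of the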
product r · F (cf. Goldstein).

8.7.2 Relation of g(r) to static structure function S(k)

A measurement of the radial distribution function g(r) probes distances on the order of the mean
interparticle spacing, which is of the order of Angstroms or $10^{-10}$ m. Such wavelengths can be
obtained with neutrons or x-rays. To understand how an elastic scattering experiment can probe the
structure, we consider x-ray scattering. A typical x-ray wavelength is 1 Å, and the corresponding
energy per photon is $\hbar\omega = hc/\lambda \approx 10^4$ eV. This energy is very large in comparison to the thermal
energy of the particles in a liquid, which is of the order of kT or approximately 0.025 eV at room
temperature. Hence, collisions with the particles in a liquid will leave the photon energies almost
unaltered and, to a good approximation, the scattering can be treated as elastic.
In Appendix 8A, we show that the scattered intensity of the x-rays is given by
$$ I(\theta) = N I_1 S(k), \tag{8.106} $$
where the wave vector k is related to the scattering angle θ. The static structure function S(k) is
given by
$$ S(k) = \frac{1}{N} \sum_{i,j=1}^{N} \left\langle e^{i\mathbf{k} \cdot (\mathbf{r}_i - \mathbf{r}_j)} \right\rangle. \tag{8.107} $$
It is easy to show that S(k) = 1 if the particles are not correlated, because in this case the only
contribution to the sum in (8.107) is for i = j, and there are N such terms.
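The statement about uncorrelated particles can be checked numerically. The sketch below is an illustration under assumptions not made in the text: the particles are placed uniformly at random in a periodic cubic box, and k is chosen commensurate with the box so that the average of a single phase factor vanishes exactly. It uses the identity $\sum_{i,j} e^{i\mathbf{k}\cdot(\mathbf{r}_i - \mathbf{r}_j)} = |\sum_j e^{i\mathbf{k}\cdot\mathbf{r}_j}|^2$ to evaluate the double sum in (8.107); the function name `structure_factor` and the parameter values are hypothetical.

```python
import numpy as np


def structure_factor(positions, k_vector):
    """Single-configuration estimate of S(k) = |sum_j exp(i k.r_j)|^2 / N."""
    phases = positions @ k_vector
    amplitude = np.exp(1j * phases).sum()
    return (amplitude.real**2 + amplitude.imag**2) / len(positions)


rng = np.random.default_rng(1)
box_length, n_particles, n_configs = 10.0, 50, 4000
# k commensurate with the box: k = (2 pi / L) * (3, 0, 0).
k_vector = 2.0 * np.pi / box_length * np.array([3.0, 0.0, 0.0])

samples = [
    structure_factor(rng.uniform(0.0, box_length, size=(n_particles, 3)), k_vector)
    for _ in range(n_configs)
]
s_mean = float(np.mean(samples))
print(s_mean)  # close to 1 for uncorrelated positions
```

A single configuration gives a strongly fluctuating value, which is why the estimate is averaged over many independent configurations.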
The static structure function S(k) is a measure of the correlations between the positions of
the atoms in the liquid and is related to the Fourier transform of g(r), as we show in the following.
We first divide the sum over i and j in (8.107) into self, i = j, and distinct, i ≠ j, contributions.
There are N of the former and N(N − 1) of the latter. We write
$$ S(k) = 1 + \frac{1}{N} N(N-1) \left\langle e^{i\mathbf{k} \cdot (\mathbf{r}_1 - \mathbf{r}_2)} \right\rangle \tag{8.108} $$
$$ = 1 + \frac{N(N-1)}{N} \, \frac{\int d\mathbf{r}_1 \ldots d\mathbf{r}_N \, e^{i\mathbf{k} \cdot (\mathbf{r}_1 - \mathbf{r}_2)} e^{-\beta U}}{\int d\mathbf{r}_1 \ldots d\mathbf{r}_N \, e^{-\beta U}}. \tag{8.109} $$
If we use the definition (8.81) of $g(\mathbf{r}_1, \mathbf{r}_2)$, we can write (8.109) as
$$ S(k) = 1 + \frac{1}{N} \int d\mathbf{r}_1 d\mathbf{r}_2 \, \rho^2 g(\mathbf{r}_1, \mathbf{r}_2) \, e^{i\mathbf{k} \cdot (\mathbf{r}_1 - \mathbf{r}_2)}. \tag{8.110} $$
For a homogeneous liquid, $g(\mathbf{r}_1, \mathbf{r}_2)$ depends only on $|\mathbf{r}_1 - \mathbf{r}_2|$, and we obtain
$$ S(k) = 1 + \rho \int d\mathbf{r} \, g(r) \, e^{i\mathbf{k} \cdot \mathbf{r}}. \tag{8.111} $$
It is customary to rewrite (8.111) as
$$ S(k) - 1 = \rho \int d\mathbf{r} \, [g(r) - 1] \, e^{i\mathbf{k} \cdot \mathbf{r}} + \rho \int d\mathbf{r} \, e^{i\mathbf{k} \cdot \mathbf{r}} = \rho \int d\mathbf{r} \, [g(r) - 1] \, e^{i\mathbf{k} \cdot \mathbf{r}} + \rho (2\pi)^3 \delta(\mathbf{k}). \tag{8.112} $$
The contribution of the δ(k) term in (8.112) is unimportant because it is identically zero except
when k = 0, that is, for radiation not scattered by the atoms in the fluid. Hence, we can rewrite
(8.112) in the desired form:
$$ S(k) - 1 = \rho \int d\mathbf{r} \, [g(r) - 1] \, e^{i\mathbf{k} \cdot \mathbf{r}}. \tag{8.113} $$
From (8.113) we see that S(k) − 1 is the Fourier transform of g(r) − 1, and a measurement of
the intensity of elastically scattered radiation yields the Fourier transform of the radial correlation
function.

8.7.3 Variable number of particles

In a scattering experiment the beam samples a subset of the total volume. Because the number
of particles in the subset ﬂuctuates, we need to use the grand canonical ensemble to describe the
measured value of S (k ). The energy and pressure equations (8.98) and (8.105) are identical in both
ensembles, but the compressibility equation, which we derive in the following, can only be derived
in the grand canonical ensemble because it relates the integral of g (r) − 1 and hence S (k = 0) to
ﬂuctuations in the density.
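The compressibility $K_T$ is defined as (see (7.1))
$$ K_T = -\frac{1}{V} \frac{\partial V}{\partial P} = \frac{1}{\rho} \frac{\partial \rho}{\partial P}. \tag{8.114} $$
We will use thermodynamic arguments in the following to show that
$$ \rho K_T = \frac{1}{N} \left( \frac{\partial N}{\partial \mu} \right)_{T,V}. \tag{8.115} $$
If we combine (8.115) with (6.262), we see that the compressibility is related to the density
fluctuations:
$$ \rho K_T = \frac{1}{kT} \, \frac{\langle N^2 \rangle - \langle N \rangle^2}{\langle N \rangle}. \tag{8.116} $$
The thermodynamic argument proceeds as follows. We have that
$$ d\Omega = -P \, dV - S \, dT - N \, d\mu = -P \, dV - V \, dP. \tag{8.117} $$
We cancel the P dV term on both sides of (8.117) and solve for dµ:
$$ d\mu = \frac{V}{N} \, dP - \frac{S}{N} \, dT. \tag{8.118} $$
We let v = V/N; v is the volume per particle or specific volume. Because µ is an intensive quantity,
we can express µ as µ(v, T) and obtain
$$ \left( \frac{\partial \mu}{\partial v} \right)_T = v \left( \frac{\partial P}{\partial v} \right)_T. \tag{8.119} $$
We can change v by changing either V or N: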
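$$ \left( \frac{\partial \mu}{\partial v} \right)_{T,V} = \frac{\partial N}{\partial v} \frac{\partial \mu}{\partial N} = -\frac{N^2}{V} \frac{\partial \mu}{\partial N}, \tag{8.120} $$
$$ \left( \frac{\partial \mu}{\partial v} \right)_{T,N} = \frac{\partial V}{\partial v} \frac{\partial \mu}{\partial V} = N \frac{\partial \mu}{\partial V}. \tag{8.121} $$
However, the way in which v is changed cannot affect (8.119). Hence,
$$ -\frac{N^2}{V} \left( \frac{\partial \mu}{\partial N} \right)_{T,V} = V \left( \frac{\partial P}{\partial V} \right)_{T,N}. \tag{8.122} $$
We leave it as an exercise to the reader to use (8.122) and (8.114) to obtain the desired relation
(8.115).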
We now relate the integral over g(r) to the density fluctuations. In the grand canonical
ensemble the probability density of finding n particular particles with positions $\mathbf{r}_1, \ldots, \mathbf{r}_n$ in the
range $d\mathbf{r}_1, \ldots, d\mathbf{r}_n$ is given by
$$ P_n(\mathbf{r}_1, \ldots, \mathbf{r}_n) = \sum_{N=0}^{\infty} \frac{z^N}{N! \, Z} \int e^{-\beta U_N} \, d\mathbf{r}_{n+1} \ldots d\mathbf{r}_N, \tag{8.123} $$
where $z = e^{\beta\mu}$. There are N(N − 1) . . . (N − n + 1) = N!/(N − n)! different sets of particles which
can correspond to the n particles. Hence, the total probability that any n particles occupy these
volume elements is given by
$$ P_n(\mathbf{r}_1, \ldots, \mathbf{r}_n) \equiv \rho^n g(\mathbf{r}_1, \ldots, \mathbf{r}_n) \tag{8.124a} $$
$$ = \sum_{N=0}^{\infty} \frac{z^N}{(N-n)! \, Z} \int \cdots \int e^{-\beta U_N} \, d\mathbf{r}_{n+1} \ldots d\mathbf{r}_N. \tag{8.124b} $$
As before, the main quantity of interest is $g(\mathbf{r}_1, \mathbf{r}_2)$.
From the definition of $P_1$ and $P_2$, it follows that
$$ \int P_1(\mathbf{r}_1) \, d\mathbf{r}_1 = \langle N \rangle, \tag{8.125} $$
and
$$ \int\!\!\int P_2(\mathbf{r}_1, \mathbf{r}_2) \, d\mathbf{r}_1 d\mathbf{r}_2 = \langle N(N-1) \rangle. \tag{8.126} $$
We can use (8.126) and (8.125) to obtain
$$ \int\!\!\int \left[ P_2(\mathbf{r}_1, \mathbf{r}_2) - P_1(\mathbf{r}_1) P_1(\mathbf{r}_2) \right] d\mathbf{r}_1 d\mathbf{r}_2 = \langle N^2 \rangle - \langle N \rangle^2 - \langle N \rangle. \tag{8.127} $$
The left-hand side of (8.127) is equal to $V \rho^2 \int [g(r) - 1] \, d^3 r$ for a homogeneous system. Hence we
obtain
$$ \langle N \rangle \rho \int [g(r) - 1] \, d^3 r = \langle N^2 \rangle - \langle N \rangle^2 - \langle N \rangle, \tag{8.128} $$
or
$$ 1 + \rho \int [g(r) - 1] \, d^3 r = \frac{\langle N^2 \rangle - \langle N \rangle^2}{\langle N \rangle}. \tag{8.129} $$
If we use the relation (8.116), we find the desired relation
$$ 1 + \rho \int [g(r) - 1] \, d^3 r = \rho kT K_T. \qquad \text{(compressibility equation)} \tag{8.130} $$
The relation (8.130), known as the compressibility equation, expresses $K_T$ as an integral over g(r)
and holds only in the grand canonical ensemble. From the relation (8.113), we have
$$ S(k = 0) - 1 = \rho \int [g(r) - 1] \, d^3 r, $$
and hence
$$ S(k = 0) = \rho kT K_T. \tag{8.131} $$
As mentioned, the condition (8.131) on S(k = 0) only applies in the grand canonical ensemble.
What is S(k = 0) in the canonical ensemble? Why is the value of S(k = 0) different in these two
ensembles?

8.7.4 Density expansion of g(r)

The density expansion of g(r) is closely related to the density expansion of the free energy. Instead
of deriving the relation here, we show the first few diagrams corresponding to the first two density
contributions. We write
$$ g(r) = e^{-\beta u(r)} y(r), \tag{8.132} $$
and
$$ y(r) = \sum_{n=0}^{\infty} \rho^n y_n(r). \tag{8.133} $$
The function y(r) has the convenient property that y(r) → 1 in the limit of low density. The
diagrams for y(r) have two fixed points represented by open circles corresponding to particles 1
and 2. The other particles are integrated over the volume of the system. The diagrams for $y_1(r)$
and $y_2(r)$ are shown in Figure 8.10. The corresponding integrals are
$$ y_1(r) = \int f(r_{13}) f(r_{23}) \, d\mathbf{r}_3, \tag{8.134} $$
and
$$ y_2(r) = \frac{1}{2} \int \left[ 2 f_{13} f_{34} f_{42} + 4 f_{13} f_{34} f_{42} f_{32} + f_{13} f_{42} f_{32} f_{14} + f_{13} f_{34} f_{42} f_{32} f_{14} \right] d\mathbf{r}_3 \, d\mathbf{r}_4. \tag{8.135} $$
Note that the diagrams in Figure 8.10 and hence the integrals in (8.134) and (8.135) are closely
related to the diagrams for the virial coefficients.

Figure 8.10: The diagrams contributing to (a) $y_1(r)$ and (b) $y_2(r)$.

8.8 Computer Simulation of Liquids

We have already discussed several models that can be solved analytically and be used to increase
our understanding of gases and crystalline solids. For example, the ideal gas can be used as the
starting point for a density expansion, and the harmonic model provides an idealized model of a
solid. For liquids no simple model exists. The closest candidate is the hard sphere model, the
properties of which can be calculated approximately using the various methods described in the
following sections.
The development of a microscopic theory of liquids was hampered for many years by the lack
of a convenient small expansion parameter. One of the first successes of the computer simulations
of simple liquids is that they led to the development of a microscopic theory of liquids based
on the realization that the details of the weak attractive part of the interparticle interaction are
unimportant. The implication of this realization is that g(r) does not depend strongly on the
temperature. As we will find in Problem 8.19, simulations have also taught us that the behavior
of g(r) is also insensitive to the details of the repulsive part of the interaction. Hence g(r) for a
system of hard spheres is a good approximation to g(r) for Lennard-Jones systems at
the same density. Moreover, we can simulate hard sphere systems and obtain essentially exact
solutions for g(r) and the equation of state.
In the Monte Carlo implementation of the canonical ensemble, a particle is chosen at random
and is moved randomly to a new position, and the change in energy of the system, ∆E, is calculated.
Why are the velocities of the particles irrelevant? As discussed in Section 4.11, we can do a Monte
Carlo simulation of the equilibrium properties of a system of particles by using the Metropolis
algorithm. In this algorithm, the trial move is accepted if the change in energy ∆E < 0; otherwise
a random number r between 0 and 1 is generated and the move is accepted if $r \leq e^{-\beta \Delta E}$.
Otherwise, the move is rejected, and the new configuration is identical to the old configuration.
For a system of hard spheres, the rules are even simpler. If the trial position of a particle does
not overlap another sphere, the trial position is accepted; otherwise it is rejected.
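The two acceptance rules just described can be sketched in a few lines. The code below is a minimal illustration, not the applet referred to in the problems: it treats hard disks in two dimensions for brevity, assumes a periodic square box with the minimum-image convention, and uses hypothetical names (`metropolis_accept`, `overlaps`, `hard_disk_sweep`) and parameter values. For hard disks ∆E is either 0 or infinite, so the Metropolis rule reduces to the overlap test.

```python
import math
import random


def metropolis_accept(delta_e, beta, rng):
    """Accept if delta_e <= 0; otherwise with probability exp(-beta*delta_e)."""
    if delta_e <= 0.0:
        return True
    return rng.random() <= math.exp(-beta * delta_e)


def overlaps(positions, index, trial, sigma, box_length):
    """True if a disk of diameter sigma at `trial` overlaps any other disk."""
    for j, (x, y) in enumerate(positions):
        if j == index:
            continue
        dx = trial[0] - x
        dy = trial[1] - y
        dx -= box_length * round(dx / box_length)  # minimum image
        dy -= box_length * round(dy / box_length)
        if dx * dx + dy * dy < sigma * sigma:
            return True
    return False


def hard_disk_sweep(positions, sigma, box_length, max_step, rng):
    """One Monte Carlo sweep (N trial moves); returns the acceptance fraction."""
    n = len(positions)
    accepted = 0
    for _ in range(n):
        i = rng.randrange(n)
        x, y = positions[i]
        trial = ((x + rng.uniform(-max_step, max_step)) % box_length,
                 (y + rng.uniform(-max_step, max_step)) % box_length)
        if not overlaps(positions, i, trial, sigma, box_length):
            positions[i] = trial
            accepted += 1
    return accepted / n


rng = random.Random(42)
sigma, box_length = 1.0, 6.0
# Start from a 4 x 4 square lattice with spacing 1.5, which has no overlaps.
positions = [(1.5 * i + 0.75, 1.5 * j + 0.75) for i in range(4) for j in range(4)]
acceptance = [hard_disk_sweep(positions, sigma, box_length, 0.3, rng) for _ in range(50)]
print(sum(acceptance) / len(acceptance))
```

In practice the maximum step size is usually tuned so that a reasonable fraction of the trial moves is accepted; the value 0.3 used here is arbitrary.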
Problem 8.19. Use the applet at xx to determine the behavior of g(r) by doing Monte Carlo
simulations for different interactions, densities, and temperatures. (a) For a given interaction,
describe qualitatively how g(r) changes with the density ρ at a given temperature T. (b) How does
g(r) change with T for a given ρ? (c) Consider the Lennard-Jones and hard sphere interactions
and confirm that the form of g(r) depends more on the repulsive part of the interaction and only
weakly on the attractive part. (d) Confirm that the function y(r) introduced in (8.132) depends
less sensitively on the interaction than does g(r) itself.

8.9 Perturbation Theory of Liquids

The physical picture of a liquid that we developed in Section 8.8 from several computer simulations
suggests that the repulsive part of the interaction dominates the structure of a liquid, and the
attractive part of the interaction can be treated as a perturbation. In the following we develop a
perturbation theory of liquids in which the unperturbed or reference system is taken to be a system
of hard spheres rather than an ideal gas. The idea is that the eﬀect of the diﬀerence between the
hard sphere interaction and the real interaction should be small. We will see that if we choose the
eﬀective diameter of the hard spheres in a clever way, the diﬀerence can be minimized.
We begin by writing the potential energy as
$$U = U_{\rm ref} + U_1, \tag{8.136}$$
where $U_{\rm ref}$ is the potential energy of the reference system, and $U_1$ will be treated as a perturbation.
The configurational integral $Q_N$ can be written as
$$Q_N = \int\!\cdots\!\int e^{-\beta(U_{\rm ref}+U_1)}\, d\mathbf{r}_1 \ldots d\mathbf{r}_N. \tag{8.137}$$
We multiply and divide the right-hand side of (8.137) by
$$Q_{\rm ref} = \int\!\cdots\!\int e^{-\beta U_{\rm ref}}\, d\mathbf{r}_1 \ldots d\mathbf{r}_N, \tag{8.138}$$
and write
$$\begin{aligned}
Q_N &= Q_{\rm ref}\, \frac{\int\!\cdots\!\int e^{-\beta(U_{\rm ref}+U_1)}\, d\mathbf{r}_1 \ldots d\mathbf{r}_N}{\int\!\cdots\!\int e^{-\beta U_{\rm ref}}\, d\mathbf{r}_1 \ldots d\mathbf{r}_N} \\
&= Q_{\rm ref} \int\!\cdots\!\int P_{\rm ref}\, e^{-\beta U_1}\, d\mathbf{r}_1 \ldots d\mathbf{r}_N,
\end{aligned} \tag{8.139}$$
where
$$P_{\rm ref} = \frac{e^{-\beta U_{\rm ref}}}{Q_{\rm ref}}. \tag{8.140}$$
We see that we have expressed $Q_N$ in (8.139) in terms of the average of $\exp(-\beta U_1)$ over the reference
system. We write
$$Q_N = Q_{\rm ref}\, \langle e^{-\beta U_1} \rangle_{\rm ref}, \tag{8.141}$$
and
$$-\beta F_1 = \ln \langle e^{-\beta U_1} \rangle_{\rm ref} \tag{8.142}$$
$$\phantom{-\beta F_1} = \sum_{n=1}^{\infty} \frac{(-\beta)^n M_n}{n!}. \tag{8.143}$$
The brackets $\langle \ldots \rangle_{\rm ref}$ denote an average over the microstates of the reference system. That is,
the microstates are weighted by the probability $P_{\rm ref}$. Note that in Section 8.2, the brackets $\langle \ldots \rangle$
included a factor 1/V with each integral over r. This factor is omitted in this section and in the
following.

Problem 8.20. Show that if we choose the reference system to be an ideal gas, the above expressions reduce to formal expressions similar to those considered in Section 8.2.
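The leading term in (8.143) is
$$\begin{aligned}
M_1 = \langle U_1 \rangle_{\rm ref} &= \sum_{i<j=1}^{N} \langle u_1(r_{ij}) \rangle_{\rm ref} \\
&= \frac{N(N-1)}{2}\, \frac{1}{Q_{\rm ref}} \int\!\cdots\!\int e^{-\beta U_{\rm ref}}\, u_1(r_{12})\, d\mathbf{r}_1 \ldots d\mathbf{r}_N.
\end{aligned} \tag{8.144}$$
The radial distribution function of the reference system is given by
$$\rho^2 g_{\rm ref}(r_{12}) = N(N-1)\, \frac{\int\!\cdots\!\int e^{-\beta U_{\rm ref}}\, d\mathbf{r}_3 \ldots d\mathbf{r}_N}{Q_{\rm ref}}. \tag{8.145}$$
Hence
$$M_1 = \frac{\rho^2}{2} \int\!\!\int u_1(r_{12})\, g_{\rm ref}(r_{12})\, d\mathbf{r}_1\, d\mathbf{r}_2 = \frac{\rho N}{2} \int u_1(r)\, g_{\rm ref}(r)\, d^3r, \tag{8.146}$$
and
$$F = F_{\rm ref} + \frac{\rho N}{2} \int u_1(r)\, g_{\rm ref}(r)\, d^3r. \tag{8.147}$$
Note that $F_{\rm ref}$ in (8.147) is the free energy of the reference system. We evaluate (8.147) with
increasing sophistication in the following two subsections.

8.9.1 The van der Waals Equation

As we have mentioned, the reason for the success of the reference theory of liquids is that the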
increasing sophistication in the following two subsections. 8.9.1 The van der Waals Equation As we have mentioned, the reason for the success of the reference theory of liquids is that the
structure of a simple liquid is determined primarily by the repulsive part of the potential. The
main eﬀect of the remaining part of the potential is to provide a uniform potential in which the
particles move. This idea is not new and is the basis of the van der Waals equation of state. We
now show how the van der Waals equation of state can be derived from the perturbation theory
developed in Section 8.9 about the hard sphere reference potential.
If we assume that $g_{\rm ref}(r)$ has the simple form
$$g_{\rm ref}(r) = \begin{cases} 0, & r < \sigma \\ 1, & r \ge \sigma, \end{cases} \tag{8.148}$$
then $M_1$ in (8.146) can be written as
$$M_1 = 2\pi\rho N \int_{\sigma}^{\infty} u_1(r)\, r^2\, dr = -\rho a N, \tag{8.149}$$
where
$$a = -2\pi \int_{\sigma}^{\infty} u_1(r)\, r^2\, dr. \tag{8.150}$$
Our next job is to approximate $F_{\rm ref}$. Until recently, the equation of state of hard spheres was
not known very accurately. A simple way of approximating Fref is based on the assumption that
the eﬀective volume available to a particle in a ﬂuid is smaller than the volume available in an ideal
gas. In this spirit we assume that Fref has the same form as it does for an ideal gas (see (6.28)),
but with V replaced by Veﬀ . We write
$$\frac{F_{\rm ref}}{NkT} = 3\ln\lambda - 1 - \ln V_{\rm eff} + \ln N, \tag{8.151}$$
where
$$V_{\rm eff} = V - V_0, \tag{8.152}$$
and
$$V_0 = Nb = \frac{1}{2}\, N\, \frac{4\pi\sigma^3}{3}. \tag{8.153}$$
In (8.153) we have accounted for the fact that only half of the volume of a sphere can be assigned
to a given particle. The corresponding equation of state with the above approximations for $M_1$,
$F_{\rm ref}$, and $V_{\rm eff}$ is given by
$$\frac{PV}{NkT} = \frac{1}{1 - b\rho} - \frac{a\rho}{kT}. \qquad \text{(van der Waals equation)} \tag{8.154}$$
Equation (8.154) is the familiar van der Waals equation of state and gives results that are in poor
agreement with experimental data, especially if a is calculated using (8.150). Better results can
be obtained by regarding a and b as free parameters chosen to ﬁt a measured equation of state.
However, agreement with experimental data is only qualitative. Nevertheless, the van der Waals
equation of state predicts a liquid-gas transition as we will discuss in Section xx.

8.9.2 Chandler-Weeks-Andersen theory

We now develop a more systematic way of choosing the reference potential. The most appropriate
way of dividing the interparticle potential into a reference part and a perturbative part is not
obvious. One way is to separate the potential into positive and negative contributions. This choice
implies that we should separate the Lennard-Jones potential at r = σ. Another way is to separate
the potential into a part containing all the repulsive forces and a part containing all the attractive
forces. This choice is the one adopted by the Chandler-Weeks-Andersen theory. We write the
Lennard-Jones potential as
$$u(r) = u_{\rm ref}(r) + u_1(r), \tag{8.155}$$
and add and subtract a factor of $\epsilon$, the depth of the Lennard-Jones well, to write:
$$u_{\rm ref}(r) = \begin{cases} u(r) + \epsilon, & r < 2^{1/6}\sigma \\ 0, & r \ge 2^{1/6}\sigma \end{cases} \tag{8.156}$$
$$u_1(r) = \begin{cases} -\epsilon, & r < 2^{1/6}\sigma \\ u(r), & r \ge 2^{1/6}\sigma. \end{cases} \tag{8.157}$$
Note that we have added and subtracted a factor of $\epsilon$ to u(r) for $r < 2^{1/6}\sigma$.

Problem 8.21. (a) Plot the dependence of $u_1(r)$ on r and confirm that $u_1(r)$ is a slowly varying
function of r.
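The separation of the Lennard-Jones potential at its minimum can be checked with a short Python sketch. The function names and the default units ε = σ = 1 are choices made for this illustration; the only physics used is the standard Lennard-Jones form $u(r) = 4\epsilon[(\sigma/r)^{12} - (\sigma/r)^{6}]$, whose minimum is at $r = 2^{1/6}\sigma$ with depth $-\epsilon$.

```python
R_MIN_FACTOR = 2.0 ** (1.0 / 6.0)  # position of the Lennard-Jones minimum in units of sigma

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Full Lennard-Jones pair potential u(r)."""
    x6 = (sigma / r) ** 6
    return 4.0 * epsilon * (x6 * x6 - x6)

def u_ref(r, epsilon=1.0, sigma=1.0):
    """Reference part: purely repulsive, shifted up by epsilon, zero beyond the minimum."""
    if r < R_MIN_FACTOR * sigma:
        return lennard_jones(r, epsilon, sigma) + epsilon
    return 0.0

def u_1(r, epsilon=1.0, sigma=1.0):
    """Perturbation: constant -epsilon inside the minimum, the attractive tail outside."""
    if r < R_MIN_FACTOR * sigma:
        return -epsilon
    return lennard_jones(r, epsilon, sigma)

if __name__ == "__main__":
    for r in (0.95, 1.0, R_MIN_FACTOR, 1.5, 2.5):
        print(r, u_ref(r), u_1(r), u_ref(r) + u_1(r) - lennard_jones(r))
```

Both pieces are continuous at $r = 2^{1/6}\sigma$, and all of the repulsive force (negative slope of u) is carried by the reference part.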
Because the reference system in the Chandler-Weeks-Andersen theory is not a system of hard
spheres, the properties of the reference system are not well known and further approximations
are necessary. In the Chandler-Weeks-Andersen approach, the reference system is approximated by
hard spheres with a temperature- and density-dependent diameter d. As we have seen, the function
$y(r) = e^{\beta u(r)} g(r)$ is a slowly varying function whose shape does not depend sensitively on the form
of the interaction. In the Chandler-Weeks-Andersen theory y(r) and the reference free energy are
approximated by their hard sphere forms:
$$g_{\rm ref}(r) \approx e^{-\beta u_{\rm ref}(r)}\, y_{\rm hs}(r, d), \tag{8.158}$$
and
$$F_{\rm ref} = F_{\rm hs}(d). \tag{8.159}$$
$F_{\rm hs}$ and $y_{\rm hs}$ are the free energy and the distribution function for hard spheres of diameter d.
The value of d at a given value of ρ and T is found by requiring that the compressibility of the
reference system equal the compressibility of a hard sphere system of diameter d via the relation
(see (8.130)):
$$kT\, \frac{\partial \rho}{\partial P} = 1 + \rho \int [g(r) - 1]\, d\mathbf{r}. \tag{8.160}$$
We write
$$\int \big[e^{-\beta u_{\rm ref}(r)}\, y_{\rm hs}(r, d) - 1\big]\, d\mathbf{r} = \int \big[e^{-\beta u_{\rm hs}(r)}\, y_{\rm hs}(r, d) - 1\big]\, d\mathbf{r}, \tag{8.161}$$
where $u_{\rm hs}$ is the hard sphere interaction. The numerical solution of (8.161) yields a unique value
of d as a function of T and ρ.

8.10 *The Ornstein-Zernike Equation

We have seen that we can derive a density expansion for the pressure and for the radial distribution
function g (r). However, we do not expect to obtain very accurate results by keeping only the ﬁrst
several terms. An alternative procedure is to express g (r) in terms of the direct correlation function
c(r). The latter function is generally a more slowly varying function of r. An approximation for c(r) then
yields a better approximation for g(r) than would be found by approximating g(r) directly.
Because $g(r) \to 1$ for $r \gg 1$, it is convenient to define the function h(r) by the relation
$$h(r) = g(r) - 1 \tag{8.162}$$
so that $h(r) \to 0$ for sufficiently large r and h(r) = 0 for an ideal gas. The function h(r) describes
the correlations between two particles. It also is convenient to introduce a third correlation function, c(r), known as the direct correlation function. We can think of the total correlation between
particles 1 and 2 as due to the direct correlation between them and the indirect correlation due
to the presence of increasingly larger numbers of intermediate particles. We define c(r) by the
relation (for a homogeneous and isotropic system):
$$h(r) = c(r) + \rho \int c(|\mathbf{r} - \mathbf{r}'|)\, h(r')\, d\mathbf{r}'. \qquad \text{(Ornstein-Zernike equation)} \tag{8.163}$$
The relation (8.163) was first proposed in 1914 and is known as the Ornstein-Zernike equation. It
is plausible that the range of c(r) is comparable to the range of the potential u(r), and that h(r)
is longer ranged than u(r) due to the effects of the indirect correlations. That is, in general, c(r)
has a much shorter range than h(r). To lowest order in the density, c(r) = f(r) and for large r,
$c(r) = -\beta u(r)$ as expected.
We can find c(r) from h(r) by introducing the Fourier transforms
$$c(k) = \int c(r)\, e^{i\mathbf{k}\cdot\mathbf{r}}\, d\mathbf{r}, \tag{8.164}$$
and
$$h(k) = \int h(r)\, e^{i\mathbf{k}\cdot\mathbf{r}}\, d\mathbf{r}. \tag{8.165}$$
If we take the Fourier transform of both sides of (8.163), we find that
$$h(k) = c(k) + \rho\, c(k)\, h(k), \tag{8.166}$$
or
$$c(k) = \frac{h(k)}{1 + \rho h(k)}, \tag{8.167}$$
and
$$h(k) = \frac{c(k)}{1 - \rho c(k)}. \tag{8.168}$$

8.11 *Integral Equations for g(r)

The Ornstein-Zernike equation can be used to motivate several approximate integral equations
for g(r) that are applicable to dense liquids. The most useful of these equations for systems with
short-range repulsive interactions is known as the Percus-Yevick equation. Although this equation
can be derived using graph theory and corresponds to ignoring an (inﬁnite) subset of diagrams
for c(r), the derivation does not add much physical insight. Instead we will only summarize the
nature of the equation.
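Before turning to specific closures, it may help to see how little machinery the Fourier-space form of the Ornstein-Zernike relation needs. The sketch below is an illustration only: the function names are invented here, and the inputs are plain numbers standing for c(k) or h(k) at a single wave number. It also sums the chain c + ρc² + ρ²c³ + ..., which is the "direct plus increasingly indirect" picture of the total correlation written as a geometric series (convergent when |ρ c(k)| < 1).

```python
def h_from_c(c_k, rho):
    """Total correlation h(k) from the direct correlation c(k): h = c / (1 - rho c)."""
    return c_k / (1.0 - rho * c_k)

def c_from_h(h_k, rho):
    """Direct correlation c(k) from the total correlation h(k): c = h / (1 + rho h)."""
    return h_k / (1.0 + rho * h_k)

def h_series(c_k, rho, n_terms):
    """Partial sum c + rho c^2 + rho^2 c^3 + ... of chains of direct correlations."""
    total, term = 0.0, c_k
    for _ in range(n_terms):
        total += term
        term *= rho * c_k
    return total

if __name__ == "__main__":
    c_k, rho = -0.8, 0.5  # arbitrary illustrative values with |rho * c_k| < 1
    h_k = h_from_c(c_k, rho)
    print(h_k, c_from_h(h_k, rho), h_series(c_k, rho, 60))
```

The first two functions are inverses of each other, so either h(k) or c(k) can be taken as the primary unknown once a closure relation is supplied.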
Percus-Yevick equation. It is convenient to consider the function y(r) defined by (8.132) instead
of the function g(r) because y(r) is more slowly varying. One way to motivate the Percus-Yevick
equation is to note that to first order in the density, $y(r) = 1 + \rho y_1(r)$, where $y_1$ is given by (8.134).
Similarly, the lowest order density contributions to c(r) are $c_0(r) = f(r)$ and $c_1(r) = f(r) y_1(r)$.
We generalize this relation between c(r) and y(r) and assume that
$$c(r) = f(r)\, y(r) = \big[1 - e^{\beta u(r)}\big]\, g(r). \tag{8.169}$$
Equation (8.169) is correct to first order in the density. Note that the c(r) as given by (8.169) is a
short-range function, that is, it has the same range as f(r) or u(r). If we substitute the approximation
(8.169) into the Ornstein-Zernike equation, we find
$$y(r_{12}) = 1 + \rho \int f(r_{13})\, y(r_{13})\, h(r_{23})\, d\mathbf{r}_3. \qquad \text{(Percus-Yevick equation)} \tag{8.170}$$
Equation (8.170) is an example of a nonlinear integral equation. In general, the Percus-Yevick
equation must be solved numerically, but it can be solved analytically for hard spheres (see Appendix 8C).
The analytical solution of the Percus-Yevick equation for hard spheres can be expressed as
$$c(r) = \begin{cases} -(1 - \eta)^{-4} \Big[ (1 + 2\eta)^2 - 6\eta \big(1 + \tfrac{1}{2}\eta\big)^2 (r/\sigma) + \tfrac{1}{2}\eta\, (1 + 2\eta)^2 (r/\sigma)^3 \Big], & r < \sigma \\ 0, & r > \sigma. \end{cases} \tag{8.171}$$
Given c(r), we can find g(r). Because the derivation is tedious, we will omit it and give only
the result for g(r) at contact:
$$g(\sigma) = \lim_{\epsilon \to 0^+} g(r = \sigma + \epsilon) = \frac{1 + \tfrac{1}{2}\eta}{(1 - \eta)^2}. \tag{8.172}$$
Given the Percus-Yevick result for g(σ) we can find the corresponding approximate equation
of state for hard spheres from the virial equation of state (8.105). For a discontinuous potential
we can simplify the form of (8.105). We write
$$\frac{du}{dr}\, g(r) = \frac{du}{dr}\, e^{-\beta u(r)}\, y(r) = -\frac{1}{\beta}\, \frac{d\, e^{-\beta u(r)}}{dr}\, y(r). \tag{8.173}$$
For hard spheres we know that $e^{-\beta u(r)} = 0$ for r < σ and $e^{-\beta u(r)} = 1$ for r > σ. Hence, we can
write $e^{-\beta u(r)} = \Theta(r - \sigma)$ and $d e^{-\beta u(r)}/dr = \delta(r - \sigma)$. Using this result we have that the pressure
of a hard sphere fluid is given by
$$P = \rho kT \Big[ 1 + \frac{2\pi}{3}\, \rho\sigma^3 g(\sigma) \Big]. \tag{8.174}$$
If we substitute (8.172) into (8.174), we find the virial equation of state for a hard sphere fluid
$$P = \rho kT\, \frac{1 + 2\eta + 3\eta^2}{(1 - \eta)^2}, \qquad \text{(virial equation of state)} \tag{8.175}$$
where $\eta = \pi\rho\sigma^3/6$.
An alternative way of calculating the pressure is to use the compressibility relation (8.130).
We can write (8.130) in terms of c(r) rather than g(r) − 1 by using the Ornstein-Zernike equation:
$$g(r) - 1 = c(r) + \rho \int c(|\mathbf{r} - \mathbf{r}'|)\, [g(r') - 1]\, d^3r'. \tag{8.176}$$
We multiply both sides of this equation by $d^3r$ and integrate, noting that $\int c(|\mathbf{r} - \mathbf{r}'|)\, d^3r = \int c(r)\, d^3r$,
and rearrange the results to find
$$\int [g(r) - 1]\, d^3r = \frac{\int c(r)\, d^3r}{1 - \rho \int c(r)\, d^3r}. \tag{8.177}$$
Finally we combine (8.177) with (8.130) to find
$$\Big( kT\, \frac{\partial \rho}{\partial P} \Big)^{-1} = 1 - \rho \int c(r)\, d^3r. \tag{8.178}$$
If we substitute the Percus-Yevick result (8.171) into (8.178) and integrate, we find
$$P = \rho kT\, \frac{1 + \eta + \eta^2}{(1 - \eta)^3}. \qquad \text{(compressibility equation of state)} \tag{8.179}$$
If the Percus-Yevick equation were exact, the two ways of obtaining the equation of state would
yield identical results.
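The difference between the two routes is easy to see numerically. The following Python sketch evaluates both Percus-Yevick equations of state and also checks the compressibility route directly by integrating the hard-sphere direct correlation function. Two assumptions should be flagged: the function names are invented for this illustration, and c(r) is taken in its standard analytic form for hard spheres, with the factor $(1+2\eta)^2$ in the constant term and a cubic last term in $r/\sigma$. The integral uses a simple midpoint rule in units with σ = 1.

```python
import math

def z_virial(eta):
    """P/(rho kT) from the virial route."""
    return (1.0 + 2.0 * eta + 3.0 * eta ** 2) / (1.0 - eta) ** 2

def z_compressibility(eta):
    """P/(rho kT) from the compressibility route."""
    return (1.0 + eta + eta ** 2) / (1.0 - eta) ** 3

def c_py(x, eta):
    """Percus-Yevick direct correlation function for hard spheres, x = r/sigma."""
    if x >= 1.0:
        return 0.0
    a = (1.0 + 2.0 * eta) ** 2
    b = (1.0 + 0.5 * eta) ** 2
    return -(a - 6.0 * eta * b * x + 0.5 * eta * a * x ** 3) / (1.0 - eta) ** 4

def inverse_compressibility_from_c(eta, n=4000):
    """1 - rho * integral of c(r) d^3r, by the midpoint rule with sigma = 1."""
    rho = 6.0 * eta / math.pi
    dx = 1.0 / n
    integral = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        integral += 4.0 * math.pi * x * x * c_py(x, eta) * dx
    return 1.0 - rho * integral

def inverse_compressibility_exact(eta):
    """d(beta P)/d(rho) obtained by differentiating the compressibility equation of state."""
    return (1.0 + 2.0 * eta) ** 2 / (1.0 - eta) ** 4

if __name__ == "__main__":
    for eta in (0.1, 0.3, 0.45):
        print(eta, z_virial(eta), z_compressibility(eta),
              inverse_compressibility_from_c(eta), inverse_compressibility_exact(eta))
```

At η = 0.3, for example, the two routes give roughly 3.82 and 4.05, so the inconsistency grows noticeable well before the densities at which hard spheres freeze; at low density both reduce to 1 + 4η.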
The Percus-Yevick equation gives reasonable predictions for g(r) and the equation of state for
fluid densities. However, it predicts finite pressures for η < 1 even though the maximum density
is $\rho_{\max} = \sqrt{2}/\sigma^3$ or $\eta_{\max} = \sqrt{2}\,\pi/6 \approx 0.74$.
Mean spherical approximation. Another simple closure approximation for the Ornstein-Zernike equation can be motivated by the following considerations. Consider a fluid whose particles interact via a pair potential of the form
$$u(r) = \begin{cases} \infty, & r < \sigma \\ v(r), & r > \sigma \end{cases} \tag{8.180}$$
where v(r) is a continuous function of r. Because of the hard sphere cutoff, g(r) = 0 for r < σ.
For large r the Percus-Yevick approximation for c(r) reduces to
$$c(r) = -\beta v(r). \tag{8.181}$$
The mean spherical approximation is based on the assumption that (8.181) holds not just for large
r, but for all r. Hence, the mean spherical approximation is
$$c(r) = -\beta v(r), \qquad r > \sigma \tag{8.182a}$$
and
$$g(r) = 0, \qquad r < \sigma \tag{8.182b}$$
together with the Ornstein-Zernike equation.
The hypernetted-chain approximation is another useful integral equation for g(r). It is
equivalent to setting
$$c(r) = f(r)\, y(r) + y(r) - 1 - \ln y(r). \qquad \text{(hypernetted-chain equation)} \tag{8.183}$$
If we analyze the Percus-Yevick and the hypernetted-chain approximations in terms of a virial
expansion (see Problem 8.32), we find that the hypernetted-chain approximation appears to be
more accurate than the Percus-Yevick approximation. However, it turns out that the Percus-Yevick approximation works better for hard spheres and other short-range potentials. In contrast,
the hypernetted-chain approximation works better for the one-component plasma and other long-range potentials.

8.12 *Coulomb Interactions

The virial or density expansion of the free energy and other thermodynamic quantities is not
applicable to a gas consisting of charged particles interacting via the Coulomb potential. For
example, we found in Problem 8.8 that the second virial coefficient does not exist if the interparticle
potential u(r) decreases less rapidly than $1/r^3$ at large distances. This divergence is symptomatic of
the fact that a density expansion is meaningless for a system of particles interacting via long-range
forces.

8.12.1 Debye-Hückel Theory

The simplest model of a system of particles interacting via the Coulomb potential is a gas of
mobile electrons moving in a fixed, uniform, positive background. The charge density of the
positive background is chosen to make the system overall neutral. Such a system is known as an
electron gas or a one-component plasma.
Debye and Hückel developed a theory that includes the interactions between charged particles
in an approximate, but very clever way. Consider an electron at r = 0. The average electric
potential φ(r) in the neighborhood of r = 0 is given by Poisson's equation:
$$\begin{aligned} \nabla^2 \phi(\mathbf{r}) = -4\pi \big[ &(\text{negative point charge at origin}) \\ &+ (\text{density of positive uniform background}) \\ &+ (\text{density of negative electron gas charge}) \big] \end{aligned} \tag{8.184}$$
or
$$\nabla^2 \phi(\mathbf{r}) = -4\pi \big[ -e\,\delta(\mathbf{r}) + e\rho - e\rho(\mathbf{r}) \big], \tag{8.185}$$
where ρ is the mean number density of the electrons and the uniform positive background, and
ρ(r) is the number density of the electrons in the vicinity of r = 0. We assume that to a good
approximation ρ(r) is given by the Boltzmann factor
$$\rho(\mathbf{r}) = \rho\, e^{e\phi(\mathbf{r})/kT}. \tag{8.186}$$
If we combine (8.185) and (8.186), we obtain the Poisson-Boltzmann equation for φ(r):
$$\nabla^2 \phi(\mathbf{r}) = 4\pi e \big[ \delta(\mathbf{r}) - \rho + \rho\, e^{e\phi(\mathbf{r})/kT} \big]. \tag{8.187}$$
Equation (8.187) is a nonlinear differential equation for φ(r). For $|e\phi|/kT \ll 1$, we can write
$e^{\beta e\phi(\mathbf{r})} \approx 1 + \beta e\phi(\mathbf{r})$. In this way we obtain the linear Poisson-Boltzmann equation
$$(\nabla^2 - \kappa^2)\, \phi(\mathbf{r}) = 4\pi e\, \delta(\mathbf{r}), \tag{8.188}$$
where $\kappa^2$ is given by
$$\kappa^2 = \frac{4\pi\rho e^2}{kT}. \tag{8.189}$$
We look for solutions to (8.188) that are spherically symmetric. The solution of (8.188) can be
shown to be
$$\phi(r) = -e\, \frac{e^{-\kappa r}}{r}. \tag{8.190}$$
The energy of the other electrons in the potential φ is $u = -e\phi$ so that the effective energy of
interaction is given by
$$u_{\rm eff}(r) = e^2\, \frac{e^{-\kappa r}}{r}. \tag{8.191}$$
The result (8.190) or (8.191) shows that the electrons collaborate in such a way as to screen the
potential of a given electron over a distance $\lambda_D = 1/\kappa$. The quantity $\lambda_D$ is called the Debye length.
Note that to use statistical arguments, it is necessary that many particles have approximately
the same potential energy, that is, many particles need to be within the range $\lambda_D$ of φ. This
requirement can be written as $\rho\lambda_D^3 \gg 1$ or
$$\rho\lambda_D^3 = \Big( \frac{kT}{4\pi e^2 \rho^{1/3}} \Big)^{3/2} \gg 1. \tag{8.192}$$
The condition (8.192) holds in the limit of low density and high temperature.
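The screening length and the validity condition can be explored with a small Python sketch. The function names, the Gaussian-units convention with e = 1 by default, and the sample values of ρ and kT are assumptions made for illustration. The last function is a finite-difference check that the screened potential satisfies the linearized equation away from the origin, where for a spherically symmetric function $\nabla^2\phi = (1/r)\, d^2(r\phi)/dr^2$.

```python
import math

def kappa(rho, kT, e=1.0):
    """Inverse screening length from kappa^2 = 4 pi rho e^2 / kT (Gaussian units)."""
    return math.sqrt(4.0 * math.pi * rho * e * e / kT)

def debye_length(rho, kT, e=1.0):
    """Debye length lambda_D = 1 / kappa."""
    return 1.0 / kappa(rho, kT, e)

def phi(r, rho, kT, e=1.0):
    """Screened potential -e exp(-kappa r) / r around an electron at the origin."""
    return -e * math.exp(-kappa(rho, kT, e) * r) / r

def particles_in_debye_volume(rho, kT, e=1.0):
    """The quantity rho * lambda_D^3, which must be large for the theory to apply."""
    return rho * debye_length(rho, kT, e) ** 3

def radial_residual(r, rho, kT, e=1.0, dr=1e-3):
    """Finite-difference value of (r phi)'' - kappa^2 (r phi); close to zero for r > 0."""
    f = lambda s: s * phi(s, rho, kT, e)
    second = (f(r + dr) - 2.0 * f(r) + f(r - dr)) / dr ** 2
    return second - kappa(rho, kT, e) ** 2 * f(r)

if __name__ == "__main__":
    for rho in (1e-4, 1e-2, 1.0):  # illustrative densities at fixed kT = 2
        print(rho, debye_length(rho, 2.0), particles_in_debye_volume(rho, 2.0))
```

Running the loop shows $\rho\lambda_D^3$ falling as the density rises at fixed temperature, which is the sense in which the condition favors low density and high temperature.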
Problem 8.22. Show that (8.192) is equivalent to the condition that the mean interaction energy
is much smaller than the mean kinetic energy.

8.12.2 Linearized Debye-Hückel approximation

The Ornstein-Zernike equation is exact, but h(r) and c(r) are not known in general and some type of
closure approximation must be made. The simplest approximation is to assume that c(r) is given
by
$$c(r) = -\beta u(r) \tag{8.193}$$
for all r. For the one-component plasma, $u(r) = e^2/r$, $u(k) = 4\pi e^2/k^2$, and (8.193) implies that
$$c(k) = -\beta\, \frac{4\pi e^2}{k^2}. \tag{8.194}$$
If we substitute (8.194) into (8.168), we find
$$h(k) = -\frac{4\pi\beta e^2}{k^2 + \kappa^2}, \tag{8.195}$$
where $\kappa^2 = 4\pi\beta\rho e^2$. Hence, we have
$$h(r) = -\frac{\beta e^2}{r}\, e^{-\kappa r}, \tag{8.196}$$
and
$$g(r) = 1 - \frac{\beta e^2}{r}\, e^{-\kappa r}. \tag{8.197}$$
Note that in this approximation g(r) becomes negative at small r. This failure can be corrected
by letting
$$g(r) = \exp\Big( -\frac{\beta e^2}{r}\, e^{-\kappa r} \Big). \tag{8.198}$$
Equation (8.198) is the Debye-Hückel form of g(r).

Figure 8.11: The first several diagrams for the high temperature expansion of g(r). The bonds
represent the Coulomb interaction $u(r) = e^2/r$.

8.12.3 Diagrammatic Expansion for Charged Particles

Recall that we derived the density expansion of the free energy by first doing a high temperature
expansion. Although individual terms in this expansion diverge at short separations for systems
with a strongly repulsive interaction, the divergence can be removed by rearranging the expansion
so that all terms of a given density were grouped together and summed. However, an expansion in
terms of the density and the interaction of small numbers of particles makes no sense if the interaction is long-range. We now discuss how to group the individual terms in the high temperature
expansion so that the divergence at large separations for the Coulomb potential is removed.
Instead of considering the diagrammatic expansion for the free energy, we consider the diagrammatic expansion for g(r), because the latter is easier to interpret physically. It is convenient
to write g(r) in the form
$$g(r) = e^{-\beta\psi(r)}, \tag{8.199}$$
where ψ (r) is the potential of mean force. Instead of deriving the high temperature or low density
expansion for ψ (r), we will use our intuition to write down what we think are the relevant diagrams.
Some typical diagrams for ψ (r) are shown in Figure 8.11. It is easy to show that the contribution
of the individual diagrams diverges for the Coulomb potential. Recall that we obtained the low
density expansion by ﬁrst summing up all possible u bonds between two particles. Because the
Coulomb interaction is long-range, it is plausible that we should first sum up the infinite class of
diagrams corresponding to the maximum number of particles for a given number of bonds. The
first several diagrams of this class are shown in Figure 8.12 and are called ring diagrams. Because we know
that the Coulomb interaction is screened, we expect that we should add up the contributions of
all the ring diagrams before we include the contributions of other diagrams.
The contribution of the ring diagrams to ψ(r) is given by
$$\begin{aligned} -\beta\psi(r) = -\beta u(r) &+ \rho \int [-\beta u(r_{13})][-\beta u(r_{32})]\, d^3r_3 \\ &+ \rho^2 \int\!\!\int [-\beta u(r_{13})][-\beta u(r_{34})][-\beta u(r_{42})]\, d^3r_3\, d^3r_4 + \ldots, \end{aligned} \tag{8.200}$$
where $r = r_{12}$. The structure of the integrals in (8.200) is the same as the convolution integral
discussed in Appendix 8A. We follow the same procedure and take the spatial Fourier transform
of both sides of (8.200). The result can be written as
$$-\beta\psi(k) = -\beta u(k) + \rho\, [-\beta u(k)]^2 + \rho^2\, [-\beta u(k)]^3 + \ldots, \tag{8.201}$$
where u(k) is the Fourier transform of $u(r) = e^2/r$. In Problem 8.34 it is shown that
$$u(k) = \frac{4\pi e^2}{k^2}. \tag{8.202}$$

Figure 8.12: The first several ring diagrams for g(r). The bonds represent the Coulomb interaction
$u(r) = e^2/r$.

Because the terms in (8.201) form a geometrical series, we obtain
$$-\beta\psi(k) = \frac{-\beta u(k)}{1 + \rho\beta u(k)} = -\frac{\beta\, 4\pi e^2}{k^2 + \rho\beta\, 4\pi e^2} \tag{8.203}$$
$$\phantom{-\beta\psi(k)} = -\beta\, \frac{4\pi e^2}{k^2 + \kappa^2}, \tag{8.204}$$
where κ is given by (8.189). Hence
$$\psi(k) = \frac{4\pi e^2}{k^2 + \kappa^2} \tag{8.205}$$
and
$$\psi(r) = \frac{e^2}{r}\, e^{-\kappa r}. \tag{8.206}$$
The form of (8.206) implies that, to first order in βψ,
$$g(r) - 1 = -\frac{e^2}{kT\, r}\, e^{-\kappa r}. \tag{8.207}$$
From (8.207) we see that the effective interaction is the Debye-Hückel screened potential. Note
that the screening of the interaction between any two electrons is due to the collective effect of all
the electrons.

Problem 8.23. The total energy per particle of the one-component plasma is given by
$$\frac{E}{N} = \frac{3}{2}\, kT + \frac{\rho}{2} \int [g(r) - 1]\, u(r)\, d^3r. \tag{8.208}$$
The factor of −1 on the right-hand side of (8.208) is due to the uniform positive background.
Substitute the approximation (8.207) for g(r) and show that
$$\frac{E}{N} = \frac{3}{2}\, kT - \frac{1}{2}\, \frac{e^2}{\lambda_D}. \tag{8.209}$$

8.13 Vocabulary

density expansion, virial coefficients
Mayer f function
cumulant, high temperature expansion
disconnected, reducible, and irreducible diagrams
ladder diagrams, ring diagrams
radial distribution function g(r)
static structure function S(k)
reference theory of liquids
Debye-Hückel theory

Appendix 8A: The third virial coefficient for hard spheres
In Problem 8.4 we showed that $B_2 = 2\pi\sigma^3/3$ for hard spheres. The form of the third virial
coefficient $B_3$ is (see (8.10b))
$$B_3 = -\frac{1}{3V} \int f(|\mathbf{r}_1 - \mathbf{r}_2|)\, f(|\mathbf{r}_1 - \mathbf{r}_3|)\, f(|\mathbf{r}_2 - \mathbf{r}_3|)\, d\mathbf{r}_1\, d\mathbf{r}_2\, d\mathbf{r}_3. \tag{8.210}$$
$B_3$ also can be calculated by geometrical considerations, but these considerations test one's geometrical intuition. An easier way to proceed is to take advantage of the fact that (8.210) is an
example of a convolution integral which can be expressed more simply in terms of the Fourier
transform of f(r). We introduce the Fourier transform of f(r) as
$$f(k) = \int f(r)\, e^{-i\mathbf{k}\cdot\mathbf{r}}\, d^3r = 4\pi \int_0^{\infty} r^2 f(r)\, \frac{\sin kr}{kr}\, dr. \tag{8.211}$$
The second form in (8.211) follows from the first by choosing k to be along the z axis of a spherical coordinate
system. This choice allows us to write $\mathbf{k}\cdot\mathbf{r} = kr\cos\theta$ and $d^3r = r^2\, dr\, \sin\theta\, d\theta\, d\phi$ and do the
integration over θ and φ (see Problem 8.35). Because f(r) = −1 for 0 < r < σ and f(r) = 0 for
r > σ, we can do the integral in (8.211) to find
$$f(k) = 4\pi\sigma^3 \Big[ \frac{\cos k\sigma}{(k\sigma)^2} - \frac{\sin k\sigma}{(k\sigma)^3} \Big]. \tag{8.212}$$
The term in brackets in (8.212) can be related to a half-integer Bessel function.
The main trick is to express $B_3$ in (8.210) in terms of f(k). To do so, we introduce the inverse
transform
$$f(r) = \int \frac{d^3k}{(2\pi)^3}\, f(k)\, e^{i\mathbf{k}\cdot\mathbf{r}} \tag{8.213}$$
for any function f(r). If we substitute (8.213) into (8.210), we find
$$B_3 = -\frac{1}{3V} \int \frac{d\mathbf{k}_1\, d\mathbf{k}_2\, d\mathbf{k}_3}{(2\pi)^3 (2\pi)^3 (2\pi)^3}\, e^{i\mathbf{k}_1\cdot(\mathbf{r}_2 - \mathbf{r}_1)}\, e^{i\mathbf{k}_2\cdot(\mathbf{r}_3 - \mathbf{r}_1)}\, e^{i\mathbf{k}_3\cdot(\mathbf{r}_3 - \mathbf{r}_2)}\, f(k_1) f(k_2) f(k_3)\, d\mathbf{r}_1\, d\mathbf{r}_2\, d\mathbf{r}_3. \tag{8.214}$$
We first group the exponentials in (8.214) so that we can do the integrals over $\mathbf{r}_1$ and $\mathbf{r}_2$ by using
the property that
$$\int d^3r\, e^{i\mathbf{k}\cdot\mathbf{r}} = (2\pi)^3\, \delta(\mathbf{k}). \tag{8.215}$$
The result is that $\mathbf{k}_1 = \mathbf{k}_3 = -\mathbf{k}_2$. The integral over $\mathbf{r}_3$ gives a factor of V. Because f depends
only on k, we can write (8.214) as
$$B_3 = -\frac{1}{3} \int \frac{d^3k}{(2\pi)^3}\, f(k)^3, \tag{8.216}$$
where f(k) is given by (8.212). It is straightforward to evaluate the resultant integral by numerical
methods. The resultant integral can also be evaluated analytically (see Problem 8.35). The result
is
$$B_3 = \frac{5\pi^2}{18}\, \sigma^6. \tag{8.217}$$
$B_4$ can also be evaluated analytically, but higher order virial coefficients have to be evaluated by
Monte Carlo methods (see Problem 8.38).

Appendix 8B: Relation of Scattered Intensity of X-Rays to g(r)
Consider a plane wave of the form $a_0 e^{i\mathbf{q}\cdot\mathbf{r}}$ incident on a fluid sample that contains N atoms of
atomic number Z. We want to know the intensity of the scattered radiation at an angle θ relative
to the incident beam (see Figure 8.13). The scattering is the result of the interaction of the incident
beam with the atomic electrons. We let $\mathbf{r}_i$ be the position of the ith atom and $\mathbf{r}_i^{(n)}$ the position
of the nth electron of the ith atom. The amplitude at a distance r of the wave scattered by an
electron at the origin is
$$A(\theta) = f(\theta)\, \frac{e^{iq'r}}{r}, \tag{8.218}$$
where $q'$ is the magnitude of the wave vector of the scattered wave. Note that (8.218) has the form
of a spherical wave. The atomic scattering form factor depends on the polarization of the incident
beam according to Thomson's formula:
$$f(\theta) = a_0\, \frac{e^2}{mc^2}\, \cos\theta \qquad \text{(polarization vector in the scattering plane)} \tag{8.219a}$$
$$\phantom{f(\theta)} = a_0\, \frac{e^2}{mc^2} \qquad \text{(polarization is orthogonal)} \tag{8.219b}$$

Figure 8.13: The geometry of elastic scattering. (not done)

An electron at $\mathbf{r}_i^{(n)}$ scatters in the direction θ (wave vector $\mathbf{q}'$) with a phase difference such that
$$\mathbf{q}\cdot\mathbf{r}_i^{(n)} - \mathbf{q}'\cdot\mathbf{r}_i^{(n)} = \mathbf{k}\cdot\mathbf{r}_i^{(n)}, \tag{8.220}$$
where
$$\mathbf{k} = \mathbf{q} - \mathbf{q}' \tag{8.221}$$
is the scattering vector. The amplitude of the wave scattered by the electron at $\mathbf{r}_i^{(n)}$ is given by
$$A_i^{(n)}(\theta) = f(\theta)\, \frac{e^{iq'r}}{r}\, e^{i\mathbf{k}\cdot\mathbf{r}_i^{(n)}}. \tag{8.222}$$
Because the scattering is elastic, $q = q'$, and the magnitude of k is
$$k = |\mathbf{k}| = 2q \sin\frac{\theta}{2}. \tag{8.223}$$
The amplitude of the wave scattered by the ith atom is the sum of the amplitudes scattered by
each of the Z electrons:
$$A_i(\theta) = f(\theta)\, \frac{e^{iq'r}}{r} \sum_{n=1}^{Z} e^{i\mathbf{k}\cdot\mathbf{r}_i^{(n)}} \tag{8.224a}$$
$$\phantom{A_i(\theta)} = f(\theta)\, \frac{e^{iq'r}}{r}\, e^{i\mathbf{k}\cdot\mathbf{r}_i} \sum_{n=1}^{Z} e^{i\mathbf{k}\cdot(\mathbf{r}_i^{(n)} - \mathbf{r}_i)}. \tag{8.224b}$$
Note that $\mathbf{r}_i^{(n)} - \mathbf{r}_i$ is the position of the nth electron of the ith atom relative to its own nucleus.
The amplitude of the wave scattered by the fluid is given by
$$A(\theta) = \sum_{i=1}^{N} A_i(\theta). \tag{8.225}$$
The intensity of the radiation that is scattered in the direction θ is given by $\langle |A(\theta)|^2 \rangle$, where
the average represents the quantum average over the electronic states of each atom and an ensemble
average over the positions of the atoms in the fluid. We have
$$I(\theta) = \langle |A(\theta)|^2 \rangle = \frac{\langle a(\theta)^2 \rangle}{r^2}\, N F(k)\, S(k), \tag{8.226}$$
where $\langle a(\theta)^2 \rangle$ is an average over the two polarization states:
$$\langle a(\theta)^2 \rangle = a_0^2\, \frac{e^4}{m^2c^4}\, \frac{1 + \cos^2\theta}{2}. \tag{8.227}$$
The form factor F(k) is the same for each atom and is given by
$$F(k) = \Big\langle \Big| \sum_{n=1}^{Z} e^{i\mathbf{k}\cdot(\mathbf{r}_i^{(n)} - \mathbf{r}_i)} \Big|^2 \Big\rangle, \tag{8.228}$$
where the average in (8.228) denotes a quantum average. The structure function S(k) is defined
by the statistical average:
$$S(k) = \frac{1}{N} \Big\langle \Big| \sum_{i=1}^{N} e^{i\mathbf{k}\cdot\mathbf{r}_i} \Big|^2 \Big\rangle. \tag{8.229}$$
We see that the scattered intensity is proportional to S(k), the quantity of interest.

Appendix 8C: Solution of the Percus-Yevick Equation

[xx not finished xx]

8.14 Additional Problems

Problem 8.24. Why is the method that we have used to obtain the virial expansion for a classical
ﬂuid not applicable to a quantum system?
Problem 8.25. Show that the virial coefficient $B_2$ given in (8.33) can be written in the form
$$B_2 = -\frac{1}{6kT} \int_0^{\infty} r\, \frac{du(r)}{dr}\, e^{-\beta u(r)}\, 4\pi r^2\, dr. \tag{8.230}$$
What is the condition on u(r) for this form to be meaningful?
Problem 8.26. Calculate $Z_N/Z_{\rm ideal}$ for a system of N = 3 particles using (8.24). Express $B_3$ in
terms of $Z_3/Z_{\rm ideal}$ and $Z_2/Z_{\rm ideal}$.
Problem 8.27. Assume that g(r) can be written in the form
$$g(r) = g_0(r) + \rho\, g_1(r) + \rho^2 g_2(r) + \cdots \tag{8.231}$$
Use the virial equation of state (8.105) to obtain
$$\frac{PV}{NkT} = 1 - \frac{\rho}{6kT} \sum_{n=0}^{\infty} \rho^n \int_0^{\infty} r\, \frac{du}{dr}\, g_n(r)\, 4\pi r^2\, dr. \tag{8.232}$$
Compare the density expansion (8.9) of PV/NkT with (8.232) and show that
$$B_{n+2} = -\frac{1}{6kT} \int_0^{\infty} r\, \frac{du}{dr}\, g_n(r)\, 4\pi r^2\, dr. \tag{8.233}$$
From the result of Problem 8.25 show that
$$g_0(r) = e^{-\beta u(r)}. \tag{8.234}$$
Problem 8.28. Show that a system of hard disks attains its maximum packing density in a
regular triangular lattice with each disk touching six neighbors. In this arrangement $\rho_0\sigma^2 = 2/3^{1/2} \approx 1.1547$.[2]
*Problem 8.29. Use the applet at xx to find the pressure of the hard disk crystal as a function of
the density ρ. It is generally accepted that the pressure exhibits a simple-pole divergence at close
packing:
$$\beta P \sigma^2 = \frac{4/3^{1/2}}{1 - \rho/\rho_0}. \tag{8.235}$$
Problem 8.30. Use the relation (8.105) to find the form of $B_2$ implied by (8.85).
Problem 8.31. Use the fact that the Fourier transform of the density ρ(r) is
$$\rho_{\mathbf{k}} = \int e^{-i\mathbf{k}\cdot\mathbf{r}}\, \rho(\mathbf{r})\, d\mathbf{r} = \sum_{i=1}^{N} e^{-i\mathbf{k}\cdot\mathbf{r}_i} \tag{8.236}$$
to show that S(k) can be expressed as
$$S(\mathbf{k}) = \frac{1}{N} \langle \rho_{\mathbf{k}}\, \rho_{-\mathbf{k}} \rangle. \tag{8.237}$$
From (8.237) it is straightforward to show that S(k) can be written as
$$S(\mathbf{k}) = \frac{1}{N} \Big\langle \sum_{i=1}^{N} \sum_{j=1}^{N} e^{-i\mathbf{k}\cdot\mathbf{r}_i}\, e^{i\mathbf{k}\cdot\mathbf{r}_j} \Big\rangle \tag{8.238}$$
$$\phantom{S(\mathbf{k})} = \frac{1}{N} \Big\langle \Big| \sum_{i=1}^{N} e^{-i\mathbf{k}\cdot\mathbf{r}_i} \Big|^2 \Big\rangle. \tag{8.239}$$
Problem 8.32. Write y(r) in the form of a density expansion
$$y(r) = e^{\beta u} g(r) = 1 + \rho\, y_1(r) + \rho^2 y_2(r) + \cdots \tag{8.240}$$
What is the form of $y_1$ and $y_2$ implied by the Percus-Yevick and hypernetted-chain approximations?
Problem 8.33. Padé approximation for the equation of state. [xx not finished xx]
Problem 8.34. Derive the form of the Fourier transform of the Coulomb potential $u(r) = e^2/r$.
Problem 8.35. Do the integral (8.211) in Appendix 8A.

[2] See, for example, C. A. Rogers, Packing and Covering, Cambridge University Press (1964), Chapter 1.
Problem 8.36. Use the approximate results (8.175) and (8.179) for the Percus-Yevick equation
of state of hard spheres to compare the exact virial coefficients summarized in Table 8.1 with those
predicted by (8.175) and (8.179). Which equation of state gives better results?
Problem 8.37. Show that the mean spherical approximation (8.182) is equivalent to the Percus-Yevick equation (8.170) for hard spheres.
Problem 8.38. (a) Use a Monte Carlo method to evaluate the integrals associated with $B_3$ for
a hard core interaction. Your estimate should be consistent with the result $B_3 = 5\pi^2\sigma^6/18$ (see
(8.73)). Choose units such that σ = 1. One simple procedure is to generate points [xx not finished
xx] Higher order virial coefficients are given by Rhee.

Suggestions for Further Reading
Earlier work on hard disks was done by B. J. Alder and T. E. Wainwright, Phys. Rev. 127,
349 (1962). See also A. C. Mitus, H. Weber, and D. Marx, Phys. Rev. E 55, 6855 (1997).
John A. Barker and Douglas Henderson, “What is “liquid”? Understanding the states of
matter,” Rev. Mod. Phys. 48, 587 (1976).
R. Brout and P. Carruthers, Lectures on the Many-Electron Problem, Interscience Publishers
(1963). Chapter 1 has an excellent introduction to the cumulant expansion.
David Chandler, Introduction to Modern Statistical Mechanics, Oxford University Press (1987).
Herbert Goldstein, Classical Mechanics, second edition, Addison-Wesley (1980).
Jean-Pierre Hansen and Ian R. McDonald, Theory of Simple Liquids, second edition, Academic
Press (1986).
Joseph Kestin and J. R. Dorfman, A Course in Statistical Thermodynamics, Academic Press
(1971). Chapter 7 in this senior/graduate text has a good discussion of the properties of real gases.

Chapter 9

Critical Phenomena and the Renormalization Group

© 2005 by Harvey Gould and Jan Tobochnik
14 December 2005
One of the greatest recent achievements of theoretical physics has been the development of the
renormalization group. This development has had a major impact on our understanding of phase
transitions, quantum ﬁeld theory, turbulence, and the emerging science of dynamical systems. We
ﬁrst introduce the renormalization group in the context of a simple geometrical model that exhibits
a continuous transition. We then discuss some of the ideas associated with critical phenomena by
discussing the van der Waals equation of state and introducing more sophisticated versions of
mean-field theory. Finally, we discuss two renormalization group methods for the Ising model.

9.1 A Geometrical Phase Transition

We first consider a simple geometrical model that exhibits a continuous phase transition. The
model, known as percolation, does not involve the temperature or the evaluation of a partition
function. Although this transition is easy to imagine, especially by implementing the model on a
computer, the questions that are raised by the model are as profound as they are for more complex
systems such as the Ising model.
The applications of percolation are more diﬃcult to understand than percolation in the context
of a well-defined mathematical problem. The flow of oil through porous rock, the behavior of a
random resistor network, and the spread of a forest ﬁre are just a few of the many applications.
The problem of percolation is easiest to formulate on a lattice. Consider a large lattice and
assume that every site in the lattice can be in one of two states, “occupied” or “empty.” Each site
is occupied independently of its neighbors with probability p. This model of percolation is called
site percolation. The nature of percolation is related to the connectivity properties of the lattice.
We define a cluster as a group of occupied sites that are connected by nearest-neighbor distances
(see Figure 9.1). Two occupied sites belong to the same cluster if they are linked by a path of
nearest-neighbor connections joining occupied sites.

Figure 9.1: The definition of a cluster on a square lattice. In (a) the two sites are in the same
cluster, and in (b) they are not.

We can use the random number generator on a calculator to study the qualitative properties
of percolation. The idea is to generate a random number for each lattice site and occupy a site if
its random number is less than p. Because each site is independent, the order in which the sites are
visited is irrelevant. If p is small, there are many small disconnected clusters (see Figure 9.2a). As
p is increased, the size of the clusters increases. If p ∼ 1, most of the occupied sites form one large
cluster that extends from one end of the lattice to the other (see Figure 9.2c). Such a cluster is
said to “span” the lattice and to be a spanning cluster. What happens for intermediate values of
p, for example between p = 0.5 and p = 0.7 (see Figure 9.2b)? We shall ﬁnd that in the limit of
an infinite lattice, there exists a well-defined threshold probability $p_c$ such that
For $p \geq p_c$, one spanning cluster or path exists.
For $p < p_c$, no spanning cluster exists and all clusters are finite.
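
The procedure of occupying each site with probability p and looking for a spanning cluster can be sketched in a few lines of Python. What follows is only a minimal illustration under stated assumptions: the function names `make_lattice`, `spans_top_to_bottom`, and `spanning_fraction` are our own, and for simplicity a cluster is taken to span if it connects the top row of the lattice to the bottom row.

```python
import random
from collections import deque


def make_lattice(L, p, rng):
    """Occupy each site of an L x L square lattice independently with probability p."""
    return [[rng.random() < p for _ in range(L)] for _ in range(L)]


def spans_top_to_bottom(lattice):
    """Return True if nearest-neighbor occupied sites connect the top row to the bottom row."""
    L = len(lattice)
    seen = [[False] * L for _ in range(L)]
    queue = deque()
    # Start a breadth-first search from every occupied site in the top row.
    for x in range(L):
        if lattice[0][x]:
            seen[0][x] = True
            queue.append((0, x))
    while queue:
        y, x = queue.popleft()
        if y == L - 1:
            return True
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < L and 0 <= nx < L and lattice[ny][nx] and not seen[ny][nx]:
                seen[ny][nx] = True
                queue.append((ny, nx))
    return False


def spanning_fraction(L, p, trials=200, seed=0):
    """Fraction of independent configurations that contain a spanning cluster."""
    rng = random.Random(seed)
    hits = sum(spans_top_to_bottom(make_lattice(L, p, rng)) for _ in range(trials))
    return hits / trials
```

For a fixed lattice size, `spanning_fraction(L, p)` should be close to zero for small p and close to one for p near unity, and the crossover between the two regimes should become sharper as L is increased.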
The essential characteristic of percolation is connectedness. Because the connectedness exhibits
a qualitative change at a well deﬁned value of p, the transition from a state with no spanning cluster
to a state with one spanning cluster is an example of a phase transition.

Figure 9.2: Examples of site percolation configurations for different values of p: (a) p = 0.2, (b) p = 0.59, (c) p = 0.8.

From our introductory discussion of critical phenomena in Chapter 5, we know that it is convenient to define an order
parameter, a quantity that vanishes for p < pc and is nonzero for p ≥ pc . The order parameter for
percolation is P∞ , the probability that an occupied site is part of the spanning cluster. We can
estimate P∞ for a given conﬁguration from its deﬁnition:
$$P_\infty = \frac{\text{number of sites in the spanning cluster}}{\text{total number of occupied sites}}. \qquad (9.1)$$
To calculate $P_\infty$ exactly we need to average over all configurations. For $p < p_c$, $P_\infty = 0$ because
there is no spanning cluster. At p = 1, P∞ has its maximum value of unity because only the
spanning cluster exists. These properties suggest that P∞ is a reasonable choice for the order
parameter.
Because p can be varied continuously, we expect that the phase transition at pc is continuous.
In the critical region near and above pc , we assume that P∞ vanishes as
$$P_\infty \sim (p - p_c)^\beta, \qquad (9.2)$$
where β is an example of a critical exponent.
More information can be obtained from the cluster size distribution ns (p) deﬁned as
$$n_s(p) = \frac{\text{mean number of clusters of size } s}{\text{total number of lattice sites}}. \qquad (9.3)$$
For $p \geq p_c$, the spanning cluster is excluded from $n_s$. To get an idea of how we can calculate $n_s$,
we consider ns (p) for small s on the square lattice. The probability of ﬁnding a single isolated
occupied site (s = 1) is
$$n_1(p) = p(1 - p)^4, \qquad (9.4)$$
because the probability that one site is occupied is p and the probability that all of its four
neighboring sites are empty is $(1 - p)^4$. Similarly, $n_2(p)$ is given by
$$n_2(p) = 2p^2(1 - p)^6. \qquad (9.5)$$
The factor of 2 in (9.5) is due to the two possible orientations (horizontal and vertical) of a cluster of two particles, and the exponent 6 is the number of empty neighboring sites needed to isolate the pair. Because
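$N \sum_s s\, n_s$ is the total number of occupied sites not in the spanning cluster (N is the total number
of lattice sites), and $pN P_\infty$ is the number of sites in the spanning cluster according to the definition (9.1), we have the relation
$$\sum_s s\, n_s(p) + p\, P_\infty(p) = p, \qquad (9.6)$$
where the sum is over all nonspanning clusters.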
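
The small-cluster results for the square lattice, $n_1 = p(1-p)^4$ and $n_2 = 2p^2(1-p)^6$ (one factor for each of the two orientations of a pair), can be checked by brute force. The sketch below is an illustration under stated assumptions: it uses a 4 × 4 lattice with periodic boundary conditions, the smallest such lattice on which a two-site cluster has six distinct neighbors, so that the infinite-lattice expressions for s = 1 and s = 2 should hold exactly. The names `cluster_sizes`, `enumerate_clusters`, and `n_s` are hypothetical.

```python
from itertools import product

L = 4            # linear size of the periodic lattice (assumption: 4 x 4 torus)
N = L * L


def site(x, y):
    """Index of the site at (x, y) with periodic boundary conditions."""
    return (y % L) * L + (x % L)


# Nearest neighbors of every site, listed in the same order as the site index.
NEIGHBORS = [
    (site(x + 1, y), site(x - 1, y), site(x, y + 1), site(x, y - 1))
    for y in range(L) for x in range(L)
]


def cluster_sizes(config):
    """Sizes of all clusters in one configuration (a tuple of 0/1 occupations)."""
    seen = [False] * N
    sizes = []
    for start in range(N):
        if config[start] and not seen[start]:
            seen[start] = True
            stack, size = [start], 0
            while stack:
                i = stack.pop()
                size += 1
                for j in NEIGHBORS[i]:
                    if config[j] and not seen[j]:
                        seen[j] = True
                        stack.append(j)
            sizes.append(size)
    return sizes


def enumerate_clusters(max_size=2):
    """table[s][k]: number of s-site clusters summed over all configurations with k occupied sites."""
    table = {s: [0] * (N + 1) for s in range(1, max_size + 1)}
    for config in product((0, 1), repeat=N):
        k = sum(config)
        for size in cluster_sizes(config):
            if size <= max_size:
                table[size][k] += 1
    return table


TABLE = enumerate_clusters()   # one pass over all 2**16 configurations


def n_s(s, p):
    """Exact mean number of s-site clusters per lattice site on the 4 x 4 torus."""
    return sum(c * p**k * (1 - p) ** (N - k) for k, c in enumerate(TABLE[s])) / N
```

Because the table of counts is built once, `n_s(1, p)` and `n_s(2, p)` can be compared with (9.4) and (9.5) for any value of p at negligible cost.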
Similarly, because $N s\, n_s$ is the number of occupied sites in clusters of size s, the quantity
$$w_s = \frac{s\, n_s}{\sum_s s\, n_s} \qquad (9.7)$$
is the probability that an occupied site chosen at random is part of an s-site cluster.
At $p = p_c$, $n_s$ scales as
$$n_s \sim s^{-\tau}. \qquad (9.8)$$
The meaning of the power law relation (9.8) is that at $p = p_c$ on an infinite lattice, clusters of all
sizes exist.
Many of the properties of interest are related to moments of ns . For example, the mean size
(number of particles) of the clusters is deﬁned as
$$S(p) = \sum_s s\, w_s = \frac{\sum_s s^2 n_s}{\sum_s s\, n_s}. \qquad (9.9)$$
The sum in (9.9) is over the finite clusters. The quantity S(p) behaves near $p_c$ as
$$S(p) \sim (p - p_c)^{-\gamma}. \qquad (9.10)$$
It is convenient to associate a characteristic length ξ with the clusters. One way to do so is
to introduce the radius of gyration $R_s$ of a single cluster of s particles:
$$R_s^2 = \frac{1}{s} \sum_{i=1}^{s} (\mathbf{r}_i - \bar{\mathbf{r}})^2, \qquad (9.11)$$
where
$$\bar{\mathbf{r}} = \frac{1}{s} \sum_{i=1}^{s} \mathbf{r}_i, \qquad (9.12)$$
and $\mathbf{r}_i$ is the position of the ith site in the cluster. The correlation length ξ is defined as a weighted
average over the radius of gyration of all the finite clusters.1 The statistical weight of the clusters
of size s is the probability $w_s$ that a given site is a member of such a cluster times the number of
sites s in the cluster, that is, $s^2 n_s(p)$. Hence, the average radius of gyration of all the nonspanning
clusters is given by ξ as
$$\xi^2 = \frac{\sum_s s^2 n_s R_s^2}{\sum_s s^2 n_s}. \qquad (9.13)$$
Near pc we assume that the correlation length ξ diverges as
$$\xi \sim |p_c - p|^{-\nu}. \qquad (9.14)$$
Why do systems exhibit power law behavior as the critical point is approached? For example,
the order parameter goes to zero as a power of $(p_c - p)/p_c$ for percolation and as a power of
$\epsilon = (T_c - T)/T_c$ for thermal systems. In the limit that $(p_c - p)/p_c$ or $\epsilon$ becomes small, there is
no other quantity to set the scale for $\epsilon$ or $(p_c - p)/p_c$. For example, we cannot have functions
such as $\sin \epsilon$, because as $\epsilon$ goes to zero, $\sin \epsilon$ reduces to $\epsilon$. The simplest functions without a
characteristic scale are powers. Hence, we should not be surprised that various quantities show
power law behavior near the critical points.
How can we determine the values of $p_c$ and the critical exponents?
Problem 9.1. Run the applet at xxx to simulate percolation of the square and triangular lattices.
Deﬁne a spanning cluster as one that connects the top and bottom of the lattice and the left
and right. Visually estimate the value of pc at which a spanning cluster ﬁrst occurs. Choose
progressively larger lattices and try to extrapolate your estimated values of pc for various lattice
dimensions L to L → ∞.
1 More precisely, ξ should be called the connectedness length.

lattice        d   q    p_c (site)    p_c (bond)
linear chain   1   2    1             1
square         2   4    0.59275       1/2
triangular     2   6    1/2           0.3473
honeycomb      2   3    0.698(3)      0.6527
diamond        3   4    0.4299(8)     0.3886(5)
simple cubic   3   6    0.3117(3)     0.2492(2)
bcc            3   8    0.24674(7)    0.1795(3)
fcc            3   12   0.1998        0.1198(3)

Table 9.1: Summary of known results for the percolation threshold. The figure in parentheses is
the probable error in the last figure.

quantity                       functional form            exponent   d = 2     d = 3
order parameter                P∞ ∼ (p − p_c)^β           β          5/36      0.403(8)
mean size of finite clusters   S(p) ∼ |p − p_c|^(−γ)      γ          43/18     1.73(3)
connectedness length           ξ(p) ∼ |p − p_c|^(−ν)      ν          4/3       0.89(1)
cluster distribution           n_s ∼ s^(−τ)  (p = p_c)    τ          187/91    2.2

Table 9.2: Summary of known results for the static critical exponents associated with the percolation transition. The number in parentheses is the probable error in the last figure.

The values of the percolation threshold depend on the symmetry of the lattice and are summarized in Table 9.1. A summary of the known values of the various critical exponents for percolation
is given in Table 9.2. For two dimensions, the static exponents are known exactly. For three dimensions no exact results are known, and the exponents have been estimated using various approximate
methods. The accuracy of the numerical values for the critical exponents is consistent with the
assumption that the exponents are independent of the symmetry of the lattice and depend only
on d.
The critical exponents for percolation satisfy a set of relations that are identical to the scaling
relations for thermal systems (see Table 9.3). For example, we will ﬁnd that
$$2\beta + \gamma = d\nu, \qquad (9.15)$$
where d is the spatial dimension.
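
The relation $2\beta + \gamma = d\nu$ can be checked directly against the exponents in Table 9.2. The short sketch below uses exact rational arithmetic for the d = 2 values; the helper name `hyperscaling_sides` is our own. For d = 3 the table entries are numerical estimates with uncertainties, so the two sides can be expected to agree only approximately.

```python
from fractions import Fraction


def hyperscaling_sides(beta, gamma, nu, d):
    """Return the two sides of 2*beta + gamma = d*nu as a tuple (lhs, rhs)."""
    return 2 * beta + gamma, d * nu


# Exact exponents for d = 2 from Table 9.2.
lhs2, rhs2 = hyperscaling_sides(Fraction(5, 36), Fraction(43, 18), Fraction(4, 3), 2)
print(lhs2, rhs2)   # both sides equal 8/3

# Central values of the numerical estimates for d = 3 from Table 9.2.
lhs3, rhs3 = hyperscaling_sides(0.403, 1.73, 0.89, 3)
print(lhs3, rhs3)   # roughly 2.54 and 2.67
```

The d = 2 check is exact, whereas the central values quoted for d = 3 satisfy the relation only to within about five percent.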
We would like to be able to understand these results on a deeper level than the qualitative
description we have given. To this end we next introduce a renormalization group method for
percolation.

9.2 Renormalization Group for Percolation

The important idea that is incorporated into the renormalization group is that all length scales
are important at the critical point. For percolation this idea implies that clusters of all sizes exist
on an infinite lattice at $p = p_c$. If all length scales are present at $p = p_c$, then the system looks the
same on any length scale. This property is called self-similarity. In contrast, this behavior is not
seen for p far from $p_c$.

Figure 9.3: The five vertically and horizontally spanning configurations for a 2 × 2 cell on a square
lattice.

The renormalization group is a general approach for extracting quantitative information from
self-similarity. The approach is to average over smaller length scales and to find how the system
is changed by such an average. The rate at which the system changes gives us information on
the critical exponents. What kind of averaging is appropriate for site percolation? The averaging
should be such that it preserves the important physics. For percolation the important physics is
associated with connectivity. Suppose that the lattice is divided into b × b cells each with $b^2$ sites.
We adopt the rule that a cell is replaced by a single renormalized occupied site if the cell spans,
and replaced by an unoccupied site if it does not. It is not clear which spanning rule to adopt, for
example, vertical spanning, horizontal spanning, vertical and horizontal spanning, and vertical or
horizontal spanning. In the following, we will adopt horizontal and vertical spanning. In the limit
of very large cells, these diﬀerent rules should yield results for pc and ν that converge to the same
value.
Suppose that we make the approximation that each cell is independent of all the other cells and
is characterized only by the probability $p'$ that the cell is occupied. If the sites are occupied with
probability p, then the cells are occupied with probability $p'$, where $p'$ is given by a renormalization
transformation of the form
$$p' = R(p). \qquad (9.16)$$
R(p) is the total probability that the sites form a spanning path.
An example will make the meaning of the formal relation (9.16) more clear. In Figure 9.3
we show the five vertically and horizontally spanning site configurations for a b = 2 cell. The
probability $p'$ that the renormalized site is occupied is given by the sum of all the possibilities:
$$p' = R(p) = p^4 + 4p^3(1 - p). \qquad (9.17)$$
In general, the probability $p'$ that the renormalized site is occupied is different from the occupation
probability p of the original sites. For example, suppose that we begin with $p = p_0 = 0.5$. After a
single renormalization transformation, the value of $p'$ obtained from (9.17) is $p_1 = R(p_0 = 0.5) = 0.3125$.
A second renormalization transformation yields $p_2 = R(p_1) = 0.0934$. It is easy to see that
further transformations drive the system to the trivial fixed point $p^* = 0$. Similarly, if we begin
with $p = p_0 = 0.8$, we find that successive transformations drive the system to the trivial fixed
point $p^* = 1$.
To find the nontrivial fixed point associated with the critical threshold $p_c$, we need to find the
special value of p in the range 0 < p < 1 such that
$$p^* = R(p^*). \qquad (9.18)$$
For the recursion relation (9.17) we ﬁnd that the solution of the fourth degree polynomial equation
for p∗ yields the two trivial ﬁxed points p∗ = 0 and p∗ = 1, and the nontrivial ﬁxed point
p∗ = 0.7676 that we associate with pc . This calculated value of p∗ for a 2 × 2 cell should be
compared with the best known estimate pc = 0.5927 for the square lattice. Note that p∗ is an
example of an unstable ﬁxed point because the iteration of (9.17) for p arbitrarily close to p∗ will
drive p to one of the stable ﬁxed points.
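
The flow of p under repeated application of the 2 × 2 transformation can be explored numerically. The sketch below is a minimal illustration with hypothetical function names: it builds R(p) by enumerating all 16 configurations of a 2 × 2 cell (assuming, as in the text, that a cell is occupied if it spans both vertically and horizontally), iterates the map, and locates the nontrivial fixed point by bisection.

```python
from itertools import product


def spans_both_ways(cell):
    """True if a 2 x 2 cell ((a, b), (c, d)) has both a full row and a full column occupied."""
    (a, b), (c, d) = cell
    horizontal = (a and b) or (c and d)
    vertical = (a and c) or (b and d)
    return bool(horizontal and vertical)


def R(p):
    """Probability that a 2 x 2 cell spans, summed over all 16 configurations."""
    total = 0.0
    for a, b, c, d in product((0, 1), repeat=4):
        if spans_both_ways(((a, b), (c, d))):
            k = a + b + c + d
            total += p**k * (1 - p) ** (4 - k)
    return total


def flow(p0, steps):
    """Iterate p -> R(p) and return the list of successive values."""
    values = [p0]
    for _ in range(steps):
        values.append(R(values[-1]))
    return values


def fixed_point(lo=0.1, hi=0.9, tol=1e-12):
    """Bisection for the nontrivial root of R(p) - p = 0; R(p) < p below the root."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if R(mid) - mid < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `flow(0.5, 2)` reproduces the values 0.3125 and 0.0934 quoted above, and starting values on either side of `fixed_point()` flow toward 0 or 1, which is what is meant by saying that the nontrivial fixed point is unstable.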
To calculate the critical exponent ν from the renormalization transformation, we note that on
the renormalized lattice all lengths are reduced by a factor of b in comparison to all lengths in the
original system. Hence $\xi'$, the connectedness length in the renormalized lattice, is related to $\xi$, the
connectedness length in the original lattice, by
$$\xi' = \frac{\xi}{b}. \qquad (9.19)$$
Because $\xi(p) = \text{const}\, |p - p_c|^{-\nu}$ for $p \sim p_c$ and $p_c$ corresponds to $p^*$, we have
$$|p' - p^*|^{-\nu} = \frac{1}{b}\, |p - p^*|^{-\nu}. \qquad (9.20)$$
To find the relation between $p'$ and p near $p_c$, we expand the renormalization transformation (9.16)
about $p = p^*$ and obtain to first order
$$p' - p^* = R(p) - R(p^*) \approx \lambda\, (p - p^*), \qquad (9.21)$$
where
$$\lambda = \frac{dR}{dp}\bigg|_{p = p^*}. \qquad (9.22)$$
We need to do a little algebra to obtain an explicit expression for ν. We first raise the left
and right sides of (9.21) to the −ν power and write
$$|p' - p^*|^{-\nu} = \lambda^{-\nu}\, |p - p^*|^{-\nu}. \qquad (9.23)$$
We compare (9.20) and (9.23) and obtain
$$\lambda^{-\nu} = b^{-1}. \qquad (9.24)$$
Finally, we take the logarithm of both sides of (9.24) and obtain the desired relation for the critical
exponent ν:
$$\nu = \frac{\ln b}{\ln \lambda}. \qquad (9.25)$$
As an example, we calculate ν for b = 2 using (9.17). We write $R(p) = p^4 + 4p^3(1 - p) = -3p^4 + 4p^3$
and find $\lambda = dR/dp|_{p = p^*} = 12p^2(1 - p)|_{p = 0.7676} = 1.6432$. We then use the relation
(9.25) to obtain
$$\nu = \frac{\ln 2}{\ln 1.6432} = 1.40. \qquad (9.26)$$

Figure 9.4: Example of an error after one renormalization.

The agreement of the numerical result (9.26) with the exact result ν = 4/3 in d = 2 is remarkable
given the simplicity of our calculation. In comparison, what would we be able to conclude if we
were to measure ξ (p) directly on a 2 × 2 lattice? However, this agreement is fortuitous because the
accuracy of our calculation of ν is not known a priori.
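
The arithmetic leading to (9.26) is easy to reproduce. The following sketch is only an illustration: it assumes the closed form $p^* = (1 + \sqrt{13})/6$ for the nontrivial fixed point, which follows from factoring $R(p) - p = 0$ as $p\,(p - 1)(3p^2 - p - 1) = 0$, and the variable names are our own.

```python
import math


def R(p):
    """Renormalization transformation (9.17) for the 2 x 2 cell."""
    return p**4 + 4 * p**3 * (1 - p)


def dR(p):
    """Analytic derivative dR/dp = 12 p^2 (1 - p)."""
    return 12 * p**2 * (1 - p)


b = 2                                  # linear dimension of the cell
p_star = (1 + math.sqrt(13)) / 6       # nontrivial root of 3p^2 - p - 1 = 0
lam = dR(p_star)                       # eigenvalue lambda of (9.22)
nu = math.log(b) / math.log(lam)       # exponent from (9.25)

print(round(p_star, 4), round(lam, 4), round(nu, 2))
```

The same three lines at the end can be reused for any other cell once its R(p) and derivative are known, for example for the larger cells considered in the problems.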
What is the nature of the approximations that we have made in calculating ν and pc ? Our
basic assumption has been that the occupancy of each cell is independent of all other cells. This
assumption is correct for the original sites but after one renormalization, we lose some of the
original connecting paths and gain connecting paths that were not present in the original lattice.
An example of this “interface” problem is shown in Figure 9.4. Because this surface eﬀect becomes
less important with increasing cell size, one way to improve our renormalization group calculation
is to consider larger cells.
Problem 9.2. Choose a simple cell for one-dimensional percolation and show that $p_c = 1$ and
ν = 1 exactly.
Problem 9.3. What are the four spanning configurations for the smallest possible cell ($b = \sqrt{3}$)
on a triangular lattice? Show that the corresponding recursion relation can be expressed as
$R(p) = 3p^2 - 2p^3$. Find $p^*$ and ν. The result $p^* = 1/2$ is exact.
Problem 9.4. Enumerate all the possible spanning conﬁgurations for a b = 3 cell. Assume that a
cell is occupied if a cluster spans the cell vertically and horizontally. Determine the probability of
each conﬁguration and ﬁnd the renormalization transformation R(p). Then solve for the nontrivial
ﬁxed point p∗ and ν . The simplest way to ﬁnd the ﬁxed point is by trial and error using a hand
calculator or computer. Another way is to plot the diﬀerence R(p) − p versus p and ﬁnd the value
of p at which R(p) − p crosses the horizontal axis. Are your results for pc and ν closer to their
known values than for b = 2?
Problem 9.5. Instead of renormalizing the set of all spanning 3 × 3 cells to a single occupied site,
an alternative is to go from cells of linear dimension $b_1 = 3$ to cells of linear dimension $b_2 = 2$. Use
the fact that the connectedness lengths of the two lattices are related by $\xi(p_2)/\xi(p_1) = (b_1/b_2)^{-1}$
to derive the relation
$$\nu = \frac{\ln (b_1/b_2)}{\ln (\lambda_1/\lambda_2)}, \qquad (9.27)$$
where $\lambda_i = dR(p^*, b_i)/dp$ is evaluated at the solution $p^*$ of the fixed point equation, $R_2(b_2, p^*) = R_3(b_1, p^*)$.
The “cell-to-cell” transformation yields better results in the limit in which the change
in length scale is infinitesimal and is the preferred method.

9.3 The Liquid-Gas Transition

Experimentally we know that a simple system of, say, argon atoms is a gas at high temperatures
and a liquid at low temperatures. The van der Waals equation of state is the simplest equation of
state that exhibits a gas-liquid transition. We derived its form in Section 8.9.1 and obtained
$$P = \frac{NkT}{V - Nb} - \rho^2 a. \qquad (9.28)$$
The parameter a is a measure of the attractive interactions among the particles and the parameter
b is a measure of the repulsive interactions. The latter become important when two particles come
too close to each other. In the following we rederive the van der Waals equation to emphasize its
mean-field nature.
Let us assume that the intermolecular interaction consists of a hard-core repulsion and a weak
long-range interaction:
$$u(r) = u_{\rm hc} + u_{\rm attr}. \qquad (9.29)$$
The partition function can be expressed as
$$\frac{Z}{Z_{\rm ideal}} = \int d\mathbf{r}_1 \cdots d\mathbf{r}_N\, e^{-\beta \sum_{i<j} u(\mathbf{r}_i - \mathbf{r}_j)}. \qquad (9.30)$$
We assume that $u_{\rm attr}$ has the special form $u_{\rm attr} \propto \lambda^d \phi(\lambda r)$ such that $\int u_{\rm attr}\, d^d r$ is a constant as
$\lambda \to 0$, where d is the spatial dimension. This form implies that as the range of $u_{\rm attr}$ increases, its
strength decreases. An example of such an interaction is
$$u_{\rm attr} = -c a \lambda^d e^{-\lambda r}, \qquad (9.31)$$
where a will be associated with the van der Waals parameter. The constant c is chosen so that
$$-\int u_{\rm attr}(r)\, d\mathbf{r} = a. \qquad (9.32)$$
The basic assumption of mean-field theory is that the fluctuating field experienced by an
individual particle can be replaced by its average. This assumption is reasonable if a large number
of particles contribute to the ﬂuctuating ﬁeld and can be shown to be rigorously correct for a
potential of the form $\lambda^d \phi(\lambda r)$ in the limit $\lambda \to 0$.
This assumption allows us to replace the potential $\sum_j u_{\rm attr}(\mathbf{r}_i - \mathbf{r}_j)$ seen by particle i by
its average
$$\frac{N}{V} \int u_{\rm attr}(r)\, d\mathbf{r} = -\frac{N}{V}\, a. \qquad (9.33)$$
Note that the average in (9.33) is independent of $\mathbf{r}_i$. We now sum over all i < j and obtain
$$\sum_{i<j} u_{\rm attr}(\mathbf{r}_i - \mathbf{r}_j) = -\frac{1}{2} \frac{N^2}{V}\, a. \qquad (9.34)$$
The factor one-half enters because of the condition i < j. Hence, we can rewrite (9.30) as
$$Z = Z_{\rm hc}\, e^{\beta a N^2/2V}, \qquad (9.35)$$
where $Z_{\rm hc}$ represents the partition function for a system of hard core particles. The free energy is
given by
$$F = -kT \ln Z(T, V, N) = F_{\rm hc} - \frac{1}{2} \frac{N^2}{V}\, a, \qquad (9.36)$$
and the pressure is given by
$$P = -\left( \frac{\partial F}{\partial V} \right)_T = P_{\rm hc} - \rho^2 a. \qquad (9.37)$$
As we discussed in Section 8.9.1, van der Waals assumed that $Z_{\rm hc}$ is given by
$$Z_{\rm hc} = \frac{1}{N!} \frac{1}{\lambda^{3N}} (V - Nb)^N. \qquad (9.38)$$
It is straightforward to use (9.38) to obtain (9.28).
We now consider the isotherms predicted by (9.28). For ﬁxed T , P approaches inﬁnity as
$V/N \to b$. In contrast, for an ideal gas, P becomes infinite only in the limit $V/N \to 0$. For
$V/N \gg b$, the van der Waals equation of state reduces to the ideal gas equation. Between these
two limits, the van der Waals equation exhibits behavior very diﬀerent from the ideal gas. For
example, both the ﬁrst and second derivatives of P may vanish (see Problem 9.6).
Problem 9.6. Sketch several isotherms (P versus V at constant T and N) of the van der Waals
equation of state (9.28). The isotherms above $T = T_c$ are monotonic, but those below $T_c$ develop
a region of positive slope, $(\partial P/\partial V)_T > 0$. Explain why the latter behavior is nonphysical. Find
the value of $T_c$ by solving the equations
$$\left( \frac{\partial P}{\partial V} \right)_T = 0, \qquad (9.39)$$
and
$$\left( \frac{\partial^2 P}{\partial V^2} \right)_T = 0. \qquad (9.40)$$
These two conditions together with (9.28) give three conditions for the three unknowns $T_c$, $P_c$, and
$V_c$ in terms of the parameters a and b. To obtain the solution, rewrite (9.28) as a cubic equation
in V. Confirm that
$$V^3 - \left( Nb + \frac{NkT}{P} \right) V^2 + \frac{aN^2}{P}\, V - \frac{abN^3}{P} = 0. \qquad (9.41)$$
On any isotherm, there should be three solutions for any value of P. At the critical point, these
three solutions merge so that
$$(V - V_c)^3 = V^3 - 3V_c V^2 + 3V_c^2 V - V_c^3 = 0. \qquad (9.42)$$
Compare the coefficients of each power of V in (9.41) and (9.42) and confirm that
$$V_c = 3Nb, \qquad P_c = \frac{a}{27b^2}, \qquad kT_c = \frac{8a}{27b}. \qquad (9.43)$$
Problem 9.7 shows that the van der Waals equation can be written in the same form for all
substances.

Figure 9.5: The van der Waals isotherms near the critical temperature, plotted as $P/P_c$ versus $V/V_c$ for $T = 0.8T_c$, $0.9T_c$, $T_c$, $1.1T_c$, and $1.2T_c$. Note that for $T > T_c$, the
isotherms have no minima or maxima.

Problem 9.7. Define the dimensionless quantities
$$\tilde{T} = \frac{T}{T_c}, \qquad \tilde{P} = \frac{P}{P_c}, \qquad \tilde{V} = \frac{V}{V_c}, \qquad (9.44)$$
and confirm that all van der Waals fluids have the same form of the equation of state
$$\left( \tilde{P} + \frac{3}{\tilde{V}^2} \right) (3\tilde{V} - 1) = 8\tilde{T}. \qquad (9.45)$$
Equation (9.45) implies that the equation of state has the same form for all gases, which is called
the law of corresponding states. In reality, such a statement is only approximately true.
Now let us examine the conditions for which a van der Waals isotherm has minima and
maxima. For $\partial \tilde{P}/\partial \tilde{V}$ to vanish, we must have
$$f(\tilde{V}) = \frac{(3\tilde{V} - 1)^2}{4\tilde{V}^3} = \tilde{T}. \qquad (9.46)$$
The function $f(\tilde{V})$ vanishes at $\tilde{V} = 1/3$ and in the limit $\tilde{V} \to \infty$. The derivative of f vanishes
for $\tilde{V} = 1$ and hence the maximum value of $f(\tilde{V})$ is f(1) = 1. This behavior implies that (9.46)
cannot be satisfied for $\tilde{T} > 1$ or equivalently $T > T_c$. For $T > T_c$, the van der Waals isotherms
have no minima or maxima.

Now let us consider the van der Waals isobars. From (9.45), we obtain
$$\tilde{T} = \frac{3\tilde{V} - 1}{8} \left( \tilde{P} + \frac{3}{\tilde{V}^2} \right). \qquad (9.47)$$
For $\tilde{T}$ to have a maximum or minimum requires that
$$\frac{3}{\tilde{V}^2} - \frac{2}{\tilde{V}^3} = \tilde{P}. \qquad (9.48)$$
Because the values of the left-hand side of (9.48) are less than or equal to one, we conclude that $\tilde{T}$
has no extrema if $\tilde{P} > 1$ ($P > P_c$).
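
The statements about the extrema of the reduced van der Waals isotherms can be checked numerically. The sketch below is a minimal illustration with hypothetical function names: it evaluates the reduced pressure from (9.45), the function f of (9.46), and counts the extrema of an isotherm by looking for sign changes of the slope on a uniform grid (the grid limits and resolution are arbitrary choices).

```python
def reduced_pressure(v, t):
    """Reduced pressure from (9.45): P = 8T/(3V - 1) - 3/V**2, valid for V > 1/3."""
    return 8 * t / (3 * v - 1) - 3 / v**2


def f(v):
    """Function of (9.46); an isotherm has an extremum where f(V) equals the reduced temperature."""
    return (3 * v - 1) ** 2 / (4 * v**3)


def count_extrema(t, v_min=0.4, v_max=10.0, n=20000):
    """Count sign changes of the slope of the reduced isotherm on a uniform grid."""
    vs = [v_min + (v_max - v_min) * i / n for i in range(n + 1)]
    ps = [reduced_pressure(v, t) for v in vs]
    slopes = [ps[i + 1] - ps[i] for i in range(n)]
    return sum(1 for i in range(n - 1) if slopes[i] * slopes[i + 1] < 0)
```

With these definitions, `count_extrema(0.9)` should find one minimum and one maximum, `count_extrema(1.1)` should find none, and `reduced_pressure(1, 1)` returns the critical point value of unity.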
For $T < T_c$ the isotherms have one maximum and one minimum between which the slope is
positive. What is the significance of a positive slope?

9.4 Bethe Approximation

In Section 5.6 we introduced a simple mean-field approach in the context of the Ising model. The
basic approximation was to ignore all correlations between the spins, including nearest neighbors,
and approximate $\langle s_i s_j \rangle$ as $\langle s_i s_j \rangle \approx \langle s_i \rangle \langle s_j \rangle = m^2$. In the following, we discuss how to improve this
approximation. This approximation method is due to Bethe.2
The idea is that instead of considering the eﬀective ﬁeld felt by a single spin, we consider
the eﬀective ﬁeld felt by a group of spins. In particular, consider a central spin and its nearest
neighbors (see Figure 9.6). The interactions of the nearest neighbors with the central spin are
calculated exactly, and the rest of the spins in the system are assumed to act on the nearest
neighbors through an effective field that we calculate self-consistently. The energy of the central
cluster is
$$H_c = -J s_0 \sum_{j=1}^{q} s_j - B s_0 - B_{\rm eff} \sum_{j=1}^{q} s_j \qquad (9.49a)$$
$$\phantom{H_c} = -(J s_0 + B_{\rm eff}) \sum_{j=1}^{q} s_j - B s_0, \qquad (9.49b)$$
where q is the coordination number of the lattice. (For a square lattice q = 4.) Note that the
ﬂuctuating ﬁeld acting on the nearest neighbor spins s1 , . . . , sq has been replaced by the eﬀective
ﬁeld Beﬀ .
The cluster partition function $Z_c$ is given by
$$Z_c = \sum_{s_0 = \pm 1,\, s_j = \pm 1} e^{-\beta H_c}. \qquad (9.50)$$
We first do the sum over $s_0 = \pm 1$ and write
$$Z_c = e^{\beta B} \sum_{s_j = \pm 1} e^{\beta (J + B_{\rm eff}) \sum_{j=1}^{q} s_j} + e^{-\beta B} \sum_{s_j = \pm 1} e^{\beta (-J + B_{\rm eff}) \sum_{j=1}^{q} s_j}. \qquad (9.51)$$

2 Bethe is probably better known for his work in nuclear physics and other areas. The reference is H. Bethe,
Proc. Royal Society A 150, 552 (1935).

Figure 9.6: The simplest cluster on the square lattice used in the Bethe approximation: a central spin $s_0$ and its four nearest neighbors $s_1$, $s_2$, $s_3$, and $s_4$.

For simplicity, we first evaluate the partition function of the cluster for the one-dimensional lattice
for which q = 2. Because the two neighboring cluster spins can take the values ↑↑, ↑↓, ↓↑, and ↓↓,
the sums in (9.51) become
$$Z_c = e^{\beta B} \left[ e^{2\beta (J + B_{\rm eff})} + 2 + e^{-2\beta (J + B_{\rm eff})} \right] + e^{-\beta B} \left[ e^{2\beta (-J + B_{\rm eff})} + 2 + e^{-2\beta (-J + B_{\rm eff})} \right] \qquad (9.52a)$$
$$\phantom{Z_c} = 4 \left[ e^{\beta B} \cosh^2 \beta (J + B_{\rm eff}) + e^{-\beta B} \cosh^2 \beta (J - B_{\rm eff}) \right]. \qquad (9.52b)$$
For general q, it can be shown that $Z_c$ is given by
$$Z_c = 2^q \left[ e^{\beta B} \cosh^q \beta (J + B_{\rm eff}) + e^{-\beta B} \cosh^q \beta (J - B_{\rm eff}) \right]. \qquad (9.53)$$
The expectation value of the central spin is given by
$$\langle s_0 \rangle = \frac{1}{\beta} \frac{\partial}{\partial B} \ln Z_c = \frac{2^q}{Z_c} \left[ e^{\beta B} \cosh^q \beta (J + B_{\rm eff}) - e^{-\beta B} \cosh^q \beta (J - B_{\rm eff}) \right]. \qquad (9.54)$$
We also want to calculate the expectation value of the spin of the nearest neighbors $\langle s_j \rangle$ for
j = 1, ..., q. Because the system is translationally invariant, we require that $\langle s_0 \rangle = \langle s_j \rangle$ and find
the effective field $B_{\rm eff}$ by requiring that it satisfy this condition. From (9.51) we see that
$$\langle s_j \rangle = \frac{1}{q} \frac{\partial}{\partial (\beta B_{\rm eff})} \ln Z_c. \qquad (9.55)$$
If we substitute (9.53) for $Z_c$ in (9.55), we find
$$\langle s_j \rangle = \frac{2^q}{Z_c} \left[ e^{\beta B} \sinh \beta (J + B_{\rm eff}) \cosh^{q-1} \beta (J + B_{\rm eff}) - e^{-\beta B} \sinh \beta (J - B_{\rm eff}) \cosh^{q-1} \beta (J - B_{\rm eff}) \right]. \qquad (9.56)$$
To find the critical temperature, we set B = 0. The requirement $\langle s_0 \rangle = \langle s_j \rangle$ yields the
equation
$$\cosh^q \beta (J + B_{\rm eff}) - \cosh^q \beta (J - B_{\rm eff}) = \sinh \beta (J + B_{\rm eff}) \cosh^{q-1} \beta (J + B_{\rm eff}) - \sinh \beta (J - B_{\rm eff}) \cosh^{q-1} \beta (J - B_{\rm eff}). \qquad (9.57)$$
Equation (9.57) can be simplified by writing $\sinh x = \cosh x - e^{-x}$ with $x = \beta (J \pm B_{\rm eff})$:
$$\frac{\cosh^{q-1} \beta (J + B_{\rm eff})}{\cosh^{q-1} \beta (J - B_{\rm eff})} = e^{2\beta B_{\rm eff}}. \qquad (9.58)$$
Equation (9.58) always has the solution $B_{\rm eff} = 0$ corresponding to the disordered high temperature phase. As $B_{\rm eff} \to \infty$, the left-hand side of (9.58) approaches $e^{2\beta J (q-1)}$, a constant
independent of $B_{\rm eff}$, and the right-hand side diverges. Therefore, if the slope of the function on
the left at $B_{\rm eff} = 0$ is greater than 2β, the two functions must intersect again at finite $B_{\rm eff}$. If we
take the derivative of the left-hand side of (9.58) with respect to $B_{\rm eff}$ and set it equal to 2β, we
find that the condition for a solution to exist is
$$\coth \beta_c J = q - 1, \qquad (9.59)$$
where $\coth x = \cosh x/\sinh x$. Because (9.58) is invariant under $B_{\rm eff} \to -B_{\rm eff}$, there will be two
solutions for $T \leq T_c$.
On the square lattice, (9.59) yields $kT_c/J = 2.885\ldots$ in comparison to the exact result
$kT_c/J = 2.269\ldots$ (see (5.76)) and the result of simple mean-field theory, $kT_c/J = 4$. Note
that for the one-dimensional Ising model, q = 2 and the Bethe approximation predicts $T_c = 0$ in
agreement with the exact result.
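
The condition $\coth \beta_c J = q - 1$ can be inverted in closed form, because $\tanh x = 1/(q - 1)$ implies $x = \frac{1}{2} \ln [q/(q - 2)]$. The sketch below uses this result to tabulate the Bethe estimate of $kT_c/J$; the function names are hypothetical, and the comparison value $kT_c/J = q$ is the simple mean-field result.

```python
import math


def bethe_kTc_over_J(q):
    """Solve coth(J/kT_c) = q - 1 in closed form: J/kT_c = (1/2) ln[q/(q - 2)]."""
    if q <= 2:
        return 0.0   # no finite-temperature solution; T_c = 0 for the chain (q = 2)
    return 2.0 / math.log(q / (q - 2))


def mean_field_kTc_over_J(q):
    """Simple mean-field estimate kT_c/J = q, for comparison."""
    return float(q)


for q in (2, 3, 4, 6):
    print(q, round(bethe_kTc_over_J(q), 3), mean_field_kTc_over_J(q))
```

For q = 4 the closed form gives $kT_c/J = 2/\ln 2$, and for every q > 2 the Bethe estimate lies below the simple mean-field value.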
Better results can be found by considering larger clusters. However, although such an approach
yields more accurate phase diagrams, it always yields the same mean-field exponents because it
depends on the truncation of correlations beyond a certain distance. This approximation must
break down in the vicinity of a critical point where the correlations become inﬁnite.
Problem 9.8. Work out the details of the Bethe approximation, an improved mean-field method
discussed in Section 9.4, for the Ising chain. Show that the effective three-spin Hamiltonian can
be written as
$$H_3 = -J (s_1 s_2 + s_2 s_3) - mJ (s_1 + s_3), \qquad (9.60)$$
where $m = \langle s \rangle$. (Explain why the effective field $B_{\rm eff}$ is given by $B_{\rm eff} = Jm$.) Evaluate the partition
function for the cluster $Z_3$ by summing over the possible values of $s_1$, $s_2$, and $s_3$. The magnetization
per spin, m, can be found by evaluating $\langle s_2 \rangle$. Expand your result for m near m = 0 and show that
$T_c = 0$.

9.5 Landau Theory of Phase Transitions

As we have discussed, the key idea of mean-field theory is the neglect of correlations between the
spins or other relevant coordinates. For the Ising model we can write the spin at lattice point r as
$m(r) = m + \phi(r)$ and express the interaction term in the energy as
$$m(r) m(r') = [m + \phi(r)][m + \phi(r')] = m^2 + m[\phi(r) + \phi(r')] + \phi(r)\phi(r'). \qquad (9.61)$$
If we assume that $\phi(r) \ll m$, we may neglect the last term in (9.61) and obtain
$$m(r) m(r') \approx m^2 + m[\phi(r) + \phi(r')] = m^2 + m[m(r) + m(r') - 2m] \qquad (9.62a)$$
$$\phantom{m(r) m(r')} = m[m(r) + m(r')] - m^2. \qquad (9.62b)$$
Using this approximation, we write the total energy as
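$$-J \sum_{r,r'} m(r) m(r') - B \sum_r m(r) \approx -J \sum_{r,r'} \left( m[m(r) + m(r')] - m^2 \right) - B \sum_r m(r) \qquad (9.63a)$$
$$\phantom{-J \sum} = \frac{1}{2} N J m^2 - (Jm + B) \sum_r m(r). \qquad (9.63b)$$
Note that the spin m(r) enters only linearly in (9.63b). We evaluate the partition function as in
Chapter 5 and find
$$Z \approx Z_{\rm mf} = e^{-\frac{1}{2} N \beta J m^2}\, [2 \cosh \beta (Jm + B)]^N. \qquad (9.64)$$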
The free energy for a given value of m in this simple mean-field approximation is given by
$-kT \ln Z_{\rm mf}$ and can be expressed as
$$F_{\rm mf} = \frac{1}{2} N J m^2 - \frac{N}{\beta} \ln \cosh \beta (Jm + B), \qquad (9.65)$$
apart from an irrelevant constant. Because the equilibrium free energy must be a minimum for a
given value of T and B, we can determine m by requiring that it minimize $F_{\rm mf}$. The result is
$$m = \tanh \beta (Jm + B), \qquad (9.66)$$
as we found in (5.94).
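
The self-consistent equation $m = \tanh \beta (Jm + B)$ has no closed-form solution, but it is easy to solve by iteration. The following sketch is a minimal illustration with hypothetical function names; it assumes units in which J = k = 1, so that the mean-field critical temperature implied by the equation is $kT_c = J$, and it evaluates the free energy per spin $\frac{1}{2} J m^2 - kT \ln \cosh \beta (Jm + B)$, which is the mean-field free energy up to an additive constant.

```python
import math


def solve_m(T, B=0.0, J=1.0, k=1.0, m0=1.0, tol=1e-12, max_iter=100000):
    """Fixed-point iteration of m = tanh[(J m + B)/(k T)] starting from m0."""
    beta = 1.0 / (k * T)
    m = m0
    for _ in range(max_iter):
        m_new = math.tanh(beta * (J * m + B))
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m


def free_energy_per_spin(m, T, B=0.0, J=1.0, k=1.0):
    """Mean-field free energy per spin, without the irrelevant constant."""
    beta = 1.0 / (k * T)
    return 0.5 * J * m**2 - math.log(math.cosh(beta * (J * m + B))) / beta
```

For B = 0 the iteration converges to m = 0 above the critical temperature and to a nonzero value below it, and in the latter case the nonzero solution has a lower free energy than m = 0.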
Problem 9.9. (a) Derive (9.66) from the requirement that F in (9.65) is a minimum in equilibrium.
(b) Expand Fmf in a power series in m and show that Fmf can be written in the Landau form
(9.67).
Landau realized that the qualitative features of mean-field theory can be summarized by a
simple expression for the free energy. Because m is small near the critical point, it is reasonable
to assume that the free energy density (free energy per unit volume) can be written in the form
$$f(m, T) = a + \frac{b}{2} m^2 + \frac{c}{4} m^4 - Bm, \qquad (9.67)$$
where the parameters a, b, and c depend on T. The assumption underlying the form (9.67) is that
f can be expanded in a power series in m about m = 0 near the critical point. Although this
assumption is not correct, Landau theory, like mean-field theory in general, is still a useful tool.

Figure 9.7 [not done]: The dependence of the Landau form of the free energy on the order parameter m.

For the Ising model, Landau also assumed that f(m) is symmetrical about m = 0 so that the $m^3$
term can be omitted. The quantity m is called the order parameter because it is zero for $T > T_c$,
is nonzero for $T < T_c$, and characterizes the nature of the transition.
The equilibrium value of m is the value that minimizes the free energy. In Figure 9.7 we show
the dependence of f on m for B = 0. We see that if b > 0 and c > 0, then the minimum of f is at
m = 0. However, if b < 0 and c > 0, then the minimum of f is at $m \neq 0$. We have for B = 0 that
$$\frac{\partial f}{\partial m} = bm + cm^3 = 0. \qquad (9.68)$$
If we assume that $b = b_0 (T - T_c)$ and c > 0, we find
$$m = \pm \left( \frac{b_0}{c} \right)^{1/2} (T_c - T)^{1/2}. \qquad (9.69)$$
The behavior of the specific heat can be found from the relation $C = T\, \partial s/\partial T$. The entropy density
is given by
$$s = -\frac{\partial f}{\partial T} = -a' - \frac{b'}{2} m^2 - \frac{b}{2} (m^2)' - \frac{c}{4} (m^4)', \qquad (9.70)$$
where the primes in (9.70) denote the derivative with respect to T. From (9.70) we have
$$C = T \frac{ds}{dT} = -T a'' - T b' (m^2)' - \frac{cT}{4} (m^4)'', \qquad (9.71)$$
where we have used the fact that $b'' = 0$ and that $(m^2)'' = 0$ according to (9.69), and have assumed that c is independent of T. Because
m = 0 for $T \geq T_c$, we have $C \to -T a''$ as $T \to T_c$ from above. For $T \to T_c$ from below, we have
$(m^2)' = -b_0/c$, $b' = b_0$, and $(m^4)'' \to 2(b_0/c)^2$. Hence, we obtain
$$C \to \begin{cases} -T a'' & (T \to T_c^+) \\ -T a'' + T b_0^2/2c & (T \to T_c^-) \end{cases} \qquad (9.72)$$
We see that the order parameter m and the specific heat C have the same behavior near $T_c$ as we
previously obtained in our simple mean-field treatment of the Ising model.
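
The predictions of Landau theory for the order parameter and the specific heat can be checked by minimizing the free energy density numerically. The sketch below is an illustration under stated assumptions: the parameter values $T_c = b_0 = c = 1$ and a = 0 are arbitrary choices, B = 0, the minimization is a simple search over a uniform grid of non-negative m, and the function names are hypothetical.

```python
def landau_f(m, T, Tc=1.0, b0=1.0, c=1.0, a=0.0, B=0.0):
    """Landau free energy density with b = b0 (T - Tc)."""
    b = b0 * (T - Tc)
    return a + 0.5 * b * m**2 + 0.25 * c * m**4 - B * m


def equilibrium_m(T, m_max=2.0, n=20001, **params):
    """Non-negative value of m on a uniform grid that minimizes f (by symmetry, -m is also a minimum for B = 0)."""
    grid = [m_max * i / (n - 1) for i in range(n)]
    return min(grid, key=lambda m: landau_f(m, T, **params))


def specific_heat(T, h=1e-2, **params):
    """C = -T d^2 f_eq/dT^2 by central differences, where f_eq is the minimized free energy density."""
    f_eq = lambda t: landau_f(equilibrium_m(t, **params), t, **params)
    return -T * (f_eq(T + h) - 2 * f_eq(T) + f_eq(T - h)) / h**2
```

With these parameter choices the equilibrium order parameter should follow $(T_c - T)^{1/2}$ below $T_c$, and the specific heat should change by approximately $T b_0^2/2c$ between temperatures just above and just below $T_c$.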
Problem 9.10. The susceptibility per spin can be found from the relation
$$\chi = \frac{\partial m}{\partial B}.$$
(a) Show that if $B \neq 0$, $bm + cm^3 - B = 0$ in the Landau theory, and hence $\chi^{-1} = b + 3cm^2$. For
$T > T_c$, m = 0 and $\chi^{-1} = b = b_0 (T - T_c)$. For $T < T_c$, $m^2 = -b/c$, and hence $\chi^{-1} = 2b_0 (T_c - T)$.
Hence Landau theory predicts that $\gamma = \gamma' = 1$. (b) Show that at the critical point $cm^3 = B$, and
hence δ = 3.
We can generalize Landau theory to incorporate spatial fluctuations by writing
$$f(r) = a + \frac{b}{2} m^2(r) + \frac{c}{4} m^4(r) + \frac{g}{2} [\nabla m(r)]^2 - B\, m(r), \qquad (9.73)$$
where the parameter g > 0. This form of the free energy density is commonly known as the
Landau-Ginzburg form. The gradient term in (9.73) expresses the fact that the free energy is
raised by spatial fluctuations in the order parameter. The total free energy is given by
$$F = \int f(r)\, d^3r, \qquad (9.74)$$
and the total magnetization is
$$M = \int m(r)\, d^3r. \qquad (9.75)$$
We follow the same procedure as before and minimize the total free energy:
$$\delta F = \int \left\{ \delta m(r) [b\, m(r) + c\, m^3(r) - B] + g \nabla \delta m(r) \cdot \nabla m(r) \right\} d^3r = 0. \qquad (9.76)$$
The last term in the integrand of (9.76) can be simplified by integrating by parts and requiring
that δm(r) = 0 at the surface. In this way we obtain
$$B(r) = b\, m(r) + c\, m(r)^3 - g \nabla^2 m(r). \qquad (9.77)$$
It is clear that (9.77) reduces to the usual Landau theory by letting B(r) = B and ∇m(r) = 0.
If we imagine that a localized perturbation B (r) = B0 δ (r) is applied to the system, we can
use (9.77) to calculate its eﬀect. We write
$$m(\mathbf{r}) = m_0 + \phi(\mathbf{r}), \tag{9.78}$$
and assume that the spatially varying term $\phi(\mathbf{r})$ is small so that $m(\mathbf{r})^3 \approx m_0^3 + 3m_0^2\phi(\mathbf{r})$. We substitute (9.78) into (9.77) and obtain
$$\nabla^2\phi(\mathbf{r}) - \frac{b}{g}\phi(\mathbf{r}) - 3\frac{c}{g}m_0^2\phi(\mathbf{r}) - \frac{b}{g}m_0 - \frac{c}{g}m_0^3 = -\frac{B_0}{g}\delta(\mathbf{r}). \tag{9.79}$$
If we substitute $m_0 = 0$ for $T > T_c$ and $m_0^2 = -b/c$ for $T < T_c$ into (9.79), we obtain
$$\nabla^2\phi - \frac{b}{g}\phi = -\frac{B_0}{g}\delta(\mathbf{r}) \qquad (T > T_c) \tag{9.80a}$$
$$\nabla^2\phi + 2\frac{b}{g}\phi = -\frac{B_0}{g}\delta(\mathbf{r}) \qquad (T < T_c) \tag{9.80b}$$
Note that $\phi$ in (9.80) satisfies an equation of the same form as the one we found in the Debye-Hückel theory (see Section 8.12). The solution of (9.80) can be written in spherical coordinates as
$$\phi(r) = \frac{B_0}{4\pi g}\frac{1}{r}e^{-r/\xi}, \tag{9.81}$$
with
$$\xi(T) = \left[\frac{g}{b(T)}\right]^{1/2} \qquad (T > T_c) \tag{9.82a}$$
and
$$\xi(T) = \left[\frac{-g}{2b(T)}\right]^{1/2} \qquad (T < T_c). \tag{9.82b}$$
We will see that the quantity $\xi(T)$ can be interpreted as the correlation length. Because $b(T) = b_0(T - T_c)$, we see that $\xi$ diverges from both above and below $T_c$ as
$$\xi(T) \sim |T - T_c|^{-\nu}. \tag{9.83}$$
In general, $\xi \sim |T - T_c|^{-\nu}$; mean-field theory predicts that $\nu = 1/2$.
The correlation function of the order parameter is given by
$$G(r) = \langle m(\mathbf{r})m(0)\rangle - \langle m\rangle^2. \tag{9.84}$$
We can relate $\phi(r)$ to $G(r)$ by the following considerations. Because we can write the total energy in the form
$$H = H_0 - \int m(\mathbf{r})B(\mathbf{r})\,d^3r, \tag{9.85}$$
we have
$$\langle m(\mathbf{r})\rangle = \frac{\sum_s m_s(\mathbf{r})\,e^{-\beta[H_{0,s} - \int B(\mathbf{r}')m_s(\mathbf{r}')\,d^3r']}}{\sum_s e^{-\beta[H_{0,s} - \int B(\mathbf{r}')m_s(\mathbf{r}')\,d^3r']}}, \tag{9.86}$$
where $H_0$ is the part of $H$ that is independent of $B(\mathbf{r})$, and $H_{0,s}$ and $m_s(\mathbf{r})$ denote the values of $H_0$ and $m(\mathbf{r})$ in state $s$. We see that
$$\frac{\delta\langle m(\mathbf{r})\rangle}{\delta B(0)} = \beta\left[\langle m(\mathbf{r})m(0)\rangle - \langle m(\mathbf{r})\rangle\langle m(0)\rangle\right] = \beta G(r). \tag{9.87}$$
Because $\langle m(\mathbf{r})\rangle = m_0 + \phi(\mathbf{r})$, we also have $\delta\langle m(\mathbf{r})\rangle/\delta B(0) = \phi(r)/B_0$. Hence, $G(r) = kT\phi(r)/B_0$ and we find using (9.81) that
$$G(r) = \frac{kT}{4\pi g}\frac{1}{r}e^{-r/\xi}. \tag{9.88}$$
From the form of (9.88) we recognize $\xi$ as the correlation length in the neighborhood of $T_c$. At $T = T_c$, $b = 0$, and we find from (9.88) that $G(r) \sim 1/r$. For arbitrary spatial dimension $d$ we can write this spatial dependence as
$$G(r) \sim \frac{1}{r^{d-2+\eta}}, \qquad (T = T_c) \tag{9.89}$$
where we have introduced another critical exponent $\eta$. We see that Landau theory yields $\eta = 0$.

Problem 9.11. Derive the relation (9.87) between the linear response $\delta\langle m(\mathbf{r})\rangle/\delta B(0)$ and the spin correlation function $G(r)$.
The existence of long-range correlations in the correlation function of the order parameter $G(r)$ is associated with the divergence of $\chi$, the susceptibility per spin. As we showed in Chapter 5, $\chi$ is related to the fluctuations in $M$ (see (5.18)):
$$\chi = \frac{1}{NkT}\left[\langle M^2\rangle - \langle M\rangle^2\right] \tag{9.90a}$$
$$\phantom{\chi} = \frac{1}{NkT}\left\langle[M - \langle M\rangle]^2\right\rangle. \tag{9.90b}$$
(We have defined $\chi$ as the isothermal susceptibility per spin.) We write
$$M - \langle M\rangle = \sum_{i=1}^{N}[s_i - \langle s_i\rangle], \tag{9.91}$$
and
$$\chi = \frac{1}{NkT}\sum_{i,j=1}^{N}\left[\langle s_i s_j\rangle - \langle s_i\rangle\langle s_j\rangle\right] \tag{9.92a}$$
$$\phantom{\chi} = \frac{1}{kT}\sum_{j=1}^{N}G_{1j}, \tag{9.92b}$$
where $G_{ij} = \langle s_i s_j\rangle - \langle s_i\rangle\langle s_j\rangle$. We have used the definition of $G_{ij}$ and the fact that all lattice sites are equivalent. The generalization of (9.92b) to a continuous system is
$$\chi = \frac{1}{kT}\int G(r)\,d^3r. \tag{9.93}$$
From (9.93) we see that the divergence of the susceptibility is associated with the existence of long-range correlations.

Problem 9.12. Show that the relation (9.93) and the form (9.88) imply that $\chi \sim |T - T_c|^{-1}$.
Range of Validity of Mean-Field Theory. It is interesting that mean-field theory carries with it the seeds of its own destruction and must break down when the system is sufficiently close to its critical point. That is, mean-field theory should be applicable if the fluctuations in the order parameter are much smaller than its mean value so that the fluctuations can be ignored. Conversely, if fluctuations are larger than the mean value, mean-field theory should break down. We write this condition as
$$\frac{\int\left[\langle m(\mathbf{r})m(0)\rangle - \langle m(\mathbf{r})\rangle\langle m(0)\rangle\right]d^3r}{\int m^2\,d^3r} \ll 1. \tag{9.94}$$
The condition (9.94) is known as the Ginzburg criterion and gives a criterion for the self-consistency of mean-field theories. If we substitute $G(r)$ from (9.88) into (9.94) and integrate over a sphere of radius $\xi$, we find
$$\frac{kT}{4\pi g}\int_0^{\xi}\frac{e^{-r/\xi}}{r}\,4\pi r^2\,dr = \frac{kT\xi^2}{g}\left(1 - \frac{2}{e}\right) = \frac{0.264\,kT\xi^2}{g}. \tag{9.95}$$
The Ginzburg criterion for the validity of mean-field theory becomes
$$\frac{0.264\,kT\xi^2}{g} \ll \frac{4\pi}{3}\xi^3 m^2, \tag{9.96}$$
or
$$\frac{0.063\,kT}{g} \ll \xi m^2. \qquad \text{(Ginzburg criterion)} \tag{9.97}$$
Because $\xi \sim |T - T_c|^{-1/2}$ and $m^2 \sim |T - T_c|$, we see that the magnitude of the product $\xi m^2$ approaches zero as $T \to T_c$ and the Ginzburg criterion will not be satisfied for $T$ sufficiently close to $T_c$. Hence, mean-field theory contains the seeds of its own destruction and must break down when the system is sufficiently close to its critical point. However, there exist some systems, for example, superconductivity as described by BCS theory, for which $\xi$ is very large and (9.97) is satisfied in practice down to temperature differences as small as $|T - T_c| \sim 10^{-14}$.
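The numerical factors 0.264 in (9.95) and 0.063 in (9.97) can be checked with a short quadrature. This is a minimal sketch, assuming the integrand of (9.95) with the prefactor $kT/g$ stripped off; the function name, the midpoint rule, and the choice $\xi = 3$ are illustrative assumptions.

```python
import math

def radial_integral(xi, n=100000):
    """Midpoint-rule estimate of the integral of (e^{-r/xi} / r) r^2 dr from 0 to xi.

    This is the integral in (9.95) after the factors of 4*pi cancel.
    """
    h = xi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        total += r * math.exp(-r / xi) * h
    return total

xi = 3.0                                    # arbitrary illustrative correlation length
numeric = radial_integral(xi) / xi**2       # should not depend on xi
exact = 1.0 - 2.0 / math.e                  # closed form quoted in (9.95)
coefficient = exact / (4.0 * math.pi / 3)   # factor multiplying kT/g in (9.97)
print(numeric, exact, coefficient)
```

Dividing by $\xi^2$ makes the result independent of $\xi$, which is why only the product $\xi m^2$ survives in the final criterion.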
Problem 9.13. We can generalize the results of mean-field theory to arbitrary spatial dimension $d$. Equation (9.80) can be generalized to arbitrary $d$ by replacing the operator $\nabla^2$ by the analogous $d$-dimensional operator. The general solution is not as simple as (9.5), but has the asymptotic form
$$G(r) \sim \frac{e^{-r/\xi}}{r^{d-2}}. \tag{9.98}$$
Generalize the Ginzburg criterion (9.94) to arbitrary $d$ and show that the inequality is satisfied if $d\nu - 2\beta - 2\nu > 0$, or
$$d > 2 + 2\beta/\nu. \tag{9.99}$$
Ignore all numerical factors.

Because mean-field theory yields $\beta = 1/2$ and $\nu = 1/2$, we find that mean-field theory is valid for all $T$ near $T_c$ if $d > d_c = 4$. At $d = d_c$, the upper critical dimension, there are small logarithmic corrections to the mean-field critical exponents. That is, near the critical point, mean-field theory is only reliable for dimensions greater than four.

9.6 Other Models of Magnetism

The Ising model is an excellent model for testing methods of statistical mechanics, and there are still aspects of the model that are of much current interest. (The existence of many papers on the Ising model has led some workers in statistical mechanics to dub the Ising model the "fruit fly" of statistical mechanics.) Many variations of the Ising model and the Heisenberg model provide the basis for models of magnetic phenomena. One interesting limit is $S \to \infty$, corresponding to classical spins. In this limit the direction of $\mathbf{S}$ is not quantized. It is convenient to replace the spin $\mathbf{S}$ by a unit vector $\mathbf{s} = \mathbf{S}/S$ and redefine the coupling constant such that the old $J$ is replaced by $J/S^2$. The factor of $S^2$ cancels, and we are left with the Hamiltonian
$$H = -J\sum_{\langle ij\rangle}\mathbf{s}_i\cdot\mathbf{s}_j. \qquad \text{(classical Heisenberg model)} \tag{9.100}$$
If the spins can point in any direction in three-dimensional space, this model is referred to as the classical Heisenberg model or sometimes the O(3) model from group theory. If the spins are restricted to stay in a plane, then the model is called the planar or xy model. The properties of these models depend on the dimension of the lattice. For example, in three dimensions the
classical Heisenberg model has a phase transition between a low temperature ferromagnetic state
and a high temperature paramagnetic state. However, on a twodimensional lattice the classical
Heisenberg model has no phase transition and is always in the disordered paramagnetic state except
at T = 0. This result is related to the issue of conﬁnement of quarks in the theory of quantum
chromodynamics.
The planar model has some very interesting properties on a twodimensional lattice. Although
there is no spontaneous magnetization, there nevertheless is a phase transition between a low
temperature quasiordered phase and a high temperature disordered phase. The phase transition
is caused by the unbinding of spin vortex pairs.
We also can include the effects of anisotropy by replacing $J\,\mathbf{S}_i\cdot\mathbf{S}_j$ by $J_x S_{x,i}S_{x,j} + J_y S_{y,i}S_{y,j} + J_z S_{z,i}S_{z,j}$, and giving $J_x$, $J_y$, and $J_z$ different values. An important special case is $J_x = J_y \neq J_z$. Interesting models can be created by extending the interactions between the spins beyond nearest neighbors, and adjusting the interaction to be different for different neighbors and for different distances. An example is the ANNNI (axial next-nearest-neighbor Ising) model. In this model the nearest neighbors have ferromagnetic couplings, but next-nearest neighbors have antiferromagnetic couplings. We also can give the coupling $J_{i,j}$ for each pair of spins, $i$ and $j$, a
random value. Such a model is called a spin glass and is currently an area of active research. Other
extensions include adding a random magnetic ﬁeld at each site to favor a particular direction of
the spin (the random ﬁeld Ising model).
In addition to magnetic systems, the above models are useful for studying other systems. For
example, “up” and “down” in the Ising model can represent A and B atoms in an AB alloy. In
this case the model is referred to as a lattice gas. The principal difference is that the net
spin or magnetization (or total number of A atoms or total number of gas particles) is ﬁxed. This
distinction means that a diﬀerent ensemble is appropriate.
Superﬂuids can be modeled by the planar model near the superﬂuid phase transition. The
reason is that the spinspin interaction in the planar model can be expressed as cos(θi − θj ), where
θi is an angle that goes from 0 to 2π . In superﬂuids the order parameter is described by a quantum
mechanical phase that also is an angle.
Another extension to the Heisenberg model is the q state Potts model for which there are q
discrete states at each site; the energy is −J if two neighboring sites are in the same state and
zero otherwise. A related extension is the q state clock model, where the spin is restricted to point
in a twodimensional plane at q equally spaced angles. These models are useful for modeling the
behavior of atoms adsorbed on surfaces, when there are q possible types of sites on the surface at
which the atom can stick.
Similar spin models are being extended to understand the ﬁeld theories used in high energy
physics. The principal diﬀerence is that the interactions are typically four spin interactions between
spins located at the corners of the square plaquettes of the lattice. The planar model turns out to
represent electromagnetism. We can obtain the other fundamental interactions by invoking more
complicated spins that become matrices instead of simple vectors.

9.7 Universality and Scaling Relations

For convenience, we summarize the standard static critical exponents in Table 9.3. The quantity $\epsilon = (T_c - T)/T_c$. The notation $C \sim |\epsilon|^{-\alpha}$ means that $C$ has a singular contribution proportional to $|\epsilon|^{-\alpha}$.

heat capacity                          $C \sim \epsilon^{-\alpha}$
order parameter                        $m \sim \epsilon^{\beta}$
susceptibility                         $\chi \sim \epsilon^{-\gamma}$
equation of state ($\epsilon = 0$)     $m \sim B^{1/\delta}$
correlation length                     $\xi \sim \epsilon^{-\nu}$
power law decay at $\epsilon = 0$      $G(r) \sim 1/r^{d-2+\eta}$

Table 9.3: Summary of the definition of the standard critical exponents expressed in the language of magnetic systems.

The above definitions of the critical exponents implicitly assume that the singularities are the same whether the critical point is approached from above or below. (The exception is $m$, which is zero for $T > T_c$.) In the following, we will not bother to write $|\epsilon|$ instead of $\epsilon$.
The importance of the critical exponents is that they are universal, that is, they depend only
on the spatial dimension of the lattice d and the number of components of the order parameter,
and do not depend on the details of the interactions. For example, the critical exponents for the
Ising model are the same as those for the liquidgas critical point. The standard universality classes
correspond to the scalar, planar, and threedimensional vector order parameter for which n = 1,
n = 2, and n = 3, respectively. Examples of n = 1 are the Ising model, the lattice gas model,
and the liquidgas transition. Examples of n = 2 are planar ferromagnets, the superﬂuid, and
ordinary superconductivity. The latter two cases correspond to n = 2 because the order parameter
is complex. The case n = 3 corresponds to the Heisenberg model.
We will ﬁnd in the following that only two of the above six critical exponents are independent.
The exponents are related by the scaling relations which are summarized in Table 9.4.
Fisher        $\gamma = \nu(2 - \eta)$
Rushbrooke    $\alpha + 2\beta + \gamma = 2$
Widom         $\gamma = \beta(\delta - 1)$
Josephson     $\nu d = 2 - \alpha$

Table 9.4: Examples of simple scaling relations between the critical exponents.

The essential physics of the critical point is that the correlation length $\xi$ is the only characteristic length of the system. This assumption is known as the "scaling hypothesis," although now it is more than a hypothesis. A simple way to obtain the above scaling relations is to use dimensional analysis and assume that a quantity that has dimension $L^{-d}$ is proportional to $\xi^{-d}$ near the critical point.

Because the quantity $\beta F$ is dimensionless and proportional to $N$, we have that $\beta F/V$ has dimensions
$$[\beta f] = L^{-d}. \tag{9.101}$$
Similarly the correlation function $G(r)$ depends on $L$ according to
$$[G(r)] = L^{2-d-\eta}. \tag{9.102}$$
By definition $G(r)$ has the same dimension as $m^2$ so
$$[m] = L^{(2-d-\eta)/2}. \tag{9.103}$$
If we use the relation (9.90b), we have
$$[kT\chi] = L^{2-\eta}. \tag{9.104}$$
Finally, because $M = -\partial F/\partial B$ (see (5.16)), we have
$$[B/kT] = L^{-(2+d-\eta)/2}. \tag{9.105}$$
We now replace the $L$ in the above formulae by $\xi$ and let $\xi \sim \epsilon^{-\nu}$. Because the dimensions of the heat capacity are the same as $\beta f$ in (9.101), we obtain $2 - \alpha = \nu d$. We leave it as an exercise for the reader to obtain the other scaling relations (see Problem 9.14).
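As a quick consistency check on the scaling relations, one can substitute known sets of exponents and confirm that each relation balances. The sketch below uses exact rational arithmetic; the mean-field values follow from this chapter ($\beta = 1/2$, $\gamma = 1$, $\delta = 3$, $\nu = 1/2$, $\eta = 0$, with $\alpha = 0$ for the specific heat jump), while the two-dimensional Ising values are the standard exact exponents quoted from outside this section. The Widom relation is taken in its usual form $\gamma = \beta(\delta - 1)$, and the function name is a hypothetical helper.

```python
from fractions import Fraction as F

def scaling_residuals(alpha, beta, gamma, delta, nu, eta, d):
    """Return left side minus right side for the four relations of Table 9.4."""
    return {
        "Fisher": gamma - nu * (2 - eta),
        "Rushbrooke": alpha + 2 * beta + gamma - 2,
        "Widom": gamma - beta * (delta - 1),
        "Josephson": nu * d - (2 - alpha),
    }

# Mean-field exponents (alpha = 0 corresponds to a jump in the specific heat).
mean_field = dict(alpha=F(0), beta=F(1, 2), gamma=F(1), delta=F(3), nu=F(1, 2), eta=F(0))
# Exact two-dimensional Ising exponents (alpha = 0 corresponds to a logarithm).
ising_2d = dict(alpha=F(0), beta=F(1, 8), gamma=F(7, 4), delta=F(15), nu=F(1), eta=F(1, 4))

print(scaling_residuals(d=4, **mean_field))  # every residual vanishes at d = 4
print(scaling_residuals(d=2, **ising_2d))    # every residual vanishes at d = 2
print(scaling_residuals(d=3, **mean_field))  # only the Josephson residual is nonzero
```

The mean-field exponents satisfy the Josephson relation only at $d = 4$, because that relation is the only one that involves the spatial dimension; this is consistent with $d_c = 4$ being the upper critical dimension.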
Problem 9.14. Use the above dimensional analysis arguments to obtain the relations $-\nu(2 - d - \eta)/2 = \beta$, $-\nu(2 - \eta) = -\gamma$, and $\nu(2 + d - \eta)/2 = \beta\delta$. Then do some simple algebra to derive the Rushbrooke and Widom scaling laws.

9.8 The Renormalization Group and the 1D Ising Model

Although the one-dimensional Ising model does not have a critical point, the application of the renormalization group method has much pedagogical value and is the only case in which a simple grouping of spins can be carried out analytically. The Hamiltonian for the Ising chain with periodic boundary conditions is (see (5.57))
$$H = -J\sum_{i=1}^{N}s_i s_{i+1} - \frac{1}{2}\tilde{B}\sum_{i=1}^{N}(s_i + s_{i+1}). \tag{9.106}$$
It is convenient to define the dimensionless parameters $K = \beta J$ and $B = \beta\tilde{B}$. For periodic boundary conditions the partition function can be written as
$$Z = \sum_{\{s\}}\exp\left[\sum_{i=1}^{N}\left(K s_i s_{i+1} + \frac{1}{2}B(s_i + s_{i+1})\right)\right]. \tag{9.107}$$
The sum in (9.107) is over all possible spin configurations. For simplicity, we first consider $B = 0$.

One way to obtain a renormalized lattice is to group sites into cells. Another way to reduce the number of degrees of freedom is to average or sum over them. For example, for the $d = 1$ Ising model we can write $Z$ as
$$Z(K, N) = \sum_{\{s\}} e^{K(s_1 s_2 + s_2 s_3)}\,e^{K(s_3 s_4 + s_4 s_5)}\cdots. \tag{9.108}$$
The form (9.108) suggests that we sum over even spins:
$$Z(K, N) = \sum_{\text{odd spins}}\left[e^{K(s_1 + s_3)} + e^{-K(s_1 + s_3)}\right]\times\left[e^{K(s_3 + s_5)} + e^{-K(s_3 + s_5)}\right]\times\cdots \tag{9.109}$$
This method of reducing the degrees of freedom is called decimation.
We next try to write the partially averaged partition function (9.109) in its original form with $N/2$ spins and, in general, a different interaction $K'$. If such a rescaling were possible, we could obtain a recursion relation for $K'$ in terms of $K$. We require that
$$e^{K(s_1 + s_3)} + e^{-K(s_1 + s_3)} = A(K)\,e^{K' s_1 s_3}, \tag{9.110}$$
where the function $A$ does not depend on $s_1$ or $s_3$. If the relation (9.110) exists, we can write
$$Z(K, N) = \sum_{s_1, s_3, s_5, \ldots} A(K)\,e^{K' s_1 s_3}\,A(K)\,e^{K' s_3 s_5}\cdots \tag{9.111a}$$
$$\phantom{Z(K, N)} = [A(K)]^{N/2}\,Z(K', N/2). \tag{9.111b}$$
In the limit $N \to \infty$, we know that $\ln Z$ is proportional to $N$, that is,
$$\ln Z = N f(K), \tag{9.112}$$
where $f(K)$ depends on $K$ and is independent of $N$. From (9.111b) and (9.112) we obtain
$$\ln Z = N f(K) = \frac{N}{2}\ln A(K) + \ln Z(K', \tfrac{N}{2}) \tag{9.113a}$$
$$\phantom{\ln Z = N f(K)} = \frac{N}{2}\ln A(K) + \frac{N}{2}f(K'), \tag{9.113b}$$
or
$$f(K') = 2f(K) - \ln A(K). \tag{9.114}$$
We can find the form of $A(K)$ from (9.110). Recall that (9.110) holds for all values of $s_1$ and $s_3$. We first consider the cases $s_1 = s_3 = 1$ and $s_1 = s_3 = -1$ for which
$$e^{2K} + e^{-2K} = A\,e^{K'}. \tag{9.115}$$
For the case $s_1 = 1$ and $s_3 = -1$ or $s_1 = -1$ and $s_3 = 1$, we have
$$2 = A\,e^{-K'}. \tag{9.116}$$
From (9.116) we have $A = 2e^{K'}$, and hence from (9.115), we obtain
$$e^{2K} + e^{-2K} = 2e^{2K'}, \tag{9.117}$$
or
$$K' = R(K) = \frac{1}{2}\ln\cosh(2K). \qquad \text{(recursion relation)} \tag{9.118}$$

Figure 9.8: Renormalization group flow diagram for the one-dimensional Ising model in zero magnetic field. The axis runs from $K = \infty$ ($T = 0$) to $K = 0$ ($T = \infty$).

From (9.116) we find that $A(K)$ is given by
$$A(K) = 2\cosh^{1/2}(2K). \tag{9.119}$$
We can use the form of $A(K)$ in (9.119) to rewrite (9.114) as
$$f(K') = 2f(K) - \ln[2\cosh^{1/2}(2K)]. \qquad \text{(free energy per spin)} \tag{9.120}$$
Equations (9.118) and (9.120) are the essential results of the renormalization group analysis.
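Because the decimation step is exact, the identity $Z(K, N) = [A(K)]^{N/2} Z(K', N/2)$ in (9.111b) can be verified by brute force on a small ring. The following is a minimal sketch under that assumption; the function names and the test values $K = 0.7$, $N = 8$ are illustrative choices.

```python
import math
from itertools import product

def partition_function(K, N):
    """Brute-force Z for the periodic Ising chain of N spins in zero field."""
    Z = 0.0
    for s in product((-1, 1), repeat=N):
        bond_sum = sum(s[i] * s[(i + 1) % N] for i in range(N))
        Z += math.exp(K * bond_sum)
    return Z

def decimate(K):
    """Return (K', A) from the recursion relation (9.118) and from (9.119)."""
    K_new = 0.5 * math.log(math.cosh(2.0 * K))
    A = 2.0 * math.sqrt(math.cosh(2.0 * K))
    return K_new, A

K, N = 0.7, 8                      # illustrative coupling and (even) chain length
K_new, A = decimate(K)
lhs = partition_function(K, N)
rhs = A ** (N // 2) * partition_function(K_new, N // 2)
print(lhs, rhs)                    # the two values agree to rounding error
```

The check works for any even $N$ large enough that the decimated ring still has distinct neighbors ($N/2 \geq 3$), because summing over the even spins never introduces couplings beyond nearest neighbors in one dimension.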
Because $\frac{1}{2}\ln\cosh(2K) \leq K$, the successive use of (9.118) leads to smaller values of $K$ and smaller values of the correlation length. Thus $K = 0$ or $T = \infty$ is a trivial fixed point (see Figure 9.8). This behavior is to be expected because the Ising chain does not have a phase transition at nonzero temperature. As an example, suppose we start with $K = 10$ corresponding to a low temperature. The first iteration gives $K' = 9.65$ and further iterations lead to $K' = 0$. Because any system with finite $K$ ultimately renormalizes to $K = 0$, we conclude that every point for $K > 0$ is in the same phase. Only exactly at zero temperature is this statement not true. We say that there are two fixed points; the one at $T = 0$ is unstable because any perturbation away from $T = 0$ is amplified. The fixed point at $T = \infty$ is stable. The renormalization group flows go from the unstable fixed point to the stable fixed point as shown in Figure 9.8.
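The flow toward the trivial fixed point is easy to follow numerically. This is a minimal sketch that iterates the recursion relation (9.118) starting from $K = 10$; the function names, the tolerance, and the step limit are illustrative assumptions.

```python
import math

def R(K):
    """Decimation recursion (9.118): K' = (1/2) ln cosh(2K)."""
    return 0.5 * math.log(math.cosh(2.0 * K))

def flow(K0, tol=1e-12, max_steps=1000):
    """Iterate K -> R(K) until K drops below tol; return the whole trajectory."""
    trajectory = [K0]
    while trajectory[-1] > tol and len(trajectory) <= max_steps:
        trajectory.append(R(trajectory[-1]))
    return trajectory

path = flow(10.0)
print(path[1])                      # first iterate, about 9.65
print(len(path) - 1, path[-1])      # number of steps taken and the final K
```

At large $K$ each step lowers $K$ by roughly $\frac{1}{2}\ln 2$, whereas for small $K$ the recursion behaves like $K' \approx K^2$, so the final approach to $K = 0$ is very rapid.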
Because a nontrivial fixed point of (9.118) between $T = 0$ and $T = \infty$ does not exist, the recursion relation is reversible, and we can follow the transformation backwards starting from $K \approx 0$ ($T = \infty$) and going to $K = \infty$ ($T = 0$). The advantage of starting from $T \approx \infty$ is that we can start with the exact solution for $K = 0$ ($T = \infty$) and iterate the equation to higher values of $K$. To find the corresponding recursion relation that works in this direction we solve (9.118) for $K$ in terms of $K'$. Similarly, we solve (9.120) to find $f(K)$ in terms of $f(K')$. The result is
$$K = \frac{1}{2}\cosh^{-1}(e^{2K'}), \tag{9.121}$$
$$f(K) = \frac{1}{2}\ln 2 + \frac{1}{2}K' + \frac{1}{2}f(K'). \tag{9.122}$$
The relations (9.121) and (9.122) can be used to calculate $f(K)$. Suppose we begin with $K' = 0.01$. Because this value of $K'$ is close to zero, the effect of the spin-spin interactions is very small, and we can take $Z(K' = 0.01, N) \approx Z(K' = 0, N) = 2^N$. From (9.112) we have
$$f(K' = 0.01) \approx \ln 2 \approx 0.693147. \tag{9.123}$$

K'          K           f(K')       f(K)
0.01        0.100334    0.693147    0.698147
0.100334    0.327447    0.698147    0.745814

Table 9.5: Summary of the first few steps of the calculation of $f$, the free energy per spin, for the $d = 1$ Ising model from the recursion relations (9.121) and (9.122). The function $f$ is related to the partition function $Z$ by $\ln Z = N f$ (see (9.112)).

Given $K' = 0.01$, we calculate $K$ from (9.121) and obtain the result $K = 0.100334$. The value of $f(K)$ for this value of $K$ is found from (9.122) to be 0.698147. This calculation of $f(K)$ and $K$ is the first step in an iterative procedure that can be repeated indefinitely with $K'$ chosen to be the value of $K$ from the prior iteration and $f(K')$ chosen to be the value of $f(K)$ from the previous iteration. The first few iterations are shown in Table 9.5.
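The iterative procedure can be written in a few lines. The following is a minimal sketch of the backward recursion (9.121) and (9.122), starting from $K' = 0.01$ with $f(K') \approx \ln 2$ as in (9.123). The last column prints $\ln(2\cosh K)$, the standard zero-field result for the Ising chain, which we assume is the exact expression the text has in mind for comparison; the function names and the number of steps are illustrative.

```python
import math

def K_from_Kprime(Kp):
    """Inverse recursion (9.121): K = (1/2) arccosh(exp(2 K'))."""
    return 0.5 * math.acosh(math.exp(2.0 * Kp))

def f_from_fprime(Kp, fp):
    """Inverse recursion (9.122): f(K) = (1/2) ln 2 + K'/2 + f(K')/2."""
    return 0.5 * math.log(2.0) + 0.5 * Kp + 0.5 * fp

def iterate(Kp=0.01, steps=8):
    """Return rows (K', K, f(K'), f(K)) in the layout of Table 9.5."""
    rows = []
    fp = math.log(2.0)              # approximation (9.123) for small K'
    for _ in range(steps):
        K = K_from_Kprime(Kp)
        f = f_from_fprime(Kp, fp)
        rows.append((Kp, K, fp, f))
        Kp, fp = K, f               # the new values seed the next iteration
    return rows

for Kp, K, fp, f in iterate():
    exact = math.log(2.0 * math.cosh(K))   # assumed exact zero-field free energy
    print(f"{Kp:.6f}  {K:.6f}  {fp:.6f}  {f:.6f}  {exact:.6f}")
```

The first two printed rows reproduce Table 9.5; the later rows extend the table to larger $K$.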
Problem 9.15. Extend the calculation of f (K ) in Table 9.5 to larger values of K by doing several
more iterations of (9.121) and (9.122). Also calculate the exact value of the free energy for the
calculated values of K using (5.28) and compare your results to f (K ). Because the recursion
relations (9.121) and (9.122) are exact, the only source of error is the ﬁrst value of f . Does the
error increase or decrease as the calculation proceeds?
∗ Problem 9.16. For nonzero magnetic field show that the function $A(K, B)$ satisfies the relation:
$$2e^{B(s_1 + s_3)/2}\cosh[K(s_1 + s_3) + B] = A(K, B)\,e^{2f + K' s_1 s_3 + \frac{1}{2}B'(s_1 + s_3)}. \tag{9.124}$$
Show that the recursion relations for nonzero magnetic field are
$$K' = \frac{1}{4}\ln\frac{\cosh(2K + B)\cosh(2K - B)}{\cosh^2 B}, \tag{9.125a}$$
$$B' = B + \frac{1}{2}\ln\frac{\cosh(2K + B)}{\cosh(2K - B)}, \tag{9.125b}$$
and
$$f(K, B) = \frac{1}{8}\ln\left[16\cosh(2K + B)\cosh(2K - B)\cosh^2 B\right]. \tag{9.125c}$$
The recursion relations (9.125) have a line of trivial fixed points satisfying $K^* = 0$ and arbitrary $B^*$, corresponding to the paramagnetic phase, and an unstable ferromagnetic fixed point at $K^* = \infty$, $B^* = 0$.

Justify the relation
$$Z(K, B, N) = e^{N f(K, B)}\,Z(K', B', \tfrac{1}{2}N). \tag{9.126}$$

∗ Problem 9.17. Transfer matrix method. As shown in Section 5.4.4, the partition function for the $N$ spin Ising chain can be written as the trace of the $N$th power of the transfer matrix $\mathbf{T}$. We can reduce the number of degrees of freedom by describing the system in terms of two-spin cells. We write $Z$ as
$$Z = \operatorname{trace}\mathbf{T}^N = \operatorname{trace}(\mathbf{T}^2)^{N/2} = \operatorname{trace}\mathbf{T}'^{N/2}. \tag{9.127}$$
The transfer matrix for two-spin cells, $\mathbf{T}^2$, can be written as
$$\mathbf{T}^2 = \mathbf{T}\mathbf{T} = \begin{pmatrix} e^{2K + 2B} + e^{-2K} & e^{B} + e^{-B} \\ e^{-B} + e^{B} & e^{2K - 2B} + e^{-2K} \end{pmatrix}. \tag{9.128}$$
In (9.128) we have written $K = \beta J$ and $B$ represents the product $\beta B$ as we did earlier. We require that $\mathbf{T}'$ have the same form as $\mathbf{T}$:
$$\mathbf{T}' = C\begin{pmatrix} e^{K' + B'} & e^{-K'} \\ e^{-K'} & e^{K' - B'} \end{pmatrix}. \tag{9.129}$$
A parameter $C$ must be introduced because matching (9.128) with (9.129) requires matching three matrix elements, which in general is impossible with only two variables, $K'$ and $B'$. We have three unknowns to satisfy the three conditions:
$$Ce^{K'}e^{B'} = e^{-2K} + e^{2K}e^{2B} \tag{9.130a}$$
$$Ce^{-K'} = e^{B} + e^{-B} \tag{9.130b}$$
$$Ce^{K'}e^{-B'} = e^{-2K} + e^{2K}e^{-2B}. \tag{9.130c}$$
Show that the solution can be written as
$$e^{-2K'} = \frac{e^{-B} + e^{B}}{e^{-2K - B} + e^{2K + B}}, \tag{9.131a}$$
$$e^{-2B'} = \frac{e^{-2K} + e^{2K - 2B}}{e^{-2K} + e^{2K + 2B}}, \tag{9.131b}$$
$$C^4 = (e^{4K} + e^{-4K} + e^{-2B} + e^{2B})(2 + e^{-2B} + e^{2B}). \tag{9.131c}$$
Also show that the recursion relations in (9.131) reduce to (9.118) for $B = 0$. For $B \neq 0$, start from some initial state $K_0$, $B_0$ and calculate a typical renormalization group trajectory. To what phase (paramagnetic or ferromagnetic) does the fixed point correspond?

9.9 The Renormalization Group and the Two-Dimensional Ising Model

For simplicity, we consider $B = 0$ so that there is only one coupling constant $K = \beta J$. As
pointed out by Wilson, there is no recipe for constructing a renormalization group transformation,
and we will consider only one possible approach. In particular, we consider the majority rule
transformation developed by Niemeijer and van Leeuwen for the ferromagnetic Ising model on the
triangular lattice. This approach is known as a realspace method, because the renormalization
group transformation is applied directly to the spins on the lattice.
The idea is to divide the original lattice into cells and replace the site spins si = ±1 by the
renormalized cell spins µα = ±1. The Latin indices i and j denote the original lattice sites and
the Greek indices α and β denote the renormalized cell spins. As shown in Figure 9.9, we take
the sites of the original triangular lattice and group them into cells or blocks of three. The cells form a triangular lattice with a lattice constant $a' = \sqrt{3}\,a$ so that the length rescaling parameter is $b = \sqrt{3}$.

Figure 9.9: Cell spins on the triangular lattice. The solid lines indicate intracell interactions; the dotted lines show the intercell interactions.

We write the original Hamiltonian in the form
$$\mathcal{H} = \beta H = -K\sum_{\langle ij\rangle}s_i s_j, \tag{9.132}$$
and the partition function in the form
$$Z(K) = \sum_{\{s\}}e^{-\mathcal{H}(\{s\})}, \tag{9.133}$$
where $K = \beta J$. We have incorporated the factor of $\beta$ into the Hamiltonian and have written $\mathcal{H} = \beta H$.

The new Hamiltonian for the renormalized lattice can be written as
$$\mathcal{H} = \mathcal{H}_0 + V, \tag{9.134}$$
where $\mathcal{H}_0$ represents the sum of all the interactions between spins within the same cell, and $V$ is the interaction of spins between different cells. We write
$$\mathcal{H}_0 = -K\sum_{\alpha}{\sum_{i,j\subset\alpha}}' s_i s_j, \tag{9.135}$$
where the sum over $\alpha$ represents the sum over cells. The restricted sum in (9.135) (denoted by a prime) is over configurations of the original lattice that are consistent with a given set of cell spins $\mu$. For $b = \sqrt{3}$, the Hamiltonian for cell $\alpha$ has the form
$$\mathcal{H}_{0,\alpha} = -K(s_{1,\alpha}s_{2,\alpha} + s_{1,\alpha}s_{3,\alpha} + s_{2,\alpha}s_{3,\alpha}). \tag{9.136}$$
We write $V$ as
$$V = -K\sum_{\substack{\alpha,\beta \\ \alpha\neq\beta}}\sum_{i\subset\alpha}\sum_{j\subset\beta}s_i s_j. \tag{9.137}$$
The replacement of the original site spins by cell spins leads in general to a Hamiltonian that does not have the same form as (9.132). That is, the new Hamiltonian involves interactions between cell spins that are not nearest neighbors. Nevertheless, we assume that the new Hamiltonian has the same form:
$$F_r + \mathcal{H}' = -K'\sum_{\langle\alpha\beta\rangle}\mu_\alpha\mu_\beta. \tag{9.138}$$
The term $F_r$ in (9.138) is independent of the cell spin configurations. The representation (9.134)-(9.137) is exact.
In the following, we treat the interactions of the spins within the cells exactly and the interactions between the cells approximately. We will obtain a recursion relation
$$K' = R(K), \tag{9.139}$$
and a nontrivial fixed point $K^*$ such that
$$\lambda_K = \left.\frac{\partial K'}{\partial K}\right|_{K^*}, \tag{9.140}$$
and
$$\nu = \frac{\ln b}{\ln\lambda_K}, \tag{9.141}$$
where $b$ is the length rescaling parameter.

The renormalized Hamiltonian is given formally by
$$e^{-F_r - \mathcal{H}'} = \sum_{\{s\}}P(\mu, s)\,e^{-(\mathcal{H}_0 + V)}, \tag{9.142}$$
where the operator $P(\mu, s)$ transforms the original three spins to the cell spin and implements the majority rule so that the renormalized cell spin equals the sign of the sum of the site spins in the cell. Formally, we can write $P$ for cell spin $\alpha$ as
$$P(\mu_\alpha, s) = \delta\bigl(\mu_\alpha - \operatorname{sgn}(s_1 + s_2 + s_3)\bigr). \tag{9.143}$$
Because we need to treat the interaction between the cell spins approximately, we introduce an average over the original spin variables with respect to $\mathcal{H}_0$:
$$\langle A\rangle_0 = \frac{\sum_{\{s\}}A(s)\,P(\mu, s)\,e^{-\mathcal{H}_0(s)}}{\sum_{\{s\}}P(\mu, s)\,e^{-\mathcal{H}_0(s)}}. \tag{9.144}$$
We write
$$e^{-F_r}e^{-\mathcal{H}'} = \sum_{\{s\}}P(\mu, s)\,e^{-(\mathcal{H}_0 + V)}, \tag{9.145}$$
and multiply the top and bottom of (9.145) by $Z_0$, the partition function associated with $\mathcal{H}_0$:
$$e^{-F_r}e^{-\mathcal{H}'} = \sum_{\{s\}}P(\mu, s)\,e^{-\mathcal{H}_0}\;\frac{\sum_{\{s\}}P(\mu, s)\,e^{-(\mathcal{H}_0 + V)}}{\sum_{\{s\}}P(\mu, s)\,e^{-\mathcal{H}_0}} \tag{9.146a}$$
$$\phantom{e^{-F_r}e^{-\mathcal{H}'}} = Z_0\,\langle e^{-V}\rangle_0. \tag{9.146b}$$
We write $Z_0 = z(\mu)^{N'}$, where $N' = N/b^d$ is the number of cells, and $z(\mu)$ is the sum over the internal spin states for one cell for a given orientation of $\mu$. The average $\langle e^{-V}\rangle_0$ is with respect to the intracell Hamiltonian $\mathcal{H}_0$. We take the logarithm of both sides of (9.146b) and obtain
$$F_r + \mathcal{H}' = -N'\ln z - \ln\langle e^{-V}\rangle_0. \tag{9.147}$$
Then we can identify
$$f_r = \frac{F_r}{N} = \frac{N'}{N}\ln z = \frac{1}{b^d}\ln z, \tag{9.148}$$
and
$$\mathcal{H}' = -\ln\langle e^{-V}\rangle_0. \tag{9.149}$$
Before we evaluate the average in (9.149), we first calculate $z$ and $f_r$. The sum over the spins in a given cell for $\mu = 1$ can be written as
$$z(\mu = 1) = {\sum_{\{s\}}}' e^{K(s_1 s_2 + s_2 s_3 + s_3 s_1)}. \tag{9.150}$$
The restricted sum over the four states $s_1, s_2, s_3 = \uparrow\uparrow\uparrow, \uparrow\uparrow\downarrow, \uparrow\downarrow\uparrow, \downarrow\uparrow\uparrow$ (see Figure 9.10) gives
$$z(\mu = 1) = e^{3K} + 3e^{-K}. \tag{9.151}$$
From (9.148) we have
$$f_r = \frac{1}{3}\ln(e^{3K} + 3e^{-K}). \tag{9.152}$$
In the absence of a magnetic field, it is clear that the sum for $\mu = -1$ gives the same value for $z$ (see Problem 9.18).

Problem 9.18. Calculate $z(\mu = -1)$ and show that $z(\mu = +1) = z(\mu = -1)$.
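The difficult part of the calculation is the evaluation of the average $\langle e^{-V}\rangle_0$. Our approach will be to evaluate it approximately by keeping only the first cumulant. Because the cumulant expansion is essentially a power series in $K = \beta J$, it is reasonable to assume that the series converges given that $K_c \approx 0.275$ for the triangular lattice. We write
$$\ln\langle e^{-V}\rangle_0 = \sum_{n=1}^{\infty}\frac{(-1)^n}{n!}M_n, \tag{9.153}$$
and keep only the first cumulant
$$M_1 = \langle V\rangle_0. \tag{9.154}$$

Figure 9.10: Three spin cluster on the triangular lattice.

The first approximation to the intercell interaction $V$ can be written as (see Figure 9.11)
$$V_{\alpha\beta} = -K s_{1,\alpha}[s_{2,\beta} + s_{3,\beta}]. \tag{9.155}$$
Note that $V_{\alpha\beta}$ in (9.155) includes only the interaction of two nearest neighbor cells and that this approximation does not preserve the symmetry of the triangular lattice. However, this approximation is consistent with our assumption that the renormalized Hamiltonian has the same form as the original Hamiltonian. Because $\mathcal{H}_0$ does not couple different cells, we have
$$\langle V_{\alpha\beta}\rangle_0 = -2K\langle s_{1,\alpha}s_{2,\beta}\rangle_0 \tag{9.156a}$$
$$\phantom{\langle V_{\alpha\beta}\rangle_0} = -2K\langle s_{1,\alpha}\rangle_0\langle s_{2,\beta}\rangle_0. \tag{9.156b}$$
(The factor of 2 in (9.156) arises from the fact that $\langle s_{2,\beta}\rangle = \langle s_{3,\beta}\rangle$.)

From (9.156b) we see that we need to find $\langle s_{1,\alpha}\rangle_0$. Suppose that $\mu_\alpha = 1$. The four states consistent with this condition are shown in Figure 9.10. It is easy to see that
$$\langle s_{1,\alpha}\rangle_0 = \frac{1}{z}{\sum_{\{s\}}}' s_1\,e^{K(s_1 s_2 + s_2 s_3 + s_3 s_1)} \tag{9.157a}$$
$$\phantom{\langle s_{1,\alpha}\rangle_0} = \frac{1}{z}\left[+1e^{3K} + 1e^{-K} + 1e^{-K} - 1e^{-K}\right] \tag{9.157b}$$
$$\phantom{\langle s_{1,\alpha}\rangle_0} = \frac{1}{z}\left[e^{3K} + e^{-K}\right]. \qquad (\mu_\alpha = +1) \tag{9.157c}$$
Similarly, we can show that
$$\langle s_{1,\alpha}\rangle_0 = \frac{1}{z}\left[-e^{3K} - e^{-K}\right] = -\frac{1}{z}\left[e^{3K} + e^{-K}\right]. \qquad (\mu_\alpha = -1) \tag{9.158}$$
Hence, we can write
$$\langle s_{1,\alpha}\rangle_0 = \frac{1}{z}\left[e^{3K} + e^{-K}\right]\mu_\alpha. \tag{9.159}$$
From (9.156b) and (9.159) we have
$$\langle V_{\alpha\beta}\rangle_0 = -2Kf(K)^2\mu_\alpha\mu_\beta, \tag{9.160}$$
where
$$f(K) = \frac{e^{3K} + e^{-K}}{e^{3K} + 3e^{-K}}. \tag{9.161}$$

Figure 9.11: The couplings between nearest-neighbor cell spins $V_{\alpha\beta}$. The sites in cells $\alpha$ and $\beta$ are each labeled 1, 2, 3.

We write
$$\langle V\rangle_0 = \sum_{\langle\alpha\beta\rangle}\langle V_{\alpha\beta}\rangle_0 = -2Kf(K)^2\sum_{\langle\alpha\beta\rangle}\mu_\alpha\mu_\beta \tag{9.162a}$$
$$\phantom{\langle V\rangle_0} = -K'\sum_{\langle\alpha\beta\rangle}\mu_\alpha\mu_\beta. \tag{9.162b}$$
Note that $\langle V\rangle_0$ has the same form as the original nearest neighbor interaction with a renormalized value of the interaction. If we compare (9.160) and (9.162b), we find the recursion relation
$$K' = R(K) = 2Kf(K)^2, \tag{9.163}$$
and
$$\mathcal{H}' = -K'\sum_{\langle\alpha\beta\rangle}\mu_\alpha\mu_\beta. \tag{9.164}$$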
Because $f(K = 0) = 1/2$ and $f(K = \infty) = 1$, it is easy to see that there are trivial fixed points at $K^* = 0$ and $K^* = \infty$. The nontrivial fixed point occurs at $f(K) = 1/\sqrt{2}$ or at
$$K^* = \frac{1}{4}\ln(2\sqrt{2} + 1) \approx 0.3356. \tag{9.165}$$
The exact answer for $K_c$ for the triangular lattice is $K_c = \frac{1}{4}\ln 3 = 0.2747$. We also have
$$\lambda_K = \left.\frac{dK'}{dK}\right|_{K = K^*} = 1.624, \tag{9.166}$$
and hence
$$\nu = \frac{\ln\sqrt{3}}{\ln 1.624} \approx 1.13. \tag{9.167}$$
For comparison, the exact result is $\nu = 1$.
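The first-cumulant calculation can be reproduced numerically by enumerating the states of a single three-spin cell. The sketch below is one possible implementation, not the authors' code: it builds $z(\mu = +1)$ and $\langle s_1\rangle_0$ by explicit enumeration with the majority rule, forms the recursion relation (9.163), locates the nontrivial fixed point by bisection, and estimates $\lambda_K$ by a centered finite difference. The function names, the bracketing interval, and the step size are illustrative assumptions.

```python
import math
from itertools import product

def cell_sums(K):
    """Enumerate the 8 states of a three-spin cell, keeping those with mu = +1.

    Returns z(mu = +1) and the unnormalized weighted sum of s1.
    """
    z = 0.0
    s1_sum = 0.0
    for s1, s2, s3 in product((-1, 1), repeat=3):
        if s1 + s2 + s3 > 0:                      # majority rule
            w = math.exp(K * (s1 * s2 + s2 * s3 + s3 * s1))
            z += w
            s1_sum += s1 * w
    return z, s1_sum

def R(K):
    """First-cumulant recursion (9.163): K' = 2 K f(K)^2 with f = <s1>_0."""
    z, s1_sum = cell_sums(K)
    return 2.0 * K * (s1_sum / z) ** 2

def fixed_point(lo=0.1, hi=1.0, tol=1e-12):
    """Bisection on R(K) - K, which is negative at lo and positive at hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if R(mid) - mid < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

K_star = fixed_point()
h = 1e-6
lam = (R(K_star + h) - R(K_star - h)) / (2.0 * h)    # eigenvalue dK'/dK at K*
y = math.log(lam) / math.log(math.sqrt(3.0))          # ln(lambda) / ln(b)
print(K_star, lam, y, 1.0 / y)
```

With $b = \sqrt{3}$ the ratio $\ln\lambda_K/\ln b$ is about 0.88 and its reciprocal $\ln b/\ln\lambda_K$ is about 1.13. Because $\xi' = \xi/b$ and $\xi \sim \epsilon^{-\nu}$ with $\epsilon' = \lambda_K\epsilon$, it is the reciprocal that estimates $\nu$.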
Problem 9.19. Conﬁrm the above results for K ∗ , λK , and ν .
We can extend the renormalization group analysis by considering higher order cumulants. The
second order cumulant introduces two new interactions that are not in the original Hamiltonian.
That is, the cell spins interact not only with nearest-neighbor cell spins, but also with second and third neighbor cell spins. Hence, for consistency we have to include in our original Hamiltonian second and third neighbor interactions also. Fortunately, good results can usually be found by stopping at the second cumulant. More details can be found in the references.

9.10 Vocabulary

Landau theory
mean-field critical exponents
Ginzburg criterion
scaling relations, universality
percolation, connectivity
cluster, spanning cluster
recursion relation
decimation
cell spins
real space renormalization group approach

9.11 Additional Problems

Problem 9.20. Calculate the temperature and density dependence of the compressibility of a gas assuming that it satisfies the van der Waals equation of state (8.154).

∗ Problem 9.21. We can generalize the procedure of Problem 9.3 to large lattices. Consider the
seven site cell shown in Figure 9.12 and assume that the cell is occupied if the majority of its sites
are occupied. Show that the recursion relation is
p = R(p) = 35p4 − 84p5 + 70p6 − 20p7 . (9.168) Show that (9.168) has a nontrivial ﬁxed point at p∗ = 0.5 and that the correlation length exponent
is given by
ln 7
≈ 1.243.
(9.169)
ν=
2 ln 7
3 Figure 9.12: The seven site cell considered in Problem 9.21. Problem 9.22. Apply the same approach as in Problem 9.8 to the Ising model on a triangular
lattice and consider the three-spin cluster, examples of which are shown in Figure 9.10. How many
neighbors does each boundary spin have? Show that the effective three-spin Hamiltonian can be
written as
\[ H_3 = -J(s_1 s_2 + s_1 s_3 + s_2 s_3) - 4mJ(s_1 + s_2 + s_3), \tag{9.170} \]
where $m = \langle s \rangle$. Then show by summing over the possible combinations of $s_1$, $s_2$, and $s_3$ that the
partition function for the cluster can be written as
\[ Z_3 = 2e^{3\beta J} \cosh 12\beta J m + 6e^{-\beta J} \cosh 4\beta J m. \tag{9.171} \]
Verify that the magnetization is given by
\[ m = \frac{e^{3\beta J} \sinh 12\beta J m + e^{-\beta J} \sinh 4\beta J m}{e^{3\beta J} \cosh 12\beta J m + 3e^{-\beta J} \cosh 4\beta J m}. \tag{9.172} \]
To find the critical temperature, assume that $m$ is small and expand (9.172) for small $m$. Show
that the result can be written as
\[ 1 = 4\beta J\, \frac{3e^{3\beta J} + e^{-\beta J}}{e^{3\beta J} + 3e^{-\beta J}} = 4K\, \frac{3e^{4K} + 1}{e^{4K} + 3}, \tag{9.173} \]
where $K = \beta J$. Show numerically that the value of $K$ that solves (9.173) is $K = K_c \approx 0.177$.
Compare this value of $\beta_c J$ to the exact value $\beta_c J = \frac{1}{4} \ln 3 \approx 0.275$ for the triangular lattice
and the simple mean-field value (see Table 5.2). As an example of the fact that all mean-field
approximations give the same critical exponents, show that $\beta = \frac{1}{2}$.
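The numerical step in Problem 9.22 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the bracketing interval are assumptions, and plain bisection is used only because the right-hand side of (9.173) increases monotonically with $K$.

```python
import math

def cluster_condition(K):
    """Right-hand side of (9.173) minus 1; its root is the cluster estimate of K_c."""
    e4k = math.exp(4.0 * K)
    return 4.0 * K * (3.0 * e4k + 1.0) / (e4k + 3.0) - 1.0

def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return 0.5 * (lo + hi)

# Bracket chosen by hand (an assumption): the function is negative at 0.01, positive at 1.
K_c = bisect(cluster_condition, 0.01, 1.0)
K_exact = 0.25 * math.log(3.0)   # exact triangular-lattice value
K_mf = 1.0 / 6.0                 # simple mean-field value 1/q with q = 6

print(f"three-spin cluster K_c = {K_c:.4f}")
print(f"exact K_c              = {K_exact:.4f}")
print(f"simple mean-field K_c  = {K_mf:.4f}")
```

The cluster estimate lies between the simple mean-field value and the exact value, which is the comparison the problem asks for.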
Problem 9.23. Another way to express the scaling hypothesis is to assume that for $B = 0$, $G(r)$
near $\epsilon = 0$ has the form
\[ G(r) \sim \frac{1}{r^{d-2+\eta}}\, \psi_\pm(r/\xi), \tag{9.174} \]
where $\psi_\pm$ is an unspecified scaling function. Use (9.174) and the relation (9.93) to obtain Fisher's
scaling law, $\gamma = \nu(2 - \eta)$.
∗ Problem 9.24. Develop an improved mean-field theory for the Ising model on a square lattice
based on a four-spin cluster.

∗ Problem 9.25. We can apply mean-field ideas to percolation. It is easier to do so in the context
of bond percolation. In bond percolation we place bonds between lattice sites with probability $p$
and we call any set of sites that are connected to one another by bonds a cluster. The critical
bond probability $p_c$ occurs when there is a spanning cluster.
Let $P_\infty$ be the probability that a randomly chosen bond forms part of the infinite cluster. A
given bond will form part of such a cluster only if it has at least one neighboring bond. That is,
a bond will not form part of an infinite cluster if its neighboring bonds are themselves not part
of the cluster. This requirement gives us a relation connecting the probabilities that neighboring
bonds are part of the infinite cluster. If we assume that these probabilities are the same for all
bonds (a mean-field assumption), we require that
\[ 1 - P_{\infty,i} = \prod_{j=1}^{q} (1 - pP_{\infty,j}), \tag{9.175} \]
where $P_{\infty,i}$ is the probability that bond $i$ belongs to the infinite cluster and the product is over the $q$ neighboring bonds of bond $i$. However, because we have
assumed that all bonds see the same local environment, we have that $P_{\infty,i} = P_\infty$ and hence
\[ 1 - P_\infty = (1 - pP_\infty)^q. \tag{9.176} \]
Work out the graphical solution of (9.176) and show that $p_c = 1/q$. If we set $P_\infty \sim (p - p_c)^\beta$, what
is the mean-field value of $\beta$ for percolation?
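As a numerical companion to the graphical solution of (9.176), the sketch below finds the nontrivial root by bisection. The function name `p_infinity`, the choice $q = 4$, and the small lower bracket are illustrative assumptions, not part of the text.

```python
def p_infinity(p, q, tol=1e-12):
    """Nontrivial solution of 1 - P = (1 - p P)^q on (0, 1], or 0 if none exists.

    g(P) = 1 - (1 - p P)^q - P is concave with g(0) = 0 and slope p*q - 1 at
    P = 0, so a positive root exists only when p*q > 1.
    """
    if p * q <= 1.0:
        return 0.0

    def g(P):
        return 1.0 - (1.0 - p * P) ** q - P

    # Lower bracket 1e-6 is an assumption: roots smaller than this are not resolved.
    lo, hi = 1e-6, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

q = 4                      # illustrative number of neighboring bonds
p_c = 1.0 / q
for p in (0.20, 0.25, 0.251, 0.26, 0.30, 0.50):
    P = p_infinity(p, q)
    print(f"p = {p:.3f}   P_inf = {P:.6f}")
```

Tabulating `p_infinity(p, q) / (p - p_c)` for values of `p` just above `p_c` is one way to read off how $P_\infty$ vanishes at the transition.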
∗ Problem 9.26. The method we used in Section 9.8 for the one-dimensional Ising model is an
example of a decimation method. In this problem, we apply another decimation method to the
two-dimensional Ising model on the square lattice in the absence of an external magnetic field. The
Migdal-Kadanoff approximation consists of two steps that are illustrated in Figure 9.13. First the
sites denoted by an × in Figure 9.13a are removed. To compensate for this removal, the remaining
bonds are doubled in strength so that the new Hamiltonian becomes
\[ \beta H' = -2K \sum_{\langle i,j \rangle} s_i s_j. \tag{9.177} \]
The prime indicates that the new Hamiltonian is defined on the lattice shown in Figure 9.13b.

Figure 9.13: An example of the decimation of half of the spins on a square lattice (panels a, b, and c).

The main virtue of this approximation is that it makes the renormalization easier. Then sum
over the spins on the sites indicated by open circles in Figure 9.13b to obtain the lattice shown in
Figure 9.13c. Show that the new Hamiltonian has the same form as the original Hamiltonian with
the renormalized nearest-neighbor interaction $K'$:
\[ H'' = -K' \sum_{\langle ij \rangle} \mu_i \mu_j, \tag{9.178} \]
and that the partition function differs from the original one only by a constant factor. Let $x = e^{2K}$
and $x' = e^{2K'}$ and show that the resulting recursion relation can be written as
\[ x' = \frac{1}{2}(x^2 + x^{-2}). \tag{9.179} \]
For the one-dimensional Ising model we found that the fixed points are the trivial ones at
$K^* = 0$ and $K^* = \infty$. Show that in addition to these trivial fixed points, there is a new fixed point
of (9.179) at $x = x^*$. Show that this fixed point is unstable by showing that $\lambda = dx'/dx|_{x = x^*} > 1$.
What is the corresponding value of the exponent $\nu$? (Problem adapted from Huang, page 466.)

Suggestions for Further Reading
David Chandler, Introduction to Modern Statistical Mechanics, Oxford University Press (1987).
Chapter 5 has a clear explanation of the renormalization group method.
R. J. Creswick, H. A. Farach, and C. P. Poole, Jr., Introduction to Renormalization Group
Methods in Physics, John Wiley & Sons (1992).
Kerson Huang, Statistical Mechanics, second edition, John Wiley & Sons (1987).
Leo P. Kadanoff, Statistical Physics: Statics, Dynamics, and Renormalization, World Scientific
(2000).
H. J. Maris and L. P. Kadanoff, Am. J. Phys. 46, 652–657 (1978).
Michael Plischke and Birger Bergersen, Equilibrium Statistical Physics, second edition, World
Scientific (1994). An excellent text for an introductory graduate level course.

Chapter 10

Introduction to Many-Body Perturbation Theory

© 2006 by Harvey Gould and Jan Tobochnik
2 January 2006
We introduce the language of second quantization in the context of quantum many-body systems
and treat the weakly interacting Bose gas at low temperatures.

10.1 Introduction

As we saw in Chapter 8, it is difficult to treat the interparticle interactions in a classical many-body
system of particles. As might be expected, the analysis of the analogous quantum system is
even more difficult.
Just as we developed the density expansion of a classical gas by doing perturbation theory
about the ideal gas, we will first treat an interacting many-body quantum system by starting from
the single particle approximation. We know that the wave function $\Psi(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N)$ for a system of
$N$ identical interacting particles can be expanded in terms of the wave function $\Phi(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N)$ of
the noninteracting system. The wave function $\Phi$ is given in terms of suitably symmetrized products
of the single particle eigenfunctions $\phi(\mathbf{r}_i)$. If we adopt periodic boundary conditions, $\phi_{\mathbf{k}}(\mathbf{r})$ is given
by
\[ \phi_{\mathbf{k}}(\mathbf{r}) = \frac{1}{L^{3/2}}\, e^{i\mathbf{k}\cdot\mathbf{r}}, \tag{10.1} \]
where $L$ is the linear dimension of the system. Note that $\phi_{\mathbf{k}}$ is an eigenfunction of the momentum
$\mathbf{p} = \hbar\mathbf{k}$.
If the particles are bosons, the wave function Ψ and hence Φ must be symmetric with respect to
the interchange of any two particles. If the particles are fermions, Ψ and Φ must be antisymmetric
with respect to the interchange of any two particles. The latter condition is the generalization of
the Pauli exclusion principle.
Because of the impossibility of distinguishing identical particles, it is useful to describe noninteracting quantum systems by specifying only the number of particles in each single particle
state (see Section 6.5). That is, instead of working in coordinate space, we can represent the basis
functions of the many-body wave functions by
\[ |n_1 n_2 \ldots\rangle, \tag{10.2} \]
where $n_k$ is the number of particles in the single particle state $\phi_k$. For fermions $n_k$ equals 0 or 1;
there is no restriction for bosons. For a system with a fixed number of particles $N$, the occupation
numbers $n_k$ satisfy the condition
\[ N = \sum_k n_k. \tag{10.3} \]
We also learned in Section 6.5 that it is convenient to treat quantum mechanical systems in the
grand canonical ensemble in which the number of particles in a particular single particle quantum
state may vary. For this reason we next introduce a formalism that explicitly allows us to write
the energy of the system in terms of operators that change the number of particles in a given state.

10.2 Occupation Number Representation

If we specify a state of the system in the occupation number representation, it is convenient to
introduce the operators $\hat a_k$ and $\hat a_k^\dagger$ that act on states such as in (10.2). For bosons we define $\hat a_k$
and $\hat a_k^\dagger$ by
\[ \hat a_k |\ldots n_k \ldots\rangle = \sqrt{n_k}\, |\ldots n_k - 1 \ldots\rangle, \tag{10.4a} \]
and
\[ \hat a_k^\dagger |\ldots n_k \ldots\rangle = \sqrt{n_k + 1}\, |\ldots n_k + 1 \ldots\rangle. \tag{10.4b} \]
From the definition (10.4a) we see that $\hat a_k$ reduces the number of particles in state $k$ by unity and leaves the
other occupation numbers unchanged. For this reason $\hat a_k$ is called the annihilation or destruction
operator. Similarly, from (10.4b) we see that $\hat a_k^\dagger$ increases the occupation number of state $k$ by unity
and is called the creation operator. The factor of $\sqrt{n_k}$ is included in (10.4a) to normalize the $N$
and $N - 1$ particle wave functions and to make the definitions consistent with the assertion that
$\hat a_k$ and $\hat a_k^\dagger$ are Hermitian conjugates. The factor $\sqrt{1 + n_k}$ is included for the latter reason.
From the definitions in (10.4), it is easy to show that
\[ \hat a_k \hat a_k^\dagger |n_k\rangle = (n_k + 1) |n_k\rangle \tag{10.5a} \]
\[ \hat a_k^\dagger \hat a_k |n_k\rangle = n_k |n_k\rangle. \tag{10.5b} \]
We have written $|n_k\rangle$ for $|\ldots, n_k, \ldots\rangle$. By subtracting (10.5b) from (10.5a), we have
\[ (\hat a_k \hat a_k^\dagger - \hat a_k^\dagger \hat a_k) |n_k\rangle = |n_k\rangle. \tag{10.6} \]
In general, we may write that
\[ [\hat a_k, \hat a_k^\dagger] \equiv \hat a_k \hat a_k^\dagger - \hat a_k^\dagger \hat a_k = 1, \tag{10.7} \]
and show that
\[ [\hat a_k, \hat a_{k'}^\dagger] = \delta_{kk'}, \tag{10.8} \]
and
\[ [\hat a_k, \hat a_{k'}] = [\hat a_k^\dagger, \hat a_{k'}^\dagger] = 0. \tag{10.9} \]
The commutation relations (10.8) and (10.9) define the creation and destruction operators $\hat a_k^\dagger$ and
$\hat a_k$.
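The boson definitions (10.4) can be checked with small matrices. The sketch below represents $\hat a$ and $\hat a^\dagger$ for a single state $k$ in the truncated basis $|0\rangle, \ldots, |n_{\max}\rangle$; the cutoff `n_max` and the helper names are assumptions made for illustration. Because of the cutoff, the commutator equals unity on every basis state except the highest one, which is an artifact of the truncation rather than a property of the operators.

```python
import math

def ladder_operators(n_max):
    """Matrices of a and a-dagger in the basis |0>, ..., |n_max> of one state k."""
    dim = n_max + 1
    a = [[0.0] * dim for _ in range(dim)]
    for n in range(1, dim):
        a[n - 1][n] = math.sqrt(n)          # a|n> = sqrt(n)|n-1>, Eq. (10.4a)
    # a-dagger is the transpose of a (the matrix is real), so that
    # a_dag|n> = sqrt(n+1)|n+1>, Eq. (10.4b).
    a_dag = [[a[j][i] for j in range(dim)] for i in range(dim)]
    return a, a_dag

def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    dim = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(dim)) for j in range(dim)]
            for i in range(dim)]

n_max = 5
a, a_dag = ladder_operators(n_max)
number = matmul(a_dag, a)                   # a_dag a, compare (10.5b)
a_a_dag = matmul(a, a_dag)                  # a a_dag, compare (10.5a)
commutator = [[a_a_dag[i][j] - number[i][j] for j in range(n_max + 1)]
              for i in range(n_max + 1)]

print("diagonal of a_dag a :", [round(number[n][n], 12) for n in range(n_max + 1)])
print("diagonal of [a, a_dag]:", [round(commutator[n][n], 12) for n in range(n_max + 1)])
```

The last diagonal entry of the commutator is `-n_max` instead of 1; increasing `n_max` pushes this artifact to higher occupation numbers.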
The appropriate definition of $\hat a_k$ and $\hat a_k^\dagger$ is a little more tedious for fermions, and we shall
simply define them by the anticommutation relations:
\[ \{\hat a_k, \hat a_k^\dagger\} \equiv \hat a_k \hat a_k^\dagger + \hat a_k^\dagger \hat a_k = 1, \tag{10.10} \]
and
\[ \{\hat a_k, \hat a_k\} = \{\hat a_k^\dagger, \hat a_k^\dagger\} = 0. \tag{10.11} \]
Equation (10.11) is equivalent to the statement that it is not possible to create two particles in the
same single particle state.

10.3 Operators in the Second Quantization Formalism
It is easy to show that for both Bose and Fermi statistics, the number operator $\hat N_k$ is given by
\[ \hat N_k = \hat a_k^\dagger \hat a_k. \tag{10.12} \]
The eigenvalues of $\hat N_k$ acting on $|n_k\rangle$ are zero or unity for fermions and either zero or any positive
integer for bosons.
We now wish to write other operators in terms of $\hat a_k$ and $\hat a_k^\dagger$. To do so, we note that $\hat a_k^\dagger$ and
$\hat a_k$ are the creation and destruction operators for a free particle with momentum $\mathbf{p} = \hbar\mathbf{k}$ described
by the wave function (10.1). The kinetic energy is an example of a one-particle operator
\[ \hat T = -\frac{\hbar^2}{2m} \sum_{i=1}^{N} \hat\nabla_i^2. \tag{10.13} \]
The form (10.13) in which the momentum $\mathbf{p}$ is expressed as an operator is an example of what is
called first quantization. Note that the sum in (10.13) is over the indistinguishable particles in the
system. A more convenient form for $\hat T$ in the second quantization formalism is given by
\[ \hat T = \sum_{\mathbf{p}} \epsilon_p\, \hat a_{\mathbf{p}}^\dagger \hat a_{\mathbf{p}}, \tag{10.14} \]
where $\epsilon_p = p^2/2m$ and $\mathbf{p} = \hbar\mathbf{k}$. Note that the kinetic energy is diagonal in $\mathbf{p}$. The form of (10.14)
is suggestive and can be interpreted as the sum of the kinetic energy in state $\mathbf{p}$ times the number
of particles in this state.
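The form of the two-particle potential energy operator $\hat U$ can be obtained from straightforward
but tedious arguments. The result can be written as
\[ \hat U = \frac{1}{2} \sum_{\mathbf{k}_1', \mathbf{k}_2', \mathbf{k}_1, \mathbf{k}_2} \langle \mathbf{k}_1' \mathbf{k}_2' | u | \mathbf{k}_1 \mathbf{k}_2 \rangle\, \hat a_{\mathbf{k}_1'}^\dagger \hat a_{\mathbf{k}_2'}^\dagger \hat a_{\mathbf{k}_1} \hat a_{\mathbf{k}_2}. \tag{10.15} \]
The summation in (10.15) is over all values of the momenta (wave vectors) of a pair of particles
such that the total momentum is conserved in the interaction:
\[ \mathbf{k}_1 + \mathbf{k}_2 = \mathbf{k}_1' + \mathbf{k}_2'. \tag{10.16} \]
The matrix element $\langle \mathbf{k}_1' \mathbf{k}_2' | u | \mathbf{k}_1 \mathbf{k}_2 \rangle$ is given by
\[ \langle \mathbf{k}_1' \mathbf{k}_2' | u | \mathbf{k}_1 \mathbf{k}_2 \rangle = \frac{1}{V^2} \int\!\!\int e^{i(\mathbf{k}_1 - \mathbf{k}_1')\cdot\mathbf{r}_1 + i(\mathbf{k}_2 - \mathbf{k}_2')\cdot\mathbf{r}_2}\, u(|\mathbf{r}_2 - \mathbf{r}_1|)\, d\mathbf{r}_1\, d\mathbf{r}_2. \tag{10.17} \]
We next make the change of variables, $\mathbf{R} = (\mathbf{r}_1 + \mathbf{r}_2)/2$ and $\mathbf{r} = \mathbf{r}_1 - \mathbf{r}_2$, and write
\[ \langle \mathbf{k}_1' \mathbf{k}_2' | u | \mathbf{k}_1 \mathbf{k}_2 \rangle = \frac{1}{V^2} \int\!\!\int e^{i(\mathbf{k}_1 - \mathbf{k}_1' + \mathbf{k}_2 - \mathbf{k}_2')\cdot\mathbf{R}}\, e^{i(\mathbf{k}_1 - \mathbf{k}_1' - \mathbf{k}_2 + \mathbf{k}_2')\cdot\mathbf{r}/2}\, u(r)\, d\mathbf{R}\, d\mathbf{r}. \tag{10.18a} \]
Because of the homogeneity of space, the integral over $\mathbf{R}$ can be done yielding a Dirac delta function
and the condition (10.16). We thus obtain
\[ \langle \mathbf{k}_1' \mathbf{k}_2' | u | \mathbf{k}_1 \mathbf{k}_2 \rangle = u(k) = \int e^{-i\mathbf{k}\cdot\mathbf{r}}\, u(r)\, d\mathbf{r}, \tag{10.18b} \]
where $\mathbf{k} = \mathbf{k}_2 - \mathbf{k}_2' = -(\mathbf{k}_1 - \mathbf{k}_1')$ is the momentum (wave vector) transferred in the interaction.
With these considerations we can write the Hamiltonian in the form
\[ \hat H = \sum_{\mathbf{p}} \frac{p^2}{2m}\, \hat a_{\mathbf{p}}^\dagger \hat a_{\mathbf{p}} + \frac{1}{2V} \sum_{\mathbf{k}, \mathbf{p}_1, \mathbf{p}_2} u(k)\, \hat a_{\mathbf{p}_1 + \mathbf{k}}^\dagger \hat a_{\mathbf{p}_2 - \mathbf{k}}^\dagger \hat a_{\mathbf{p}_2} \hat a_{\mathbf{p}_1}. \tag{10.19} \]
We have written $\mathbf{p}_1$ and $\mathbf{p}_2$ instead of $\mathbf{k}_1$ and $\mathbf{k}_2$ in (10.19) and chosen units such that $\hbar = 1$.
The order of the operators in (10.15) and (10.19) is important for fermions because the fermion
operators anticommute. The order is unimportant for bosons. The form of the interaction term in
(10.19) can be represented as in Figure 10.1.

10.4 Weakly Interacting Bose Gas

A calculation of the properties of the dilute Bose gas was once considered to have no direct physical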
relevance because the gases that exist in nature condense at low temperatures. However, such a
calculation was interesting because the properties of the weakly interacting Bose gas are similar
to those of liquid $^4$He. In particular, a dilute Bose gas can be a superfluid even though an ideal Bose gas
cannot. Moreover, in recent years, the dilute Bose gas at low temperatures has been created in
the laboratory (see references).
The condition for a gas to be dilute is that the range of interaction $\sigma$ should be small in
comparison to the mean distance between the particles, $\rho^{-1/3}$, that is, $\rho\sigma^3 \ll 1$. Because the gas is
dilute, we need to consider only binary interactions between particles using quantum perturbation
theory. The difficulty is that because of the rapid increase in the interparticle potential $u(r)$ at
small $r$, ordinary perturbation theory (the Born approximation) cannot be directly applied.

Figure 10.1: Representation of the matrix element in (10.15). (Figure not done.)
We can circumvent the lack of applicability of the Born approximation by the following argument. The scattering cross section is given by $|f|^2$, where $f$ is the scattering amplitude. In the
Born approximation, $f$ is given by
\[ f(\mathbf{k}) = -\frac{m}{4\pi\hbar^2} \int u(r)\, e^{-i\mathbf{k}\cdot\mathbf{r}}\, d\mathbf{r}, \tag{10.20} \]
where $\mathbf{k}$ is the momentum transferred in the interaction. In the limit of low temperatures, the
particle momenta are small, and we can set $k = 0$ in (10.20). If we set $f(k = 0) = -a$, where $a$ is
the scattering length, we obtain
\[ a = mU_0/4\pi\hbar^2, \tag{10.21} \]
where
\[ U_0 = \int u(r)\, d\mathbf{r}. \tag{10.22} \]
In the following, we will set $u(k = 0) = U_0 = 4\pi\hbar^2 a/m$, so that we will be able to mimic the result
of doing a true perturbation theory calculation.1
If we assume that $u(k) = U_0$, a constant, for all $k$, we can write the Hamiltonian as
\[ \hat H = \sum_{\mathbf{p}} \frac{p^2}{2m}\, \hat a_{\mathbf{p}}^\dagger \hat a_{\mathbf{p}} + \frac{U_0}{2V} \sum_{\mathbf{k}, \mathbf{p}_1, \mathbf{p}_2} \hat a_{\mathbf{p}_1 + \mathbf{k}}^\dagger \hat a_{\mathbf{p}_2 - \mathbf{k}}^\dagger \hat a_{\mathbf{p}_2} \hat a_{\mathbf{p}_1}. \tag{10.23} \]
The form of (10.23) is the same for Bose or Fermi statistics. Only the commutation relations for
the creation and destruction operators are different.
We now follow the approximation method developed by Bogolyubov (1947). In the limit
$U_0 \to 0$, $\hat H$ reduces to the Hamiltonian of the ideal Bose gas. We know that the latter has
a condensate, that is, there is macroscopic occupation of the zero momentum state, so that at
$T = 0$, $N_0 = N$, and $N_p = 0$ for $p \neq 0$. For the weakly interacting Bose gas, we expect that the
low lying states do not have zero occupation, but that $N_p$ for $p > 0$ is small so that $N_0 \approx N$. We
proceed by assuming that $N - N_0$ is small and extract the $k = 0$ terms in $\hat H$. For example,
\[ \hat N = \sum_{\mathbf{p}} \hat a_{\mathbf{p}}^\dagger \hat a_{\mathbf{p}} = \hat a_0^\dagger \hat a_0 + \sum_{\mathbf{p} \neq 0} \hat a_{\mathbf{p}}^\dagger \hat a_{\mathbf{p}}. \tag{10.24} \]
Because $\hat a_0^\dagger \hat a_0 = N_0 \approx N$ is much larger than unity, it follows that $\hat a_0 \hat a_0^\dagger - \hat a_0^\dagger \hat a_0 = 1$ is small in
comparison to $\hat a_0$ and $\hat a_0^\dagger$ and hence $\hat a_0$ and $\hat a_0^\dagger$ may be regarded as numbers (equal to $\sqrt{N_0}$), and
we can ignore the fact that they do not commute.

1 In the language of quantum mechanics, we need to replace the bare interaction $u$ by the $t$ matrix. This
replacement is the quantum mechanical generalization of replacing $-\beta u$ by the Mayer $f$ function. Not surprisingly,
this replacement can be represented by an infinite sum of ladder-type diagrams. Note that if we interpret the Mayer
$f$ function as the effective interaction between particles, the first cumulant in a high temperature expansion would
yield the same result as the first term in the classical virial expansion.
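We now expand the potential energy in (10.23) in powers of the small quantities $\hat a_{\mathbf{p}}$, $\hat a_{\mathbf{p}}^\dagger$ for
$p \neq 0$. The zeroth-order term is
\[ \frac{U_0}{2V}\, \hat a_0^\dagger \hat a_0^\dagger \hat a_0 \hat a_0 = \frac{U_0}{2V}\, a_0^4 = \frac{U_0}{2V}\, N_0^2. \tag{10.25} \]
There are no first-order terms proportional to $a_0^3$, because they cannot be made to satisfy conservation of momentum. The second-order contributions are proportional to $(U_0/2V)N_0$ and are
given by

(a)  $\hat a_{\mathbf{k}}^\dagger \hat a_{-\mathbf{k}}^\dagger$              $\mathbf{p}_1 = \mathbf{p}_2 = 0$, $\mathbf{k} \neq 0$
(b)  $\hat a_{\mathbf{p}_1}^\dagger \hat a_{\mathbf{p}_1}$      $\mathbf{k} = -\mathbf{p}_1$, $\mathbf{p}_2 = 0$
(c)  $\hat a_{\mathbf{p}_2}^\dagger \hat a_{\mathbf{p}_2}$      $\mathbf{k} = \mathbf{p}_2$, $\mathbf{p}_1 = 0$
(d)  $\hat a_{\mathbf{p}_1} \hat a_{-\mathbf{p}_1}$             $\mathbf{p}_1 = -\mathbf{p}_2 = -\mathbf{k}$
(e)  $\hat a_{\mathbf{p}_2}^\dagger \hat a_{\mathbf{p}_2}$      $\mathbf{k} = \mathbf{p}_1 = 0$, $\mathbf{p}_2 \neq 0$
(f)  $\hat a_{\mathbf{p}_1}^\dagger \hat a_{\mathbf{p}_1}$      $\mathbf{k} = \mathbf{p}_2 = 0$, $\mathbf{p}_1 \neq 0$

We will ignore all higher order terms, which is equivalent to ignoring the interaction between
excited particles. Hence, if we extend the above approximations to $T$ above $T_c$, our approximate
Hamiltonian would reduce to the Hamiltonian for the ideal gas.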
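The approximate Hamiltonian can now be written as
\[ \hat H = {\sum_{\mathbf{p}}}' \frac{p^2}{2m}\, \hat a_{\mathbf{p}}^\dagger \hat a_{\mathbf{p}} + \frac{U_0}{2V} N_0^2 + \frac{U_0}{2V} N_0 {\sum_{\mathbf{k}}}' \left[ \hat a_{\mathbf{k}}^\dagger \hat a_{-\mathbf{k}}^\dagger + \hat a_{\mathbf{k}} \hat a_{-\mathbf{k}} + 4 \hat a_{\mathbf{k}}^\dagger \hat a_{\mathbf{k}} \right]. \tag{10.26} \]
The notation $\sum'$ denotes that the sum excludes terms with $\mathbf{p} = 0$ and $\mathbf{k} = 0$.
In general, we have
\[ N = a_0^2 + {\sum_{\mathbf{p}}}' \hat a_{\mathbf{p}}^\dagger \hat a_{\mathbf{p}} = N_0 + {\sum_{\mathbf{p}}}' \hat a_{\mathbf{p}}^\dagger \hat a_{\mathbf{p}}. \tag{10.27} \]
For consistency, we replace $N_0^2$ in (10.26) by $N_0^2 = N^2 - 2N \sum_{\mathbf{p}}' \hat a_{\mathbf{p}}^\dagger \hat a_{\mathbf{p}}$. Similarly $N_0$ in (10.26) may
be replaced by $N$. The result of these replacements is that
\[ \hat H \approx \hat H_B = \frac{N^2}{2V} U_0 + {\sum_{\mathbf{p}}}' \epsilon_p\, \hat a_{\mathbf{p}}^\dagger \hat a_{\mathbf{p}} + \frac{N U_0}{2V} {\sum_{\mathbf{k}}}' \left[ \hat a_{\mathbf{k}}^\dagger \hat a_{-\mathbf{k}}^\dagger + \hat a_{\mathbf{k}} \hat a_{-\mathbf{k}} + 2 \hat a_{\mathbf{k}}^\dagger \hat a_{\mathbf{k}} \right]. \tag{10.28} \]
Note that $\hat H_B$ only allows excitation of pairs of momentum $\mathbf{k}$ and $-\mathbf{k}$ from the condensate and
reentry of such pairs into the condensate.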
The approximate Hamiltonian $\hat H_B$ is bilinear in $\hat a$ and $\hat a^\dagger$. This form is similar to that of a
harmonic oscillator. This similarity suggests that we can diagonalize $\hat H_B$ by making an appropriate
linear transformation of the operators $\hat a$ and $\hat a^\dagger$. If $\hat H_B$ is put into diagonal form, then $\hat H_B$ would
have the same form as an ideal gas, and we could easily calculate the energy eigenvalues.
We define new operators $\hat b^\dagger$ and $\hat b$ by
\[ \hat a_{\mathbf{k}} = u_k \hat b_{\mathbf{k}} + v_k \hat b_{-\mathbf{k}}^\dagger \tag{10.29a} \]
\[ \hat a_{\mathbf{k}}^\dagger = u_k \hat b_{\mathbf{k}}^\dagger + v_k \hat b_{-\mathbf{k}}, \tag{10.29b} \]
and require them to satisfy the Bose commutation relations
\[ \hat b_{\mathbf{k}} \hat b_{\mathbf{k}'}^\dagger - \hat b_{\mathbf{k}'}^\dagger \hat b_{\mathbf{k}} = \delta_{\mathbf{k}\mathbf{k}'} \quad \text{and} \quad \hat b_{\mathbf{k}} \hat b_{\mathbf{k}'} = \hat b_{\mathbf{k}'} \hat b_{\mathbf{k}}. \tag{10.30} \]
As shown in Problem 10.1, $\hat b^\dagger$ and $\hat b$ satisfy the Bose commutation relations only if the relation
(10.31) between $u_k$ and $v_k$ is satisfied:
\[ u_k^2 - v_k^2 = 1. \tag{10.31} \]
Problem 10.1. (a) Use (10.29) to express $\hat b^\dagger$ and $\hat b$ in terms of $\hat a^\dagger$ and $\hat a$. (b) Show that the
commutation relations (10.30) are satisfied only if (10.31) is satisfied.
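One way to see where (10.31) comes from is sketched below. The sketch assumes, as the text appears to, that $u_k$ and $v_k$ are real and depend only on the magnitude of $\mathbf{k}$; it runs the argument in the direction of requiring that both sets of operators obey Bose commutation relations, and is meant as a guide rather than a full solution of Problem 10.1.

```latex
% Insert (10.29a) and (10.29b) into the commutator of the original operators:
\begin{align*}
[\hat a_{\mathbf{k}}, \hat a_{\mathbf{k}}^\dagger]
  &= [\,u_k \hat b_{\mathbf{k}} + v_k \hat b_{-\mathbf{k}}^\dagger,\;
        u_k \hat b_{\mathbf{k}}^\dagger + v_k \hat b_{-\mathbf{k}}\,] \\
  &= u_k^2\,[\hat b_{\mathbf{k}}, \hat b_{\mathbf{k}}^\dagger]
   + u_k v_k\,[\hat b_{\mathbf{k}}, \hat b_{-\mathbf{k}}]
   + v_k u_k\,[\hat b_{-\mathbf{k}}^\dagger, \hat b_{\mathbf{k}}^\dagger]
   + v_k^2\,[\hat b_{-\mathbf{k}}^\dagger, \hat b_{-\mathbf{k}}] \\
  &= u_k^2 - v_k^2 .
\end{align*}
% The middle two commutators vanish and the last equals -1 by (10.30).
% Because [a_k, a_k^dagger] = 1 from (10.7), consistency requires u_k^2 - v_k^2 = 1.
% With this condition the transformation inverts to
\begin{equation*}
\hat b_{\mathbf{k}} = u_k \hat a_{\mathbf{k}} - v_k \hat a_{-\mathbf{k}}^\dagger ,
\qquad
\hat b_{\mathbf{k}}^\dagger = u_k \hat a_{\mathbf{k}}^\dagger - v_k \hat a_{-\mathbf{k}} .
\end{equation*}
```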
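If we substitute the above expressions for $\hat a^\dagger$ and $\hat a$ into (10.28), we obtain
\[ \hat H_B = E_0 + \hat H_D + \hat H_I, \tag{10.32a} \]
where
\[ E_0 = \frac{N^2 U_0}{2V} + {\sum_{\mathbf{k}}}' \left[ \left( \epsilon_k + \frac{N U_0}{V} \right) v_k^2 + \frac{N U_0}{V}\, u_k v_k \right] \tag{10.32b} \]
\[ \hat H_D = {\sum_{\mathbf{k}}}' \left[ \left( \epsilon_k + \frac{N U_0}{V} \right) (u_k^2 + v_k^2) + 2\, \frac{N U_0}{V}\, u_k v_k \right] \hat b_{\mathbf{k}}^\dagger \hat b_{\mathbf{k}} \tag{10.32c} \]
\[ \hat H_I = {\sum_{\mathbf{k}}}' \left[ \left( \epsilon_k + \frac{N U_0}{V} \right) u_k v_k + \frac{1}{2}\, \frac{N U_0}{V}\, (u_k^2 + v_k^2) \right] \left( \hat b_{\mathbf{k}}^\dagger \hat b_{-\mathbf{k}}^\dagger + \hat b_{\mathbf{k}} \hat b_{-\mathbf{k}} \right). \tag{10.32d} \]
From the form of (10.32), we see that $\hat H_B$ would be diagonal if $\hat H_I = 0$. This condition is satisfied
if
\[ 2 \left( \epsilon_k + \frac{N U_0}{V} \right) u_k v_k + \frac{N U_0}{V}\, (u_k^2 + v_k^2) = 0, \tag{10.33} \]
and the relation (10.31) is satisfied. Note that we have two equations for the two unknowns $u_k$ and
$v_k$. We can satisfy the relation (10.31) automatically by letting
\[ u_k = \cosh\theta_k \tag{10.34a} \]
\[ v_k = \sinh\theta_k. \tag{10.34b} \]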
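If we use the identities $2u_k v_k = 2\cosh\theta_k \sinh\theta_k = \sinh 2\theta_k$ and $u_k^2 + v_k^2 = \cosh 2\theta_k$, we can express
(10.33) as
\[ \left( \epsilon_k + \frac{N U_0}{V} \right) \sinh 2\theta_k + \frac{N U_0}{V} \cosh 2\theta_k = 0, \tag{10.35} \]
or
\[ \tanh 2\theta_k = -\frac{\rho U_0}{\epsilon_k + \rho U_0}. \tag{10.36} \]
Note that (10.36) has a solution for all $k$ only if $U_0 > 0$.
The solution (10.36) is equivalent to
\[ u_k^2 + v_k^2 = \frac{\epsilon_k + \rho U_0}{E(k)} \tag{10.37} \]
and
\[ 2u_k v_k = -\frac{\rho U_0}{E(k)}, \tag{10.38} \]
where
\[ E(k) = \sqrt{\epsilon_k (\epsilon_k + 2\rho U_0)}. \tag{10.39} \]
Hence
\[ u_k^2 = \frac{1}{2} \left[ \frac{\epsilon_k + \rho U_0}{E(k)} + 1 \right], \tag{10.40a} \]
\[ v_k^2 = \frac{1}{2} \left[ \frac{\epsilon_k + \rho U_0}{E(k)} - 1 \right]. \tag{10.40b} \]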
If we substitute $u_k$ and $v_k$ into $\hat H_B$, we obtain
\[ \hat H_B = \frac{1}{2} N \rho U_0 + \frac{1}{2} {\sum_{\mathbf{k}}}' \left[ E(k) - \epsilon_k - \rho U_0 \right] + {\sum_{\mathbf{k}}}' E(k)\, \hat b_{\mathbf{k}}^\dagger \hat b_{\mathbf{k}}. \tag{10.41} \]
From the form of (10.41) we see that $\hat b_{\mathbf{k}}^\dagger$ and $\hat b_{\mathbf{k}}$ are the creation and destruction operators for
quasiparticles or elementary excitations with energy $E(k)$ obeying Bose statistics. If we replace
$U_0$ by $4\pi\hbar^2 a/m$, we see that the quasiparticle energy is given by
\[ E(p) = \sqrt{c^2 p^2 + (p^2/2m)^2}, \tag{10.42} \]
where
\[ c = \sqrt{\frac{4\pi\hbar^2 \rho a}{m^2}}. \tag{10.43} \]
Note that for small $p$, $E(p)$ is proportional to $p$ and hence the excitations are phonons with velocity
$c$.
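The two limits of the quasiparticle spectrum can be checked numerically. The sketch below evaluates (10.39), (10.40), and (10.42) in illustrative units where $\hbar = 1$, $m = 1$, and $\rho U_0 = 1$; these parameter values and the function name `bogoliubov` are assumptions chosen only to make the numbers easy to read.

```python
import math

def bogoliubov(p, m, rho_U0):
    """Return (eps, E, u2, v2) from (10.39) and (10.40) for momentum p (hbar = 1)."""
    eps = p * p / (2.0 * m)                       # free-particle energy
    E = math.sqrt(eps * (eps + 2.0 * rho_U0))     # quasiparticle energy, Eq. (10.39)
    u2 = 0.5 * ((eps + rho_U0) / E + 1.0)         # Eq. (10.40a)
    v2 = 0.5 * ((eps + rho_U0) / E - 1.0)         # Eq. (10.40b)
    return eps, E, u2, v2

m, rho_U0 = 1.0, 1.0             # illustrative values, not taken from the text
c = math.sqrt(rho_U0 / m)        # equivalent to (10.43) because rho*U0 = 4*pi*rho*a/m

for p in (0.01, 0.1, 1.0, 10.0, 100.0):
    eps, E, u2, v2 = bogoliubov(p, m, rho_U0)
    print(f"p = {p:7.2f}   E/(c p) = {E / (c * p):9.5f}   "
          f"E/eps = {E / eps:9.5f}   u^2 - v^2 = {u2 - v2:.12f}")
```

For small `p` the ratio `E/(c p)` approaches 1 (phonons), for large `p` the ratio `E/eps` approaches 1 (free particles), and `u^2 - v^2` stays equal to 1 as required by (10.31).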
The ground state energy $E_0$ is given by
\[ E_0 = \frac{1}{2} N \rho U_0 + \frac{1}{2} {\sum_{\mathbf{k}}}' \left[ E(k) - \epsilon_k - \rho U_0 \right]. \tag{10.44} \]
We can replace the summation over discrete values of $\mathbf{k}$ by an integration over $\mathbf{p}$ and multiply by
$V/(2\pi)^3$. We obtain (see Huang)
\[ \frac{E_0}{N} = \frac{2\pi a \rho}{m} \left[ 1 + \frac{128}{15} \sqrt{\frac{a^3 \rho}{\pi}} \right]. \tag{10.45} \]
Problem 10.2. Show that $c$ is equal to the sound speed using the relation (see Reif)
\[ c = (\rho \kappa_S)^{-1/2}, \tag{10.46} \]
where $\kappa_S$ is the adiabatic compressibility:
\[ \kappa_S = -\frac{1}{V} \left( \frac{\partial V}{\partial P} \right)_S. \tag{10.47} \]
The above relations can be used to express $c$ as
\[ c^2 = \left( \frac{\partial P}{\partial \rho} \right)_S. \tag{10.48} \]
In (10.46) and (10.48) $\rho$ denotes the mass density, that is, $m$ times the number density. At $T = 0$, the pressure is given by
\[ P = -\frac{\partial E_0}{\partial V}. \tag{10.49} \]
Use the above relations and (10.45) to show that the calculated speed of sound is consistent with
the phonon speed (10.43) to lowest order in $(\rho a^3)^{1/2}$.

Problem 10.3. The number of quasiparticles of momentum $\mathbf{p}$ for $T > 0$ is given by
\[ n_p = \frac{1}{e^{\beta E(p)} - 1}. \tag{10.50} \]
Why is the chemical potential equal to zero?
Problem 10.4. The momentum distribution of the actual particles in the gas is given by
\[ N_p = \langle \hat a_{\mathbf{p}}^\dagger \hat a_{\mathbf{p}} \rangle. \tag{10.51} \]
Use the relation between $\hat a_{\mathbf{p}}^\dagger$ and $\hat a_{\mathbf{p}}$ and $\hat b_{\mathbf{p}}^\dagger$ and $\hat b_{\mathbf{p}}$, and the fact that the products $\hat b_{\mathbf{p}}^\dagger \hat b_{-\mathbf{p}}^\dagger$ and
$\hat b_{-\mathbf{p}} \hat b_{\mathbf{p}}$ have no diagonal matrix elements to show that
\[ N_p = \frac{n_p + f_p^2 (n_p + 1)}{1 - f_p^2}, \tag{10.52} \]
where
\[ f_p = \frac{m}{4\pi a \rho \hbar^2} \left[ E(p) - \frac{p^2}{2m} - mc^2 \right]. \tag{10.53} \]
This result is valid only for $p \neq 0$. At $T = 0$, $n_p = 0$ for $p \neq 0$. Show that
\[ N_p = \frac{m^2 c^4}{2E(p) \left[ E(p) + p^2/2m + mc^2 \right]}. \tag{10.54} \]
The number of particles with zero momentum is
\[ N_0 = N - {\sum_{\mathbf{p}}}' N_p = N - V \int \frac{d^3 p}{(2\pi)^3}\, N_p. \tag{10.55} \]
Note that the interaction between the particles causes the appearance of particles with nonzero
momentum even at $T = 0$. Use (10.54) to show that
\[ \frac{N_0}{N} = 1 - \frac{8}{3} \left( \frac{\rho a^3}{\pi} \right)^{1/2}. \tag{10.56} \]

Appendix A

Useful Formulae
© 2006 by Harvey Gould and Jan Tobochnik
12 May 2006

A.1 Physical constants

constant                  symbol   magnitude
Avogadro's number         N_A      6.022 × 10^23
Boltzmann's constant      k        1.381 × 10^−23 J/K
universal gas constant    R        8.314 J/(mol K)
Planck's constant         h        6.626 × 10^−34 J s
                          ħ        1.055 × 10^−34 J s
speed of light            c        2.998 × 10^8 m/s
electron charge           e        1.602 × 10^−19 C
electron mass             m_e      9.109 × 10^−31 kg
proton mass               m_p      1.672 × 10^−27 kg

A.2 SI derived units

newton   1 N ≡ 1 kg m/s^2
joule    1 J ≡ 1 N m
watt     1 W ≡ 1 J/s
pascal   1 Pa ≡ 1 N/m^2

A.3 Conversion factors

1 atm = 1.013 bar
      = 1.013 × 10^5 Pa
      = 760 mm Hg
1 cal = 4.186 J
1 eV  = 1.602 × 10^−19 J

A.4 Mathematical Formulae
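\[ \cosh x = \frac{1}{2} [e^x + e^{-x}]. \tag{A.1} \]
\[ \sinh x = \frac{1}{2} [e^x - e^{-x}]. \tag{A.2} \]
\[ \tanh x = \frac{\sinh x}{\cosh x} = \frac{e^x - e^{-x}}{e^x + e^{-x}}. \tag{A.3} \]

A.5 Approximations

\[ e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} \approx 1 + x + \frac{x^2}{2!} + \cdots \tag{A.4} \]
\[ \sin x = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!} \approx x - \frac{x^3}{3!} + \frac{x^5}{5!} + \cdots \tag{A.5} \]
\[ \cos x = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!} \approx 1 - \frac{x^2}{2!} + \frac{x^4}{4!} + \cdots \tag{A.6} \]
\[ \frac{1}{1 - x} = \sum_{n=1}^{\infty} x^{n-1} \approx 1 + x + x^2 + \cdots \tag{A.7} \]
\[ \ln(1 + x) = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{x^n}{n} \approx x - \frac{x^2}{2} + \frac{1}{3} x^3 + \cdots \tag{A.8} \]
\[ \tanh x = \sum_{n=1}^{\infty} \frac{2^{2n} (2^{2n} - 1)}{(2n)!} B_{2n}\, x^{2n-1} \approx x - \frac{x^3}{3} + \frac{2x^5}{15} + \cdots, \tag{A.9} \]
where $B_n$ are the Bernoulli numbers (see Sec. A.9).

A.6 Euler-Maclaurin formula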
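\[ \sum_{i=0}^{\infty} f(i) = \int_0^{\infty} f(x)\, dx + \frac{1}{2} f(0) - \frac{1}{12} f'(0) + \frac{1}{720} f'''(0) + \cdots \tag{A.10} \]

A.7 Gaussian Integrals

\[ I_n = \int_{-\infty}^{\infty} dx\, x^n e^{-ax^2}. \qquad (n \geq 0,\ a > 0) \tag{A.11} \]
\[ I_0 = \left( \frac{\pi}{a} \right)^{1/2} \tag{A.12} \]
\[ I_1 = 0 \tag{A.13} \]
\[ I_2 = \frac{1}{2} \left( \frac{\pi}{a^3} \right)^{1/2} \tag{A.14} \]
Derivation:
\[ I_0 = \int_{-\infty}^{+\infty} e^{-ax^2}\, dx. \tag{A.15} \]
We note that $x$ in the integrand in (A.15) is a dummy variable. Hence, we may write $I_0$ equally
well as
\[ I_0 = \int_{-\infty}^{+\infty} e^{-ay^2}\, dy. \]
To convert the integrand to a form we can integrate, we multiply $I_0$ by itself and write
\[ I_0^2 = \int_{-\infty}^{+\infty} e^{-ax^2}\, dx \int_{-\infty}^{+\infty} e^{-ay^2}\, dy = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} e^{-a(x^2 + y^2)}\, dx\, dy. \tag{A.16} \]
The double integral in (A.16) extends over the entire $xy$ plane. We introduce the polar coordinates
$r$ and $\theta$, where $r^2 = x^2 + y^2$. The element of area in polar coordinates is $r\, dr\, d\theta$. Hence, $I_0^2$ can be
rewritten in the form
\[ I_0^2 = \int_0^{2\pi} \int_0^{\infty} e^{-ar^2}\, r\, dr\, d\theta = 2\pi \int_0^{\infty} e^{-ar^2}\, r\, dr. \]
We let $z = ar^2$, $dz = 2ar\, dr$, and write
\[ I_0^2 = \frac{\pi}{a} \int_0^{\infty} e^{-z}\, dz = \frac{\pi}{a} \left[ -e^{-z} \right]_0^{\infty} = \frac{\pi}{a}. \]
Hence, we obtain the desired result
\[ I_0 = \int_{-\infty}^{+\infty} e^{-ax^2}\, dx = \left( \frac{\pi}{a} \right)^{1/2}. \tag{A.17} \]
Clearly the values of $I_n$ for $n$ odd are zero by symmetry. For odd values of $n$, let us redefine
$I_n$ as
\[ I_n = \int_0^{\infty} dx\, x^n e^{-ax^2}. \qquad (n\ \text{odd}) \tag{A.18} \]
It is straightforward to show that
\[ I_1 = \frac{1}{2a}. \tag{A.19} \]
All integrals $I_n$ for $n > 1$ can be reduced to the integrals $I_0$ or $I_1$ using the recursion relation
\[ I_n = -\frac{\partial I_{n-2}}{\partial a}. \tag{A.20} \]
For example,
\[ I_2 = \frac{1}{2} \left( \frac{\pi}{a^3} \right)^{1/2}. \tag{A.21} \]

A.8 Stirling's formula

Because $N!$ is defined as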
\[ N! = 1 \times 2 \times 3 \times \cdots \times N, \tag{A.22} \]
we have
\[ \ln N! = \ln 1 + \ln 2 + \ln 3 + \cdots + \ln N \approx \int_1^N \ln x\, dx = \left[ x \ln x - x \right]_1^N = N \ln N - N + 1. \tag{A.23} \]
For $N \gg 1$, we have
\[ \ln N! \approx N \ln N - N. \qquad \text{(simple form of Stirling's approximation)} \tag{A.24} \]
A more accurate approximation for $N!$ can be found from the integral representation:
\[ N! = \int_0^{\infty} dx\, x^N e^{-x}. \tag{A.25} \]
In the integrand $f(x) = x^N e^{-x}$, $x^N$ is a rapidly increasing function of $x$ for large $N$, and $e^{-x}$ is a
decreasing function of $x$. Hence $f(x)$ exhibits a sharp maximum for some value of $x$. To find this
maximum, we let $z = x/N$, $z^N = e^{N \ln z}$, and write $f$ as
\[ f = x^N e^{-x} \to N^N z^N e^{-Nz} = N^N e^{-N(z - \ln z)}. \tag{A.26} \]
Because the maximum of $-(z - \ln z)$ is at $z = 1$, we write $z = 1 + t$ and express (A.26) as
\[ f = N^N e^{-N[t + 1 - \ln(1 + t)]} = N^N e^{-N} e^{-N[t - \ln(1 + t)]}. \]
We let $\ln(1 + t) \approx t - \frac{1}{2} t^2$ (see (A.8)) and write
\[ f \approx N^N e^{-N} e^{-N t^2/2}. \tag{A.27} \]
From (A.27) we see that $f$ has a sharp maximum at $t = 0$ for large $N$, and hence
\[ N! = \int_0^{\infty} f\, dx \approx N^N e^{-N} \int_{-1}^{\infty} N\, dt\, e^{-N t^2/2} \tag{A.28} \]
\[ = N^{N+1} e^{-N} \int_{-\infty}^{\infty} dt\, e^{-N t^2/2} = N^N e^{-N} (2\pi N)^{1/2}, \tag{A.29} \]
and finally,
\[ \ln N! = N \ln N - N + \frac{1}{2} \ln(2\pi N). \qquad \text{(stronger form of Stirling's approximation)} \tag{A.30} \]
The Gamma function is defined as
\[ \Gamma(n) = \int_0^{\infty} dx\, x^{n-1} e^{-x}, \qquad \text{(Gamma function)} \tag{A.31} \]
and is useful in the context of factorials because
\[ \Gamma(n + 1) = n\Gamma(n) = n! \qquad \text{for positive integer } n. \tag{A.32} \]
(The result (A.32) can be derived by an integration by parts.) Note that $0! = 1! = 1$ and
$\Gamma(1) = \Gamma(2) = 1$.
For half integer arguments, $\Gamma(n/2)$ has the special form
\[ \Gamma\!\left( \frac{n}{2} \right) = \frac{(n - 2)!!\, \sqrt{\pi}}{2^{(n-1)/2}}, \tag{A.33} \]
where the double factorial $n!! = n \times (n - 2) \times \cdots \times 3 \times 1$ if $n$ is odd and $n!! = n \times (n - 2) \times \cdots \times 4 \times 2$
if $n$ is even. We also have $(-1)!! = 0!! = 1$, $\Gamma(\frac{1}{2}) = \sqrt{\pi}$, and $\Gamma(\frac{3}{2}) = \sqrt{\pi}/2$.

A.9 Constants

The Bernoulli numbers are the coefficients of $x^n/n!$ in the expansion of
\[ \frac{x}{e^x - 1} = \sum_{n=0}^{\infty} B_n \frac{x^n}{n!}. \tag{A.34} \]
All the Bernoulli numbers $B_n$ with odd $n$ are zero except for $B_1$, that is, $B_{2n+1} = 0$ for $n > 0$.
\[ B_0 = 1, \quad B_1 = -\frac{1}{2}, \quad B_2 = \frac{1}{6}, \quad B_4 = -\frac{1}{30}, \quad B_6 = \frac{1}{42}, \quad B_8 = -\frac{1}{30}. \tag{A.35} \]
Euler's constant
\[ \gamma = 0.577\,215\,664\,901\,5328 \ldots \tag{A.36} \]

A.10 Probability distributions

\[ P(n, N) = \frac{N!}{n!\,(N - n)!}\, p^n q^{(N - n)}. \qquad \text{(binomial distribution)} \tag{A.37} \]
The binomial distribution is specified by the probability $p = 1 - q$ and the number of trials $N$.
\[ P(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x - \bar x)^2/2\sigma^2}. \qquad \text{(Gaussian distribution)} \tag{A.38} \]
The Gaussian distribution is specified by $\bar x$, the mean value of $x$, and $\sigma^2 = \overline{x^2} - \bar x^2$, the variance
of $x$.
\[ P(n) = \frac{\lambda^n}{n!}\, e^{-\lambda}. \qquad \text{(Poisson distribution)} \tag{A.39} \]
The Poisson distribution is specified only by the parameter $\lambda = \bar n = pN$.

A.11 Fermi integrals

The integrals that commonly occur in the context of the ideal Fermi gas have the form
\[ I_n = \int_0^{\infty} dx\, \frac{x^n e^x}{(e^x + 1)^2} = n!\,(1 - 2^{1-n})\, \zeta(n), \tag{A.40} \]
where the Riemann zeta function is defined by
\[ \zeta(x) = \sum_{k=0}^{\infty} \frac{1}{(k + 1)^x}. \tag{A.41} \]
The values of the first several zeta functions are
\[ \zeta(\tfrac{3}{2}) \approx 2.612 \tag{A.42a} \]
\[ \zeta(2) = \frac{\pi^2}{6} \approx 1.645 \tag{A.42b} \]
\[ \zeta(\tfrac{5}{2}) \approx 1.341 \tag{A.42c} \]
\[ \zeta(3) \approx 1.202 \tag{A.42d} \]
\[ \zeta(4) = \frac{\pi^4}{90} \approx 1.082 \tag{A.42e} \]
\[ \zeta(6) = \frac{\pi^6}{945} \tag{A.42f} \]

A.12 Bose integrals

The integrals we need in the context of the ideal Bose gas have the form
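\[ I_B(n) = \int_0^{\infty} dx\, \frac{x^n}{e^x - 1} = \int_0^{\infty} dx\, \frac{x^n e^{-x}}{1 - e^{-x}} \tag{A.43} \]
\[ = \int_0^{\infty} dx\, x^n \sum_{k=0}^{\infty} e^{-(k+1)x} = \sum_{k=0}^{\infty} \int_0^{\infty} dx\, x^n e^{-(k+1)x} = \sum_{k=0}^{\infty} \frac{1}{(k + 1)^{n+1}} \int_0^{\infty} dy\, y^n e^{-y}. \tag{A.44} \]
If we use the definition of the Riemann zeta function in (A.41) and the definition of the Gamma
function in (A.31), we obtain
\[ I_B(n) = \zeta(n + 1)\, \Gamma(n + 1). \tag{A.45} \]
If $n$ is an integer, then (A.45) reduces to
\[ I_B(n) = n!\, \zeta(n + 1). \tag{A.46} \]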
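The result (A.45) can be checked numerically. The sketch below compares a direct midpoint-rule evaluation of the integral with $\Gamma(n+1)\zeta(n+1)$; the cutoff of the integration range, the number of steps, and the number of terms kept in the zeta sum are assumptions chosen for illustration, and the function names are not from the text.

```python
import math

def zeta(s, terms=10000):
    """Riemann zeta function from the sum (A.41), for s > 1.

    The first `terms` terms are summed directly; the remainder is estimated
    by the integral of x**(-s) from terms + 1/2 to infinity.
    """
    partial = sum(1.0 / (k + 1) ** s for k in range(terms))
    tail = (terms + 0.5) ** (1.0 - s) / (s - 1.0)
    return partial + tail

def bose_integral(n, x_max=50.0, steps=200000):
    """Midpoint-rule estimate of the integral of x**n / (exp(x) - 1) from 0 to x_max."""
    h = x_max / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h                 # midpoints avoid the endpoint x = 0
        total += x ** n / math.expm1(x)
    return total * h

def bose_formula(n):
    """Right-hand side of (A.45)."""
    return zeta(n + 1) * math.gamma(n + 1)

for n in (1, 2, 3):
    print(f"n = {n}   numerical = {bose_integral(n):.6f}   "
          f"zeta*Gamma = {bose_formula(n):.6f}")
```

For $n = 1$ both columns should agree with $\pi^2/6$, the value of $\zeta(2)$ listed in (A.42b).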
Click to edit the document details