Galaxy Formation and Large-Scale Structure of the Universe
The universe contains an estimated 2 trillion galaxies, a figure revised sharply upward from earlier estimates when Hubble Space Telescope deep-field surveys were combined with statistical modeling by Conselice et al. (2016, The Astrophysical Journal). How those galaxies formed, why they cluster the way they do, and what drives the web-like architecture visible at the largest scales are questions that sit at the intersection of quantum physics, gravity, and cosmology. This page covers the mechanics of galaxy formation from the earliest density fluctuations through to the observed cosmic web, the classification frameworks researchers use, and the genuine tensions that remain unresolved in the field.
- Definition and scope
- Core mechanics or structure
- Causal relationships or drivers
- Classification boundaries
- Tradeoffs and tensions
- Common misconceptions
- Checklist or steps
- Reference table or matrix
Definition and scope
Galaxy formation is the process by which primordial gas, overwhelmingly hydrogen and helium produced during Big Bang nucleosynthesis, collapses under gravity into gravitationally bound systems of stars, gas, dust, and dark matter: the systems called galaxies. Large-scale structure refers to the arrangement of those galaxies across hundreds of millions to billions of light-years: the filaments, walls, sheets, voids, and nodes that make up what cosmologists call the cosmic web.
The scope of the subject spans roughly 13.8 billion years of cosmic time, from the first density perturbations imprinted on the plasma of the early universe through to the galaxy clusters and superclusters visible in surveys like the Sloan Digital Sky Survey (SDSS), which mapped more than 3 million galaxies in three dimensions. The field draws on the physics of dark matter and dark energy, general-relativistic gravity, hydrodynamics, and stellar physics, together with observational data from space telescopes and observatories spanning the full electromagnetic spectrum.
Core mechanics or structure
Galaxy formation follows a hierarchical, bottom-up sequence in the Lambda-CDM (Cold Dark Matter) cosmological model. Small structures collapse first; larger ones assemble later through mergers and accretion. The sequence runs roughly as follows:
Quantum fluctuations and inflation. Density perturbations at the quantum scale were stretched to macroscopic sizes during cosmic inflation. These tiny ripples — amplitude roughly 1 part in 100,000 — seeded all subsequent structure. Their statistical properties are encoded in the cosmic microwave background, which the Planck satellite (European Space Agency, 2018 data release) measured to extraordinary precision.
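Those statistical properties are usually summarized by a nearly scale-invariant power law for the primordial curvature perturbations, Δ²(k) = A_s (k/k_*)^(n_s − 1). The sketch below evaluates this power law using approximate Planck 2018 values (A_s ≈ 2.1 × 10⁻⁹, n_s ≈ 0.965, pivot k_* = 0.05 Mpc⁻¹). Treat these numbers as illustrative. The square root of A_s, about 5 × 10⁻⁵, sets the typical size of the primordial perturbations. The CMB temperature ripples they produce are a few times smaller, at roughly 1 part in 100,000.

```python
import math

# Approximate Planck 2018 primordial-spectrum parameters (illustrative values).
A_S = 2.1e-9      # scalar amplitude at the pivot scale
N_S = 0.965       # scalar spectral index; n_s = 1 would be exactly scale-invariant
K_PIVOT = 0.05    # pivot wavenumber in Mpc^-1


def primordial_power(k_mpc: float) -> float:
    """Dimensionless curvature power spectrum Delta^2(k) = A_s * (k / k_pivot)^(n_s - 1)."""
    return A_S * (k_mpc / K_PIVOT) ** (N_S - 1.0)


if __name__ == "__main__":
    for k in (0.005, 0.05, 0.5):
        print(f"k = {k:6.3f} Mpc^-1  Delta^2 = {primordial_power(k):.3e}")
    # Typical fractional amplitude of the primordial perturbations.
    print(f"sqrt(A_s) = {math.sqrt(A_S):.1e}")
    # n_s < 1 (a 'red tilt') means slightly less power on small scales (large k).
    print(f"P(0.5)/P(0.05) = {primordial_power(0.5) / primordial_power(0.05):.3f}")
```

Because n_s differs from 1 by only a few percent, the spectrum changes by less than 10% across two decades in k. This near-scale-invariance is what "tiny ripples on all scales" means quantitatively.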
Recombination and the first neutral atoms. At approximately 380,000 years after the Big Bang, the universe cooled sufficiently for protons and electrons to combine into neutral hydrogen, and photons decoupled from matter. Dark matter, which does not couple to radiation, had already begun to clump. After decoupling, the baryons were freed from radiation pressure and could fall into the existing dark matter overdensities.
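The timing of recombination follows from the way radiation temperature scales with redshift: T(z) = T₀(1 + z). The sketch takes today's CMB temperature as T₀ ≈ 2.7255 K and the recombination redshift as about 1,100; both are standard approximate values. With those inputs it recovers the roughly 3,000 K at which hydrogen becomes mostly neutral. That temperature is far below the ≈158,000 K that corresponds to hydrogen's 13.6 eV ionization energy. The gap arises because photons vastly outnumber baryons, so the rare high-energy tail of the blackbody keeps hydrogen ionized until the temperature has fallen much further.

```python
T0_KELVIN = 2.7255          # present-day CMB temperature
K_B_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K
HYDROGEN_IONIZATION_EV = 13.6


def radiation_temperature(z: float) -> float:
    """CMB temperature at redshift z: T(z) = T0 * (1 + z)."""
    return T0_KELVIN * (1.0 + z)


if __name__ == "__main__":
    z_rec = 1100  # approximate redshift of recombination
    print(f"T(z={z_rec}) ~ {radiation_temperature(z_rec):.0f} K")
    # Temperature equivalent of the ionization energy, for comparison.
    print(f"13.6 eV corresponds to ~{HYDROGEN_IONIZATION_EV / K_B_EV_PER_K:.0f} K")
```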
Dark matter halo formation. Dark matter, which has so far been detected only through its gravity and neither emits nor absorbs light, collapsed first into halos. These halos provided the gravitational wells into which baryonic matter (gas) subsequently fell. Simulations such as the Millennium Simulation (Springel et al., 2005, Nature) demonstrated that the large-scale filamentary structure of the universe emerges naturally from this process, given CDM initial conditions.
Gas cooling and star formation. Baryonic gas falls into dark matter halos, is shock-heated, and then cools by radiating energy away. In the smallest early halos, molecular hydrogen does the cooling. In halos whose virial temperatures exceed about 10,000 K, atomic hydrogen and helium line emission takes over. Once the first stellar generations have dispersed heavier elements, metal lines contribute as well. Atomic line cooling becomes inefficient below about 10,000 K, so the gas can fragment and form stars only after further cooling, by molecular hydrogen in primordial gas or by metals and dust in enriched gas. The first stars (Population III) are thought to have been massive, potentially hundreds of solar masses, and short-lived. Their supernovae enriched the intergalactic medium with metals and set the conditions for the next stellar generation.
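The 10,000 K threshold can be linked to halo size through the virial temperature. One common convention is T_vir = μ m_p V_c² / (2 k_B), where V_c is the halo's circular velocity and μ ≈ 0.59 is the mean molecular weight of fully ionized primordial gas. Both the convention and the value of μ are assumptions of this sketch. Halos with V_c of roughly 17 km/s sit near 10⁴ K. A Milky Way-like halo with V_c ≈ 220 km/s heats its gas to around a million kelvin.

```python
M_PROTON_KG = 1.67262192e-27
K_B_J_PER_K = 1.380649e-23
MU_IONIZED = 0.59  # assumed mean molecular weight of fully ionized primordial gas


def virial_temperature(v_circ_km_s: float, mu: float = MU_IONIZED) -> float:
    """Virial temperature in kelvin, T_vir = mu * m_p * V_c^2 / (2 k_B)."""
    v_m_s = v_circ_km_s * 1.0e3
    return mu * M_PROTON_KG * v_m_s**2 / (2.0 * K_B_J_PER_K)


if __name__ == "__main__":
    for v in (10.0, 17.0, 220.0):
        t = virial_temperature(v)
        regime = "below" if t < 1.0e4 else "above"
        print(f"V_c = {v:5.0f} km/s -> T_vir ~ {t:9.3g} K ({regime} the ~10^4 K atomic-cooling threshold)")
```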
The cosmic web. At scales of tens to hundreds of megaparsecs (1 Mpc ≈ 3.26 million light-years), matter is not distributed uniformly. It arranges into a foam-like network: dense nodes where galaxy clusters reside, filaments connecting those nodes, flat walls bounding voids, and vast voids — some exceeding 300 Mpc in diameter — that are nearly empty of galaxies. The Boötes Void, discovered in 1981, spans approximately 330 million light-years and contains only 60 known galaxies where thousands might be expected.
Causal relationships or drivers
The amplitude and spatial scale of the initial density perturbations determine which structures form and when. A higher matter density parameter (Ω_m) speeds up structure growth. Dark energy (characterized by the equation of state parameter w ≈ −1 in the standard model) acts as a brake on structure growth once it dominates the energy budget of the universe. For Planck 2018 parameters, its density overtook that of matter roughly 3.5 billion years ago, and the expansion began accelerating roughly 6 billion years ago.
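The braking effect of dark energy can be made concrete with the linear growth factor D(a). This quantity tracks how the amplitude of small density perturbations grows with the scale factor a. For a flat universe containing only matter and a cosmological constant, a standard integral expression is D(a) = (5Ω_m/2) E(a) ∫₀^a da′ / [a′ E(a′)]³, where E(a) = √(Ω_m a⁻³ + Ω_Λ). The expression is normalized so that D ≈ a at early times. The sketch evaluates it with Simpson's rule, assuming Ω_m = 0.315 (roughly the Planck 2018 value) and neglecting radiation. In a matter-only universe D(1) = 1, whereas with dark energy present, growth to the present day falls to roughly 80% of that value.

```python
import math


def e_of_a(a: float, omega_m: float, omega_l: float) -> float:
    """Dimensionless Hubble rate E(a) = H(a)/H0 for matter plus a cosmological constant."""
    return math.sqrt(omega_m / a**3 + omega_l)


def simpson(f, lo: float, hi: float, n: int = 2000) -> float:
    """Composite Simpson's rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def growth_factor(a: float, omega_m: float = 0.315, omega_l: float = 0.685) -> float:
    """Linear growth factor D(a), normalized so that D ~ a deep in matter domination."""

    def integrand(x: float) -> float:
        if x == 0.0:
            return 0.0  # the integrand behaves like x^1.5 and vanishes at x = 0
        return 1.0 / (x * e_of_a(x, omega_m, omega_l)) ** 3

    return 2.5 * omega_m * e_of_a(a, omega_m, omega_l) * simpson(integrand, 0.0, a)


if __name__ == "__main__":
    print(f"matter only:  D(a=1) = {growth_factor(1.0, 1.0, 0.0):.3f}")
    print(f"Lambda-CDM:   D(a=1) = {growth_factor(1.0):.3f}")
```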
Angular momentum explains why many galaxies are disks rather than spheroids. As gas in a proto-galactic halo radiates away energy and contracts, it retains its angular momentum. It therefore spins faster and settles into a flattened, rotationally supported disk, much as a figure skater spins faster on pulling in their arms. Elliptical galaxies, by contrast, are thought to have had their rotation randomized or suppressed through major mergers: violent collisions between galaxies of comparable mass.
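The flattening argument can be written out for a parcel of gas at distance R from the spin axis. It assumes, for simplicity, that the parcel conserves its specific angular momentum j and feels the gravity of a fixed enclosed mass M:

```latex
\begin{aligned}
j &= R\,v_\phi = \text{const}
  &&\Rightarrow\quad v_\phi = \frac{j}{R},\\
a_{\mathrm{cent}} &= \frac{v_\phi^{2}}{R} = \frac{j^{2}}{R^{3}},
  \qquad a_{\mathrm{grav}} \simeq \frac{GM}{R^{2}}
  &&\Rightarrow\quad \frac{a_{\mathrm{cent}}}{a_{\mathrm{grav}}} = \frac{j^{2}}{GMR}.
\end{aligned}
```

The ratio grows as R shrinks, so contraction perpendicular to the spin axis stalls near R ≈ j²/(GM), where rotation balances gravity. Along the spin axis, only the gas's thermal pressure resists collapse. Cooling steadily removes that support, so the gas settles into a thin disk.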
Feedback processes explain why galaxies contain far fewer stars than their gas supply would otherwise allow. If cooling gas collapsed unopposed, it would be converted into stars much more efficiently than observed. In practice, supernova feedback injects energy and momentum into the interstellar medium, suppressing further collapse in lower-mass halos. In massive galaxies and clusters, active galactic nuclei, powered by accretion onto supermassive black holes, inject energy through jets and radiation. This energy is sufficient to heat or expel gas from the entire galaxy, a process called AGN feedback. Without these mechanisms, simulations produce far too many stars relative to what observations show. Stellar evolution and stellar life cycles are therefore not merely internal to galaxies: they regulate galaxy-scale properties.
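A common way to see how feedback throttles star formation is a toy "gas regulator" or "bathtub" model. In this picture, a galaxy's gas reservoir is fed by accretion and drained by star formation (SFR = M_gas / t_dep) and by feedback-driven outflows at η × SFR, where η is the mass-loading factor. The parameter values below are illustrative assumptions, not measurements: 10 M☉/yr of inflow, a 2 Gyr depletion time, and a return fraction R = 0.4 of stellar mass recycled back into the gas. The point of the sketch is that the equilibrium rate, SFR_eq = inflow / (1 − R + η), falls steeply as feedback becomes stronger.

```python
def regulator_sfr(
    eta: float,
    inflow_msun_per_yr: float = 10.0,   # assumed gas accretion rate
    t_dep_yr: float = 2.0e9,            # assumed gas depletion time
    return_fraction: float = 0.4,       # assumed fraction of stellar mass returned to the gas
    t_end_yr: float = 2.0e10,
    dt_yr: float = 1.0e6,
) -> float:
    """Integrate dM_gas/dt = inflow - (1 - R + eta) * SFR, with SFR = M_gas / t_dep.

    Uses forward-Euler steps from an empty reservoir and returns the final SFR in Msun/yr.
    """
    m_gas = 0.0
    drain = 1.0 - return_fraction + eta  # gas removed per unit of star formation
    for _ in range(int(t_end_yr / dt_yr)):
        sfr = m_gas / t_dep_yr
        m_gas += dt_yr * (inflow_msun_per_yr - drain * sfr)
    return m_gas / t_dep_yr


def equilibrium_sfr(eta: float, inflow_msun_per_yr: float = 10.0, return_fraction: float = 0.4) -> float:
    """Analytic steady state: inflow balanced by locked-up stars plus outflows."""
    return inflow_msun_per_yr / (1.0 - return_fraction + eta)


if __name__ == "__main__":
    for eta in (0.0, 1.0, 5.0):
        print(f"eta = {eta:3.1f}: SFR -> {regulator_sfr(eta):6.2f} Msun/yr "
              f"(analytic {equilibrium_sfr(eta):6.2f})")
```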
Classification boundaries
The Hubble sequence, formalized by Edwin Hubble in 1926 and refined subsequently, divides galaxies into three primary morphological types:
- Elliptical galaxies (E0–E7): Smooth, featureless light distributions. Number classifications denote apparent ellipticity (E0 = circular, E7 = strongly elongated).
- Spiral galaxies (Sa–Sd; SBa–SBd): Disk galaxies with spiral arms. The "B" designation denotes a central bar. From Sa to Sd, the central bulge becomes less prominent and the spiral arms become more loosely wound.
- Irregular galaxies: No clear symmetry. Often the result of gravitational disturbance; the Large Magellanic Cloud is a canonical example.
A fourth category — lenticular (S0) — bridges ellipticals and spirals: disk structure without prominent spiral arms.
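The E-number follows a simple rule. For an apparent axis ratio b/a (minor over major axis of the projected image), the class is En with n = 10(1 − b/a). The sketch rounds n to the nearest integer and caps it at E7; these are assumed conventions, and borderline cases may be treated differently in practice. Because the class describes the projected image, a galaxy's E-number depends on the viewing angle as well as on its intrinsic shape.

```python
def elliptical_class(axis_ratio: float) -> str:
    """Hubble E-class from the apparent axis ratio b/a (0 < b/a <= 1).

    n = 10 * (1 - b/a), rounded to the nearest integer and capped at 7 (assumed conventions).
    """
    if not 0.0 < axis_ratio <= 1.0:
        raise ValueError("axis ratio b/a must lie in (0, 1]")
    n = min(7, round(10.0 * (1.0 - axis_ratio)))
    return f"E{n}"


if __name__ == "__main__":
    for q in (1.0, 0.8, 0.5, 0.3):
        print(f"b/a = {q:.1f} -> {elliptical_class(q)}")
```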
Mass is a separate classification axis. Dwarf galaxies may have total masses, dark matter included, as low as roughly 10^7 solar masses, while giant ellipticals exceed 10^13 solar masses. The Milky Way's total mass, dominated by its dark matter halo, is roughly 10^12 solar masses, according to Bland-Hawthorn & Gerhard (2016, Annual Review of Astronomy and Astrophysics).
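To translate a halo mass into a size, one common convention defines the halo as the sphere whose mean density is 200 times the present-day critical density, ρ_crit = 3H₀²/(8πG). This gives R₂₀₀ = [3M / (4π · 200 ρ_crit)]^(1/3). The sketch assumes H₀ = 67.4 km/s/Mpc and uses M = 10¹² M☉ as a round, Milky Way-like value. Both the overdensity of 200 and the exact mass are conventions or approximations, and other halo definitions give radii that differ by tens of percent.

```python
import math

G_MPC_KMS2_PER_MSUN = 4.30091e-9  # gravitational constant in Mpc (km/s)^2 / Msun


def critical_density(h0_km_s_mpc: float = 67.4) -> float:
    """Present-day critical density 3 H0^2 / (8 pi G) in Msun / Mpc^3."""
    return 3.0 * h0_km_s_mpc**2 / (8.0 * math.pi * G_MPC_KMS2_PER_MSUN)


def r200_kpc(mass_msun: float, h0_km_s_mpc: float = 67.4) -> float:
    """Radius (kpc) of a sphere of the given mass with mean density 200 x critical."""
    rho_200 = 200.0 * critical_density(h0_km_s_mpc)
    r_mpc = (3.0 * mass_msun / (4.0 * math.pi * rho_200)) ** (1.0 / 3.0)
    return r_mpc * 1.0e3


if __name__ == "__main__":
    for m in (1.0e7, 1.0e12, 1.0e13):
        print(f"M = {m:.0e} Msun -> R200 ~ {r200_kpc(m):7.1f} kpc")
```

With these assumptions, a 10¹² M☉ halo extends to roughly 200 kpc, about ten times the radius of the Milky Way's stellar disk.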
Tradeoffs and tensions
The Lambda-CDM model is extraordinarily successful — and it has uncomfortable loose ends.
The small-scale structure problem. CDM simulations predict far more low-mass dark matter subhalos around galaxies like the Milky Way than observed satellite galaxies. The Milky Way has approximately 50 confirmed satellites, while some simulations predict hundreds to thousands. Proposed resolutions include baryonic feedback suppressing galaxy formation in small halos, or modifications to the dark matter particle physics (warm dark matter, self-interacting dark matter).
The cusp-core problem. Simulations predict dark matter halos with dense central cusps, but observations of many dwarf galaxies suggest flatter central density profiles (cores). Supernova feedback can, in principle, transform cusps into cores, but whether this operates efficiently enough remains debated.
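The difference between a cusp and a core is easiest to see in the logarithmic slope d ln ρ / d ln r near the center. The sketch compares two profiles. The first is the Navarro–Frenk–White (NFW) profile often fitted to CDM halos, ρ(r) = ρ_s / [(r/r_s)(1 + r/r_s)²]. The second is the cored Burkert profile often used for dwarf galaxies, ρ(r) = ρ₀ r₀³ / [(r + r₀)(r² + r₀²)]. The scale radii are set to 1 in arbitrary units for illustration, and the slopes are computed numerically from a symmetric difference in log space.

```python
import math


def rho_nfw(r: float, rho_s: float = 1.0, r_s: float = 1.0) -> float:
    """Navarro-Frenk-White profile: inner slope -> -1 (cusp), outer slope -> -3."""
    x = r / r_s
    return rho_s / (x * (1.0 + x) ** 2)


def rho_burkert(r: float, rho_0: float = 1.0, r_0: float = 1.0) -> float:
    """Burkert profile: inner slope -> 0 (core), outer slope -> -3."""
    return rho_0 * r_0**3 / ((r + r_0) * (r**2 + r_0**2))


def log_slope(profile, r: float, eps: float = 1.0e-4) -> float:
    """Numerical d ln(rho) / d ln(r) via a symmetric difference in ln r."""
    up = profile(r * math.exp(eps))
    down = profile(r * math.exp(-eps))
    return (math.log(up) - math.log(down)) / (2.0 * eps)


if __name__ == "__main__":
    for r in (0.01, 0.1, 1.0, 10.0):
        print(f"r = {r:5.2f}: NFW slope {log_slope(rho_nfw, r):6.2f}, "
              f"Burkert slope {log_slope(rho_burkert, r):6.2f}")
```

Both profiles agree at large radii. The observational question is which inner slope, near −1 or near 0, real dwarf galaxies follow.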
The tension in H₀. The Hubble constant, the present-day expansion rate, measured from the CMB (Planck 2018: 67.4 km/s/Mpc) disagrees at roughly the 5-sigma level with measurements from Cepheid-calibrated Type Ia supernovae (Riess et al., SH0ES program: ~73 km/s/Mpc). If the tension is real rather than systematic, it implies either new physics in the early universe or a gap in the standard cosmological model. Because H₀ sets the relation between redshift, distance, and cosmic time, the tension also propagates into structure-formation predictions.
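The quoted significance comes from comparing the two values against their combined uncertainty. Assuming independent Gaussian errors, the tension in standard deviations is |H₁ − H₂| / √(σ₁² + σ₂²). The sketch uses 67.4 ± 0.5 km/s/Mpc for Planck 2018. For SH0ES it uses 73.0 ± 1.0 km/s/Mpc, a rounded stand-in for the published value, so treat the result as illustrative.

```python
import math


def tension_sigma(h1: float, sigma1: float, h2: float, sigma2: float) -> float:
    """Difference between two measurements in units of their combined 1-sigma error.

    Assumes independent, Gaussian uncertainties.
    """
    return abs(h1 - h2) / math.hypot(sigma1, sigma2)


if __name__ == "__main__":
    planck = (67.4, 0.5)   # km/s/Mpc, CMB-based (Planck 2018)
    shoes = (73.0, 1.0)    # km/s/Mpc, rounded Cepheid + SN Ia value (SH0ES)
    print(f"H0 tension ~ {tension_sigma(*planck, *shoes):.1f} sigma")
```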
Early massive galaxies from JWST. Since 2022, James Webb Space Telescope observations have identified massive, luminous galaxy candidates at redshifts z > 10 (less than about 500 million years after the Big Bang) that are unexpectedly abundant and bright. Whether these represent a genuine overproduction of early massive galaxies, or are explained by selection effects and photometric redshift uncertainties, remains an active research question that will take years to resolve.
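Converting a redshift into a cosmic time depends on the assumed cosmology. For a flat universe containing matter and a cosmological constant, with radiation neglected (a small error except at very high redshift), the age has a closed form: t(z) = [2 / (3H₀√Ω_Λ)] arsinh[√(Ω_Λ/Ω_m) (1 + z)^(−3/2)]. Assuming H₀ = 67.4 km/s/Mpc and Ω_m = 0.315 (approximately Planck 2018), the sketch gives an age of about 13.8 Gyr today and a little under 500 Myr at z = 10.

```python
import math

GYR_PER_INV_KM_S_MPC = 977.79  # 1 / (1 km/s/Mpc) expressed in Gyr


def age_gyr(z: float, h0: float = 67.4, omega_m: float = 0.315) -> float:
    """Age of a flat matter + Lambda universe at redshift z, in Gyr (radiation neglected)."""
    omega_l = 1.0 - omega_m
    hubble_time = GYR_PER_INV_KM_S_MPC / h0
    x = math.sqrt(omega_l / omega_m) * (1.0 + z) ** -1.5
    return 2.0 * hubble_time / (3.0 * math.sqrt(omega_l)) * math.asinh(x)


if __name__ == "__main__":
    for z in (0.0, 1.0, 6.0, 10.0, 14.0):
        print(f"z = {z:4.1f}: age ~ {age_gyr(z) * 1000:7.0f} Myr")
```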
Common misconceptions
"Galaxies are distributed randomly." The cosmic web is not random. The distribution of galaxies reflects the statistics of the initial density field, which is well-characterized as a Gaussian random field. Galaxy positions are correlated across scales up to hundreds of megaparsecs — a pattern measured through the baryon acoustic oscillation feature at ~150 Mpc comoving separation.
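Clustering of this kind is usually quantified with the two-point correlation function ξ(r): the excess probability, relative to a random distribution, of finding a pair of galaxies at separation r. A widely used estimator is the Landy–Szalay form, ξ = (DD − 2DR + RR) / RR. Here DD, DR, and RR are normalized pair counts among the data points, between data and random points, and among the random points. The toy sketch works in a 2D unit square with synthetic catalogs, comparing clumped "galaxies" against a uniform catalog. Real survey analyses work in 3D, include survey masks and weights, and rely on optimized pair-counting codes rather than brute force.

```python
import bisect
import math
import random


def pair_counts(a, b, edges, same):
    """Histogram pair separations into bins defined by `edges` (brute force)."""
    counts = [0] * (len(edges) - 1)
    for i, (x1, y1) in enumerate(a):
        start = i + 1 if same else 0  # avoid double-counting pairs within one catalog
        for x2, y2 in b[start:]:
            k = bisect.bisect_right(edges, math.hypot(x1 - x2, y1 - y2)) - 1
            if 0 <= k < len(counts):
                counts[k] += 1
    return counts


def landy_szalay(data, randoms, edges):
    """Landy-Szalay estimator xi = (DD - 2DR + RR) / RR with normalized pair counts."""
    nd, nr = len(data), len(randoms)
    dd = [c / (nd * (nd - 1) / 2) for c in pair_counts(data, data, edges, True)]
    dr = [c / (nd * nr) for c in pair_counts(data, randoms, edges, False)]
    rr = [c / (nr * (nr - 1) / 2) for c in pair_counts(randoms, randoms, edges, True)]
    return [(d - 2 * x + r) / r if r > 0 else float("nan") for d, x, r in zip(dd, dr, rr)]


def clumpy_catalog(rng, n_clumps=40, per_clump=10, spread=0.01):
    """Synthetic 'galaxies' in Gaussian clumps, kept inside the unit square."""
    points = []
    for _ in range(n_clumps):
        cx, cy = rng.random(), rng.random()
        made = 0
        while made < per_clump:
            x, y = rng.gauss(cx, spread), rng.gauss(cy, spread)
            if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
                points.append((x, y))
                made += 1
    return points


def uniform_catalog(rng, n):
    """Unclustered points filling the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    edges = [0.0, 0.05, 0.1, 0.2, 0.4]
    randoms = uniform_catalog(rng, 600)
    for name, data in (("clumpy", clumpy_catalog(rng)), ("uniform", uniform_catalog(rng, 400))):
        xi = landy_szalay(data, randoms, edges)
        print(name, [f"{v:+.2f}" for v in xi])
```

The clumped catalog shows a strong positive ξ at small separations. The uniform catalog scatters around zero, which is what "random" would look like in this statistic.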
"Galaxy mergers are collisions of stars." When two galaxies merge, the chance that any given star physically collides with another is negligible. The average distance between stars in a galactic disk is several light-years, while stars themselves are tiny by comparison. What actually collides is the gas, which is shocked and compressed; the stellar distributions are reshaped by gravity rather than by contact. Galactic mergers are more like two bee swarms passing through each other than two rocks hitting.
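An order-of-magnitude estimate shows how small the per-star collision probability is. For sun-like stars, a collision requires their centers to pass within about 2R☉, so the geometric cross-section is σ = π(2R☉)². The expected number of collisions for a star crossing a path of length L through stars of number density n is then τ = nσL. The sketch assumes n ≈ 0.1 stars per cubic parsec (roughly the solar-neighborhood value) and L = 10 kpc. It ignores gravitational focusing, which could raise the cross-section by up to about an order of magnitude but still leaves the result negligible.

```python
import math

R_SUN_M = 6.957e8
PARSEC_M = 3.0857e16
LY_PER_PC = 3.2616


def collision_optical_depth(n_per_pc3: float, path_pc: float) -> float:
    """Expected collisions tau = n * sigma * L for sun-like stars (no gravitational focusing)."""
    sigma_pc2 = math.pi * (2.0 * R_SUN_M / PARSEC_M) ** 2
    return n_per_pc3 * sigma_pc2 * path_pc


if __name__ == "__main__":
    n = 0.1          # assumed stars per cubic parsec (roughly solar-neighborhood)
    path = 1.0e4     # assumed path length through the other galaxy, in parsecs
    print(f"mean separation ~ {n ** (-1 / 3) * LY_PER_PC:.1f} light-years")
    print(f"expected collisions per star ~ {collision_optical_depth(n, path):.1e}")
```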
"The Milky Way is typical." The Milky Way is larger and more luminous than roughly 90% of galaxies in the observable universe by count. Most galaxies are dwarf systems. The Milky Way sits near the upper end of the spiral galaxy mass range — not a rare giant, but not representative of the average either.
"Voids are empty." Cosmic voids are underdense, not empty. They contain dark matter filaments, isolated galaxies, and gas at low density. The Boötes Void, for example, contains 60 confirmed galaxies. Void interiors are also useful cosmological probes: their shapes and growth rates are sensitive to dark energy properties.
"Galaxy formation is complete." Galaxy formation is ongoing. Mergers continue — the Milky Way is absorbing the Sagittarius Dwarf Galaxy in an ongoing tidal disruption, and the Andromeda–Milky Way collision is projected approximately 4.5 billion years from now based on proper motion measurements from the Gaia satellite (ESA).
Checklist or steps
The observational and theoretical pathway researchers use to characterize galaxy formation and large-scale structure follows a recognized sequence:
- Map the CMB power spectrum — establish the amplitude (A_s) and spectral index (n_s) of primordial fluctuations, and extract the matter density, baryon density, and Hubble constant from acoustic peak positions.
- Run N-body and hydrodynamic simulations — using CMB-derived initial conditions, simulate structure growth forward in time at varying mass resolution (e.g., IllustrisTNG, EAGLE, SIMBA suites).
- Compare simulated to observed galaxy properties — stellar mass functions, size-mass relations, star formation rate densities, color bimodality, and morphological fractions at multiple redshifts.
- Measure the galaxy power spectrum or two-point correlation function — from spectroscopic surveys (SDSS, DESI) to detect the baryon acoustic oscillation scale and constrain cosmological parameters.
- Identify the baryon acoustic oscillation peak — use its known physical scale (~150 Mpc) as a standard ruler to measure the expansion history of the universe across redshift.
- Characterize void statistics and weak gravitational lensing, both of which probe the growth rate of structure and are sensitive to dark energy and modified gravity models.
- Cross-compare with multi-wavelength observations — X-ray observations of hot intracluster gas (Chandra, XMM-Newton), radio observations of AGN jets, and infrared data on dust-obscured star formation together constrain feedback models.
- Iterate feedback prescriptions — adjust supernova and AGN feedback parameters in simulations until the simulated galaxy population matches observed stellar mass functions across redshift.
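The standard-ruler step can be sketched numerically. In a flat universe, a feature of fixed comoving size r_d observed at redshift z subtends an angle θ ≈ r_d / D_C(z) across the sky. The comoving distance is D_C(z) = (c/H₀) ∫₀^z dz′/E(z′), with E(z) = √(Ω_m(1+z)³ + Ω_Λ). The sketch assumes r_d = 147 Mpc (the Planck 2018 sound-horizon value, close to the ~150 Mpc scale), H₀ = 67.4 km/s/Mpc, and Ω_m = 0.315. Measuring θ at several redshifts and inverting this relation is what constrains the expansion history. Surveys also use the BAO scale along the line of sight, which depends on H(z) directly; that case is not shown here.

```python
import math

C_KM_S = 299792.458


def comoving_distance_mpc(z: float, h0: float = 67.4, omega_m: float = 0.315, n: int = 1000) -> float:
    """Line-of-sight comoving distance in a flat matter + Lambda universe (Simpson's rule, n even)."""
    omega_l = 1.0 - omega_m

    def inv_e(zz: float) -> float:
        return 1.0 / math.sqrt(omega_m * (1.0 + zz) ** 3 + omega_l)

    h = z / n
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * inv_e(i * h)
    return (C_KM_S / h0) * total * h / 3.0


def bao_angle_deg(z: float, r_d_mpc: float = 147.0) -> float:
    """Angle subtended by the BAO scale transverse to the line of sight (flat geometry)."""
    return math.degrees(r_d_mpc / comoving_distance_mpc(z))


if __name__ == "__main__":
    for z in (0.3, 0.5, 1.0, 2.0):
        print(f"z = {z:3.1f}: D_C ~ {comoving_distance_mpc(z):6.0f} Mpc, "
              f"BAO angle ~ {bao_angle_deg(z):4.2f} deg")
```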
Reference table or matrix
Key scales and structures in the cosmic web
| Structure | Typical scale | Dominant content | Key observational probe |
|---|---|---|---|
| Individual galaxy | 1–100 kpc | Stars, gas, dark matter halo | Optical/IR photometry, spectroscopy |
| Galaxy group | 1–3 Mpc | 3–50 galaxies, diffuse X-ray gas | X-ray (Chandra), spectroscopic redshifts |
| Galaxy cluster | 2–10 Mpc | 50–1,000+ galaxies, hot ICM | X-ray, Sunyaev-Zel'dovich effect, weak lensing |
| Cosmic filament | 10–100 Mpc | Dark matter + baryons, galaxy chains | Galaxy surveys, Ly-α forest absorption |
| Cosmic wall/sheet | 50–200 Mpc | Flattened overdensities | SDSS and DESI redshift surveys |
| Cosmic void | 20–300+ Mpc | Underdense, sparse galaxies | Void catalogs from spectroscopic surveys |
| Observable universe | ~14,000 Mpc radius (~28,000 Mpc diameter) | All structure | Full-sky CMB + galaxy surveys |
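The scale of the observable universe is set by the comoving particle horizon, D_p = c ∫₀¹ da / [a² H(a)]. The sketch evaluates it for a flat universe containing radiation, matter, and a cosmological constant. It assumes H₀ = 67.4 km/s/Mpc, Ω_m = 0.315, and an approximate radiation density Ω_r ≈ 9.2 × 10⁻⁵ (photons plus three light neutrino species). Substituting a = u² removes the steep behavior near a = 0, so plain Simpson integration suffices. The result is about 14,000 Mpc (roughly 46 billion light-years) in radius, or about 28,000 Mpc across.

```python
import math

C_KM_S = 299792.458
MPC_TO_GLY = 3.2616e-3  # 1 Mpc is about 3.2616 million light-years


def particle_horizon_mpc(h0=67.4, omega_m=0.315, omega_r=9.2e-5, n=20000):
    """Comoving particle horizon c * integral_0^1 da / (a^2 H(a)) for a flat universe.

    With a = u^2, the integrand becomes 2u / sqrt(omega_r + omega_m u^2 + omega_l u^8).
    """
    omega_l = 1.0 - omega_m - omega_r

    def f(u):
        return 2.0 * u / math.sqrt(omega_r + omega_m * u**2 + omega_l * u**8)

    h = 1.0 / n
    total = f(0.0) + f(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return (C_KM_S / h0) * total * h / 3.0


if __name__ == "__main__":
    radius = particle_horizon_mpc()
    print(f"radius   ~ {radius:,.0f} Mpc (~{radius * MPC_TO_GLY:.0f} billion light-years)")
    print(f"diameter ~ {2 * radius:,.0f} Mpc")
```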
Comparison of major N-body/hydrodynamic simulation suites
| Simulation | Institution | Box size (comoving) | Key physics included |
|---|---|---|---|
| Millennium (2005) | Max Planck Institute | 500 Mpc/h | Dark matter only; galaxy models semi-analytic |
| IllustrisTNG | MIT/MPA/CCA | 300 Mpc | Full baryons, AGN feedback, magnetic fields |
| EAGLE | Durham/Leiden | 100 Mpc | Star formation, stellar and AGN feedback |
| SIMBA | Florida/Edinburgh | 100 Mpc/h | Kinetic AGN jets, dust physics |
| Bolshoi-Planck | NASA/ARC | 250 Mpc/h | Dark matter; Planck 2015 cosmology |
References
- NASA Hubble Space Telescope: Galaxy Deep Fields
- ESA Planck Collaboration: 2018 Results (Cosmological Parameters)
- Sloan Digital Sky Survey (SDSS)
- Dark Energy Spectroscopic Instrument (DESI), NOIRLab
- IllustrisTNG Simulation Project
- ESA Gaia Mission: Galaxy Kinematics
- NASA Chandra X-ray Observatory: Galaxy Clusters
- NASA James Webb Space Telescope Science: Early Universe Galaxies
- Conselice et al. (2016), The Astrophysical Journal: galaxy count revision to ~2 trillion
- Bland-Hawthorn & Gerhard (2016), Annual Review of Astronomy and Astrophysics: Milky Way mass
- Springel et al. (2005), Nature: the Millennium Simulation