Saturday, May 22, 2021

Quantum for the curious: 1 - Qubits




Quantum computing has been in the news lately. Companies, governments, and research labs throughout the world are making significant investments in developing quantum computers. Large companies such as Google, Microsoft, and IBM are racing to build a scalable quantum computer because they believe it will give them a significant edge in high-performance computing. IBM was one of the first companies to offer cloud-based access to a quantum computer for experimentation and education. Google made a splash in October 2019 when they announced that they had achieved "quantum supremacy" for the first time in history. This means that they were able to perform a calculation on their quantum computer that they estimated would take the fastest supercomputer 10,000 years to complete. Similar announcements have been made since then by teams of researchers in China.

While there is a lot of hype around this technology, there are many challenges in building a scalable quantum computer. Quantum computers derive their power from massive parallelism made possible by the quantum nature of their computational elements. But those quantum states are finicky and can be easily destroyed by noise in the system. To mitigate this phenomenon, known as decoherence, quantum computing systems have to be isolated from the environment and cooled to very low temperatures. In addition, environmental noise produces errors that have to be accounted for during computation. This is a major challenge today, but there is a tremendous amount of research being conducted to address this problem and a lot of progress is being made. So there is an expectation that in the next 20-30 years scalable quantum computers will be available to perform a variety of computational tasks. For example, IBM, which currently runs a 65-qubit quantum processor, has promised to have a 1000-qubit quantum computer by 2023.

This raises the question of what these quantum computers are going to be good for. There are many potential applications of quantum computers. Simulation of quantum phenomena in chemistry and physics, large-scale optimization problems, and machine learning are some of the well-known applications. But arguably the most dramatic consequence of a quantum computing revolution would be its implications for network communication and security. Public-key cryptography is the foundation of secure communication on the internet. Modern eCommerce and internet banking, as well as other forms of secure internet communication, are all based on TLS, which in turn uses public-key cryptography. The latter is based on the computational hardness of problems such as factoring large integers. Peter Shor's discovery of an efficient quantum algorithm for integer factorization in 1994 was a showcase of the power of quantum computers to solve classically hard computational problems. But what also got a lot of attention was the realization that a scalable quantum computer would break all of modern public-key cryptography. This has led to a flurry of activity in developing quantum-safe classical cryptography (also known as post-quantum cryptography). Ironically, quantum information theory is also expected to revolutionize network communication by providing provably secure cryptography (via quantum key distribution) and efficient information transfer (via quantum teleportation and super-dense coding).

The Strange World of Quantum

Unlike traditional computers, which perform computations using bits, quantum computers use a more exotic unit of information known as a qubit. A qubit (short for quantum bit) is like the proverbial Schrödinger's cat. Unlike a bit, which can take one of two possible values 0 or 1, a qubit can be simultaneously both 0 and 1 when it is not measured and assumes one of the two values 0 or 1 only when it is measured. This is like the cat in Schrödinger's thought experiment, which can be alive and dead at the same time when it is not observed, but is found to be either dead or alive when it is observed. This strange description of nature caused enormous discomfort for scientists like Einstein and Schrödinger. In fact, Schrödinger's thought experiment was intended to highlight the absurd implications of quantum theory. If a radioactive atom can be in a superposition of "decayed" and "not-decayed" states at the same time, then the cat whose life depends on the state of the radioactive atom would be in a superposition of dead and alive at the same time. Only when someone chooses to open the door and observe the cat does "nature decide" whether the radioactive atom has decayed or not, which in turn determines whether the cat is observed to be dead or alive. This raised some very unpleasant questions about the nature of reality at the level of elementary particles, questions that seem to go against our common experience. Such philosophical objections notwithstanding, quantum theory has proved to be an enormously successful and precise theory of nature. The work of scientists such as John Bell has demonstrated that nature indeed is quantum in all its glorious strangeness. Moreover, key aspects of this quantum strangeness such as superposition, non-determinism, entanglement, and interference provide significant advantages for computation and communication. This blog post will try to explain these core aspects of quantum information in simple terms. Subsequent posts will cover quantum computation, its implications for public-key cryptography, and quantum communication.

The Mighty Bit

Much of modern technology is based on digital systems. The fundamental unit of information in such systems is the bit. Bits are truly the atoms that make up the digital universe. In contrast to analog systems that work with continuously varying quantities, digital systems process discrete quantities of information made out of bits. Bits are so fundamental for information processing because all the entities of information such as numbers, characters, data structures, memory addresses, and packets transmitted across a network can be encoded in bits. Moreover, using George Boole's laws of logic, bits can be combined and operated upon to perform logical operations. All computer programs ultimately boil down to operations on bits. The discovery of the transistor allowed the efficient realization and manipulation of bits. Very simply, when a transistor switch is on, it allows a current to flow and represents a 1; otherwise it represents a 0. Turning the switch on and off represents bit transitions from 1 to 0 and vice versa. Chips made of transistors can switch their bit-states billions of times per second. More importantly, transistors can be combined to form logic gates that perform operations on bits. Logic gates can be combined to form logic circuits that perform arbitrary computations. Large numbers of transistors (typically MOSFETs) are combined to form integrated circuits (ICs) that execute computations using logic circuits. The advantage of working with bits (and qubits) is that they abstract the rules of logic away from their physical realization. So one does not need to know the physics and electronics of computers to understand the rules and algorithms associated with information processing. Let's examine some basic operations one can perform with bits.

Logic Gates



All of these gates can be implemented efficiently using transistors. One thing to note is that while the NOT gate is reversible, all of the other gates are irreversible. We will see later that in contrast to classical logic gates, all quantum gates are reversible. This has deep implications for the physics of computation. 
It also turns out that certain gates are universal, meaning that all other gates can be expressed in terms of them. For example, it can be shown that the NAND gate is universal for all classical computation, as the sketch below illustrates.
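To make the universality claim concrete, here is a minimal Python sketch (my own illustration, not from the original post) that builds NOT, AND, OR, and XOR out of NAND alone:

def NAND(a, b):
    return 1 - (a & b)

def NOT(a):
    return NAND(a, a)

def AND(a, b):
    return NOT(NAND(a, b))

def OR(a, b):
    return NAND(NOT(a), NOT(b))

def XOR(a, b):
    c = NAND(a, b)                      # classic 4-NAND construction of XOR
    return NAND(NAND(a, c), NAND(b, c))

# Check the truth tables against Python's built-in bit operators.
for a in (0, 1):
    for b in (0, 1):
        assert AND(a, b) == (a & b) and OR(a, b) == (a | b) and XOR(a, b) == (a ^ b)

Any circuit of ANDs, ORs, and NOTs can therefore be rewritten using NAND gates only.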

Binary Representation of Numbers

Data in computers is represented using an array of bits. For example, a fixed-size array of 32 or 64 bits could be used to represent integers. By using a base 2 (binary) representation one can convert an array of bits into integers. 
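As a quick illustration (my own sketch, not part of the original post), here is how an integer is packed into a fixed-size array of bits and recovered again using the base-2 rule described above:

def to_bits(n, width=8):
    # most-significant bit first, e.g. 13 -> [0, 0, 0, 0, 1, 1, 0, 1]
    return [(n >> i) & 1 for i in range(width - 1, -1, -1)]

def from_bits(bits):
    value = 0
    for b in bits:
        value = (value << 1) | b        # shift left and append the next bit
    return value

print(to_bits(13))                      # [0, 0, 0, 0, 1, 1, 0, 1]
print(from_bits(to_bits(13)))           # 13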




Just as one adds integers in decimal notation by "carrying over" a 1 when the sum of the digits in a decimal place is greater than or equal to 10, one adds binary digits by performing an XOR (which is the same as addition modulo 2) and carries a 1 over when the sum of two bits in a binary place is greater than or equal to 2. The process of adding two bits and carrying over can be represented by a logic circuit composed of an XOR gate and an AND gate. Such a circuit is called a "half-adder" circuit. It takes two input bits A and B and produces two outputs A XOR B and A AND B, which equate to the SUM bit and CARRY bit respectively. When adding two binary representations, one travels from right to left and at each step, one adds the bits to the previously carried bit, records the sum of the bits, and carries over any overflowing bit to the next step. So at each step, you have 3 inputs, namely the previously carried bit (called Cin) and the two bits A and B, and two outputs, namely the SUM of A and B and a carried-over bit (called Cout). To handle all the cases involved in this scenario one uses the "full-adder" circuit as shown below. It's easy to verify that the full adder circuit handles all the cases involved in adding the three bits Cin, A, and B. (A small code sketch of these adders follows below.)
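Here is a small Python sketch (mine, purely illustrative) of the half-adder and full-adder just described, built from XOR and AND:

def half_adder(a, b):
    return a ^ b, a & b                 # (SUM, CARRY)

def full_adder(a, b, c_in):
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, c_in)
    return s, c1 | c2                   # Cout is set if either stage carried

# Verify against ordinary addition for all 8 combinations of Cin, A, B.
for a in (0, 1):
    for b in (0, 1):
        for c_in in (0, 1):
            s, c_out = full_adder(a, b, c_in)
            assert 2 * c_out + s == a + b + c_in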


Arbitrary integer addition can be performed by stacking a half-adder on top of a stack of full-adders, passing the carry output of each adder as the carry input to the next adder below, and accounting for any overflow at the end.
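A self-contained Python sketch (my own) of this ripple-carry scheme, with the bits stored least-significant first:

def full_adder(a, b, c_in):
    s = a ^ b ^ c_in
    c_out = (a & b) | (c_in & (a ^ b))
    return s, c_out

def ripple_add(x_bits, y_bits):
    # x_bits and y_bits are equal-length bit lists, least-significant bit first
    result, carry = [], 0
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    result.append(carry)                # keep any final overflow bit
    return result

# 6 + 7 = 13:  [0,1,1] + [1,1,1] -> [1,0,1,1]  (i.e. binary 1101)
print(ripple_add([0, 1, 1], [1, 1, 1]))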



Similarly, other arithmetic operations can be performed using algorithms that ultimately boil down to logic circuits. For example, a recursive scheme called the Karatsuba algorithm is used to efficiently multiply large integers using products of smaller integers, which in turn can be multiplied using logic circuits such as the ones discussed above. We will discuss the Karatsuba algorithm when we discuss integer factoring and Shor's algorithm in a later post. In addition to integers, other data structures such as characters, floating-point numbers, images, file systems, and memory addresses are all ultimately represented using arrays of bits. In many cases, integers themselves are used, such as code points for ASCII and Unicode characters and RGB values for images. Often various forms of encoding (such as UTF-8) are used to convert the data structures into bit arrays when writing data out to files or transmitting data across a network. Bit operations play a crucial role in manipulating these data structures. In fact, much of modern cryptography relies on bit operations and integer arithmetic. For example, the "one-time pad" is a provably secure scheme for one-time encryption of a given string of characters, and it is based on the XOR operation, as the sketch below illustrates.
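As a toy illustration of that last point (my own sketch, not from the post), the one-time pad is just a byte-wise XOR with a random key that is as long as the message and never reused:

import secrets

message = b"HELLO QUBITS"
key = secrets.token_bytes(len(message))                     # fresh random one-time key
ciphertext = bytes(m ^ k for m, k in zip(message, key))
recovered = bytes(c ^ k for c, k in zip(ciphertext, key))   # XORing twice restores the message
assert recovered == message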

Logic circuits are fundamental for modern computing. It turns out that the circuit model of computing extends to quantum computing as well and is essential for implementing quantum algorithms. In fact, classical circuits can be extended to reversible quantum circuits and are used in all the famous quantum algorithms. We will see how quantum circuits are built and used in the next post on quantum computing.


Qubits

The qubit is a dramatic generalization of the classical notion of a bit. A bit is an abstract representation of a 2-state system of the kind one often encounters in daily life. For example, a light bulb is a 2-state system that can be on or off, a transistor is a 2-state system that can be on or off resulting in current flowing or not, a spinning top is a 2-state system that can spin clockwise or counter-clockwise, and a cat is a 2-state system that is either dead or alive. But there are many 2-state systems studied in physics that have a ghostly nature and follow the strange rules of quantum mechanics. A qubit is similar to a bit in that it is a 2-state system when measured, meaning it can take only one of two possible values 0 or 1. However, when a qubit is not measured, its evolution is governed by the rules of quantum mechanics, which state that it is in a linear superposition of the 0 and 1 states. A system in superposition is in both the 0 and 1 states simultaneously, each with a certain proportion; the squares of these proportions give the probabilities of the measurement outcomes. The following diagram motivates the concept of a qubit.


A bit can have two possible values 0 or 1. These are represented above using two red dots. Now replace the dots with vectors of length 1, where an "up" vector represents 0 and a "down" vector represents 1. Now allow this unit vector to rotate in 3-dimensional space. The endpoints of the vectors will reside on a unit sphere known as the "Bloch sphere". A qubit corresponds to a point on the surface of this sphere. This is the state of the qubit when it is not measured. It turns out that one can express a point on the Bloch sphere as a "complex linear combination" of the up and down vectors. Here complex means a number of the form "a + b i" where "i" is the square root of -1 and a and b are real numbers. The up and down vectors are called "basis states" and the vector representing a point on the unit sphere represents an arbitrary qubit state. Therefore a qubit can be expressed as a complex vector "c |0> + d |1>" where c and d are complex numbers and |0> and |1> represent the "up" and "down" basis vectors respectively. One can prepare a qubit in a certain state. For example, a qubit could start out as the "up" arrow (the |0> state) and then undergo transformations to assume a different state on the Bloch sphere. If the transformations are known, then the qubit state and the associated coefficients c and d are also known. However, the coefficients can never be measured directly. This is the famous quantum indeterminism. The coefficients c and d represent the "private world" of the qubit. When a qubit is measured, it undergoes an irreversible disturbance that causes the state to collapse to |0> or |1> with probabilities given by the absolute squares of the coefficients. If we were to prepare many identical qubits and measure them, we could estimate the probabilities, but we can never measure the "inner state" of a single qubit.
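In symbols (a standard way of writing the above, not notation taken from the original post):

\[
|\psi\rangle = c\,|0\rangle + d\,|1\rangle, \qquad |c|^2 + |d|^2 = 1, \qquad P(0) = |c|^2, \quad P(1) = |d|^2,
\]

with the state collapsing to |0> or |1> according to the outcome.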

There are many 2-state systems in nature that behave in this way and can be represented by qubits. Some well-known examples are currents flowing through superconducting wires at very low temperatures, magnetic spins of charged particles, energy levels of trapped ions, and the polarization modes of photons. To understand qubits in their full generality one has to review the properties of complex numbers and vectors. Most of the literature on quantum information (and there is plenty out there) starts with the Bloch sphere and the most general complex representation of a qubit. To the uninitiated casual reader it can be a little daunting to read at first. However, it turns out that one can understand a lot about qubits using just high school trigonometry by focusing on "real-valued" qubits. In fact, most of the concepts of quantum information can be understood using this simpler flavor of qubits. Some of the most important algorithms in quantum information such as the Deutsch-Jozsa algorithm, the Bernstein-Vazirani algorithm, Simon's algorithm, Quantum Key Distribution (QKD), and Quantum Teleportation can be understood using just real-valued qubits. Moreover, there is a concrete physical realization of real-valued qubits, namely the linear polarization of a photon of light. It provides a perfect illustration of the key aspects of a qubit without requiring an understanding of complex numbers and linear algebra. In fact, when we do discuss complex-valued qubits we will use circular polarization of a light photon to illustrate the more advanced aspects of qubits. Eventually, to understand the crown jewels of quantum computing such as Shor's algorithm, the Quantum Fourier Transform, and Grover's algorithm, we will need to work with complex-valued qubits. But in this post, we will focus only on real-valued qubits and discuss complex-valued qubits when we need them in the next post. Here is a diagram that was drawn by my 17-year-old daughter to help my 14-year-old son understand the basics of trigonometry. It will be essential for our discussion of real-valued qubits.

by Tanvi Adhikari

A real-valued qubit can be represented by a point on the unit circle as shown below. The point (1,0) on the X-axis is labeled |0> and the point (0,1) on the Y-axis is labeled |1>. A qubit is an arbitrary point on the circle whose state can be expressed in terms of the angle the vector makes with the X-axis. The point can also be expressed as a linear combination of |0> and |1> as shown below.




The state of a qubit is always expressed in terms of a measurement basis. Any pair of orthogonal unit vectors can serve as a basis for representing qubits. The vectors represented by |0> and |1> are orthogonal to each other and are together called the "computational basis". The state |0> represents a qubit whose measured value with respect to the computational basis is always 0, and the state |1> represents a qubit whose measured value with respect to the same basis is always 1. When an arbitrary qubit is measured in the computational basis, its value is 0 or 1 with probabilities given by the squares of the coefficients of |0> and |1>, which are respectively the squares of the cosine and sine of the angle the qubit vector makes with the X-axis as shown above. When a measurement is performed, the state of the qubit "collapses" to |0> or |1> with those same probabilities.
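Here is a minimal Python simulation (my own sketch, not from the post) of this measurement rule for a real-valued qubit cos(theta)|0> + sin(theta)|1>:

import math, random

def measure(theta):
    # returns 0 with probability cos(theta)^2, else 1; the state then collapses
    return 0 if random.random() < math.cos(theta) ** 2 else 1

theta = math.pi / 6                                         # 30 degrees, so P(0) = 0.75
samples = [measure(theta) for _ in range(100_000)]
print("estimated P(0):", samples.count(0) / len(samples))   # ~0.75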




Since the probability of a measurement outcome in a basis is the square of the linear coefficient of the qubit state with respect to the basis, we get the following conclusions:
  • Measurement of |0> with respect to the computational basis leaves the qubit in the state |0> and produces a value of 0 with 100% probability. The analogous statement is true for the measurement of the qubit state |1> with respect to the computational basis.
  • Measurement of the qubit states |+> = (|0> + |1>)/√2 or |-> = (|0> - |1>)/√2 with respect to the computational basis collapses the state to |0> or |1>, each with probability 1/2 (50%), and the measured value is 0 or 1 accordingly.
  • One can also measure a qubit state with respect to a "rotated basis" such as the +- basis. Measuring the state |0> or |1> with respect to the +- basis will collapse the state to |+> or |->, each with 50% probability. Therefore, even though measuring a state like |0> in the computational basis produces an outcome of 0 with 100% probability, measuring it in a rotated basis disturbs it irreversibly and produces an indeterminate value of + or - with 50% probability each (see the sketch after this list). This is the uncertainty principle in action.
  • This phenomenon of certainties becoming probabilities when measured on a different basis is a crucial aspect of quantum mechanics and plays an important role in security protocols such as quantum key distribution.
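The following Python sketch (my own; the angles used are the standard ones for the computational and +- bases of real-valued qubits) illustrates the last two points: the state |0> always yields 0 in the computational basis, but yields + or - with equal probability in the rotated basis.

import math, random

def measure_in_basis(state_angle, basis_angle):
    # real-valued qubits: P(first basis vector) = cos^2(state_angle - basis_angle)
    return 0 if random.random() < math.cos(state_angle - basis_angle) ** 2 else 1

state_zero = 0.0                # the |0> state
plus_minus = math.pi / 4        # the +- basis is the computational basis rotated by 45 degrees

comp = [measure_in_basis(state_zero, 0.0) for _ in range(10_000)]
rot = [measure_in_basis(state_zero, plus_minus) for _ in range(10_000)]
print("computational basis, fraction of 0 outcomes:", comp.count(0) / len(comp))   # 1.0
print("+- basis, fraction of '+' outcomes:", rot.count(0) / len(rot))              # ~0.5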

Photon Polarization

Linear polarization of light provides a perfect illustration of qubit states. The wave-particle duality of light (and matter) is a fundamental principle of quantum mechanics. Light consists of electromagnetic waves. The polarization of light refers to the direction of oscillation of the electric and magnetic fields in the plane perpendicular to the direction of propagation of the light wave (represented by the wave vector). A horizontally polarized light wave has its electric field oscillating along the X-axis (and the magnetic field along the Y-axis) of a chosen X-Y coordinate system perpendicular to the wave vector. Similarly, a vertically polarized light wave has its electric field oscillating along the Y-axis (and the magnetic field along the X-axis) with respect to the same coordinate system. A horizontal polarizer will allow only horizontally polarized light through and block vertically polarized light, and vice versa. This can be demonstrated by putting a vertical polarizer behind a horizontal polarizer and sending in a light beam. No light will go through, because the light coming out of the horizontal polarizer is horizontally polarized, which is then blocked by the vertical polarizer. So far so good. But things get interesting if you place a third polarizer between the horizontal and vertical polarizers, parallel to them but with its axis rotated at an angle to the X-axis. While one would expect the light to still be blocked, it turns out that a portion of the light now makes it through. The fraction of the horizontally polarized light transmitted by the inclined polarizer is given by the square of the cosine of the angle between the inclined polarizer's axis and the X-axis; the fraction it blocks is the square of the sine of that angle.


But light also consists of quanta of energy known as photons. The intensity of light is proportional to the number of photons passing through a perpendicular surface area. One can then explain the transmission or blocking of light by the inclined polarizer in terms of the probability that a photon is allowed through or the probability that it is blocked. A photon that is allowed through is said to be polarized in the inclined direction, and a photon that is blocked can be thought of as polarized in the direction perpendicular to the axis of inclination. Since the intensity of light that goes through is proportional to the square of the cosine of the angle between the direction of polarization and the X-axis, it follows that the probability of a photon going through is the square of the cosine of the angle of inclination and the probability of it being blocked is the square of the sine of that angle. This provides us with some evidence that the polarization mode of a photon could be represented by a qubit. The quantum theory of the photon posits that the polarization mode of light is based on the spin of the photon (which is a boson), and it is in fact a qubit. Quantum security protocols such as quantum key distribution (QKD) make use of photonic qubits to securely share cryptographic keys.
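Here is a toy photon-by-photon simulation (my own sketch) of the three-polarizer experiment described earlier: each photon passes a polarizer with probability cos^2 of the angle between its polarization and the polarizer axis, and if it passes, its polarization collapses onto that axis.

import math, random

def passes(photon_angle, polarizer_angle):
    return random.random() < math.cos(photon_angle - polarizer_angle) ** 2

theta = math.radians(30)                # inclination of the middle polarizer
n, through = 100_000, 0
for _ in range(n):
    angle = 0.0                         # photon has just exited the horizontal polarizer
    if passes(angle, theta):
        angle = theta                   # collapses onto the inclined axis
        if passes(angle, math.pi / 2):  # now meets the vertical polarizer
            through += 1
print("fraction through all three polarizers:", through / n)
print("cos^2(theta) * sin^2(theta) prediction:", math.cos(theta) ** 2 * math.sin(theta) ** 2)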

Entanglement

Arguably the most intriguing aspect of qubits is their ability to get entangled with each other. When two qubits are independent and don't interact with each other, their combined state can be expressed in terms of the individual states as a tensor product of the two states. This is shown below. The computational basis for a two-qubit system is just the set of all 2-bit strings: (|00>, |01>, |10>, |11>). The most general state of a 2-qubit system is a linear superposition of these basis states. As shown below, this includes states like the Bell states that cannot be separated into a tensor product of individual qubit states.

When two qubits are entangled with each other, just knowing the states of the individual qubits is not sufficient to know the state of the combined pair. The whole is greater than the sum of its parts, and the pair of qubits must be treated as a single system. Moreover, even though the measurement outcome of each of the qubits is non-deterministic, the measurement outcomes are strongly correlated with each other. For example, in the first two Bell states measurement of each qubit will produce a 0 or 1 with 50% probability. But if the measurement outcome of one of the qubits is 0, then the measurement outcome of the other qubit is also 0; and if the measurement outcome of one qubit is 1, then the measurement outcome of the other has to be 1. It appears as though when one of the qubits is measured it instantaneously forces the state of the other qubit to one value or the other, based on the first qubit's measurement outcome. This happens no matter how far apart the individual qubits are in space. This is what Einstein referred to as "spooky action at a distance". Einstein believed that if one qubit could influence the other qubit instantaneously, that would constitute a violation of the special theory of relativity. Therefore, he claimed in the famous EPR paper that quantum theory was an incomplete theory of nature: a complete theory of nature would account for "hidden variables" that explain the correlations between the measurement outcomes of the two qubits. Einstein's objection was based on a philosophical assumption called "local realism". Local realism posits that faraway events can’t influence each other faster than the speed of light (“locality”) and that properties of objects have definite values even if we don’t measure them (“realism”). It turns out that, by Bell's theorem, local realism is incompatible with quantum theory.
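A small Python sketch (mine, not from the post) of these correlations for the Bell state (|00> + |11>)/sqrt(2): each qubit on its own looks like a fair coin, yet the two outcomes always agree.

import random

def measure_bell_pair():
    # sample a joint computational-basis measurement of (|00> + |11>)/sqrt(2)
    outcome = random.choice([0, 1])     # 50/50 between the |00> and |11> branches
    return outcome, outcome             # both qubits yield the same value

results = [measure_bell_pair() for _ in range(10_000)]
print("P(first qubit = 0):", sum(1 for a, _ in results if a == 0) / len(results))             # ~0.5
print("fraction of agreeing outcomes:", sum(1 for a, b in results if a == b) / len(results))  # 1.0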
Entanglement is a fundamental characteristic of quantum mechanics and has been observed in many systems across large distances. For example, the light going through certain types of crystals called "nonlinear crystals" can produce linearly polarized photon pairs whose polarization states are entangled with each other.


In subsequent posts, we will discuss Bell's theorem, quantum communication, quantum gates, quantum circuits, and their application to quantum computation.


Thursday, January 14, 2021

Quantum entanglement and its applications: Part 1


Reinhold Bertlmann
Credit: AB1927, Public domain, via Wikimedia Commons

Unlike classical physics and general relativity, which deal with the deterministic evolution of physical variables such as position and momentum, quantum mechanics deals with an abstract entity called the state vector. In general, the state vector resides in an infinite-dimensional complex Hilbert space. However, in the world of quantum information and quantum computing one deals mostly with state vectors that are finite-dimensional. For example, the spin of a particle, the direction of a superconducting current, or the energy state of a trapped ion has a state vector that is simply a vector residing in a 2-dimensional complex Hilbert (inner product) space. By the Born rule (or Born postulate), the probability of an outcome when measuring a physical variable (for example spin) is given by the squared magnitude of the component of the state vector along the basis vector corresponding to that outcome. The evolution of the state vector can be represented by a 2 x 2 complex unitary matrix: in the absence of measurement, subjecting the system to conservative force fields simply transforms the state vector by a suitable unitary 2 x 2 matrix. The transformation of a state vector by such a unitary matrix is called a quantum logic gate and can be represented graphically.

One of the most unique and intriguing aspects of quantum mechanics is the phenomenon of entanglement. It deals with non-local correlations between measurements of complementary observables (such as position and momentum, or spin directions) performed on parts of a system that are physically separated by a "large" distance. In the language of state vectors, an entangled state is simply an indecomposable vector in the tensor product of two complex Hilbert spaces. The phenomenon of entanglement was first discussed by Einstein, Podolsky, and Rosen in the famous EPR paper, with the clearly stated goal of demonstrating the incompleteness of quantum mechanics as a theory of physical reality. EPR demonstrated that quantum mechanics had non-local effects, anathema to Einstein as it seemed to violate special relativity. Actually, EPR only showed that quantum mechanics implies non-local correlations between measurements, but such correlations are so counter-intuitive that they seemed to imply that there was more to quantum mechanics than the Copenhagen interpretation. Little did Einstein know that John Bell would later show that non-locality is an essential component of quantum mechanics. Bell showed that any local hidden variable theory would have to satisfy an inequality (known as Bell's inequality), which quantum mechanics does not satisfy. Bell wrote a wonderful paper called "Bertlmann's Socks and the Nature of Reality", explaining the crux of the EPR paradox and Bell's inequality.
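As a small concrete illustration (my own numpy sketch, not part of the presentation), here is a 2-dimensional state vector, a 2 x 2 unitary acting on it, and Born-rule probabilities read off from the squared magnitudes of the components:

import numpy as np

ket0 = np.array([1, 0], dtype=complex)                        # the "spin up" basis state

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)   # Hadamard, a standard 2x2 unitary
assert np.allclose(H.conj().T @ H, np.eye(2))                 # unitarity check

psi = H @ ket0                                                # unitary evolution of the state vector
print("state vector:", psi)                                   # [0.707..., 0.707...]
print("Born-rule probabilities:", np.abs(psi) ** 2)           # [0.5, 0.5]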

Bertlmann and his socks
Credit: AB1927, Public domain, via Wikimedia Commons

Since then many people have recast the EPR paradox and Bell's discovery into different formats to convey the non-classical, counter-intuitive (Mermin's "Local Reality Machine"), and computationally powerful (CHSH Nonlocal game) nature of entanglement. Today, entanglement forms the foundation of modern quantum information theory and has applications to cybersecurity via schemes like quantum key distribution and quantum cryptography. The following presentation tries to give a flavor of the history and applications of quantum entanglement. I gave this presentation in an evening class on quantum computing that I took at the UW Physics department. 

In future posts, I hope to explain each of the slides on quantum entanglement (including Bell's paper on "Bertlmann's socks") in simple terms. Stay tuned.

Sunday, January 3, 2021

Penrose's work on singularities: Part 2

 



In Part 1 of this article, we saw that by 1965 there was theoretical evidence that a sufficiently massive star that is spherically symmetric would collapse to a black hole after it has spent all its nuclear fuel. Since General Relativity postulates that spacetime is curved by the presence of matter and energy, it is expected that there would be a severe distortion of spacetime when matter is compressed to a high density. The Oppenheimer-Snyder model of gravitational collapse showed that for a spherically symmetric, homogeneous, initially static distribution of dust (with no internal pressure) of sufficiently large mass, there is no known mechanism to prevent the compression of the matter to infinite density. This would result in the formation of a black hole with an event horizon at a radius of 2GM/c^2 (the Schwarzschild radius) and a singularity behind the event horizon. It is customary to rescale the dimensions so that G/c^2 = 1, so the Schwarzschild radius is simply 2M. The singularity represents a point of infinite curvature and also a point where all future-directed paths and light rays come to an end. In 1965 Roger Penrose proved that even if we did not make any assumptions of spherical symmetry, the geometrical constraints imposed by a very strong gravitational field would inevitably result in spacetime singularities.
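As a sense of scale (a back-of-the-envelope sketch using standard constants, not part of the original post), the Schwarzschild radius 2GM/c^2 works out to about 3 km for the Sun and about 9 mm for the Earth:

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8            # speed of light, m/s

def schwarzschild_radius(mass_kg):
    return 2 * G * mass_kg / c ** 2

print("Sun:  ", schwarzschild_radius(1.989e30), "m")   # ~2950 m
print("Earth:", schwarzschild_radius(5.972e24), "m")   # ~0.009 m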

As discussed in Part 1, Roger Penrose formulated his groundbreaking theorem in terms of five assumptions, which he showed are collectively inconsistent. Four of them are reasonable "niceness" conditions on spacetime, namely a) the "Past" and "Future" condition, which states that there exists a consistent definition of past and future everywhere; b) the "Null Completeness" condition, which states that all paths built out of light rays can be extended indefinitely at all points and in all directions; c) the "Cauchy Hypersurface" condition, which assumes the existence of a non-compact connected Cauchy hypersurface (to be defined below); and d) the "Null Energy" condition, which states that the local energy at any point is always non-negative. The fifth condition is the only one that pertains to the conditions expected near a black hole, where gravity is so strong that all light rays are bent towards each other. Penrose's fifth assumption is e) the "Trapped Surface" condition, which states that there exists a 2-dimensional compact surface (like a sphere) for which all light rays emanating from the surface are bent towards each other.

Understanding the statement of the theorem and its proof entails absorbing a fair amount of geometry, topology, and terminology associated with the causal theory of spacetime and general relativity. In this article, I will attempt to explain the concepts in the simplest possible way without appealing to all the jargon that one would typically encounter in a rigorous exposition of the topic. We start with the causal theory of spacetime, which is fundamental to the entire subject.

Local Causal Structure of Spacetime

The origin of the causal theory of spacetime lies in Minkowski's reformulation of Einstein's special theory of relativity in terms of a four-dimensional spacetime. In special relativity, the speed of light occupies a very special place. Nothing can travel faster than the speed of light,  a universal constant independent of any inertial observer's frame of reference. Points in the Minkowski model are events whose separation is measured by the Lorentz metric, a quantity that is invariant under Lorentz transformations. The causal theory of what events can influence other events and the domain of influence is a feature unique to the special theory of relativity and is not present in classical Newtonian mechanics.

A fundamental geometrical object when studying causal theory is the "light cone". Imagine lighting a candle at a point on Earth. If you ignore the effects of gravity, light from the candle will spread out radially in all directions in straight lines. These straight lines will sweep out an expanding sphere in 3-dimensional space. Now if we suppress one of the spatial dimensions (say the Z-axis), we can visualize this as an expanding circle. If we further choose the vertical axis to represent time (because we cannot really visualize a 4-dimensional object!), then we will see that as time progresses vertically, the wavefront of light spreads farther and farther, sweeping out the surface of a cone. A similar cone can be envisioned going back in time.




The points in the interior and on the surface of the upper cone represent the causal future of a point (event) at the origin (at time t=0). Every event in the interior of the cone can be influenced by an object or signal traveling at a speed strictly less than the speed of light. Therefore, the interior of the cone is called the chronological future of the point at the origin. Points on the surface of the cone represent the boundary of the causal future. They are events that can only be influenced by signals traveling at the speed of light. The curves traced by such signals are called null geodesics, and the surface of the cone swept out by the null geodesics (light signals) is called a null hypersurface.



Minkowski spacetime is a "flat" manifold, meaning it has no curvature. It represents an idealized condition where one is not subject to any force or influence of any kind and all entities are moving at uniform speeds relative to each other (inertial frames of reference). For example, the spacetime very far away from any star can be approximated by Minkowski spacetime. In General Relativity, spacetime is curved by the presence of matter and energy. To quote John Wheeler, "matter tells spacetime how to curve, and spacetime tells matter how to move". The fundamental postulate of General Relativity known as the "Principle of Equivalence" states that near a point in spacetime the gravitational forces can be canceled ("transformed away by a coordinate change") by moving to a freely falling frame of reference. For example, if I am in a space station above the Earth I will experience a gravitational pull from the Earth. But if I step into a spacecraft and start freely falling in Earth's gravitational field, I will experience weightlessness. In geometric terms, this amounts to carrying out a coordinate transformation to a coordinate system in which, in a small neighborhood, spacetime looks and feels like Minkowski spacetime. Thus the Principle of Equivalence implies that as a manifold, spacetime is locally Minkowski at every point. So at every point, the local causal structure can be represented by a light cone. Depending on how spacetime curves due to the presence of matter, the light cones at different points might be pointing in different directions.

Timelike Curves


As a particle moves in spacetime, its trajectory in spacetime is represented by a worldline (also known as a timelike curve). At each point of that curve is a lightcone which represents the boundary of all the different spacetime directions in which a signal can travel. If the particle is a photon (quantum of light), then the curve is built out of light rays, and in that case, the lightcone is tangential to the curve at every point in spacetime. Such a curve is called a lightlike or null curve. 



Lightlike (null) curves

So the concepts of timelike geodesics, null geodesics, chronological future, causal future, and the null hypersurface built out of the boundary carry over verbatim to curved spacetimes (also known as Lorentz manifolds). An essential ingredient for this causal analysis is assumption a) of Penrose's theorem, namely the "Past" and "Future" assumption. It is essential that there be a consistent way to define past and future across the spacetime manifold to avoid pathologies.
For Penrose's theorem, it is important to consider not just the causal future of a point in spacetime, but the causal future of an entire "spacelike" surface in spacetime. Spacelike simply means a slice of spacetime at a particular choice of time chosen uniformly across all points (the fact that you can do this consistently is an assumption known as time orientability). In other words, a spacelike surface is just a region of space at a particular time. The spatial slice itself need not be flat; it can be curved.


The chronological, causal, and null future of a set satisfy some easily provable topological properties. The chronological future is an open set, meaning that for every event in the chronological future, you can find a "ball" of neighboring events in spacetime that also lies in the chronological future. Similarly, for any point on the boundary, every ball of neighboring events contains an event that is in the chronological future (the interior of the causal future). Moreover, the boundary, if nonempty, is a closed 3-dimensional achronal C^0 submanifold of the 4-dimensional spacetime. Achronal means that no two points of the boundary can be joined by a timelike curve (the worldline of an object traveling at a speed smaller than light speed). C^0 submanifold means that each point on the boundary of the causal future has a 3-dimensional neighborhood within the boundary that is topologically equivalent to an open ball in R^3 (Euclidean 3-space). In general, the boundary of the causal future will not be a smooth manifold, as is evident from the light cone and from the boundary of the causal future of a disconnected set.
 



The C^0 (topological) manifold structure is obtained by taking the so-called Riemann normal coordinates of 4-dimensional spacetime around any point p on the boundary of a causal future. For a sufficiently small neighborhood, one can choose one of the coordinates to be timelike (since the neighborhood can be chosen to be approximately Minkowski). The integral curves of the tangent vector of this coordinate will intersect the boundary in exactly one point, because the boundary is achronal (no two points are joined by timelike curves). So the remaining 3 coordinates can be used to define a homeomorphism to R^3.



Such considerations will become important when we discuss Cauchy surfaces, trapped surfaces, and the proof of the Penrose theorem.

Global Causal Structure of Spacetime

Penrose's theorem relies heavily on certain global assumptions about spacetime. Locally it is fairly clear what is happening topologically in spacetime, given that it is a Lorentz manifold by the General Theory of Relativity. But when you stitch together these locally Minkowski spacetime neighborhoods, the resulting spacetime could have all sorts of pathological features. Yet when we look around with our telescopes we don't see any pathologies in spacetime. It is therefore important to assume that the spacetime starts out being nice and smooth and then determine what happens when gravity becomes too strong. For example, an "asymptotically flat" spacetime consists of a 3-dimensional space that extends out to infinity, where the gravitational field (aka the curvature of spacetime) becomes negligible far away from the source of the field (typically a massive object such as a star). Near the source spacetime is curved, but far away from the source spacetime is almost "flat" (hence the name "asymptotically flat").

Asymptotically flat spacetime

The Cauchy Hypersurface condition is a global niceness condition. It states that there is an initial connected (not broken up) smooth 3-dimensional space, spread out infinitely, from which all of the spacetime can be developed in a well-defined fashion. In fact, the assumption is that the entire spacetime can be "built" out of slices of space at each instant of time. Asymptotically flat spacetime is a perfect example of a spacetime satisfying the Cauchy Hypersurface condition.



Cauchy Surfaces


A spacetime that satisfies the Cauchy surface condition has some nice properties. In fact, the technical definition of a Cauchy surface is a surface with the property that every inextendible timelike curve (a curve that always points into the chronological future) intersects it exactly once. It turns out (from the work of Choquet-Bruhat and Geroch) that spacetimes can be built smoothly from Cauchy surfaces. In addition, such spacetimes are also known to be "globally hyperbolic". Without getting too technical, this simply means that in such a spacetime you cannot go back in time (there are no closed timelike curves) and that there are no "holes" or gaps in the spacetime (the intersection of the causal future of an event p and the causal past of another event q that lies in the causal future of p is compact). It turns out that all Cauchy surfaces of a given spacetime are topologically equivalent (homeomorphic to each other).

An intuitively obvious but crucial consequence of the Cauchy surface condition is that every point on the trajectory of a light curve in spacetime can be traced back to a point on the Cauchy surface using a timelike curve. 


In this picture, you have two Cauchy hypersurfaces and a light signal that goes from event P1 in one Cauchy surface to event P2 in the other. But the point P2 is also the evolution of a point that is the intersection of the perpendicular timelike curve with the first Cauchy surface. The timelike curves that are perpendicular to each Cauchy surface define a homeomorphism (1-1 topological equivalence) between the two Cauchy surfaces. We saw earlier that the boundary of the causal future of a spacelike surface is an achronal C^0 manifold generated by null geodesics (light rays as above). The timelike curves coming down from a point P2 on the boundary to the Cauchy surface Sigma_1 map open sets to open sets, so the map is a homeomorphism onto its image. This fact will become important in the proof of Penrose's theorem.

Raychaudhuri's focusing equation

Until now we have not really discussed the effect of gravity on light rays and the curvature of spacetime. The first and most famous verification of the General Theory of relativity was the observation of the bending of light during a solar eclipse by Arthur Eddington in May 1919. The phenomenon of gravitational lensing is well known today. Light rays emanating from stars behind a massive object (like a black hole) will be bent when they pass near the massive object. 

Gravitational lensing (Credit: ESA/Hubble & NASA)
 



Raychaudhuri was the first to study the implications of Einstein's equations for the collective behavior of families of geodesics in spacetime (such as families of light rays or families of trajectories of particles). He showed that since gravity is an attractive force, neighboring geodesics are bent towards each other and will eventually intersect. The intersection of infinitesimally close neighboring geodesics has a very important consequence. Such intersection points are known as focal points or conjugate points. They have been studied extensively in the context of Riemannian differential geometry, where they usually have important consequences for the global geometry of surfaces. Penrose and Hawking were the first to study them in the context of relativity and spacetime.




A familiar situation in ordinary Riemannian geometry where geodesics intersect is the example of a sphere. 


Credit: Hawking-Penrose


If a geodesic in Riemannian geometry contains a conjugate point, then it cannot be a length-minimizing geodesic, as seen on the sphere. If a great circle from a point p to a point q encounters a conjugate point r before reaching q, then it is not the shortest path between p and q, because there is a shorter great-circle arc that joins p and q directly.
There are analogous implications of the existence of conjugate points for timelike and null geodesics in spacetime. The Raychaudhuri equation helps determine the conditions under which geodesics will encounter conjugate points. This brings us to assumption d) of Penrose's theorem - the "Null Energy" condition. The null energy condition (which is implied by the weak energy condition) states that the local energy at any point is always non-negative. From Einstein's field equations it turns out that the null energy condition is equivalent to the last term on the right-hand side of the Raychaudhuri equation being non-negative. This means that the entire right-hand side of the equation is bounded below by the square of the convergence. By solving the resulting Raychaudhuri inequality (ignoring the shear term and the energy term, both of which are non-negative), one can show that the convergence factor is bounded below by a function of the affine parameter that depends on the initial convergence factor and the initial parameter value.

Now by itself, the Raychaudhuri equation does not imply that all null geodesics will have conjugate points. After all, light rays tend to spread out, so the initial convergence factor is usually negative. If you light a candle or if a star explodes in a large flash of light, the light rays will expand out spherically. If the initial convergence factor is negative, then even if gravity tries to focus the light rays back, it may not be enough to make them meet. That is, unless gravity is so strong that the convergence factor starts out being positive. It seems counter-intuitive, but that is exactly what happens with a trapped surface. If the initial convergence factor is positive, then in a finite amount of affine parameter the convergence factor will blow up to infinity, meaning one will encounter a focal point.
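For the mathematically inclined, here is one standard way to write this focusing argument (sign conventions vary between references, so treat it as a sketch rather than the exact form Penrose used). For a hypersurface-orthogonal congruence of null geodesics with affine parameter \lambda, expansion \theta (the convergence discussed above is essentially -\theta/2), shear \sigma_{ab}, and tangent vector k^a, the Raychaudhuri equation reads

\[
\frac{d\theta}{d\lambda} = -\tfrac{1}{2}\theta^2 - \sigma_{ab}\sigma^{ab} - R_{ab}k^a k^b .
\]

The null energy condition, via Einstein's equations, gives R_{ab}k^a k^b >= 0, so

\[
\frac{d\theta}{d\lambda} \le -\tfrac{1}{2}\theta^2 \quad\Longrightarrow\quad \frac{1}{\theta(\lambda)} \ge \frac{1}{\theta_0} + \frac{\lambda}{2}.
\]

If the light rays start out converging (\theta_0 < 0), then \theta \to -\infty within affine parameter \lambda \le 2/|\theta_0|, which signals a focal point.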


Closed Trapped Surfaces

The final assumption of Penrose's theorem, which is based on the fact that you are in a situation of very strong gravity is e) The "Trapped Surface" condition. The following two diagrams will illustrate the contrast between a "normal" 2-dimensional surface in spacetime and a "trapped" 2-dimensional surface.




Credit: Hawking-Penrose


With a closed trapped surface, you have a compact 2-dimensional surface (with no boundary), such as a sphere, from which the light cones at the surface are always tipped inwards. This is the same thing as saying that the initial convergence factor is positive (or equivalently that the null expansion factor is negative). Since the Raychaudhuri equation shows that the convergence factor for light rays emanating from the trapped surface is bounded below by the initial convergence factor, the light rays will always have a positive convergence factor. If light rays (or null geodesics) can be extended indefinitely, which is assumption b) of Penrose's theorem, then each geodesic will have to encounter a focal point after a finite amount of affine parameter. The terms caustics, conjugate points, and focal points are all used in the literature to describe the same concept.
There has been a tremendous amount of research on the conditions under which trapped surfaces will form. The most obvious example is a spherically symmetric black hole once the event horizon has formed: compact spacelike 2-surfaces behind the event horizon are trapped surfaces. Moreover, there are theorems showing that even in the absence of spherical symmetry, trapped surfaces will still form during gravitational collapse or under conditions of sufficiently strong gravity.


Putting it all together - the punchline

To summarize, we know from the local causal theory of spacetime that the boundary of the chronological future of a spacelike surface is generated by null geodesics (think of the light cone in Minkowski space, whose surface consists of light rays). Think of the surface of a star that is undergoing a supernova explosion. The particles emanating from the surface move into the chronological future of the star, and the light rays generate the boundary of that chronological future in spacetime. In particular, the boundary of the chronological future of a trapped surface is generated by null geodesics. What is different about a trapped surface, as opposed to any other surface (such as the surface of a star), is that the light rays all start focusing towards each other. If you let the light rays travel indefinitely, they will have to intersect their infinitesimal neighbors at some point, so there will be a focal point on each geodesic. Penrose proves that this contradicts the Cauchy surface condition. The easiest way to visualize the proof is to examine the following diagrams. They are due to Penrose and are taken from his 2020 Nobel lecture.







Penrose considers the null geodesics normal to the trapped surface and shows that they converge. A key fact is that once a null geodesic on the boundary passes a focal point, the points beyond the focal point can be reached from the trapped surface by timelike curves. Hence the geodesic leaves the boundary and enters the interior of the chronological future (see the image below).





The proof of this fact is a bit subtle (see Hawking-Ellis Proposition 4.5.12 or Witten Section 5.2). The heuristic argument is that if a null geodesic \gamma from p to r contains a focal point at q, then there is an infinitesimally nearby geodesic that also joins p and q. The neighboring geodesic plus the segment qr has a "kink" at q, which means that this new curve is not a geodesic, even though it has essentially the same length as \gamma. By smoothing out the kink we can deform it into a timelike curve from p to r. This implies that r lies in the chronological future of p, i.e., in the interior rather than on the boundary.

The fact that the null generators of the boundary leave the boundary and enter the interior after a finite amount of time implies that the boundary itself has to have a finite extent. In other words, the null boundary of the chronological future of the trapped surface is compact as can be seen in Penrose's diagrams above.  Now Penrose claims that this contradicts the Cauchy hypersurface (global hyperbolicity) condition.  Compare Penrose's conical diagram above with the Cauchy hypersurface diagram below.

Credit: Wald GR

  
The light ray from P1 to P2 is one of the generators of the cones shown above. It turns out that the null boundary of the causal future of any surface can be mapped down to the Cauchy surface using timelike curves that are orthogonal to the Cauchy surface. If the null geodesics could be extended indefinitely, they would form a hypersurface that is topologically equivalent to the Cauchy surface; the null boundary of the chronological future would then have to look like a Cauchy hypersurface. But that is not possible if the null boundary "closes up" on itself. The technical way to state this is that the focusing of null geodesics results in the null boundary of the causal future of the trapped surface being compact, whereas the original assumption was that the Cauchy surface is non-compact (meaning it extends out to infinity).


Mapping of future null boundary to Cauchy surface 
(Credit: Sayan Kar, IIT KGP)


So the boundary curling up on itself to form a compact hypersurface cannot happen if the initial Cauchy hypersurface is non-compact. This means that the null geodesics generating the boundary of the causal future of a trapped surface cannot all be extended indefinitely; some of them must stop after a finite affine parameter. This is incompleteness. The light rays are moving in the direction of a singularity, but will never reach it. For example, the squiggly line below represents r=0, which cannot be reached by the light rays. The incompleteness (the presence of a singularity) means you can imagine slicing the surface along the singularity and spreading the null boundary out so that it maps onto the initial Cauchy hypersurface.

Credit: Wald GR

The technical proof arrives at a contradiction by showing that the image of the null boundary under the timelike mapping to the initial Cauchy hypersurface is both compact and open. Being open and closed, the image is the entire Cauchy hypersurface, since the latter is connected. But that is a contradiction, because the Cauchy hypersurface is non-compact. Again, the perfect example of a non-compact Cauchy hypersurface is the asymptotically flat spacetime surrounding an object such as a star.

Closing Remarks

Penrose's result was a turning point in the study of collapsed objects and the subsequent work by him and Hawking started a revival of interest in General Relativity. The Hawking-Penrose singularity theorems represent a landmark in the history of General Relativity. The developments spurred by their work would have possibly shocked Albert Einstein, the discoverer of relativity who always believed that singularities were a mathematical anomaly. But the exciting aspect of singularities is that they provide a hint of new physics that is yet to be developed. The study of black holes and singularities is an active area of research in theoretical physics, astrophysics, and mathematics.
Any discussion of black holes and singularities would be incomplete without a discussion of Penrose's cosmic censorship conjecture. Since Penrose's (and Hawking's) results show that singularities are inevitable when gravity is very strong, why is it that we don't encounter or observe singularities in the universe? Remember, all the evidence for black holes is about dark supermassive objects that exert enormous gravitational influence on their neighborhood. All known models of black holes have an event horizon. The images taken of the region around black holes show either gravitational lensing of light from stars behind the black hole, light orbiting the black hole near the event horizon (the photon sphere), or the accretion disk, a region near the event horizon where matter spirals into the black hole.


  
It would be rather disconcerting if there were singularities just lying about in spacetime, but their invisibility led Penrose to make this conjecture.

Weak Cosmic Censorship Hypothesis: Nature abhors a naked singularity.

In other words, even though singularities are inevitable in general relativity, they are always hidden behind event horizons. No observer from outside (at "Null Infinity") can see a singularity. It turns out that this allows one to develop a nice theory of black holes since a lot of physics can be done without worrying about the singularities. Proving or disproving the cosmic censorship hypothesis is one of the central problems of mathematical general relativity.





Thursday, December 31, 2020

Penrose's Work on Singularities: Part 1


Image of a black hole (2019)
Credits: Event Horizon Telescope collaboration et al.


The most beautiful thing we can experience is the mysterious. It is the source of all true art and science.  - Albert Einstein      


The year of black holes

For most people, 2020 will be remembered as the year that a pandemic raged across the globe, killing hundreds of thousands and disrupting many lives. It was the year of social and political upheaval culminating in a contentious US election. But it was also the year that black hole research received the recognition that it deserved. The 2020 Nobel Prize in Physics was awarded to Sir Roger Penrose, Andrea Ghez, and Reinhard Genzel for their pioneering work on black holes. In announcing the prize, the Nobel committee stated that half of it was awarded to Roger Penrose "for the discovery that black hole formation is a robust prediction of the general theory of relativity". This post will try to unpack that statement and give the uninitiated reader an intuitive feeling for Penrose's remarkable work.

The discovery in question refers to a short 2-1/2 page paper entitled "Gravitational Collapse and Space-Time Singularities" that Penrose published in January of 1965 in the journal Physical Review Letters. In this paper, Penrose provided a rigorous mathematical proof that under certain conditions the formation of a singularity in space-time is unavoidable. What is a singularity, and what does it have to do with black holes? A singularity is a place and time where something really "bad" happens. It could be something like the curvature of space-time "blowing up" to infinity or a "tear in the very fabric of space-time". For example, a sufficiently massive object could collapse under its own gravitational force, and if there is nothing to resist the collapse it could distort space-time so badly that its curvature ends up becoming infinite. But such descriptions are somewhat misleading. What a singularity really signals is a breakdown in the physical theory, and that a broader theory is needed to explain what is going on. Singularities don't have to be associated with points of infinite curvature. For example, in the case of the Big Bang, things seem to come out of nowhere, meaning particles don't have a history beyond a certain point of time in the past. Similarly, in the interior of a black hole, particles or light rays could reach a point beyond which spacetime simply ceases to exist.


Penrose Lecturing on the Big Bang in Berlin, 2015

Incidentally, any mention of Penrose's work on singularities would be incomplete without mention of Stephen Hawking. As portrayed in the movie "The Theory of Everything", Hawking was a young graduate student at Cambridge when Penrose announced his discovery on singularities. Hawking immediately understood the significance of Penrose's work and applied it to cosmology and the Big Bang. By essentially reversing the time direction of Penrose's argument, Hawking was able to prove that there had to be a singularity at the time of the Big Bang (the birth of space and time!). Both Penrose and Hawking were awarded the prestigious Adams Prize in 1966 for their research. They then went on to collaborate and publish a series of singularity theorems that are now collectively known as the Hawking-Penrose theorems. Hawking became an iconic figure in science who overcame a debilitating disease (ALS) to make groundbreaking discoveries in physics. It is unfortunate that Hawking died in 2018; otherwise he might well have shared the Nobel Prize with Penrose.

In the next couple of sections, we will discuss the history of black hole and singularity research prior to Penrose's 1965 paper.

The Schwarzschild Singularities

In 1915 Albert Einstein made history when he presented his General Theory of Relativity to the Prussian Academy of Sciences. Newspapers around the world hailed the discovery as the most important since Newton, and Einstein became a household name. It heralded the dawn of a new era in science, with a new and transformed understanding of the universe. Just one month after the publication of his results on space, time, and gravitation, Einstein was stunned to receive a postcard from a lieutenant in the German army containing the first-ever exact solution to his field equations of gravitation. Einstein's equations are highly non-linear differential equations and notoriously difficult to solve; Einstein himself had only been able to supply an approximate solution, in the context of the planetary motion of Mercury. But here was a postcard from someone posted at the Russian front that said, "As you see, the war treated me kindly enough, in spite of the heavy gunfire, to allow me to get away from it all and take this walk in the land of your ideas." Karl Schwarzschild was the Director of the Astrophysical Observatory in Potsdam, but as a patriot he had decided to join the army and fight in the war. During breaks from the fighting on the Russian front, he managed to find the time not only to read Einstein's latest papers but also to solve Einstein's equations for the space-time surrounding a spherically symmetric, non-rotating, uncharged body. Einstein was impressed and replied, "I have read your paper with the utmost interest. I had not expected that one could formulate the exact solution of the problem in such a simple way. I liked very much your mathematical treatment of the subject. Next Thursday I shall present the work to the Academy with a few words of explanation."

Schwarzschild's postcard to Einstein and his metric

Elegant and beautiful as Schwarzschild's solution was, it had a problem. There were two singularities in it: one at the center of the body at radius r=0, and one at the radius r=2GM/c^2, known today as the gravitational radius (or Schwarzschild radius). Here G is Newton's gravitational constant, M is the mass of the body, and c is the speed of light. At these two radii, terms in the solution "blew up", meaning that they shot off to infinity. Einstein did not consider these singularities physically meaningful. In fact, in the spirit of classical electrostatic and Newtonian gravitational potentials, he assumed that the Schwarzschild solution applied only outside the spherical region of radius r=2GM/c^2. It is unclear what he expected to happen inside that sphere, but he considered the singularities mathematical pathologies with no physical meaning. As it turns out, Einstein was wrong to dismiss the singularity at the Schwarzschild radius. Today, the sphere at this radius is known as the "event horizon" of a black hole. Inside this spherical region, nothing can escape the gravitational pull towards the center. The escape velocity exceeds the speed of light, so even light cannot escape, hence the term "black hole". As for the singularity at the Schwarzschild radius itself, Arthur Eddington and David Finkelstein discovered that it was just an artifact of the coordinates Schwarzschild had chosen: it can be "transformed away" by simply choosing a different set of coordinates (now known as the Eddington-Finkelstein coordinates). However, r=0 remained a bona fide singularity that could not be transformed away. That did not worry Eddington or anyone else, since the mass was concentrated at r=0 and the belief was that the field equations simply did not apply there.
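To get a feel for the numbers, here is a minimal Python sketch (my own illustration, not anything from the historical papers) that evaluates the gravitational radius r = 2GM/c^2 discussed above. The function name and the rounded constants are my choices; for the Sun the radius comes out to roughly 3 kilometers, and for the Earth to about 9 millimeters.

```python
# Minimal illustration: Schwarzschild (gravitational) radius r_s = 2*G*M/c^2.
# Constants are rounded textbook values; this is only a back-of-the-envelope sketch.

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # solar mass, kg
M_EARTH = 5.972e24   # Earth mass, kg

def schwarzschild_radius(mass_kg: float) -> float:
    """Return the Schwarzschild radius in meters for a body of the given mass."""
    return 2.0 * G * mass_kg / C**2

if __name__ == "__main__":
    for name, mass in [("Sun", M_SUN), ("Earth", M_EARTH)]:
        print(f"{name}: r_s = {schwarzschild_radius(mass):.3e} m")
    # Sun   -> about 2.95e+03 m (~3 km)
    # Earth -> about 8.87e-03 m (~9 mm)
```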

Collapsing Stars

The first sign of trouble came in 1930, when a 19-year-old Indian astrophysicist named S. Chandrasekhar performed some calculations on the final fate of stars during his sea voyage from India to England. The prevailing wisdom at the time was that a star that had spent all its nuclear fuel would start collapsing, causing outgoing shock waves that would eject its outer shell in a "supernova explosion".

SN2018gv observed by the Hubble telescope

The inner core of such a collapsing star would settle into a stable object known as a white dwarf. It was believed that the white dwarf was prevented from collapsing further by something known as "electron degeneracy pressure". In essence, electron degeneracy pressure is a consequence of the Pauli exclusion principle in quantum mechanics, which states that no two electrons can occupy the same quantum state at the same time. So if you squeeze a collection of cold electrons into a small space, their repulsion due to the Pauli exclusion principle, together with the electrostatic repulsion, results in an outward pressure. The famous astrophysicist Ralph Fowler had shown that electron degeneracy pressure was sufficient to resist the gravitational force and prevent a white dwarf from collapsing into itself. In doing so, however, he had ignored the relativistic motion of the particles. Chandrasekhar produced a "relativistic degeneracy formula" which showed that if the star's mass was greater than about 1.4 solar masses, then electron degeneracy pressure was insufficient to prevent the star from collapsing beyond the white dwarf stage. This implied that the star would keep shrinking and collapsing ad infinitum. This was a startling and highly disconcerting discovery. While most experts, including Fowler, believed that Chandra's results were correct, Arthur Eddington, who was highly influential at that time, reacted with derision. At a conference in 1935, Eddington told his audience that Chandrasekhar's work "was almost a reductio ad absurdum of the relativistic degeneracy formula. Various accidents may intervene to save a star, but I want more protection than that. I think there should be a law of Nature to prevent a star from behaving in this absurd way!" Roger Penrose gave a nice talk on the topic, in which he made the point that even though Chandrasekhar was correct in his calculations, Eddington was also right to believe that something in nature should prevent a star from collapsing indefinitely. Penrose points out that Chandrasekhar (who was of a conservative bent of mind) was careful not to speculate about the eventual state of such an endlessly collapsing star. It is known today that Eddington was wrong about the relativistic degeneracy formula, and by dismissing Chandra's work he may have delayed much-needed research in the area of stellar collapse.


Penrose talking about Chandra


Chandrasekhar was eventually vindicated, and today the mass of 1.4 solar masses is called the Chandrasekhar limit (Chandra also won the Nobel Prize in Physics in 1983 for his work on the structure and evolution of stars). Meanwhile, Walter Baade and Fritz Zwicky proposed the existence of the neutron star just two years after the discovery of the neutron by James Chadwick in 1932. They predicted that a bigger star could squeeze its electrons and protons together to form neutrons, which would generate a "neutron degeneracy pressure" that would resist runaway gravitational collapse in a manner similar to electron degeneracy pressure. In 1939, Oppenheimer and Volkoff calculated an upper bound on the mass of cold, non-rotating neutron stars, analogous to the Chandrasekhar limit for white dwarfs. This is known today as the Tolman-Oppenheimer-Volkoff limit and is estimated to be between 1.5 and 3 solar masses.
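To make the two thresholds concrete, here is a small Python sketch of my own that classifies the plausible fate of a collapsing stellar core by its mass, using the commonly quoted values of about 1.4 solar masses (Chandrasekhar) and about 3 solar masses (taking the upper end of the TOV range). It is of course a caricature of the real astrophysics, which also depends on mass loss, rotation, and composition, but it captures the logic of the limits discussed above.

```python
# A caricature of the mass thresholds discussed above (values in solar masses).
# The cutoffs are the commonly quoted figures, not precise modern values, and the
# classification ignores mass loss, rotation, composition, etc.

CHANDRASEKHAR_LIMIT = 1.4  # above this, electron degeneracy pressure gives way
TOV_UPPER_BOUND = 3.0      # above this, neutron degeneracy pressure gives way

def remnant(core_mass_msun: float) -> str:
    """Rough fate of a collapsing stellar core of the given mass (solar masses)."""
    if core_mass_msun <= CHANDRASEKHAR_LIMIT:
        return "white dwarf (held up by electron degeneracy pressure)"
    if core_mass_msun <= TOV_UPPER_BOUND:
        return "neutron star (held up by neutron degeneracy pressure)"
    return "no known pressure halts the collapse -> continued gravitational collapse"

if __name__ == "__main__":
    for mass in (1.0, 2.0, 5.0):
        print(f"{mass} solar masses: {remnant(mass)}")
```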

But what if the mass was greater than 3 solar masses? Was there a "law of Nature", as Eddington expected, that would prevent a star from collapsing indefinitely? In a 1939 paper entitled "On Continued Gravitational Contraction", Oppenheimer and Snyder showed that a spherically symmetric ball of gas of sufficient mass would necessarily collapse beyond the stage of a neutron star. In their words: "When all thermonuclear sources of energy are exhausted a sufficiently heavy star will collapse. Unless fission due to rotation, the radiation of mass, or the blowing off of mass by radiation, reduce the star's mass to the order of that of the sun, this contraction will continue indefinitely." They further showed that "the radius of the star approaches asymptotically its gravitational radius; light from the surface of the star is progressively reddened and can escape over a progressively narrower range of angles". In essence, they showed that for a spherically symmetric ball of gas, gravitational collapse results in infinite density and the formation of an event horizon in finite time.

The 1965 Paper

The reception of the Oppenheimer-Snyder paper was lukewarm because of the assumptions it made about spherical symmetry. Ironically, at about the same time Einstein published a paper claiming that singularities could never form in General Relativity; his paper contained a mistake. Until Penrose's 1965 paper, there remained a question as to whether objects like black holes and singularities were mathematical pathologies that could never exist in nature. The Russians Khalatnikov and Lifshitz claimed to have proved that singularities could not occur in cosmology. Their paper contained an error that was later corrected in work with Belinski, and in any case it did not constitute a categorical proof that singularities could not occur; Penrose himself was skeptical of their methods. The objection to the assumptions made in prior work is best articulated by Penrose himself in his 1965 paper:

"The question has been raised as to whether this singularity is, in fact, simply a property of the high symmetry assumed. The matter collapses radially inwards to the single point at the center, so that a resulting space-time catastrophe there is perhaps not surprising. Could not the presence of perturbations which destroy the spherical symmetry alter the situation drastically? The recent rotating solution of Kerr also possesses a physical singularity, but since a high degree of symmetry is still present (and the solution is algebraically special), it might again be argued that this is not representative of the general situation. Collapse without assumptions of symmetry will be discussed here."

There are a few notable facts about Penrose's result. 
  • First, Penrose makes a completely general argument, without any assumptions of symmetry. Unlike prior results on black holes and singularities, which relied on explicit solutions of Einstein's equations, Penrose's work makes use of differential topology and global methods in geometry. 
  • Second, Penrose gives a very precise definition of a singularity that is broader than the usual definition based on infinite curvature. Specifically, Penrose uses a concept from differential geometry called "geodesic incompleteness" as a proxy for the presence of singularities. Incompleteness means you cannot go past a certain point in space-time, which indicates a breakdown in the predictability of space-time. 
  • Third, Penrose's result is a negative statement, in the sense that it simply says that under certain reasonable assumptions about space-time and gravitational collapse, space-time has to become incomplete. It says nothing about where exactly completeness breaks down, and it makes no claim about the nature of the singularities that lead to such incompleteness. What the paper does provide is a rigorous mathematical proof that incompleteness is inevitable whenever certain reasonable conditions are met.
With regard to the implications of his result, Penrose makes this intriguing remark: "If, as seems justifiable, actual physical singularities in space-time are not to be permitted to occur, the conclusion would appear inescapable that inside such a collapsing object at least one of the following holds: (a) Negative local energy occurs. (b) Einstein's equations are violated. (c) The space-time manifold is incomplete. (d) The concept of space-time loses its meaning at very high curvature – possibly because of quantum phenomena. In fact (a), (b), (c), (d) are somewhat interrelated, the distinction being partly one of attitude of mind."

Geodesic Incompleteness

Geodesics are the "straightest possible" curves on a surface. For example, a straight line is a geodesic on a flat plane, and great circles (such as the lines of longitude) are geodesics on a sphere. Geodesics are curves of extremal (maximum or minimum) length between two points. Typically one thinks of the shortest path between two points, but in space-time it is more appropriate to consider the time it takes for an object or a signal to travel between two points (events). In General Relativity, freely falling bodies in a gravitational field follow geodesics, and light rays also follow geodesics (so-called null geodesics). They are the analogs of straight lines on the plane.
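For readers who like to see a formula, the standard textbook form of the geodesic equation (not something specific to Penrose's paper) is, in local coordinates x^mu with an affine parameter lambda:

```latex
% Standard geodesic equation; \Gamma^\mu_{\alpha\beta} are the Christoffel
% symbols of the space-time metric and \lambda is an affine parameter.
\frac{d^2 x^\mu}{d\lambda^2}
  + \Gamma^\mu_{\alpha\beta}\,
    \frac{dx^\alpha}{d\lambda}\,\frac{dx^\beta}{d\lambda} = 0
```

Freely falling particles follow time-like solutions of this equation, while light rays follow null solutions.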



Penrose's diagram showing a singularity

A surface or a manifold is said to be geodesically complete if, starting at any point p, you can follow a geodesic indefinitely in any direction. A plane and a sphere are both geodesically complete. But if you remove a point from the plane (a "punctured plane"), you get a geodesically incomplete space: if you follow a straight line heading towards the missing point at a constant speed, then after a finite amount of time you hit the puncture and cannot go any further. You can think of the removed point as the singularity. Of course, the punctured plane is an artificial example, because it sits inside an ambient smooth space, namely the plane, and one can remedy the incompleteness by simply adding the point back. In general that may not be possible: in differential geometry one can define manifolds without embedding them in a larger space. If you follow a geodesic and the path cannot be continued past a certain point, the manifold is said to be geodesically incomplete. For example, a light ray or a spaceship moving towards the center of a Schwarzschild black hole has no future after a finite amount of time, because it encounters the singularity at r=0; time and space literally come to an end there. Penrose has a rather amusing footnote in relation to his reference to space-time incompleteness: "The “I’m all right, Jack” philosophy with regard to singularities would be included under this heading!" "I'm all right, Jack" is a well-known English expression indicating smug and complacent selfishness; it was also the title of a well-known British comedy starring Peter Sellers. Clearly, singularities are not very accommodating when it comes to letting things and signals through. Technically speaking, Penrose's singularity theorem should really be called "Penrose's incompleteness theorem".
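As a toy worked example of incompleteness (my own illustration, not one from Penrose's paper), take the punctured plane described above with its flat metric:

```latex
% The straight line heading into the puncture of \mathbb{R}^2 \setminus \{(0,0)\}:
\gamma(t) = (1 - t,\; 0), \qquad 0 \le t < 1 .
```

This curve is a perfectly good geodesic, but it cannot be continued to t = 1 because the point (0,0) has been removed. The geodesic "runs out" at a finite value of its parameter, so the punctured plane is geodesically incomplete, with the puncture playing the role of the singularity.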

The Singularity Theorem

Penrose's paper is a mathematician's dream to dig into, but sadly for the uninitiated it is daunting to comprehend, as it uses sophisticated mathematical machinery. The physicist Ed Witten joked in a lecture that there is a small set of ideas in the paper which, if understood, would make even the uninitiated an expert. However, Witten's comment was addressed to his colleagues at the Institute for Advanced Study, who could be called anything but uninitiated.

Witten's lecture at IAS on singularities

Regardless of my misgivings about Witten's assessment, I will attempt to convey the key ideas of Penrose's theorem and its proof. Penrose lists five assumptions, each of which is reasonable by itself, but which together lead to a mathematical inconsistency. The five assumptions are as follows:

  1. "Past and Future": Space-time is a smooth manifold with a clear definition of past and future everywhere.
  2. "Null Completeness":  Every path built out of light rays can be extended indefinitely into the future. 
  3. "Cauchy hypersurface condition": The initial state of space, given by the distribution of matter and energy, determines its evolution over time (in relativity, space-time is a dynamical entity that evolves with time). A Cauchy surface is a "nice" initial surface in space-time that can be used to predict the future dynamical evolution of space-time. Penrose makes the crucial assumption that there exists a "non-compact" Cauchy hypersurface, that is, a 3-dimensional surface that extends out to spatial infinity. For example, if matter and energy are concentrated in a localized region, then space will be curved locally but will be surrounded by "flat" space that extends out to infinity.
  4. "Non-negativeness of local energy": The local geometry of space-time is determined by the local distribution of energy, and non-negativity of the energy implies non-negativity of the curvature. This essentially states that gravity is an attractive force. 
  5. "Trapped surface": This is the one and only assumption that is specific to the black hole situation. Trapped surfaces are special closed surfaces with the property that all light rays emerging from them start converging towards each other. Normally, outgoing light rays spread out from the surface of a sphere. But behind an event horizon, gravity is so strong that space itself is shrinking with time; if the shrinking of space outpaces the spreading of the light, then over time the light rays start focusing towards each other. The great advantage that trapped surfaces have over previous approaches to singularities is that they are robust to small perturbations away from spherical symmetry: even if space-time is distorted so that it is not spherically symmetric, trapped surfaces will still form behind the event horizon. 
As a consequence, if we assume that assumptions 1, 3, and 4 hold, and if we also assume the presence of a trapped surface (assumption 5), then we are forced to conclude that assumption 2 is false, which means that the incompleteness of space-time is inevitable. Stated more simply, Penrose's theorem says that for any reasonable space-time in which matter is concentrated in a bounded region and space-time approaches "flatness" far away from that region, the presence of a trapped surface inevitably leads to the incompleteness of space-time (that is, to singularities). We will discuss these assumptions in more detail, along with the proof of Penrose's theorem, in Part 2 of this article.
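For completeness, here is a compact modern phrasing of the 1965 result, as it is usually stated in textbooks (my paraphrase, not Penrose's original wording):

```latex
% A common modern phrasing of Penrose's 1965 theorem (paraphrase, not a quote).
\textbf{Theorem (Penrose, 1965).}
Let $(M, g)$ be a space-time such that
(i) $M$ contains a non-compact Cauchy hypersurface,
(ii) the null energy condition holds, i.e.\ $\mathrm{Ric}(k, k) \ge 0$
     for every null vector $k$, and
(iii) $M$ contains a closed trapped surface.
Then $M$ is null geodesically incomplete: at least one future-directed
null geodesic cannot be extended to arbitrarily large values of its
affine parameter.
```

Condition (ii) is the precise form of the "non-negativeness of local energy" assumption above, and the conclusion is exactly the failure of the "null completeness" assumption.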