Saturday, May 22, 2021

Quantum for the curious: 1 - Qubits




Quantum computing has been in the news lately. Companies, governments, and research labs throughout the world are making significant investments in developing quantum computers. Large companies such as Google, Microsoft, and IBM are racing to build a scalable quantum computer because they believe it will give them a significant edge in high-performance computing. IBM was one of the first companies to offer cloud-based access to a quantum computer for experimentation and education. Google made a splash in October 2019 when it announced that it had achieved "quantum supremacy" for the first time in history, meaning it performed a calculation on its quantum computer that it estimated would take the fastest supercomputer 10,000 years to complete. Similar announcements have been made since then by teams of researchers in China.

While there is a lot of hype around this technology, there are many challenges in building a scalable quantum computer. Quantum computers derive their power from massive parallelism made possible by the quantum nature of their computational elements. But those quantum states are finicky and can easily be destroyed by noise in the system. To mitigate this phenomenon, known as decoherence, quantum computing systems have to be isolated from the environment and cooled to very low temperatures. In addition, environmental noise produces errors that have to be accounted for during computation. This is a major challenge today, but a tremendous amount of research is being conducted to address the problem and a lot of progress is being made. So there is an expectation that in the next 20-30 years scalable quantum computers will be available to perform a variety of computational tasks. For example, IBM, which runs a 65-qubit quantum processor, has promised a 1000-qubit quantum computer by 2023.

This raises the question of what these quantum computers are going to be good for. There are many potential applications: simulation of quantum phenomena in chemistry and physics, large-scale optimization problems, and machine learning are some of the well-known ones. But arguably the most dramatic consequence of a quantum computing revolution would be its implications for network communication and security. Public-key cryptography is the foundation of secure communication on the internet. Modern eCommerce and internet banking, as well as other forms of secure internet communication, are all based on TLS, which in turn uses public-key cryptography. The latter is based on the computational hardness of problems such as factoring large integers. Peter Shor's discovery of an efficient quantum algorithm for integer factorization in 1994 was a showcase of the power of quantum computers to solve classically hard computational problems. But what also got a lot of attention was the realization that a scalable quantum computer would break all of modern public-key cryptography. This has led to a flurry of activity in developing quantum-safe classical cryptography (also known as post-quantum cryptography). Ironically, quantum information theory is also expected to revolutionize network communication by providing provably secure cryptography (via quantum key distribution) and efficient information transfer (via quantum teleportation and superdense coding).

The Strange World of Quantum

Unlike traditional computers, which perform computations using bits, quantum computers use a more exotic unit of information known as a qubit. A qubit (short for quantum bit) is like the proverbial Schrödinger's cat. Unlike a bit, which can take one of two possible values, 0 or 1, a qubit can be simultaneously both 0 and 1 when it is not measured and assumes one of the two values 0 or 1 only when it is measured. This is like the cat in Schrödinger's thought experiment, which can be alive and dead at the same time when it is not observed, but is forced to be dead or alive when it is observed. This strange description of nature caused enormous discomfort for scientists like Einstein and Schrödinger. In fact, Schrödinger's thought experiment was intended to highlight the absurd implications of quantum theory. If a radioactive atom can be in a superposition of "decayed" and "not decayed" states at the same time, then a cat whose life depends on the state of the radioactive atom would be in a superposition of dead and alive states at the same time. Only when someone chooses to open the door and observe the cat does "nature decide" whether the radioactive atom has decayed or not, which determines whether the cat is observed to be dead or alive. This raised some very unpleasant questions about the nature of reality at the level of elementary particles, which seems to go against our common experience. Such philosophical objections notwithstanding, quantum theory has proved to be an enormously successful and precise theory of nature. The work of scientists such as John Bell and others has demonstrated that nature is indeed quantum in all its glorious strangeness. Moreover, key aspects of this quantum strangeness such as superposition, non-determinism, entanglement, and interference provide significant advantages for computation and communication. This blog post will try to explain these core aspects of quantum information in simple terms. Subsequent posts will cover the topics of quantum computation, its implications for public-key cryptography, and quantum communication.

The Mighty Bit

Much of modern technology is based on digital systems. The fundamental unit of information in such systems is the bit. Bits are truly the atoms that make up the digital universe. In contrast to analog systems that work with continuously varying quantities, digital systems process discrete quantities of information made out of bits. Bits are so fundamental for information processing because all entities of information, such as numbers, characters, data structures, memory addresses, and packets transmitted across a network, can be encoded in bits. Moreover, using George Boole's laws of logic, bits can be combined and operated upon to perform logical operations. All computer programs ultimately boil down to operations on bits. The discovery of the transistor allowed the efficient realization and manipulation of bits. Very simply, when a transistor switch is on, it allows a current to flow and represents a 1; otherwise it represents a 0. Turning the switch on and off represents bit transitions from 1 to 0 and vice versa. Chips made of transistors can switch their bit-states billions of times per second. More importantly, transistors can be combined to form logic gates that perform operations on bits. Logic gates can be combined to form logic circuits that perform arbitrary computations. Large numbers of transistors known as MOSFETs are combined to form integrated circuits (ICs) that execute computations using logic circuits. The advantage of working with bits (and qubits) is that they abstract the rules of logic away from their physical realizations. So one does not need to know the physics and electronics of computers to understand the rules and algorithms associated with information processing. Let's examine some basic operations one can perform with bits.

Logic Gates



All of these gates can be implemented efficiently using transistors. One thing to note is that while the NOT gate is reversible, all of the other gates are irreversible. We will see later that in contrast to classical logic gates, all quantum gates are reversible. This has deep implications for the physics of computation. 
It also turns out that certain gates are universal meaning that all other gates can be expressed in terms of them. For example, it can be shown that the NAND gate is universal for all classical computation.
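To make the universality claim concrete, here is a minimal Python sketch (illustrative only, not taken from any particular textbook circuit) that builds NOT, AND, OR, and XOR entirely out of a single NAND function and checks them against Python's built-in bit operators.

```python
def NAND(a, b):
    # NAND outputs 0 only when both inputs are 1
    return 0 if (a == 1 and b == 1) else 1

def NOT(a):
    return NAND(a, a)

def AND(a, b):
    return NOT(NAND(a, b))

def OR(a, b):
    # De Morgan's law: a OR b = NAND(NOT a, NOT b)
    return NAND(NOT(a), NOT(b))

def XOR(a, b):
    # The classic four-NAND construction of XOR
    c = NAND(a, b)
    return NAND(NAND(a, c), NAND(b, c))

# Verify the truth tables against Python's bitwise operators
for a in (0, 1):
    for b in (0, 1):
        assert AND(a, b) == (a & b)
        assert OR(a, b) == (a | b)
        assert XOR(a, b) == (a ^ b)
    assert NOT(a) == 1 - a
print("NOT, AND, OR, and XOR all built from NAND alone.")
```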

Binary Representation of Numbers

Data in computers is represented using an array of bits. For example, a fixed-size array of 32 or 64 bits could be used to represent integers. By using a base 2 (binary) representation one can convert an array of bits into integers. 
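As a quick illustration, here is a small Python snippet (the 8-bit register and its contents are made up for the example) that interprets an array of bits, most significant bit first, as an integer.

```python
# A hypothetical 8-bit register, most significant bit first
bits = [0, 0, 1, 0, 1, 1, 0, 1]

# Horner-style evaluation of the base-2 expansion
value = 0
for b in bits:
    value = 2 * value + b

print(value)                              # 45 = 32 + 8 + 4 + 1
print(int("".join(map(str, bits)), 2))    # same result using Python's base-2 parser
```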




Just as one adds integers in decimal notation by "carrying over" a 1 when the sum of the digits in a decimal position is greater than or equal to 10, one adds binary digits by performing an XOR (which is the same as addition modulo 2) and carries a 1 over when the sum of two bits in a binary position is greater than or equal to 2. The process of adding two bits and carrying over can be represented by a logic circuit composed of an XOR gate and an AND gate. Such a circuit is called a "half-adder" circuit. It takes two input bits A and B and produces two outputs, A XOR B and A AND B, which are the SUM bit and the CARRY bit respectively. When adding two binary representations, one travels from right to left and at each step adds the two bits to the previously carried bit, records the sum, and carries over any overflowing bit to the next step. So at each step there are three inputs, namely the previously carried bit (called Cin) and the two bits A and B, and two outputs, namely the SUM of A and B and a carried-over bit (called Cout). To handle all the cases involved in this scenario one uses the "full-adder" circuit shown below. It is easy to verify that the full-adder circuit handles all the cases involved in adding the three bits Cin, A, and B.


Arbitrary integer addition can be performed by stacking a half-adder on top of a stack of full-adders, passing the carry output of each adder as the carry input to the next adder below, and accounting for any overflow at the end.
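The half-adder, full-adder, and ripple-carry stack described above translate directly into a few lines of Python. Here is a minimal sketch (plain Python, with ^, & and | standing in for the XOR, AND, and OR gates; the 4-bit example inputs are arbitrary).

```python
def half_adder(a, b):
    # SUM = A XOR B, CARRY = A AND B
    return a ^ b, a & b

def full_adder(a, b, c_in):
    # Two half-adders plus an OR gate combine the three input bits
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, c_in)
    return s2, c1 | c2

def add_binary(x_bits, y_bits):
    """Ripple-carry addition of two equal-length bit lists (most significant bit first)."""
    carry = 0
    result = []
    for a, b in zip(reversed(x_bits), reversed(y_bits)):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    result.append(carry)              # the final carry becomes the overflow bit
    return list(reversed(result))

# 0110 (6) + 0111 (7) = 01101 (13)
print(add_binary([0, 1, 1, 0], [0, 1, 1, 1]))
```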



Similarly, other arithmetic operations can be performed using algorithms that ultimately boil down to logic circuits. For example, a recursive scheme called the Karatsuba algorithm is used to efficiently multiply large integers using products of smaller integers, which in turn can be multiplied using logic circuits such as the ones discussed above. We will discuss the Karatsuba algorithm when we discuss integer factoring and Shor's algorithm in a later post. In addition to integers, other data structures such as characters, floating-point numbers, images, file systems, and memory addresses are all ultimately represented using arrays of bits. In many cases integers themselves are used, such as code points for ASCII and Unicode characters and RGB values for images. Often various forms of encoding (such as UTF-8) are used to convert the data structures into bit arrays when writing data out to files or transmitting data across a network. Bit operations play a crucial role in manipulating these data structures. In fact, much of modern cryptography relies on bit operations and integer arithmetic. For example, the "one-time pad" is a provably secure algorithm for performing one-time encryption of a given string of characters, and it is based on the XOR operation.
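To show how little machinery the one-time pad needs, here is a sketch in Python (os.urandom stands in for a genuinely random key; a real one-time pad requires a key as long as the message, shared secretly in advance and never reused).

```python
import os

def xor_bytes(data, key):
    # XOR each message byte with the corresponding key byte
    return bytes(d ^ k for d, k in zip(data, key))

message = b"ATTACK AT DAWN"
key = os.urandom(len(message))            # one random key byte per message byte

ciphertext = xor_bytes(message, key)
recovered = xor_bytes(ciphertext, key)    # XORing with the same key undoes the encryption

print(ciphertext.hex())
print(recovered)                          # b'ATTACK AT DAWN'
```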

Logic circuits are fundamental for modern computing. It turns out that the circuit model of computing extends to quantum computing as well and is essential for implementing quantum algorithms. In fact, classical circuits can be extended to reversible quantum circuits and are used in all the famous quantum algorithms. We will see how quantum circuits are built and used in the next post on quantum computing.


Qubits

The qubit is a dramatic generalization of the classical notion of a bit. A bit is an abstract representation of a 2-state system of the kind one often encounters in daily life. For example, a light bulb is a 2-state system that can be on or off; a transistor is a 2-state system that can be on or off, resulting in current flowing or not; a spinning top is a 2-state system that can spin clockwise or counter-clockwise; and a cat is a 2-state system that is either dead or alive. But there are many 2-state systems studied in physics that have a ghostly nature and follow the strange rules of quantum mechanics. A qubit is similar to a bit in that it is a 2-state system when measured, meaning it can take only one of two possible values, 0 or 1. However, when a qubit is not measured, its evolution follows the rules of quantum mechanics, which state that it is in a linear superposition of the 0 and 1 states. A system in superposition is in both the 0 and 1 states simultaneously, each with a certain proportion, and the squares of these proportions represent probabilities. The following diagram motivates the concept of a qubit.


A bit can have two possible values, 0 or 1. These are represented above using two red dots. Now replace the dots with vectors of length 1, where an "up" vector represents 0 and a "down" vector represents 1, and imagine rotating this unit vector in 3-dimensional space. The endpoints of the vectors will reside on a unit sphere known as the "Bloch sphere". A qubit corresponds to a point on the surface of this sphere. This is the state of the qubit when it is not measured. It turns out that one can express a point on the Bloch sphere as a "complex linear combination" of the up and down vectors. Here complex means a number of the form "a + b i" where "i" is the square root of -1 and a and b are real numbers. The up and down vectors are called "basis states" and the vector representing a point on the unit sphere represents an arbitrary qubit state. Therefore a qubit can be expressed as a complex vector "c |0> + d |1>", where c and d are complex numbers and |0> and |1> represent the "up" and "down" basis vectors respectively. One can prepare a qubit in a certain state. For example, a qubit could start out as the "up" arrow (the |0> state) and then undergo transformations to assume a different state on the Bloch sphere. If the transformations are known, then the qubit state and the associated coefficients c and d are also known. However, the coefficients can never be measured. This is the famous quantum indeterminism. The coefficients c and d represent the "private world" of the qubit. When a qubit is measured, it undergoes an irreversible disturbance that causes the state to collapse to |0> or |1> with probabilities given by the absolute squares of the coefficients. If we were to sample identical qubits by measuring them, we could estimate the probabilities, but we can never measure the "inner state" of the qubit.
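Here is a small simulation (plain Python, with arbitrarily chosen amplitudes c = 0.6 and d = 0.8i) of the measurement rule just described: the outcome is random with probabilities |c|^2 and |d|^2, the state collapses, and the original coefficients are lost for good.

```python
import random

# A qubit state c|0> + d|1> with (made-up) complex amplitudes, normalized to 1
c = complex(0.6, 0.0)
d = complex(0.0, 0.8)
assert abs(abs(c) ** 2 + abs(d) ** 2 - 1.0) < 1e-12

def measure(amp0, amp1):
    """Born rule: return 0 with probability |amp0|^2, else 1, plus the collapsed state."""
    if random.random() < abs(amp0) ** 2:
        return 0, (1.0, 0.0)      # state collapses to |0>
    return 1, (0.0, 1.0)          # state collapses to |1>

outcome, collapsed = measure(c, d)
print("first measurement:", outcome)

# After the collapse, repeated measurements keep giving the same answer;
# the original amplitudes c and d are gone and cannot be recovered.
print([measure(*collapsed)[0] for _ in range(5)])
```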

There are many 2-state systems in nature that behave in this way and can be represented by qubits. Some well-known examples are currents flowing through superconducting wires at very low temperatures, magnetic spins of charged particles, energy levels of trapped ions, and the polarization modes of photons. To understand qubits in their full generality one has to review the properties of complex numbers and vectors. Most of the literature on quantum information (and there is plenty out there) starts with the Bloch sphere and the most general complex representation of a qubit. To the uninitiated casual reader this can be a little daunting at first. However, it turns out that one can understand a lot about qubits using just high school trigonometry by focusing on "real-valued" qubits. In fact, most of the concepts of quantum information can be understood using this simpler flavor of qubits. Some of the most important algorithms in quantum information, such as the Deutsch-Jozsa algorithm, the Bernstein-Vazirani algorithm, Simon's algorithm, Quantum Key Distribution (QKD), and Quantum Teleportation, can be understood using just real-valued qubits. Moreover, there is a concrete physical realization of real-valued qubits, namely the linear polarization of a photon of light. It provides a perfect illustration of the key aspects of a qubit without requiring an understanding of complex numbers and linear algebra. In fact, when we do discuss complex-valued qubits we will use the circular polarization of a photon to illustrate the more advanced aspects of qubits. Eventually, to understand the crown jewels of quantum computing such as Shor's algorithm, the Quantum Fourier Transform, and Grover's algorithm, we will need to work with complex-valued qubits. But in this post we will focus only on real-valued qubits and bring in complex-valued qubits only when we need them in the next post. Here is a diagram that was drawn by my 17-year-old daughter to help my 14-year-old son understand the basics of trigonometry. It will be essential for our discussion of real-valued qubits.

by Tanvi Adhikari

A real-valued qubit can be represented by just a point on the unit circle as shown below. The point (1,0) on the X-axis is labeled |0> and the point (0,1) on the Y-axis is labeled |1>.  A qubit is an arbitrary point on the circle whose values can be expressed in terms of the angle of the vector with respect to the X-axis. The point can also be expressed as a linear combination of |0> and |1> as shown below.




The state of a qubit is always expressed in terms of a measurement basis. Any pair of orthogonal unit vectors can serve as a basis for representing qubits. The vectors |0> and |1> are orthogonal to each other and together are called the "computational basis". The state |0> represents a qubit whose measured value with respect to the computational basis is always 0, and the state |1> represents a qubit whose measured value with respect to the same basis is always 1. When an arbitrary qubit is measured in the computational basis, its value is 0 or 1 with probabilities given by the squares of the coefficients of |0> and |1>, which are respectively the square of the cosine and the square of the sine of the angle the qubit vector makes with the X-axis, as shown above. When a measurement is performed, the state of the qubit "collapses" to |0> or |1> with those same probabilities.
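Here is a minimal sketch in Python of the real-valued case (the angle of 30 degrees is just an example): the qubit cos(theta)|0> + sin(theta)|1> is measured many times, and the observed frequency of 0 approaches cos^2(theta).

```python
import math
import random

theta = math.pi / 6                                  # 30 degrees from |0>
amp0, amp1 = math.cos(theta), math.sin(theta)        # state = cos(theta)|0> + sin(theta)|1>

def measure():
    # Born rule in the computational basis
    return 0 if random.random() < amp0 ** 2 else 1

shots = 100_000
zeros = sum(1 for _ in range(shots) if measure() == 0)

print("estimated P(0):", zeros / shots)   # close to 0.75
print("cos^2(theta)  :", amp0 ** 2)       # exactly 0.75
```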




Since the probability of a measurement outcome in a basis is the square of the linear coefficient of the qubit state with respect to the basis, we get the following conclusions:
  • Measurement of |0> with respect to the computational basis leaves the qubit in the state |0> and produces a value of 0 with 100% probability. The analogous statement is true for the measurement of the qubit state |1> with respect to the computational basis.
  • Measurement of the qubit states |+> or |-> with respect to the computational basis collapses the state to |0> or |1> with a probability of 1/2 (50%) and the measured value is 0 or 1 respectively.
  • One can measure a qubit state with respect to a "rotated basis" such as the +- basis. Measuring the state |0> or |1> with respect to the +- basis will collapse the state to |+> or |-> with a 50% probability each. Therefore, even though measuring a state like |0> in the computational basis produces an outcome of 0 with 100% probability, measuring it in a rotated basis disturbs it irreversibly and produces an indeterminate value of + or - with 50% probability (see the sketch after this list). This is the uncertainty principle in action. 
  • This phenomenon of certainties becoming probabilities when measured on a different basis is a crucial aspect of quantum mechanics and plays an important role in security protocols such as quantum key distribution.
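Here is a sketch (plain Python with real amplitudes, nothing more) of the rotated-basis measurement mentioned above: measuring the definite state |0> in the +- basis gives + and - each about half the time.

```python
import math
import random

# Computational basis state |0> and the rotated "+/-" basis, as real 2-vectors
ket0 = (1.0, 0.0)
s = 1 / math.sqrt(2)
ket_plus, ket_minus = (s, s), (s, -s)

def measure_in_basis(state, basis):
    """Born rule: the probability of each outcome is the squared overlap with that basis vector."""
    b0, b1 = basis
    p0 = (state[0] * b0[0] + state[1] * b0[1]) ** 2    # overlap with the first basis vector, squared
    outcome = 0 if random.random() < p0 else 1
    return outcome, basis[outcome]                     # the state collapses onto the chosen basis vector

shots = 100_000
plus_count = sum(1 for _ in range(shots)
                 if measure_in_basis(ket0, (ket_plus, ket_minus))[0] == 0)
print("fraction of '+' outcomes:", plus_count / shots)  # close to 0.5
```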

Photon Polarization

Linear polarization of light provides a perfect illustration of qubit states. The wave-particle duality of light (and matter) is a fundamental principle of quantum mechanics. Light consists of electromagnetic waves. The polarization of light refers to the direction of oscillation of the electric and magnetic fields in the plane perpendicular to the direction of propagation of the light wave (represented by the wave vector). A horizontally polarized light wave has its electric field oscillating along the X-axis (and its magnetic field along the Y-axis) of a chosen X-Y coordinate system perpendicular to the wave vector. Similarly, a vertically polarized light wave has its electric field oscillating along the Y-axis (and its magnetic field along the X-axis) with respect to the same coordinate system. A horizontal polarizer will allow only horizontally polarized light through and block vertically polarized light, and vice versa. This can be demonstrated by putting a vertical polarizer behind a horizontal polarizer and sending in a light beam. No light will get through, because the light coming out of the horizontal polarizer is horizontally polarized, and that light is blocked by the vertical polarizer. So far so good. But things get interesting if you place a third polarizer between the horizontal and vertical polarizers, parallel to them but rotated at an angle to the X-axis. While one might expect the light to still be blocked, it turns out that a portion of the light now gets through, and the fraction of the intensity that passes the inclined polarizer is given by the square of the cosine of the angle between the inclined polarizer and the X-axis; the fraction blocked is the square of the sine of that angle.
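The numbers are easy to check. Here is a short Python sketch (assuming ideal polarizers and unpolarized incoming light, so the first polarizer passes half the intensity) that applies the cosine-squared rule, known as Malus's law, to a chain of polarizers.

```python
import math

def transmitted_fraction(angles_deg):
    """Fraction of unpolarized light surviving a chain of ideal linear polarizers.
    The first polarizer passes 1/2 of the intensity; each later polarizer passes
    cos^2 of the angle between it and the previous one (Malus's law)."""
    fraction = 0.5
    for prev, cur in zip(angles_deg, angles_deg[1:]):
        fraction *= math.cos(math.radians(cur - prev)) ** 2
    return fraction

print(transmitted_fraction([0, 90]))        # horizontal then vertical: 0.0, everything blocked
print(transmitted_fraction([0, 45, 90]))    # slip a 45-degree polarizer in between: 0.125
```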


But light also consists of quanta of energy known as photons. The intensity of light is proportional to the number of photons passing through a perpendicular surface area. One can then explain the transmission or blocking of light at the inclined polarizer in terms of the probability that an individual photon is allowed through or blocked. A photon that gets through is said to be polarized along the inclined direction, and a photon that is blocked can be thought of as polarized in the direction perpendicular to the axis of inclination. Since the intensity of light that gets through is proportional to the square of the cosine of the angle between the polarization direction and the X-axis, it follows that the probability of a photon getting through is the square of the cosine of the angle of inclination and the probability of it being blocked is the square of the sine of that angle. This provides us with some evidence that the polarization mode of a photon can be represented by a qubit. The quantum theory of the photon posits that the polarization mode of light is based on the spin of the photon (which is a boson), which is in fact a qubit. Indeed, quantum security protocols such as quantum key distribution (QKD) use photonic qubits to securely share cryptographic keys.

Entanglement

Arguably the most intriguing aspect of qubits is their ability to get entangled with each other. When two qubits are independent and do not interact with each other, their combined state can be expressed as a tensor product of the two individual states. This is shown below. The computational basis for a two-qubit system is just the set of all 2-bit strings: (|00>, |01>, |10>, |11>). The most general state of a 2-qubit system is a linear superposition of these basis states. As shown below, this includes states like the Bell states that cannot be separated into a tensor product of individual qubit states.
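A small NumPy sketch makes the tensor-product statement concrete (the separability test below, checking whether the 2x2 matrix of amplitudes has a vanishing determinant, is one standard way to detect a product state for two qubits).

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

# Independent qubits combine via the tensor (Kronecker) product
product_state = np.kron(ket0, (ket0 + ket1) / np.sqrt(2))       # separable by construction

# A Bell state: (|00> + |11>) / sqrt(2)
bell = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)

def is_separable(state, tol=1e-12):
    """A 2-qubit state, viewed as a 2x2 matrix of amplitudes, is a tensor product
    exactly when that matrix has rank 1, i.e. its determinant vanishes."""
    return abs(np.linalg.det(state.reshape(2, 2))) < tol

print(is_separable(product_state))   # True
print(is_separable(bell))            # False -> entangled
```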

When two qubits are entangled with each other, just knowing the states of the individual qubits is not sufficient to know the state of the combined pair: the whole is greater than the sum of its parts, and the pair of qubits must be treated as a single system. Moreover, even though the measurement outcome of each qubit is non-deterministic, the measurement outcomes are strongly correlated with each other. For example, in the first two Bell states measurement of each qubit will produce a 0 or 1 with 50% probability. But if the measurement outcome of one of the qubits is 0, then the measurement outcome of the other qubit is also 0; likewise, if the measurement outcome of one qubit is 1, then the measurement outcome of the other has to be 1. It appears as though, when one of the qubits is measured, it instantaneously forces the state of the other qubit to one value or the other based on the first qubit's measurement outcome. This happens no matter how far apart the individual qubits are in space. This is what Einstein referred to as "spooky action at a distance". Einstein believed that if one qubit could influence the other instantaneously, that would constitute a violation of the special theory of relativity. Therefore, he claimed in the famous EPR paper that quantum theory was an incomplete theory of nature, and that a complete theory would account for "hidden variables" that explain the correlations between the measurement outcomes of the two qubits. Einstein's objection was based on a philosophical assumption called "local realism". Local realism posits that faraway events cannot influence each other faster than the speed of light ("locality") and that properties of objects have definite values even when we do not measure them ("realism"). It turns out that, by Bell's theorem, local realism is incompatible with quantum theory.
Entanglement is a fundamental characteristic of quantum mechanics and has been observed in many systems across large distances. For example, the light going through certain types of crystals called "nonlinear crystals" can produce linearly polarized photon pairs whose polarization states are entangled with each other.
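For the correlations described above, one can simply sample the joint distribution that quantum mechanics predicts for the Bell state (|00> + |11>)/sqrt(2). The sketch below does exactly that and nothing more; note that this particular statistic could also be produced by classically correlated coins, which is why demonstrating genuine entanglement (Bell's theorem) requires measurements in more than one basis.

```python
import random

def measure_bell_pair():
    """Sample the computational-basis outcomes for the Bell state (|00> + |11>)/sqrt(2):
    the joint result is 00 or 11, each with probability 1/2, so each individual
    outcome is random but the two always agree."""
    outcome = 0 if random.random() < 0.5 else 1
    return outcome, outcome

results = [measure_bell_pair() for _ in range(10)]
print(results)
assert all(a == b for a, b in results)   # perfectly correlated, every single time
```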


In subsequent posts, we will discuss Bell's theorem, quantum communication, quantum gates, quantum circuits, and their application to quantum computation.


Thursday, January 14, 2021

Quantum entanglement and its applications: Part 1


Reinhold Bertlmann
Credit: AB1927, Public domain, via Wikimedia Commons

Unlike classical physics and general relativity, which deal with the deterministic evolution of physical variables such as position and momentum, quantum mechanics deals with an abstract entity called the state vector. In general, the state vector resides in an infinite-dimensional complex Hilbert space. However, in the world of quantum information and quantum computing one deals mostly with state vectors that are finite-dimensional. For example, the spin of a particle, the direction of a superconducting current, or the energy state of a trapped ion has a state vector that is simply a vector residing in a 2-dimensional complex Hilbert (inner product) space. By the Born rule (or Born postulate), the probability of an outcome when measuring a physical variable (for example spin) is given by the squared magnitude of the component of the state vector along the corresponding basis state. The evolution of the state vector can be represented by a 2 x 2 complex unitary matrix: in the absence of measurement, subjecting the system to conservative force fields simply transforms the state vector by a suitable unitary 2 x 2 matrix. The transformation of a state vector by such a unitary matrix is called a quantum logic gate and can be represented graphically. One of the most unique and intriguing aspects of quantum mechanics is the phenomenon of entanglement. It deals with non-local correlations between measurements of complementary observables (such as position and momentum, or spin directions) performed on parts of a system that are physically separated by a "large" distance. In the language of state vectors, an entangled state is simply an indecomposable vector in the tensor product of two complex Hilbert spaces. The phenomenon of entanglement was first discussed by Einstein, Podolsky, and Rosen in the famous EPR paper, with the clearly stated goal of demonstrating the incompleteness of quantum mechanics as a theory of physical reality. EPR demonstrated that quantum mechanics had non-local effects, an anathema for Einstein as it seemed to violate special relativity. Actually, EPR only showed that quantum mechanics implies non-local correlations between measurements, but such correlations are so counter-intuitive that they seemed to imply that there was more to quantum mechanics than the Copenhagen interpretation allowed. Little did Einstein know that John Bell would later show that non-locality is an essential component of quantum mechanics. Bell showed that any local hidden variable theory would have to satisfy an inequality (known as Bell's inequality), which quantum mechanics violates. Bell wrote a wonderful paper called "Bertlmann's socks and the nature of reality", explaining the crux of the EPR paradox and Bell's inequality.
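As a tiny illustration of a quantum logic gate as a 2 x 2 unitary matrix, here is a NumPy sketch applying the Hadamard gate (a standard example, not specific to this post's presentation) to the state |0> and reading off the outcome probabilities with the Born rule.

```python
import numpy as np

# The Hadamard gate: a 2 x 2 unitary sending |0> to (|0> + |1>)/sqrt(2)
H = np.array([[1.0,  1.0],
              [1.0, -1.0]]) / np.sqrt(2)

ket0 = np.array([1.0, 0.0])
state = H @ ket0                                  # unitary evolution of the state vector

print(np.allclose(H.conj().T @ H, np.eye(2)))     # True: H is indeed unitary
print(state)                                      # [0.707..., 0.707...]
print(np.abs(state) ** 2)                         # Born rule: 50/50 outcome probabilities
```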

Bertlmann and his socks
Credit: AB1927, Public domain, via Wikimedia Commons

Since then many people have recast the EPR paradox and Bell's discovery into different formats to convey the non-classical, counter-intuitive (Mermin's "Local Reality Machine"), and computationally powerful (CHSH Nonlocal game) nature of entanglement. Today, entanglement forms the foundation of modern quantum information theory and has applications to cybersecurity via schemes like quantum key distribution and quantum cryptography. The following presentation tries to give a flavor of the history and applications of quantum entanglement. I gave this presentation in an evening class on quantum computing that I took at the UW Physics department. 

In future posts, I hope to explain each of the slides on quantum entanglement (including Bell's paper on "Bertlmann's socks") in simple terms. Stay tuned.

Sunday, January 3, 2021

Penrose's work on singularities: Part 2

 



In Part 1 of this article, we saw that by 1965 there was theoretical evidence that a sufficiently massive star that is spherically symmetric would collapse to a black hole after it has spent all its nuclear fuel. Since General Relativity postulates that spacetime is curved by the presence of matter and energy, it is expected that there would be a severe distortion of spacetime when matter is compressed to a high density. The Oppenheimer-Snyder model of gravitational collapse showed that for a spherically symmetric, homogeneous, and static distribution of dust (with no internal pressure) of a sufficiently large mass, there is no known mechanism to prevent the compression of the matter to infinite density. This would result in the formation of a black hole with an event horizon at a radius of 2GM/c^2 (the Schwarzschild radius) and a singularity behind the event horizon. It is customary to rescale the units so that G/c^2 = 1, so the Schwarzschild radius is simply 2M. The singularity represents a point of infinite curvature and also a point where all future-directed paths and light rays come to an end. In 1965 Roger Penrose proved that even without any assumption of spherical symmetry, the geometrical constraints imposed by a very strong gravitational field inevitably result in spacetime singularities.
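As a quick numerical check of the formula quoted above, here is a short Python computation of the Schwarzschild radius of the Sun (using rounded values of the physical constants).

```python
# Schwarzschild radius r_s = 2GM/c^2 for a solar-mass object
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8        # speed of light, m/s
M_sun = 1.989e30   # mass of the Sun, kg

r_s = 2 * G * M_sun / c ** 2
print(f"Schwarzschild radius of the Sun: {r_s / 1000:.1f} km")   # roughly 3 km
```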

As discussed in Part 1, Roger Penrose formulated his groundbreaking theorem in terms of five assumptions, which he showed are collectively inconsistent with each other. Four of them are reasonable "niceness" conditions on spacetime:
a) The "Past" and "Future" condition, which states that there exists a consistent definition of past and future everywhere.
b) The "Null Completeness" condition, which states that all paths built out of light rays can be extended indefinitely at all points and in all directions.
c) The "Cauchy Hypersurface" condition, which assumes the existence of a non-compact connected Cauchy hypersurface (to be defined below).
d) The "Null Energy" condition, which states that the local energy density at any point is always non-negative.
The fifth condition is the only one that pertains to the conditions expected near a black hole, where gravity is so strong that all light rays are bent towards each other:
e) The "Trapped Surface" condition, which states that there exists a 2-dimensional compact surface (like a sphere) for which all light rays emanating from the surface are bent towards each other.

Understanding the statement of the theorem and its proof entails absorbing a fair amount of geometry, topology, and terminology associated with the causal theory of spacetime and general relativity. In this article, I will attempt to explain the concepts in the simplest possible way without appealing to all the jargon that one would typically encounter in a rigorous exposition of the topic. We start with the causal theory of spacetime, which is fundamental to the entire subject.

Local Causal Structure of Spacetime

The origin of the causal theory of spacetime lies in Minkowski's reformulation of Einstein's special theory of relativity in terms of a four-dimensional spacetime. In special relativity, the speed of light occupies a very special place. Nothing can travel faster than the speed of light,  a universal constant independent of any inertial observer's frame of reference. Points in the Minkowski model are events whose separation is measured by the Lorentz metric, a quantity that is invariant under Lorentz transformations. The causal theory of what events can influence other events and the domain of influence is a feature unique to the special theory of relativity and is not present in classical Newtonian mechanics.

A fundamental geometrical object when studying causal theory is the "light cone". Imagine lighting a candle at a point on Earth. If you ignore the effects of gravity, light from the candle will spread out radially in all directions in straight lines. These straight lines will sweep out a sphere in 3-dimensional space. Now if we suppress one of the spatial dimensions (say the Z-axis), then we can visualize this as an expanding circle. If we choose the vertical axis to represent time and suppress one of the space dimensions (because we cannot really visualize a 4-dimensional object!), then we will see that as time progresses vertically, the wavefront of light will spread farther and farther, on the surface of a cone. A similar cone can be envisioned going back in time. 




The points in the interior and on the surface of the upper cone represent the causal future of a point (event) at the origin (at time t=0). Every event in the interior of the cone can be influenced by an object or signal traveling at a speed strictly less than the speed of light. Therefore, the interior of the cone is called the chronological future of the point at the origin. Points on the surface of the cone represent the boundary of the causal future. They are events that can only be influenced by signals traveling at the speed of light. The curves traced by such signals are called null geodesics, and the surface of the cone, which is swept out by these null geodesics (light signals), is called a null hypersurface.
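For readers who like to see the formula, this causal classification in flat Minkowski spacetime can be read off from the invariant interval (written here in the (-,+,+,+) sign convention, which is a choice of convention rather than something dictated by the discussion above):

$$ds^2 = -c^2\,dt^2 + dx^2 + dy^2 + dz^2,$$

with a displacement being timelike when $ds^2 < 0$ (inside the cone), null when $ds^2 = 0$ (on the cone, traced by light rays), and spacelike when $ds^2 > 0$ (outside the cone).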



Minkowski spacetime is a "flat" manifold, meaning it has no curvature. It represents an idealized condition in which nothing is subject to any force or influence and all entities move at uniform speeds relative to each other (inertial frames of reference). For example, the spacetime very far away from any star can be approximated by Minkowski spacetime. In General Relativity, spacetime is curved by the presence of matter and energy. To quote John Wheeler, "matter tells spacetime how to curve, and spacetime tells matter how to move". The fundamental postulate of General Relativity known as the "Principle of Equivalence" states that near a point in spacetime the gravitational forces can be canceled ("transformed away by a coordinate change") by moving to a freely falling frame of reference. For example, if I am in a space station above the Earth I will experience a gravitational pull from the Earth. But if I step into a spacecraft and start freely falling in Earth's gravitational field, I will experience weightlessness. In geometric terms, this amounts to carrying out a coordinate transformation to a coordinate system in which, in a small neighborhood, spacetime looks and feels like Minkowski spacetime. Thus the Principle of Equivalence implies that, as a manifold, spacetime is locally Minkowski at every point. So at every point the local causal structure can be represented by a light cone. Depending on how spacetime curves due to the presence of matter, the light cones at different points may be tilted in different directions.

Timelike Curves


As a particle moves in spacetime, its trajectory in spacetime is represented by a worldline (also known as a timelike curve). At each point of that curve is a lightcone which represents the boundary of all the different spacetime directions in which a signal can travel. If the particle is a photon (quantum of light), then the curve is built out of light rays, and in that case, the lightcone is tangential to the curve at every point in spacetime. Such a curve is called a lightlike or null curve. 



Lightlike (null) curves

So the concepts of timelike geodesics, null geodesics, chronological future, causal future, and the null hypersurface built out of the boundary carry over verbatim to curved spacetimes (also known as Lorentz manifolds).  An essential ingredient for this causal analysis is assumption a) of Penrose's theorem namely the "Past" and "Future" assumption. It is essential that there be a consistent way to define past and future across the spacetime manifold to avoid pathologies. 
For Penrose's theorem, it is important to consider not just the causal future of a point in spacetime, but the causal future of an entire "spacelike" surface in spacetime. A spacelike surface is a slice of spacetime at a particular choice of time, chosen uniformly across all points (the fact that you can do this is an assumption known as time orientability). In other words, a spacelike surface is just a region of space at a particular time. The spatial slice itself may, of course, be curved.


The chronological, causal, and null future of a set satisfy some easily provable topological properties. The chronological future is an open set, meaning that for every event in the chronological future you can find a "ball" of neighboring events in spacetime that also lie in the chronological future. Similarly, for any point on the boundary, every ball of neighboring events contains an event that is in the chronological future (the interior of the causal future). Moreover, the boundary, if nonempty, is a closed 3-dimensional achronal C^0 submanifold of the 4-dimensional spacetime. Achronal means that no two points of the boundary can be joined by a timelike curve (the worldline of an object traveling slower than light). C^0 submanifold means that each point on the boundary of the causal future has a 3-dimensional neighborhood within the boundary that is topologically equivalent to an open ball in R^3 (Euclidean 3-space). In general, the boundary of the causal future will not be a smooth manifold, as is evident from the light cone and from the boundary of the future of a disconnected set.
 



The C^0 (topological) manifold structure is obtained by taking the so-called Riemann normal coordinates of 4-dimensional spacetime around any point p on the boundary of a causal future. For a sufficiently small neighborhood, one can choose one of the coordinates to be timelike (since the neighborhood can be chosen to be approximately Minkowski). The integral curves of the tangent vector of this coordinate will intersect the boundary in exactly one point, because the boundary is achronal (no two of its points are joined by a timelike curve). So the remaining 3 coordinates can be used to define a homeomorphism to R^3.



Such considerations will become important when we discuss Cauchy surfaces, trapped surfaces, and the proof of the Penrose theorem.

Global Causal Structure of Spacetime

Penrose's theorem relies heavily on certain global assumptions about spacetime. Locally it is fairly clear what is happening topologically, given that spacetime is a Lorentz manifold by the General Theory of Relativity. But when you stitch together these locally Minkowski neighborhoods, the resulting spacetime could have all sorts of pathological features. Yet when we look around with our telescopes we don't see any pathologies in spacetime. It is therefore reasonable to assume that spacetime starts out nice and smooth, and then ask what happens when gravity becomes too strong. For example, an "asymptotically flat" spacetime consists of a 3-dimensional space that extends out to infinity, where the gravitational field (i.e., the curvature of spacetime) becomes negligible far away from the source of the field (typically a massive object such as a star). Near the source spacetime is curved, but far from the source spacetime is almost "flat" (hence the name "asymptotically flat").

Asymptotically flat spacetime

The Cauchy Hypersurface condition is a global niceness condition. It states that there is an initial connected (not broken up) smooth 3-dimensional space, spread out infinitely, from which all of spacetime can be developed in a well-defined fashion. In fact, the assumption is that the entire spacetime can be "built" out of such slices, one for each instant of time. Asymptotically flat spacetime is a perfect example of a spacetime satisfying the Cauchy Hypersurface condition.



Cauchy Surfaces


A spacetime that satisfies the Cauchy surface condition has some nice properties. In fact, the technical definition of a Cauchy surface is a surface with the property that every inextendible timelike curve (a curve that always points into the chronological future) intersects it exactly once. It turns out (from the work of Choquet-Bruhat and Geroch) that spacetimes can be built smoothly from Cauchy surfaces. In addition, such spacetimes are also known to be "globally hyperbolic". Without getting too technical, this simply means that in such a spacetime you cannot go back in time (there are no closed timelike curves) and that there are no "holes" or gaps in the spacetime (the intersection of the causal future of an event p and the causal past of another event q lying in the causal future of p is compact). It also turns out that all Cauchy surfaces of a given spacetime are topologically equivalent (homeomorphic to each other).

An intuitively obvious but crucial consequence of the Cauchy surface condition is that every point on the trajectory of a light curve in spacetime can be traced back to a point on the Cauchy surface using a timelike curve. 


In this picture, you have two Cauchy hypersurfaces and a light signal that goes from event P1 on one Cauchy surface to event P2 on the other. But the point P2 is also the evolution of a point on the first Cauchy surface, namely the intersection of the perpendicular timelike curve with that surface. The timelike curves that are perpendicular to each Cauchy surface define a homeomorphism (a 1-1 topological equivalence) between the two Cauchy surfaces. We saw earlier that the boundary of the causal future of a spacelike surface is an achronal C^0 manifold generated by null geodesics (light rays as above). The timelike curves coming down from a point P2 on the boundary to the Cauchy surface Sigma_1 map open sets to open sets, so this mapping is a homeomorphism onto its image. This fact will become important in the proof of Penrose's theorem.

Raychaudhuri's focusing equation

Until now we have not really discussed the effect of gravity on light rays and the curvature of spacetime. The first and most famous verification of the General Theory of Relativity was the observation of the bending of light during a solar eclipse by Arthur Eddington in May 1919. The phenomenon of gravitational lensing is well known today. Light rays emanating from stars behind a massive object (like a black hole) will be bent when they pass near the massive object.

Gravitational lensing (Credit: ESA/Hubble & NASA)
 



Raychaudhuri was the first to study the implications of Einstein's equations for the collective behavior of families of geodesics in spacetime (such as families of light rays or families of particle trajectories). He showed that since gravity is an attractive force, neighboring geodesics are bent towards each other and will eventually intersect. The intersection of infinitesimally close neighboring geodesics has a very important consequence. Such intersection points are known as focal points or conjugate points. They have been studied extensively in the context of Riemannian differential geometry, where they have important consequences for the global geometry of surfaces. Penrose and Hawking were the first to study them in the context of relativity and spacetime.




A familiar situation in ordinary Riemannian geometry where geodesics intersect is the example of a sphere. 


Credit: Hawking-Penrose


If a geodesic in Riemannian geometry has a conjugate point, then it cannot be a length minimizing geodesic as seen on the sphere. If a great circle from a point p to q encounters a conjugate point r before reaching q, then it will not be the shortest path between p and q, because there will be another shorter great circle that directly joins p and q. 
There are analogous implications of the existence of conjugate points for timelike and null geodesics in spacetime. The Raychaudhuri equation helps determine the conditions under which geodesics will encounter conjugate points. This brings us to assumption d) of Penrose's theorem, the "Null Energy" condition. The null energy condition (which follows from the weak energy condition) states that the local energy density at any point is always non-negative. From Einstein's field equations it turns out that the null energy condition is equivalent to the last term on the right-hand side of the Raychaudhuri equation being non-negative. This means that the entire right-hand side of the equation is bounded below by the square of the convergence. By solving the resulting Raychaudhuri inequality (ignoring the shear term and the energy term, both of which are non-negative), one can show that the convergence factor is bounded below by a function of the affine parameter that depends on the initial convergence factor and the initial parameter value.
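To make the inequality explicit, here is a sketch of the focusing argument in the convergence notation used above (sign conventions and numerical factors vary between references; the shear term and the energy term, both non-negative, are the ones being dropped). For a congruence of null geodesics with affine parameter $\lambda$, convergence $\rho$, and shear $\sigma$, the Raychaudhuri equation takes the schematic form

$$\frac{d\rho}{d\lambda} = \rho^2 + |\sigma|^2 + \Phi, \qquad \Phi \propto R_{ab}k^a k^b \ge 0 \ \text{(null energy condition, via Einstein's equations)},$$

so that $d\rho/d\lambda \ge \rho^2$. Integrating this inequality gives

$$\rho(\lambda) \ \ge\ \frac{\rho_0}{1 - \rho_0\,(\lambda - \lambda_0)},$$

which is exactly the statement that the convergence is bounded below by a function of the affine parameter determined by its initial value $\rho_0$ at $\lambda_0$. If $\rho_0 > 0$, the right-hand side blows up no later than $\lambda = \lambda_0 + 1/\rho_0$, which is the focal point discussed next.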

Now by itself, the Raychaudhuri equation does not imply that all null geodesics will have conjugate points. After all, light rays tend to spread out, so the initial convergence factor is usually negative. If you light a candle or if a star explodes in a large flash of light, the light rays will expand outward spherically. If the initial convergence factor is negative, then even if gravity tries to focus the light rays back, it may not be enough to make them meet. That is, unless gravity is so strong that the convergence factor starts out positive. It seems counter-intuitive, but that is exactly what happens with a trapped surface. If the initial convergence factor is positive, then in a finite interval of the affine parameter the convergence factor will blow up to infinity, meaning one will encounter a focal point.


  Closed Trapped Surfaces

The final assumption of Penrose's theorem, which is based on the fact that you are in a situation of very strong gravity is e) The "Trapped Surface" condition. The following two diagrams will illustrate the contrast between a "normal" 2-dimensional surface in spacetime and a "trapped" 2-dimensional surface.




Credit: Hawking-Penrose


With a closed trapped surface, you have a compact 2-dimensional surface (with no boundary), such as a sphere, from which the light cones are always tipped inwards, which is the same as saying that the initial convergence factor is positive (or equivalently that the null expansion factor is negative). Since the Raychaudhuri equation shows that the convergence factor for light rays emanating from the trapped surface is bounded below by the initial convergence factor, the light rays will always have a positive convergence factor. If light rays (or null geodesics) can be extended indefinitely, which is assumption b) of Penrose's theorem, then each geodesic will have to encounter a focal point after a finite affine time. The terms caustics, conjugate points, and focal points are all used in the literature to describe the same concept.
There has been a tremendous amount of research on the conditions under which trapped surfaces will form. The most obvious example is a spherically symmetric black hole once the event horizon has formed: compact spacelike surfaces behind the event horizon are trapped surfaces. Moreover, there are theorems showing that even in the absence of spherical symmetry, trapped surfaces form during gravitational collapse or under conditions of strong gravity, so deforming the spacetime away from exact symmetry does not prevent their formation.


Putting it all together - the punchline

To summarize, we know from the local causal theory of spacetime that the boundary of the chronological future of a spacelike surface is generated by null geodesics (think of the light cone in Minkowski space, whose surface consists of light rays). Think of the surface of a star that is undergoing a supernova explosion. The particles emanating from the surface move into the chronological future of the star, and the light rays form the boundary of that chronological future in spacetime. In particular, the boundary of the chronological future of a trapped surface is generated by null geodesics. What is different about a trapped surface, as opposed to any other surface (such as the surface of a star), is that the light rays all start focusing towards each other. If you let the light rays travel indefinitely, they will have to intersect their infinitesimal neighbors at some point, so there will be a focal point on each geodesic. Penrose proves that this contradicts the Cauchy surface condition. The easiest way to visualize the proof is to examine the following diagrams. They are due to Penrose and are taken from his 2020 Nobel lecture.







Penrose considers the null geodesics emanating normally from the trapped surface and shows that they converge. A key fact is that a null geodesic that encounters a focal point leaves the boundary of the chronological future beyond that point: points past the focal point can be joined to the trapped surface by timelike curves, so the geodesic must enter the interior of the chronological future (see the image below).





The proof of this fact is a bit subtle (see Hawking-Ellis Proposition 4.5.12 or Witten Section 5.2). The heuristic argument is as follows: if a geodesic \gamma from p to r contains a focal point at q, then there is an infinitesimally nearby geodesic that also joins p and q. This neighboring geodesic followed by the segment qr has a "kink" at q, so the new curve is not a geodesic, even though it has the same length as \gamma. By smoothing out the kink we can create a curve from p to r that can be deformed into a timelike curve. This implies that r lies in the chronological future of p, so the geodesic \gamma has left the boundary.

The fact that the null generators of the boundary leave the boundary and enter the interior after a finite amount of time implies that the boundary itself has to have a finite extent. In other words, the null boundary of the chronological future of the trapped surface is compact as can be seen in Penrose's diagrams above.  Now Penrose claims that this contradicts the Cauchy hypersurface (global hyperbolicity) condition.  Compare Penrose's conical diagram above with the Cauchy hypersurface diagram below.

Credit: Wald GR

  
The light ray from P1 to P2 is one of the generators of the cones shown above. It turns out that the null boundary of the causal future of any surface can be mapped down to the Cauchy surface using timelike curves that are orthogonal to the Cauchy surface. If the null geodesics could be extended indefinitely, they would form a hypersurface that is topologically equivalent to the Cauchy surface; in other words, the null boundary of the chronological future would have to behave like a Cauchy hypersurface. But that is not possible if the null boundary "closes up" on itself. The technical way to state this is that the focusing of the null geodesics forces the null boundary of the causal future of the trapped surface to be compact, while the original assumption was that the Cauchy surface is non-compact (extending out to infinity).


Mapping of future null boundary to Cauchy surface 
(Credit: Sayan Kar, IIT KGP)


So the boundary curling up on itself to form a compact hypersurface cannot happen if the initial Cauchy hypersurface is non-compact. This means that the null geodesics generating the boundary of the causal future of a trapped surface must not be extendible beyond a certain point. This is incompleteness. The light rays are moving in the direction of a singularity but will never reach it. For example, the squiggly line below represents r=0, which cannot be reached by the light rays. The incompleteness (the presence of a singularity) means you can imagine slicing the surface along the singularity and spreading the null boundary out so that it maps onto the initial Cauchy hypersurface.

Credit: Wald GR

The technical proof arrives at a contradiction by showing that the image of the null boundary under the timelike mapping to the initial Cauchy hypersurface is both compact (hence closed) and open. Being open and closed, the image is the entire Cauchy hypersurface, since the latter is connected. But that is a contradiction, because the Cauchy hypersurface is non-compact. Again, the perfect example of a non-compact Cauchy hypersurface is the asymptotically flat spacetime surrounding an object such as a star.

Closing Remarks

Penrose's result was a turning point in the study of collapsed objects and the subsequent work by him and Hawking started a revival of interest in General Relativity. The Hawking-Penrose singularity theorems represent a landmark in the history of General Relativity. The developments spurred by their work would have possibly shocked Albert Einstein, the discoverer of relativity who always believed that singularities were a mathematical anomaly. But the exciting aspect of singularities is that they provide a hint of new physics that is yet to be developed. The study of black holes and singularities is an active area of research in theoretical physics, astrophysics, and mathematics.
Any discussion of black holes and singularities would be incomplete without mentioning Penrose's cosmic censorship conjecture. Since Penrose's (and Hawking's) results show that singularities are inevitable when gravity is very strong, why is it that we don't encounter or observe singularities in the universe? Remember, all the evidence for black holes comes from dark supermassive objects that exert enormous gravitational influence on their neighborhoods. All known models of black holes have an event horizon. The images taken of the regions around black holes show either gravitational lensing of light from stars behind the black hole, light circling the black hole near the event horizon (the photon sphere), or the accretion disk, a region near the event horizon where matter is sucked into the black hole.


  
It would be rather disconcerting if there were singularities just lying about in spacetime, but their invisibility led Penrose to make this conjecture.

Weak Cosmic Censorship Hypothesis: Nature abhors a naked singularity.

In other words, even though singularities are inevitable in general relativity, they are always hidden behind event horizons. No observer from outside (at "Null Infinity") can see a singularity. It turns out that this allows one to develop a nice theory of black holes since a lot of physics can be done without worrying about the singularities. Proving or disproving the cosmic censorship hypothesis is one of the central problems of mathematical general relativity.

References: