Prime Time 02: The music of the primes

Published: October 14, 2008
Tags: number theory prime numbers mathematics

Time for the second installment of my series on primes. This time I want to talk about the distribution of prime numbers: how primes are spread out along the line of positive integers. Are there any patterns in the distribution, or is it entirely random? This is a question which has concerned number theorists for centuries. Answers to questions about the distribution of primes can be surprisingly hard to find, and sometimes tantalisingly vague. There are a lot of results in this field of study. The canonical one is the aptly-named prime number theorem, but I'm going to simply gloss over this result at the end of the post. Instead I am going to focus mainly on two other ideas which aren't as "powerful" as the prime number theorem, but are much more accessible to beginners.

Before I go any further, I'll point out that the title of this entry, "the music of the primes", isn't my own creation, it's the title of a famous book aimed at introducing some of the magic of primes to the public at large. The phrase has since become common within the field. I actually can't guarantee that it originated with that book. At any rate, I borrowed it.

Twin primes, and beyond

One extreme in the distribution of primes is the existence of the so-called twin primes. The term twin primes refers to any two primes who differ by two, i.e. if p is a prime and p + 2 is a prime, the p and p + 2 are twin primes. There are a lot of examples of twin primes near the start of the positive integers: 5 and 7 are twin primes, as are 11 and 13 and 17 and 19. Twin primes are as close together as two primes can be: it is impossible for p and p + 1 to both be primes, since at least one of two consecutive integers has to be even, and even numbers are, by definition, always divisible by 2. There is exactly one exception to this rule - 2 and 3 are consecutive and are both primes. This works because 2 is the only even prime number - which leads some mathematicians to jokingly call it "the odd" prime, a play on words using the alternate meaning of "odd" as "unusual". This says a lot about the sense of humor of a lot of mathematicians...

How many twin primes are there? As it turns out, nobody knows for certain. The twin prime conjecture states that there is an infinite number of twin primes. As the name suggests, this is a conjecture, not a theorem - it hasn't been proven, although people have been trying to prove it for centuries. Certainly, most people who have studied the problem believe that the conjecture is true. With modern computers it's relatively easy to search for twin primes automatically, and in this way number theorists have found a lot of them. The largest pair of twin primes yet found is 2003663613 x 2¹⁹⁵⁰⁰⁰ - 1 and 2003663613 x 2¹⁹⁵⁰⁰⁰ + 1, two huge numbers with 58711 digits in them! The fact that as our computers get faster we keep finding higher and higher twin primes is just one reason to believe in the infinitude of twin primes (although it certainly isn't a guarantee that we won't one day stop finding them). There are a number of rough arguments based on known facts that make it feel more sensible that there are infinitely many twin primes than that there aren't. Regardless of whether or not the twin prime conjecture is true, the fact is that there are a lot of twin primes out there: that is, there are a lot of cases where primes are distributed as close together as possible, which is an important consideration in trying to develop a feel for how the prime numbers are spread out.

Before moving on to the opposite extreme of prime distribution, I should point out that the idea of twin primes can be generalised pretty easily to prime triplets and various other higher level groupings. Some of these close groupings of primes are also conjectured to be infinitely abundant.

Large gaps in the primes

Having looked at the issue of twin primes, where primes are positioned as close together as they possibly can be, now I want to look at the opposite extreme: where the number line contains long stretches of non-primes (usually called "composites") in between consecutive primes. There's a really cool, kind of surprising and very easy to prove result about these situations, which simply says this:

For any positive integer n, there exist n consecutive integers, none of which are prime.

That is to say, you can pick any number you like - even the number of grains of sand on Earth or the number of stars in the universe, or any other outrageously large number - and it is guaranteed that somewhere along the number line there are that many non-prime integers in a row. The proof of this result doesn't just establish that these prime-free stretches exist, it explicitly shows us how to construct them. This might sound obvious, but not all so-called "existence proofs" in mathematics are nice enough to come with instructions like this. Sometimes we prove something exists by assuming that it doesn't and deriving a contradiction (the same logical strategy that we saw Euclid use to prove the infinitude of primes earlier), which doesn't necessarily leave us with any idea how to find the thing we just proved the existence of. Maths can be funny like that. Anyway, this is one of the nicer cases where the proof shows us exactly where to look.

We need to sort out a tiny bit of notation first. When a mathematician puts an exclamation mark after an integer, like 42!, what they mean is the product of that integer, in this case 42, with every positive integer less than it, all the way down to one. So, for example:

5! = 5 x 4 x 3 x 2 x 1 = 120,

10! = 10 x 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1 = 3628800

15! = 15 x 14 x 13 x 12 x 11 x 10 x 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1 = 1307674368000

As you can see, these numbers get very large very quickly. By the way, when reading these equations out, the exclamation mark is pronounced "factorial", i.e. 42! is "forty-two factorial". Now we're ready to prove our result. Let's say we want to find a prime-free gap of length 1000, i.e. 1000 non-prime numbers in a row. These numbers are simply:

1001! + 2, 1001! + 3, 1001! + 4, ..., 1001! + 1001.

These numbers are clearly consecutive and there are also clearly 1000 of them. Are any of them prime? 1001! + 2 isn't prime, because both sides of the addition sign are divisible by 2: 2 obviously is, and 1001! is divisible by any integer less than 1002, because it is, by definition, 1001 x 1000 x 999 x 998 x ... x 2 x 1. So the first number in the sequence isn't prime. 1001! + 3 also isn't prime, because for the same reason both sides of the sum are divisible by 3. It should now be clear that all of the numbers in the sequence are divisible by something other than themselves, because of the way they are constructed. This isn't a proof, yet, of course, it's merely an example, but it's very easy to generalise. To find n consecutive non-primes for any value of n, you just have to consider the sequence:

(n+1)! + 2, (n+1)! + 3, (n+1)! + 4, ..., (n+1)! + (n+1).

So here we have a result that is at the opposite extreme of the conjectured infinitude of twin primes: there are an infinite number of cases where primes are spaced as far apart as you like. In the interests of showing off some pedantic preciseness that being a good mathematician necessitates, I should point out that these theorem establishes that there exists at least one sequence of n consecutive composite numbers for any n. The sequence that is explicitly constructed in the proof is not guaranteed to be the only sequence of the length, nor the first such sequence to occur along the line. This isn't really an important distinction for the purposes of demonstrating a contrast to the conjectured infinitude of twin primes, but it might be an important detail to keep in mind if we were to ever, say, use this result to attempt to prove another. But I'm digressing.

The big picture

So far we've only seen a kind of frustrating inconsistency in the distribution of the primes: sometimes we find them as close together as they can possibly get, and sometimes we find them so far apart that our minds can't really grasp the significance of the number of composites between them. We haven't seen yet anything on whereabouts these two phenomena happen along the number line or whether one happens more often than the other. The truth is that hard and fast rules about the distribution of primes are something modern mathematics is distinctly lacking. The more widely applicable a rule is, the more vague it seems to be. A sense of romance about this tantalising mix of apparent randomness and apparent order is what attracts a lot of people to the study of primes in the first place.

In the interests of completeness, having shown you some very specific results about the distribution of primes, I should talk about the prime number theorem, which is one of those results that says something kind of vague, but is more widely applicable. The formal statement of the theorem that you'll see on Wikipedia is not very accessible to non-mathematicians, so here's a rough or, as a mathematician would say, "hand waving" statement that captures the gist of it:

As the positive integer n gets larger and larger, the chances of a random number picked from somewhere close by to n being prime gets closer and closer to 1 / ln(n).

Here ln(n) is the natural logarithm of n, which, to be very vague, is just something associated with a number than you can ask your calculator to figure out for you, much like the square root of n or the sine or cosine of n, etc. This hand waving result can give us an intuitive sense for what happens to the distribution of primes as we get far out into the number line. If n = 100,000, then ln(n) is roughly 11.5, so about 1 in 12 numbers near 100,000 is prime. If n = 1,000,000, then ln(n) is roughly 13.8, so about 1 in 14 numbers near 1,000,000 is prime. At n = 10,000,000 it's 1 in 16 and at n = 100,000,000 it's in in 18. If n is a googol (that is, a 1 followed by 100 zeros), about 1 in 230 numbers is prime. So we can see that prime numbers become more spread out the further we go along the line, but only in an "on average" sense of picking random numbers "near" some point on the line. The precise details of what is going on at any particular point are something a mystery until we actually crunch the numbers and look.

As a final point of interest, for those of you who get a lot more mileage out of a picture than a pile of words and theorems, here's a graph showing the length of the gap between consecutive primes for the first 1000 such gaps. I got this data from the On-Line Encyclopedia of Integer Sequences (gaps between consecutive primes is sequence A001223). I drew the graph using gnuplot. At the very bottom left you can see a single red cross that is lower than every other cross on the graph: that represents the gap of 1 between the primes 2 and 3. There's only one cross this low down because this is the only gap of 1 that ever happens in the sequence of primes, as we discussed 1earlier. All of the red crosses on the next level up represent gaps of 2: there is one cross for every twin prime. The reason that the crosses appear to come in horizontal stripes is simply reflective of the fact that the gap between any two primes always has to be even (except in that one exceptional case of 2 and 3) - if you add an odd number to any prime other than 2 you get an even (and hence non-prime) number. At roughly n = 200 and n = 950 you can see unusually high crosses sitting above the crowd - hints at the fact we saw earlier that the gap between consecutive primes can actually become arbitrarily large.

An (almost) 150 year old mystery

Although I've said all I really want to say on the distribution of primes for this entry, I would be remiss if I wrote something about the distribution of the primes and didn't give at least a passing mention to something called the Riemann hypothesis. This is the grand daddy of unsolved problems in modern mathematics. No kidding. If you prove this hypothesis, first proposed in 1859 by Bernhard Riemann (a very important German mathematician who, amongst other things, developed mathematical ideas which later came to be important in Einstein's theory of general relativity), to be true, you will win one million US dollars from the Clay Mathematics Institute (proving the Riemann hypothesis is one of seven so-called "millennium problems" that the CMI is offering a one million dollar reward each for) and serious fame. But don't think this is easy money: the world's very best mathematicians have been considering this problem for almost 150 years and nobody has found a proof yet. Aside 1from the simple attractiveness of solving a problem nobody else could in 150 years, one of reasons that so much effort is thrown at proving the Riemann hypothesis true is that if it is true, we'll be able to use it to prove a lot of other interesting results. One of these is a result similar to the prime number theorem but much stronger which would greatly improve our understanding of the distribution of prime numbers. This is some pretty heavy mathematics, and I freely admit that I don't know enough about the connection between the Riemann hypothesis and prime distribution to talk intelligently about it here - but I know that is a very important connection and provides a lot of the motivation for actually working on the problem.

Like the twin prime conjecture, most mathematicians who have studied the Riemann hypothesis strongly suspect it is true - we just haven't found a way to prove it beyond doubt yet.

Next prime time

Next prime time I want consider a slightly more practical issue about primes, namely: how do we actually know that a number is prime? This might sound silly, but when we are talking about the search for new primes larger than any found before, it is an entirely practical question to ask. We can't take a candidate number n which might or might not be a new prime and then try dividing it by 2, then by 3, then by 4, and so until we've tried dividing it by everything up to n. If n has 12978189 digits in it (like the current largest known prime) then trial division by every number less than n would take even the fastest computer on Earth an amount of time many orders of magnitude longer than our best estimates of the age of the universe. Thus we enter the field of primality testing, which is all about clever tricks to test whether or not a number is prime as quickly as possible. Although the question this field tries to answer is conceptually simple, it turns out that the best known methods for efficient testing involve extremely advanced mathematics. I'll introduce to one of the simplest improvements on trial division and discuss its strengths and weaknesses. Please look forward to it!

Feeds