An overview of the DNA cipher in the novel Upgrade by Blake Crouch (Warning: contains book spoilers)

Introduction

The novel Upgrade by Blake Crouch is a 2022 thriller set in the near future where gene editing is at the forefront of science. Miriam Ramsay is a famous geneticist who causes a catastrophe of global scale when releasing a new strain of insect in an attempt to protect crops. This disaster forces governments to outlaw the practice of gene modification.

The book follows Logan Ramsay (Miriam’s son), a federal agent who, while on a raid, is exposed to a virus that alters his DNA. Among the changes to his DNA he finds a code sequence that appears to be a hidden message. Can he decipher its meaning and find out what’s happening to him?

What is DNA?

Before we dive into the details of the cipher, let’s cover some background info. Deoxyribonucleic Acid (DNA) is a chemical with the famous double helix structure that encodes the genetic material in all living things. It was discovered in 1869 by Swiss physician Friedrich Miescher, but its function wasn’t determined until the 1950s.

The molecule resembles a twisted ladder, where the “sides” of the ladder are made up of sugars and phosphates, and the “rungs” are formed by bonded pairs of nitrogenous bases. These bases are adenine (A), guanine (G), cytosine (C), and thymine (T).

Side note: The title of the Andrew Niccol film Gattaca is created with only the letters of these DNA bases, as the film is set in a future where society has embraced eugenics.

A simplified illustration of a short section of DNA, showing the base pairs (adenine, thymine, cytosine and guanine) and the phosphate backbone

Human DNA has about 3 billion base pairs per copy of the genome (roughly 6 billion in a typical cell, which carries two copies), and it is the order of these bases that makes up our genetic code. In the book, Logan’s DNA has been altered and he is able to discover where changes have been made to this code.

As you can see in the above image, the base pairs can be ‘read’ by working along the DNA molecule.

The cipher text

The sequence Logan discovers is hidden in an area of ‘junk’ DNA, where no changes would be expected. The cipher text is as follows:

TCC CCC CCG ACC CGA CCC ACG CAC CGC ACC CCT CTC GTG GTC ACC GCA CCC ACC CGG GAC CCC ACG GGT CCC CCC CCC CCC CCC CCC CCC GAC CCG ACC CAC GCA CCG CAC CCC TGG TGT CGG TCG GTC GGT CGG ACC CCG GGA CAC CCG CAC CCC

Breaking the code

The problem here is to work out how to encode, and therefore decode, meaningful information using only 4 letters. With 3-letter sequences there are 4 × 4 × 4 = 64 combinations, more than enough for the 26 letters of the alphabet and the 10 digits. But that would make this a substitution cipher, and fairly easy to break with frequency analysis.
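To make the counting concrete, here is a minimal Python sketch that enumerates every 3-letter sequence over the four bases. The fixed triplet-to-character table it builds is purely hypothetical (the assignment order is my own choice, not anything from the book); it only illustrates what a naive one-triplet-per-character scheme would look like.

```python
from itertools import product
import string

BASES = "ACGT"

# Every ordered 3-letter sequence over the four bases: 4 ** 3 = 64 of them.
triplets = ["".join(p) for p in product(BASES, repeat=3)]

# Hypothetical fixed assignment (not from the book): the first 36 triplets
# stand for A-Z followed by 0-9, leaving 28 triplets unused.
symbols = string.ascii_uppercase + string.digits
naive_table = dict(zip(symbols, triplets))

print(len(triplets))       # 64
print(len(symbols))        # 36
print(naive_table["A"])    # AAA
```

Because every character always maps to the same triplet, counting how often each triplet appears in a long message would reveal the most common letters.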

Unlike us, the book’s protagonist has been genetically altered to enhance his cognitive and physical abilities. His IQ is now off the charts, and he is able to work this out in the course of an evening.

Logan applies frequency analysis but doesn’t get much from it. He does notice that the Ts and As never appear doubled or tripled like the Gs and Cs, and deduces that they are markers that give meaning to the sequence that follows: a T indicates that the following Gs and Cs represent a number, and an A indicates that they represent a letter. As such, the cipher text is more like this:

TCCCCCCCG ACCCG ACCC ACGC ACCGC ACCCC TC TCG TGG TC ACCGC ACCC ACCCGGG ACCCC ACGGG TCCCCCCCCCCCCCCCCCCCCCG ACCCG ACCC ACGC ACCGC ACCCC TGG TG TCGG TCGG TCGG TCGG ACCCCGGG AC ACCCGC ACCCC
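Logan’s observation is easy to check for yourself. The sketch below is my own Python, with made-up helper names. It measures the longest run of each base in the cipher text, then regroups the stream so that every group starts with a T or an A.

```python
import re
from itertools import groupby

# The cipher text exactly as printed in triplets above.
CIPHER_TRIPLETS = (
    "TCC CCC CCG ACC CGA CCC ACG CAC CGC ACC CCT CTC GTG GTC ACC GCA CCC "
    "ACC CGG GAC CCC ACG GGT CCC CCC CCC CCC CCC CCC CCC GAC CCG ACC CAC "
    "GCA CCG CAC CCC TGG TGT CGG TCG GTC GGT CGG ACC CCG GGA CAC CCG CAC CCC"
)

stream = CIPHER_TRIPLETS.replace(" ", "")


def longest_runs(seq):
    """Return the longest run of consecutive repeats for each base in seq."""
    runs = {}
    for base, group in groupby(seq):
        runs[base] = max(runs.get(base, 0), len(list(group)))
    return runs


runs = longest_runs(stream)
print(runs["T"], runs["A"])    # 1 1 (never doubled)

# Regroup: each token is a T or A marker followed by its run of Cs and Gs.
tokens = re.findall(r"[TA][CG]*", stream)
print(" ".join(tokens[:3]))    # TCCCCCCCG ACCCG ACCC
```

Joining the tokens back together gives the original stream, so the regrouping loses nothing; it only moves the spaces.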

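Logan infers that the Cs and Gs work like Roman numerals, standing for 5 and 1 respectively, and that they add up to make a number. As with IV in Roman numerals, a G placed directly before a C is subtracted rather than added. That makes

TCCCCCCCG = 5 + 5 + 5 + 5 + 5 + 5 + 5 + 1 = number 36
ACCCG     = 5 + 5 + 5 + 1 = letter 16 = P
ACGC      = 5 + (5 - 1) = letter 9 = I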

The plain text

Working through the rest of the cipher text produces the following plain text:

36POINT5625NORTH106POINT217777WEST

These are the coordinates of a fairly remote location in the Carson National Forest, north of Santa Fe, New Mexico, USA (map below). I’ll leave you to wonder what Logan finds at this location.
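The whole decoding process can be sketched in a few lines of Python. This is my own reconstruction from the cipher text and plain text in this article, not code from the book. The function names are made up, and the rule that a G directly before a C is subtracted (like the I in IV) is inferred from the fact that ACGC has to decode to I for the word POINT to appear.

```python
import re

C_VALUE, G_VALUE = 5, 1    # C behaves like Roman V, G like Roman I

CIPHER_TRIPLETS = (
    "TCC CCC CCG ACC CGA CCC ACG CAC CGC ACC CCT CTC GTG GTC ACC GCA CCC "
    "ACC CGG GAC CCC ACG GGT CCC CCC CCC CCC CCC CCC CCC GAC CCG ACC CAC "
    "GCA CCG CAC CCC TGG TGT CGG TCG GTC GGT CGG ACC CCG GGA CAC CCG CAC CCC"
)


def token_value(body):
    """Add up a run of Cs and Gs, Roman-numeral style.

    A symbol that is smaller than the one after it is subtracted
    (so GC is 4); every other symbol is added.
    """
    values = [C_VALUE if ch == "C" else G_VALUE for ch in body]
    total = 0
    for i, value in enumerate(values):
        following = values[i + 1] if i + 1 < len(values) else 0
        total += -value if value < following else value
    return total


def decode(cipher):
    """Decode a DNA cipher text; T marks a number, A marks a letter."""
    stream = "".join(cipher.split())
    out = []
    for marker, body in re.findall(r"([TA])([CG]*)", stream):
        n = token_value(body)
        out.append(str(n) if marker == "T" else chr(ord("A") + n - 1))
    return "".join(out)


print(decode(CIPHER_TRIPLETS))    # 36POINT5625NORTH106POINT217777WEST
```

Note that 36 and 106 are each stored as a single number, while the digits after each POINT are stored one at a time.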

Disadvantages of the Upgrade DNA cipher

While the cipher text in the book is relatively short, with only a handful of repeated codes, longer messages would become vulnerable to frequency analysis. Each letter or number has exactly one corresponding code (e.g. the letter E is always AC), which makes this another substitution cipher, and that would quickly become apparent in a longer message.
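Even in the short message from the book, some codes already repeat. As a rough illustration (my own Python sketch, not something from the novel), counting the marker-led codes shows the start of the pattern that frequency analysis would exploit in a longer text.

```python
import re
from collections import Counter

CIPHER_TRIPLETS = (
    "TCC CCC CCG ACC CGA CCC ACG CAC CGC ACC CCT CTC GTG GTC ACC GCA CCC "
    "ACC CGG GAC CCC ACG GGT CCC CCC CCC CCC CCC CCC CCC GAC CCG ACC CAC "
    "GCA CCG CAC CCC TGG TGT CGG TCG GTC GGT CGG ACC CCG GGA CAC CCG CAC CCC"
)

# Split the stream into codes: a T or A marker plus its run of Cs and Gs.
codes = re.findall(r"[TA][CG]*", CIPHER_TRIPLETS.replace(" ", ""))
counts = Counter(codes)

print(counts["ACCCC"])    # 4 (the letter T: POINT, NORTH, POINT, WEST)
print(counts["TCGG"])     # 4 (the digit 7)
print(counts["ACCC"])     # 3 (the letter O)
```

With a message of a few hundred characters, the most frequent A-codes would very likely line up with the most common letters of English.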

Try the Upgrade DNA cipher yourself!

I built a small app to encrypt your own plain text into a DNA sequence using Blake Crouch’s method. Use my JavaScript app here to encrypt your own messages!
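If you would rather see the idea in code, here is a minimal Python sketch of an encoder. It is not the source of the app, only an illustration of the scheme as described in this article. Several things in it are my assumptions: the function names, writing a remainder of 4 as GC (the way the book’s cipher text does for I, N and S), treating a run of digits as one number, and rejecting zero because the scheme has no obvious way to write it.

```python
import re


def number_to_bases(n):
    """Write a positive integer with C = 5 and G = 1, Roman-numeral style."""
    if n < 1:
        raise ValueError("zero and negatives have no representation in this sketch")
    fives, rest = divmod(n, 5)
    if rest == 4:
        return "C" * fives + "GC"    # subtractive form: GC = 5 - 1
    return "C" * fives + "G" * rest


def encode(plain):
    """Encode letters as A-codes and digit runs as T-codes; skip everything else."""
    out = []
    for item in re.findall(r"[A-Za-z]|\d+", plain):
        if item.isdigit():
            out.append("T" + number_to_bases(int(item)))
        else:
            out.append("A" + number_to_bases(ord(item.upper()) - ord("A") + 1))
    return "".join(out)


def in_triplets(stream):
    """Group a base stream into blocks of three for display."""
    return " ".join(stream[i:i + 3] for i in range(0, len(stream), 3))


print(in_triplets(encode("36 POINT")))
# TCC CCC CCG ACC CGA CCC ACG CAC CGC ACC CC
```

Separating digits with spaces, as in "5 6 2 5", encodes them one at a time, which is how the digits after each POINT appear in the book’s message.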

Other DNA ciphers

A quick internet search finds multiple papers discussing algorithms and methods to use DNA for the purposes of cryptography. There are even a few websites offering tools to encode and decode your own messages. I might add one to this website in the future.

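However, unlike Crouch’s cipher, these tools use the codon table, where triplets of nucleotides (e.g. GAT) encode an amino acid. Each amino acid has a one-letter abbreviation, and there are only 20 in the standard table (22 counting selenocysteine and pyrrolysine), so at most 22 letters of the alphabet can be encoded with this method. This creates a sort of substitution cipher, although several codons can stand for the same letter. An example is below: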

Plain text:   attack at dawn
Cipher text:  GCTACCACGGCATGCAAG GCGACT GACGCTTGGAAT
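The example above can be decoded with a lookup table. The sketch below is my own Python and deliberately partial: it only includes the codons that appear in this one message, mapped to the one-letter amino acid abbreviations of the standard genetic code. A real tool would need the full table. The function name is made up.

```python
# Partial codon -> one-letter amino acid table (standard genetic code),
# limited to the codons used in the "attack at dawn" example.
CODON_TO_LETTER = {
    "GCT": "A", "GCA": "A", "GCG": "A",    # alanine
    "ACC": "T", "ACG": "T", "ACT": "T",    # threonine
    "TGC": "C",                            # cysteine
    "AAG": "K",                            # lysine
    "GAC": "D",                            # aspartic acid
    "TGG": "W",                            # tryptophan
    "AAT": "N",                            # asparagine
}


def decode_codons(cipher):
    """Decode space-separated words made of 3-base codons."""
    words = []
    for word in cipher.split():
        codons = [word[i:i + 3] for i in range(0, len(word), 3)]
        words.append("".join(CODON_TO_LETTER[c] for c in codons).lower())
    return " ".join(words)


print(decode_codons("GCTACCACGGCATGCAAG GCGACT GACGCTTGGAAT"))    # attack at dawn
```

Notice that the letter a is written three different ways in this message (GCT, GCA and GCG), which is what makes simple frequency counting a little harder than with a one-to-one substitution.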

Conclusion

In this article I’ve briefly discussed what DNA is, how text can be encoded within it and how the cipher was broken in the book.

The DNA cipher in Upgrade is an interesting puzzle that borrows from Roman numerals to encode letters and numbers with only 4 symbols. I like that Crouch went his own way on this one and created something different for his cipher instead of using an existing known DNA encoder.
