Matt Blaze: A Cryptologic Mystery

18 September 2020

A Cryptologic Mystery

Did a broken random number generator in Cuba help expose a Russian espionage network?

I picked up the new book Compromised last week and was intrigued to discover that it may have shed some light on a small (and rather esoteric) cryptologic and espionage mystery that I've been puzzling over for about 15 years. Compromised is primarily a memoir of former FBI counterintelligence agent Peter Strzok's investigation into Russian operations in the lead-up to the 2016 presidential election, but this post is not a review of the book or concerned with that aspect of it.

Early in the book, as an almost throwaway bit of background color, Strzok discusses his work in Boston investigating the famous Russian "illegals" espionage network from 2000 until their arrest (and subsequent exchange with Russia) in 2010. "Illegals" are foreign agents operating abroad under false identities and without official or diplomatic cover. In this case, ten Russian illegals were living and working in the US under false Canadian and American identities. (The case inspired the recent TV series The Americans.)

Strzok was the case agent responsible for two of the suspects, Andrey Bezrukov and Elena Vavilova (posing as a Canadian couple under the aliases Donald Heathfield and Tracey Lee Ann Foley). The author recounts watching from the street on Thursday evenings as Vavilova received encrypted shortwave "numbers" transmissions in their Cambridge, MA apartment.

Given that Bezrukov and Vavilova were indeed, as the FBI suspected, Russian spies, it's not surprising that they were sent messages from headquarters using this method; numbers stations are part of time-honored espionage tradecraft for communicating with covert agents. But their capture may have illustrated how subtle errors can cause these systems to fail badly in practice, even when the cryptography itself is sound.

First, a bit of background. For at least the last sixty years, encrypted shortwave radio transmissions have been a standard method for sending messages to covert spies abroad. Shortwave radio has several attractive properties here. It covers long distances; it's possible for a single transmitter to get hemispheric or even global coverage. Shortwave radio receivers, while less common than they once were, are readily available commercially in almost every country and are not usually suspicious to possess. And while it's relatively easy to tell where a shortwave signal is coming from, its wide coverage area makes it very difficult to infer exactly who or where the intended recipients might be. Both the US (and its allies) and the Soviet Union (and its satellites) made extensive use of shortwave radio for communicating with spies during the Cold War, and enigmatic "numbers" transmissions aimed at spies continue to this day.

The encryption method of choice used by numbers stations is called a "one time pad" (OTP) cipher. OTPs have unique advantages over other encryption methods. Used properly, they are unconditionally secure; no amount of computing power or ingenuity can "break" them without knowledge of the secret key. Also, they are almost deceptively low-tech. It is possible to encrypt and decrypt OTP messages by hand with nothing more than paper and pencil and simple arithmetic. The disadvantage is that OTPs are cumbersome; you need a secret key as long as all the messages you will ever send, with no part of the key ever re-used for multiple messages. Typically, the key would be printed as a series of digits bound into a pad of paper, with each page removed after use; hence the name "one time pad". OTPs can be difficult in practice to use properly and are quite vulnerable if used improperly; more on that later.
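To make the "one time" part concrete, here is a minimal Python sketch of a pad as a stack of pages of random key digits, each of which can be torn off exactly once. The class name `OneTimePad` and its `tear_off` method are hypothetical, and Python's `secrets` module is an assumed stand-in for whatever randomness source a real intelligence service would use to print its pads.

```python
import secrets


class OneTimePad:
    """Hypothetical pad: pages of uniformly random key digits, each usable once."""

    def __init__(self, pages: int, digits_per_page: int) -> None:
        # secrets draws from the OS CSPRNG; each digit is uniform over 0-9.
        self._pages = [
            [secrets.randbelow(10) for _ in range(digits_per_page)]
            for _ in range(pages)
        ]
        self._used = set()

    def tear_off(self, page: int) -> list[int]:
        """Hand out one page of key digits and refuse any later reuse of it."""
        if page in self._used:
            raise ValueError(f"page {page} already used; never reuse key material")
        self._used.add(page)
        key = self._pages[page]
        self._pages[page] = []  # "destroy" the paper copy after use
        return key


pad = OneTimePad(pages=3, digits_per_page=50)
key = pad.tear_off(0)  # fine the first time; tear_off(0) again raises ValueError
```

The software check only models the discipline; with real paper pads, nothing but procedure stops a page from being used twice.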

The OTP messages sent to spies by shortwave radio typically consist of decimal digits broadcast in either a mechanically recorded voice or in Morse code (more recently, digital transmissions are also used) on designated frequencies at designated times, usually in four- or five-digit groups (hence the term "numbers station"). After copying and verifying a header in the message, the agent would remove the corresponding page from their secret OTP codebook and add each key digit to each corresponding message digit using modulo-10 arithmetic (without carry). The resulting "plaintext" digits are then converted to text with a simple substitution encoding (e.g., A=01, B=02, etc., although other encodings are generally used). That's all there is to it. The security of the system depends entirely on the uniqueness, unpredictability, and secrecy of the OTP codebook pad given to each agent.
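The arithmetic is simple enough to sketch in a few lines of Python. This is a toy under stated assumptions: it uses the illustrative A=01 ... Z=26 encoding mentioned above (letters only, not any real service's encoding), and it assumes the sender subtracts key digits mod 10 so that the agent's add-without-carry step recovers the plaintext. All function names are hypothetical.

```python
import string

LETTERS = string.ascii_uppercase


def encode(text: str) -> list[int]:
    """Toy substitution encoding: A=01, B=02, ..., Z=26 (letters only)."""
    digits: list[int] = []
    for ch in text.upper():
        n = LETTERS.index(ch) + 1
        digits += [n // 10, n % 10]
    return digits


def decode(digits: list[int]) -> str:
    """Invert encode() by reading the digits back two at a time."""
    return "".join(
        LETTERS[10 * tens + ones - 1]
        for tens, ones in zip(digits[0::2], digits[1::2])
    )


def encrypt(plain: list[int], key: list[int]) -> list[int]:
    """Assumed sender convention: subtract each key digit, mod 10."""
    if len(key) < len(plain):
        raise ValueError("key page is shorter than the message")
    return [(p - k) % 10 for p, k in zip(plain, key)]


def decrypt(cipher: list[int], key: list[int]) -> list[int]:
    """The agent's step: add each key digit to each message digit, mod 10, no carry."""
    if len(key) < len(cipher):
        raise ValueError("key page is shorter than the message")
    return [(c + k) % 10 for c, k in zip(cipher, key)]


# Worked example with a fixed (non-random!) key, for illustration only.
key = [3, 1, 4, 1]
ciphertext = encrypt(encode("HI"), key)  # H=08, I=09 -> [7, 7, 6, 8]
print(ciphertext, decode(decrypt(ciphertext, key)))  # [7, 7, 6, 8] HI
```

Note that "without carry" is what makes the scheme work digit by digit: 7 + 3 is written down as 0, not 10.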

To prevent "traffic analysis" that might reveal to an observer the number of active agents or the volume of messages sent to them, numbers stations typically operate on rigidly fixed schedules, sending messages at pre-determined times whether there is actually a message to be sent or not. When there is no traffic for a given timeslot, random dummy "fill" traffic is sent instead. The fill traffic should be indistinguishable to an outsider from real messages, thereby leaking nothing about how often or when the true messages are being sent. But more on this later.
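Because properly produced OTP ciphertext is itself uniformly random digits, fill traffic only does its job if it is generated the same way. A minimal Python sketch of the slot logic, assuming a CSPRNG (`secrets`) as the randomness source; the function names are hypothetical, and the padding of real messages to a standard length is omitted.

```python
import secrets
from typing import Optional


def fill_message(num_groups: int, group_size: int = 5) -> list[str]:
    """Dummy fill traffic: uniform random digits from a CSPRNG, split into groups."""
    return [
        "".join(str(secrets.randbelow(10)) for _ in range(group_size))
        for _ in range(num_groups)
    ]


def slot_contents(real_groups: Optional[list[str]], num_groups: int) -> list[str]:
    """Every scheduled slot goes out on time: real ciphertext if queued, else fill."""
    if real_groups is not None:
        return real_groups
    return fill_message(num_groups)


print(" ".join(slot_contents(None, 4)))  # e.g. four random five-digit groups
```

The design choice that matters is that both branches produce statistically identical output; any detectable difference between them hands an observer exactly the information the fixed schedule was meant to hide.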

None of this is by itself news. The existence of numbers stations has been publicly known (and tracked by hobbyists) since at least the 1960's, and OTPs are an elementary cryptographic technique known to every cryptographer. However, Strzok mentions two interesting details I'd not seen published previously and that may solve a mystery about one of the most well known numbers stations heard in North America.

First, Compromised reveals that the FBI found that during at least some of the time the illegals were under investigation, the Russian numbers intended for them were sent not by a transmitter in Russia (which might have difficulty being reliably received in the US), but relayed by the Cuban shortwave numbers station. This is perhaps a bit surprising, since the period in question (2000-2010) was well after the Soviet Union, the historic protector of Cuba's government, had ceased to exist.

The Cuban numbers station is somewhat legendary. It is a powerful station, operated by Cuba's intelligence directorate but co-located with Radio Habana's transmitters near Bauta, Cuba, and is easily received with even very modest equipment throughout the US. While its numbers transmissions have taken a variety of forms over the years, during the early 2000's it operated around the clock, transmitting in both voice and morse code. The station was (and remains) so powerful and widely heard that radio hobbyists quickly derived its hourly schedule. During this period, each scheduled hourly transmission consisted of a preamble followed by three messages, each made up entirely of a series of five digit groups (with by a brief period of silence separating the three messages). The three hourly messages would take a total of about 45 minutes, in either voice or morse code depending on the scheduled time and frequency. Every hour, the same thing, predictably right on schedule (with fill traffic presumably substituted for the slots during which there was no actual message).

If you want to hear what this sounded like, here's a recording I made on October 4, 2008 of one of the hourly voice transmissions, as received (static and all) in my Philadelphia apartment: www.mattblaze.org/private/17435khz-200810041700.mp3. The transmission follows the standard Cuban numbers format of the time, starting with an "Atención" preamble listing three five-digit identifiers for the three messages that follow, and ending with "Final, Final". In this recording, the first of the three messages (64202) starts at 3:00, the second (65852) at 16:00, and the third (86321) at 29:00, with the "Final" signoff at the end. The transmissions are, to my cryptographic ear at least, both profoundly dull and yet also eerily riveting.

And this is where the mystery I've been wondering about comes in. In 2007, I noticed an odd anomaly: some messages completely lacked the digit 9 ("nueve"). Most messages had, as they always did and as you'd expect with OTP ciphertext, a uniform distribution of the digits 0-9. But other messages, at random times, suddenly had no 9s at all. I wasn't the only (or the first) person to notice this; apparently the 9s started disappearing from messages some time around 2005.

This is, to say the least, very odd. The way OTPs work should produce a uniform distribution of all ten digits in the ciphertext. The odds of an entire message lacking 9s (or any other digit) are infinitesimal. And yet such messages were plainly being transmitted, and fairly often at that. In fact, in the recording of the 2008 transmission linked to above, you will notice that while the second and third messages use all ten digits, the first is completely devoid of 9s.
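To put "infinitesimal" in numbers: if each ciphertext digit is independently uniform, the chance that one particular digit never appears among n digits is 0.9^n. A quick Python check, assuming purely for illustration a message of 100 five-digit groups (actual message lengths varied and aren't given here):

```python
def prob_digit_absent(num_digits: int) -> float:
    """P(a given digit, e.g. 9, never appears among num_digits uniform digits)."""
    return 0.9 ** num_digits


def prob_any_digit_absent_bound(num_digits: int) -> float:
    """Union bound: P(at least one of the ten digits is missing) <= 10 * 0.9**n."""
    return min(1.0, 10 * prob_digit_absent(num_digits))


num_digits = 5 * 100  # hypothetical: 100 five-digit groups
print(f"{prob_digit_absent(num_digits):.1e}")  # 1.3e-23
print(f"{prob_any_digit_absent_bound(num_digits):.1e}")  # 1.3e-22
```

Even a short 20-group (100-digit) message gives only about 2.7e-5 for a specific digit, so seeing 9-less messages again and again rules out chance entirely.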

I remember concluding that the most likely, if still rather improbable, explanation was that the 9-less messages were dummy fill traffic and that the random number generator used to create the messages had a bug or developed a defect that prevented 9s from being included. This would be a very serious error, since it would allow a listener to easily distinguish fill traffic from real traffic, completely negating the benefit of having fill traffic in the first place. It would open the door to exactly the kind of traffic analysis that the system was carefully engineered to thwart. The 9-less messages went on for almost ten years. (If I were reporting this as an Internet vulnerability, I would dub it the "Nein Nines" attack; please forgive the linguistic muddle.) But I was resigned to the likelihood that I would never know for sure.
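The hypothesis above can be made concrete. The Python sketch below contrasts a healthy generator with a defective one; the off-by-one `randbelow(9)` is an invented stand-in, since the actual cause of the missing 9s isn't known, and the 200-digit threshold in the hypothetical `suspected_fill` check is arbitrary. The point is how little an eavesdropper needs: a digit count per message.

```python
import secrets
from collections import Counter


def healthy_fill(n: int) -> str:
    """Uniform digits 0-9, as real OTP ciphertext (and proper fill) should be."""
    return "".join(str(secrets.randbelow(10)) for _ in range(n))


def defective_fill(n: int) -> str:
    """Hypothetical bug: an off-by-one range that can only ever yield 0-8."""
    return "".join(str(secrets.randbelow(9)) for _ in range(n))


def suspected_fill(message: str, min_digits: int = 200) -> bool:
    """Flag a long message whose digits contain no 9s at all."""
    digits = [c for c in message if c.isdigit()]
    return len(digits) >= min_digits and Counter(digits)["9"] == 0


# Almost surely prints: False True
print(suspected_fill(healthy_fill(500)), suspected_fill(defective_fill(500)))
```

A healthy 500-digit message is misflagged with probability about 0.9^500, so in practice the classifier never errs on real traffic.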

And this brings us to the second observation from Strzok's book.

Compromised doesn't say anything about missing nueves, but it does mention that the FBI exploited a serious error on the part of the sender: the FBI was able to tell when messages were and weren't being sent during the weekly timeslot when the suspect couple was observed in the room where they copied traffic. Even worse (for the illegals), empty message slots correlated perfectly with times that the suspect couple was traveling and not able to copy messages. This observation helped confirm the FBI's suspicions and ultimately led to their arrest and expulsion (along with the rest of the Russian illegals network).

I suspect that Strzok simplified the story for the book and that it was the 9-less Cuban messages that they surmised to be "no message" traffic when the Cuban station was used. The FBI (or NSA) no doubt noticed the lack of 9s just as I (and others) did, and likely came to the same conclusions I did. The difference is that they were in a position to confirm the hypothesis through real-time surveillance of actual espionage suspects.

Ironically, this was not the first time that Russian/Soviet intelligence had been burned by sloppy OTP practices. The first was, more famously, the disastrous re-use of OTPs discovered and exploited in the Venona intercepts.
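Why pad reuse is fatal can be shown with the same digit arithmetic. A minimal Python sketch, assuming ciphertext = plaintext minus key (mod 10): if one page encrypts two messages, subtracting the two ciphertexts cancels the key entirely and leaves the difference of the two plaintexts, which an analyst can then attack using guesses about likely plaintext. The actual Venona effort involved far more than this (the messages were also encoded with codebooks), so treat this as the core weakness only, not a reconstruction; the digit sequences below are made up.

```python
import secrets


def encrypt(plain: list[int], key: list[int]) -> list[int]:
    """Assumed convention: ciphertext digit = plaintext digit - key digit (mod 10)."""
    return [(p - k) % 10 for p, k in zip(plain, key)]


def digit_difference(a: list[int], b: list[int]) -> list[int]:
    """Digit-wise a - b (mod 10)."""
    return [(x - y) % 10 for x, y in zip(a, b)]


p1 = [0, 8, 0, 9, 1, 4]  # made-up plaintext digits
p2 = [1, 3, 0, 5, 2, 0]
key = [secrets.randbelow(10) for _ in p1]  # one pad page, wrongly used twice

c1, c2 = encrypt(p1, key), encrypt(p2, key)
# (p1 - k) - (p2 - k) = p1 - p2: the key drops out, whatever it was.
print(digit_difference(c1, c2) == digit_difference(p1, p2))  # True
```

No amount of randomness in the key helps here; the weakness comes entirely from using it twice.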

One time pads can be a cryptographic landmine. They have a very attractive property - provable security! - but at the cost of unforgiving operational assumptions that can be hard to meet in practice. OTPs have long been a favorite of hucksters selling supposedly "unbreakable" encryption software. So remember this story next time someone tries to sell you their super-secure one-time-pad-based crypto scheme. If actual Russian spies can't use it securely, chances are neither can you.

Anyway, as they say on the radio...

FINAL
FINAL