Once again I have had the good fortune to have been invited to contribute a cryptographic puzzle as part of the fund raising efforts for Bletchley Park and the Museum of Computing during the ACCU conference.
Electronic computers advanced cryptography enormously. Richard Harris sets a challenge, and finds a solution. Electronic computers advanced cryptography enormously. Richard Harris sets a challenge, and finds a solution. Last year, if you recall, I designed a puzzle based upon the Enigma machine which could be broken with pencil and paper in 15 to 30 minutes once the weaknesses in the rotor mechanism and the crib in the message had been spotted.
As was the case with that puzzle, this one has been designed to be possible to solve with pencil and paper alone and includes a bonus question that cannot be answered if the problem is simply brute-forced.
So that you too may enjoy the puzzle, it is repeated below followed by its historical justification, its solution and the names of the master cryptographers who solved it at the conference.
The challenge
Encoding
The enemy are using a 32 character alphabet encoded as 5 bit unsigned binary numbers. The # character is a control code indicating that the following character should be interpreted as its numerical value rather than as a letter or punctuation. The full table of character mappings is given below.
00000000001111111111222222222233 01234567890123456789012345678901 #abcdefghijklmnopqrstuvwxyz ?!.,
Encryption
The encryption scheme used by the enemy is a symmetric key stream cipher in which each plaintext number is bitwise exclusive or-d with the key to create the ciphertext number after which the plaintext number is added to the key (discarding any bits above the 5th) to create a new key.
An in-place C implementation of this scheme is given in Listing 1.
void encrypt(unsigned char *s, size_t n, unsigned char key) { while(n--) { unsigned char c = *s; *s++ ^= key; key += c; key &= 0x1f; } } |
Listing 1 |
Plaintext
We know that the enemy prefix their messages with a single 5 bit number (i.e. one character) representing the message's priority. We also know that they foolishly postfix their messages with the date in the form
DDMMM
where DD is the day and MMM is the month, represented by a number and a three character abbreviation respectively.
The enemy encrypt each message with a randomly chosen key. This message key is repeated twice in a 2 character header at the start of the plaintext which is itself encrypted using a daily key.
Example
The plaintext
kkbflee#dfeb
represents a priority 2 message of 'flee' sent on the 4th February that will be encrypted with the message key 'k' and the daily key using
assert(*plaintext == *(plaintext+1)); encrypt(plaintext+2, 10, *plaintext); encrypt(plaintext, 2, daily_key);
If the daily key is 'd' this yields the ciphertext
odik,zaimkvz
Ciphertext
Decrypting the following ciphertext will reveal the number of the box in which to deposit your ticket.
ynb#zuybzmot.vt!xbhaxh
Bonus question
If we add to the ciphertext and the information we have about the structure and content of the plaintext a set of guessed key and/or plaintext characters at specific locations in the plaintext, and if we assume that these guessed characters are added to this set in the worst possible order that is logically consistent with a cryptanalysis, how large must this set be to guarantee that e correctly deduce where to deposit the ticket?
The historical justification
After the electro-mechanical rotor-based cryptosystems like the Enigma, the next big development in commercial cryptography were the digital electronic cryptosystems of the 1970s.
Perhaps the most successful of these was the Data Encryption Standard, or DES, developed by IBM as a candidate for a government encryption standard proposed by the US National Bureau of Standards and not officially retired until 2005.
This was a block cipher, in which the plaintext is broken up into relatively large blocks, each of which is algorithmically combined with a fixed key.
In contrast to block ciphers are stream ciphers which bear a much greater resemblance to their forebears, being polyalphabetic substitution ciphers operating on single characters, or even on single bits, of the message at a time. In doing so, they are attractive for telecommunications, for which the size of the message can rarely be determined in advance and the hardware may have limited volatile memory in which to store the message data.
These ciphers draw their inspiration from the only provably unbreakable encryption algorithm: the one-time pad. To operate a one-time pad, the sender and receiver of encrypted messages must first exchange identical sequences of random characters. To encrypt a message the sender iterates through both its characters and those in the sequence of random characters, combining them; typically with x-or. To decrypt a message the receiver performs the same operation on the ciphertext. Crucially, each must discard the random characters as they are used; if they do not do so their messages can be broken with statistical analyses.
The difficulty in exchanging the random characters makes the one-time pad unattractive for almost all applications. It is therefore extremely tempting to replace the random sequence with a pseudo-random sequence so that the two parties need only exchange the much shorter seed. The resulting cryptosystem is no longer provably unbreakable, but may be practically unbreakable.
Provided the pseudo-random sequence of characters is not too predictable, that is.
The solution
To decrypt the message we apply the same algorithm, although since the plaintext character is recovered after the x-or we don't need to store it to add to the key. An in-place C implementation is given in Listing 2.
void decrypt(unsigned char *s, size_t n, unsigned char key) { while(n--) { *s ^= key; key += *s++; key &= 0x1f; } } |
Listing 2 |
Trivially, we could brute force the message and discover the plaintext with a worst case of 32 guessed characters. The real question is whether or not we can decode it with fewer.
Noting that the enemy postfix their messages with the date we know that the 5th character from the end must be '#', represented by a 0. Since the ciphertext is generated by x-oring the plaintext with the key we can deduce that the key for this character must be 'b'. Running the last 5 characters through the decryption function yields
#jmar
indicating that the message was sent on the 10th March.
We might assume that the final characters of the message represent the number of the box, in which case the 7th character from the end must also be a '#', meaning that the key must be '!'. Using this key to decrypt the last 7 characters of the message yields
#e#jmar
which is entirely consistent with what we already know about the plaintext. This is reasonably convincing evidence that the ticket should be deposited in box 5 and that we need just one guessed character.
Reasonably convincing, but wrong.
In last year's Enigma challenge, the predictable structure of the message's postfix signature was the crib that ultimately allowed its decryption. As such it was too tempting for me to resist using it as a red herring in this puzzle.
This time the crib is, in fact, the repeated message key.
The trick is to recognise that for single bits, addition is exactly the same as x-or. Specifically, for any pair of integers, we have
(x+y)&1 == (x^y)&1
Denoting the ith bits of the daily key, the message key and the jth character of the plaintext as di, mi and Tj,i respectively, we have
From this we can deduce that the least significant bit of the message key is given by
We can perform a similar calculation for the second least significant bits, but must take into account the fact that there may have been a carry when the daily key was added to the message key.
Fortunately, we can retrieve the carry bit c1 from the information we have already extracted from the message.
Denoting the ith carry bit as ci we can generalise our formulae to bits other than the least significant.
giving us a recursive procedure for extracting the bits of the keys.
The first two characters of the ciphertext are
y = 25 = 11001 n = 14 = 01110
Applying this procedure yields
Giving us daily and message keys of
01010 = 10 = j 10011 = 19 = s
Using these to decrypt the message yields
ssqdrop no. three#jmar
meaning that the ticket should be placed in box number 3 and that it was not necessary to guess any characters.
And finally...
Congratulations to Duncan Gary Duke, Andrew Kemp, Per Liboriussen, Callum Piper and Martin Turnock who successfully cryptanalysed the Crypto Challenge.
To everyone who took part, and to everyone who donated to Bletchley and the Museum of Computing, we extend our deepest gratitude.
Overload Journal #98 - August 2010 + Programming Topics
Browse in : |
All
> Journals
> Overload
> 98
(7)
All > Topics > Programming (877) Any of these categories - All of these categories |