Journal Articles
Browse in : |
All
> Journals
> CVu
> 282
(9)
All > Topics > Programming (877) Any of these categories - All of these categories |
Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.
Title: Encryption
Author: Martin Moene
Date: 08 May 2016 08:54:04 +01:00 or Sun, 08 May 2016 08:54:04 +01:00
Summary: Kevin Highley implements a common technique for secure communication.
Body:
Keeping secrets has been important for thousands of years. Modern communication systems and computers have given us new ways to store and exchange information, and new ways for the wrong people to read it. Encryption is a tool to tackle such problems, but what is it, and as a programmer, how do I do it?
Conventional encryption is about securing communications. A message is converted to a form that only the intended recipient can read. Often this depends on information that only the sender and recipient know, used as a key to ‘unlock’ the message.
It is assumed that any intercepting party knows about the encrypting methods. Obviously, if an enemy doesn’t know a message exists they won’t be looking for it. Even if they catch a message, we hope they won’t recognize it as such, or know how to read it. The only safe assumption, however, is that the enemy knows everything they could possibly know. Security through obscurity sometimes works, but you never know if someone even more obscure is watching you.
Simple systems
A message is a series of symbols, for our purposes a string of characters. Their meaning can be disguised by re-arranging their order and/or by substituting for individual characters with other characters.
For example, let us take the message “What shall we do tonight?†A simple re-ordering is achieved by arranging the letters in an array horizontally, and reading them out vertically. 25 characters, so a 5x5 array.
W h a t s h a l l w e d o t o n i g h t ?
Coded, the message becomes “Ws oihhw gaaethtl ot ldn?â€
Replacing each character with a letter from a certain distance further along in the alphabet (or for us, the ascii code) gives a code reputedly used by Julius Caesar. If the offset is +1:
“What shall we do tonight?â€
becomes
“Xibu!tibmm!xf!ep!upojhiu@â€
Both types were used in ancient times. The Greeks supposedly spirally wrapped a strip of leather round a stick, and wrote along the stick. The message could only be read if wrapped around the same diameter stick. This gives a version of the re-ordering code above. The stick diameter key corresponding to the dimensions of the array. In Julius Caesar’s code, the key is how many letters to shift along.
These approaches yield exciting looking messages, but would be easy to unravel.
I can see that ‘!’ Is probably space and ‘@’ a punctuation mark. On a longer message, and with some idea about what the message might be about, and what language it is in analysing letter frequencies will soon reveal all. A more elaborate character substitution system than to just shift n characters along would not help. The commonest character in the output still represents the commonest character in the input.
We can make the code look much more exotic by using strange fonts, Egyptian hieroglyphs or made up special symbols. So long as each letter in our plain text is always represented by the same symbol it is relatively simple to break the code. In normal English text ‘ ’ and ‘e’ occur much more often than ‘x’ and ‘q’. Count the number of times each symbol in the encoded text appears, and replace each symbol with the letter that occurs with the most similar frequency in English (or whatever language you think the message is in). This will often immediately give you something that is almost readable, needing only a few letter allocations to be swapped about. This gets easier with larger messages, and messages in which you can guess that certain words or people’s names are likely.
A less obviously structured re-ordering of the characters would be better, and we need a substitution system in which the same character might be represented by different characters each time it appears.
Computers and keys
Those older systems had keys of only a few letters or digits, there are only 26 letters in the alphabet, and only a limited number of practical stick sizes. In current terms they are 4 or 5 bit codes, they have only 24 or 25 possible keys. Such small key encryption systems can be broken by a sufficiently powerful modern computer. Simply run a version of the decrypting algorithm with every possible key. We can devise systems with more complex keys, but while a human codebreaker might be daunted by a system that required thousands of tries to find the right key, computers thrive on big numbers. WWII code crackers would have been delighted with the power of a modern laptop, but would have reminded you: if you’ve got this, what have the big boys got? A coding system that would take, even with the world’s top computers, years to break seems secure enough, very little intelligence stays useful for more than months. However – are you sure you know just how fast those top machines are? Are you sure there is no easier back-door way to unpick your encryption, which could improve decryption rates by orders of magnitude?
How big does a key need to be? Current super computers are chasing 100 petaflops (1017) and there are about 108 seconds in three years. A reasonably lucky code breaker might find the right value after trying 10% of all the possible values. Perhaps a secret, clever, dedicated code breaking system can test a key value in the time the machines we do know about take to do a floating point operation (note how fast colossus was compared with general purpose machines of the late 40s). We need keys giving more than 1026 possibilities, more still if we suspect there may be any cracks in the encryption method. 1026 ~= 287. Even with the more reasonable assumptions, that it might take 1000 operations to test a key, and that results are needed within a day, we still need a key with > 1020 possibilities, that’s about 267. The argument about 64 bit versus 128 bit encryption becomes clear. Even a perfect code system with no flaws and back doors might be breakable by brute force if it uses 64 bit or smaller keys. For the brute force decoder, as the key size goes up the problem becomes selecting probable messages (e.g. ones with recognizable words in them) from the vast number of possibles generated.
A large key makes brute force (try everything) approaches expensive in computer time and thus impractical. We need an algorithm that makes it hard work to test each key. We don’t want the breaker to be able to reject a key after just a few characters, we want no output until they have decoded most of the message. We also need to avoid anything that lets them home in on a solution. Trying to find our way to a particular level on a smooth slope is much easier than trying to find a stone of a particular weight from a pile of similar stones by weighing one at a time. If we were trying to find the zeroes of a function it is much easier if the function has smooth derivatives, and much harder if it seems to be a random step function.
One time pads
One way of defeating the frequency analysis attack is to use Julius Caesar’s substitution method, but have a different offset for each character in the message. The key is now a list of numbers the same size as the message, each number indicating the amount to shift the corresponding character in the message. The large key size is a problem: it is too big for people to remember, and if written down and passed around, it is possible for an enemy to learn it. Another weakness is that the same pattern is used for every message. If many messages have the same format (e.g. starting with the sender’s name and the date) the same pattern will appear at the start of each coded message, providing something for the code breaker to attack. Adding a re-ordering step makes patterns harder to spot, but not impossible. A solution is to change the list of offset numbers frequently, although this leads to problems of how you distribute the large and numerous keys. If a new table of numbers is used for every message this becomes a ‘one time pad’ system, somewhat cumbersome, but completely secure so long as the particular table used has no pattern and stays secret.
The one time pad system was widely used, despite the problems of key distribution. Many tables of numbers were prepared, all different and serial numbered. A field agent would be issued with some. When contacting base they would use a table to encode their message, and then destroy the sheet containing that table. The only copy of that particular table, identified by it’s serial number, now exists at base, so only they can decode the message.
The Enigma
Another way of changing the coding for each character was enigma type machines. They had a series of wheels, each of which generated a substitute for whichever letter you fed to it. Several wheels in series providing a more elaborate pathway. Pressing a key on the machine caused a letter to light up on the output. The clever bit was that the wheels were moved on after each letter had been pressed, generating a new sequence of connections, and a new coding for the next letter. The key for this system involved knowing the wiring patterns of all the wheels, which wheel was in each slot in the machine, and where each wheel has been rotated to at the start of the message. There were also jumpers that controlled other variables. All in all a fiendishly complicated device. The story of how these codes were broken at Bletchley Park is well worth a read and visit if you don’t already know it.
Other methods
There are other ways of sending secret messages. Some are variations on the substitution and re-arrangement types. Public key encryption is a system where the key used to encrypt is different to the one needed to decrypt. It is important for much of web security. As far as I know it depends upon being able to find the prime factors of very large numbers, which is a difficult problem on it’s own. Code systems where symbols stand for concepts rather than characters are hard to break. They overlap with the problems of trying to understand ancient languages. The code talkers of WW2 and the book code, as used in a Sherlock Holmes story, did not need a computer, nor do slang, jargon and the ever evolving languages of youth, designed to keep parents in the dark. These I can ignore in the next section, on using a computer for encryption. As for quantum methods, well, I just don’t know how they work.
Book code: Message is sent as a series of number triads, each representing a word from a book selected by page:line:word. If needed, number quads can represent individual letters. The key here is the book to be used.
Code talkers: Messages were sent as conversations between members of a native American tribe in their own language. There were only a few speakers of the language left, and it was not available in a written form, so there was little chance of an enemy understanding it.
To be pedantic most of what we have been looking at should be referred to as ciphers rather than codes. Ciphers tend to have a 1:1 relationship between plaintext character counts and disguised text counts, codes can have complex ideas represented by short sets of symbols.
Programmers do it with a computer
Fortunately for the code writer (if not for big brother) it is simple to implement, scrambling, one time pad, and enigma type codes on almost any computer.
My programs frequently do the same thing to all the elements of an array. All the arrays are the same size, 256 bytes. To save writing the same code repeatedly:
#define Ri for(i=0;i<=255;i++)
Table (aka wheel) of size 256, and contains one each of each possible value. Why?
I am assuming the enemy knows everything, so I might as well use sizes that are convenient for computing. In assembler, I used 8 bit characters, pointers, and arithmetic. 256 eliminates the need for range checking, an 8 bit pointer can’t get out of a 256 array. All different values reduces the key size from 256256 ~= 10616 to a mere 256! ~= 10506 – still a big number. By keeping all the numbers and merely re-arranging each pass I hope to avoid a wheel ever collapsing to all zeros, or other problem pattern. An all different 256 array can be used as a scrambling tool as well as a one time pad tool.
Characters are often stored on computer as 8 bit values. Those 8 bit values could also be thought of as numbers in the range 0–255. Using 8 bit unsigned arithmetic, if I add numbers in the range 0–255 to the numbers representing each character in my message I will get a new collection of numbers in the range 0–255. With 8 bit arithmetic, overflows are discarded.
If the plain text is in an array uint8_t a[256]
and our random numbers key in uint8_t x[256]
then Ri a[i] += x[i];
provides the encryption step, a[]
now contains the encrypted message. a[]
and x[]
need to be unsigned char
or uint8_t
to get the right overflow behaviour. That's it! unbreakable encryption in one statement.
Part of the security problem is to ensure that the machines you are using are not already infested with some key grabber or screen shot program that records the plain text before you encrypt it. This is easier to check with less software layers between you and the hardware. The code examples were first imagined in assembler, to run on very minimal machines. This was changed because A, My friends no longer use such machines, and B, few find assembler easy to read, and the point of writing the code was to illustrate an idea. The current program in minimal C running on the console (stdin
/stdout
) should run on many machines. I want you to modify and compile for yourself, at this level of security code, don’t trust anything where you haven’t seen and understood the source. For better security... am I paranoid, or is that Windows 10 watching me? Run your encrypt and decrypt programs on a small machine that is not connected to any network or other machines, and transfer the encrypted file via a memory stick to the machine that sends it.
If I later subtract the same numbers, I will get back my original character values, underflows are also discarded. If all the numbers added had been the same we would have had Julius Caesar’s code, but as the numbers added are all different, and nearly random (no patterns to find and exploit), we have a strong code. The key is now a table of nearly random numbers the same size as the message.
Decrypt is just Ri a[i] -= x[i];
The random number table approach is good, but the table must change frequently. If we make our table of random numbers completely new for each message we have a ‘one time pad’ system. If we were to move to a different place in the table after each character we have an enigma-like system.
We need random number tables. Assuming the random generator has been seeded, the code in Listing 1 generates one.
uint8_t a,b,z; int i; Ri x[i] = i; // Fill x array with values 0 - 255 for (i=0; i<2000; i++) { // Swap random pairs // of values 2000 times a = rand(); b = rand(); // to give all possible z = x[a]; // 8 bit numbers, but in a random x[a] = x[b]; // order. Like Eric Morecombe's x[b] = z; // music "All the right notes, but } // not necessarily in the right // order." |
Listing 1 |
It might be better to restrict rand()
to produce only numbers in the range 0–255, here it seems to work, presumably by taking only the bottom 8 bits. There are probably many better ways, but it is important to be your own carpenter when making random number tables. The only way to read a one time pad system is to know the tables, if you copy some else’s tables or their generating code that might just be discoverable. It is also unwise to rely on the system supplied random number generator, some are flawed. Using the random number indirectly as here may insulate you from some generator foibles.
With this sort of random number table containing one each of the numbers 0–255 an order scrambling routine is very simple to implement.
With x[]
the random table, o[]
the plain text, and a[]
the scrambled text:
To scramble, Ri a[i] = o[x[i]];
and to unscramble again, Ri o[x[i]] = a[i];
The tools we have so far would enable us to produce a secure email encrypting program. Generate a few million random number tables, store them on your machine, copy onto a flash drive or DVD and give it to the intended recipient. Convert your message to an 8 bit unsigned character string, add a random number table, then use the same table to scramble. Send the encrypted array, and tell recipient which random table to use to decrypt it.
Potential problem?
If we re use the random numbers, even if changed daily, a repeatedly sent message that was all one character, such as blank, would be converted to an offset version of the table. That may not seem to be a problem, the enemy doesn’t know that you have sent an image of today’s pattern. But, imagine something resting on a keyboard generating twenty pages of white space, even a dumb cracker might notice twenty consecutive identical messages, and figure out what had happened. Such mishaps have been the clue that cracks a code.
So far so good, but the receiver may fall into the wrong hands, and the enemy obtain the random number flash drive, and details of your encrypt/decrypt algorithm.
We can dispense with the flash drive by making our machines scramble the random number array after each message. We need to do exactly the same scrambling at both transmitter and receiver. This has the advantage that even if a machine is captured it can only decode the next message, the tables used for previous messages are gone. The one thing that transmitter and receiver have in common is the message, so use that to scramble the table.
Random table in x[]
encrypted message in a[]
temporary z
, all uint8_t
(see Listing 2). The same code is used at both ends after encrypting or decrypting a message.
Ri { z = x[i]; x[i] = x[a[i]]; x[a[i]] = z; } |
Listing 2 |
The system now changes the random table every message. A single stage provides fairly solid encryption, the system is behaving almost as a one time pad. An attacker with access to the receiver, all the encoded and some decoded messages would still be able to work out the current settings, and, knowing the scramble algorithm, ‘unzip’ the whole chain of messages. The next stage makes that more difficult.
It would be useful to re-introduce small keys, something the user can remember, which stops an enemy immediately using a captured machine. We also need to make the decrypt more clock cycle hungry so that a captured machine cannot be quickly put back into action with brute force discovered small keys. To this end I now introduce multiple random number tables, and a more complicated encryption algorithm. Our small keys select the order in which the tables are used, and where in each table we start.
The main encryption stage uses ideas from the enigma code machine (that’s why I start calling random number tables wheels). As a character is encoded by a wheel it also moves that wheel to a new position. The process is repeated 5 times with five different wheels.
We now have 6 wheels, they are implemented as the arrays x[256][6]
, and 6 keys, stored in array k[6]
, each holding a value 0–255. The array w[6]
represents the slots in our imaginary enigma machine, so w[n]
contains the number of the wheel in slot n
. k[n]
contains the starting position for the wheel in slot n. Slot 0 has already been used with the initial encode and scramble. We count along the slots with j
and have 6 working arrays a[~][6]
. The already partly encrypted message starts in a[~][0]
, and ends up in a[~][5]
.
for(j=1; j<=5; j++) { // Main enigma loop k = key[j]; // Wheel to start position Ri { a[i][j] = // Value from wheel added a[i][j-1] + x[k][w[j]]; k = [a[i][j-1]][w[j]]; // Wheel moved to } // new position } // according to character just read
Finally the code wheels are scrambled using the intermediate stages of the above encryption, (which being ephemeral are not available to a code breaker), to control the scramble. Temporary swap value z
is again uint8_t
.
for(j=0; j<=5; j++) { Ri { z = x[i][w[j]]; x[i][w[j]] = x[a[i][j]][w[j]]; x[a[i][j]][w[j]] = z; } }
The main encrypt loop and the code wheel scrambling are separate here to make it easier to see what is happening. They could be combined into one loop to reduce memory use.
The decrypt routines are almost a mirror image of the encrypt.
I notice that the encoded message looks very different to the plain text – lots of unprintable characters. If messages are mostly printable characters then perhaps we should convert the encoded text to mainly printables. We will have to do that any way in order to use some communication channels. Or perhaps a first pass which assigns several of the 0–255 symbols to ‘e’ and ‘space’ and fewer to ‘q’ to make messages look more like noise across the whole 0-255. Anything which gives a breaker program less of a entry point is to be encouraged.
The demonstration program, available from https://github.com/KevinHighley/CryptoEntanglement, is deliberately minimal, I want it to run on almost any machine with a C compiler. Because I rely on 8 bit unsigned arithmetic overflow behaviour, be careful if you change anything, not to mix uint8_t
values with int
counters. The 8 bit variables get promoted to 16 or 32, and ‘interesting’ things happen. The demo uses two sets of x[]
arrays, xs[]
are the arrays on the source machine, xd[]
on the destination. I have a more elaborate version of the demo written using Qt widgets and C++. It works on my Linux mint setup, but I have not yet worked out how to port it to Apple, Windows and Android, which is supposed to be straight forward with Qt. This will probably have been done by the time this sees print – although I am being distracted by the classic travelling salesman problem at the moment.
Conclusion
Secure communication between devices can be achieved by the one time pad system. In the past this was restricted by the need to distribute coding pads with suitable sets of random number tables, more data than all the messages to be sent put together. The current price of memory sticks makes 16GB of random number tables generated, and physically passed between stations practical. No wonder the authorities say they only want to store information about messages, not content. The content of e-mails is potentially pretty secure anyway.
I suggest an alternative approach which doesn’t need big random number tables. Set up a few random number tables, matched on transmitter and receiver, and mutate them as part of the process of sending and receiving each message keeping Tx and Rx in step. This also has the effect of making the receiving machine capable of decoding only the current message, even with the right user keys. A captured machine cannot decode past or future messages.
The old assumption was that the enemy has access to all message traffic, but not to either end. A safer assumption is that either end may be compromised at any time, and to restrict potential damage in that event as much as possible.
Notes:
More fields may be available via dynamicdata ..