Application Level Cryptography
{LANG_NAVORIGIN} Encryption
Ashish Anand
07/05/2005
Scenario:
- Mails sent from a browser (using Yahoo! etc.) are broken down into TCP packets that contain the body as plain text.
- Instant messages sent using clients (MSN, Yahoo! messenger etc.) are also sent as plaintext.
Vulnerability:
- TCP packets can be intercepted by monitoring the originating interface or by having access to any of the routers that they pass through. Privacy loses its charm when even your ISP can easily monitor all your data.
- Sad but true, inspite of using decent hardware (3Com HiperARC Dial-In PPP RASs’) ISP’s like MTNL have proved to show oblivion towards elementary security measures.
My work:
- I successfully managed to access most of their routers and monitor each interface, thus enabling me to view mails & chat sessions. The best part is that they can’t sue me for this!
- In the process of coding a Linux based server in C++ using secure TCP/IP sockets
- Developing a TSR application with Windows-DOS inter-portability to make available cipher-text in Windows Clipboard
- Key generation by measuring keyboard latency & tracking mouse movements. Technique is immune to physical attacks. Statistical study verifies randomness of technique used. Working on making key generation even more secure.
- “Private key” transfer using “public key”
- New “Session Key” after random time intervals, generated from the IM
- Modified IDEA & DESX standards to develop a simpler, faster, yet secure ciphering technique using whitening, transposition, block & stream ciphering, and compression algorithms.
- Resistance to brute force attacks.
Real time working of all concepts would be demonstrated…
“If I take a letter, lock it in a safe, hide the safe
somewhere in New York, then tell you to read the
letter, that’s not security. That’s obscurity. On
the other hand, if I take a letter and lock it in a
safe, and then give you the safe along with the
design specifications of the safe and a hundred
identical safes with their combinations so that
you and the world’s best safecrackers can study
the locking mechanism – and you still can’t open
the safe and read the letter – that’s security!” [1]
Fig. 1. Basic Cryptography [7]
In addition to providing confidentiality,
cryptography is often asked to do other jobs [1]:
Authentication: It should be possible for the
receiver of a message to ascertain its origin; an
intruder should not be able to masquerade as
someone else.
Integrity: It should be possible for the receiver
of a message to verify that it has not been
modified in transit; in intruder should not be able
to substitute a false message for a legitimate one.
Nonrepudiation: A sender should not be able to
falsely deny later that he sent a message.
This paper is inclined primarily towards the
integrity aspect of a successful cryptosystem.
Algorithms & Keys
A cryptographic algorithm, also called a
cipher,
is the mathematical function used for encryption
and decryption. If the security of an algorithm is
based on keeping the way that algorithm works a
secret, it is a
restricted algorithm. Restricted
algorithms are woefully inadequate by today’s
standards. A large or changing group of users cannot use them, because every time a user
leaves the group, everyone else must switch to a
different algorithm. If someone accidentally
reveals the secret, everyone must change their
algorithm.
Modern cryptography solves this problem with a
key. All of the security in key based algorithms
is based in the key (or keys); none is based in the
details of the algorithm. This means that the
algorithm can be
published and analyzed.
Products using the algorithm can be mass
produced. It doesn’t matter if an eavesdropper
knows your algorithm; if he doesn’t know your
particular key, he can’t read your messages.
“If privacy is outlawed, only outlaws will have
privacy…”
Choosing an algorithm
When it comes to evaluating and choosing
algorithms, people have several alternatives:
- They can chose a published algorithm, based
on the belief that a published algorithm has been
scrutinized by many cryptographers; if no one
has broken the algorithm yet, then it must be
pretty good
- They can trust a manufacturer, based on the
belief that a well-known manufacturer has a
reputation to uphold and is unlikely to risk that
reputation by selling equipment or programs
with inferior algorithms.
- They can trust a private consultant, based on
the belief that an impartial consultant is best
equipped to make a reliable evaluation of
different algorithms.
- They can trust the government, based on the
belief that the government is trustworthy and
wouldn’t steer its citizens wrong.
- They can write their own algorithms, based on
the belief that their cryptographic ability is
second-to-none and that they should trust nobody
but themselves.
How I decided to design my own algorithm
considering the above mentioned:
- The DES (Digital Encryption Standard) and
IDEA (International Data Encryption Standard)
are the most popular patented algorithms around
today. I chose a much simpler and thus faster
implementation of a combination of what these
algorithms do, including whitening,
transposition, block and stream ciphering
resulting in the evolution of an inherited version
of the DES variations (DESX) and IDEA used in
PGP. Thus besides originality, elements of
something widely published make the project
qualitative.
- Trusting a manufacturer would mean
purchasing a hardware implementation.
Practically speaking, one man alone (me); cannot
design an implementation of this magnitude, so
the second option was ruled out.
- Hiring a private consultant seems feasible, but
cryptography is not my profession, I haven’t
dedicated my life to it either. You’re probably
insane if you’re expecting someone to come up
with a radical new idea all by himself and I’d
rather not say why!
- The Indian Government is hardly involved in
such projects on a scale comparable to the US or the Europeans. Besides, this project is not
intended to be implemented on a scale so large
that it would require the intervention of a nations
Government.
- Writing my own algorithm…sounds fun,
though the decision is certainly not based on the
belief that my ability is second-to-none and that I
don’t trust anyone but myself!
“Don’t worry that you’re reinventing the wheel
all over again, that’s what learning is all
about…!”
Compression & Encryption
Using a data compression algorithm together
with an encryption algorithm makes sense for
two reasons:
- Cryptanalysis relies on exploiting
redundancies in the plaintext; compressing a file
before encryption reduces these redundancies
- Encryption is time -consuming; compressing a
file before encryption speeds up the entire
process
The important thing to remember is to compress
before encryption. If the encryption algorithm is
any good, the cipher-text will not be
compressible; it will look like random data. (This
makes a reasonable test of an encryption
algorithm; if the cipher-text can be compressed,
then the algorithm probably isn’t very good).
Table 1. Observations upon encrypting a sample text file without any
compression using the program I developed
The catch here is that the input text consisted of
the source code of my program, which has about
1000 line breaks. Combine that with multiple
repetitions of keywords like
cout, printf, get,
read, write and so on, and you have more
redundancies than one could imagine. More the
redundancies, greater is the compression.
In a way, this means that there are no bounds to
the range of the input character set (by no
bounds, I mean that it ranges across the
entire
ASCII character set; codes 0-255). Now that’s at
least about
200 different symbols for any
conventional input stream. But after the second
round of my block ciphering algorithm, I’m
limiting the output cipher text to an ASCII
character set ranging from codes 33 – 132, that’s
just about
100 different symbols. This leads to
one observation and one question:
There would be a
lot of character repetitions in
the resulting cipher text. Imagine representing
16,947 (~17,000) characters consisting of 200
different symbols using a character set of just
100 symbols. No wonder the size of the cipher
text is almost twice as that of the plain text!
Which one of the following ideal cases would
lead to greater compression?
- Compressing an input plain text of 200
characters in which each symbol is
unique (because it uses the 0-255 range,
with some exceptions)
- Compressing an input plain text of 200
characters in which at least 100 symbols
are unique (33-132) and each symbol
occurs exactly twice
If you couldn’t already guess, it’s the latter.
Better still, a practical example would verify that
symbol occurrences would actually be more than
twice, thus leading to even great amounts of
compression.
What conventional block ciphers don’t do is
limit their output to a smaller character set. And
that’s exactly what I’ve done. Thus, I’m
purposely compressing after encryption, rather
than the other way round. One may argue that if
the entire algorithm is intended to be made
public, then compressing the data wouldn’t add
to security as an eavesdropper could uncompress
it to obtain the cipher text with all the
redundancies. Now that would be undesirable.
To overcome this issue, I decided to treat the
compressed data with a transposition function
that would take the private key as an argument.
Thus without the key, it would be impossible to
uncompress the cipher-text.
IDEA - Block Cipher Algorithm
The first incarnation of the IDEA cipher, by
Xuejia Lai and Kames Massey, surfaced in 1990.
It was called PES (Proposed Encryption
Standard). After subsequent strengthening of the
algorithm, it was renamed to IDEA (International
Data Encryption Algorithm) in 1992. IDEA is
patented in Europe and the United States; the
patent is held by Ascom-Tech AG.
IDEA’s key length is 128 bits. Assuming that a
brute force attack is the most efficient, it would
require 2
128 (10
38) encryptions to recover the key.
Design a chip that can test a
billion keys per
second and throw a
billion of them at the
problem, and it will still take 10
13 years – that’s
longer than the age of the universe. An array
of 10
24 such chips can find the key in a day, but
there aren’t enough silicon atoms in the universe
to build such a machine.
This remarkable observation about this algorithm
prompted me to devise a software
implementation of something similar, but
something that would work much faster, since
IDEA uses 8 rounds of XOR and modular
addition and multiplication operations along with
52 sub keys derived from the 128 bit key. I cut it
down to just one round of XOR operations,
while using just 2 sub keys derived from a single
10 bit key. Though increasing the key length to
about 32 or 64 bits wouldn’t hurt, I shall stick to
just one round of XOR operations. (Inspite of
this simplification, encryption seems to remain a
slow process for comparatively large amounts of
data, as well as theoretically much less secure.
But my implementation is obviously not
intended to aim at achieving a Patent or
anything, and the outputs produced from my
program will appear secure enough to be
implemented on a personal basis as well as
within internal networks of an organization).
As already stated, one may keep adding rounds
and rounds of complexity to your algorithm, but
the power of your cryptosystem depends more on
how you generate the key.
Some concepts involved
Transposition Ciphers [Discussed later]
Whitening
Whitening is the name given to the technique of
XORing some key material with the input to a
block algorithm, and XORing some other key
material with the output. This was first done in
the DESX variant developed by RSA Data
Security, Inc.
Measuring Keyboard Latency
People’s typing patterns are both random and
nonrandom. They are nonrandom enough that
thy can be used as a means of identification, but
they are random enough that they can be used to
generate random bits. Here’s what I did:
Measure the time between successive keystrokes,
then take the least significant bits of those
measurements. These bits are going to be pretty
random. This technique may not work on a
UNIX system, since the keystrokes pass through
filters and other mechanisms before they get to
the program, but it will work just fine in a
DOS/Windows environment.
Next, find the sum of these random numbers.
Divide it by another random number between 1 –
11 generated by the C++ internal randomize()
function. If this number is greater than 5, then
divide the sum by this number, else multiply the
sum with it.
Calculate the 10-bit binary of the result and use it
as the private key.
Other sources of obtaining random data [1]:
- The sector number, time of day, and
seek latency for every disk operation
- Actual mouse position
- Number of current scan line of monitor
- Contents of the actually displayed
image
- Sizes of FATs, kernel tables, and so on
- CPU Load
- Arrival times of network packets
- Input from a microphone (noise)
“Random Number Generators in your Compiler
are NOT a source of randomness…!”
Statistics
During the program testing phase, the random
key generator's results at each run were recorded,
based on the typing patterns of a number of
different users. 5 different users were made to
generate 20 keys each, and the total number of
keys (100), were entered into a database for
further analysis.
Each user's data set was filtered out to obtain the
number of repetitions in the 20 keys generated.
The efficiency of my technique was obtained by
dividing the number of unique keys generated
with the total number of keys generated. Individual efficiencies for each user's data were
calculated, along with the average efficiency of
the entire key generation algorithm. Some users
were made to type different strings each time,
while others were asked to monotonously repeat
the same string each time.
Table 2. Key Generation Statistics - If all 100 items are filtered, 23 repetitions are found. This implies an average efficiency of 77%
PGP – Pretty Good Privacy
“If all the personal computers in the world—260
million—were put to work on a single PGPencrypted
message, it would still take an
estimated 12 million times the age of the
universe, on average, to break a single
message.”
PGP is an e-mail security program written by
Phil Zimmermann, based on the IDEA algorithm
for encryption of plaintext and uses the RSA
Public Key algorithm for encryption of the
private key.
Fig. 2. The details of how PGP works, is out of the scope of this document [6]
PGP uses a pass phrase to encrypt your private
key on your machine. You use the pass phrase to
decrypt and use your private key. A pass phrase
should be hard for you to forget and difficult for
others to guess. It should be something already
firmly embedded in your long-term memory,
rather than something you make up from scratch.
Why? Because if you forget your pass phrase,
the game’s over! Your private key is totally and
absolutely useless without your pass phrase and
nothing can be done about it. PGP is
cryptography that will keep major governments
out of your files. It will certainly keep you out of
your files, too!
Sounds pretty neat! Top that up with the fact that
it’s all OpenSource, which has its own plethora
of advantages which we don’t want to get into
right now.
E-Mail Link
Your IP address will be sent with this e-mail