Network Security Library
Javascript Feeds    RSS Feed    Security Dashboard    SearchSecurity.com
About | Contact | Advertise | Site Map
Print Printer Friendly      PDF PDF Version
intrusion detection E-mail      Save Save This

MD5 To Be Considered Harmful Someday


{LANG_NAVORIGIN} Encryption
By: Dan Kaminsky, 03/04/2005



Strikeback: Traitor Tracing



Security is a battle between attackers and defenders, and defenders do not necessarily need to cede the ground of cryptographic exploitation to attackers alone. A research path known as "Strikeback" examines the mechanisms by which a defender under attack can exploit weaknesses in his attackers to defend his systems.

There are strikeback implications to this MD5 research.

The proof of concept for Stripwire was simple: Take an audio file encoded in the MPEG-1 Layer 3 (MP3) format. Append it to both vec1 and vec2. Note that the agglomerated files both play flawlessly and identically. This wasn't a surprising result; MP3 is a bitstream format and as such is highly resistant to so-called "junk data" infecting the datastream. But this was the first proof that two files with bit-differences and the same MD5 hash could still function correctly given a cooperative execution harness, and led to the basic design of Stripwire.

It also yielded an MP3 file that contained an extra bit of information -- whether vec1 or vec2 had been prepended. A single bit is not useful. But we are not constrained to a single bit.

Wang has disclosed that, given an arbitrary MD5 system state, her implementation is capable of finding a multicollision-capable set after approximately one hour of computation with one doppelganger computable every fifteen minutes after that. It is well within the realm of feasiblity to compute 16 sequential multicollision sets, each adapting to the MD5 state emitted by the previous, with 256 (or 28)computed doppelgangers for each set. Now, instead of the single bit of information represented by the choice between vec1 and vec2, we have 8 bits of information per prepended block -- and there are 16 blocks. This yields space for a 128 bit signature, and things just got much more interesting.

MP3


Consider the problem of tracing the path of an MP3 file as it winds its way through a peer to peer network. (Peer to Peer networks are, of course, just a special case of a distributed content network of which there are innumerable legitimate uses -- Google, for one.) Since MP3 files are error-resilient, one could connect a custom client to the network that prepended a unique 128-bit serial number to every song transmitted. Every second-level copy would now be individually tagged and it'd be possible to trace every file on the network back to the second-level host that retrieved it from the custom file server. Adding a deviating serial number to each file transmitted would normally cause problems as both the search algorithms and file integrity checks on P2P networks tend to be MD5 centric. But since the serial number is represented in a form that MD5 is blind to, nothing fails -- except perhaps some of the opacity of the P2P network.

That's not to say there aren't countermeasures. The serial number is easy to detect, can be trivially stripped, is simple to alter, and can be rendered inoperable simply by switching the network to another hashing algorithm. But even here there are caveats -- detection may be simple, but eliminating the serial number entirely will yield a different hash value, subtly breaking the network's ability to coalesce all identical payloads. And while it's possible for hosts on the P2P network to "mix and match" doppelganger sets from several hosts, it's relatively straightforward to identify a cryptographically secure subset of the $2^{128}$ possible serial numbers that makes it impossible for users to synthesize valid serial numbers through any other means of acquiring them from a first or second generation source. And finally peer to peer networks are still networks and as such are vulnerable to the greatest caveat of network effects: Even after faults are identified, so many nodes may depend on the faulty behavior that the value of the network is decreased more by fixing the fault than it is by suffering its continued presence.

There is one special case on P2P networks -- some designs allow a file to be acquired in pieces from several different nodes. One solution to this is to seed a file with serial numbers across its entire body, perhaps three sets every 128 kilobytes. This is a much more compute-intensive operation, though, since the multicollision sets must be computed on a per-file basis as different data will preceed each group of doppelgangers.

Executables


Barring some of the more creative and noticeably illegal designs which infect MP3 files with executable content, it's not possible for MP3-embedded signatures to yield any more evidence except for what they present by their existence on any number of hosts.

Actual executables are another story. They are generally quite full of undocumented and undocumentable functionality, much of it inserted by a compiler. (Thus the limits of auditors -- they may be able to read source, but how many can read what the compiler actually emits? Because that's what the system ultimately needs to trust.) Particularly if an executable is aggressively protected from public distribution, there can be no expectation of publically safe behavior (in fact, that's generally why aggressive protections are instituted in the first place. That and profit motives.) It would be quite irresponsible to embed code that erased hard drives or flooded networks...

But why not locate the source of the leak?

128 bits is a fair amount of capacity -- English text only takes 1.3 bits per character, compressed -- and it'd be reasonable to quadruple that if needed. Before distributing an illicitly acquired executable, an attacker is likely to test it during their packaging process. During this testing, the executable installer could be configured to collect PII (Personally Identifiable Information) from across the file system. The 128 to 512 most valuable bits of information would be locally transformed into the requisite MD5-blind series of doppelgangers, and injected back into the installer upon its exit before mass distribution could take place. The range of acquirable data is extensive -- potential sources include:
  1. Network data -- IP address, DNS name, default name server, MAC address
  2. Browser Cookies, Caches, and Password Stores -- Online Banking, Hotmail, Amazon 1-Click
  3. Cached Instant Messenger Credentials -- Yahoo, AOL IM, MSN, Trillian
  4. P2P Memberships -- KaZaA, Gnutella2
  5. Corporate Identifiers -- VPN Client Data / Logs
  6. Shipped Material -- CPU ID, Vendor ID, Windows Activation Key
  7. System Configurations -- Time Zone, Telephone API area code
  8. Wireless Data -- MAC addresses of local access points
  9. Existence Tests -- Special files in download directory
Also possible but legally problematic would be acquiring not just one hop's worth of data but watching the executable as it travels across large networks, containing identifying information for as many previous hops as possible. Capacity becomes a problem, as it does with IP's "Record Route" option, but we can handle it by dynamically reducing resolution (the RRDTool approach) or by simply keeping an overflow counter (what IP does).

This is not the first scheme assembled to uniquely tag executables. What's interesting here is that these tags are self-updating as the file is trafficked, and that the self-updating tags are difficult to detect even with dedicated file integrity checks (md5sums). In a very unique sense, this is a steganographic strategy aimed not at the human analyst but at the precise internals of the MD5 algorithm. It's quite effective.


Conclusions



The point is not that MD5 has collapsed. It hasn't. The point is that there's a very clear trend regarding the security level of MD5, and it isn't good. It is now undeniable that the selection of MD5 matters -- the constraint that deployed implementations of the one-way hash primitive be functionally identical has been broken. The failures detected are not merely algorithmic or theoretical, rather new capabilities above and beyond what the primitive specifies are made available by the selection of MD5. It is not expected that this paper will cause a precipitous decline in the use of MD5; that will probably occur when a means of silently introducing single-bit errors in arbitrary (rather than chosen) MD5 payloads is discovered.

But in the security community, we tend to complain about the "phase change" nature of our systems that suddenly collapse from secure to insecure on the discovery of a "zero day" exploit. The phase change for MD5 isn't here yet, but it will come, someday. Nobody should be surprised when that day arrives.


Footnotes



1 Since Ice and Fire deviate by only 6 bits, it may be possible for a particularly adept auditor to brute-force convert vec2 to vec1 and thus acquire the correct key to examine the AES encrypted payload. If bit deviations can be at arbitrary positions, this becomes a 245 attack; if Wang's attack only allows a few locations to be involved in multicollisions, (as it appears to do) converting vec2 to vec1 may be a near-trivial operation. Of course if textbf{these particular} hash collisions are employed cracking the encrypted payload requires only a copy of this paper.


References



[1] Antoine Joux, “Multicollisions in iterated hash functions. applications to cascaded constructions.,” 2004.
[2] XiaoyunWang, Dengguo Feng, Xuejia Lai, and Hongbo Yu, “Collisions for hash functions md4, md5, haval-128 and ripemd,” Cryptology ePrint Archive, Report 2004/199, 2004, http://eprint.iacr.org/.
[3] Hans Dobbertin, “Cryptanalysis of md5 compress,” 1996, http://citeseer.ist.psu.edu/68442.html.
[4] Philip Hawkes, Michael Paddon, and Gregory G. Rose, “Musings on the wang et al. md5 collision,” Cryptology ePrint Archive, Report 2004/264, 2004, http://eprint.iacr.org/.
[5] Stefan Lucks, “Design principles for iterated hash functions,” Cryptology ePrint Archive, Report 2004/253, 2004, http://eprint.iacr.org/.
[6] David Maynor, “Trust no one, not even yourself, or the weak link might be your build tools,” in The Black Hat Briefings USA, 2004, http://www.blackhat.com/presentations/bh-usa-04/bh-us-04-maynor.pdf.
[7] Ken Thompson, “Reflections on trusting trust,” Commun. ACM, vol. 27, no. 8, pp. 761–763, 1984.
[8] Andy Green Peter Barth, Jeff Mears, “Project b (hacking) overview,” 2004, http://www.xbox-linux.org/docs/projectboverview.html.
















E-Mail Link

Your IP address will be sent with this e-mail
From e-mail to e-mail



11964 Views
4.5/5 Rating
4 Votes
Newest
Highest Rated
Most Viewed
Reference

Javascript Feeds
RSS (New Papers)
Security Dashboard

About SecurityDocs
Advertise
Contact

Valid HTML 4.01!
Valid CSS!


Unless otherwise noted, all paper copyrights are owned by the author. The rest copyright 2003-2005 TechTarget

Privacy : Contact