tor/doc/spec/proposals/ideas/xxx-what-uses-sha1.txt
Nick Mathewson 183b5905bb Remove some stuff from the SHA-1 paragraph.
We don't need to explain the difference between 2nd preimage and
collision: anybody who doesn't know can use wikipedia.
2009-05-08 12:52:18 -04:00


Filename: xxx-what-uses-sha1.txt
Title: Where does Tor use SHA-1 today?
Authors: Nick Mathewson, Marian
Created: 30-Dec-2008
Status: Meta
Introduction:
Tor uses SHA-1 as a message digest. SHA-1 is showing its age:
theoretical attacks for finding collisions against it get better
every year or two, and it will likely be broken in practice before
too long.
According to smart crypto people, the SHA-2 functions (SHA-256, etc)
share too much of SHA-1's structure to be very good. RIPEMD-160 is
also based on flawed past hashes. Some people think other hash
functions (e.g. Whirlpool and Tiger) are not as bad; most of these
have not seen enough analysis to be used yet.
Here is a 2006 paper about hash algorithms.
http://www.sane.nl/sane2006/program/final-papers/R10.pdf
(Todo: Ask smart crypto people.)
By 2012, the NIST SHA-3 competition will be done, and with luck we'll
have something good to switch to. But it's probably a bad idea to
wait until 2012 to figure out _how_ to migrate to a new hash
function, for two reasons:
1) It's not inconceivable we'll want to migrate in a hurry
some time before then.
2) It's likely that migrating to a new hash function will
require protocol changes, and it's easiest to make protocol
changes backward compatible if we lay the groundwork in
advance. It would suck to have to break compatibility with
a big hard-to-test "flag day" protocol change.
This document attempts to list everything Tor uses SHA-1 for today.
This is the first step in getting all the design work done to switch
to something else.
This document SHOULD NOT be a clearinghouse of what to do about our
use of SHA-1. That's better left for other individual proposals.
Why now?
The recent publication of "MD5 considered harmful today: Creating a
rogue CA certificate" by Alexander Sotirov, Marc Stevens, Jacob
Appelbaum, Arjen Lenstra, David Molnar, Dag Arne Osvik, and Benne de
Weger has reminded me that:
* You can't rely on theoretical attacks to stay theoretical.
* It's quite unpleasant when theoretical attacks become practical
and public on days you were planning to leave for vacation.
* Broken hash functions (which SHA-1 is not quite yet AFAIU)
should be dropped like hot potatoes. Failure to do so can make
one look silly.
Triage
How severe are these problems? Let's divide them into these
categories, where H(x) is the SHA-1 hash of x:
PREIMAGE -- find any x such that H(x) has a chosen value
-- A SHA-1 usage that only depends on preimage
resistance
* Also SECOND PREIMAGE. Given x, find a y not equal to
x such that H(x) = H(y)
COLLISION<role> -- A SHA-1 usage that depends on collision
resistance, but the only party who could mount a
collision-based attack is already in a trusted role
(like a distribution signer or a directory authority).
COLLISION -- find any x and y such that H(x) = H(y) -- A
SHA-1 usage that depends on collision resistance
and doesn't need the attacker to have any special keys.
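To make the gap between these categories concrete, here is a minimal sketch that attacks a toy hash: SHA-1 truncated to a small number of bits. The truncation, the function names, and the message formats are all assumptions made for illustration; nothing here attacks real SHA-1. The point is only that a generic collision search needs roughly 2^(n/2) hashes for an n-bit digest, while a generic second-preimage search needs roughly 2^n.

```python
import hashlib
from itertools import count

def h(data: bytes, bits: int) -> int:
    """Toy hash (illustrative only): the first `bits` bits of SHA-1(data)."""
    full = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return full >> (160 - bits)

def find_collision(bits: int):
    """COLLISION: find any x != y with h(x) == h(y), by birthday search."""
    seen = {}  # toy digest -> first message that produced it
    for i in count():
        x = b"msg-%d" % i
        d = h(x, bits)
        if d in seen:
            return seen[d], x, i + 1  # two distinct messages, hashes computed
        seen[d] = x

def find_second_preimage(x: bytes, bits: int):
    """SECOND PREIMAGE: given x, find y != x with h(y) == h(x)."""
    target = h(x, bits)
    for i in count():
        y = b"other-%d" % i
        if y != x and h(y, bits) == target:
            return y, i + 1  # the second preimage, hashes computed

if __name__ == "__main__":
    a, b, tries = find_collision(20)
    print("collision on 20 bits after", tries, "hashes:", a, b)
    y, tries = find_second_preimage(b"msg-0", 16)
    print("second preimage on 16 bits after", tries, "hashes:", y)
```

Running it with the same digest size for both searches shows the collision search finishing after far fewer hashes than the second-preimage search, which is why the COLLISION usages below are treated as the more pressing ones.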
There is no need to put much effort into fixing PREIMAGE and SECOND
PREIMAGE usages in the near-term: while there have been some
theoretical results doing these attacks against SHA-1, they don't
seem to be close to practical yet. Fixing COLLISION<code-signing>
usages is not too important either, since anyone who has the key to
sign the code can mount far worse attacks. It would be good to fix
COLLISION<authority> usages, since we try to resist bad authorities
to a limited extent. The COLLISION usages are the most important
to fix.
Kelsey and Schneier published a theoretical second preimage attack
against SHA-1 in 2005, so it would be a good idea to fix PREIMAGE
and SECOND PREIMAGE usages after fixing COLLISION usages or where fixes
require minimal effort.
http://www.schneier.com/paper-preimages.html
Additionally, we need to consider the impact of a successful attack
in each of these cases. SHA-1 collisions are still expensive even
if recent results are verified, and anybody with the resources to
compute one also has the resources to mount a decent Sybil attack.
Let's be pessimistic, and not assume that producing collisions of
a given format is actually any harder than producing collisions at
all.
What Tor uses hashes for today:
1. Infrastructure.
A. Our X.509 certificates are signed with SHA-1.
COLLISION
B. TLS uses SHA-1 (and MD5) internally to generate keys.
PREIMAGE?
* At least breaking SHA-1 and MD5 simultaneously is
much more difficult than breaking either
independently.
C. Some of the TLS ciphersuites we allow use SHA-1.
PREIMAGE?
D. When we sign our code with GPG, it might be using SHA-1.
COLLISION<code-signing>
* GPG 1.4 and up have writing support for SHA-2 hashes.
This blog has help for converting:
http://www.schwer.us/journal/2005/02/19/sha-1-broken-and-gnupg-gpg/
E. Our GPG keys might be authenticated with SHA-1.
COLLISION<code-signing-key-signing>
F. OpenSSL's random number generator uses SHA-1, I believe.
PREIMAGE
2. The Tor protocol
A. Everything we sign, we sign using SHA-1-based OAEP-MGF1.
PREIMAGE?
B. Our CREATE cell format uses SHA-1 for: OAEP padding.
PREIMAGE?
C. Our EXTEND cells use SHA-1 to hash the identity key of the
target server.
COLLISION
D. Our CREATED cells use SHA-1 to hash the derived key data.
??
E. The data we use in CREATE_FAST cells to generate a key is the
length of a SHA-1.
NONE
F. The data we send back in a CREATED/CREATED_FAST cell is the length
of a SHA-1.
NONE
G. We use SHA-1 to derive our circuit keys from the negotiated g^xy
value.
NONE
H. We use SHA-1 to derive the digest field of each RELAY cell, but that's
used more as a checksum than as a strong digest.
NONE
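As a rough illustration of items D and G, the sketch below derives circuit key material from a negotiated secret by hashing the secret with a one-byte counter. It assumes the layout given in tor-spec.txt (K = H(K0 | [00]) | H(K0 | [01]) | ..., split into KH, Df, Db, Kf, Kb); the function and dictionary names are made up for this example, and the constants should be checked against the spec before relying on them.

```python
import hashlib

HASH_LEN = 20  # SHA-1 output size in bytes
KEY_LEN = 16   # assumed stream-cipher key size in bytes (AES-128)

def kdf_tor(k0: bytes, n_bytes: int) -> bytes:
    """Expand k0 into n_bytes as H(k0|00) | H(k0|01) | ..., truncated."""
    if n_bytes > 256 * HASH_LEN:
        raise ValueError("one-byte counter cannot produce that much output")
    out = b""
    counter = 0
    while len(out) < n_bytes:
        out += hashlib.sha1(k0 + bytes([counter])).digest()
        counter += 1
    return out[:n_bytes]

def split_circuit_keys(k0: bytes) -> dict:
    """Split the expanded material into the five values a circuit hop uses."""
    k = kdf_tor(k0, 3 * HASH_LEN + 2 * KEY_LEN)
    return {
        "KH": k[0:20],   # derivative key data, sent back in CREATED
        "Df": k[20:40],  # seed for the forward RELAY-cell digest
        "Db": k[40:60],  # seed for the backward RELAY-cell digest
        "Kf": k[60:76],  # forward encryption key
        "Kb": k[76:92],  # backward encryption key
    }

if __name__ == "__main__":
    shared_secret = b"\x01" * 128  # stand-in for g^xy, not a real DH output
    for name, value in split_circuit_keys(shared_secret).items():
        print(name, value.hex())
```

Seen this way, the hash is used as an expansion function over a secret input, which is a different kind of dependence from the collision-resistance cases listed elsewhere in this document.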
3. Directory services
[All are COLLISION or COLLISION<authority> ]
A. All signatures are generated on the SHA-1 of their corresponding
documents, using PKCS1 padding.
* In dir-spec.txt, section 1.3, it states,
"SIGNATURE" Object contains a signature (using the signing key)
of the PKCS1-padded digest of the entire document, taken from
the beginning of the Initial item, through the newline after
the Signature Item's keyword and its arguments."
So our attacker, Malcom, could generate a collision for the hash
that is signed. Thus, a second pre-image attack is possible.
Vulnerable to regular collision attack only if key is stolen.
If the key is stolen, Malcom could distribute two different
copies of the document which have the same hash. Maybe useful
for a partitioning attack?
B. Router descriptors identify their corresponding extra-info documents
by their SHA-1 digest.
* A third party might use a second pre-image attack to generate a
false extra-info document that has the same hash. The router
itself might use a regular collision attack to generate multiple
extra-info documents with the same hash, which might be useful
for a partitioning attack.
C. Fingerprints in router descriptors are taken using SHA-1.
* The fingerprint must match the public key. Not sure what would
happen if two routers had different public keys but the same
fingerprint. There could perhaps be unpredictable behaviour.
D. In router descriptors, routers in the same "Family" may be listed
by server nicknames or hexdigests.
* Does not seem critical.
E. Fingerprints in authority certs are taken using SHA-1.
F. Fingerprints in dir-source lines of votes and consensuses are taken
using SHA-1.
G. Networkstatuses refer to routers' identity keys and descriptors by their
SHA-1 digests.
H. Directory-signature lines identify which key is doing the signing by
the SHA-1 digests of the authority's signing key and its identity key.
I. The following items are downloaded by the SHA-1 of their contents:
XXXX list them
J. The following items are downloaded by the SHA-1 of an identity key:
XXXX list them too.
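For readers who want to see what "the SHA-1 of a document" and "a fingerprint" mean operationally, here is a small sketch. It assumes that a fingerprint is the SHA-1 of the encoded public key printed as upper-case hex in groups of four, and that the signed portion runs from the start of the document through the newline ending the signature item's keyword line, as quoted from dir-spec.txt above. The helper names and the sample document are hypothetical.

```python
import hashlib

def fingerprint(key_der: bytes) -> str:
    """Upper-case hex SHA-1 of an encoded key, in space-separated groups of 4."""
    digest = hashlib.sha1(key_der).hexdigest().upper()
    return " ".join(digest[i:i + 4] for i in range(0, len(digest), 4))

def signed_portion(document: bytes, signature_keyword: bytes) -> bytes:
    """Bytes from the start of the document through the newline that ends
    the first line whose keyword equals signature_keyword."""
    pos = 0
    for line in document.splitlines(keepends=True):
        pos += len(line)
        keyword = line.split(b" ", 1)[0].rstrip(b"\r\n")
        if keyword == signature_keyword:
            return document[:pos]
    raise ValueError("no signature item found")

def signed_digest(document: bytes, signature_keyword: bytes) -> bytes:
    """The SHA-1 digest that the signature would be computed over."""
    return hashlib.sha1(signed_portion(document, signature_keyword)).digest()

if __name__ == "__main__":
    # Hypothetical, heavily abbreviated descriptor, for illustration only.
    doc = (b"router example 192.0.2.1 9001 0 0\n"
           b"router-signature\n"
           b"-----BEGIN SIGNATURE-----\nAAAA\n-----END SIGNATURE-----\n")
    print(signed_digest(doc, b"router-signature").hex())
    print(fingerprint(b"not a real key"))
```

Anything that is compared, downloaded, or signed by one of these digests inherits whatever collision or second-preimage weakness the hash has.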
4. The rendezvous protocol
A. Hidden servers use SHA-1 to establish introduction points on relays,
and relays use SHA-1 to check incoming introduction point
establishment requests.
B. Hidden servers use SHA-1 in multiple places when generating hidden
service descriptors.
* The permanent-id is the first 80 bits of the SHA-1 hash of the
public key
** time-period performs calculations using the permanent-id
* The secret-id-part is the SHA-1 hash of the time period, the
descriptor-cookie, and replica.
* Hash of introduction point's identity key.
C. Hidden servers performing basic-type client authorization for their
services use SHA-1 when encrypting introduction points contained in
hidden service descriptors.
D. Hidden service directories use SHA-1 to check whether a given hidden
service descriptor may be published under a given descriptor
identifier or not.
E. Hidden servers use SHA-1 to derive .onion addresses of their
services.
* What's worse, it only uses the first 80 bits of the SHA-1 hash.
However, the rend-spec.txt says we aren't worried about arbitrary
collisions?
F. Clients use SHA-1 to generate the current hidden service descriptor
identifiers for a given .onion address.
G. Hidden servers use SHA-1 to remember digests of the first parts of
Diffie-Hellman handshakes contained in introduction requests in order
to detect replays. See the RELAY_ESTABLISH_INTRO cell. We seem to be
taking a hash of a hash here.
H. Hidden servers use SHA-1 during the Diffie-Hellman key exchange with
a connecting client.
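The identifier derivations in items B, E and F can be sketched as follows. This assumes the formulas in rend-spec.txt as I understand them: permanent-id is the first 80 bits of H(public-key), the .onion name is its base32 encoding, time-period is offset by the first byte of the permanent-id, and descriptor-id = H(permanent-id | H(time-period | descriptor-cookie | replica)). Function names and the placeholder key are hypothetical, and the field widths (4-byte time period, 1-byte replica) are assumptions to check against the spec.

```python
import base64
import hashlib
import struct

def permanent_id(public_key_der: bytes) -> bytes:
    """First 80 bits (10 bytes) of the SHA-1 of the service's public key."""
    return hashlib.sha1(public_key_der).digest()[:10]

def onion_address(public_key_der: bytes) -> str:
    """Base32 of the permanent-id: 16 characters plus the .onion suffix."""
    encoded = base64.b32encode(permanent_id(public_key_der))
    return encoded.decode("ascii").lower() + ".onion"

def time_period(now: int, perm_id: bytes) -> int:
    """Day counter, shifted per service by the first byte of permanent-id."""
    return (now + perm_id[0] * 86400 // 256) // 86400

def secret_id_part(period: int, descriptor_cookie: bytes, replica: int) -> bytes:
    """H(time-period | descriptor-cookie | replica); the cookie may be empty."""
    data = struct.pack(">I", period) + descriptor_cookie + bytes([replica])
    return hashlib.sha1(data).digest()

def descriptor_id(perm_id: bytes, secret: bytes) -> bytes:
    """H(permanent-id | secret-id-part)."""
    return hashlib.sha1(perm_id + secret).digest()

if __name__ == "__main__":
    key = b"placeholder for a DER-encoded RSA public key"  # not a real key
    pid = permanent_id(key)
    period = time_period(1230595200, pid)  # an arbitrary example timestamp
    print(onion_address(key))
    print(descriptor_id(pid, secret_id_part(period, b"", 0)).hex())
```

Note that an 80-bit identifier gives only about 2^40 work for a generic collision between two arbitrary keys, independent of any weakness in SHA-1 itself, which is the concern raised under item E.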
5. The bridge protocol
XXXX write me
A. Client may attempt to query for bridges where he knows a digest
(probably SHA-1) before a direct query.
6. The Tor user interface
A. We log information about servers based on SHA-1 hashes of their
identity keys.
COLLISION
B. The controller identifies servers based on SHA-1 hashes of their
identity keys.
COLLISION
C. Nearly all of our configuration options that list servers allow SHA-1
hashes of their identity keys.
COLLISION
D. The deprecated .exit notation uses SHA-1 hashes of identity keys.
COLLISION