From 400a5a7ddd3adf8ef798302e60db8a0a0ff94b9f Mon Sep 17 00:00:00 2001 From: Jacob Appelbaum Date: Fri, 18 Feb 2011 16:04:23 -0500 Subject: [PATCH] Add TLS/cert normalization spec draft --- .../xxx-draft-spec-for-TLS-normalization.txt | 330 ++++++++++++++++++ 1 file changed, 330 insertions(+) create mode 100644 doc/spec/proposals/ideas/xxx-draft-spec-for-TLS-normalization.txt diff --git a/doc/spec/proposals/ideas/xxx-draft-spec-for-TLS-normalization.txt b/doc/spec/proposals/ideas/xxx-draft-spec-for-TLS-normalization.txt new file mode 100644 index 0000000000..14546dff04 --- /dev/null +++ b/doc/spec/proposals/ideas/xxx-draft-spec-for-TLS-normalization.txt @@ -0,0 +1,330 @@ +Filename: xxx-draft-spec-for-TLS-normalization.txt +Title: Draft spec for TLS certificate and handshake normalization +Author: Jacob Appelbaum +Created: 16-Feb-2011 +Status: Draft + + + Draft spec for TLS certificate and handshake normalization + + + Overview + +Scope + +This is a document that proposes improvements to problems with Tor's +current TLS (Transport Layer Security) certificates and handshake that will +reduce the distinguishability of Tor traffic from other encrypted traffic that +uses TLS. It also addresses some of the possible fingerprinting attacks +possible against the current Tor TLS protocol setup process. + +Motivation and history + +Censorship is an arms race and this is a step forward in the defense +of Tor. This proposal outlines ideas to make it more difficult to +fingerprint and block Tor traffic. + +Goals + +This proposal intends to normalize or remove easy-to-predict or static +values in the Tor TLS certificates and with the Tor TLS setup process. +These values can be used as criteria for the automated classification of +encrypted traffic as Tor traffic. Network observers should not be able +to trivially detect Tor merely by receiving or observing the certificate +used or advertised by a Tor relay. I also propose the creation of +a hard-to-detect covert channel through which a server can signal that it +supports the third version ("V3") of the Tor handshake protocol. + +Non-Goals + +This document is not intended to solve all of the possible active or passive +Tor fingerprinting problems. This document focuses on removing distinctive +and predictable features of TLS protocol negotiation; we do not attempt to +make guarantees about resisting other kinds of fingerprinting of Tor +traffic, such as fingerprinting techniques related to timing or volume of +transmitted data. + + Implementation details + + +Certificate Issues + +The CN or commonName ASN1 field + +Tor generates certificates with a predictable commonName field; the +field is within a given range of values that is specific to Tor. +Additionally, the generated host names have other undesirable properties. +The host names typically do not resolve in the DNS because the domain +names referred to are generated at random. Although they are syntatically +valid, they usually refer to domains that have never been registered by +any domain name registrar. + +An example of the current commonName field: CN=www.s4ku5skci.net + +An example of OpenSSL’s asn1parse over a typical Tor certificate: + + 0:d=0 hl=4 l= 438 cons: SEQUENCE + 4:d=1 hl=4 l= 287 cons: SEQUENCE + 8:d=2 hl=2 l= 3 cons: cont [ 0 ] + 10:d=3 hl=2 l= 1 prim: INTEGER :02 + 13:d=2 hl=2 l= 4 prim: INTEGER :4D3C763A + 19:d=2 hl=2 l= 13 cons: SEQUENCE + 21:d=3 hl=2 l= 9 prim: OBJECT :sha1WithRSAEncryption + 32:d=3 hl=2 l= 0 prim: NULL + 34:d=2 hl=2 l= 35 cons: SEQUENCE + 36:d=3 hl=2 l= 33 cons: SET + 38:d=4 hl=2 l= 31 cons: SEQUENCE + 40:d=5 hl=2 l= 3 prim: OBJECT :commonName + 45:d=5 hl=2 l= 24 prim: PRINTABLESTRING :www.vsbsvwu5b4soh4wg.net + 71:d=2 hl=2 l= 30 cons: SEQUENCE + 73:d=3 hl=2 l= 13 prim: UTCTIME :110123184058Z + 88:d=3 hl=2 l= 13 prim: UTCTIME :110123204058Z + 103:d=2 hl=2 l= 28 cons: SEQUENCE + 105:d=3 hl=2 l= 26 cons: SET + 107:d=4 hl=2 l= 24 cons: SEQUENCE + 109:d=5 hl=2 l= 3 prim: OBJECT :commonName + 114:d=5 hl=2 l= 17 prim: PRINTABLESTRING :www.s4ku5skci.net + 133:d=2 hl=3 l= 159 cons: SEQUENCE + 136:d=3 hl=2 l= 13 cons: SEQUENCE + 138:d=4 hl=2 l= 9 prim: OBJECT :rsaEncryption + 149:d=4 hl=2 l= 0 prim: NULL + 151:d=3 hl=3 l= 141 prim: BIT STRING + 295:d=1 hl=2 l= 13 cons: SEQUENCE + 297:d=2 hl=2 l= 9 prim: OBJECT :sha1WithRSAEncryption + 308:d=2 hl=2 l= 0 prim: NULL + 310:d=1 hl=3 l= 129 prim: BIT STRING + +I propose that the commonName field be generated to match a specific property +of the server in question. It is reasonable to set the commonName element to +match either the hostname of the relay, the detected IP address of the relay, +or for the relay operator to override certificate generation entirely by +loading a custom certificate. For custom certificates, see the Custom +Certificates section. + +I propose that the value for the commonName field be populated with the +fully qualified host name as detected by reverse and forward resolution of the +IP address of the relay. If the host name is in the DNS, this host name should +be set as the common name. When forward and reverse DNS is not available, I +propose that the IP address alone be used. + +The commonName field for the issuer should be set to known issuer names, +random words or omitted entirely. + +Since some host names may themselves trigger censorship keyword filters, +it may be reasonable to provide an option to override the defaults and +force certain values in the commonName field. + +Considerations for commonName normalization + +Any host name supplied for the commonName field should resolve - even if it +does not resolve to the IP address of the relay[0]. If the commonName field +does include an IP address, it should be the current IP address of the relay as +seen by other Internet hosts. + +Certificate serial numbers + +Currently our generated certificate serial number is set to the number of +seconds since the epoch at the time of the certificate's creation. I propose +that we should ensure that our serial numbers are unrelated to the epoch, +since the generation methods are potentially recognizable as Tor-related. +Instead, I propose that we use a randomly generated number that is +subsequently hashed with SHA-512 and then truncated to a length chosen at +random within a finite set of bounds. The length of the serial number should be +chosen randomly at certificate generation time; it should be bound between the +most commonly found bit lengths[1] in the wild. Random sixteen byte values +appear to be the high bound for serial number as issued by Verisign and +DigiCert. RapidSSL appears to be three bytes in length. Others common byte +lengths appear to be between one and four bytes. I propose that we choose a +byte length that is either 3, 4, or 16 bytes at certificate generation time. + +This randomly generated field may now serve as a covert channel that signals to +the client that the OR will not support TLS renegotiation; this means that the +client can expect to perform a V3 TLS handshake setup. Otherwise, if the serial +number is a reasonable time since the epoch, we should assume the OR is +using an earlier protocol version and hence that it expects renegotiation. + +As a security note, care must be taken to ensure that supporting this +covert channel will not lead to an attacker having a method to downgrade client +behavior. + +Certificate fingerprinting issues expressed as base64 encoding + +It appears that all deployed Tor certificates have the following strings in +common: + +MIIB +CCA +gAwIBAgIETU +ANBgkqhkiG9w0BAQUFADA +YDVQQDEx +3d3cu + +As expected these values correspond to specific ASN.1 OBJECT IDENTIFIER (OID) +properties (sha1WithRSAEncryption, commonName, etc) of how we generate our +certificates. + +As an illustrated example of the common bytes of all certificates used within +the Tor network within a single one hour window, I have replaced the actual +value with a wild card ('.') character here: + +-----BEGIN CERTIFICATE----- +MIIB..CCA..gAwIBAgIETU....ANBgkqhkiG9w0BAQUFADA.M..w..YDVQQDEx.3 +d3cu............................................................ +................................................................ +................................................................ +................................................................ +................................................................ +................................................................ +................................................................ +................................................................ +........................... <--- Variable length and padding +-----END CERTIFICATE----- + +This fine ascii art only illustrates the bytes that absolutely match in all +cases. In many cases, it's likely that there is a high probability for a given +byte to be only a small subset of choices. + +Using the above strings, the EFF's certificate observatory may trivially +discover all known relays, known bridges and unknown bridges in a single SQL +query. I propose that we ensure that we test our certificates to ensure that +they do not have these kinds of statistical similarities without ensuring +overlap with a very large cross section of the internet's certificates. + +Other certificate fields + +It may be advantageous to also generate values for the O, L, ST, C, and OU +certificate fields. The C and ST fields may be populated from GeoIP information +that is already available to Tor to reflect a plausible geographic location +for the OR. The other fields should contain some semblance of a word or +grouping of words. It has been suggested[2] that we should look to guides for +certificate generation that use OpenSSL as a reasonable baseline for +understanding these fields, as well as other certificate properties. + +Certificate dating and validity issues + +TLS certificates found in the wild are generally found to be long-lived; +they are frequently old and often even expired. The current Tor certificate +validity time is a very small time window starting at generation time and +ending shortly thereafter, as defined in or.h by MAX_SSL_KEY_LIFETIME +(2*60*60). + +I propose that the certificate validity time length is extended to a period of +twelve Earth months, possibly with a small random skew to be determined by the +implementer. Tor should randomly set the start date in the past or some +currently unspecified window of time before the current date. This would +more closely track the typical distribution of non-Tor TLS certificate +expiration times. + +The certificate values, such as expiration, should not be used for anything +relating to security; for example, if the OR presents an expired TLS +certificate, this does not imply that the client should terminate the +connection (as would be appropriate for an ordinary TLS implementation). +Rather, I propose we use a TOFU style expiration policy - the certificate +should never be trusted for more than a two hour window from first sighting. + +This policy should have two major impacts. The first is that an adversary will +have to perform a differential analysis of all certificates for a given IP +address rather than a single check. The second is that the server expiration +time is enforced by the client and confirmed by keys rotating in the consensus. + +The expiration time should not be a fixed time that is simple to calculate by +any Deep Packet Inspection device or it will become a new Tor TLS setup +fingerprint. + +Custom Certificates + +It should be possible for a Tor relay operator to use a specifically supplied +certificate and secret key. This will allow a relay or bridge operator to use a +certificate signed by any member of any geographically relevant certificate +authority racket; it will also allow for any other user-supplied certificate. +This may be desirable in some kinds of filtered networks or when attempting to +avoid attracting suspicion by blending in with the TLS web server certificate +crowd. + +Problematic Diffie–Hellman parameters + +We currently send a static Diffie–Hellman parameter, prime p (or “prime p +outlaw”) as specified in RFC2409 as part of the TLS Server Hello response. + +The use of this prime in TLS negotiations may, as a result, be filtered and +effectively banned by certain networks. We do not have to use this particular +prime in all cases. + +While amusing to have the power to make specific prime numbers into a new class +of numbers (cf. imaginary, irrational, illegal [3]) - our new friend prime p +outlaw is not required. + +The use of this prime in TLS negotiations may, as a result, be filtered and +effectively banned by certain networks. We do not have to use this particular +prime in all cases. + +I propose that the function to initialize and generate DH parameters be +split into two functions. + +First, init_dh_param() should be used only for OR-to-OR DH setup and +communication. Second, it is proposed that we create a new function +init_tls_dh_param() that will have a two-stage development process. + +The first stage init_tls_dh_param() will use the same prime that +Apache2.x [4] sends (or “dh1024_apache_p”), and this change should be +made immediately. This is a known good and safe prime number (p-1 / 2 +is also prime) that is currently not known to be blocked. + +The second stage init_tls_dh_param() should randomly generate a new prime on a +regular basis; this is designed to make the prime difficult to outlaw or +filter. Call this a shape-shifting or "Rakshasa" prime. This should be added +to the 0.2.3.x branch of Tor. This prime can be generated at setup or execution +time and probably does not need to be stored on disk. Rakshasa primes only +need to be generated by Tor relays as Tor clients will never send them. Such +a prime should absolutely not be shared between different Tor relays nor +should it ever be static after the 0.2.3.x release. + +As a security precaution, care must be taken to ensure that we do not generate +weak primes or known filtered primes. Both weak and filtered primes will +undermine the TLS connection security properties. OpenSSH solves this issue +dynamically in RFC 4419 [5] and may provide a solution that works reasonably +well for Tor. More research in this area including the applicability of +Miller-Rabin or AKS primality tests[6] will need to be analyzed and probably +added to Tor. + +Practical key size + +Currently we use a 1024 bit long RSA modulus. I propose that we increase the +RSA key size to 2048 as an additional channel to signal support for the V3 +handshake setup. 2048 appears to be the most common key size[0] above 1024. +Additionally, the increase in modulus size provides a reasonable security boost +with regard to key security properties. + +The implementer should increase the 1024 bit RSA modulus to 2048 bits. + +Possible future filtering nightmares + +At some point it may cost effective or politically feasible for a network +filter to simply block all signed or self-signed certificates without a known +valid CA trust chain. This will break many applications on the internet and +hopefully, our option for custom certificates will ensure that this step is +simply avoided by the censors. + +The Rakshasa prime approach may cause censors to specifically allow only +certain known and accepted DH parameters. + +Appendix: Other issues + +What other obvious TLS certificate issues exist? What other static values are +present in the Tor TLS setup process? + +[0] http://archives.seul.org/or/dev/Jan-2011/msg00051.html +[1] http://archives.seul.org/or/dev/Feb-2011/msg00016.html +[2] http://archives.seul.org/or/dev/Feb-2011/msg00039.html +[3] To be fair this is hardly a new class of numbers. History is rife with + similar examples of inane authoritarian attempts at mathematical secrecy. + Probably the most dramatic example is the story of the pupil Hipassus of + Metapontum, pupil of the famous Pythagoras, who, legend goes, proved the + fact that Root2 cannot be expressed as a fraction of whole numbers (now + called an irrational number) and was assassinated for revealing this + secret. Further reading on the subject may be found on the Wikipedia: + http://en.wikipedia.org/wiki/Hippasus + +[4] httpd-2.2.17/modules/ss/ssl_engine_dh.c +[5] http://tools.ietf.org/html/rfc4419 +[6] http://archives.seul.org/or/dev/Jan-2011/msg00037.html