Filename: 127-dirport-mirrors-downloads.txt
Title: Relaying dirport requests to Tor download site / website
Version: $Revision$
Last-Modified: $Date$
Author: Roger Dingledine
Created: 2007-12-02
Status: Draft

1. Overview

  Some countries and networks block connections to the Tor website. As
  time goes by, this will remain a problem and it may even become worse.

  We have a big pile of mirrors (google for "Tor mirrors"), but few of
  our users think to try a search like that. Also, many of these mirrors
  might be automatically blocked because their pages contain words that
  keyword filters look for. And lastly, we can imagine a future
  where the blockers are aware of the mirror list too.

  Here we describe a new set of URLs for Tor's DirPort that will relay
  connections from users to the official Tor download site. Rather than
  trying to cache a bunch of new Tor packages (which is a hassle in terms
  of keeping them up to date, and a hassle in terms of drive space used),
  we instead just proxy the requests directly to Tor's /dist page.

  Specifically, we should support the following two requests, where $1
  stands for the rest of the requested path:

    GET /tor/dist/$1

  and

    GET /tor/website/$1
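
  As a rough illustration of the mapping above, here is a minimal sketch
  of how a mirror might turn a DirPort request path into an upstream
  URL. The upstream base URLs and the function name are hypothetical
  (the proposal does not name them), and the ".." check is an assumption
  about what a careful mirror would want to refuse.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical upstream locations; the proposal does not give exact URLs.
UPSTREAM = {
    "/tor/dist/": "https://tor-site.example/dist/",
    "/tor/website/": "https://tor-site.example/",
}


def map_mirror_request(path: str) -> Optional[str]:
    """Return the upstream URL for a mirror request, or None if it is not one."""
    for prefix, base in UPSTREAM.items():
        if path.startswith(prefix):
            rest = path[len(prefix):]  # this is the "$1" part
            # Refuse anything that could climb out of the mirrored tree.
            if rest.startswith("/") or ".." in rest.split("/"):
                return None
            return base + quote(rest, safe="/")
    return None
```

  Any path that does not start with one of the two prefixes comes back
  as None, so the DirPort's ordinary directory requests would be handled
  as they are today.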

2. Direct connections, one-hop circuits, or three-hop circuits?

  We could relay the connections directly to the download site -- but
  this produces recognizable outgoing traffic on the bridge or cache's
  network, which will probably surprise our nice volunteers. (Is this
  a good enough reason to discard the direct connection idea?)

  Even if we don't do direct connections, should we do a one-hop
  begindir-style connection to the mirror site (make a one-hop circuit
  to it, then send a 'begindir' cell down the circuit), or should we do
  a normal three-hop anonymized connection?

  If these mirrors are mainly bridges, doing either a direct or a one-hop
  connection creates another way to enumerate bridges. That would argue
  for three-hop. On the other hand, downloading a 10+ megabyte installer
  through a normal Tor circuit can't be fun. But if you're already getting
  throttled a lot because you're in the "relayed traffic" bucket, you're
  going to have to accept a slow transfer anyway. So three-hop it is.

  Speaking of which, we would want to count this connection as
  "relayed" traffic for the purposes of rate limiting; see
  connection_counts_as_relayed_traffic() and or_conn->client_used. This
  will be a bit tricky though, because these connections will use the
  bridge's guards, and an OR connection that was recently used for the
  bridge's own client circuits is not counted as relayed.
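
  To make the rate-limiting idea concrete, here is a small sketch of
  two token buckets, where relayed traffic is charged against both the
  global bucket and a smaller relayed bucket. This is an illustration
  only, not Tor's actual implementation: the class, the function, and
  the numbers are all hypothetical.

```python
class TokenBucket:
    """A plain token bucket: refills at `rate` bytes/sec up to `burst`."""

    def __init__(self, rate: int, burst: int) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst

    def refill(self, seconds: float) -> None:
        self.tokens = min(self.burst, self.tokens + self.rate * seconds)


def bytes_allowed(wanted: int, counts_as_relayed: bool,
                  global_bucket: TokenBucket,
                  relayed_bucket: TokenBucket) -> int:
    """Grant up to `wanted` bytes and charge the appropriate buckets."""
    limit = global_bucket.tokens
    if counts_as_relayed:
        # Relayed traffic is limited by whichever bucket is emptier.
        limit = min(limit, relayed_bucket.tokens)
    granted = max(0, min(wanted, limit))
    global_bucket.tokens -= granted
    if counts_as_relayed:
        relayed_bucket.tokens -= granted
    return granted
```

  Under this model, a mirror download labeled as relayed drains the
  small bucket quickly, which is the "slow transfer" accepted above.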

3. Scanning resistance

  One other goal we'd like to achieve, or at least not hinder, is making
  it hard to scan large swaths of the Internet to look for responses
  that indicate a bridge.

  In general this is a really hard problem, so we shouldn't insist on
  solving it here. But we can note that some bridges should open their
  DirPort (and offer this functionality), and others shouldn't. Then
  some bridges provide a download mirror while others can remain
  scanning-resistant.

4. Integrity checking

  If we serve this stuff in plaintext from the bridge, anybody in between
  the user and the bridge can intercept and modify it. The bridge can too.

  If we do an anonymized three-hop connection, the exit node can also
  intercept and modify the exe it sends back.

  Are we setting ourselves up for rogue exit relays, or rogue bridges,
  that trojan our users?

  Answer #1: Users need to do PGP signature checking. Not a very good
  answer, a) because it's complex, and b) because they don't know the
  right signing keys in the first place.

  Answer #2: The mirrors could exit from a specific Tor relay, using the
  '.exit' notation. This would make connections a bit more brittle, but
  would resolve the rogue exit relay issue. We could even round-robin
  among several, and the list could be dynamic -- for example, all the
  relays with an Authority flag that allow exits to the Tor website.
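
  A minimal sketch of the round-robin idea in Answer #2, assuming the
  usual hostname.nickname.exit form of the '.exit' notation. The
  hostname and relay nicknames in the usage note are placeholders, and
  how the list of trusted exits gets chosen is left open.

```python
from itertools import cycle
from typing import Iterable, Iterator


def exit_addresses(hostname: str,
                   exit_nicknames: Iterable[str]) -> Iterator[str]:
    """Rotate forever through hostname.<nickname>.exit addresses."""
    names = list(exit_nicknames)
    if not names:
        raise ValueError("need at least one trusted exit")
    return (f"{hostname}.{nickname}.exit" for nickname in cycle(names))
```

  For example, exit_addresses("tor-site.example", ["relayA", "relayB"])
  would alternate between tor-site.example.relayA.exit and
  tor-site.example.relayB.exit on successive fetches.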

  Answer #3: The mirrors should connect to the main distribution site
  via SSL. That way the exit relay can't influence anything.

  Answer #4: We could suggest that users only use trusted bridges for
  fetching a copy of Tor. Hopefully they heard about the bridge from a
  trusted source rather than from the adversary.

  Answer #5: What if the adversary is trawling for Tor downloads by
  network signature -- either by looking for known bytes in the binary,
  or by looking for "GET /tor/dist/"? It would be nice to encrypt the
  connection from the bridge user to the bridge. And we can! The bridge
  already supports TLS. Rather than initiating a TLS renegotiation after
  connecting to the ORPort, the user should actually request a URL. Then
  the ORPort can either pass the connection off as a linked conn to the
  dirport, or renegotiate and become a Tor connection, depending on how
  the client behaves.
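
  The decision the ORPort has to make in Answer #5 could look roughly
  like the sketch below. It assumes the TLS layer can report whether
  the client started a renegotiation, and that the first decrypted
  bytes are available for inspection. The function name, the return
  labels, and the choice to close on anything unrecognized are all
  assumptions made for illustration.

```python
def dispatch(renegotiated: bool, first_bytes: bytes) -> str:
    """Decide what to do with a fresh TLS connection on the ORPort.

    Returns one of:
      "tor"     - client renegotiated, so treat it as a Tor connection
      "dirport" - client sent an HTTP GET, hand it to the dirport
      "wait"    - not enough bytes yet to tell
      "close"   - neither behavior was seen
    """
    if renegotiated:
        return "tor"
    if first_bytes.startswith(b"GET "):
        return "dirport"
    if b"GET ".startswith(first_bytes):
        # Only a prefix of "GET " (possibly empty) has arrived so far.
        return "wait"
    return "close"
```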

5. Linked connections: at what level should we proxy?

  Check out the connection_ap_make_link() function, as called from
  directory.c. Tor clients use this to create a "fake" SOCKS connection
  back to themselves, and then they attach a directory request to it,
  so they can launch directory fetches via Tor. We can piggyback on
  this feature.

  We need to decide if we're going to be passing the bytes back and
  forth between the web browser and the main distribution site, or if
  we're going to be actually acting like a proxy (parsing out the file
  they want, fetching that file, and serving it back).

  Advantages of proxying without looking inside:
    - We don't need to build any sort of HTTP support (including
      continues, partial fetches, etc.).
  Disadvantages:
    - If the browser thinks it's speaking HTTP, are there easy ways
      to pass the bytes to an HTTPS server and have everything work
      correctly? At the least, it would seem that the browser would
      complain about the cert. More generally, SSL wants to be negotiated
      before the URL and headers are sent, yet we need to read the URL
      and headers to know that this is a mirror request; so we have an
      ordering problem here.
    - Makes it harder to do caching later on, if we don't look at what
      we're relaying. (It might be useful down the road to cache the
      answers to popular requests, so we don't have to keep getting
      them again.)
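
  If we did act as a real proxy, caching popular answers might be as
  simple as the sketch below: a small least-recently-used cache keyed
  by the requested path. The class name, the fetch callback, and the
  size limit are hypothetical. Note that it never expires entries, so
  it would bring back the keeping-packages-up-to-date hassle from
  section 1 unless some freshness check were added.

```python
from collections import OrderedDict
from typing import Callable


class MirrorCache:
    """Keep the most recently requested files in memory."""

    def __init__(self, fetch: Callable[[str], bytes],
                 max_entries: int = 8) -> None:
        self._fetch = fetch  # called on a miss to get the file upstream
        self._max = max_entries
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, path: str) -> bytes:
        if path in self._entries:
            # Cache hit: mark as most recently used.
            self._entries.move_to_end(path)
            return self._entries[path]
        body = self._fetch(path)
        self._entries[path] = body
        if len(self._entries) > self._max:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return body
```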

6. Outstanding problems

  1) HTTP proxies already exist.  Why waste our time cloning one
  badly? When we clone existing stuff, we usually regret it.

  2) It's overbroad.  We only seem to need a secure get-a-tor feature,
  and instead we're contemplating building a locked-down HTTP proxy.

  3) It's going to add a fair bit of complexity to our code.  We do
  not currently implement HTTPS.  We'd need to refactor lots of the
  low-level connection stuff so that "SSL" and "Cell-based" were no
  longer synonymous.

  4) It's still unclear how effective this proposal would be in
  practice. You need to know that this feature exists, which means
  somebody needs to tell you about a bridge (mirror) address and tell
  you how to use it. And if they're doing that, they could (e.g.) tell
  you about a gmail autoresponder address just as easily, and then you'd
  get better authentication of the Tor program to boot.