tor/doc/spec/proposals/141-jit-sd-downloads.txt

Filename: 141-jit-sd-downloads.txt
Title: Download server descriptors on demand
Version: $Revision$
Last-Modified: $Date$
Author: Peter Palfrader
Created: 15-Jun-2008
Status: Draft

1. Overview

  Downloading all server descriptors is the most expensive part
  of bootstrapping a Tor client.  These server descriptors currently
  amount to about 1.5 Megabytes of data, and this size will grow
  linearly with network size.

  Fetching all these server descriptors takes a long while for people
  behind slow network connections.  It is also a considerable load on
  our network of directory mirrors.

  This document describes proposed changes to the Tor network and
  directory protocol so that clients will no longer need to download
  all server descriptors.

  These changes consist of moving load balancing information into
  network status documents, implementing a means to download server
  descriptors on demand in an anonymity-preserving way, and dealing
  with exit node selection.

2. What is in a server descriptor

  When a Tor client starts, the first thing it will try to get is a
  current network status document: a consensus signed by a majority
  of directory authorities.  This document is currently about 100
  Kilobytes in size, though it will grow linearly with network size.
  This document lists all servers currently running on the network.
  The Tor client will then try to get a server descriptor for each
  of the running servers.  All server descriptors currently amount
  to about 1.5 Megabytes of downloads.

  A Tor client learns several things about a server from its descriptor.
  Some of these it already learned from the network status document
  published by the authorities, but the server descriptor contains them
  again in a single statement signed by the server itself, not just by
  the directory authorities.

  Tor clients use the information from server descriptors for
  different purposes, which are considered in the following sections.

  #three ways:  One, to determine if a server will be able to handle
  #this client's request; two, to actually communicate or use the server;
  #three, for load balancing decisions.
  #
  #These three points are considered in the following subsections.

2.1 Load balancing

  The Tor load balancing mechanism is quite complex in its details, but
  it has a simple goal: The more traffic a server can handle the more
  traffic it should get.  That means the more traffic a server can
  handle the more likely a client will use it.

  For this purpose each server descriptor has bandwidth information
  which tries to convey a server's capacity to clients.

  Currently we weigh servers differently for different purposes.  There
  is a weight for when we use a server as a guard node (our entry to the
  Tor network), there is one weight we assign servers for exit duties,
  and a third for when we need intermediate (middle) nodes.
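
  As a rough illustration of the goal stated above, a client could pick
  a server with probability proportional to its advertised bandwidth.
  The sketch below is not Tor's actual path selection code: the name
  pick_weighted and the sample nicknames are made up for this example,
  and it ignores the separate guard/exit/middle weights.

```python
import random

def pick_weighted(bandwidths, rng=random):
    """Pick one server with probability proportional to its bandwidth.

    `bandwidths` maps a (hypothetical) nickname to bytes per second.
    """
    names = list(bandwidths)
    weights = [bandwidths[name] for name in names]
    if sum(weights) <= 0:
        raise ValueError("no usable bandwidth")
    # random.choices draws with replacement according to the weights.
    return rng.choices(names, weights=weights, k=1)[0]
```

  With {"fast": 900, "slow": 100} the server "fast" should be chosen
  for roughly nine out of ten circuits.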

2.2 Exit information

  When a Tor client wants to exit to some resource on the internet it will
  build a circuit to an exit node that allows access to that resource's
  IP address and TCP Port.

  When building that circuit the client can make sure that the circuit
  ends at a server that will be able to fulfill the request because the
  client already learned of all the servers' exit policies from their
  descriptors.

2.3 Capability information

  Server descriptors contain information about the specific versions of
  the Tor protocol they understand [proposal 105].

  Furthermore the server descriptor also contains the exact version of
  the Tor software that the server is running and some decisions are
  made based on the server version number (for instance a Tor client
  will only make conditional consensus requests [proposal 139] when
  talking to Tor servers version 0.2.1.1-alpha or later).

2.4 Contact/key information

  A server descriptor lists a server's IP address and TCP ports on which
  it accepts onion and directory connections.  Furthermore it contains
  the onion key (a short lived RSA key to which clients encrypt CREATE
  cells).

2.5 Identity information

  A Tor client learns the digest of a server's key from the network
  status document.  Once it has a server descriptor this descriptor
  contains the full RSA identity key of the server.  Clients verify
  that 1) the digest of the identity key matches the expected digest
  it got from the consensus, and 2) that the signature on the descriptor
  from that key is valid.
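
  Check 1) can be sketched as follows.  This is only an illustration
  under assumptions: it assumes the digest is SHA-1 over the encoded
  identity key and that the consensus carries it base64-encoded with
  the trailing "=" padding removed.  The function name is hypothetical,
  and check 2) (the signature) is left out.

```python
import base64
import hashlib
import hmac

def identity_matches(identity_key_der, consensus_digest_b64):
    """Compare the digest of a descriptor's identity key with the consensus."""
    # Restore the "=" padding assumed to be stripped in the consensus.
    padded = consensus_digest_b64 + "=" * (-len(consensus_digest_b64) % 4)
    expected = base64.b64decode(padded)
    actual = hashlib.sha1(identity_key_der).digest()
    # Constant-time comparison of the two digests.
    return hmac.compare_digest(actual, expected)
```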


3. No longer require clients to have copies of all SDs

3.1 Load balancing info in consensus documents

  One of the reasons why clients download all server descriptors is for
  doing proper load balancing as described in 2.1.  In order for
  clients to not require all server descriptors this information will
  have to move into the network status document.

  Consensus documents will have a new line per router similar
  to the "r", "s", and "v" lines that already exist.  This line
  will convey weight information to clients.

   "w Bandwidth=193671"

  The bandwidth number is the lesser of observed bandwidth and bandwidth
  rate limit from the server descriptor that the "r" line references by
  digest (1st and 3rd field of the bandwidth line in the descriptor).

  Authorities will cap the bandwidth number at some arbitrary value,
  currently 10MB/sec.  If a router claims a larger bandwidth an
  authority's vote will still only show Bandwidth=10000000.

  The consensus value for bandwidth is the median of all bandwidth
  numbers given in votes.  In case of an even number of votes we use
  the lower median.  (Using this procedure allows us to change the
  cap value more easily.)

  Clients should believe the bandwidth as presented in the consensus,
  not capping it again.
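
  The rules in this section can be sketched as below.  The function
  names are made up for illustration; the three arguments of
  vote_bandwidth stand for the three fields of the descriptor's
  bandwidth line, of which only the 1st and 3rd are used.

```python
BANDWIDTH_CAP = 10_000_000  # 10MB/sec, the current cap named in the text

def vote_bandwidth(rate, burst, observed, cap=BANDWIDTH_CAP):
    """Value an authority puts in its vote for one router."""
    # `burst` (2nd field) is accepted but deliberately unused.
    return min(min(observed, rate), cap)

def consensus_bandwidth(votes):
    """Low median of the bandwidth numbers given in the votes."""
    ordered = sorted(votes)
    if not ordered:
        raise ValueError("no votes")
    # For an even count this index selects the lower of the two middles.
    return ordered[(len(ordered) - 1) // 2]
```

  For example, the votes 100, 200, 300 and 400 give a consensus value
  of 200.  Clients use that number as is.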

3.2 Fetching descriptors on demand

  As described in 2.4 a descriptor lists IP address, OR- and Dir-Port,
  and the onion key for a server.

  A client already knows the IP address and the ports from the consensus
  documents, but without the onion key it will not be able to send
  CREATE/EXTEND cells for that server.  Since the client needs the onion
  key it needs the descriptor.

  If a client only downloaded a few descriptors in an observable manner
  then that would leak which nodes it was going to use.

  This proposal suggests the following:

  1) when connecting to a guard node for which the client does not
     yet have a cached descriptor it requests the descriptor it
     expects by hash.  (The consensus document that the client holds
     has a hash for the descriptor of this server.  We want exactly
     that descriptor, not a different one.)

     It does that by sending a RELAY_REQUEST_SD cell.

     A client MAY cache the descriptor of the guard node so that it does
     not need to request it every single time it contacts the guard.

  2) when a client wants to extend a circuit that currently ends in
     server B to a new next server C, the client will send a
     RELAY_REQUEST_SD cell to server B.  This cell contains in its
     payload the hash of a server descriptor the client would like
     to obtain (C's server descriptor).  The server sends back the
     descriptor and the client can now form a valid EXTEND/CREATE cell
     encrypted to C's onion key.

     Clients MUST NOT cache such descriptors.  If they did they might
     leak that they already extended to that server at least once
     before.

  Replies to RELAY_REQUEST_SD requests need to be padded to some
  constant upper limit in order to conceal a client's destination
  from anybody who might be counting cells/bytes.

  RELAY_REQUEST_SD cells contain the following information:
    - hash of the server descriptor requested
    - hash of the identity digest of the server for which we want the SD
    - IP address and OR-port of the server for which we want the SD
    - padding factor - the number of cells we want the answer
      padded to.
      [XXX this just occurred to me and it might be smart.  or it might
       be stupid.  clients would learn the padding factor they want
       to use from the consensus document.  This allows us to grow
       the replies later on should SDs become larger.]
  [XXX: figure out a decent padding size]
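
  How a server might pad its reply to the requested number of cells
  is sketched below.  Nothing here is fixed by this proposal: the
  usable payload of 498 bytes per cell, the zero-byte padding, and the
  name pad_reply are all assumptions made for the example, and a real
  format would also need to tell the client where the descriptor ends.

```python
CELL_PAYLOAD = 498  # assumed usable bytes per relay cell

def pad_reply(descriptor, padding_factor):
    """Split a descriptor into exactly `padding_factor` full cells."""
    capacity = CELL_PAYLOAD * padding_factor
    if len(descriptor) > capacity:
        raise ValueError("descriptor does not fit in the padded reply")
    # Fill the unused tail with zero bytes so every reply has equal size.
    padded = descriptor.ljust(capacity, b"\0")
    return [padded[i:i + CELL_PAYLOAD] for i in range(0, capacity, CELL_PAYLOAD)]
```

  Whatever the descriptor's size, an observer counting cells sees the
  same number of cells for every reply.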

3.3 Protocol versions

  Server descriptors contain optional information of supported
  link-level and circuit-level protocols in the form of
  "opt protocols Link 1 2 Circuit 1".  These are not currently needed
  and will probably eventually move into the "v" (version) line in
  the consensus.  This proposal does not deal with them.

  Similarly a server descriptor contains the version number of
  a Tor node.  This information is already present in the consensus
  and is thus available to all clients immediately.

3.4 Exit selection

  Currently finding an appropriate exit node for a user's request is
  easy for a client because it has complete knowledge of all the exit
  policies of all servers on the network.

  The consensus document will once again be extended to contain the
  information required by clients.  This information will be a summary
  of each node's exit policy.  The exit policy summary will only contain
  the list of ports to which a node exits to most destination IP
  addresses.

  A summary should claim a router exits to a specific TCP port if,
  ignoring private IP addresses (link and site local per RFC 3330), the
  exit policy indicates that the router would exit to this port to any
  IP address with the exception of at most 2^25 single addresses (That's
  either two /8 netblocks, or one /8 and a couple of /12s or any other
  combination).

  An exit policy summary will be included in votes and consensus as a
  new line attached to each exit node.  A lack of policy should indicate
  a non-exit policy.  The line will have the format
   "p" <space> "accept"|"reject" <portlist>
  where portlist is a comma-separated list of single port numbers or
  port ranges (e.g.  "22,80-88,1024-6000,6667").  Whether the summary
  shows the list of accepted ports or the list of rejected ports depends
  on which list is shorter (has fewer elements).  In case of ties we
  choose the list of accepted ports.
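
  Building such a line from the set of ports a node is summarised as
  accepting could look like the sketch below.  It reads "elements" as
  entries of the portlist (single ports or ranges) and assumes ports
  run from 1 to 65535; both are interpretations, and the function
  names are made up.  A node accepting no ports gets no line at all.

```python
def to_ranges(ports):
    """Compress a set of ports into sorted [low, high] ranges."""
    ranges = []
    for port in sorted(ports):
        if ranges and port == ranges[-1][1] + 1:
            ranges[-1][1] = port  # extend the current range
        else:
            ranges.append([port, port])
    return ranges

def summary_line(accepted):
    """Return the "p" line for a set of accepted ports, or None."""
    accepted = set(accepted)
    if not accepted:
        return None  # lack of a line indicates a non-exit policy
    rejected = set(range(1, 65536)) - accepted
    acc, rej = to_ranges(accepted), to_ranges(rejected)
    # The shorter list wins; ties go to the accepted ports.
    keyword, chosen = ("accept", acc) if len(acc) <= len(rej) else ("reject", rej)
    items = [str(lo) if lo == hi else f"{lo}-{hi}" for lo, hi in chosen]
    return f"p {keyword} " + ",".join(items)
```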

  As with IP address, ports, timestamp, and bandwidth, a consensus
  should list the exit policy matching the descriptor digest referenced
  in the consensus document.

3.4.1 Client behaviour

  When choosing an exit node for a specific request a Tor client will
  choose from the list of nodes that exit to the requested port as given
  by the consensus document.  If a client has additional knowledge (like
  cached full descriptors) indicating that the chosen exit node will
  reject the request, then it MAY use that knowledge (or not include
  such nodes in the selection to begin with).  However, clients MUST NOT
  use nodes that do not list the port as accepted in the summary, even
  if they know from other sources (like a cached descriptor) that the
  node would exit to that address.

  An exception to this is exit enclave behaviour: A client MAY use the
  node at a specific IP address to exit to any port on the same address
  even if that node is not listed as exiting to the port in the summary.
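
  A client-side check along these lines might look as follows.  This
  is a sketch only: the function names are hypothetical, and the
  is_enclave flag stands for "the destination address is the node's
  own address", which the caller is assumed to have determined.

```python
def parse_summary(line):
    """Parse a "p" line into (keyword, list of (low, high) ranges)."""
    tag, keyword, portlist = line.split(" ", 2)
    if tag != "p" or keyword not in ("accept", "reject"):
        raise ValueError("not an exit policy summary line")
    ranges = []
    for item in portlist.split(","):
        low, _, high = item.partition("-")
        ranges.append((int(low), int(high or low)))
    return keyword, ranges

def may_exit(p_line, port, is_enclave=False):
    """Decide whether a client may pick this node as exit for `port`."""
    if is_enclave:
        return True  # exit enclave exception
    if p_line is None:
        return False  # no "p" line: treated as a non-exit
    keyword, ranges = parse_summary(p_line)
    listed = any(low <= port <= high for low, high in ranges)
    return listed if keyword == "accept" else not listed
```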

4. Migration

4.1 Consensus document changes

  The consensus will need to include
    - bandwidth information (see 3.1)
    - exit policy summaries (3.4)

  A new consensus method (number TBD) will be chosen for this.

5. Future possibilities

  This proposal still requires that all servers have the descriptors of
  every other node in the network in order to answer RELAY_REQUEST_SD
  cells.  These cells are sent when a circuit is extended from ending at
  node B to a new node C.  In that case B would have to answer a
  RELAY_REQUEST_SD cell that asks for C's server descriptor (by SD digest).

  In order to answer that request B obviously needs a copy of C's server
  descriptor.  The RELAY_REQUEST_SD cell already has all the info that
  B needs to contact C so it can ask about the descriptor before passing it
  back to the client.