2008-06-16 19:30:22 +02:00
|
|
|
Filename: 141-jit-sd-downloads.txt
|
|
|
|
Title: Download server descriptors on demand
|
|
|
|
Version: $Revision$
|
|
|
|
Last-Modified: $Date$
|
|
|
|
Author: Peter Palfrader
|
|
|
|
Created: 15-Jun-2008
|
|
|
|
Status: Draft
|
|
|
|
|
|
|
|
1. Overview
|
|
|
|
|
|
|
|
Downloading all server descriptors is the most expensive part
|
|
|
|
of bootstrapping a Tor client. These server descriptors currently
|
|
|
|
amount to about 1.5 Megabytes of data, and this size will grow
|
|
|
|
linearly with network size.
|
|
|
|
|
|
|
|
Fetching all these server descriptors takes a long while for people
|
|
|
|
behind slow network connections. It is also a considerable load on
|
|
|
|
our network of directory mirrors.
|
|
|
|
|
|
|
|
This document describes proposed changes to the Tor network and
|
|
|
|
directory protocol so that clients will no longer need to download
|
|
|
|
all server descriptors.
|
|
|
|
|
|
|
|
These changes consist of moving load balancing information into
|
|
|
|
network status documents, implementing a means to download server
|
|
|
|
descriptors on demand in an anonymity-preserving way, and dealing
|
|
|
|
with exit node selection.
|
|
|
|
|
|
|
|
2. What is in a server descriptor
|
|
|
|
|
|
|
|
When a Tor client starts, the first thing it will try to get is a
|
|
|
|
current network status document: a consensus signed by a majority
|
|
|
|
of directory authorities. This document is currently about 100
|
|
|
|
Kilobytes in size, though it will grow linearly with network size.
|
|
|
|
This document lists all servers currently running on the network.
|
|
|
|
The Tor client will then try to get a server descriptor for each
|
|
|
|
of the running servers. All server descriptors currently amount
|
|
|
|
to about 1.5 Megabytes of downloads.
|
|
|
|
|
|
|
|
A Tor client learns several things about a server from its descriptor.
|
|
|
|
Some of these it already learned from the network status document
|
|
|
|
published by the authorities, but the server descriptor contains it
|
|
|
|
again in a single statement signed by the server itself, not just by
|
|
|
|
the directory authorities.
|
|
|
|
|
|
|
|
Tor clients use the information from server descriptors for
|
|
|
|
different purposes, which are considered in the following sections.
|
|
|
|
|
|
|
|
#three ways: One, to determine if a server will be able to handle
|
|
|
|
#this client's request; two, to actually communicate or use the server;
|
|
|
|
#three, for load balancing decisions.
|
|
|
|
#
|
|
|
|
#These three points are considered in the following subsections.
|
|
|
|
|
|
|
|
2.1 Load balancing
|
|
|
|
|
|
|
|
The Tor load balancing mechanism is quite complex in its details, but
|
|
|
|
it has a simple goal: The more traffic a server can handle the more
|
|
|
|
traffic it should get. That means the more traffic a server can
|
|
|
|
handle the more likely a client will use it.
|
|
|
|
|
|
|
|
For this purpose each server descriptor has bandwidth information
|
|
|
|
which tries to convey a server's capacity to clients.
|
|
|
|
|
|
|
|
Currently we weigh servers differently for different purposes. There
|
|
|
|
is a weight for when we use a server as a guard node (our entry to the
|
|
|
|
Tor network), there is one weight we assign servers for exit duties,
|
|
|
|
and a third for when we need intermediate (middle) nodes.
|
|
|
|
|
|
|
|
2.2 Exit information
|
|
|
|
|
|
|
|
When a Tor client wants to exit to some resource on the internet it will
|
|
|
|
build a circuit to an exit node that allows access to that resource's
|
|
|
|
IP address and TCP Port.
|
|
|
|
|
|
|
|
When building that circuit the client can make sure that the circuit
|
|
|
|
ends at a server that will be able to fulfill the request because the
|
|
|
|
client already learned of all the servers' exit policies from their
|
|
|
|
descriptors.
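The check a client performs here can be sketched as follows. This is a
simplified illustration, not Tor's actual policy code: the rule layout
(action, network, port range), the names policy_allows and usable_exits,
and the "no matching rule means reject" default are all assumptions made
for this example.

```python
import ipaddress

# Hypothetical, simplified exit policy: an ordered list of
# (action, network, (low_port, high_port)) rules; the first match wins.
def policy_allows(policy, address, port):
    addr = ipaddress.ip_address(address)
    for action, network, (low, high) in policy:
        if addr in ipaddress.ip_network(network) and low <= port <= high:
            return action == "accept"
    # Assumption for this sketch: no matching rule means reject.
    return False

def usable_exits(policies, address, port):
    """Return the names of servers whose policy allows address:port."""
    return [name for name, policy in policies.items()
            if policy_allows(policy, address, port)]

# Example data (made up): serverA blocks SMTP only, serverB allows web only.
policies = {
    "serverA": [("reject", "0.0.0.0/0", (25, 25)),
                ("accept", "0.0.0.0/0", (1, 65535))],
    "serverB": [("accept", "0.0.0.0/0", (80, 80)),
                ("accept", "0.0.0.0/0", (443, 443)),
                ("reject", "0.0.0.0/0", (1, 65535))],
}
```

With all policies at hand, the client can filter candidate exits before
it builds the circuit, which is exactly what becomes hard once it no
longer holds every descriptor.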
|
|
|
|
|
|
|
|
2.3 Capability information
|
|
|
|
|
|
|
|
Server descriptors contain information about the specific versions of
|
|
|
|
the Tor protocol they understand [proposal 105].
|
|
|
|
|
|
|
|
Furthermore the server descriptor also contains the exact version of
|
|
|
|
the Tor software that the server is running and some decisions are
|
|
|
|
made based on the server version number (for instance a Tor client
|
|
|
|
will only make conditional consensus requests [proposal 139] when
|
|
|
|
talking to Tor servers version 0.2.1.1-alpha or later).
|
|
|
|
|
|
|
|
2.4 Contact/key information
|
|
|
|
|
|
|
|
A server descriptor lists a server's IP address and TCP ports on which
|
|
|
|
it accepts onion and directory connections. Furthermore it contains
|
|
|
|
the onion key (a short-lived RSA key to which clients encrypt CREATE
|
|
|
|
cells).
|
|
|
|
|
|
|
|
2.5 Identity information
|
|
|
|
|
|
|
|
A Tor client learns the digest of a server's key from the network
|
|
|
|
status document. Once it has a server descriptor this descriptor
|
|
|
|
contains the full RSA identity key of the server. Clients verify
|
|
|
|
that 1) the digest of the identity key matches the expected digest
|
|
|
|
it got from the consensus, and 2) that the signature on the descriptor
|
|
|
|
from that key is valid.
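Check 1) can be sketched as below. The sketch assumes the consensus
digest is a SHA-1 hash over the encoded identity key and that the caller
already has the key as bytes; the function name is made up for this
example. Check 2), verifying the RSA signature, needs a crypto library
and is left out.

```python
import hashlib
import hmac

def identity_digest_matches(identity_key_bytes, expected_digest_hex):
    """Check 1): does the key from the descriptor hash to the digest
    the client learned from the consensus?"""
    # Assumption: the digest is SHA-1 over the encoded key bytes.
    actual = hashlib.sha1(identity_key_bytes).hexdigest()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected_digest_hex.lower())
```

A client would reject the descriptor outright if this returns False and
would only then go on to verify the signature.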
|
|
|
|
|
|
|
|
|
|
|
|
3. No longer require clients to have copies of all SDs
|
|
|
|
|
|
|
|
3.1 Load balancing info in consensus documents
|
|
|
|
|
|
|
|
One of the reasons why clients download all server descriptors is for
|
|
|
|
doing proper load balancing as described in 2.1. In order for
|
|
|
|
clients to not require all server descriptors this information will
|
|
|
|
have to move into the network status document.
|
|
|
|
|
|
|
|
Consensus documents will have a new line per router similar
|
|
|
|
to the "r", "s", and "v" lines that already exist. This line
|
|
|
|
will convey weight information to clients.
|
|
|
|
|
|
|
|
"w Exit=41 Guard=94 Middle=543 ..."
|
|
|
|
|
|
|
|
It starts with the letter w and then contains any number of Key=Value
|
|
|
|
pairs. Values will be non-negative integers. Clients will pick
|
|
|
|
routers with a probability proportional to the number for the intended
|
|
|
|
purpose.
|
|
|
|
|
|
|
|
Clients MUST accept sums of all weights for a given purpose over all
|
|
|
|
routers in a consensus up to UINT64_MAX.
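A minimal sketch of how a client might parse such a line and pick a
router for a given purpose. The function names, the treatment of a
missing key as weight 0, and the choice to raise an error when the total
is zero or exceeds the limit are assumptions of this example, not part
of the proposal.

```python
import random

UINT64_MAX = 2**64 - 1

def parse_w_line(line):
    """Parse 'w Exit=41 Guard=94 ...' into a dict of non-negative ints."""
    fields = line.split()
    if not fields or fields[0] != "w":
        raise ValueError("not a w line")
    weights = {}
    for pair in fields[1:]:
        key, sep, value = pair.partition("=")
        if not sep or not (value.isascii() and value.isdigit()):
            raise ValueError("malformed Key=Value pair: %r" % pair)
        weights[key] = int(value)
    return weights

def pick_router(routers, purpose, rng=random):
    """Pick a router name with probability proportional to its weight.

    routers maps a router name to its parsed weight dict.
    """
    # Assumption: a router without a weight for this purpose counts as 0.
    total = sum(w.get(purpose, 0) for w in routers.values())
    if total == 0 or total > UINT64_MAX:
        raise ValueError("unusable weight total for %s" % purpose)
    target = rng.randrange(total)
    for name, w in routers.items():
        target -= w.get(purpose, 0)
        if target < 0:
            return name
```

Python integers do not overflow, so the UINT64_MAX check here only
mirrors the limit; an implementation in C would need to do the summation
in a 64-bit unsigned type.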
|
|
|
|
|
|
|
|
[XXX how do we arrive at a consensus weight?
|
|
|
|
option a) Perhaps the vote could contain the node's bandwidth, and
|
|
|
|
this could be used to calculate the weights? It's
|
|
|
|
necessary that the consensus remain a deterministic
|
|
|
|
function of the votes.
|
|
|
|
option b) Every voter assigns weights for each of the purposes
|
|
|
|
(Exit, Guard, ..) so that their total sum is some constant
|
|
|
|
X. When building a consensus we take the median for each
|
|
|
|
purpose for each router.
|
|
|
|
|
|
|
|
Option a has the disadvantage that if we want to tweak the weighting
|
|
|
|
we have to make a new consensus-method]
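Option b could be sketched roughly as follows. The data layout (one dict
per voter), the use of the low median for an even number of votes, and
the handling of routers that only some voters list are assumptions of
this example; the proposal leaves these details open.

```python
from statistics import median_low

def consensus_weights(votes):
    """Combine per-authority weights into consensus weights.

    votes is a list with one dict per voter, shaped like
    {router: {purpose: weight}}.  For every router and purpose the
    result holds the median of the voted weights.
    """
    result = {}
    routers = set().union(*(v.keys() for v in votes))
    for router in sorted(routers):
        # Assumption: only voters that list the router take part.
        listed = [v[router] for v in votes if router in v]
        purposes = set().union(*(w.keys() for w in listed))
        result[router] = {
            # median_low always returns one of the voted integers, so
            # the outcome is a deterministic function of the votes.
            p: median_low([w.get(p, 0) for w in listed])
            for p in sorted(purposes)
        }
    return result
```

Taking an actual voted value rather than an average keeps the weights
integral, which matches the non-negative integer requirement for the
"w" line.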
|
|
|
|
|
|
|
|
3.2 Fetching descriptors on demand
|
|
|
|
|
|
|
|
As described in 2.4 a descriptor lists IP address, OR- and Dir-Port,
|
|
|
|
and the onion key for a server.
|
|
|
|
|
|
|
|
A client already knows the IP address and the ports from the consensus
|
|
|
|
documents, but without the onion key it will not be able to send
|
|
|
|
CREATE/EXTEND cells for that server. Since the client needs the onion
|
|
|
|
key it needs the descriptor.
|
|
|
|
|
|
|
|
If a client only downloaded a few descriptors in an observable manner
|
|
|
|
then that would leak which nodes it was going to use.
|
|
|
|
|
|
|
|
This proposal suggests the following:
|
|
|
|
|
|
|
|
1) when connecting to a guard node for which the client does not
|
|
|
|
yet have a cached descriptor it requests the descriptor it
|
|
|
|
expects by hash. (The consensus document that the client holds
|
|
|
|
has a hash for the descriptor of this server. We want exactly
|
|
|
|
that descriptor, not a different one.)
|
|
|
|
|
|
|
|
It does that by sending a RELAY_REQUEST_SD cell.
|
|
|
|
|
|
|
|
A client MAY cache the descriptor of the guard node so that it does
|
|
|
|
not need to request it every single time it contacts the guard.
|
|
|
|
|
|
|
|
2) when a client wants to extend a circuit that currently ends in
|
|
|
|
server B to a new next server C, the client will send a
|
|
|
|
RELAY_REQUEST_SD cell to server B. This cell contains in its
|
|
|
|
payload the hash of a server descriptor the client would like
|
|
|
|
to obtain (C's server descriptor). The server sends back the
|
|
|
|
descriptor and the client can now form a valid EXTEND/CREATE cell
|
|
|
|
encrypted to C's onion key.
|
|
|
|
|
|
|
|
Clients MUST NOT cache such descriptors. If they did they might
|
|
|
|
leak that they already extended to that server at least once
|
|
|
|
before.
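The two rules above (accept only the descriptor whose hash matches the
consensus, cache it only for guards) might look like this on the client
side. The class name, the is_guard flag, and hashing the whole byte
string with SHA-1 are simplifying assumptions for this sketch.

```python
import hashlib

class DescriptorStore:
    """Hypothetical client-side store that only caches guard descriptors."""

    def __init__(self):
        self._guard_cache = {}

    def accept(self, expected_digest_hex, descriptor_bytes, is_guard):
        # Assumption: the consensus digest is SHA-1 over the descriptor.
        actual = hashlib.sha1(descriptor_bytes).hexdigest()
        if actual != expected_digest_hex.lower():
            raise ValueError("descriptor does not match consensus digest")
        if is_guard:
            # Case 1): a client MAY cache its guard's descriptor.
            self._guard_cache[actual] = descriptor_bytes
        # Case 2): descriptors fetched for an extend are used once and
        # never stored.
        return descriptor_bytes

    def cached(self, digest_hex):
        return self._guard_cache.get(digest_hex.lower())
```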
|
|
|
|
|
|
|
|
Replies to RELAY_REQUEST_SD requests need to be padded to some
|
|
|
|
constant upper limit in order to conceal a client's destination
|
|
|
|
from anybody who might be counting cells/bytes.
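One way the padding could work is sketched below. The payload size of
498 bytes per cell, the 4-byte length prefix, and zero bytes as padding
are assumptions of this example; the proposal does not fix a reply
format.

```python
RELAY_PAYLOAD_LEN = 498  # assumed usable bytes per relay cell

def pad_reply(descriptor_bytes, padded_cells):
    """Split a descriptor into exactly padded_cells fixed-size payloads."""
    # Assumption: a 4-byte length prefix lets the client strip the padding.
    body = len(descriptor_bytes).to_bytes(4, "big") + descriptor_bytes
    capacity = padded_cells * RELAY_PAYLOAD_LEN
    if len(body) > capacity:
        raise ValueError("descriptor too large for requested padding")
    body += b"\x00" * (capacity - len(body))
    return [body[i:i + RELAY_PAYLOAD_LEN]
            for i in range(0, capacity, RELAY_PAYLOAD_LEN)]

def unpad_reply(cells):
    """Recover the descriptor from the padded payloads."""
    body = b"".join(cells)
    length = int.from_bytes(body[:4], "big")
    return body[4:4 + length]
```

Because every reply occupies the same number of cells, an observer who
counts cells cannot tell a small descriptor from a large one.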
|
|
|
|
|
|
|
|
RELAY_REQUEST_SD cells contain the following information:
|
|
|
|
- hash of the server descriptor requested
|
|
|
|
- hash of the identity digest of the server for which we want the SD
|
|
|
|
- IP address and OR-port of the server for which we want the SD
|
|
|
|
- padding factor - the number of cells we want the answer
|
|
|
|
padded to.
|
|
|
|
[XXX this just occurred to me and it might be smart. or it might
|
|
|
|
be stupid. clients would learn the padding factor they want
|
|
|
|
to use from the consensus document. This allows us to grow
|
|
|
|
the replies later on should SDs become larger.]
|
|
|
|
[XXX: figure out a decent padding size]
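To make the field list concrete, here is one possible wire layout for
the RELAY_REQUEST_SD payload. The proposal does not specify field sizes
or order; the 20-byte digests, the IPv4-only address, the 1-byte padding
factor, and the function names are all assumptions of this sketch.

```python
import socket
import struct

# Hypothetical layout: 20-byte descriptor hash, 20-byte identity digest,
# 4-byte IPv4 address, 2-byte OR-port, 1-byte padding factor,
# all in network byte order.
REQUEST_SD_FORMAT = "!20s20s4sHB"

def pack_request_sd(sd_hash, identity_digest, ip, or_port, padding_factor):
    if len(sd_hash) != 20 or len(identity_digest) != 20:
        raise ValueError("digests must be 20 bytes")
    return struct.pack(REQUEST_SD_FORMAT, sd_hash, identity_digest,
                       socket.inet_aton(ip), or_port, padding_factor)

def unpack_request_sd(payload):
    sd_hash, identity_digest, raw_ip, or_port, padding_factor = \
        struct.unpack(REQUEST_SD_FORMAT, payload)
    return (sd_hash, identity_digest, socket.inet_ntoa(raw_ip),
            or_port, padding_factor)
```

With this layout the request takes 47 bytes, so it fits comfortably in a
single relay cell.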
|
|
|
|
|
|
|
|
3.3 Protocol versions
|
|
|
|
|
|
|
|
[XXX: find out where we need "opt protocols Link 1 2 Circuit 1"
|
|
|
|
information described in 2.3 above. If we need it, it might have
|
|
|
|
to go into the consensus document.]
|
|
|
|
|
|
|
|
[XXX: Similarly find out where we need the version number of a
|
|
|
|
remote tor server. This information is in the consensus, but
|
|
|
|
maybe we use it in some place where having it signed by the
|
|
|
|
server in question is really important?]
|
|
|
|
|
|
|
|
3.4 Exit selection
|
|
|
|
|
|
|
|
Currently finding an appropriate exit node for a user's request is
|
|
|
|
easy for a client because it has complete knowledge of all the exit
|
|
|
|
policies of all servers on the network.
|
|
|
|
|
|
|
|
[XXX: I have no finished ideas here yet.
|
|
|
|
- if clients only rely on the current exit flag they will
|
|
|
|
a) never use servers for exit purposes that don't have it,
|
|
|
|
b) will have a hard time finding a suitable exit node for
|
|
|
|
their weird port that only a few servers allow.
|
|
|
|
- the authorities could create a new summary document that
|
|
|
|
lists all the exit policies and their nodes (by fingerprint).
|
|
|
|
I need to find out how large that document would be.
|
|
|
|
- can we make the "Exit" flag more useful? can we come
|
|
|
|
up with some "standard policies" and have operators pick
|
|
|
|
one of the standards?
|
|
|
|
]
|
|
|
|
|
|
|
|
4. Future possibilities
|
|
|
|
|
|
|
|
This proposal still requires that all servers have the descriptors of
|
|
|
|
every other node in the network in order to answer RELAY_REQUEST_SD
|
|
|
|
cells. These cells are sent when a circuit is extended from ending at
|
|
|
|
node B to a new node C. In that case B would have to answer a
|
|
|
|
RELAY_REQUEST_SD cell that asks for C's server descriptor (by SD digest).
|
|
|
|
|
|
|
|
In order to answer that request B obviously needs a copy of C's server
|
|
|
|
descriptor. The RELAY_REQUEST_SD cell already has all the info that
|
|
|
|
B needs to contact C so it can ask about the descriptor before passing it
|
|
|
|
back to the client.
|
|
|
|
|