Filename: 141-jit-sd-downloads.txt
Title: Download server descriptors on demand
Version: $Revision$
Last-Modified: $Date$
Author: Peter Palfrader
Created: 15-Jun-2008
Status: Draft

1. Overview

Downloading all server descriptors is the most expensive part
of bootstrapping a Tor client. These server descriptors currently
amount to about 1.5 Megabytes of data, and this size will grow
linearly with network size.

Fetching all these server descriptors takes a long while for people
behind slow network connections. It is also a considerable load on
our network of directory mirrors.

This document describes proposed changes to the Tor network and
directory protocol so that clients will no longer need to download
all server descriptors.

These changes consist of moving load balancing information into
network status documents, implementing a means to download server
descriptors on demand in an anonymity-preserving way, and dealing
with exit node selection.

2. What is in a server descriptor

When a Tor client starts, the first thing it will try to get is a
current network status document: a consensus signed by a majority
of directory authorities. This document is currently about 100
Kilobytes in size, though it will grow linearly with network size.
This document lists all servers currently running on the network.
The Tor client will then try to get a server descriptor for each
of the running servers. All server descriptors currently amount
to about 1.5 Megabytes of downloads.

A Tor client learns several things about a server from its descriptor.
Some of these it already learned from the network status document
published by the authorities, but the server descriptor contains them
again in a single statement signed by the server itself, not just by
the directory authorities.

Tor clients use the information from server descriptors for
different purposes, which are considered in the following sections.

#three ways: One, to determine if a server will be able to handle
#this client's request; two, to actually communicate or use the server;
#three, for load balancing decisions.
#
#These three points are considered in the following subsections.

2.1 Load balancing

The Tor load balancing mechanism is quite complex in its details, but
it has a simple goal: the more traffic a server can handle, the more
traffic it should get. That means the more traffic a server can
handle, the more likely a client is to use it.

For this purpose each server descriptor has bandwidth information
which tries to convey a server's capacity to clients.

Currently we weight servers differently for different purposes. There
is one weight for when we use a server as a guard node (our entry to
the Tor network), another weight we assign servers for exit duties,
and a third for when we need intermediate (middle) nodes.

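The goal stated above (selection probability proportional to capacity)
can be illustrated with a minimal sketch. This is not Tor's actual
selection code: the function name pick_server and the plain
name-to-bandwidth mapping are assumptions made for illustration only.

```python
import random
from typing import Dict, Optional


def pick_server(bandwidths: Dict[str, float],
                rng: Optional[random.Random] = None) -> str:
    """Pick one server with probability proportional to its bandwidth.

    `bandwidths` maps a server name to a non-negative weight. Both the
    name and the mapping are hypothetical; they only illustrate the idea.
    """
    rng = rng or random.Random()
    total = sum(bandwidths.values())
    if total <= 0:
        raise ValueError("no server has usable bandwidth")

    # Draw a point in [0, total] and walk the cumulative sum until we
    # pass it. Servers with zero bandwidth never advance the sum.
    point = rng.uniform(0, total)
    running = 0.0
    for name, bandwidth in sorted(bandwidths.items()):
        running += bandwidth
        if point < running:
            return name

    # Only reached through floating point rounding at the upper edge.
    return max(bandwidths, key=lambda name: bandwidths[name])
```

A client that weights servers differently per position could keep one
such mapping for guards, one for exits, and one for middle nodes, and
call the same function with the mapping that fits the position.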
2.2 Exit information

When a Tor client wants to exit to some resource on the internet it
will build a circuit to an exit node that allows access to that
resource's IP address and TCP port.

When building that circuit the client can make sure that the circuit
ends at a server that will be able to fulfill the request, because the
client already learned all the servers' exit policies from their
descriptors.

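As an illustration of how a client might test a request against an
exit policy, here is a minimal sketch. It assumes IPv4-only rules of
the form "accept ADDR:PORT" or "reject ADDR:PORT", evaluated in order
with the first match winning, and it assumes that a policy with no
matching rule rejects. The helper names are hypothetical.

```python
import ipaddress
from typing import List


def rule_matches(rule: str, addr: str, port: int) -> bool:
    """Return True if one 'accept|reject ADDR:PORT' rule covers addr:port."""
    pattern = rule.split()[1]
    addr_part, port_part = pattern.rsplit(":", 1)

    # "*" matches any address; otherwise treat it as an address or network.
    if addr_part != "*":
        network = ipaddress.ip_network(addr_part, strict=False)
        if ipaddress.ip_address(addr) not in network:
            return False

    # "*" matches any port; otherwise a single port or a LOW-HIGH range.
    if port_part == "*":
        return True
    low, _, high = port_part.partition("-")
    return int(low) <= port <= int(high or low)


def policy_allows(policy: List[str], addr: str, port: int) -> bool:
    """Walk the rules in order; the first matching rule decides."""
    for rule in policy:
        if rule_matches(rule, addr, port):
            return rule.startswith("accept")
    return False  # assumption: nothing matched, so treat it as rejected
```

With every exit policy at hand, a client can filter the list of servers
with policy_allows before it picks the last hop of a circuit.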
2.3 Capability information

Server descriptors contain information about the specific versions of
the Tor protocol they understand [proposal 105].

Furthermore the server descriptor also contains the exact version of
the Tor software that the server is running, and some decisions are
made based on the server version number (for instance a Tor client
will only make conditional consensus requests [proposal from 13 Apr
2008 that never got a number] when talking to Tor servers version
0.2.1.1-alpha or later).

2.4 Contact/key information

A server descriptor lists a server's IP address and the TCP ports on
which it accepts onion and directory connections. Furthermore it
contains the onion key, a short-lived RSA key to which clients encrypt
CREATE cells.

2.5 Identity information

A Tor client learns the digest of a server's identity key from the
network status document. Once it has a server descriptor, this
descriptor contains the full RSA identity key of the server. Clients
verify that 1) the digest of the identity key matches the expected
digest they got from the consensus, and 2) that the signature on the
descriptor from that key is valid.

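Check 1) above can be sketched as follows. The sketch assumes that the
consensus carries a SHA-1 digest of the encoded identity key as base64
with the trailing "=" padding removed; the function name is
hypothetical. Check 2), verifying the RSA signature on the descriptor,
is deliberately left out because it needs an RSA implementation.

```python
import base64
import hashlib
import hmac


def identity_digest_matches(identity_key_bytes: bytes,
                            consensus_digest_b64: str) -> bool:
    """Compare the digest of the key found in a descriptor with the
    digest listed in the consensus (assumed: SHA-1, unpadded base64)."""
    # Restore the base64 padding that the consensus entry omits.
    padding = "=" * (-len(consensus_digest_b64) % 4)
    expected = base64.b64decode(consensus_digest_b64 + padding)
    actual = hashlib.sha1(identity_key_bytes).digest()
    # Constant-time comparison; a plain == would also give the right answer.
    return hmac.compare_digest(actual, expected)
```

A client would reject the descriptor outright when this check fails,
before it even looks at the signature.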
3. Doing away with the need for all SDs

3.1 Load balancing info in consensus documents

One of the reasons why clients download all server descriptors is for
doing proper load balancing as described in 2.1. In order for
clients to not require all server descriptors this information will
have to move into the network status document.

[XXX Two open questions here:
 a) how do we arrive at a consensus weight?
 b) how to represent weights in the consensus?
    Maybe "s Guard=0.13 Exit=0.02 Middle=0.00 Stable.."
]

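To make open question b) more concrete, the following sketch parses
the strawman line format quoted above, in which "Name=value" tokens
carry weights and bare tokens remain ordinary flags. The format is
only a suggestion in this proposal, and the function name is
hypothetical.

```python
from typing import Dict, List, Tuple


def parse_status_line(line: str) -> Tuple[Dict[str, float], List[str]]:
    """Split a strawman 's' line into position weights and plain flags."""
    keyword, *tokens = line.split()
    if keyword != "s":
        raise ValueError("not an 's' line")

    weights: Dict[str, float] = {}
    flags: List[str] = []
    for token in tokens:
        name, separator, value = token.partition("=")
        if separator:
            weights[name] = float(value)  # e.g. Guard=0.13
        else:
            flags.append(token)           # e.g. Stable
    return weights, flags
```

A client could feed the resulting per-position weights directly into
its path selection instead of reading bandwidth values from every
server descriptor.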
3.2 Fetching descriptors on demand

As described in 2.4 a descriptor lists IP address, OR- and Dir-Port,
and the onion key for a server.

A client already knows the IP address and the ports from the consensus
documents, but without the onion key it will not be able to send
CREATE/EXTEND cells for that server. Since the client needs the onion
key it needs the descriptor.

If a client only downloaded a few descriptors in an observable manner
then that would leak which nodes it was going to use.

This proposal suggests the following:

1) when connecting to a guard node for which the client does not
   yet have a cached descriptor it requests the descriptor it
   expects by hash. (The consensus document that the client holds
   has a hash for the descriptor of this server. We want exactly
   that descriptor, not a different one.)

   [XXX: How? We could either come up with a new cell type,
    RELAY_REQUEST_SD that takes only a hash (of the SD), or use
    RELAY_BEGIN_DIR. The former is probably smarter since we will
    want to use it later on as well, and there we will require
    padding.]

   A client MAY cache the descriptor of the guard node so that it does
   not need to request it every single time it contacts the guard.

2) when a client wants to extend a circuit that currently ends in
   server B to a new next server C, the client will send a
   RELAY_REQUEST_SD cell to server B. This cell contains in its
   payload the hash of a server descriptor the client would like
   to obtain (C's server descriptor). The server sends back the
   descriptor and the client can now form a valid EXTEND/CREATE cell
   encrypted to C's onion key.

   Clients MUST NOT cache such descriptors. If they did they might
   leak that they already extended to that server at least once
   before.

Replies to RELAY_REQUEST_SD requests need to be padded to some
constant upper limit in order to conceal a client's destination
from anybody who might be counting cells/bytes.

[XXX: detailed spec of RELAY_REQUEST_SD cell and its reply]
[XXX: figure out a decent padding size]

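Since the cell format and the padding size are still open, the
following is only a sketch of the two properties the text asks for:
replies have a constant length, and the client accepts only the
descriptor whose hash it requested. The constant PADDED_REPLY_LEN, the
two-byte length prefix, and the use of SHA-1 over the whole descriptor
are all assumptions made for illustration.

```python
import hashlib
import hmac
import struct

# Hypothetical value: the proposal has not yet chosen a padding size.
PADDED_REPLY_LEN = 4096


def pad_reply(descriptor: bytes, size: int = PADDED_REPLY_LEN) -> bytes:
    """Server side: length-prefix the descriptor and pad it to `size`."""
    if len(descriptor) + 2 > size:
        raise ValueError("descriptor too large for the padded reply")
    body = struct.pack("!H", len(descriptor)) + descriptor
    return body + b"\x00" * (size - len(body))


def unpad_and_verify(reply: bytes, wanted_digest: bytes) -> bytes:
    """Client side: strip the padding and insist on the requested hash."""
    if len(reply) < 2:
        raise ValueError("reply too short")
    (length,) = struct.unpack("!H", reply[:2])
    descriptor = reply[2:2 + length]
    if len(descriptor) != length:
        raise ValueError("truncated reply")
    digest = hashlib.sha1(descriptor).digest()
    if not hmac.compare_digest(digest, wanted_digest):
        raise ValueError("descriptor does not match the requested digest")
    return descriptor
```

Because every reply has the same length regardless of which descriptor
was requested, an observer counting bytes learns nothing about the
client's next hop from the reply size alone.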
3.3 Protocol versions

[XXX: find out where we need "opt protocols Link 1 2 Circuit 1"
 information described in 2.3 above. If we need it, it might have
 to go into the consensus document.]

[XXX: Similarly find out where we need the version number of a
 remote Tor server. This information is in the consensus, but
 maybe we use it in some place where having it signed by the
 server in question is really important?]

3.4 Exit selection

Currently finding an appropriate exit node for a user's request is
easy for a client because it has complete knowledge of all the exit
policies of all servers on the network.

[XXX: I have no finished ideas here yet.
 - if clients only rely on the current exit flag they will
   a) never use servers for exit purposes that don't have it,
   b) have a hard time finding a suitable exit node for
      their weird port that only a few servers allow.
 - the authorities could create a new summary document that
   lists all the exit policies and their nodes (by fingerprint).
   I need to find out how large that document would be.
 - can we make the "Exit" flag more useful? can we come
   up with some "standard policies" and have operators pick
   one of the standards?
]

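The summary document idea in the note above could be prototyped with
a sketch like the one below, which groups servers that share an
identical exit policy so that each distinct policy is listed only
once. The data layout and the function name are assumptions; the
proposal does not define a format for such a document.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Policy = Tuple[str, ...]


def summarize_policies(policies: Dict[str, Policy]) -> Dict[Policy, List[str]]:
    """Map each distinct exit policy to the fingerprints that use it.

    `policies` maps a server fingerprint to its ordered policy rules.
    """
    groups: Dict[Policy, List[str]] = defaultdict(list)
    for fingerprint, policy in policies.items():
        groups[tuple(policy)].append(fingerprint)
    # Sort fingerprints so the output is stable between runs.
    return {policy: sorted(prints) for policy, prints in groups.items()}
```

Running this over real exit policies and serializing the result would
be one way to find out how large such a document gets, and how much
"standard policies" would shrink it.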
4. Future possibilities

This proposal still requires that all servers have the descriptors of
every other node in the network in order to answer RELAY_REQUEST_SD
cells. These cells are sent when a circuit is extended from ending at
node B to a new node C. In that case B would have to answer a
RELAY_REQUEST_SD cell that asks for C's server descriptor (by SD digest).

In order to answer that request B obviously needs a copy of C's server
descriptor. In the future we might amend RELAY_REQUEST_SD cells to
also contain the expected IP address and OR-port of the server C (the
client learns them from the network status document), so that B no
longer needs to know all the descriptors of the entire network but
instead can simply go and ask C for its descriptor before passing it
back to the client.

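The amended behaviour described above might look roughly like this on
server B. Everything here is hypothetical: the function name, the
local_store mapping, and the fetch_from_server callback merely stand
in for whatever mechanism B would use to ask C for its descriptor.

```python
import hashlib
import hmac
from typing import Callable, Dict, Optional


def answer_request_sd(wanted_digest: bytes,
                      local_store: Dict[bytes, bytes],
                      fetch_from_server: Callable[[str, int], Optional[bytes]],
                      address: str,
                      or_port: int) -> Optional[bytes]:
    """Return the descriptor with `wanted_digest`, asking C if needed."""
    descriptor = local_store.get(wanted_digest)
    if descriptor is None:
        # B does not hold the descriptor, so it asks C directly, using
        # the address and OR-port the client supplied in its request.
        descriptor = fetch_from_server(address, or_port)
    if descriptor is None:
        return None
    # Only hand back exactly the descriptor the client asked for.
    digest = hashlib.sha1(descriptor).digest()
    if not hmac.compare_digest(digest, wanted_digest):
        return None
    return descriptor
```

One consequence of checking the digest is that the request fails when
C has already published a newer descriptor than the one listed in the
client's consensus; how to handle that case is not covered by this
proposal.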