mirror of
https://gitlab.torproject.org/tpo/core/tor.git
synced 2024-12-11 05:03:34 +01:00
d230827912
Tor doesn't use SVN anymore, making $Revision$, $Id$ and $Date$ meaningless. Remove them without replacement.
324 lines
14 KiB
Plaintext
324 lines
14 KiB
Plaintext
Filename: 141-jit-sd-downloads.txt
|
|
Title: Download server descriptors on demand
|
|
Author: Peter Palfrader
|
|
Created: 15-Jun-2008
|
|
Status: Draft
|
|
|
|
1. Overview
|
|
|
|
Downloading all server descriptors is the most expensive part
|
|
of bootstrapping a Tor client. These server descriptors currently
|
|
amount to about 1.5 Megabytes of data, and this size will grow
|
|
linearly with network size.
|
|
|
|
Fetching all these server descriptors takes a long while for people
|
|
behind slow network connections. It is also a considerable load on
|
|
our network of directory mirrors.
|
|
|
|
This document describes proposed changes to the Tor network and
|
|
directory protocol so that clients will no longer need to download
|
|
all server descriptors.
|
|
|
|
These changes consist of moving load balancing information into
|
|
network status documents, implementing a means to download server
|
|
descriptors on demand in an anonymity-preserving way, and dealing
|
|
with exit node selection.
|
|
|
|
2. What is in a server descriptor
|
|
|
|
When a Tor client starts the first thing it will try to get is a
|
|
current network status document: a consensus signed by a majority
|
|
of directory authorities. This document is currently about 100
|
|
Kilobytes in size, tho it will grow linearly with network size.
|
|
This document lists all servers currently running on the network.
|
|
The Tor client will then try to get a server descriptor for each
|
|
of the running servers. All server descriptors currently amount
|
|
to about 1.5 Megabytes of downloads.
|
|
|
|
A Tor client learns several things about a server from its descriptor.
|
|
Some of these it already learned from the network status document
|
|
published by the authorities, but the server descriptor contains it
|
|
again in a single statement signed by the server itself, not just by
|
|
the directory authorities.
|
|
|
|
Tor clients use the information from server descriptors for
|
|
different purposes, which are considered in the following sections.
|
|
|
|
#three ways: One, to determine if a server will be able to handle
|
|
#this client's request; two, to actually communicate or use the server;
|
|
#three, for load balancing decisions.
|
|
#
|
|
#These three points are considered in the following subsections.
|
|
|
|
2.1 Load balancing
|
|
|
|
The Tor load balancing mechanism is quite complex in its details, but
|
|
it has a simple goal: The more traffic a server can handle the more
|
|
traffic it should get. That means the more traffic a server can
|
|
handle the more likely a client will use it.
|
|
|
|
For this purpose each server descriptor has bandwidth information
|
|
which tries to convey a server's capacity to clients.
|
|
|
|
Currently we weigh servers differently for different purposes. There
|
|
is a weight for when we use a server as a guard node (our entry to the
|
|
Tor network), there is one weight we assign servers for exit duties,
|
|
and a third for when we need intermediate (middle) nodes.
|
|
|
|
2.2 Exit information
|
|
|
|
When a Tor wants to exit to some resource on the internet it will
|
|
build a circuit to an exit node that allows access to that resource's
|
|
IP address and TCP Port.
|
|
|
|
When building that circuit the client can make sure that the circuit
|
|
ends at a server that will be able to fulfill the request because the
|
|
client already learned of all the servers' exit policies from their
|
|
descriptors.
|
|
|
|
2.3 Capability information
|
|
|
|
Server descriptors contain information about the specific version of
|
|
the Tor protocol they understand [proposal 105].
|
|
|
|
Furthermore the server descriptor also contains the exact version of
|
|
the Tor software that the server is running and some decisions are
|
|
made based on the server version number (for instance a Tor client
|
|
will only make conditional consensus requests [proposal 139] when
|
|
talking to Tor servers version 0.2.1.1-alpha or later).
|
|
|
|
2.4 Contact/key information
|
|
|
|
A server descriptor lists a server's IP address and TCP ports on which
|
|
it accepts onion and directory connections. Furthermore it contains
|
|
the onion key (a short lived RSA key to which clients encrypt CREATE
|
|
cells).
|
|
|
|
2.5 Identity information
|
|
|
|
A Tor client learns the digest of a server's key from the network
|
|
status document. Once it has a server descriptor this descriptor
|
|
contains the full RSA identity key of the server. Clients verify
|
|
that 1) the digest of the identity key matches the expected digest
|
|
it got from the consensus, and 2) that the signature on the descriptor
|
|
from that key is valid.
|
|
|
|
|
|
3. No longer require clients to have copies of all SDs
|
|
|
|
3.1 Load balancing info in consensus documents
|
|
|
|
One of the reasons why clients download all server descriptors is for
|
|
doing load proper load balancing as described in 2.1. In order for
|
|
clients to not require all server descriptors this information will
|
|
have to move into the network status document.
|
|
|
|
Consensus documents will have a new line per router similar
|
|
to the "r", "s", and "v" lines that already exist. This line
|
|
will convey weight information to clients.
|
|
|
|
"w Bandwidth=193"
|
|
|
|
The bandwidth number is the lesser of observed bandwidth and bandwidth
|
|
rate limit from the server descriptor that the "r" line referenced by
|
|
digest (1st and 3rd field of the bandwidth line in the descriptor).
|
|
It is given in kilobytes per second so the byte value in the
|
|
descriptor has to be divided by 1024 (and is then truncated, i.e.
|
|
rounded down).
|
|
|
|
Authorities will cap the bandwidth number at some arbitrary value,
|
|
currently 10MB/sec. If a router claims a larger bandwidth an
|
|
authority's vote will still only show Bandwidth=10240.
|
|
|
|
The consensus value for bandwidth is the median of all bandwidth
|
|
numbers given in votes. In case of an even number of votes we use
|
|
the lower median. (Using this procedure allows us to change the
|
|
cap value more easily.)
|
|
|
|
Clients should believe the bandwidth as presented in the consensus,
|
|
not capping it again.
|
|
|
|
3.2 Fetching descriptors on demand
|
|
|
|
As described in 2.4 a descriptor lists IP address, OR- and Dir-Port,
|
|
and the onion key for a server.
|
|
|
|
A client already knows the IP address and the ports from the consensus
|
|
documents, but without the onion key it will not be able to send
|
|
CREATE/EXTEND cells for that server. Since the client needs the onion
|
|
key it needs the descriptor.
|
|
|
|
If a client only downloaded a few descriptors in an observable manner
|
|
then that would leak which nodes it was going to use.
|
|
|
|
This proposal suggests the following:
|
|
|
|
1) when connecting to a guard node for which the client does not
|
|
yet have a cached descriptor it requests the descriptor it
|
|
expects by hash. (The consensus document that the client holds
|
|
has a hash for the descriptor of this server. We want exactly
|
|
that descriptor, not a different one.)
|
|
|
|
It does that by sending a RELAY_REQUEST_SD cell.
|
|
|
|
A client MAY cache the descriptor of the guard node so that it does
|
|
not need to request it every single time it contacts the guard.
|
|
|
|
2) when a client wants to extend a circuit that currently ends in
|
|
server B to a new next server C, the client will send a
|
|
RELAY_REQUEST_SD cell to server B. This cell contains in its
|
|
payload the hash of a server descriptor the client would like
|
|
to obtain (C's server descriptor). The server sends back the
|
|
descriptor and the client can now form a valid EXTEND/CREATE cell
|
|
encrypted to C's onion key.
|
|
|
|
Clients MUST NOT cache such descriptors. If they did they might
|
|
leak that they already extended to that server at least once
|
|
before.
|
|
|
|
Replies to RELAY_REQUEST_SD requests need to be padded to some
|
|
constant upper limit in order to conceal a client's destination
|
|
from anybody who might be counting cells/bytes.
|
|
|
|
RELAY_REQUEST_SD cells contain the following information:
|
|
- hash of the server descriptor requested
|
|
- hash of the identity digest of the server for which we want the SD
|
|
- IP address and OR-port or the server for which we want the SD
|
|
- padding factor - the number of cells we want the answer
|
|
padded to.
|
|
[XXX this just occured to me and it might be smart. or it might
|
|
be stupid. clients would learn the padding factor they want
|
|
to use from the consensus document. This allows us to grow
|
|
the replies later on should SDs become larger.]
|
|
[XXX: figure out a decent padding size]
|
|
|
|
3.3 Protocol versions
|
|
|
|
Server descriptors contain optional information of supported
|
|
link-level and circuit-level protocols in the form of
|
|
"opt protocols Link 1 2 Circuit 1". These are not currently needed
|
|
and will probably eventually move into the "v" (version) line in
|
|
the consensus. This proposal does not deal with them.
|
|
|
|
Similarly a server descriptor contains the version number of
|
|
a Tor node. This information is already present in the consensus
|
|
and is thus available to all clients immediately.
|
|
|
|
3.4 Exit selection
|
|
|
|
Currently finding an appropriate exit node for a user's request is
|
|
easy for a client because it has complete knowledge of all the exit
|
|
policies of all servers on the network.
|
|
|
|
The consensus document will once again be extended to contain the
|
|
information required by clients. This information will be a summary
|
|
of each node's exit policy. The exit policy summary will only contain
|
|
the list of ports to which a node exits to most destination IP
|
|
addresses.
|
|
|
|
A summary should claim a router exits to a specific TCP port if,
|
|
ignoring private IP addresses, the exit policy indicates that the
|
|
router would exit to this port to most IP address. either two /8
|
|
netblocks, or one /8 and a couple of /12s or any other combination).
|
|
The exact algorith used is this: Going through all exit policy items
|
|
- ignore any accept that is not for all IP addresses ("*"),
|
|
- ignore rejects for these netblocks (exactly, no subnetting):
|
|
0.0.0.0/8, 169.254.0.0/16, 127.0.0.0/8, 192.168.0.0/16, 10.0.0.0/8,
|
|
and 172.16.0.0/12m
|
|
- for each reject count the number of IP addresses rejected against
|
|
the affected ports,
|
|
- once we hit an accept for all IP addresses ("*") add the ports in
|
|
that policy item to the list of accepted ports, if they don't have
|
|
more than 2^25 IP addresses (that's two /8 networks) counted
|
|
against them (i.e. if the router exits to a port to everywhere but
|
|
at most two /8 networks).
|
|
|
|
An exit policy summary will be included in votes and consensus as a
|
|
new line attached to each exit node. The line will have the format
|
|
"p" <space> "accept"|"reject" <portlist>
|
|
where portlist is a comma seperated list of single port numbers or
|
|
portranges (e.g. "22,80-88,1024-6000,6667").
|
|
|
|
Whether the summary shows the list of accepted ports or the list of
|
|
rejected ports depends on which list is shorter (has a shorter string
|
|
representation). In case of ties we choose the list of accepted
|
|
ports. As an exception to this rule an allow-all policy is
|
|
represented as "accept 1-65535" instead of "reject " and a reject-all
|
|
policy is similarly given as "reject 1-65535".
|
|
|
|
Summary items are compressed, that is instead of "80-88,89-100" there
|
|
only is a single item of "80-100", similarly instead of "20,21" a
|
|
summary will say "20-21".
|
|
|
|
Port lists are sorted in ascending order.
|
|
|
|
The maximum allowed length of a policy summary (including the "accept "
|
|
or "reject ") is 1000 characters. If a summary exceeds that length we
|
|
use an accept-style summary and list as much of the port list as is
|
|
possible within these 1000 bytes.
|
|
|
|
3.4.1 Consensus selection
|
|
|
|
When building a consensus, authorities have to agree on a digest of
|
|
the server descriptor to list in the router line for each router.
|
|
This is documented in dir-spec section 3.4.
|
|
|
|
All authorities that listed that agreed upon descriptor digest in
|
|
their vote should also list the same exit policy summary - or list
|
|
none at all if the authority has not been upgraded to list that
|
|
information in their vote.
|
|
|
|
If we have votes with matching server descriptor digest of which at
|
|
least one of them has an exit policy then we differ between two cases:
|
|
a) all authorities agree (or abstained) on the policy summary, and we
|
|
use the exit policy summary that they all listed in their vote,
|
|
b) something went wrong (or some authority is playing foul) and we
|
|
have different policy summaries. In that case we pick the one
|
|
that is most commonly listed in votes with the matching
|
|
descriptor. We break ties in favour of the lexigraphically larger
|
|
vote.
|
|
|
|
If none one of the votes with a matching server descriptor digest has
|
|
an exit policy summary we use the most commonly listed one in all
|
|
votes, breaking ties like in case b above.
|
|
|
|
3.4.2 Client behaviour
|
|
|
|
When choosing an exit node for a specific request a Tor client will
|
|
choose from the list of nodes that exit to the requested port as given
|
|
by the consensus document. If a client has additional knowledge (like
|
|
cached full descriptors) that indicates the so chosen exit node will
|
|
reject the request then it MAY use that knowledge (or not include such
|
|
nodes in the selection to begin with). However, clients MUST NOT use
|
|
nodes that do not list the port as accepted in the summary (but for
|
|
which they know that the node would exit to that address from other
|
|
sources, like a cached descriptor).
|
|
|
|
An exception to this is exit enclave behaviour: A client MAY use the
|
|
node at a specific IP address to exit to any port on the same address
|
|
even if that node is not listed as exiting to the port in the summary.
|
|
|
|
4. Migration
|
|
|
|
4.1 Consensus document changes.
|
|
|
|
The consensus will need to include
|
|
- bandwidth information (see 3.1)
|
|
- exit policy summaries (3.4)
|
|
|
|
A new consensus method (number TBD) will be chosen for this.
|
|
|
|
5. Future possibilities
|
|
|
|
This proposal still requires that all servers have the descriptors of
|
|
every other node in the network in order to answer RELAY_REQUEST_SD
|
|
cells. These cells are sent when a circuit is extended from ending at
|
|
node B to a new node C. In that case B would have to answer a
|
|
RELAY_REQUEST_SD cell that asks for C's server descriptor (by SD digest).
|
|
|
|
In order to answer that request B obviously needs a copy of C's server
|
|
descriptor. The RELAY_REQUEST_SD cell already has all the info that
|
|
B needs to contact C so it can ask about the descriptor before passing it
|
|
back to the client.
|
|
|