mirror of
https://gitlab.torproject.org/tpo/core/tor.git
synced 2024-12-11 21:23:35 +01:00
149 lines
6.1 KiB
Plaintext
149 lines
6.1 KiB
Plaintext
Filename: 151-path-selection-improvements.txt
|
|
Title: Improving Tor Path Selection
|
|
Author: Fallon Chen, Mike Perry
|
|
Created: 5-Jul-2008
|
|
Status: Finished
|
|
In-Spec: path-spec.txt
|
|
|
|
Overview
|
|
|
|
The performance of paths selected can be improved by adjusting the
|
|
CircuitBuildTimeout and avoiding failing guard nodes. This proposal
|
|
describes a method of tracking buildtime statistics at the client, and
|
|
using those statistics to adjust the CircuitBuildTimeout.
|
|
|
|
Motivation
|
|
|
|
Tor's performance can be improved by excluding those circuits that
|
|
have long buildtimes (and by extension, high latency). For those Tor
|
|
users who require better performance and have lower requirements for
|
|
anonymity, this would be a very useful option to have.
|
|
|
|
Implementation
|
|
|
|
Gathering Build Times
|
|
|
|
Circuit build times are stored in the circular array
|
|
'circuit_build_times' consisting of uint32_t elements as milliseconds.
|
|
The total size of this array is based on the number of circuits
|
|
it takes to converge on a good fit of the long term distribution of
|
|
the circuit builds for a fixed link. We do not want this value to be
|
|
too large, because it will make it difficult for clients to adapt to
|
|
moving between different links.
|
|
|
|
From our observations, the minimum value for a reasonable fit appears
|
|
to be on the order of 500 (MIN_CIRCUITS_TO_OBSERVE). However, to keep
|
|
a good fit over the long term, we store 5000 most recent circuits in
|
|
the array (NCIRCUITS_TO_OBSERVE).
|
|
|
|
The Tor client will build test circuits at a rate of one per
|
|
minute (BUILD_TIMES_TEST_FREQUENCY) up to the point of
|
|
MIN_CIRCUITS_TO_OBSERVE. This allows a fresh Tor to have
|
|
a CircuitBuildTimeout estimated within 8 hours after install,
|
|
upgrade, or network change (see below).
|
|
|
|
Long Term Storage
|
|
|
|
The long-term storage representation is implemented by storing a
|
|
histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when
|
|
writing out the statistics to disk. The format this takes in the
|
|
state file is 'CircuitBuildTime <bin-ms> <count>', with the total
|
|
specified as 'TotalBuildTimes <total>'
|
|
Example:
|
|
|
|
TotalBuildTimes 100
|
|
CircuitBuildTimeBin 25 50
|
|
CircuitBuildTimeBin 75 25
|
|
CircuitBuildTimeBin 125 13
|
|
...
|
|
|
|
Reading the histogram in will entail inserting <count> values
|
|
into the circuit_build_times array each with the value of
|
|
<bin-ms> milliseconds. In order to evenly distribute the values
|
|
in the circular array, the Fisher-Yates shuffle will be performed
|
|
after reading values from the bins.
|
|
|
|
Learning the CircuitBuildTimeout
|
|
|
|
Based on studies of build times, we found that the distribution of
|
|
circuit buildtimes appears to be a Frechet distribution. However,
|
|
estimators and quantile functions of the Frechet distribution are
|
|
difficult to work with and slow to converge. So instead, since we
|
|
are only interested in the accuracy of the tail, we approximate
|
|
the tail of the distribution with a Pareto curve starting at
|
|
the mode of the circuit build time sample set.
|
|
|
|
We will calculate the parameters for a Pareto distribution
|
|
fitting the data using the estimators at
|
|
http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation.
|
|
|
|
The timeout itself is calculated by using the Quartile function (the
|
|
inverted CDF) to give us the value on the CDF such that
|
|
BUILDTIME_PERCENT_CUTOFF (80%) of the mass of the distribution is
|
|
below the timeout value.
|
|
|
|
Thus, we expect that the Tor client will accept the fastest 80% of
|
|
the total number of paths on the network.
|
|
|
|
Detecting Changing Network Conditions
|
|
|
|
We attempt to detect both network connectivity loss and drastic
|
|
changes in the timeout characteristics.
|
|
|
|
We assume that we've had network connectivity loss if 3 circuits
|
|
timeout and we've received no cells or TLS handshakes since those
|
|
circuits began. We then set the timeout to 60 seconds and stop
|
|
counting timeouts.
|
|
|
|
If 3 more circuits timeout and the network still has not been
|
|
live within this new 60 second timeout window, we then discard
|
|
the previous timeouts during this period from our history.
|
|
|
|
To detect changing network conditions, we keep a history of
|
|
the timeout or non-timeout status of the past RECENT_CIRCUITS (20)
|
|
that successfully completed at least one hop. If more than 75%
|
|
of these circuits timeout, we discard all buildtimes history,
|
|
reset the timeout to 60, and then begin recomputing the timeout.
|
|
|
|
Testing
|
|
|
|
After circuit build times, storage, and learning are implemented,
|
|
the resulting histogram should be checked for consistency by
|
|
verifying it persists across successive Tor invocations where
|
|
no circuits are built. In addition, we can also use the existing
|
|
buildtime scripts to record build times, and verify that the histogram
|
|
the python produces matches that which is output to the state file in Tor,
|
|
and verify that the Pareto parameters and cutoff points also match.
|
|
|
|
We will also verify that there are no unexpected large deviations from
|
|
node selection, such as nodes from distant geographical locations being
|
|
completely excluded.
|
|
|
|
Dealing with Timeouts
|
|
|
|
Timeouts should be counted as the expectation of the region of
|
|
of the Pareto distribution beyond the cutoff. This is done by
|
|
generating a random sample for each timeout at points on the
|
|
curve beyond the current timeout cutoff.
|
|
|
|
Future Work
|
|
|
|
At some point, it may be desirable to change the cutoff from a
|
|
single hard cutoff that destroys the circuit to a soft cutoff and
|
|
a hard cutoff, where the soft cutoff merely triggers the building
|
|
of a new circuit, and the hard cutoff triggers destruction of the
|
|
circuit.
|
|
|
|
It may also be beneficial to learn separate timeouts for each
|
|
guard node, as they will have slightly different distributions.
|
|
This will take longer to generate initial values though.
|
|
|
|
Issues
|
|
|
|
Impact on anonymity
|
|
|
|
Since this follows a Pareto distribution, large reductions on the
|
|
timeout can be achieved without cutting off a great number of the
|
|
total paths. This will eliminate a great deal of the performance
|
|
variation of Tor usage.
|