Update proposal to match implementation.

This commit is contained in:
Mike Perry 2009-09-16 17:03:54 -07:00
parent 0352d43917
commit 81dc435ffa

View File

@ -20,7 +20,7 @@ Motivation
Implementation
Storing Build Times
Gathering Build Times
Circuit build times are stored in the circular array
'circuit_build_times' consisting of uint32_t elements as milliseconds.
@ -30,8 +30,16 @@ Implementation
too large, because it will make it difficult for clients to adapt to
moving between different links.
From our observations, this value appears to be on the order of 1000,
but is configurable in a #define NCIRCUITS_TO_OBSERVE.
From our observations, the minimum value for a reasonable fit appears
to be on the order of 500 (MIN_CIRCUITS_TO_OBSERVE). However, to keep
a good fit over the long term, we store 5000 most recent circuits in
the array (NCIRCUITS_TO_OBSERVE).
The Tor client will build test circuits at a rate of one per
minute (BUILD_TIMES_TEST_FREQUENCY) up to the point of
MIN_CIRCUITS_TO_OBSERVE. This allows a fresh Tor to have
a CircuitBuildTimeout estimated within 8 hours after install,
upgrade, or network change (see below).
Long Term Storage
@ -43,9 +51,9 @@ Implementation
Example:
TotalBuildTimes 100
CircuitBuildTimeBin 0 50
CircuitBuildTimeBin 50 25
CircuitBuildTimeBin 100 13
CircuitBuildTimeBin 25 50
CircuitBuildTimeBin 75 25
CircuitBuildTimeBin 125 13
...
Reading the histogram in will entail inserting <count> values
@ -57,7 +65,12 @@ Implementation
Learning the CircuitBuildTimeout
Based on studies of build times, we found that the distribution of
circuit buildtimes appears to be a Pareto distribution.
circuit buildtimes appears to be a Frechet distribution. However,
estimators and quantile functions of the Frechet distribution are
difficult to work with and slow to converge. So instead, since we
are only interested in the accuracy of the tail, we approximate
the tail of the distribution with a Pareto curve starting at
the mode of the circuit build time sample set.
We will calculate the parameters for a Pareto distribution
fitting the data using the estimators at
@ -73,11 +86,8 @@ Implementation
Detecting Changing Network Conditions
We attempt to detect both network connectivty loss and drastic
changes in the timeout characteristics. Network connectivity loss
is detected by recording a timestamp every time Tor either completes
a TLS connection or receives a cell. If this timestamp is more than
90 seconds in the past, circuit timeouts are no longer counted.
We attempt to detect both network connectivity loss and drastic
changes in the timeout characteristics.
If more than MAX_RECENT_TIMEOUT_RATE (80%) of the past
RECENT_CIRCUITS (20) time out, we assume the network connection
@ -86,6 +96,11 @@ Implementation
position on the Pareto Quartile function for the ratio of
timeouts.
Network connectivity loss is detected by recording a timestamp every
time Tor either completes a TLS connection or receives a cell. If
this timestamp is more than CircuitBuildTimeout*RECENT_CIRCUITS/3
seconds in the past, circuit timeouts are no longer counted.
Testing
After circuit build times, storage, and learning are implemented,
@ -96,7 +111,18 @@ Implementation
the python produces matches that which is output to the state file in Tor,
and verify that the Pareto parameters and cutoff points also match.
Soft timeout vs Hard Timeout
We will also verify that there are no unexpected large deviations from
node selection, such as nodes from distant geographical locations being
completely excluded.
Dealing with Timeouts
Timeouts should be counted as the expectation of the region of
of the Pareto distribution beyond the cutoff. This is done by
generating a random sample for each timeout at points on the
curve beyond the current timeout cutoff.
Future Work
At some point, it may be desirable to change the cutoff from a
single hard cutoff that destroys the circuit to a soft cutoff and
@ -104,36 +130,9 @@ Implementation
of a new circuit, and the hard cutoff triggers destruction of the
circuit.
Good values for hard and soft cutoffs seem to be 80% and 60%
respectively, but we should eventually justify this with observation.
When to Begin Calculation
The number of circuits to observe (NCIRCUITS_TO_CUTOFF) before
changing the CircuitBuildTimeout will be tunable via a #define. From
our measurements, a good value for NCIRCUITS_TO_CUTOFF appears to be
on the order of 100.
Dealing with Timeouts
Timeouts should be counted as the expectation of the region of
of the Pareto distribution beyond the cutoff. The proposal will
be updated with this value soon.
Also, in the event of network failure, the observation mechanism
should stop collecting timeout data.
Client Hints
Some research still needs to be done to provide initial values
for CircuitBuildTimeout based on values learned from modem
users, DSL users, Cable Modem users, and dedicated links. A
radiobutton in Vidalia should eventually be provided that
sets CircuitBuildTimeout to one of these values and also
provide the option of purging all learned data, should any exist.
These values can either be published in the directory, or
shipped hardcoded for a particular Tor version.
It may also be beneficial to learn separate timeouts for each
guard node, as they will have slightly different distributions.
This will take longer to generate initial values though.
Issues