Filename: 151-path-selection-improvements.txt
Title: Improving Tor Path Selection
Author: Fallon Chen, Mike Perry
Created: 5-Jul-2008
Status: Finished

Overview

  The performance of paths selected can be improved by adjusting the
  CircuitBuildTimeout and avoiding failing guard nodes. This proposal
  describes a method of tracking buildtime statistics at the client, and
  using those statistics to adjust the CircuitBuildTimeout.

Motivation

  Tor's performance can be improved by excluding those circuits that
  have long buildtimes (and by extension, high latency). For those Tor
  users who require better performance and have lower requirements for
  anonymity, this would be a very useful option to have.

Implementation

  Gathering Build Times

    Circuit build times are stored in the circular array
    'circuit_build_times' consisting of uint32_t elements as milliseconds.
    The total size of this array is based on the number of circuits
    it takes to converge on a good fit of the long term distribution of
    the circuit builds for a fixed link. We do not want this value to be
    too large, because it will make it difficult for clients to adapt to
    moving between different links.

    From our observations, the minimum value for a reasonable fit appears
    to be on the order of 500 (MIN_CIRCUITS_TO_OBSERVE). However, to keep
    a good fit over the long term, we store the 5000 most recent circuits
    in the array (NCIRCUITS_TO_OBSERVE).
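
    The circular array described above can be sketched as follows. This
    is a minimal illustration only, not the Tor implementation: the class
    name BuildTimes and its methods are hypothetical, and only the two
    constants come from the text.

```python
# Hypothetical sketch of a fixed-size circular array of build times.
NCIRCUITS_TO_OBSERVE = 5000      # array capacity, from the text above
MIN_CIRCUITS_TO_OBSERVE = 500    # minimum samples before fitting


class BuildTimes:
    """Circular array of circuit build times in milliseconds."""

    def __init__(self, capacity=NCIRCUITS_TO_OBSERVE):
        self.capacity = capacity
        self.times = [0] * capacity   # build times in milliseconds
        self.next_idx = 0             # slot the next sample is written to
        self.total = 0                # number of valid samples stored

    def add(self, build_time_ms):
        """Record one build time, overwriting the oldest entry when full."""
        self.times[self.next_idx] = build_time_ms
        self.next_idx = (self.next_idx + 1) % self.capacity
        if self.total < self.capacity:
            self.total += 1

    def samples(self):
        """Return the valid samples (their order is not significant)."""
        return self.times[:self.total]

    def enough_to_fit(self):
        """True once MIN_CIRCUITS_TO_OBSERVE samples have been stored."""
        return self.total >= MIN_CIRCUITS_TO_OBSERVE
```

    Bounding the array means old observations age out on their own,
    which is what lets a client adapt after moving to a different link.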

    The Tor client will build test circuits at a rate of one per
    minute (BUILD_TIMES_TEST_FREQUENCY) up to the point of
    MIN_CIRCUITS_TO_OBSERVE. This allows a fresh Tor to have
    a CircuitBuildTimeout estimated roughly 8 hours (500 minutes)
    after install, upgrade, or network change (see below).

  Long Term Storage

    The long-term storage representation is a histogram with
    BUILDTIME_BIN_WIDTH millisecond buckets (default 50), produced when
    writing out the statistics to disk. The format this takes in the
    state file is 'CircuitBuildTimeBin <bin-ms> <count>', with the total
    specified as 'TotalBuildTimes <total>'. In the example below, each
    bin is labelled by its midpoint.

    Example:

    TotalBuildTimes 100
    CircuitBuildTimeBin 25 50
    CircuitBuildTimeBin 75 25
    CircuitBuildTimeBin 125 13
    ...

    Reading the histogram in will entail inserting <count> values
    into the circuit_build_times array each with the value of
    <bin-ms> milliseconds. In order to evenly distribute the values
    in the circular array, the Fisher-Yates shuffle will be performed
    after reading values from the bins.
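
    As a rough sketch of the storage round trip described above, the
    following writes a histogram in the state file format and reads it
    back, finishing with a Fisher-Yates shuffle. The function names are
    hypothetical, and labelling each bin by its midpoint is an
    assumption inferred from the example (25, 75, 125 with 50 ms bins).

```python
import random
from collections import Counter

BUILDTIME_BIN_WIDTH = 50  # milliseconds per bin, from the text above


def to_state_lines(build_times_ms):
    """Bin build times and format them as state file lines."""
    bins = Counter()
    for t in build_times_ms:
        # Assumption: a bin is labelled by its midpoint (0-49 ms -> 25).
        low_edge = (t // BUILDTIME_BIN_WIDTH) * BUILDTIME_BIN_WIDTH
        bins[low_edge + BUILDTIME_BIN_WIDTH // 2] += 1
    lines = ["TotalBuildTimes %d" % len(build_times_ms)]
    for midpoint in sorted(bins):
        lines.append("CircuitBuildTimeBin %d %d" % (midpoint, bins[midpoint]))
    return lines


def from_state_lines(lines, rng=random):
    """Expand each bin into <count> values, then shuffle them."""
    times = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and fields[0] == "CircuitBuildTimeBin":
            bin_ms, count = int(fields[1]), int(fields[2])
            times.extend([bin_ms] * count)
    # Fisher-Yates shuffle: swap each position with a random position
    # at or before it, working from the end of the list to the start.
    for i in range(len(times) - 1, 0, -1):
        j = rng.randint(0, i)
        times[i], times[j] = times[j], times[i]
    return times
```

    Without the shuffle, the values would sit in the array sorted by
    bin, so newly recorded circuits would overwrite one bin at a time.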

  Learning the CircuitBuildTimeout

    Based on studies of build times, we found that the distribution of
    circuit buildtimes appears to be a Frechet distribution. However,
    estimators and quantile functions of the Frechet distribution are
    difficult to work with and slow to converge. So instead, since we
    are only interested in the accuracy of the tail, we approximate
    the tail of the distribution with a Pareto curve starting at
    the mode of the circuit build time sample set.

    We will calculate the parameters for a Pareto distribution
    fitting the data using the estimators at
    http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation.
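
    A minimal sketch of the fit, assuming the usual maximum likelihood
    estimator alpha = n / sum(ln(x_i / Xm)) with Xm taken as the mode of
    the sample set, as described above. The function names are
    hypothetical. Computing the mode from binned values and fitting only
    the samples at or above it are assumptions of this sketch, not
    details stated in this proposal.

```python
import math
from collections import Counter

BUILDTIME_BIN_WIDTH = 50  # milliseconds per bin


def sample_mode(build_times_ms, bin_width=BUILDTIME_BIN_WIDTH):
    """Return the midpoint of the most populated bin."""
    bins = Counter(
        (t // bin_width) * bin_width + bin_width // 2 for t in build_times_ms
    )
    best = max(bins.values())
    # Break ties toward the smaller build time.
    return min(mid for mid, count in bins.items() if count == best)


def fit_pareto(build_times_ms):
    """Estimate (Xm, alpha) for the tail of the build time samples."""
    xm = sample_mode(build_times_ms)
    # Assumption: only samples at or above the mode enter the fit.
    tail = [t for t in build_times_ms if t >= xm]
    log_sum = sum(math.log(t / xm) for t in tail)
    if log_sum <= 0:
        raise ValueError("not enough spread in the tail to estimate alpha")
    alpha = len(tail) / log_sum
    return xm, alpha
```

    A larger alpha means a thinner tail, and therefore a timeout that
    sits closer to the mode.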

    The timeout itself is calculated by using the quantile function (the
    inverted CDF) to give us the build time such that
    BUILDTIME_PERCENT_CUTOFF (80%) of the mass of the distribution is
    below the timeout value.
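
    For a Pareto distribution with scale Xm and shape alpha, the CDF is
    F(x) = 1 - (Xm / x)^alpha for x >= Xm, so the inverted CDF is
    Q(q) = Xm / (1 - q)^(1 / alpha). A small sketch of the timeout
    calculation follows; the function names are hypothetical.

```python
BUILDTIME_PERCENT_CUTOFF = 0.8  # fraction of mass below the timeout


def pareto_cdf(xm, alpha, x):
    """Fraction of the distribution at or below x."""
    if x < xm:
        return 0.0
    return 1.0 - (xm / x) ** alpha


def pareto_quantile(xm, alpha, q):
    """Inverted CDF: the value below which a fraction q of the mass lies."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must be in [0, 1)")
    return xm / (1.0 - q) ** (1.0 / alpha)


def build_timeout_ms(xm, alpha, cutoff=BUILDTIME_PERCENT_CUTOFF):
    """Timeout such that `cutoff` of the fitted mass is below it."""
    return pareto_quantile(xm, alpha, cutoff)
```

    For example, with Xm = 100 ms and alpha = 2, a cutoff of 0.75 gives
    100 / 0.25^(1/2) = 200 ms.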

    Thus, we expect that the Tor client will accept the fastest 80% of
    the total number of paths on the network.

  Detecting Changing Network Conditions

    We attempt to detect both network connectivity loss and drastic
    changes in the timeout characteristics.

    We assume that we've had network connectivity loss if 3 circuits
    time out and we've received no cells or TLS handshakes since those
    circuits began. We then set the timeout to 60 seconds and stop
    counting timeouts.

    If 3 more circuits time out and the network still has not been
    live within this new 60 second timeout window, we then discard
    the previous timeouts during this period from our history.

    To detect changing network conditions, we keep a history of
    the timeout or non-timeout status of the past RECENT_CIRCUITS (20)
    that successfully completed at least one hop. If more than 75%
    of these circuits time out, we discard all buildtimes history,
    reset the timeout to 60 seconds, and then begin recomputing the
    timeout.
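
    The recent-circuit check can be sketched as a sliding window over
    timeout flags. The class name and RECENT_TIMEOUT_FRACTION are
    hypothetical, and waiting for the window to fill before testing the
    threshold is an assumption of this sketch.

```python
from collections import deque

RECENT_CIRCUITS = 20            # window size, from the text above
RECENT_TIMEOUT_FRACTION = 0.75  # hypothetical name for the 75% threshold


class RecentCircuits:
    """Timeout status of the last RECENT_CIRCUITS circuits."""

    def __init__(self):
        # A deque with maxlen drops the oldest entry automatically.
        self.history = deque(maxlen=RECENT_CIRCUITS)

    def record(self, timed_out):
        """Record a circuit that completed at least one hop.

        Returns True when more than 75% of the window timed out,
        meaning the caller should discard its build time history.
        """
        self.history.append(bool(timed_out))
        # Assumption: only decide once the window is full.
        if len(self.history) < RECENT_CIRCUITS:
            return False
        limit = RECENT_TIMEOUT_FRACTION * RECENT_CIRCUITS
        if sum(self.history) > limit:
            self.history.clear()
            return True
        return False
```

    When record() returns True, the caller would discard all build time
    history and reset the timeout to 60 seconds, as described above.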

  Testing

    After circuit build times, storage, and learning are implemented,
    the resulting histogram should be checked for consistency by
    verifying it persists across successive Tor invocations where
    no circuits are built. In addition, we can also use the existing
    buildtime scripts to record build times, and verify that the histogram
    the Python scripts produce matches the one written to the state file
    by Tor, and verify that the Pareto parameters and cutoff points also
    match.

    We will also verify that there are no unexpected large deviations from
    node selection, such as nodes from distant geographical locations being
    completely excluded.

  Dealing with Timeouts

    Timeouts should be counted as the expectation of the region of
    the Pareto distribution beyond the cutoff. This is done by
    generating, for each timeout, a random sample from the part of the
    curve beyond the current timeout cutoff.
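
    One way to draw such a sample is inverse transform sampling
    restricted to the tail: pick a survival probability uniformly in
    (0, 1 - q], where q is the cutoff quantile, and invert the Pareto
    survival function. This is a sketch under that assumption; the
    function name is hypothetical.

```python
import random


def sample_beyond_cutoff(xm, alpha, cutoff_quantile, rng=random):
    """Draw a Pareto(xm, alpha) value at or beyond the cutoff quantile."""
    if not 0.0 <= cutoff_quantile < 1.0:
        raise ValueError("cutoff_quantile must be in [0, 1)")
    # rng.random() is in [0, 1), so (1 - rng.random()) is in (0, 1] and
    # the survival probability is in (0, 1 - q]: never zero.
    survival = (1.0 - cutoff_quantile) * (1.0 - rng.random())
    # Invert the survival function S(x) = (xm / x) ** alpha.
    return xm * survival ** (-1.0 / alpha)
```

    Each timed-out circuit would then be recorded in the build time
    array with a synthetic value drawn this way, so that the tail of the
    stored distribution is not simply truncated at the cutoff.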

  Future Work

    At some point, it may be desirable to change the cutoff from a
    single hard cutoff that destroys the circuit to a soft cutoff and
    a hard cutoff, where the soft cutoff merely triggers the building
    of a new circuit, and the hard cutoff triggers destruction of the
    circuit.

    It may also be beneficial to learn separate timeouts for each
    guard node, as they will have slightly different distributions.
    This will take longer to generate initial values though.

Issues

  Impact on anonymity

    Since the tail of the build time distribution follows a Pareto
    curve, large reductions in the timeout can be achieved without
    cutting off a great number of the total paths. This will eliminate
    a great deal of the performance variation of Tor usage.