tor/doc/spec/proposals/151-path-selection-improvements.txt

Filename: 151-path-selection-improvements.txt
Title: Improving Tor Path Selection
Version:
Last-Modified:
Author: Fallon Chen, Mike Perry
Created: 5-Jul-2008
Status: Draft

Overview

  The performance of paths selected can be improved by adjusting the
  CircuitBuildTimeout and avoiding failing guard nodes. This proposal
  describes a method of tracking buildtime statistics, and using those
  statistics to adjust the CircuitBuildTimeout and the number of guards.

Motivation

  Tor's performance can be improved by excluding those circuits that
  have long buildtimes (and by extension, high latency). For those Tor
  users who require better performance and have lower requirements for
  anonymity, this would be a very useful option to have.

Implementation

  Learning the CircuitBuildTimeout

    Based on studies of build times, we found that the distribution of
    circuit buildtimes appears to be a Pareto distribution. The number
    of circuits to observe (ncircuits_to_cutoff) before changing the
    CircuitBuildTimeout will be tunable. From out measurements, 
    ncircuits_to_cuttoff appears to be on the order of 100.
 
	In addition, the total number of circuits gathered
    (ncircuits_to_observe) will also be tunable. It is likely that
    ncircuits_to_observe will be somewhere on the order of 1000. The values
    can be represented compactly in Tor in milliseconds as a circular array
    of 16 bit integers. More compact long-term storage representations can
    be implemented by simply storing a histogram with 50 millisecond buckets
    when writing out the statistics to disk.

  Calculating the preferred CircuitBuildTimeout

    Circuits that have longer buildtimes than some x% of the estimated
    CDF of the Pareto distribution will be excluded. x will be tunable
    as well.

  Circuit timeouts

    In the event of a timeout, backoff values should include the 100-x%
    of expected CDF of timeouts.  Also, in the event of network failure,
    the observation mechanism should stop collecting timeout data.

  Dropping Failed Guards

    In addition, we have noticed that some entry guards are much more
    failure prone than others. In particular, the circuit failure rates for
    the fastest entry guards was approximately 20-25%, where as slower
    guards exhibit failure rates as high as 45-50%. In [1], it was
    demonstrated that failing guard nodes can deliberately bias path
    selection to improve their success at capturing traffic. For both these
    reasons, failing guards should be avoided. 
    
    We propose increasing the number of entry guards to five, and gathering
    circuit failure statistics on each entry guard. Any guards that exceed
    the average failure rate of all guards by 10% after we have
    gathered ncircuits_to_observe circuits will be replaced.
    

Issues

  Impact on anonymity

    Since this follows a Pareto distribution, large reductions on the
    timeout can be achieved without cutting off a great number of the
    total paths.  However, hard statistics on which cutoff percentage
    gives optimal performance have not yet been gathered.

  Guard Turnover

    We contend that the risk from failing guards biasing path selection
    outweighs the risk of exposure to larger portions of the network
    for the first hop. Furthermore, from our observations, it appears
    that circuit failure is strongly correlated to node load. Allowing
    clients to migrate away from failing guards should naturally
    rebalance the network, and eventually clients should converge on
    a stable set of reliable guards. It is also likely that once clients
    begin to migrate away from failing guards, their load should go
    down, causing their failure rates to drop as well.


[1] http://www.crhc.uiuc.edu/~nikita/papers/relmix-ccs07.pdf
Add proposal 150 and proposal 151 svn:r15695 2008-07-06 19:37:04 +02:00			`Filename: 151-path-selection-improvements.txt`
			`Title: Improving Tor Path Selection`
			`Version:`
			`Last-Modified:`
			`Author: Fallon Chen, Mike Perry`
			`Created: 5-Jul-2008`
			`Status: Draft`

			`Overview`

			`The performance of paths selected can be improved by adjusting the`
Add guard node failure plans to proposal. svn:r15706 2008-07-07 01:36:33 +02:00			`CircuitBuildTimeout and avoiding failing guard nodes. This proposal`
			`describes a method of tracking buildtime statistics, and using those`
			`statistics to adjust the CircuitBuildTimeout and the number of guards.`
Add proposal 150 and proposal 151 svn:r15695 2008-07-06 19:37:04 +02:00
			`Motivation`

			`Tor's performance can be improved by excluding those circuits that`
			`have long buildtimes (and by extension, high latency). For those Tor`
			`users who require better performance and have lower requirements for`
			`anonymity, this would be a very useful option to have.`

			`Implementation`

			`Learning the CircuitBuildTimeout`

			`Based on studies of build times, we found that the distribution of`
			`circuit buildtimes appears to be a Pareto distribution. The number`
Add guard node failure plans to proposal. svn:r15706 2008-07-07 01:36:33 +02:00			`of circuits to observe (ncircuits_to_cutoff) before changing the`
			`CircuitBuildTimeout will be tunable. From out measurements,`
			`ncircuits_to_cuttoff appears to be on the order of 100.`

			`In addition, the total number of circuits gathered`
			`(ncircuits_to_observe) will also be tunable. It is likely that`
			`ncircuits_to_observe will be somewhere on the order of 1000. The values`
			`can be represented compactly in Tor in milliseconds as a circular array`
			`of 16 bit integers. More compact long-term storage representations can`
			`be implemented by simply storing a histogram with 50 millisecond buckets`
			`when writing out the statistics to disk.`
Add proposal 150 and proposal 151 svn:r15695 2008-07-06 19:37:04 +02:00
			`Calculating the preferred CircuitBuildTimeout`

			`Circuits that have longer buildtimes than some x% of the estimated`
			`CDF of the Pareto distribution will be excluded. x will be tunable`
			`as well.`

			`Circuit timeouts`

			`In the event of a timeout, backoff values should include the 100-x%`
			`of expected CDF of timeouts. Also, in the event of network failure,`
			`the observation mechanism should stop collecting timeout data.`

Add guard node failure plans to proposal. svn:r15706 2008-07-07 01:36:33 +02:00			`Dropping Failed Guards`

			`In addition, we have noticed that some entry guards are much more`
			`failure prone than others. In particular, the circuit failure rates for`
			`the fastest entry guards was approximately 20-25%, where as slower`
			`guards exhibit failure rates as high as 45-50%. In [1], it was`
			`demonstrated that failing guard nodes can deliberately bias path`
			`selection to improve their success at capturing traffic. For both these`
			`reasons, failing guards should be avoided.`

			`We propose increasing the number of entry guards to five, and gathering`
			`circuit failure statistics on each entry guard. Any guards that exceed`
			`the average failure rate of all guards by 10% after we have`
			`gathered ncircuits_to_observe circuits will be replaced.`


			`Issues`

			`Impact on anonymity`
Add proposal 150 and proposal 151 svn:r15695 2008-07-06 19:37:04 +02:00
			`Since this follows a Pareto distribution, large reductions on the`
			`timeout can be achieved without cutting off a great number of the`
			`total paths. However, hard statistics on which cutoff percentage`
			`gives optimal performance have not yet been gathered.`

Add guard node failure plans to proposal. svn:r15706 2008-07-07 01:36:33 +02:00			`Guard Turnover`

			`We contend that the risk from failing guards biasing path selection`
			`outweighs the risk of exposure to larger portions of the network`
			`for the first hop. Furthermore, from our observations, it appears`
			`that circuit failure is strongly correlated to node load. Allowing`
			`clients to migrate away from failing guards should naturally`
			`rebalance the network, and eventually clients should converge on`
			`a stable set of reliable guards. It is also likely that once clients`
			`begin to migrate away from failing guards, their load should go`
			`down, causing their failure rates to drop as well.`


			`[1] http://www.crhc.uiuc.edu/~nikita/papers/relmix-ccs07.pdf`
Add proposal 150 and proposal 151 svn:r15695 2008-07-06 19:37:04 +02:00