tor/doc/spec/proposals/175-automatic-node-promotion.txt

Filename: xxx-automatic-node-promotion.txt
Title: Automatically promoting Tor clients to nodes
Author: Steven Murdoch
Created: 12-Mar-2010
Status: Draft
Target:

1. Overview

   This proposal describes how Tor clients could determine when they
   have sufficient bandwidth capacity and are sufficiently reliable to
   become either bridges or Tor relays. When they meet this
   criteria, they will automatically promote themselves, based on user
   preferences. The proposal also defines the new controller messages
   and options which will control this process.

   Note that for the moment, only transitions between client and
   bridge are being considered. Transitions to public relay will
   be considered at a future date, but will use the same
   infrastructure for measuring capacity and reliability.

2. Motivation and history

   Tor has a growing user-base and one of the major impediments to the
   quality of service offered is the lack of network capacity. This is
   particularly the case for bridges, because these are gradually
   being blocked, and thus no longer of use to people within some
   countries. By automatically promoting Tor clients to bridges, and
   perhaps also to full public relays, this proposal aims to solve
   these problems.
 
   Only Tor clients which are sufficiently useful should be promoted,
   and the process of determining usefulness should be performed
   without reporting the existence of the client to the central
   authorities. The criteria used for determining usefulness will be
   in terms of bandwidth capacity and uptime, but parameters should be
   specified in the directory consensus. State stored at the client
   should be in no more detail than necessary, to prevent sensitive
   information being recorded.

3. Design

3.x Opt-in state model

   Tor can be in one of five node-promotion states:

   - off (O): Currently a client, and will stay as such
   - auto (A): Currently a client, but will consider promotion
   - bridge (B): Currently a bridge, and will stay as such
   - auto-bridge (AB): Currently a bridge, but will consider promotion
   - relay (R): Currently a public relay, and will stay as such

   The state can be fully controlled from the configuration file or
   controller, but the normal state transitions are as follows:

   Any state -> off: User has opted out of node promotion
   Off -> any state: Only permitted with user consent

   Auto -> auto-bridge: Tor has detected that it is sufficiently
    reliable to be a *bridge*
   Auto -> bridge: Tor has detected that it is sufficiently reliable
    to be a *relay*, but the user has chosen to remain a *bridge*
   Auto -> relay: Tor has detected that it is sufficiently reliable
    to be *relay*, and will skip being a *bridge*
   Auto-bridge -> relay: Tor has detected that it is sufficiently
    reliable to be a *relay*

   Note that this model does not support automatic demotion. If this
   is desirable, there should be some memory as to whether the
   previous state was relay, bridge, or auto-bridge. Otherwise the
   user may be prompted to become a relay, although he has opted to
   only be a bridge.

3.x User interaction policy

   There are a variety of options in how to involve the user into the
   decision as to whether and when to perform node promotion. The
   choice also may be different when Tor is running from Vidalia (and
   thus can readily prompt the user for information), and standalone
   (where Tor can only log messages, which may or may not be read).

   The option requiring minimal user interaction is to automatically
   promote nodes according to reliability, and allow the user to opt
   out, by changing settings in the configuration file or Vidalia user
   interface.

   Alternatively, if a user interface is available, Tor could prompt
   the user when it detects that a transition is available, and allow
   the user to choose which of the available options to select. If
   Vidalia is not available, it still may be possible to solicit an
   email address on install, and contact the operator to ask whether
   a transition to bridge or relay is permitted.

   Finally, Tor could by default not make any transition, and the user
   would need to opt in by stating the maximum level (bridge or
   relay) to which the node may automatically promote itself.

3.x Performance monitoring model

   To prevent a large number of clients activating as relays, but
   being too unreliable to be useful, clients should measure their
   performance. If this performance meets a parameterized acceptance
   criteria, a client should consider promotion. To measure
   reliability, this proposal adopts a simple user model:

    - A user decides to use Tor at times which follow a Poisson
      distribution
    - At each time, the user will be happy if the bridge chosen has
      adequate bandwidth and is reachable
    - If the chosen bridge is down or slow too many times, the user
      will consider Tor to be bad

   If we additionally assume that the recent history of relay
   performance matches the current performance, we can measure
   reliability by simulating this simple user.

   The following parameters are distributed to clients in the
   directory consensus:

     - min_bandwidth: Minimum self-measured bandwidth for a node to be
       considered useful, in bytes per second
     - check_period: How long, in seconds, to wait between checking
       reachability and bandwidth (on average)
     - num_samples: Number of recent samples to keep
     - num_useful: Minimum number of recent samples where the node was
       reachable and had at least min_bandwidth capacity, for a client
       to consider promoting to a bridge

   A different set of parameters may be used for considering when to
   promote a bridge to a full relay, but this will be the subject of a
   future revision of the proposal.

3.x Performance monitoring algorithm

   The simulation described above can be implemented as follows:

   Every 60 seconds:
     1. Tor generates a random floating point number x in
        the interval [0, 1).
     2. If x > (1 / (check_period / 60)) GOTO end; otherwise:
     3. Tor sets the value last_check to the current_time (in seconds)
     4. Tor measures reachability
     5. If the client is reachable, Tor measures its bandwidth
     6. If the client is reachable and the bandwidth is >=
        min_bandwidth, the test has succeeded, otherwise it has failed.
     7. Tor adds the test result to the end of a ring-buffer containing
        the last num_samples results: measurement_results
     8. Tor saves last_check and measurements_results to disk
     9. If the length of measurements_results == num_samples and
        the number of successes >= num_useful, Tor should consider
        promotion to a bridge
   end.
 
   When Tor starts, it must fill in the samples for which it was not
   running. This can only happen once the consensus has downloaded,
   because the value of check_period is needed.
 
      1. Tor generates a random number y from the Poisson distribution [1]
         with lambda = (current_time - last_check) * (1 / check_period)
      2. Tor sets the value last_check to the current_time (in seconds)	
      3. Add y test failures to the ring buffer measurements_results
      4. Tor saves last_check and measurements_results to disk
 
   In this way, a Tor client will measure its bandwidth and
   reachability every check_period seconds, on average. Provided
   check_period is sufficiently greater than a minute (say, at least an
   hour), the times of check will follow a Poisson distribution. [2]
 
   While this does require that Tor does record the state of a client
   over time, this does not leak much information. Only a binary
   reachable/non-reachable is stored, and the timing of samples becomes
   increasingly fuzzy as the data becomes less recent.

   On IP address changes, Tor should clear the ring-buffer, because
   from the perspective of users with the old IP address, this node
   might as well be a new one with no history. This policy may change
   once we start allowing the bridge authority to hand out new IP
   addresses given the fingerprint.

3.x Bandwidth measurement

   Tor needs to measure its bandwidth to test the usefulness as a
   bridge. A non-intrusive way to do this would be to passively measure
   the peak data transfer rate since the last reachability test. Once
   this exceeds min_bandwidth, Tor can set a flag that this node
   currently has sufficient bandwidth to pass the bandwidth component
   of the upcoming performance measurement.

   For the first version we may simply skip the bandwidth test,
   because the existing reachability test sends 500 kB over several
   circuits, and checks whether the node can transfer at least 50
   kB/s.  This is probably good enough for a bridge, so this test
   might be sufficient to record a success in the ring buffer.

3.x New options

3.x New controller message

4. Migration plan

   We should start by setting a high bandwidth and uptime requirement
   in the consensus, so as to avoid overloading the bridge authority
   with too many bridges. Once we are confident our systems can scale,
   the criteria can be gradually shifted down to gain more bridges.

5. Related proposals

6. Open questions:

   - What user interaction policy should we take?

   - When (if ever) should we turn a relay into an exit relay?

   - What should the rate limits be for auto-promoted bridges/relays?
     Should we prompt the user for this?

   - Perhaps the bridge authority should tell potential bridges
     whether to enable themselves, by taking into account whether
     their IP address is blocked

   - How do we explain the possible risks of running a bridge/relay
     * Use of bandwidth/congestion
     * Publication of IP address
     * Blocking from IRC (even for non-exit relays)

   - What feedback should we give to bridge relays, to encourage then
     e.g. number of recent users (what about reserve bridges)?

   - Can clients back-off from doing these tests (yes, we should do
     this)

[1] For algorithms to generate random numbers from the Poisson
    distribution, see: http://en.wikipedia.org/wiki/Poisson_distribution#Generating_Poisson-distributed_random_variables
[2] "The sample size n should be equal to or larger than 20 and the
     probability of a single success, p, should be smaller than or equal to
     .05. If n >= 100, the approximation is excellent if np is also <= 10."
    http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc331.htm (e-Handbook of Statistical Methods)

% vim: spell ai et:
Start idea xxx-automatic-node-promotion - Initial draft of overview and motivation - Start of design 2010-03-15 00:06:48 +01:00			`Filename: xxx-automatic-node-promotion.txt`
			`Title: Automatically promoting Tor clients to nodes`
			`Author: Steven Murdoch`
			`Created: 12-Mar-2010`
			`Status: Draft`
			`Target:`

			`1. Overview`

			`This proposal describes how Tor clients could determine when they`
			`have sufficient bandwidth capacity and are sufficiently reliable to`
Change "server" to "relay", so as to match existing terminology 2010-03-15 01:14:41 +01:00			`become either bridges or Tor relays. When they meet this`
Start idea xxx-automatic-node-promotion - Initial draft of overview and motivation - Start of design 2010-03-15 00:06:48 +01:00			`criteria, they will automatically promote themselves, based on user`
			`preferences. The proposal also defines the new controller messages`
			`and options which will control this process.`

Integrate more feedback from IRC - For now we are only talking about moving clients to be bridges - Some questions on how we should inform users 2010-03-15 18:50:50 +01:00			`Note that for the moment, only transitions between client and`
			`bridge are being considered. Transitions to public relay will`
			`be considered at a future date, but will use the same`
			`infrastructure for measuring capacity and reliability.`

Start idea xxx-automatic-node-promotion - Initial draft of overview and motivation - Start of design 2010-03-15 00:06:48 +01:00			`2. Motivation and history`

			`Tor has a growing user-base and one of the major impediments to the`
			`quality of service offered is the lack of network capacity. This is`
			`particularly the case for bridges, because these are gradually`
			`being blocked, and thus no longer of use to people within some`
			`countries. By automatically promoting Tor clients to bridges, and`
Change "server" to "relay", so as to match existing terminology 2010-03-15 01:14:41 +01:00			`perhaps also to full public relays, this proposal aims to solve`
Start idea xxx-automatic-node-promotion - Initial draft of overview and motivation - Start of design 2010-03-15 00:06:48 +01:00			`these problems.`

			`Only Tor clients which are sufficiently useful should be promoted,`
			`and the process of determining usefulness should be performed`
			`without reporting the existence of the client to the central`
			`authorities. The criteria used for determining usefulness will be`
			`in terms of bandwidth capacity and uptime, but parameters should be`
			`specified in the directory consensus. State stored at the client`
			`should be in no more detail than necessary, to prevent sensitive`
			`information being recorded.`

			`3. Design`

			`3.x Opt-in state model`

			`Tor can be in one of five node-promotion states:`

			`- off (O): Currently a client, and will stay as such`
			`- auto (A): Currently a client, but will consider promotion`
			`- bridge (B): Currently a bridge, and will stay as such`
			`- auto-bridge (AB): Currently a bridge, but will consider promotion`
Change "server" to "relay", so as to match existing terminology 2010-03-15 01:14:41 +01:00			`- relay (R): Currently a public relay, and will stay as such`
Start idea xxx-automatic-node-promotion - Initial draft of overview and motivation - Start of design 2010-03-15 00:06:48 +01:00
			`The state can be fully controlled from the configuration file or`
			`controller, but the normal state transitions are as follows:`

			`Any state -> off: User has opted out of node promotion`
			`Off -> any state: Only permitted with user consent`

			`Auto -> auto-bridge: Tor has detected that it is sufficiently`
			`reliable to be a bridge`
			`Auto -> bridge: Tor has detected that it is sufficiently reliable`
Change "server" to "relay", so as to match existing terminology 2010-03-15 01:14:41 +01:00			`to be a relay, but the user has chosen to remain a bridge`
			`Auto -> relay: Tor has detected that it is sufficiently reliable`
			`to be relay, and will skip being a bridge`
			`Auto-bridge -> relay: Tor has detected that it is sufficiently`
			`reliable to be a relay`
Start idea xxx-automatic-node-promotion - Initial draft of overview and motivation - Start of design 2010-03-15 00:06:48 +01:00
Note that we only can't handle automatic demotion. Users can always change their state manually. 2010-03-15 01:23:12 +01:00			`Note that this model does not support automatic demotion. If this`
			`is desirable, there should be some memory as to whether the`
			`previous state was relay, bridge, or auto-bridge. Otherwise the`
			`user may be prompted to become a relay, although he has opted to`
			`only be a bridge.`
Start idea xxx-automatic-node-promotion - Initial draft of overview and motivation - Start of design 2010-03-15 00:06:48 +01:00
			`3.x User interaction policy`

			`There are a variety of options in how to involve the user into the`
			`decision as to whether and when to perform node promotion. The`
			`choice also may be different when Tor is running from Vidalia (and`
			`thus can readily prompt the user for information), and standalone`
			`(where Tor can only log messages, which may or may not be read).`

			`The option requiring minimal user interaction is to automatically`
			`promote nodes according to reliability, and allow the user to opt`
			`out, by changing settings in the configuration file or Vidalia user`
			`interface.`

			`Alternatively, if a user interface is available, Tor could prompt`
			`the user when it detects that a transition is available, and allow`
Add some open questions, and mention Roger's idea about asking for consent via email 2010-03-15 02:07:12 +01:00			`the user to choose which of the available options to select. If`
			`Vidalia is not available, it still may be possible to solicit an`
			`email address on install, and contact the operator to ask whether`
			`a transition to bridge or relay is permitted.`
Start idea xxx-automatic-node-promotion - Initial draft of overview and motivation - Start of design 2010-03-15 00:06:48 +01:00
			`Finally, Tor could by default not make any transition, and the user`
			`would need to opt in by stating the maximum level (bridge or`
Change "server" to "relay", so as to match existing terminology 2010-03-15 01:14:41 +01:00			`relay) to which the node may automatically promote itself.`
Start idea xxx-automatic-node-promotion - Initial draft of overview and motivation - Start of design 2010-03-15 00:06:48 +01:00
Add algorithm and rationale for performance measurement 2010-03-29 21:49:30 +02:00			`3.x Performance monitoring model`

			`To prevent a large number of clients activating as relays, but`
			`being too unreliable to be useful, clients should measure their`
			`performance. If this performance meets a parameterized acceptance`
			`criteria, a client should consider promotion. To measure`
			`reliability, this proposal adopts a simple user model:`

			`- A user decides to use Tor at times which follow a Poisson`
			`distribution`
			`- At each time, the user will be happy if the bridge chosen has`
			`adequate bandwidth and is reachable`
			`- If the chosen bridge is down or slow too many times, the user`
			`will consider Tor to be bad`

			`If we additionally assume that the recent history of relay`
			`performance matches the current performance, we can measure`
			`reliability by simulating this simple user.`

			`The following parameters are distributed to clients in the`
			`directory consensus:`

			`- min_bandwidth: Minimum self-measured bandwidth for a node to be`
			`considered useful, in bytes per second`
			`- check_period: How long, in seconds, to wait between checking`
			`reachability and bandwidth (on average)`
			`- num_samples: Number of recent samples to keep`
			`- num_useful: Minimum number of recent samples where the node was`
			`reachable and had at least min_bandwidth capacity, for a client`
			`to consider promoting to a bridge`

			`A different set of parameters may be used for considering when to`
			`promote a bridge to a full relay, but this will be the subject of a`
			`future revision of the proposal.`

			`3.x Performance monitoring algorithm`

			`The simulation described above can be implemented as follows:`

			`Every 60 seconds:`
			`1. Tor generates a random floating point number x in`
			`the interval [0, 1).`
			`2. If x > (1 / (check_period / 60)) GOTO end; otherwise:`
			`3. Tor sets the value last_check to the current_time (in seconds)`
			`4. Tor measures reachability`
			`5. If the client is reachable, Tor measures its bandwidth`
			`6. If the client is reachable and the bandwidth is >=`
			`min_bandwidth, the test has succeeded, otherwise it has failed.`
			`7. Tor adds the test result to the end of a ring-buffer containing`
			`the last num_samples results: measurement_results`
			`8. Tor saves last_check and measurements_results to disk`
			`9. If the length of measurements_results == num_samples and`
			`the number of successes >= num_useful, Tor should consider`
			`promotion to a bridge`
			`end.`

			`When Tor starts, it must fill in the samples for which it was not`
			`running. This can only happen once the consensus has downloaded,`
			`because the value of check_period is needed.`

			`1. Tor generates a random number y from the Poisson distribution [1]`
			`with lambda = (current_time - last_check) * (1 / check_period)`
			`2. Tor sets the value last_check to the current_time (in seconds)`
			`3. Add y test failures to the ring buffer measurements_results`
			`4. Tor saves last_check and measurements_results to disk`

			`In this way, a Tor client will measure its bandwidth and`
			`reachability every check_period seconds, on average. Provided`
			`check_period is sufficiently greater than a minute (say, at least an`
			`hour), the times of check will follow a Poisson distribution. [2]`

			`While this does require that Tor does record the state of a client`
			`over time, this does not leak much information. Only a binary`
			`reachable/non-reachable is stored, and the timing of samples becomes`
			`increasingly fuzzy as the data becomes less recent.`

Add comments from nickm and arma, from IRC 2010-03-30 17:57:49 +02:00			`On IP address changes, Tor should clear the ring-buffer, because`
			`from the perspective of users with the old IP address, this node`
			`might as well be a new one with no history. This policy may change`
			`once we start allowing the bridge authority to hand out new IP`
			`addresses given the fingerprint.`

			`3.x Bandwidth measurement`

			`Tor needs to measure its bandwidth to test the usefulness as a`
			`bridge. A non-intrusive way to do this would be to passively measure`
			`the peak data transfer rate since the last reachability test. Once`
			`this exceeds min_bandwidth, Tor can set a flag that this node`
			`currently has sufficient bandwidth to pass the bandwidth component`
			`of the upcoming performance measurement.`

			`For the first version we may simply skip the bandwidth test,`
			`because the existing reachability test sends 500 kB over several`
			`circuits, and checks whether the node can transfer at least 50`
			`kB/s. This is probably good enough for a bridge, so this test`
			`might be sufficient to record a success in the ring buffer.`

Start idea xxx-automatic-node-promotion - Initial draft of overview and motivation - Start of design 2010-03-15 00:06:48 +01:00			`3.x New options`

			`3.x New controller message`

In the migration plan, mention how to prevent overloading the bridge authority 2010-03-17 12:56:59 +01:00			`4. Migration plan`
Start idea xxx-automatic-node-promotion - Initial draft of overview and motivation - Start of design 2010-03-15 00:06:48 +01:00
In the migration plan, mention how to prevent overloading the bridge authority 2010-03-17 12:56:59 +01:00			`We should start by setting a high bandwidth and uptime requirement`
			`in the consensus, so as to avoid overloading the bridge authority`
			`with too many bridges. Once we are confident our systems can scale,`
			`the criteria can be gradually shifted down to gain more bridges.`

			`5. Related proposals`

			`6. Open questions:`
Add some open questions, and mention Roger's idea about asking for consent via email 2010-03-15 02:07:12 +01:00
			`- What user interaction policy should we take?`

			`- When (if ever) should we turn a relay into an exit relay?`

			`- What should the rate limits be for auto-promoted bridges/relays?`
			`Should we prompt the user for this?`

			`- Perhaps the bridge authority should tell potential bridges`
			`whether to enable themselves, by taking into account whether`
			`their IP address is blocked`
Integrate more feedback from IRC - For now we are only talking about moving clients to be bridges - Some questions on how we should inform users 2010-03-15 18:50:50 +01:00
			`- How do we explain the possible risks of running a bridge/relay`
			`* Use of bandwidth/congestion`
			`* Publication of IP address`
			`* Blocking from IRC (even for non-exit relays)`

			`- What feedback should we give to bridge relays, to encourage then`
			`e.g. number of recent users (what about reserve bridges)?`
Add algorithm and rationale for performance measurement 2010-03-29 21:49:30 +02:00
Add comments from nickm and arma, from IRC 2010-03-30 17:57:49 +02:00			`- Can clients back-off from doing these tests (yes, we should do`
			`this)`

Add algorithm and rationale for performance measurement 2010-03-29 21:49:30 +02:00			`[1] For algorithms to generate random numbers from the Poisson`
			`distribution, see: http://en.wikipedia.org/wiki/Poisson_distribution#Generating_Poisson-distributed_random_variables`
			`[2] "The sample size n should be equal to or larger than 20 and the`
			`probability of a single success, p, should be smaller than or equal to`
			`.05. If n >= 100, the approximation is excellent if np is also <= 10."`
			`http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc331.htm (e-Handbook of Statistical Methods)`

			`% vim: spell ai et:`