mirror of https://gitlab.torproject.org/tpo/core/tor.git
synced 2024-11-28 06:13:31 +01:00
Update spec with new right-censored pareto estimators.
parent f897154b26
commit 81736f426f
@@ -311,15 +311,39 @@ of their choices.
 the tail of the distribution with a Pareto curve.
 
 We calculate the parameters for a Pareto distribution fitting the data
-using the estimators at
-http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation.
+using the estimators in equation 4 from:
+http://portal.acm.org/citation.cfm?id=1647962.1648139
 
-Because this is not a true Pareto distribution, we alter how Xm is
-computed. The Xm parameter is computed as the midpoint of the most
+This is:
+
+alpha_m = s/(ln(U(X)/Xm^n))
+
+where s is the total number of completed circuits we have seen, and
+
+U(X) = x_max^u * Prod_s{x_i}
+
+with x_i as our i-th completed circuit time, x_max as the longest
+completed circuit build time we have yet observed, u as the
+number of unobserved timeouts that have no exact value recorded,
+and n as u+s, the total number of circuits that either timeout or
+complete.
+
+Using log laws, we compute this as the sum of logs to avoid
+overflow and ln(1.0+epsilon) precision issues:
+
+alpha_m = s/(u*ln(x_max) + Sum_s{ln(x_i)} - n*ln(Xm))
+
+This estimator is closely related to the parameters present in:
+http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation
+except they are adjusted to handle the fact that our samples are
+right-censored at the timeout cutoff.
+
+Additionally, because this is not a true Pareto distribution, we alter
+how Xm is computed. The Xm parameter is computed as the midpoint of the most
 frequently occurring 50ms histogram bin, until the point where 1000
 circuits are recorded. After this point, the weighted average of the top
-3 midpoint modes is used as Xm. All times below this value are counted
-as having the midpoint value of this weighted average bin.
+'cbtnummodes' (default: 3) midpoint modes is used as Xm. All times below
+this value are counted as having the midpoint value of this weighted average bin.
 
 The timeout itself is calculated by using the Pareto Quantile function (the
 inverted CDF) to give us the value on the CDF such that 80% of the mass
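The sum-of-logs form of the estimator added in the hunk above can be sketched as follows. This is a minimal illustration, not code from the tor source tree: the function name, argument names, and error handling are assumptions, and times are taken to be in the same unit as Xm.

```python
import math


def estimate_alpha(completed_times, num_unknown_timeouts, xm):
    """Right-censored Pareto shape estimate (illustrative sketch).

    Implements alpha_m = s/(u*ln(x_max) + Sum_s{ln(x_i)} - n*ln(Xm)).
    Names and error handling are hypothetical, not taken from tor.
    """
    s = len(completed_times)          # completed circuits
    if s == 0:
        raise ValueError("need at least one completed circuit")
    u = num_unknown_timeouts          # timeouts with no exact value
    n = s + u                         # all circuits, completed or not

    # Times below Xm are counted as having the value Xm.
    clamped = [max(x, xm) for x in completed_times]
    x_max = max(clamped)

    denom = (u * math.log(x_max)
             + sum(math.log(x) for x in clamped)
             - n * math.log(xm))
    if denom <= 0.0:
        raise ValueError("degenerate sample: all times equal Xm")
    return s / denom
```

For example, three completed circuits at 100, 200 and 400 with one unknown timeout and Xm = 100 give a denominator of ln(32), so the estimate is 3/ln(32).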
@@ -347,10 +371,18 @@ of their choices.
 
 2.4.3. How to record timeouts
 
-Timeouts should be counted as the expectation of the region
-of the Pareto distribution beyond the cutoff. This is done by
-generating a random sample for each timeout at points on the
-curve beyond the current timeout cutoff up to the 90% quantile marker.
+Circuits that pass the timeout threshold should be allowed to continue
+building until a time corresponding to the point 'cbtclosequantile'
+(default 95) on the Pareto curve, or 60 seconds, whichever is greater.
+
+The actual completion times for these circuits should be recorded.
+Implementations should completely abandon a circuit and record a value
+as an 'unknown' timeout if the total build time exceeds this threshold.
+
+The reason for this is that right-censored Pareto estimators begin to lose
+their accuracy if more than approximately 5% of the values are censored.
+Since we wish to set the cutoff at 20%, we must allow circuits to continue
+building past this cutoff point up to the 95th percentile.
 
 2.4.4. Detecting Changing Network Conditions
 
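The recording rule added in the hunk above can be sketched as a small decision function. This is an illustration under stated assumptions: times are in milliseconds, the standard Pareto quantile function Xm/(1-q)^(1/alpha) is used, and all function names and return labels are hypothetical rather than taken from tor.

```python
def pareto_quantile(q, xm, alpha):
    """Inverse CDF of a Pareto(Xm, alpha) distribution for 0 <= q < 1."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must be in [0, 1)")
    return xm / (1.0 - q) ** (1.0 / alpha)


def close_threshold_ms(xm, alpha, close_quantile=95, floor_ms=60_000):
    """Time at which a circuit is abandoned: the 'cbtclosequantile' point
    on the Pareto curve, or 60 seconds, whichever is greater."""
    return max(pareto_quantile(close_quantile / 100.0, xm, alpha), floor_ms)


def classify_build(build_ms, timeout_ms, close_ms):
    """Decide how one circuit build is recorded (labels are hypothetical)."""
    if build_ms <= timeout_ms:
        return "completed"            # normal success, record the time
    if build_ms <= close_ms:
        return "timeout_measured"     # timed out, but actual time recorded
    return "timeout_unknown"          # abandoned, counted as censored
```

With Xm = 100 ms and alpha = 1, the 95% point is about 2000 ms, so the 60 second floor determines the close threshold.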
@@ -388,6 +420,14 @@ of their choices.
 disabled and history should be discarded. For use in
 emergency situations only.
 
+cbtnummodes
+Default: 3
+Effect: This value governs how many modes to use in the weighted
+average calculation of Pareto parameter Xm. A value of 3 introduces
+some bias (2-5% of CDF) under ideal conditions, but allows for better
+performance in the event that a client chooses guard nodes of radically
+different performance characteristics.
+
 cbtrecentcount
 Default: 20
 Effect: This is the number of circuit build times to keep track of
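How 'cbtnummodes' feeds into Xm can be sketched as below. The spec text says only "weighted average of the top midpoint modes"; weighting each bin midpoint by its count is an assumption here, as are the function and parameter names. Times are taken to be in milliseconds.

```python
from collections import Counter

BIN_WIDTH_MS = 50  # histogram bin width named in the spec text


def estimate_xm(build_times_ms, num_modes=3, min_circuits_for_average=1000):
    """Estimate Xm from the most frequent 50ms histogram bins (sketch).

    Below min_circuits_for_average recorded circuits, only the single most
    frequent bin is used; afterwards the top num_modes bins are averaged.
    Weighting midpoints by bin count is an assumption of this sketch.
    """
    if not build_times_ms:
        raise ValueError("no build times recorded")
    if len(build_times_ms) < min_circuits_for_average:
        num_modes = 1

    bins = Counter(int(t // BIN_WIDTH_MS) for t in build_times_ms)
    top = bins.most_common(num_modes)

    total = sum(count for _, count in top)
    weighted = sum((b * BIN_WIDTH_MS + BIN_WIDTH_MS / 2) * count
                   for b, count in top)
    return weighted / total
```

Averaging several modes pulls Xm toward slower bins when a client's guards differ widely in speed, which is the trade-off the 'cbtnummodes' description mentions.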
@@ -409,11 +449,11 @@ of their choices.
 Effect: This is the position on the quantile curve to use to set the
 timeout value. It is a percent (0-99).
 
-cbtmaxsynthquantile
-Default: 90
-Effect: This is the maximum position on the quantile curve to use to
-generate synthetic circuit build times for timeouts. It is a
-percent (0-99).
+cbtclosequantile
+Default: 95
+Effect: This is the position on the quantile curve to use to set the
+timeout value to use to actually close circuits. It is a percent
+(0-99).
 
 cbttestfreq
 Default: 60