tor/doc/design-paper/performance.tex
2009-02-04 23:26:39 +00:00

224 lines
16 KiB
TeX

\documentclass{article}
%\usepackage{palatcm}
\usepackage{fancyhdr}
\usepackage{color}
\usepackage{graphicx}
\usepackage{prettyref}
\newrefformat{sec}{Section~\ref{#1}}
\newrefformat{tab}{Table~\ref{#1}}
\newrefformat{fig}{Figure~\ref{#1}}
\newrefformat{cha}{Chapter~\ref{#1}}
\newrefformat{eqn}{Equation~\ref{#1}}
\newrefformat{apx}{Appendix~\ref{#1}}
\usepackage{hyperref}
\hypersetup{colorlinks, citecolor=MyDarkRed, filecolor=MyDarkBlue, linkcolor=MyDarkRed, urlcolor=MyDarkBlue}
\definecolor{MyDarkBlue}{rgb}{0, 0.0, 0.45}
\definecolor{MyDarkRed}{rgb}{0.45, 0.0, 0}
\definecolor{MyDarkGreen}{rgb}{0, 0.45, 0}
\definecolor{MyLightGray}{gray}{.90}
\definecolor{MyLightGreen}{rgb}{0.5, 0.99, 0.5}
\usepackage{xspace}
\makeatletter
\newcommand{\ie}{i.e.\@\xspace}
\newcommand{\eg}{e.g.\@\xspace}
\newcommand{\cf}{cf.\@\xspace}
\newcommand{\vs}{vs.\@\xspace}
\newcommand{\wrt}{w.r.t.\@\xspace}
\newcommand{\etal}{\textit{et al.\@\xspace}}
\newcommand{\detal}{\textit{et al.}}
\newcommand{\ia}{inter alia\xspace}
\makeatother
\newcommand{\thetitle}{Performance Improvements on Tor}
\title{\thetitle}
%% Please add your name in here if you contribute
\author{Steven J. Murdoch}
\pagestyle{fancy}
\fancyhf{}
\fancyhead[C]{\thetitle}
\fancyfoot[C]{\thepage}
\begin{document}
\thispagestyle{plain}
\maketitle
\section{Minimizing latency of paths}
Currently Tor selects paths purely by the random selection of nodes, biased by node bandwidth.
This will sometimes cause high latency circuits due to multiple ocean crossings or otherwise congested links.
An alternative approach would be to not only bias selection of nodes based on bandwidth, but to also bias the selection of hops based on expected latency.
One option would be to predict the latency of hops based on geolocation of the node IP address.
This approach has the advantage of not requiring any additional measurement database to be published.
However, it does assume that the geolocation database is accurate and that physical distance between hops is an accurate estimator for latency.
A second option would be to actually measure hop latency, and publish the database.
Nodes could do this themselves and include the results in their descriptor.
Alternatively, a central authority could perform the measurements and publish the results.
Performing these measurements would be a $O(n^2)$ problem, where $n$ is the number of nodes, so does not scale well.
Publishing a latency database would also increase the size of the directory that each client must download.
If na\"{\i}vely implemented, the database would scale with $O(n^2)$.
However, a more efficient versions could be created, such as by dimension reduction, creating a map in which the distance between any two nodes is an approximation of the latency of a hop between them.
Delta compression could be used if the map changes slowly.
Reducing the number of potential paths would also have anonymity consequences, and these would need to be carefully considered.
For example, an attacker who wishes to monitor traffic could create several nodes, on distinct /16 subnets, but with low latency between them.
A Tor client trying to minimize latency would be more likely to select these nodes for both entry than exit than it would otherwise.
This particular problem could be mitigated by selecting entry and exit node as normal, and only using latency measurements to select the middle node.
\section{Peer-to-peer bandwidth estimation}
Snader and Borisov proposed that each Tor node opportunistically monitor the data rates that it achieves when communicating with other Tor nodes.
Since currently Tor uses a clique topology, given enough time, all nodes will communicate with all other Tor nodes.
If each Tor node reported their measurements back to a central authority, it would be possible to estimate the capacity of each Tor node.
This estimate would be difficult to game, when compared to the current self-advertisement of bandwidth capacity.
Experiments show that opportunistic bandwidth measurement has a better systematic error than Tor's current self-advertised measure, although has a poorer log-log correlation (0.48 vs. 0.57).
The most accurate scheme is active probing of capacity, with a log-log correlation of 0.63, but this introduces network overhead.
All three schemes do suffer from fairly poor accuracy, presumably due to some nodes with high variance in bandwidth capacity.
\section{Considering exit policy in node selection}
When selecting an exit node for a circuit, a Tor client will build a list of all exit nodes which can carry the desired stream, then select from them with a probability weighted by each node's capacity\footnote{The actual algorithm is slightly more complex, in particular exit nodes which are also guard nodes will be weighted less, and there is also preemptive circuit creation}.
This means that nodes with more permissive exit policies will be candidates for more circuits, and hence will be more heavily loaded compared to nodes with restrictive policies.
\begin{figure}
\includegraphics[width=\textwidth]{node-selection/exit-capacity}
\caption{Exit node capacity, in terms of number of nodes and advertised bandwidth for a selection of port numbers.}
\label{fig:exit-capacity}
\end{figure}
\prettyref{fig:exit-capacity} shows the exit node capacity for a selection of port numbers.
It can be clearly seen that there is a radical difference in the availability of nodes for certain ports, generally those not in the default exit policy.
Any traffic to these ports will be routed through a small number of exit nodes, and if they have a permissive exit policy, they will likely become overloaded from all the other traffic they receive.
The extent of this effect will depend on how much traffic in Tor is to ports which are not in the default exit policy.
The overloading of permissive exit nodes can be counteracted by adjusting the selection probability of a node based on its exit policy and knowledge of the network load per-port.
While it should improve performance, this modification will make it easier for malicious exit nodes to select traffic they wish to monitor.
For example, an exit node which wants to attack SSH sessions can currently list only port 22 in their exit policy.
Currently they will get a small amount of traffic compared to their capacity, but with the modification they will get a much larger share of SSH traffic.
However a malicious exit node could already do this, by artificially inflating their advertised bandwidth capacity.
\subsection{Further work}
To properly balance exit node usage, it is necessary to know the usage of the Tor network, by port.
McCoy \etal have figures for protocol usage in Tor, but these figures were generated through deep packet inspection, rather than by port number.
Furthermore, the exit node they ran used the fairly permissive default exit policy.
Therefore, their measurements will underestimate the relative traffic on ports which are present in the default exit policy, and are also present in more restrictive policies.
To accurately estimate the Tor network usage by port, it is necessary to measure the network usage by port on one or more exit nodes, while simultaneously recording the exit policy of all other exit nodes considered usable.
\section{Altering node selection algorithm}
Currently Tor selects nodes with a probability proportional to their bandwidth contribution to the network, however this may not be the optimal algorithm.
Murdoch and Watson investigated the performance impact of different node selection algorithms, and derived a formula for estimating average latency $T$:
\begin{equation}
T = \sum_{i=1}^n q_i t_i = \sum_{i=1}^n \frac{q_i x_i (2 - q_i x_i \Lambda)}{2 (1 - q_i x_i \Lambda)}
\label{eqn:waiting}
\end{equation}
Where $q_i$ is the probability of the $i$th node (out of $n$ nodes) being selected, $t_i$ is the average latency at the $i$th node, $x_i$ is the reciprocal of the $i$th node's bandwidth, and $\Lambda$ is the total network load.
This calculation is subject to a number of assumptions.
In particular, it assumes that Tor nodes have infinite length queues and input traffic is Poisson distributed.
Whereas in practise Tor nodes have finite length queues (which controls network load), and the distribution of input cells is not known.
Unfortunately, these assumptions are necessary to apply standard queueing theory results.
Despite the simplifications made to the network model, results derived from it may still be useful.
This is especially the case because it models the entire network, whereas experiments can feasibly change only a few of the clients' behaviour.
The formula is also amenable to mathematical analysis such as non-linear optimization.
To try and find the optimum node selection probabilities, I used a hill-climbing algorithm to minimize network latency, with a Tor directory snapshot as input.
The results (shown in \prettyref{fig:optimum-selection} and \prettyref{fig:relative-selection}) depend on the network load relative to overall capacity.
As load approaches capacity, the optimum selection probabilities converge to the one used by Tor: node bandwidth proportional to network capacity.
However, as load drops, the optimized selection algorithm favours slow nodes less and faster nodes more; many nodes are not used at all.
\begin{figure}
\includegraphics[width=\textwidth]{node-selection/optimum-selection-probabilities}
\caption{Optimum node selection probabilities for a variety of network loads. Tor is currently at around 50\% utilization. The node selection probabilities currently used by Tor are shown in black.}
\label{fig:optimum-selection}
\end{figure}
\begin{figure}
\includegraphics[width=\textwidth]{node-selection/relative-selection-probabilities}
\caption{Difference between Tor's current node selection probabilities and the optimum, for a variety of network loads. For Tor's current network load ($\approx 50$\%) shown in pink, the slowest nodes are not used at all, and the slower nodes are favoured less.}
\label{fig:relative-selection}
\end{figure}
\subsection{Impact of network load estimation}
The node selection probabilities discussed above are tuned to a particular level of network load.
It is possible to estimate network load because all Tor nodes report back both their capacity and usage in their descriptor.
However, this estimate may be inaccurate and it is possible that the network load will change over time.
In which case the node selection algorithm chosen will no longer be optimal.
\prettyref{fig:vary-load} shows how average network latency is affected by node selection probabilities, for different levels of network load.
As can be seen, small changes in network load do not significantly affect the network latency, and for all load levels examined, the optimized selection probabilities do offer lower latency when compared to the Tor selection algorithm.
However, each probability distribution has a cut-off point at which at least one node will have a higher load than its capacity, at which its queue length, and hence latency, will become infinite.
For the optimized selection probability distributions, this cut-off point is a few percent above the level they were designed to operate at.
For the Tor selection algorithm, it is when the overall network capacity equals the overall network load.
In this respect the Tor selection algorithm reaches the theoretical optimum, as no network can operate at greater than 100\% utilization while maintaining finite latency.
However, in a real instantiation of any of these alternative probability distributions, the network latency would not become infinite; instead a connection would time out and a different circuit would be selected.
So in practise, if the wrong probability distribution was selected, the network would converge at a different one.
Unfortunately the standard queuing theory models cannot handle this case and it seems likely that a simulation would be required to estimate the effect.
\begin{figure}
\includegraphics[width=\textwidth]{node-selection/vary-network-load}
\caption{Average network latency against network load. Three node selection probabilities are shown, optimized for 50\%, 75\%, and 90\% network load. The Tor node selection algorithm is also included (black). The dots on the $x$ axis show the level of network load at which the node selection probability distributions are optimized for. The line is cut off when the model predicts that at least one node will have an infinite queue length, which occurs before load $=$ capacity for all node selection algorithms except for Tor's current one.}
\label{fig:vary-load}
\end{figure}
\section{TLS application record overhead reduction}
OpenSSL will, by default, insert an empty TLS application record before any one which contains data.
This is to prevent an attack, by which someone who has partial control over the plaintext of a TLS stream, can also confirm guesses as to the plaintext which he does not control.
By including an empty application record, which incorporates a MAC, the attacker is made unable to control the CBC initialization vector, and hence does not have control of the input to the encryption function\footnote{\url{http://www.openssl.org/~bodo/tls-cbc.txt}}.
This application record does introduce an appreciable overhead.
Most Tor cells are sent in application records of their own, giving application records of 512 bytes (cell) $+$ 20 bytes (MAC) $+$ 12 bytes (TLS padding) $+$ 5 bytes (TLS application record header) $=$ 549 bytes.
The empty application records contain only 20 bytes (MAC) $+$ 12 bytes (TLS padding) $+$ 5 bytes (TLS application record header) $=$ 37 bytes.
There is also a 20 byte IP header and 32 byte TCP header.
Thus the overhead saved by removing the empty TLS application record itself is $37 / (549 + 37 + 20 + 32) = 5.8\%$.
This calculation is assuming that the same number of IP packets will be sent, because currently Tor sends packets, with only one cell, far smaller than the path MTU.
If Tor were to pack cells optimally efficiently into packets, then removing the empty application records would also reduce the number of packets, and hence TCP/IP headers, that needed to be sent.
The reduction in TCP/IP header overhead would be $37/(549 + 37) = 6.3\%$.
Of course, the empty application record was inserted for a reason -- to prevent an attack on the CBC mode of operation used by TLS, so before removing it we must be confident the attack does not apply to Tor.
Ben Laurie (one of the OpenSSL developers), concluded that in his opinion Tor could safely remove the insertion of empty TLS application records\footnote{\url{http://archives.seul.org/or/dev/Dec-2008/msg00005.html}}.
I was able to come up with only certificational weaknesses (discussed in the above analysis), which are expensive to exploit and give little information to the attacker.
To be successful, the attacker must have full control of the plaintext application record before the one he wishes to guess.
Tor makes this difficult because all cells where the payload is controlled by the attacker are prepended with a two byte circuit ID, unknown to the attacker.
Also, because the majority of cells sent in Tor are encrypted by a key not known by the attacker, the probability that an attacker can guess what a cell might be is extremely small.
The exception is a padding cell, which has no circuit ID and a zero length payload, however Tor does not currently send padding cells, other than as a periodic keep-alive.
\subsection*{Acknowledgements}
% Mike Perry provided many of the ideas discussed here
\end{document}
Other items to add in somewhere:
UDP transport
Mike and Fallon's proposal
Csaba's proposal to shrink the maximum circuit window.
If extending a circuit fails, try extending a few other places before
abandoning the circuit.