write remaining sections; edit some.

svn:r3515
Nick Mathewson 2005-02-03 19:06:09 +00:00
parent 59e89e2b89
commit cd39e4fc62


@ -724,55 +724,77 @@ transport a greater variety of protocols.
Though Tor has always been designed to be practical and usable first Though Tor has always been designed to be practical and usable first
with as much anonymity as can be built in subject to those goals, we with as much anonymity as can be built in subject to those goals, we
have contemplated that users might need resistance to at least simple have contemplated that users might need resistance to at least simple
traffic confirmation attacks. Raising the latency of communication traffic confirmation attacks. Higher-latency mix-networks resist these
slightly might make this feasible. If the latency could be kept to two attacks by introducing variability into message arrival times in order to
or three times its current overhead, this might be acceptable to the suppress timing correlation. Thus, it seems worthwhile to consider
majority of Tor users. However, it might also destroy much of the user whether we can improve Tor's anonymity by introducing batching and delaying
base, and it is difficult to know in advance. Note also that in strategies to the Tor messages to prevent observers from linking incoming and
practice, as the network is growing and we accept cable modem, DSL outgoing traffic.
nodes, and more nodes in various continents, we're \emph{already}
looking at many-second delays for some transactions. The engineering Before we consider the engineering issues involved in the approach, of
required to get this lower is going to be extremely hard. It's worth course, we first need to study whether it can genuinely make users more
considering how hard it would be to accept the fixed (higher) latency anonymous. Research on end-to-end traffic analysis on higher-latency mix
and improve the protection we get from it. Thus, it may be most networks~\cite{e2e-traffic} indicates that as timing variance decreases,
practical to run a mid-latency option over the Tor network for those timing correlation attacks require increasingly less data; it might be the
users either willing to experiment or in need of more a priori case that Tor can't resist timing attacks for longer than a few minutes
anonymity in the network. This will allow us to experiment with both without increasing message delays to an unusable degree. Conversely, if Tor
can remain usable and slow timing attacks by even a matter of hours, this
would represent a significant improvement in practical anonymity: protecting
short-duration, once-off activities against a global observer is better than
protecting no activities at all. In order to answer this question, we might
try to adapt the techniques of~\cite{e2e-traffic} to a lower-latency mix
network, where instead of sending uncorrelated messages, users send batches
of cells in temporally clustered connections.
Once the anonymity questions are answered, we need to consider usability. If
the latency could be kept to two or three times its current overhead, this
might be acceptable to most Tor users. However, it might also destroy much of
the user base, and it is difficult to know in advance. Note also that in
practice, as the network grows to incorporate more DSL and cable-modem nodes,
and more nodes in various continents, this alone will \emph{already} cause
many-second delays for some transactions. Reducing this latency will be
hard, so perhaps it's worth considering whether accepting this higher latency
can improve the anonymity we provide. Also, it could be possible to
run a mid-latency option over the Tor network for those
users either willing to experiment or in need of more
anonymity. This would allow us to experiment with both
the anonymity provided and the interest on the part of users. the anonymity provided and the interest on the part of users.
Adding a mid-latency option should not require significant fundamental Adding a mid-latency option should not require significant fundamental
change to the Tor client or server design; circuits can be labeled as change to the Tor client or server design; circuits could be labeled as
low or mid latency on servers as they are set up. Low-latency traffic low- or mid-latency as they are constructed. Low-latency traffic
would be processed as now. Packets on circuits that are mid-latency would be processed as now, while cells on circuits that are mid-latency
would be sent in uniform size chunks at synchronized intervals. To would be sent in uniform-size chunks at synchronized intervals. (Traffic
some extent the chunking is already done because traffic moves through already moves through the Tor network in fixed-sized cells; this would
the network in uniform size cells, but this would occur at a coarser increase the granularity.) If servers forward these chunks in roughly
granularity. If servers forward these chunks in roughly synchronous synchronous fashion, it will increase the similarity of data stream timing
fashion, it will increase the similarity of data stream timing
signatures. By experimenting with the granularity of data chunks and signatures. By experimenting with the granularity of data chunks and
of synchronization we can attempt once again to optimize for both of synchronization we can attempt once again to optimize for both
usability and anonymity. Unlike in \cite{sync-batching}, it may be usability and anonymity. Unlike in \cite{sync-batching}, it may be
impractical to synchronize on network batches by dropping chunks from impractical to synchronize on network batches by dropping chunks from
a batch that arrive late at a given node---unless Tor moves away from a batch that arrive late at a given node---unless Tor moves away from
stream processing to a more loss-tolerant processing of traffic (cf.\ stream processing to a more loss-tolerant paradigm (cf.\
Section~\ref{subsec:tcp-vs-ip}). In other words, there would Section~\ref{subsec:tcp-vs-ip}). Instead, batch timing would be obscured by
probably be no direct attempt to synchronize on batches of data synchronizing batches at the link level, and there would
entering the Tor network at the same time. Rather, it is the link be no direct attempt to synchronize all batches
level batching that will add noise to the traffic patterns entering entering the Tor network at the same time.
and passing through the %Alternatively, if end-to-end traffic confirmation is the
network. Similarly, if end-to-end traffic confirmation is the %concern, there is little point in mixing.
concern, there is little point in mixing. It might also be feasible to % Why not?? -NM
It might also be feasible to
pad chunks to uniform size as is done now for cells; if this is link pad chunks to uniform size as is done now for cells; if this is link
padding rather than end-to-end, then it will take less overhead, padding rather than end-to-end, then it will take less overhead,
especially in bursty environments. This is another way in which it especially in bursty environments.
would be fairly practical to set up a mid-latency option within the % This is another way in which it
existing Tor network. Other padding regimens might supplement the %would be fairly practical to set up a mid-latency option within the
%existing Tor network.
Other padding regimens might supplement the
mid-latency option; however, we should continue the caution with which mid-latency option; however, we should continue the caution with which
we have always approached padding lest the overhead cost us too much we have always approached padding lest the overhead cost us too much
performance or too many volunteers. performance or too many volunteers.
The distinction between traffic confirmation and traffic analysis is The distinction between traffic confirmation and traffic analysis is
not as practically cut and dried as we might wish. In \cite{hintz-pet02} it was not as cut and dried as we might wish. In \cite{hintz-pet02} it was
shown that if data volumes of various popular shown that if data volumes of various popular
responder destinations are catalogued, it may not be necessary to responder destinations are catalogued, it may not be necessary to
observe both ends of a stream to confirm a source-destination link. observe both ends of a stream to confirm a source-destination link.
@ -811,42 +833,96 @@ be another significant step to evaluating resistance to such attacks.
The attacks in \cite{attack-tor-oak05} are also dependent on The attacks in \cite{attack-tor-oak05} are also dependent on
cooperation of the responding application or the ability to modify or cooperation of the responding application or the ability to modify or
monitor the responder stream, in order of decreasing attack monitor the responder stream, in order of decreasing attack
effectiveness. So, another way to counter these attacks in some cases effectiveness. So, another way to slow some of these attacks
would be to employ caching of responses. This is infeasible for would be to cache responses at exit servers where possible, as it is with
application data that is not relatively static and from frequently DNS lookups and cacheable HTTP responses. Caching would, however,
visited sites; however, it might be useful for DNS lookups. This is create threats of its own.
also likely to be trading one practical threat for another. To be %To be
useful, such caches would need to be distributed to any likely exit %useful, such caches would need to be distributed to any likely exit
nodes of recurred requests for the same data. Aside from the logistic %nodes of recurred requests for the same data.
difficulties and overhead of distribution, they constitute a collected % Even local caches could be useful, I think. -NM
record of destinations and/or data visited by Tor users. While Aside from the logistic
difficulties and overhead, caches would constitute a
record of destinations and data visited by Tor users. While
limited to network insiders, given the need for wide distribution limited to network insiders, given the need for wide distribution
they could serve as useful data to an attacker deciding which locations they could serve as useful data to an attacker deciding which locations
to target for confirmation. A way to counter this distribution to target for confirmation. A way to counter this distribution
threat might be to only cache at certain semitrusted helper nodes. threat might be to only cache at certain semitrusted helper nodes.
%Does that cacheing discussion belong in low-latency?
[nick will work on this] \subsection{Application support: SOCKS and beyond}
\subsection{Application support: socks doesn't solve all our problems} Tor supports the SOCKS protocol, which provides a standardized interface for
generic TCP proxies. Unfortunately, this is not a complete solution for
many applications and platforms:
\begin{tightlist}
\item Many applications do not support SOCKS. To support such applications,
it's necessary to replace the networking system calls with SOCKS-aware
versions, or to run a local SOCKS tunnel and convince the applications to
connect to localhost. Neither of these tasks is easy for the average user,
even with good instructions.
\item Even when applications do use SOCKS, they often make DNS requests
themselves. (The various versions of the SOCKS protocol include some where
the application tells the proxy an IP address, and some where it sends a
hostname.) By connecting to the DNS server directly, the application breaks
the user's anonymity and advertises where it is about to connect.
\end{tightlist}
socks4a isn't everywhere. the dns problem. etc. So in order to actually provide good anonymity, we need to make sure that
users have a practical way to use Tor anonymously. Possibilities include
nick will work on this. writing wrappers for applications to anonymize them automatically; improving
the applications' support for SOCKS; writing libraries to help application
writers use Tor properly; and implementing a local DNS proxy to reroute DNS
requests to Tor so that applications can simply point their DNS resolvers at
localhost and continue to use SOCKS for data only.
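The DNS problem described above comes down to which request format the application emits. The following is a minimal sketch, not Tor's own code, of the two request layouts as commonly documented for SOCKS4 and SOCKS4a; the function names and the example host are illustrative assumptions. In the SOCKS4 form the application has already resolved the name itself (and so has already leaked it to its DNS server), while in the SOCKS4a form the hostname travels to the proxy unresolved.

```python
import socket
import struct

SOCKS_VERSION = 4   # protocol version byte for SOCKS4 and SOCKS4a
CMD_CONNECT = 1     # command code for a TCP CONNECT request


def socks4_request(ip: str, port: int, user_id: bytes = b"") -> bytes:
    """Build a SOCKS4 CONNECT request.

    The caller must already know the IP address, which means the
    application performed its own DNS lookup outside the proxy.
    """
    header = struct.pack(">BBH", SOCKS_VERSION, CMD_CONNECT, port)
    return header + socket.inet_aton(ip) + user_id + b"\x00"


def socks4a_request(hostname: str, port: int, user_id: bytes = b"") -> bytes:
    """Build a SOCKS4a CONNECT request.

    The address field is set to 0.0.0.x with x nonzero, which tells the
    proxy that a NUL-terminated hostname follows the user id, so the
    proxy (not the application) resolves the name.
    """
    header = struct.pack(">BBH", SOCKS_VERSION, CMD_CONNECT, port)
    placeholder_ip = b"\x00\x00\x00\x01"
    return (header + placeholder_ip + user_id + b"\x00"
            + hostname.encode("ascii") + b"\x00")
```

An application that only ever calls something like `socks4_request` cannot avoid the DNS leak regardless of how the proxy behaves, which is why the text suggests a local DNS proxy as a fallback.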
\subsection{Measuring performance and capacity} \subsection{Measuring performance and capacity}
One of the paradoxes with engineering an anonymity network is that we'd like
to learn as much as we can about how traffic flows so we can improve the
network, but we want to prevent others from learning how traffic flows in
order to trace users' connections through the network. Furthermore, many
mechanisms that help Tor run efficiently (such as having clients choose servers
based on their capacities) require measurements about the network.
How to measure performance without letting people selectively deny service Currently, servers record their bandwidth use in 15-minute intervals and
by distinguishing pings. Heck, just how to measure performance at all. In include this information in the descriptors they upload to the directory.
practice people have funny firewalls that don't match up to their exit They also try to deduce their own available bandwidth, on the basis of how
policies and Tor doesn't deal. much traffic they have been able to transfer recently, and upload this
information as well.
Network investigation: Is all this bandwidth publishing thing a good idea? This is, of course, eminently cheatable. A malicious server can get a
How can we collect stats better? Note weasel's smokeping, at disproportionate amount of traffic simply by claiming to have more bandwidth
http://seppia.noreply.org/cgi-bin/smokeping.cgi?target=Tor than it does. But better mechanisms have their problems. If bandwidth data
which probably gives george and steven enough info to break tor? is to be measured rather than self-reported, it is usually possible for
servers to selectively provide better service for the measuring party, or
sabotage the measured value of other servers. Complex solutions for
mix networks have been proposed, but do not address the issues
completely~\cite{mix-acc,casc-rep}.
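To make the cheating concern concrete, here is a small sketch under a simplifying assumption that is not taken from the text: clients pick each server with probability proportional to its self-reported bandwidth. The function name and the server labels are hypothetical.

```python
from typing import Dict


def expected_traffic_share(claimed_bw: Dict[str, float]) -> Dict[str, float]:
    """Return the fraction of selections each server expects to receive.

    Assumes selection probability is proportional to the advertised
    (self-reported) bandwidth, so nothing here checks the claim.
    """
    total = sum(claimed_bw.values())
    return {name: bw / total for name, bw in claimed_bw.items()}


# Four servers with equal real capacity, all reporting honestly.
honest = {"A": 100.0, "B": 100.0, "C": 100.0, "M": 100.0}

# Same network, but server M inflates its claim ninefold.
lying = dict(honest, M=900.0)
```

Under this assumption, `expected_traffic_share(honest)["M"]` is one quarter, while `expected_traffic_share(lying)["M"]` rises to three quarters without M adding any real capacity.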
[nick will work on this section, unless arma gets there first] Even without the possibility of cheating, network measurement is
non-trivial. It is far from unusual for one observer's view of a server's
latency or bandwidth to disagree wildly with another's. Furthermore, it is
unclear whether total bandwidth is really the right measure; perhaps clients
should be considering servers on the basis of unused bandwidth instead, or
perhaps observed throughput.
% XXXX say more here?
%How to measure performance without letting people selectively deny service
%by distinguishing pings. Heck, just how to measure performance at all. In
%practice people have funny firewalls that don't match up to their exit
%policies and Tor doesn't deal.
%Network investigation: Is all this bandwidth publishing thing a good idea?
%How can we collect stats better? Note weasel's smokeping, at
%http://seppia.noreply.org/cgi-bin/smokeping.cgi?target=Tor
%which probably gives george and steven enough info to break tor?
Even if we can collect and use this network information effectively, we need
to make sure that it is not more useful to attackers than to us. While it
seems plausible that bandwidth data alone is not enough to reveal
sender-recipient connections under most circumstances, it could certainly
reveal the path taken by large traffic flows under low-usage circumstances.
\subsection{Running a Tor server, path length, and helper nodes} \subsection{Running a Tor server, path length, and helper nodes}
@ -913,27 +989,57 @@ your computer is doing that behavior.
\subsection{Helper nodes} \subsection{Helper nodes}
\label{subsec:helper-nodes} \label{subsec:helper-nodes}
When does fixing your entry or exit node help you? Tor can only provide anonymity against an attacker if that attacker can't
Helper nodes in the literature don't deal with churn, and monitor the user's entry and exit on the Tor network. But since Tor
especially active attacks to induce churn. currently chooses entry and exit points randomly and changes them frequently,
a patient attacker who controls a single entry and a single exit is sure to
eventually break some circuits of frequent users who consider those servers.
(We assume that users are as concerned about statistical profiling as about
the anonymity of any particular connection. That is, it is almost as bad to
leak the fact that Alice {\it sometimes} talks to Bob as it is to leak the times
when Alice is {\it actually} talking to Bob.)
Do general DoS attacks have anonymity implications? See e.g. Adam
Back's IH paper, but I think there's more to be pointed out here.
Game theory for helper nodes: if Alice offers a hidden service on a One solution to this problem is to use ``helper nodes''~\cite{helpers}---to
server (enclave model), and nobody ever uses helper nodes, then against have each client choose a few fixed servers for critical positions in her
George+Steven's attack she's totally nailed. If only Alice uses a helper circuits. That is, Alice might choose some server H1 as her preferred
node, then she's still identified as the source of the data. If everybody entry, so that unless the attacker happens to control or observe her
uses a helper node (including Alice), then the attack identifies the connection to H1, her circuits will remain anonymous. If H1 is compromised,
helper node and also Alice, and knows which one is which. If everybody Alice is vulnerable as before. But now, at least, she has a chance of
uses a helper node (but not Alice), then the attacker figures the real not being profiled.
source was a client that is using Alice as a helper node. [How's my
logic here?]
point to routing-zones section re: helper nodes to defend against (Choosing fixed exit nodes is less useful, since the connection from the exit
big stuff. node to Alice's destination will be seen not only by the exit but by the
destination. Even if Alice chooses a good fixed exit node, she may
nevertheless connect to a hostile website.)
[nick will write this section] There are still obstacles remaining before helper nodes can be implemented.
For one, the literature does not describe how to choose helpers from a list
of servers that changes over time. If Alice is forced to choose a new entry
helper every $d$ days and $c$ of the $n$ servers are compromised, she can
expect to choose a compromised server around every $dn/c$ days. Worse, an
attacker with the ability to DoS servers could
force their users to switch helper nodes more frequently.
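The rotation estimate can be checked with a short derivation. This is a sketch under assumptions not spelled out in the text: $c$ of the $n$ servers are compromised, each new helper is chosen uniformly at random and independently of earlier choices, and $K$ (a symbol introduced here) counts the choices up to and including the first compromised one.

```latex
% p: probability that a uniformly chosen helper is compromised
\[
  p = \frac{c}{n}, \qquad
  \Pr[K = k] = (1-p)^{k-1}\,p, \qquad
  \mathrm{E}[K] = \frac{1}{p} = \frac{n}{c}.
\]
% Each choice lasts d days, so the expected time between compromised helpers is
\[
  d \cdot \mathrm{E}[K] = \frac{dn}{c} \text{ days.}
\]
% Example: d = 30, c = 5, n = 100 gives 30 * 100 / 5 = 600 days.
```

Under the same assumptions, an attacker who can force Alice to rotate twice as often (halving $d$) also halves the expected time before she picks a compromised helper.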
%Do general DoS attacks have anonymity implications? See e.g. Adam
%Back's IH paper, but I think there's more to be pointed out here. -RD
% Not sure what you want to say here. -NM
%Game theory for helper nodes: if Alice offers a hidden service on a
%server (enclave model), and nobody ever uses helper nodes, then against
%George+Steven's attack she's totally nailed. If only Alice uses a helper
%node, then she's still identified as the source of the data. If everybody
%uses a helper node (including Alice), then the attack identifies the
%helper node and also Alice, and knows which one is which. If everybody
%uses a helper node (but not Alice), then the attacker figures the real
%source was a client that is using Alice as a helper node. [How's my
%logic here?] -RD
%
% Not sure about the logic. For the attack to work with helper nodes, the
%attacker needs to guess that Alice is running the hidden service, right?
%Otherwise, how can he know to measure her traffic specifically? -NM
%point to routing-zones section re: helper nodes to defend against
%big stuff.
\subsection{Location-hidden services} \subsection{Location-hidden services}
@ -1256,29 +1362,20 @@ help address censorship; we wish them luck.
\subsection{Non-clique topologies} \subsection{Non-clique topologies}
[nick will try to shrink this section] Tor's comparatively weak threat model makes it easier to scale than other mix net
designs. High-latency mix networks need to avoid partitioning attacks, where
Because of its threat model that is substantially weaker than high network splits prevent users of the separate partitions from providing cover
latency mixnets, Tor is actually in a potentially better position to for each other. In Tor, however, we assume that the adversary cannot
scale at least initially. From the perspective of a mix network, one cheaply observe nodes at will, so even if the network becomes split, the
of the worst things that can happen is partitioning. The more users do not necessarily receive much less protection.
potential senders of messages entering the network the better the Thus, a simple possibility when the scale of a Tor network
anonymity. Roughly, if a network is, e.g., split in half, then your
anonymity is cut in half. Attacks become half as hard (if they're
linear in network size), etc. In some sense this is still true for
Tor: if you want to know who Alice is talking to, you can watch her
for one end of a circuit. For a half size network, you then only have
to brute force examine half as many nodes to find the other end. But
Tor is not meant to cope with someone directly attacking many dozens
of nodes in a few minutes. It was meant to cope with traffic
confirmation attacks. And, these are independent of the size of the
network. So, a simple possibility when the scale of a Tor network
exceeds some size is to simply split it. Care could be taken in exceeds some size is to simply split it. Care could be taken in
allocating which nodes go to which network along the lines of allocating which nodes go to which network along the lines of
\cite{casc-rep} to ensure that collaborating hostile nodes are not \cite{casc-rep} to ensure that collaborating hostile nodes are not
able to gain any advantage in network splitting that they do not able to gain any advantage in network splitting that they do not
already have in joining a network. already have in joining a network.
% Describe these attacks; many people will not have read the paper!
The attacks in \cite{attack-tor-oak05} show that certain types of The attacks in \cite{attack-tor-oak05} show that certain types of
brute force attacks are in fact feasible; however they make the brute force attacks are in fact feasible; however they make the
above point stronger not weaker. The attacks do not appear to be above point stronger not weaker. The attacks do not appear to be
@ -1292,32 +1389,31 @@ basis. More analysis is needed; we simply note here that splitting
a Tor network is an easy way to achieve moderate scalability and that a Tor network is an easy way to achieve moderate scalability and that
it does not necessarily have the same implications as splitting a mixnet. it does not necessarily have the same implications as splitting a mixnet.
Alternatively, we can try to scale a single network. Some issues for Alternatively, we can try to scale a single Tor network. Some issues for
scaling include how many neighbors can nodes support and how many scaling include restricting the number of sockets and the amount of bandwidth
users (and how much application traffic capacity) can the network used by each server. The number of sockets is determined by the network's
handle for each new node that comes into the network. This depends on connectivity and the number of users, while bandwidth capacity is determined
many things, most notably the traffic capacity of the new nodes. We by the total bandwidth of servers on the network. The simplest solution to
can observe, however, that adding a tor node of any feasible bandwidth bandwidth capacity is to add more servers, since adding a tor node of any
will increase the traffic capacity of the network. This means that, as feasible bandwidth will increase the traffic capacity of the network. So as
a first step to scaling, we can focus on the interconnectivity of the a first step to scaling, we should focus on making the network tolerate more
nodes, followed by directories, discovery, etc. servers, by reducing the interconnectivity of the nodes; later we can reduce
overhead associated with directories, discovery, and so on.
By reducing the connectivity of the network we increase the total By reducing the connectivity of the network we increase the total number of
number of nodes that the network can contain. Anonymity implications nodes that the network can contain. Danezis~\cite{danezis-pet03} considers
of restricted routes for mix networks have already been explored by the anonymity implications of restricting routes on mix networks, and
Danezis~\cite{danezis-pets03}. That paper explicitly considered only recommends an approach based on expander graphs (where any subgraph is likely
traffic analysis resistance provided by a mix network and sidestepped to have many neighbors). It is not immediately clear that this approach will
questions of traffic confirmation resistance. But, Tor is designed extend to Tor, which has a weaker threat model but higher performance
only to resist traffic confirmation. For this and other reasons, we requirements than the network considered. Instead of analyzing the
cannot simply adopt his mixnet results to onion routing networks. If probability of an attacker's viewing whole paths, we will need to examine the
an attacker gains minimal increase in the likelyhood of compromising attacker's likelihood of compromising the endpoints of a Tor circuit through
the endpoints of a Tor circuit through a sparse network (vs.\ a clique a sparse network.
on the same node set), then the restriction will have had minimal
impact on the anonymity provided by that network.
The approach Danezis describes is based on expander graphs, i.e., % Nick edits these next 2 grafs.
graphs in which any subgraph of nodes is likely to have lots of nodes
as neighbors. For Tor, we may not need to have an expander per se, it To make matters simpler, Tor may not need an expander graph per se: it
may be enough to have a single subnet that is highly connected. As an may be enough to have a single subnet that is highly connected. As an
example, assume fifty nodes of relatively high traffic capacity. This example, assume fifty nodes of relatively high traffic capacity. This
\emph{center} forms a clique. Assume each center node can each \emph{center} forms a clique. Assume each center node can each
@ -1342,9 +1438,6 @@ only visible connection to Tor at those points where it connects.
As far as the public network is concerned or anyone observing it, As far as the public network is concerned or anyone observing it,
they are running clients. they are running clients.
\section{The Future} \section{The Future}
\label{sec:conclusion} \label{sec:conclusion}