write remaining sections; edit some.

svn:r3515
This commit is contained in:
Nick Mathewson 2005-02-03 19:06:09 +00:00
parent 59e89e2b89
commit cd39e4fc62


transport a greater variety of protocols.
Though Tor has always been designed to be practical and usable first
with as much anonymity as can be built in subject to those goals, we
have contemplated that users might need resistance to at least simple
traffic confirmation attacks. Higher-latency mix-networks resist these
attacks by introducing variability into message arrival times in order to
suppress timing correlation. Thus, it seems worthwhile to consider whether
we can improve Tor's anonymity by introducing batching and delaying
strategies for Tor messages, to prevent observers from linking incoming and
outgoing traffic.
Before we consider the engineering issues involved in the approach, of
course, we first need to study whether it can genuinely make users more
anonymous. Research on end-to-end traffic analysis on higher-latency mix
networks~\cite{e2e-traffic} indicates that as timing variance decreases,
timing correlation attacks require increasingly less data; it might be the
case that Tor can't resist timing attacks for longer than a few minutes
without increasing message delays to an unusable degree. Conversely, if Tor
can remain usable and slow timing attacks by even a matter of hours, this
would represent a significant improvement in practical anonymity: protecting
short-duration, once-off activities against a global observer is better than
protecting no activities at all. In order to answer this question, we might
try to adapt the techniques of~\cite{e2e-traffic} to a lower-latency mix
network, where instead of sending uncorrelated messages, users send batches
of cells in temporally clustered connections.
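To make the correlation question concrete, here is a toy model (ours, not
from~\cite{e2e-traffic}; all rates, delays, and window sizes are
illustrative): an observer buckets entry-side and exit-side cell timestamps
into fixed windows and correlates the counts. Jittered low-latency
forwarding leaves the counts highly correlated, while quantizing departures
to one-second batches largely destroys the fine-grained correlation.

```python
# Toy model (illustrative only): can an observer link entry and exit flows
# by correlating per-window cell counts?
import random

def window_counts(timestamps, window, horizon):
    """Bucket event timestamps into fixed-size windows."""
    counts = [0] * (horizon // window)
    for t in timestamps:
        if 0 <= t < horizon:
            counts[t // window] += 1
    return counts

def correlation(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

random.seed(1)
horizon = 10_000  # ms of observation
entry = sorted(random.randrange(horizon) for _ in range(400))
# Low-latency forwarding: the exit sees the same cells after small jitter...
exit_fast = [t + random.randrange(20, 60) for t in entry]
# ...whereas a batching relay quantizes departures to one-second rounds.
exit_batched = [((t // 1000) + 1) * 1000 for t in entry]

w = 100  # ms observation windows
c_fast = correlation(window_counts(entry, w, horizon),
                     window_counts(exit_fast, w, horizon))
c_batch = correlation(window_counts(entry, w, horizon),
                      window_counts(exit_batched, w, horizon))
```

In this toy setting `c_fast` stays high while `c_batch` collapses; a real
analysis would of course have to model an attacker who also batches his
observations and accumulates data over many rounds.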
Once the anonymity questions are answered, we need to consider usability. If
the latency could be kept to two or three times its current overhead, this
might be acceptable to most Tor users. However, it might also destroy much of
the user base, and it is difficult to know in advance. Note also that in
practice, as the network grows to incorporate more DSL and cable-modem nodes,
and more nodes in various continents, this alone will \emph{already} cause
many-second delays for some transactions. Reducing this latency will be
hard, so perhaps it's worth considering whether accepting this higher latency
can improve the anonymity we provide. Also, it could be possible to
run a mid-latency option over the Tor network for those
users either willing to experiment or in need of more
anonymity. This would allow us to experiment with both
the anonymity provided and the interest on the part of users.
Adding a mid-latency option should not require significant fundamental
change to the Tor client or server design; circuits can be labeled as
low or mid latency on servers as they are set up. Low-latency traffic
would be processed as now. Packets on circuits that are mid-latency
would be sent in uniform size chunks at synchronized intervals. To
some extent the chunking is already done because traffic moves through
the network in uniform size cells, but this would occur at a coarser
granularity. If servers forward these chunks in roughly synchronous
fashion, it will increase the similarity of data stream timing
change to the Tor client or server design; circuits could be labeled as
low- or mid-latency as they are constructed. Low-latency traffic
would be processed as now, while cells on circuits that are mid-latency
would be sent in uniform-size chunks at synchronized intervals. (Traffic
already moves through the Tor network in fixed-sized cells; this would
increase the granularity.) If servers forward these chunks in roughly
synchronous fashion, it will increase the similarity of data stream timing
signatures. By experimenting with the granularity of data chunks and
of synchronization we can attempt once again to optimize for both
usability and anonymity. Unlike in \cite{sync-batching}, it may be
impractical to synchronize on network batches by dropping chunks from
a batch that arrive late at a given node---unless Tor moves away from
stream processing to a more loss-tolerant paradigm (cf.\
Section~\ref{subsec:tcp-vs-ip}). Instead, batch timing would be obscured by
synchronizing batches at the link level, and there would
be no direct attempt to synchronize all batches
entering the Tor network at the same time.
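A sketch of the chunking idea (our illustration, not a design; the chunk
size and interval below are arbitrary): a mid-latency circuit queue releases
cells only at synchronized intervals, in uniform-size chunks, so relays
forwarding in lockstep blur per-stream timing.

```python
# Illustrative sketch of link-level batching for mid-latency circuits.
from collections import deque

CELLS_PER_CHUNK = 8     # illustrative chunk granularity
TICK_MS = 500           # illustrative synchronization interval

class MidLatencyQueue:
    def __init__(self):
        self.pending = deque()

    def enqueue(self, cell):
        self.pending.append(cell)

    def on_tick(self):
        """Called once per synchronized interval: emit exactly one
        uniform-size chunk, padding with dummy cells if needed."""
        chunk = []
        while self.pending and len(chunk) < CELLS_PER_CHUNK:
            chunk.append(self.pending.popleft())
        while len(chunk) < CELLS_PER_CHUNK:
            chunk.append(b"PADDING")
        return chunk

q = MidLatencyQueue()
for i in range(11):
    q.enqueue(b"cell%d" % i)
first = q.on_tick()   # 8 real cells
second = q.on_tick()  # 3 real cells plus 5 padding cells
```

Note that the queue never drops late cells, matching the stream-processing
constraint above; it only delays and pads them.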
%Alternatively, if end-to-end traffic confirmation is the
%concern, there is little point in mixing.
% Why not?? -NM
It might also be feasible to
pad chunks to uniform size as is done now for cells; if this is link
padding rather than end-to-end, then it will take less overhead,
especially in bursty environments.
% This is another way in which it
%would be fairly practical to set up a mid-latency option within the
%existing Tor network.
Other padding regimens might supplement the
mid-latency option; however, we should continue the caution with which
we have always approached padding lest the overhead cost us too much
performance or too many volunteers.
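A back-of-the-envelope illustration (ours, with made-up traffic figures) of
why link padding can cost less than padding to a constant end-to-end rate
when traffic is bursty: constant-rate padding must cover the peak interval
on every link, while link padding only rounds each burst up to the next
uniform chunk boundary.

```python
# Illustrative overhead comparison; all numbers are invented.
CHUNK = 8  # cells per uniform chunk

bursts = [3, 0, 0, 15, 1, 0, 9, 0]  # cells offered per interval (bursty)

# (a) constant-rate end-to-end padding must cover the peak interval
peak = max(bursts)
constant_rate_cells = peak * len(bursts)

# (b) link padding rounds each nonzero burst up to a whole chunk
def round_up(n, k):
    return ((n + k - 1) // k) * k

link_padded_cells = sum(round_up(b, CHUNK) for b in bursts if b)

real = sum(bursts)                        # 28 real cells
overhead_a = constant_rate_cells - real   # 15*8 - 28 = 92 dummy cells
overhead_b = link_padded_cells - real     # 48 - 28 = 20 dummy cells
```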
The distinction between traffic confirmation and traffic analysis is
not as cut and dried as we might wish. In \cite{hintz-pet02} it was
shown that if data volumes of various popular
responder destinations are catalogued, it may not be necessary to
observe both ends of a stream to confirm a source-destination link.
be another significant step to evaluating resistance to such attacks.
The attacks in \cite{attack-tor-oak05} are also dependent on
cooperation of the responding application or the ability to modify or
monitor the responder stream, in order of decreasing attack
effectiveness. So, another way to counter these attacks in some cases
would be to employ caching of responses. This is infeasible for
application data that is not relatively static and from frequently
visited sites; however, it might be useful for DNS lookups. This is
also likely to be trading one practical threat for another. To be
useful, such caches would need to be distributed to any likely exit
nodes of recurred requests for the same data. Aside from the logistic
difficulties and overhead of distribution, they constitute a collected
record of destinations and/or data visited by Tor users. While
effectiveness. So, another way to slow some of these attacks
would be to cache responses at exit servers where possible, as with
DNS lookups and cacheable HTTP responses. Caching would, however,
create threats of its own.
%To be
%useful, such caches would need to be distributed to any likely exit
%nodes of recurred requests for the same data.
% Even local caches could be useful, I think. -NM
Aside from the logistic
difficulties and overhead, caches would constitute a
record of destinations and data visited by Tor users. While
limited to network insiders, such records could, given the need for wide
distribution, serve as useful data to an attacker deciding which locations
to target for confirmation. A way to counter this distribution
threat might be to only cache at certain semitrusted helper nodes.
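A minimal sketch of such an exit-side cache (ours; the names, addresses, and
TTLs are made up) also makes the threat above visible: the cache's table
\emph{is} the record of destinations visited, mitigated only by expiry.

```python
# Illustrative TTL cache of the kind an exit server might keep for DNS
# answers. The clock is injectable so the example is deterministic.
import time

class ExitDNSCache:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.entries = {}  # hostname -> (address, expiry time)

    def put(self, name, addr, ttl):
        self.entries[name] = (addr, self.clock() + ttl)

    def get(self, name):
        hit = self.entries.get(name)
        if hit is None:
            return None
        addr, expiry = hit
        if self.clock() >= expiry:
            # Expiry also bounds how long the "destinations visited"
            # record survives on the exit node.
            del self.entries[name]
            return None
        return addr

now = [0.0]  # fake clock for the example
cache = ExitDNSCache(clock=lambda: now[0])
cache.put("example.com", "192.0.2.1", ttl=60)
hit = cache.get("example.com")     # fresh: returns the address
now[0] = 61.0
stale = cache.get("example.com")   # past the TTL: evicted, returns None
```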
%Does that cacheing discussion belong in low-latency?
\subsection{Application support: SOCKS doesn't solve all our problems}
Tor supports the SOCKS protocol, which provides a standardized interface for
generic TCP proxies. Unfortunately, this is not a complete solution for
many applications and platforms:
\begin{tightlist}
\item Many applications do not support SOCKS. To support such applications,
it's necessary to replace the networking system calls with SOCKS-aware
versions, or to run a local SOCKS tunnel and convince the applications to
connect to localhost. Neither of these tasks is easy for the average user,
even with good instructions.
\item Even when applications do use SOCKS, they often make DNS requests
themselves. (The various versions of the SOCKS protocol include some where
the application tells the proxy an IP address, and some where it sends a
hostname.) By connecting to the DNS server directly, the application breaks
the user's anonymity and advertises where it is about to connect.
\end{tightlist}
So in order to actually provide good anonymity, we need to make sure that
users have a practical way to use Tor anonymously. Possibilities include
writing wrappers for applications to anonymize them automatically; improving
the applications' support for SOCKS; writing libraries to help application
writers use Tor properly; and implementing a local DNS proxy to reroute DNS
requests to Tor so that applications can simply point their DNS resolvers at
localhost and continue to use SOCKS for data only.
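The hostname-versus-IP distinction is visible in the request encodings
themselves. A sketch of the two forms (following the published SOCKS4 and
SOCKS4a formats; the addresses below are only illustrative): a SOCKS4
CONNECT carries an IP the client has already resolved, leaking the DNS
lookup, while SOCKS4a ships the hostname so the proxy resolves it.

```python
# Building SOCKS4 vs SOCKS4a CONNECT requests, per the published formats.
import socket
import struct

def socks4_connect(ip, port, userid=b""):
    # VN=4, CD=1 (CONNECT), DSTPORT, DSTIP, USERID, NUL terminator.
    return (struct.pack("!BBH", 4, 1, port)
            + socket.inet_aton(ip)
            + userid + b"\x00")

def socks4a_connect(hostname, port, userid=b""):
    # A DSTIP of the form 0.0.0.x (x != 0) signals: hostname follows,
    # and the proxy -- not the client -- performs the resolution.
    return (struct.pack("!BBH", 4, 1, port)
            + socket.inet_aton("0.0.0.1")
            + userid + b"\x00"
            + hostname + b"\x00")

leaky = socks4_connect("93.184.216.34", 80)  # client did its own DNS lookup
safe = socks4a_connect(b"example.com", 80)   # proxy (i.e., Tor) resolves
```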
\subsection{Measuring performance and capacity}
One of the paradoxes with engineering an anonymity network is that we'd like
to learn as much as we can about how traffic flows so we can improve the
network, but we want to prevent others from learning how traffic flows in
order to trace users' connections through the network. Furthermore, many
mechanisms that help Tor run efficiently (such as having clients choose servers
based on their capacities) require measurements about the network.
Currently, servers record their bandwidth use in 15-minute intervals and
include this information in the descriptors they upload to the directory.
They also try to deduce their own available bandwidth, on the basis of how
much traffic they have been able to transfer recently, and upload this
information as well.
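This self-measurement can be sketched as follows (our simplification of the
idea, not Tor's actual accounting code; the byte counts are invented):
bytes relayed are tallied into fixed 15-minute intervals, and the server's
``observed bandwidth'' is estimated from the busiest recent interval.

```python
# Illustrative interval-based bandwidth accounting.
INTERVAL_S = 15 * 60  # 15-minute intervals, in seconds

class BandwidthHistory:
    def __init__(self):
        self.intervals = []  # list of (interval_start, bytes_relayed)

    def note(self, timestamp, nbytes):
        """Record nbytes relayed at the given (monotonic) timestamp."""
        start = (int(timestamp) // INTERVAL_S) * INTERVAL_S
        if self.intervals and self.intervals[-1][0] == start:
            self.intervals[-1] = (start, self.intervals[-1][1] + nbytes)
        else:
            self.intervals.append((start, nbytes))

    def observed_bandwidth(self):
        """Bytes/sec during the busiest recorded interval."""
        if not self.intervals:
            return 0.0
        return max(b for _, b in self.intervals) / INTERVAL_S

h = BandwidthHistory()
h.note(100, 900_000)    # two bursts in the first 15-minute interval
h.note(200, 900_000)
h.note(1000, 450_000)   # a quieter second interval
```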
This is, of course, eminently cheatable. A malicious server can get a
disproportionate amount of traffic simply by claiming to have more bandwidth
than it does. But better mechanisms have their problems. If bandwidth data
is to be measured rather than self-reported, it is usually possible for
servers to selectively provide better service for the measuring party, or
sabotage the measured value of other servers. Complex solutions for
mix networks have been proposed, but do not address the issues
completely~\cite{mix-acc,casc-rep}.
Even without the possibility of cheating, network measurement is
non-trivial. It is far from unusual for one observer's view of a server's
latency or bandwidth to disagree wildly with another's. Furthermore, it is
unclear whether total bandwidth is really the right measure; perhaps clients
should be considering servers on the basis of unused bandwidth instead, or
perhaps observed throughput.
% XXXX say more here?
%How to measure performance without letting people selectively deny service
%by distinguishing pings. Heck, just how to measure performance at all. In
%practice people have funny firewalls that don't match up to their exit
%policies and Tor doesn't deal.
%Network investigation: Is all this bandwidth publishing thing a good idea?
%How can we collect stats better? Note weasel's smokeping, at
%http://seppia.noreply.org/cgi-bin/smokeping.cgi?target=Tor
%which probably gives george and steven enough info to break tor?
Even if we can collect and use this network information effectively, we need
to make sure that it is not more useful to attackers than to us. While it
seems plausible that bandwidth data alone is not enough to reveal
sender-recipient connections under most circumstances, it could certainly
reveal the path taken by large traffic flows under low-usage circumstances.
\subsection{Running a Tor server, path length, and helper nodes}
your computer is doing that behavior.
\subsection{Helper nodes}
\label{subsec:helper-nodes}
Tor can only provide anonymity against an attacker if that attacker can't
monitor the user's entry and exit on the Tor network. But since Tor
currently chooses entry and exit points randomly and changes them frequently,
a patient attacker who controls a single entry and a single exit is sure to
eventually break some circuits of frequent users who consider those servers.
(We assume that users are as concerned about statistical profiling as about
the anonymity of any particular connection. That is, it is almost as bad to
leak the fact that Alice {\it sometimes} talks to Bob as it is to leak the times
when Alice is {\it actually} talking to Bob.)
One solution to this problem is to use ``helper nodes''~\cite{helpers}---to
have each client choose a few fixed servers for critical positions in her
circuits. That is, Alice might choose some server H1 as her preferred
entry, so that unless the attacker happens to control or observe her
connection to H1, her circuits will remain anonymous. If H1 is compromised,
Alice is vulnerable as before. But now, at least, she has a chance of
not being profiled.
(Choosing fixed exit nodes is less useful, since the connection from the exit
node to Alice's destination will be seen not only by the exit but by the
destination. Even if Alice chooses a good fixed exit node, she may
nevertheless connect to a hostile website.)
There are still obstacles remaining before helper nodes can be implemented.
For one, the literature does not describe how to choose helpers from a list
of servers that changes over time. If $c$ of the $n$ servers are compromised
and Alice is forced to choose a new entry helper every $d$ days, she can
expect to choose a compromised server around every $dn/c$ days. Worse, an
attacker with the ability to DoS servers could
force their users to switch helper nodes more frequently.
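As a sanity check on this churn arithmetic (a toy simulation of ours, with
made-up parameters): if $c$ of $n$ servers are compromised and a fresh
helper is drawn uniformly every $d$ days, the wait until a compromised
choice is geometric with mean about $dn/c$ days.

```python
# Toy simulation of helper-node churn; parameters are illustrative.
import random

def days_until_compromise(n, c, d, rng):
    """Days elapsed until a freshly chosen helper is compromised,
    drawing a new uniform helper every d days."""
    days = 0
    while True:
        days += d
        if rng.randrange(n) < c:  # this choice landed on a bad server
            return days

rng = random.Random(7)
n, c, d = 100, 5, 30
trials = [days_until_compromise(n, c, d, rng) for _ in range(2000)]
mean = sum(trials) / len(trials)  # hovers near d*n/c = 600 days
```

An attacker who can DoS helpers effectively shrinks $d$, and so shrinks this
waiting time proportionally.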
%Do general DoS attacks have anonymity implications? See e.g. Adam
%Back's IH paper, but I think there's more to be pointed out here. -RD
% Not sure what you want to say here. -NM
%Game theory for helper nodes: if Alice offers a hidden service on a
%server (enclave model), and nobody ever uses helper nodes, then against
%George+Steven's attack she's totally nailed. If only Alice uses a helper
%node, then she's still identified as the source of the data. If everybody
%uses a helper node (including Alice), then the attack identifies the
%helper node and also Alice, and knows which one is which. If everybody
%uses a helper node (but not Alice), then the attacker figures the real
%source was a client that is using Alice as a helper node. [How's my
%logic here?] -RD
%
% Not sure about the logic. For the attack to work with helper nodes, the
%attacker needs to guess that Alice is running the hidden service, right?
%Otherwise, how can he know to measure her traffic specifically? -NM
%point to routing-zones section re: helper nodes to defend against
%big stuff.
\subsection{Location-hidden services}
help address censorship; we wish them luck.
\subsection{Non-clique topologies}
Tor's comparatively weak model makes it easier to scale than other mix net
designs. High-latency mix networks need to avoid partitioning attacks, where
network splits prevent users of the separate partitions from providing cover
for each other. In Tor, however, we assume that the adversary cannot
cheaply observe nodes at will, so even if the network becomes split, the
users do not necessarily receive much less protection.
Thus, a simple possibility when the scale of a Tor network
exceeds some size is to simply split it. Care could be taken in
allocating which nodes go to which network along the lines of
\cite{casc-rep} to ensure that collaborating hostile nodes are not
able to gain any advantage in network splitting that they do not
already have in joining a network.
% Describe these attacks; many people will not have read the paper!
The attacks in \cite{attack-tor-oak05} show that certain types of
brute force attacks are in fact feasible; however, they make the
above point stronger not weaker. The attacks do not appear to be
basis. More analysis is needed; we simply note here that splitting
a Tor network is an easy way to achieve moderate scalability and that
it does not necessarily have the same implications as splitting a mixnet.
Alternatively, we can try to scale a single Tor network. Some issues for
scaling include restricting the number of sockets and the amount of bandwidth
used by each server. The number of sockets is determined by the network's
connectivity and the number of users, while bandwidth capacity is determined
by the total bandwidth of servers on the network. The simplest solution to
bandwidth capacity is to add more servers, since adding a Tor node of any
feasible bandwidth will increase the traffic capacity of the network. So as
a first step to scaling, we should focus on making the network tolerate more
servers, by reducing the interconnectivity of the nodes; later we can reduce
overhead associated with directories, discovery, and so on.
By reducing the connectivity of the network we increase the total number of
nodes that the network can contain. Danezis~\cite{danezis-pets03} considers
the anonymity implications of restricting routes on mix networks, and
recommends an approach based on expander graphs (graphs in which any subset
of nodes is likely to have many neighbors outside that subset). It is not
immediately clear that this approach will
extend to Tor, which has a weaker threat model but higher performance
requirements than the network considered. Instead of analyzing the
probability of an attacker's viewing whole paths, we will need to examine the
attacker's likelihood of compromising the endpoints of a Tor circuit through
a sparse network.
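One small experiment of this flavor (ours; the node count and link budget
are made up): give each of $n$ nodes only $k$ outgoing links instead of a
full clique, and check that the resulting sparse topology still leaves
circuit-building plenty of route choice, i.e., the graph stays connected at
a small fraction of the clique's degree.

```python
# Toy restricted-route topology: k random bidirectional links per node.
import random
from collections import deque

def sparse_topology(n, k, rng):
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for u in rng.sample([u for u in range(n) if u != v], k):
            adj[v].add(u)
            adj[u].add(v)  # links are bidirectional connections
    return adj

def connected(adj):
    """Breadth-first search: can every node be reached from node 0?"""
    seen, frontier = {0}, deque([0])
    while frontier:
        v = frontier.popleft()
        for u in adj[v]:
            if u not in seen:
                seen.add(u)
                frontier.append(u)
    return len(seen) == len(adj)

rng = random.Random(3)
adj = sparse_topology(200, 4, rng)
# Each node keeps at most 2k links, versus n-1 = 199 in a clique.
mean_degree = sum(len(s) for s in adj.values()) / len(adj)
```

Connectivity alone says nothing about endpoint-compromise probability, which
is the quantity the text argues we actually need to analyze; this only
illustrates that sparseness need not fragment the network.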
% Nick edits these next 2 grafs.
To make matters simpler, Tor may not need an expander graph per se: it
may be enough to have a single subnet that is highly connected. As an
example, assume fifty nodes of relatively high traffic capacity. This
\emph{center} forms a clique. Assume each center node can
only visible connection to Tor at those points where it connects.
As far as the public network, or anyone observing it, is concerned,
they are running clients.
\section{The Future}
\label{sec:conclusion}