\documentclass{llncs}
% XXXX NM: Fold ``bandwidth and usability'' into ``Tor and filesharing'' --
% ``bandwidth and file-sharing''.

\usepackage{url}
\usepackage{amsmath}
\usepackage{epsfig}

\newenvironment{tightlist}{\begin{list}{$\bullet$}{
\setlength{\itemsep}{0mm}
\setlength{\parsep}{0mm}
% \setlength{\labelsep}{0mm}
% \setlength{\labelwidth}{0mm}
% \setlength{\topsep}{0mm}
}}{\end{list}}

\begin{document}

\title{Challenges in deploying low-latency anonymity (DRAFT)}

%\author{Roger Dingledine and Nick Mathewson and }
%\institute{The Free Haven Project\\
%\email{\{arma,nickm\}@freehaven.net}}
\author{Roger Dingledine \\ The Free Haven Project \\ arma@freehaven.net \and
Nick Mathewson \\ The Free Haven Project \\ nickm@freehaven.net \and
Paul Syverson \\ Naval Research Lab \\ syverson@itd.nrl.navy.mil}

\maketitle
\pagestyle{empty}

\begin{abstract}
There are many unexpected or unexpectedly difficult obstacles to
deploying anonymous communications. Drawing on our experiences deploying
Tor (the next-generation onion routing network), we describe social
challenges and technical issues that must be faced
in building, deploying, and sustaining a scalable, distributed, low-latency
anonymity network.
\end{abstract}

\section{Introduction}
% Your network is not practical unless it is sustainable and distributed.
Anonymous communication is full of surprises. This paper discusses some
unexpected challenges arising from our experiences deploying Tor, a
low-latency general-purpose anonymous communication system. We will discuss
some of the difficulties we have experienced and how we have met them (or how
we plan to meet them, if we know). We will also discuss some less
troublesome open problems that we must nevertheless eventually address.
%We will describe both those future challenges that we intend to explore and
%those that we have decided not to explore and why.

Tor is an overlay network for anonymizing TCP streams over the
Internet~\cite{tor-design}. It addresses limitations in earlier Onion
Routing designs~\cite{or-ih96,or-jsac98,or-discex00,or-pet00} by adding
perfect forward secrecy, congestion control, directory servers, integrity
checking, configurable exit policies, and location-hidden services using
rendezvous points. Tor works on the real-world Internet, requires no special
privileges or kernel modifications, requires little synchronization or
coordination between nodes, and provides a reasonable tradeoff between
anonymity, usability, and efficiency.

We first publicly deployed a Tor network in October 2003; since then it has
grown to over a hundred volunteer servers and as much as 80 megabits of
average traffic per second. Tor's research strategy has focused on deploying
a network to as many users as possible; thus, we have resisted designs that
would compromise deployability by imposing high resource demands on server
operators, and designs that would compromise usability by imposing
unacceptable restrictions on which applications we support. Although this
strategy has
its drawbacks (including a weakened threat model, as discussed below), it has
made it possible for Tor to serve many thousands of users, and to attract
research funding from organizations as diverse as ONR and DARPA
(for use in securing sensitive communications), and the Electronic Frontier
Foundation (for maintaining civil liberties of ordinary citizens online).

While the Tor design paper~\cite{tor-design} gives an overall view of Tor's
design and goals, this paper describes some policy, social, and technical
issues that we face as we continue deployment.
Rather than trying to provide complete solutions to every problem here, we
lay out the assumptions and constraints that we have observed while
deploying Tor in the wild. In doing so, we aim to create a research agenda
for others to help in addressing these issues. We believe that the issues
described here will be of general interest to projects attempting to build
and deploy practical, usable anonymity networks in the wild.

% ----------------

Tor research and development has been funded by the U.S.~Navy and DARPA
for use in securing government
communications, and by the Electronic Frontier Foundation, for use
in maintaining civil liberties for ordinary citizens online. The Tor
protocol is one of the leading choices
to be the anonymizing layer in the European Union's PRIME directive to
help maintain privacy in Europe. The University of Dresden in Germany
has integrated an independent implementation of the Tor protocol into
their popular Java Anon Proxy anonymizing client. This wide variety of
interests helps maintain both the stability and the security of the
network.

%While the Tor design paper~\cite{tor-design} gives an overall view its
%design and goals,
%this paper describes the policy and technical issues that Tor faces as
%we continue deployment. Rather than trying to provide complete solutions
%to every problem here, we lay out the assumptions and constraints
%that we have observed through deploying Tor in the wild. In doing so, we
%aim to create a research agenda for others to
%help in addressing these issues.
% Section~\ref{sec:what-is-tor} gives an
%overview of the Tor
%design and ours goals. Sections~\ref{sec:crossroads-policy}
%and~\ref{sec:crossroads-design} go on to describe the practical challenges,
%both policy and technical respectively,
%that stand in the way of moving
%from a practical useful network to a practical useful anonymous network.

%\section{What Is Tor}
\section{Background}
Here we give a basic overview of the Tor design and its properties, and
compare Tor to other low-latency anonymity designs.

\subsection{Tor, threat models, and distributed trust}
\label{sec:what-is-tor}

%Here we give a basic overview of the Tor design and its properties. For
%details on the design, assumptions, and security arguments, we refer
%the reader to the Tor design paper~\cite{tor-design}.

\subsubsection{How Tor works}
Tor provides \emph{forward privacy}, so that users can connect to
Internet sites without revealing their logical or physical locations
to those sites or to observers. It also provides \emph{location-hidden
services}, so that critical servers can support authorized users without
giving adversaries an effective vector for physical or online attacks.
The design provides these protections even when a portion of its own
infrastructure is controlled by an adversary.

To create a private network pathway with Tor, the client software
incrementally builds a \emph{circuit} of encrypted connections through
servers on the network. The circuit is extended one hop at a time, and
each server along the way knows only which server gave it data and which
server it is giving data to. No individual server ever knows the complete
path that a data packet has taken. The client negotiates a separate set
of encryption keys for each hop along the circuit.% to ensure that each
%hop can't trace these connections as they pass through.
Because each server sees no more than one hop in the
circuit, neither an eavesdropper nor a compromised server can use traffic
analysis to link the connection's source and destination.
For efficiency, the Tor software uses the same circuit for all the TCP
connections that happen within the same short period.
Later requests use a new
circuit, to prevent long-term linkability between different actions by
a single user.
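
The layered keys described above can be written out for a three-hop circuit.
The following is a simplified model in notation introduced here for
illustration, not the cell format from the Tor specification: $k_1$, $k_2$,
and $k_3$ denote the symmetric keys the client shares with the first, second,
and third server, $E_k$ and $D_k$ denote encryption and decryption under key
$k$, and $m$ is the data bound for the destination. The client sends
\begin{equation*}
c_1 = E_{k_1}\bigl(E_{k_2}\bigl(E_{k_3}(m)\bigr)\bigr),
\end{equation*}
and each server strips exactly one layer before forwarding:
\begin{align*}
c_2 &= D_{k_1}(c_1) = E_{k_2}\bigl(E_{k_3}(m)\bigr), \\
c_3 &= D_{k_2}(c_2) = E_{k_3}(m), \\
m &= D_{k_3}(c_3).
\end{align*}
In this model only the third server recovers $m$, and it does so without
learning who the client is; the first server knows the client's address but
sees only ciphertext.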

Tor also makes it possible for users to hide their locations while
offering various kinds of services, such as web publishing or an instant
messaging server. Using ``rendezvous points'', other Tor users can
connect to these hidden services, each without knowing the other's network
identity.

Tor attempts to anonymize the transport layer, not the application layer, so
application protocols that include personally identifying information need
additional application-level scrubbing proxies, such as
Privoxy~\cite{privoxy} for HTTP. Furthermore, Tor does not permit arbitrary
IP packets; it only anonymizes TCP streams and DNS requests, and only supports
connections via SOCKS (see Section~\ref{subsec:tcp-vs-ip}).

Most server operators do not want to allow arbitrary TCP connections to leave
their servers. To address this, Tor provides \emph{exit policies} so that
each server can block the IP addresses and ports it is unwilling to allow.
Servers advertise their exit policies to the directory servers, so that
clients can tell which servers will support their connections.
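
As an illustration of such a policy, a server that is willing to carry only
web traffic might put the following in its Tor configuration file. This is a
sketch: the \texttt{ExitPolicy} option name comes from Tor's configuration
documentation, but the particular ports and the catch-all rule are
illustrative choices, and the accepted syntax should be checked against the
manual for the installed version. Rules are matched in order, so the final
rule rejects everything not accepted earlier.
\begin{verbatim}
# Hypothetical policy: allow exits to HTTP and HTTPS only.
ExitPolicy accept *:80
ExitPolicy accept *:443
ExitPolicy reject *:*
\end{verbatim}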

As of January 2005, the Tor network has grown to around a hundred servers
on four continents, with a total capacity exceeding 1~Gbit/s. Appendix A
shows a graph of the number of working servers over time, as well as a
graph of the number of bytes being handled by the network over time. At
this point the network is sufficiently diverse for further development
and testing; but of course we always encourage and welcome new servers
to join the network.

\subsubsection{Threat models and design philosophy}
The ideal Tor network would be practical, useful, and anonymous. When
trade-offs arise between these properties, Tor's research strategy has been
to insist on remaining useful enough to attract many users,
and practical enough to support them. Only subject to these
constraints do we aim to maximize
anonymity.\footnote{This is not the only possible
direction in anonymity research: designs exist that provide more anonymity
than Tor at the expense of significantly increased resource requirements, or
decreased flexibility in application support (typically because of increased
latency). Such research does not typically abandon aspirations towards
deployability or utility, but instead tries to maximize deployability and
utility subject to a certain degree of inherent anonymity (inherent because
usability and practicality affect usage which affects the actual anonymity
provided by the network~\cite{back01,econymics}). We believe that these
approaches can be promising and useful, but that by focusing on deploying a
usable system in the wild, Tor helps us experiment with the actual parameters
of what makes a system ``practical'' for volunteer operators and ``useful''
for home users, and helps illuminate undernoticed issues which any deployed
volunteer anonymity network will need to address.}
Because of this strategy, Tor has a weaker threat model than many anonymity
designs in the literature. In particular, because we
support interactive communications without impractically expensive padding,
we fall prey to a variety
of intra-network~\cite{back01,attack-tor-oak05,flow-correlation04} and
end-to-end~\cite{danezis-pet2004,SS03} anonymity-breaking attacks.

Tor does not attempt to defend against a global observer. In general, an
attacker who can observe both ends of a connection through the Tor network
can correlate the timing and volume of data on that connection as it enters
and leaves the network, and so link a user to her chosen communication
parties. Known solutions to this attack would seem to require introducing a
prohibitive degree of traffic padding between the user and the network, or
introducing an unacceptable degree of latency (but see
Section~\ref{subsec:mid-latency}). Also, it is not clear that these methods
would work at all against a minimally active adversary that can introduce
timing patterns or additional traffic. Thus, Tor only attempts to defend
against external observers who cannot observe both sides of a user's
connection.

Against internal attackers who sign up Tor servers, the situation is more
complicated. In the simplest case, if an adversary has compromised $c$ of
$n$ servers on the Tor network, then the adversary will be able to compromise
a random circuit with probability $\frac{c^2}{n^2}$ (since the circuit
initiator chooses hops randomly). But there are
complicating factors:
\begin{tightlist}
\item If the user continues to build random circuits over time, an adversary
is almost certain to see a statistical sample of the user's traffic, and
thereby can build an increasingly accurate profile of her behavior. (See
Section~\ref{subsec:helper-nodes} for possible solutions.)
\item An adversary who controls a popular service outside of the Tor network
can be certain of observing all connections to that service; he
therefore will trace connections to that service with probability
$\frac{c}{n}$.
\item Users do not in fact choose servers with uniform probability; they
favor servers with high bandwidth or uptime, and exit servers that
permit connections to their favorite services.
\end{tightlist}
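
To make these estimates concrete, consider a worked example under the
simplifying assumptions of the simplest case above: entry and exit hops are
chosen uniformly and independently, and successive circuits are independent.
The figures $c = 10$ and $n = 100$ are illustrative, not measurements of the
deployed network. The chance that a single circuit has both its first and
last hop compromised is
\begin{equation*}
p = \frac{c^2}{n^2} = \frac{10^2}{100^2} = 0.01 .
\end{equation*}
If the user builds $k$ such circuits, the probability that at least one of
them is compromised is
\begin{equation*}
1 - (1 - p)^k, \qquad \text{which for } k = 100 \text{ gives }
1 - 0.99^{100} \approx 0.63 .
\end{equation*}
By contrast, when the adversary also controls the destination service, only
the first hop needs to be compromised, so each circuit to that service is
traced with probability $\frac{c}{n} = 0.1$.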

%discuss $\frac{c^2}{n^2}$, except how in practice the chance of owning
%the last hop is not $c/n$ since that doesn't take the destination (website)
%into account. so in cases where the adversary does not also control the
%final destination we're in good shape, but if he *does* then we'd be better
%off with a system that lets each hop choose a path.
%
%Isn't it more accurate to say ``If the adversary _always_ controls the final
% dest, we would be just as well off with such as system.'' ? If not, why
% not? -nm
% Sure. In fact, better off, since they seem to scale more easily. -rd

% XXXX the below paragraph should probably move later, and merge with
% other discussions of attack-tor-oak5.
In practice Tor's threat model is based entirely on the goal of
dispersal and diversity. Murdoch and Danezis describe an attack
\cite{attack-tor-oak05} that lets an attacker determine the nodes used
in a circuit; yet s/he cannot identify the initiator or responder,
e.g., client or web server, through this attack. So the endpoints
remain secure, which is the goal. It is conceivable that an
adversary could attack or set up observation of all connections
to an arbitrary Tor node in only a few minutes. If such an adversary
were to exist, s/he could use this probing to remotely identify a node
for further attack. Of more likely immediate practical concern is
an adversary with active access to the responder traffic
who wants to keep a circuit alive long enough to attack an identified
node. Thus it is important to prevent the responding end of the circuit
from keeping it open indefinitely.
Also, someone could identify nodes in this way and, if the nodes are in
his or her jurisdiction, immediately get a subpoena (if one is even needed)
telling the node operators that they must retain all the active
circuit data they now have.
Further, the enclave model, which had previously looked to be the most
generally secure, seems particularly threatened by this attack, since
it identifies endpoints when they're also nodes in the Tor network:
see Section~\ref{subsec:helper-nodes} for discussion of some ways to
address this issue.

See Section~\ref{subsec:routing-zones} for discussion of larger
adversaries and our dispersal goals.

\subsubsection{Distributed trust}
Tor's defense lies in having a diverse enough set of servers
to prevent most real-world
adversaries from being in the right places to attack users.
Tor aims to resist observers and insiders by distributing each transaction
over several nodes in the network. This ``distributed trust'' approach
means the Tor network can be safely operated and used by a wide variety
of mutually distrustful users, providing more sustainability and security
than some previous attempts at anonymizing networks.
The Tor network has a broad range of users, including ordinary citizens
concerned about their privacy, corporations
who don't want to reveal information to their competitors, and law
enforcement and government intelligence agencies who need
to conduct operations on the Internet without being noticed.

No organization can achieve this security on its own. If a single
corporation or government agency were to build a private network to
protect its operations, any connections entering or leaving that network
would be obviously linkable to the controlling organization. The members
and operations of that agency would be easier, not harder, to distinguish.

Instead, to protect our networks from traffic analysis, we must
collaboratively blend the traffic from many organizations and private
citizens, so that an eavesdropper can't tell which users are which,
and who is looking for what information. Bringing more users onto
the network makes all users more secure~\cite{econymics}.

Naturally, organizations will not want to depend on others for their
security. If most participating providers are reliable, Tor tolerates
some hostile infiltration of the network. For maximum protection,
the Tor design includes an enclave approach that lets data be encrypted
(and authenticated) end-to-end, so high-sensitivity users can be sure it
hasn't been read or modified. This even works for Internet services that
don't have built-in encryption and authentication, such as unencrypted
HTTP or chat, and it requires no modification of those services.

%Tor doesn't try to provide steg (but see Section~\ref{subsec:china}), or
%the other non-goals listed in tor-design.

\subsection{Related work}
Tor is not the only anonymity system that aims to be practical and useful.
Commercial single-hop proxies~\cite{anonymizer}, as well as unsecured
open proxies around the Internet, can provide good
performance and some security against a weaker attacker. The Java
Anon Proxy~\cite{web-mix} provides similar functionality to Tor but only
handles web browsing rather than arbitrary TCP\@.
%Some peer-to-peer file-sharing overlay networks such as
%Freenet~\cite{freenet} and Mute~\cite{mute}
Zero-Knowledge Systems' commercial Freedom
network~\cite{freedom21-security} was even more flexible than Tor in
that it could transport arbitrary IP packets, and it also supported
pseudonymous access rather than just anonymous access; but it had
a different approach to sustainability (collecting money from users
and paying ISPs to run servers), and has shut down due to financial
load. Finally, more scalable designs like Tarzan~\cite{tarzan:ccs02} and
MorphMix~\cite{morphmix:fc04} have been proposed in the literature, but
have not yet been fielded. We direct the interested reader to Section
2 of~\cite{tor-design} for a more in-depth review of related work.

Tor differs from other deployed systems for traffic analysis resistance
in its security and flexibility. Mix networks such as
Mixmaster~\cite{mixmaster-spec} or its successor Mixminion~\cite{minion-design}
gain the highest degrees of anonymity at the expense of introducing highly
variable delays, thus making them unsuitable for applications such as web
browsing. Commercial single-hop
proxies~\cite{anonymizer} present a single point of failure, where
a single compromise can expose all users' traffic, and a single-point
eavesdropper can perform traffic analysis on the entire network.
Also, their proprietary implementations place any infrastructure that
depends on these single-hop solutions at the mercy of their providers'
financial health as well as network security.

%XXXX six-four. crowds. i2p.

%XXXX
have a serious discussion of morphmix's assumptions, since they would
seem to be the direct competition. in fact tor is a flexible architecture
that would encompass morphmix, and they're nearly identical except for
path selection and node discovery. and the trust system morphmix has
seems overkill (and/or insecure) based on the threat model we've picked.
% this para should probably move to the scalability / directory system. -RD

\section{Crossroads: Policy issues}
\label{sec:crossroads-policy}

Many of the issues the Tor project needs to address extend beyond
system design and technology development. In particular, the
Tor project's \emph{image} with respect to its users and the rest of
the Internet impacts the security it can provide.
% No image, no sustainability -NM

% Fold this into next subsec.
As an example to motivate this section, some U.S.~Department of Energy
penetration testing engineers are tasked with compromising DoE computers
from the outside. They only have a limited number of ISPs from which to
launch their attacks, and they found that the defenders were recognizing
attacks because they came from the same IP space. These engineers wanted
to use Tor to hide their tracks. First, from a technical standpoint,
Tor does not support the variety of IP packets one would like to use in
such attacks (see Section~\ref{subsec:tcp-vs-ip}). But aside from this,
we also decided that it would probably be poor precedent to encourage
such use---even legal use that improves national security---and managed
to dissuade them.

With this image issue in mind, this section discusses the Tor user base and
Tor's interaction with other services on the Internet.

\subsection{Image and security}
% Communicating security? - NM

A growing body of papers argues that usability for anonymity systems
contributes directly to their security, because how usable the system
is impacts the possible anonymity set~\cite{back01,econymics}. Or
conversely, an unusable system attracts few users and thus can't provide
much anonymity.

This phenomenon has a second-order effect: knowing this, users should
choose which anonymity system to use based in part on how usable
\emph{others} will find it, in order to get the protection of a larger
anonymity set. Thus we might replace the adage ``usability is a security
parameter''~\cite{back01} with a new one: ``perceived usability is a
security parameter.'' From here we can better understand the effects
of publicity and advertising on security: the more convincing your
advertising, the more likely people will believe you have users, and thus
the more users you will attract. Perversely, over-hyped systems (if they
are not too broken) may be a better choice than modestly promoted ones,
if the hype attracts more users~\cite{usability-network-effect}.

So it follows that we should come up with ways to accurately communicate
the available security levels to the user, so she can make informed
decisions. JAP aims to do this by including a
comforting `anonymity meter' dial in the software's graphical interface,
giving the user an impression of the level of protection for her current
traffic.

However, there's a catch. For users to share the same anonymity set,
they need to act like each other. An attacker who can distinguish
a given user's traffic from the rest of the traffic will not be
distracted by other users on the network. For high-latency systems like
Mixminion, where the threat model is based on mixing messages with each
other, there's an arms race between end-to-end statistical attacks and
counter-strategies~\cite{statistical-disclosure,minion-design,e2e-traffic,trickle02}.
But for low-latency systems like Tor, end-to-end \emph{traffic
correlation} attacks~\cite{danezis-pet2004,SS03,defensive-dropping}
allow an attacker who can measure both ends of a communication
to match packet timing and volume, quickly linking
the initiator to her destination. This is why Tor's threat model is
based on preventing the adversary from observing both the initiator and
the responder.

Like Tor, the current JAP implementation does not pad connections
(apart from using small fixed-size cells for transport). In fact,
its cascade-based network topology may be even more vulnerable to these
attacks, because the network has fewer edges. JAP was born out of
the ISDN mix design~\cite{isdn-mixes}, where padding made sense because
every user had a fixed bandwidth allocation, but in its current context
as a general Internet web anonymizer, adding sufficient padding to JAP
would be prohibitively expensive.\footnote{Even if they could find and
maintain extra funding to run higher-capacity nodes, our experience
suggests that many users would not accept the increased per-user
bandwidth requirements, leading to an overall much smaller user base. But
see Section~\ref{subsec:mid-latency}.} Therefore, since under this threat
model the number of concurrent users does not seem to have much impact
on the anonymity provided, we suggest that JAP's anonymity meter is not
correctly communicating security levels to its users.

% because more users don't help anonymity much, we need to rely more
% on other incentive schemes, both policy-based (see sec x) and
% technically enforced (see sec y)

On the other hand, while the number of active concurrent users may not
matter as much as we'd like, it still helps to have some other users
who use the network. We investigate this issue in the next section.

\subsection{Reputability}
% Maintaining image of social value? Social value? -NM

Another factor impacting the network's security is its reputability:
the perception of its social value based on its current user base. If Alice is
the only user who has ever downloaded the software, it might be socially
accepted, but she's not getting much anonymity. Add a thousand animal rights
activists, and she's anonymous, but everyone thinks she's a Bambi lover (or
NRA member if you prefer a contrasting example). Add a thousand
random citizens (cancer survivors, privacy enthusiasts, and so on)
and now she's harder to profile.

The more cancer survivors on Tor, the better for the human rights
activists. The more script kiddies, the worse for the normal users. Thus,
reputability is an anonymity issue for two reasons. First, it impacts
the sustainability of the network: a network that's always about to be
shut down has difficulty attracting and keeping users, so its anonymity
set suffers.
% XXX but we said the anonymity set doesn't matter!
Second, a disreputable network attracts the attention of
powerful attackers who may not mind revealing the identities of all the
users to uncover a few bad ones.

While people therefore have an incentive for the network to be used for
``more reputable'' activities than their own, there are still tradeoffs
involved when it comes to anonymity. To follow the above example, a
network used entirely by cancer survivors might welcome some NRA members
onto the network, though of course they'd prefer a wider
variety of users.

Reputability becomes even more tricky in the case of privacy networks,
since the good uses of the network (such as publishing by journalists in
dangerous countries) are typically kept private, whereas network abuses
or other problems tend to be more widely publicized.

The impact of public perception on security is especially important
during the bootstrapping phase of the network, where the first few
widely publicized uses of the network can dictate the types of users it
attracts next.

%% "outside of academia, jap has just lost, permanently". (That is,
%% even though the crime detection issues are resolved and are unlikely
%% to go down the same way again, public perception has not been kind.)

\subsection{Sustainability and incentives}
|
|
One of the unsolved problems in low-latency anonymity designs is
|
|
how to keep the servers running. Zero-Knowledge Systems's Freedom network
|
|
depended on paying third parties to run its servers; the JAP project's
|
|
bandwidth depends on grants to pay for its bandwidth and
|
|
administrative expenses. In Tor, bandwidth and administrative costs are
|
|
distributed across the volunteers who run Tor nodes, so we at least have
|
|
reason to think that the Tor network could survive without continued research
|
|
funding.\footnote{It also helps that Tor is implemented with free and open
|
|
source software that can be maintained by anybody with the ability and
|
|
inclination.} But why are these volunteers running nodes, and what can we
|
|
do to encourage more volunteers to do so?
|
|
|
|
We have not surveyed Tor operators to learn why they are running servers, but
|
|
from the information they have provided, it seems that many of them run Tor
|
|
nodes for reasons of personal interest in privacy issues. It is possible
|
|
that others are running Tor for anonymity reasons, but of course they are
|
|
hardly likely to tell us if they are.
|
|
|
|
Significantly, Tor's threat model changes the anonymity incentives for running
|
|
a server. In a high-latency mix network, users can receive additional
|
|
anonymity by running their own server, since doing so obscures when they are
|
|
injecting messages into the network. But in Tor, anybody observing a Tor
|
|
server can tell when the server is generating traffic that corresponds to
|
|
none of its incoming traffic.
|
|
Still, anonymity and privacy incentives do remain for server operators:
|
|
\begin{tightlist}
|
|
\item Against a hostile website, running a Tor exit node can provide a degree
|
|
of ``deniability'' for traffic that originates at that exit node. For
|
|
example, it is likely in practice that HTTP requests from a Tor server's IP
|
|
will be assumed to be from the Tor network.
|
|
XXXX clarify.
|
|
\item Maintain the sustainability of the network. XXX sentencize
|
|
%\item Local Tor entry and exit servers allow users on a network to run in an
|
|
% `enclave' configuration. [XXXX need to resolve this. They would do this
|
|
% for E2E encryption + auth?]
|
|
\end{tightlist}
|
|
|
|
First, we try to make the costs of running a Tor server easily minimized.
|
|
Since Tor is run by volunteers, the most crucial software usability issue is
|
|
usability by operators: when an operator leaves, the network becomes less
|
|
usable by everybody. To keep operators pleased, we must try to keep Tor's
|
|
resource and administrative demands as low as possible. [XXXX say more. E.g.,
|
|
exit policies.]
|
|
|
|
Because of ISP billing structures, many Tor operators have underused capacity
|
|
that they are willing to donate to the network, at no additional monetary
|
|
cost to them. Features to limit bandwidth have been essential to adoption.
|
|
Also useful has been a ``hibernation'' feature that allows a server that
|
|
wants to provide high bandwidth, but no more than a certain amount in a
|
|
giving billing cycle, to become dormant once its bandwidth is exhausted, and
|
|
to reawaken at a random offset into the next billing cycle. This feature has
|
|
interesting policy implications, however; see
|
|
section~\ref{subsec:bandwidth-and-usability} below.
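
The bandwidth limits and hibernation behavior described above are set in the
server's configuration file. The fragment below is a minimal sketch: the
option names and units should be checked against the manual for the Tor
version in use, and every numeric value is an illustrative assumption.

\begin{verbatim}
# Illustrative torrc fragment; all values are assumptions.
# Long-term average rate limit and permitted burst.
BandwidthRate 50 KB
BandwidthBurst 100 KB
# Accounting period starts on the 1st of each month at midnight.
AccountingStart month 1 00:00
# Hibernate once this much traffic has been used in the period.
AccountingMax 20 GB
\end{verbatim}

With settings along these lines, a server would relay at up to the configured
rate until its quota for the period was used up, and then stay dormant until
some point in the next period.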
|
|
|
|
[XXXX say more. Why else would you run a server? What else can we do/do we
|
|
already do to make running a server more attractive?]
|
|
[We can enforce incentives; see Section 6.1. We can rate-limit clients.
We can put ``top bandwidth servers lists'' up a la seti@home.]
|
|
|
|
\subsection{Bandwidth and usability}
|
|
\label{subsec:bandwidth-and-usability}
|
|
Once users have configured their applications to work with Tor, the largest
|
|
remaining usability issue is bandwidth. When websites ``feel slow,'' users
|
|
begin to suffer.
|
|
|
|
Clients currently try to build their connections through servers that they
|
|
guess will have enough bandwidth. But even if capacity is allocated
|
|
optimally, it seems unlikely that the current network architecture will have
|
|
enough capacity to provide every user with as much bandwidth as she would
|
|
receive if she weren't using Tor, unless far more servers join the network
|
|
(see above).
|
|
|
|
Limited capacity does not destroy the network, however. Instead, usage tends
|
|
towards an equilibrium: when performance suffers, users who value performance
|
|
over anonymity tend to leave the system, thus freeing capacity until the
|
|
remaining users on the network are exactly those willing to accept the
capacity that is available.
|
|
|
|
XXX But is it the right equilibrium? And if it's the wrong one, we lose
|
|
XXX users. And if we lose the wrong users, servers won't want to help.
|
|
|
|
XXX what if the file-sharers are more persistent than the journalists?
|
|
|
|
\subsection{Tor and file-sharing}
|
|
%One potentially problematical area with deploying Tor has been our response
|
|
%to file-sharing applications.
|
|
File-sharing applications make up an enormous
|
|
fraction of the traffic on the Internet today, and provide two challenges to
|
|
any anonymizing network: their intensive bandwidth requirement, and the
|
|
degree to which they are associated (correctly or not) with copyright
|
|
violation.
|
|
|
|
As noted above, high-bandwidth protocols can make the network unresponsive,
|
|
but tend to be somewhat self-correcting. Issues of copyright violation,
|
|
however, are more interesting. Typical exit node operators want to help
|
|
people achieve private and anonymous speech, not to help people (say) host
|
|
Vin Diesel movies for download; and typical ISPs would rather not
|
|
deal with customers who cause them the overhead of receiving menacing letters
|
|
from the MPAA. While it is quite likely that the operators are doing nothing
|
|
illegal, many ISPs have policies of dropping users who get repeated legal
|
|
threats regardless of the merits of those threats, and many operators would
|
|
prefer to avoid receiving legal threats even if those threats have little
|
|
merit. So when the letters arrive, operators are likely to face
|
|
pressure to block filesharing applications entirely, in order to avoid the
|
|
hassle.
|
|
|
|
But blocking filesharing would not necessarily be easy; most popular
|
|
protocols have evolved to run on a variety of non-standard ports in order to
|
|
get around other port-based bans. Thus, exit node operators who wanted to
|
|
block filesharing would have to find some way to integrate Tor with a
|
|
protocol-aware exit filter. This could be a technically expensive
|
|
undertaking, and one with poor prospects: it is unlikely that Tor exit nodes
|
|
would succeed where so many institutional firewalls have failed. Another
|
|
possibility for sensitive operators is to run a restrictive server that
|
|
only permits exit connections to a restricted range of ports which are
|
|
not frequently associated with file sharing. There are increasingly few such
|
|
ports.
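
As a sketch of such a restrictive server, the fragment below permits exit
connections to two web ports only. The choice of ports is an assumption made
for illustration; the rules are read in order and the first match applies, so
the final rule rejects everything not explicitly accepted.

\begin{verbatim}
# Illustrative torrc fragment: allow exits to web ports only.
ExitPolicy accept *:80,accept *:443,reject *:*
\end{verbatim}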
|
|
|
|
For the moment, it seems that Tor's bandwidth issues have rendered it
|
|
unattractive for bulk file-sharing traffic; this may continue to be so in the
|
|
future. Nevertheless, Tor will likely remain attractive for limited use in
|
|
filesharing protocols that have separate control and data channels.
|
|
|
|
[xxxx We should say more -- but what? That we'll see a similar
|
|
equilibrating effect as with bandwidth, where sensitive ops switch to
|
|
middleman, and we become less useful for filesharing, so the filesharing
|
|
people back off, so we get more ops since there's less filesharing, so the
|
|
filesharers come back, etc.]
|
|
|
|
In practice, plausible deniability is hypothetical and doesn't seem very
convincing. If ISPs find the activity antisocial, they don't care \emph{why}
your computer is behaving that way.
|
|
|
|
XXXX deliberately give priority to quiet circuits?
|
|
XXXX or non file-sharing ports??
|
|
XXXX Point is not to beat them off the network, but to keep them from
|
|
XXXX hogging the network.
|
|
|
|
\subsection{Tor and blacklists}
|
|
|
|
It was long expected that, alongside Tor's legitimate users, it would also
|
|
attract troublemakers who exploited Tor in order to abuse services on the
|
|
Internet.
|
|
[XXX we're not talking bandwidth abuse here, we're talking vandalism,
|
|
hate mails via hotmail, attacks, etc.]
|
|
Our initial answer to this situation was to use ``exit policies''
|
|
to allow individual Tor servers to block access to specific IP/port ranges.
|
|
This approach was meant to make operators more willing to run Tor by allowing
|
|
them to prevent their servers from being used for abusing particular
|
|
services. For example, all Tor servers currently block SMTP (port 25), in
|
|
order to avoid being used to send spam.
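
An exit policy of this kind might look like the following sketch. The address
range is a documentation-only placeholder and is purely hypothetical; the
rules are read in order and the first match applies.

\begin{verbatim}
# Illustrative torrc fragment: refuse one (placeholder) address
# range and all SMTP, and allow everything else.
ExitPolicy reject 192.0.2.0/24:*,reject *:25,accept *:*
\end{verbatim}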
|
|
|
|
This approach is useful, but is insufficient for two reasons. First, since
|
|
it is not possible to force all servers to block access to any given service,
|
|
many of those services try to block Tor instead. Second, and more broadly, while being
|
|
blockable is important to being good netizens, we would like to encourage
|
|
services to allow anonymous access; services should not need to decide
|
|
between blocking legitimate anonymous use and allowing unlimited abuse.
|
|
|
|
This is potentially a bigger problem than it may appear.
|
|
On the one hand, if people want to refuse connections from you on
their servers, it would seem that they should be allowed to. But a
possible major problem with the blocking of Tor is that the decision
is not just that of the individual server administrator who is deciding
whether he wants to post to Wikipedia from his Tor node address or allow
people to read Wikipedia anonymously through his Tor node. (Wikipedia
has blocked all posting from all Tor nodes based on IP address.) If, e.g.,
he comes through a campus or corporate NAT, then the decision must
be to have the entire population behind it able to have a Tor exit
node or to have write access to Wikipedia. This is a loss for both of us (Tor
and Wikipedia). We don't want to compete for (or divvy up) the
NAT-protected entities of the world.
|
|
|
|
(A related problem is that many IP blacklists are not terribly fine-grained.
|
|
No current IP blacklist, for example, allows a service provider to blacklist
|
|
only those Tor servers that allow access to a specific IP or port, even
|
|
though this information is readily available. One IP blacklist even bans
|
|
every class C network that contains a Tor server, and recommends banning SMTP
|
|
from these networks even though Tor does not allow SMTP at all.)
|
|
[****Since this is stupid and we oppose it, shouldn't we name names here -pfs]
|
|
[XXX also, they're making \emph{middleman nodes leave} because they're caught
|
|
up in the standoff!]
|
|
[XXX Mention: it's not dumb, it's strategic!]
|
|
[XXX Mention: for some servops, any blacklist is a blacklist too many,
|
|
because it is risky. (Guy lives in apt with one IP.)]
|
|
|
|
Problems of abuse occur mainly with services such as IRC networks and
|
|
Wikipedia, which rely on IP blocking to ban abusive users. While at first
|
|
blush this practice might seem to depend on the anachronistic assumption that
|
|
each IP is an identifier for a single user, it is actually more reasonable in
|
|
practice: it assumes that non-proxy IPs are a costly resource, and that an
|
|
abuser cannot change IPs at will. By blocking IPs which are used by Tor
|
|
servers, open proxies, and service abusers, these systems hope to make
|
|
ongoing abuse difficult. Although the system is imperfect, it works
|
|
tolerably well for them in practice.
|
|
|
|
But of course, we would prefer that legitimate anonymous users be able to
|
|
access abuse-prone services. One conceivable approach would be to require
|
|
would-be IRC users, for instance, to register accounts if they wanted to
|
|
access the IRC network from Tor. But in practice, this would not
|
|
significantly impede abuse if creating new accounts were easily automatable;
|
|
[ XXX yahoo uses captchas in exactly this situation]
|
|
this is why services use IP blocking. In order to deter abuse, pseudonymous
|
|
identities need to require a significant switching cost in resources or human
|
|
time.
|
|
|
|
%One approach, similar to that taken by Freedom, would be to bootstrap some
|
|
%non-anonymous costly identification mechanism to allow access to a
|
|
%blind-signature pseudonym protocol. This would effectively create costly
|
|
%pseudonyms, which services could require in order to allow anonymous access.
|
|
%This approach has difficulties in practise, however:
|
|
%\begin{tightlist}
|
|
%\item Unlike Freedom, Tor is not a commercial service. Therefore, it would
|
|
% be a shame to require payment in order to make Tor useful, or to make
|
|
% non-paying users second-class citizens.
|
|
%\item It is hard to think of an underlying resource that would actually work.
|
|
% We could use IP addresses, but that's the problem, isn't it?
|
|
%\item Managing single sign-on services is not considered a well-solved
|
|
% problem in practice. If Microsoft can't get universal acceptance for
|
|
% Passport, why do we think that a Tor-specific solution would do any good?
|
|
%\item Even if we came up with a perfect authentication system for our needs,
|
|
% there's no guarantee that any service would actually start using it. It
|
|
% would require a nonzero effort for them to support it, and it might just
|
|
% be less hassle for them to block tor anyway.
|
|
%\end{tightlist}
|
|
|
|
The use of squishy IP-based ``authentication'' and ``authorization''
|
|
has not broken down even to the level that SSNs used for these
|
|
purposes have in commercial and public record contexts. Externalities
|
|
and misplaced incentives cause a continued focus on fighting identity
|
|
theft by protecting SSNs rather than developing better authentication
|
|
and incentive schemes \cite{price-privacy}. Similarly we can expect a
|
|
continued use of identification by IP number as long as there is no
|
|
workable alternative.
|
|
|
|
%Fortunately, our modular design separates
|
|
%routing from node discovery; so we could implement Morphmix in Tor just
|
|
%by implementing the Morphmix-specific node discovery and path selection
|
|
%pieces.
|
|
|
|
[XXX Mention correct DNS-RBL implementation. -NM]
|
|
|
|
\section{Crossroads: Design choices}
|
|
\label{sec:crossroads-design}
|
|
|
|
[XXX sentence here.]
|
|
|
|
\subsection{Transporting the stream vs transporting the packets}
|
|
\label{subsec:stream-vs-packet}
|
|
\label{subsec:tcp-vs-ip}
|
|
|
|
We periodically run into ex-ZKS employees who tell us that the process of
|
|
anonymizing IPs should ``obviously'' be done at the IP layer. Here are
|
|
the issues that need to be resolved before we'll be ready to switch Tor
|
|
over to arbitrary IP traffic.
|
|
|
|
\begin{enumerate}
|
|
\setlength{\itemsep}{0mm}
|
|
\setlength{\parsep}{0mm}
|
|
\item \emph{IP packets reveal OS characteristics.} We still need to do
|
|
IP-level packet normalization, to stop things like IP fingerprinting
|
|
attacks. There likely exist libraries that can help with this.
|
|
\item \emph{Application-level streams still need scrubbing.} We still need
|
|
Tor to be easy to integrate with user-level application-specific proxies
|
|
such as Privoxy. So it's not just a matter of capturing packets and
|
|
anonymizing them at the IP layer.
|
|
\item \emph{Certain protocols will still leak information.} For example,
|
|
DNS requests destined for my local DNS servers need to be rewritten
|
|
to be delivered to some other unlinkable DNS server. This requires
|
|
understanding the protocols we are transporting.
|
|
\item \emph{The crypto is unspecified.} First we need a block-level encryption
|
|
approach that can provide security despite
|
|
packet loss and out-of-order delivery. Freedom allegedly had one, but it was
|
|
never publicly specified. %, and we believe it's likely vulnerable to tagging
|
|
%attacks \cite{tor-design}.
|
|
Also, TLS over UDP is not implemented or even
|
|
specified, though some early work has begun on that~\cite{dtls}.
|
|
\item \emph{We'll still need to tune network parameters.} Since the above
encryption system will likely need sequence numbers (and maybe more) to do
replay detection, handle duplicate frames, etc., we will be reimplementing
|
|
some subset of TCP anyway.
|
|
\item \emph{Exit policies for arbitrary IP packets mean building a secure
|
|
IDS.} Our server operators tell us that exit policies are one of
|
|
the main reasons they're willing to run Tor.
|
|
Adding an Intrusion Detection System to handle exit policies would
|
|
increase the security complexity of Tor, and would likely not work anyway,
|
|
as evidenced by the entire field of IDS and counter-IDS papers. Many
|
|
potential abuse issues are resolved by the fact that Tor only transports
|
|
valid TCP streams (as opposed to arbitrary IP including malformed packets
|
|
and IP floods), so exit policies become even \emph{more} important as
|
|
we become able to transport IP packets. We also need a way to compactly
|
|
characterize the exit policies and let clients parse them to predict
|
|
which nodes will allow which packets to exit.
|
|
\item \emph{The Tor-internal name spaces would need to be redesigned.} We
|
|
support hidden service {\tt{.onion}} addresses, and other special addresses
|
|
like {\tt{.exit}} for the user to request a particular exit server,
|
|
by intercepting the addresses when they are passed to the Tor client.
|
|
\end{enumerate}
|
|
|
|
This list is discouragingly long right now, but we recognize that it
|
|
would be good to investigate each of these items in further depth and to
|
|
understand which are actual roadblocks and which are easier to resolve
|
|
than we think. We certainly wouldn't mind if Tor one day is able to
|
|
transport a greater variety of protocols.
|
|
[XXX clarify our actual attitude here. -NM]
|
|
|
|
\subsection{Mid-latency}
|
|
\label{subsec:mid-latency}
|
|
|
|
Though Tor has always been designed to be practical and usable first
|
|
with as much anonymity as can be built in subject to those goals, we
|
|
have contemplated that users might need resistance to at least simple
|
|
traffic correlation attacks. Higher-latency mix-networks resist these
|
|
attacks by introducing variability into message arrival times in order to
|
|
suppress timing correlation. Thus, it seems worthwhile to consider
whether we can improve Tor's anonymity by introducing batching and delaying
|
|
strategies to the Tor messages to prevent observers from linking incoming and
|
|
outgoing traffic.
|
|
|
|
Before we consider the engineering issues involved in the approach, of
|
|
course, we first need to study whether it can genuinely make users more
|
|
anonymous. Research on end-to-end traffic analysis on higher-latency mix
|
|
networks~\cite{e2e-traffic} indicates that as timing variance decreases,
|
|
timing correlation attacks require increasingly less data; it might be the
|
|
case that Tor can't resist timing attacks for longer than a few minutes
|
|
without increasing message delays to an unusable degree. Conversely, if Tor
|
|
can remain usable and slow timing attacks by even a matter of hours, this
|
|
would represent a significant improvement in practical anonymity: protecting
|
|
short-duration, once-off activities against a global observer is better than
|
|
protecting no activities at all. In order to answer this question, we might
|
|
try to adapt the techniques of~\cite{e2e-traffic} to a lower-latency mix
|
|
network, where instead of sending uncorrelated messages, users send batches
|
|
of cells in temporally clustered connections.
|
|
|
|
Once the anonymity questions are answered, we need to consider usability. If
|
|
the latency could be kept to two or three times its current overhead, this
|
|
might be acceptable to most Tor users. However, it might also destroy much of
|
|
the user base, and it is difficult to know in advance. Note also that in
|
|
practice, as the network grows to incorporate more DSL and cable-modem nodes,
|
|
and more nodes in various continents, this alone will \emph{already} cause
|
|
many-second delays for some transactions. Reducing this latency will be
|
|
hard, so perhaps it's worth considering whether accepting this higher latency
|
|
can improve the anonymity we provide. Also, it could be possible to
|
|
run a mid-latency option over the Tor network for those
|
|
users either willing to experiment or in need of more
|
|
anonymity. This would allow us to experiment with both
|
|
the anonymity provided and the interest on the part of users.
|
|
|
|
Adding a mid-latency option should not require significant fundamental
|
|
change to the Tor client or server design; circuits could be labeled as
|
|
low- or mid- latency as they are constructed. Low-latency traffic
|
|
would be processed as now, while cells on circuits that are mid-latency
|
|
would be sent in uniform-size chunks at synchronized intervals. (Traffic
|
|
already moves through the Tor network in fixed-sized cells; this would
|
|
increase the granularity.) If servers forward these chunks in roughly
|
|
synchronous fashion, it will increase the similarity of data stream timing
|
|
signatures. By experimenting with the granularity of data chunks and
|
|
of synchronization we can attempt once again to optimize for both
|
|
usability and anonymity. Unlike in \cite{sync-batching}, it may be
|
|
impractical to synchronize on network batches by dropping chunks from
|
|
a batch that arrive late at a given node---unless Tor moves away from
|
|
stream processing to a more loss-tolerant paradigm (cf.\
|
|
Section~\ref{subsec:tcp-vs-ip}). Instead, batch timing would be obscured by
|
|
synchronizing batches at the link level, and there would
|
|
be no direct attempt to synchronize all batches
|
|
entering the Tor network at the same time.
|
|
%Alternatively, if end-to-end traffic correlation is the
|
|
%concern, there is little point in mixing.
|
|
% Why not?? -NM
|
|
It might also be feasible to
|
|
pad chunks to uniform size as is done now for cells; if this is link
|
|
padding rather than end-to-end, then it will take less overhead,
|
|
especially in bursty environments.
|
|
% This is another way in which it
|
|
%would be fairly practical to set up a mid-latency option within the
|
|
%existing Tor network.
|
|
Other padding regimens might supplement the
|
|
mid-latency option; however, we should continue the caution with which
|
|
we have always approached padding lest the overhead cost us too much
|
|
performance or too many volunteers.
|
|
|
|
The distinction between traffic correlation and traffic analysis is
|
|
not as cut and dried as we might wish. In \cite{hintz-pet02} it was
|
|
shown that if data volumes of various popular
|
|
responder destinations are catalogued, it may not be necessary to
|
|
observe both ends of a stream to learn a source-destination link.
|
|
This should be fairly effective without simultaneously observing both
|
|
ends of the connection. However, it is still essentially confirming
|
|
suspected communicants where the responder suspects are ``stored'' rather
|
|
than observed at the same time as the client.
|
|
Similarly, latencies of going through various routes can be
|
|
catalogued~\cite{back01} to connect endpoints.
|
|
This is likely to entail high variability and massive storage since
|
|
% XXX hintz-pet02 just looked at data volumes of the sites. this
|
|
% doesn't require much variability or storage. I think it works
|
|
% quite well actually. Also, \cite{kesdogan:pet2002} takes the
|
|
% attack another level further, to narrow down where you could be
|
|
% based on an intersection attack on subpages in a website. -RD
|
|
%
|
|
% I was trying to be terse and simultaneously referring to both the
|
|
% Hintz stuff and the Back et al. stuff from Info Hiding 01. I've
|
|
% separated the two and added the references. -PFS
|
|
routes through the network to each site will be random even if they
|
|
have relatively unique latency characteristics. So this does
|
|
not seem an immediate practical threat. Further along similar lines,
|
|
the same paper suggested a ``clogging attack''. A version of this
|
|
was demonstrated to be practical in
|
|
\cite{attack-tor-oak05}. There it was shown that an outside attacker can
|
|
trace a stream through the Tor network while a stream is still active
|
|
simply by observing the latency of his own traffic sent through
|
|
various Tor nodes. These attacks are especially significant since they
|
|
counter previous results that running one's own onion router protects
|
|
better than using the network from the outside. The attacks do not
|
|
show the client address, only the first server within the Tor network,
|
|
making helper nodes all the more worthy of exploration for enclave
|
|
protection. Setting up a mid-latency subnet as described above would
|
|
be another significant step to evaluating resistance to such attacks.
|
|
|
|
The attacks in \cite{attack-tor-oak05} are also dependent on
|
|
cooperation of the responding application or the ability to modify or
|
|
monitor the responder stream, in order of decreasing attack
|
|
effectiveness. So, another way to slow some of these attacks
|
|
would be to cache responses at exit servers where possible, as it is with
|
|
DNS lookups and cacheable HTTP responses. Caching would, however,
|
|
create threats of its own. First, a Tor network is expected to contain
|
|
hostile nodes. If one of these is the repository of a cache, the
|
|
attack is still possible. Though it is more work to set up a Tor node and
|
|
cache repository, the payoff of such an attack is potentially
|
|
higher.
|
|
%To be
|
|
%useful, such caches would need to be distributed to any likely exit
|
|
%nodes of recurred requests for the same data.
|
|
% Even local caches could be useful, I think. -NM
|
|
%
|
|
%Added some clarification -PFS
|
|
Besides allowing any other insider attacks, caching nodes would hold a
record of destinations and data visited by Tor users, reducing forward
anonymity. Worse, for the cache to be widely useful much beyond the
client that caused it, either there would have to be a new mechanism to
distribute cache information around the network and a way for clients
to make use of it, or the caches themselves would need to be
distributed widely. Either way the record of visited sites and
|
|
downloaded information is made automatically available to an attacker
|
|
without having to actively gather it himself. Besides its inherent
|
|
value, this could serve as useful data to an attacker deciding which
|
|
locations to target for confirmation. A way to counter this
|
|
distribution threat might be to only cache at certain semitrusted
|
|
helper nodes. This might help specific clients, but it would limit
|
|
the general value of caching.
|
|
|
|
%Does that cacheing discussion belong in low-latency?
|
|
|
|
\subsection{Application support: SOCKS and beyond}
|
|
|
|
Tor supports the SOCKS protocol, which provides a standardized interface for
|
|
generic TCP proxies. Unfortunately, this is not a complete solution for
|
|
many applications and platforms:
|
|
\begin{tightlist}
|
|
\item Many applications do not support SOCKS. To support such applications,
|
|
it's necessary to replace the networking system calls with SOCKS-aware
|
|
versions, or to run a local SOCKS tunnel and convince the applications to
|
|
connect to localhost. Neither of these tasks is easy for the average user,
|
|
even with good instructions.
|
|
\item Even when applications do use SOCKS, they often make DNS requests
|
|
themselves. (The various versions of the SOCKS protocol include some where
|
|
the application tells the proxy an IP address, and some where it sends a
|
|
hostname.) By connecting to the DNS server directly, the application breaks
|
|
the user's anonymity and advertises where it is about to connect.
|
|
\end{tightlist}
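
The difference between the two kinds of SOCKS request mentioned above can be
sketched as follows. The field names follow the SOCKS 4 and 4a protocol
descriptions; the layout is shown only to illustrate where the hostname
travels, not as a complete specification.

\begin{verbatim}
SOCKS4 CONNECT request (application supplies an IP address):
  VN      1 byte    version, 4
  CD      1 byte    command, 1 = CONNECT
  DSTPORT 2 bytes   destination port
  DSTIP   4 bytes   destination IPv4 address
  USERID  variable  terminated by a NUL byte

SOCKS4a CONNECT request (application supplies a hostname):
  as above, but DSTIP is 0.0.0.x with x nonzero, and the
  hostname follows the USERID, terminated by a NUL byte
\end{verbatim}

In the first form the application must already have resolved the name, which
is where the DNS leak arises; in the second, resolution is left to the proxy.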
|
|
|
|
So in order to actually provide good anonymity, we need to make sure that
|
|
users have a practical way to use Tor anonymously. Possibilities include
|
|
writing wrappers for applications to anonymize them automatically; improving
|
|
the applications' support for SOCKS; writing libraries to help application
|
|
writers use Tor properly; and implementing a local DNS proxy to reroute DNS
|
|
requests to Tor so that applications can simply point their DNS resolvers at
|
|
localhost and continue to use SOCKS for data only.
|
|
|
|
\subsection{Measuring performance and capacity}
|
|
\label{subsec:performance}
|
|
|
|
One of the paradoxes with engineering an anonymity network is that we'd like
|
|
to learn as much as we can about how traffic flows so we can improve the
|
|
network, but we want to prevent others from learning how traffic flows in
|
|
order to trace users' connections through the network. Furthermore, many
|
|
mechanisms that help Tor run efficiently (such as having clients choose servers
|
|
based on their capacities) require measurements about the network.
|
|
|
|
Currently, servers record their bandwidth use in 15-minute intervals and
|
|
include this information in the descriptors they upload to the directory.
|
|
They also try to deduce their own available bandwidth, on the basis of how
|
|
much traffic they have been able to transfer recently, and upload this
|
|
information as well.
|
|
|
|
This is, of course, eminently cheatable. A malicious server can get a
|
|
disproportionate amount of traffic simply by claiming to have more bandwidth
|
|
than it does. But better mechanisms have their problems. If bandwidth data
|
|
is to be measured rather than self-reported, it is usually possible for
|
|
servers to selectively provide better service for the measuring party, or
|
|
sabotage the measured value of other servers. Complex solutions for
|
|
mix networks have been proposed, but do not address the issues
|
|
completely~\cite{mix-acc,casc-rep}.
|
|
|
|
Even without the possibility of cheating, network measurement is
|
|
non-trivial. It is far from unusual for one observer's view of a server's
|
|
latency or bandwidth to disagree wildly with another's. Furthermore, it is
|
|
unclear whether total bandwidth is really the right measure; perhaps clients
|
|
should be considering servers on the basis of unused bandwidth instead, or
|
|
perhaps observed throughput.
|
|
% XXXX say more here?
|
|
|
|
%How to measure performance without letting people selectively deny service
|
|
%by distinguishing pings. Heck, just how to measure performance at all. In
|
|
%practice people have funny firewalls that don't match up to their exit
|
|
%policies and Tor doesn't deal.
|
|
|
|
%Network investigation: Is all this bandwidth publishing thing a good idea?
|
|
%How can we collect stats better? Note weasel's smokeping, at
|
|
%http://seppia.noreply.org/cgi-bin/smokeping.cgi?target=Tor
|
|
%which probably gives george and steven enough info to break tor?
|
|
|
|
Even if we can collect and use this network information effectively, we need
|
|
to make sure that it is not more useful to attackers than to us. While it
|
|
seems plausible that bandwidth data alone is not enough to reveal
|
|
sender-recipient connections under most circumstances, it could certainly
|
|
reveal the path taken by large traffic flows under low-usage circumstances.
|
|
|
|
\subsection{Running a Tor server, path length, and helper nodes}
|
|
|
|
It has been thought for some time that the best anonymity protection
|
|
comes from running your own onion router~\cite{or-pet00,tor-design}.
|
|
(In fact, in Onion Routing's first design, this was the only option
|
|
possible~\cite{or-ih96}.) The first design also had a fixed path
|
|
length of five nodes. Middle Onion Routing involved much analysis
|
|
(mostly unpublished) of route selection algorithms and path length
|
|
algorithms to combine efficiency with unpredictability in routes.
|
|
Since, unlike Crowds, nodes in a route cannot all know the ultimate
|
|
destination of an application connection, it was generally not
|
|
considered significant if a node could determine via latency that it
|
|
was second in the route. But if one followed Tor's three node default
|
|
path length, an enclave-to-enclave communication (in which two of the
|
|
ORs were at each enclave) would be completely compromised by the
|
|
middle node. Thus for enclave-to-enclave communication, four is the smallest
|
|
number of nodes that preserves the $\frac{c^2}{n^2}$ degree of protection
|
|
in any setting.
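
The $\frac{c^2}{n^2}$ figure can be recovered under a simplified model,
assumed here only for illustration: the adversary controls $c$ of the $n$
servers, and the two positions that must both be compromised to link
initiator and responder are filled uniformly and independently. Then
\[
  \Pr[\text{both positions compromised}]
    = \frac{c}{n} \cdot \frac{c}{n}
    = \frac{c^2}{n^2}.
\]
For example, with $n = 100$ and $c = 10$, this probability is $1/100$.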
|
|
|
|
The Murdoch-Danezis attack, however, shows that simply adding to the
|
|
path length may not protect usage of an enclave-protecting OR\@. A
hostile web server can determine all of the nodes in a three-node Tor
|
|
path. The attack only identifies that a node is on the route, not
|
|
where. For example, if all of the nodes on the route were enclave
|
|
nodes, the attack would not identify which of the two not directly
|
|
visible to the attacker was the source. Thus, there remains an
|
|
element of plausible deniability that is preserved for enclave nodes.
|
|
However, Tor has always sought to be stronger than plausible
|
|
deniability. Our assumption is that users of the network are concerned
|
|
about being identified by an adversary, not with being proven guilty
|
|
beyond any reasonable doubt. Still it is something, and may be desired
|
|
in some settings.
|
|
|
|
It is reasonable to think that this attack can be easily extended to
|
|
longer paths should those be used; nonetheless there may be some
|
|
advantage to random path length. If the number of nodes is unknown,
|
|
then the adversary would need to send streams to all the nodes in the
|
|
network and analyze the resulting latency from them to be reasonably
|
|
certain that it has not missed the first node in the circuit. Also,
|
|
the attack does not identify the order of nodes in a route, so the
|
|
longer the route, the greater the uncertainty about which node might
|
|
be first. It may be possible to extend the attack to learn the route
|
|
node order, but it has not been shown whether this is practically feasible.
|
|
If so, the incompleteness uncertainty engendered by random lengths would
|
|
remain, but once the complete set of nodes in the route were identified
|
|
the initiating node would also be identified.
|
|
|
|
Another way to reduce the threats to both enclaves and simple Tor
|
|
clients is to have helper nodes. Helper nodes were introduced
|
|
in~\cite{wright03} as a suggested means of protecting the identity
|
|
of the initiator of a communication in various anonymity protocols.
|
|
The idea is to use a single trusted node as the first one you go to;
that way an attacker cannot ever attack the first nodes you connect
to and do some form of intersection attack. This will not affect the
Murdoch-Danezis attack at all if the attacker can time latencies to
both the helper node and the enclave node.
|
|
|
|
We have to pick the path length so that the adversary can't distinguish client from
|
|
server (how many hops is good?).
|
|
|
|
\subsection{Helper nodes}
|
|
\label{subsec:helper-nodes}
|
|
|
|
Tor can only provide anonymity against an attacker if that attacker can't
|
|
monitor the user's entry and exit on the Tor network. But since Tor
|
|
currently chooses entry and exit points randomly and changes them frequently,
|
|
a patient attacker who controls a single entry and a single exit is sure to
|
|
eventually break some circuits of frequent users who consider those servers.
|
|
(We assume that users are as concerned about statistical profiling as about
|
|
the anonymity of any particular connection. That is, it is almost as bad to
|
|
leak the fact that Alice {\it sometimes} talks to Bob as it is to leak the times
|
|
when Alice is {\it actually} talking to Bob.)
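
As a rough illustration of why such an attacker eventually succeeds, consider
a simplified model that is our assumption rather than a description of Tor's
actual path selection: each circuit picks its entry and its exit uniformly and
independently from $n$ servers, and the attacker controls one particular
entry and one particular exit. A given circuit then uses both with
probability $p$, and over $k$ circuits:
\begin{align*}
p &= \frac{1}{n^2}, \\
\Pr[\text{at least one of $k$ circuits is broken}] &= 1 - (1-p)^k
  \;\longrightarrow\; 1 \quad \text{as } k \to \infty, \\
\mathrm{E}[\text{circuits until the first break}] &= \frac{1}{p} = n^2 .
\end{align*}
For $n = 100$ this is about $10^4$ circuits, so under these assumptions a
break is a matter of time for a sufficiently frequent user.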
|
|
|
|
|
|
One solution to this problem is to use ``helper nodes''~\cite{wright02,wright03}---to
|
|
have each client choose a few fixed servers for critical positions in her
|
|
circuits. That is, Alice might choose some server H1 as her preferred
|
|
entry, so that unless the attacker happens to control or observe her
|
|
connection to H1, her circuits will remain anonymous. If H1 is compromised,
|
|
Alice is vulnerable as before. But now, at least, she has a chance of
|
|
not being profiled.
|
|
|
|
(Choosing fixed exit nodes is less useful, since the connection from the exit
|
|
node to Alice's destination will be seen not only by the exit but by the
|
|
destination. Even if Alice chooses a good fixed exit node, she may
|
|
nevertheless connect to a hostile website.)
|
|
|
|
There are still obstacles remaining before helper nodes can be implemented.
|
|
For one, the literature does not describe how to choose helpers from a list
|
|
of servers that changes over time. If Alice is forced to choose a new entry
|
|
helper every $d$ days and $c$ of the $n$ servers are compromised, she can
expect to choose a compromised server around
every $dn/c$ days. Worse, an attacker with the ability to DoS servers could
|
|
force their users to switch helper nodes more frequently.
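
To make the rotation cost concrete, here is a back-of-the-envelope sketch
under assumptions that are ours, not the literature's: $c$ of the $n$ servers
are compromised, and each new helper is drawn uniformly and independently.
Each rotation then picks a compromised helper with probability $c/n$, so the
number of rotations until the first compromised helper is geometric:
\begin{align*}
\mathrm{E}[\text{rotations until a compromised helper}] &= \frac{n}{c}, \\
\mathrm{E}[\text{days until a compromised helper}] &= \frac{dn}{c}, \\
\Pr[\text{some compromised helper within $k$ rotations}] &= 1 - \Bigl(1 - \frac{c}{n}\Bigr)^{k}.
\end{align*}
For example, with $d = 30$, $n = 100$, and $c = 5$, the expected time is
$600$ days; an attacker who can force a rotation every day cuts this to
$20$ days.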
|
|
|
|
%Do general DoS attacks have anonymity implications? See e.g. Adam
|
|
%Back's IH paper, but I think there's more to be pointed out here. -RD
|
|
% Not sure what you want to say here. -NM
|
|
|
|
%Game theory for helper nodes: if Alice offers a hidden service on a
|
|
%server (enclave model), and nobody ever uses helper nodes, then against
|
|
%George+Steven's attack she's totally nailed. If only Alice uses a helper
|
|
%node, then she's still identified as the source of the data. If everybody
|
|
%uses a helper node (including Alice), then the attack identifies the
|
|
%helper node and also Alice, and knows which one is which. If everybody
|
|
%uses a helper node (but not Alice), then the attacker figures the real
|
|
%source was a client that is using Alice as a helper node. [How's my
|
|
%logic here?] -RD
|
|
%
|
|
% Not sure about the logic. For the attack to work with helper nodes, the
|
|
%attacker needs to guess that Alice is running the hidden service, right?
|
|
%Otherwise, how can he know to measure her traffic specifically? -NM
|
|
|
|
%point to routing-zones section re: helper nodes to defend against
|
|
%big stuff.
|
|
|
|
\subsection{Location-hidden services}
|
|
\label{subsec:hidden-services}
|
|
|
|
While most of the discussion above has been about forward anonymity
|
|
with Tor, it also provides support for \emph{rendezvous points}, which
|
|
let users provide TCP services to other Tor users without revealing
|
|
their location. Since this feature is relatively recent, we describe here
|
|
a couple of our early observations from its deployment.
|
|
|
|
First, our implementation of hidden services seems less hidden than we'd
|
|
like, since they are configured on a single client and get used over
|
|
and over---particularly because an external adversary can induce them to
|
|
produce traffic. They seem the ideal use case for our above discussion
|
|
of helper nodes. This insecurity means that they may not be suitable as
|
|
a building block for Free Haven~\cite{freehaven-berk} or other anonymous
|
|
publishing systems that aim to provide long-term security.
|
|
%Also, they're brittle in terms of intersection and observation attacks.
|
|
|
|
\emph{Hot-swap} hidden services, where more than one location can
|
|
provide the service and loss of any one location does not imply a
|
|
change in service, would help foil intersection and observation attacks
|
|
where an adversary monitors availability of a hidden service and also
|
|
monitors whether certain users or servers are online. However, the design
|
|
challenges in providing these services without otherwise compromising
|
|
the hidden service's anonymity remain an open problem.
|
|
|
|
In practice, hidden services are used for more than just providing private
|
|
access to a web server or IRC server. People are using hidden services
|
|
as a poor man's VPN and firewall-buster. Many people want to be able
|
|
to connect to the computers in their private network via secure shell,
|
|
and rather than playing with dyndns and trying to pierce holes in their
|
|
firewall, they run a hidden service on the inside and then rendezvous
|
|
with that hidden service externally.
|
|
|
|
Also, sites like Bloggers Without Borders (www.b19s.org) are advertising
|
|
a hidden-service address on their front page. Doing this can provide
|
|
increased robustness if they use the dual-IP approach we describe in
|
|
the Tor design paper, but in practice they do it firstly to increase visibility
|
|
of the Tor project and their support for privacy, and secondly to offer
|
|
a way for their users, using unmodified software, to get end-to-end
|
|
encryption and end-to-end authentication to their website.
|
|
|
|
\subsection{Trust and discovery}
|
|
\label{subsec:trust-and-discovery}
|
|
|
|
[arma will edit this and expand/retract it]
|
|
|
|
The published Tor design adopted a deliberately simplistic design for
|
|
authorizing new nodes and informing clients about servers and their status.
|
|
In the early Tor designs, all ORs periodically uploaded a signed description
|
|
of their locations, keys, and capabilities to each of several well-known {\it
|
|
directory servers}. These directory servers constructed a signed summary
|
|
of all known ORs (a ``directory''), and a signed statement of which ORs they
|
|
believed to be operational at any given time (a ``network status''). Clients
|
|
periodically downloaded a directory in order to learn the latest ORs and
|
|
keys, and more frequently downloaded a network status to learn which ORs are
|
|
likely to be running. ORs also operate as directory caches, in order to
|
|
lighten the bandwidth load on the authoritative directory servers.
|
|
|
|
In order to prevent Sybil attacks (wherein an adversary signs up many
|
|
purportedly independent servers in order to increase her chances of observing
|
|
a stream as it enters and leaves the network), the early Tor directory design
|
|
required the operators of the authoritative directory servers to manually
|
|
approve new ORs. Unapproved ORs were included in the directory, but clients
|
|
did not use them at the start or end of their circuits. In practice,
|
|
directory administrators performed little actual verification, and tended to
|
|
approve any OR whose operator could compose a coherent email. This procedure
|
|
may have prevented trivial automated Sybil attacks, but would do little
|
|
against a clever attacker.
|
|
|
|
There are a number of flaws in this system that need to be addressed as we
|
|
move forward. They include:
|
|
\begin{tightlist}
|
|
\item Each directory server represents an independent point of failure; if
|
|
any one were compromised, it could immediately compromise all of its users
|
|
by recommending only compromised ORs.
|
|
\item The more servers join the network, the more unreasonable it
becomes to expect clients to know about them all. Directories
become infeasibly large, and downloading the list of servers becomes
burdensome.
|
|
\item The validation scheme may do as much harm as it does good. It is not
|
|
only incapable of preventing clever attackers from mounting Sybil attacks,
|
|
but may deter server operators from joining the network. (For instance, if
|
|
they expect the validation process to be difficult, or if they do not share
|
|
any languages in common with the directory server operators.)
|
|
\end{tightlist}
|
|
|
|
We could try to move the system in several directions, depending on our
|
|
choice of threat model and requirements. If we did not need to increase
|
|
network capacity in order to support more users, there would be no reason not
|
|
to adopt even stricter validation requirements, and reduce the number of
|
|
servers in the network to a trusted minimum. But since we want Tor to work
|
|
for as many users as it can, we need XXXXX
|
|
|
|
In order to address the first two issues, it seems wise to move to a system
|
|
including a number of semi-trusted directory servers, no one of which can
|
|
compromise a user on its own. Ultimately, of course, we cannot escape the
|
|
problem of a first introducer: since most users will run Tor in whatever
|
|
configuration the software ships with, the Tor distribution itself will
|
|
remain a potential single point of failure so long as it includes the seed
|
|
keys for directory servers, a list of directory servers, or any other means
|
|
to learn which servers are on the network. But omitting this information
|
|
from the Tor distribution would only delegate the trust problem to the
|
|
individual users, most of whom are presumably less informed about how to make
|
|
trust decisions than the Tor developers.
|
|
|
|
%Network discovery, sybil, node admission, scaling. It seems that the code
|
|
%will ship with something and that's our trust root. We could try to get
|
|
%people to build a web of trust, but no. Where we go from here depends
|
|
%on what threats we have in mind. Really decentralized if your threat is
|
|
%RIAA; less so if threat is to application data or individuals or...
|
|
|
|
\section{Scaling}
|
|
%\label{sec:crossroads-scaling}
|
|
%P2P + anonymity issues:
|
|
|
|
Tor is running today with hundreds of servers and tens of thousands of
|
|
users, but it will certainly not scale to millions.
|
|
|
|
Scaling Tor involves three main challenges. First is safe server
|
|
discovery, both bootstrapping -- how a Tor client can robustly find an
|
|
initial server list -- and ongoing -- how a Tor client can learn about
|
|
a fair sample of honest servers and not let the adversary control his
|
|
circuits (see Section~\ref{subsec:trust-and-discovery}). Second is detecting and handling the speed
|
|
and reliability of the variety of servers we must use if we want to
|
|
accept many servers (see Section~\ref{subsec:performance}).
|
|
Since the speed and reliability of a circuit is limited by its worst link,
|
|
we must learn to track and predict performance. Finally, in order to get
|
|
a large set of servers in the first place, we must address incentives
|
|
for users to carry traffic for others (see the incentives discussion below).
|
|
|
|
\subsection{Incentives by Design}
|
|
|
|
There are three behaviors we need to encourage for each server: relaying
|
|
traffic; providing good throughput and reliability while doing it;
|
|
and allowing traffic to exit the network from that server.
|
|
|
|
We encourage these behaviors through \emph{indirect} incentives, that
|
|
is, designing the system and educating users in such a way that users
|
|
with certain goals will choose to relay traffic. One
|
|
main incentive for running a Tor server is social benefit: volunteers
|
|
altruistically donate their bandwidth and time. We also keep public
|
|
rankings of the throughput and reliability of servers, much like
|
|
seti@home. We further explain to users that they can get plausible
|
|
deniability for any traffic emerging from the same address as a Tor
|
|
exit node, and they can use their own Tor server
|
|
as entry or exit point and be confident it's not run by the adversary.
|
|
Further, users who need to be able to communicate anonymously
|
|
may run a server simply because their need to increase the
expectation that such a network continues to be available to them
and usable exceeds any countervailing costs.
|
|
Finally, we can improve the usability and feature set of the software:
|
|
rate limiting support and easy packaging decrease the hassle of
|
|
maintaining a server, and our configurable exit policies allow each
|
|
operator to advertise a policy describing the hosts and ports to which
|
|
he feels comfortable connecting.
|
|
|
|
To date these appear to have been adequate. As the system scales or as
|
|
new issues emerge, however, we may also need to provide
|
|
\emph{direct} incentives:
|
|
providing payment or other resources in return for high-quality service.
|
|
Paying actual money is problematic: decentralized e-cash systems are
|
|
not yet practical, and a centralized collection system not only reduces
|
|
robustness, but also has failed in the past (the history of commercial
|
|
anonymizing networks is littered with failed attempts). A more promising
|
|
option is to use a tit-for-tat incentive scheme: provide better service
|
|
to nodes that have provided good service to you.
|
|
|
|
Unfortunately, such an approach introduces new anonymity problems.
|
|
There are many surprising ways for servers to game the incentive and
|
|
reputation system to undermine anonymity because such systems are
|
|
designed to encourage fairness in storage or bandwidth usage, not
|
|
fairness of provided anonymity. An adversary can attract more traffic
|
|
by performing well or can provide targeted differential performance to
|
|
individual users to undermine their anonymity. Typically a user who
|
|
chooses evenly from all options is most resistant to an adversary
|
|
targeting him, but that approach prevents us from handling heterogeneous
|
|
servers.
|
|
|
|
%When a server (call him Steve) performs well for Alice, does Steve gain
|
|
%reputation with the entire system, or just with Alice? If the entire
|
|
%system, how does Alice tell everybody about her experience in a way that
|
|
%prevents her from lying about it yet still protects her identity? If
|
|
%Steve's behavior only affects Alice's behavior, does this allow Steve to
|
|
%selectively perform only for Alice, and then break her anonymity later
|
|
%when somebody (presumably Alice) routes through his node?
|
|
|
|
A possible solution is a simplified approach to the tit-for-tat
|
|
incentive scheme based on two rules: (1) each node should measure the
|
|
service it receives from adjacent nodes, and provide service relative
|
|
to the received service, but (2) when a node is making decisions that
|
|
affect its own security (e.g. when building a circuit for its own
|
|
application connections), it should choose evenly from a sufficiently
|
|
large set of nodes that meet some minimum service threshold
|
|
\cite{casc-rep}. This approach allows us to discourage bad service
|
|
without opening Alice up as much to attacks. All of this requires
|
|
further study.
|
|
|
|
|
|
%XXX rewrite the above so it sounds less like a grant proposal and
|
|
%more like a "if somebody were to try to solve this, maybe this is a
|
|
%good first step".
|
|
|
|
%We should implement the above incentive scheme in the
|
|
%deployed Tor network, in conjunction with our plans to add the necessary
|
|
%associated scalability mechanisms. We will do experiments (simulated
|
|
%and/or real) to determine how much the incentive system improves
|
|
%efficiency over baseline, and also to determine how far we are from
|
|
%optimal efficiency (what we could get if we ignored the anonymity goals).
|
|
|
|
\subsection{Peer-to-peer / practical issues}
|
|
|
|
[leave this section for now, and make sure things here are covered
|
|
elsewhere. then remove it.]
|
|
|
|
Making use of servers with little bandwidth. How to handle hammering by
|
|
certain applications.
|
|
|
|
Handling servers that are far away from the rest of the network, e.g. on
|
|
the continents that aren't North America and Europe. High latency,
|
|
often high packet loss.
|
|
|
|
Running Tor servers behind NATs, behind great-firewalls-of-China, etc.
|
|
Restricted routes. How to propagate to everybody the topology? BGP
|
|
style doesn't work because we don't want just *one* path. Point to
|
|
Geoff's stuff.
|
|
|
|
\subsection{Location diversity and ISP-class adversaries}
|
|
\label{subsec:routing-zones}
|
|
|
|
Anonymity networks have long relied on diversity of node location for
|
|
protection against attacks---typically an adversary who can observe a
|
|
larger fraction of the network can launch a more effective attack. One
|
|
way to achieve dispersal involves growing the network so a given adversary
|
|
sees less. Alternately, we can arrange the topology so traffic can enter
|
|
or exit at many places (for example, by using a free-route network
|
|
like Tor rather than a cascade network like JAP). Lastly, we can use
|
|
distributed trust to spread each transaction over multiple jurisdictions.
|
|
But how do we decide whether two nodes are in related locations?
|
|
|
|
Feamster and Dingledine defined a \emph{location diversity} metric
|
|
in \cite{feamster:wpes2004}, and began investigating a variant of location
|
|
diversity based on the fact that the Internet is divided into thousands of
|
|
independently operated networks called {\em autonomous systems} (ASes).
|
|
The key insight from their paper is that while we typically think of a
|
|
connection as going directly from the Tor client to her first Tor node,
|
|
actually it traverses many different ASes on each hop. An adversary at
|
|
any of these ASes can monitor or influence traffic. Specifically, given
|
|
plausible initiators and recipients and random path selection,
|
|
some ASes in the simulation were able to observe 10\% to 30\% of the
|
|
transactions (that is, learn both the origin and the destination) on
|
|
the deployed Tor network (33 nodes as of June 2004).
|
|
|
|
The paper concludes that for best protection against the AS-level
|
|
adversary, nodes should be in ASes that have the most links to other ASes:
|
|
Tier-1 ISPs such as AT\&T and Abovenet. Further, a given transaction
|
|
is safest when it starts or ends in a Tier-1 ISP. Therefore, assuming
|
|
initiator and responder are both in the U.S., it actually \emph{hurts}
|
|
our location diversity to add far-flung nodes in continents like Asia
|
|
or South America.
|
|
|
|
Many open questions remain. First, it will be an immense engineering
|
|
challenge to get an entire BGP routing table to each Tor client, or at
|
|
least summarize it sufficiently. Without a local copy, clients won't be
|
|
able to safely predict what ASes will be traversed on the various paths
|
|
through the Tor network to the final destination. Tarzan~\cite{tarzan:ccs02}
|
|
and MorphMix~\cite{morphmix:fc04} suggest that we compare IP prefixes to
|
|
determine location diversity; but the above paper showed that in practice
|
|
many of the Mixmaster nodes that share a single AS have entirely different
|
|
IP prefixes. When the network has scaled to thousands of nodes, does IP
|
|
prefix comparison become a more useful approximation?
|
|
%
|
|
Second, can we take advantage of caching certain content at the exit nodes, to
|
|
limit the number of requests that need to leave the network at all.
|
|
What about taking advantage of caches like Akamai's or Google's? What
about treating them as adversaries?
|
|
%
|
|
Third, if we follow the paper's recommendations and tailor path selection
|
|
to avoid choosing endpoints in similar locations, how much are we hurting
|
|
anonymity against larger real-world adversaries who can take advantage
|
|
of knowing our algorithm?
|
|
%
|
|
Lastly, can we use this knowledge to figure out which gaps in our network
|
|
would most improve our robustness to this class of attack, and go recruit
|
|
new servers with those ASes in mind?
|
|
|
|
Tor's security relies in large part on the dispersal properties of its
|
|
network. We need to be more aware of the anonymity properties of various
|
|
approaches so we can make better design decisions in the future.
|
|
|
|
\subsection{The China problem}
|
|
\label{subsec:china}
|
|
|
|
Citizens in a variety of countries, such as most recently China and
|
|
Iran, are periodically blocked from accessing various sites outside
|
|
their country. These users try to find any tools available to allow
|
|
them to get around these firewalls. Some anonymity networks, such as
|
|
Six-Four~\cite{six-four}, are designed specifically with this goal in
|
|
mind; others like the Anonymizer~\cite{anonymizer} are paid by sponsors
|
|
such as Voice of America to set up a network to encourage Internet
|
|
freedom. Even though Tor wasn't
|
|
designed with ubiquitous access to the network in mind, thousands of
|
|
users across the world are trying to use it for exactly this purpose.
|
|
% Academic and NGO organizations, peacefire, \cite{berkman}, etc
|
|
|
|
Anti-censorship networks hoping to bridge country-level blocks face
|
|
a variety of challenges. One of these is that they need to find enough
|
|
exit nodes---servers on the `free' side that are willing to relay
|
|
arbitrary traffic from users to their final destinations. Anonymizing
|
|
networks including Tor are well-suited to this task, since we have
|
|
already gathered a set of exit nodes that are willing to tolerate some
|
|
political heat.
|
|
|
|
The other main challenge is to distribute a list of reachable relays
|
|
to the users inside the country, and give them software to use them,
|
|
without letting the authorities also enumerate this list and block each
|
|
relay. Anonymizer solves this by buying lots of seemingly-unrelated IP
|
|
addresses (or having them donated), abandoning old addresses as they are
|
|
`used up', and telling a few users about the new ones. Distributed
|
|
anonymizing networks again have an advantage here, in that we already
|
|
have tens of thousands of separate IP addresses whose users might
|
|
volunteer to provide this service since they've already installed and use
|
|
the software for their own privacy~\cite{koepsell:wpes2004}. Because
|
|
the Tor protocol separates routing from network discovery (see Section
|
|
\ref{do-we-discuss-this?}), volunteers could configure their Tor clients
|
|
to generate server descriptors and send them to a special directory
|
|
server that gives them out to dissidents who need to get around blocks.
|
|
|
|
Of course, this still doesn't prevent the adversary
|
|
from enumerating all the volunteer relays and blocking them preemptively.
|
|
Perhaps a tiered-trust system could be built where a few individuals are
|
|
given relays' locations, and they recommend other individuals by telling them
|
|
those addresses, thus providing a built-in incentive to avoid letting the
|
|
adversary intercept them. Max-flow trust algorithms~\cite{advogato}
|
|
might help to bound the number of IP addresses leaked to the adversary. Groups
|
|
like the W3C are looking into using Tor as a component in an overall system to
|
|
help address censorship; we wish them luck.
|
|
|
|
%\cite{infranet}
|
|
|
|
\subsection{Non-clique topologies}
|
|
|
|
Tor's comparatively weak threat model makes it easier to scale than other mix net
|
|
designs. High-latency mix networks need to avoid partitioning attacks, where
|
|
network splits prevent users of the separate partitions from providing cover
|
|
for each other. In Tor, however, we assume that the adversary cannot
|
|
cheaply observe nodes at will, so even if the network becomes split, the
|
|
users do not necessarily receive much less protection.
|
|
Thus, a simple possibility when the scale of a Tor network
|
|
exceeds some size is to simply split it. Care could be taken in
|
|
allocating which nodes go to which network along the lines of
|
|
\cite{casc-rep} to ensure that collaborating hostile nodes are not
|
|
able to gain any advantage in network splitting that they do not
|
|
already have in joining a network.
|
|
|
|
% Describe these attacks; many people will not have read the paper!
|
|
The attacks in \cite{attack-tor-oak05} show that certain types of
|
|
brute force attacks are in fact feasible; however they make the
|
|
above point stronger not weaker. The attacks do not appear to be
|
|
significantly more difficult to mount against a network that is
|
|
twice the size. Also, they only identify the Tor nodes used in a
|
|
circuit, not the client. Finally note that even if the network is split,
|
|
a client does not need to use just one of the two resulting networks.
|
|
Alice could use either of them, and it would not be difficult to make
|
|
the Tor client able to access several such networks on a per-circuit
|
|
basis. More analysis is needed; we simply note here that splitting
|
|
a Tor network is an easy way to achieve moderate scalability and that
|
|
it does not necessarily have the same implications as splitting a mixnet.
|
|
|
|
Alternatively, we can try to scale a single Tor network. Some issues for
|
|
scaling include restricting the number of sockets and the amount of bandwidth
|
|
used by each server. The number of sockets is determined by the network's
|
|
connectivity and the number of users, while bandwidth capacity is determined
|
|
by the total bandwidth of servers on the network. The simplest solution to
|
|
bandwidth capacity is to add more servers, since adding a Tor node of any
|
|
feasible bandwidth will increase the traffic capacity of the network. So as
|
|
a first step to scaling, we should focus on making the network tolerate more
|
|
servers, by reducing the interconnectivity of the nodes; later we can reduce
|
|
overhead associated with directories, discovery, and so on.
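
As a rough sketch of why interconnectivity matters for socket use, assume
(our simplification) that each link between two servers costs one socket at
each end. With $n$ servers, a clique compares with a topology in which every
server keeps at most $k$ links as follows:
\begin{align*}
\text{clique:} &\quad n-1 \text{ sockets per server}, \quad
  \tfrac{n(n-1)}{2} \text{ links in total}; \\
\text{degree at most $k$:} &\quad \text{at most } k \text{ sockets per server}, \quad
  \text{at most } \tfrac{nk}{2} \text{ links in total}.
\end{align*}
Per-server socket use thus grows linearly with $n$ in a clique but stays
bounded under a degree limit; connections from clients add to both figures.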
|
|
|
|
By reducing the connectivity of the network we increase the total number of
|
|
nodes that the network can contain. Danezis~\cite{danezis-pets03} considers
|
|
the anonymity implications of restricting routes on mix networks, and
|
|
recommends an approach based on expander graphs (where any subgraph is likely
|
|
to have many neighbors). It is not immediately clear that this approach will
|
|
extend to Tor, which has a weaker threat model but higher performance
|
|
requirements than the network considered. Instead of analyzing the
|
|
probability of an attacker's viewing whole paths, we will need to examine the
|
|
attacker's likelihood of compromising the endpoints of a Tor circuit through
|
|
a sparse network.
|
|
|
|
% Nick edits these next 2 grafs.
|
|
|
|
To make matters simpler, Tor may not need an expander graph per se: it
|
|
may be enough to have a single subnet that is highly connected. As an
|
|
example, assume fifty nodes of relatively high traffic capacity. This
|
|
\emph{center} forms a clique. Assume each center node can
|
|
handle 200 connections to other nodes (including the other ones in the
|
|
center). Assume every noncenter node connects to three nodes in the
center and to any nodes outside the center that it wants to. Then the
|
|
network easily scales to c. 2500 nodes with commensurate increase in
|
|
bandwidth. There are many open questions: how directory information
|
|
is distributed (presumably information about the center nodes could
|
|
be given to any new nodes with their codebase), whether center nodes
|
|
will need to function as a `backbone', etc. As above the point is
|
|
that this would create problems for the expected anonymity for a mixnet,
|
|
but for an onion routing network where anonymity derives largely from
|
|
the edges, it may be feasible.
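
The figure of roughly 2500 nodes follows from a simple count, sketched here
under the assumption (ours) that links among noncenter nodes do not consume
any center capacity:
\begin{align*}
\text{spare connections per center node} &= 200 - (50 - 1) = 151, \\
\text{spare connections in the whole center} &= 50 \times 151 = 7550, \\
\text{noncenter nodes at three connections each} &= \lfloor 7550/3 \rfloor = 2516, \\
\text{total nodes} &= 2516 + 50 = 2566.
\end{align*}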
|
|
|
|
Another point is that we already have a non-clique topology.
|
|
Individuals can set up and run Tor nodes without informing the
|
|
directory servers. This will allow, e.g., dissident groups to run a
|
|
local Tor network of such nodes that connects to the public Tor
|
|
network. This network is hidden behind the Tor network, and its
only visible connection to Tor is at those points where it connects.
|
|
As far as the public network, or anyone observing it, is concerned,
|
|
they are running clients.
|
|
|
|
\section{The Future}
|
|
\label{sec:conclusion}
|
|
|
|
we should put random thoughts here until there are enough for a
|
|
conclusion.
|
|
|
|
will our sustainability approach work? we'll see.
|
|
|
|
Applications that leak data: we can say they're not our problem, but
|
|
they're somebody's problem.
|
|
The more widely deployed Tor becomes, the more people who need a
|
|
deployed overlay network tell us they'd like to use us if only we added
|
|
the following more features.
|
|
|
|
"These are difficult and open questions, yet choosing not to solve them
|
|
means leaving most users to a less secure network or no anonymizing
|
|
network at all."
|
|
|
|
\bibliographystyle{plain} \bibliography{tor-design}
|
|
|
|
\clearpage
|
|
\appendix
|
|
|
|
\begin{figure}[t]
|
|
%\unitlength=1in
|
|
\centering
|
|
%\begin{picture}(6.0,2.0)
|
|
%\put(3,1){\makebox(0,0)[c]{\epsfig{figure=graphnodes,width=6in}}}
|
|
%\end{picture}
|
|
\mbox{\epsfig{figure=graphnodes,width=5in}}
|
|
\caption{Number of servers over time. Lowest line is number of exit
|
|
nodes that allow connections to port 80. Middle line is total number of
|
|
verified (registered) servers. The line above that represents servers
|
|
that are not yet registered.}
|
|
\label{fig:graphnodes}
|
|
\end{figure}
|
|
|
|
\begin{figure}[t]
|
|
\centering
|
|
\mbox{\epsfig{figure=graphtraffic,width=5in}}
|
|
\caption{The sum of traffic reported by each server over time. The bottom
|
|
pair show average throughput, and the top pair represent the largest 15
|
|
minute burst in each 4 hour period.}
|
|
\label{fig:graphtraffic}
|
|
\end{figure}
|
|
|
|
\end{document}
|
|
|