Tweaks and typos throughout. Nearly there.

svn:r3586
Paul Syverson 2005-02-08 20:34:57 +00:00
parent 4518e7e642
commit 1d569eb492

@ -6,11 +6,11 @@
\usepackage{amsmath}
\usepackage{epsfig}
\setlength{\textwidth}{6in}
\setlength{\textheight}{8in}
\setlength{\topmargin}{.5in}
\setlength{\oddsidemargin}{1cm}
\setlength{\evensidemargin}{1cm}
\setlength{\textwidth}{6.1in}
\setlength{\textheight}{8.5in}
\setlength{\topmargin}{1cm}
\setlength{\oddsidemargin}{.5cm}
\setlength{\evensidemargin}{.5cm}
\newenvironment{tightlist}{\begin{list}{$\bullet$}{
\setlength{\itemsep}{0mm}
@ -28,7 +28,7 @@
Nick Mathewson\inst{1} \and
Paul Syverson\inst{2}}
\institute{The Free Haven Project \email{<\{arma,nickm\}@freehaven.net>} \and
Naval Research Lab \email{<syverson@itd.nrl.navy.mil>}}
Naval Research Laboratory \email{<syverson@itd.nrl.navy.mil>}}
\maketitle
\pagestyle{plain}
@ -77,14 +77,15 @@ made it possible for Tor to serve many thousands of users and attract
funding from diverse sources whose goals range from security on a
national scale down to the liberties of each individual.
While the Tor design paper~\cite{tor-design} gives an overall view of Tor's
design and goals, this paper describes some policy, social, and technical
While~\cite{tor-design} gives an overall view of Tor's
design and goals, this paper describes policy, social, and technical
issues that we face as we continue deployment.
Rather than trying to provide complete solutions to every problem here, we
lay out the assumptions and constraints that we have observed while
deploying Tor in the wild. In doing so, we aim to create a research agenda
for others to help in addressing these issues. We believe that the issues
described here will be of general interest to projects attempting to build
described here will be of general interest to any and all
projects attempting to build
and deploy practical, useable anonymity networks in the wild.
%While the Tor design paper~\cite{tor-design} gives an overall view its
@ -132,7 +133,7 @@ Tor nodes on the network. The circuit is extended one hop at a time, and
each node along the way knows only which node gave it data and which
node it is giving data to. No individual Tor node ever knows the complete
path that a data packet has taken. The client negotiates a separate set
of encryption keys for each hop along the circuit.% to ensure that each
of encryption keys for each hop along the circuit. % to ensure that each
%hop can't trace these connections as they pass through.
Because each node sees no more than one hop in the
circuit, neither an eavesdropper nor a compromised node can use traffic
@ -140,7 +141,7 @@ analysis to link the connection's source and destination.
For efficiency, the Tor software uses the same circuit for all the TCP
connections that happen within the same short period.
Later requests use a new
circuit, to prevent long-term linkability between different actions by
circuit, to complicate long-term linkability between different actions by
a single user.
Tor also makes it possible for users to hide their locations while
@ -152,25 +153,25 @@ identity.
Tor attempts to anonymize the transport layer, not the application layer, so
application protocols that include personally identifying information need
additional application-level scrubbing proxies, such as
Privoxy~\cite{privoxy} for HTTP. Furthermore, Tor does not permit arbitrary
Privoxy~\cite{privoxy} for HTTP\@. Furthermore, Tor does not permit arbitrary
IP packets; it only anonymizes TCP streams and DNS requests, and only supports
connections via SOCKS (see Section~\ref{subsec:tcp-vs-ip}).
Most node operators do not want to allow arbitrary TCP connections to leave
their server. To address this, Tor provides \emph{exit policies} so that
each exit node can block the IP addresses and ports it is unwilling to allow.
TRs advertise their exit policies to the directory servers, so that
Tor nodes advertise their exit policies to the directory servers, so that
clients can tell which nodes will support their connections.
As of January 2005, the Tor network has grown to around a hundred nodes
on four continents, with a total capacity exceeding 1Gbit/s. Appendix A
shows a graph of the number of working nodes over time, as well as a
vgraph of the number of bytes being handled by the network over time. At
graph of the number of bytes being handled by the network over time. At
this point the network is sufficiently diverse for further development
and testing; but of course we always encourage and welcome new nodes
to join the network.
Tor research and development has been funded by the U.S.~Navy and DARPA
Tor research and development has been funded by ONR and DARPA
for use in securing government
communications, and by the Electronic Frontier Foundation, for use
in maintaining civil liberties for ordinary citizens online. The Tor
@ -257,8 +258,8 @@ that an outside attacker can trace a stream through the Tor network
while a stream is still active simply by observing the latency of his
own traffic sent through various Tor nodes. These attacks do not show
the client address, only the first node within the Tor network, making
helper nodes all the more worthy of exploration (cf.,
Section~\ref{subsec:helper-nodes}).
helper nodes all the more worthy of exploration. (See
Section~\ref{subsec:helper-nodes}.)
Against internal attackers who sign up Tor nodes, the situation is more
complicated. In the simplest case, if an adversary has compromised $c$ of
@ -277,8 +278,8 @@ complicating factors:
(3)~Users do not in fact choose nodes with uniform probability; they
favor nodes with high bandwidth or uptime, and exit nodes that
permit connections to their favorite services.
See Section~\ref{subsec:routing-zones} for discussion of larger
adversaries and our dispersal goals.
(See Section~\ref{subsec:routing-zones} for discussion of how larger
adversaries affect our dispersal goals.)
%\begin{tightlist}
%\item If the user continues to build random circuits over time, an adversary
@ -360,10 +361,10 @@ and operations of that agency would be easier, not harder, to distinguish.
Instead, to protect our networks from traffic analysis, we must
collaboratively blend the traffic from many organizations and private
citizens, so that an eavesdropper can't tell which users are which,
and who is looking for what information. By bringing more users onto
the network, all users become more secure~\cite{econymics}.
[XXX I feel uncomfortable saying this last sentence now. -RD]
and who is looking for what information. %By bringing more users onto
%the network, all users become more secure~\cite{econymics}.
%[XXX I feel uncomfortable saying this last sentence now. -RD]
%[So, I took it out. I think we can do without it. -PFS]
Naturally, organizations will not want to depend on others for their
security. If most participating providers are reliable, Tor tolerates
some hostile infiltration of the network. For maximum protection,
@ -430,13 +431,12 @@ system design and technology development. In particular, the
Tor project's \emph{image} with respect to its users and the rest of
the Internet impacts the security it can provide.
% No image, no sustainability -NM
With this image issue in mind, this section discusses the Tor user base and
Tor's interaction with other services on the Internet.
\subsection{Communicating security}
A growing field of papers argue that usability for anonymity systems
Usability for anonymity systems
contributes directly to their security, because how usable the system
is impacts the possible anonymity set~\cite{econymics,back01}. Or
conversely, an unusable system attracts few users and thus can't provide
@ -481,13 +481,15 @@ Like Tor, the current JAP implementation does not pad connections
JAP's cascade-based network topology may be even more vulnerable to these
attacks, because the network has fewer edges. JAP was born out of
the ISDN mix design~\cite{isdn-mixes}, where padding made sense because
every user had a fixed bandwidth allocation, but in its current context
every user had a fixed bandwidth allocation and altering the timing
pattern of packets could be immediately detected, but in its current context
as a general Internet web anonymizer, adding sufficient padding to JAP
would be prohibitively expensive.\footnote{Even if JAP could
would be prohibitively expensive and probably ineffective against a
minimally active attacker.\footnote{Even if JAP could
fund higher-capacity nodes indefinitely, our experience
suggests that many users would not accept the increased per-user
bandwidth requirements, leading to an overall much smaller user base. But
cf.\ Section \ref{subsec:mid-latency}.} Therefore, since under this threat
cf.\ Section~\ref{subsec:mid-latency}.} Therefore, since under this threat
model the number of concurrent users does not seem to have much impact
on the anonymity provided, we suggest that JAP's anonymity meter is not
accurately communicating security levels to its users.
@ -611,9 +613,9 @@ wants to provide high bandwidth, but no more than a certain amount in a
given billing cycle, to become dormant once its bandwidth is exhausted, and
to reawaken at a random offset into the next billing cycle. This feature has
interesting policy implications, however; see
Section~\ref{subsec:bandwidth-and-file-sharing} below.
the next section below.
Exit policies help to limit administrative costs by limiting the frequency of
abuse complaints.
abuse complaints. (See Section~\ref{subsec:tor-and-blacklists}.)
%[XXXX say more. Why else would you run a node? What else can we do/do we
% already do to make running a node more attractive?]
@ -696,6 +698,7 @@ file-sharing protocols that have separate control and data channels.
%your computer is doing that behavior.
\subsection{Tor and blacklists}
\label{subsec:tor-and-blacklists}
It was long expected that, alongside Tor's legitimate users, it would also
attract troublemakers who exploited Tor in order to abuse services on the
@ -730,7 +733,7 @@ and Wikipedia. We don't want to compete for (or divvy up) the NAT
protected entities of the world.
Worse, many IP blacklists are not terribly fine-grained.
No current IP blacklist, for example, allow a service provider to blacklist
No current IP blacklist, for example, allows a service provider to blacklist
only those Tor nodes that allow access to a specific IP or port, even
though this information is readily available. One IP blacklist even bans
every class C network that contains a Tor node, and recommends banning SMTP
@ -758,7 +761,7 @@ tolerably well for them in practice.
But of course, we would prefer that legitimate anonymous users be able to
access abuse-prone services. One conceivable approach would be to require
would-be IRC users, for instance, to register accounts if they wanted to
access the IRC network from Tor. But in practise, this would not
access the IRC network from Tor. In practise this would not
significantly impede abuse if creating new accounts were easily automatable;
this is why services use IP blocking. In order to deter abuse, pseudonymous
identities need to require a significant switching cost in resources or human
@ -908,14 +911,21 @@ cable-modem nodes and more nodes in distant continents. Perhaps we can
harness this increased latency to improve anonymity rather than just
reduce usability. Further, if we let clients label certain circuits as
mid-latency as they are constructed, we could handle both types of traffic
on the same network, giving users a choice between speed and security.
on the same network, giving users a choice between speed and security---and
giving researchers a chance to experiment with parameters to improve the
quality of those choices.
\subsection{Enclaves and helper nodes}
\label{subsec:helper-nodes}
It has long been thought that the best anonymity comes from running your
own node~\cite{tor-design,or-pet00}. This is called using Tor in an
\emph{enclave} configuration. Of course, Tor's default path length of
own node~\cite{tor-design,or-ih96,or-pet00}. This is called using Tor in an
\emph{enclave} configuration. By running Tor clients only on Tor nodes
at the enclave perimeter, enclave configuration can also permit anonymity
protection even when policy or other requirements prevent individual machines
within the enclave from running Tor clients~\cite{or-jsac98,or-discex00}.
Of course, Tor's default path length of
three is insufficient for these enclaves, since the entry and/or exit
themselves are sensitive. Tor thus increments the path length by one
for each sensitive endpoint in the circuit.
@ -1034,14 +1044,14 @@ distributed trust to spread each transaction over multiple jurisdictions.
But how do we decide whether two nodes are in related locations?
Feamster and Dingledine defined a \emph{location diversity} metric
in \cite{feamster:wpes2004}, and began investigating a variant of location
in~\cite{feamster:wpes2004}, and began investigating a variant of location
diversity based on the fact that the Internet is divided into thousands of
independently operated networks called {\em autonomous systems} (ASes).
The key insight from their paper is that while we typically think of a
connection as going directly from the Tor client to her first Tor node,
connection as going directly from the Tor client to the first Tor node,
actually it traverses many different ASes on each hop. An adversary at
any of these ASes can monitor or influence traffic. Specifically, given
plausible initiators and recipients and path random path selection,
plausible initiators and recipients, and given random path selection,
some ASes in the simulation were able to observe 10\% to 30\% of the
transactions (that is, learn both the origin and the destination) on
the deployed Tor network (33 nodes as of June 2004).
@ -1049,10 +1059,10 @@ the deployed Tor network (33 nodes as of June 2004).
The paper concludes that for best protection against the AS-level
adversary, nodes should be in ASes that have the most links to other ASes:
Tier-1 ISPs such as AT\&T and Abovenet. Further, a given transaction
is safest when it starts or ends in a Tier-1 ISP. Therefore, assuming
is safest when it starts or ends in a Tier-1 ISP\@. Therefore, assuming
initiator and responder are both in the U.S., it actually \emph{hurts}
our location diversity to add far-flung nodes in continents like Asia
or South America.
our location diversity to enter or exit from far-flung nodes in
continents like Asia or South America.
Many open questions remain. First, it will be an immense engineering
challenge to get an entire BGP routing table to each Tor client, or to
@ -1071,7 +1081,8 @@ network at all. What about taking advantage of caches like Akamai or
Google~\cite{shsm03}? (Note that they're also well-positioned as global
adversaries.)
%
Third, if we follow the paper's recommendations and tailor path selection
Third, if we follow the recommendations in~\cite{feamster:wpes2004}
and tailor path selection
to avoid choosing endpoints in similar locations, how much are we hurting
anonymity against larger real-world adversaries who can take advantage
of knowing our algorithm?
@ -1150,7 +1161,7 @@ accept many nodes (see Section~\ref{subsec:performance}).
Since the speed and reliability of a circuit is limited by its worst link,
we must learn to track and predict performance. Finally, in order to get
a large set of nodes in the first place, we must address incentives
for users to carry traffic for others (see Section incentives).
for users to carry traffic for others.
\subsection{Incentives by Design}
@ -1168,10 +1179,9 @@ seti@home. We further explain to users that they can get plausible
deniability for any traffic emerging from the same address as a Tor
exit node, and they can use their own Tor node
as entry or exit point and be confident it's not run by the adversary.
Further, users who need to be able to communicate anonymously
may run a node simply because their need to increase
expectation that such a network continues to be available to them
and usable exceeds any countervening costs.
Further, users may run a node simply because they need such a network
to be persistently available and usable.
And, the value of supporting this exceeds any countervailing costs.
Finally, we can improve the usability and feature set of the software:
rate limiting support and easy packaging decrease the hassle of
maintaining a node, and our configurable exit policies allow each
@ -1197,8 +1207,8 @@ fairness of provided anonymity. An adversary can attract more traffic
by performing well or can provide targeted differential performance to
individual users to undermine their anonymity. Typically a user who
chooses evenly from all options is most resistant to an adversary
targeting him, but that approach prevents from handling heterogeneous
nodes.
targeting him, but that approach precludes the efficient use
of heterogeneous nodes.
%When a node (call him Steve) performs well for Alice, does Steve gain
%reputation with the entire system, or just with Alice? If the entire
@ -1236,14 +1246,15 @@ further study.
The published Tor design adopted a deliberately simplistic design for
authorizing new nodes and informing clients about Tor nodes and their status.
In the early Tor designs, all nodes periodically uploaded a signed description
In preliminary Tor designs, all nodes periodically uploaded a
signed description
of their locations, keys, and capabilities to each of several well-known {\it
directory servers}. These directory servers constructed a signed summary
of all known Tor nodes (a ``directory''), and a signed statement of which
nodes they
believed to be operational at any given time (a ``network status''). Clients
periodically downloaded a directory in order to learn the latest nodes and
keys, and more frequently downloaded a network status to learn which nodes are
keys, and more frequently downloaded a network status to learn which nodes were
likely to be running. Tor nodes also operate as directory caches, in order to
lighten the bandwidth on the authoritative directory servers.
@ -1258,7 +1269,7 @@ directory administrators performed little actual verification, and tended to
approve any Tor node whose operator could compose a coherent email.
This procedure
may have prevented trivial automated Sybil attacks, but would do little
against a clever attacker.
against a clever and determined attacker.
There are a number of flaws in this system that need to be addressed as we
move forward. They include:
@ -1283,7 +1294,7 @@ network capacity in order to support more users, we could simply
adopt even stricter validation requirements, and reduce the number of
nodes in the network to a trusted minimum.
But, we can only do that if we can simultaneously make node capacity
scale much more than we anticipate feasible soon, and if we can find
scale much more than we anticipate to be feasible soon, and if we can find
entities willing to run such nodes, an equally daunting prospect.
@ -1355,7 +1366,8 @@ reveal the path taken by large traffic flows under low-usage circumstances.
\subsection{Non-clique topologies}
Tor's comparatively weak model makes it easier to scale than other mix net
Tor's comparatively weak threat model makes it easier to scale than
other mix net
designs. High-latency mix networks need to avoid partitioning attacks, where
network splits prevent users of the separate partitions from providing cover
for each other. In Tor, however, we assume that the adversary cannot
@ -1381,7 +1393,7 @@ scaling include restricting the number of sockets and the amount of bandwidth
used by each node. The number of sockets is determined by the network's
connectivity and the number of users, while bandwidth capacity is determined
by the total bandwidth of nodes on the network. The simplest solution to
bandwidth capacity is to add more nodes, since adding a tor node of any
bandwidth capacity is to add more nodes, since adding a Tor node of any
feasible bandwidth will increase the traffic capacity of the network. So as
a first step to scaling, we should focus on making the network tolerate more
nodes, by reducing the interconnectivity of the nodes; later we can reduce
@ -1403,7 +1415,7 @@ a sparse network.
To make matters simpler, Tor may not need an expander graph per se: it
may be enough to have a single subnet that is highly connected. As an
example, assume fifty nodes of relatively high traffic capacity. This
\emph{center} forms are a clique. Assume each center node can each
\emph{center} forms a clique. Assume each center node can
handle 200 connections to other nodes (including the other ones in the
center). Assume every noncenter node connects to three nodes in the
center and anyone out of the center that they want to. Then the
@ -1413,16 +1425,16 @@ is distributed (presumably information about the center nodes could
be given to any new nodes with their codebase), whether center nodes
will need to function as a `backbone', etc. As above the point is
that this would create problems for the expected anonymity for a mixnet,
but for an onion routing network where anonymity derives largely from
but for a low-latency network where anonymity derives largely from
the edges, it may be feasible.
Another point is that we already have a non-clique topology.
Individuals can set up and run Tor nodes without informing the
directory servers. This will allow, e.g., dissident groups to run a
local Tor network of such nodes that connects to the public Tor
network. This network is hidden behind the Tor network and its
only visible connection to Tor at those points where it connects.
As far as the public network is concerned or anyone observing it,
network. This network is hidden behind the Tor network, and its
only visible connection to Tor is at those points where it connects.
As far as the public network, or anyone observing it, is concerned,
they are running clients.
\section{The Future}
@ -1442,7 +1454,7 @@ network: as Tor grows more popular, other groups who need an overlay
network on the Internet are starting to adapt Tor to their needs.
%
Second, Tor is only one of many components that preserve privacy online.
To keep identifying information out of application traffic, we must build
To keep identifying information out of application traffic, someone must build
more and better protocol-aware proxies that are usable by ordinary people.
%
Third, we need to gain a reputation for social good, and learn how to