From 1d569eb4928acf0c6fd1870b195f11d1efe4df8c Mon Sep 17 00:00:00 2001 From: Paul Syverson Date: Tue, 8 Feb 2005 20:34:57 +0000 Subject: [PATCH] Tweaks and typos throughout. Nearly there. svn:r3586 --- doc/design-paper/challenges.tex | 134 +++++++++++++++++--------------- 1 file changed, 73 insertions(+), 61 deletions(-) diff --git a/doc/design-paper/challenges.tex b/doc/design-paper/challenges.tex index ce906fe833..9e2be66019 100644 --- a/doc/design-paper/challenges.tex +++ b/doc/design-paper/challenges.tex @@ -6,11 +6,11 @@ \usepackage{amsmath} \usepackage{epsfig} -\setlength{\textwidth}{6in} -\setlength{\textheight}{8in} -\setlength{\topmargin}{.5in} -\setlength{\oddsidemargin}{1cm} -\setlength{\evensidemargin}{1cm} +\setlength{\textwidth}{6.1in} +\setlength{\textheight}{8.5in} +\setlength{\topmargin}{1cm} +\setlength{\oddsidemargin}{.5cm} +\setlength{\evensidemargin}{.5cm} \newenvironment{tightlist}{\begin{list}{$\bullet$}{ \setlength{\itemsep}{0mm} @@ -28,7 +28,7 @@ Nick Mathewson\inst{1} \and Paul Syverson\inst{2}} \institute{The Free Haven Project \email{<\{arma,nickm\}@freehaven.net>} \and -Naval Research Lab \email{}} +Naval Research Laboratory \email{}} \maketitle \pagestyle{plain} @@ -77,14 +77,15 @@ made it possible for Tor to serve many thousands of users and attract funding from diverse sources whose goals range from security on a national scale down to the liberties of each individual. -While the Tor design paper~\cite{tor-design} gives an overall view of Tor's -design and goals, this paper describes some policy, social, and technical +While~\cite{tor-design} gives an overall view of Tor's +design and goals, this paper describes policy, social, and technical issues that we face as we continue deployment. Rather than trying to provide complete solutions to every problem here, we lay out the assumptions and constraints that we have observed while deploying Tor in the wild. 
In doing so, we aim to create a research agenda for others to help in addressing these issues. We believe that the issues -described here will be of general interest to projects attempting to build +described here will be of general interest to any and all +projects attempting to build and deploy practical, useable anonymity networks in the wild. %While the Tor design paper~\cite{tor-design} gives an overall view its @@ -132,7 +133,7 @@ Tor nodes on the network. The circuit is extended one hop at a time, and each node along the way knows only which node gave it data and which node it is giving data to. No individual Tor node ever knows the complete path that a data packet has taken. The client negotiates a separate set -of encryption keys for each hop along the circuit.% to ensure that each +of encryption keys for each hop along the circuit. % to ensure that each %hop can't trace these connections as they pass through. Because each node sees no more than one hop in the circuit, neither an eavesdropper nor a compromised node can use traffic @@ -140,7 +141,7 @@ analysis to link the connection's source and destination. For efficiency, the Tor software uses the same circuit for all the TCP connections that happen within the same short period. Later requests use a new -circuit, to prevent long-term linkability between different actions by +circuit, to complicate long-term linkability between different actions by a single user. Tor also makes it possible for users to hide their locations while @@ -152,25 +153,25 @@ identity. Tor attempts to anonymize the transport layer, not the application layer, so application protocols that include personally identifying information need additional application-level scrubbing proxies, such as -Privoxy~\cite{privoxy} for HTTP. Furthermore, Tor does not permit arbitrary +Privoxy~\cite{privoxy} for HTTP\@. 
Furthermore, Tor does not permit arbitrary IP packets; it only anonymizes TCP streams and DNS request, and only supports connections via SOCKS (see Section~\ref{subsec:tcp-vs-ip}). Most node operators do not want to allow arbitary TCP connections to leave their server. To address this, Tor provides \emph{exit policies} so that each exit node can block the IP addresses and ports it is unwilling to allow. -TRs advertise their exit policies to the directory servers, so that +Tor nodes advertise their exit policies to the directory servers, so that client can tell which nodes will support their connections. As of January 2005, the Tor network has grown to around a hundred nodes on four continents, with a total capacity exceeding 1Gbit/s. Appendix A shows a graph of the number of working nodes over time, as well as a -vgraph of the number of bytes being handled by the network over time. At +graph of the number of bytes being handled by the network over time. At this point the network is sufficiently diverse for further development and testing; but of course we always encourage and welcome new nodes to join the network. -Tor research and development has been funded by the U.S.~Navy and DARPA +Tor research and development has been funded by ONR and DARPA for use in securing government communications, and by the Electronic Frontier Foundation, for use in maintaining civil liberties for ordinary citizens online. The Tor @@ -257,8 +258,8 @@ that an outside attacker can trace a stream through the Tor network while a stream is still active simply by observing the latency of his own traffic sent through various Tor nodes. These attacks do not show the client address, only the first node within the Tor network, making -helper nodes all the more worthy of exploration (cf., -Section~\ref{subsec:helper-nodes}). +helper nodes all the more worthy of exploration. (See +Section~\ref{subsec:helper-nodes}.) 
Against internal attackers who sign up Tor nodes, the situation is more complicated. In the simplest case, if an adversary has compromised $c$ of @@ -277,8 +278,8 @@ complicating factors: (3)~Users do not in fact choose nodes with uniform probability; they favor nodes with high bandwidth or uptime, and exit nodes that permit connections to their favorite services. -See Section~\ref{subsec:routing-zones} for discussion of larger -adversaries and our dispersal goals. +(See Section~\ref{subsec:routing-zones} for discussion of how larger +adversaries affect our dispersal goals.) %\begin{tightlist} %\item If the user continues to build random circuits over time, an adversary @@ -360,10 +361,10 @@ and operations of that agency would be easier, not harder, to distinguish. Instead, to protect our networks from traffic analysis, we must collaboratively blend the traffic from many organizations and private citizens, so that an eavesdropper can't tell which users are which, -and who is looking for what information. By bringing more users onto -the network, all users become more secure~\cite{econymics}. -[XXX I feel uncomfortable saying this last sentence now. -RD] - +and who is looking for what information. %By bringing more users onto +%the network, all users become more secure~\cite{econymics}. +%[XXX I feel uncomfortable saying this last sentence now. -RD] +%[So, I took it out. I think we can do without it. -PFS] Naturally, organizations will not want to depend on others for their security. If most participating providers are reliable, Tor tolerates some hostile infiltration of the network. For maximum protection, @@ -430,13 +431,12 @@ system design and technology development. In particular, the Tor project's \emph{image} with respect to its users and the rest of the Internet impacts the security it can provide. 
% No image, no sustainability -NM - With this image issue in mind, this section discusses the Tor user base and Tor's interaction with other services on the Internet. \subsection{Communicating security} -A growing field of papers argue that usability for anonymity systems +Usability for anonymity systems contributes directly to their security, because how usable the system is impacts the possible anonymity set~\cite{econymics,back01}. Or conversely, an unusable system attracts few users and thus can't provide @@ -481,13 +481,15 @@ Like Tor, the current JAP implementation does not pad connections JAP's cascade-based network topology may be even more vulnerable to these attacks, because the network has fewer edges. JAP was born out of the ISDN mix design~\cite{isdn-mixes}, where padding made sense because -every user had a fixed bandwidth allocation, but in its current context +every user had a fixed bandwidth allocation and altering the timing +pattern of packets could be immediately detected, but in its current context as a general Internet web anonymizer, adding sufficient padding to JAP -would be prohibitively expensive.\footnote{Even if JAP could +would be prohibitively expensive and probably ineffective against a +minimally active attacker.\footnote{Even if JAP could fund higher-capacity nodes indefinitely, our experience suggests that many users would not accept the increased per-user bandwidth requirements, leading to an overall much smaller user base. But -cf.\ Section \ref{subsec:mid-latency}.} Therefore, since under this threat +cf.\ Section~\ref{subsec:mid-latency}.} Therefore, since under this threat model the number of concurrent users does not seem to have much impact on the anonymity provided, we suggest that JAP's anonymity meter is not accurately communicating security levels to its users. 
@@ -611,9 +613,9 @@ wants to provide high bandwidth, but no more than a certain amount in a giving billing cycle, to become dormant once its bandwidth is exhausted, and to reawaken at a random offset into the next billing cycle. This feature has interesting policy implications, however; see -Section~\ref{subsec:bandwidth-and-file-sharing} below. +the next section. Exit policies help to limit administrative costs by limiting the frequency of -abuse complaints. +abuse complaints. (See Section~\ref{subsec:tor-and-blacklists}.) %[XXXX say more. Why else would you run a node? What else can we do/do we % already do to make running a node more attractive?] @@ -696,6 +698,7 @@ file-sharing protocols that have separate control and data channels. %your computer is doing that behavior. \subsection{Tor and blacklists} +\label{subsec:tor-and-blacklists} It was long expected that, alongside Tor's legitimate users, it would also attract troublemakers who exploited Tor in order to abuse services on the @@ -730,7 +733,7 @@ and Wikipedia. We don't want to compete for (or divvy up) the NAT protected entities of the world. Worse, many IP blacklists are not terribly fine-grained. -No current IP blacklist, for example, allow a service provider to blacklist +No current IP blacklist, for example, allows a service provider to blacklist only those Tor nodes that allow access to a specific IP or port, even though this information is readily available. One IP blacklist even bans every class C network that contains a Tor node, and recommends banning SMTP @@ -758,7 +761,7 @@ tolerably well for them in practice. But of course, we would prefer that legitimate anonymous users be able to access abuse-prone services. One conceivable approach would be to require would-be IRC users, for instance, to register accounts if they wanted to -access the IRC network from Tor. But in practise, this would not +access the IRC network from Tor.
In practice, this would not significantly impede abuse if creating new accounts were easily automatable; this is why services use IP blocking. In order to deter abuse, pseudonymous identities need to require a significant switching cost in resources or human @@ -908,14 +911,21 @@ cable-modem nodes and more nodes in distant continents. Perhaps we can harness this increased latency to improve anonymity rather than just reduce usability. Further, if we let clients label certain circuits as mid-latency as they are constructed, we could handle both types of traffic -on the same network, giving users a choice between speed and security. +on the same network, giving users a choice between speed and security---and +giving researchers a chance to experiment with parameters to improve the +quality of those choices. \subsection{Enclaves and helper nodes} \label{subsec:helper-nodes} It has long been thought that the best anonymity comes from running your -own node~\cite{tor-design,or-pet00}. This is called using Tor in an -\emph{enclave} configuration. Of course, Tor's default path length of +own node~\cite{tor-design,or-ih96,or-pet00}. This is called using Tor in an +\emph{enclave} configuration. By running Tor clients only on Tor nodes +at the enclave perimeter, enclave configuration can also permit anonymity +protection even when policy or other requirements prevent individual machines +within the enclave from running Tor clients~\cite{or-jsac98,or-discex00}. + +Of course, Tor's default path length of three is insufficient for these enclaves, since the entry and/or exit themselves are sensitive. Tor thus increments the path length by one for each sensitive endpoint in the circuit. @@ -1034,14 +1044,14 @@ distributed trust to spread each transaction over multiple jurisdictions. But how do we decide whether two nodes are in related locations?
Feamster and Dingledine defined a \emph{location diversity} metric -in \cite{feamster:wpes2004}, and began investigating a variant of location +in~\cite{feamster:wpes2004}, and began investigating a variant of location diversity based on the fact that the Internet is divided into thousands of independently operated networks called {\em autonomous systems} (ASes). The key insight from their paper is that while we typically think of a -connection as going directly from the Tor client to her first Tor node, +connection as going directly from the Tor client to the first Tor node, actually it traverses many different ASes on each hop. An adversary at any of these ASes can monitor or influence traffic. Specifically, given -plausible initiators and recipients and path random path selection, +plausible initiators and recipients, and given random path selection, some ASes in the simulation were able to observe 10\% to 30\% of the transactions (that is, learn both the origin and the destination) on the deployed Tor network (33 nodes as of June 2004). @@ -1049,10 +1059,10 @@ the deployed Tor network (33 nodes as of June 2004). The paper concludes that for best protection against the AS-level adversary, nodes should be in ASes that have the most links to other ASes: Tier-1 ISPs such as AT\&T and Abovenet. Further, a given transaction -is safest when it starts or ends in a Tier-1 ISP. Therefore, assuming +is safest when it starts or ends in a Tier-1 ISP\@. Therefore, assuming initiator and responder are both in the U.S., it actually \emph{hurts} -our location diversity to add far-flung nodes in continents like Asia -or South America. +our location diversity to enter or exit from far-flung nodes in +continents like Asia or South America. Many open questions remain. First, it will be an immense engineering challenge to get an entire BGP routing table to each Tor client, or to @@ -1071,7 +1081,8 @@ network at all. 
What about taking advantage of caches like Akamai or Google~\cite{shsm03}? (Note that they're also well-positioned as global adversaries.) % -Third, if we follow the paper's recommendations and tailor path selection +Third, if we follow the recommendations in~\cite{feamster:wpes2004} + and tailor path selection to avoid choosing endpoints in similar locations, how much are we hurting anonymity against larger real-world adversaries who can take advantage of knowing our algorithm? @@ -1150,7 +1161,7 @@ accept many nodes (see Section~\ref{subsec:performance}). Since the speed and reliability of a circuit is limited by its worst link, we must learn to track and predict performance. Finally, in order to get a large set of nodes in the first place, we must address incentives -for users to carry traffic for others (see Section incentives). +for users to carry traffic for others. \subsection{Incentives by Design} @@ -1168,10 +1179,9 @@ seti@home. We further explain to users that they can get plausible deniability for any traffic emerging from the same address as a Tor exit node, and they can use their own Tor node as entry or exit point and be confident it's not run by the adversary. -Further, users who need to be able to communicate anonymously -may run a node simply because their need to increase -expectation that such a network continues to be available to them -and usable exceeds any countervening costs. +Further, users may run a node simply because they need such a network +to be persistently available and usable, +and the value of supporting it exceeds any countervailing costs. Finally, we can improve the usability and feature set of the software: rate limiting support and easy packaging decrease the hassle of maintaining a node, and our configurable exit policies allow each @@ -1197,8 +1207,8 @@ fairness of provided anonymity.
An adversary can attract more traffic by performing well or can provide targeted differential performance to individual users to undermine their anonymity. Typically a user who chooses evenly from all options is most resistant to an adversary -targeting him, but that approach prevents from handling heterogeneous -nodes. +targeting him, but that approach precludes the efficient use +of heterogeneous nodes. %When a node (call him Steve) performs well for Alice, does Steve gain %reputation with the entire system, or just with Alice? If the entire @@ -1236,14 +1246,15 @@ further study. The published Tor design adopted a deliberately simplistic design for authorizing new nodes and informing clients about Tor nodes and their status. -In the early Tor designs, all nodes periodically uploaded a signed description +In preliminary Tor designs, all nodes periodically uploaded a +signed description of their locations, keys, and capabilities to each of several well-known {\it directory servers}. These directory servers constructed a signed summary of all known Tor nodes (a ``directory''), and a signed statement of which nodes they believed to be operational at any given time (a ``network status''). Clients periodically downloaded a directory in order to learn the latest nodes and -keys, and more frequently downloaded a network status to learn which nodes are +keys, and more frequently downloaded a network status to learn which nodes were likely to be running. Tor nodes also operate as directory caches, in order to lighten the bandwidth on the authoritative directory servers. @@ -1258,7 +1269,7 @@ directory administrators performed little actual verification, and tended to approve any Tor node whose operator could compose a coherent email. This procedure may have prevented trivial automated Sybil attacks, but would do little -against a clever attacker. +against a clever and determined attacker. 
There are a number of flaws in this system that need to be addressed as we move forward. They include: @@ -1283,7 +1294,7 @@ network capacity in order to support more users, we could simply adopt even stricter validation requirements, and reduce the number of nodes in the network to a trusted minimum. But, we can only do that if can simultaneously make node capacity -scale much more than we anticipate feasible soon, and if we can find +scale much more than we anticipate to be feasible soon, and if we can find entities willing to run such nodes, an equally daunting prospect. @@ -1355,7 +1366,8 @@ reveal the path taken by large traffic flows under low-usage circumstances. \subsection{Non-clique topologies} -Tor's comparatively weak model makes it easier to scale than other mix net +Tor's comparatively weak threat model makes it easier to scale than +other mix net designs. High-latency mix networks need to avoid partitioning attacks, where network splits prevent users of the separate partitions from providing cover for each other. In Tor, however, we assume that the adversary cannot @@ -1381,7 +1393,7 @@ scaling include restricting the number of sockets and the amount of bandwidth used by each node. The number of sockets is determined by the network's connectivity and the number of users, while bandwidth capacity is determined by the total bandwidth of nodes on the network. The simplest solution to -bandwidth capacity is to add more nodes, since adding a tor node of any +bandwidth capacity is to add more nodes, since adding a Tor node of any feasible bandwidth will increase the traffic capacity of the network. So as a first step to scaling, we should focus on making the network tolerate more nodes, by reducing the interconnectivity of the nodes; later we can reduce @@ -1403,7 +1415,7 @@ a sparse network. To make matters simpler, Tor may not need an expander graph per se: it may be enough to have a single subnet that is highly connected. 
As an example, assume fifty nodes of relatively high traffic capacity. This -\emph{center} forms are a clique. Assume each center node can each +\emph{center} forms a clique. Assume each center node can handle 200 connections to other nodes (including the other ones in the center). Assume every noncenter node connects to three nodes in the center and anyone out of the center that they want to. Then the @@ -1413,16 +1425,16 @@ is distributed (presumably information about the center nodes could be given to any new nodes with their codebase), whether center nodes will need to function as a `backbone', etc. As above the point is that this would create problems for the expected anonymity for a mixnet, -but for an onion routing network where anonymity derives largely from +but for a low-latency network where anonymity derives largely from the edges, it may be feasible. Another point is that we already have a non-clique topology. Individuals can set up and run Tor nodes without informing the directory servers. This will allow, e.g., dissident groups to run a local Tor network of such nodes that connects to the public Tor -network. This network is hidden behind the Tor network and its -only visible connection to Tor at those points where it connects. -As far as the public network is concerned or anyone observing it, +network. This network is hidden behind the Tor network, and its +only visible connection to Tor is at those points where it connects. +As far as the public network, or anyone observing it, is concerned, they are running clients. \section{The Future} @@ -1442,7 +1454,7 @@ network: as Tor grows more popular, other groups who need an overlay network on the Internet are starting to adapt Tor to their needs. % Second, Tor is only one of many components that preserve privacy online. 
-To keep identifying information out of application traffic, we must build +To keep identifying information out of application traffic, someone must build more and better protocol-aware proxies that are usable by ordinary people. % Third, we need to gain a reputation for social good, and learn how to