and a practical design for location-hidden services via rendezvous
points. Tor works on the real-world
Internet, requires no special privileges or kernel modifications, requires
little synchronization or coordination between nodes, and provides a
reasonable tradeoff between anonymity, usability, and efficiency.
We briefly describe our experiences with an international network of
more than 30 nodes. We close with a list of open problems in anonymous communication.
<divclass="p"><!----></div>
<h2><aname="tth_sEc1">
1</a> Overview</h2>
<aname="sec:intro">
</a>
<divclass="p"><!----></div>
Onion Routing is a distributed overlay network designed to anonymize
TCP-based applications like web browsing, secure shell,
and instant messaging. Clients choose a path through the network and
build a <em>circuit</em>, in which each node (or "onion router" or "OR")
in the path knows its predecessor and successor, but no other nodes in
the circuit. Traffic flows down the circuit in fixed-size
<em>cells</em>, which are unwrapped by a symmetric key at each node
(like the layers of an onion) and relayed downstream. The
Onion Routing project published several design and analysis
papers [<ahref="#or-ih96"name="CITEor-ih96">27</a>,<ahref="#or-jsac98"name="CITEor-jsac98">41</a>,<ahref="#or-discex00"name="CITEor-discex00">48</a>,<ahref="#or-pet00"name="CITEor-pet00">49</a>]. While a wide area Onion
Routing network was deployed briefly, the only long-running
public implementation was a fragile
proof-of-concept that ran on a single machine. Even this simple deployment
processed connections from over sixty thousand distinct IP addresses from
all over the world at a rate of about fifty thousand per day.
But many critical design and deployment issues were never
resolved, and the design has not been updated in years. Here
we describe Tor, a protocol for asynchronous, loosely federated onion
routers that provides the following improvements over the old Onion
Routing design:
<divclass="p"><!----></div>
<b>Perfect forward secrecy:</b> In the original Onion Routing design,
a single hostile node could record traffic and
later compromise successive nodes in the circuit and force them
to decrypt it. Rather than using a single multiply encrypted data
structure (an <em>onion</em>) to lay each circuit,
Tor now uses an incremental or <em>telescoping</em> path-building design,
where the initiator negotiates session keys with each successive hop in
the circuit. Once these keys are deleted, subsequently compromised nodes
cannot decrypt old traffic. As a side benefit, onion replay detection
is no longer necessary, and the process of building circuits is more
reliable, since the initiator knows when a hop fails and can then try
extending to a new node.
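<div class="p"><!----></div>
The telescoping idea can be sketched as follows. This is a minimal illustration, not Tor's handshake: it uses a toy Diffie-Hellman group, omits the public-key encryption and authentication steps, and talks to each hop directly instead of through the partially built circuit. The names <tt>Hop</tt>, <tt>handle_create</tt>, <tt>derive_key</tt>, and <tt>build_circuit</tt> are hypothetical.

```python
import hashlib
import secrets

# Toy Diffie-Hellman group for illustration only: 2**127 - 1 is prime,
# but far too small to be secure, and G = 3 is an arbitrary base.
P = 2**127 - 1
G = 3


def derive_key(shared_secret: int) -> bytes:
    """Hypothetical key derivation: hash the shared DH value."""
    return hashlib.sha256(shared_secret.to_bytes(16, "big")).digest()


class Hop:
    """One onion router; it holds a session key only while the circuit lives."""

    def __init__(self, name: str):
        self.name = name
        self.session_key = None

    def handle_create(self, gx: int) -> int:
        y = secrets.randbelow(P - 3) + 2  # ephemeral exponent, dropped on return
        self.session_key = derive_key(pow(gx, y, P))
        return pow(G, y, P)

    def close_circuit(self) -> None:
        # Once the key is deleted, a later compromise of this node
        # yields nothing that decrypts recorded traffic.
        self.session_key = None


def build_circuit(path):
    """Initiator side: negotiate one fresh session key per successive hop."""
    keys = []
    for hop in path:
        x = secrets.randbelow(P - 3) + 2
        gy = hop.handle_create(pow(G, x, P))
        keys.append(derive_key(pow(gy, x, P)))
    return keys
```

Because each exponent is used for one circuit only and then discarded, the session keys cannot be recomputed from any long-term secret.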
<divclass="p"><!----></div>
<b>Separation of "protocol cleaning" from anonymity:</b>
Onion Routing originally required a separate "application
proxy" for each supported application protocol, most of which were
never written, so many applications were never supported. Tor uses the
standard and near-ubiquitous SOCKS [<ahref="#socks4"name="CITEsocks4">32</a>] proxy interface, allowing
us to support most TCP-based programs without modification. Tor now
relies on the filtering features of privacy-enhancing
application-level proxies such as Privoxy [<ahref="#privoxy"name="CITEprivoxy">39</a>], without trying
to duplicate those features itself.
<divclass="p"><!----></div>
<b>No mixing, padding, or traffic shaping (yet):</b> Onion
Routing originally called for batching and reordering cells as they arrived,
assumed padding between ORs, and in
later designs added padding between onion proxies (users) and
ORs [<ahref="#or-ih96"name="CITEor-ih96">27</a>,<ahref="#or-jsac98"name="CITEor-jsac98">41</a>]. Tradeoffs between padding protection
and cost were discussed, and <em>traffic shaping</em> algorithms were
theorized [<ahref="#or-pet00"name="CITEor-pet00">49</a>] to provide good security without expensive
padding, but no concrete padding scheme was suggested.
<b>PipeNet</b> [<ahref="#back01"name="CITEback01">5</a>,<ahref="#pipenet"name="CITEpipenet">12</a>], another low-latency design proposed
around the same time as Onion Routing, gave
stronger anonymity but allowed a single user to shut
down the network by not sending. Systems like <b>ISDN
mixes</b> [<ahref="#isdn-mixes"name="CITEisdn-mixes">38</a>] were designed for other environments with
different assumptions.
<divclass="p"><!----></div>
In P2P designs like <b>Tarzan</b> [<ahref="#tarzan:ccs02"name="CITEtarzan:ccs02">24</a>] and
<b>MorphMix</b> [<ahref="#morphmix:fc04"name="CITEmorphmix:fc04">43</a>], all participants both generate
traffic and relay traffic for others. These systems aim to conceal
whether a given peer originated a request
or just relayed it from another peer. While Tarzan and MorphMix use
layered encryption as above, <b>Crowds</b> [<ahref="#crowds-tissec"name="CITEcrowds-tissec">42</a>] simply assumes
an adversary who cannot observe the initiator: it uses no public-key
encryption, so any node on a circuit can read users' traffic.
<divclass="p"><!----></div>
<b>Hordes</b> [<ahref="#hordes-jcs"name="CITEhordes-jcs">34</a>] is based on Crowds but also uses multicast
responses to hide the initiator. <b>Herbivore</b> [<ahref="#herbivore"name="CITEherbivore">25</a>] and
<b>P</b><sup><b>5</b></sup> [<ahref="#p5"name="CITEp5">46</a>] go even further, requiring broadcast.
These systems are designed primarily for communication among peers,
although Herbivore users can make external connections by
requesting a peer to serve as a proxy.
<divclass="p"><!----></div>
Systems like <b>Freedom</b> and the original Onion Routing build circuits
all at once, using a layered "onion" of public-key encrypted messages,
each layer of which provides session keys and the address of the
next server in the circuit. Tor as described herein, Tarzan, MorphMix,
<b>Cebolla</b> [<ahref="#cebolla"name="CITEcebolla">9</a>], and Rennhard's <b>Anonymity Network</b> [<ahref="#anonnet"name="CITEanonnet">44</a>]
build circuits
in stages, extending them one hop at a time.
Section <ahref="#subsubsec:constructing-a-circuit">4.2</a> describes how this
approach enables perfect forward secrecy.
<divclass="p"><!----></div>
Circuit-based designs must choose which protocol layer
to anonymize. They may intercept IP packets directly, and
relay them whole (stripping the source address) along the
circuit [<ahref="#freedom2-arch"name="CITEfreedom2-arch">8</a>,<ahref="#tarzan:ccs02"name="CITEtarzan:ccs02">24</a>]. Like
Tor, they may accept TCP streams and relay the data in those streams,
ignoring the breakdown of that data into TCP
segments [<ahref="#morphmix:fc04"name="CITEmorphmix:fc04">43</a>,<ahref="#anonnet"name="CITEanonnet">44</a>]. Finally, like Crowds, they may accept
application-level protocols such as HTTP and relay the application
requests themselves.
Making this protocol-layer decision requires a compromise between flexibility
and anonymity. For example, a system that understands HTTP
can strip
identifying information from requests, can take advantage of caching
to limit the number of requests that leave the network, and can batch
or encode requests to minimize the number of connections.
On the other hand, an IP-level anonymizer can handle nearly any protocol,
even ones unforeseen by its designers (though these systems require
kernel-level modifications to some operating systems, and so are more
complex and less portable). TCP-level anonymity networks like Tor present
a middle approach: they are application neutral (so long as the
application supports, or can be tunneled across, TCP), but by treating
application connections as data streams rather than raw TCP packets,
they avoid the inefficiencies of tunneling TCP over
TCP.
<divclass="p"><!----></div>
Distributed-trust anonymizing systems need to prevent attackers from
adding too many servers and thus compromising user paths.
Tor relies on a small set of well-known directory servers, run by
independent parties, to decide which nodes can
join. Tarzan and MorphMix allow unknown users to run servers, and use
a limited resource (like IP addresses) to prevent an attacker from
controlling too much of the network. Crowds suggests requiring
written, notarized requests from potential crowd members.
<divclass="p"><!----></div>
Anonymous communication is essential for censorship-resistant
systems like Eternity [<ahref="#eternity"name="CITEeternity">2</a>], Free Haven [<ahref="#freehaven-berk"name="CITEfreehaven-berk">19</a>],
Publius [<ahref="#publius"name="CITEpublius">53</a>], and Tangler [<ahref="#tangler"name="CITEtangler">52</a>]. Tor's rendezvous
points enable connections between mutually anonymous entities; they
are a building block for location-hidden servers, which are needed by
Eternity and Free Haven.
<divclass="p"><!----></div>
<h2><aname="tth_sEc3">
3</a> Design goals and assumptions</h2>
<aname="sec:assumptions">
</a>
<divclass="p"><!----></div>
<fontsize="+1"><b>Goals</b></font><br/>
Like other low-latency anonymity designs, Tor seeks to hinder
attackers from linking communication partners, or from linking
multiple communications to or from a single user. Within this
main goal, however, several considerations have directed
Tor's evolution.
<divclass="p"><!----></div>
<b>Deployability:</b> The design must be deployed and used in the
real world. Thus it
must not be expensive to run (for example, by requiring more bandwidth
than volunteers are willing to provide); must not place a heavy
liability burden on operators (for example, by allowing attackers to
implicate onion routers in illegal activities); and must not be
difficult or expensive to implement (for example, by requiring kernel
patches, or separate proxies for every protocol). We also cannot
require non-anonymous parties (such as websites)
to run our software. (Our rendezvous point design does not meet
this goal for non-anonymous users talking to hidden servers,
however; see Section <ahref="#sec:rendezvous">5</a>.)
<divclass="p"><!----></div>
<b>Usability:</b> A hard-to-use system has fewer users, and because
anonymity systems hide users among users, a system with fewer users
provides less anonymity. Usability is thus not only a convenience:
it is a security requirement [<ahref="#econymics"name="CITEeconymics">1</a>,<ahref="#back01"name="CITEback01">5</a>]. Tor should
therefore not
require modifying familiar applications; should not introduce prohibitive
delays;
and should require as few configuration decisions
as possible. Finally, Tor should be easily implementable on all common
platforms; we cannot require users to change their operating system
to be anonymous. (Tor currently runs on Win32, Linux,
Solaris, BSD-style Unix, MacOS X, and probably others.)
<divclass="p"><!----></div>
<b>Flexibility:</b> The protocol must be flexible and well-specified,
so Tor can serve as a test-bed for future research.
Many of the open problems in low-latency anonymity
networks, such as generating dummy traffic or preventing Sybil
attacks [<ahref="#sybil"name="CITEsybil">22</a>], may be solvable independently from the issues
solved by
Tor. Hopefully future systems will not need to reinvent Tor's design.
<divclass="p"><!----></div>
<b>Simple design:</b> The protocol's design and security
parameters must be well-understood. Additional features impose implementation
and complexity costs; adding unproven techniques to the design threatens
deployability, readability, and ease of security analysis. Tor aims to
deploy a simple and stable system that integrates the best accepted
In the second step, Bob proves that it was he who received g<sup>x</sup>,
and who chose y. We use PK encryption in the first step
(rather than, say, using the first two steps of STS, which has a
signature in the second step) because a single cell is too small to
hold both a public key and a signature. Preliminary analysis with the
NRL protocol analyzer [<ahref="#meadows96"name="CITEmeadows96">35</a>] shows this protocol to be
secure (including perfect forward secrecy) under the
traditional Dolev-Yao model.<br/>
<divclass="p"><!----></div>
<fontsize="+1"><b>Relay cells</b></font><br/>
Once Alice has established the circuit (so she shares keys with each
OR on the circuit), she can send relay cells.
Upon receiving a relay
cell, an OR looks up the corresponding circuit, and decrypts the relay
header and payload with the session key for that circuit.
If the cell is headed away from Alice, the OR then checks whether the
decrypted cell has a valid digest (as an optimization, the first
two bytes of the integrity check are zero, so in most cases we can avoid
computing the hash).
If valid, it accepts the relay cell and processes it as described
below. Otherwise,
the OR looks up the circID and OR for the
next step in the circuit, replaces the circID as appropriate, and
sends the decrypted relay cell to the next OR. (If the OR at the end
of the circuit receives an unrecognized relay cell, an error has
occurred, and the circuit is torn down.)
<divclass="p"><!----></div>
OPs treat incoming relay cells similarly: they iteratively unwrap the
relay header and payload with the session keys shared with each
OR on the circuit, from the closest to farthest.
If at any stage the digest is valid, the cell must have
originated at the OR whose encryption has just been removed.
<divclass="p"><!----></div>
To construct a relay cell addressed to a given OR, Alice assigns the
digest, and then iteratively
encrypts the cell payload (that is, the relay header and payload) with
the symmetric key of each hop up to that OR. Because the digest is
encrypted to a different value at each step, only at the targeted OR
will it have a meaningful value.<ahref="#tthFtNtAAC"name="tthFrefAAC"><sup>2</sup></a>
This <em>leaky pipe</em> circuit topology
allows Alice's streams to exit at different ORs on a single circuit.
Alice may choose different exit points because of their exit policies,
or to keep the ORs from knowing that two streams
originate from the same person.
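<div class="p"><!----></div>
The leaky-pipe behavior can be illustrated with a small simulation. This is a sketch under loud assumptions: the keystream is a toy built from SHA-256 (a stand-in for the real cipher, and it restarts for every cell), the same key is reused for encryption and for the digest, and the cell layout (two zero bytes, a four-byte digest, then the body) is invented for the example. All function names are hypothetical.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key||counter (stand-in for a stream cipher)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one encryption layer (XOR is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def tag_for(key: bytes, body: bytes) -> bytes:
    """Assumed digest: first four bytes of SHA-1 over key||body."""
    return hashlib.sha1(key + body).digest()[:4]


def make_cell(hop_keys, target: int, body: bytes) -> bytes:
    """Alice: address a cell to hop `target` (0-based) on her circuit."""
    cell = b"\x00\x00" + tag_for(hop_keys[target], body) + body
    # Encrypt with the key of each hop up to the target, innermost first.
    for key in reversed(hop_keys[: target + 1]):
        cell = xor_layer(key, cell)
    return cell


def process_at_or(key: bytes, cell: bytes):
    """One OR: strip its layer and test whether the digest is meaningful."""
    cell = xor_layer(key, cell)
    zeros, tag, body = cell[:2], cell[2:6], cell[6:]
    recognized = zeros == b"\x00\x00" and tag == tag_for(key, body)
    return recognized, cell


def route(hop_keys, cell: bytes):
    """Pass the cell down the circuit until some hop recognizes it."""
    for index, key in enumerate(hop_keys):
        recognized, cell = process_at_or(key, cell)
        if recognized:
            return index, cell[6:]
    return None, b""  # unrecognized at the last hop: the circuit is torn down
```

For example, with three hop keys, <tt>route(keys, make_cell(keys, 1, body))</tt> returns the pair <tt>(1, body)</tt>: the first hop sees only ciphertext and forwards the cell, and the second hop finds a valid digest and accepts it.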
<divclass="p"><!----></div>
When an OR later replies to Alice with a relay cell, it
encrypts the cell's relay header and payload with the single key it
shares with Alice, and sends the cell back toward Alice along the
circuit. Subsequent ORs add further layers of encryption as they
relay the cell back to Alice.
<divclass="p"><!----></div>
To tear down a circuit, Alice sends a <em>destroy</em> control
cell. Each OR in the circuit receives the <em>destroy</em> cell, closes
all streams on that circuit, and passes a new <em>destroy</em> cell
forward. But just as circuits are built incrementally, they can also
be torn down incrementally: Alice can send a <em>relay
truncate</em> cell to a single OR on a circuit. That OR then sends a
<em>destroy</em> cell forward, and acknowledges with a
<em>relay truncated</em> cell. Alice can then extend the circuit to
different nodes, without signaling to the intermediate nodes (or
a limited observer) that she has changed her circuit.
Similarly, if a node on the circuit goes down, the adjacent
node can send a <em>relay truncated</em> cell back to Alice. Thus the
"break a node and see which circuits go down"
attack [<ahref="#freedom21-security"name="CITEfreedom21-security">4</a>] is weakened.
<divclass="p"><!----></div>
<h3><aname="tth_sEc4.3">
4.3</a> Opening and closing streams</h3>
<aname="subsec:tcp">
</a>
<divclass="p"><!----></div>
When Alice's application wants a TCP connection to a given
address and port, it asks the OP (via SOCKS) to make the
connection. The OP chooses the newest open circuit (or creates one if
needed), and chooses a suitable OR on that circuit to be the
exit node (usually the last node, but possibly another because of exit policy
conflicts; see Section <ahref="#subsec:exitpolicies">6.2</a>). The OP then opens
the stream by sending a <em>relay begin</em> cell to the exit node,
using a new random streamID. Once the
exit node connects to the remote host, it responds
with a <em>relay connected</em> cell. Upon receipt, the OP sends a
SOCKS reply to notify the application of its success. The OP
now accepts data from the application's TCP stream, packaging it into
<em>relay data</em> cells and sending those cells along the circuit to
the chosen OR.
<divclass="p"><!----></div>
There's a catch to using SOCKS, however: some applications pass the
alphanumeric hostname to the Tor client, while others resolve it into
an IP address first and then pass the IP address to the Tor client. If
the application does DNS resolution first, Alice thereby reveals her
destination to the remote DNS server, rather than sending the hostname
through the Tor network to be resolved at the far end. Common applications
like Mozilla and SSH have this flaw.
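<div class="p"><!----></div>
The difference is visible in the bytes of the SOCKS request itself. The sketch below builds a plain SOCKS4 CONNECT request, which can only carry an IP address, and a request using the SOCKS4a extension, which carries the hostname so that the proxy performs the lookup. SOCKS4a is brought in here for illustration and is not the interface cited above; the function names are hypothetical.

```python
import socket
import struct


def socks4_connect_request(ip: str, port: int, user_id: bytes = b"") -> bytes:
    """SOCKS4 CONNECT: the application has already resolved the name itself."""
    header = struct.pack(">BBH", 4, 1, port)  # version 4, command 1 (CONNECT)
    return header + socket.inet_aton(ip) + user_id + b"\x00"


def socks4a_connect_request(hostname: str, port: int, user_id: bytes = b"") -> bytes:
    """SOCKS4a CONNECT: the hostname travels to the proxy unresolved."""
    header = struct.pack(">BBH", 4, 1, port)
    # An address of the form 0.0.0.x with x nonzero tells the proxy
    # that a null-terminated hostname follows the user id.
    placeholder_ip = bytes([0, 0, 0, 1])
    return (header + placeholder_ip + user_id + b"\x00"
            + hostname.encode("ascii") + b"\x00")
```

An application that emits the first form has necessarily consulted a DNS server before contacting the proxy; one that emits the second form has not.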
<divclass="p"><!----></div>
With Mozilla, the flaw is easy to address: the filtering HTTP
proxy called Privoxy gives a hostname to the Tor client, so Alice's
computer never does DNS resolution.
But a portable general solution, such as is needed for
SSH, is
an open problem. Modifying or replacing the local nameserver
can be invasive, brittle, and unportable. Forcing the resolver
library to prefer TCP rather than UDP is hard, and also has
portability problems. Dynamically intercepting system calls to the
resolver library seems a promising direction. We could also provide
a tool similar to <em>dig</em> to perform a private lookup through the
Tor network. Currently, we encourage the use of privacy-aware proxies
like Privoxy wherever possible.
<divclass="p"><!----></div>
Closing a Tor stream is analogous to closing a TCP stream: it uses a
two-step handshake for normal operation, or a one-step handshake for
errors. If the stream closes abnormally, the adjacent node simply sends a
<em>relay teardown</em> cell. If the stream closes normally, the node sends
a <em>relay end</em> cell down the circuit, and the other side responds with
its own <em>relay end</em> cell. Because
all relay cells use layered encryption, only the destination OR knows
that a given relay cell is a request to close a stream. This two-step
handshake allows Tor to support TCP-based applications that use half-closed
connections.
<divclass="p"><!----></div>
<h3><aname="tth_sEc4.4">
4.4</a> Integrity checking on streams</h3>
<aname="subsec:integrity-checking">
</a>
<divclass="p"><!----></div>
Because the old Onion Routing design used a stream cipher without integrity
checking, traffic was
vulnerable to a malleability attack: though the attacker could not
decrypt cells, any changes to encrypted data
would create corresponding changes to the data leaving the network.
This weakness allowed an adversary who could guess the encrypted content
to change a padding cell to a destroy
cell; change the destination address in a <em>relay begin</em> cell to the
adversary's webserver; or change an FTP command from
<tt>dir</tt> to <tt>rm *</tt>. (Even an external
adversary could do this, because the link encryption similarly used a
stream cipher.)
<divclass="p"><!----></div>
Because Tor uses TLS on its links, external adversaries cannot modify
data. Addressing the insider malleability attack, however, is
more complex.
<divclass="p"><!----></div>
We could do integrity checking of the relay cells at each hop, either
by including hashes or by using an authenticating cipher mode like
EAX [<ahref="#eax"name="CITEeax">6</a>], but there are some problems. First, these approaches
impose a message-expansion overhead at each hop, and so we would have to
either leak the path length or waste bytes by padding to a maximum
path length. Second, these solutions can only verify traffic coming
from Alice: ORs would not be able to produce suitable hashes for
the intermediate hops, since the ORs on a circuit do not know the
other ORs' session keys. Third, we have already accepted that our design
is vulnerable to end-to-end timing attacks; so tagging attacks performed
within the circuit provide no additional information to the attacker.
<divclass="p"><!----></div>
Thus, we check integrity only at the edges of each stream. (Remember that
in our leaky-pipe circuit topology, a stream's edge could be any hop
in the circuit.) When Alice
negotiates a key with a new hop, they each initialize a SHA-1
digest with a derivative of that key,
thus beginning with randomness that only the two of them know.
Then they each incrementally add to the SHA-1 digest the contents of
all relay cells they create, and include with each relay cell the
first four bytes of the current digest. Each also keeps a SHA-1
digest of data received, to verify that the received hashes are correct.
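<div class="p"><!----></div>
A running digest of this kind might look as follows. This is a sketch, assuming a hypothetical seed derivation and treating the whole cell contents as the hashed input; the class name <tt>RunningDigest</tt> and its methods are illustrative only.

```python
import hashlib
import hmac


class RunningDigest:
    """SHA-1 state seeded from a negotiated key and fed every relay cell."""

    def __init__(self, session_key: bytes):
        # Hypothetical derivation of the seed from the session key.
        seed = hashlib.sha1(b"digest-seed" + session_key).digest()
        self._sha1 = hashlib.sha1(seed)

    def next_tag(self, cell_contents: bytes) -> bytes:
        """Absorb one cell and return the first four bytes of the digest so far."""
        self._sha1.update(cell_contents)
        return self._sha1.digest()[:4]

    def verify(self, cell_contents: bytes, tag: bytes) -> bool:
        """Receiver side: absorb the cell and compare against the received tag."""
        return hmac.compare_digest(self.next_tag(cell_contents), tag)
```

Each side would keep one such object for the cells it creates and another for the cells it receives. Because the state depends on every earlier cell, a removed or altered cell makes all later tags fail to verify.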
<divclass="p"><!----></div>
To be sure of removing or modifying a cell, the attacker must be able
to deduce the current digest state (which depends on all
traffic between Alice and Bob, starting with their negotiated key).
Attacks on SHA-1 where the adversary can incrementally add to a hash
to produce a new valid hash don't work, because all hashes are
end-to-end encrypted across the circuit. The computational overhead
of computing the digests is minimal compared to doing the AES
encryption performed at each hop of the circuit. We use only four
bytes per cell to minimize overhead; the chance that an adversary will
correctly guess a valid hash is acceptably low, given that the OP or OR
tears down the circuit on receiving a bad hash.
<divclass="p"><!----></div>
<h3><aname="tth_sEc4.5">
4.5</a> Rate limiting and fairness</h3>
<aname="subsec:rate-limit">
</a>
<divclass="p"><!----></div>
Volunteers are more willing to run services that can limit
their bandwidth usage. To accommodate them, Tor servers use a
token bucket approach [<ahref="#tannenbaum96"name="CITEtannenbaum96">50</a>] to
enforce a long-term average rate of incoming bytes, while still
permitting short-term bursts above the allowed bandwidth.
<divclass="p"><!----></div>
Because the Tor protocol outputs about the same number of bytes as it
takes in, it is sufficient in practice to limit only incoming bytes.
With TCP streams, however, the correspondence is not one-to-one:
relaying a single incoming byte can require an entire 512-byte cell.
(We can't just wait for more bytes, because the local application may
be awaiting a reply.) Therefore, we treat this case as if the entire
cell size had been read, regardless of the cell's fullness.
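<div class="p"><!----></div>
A token bucket with whole-cell accounting can be sketched as below. The class and function names are hypothetical, time is passed in explicitly to keep the example deterministic, and rounding up in units of 512 bytes is a simplification that ignores cell headers.

```python
CELL_SIZE = 512  # bytes per cell, as stated in the text


class TokenBucket:
    """Long-term average rate with room for short bursts."""

    def __init__(self, rate_bytes_per_sec: float, burst_bytes: float):
        self.rate = rate_bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start full, so an initial burst is allowed

    def refill(self, elapsed_seconds: float) -> None:
        """Add tokens for elapsed time, never exceeding the burst capacity."""
        self.tokens = min(self.capacity, self.tokens + self.rate * elapsed_seconds)

    def try_consume(self, nbytes: int) -> bool:
        """Spend tokens if enough are available; otherwise leave them untouched."""
        if nbytes > self.tokens:
            return False
        self.tokens -= nbytes
        return True


def charge_for_stream_read(bucket: TokenBucket, nbytes_read: int) -> bool:
    """Charge whole cells, since even one byte is relayed in a full cell."""
    cells = -(-nbytes_read // CELL_SIZE)  # ceiling division
    return bucket.try_consume(cells * CELL_SIZE)
```

With a burst size of 2048 bytes, reading a single byte from a stream costs 512 tokens, leaving 1536.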
<divclass="p"><!----></div>
Further, inspired by Rennhard et al.'s design in [<ahref="#anonnet"name="CITEanonnet">44</a>], a
circuit's edges can heuristically distinguish interactive streams from bulk
streams by comparing the frequency with which they supply cells. We can
provide good latency for interactive streams by giving them preferential
service, while still giving good overall throughput to the bulk
streams. Such preferential treatment presents a possible end-to-end
attack, but an adversary observing both
ends of the stream can already learn this information through timing
attacks.
<divclass="p"><!----></div>
<h3><aname="tth_sEc4.6">
4.6</a> Congestion control</h3>
<aname="subsec:congestion">
</a>
<divclass="p"><!----></div>
Even with bandwidth rate limiting, we still need to worry about
congestion, either accidental or intentional. If enough users choose the
same OR-to-OR connection for their circuits, that connection can become
saturated. For example, an attacker could send a large file
through the Tor network to a webserver he runs, and then
refuse to read any of the bytes at the webserver end of the
circuit. Without some congestion control mechanism, these bottlenecks
can propagate back through the entire network. We don't need to
reimplement full TCP windows (with sequence numbers,
the ability to drop cells when we're full and retransmit later, and so
on),
because TCP already guarantees in-order delivery of each
cell.
We describe our response below.
<divclass="p"><!----></div>
<b>Circuit-level throttling:</b>
To control a circuit's bandwidth usage, each OR keeps track of two
windows. The <em>packaging window</em> tracks how many relay data cells the OR is
allowed to package (from incoming TCP streams) for transmission back to the OP,
and the <em>delivery window</em> tracks how many relay data cells it is willing
to deliver to TCP streams outside the network. Each window is initialized
(say, to 1000 data cells). When a data cell is packaged or delivered,
the appropriate window is decremented. When an OR has received enough
data cells (currently 100), it sends a <em>relay sendme</em> cell towards the OP,
with streamID zero. When an OR receives a <em>relay sendme</em> cell with
streamID zero, it increments its packaging window. Either of these cells
increments the corresponding window by 100. If the packaging window
reaches 0, the OR stops reading from TCP connections for all streams
on the corresponding circuit, and sends no more relay data cells until
receiving a <em>relay sendme</em> cell.
<divclass="p"><!----></div>
The OP behaves identically, except that it must track a packaging window
and a delivery window for every OR in the circuit. If a packaging window
reaches 0, it stops reading from streams destined for that OR.
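<div class="p"><!----></div>
The window bookkeeping at one end of a circuit can be sketched as follows, using the values quoted above (1000 and 100). The class <tt>CircuitWindows</tt> and its method names are hypothetical, and the sketch tracks a single pair of windows where an OP would keep one pair per OR.

```python
class CircuitWindows:
    """Packaging and delivery windows for one end of a circuit."""

    WINDOW_START = 1000
    SENDME_INCREMENT = 100

    def __init__(self):
        self.packaging = self.WINDOW_START
        self.delivery = self.WINDOW_START
        self._received_since_sendme = 0

    def can_package(self) -> bool:
        """False means: stop reading from TCP streams on this circuit."""
        return self.packaging > 0

    def package_cell(self) -> None:
        """Account for one relay data cell packaged for transmission."""
        if not self.can_package():
            raise RuntimeError("packaging window is empty")
        self.packaging -= 1

    def deliver_cell(self) -> bool:
        """Account for one delivered cell; True means a relay sendme is due."""
        self.delivery -= 1
        self._received_since_sendme += 1
        if self._received_since_sendme == self.SENDME_INCREMENT:
            self._received_since_sendme = 0
            self.delivery += self.SENDME_INCREMENT
            return True
        return False

    def on_sendme(self) -> None:
        """A relay sendme with streamID zero arrived from the other end."""
        self.packaging += self.SENDME_INCREMENT
```

A sender that packages 1000 cells without hearing back stalls; each sendme it then receives lets it package 100 more.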
<divclass="p"><!----></div>
<b>Stream-level throttling</b>:
The stream-level congestion control mechanism is similar to the
circuit-level mechanism. ORs and OPs use <em>relay sendme</em> cells
to implement end-to-end flow control for individual streams across
circuits. Each stream begins with a packaging window (currently 500 cells),
and increments the window by a fixed value (50) upon receiving a <em>relay
sendme</em> cell. Rather than always returning a <em>relay sendme</em> cell as soon
as enough cells have arrived, the stream-level congestion control also
has to check whether data has been successfully flushed onto the TCP
stream; it sends the <em>relay sendme</em> cell only when the number of bytes pending
to be flushed is under some threshold (currently 10 cells' worth).
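<div class="p"><!----></div>
The extra flush check on the receiving side might be written as below. This is a sketch: the names are hypothetical, and measuring "10 cells' worth" as 10 times 512 bytes is an assumption of the example.

```python
CELL_SIZE = 512  # assumed unit for "cells' worth" of pending bytes


class StreamReceiver:
    """Decides when a stream-level relay sendme may be returned."""

    SENDME_EVERY = 50                       # cells acknowledged per sendme
    FLUSH_THRESHOLD_BYTES = 10 * CELL_SIZE  # pending-bytes limit

    def __init__(self):
        self.cells_since_sendme = 0
        self.pending_flush_bytes = 0

    def on_data_cell(self, payload_len: int) -> None:
        """A relay data cell arrived; its payload waits to be written out."""
        self.cells_since_sendme += 1
        self.pending_flush_bytes += payload_len

    def on_flushed(self, nbytes: int) -> None:
        """The TCP stream accepted nbytes of the pending data."""
        self.pending_flush_bytes = max(0, self.pending_flush_bytes - nbytes)

    def take_sendme(self) -> bool:
        """True if a sendme should go out now; consumes the acknowledged cells."""
        enough_cells = self.cells_since_sendme >= self.SENDME_EVERY
        drained = self.pending_flush_bytes < self.FLUSH_THRESHOLD_BYTES
        if enough_cells and drained:
            self.cells_since_sendme -= self.SENDME_EVERY
            return True
        return False
```

If the destination stops reading, pending bytes stay above the threshold, no sendme is returned, and the sender's packaging window eventually runs out.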
<divclass="p"><!----></div>
These arbitrarily chosen parameters seem to give tolerable throughput
and delay; see Section <ahref="#sec:in-the-wild">8</a>.
<divclass="p"><!----></div>
<h2><aname="tth_sEc5">
5</a> Rendezvous Points and hidden services</h2>
<aname="sec:rendezvous">
</a>
<divclass="p"><!----></div>
Rendezvous points are a building block for <em>location-hidden
services</em> (also known as <em>responder anonymity</em>) in the Tor
network. Location-hidden services allow Bob to offer a TCP
service, such as a webserver, without revealing his IP address.
This type of anonymity protects against distributed DoS attacks:
attackers are forced to attack the onion routing network
because they do not know Bob's IP address.
<divclass="p"><!----></div>
Our design for location-hidden servers has the following goals.
<b>Access-control:</b> Bob needs a way to filter incoming requests,
so an attacker cannot flood Bob simply by making many connections to him.
<b>Robustness:</b> Bob should be able to maintain a long-term pseudonymous
identity even in the presence of router failure. Bob's service must
not be tied to a single OR, and Bob must be able to migrate his service
across ORs. <b>Smear-resistance:</b>
A social attacker
should not be able to "frame" a rendezvous router by
offering an illegal or disreputable location-hidden service and
making observers believe the router created that service.
<b>Application-transparency:</b> Although we require users
to run special software to access location-hidden servers, we must not
require them to modify their applications.
<divclass="p"><!----></div>
We provide location-hiding for Bob by allowing him to advertise
several onion routers (his <em>introduction points</em>) as contact
points. He may do this on any robust, efficient
key-value lookup system with authenticated updates, such as a
distributed hash table (DHT) like CFS [<ahref="#cfs:sosp01"name="CITEcfs:sosp01">11</a>].<ahref="#tthFtNtAAD"name="tthFrefAAD"><sup>3</sup></a> Alice, the client, chooses an OR as her
<em>rendezvous point</em>. She connects to one of Bob's introduction
points, informs him of her rendezvous point, and then waits for him
to connect to the rendezvous point. This extra level of indirection
helps Bob's introduction points avoid problems associated with serving
unpopular files directly (for example, if Bob serves
material that the introduction point's community finds objectionable,
or if Bob's service tends to get attacked by network vandals).
The extra level of indirection also allows Bob to respond to some requests
and ignore others.
<divclass="p"><!----></div>
<h3><aname="tth_sEc5.1">
5.1</a> Rendezvous points in Tor</h3>
<divclass="p"><!----></div>
The following steps are
performed on behalf of Alice and Bob by their local OPs;
application integration is described more fully below.
<divclass="p"><!----></div>
<dlcompact="compact">
<dt><b></b></dt>
<dd><li>Bob generates a long-term public key pair to identify his service.</dd>
<dt><b></b></dt>
<dd><li>Bob chooses some introduction points, and advertises them on
the lookup service, signing the advertisement with the private half of his long-term key pair. He
can add more later.</dd>
<dt><b></b></dt>
<dd><li>Bob builds a circuit to each of his introduction points, and tells
them to wait for requests.</dd>
<dt><b></b></dt>
<dd><li>Alice learns about Bob's service out of band (perhaps Bob told her,
or she found it on a website). She retrieves the details of Bob's
service from the lookup service. If Alice wants to access Bob's
service anonymously, she must connect to the lookup service via Tor.</dd>
<dt><b></b></dt>
<dd><li>Alice chooses an OR as the rendezvous point (RP) for her connection to
Bob's service. She builds a circuit to the RP, and gives it a
randomly chosen "rendezvous cookie" to recognize Bob.</dd>
<dt><b></b></dt>
<dd><li>Alice opens an anonymous stream to one of Bob's introduction
points, and gives it a message (encrypted with Bob's public key)
telling it about herself,
her RP and rendezvous cookie, and the
start of a DH
handshake. The introduction point sends the message to Bob.</dd>
<dt><b></b></dt>
<dd><li>If Bob wants to talk to Alice, he builds a circuit to Alice's
RP and sends the rendezvous cookie, the second half of the DH
handshake, and a hash of the session
key they now share. By the same argument as in
Section <ahref="#subsubsec:constructing-a-circuit">4.2</a>, Alice knows she
shares the key only with Bob.</dd>
<dt><b></b></dt>
<dd><li>The RP connects Alice's circuit to Bob's. Note that the RP can't
recognize Alice, Bob, or the data they transmit.</dd>
<dt><b></b></dt>
<dd><li>Alice sends a <em>relay begin</em> cell along the circuit. It
arrives at Bob's OP, which connects to Bob's
webserver.</dd>
<dt><b></b></dt>
<dd><li>An anonymous stream has been established, and Alice and Bob
communicate as normal.
</dd>
</dl>
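<div class="p"><!----></div>
The rendezvous point's part in these steps is small: it remembers a cookie for Alice's circuit and joins the first circuit that later presents the same cookie. A minimal sketch follows, with hypothetical names, string circuit identifiers, and an assumed 20-byte cookie length.

```python
import secrets


def new_rendezvous_cookie() -> bytes:
    """Alice picks a random cookie; the 20-byte length is an assumption."""
    return secrets.token_bytes(20)


class RendezvousPoint:
    """Matches circuits by cookie without learning who is behind them."""

    def __init__(self):
        self._waiting = {}  # cookie -> Alice's circuit id
        self.joined = {}    # circuit id -> circuit id it is spliced to

    def establish(self, alice_circuit: str, cookie: bytes) -> None:
        """Alice's circuit registers a cookie and waits."""
        self._waiting[cookie] = alice_circuit

    def rendezvous(self, bob_circuit: str, cookie: bytes) -> bool:
        """Bob's circuit presents a cookie; splice the circuits if it matches."""
        alice_circuit = self._waiting.pop(cookie, None)
        if alice_circuit is None:
            return False
        self.joined[alice_circuit] = bob_circuit
        self.joined[bob_circuit] = alice_circuit
        return True
```

Since the cookie is removed on first use, a second circuit presenting the same cookie is refused. The RP sees only circuit identifiers and the cookie, never Bob's key or the end-to-end session key.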
<divclass="p"><!----></div>
When establishing an introduction point, Bob provides the onion router
with the public key identifying his service. Bob signs his
messages, so others cannot usurp his introduction point
in the future. He uses the same public key to establish the other
introduction points for his service, and periodically refreshes his
entry in the lookup service.
<divclass="p"><!----></div>
The message that Alice gives
the introduction point includes a hash of Bob's public key and an optional initial authorization token (the
introduction point can do prescreening, for example to block replays). Her
message to Bob may include an end-to-end authorization token so Bob
can choose whether to respond.
The authorization tokens can be used to provide selective access:
important users can get uninterrupted access.
During normal situations, Bob's service might simply be offered
directly from mirrors, while Bob gives out tokens to high-priority users. If
the mirrors are knocked down,
those users can switch to accessing Bob's service via
the Tor rendezvous system.
<divclass="p"><!----></div>
Bob's introduction points are themselves subject to DoS: he must
open many introduction points or risk such an attack.
He can provide selected users with a current list or future schedule of
unadvertised introduction points;
this is most practical
if there is a stable and large group of introduction points
available. Bob could also give secret public keys
for consulting the lookup service. All of these approaches
limit exposure even when
some selected users collude in the DoS.
<divclass="p"><!----></div>
<h3><aname="tth_sEc5.2">
5.2</a> Integration with user applications</h3>
<divclass="p"><!----></div>
Bob configures his onion proxy to know the local IP address and port of his
service, a strategy for authorizing clients, and his public key. The onion
proxy anonymously publishes a signed statement of Bob's
public key, an expiration time, and
the current introduction points for his service onto the lookup service,
indexed
by the hash of his public key. Bob's webserver is unmodified,
and doesn't even know that it's hidden behind the Tor network.
<divclass="p"><!----></div>
Alice's applications also work unchanged: her client interface
remains a SOCKS proxy. We encode all of the necessary information
into the fully qualified domain name (FQDN) Alice uses when establishing her
connection. Location-hidden services use a virtual top level domain
called <tt>.onion</tt>: thus hostnames take the form <tt>x.y.onion</tt> where
<tt>x</tt> is the authorization cookie and <tt>y</tt> encodes the hash of
the public key. Alice's onion proxy
examines addresses; if they're destined for a hidden server, it decodes
the key and starts the rendezvous as described above.
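<divclass="p"><!----></div>
The address check performed by Alice's onion proxy can be illustrated with a small parser. This is a sketch under stated assumptions: the function name is hypothetical, the decoding of <tt>y</tt> into a key hash is left out because its encoding is not given here, and accepting a two-label <tt>y.onion</tt> name without a cookie is an assumption.

```python
def parse_onion_hostname(hostname):
    """Split 'x.y.onion' into (cookie, key_label).

    Returns None for ordinary hostnames, so the caller can fall back to
    a normal exit connection. The key label is returned undecoded.
    """
    labels = hostname.rstrip(".").split(".")
    if labels[-1].lower() != "onion":
        return None
    if len(labels) == 3:
        cookie, key_label = labels[0], labels[1]
        return cookie, key_label
    if len(labels) == 2:
        # Assumption: a name with no cookie label carries no authorization.
        return None, labels[0]
    raise ValueError("unexpected .onion hostname: %r" % hostname)
```

A proxy would call this on every SOCKS request and begin the rendezvous only when the result is not <tt>None</tt>.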
<divclass="p"><!----></div>
<h3><aname="tth_sEc5.3">
5.3</a> Previous rendezvous work</h3>
<divclass="p"><!----></div>
Rendezvous points in low-latency anonymity systems were first
described for use in ISDN telephony [<ahref="#jerichow-jsac98"name="CITEjerichow-jsac98">30</a>,<ahref="#isdn-mixes"name="CITEisdn-mixes">38</a>].
Later low-latency designs used rendezvous points for hiding the location
of mobile phones and low-power location
trackers [<ahref="#federrath-ih96"name="CITEfederrath-ih96">23</a>,<ahref="#reed-protocols97"name="CITEreed-protocols97">40</a>]. Rendezvous for
anonymizing low-latency
Internet connections was suggested in early Onion Routing
work [<ahref="#or-ih96"name="CITEor-ih96">27</a>], but the first published design was by Ian
Goldberg [<ahref="#ian-thesis"name="CITEian-thesis">26</a>]. His design differs from
ours in three ways. First, Goldberg suggests that Alice should manually
hunt down a current location of the service via Gnutella; our approach
makes lookup transparent to the user, as well as faster and more robust.
Second, in Tor the client and server negotiate session keys
with Diffie-Hellman, so plaintext is not exposed even at the rendezvous
point. Third,
our design minimizes the exposure from running the
service, to encourage volunteers to offer introduction and rendezvous
services. Tor's introduction points do not output any bytes to the
clients; the rendezvous points don't know the client or the server,
and can't read the data being transmitted. The indirection scheme is
also designed to include authentication/authorization: if Alice doesn't
include the right cookie with her request for service, Bob need not even
acknowledge his existence.
<divclass="p"><!----></div>
<h2><aname="tth_sEc6">
6</a> Other design decisions</h2>
<aname="sec:other-design">
</a>
<divclass="p"><!----></div>
<h3><aname="tth_sEc6.1">
6.1</a> Denial of service</h3>
<aname="subsec:dos">
</a>
<divclass="p"><!----></div>
Providing Tor as a public service creates many opportunities for
denial-of-service attacks against the network. While
flow control and rate limiting (discussed in
Section <ahref="#subsec:congestion">4.6</a>) prevent users from consuming more
bandwidth than routers are willing to provide, opportunities remain for
users to
consume more network resources than their fair share, or to render the
network unusable for others.
<divclass="p"><!----></div>
First of all, there are several CPU-consuming denial-of-service
attacks wherein an attacker can force an OR to perform expensive
cryptographic operations. For example, an attacker can
fake the start of a TLS handshake, forcing the OR to carry out its
(comparatively expensive) half of the handshake at no real computational
cost to the attacker.
<divclass="p"><!----></div>
We have not yet implemented any defenses for these attacks, but several
approaches are possible. First, ORs can
require clients to solve a puzzle [<ahref="#puzzles-tls"name="CITEpuzzles-tls">16</a>] while beginning new
TLS handshakes or accepting <em>create</em> cells. So long as the puzzle
solutions are easy to verify and computationally expensive to produce, this
approach limits the attack multiplier. Additionally, ORs can limit
the rate at which they accept <em>create</em> cells and TLS connections,
so that
the computational work of processing them does not drown out the
symmetric cryptography operations that keep cells
flowing. This rate limiting could, however, allow an attacker
to slow down other users when they build new circuits.
<divclass="p"><!----></div>
<divclass="p"><!----></div>
Adversaries can also attack the Tor network's hosts and network
links. Disrupting a single circuit or link breaks all streams passing
along that part of the circuit. Users similarly lose service
when a router crashes or its operator restarts it. The current
Tor design treats such attacks as intermittent network failures, and
depends on users and applications to respond or recover as appropriate. A
future design could use an end-to-end TCP-like acknowledgment protocol,
so no streams are lost unless the entry or exit point is
disrupted. This solution would require more buffering at the network
edges, however, and the performance and anonymity implications from this
extra complexity still require investigation.
<divclass="p"><!----></div>
<h3><aname="tth_sEc6.2">
6.2</a> Exit policies and abuse</h3>
<aname="subsec:exitpolicies">
</a>
<divclass="p"><!----></div>
<divclass="p"><!----></div>
Exit abuse is a serious barrier to wide-scale Tor deployment. Anonymity
presents would-be vandals and abusers with an opportunity to hide
the origins of their activities. Attackers can harm the Tor network by
implicating exit servers for their abuse. Also, applications that commonly
use IP-based authentication (such as institutional mail or webservers)
can be fooled by the fact that anonymous connections appear to originate
at the exit OR.
<divclass="p"><!----></div>
We stress that Tor does not enable any new class of abuse. Spammers
and other attackers already have access to thousands of misconfigured
systems worldwide, and the Tor network is far from the easiest way
to launch attacks.
But because the
onion routers can be mistaken for the originators of the abuse,
and the volunteers who run them may not want to deal with the hassle of
explaining anonymity networks to irate administrators, we must block or limit
abuse through the Tor network.
<divclass="p"><!----></div>
To mitigate abuse issues, each onion router's <em>exit policy</em>
describes to which external addresses and ports the router will
connect. On one end of the spectrum are <em>open exit</em>
nodes that will connect anywhere. On the other end are <em>middleman</em>
nodes that only relay traffic to other Tor nodes, and <em>private exit</em>
nodes that only connect to a local host or network. A private
exit can allow a client to connect to a given host or
network more securely: an external adversary cannot eavesdrop on traffic
between the private exit and the final destination, and so is less sure of
Alice's destination and activities. Most onion routers in the current
network function as
<em>restricted exits</em> that permit connections to the world at large,
but prevent access to certain abuse-prone addresses and services such
as SMTP.
The OR might also be able to authenticate clients to
prevent exit abuse without harming anonymity [<ahref="#or-discex00"name="CITEor-discex00">48</a>].
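<divclass="p"><!----></div>
The policy classes above can be modeled as ordered rule lists. The sketch below assumes first-match-wins evaluation and a default of reject, and the rule format and names (<tt>make_rule</tt>, <tt>exit_allowed</tt>) are hypothetical rather than taken from Tor's configuration syntax.

```python
import ipaddress

def make_rule(action, network, ports):
    """Build one rule: action is 'accept' or 'reject'."""
    return (action, ipaddress.ip_network(network), ports)

ALL_PORTS = range(1, 65536)

# Open exit: connect anywhere.
OPEN_EXIT = [make_rule("accept", "0.0.0.0/0", ALL_PORTS)]

# Restricted exit: refuse SMTP (port 25), allow everything else.
RESTRICTED_EXIT = [
    make_rule("reject", "0.0.0.0/0", range(25, 26)),
    make_rule("accept", "0.0.0.0/0", ALL_PORTS),
]

# Middleman: never exit to external hosts.
MIDDLEMAN = [make_rule("reject", "0.0.0.0/0", ALL_PORTS)]

# Private exit: only a local network (example range, an assumption).
PRIVATE_EXIT = [make_rule("accept", "10.0.0.0/8", ALL_PORTS)]

def exit_allowed(policy, address, port):
    """Return True if the first matching rule accepts the connection."""
    addr = ipaddress.ip_address(address)
    for action, network, ports in policy:
        if addr in network and port in ports:
            return action == "accept"
    return False  # assumption: no matching rule means reject
```

Because the rules are evaluated in order, placing the SMTP rejection before the catch-all accept is what makes the restricted policy work.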
<divclass="p"><!----></div>
<divclass="p"><!----></div>
Many administrators use port restrictions to support only a
limited set of services, such as HTTP, SSH, or AIM.
This is not a complete solution, of course, since abuse opportunities for these
protocols are still well known.
<divclass="p"><!----></div>
We have not yet encountered any abuse in the deployed network, but if
we do, we should consider using proxies to clean traffic for certain
protocols as it leaves the network. For example, much abusive HTTP
behavior (such as exploiting buffer overflows or well-known script
vulnerabilities) can be detected in a straightforward manner.
Similarly, one could run automatic spam filtering software (such as
SpamAssassin) on email exiting the OR network.
<divclass="p"><!----></div>
ORs may also rewrite exiting traffic to append
headers or other information indicating that the traffic has passed
through an anonymity service. This approach is commonly used
by email-only anonymity systems. ORs can also
run on servers with hostnames like <tt>anonymous</tt> to further
alert abuse targets to the nature of the anonymous traffic.
<divclass="p"><!----></div>
A mixture of open and restricted exit nodes allows the most
flexibility for volunteers running servers. But while having many
middleman nodes provides a large and robust network,
having only a few exit nodes reduces the number of points
an adversary needs to monitor for traffic analysis, and places a
greater burden on the exit nodes. This tension can be seen in the
Java Anon Proxy
cascade model, wherein only one node in each cascade needs to handle
abuse complaints, but an adversary only needs to observe the entry
and exit of a cascade to perform traffic analysis on all that
cascade's users. The hydra model (many entries, few exits) presents a
different compromise: only a few exit nodes are needed, but an
adversary needs to work harder to watch all the clients; see
Section <ahref="#sec:conclusion">10</a>.
<divclass="p"><!----></div>
Finally, we note that exit abuse must not be dismissed as a peripheral
issue: when a system's public image suffers, it can reduce the number
and diversity of that system's users, and thereby reduce the anonymity
of the system itself. Like usability, public perception is a
security parameter. Sadly, preventing abuse of open exit nodes is an
unsolved problem, and will probably remain an arms race for the
foreseeable future. The abuse problems faced by Princeton's CoDeeN
project [<ahref="#darkside"name="CITEdarkside">37</a>] give us a glimpse of likely issues.
<divclass="p"><!----></div>
<h3><aname="tth_sEc6.3">
6.3</a> Directory Servers</h3>
<aname="subsec:dirservers">
</a>
<divclass="p"><!----></div>
First-generation Onion Routing designs [<ahref="#freedom2-arch"name="CITEfreedom2-arch">8</a>,<ahref="#or-jsac98"name="CITEor-jsac98">41</a>] used
in-band network status updates: each router flooded a signed statement
to its neighbors, which propagated it onward. But anonymizing networks
have different security goals than typical link-state routing protocols.
For example, delays (accidental or intentional)
that can cause different parts of the network to have different views
of link-state and topology are not only inconvenient: they give
attackers an opportunity to exploit differences in client knowledge.
We also worry about attacks that deceive a
client about the router membership list, topology, or current network
state. Such <em>partitioning attacks</em> on client knowledge help an
adversary to efficiently deploy resources
against a target [<ahref="#minion-design"name="CITEminion-design">15</a>].
<divclass="p"><!----></div>
Tor uses a small group of redundant, well-known onion routers to
track changes in network topology and node state, including keys and
exit policies. Each such <em>directory server</em> acts as an HTTP
server, so clients can fetch current network state
and router lists, and so other ORs can upload
state information. Onion routers periodically publish signed
statements of their state to each directory server. The directory servers
combine this information with their own views of network liveness,
and generate a signed description (a <em>directory</em>) of the entire
network state. Client software is
pre-loaded with a list of the directory servers and their keys,
to bootstrap each client's view of the network.
<divclass="p"><!----></div>
When a directory server receives a signed statement for an OR, it
checks whether the OR's identity key is recognized. Directory
servers do not advertise unrecognized ORs: if they did,
an adversary could take over the network by creating many
servers [<ahref="#sybil"name="CITEsybil">22</a>]. Instead, new nodes must be approved by the
directory
server administrator before they are included. Mechanisms for automated
node approval are an area of active research, and are discussed more
in Section <ahref="#sec:maintaining-anonymity">9</a>.
<divclass="p"><!----></div>
Of course, a variety of attacks remain. An adversary who controls
a directory server can track clients by providing them different
information, perhaps by listing only nodes under its control, or by
informing only certain clients about a given node. Even an external
adversary can exploit differences in client knowledge: clients who use
a node listed on one directory server but not the others are vulnerable.
<divclass="p"><!----></div>
Thus these directory servers must be synchronized and redundant, so
that they can agree on a common directory. Clients should only trust
this directory if it is signed by a threshold of the directory
servers.
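<divclass="p"><!----></div>
The client-side threshold check can be sketched as below. To stay within the standard library, HMAC with a per-server secret stands in for a real public-key signature; that substitution is an assumption made only for illustration (a real client holds the servers' public keys, not shared secrets), and the function names are hypothetical.

```python
import hashlib
import hmac

def sign(key, directory):
    """Stand-in for a directory server's signature (HMAC, an assumption)."""
    return hmac.new(key, directory, hashlib.sha256).digest()

def directory_trusted(directory, signatures, known_keys, threshold):
    """Accept a directory only if enough known servers signed it.

    signatures maps server name -> signature bytes; known_keys is the
    pre-loaded map of server name -> verification key.
    """
    valid = 0
    for server, signature in signatures.items():
        key = known_keys.get(server)
        if key is None:
            continue  # signatures from unknown servers do not count
        if hmac.compare_digest(signature, sign(key, directory)):
            valid += 1
    return valid >= threshold
```

With three directory servers, a threshold of two means a single compromised server cannot get a client to accept a directory of its own making.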
<divclass="p"><!----></div>
The directory servers in Tor are modeled after those in
Mixminion [<ahref="#minion-design"name="CITEminion-design">15</a>], but our situation is easier. First,
we make the
simplifying assumption that all participants agree on the set of
directory servers. Second, while Mixminion needs to predict node
behavior, Tor only needs a threshold consensus of the current
state of the network. Third, we assume that we can fall back to the
human administrators to discover and resolve problems when a consensus
directory cannot be reached. Since there are relatively few directory
servers (currently 3, but we expect as many as 9 as the network scales),
we can afford operations like broadcast to simplify the consensus-building
protocol.
<divclass="p"><!----></div>
To avoid attacks where a router connects to all the directory servers
but refuses to relay traffic from other routers, the directory servers
must also build circuits and use them to anonymously test router
reliability [<ahref="#mix-acc"name="CITEmix-acc">18</a>]. Unfortunately, this defense is not yet
designed or
implemented.
<divclass="p"><!----></div>
Using directory servers is simpler and more flexible than flooding.
Flooding is expensive, and complicates the analysis when we
start experimenting with non-clique network topologies. Signed
directories can be cached by other
onion routers,
so directory servers are not a performance
bottleneck when we have many users, and do not aid traffic analysis by
forcing clients to announce their existence to any
central point.
<divclass="p"><!----></div>
<h2><aname="tth_sEc7">
7</a> Attacks and Defenses</h2>
<aname="sec:attacks">
</a>
<divclass="p"><!----></div>
Below we summarize a variety of attacks, and discuss how well our
design withstands them.<br/>
<divclass="p"><!----></div>
<fontsize="+1"><b>Passive attacks</b></font><br/>
<em>Observing user traffic patterns.</em> Observing a user's connection
will not reveal her destination or data, but it will
reveal traffic patterns (both sent and received). Profiling via user
connection patterns requires further processing, because multiple
application streams may be operating simultaneously or in series over
a single circuit.
<divclass="p"><!----></div>
<em>Observing user content.</em> While content at the user end is encrypted,
connections to responders may not be (indeed, the responding website
itself may be hostile). While filtering content is not a primary goal
of Onion Routing, Tor can directly use Privoxy and related
filtering services to anonymize application data streams.
<divclass="p"><!----></div>
<em>Option distinguishability.</em> We allow clients to choose
configuration options. For example, clients concerned about request
linkability should rotate circuits more often than those concerned
about traceability. Allowing choice may attract users with different
needs; but clients who are
in the minority may lose more anonymity by appearing distinct than they
gain by optimizing their behavior [<ahref="#econymics"name="CITEeconymics">1</a>].
<divclass="p"><!----></div>
<em>End-to-end timing correlation.</em> Tor only minimally hides
such correlations. An attacker watching patterns of
traffic at the initiator and the responder will be
able to confirm the correspondence with high probability. The
greatest protection currently available against such confirmation is to hide
the connection between the onion proxy and the first Tor node,
by running the OP on the Tor node or behind a firewall. This approach
requires an observer to separate traffic originating at the onion
router from traffic passing through it: a global observer can do this,
but it might be beyond a limited observer's capabilities.
<divclass="p"><!----></div>
<fontsize="+1"><b>Attacks against rendezvous points</b></font><br/>
<em>Make many introduction requests.</em> An attacker could
try to deny Bob service by flooding his introduction points with
requests. Because the introduction points can block requests that
lack authorization tokens, however, Bob can restrict the volume of
requests he receives, or require a certain amount of computation for
every request he receives.
<divclass="p"><!----></div>
<em>Attack an introduction point.</em> An attacker could
disrupt a location-hidden service by disabling its introduction
points. But because a service's identity is attached to its public
key, the service can simply re-advertise
itself at a different introduction point. Advertisements can also be
done secretly so that only high-priority clients know the address of
Bob's introduction points or so that different clients know of different
introduction points. This forces the attacker to disable all possible
introduction points.
<divclass="p"><!----></div>
<em>Compromise an introduction point.</em> An attacker who controls
Bob's introduction point can flood Bob with
introduction requests, or prevent valid introduction requests from
reaching him. Bob can notice a flood, and close the circuit. To notice
blocking of valid requests, however, he should periodically test the
introduction point by sending rendezvous requests and making
sure he receives them.
<divclass="p"><!----></div>
<em>Compromise a rendezvous point.</em> A rendezvous
point is no more sensitive than any other OR on
a circuit, since all data passing through the rendezvous is encrypted
with a session key shared by Alice and Bob.
<divclass="p"><!----></div>
<h2><aname="tth_sEc8">
8</a> Early experiences: Tor in the Wild</h2>
<aname="sec:in-the-wild">
</a>
<divclass="p"><!----></div>
As of mid-May 2004, the Tor network consists of 32 nodes
(24 in the US, 8 in Europe), and more are joining each week as the code
matures. (For comparison, the current remailer network
has about 40 nodes.) Each node has at least a 768Kb/768Kb connection, and
many have 10Mb. The number of users varies (and of course, it's hard to
tell for sure), but we sometimes have several hundred users: administrators at
several companies have begun sending their entire departments' web
traffic through Tor, to block other divisions of
their company from reading their traffic. Tor users have reported using
the network for web browsing, FTP, IRC, AIM, Kazaa, SSH, and
recipient-anonymous email via rendezvous points. One user has anonymously
set up a Wiki as a hidden service, where other users anonymously publish
the addresses of their hidden services.
<divclass="p"><!----></div>
Each Tor node currently processes roughly 800,000 relay
cells (a bit under half a gigabyte) per week. On average, about 80%
of each 498-byte payload is full for cells going back to the client,
whereas about 40% is full for cells coming from the client. (The difference
arises because most of the network's traffic is web browsing.) Interactive
traffic like SSH brings down the average a lot; once we have more
experience, and assuming we can resolve the anonymity issues, we may
partition traffic into two relay cell sizes: one to handle
bulk traffic and one for interactive traffic.
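<divclass="p"><!----></div>
The weekly volume can be checked with a short calculation. The 512-byte total cell size is taken from the cell format described earlier in the paper and is treated as an assumption here; the other figures come from the paragraph above.

```python
CELL_BYTES = 512          # assumption: fixed cell size from the cell format
PAYLOAD_BYTES = 498       # relay payload size quoted above
CELLS_PER_WEEK = 800_000  # relay cells per node per week

# Total bytes relayed per node per week, and the same in gigabytes.
weekly_bytes = CELLS_PER_WEEK * CELL_BYTES
weekly_gb = weekly_bytes / 1e9

# Average useful payload per cell in each direction.
to_client_bytes = 0.80 * PAYLOAD_BYTES
from_client_bytes = 0.40 * PAYLOAD_BYTES

print(f"{weekly_bytes} bytes/week = {weekly_gb:.2f} GB")
print(f"useful payload: {to_client_bytes:.1f} B to client, "
      f"{from_client_bytes:.1f} B from client")
```

The total works out to 409,600,000 bytes, which matches "a bit under half a gigabyte"; cells from the client carry roughly 199 useful bytes each, which is the motivation for a smaller cell size for interactive traffic.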
<divclass="p"><!----></div>
Based in part on our restrictive default exit policy (we
reject SMTP requests) and our low profile, we have had no abuse
issues since the network was deployed in October
2003. Our slow growth rate gives us time to add features,
resolve bugs, and get a feel for what users actually want from an
anonymity system. Even though having more users would bolster our
anonymity sets, we are not eager to attract the Kazaa or warez
communities: we feel that we must build a reputation for privacy, human
rights, research, and other socially laudable activities.
<divclass="p"><!----></div>
As for performance, profiling shows that Tor spends almost
all its CPU time in AES, which is fast. Current latency is attributable
to two factors. First, network latency is critical: we are
intentionally bouncing traffic around the world several times. Second,
our end-to-end congestion control algorithm focuses on protecting
volunteer servers from accidental DoS rather than on optimizing
performance. To quantify these effects, we did some informal tests using a network of 4
nodes on the same machine (a heavily loaded 1GHz Athlon). We downloaded a 60
megabyte file from <tt>debian.org</tt> every 30 minutes for 54 hours (108 sample
points). It arrived in about 300 seconds on average, compared to 210s for a
direct download. We ran a similar test on the production Tor network,
fetching the front page of <tt>cnn.com</tt> (55 kilobytes):
while a direct
download consistently took about 0.3s, the performance through Tor varied.
Some downloads were as fast as 0.4s, with a median at 2.8s, and
90% finishing within 5.3s. It seems that as the network expands, the chance
of building a slow circuit (one that includes a slow or heavily loaded node
or link) is increasing. On the other hand, as our users remain satisfied
with this increased latency, we can address our performance incrementally as we
proceed with development.
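<divclass="p"><!----></div>
The figures above translate into relative overheads as follows. This only restates the reported measurements; the variable names are illustrative.

```python
FILE_MB = 60
TOR_SECONDS = 300      # average time through the 4-node test network
DIRECT_SECONDS = 210   # average time for a direct download

# One download every 30 minutes for 54 hours.
samples = 54 * 60 // 30

# Bulk transfer: slowdown factor and effective throughput in MB/s.
bulk_slowdown = TOR_SECONDS / DIRECT_SECONDS
tor_rate = FILE_MB / TOR_SECONDS
direct_rate = FILE_MB / DIRECT_SECONDS

# Small page fetch on the production network: median versus direct.
page_median_slowdown = 2.8 / 0.3
```

The bulk download is slowed by a factor of about 1.4, while the median small fetch is about nine times slower than a direct one, which suggests that circuit latency rather than throughput dominates short transfers.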
<divclass="p"><!----></div>
<divclass="p"><!----></div>
<divclass="p"><!----></div>
Although Tor's clique topology and full-visibility directories present
scaling problems, we still expect the network to support a few hundred
nodes and maybe 10,000 users before we're forced to become
more distributed. With luck, the experience we gain running the current
topology will help us choose among alternatives when the time comes.
<divclass="p"><!----></div>
<h2><aname="tth_sEc9">
9</a> Open Questions in Low-latency Anonymity</h2>
<aname="sec:maintaining-anonymity">
</a>
<divclass="p"><!----></div>
In addition to the non-goals in
Section <ahref="#subsec:non-goals">3</a>, many questions must be solved
before we can be confident of Tor's security.
<divclass="p"><!----></div>
Many of these open issues are questions of balance. For example,
how often should users rotate to fresh circuits? Frequent rotation
is inefficient, expensive, and may lead to intersection attacks and
predecessor attacks [<ahref="#wright03"name="CITEwright03">54</a>], but infrequent rotation makes the
user's traffic linkable. Besides opening fresh circuits, clients can
also exit from the middle of the circuit,
or truncate and re-extend the circuit. More analysis is
needed to determine the proper tradeoff.
<divclass="p"><!----></div>
<divclass="p"><!----></div>
How should we choose path lengths? If Alice always uses two hops,
then both ORs can be certain that by colluding they will learn about
Alice and Bob. In our current approach, Alice always chooses at least
three nodes unrelated to herself and her destination.
Should Alice choose a random path length (e.g. from a geometric
distribution) to foil an attacker who
uses timing to learn that he is the fifth hop and thus concludes that
both Alice and the responder are running ORs?
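<divclass="p"><!----></div>
One way to realize the random path length raised above is to add a geometrically distributed number of extra hops to the three-hop minimum. The function name and the stopping probability <tt>p_stop</tt> are assumptions for illustration; the text does not fix a distribution parameter.

```python
import random

def choose_path_length(min_hops=3, p_stop=0.5, rng=random):
    """Return min_hops plus a geometrically distributed number of extras.

    After reaching min_hops, each further hop is added with probability
    1 - p_stop, so longer paths become exponentially less likely.
    """
    hops = min_hops
    while rng.random() >= p_stop:
        hops += 1
    return hops
```

With <tt>p_stop = 0.5</tt> the expected length is four hops, and a node that infers its position from timing can no longer be sure how many hops follow it.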
<divclass="p"><!----></div>
Throughout this paper, we have assumed that end-to-end traffic
confirmation will immediately and automatically defeat a low-latency
anonymity system. Even high-latency anonymity systems can be
vulnerable to end-to-end traffic confirmation, if the traffic volumes
are high enough, and if users' habits are sufficiently
distinct [<ahref="#statistical-disclosure"name="CITEstatistical-disclosure">14</a>,<ahref="#limits-open"name="CITElimits-open">31</a>]. Can anything be
done to
make low-latency systems resist these attacks as well as high-latency
systems? Tor already makes some effort to conceal the starts and ends of
streams by wrapping long-range control commands in identical-looking
relay cells. Link padding could frustrate passive observers who count
packets; long-range padding could work against observers who own the
first hop in a circuit. But more research remains to find an efficient
and practical approach. Volunteers prefer not to run constant-bandwidth
padding; but no convincing traffic shaping approach has been
specified. Recent work on long-range padding [<ahref="#defensive-dropping"name="CITEdefensive-dropping">33</a>]
shows promise. One could also try to reduce correlation in packet timing
by batching and re-ordering packets, but it is unclear whether this could
improve anonymity without introducing so much latency as to render the
network unusable.
<divclass="p"><!----></div>
A cascade topology may better defend against traffic confirmation by
aggregating users, and making padding and
mixing more affordable. Does the hydra topology (many input nodes,
few output nodes) work better against some adversaries? Are we going
to get a hydra anyway because most nodes will be middleman nodes?
<divclass="p"><!----></div>
Common wisdom suggests that Alice should run her own OR for best
anonymity, because traffic coming from her node could plausibly have
come from elsewhere. How much mixing does this approach need? Is it
immediately beneficial because of real-world adversaries that can't
observe Alice's router, but can run routers of their own?
<divclass="p"><!----></div>
To scale to many users, and to prevent an attacker from observing the
whole network, it may be necessary
to support far more servers than Tor currently anticipates.
This introduces several issues. First, if approval by a central set
of directory servers is no longer feasible, what mechanism should be used
to prevent adversaries from signing up many colluding servers? Second,
if clients can no longer have a complete picture of the network,
how can they perform discovery while preventing attackers from
manipulating or exploiting gaps in their knowledge? Third, if there
are too many servers for every server to constantly communicate with
every other, which non-clique topology should the network use?
(Restricted-route topologies promise comparable anonymity with better
scalability [<ahref="#danezis-pets03"name="CITEdanezis-pets03">13</a>], but whatever topology we choose, we
need some way to keep attackers from manipulating their position within
it [<ahref="#casc-rep"name="CITEcasc-rep">21</a>].) Fourth, if no central authority is tracking
server reliability, how do we stop unreliable servers from making
the network unusable? Fifth, do clients receive so much anonymity
from running their own ORs that we should expect them all to do
so [<ahref="#econymics"name="CITEeconymics">1</a>], or do we need another incentive structure to
motivate them? Tarzan and MorphMix present possible solutions.
<divclass="p"><!----></div>
<divclass="p"><!----></div>
When a Tor node goes down, all its circuits (and thus streams) must break.
Will users abandon the system because of this brittleness? How well
does the method in Section <ahref="#subsec:dos">6.1</a> allow streams to survive
node failure? If affected users rebuild circuits immediately, how much
anonymity is lost? It seems the problem is even worse in a peer-to-peer
environment: such systems don't yet provide an incentive for peers to
stay connected when they're done retrieving content, so we would expect
a higher churn rate.
<divclass="p"><!----></div>
<divclass="p"><!----></div>
<h2><aname="tth_sEc10">
10</a> Future Directions</h2>
<aname="sec:conclusion">
</a>
<divclass="p"><!----></div>
Tor brings together many innovations into a unified deployable system. The
next immediate steps include:
<divclass="p"><!----></div>
<em>Scalability:</em> Tor's emphasis on deployability and design simplicity
has led us to adopt a clique topology, semi-centralized
directories, and a full-network-visibility model for client
knowledge. These properties will not scale past a few hundred servers.
Section <ahref="#sec:maintaining-anonymity">9</a> describes some promising
approaches, but more deployment experience will be helpful in learning
the relative importance of these bottlenecks.
<divclass="p"><!----></div>
<em>Bandwidth classes:</em> This paper assumes that all ORs have
good bandwidth and latency. We should instead adopt the MorphMix model,
where nodes advertise their bandwidth level (DSL, T1, T3), and
Alice avoids bottlenecks by choosing nodes that match or
exceed her bandwidth. In this way DSL users can usefully join the Tor
network.
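<divclass="p"><!----></div>
A minimal sketch of bandwidth-aware selection, assuming nodes advertise one of the three classes named above and Alice picks uniformly among nodes at or above her own class. The data layout and function names are hypothetical.

```python
import random

# Advertised bandwidth classes, ordered from slowest to fastest.
BANDWIDTH_RANK = {"DSL": 0, "T1": 1, "T3": 2}

def eligible_nodes(nodes, client_class):
    """Names of nodes whose class matches or exceeds the client's."""
    floor = BANDWIDTH_RANK[client_class]
    return [name for name, cls in nodes.items() if BANDWIDTH_RANK[cls] >= floor]

def choose_path(nodes, client_class, hops=3, rng=random):
    """Pick distinct nodes for a circuit without creating a bottleneck."""
    candidates = eligible_nodes(nodes, client_class)
    if len(candidates) < hops:
        raise ValueError("not enough nodes in the required bandwidth class")
    return rng.sample(candidates, hops)
```

A DSL client can use any node, while a T3 client is confined to T3 nodes; restricting fast clients to a smaller candidate set is itself an anonymity tradeoff that this sketch does not address.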
<divclass="p"><!----></div>
<em>Incentives:</em> Volunteers who run nodes are rewarded with publicity
and possibly better anonymity [<ahref="#econymics"name="CITEeconymics">1</a>]. More nodes mean increased
scalability, and more users can mean more anonymity. We need to continue
examining the incentive structures for participating in Tor. Further,
we need to explore more approaches to limiting abuse, and understand
why most people don't bother using privacy systems.
<divclass="p"><!----></div>
<em>Cover traffic:</em> Currently Tor omits cover traffic: its costs
in performance and bandwidth are clear but its security benefits are
not well understood. We must pursue more research on link-level cover
traffic and long-range cover traffic to determine whether some simple padding
method offers provable protection against our chosen adversary.
<divclass="p"><!----></div>
<divclass="p"><!----></div>
<em>Caching at exit nodes:</em> Perhaps each exit node should run a
caching web proxy [<ahref="#shsm03"name="CITEshsm03">47</a>], to improve anonymity for cached pages
(Alice's request never
leaves the Tor network), to improve speed, and to reduce bandwidth cost.
On the other hand, forward security is weakened because caches
constitute a record of retrieved files. We must find the right
balance between usability and security.
<divclass="p"><!----></div>
<em>Better directory distribution:</em>
Clients currently download a description of
the entire network every 15 minutes. As the state grows larger
and clients more numerous, we may need a solution in which
clients receive incremental updates to directory state.
More generally, we must find more
scalable yet practical ways to distribute up-to-date snapshots of