% Mirror of https://gitlab.torproject.org/tpo/core/tor.git
% (synced 2024-11-10 21:23:58 +01:00), 53dca60b13, svn:r646
\documentclass[times,10pt,twocolumn]{article}
\usepackage{latex8}
%\usepackage{times}
\usepackage{url}
\usepackage{graphics}
\usepackage{amsmath}

\pagestyle{empty}

\renewcommand\url{\begingroup \def\UrlLeft{<}\def\UrlRight{>}\urlstyle{tt}\Url}
\newcommand\emailaddr{\begingroup \def\UrlLeft{<}\def\UrlRight{>}\urlstyle{tt}\Url}

% If an URL ends up with '%'s in it, that's because the line *in the .bib/.tex
% file* is too long, so break it there (it doesn't matter if the next line is
% indented with spaces). -DH

%\newif\ifpdf
%\ifx\pdfoutput\undefined
% \pdffalse
%\else
% \pdfoutput=1
% \pdftrue
%\fi

\newenvironment{tightlist}{\begin{list}{$\bullet$}{
\setlength{\itemsep}{0mm}
\setlength{\parsep}{0mm}
% \setlength{\labelsep}{0mm}
% \setlength{\labelwidth}{0mm}
% \setlength{\topsep}{0mm}
}}{\end{list}}

\begin{document}

%% Use dvipdfm instead. --DH
%\ifpdf
% \pdfcompresslevel=9
% \pdfpagewidth=\the\paperwidth
% \pdfpageheight=\the\paperheight
%\fi

\title{Tor: Design of a Next-Generation Onion Router}

%\author{Anonymous}
%\author{Roger Dingledine \\ The Free Haven Project \\ arma@freehaven.net \and
%Nick Mathewson \\ The Free Haven Project \\ nickm@freehaven.net \and
%Paul Syverson \\ Naval Research Lab \\ syverson@itd.nrl.navy.mil}

\maketitle
\thispagestyle{empty}

\begin{abstract}
We present Tor, a connection-based low-latency anonymous communication
system. It is intended as an update and replacement for onion routing
and addresses many limitations in the original onion routing design.
Tor works in a real-world Internet environment,
requires little synchronization or coordination between nodes, and
protects against known anonymity-breaking attacks as well
as or better than other systems with similar design parameters.
\end{abstract}

%\begin{center}
%\textbf{Keywords:} anonymity, peer-to-peer, remailer, nymserver, reply block
%\end{center}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\Section{Overview}
\label{sec:intro}

Onion routing is a distributed overlay network designed to anonymize
low-latency TCP-based applications such as web browsing, secure shell,
and instant messaging. Users choose a path through the network and
build a \emph{virtual circuit}, in which each node in the path knows its
predecessor and successor, but no others. Traffic flowing down the circuit
is sent in fixed-size \emph{cells}, which are unwrapped by a symmetric key
at each node, revealing the downstream node. The original onion routing
project published several design and analysis papers
\cite{or-jsac98,or-discex00,or-ih96,or-pet00}. While there was briefly
a wide-area onion routing network,
the only long-running and publicly accessible
implementation was a fragile proof-of-concept that ran on a single
machine. Many critical design and deployment issues were never resolved,
and the design has not been updated in several years.
Here we describe Tor, a protocol for asynchronous, loosely
federated onion routers that provides the following improvements over
the old onion routing design:

\begin{tightlist}

\item \textbf{Perfect forward secrecy:} The original onion routing
design is vulnerable to a single hostile node recording traffic and later
forcing successive nodes in the circuit to decrypt it. Rather than using
onions to lay the circuits, Tor uses an incremental or \emph{telescoping}
path-building design, where the initiator negotiates session keys with
each successive hop in the circuit. Onion replay detection is no longer
necessary, and the network as a whole is more reliable to boot, since
the initiator knows which hop failed and can try extending to a new node.

\item \textbf{Applications talk to the onion proxy via SOCKS:}
The original onion routing design required a separate proxy for each
supported application protocol, resulting in a lot of extra code (most
of which was never written) and also meaning that a lot of TCP-based
applications were not supported. Tor uses the unified and standard SOCKS
\cite{socks4,socks5} interface, allowing us to support any TCP-based
program without modification.

\item \textbf{Many applications can share one circuit:} The original
onion routing design built one circuit for each request. Aside from the
performance issues of doing public key operations for every request, it
also turns out that regular communications patterns mean building lots
of circuits, which can endanger anonymity.
The very first onion routing design \cite{or-ih96} protected against
this to some extent by hiding network access behind an onion
router/firewall that was also forwarding traffic from other nodes.
However, even if this meant complete protection, many users who could
benefit from onion routing find neither running their own node
nor such firewall configurations convenient enough to be
feasible. Those users, especially if they engage in certain unusual
communication behaviors, may be identifiable \cite{wright03}. To
complicate such attacks, Tor multiplexes many
connections down each circuit, but still rotates the circuit
periodically to avoid too much linkability from requests on a single
circuit.

\item \textbf{No mixing or traffic shaping:} The original onion routing
design called for full link padding both between onion routers and between
onion proxies (that is, users) and onion routers \cite{or-jsac98}. The
later analysis paper \cite{or-pet00} suggested \emph{traffic shaping}
to provide similar protection with less bandwidth, but did not go
into detail. However, recent research \cite{econymics} and deployment
experience \cite{freedom21-security} indicate that this level of resource
use is not practical or economical; and even full link padding is still
vulnerable to active attacks \cite{defensive-dropping}.
%[An upcoming FC04 paper. I'll add a cite when it's out. -RD]

\item \textbf{Leaky pipes:} Through in-band signalling within the
circuit, Tor initiators can direct traffic to nodes partway down the
circuit. This allows for long-range padding to frustrate traffic
shape and volume attacks at the initiator \cite{defensive-dropping},
but because circuits are used by more than one application, it also
allows traffic to exit the circuit from the middle -- thus
frustrating traffic shape and volume attacks based on observing exit
points.
%Or something like that. hm. Tone this down maybe? Or support it. -RD
%How's that? -PS

\item \textbf{Congestion control:} Earlier anonymity designs do not
address traffic bottlenecks. Unfortunately, typical approaches to load
balancing and flow control in overlay networks involve inter-node control
communication and global views of traffic. Our decentralized ack-based
congestion control maintains reasonable anonymity while allowing nodes
at the edges of the network to detect congestion or flooding attacks
and send less data until the congestion subsides.

\item \textbf{Directory servers:} Rather than attempting to flood
link-state information through the network, which can be unreliable and
open to partitioning attacks or outright deception, Tor takes a simplified
view towards distributing link-state information. Certain more trusted
onion routers also serve as directory servers; they provide signed
\emph{directories} describing all routers they know about, and which
are currently up. Users periodically download these directories via HTTP.

\item \textbf{End-to-end integrity checking:} Without integrity checking
on traffic going through the network, an onion router can change the
contents of cells as they pass by, e.g., by redirecting a connection on
the fly so it connects to a different webserver, or by tagging encrypted
traffic and looking for traffic at the network edges that has been
tagged \cite{minion-design}.

\item \textbf{Robustness to node failure:} router twins

\item \textbf{Exit policies:}
Tor provides a consistent mechanism for each node to specify and
advertise an exit policy.

\item \textbf{Rendezvous points:}
location-protected servers

\end{tightlist}
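
The telescoping design in the first item can be illustrated with a small sketch. It is a toy under stated assumptions, not the Tor implementation: the cell size, the hash-based keystream, and the use of random bytes in place of a real per-hop key negotiation are all placeholders chosen for illustration. The initiator holds one session key per hop and adds one layer per hop, and each node on the path removes exactly one layer.

```python
import hashlib
import os

CELL_SIZE = 256  # hypothetical fixed cell size in bytes; not taken from the paper


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). Illustration only, not secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_layer(key: bytes, cell: bytes) -> bytes:
    """Add or remove one layer; XOR with the keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(cell, keystream(key, len(cell))))


def build_circuit(num_hops: int) -> list[bytes]:
    """Telescoping build: extend one hop at a time, one session key per hop.

    Assumption: os.urandom stands in for the key negotiated with each hop.
    """
    keys = []
    for _ in range(num_hops):
        keys.append(os.urandom(32))
    return keys


def pad(payload: bytes) -> bytes:
    """Pad a payload to the fixed cell size, with a 2-byte length prefix."""
    if len(payload) > CELL_SIZE - 2:
        raise ValueError("payload too large for one cell")
    body = len(payload).to_bytes(2, "big") + payload
    return body + b"\x00" * (CELL_SIZE - len(body))


def unpad(cell: bytes) -> bytes:
    """Recover the payload from a fully unwrapped cell."""
    length = int.from_bytes(cell[:2], "big")
    return cell[2:2 + length]


def wrap(keys: list[bytes], payload: bytes) -> bytes:
    """Initiator side: apply the last hop's layer first, the first hop's last."""
    cell = pad(payload)
    for key in reversed(keys):
        cell = xor_layer(key, cell)
    return cell


def relay(keys: list[bytes], cell: bytes) -> bytes:
    """Each hop, in path order, strips exactly one layer."""
    for key in keys:
        cell = xor_layer(key, cell)
    return cell
```

With three hops, a wrapped cell always has the fixed cell size, and only a party holding all three session keys recovers the payload. A real design would use a proper cipher and an authenticated key agreement per hop.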

We review previous work in Section \ref{sec:background}, describe
our goals and assumptions in Section \ref{sec:assumptions},
and then address the above list of improvements in Sections
\ref{sec:design}--\ref{sec:maintaining-anonymity}. We then summarize
how our design stands up to known attacks, and conclude with a list of
open problems.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\Section{Background and threat model}
\label{sec:background}

\SubSection{Related work}
\label{sec:related-work}
Modern anonymity designs date to Chaum's Mix-Net~\cite{chaum-mix} design of
1981. Chaum proposed hiding sender-recipient connections by wrapping
messages in several layers of public key cryptography, and relaying them
through a path composed of Mix servers. Mix servers in turn decrypt, delay,
and re-order messages, before relaying them along the path towards their
destinations.

Subsequent relay-based anonymity designs have diverged in two
principal directions. Some have attempted to maximize anonymity at
the cost of introducing comparatively large and variable latencies,
for example, Babel~\cite{babel}, Mixmaster~\cite{mixmaster-spec}, and
Mixminion~\cite{minion-design}. Because of this
decision, such \emph{high-latency} networks are well-suited for anonymous
email, but introduce too much lag for interactive tasks such as web browsing,
internet chat, or SSH connections.

Tor belongs to the second category: \emph{low-latency} designs that
attempt to anonymize interactive network traffic. Because such
traffic tends to involve a relatively large number of packets, it is
difficult to prevent an attacker who can eavesdrop on entry and exit
points from correlating packets entering the anonymity network with
packets leaving it. Although some work has been done to frustrate
these attacks, most designs protect primarily against traffic analysis
rather than traffic confirmation \cite{or-jsac98}. One can pad and
limit communication to a constant rate, or at least control the
variation in traffic shape. This can have prohibitive bandwidth costs
and/or performance limitations. One can also use a cascade (fixed
shared route) with a relatively fixed set of users. This assumes a
significant degree of agreement and provides an easier target for an active
attacker since the endpoints are generally known. However, a practical
network with both of these features has been run for many years
(the Java Anon Proxy, aka Web MIXes, \cite{web-mix}).

Another low-latency design that was proposed independently and at
about the same time as onion routing was PipeNet \cite{pipenet}.
This provided anonymity protections that were stronger than onion routing's,
but at the cost of allowing a single user to shut down the network simply
by not sending. It was also never implemented or formally published.

The simplest low-latency designs are single-hop proxies such as the
Anonymizer \cite{anonymizer}, wherein a single trusted server removes
users' identifying data before relaying it. These designs are easy to
analyze, but require end-users to trust the anonymizing proxy.

More complex are distributed-trust, channel-based anonymizing systems. In
these designs, a user establishes one or more medium-term bidirectional
end-to-end tunnels to exit servers, and uses those tunnels to deliver a
number of low-latency packets to and from one or more destinations per
tunnel. Establishing tunnels is comparatively expensive and typically
requires public-key cryptography, whereas relaying packets along a tunnel is
comparatively inexpensive. Because a tunnel crosses several servers, no
single server can learn the user's communication partners.

Systems such as earlier versions of Freedom and onion routing
build the anonymous channel all at once (using an onion). Later
designs of Freedom, and onion routing as described herein, build
the channel in stages, as does AnonNet
\cite{anonnet}. Amongst other things, this makes perfect forward
secrecy feasible.

Some systems, such as Crowds \cite{crowds-tissec}, do not rely on the
changing appearance of packets to hide the path; rather they employ
mechanisms so that an intermediary cannot be sure when it is
receiving from/sending to the ultimate initiator. There is no public-key
encryption needed for Crowds, but the responder and all data are
visible to all nodes on the path, so that anonymity of the connection
initiator depends on filtering all identifying information from the
data stream. Crowds is also designed only for HTTP traffic.

Hordes \cite{hordes-jcs} is based on Crowds but also uses multicast
responses to hide the initiator. Herbivore \cite{herbivore} and
P5 \cite{p5} go even further, requiring broadcast.
They each use broadcast in very different ways, and tradeoffs are made to
make broadcast more practical. Both Herbivore and P5 are designed primarily
for communication between peers, although Herbivore
permits external connections by requesting a peer to serve as a proxy.
Allowing easy connections to nonparticipating responders or recipients
is a practical requirement for many users, e.g., to visit
nonparticipating Web sites or to exchange mail with nonparticipating
recipients.

Distributed-trust anonymizing systems differ in how they prevent attackers
from controlling too many servers and thus compromising too many user paths.
Some protocols rely on a centrally maintained set of well-known anonymizing
servers. The current Tor design falls into this category.
Others (such as Tarzan and MorphMix) allow unknown users to run
servers, while using a limited resource (DHT space for Tarzan; IP space for
MorphMix) to prevent an attacker from owning too much of the network.
Crowds uses a centralized ``blender'' to enforce Crowd membership
policy. For small crowds it is suggested that familiarity with all
members is adequate. For large diverse crowds, limiting accounts in
control of any one party is more difficult:
``(e.g., the blender administrator sets up an account for a user only
after receiving a written, notarized request from that user) and each
account to one jondo, and by monitoring and limiting the number of
jondos on any one network (using IP address), the attacker would be
forced to launch jondos using many different identities and on many
different networks to succeed'' \cite{crowds-tissec}.

[XXX I'm considering the subsection as ended here for now. I'm leaving the
following notes in case we want to revisit any of them. -PS]

There are also many systems which are intended for anonymous
and/or censorship resistant file sharing. [XXX Should we list all these
or just say it's out of scope for the paper?
eternity, gnunet, freenet, freehaven, publius, tangler, taz/rewebber]

Channel-based anonymizing systems also differ in their use of dummy traffic.
[XXX]

Finally, several systems provide low-latency anonymity without channel-based
communication. Crowds and [XXX] provide anonymity for HTTP requests; [...]

[XXX Mention error recovery?]

anonymizer%
pipenet%
freedom v1%
freedom v2%
onion routing v1%
isdn-mixes%
crowds%
real-time mixes, web mixes%
anonnet (marc rennhard's stuff)%
morphmix%
P5%
gnunet%
rewebbers%
tarzan%
herbivore%
hordes%
cebolla (?)%

[XXX Close by mentioning where Tor fits.]

\SubSection{Our threat model}
\label{subsec:threat-model}

Like all practical low-latency systems, Tor does not protect against a global
passive adversary, the most commonly assumed adversary for analysis of
theoretical anonymous communication designs. The adversary we assume
is weaker than global with respect to distribution, but it is not
merely passive. We assume a threat model derived largely from that of
\cite{or-pet00}.

[XXX The following is cut in from the OR analysis paper from PET 2000.
I've already changed it a little, but didn't get very far.
And, much if not all will eventually
go. But I thought it a useful starting point. -PS]

The basic adversary components we consider are:
\begin{description}
\item[Observer:] can observe a connection (e.g., a sniffer on an
Internet router), but cannot initiate connections.
\item[Disrupter:] can delay (indefinitely) or corrupt traffic on a
link.
\item[Hostile initiator:] can initiate (or destroy) connections with
specific routes, as well as vary the timing and content of traffic
on the connections it creates.
\item[Hostile responder:] can vary the traffic on the connections made
to it, including refusing them entirely, intentionally modifying what
it sends and at what rate, and selectively closing them.
\item[Compromised Tor-node:] can arbitrarily manipulate the connections
under its control, as well as create new connections (that pass
through itself).
\end{description}

All feasible adversaries can be composed out of these basic
adversaries. This includes combinations such as one or more
compromised network nodes cooperating with disrupters of links to
which those nodes are not adjacent, or combinations of hostile
outsiders and observers. However, we are able to restrict our
analysis of adversaries to just one class, the compromised Tor-node.
We now justify this claim.

Especially in light of our assumption that the network forms a clique,
a hostile outsider can perform only a subset of the actions that a
compromised COR can. Also, while a compromised COR cannot disrupt
or observe a link unless it is adjacent to it, any adversary that
replaces some or all observers and/or disrupters with a compromised
COR adjacent to the relevant link is more powerful than the adversary
it replaces. And, in the presence of adequate link padding or bandwidth
limiting, even collaborating observers can gain no useful information about
connections within the network. They may be able to gain information
by observing connections to the network (in the remote-COR configuration),
but again this is less than what the COR to which such a connection is made
can learn. Thus, by considering adversaries consisting of
collections of compromised CORs we cover the worst case of all
combinations of basic adversaries. Our analysis focuses on this most
capable adversary, one or more compromised CORs.

The possible distributions of adversaries are
\begin{itemize}
\item{\bf single adversary}
\item{\bf multiple adversary:} A fixed, randomly distributed subset of
Tor-nodes is compromised.
\item{\bf roving adversary:} A fixed-bound size subset of Tor-nodes is
compromised at any one time. At specific intervals, other CORs can
become compromised or uncompromised.
\item{\bf global adversary:} All nodes are compromised.
\end{itemize}

Onion Routing provides no protection against a global adversary. If
all the CORs are compromised, they can know exactly who is talking to
whom. The content of what was sent will be revealed as it emerges
from the OR network, unless it has been end-to-end encrypted outside the
OR network. Even a firewall-to-firewall connection is exposed
if, as assumed above, our goal is to hide which local-COR is talking to
which local-COR.

\SubSection{Known attacks against low-latency anonymity systems}
\label{subsec:known-attacks}

We discuss each of these attacks in more detail below, along with the
aspects of the Tor design that provide defense. We provide a summary
of the attacks and our defenses against them in Section \ref{sec:attacks}.

Passive attacks:
simple observation,
timing correlation,
size correlation,
option distinguishability.

Active attacks:
key compromise,
iterated subpoena,
run recipient,
run a hostile node,
compromise entire path,
selectively DOS servers,
introduce timing into messages,
directory attacks,
tagging attacks.

\Section{Design goals and assumptions}
\label{sec:assumptions}

\SubSection{Goals}
% Are these really our goals? ;) -NM
Like other low-latency anonymity designs, Tor seeks to frustrate
attackers from linking communication partners, or from linking
multiple communications to or from a single point. Within this
overriding goal, however, several design considerations have directed
Tor's evolution.

First, we have tried to build a {\bf deployable} system. [XXX why?]
This requirement precludes designs that are expensive to run (for
example, by requiring more bandwidth than volunteers are willing to
provide); designs that place a heavy liability burden on operators
(for example, by allowing attackers to implicate operators in illegal
activities); and designs that are difficult or expensive to implement
(for example, by requiring kernel patches to many operating systems,
or ).

Second, the system must be {\bf usable}. A hard-to-use system has
fewer users---and because anonymity systems hide users among users, a
system with fewer users provides less anonymity. Thus, usability is
not only a convenience, but is a security requirement for anonymity
systems.

Third, the protocol must be {\bf extensible}, so that it can serve as
a test-bed for future research in low-latency anonymity systems.
(Note that while an extensible protocol benefits researchers, there is
a danger that differing choices of extensions will render users
distinguishable. Thus, implementations should not permit different
protocol extensions to coexist in a single deployed network.)

Fourth, the protocol's design and security parameters must be {\bf
conservative}. Additional features impose implementation and
complexity costs. [XXX Say that we don't want to try to come up with
speculative solutions to problems we don't KNOW how to solve? -NM]

[XXX mention something about robustness? But we really aren't that
robust. We just assume that tunneled protocols tolerate connection
loss. -NM]

\SubSection{Non-goals}
In favoring conservative, deployable designs, we have explicitly
deferred a number of goals, not because they are undesirable in
anonymity systems, but because they are either solved
elsewhere or an area of active research without a generally accepted
solution.

Unlike Tarzan or MorphMix, Tor does not attempt to scale to completely
decentralized peer-to-peer environments with thousands of short-lived
servers.

Tor does not claim to provide a definitive solution to end-to-end
timing or intersection attacks for users who do not run their own
Onion Routers.

Tor does not provide ``protocol normalization'' like the Anonymizer,
Privoxy, or XXX. In order to provide client indistinguishability for
complex and variable protocols such as HTTP, Tor must be layered with
a proxy such as Privoxy or XXX. Similarly, Tor does not currently
integrate tunneling for non-stream-based protocols; this too must be
provided by an external service.

Tor is not steganographic. It doesn't try to conceal which users are
sending or receiving communications via Tor.

\SubSection{Assumptions}
\begin{tightlist}
\item Threat model
\item Mostly reliable nodes: not trusted.
\item Small group of trusted dirserv ops
\item Many users of diff bandwidth come and go.
\end{tightlist}

[XXX what else?]

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\Section{The Tor Design}
\label{sec:design}

\Section{Other design decisions}

\SubSection{Exit policies and abuse}
\label{subsec:exitpolicies}

\SubSection{Directory Servers}
\label{subsec:dir-servers}

\Section{Rendezvous points: location privacy}
\label{sec:rendezvous}

Rendezvous points are a building block for \emph{location-hidden services}
(aka responder anonymity) in the Tor network. Location-hidden
services mean that Bob can offer a TCP service, such as an Apache webserver,
without revealing the IP address of that service.

We provide this censorship resistance for Bob by allowing him to
advertise several onion routers (his \emph{Introduction Points}) as his
public location. Alice, the client, chooses a node for her \emph{Meeting
Point}. She connects to one of Bob's introduction points, informs him
about her meeting point, and then waits for him to connect to the meeting
point. This extra level of indirection means Bob's introduction points
don't open themselves up to abuse by serving files directly, e.g., if Bob
chooses a node in France to serve material distasteful to the French. The
extra level of indirection also allows Bob to respond to some requests
and ignore others.

We provide the necessary glue so that Alice can view webpages from Bob's
location-hidden webserver with minimal invasive changes. Both Alice and
Bob must run local onion proxies.

The steps of a rendezvous:
\begin{tightlist}
\item Bob chooses some Introduction Points, and advertises them on a
Distributed Hash Table (DHT).
\item Bob establishes onion routing connections to each of his
Introduction Points, and waits.
\item Alice learns about Bob's service out of band (perhaps Bob told her,
or she found it on a website). She looks up the details of Bob's
service from the DHT.
\item Alice chooses and establishes a Meeting Point (MP) for this
transaction.
\item Alice goes to one of Bob's Introduction Points, and gives it a blob
(encrypted for Bob) which tells him about herself, the Meeting Point
she chose, and the first half of an ephemeral key handshake. The
Introduction Point sends the blob to Bob.
\item Bob chooses whether to ignore the blob, or to onion route to MP.
Let's assume the latter.
\item MP plugs together Alice and Bob. Note that MP can't recognize Alice,
Bob, or the data they transmit (they share a session key).
\item Alice sends a Begin cell along the circuit. It arrives at Bob's
onion proxy. Bob's onion proxy connects to Bob's webserver.
\item Data goes back and forth as usual.
\end{tightlist}
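
The control flow of the steps above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the class names, the service identifier, the token values, and the use of a plain dictionary in place of a blob encrypted to Bob are all hypothetical, and no networking or cryptography is performed.

```python
import os


class MeetingPoint:
    """Joins two circuits that present the same cookie (step 7).

    It only ever sees opaque circuit labels and cookies.
    """

    def __init__(self):
        self.pending = {}  # cookie -> label of the waiting circuit
        self.joined = []   # (waiting circuit, connecting circuit) pairs

    def wait(self, cookie, circuit):
        self.pending[cookie] = circuit

    def join(self, cookie, circuit):
        waiting = self.pending.pop(cookie, None)
        if waiting is None:
            return False
        self.joined.append((waiting, circuit))
        return True


class IntroductionPoint:
    """Forwards blobs to the service registered with it (step 5)."""

    def __init__(self):
        self.services = {}  # service id -> handler callable

    def register(self, service_id, handler):
        self.services[service_id] = handler

    def introduce(self, service_id, blob):
        handler = self.services.get(service_id)
        return handler(blob) if handler else False


def run_rendezvous(authorized=True):
    """Walk through steps 1-7; return the circuit pairs the MP joined."""
    dht = {}
    intro, mp = IntroductionPoint(), MeetingPoint()
    meeting_points = {"mp1": mp}  # hypothetical name for the chosen MP

    def bob_handler(blob):
        # Step 6: Bob ignores the blob, or onion routes to the MP.
        if blob["token"] != "expected-token":
            return False
        return meeting_points[blob["meeting_point"]].join(blob["cookie"], "circ-b")

    # Steps 1-2: Bob advertises his Introduction Points and connects to them.
    dht["bob-service"] = [intro]
    intro.register("bob-service", bob_handler)

    # Step 3: Alice looks up the details of Bob's service in the DHT.
    intro_points = dht["bob-service"]

    # Step 4: Alice chooses and establishes a Meeting Point.
    cookie = os.urandom(8)
    mp.wait(cookie, "circ-a")

    # Step 5: Alice hands a blob to an Introduction Point.
    # Assumption: a plain dict stands in for the blob encrypted to Bob.
    blob = {
        "meeting_point": "mp1",
        "cookie": cookie,
        "token": "expected-token" if authorized else "wrong-token",
    }
    intro_points[0].introduce("bob-service", blob)
    return mp.joined
```

When Bob accepts the request, the Meeting Point joins the two circuits; when he ignores it, nothing is joined and Alice's circuit simply keeps waiting.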

When establishing an introduction point, Bob provides the onion router
with a public ``introduction'' key. The hash of this public key
identifies a unique service, and (since Bob is required to sign his
messages) prevents anybody else from usurping Bob's introduction point
in the future. Bob uses the same public key when establishing the other
introduction points for that service.

The blob that Alice gives the introduction point includes a hash of Bob's
public key to identify the service, an optional initial authentication
token (the introduction point can do prescreening, e.g., to block replays),
and (encrypted to Bob's public key) the location of the meeting point,
a meeting cookie Bob should tell the meeting point so he gets connected to
Alice, an optional authentication token so Bob can choose whether to respond,
and the first half of a DH key exchange. When Bob connects to the meeting
point and gets connected to Alice's pipe, his first cell contains the
other half of the DH key exchange.
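
The two halves of the DH exchange can be sketched as follows. This is a toy under loud assumptions: the modulus and generator are illustrative values that are far too small for real use, the key derivation by hashing is a placeholder, and none of it is taken from the Tor design itself. Alice's public value travels inside the blob, and Bob's public value travels in his first cell through the meeting point.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus (a Mersenne prime); far too small for real use
G = 3           # toy generator; an assumption made for illustration


def derive_key(shared: int) -> bytes:
    """Hash the shared group element into a 32-byte session key (placeholder KDF)."""
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def dh_first_half():
    """Alice: pick a private exponent x and compute g^x mod p for the blob."""
    x = secrets.randbelow(P - 3) + 2  # uniform in 2..P-2
    return x, pow(G, x, P)


def dh_second_half(alice_public: int):
    """Bob: pick y, return g^y mod p for his first cell, plus the session key."""
    y = secrets.randbelow(P - 3) + 2
    return pow(G, y, P), derive_key(pow(alice_public, y, P))


def dh_finish(alice_private: int, bob_public: int) -> bytes:
    """Alice: combine her private exponent with Bob's half to get the same key."""
    return derive_key(pow(bob_public, alice_private, P))
```

Both sides end up with the same session key, while the introduction point and the meeting point only ever see the public halves. Because the exponents are ephemeral, discarding them after the session is what gives the exchange its forward secrecy.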
\SubSection{Integration with user applications}

For each service Bob offers, he configures his local onion proxy to know
the local IP and port of the server, a strategy for authorizing Alices,
and a public key. We assume the existence of a robust decentralized
efficient lookup system which allows authenticated updates, e.g.,
\cite{cfs:sosp01}. (Each onion router could run a node in this lookup
system; also note that as a stopgap measure, we can just run a simple
lookup system on the directory servers.) Bob publishes into the DHT
(indexed by the hash of the public key) the public key, an expiration
time (``not valid after''), and the current introduction points for that
service. Note that Bob's webserver is completely oblivious to the fact
that it's hidden behind the Tor network.

As far as Alice's experience goes, we require that her client interface
remain a SOCKS proxy, and we require that she shouldn't have to modify
her applications. Thus we encode all of the necessary information into
the hostname (more correctly, fully qualified domain name) that Alice
uses, e.g., when clicking on a URL in her browser. Location-hidden services
use the special top level domain called `.onion': thus hostnames take the
form x.y.onion where x encodes the hash of PK, and y is the authentication
cookie. Alice's onion proxy examines hostnames and recognizes when they're
destined for a hidden server. If so, it decodes the hash of PK and starts the
rendezvous as described in the list above.
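
The hostname convention can be illustrated with a short sketch. The text does not fix a hash function, a truncation length, or a text encoding, so the choices below (SHA-256 truncated to 10 bytes, lowercase base32 without padding) and all function names are assumptions made purely for illustration.

```python
import base64
import hashlib


def _b32(data: bytes) -> str:
    """Lowercase base32 without padding (an assumed encoding)."""
    return base64.b32encode(data).decode("ascii").rstrip("=").lower()


def _unb32(label: str) -> bytes:
    """Inverse of _b32: restore the padding, then decode."""
    padded = label.upper() + "=" * (-len(label) % 8)
    return base64.b32decode(padded)


def service_id(public_key: bytes) -> bytes:
    """Hash of the service's public key, truncated to 10 bytes (assumed)."""
    return hashlib.sha256(public_key).digest()[:10]


def make_hostname(public_key: bytes, cookie: bytes) -> str:
    """Build x.y.onion: x encodes the key hash, y the authentication cookie."""
    return f"{_b32(service_id(public_key))}.{_b32(cookie)}.onion"


def parse_hostname(hostname: str):
    """Return (key hash, cookie) for an x.y.onion name, else None."""
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) != 3 or labels[2] != "onion":
        return None  # ordinary hostname: handle it as a normal request
    return _unb32(labels[0]), _unb32(labels[1])
```

An onion proxy following this sketch would call the parser on every requested hostname, use the recovered key hash as the lookup index for the service details, and pass ordinary hostnames through unchanged.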
\SubSection{Previous rendezvous work}

Ian Goldberg developed a similar notion of rendezvous points for
low-latency anonymity systems \cite{ian-thesis}. His ``service tag''
is the same concept as our ``hash of service's public key''. We make it
a hash of the public key so it can be self-authenticating, and so the
client can recognize the same service with confidence later on. His
design differs from ours in the following ways, though. Firstly, Ian
suggests that the client should manually hunt down a current location of
the service via Gnutella, whereas our use of the DHT makes lookup faster,
more robust, and transparent to the user. Secondly, in our design the client
and server can share ephemeral DH keys, so at no point in the path is the
plaintext exposed. Thirdly, our design is much more practical for deployment in a
volunteer network, in terms of getting volunteers to offer introduction
and meeting point services. The introduction points do not output any
bytes to the clients, and the meeting points don't know the client,
the server, or the data being transmitted. The indirection scheme
is also designed with authentication/authorization in mind -- if the
client doesn't include the right cookie with its request for service,
the server doesn't even acknowledge its existence.

\Section{Maintaining anonymity sets}
|
|
\label{sec:maintaining-anonymity}
|
|
|
|
Packet counting attacks work well against initiators, so we need some
level of obfuscation to resist them: standard link padding against passive
link observers, and long-range padding against adversaries who own the
first hop. Do we have any defense against adversaries who insert timing
signatures into a user's traffic?

Regardless of link padding from Alice to the cloud, there will be
times when Alice is simply not online. Link padding, whether at the edges
or inside the cloud, does not help with this.

How often should we pull down directories? How often should we send
updated server descriptors?

When we start up the client, should we build a circuit immediately,
or should the default be to build a circuit only on demand? Should we
fetch a directory immediately?

Would we benefit from greater synchronization, to blend with the other
users? Would the reduced speed hurt us more?

Does the idea that ``you cannot see when I am starting or ending a stream
because you cannot tell what sort of relay cell it is'' work, or is it just
a distraction?

Does running a server actually give you better protection, because traffic
coming from your node could plausibly have come from elsewhere? How
much mixing do you need before this is actually plausible, or is it
immediately beneficial because many adversaries cannot see your node?

Do different exit policies at different exit nodes trash anonymity sets,
or do they not affect them much?

Do we get better protection against a realistic adversary by having as
many nodes as possible, so that he probably cannot see the whole network,
or by having a small number of nodes that mix traffic well? Is a
cascade topology a more realistic way to get defenses against traffic
confirmation? Does the hydra (many inputs, few outputs) topology work
better? Are we going to get a hydra anyway because most nodes will be
middleman nodes?

Using a circuit many times is good because it requires less CPU work,
and good because path rebuilding enables predecessor attacks.
It is bad because predecessor attacks can be more likely to link you with a
previous circuit since you are so verbose, and
bad because each thing you do on that circuit is linked to the other
things you do on that circuit.

Because Tor runs over TCP, when one of the servers goes down it seems
that all the circuits (and thus streams) going over that server must
break. This reduces anonymity because everybody needs to reconnect
right then (does it? how much?) and because exit connections all break
at the same time, and it also reduces usability. It seems the problem
is even worse in a peer-to-peer environment, because so far such systems
do not really provide an incentive for nodes to stay connected when they are
done browsing, so we would expect a much higher churn rate than for
onion routing. Are there ways of allowing streams to survive the loss
of a node in the path?
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Attacks and Defenses}
\label{sec:attacks}
Below we summarize a variety of attacks and how well our design withstands
them.
\begin{enumerate}
\item \textbf{Passive attacks}
\begin{itemize}
\item \emph{Simple observation.}
\item \emph{Timing correlation.}
\item \emph{Size correlation.}
\item \emph{Option distinguishability.}
\end{itemize}

\item \textbf{Active attacks}
\begin{itemize}
\item \emph{Key compromise.}
\item \emph{Iterated subpoena.}
\item \emph{Run recipient.}
\item \emph{Run a hostile node.}
\item \emph{Compromise entire path.}
\item \emph{Selectively DoS servers.}
\item \emph{Introduce timing into messages.}
\item \emph{Tagging attacks.}
\end{itemize}

\item \textbf{Directory attacks}
\begin{itemize}
\item foo
\end{itemize}

\end{enumerate}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Future Directions and Open Problems}
\label{sec:conclusion}
Tor brings together many innovations into
a unified deployable system. However, several attacks still
work quite well, and a number of sustainability and run-time
issues remain to be ironed out. In particular:
\begin{itemize}
\item \emph{Scalability:} Since Tor's emphasis currently is on simplicity
of design and deployment, the current design will not easily handle more
than a few hundred servers, because of its clique topology. Restricted-route
topologies \cite{danezis:pet2003} promise comparable anonymity
with much better scaling properties, but we must solve problems such as
how to form the network randomly without introducing new attacks.
% cascades are a restricted route topology too. we must mention
% earlier why we're not satisfied with the cascade approach.
\item \emph{Cover traffic:} Currently we avoid cover traffic because
it introduces clear performance and bandwidth costs, and its
security properties are not well understood. With more research
\cite{SS03,defensive-dropping}, the price/value ratio may change, both for
link-level cover traffic and for long-range cover traffic. In particular,
we expect restricted-route topologies to reduce the cost of cover traffic
because there are fewer links to cover.
\item \emph{Better directory distribution:} Even with the threshold
directory agreement algorithm described in Section~\ref{sec:dirservers},
the directory servers are still trust bottlenecks. We must find more
decentralized yet practical ways to distribute up-to-date snapshots of
network status without introducing new attacks.
\item \emph{Implementing location-hidden servers:} While
Section~\ref{sec:rendezvous} provides a design for rendezvous points and
location-hidden servers, this feature has not yet been implemented.
We will likely encounter additional issues, both in terms of usability
and anonymity, that must be resolved.
\item \emph{Wider-scale deployment:} The original goal of Tor was to
gain experience in deploying an anonymizing overlay network, and to learn
from having actual users. We are now at the point where we can start
deploying a wider network and learning from its users.
\end{itemize}
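As a rough illustration of the scalability and cover-traffic points above
(a back-of-the-envelope sketch, not a measurement; the symbols $n$, $d$,
$L_{\mathrm{clique}}$, and $L_{\mathrm{restricted}}$ are introduced here for
illustration only), suppose that in a clique of $n$ servers every pair of
servers may hold a link, whereas in a restricted-route topology each server
links to only $d$ neighbors. The number of links is then
\[
L_{\mathrm{clique}} = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad
L_{\mathrm{restricted}} = \frac{nd}{2}.
\]
For example, with $n = 300$ the clique has $44{,}850$ potential links, while
an assumed degree of $d = 10$ gives $1{,}500$. Under these assumptions the
clique grows quadratically in $n$ and the restricted-route topology only
linearly, which is also why link-level cover traffic would have fewer links
to cover.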
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\Section{Acknowledgments}
%% commented out for anonymous submission
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\bibliographystyle{latex8}
\bibliography{tor-design}
\end{document}
% Style guide:
% U.S. spelling
% avoid contractions (it's, can't, etc.)
% 'mix', 'mixes' (as noun)
% 'mix-net'
% 'mix', 'mixing' (as verb)
% 'Mixminion Project'
% 'Mixminion' (meaning the protocol suite or the network)
% 'Mixmaster' (meaning the protocol suite or the network)
% 'middleman' [Not with a hyphen; the hyphen has been optional
% since Middle English.]
% 'nymserver'
% 'Cypherpunk', 'Cypherpunks', 'Cypherpunk remailer'
%
% 'Whenever you are tempted to write 'Very', write 'Damn' instead, so
% your editor will take it out for you.' -- Misquoted from Mark Twain