\documentclass[times,10pt,twocolumn]{article}
\usepackage{latex8}
\usepackage{times}
\usepackage{url}
\usepackage{graphics}
\usepackage{amsmath}

\pagestyle{empty}

\renewcommand\url{\begingroup \def\UrlLeft{<}\def\UrlRight{>}\urlstyle{tt}\Url}
\newcommand\emailaddr{\begingroup \def\UrlLeft{<}\def\UrlRight{>}\urlstyle{tt}\Url}

% If an URL ends up with '%'s in it, that's because the line *in the .bib/.tex
% file* is too long, so break it there (it doesn't matter if the next line is
% indented with spaces). -DH

%\newif\ifpdf
%\ifx\pdfoutput\undefined
% \pdffalse
%\else
% \pdfoutput=1
% \pdftrue
%\fi

\newenvironment{tightlist}{\begin{list}{$\bullet$}{
\setlength{\itemsep}{0mm}
\setlength{\parsep}{0mm}
% \setlength{\labelsep}{0mm}
% \setlength{\labelwidth}{0mm}
% \setlength{\topsep}{0mm}
}}{\end{list}}

\begin{document}

%% Use dvipdfm instead. --DH
%\ifpdf
% \pdfcompresslevel=9
% \pdfpagewidth=\the\paperwidth
% \pdfpageheight=\the\paperheight
%\fi

\title{Tor: The Second-Generation Onion Router}
% Putting the 'Private' back in 'Virtual Private Network'

%\author{Roger Dingledine \\ The Free Haven Project \\ arma@freehaven.net \and
%Nick Mathewson \\ The Free Haven Project \\ nickm@freehaven.net \and
%Paul Syverson \\ Naval Research Lab \\ syverson@itd.nrl.navy.mil}

\maketitle
\thispagestyle{empty}

\begin{abstract}
We present Tor, a circuit-based low-latency anonymous communication
system. Tor is the successor to Onion Routing
and addresses many limitations in the original Onion Routing design.
Tor works in a real-world Internet environment, requires no special
privileges such as root- or kernel-level access,
requires little synchronization or coordination between nodes, and
provides a reasonable tradeoff between anonymity, usability, and efficiency.
We include a new practical design for rendezvous points, as well
as a list of open problems.
\end{abstract}

%\begin{center}
%\textbf{Keywords:} anonymity, peer-to-peer, remailer, nymserver, reply block
%\end{center}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\Section{Overview}
\label{sec:intro}

Onion Routing is a distributed overlay network designed to anonymize
low-latency TCP-based applications such as web browsing, secure shell,
and instant messaging. Clients choose a path through the network and
build a \emph{virtual circuit}, in which each node (or ``onion router'')
in the path knows its
predecessor and successor, but no others. Traffic flowing down the circuit
is sent in fixed-size \emph{cells}, which are unwrapped by a symmetric key
at each node (like the layers of an onion) and relayed downstream. The
original Onion Routing project published several design and analysis
papers
\cite{or-ih96,or-jsac98,or-discex00,or-pet00}. While
a wide-area Onion Routing network was deployed for some weeks,
the only long-running and publicly accessible
implementation was a fragile proof-of-concept that ran on a single
machine.
% (which nonetheless processed several tens of thousands of connections
%daily from thousands of global users).
%%Do we really want to say this? It softens our motivation for the paper. -RD
%
% In general, I try to emphasize rather than understate past
% accomplishments so I am giving an accurate comparison,
% which strengthens the claims in the paper. This is true whether
% it is my work or someone else's.
% This is also the only experimental basic viability result we
% can point to for Onion Routing in general at this point. -PS
Many critical design and deployment issues were never resolved,
and the design has not been updated in several years.
Here we describe Tor, a protocol for asynchronous, loosely
federated onion routers that provides the following improvements over
the old Onion Routing design:

\begin{tightlist}

\item \textbf{Perfect forward secrecy:} The original Onion Routing
design was vulnerable to a single hostile node recording traffic and later
compromising successive nodes in the circuit and forcing them to
decrypt it.
Rather than using a single onion to lay each circuit,
Tor now uses an incremental or \emph{telescoping}
path-building design, where the initiator negotiates session keys with
each successive hop in the circuit. Once these keys are deleted,
subsequently compromised nodes cannot decrypt old traffic.
As a side benefit, onion replay detection is no longer
necessary, and the process of building circuits is more reliable, since
the initiator knows when a hop fails and can then try extending to a new node.

% Perhaps mention that not all of these are things that we invented. -NM

\item \textbf{Separation of protocol cleaning from anonymity:}
The original Onion Routing design required a separate ``application
proxy'' for each
supported application protocol---most
of which were never written, so many applications were never supported.
Tor uses the standard and near-ubiquitous SOCKS
\cite{socks4,socks5} proxy interface, allowing us to support most TCP-based
programs without modification. This design change allows Tor to
use the filtering features of privacy-enhancing
application-level proxies such as Privoxy \cite{privoxy} without having to
incorporate those features itself.

\item \textbf{Many TCP streams can share one circuit:} The original
Onion Routing design built a separate circuit for each application-level
request.
This hurt performance by requiring multiple public key operations for
every request, and also presented
a threat to anonymity from building so many different circuits; see
Section~\ref{sec:maintaining-anonymity}.
Tor multiplexes multiple TCP streams along each virtual
circuit, to improve efficiency and anonymity.

\item \textbf{No mixing, padding, or traffic shaping:} The original
Onion Routing design called for batching and reordering the cells arriving
from each circuit, and for padding between onion routers and,
in a later design, between onion
proxies (that is, users) and onion routers \cite{or-ih96,or-jsac98}.
The tradeoff between padding protection and cost was discussed, but no
general padding scheme was suggested. In
\cite{or-pet00} it was theorized that \emph{traffic shaping} would generally
be used, but details were not provided.
Recent research \cite{econymics} and deployment
experience \cite{freedom21-security} suggest that this level of resource
use is not practical or economical; and even full link padding is still
vulnerable \cite{defensive-dropping}. Thus, until we have a proven and
convenient design for traffic shaping or low-latency mixing that
will improve anonymity against a realistic adversary, we leave these
strategies out.

\item \textbf{Leaky-pipe circuit topology:} Through in-band
signalling within the
circuit, Tor initiators can direct traffic to nodes partway down the
circuit. This allows for long-range padding to frustrate traffic
shape and volume attacks at the initiator \cite{defensive-dropping}.
Because circuits are used by more than one application, it also
allows traffic to exit the circuit from the middle---thus
frustrating traffic shape and volume attacks based on observing the
end of the circuit.

\item \textbf{Congestion control:} Earlier anonymity designs do not
address traffic bottlenecks. Unfortunately, typical approaches to load
balancing and flow control in overlay networks involve inter-node control
communication and global views of traffic. Tor's decentralized congestion
control uses end-to-end acks to maintain reasonable anonymity while
allowing nodes
at the edges of the network to detect congestion or flooding attacks
and send less data until the congestion subsides.

\item \textbf{Directory servers:} The original Onion Routing design
planned to flood link-state information through the network---an
approach that can be unreliable and is
open to partitioning attacks or outright deception. Tor takes a simplified
view towards distributing link-state information. Certain more trusted
onion routers also act as directory servers: they provide signed
\emph{directories} that describe the routers they know about and mark
those that
are currently up. Users periodically download these directories via HTTP.

\item \textbf{End-to-end integrity checking:} The original Onion Routing
design did no integrity checking on data. Any onion router on the circuit
could change the contents of cells as they pass by---for example, to
redirect a
connection on the fly so it connects to a different webserver, or to
tag encrypted traffic and look for the tagged traffic at the network
edges \cite{minion-design}. Tor hampers these attacks by checking data
integrity before it leaves the network.

\item \textbf{Robustness to failed nodes:} A failed node in the old design
meant that circuit-building failed, but thanks to Tor's step-by-step
circuit building, users can notice failed
nodes while building circuits and route around them. Additionally,
liveness information from directories allows users to avoid
unreliable nodes in the first place.
%We further provide a
%simple mechanism that allows connections to be established despite recent
%node failure or slightly dated information from a directory server. Tor
%permits onion routers to have \emph{router twins}---nodes that share
%the same private decryption key. Note that because connections now have
%perfect forward secrecy, an onion router still cannot read the traffic
%on a connection established through its twin even while that connection
%is active. Also, which nodes are twins can change dynamically depending
%on current circumstances, and twins may or may not be under the same
%administrative authority.
%
%[Commented out; Router twins provide no real increase in robustness
%to failed nodes. If a non-twinned node goes down, the
%circuit-builder notices this and routes around it. Circuit-building
%is offline, so there shouldn't even be a latency hit. -NM]

\item \textbf{Variable exit policies:} Tor provides a consistent
mechanism for
each node to specify and advertise a policy describing the hosts and
ports to which it will connect. These exit policies
are critical in a volunteer-based distributed infrastructure, because
different operators are comfortable with allowing different types of traffic
to exit the Tor network from their nodes.

\item \textbf{Implementable in user-space:} Unlike anonymity systems
such as Freedom \cite{freedom2-arch}, Tor only attempts to anonymize TCP
streams. Thus it does not require patches to an operating system's network
stack (or built-in support) to operate. Although this approach is less
flexible, it has proven valuable to Tor's portability and deployability.

\item \textbf{Rendezvous points and location-protected servers:}
Tor provides an integrated mechanism for responder anonymity via
location-protected servers. Previous Onion Routing designs included
long-lived ``reply onions'' which could be used to build virtual circuits
to a hidden server, but a reply onion becomes useless if any node in
the path goes down or rotates its keys, and it also does not provide
forward secrecy. In Tor's current design, clients negotiate
\emph{rendezvous points} to connect with hidden servers; reply onions are no
longer required.
\end{tightlist}

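The layered encryption that this circuit design relies on can be
sketched as follows. The notation is ours and is meant only as an
illustration, not as the protocol specification. Consider a three-hop
circuit through onion routers $R_1$, $R_2$, $R_3$, with which the
initiator has negotiated symmetric session keys $k_1$, $k_2$, $k_3$.
Writing $E_k$ and $D_k$ for encryption and decryption under key $k$,
a payload $m$ leaves the initiator as $c_0$ and loses one layer at
each hop:
\begin{align*}
c_0 &= E_{k_1}\bigl(E_{k_2}\bigl(E_{k_3}(m)\bigr)\bigr), \\
c_1 &= D_{k_1}(c_0) = E_{k_2}\bigl(E_{k_3}(m)\bigr), \\
c_2 &= D_{k_2}(c_1) = E_{k_3}(m), \\
c_3 &= D_{k_3}(c_2) = m.
\end{align*}
Because each $k_i$ is a session key negotiated for this circuit, deleting
it when the circuit closes means that a later compromise of $R_i$ yields
nothing that decrypts a recorded copy of $c_{i-1}$.
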
We have implemented most of the above features. Our source code is
available under a free license, and is not encumbered by patents. We have
recently begun deploying a wide-area alpha network to see how well the
design works in practice, to get more experience with usability and users,
and to provide a research platform for experimenting with new ideas.

We review previous work in Section~\ref{sec:related-work}, describe
our goals and assumptions in Section~\ref{sec:assumptions},
and then address the above list of improvements in
Sections~\ref{sec:design}--\ref{sec:rendezvous}. We
summarize in Section~\ref{sec:analysis}
how our design stands up to known attacks, and conclude with a list of
open problems.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\Section{Related work}
\label{sec:related-work}

Modern anonymity designs date to Chaum's Mix-Net~\cite{chaum-mix} design of
1981. Chaum proposed hiding the link between sender and recipient by wrapping
messages in layers of public key cryptography, and relaying them
through a path composed of ``mixes.'' These mixes in turn decrypt, delay,
and re-order messages, before relaying them along the sender-selected
path towards their destinations.

Subsequent relay-based anonymity designs have diverged in two
principal directions. Some have attempted to maximize anonymity at
the cost of introducing comparatively large and variable latencies,
for example, Babel~\cite{babel}, Mixmaster~\cite{mixmaster-spec}, and
Mixminion~\cite{minion-design}. Because of this
trade-off, these \emph{high-latency} networks are well suited to anonymous
email, but introduce too much lag for interactive tasks such as web browsing,
internet chat, or SSH connections.

Tor belongs to the second category: \emph{low-latency} designs that attempt
to anonymize interactive network traffic. Because these protocols typically
involve a large number of packets that must be delivered quickly, it is
difficult for them to prevent an attacker who can eavesdrop on both ends of the
communication from correlating the timing and volume
of traffic entering the anonymity network with traffic leaving it. These
protocols are also vulnerable to active attacks in which an
adversary introduces timing patterns into traffic entering the network, and
looks
for correlated patterns among exiting traffic.
Although some work has been done to frustrate
these attacks,\footnote{
The most common approach is to pad and limit communication to a constant
rate, or to limit
the variation in traffic shape. Doing so can have prohibitive bandwidth
costs and/or performance limitations.
} most designs protect primarily against traffic analysis rather than traffic
confirmation \cite{or-jsac98}---that is, they assume that the attacker is
attempting to learn who is talking to whom, not to confirm a prior suspicion
about who is talking to whom.

The simplest low-latency designs are single-hop proxies such as the
Anonymizer \cite{anonymizer}, wherein a single trusted server strips the
data's origin before relaying it. These designs are easy to
analyze, but require end-users to trust the anonymizing proxy.
Concentrating the traffic at a single point increases the anonymity set
(the set of people a given user is hiding among), but it can make traffic
analysis easier: an adversary need only eavesdrop on the proxy to observe
the entire system.

More complex are distributed-trust, circuit-based anonymizing systems.
In these designs, a user establishes one or more medium-term bidirectional
end-to-end circuits, and tunnels TCP streams in fixed-size cells.
Establishing circuits is expensive and typically requires public-key
cryptography, whereas relaying cells is comparatively inexpensive.
Because a circuit crosses several servers, no single server can link a
user to her communication partners.

The Java Anon Proxy (also known
as JAP or Web MIXes) uses fixed shared routes known as
\emph{cascades}. As with a single-hop proxy, this approach aggregates
users into larger anonymity sets, but again an attacker only needs to
observe both ends of the cascade to bridge all the system's traffic.
The Java Anon Proxy's design provides protection by padding
between end users and the head of the cascade \cite{web-mix}. However, the
current implementation does no padding and thus remains vulnerable
to both active and passive bridging.
%XXX fix, yes it does, sort of.

PipeNet \cite{back01, pipenet}, another low-latency design proposed at
about the same time as the original Onion Routing design, provided
stronger anonymity at the cost of allowing a single user to shut
down the network simply by not sending. Low-latency anonymous
communication has also been designed for other environments such as
ISDN \cite{isdn-mixes}.

In P2P designs like Tarzan \cite{tarzan:ccs02} and MorphMix
\cite{morphmix:fc04}, all participants both generate traffic and relay
traffic for others. Rather than aiming to hide the originator within a
group of other originators, these systems instead aim to prevent a peer
or observer from knowing whether a given peer originated the request
or just relayed it from another peer. While Tarzan and MorphMix use
layered encryption as above, Crowds \cite{crowds-tissec} simply assumes
an adversary who cannot observe the initiator: it uses no public-key
encryption, so nodes on a circuit can read that circuit's traffic. The
anonymity of the initiator relies on filtering all identifying information
from the data stream.

Hordes \cite{hordes-jcs} is based on Crowds but also uses multicast
responses to hide the initiator. Herbivore \cite{herbivore} and P5
\cite{p5} go even further, requiring broadcast. They make anonymity
and efficiency tradeoffs to make broadcast more practical.
These systems are designed primarily for communication between peers,
although Herbivore users can make external connections by
requesting a peer to serve as a proxy. Allowing easy connections to
nonparticipating responders or recipients is important for usability,
for example so users can visit nonparticipating Web sites or exchange
mail with nonparticipating recipients.

Systems like Freedom and the original Onion Routing build the circuit
all at once, using a layered ``onion'' of public-key encrypted messages,
each layer of which provides a set of session keys and the address of the
next server in the circuit. Tor as described herein, Tarzan, MorphMix,
Cebolla \cite{cebolla}, and AnonNet \cite{anonnet} build the circuit
in stages, extending it one hop at a time. This approach makes perfect
forward secrecy feasible.

Circuit-based anonymity designs must choose which protocol layer
to anonymize. They may choose to intercept IP packets directly, and
relay them whole (stripping the source address) as the contents of
the circuit \cite{tarzan:ccs02,freedom2-arch}. Alternatively, like
Tor, they may accept TCP streams and relay the data in those streams
along the circuit, ignoring the breakdown of that data into TCP segments
\cite{anonnet,morphmix:fc04}. Finally, they may accept application-level
protocols (such as HTTP) and relay the application requests themselves
along the circuit.
This protocol-layer decision represents a compromise between flexibility
and anonymity. For example, a system that understands HTTP can strip
identifying information from those requests; can take advantage of caching
to limit the number of requests that leave the network; and can batch
or encode those requests in order to minimize the number of connections.
On the other hand, an IP-level anonymizer can handle nearly any protocol,
even ones unforeseen by its designers (though these systems require
kernel-level modifications to some operating systems, and so are more
complex and less portable). TCP-level anonymity networks like Tor present
a middle approach: they are fairly application neutral (so long as the
application supports, or can be tunneled across, TCP), but by treating
application connections as data streams rather than raw TCP packets,
they avoid the well-known inefficiencies of tunneling TCP over TCP
\cite{tcp-over-tcp-is-bad}.
% XXX what's a better cite?

Distributed-trust anonymizing systems need to prevent attackers from
adding too many servers and thus compromising too many user paths.
Tor relies on a centrally maintained set of well-known servers. Tarzan
and MorphMix allow unknown users to run servers, and limit how much of
the network an attacker can control by tying participation to a scarce
resource such as the number of IP addresses controlled. Crowds suggests
requiring written, notarized
requests from potential crowd members.

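A rough estimate shows why the attacker's share of servers matters.
The model here is an assumption made purely for illustration: a network
of $n$ servers of which the attacker controls $c$, with the first and
last server of each circuit chosen uniformly at random and independently.
The attacker then sees both ends of a given circuit with probability
\[
  \Pr[\text{first and last server hostile}] = \Bigl(\frac{c}{n}\Bigr)^{2}.
\]
With $c = 10$ hostile servers out of $n = 100$, about $1\%$ of circuits
are exposed at both ends; doubling $c$ to $20$ raises this to $4\%$, so
under this model exposure grows quadratically with the attacker's share
of the network.
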
Anonymous communication is an essential component of censorship-resistant
systems like Eternity \cite{eternity}, Free Haven \cite{freehaven-berk},
Publius \cite{publius}, and Tangler \cite{tangler}. Tor's rendezvous
points enable connections between mutually anonymous entities; they
are a building block for location-hidden servers, which are needed by
Eternity and Free Haven.

% didn't include rewebbers. No clear place to put them, so I'll leave
% them out for now. -RD

\Section{Design goals and assumptions}
\label{sec:assumptions}

\SubSection{Goals}
Like other low-latency anonymity designs, Tor seeks to prevent
attackers from linking communication partners, or from linking
multiple communications to or from a single user. Within this
main goal, however, several design considerations have directed
Tor's evolution.

\textbf{Deployability:} The design must be one that can be implemented,
deployed, and used in the real world. This requirement precludes designs
that are expensive to run (for example, by requiring more bandwidth
than volunteers are willing to provide); designs that place a heavy
liability burden on operators (for example, by allowing attackers to
implicate onion routers in illegal activities); and designs that are
difficult or expensive to implement (for example, by requiring kernel
patches, or separate proxies for every protocol). This requirement also
precludes systems in which users who do not benefit from anonymity are
required to run special software in order to communicate with anonymous
parties. (We do not meet this goal for the current rendezvous design,
however; see Section~\ref{sec:rendezvous}.)

\textbf{Usability:} A hard-to-use system has fewer users---and because
anonymity systems hide users among users, a system with fewer users
provides less anonymity. Usability is not only a convenience for Tor:
it is a security requirement \cite{econymics,back01}. Tor should not
require modifying applications; should not introduce prohibitive delays;
and should require the user to make as few configuration decisions
as possible.

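One simple way to quantify the link between user count and anonymity
uses an idealized assumption of ours, not a measurement of any deployed
network: an observer considers each of the $N$ users of the system
equally likely to be the initiator of a given connection. The observer
then guesses correctly with probability $1/N$, and the size of the
anonymity set can be expressed in bits as
\[
  H = \log_2 N.
\]
Under this assumption a network with $N = 1024$ users offers $H = 10$
bits, while one with $N = 32$ users offers only $5$ bits. Real users are
not equally likely suspects, so this is at best an upper bound.
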
\textbf{Flexibility:} The protocol must be flexible and well-specified,
so that it can serve as a test-bed for future research in low-latency
anonymity systems. Many of the open problems in low-latency anonymity
networks, such as generating dummy traffic or preventing Sybil attacks
\cite{sybil}, may be solvable independently from the issues solved by
Tor. We hope that future systems will not need to reinvent Tor's design
decisions. (But note that while a flexible design benefits researchers,
there is a danger that differing choices of extensions will make users
distinguishable. Experiments should be run on a separate network.)

\textbf{Conservative design:} The protocol's design and security
parameters must be conservative. Additional features impose implementation
and complexity costs; adding unproven techniques to the design threatens
deployability, readability, and ease of security analysis. Tor aims to
deploy a simple and stable system that integrates the best well-understood
approaches to protecting anonymity.

\SubSection{Non-goals}
\label{subsec:non-goals}
In favoring conservative, deployable designs, we have explicitly deferred
a number of goals, either because they are solved elsewhere, or because
they are open research questions.

\textbf{Not peer-to-peer:} Tarzan and MorphMix aim to scale to completely
decentralized peer-to-peer environments with thousands of short-lived
servers, many of which may be controlled by an adversary. This approach
is appealing, but still has many open problems.

\textbf{Not secure against end-to-end attacks:} Tor does not claim
to provide a definitive solution to end-to-end timing or intersection
attacks. Some approaches, such as running an onion router, may help;
see Section~\ref{sec:analysis} for more discussion.

\textbf{No protocol normalization:} Tor does not provide \emph{protocol
normalization} like Privoxy or the Anonymizer. For complex and variable
protocols such as HTTP, Tor must be layered with a filtering proxy such
as Privoxy to hide differences between clients, and expunge protocol
features that leak identity. Similarly, Tor does not currently integrate
tunneling for non-stream-based protocols like UDP; this too must be
provided by an external service.
% Actually, tunneling udp over tcp is probably horrible for some apps.
% Should this get its own non-goal bulletpoint? The motivation for
% non-goal-ness would be burden on clients / portability. -RD
% No, leave it as is. -RD

\textbf{Not steganographic:} Tor does not try to conceal which users are
sending or receiving communications; it only tries to conceal with whom
they communicate.

\SubSection{Threat Model}
|
|
|
|
\label{subsec:threat-model}
|
2003-10-21 23:44:00 +02:00
|
|
|
|
2003-10-30 13:10:24 +01:00
|
|
|
A global passive adversary is the most commonly assumed threat when
|
2003-11-02 07:14:59 +01:00
|
|
|
analyzing theoretical anonymity designs. But like all practical
|
|
|
|
low-latency systems, Tor does not protect against such a strong
|
|
|
|
adversary. Instead, we expect an adversary who can observe some fraction
|
|
|
|
of network traffic; who can generate, modify, delete, or delay traffic
|
|
|
|
on the network; who can operate onion routers of its own; and who can
|
|
|
|
compromise some fraction of the onion routers on the network.
|
|
|
|
|
|
|
|
%Large adversaries will be able to compromise a considerable fraction
|
|
|
|
%of the network. (In some circumstances---for example, if the Tor
|
|
|
|
%network is running on a hardened network where all operators have
|
|
|
|
%had background checks---the number of compromised nodes could be quite
|
|
|
|
%small.) Compromised nodes can arbitrarily manipulate the connections that
|
|
|
|
%pass through them, as well as creating new connections that pass through
|
|
|
|
%themselves. They can observe traffic, and record it for later analysis.
|
|
|
|
|
|
|
|
In low-latency anonymity systems that use layered encryption, the
|
|
|
|
adversary's typical goal is to observe both the initiator and the
|
|
|
|
receiver. Passive attackers can confirm a suspicion that Alice is
|
|
|
|
talking to Bob if the timing and volume properties of the traffic on the
|
|
|
|
connection are unique enough; active attackers are even more effective
|
|
|
|
because they can induce timing signatures on the traffic. Tor provides
|
|
|
|
some defenses against these \emph{traffic confirmation} attacks, for
|
|
|
|
example by encouraging users to run their own onion routers, but it does
|
|
|
|
not provide complete protection. Rather, we aim to prevent \emph{traffic
|
|
|
|
analysis} attacks, where the adversary uses traffic patterns to learn
|
|
|
|
which points in the network he should attack.
|
|
|
|
|
|
|
|
Our adversary might try to link an initiator Alice with any of her
|
|
|
|
communication partners, or he might try to build a profile of Alice's
|
|
|
|
behavior. He might mount passive attacks by observing the edges of the
|
|
|
|
network and correlating traffic entering and leaving the network---either
|
|
|
|
because of relationships in packet timing; relationships in the volume
|
|
|
|
of data sent; or relationships in any externally visible user-selected
|
|
|
|
options. The adversary can also mount active attacks by compromising
|
|
|
|
routers or keys; by replaying traffic; by selectively DoSing trustworthy
|
|
|
|
routers to encourage users to send their traffic through compromised
|
|
|
|
routers, or DoSing users to see if the traffic elsewhere in the
|
|
|
|
network stops; or by introducing patterns into traffic that can later be
|
|
|
|
detected. The adversary might attack the directory servers to give users
|
|
|
|
differing views of network state. Additionally, he can try to decrease
|
|
|
|
the network's reliability by attacking nodes or by performing antisocial
|
|
|
|
activities from reliable servers and trying to get them taken down;
|
|
|
|
making the network unreliable flushes users to other less anonymous
|
|
|
|
systems, where they may be easier to attack.
|
|
|
|
|
|
|
|
We consider each of these attacks in more detail below, and summarize
|
|
|
|
in Section~\ref{sec:attacks} how well the Tor design defends against
|
|
|
|
each of them.
|
2003-10-21 23:44:00 +02:00
|
|
|
|
2003-07-11 21:28:36 +02:00
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
|
|
|
|
|
|
\Section{The Tor Design}
|
|
|
|
\label{sec:design}
|
|
|
|
|
2003-10-25 13:41:26 +02:00
|
|
|
The Tor network is an overlay network; each node is called an onion router
|
2003-10-31 00:05:40 +01:00
|
|
|
(OR). Onion routers run as normal user-level processes without needing
|
|
|
|
any special
|
2003-11-01 09:48:12 +01:00
|
|
|
privileges. Currently, each OR maintains a long-term TLS \cite{TLS}
|
|
|
|
connection to every other
|
2003-10-31 00:05:40 +01:00
|
|
|
OR. (We examine some ways to relax this clique-topology assumption in
|
2003-10-31 07:16:21 +01:00
|
|
|
Section~\ref{subsec:restricted-routes}.) A subset of the ORs also act as
|
2003-10-25 13:41:26 +02:00
|
|
|
directory servers, tracking which routers are currently in the network;
|
2003-10-31 07:16:21 +01:00
|
|
|
see Section~\ref{subsec:dirservers} for directory server details. Users
|
2003-10-31 00:05:40 +01:00
|
|
|
run local software called an onion proxy (OP) to fetch directories,
|
|
|
|
establish paths (called \emph{virtual circuits}) across the network,
|
|
|
|
and handle connections from user applications. Onion proxies accept
|
2003-10-25 13:41:26 +02:00
|
|
|
TCP streams and multiplex them across the virtual circuit. The onion
|
2003-10-27 11:18:20 +01:00
|
|
|
router on the other side
|
|
|
|
% I don't mean other side, I mean wherever it is on the circuit. But
|
|
|
|
% don't want to introduce complexity this early? Hm. -RD
|
|
|
|
of the circuit connects to the destinations of
|
2003-10-25 13:41:26 +02:00
|
|
|
the TCP streams and relays data.
|
|
|
|
|
2003-10-31 00:05:40 +01:00
|
|
|
Each onion router uses three public keys: a long-term identity key, a
|
|
|
|
short-term onion key, and a short-term link key. The identity
|
|
|
|
(signing) key is used to sign TLS certificates, to sign its router
|
|
|
|
descriptor (a summary of its keys, address, bandwidth, exit policy,
|
|
|
|
etc.), and to sign directories if it is a directory server. Changing
|
|
|
|
the identity key of a router is considered equivalent to creating a
|
|
|
|
new router. The onion (decryption) key is used for decrypting requests
|
|
|
|
from users to set up a circuit and negotiate ephemeral keys. Finally,
|
|
|
|
link keys are used by the TLS protocol when communicating between
|
2003-10-31 07:16:21 +01:00
|
|
|
onion routers. We discuss rotating these keys in
|
|
|
|
Section~\ref{subsec:rotating-keys}.
|
2003-10-26 11:47:49 +01:00
|
|
|
|
2003-10-31 07:16:21 +01:00
|
|
|
Section~\ref{subsec:cells} discusses the structure of the fixed-size
|
2003-10-26 11:47:49 +01:00
|
|
|
\emph{cells} that are the unit of communication in Tor. We describe
|
2003-10-31 07:16:21 +01:00
|
|
|
in Section~\ref{subsec:circuits} how virtual circuits are
|
|
|
|
built, extended, truncated, and destroyed. Section~\ref{subsec:tcp}
|
2003-10-31 00:05:40 +01:00
|
|
|
describes how TCP streams are routed through the network, and finally
|
2003-10-31 07:16:21 +01:00
|
|
|
Section~\ref{subsec:congestion} talks about congestion control and
|
2003-10-26 11:47:49 +01:00
|
|
|
fairness issues.
|
2003-10-25 13:41:26 +02:00
|
|
|
|
|
|
|
\SubSection{Cells}
|
2003-10-26 11:47:49 +01:00
|
|
|
\label{subsec:cells}
|
|
|
|
|
2003-10-31 00:05:40 +01:00
|
|
|
% I think we should describe connections before cells. -NM
|
|
|
|
|
2003-10-31 07:16:21 +01:00
|
|
|
Traffic passes from one OR to another, or between a user's OP and an OR,
|
2003-10-31 00:05:40 +01:00
|
|
|
in fixed-size cells. Each cell is 256
|
|
|
|
bytes, and consists of a header and a payload. The header includes an
|
2003-10-31 07:16:21 +01:00
|
|
|
anonymous circuit identifier (ACI) that specifies which circuit the
|
|
|
|
% Should we replace ACI with circID ? What is this 'anonymous circuit'
|
|
|
|
% thing anyway? -RD
|
2003-10-31 00:05:40 +01:00
|
|
|
cell refers to
|
2003-10-26 11:47:49 +01:00
|
|
|
(many circuits can be multiplexed over the single TCP connection between
|
|
|
|
ORs or between an OP and an OR), and a command to describe what to do
|
2003-10-31 00:05:40 +01:00
|
|
|
with the cell's payload. Cells are either \emph{control} cells, which are
|
|
|
|
interpreted by the node that receives them, or \emph{relay} cells,
|
2003-10-31 07:16:21 +01:00
|
|
|
which carry end-to-end stream data. Control cells can be one of:
|
2003-10-31 00:05:40 +01:00
|
|
|
\emph{padding} (currently used for keepalive, but also usable for link
|
|
|
|
padding); \emph{create} or \emph{created} (used to set up a new circuit);
|
2003-10-26 11:47:49 +01:00
|
|
|
or \emph{destroy} (to tear down a circuit).
|
2003-10-31 00:05:40 +01:00
|
|
|
% We need to say that ACIs are connection-specific: each circuit has
|
|
|
|
% a different ACI along each connection. -NM
|
2003-10-31 07:16:21 +01:00
|
|
|
% agreed -RD
|
2003-10-26 11:47:49 +01:00
|
|
|
|
|
|
|
Relay cells have an additional header (the relay header) after the
|
2003-10-31 07:16:21 +01:00
|
|
|
cell header, containing the stream identifier (many streams can
|
2003-10-31 00:05:40 +01:00
|
|
|
be multiplexed over a circuit); an end-to-end checksum for integrity
|
|
|
|
checking; the length of the relay payload; and a relay command. Relay
|
2003-10-27 11:18:20 +01:00
|
|
|
commands can be one of: \emph{relay
|
2003-10-26 11:47:49 +01:00
|
|
|
data} (for data flowing down the stream), \emph{relay begin} (to open a
|
|
|
|
stream), \emph{relay end} (to close a stream), \emph{relay connected}
|
|
|
|
(to notify the OP that a relay begin has succeeded), \emph{relay
|
|
|
|
extend} and \emph{relay extended} (to extend the circuit by a hop,
|
|
|
|
and to acknowledge), \emph{relay truncate} and \emph{relay truncated}
|
|
|
|
(to tear down only part of the circuit, and to acknowledge), \emph{relay
|
|
|
|
sendme} (used for congestion control), and \emph{relay drop} (used to
|
|
|
|
implement long-range dummies).
|
|
|
|
|
2003-10-31 00:05:40 +01:00
|
|
|
We describe each of these cell types in more detail below.
|
2003-10-26 11:47:49 +01:00
|
|
|
|
2003-10-26 23:49:07 +01:00
|
|
|
% Nick: should there have been a table here? -RD
|
2003-10-31 00:05:40 +01:00
|
|
|
% Maybe. -NM
|
2003-10-26 11:47:49 +01:00
|
|
|
|
|
|
|
\SubSection{Circuits and streams}
|
|
|
|
\label{subsec:circuits}
|
|
|
|
|
2003-10-31 00:05:40 +01:00
|
|
|
% I think when we say ``the user,'' maybe we should say ``the user's OP.''
|
|
|
|
|
|
|
|
The original Onion Routing design built one circuit for each
|
|
|
|
TCP stream. Because building a circuit can take several tenths of a
|
|
|
|
second (due to public-key cryptography delays and network latency),
|
|
|
|
this design imposed high costs on applications like web browsing that
|
|
|
|
open many TCP streams.
|
|
|
|
|
|
|
|
In Tor, each circuit can be shared by many TCP streams. To avoid
|
|
|
|
delays, users construct circuits preemptively. To limit linkability
|
|
|
|
among the streams, users rotate connections by building a new circuit
|
|
|
|
periodically (currently every minute) if the previous one has been
|
|
|
|
used, and expire old circuits that are no longer in use. Thus
|
2003-11-01 04:44:13 +01:00
|
|
|
even heavy users spend a negligible amount of time and CPU in
|
2003-10-31 00:05:40 +01:00
|
|
|
building circuits, but only a limited number of requests can be linked
|
|
|
|
to each other by a given exit node. Also, because circuits are built
|
|
|
|
in the background, failed routers do not affect the user experience.
|
|
|
|
|
|
|
|
\subsubsection{Constructing a circuit}
|
|
|
|
|
|
|
|
Users construct each circuit incrementally, negotiating a symmetric key with
|
|
|
|
each hop one at a time. To begin creating a new circuit, the user
|
|
|
|
(call her Alice) sends a \emph{create} cell to the first node in her
|
|
|
|
chosen path. The cell's payload is the first half of the
|
|
|
|
Diffie-Hellman handshake, encrypted to the onion key of the OR (call
|
|
|
|
him Bob). Bob responds with a \emph{created} cell containing the second
|
|
|
|
half of the DH handshake, along with a hash of the negotiated key
|
|
|
|
$K=g^{xy}$. This protocol tries to achieve unilateral entity
|
|
|
|
authentication (Alice knows she's handshaking with Bob, Bob doesn't
|
|
|
|
care who is opening the circuit, since Alice has no key and is trying to
remain anonymous) and unilateral key authentication (Alice and Bob
|
|
|
|
agree on a key, and Alice knows Bob is the only other person who could
|
|
|
|
know it). We also want perfect forward
|
|
|
|
secrecy, key freshness, etc.
|
2003-10-26 11:47:49 +01:00
|
|
|
|
|
|
|
\begin{equation}
|
|
|
|
\begin{aligned}
|
|
|
|
\mathrm{Alice} \rightarrow \mathrm{Bob}&: E_{PK_{Bob}}(g^x) \\
|
2003-10-27 00:49:01 +01:00
|
|
|
\mathrm{Bob} \rightarrow \mathrm{Alice}&: g^y, H(K | \mathrm{``handshake"}) \\
|
2003-10-26 11:47:49 +01:00
|
|
|
\end{aligned}
|
|
|
|
\end{equation}
|
|
|
|
|
2003-10-28 22:55:38 +01:00
|
|
|
The second step shows both that it was Bob
|
2003-10-26 11:47:49 +01:00
|
|
|
who received $g^x$, and that it was Bob who came up with $y$. We use
|
2003-10-29 12:31:52 +01:00
|
|
|
PK encryption in the first step (rather than, e.g., using the first two
|
2003-10-26 11:47:49 +01:00
|
|
|
steps of STS, which has a signature in the second step) because we
|
|
|
|
don't have enough room in a single cell for a public key and also a
|
2003-11-01 23:34:23 +01:00
|
|
|
signature. Preliminary analysis with the NRL protocol analyzer \cite{meadows96}
|
|
|
|
shows the above protocol to be secure (including providing PFS) under the
|
2003-10-26 11:47:49 +01:00
|
|
|
traditional Dolev-Yao model.
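
As an informal illustration of the check Alice performs, the two sides'
computations can be written out as follows. The symbols $K'$ (Alice's
locally computed key) and $h$ (the hash value she receives) are notation
introduced here for exposition, and the group modulus is left implicit.
\begin{equation}
\begin{aligned}
\mathrm{Bob}&: K = (g^x)^y = g^{xy} \\
\mathrm{Alice}&: K' = (g^y)^x = g^{xy} \\
\mathrm{Alice}&: \text{accept iff } h = H(K' | \text{``handshake''})
\end{aligned}
\end{equation}
Since $g^x$ travels only under Bob's onion key, a matching $h$ indicates
that the responder was able to decrypt the first message.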
|
|
|
|
% cite Cathy? -RD
|
|
|
|
% did I use the buzzwords correctly? -RD
|
|
|
|
|
2003-10-31 00:05:40 +01:00
|
|
|
% Hm. I think that this paragraph could go earlier in expository
|
|
|
|
% order: we describe how to build whole circuit, then explain the
|
|
|
|
% protocol in more detail. -NM
|
2003-10-26 11:47:49 +01:00
|
|
|
To extend a circuit past the first hop, Alice sends a \emph{relay extend}
|
|
|
|
cell to the last node in the circuit, specifying the address of the new
|
|
|
|
OR and an encrypted $g^x$ for it. That node copies the half-handshake
|
|
|
|
into a \emph{create} cell, and passes it to the new OR to extend the
|
|
|
|
circuit. When it responds with a \emph{created} cell, the penultimate OR
|
|
|
|
copies the payload into a \emph{relay extended} cell and passes it back.
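
The extension step can be sketched as a message flow. The names $OR_1$
(the current last hop), $OR_2$ (the new hop), $x_2$, $y_2$, and
$K_2 = g^{x_2 y_2}$ are labels we introduce for illustration, as are the
abbreviations $c = E_{PK_{OR_2}}(g^{x_2})$ and
$r = (g^{y_2}, H(K_2 | \text{``handshake''}))$; the layered encryption of
the relay cells themselves is omitted.
\begin{equation}
\begin{aligned}
\mathrm{Alice} \rightarrow OR_1&: \text{relay extend}(\mathit{addr}_{OR_2}, c) \\
OR_1 \rightarrow OR_2&: \text{create}(c) \\
OR_2 \rightarrow OR_1&: \text{created}(r) \\
OR_1 \rightarrow \mathrm{Alice}&: \text{relay extended}(r)
\end{aligned}
\end{equation}
Because $c$ is encrypted to the onion key of $OR_2$, $OR_1$ forwards the
half-handshake without learning $g^{x_2}$.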
|
2003-10-26 23:49:07 +01:00
|
|
|
% Nick: please fix my "that OR" pronouns -RD
|
|
|
|
|
2003-10-31 00:05:40 +01:00
|
|
|
\subsubsection{Relay cells}
|
2003-10-26 23:49:07 +01:00
|
|
|
Once Alice has established the circuit (so she shares a key with each
|
|
|
|
OR on the circuit), she can send relay cells.
|
2003-10-30 13:10:24 +01:00
|
|
|
The stream ID in the relay header indicates to which stream the cell belongs.
|
2003-10-31 07:56:52 +01:00
|
|
|
A relay cell can be addressed to any of the ORs on the circuit. To
|
|
|
|
construct a relay cell addressed to a given OR, Alice iteratively
|
2003-10-26 23:49:07 +01:00
|
|
|
encrypts the cell payload (that is, the relay header and payload)
|
2003-10-31 07:56:52 +01:00
|
|
|
with the symmetric key of each hop up to that OR. Then, at each hop
|
2003-10-26 23:49:07 +01:00
|
|
|
down the circuit, the OR decrypts the cell payload and checks whether
|
2003-10-31 07:56:52 +01:00
|
|
|
it recognizes the stream ID. A stream ID is recognized either if it
|
2003-10-26 23:49:07 +01:00
|
|
|
is an already open stream at that OR, or if it is equal to zero. The
|
|
|
|
zero stream ID is treated specially, and is used for control messages,
|
2003-10-28 22:55:38 +01:00
|
|
|
e.g. starting a new stream. If the stream ID is unrecognized, the OR
|
2003-10-31 07:56:52 +01:00
|
|
|
passes the relay cell downstream. This \emph{leaky pipe} circuit topology
|
|
|
|
allows Alice's streams to exit at different ORs on a single circuit.
|
2003-11-01 04:40:20 +01:00
|
|
|
Alice may choose different exit points because of their exit policies,
|
|
|
|
or to keep the ORs from knowing that two streams
|
2003-10-26 23:49:07 +01:00
|
|
|
originate at the same person.
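
The layering described above can be written compactly. In this sketch
(our notation), $K_i$ is the symmetric key Alice shares with hop $i$,
$E$ and $D$ are encryption and decryption under that key, $m$ is the
relay header and payload, and hop $n$ is the intended recipient; the
nesting order is inferred from the order in which hops decrypt.
\begin{equation}
\begin{aligned}
c_0 &= E_{K_1}(E_{K_2}(\cdots E_{K_n}(m) \cdots)) \\
c_i &= D_{K_i}(c_{i-1}), \quad i = 1, \ldots, n \\
c_n &= m
\end{aligned}
\end{equation}
A hop $i < n$ sees only $c_i$, which is still encrypted under
$K_{i+1}, \ldots, K_n$, so it does not recognize the stream ID and passes
the cell along.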
|
|
|
|
|
|
|
|
To tear down a circuit, Alice sends a destroy control cell. Each OR
|
|
|
|
in the circuit receives the destroy cell, closes all open streams on
|
|
|
|
that circuit, and passes a new destroy cell forward. But since circuits
|
|
|
|
can be built incrementally, they can also be torn down incrementally:
|
2003-10-31 07:56:52 +01:00
|
|
|
Alice can instead send a relay truncate cell to a node along the circuit. That
|
2003-10-31 00:05:40 +01:00
|
|
|
node will send a destroy cell forward, and reply with an acknowledgment
|
2003-10-28 22:55:38 +01:00
|
|
|
(relay truncated). Alice might truncate her circuit so she can extend it
|
|
|
|
to different nodes without signaling to the first few nodes (or somebody
|
2003-10-26 23:49:07 +01:00
|
|
|
observing them) that she is changing her circuit. That is, nodes in the
|
|
|
|
middle are not even aware that the circuit was truncated, because the
|
|
|
|
relay cells are encrypted. Similarly, if a node on the circuit goes down,
|
|
|
|
the adjacent node can send a relay truncated cell back to Alice. Thus the
|
|
|
|
``break a node and see which circuits go down'' attack is weakened.
|
|
|
|
|
2003-10-26 11:47:49 +01:00
|
|
|
\SubSection{Opening and closing streams}
|
|
|
|
\label{subsec:tcp}
|
2003-10-24 23:18:38 +02:00
|
|
|
|
2003-10-27 11:18:20 +01:00
|
|
|
When Alice's application wants to open a TCP connection to a given
|
|
|
|
address and port, it asks the OP (via SOCKS) to make the connection. The
|
|
|
|
OP chooses the newest open circuit (or creates one if none is available),
|
|
|
|
chooses a suitable OR on that circuit to be the exit node (usually the
|
2003-10-31 07:16:21 +01:00
|
|
|
last node, but maybe others due to exit policy conflicts; see
|
|
|
|
Section~\ref{sec:exit-policies}), chooses a new random stream ID for
|
|
|
|
this stream,
|
2003-10-27 11:18:20 +01:00
|
|
|
and delivers a relay begin cell to that exit node. It uses a stream ID
|
|
|
|
of zero for the begin cell (so the OR will recognize it), and the relay
|
|
|
|
payload lists the new stream ID and the destination address and port.
|
|
|
|
Once the exit node completes the connection to the remote host, it
|
|
|
|
responds with a relay connected cell through the circuit. Upon receipt,
|
|
|
|
the OP notifies the application that it can begin talking.
|
|
|
|
|
|
|
|
There's a catch to using SOCKS, though: some applications hand the
|
|
|
|
alphanumeric address to the proxy, while others resolve it into an IP
|
|
|
|
address first and then hand the IP to the proxy. When the application
|
|
|
|
does the DNS resolution first, Alice broadcasts her destination. Common
|
|
|
|
applications like Mozilla and ssh have this flaw.
|
|
|
|
|
|
|
|
In the case of Mozilla, we're fine: the filtering web proxy called Privoxy
|
|
|
|
does the SOCKS call safely, and Mozilla talks to Privoxy safely. But a
|
|
|
|
portable general solution, such as for ssh, is an open problem. We could
|
|
|
|
modify the local nameserver, but this approach is invasive, brittle, and
|
|
|
|
not portable. We could encourage the resolver library to do resolution
|
|
|
|
via TCP rather than UDP, but this approach is hard to do right, and also
|
|
|
|
has portability problems. Our current answer is to encourage the use of
|
|
|
|
privacy-aware proxies like Privoxy wherever possible, and also provide
|
|
|
|
a tool similar to \emph{dig} that can do a private lookup through the
|
|
|
|
Tor network.
|
|
|
|
|
|
|
|
Ending a Tor stream is analogous to ending a TCP stream: it uses a
|
|
|
|
two-step handshake for normal operation, or a one-step handshake for
|
|
|
|
errors. If one side of the stream closes abnormally, that node simply
|
|
|
|
sends a relay teardown cell, and tears down the stream. If one side
|
|
|
|
% Nick: mention relay teardown in 'cell' subsec? good enough name? -RD
|
|
|
|
of the stream closes the connection normally, that node sends a relay
|
|
|
|
end cell down the circuit. When the other side has sent back its own
|
|
|
|
relay end, the stream can be torn down. This two-step handshake allows
|
|
|
|
for TCP-based applications that, for example, close a socket for writing
|
|
|
|
but are still willing to read.
|
|
|
|
|
2003-10-28 22:55:38 +01:00
|
|
|
\SubSection{Integrity checking on streams}
|
2003-10-27 11:18:20 +01:00
|
|
|
|
2003-10-31 00:05:40 +01:00
|
|
|
In the old Onion Routing design, traffic was vulnerable to a
|
|
|
|
malleability attack: an attacker could make changes to an encrypted
|
|
|
|
cell to create corresponding changes to the data leaving the network.
|
|
|
|
(Even an external adversary could do this, despite link encryption!)
|
2003-10-27 11:18:20 +01:00
|
|
|
|
2003-10-31 00:05:40 +01:00
|
|
|
This weakness allowed an adversary to change a create cell to a destroy
|
|
|
|
cell; change the destination address in a relay begin cell to the
|
|
|
|
adversary's webserver; or change a user on an ftp connection from
|
|
|
|
typing ``dir'' to typing ``delete *''. Any node or observer along the
|
|
|
|
path could introduce such corruption in a stream.
|
2003-10-27 11:18:20 +01:00
|
|
|
|
2003-10-31 00:05:40 +01:00
|
|
|
Tor prevents external adversaries from mounting this attack simply by
|
|
|
|
using TLS. Addressing the insider malleability attack, however, is
|
|
|
|
more complex.
|
2003-10-27 11:18:20 +01:00
|
|
|
|
2003-10-31 00:05:40 +01:00
|
|
|
Rather than doing integrity checking of the relay cells at each hop,
|
|
|
|
which would increase packet size
|
2003-10-27 11:18:20 +01:00
|
|
|
by a function of path length\footnote{This is also the argument against
|
2003-11-02 08:48:56 +01:00
|
|
|
using recent cipher modes like EAX \cite{eax}---we don't want the added
|
2003-10-27 11:18:20 +01:00
|
|
|
message-expansion overhead at each hop, and we don't want to leak the path
|
2003-10-31 00:05:40 +01:00
|
|
|
length (or pad to some max path length).}, we choose to
|
|
|
|
% accept passive timing attacks,
|
|
|
|
% (How? I don't get it. Do we mean end-to-end traffic
|
|
|
|
% confirmation attacks? -NM)
|
2003-10-31 07:16:21 +01:00
|
|
|
and perform integrity
|
2003-10-27 11:18:20 +01:00
|
|
|
checking only at the edges of the circuit. When Alice negotiates a key
|
2003-10-31 00:05:40 +01:00
|
|
|
with the exit hop, they both initialize a SHA-1 digest with some derivative of that key,
|
2003-10-27 11:18:20 +01:00
|
|
|
thus starting out with randomness that only the two of them know. From
|
|
|
|
then on they each incrementally add all the data bytes flowing across
|
|
|
|
the stream to the SHA-1 digest, and each relay cell includes the first 4 bytes
|
|
|
|
of the current value of the hash.
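
One way to write the running check down, as a sketch rather than a
specification: let $s$ be the key-derived starting value, let
$d_1, \ldots, d_n$ be the data carried by the first $n$ relay cells, let
$\|$ denote concatenation, and let $\mathrm{first}_4$ take the first four
bytes. All of these symbols are ours, and we assume the current cell's
data is absorbed before its tag is computed.
\begin{equation}
t_n = \mathrm{first}_4\bigl(\mathrm{SHA1}(s \,\|\, d_1 \,\|\, \cdots \,\|\, d_n)\bigr)
\end{equation}
The receiver keeps the same running state and accepts cell $n$ only if
the tag it carries equals its own $t_n$.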
|
|
|
|
|
|
|
|
The attacker must be able to guess all previous bytes between Alice
|
|
|
|
and Bob on that circuit (including the pseudorandomness from the key
|
2003-10-29 12:31:52 +01:00
|
|
|
negotiation), plus the bytes in the current cell, to remove or modify the
|
2003-10-30 13:10:24 +01:00
|
|
|
cell. Attacks on SHA-1 where the adversary can incrementally add to a
|
2003-11-01 09:48:12 +01:00
|
|
|
hash to produce a new valid hash don't work,
|
2003-10-30 13:10:24 +01:00
|
|
|
because all hashes are end-to-end encrypted across the circuit.
|
|
|
|
The computational overhead isn't so bad, compared to doing an AES
|
|
|
|
% XXX We never say we use AES. Say it somewhere above? -RD
|
2003-10-27 11:18:20 +01:00
|
|
|
crypt at each hop in the circuit. We use only four bytes per cell to
|
|
|
|
minimize overhead; the chance that an adversary will correctly guess a
|
|
|
|
valid hash, plus the payload of the current cell, is acceptably low, given
that Alice or Bob tears down the circuit upon receiving a bad hash.
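
As a rough estimate, if the truncated hash looks uniformly random to an
adversary who does not know the key-derived starting state (an assumption,
not a proven property), a single blind guess at the four-byte value
succeeds with probability
\begin{equation}
\Pr[\text{guess accepted}] = 2^{-32} \approx 2.3 \times 10^{-10}.
\end{equation}
Because a bad hash causes the circuit to be torn down, the adversary gets
only one such attempt per circuit.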
|
|
|
|
|
2003-10-28 22:55:38 +01:00
|
|
|
\SubSection{Rate limiting and fairness}
|
|
|
|
|
2003-10-31 07:56:52 +01:00
|
|
|
Volunteers are generally more willing to run services that can limit
|
|
|
|
their bandwidth usage. To accommodate them, Tor servers use a token
|
2003-11-01 09:48:12 +01:00
|
|
|
bucket approach to limit the number of bytes they
|
2003-10-31 07:56:52 +01:00
|
|
|
receive. Tokens are added to the bucket each second; when the bucket is
full, new tokens are discarded. Each token represents permission to
|
2003-11-02 05:53:15 +01:00
|
|
|
receive one byte from the network---to receive a byte, the connection
|
2003-10-31 07:56:52 +01:00
|
|
|
must remove a token from the bucket. Thus if the bucket is empty, that
|
|
|
|
connection must wait until more tokens arrive. The number of tokens we
|
|
|
|
add enforces a long-term average rate of incoming bytes, while still
|
|
|
|
permitting short-term bursts above the allowed bandwidth. Current bucket
|
|
|
|
sizes are set to ten seconds' worth of traffic.
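
The token-bucket behavior can be summarized with a small recurrence. This
is a sketch in our own notation: $r$ is the number of tokens added per
second, $C$ the bucket capacity, $b_t$ the tokens available at the start
of second $t$, and $u_t$ the bytes received during that second.
\begin{equation}
\begin{aligned}
b_{t+1} &= \min(C,\; b_t - u_t + r), \quad 0 \le u_t \le b_t \\
\sum_{t=0}^{T-1} u_t &\le b_0 + rT \le C + rT
\end{aligned}
\end{equation}
With $C = 10r$, a connection can therefore receive at most $r(T+10)$ bytes
over any $T$ seconds: the long-term average is bounded by $r$, while a
burst of up to ten seconds' worth of traffic remains possible.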
|
2003-10-28 22:55:38 +01:00
|
|
|
|
|
|
|
Further, we want to avoid starving any Tor streams. Entire circuits
|
|
|
|
could starve if we read greedily from connections and one connection
|
|
|
|
uses all the remaining bandwidth. We solve this by dividing the number
|
|
|
|
of tokens in the bucket by the number of connections that want to read,
|
|
|
|
and reading at most that number of bytes from each connection. We iterate
|
|
|
|
this procedure until the number of tokens in the bucket is under some
|
|
|
|
threshold (e.g., 10~KB), at which point we greedily read from connections.
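
As a small worked example of one round of this procedure: write $T$ for
the tokens currently in the bucket and $n$ for the number of connections
that want to read (both symbols, the rounding down, and the numbers below
are illustrative assumptions). Each connection may then read at most
\begin{equation}
q = \left\lfloor \frac{T}{n} \right\rfloor
\quad\text{bytes, e.g.}\quad
\left\lfloor \frac{100000}{8} \right\rfloor = 12500 .
\end{equation}
Connections with less than $q$ bytes pending leave tokens unused, which is
why the division is repeated with the updated $T$ and $n$ until $T$ falls
below the threshold.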
|
|
|
|
|
2003-10-31 07:56:52 +01:00
|
|
|
Because the Tor protocol generates roughly the same number of outgoing
|
|
|
|
bytes as incoming bytes, it is sufficient in practice to rate-limit
|
|
|
|
incoming bytes.
|
|
|
|
% Is it? Fun attack: I send you lots of 1-byte-at-a-time TCP frames.
|
|
|
|
% In response, you send lots of 256 byte cells. Can I use this to
|
2003-11-01 09:48:12 +01:00
|
|
|
% make you exceed your outgoing bandwidth limit by a factor of 256? -NM
|
|
|
|
% Can we resolve this by, when reading from edge connections, rounding up
|
|
|
|
% the bytes read (wrt buckets) to the nearest multiple of 256? -RD
|
2003-10-31 07:56:52 +01:00
|
|
|
|
2003-11-01 09:48:12 +01:00
|
|
|
Further, inspired by Rennhard et al.'s design~\cite{anonnet}, a
|
2003-10-31 07:56:52 +01:00
|
|
|
circuit's edges heuristically distinguish interactive streams from bulk
|
|
|
|
streams by comparing the frequency with which they supply cells. We can
|
2003-11-01 04:40:20 +01:00
|
|
|
provide good latency for interactive streams by giving them preferential
|
2003-10-31 07:56:52 +01:00
|
|
|
service, while still getting good overall throughput to the bulk
|
|
|
|
streams. Such preferential treatment presents a possible end-to-end
|
2003-11-01 04:40:20 +01:00
|
|
|
attack, but an adversary who can observe both
|
2003-10-31 07:56:52 +01:00
|
|
|
ends of the stream can already learn this information through timing
|
|
|
|
attacks.
|
2003-10-28 22:55:38 +01:00
|
|
|
|
|
|
|
\SubSection{Congestion control}
|
2003-10-26 11:47:49 +01:00
|
|
|
\label{subsec:congestion}
|
|
|
|
|
2003-10-28 22:55:38 +01:00
|
|
|
Even with bandwidth rate limiting, we still need to worry about
|
2003-10-31 07:56:52 +01:00
|
|
|
congestion, either accidental or intentional. If enough users choose the
|
|
|
|
same OR-to-OR connection for their circuits, that connection can become
|
|
|
|
saturated. For example, an adversary could make a large HTTP PUT request
|
|
|
|
through the onion routing network to a webserver he runs, and then
|
|
|
|
refuse to read any of the bytes at the webserver end of the
|
2003-10-28 22:55:38 +01:00
|
|
|
circuit. Without some congestion control mechanism, these bottlenecks
|
2003-10-31 07:56:52 +01:00
|
|
|
can propagate back through the entire network. We describe our
|
|
|
|
responses below.
|
2003-10-28 22:55:38 +01:00
|
|
|
|
|
|
|
\subsubsection{Circuit-level}
|
|
|
|
|
|
|
|
To control a circuit's bandwidth usage, each OR keeps track of two
|
2003-10-31 07:56:52 +01:00
|
|
|
windows. The \emph{packaging window} tracks how many relay data cells the OR is
|
2003-10-28 22:55:38 +01:00
|
|
|
allowed to package (from outside streams) for transmission back to the OP,
|
2003-10-31 07:56:52 +01:00
|
|
|
and the \emph{delivery window} tracks how many relay data cells it is willing
|
2003-10-28 22:55:38 +01:00
|
|
|
to deliver to streams outside the network. Each window is initialized
|
|
|
|
(say, to 1000 data cells). When a data cell is packaged or delivered,
|
|
|
|
the appropriate window is decremented. When an OR has received enough
|
|
|
|
data cells (currently 100), it sends a relay sendme cell towards the OP,
|
|
|
|
with stream ID zero. When an OR receives a relay sendme cell with stream
|
|
|
|
ID zero, it increments its packaging window. Either of these cells
|
|
|
|
increments the corresponding window by 100. If the packaging window
|
|
|
|
reaches 0, the OR stops reading from TCP connections for all streams
|
|
|
|
on the corresponding circuit, and sends no more relay data cells until
|
|
|
|
receiving a relay sendme cell.
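
The window bookkeeping above amounts to the following updates, written in
our own notation with $W_p$ for the packaging window and $W_d$ for the
delivery window; we read ``received enough data cells'' as 100 cells
delivered since the last sendme, which is an interpretation on our part.
\begin{equation}
\begin{aligned}
\text{initially}&: W_p = W_d = 1000 \\
\text{data cell packaged}&: W_p \leftarrow W_p - 1 \\
\text{data cell delivered}&: W_d \leftarrow W_d - 1 \\
\text{sendme sent}&: W_d \leftarrow W_d + 100 \\
\text{sendme received}&: W_p \leftarrow W_p + 100
\end{aligned}
\end{equation}
Since packaging stops at $W_p = 0$, at most 1000 unacknowledged relay data
cells, or $1000 \times 256 = 256000$ bytes of cells, can be in flight in
one direction on a circuit with these values.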
|
|
|
|
|
|
|
|
The OP behaves identically, except that it must track a packaging window
|
|
|
|
and a delivery window for every OR in the circuit. If a packaging window
|
|
|
|
reaches 0, it stops reading from streams destined for that OR.
|
|
|
|
|
|
|
|
\subsubsection{Stream-level}
|
|
|
|
|
|
|
|
The stream-level congestion control mechanism is similar to the
|
|
|
|
circuit-level mechanism above. ORs and OPs use relay sendme cells
|
|
|
|
to implement end-to-end flow control for individual streams across
|
|
|
|
circuits. Each stream begins with a packaging window (e.g., 500 cells),
|
|
|
|
and increments the window by a fixed value (50) upon receiving a relay
|
|
|
|
sendme cell. Rather than always returning a relay sendme cell as soon
|
|
|
|
as enough cells have arrived, the stream-level congestion control also
|
|
|
|
has to check whether data has been successfully flushed onto the TCP
|
|
|
|
stream; it sends a relay sendme only when the number of bytes pending
|
|
|
|
to be flushed is under some threshold (currently 10 cells' worth).
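
In the same style as the circuit-level rules, the stream-level variant can
be sketched as follows, where $W_s$ is the stream's window, $p$ is the
number of bytes still waiting to be flushed to the TCP stream, and
$\theta$ is the flush threshold (all three symbols are ours).
\begin{equation}
\begin{aligned}
\text{initially}&: W_s = 500 \\
\text{sendme received}&: W_s \leftarrow W_s + 50 \\
\text{sendme sent only if}&: p < \theta
\end{aligned}
\end{equation}
The extra condition ties the sender's window to how quickly the receiving
edge can actually drain data, not just to how many cells have arrived.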
|
|
|
|
|
|
|
|
Currently, non-data relay cells do not affect the windows. Thus we
|
|
|
|
avoid potential deadlock issues, e.g., a stream being unable to send a
relay sendme cell because its packaging window is empty.
|
|
|
|
|
|
|
|
\subsubsection{Needs more research}
|
|
|
|
|
|
|
|
We don't need to reimplement full TCP windows (with sequence numbers,
|
|
|
|
the ability to drop cells when we're full and retransmit later, etc.),
|
|
|
|
because the TCP streams already guarantee in-order delivery of each
|
|
|
|
cell. But we need to investigate further the effects of the current
|
|
|
|
parameters on throughput and latency, while also keeping privacy in mind;
|
2003-10-31 07:16:21 +01:00
|
|
|
see Section~\ref{sec:maintaining-anonymity} for more discussion.
|
2003-10-24 23:18:38 +02:00
|
|
|
|
2003-10-25 13:41:26 +02:00
|
|
|
\Section{Other design decisions}
|
|
|
|
|
|
|
|
\SubSection{Resource management and DoS prevention}
|
2003-10-27 11:18:20 +01:00
|
|
|
\label{subsec:dos}
|
2003-10-24 23:18:38 +02:00
|
|
|
|
2003-11-01 04:06:23 +01:00
|
|
|
Offering Tor as a public service creates many opportunities for an
|
|
|
|
attacker to mount denial-of-service attacks against the network. While
|
|
|
|
flow control and rate limiting (discussed in
|
2003-11-02 08:48:56 +01:00
|
|
|
Section~\ref{subsec:congestion}) prevent users from consuming more
|
|
|
|
bandwidth than routers are willing to provide, opportunities remain for
|
|
|
|
users to
|
2003-11-01 04:06:23 +01:00
|
|
|
consume more network resources than their fair share, or to render the
|
|
|
|
network unusable for other users.
|
|
|
|
|
|
|
|
First of all, there are a number of CPU-consuming denial-of-service
attacks wherein an attacker can force an OR to perform expensive
cryptographic operations. For example, an attacker who sends a
\emph{create} cell full of junk bytes can force an OR to perform an RSA
decryption. Similarly, an attacker can
fake the start of a TLS handshake, forcing the OR to carry out its
(comparatively expensive) half of the handshake at no real computational
cost to the attacker.
|
|
|
|
|
2003-11-02 08:48:56 +01:00
|
|
|
Several approaches exist to address these attacks. First, ORs may
demand proof-of-computation tokens \cite{hashcash} before beginning new
TLS handshakes or accepting \emph{create} cells. So long as these
tokens are easy to verify and computationally expensive to produce, this
approach limits the DoS attack multiplier. Additionally, ORs may limit
the rate at which they accept \emph{create} cells and TLS connections, so that
the computational work of processing them does not drown out the (comparatively
inexpensive) work of symmetric cryptography needed to keep cells
flowing. This rate limiting could, however, allow an attacker
to slow down other users when they build new circuits.
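
To make the asymmetry behind proof-of-computation tokens concrete, the
following is a minimal sketch in the spirit of hashcash, not a description
of anything Tor implements. The function names, the use of SHA-256, and the
8-byte nonce encoding are all assumptions made for illustration. Minting a
token takes about $2^d$ hash evaluations on average for difficulty $d$,
while checking one takes a single hash.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the number of leading zero bits in a byte string."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint_token(challenge: bytes, difficulty: int) -> int:
    """Search for a nonce whose hash has `difficulty` leading zero bits.

    Hypothetical client side: expensive, about 2**difficulty hashes on average.
    """
    for nonce in itertools.count():
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce


def verify_token(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Hypothetical OR side: cheap, exactly one hash evaluation."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


if __name__ == "__main__":
    challenge = b"fresh-challenge-from-the-OR"  # assumed to be unique per request
    nonce = mint_token(challenge, 12)
    print(verify_token(challenge, nonce, 12))
```

In such a scheme the OR would choose the difficulty, so it could raise the
cost of each handshake when it is under load.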
|
2003-11-01 04:06:23 +01:00
|
|
|
|
|
|
|
% What about link-to-link rate limiting?
|
|
|
|
|
|
|
|
More worrisome are distributed denial-of-service attacks wherein an
attacker uses a large number of compromised hosts throughout the network
to consume the Tor network's resources. Although these attacks are not
new to the networking literature, some proposed approaches are a poor
fit to anonymous networks. For example, solutions based on backtracking
harmful traffic \cite{XXX} could allow an anonymity-breaking
adversary to exploit the backtracking mechanism.
|
2003-11-01 04:06:23 +01:00
|
|
|
|
|
|
|
Attackers also have an opportunity to attack the Tor network by mounting
attacks on its hosts and network links. Disrupting a single circuit or
link breaks all currently open streams passing along that part of the
circuit. Indeed, this same loss of service occurs when a router crashes
or its operator restarts it. The current Tor design treats such attacks
as intermittent network failures, and depends on users and applications
to respond or recover as appropriate. A future design could use an
end-to-end TCP-like acknowledgment protocol, so that no streams are
lost unless the entry or exit point itself is disrupted. This solution
would require more buffering at the network edges, however, and the
performance and anonymity implications of this extra complexity still
require investigation.
|
2003-07-11 21:28:36 +02:00
|
|
|
|
|
|
|
\SubSection{Exit policies and abuse}
\label{subsec:exitpolicies}
|
|
|
|
|
2003-11-01 22:19:46 +01:00
|
|
|
Exit abuse is a serious barrier to wide-scale Tor deployment. Not
only does anonymity present would-be vandals and abusers with an
opportunity to hide the origins of their activities---but also,
existing sanctions against abuse present an easy way for attackers to
harm the Tor network by implicating exit servers for their abuse.
Thus, we must block or limit attacks and other abuse that travel through
the Tor network.
|
|
|
|
|
2003-11-01 22:19:46 +01:00
|
|
|
Also, applications that commonly use IP-based authentication (such as
institutional mail or web servers) can be fooled by the fact that
anonymous connections appear to originate at the exit OR. Rather than
expose a private service, an administrator may prefer to prevent Tor
users from connecting to those services from a local OR.
|
|
|
|
|
|
|
|
To mitigate abuse issues, in Tor, each onion router's \emph{exit
policy} describes to which external addresses and ports the router
will permit stream connections. On one end of the spectrum are
\emph{open exit} nodes that will connect anywhere. As a compromise,
most onion routers will function as \emph{restricted exits} that
permit connections to the world at large, but prevent access to
certain abuse-prone addresses and services. On the other end are
\emph{middleman} nodes that only relay traffic to other Tor nodes, and
\emph{private exit} nodes that only connect to a local host or
network. (Using a private exit (if one exists) is a more secure way
for a client to connect to a given host or network---an external
adversary cannot eavesdrop on traffic between the private exit and the
final destination, and so is less sure of Alice's destination and
activities.) More generally,
nodes can require a variety of forms of traffic authentication
\cite{or-discex00}.
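
As a rough illustration of how an exit policy might be evaluated, the sketch
below models a policy as an ordered list of accept/reject rules over address
ranges and ports, with the first matching rule deciding. The rule format, the
first-match semantics, the reject-by-default fallback, and every name in the
code are assumptions made for this example; the text above does not specify a
concrete policy syntax.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One hypothetical policy line: accept or reject a network and port range."""
    accept: bool
    network: ipaddress.IPv4Network
    ports: range


def exit_allows(policy, address: str, port: int) -> bool:
    """Return the verdict of the first rule matching (address, port)."""
    addr = ipaddress.IPv4Address(address)
    for rule in policy:
        if addr in rule.network and port in rule.ports:
            return rule.accept
    return False  # assumed default: reject anything no rule mentions


ANY = ipaddress.IPv4Network("0.0.0.0/0")
ALL_PORTS = range(1, 65536)

# Open exit: connects anywhere.
open_exit = [Rule(True, ANY, ALL_PORTS)]

# Restricted exit: the world at large, minus some abuse-prone targets.
restricted_exit = [
    Rule(False, ANY, range(25, 26)),                              # no SMTP
    Rule(False, ipaddress.IPv4Network("10.0.0.0/8"), ALL_PORTS),  # no local net
    Rule(True, ANY, ALL_PORTS),
]

# Middleman: never exits; only relays to other Tor nodes.
middleman = [Rule(False, ANY, ALL_PORTS)]

# Private exit: connects only to one local network.
private_exit = [Rule(True, ipaddress.IPv4Network("10.0.0.0/8"), ALL_PORTS)]

if __name__ == "__main__":
    print(exit_allows(restricted_exit, "192.0.2.7", 80))
    print(exit_allows(restricted_exit, "192.0.2.7", 25))
```

Under these assumptions the four node types differ only in their rule lists,
which is why one mechanism can cover the whole spectrum.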
|
|
|
|
|
|
|
|
%Tor offers more reliability than the high-latency fire-and-forget
%anonymous email networks, because the sender opens a TCP stream
%with the remote mail server and receives an explicit confirmation of
%acceptance. But ironically, the private exit node model works poorly for
%email, when Tor nodes are run on volunteer machines that also do other
%things, because it's quite hard to configure mail transport agents so
%normal users can send mail normally, but the Tor process can only deliver
%mail locally. Further, most organizations have specific hosts that will
%deliver mail on behalf of certain IP ranges; Tor operators must be aware
%of these hosts and consider putting them in the Tor exit policy.
|
|
|
|
|
|
|
|
%The abuse issues on closed (e.g. military) networks are different
%from the abuse on open networks like the Internet. While these IP-based
%access controls are still commonplace on the Internet, on closed networks,
%nearly all participants will be honest, and end-to-end authentication
%can be assumed for important traffic.
|
|
|
|
|
|
|
|
Many administrators will use port restrictions to support only a
limited set of well-known services, such as HTTP, SSH, or AIM.
This is not a complete solution, since abuse opportunities for these
protocols are still well known. Nonetheless, the benefits are real,
since administrators seem used to the concept of port 80 abuse not
coming from the machine's owner.
|
|
|
|
|
2003-11-01 22:19:46 +01:00
|
|
|
A further solution may be to use proxies to clean traffic for certain
protocols as it leaves the network. For example, much abusive HTTP
behavior (such as exploiting buffer overflows or well-known script
vulnerabilities) can be detected in a straightforward manner.
Similarly, one could run automatic spam filtering software (such as
SpamAssassin) on email exiting the OR network. A generic
intrusion detection system (IDS) could be adapted to these purposes.
|
|
|
|
|
2003-11-02 05:53:15 +01:00
|
|
|
[XXX Mention possibility of filtering spam-like habits--e.g., many
recipients. -NM]
|
|
|
|
|
2003-11-01 22:19:46 +01:00
|
|
|
ORs may also choose to rewrite exiting traffic in order to append
headers or other information to indicate that the traffic has passed
through an anonymity service. This approach is commonly used, with some
success, by email-only anonymity systems. When possible, ORs can also
run on servers with hostnames such as {\it anonymous}, to further
alert abuse targets to the nature of the anonymous traffic.
|
|
|
|
|
|
|
|
%we should run a squid at each exit node, to provide comparable anonymity
%to private exit nodes for cache hits, to speed everything up, and to
%have a buffer for funny stuff coming out of port 80. we could similarly
%have other exit proxies for other protocols, like mail, to check
%delivered mail for being spam.
|
|
|
|
|
|
|
|
%[XXX Um, I'm uncomfortable with this for several reasons.
%It's not good for keeping honest nodes honest about discarding
%state after it's no longer needed. Granted it keeps an external
%observer from noticing how often sites are visited, but it also
%allows fishing expeditions. ``We noticed you went to this prohibited
%site an hour ago. Kindly turn over your caches to the authorities.''
%I previously elsewhere suggested bulk transfer proxies to carve
%up big things so that they could be downloaded in less noticeable
%pieces over several normal looking connections. We could suggest
%similarly one or a handful of squid nodes that might serve up
%some of the more sensitive but common material, especially if
%the relevant sites didn't want to or couldn't run their own OR.
%This would be better than having everyone run a squid which would
%just help identify after the fact the different history of that
%node's activity. All this kind of speculation needs to move to
%future work section I guess. -PS]
|
2003-10-27 13:05:35 +01:00
|
|
|
|
2003-10-24 13:21:19 +02:00
|
|
|
A mixture of open and restricted exit nodes will allow the most
flexibility for volunteers running servers. But while a large number
of middleman nodes is useful to provide a large and robust network,
having only a small number of exit nodes reduces the number of nodes
an adversary needs to monitor for traffic analysis, and places a
greater burden on the exit nodes. This tension can be seen in the JAP
cascade model, wherein only one node in each cascade needs to handle
abuse complaints---but an adversary only needs to observe the entry
and exit of a cascade to perform traffic analysis on all that
cascade's users. The Hydra model (many entries, few exits) presents a
different compromise: only a few exit nodes are needed, but an
adversary needs to work harder to watch all the clients.
|
|
|
|
|
|
|
|
Finally, we note that exit abuse must not be dismissed as a peripheral
issue: when a system's public image suffers, it can reduce the number
and diversity of that system's users, and thereby reduce the anonymity
of the system itself. Like usability, public perception is also a
security parameter. Sadly, preventing abuse of open exit nodes is an
unsolved problem, and will probably remain an arms race for the
foreseeable future. The abuse problems faced by Princeton's CoDeeN
project \cite{darkside} give us a glimpse of likely issues.
|
2003-10-24 23:18:38 +02:00
|
|
|
|
2003-07-11 21:28:36 +02:00
|
|
|
\SubSection{Directory Servers}
\label{subsec:dirservers}
|
|
|
|
|
|
|
|
First-generation Onion Routing designs \cite{or-jsac98,freedom2-arch} used
% is or-jsac98 the right cite here? what's our stock OR cite? -RD
in-band network status updates: each router flooded a signed statement
to its neighbors, which propagated it onward. But anonymizing networks
have different security goals than typical link-state routing protocols.
For example, delays (accidental or intentional)
that can cause different parts of the network to have different pictures
of link-state and topology are not only inconvenient---they give
attackers an opportunity to exploit differences in client knowledge.
We also worry about attacks to deceive a
client about the router membership list, topology, or current network
state. Such \emph{partitioning attacks} on client knowledge help an
adversary with limited resources to efficiently deploy those resources
when attacking a target.
|
|
|
|
|
2003-11-01 22:19:46 +01:00
|
|
|
Instead of flooding, Tor uses a small group of redundant, well-known
directory servers to track changes in network topology and node state,
including keys and exit policies. Directory servers are a small group
of well-known, mostly-trusted onion routers. They listen on a
separate port as an HTTP server, so that participants can fetch
current network state and router lists (a \emph{directory}), and so
that other onion routers can upload their router descriptors. Onion
routers now periodically publish signed statements of their state to
the directories only. The directories themselves combine this state
information with their own views of network liveness, and generate a
signed description of the entire network state whenever its contents
have changed. Client software is pre-loaded with a list of the
directory servers and their keys, and uses this information to
bootstrap each client's view of the network.
|
|
|
|
|
|
|
|
When a directory receives a signed statement from an onion router, it
recognizes the onion router by its identity (signing) key.
Directories do not automatically advertise ORs that they do not
recognize. (If they did, an adversary could take over the network by
creating many servers \cite{sybil}.) Instead, new nodes must be
approved by the directory administrator before they are included.
Mechanisms for automated node approval are an area of active research,
and are discussed more in Section~\ref{sec:maintaining-anonymity}.
|
|
|
|
|
2003-10-23 13:45:51 +02:00
|
|
|
Of course, a variety of attacks remain. An adversary who controls a
directory server can track certain clients by providing different
information---perhaps by listing only nodes under its control
as working, or by informing only certain clients about a given
node. Moreover, an adversary without control of a directory server can
still exploit differences among client knowledge. If Eve knows that
node $M$ is listed on server $D_1$ but not on $D_2$, she can use this
knowledge to link traffic through $M$ to clients who have queried $D_1$.
|
|
|
|
|
|
|
|
Thus these directory servers must be synchronized and redundant. The
software is distributed with the signature public key of each directory
server, and directories must be signed by a threshold of these keys.
|
|
|
|
|
|
|
|
The directory servers in Tor are modeled after those in Mixminion
\cite{minion-design}, but our situation is easier. First, we make the
simplifying assumption that all participants agree on who the
directory servers are. Second, Mixminion needs to predict node
behavior, whereas Tor only needs a threshold consensus of the current
state of the network.
% Cite dir-spec or dir-agreement?
|
2003-10-23 13:45:51 +02:00
|
|
|
|
2003-11-02 01:32:54 +01:00
|
|
|
Tor directory servers build a consensus directory through a simple
four-round broadcast protocol. In round one, each server dates and
signs its current opinion, and broadcasts it to the other directory
servers; then in round two, each server rebroadcasts all the signed
opinions it has received. At this point all directory servers check
to see whether any server has signed multiple opinions in the same
period. If so, the server is either broken or cheating, so the protocol
stops and notifies the administrators, who either remove the cheater
or wait for the broken server to be fixed. If there are no
discrepancies, each directory server then locally runs an algorithm
on the set of opinions, resulting in a uniform shared directory. In
round three servers sign this directory and broadcast it; and finally
in round four the servers rebroadcast the directory and all the
signatures. If any directory server drops out of the network, its
signature is not included on the final directory.
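
The consistency check performed after round two can be sketched as follows.
This is only an illustration under stated assumptions: opinions are
represented as (server, period, digest) tuples whose signatures have already
been verified, and the function name and data layout are hypothetical rather
than taken from any specification.

```python
from collections import defaultdict


def find_equivocators(signed_opinions):
    """Return servers that signed more than one distinct opinion in a period.

    signed_opinions: iterable of (server_id, period, opinion_digest) tuples,
    assumed to carry valid signatures. Receiving the same opinion twice (for
    example, once directly and once via rebroadcast) is not a discrepancy.
    """
    seen = defaultdict(set)
    for server_id, period, digest in signed_opinions:
        seen[(server_id, period)].add(digest)
    return sorted(
        {server for (server, _period), digests in seen.items() if len(digests) > 1}
    )


if __name__ == "__main__":
    received = [
        ("D1", 17, "aa"),
        ("D2", 17, "bb"),
        ("D2", 17, "bb"),  # duplicate copy via rebroadcast: harmless
        ("D3", 17, "cc"),
        ("D3", 17, "dd"),  # second, different opinion in the same period
    ]
    print(find_equivocators(received))
```

If the returned list is non-empty, the protocol as described stops and the
administrators are notified.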
|
|
|
|
|
|
|
|
The rebroadcast steps ensure that a directory server is heard by
either all of the other servers or none of them, assuming that any two
directories can talk directly, or via a third directory (some of the
links between directory servers may be down). Broadcasts are feasible
because there are relatively few directory servers (currently 3, but we expect
to use as many as 9 as the network scales). The actual local algorithm
for computing the shared directory is a straightforward threshold
voting process: we include an OR if a majority of directory servers
believe it to be good.
|
|
|
|
|
|
|
|
When a client Alice retrieves a consensus directory, she uses it if it
is signed by a majority of the directory servers she knows.
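
The two majority rules above (which ORs enter the shared directory, and when
Alice accepts a directory) can be sketched as below. We assume a strict
majority in both cases, that signature verification has already happened
elsewhere, and that opinions are simply sets of OR names; all function and
variable names are hypothetical.

```python
def consensus_directory(opinions):
    """Include an OR if a strict majority of directory servers list it as good.

    opinions: dict mapping server_id -> set of OR names that server believes
    to be good.
    """
    n = len(opinions)
    candidates = set().union(*opinions.values())
    return {
        router
        for router in candidates
        if 2 * sum(router in good for good in opinions.values()) > n
    }


def client_accepts(signers, known_servers):
    """Alice uses a directory if a majority of the servers she knows signed it.

    signers: server ids whose signatures on the directory verified correctly.
    Signatures from servers Alice does not know are ignored.
    """
    valid = set(signers) & set(known_servers)
    return 2 * len(valid) > len(known_servers)


if __name__ == "__main__":
    opinions = {
        "D1": {"or1", "or2", "or3"},
        "D2": {"or1", "or2"},
        "D3": {"or1", "or4"},
    }
    print(sorted(consensus_directory(opinions)))
    print(client_accepts({"D1", "D2"}, {"D1", "D2", "D3"}))
```

With three directory servers, two agreeing opinions (or two signatures)
suffice; with nine, five are needed.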
|
|
|
|
|
|
|
|
Using directory servers rather than flooding provides simplicity and
flexibility. For example, they don't complicate the analysis when we
start experimenting with non-clique network topologies. And because
the directories are signed, they can be cached by other onion routers,
or indeed by any server. Thus directory servers are not a performance
bottleneck when we have many users, and do not aid traffic analysis by
forcing clients to periodically announce their existence to any
central point.
% Mention Hydra as an example of non-clique topologies. -NM, from RD
|
2003-10-24 05:39:14 +02:00
|
|
|
|
2003-10-30 05:05:28 +01:00
|
|
|
% also find some place to integrate that dirservers have to actually
% lay test circuits and use them, otherwise routers could connect to
% the dirservers but discard all other traffic.
% in some sense they're like reputation servers in \cite{mix-acc} -RD
|
|
|
|
|
2003-11-01 22:19:46 +01:00
|
|
|
|
2003-10-21 03:11:29 +02:00
|
|
|
\Section{Rendezvous points: location privacy}
\label{sec:rendezvous}
|
|
|
|
|
2003-11-01 22:19:46 +01:00
|
|
|
Rendezvous points are a building block for \emph{location-hidden
services} (also known as ``responder anonymity'') in the Tor
network. Location-hidden services allow a server Bob to offer a TCP
service, such as a webserver, without revealing the IP of his service.
Besides allowing Bob to provide services anonymously, location
privacy also seeks to provide some protection against DDoS attacks:
attackers are forced to attack the onion routing network as a whole
rather than just Bob's IP.
|
2003-10-17 13:04:39 +02:00
|
|
|
|
2003-11-01 04:44:13 +01:00
|
|
|
\subsection{Goals for rendezvous points}
\label{subsec:rendezvous-goals}
In addition to our other goals, we have tried to provide the following
properties in our design for location-hidden servers:
\begin{tightlist}
\item[Flood-proof:] An attacker should not be able to flood Bob with traffic
simply by sending many requests to Bob's public location. Thus, Bob needs a
way to filter incoming requests.
\item[Robust:] Bob should be able to maintain a long-term pseudonymous
identity even in the presence of OR failure. Thus, Bob's identity must not
be tied to a single OR.
\item[Smear-resistant:] An attacker should not be able to use rendezvous
points to smear an OR. That is, if a social attacker tries to host a
location-hidden service that is illegal or disreputable, it should not
appear---even to a casual observer---that the OR is hosting that service.
\item[Application-transparent:] Although we are willing to require users to
run special software to access location-hidden servers, we are not willing
to require them to modify their applications.
\end{tightlist}
|
|
|
|
|
|
|
|
\subsection{Rendezvous design}
We provide location-hiding for Bob by allowing him to advertise
several onion routers (his \emph{Introduction Points}) as his public
location. (He may do this on any robust efficient distributed
key-value lookup system with authenticated updates, such as CFS
\cite{cfs:sosp01}\footnote{
Each onion router could run a node in this lookup
system; also note that as a stopgap measure, we can start by running a
simple lookup system on the directory servers.})
Alice, the client, chooses a node for her
\emph{Rendezvous Point}. She connects to one of Bob's introduction
points, informs him about her rendezvous point, and then waits for him
to connect to the rendezvous point. This extra level of indirection
helps Bob's introduction points avoid problems associated with serving
unpopular files directly, as could occur, for example, if Bob chooses
an introduction point in Texas to serve anti-ranching propaganda,
or if Bob's service tends to get DDoS'ed by network vandals.
The extra level of indirection also allows Bob to respond to some requests
and ignore others.
|
|
|
|
|
2003-11-01 22:19:46 +01:00
|
|
|
The steps of a rendezvous are as follows. These steps are performed on
behalf of Alice and Bob by their local onion proxies, which they both
must run; application integration is described more fully below.
\begin{tightlist}
\item Bob chooses some introduction points, and advertises them via
CFS (or some other distributed key-value publication system).
\item Bob establishes a Tor virtual circuit to each of his
Introduction Points, and waits.
\item Alice learns about Bob's service out of band (perhaps Bob told her,
or she found it on a website). She looks up the details of Bob's
service from CFS.
\item Alice chooses an OR to serve as a Rendezvous Point (RP) for this
transaction. She establishes a virtual circuit to her RP, and
tells it to wait for connections. [XXX how?]
\item Alice opens an anonymous stream to one of Bob's Introduction
Points, and gives it a message (encrypted for Bob) which tells him
about herself, her chosen RP, and the first half of an ephemeral
key handshake. The Introduction Point sends the message to Bob.
\item Bob may decide to ignore Alice's request. [XXX Based on what?]
Otherwise, he creates a new virtual circuit to Alice's RP, and
authenticates himself. [XXX how?]
\item If the authentication is successful, the RP connects Alice's
virtual circuit to Bob's. Note that the RP can't recognize Alice,
Bob, or the data they transmit (they share a session key).
\item Alice now sends a Begin cell along the circuit. It arrives at Bob's
onion proxy. Bob's onion proxy connects to Bob's webserver.
\item An anonymous stream has been established, and Alice and Bob
communicate as normal.
\end{tightlist}
|
|
|
|
|
2003-11-01 22:19:46 +01:00
|
|
|
[XXX We need to modify the above to refer people down to these next
paragraphs. -NM]
|
|
|
|
|
2003-10-21 03:11:29 +02:00
|
|
|
When establishing an introduction point, Bob provides the onion router
with a public ``introduction'' key. The hash of this public key
identifies a unique service, and (since Bob is required to sign his
messages) prevents anybody else from usurping Bob's introduction point
in the future. Bob uses the same public key when establishing the other
introduction points for that service.
|
|
|
|
|
2003-11-01 22:19:46 +01:00
|
|
|
The message that Alice gives the introduction point includes a hash of Bob's
public key to identify the service, an optional initial authentication
token (the introduction point can do prescreening, e.g., to block replays),
and (encrypted to Bob's public key) the location of the rendezvous point,
a rendezvous cookie Bob should tell the RP so he gets connected to
Alice, an optional authentication token so Bob can choose whether to respond,
and the first half of a DH key exchange. When Bob connects to the RP
and gets connected to Alice's pipe, his first cell contains the
other half of the DH key exchange.
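
The fields of the introduction message, and the way the two DH halves
combine, can be sketched as below. The class and field names are
hypothetical, the public-key encryption of the sealed part is omitted, and
the group parameters are deliberately toy-sized so the example runs
instantly; none of this reflects an actual wire format.

```python
import secrets
from dataclasses import dataclass
from typing import Optional

# Toy Diffie-Hellman group for illustration only: far too small for real use.
P = 2**127 - 1
G = 3


@dataclass
class SealedRequest:
    """Fields Alice would encrypt to Bob's public key (encryption omitted)."""
    rendezvous_point: str
    rendezvous_cookie: bytes
    auth_token: Optional[bytes]
    dh_public: int  # first half of the DH exchange


@dataclass
class IntroductionMessage:
    """What Alice hands to one of Bob's introduction points."""
    service_id: bytes                  # hash of Bob's public key
    intro_auth_token: Optional[bytes]  # lets the intro point prescreen
    sealed: SealedRequest


def dh_keypair():
    """Return (private, public) for the toy group."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    return private, pow(G, private, P)


def dh_shared(private: int, peer_public: int) -> int:
    """Combine our private value with the peer's public half."""
    return pow(peer_public, private, P)


if __name__ == "__main__":
    alice_priv, alice_pub = dh_keypair()
    msg = IntroductionMessage(
        service_id=b"hash-of-bobs-key",
        intro_auth_token=None,
        sealed=SealedRequest("rp.example", secrets.token_bytes(8), None, alice_pub),
    )
    # Bob replies through the RP with the other half in his first cell.
    bob_priv, bob_pub = dh_keypair()
    print(dh_shared(alice_priv, bob_pub) == dh_shared(bob_priv, msg.sealed.dh_public))
```

Because only the two public halves cross the network, the introduction point
and the RP never see the resulting session key.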
|
|
|
|
|
2003-10-28 22:55:38 +01:00
|
|
|
The authentication tokens can be used to provide selective access to users
proportional to how important it is that they maintain uninterrupted access
to the service. During normal situations, Bob's service might simply be
offered directly from mirrors; Bob can also give out authentication cookies
to high-priority users. If those mirrors are knocked down by DDoS attacks,
those users can switch to accessing Bob's service via the Tor
rendezvous system.
|
2003-10-23 13:45:51 +02:00
|
|
|
|
2003-10-31 07:16:21 +01:00
|
|
|
\SubSection{Integration with user applications}
|
2003-10-21 03:11:29 +02:00
|
|
|
|
|
|
|
For each service Bob offers, he configures his local onion proxy to know
the local IP and port of the server, a strategy for authorizing Alices,
and a public key. Bob publishes
the public key, an expiration
time (``not valid after''), and the current introduction points for
his
service into CFS, all indexed by the hash of the public key.
Note that Bob's webserver is unmodified, and doesn't even know
that it's hidden behind the Tor network.
|
|
|
|
|
2003-11-01 22:19:46 +01:00
|
|
|
Because Alice's applications must work unchanged, her client interface
remains a SOCKS proxy. Thus we must encode all of the necessary
information into the fully qualified domain name Alice uses when
establishing her connections. Location-hidden services use a virtual
top level domain called `.onion': thus hostnames take the form
x.y.onion where x encodes the hash of PK, and y is the authentication
cookie. Alice's onion proxy examines hostnames and recognizes when
they're destined for a hidden server. If so, it decodes the hash of PK and
starts the rendezvous as described in the list above.
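
A minimal sketch of the hostname check an onion proxy might perform is shown
below. It assumes exactly the x.y.onion shape described above and leaves the
encodings of x and y unspecified, so both labels are returned as opaque
strings; the function name is hypothetical.

```python
def parse_onion_hostname(hostname: str):
    """Split an x.y.onion name into (service_label, cookie_label).

    Returns None for any hostname that is not of the assumed x.y.onion form,
    in which case the proxy would treat it as an ordinary destination.
    """
    labels = hostname.rstrip(".").split(".")
    if len(labels) != 3:
        return None
    service_label, cookie_label, tld = labels
    if tld.lower() != "onion" or not service_label or not cookie_label:
        return None
    return service_label, cookie_label


if __name__ == "__main__":
    print(parse_onion_hostname("abcdef.cookie1.onion"))
    print(parse_onion_hostname("www.example.com"))
```

Because the check uses only the hostname from the SOCKS request, Alice's
applications need no changes.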
|
2003-10-21 03:11:29 +02:00
|
|
|
|
|
|
|
\subsection{Previous rendezvous work}
|
|
|
|
|
2003-10-17 13:04:39 +02:00
|
|
|
Ian Goldberg developed a similar notion of rendezvous points for
low-latency anonymity systems \cite{ian-thesis}. His ``service tags''
play the same role in his design as the hashes of services' public
keys play in ours. We use public key hashes so that they can be
self-authenticating, and so the client can recognize the same service
with confidence later on. His design also differs from ours in the
following ways: First, Goldberg suggests that the client should
manually hunt down a current location of the service via Gnutella;
whereas our use of the DHT makes lookup faster, more robust, and
transparent to the user. Second, in Tor the client and server
negotiate ephemeral keys via Diffie-Hellman, so at no point in the
path is the plaintext exposed. Third, our design tries to minimize the
exposure associated with running the service, so as to make volunteers
more willing to offer introduction and rendezvous point services.
Tor's introduction points do not output any bytes to the clients, and
the rendezvous points don't know the client, the server, or the data
being transmitted. The indirection scheme is also designed to include
authentication/authorization---if the client doesn't include the right
cookie with its request for service, the server need not even
acknowledge its existence.
|
2003-10-17 13:04:39 +02:00
|
|
|
|
2003-10-24 23:18:38 +02:00
|
|
|
\Section{Analysis}
\label{sec:analysis}
|
|
|
|
|
|
|
|
In this section, we discuss how well Tor meets our stated design goals
and how well it resists attacks.
|
|
|
|
|
2003-11-01 23:34:23 +01:00
|
|
|
\SubSection{Meeting Basic Goals}
% None of these seem to say very much. Should this subsection be removed?
\begin{tightlist}
\item[Basic Anonymity:] Because traffic is encrypted, changing in
appearance, and can flow from anywhere to anywhere within the
network, a simple observer that cannot see both the initiator
activity and the corresponding activity where the responder talks to
the network will not be able to link the initiator and responder.
Nor is it possible to directly correlate any two communication
sessions as coming from a single source without additional
information. Resistance to more sophisticated anonymity threats is
discussed below.
\item[Deployability:] Tor requires no specialized hardware. Tor
requires no kernel modifications; it runs in user space (currently
on Linux, various BSDs, and Windows). All of these imply a low
technical barrier to running a Tor node. There is an assumption that
Tor nodes have good, relatively persistent network connectivity
(currently T1 or better);
% Is that reasonable to say? We haven't really discussed it -P.S.
% Roger thinks otherwise; he will fix this. -NM
however, there is no padding overhead, and operators can limit
bandwidth on any link. Tor is freely available under the modified
BSD license, and operators are able to choose their own exit
policies, thus reducing legal and social barriers to
running a node.
|
|
|
|
|
|
|
|
\item[Usability:] As noted, Tor runs in user space. So does the onion
proxy, which is comparatively easy to install and run. SOCKS-aware
applications require nothing more than to be pointed at the onion
proxy; other applications can be redirected to use SOCKS for their
outgoing TCP connections by drop-in libraries such as tsocks.
|
2003-11-01 23:34:23 +01:00
|
|
|
|
2003-11-02 04:58:05 +01:00
|
|
|
\item[Flexibility:] Tor's design and implementation are fairly modular,
so that,
for example, a scalable P2P replacement for the directory servers
would not substantially impact other aspects of the system. Tor
runs on top of TCP, so design options that could not easily do so
would be difficult to test on the current network. However, most
low-latency protocols are designed to run over TCP. We are currently
discussing with the designers of MorphMix interoperability of the
two systems, which seems to be relatively straightforward. This will
allow testing and direct comparison of the two rather different
designs.
% Do we want to say this? I don't think we should talk about this
% kind of discussion till we have more positive results.
|
2003-11-01 23:34:23 +01:00
|
|
|
|
|
|
|
\item[Conservative design:] Tor opts for practicality when there is no
|
|
|
|
clear resolution of anonymity tradeoffs or practical means to
|
|
|
|
achieve resolution. Thus, we do not currently pad or mix, although
|
|
|
|
it would be easy to add either of these. Indeed, our system allows
|
2003-11-02 04:58:05 +01:00
|
|
|
long-range and variable padding if this should ever be shown to have
|
2003-11-01 23:34:23 +01:00
|
|
|
a clear advantage. Similarly, we do not currently attempt to
|
2003-11-02 04:58:05 +01:00
|
|
|
resolve such issues as Sybil attacks to dominate the network except
|
2003-11-01 23:34:23 +01:00
|
|
|
by such direct means as personal familiarity of directory operators
|
|
|
|
with all node operators.
|
|
|
|
\end{tightlist}
|
|
|
|
|
|
|
|
\SubSection{Attacks and Defenses}
|
|
|
|
\label{sec:attacks}
|
|
|
|
|
|
|
|
Below we summarize a variety of attacks and how well our design withstands
|
|
|
|
them.
|
|
|
|
|
2003-11-02 04:58:05 +01:00
|
|
|
[XXX Note that some of these attacks are outside our threat model! -NM]
|
|
|
|
|
2003-11-01 23:34:23 +01:00
|
|
|
\subsubsection*{Passive attacks}
|
|
|
|
\begin{tightlist}
|
|
|
|
\item \emph{Observing user traffic patterns.} Observing the connection
|
|
|
|
between an end user and a first onion router will not reveal to whom
|
|
|
|
the user is connecting or what information is being sent. It will
|
|
|
|
reveal patterns of user traffic (both sent and received). Simple
|
|
|
|
profiling of user connection patterns is not generally possible,
|
|
|
|
however, because multiple application connections (streams) may be
|
|
|
|
operating simultaneously or in series over a single circuit. Thus,
|
|
|
|
further processing is necessary to try to discern even these usage
|
|
|
|
patterns.
|
|
|
|
|
|
|
|
\item \emph{Observing user content.} At the user end, content is
|
|
|
|
encrypted; however, connections from the network to arbitrary
|
|
|
|
websites may not be. Further, a responding website may itself be
|
|
|
|
considered an adversary. Filtering content is not a primary goal of
|
|
|
|
Onion Routing; nonetheless, Tor can directly make use of Privoxy and
|
2003-11-02 04:58:05 +01:00
|
|
|
related filtering services via SOCKS and thus anonymize their
|
|
|
|
application data streams.
|
2003-11-01 23:34:23 +01:00
|
|
|
|
|
|
|
\item \emph{Option distinguishability.} Configuration options can be a
|
|
|
|
source of distinguishable patterns. In general there is economic
|
|
|
|
incentive to allow preferential services \cite{econymics}, and some
|
|
|
|
degree of configuration choice is a factor in attracting large
|
2003-11-02 04:58:05 +01:00
|
|
|
numbers of users to provide anonymity. So far, however, we have
|
|
|
|
not found a compelling use case in Tor for any client-configurable
|
|
|
|
options. Thus, clients are currently distinguishable only by their
|
|
|
|
behavior.
|
2003-11-01 23:34:23 +01:00
|
|
|
|
2003-11-02 04:58:05 +01:00
|
|
|
\item \emph{End-to-end timing correlation.} Tor only minimally hides
|
|
|
|
end-to-end timing correlations. If an attacker can watch patterns of
|
|
|
|
traffic at the initiator end and the responder end, then he will be
|
|
|
|
able to confirm the correspondence with high probability. The
|
|
|
|
greatest protection currently against such confirmation is if the
|
|
|
|
connection between the onion proxy and the first Tor node is hidden,
|
|
|
|
possibly because it is local or behind a firewall. This approach
|
|
|
|
requires an observer to separate traffic originating at the onion
|
|
|
|
router from traffic passing through it. We still do not, however,
|
|
|
|
predict this approach to be a large problem for an attacker who can
|
|
|
|
observe traffic at both ends of an application connection.
|
2003-11-01 23:34:23 +01:00
|
|
|
|
|
|
|
\item \emph{End-to-end size correlation.} Simple packet counting
|
2003-11-02 04:58:05 +01:00
|
|
|
without timing consideration will also be effective in confirming
|
|
|
|
endpoints of a connection through Onion Routing, although slightly
|
|
|
|
less so. This is because, even without padding, the leaky pipe
|
|
|
|
topology means different numbers of packets may enter one end of a
|
|
|
|
circuit than exit at the other.
|
2003-11-01 23:34:23 +01:00
|
|
|
|
|
|
|
\item \emph{Website fingerprinting.} All the above passive
|
|
|
|
attacks that are at all effective are traffic confirmation attacks.
|
|
|
|
This puts them outside our general design goals. There is also
|
2003-11-02 04:58:05 +01:00
|
|
|
a passive traffic analysis attack that is potentially effective.
|
|
|
|
Instead of searching exit connections for timing and volume
|
2003-11-01 23:34:23 +01:00
|
|
|
correlations it is possible to build up a database of
|
2003-11-02 04:58:05 +01:00
|
|
|
``fingerprints'' containing file sizes and access patterns for a
|
|
|
|
large number of interesting websites. If one now wants to
|
2003-11-01 23:34:23 +01:00
|
|
|
monitor the activity of a user, it may be possible to confirm a
|
2003-11-02 04:58:05 +01:00
|
|
|
connection to a site simply by consulting the database. This attack has
|
2003-11-01 23:34:23 +01:00
|
|
|
been shown to be effective against SafeWeb \cite{hintz-pet02}. Onion
|
|
|
|
Routing is not as vulnerable as SafeWeb to this attack: There is the
|
|
|
|
possibility that multiple streams are exiting the circuit at
|
2003-11-02 04:58:05 +01:00
|
|
|
different places concurrently. Also, fingerprinting will be limited to
|
2003-11-01 23:34:23 +01:00
|
|
|
the granularity of cells, currently 256 bytes. Larger cell sizes
|
|
|
|
and/or minimal padding schemes that group websites into large sets
|
2003-11-02 04:58:05 +01:00
|
|
|
are possible responses. But this remains an open problem. Link
|
|
|
|
padding or long-range dummies may also make fingerprints harder to
|
|
|
|
detect. (Note that
|
2003-11-01 23:34:23 +01:00
|
|
|
such fingerprinting should not be confused with the latency attacks
|
|
|
|
of \cite{back01}. Those require a fingerprint of the latencies of
|
|
|
|
all circuits through the network, combined with those from the
|
2003-11-02 04:58:05 +01:00
|
|
|
network edges to the targeted user and the responder website. While
|
2003-11-01 23:34:23 +01:00
|
|
|
these are in principle feasible and surprises are always possible,
|
|
|
|
these constitute a much more complicated attack, and there is no
|
2003-11-02 04:58:05 +01:00
|
|
|
current evidence of their practicality.)
|
2003-11-01 23:34:23 +01:00
|
|
|
|
2003-11-02 04:58:05 +01:00
|
|
|
\item \emph{Content analysis.} Tor explicitly provides no content
|
|
|
|
rewriting for any protocol at a higher level than TCP. When
|
|
|
|
protocol cleaners are available, however (as Privoxy is for HTTP),
|
|
|
|
Tor can integrate them in order to address these attacks.
|
2003-11-01 23:34:23 +01:00
|
|
|
|
|
|
|
\end{tightlist}
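
As a rough illustration of the cell-granularity point in the website
fingerprinting item above, suppose (as a simplifying assumption that
ignores cell headers and control cells) that a response of $s$ bytes is
carried in fixed-size cells of $c$ bytes. The symbols $s$, $c$, and $n$
are introduced here only for illustration. An observer who counts cells
then learns only
\[
  n(s) = \left\lceil \frac{s}{c} \right\rceil ,
\]
so two responses are indistinguishable by size alone whenever they
occupy the same number of cells. With $c = 256$, for example, responses
of 1000 and 1020 bytes both give $n = 4$, whereas a response of 1025
bytes gives $n = 5$. Increasing $c$ widens each such bucket, which is
why larger cells are listed above as one possible response.
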
|
|
|
|
|
|
|
|
\subsubsection*{Active attacks}
|
|
|
|
\begin{tightlist}
|
2003-11-02 04:58:05 +01:00
|
|
|
\item \emph{Key compromise.} We consider the impact of a compromise
|
|
|
|
for each type of key in turn, from the shortest- to the
|
|
|
|
longest-lived. If a circuit session key is compromised, the
|
|
|
|
attacker can unwrap a single layer of encryption from the relay
|
|
|
|
cells traveling along that circuit. (Only nodes on the circuit can
|
|
|
|
see these cells.) If a TLS session key is compromised, an attacker
|
|
|
|
can view all the cells on the TLS connection until the key is
|
|
|
|
renegotiated. (These cells are themselves encrypted.) If a TLS
|
|
|
|
private key is compromised, the attacker can fool others into
|
|
|
|
thinking that he is the affected OR, but still cannot accept any
|
|
|
|
connections. If an onion private key is compromised, the attacker
|
|
|
|
can impersonate the OR in circuits, but only if the attacker has
|
|
|
|
also compromised the OR's TLS private key, or is running the
|
|
|
|
previous OR in the circuit. (This compromise affects newly created
|
|
|
|
circuits, but because of perfect forward secrecy, the attacker
|
|
|
|
cannot hijack old circuits without compromising their session keys.)
|
|
|
|
In any case, an attacker can only take advantage of a compromise in
|
|
|
|
these mid-term private keys until they expire. Only by
|
|
|
|
compromising a node's identity key can an attacker replace that
|
|
|
|
node indefinitely, by sending new forged mid-term keys to the
|
|
|
|
directories. Finally, an attacker who can compromise a
|
|
|
|
\emph{directory's} identity key can influence every client's view
|
|
|
|
of the network---but only to the degree made possible by gaining a
|
|
|
|
vote with the rest of the directory servers.
|
|
|
|
|
|
|
|
\item \emph{Iterated compromise.} A roving adversary who can
|
|
|
|
compromise ORs (by system intrusion, legal coercion, or extralegal
|
|
|
|
coercion) could march down the length of a circuit compromising the
|
|
|
|
nodes until he reaches the end. Unless the adversary can complete
|
|
|
|
this attack within the lifetime of the circuit, however, the ORs
|
|
|
|
will have discarded the necessary information before the attack can
|
|
|
|
be completed. (Thanks to the perfect forward secrecy of session
|
|
|
|
keys, the attacker cannot force nodes to decrypt recorded
|
2003-11-02 05:53:15 +01:00
|
|
|
traffic once the circuits have been closed.) Additionally, building
|
|
|
|
circuits that cross jurisdictions can make legal coercion
|
|
|
|
harder---this phenomenon is commonly called ``jurisdictional
|
|
|
|
arbitrage.''
|
|
|
|
|
2003-11-01 23:34:23 +01:00
|
|
|
|
2003-11-02 04:58:05 +01:00
|
|
|
\item \emph{Run a recipient.} By running a Web server, an adversary
|
|
|
|
trivially learns the timing patterns of those connecting to it, and
|
|
|
|
can introduce arbitrary patterns in its responses. This can greatly
|
|
|
|
facilitate end-to-end attacks: If the adversary can induce certain
|
|
|
|
users to connect to his webserver (perhaps by providing
|
|
|
|
content targeted at those users), he now holds one end of their
|
|
|
|
connection. Additionally, there is a danger that the application
|
|
|
|
protocols and associated programs can be induced to reveal
|
|
|
|
information about the initiator. This is not directly in Onion
|
|
|
|
Routing's protection area, so we are dependent on Privoxy and
|
|
|
|
similar protocol cleaners to solve the problem.
|
2003-11-01 23:34:23 +01:00
|
|
|
|
|
|
|
\item \emph{Run an onion proxy.} It is expected that end users will
|
|
|
|
nearly always run their own local onion proxy. However, in some
|
2003-11-02 04:58:05 +01:00
|
|
|
settings, it may be necessary for the proxy to run
|
|
|
|
remotely---typically, in an institutional setting where it is
|
|
|
|
necessary to monitor the activity of those connecting to the proxy.
|
|
|
|
The drawback, of course, is that if the onion proxy is compromised,
|
|
|
|
then all future connections through it are completely compromised.
|
|
|
|
|
|
|
|
\item \emph{DoS non-observed nodes.} An observer who can observe some
|
|
|
|
of the Tor network can increase the value of this traffic analysis
|
|
|
|
if it can attack non-observed nodes to shut them down, reduce
|
|
|
|
their reliability, or persuade users that they are not trustworthy.
|
|
|
|
The best defense here is robustness.
|
2003-11-01 23:34:23 +01:00
|
|
|
|
2003-11-02 04:58:05 +01:00
|
|
|
\item \emph{Run a hostile node.} In addition to the abilities of a
|
|
|
|
local observer, an isolated hostile node can create circuits through
|
|
|
|
itself, or alter traffic patterns, in order to affect traffic at
|
|
|
|
other nodes. Its ability to directly DoS a neighbor is now limited
|
|
|
|
by bandwidth throttling. Nonetheless, in order to compromise the
|
|
|
|
anonymity of the endpoints of a circuit by its observations, a
|
|
|
|
hostile node is only significant if it is immediately adjacent to
|
|
|
|
that endpoint.
|
2003-11-01 23:34:23 +01:00
|
|
|
|
2003-11-02 04:58:05 +01:00
|
|
|
\item \emph{Run multiple hostile nodes.} If an adversary is able to
|
|
|
|
run multiple ORs, and is able to persuade the directory servers
|
|
|
|
that those ORs are trustworthy and independent, then occasionally
|
|
|
|
some user will choose one of those ORs for the start and another of
|
|
|
|
those ORs as the end of a circuit. When this happens, the user's
|
|
|
|
anonymity is compromised for those circuits. If an adversary can
|
2003-11-02 05:53:15 +01:00
|
|
|
control $m$ out of $N$ nodes, he should be able to correlate at most
|
|
|
|
$\frac{m}{N}$ of the traffic in this way---although an adversary
|
|
|
|
could possibly attract a disproportionately large amount of traffic
|
|
|
|
by running an exit node with an unusually permissive exit policy.
|
2003-11-02 04:58:05 +01:00
|
|
|
|
2003-11-01 23:34:23 +01:00
|
|
|
\item \emph{Compromise entire path.} Anyone compromising both
|
|
|
|
endpoints of a circuit can confirm this with high probability. If
|
|
|
|
the entire path is compromised, this becomes a certainty; however,
|
2003-11-02 04:58:05 +01:00
|
|
|
the added benefit to the adversary of such an attack is small in
|
|
|
|
relation to the difficulty.
|
2003-11-01 23:34:23 +01:00
|
|
|
|
|
|
|
\item \emph{Run a hostile directory server.} Directory servers control
|
|
|
|
admission to the network. However, because the network directory
|
|
|
|
must be signed by a majority of servers, the threat of a single
|
|
|
|
hostile server is minimized.
|
|
|
|
|
|
|
|
\item \emph{Selectively DoS a Tor node.} As noted, neighbors are
|
|
|
|
bandwidth limited; however, it is possible to open up sufficient
|
|
|
|
numbers of circuits that converge at a single onion router to
|
|
|
|
overwhelm its network connection, its ability to process new
|
2003-11-02 04:58:05 +01:00
|
|
|
circuits, or both.
|
2003-11-01 23:34:23 +01:00
|
|
|
|
|
|
|
%OK so I noticed that twins are completely removed from the paper above,
|
|
|
|
% but it's after 5 so I'll leave that problem to you guys. -PS
|
|
|
|
|
|
|
|
\item \emph{Introduce timing into messages.} This is simply a stronger
|
|
|
|
version of passive timing attacks already discussed above.
|
|
|
|
|
|
|
|
\item \emph{Tagging attacks.} A hostile node could try to ``tag'' a
|
|
|
|
cell by altering it. This would render it unreadable, but if the
|
2003-11-02 04:58:05 +01:00
|
|
|
connection is, for example, an unencrypted request to a Web site,
|
|
|
|
the garbled content coming out at the appropriate time could confirm
|
|
|
|
the association. However, integrity checks on cells prevent
|
|
|
|
this attack from succeeding.
|
2003-11-01 23:34:23 +01:00
|
|
|
|
2003-11-02 05:53:15 +01:00
|
|
|
\item \emph{Replace contents of unauthenticated protocols.} When
|
|
|
|
relaying an unauthenticated protocol like HTTP, a hostile exit node
|
|
|
|
can impersonate the target server. Thus, whenever possible, clients
|
|
|
|
should prefer protocols with end-to-end authentication.
|
|
|
|
|
|
|
|
\item \emph{Replay attacks.} Some anonymity protocols are vulnerable
|
|
|
|
to replay attacks. Tor is not; replaying one side of a handshake
|
|
|
|
will result in a different negotiated session key, and so the rest
|
|
|
|
of the recorded session cannot be used.
|
|
|
|
% ``NonSSL Anonymizer''?
|
|
|
|
|
|
|
|
\item \emph{Smear attacks.} An attacker could use the Tor network to
|
|
|
|
engage in socially disapproved acts, so as to try to bring the
|
|
|
|
entire network into disrepute and get its operators to shut it down.
|
|
|
|
Exit policies can help reduce the possibilities for abuse, but
|
|
|
|
ultimately, the network will require volunteers who can tolerate
|
|
|
|
some political heat.
|
2003-11-01 23:34:23 +01:00
|
|
|
\end{tightlist}
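
The bound in the multiple-hostile-nodes item above can be made slightly
more concrete. As a sketch, assume that a client picks the first and
last OR of each circuit independently and uniformly at random from all
$N$ nodes, of which the adversary controls $m$; real path selection need
not be uniform, so this is only an approximation. The circuit is then
compromised when both choices land on hostile nodes:
\[
  \Pr[\text{first and last OR hostile}]
    = \frac{m}{N} \cdot \frac{m}{N}
    = \left(\frac{m}{N}\right)^{2}
    \le \frac{m}{N}.
\]
For instance, with $m = 10$ and $N = 100$ this gives $0.01$, that is,
about one circuit in a hundred, which is consistent with the upper bound
of $\frac{m}{N} = 0.1$ stated above. If the two ORs are required to be
distinct, the corresponding value is $\frac{m(m-1)}{N(N-1)}$, which is
slightly smaller.
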
|
|
|
|
|
|
|
|
\subsubsection*{Directory attacks}
|
|
|
|
\begin{tightlist}
|
|
|
|
\item knock out a dirserver
|
|
|
|
\item knock out half the dirservers
|
|
|
|
\item trick user into using different software (with different dirserver
|
|
|
|
keys)
|
|
|
|
\item OR connects to the dirservers but nowhere else
|
|
|
|
\item foo
|
|
|
|
\end{tightlist}
|
|
|
|
|
|
|
|
\subsubsection*{Attacks against rendezvous points}
|
|
|
|
\begin{tightlist}
|
|
|
|
\item foo
|
|
|
|
\end{tightlist}
|
|
|
|
|
2003-10-30 12:40:14 +01:00
|
|
|
|
2003-11-01 07:47:19 +01:00
|
|
|
\Section{Open Questions in Low-latency Anonymity}
|
2003-07-11 21:28:36 +02:00
|
|
|
\label{sec:maintaining-anonymity}
|
2003-11-01 22:19:46 +01:00
|
|
|
|
2003-11-01 07:47:19 +01:00
|
|
|
% There must be a better intro than this! -NM
|
|
|
|
In addition to the open problems discussed in
|
|
|
|
section~\ref{subsec:non-goals}, many other questions remain to be
|
|
|
|
solved by future research before we can be truly confident that we
|
|
|
|
have built a secure low-latency anonymity service.
|
|
|
|
|
|
|
|
Many of these open issues are questions of balance. For example,
|
|
|
|
how often should users rotate to fresh circuits? Too-frequent
|
|
|
|
rotation is inefficient and expensive, but too-infrequent rotation
|
|
|
|
makes the user's traffic linkable. Instead of opening a fresh
|
|
|
|
circuit, clients can also limit linkability by exiting from a middle point
|
|
|
|
of the circuit, or by truncating and re-extending the circuit, but
|
|
|
|
more analysis is needed to determine the proper trade-off.
|
|
|
|
[XXX mention predecessor attacks?]
|
|
|
|
|
|
|
|
A similar question surrounds timing of directory operations:
|
|
|
|
how often should directories be updated? With too-infrequent
|
|
|
|
updates clients receive an inaccurate picture of the network; with
|
|
|
|
too-frequent updates the directory servers are overloaded.
|
|
|
|
|
|
|
|
%do different exit policies at different exit nodes trash anonymity sets,
|
|
|
|
%or not mess with them much?
|
|
|
|
%
|
|
|
|
%% Why would they? By routing traffic to certain nodes preferentially?
|
|
|
|
|
|
|
|
[XXX Choosing paths and path lengths: I'm not writing this bit till
|
|
|
|
Arma's pathselection stuff is in. -NM]
|
|
|
|
|
|
|
|
%%%% Roger said that he'd put a path selection paragraph into section
|
|
|
|
%%%% 4 that would replace this.
|
|
|
|
%
|
|
|
|
%I probably should have noted that this means loops will be on at least
|
|
|
|
%five hop routes, which should be rare given the distribution. I'm
|
|
|
|
%realizing that this is reproducing some of the thought that led to a
|
|
|
|
%default of five hops in the original onion routing design. There were
|
|
|
|
%some different assumptions, which I won't spell out now. Note that
|
|
|
|
%enclave level protections really change these assumptions. If most
|
|
|
|
%circuits are just two hops, then just a single link observer will be
|
|
|
|
%able to tell that two enclaves are communicating with high probability.
|
|
|
|
%So, it would seem that enclaves should have a four node minimum circuit
|
|
|
|
%to prevent trivial circuit insider identification of the whole circuit,
|
|
|
|
%and three hop minimum for circuits from an enclave to some nonclave
|
|
|
|
%responder. But then... we would have to make everyone obey these rules
|
|
|
|
%or a node that through timing inferred it was on a four hop circuit
|
|
|
|
%would know that it was probably carrying enclave to enclave traffic.
|
|
|
|
%Which... if there were even a moderate number of bad nodes in the
|
|
|
|
%network would make it advantageous to break the connection to conduct
|
|
|
|
%a reformation intersection attack. Ahhh! I gotta stop thinking
|
|
|
|
%about this and work on the paper some before the family wakes up.
|
|
|
|
%On Sat, Oct 25, 2003 at 06:57:12AM -0400, Paul Syverson wrote:
|
|
|
|
%> Which... if there were even a moderate number of bad nodes in the
|
|
|
|
%> network would make it advantageous to break the connection to conduct
|
|
|
|
%> a reformation intersection attack. Ahhh! I gotta stop thinking
|
|
|
|
%> about this and work on the paper some before the family wakes up.
|
|
|
|
%This is the sort of issue that should go in the 'maintaining anonymity
|
|
|
|
%with tor' section towards the end. :)
|
|
|
|
%Email from between roger and me to beginning of section above. Fix and move.
|
|
|
|
|
|
|
|
Throughout this paper, we have assumed that end-to-end traffic
|
|
|
|
analysis cannot yet be defeated. But even high-latency anonymity
|
|
|
|
systems can be vulnerable to end-to-end traffic analysis, if the
|
|
|
|
traffic volumes are high enough, and if users' habits are sufficiently
|
2003-11-01 09:48:12 +01:00
|
|
|
distinct \cite{limits-open,statistical-disclosure}. \emph{What can be
|
2003-11-01 07:47:19 +01:00
|
|
|
done to limit the effectiveness of these attacks against low-latency
|
|
|
|
systems?} Tor already makes some effort to conceal the starts and
|
|
|
|
ends of streams by wrapping all long-range control commands in
|
|
|
|
identical-looking relay cells, but more analysis is needed. Link
|
|
|
|
padding could frustrate passive observers who count packets; long-range
|
|
|
|
padding could work against observers who own the first hop in a
|
|
|
|
circuit. But more research needs to be done in order to find an
|
|
|
|
efficient and practical approach. Volunteers prefer not to run
|
|
|
|
constant-bandwidth padding; but more sophisticated traffic shaping
|
|
|
|
approaches remain somewhat unanalyzed. [XXX is this so?] Recent work
|
2003-11-01 09:48:12 +01:00
|
|
|
on long-range padding \cite{defensive-dropping} shows promise. One
|
2003-11-01 07:47:19 +01:00
|
|
|
could also try to reduce correlation in packet timing by batching and
|
|
|
|
re-ordering packets, but it is unclear whether this could improve
|
|
|
|
anonymity without introducing so much latency as to render the
|
|
|
|
network unusable.
|
|
|
|
|
|
|
|
Even if passive timing attacks were wholly solved, active timing
|
|
|
|
attacks would remain. \emph{What can
|
|
|
|
be done to address attackers who can introduce timing patterns into
|
|
|
|
a user's traffic?} [XXX mention likely approaches]
|
|
|
|
|
|
|
|
%%% I think we cover this by framing the problem as ``Can we make
|
|
|
|
%%% end-to-end characteristics of low-latency systems as good as
|
|
|
|
%%% those of high-latency systems?'' Eliminating long-term
|
|
|
|
%%% intersection is a hard problem.
|
|
|
|
%
|
|
|
|
%Even regardless of link padding from Alice to the cloud, there will be
|
|
|
|
%times when Alice is simply not online. Link padding, at the edges or
|
|
|
|
%inside the cloud, does not help for this.
|
|
|
|
|
|
|
|
In order to scale to large numbers of users, and to prevent an
|
|
|
|
attacker from observing the whole network at once, it may be necessary
|
|
|
|
for low-latency anonymity systems to support far more servers than Tor
|
|
|
|
currently anticipates. This introduces several issues. First, if
|
|
|
|
approval by a centralized set of directory servers is no longer
|
|
|
|
feasible, what mechanism should be used to prevent adversaries from
|
2003-11-01 22:19:46 +01:00
|
|
|
signing up many spurious servers?
|
|
|
|
Second, if clients can no longer have a complete
|
|
|
|
picture of the network at all times, how should they perform
|
|
|
|
discovery while preventing attackers from manipulating or exploiting
|
|
|
|
gaps in client knowledge? Third, if there are too many servers
|
2003-11-01 07:47:19 +01:00
|
|
|
for every server to constantly communicate with every other, what kind
|
2003-11-01 22:19:46 +01:00
|
|
|
of non-clique topology should the network use? Restricted-route
|
|
|
|
topologies promise comparable anonymity with better scalability
|
|
|
|
\cite{danezis-pets03}, but whatever topology we choose, we need some
|
|
|
|
way to keep attackers from manipulating their position within it.
|
2003-11-01 07:47:19 +01:00
|
|
|
Fourth, since no centralized authority is tracking server reliability,
|
|
|
|
how do we prevent unreliable servers from rendering the network
|
|
|
|
unusable? Fifth, do clients receive so much anonymity benefit from
|
|
|
|
running their own servers that we should expect them all to do so, or
|
|
|
|
do we need to find another incentive structure to motivate them?
|
2003-11-02 02:48:41 +01:00
|
|
|
(Tarzan and MorphMix present possible solutions.)
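
To see why a clique topology becomes costly as the network grows, note
that if each of $N$ servers maintains a connection to every other
server, then each server holds $N-1$ connections and the network as a
whole holds
\[
  \binom{N}{2} = \frac{N(N-1)}{2}
\]
connections. As an illustration (the values of $N$ are chosen
arbitrarily), $N = 100$ gives 4950 connections, while $N = 1000$ gives
499,500: the per-server cost grows linearly, but the total grows
quadratically.
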
|
2003-11-01 22:19:46 +01:00
|
|
|
|
|
|
|
[XXX how to approve new nodes (advogato, sybil, captcha (RTT))]
|
2003-11-01 07:47:19 +01:00
|
|
|
|
|
|
|
Alternatively, it may be the case that one of these problems proves
|
|
|
|
intractable, or that the drawbacks to many-server systems prove
|
|
|
|
greater than the benefits. Nevertheless, we may still do well to
|
|
|
|
consider non-clique topologies. A cascade topology may provide more
|
|
|
|
defense against traffic confirmation.
|
|
|
|
% Why would it? Cite. -NM
|
|
|
|
Does the hydra (many inputs, few outputs) topology work
|
|
|
|
better? Are we going to get a hydra anyway because most nodes will be
|
2003-10-21 10:09:55 +02:00
|
|
|
middleman nodes?
|
|
|
|
|
2003-11-01 07:47:19 +01:00
|
|
|
%%% Do more with this paragraph once The TCP-over-TCP paragraph is
|
|
|
|
%%% more integrated into Related works.
|
|
|
|
%
|
|
|
|
As mentioned in section~\ref{where-is-it-now}, Tor could improve its
|
|
|
|
robustness against node failure by buffering stream data at the
|
|
|
|
network's edges, and performing end-to-end acknowledgments. The
|
|
|
|
efficacy of this approach remains to be tested, however, and there
|
|
|
|
may be more effective means for ensuring reliable connections in the
|
|
|
|
presence of unreliable nodes.
|
|
|
|
|
|
|
|
%%% Keeping this original paragraph for a little while, since it
|
|
|
|
%%% is not the same as what's written there now.
|
|
|
|
%
|
|
|
|
%Because Tor depends on TLS and TCP to provide a reliable transport,
|
|
|
|
%when one of the servers goes down, all the circuits (and thus streams)
|
|
|
|
%traveling over that server must break. This reduces anonymity because
|
|
|
|
%everybody needs to reconnect right then (does it? how much?) and
|
|
|
|
%because exit connections all break at the same time, and it also harms
|
|
|
|
%usability. It seems the problem is even worse in a peer-to-peer
|
|
|
|
%environment, because so far such systems don't really provide an
|
|
|
|
%incentive for nodes to stay connected when they're done browsing, so
|
|
|
|
%we would expect a much higher churn rate than for onion routing.
|
|
|
|
%there ways of allowing streams to survive the loss of a node in the
|
|
|
|
%path?
|
|
|
|
|
|
|
|
% Roger or Paul suggested that we say something about incentives,
|
|
|
|
% too, but I think that's a better candidate for our future work
|
|
|
|
% section. After all, we will doubtlessly learn very much about why
|
|
|
|
% people do or don't run and use Tor in the near future. -NM
|
2003-07-11 21:28:36 +02:00
|
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
|
|
|
2003-10-26 17:25:06 +01:00
|
|
|
|
2003-10-21 06:27:54 +02:00
|
|
|
|
2003-07-11 21:28:36 +02:00
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
|
|
|
2003-11-01 22:19:46 +01:00
|
|
|
\Section{Future Directions}
|
2003-07-11 21:28:36 +02:00
|
|
|
\label{sec:conclusion}
|
|
|
|
|
2003-10-24 23:18:38 +02:00
|
|
|
% Mention that we need to do TCP over tor for reliability.
|
|
|
|
|
2003-10-21 06:27:54 +02:00
|
|
|
Tor brings together many innovations into
|
2003-10-10 06:35:25 +02:00
|
|
|
a unified deployable system. But there are still several attacks that
|
|
|
|
work quite well, as well as a number of sustainability and run-time
|
|
|
|
issues remaining to be ironed out. In particular:
|
|
|
|
|
2003-11-01 22:19:46 +01:00
|
|
|
% Many of these (Scalability, cover traffic) are duplicates from open problems.
|
|
|
|
%
|
2003-11-02 07:14:59 +01:00
|
|
|
\begin{tightlist}
|
2003-11-01 22:19:46 +01:00
|
|
|
\item \emph{Scalability:} Tor's emphasis on design simplicity and
|
|
|
|
deployability has led us to adopt a clique topology, a
|
|
|
|
semi-centralized model for directories and trusts, and a
|
|
|
|
full-network-visibility model for client knowledge. None of these
|
|
|
|
properties will scale to more than a few hundred servers, at most.
|
|
|
|
Promising approaches to better scalability exist (see
|
|
|
|
section~\ref{sec:maintaining-anonymity}), but more deployment
|
|
|
|
experience would be helpful in learning the relative importance of
|
|
|
|
these bottlenecks.
|
2003-10-21 06:27:54 +02:00
|
|
|
\item \emph{Cover traffic:} Currently we avoid cover traffic because
|
2003-11-01 22:19:46 +01:00
|
|
|
of its clear costs in performance and bandwidth, and because its
|
|
|
|
security benefits are not well understood. With more research
|
|
|
|
\cite{SS03,defensive-dropping}, the price/value ratio may change,
|
|
|
|
both for link-level cover traffic and also long-range cover traffic.
|
2003-10-21 06:27:54 +02:00
|
|
|
\item \emph{Better directory distribution:} Even with the threshold
|
2003-11-01 22:19:46 +01:00
|
|
|
directory agreement algorithm described in Section~\ref{subsec:dirservers},
|
|
|
|
the directory servers are still trust bottlenecks. We must find more
|
|
|
|
decentralized yet practical ways to distribute up-to-date snapshots of
|
|
|
|
network status without introducing new attacks. Also, directory
|
|
|
|
retrieval presents a scaling problem, since clients currently
|
|
|
|
download a description of the entire network state every 15
|
|
|
|
minutes. As the state grows larger and clients more numerous, we
|
|
|
|
may need to move to a solution in which clients only receive
|
|
|
|
incremental updates to directory state, or where directories are
|
|
|
|
cached at the ORs to avoid high loads on the directory servers.
|
2003-10-31 07:16:21 +01:00
|
|
|
\item \emph{Implementing location-hidden servers:} While
|
2003-11-01 22:19:46 +01:00
|
|
|
Section~\ref{sec:rendezvous} describes a design for rendezvous
|
|
|
|
points and location-hidden servers, these features have not yet been
|
|
|
|
implemented. While doing so, we will likely encounter additional
|
|
|
|
issues, both in terms of usability and anonymity, that must be
|
|
|
|
resolved.
|
|
|
|
\item \emph{Further specification review:} Although we have a public,
|
|
|
|
byte-level specification for the Tor protocols, this protocol has
|
|
|
|
not received extensive external review. We hope that as Tor
|
|
|
|
becomes more widely deployed, more people will become interested in
|
|
|
|
examining our specification.
|
2003-10-21 06:27:54 +02:00
|
|
|
\item \emph{Wider-scale deployment:} The original goal of Tor was to
|
2003-11-01 22:19:46 +01:00
|
|
|
gain experience in deploying an anonymizing overlay network, and
|
|
|
|
learn from having actual users. We are now at the point in design
|
|
|
|
and development where we can start deploying a wider network. Once
|
|
|
|
we are ready for actual users, we will doubtless be better
|
|
|
|
able to evaluate some of our design decisions, including our
|
|
|
|
robustness/latency tradeoffs, our abuse-prevention mechanisms, and
|
|
|
|
our overall usability.
|
2003-11-02 07:14:59 +01:00
|
|
|
\end{tightlist}
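
The directory retrieval load mentioned in the list above can be
estimated with a back-of-the-envelope calculation. Assume, purely for
illustration, $u$ clients, $n$ servers, and a directory entry of $d$
bytes per server; none of these values are measurements. A client that
downloads the full directory every 15 minutes fetches it
$24 \cdot 60 / 15 = 96$ times per day, so the aggregate daily load on
the directory servers is roughly
\[
  L = 96 \, u \, n \, d \quad \text{bytes per day}.
\]
With the hypothetical values $u = 10^{4}$, $n = 100$, and $d = 10^{3}$,
this gives $L = 9.6 \times 10^{10}$ bytes, or about 96~GB, per day.
Because $L$ grows with the product $u \, n$, incremental updates (which
shrink the per-fetch size) and caching at the ORs (which spreads the
fetches over more machines) attack different factors of the same
expression.
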
|
2003-10-10 06:35:25 +02:00
|
|
|
|
2003-07-11 21:28:36 +02:00
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
|
|
|
2003-10-21 10:09:55 +02:00
|
|
|
%% commented out for anonymous submission
|
2003-10-31 07:16:21 +01:00
|
|
|
%\Section{Acknowledgments}
|
|
|
|
% Peter Palfrader for editing
|
|
|
|
% Bram Cohen for congestion control discussions
|
2003-07-11 21:28:36 +02:00
|
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
|
|
|
|
|
|
\bibliographystyle{latex8}
|
2003-10-16 23:49:04 +02:00
|
|
|
\bibliography{tor-design}
|
2003-07-11 21:28:36 +02:00
|
|
|
|
|
|
|
\end{document}
|
|
|
|
|
|
|
|
% Style guide:
|
|
|
|
% U.S. spelling
|
|
|
|
% avoid contractions (it's, can't, etc.)
|
2003-11-01 04:44:13 +01:00
|
|
|
% prefer ``for example'' or ``such as'' to e.g.
|
|
|
|
% prefer ``that is'' to i.e.
|
2003-07-11 21:28:36 +02:00
|
|
|
% 'mix', 'mixes' (as noun)
|
|
|
|
% 'mix-net'
|
|
|
|
% 'mix', 'mixing' (as verb)
|
|
|
|
% 'middleman' [Not with a hyphen; the hyphen has been optional
|
|
|
|
% since Middle English.]
|
|
|
|
% 'nymserver'
|
|
|
|
% 'Cypherpunk', 'Cypherpunks', 'Cypherpunk remailer'
|
2003-10-24 23:18:38 +02:00
|
|
|
% 'Onion Routing design', 'onion router' [note capitalization]
|
2003-10-26 23:59:18 +01:00
|
|
|
% 'SOCKS'
|
2003-11-01 04:44:13 +01:00
|
|
|
% Try not to use \cite as a noun.
|
2003-11-01 22:19:46 +01:00
|
|
|
% 'Authorizating' sounds great, but it isn't a word.
|
2003-11-02 02:48:41 +01:00
|
|
|
% 'First, second, third', not 'Firstly, secondly, thirdly'.
|
|
|
|
% 'circuit', not 'channel'
|
2003-11-02 05:53:15 +01:00
|
|
|
% Typography: no space on either side of an em dash---ever.
|
|
|
|
% Hyphens are for multi-part words; en dashes imply movement or
|
|
|
|
% opposition (The Alice--Bob connection); and em dashes are
|
|
|
|
% for punctuation---like that.
|
2003-07-11 21:28:36 +02:00
|
|
|
%
|
2003-10-26 23:59:18 +01:00
|
|
|
% 'Substitute ``Damn'' every time you're inclined to write ``very;'' your
|
|
|
|
% editor will delete it and the writing will be just as it should be.'
|
2003-10-27 00:49:01 +01:00
|
|
|
% -- Mark Twain
|