down to 24 pages

svn:r13290
Roger Dingledine 2008-01-26 02:48:43 +00:00
parent aac22f1523
commit 12fbf01abe
2 changed files with 74 additions and 72 deletions

Binary file not shown.


@ -25,7 +25,7 @@
%\newcommand{\workingnote}[1]{(**#1)} % makes the note visible.
\date{}
\title{Design of a blocking-resistant anonymity system\\DRAFT}
\title{Design of a blocking-resistant anonymity system}
%\author{Roger Dingledine\inst{1} \and Nick Mathewson\inst{1}}
\author{Roger Dingledine \\ The Tor Project \\ arma@torproject.org \and
@ -50,12 +50,12 @@ by government-level attackers.
\end{abstract}
\section{Introduction and Goals}
\section{Introduction}
Anonymizing networks like Tor~\cite{tor-design} bounce traffic around a
network of encrypting relays. Unlike encryption, which hides only {\it what}
is said, these networks also aim to hide who is communicating with whom, which
users are using which websites, and similar relations. These systems have a
users are using which websites, and so on. These systems have a
broad range of users, including ordinary citizens who want to avoid being
profiled for targeted advertisements, corporations who don't want to reveal
information to their competitors, and law enforcement and government
@ -78,14 +78,14 @@ Wikipedia
and Blogspot, they are no longer affected by local censorship
and firewall rules. In fact, an informal user study
%(described in Appendix~\ref{app:geoip})
showed China as the third largest user base
for Tor clients, with perhaps ten thousand people accessing the Tor
network from China each day.
showed that a few hundred thousand people access the Tor network
each day, with about 20\% of them coming from China~\cite{something}.
The current Tor design is easy to block if the attacker controls Alice's
connection to the Tor network---by blocking the directory authorities,
by blocking all the server IP addresses in the directory, or by filtering
based on the fingerprint of the Tor TLS handshake. Here we describe an
by blocking all the relay IP addresses in the directory, or by filtering
based on the network fingerprint of the Tor TLS handshake. Here we
describe an
extended design that builds upon the current Tor network to provide an
anonymizing
network that resists censorship as well as anonymity-breaking attacks.
@ -99,7 +99,7 @@ components of our designs in detail. Section~\ref{sec:security} considers
security implications and Section~\ref{sec:reachability} presents other
issues with maintaining connectivity and sustainability for the design.
%Section~\ref{sec:future} speculates about future more complex designs,
Finally Section~\ref{sec:conclusion} summarizes our next steps and
Finally section~\ref{sec:conclusion} summarizes our next steps and
recommendations.
% The other motivation is for places where we're concerned they will
@ -137,8 +137,8 @@ unanticipated oppressive situations. In fact, by designing with
a variety of adversaries in mind, we can take advantage of the fact that
adversaries will be in different stages of the arms race at each location,
so an address blocked in one locale can still be useful in others.
We focus on an attacker with somewhat complex goals:
We assume that the attackers' goals are somewhat complex.
\begin{tightlist}
\item The attacker would like to restrict the flow of certain kinds of
information, particularly when this information is seen as embarrassing to
@ -222,7 +222,7 @@ success and visibility.
We do not assume that government-level attackers are always uniform
across the country. For example, users of different ISPs in China
experience different censorship policies and mechanisms.
experience different censorship policies and mechanisms~\cite{china-ccs07}.
%there is no single centralized place in China
%that coordinates its specific censorship decisions and steps.
@ -253,11 +253,11 @@ real Tor network.
Tor is popular and sees a lot of use---it's the largest anonymity
network of its kind, and has
attracted more than 800 volunteer-operated routers from around the
attracted more than 1500 volunteer-operated routers from around the
world. Tor protects each user by routing their traffic through a multiply
encrypted ``circuit'' built of a few randomly selected servers, each of which
can remove only a single layer of encryption. Each server sees only the step
before it and the step after it in the circuit, and so no single server can
encrypted ``circuit'' built of a few randomly selected relays, each of which
can remove only a single layer of encryption. Each relay sees only the step
before it and the step after it in the circuit, and so no single relay can
learn the connection between a user and her chosen communication partners.
In this section, we examine some of the reasons why Tor has become popular,
with particular emphasis on how we can take advantage of these properties
@ -290,7 +290,7 @@ The Tor design provides other features as well that are not typically
present in manual or ad hoc circumvention techniques.
First, Tor has a well-analyzed and well-understood way to distribute
information about servers.
information about relays.
Tor directory authorities automatically aggregate, test,
and publish signed summaries of the available Tor routers. Tor clients
can fetch these summaries to learn which routers are available and
@ -365,11 +365,11 @@ something else: hundreds of thousands of different and often-changing
addresses that we can leverage for our blocking-resistance design.
Finally and perhaps most importantly, Tor provides anonymity and prevents any
single server from linking users to their communication partners. Despite
single relay from linking users to their communication partners. Despite
initial appearances, {\it distributed-trust anonymity is critical for
anti-censorship efforts}. If any single server can expose dissident bloggers
anti-censorship efforts}. If any single relay can expose dissident bloggers
or compile a list of users' behavior, the censors can profitably compromise
that server's operator, perhaps by applying economic pressure to their
that relay's operator, perhaps by applying economic pressure to their
employers,
breaking into their computer, pressuring their family (if they have relatives
in the censored area), or so on. Furthermore, in designs where any relay can
@ -394,7 +394,8 @@ process of finding one or more usable relays.
For example, we can divide the pieces of Tor in the previous section
into the process of building paths and sending
traffic over them (relay) and the process of learning from the directory
servers about what routers are available (discovery). With this distinction
authorities about what routers are available (discovery). With this
distinction
in mind, we now examine several categories of relay-based schemes.
\subsection{Centrally-controlled shared proxies}
@ -579,33 +580,34 @@ firewalls can't notice them without performing expensive stream
reconstruction~\cite{ptacek98insertion}. This technique relies on the
same insight as our weak steganography assumption.
\subsection{Internal caching networks}
%\subsection{Internal caching networks}
Freenet~\cite{freenet-pets00} is an anonymous peer-to-peer data store.
Analyzing Freenet's security can be difficult, as its design is in flux as
new discovery and routing mechanisms are proposed, and no complete
specification has (to our knowledge) been written. Freenet servers relay
requests for specific content (indexed by a digest of the content)
``toward'' the server that hosts it, and then cache the content as it
follows the same path back to
the requesting user. If Freenet's routing mechanism is successful in
allowing nodes to learn about each other and route correctly even as some
node-to-node links are blocked by firewalls, then users inside censored areas
can ask a local Freenet server for a piece of content, and get an answer
without having to connect out of the country at all. Of course, operators of
servers inside the censored area can still be targeted, and the addresses of
external servers can still be blocked.
%Freenet~\cite{freenet-pets00} is an anonymous peer-to-peer data store.
%Analyzing Freenet's security can be difficult, as its design is in flux as
%new discovery and routing mechanisms are proposed, and no complete
%specification has (to our knowledge) been written. Freenet servers relay
%requests for specific content (indexed by a digest of the content)
%``toward'' the server that hosts it, and then cache the content as it
%follows the same path back to
%the requesting user. If Freenet's routing mechanism is successful in
%allowing nodes to learn about each other and route correctly even as some
%node-to-node links are blocked by firewalls, then users inside censored areas
%can ask a local Freenet server for a piece of content, and get an answer
%without having to connect out of the country at all. Of course, operators of
%servers inside the censored area can still be targeted, and the addresses of
%external servers can still be blocked.
\subsection{Skype}
%\subsection{Skype}
%The popular Skype voice-over-IP software uses multiple techniques to tolerate
%restrictive networks, some of which allow it to continue operating in the
%presence of censorship. By switching ports and using encryption, Skype
%attempts to resist trivial blocking and content filtering. Even if no
%encryption were used, it would still be expensive to scan all voice
%traffic for sensitive words. Also, most current keyloggers are unable to
%store voice traffic. Nevertheless, Skype can still be blocked, especially at
%its central login server.
The popular Skype voice-over-IP software uses multiple techniques to tolerate
restrictive networks, some of which allow it to continue operating in the
presence of censorship. By switching ports and using encryption, Skype
attempts to resist trivial blocking and content filtering. Even if no
encryption were used, it would still be expensive to scan all voice
traffic for sensitive words. Also, most current keyloggers are unable to
store voice traffic. Nevertheless, Skype can still be blocked, especially at
its central login server.
%*sjmurdoch* "we consider the login server to be the only central component in
%the Skype p2p network."
%*sjmurdoch* http://www1.cs.columbia.edu/~salman/publications/skype1_4.pdf
@ -661,7 +663,7 @@ to get more relay addresses, and to distribute them to users differently.
\subsection{Bridge relays}
Today, Tor servers operate on less than a thousand distinct IP addresses;
Today, Tor relays operate on a few thousand distinct IP addresses;
an adversary
could enumerate and block them all with little trouble. To provide a
means of ingress to the network, we need a larger set of entry points, most
@ -695,7 +697,7 @@ Tor client; but we leave this discussion for Section~\ref{sec:security}.
How do the bridge relays advertise their existence to the world? We
introduce a second new component of the design: a specialized directory
authority that aggregates and tracks bridges. Bridge relays periodically
publish server descriptors (summaries of their keys, locations, etc,
publish relay descriptors (summaries of their keys, locations, etc,
signed by their long-term identity key), just like the relays in the
``main'' Tor network, but in this case they publish them only to the
bridge directory authorities.
@ -703,7 +705,7 @@ bridge directory authorities.
The main difference between bridge authorities and the directory
authorities for the main Tor network is that the main authorities provide
a list of every known relay, but the bridge authorities only give
out a server descriptor if you already know its identity key. That is,
out a relay descriptor if you already know its identity key. That is,
you can keep up-to-date on a bridge's location and other information
once you know about it, but you can't just grab a list of all the bridges.
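
The lookup-only behavior described above can be illustrated with a minimal sketch. It is not the Tor implementation: the class and function names are hypothetical, the fingerprint is assumed here to be a SHA-1 hex digest of the raw key bytes, and descriptor signature checking is omitted. The point is only that the store answers queries for a known identity and offers no way to enumerate bridges.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RelayDescriptor:
    """Hypothetical, heavily simplified relay descriptor."""
    identity_key: bytes
    address: str
    or_port: int


def fingerprint(identity_key: bytes) -> str:
    # Assumption for illustration: fingerprint = SHA-1 hex digest of the key bytes.
    return hashlib.sha1(identity_key).hexdigest()


class BridgeAuthority:
    """Stores bridge descriptors; answers only fingerprint-keyed lookups."""

    def __init__(self) -> None:
        self._descriptors: Dict[str, RelayDescriptor] = {}

    def publish(self, desc: RelayDescriptor) -> None:
        # A real authority would verify the descriptor's signature first.
        self._descriptors[fingerprint(desc.identity_key)] = desc

    def lookup(self, fp: str) -> Optional[RelayDescriptor]:
        # Returns the latest descriptor for a known identity, or None.
        return self._descriptors.get(fp)

    # Deliberately no method that lists every stored bridge.
```

A client that already knows a bridge's fingerprint can call `lookup` to follow the bridge to a new address, while a censor without the fingerprint has nothing to query for.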
@ -733,7 +735,7 @@ authorities, to limit the potential impact of an authority compromise.
%Secondly, while users can in fact configure which directory authorities
%they use, we need to add a new type of directory authority and teach
%bridges to fetch directory information from the main authorities while
%publishing server descriptors to the bridge authorities. We're most of
%publishing relay descriptors to the bridge authorities. We're most of
%the way there, since we can already specify attributes for directory
%authorities:
%add a separate flag named ``blocking''.
@ -756,7 +758,7 @@ If a blocked user knows the identity keys of a set of bridge relays, and
he has correct address information for at least one of them, he can use
that one to make a secure connection to the bridge authority and update
his knowledge about the other bridge relays. He can also use it to make
secure connections to the main Tor network and directory servers, so he
secure connections to the main Tor network and directory authorities, so he
can build circuits and connect to the rest of the Internet. All of these
updates happen in the background: from the blocked user's perspective,
he just accesses the Internet via his Tor client like always.
@ -786,15 +788,15 @@ out too much.
Currently, Tor uses two protocols for its network communications. The
main protocol uses TLS for encrypted and authenticated communication
between Tor instances. The second protocol is standard HTTP, used for
fetching directory information. All Tor servers listen on their ``ORPort''
fetching directory information. All Tor relays listen on their ``ORPort''
for TLS connections, and some of them opt to listen on their ``DirPort''
as well, to serve directory information. Tor servers choose whatever port
numbers they like; the server descriptor they publish to the directory
as well, to serve directory information. Tor relays choose whatever port
numbers they like; the relay descriptor they publish to the directory
tells users where to connect.
One format for communicating address information about a bridge relay is
its IP address and DirPort. From there, the user can ask the bridge's
directory cache for an up-to-date copy of its server descriptor, and
directory cache for an up-to-date copy of its relay descriptor, and
learn its current circuit keys, its ORPort, and so on.
However, connecting directly to the directory cache involves a plaintext
@ -824,7 +826,7 @@ potential users, and their current and anticipated firewall restrictions.
Furthermore, we need to look at the specifics of Tor's TLS handshake.
Right now Tor uses some predictable strings in its TLS handshakes. For
example, it sets the X.509 organizationName field to ``Tor'', and it puts
the Tor server's nickname in the certificate's commonName field. We
the Tor relay's nickname in the certificate's commonName field. We
should tweak the handshake protocol so it doesn't rely on any unusual details
in the certificate, yet it remains secure; the certificate itself
should be made to resemble an ordinary HTTPS certificate. We should also try
@ -841,7 +843,7 @@ These extra certificates may help identify Tor's TLS handshake; instead,
bridges should consider using only a single TLS key certificate signed by
their identity key, and providing the full value of the identity key in an
early handshake cell. More significantly, Tor currently has all clients
present certificates, so that clients are harder to distinguish from servers.
present certificates, so that clients are harder to distinguish from relays.
But in a blocking-resistance environment, clients should not present
certificates at all.
@ -892,10 +894,10 @@ adversary could do similar attacks just by monitoring the network
traffic.
% cue paper by steven and george
Once the Tor client has fetched the bridge's server descriptor, it should
Once the Tor client has fetched the bridge's relay descriptor, it should
remember the identity key fingerprint for that bridge relay. Thus if
the bridge relay moves to a new IP address, the client can query the
bridge directory authority to look up a fresh server descriptor using
bridge directory authority to look up a fresh relay descriptor using
this fingerprint.
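
As a rough client-side sketch of this step (hypothetical names, not Tor's actual code), the client can keep a small table keyed by identity fingerprint and refresh an entry through whatever lookup function reaches the bridge authority. The `lookup` callable and the address tuple format are assumptions made for illustration.

```python
from typing import Callable, Dict, Optional, Tuple

Address = Tuple[str, int]  # (IP address, ORPort); format assumed for illustration
Lookup = Callable[[str], Optional[Address]]


class BridgeBook:
    """Client-side record of known bridges, keyed by identity fingerprint."""

    def __init__(self) -> None:
        self._bridges: Dict[str, Address] = {}

    def remember(self, fp: str, address: Address) -> None:
        # Called once the client has fetched the bridge's descriptor.
        self._bridges[fp] = address

    def refresh(self, fp: str, lookup: Lookup) -> Address:
        """Ask the authority for a fresh address; keep the old one on failure."""
        if fp not in self._bridges:
            raise KeyError(fp)
        fresh = lookup(fp)
        if fresh is not None:
            self._bridges[fp] = fresh
        return self._bridges[fp]
```

Keeping the last known address when the lookup fails means a temporarily unreachable authority does not make the client forget a bridge that may still work.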
So we've shown that it's \emph{possible} to bootstrap into the network
@ -1143,7 +1145,7 @@ bridge directory authorities, and bridges gravitate to one based on
their identity key. The better answer would be some federation of bridge
authorities that work together to provide redundancy but don't introduce
new security issues. We could even imagine designs where the bridge
authorities have encrypted versions of the bridge's server descriptors,
authorities have encrypted versions of the bridge's relay descriptors,
and the users learn a decryption key that they keep private when they
first hear about the bridge---this way the bridge authorities would not
be able to learn the IP address of the bridges.
@ -1163,7 +1165,7 @@ is it reachable from the public Internet? Second, what proportion of
the time is it available? Third, is it blocked in certain jurisdictions?
The first component can be tested just as we test reachability of
ordinary Tor servers. Specifically, the bridges do a self-test---connect
ordinary Tor relays. Specifically, the bridges do a self-test---connect
to themselves via the Tor network---before they are willing to
publish their descriptor, to make sure they're not obviously broken or
misconfigured. Once the bridges publish, the bridge authority also tests
@ -1377,7 +1379,7 @@ start the race. More research remains.
Against some attacks, relaying traffic for others can improve
anonymity. The simplest example is an attacker who owns a small number
of Tor servers. He will see a connection from the bridge, but he won't
of Tor relays. He will see a connection from the bridge, but he won't
be able to know whether the connection originated there or was relayed
from somebody else. More generally, the mere uncertainty of whether the
traffic originated from that user may be helpful.
@ -1406,9 +1408,9 @@ of its own.
We also need to examine how entry guards fit in. Entry guards
(a small set of nodes that are always used for the first
step in a circuit) help protect against certain attacks
where the attacker runs a few Tor servers and waits for
the user to choose these servers as the beginning and end of her
circuit\footnote{\url{http://wiki.noreply.org/noreply/TheOnionRouter/TorFAQ\#EntryGuards}}.
where the attacker runs a few Tor relays and waits for
the user to choose these relays as the beginning and end of her
circuit\footnote{\url{http://wiki.noreply.org/noreply/TheOnionRouter/TorFAQ#EntryGuards}}.
If the blocked user doesn't use the bridge's entry guards, then the bridge
doesn't gain as much cover benefit. On the other hand, what design changes
are needed for the blocked user to use the bridge's entry guards without
@ -1450,17 +1452,17 @@ system.
\label{subsec:trust-chain}
Tor's ``public key infrastructure'' provides a chain of trust to
let users verify that they're actually talking to the right servers.
let users verify that they're actually talking to the right relays.
There are four pieces to this trust chain.
First, when Tor clients are establishing circuits, at each step
they demand that the next Tor server in the path prove knowledge of
they demand that the next Tor relay in the path prove knowledge of
its private key~\cite{tor-design}. This step prevents the first node
in the path from just spoofing the rest of the path. Second, the
Tor directory authorities provide a signed list of servers along with
Tor directory authorities provide a signed list of relays along with
their public keys---so unless the adversary can control a threshold
of directory authorities, he can't trick the Tor client into using other
Tor servers. Third, the location and keys of the directory authorities,
Tor relays. Third, the location and keys of the directory authorities,
in turn, are hard-coded in the Tor source code---so as long as the user
got a genuine version of Tor, he can know that he is using the genuine
Tor network. And last, the source code and other packages are signed
@ -1491,7 +1493,7 @@ community, though, this question remains a critical weakness.
%\section{Performance improvements}
%\label{sec:performance}
%
%\subsection{Fetch server descriptors just-in-time}
%\subsection{Fetch relay descriptors just-in-time}
%
%I guess we should encourage most places to do this, so blocked
%users don't stand out.
@ -1635,9 +1637,9 @@ emphasizes the connections the bridge user is currently relaying.
%(Minor
%anonymity implications, but hey.) (In many cases there won't be much
%activity, so this may backfire. Or it may be better suited to full-fledged
%Tor servers.)
%Tor relays.)
% Also consider everybody-a-server. Many of the scalability questions
% Also consider everybody-a-relay. Many of the scalability questions
% are easier when you're talking about making everybody a bridge.
%\subsection{What if the clients can't install software?}
@ -1702,7 +1704,7 @@ each bridge, so users who hear about an honest bridge can get a good
copy.
See Section~\ref{subsec:first-bridge} for more discussion.
% Ian suggests that we have every tor server distribute a signed copy of the
% Ian suggests that we have every tor relay distribute a signed copy of the
% software.
\section{Next Steps}
@ -1824,7 +1826,7 @@ from somewhere.
9. Bridge directories must not simply be a handful of nodes that
provide the list of bridges. They must flood or otherwise distribute
information out to other Tor nodes as mirrors. That way it becomes
difficult for censors to flood the bridge directory servers with
difficult for censors to flood the bridge directory authorities with
requests, effectively denying access for others. But, there's lots of
churn and a much larger size than Tor directories. We are forced to
handle the directory scaling problem here much sooner than for the