From fddda9a797241d6cae249adcdf430ffe0cced130 Mon Sep 17 00:00:00 2001 From: Roger Dingledine Date: Sun, 2 Nov 2003 06:14:59 +0000 Subject: [PATCH] more patches on sec2 and sec3; rewrite threat model svn:r712 --- doc/TODO | 6 +- doc/tor-design.tex | 393 ++++++++++++++------------------------------- 2 files changed, 130 insertions(+), 269 deletions(-) diff --git a/doc/TODO b/doc/TODO index b8bb95063f..9aaabf7bc1 100644 --- a/doc/TODO +++ b/doc/TODO @@ -1,6 +1,10 @@ -mutiny: if none of the ports is defined maybe it shouldn't start. +mutiny suggests: if none of the ports is defined maybe it shouldn't start. aaron got a crash in tor_timegm in tzset on os x, with -l warn but not with -l debug. Oct 25 04:29:17.017 [warn] directory_initiate_command(): No running dirservers known. This is really bad. +rename ACI to CircID +rotate tls-level connections -- make new ones, expire old ones. +dirserver shouldn't put you in running-routers list if you haven't + uploaded a descriptor recently Legend: SPEC!! - Not specified diff --git a/doc/tor-design.tex b/doc/tor-design.tex index c2f00f84e1..2c55b230b7 100644 --- a/doc/tor-design.tex +++ b/doc/tor-design.tex @@ -39,7 +39,7 @@ % \pdfpageheight=\the\paperheight %\fi -\title{Tor: Design of a Second-Generation Onion Router} +\title{Tor: The Second-Generation Onion Router} %\author{Roger Dingledine \\ The Free Haven Project \\ arma@freehaven.net \and %Nick Mathewson \\ The Free Haven Project \\ nickm@freehaven.net \and @@ -308,22 +308,20 @@ Concentrating the traffic to a single point increases the anonymity set analysis easier: an adversary need only eavesdrop on the proxy to observe the entire system. -More complex are distributed-trust, circuit-based anonymizing systems. In -these designs, a user establishes one or more medium-term bidirectional -end-to-end tunnels to exit servers, and uses those tunnels to deliver -low-latency packets to and from one or more destinations per -tunnel. %XXX reword -Establishing tunnels is expensive and typically -requires public-key cryptography, whereas relaying packets along a tunnel is -comparatively inexpensive. Because a tunnel crosses several servers, no -single server can link a user to her communication partners. +More complex are distributed-trust, circuit-based anonymizing systems. +In these designs, a user establishes one or more medium-term bidirectional +end-to-end circuits, and tunnels TCP streams in fixed-size cells. +Establishing circuits is expensive and typically requires public-key +cryptography, whereas relaying cells is comparatively inexpensive. +Because a circuit crosses several servers, no single server can link a +user to her communication partners. -In some distributed-trust systems, such as the Java Anon Proxy (also known -as JAP or Web MIXes), users build their tunnels along a fixed shared route -or \emph{cascade}. As with a single-hop proxy, this approach aggregates +The Java Anon Proxy (also known +as JAP or Web MIXes) uses fixed shared routes known as +\emph{cascades}. As with a single-hop proxy, this approach aggregates users into larger anonymity sets, but again an attacker only needs to observe both ends of the cascade to bridge all the system's traffic. -The Java Anon Proxy's design seeks to prevent this by padding +The Java Anon Proxy's design seeks to provide protection by padding between end users and the head of the cascade \cite{web-mix}. However, the current implementation does no padding and thus remains vulnerable to both active and passive bridging.
@@ -350,10 +348,10 @@ from the data stream. Hordes \cite{hordes-jcs} is based on Crowds but also uses multicast responses to hide the initiator. Herbivore \cite{herbivore} and P5 -\cite{p5} go even further, requiring broadcast. Each uses broadcast -in different ways, and trade-offs are made to make broadcast more -practical. Both Herbivore and P5 are designed primarily for communication -between peers, although Herbivore permits external connections by +\cite{p5} go even further, requiring broadcast. They make anonymity +and efficiency tradeoffs to make broadcast more practical. +These systems are designed primarily for communication between peers, +although Herbivore users can make external connections by requesting a peer to serve as a proxy. Allowing easy connections to nonparticipating responders or recipients is important for usability, for example so users can visit nonparticipating Web sites or exchange @@ -391,273 +389,132 @@ Eternity and Free Haven. \SubSection{Goals} Like other low-latency anonymity designs, Tor seeks to frustrate attackers from linking communication partners, or from linking -multiple communications to or from a single point. Within this +multiple communications to or from a single user. Within this main goal, however, several design considerations have directed Tor's evolution. -\begin{tightlist} -\item[Deployability:] The design must be one which can be implemented, - deployed, and used in the real world. This requirement precludes designs - that are expensive to run (for example, by requiring more bandwidth than - volunteers are willing to provide); designs that place a heavy liability - burden on operators (for example, by allowing attackers to implicate onion - routers in illegal activities); and designs that are difficult or expensive - to implement (for example, by requiring kernel patches, or separate proxies - for every protocol). This requirement also precludes systems in which - users who do not benefit from anonymity are required to run special - software in order to communicate with anonymous parties. -% Our rendezvous points require clients to use our software to get to -% the location-hidden servers. -% Or at least, they require somebody near the client-side running our -% software. We haven't worked out the details of keeping it transparent -% for Alice if she's using some other http proxy somewhere. I guess the -% external http proxy should route through a Tor client, which automatically -% translates the foo.onion address? -RD -% -% 1. Such clients do benefit from anonymity: they can reach the server. -% Recall that our goal for location hidden servers is to continue to -% provide service to priviliged clients when a DoS is happening or -% to provide access to a location sensitive service. I see no contradiction. -% 2. A good idiot check is whether what we require people to download -% and use is more extreme than downloading the anonymizer toolbar or -% privacy manager. I don't think so, though I'm not claiming we've already -% got the installation and running of a client down to that simplicity -% at this time. -PS -\item[Usability:] A hard-to-use system has fewer users---and because - anonymity systems hide users among users, a system with fewer users - provides less anonymity. Usability is not only a convenience for Tor: - it is a security requirement \cite{econymics,back01}. 
Tor - should work with most of a user's unmodified applications; shouldn't - introduce prohibitive delays; and should require the user to make as few - configuration decisions as possible. -\item[Flexibility:] The protocol must be flexible and - well-specified, so that it can serve as a test-bed for future research in - low-latency anonymity systems. Many of the open problems in low-latency - anonymity networks (such as generating dummy traffic, or preventing - pseudospoofing attacks) may be solvable independently from the issues - solved by Tor; it would be beneficial if future systems were not forced to - reinvent Tor's design decisions. (But note that while a flexible design - benefits researchers, there is a danger that differing choices of - extensions will render users distinguishable. Thus, experiments - on extensions should be limited and should not significantly affect - the distinguishability of ordinary users. - % To run an experiment researchers must file an - % anonymity impact statement -PS - of implementations should - not permit different protocol extensions to coexist in a single deployed - network.) -\item[Conservative design:] The protocol's design and security parameters - must be conservative. Because additional features impose implementation - and complexity costs, Tor should include as few speculative features as - possible. (We do not oppose speculative designs in general; however, it is - our goal with Tor to embody a solution to the problems in low-latency - anonymity that we can solve today before we plunge into the problems of - tomorrow.) - % This last bit sounds completely cheesy. Somebody should tone it down. -NM -\end{tightlist} +\textbf{Deployability:} The design must be one which can be implemented, +deployed, and used in the real world. This requirement precludes designs +that are expensive to run (for example, by requiring more bandwidth +than volunteers are willing to provide); designs that place a heavy +liability burden on operators (for example, by allowing attackers to +implicate onion routers in illegal activities); and designs that are +difficult or expensive to implement (for example, by requiring kernel +patches, or separate proxies for every protocol). This requirement also +precludes systems in which users who do not benefit from anonymity are +required to run special software in order to communicate with anonymous +parties. (We do not meet this goal for the current rendezvous design, +however; see Section~\ref{sec:rendezvous}.) + +\textbf{Usability:} A hard-to-use system has fewer users---and because +anonymity systems hide users among users, a system with fewer users +provides less anonymity. Usability is not only a convenience for Tor: +it is a security requirement \cite{econymics,back01}. Tor should not +require modifying applications; should not introduce prohibitive delays; +and should require the user to make as few configuration decisions +as possible. + +\textbf{Flexibility:} The protocol must be flexible and well-specified, +so that it can serve as a test-bed for future research in low-latency +anonymity systems. Many of the open problems in low-latency anonymity +networks, such as generating dummy traffic or preventing Sybil attacks +\cite{sybil}, may be solvable independently from the issues solved by +Tor. Hopefully future systems will not need to reinvent Tor's design +decisions. (But note that while a flexible design benefits researchers, +there is a danger that differing choices of extensions will make users +distinguishable. 
Experiments should be run on a separate network.) + +\textbf{Conservative design:} The protocol's design and security +parameters must be conservative. Additional features impose implementation +and complexity costs; adding unproven techniques to the design threatens +deployability, readability, and ease of security analysis. Tor aims to +deploy a simple and stable system that integrates the best well-understood +approaches to protecting anonymity. \SubSection{Non-goals} \label{subsec:non-goals} In favoring conservative, deployable designs, we have explicitly deferred -a number of goals. Many of these goals are desirable in anonymity systems, -but we choose to defer them either because they are solved elsewhere, -or because they present an area of active research lacking a generally -accepted solution. +a number of goals, either because they are solved elsewhere, or because +they remain open research questions. -\begin{tightlist} -\item[Not Peer-to-peer:] Tarzan and MorphMix aim to - scale to completely decentralized peer-to-peer environments with thousands - of short-lived servers, many of which may be controlled by an adversary. - Because of the many open problems in this approach, Tor uses a more - conservative design. -\item[Not secure against end-to-end attacks:] Tor does not claim to provide a - definitive solution to end-to-end timing or intersection attacks. Some - approaches, such as running an onion router, may help; see - Section~\ref{sec:analysis} for more discussion. -\item[No protocol normalization:] Tor does not provide \emph{protocol - normalization} like Privoxy or the Anonymizer. In order to make clients - indistinguishable when they use complex and variable protocols such as HTTP, - Tor must be layered with a filtering proxy such as Privoxy to hide - differences between clients, expunge protocol features that leak identity, - and so on. Similarly, Tor does not currently integrate tunneling for - non-stream-based protocols like UDP; this too must be provided by - an external service. +\textbf{Not Peer-to-peer:} Tarzan and MorphMix aim to scale to completely +decentralized peer-to-peer environments with thousands of short-lived +servers, many of which may be controlled by an adversary. This approach +is appealing, but still has many open problems. + +\textbf{Not secure against end-to-end attacks:} Tor does not claim +to provide a definitive solution to end-to-end timing or intersection +attacks. Some approaches, such as running an onion router, may help; +see Section~\ref{sec:analysis} for more discussion. + +\textbf{No protocol normalization:} Tor does not provide \emph{protocol +normalization} like Privoxy or the Anonymizer. For complex and variable +protocols such as HTTP, Tor must be layered with a filtering proxy such +as Privoxy to hide differences between clients, and expunge protocol +features that leak identity. Similarly, Tor does not currently integrate +tunneling for non-stream-based protocols like UDP; this too must be +provided by an external service. % Actually, tunneling udp over tcp is probably horrible for some apps. % Should this get its own non-goal bulletpoint? The motivation for -% non-goal-ness would be burden on clients / portability. -\item[Not steganographic:] Tor does not try to conceal which users are - sending or receiving communications; it only tries to conceal whom they are - communicating with. -\end{tightlist} +% non-goal-ness would be burden on clients / portability. -RD +% No, leave it as is.
-RD + +\textbf{Not steganographic:} Tor does not try to conceal which users are +sending or receiving communications; it only tries to conceal with whom +they communicate. \SubSection{Threat Model} \label{subsec:threat-model} A global passive adversary is the most commonly assumed threat when -analyzing theoretical anonymity designs. But like all practical low-latency -systems, Tor is not secure against this adversary. Instead, we assume an -adversary that is weaker than global with respect to distribution, but that -is not merely passive. Our threat model expands on that from -\cite{or-pet00}. +analyzing theoretical anonymity designs. But like all practical +low-latency systems, Tor does not protect against such a strong +adversary. Instead, we expect an adversary who can observe some fraction +of network traffic; who can generate, modify, delete, or delay traffic +on the network; who can operate onion routers of its own; and who can +compromise some fraction of the onion routers on the network. -%%%% This is really keen analytical stuff, but it isn't our threat model: -%%%% we just go ahead and assume a fraction of hostile nodes for -%%%% convenience. -NM -% -%% The basic adversary components we consider are: -%% \begin{tightlist} -%% \item[Observer:] can observe a connection (e.g., a sniffer on an -%% Internet router), but cannot initiate connections. Observations may -%% include timing and/or volume of packets as well as appearance of -%% individual packets (including headers and content). -%% \item[Disrupter:] can delay (indefinitely) or corrupt traffic on a -%% link. Can change all those things that an observer can observe up to -%% the limits of computational ability (e.g., cannot forge signatures -%% unless a key is compromised). -%% \item[Hostile initiator:] can initiate (or destroy) connections with -%% specific routes as well as vary the timing and content of traffic -%% on the connections it creates. A special case of the disrupter with -%% additional abilities appropriate to its role in forming connections. -%% \item[Hostile responder:] can vary the traffic on the connections made -%% to it including refusing them entirely, intentionally modifying what -%% it sends and at what rate, and selectively closing them. Also a -%% special case of the disrupter. -%% \item[Key breaker:] can break the key used to encrypt connection -%% initiation requests sent to a Tor-node. -%% % Er, there are no long-term private decryption keys. They have -%% % long-term private signing keys, and medium-term onion (decryption) -%% % keys. Plus short-term link keys. Should we lump them together or -%% % separate them out? -RD -%% % -%% % Hmmm, I was talking about the keys used to encrypt the onion skin -%% % that contains the public DH key from the initiator. Is that what you -%% % mean by medium-term onion key? (``Onion key'' used to mean the -%% % session keys distributed in the onion, back when there were onions.) -%% % Also, why are link keys short-term? By link keys I assume you mean -%% % keys that neighbor nodes use to superencrypt all the stuff they send -%% % to each other on a link. Did you mean the session keys? I had been -%% % calling session keys short-term and everything else long-term. I -%% % know I was being sloppy. (I _have_ written papers formalizing -%% % concepts of relative freshness.) But, there's some questions lurking -%% % here. First up, I don't see why the onion-skin encryption key should -%% % be any shorter term than the signature key in terms of threat -%% % resistance. 
I understand that how we update onion-skin encryption -%% % keys makes them depend on the signature keys. But, this is not the -%% % basis on which we should be deciding about key rotation. Another -%% % question is whether we want to bother with someone who breaks a -%% % signature key as a particular adversary. He should be able to do -%% % nearly the same as a compromised tor-node, although they're not the -%% % same. I reworded above, I'm thinking we should leave other concerns -%% % for later. -PS -%% \item[Hostile Tor node:] can arbitrarily manipulate the -%% connections under its control, as well as creating new connections -%% (that pass through itself). -%% \end{tightlist} -% -%% All feasible adversaries can be composed out of these basic -%% adversaries. This includes combinations such as one or more -%% compromised Tor-nodes cooperating with disrupters of links on which -%% those nodes are not adjacent, or such as combinations of hostile -%% outsiders and link observers (who watch links between adjacent -%% Tor-nodes). Note that one type of observer might be a Tor-node. This -%% is sometimes called an honest-but-curious adversary. While an observer -%% Tor-node will perform only correct protocol interactions, it might -%% share information about connections and cannot be assumed to destroy -%% session keys at end of a session. Note that a compromised Tor-node is -%% stronger than any other adversary component in the sense that -%% replacing a component of any adversary with a compromised Tor-node -%% results in a stronger overall adversary (assuming that the compromised -%% Tor-node retains the same signature keys and other private -%% state-information as the component it replaces). +%Large adversaries will be able to compromise a considerable fraction +%of the network. (In some circumstances---for example, if the Tor +%network is running on a hardened network where all operators have +%had background checks---the number of compromised nodes could be quite +%small.) Compromised nodes can arbitrarily manipulate the connections that +%pass through them, as well as creating new connections that pass through +%themselves. They can observe traffic, and record it for later analysis. -First, we assume that a threshold of directory servers are honest, -reliable, accurate, and trustworthy. -%% the rest of this isn't needed, if dirservers do threshold concensus dirs -% To augment this, users can periodically cross-check -%directories from each directory server (trust, but verify). -%, and that they always have access to at least one directory server that they trust. +In low-latency anonymity systems that use layered encryption, the +adversary's typical goal is to observe both the initiator and the +receiver. Passive attackers can confirm a suspicion that Alice is +talking to Bob if the timing and volume properties of the traffic on the +connection are unique enough; active attackers are even more effective +because they can induce timing signatures on the traffic. Tor provides +some defenses against these \emph{traffic confirmation} attacks, for +example by encouraging users to run their own onion routers, but it does +not provide complete protection. Rather, we aim to prevent \emph{traffic +analysis} attacks, where the adversary uses traffic patterns to learn +which points in the network he should attack. 
-Second, we assume that somewhere between ten percent and twenty -percent\footnote{In some circumstances---for example, if the Tor network is - running on a hardened network where all operators have had background - checks---the number of compromised nodes could be much lower.} -of the Tor nodes accepted by the directory servers are compromised, hostile, -and collaborating in an off-line clique. These compromised nodes can -arbitrarily manipulate the connections that pass through them, as well as -creating new connections that pass through themselves. They can observe -traffic, and record it for later analysis. Honest participants do not know -which servers these are. - -(In reality, many adversaries might have `bad' servers that are not -fully compromised but simply under observation, or that have had their keys -compromised. But for the sake of analysis, we ignore, this possibility, -since the threat model we assume is strictly stronger.) - -% This next paragraph is also more about analysis than it is about our -% threat model. Perhaps we can say, ``users can connect to the network and -% use it in any way; we consider abusive attacks separately.'' ? -NM -Third, we constrain the impact of hostile users. Users are assumed to vary -widely in both the duration and number of times they are connected to the Tor -network. They can also be assumed to vary widely in the volume and shape of -the traffic they send and receive. Hostile users are, by definition, limited -to creating and varying their own connections into or through a Tor -network. They may attack their own connections to try to gain identity -information of the responder in a rendezvous connection. They can also try to -attack sites through the Onion Routing network; however we will consider this -abuse rather than an attack per se (see -Section~\ref{subsec:exitpolicies}). Other than abuse, a hostile user's -motivation to attack his own connections is limited to the network effects of -such actions, such as denial of service (DoS) attacks. Thus, in this case, -we can view user as simply an extreme case of the ordinary user; although -ordinary users are not likely to engage in, e.g., IP spoofing, to gain their -objectives. - -In general, we are more focused on traffic analysis attacks than -traffic confirmation attacks. -%A user who runs a Tor proxy on his own -%machine, connects to some remote Tor-node and makes a connection to an -%open Internet site, such as a public web server, is vulnerable to -%traffic confirmation. -That is, an active attacker who suspects that -a particular client is communicating with a particular server can -confirm this if she can modify and observe both the -connection between the Tor network and the client and that between the -Tor network and the server. Even a purely passive attacker can -confirm traffic if the timing and volume properties of the traffic on -the connection are unique enough. (This is not to say that Tor offers -no resistance to traffic confirmation; it does. We defer discussion -of this point and of particular attacks until Section~\ref{sec:attacks}, -after we have described Tor in more detail.) -% XXX We need to say what traffic analysis is: How about... -On the other hand, we {\it do} try to prevent an attacker from -performing traffic analysis: that is, attempting to learn the communication -partners of an arbitrary user. -% XXX If that's not right, what is? It would be silly to have a -% threat model section without saying what we want to prevent the -% attacker from doing. 
-NM -% XXX Also, do we want to mention linkability or building profiles? -NM - -Our assumptions about our adversary's capabilities imply a number of -possible attacks against users' anonymity. Our adversary might try to -mount passive attacks by observing the edges of the network and -correlating traffic entering and leaving the network: either because -of relationships in packet timing; relationships in the volume of data -sent; [XXX simple observation??]; or relationships in any externally -visible user-selected options. The adversary can also mount active -attacks by trying to compromise all the servers' keys in a -path---either through illegitimate means or through legal coercion in -unfriendly jurisdiction; by selectively DoSing trustworthy servers; by -introducing patterns into entering traffic that can later be detected; -or by modifying data entering the network and hoping that trashed data -comes out the other end. The attacker can additionally try to -decrease the network's reliability by performing antisocial activities -from reliable servers and trying to get them taken down. -% XXX Should there be more or less? Should we turn this into a -% bulleted list? Should we cut it entirely? - -We consider these attacks and more, and describe our defenses against them -in Section~\ref{sec:attacks}. +Our adversary might try to link an initiator Alice with any of her +communication partners, or he might try to build a profile of Alice's +behavior. He might mount passive attacks by observing the edges of the +network and correlating traffic entering and leaving the network---either +because of relationships in packet timing; relationships in the volume +of data sent; or relationships in any externally visible user-selected +options. The adversary can also mount active attacks by compromising +routers or keys; by replaying traffic; by selectively DoSing trustworthy +routers to encourage users to send their traffic through compromised +routers, or DoSing users to see if the traffic elsewhere in the +network stops; or by introducing patterns into traffic that can later be +detected. The adversary might attack the directory servers to give users +differing views of network state. Additionally, he can try to decrease +the network's reliability by attacking nodes or by performing antisocial +activities from reliable servers and trying to get them taken down; +making the network unreliable flushes users to other less anonymous +systems, where they may be easier to attack. +We consider each of these attacks in more detail below, and summarize +in Section~\ref{sec:attacks} how well the Tor design defends against +each of them. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -2004,7 +1861,7 @@ issues remaining to be ironed out. In particular: % Many of these (Scalability, cover traffic) are duplicates from open problems. % -\begin{itemize} +\begin{tightlist} \item \emph{Scalability:} Tor's emphasis on design simplicity and deployability has led us to adopt a clique topology, a semi-centralized model for directories and trusts, and a @@ -2049,7 +1906,7 @@ issues remaining to be ironed out. In particular: able to evaluate some of our design decisions, including our robustness/latency tradeoffs, our abuse-prevention mechanisms, and our overall usability. -\end{itemize} +\end{tightlist} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%