Initial low-level changes to section 4

svn:r696
2024-11-28 06:13:31 +01:00 · 2003-10-30 23:05:40 +00:00 · 2003-10-30 23:05:40 +00:00 · 3ae1331088
commit 3ae1331088
parent 38400b3098
1 changed files with 98 additions and 71 deletions
--- a/doc/tor-design.tex
+++ b/doc/tor-design.tex
@@ -692,15 +692,17 @@ in Section~\ref{sec:attacks}.
 \label{sec:design}
 The Tor network is an overlay network; each node is called an onion router
-(OR). Onion routers run on normal computers without needing any special
+(OR). Onion routers run as normal user-level processes without needing
-privileges. Each OR maintains a long-term TLS connection to every other
+any special
-OR (although we look at ways to relax this clique-topology assumption in
+privileges.  Currently, each OR maintains a long-term TLS connection
 to every other
 OR.  (We examine some ways to relax this clique-topology assumption in
 section \ref{subsec:restricted-routes}.) A subset of the ORs also act as
 directory servers, tracking which routers are currently in the network;
 see section \ref{subsec:dirservers} for directory server details. Users
-run local software called an onion proxy (OP) that fetches directories,
+run local software called an onion proxy (OP) to fetch directories,
-establishes paths (called \emph{virtual circuits}) over the network,
+establish paths (called \emph{virtual circuits}) across the network,
-and handles connections from the user applications. Onion proxies accept
+and handle connections from user applications. Onion proxies accept
 TCP streams and multiplex them across the virtual circuit. The onion
 router on the other side 
 % I don't mean other side, I mean wherever it is on the circuit. But
@@ -708,44 +710,51 @@ router on the other side
 of the circuit connects to the destinations of
 the TCP streams and relays data.
-Onion routers have three types of keys. The first key is the identity
+Each onion router uses three public keys: a long-term identity key, a
-(signing) key. An OR uses this key to sign TLS certificates, to sign its
+short-term onion key, and a short-term link key.  The identity
-router descriptor (a summary of its keys, address, bandwidth, exit policy,
+(signing) key is used to sign TLS certificates, to sign its router
-etc), and to sign directories if it is a directory server. Changing the
+descriptor (a summary of its keys, address, bandwidth, exit policy,
-identity key of a router is considered equivalent to creating a new
+etc.), and to sign directories if it is a directory server. Changing
-router. The second key is the onion (decryption) key, which is used
+the identity key of a router is considered equivalent to creating a
-for decrypting requests from users to set up a circuit and negotiate
+new router. The onion (decryption) key is used for decrypting requests
-ephemeral keys. Thirdly, each OR shares link keys (generated by TLS)
+from users to set up a circuit and negotiate ephemeral keys. Finally,
-with the other ORs it's connected to. We discuss rotating these keys in
+link keys are used by the TLS protocol when communicating between
-Section \ref{subsec:rotating-keys}.
+onion routers.  We discuss rotating these keys in Section
 \ref{subsec:rotating-keys}.
 Section \ref{subsec:cells} discusses the structure of the fixed-size
 \emph{cells} that are the unit of communication in Tor. We describe
-in Section \ref{subsec:circuits} how circuits work, and how they are
+in section \ref{subsec:circuits} how virtual circuits are
 built, extended, truncated, and destroyed. Section \ref{subsec:tcp}
-discusses the process of opening TCP streams through Tor, and finally
+describes how TCP streams are routed through the network, and finally
 Section \ref{subsec:congestion} talks about congestion control and
 fairness issues.
 \SubSection{Cells}
 \label{subsec:cells}
-Traffic passes from node to node in fixed-size cells. Each cell is 256
+% I think we should describe connections before cells. -NM
-bytes, and consists of a header and a payload. The header includes the
+
-circuit identifier (ACI) which specifies which circuit the cell refers to
+Traffic passes from one OR to another, or from a user's OP to an OR,
 in fixed-size cells. Each cell is 256
 bytes, and consists of a header and a payload. The header includes an
 anonymous circuit identifier (ACI) that specifies which circuit the
 cell refers to
 (many circuits can be multiplexed over the single TCP connection between
 ORs or between an OP and an OR), and a command to describe what to do
-with the cell's payload. Cells are either control cells, meaning they are
+with the cell's payload. Cells are either \emph{control} cells, which are
-intended to be interpreted by the node that receives them, or relay cells,
+interpreted by the node that receives them, or \emph{relay} cells,
-meaning they carry end-to-end stream data. Controls cells can be one of:
+which carry end-to-end stream data. Control cells can be one of:
-\emph{padding} (currently used for keepalive, but can be used for link
+\emph{padding} (currently used for keepalive, but also usable for link
-padding), \emph{create} or \emph{created} (to set up a new circuit),
+padding); \emph{create} or \emph{created} (used to set up a new circuit);
 or \emph{destroy} (to tear down a circuit).
 % We need to say that ACIs are connection-specific: each circuit has
 % a different ACI along each connection. -NM
 Relay cells have an additional header (the relay header) after the
-cell header, which specifies the stream identifier (many streams can
+cell header, containing the stream identifier (many streams can
-be multiplexed over a circuit), an end-to-end checksum for integrity
+be multiplexed over a circuit); an end-to-end checksum for integrity
-checking, the length of the relay payload, and a relay command. Relay
+checking; the length of the relay payload; and a relay command. Relay
 commands can be one of: \emph{relay
 data} (for data flowing down the stream), \emph{relay begin} (to open a
 stream), \emph{relay end} (to close a stream), \emph{relay connected}
@@ -756,36 +765,48 @@ and to acknowledge), \emph{relay truncate} and \emph{relay truncated}
 sendme} (used for congestion control), and \emph{relay drop} (used to
 implement long-range dummies).
-We will talk more about each of these cell types below.
+We describe each of these cell types in more detail below.
 % Nick: should there have been a table here? -RD
 % Maybe. -NM
 \SubSection{Circuits and streams}
 \label{subsec:circuits}
-While the original Onion Routing design built one circuit for each stream,
+% I think when we say ``the user,'' maybe we should say ``the user's OP.''
 Tor circuits can be used by many streams. Thus because circuits can
 take several tenths of a second to construct due to crypto and network
 latency, users construct circuits preemptively. Users build a new circuit
 periodically (currently every minute) if the previous one has been used,
 and expire old used circuits that are no longer in use. Thus even very
 active users spend a negligible amount of time and CPU in building
 circuits, but only a limited number of requests can be linked to each
 other by a given exit node. Also, because circuits are built in the
 background, an already failed router never affects the user experience.
-Users set up circuits incrementally, negotiating a symmetric key with
+The original Onion Routing design built one circuit for each
-each hop one at a time. To create a new circuit, the user (call her
+TCP stream.  Because building a circuit can take several tenths of a
-Alice) sends a \emph{create} cell to the first node in her chosen
+second (due to public-key cryptography delays and network latency),
-path. The payload is the first half of the Diffie-Hellman handshake,
+this design imposed high costs on applications like web browsing that
-encrypted to the onion key of the OR (call him Bob). Bob responds with a
+open many TCP streams.
-\emph{created} cell with the second half of the DH handshake, along with
+
-a hash of $K=g^{xy}$. The goal is to get unilateral entity authentication
+In Tor, each circuit can be shared by many TCP streams.  To avoid
-(Alice knows she's handshaking with Bob, Bob doesn't care who it is ---
+delays, users construct circuits preemptively.  To limit linkability
-recall that Alice has no key and is trying to remain anonymous) and
+among the streams, users rotate connections by building a new circuit
-unilateral key authentication (Alice and Bob agree on a key, and Alice
+periodically (currently every minute) if the previous one has been
-knows Bob is the only other person who could know it --- if he is
+used, and expire old used circuits that are no longer in use. Thus
-honest, etc.). We also want perfect forward secrecy, key freshness, etc.
+even very active users spend a negligible amount of time and CPU in
 building circuits, but only a limited number of requests can be linked
 to each other by a given exit node. Also, because circuits are built
 in the background, failed routers do not affect user experience.
 \subsubsection{Constructing a circuit}
 Users construct each circuit incrementally, negotiating a symmetric key with
 each hop one at a time. To begin creating a new circuit, the user
 (call her Alice) sends a \emph{create} cell to the first node in her
 chosen path. The cell's payload is the first half of the
 Diffie-Hellman handshake, encrypted to the onion key of the OR (call
 him Bob). Bob responds with a \emph{created} cell containing the second
 half of the DH handshake, along with a hash of the negotiated key
 $K=g^{xy}$.  This protocol tries to achieve unilateral entity
 authentication (Alice knows she's handshaking with Bob, Bob doesn't
 care who is opening the circuit---Alice has no key and is trying to
 remain anonymous) and unilateral key authentication (Alice and Bob
 agree on a key, and Alice knows Bob is the only other person who could
 know it).  We also want perfect forward
 secrecy, key freshness, etc.
 \begin{equation}
 \begin{aligned}
@@ -805,6 +826,9 @@ traditional Dolev-Yao model.
 % cite Cathy? -RD
 % did I use the buzzwords correctly? -RD
 % Hm.  I think that this paragraph could go earlier in expository
 % order: we describe how to build whole circuit, then explain the
 % protocol in more detail.  -NM
 To extend a circuit past the first hop, Alice sends a \emph{relay extend}
 cell to the last node in the circuit, specifying the address of the new
 OR and an encrypted $g^x$ for it. That node copies the half-handshake
@@ -813,6 +837,7 @@ circuit. When it responds with a \emph{created} cell, the penultimate OR
 copies the payload into a \emph{relay extended} cell and passes it back.
 % Nick: please fix my "that OR" pronouns -RD
 \subsubsection{Relay cells}
 Once Alice has established the circuit (so she shares a key with each
 OR on the circuit), she can send relay cells.
 The stream ID in the relay header indicates to which stream the cell belongs.
@@ -835,7 +860,7 @@ in the circuit receives the destroy cell, closes all open streams on
 that circuit, and passes a new destroy cell forward. But since circuits
 can be built incrementally, they can also be torn down incrementally:
 Alice can send a relay truncate cell to a node along the circuit. That
-node will send a destroy cell forward, and reply with an acknowledgement
+node will send a destroy cell forward, and reply with an acknowledgment
 (relay truncated). Alice might truncate her circuit so she can extend it
 to different nodes without signaling to the first few nodes (or somebody
 observing them) that she is changing her circuit. That is, nodes in the
@@ -890,31 +915,33 @@ but are still willing to read.
 \SubSection{Integrity checking on streams}
-In the old Onion Routing design, traffic was vulnerable to a malleability
+In the old Onion Routing design, traffic was vulnerable to a
-attack: without integrity checking, an adversary could
+malleability attack: an attacker could make changes to an encrypted
-guess some of the plaintext of a cell, xor it out, and xor in his own
+cell to create corresponding changes to the data leaving the network.
-plaintext. Even an external adversary could do this despite the link
+(Even an external adversary could do this, despite link encryption!)
-For example, an adversary could change a create cell to a
+This weakness allowed an adversary to change a create cell to a destroy
-destroy cell; change the destination address in a relay begin cell
+cell; change the destination address in a relay begin cell to the
-to the adversary's webserver; or change a user on an ftp connection
+adversary's webserver; or change a user on an ftp connection from
-from typing ``dir'' to typing ``delete *''. Any node or observer along
+typing ``dir'' to typing ``delete *''. Any node or observer along the
-the path can introduce such corruption in a stream.
+path could introduce such corruption in a stream.
-Tor solves this malleability attack with respect to external adversaries
+Tor prevents external adversaries from mounting this attack simply by
-simply by using TLS. Addressing the insider malleability attack is more
+using TLS. Addressing the insider malleability attack, however, is
-complex.
+more complex.
-Rather than doing integrity checking of the relay cells at each hop
+Rather than doing integrity checking of the relay cells at each hop,
-(like Mixminion \cite{minion-design}), which would increase packet size
+which would increase packet size
 by a function of path length\footnote{This is also the argument against
 using recent cipher modes like EAX \cite{eax} --- we don't want the added
 message-expansion overhead at each hop, and we don't want to leak the path
-length (or pad to some max path length).}, we choose to accept passive
+length (or pad to some max path length).}, we choose to
-timing attacks, and do integrity
+% accept passive timing attacks, 
 %    (How?  I don't get it.  Do we mean end-to-end traffic
 %    confirmation attacks? -NM)
 and perform integrity
 checking only at the edges of the circuit. When Alice negotiates a key
-with that hop, they both start a SHA-1 with some derivative of that key,
+with the exit hop, they both start a SHA-1 with some derivative of that key,
 thus starting out with randomness that only the two of them know. From
 then on they each incrementally add all the data bytes flowing across
 the stream to the SHA-1, and each relay cell includes the first 4 bytes