Initial low-level changes to section 4

svn:r696
Nick Mathewson 2003-10-30 23:05:40 +00:00
parent 38400b3098
commit 3ae1331088

@ -692,15 +692,17 @@ in Section~\ref{sec:attacks}.
\label{sec:design} \label{sec:design}
The Tor network is an overlay network; each node is called an onion router The Tor network is an overlay network; each node is called an onion router
(OR). Onion routers run on normal computers without needing any special (OR). Onion routers run as normal user-level processes without needing
privileges. Each OR maintains a long-term TLS connection to every other any special
OR (although we look at ways to relax this clique-topology assumption in privileges. Currently, each OR maintains a long-term TLS connection
to every other
OR. (We examine some ways to relax this clique-topology assumption in
section \ref{subsec:restricted-routes}). A subset of the ORs also act as section \ref{subsec:restricted-routes}). A subset of the ORs also act as
directory servers, tracking which routers are currently in the network; directory servers, tracking which routers are currently in the network;
see section \ref{subsec:dirservers} for directory server details. Users see section \ref{subsec:dirservers} for directory server details. Users
run local software called an onion proxy (OP) that fetches directories, run local software called an onion proxy (OP) to fetch directories,
establishes paths (called \emph{virtual circuits}) over the network, establish paths (called \emph{virtual circuits}) across the network,
and handles connections from the user applications. Onion proxies accept and handle connections from user applications. Onion proxies accept
TCP streams and multiplex them across the virtual circuit. The onion TCP streams and multiplex them across the virtual circuit. The onion
router on the other side router on the other side
% I don't mean other side, I mean wherever it is on the circuit. But % I don't mean other side, I mean wherever it is on the circuit. But
@ -708,44 +710,51 @@ router on the other side
of the circuit connects to the destinations of of the circuit connects to the destinations of
the TCP streams and relays data. the TCP streams and relays data.
Onion routers have three types of keys. The first key is the identity Each onion router uses three public keys: a long-term identity key, a
(signing) key. An OR uses this key to sign TLS certificates, to sign its short-term onion key, and a short-term link key. The identity
router descriptor (a summary of its keys, address, bandwidth, exit policy, (signing) key is used to sign TLS certificates, to sign its router
etc), and to sign directories if it is a directory server. Changing the descriptor (a summary of its keys, address, bandwidth, exit policy,
identity key of a router is considered equivalent to creating a new etc), and to sign directories if it is a directory server. Changing
router. The second key is the onion (decryption) key, which is used the identity key of a router is considered equivalent to creating a
for decrypting requests from users to set up a circuit and negotiate new router. The onion (decryption) key is used for decrypting requests
ephemeral keys. Thirdly, each OR shares link keys (generated by TLS) from users to set up a circuit and negotiate ephemeral keys. Finally,
with the other ORs it's connected to. We discuss rotating these keys in link keys are used by the TLS protocol when communicating between
Section \ref{subsec:rotating-keys}. onion routers. We discuss rotating these keys in Section
\ref{subsec:rotating-keys}.
Section \ref{subsec:cells} discusses the structure of the fixed-size Section \ref{subsec:cells} discusses the structure of the fixed-size
\emph{cells} that are the unit of communication in Tor. We describe \emph{cells} that are the unit of communication in Tor. We describe
in Section \ref{subsec:circuits} how circuits work, and how they are in section \ref{subsec:circuits} how virtual circuits are
built, extended, truncated, and destroyed. Section \ref{subsec:tcp} built, extended, truncated, and destroyed. Section \ref{subsec:tcp}
discusses the process of opening TCP streams through Tor, and finally describes how TCP streams are routed through the network, and finally
Section \ref{subsec:congestion} talks about congestion control and Section \ref{subsec:congestion} talks about congestion control and
fairness issues. fairness issues.
\SubSection{Cells} \SubSection{Cells}
\label{subsec:cells} \label{subsec:cells}
Traffic passes from node to node in fixed-size cells. Each cell is 256 % I think we should describe connections before cells. -NM
bytes, and consists of a header and a payload. The header includes the
circuit identifier (ACI) which specifies which circuit the cell refers to Traffic passes from one OR to another, or from a user's OP to an OR,
in fixed-size cells. Each cell is 256
bytes, and consists of a header and a payload. The header includes an
anonymous circuit identifier (ACI) that specifies which circuit the
cell refers to
(many circuits can be multiplexed over the single TCP connection between (many circuits can be multiplexed over the single TCP connection between
ORs or between an OP and an OR), and a command to describe what to do ORs or between an OP and an OR), and a command to describe what to do
with the cell's payload. Cells are either control cells, meaning they are with the cell's payload. Cells are either \emph{control} cells, which are
intended to be interpreted by the node that receives them, or relay cells, interpreted by the node that receives them, or \emph{relay} cells,
meaning they carry end-to-end stream data. Control cells can be one of: which carry end-to-end stream data. Control cells can be one of:
\emph{padding} (currently used for keepalive, but can be used for link \emph{padding} (currently used for keepalive, but also usable for link
padding), \emph{create} or \emph{created} (to set up a new circuit), padding); \emph{create} or \emph{created} (used to set up a new circuit);
or \emph{destroy} (to tear down a circuit). or \emph{destroy} (to tear down a circuit).
% We need to say that ACIs are connection-specific: each circuit has
% a different ACI along each connection. -NM
Relay cells have an additional header (the relay header) after the Relay cells have an additional header (the relay header) after the
cell header, which specifies the stream identifier (many streams can cell header, containing the stream identifier (many streams can
be multiplexed over a circuit), an end-to-end checksum for integrity be multiplexed over a circuit); an end-to-end checksum for integrity
checking, the length of the relay payload, and a relay command. Relay checking; the length of the relay payload; and a relay command. Relay
commands can be one of: \emph{relay commands can be one of: \emph{relay
data} (for data flowing down the stream), \emph{relay begin} (to open a data} (for data flowing down the stream), \emph{relay begin} (to open a
stream), \emph{relay end} (to close a stream), \emph{relay connected} stream), \emph{relay end} (to close a stream), \emph{relay connected}
@ -756,36 +765,48 @@ and to acknowledge), \emph{relay truncate} and \emph{relay truncated}
sendme} (used for congestion control), and \emph{relay drop} (used to sendme} (used for congestion control), and \emph{relay drop} (used to
implement long-range dummies). implement long-range dummies).
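The cell layout described in this hunk can be sketched roughly as follows. This is a minimal illustration, not the layout Tor uses: the text fixes only the 256-byte cell size, so the 2-byte ACI, the 1-byte command field, and the command numbers below are all assumptions made for the example.

```python
import struct

CELL_LEN = 256  # from the text: every cell is 256 bytes

# Assumed header layout (not given in the text): 2-byte ACI, 1-byte command.
HEADER = struct.Struct(">HB")
PAYLOAD_LEN = CELL_LEN - HEADER.size

# Command numbers are hypothetical, chosen only for this sketch.
COMMANDS = {"padding": 0, "create": 1, "created": 2, "relay": 3, "destroy": 4}


def pack_cell(aci, command, payload=b""):
    """Build one fixed-size cell, zero-padding the payload to full length."""
    if len(payload) > PAYLOAD_LEN:
        raise ValueError("payload too long for one cell")
    header = HEADER.pack(aci, COMMANDS[command])
    return header + payload.ljust(PAYLOAD_LEN, b"\0")


def unpack_cell(cell):
    """Split a cell into (aci, command number, padded payload)."""
    if len(cell) != CELL_LEN:
        raise ValueError("cells are always CELL_LEN bytes")
    aci, command = HEADER.unpack_from(cell)
    return aci, command, cell[HEADER.size:]
```

Because every cell is padded to the same length, an observer of the link cannot tell a padding cell from a data-bearing one by size alone.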
We will talk more about each of these cell types below. We describe each of these cell types in more detail below.
% Nick: should there have been a table here? -RD % Nick: should there have been a table here? -RD
% Maybe. -NM
\SubSection{Circuits and streams} \SubSection{Circuits and streams}
\label{subsec:circuits} \label{subsec:circuits}
While the original Onion Routing design built one circuit for each stream, % I think when we say ``the user,'' maybe we should say ``the user's OP.''
Tor circuits can be used by many streams. Thus because circuits can
take several tenths of a second to construct due to crypto and network
latency, users construct circuits preemptively. Users build a new circuit
periodically (currently every minute) if the previous one has been used,
and expire old used circuits that are no longer in use. Thus even very
active users spend a negligible amount of time and CPU in building
circuits, but only a limited number of requests can be linked to each
other by a given exit node. Also, because circuits are built in the
background, an already failed router never affects the user experience.
Users set up circuits incrementally, negotiating a symmetric key with The original Onion Routing design built one circuit for each
each hop one at a time. To create a new circuit, the user (call her TCP stream. Because building a circuit can take several tenths of a
Alice) sends a \emph{create} cell to the first node in her chosen second (due to public-key cryptography delays and network latency),
path. The payload is the first half of the Diffie-Hellman handshake, this design imposed high costs on applications like web browsing that
encrypted to the onion key of the OR (call him Bob). Bob responds with a open many TCP streams.
\emph{created} cell with the second half of the DH handshake, along with
a hash of $K=g^{xy}$. The goal is to get unilateral entity authentication In Tor, each circuit can be shared by many TCP streams. To avoid
(Alice knows she's handshaking with Bob, Bob doesn't care who it is --- delays, users construct circuits preemptively. To limit linkability
recall that Alice has no key and is trying to remain anonymous) and among the streams, users rotate connections by building a new circuit
unilateral key authentication (Alice and Bob agree on a key, and Alice periodically (currently every minute) if the previous one has been
knows Bob is the only other person who could know it --- if he is used, and expire old used circuits that are no longer in use. Thus
honest, etc.). We also want perfect forward secrecy, key freshness, etc. even very active users spend a negligible amount of time and CPU in
building circuits, but only a limited number of requests can be linked
to each other by a given exit node. Also, because circuits are built
in the background, failed routers do not affect user experience.
\subsubsection{Constructing a circuit}
Users construct each circuit incrementally, negotiating a symmetric key with
each hop one at a time. To begin creating a new circuit, the user
(call her Alice) sends a \emph{create} cell to the first node in her
chosen path. The cell's payload is the first half of the
Diffie-Hellman handshake, encrypted to the onion key of the OR (call
him Bob). Bob responds with a \emph{created} cell containing the second
half of the DH handshake, along with a hash of the negotiated key
$K=g^{xy}$. This protocol tries to achieve unilateral entity
authentication (Alice knows she's handshaking with Bob, Bob doesn't
care who is opening the circuit---Alice has no key and is trying to
remain anonymous) and unilateral key authentication (Alice and Bob
agree on a key, and Alice knows Bob is the only other person who could
know it). We also want perfect forward
secrecy, key freshness, etc.
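The create/created exchange described in this paragraph can be sketched as below. It is a toy illustration under stated assumptions: the group parameters `P` and `G` are far too small for real use, the helper names are hypothetical, and the encryption of $g^x$ to Bob's onion key is omitted entirely, so this sketch shows only the key agreement and the hash check, not the authentication of Bob.

```python
import hashlib
import secrets

# Toy parameters, assumed for illustration only; not secure.
P = 2**127 - 1  # a Mersenne prime
G = 3


def dh_keypair():
    """Return (private exponent, public value g^x mod P)."""
    x = secrets.randbelow(P - 3) + 2  # exponent in [2, P-2]
    return x, pow(G, x, P)


def key_hash(k):
    """Hash of the negotiated key, as carried in the created cell."""
    return hashlib.sha1(k.to_bytes(16, "big")).digest()


def bob_created(gx):
    """Bob's side: pick y, derive K = (g^x)^y, reply with (g^y, H(K))."""
    y, gy = dh_keypair()
    k = pow(gx, y, P)
    return (gy, key_hash(k)), k


def alice_finish(x, gy, received_hash):
    """Alice's side: derive K = (g^y)^x and check it against Bob's hash."""
    k = pow(gy, x, P)
    if key_hash(k) != received_hash:
        raise ValueError("hash of K does not match")
    return k
```

Alice would call `dh_keypair()` to produce the create payload, and `alice_finish` on the reply. A mismatch in the hash tells her that the responder did not derive the same key.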
\begin{equation} \begin{equation}
\begin{aligned} \begin{aligned}
@ -805,6 +826,9 @@ traditional Dolev-Yao model.
% cite Cathy? -RD % cite Cathy? -RD
% did I use the buzzwords correctly? -RD % did I use the buzzwords correctly? -RD
% Hm. I think that this paragraph could go earlier in expository
% order: we describe how to build whole circuit, then explain the
% protocol in more detail. -NM
To extend a circuit past the first hop, Alice sends a \emph{relay extend} To extend a circuit past the first hop, Alice sends a \emph{relay extend}
cell to the last node in the circuit, specifying the address of the new cell to the last node in the circuit, specifying the address of the new
OR and an encrypted $g^x$ for it. That node copies the half-handshake OR and an encrypted $g^x$ for it. That node copies the half-handshake
@ -813,6 +837,7 @@ circuit. When it responds with a \emph{created} cell, the penultimate OR
copies the payload into a \emph{relay extended} cell and passes it back. copies the payload into a \emph{relay extended} cell and passes it back.
% Nick: please fix my "that OR" pronouns -RD % Nick: please fix my "that OR" pronouns -RD
\subsubsection{Relay cells}
Once Alice has established the circuit (so she shares a key with each Once Alice has established the circuit (so she shares a key with each
OR on the circuit), she can send relay cells. OR on the circuit), she can send relay cells.
The stream ID in the relay header indicates to which stream the cell belongs. The stream ID in the relay header indicates to which stream the cell belongs.
@ -835,7 +860,7 @@ in the circuit receives the destroy cell, closes all open streams on
that circuit, and passes a new destroy cell forward. But since circuits that circuit, and passes a new destroy cell forward. But since circuits
can be built incrementally, they can also be torn down incrementally: can be built incrementally, they can also be torn down incrementally:
Alice can send a relay truncate cell to a node along the circuit. That Alice can send a relay truncate cell to a node along the circuit. That
node will send a destroy cell forward, and reply with an acknowledgement node will send a destroy cell forward, and reply with an acknowledgment
(relay truncated). Alice might truncate her circuit so she can extend it (relay truncated). Alice might truncate her circuit so she can extend it
to different nodes without signaling to the first few nodes (or somebody to different nodes without signaling to the first few nodes (or somebody
observing them) that she is changing her circuit. That is, nodes in the observing them) that she is changing her circuit. That is, nodes in the
@ -890,31 +915,33 @@ but are still willing to read.
\SubSection{Integrity checking on streams} \SubSection{Integrity checking on streams}
In the old Onion Routing design, traffic was vulnerable to a malleability In the old Onion Routing design, traffic was vulnerable to a
attack: without integrity checking, an adversary could malleability attack: an attacker could make changes to an encrypted
guess some of the plaintext of a cell, xor it out, and xor in his own cell to create corresponding changes to the data leaving the network.
plaintext. Even an external adversary could do this despite the link (Even an external adversary could do this, despite link encryption!)
encryption!
For example, an adversary could change a create cell to a This weakness allowed an adversary to change a create cell to a destroy
destroy cell; change the destination address in a relay begin cell cell; change the destination address in a relay begin cell to the
to the adversary's webserver; or change a user on an ftp connection adversary's webserver; or change a user on an ftp connection from
from typing ``dir'' to typing ``delete *''. Any node or observer along typing ``dir'' to typing ``delete *''. Any node or observer along the
the path can introduce such corruption in a stream. path could introduce such corruption in a stream.
Tor solves this malleability attack with respect to external adversaries Tor prevents external adversaries from mounting this attack simply by
simply by using TLS. Addressing the insider malleability attack is more using TLS. Addressing the insider malleability attack, however, is
complex. more complex.
Rather than doing integrity checking of the relay cells at each hop Rather than doing integrity checking of the relay cells at each hop,
(like Mixminion \cite{minion-design}), which would increase packet size which would increase packet size
by a function of path length\footnote{This is also the argument against by a function of path length\footnote{This is also the argument against
using recent cipher modes like EAX \cite{eax} --- we don't want the added using recent cipher modes like EAX \cite{eax} --- we don't want the added
message-expansion overhead at each hop, and we don't want to leak the path message-expansion overhead at each hop, and we don't want to leak the path
length (or pad to some max path length).}, we choose to accept passive length (or pad to some max path length).}, we choose to
timing attacks, and do integrity % accept passive timing attacks,
% (How? I don't get it. Do we mean end-to-end traffic
% confirmation attacks? -NM)
and perform integrity
checking only at the edges of the circuit. When Alice negotiates a key checking only at the edges of the circuit. When Alice negotiates a key
with that hop, they both start a SHA-1 with some derivative of that key, with the exit hop, they both start a SHA-1 with some derivative of that key,
thus starting out with randomness that only the two of them know. From thus starting out with randomness that only the two of them know. From
then on they each incrementally add all the data bytes flowing across then on they each incrementally add all the data bytes flowing across
the stream to the SHA-1, and each relay cell includes the first 4 bytes the stream to the SHA-1, and each relay cell includes the first 4 bytes