Initial low-level changes to section 4

svn:r696
Nick Mathewson 2003-10-30 23:05:40 +00:00
parent 38400b3098
commit 3ae1331088


@@ -692,15 +692,17 @@ in Section~\ref{sec:attacks}.
\label{sec:design}
The Tor network is an overlay network; each node is called an onion router
(OR). Onion routers run on normal computers without needing any special
privileges. Each OR maintains a long-term TLS connection to every other
OR (although we look at ways to relax this clique-topology assumption in
(OR). Onion routers run as normal user-level processes without needing
any special
privileges. Currently, each OR maintains a long-term TLS connection
to every other
OR. (We examine some ways to relax this clique-topology assumption in
section \ref{subsec:restricted-routes}). A subset of the ORs also act as
directory servers, tracking which routers are currently in the network;
see section \ref{subsec:dirservers} for directory server details. Users
run local software called an onion proxy (OP) that fetches directories,
establishes paths (called \emph{virtual circuits}) over the network,
and handles connections from the user applications. Onion proxies accept
run local software called an onion proxy (OP) to fetch directories,
establish paths (called \emph{virtual circuits}) across the network,
and handle connections from user applications. Onion proxies accept
TCP streams and multiplex them across the virtual circuit. The onion
router on the other side
% I don't mean other side, I mean wherever it is on the circuit. But
@@ -708,44 +710,51 @@ router on the other side
of the circuit connects to the destinations of
the TCP streams and relays data.
Onion routers have three types of keys. The first key is the identity
(signing) key. An OR uses this key to sign TLS certificates, to sign its
router descriptor (a summary of its keys, address, bandwidth, exit policy,
etc.), and to sign directories if it is a directory server. Changing the
identity key of a router is considered equivalent to creating a new
router. The second key is the onion (decryption) key, which is used
for decrypting requests from users to set up a circuit and negotiate
ephemeral keys. Thirdly, each OR shares link keys (generated by TLS)
with the other ORs it's connected to. We discuss rotating these keys in
Section \ref{subsec:rotating-keys}.
Each onion router uses three public keys: a long-term identity key, a
short-term onion key, and a short-term link key. The identity
(signing) key is used to sign TLS certificates, to sign its router
descriptor (a summary of its keys, address, bandwidth, exit policy,
etc.), and to sign directories if it is a directory server. Changing
the identity key of a router is considered equivalent to creating a
new router. The onion (decryption) key is used for decrypting requests
from users to set up a circuit and negotiate ephemeral keys. Finally,
link keys are used by the TLS protocol when communicating between
onion routers. We discuss rotating these keys in Section
\ref{subsec:rotating-keys}.
Section \ref{subsec:cells} discusses the structure of the fixed-size
\emph{cells} that are the unit of communication in Tor. We describe
in Section \ref{subsec:circuits} how circuits work, and how they are
in section \ref{subsec:circuits} how virtual circuits are
built, extended, truncated, and destroyed. Section \ref{subsec:tcp}
discusses the process of opening TCP streams through Tor, and finally
describes how TCP streams are routed through the network, and finally
Section \ref{subsec:congestion} talks about congestion control and
fairness issues.
\SubSection{Cells}
\label{subsec:cells}
Traffic passes from node to node in fixed-size cells. Each cell is 256
bytes, and consists of a header and a payload. The header includes the
circuit identifier (ACI) which specifies which circuit the cell refers to
% I think we should describe connections before cells. -NM
Traffic passes from one OR to another, or from a user's OP to an OR,
in fixed-size cells. Each cell is 256
bytes, and consists of a header and a payload. The header includes an
anonymous circuit identifier (ACI) that specifies which circuit the
cell refers to
(many circuits can be multiplexed over the single TCP connection between
ORs or between an OP and an OR), and a command to describe what to do
with the cell's payload. Cells are either control cells, meaning they are
intended to be interpreted by the node that receives them, or relay cells,
meaning they carry end-to-end stream data. Control cells can be one of:
\emph{padding} (currently used for keepalive, but can be used for link
padding), \emph{create} or \emph{created} (to set up a new circuit),
with the cell's payload. Cells are either \emph{control} cells, which are
interpreted by the node that receives them, or \emph{relay} cells,
which carry end-to-end stream data. Control cells can be one of:
\emph{padding} (currently used for keepalive, but also usable for link
padding); \emph{create} or \emph{created} (used to set up a new circuit);
or \emph{destroy} (to tear down a circuit).
% We need to say that ACIs are connection-specific: each circuit has
% a different ACI along each connection. -NM
Relay cells have an additional header (the relay header) after the
cell header, which specifies the stream identifier (many streams can
be multiplexed over a circuit), an end-to-end checksum for integrity
checking, the length of the relay payload, and a relay command. Relay
cell header, containing the stream identifier (many streams can
be multiplexed over a circuit); an end-to-end checksum for integrity
checking; the length of the relay payload; and a relay command. Relay
commands can be one of: \emph{relay
data} (for data flowing down the stream), \emph{relay begin} (to open a
stream), \emph{relay end} (to close a stream), \emph{relay connected}
@@ -756,36 +765,48 @@ and to acknowledge), \emph{relay truncate} and \emph{relay truncated}
sendme} (used for congestion control), and \emph{relay drop} (used to
implement long-range dummies).
We will talk more about each of these cell types below.
We describe each of these cell types in more detail below.
% Nick: should there have been a table here? -RD
% Maybe. -NM
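
The cell layout described above can be sketched as follows. This is only an illustration: the text fixes the total cell size (256 bytes) and names the header fields, but it does not give field widths or command numbers, so the widths in \texttt{CELL\_HEADER} and \texttt{RELAY\_HEADER} and every numeric constant below are assumptions.

```python
import struct

CELL_LEN = 256  # total cell size stated in the text

# Hypothetical field widths: the text names the fields but not their sizes.
CELL_HEADER = struct.Struct(">HB")     # ACI (2 bytes), command (1 byte)
RELAY_HEADER = struct.Struct(">HIHB")  # stream ID, checksum, length, relay command

# Hypothetical command numbering, for illustration only.
CMD_PADDING, CMD_CREATE, CMD_CREATED, CMD_RELAY, CMD_DESTROY = range(5)
RELAY_DATA, RELAY_BEGIN, RELAY_END = range(3)


def pack_cell(aci, command, payload=b""):
    """Build one fixed-size cell, zero-padding the payload to fill it."""
    room = CELL_LEN - CELL_HEADER.size
    if len(payload) > room:
        raise ValueError("payload too long for one cell")
    return CELL_HEADER.pack(aci, command) + payload.ljust(room, b"\x00")


def unpack_cell(cell):
    """Split a cell into (ACI, command, payload-with-padding)."""
    if len(cell) != CELL_LEN:
        raise ValueError("cells must be exactly %d bytes" % CELL_LEN)
    aci, command = CELL_HEADER.unpack_from(cell)
    return aci, command, cell[CELL_HEADER.size:]


def pack_relay_cell(aci, stream_id, checksum, relay_command, data):
    """Wrap stream data in a relay header, then in a cell."""
    body = RELAY_HEADER.pack(stream_id, checksum, len(data), relay_command) + data
    return pack_cell(aci, CMD_RELAY, body)


def unpack_relay_payload(payload):
    """Recover the relay header fields and strip the padding from the data."""
    stream_id, checksum, length, relay_command = RELAY_HEADER.unpack_from(payload)
    start = RELAY_HEADER.size
    return stream_id, checksum, relay_command, payload[start:start + length]
```

The explicit length field in the relay header is what lets the receiver separate stream data from the padding that keeps every cell the same size.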
\SubSection{Circuits and streams}
\label{subsec:circuits}
While the original Onion Routing design built one circuit for each stream,
Tor circuits can be used by many streams. Thus because circuits can
take several tenths of a second to construct due to crypto and network
latency, users construct circuits preemptively. Users build a new circuit
periodically (currently every minute) if the previous one has been used,
and expire old used circuits that are no longer in use. Thus even very
active users spend a negligible amount of time and CPU in building
circuits, but only a limited number of requests can be linked to each
other by a given exit node. Also, because circuits are built in the
background, an already failed router never affects the user experience.
% I think when we say ``the user,'' maybe we should say ``the user's OP.''
Users set up circuits incrementally, negotiating a symmetric key with
each hop one at a time. To create a new circuit, the user (call her
Alice) sends a \emph{create} cell to the first node in her chosen
path. The payload is the first half of the Diffie-Hellman handshake,
encrypted to the onion key of the OR (call him Bob). Bob responds with a
\emph{created} cell with the second half of the DH handshake, along with
a hash of $K=g^{xy}$. The goal is to get unilateral entity authentication
(Alice knows she's handshaking with Bob, Bob doesn't care who it is ---
recall that Alice has no key and is trying to remain anonymous) and
unilateral key authentication (Alice and Bob agree on a key, and Alice
knows Bob is the only other person who could know it --- if he is
honest, etc.). We also want perfect forward secrecy, key freshness, etc.
The original Onion Routing design built one circuit for each
TCP stream. Because building a circuit can take several tenths of a
second (due to public-key cryptography delays and network latency),
this design imposed high costs on applications like web browsing that
open many TCP streams.
In Tor, each circuit can be shared by many TCP streams. To avoid
delays, users construct circuits preemptively. To limit linkability
among the streams, users rotate connections by building a new circuit
periodically (currently every minute) if the previous one has been
used, and expire old used circuits that are no longer in use. Thus
even very active users spend a negligible amount of time and CPU in
building circuits, but only a limited number of requests can be linked
to each other by a given exit node. Also, because circuits are built
in the background, failed routers do not affect the user experience.
\subsubsection{Constructing a circuit}
Users construct each circuit incrementally, negotiating a symmetric key with
each hop one at a time. To begin creating a new circuit, the user
(call her Alice) sends a \emph{create} cell to the first node in her
chosen path. The cell's payload is the first half of the
Diffie-Hellman handshake, encrypted to the onion key of the OR (call
him Bob). Bob responds with a \emph{created} cell containing the second
half of the DH handshake, along with a hash of the negotiated key
$K=g^{xy}$. This protocol tries to achieve unilateral entity
authentication (Alice knows she's handshaking with Bob, Bob doesn't
care who is opening the circuit---Alice has no key and is trying to
remain anonymous) and unilateral key authentication (Alice and Bob
agree on a key, and Alice knows Bob is the only other person who could
know it). We also want perfect forward
secrecy, key freshness, etc.
\begin{equation}
\begin{aligned}
@@ -805,6 +826,9 @@ traditional Dolev-Yao model.
% cite Cathy? -RD
% did I use the buzzwords correctly? -RD
% Hm. I think that this paragraph could go earlier in expository
% order: we describe how to build whole circuit, then explain the
% protocol in more detail. -NM
To extend a circuit past the first hop, Alice sends a \emph{relay extend}
cell to the last node in the circuit, specifying the address of the new
OR and an encrypted $g^x$ for it. That node copies the half-handshake
@@ -813,6 +837,7 @@ circuit. When it responds with a \emph{created} cell, the penultimate OR
copies the payload into a \emph{relay extended} cell and passes it back.
% Nick: please fix my "that OR" pronouns -RD
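
The hop-by-hop key negotiation described above can be sketched roughly as follows. This is a toy model under stated assumptions: the group parameters \texttt{P} and \texttt{G} are illustrative and far too small for real use, the class and function names are hypothetical, and two steps from the text are omitted (encrypting $g^x$ to each OR's onion key, and carrying the handshake for later hops inside \emph{relay extend} and \emph{relay extended} cells).

```python
import hashlib
import secrets

# Toy Diffie-Hellman group, for illustration only (not a secure choice).
P = 2**127 - 1  # a Mersenne prime
G = 3


def key_hash(k):
    """Hash of the negotiated key K = g^xy, as sent back in the created cell."""
    return hashlib.sha1(k.to_bytes(16, "big")).digest()


class OnionRouter:
    """Hypothetical stand-in for one OR answering a create request."""

    def handle_create(self, gx):
        y = secrets.randbelow(P - 2) + 1
        self.key = pow(gx, y, P)                 # K = (g^x)^y
        return pow(G, y, P), key_hash(self.key)  # second half plus hash of K


def build_circuit(path):
    """Negotiate one symmetric key per hop, one hop at a time."""
    keys = []
    for router in path:
        x = secrets.randbelow(P - 2) + 1
        gy, confirmation = router.handle_create(pow(G, x, P))
        k = pow(gy, x, P)                        # K = (g^y)^x
        if key_hash(k) != confirmation:
            raise ValueError("key confirmation failed")
        keys.append(k)
    return keys
```

A fresh exponent is drawn for every hop, so Alice ends up sharing a different key with each OR on the path.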
\subsubsection{Relay cells}
Once Alice has established the circuit (so she shares a key with each
OR on the circuit), she can send relay cells.
The stream ID in the relay header indicates to which stream the cell belongs.
@@ -835,7 +860,7 @@ in the circuit receives the destroy cell, closes all open streams on
that circuit, and passes a new destroy cell forward. But since circuits
can be built incrementally, they can also be torn down incrementally:
Alice can send a relay truncate cell to a node along the circuit. That
node will send a destroy cell forward, and reply with an acknowledgement
node will send a destroy cell forward, and reply with an acknowledgment
(relay truncated). Alice might truncate her circuit so she can extend it
to different nodes without signaling to the first few nodes (or somebody
observing them) that she is changing her circuit. That is, nodes in the
@@ -890,31 +915,33 @@ but are still willing to read.
\SubSection{Integrity checking on streams}
In the old Onion Routing design, traffic was vulnerable to a malleability
attack: without integrity checking, an adversary could
guess some of the plaintext of a cell, xor it out, and xor in his own
plaintext. Even an external adversary could do this despite the link
encryption!
In the old Onion Routing design, traffic was vulnerable to a
malleability attack: an attacker could make changes to an encrypted
cell to create corresponding changes to the data leaving the network.
(Even an external adversary could do this, despite link encryption!)
For example, an adversary could change a create cell to a
destroy cell; change the destination address in a relay begin cell
to the adversary's webserver; or change a user on an ftp connection
from typing ``dir'' to typing ``delete *''. Any node or observer along
the path can introduce such corruption in a stream.
This weakness allowed an adversary to change a create cell to a destroy
cell; change the destination address in a relay begin cell to the
adversary's webserver; or change a user on an ftp connection from
typing ``dir'' to typing ``delete *''. Any node or observer along the
path could introduce such corruption in a stream.
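
A minimal sketch of the malleability attack described above, assuming the cell is encrypted by XOR with a keystream and carries no integrity check. The hash-based \texttt{keystream} function is a hypothetical stand-in, not the cipher used by either design. The attacker never learns the key; guessing the plaintext is enough.

```python
import hashlib


def keystream(key, n):
    """Hypothetical keystream generator: hash of key plus a counter."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key, data):
    """XOR the data with the keystream; there is no integrity check."""
    return xor(data, keystream(key, len(data)))


decrypt = encrypt  # XOR with the same keystream undoes the encryption


def tamper(ciphertext, guessed, wanted):
    """Attacker's step: XOR out the guessed plaintext and XOR in a new one."""
    return xor(ciphertext, xor(guessed, wanted))
```

For example, if the attacker guesses that the plaintext is \texttt{dir}, then \texttt{tamper(ct, b"dir", b"del")} yields a ciphertext that decrypts to \texttt{del} under the unchanged key.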
Tor solves this malleability attack with respect to external adversaries
simply by using TLS. Addressing the insider malleability attack is more
complex.
Tor prevents external adversaries from mounting this attack simply by
using TLS. Addressing the insider malleability attack, however, is
more complex.
Rather than doing integrity checking of the relay cells at each hop
(like Mixminion \cite{minion-design}), which would increase packet size
Rather than doing integrity checking of the relay cells at each hop,
which would increase packet size
by a function of path length\footnote{This is also the argument against
using recent cipher modes like EAX \cite{eax} --- we don't want the added
message-expansion overhead at each hop, and we don't want to leak the path
length (or pad to some max path length).}, we choose to accept passive
timing attacks, and do integrity
length (or pad to some max path length).}, we choose to
% accept passive timing attacks,
% (How? I don't get it. Do we mean end-to-end traffic
% confirmation attacks? -NM)
and perform integrity
checking only at the edges of the circuit. When Alice negotiates a key
with that hop, they both start a SHA-1 with some derivative of that key,
with the exit hop, they both start a SHA-1 with some derivative of that key,
thus starting out with randomness that only the two of them know. From
then on they each incrementally add all the data bytes flowing across
the stream to the SHA-1, and each relay cell includes the first 4 bytes