mirror of
https://gitlab.torproject.org/tpo/core/tor.git
synced 2024-11-27 13:53:31 +01:00
Updated HACKING and README docs
HACKING now explains bandwidth throttling, congestion control, and router twins. Read it and see if it makes sense. svn:r68
parent 61dc00bcaa
commit 86eb8db0f0
68
HACKING
@@ -8,30 +8,41 @@ Read the README file first, so you can get familiar with the basics.

 1. The pieces.

-1.1 Connections. A connection is a long-standing tcp socket between
+1.1. Routers. Onion routers, as far as the 'or' program is concerned,
+are a bunch of data items that are loaded into the router_array when
+the program starts. After it's loaded, the router information is never
+changed. When a new OR connection is started (see below), the relevant
+information is copied from the router struct to the connection struct.

+1.2. Connections. A connection is a long-standing tcp socket between
 nodes. A connection is named based on what it's connected to -- an "OR
 connection" has an onion router on the other end, an "OP connection" has
 an onion proxy on the other end, an "exit connection" has a website or
 other server on the other end, and an "AP connection" has an application
 proxy (and thus a user) on the other end.

-1.2. Circuits. A circuit is a single conversation between two
+1.3. Circuits. A circuit is a single conversation between two
 participants over the onion routing network. One end of the circuit has
 an AP connection, and the other end has an exit connection. AP and exit
-connections have only one circuit associated with them, whereas OP and
-OR connections multiplex many circuits at once.
+connections have only one circuit associated with them (and thus these
+connection types are closed when the circuit is closed), whereas OP and
+OR connections multiplex many circuits at once, and stay standing even
+when there are no circuits running over them.

-1.3. Cells. Some connections, specifically OR and OP connections, speak
+1.4. Cells. Some connections, specifically OR and OP connections, speak
 "cells". This means that data over that connection is bundled into 128
 byte packets (8 bytes of header and 120 bytes of payload). Each cell has
 a type, or "command", which indicates what it's for.
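
As a rough illustration of the cell framing described in 1.4, the sketch
below splits a byte stream into fixed 128-byte cells. Only the sizes (8
bytes of header, 120 bytes of payload) come from the text. The header
layout used here (one command byte, one payload-length byte, six bytes of
padding), the command value, and the function names are assumptions made
for illustration, not the format the 'or' program actually uses.

```python
CELL_SIZE = 128                          # total bytes per cell (from the text)
HEADER_SIZE = 8                          # header bytes per cell (from the text)
PAYLOAD_SIZE = CELL_SIZE - HEADER_SIZE   # 120 bytes of payload

CMD_DATA = 2                             # hypothetical command value for a data cell


def pack_cells(data, command=CMD_DATA):
    """Split data into fixed-size cells, zero-padding the last payload."""
    cells = []
    for start in range(0, len(data), PAYLOAD_SIZE):
        chunk = data[start:start + PAYLOAD_SIZE]
        # Hypothetical header: command, payload length, six padding bytes.
        header = bytes([command, len(chunk)]) + bytes(HEADER_SIZE - 2)
        cells.append(header + chunk.ljust(PAYLOAD_SIZE, b"\x00"))
    return cells


def unpack_cell(cell):
    """Return (command, payload) for one cell built by pack_cells."""
    if len(cell) != CELL_SIZE:
        raise ValueError("a cell must be exactly %d bytes" % CELL_SIZE)
    command, length = cell[0], cell[1]
    return command, cell[HEADER_SIZE:HEADER_SIZE + length]
```

Under these assumptions, 300 bytes of data become three cells: two full
120-byte payloads and one 60-byte payload padded out to the fixed size.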


+2. Important parameters in the code.

+2.1. Role.


-2. Other features.
+3. Robustness features.

-2.1. Bandwidth throttling. Each cell-speaking connection has a maximum
+3.1. Bandwidth throttling. Each cell-speaking connection has a maximum
 bandwidth it can use, as specified in the routers.or file. Bandwidth
 throttling occurs on both the sender side and the receiving side. The
 sending side sends cells at regularly spaced intervals (e.g., a connection
@@ -53,8 +64,49 @@ The bandwidth throttling uses TCP to push back when we stop reading.
 We extend it with token buckets to allow more flexibility for traffic
 bursts.

-2.2. Data congestion control.
+3.2. Data congestion control. Even with the above bandwidth throttling,
+we still need to worry about congestion, either accidental or intentional.
+If a lot of people make circuits into the same node, and they all come out
+through the same connection, then that connection may become saturated
+(be unable to send out data cells as quickly as it wants to). An adversary
+can make a 'put' request through the onion routing network to a webserver
+he owns, and then refuse to read any of the bytes at the webserver end
+of the circuit. These bottlenecks can propagate back through the entire
+network, mucking up everything.

+To handle this congestion, each circuit starts out with a receive
+window at each node of 100 cells -- it is willing to receive at most 100
+cells on that circuit. (It handles each direction separately; so that's
+really 100 cells forward and 100 cells back.) The edge of the circuit
+is willing to create at most 100 cells from data coming from outside the
+onion routing network. Nodes in the middle of the circuit will tear down
+the circuit if a data cell arrives when the receive window is 0. When
+data has traversed the network, the edge node buffers it on its outbuf,
+and evaluates whether to respond with a 'sendme' acknowledgement: if its
+outbuf is not too full, and its receive window is less than 90, then it
+queues a 'sendme' cell backwards in the circuit. Each node that receives
+the sendme increments its window by 10 and passes the cell onward.

+In practice, all the nodes in the circuit maintain a receive window
+close to 100 except the exit node, which stays around 0, periodically
+receiving a sendme and reading 10 more data cells from the webserver.
+In this way we can use pretty much all of the available bandwidth for
+data, but gracefully back off when faced with multiple circuits (a new
+sendme arrives only after some cells have traversed the entire network),
+stalled network connections, or attacks.

+We don't need to reimplement full tcp windows, with sequence numbers,
+the ability to drop cells when we're full, etc., because the tcp streams
+already guarantee in-order delivery of each cell. Rather than trying
+to build some sort of tcp-on-tcp scheme, we implement this minimal data
+congestion control; so far it's enough.
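
The receive-window scheme in 3.2 can be sketched as a toy model of one
direction of a circuit. The constants 100, 90, and 10 come from the text.
Everything else is an assumption made for illustration: the class and
method names, the outbuf_limit used to stand in for "not too full", the
idea that each node spends one unit of window per data cell, and the idea
that the edge credits its own window when it sends a sendme. Cells are
delivered instantly here, so the model does not reproduce the steady state
described above (exit near 0, other nodes near 100); it only shows how a
stalled reader brings the exit to a halt.

```python
WINDOW_START = 100      # initial receive window, in cells (from the text)
SENDME_THRESHOLD = 90   # edge acknowledges once its window drops below this
WINDOW_INCREMENT = 10   # window credit carried by one sendme


class CircuitTornDown(Exception):
    """Raised when a data cell arrives at a node whose window is 0."""


class Circuit:
    """One direction of a circuit: data flows from the exit to the AP edge."""

    def __init__(self, hops, outbuf_limit=50):
        # windows[0] is the exit node, windows[-1] is the AP-side edge.
        self.windows = [WINDOW_START] * hops
        self.outbuf = 0                   # cells queued at the AP-side edge
        self.outbuf_limit = outbuf_limit  # assumed meaning of "too full"

    def send_data_cell(self):
        """Package one cell at the exit and relay it to the AP-side edge."""
        if self.windows[0] == 0:
            return False                  # exit stops reading from the webserver
        self.windows[0] -= 1
        for i in range(1, len(self.windows)):
            if self.windows[i] == 0:
                raise CircuitTornDown("data cell arrived with window 0")
            self.windows[i] -= 1
        self.outbuf += 1
        self.maybe_send_sendme()
        return True

    def maybe_send_sendme(self):
        """Edge check: acknowledge if the outbuf has room and the window is low."""
        if self.outbuf < self.outbuf_limit and self.windows[-1] < SENDME_THRESHOLD:
            # The sendme travels backwards; every node on the way adds credit.
            self.windows = [w + WINDOW_INCREMENT for w in self.windows]

    def flush(self, cells):
        """The application drains cells from the edge's outbuf."""
        self.outbuf = max(0, self.outbuf - cells)
        self.maybe_send_sendme()
```

With a reader that never drains the outbuf, send_data_cell() keeps
succeeding only until the edge's outbuf passes its limit and the remaining
window is used up; after that it returns False until flush() frees space
and a new sendme goes out. Other circuits would have their own windows and
are unaffected.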

+3.3. Router twins. In many cases when we ask for a router with a given
+address and port, we really mean a router who knows a given key. Router
+twins are two or more routers that all share the same private key. We thus
+give routers extra flexibility in choosing the next hop in the circuit: if
+some of the twins are down or slow, they can choose the more available ones.

+Currently the code tries for the primary router first, and if it's down,
+chooses the first available twin.
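
The selection rule in 3.3 (primary first, then the first available twin)
can be sketched as follows. The Router record, its field names, and the
function name are hypothetical; the text only says that twins share a key
and that the primary is tried first.

```python
from collections import namedtuple

# Hypothetical router record; the field names are assumptions.
Router = namedtuple("Router", ["address", "port", "key", "is_up"])


def choose_next_hop(primary, routers):
    """Return the primary if it is up, else the first available twin, else None."""
    if primary.is_up:
        return primary
    for router in routers:
        # A twin is any other router that shares the primary's key.
        if router is not primary and router.key == primary.key and router.is_up:
            return router
    return None
```

Because the list is scanned in order, "first available" here means first
in the router list, which may not be the fastest twin.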
@@ -3,4 +3,5 @@ SUBDIRS = src

 DIST_SUBDIRS = src

-EXTRA_DIST = TODO
+EXTRA_DIST = TODO HACKING FAQ

13
README
@@ -13,7 +13,8 @@ If you got the source from cvs:

 If you got the source from a tarball:

-Run ./configure, make, make install as usual.
+Run ./configure and make as usual. There isn't much point in
+'make install' yet.

 If this doesn't work for you:

@@ -23,7 +24,6 @@ If this doesn't work for you:
 we'll see what we can do.

 Once you've got it compiled:
-(these notes assume you started with source from cvs)

 It's a bit hard to figure out what to do with the binaries. If you
 want to set up your own test network, go into src/config/ and look
@@ -54,8 +54,9 @@ Once you've got it compiled:
 then ^z the wget a little bit in. The onion routers will continue
 talking for a while, queueing around 500k in the kernel-level buffers.
 When the kernel buffers are full, and the outbuf for the AP connection
-also fills, the internal congestion control will kick in and the
-exit connection will stop reading from the webserver. The circuit
-will wait until you fg the wget -- and other circuits will work just
-fine throughout.
+also fills, the internal congestion control will kick in and the exit
+connection will stop reading from the webserver. The circuit will
+wait until you fg the wget -- and other circuits will work just fine
+throughout. Then try ^z'ing the onion routers, and watch how well it
+recovers. Then try ^z'ing several of them at once. :)