identical FAQ and HACKING files, now in /doc

svn:r194
Roger Dingledine 2003-03-18 03:28:03 +00:00
parent f9c541bfcf
commit 8fb1056a7c
2 changed files with 228 additions and 0 deletions

doc/FAQ (new file, 111 lines)

@@ -0,0 +1,111 @@
The Onion Routing (TOR) Frequently Asked Questions
--------------------------------------------------
1. General.
1.1. What is tor?
Tor is an implementation of version 2 of Onion Routing.
Onion Routing is a connection-oriented anonymizing communication
service. Users build a layered block of asymmetric encryptions
(an "onion") which describes a source-routed path through a set of
nodes. Those nodes build a "virtual circuit" through the network, in which
each node knows its predecessor and successor, but no others. Traffic
flowing down the circuit is unwrapped with a symmetric key at each node,
and the unwrapping reveals the downstream node.
Basically, tor provides a distributed network of servers ("onion
routers"). Users bounce their tcp streams (web traffic, ftp, ssh, etc.)
around the routers, and recipients, observers, and even the routers
themselves have difficulty tracking the source of the stream.
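The layering described in 1.1 can be illustrated with a small sketch. This
is only a toy, not the tor code: the real onion uses asymmetric encryption,
whereas here a hash-based XOR keystream stands in for the cipher, and the
names keystream_xor, build_onion and peel, as well as the JSON layer format,
are made up for the example.

```python
import hashlib
import json

def keystream_xor(key, data):
    """Toy stand-in cipher (assumption): XOR data with a SHA-256 counter keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def build_onion(path, keys):
    """Wrap layers from the last hop inward, so the first hop's layer is outermost."""
    onion = b""
    next_hop = None
    for node in reversed(path):
        layer = json.dumps({"next": next_hop, "inner": onion.hex()}).encode()
        onion = keystream_xor(keys[node], layer)
        next_hop = node
    return onion

def peel(node, keys, onion):
    """Remove one layer: the node learns only its successor and the inner onion."""
    layer = json.loads(keystream_xor(keys[node], onion))
    return layer["next"], bytes.fromhex(layer["inner"])

keys = {"A": b"key-a", "B": b"key-b", "C": b"key-c"}
onion = build_onion(["A", "B", "C"], keys)
successor, inner = peel("A", keys, onion)  # A learns that B is next, nothing more
```

Each node can read only its own layer, which is why a node knows its
predecessor (whoever handed it the onion) and its successor, but no others.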
1.2. Why's it called tor?
Because tor is the onion routing system. I kept telling people I was
working on onion routing, and they said "Neat. Which one?" Even if onion
routing has become a standard household term, this is the actual onion
routing project, started out of the Naval Research Lab.
(Theories about recursive acronyms are ok too.)
2. Compiling and installing.
[Read the README file for now; check back here once we've got packages/etc
for you.]
3. Running tor.
3.1. What's this about roles? What kind of server should I run?
The same executable ("or") functions as both client and server, depending
on the value of the config variable named 'Role'. Role represents a
combination of which tasks this particular tor server will do. The default
Role (role 15) is an onion router: it listens for onion routers, listens
for onion proxies, listens for application proxies, and it connects to
all other onion routers it learns about. A directory server (role 63)
does all of the above and also serves directory requests. A simple
onion proxy, on the other hand (role 8), only listens for application
proxies. See part 3.1 of the HACKING document for more technical details.
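One way to read the role numbers above is as a bitmask with one bit per
task. The sketch below assumes that reading. Only the value 8 for "listens
for application proxies" follows directly from the text (role 8 does nothing
else); the other bit positions, the split of directory service into two
bits, and all of the flag names are assumptions made for illustration.

```python
from enum import IntFlag

class Role(IntFlag):
    # Assumed bit layout. Only AP_LISTEN = 8 is implied by the FAQ text.
    OR_LISTEN = 1        # listens for onion routers
    OR_CONNECT_ALL = 2   # connects to all other onion routers it learns about
    OP_LISTEN = 4        # listens for onion proxies
    AP_LISTEN = 8        # listens for application proxies
    DIR_LISTEN = 16      # assumed: listens for directory requests
    DIR_SERVER = 32      # assumed: acts as a directory server

def tasks(role_value):
    """Return the names of the tasks enabled by a numeric Role value."""
    return [flag.name for flag in Role if role_value & flag]

print(tasks(8))   # simple onion proxy
print(tasks(15))  # onion router
print(tasks(63))  # directory server
```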
3.2. So I can just run a full onion router and join the network?
No. Users should run just an onion proxy (use the 'oprc' config file).
If you start up a full onion router, the rest of the routers in the
system won't recognize you, so they will reject your handshake attempts.
3.3. How do I join the network then?
If you just want to use the onion routing network, you can run a proxy
and you're all set. If you want to run a router, you must convince
the directory server operators (currently arma@mit.edu) that you're a
trustworthy person. From there, the operators add you to the directory,
which propagates out to the rest of the network. All nodes will know
about you within an hour.
3.4. I want to run a directory server too.
If you run a very reliable node, you plan to be around for a long time,
and you want to spend some time ensuring that router operators are
people we know and like, we may want you to run a directory server
too. We must manually add you to the 'dirservers' file that's part of
the distribution; users will only know about you when they upgrade to
a new version. Of course, you can always just start up your router as a
directory server too --- but users won't know to ask you for directories,
and more importantly, you'll never learn from the real directory servers
about recently joined routers.
4. Development.
4.1. Who's doing this?
4.2. Can I help?
4.3. I've got a bug.
5. Anonymity.
5.1. So I'm totally anonymous if I use tor?
5.2. Where can I learn more about anonymity?
6. Comparison to related projects.
6.1. Onion Routing.
Tor *is* onion routing.
6.2. Freedom.
7. Protocol and application support.
7.1. http? ftp? udp? socks? mozilla?

doc/HACKING (new file, 117 lines)

@@ -0,0 +1,117 @@
0. Intro.
Onion Routing is still very much in development. This document
aims to get you started in the right direction if you want to understand
the code, add features, fix bugs, etc.
Read the README file first, so you can get familiar with the basics.
1. The programs.
1.1. "or". This is the main program here. It functions as both a server
and a client, depending on which config file you give it. ...
2. The pieces.
2.1. Routers. Onion routers, as far as the 'or' program is concerned,
are a bunch of data items that are loaded into the router_array when
the program starts. After it's loaded, the router information is never
changed. When a new OR connection is started (see below), the relevant
information is copied from the router struct to the connection struct.
2.2. Connections. A connection is a long-standing tcp socket between
nodes. A connection is named based on what it's connected to -- an "OR
connection" has an onion router on the other end, an "OP connection" has
an onion proxy on the other end, an "exit connection" has a website or
other server on the other end, and an "AP connection" has an application
proxy (and thus a user) on the other end.
2.3. Circuits. A circuit is a single conversation between two
participants over the onion routing network. One end of the circuit has
an AP connection, and the other end has an exit connection. AP and exit
connections have only one circuit associated with them (and thus these
connection types are closed when the circuit is closed), whereas OP and
OR connections multiplex many circuits at once, and stay standing even
when there are no circuits running over them.
2.4. Cells. Some connections, specifically OR and OP connections, speak
"cells". This means that data over that connection is bundled into
128-byte packets (8 bytes of header and 120 bytes of payload). Each cell has
a type, or "command", which indicates what it's for.
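As a rough illustration of the cell framing, the sketch below splits a byte
stream into 128-byte cells. The sizes (8 + 120) come from the text above;
the header layout (2-byte circuit id, 1-byte command, 1-byte payload length,
4 reserved bytes) and the function names are hypothetical, chosen only so
the example is complete.

```python
import struct

CELL_SIZE = 128
HEADER_SIZE = 8
PAYLOAD_SIZE = CELL_SIZE - HEADER_SIZE  # 120 bytes

# Hypothetical header layout: circuit id, command, payload length, 4 pad bytes.
HEADER = struct.Struct(">HBB4x")

def pack_cells(circ_id, command, data):
    """Split data into fixed-size cells, zero-padding the last payload."""
    cells = []
    for offset in range(0, len(data), PAYLOAD_SIZE):
        chunk = data[offset:offset + PAYLOAD_SIZE]
        header = HEADER.pack(circ_id, command, len(chunk))
        cells.append(header + chunk.ljust(PAYLOAD_SIZE, b"\0"))
    return cells

def unpack_cell(cell):
    """Return (circuit id, command, payload without padding)."""
    circ_id, command, length = HEADER.unpack(cell[:HEADER_SIZE])
    return circ_id, command, cell[HEADER_SIZE:HEADER_SIZE + length]

cells = pack_cells(7, 2, b"x" * 250)  # 250 bytes need three cells
```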
3. Important parameters in the code.
3.1. Role.
4. Robustness features.
4.1. Bandwidth throttling. Each cell-speaking connection has a maximum
bandwidth it can use, as specified in the routers.or file. Bandwidth
throttling occurs on both the sender side and the receiving side. The
sending side sends cells at regularly spaced intervals (e.g., a connection
with a bandwidth of 12800B/s would queue a cell every 10ms). The receiving
side protects against misbehaving servers that send cells more frequently,
by using a simple token bucket:
Each connection has a token bucket with a specified capacity. Tokens are
added to the bucket each second (when the bucket is full, new tokens
are discarded). Each token represents permission to receive one byte
from the network --- to receive a byte, the connection must remove a
token from the bucket. Thus if the bucket is empty, that connection must
wait until more tokens arrive. The number of tokens we add enforces a
long-term average rate of incoming bytes, yet we still permit short-term
bursts above the allowed bandwidth. Currently bucket sizes are set to
ten seconds' worth of traffic.
The bandwidth throttling uses TCP to push back when we stop reading.
We extend it with token buckets to allow more flexibility for traffic
bursts.
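The sender-side spacing and the receiver-side token bucket can be sketched
as follows. This is a minimal model under stated assumptions, not the code
in 'or': the class and function names are invented, the bucket is assumed
to start full, and refill() is assumed to be called once per second by some
outer loop.

```python
CELL_SIZE = 128  # bytes per cell, from section 2.4

def cell_interval_ms(bandwidth_bytes_per_sec):
    """Spacing between queued cells on the sending side."""
    return 1000 * CELL_SIZE / bandwidth_bytes_per_sec

class TokenBucket:
    def __init__(self, rate, burst_seconds=10):
        self.rate = rate                      # tokens (bytes) added per second
        self.capacity = rate * burst_seconds  # ten seconds' worth of traffic
        self.tokens = self.capacity           # assumption: bucket starts full

    def refill(self):
        """Call once per second; tokens beyond the capacity are discarded."""
        self.tokens = min(self.capacity, self.tokens + self.rate)

    def take(self, wanted):
        """Return how many bytes may be read now (possibly 0)."""
        granted = min(wanted, self.tokens)
        self.tokens -= granted
        return granted

print(cell_interval_ms(12800))  # 128 bytes / 12800 B/s = 10 ms per cell
bucket = TokenBucket(rate=12800)
burst = bucket.take(200000)     # a burst is capped at the bucket capacity
```

When take() returns 0 the connection simply stops reading, and TCP pushes
back on the sender.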
4.2. Data congestion control. Even with the above bandwidth throttling,
we still need to worry about congestion, either accidental or intentional.
If a lot of people make circuits into the same node, and they all come out
through the same connection, then that connection may become saturated
(be unable to send out data cells as quickly as it wants to). An adversary
can make a 'put' request through the onion routing network to a webserver
he owns, and then refuse to read any of the bytes at the webserver end
of the circuit. These bottlenecks can propagate back through the entire
network, mucking up everything.
To handle this congestion, each circuit starts out with a receive
window at each node of 100 cells -- it is willing to receive at most 100
cells on that circuit. (It handles each direction separately; so that's
really 100 cells forward and 100 cells back.) The edge of the circuit
is willing to create at most 100 cells from data coming from outside the
onion routing network. Nodes in the middle of the circuit will tear down
the circuit if a data cell arrives when the receive window is 0. When
data has traversed the network, the edge node buffers it on its outbuf,
and evaluates whether to respond with a 'sendme' acknowledgement: if its
outbuf is not too full, and its receive window is less than 90, then it
queues a 'sendme' cell backwards in the circuit. Each node that receives
the sendme increments its window by 10 and passes the cell onward.
In practice, all the nodes in the circuit maintain a receive window
close to 100 except the exit node, which stays around 0, periodically
receiving a sendme and reading 10 more data cells from the webserver.
In this way we can use pretty much all of the available bandwidth for
data, but gracefully back off when faced with multiple circuits (a new
sendme arrives only after some cells have traversed the entire network),
stalled network connections, or attacks.
We don't need to reimplement full tcp windows, with sequence numbers,
the ability to drop cells when we're full, etc., because the tcp streams
already guarantee in-order delivery of each cell. Rather than trying
to build some sort of tcp-on-tcp scheme, we implement this minimal data
congestion control; so far it's enough.
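The window accounting in 4.2 can be modelled in a few lines. The numbers
(100, 90, 10) are taken from the text; everything else is an assumption for
the sake of a runnable sketch: one direction only, a list of nodes standing
in for the circuit, a sending edge that simply stops packaging cells at
window 0, and a receiving edge that also bumps its own window when it
issues a sendme.

```python
WINDOW_START = 100
SENDME_THRESHOLD = 90
SENDME_INCREMENT = 10

class CircuitTornDown(Exception):
    pass

class Node:
    def __init__(self):
        self.window = WINDOW_START

    def on_data_cell(self):
        # A data cell arriving at window 0 is a protocol violation.
        if self.window == 0:
            raise CircuitTornDown()
        self.window -= 1

    def on_sendme(self):
        self.window += SENDME_INCREMENT

def send_data_cell(path):
    """Sending edge packages one cell, or refuses if its window is used up."""
    if path[0].window == 0:
        return False  # stop reading from outside until a sendme arrives
    for node in path:
        node.on_data_cell()
    return True

def maybe_sendme(path, outbuf_len, outbuf_limit):
    """Receiving edge acknowledges if its outbuf has room and its window is low."""
    edge = path[-1]
    if outbuf_len < outbuf_limit and edge.window < SENDME_THRESHOLD:
        for node in reversed(path):  # the sendme travels backwards
            node.on_sendme()
        return True
    return False
```

If the receiving edge's outbuf stays full (for example, the webserver on the
far side refuses to read), no sendme is sent, the sender's window drains to
0, and the stall stays confined to that one circuit.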
4.3. Router twins. In many cases when we ask for a router with a given
address and port, we really mean a router that knows a given key. Router
twins are two or more routers that all share the same private key. We thus
give routers extra flexibility in choosing the next hop in the circuit: if
some of the twins are down or slow, they can choose the more available ones.
Currently the code tries for the primary router first, and if it's down,
chooses the first available twin.
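The selection rule in the last sentence might look like the sketch below.
The router records (dicts with 'addr', 'port' and 'key' fields) and the
is_up callback are hypothetical stand-ins for whatever the real router
struct and liveness check look like.

```python
def choose_next_hop(primary, routers, is_up):
    """Prefer the primary router; otherwise take the first twin that is up."""
    if is_up(primary):
        return primary
    for router in routers:
        # A twin is any other router that shares the primary's key.
        if router is not primary and router["key"] == primary["key"] and is_up(router):
            return router
    return None  # neither the primary nor any twin is available

routers = [
    {"addr": "10.0.0.1", "port": 9001, "key": "K1"},
    {"addr": "10.0.0.2", "port": 9001, "key": "K2"},
    {"addr": "10.0.0.3", "port": 9001, "key": "K1"},  # twin of the first entry
]
down = {"10.0.0.1"}
hop = choose_next_hop(routers[0], routers, lambda r: r["addr"] not in down)
```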