mirror of
https://gitlab.torproject.org/tpo/core/tor.git
synced 2024-11-24 04:13:28 +01:00
First cut at HACKING document
svn:r567
parent
955c8bda2b
commit
43a2e32ace
431
doc/HACKING
@@ -1,11 +1,418 @@
Guide to Hacking Tor

0. Intro.
Onion Routing is still very much in development stages. This document
aims to get you started in the right direction if you want to understand
the code, add features, fix bugs, etc.
(As of 8 October 2003, this was all accurate. If you're reading this in
the distant future, stuff may have changed.)

Read the README file first, so you can get familiar with the basics.

0. Intro and required reading

Onion Routing is still very much in development stages. This document
aims to get you started in the right direction if you want to understand
the code, add features, fix bugs, etc.

Read the README file first, so you can get familiar with the basics of
installing and running an onion router.

Then, skim some of the introductory materials in tor-spec.txt,
tor-design.tex, and the Tor FAQ to learn more about how the Tor protocol
is supposed to work. This document will assume you know about Cells,
Circuits, Streams, Connections, Onion Routers, and Onion Proxies.

1. Code organization

1.1. The modules

The code is divided into two directories: ./src/common and ./src/or.
The "common" directory contains general purpose utility functions not
specific to onion routing. The "or" directory implements all
onion-routing and onion-proxy specific functionality.

Files in ./src/common:

   aes.[ch] -- Implements the AES cipher (with 128-bit keys and blocks),
      and a counter-mode stream cipher on top of AES. This code is
      taken from the main Rijndael distribution. (We include this
      because many people are running older versions of OpenSSL without
      AES support.)

   crypto.[ch] -- Wrapper functions to present a consistent interface to
      public-key and symmetric cryptography operations from OpenSSL.

   fakepoll.[ch] -- Used on systems that don't have a poll() system call;
      reimplements poll() using the select() system call.

   log.[ch] -- Tor's logging subsystem.

   test.h -- Macros used by unit tests.

   torint.h -- Provides missing [u]int*_t types for environments that
      don't have stdint.h.

   tortls.[ch] -- Wrapper functions to present a consistent interface to
      TLS, SSL, and X.509 functions from OpenSSL.

   util.[ch] -- Miscellaneous portability and convenience functions.

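As an illustration of the fakepoll idea listed above, here is a minimal
sketch of a poll()-like call built on select(). It is not Tor's
fakepoll.c. The names my_pollfd, my_poll, MY_POLLIN, and MY_POLLOUT are
hypothetical and exist only for this example, and only read and write
readiness are handled.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

#define MY_POLLIN  0x01  /* hypothetical flag: caller wants to read  */
#define MY_POLLOUT 0x04  /* hypothetical flag: caller wants to write */

struct my_pollfd {
  int fd;        /* descriptor to watch; negative entries are skipped */
  short events;  /* requested events */
  short revents; /* events that actually occurred */
};

/* Emulate poll() with select(). Returns the number of entries with
 * nonzero revents, 0 on timeout, or -1 on error. A negative timeout_ms
 * blocks indefinitely. Descriptors must be below FD_SETSIZE. */
int my_poll(struct my_pollfd *fds, size_t n, int timeout_ms)
{
  fd_set rset, wset;
  struct timeval tv, *tvp = NULL;
  int maxfd = -1, ready = 0, r;
  size_t i;

  FD_ZERO(&rset);
  FD_ZERO(&wset);
  for (i = 0; i < n; ++i) {
    fds[i].revents = 0;
    if (fds[i].fd < 0)
      continue;
    if (fds[i].fd >= FD_SETSIZE)
      return -1; /* select() cannot watch this descriptor */
    if (fds[i].events & MY_POLLIN)
      FD_SET(fds[i].fd, &rset);
    if (fds[i].events & MY_POLLOUT)
      FD_SET(fds[i].fd, &wset);
    if (fds[i].fd > maxfd)
      maxfd = fds[i].fd;
  }
  if (timeout_ms >= 0) {
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    tvp = &tv;
  }
  r = select(maxfd + 1, &rset, &wset, NULL, tvp);
  if (r <= 0)
    return r;
  /* Translate the fd_sets back into per-entry revents. */
  for (i = 0; i < n; ++i) {
    if (fds[i].fd < 0)
      continue;
    if (FD_ISSET(fds[i].fd, &rset))
      fds[i].revents |= MY_POLLIN;
    if (FD_ISSET(fds[i].fd, &wset))
      fds[i].revents |= MY_POLLOUT;
    if (fds[i].revents)
      ++ready;
  }
  return ready;
}
```

The main cost of this approach is the FD_SETSIZE ceiling that select()
imposes, which a native poll() does not have.
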
Files in ./src/or:

   [General-purpose modules]

   or.h -- Common header file: includes everything, defines everything.

   buffers.c -- Implements a generic buffer interface. Buffers are
      fairly opaque string holders that can be filled from or flushed
      to memory, file descriptors, or TLS connections.

      Also implements parsing functions to read HTTP and SOCKS commands
      from buffers.

   tree.h -- A splay tree implementation by Niels Provos. Used only by
      dns.c.

   config.c -- Code to parse and validate the configuration file.

   [Background processing modules]

   cpuworker.c -- Implements a separate 'CPU worker' process to perform
      CPU-intensive tasks in the background, so as not to interrupt the
      onion router. (OR only)

   dns.c -- Implements a farm of 'DNS worker' processes to perform DNS
      lookups for onion routers and cache the results. [This needs to
      be done in the background because of the lack of a good,
      ubiquitous asynchronous DNS implementation.] (OR only)

   [Directory-related functionality.]

   directory.c -- Code to send and fetch directories and router
      descriptors via HTTP. Directories use dirserv.c to generate the
      results; clients use routers.c to parse them.

   dirserv.c -- Code to manage directory contents and generate
      directories. [Directory only]

   routers.c -- Code to parse directories and router descriptors; and to
      generate a router descriptor corresponding to this OR's
      capabilities. Also presents some high-level interfaces for
      managing an OR or OP's view of the directory.

   [Circuit-related modules.]

   circuit.c -- Code to create circuits, manage circuits, and route
      relay cells along circuits.

   onion.c -- Code to generate and respond to "onion skins".

   [Core protocol implementation.]

   connection.c -- Code used in common by all connection types. See
      1.2. below for more general information about connections.

   connection_edge.c -- Code used only by edge connections.

   command.c -- Code to handle specific cell types. [OR only]

   connection_or.c -- Code to implement cell-speaking connections.

   [Toplevel modules.]

   main.c -- Toplevel module. Initializes keys, handles signals,
      multiplexes between connections, implements main loop, and drives
      scheduled events.

   tor_main.c -- Stub module containing a main() function. Allows unit
      test binary to link against main.c.

   [Unit tests]

   test.c -- Contains unit tests for many pieces of the lower level Tor
      modules.

1.2. All about connections

All sockets in Tor are handled as different types of nonblocking
'connections'. (What the Tor spec calls a "Connection", the code refers
to as a "Cell-speaking" or "OR" connection.)

Connections are implemented by the connection_t struct, defined in or.h.
Not every kind of connection uses all the fields in connection_t; see
the comments in or.h and the assertions in assert_connection_ok() for
more information.

Every connection has a type and a state. Connections never change their
type, but can go through many state changes in their lifetime.

The connection types break down as follows:

   [Cell-speaking connections]
     CONN_TYPE_OR -- A bidirectional TLS connection transmitting a
        sequence of cells. May be from an OR to an OR, or from an OP to
        an OR.

   [Edge connections]
     CONN_TYPE_EXIT -- A TCP connection from an onion router to a
        Stream's destination. [OR only]
     CONN_TYPE_AP -- A SOCKS proxy connection from the end user to the
        onion proxy. [OP only]

   [Listeners]
     CONN_TYPE_OR_LISTENER [OR only]
     CONN_TYPE_AP_LISTENER [OP only]
     CONN_TYPE_DIR_LISTENER [Directory only]
        -- Bound network sockets, waiting for incoming connections.

   [Internal]
     CONN_TYPE_DNSWORKER -- Connection from the main process to a DNS
        worker. [OR only]

     CONN_TYPE_CPUWORKER -- Connection from the main process to a CPU
        worker. [OR only]

Connection states are documented in or.h.

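The grouping of connection types above can be sketched as a small
classification function. This is only an illustration: the CONN_TYPE_*
names come from the list above, but the enum representation, the
numbering, the conn_class_t type, and the helper function names are
assumptions made for this example and are not taken from or.h.

```c
/* Connection types from the list above; the numbering is hypothetical. */
typedef enum {
  CONN_TYPE_OR,
  CONN_TYPE_EXIT,
  CONN_TYPE_AP,
  CONN_TYPE_OR_LISTENER,
  CONN_TYPE_AP_LISTENER,
  CONN_TYPE_DIR_LISTENER,
  CONN_TYPE_DNSWORKER,
  CONN_TYPE_CPUWORKER
} conn_type_t;

/* Hypothetical grouping that mirrors the bracketed headings above. */
typedef enum {
  CONN_CLASS_CELL_SPEAKING,
  CONN_CLASS_EDGE,
  CONN_CLASS_LISTENER,
  CONN_CLASS_INTERNAL
} conn_class_t;

/* Map a connection type to the group it belongs to. */
conn_class_t conn_type_get_class(conn_type_t type)
{
  switch (type) {
    case CONN_TYPE_OR:
      return CONN_CLASS_CELL_SPEAKING;
    case CONN_TYPE_EXIT:
    case CONN_TYPE_AP:
      return CONN_CLASS_EDGE;
    case CONN_TYPE_OR_LISTENER:
    case CONN_TYPE_AP_LISTENER:
    case CONN_TYPE_DIR_LISTENER:
      return CONN_CLASS_LISTENER;
    case CONN_TYPE_DNSWORKER:
    case CONN_TYPE_CPUWORKER:
      return CONN_CLASS_INTERNAL;
  }
  return CONN_CLASS_INTERNAL; /* not reached for valid types */
}

/* Predicate: does this type only accept incoming connections? */
int conn_type_is_listener(conn_type_t type)
{
  return conn_type_get_class(type) == CONN_CLASS_LISTENER;
}
```
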
Every connection has two associated buffers: one for input and one for
output. Listeners don't use them. With other connections, incoming data
is appended to conn->inbuf, and outgoing data is taken from the front of
conn->outbuf. Connections differ primarily in the functions called
to fill and drain these buffers.

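The append-at-the-end, take-from-the-front discipline described above
can be sketched with a simple growable byte buffer. This is not the
interface of buffers.c; sketch_buf_t and both function names are
hypothetical, and a real implementation would likely avoid the memmove
on every read.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical byte buffer: data is appended at the end and consumed
 * from the front. */
typedef struct {
  char *data;
  size_t len;      /* bytes currently held */
  size_t capacity; /* bytes allocated */
} sketch_buf_t;

/* Append len bytes from src to the end of buf, growing it as needed.
 * Returns 0 on success, -1 if memory could not be allocated. */
int sketch_buf_append(sketch_buf_t *buf, const char *src, size_t len)
{
  if (len == 0)
    return 0;
  if (buf->len + len > buf->capacity) {
    size_t new_cap = buf->capacity ? buf->capacity : 64;
    char *p;
    while (new_cap < buf->len + len)
      new_cap *= 2;
    p = realloc(buf->data, new_cap);
    if (!p)
      return -1;
    buf->data = p;
    buf->capacity = new_cap;
  }
  memcpy(buf->data + buf->len, src, len);
  buf->len += len;
  return 0;
}

/* Move up to len bytes from the front of buf into dest.
 * Returns the number of bytes actually moved. */
size_t sketch_buf_take_front(sketch_buf_t *buf, char *dest, size_t len)
{
  if (len > buf->len)
    len = buf->len;
  if (len == 0)
    return 0;
  memcpy(dest, buf->data, len);
  memmove(buf->data, buf->data + len, buf->len - len);
  buf->len -= len;
  return len;
}
```

In these terms, reading from the network appends to an input buffer and
flushing to the network takes from the front of an output buffer.
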
1.3. All about circuits.

A circuit_t structure fills two roles. First, a circuit_t links two
connections together: either an edge connection and an OR connection,
or two OR connections. (When joined to an OR connection, a circuit_t
affects only cells sent to a particular ACI on that connection. When
joined to an edge connection, a circuit_t affects all data.)

Second, a circuit_t holds the cipher keys and state for sending data
along a given circuit. At the OP, it has a sequence of ciphers, each
of which is shared with a single OR along the circuit. Separate
ciphers are used for data going "forward" (away from the OP) and
"backward" (towards the OP). At the OR, a circuit has only two stream
ciphers: one for data going forward, and one for data going backward.

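The cipher arrangement described above can be sketched as follows. This
is a toy model, not the layout of circuit_t: every type and function
name below is hypothetical, and the "cipher" is an insecure XOR
keystream standing in for a real stream cipher such as AES in counter
mode. It shows only the forward direction: the OP applies one layer per
hop, and each OR strips its own layer.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy keystream generator standing in for a real stream cipher.
 * NOT secure; for illustration only. */
typedef struct {
  uint32_t state;
} toy_cipher_t;

static uint8_t toy_next_byte(toy_cipher_t *c)
{
  c->state = c->state * 1664525u + 1013904223u; /* one LCG step */
  return (uint8_t)(c->state >> 24);
}

/* XOR data with the keystream; the same call encrypts and decrypts. */
void toy_crypt(toy_cipher_t *c, uint8_t *data, size_t len)
{
  size_t i;
  for (i = 0; i < len; ++i)
    data[i] ^= toy_next_byte(c);
}

#define SKETCH_MAX_HOPS 3

/* OP-side view: one forward and one backward cipher per hop. */
typedef struct {
  int n_hops;
  toy_cipher_t forward[SKETCH_MAX_HOPS];
  toy_cipher_t backward[SKETCH_MAX_HOPS];
} op_circuit_sketch_t;

/* OR-side view: exactly two ciphers, one per direction. */
typedef struct {
  toy_cipher_t forward;
  toy_cipher_t backward;
} or_circuit_sketch_t;

/* The OP adds one layer per hop, with the last hop's layer innermost. */
void op_encrypt_forward(op_circuit_sketch_t *circ, uint8_t *data, size_t len)
{
  int hop;
  for (hop = circ->n_hops - 1; hop >= 0; --hop)
    toy_crypt(&circ->forward[hop], data, len);
}

/* Each OR strips only its own layer from forward traffic. */
void or_decrypt_forward(or_circuit_sketch_t *circ, uint8_t *data, size_t len)
{
  toy_crypt(&circ->forward, data, len);
}
```

Because each cipher keeps running state, the OP and the matching OR must
process the same bytes in the same order to stay in step.
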
1.4. Asynchronous IO and the main loop.

Tor uses the poll(2) system call [or a substitute based on select(2)]
to handle nonblocking (asynchronous) IO. If you're not familiar with
nonblocking IO, check out the links at the end of this document.

All asynchronous logic is handled in main.c. The functions
'connection_add', 'connection_set_poll_socket', and 'connection_remove'
manage an array of connection_t*, and keep it in sync with the array of
struct pollfd required by poll(2). (This array of connection_t* is
accessible via get_connection_array, but users should generally call
one of the 'connection_get_by_*' functions in connection.c to look up
individual connections.)

To trap read and write events, connections call the functions
'connection_{is|stop|start}_{reading|writing}'.

When connections get events, main.c calls conn_read and conn_write.
These functions dispatch events to connection_handle_read and
connection_handle_write as appropriate.

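The bookkeeping described above, an array of connections kept in step
with an array of struct pollfd, can be sketched like this. It is not the
code in main.c: sketch_conn_t, the fixed array size, and all function
names prefixed with sketch_ are assumptions made for this example, and
the sketch omits the poll(2) call itself.

```c
#include <poll.h>

#define SKETCH_MAX_CONNS 16

/* Hypothetical stand-in for a connection. */
typedef struct {
  int s;          /* socket descriptor */
  int poll_index; /* position in both arrays, or -1 if not added */
} sketch_conn_t;

/* Entry i of conn_array always describes the same socket as entry i
 * of poll_array. */
static sketch_conn_t *conn_array[SKETCH_MAX_CONNS];
static struct pollfd poll_array[SKETCH_MAX_CONNS];
static int n_conns = 0;

/* Add conn to both arrays. Returns 0 on success, -1 if full. */
int sketch_connection_add(sketch_conn_t *conn)
{
  if (n_conns >= SKETCH_MAX_CONNS)
    return -1;
  conn->poll_index = n_conns;
  conn_array[n_conns] = conn;
  poll_array[n_conns].fd = conn->s;
  poll_array[n_conns].events = 0;
  poll_array[n_conns].revents = 0;
  ++n_conns;
  return 0;
}

/* Remove conn from both arrays by moving the last entry into its
 * slot. Returns 0 on success, -1 if conn was not in the arrays. */
int sketch_connection_remove(sketch_conn_t *conn)
{
  int i = conn->poll_index;
  if (i < 0 || i >= n_conns || conn_array[i] != conn)
    return -1;
  --n_conns;
  if (i != n_conns) {
    conn_array[i] = conn_array[n_conns];
    poll_array[i] = poll_array[n_conns];
    conn_array[i]->poll_index = i;
  }
  conn->poll_index = -1;
  return 0;
}

/* Ask poll() to report read events for conn. */
void sketch_connection_start_reading(sketch_conn_t *conn)
{
  poll_array[conn->poll_index].events |= POLLIN;
}

/* Stop asking poll() for read events on conn. */
void sketch_connection_stop_reading(sketch_conn_t *conn)
{
  poll_array[conn->poll_index].events &= ~POLLIN;
}

/* Predicate: is conn currently registered for read events? */
int sketch_connection_is_reading(const sketch_conn_t *conn)
{
  return (poll_array[conn->poll_index].events & POLLIN) != 0;
}
```

Storing the index inside the connection makes removal constant-time, at
the price of having to update it whenever an entry moves.
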
When a connection needs to be closed, this can happen in two ways. Most
simply, it can make connection_handle_* return an error (-1),
which will make conn_{read|write} close it. But if the connection
needs to stay around [XXXX explain why] until the end of the current
iteration of the main loop, it marks itself for closing by setting
conn->connection_marked_for_close.

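The two ways of closing described above can be sketched as one pass of
a simplified loop. Everything here is hypothetical (closable_conn_t, the
example handlers, sketch_loop_iteration); it only illustrates the
difference between closing at once on a -1 return and closing at the
end of the iteration when a flag is set.

```c
#include <stddef.h>

typedef struct closable_conn_t closable_conn_t;

/* Hypothetical stand-in for a connection with its event handler. */
struct closable_conn_t {
  int (*handle)(closable_conn_t *conn); /* returns -1 to request close */
  int marked_for_close; /* nonzero: close at end of this iteration */
  int is_closed;
};

static void sketch_close(closable_conn_t *conn)
{
  conn->is_closed = 1;
}

/* Example handler: fails, so the connection is closed immediately. */
int example_handle_error(closable_conn_t *conn)
{
  (void)conn;
  return -1;
}

/* Example handler: succeeds but asks to be closed later. */
int example_handle_mark(closable_conn_t *conn)
{
  conn->marked_for_close = 1;
  return 0;
}

/* Example handler: succeeds and stays open. */
int example_handle_ok(closable_conn_t *conn)
{
  (void)conn;
  return 0;
}

/* One loop iteration: dispatch events, then sweep marked connections. */
void sketch_loop_iteration(closable_conn_t **conns, size_t n)
{
  size_t i;
  for (i = 0; i < n; ++i) {
    if (conns[i]->is_closed)
      continue;
    if (conns[i]->handle(conns[i]) < 0)
      sketch_close(conns[i]); /* error return: close right away */
  }
  for (i = 0; i < n; ++i) {
    if (conns[i]->marked_for_close && !conns[i]->is_closed)
      sketch_close(conns[i]); /* deferred close */
  }
}
```
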
The main loop handles several other operations: First, it checks
whether any signals have been received that require a response (HUP,
KILL, USR1, CHLD). Second, it calls prepare_for_poll to handle recurring
tasks and compute the necessary poll timeout. These recurring tasks
include periodically fetching the directory, timing out unused
circuits, incrementing flow control windows and re-enabling connections
that were blocking for more bandwidth, and maintaining statistics.

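One way to compute a poll timeout from a set of recurring tasks is to
wait until the earliest task is due. The sketch below assumes that
approach; it is not how prepare_for_poll is written, and the function
name, the deadline array, and the one-hour cap are all hypothetical.

```c
#include <stddef.h>
#include <time.h>

#define SKETCH_MAX_WAIT_SEC 3600 /* hypothetical upper bound on a wait */

/* Return the number of milliseconds to wait before the earliest
 * deadline in due[]: 0 if something is already due, -1 (block
 * indefinitely) if there are no deadlines at all. */
int sketch_poll_timeout_ms(time_t now, const time_t *due, size_t n)
{
  size_t i;
  time_t earliest;
  double wait_sec;

  if (n == 0)
    return -1;
  earliest = due[0];
  for (i = 1; i < n; ++i) {
    if (due[i] < earliest)
      earliest = due[i];
  }
  wait_sec = difftime(earliest, now);
  if (wait_sec <= 0)
    return 0;
  if (wait_sec > SKETCH_MAX_WAIT_SEC)
    wait_sec = SKETCH_MAX_WAIT_SEC;
  return (int)(wait_sec * 1000.0);
}
```

The return value uses the same convention as the timeout argument of
poll(2), where a negative value means no timeout.
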
A word about TLS: Using TLS on OR connections complicates matters in
two ways. First, a TLS stream has its own read buffer independent of
the connection's read buffer. (TLS needs to read an entire frame from
the network before it can decrypt any data. Thus, trying to read 1
byte from TLS can require that several KB be read from the network and
decrypted. The extra data is stored in TLS's decrypt buffer.) Second,
the TLS stream's events do not correspond directly to network events:
sometimes, before a TLS stream can read, the network must be ready to
write -- or vice versa.

[XXXX describe the consequences of this for OR connections.]

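A hedged sketch of how both complications might be handled follows. The
status values are loosely modeled on the want-read and want-write
results that TLS libraries such as OpenSSL report, but the enum and both
functions are hypothetical and are not taken from tortls.c or main.c.

```c
#include <poll.h>
#include <stddef.h>

/* Hypothetical outcome of the most recent TLS read or write attempt. */
typedef enum {
  SKETCH_TLS_DONE,       /* the operation made progress */
  SKETCH_TLS_WANT_READ,  /* TLS needs the socket to become readable */
  SKETCH_TLS_WANT_WRITE  /* TLS needs the socket to become writable */
} sketch_tls_status_t;

/* Decide which network events to request from poll(). app_events is
 * what the application wants (POLLIN and/or POLLOUT); if the last TLS
 * operation stalled, what TLS needs takes priority. */
short sketch_tls_poll_events(short app_events, sketch_tls_status_t last)
{
  switch (last) {
    case SKETCH_TLS_WANT_READ:
      return POLLIN;
    case SKETCH_TLS_WANT_WRITE:
      return POLLOUT;
    case SKETCH_TLS_DONE:
      break;
  }
  return app_events;
}

/* A TLS connection can have data to deliver even when the socket is
 * idle, because decrypted bytes may still sit in the TLS buffer. */
int sketch_tls_is_readable(int socket_readable, size_t tls_pending_bytes)
{
  return socket_readable || tls_pending_bytes > 0;
}
```

For example, a connection that only wants to read may still have to wait
for POLLOUT if the TLS layer reported that it needs to write first.
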
1.5. How data flows (An illustration.)

Suppose an OR receives 50 bytes along an OR connection. These 50 bytes
complete a data relay cell, which gets decrypted and delivered to an
edge connection. Here we give a possible call sequence for the
delivery of this data.

(This may be outdated quickly.)

   do_main_loop -- Calls poll(2), receives a POLLIN event on a struct
                 pollfd, then calls:
    conn_read -- Looks up the corresponding connection_t, and calls:
     connection_handle_read -- Calls:
      connection_read_to_buf -- Notices that it has an OR connection so:
       read_to_buf_tls -- Pulls data from the TLS stream onto conn->inbuf.
      connection_process_inbuf -- Notices that it has an OR connection so:
       connection_or_process_inbuf -- Checks whether conn is open, and calls:
        connection_process_cell_from_inbuf -- Notices it has enough data for
                 a cell, then calls:
         connection_fetch_from_buf -- Pulls the cell from the buffer.
         cell_unpack -- Decodes the raw cell into a cell_t.
         command_process_cell -- Notices it is a relay cell, so calls:
          command_process_relay_cell -- Looks up the circuit for the cell,
                 makes sure the circuit is live, then passes the cell to:
           circuit_deliver_relay_cell -- Passes the cell to each of:
            relay_crypt -- Strips a layer of encryption from the cell and
                 notices that the cell is for local delivery.
            connection_edge_process_relay_cell -- Extracts the cell's
                 relay command, and makes sure the edge connection is
                 open. Since it has a DATA cell and an open connection,
                 calls:
             circuit_consider_sending_sendme -- [XXX]
             connection_write_to_buf -- To place the data on the outgoing
                 buffer of the correct edge connection, by calling:
              connection_start_writing -- To tell the main poll loop about
                 the pending data.
              write_to_buf -- To actually place the outgoing data on the
                 edge connection.
             connection_consider_sending_sendme -- [XXX]

[In a subsequent iteration, main notices that the edge connection is
ready for writing.]

   do_main_loop -- Calls poll(2), receives a POLLOUT event on a struct
                 pollfd, then calls:
    conn_write -- Looks up the corresponding connection_t, and calls:
     connection_handle_write -- This isn't a TLS connection, so calls:
      flush_buf -- Delivers data from the edge connection's outbuf to the
                 network.
      connection_wants_to_flush -- Reports that all data has been flushed.
      connection_finished_flushing -- Notices the connection is an exit,
                 and calls:
       connection_edge_finished_flushing -- The connection is open, so it
                 calls:
        connection_stop_writing -- Tells the main poll loop that this
                 connection has no more data to write.
        connection_consider_sending_sendme -- [XXX]

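The step in the first call sequence where the code "notices it has
enough data for a cell" can be sketched as below. The sketch assumes
fixed-size cells; the cell length used here is a made-up value for
illustration, and sketch_inbuf_t and sketch_fetch_cell are hypothetical
names, not the functions named in the call sequence.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_CELL_LEN 16 /* hypothetical cell size, illustration only */

/* Hypothetical input buffer holding bytes read from the network. */
typedef struct {
  unsigned char data[256];
  size_t len;
} sketch_inbuf_t;

/* If a whole cell is buffered, copy it to cell_out, remove it from
 * the front of the buffer, and return 1. Otherwise leave the buffer
 * untouched and return 0 so the caller can wait for more data. */
int sketch_fetch_cell(sketch_inbuf_t *in, unsigned char *cell_out)
{
  if (in->len < SKETCH_CELL_LEN)
    return 0;
  memcpy(cell_out, in->data, SKETCH_CELL_LEN);
  memmove(in->data, in->data + SKETCH_CELL_LEN, in->len - SKETCH_CELL_LEN);
  in->len -= SKETCH_CELL_LEN;
  return 1;
}
```

A partial cell simply stays in the buffer until a later read completes
it, which is the situation the 50-byte example above describes.
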
1.6. Routers, descriptors, and directories

All Tor processes need to keep track of a list of onion routers, for
several reasons:
    - OPs need to establish connections and circuits to ORs.
    - ORs need to establish connections to other ORs.
    - OPs and ORs need to fetch directories from directory servers.
    - ORs need to upload their descriptors to directory servers.
    - Directory servers need to know which ORs are allowed onto the
      network, what the descriptors are for those ORs, and which of
      those ORs are currently live.

Thus, every Tor process keeps track of a list of all the ORs it knows
in a static variable 'directory' in the routers.c module. This
variable contains a routerinfo_t object for each known OR. On startup,
the directory is initialized to a list of known directory servers (via
router_get_list_from_file()). Later, the directory is updated via
router_get_dir_from_string(). (OPs and ORs retrieve fresh directories
from directory servers; directory servers generate their own.)

Every OR must periodically regenerate a router descriptor for itself.
The routerinfo_t and the corresponding descriptor are stored in the
'desc_routerinfo' and 'descriptor' static variables in routers.c,
respectively.

Additionally, a directory server keeps track of a list of the
router descriptors it knows in a separate list in dirserv.c. It
uses this list, plus the open connections in main.c, to build
directories.

1.7. Data model

[XXX]

1.8. Flow control

[XXX]

2. Coding conventions

2.1. Details

Use tor_malloc, tor_strdup, and tor_gettimeofday instead of their
generic equivalents. (They always succeed or exit.)

Use INLINE instead of 'inline', so that we work properly on Windows.

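The "always succeed or exit" contract and the INLINE macro can both be
sketched briefly. These are illustrations of the idea, not Tor's own
definitions: sketch_malloc, sketch_strdup, SKETCH_INLINE, and sketch_max
are hypothetical names, and the compiler test used for the macro is an
assumption about how such a macro might be defined.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical portability macro in the spirit of INLINE. */
#if defined(_MSC_VER)
#define SKETCH_INLINE __inline
#else
#define SKETCH_INLINE inline
#endif

/* Allocate size bytes or exit. Never returns NULL, so callers do not
 * need to check the result. */
void *sketch_malloc(size_t size)
{
  void *p = malloc(size ? size : 1);
  if (!p) {
    fprintf(stderr, "Out of memory. Dying.\n");
    exit(1);
  }
  return p;
}

/* Duplicate a string or exit. Never returns NULL. */
char *sketch_strdup(const char *s)
{
  size_t len = strlen(s) + 1;
  char *dup = sketch_malloc(len);
  memcpy(dup, s, len);
  return dup;
}

/* A small helper declared with the portability macro. */
static SKETCH_INLINE int sketch_max(int a, int b)
{
  return a > b ? a : b;
}
```

The benefit of the wrapper style is that error handling for allocation
failure lives in one place instead of at every call site.
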
2.2. Calling and naming conventions

Whenever possible, functions should return -1 on error and 0 on
success.

For multi-word identifiers, use lowercase words combined with
underscores. (e.g., "multi_word_identifier"). Use ALL_CAPS for macros and
constants.

Typenames should end with "_t".

Function names should be prefixed with a module name or object name. (In
general, code to manipulate an object should be a module with the same
name as the object, so it's hard to tell which convention is used.)

Functions that do things should have imperative-verb names
(e.g. buffer_clear, buffer_resize); functions that return booleans should
have predicate names (e.g. buffer_is_empty, buffer_needs_resizing).

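A short sketch that follows all of the conventions above at once. The
type, the constant, and the functions are invented for this example
(note the example_ prefix) and do not correspond to Tor's buffer code.

```c
#include <stddef.h>

#define EXAMPLE_BUFFER_MAX 4096 /* constants and macros use ALL_CAPS */

/* Type names end in "_t". */
typedef struct {
  size_t len;      /* bytes in use */
  size_t capacity; /* bytes available */
} example_buffer_t;

/* Imperative-verb name, prefixed with the object name.
 * Returns 0 on success and -1 on error. */
int example_buffer_resize(example_buffer_t *buf, size_t new_capacity)
{
  if (new_capacity > EXAMPLE_BUFFER_MAX || new_capacity < buf->len)
    return -1;
  buf->capacity = new_capacity;
  return 0;
}

/* Imperative-verb name for a function that cannot fail. */
void example_buffer_clear(example_buffer_t *buf)
{
  buf->len = 0;
}

/* Predicate name for a function that returns a boolean. */
int example_buffer_is_empty(const example_buffer_t *buf)
{
  return buf->len == 0;
}
```
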
2.3. What To Optimize

Don't optimize anything if it's not in the critical path. Right now,
the critical path seems to be AES, logging, and the network itself.
Feel free to do your own profiling to determine otherwise.

2.4. Log conventions

Log convention: use only these four log severities.

  ERR is if something fatal just happened.
  WARNING is something bad happened, but we're still running. The
    bad thing is either a bug in the code, an attack or buggy
    protocol/implementation of the remote peer, etc. The operator should
    examine the bad thing and try to correct it.
  (No error or warning messages should be expected during normal OR or OP
    operation. I expect most people to run on -l warning eventually. If a
    library function is currently called such that failure always means
    ERR, then the library function should log WARNING and let the caller
    log ERR.)
  INFO means something happened (maybe bad, maybe ok), but there's nothing
    you need to (or can) do about it.
  DEBUG is for everything louder than INFO.

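A minimal sketch of severity filtering with these four levels follows.
It is not the interface of log.[ch]: the enum, the threshold variable,
and the sketch_log functions are hypothetical, and the default threshold
of WARNING is chosen only to match the "-l warning" remark above.

```c
#include <stdarg.h>
#include <stdio.h>

/* The four severities, ordered from most to least severe.
 * The names and numbering are hypothetical. */
typedef enum {
  SKETCH_LOG_ERR,
  SKETCH_LOG_WARNING,
  SKETCH_LOG_INFO,
  SKETCH_LOG_DEBUG
} sketch_severity_t;

/* Messages less severe than this threshold are dropped. */
static sketch_severity_t sketch_log_threshold = SKETCH_LOG_WARNING;

void sketch_log_set_threshold(sketch_severity_t severity)
{
  sketch_log_threshold = severity;
}

/* Print a message to stderr if it is severe enough.
 * Returns 1 if the message was printed, 0 if it was filtered out. */
int sketch_log(sketch_severity_t severity, const char *fmt, ...)
{
  static const char *names[] = { "err", "warning", "info", "debug" };
  va_list ap;

  if (severity > sketch_log_threshold)
    return 0;
  fprintf(stderr, "[%s] ", names[severity]);
  va_start(ap, fmt);
  vfprintf(stderr, fmt, ap);
  va_end(ap);
  fputc('\n', stderr);
  return 1;
}
```

Under the convention above, a library routine would call this with the
WARNING level on failure and leave it to its caller to decide whether
the failure is fatal enough to log at ERR.
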
  [XXX Proposed convention: every message of severity INFO or higher should
  either (A) be intelligible to end-users who don't know the Tor source; or
  (B) somehow inform the end-users that they aren't expected to understand
  the message (perhaps with a string like "internal error"). Option (A) is
  to be preferred to option (B). -NM]

3. References

About Tor

   See http://freehaven.net/tor/
       http://freehaven.net/tor/cvs/doc/tor-spec.txt
       http://freehaven.net/tor/cvs/doc/tor-design.tex
       http://freehaven.net/tor/cvs/doc/FAQ

About anonymity

   See http://freehaven.net/anonbib/

About nonblocking IO

   [XXX insert references]


# ======================================================================
# Old HACKING document; merge into the above, move into tor-design.tex,
# or delete.
# ======================================================================
The pieces.

Routers. Onion routers, as far as the 'tor' program is concerned,
@@ -99,20 +506,6 @@ Robustness features.
Currently the code tries for the primary router first, and if it's down,
chooses the first available twin.

Coding conventions:

Log convention: use only these four log severities.

  ERR is if something fatal just happened.
  WARNING is something bad happened, but we're still running. The
    bad thing is either a bug in the code, an attack or buggy
    protocol/implementation of the remote peer, etc. The operator should
    examine the bad thing and try to correct it.
  (No error or warning messages should be expected. I expect most people
    to run on -l warning eventually. If a library function is currently
    called such that failure always means ERR, then the library function
    should log WARNING and let the caller log ERR.)
  INFO means something happened (maybe bad, maybe ok), but there's nothing
    you need to (or can) do about it.
  DEBUG is for everything louder than INFO.
