mirror of
https://gitlab.torproject.org/tpo/core/tor.git
synced 2024-11-24 12:23:32 +01:00
5855ca92a3
Add notes on dataflow (originally written for Dan) to HACKING document. svn:r13749
202 lines
7.2 KiB
Plaintext
202 lines
7.2 KiB
Plaintext
|
|
0. The buildbot.
|
|
|
|
http://tor-buildbot.freehaven.net:8010/
|
|
|
|
0.1. Useful command-lines that are non-trivial to reproduce but can
|
|
help with tracking bugs or leaks.
|
|
|
|
dmalloc -l ~/dmalloc.log
|
|
(run the commands it tells you)
|
|
./configure --with-dmalloc
|
|
|
|
valgrind --leak-check=yes --error-limit=no --show-reachable=yes src/or/tor
|
|
|
|
1. Coding conventions
|
|
|
|
1.0. Whitespace and C conformance
|
|
|
|
Invoke "make check-spaces" from time to time, so it can tell you about
|
|
deviations from our C whitespace style. Generally, we use:
|
|
- Unix-style line endings
|
|
- K&R-style indentation
|
|
- No space before newlines
|
|
- A blank line at the end of each file
|
|
- Never more than one blank line in a row
|
|
- Always spaces, never tabs
|
|
- No more than 79-columns per line.
|
|
- Two spaces per indent.
|
|
- A space between control keywords and their corresponding paren
|
|
"if (x)", "while (x)", and "switch (x)", never "if(x)", "while(x)", or
|
|
"switch(x)".
|
|
- A space between anything and an open brace.
|
|
- No space between a function name and an opening paren. "puts(x)", not
|
|
"puts (x)".
|
|
- Function declarations at the start of the line.
|
|
|
|
We try hard to build without warnings everywhere. In particular, if you're
|
|
using gcc, you should invoke the configure script with the option
|
|
"--enable-gcc-warnings". This will give a bunch of extra warning flags to
|
|
the compiler, and help us find divergences from our preferred C style.
|
|
|
|
1.1. Details
|
|
|
|
Use tor_malloc, tor_free, tor_strdup, and tor_gettimeofday instead of their
|
|
generic equivalents. (They always succeed or exit.)
|
|
|
|
You can get a full list of the compatibility functions that Tor provides
|
|
by looking through src/common/util.h and src/common/compat.h.
|
|
|
|
Use 'INLINE' instead of 'inline', so that we work properly on Windows.
|
|
|
|
1.2. Calling and naming conventions
|
|
|
|
Whenever possible, functions should return -1 on error and 0 on success.
|
|
|
|
For multi-word identifiers, use lowercase words combined with
|
|
underscores. (e.g., "multi_word_identifier"). Use ALL_CAPS for macros and
|
|
constants.
|
|
|
|
Typenames should end with "_t".
|
|
|
|
Function names should be prefixed with a module name or object name. (In
|
|
general, code to manipulate an object should be a module with the same
|
|
name as the object, so it's hard to tell which convention is used.)
|
|
|
|
Functions that do things should have imperative-verb names
|
|
(e.g. buffer_clear, buffer_resize); functions that return booleans should
|
|
have predicate names (e.g. buffer_is_empty, buffer_needs_resizing).
|
|
|
|
1.3. What To Optimize
|
|
|
|
Don't optimize anything if it's not in the critical path. Right now,
|
|
the critical path seems to be AES, logging, and the network itself.
|
|
Feel free to do your own profiling to determine otherwise.
|
|
|
|
1.4. Log conventions
|
|
|
|
http://wiki.noreply.org/noreply/TheOnionRouter/TorFAQ#LogLevels
|
|
|
|
No error or warning messages should be expected during normal OR or OP
|
|
operation.
|
|
|
|
If a library function is currently called such that failure always
|
|
means ERR, then the library function should log WARN and let the caller
|
|
log ERR.
|
|
|
|
[XXX Proposed convention: every message of severity INFO or higher should
|
|
either (A) be intelligible to end-users who don't know the Tor source; or
|
|
(B) somehow inform the end-users that they aren't expected to understand
|
|
the message (perhaps with a string like "internal error"). Option (A) is
|
|
to be preferred to option (B). -NM]
|
|
|
|
1.5. Doxygen
|
|
|
|
We use the 'doxygen' utility to generate documentation from our
|
|
source code. Here's how to use it:
|
|
|
|
1. Begin every file that should be documented with
|
|
/**
|
|
* \file filename.c
|
|
* \brief Short desccription of the file.
|
|
**/
|
|
|
|
(Doxygen will recognize any comment beginning with /** as special.)
|
|
|
|
2. Before any function, structure, #define, or variable you want to
|
|
document, add a comment of the form:
|
|
|
|
/** Describe the function's actions in imperative sentences.
|
|
*
|
|
* Use blank lines for paragraph breaks
|
|
* - and
|
|
* - hyphens
|
|
* - for
|
|
* - lists.
|
|
*
|
|
* Write <b>argument_names</b> in boldface.
|
|
*
|
|
* \code
|
|
* place_example_code();
|
|
* between_code_and_endcode_commands();
|
|
* \endcode
|
|
*/
|
|
|
|
3. Make sure to escape the characters "<", ">", "\", "%" and "#" as "\<",
|
|
"\>", "\\", "\%", and "\#".
|
|
|
|
4. To document structure members, you can use two forms:
|
|
|
|
struct foo {
|
|
/** You can put the comment before an element; */
|
|
int a;
|
|
int b; /**< Or use the less-than symbol to put the comment
|
|
* after the element. */
|
|
};
|
|
|
|
5. To generate documentation from the Tor source code, type:
|
|
|
|
$ doxygen -g
|
|
|
|
To generate a file called 'Doxyfile'. Edit that file and run
|
|
'doxygen' to generate the API documentation.
|
|
|
|
6. See the Doxygen manual for more information; this summary just
|
|
scratches the surface.
|
|
|
|
2. Code notes
|
|
|
|
2.1. Dataflows
|
|
|
|
2.1.1. How Incoming data is handled
|
|
|
|
There are two paths for data arriving at Tor over the network: regular
|
|
TCP data, and DNS.
|
|
|
|
2.1.1.1. TCP.
|
|
|
|
When Tor takes information over the network, it uses the functions
|
|
read_to_buf() and read_to_buf_tls() in buffers.c. These read from a
|
|
socket or an SSL* into a buffer_t, which is an mbuf-style linkedlist
|
|
of memory chunks.
|
|
|
|
read_to_buf() and read_to_buf_tls() are called only from
|
|
connection_read_to_buf() in connection.c. It takes a connection_t
|
|
pointer, and reads data into it over the network, up to the
|
|
connection's current bandwidth limits. It places that data into the
|
|
"inbuf" field of the connection, and then:
|
|
- Adjusts the connection's want-to-read/want-to-write status as
|
|
appropriate.
|
|
- Increments the read and written counts for the connection as
|
|
appropriate.
|
|
- Adjusts bandwidth buckets as appropriate.
|
|
|
|
connection_read_to_buf() is called only from connection_handle_read().
|
|
The connection_handle_read() function is called whenever libevent
|
|
decides (based on select, poll, epoll, kqueue, etc) that there is data
|
|
to read from a connection. If any data is read,
|
|
connection_handle_read() calls connection_process_inbuf() to see if
|
|
any of the data can be processed. If the connection was closed,
|
|
connection_handle_read() calls connection_reached_eof().
|
|
|
|
Connection_process_inbuf() and connection_reached_eof() both dispatch
|
|
based on the connection type to determine what to do with the data
|
|
that's just arrived on the connection's inbuf field. Each type of
|
|
connection has its own version of these functions. For example,
|
|
directory connections process incoming data in
|
|
connection_dir_process_inbuf(), while OR connections process incoming
|
|
data in connection_or_process_inbuf(). These
|
|
connection_*_process_inbuf() functions extract data from the
|
|
connection's inbuf field (a buffer_t), using functions from buffers.c.
|
|
Some of these accessor functions are straightforward data extractors
|
|
(like fetch_from_buf()); others do protocol-specific parsing.
|
|
|
|
|
|
2.1.1.2. DNS
|
|
|
|
Tor launches (and optionally accepts) DNS requests using the code in
|
|
eventdns.c, which is a copy of libevent's evdns.c. (We don't use
|
|
libevent's version because it is not yet in the versions of libevent
|
|
all our users have.) DNS replies are read in nameserver_read();
|
|
DNS queries are read in server_port_read().
|