Merge branch 'doxygen_libs'

Nick Mathewson 2019-11-05 08:05:49 -05:00
commit 8933789fef
76 changed files with 1031 additions and 809 deletions


@ -256,6 +256,8 @@ TAB_SIZE = 8
ALIASES =
ALIASES += refdir{1}="\ref @top_srcdir@/src/\1 \"\1\""
# This tag can be used to specify a number of word-keyword mappings (TCL only).
# A mapping has the form "name=value". For example adding "class=itcl::class"
# will allow you to use the command class in the itcl::class meaning.


@ -1,124 +1,6 @@
## Overview ##
This document describes the general structure of the Tor codebase, how
it fits together, what functionality is available for extending Tor,
and gives some notes on how Tor got that way.
Tor remains a work in progress: We've been working on it for nearly two
decades, and we've learned a lot about good coding since we first
started. This means, however, that some of the older pieces of Tor will
have some "code smell" in them that could stand a brisk
refactoring. So when I describe a piece of code, I'll sometimes give a
note on how it got that way, and whether I still think that's a good
idea.
The first drafts of this document were written in the Summer and Fall of
2015, when Tor 0.2.6 was the most recent stable version, and Tor 0.2.7
was under development. There is a revision in progress (as of late
2019), to bring it up to pace with Tor as of version 0.4.2. If you're
reading this far in the future, some things may have changed. Caveat
haxxor!
This document is not an overview of the Tor protocol. For that, see the
design paper and the specifications at https://spec.torproject.org/ .
For more information about Tor's coding standards and some helpful
development tools, see doc/HACKING in the Tor repository.
### The very high level ###
Ultimately, Tor runs as an event-driven network daemon: it responds to
network events, signals, and timers by sending and receiving things over
the network. Clients, relays, and directory authorities all use the
same codebase: the Tor process will run as a client, relay, or authority
depending on its configuration.
Tor has a few major dependencies, including Libevent (used to tell which
sockets are readable and writable), OpenSSL or NSS (used for many encryption
functions, and to implement the TLS protocol), and zlib (used to
compress and uncompress directory information).
Most of Tor's work today is done in a single event-driven main thread.
Tor also spawns one or more worker threads to handle CPU-intensive
tasks. (Right now, this only includes circuit encryption and the more
expensive compression algorithms.)
On startup, Tor initializes its libraries, reads and responds to its
configuration files, and launches a main event loop. At first, the only
events that Tor listens for are a few signals (like TERM and HUP), and
one or more listener sockets (for different kinds of incoming
connections). Tor also configures several timers to handle periodic
events. As Tor runs over time, other events will open, and new events
will be scheduled.
The codebase is divided into a few top-level subdirectories, each of
which contains several sub-modules.
* `src/ext` -- Code maintained elsewhere that we include in the Tor
source distribution.
* `src/lib` -- Lower-level utility code, not necessarily tor-specific.
* `src/trunnel` -- Automatically generated code (from the Trunnel
tool): used to parse and encode binary formats.
* `src/core` -- Networking code that implements the central parts of
the Tor protocol and main loop.
* `src/feature` -- Aspects of Tor (like directory management, running a
relay, running a directory authority, managing a list of nodes,
running and using onion services) that are built on top of the
mainloop code.
* `src/app` -- Highest-level functionality; responsible for setting up
and configuring the Tor daemon, making sure all the lower-level
modules start up when required, and so on.
* `src/tools` -- Binaries other than Tor that we produce. Currently this
is tor-resolve, tor-gencert, and the tor_runner.o helper module.
* `src/test` -- unit tests, regression tests, and a few integration
tests.
In theory, the above parts of the codebase are sorted from lowest-level to
highest-level, where high-level code is only allowed to invoke lower-level
code, and lower-level code never includes or depends on code of a higher
level. In practice, this refactoring is incomplete: The modules in `src/lib`
are well-factored, but there are many layer violations ("upward
dependencies") in `src/core` and `src/feature`. We aim to eliminate those
over time.
### Some key high-level abstractions ###
The most important abstractions at Tor's high level are Connections,
Channels, Circuits, and Nodes.
A 'Connection' represents a stream-based information flow. Most
connections are TCP connections to remote Tor servers and clients. (But
as a shortcut, a relay will sometimes make a connection to itself
without actually using a TCP connection. More details later on.)
Connections exist in different varieties, depending on what
functionality they provide. The principal types of connection are
"edge" (e.g., a SOCKS connection or a connection from an exit relay to a
destination), "OR" (a TLS stream connecting to a relay), "Directory" (an
HTTP connection to learn about the network), and "Control" (a connection
from a controller).
A 'Circuit' is a persistent tunnel through the Tor network, established
with public-key cryptography, and used to send cells one or more hops.
Clients keep track of multi-hop circuits, and the cryptography
associated with each hop. Relays, on the other hand, keep track only of
their hop of each circuit.
A 'Channel' is an abstract view of sending cells to and from a Tor
relay. Currently, all channels are implemented using OR connections.
If we switch to other strategies in the future, we'll have more
connection types.
A 'Node' is a view of a Tor instance's current knowledge and opinions
about a Tor relay or bridge.
### The rest of this document. ###


@ -1,171 +0,0 @@
## Library code in Tor.
Most of Tor's utility code is in modules in the `src/lib` subdirectory. In
general, this code is not necessarily Tor-specific, but is instead possibly
useful for other applications.
This code includes:
* Compatibility wrappers, to provide a uniform API across different
platforms.
* Library wrappers, to provide a tor-like API over different libraries
that Tor uses for things like compression and cryptography.
* Containers, to implement some general-purpose data container types.
The modules in `src/lib` are currently well-factored: each one depends
only on lower-level modules. You can see an up-to-date list of the
modules sorted from lowest to highest level by running
`./scripts/maint/practracker/includes.py --toposort`.
As of this writing, the library modules are (from lowest to highest
level):
* `lib/cc` -- Macros for managing the C compiler and
language. Includes macros for improving compatibility and clarity
across different C compilers.
* `lib/version` -- Holds the current version of Tor.
* `lib/testsupport` -- Helpers for making test-only code and test
mocking support.
* `lib/defs` -- Lowest-level constants used in many places across the
code.
* `lib/subsys` -- Types used for declaring a "subsystem". A subsystem
is a module with support for initialization, shutdown,
configuration, and so on.
* `lib/conf` -- Types and macros used for declaring configuration
options.
* `lib/arch` -- Compatibility functions and macros for handling
differences in CPU architecture.
* `lib/err` -- Lowest-level error handling code: responsible for
generating stack traces, handling raw assertion failures, and
otherwise reporting problems that might not be safe to report
via the regular logging module.
* `lib/malloc` -- Wrappers and utilities for memory management.
* `lib/intmath` -- Utilities for integer mathematics.
* `lib/fdio` -- Utilities and compatibility code for reading and
writing data on file descriptors (and on sockets, for platforms
where a socket is not a kind of fd).
* `lib/lock` -- Compatibility code for declaring and using locks.
Lower-level than the rest of the threading code.
* `lib/ctime` -- Constant-time implementations for data comparison
and table lookup, used to avoid timing side-channels from standard
implementations of memcmp() and so on.
* `lib/string` -- Low-level compatibility wrappers and utility
functions for string manipulation.
* `lib/wallclock` -- Compatibility and utility functions for
inspecting and manipulating the current (UTC) time.
* `lib/osinfo` -- Functions for inspecting the version and
capabilities of the operating system.
* `lib/smartlist_core` -- The bare-bones pieces of our dynamic array
("smartlist") implementation. There are higher-level pieces, but
these ones are used by (and therefore cannot use) the logging code.
* `lib/log` -- Implements the logging system used by all higher-level
Tor code. You can think of this as the logical "midpoint" of the
library code: much of the higher-level code is higher-level
_because_ it uses the logging module, and much of the lower-level
code is specifically written to avoid having to log, because the
logging module depends on it.
* `lib/container` -- General purpose containers, including dynamic arrays
("smartlists"), hashtables, bit arrays, weak-reference-like "handles",
bloom filters, and a bit more.
* `lib/trace` -- A general-purpose API for introducing
function-tracing functionality into Tor. Currently not much used.
* `lib/thread` -- Threading compatibility and utility functionality,
other than low-level locks (which are in `lib/lock`) and
workqueue/threadpool code (which belongs in `lib/evloop`).
* `lib/term` -- Code for terminal manipulation functions (like
reading a password from the user).
* `lib/memarea` -- A data structure for a fast "arena" style allocator,
where the data is freed all at once. Used for parsing.
* `lib/encoding` -- Implementations for encoding data in various
formats, datatypes, and transformations.
* `lib/dispatch` -- A general-purpose in-process message delivery
system. Used by `lib/pubsub` to implement our inter-module
publish/subscribe system.
* `lib/sandbox` -- Our Linux seccomp2 sandbox implementation.
* `lib/pubsub` -- Code and macros to implement our publish/subscribe
message passing system.
* `lib/fs` -- Utility and compatibility code for manipulating files,
filenames, directories, and so on.
* `lib/confmgt` -- Code to parse, encode, and manipulate our
configuration files, state files, and so forth.
* `lib/crypt_ops` -- Cryptographic operations. This module contains
wrappers around the cryptographic libraries that we support,
and implementations for some higher-level cryptographic
constructions that we use.
* `lib/meminfo` -- Functions for inspecting our memory usage, if the
malloc implementation exposes that to us.
* `lib/time` -- Higher level time functions, including fine-grained and
monotonic timers.
* `lib/math` -- Floating-point mathematical utilities, including
compatibility code, and probability distributions.
* `lib/buf` -- A general purpose queued buffer implementation,
similar to the BSD kernel's "mbuf" structure.
* `lib/net` -- Networking code, including address manipulation and
compatibility wrappers.
* `lib/compress` -- A compatibility wrapper around several
compression libraries, currently including zlib, zstd, and lzma.
* `lib/geoip` -- Utilities to manage geoip (IP to country) lookups
and formats.
* `lib/tls` -- Compatibility wrappers around the library (NSS or
OpenSSL, depending on configuration) that Tor uses to implement the
TLS link security protocol.
* `lib/evloop` -- Tools to manage the event loop and related
functionality, in order to implement asynchronous networking,
timers, periodic events, and other scheduling tasks.
* `lib/process` -- Utilities and compatibility code to launch and
manage subprocesses.
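The `lib/ctime` entry above mentions constant-time comparison as a defence against timing side-channels. A minimal sketch of the idea follows; `ct_memeq` is a hypothetical name chosen for illustration, not Tor's own function, and a production implementation would take extra care that the compiler does not reintroduce data-dependent branches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only: return 1 if the first `len`
 * bytes of `a` and `b` are equal, and 0 otherwise. Every byte is always
 * examined, so the running time does not depend on where (or whether)
 * the inputs differ. */
static int
ct_memeq(const void *a, const void *b, size_t len)
{
  const unsigned char *x = a;
  const unsigned char *y = b;
  unsigned int diff = 0;

  assert(len == 0 || (a != NULL && b != NULL));

  /* Accumulate the XOR of every byte pair; diff stays 0 only if all match. */
  for (size_t i = 0; i < len; ++i)
    diff |= (unsigned int)(x[i] ^ y[i]);

  /* diff is in 0..255. Subtracting 1 wraps to all-ones only when diff == 0,
   * so bit 8 of the result is set exactly when the inputs were equal. */
  return (int)(1 & ((diff - 1) >> 8));
}
```

By contrast, a typical `memcmp()` may return as soon as it finds the first differing byte, which is what leaks timing information.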
### What belongs in lib?
In general, if you can imagine some program wanting the functionality
you're writing, even if that program had nothing to do with Tor, your
functionality belongs in lib.
If it falls into one of the existing "lib" categories, your
functionality belongs in lib.
If you are using platform-specific `#ifdef`s to manage compatibility
issues among platforms, you should probably consider whether you can
put your code into lib.


@ -1,103 +0,0 @@
## Memory management
### Heap-allocation functions: lib/malloc/malloc.h
Tor imposes a few light wrappers over C's native malloc and free
functions, to improve convenience, and to allow wholesale replacement
of malloc and free as needed.
You should never use 'malloc', 'calloc', 'realloc', or 'free' on their
own; always use the variants prefixed with 'tor_'.
They are the same as the standard C functions, with the following
exceptions:
* `tor_free(NULL)` is a no-op.
* `tor_free()` is a macro that takes an lvalue as an argument and sets it to
NULL after freeing it. To avoid this behavior, you can use `tor_free_()`
instead.
* `tor_malloc()` and friends fail with an assertion if they are asked to
allocate a value so large that it is probably an underflow.
* It is always safe to `tor_malloc(0)`, regardless of whether your libc
allows it.
* `tor_malloc()`, `tor_realloc()`, and friends are never allowed to fail.
Instead, Tor will die with an assertion. This means that you never
need to check their return values. See the next subsection for
information on why we think this is a good idea.
We define additional general-purpose memory allocation functions as well:
* `tor_malloc_zero(x)` behaves as `calloc(1, x)`, except that it makes clear
the intent to allocate a single zeroed-out value.
* `tor_reallocarray(x,y)` behaves as the OpenBSD reallocarray function.
Use it for cases when you need to realloc() in a multiplication-safe
way.
And specific-purpose functions as well:
* `tor_strdup()` and `tor_strndup()` behave as the underlying libc
functions, but use `tor_malloc()` instead of the underlying function.
* `tor_memdup()` copies a chunk of memory of a given size.
* `tor_memdup_nulterm()` copies a chunk of memory of a given size, then
NUL-terminates it just to be safe.
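The behaviors listed above (never returning NULL, a safe zero-byte allocation, and multiplication-safe resizing) can be sketched roughly as follows. The `sketch_*` names are hypothetical stand-ins written for this example; they are not Tor's definitions, and this version calls `abort()` where Tor uses its own assertion machinery.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; these are NOT Tor's functions. */

/* Allocate `size` bytes, aborting instead of ever returning NULL. */
static void *
sketch_malloc(size_t size)
{
  if (size == 0)
    size = 1;              /* make a zero-byte request well-defined */
  void *result = malloc(size);
  if (result == NULL)
    abort();               /* callers never see an allocation failure */
  return result;
}

/* Allocate `size` bytes, all set to zero. */
static void *
sketch_malloc_zero(size_t size)
{
  void *result = sketch_malloc(size);
  memset(result, 0, size);
  return result;
}

/* Resize `ptr` to hold n * size bytes, refusing products that would
 * overflow size_t. */
static void *
sketch_reallocarray(void *ptr, size_t n, size_t size)
{
  if (size != 0 && n > SIZE_MAX / size)
    abort();               /* n * size does not fit in a size_t */
  size_t total = n * size;
  if (total == 0)
    total = 1;
  void *result = realloc(ptr, total);
  if (result == NULL)
    abort();
  return result;
}
```

Because none of these helpers can return NULL, call sites need no error-handling branch for allocation failure.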
#### Why assert on allocation failure?
Why don't we allow `tor_malloc()` and its allies to return NULL?
First, it's error-prone. Many programmers forget to check for NULL return
values, and testing for `malloc()` failures is a major pain.
Second, it's not necessarily a great way to handle OOM conditions. It's
probably better (we think) to have a memory target where we dynamically free
things ahead of time in order to stay under the target. Trying to respond to
an OOM at the point of `tor_malloc()` failure, on the other hand, would involve
a rare operation invoked from deep in the call stack. (Again, that's
error-prone and hard to debug.)
Third, thanks to the rise of Linux and other operating systems that allow
memory to be overcommitted, you can't actually ever rely on getting a NULL
from `malloc()` when you're out of memory; instead you have to use an approach
closer to tracking the total memory usage.
#### Conventions for your own allocation functions.
Whenever you create a new type, the convention is to give it a pair of
`x_new()` and `x_free_()` functions, named after the type.
Calling `x_free(NULL)` should always be a no-op.
There should additionally be an `x_free()` macro, defined in terms of
`x_free_()`. This macro should set its lvalue to NULL. You can define it
using the FREE_AND_NULL macro, as follows:
```
#define x_free(ptr) FREE_AND_NULL(x_t, x_free_, (ptr))
```
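As a rough illustration of this convention, here is a sketch for a made-up type `widget_t`. The `SKETCH_FREE_AND_NULL` macro is a local stand-in with the same argument order as the macro shown above; Tor's real `FREE_AND_NULL` may be defined differently.

```c
#include <assert.h>
#include <stdlib.h>

/* Local stand-in for a free-and-NULL helper (an assumption made for this
 * example): clear the caller's lvalue, then free what it pointed to. */
#define SKETCH_FREE_AND_NULL(type, free_fn, ptr) \
  do {                                           \
    type *tmp_ = (ptr);                          \
    (ptr) = NULL;                                \
    free_fn(tmp_);                               \
  } while (0)

/* Hypothetical example type. */
typedef struct widget_t {
  int id;
} widget_t;

/* Constructor: never returns NULL. */
static widget_t *
widget_new(int id)
{
  widget_t *w = calloc(1, sizeof(*w));
  if (w == NULL)
    abort();
  w->id = id;
  return w;
}

/* Underlying free function: a no-op when given NULL. */
static void
widget_free_(widget_t *w)
{
  if (w == NULL)
    return;
  free(w);
}

/* What callers actually use: frees the widget and NULLs the variable. */
#define widget_free(w) SKETCH_FREE_AND_NULL(widget_t, widget_free_, (w))
```

Since the macro leaves the variable set to NULL, an accidental second `widget_free()` on the same variable is harmless rather than a double free.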
### Grow-only memory allocation: lib/memarea
It's often handy to allocate a large number of tiny objects, all of which
need to disappear at the same time. You can do this in tor using the
memarea.c abstraction, which uses a set of grow-only buffers for allocation,
and only supports a single "free" operation at the end.
Using memareas also helps you avoid memory fragmentation. You see, some libc
malloc implementations perform badly on the case where a large number of
small temporary objects are allocated at the same time as a few long-lived
objects of similar size. But if you use tor_malloc() for the long-lived ones
and a memarea for the temporary objects, the malloc implementation is likelier
to do better.
To create a new memarea, use `memarea_new()`. To drop all the storage from a
memarea, and invalidate its pointers, use `memarea_drop_all()`.
The allocation functions `memarea_alloc()`, `memarea_alloc_zero()`,
`memarea_memdup()`, `memarea_strdup()`, and `memarea_strndup()` are analogous
to the similarly-named malloc() functions. There is intentionally no
`memarea_free()` or `memarea_realloc()`.
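To make the grow-only idea concrete, here is a minimal arena sketch. All `arena_*` names and the 4096-byte chunk size are assumptions for this example; it is not the memarea implementation, only an instance of the same technique: allocations bump a pointer inside a chunk, and everything is released in a single call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ARENA_CHUNK_SIZE 4096   /* arbitrary size chosen for the sketch */

/* One grow-only buffer; chunks form a singly linked list. */
typedef struct arena_chunk_t {
  struct arena_chunk_t *next;   /* older chunk, or NULL */
  size_t used;                  /* bytes already handed out */
  size_t capacity;              /* usable bytes in mem */
  max_align_t mem[];            /* payload, suitably aligned */
} arena_chunk_t;

typedef struct arena_t {
  arena_chunk_t *head;          /* newest chunk; allocations come from here */
} arena_t;

static arena_chunk_t *
chunk_new(size_t capacity, arena_chunk_t *next)
{
  arena_chunk_t *c = malloc(sizeof(*c) + capacity);
  if (c == NULL)
    abort();
  c->next = next;
  c->used = 0;
  c->capacity = capacity;
  return c;
}

static arena_t *
arena_new(void)
{
  arena_t *a = malloc(sizeof(*a));
  if (a == NULL)
    abort();
  a->head = chunk_new(ARENA_CHUNK_SIZE, NULL);
  return a;
}

/* Hand out `size` bytes. There is deliberately no per-object free. */
static void *
arena_alloc(arena_t *a, size_t size)
{
  const size_t align = sizeof(max_align_t);
  if (size > SIZE_MAX - align)
    abort();
  size_t rounded = (size + align - 1) / align * align;
  if (rounded > a->head->capacity - a->head->used) {
    /* Current chunk is full: start a new one big enough for the request. */
    size_t cap = rounded > ARENA_CHUNK_SIZE ? rounded : ARENA_CHUNK_SIZE;
    a->head = chunk_new(cap, a->head);
  }
  void *result = (char *)a->head->mem + a->head->used;
  a->head->used += rounded;
  return result;
}

static char *
arena_strdup(arena_t *a, const char *s)
{
  size_t len = strlen(s) + 1;
  char *copy = arena_alloc(a, len);
  memcpy(copy, s, len);
  return copy;
}

/* Release every chunk at once; all pointers from the arena become invalid. */
static void
arena_drop_all(arena_t *a)
{
  arena_chunk_t *c = a->head;
  while (c != NULL) {
    arena_chunk_t *next = c->next;
    free(c);
    c = next;
  }
  free(a);
}
```

A parser can allocate thousands of small tokens this way and discard them with one `arena_drop_all()` call instead of tracking each one.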
### Special allocation: lib/malloc/map_anon.h
TODO: WRITEME.


@ -1,45 +0,0 @@
## Collections in tor
### Smartlists: Neither lists, nor especially smart.
For historical reasons, we call our dynamically allocated array type
`smartlist_t`. It can grow or shrink as elements are added and removed.
All smartlists hold an array of `void *`. Whenever you expose a smartlist
in an API you *must* document which types its pointers actually hold.
<!-- It would be neat to fix that, wouldn't it? -NM -->
Smartlists are created empty with `smartlist_new()` and freed with
`smartlist_free()`. See the `containers.h` module documentation for more
information; there are many convenience functions for commonly needed
operations.
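For readers unfamiliar with the pattern, the following sketch shows a growable array of `void *` in the same spirit. The `ptrlist_*` names are hypothetical and this is not the `smartlist_t` implementation; it only illustrates why callers must know (and document) what the stored pointers refer to.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable pointer array; not Tor's smartlist_t. */
typedef struct ptrlist_t {
  void **items;   /* heap array of element pointers */
  size_t len;     /* slots in use */
  size_t cap;     /* slots allocated */
} ptrlist_t;

static ptrlist_t *
ptrlist_new(void)
{
  ptrlist_t *pl = calloc(1, sizeof(*pl));
  if (pl == NULL)
    abort();
  return pl;
}

/* Append `item`, doubling the backing array when it is full. */
static void
ptrlist_add(ptrlist_t *pl, void *item)
{
  if (pl->len == pl->cap) {
    size_t new_cap = pl->cap ? pl->cap * 2 : 8;
    if (new_cap > SIZE_MAX / sizeof(void *))
      abort();
    void **grown = realloc(pl->items, new_cap * sizeof(void *));
    if (grown == NULL)
      abort();
    pl->items = grown;
    pl->cap = new_cap;
  }
  pl->items[pl->len++] = item;
}

/* Return the element at `idx`; the caller must cast it to the right type. */
static void *
ptrlist_get(const ptrlist_t *pl, size_t idx)
{
  assert(idx < pl->len);
  return pl->items[idx];
}

/* Free the list itself; the elements stay owned by the caller. */
static void
ptrlist_free(ptrlist_t *pl)
{
  if (pl == NULL)
    return;
  free(pl->items);
  free(pl);
}
```

Nothing in the type system records what `items` points to, which is why an API that exposes such a list has to state the element type in its documentation.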
<!-- TODO: WRITE more about what you can do with smartlists. -->
### Digest maps, string maps, and more.
Tor makes frequent use of maps from 160-bit digests, 256-bit digests,
or nul-terminated strings to `void *`. These types are `digestmap_t`,
`digest256map_t`, and `strmap_t` respectively. See the containers.h
module documentation for more information.
### Intrusive lists and hashtables
For performance-sensitive cases, we sometimes want to use "intrusive"
collections: ones where the bookkeeping pointers are stuck inside the
structures that belong to the collection. If you've used the
BSD-style sys/queue.h macros, you'll be familiar with these.
Unfortunately, the `sys/queue.h` macros vary significantly between the
platforms that have them, so we provide our own variants in
`src/ext/tor_queue.h`.
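The sketch below shows what "intrusive" means in practice, using a hand-written singly linked list rather than the `tor_queue.h` macros. The `slink_t`, `job_t`, and `slist_*` names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* The bookkeeping pointer lives inside the element itself. */
typedef struct slink_t {
  struct slink_t *next;
} slink_t;

/* Hypothetical element type that embeds its own list link. */
typedef struct job_t {
  int id;
  slink_t link;
} job_t;

/* Recover the enclosing job_t from a pointer to its embedded link. */
#define JOB_FROM_LINK(l) ((job_t *)((char *)(l) - offsetof(job_t, link)))

/* Push onto the front of the list; no allocation is needed. */
static void
slist_push(slink_t **head, slink_t *link)
{
  assert(link != NULL);
  link->next = *head;
  *head = link;
}

/* Pop the front element's link, or return NULL if the list is empty. */
static slink_t *
slist_pop(slink_t **head)
{
  slink_t *first = *head;
  if (first != NULL) {
    *head = first->next;
    first->next = NULL;
  }
  return first;
}
```

Because the link is part of the element, inserting and removing never allocates memory, which is the main reason to prefer this style in performance-sensitive code.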
We also provide an intrusive hashtable implementation in `src/ext/ht.h`.
When you're using it, you'll need to define your own hash
functions. If attacker-induced collisions are a worry here, use the
cryptographic siphash24g function to extract hashes.
<!-- TODO: WRITE about bloom filters, namemaps, bit-arrays, order functions.
-->


@ -1,132 +1,4 @@
## Lower-level cryptography functionality in Tor ##
Generally speaking, Tor code shouldn't be calling OpenSSL (or any
other crypto library) directly. Instead, we should indirect through
one of the functions in src/common/crypto\*.c or src/common/tortls.c.
Cryptography functionality that's available is described below.
### RNG facilities ###
The most basic RNG capability in Tor is the crypto_rand() family of
functions. These currently use OpenSSL's RAND_() backend, but may use
something faster in the future.
In addition to crypto_rand(), which fills in a buffer with random
bytes, we also have functions to produce random integers in certain
ranges; to produce random hostnames; to produce random doubles, etc.
When you're creating a long-term cryptographic secret, you might want
to use crypto_strongest_rand() instead of crypto_rand(). It takes the
operating system's entropy source and combines it with output from
crypto_rand(). This is a pure paranoia measure, but it might help us
someday.
You can use smartlist_choose() to pick a random element from a smartlist
and smartlist_shuffle() to randomize the order of a smartlist. Both are
potentially a bit slow.
### Cryptographic digests and related functions ###
We treat digests as separate types based on the length of their
outputs. We support one 160-bit digest (SHA1), two 256-bit digests
(SHA256 and SHA3-256), and two 512-bit digests (SHA512 and SHA3-512).
You should not use SHA1 for anything new.
The crypto_digest\*() family of functions manipulates digests. You
can either compute a digest of a chunk of memory all at once using
crypto_digest(), crypto_digest256(), or crypto_digest512(). Or you
can create a crypto_digest_t object with
crypto_digest{,256,512}_new(), feed information to it in chunks using
crypto_digest_add_bytes(), and then extract the final digest using
crypto_digest_get_digest(). You can copy the state of one of these
objects using crypto_digest_dup() or crypto_digest_assign().
We support the HMAC hash-based message authentication code
instantiated using SHA256. See crypto_hmac_sha256. (You should not
add any HMAC users with SHA1, and HMAC is not necessary with SHA3.)
We also support the SHA3 cousins, SHAKE128 and SHAKE256. Unlike
digests, these are extendable output functions (or XOFs) where you can
get any amount of output. Use the crypto_xof_\*() functions to access
these.
We have several ways to derive keys from cryptographically strong secret
inputs (like Diffie-Hellman outputs). The old
crypto_expand_key_material_TAP() performs an ad-hoc KDF based on SHA1 -- you
shouldn't use it for implementing anything but old versions of the Tor
protocol. You can use HKDF-SHA256 (as defined in RFC5869) for more modern
protocols. Also consider SHAKE256.
If your input is potentially weak, like a password or passphrase, use a salt
along with the secret_to_key() functions as defined in crypto_s2k.c. Prefer
scrypt over other hashing methods when possible. If you're using a password
to encrypt something, see the "boxed file storage" section below.
Finally, in order to store objects in hash tables, Tor includes the
randomized SipHash 2-4 function. Call it via the siphash24g() function in
src/ext/siphash.h whenever you're creating a hashtable whose keys may be
manipulated by an attacker in order to DoS you with collisions.
### Stream ciphers ###
You can create instances of a stream cipher using crypto_cipher_new().
These are stateful objects of type crypto_cipher_t. Note that these
objects only support AES-128 right now; a future version should add
support for AES-256 and/or ChaCha20.
You can encrypt/decrypt with crypto_cipher_encrypt or
crypto_cipher_decrypt. The crypto_cipher_crypt_inplace function performs
an encryption without a copy.
Note that sensible people should not use raw stream ciphers; they should
probably be using some kind of AEAD. Sorry.
### Public key functionality ###
We support four public key algorithms: DH1024, RSA, Curve25519, and
Ed25519.
We support DH1024 over two prime groups. You access these via the
crypto_dh_\*() family of functions.
We support RSA in many bit sizes for signing and encryption. You access
it via the crypto_pk_*() family of functions. Note that a crypto_pk_t
may or may not include a private key. See the crypto_pk_* functions in
crypto.c for a full list of functions here.
For Curve25519 functionality, see the functions and types in
crypto_curve25519.c. Curve25519 is generally suitable for when you need
a secure fast elliptic-curve Diffie-Hellman implementation. When
designing new protocols, prefer it over DH in Z_p.
For Ed25519 functionality, see the functions and types in
crypto_ed25519.c. Ed25519 is generally suitable as a secure fast
elliptic curve signature method. For new protocols, prefer it over RSA
signatures.
### Metaformats for storage ###
When OpenSSL manages the storage of some object, we use whatever format
OpenSSL provides -- typically, some kind of PEM-wrapped base 64 encoding
that starts with "----- BEGIN CRYPTOGRAPHIC OBJECT ----".
When we manage the storage of some cryptographic object, we prefix the
object with a 32-byte NUL-padded prefix in order to avoid accidental
object confusion; see the crypto_read_tagged_contents_from_file() and
crypto_write_tagged_contents_to_file() functions for manipulating
these. The prefix is "== type: tag ==", where type describes the object
and its encoding, and tag indicates which one it is.
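As a sketch of the prefix layout described above, the helper below builds a 32-byte NUL-padded "== type: tag ==" header. The function name, the constant, and the requirement that at least one padding NUL remains are assumptions made for this example; see the functions named above for the real behavior.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TAG_PREFIX_LEN 32   /* prefix size stated in the text above */

/* Hypothetical helper: write "== <type>: <tag> ==" into `out`, padding the
 * remainder with NUL bytes. Returns 0 on success, or -1 if the text does
 * not fit (this sketch insists on at least one trailing NUL). */
static int
make_tag_prefix(char out[TAG_PREFIX_LEN], const char *type, const char *tag)
{
  assert(out != NULL && type != NULL && tag != NULL);
  memset(out, 0, TAG_PREFIX_LEN);
  int n = snprintf(out, TAG_PREFIX_LEN, "== %s: %s ==", type, tag);
  if (n < 0 || n >= TAG_PREFIX_LEN)
    return -1;
  return 0;
}
```

A reader of such a file can compare the first 32 bytes against the expected prefix before parsing the rest, which is how the format guards against mistaking one kind of object for another.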
### Boxed-file storage ###
When managing keys, you frequently want to have some way to write a
secret object to disk, encrypted with a passphrase. The crypto_pwbox
and crypto_unpwbox functions do so in a way that's likely to be
readable by future versions of Tor.
### Certificates ###
@ -153,17 +25,3 @@ napkin.
documents that include keys and which are signed by keys. You can
consider these documents to be an additional kind of certificate if you
want.)
### TLS ###
Tor's TLS implementation is more tightly coupled to OpenSSL than we'd
prefer. You can read most of it in tortls.c.
Unfortunately, TLS's state machine and our requirement for nonblocking
IO support means that using TLS in practice is a bit hairy, since
logical writes can block on physical reads, and vice versa.
If you are lucky, you will never have to look at the code here.


@ -1,95 +1,6 @@
## Tor's modules ##
### Generic modules ###
`buffers.c`
: Implements the `buf_t` buffered data type for connections, and several
low-level data handling functions to handle network protocols on it.
`channel.c`
: Generic channel implementation. Channels handle sending and receiving cells
among tor nodes.
`channeltls.c`
: Channel implementation for TLS-based OR connections. Uses `connection_or.c`.
`circuitbuild.c`
: Code for constructing circuits and choosing their paths. (*Note*:
this module could plausibly be split into handling the client side,
the server side, and the path generation aspects of circuit building.)
`circuitlist.c`
: Code for maintaining and navigating the global list of circuits.
`circuitmux.c`
: Generic circuitmux implementation. A circuitmux handles deciding, for a
particular channel, which circuit should write next.
`circuitmux_ewma.c`
: A circuitmux implementation based on the EWMA (exponentially
weighted moving average) algorithm.
`circuituse.c`
: Code to actually send and receive data on circuits.
`command.c`
: Handles incoming cells on channels.
`config.c`
: Parses options from torrc, and uses them to configure the rest of Tor.
`confparse.c`
: Generic torrc-style parser. Used to parse torrc and state files.
`connection.c`
: Generic and common connection tools, and implementation for the simpler
connection types.
`connection_edge.c`
: Implementation for entry and exit connections.
`connection_or.c`
: Implementation for OR connections (the ones that send cells over TLS).
`main.c`
: Principal entry point, main loops, scheduled events, and network
management for Tor.
`ntmain.c`
: Implements Tor as a Windows service. (Not very well.)
`onion.c`
: Generic code for generating and responding to CREATE and CREATED
cells, and performing the appropriate onion handshakes. Also contains
code to manage the server-side onion queue.
`onion_fast.c`
: Implements the old SHA1-based CREATE_FAST/CREATED_FAST circuit
creation handshake. (Now deprecated.)
`onion_ntor.c`
: Implements the Curve25519-based NTOR circuit creation handshake.
`onion_tap.c`
: Implements the old RSA1024/DH1024-based TAP circuit creation handshake. (Now
deprecated.)
`relay.c`
: Handles particular types of relay cells, and provides code to receive,
encrypt, route, and interpret relay cells.
`scheduler.c`
: Decides which channel/circuit pair is ready to receive the next cell.
`statefile.c`
: Handles loading and storing Tor's state file.
`tor_main.c`
: Contains the actual `main()` function. (This is placed in a separate
file so that the unit tests can have their own `main()`.)
### Node-status modules ###
`directory.c`


@ -1,5 +1,5 @@
/**
@dir app @dir /app
@brief app: top-level entry point for Tor
The "app" directory has Tor's main entry point and configuration logic,


@ -1,4 +1,8 @@
/**
@dir app/config @dir /app/config
@brief app/config @brief app/config: Top-level configuration code
Refactoring this module is a work in progress; see
[ticket 29211](https://trac.torproject.org/projects/tor/ticket/29211).
**/


@ -1,4 +1,4 @@
/**
@dir app/main @dir /app/main
@brief app/main @brief app/main: Entry point for tor.
**/


@ -1,8 +1,20 @@
/**
@dir core @dir /core
@brief core: main loop and onion routing functionality
The "core" directory has the central protocols for Tor, which every
client and relay must implement in order to perform onion routing.
It is divided into three lower-level pieces:
- \refdir{core/crypto} -- Tor-specific cryptography.
- \refdir{core/proto} -- Protocol encoding/decoding.
- \refdir{core/mainloop} -- A connection-oriented asynchronous mainloop.
and one high-level piece:
- \refdir{core/or} -- Implements onion routing itself.
**/


@ -1,4 +1,8 @@
/**
@dir core/crypto @dir /core/crypto
@brief core/crypto @brief core/crypto: Tor-specific cryptography
This module implements Tor's circuit-construction crypto and Tor's
relay crypto.
**/


@ -1,4 +1,12 @@
/**
@dir core/mainloop @dir /core/mainloop
@brief core/mainloop @brief core/mainloop: Non-onion-routing mainloop functionality
This module uses the event-loop code of \refdir{lib/evloop} to implement an
asynchronous connection-oriented protocol handler.
The layering here is imperfect: the code here was split from \refdir{core/or}
without refactoring how the two modules call one another. Probably many
functions should be moved and refactored.
**/


@ -1,4 +1,62 @@
/**
@dir core/or @dir /core/or
@brief core/or @brief core/or: *Onion routing happens here*.
**/
This is the central part of Tor that handles the core tasks of onion routing:
building circuits, handling circuits, attaching circuits to streams, moving
data around, and so forth.
Some aspects of this module should probably be refactored into others.
Notable files here include:
`channel.c`
: Generic channel implementation. Channels handle sending and receiving cells
among tor nodes.
`channeltls.c`
: Channel implementation for TLS-based OR connections. Uses `connection_or.c`.
`circuitbuild.c`
: Code for constructing circuits and choosing their paths. (*Note*:
this module could plausibly be split into handling the client side,
the server side, and the path generation aspects of circuit building.)
`circuitlist.c`
: Code for maintaining and navigating the global list of circuits.
`circuitmux.c`
: Generic circuitmux implementation. A circuitmux handles deciding, for a
particular channel, which circuit should write next.
`circuitmux_ewma.c`
: A circuitmux implementation based on the EWMA (exponentially
weighted moving average) algorithm.
`circuituse.c`
: Code to actually send and receive data on circuits.
`command.c`
: Handles incoming cells on channels.
`connection.c`
: Generic and common connection tools, and implementation for the simpler
connection types.
`connection_edge.c`
: Implementation for entry and exit connections.
`connection_or.c`
: Implementation for OR connections (the ones that send cells over TLS).
`onion.c`
: Generic code for generating and responding to CREATE and CREATED
cells, and performing the appropriate onion handshakes. Also contains
code to manage the server-side onion queue.
`relay.c`
: Handles particular types of relay cells, and provides code to receive,
encrypt, route, and interpret relay cells.
`scheduler.c`
: Decides which channel/circuit pair is ready to receive the next cell.

View File

@ -1,4 +1,8 @@
/** /**
@dir core/proto @dir /core/proto
@brief core/proto @brief core/proto: Protocol encoding/decoding
These functions should (but do not always) exist at a lower level than most
of the rest of core.
**/ **/

View File

@ -1,4 +1,4 @@
/** /**
@dir feature/api @dir /feature/api
@brief feature/api @brief feature/api: In-process interface to starting/stopping Tor.
**/ **/

View File

@ -1,4 +1,7 @@
/** /**
@dir feature/client @dir /feature/client
@brief feature/client @brief feature/client: Client-specific code
(There is also a bunch of client-specific code in other modules.)
**/ **/

View File

@ -1,4 +1,10 @@
/** /**
@dir feature/control @dir /feature/control
@brief feature/control @brief feature/control: Controller API.
The Controller API is a text-based protocol that another program (or another
thread, if you're running Tor in-process) can use to configure and control
Tor while it is running. The current protocol is documented in
[control-spec.txt](https://gitweb.torproject.org/torspec.git/tree/control-spec.txt).
**/ **/

View File

@ -1,4 +1,11 @@
/** /**
@dir feature/dirauth @dir /feature/dirauth
@brief feature/dirauth @brief feature/dirauth: Directory authority implementation.
This module handles running Tor as a directory authority.
The directory protocol is specified in
[dir-spec.txt](https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt).
**/ **/

View File

@ -1,4 +1,8 @@
/** /**
@dir feature/dircache @dir /feature/dircache
@brief feature/dircache @brief feature/dircache: Run as a directory cache server
This module handles the directory caching functionality that all relays may
provide, for serving cached directory objects to clients.
**/ **/

View File

@ -1,4 +1,9 @@
/** /**
@dir feature/dirclient @dir /feature/dirclient
@brief feature/dirclient @brief feature/dirclient: Directory client implementation.
The code here is used by all Tor instances that need to download directory
information. Currently, that is all of them, since even authorities need to
launch downloads to learn about relays that other authorities have listed.
**/ **/

View File

@ -1,4 +1,9 @@
/** /**
@dir feature/dircommon @dir /feature/dircommon
@brief feature/dircommon @brief feature/dircommon: Directory client and server shared code
This module has the code that directory clients (anybody who downloads
information about relays) and directory servers (anybody who serves such
information) share in common.
**/ **/

View File

@ -1,4 +1,10 @@
/** /**
@dir feature/dirparse @dir /feature/dirparse
@brief feature/dirparse @brief feature/dirparse: Parsing Tor directory objects
We define a number of "directory objects" in
[dir-spec.txt](https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt),
all of them using a common line-oriented meta-format. This module is used by
other parts of Tor to parse them.
**/ **/

View File

@ -1,5 +1,5 @@
/** /**
@dir feature @dir /feature
@brief feature: domain-specific modules @brief feature: domain-specific modules
The "feature" directory has modules that Tor uses only for a particular The "feature" directory has modules that Tor uses only for a particular

View File

@ -1,4 +1,16 @@
/** /**
@dir feature/hibernate @dir /feature/hibernate
@brief feature/hibernate @brief feature/hibernate: Bandwidth accounting and hibernation (!)
This module implements two features that are only somewhat related, and
should probably be separated in the future. One feature is bandwidth
accounting (making sure we use no more than so many gigabytes in a day) and
hibernation (avoiding network activity while we have used up all/most of our
configured gigabytes). The other feature is clean shutdown, where we stop
accepting new connections for a while and give the old ones time to close.
The two features are related only in the sense that "soft hibernation" (being
almost out of our configured bandwidth) is very close to the "shutting down" state. But it would be
better in the long run to make the two completely separate.
**/ **/

View File

@ -1,4 +1,10 @@
/** /**
@dir feature/hs @dir /feature/hs
@brief feature/hs @brief feature/hs: v3 (current) onion service protocol
This directory implements the v3 onion service protocol,
as specified in
[rend-spec-v3.txt](https://gitweb.torproject.org/torspec.git/tree/rend-spec-v3.txt).
**/ **/

View File

@ -1,4 +1,5 @@
/** /**
@dir feature/hs_common @dir /feature/hs_common
@brief feature/hs_common @brief feature/hs_common: Common to v2 (old) and v3 (current) onion services
**/ **/

View File

@ -1,4 +1,5 @@
/** /**
@dir feature/keymgt @dir /feature/keymgt
@brief feature/keymgt @brief feature/keymgt: Store keys for relays, authorities, etc.
**/ **/

View File

@ -1,4 +1,4 @@
/** /**
@dir feature/nodelist @dir /feature/nodelist
@brief feature/nodelist @brief feature/nodelist: Download and manage a list of relays
**/ **/

View File

@ -1,4 +1,6 @@
/** /**
@dir feature/relay @dir /feature/relay
@brief feature/relay @brief feature/relay: Relay-specific code
(There is also a bunch of relay-specific code in other modules.)
**/ **/

View File

@ -1,4 +1,9 @@
/** /**
@dir feature/rend @dir /feature/rend
@brief feature/rend @brief feature/rend: version 2 (old) hidden services
This directory implements the v2 onion service protocol,
as specified in
[rend-spec-v2.txt](https://gitweb.torproject.org/torspec.git/tree/rend-spec-v2.txt).
**/ **/

View File

@ -1,4 +1,12 @@
/** /**
@dir feature/stats @dir /feature/stats
@brief feature/stats @brief feature/stats: Relay statistics. Also, port prediction.
This module collects anonymized relay statistics in order to publish them in
relays' routerinfo and extrainfo documents.
Additionally, it contains predict_ports.c, which remembers which ports we've
visited recently as a client, so we can make sure we have open circuits that
support them.
**/ **/

View File

@ -1,4 +1,4 @@
/** /**
@dir lib/arch @dir /lib/arch
@brief lib/arch @brief lib/arch: Compatibility code for handling different CPU architectures.
**/ **/

View File

@ -1,4 +1,15 @@
/** /**
@dir lib/buf @dir /lib/buf
@brief lib/buf @brief lib/buf: An efficient byte queue.
This module defines the buf_t type, which is used throughout our networking
code. The implementation is a singly-linked queue of buffer chunks, similar
to the BSD kernel's
["mbuf"](https://www.freebsd.org/cgi/man.cgi?query=mbuf&sektion=9) structure.
The buf_t type is also reasonable for use in constructing long strings.
See \refdir{lib/net} for networking code that uses buf_t, and
\refdir{lib/tls} for cryptographic code that uses buf_t.
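
The "singly-linked queue of buffer chunks" idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `chunk`, `byte_queue`, `bq_add()`, and `bq_pull()` are hypothetical and are not the buf_t API, and the chunk size is tiny on purpose so that the chunking is easy to observe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_CAP 8 /* Deliberately tiny; a real byte queue would use far more. */

/* Hypothetical chunk type: one node in a singly-linked list of buffers. */
typedef struct chunk {
  struct chunk *next;
  size_t start; /* Index of the first unread byte. */
  size_t end;   /* Index one past the last written byte. */
  char data[CHUNK_CAP];
} chunk;

/* Hypothetical queue type: bytes are appended at the tail and read from
 * the head. */
typedef struct byte_queue {
  chunk *head;
  chunk *tail;
  size_t datalen; /* Total number of unread bytes across all chunks. */
} byte_queue;

/* Append n bytes from src, allocating new tail chunks as needed. */
void bq_add(byte_queue *q, const char *src, size_t n) {
  assert(q != NULL);
  while (n > 0) {
    if (!q->tail || q->tail->end == CHUNK_CAP) {
      chunk *c = calloc(1, sizeof(*c));
      if (!c)
        abort();
      if (q->tail)
        q->tail->next = c;
      else
        q->head = c;
      q->tail = c;
    }
    size_t room = CHUNK_CAP - q->tail->end;
    size_t take = n < room ? n : room;
    memcpy(q->tail->data + q->tail->end, src, take);
    q->tail->end += take;
    q->datalen += take;
    src += take;
    n -= take;
  }
}

/* Remove up to n bytes from the front of the queue into dst.
 * Returns the number of bytes actually copied. */
size_t bq_pull(byte_queue *q, char *dst, size_t n) {
  size_t got = 0;
  while (got < n && q->head) {
    chunk *c = q->head;
    size_t avail = c->end - c->start;
    size_t take = (n - got) < avail ? (n - got) : avail;
    memcpy(dst + got, c->data + c->start, take);
    c->start += take;
    got += take;
    q->datalen -= take;
    if (c->start == c->end) { /* Chunk fully drained: unlink and free it. */
      q->head = c->next;
      if (!q->head)
        q->tail = NULL;
      free(c);
    }
  }
  return got;
}
```

Appending never moves bytes that are already queued, and draining frees whole chunks at a time, which is the main appeal of this layout for networking code.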
**/ **/

View File

@ -1,4 +1,4 @@
/** /**
@dir lib/cc @dir /lib/cc
@brief lib/cc @brief lib/cc: Macros for managing the C compiler and language.
**/ **/

View File

@ -1,4 +1,8 @@
/** /**
@dir lib/compress @dir /lib/compress
@brief lib/compress @brief lib/compress: Wraps several compression libraries
Currently supported are zlib (mandatory), zstd (optional), and lzma
(optional).
**/ **/

View File

@ -1,4 +1,5 @@
/** /**
@dir lib/conf @dir /lib/conf
@brief lib/conf @brief lib/conf: Types and macros for declaring configuration options.
**/ **/

View File

@ -1,4 +1,9 @@
/** /**
@dir lib/confmgt @dir /lib/confmgt
@brief lib/confmgt @brief lib/confmgt: Parse, encode, manipulate configuration files.
This logic is used in common by our state files (statefile.c) and
configuration files (config.c) to manage a set of named, typed fields,
reading and writing them to disk and to the controller.
**/ **/

View File

@ -1,4 +1,51 @@
/** /**
@dir lib/container @dir /lib/container
@brief lib/container @brief lib/container: Hash tables, dynamic arrays, bit arrays, etc.
### Smartlists: Neither lists, nor especially smart.
For historical reasons, we call our dynamic-allocated array type
`smartlist_t`. It can grow or shrink as elements are added and removed.
All smartlists hold an array of `void *`. Whenever you expose a smartlist
in an API you *must* document which types its pointers actually hold.
<!-- It would be neat to fix that, wouldn't it? -NM -->
Smartlists are created empty with `smartlist_new()` and freed with
`smartlist_free()`. See the `containers.h` header documentation for more
information; there are many convenience functions for commonly needed
operations.
For low-level operations on smartlists, see also
\refdir{lib/smartlist_core}.
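
To make the "dynamic array of `void *`" idea concrete, here is a minimal sketch of such a container. It is not the smartlist implementation: the `ptr_array` type and every `ptr_array_*()` function below are hypothetical names chosen for illustration, and the growth policy (doubling from 8) is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a smartlist-like type: a growable array of
 * untyped pointers. The caller must document what the pointers refer to. */
typedef struct ptr_array {
  void **items;
  size_t len; /* Number of elements in use. */
  size_t cap; /* Number of elements allocated. */
} ptr_array;

/* Create a new, empty array. */
ptr_array *ptr_array_new(void) {
  ptr_array *a = calloc(1, sizeof(*a));
  if (!a)
    abort();
  return a;
}

/* Append one pointer, doubling the capacity when the array is full. */
void ptr_array_add(ptr_array *a, void *item) {
  if (a->len == a->cap) {
    size_t new_cap = a->cap ? a->cap * 2 : 8;
    void **p = realloc(a->items, new_cap * sizeof(void *));
    if (!p)
      abort();
    a->items = p;
    a->cap = new_cap;
  }
  a->items[a->len++] = item;
}

/* Return the element at idx; out-of-range access is a programming error. */
void *ptr_array_get(const ptr_array *a, size_t idx) {
  assert(idx < a->len);
  return a->items[idx];
}

/* Free the array itself. The elements are not freed: the array does not
 * know their type or who owns them. Freeing NULL is a no-op. */
void ptr_array_free(ptr_array *a) {
  if (!a)
    return;
  free(a->items);
  free(a);
}
```

Because the container only sees `void *`, nothing stops a caller from mixing element types, which is why documenting the element type at every API boundary matters.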
<!-- TODO: WRITE more about what you can do with smartlists. -->
### Digest maps, string maps, and more.
Tor makes frequent use of maps from 160-bit digests, 256-bit digests,
or nul-terminated strings to `void *`. These types are `digestmap_t`,
`digest256map_t`, and `strmap_t` respectively. See the containers.h
module documentation for more information.
### Intrusive lists and hashtables
For performance-sensitive cases, we sometimes want to use "intrusive"
collections: ones where the bookkeeping pointers are stuck inside the
structures that belong to the collection. If you've used the
BSD-style sys/queue.h macros, you'll be familiar with these.
Unfortunately, the `sys/queue.h` macros vary significantly between the
platforms that have them, so we provide our own variants in
`ext/tor_queue.h`.
We also provide an intrusive hashtable implementation in `ext/ht.h`.
When you're using it, you'll need to define your own hash
functions. If attacker-induced collisions are a worry here, use the
cryptographic siphash24g function to extract hashes.
<!-- TODO: WRITE about bloom filters, namemaps, bit-arrays, order functions.
-->
**/ **/

View File

@ -1,4 +1,139 @@
/** /**
@dir lib/crypt_ops @dir /lib/crypt_ops
@brief lib/crypt_ops @brief lib/crypt_ops: Cryptographic operations.
This module contains wrappers around the cryptographic libraries that we
support, and implementations for some higher-level cryptographic
constructions that we use.
It wraps our two major cryptographic backends (OpenSSL or NSS, as configured
by the user), and also wraps other cryptographic code in src/ext.
Generally speaking, Tor code shouldn't be calling OpenSSL or NSS
(or any other crypto library) directly. Instead, we should indirect through
one of the functions in this directory, or through \refdir{lib/tls}.
Cryptography functionality that's available is described below.
### RNG facilities ###
The most basic RNG capability in Tor is the crypto_rand() family of
functions. These currently use OpenSSL's RAND_() backend, but may use
something faster in the future.
In addition to crypto_rand(), which fills in a buffer with random
bytes, we also have functions to produce random integers in certain
ranges; to produce random hostnames; to produce random doubles, etc.
When you're creating a long-term cryptographic secret, you might want
to use crypto_strongest_rand() instead of crypto_rand(). It takes the
operating system's entropy source and combines it with output from
crypto_rand(). This is a pure paranoia measure, but it might help us
someday.
You can use smartlist_choose() to pick a random element from a smartlist
and smartlist_shuffle() to randomize the order of a smartlist. Both are
potentially a bit slow.
### Cryptographic digests and related functions ###
We treat digests as separate types based on the length of their
outputs. We support one 160-bit digest (SHA1), two 256-bit digests
(SHA256 and SHA3-256), and two 512-bit digests (SHA512 and SHA3-512).
You should not use SHA1 for anything new.
The crypto_digest\*() family of functions manipulates digests. You
can compute a digest of a chunk of memory all at once using
crypto_digest(), crypto_digest256(), or crypto_digest512(). Alternatively, you
can create a crypto_digest_t object with
crypto_digest{,256,512}_new(), feed information to it in chunks using
crypto_digest_add_bytes(), and then extract the final digest using
crypto_digest_get_digest(). You can copy the state of one of these
objects using crypto_digest_dup() or crypto_digest_assign().
We support the HMAC hash-based message authentication code
instantiated using SHA256. See crypto_hmac_sha256. (You should not
add any HMAC users with SHA1, and HMAC is not necessary with SHA3.)
We also support the SHA3 cousins, SHAKE128 and SHAKE256. Unlike
digests, these are extendable output functions (or XOFs) where you can
get any amount of output. Use the crypto_xof_\*() functions to access
these.
We have several ways to derive keys from cryptographically strong secret
inputs (like Diffie-Hellman outputs). The old
crypto_expand_key_material_TAP() performs an ad-hoc KDF based on SHA1 -- you
shouldn't use it for implementing anything but old versions of the Tor
protocol. You can use HKDF-SHA256 (as defined in RFC5869) for more modern
protocols. Also consider SHAKE256.
If your input is potentially weak, like a password or passphrase, use a salt
along with the secret_to_key() functions as defined in crypto_s2k.c. Prefer
scrypt over other hashing methods when possible. If you're using a password
to encrypt something, see the "boxed file storage" section below.
Finally, in order to store objects in hash tables, Tor includes the
randomized SipHash 2-4 function. Call it via the siphash24g() function in
src/ext/siphash.h whenever you're creating a hashtable whose keys may be
manipulated by an attacker in order to DoS you with collisions.
### Stream ciphers ###
You can create instances of a stream cipher using crypto_cipher_new().
These are stateful objects of type crypto_cipher_t. Note that these
objects only support AES-128 right now; a future version should add
support for AES-256 and/or ChaCha20.
You can encrypt/decrypt with crypto_cipher_encrypt or
crypto_cipher_decrypt. The crypto_cipher_crypt_inplace function performs
an encryption without a copy.
Note that sensible people should not use raw stream ciphers; they should
probably be using some kind of AEAD. Sorry.
### Public key functionality ###
We support four public key algorithms: DH1024, RSA, Curve25519, and
Ed25519.
We support DH1024 over two prime groups. You access these via the
crypto_dh_\*() family of functions.
We support RSA in many bit sizes for signing and encryption. You access
it via the crypto_pk_*() family of functions. Note that a crypto_pk_t
may or may not include a private key. See the crypto_pk_* functions in
crypto.c for a full list of functions here.
For Curve25519 functionality, see the functions and types in
crypto_curve25519.c. Curve25519 is generally suitable for when you need
a secure, fast elliptic-curve Diffie-Hellman implementation. When
designing new protocols, prefer it over DH in Z_p.
For Ed25519 functionality, see the functions and types in
crypto_ed25519.c. Ed25519 is generally suitable as a secure, fast
elliptic curve signature method. For new protocols, prefer it over RSA
signatures.
### Metaformats for storage ###
When OpenSSL manages the storage of some object, we use whatever format
OpenSSL provides -- typically, some kind of PEM-wrapped base 64 encoding
that starts with "----- BEGIN CRYPTOGRAPHIC OBJECT ----".
When we manage the storage of some cryptographic object, we prefix the
object with a 32-byte NUL-padded prefix in order to avoid accidental
object confusion; see the crypto_read_tagged_contents_from_file() and
crypto_write_tagged_contents_to_file() functions for manipulating
these. The prefix is "== type: tag ==", where type describes the object
and its encoding, and tag indicates which one it is.
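
As a rough illustration of the prefix layout described above, the following sketch builds a 32-byte NUL-padded "== type: tag ==" header. The function name `build_tagged_prefix_sketch()` and the example type and tag strings are hypothetical; how the real tagged-file functions handle over-long inputs is not shown in this text, so rejecting them here is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TAGGED_PREFIX_LEN 32

/* Hypothetical helper: write "== type: tag ==" into out, padded with NUL
 * bytes to exactly TAGGED_PREFIX_LEN bytes.
 * Returns 0 on success, or -1 if the text does not fit. */
int build_tagged_prefix_sketch(char out[TAGGED_PREFIX_LEN],
                               const char *type, const char *tag) {
  char buf[TAGGED_PREFIX_LEN + 1];
  assert(out != NULL && type != NULL && tag != NULL);
  /* snprintf() returns the length the full string would have had. */
  int n = snprintf(buf, sizeof(buf), "== %s: %s ==", type, tag);
  if (n < 0 || n > TAGGED_PREFIX_LEN)
    return -1;
  memset(out, 0, TAGGED_PREFIX_LEN); /* NUL padding. */
  memcpy(out, buf, (size_t)n);
  return 0;
}
```

A reader of such a file can compare the first 32 bytes against the expected prefix before touching the payload, which is what prevents one kind of object from being mistaken for another.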
### Boxed-file storage ###
When managing keys, you frequently want to have some way to write a
secret object to disk, encrypted with a passphrase. The crypto_pwbox
and crypto_unpwbox functions do so in a way that's likely to be
readable by future versions of Tor.
**/ **/

View File

@ -1,4 +1,16 @@
/** /**
@dir lib/ctime @dir /lib/ctime
@brief lib/ctime @brief lib/ctime: Constant-time code to avoid side-channels.
This module contains constant-time implementations of various
data comparison and table lookup functions. We use these in preference to
memcmp() and so forth, since memcmp() can leak information about its inputs
based on how fast it returns. In general, your code should call tor_memeq()
and tor_memneq(), not memcmp().
We also define some _non_-constant-time wrappers for memcmp() here: Since we
consider calls to memcmp() to be in error, we require code that doesn't
actually need to be constant-time to use the fast_memeq() / fast_memneq() /
fast_memcmp() aliases instead.
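
For readers unfamiliar with the technique, a constant-time equality check can be sketched as below. This is a generic illustration, not the tor_memeq() source; the name `ct_memeq_sketch()` is hypothetical. The key property is that the loop always visits every byte and never exits early, so the running time does not depend on where the inputs first differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant-time comparison: returns 1 if the first n bytes of
 * a and b are equal, and 0 otherwise. */
int ct_memeq_sketch(const void *a, const void *b, size_t n) {
  assert(a != NULL && b != NULL);
  const uint8_t *x = a;
  const uint8_t *y = b;
  uint32_t diff = 0;
  /* Accumulate the XOR of every byte pair; no data-dependent branches. */
  for (size_t i = 0; i < n; i++)
    diff |= (uint32_t)(x[i] ^ y[i]);
  /* diff is in 0..255. If it is 0, diff - 1 wraps to all ones, so bit 8 is
   * set; for 1..255, diff - 1 is below 256 and bit 8 is clear. */
  return (int)(1 & ((diff - 1) >> 8));
}
```

By contrast, memcmp() is free to return as soon as it finds a mismatch, which is exactly the timing signal an attacker can measure.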
**/ **/

View File

@ -1,4 +1,4 @@
/** /**
@dir lib/defs @dir /lib/defs
@brief lib/defs @brief lib/defs: Lowest-level constants, used in many places.
**/ **/

View File

@ -1,4 +1,16 @@
/** /**
@dir lib/dispatch @dir /lib/dispatch
@brief lib/dispatch @brief lib/dispatch: In-process message delivery.
This module provides a general in-process "message dispatch" system in which
typed messages are sent on channels. The dispatch.h header has far more
information.
It is used by \refdir{lib/pubsub} to implement our general
inter-module publish/subscribe system.
This is not a fancy multi-threaded many-to-many dispatcher as you may be used
to from more sophisticated architectures: this dispatcher is intended only
for use in improving Tor's architecture.
**/ **/

View File

@ -1,4 +1,8 @@
/** /**
@dir lib/encoding @dir /lib/encoding
@brief lib/encoding @brief lib/encoding: Encoding data in various forms, types, and transformations
Here we have time formats (timefmt.c), quoted strings (qstring.c), C strings
(string.c), base-16/32/64 (binascii.c), and more.
**/ **/

View File

@ -1,4 +1,15 @@
/** /**
@dir lib/err @dir /lib/err
@brief lib/err @brief lib/err: Lowest-level error handling code.
This module is responsible for generating stack traces, handling raw
assertion failures, and otherwise reporting problems that might not be
safe to report via the regular logging module.
There are three kinds of users for the functions in this module:
* Code that needs a way to assert(), but which cannot use the regular
`tor_assert()` macros in the logging module.
* Code that needs signal-safe error reporting.
* Higher-level error handling code.
**/ **/

View File

@ -1,4 +1,9 @@
/** /**
@dir lib/evloop @dir /lib/evloop
@brief lib/evloop @brief lib/evloop: Low-level event loop.
This module has tools to manage the [libevent](https://libevent.org/) event
loop and related functionality, in order to implement asynchronous
networking, timers, periodic events, and other scheduling tasks.
**/ **/

View File

@ -1,4 +1,7 @@
/** /**
@dir lib/fdio @dir /lib/fdio
@brief lib/fdio @brief lib/fdio: Code to read/write on file descriptors.
(This module also handles sockets, on platforms where a socket is not a kind
of fd.)
**/ **/

View File

@ -1,4 +1,11 @@
/** /**
@dir lib/fs @dir /lib/fs
@brief lib/fs @brief lib/fs: Files, filenames, directories, etc.
This module is mostly a set of compatibility wrappers around
operating-system-specific filesystem access.
It also contains a set of convenience functions for safely writing to files,
creating directories, and so on.
**/ **/

View File

@ -1,4 +1,5 @@
/** /**
@dir lib/geoip @dir /lib/geoip
@brief lib/geoip @brief lib/geoip: IP-to-country mapping
**/ **/

View File

@ -1,4 +1,4 @@
/** /**
@dir lib/intmath @dir /lib/intmath
@brief lib/intmath @brief lib/intmath: Integer mathematics.
**/ **/

View File

@ -1,8 +1,133 @@
/** /**
@dir lib @dir /lib
@brief lib: low-level functionality. @brief lib: low-level functionality.
The "lib" directory contains low-level functionality. In general, this
code is not necessarily Tor-specific, but is instead possibly useful for
other applications.
The modules in `lib` are currently well-factored: each one depends
only on lower-level modules. You can see an up-to-date list of the
modules, sorted from lowest to highest level, by running
`./scripts/maint/practracker/includes.py --toposort`.
As of this writing, the library modules are (from lowest to highest
level):
- \refdir{lib/cc} -- Macros for managing the C compiler and
language.
- \refdir{lib/version} -- Holds the current version of Tor.
- \refdir{lib/testsupport} -- Helpers for making
test-only code, and test mocking support.
- \refdir{lib/defs} -- Lowest-level constants.
- \refdir{lib/subsys} -- Types used for declaring a
"subsystem". (_A subsystem is a module with support for initialization,
shutdown, configuration, and so on._)
- \refdir{lib/conf} -- For declaring configuration options.
- \refdir{lib/arch} -- For handling differences in CPU
architecture.
- \refdir{lib/err} -- Lowest-level error handling code.
- \refdir{lib/malloc} -- Memory management.
- \refdir{lib/intmath} -- Integer mathematics.
- \refdir{lib/fdio} -- For
reading and writing on file descriptors.
- \refdir{lib/lock} -- Simple locking support.
(_Lower-level than the rest of the threading code._)
- \refdir{lib/ctime} -- Constant-time code to avoid
side-channels.
- \refdir{lib/string} -- Low-level string manipulation.
- \refdir{lib/wallclock} --
For inspecting and manipulating the current (UTC) time.
- \refdir{lib/osinfo} -- For inspecting the OS version
and capabilities.
- \refdir{lib/smartlist_core} -- The bare-bones
pieces of our dynamic array ("smartlist") implementation.
- \refdir{lib/log} -- Log messages to files, syslogs, etc.
- \refdir{lib/container} -- General purpose containers,
including dynamic arrays ("smartlists"), hashtables, bit arrays,
etc.
- \refdir{lib/trace} -- A general-purpose API for
function-tracing functionality in Tor. (_Currently not much used._)
- \refdir{lib/thread} -- Mid-level Threading.
- \refdir{lib/term} -- Terminal manipulation
(like reading a password from the user).
- \refdir{lib/memarea} -- A fast
"arena" style allocator, where the data is freed all at once.
- \refdir{lib/encoding} -- Encoding
data in various formats, datatypes, and transformations.
- \refdir{lib/dispatch} -- A general-purpose in-process
message delivery system.
- \refdir{lib/sandbox} -- Our Linux seccomp2 sandbox
implementation.
- \refdir{lib/pubsub} -- A publish/subscribe message passing system.
- \refdir{lib/fs} -- Files, filenames, directories, etc.
- \refdir{lib/confmgt} -- Parse, encode, and manipulate configuration files.
- \refdir{lib/crypt_ops} -- Cryptographic operations.
- \refdir{lib/meminfo} -- Functions for inspecting our
memory usage, if the malloc implementation exposes that to us.
- \refdir{lib/time} -- Higher level time functions, including
fine-grained and monotonic timers.
- \refdir{lib/math} -- Floating-point mathematical utilities.
- \refdir{lib/buf} -- An efficient byte queue.
- \refdir{lib/net} -- Networking code, including address
manipulation, compatibility wrappers, etc.
- \refdir{lib/compress} -- Wraps several compression libraries.
- \refdir{lib/geoip} -- IP-to-country mapping.
- \refdir{lib/tls} -- TLS library wrappers.
- \refdir{lib/evloop} -- Low-level event-loop.
- \refdir{lib/process} -- Launch and manage subprocesses.
### What belongs in lib?
In general, if you can imagine some program wanting the functionality
you're writing, even if that program had nothing to do with Tor, your
functionality belongs in lib.
If it falls into one of the existing "lib" categories, your
functionality belongs in lib.
If you are using platform-specific `ifdef`s to manage compatibility
issues among platforms, you should probably consider whether you can
put your code into lib.
**/ **/

View File

@ -1,4 +1,8 @@
/** /**
@dir lib/lock @dir /lib/lock
@brief lib/lock @brief lib/lock: Simple locking support.
This module is more low-level than the rest of the threading code, since it
is needed by more intermediate-level modules.
**/ **/

View File

@ -1,4 +1,12 @@
/** /**
@dir lib/log @dir /lib/log
@brief lib/log @brief lib/log: Log messages to files, syslogs, etc.
You can think of this as the logical "midpoint" of the
\refdir{lib} code: much of the higher-level code is higher-level
_because_ it uses the logging module, and much of the lower-level code is
specifically written to avoid having to log, because the logging module
depends on it.
**/ **/

View File

@ -1,4 +1,78 @@
/** /**
@dir lib/malloc @dir /lib/malloc
@brief lib/malloc @brief lib/malloc: Wrappers and utilities for memory management.
Tor imposes a few light wrappers over C's native malloc and free
functions, to improve convenience, and to allow wholesale replacement
of malloc and free as needed.
You should never use 'malloc', 'calloc', 'realloc', or 'free' on their
own; always use the variants prefixed with 'tor_'.
They are the same as the standard C functions, with the following
exceptions:
* `tor_free(NULL)` is a no-op.
* `tor_free()` is a macro that takes an lvalue as an argument and sets it to
NULL after freeing it. To avoid this behavior, you can use `tor_free_()`
instead.
* tor_malloc() and friends fail with an assertion if they are asked to
allocate a value so large that it is probably an underflow.
* It is always safe to `tor_malloc(0)`, regardless of whether your libc
allows it.
* `tor_malloc()`, `tor_realloc()`, and friends are never allowed to fail.
Instead, Tor will die with an assertion. This means that you never
need to check their return values. See the next subsection for
information on why we think this is a good idea.
We define additional general-purpose memory allocation functions as well:
* `tor_malloc_zero(x)` behaves as `calloc(1, x)`, except that it makes clear
the intent to allocate a single zeroed-out value.
* `tor_reallocarray(x,y)` behaves as the OpenBSD reallocarray function.
Use it for cases when you need to realloc() in a multiplication-safe
way.
And specific-purpose functions as well:
* `tor_strdup()` and `tor_strndup()` behave as the underlying libc
functions, but use `tor_malloc()` instead of the underlying function.
* `tor_memdup()` copies a chunk of memory of a given size.
* `tor_memdup_nulterm()` copies a chunk of memory of a given size, then
NUL-terminates it just to be safe.
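
The behaviors listed above (never returning NULL, tolerating zero-sized requests, multiplication-safe array sizing, and NUL-terminated copies) can be sketched in plain C as follows. These are not the tor_malloc() sources: the `*_sketch` names are hypothetical, and calling abort() stands in for whatever assertion machinery a real build would use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical never-fail malloc: aborts instead of returning NULL, and
 * treats a zero-byte request as a one-byte request so the result is always
 * a valid, freeable pointer regardless of the libc. */
void *xmalloc_sketch(size_t size) {
  if (size == 0)
    size = 1;
  void *p = malloc(size);
  if (!p)
    abort();
  return p;
}

/* Hypothetical multiplication-safe realloc, in the spirit of OpenBSD's
 * reallocarray(): refuses requests where n * size would overflow. */
void *xreallocarray_sketch(void *ptr, size_t n, size_t size) {
  if (size != 0 && n > SIZE_MAX / size)
    abort();
  size_t total = n * size;
  if (total == 0)
    total = 1;
  void *p = realloc(ptr, total);
  if (!p)
    abort();
  return p;
}

/* Hypothetical memdup that always appends a NUL byte after the copy. */
char *xmemdup_nulterm_sketch(const void *src, size_t len) {
  assert(src != NULL);
  assert(len < SIZE_MAX);
  char *dup = xmalloc_sketch(len + 1);
  memcpy(dup, src, len);
  dup[len] = '\0';
  return dup;
}
```

Callers of wrappers like these can use the returned pointer directly, with no NULL check at each call site.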
#### Why assert on allocation failure?
Why don't we allow `tor_malloc()` and its allies to return NULL?
First, it's error-prone. Many programmers forget to check for NULL return
values, and testing for `malloc()` failures is a major pain.
Second, it's not necessarily a great way to handle OOM conditions. It's
probably better (we think) to have a memory target where we dynamically free
things ahead of time in order to stay under the target. Trying to respond to
an OOM at the point of `tor_malloc()` failure, on the other hand, would involve
a rare operation invoked from deep in the call stack. (Again, that's
error-prone and hard to debug.)
Third, thanks to the rise of Linux and other operating systems that allow
memory to be overcommitted, you can't actually ever rely on getting a NULL
from `malloc()` when you're out of memory; instead you have to use an approach
closer to tracking the total memory usage.
#### Conventions for your own allocation functions.
Whenever you create a new type, the convention is to give it a pair of
`x_new()` and `x_free_()` functions, named after the type.
Calling `x_free(NULL)` should always be a no-op.
There should additionally be an `x_free()` macro, defined in terms of
`x_free_()`. This macro should set its lvalue to NULL. You can define it
using the FREE_AND_NULL macro, as follows:
```
#define x_free(ptr) FREE_AND_NULL(x_t, x_free_, (ptr))
```
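
To show the whole convention in one place, here is a self-contained sketch with a made-up type. The `widget_t` type, its functions, and the `FREE_AND_NULL_SKETCH` macro are all hypothetical; the macro is one plausible way to write a free-and-null helper, not a copy of the real FREE_AND_NULL definition.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical free-and-null helper: calls free_fn on the pointer stored in
 * the lvalue ptr, then sets that lvalue to NULL. Taking the address through
 * a typed temporary gives a compile-time diagnostic on a type mismatch. */
#define FREE_AND_NULL_SKETCH(type, free_fn, ptr) \
  do {                                           \
    type **tmp_ = &(ptr);                        \
    free_fn(*tmp_);                              \
    *tmp_ = NULL;                                \
  } while (0)

/* A made-up type following the x_new() / x_free_() naming convention. */
typedef struct widget_t {
  int value;
} widget_t;

widget_t *widget_new(int value) {
  widget_t *w = malloc(sizeof(*w));
  if (!w)
    abort();
  w->value = value;
  return w;
}

/* The underlying free function: freeing NULL is a no-op. */
void widget_free_(widget_t *w) {
  if (!w)
    return;
  free(w);
}

/* The macro that callers actually use. */
#define widget_free(w) FREE_AND_NULL_SKETCH(widget_t, widget_free_, (w))
```

After `widget_free(w)`, the variable `w` is NULL, so an accidental second free is harmless and a use-after-free fails fast on a NULL dereference rather than reading stale memory.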
**/ **/

View File

@ -1,4 +1,8 @@
/** /**
@dir lib/math @dir /lib/math
@brief lib/math @brief lib/math: Floating-point math utilities.
This module includes a bunch of floating-point compatibility code, and
implementations for several probability distributions.
**/ **/

View File

@ -1,4 +1,30 @@
/** /**
@dir lib/memarea @dir /lib/memarea
@brief lib/memarea @brief lib/memarea: A fast arena-style allocator.
This module has a fast "arena" style allocator, where memory is freed all at
once. This kind of allocation is very fast and avoids fragmentation, at the
expense of requiring all the data to be freed at the same time. We use this
for parsing and diff calculations.
It's often handy to allocate a large number of tiny objects, all of which
need to disappear at the same time. You can do this in Tor using the
memarea.c abstraction, which uses a set of grow-only buffers for allocation,
and only supports a single "free" operation at the end.
Using memareas also helps you avoid memory fragmentation. You see, some libc
malloc implementations perform badly on the case where a large number of
small temporary objects are allocated at the same time as a few long-lived
objects of similar size. But if you use tor_malloc() for the long-lived ones
and a memarea for the temporary objects, the malloc implementation is likelier
to do better.
To create a new memarea, use `memarea_new()`. To drop all the storage from a
memarea, and invalidate its pointers, use `memarea_drop_all()`.
The allocation functions `memarea_alloc()`, `memarea_alloc_zero()`,
`memarea_memdup()`, `memarea_strdup()`, and `memarea_strndup()` are analogous
to the similarly-named malloc() functions. There is intentionally no
`memarea_free()` or `memarea_realloc()`.
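
The grow-only, free-everything-at-once design can be sketched as below. This is a simplified illustration rather than the memarea.c implementation: the `arena` type and `arena_*()` functions are hypothetical names, the 4096-byte chunk size and 16-byte alignment are assumptions, and this sketch's drop function also frees the arena object itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ARENA_CHUNK_SIZE 4096 /* Assumed default chunk size. */
#define ARENA_ALIGN 16        /* Assumed alignment for every allocation. */

/* Hypothetical chunk: a grow-only buffer in a linked list. */
typedef struct arena_chunk {
  struct arena_chunk *next;
  char *data;
  size_t used;
  size_t cap;
} arena_chunk;

typedef struct arena {
  arena_chunk *head; /* Most recently added chunk. */
} arena;

arena *arena_new(void) {
  arena *a = calloc(1, sizeof(*a));
  if (!a)
    abort();
  return a;
}

/* Carve n bytes out of the current chunk, adding a new chunk if needed.
 * There is intentionally no way to free a single allocation. */
void *arena_alloc(arena *a, size_t n) {
  assert(a != NULL);
  if (n > SIZE_MAX - ARENA_ALIGN)
    abort();
  /* Round up so that every returned pointer stays aligned. */
  n = (n + ARENA_ALIGN - 1) / ARENA_ALIGN * ARENA_ALIGN;
  arena_chunk *c = a->head;
  if (!c || c->cap - c->used < n) {
    size_t cap = n > ARENA_CHUNK_SIZE ? n : ARENA_CHUNK_SIZE;
    c = malloc(sizeof(*c));
    if (!c)
      abort();
    c->data = malloc(cap);
    if (!c->data)
      abort();
    c->used = 0;
    c->cap = cap;
    c->next = a->head;
    a->head = c;
  }
  void *p = c->data + c->used;
  c->used += n;
  return p;
}

/* Copy a NUL-terminated string into the arena. */
char *arena_strdup(arena *a, const char *s) {
  size_t len = strlen(s) + 1;
  char *out = arena_alloc(a, len);
  memcpy(out, s, len);
  return out;
}

/* Release every chunk and the arena itself; all pointers handed out by
 * arena_alloc() become invalid. */
void arena_drop_all(arena *a) {
  if (!a)
    return;
  arena_chunk *c = a->head;
  while (c) {
    arena_chunk *next = c->next;
    free(c->data);
    free(c);
    c = next;
  }
  free(a);
}
```

Allocation here is a pointer bump in the common case, and teardown costs one free() pair per chunk rather than one per object, which is where the speed advantage for parsing-style workloads comes from.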
**/ **/

View File

@ -1,4 +1,7 @@
/** /**
@dir lib/meminfo @dir /lib/meminfo
@brief lib/meminfo @brief lib/meminfo: Inspecting malloc() usage.
Only available when malloc() provides mallinfo() or something similar.
**/ **/

View File

@ -1,4 +1,8 @@
/** /**
@dir lib/net @dir /lib/net
@brief lib/net @brief lib/net: Low-level network-related code.
This module includes address manipulation, compatibility wrappers,
convenience functions, and so on.
**/ **/

View File

@ -1,4 +1,10 @@
/** /**
@dir lib/osinfo @dir /lib/osinfo
@brief lib/osinfo @brief lib/osinfo: For inspecting the OS version and capabilities.
In general, we use this module when we're telling the user what operating
system they are running. We shouldn't make decisions based on the output of
these checks: instead, we should have more specific checks, either at compile
time or run time, based on the observed system behavior.
**/ **/

View File

@ -1,4 +1,4 @@
/** /**
@dir lib/process @dir /lib/process
@brief lib/process @brief lib/process: Launch and manage subprocesses.
**/ **/

View File

@ -1,4 +1,16 @@
/** /**
@dir lib/pubsub @dir /lib/pubsub
@brief lib/pubsub @brief lib/pubsub: Publish-subscribe message passing.
This module wraps the \refdir{lib/dispatch} module, to provide a more
ergonomic and type-safe approach to message passing.
In general, we favor this mechanism for cases where higher-level modules
need to be notified when something happens in lower-level modules. (The
alternative would be calling up from the lower-level modules, which
would be error-prone; or maintaining lists of function-pointers, which
would be clumsy and tend to complicate the call graph.)
See pubsub.c for more information.
**/ **/

View File

@ -1,4 +1,17 @@
/** /**
@dir lib/sandbox @dir /lib/sandbox
@brief lib/sandbox @brief lib/sandbox: Linux seccomp2-based sandbox.
This module uses Linux's seccomp2 facility via the
[`libseccomp` library](https://github.com/seccomp/libseccomp), to restrict
the set of system calls that Tor is allowed to invoke while it is running.
Because there are many libc versions that invoke different system calls, and
because handling strings is quite complex, this module is more complex and
less portable than it needs to be.
A better architecture would put the responsibility for invoking tricky system
calls (like open()) in another, less restricted process, and give that
process responsibility for enforcing our sandbox rules.
**/ **/

View File

@ -1,4 +1,12 @@
/** /**
@dir lib/smartlist_core @dir /lib/smartlist_core
@brief lib/smartlist_core @brief lib/smartlist_core: Minimal dynamic array implementation
A `smartlist_t` is a dynamic array type for holding `void *`. We use it
throughout the rest of the codebase.
There are higher-level pieces in \refdir{lib/container} but
the ones in lib/smartlist_core are used by the logging code, and therefore
cannot use the logging code.
**/ **/

View File

@ -1,4 +0,0 @@
/**
@dir lib/stats
@brief lib/stats
**/

View File

@ -1,4 +1,15 @@
/** /**
@dir lib/string @dir /lib/string
@brief lib/string @brief lib/string: Low-level string manipulation.
We have a number of compatibility functions here: some are for handling
functionality that is not implemented (or not implemented the same) on every
platform; some are for providing locale-independent versions of libc
functions that would otherwise be defined differently for different users.
Other functions here are for common string-manipulation operations that we do
in the rest of the codebase.
Any string function high-level enough to need logging belongs in a
higher-level module.
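
To illustrate what "locale-independent" means above, here is a minimal
sketch of an ASCII-only lower-casing helper. The `demo_` names are
hypothetical and are not functions from this module; the sketch also
assumes an ASCII-compatible execution character set.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: lower-case one ASCII character.
// Unlike tolower(), the result never depends on the current locale.
char
demo_ascii_tolower(char c)
{
  if (c >= 'A' && c <= 'Z')
    return (char)(c - 'A' + 'a');
  return c;
}

// Hypothetical helper: lower-case a NUL-terminated string in place.
// Bytes outside 'A'..'Z' are left untouched.
void
demo_ascii_strlower(char *s)
{
  assert(s);
  for (; *s; ++s)
    *s = demo_ascii_tolower(*s);
}
```

Because only the bytes `'A'` through `'Z'` are ever changed, two users with
different locale settings get identical results for the same input.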
**/ **/

View File

@ -1,4 +1,34 @@
/** /**
@dir lib/subsys @dir /lib/subsys
@brief lib/subsys @brief lib/subsys: Types for declaring a "subsystem".
## Subsystems in Tor
A subsystem is a module with support for initialization, shutdown,
configuration, and so on.
Many parts of Tor can be initialized, cleaned up, and configured somewhat
independently through a table-driven mechanism. Each such part is called a
"subsystem".
To declare a subsystem, make a global `const` instance of the `subsys_fns_t`
type, filling in the function pointer fields that you require with ones
corresponding to your subsystem. Any function pointers left as `NULL` will
be treated as no-ops. Each subsystem must have a name and a "level", which corresponds to
the order in which it is initialized. (See `app/main/subsystem_list.c` for a
list of current subsystems and their levels.)
Then, insert your subsystem in the list in `app/main/subsystem_list.c`. It
will need to occupy a position corresponding to its level.
At this point, your subsystem will be handled like the others: it will get
initialized at startup, torn down at exit, and so on.
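
The table-driven mechanism described above can be sketched roughly as
follows. This is a simplified illustration, not the real definition:
`demo_subsys_fns_t` is a hypothetical stand-in with far fewer fields than
`subsys_fns_t`, the levels 10 and 20 are made up, and shutting down in
reverse order is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical, simplified stand-in for subsys_fns_t.
typedef struct demo_subsys_fns_t {
  const char *name;         // required: human-readable subsystem name
  int level;                // required: lower levels initialize first
  int (*initialize)(void);  // optional: NULL is treated as a no-op
  void (*shutdown)(void);   // optional: NULL is treated as a no-op
} demo_subsys_fns_t;

// Records the level of each callback as it runs, for demonstration.
int demo_trace[8];
int demo_trace_len = 0;

int demo_log_init(void) { demo_trace[demo_trace_len++] = 10; return 0; }
int demo_net_init(void) { demo_trace[demo_trace_len++] = 20; return 0; }
void demo_net_shutdown(void) { demo_trace[demo_trace_len++] = -20; }

// Each subsystem is a global const instance of the function table.
const demo_subsys_fns_t demo_sys_log = {
  .name = "demo_log",
  .level = 10,
  .initialize = demo_log_init,
  // .shutdown is left NULL here, so shutdown skips this subsystem.
};
const demo_subsys_fns_t demo_sys_net = {
  .name = "demo_net",
  .level = 20,
  .initialize = demo_net_init,
  .shutdown = demo_net_shutdown,
};

// The subsystem list, kept sorted by ascending level.
const demo_subsys_fns_t *const demo_subsystems[] = {
  &demo_sys_log,
  &demo_sys_net,
};
const size_t demo_n_subsystems =
  sizeof(demo_subsystems) / sizeof(demo_subsystems[0]);

// Initialize every subsystem in table order.
// Returns 0 on success, or -1 as soon as one initializer fails.
int
demo_subsystems_init(void)
{
  size_t i;
  for (i = 0; i < demo_n_subsystems; ++i) {
    const demo_subsys_fns_t *sys = demo_subsystems[i];
    if (i > 0)
      assert(demo_subsystems[i - 1]->level <= sys->level);
    if (sys->initialize && sys->initialize() < 0)
      return -1;
  }
  return 0;
}

// Shut down every subsystem in reverse table order.
void
demo_subsystems_shutdown(void)
{
  size_t i;
  for (i = demo_n_subsystems; i > 0; --i) {
    const demo_subsys_fns_t *sys = demo_subsystems[i - 1];
    if (sys->shutdown)
      sys->shutdown();
  }
}
```

In this sketch, adding a subsystem means defining one more `const` instance
and inserting it into the array at the position matching its level; the
init and shutdown loops do not change.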
Historical note: Not all of Tor's code is currently handled as
subsystems. As you work with older code, you may see some parts of the code
that are initialized from `tor_init()` or `run_tor_main_loop()` or
`tor_run_main()`; and torn down from `tor_cleanup()`. We aim to migrate
these to subsystems over time; please don't add any new code that follows
this pattern.
**/ **/

View File

@ -1,4 +1,4 @@
/** /**
@dir lib/term @dir /lib/term
@brief lib/term @brief lib/term: Terminal operations (password input).
**/ **/

View File

@ -1,4 +1,4 @@
/** /**
@dir lib/testsupport @dir /lib/testsupport
@brief lib/testsupport @brief lib/testsupport: Helpers for test-only code and for function mocking.
**/ **/

View File

@ -1,4 +1,9 @@
/** /**
@dir lib/thread @dir /lib/thread
@brief lib/thread @brief lib/thread: Mid-level threading.
This module contains compatibility and convenience code for multithreading,
except for low-level locks (which are in \refdir{lib/lock}) and
workqueue/threadpool code (which belongs in \refdir{lib/evloop}).
**/ **/

View File

@ -1,4 +1,11 @@
/** /**
@dir lib/time @dir /lib/time
@brief lib/time @brief lib/time: Higher-level time functions
This includes both fine-grained timers and monotonic timers, along with
wrappers for them to try to improve efficiency.
For "what time is it" in UTC, see \refdir{lib/wallclock}. For parsing and
encoding times and dates, see \refdir{lib/encoding}.
**/ **/

View File

@ -1,4 +1,13 @@
/** /**
@dir lib/tls @dir /lib/tls
@brief lib/tls @brief lib/tls: TLS library wrappers
This module has compatibility wrappers around the library (NSS or OpenSSL,
depending on configuration) that Tor uses to implement the TLS link security
protocol.
It also implements the logic for some legacy TLS protocol usage we used to
support in old versions of Tor, involving conditional delivery of certificate
chains (v1 link protocol) and conditional renegotiation (v2 link protocol).
**/ **/

View File

@ -1,4 +1,8 @@
/** /**
@dir lib/trace @dir /lib/trace
@brief lib/trace @brief lib/trace: Function-tracing functionality API.
This module is used for adding "trace" support (low-granularity function
logging) to Tor. Right now it doesn't have many users.
**/ **/

View File

@ -1,4 +1,4 @@
/** /**
@dir lib/version @dir /lib/version
@brief lib/version @brief lib/version: holds the current version of Tor.
**/ **/

View File

@ -1,4 +1,13 @@
/** /**
@dir lib/wallclock @dir /lib/wallclock
@brief lib/wallclock @brief lib/wallclock: Inspect and manipulate the current time.
This module handles our concept of "what time is it" or "what time does the
world agree it is?" Generally, if you want something derived from UTC, this
is the module for you.
For versions of the time that are more local, more monotonic, or more
accurate, see \refdir{lib/time}. For parsing and encoding times and dates,
see \refdir{lib/encoding}.
**/ **/

View File

@ -1,11 +1,122 @@
/** /**
@mainpage Tor source reference @mainpage Tor source reference
@section intro Getting to know Tor @section intro Welcome to Tor
Welcome to the Tor source code documentation! Here we have documentation for This documentation describes the general structure of the Tor codebase, how
nearly every function, type, and module in the Tor source code. The high-level it fits together, what functionality is available for extending Tor, and
documentation is a work in progress. For now, have a look at the source code gives some notes on how Tor got that way. It also includes a reference for
overview in doc/HACKING/design. nearly every function, type, file, and module in the Tor source code. The
high-level documentation is a work in progress.
Tor itself remains a work in progress too: We've been working on it for
nearly two decades, and we've learned a lot about good coding since we first
started. This means, however, that some of the older pieces of Tor will have
some "code smell" in them that could stand a brisk refactoring. So when we
describe a piece of code, we'll sometimes give a note on how it got that way,
and whether we still think that's a good idea.
This document is not an overview of the Tor protocol. For that, see the
design paper and the specifications at https://spec.torproject.org/ .
For more information about Tor's coding standards and some helpful
development tools, see
[doc/HACKING](https://gitweb.torproject.org/tor.git/tree/doc/HACKING) in the
Tor repository.
@section highlevel The very high level
Ultimately, Tor runs as an event-driven network daemon: it responds to
network events, signals, and timers by sending and receiving things over
the network. Clients, relays, and directory authorities all use the
same codebase: the Tor process will run as a client, relay, or authority
depending on its configuration.
Tor has a few major dependencies, including Libevent (used to tell which
sockets are readable and writable), OpenSSL or NSS (used for many encryption
functions, and to implement the TLS protocol), and zlib (used to
compress and uncompress directory information).
Most of Tor's work today is done in a single event-driven main thread.
Tor also spawns one or more worker threads to handle CPU-intensive
tasks. (Right now, this only includes circuit encryption and the more
expensive compression algorithms.)
On startup, Tor initializes its libraries, reads and responds to its
configuration files, and launches a main event loop. At first, the only
events that Tor listens for are a few signals (like TERM and HUP), and
one or more listener sockets (for different kinds of incoming
connections). Tor also configures several timers to handle periodic
events. As Tor runs over time, other connections will open, and new events
will be scheduled.
The codebase is divided into a few top-level subdirectories, each of
which contains several sub-modules.
- `ext` -- Code maintained elsewhere that we include in the Tor
source distribution.
- \refdir{lib} -- Lower-level utility code, not necessarily
tor-specific.
- `trunnel` -- Automatically generated code (from the Trunnel
tool): used to parse and encode binary formats.
- \refdir{core} -- Networking code that implements the central
parts of the Tor protocol and main loop.
- \refdir{feature} -- Aspects of Tor (like directory management,
running a relay, running a directory authority, managing a list of
nodes, running and using onion services) that are built on top of the
mainloop code.
- \refdir{app} -- Highest-level functionality; responsible for setting
up and configuring the Tor daemon, making sure all the lower-level
modules start up when required, and so on.
- \refdir{tools} -- Binaries other than Tor that we produce.
Currently this is tor-resolve, tor-gencert, and the tor_runner.o helper
module.
- `test` -- unit tests, regression tests, and a few integration
tests.
In theory, the above parts of the codebase are sorted from lowest-level to
highest-level, where high-level code is only allowed to invoke lower-level
code, and lower-level code never includes or depends on code of a higher
level. In practice, this refactoring is incomplete: The modules in
\refdir{lib} are well-factored, but there are many layer violations ("upward
dependencies") in \refdir{core} and \refdir{feature}.
We aim to eliminate those over time.
@section keyabstractions Some key high-level abstractions
The most important abstractions at Tor's high level are Connections,
Channels, Circuits, and Nodes.
A 'Connection' (connection_t) represents a stream-based information flow.
Most connections are TCP connections to remote Tor servers and clients. (But
as a shortcut, a relay will sometimes make a connection to itself without
actually using a TCP connection. More details later on.) Connections exist
in different varieties, depending on what functionality they provide. The
principal types of connection are edge_connection_t (e.g., a SOCKS connection or
a connection from an exit relay to a destination), or_connection_t (a TLS
stream connecting to a relay), dir_connection_t (an HTTP connection to learn
about the network), and control_connection_t (a connection from a
controller).
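
One common way to express a base type with several varieties in C is to
embed the base struct as the first member of each subtype and to check a
type tag before converting between them. The sketch below illustrates that
general technique only; the `demo_` types, fields, and functions are
hypothetical and are not the actual definitions of connection_t or
edge_connection_t.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical type tag; the names are illustrative only.
typedef enum demo_conn_type_t {
  DEMO_CONN_EDGE,
  DEMO_CONN_OR,
} demo_conn_type_t;

// Base type: fields shared by every kind of connection.
typedef struct demo_connection_t {
  demo_conn_type_t type;  // which subtype this object really is
  int fd;                 // underlying socket, or -1 if there is none
} demo_connection_t;

// Subtype: embeds the base as its first member, then adds its own fields.
typedef struct demo_edge_connection_t {
  demo_connection_t base;
  unsigned stream_id;
} demo_edge_connection_t;

// Upcast: always safe, since every edge connection contains a base.
demo_connection_t *
demo_edge_to_conn(demo_edge_connection_t *edge)
{
  return &edge->base;
}

// Checked downcast: asserts that the type tag matches before converting.
// The cast is valid because `base` is the first member of the subtype.
demo_edge_connection_t *
demo_conn_to_edge(demo_connection_t *conn)
{
  assert(conn && conn->type == DEMO_CONN_EDGE);
  return (demo_edge_connection_t *)conn;
}
```

Generic code can then work with `demo_connection_t *` everywhere, and only
the code that needs subtype-specific fields performs the checked downcast.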
A 'Circuit' (circuit_t) is a persistent tunnel through the Tor network,
established with public-key cryptography, and used to send cells one or more
hops. Clients keep track of multi-hop circuits (origin_circuit_t), and the
cryptography associated with each hop. Relays, on the other hand, keep track
only of their hop of each circuit (or_circuit_t).
A 'Channel' (channel_t) is an abstract view of sending cells to and from a
Tor relay. Currently, all channels are implemented using OR connections
(channel_tls_t). If we switch to other strategies in the future, we'll have
more connection types.
A 'Node' (node_t) is a view of a Tor instance's current knowledge and opinions
about a Tor relay or bridge.
**/ **/

View File

@ -1,5 +1,5 @@
/** /**
@dir tools @dir /tools
@brief tools: other command-line tools for use with Tor. @brief tools: other command-line tools for use with Tor.
The "tools" directory has a few other programs that use Tor, but are not part The "tools" directory has a few other programs that use Tor, but are not part