mirror of
https://gitlab.torproject.org/tpo/core/tor.git
synced 2024-11-30 15:43:32 +01:00
Merge branch 'doxygen_libs'
commit
8933789fef
@ -256,6 +256,8 @@ TAB_SIZE = 8
|
||||
|
||||
ALIASES =
|
||||
|
||||
ALIASES += refdir{1}="\ref @top_srcdir@/src/\1 \"\1\""
|
||||
|
||||
# This tag can be used to specify a number of word-keyword mappings (TCL only).
|
||||
# A mapping has the form "name=value". For example adding "class=itcl::class"
|
||||
# will allow you to use the command class in the itcl::class meaning.
|
||||
|
@ -1,124 +1,6 @@
|
||||
|
||||
## Overview ##
|
||||
|
||||
This document describes the general structure of the Tor codebase, how
|
||||
it fits together, what functionality is available for extending Tor,
|
||||
and gives some notes on how Tor got that way.
|
||||
|
||||
Tor remains a work in progress: We've been working on it for nearly two
|
||||
decades, and we've learned a lot about good coding since we first
|
||||
started. This means, however, that some of the older pieces of Tor will
|
||||
have some "code smell" in them that could stand a brisk
|
||||
refactoring. So when I describe a piece of code, I'll sometimes give a
|
||||
note on how it got that way, and whether I still think that's a good
|
||||
idea.
|
||||
|
||||
The first drafts of this document were written in the Summer and Fall of
|
||||
2015, when Tor 0.2.6 was the most recent stable version, and Tor 0.2.7
|
||||
was under development. There is a revision in progress (as of late
|
||||
2019), to bring it up to pace with Tor as of version 0.4.2. If you're
|
||||
reading this far in the future, some things may have changed. Caveat
|
||||
haxxor!
|
||||
|
||||
This document is not an overview of the Tor protocol. For that, see the
|
||||
design paper and the specifications at https://spec.torproject.org/ .
|
||||
|
||||
For more information about Tor's coding standards and some helpful
|
||||
development tools, see doc/HACKING in the Tor repository.
|
||||
|
||||
|
||||
### The very high level ###
|
||||
|
||||
Ultimately, Tor runs as an event-driven network daemon: it responds to
|
||||
network events, signals, and timers by sending and receiving things over
|
||||
the network. Clients, relays, and directory authorities all use the
|
||||
same codebase: the Tor process will run as a client, relay, or authority
|
||||
depending on its configuration.
|
||||
|
||||
Tor has a few major dependencies, including Libevent (used to tell which
|
||||
sockets are readable and writable), OpenSSL or NSS (used for many encryption
|
||||
functions, and to implement the TLS protocol), and zlib (used to
|
||||
compress and uncompress directory information).
|
||||
|
||||
Most of Tor's work today is done in a single event-driven main thread.
|
||||
Tor also spawns one or more worker threads to handle CPU-intensive
|
||||
tasks. (Right now, this only includes circuit encryption and the more
|
||||
expensive compression algorithms.)
|
||||
|
||||
On startup, Tor initializes its libraries, reads and responds to its
|
||||
configuration files, and launches a main event loop. At first, the only
|
||||
events that Tor listens for are a few signals (like TERM and HUP), and
|
||||
one or more listener sockets (for different kinds of incoming
|
||||
connections). Tor also configures several timers to handle periodic
|
||||
events. As Tor runs over time, other events will open, and new events
|
||||
will be scheduled.
|
||||
|
||||
The codebase is divided into a few top-level subdirectories, each of
|
||||
which contains several sub-modules.
|
||||
|
||||
* `src/ext` -- Code maintained elsewhere that we include in the Tor
|
||||
source distribution.
|
||||
|
||||
* `src/lib` -- Lower-level utility code, not necessarily tor-specific.
|
||||
|
||||
* `src/trunnel` -- Automatically generated code (from the Trunnel
|
||||
tool): used to parse and encode binary formats.
|
||||
|
||||
* `src/core` -- Networking code that implements the central parts of
|
||||
the Tor protocol and main loop.
|
||||
|
||||
* `src/feature` -- Aspects of Tor (like directory management, running a
|
||||
relay, running a directory authority, managing a list of nodes,
|
||||
running and using onion services) that are built on top of the
|
||||
mainloop code.
|
||||
|
||||
* `src/app` -- Highest-level functionality; responsible for setting up
|
||||
and configuring the Tor daemon, making sure all the lower-level
|
||||
modules start up when required, and so on.
|
||||
|
||||
* `src/tools` -- Binaries other than Tor that we produce. Currently this
|
||||
is tor-resolve, tor-gencert, and the tor_runner.o helper module.
|
||||
|
||||
* `src/test` -- unit tests, regression tests, and a few integration
|
||||
tests.
|
||||
|
||||
In theory, the above parts of the codebase are sorted from highest-level to
|
||||
lowest-level, where high-level code is only allowed to invoke lower-level
|
||||
code, and lower-level code never includes or depends on code of a higher
|
||||
level. In practice, this refactoring is incomplete: The modules in `src/lib`
|
||||
are well-factored, but there are many layer violations ("upward
|
||||
dependencies") in `src/core` and `src/feature`. We aim to eliminate those
|
||||
over time.
|
||||
|
||||
### Some key high-level abstractions ###
|
||||
|
||||
The most important abstractions at Tor's high level are Connections,
|
||||
Channels, Circuits, and Nodes.
|
||||
|
||||
A 'Connection' represents a stream-based information flow. Most
|
||||
connections are TCP connections to remote Tor servers and clients. (But
|
||||
as a shortcut, a relay will sometimes make a connection to itself
|
||||
without actually using a TCP connection. More details later on.)
|
||||
Connections exist in different varieties, depending on what
|
||||
functionality they provide. The principal types of connection are
|
||||
"edge" (eg a socks connection or a connection from an exit relay to a
|
||||
destination), "OR" (a TLS stream connecting to a relay), "Directory" (an
|
||||
HTTP connection to learn about the network), and "Control" (a connection
|
||||
from a controller).
|
||||
|
||||
A 'Circuit' is a persistent tunnel through the Tor network, established
|
||||
with public-key cryptography, and used to send cells one or more hops.
|
||||
Clients keep track of multi-hop circuits, and the cryptography
|
||||
associated with each hop. Relays, on the other hand, keep track only of
|
||||
their hop of each circuit.
|
||||
|
||||
A 'Channel' is an abstract view of sending cells to and from a Tor
|
||||
relay. Currently, all channels are implemented using OR connections.
|
||||
If we switch to other strategies in the future, we'll have more
|
||||
connection types.
|
||||
|
||||
A 'Node' is a view of a Tor instance's current knowledge and opinions
|
||||
about a Tor relay or bridge.
|
||||
|
||||
### The rest of this document. ###
|
||||
|
||||
|
@ -1,171 +0,0 @@
|
||||
|
||||
## Library code in Tor.
|
||||
|
||||
Most of Tor's utility code is in modules in the `src/lib` subdirectory. In
|
||||
general, this code is not necessarily Tor-specific, but is instead possibly
|
||||
useful for other applications.
|
||||
|
||||
This code includes:
|
||||
|
||||
* Compatibility wrappers, to provide a uniform API across different
|
||||
platforms.
|
||||
|
||||
* Library wrappers, to provide a tor-like API over different libraries
|
||||
that Tor uses for things like compression and cryptography.
|
||||
|
||||
* Containers, to implement some general-purpose data container types.
|
||||
|
||||
The modules in `src/lib` are currently well-factored: each one depends
|
||||
only on lower-level modules. You can see an up-to-date list of the
|
||||
modules sorted from lowest to highest level by running
|
||||
`./scripts/maint/practracker/includes.py --toposort`.
|
||||
|
||||
As of this writing, the library modules are (from lowest to highest
|
||||
level):
|
||||
|
||||
* `lib/cc` -- Macros for managing the C compiler and
|
||||
language. Includes macros for improving compatibility and clarity
|
||||
across different C compilers.
|
||||
|
||||
* `lib/version` -- Holds the current version of Tor.
|
||||
|
||||
* `lib/testsupport` -- Helpers for making test-only code and test
|
||||
mocking support.
|
||||
|
||||
* `lib/defs` -- Lowest-level constants used in many places across the
|
||||
code.
|
||||
|
||||
* `lib/subsys` -- Types used for declaring a "subsystem". A subsystem
|
||||
is a module with support for initialization, shutdown,
|
||||
configuration, and so on.
|
||||
|
||||
* `lib/conf` -- Types and macros used for declaring configuration
|
||||
options.
|
||||
|
||||
* `lib/arch` -- Compatibility functions and macros for handling
|
||||
differences in CPU architecture.
|
||||
|
||||
* `lib/err` -- Lowest-level error handling code: responsible for
|
||||
generating stack traces, handling raw assertion failures, and
|
||||
otherwise reporting problems that might not be safe to report
|
||||
via the regular logging module.
|
||||
|
||||
* `lib/malloc` -- Wrappers and utilities for memory management.
|
||||
|
||||
* `lib/intmath` -- Utilities for integer mathematics.
|
||||
|
||||
* `lib/fdio` -- Utilities and compatibility code for reading and
|
||||
writing data on file descriptors (and on sockets, for platforms
|
||||
where a socket is not a kind of fd).
|
||||
|
||||
* `lib/lock` -- Compatibility code for declaring and using locks.
|
||||
Lower-level than the rest of the threading code.
|
||||
|
||||
* `lib/ctime` -- Constant-time implementations for data comparison
|
||||
and table lookup, used to avoid timing side-channels from standard
|
||||
implementations of memcmp() and so on.
|
||||
|
||||
* `lib/string` -- Low-level compatibility wrappers and utility
|
||||
functions for string manipulation.
|
||||
|
||||
* `lib/wallclock` -- Compatibility and utility functions for
|
||||
inspecting and manipulating the current (UTC) time.
|
||||
|
||||
* `lib/osinfo` -- Functions for inspecting the version and
|
||||
capabilities of the operating system.
|
||||
|
||||
* `lib/smartlist_core` -- The bare-bones pieces of our dynamic array
|
||||
("smartlist") implementation. There are higher-level pieces, but
|
||||
these ones are used by (and therefore cannot use) the logging code.
|
||||
|
||||
* `lib/log` -- Implements the logging system used by all higher-level
|
||||
Tor code. You can think of this as the logical "midpoint" of the
|
||||
library code: much of the higher-level code is higher-level
|
||||
_because_ it uses the logging module, and much of the lower-level
|
||||
code is specifically written to avoid having to log, because the
|
||||
logging module depends on it.
|
||||
|
||||
* `lib/container` -- General purpose containers, including dynamic arrays
|
||||
("smartlists"), hashtables, bit arrays, weak-reference-like "handles",
|
||||
bloom filters, and a bit more.
|
||||
|
||||
* `lib/trace` -- A general-purpose API for introducing
|
||||
function-tracing functionality into Tor. Currently not much used.
|
||||
|
||||
* `lib/thread` -- Threading compatibility and utility functionality,
|
||||
other than low-level locks (which are in `lib/lock`) and
|
||||
workqueue/threadpool code (which belongs in `lib/evloop`).
|
||||
|
||||
* `lib/term` -- Code for terminal manipulation functions (like
|
||||
reading a password from the user).
|
||||
|
||||
* `lib/memarea` -- A data structure for a fast "arena" style allocator,
|
||||
where the data is freed all at once. Used for parsing.
|
||||
|
||||
* `lib/encoding` -- Implementations for encoding data in various
|
||||
formats, datatypes, and transformations.
|
||||
|
||||
* `lib/dispatch` -- A general-purpose in-process message delivery
|
||||
system. Used by `lib/pubsub` to implement our inter-module
|
||||
publish/subscribe system.
|
||||
|
||||
* `lib/sandbox` -- Our Linux seccomp2 sandbox implementation.
|
||||
|
||||
* `lib/pubsub` -- Code and macros to implement our publish/subscribe
|
||||
message passing system.
|
||||
|
||||
* `lib/fs` -- Utility and compatibility code for manipulating files,
|
||||
filenames, directories, and so on.
|
||||
|
||||
* `lib/confmgt` -- Code to parse, encode, and manipulate our
|
||||
configuration files, state files, and so forth.
|
||||
|
||||
* `lib/crypt_ops` -- Cryptographic operations. This module contains
|
||||
wrappers around the cryptographic libraries that we support,
|
||||
and implementations for some higher-level cryptographic
|
||||
constructions that we use.
|
||||
|
||||
* `lib/meminfo` -- Functions for inspecting our memory usage, if the
|
||||
malloc implementation exposes that to us.
|
||||
|
||||
* `lib/time` -- Higher level time functions, including fine-grained and
|
||||
monotonic timers.
|
||||
|
||||
* `lib/math` -- Floating-point mathematical utilities, including
|
||||
compatibility code, and probability distributions.
|
||||
|
||||
* `lib/buf` -- A general purpose queued buffer implementation,
|
||||
similar to the BSD kernel's "mbuf" structure.
|
||||
|
||||
* `lib/net` -- Networking code, including address manipulation,
|
||||
and compatibility wrappers.
|
||||
|
||||
* `lib/compress` -- A compatibility wrapper around several
|
||||
compression libraries, currently including zlib, zstd, and lzma.
|
||||
|
||||
* `lib/geoip` -- Utilities to manage geoip (IP to country) lookups
|
||||
and formats.
|
||||
|
||||
* `lib/tls` -- Compatibility wrappers around the library (NSS or
|
||||
OpenSSL, depending on configuration) that Tor uses to implement the
|
||||
TLS link security protocol.
|
||||
|
||||
* `lib/evloop` -- Tools to manage the event loop and related
|
||||
functionality, in order to implement asynchronous networking,
|
||||
timers, periodic events, and other scheduling tasks.
|
||||
|
||||
* `lib/process` -- Utilities and compatibility code to launch and
|
||||
manage subprocesses.
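The `lib/ctime` entry in the list above mentions constant-time comparison. As a minimal sketch of the idea (the name `sketch_memeq_ct` is hypothetical, and this is not Tor's implementation), the loop below touches every byte regardless of where the first difference occurs, and it avoids branching on the accumulated result. Whether a given compiler preserves the constant-time property is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * Returns 1 if the two buffers are equal and 0 otherwise.  The running
 * time depends on `len`, not on the position of the first mismatch. */
static int
sketch_memeq_ct(const void *a, const void *b, size_t len)
{
  assert(len == 0 || (a != NULL && b != NULL));
  const unsigned char *x = a;
  const unsigned char *y = b;
  unsigned char diff = 0;

  /* Accumulate all differing bits; never exit the loop early. */
  for (size_t i = 0; i < len; ++i)
    diff |= x[i] ^ y[i];

  /* diff == 0 gives (unsigned)-1, whose bit 8 is set, so the result is 1.
   * diff in 1..255 gives 0..254, which has no bit 8, so the result is 0. */
  return 1 & ((unsigned)(diff - 1) >> 8);
}
```

A standard `memcmp()` is free to return at the first differing byte, which is the timing side-channel this style of comparison is meant to avoid.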
|
||||
|
||||
### What belongs in lib?
|
||||
|
||||
In general, if you can imagine some program wanting the functionality
|
||||
you're writing, even if that program had nothing to do with Tor, your
|
||||
functionality belongs in lib.
|
||||
|
||||
If it falls into one of the existing "lib" categories, your
|
||||
functionality belongs in lib.
|
||||
|
||||
If you are using platform-specific `#ifdef`s to manage compatibility
|
||||
issues among platforms, you should probably consider whether you can
|
||||
put your code into lib.
|
@ -1,103 +0,0 @@
|
||||
|
||||
## Memory management
|
||||
|
||||
### Heap-allocation functions: lib/malloc/malloc.h
|
||||
|
||||
Tor imposes a few light wrappers over C's native malloc and free
|
||||
functions, to improve convenience, and to allow wholescale replacement
|
||||
of malloc and free as needed.
|
||||
|
||||
You should never use 'malloc', 'calloc', 'realloc', or 'free' on their
|
||||
own; always use the variants prefixed with 'tor_'.
|
||||
They are the same as the standard C functions, with the following
|
||||
exceptions:
|
||||
|
||||
* `tor_free(NULL)` is a no-op.
|
||||
* `tor_free()` is a macro that takes an lvalue as an argument and sets it to
|
||||
NULL after freeing it. To avoid this behavior, you can use `tor_free_()`
|
||||
instead.
|
||||
* tor_malloc() and friends fail with an assertion if they are asked to
|
||||
allocate a value so large that it is probably an underflow.
|
||||
* It is always safe to `tor_malloc(0)`, regardless of whether your libc
|
||||
allows it.
|
||||
* `tor_malloc()`, `tor_realloc()`, and friends are never allowed to fail.
|
||||
Instead, Tor will die with an assertion. This means that you never
|
||||
need to check their return values. See the next subsection for
|
||||
information on why we think this is a good idea.
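The behaviors listed above can be sketched with a pair of small wrappers. This is an illustration under assumptions, not the code in `lib/malloc`: the names `sketch_malloc` and `sketch_free` are hypothetical, and the real functions do more (for example, size sanity checks).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not Tor's actual code. */

/* Allocate `size` bytes, aborting instead of ever returning NULL. */
static void *
sketch_malloc(size_t size)
{
  /* Some libcs return NULL for malloc(0); always ask for at least 1. */
  if (size == 0)
    size = 1;
  void *p = malloc(size);
  assert(p != NULL); /* Allocation failure is fatal by design. */
  return p;
}

/* Free the pointer stored in the lvalue `p`, then set it to NULL.
 * Because free(NULL) is a no-op, using this twice in a row is harmless. */
#define sketch_free(p) \
  do {                 \
    free(p);           \
    (p) = NULL;        \
  } while (0)
```

Setting the pointer to NULL after freeing turns many use-after-free and double-free bugs into immediate, easier-to-diagnose NULL dereferences or no-ops.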
|
||||
|
||||
We define additional general-purpose memory allocation functions as well:
|
||||
|
||||
* `tor_malloc_zero(x)` behaves as `calloc(1, x)`, except that it makes clear
|
||||
the intent to allocate a single zeroed-out value.
|
||||
* `tor_reallocarray(x,y)` behaves as the OpenBSD reallocarray function.
|
||||
Use it for cases when you need to realloc() in a multiplication-safe
|
||||
way.
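To illustrate what "multiplication-safe" means here, the following sketch checks the size computation before calling `realloc()`. The name `sketch_reallocarray` and its three-argument shape are assumptions made for this example; consult `lib/malloc/malloc.h` for the actual declaration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: resize `ptr` to hold n1 * n2 bytes, refusing
 * (by assertion) any request whose size computation would overflow. */
static void *
sketch_reallocarray(void *ptr, size_t n1, size_t n2)
{
  /* n1 * n2 overflows size_t exactly when n1 > SIZE_MAX / n2. */
  assert(n2 == 0 || n1 <= SIZE_MAX / n2);
  size_t total = n1 * n2;
  if (total == 0)
    total = 1; /* Keep the zero-size case well-defined. */
  void *result = realloc(ptr, total);
  assert(result != NULL); /* Same policy as above: never return NULL. */
  return result;
}
```

Without the check, a caller computing `count * sizeof(item)` with an attacker-influenced `count` could wrap around to a small allocation and then write past its end.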
|
||||
|
||||
And specific-purpose functions as well:
|
||||
|
||||
* `tor_strdup()` and `tor_strndup()` behave as the underlying libc
|
||||
functions, but use `tor_malloc()` instead of the underlying function.
|
||||
* `tor_memdup()` copies a chunk of memory of a given size.
|
||||
* `tor_memdup_nulterm()` copies a chunk of memory of a given size, then
|
||||
NUL-terminates it just to be safe.
|
||||
|
||||
#### Why assert on allocation failure?
|
||||
|
||||
Why don't we allow `tor_malloc()` and its allies to return NULL?
|
||||
|
||||
First, it's error-prone. Many programmers forget to check for NULL return
|
||||
values, and testing for `malloc()` failures is a major pain.
|
||||
|
||||
Second, it's not necessarily a great way to handle OOM conditions. It's
|
||||
probably better (we think) to have a memory target where we dynamically free
|
||||
things ahead of time in order to stay under the target. Trying to respond to
|
||||
an OOM at the point of `tor_malloc()` failure, on the other hand, would involve
|
||||
a rare operation invoked from deep in the call stack. (Again, that's
|
||||
error-prone and hard to debug.)
|
||||
|
||||
Third, thanks to the rise of Linux and other operating systems that allow
|
||||
memory to be overcommitted, you can't actually ever rely on getting a NULL
|
||||
from `malloc()` when you're out of memory; instead you have to use an approach
|
||||
closer to tracking the total memory usage.
|
||||
|
||||
#### Conventions for your own allocation functions.
|
||||
|
||||
Whenever you create a new type, the convention is to give it a pair of
|
||||
`x_new()` and `x_free_()` functions, named after the type.
|
||||
|
||||
Calling `x_free(NULL)` should always be a no-op.
|
||||
|
||||
There should additionally be an `x_free()` macro, defined in terms of
|
||||
`x_free_()`. This macro should set its lvalue to NULL. You can define it
|
||||
using the FREE_AND_NULL macro, as follows:
|
||||
|
||||
```
|
||||
#define x_free(ptr) FREE_AND_NULL(x_t, x_free_, (ptr))
|
||||
```
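Putting the convention together for a made-up type might look like the sketch below. The `point_t` type is hypothetical, and `SKETCH_FREE_AND_NULL` is a local stand-in written for this example; the real FREE_AND_NULL macro may differ in detail.

```c
#include <assert.h>
#include <stdlib.h>

/* Local stand-in for the FREE_AND_NULL macro named above. */
#define SKETCH_FREE_AND_NULL(type, free_fn, ptr) \
  do {                                           \
    type *tmp_ = (ptr);                          \
    free_fn(tmp_);                               \
    (ptr) = NULL;                                \
  } while (0)

/* Hypothetical example type. */
typedef struct point_t {
  int x;
  int y;
} point_t;

/* Constructor: never returns NULL. */
static point_t *
point_new(int x, int y)
{
  point_t *p = malloc(sizeof(*p));
  assert(p != NULL);
  p->x = x;
  p->y = y;
  return p;
}

/* The underlying free function; passing NULL is a no-op. */
static void
point_free_(point_t *p)
{
  if (!p)
    return;
  free(p);
}

/* The public macro: frees the object and clears the caller's pointer. */
#define point_free(ptr) SKETCH_FREE_AND_NULL(point_t, point_free_, (ptr))
```

Callers use only `point_new()` and `point_free()`; the trailing-underscore function exists so that the macro has something to wrap.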
|
||||
|
||||
|
||||
### Grow-only memory allocation: lib/memarea
|
||||
|
||||
It's often handy to allocate a large number of tiny objects, all of which
|
||||
need to disappear at the same time. You can do this in tor using the
|
||||
memarea.c abstraction, which uses a set of grow-only buffers for allocation,
|
||||
and only supports a single "free" operation at the end.
|
||||
|
||||
Using memareas also helps you avoid memory fragmentation. You see, some libc
|
||||
malloc implementations perform badly on the case where a large number of
|
||||
small temporary objects are allocated at the same time as a few long-lived
|
||||
objects of similar size. But if you use tor_malloc() for the long-lived ones
|
||||
and a memarea for the temporary objects, the malloc implementation is likelier
|
||||
to do better.
|
||||
|
||||
To create a new memarea, use `memarea_new()`. To drop all the storage from a
|
||||
memarea, and invalidate its pointers, use `memarea_drop_all()`.
|
||||
|
||||
The allocation functions `memarea_alloc()`, `memarea_alloc_zero()`,
|
||||
`memarea_memdup()`, `memarea_strdup()`, and `memarea_strndup()` are analogous
|
||||
to the similarly-named malloc() functions. There is intentionally no
|
||||
`memarea_free()` or `memarea_realloc()`.
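The grow-only idea can be sketched in a few dozen lines. Everything below is hypothetical (the `arena_*` names, the chunk size, the alignment choice) and is meant only to show why there is no per-object free: allocations are carved out of large chunks, and the only way to release memory is to drop every chunk at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical arena; names and layout are illustrative only. */
typedef union { long long ll; long double ld; void *p; } arena_align_t;
#define ARENA_ALIGN sizeof(arena_align_t)
#define ARENA_CHUNK_SIZE 4096

typedef struct arena_chunk_t {
  struct arena_chunk_t *next; /* Older chunk, or NULL. */
  size_t used;                /* Bytes handed out so far. */
  size_t capacity;            /* Usable bytes in `mem`. */
  unsigned char *mem;
} arena_chunk_t;

typedef struct arena_t {
  arena_chunk_t *head; /* Newest chunk; allocations come from here. */
} arena_t;

static arena_chunk_t *
arena_chunk_new(size_t capacity, arena_chunk_t *next)
{
  arena_chunk_t *c = malloc(sizeof(*c));
  assert(c != NULL);
  c->mem = malloc(capacity);
  assert(c->mem != NULL);
  c->used = 0;
  c->capacity = capacity;
  c->next = next;
  return c;
}

static arena_t *
arena_new(void)
{
  arena_t *a = malloc(sizeof(*a));
  assert(a != NULL);
  a->head = arena_chunk_new(ARENA_CHUNK_SIZE, NULL);
  return a;
}

/* Hand out `n` bytes; there is deliberately no matching free. */
static void *
arena_alloc(arena_t *a, size_t n)
{
  assert(n <= (size_t)-1 - ARENA_ALIGN);
  /* Round up so that every returned pointer stays suitably aligned. */
  size_t need = (n + ARENA_ALIGN - 1) / ARENA_ALIGN * ARENA_ALIGN;
  if (need == 0)
    need = ARENA_ALIGN;
  if (need > a->head->capacity - a->head->used) {
    size_t cap = need > ARENA_CHUNK_SIZE ? need : ARENA_CHUNK_SIZE;
    a->head = arena_chunk_new(cap, a->head);
  }
  void *p = a->head->mem + a->head->used;
  a->head->used += need;
  return p;
}

static char *
arena_strdup(arena_t *a, const char *s)
{
  size_t len = strlen(s) + 1;
  return memcpy(arena_alloc(a, len), s, len);
}

/* Release every chunk; all pointers from this arena become invalid. */
static void
arena_drop_all(arena_t *a)
{
  arena_chunk_t *c = a->head;
  while (c) {
    arena_chunk_t *next = c->next;
    free(c->mem);
    free(c);
    c = next;
  }
  free(a);
}
```

A parser can allocate every token and temporary string from one such arena and then discard them all in a single call once parsing finishes.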
|
||||
|
||||
### Special allocation: lib/malloc/map_anon.h
|
||||
|
||||
TODO: WRITEME.
|
@ -1,45 +0,0 @@
|
||||
|
||||
## Collections in tor
|
||||
|
||||
### Smartlists: Neither lists, nor especially smart.
|
||||
|
||||
For historical reasons, we call our dynamically allocated array type
|
||||
`smartlist_t`. It can grow or shrink as elements are added and removed.
|
||||
|
||||
All smartlists hold an array of `void *`. Whenever you expose a smartlist
|
||||
in an API you *must* document which types its pointers actually hold.
|
||||
|
||||
<!-- It would be neat to fix that, wouldn't it? -NM -->
|
||||
|
||||
Smartlists are created empty with `smartlist_new()` and freed with
|
||||
`smartlist_free()`. See the `containers.h` module documentation for more
|
||||
information; there are many convenience functions for commonly needed
|
||||
operations.
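For readers unfamiliar with the pattern, here is a miniature growable array of `void *`. It is a sketch with hypothetical `ptrlist_*` names, not the smartlist implementation, and it omits nearly all of the convenience functions the real module provides.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of a growable array of untyped pointers. */
typedef struct ptrlist_t {
  void **items;
  size_t len;      /* Number of elements in use. */
  size_t capacity; /* Number of elements allocated. */
} ptrlist_t;

static ptrlist_t *
ptrlist_new(void)
{
  ptrlist_t *l = calloc(1, sizeof(*l));
  assert(l != NULL);
  return l;
}

/* Append `item`, doubling the backing array when it is full. */
static void
ptrlist_add(ptrlist_t *l, void *item)
{
  if (l->len == l->capacity) {
    size_t cap = l->capacity ? l->capacity * 2 : 16;
    void **items = realloc(l->items, cap * sizeof(void *));
    assert(items != NULL);
    l->items = items;
    l->capacity = cap;
  }
  l->items[l->len++] = item;
}

static void *
ptrlist_get(const ptrlist_t *l, size_t idx)
{
  assert(idx < l->len);
  return l->items[idx];
}

/* Free the list itself.  The elements are NOT freed: the list does not
 * know what type they are or who owns them. */
static void
ptrlist_free(ptrlist_t *l)
{
  if (!l)
    return;
  free(l->items);
  free(l);
}
```

Because the element type is erased, nothing stops a caller from reading a stored pointer back as the wrong type, which is why documenting the element type in every API matters.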
|
||||
|
||||
<!-- TODO: WRITE more about what you can do with smartlists. -->
|
||||
|
||||
### Digest maps, string maps, and more.
|
||||
|
||||
Tor makes frequent use of maps from 160-bit digests, 256-bit digests,
|
||||
or nul-terminated strings to `void *`. These types are `digestmap_t`,
|
||||
`digest256map_t`, and `strmap_t` respectively. See the containers.h
|
||||
module documentation for more information.
|
||||
|
||||
### Intrusive lists and hashtables
|
||||
|
||||
For performance-sensitive cases, we sometimes want to use "intrusive"
|
||||
collections: ones where the bookkeeping pointers are stuck inside the
|
||||
structures that belong to the collection. If you've used the
|
||||
BSD-style sys/queue.h macros, you'll be familiar with these.
|
||||
|
||||
Unfortunately, the `sys/queue.h` macros vary significantly between the
|
||||
platforms that have them, so we provide our own variants in
|
||||
`src/ext/tor_queue.h`.
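As a sketch of what "intrusive" means, the hand-written list below keeps its link pointer inside the element itself. The `job_t` type and function names are hypothetical; the macros in `src/ext/tor_queue.h` generate this kind of code for you.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type: the bookkeeping pointer is a member of
 * the structure, so linking an element requires no extra allocation. */
typedef struct job_t {
  int id;
  struct job_t *next;
} job_t;

typedef struct job_list_t {
  job_t *first;
} job_list_t;

/* Link `j` in at the front of the list. */
static void
job_list_insert_head(job_list_t *l, job_t *j)
{
  assert(j != NULL);
  j->next = l->first;
  l->first = j;
}

/* Unlink and return the first element, or NULL if the list is empty. */
static job_t *
job_list_pop_head(job_list_t *l)
{
  job_t *j = l->first;
  if (j) {
    l->first = j->next;
    j->next = NULL;
  }
  return j;
}

static size_t
job_list_length(const job_list_t *l)
{
  size_t n = 0;
  for (const job_t *j = l->first; j; j = j->next)
    ++n;
  return n;
}
```

The trade-off is that an element can belong to only as many lists as it has embedded link members.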
|
||||
|
||||
We also provide an intrusive hashtable implementation in `src/ext/ht.h`.
|
||||
When you're using it, you'll need to define your own hash
|
||||
functions. If attacker-induced collisions are a worry here, use the
|
||||
cryptographic siphash24g function to extract hashes.
|
||||
|
||||
<!-- TODO: WRITE about bloom filters, namemaps, bit-arrays, order functions.
|
||||
-->
|
@ -1,132 +1,4 @@
|
||||
|
||||
## Lower-level cryptography functionality in Tor ##
|
||||
|
||||
Generally speaking, Tor code shouldn't be calling OpenSSL (or any
|
||||
other crypto library) directly. Instead, we should indirect through
|
||||
one of the functions in src/common/crypto\*.c or src/common/tortls.c.
|
||||
|
||||
Cryptography functionality that's available is described below.
|
||||
|
||||
### RNG facilities ###
|
||||
|
||||
The most basic RNG capability in Tor is the crypto_rand() family of
|
||||
functions. These currently use OpenSSL's RAND_() backend, but may use
|
||||
something faster in the future.
|
||||
|
||||
In addition to crypto_rand(), which fills in a buffer with random
|
||||
bytes, we also have functions to produce random integers in certain
|
||||
ranges; to produce random hostnames; to produce random doubles, etc.
|
||||
|
||||
When you're creating a long-term cryptographic secret, you might want
|
||||
to use crypto_strongest_rand() instead of crypto_rand(). It takes the
|
||||
operating system's entropy source and combines it with output from
|
||||
crypto_rand(). This is a pure paranoia measure, but it might help us
|
||||
someday.
|
||||
|
||||
You can use smartlist_choose() to pick a random element from a smartlist
|
||||
and smartlist_shuffle() to randomize the order of a smartlist. Both are
|
||||
potentially a bit slow.
|
||||
|
||||
### Cryptographic digests and related functions ###
|
||||
|
||||
We treat digests as separate types based on the length of their
|
||||
outputs. We support one 160-bit digest (SHA1), two 256-bit digests
|
||||
(SHA256 and SHA3-256), and two 512-bit digests (SHA512 and SHA3-512).
|
||||
|
||||
You should not use SHA1 for anything new.
|
||||
|
||||
The crypto_digest\*() family of functions manipulates digests. You
|
||||
can either compute a digest of a chunk of memory all at once using
|
||||
crypto_digest(), crypto_digest256(), or crypto_digest512(). Or you
|
||||
can create a crypto_digest_t object with
|
||||
crypto_digest{,256,512}_new(), feed information to it in chunks using
|
||||
crypto_digest_add_bytes(), and then extract the final digest using
|
||||
crypto_digest_get_digest(). You can copy the state of one of these
|
||||
objects using crypto_digest_dup() or crypto_digest_assign().
|
||||
|
||||
We support the HMAC hash-based message authentication code
|
||||
instantiated using SHA256. See crypto_hmac_sha256. (You should not
|
||||
add any HMAC users with SHA1, and HMAC is not necessary with SHA3.)
|
||||
|
||||
We also support the SHA3 cousins, SHAKE128 and SHAKE256. Unlike
|
||||
digests, these are extendable output functions (or XOFs) where you can
|
||||
get any amount of output. Use the crypto_xof_\*() functions to access
|
||||
these.
|
||||
|
||||
We have several ways to derive keys from cryptographically strong secret
|
||||
inputs (like Diffie-Hellman outputs). The old
|
||||
crypto_expand_key_material_TAP() performs an ad-hoc KDF based on SHA1 -- you
|
||||
shouldn't use it for implementing anything but old versions of the Tor
|
||||
protocol. You can use HKDF-SHA256 (as defined in RFC5869) for more modern
|
||||
protocols. Also consider SHAKE256.
|
||||
|
||||
If your input is potentially weak, like a password or passphrase, use a salt
|
||||
along with the secret_to_key() functions as defined in crypto_s2k.c. Prefer
|
||||
scrypt over other hashing methods when possible. If you're using a password
|
||||
to encrypt something, see the "boxed file storage" section below.
|
||||
|
||||
Finally, in order to store objects in hash tables, Tor includes the
|
||||
randomized SipHash 2-4 function. Call it via the siphash24g() function in
|
||||
src/ext/siphash.h whenever you're creating a hashtable whose keys may be
|
||||
manipulated by an attacker in order to DoS you with collisions.
|
||||
|
||||
|
||||
### Stream ciphers ###
|
||||
|
||||
You can create instances of a stream cipher using crypto_cipher_new().
|
||||
These are stateful objects of type crypto_cipher_t. Note that these
|
||||
objects only support AES-128 right now; a future version should add
|
||||
support for AES-256 and/or ChaCha20.
|
||||
|
||||
You can encrypt/decrypt with crypto_cipher_encrypt or
|
||||
crypto_cipher_decrypt. The crypto_cipher_crypt_inplace function performs
|
||||
an encryption without a copy.
|
||||
|
||||
Note that sensible people should not use raw stream ciphers; they should
|
||||
probably be using some kind of AEAD. Sorry.
|
||||
|
||||
### Public key functionality ###
|
||||
|
||||
We support four public key algorithms: DH1024, RSA, Curve25519, and
|
||||
Ed25519.
|
||||
|
||||
We support DH1024 over two prime groups. You access these via the
|
||||
crypto_dh_\*() family of functions.
|
||||
|
||||
We support RSA in many bit sizes for signing and encryption. You access
|
||||
it via the crypto_pk_*() family of functions. Note that a crypto_pk_t
|
||||
may or may not include a private key. See the crypto_pk_* functions in
|
||||
crypto.c for a full list of functions here.
|
||||
|
||||
For Curve25519 functionality, see the functions and types in
|
||||
crypto_curve25519.c. Curve25519 is generally suitable for when you need
|
||||
a secure fast elliptic-curve Diffie-Hellman implementation. When
|
||||
designing new protocols, prefer it over DH in Z_p.
|
||||
|
||||
For Ed25519 functionality, see the functions and types in
|
||||
crypto_ed25519.c. Ed25519 is generally suitable as a secure fast
|
||||
elliptic curve signature method. For new protocols, prefer it over RSA
|
||||
signatures.
|
||||
|
||||
### Metaformats for storage ###
|
||||
|
||||
When OpenSSL manages the storage of some object, we use whatever format
|
||||
OpenSSL provides -- typically, some kind of PEM-wrapped base 64 encoding
|
||||
that starts with "----- BEGIN CRYPTOGRAPHIC OBJECT ----".
|
||||
|
||||
When we manage the storage of some cryptographic object, we prefix the
|
||||
object with a 32-byte NUL-padded prefix in order to avoid accidental
|
||||
object confusion; see the crypto_read_tagged_contents_from_file() and
|
||||
crypto_write_tagged_contents_to_file() functions for manipulating
|
||||
these. The prefix is "== type: tag ==", where type describes the object
|
||||
and its encoding, and tag indicates which one it is.
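The prefix layout described above can be sketched as follows. The function name `format_tag_prefix` and the example strings are hypothetical, and this sketch is slightly stricter than necessary: it insists on at least one padding NUL byte, so the text may use at most 31 of the 32 bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TAG_PREFIX_LEN 32

/* Hypothetical helper: fill `out` with "== type: tag ==", padded with
 * NUL bytes to exactly 32 bytes.  Returns 0 on success, or -1 if the
 * formatted text does not fit. */
static int
format_tag_prefix(char out[TAG_PREFIX_LEN], const char *type,
                  const char *tag)
{
  assert(out != NULL && type != NULL && tag != NULL);
  memset(out, 0, TAG_PREFIX_LEN); /* Pre-fill so the padding is NUL. */
  int n = snprintf(out, TAG_PREFIX_LEN, "== %s: %s ==", type, tag);
  if (n < 0 || n >= TAG_PREFIX_LEN) {
    memset(out, 0, TAG_PREFIX_LEN); /* Do not leave a truncated prefix. */
    return -1;
  }
  return 0;
}
```

A reader can then compare the first 32 bytes of a file against the expected prefix and reject the file outright if it holds some other kind of object.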
|
||||
|
||||
### Boxed-file storage ###
|
||||
|
||||
When managing keys, you frequently want to have some way to write a
|
||||
secret object to disk, encrypted with a passphrase. The crypto_pwbox
|
||||
and crypto_unpwbox functions do so in a way that's likely to be
|
||||
readable by future versions of Tor.
|
||||
|
||||
### Certificates ###
|
||||
|
||||
@ -153,17 +25,3 @@ napkin.
|
||||
documents that include keys and which are signed by keys. You can
|
||||
consider these documents to be an additional kind of certificate if you
|
||||
want.)
|
||||
|
||||
### TLS ###
|
||||
|
||||
Tor's TLS implementation is more tightly coupled to OpenSSL than we'd
|
||||
prefer. You can read most of it in tortls.c.
|
||||
|
||||
Unfortunately, TLS's state machine and our requirement for nonblocking
|
||||
IO support means that using TLS in practice is a bit hairy, since
|
||||
logical writes can block on physical reads, and vice versa.
|
||||
|
||||
If you are lucky, you will never have to look at the code here.
|
||||
|
||||
|
||||
|
||||
|
@ -1,95 +1,6 @@
|
||||
|
||||
## Tor's modules ##
|
||||
|
||||
### Generic modules ###
|
||||
|
||||
`buffers.c`
|
||||
: Implements the `buf_t` buffered data type for connections, and several
|
||||
low-level data handling functions to handle network protocols on it.
|
||||
|
||||
`channel.c`
|
||||
: Generic channel implementation. Channels handle sending and receiving cells
|
||||
among tor nodes.
|
||||
|
||||
`channeltls.c`
|
||||
: Channel implementation for TLS-based OR connections. Uses `connection_or.c`.
|
||||
|
||||
`circuitbuild.c`
|
||||
: Code for constructing circuits and choosing their paths. (*Note*:
|
||||
this module could plausibly be split into handling the client side,
|
||||
the server side, and the path generation aspects of circuit building.)
|
||||
|
||||
`circuitlist.c`
|
||||
: Code for maintaining and navigating the global list of circuits.
|
||||
|
||||
`circuitmux.c`
|
||||
: Generic circuitmux implementation. A circuitmux handles deciding, for a
|
||||
particular channel, which circuit should write next.
|
||||
|
||||
`circuitmux_ewma.c`
|
||||
: A circuitmux implementation based on the EWMA (exponentially
|
||||
weighted moving average) algorithm.
|
||||
|
||||
`circuituse.c`
|
||||
: Code to actually send and receive data on circuits.
|
||||
|
||||
`command.c`
|
||||
: Handles incoming cells on channels.
|
||||
|
||||
`config.c`
|
||||
: Parses options from torrc, and uses them to configure the rest of Tor.
|
||||
|
||||
`confparse.c`
|
||||
: Generic torrc-style parser. Used to parse torrc and state files.
|
||||
|
||||
`connection.c`
|
||||
: Generic and common connection tools, and implementation for the simpler
|
||||
connection types.
|
||||
|
||||
`connection_edge.c`
|
||||
: Implementation for entry and exit connections.
|
||||
|
||||
`connection_or.c`
|
||||
: Implementation for OR connections (the ones that send cells over TLS).
|
||||
|
||||
`main.c`
|
||||
: Principal entry point, main loops, scheduled events, and network
|
||||
management for Tor.
|
||||
|
||||
`ntmain.c`
|
||||
: Implements Tor as a Windows service. (Not very well.)
|
||||
|
||||
`onion.c`
|
||||
: Generic code for generating and responding to CREATE and CREATED
|
||||
cells, and performing the appropriate onion handshakes. Also contains
|
||||
code to manage the server-side onion queue.
|
||||
|
||||
`onion_fast.c`
|
||||
: Implements the old SHA1-based CREATE_FAST/CREATED_FAST circuit
|
||||
creation handshake. (Now deprecated.)
|
||||
|
||||
`onion_ntor.c`
|
||||
: Implements the Curve25519-based NTOR circuit creation handshake.
|
||||
|
||||
`onion_tap.c`
|
||||
: Implements the old RSA1024/DH1024-based TAP circuit creation handshake. (Now
|
||||
deprecated.)
|
||||
|
||||
`relay.c`
|
||||
: Handles particular types of relay cells, and provides code to receive,
|
||||
encrypt, route, and interpret relay cells.
|
||||
|
||||
`scheduler.c`
|
||||
: Decides which channel/circuit pair is ready to receive the next cell.
|
||||
|
||||
`statefile.c`
|
||||
: Handles loading and storing Tor's state file.
|
||||
|
||||
`tor_main.c`
|
||||
: Contains the actual `main()` function. (This is placed in a separate
|
||||
file so that the unit tests can have their own `main()`.)
|
||||
|
||||
|
||||
### Node-status modules ###
|
||||
|
||||
`directory.c`
|
||||
|
@ -1,5 +1,5 @@
|
||||
/**
|
||||
@dir app
|
||||
@dir /app
|
||||
@brief app: top-level entry point for Tor
|
||||
|
||||
The "app" directory has Tor's main entry point and configuration logic,
|
||||
|
@ -1,4 +1,8 @@
|
||||
/**
|
||||
@dir app/config
|
||||
@brief app/config
|
||||
@dir /app/config
|
||||
@brief app/config: Top-level configuration code
|
||||
|
||||
Refactoring this module is a work in progress, see
|
||||
[ticket 29211](https://trac.torproject.org/projects/tor/ticket/29211).
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,4 @@
|
||||
/**
|
||||
@dir app/main
|
||||
@brief app/main
|
||||
@dir /app/main
|
||||
@brief app/main: Entry point for tor.
|
||||
**/
|
||||
|
@ -1,8 +1,20 @@
|
||||
/**
|
||||
@dir core
|
||||
@dir /core
|
||||
@brief core: main loop and onion routing functionality
|
||||
|
||||
The "core" directory has the central protocols for Tor, which every
|
||||
client and relay must implement in order to perform onion routing.
|
||||
|
||||
It is divided into three lower-level pieces:
|
||||
|
||||
- \refdir{core/crypto} -- Tor-specific cryptography.
|
||||
|
||||
- \refdir{core/proto} -- Protocol encoding/decoding.
|
||||
|
||||
- \refdir{core/mainloop} -- A connection-oriented asynchronous mainloop.
|
||||
|
||||
and one high-level piece:
|
||||
|
||||
- \refdir{core/or} -- Implements onion routing itself.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,8 @@
|
||||
/**
|
||||
@dir core/crypto
|
||||
@brief core/crypto
|
||||
@dir /core/crypto
|
||||
@brief core/crypto: Tor-specific cryptography
|
||||
|
||||
This module implements Tor's circuit-construction crypto and Tor's
|
||||
relay crypto.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,12 @@
|
||||
/**
|
||||
@dir core/mainloop
|
||||
@brief core/mainloop
|
||||
@dir /core/mainloop
|
||||
@brief core/mainloop: Non-onion-routing mainloop functionality
|
||||
|
||||
This module uses the event-loop code of \refdir{lib/evloop} to implement an
|
||||
asynchronous connection-oriented protocol handler.
|
||||
|
||||
The layering here is imperfect: the code here was split from \refdir{core/or}
|
||||
without refactoring how the two modules call one another. Probably many
|
||||
functions should be moved and refactored.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,62 @@
|
||||
/**
|
||||
@dir core/or
|
||||
@brief core/or
|
||||
**/
|
||||
@dir /core/or
|
||||
@brief core/or: *Onion routing happens here*.
|
||||
|
||||
This is the central part of Tor that handles the core tasks of onion routing:
|
||||
building circuits, handling circuits, attaching circuits to streams, moving
|
||||
data around, and so forth.
|
||||
|
||||
Some aspects of this module should probably be refactored into others.
|
||||
|
||||
Notable files here include:
|
||||
|
||||
`channel.c`
|
||||
: Generic channel implementation. Channels handle sending and receiving cells
|
||||
among tor nodes.
|
||||
|
||||
`channeltls.c`
|
||||
: Channel implementation for TLS-based OR connections. Uses `connection_or.c`.
|
||||
|
||||
`circuitbuild.c`
|
||||
: Code for constructing circuits and choosing their paths. (*Note*:
|
||||
this module could plausibly be split into handling the client side,
|
||||
the server side, and the path generation aspects of circuit building.)
|
||||
|
||||
`circuitlist.c`
|
||||
: Code for maintaining and navigating the global list of circuits.
|
||||
|
||||
`circuitmux.c`
|
||||
: Generic circuitmux implementation. A circuitmux handles deciding, for a
|
||||
particular channel, which circuit should write next.
|
||||
|
||||
`circuitmux_ewma.c`
|
||||
: A circuitmux implementation based on the EWMA (exponentially
|
||||
weighted moving average) algorithm.
|
||||
|
||||
`circuituse.c`
|
||||
: Code to actually send and receive data on circuits.
|
||||
|
||||
`command.c`
|
||||
: Handles incoming cells on channels.
|
||||
|
||||
`connection.c`
|
||||
: Generic and common connection tools, and implementation for the simpler
|
||||
connection types.
|
||||
|
||||
`connection_edge.c`
|
||||
: Implementation for entry and exit connections.
|
||||
|
||||
`connection_or.c`
|
||||
: Implementation for OR connections (the ones that send cells over TLS).
|
||||
|
||||
`onion.c`
|
||||
: Generic code for generating and responding to CREATE and CREATED
|
||||
cells, and performing the appropriate onion handshakes. Also contains
|
||||
code to manage the server-side onion queue.
|
||||
|
||||
`relay.c`
|
||||
: Handles particular types of relay cells, and provides code to receive,
|
||||
encrypt, route, and interpret relay cells.
|
||||
|
||||
`scheduler.c`
|
||||
: Decides which channel/circuit pair is ready to receive the next cell.
|
||||
|
@ -1,4 +1,8 @@
|
||||
/**
|
||||
@dir core/proto
|
||||
@brief core/proto
|
||||
@dir /core/proto
|
||||
@brief core/proto: Protocol encoding/decoding
|
||||
|
||||
These functions should (but do not always) exist at a lower level than most
|
||||
of the rest of core.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,4 @@
|
||||
/**
|
||||
@dir feature/api
|
||||
@brief feature/api
|
||||
@dir /feature/api
|
||||
@brief feature/api: In-process interface to starting/stopping Tor.
|
||||
**/
|
||||
|
@ -1,4 +1,7 @@
|
||||
/**
|
||||
@dir feature/client
|
||||
@brief feature/client
|
||||
@dir /feature/client
|
||||
@brief feature/client: Client-specific code
|
||||
|
||||
(There is also a bunch of client-specific code in other modules.)
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,10 @@
|
||||
/**
|
||||
@dir feature/control
|
||||
@brief feature/control
|
||||
@dir /feature/control
|
||||
@brief feature/control: Controller API.
|
||||
|
||||
The Controller API is a text-based protocol that another program (or another
|
||||
thread, if you're running Tor in-process) can use to configure and control
|
||||
Tor while it is running. The current protocol is documented in
|
||||
[control-spec.txt](https://gitweb.torproject.org/torspec.git/tree/control-spec.txt).
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,11 @@
|
||||
/**
|
||||
@dir feature/dirauth
|
||||
@brief feature/dirauth
|
||||
@dir /feature/dirauth
|
||||
@brief feature/dirauth: Directory authority implementation.
|
||||
|
||||
This module handles running Tor as a directory authority.
|
||||
|
||||
The directory protocol is specified in
|
||||
[dir-spec.txt](https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt).
|
||||
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,8 @@
|
||||
/**
|
||||
@dir feature/dircache
|
||||
@brief feature/dircache
|
||||
@dir /feature/dircache
|
||||
@brief feature/dircache: Run as a directory cache server
|
||||
|
||||
This module handles the directory caching functionality that all relays may
|
||||
provide, for serving cached directory objects to clients.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,9 @@
|
||||
/**
|
||||
@dir feature/dirclient
|
||||
@brief feature/dirclient
|
||||
@dir /feature/dirclient
|
||||
@brief feature/dirclient: Directory client implementation.
|
||||
|
||||
The code here is used by all Tor instances that need to download directory
|
||||
information. Currently, that is all of them, since even authorities need to
|
||||
launch downloads to learn about relays that other authorities have listed.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,9 @@
|
||||
/**
|
||||
@dir feature/dircommon
|
||||
@brief feature/dircommon
|
||||
@dir /feature/dircommon
|
||||
@brief feature/dircommon: Directory client and server shared code
|
||||
|
||||
This module has the code that directory clients (anybody who downloads
|
||||
information about relays) and directory servers (anybody who serves such
|
||||
information) share in common.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,10 @@
|
||||
/**
|
||||
@dir feature/dirparse
|
||||
@brief feature/dirparse
|
||||
@dir /feature/dirparse
|
||||
@brief feature/dirparse: Parsing Tor directory objects
|
||||
|
||||
We define a number of "directory objects" in
|
||||
[dir-spec.txt](https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt),
|
||||
all of them using a common line-oriented meta-format. This module is used by
|
||||
other parts of Tor to parse them.
|
||||
|
||||
**/
|
||||
|
@ -1,5 +1,5 @@
|
||||
/**
|
||||
@dir feature
|
||||
@dir /feature
|
||||
@brief feature: domain-specific modules
|
||||
|
||||
The "feature" directory has modules that Tor uses only for a particular
|
||||
|
@ -1,4 +1,16 @@
|
||||
/**
|
||||
@dir feature/hibernate
|
||||
@brief feature/hibernate
|
||||
@dir /feature/hibernate
|
||||
@brief feature/hibernate: Bandwidth accounting and hibernation (!)
|
||||
|
||||
This module implements two features that are only somewhat related, and
|
||||
should probably be separated in the future. One feature is bandwidth
|
||||
accounting (making sure we use no more than so many gigabytes in a day) and
|
||||
hibernation (avoiding network activity while we have used up all/most of our
|
||||
configured gigabytes). The other feature is clean shutdown, where we stop
|
||||
accepting new connections for a while and give the old ones time to close.
|
||||
|
||||
The two features are related only in the sense that "soft hibernation" (being
|
||||
almost out of bandwidth) is very close to the "shutting down" state. But it would be
|
||||
better in the long run to make the two completely separate.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,10 @@
|
||||
/**
|
||||
@dir feature/hs
|
||||
@brief feature/hs
|
||||
@dir /feature/hs
|
||||
@brief feature/hs: v3 (current) onion service protocol
|
||||
|
||||
This directory implements the v3 onion service protocol,
|
||||
as specified in
|
||||
[rend-spec-v3.txt](https://gitweb.torproject.org/torspec.git/tree/rend-spec-v3.txt).
|
||||
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,5 @@
|
||||
/**
|
||||
@dir feature/hs_common
|
||||
@brief feature/hs_common
|
||||
@dir /feature/hs_common
|
||||
@brief feature/hs_common: Common to v2 (old) and v3 (current) onion services
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,5 @@
|
||||
/**
|
||||
@dir feature/keymgt
|
||||
@brief feature/keymgt
|
||||
@dir /feature/keymgt
|
||||
@brief feature/keymgt: Store keys for relays, authorities, etc.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,4 @@
|
||||
/**
|
||||
@dir feature/nodelist
|
||||
@brief feature/nodelist
|
||||
@dir /feature/nodelist
|
||||
@brief feature/nodelist: Download and manage a list of relays
|
||||
**/
|
||||
|
@ -1,4 +1,6 @@
|
||||
/**
|
||||
@dir feature/relay
|
||||
@brief feature/relay
|
||||
@dir /feature/relay
|
||||
@brief feature/relay: Relay-specific code
|
||||
|
||||
(There is also a bunch of relay-specific code in other modules.)
|
||||
**/
|
||||
|
@ -1,4 +1,9 @@
|
||||
/**
|
||||
@dir feature/rend
|
||||
@brief feature/rend
|
||||
@dir /feature/rend
|
||||
@brief feature/rend: version 2 (old) hidden services
|
||||
|
||||
This directory implements the v2 onion service protocol,
|
||||
as specified in
|
||||
[rend-spec-v2.txt](https://gitweb.torproject.org/torspec.git/tree/rend-spec-v2.txt).
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,12 @@
|
||||
/**
|
||||
@dir feature/stats
|
||||
@brief feature/stats
|
||||
@dir /feature/stats
|
||||
@brief feature/stats: Relay statistics. Also, port prediction.
|
||||
|
||||
This module collects anonymized relay statistics in order to publish them in
|
||||
relays' routerinfo and extrainfo documents.
|
||||
|
||||
Additionally, it contains predict_ports.c, which remembers which ports we've
|
||||
visited recently as a client, so we can make sure we have open circuits that
|
||||
support them.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,4 @@
|
||||
/**
|
||||
@dir lib/arch
|
||||
@brief lib/arch
|
||||
@dir /lib/arch
|
||||
@brief lib/arch: Compatibility code for handling different CPU architectures.
|
||||
**/
|
||||
|
@ -1,4 +1,15 @@
|
||||
/**
|
||||
@dir lib/buf
|
||||
@brief lib/buf
|
||||
@dir /lib/buf
|
||||
@brief lib/buf: An efficient byte queue.
|
||||
|
||||
This module defines the buf_t type, which is used throughout our networking
|
||||
code. The implementation is a singly-linked queue of buffer chunks, similar
|
||||
to the BSD kernel's
|
||||
["mbuf"](https://www.freebsd.org/cgi/man.cgi?query=mbuf&sektion=9) structure.
|
||||
|
||||
The buf_t type is also reasonable for use in constructing long strings.
|
||||
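To make the "queue of buffer chunks" idea concrete, here is a minimal sketch of a chunked byte queue. It is not Tor's buf_t: the names `chunk_t`, `byteq_t`, `byteq_add()`, and `byteq_get()`, and the tiny chunk size, are assumptions made up for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_CAPACITY 16  // tiny on purpose, to exercise multiple chunks

// One link in the queue. Hypothetical layout; NOT Tor's buf_t.
typedef struct chunk_t {
  struct chunk_t *next;
  size_t start;  // offset of the first unread byte
  size_t len;    // number of unread bytes
  unsigned char data[CHUNK_CAPACITY];
} chunk_t;

typedef struct byteq_t {
  chunk_t *head;  // bytes are removed here
  chunk_t *tail;  // bytes are appended here
  size_t total;   // total unread bytes across all chunks
} byteq_t;

// Append n bytes, adding chunks at the tail as needed. Returns 0 or -1.
int
byteq_add(byteq_t *q, const void *src, size_t n)
{
  const unsigned char *p = src;
  assert(q && src);
  while (n > 0) {
    chunk_t *t = q->tail;
    if (!t || t->start + t->len == CHUNK_CAPACITY) {
      chunk_t *c = calloc(1, sizeof(chunk_t));
      if (!c)
        return -1;
      if (t)
        t->next = c;
      else
        q->head = c;
      q->tail = t = c;
    }
    size_t room = CHUNK_CAPACITY - (t->start + t->len);
    size_t take = n < room ? n : room;
    memcpy(t->data + t->start + t->len, p, take);
    t->len += take;
    q->total += take;
    p += take;
    n -= take;
  }
  return 0;
}

// Remove up to n bytes from the head into dst. Returns the count removed.
size_t
byteq_get(byteq_t *q, void *dst, size_t n)
{
  unsigned char *out = dst;
  size_t got = 0;
  assert(q && dst);
  while (got < n && q->head) {
    chunk_t *h = q->head;
    size_t take = (n - got) < h->len ? (n - got) : h->len;
    memcpy(out + got, h->data + h->start, take);
    h->start += take;
    h->len -= take;
    q->total -= take;
    got += take;
    if (h->len == 0) {  // chunk fully drained: unlink and free it
      q->head = h->next;
      if (!q->head)
        q->tail = NULL;
      free(h);
    }
  }
  return got;
}
```

Appending never moves bytes that are already queued, and draining frees whole chunks as they empty, which is what makes this shape attractive for network I/O. Real code should use the buf_t functions instead.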
|
||||
See \refdir{lib/net} for networking code that uses buf_t, and
|
||||
\refdir{lib/tls} for cryptographic code that uses buf_t.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,4 @@
|
||||
/**
|
||||
@dir lib/cc
|
||||
@brief lib/cc
|
||||
@dir /lib/cc
|
||||
@brief lib/cc: Macros for managing the C compiler and language.
|
||||
**/
|
||||
|
@ -1,4 +1,8 @@
|
||||
/**
|
||||
@dir lib/compress
|
||||
@brief lib/compress
|
||||
@dir /lib/compress
|
||||
@brief lib/compress: Wraps several compression libraries
|
||||
|
||||
Currently supported are zlib (mandatory), zstd (optional), and lzma
|
||||
(optional).
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,5 @@
|
||||
/**
|
||||
@dir lib/conf
|
||||
@brief lib/conf
|
||||
@dir /lib/conf
|
||||
@brief lib/conf: Types and macros for declaring configuration options.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,9 @@
|
||||
/**
|
||||
@dir lib/confmgt
|
||||
@brief lib/confmgt
|
||||
@dir /lib/confmgt
|
||||
@brief lib/confmgt: Parse, encode, manipulate configuration files.
|
||||
|
||||
This logic is used in common by our state files (statefile.c) and
|
||||
configuration files (config.c) to manage a set of named, typed fields,
|
||||
reading and writing them to disk and to the controller.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,51 @@
|
||||
/**
|
||||
@dir lib/container
|
||||
@brief lib/container
|
||||
@dir /lib/container
|
||||
@brief lib/container: Hash tables, dynamic arrays, bit arrays, etc.
|
||||
|
||||
### Smartlists: Neither lists, nor especially smart.
|
||||
|
||||
For historical reasons, we call our dynamically allocated array type
|
||||
`smartlist_t`. It can grow or shrink as elements are added and removed.
|
||||
|
||||
All smartlists hold an array of `void *`. Whenever you expose a smartlist
|
||||
in an API you *must* document which types its pointers actually hold.
|
||||
|
||||
<!-- It would be neat to fix that, wouldn't it? -NM -->
|
||||
|
||||
Smartlists are created empty with `smartlist_new()` and freed with
|
||||
`smartlist_free()`. See the `containers.h` header documentation for more
|
||||
information; there are many convenience functions for commonly needed
|
||||
operations.
|
||||
|
||||
For low-level operations on smartlists, see also
|
||||
\refdir{lib/smartlist_core}.
|
||||
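As a rough illustration of the idea (not Tor's actual implementation), a growable array of `void *` can be sketched as below. The names `ptrlist_t`, `ptrlist_new()`, `ptrlist_add()`, and `ptrlist_free()` are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical growable array of void pointers; NOT Tor's smartlist_t.
typedef struct ptrlist_t {
  void **items;     // heap-allocated array of element pointers
  size_t len;       // number of elements in use
  size_t capacity;  // number of slots allocated
} ptrlist_t;

// Allocate an empty list. Returns NULL on allocation failure.
ptrlist_t *
ptrlist_new(void)
{
  return calloc(1, sizeof(ptrlist_t));
}

// Append elem, doubling the capacity when the array is full.
// Returns 0 on success, -1 on allocation failure.
int
ptrlist_add(ptrlist_t *pl, void *elem)
{
  assert(pl);
  if (pl->len == pl->capacity) {
    size_t new_cap = pl->capacity ? pl->capacity * 2 : 8;
    void **new_items = realloc(pl->items, new_cap * sizeof(void *));
    if (!new_items)
      return -1;
    pl->items = new_items;
    pl->capacity = new_cap;
  }
  pl->items[pl->len++] = elem;
  return 0;
}

// Free the list itself. The elements are not freed; the caller owns them.
void
ptrlist_free(ptrlist_t *pl)
{
  if (!pl)
    return;
  free(pl->items);
  free(pl);
}
```

Because the element type is erased to `void *`, nothing in this structure records what the pointers refer to, which is why the documentation requirement above matters. In Tor code, use the `smartlist_*` functions rather than anything like this sketch.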
|
||||
<!-- TODO: WRITE more about what you can do with smartlists. -->
|
||||
|
||||
### Digest maps, string maps, and more.
|
||||
|
||||
Tor makes frequent use of maps from 160-bit digests, 256-bit digests,
|
||||
or nul-terminated strings to `void *`. These types are `digestmap_t`,
|
||||
`digest256map_t`, and `strmap_t` respectively. See the containers.h
|
||||
module documentation for more information.
|
||||
|
||||
### Intrusive lists and hashtables
|
||||
|
||||
For performance-sensitive cases, we sometimes want to use "intrusive"
|
||||
collections: ones where the bookkeeping pointers are stuck inside the
|
||||
structures that belong to the collection. If you've used the
|
||||
BSD-style sys/queue.h macros, you'll be familiar with these.
|
||||
|
||||
Unfortunately, the `sys/queue.h` macros vary significantly between the
|
||||
platforms that have them, so we provide our own variants in
|
||||
`ext/tor_queue.h`.
|
||||
|
||||
We also provide an intrusive hashtable implementation in `ext/ht.h`.
|
||||
When you're using it, you'll need to define your own hash
|
||||
functions. If attacker-induced collisions are a worry here, use the
|
||||
cryptographic siphash24g function to extract hashes.
|
||||
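For readers who have not met intrusive collections before, here is a minimal sketch of an intrusive singly-linked list. It does not use the `ext/tor_queue.h` macros; the types `conn_entry_t` and `conn_list_t` and both functions are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical element type: the "next" link lives inside the element
// itself, so adding it to a list needs no separate node allocation.
typedef struct conn_entry_t {
  int id;
  struct conn_entry_t *next_in_list;  // bookkeeping pointer (intrusive)
} conn_entry_t;

// Hypothetical list head.
typedef struct conn_list_t {
  conn_entry_t *first;
} conn_list_t;

// Push ent onto the front of the list. No allocation happens here.
void
conn_list_push(conn_list_t *list, conn_entry_t *ent)
{
  assert(list && ent);
  ent->next_in_list = list->first;
  list->first = ent;
}

// Unlink ent from the list. Returns 1 if it was found, 0 otherwise.
int
conn_list_remove(conn_list_t *list, conn_entry_t *ent)
{
  conn_entry_t **link;
  assert(list && ent);
  link = &list->first;
  while (*link) {
    if (*link == ent) {
      *link = ent->next_in_list;
      ent->next_in_list = NULL;
      return 1;
    }
    link = &(*link)->next_in_list;
  }
  return 0;
}
```

The trade-off is that an element can belong to only as many lists as it has embedded link fields, in exchange for avoiding an allocation and a pointer chase per element.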
|
||||
<!-- TODO: WRITE about bloom filters, namemaps, bit-arrays, order functions.
|
||||
-->
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,139 @@
|
||||
/**
|
||||
@dir lib/crypt_ops
|
||||
@brief lib/crypt_ops
|
||||
@dir /lib/crypt_ops
|
||||
@brief lib/crypt_ops: Cryptographic operations.
|
||||
|
||||
This module contains wrappers around the cryptographic libraries that we
|
||||
support, and implementations for some higher-level cryptographic
|
||||
constructions that we use.
|
||||
|
||||
It wraps our two major cryptographic backends (OpenSSL or NSS, as configured
|
||||
by the user), and also wraps other cryptographic code in src/ext.
|
||||
|
||||
Generally speaking, Tor code shouldn't be calling OpenSSL or NSS
|
||||
(or any other crypto library) directly. Instead, we should indirect through
|
||||
one of the functions in this directory, or through \refdir{lib/tls}.
|
||||
|
||||
Cryptography functionality that's available is described below.
|
||||
|
||||
### RNG facilities ###
|
||||
|
||||
The most basic RNG capability in Tor is the crypto_rand() family of
|
||||
functions. These currently use OpenSSL's RAND_() backend, but may use
|
||||
something faster in the future.
|
||||
|
||||
In addition to crypto_rand(), which fills in a buffer with random
|
||||
bytes, we also have functions to produce random integers in certain
|
||||
ranges; to produce random hostnames; to produce random doubles, etc.
|
||||
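Producing an integer in a range from random bytes is easy to get subtly wrong, so here is a sketch of one common approach, rejection sampling. This is an illustration under stated assumptions, not a copy of Tor's code: `rand_fill_fn`, `rand_uint32_below()`, and `demo_fill()` are hypothetical names, and the byte source is passed in as a parameter so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical callback that fills buf with len random bytes. In Tor this
// role would be played by crypto_rand(); here it is a parameter.
typedef void (*rand_fill_fn)(void *buf, size_t len);

// Return a uniformly distributed integer in [0, max), for max >= 1.
// Draws that fall in the uneven tail of the 32-bit range are rejected,
// so that every result is equally likely (no modulo bias).
uint32_t
rand_uint32_below(rand_fill_fn fill, uint32_t max)
{
  uint32_t cutoff;
  uint32_t val;
  assert(fill && max >= 1);
  // Largest multiple of max that does not exceed UINT32_MAX.
  cutoff = UINT32_MAX - (UINT32_MAX % max);
  do {
    fill(&val, sizeof(val));
  } while (val >= cutoff);
  return val % max;
}

// Demonstration-only byte source built on rand(). It is NOT
// cryptographically secure; it only makes the sketch runnable.
void
demo_fill(void *buf, size_t len)
{
  unsigned char *p = buf;
  size_t i;
  for (i = 0; i < len; ++i)
    p[i] = (unsigned char)(rand() & 0xff);
}
```

A plain `val % max` without the rejection loop would slightly favor small results whenever `max` does not divide the size of the input range evenly.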
|
||||
When you're creating a long-term cryptographic secret, you might want
|
||||
to use crypto_strongest_rand() instead of crypto_rand(). It takes the
|
||||
operating system's entropy source and combines it with output from
|
||||
crypto_rand(). This is a pure paranoia measure, but it might help us
|
||||
someday.
|
||||
|
||||
You can use smartlist_choose() to pick a random element from a smartlist
|
||||
and smartlist_shuffle() to randomize the order of a smartlist. Both are
|
||||
potentially a bit slow.
|
||||
|
||||
### Cryptographic digests and related functions ###
|
||||
|
||||
We treat digests as separate types based on the length of their
|
||||
outputs. We support one 160-bit digest (SHA1), two 256-bit digests
|
||||
(SHA256 and SHA3-256), and two 512-bit digests (SHA512 and SHA3-512).
|
||||
|
||||
You should not use SHA1 for anything new.
|
||||
|
||||
The crypto_digest\*() family of functions manipulates digests. You
|
||||
can either compute a digest of a chunk of memory all at once using
|
||||
crypto_digest(), crypto_digest256(), or crypto_digest512(). Or you
|
||||
can create a crypto_digest_t object with
|
||||
crypto_digest{,256,512}_new(), feed information to it in chunks using
|
||||
crypto_digest_add_bytes(), and then extract the final digest using
|
||||
crypto_digest_get_digest(). You can copy the state of one of these
|
||||
objects using crypto_digest_dup() or crypto_digest_assign().
|
||||
|
||||
We support the HMAC hash-based message authentication code
|
||||
instantiated using SHA256. See crypto_hmac_sha256. (You should not
|
||||
add any HMAC users with SHA1, and HMAC is not necessary with SHA3.)
|
||||
|
||||
We also support the SHA3 cousins, SHAKE128 and SHAKE256. Unlike
|
||||
digests, these are extendable output functions (or XOFs) where you can
|
||||
get any amount of output. Use the crypto_xof_\*() functions to access
|
||||
these.
|
||||
|
||||
We have several ways to derive keys from cryptographically strong secret
|
||||
inputs (like Diffie-Hellman outputs). The old
|
||||
crypto_expand_key_material_TAP() performs an ad-hoc KDF based on SHA1 -- you
|
||||
shouldn't use it for implementing anything but old versions of the Tor
|
||||
protocol. You can use HKDF-SHA256 (as defined in RFC5869) for more modern
|
||||
protocols. Also consider SHAKE256.
|
||||
|
||||
If your input is potentially weak, like a password or passphrase, use a salt
|
||||
along with the secret_to_key() functions as defined in crypto_s2k.c. Prefer
|
||||
scrypt over other hashing methods when possible. If you're using a password
|
||||
to encrypt something, see the "boxed file storage" section below.
|
||||
|
||||
Finally, in order to store objects in hash tables, Tor includes the
|
||||
randomized SipHash 2-4 function. Call it via the siphash24g() function in
|
||||
src/ext/siphash.h whenever you're creating a hashtable whose keys may be
|
||||
manipulated by an attacker in order to DoS you with collisions.
|
||||
|
||||
|
||||
### Stream ciphers ###
|
||||
|
||||
You can create instances of a stream cipher using crypto_cipher_new().
|
||||
These are stateful objects of type crypto_cipher_t. Note that these
|
||||
objects only support AES-128 right now; a future version should add
|
||||
support for AES-256 and/or ChaCha20.
|
||||
|
||||
You can encrypt/decrypt with crypto_cipher_encrypt or
|
||||
crypto_cipher_decrypt. The crypto_cipher_crypt_inplace function performs
|
||||
an encryption without a copy.
|
||||
|
||||
Note that sensible people should not use raw stream ciphers; they should
|
||||
probably be using some kind of AEAD. Sorry.
|
||||
|
||||
### Public key functionality ###
|
||||
|
||||
We support four public key algorithms: DH1024, RSA, Curve25519, and
|
||||
Ed25519.
|
||||
|
||||
We support DH1024 over two prime groups. You access these via the
|
||||
crypto_dh_\*() family of functions.
|
||||
|
||||
We support RSA in many bit sizes for signing and encryption. You access
|
||||
it via the crypto_pk_\*() family of functions. Note that a crypto_pk_t
|
||||
may or may not include a private key. See the crypto_pk_\* functions in
|
||||
crypto.c for a full list of functions here.
|
||||
|
||||
For Curve25519 functionality, see the functions and types in
|
||||
crypto_curve25519.c. Curve25519 is generally suitable for when you need
|
||||
a secure, fast elliptic-curve Diffie-Hellman implementation. When
|
||||
designing new protocols, prefer it over DH in Z_p.
|
||||
|
||||
For Ed25519 functionality, see the functions and types in
|
||||
crypto_ed25519.c. Ed25519 is generally suitable as a secure, fast
|
||||
elliptic curve signature method. For new protocols, prefer it over RSA
|
||||
signatures.
|
||||
|
||||
### Metaformats for storage ###
|
||||
|
||||
When OpenSSL manages the storage of some object, we use whatever format
|
||||
OpenSSL provides -- typically, some kind of PEM-wrapped base 64 encoding
|
||||
that starts with "-----BEGIN CRYPTOGRAPHIC OBJECT-----".
|
||||
|
||||
When we manage the storage of some cryptographic object, we prefix the
|
||||
object with a 32-byte NUL-padded prefix in order to avoid accidental
|
||||
object confusion; see the crypto_read_tagged_contents_from_file() and
|
||||
crypto_write_tagged_contents_to_file() functions for manipulating
|
||||
these. The prefix is "== type: tag ==", where type describes the object
|
||||
and its encoding, and tag indicates which one it is.
|
||||
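The prefix layout described above can be sketched as follows. This is only an illustration of the stated format ("== type: tag ==", NUL-padded to 32 bytes); `format_tagged_prefix()` and `TAGGED_PREFIX_LEN` are hypothetical names, and the real tagged-file functions may differ in detail.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TAGGED_PREFIX_LEN 32  // prefix length stated in the text above

// Write "== type: tag ==" into out, padded with NUL bytes to 32 bytes.
// Returns 0 on success, or -1 if the text does not fit.
// Hypothetical helper: Tor's own functions may differ in detail.
int
format_tagged_prefix(char out[TAGGED_PREFIX_LEN],
                     const char *type, const char *tag)
{
  char tmp[TAGGED_PREFIX_LEN + 1];
  int n;
  assert(out && type && tag);
  n = snprintf(tmp, sizeof(tmp), "== %s: %s ==", type, tag);
  if (n < 0 || n > TAGGED_PREFIX_LEN)
    return -1;
  memset(out, 0, TAGGED_PREFIX_LEN);
  memcpy(out, tmp, (size_t)n);
  return 0;
}
```

A reader can then compare the first 32 bytes of a file against the prefix it expects and refuse to parse anything else, which is how the prefix guards against object confusion.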
|
||||
### Boxed-file storage ###
|
||||
|
||||
When managing keys, you frequently want to have some way to write a
|
||||
secret object to disk, encrypted with a passphrase. The crypto_pwbox
|
||||
and crypto_unpwbox functions do so in a way that's likely to be
|
||||
readable by future versions of Tor.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,16 @@
|
||||
/**
|
||||
@dir lib/ctime
|
||||
@brief lib/ctime
|
||||
@dir /lib/ctime
|
||||
@brief lib/ctime: Constant-time code to avoid side-channels.
|
||||
|
||||
This module contains constant-time implementations of various
|
||||
data comparison and table lookup functions. We use these in preference to
|
||||
memcmp() and so forth, since memcmp() can leak information about its inputs
|
||||
based on how fast it returns. In general, your code should call tor_memeq()
|
||||
and tor_memneq(), not memcmp().
|
||||
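To show what "constant-time" means here, the following is a minimal sketch of an equality check whose running time does not depend on where the inputs differ. The name `ct_memeq_sketch()` is hypothetical; in Tor code, call tor_memeq() rather than writing your own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Return 1 if the len bytes at a and b are equal, 0 otherwise.
// The loop always touches every byte and has no data-dependent branch,
// so its running time does not depend on where the inputs differ.
// Hypothetical sketch; in Tor code, call tor_memeq() instead.
int
ct_memeq_sketch(const void *a, const void *b, size_t len)
{
  const uint8_t *x = a;
  const uint8_t *y = b;
  uint32_t diff = 0;
  size_t i;
  assert(a && b);
  for (i = 0; i < len; ++i)
    diff |= (uint32_t)(x[i] ^ y[i]);
  // diff is in [0, 255]; (diff - 1) has bit 8 set only when diff == 0.
  return (int)(1 & ((diff - 1) >> 8));
}
```

By contrast, memcmp() is free to return at the first mismatching byte, which is exactly the timing signal an attacker can measure. Note that an optimizing compiler may still transform code like this, so a sketch of this kind is a starting point for understanding, not a guarantee.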
|
||||
We also define some _non_-constant-time wrappers for memcmp() here: Since we
|
||||
consider calls to memcmp() to be in error, we require that code that actually
|
||||
doesn't need to be constant-time use the fast_memeq() / fast_memneq() /
|
||||
fast_memcmp() aliases instead.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,4 @@
|
||||
/**
|
||||
@dir lib/defs
|
||||
@brief lib/defs
|
||||
@dir /lib/defs
|
||||
@brief lib/defs: Lowest-level constants, used in many places.
|
||||
**/
|
||||
|
@ -1,4 +1,16 @@
|
||||
/**
|
||||
@dir lib/dispatch
|
||||
@brief lib/dispatch
|
||||
@dir /lib/dispatch
|
||||
@brief lib/dispatch: In-process message delivery.
|
||||
|
||||
This module provides a general in-process "message dispatch" system in which
|
||||
typed messages are sent on channels. The dispatch.h header has far more
|
||||
information.
|
||||
|
||||
It is used by \refdir{lib/pubsub} to implement our general
|
||||
inter-module publish/subscribe system.
|
||||
|
||||
This is not a fancy multi-threaded many-to-many dispatcher as you may be used
|
||||
to from more sophisticated architectures: this dispatcher is intended only
|
||||
for use in improving Tor's architecture.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,8 @@
|
||||
/**
|
||||
@dir lib/encoding
|
||||
@brief lib/encoding
|
||||
@dir /lib/encoding
|
||||
@brief lib/encoding: Encoding data in various forms, types, and transformations
|
||||
|
||||
Here we have time formats (timefmt.c), quoted strings (qstring.c), C strings
|
||||
(string.c), base-16/32/64 (binascii.c), and more.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,15 @@
|
||||
/**
|
||||
@dir lib/err
|
||||
@brief lib/err
|
||||
@dir /lib/err
|
||||
@brief lib/err: Lowest-level error handling code.
|
||||
|
||||
This module is responsible for generating stack traces, handling raw
|
||||
assertion failures, and otherwise reporting problems that might not be
|
||||
safe to report via the regular logging module.
|
||||
|
||||
There are three kinds of users for the functions in this module:
|
||||
* Code that needs a way to assert(), but which cannot use the regular
|
||||
`tor_assert()` macros in the logging module.
|
||||
* Code that needs signal-safe error reporting.
|
||||
* Higher-level error handling code.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,9 @@
|
||||
/**
|
||||
@dir lib/evloop
|
||||
@brief lib/evloop
|
||||
@dir /lib/evloop
|
||||
@brief lib/evloop: Low-level event loop.
|
||||
|
||||
This module has tools to manage the [libevent](https://libevent.org/) event
|
||||
loop and related functionality, in order to implement asynchronous
|
||||
networking, timers, periodic events, and other scheduling tasks.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,7 @@
|
||||
/**
|
||||
@dir lib/fdio
|
||||
@brief lib/fdio
|
||||
@dir /lib/fdio
|
||||
@brief lib/fdio: Code to read/write on file descriptors.
|
||||
|
||||
(This module also handles sockets, on platforms where a socket is not a kind
|
||||
of fd.)
|
||||
**/
|
||||
|
@ -1,4 +1,11 @@
|
||||
/**
|
||||
@dir lib/fs
|
||||
@brief lib/fs
|
||||
@dir /lib/fs
|
||||
@brief lib/fs: Files, filenames, directories, etc.
|
||||
|
||||
This module is mostly a set of compatibility wrappers around
|
||||
operating-system-specific filesystem access.
|
||||
|
||||
It also contains a set of convenience functions for safely writing to files,
|
||||
creating directories, and so on.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,5 @@
|
||||
/**
|
||||
@dir lib/geoip
|
||||
@brief lib/geoip
|
||||
@dir /lib/geoip
|
||||
@brief lib/geoip: IP-to-country mapping
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,4 @@
|
||||
/**
|
||||
@dir lib/intmath
|
||||
@brief lib/intmath
|
||||
@dir /lib/intmath
|
||||
@brief lib/intmath: Integer mathematics.
|
||||
**/
|
||||
|
131
src/lib/lib.dox
@ -1,8 +1,133 @@
|
||||
/**
|
||||
@dir lib
|
||||
@dir /lib
|
||||
@brief lib: low-level functionality.
|
||||
|
||||
The "lib" directory contains low-level functionality, most of it not
|
||||
necessarily Tor-specific.
|
||||
The "lib" directory contains low-level functionality. In general, this
|
||||
code is not necessarily Tor-specific, but is instead possibly useful for
|
||||
other applications.
|
||||
|
||||
The modules in `lib` are currently well-factored: each one depends
|
||||
only on lower-level modules. You can see an up-to-date list of the
|
||||
modules, sorted from lowest to highest level, by running
|
||||
`./scripts/maint/practracker/includes.py --toposort`.
|
||||
|
||||
As of this writing, the library modules are (from lowest to highest
|
||||
level):
|
||||
|
||||
- \refdir{lib/cc} -- Macros for managing the C compiler and
|
||||
language.
|
||||
|
||||
- \refdir{lib/version} -- Holds the current version of Tor.
|
||||
|
||||
- \refdir{lib/testsupport} -- Helpers for making
|
||||
test-only code, and test mocking support.
|
||||
|
||||
- \refdir{lib/defs} -- Lowest-level constants.
|
||||
|
||||
- \refdir{lib/subsys} -- Types used for declaring a
|
||||
"subsystem". (_A subsystem is a module with support for initialization,
|
||||
shutdown, configuration, and so on._)
|
||||
|
||||
- \refdir{lib/conf} -- For declaring configuration options.
|
||||
|
||||
- \refdir{lib/arch} -- For handling differences in CPU
|
||||
architecture.
|
||||
|
||||
- \refdir{lib/err} -- Lowest-level error handling code.
|
||||
|
||||
- \refdir{lib/malloc} -- Memory management.
|
||||
|
||||
- \refdir{lib/intmath} -- Integer mathematics.
|
||||
|
||||
- \refdir{lib/fdio} -- For
|
||||
reading and writing on file descriptors.
|
||||
|
||||
- \refdir{lib/lock} -- Simple locking support.
|
||||
(_Lower-level than the rest of the threading code._)
|
||||
|
||||
- \refdir{lib/ctime} -- Constant-time code to avoid
|
||||
side-channels.
|
||||
|
||||
- \refdir{lib/string} -- Low-level string manipulation.
|
||||
|
||||
- \refdir{lib/wallclock} --
|
||||
For inspecting and manipulating the current (UTC) time.
|
||||
|
||||
- \refdir{lib/osinfo} -- For inspecting the OS version
|
||||
and capabilities.
|
||||
|
||||
- \refdir{lib/smartlist_core} -- The bare-bones
|
||||
pieces of our dynamic array ("smartlist") implementation.
|
||||
|
||||
- \refdir{lib/log} -- Log messages to files, syslogs, etc.
|
||||
|
||||
- \refdir{lib/container} -- General purpose containers,
|
||||
including dynamic arrays ("smartlists"), hashtables, bit arrays,
|
||||
etc.
|
||||
|
||||
- \refdir{lib/trace} -- A general-purpose API
|
||||
for function-tracing functionality in Tor. (_Currently not much used._)
|
||||
|
||||
- \refdir{lib/thread} -- Mid-level threading.
|
||||
|
||||
- \refdir{lib/term} -- Terminal manipulation
|
||||
(like reading a password from the user).
|
||||
|
||||
- \refdir{lib/memarea} -- A fast
|
||||
"arena" style allocator, where the data is freed all at once.
|
||||
|
||||
- \refdir{lib/encoding} -- Encoding
|
||||
data in various formats, datatypes, and transformations.
|
||||
|
||||
- \refdir{lib/dispatch} -- A general-purpose in-process
|
||||
message delivery system.
|
||||
|
||||
- \refdir{lib/sandbox} -- Our Linux seccomp2 sandbox
|
||||
implementation.
|
||||
|
||||
- \refdir{lib/pubsub} -- A publish/subscribe message passing system.
|
||||
|
||||
- \refdir{lib/fs} -- Files, filenames, directories, etc.
|
||||
|
||||
- \refdir{lib/confmgt} -- Parse, encode, and manipulate configuration files.
|
||||
|
||||
- \refdir{lib/crypt_ops} -- Cryptographic operations.
|
||||
|
||||
- \refdir{lib/meminfo} -- Functions for inspecting our
|
||||
memory usage, if the malloc implementation exposes that to us.
|
||||
|
||||
- \refdir{lib/time} -- Higher level time functions, including
|
||||
fine-grained and monotonic timers.
|
||||
|
||||
- \refdir{lib/math} -- Floating-point mathematical utilities.
|
||||
|
||||
- \refdir{lib/buf} -- An efficient byte queue.
|
||||
|
||||
- \refdir{lib/net} -- Networking code, including address
|
||||
manipulation, compatibility wrappers, etc.
|
||||
|
||||
- \refdir{lib/compress} -- Wraps several compression libraries.
|
||||
|
||||
- \refdir{lib/geoip} -- IP-to-country mapping.
|
||||
|
||||
- \refdir{lib/tls} -- TLS library wrappers.
|
||||
|
||||
- \refdir{lib/evloop} -- Low-level event-loop.
|
||||
|
||||
- \refdir{lib/process} -- Launch and manage subprocesses.
|
||||
|
||||
### What belongs in lib?
|
||||
|
||||
In general, if you can imagine some program wanting the functionality
|
||||
you're writing, even if that program had nothing to do with Tor, your
|
||||
functionality belongs in lib.
|
||||
|
||||
If it falls into one of the existing "lib" categories, your
|
||||
functionality belongs in lib.
|
||||
|
||||
If you are using platform-specific `ifdef`s to manage compatibility
|
||||
issues among platforms, you should probably consider whether you can
|
||||
put your code into lib.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,8 @@
|
||||
/**
|
||||
@dir lib/lock
|
||||
@brief lib/lock
|
||||
@dir /lib/lock
|
||||
@brief lib/lock: Simple locking support.
|
||||
|
||||
This module is more low-level than the rest of the threading code, since it
|
||||
is needed by more intermediate-level modules.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,12 @@
|
||||
/**
|
||||
@dir lib/log
|
||||
@brief lib/log
|
||||
@dir /lib/log
|
||||
@brief lib/log: Log messages to files, syslogs, etc.
|
||||
|
||||
You can think of this as the logical "midpoint" of the
|
||||
\refdir{lib} code: much of the higher-level code is higher-level
|
||||
_because_ it uses the logging module, and much of the lower-level code is
|
||||
specifically written to avoid having to log, because the logging module
|
||||
depends on it.
|
||||
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,78 @@
|
||||
/**
|
||||
@dir lib/malloc
|
||||
@brief lib/malloc
|
||||
@dir /lib/malloc
|
||||
@brief lib/malloc: Wrappers and utilities for memory management.
|
||||
|
||||
|
||||
Tor imposes a few light wrappers over C's native malloc and free
|
||||
functions, to improve convenience, and to allow wholesale replacement
|
||||
of malloc and free as needed.
|
||||
|
||||
You should never use 'malloc', 'calloc', 'realloc', or 'free' on their
|
||||
own; always use the variants prefixed with 'tor_'.
|
||||
They are the same as the standard C functions, with the following
|
||||
exceptions:
|
||||
|
||||
* `tor_free(NULL)` is a no-op.
|
||||
* `tor_free()` is a macro that takes an lvalue as an argument and sets it to
|
||||
NULL after freeing it. To avoid this behavior, you can use `tor_free_()`
|
||||
instead.
|
||||
* `tor_malloc()` and friends fail with an assertion if they are asked to
|
||||
allocate a value so large that it is probably an underflow.
|
||||
* It is always safe to `tor_malloc(0)`, regardless of whether your libc
|
||||
allows it.
|
||||
* `tor_malloc()`, `tor_realloc()`, and friends are never allowed to fail.
|
||||
Instead, Tor will die with an assertion. This means that you never
|
||||
need to check their return values. See the next subsection for
|
||||
information on why we think this is a good idea.
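The rules above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not Tor's actual implementation: `demo_malloc()` and `demo_free()` are hypothetical stand-ins for `tor_malloc()` and `tor_free()`, and the `SIZE_MAX / 2` ceiling is an arbitrary threshold chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical stand-in for tor_malloc(): it never returns NULL.
static void *
demo_malloc(size_t size)
{
  // A request this large almost certainly comes from broken arithmetic in
  // the caller. (The exact ceiling here is an assumption for the example.)
  assert(size <= SIZE_MAX / 2);
  // Always ask for at least one byte, so a zero-sized request is safe no
  // matter what the libc does with malloc(0).
  void *result = malloc(size ? size : 1);
  if (result == NULL) {
    fprintf(stderr, "Out of memory; aborting.\n");
    abort();
  }
  return result;
}

// Hypothetical stand-in for tor_free(): frees an lvalue, then sets it to
// NULL. Freeing NULL is harmless, because free(NULL) is a no-op.
#define demo_free(p) \
  do {               \
    free(p);         \
    (p) = NULL;      \
  } while (0)
```

Because the macro clears its argument, freeing the same variable twice is harmless, and a use-after-free tends to fail as a NULL dereference instead of silently corrupting memory.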
|
||||
|
||||
We define additional general-purpose memory allocation functions as well:
|
||||
|
||||
* `tor_malloc_zero(x)` behaves as `calloc(1, x)`, except that it makes clear
|
||||
the intent to allocate a single zeroed-out value.
|
||||
* `tor_reallocarray(x,y)` behaves as the OpenBSD reallocarray function.
|
||||
Use it for cases when you need to realloc() in a multiplication-safe
|
||||
way.
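To show what "multiplication-safe" means here, the following sketch checks for overflow before multiplying. The names `demo_mul_fits()` and `demo_reallocarray()` are hypothetical, and the three-argument parameter list is an assumption modeled on OpenBSD's `reallocarray(ptr, nmemb, size)`, not a statement about Tor's own signature.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical helper: returns 1 if a * b fits in a size_t, and 0 if not.
static int
demo_mul_fits(size_t a, size_t b)
{
  return b == 0 || a <= SIZE_MAX / b;
}

// Hypothetical reallocarray-style call: resize ptr so that it can hold
// count elements of elem_size bytes each. It asserts instead of letting
// the multiplication wrap around to a small number.
static void *
demo_reallocarray(void *ptr, size_t count, size_t elem_size)
{
  assert(demo_mul_fits(count, elem_size));
  size_t total = count * elem_size;
  void *result = realloc(ptr, total ? total : 1);
  assert(result != NULL);
  return result;
}
```

Without the check, a huge `count` could wrap the product around to a tiny allocation, and later writes would run off the end of the buffer.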
|
||||
|
||||
And specific-purpose functions as well:
|
||||
|
||||
* `tor_strdup()` and `tor_strndup()` behave as the underlying libc
|
||||
functions, but use `tor_malloc()` instead of the underlying function.
|
||||
* `tor_memdup()` copies a chunk of memory of a given size.
|
||||
* `tor_memdup_nulterm()` copies a chunk of memory of a given size, then
|
||||
NUL-terminates it just to be safe.
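As a rough sketch of the last of these, assuming nothing about Tor's real code: `demo_memdup_nulterm()` below is a hypothetical stand-in that copies a byte range and appends a terminator, so that the result can be passed to string functions even when the input was not a C string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-in for tor_memdup_nulterm(): copy len bytes from mem
// into a fresh buffer that is one byte longer, and store a NUL at the end.
static char *
demo_memdup_nulterm(const void *mem, size_t len)
{
  char *copy = malloc(len + 1);
  assert(copy != NULL);
  memcpy(copy, mem, len);
  copy[len] = '\0';
  return copy;
}
```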
|
||||
|
||||
#### Why assert on allocation failure?
|
||||
|
||||
Why don't we allow `tor_malloc()` and its allies to return NULL?
|
||||
|
||||
First, it's error-prone. Many programmers forget to check for NULL return
|
||||
values, and testing for `malloc()` failures is a major pain.
|
||||
|
||||
Second, it's not necessarily a great way to handle OOM conditions. It's
|
||||
probably better (we think) to have a memory target where we dynamically free
|
||||
things ahead of time in order to stay under the target. Trying to respond to
|
||||
an OOM at the point of `tor_malloc()` failure, on the other hand, would involve
|
||||
a rare operation invoked from deep in the call stack. (Again, that's
|
||||
error-prone and hard to debug.)
|
||||
|
||||
Third, thanks to the rise of Linux and other operating systems that allow
|
||||
memory to be overcommitted, you can't actually ever rely on getting a NULL
|
||||
from `malloc()` when you're out of memory; instead you have to use an approach
|
||||
closer to tracking the total memory usage.
|
||||
|
||||
#### Conventions for your own allocation functions.
|
||||
|
||||
Whenever you create a new type, the convention is to give it a pair of
|
||||
`x_new()` and `x_free_()` functions, named after the type.
|
||||
|
||||
Calling `x_free(NULL)` should always be a no-op.
|
||||
|
||||
There should additionally be an `x_free()` macro, defined in terms of
|
||||
`x_free_()`. This macro should set its lvalue to NULL. You can define it
|
||||
using the FREE_AND_NULL macro, as follows:
|
||||
|
||||
```
|
||||
#define x_free(ptr) FREE_AND_NULL(x_t, x_free_, (ptr))
|
||||
```
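Putting the convention together for a made-up type might look like the sketch below. Everything in it is hypothetical: `widget_t` is an invented type, and `DEMO_FREE_AND_NULL` is a local macro written for this example on the assumption that it behaves like `FREE_AND_NULL` as described above.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical macro with the shape described above: call free_fn on the
// pointer, then set the caller's variable to NULL.
#define DEMO_FREE_AND_NULL(type, free_fn, ptr) \
  do {                                         \
    type *tmp_ = (ptr);                        \
    free_fn(tmp_);                             \
    (ptr) = NULL;                              \
  } while (0)

typedef struct widget_t {
  int id;
} widget_t;

static widget_t *
widget_new(int id)
{
  widget_t *w = malloc(sizeof(*w));
  assert(w != NULL);
  w->id = id;
  return w;
}

// The underlying function accepts NULL and does nothing in that case.
static void
widget_free_(widget_t *w)
{
  if (w == NULL)
    return;
  free(w);
}

#define widget_free(w) DEMO_FREE_AND_NULL(widget_t, widget_free_, (w))
```

Callers use `widget_free(w)` everywhere; the underscore-suffixed function is only needed where a plain function pointer is required or where clearing the variable is unwanted.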
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,8 @@
|
||||
/**
|
||||
@dir lib/math
|
||||
@brief lib/math
|
||||
@dir /lib/math
|
||||
@brief lib/math: Floating-point math utilities.
|
||||
|
||||
This module includes a bunch of floating-point compatibility code, and
|
||||
implementations for several probability distributions.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,30 @@
|
||||
/**
|
||||
@dir lib/memarea
|
||||
@brief lib/memarea
|
||||
@dir /lib/memarea
|
||||
@brief lib/memarea: A fast arena-style allocator.
|
||||
|
||||
This module has a fast "arena" style allocator, where memory is freed all at
|
||||
once. This kind of allocation is very fast and avoids fragmentation, at the
|
||||
expense of requiring all the data to be freed at the same time. We use this
|
||||
for parsing and diff calculations.
|
||||
|
||||
It's often handy to allocate a large number of tiny objects, all of which
|
||||
need to disappear at the same time. You can do this in Tor using the
|
||||
memarea.c abstraction, which uses a set of grow-only buffers for allocation,
|
||||
and only supports a single "free" operation at the end.
|
||||
|
||||
Using memareas also helps you avoid memory fragmentation. You see, some libc
|
||||
malloc implementations perform badly on the case where a large number of
|
||||
small temporary objects are allocated at the same time as a few long-lived
|
||||
objects of similar size. But if you use tor_malloc() for the long-lived ones
|
||||
and a memarea for the temporary objects, the malloc implementation is likelier
|
||||
to do better.
|
||||
|
||||
To create a new memarea, use `memarea_new()`. To drop all the storage from a
|
||||
memarea, and invalidate its pointers, use `memarea_drop_all()`.
|
||||
|
||||
The allocation functions `memarea_alloc()`, `memarea_alloc_zero()`,
|
||||
`memarea_memdup()`, `memarea_strdup()`, and `memarea_strndup()` are analogous
|
||||
to the similarly-named malloc() functions. There is intentionally no
|
||||
`memarea_free()` or `memarea_realloc()`.
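The idea behind an arena can be shown with a deliberately tiny sketch. It is not the memarea implementation: the `demo_arena_*` names are hypothetical, and this version uses one fixed-size chunk where a real arena would chain several grow-only buffers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical single-chunk arena.
typedef struct demo_arena_t {
  char *mem;
  size_t used;
  size_t capacity;
} demo_arena_t;

static demo_arena_t *
demo_arena_new(size_t capacity)
{
  demo_arena_t *a = malloc(sizeof(*a));
  assert(a != NULL);
  a->mem = malloc(capacity ? capacity : 1);
  assert(a->mem != NULL);
  a->used = 0;
  a->capacity = capacity;
  return a;
}

// Allocation only bumps an offset; individual objects are never freed.
static void *
demo_arena_alloc(demo_arena_t *a, size_t size)
{
  // Round up so that each returned pointer stays pointer-aligned.
  const size_t align = sizeof(void *);
  size_t rounded = (size + align - 1) / align * align;
  assert(rounded >= size && rounded <= a->capacity - a->used);
  void *result = a->mem + a->used;
  a->used += rounded;
  return result;
}

static char *
demo_arena_strdup(demo_arena_t *a, const char *s)
{
  size_t len = strlen(s) + 1;
  return memcpy(demo_arena_alloc(a, len), s, len);
}

// One call releases everything that was ever allocated from the arena.
static void
demo_arena_drop_all(demo_arena_t *a)
{
  free(a->mem);
  free(a);
}
```

After `demo_arena_drop_all()`, every pointer handed out by the arena is invalid, which is why this pattern fits parsing: all the temporary tokens die together.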
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,7 @@
|
||||
/**
|
||||
@dir lib/meminfo
|
||||
@brief lib/meminfo
|
||||
@dir /lib/meminfo
|
||||
@brief lib/meminfo: Inspecting malloc() usage.
|
||||
|
||||
Only available when malloc() provides mallinfo() or something similar.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,8 @@
|
||||
/**
|
||||
@dir lib/net
|
||||
@brief lib/net
|
||||
@dir /lib/net
|
||||
@brief lib/net: Low-level network-related code.
|
||||
|
||||
This module includes address manipulation, compatibility wrappers,
|
||||
convenience functions, and so on.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,10 @@
|
||||
/**
|
||||
@dir lib/osinfo
|
||||
@brief lib/osinfo
|
||||
@dir /lib/osinfo
|
||||
@brief lib/osinfo: For inspecting the OS version and capabilities.
|
||||
|
||||
In general, we use this module when we're telling the user what operating
|
||||
system they are running. We shouldn't make decisions based on the output of
|
||||
these checks: instead, we should have more specific checks, either at compile
|
||||
time or run time, based on the observed system behavior.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,4 @@
|
||||
/**
|
||||
@dir lib/process
|
||||
@brief lib/process
|
||||
@dir /lib/process
|
||||
@brief lib/process: Launch and manage subprocesses.
|
||||
**/
|
||||
|
@ -1,4 +1,16 @@
|
||||
/**
|
||||
@dir lib/pubsub
|
||||
@brief lib/pubsub
|
||||
@dir /lib/pubsub
|
||||
@brief lib/pubsub: Publish-subscribe message passing.
|
||||
|
||||
This module wraps the \refdir{lib/dispatch} module, to provide a more
|
||||
ergonomic and type-safe approach to message passing.
|
||||
|
||||
In general, we favor this mechanism for cases where higher-level modules
|
||||
need to be notified when something happens in lower-level modules. (The
|
||||
alternative would be calling up from the lower-level modules, which
|
||||
would be error-prone; or maintaining lists of function-pointers, which
|
||||
would be clumsy and tend to complicate the call graph.)
|
||||
|
||||
See pubsub.c for more information.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,17 @@
|
||||
/**
|
||||
@dir lib/sandbox
|
||||
@brief lib/sandbox
|
||||
@dir /lib/sandbox
|
||||
@brief lib/sandbox: Linux seccomp2-based sandbox.
|
||||
|
||||
This module uses Linux's seccomp2 facility via the
|
||||
[`libseccomp` library](https://github.com/seccomp/libseccomp), to restrict
|
||||
the set of system calls that Tor is allowed to invoke while it is running.
|
||||
|
||||
Because there are many libc versions that invoke different system calls, and
|
||||
because handling strings is quite complex, this module is more complex and
|
||||
less portable than it needs to be.
|
||||
|
||||
A better architecture would put the responsibility for invoking tricky system
|
||||
calls (like open()) in another, less restricted process, and give that
|
||||
process responsibility for enforcing our sandbox rules.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,12 @@
|
||||
/**
|
||||
@dir lib/smartlist_core
|
||||
@brief lib/smartlist_core
|
||||
@dir /lib/smartlist_core
|
||||
@brief lib/smartlist_core: Minimal dynamic array implementation
|
||||
|
||||
A `smartlist_t` is a dynamic array type for holding `void *`. We use it
|
||||
throughout the rest of the codebase.
|
||||
|
||||
There are higher-level pieces in \refdir{lib/container} but
|
||||
the ones in lib/smartlist_core are used by the logging code, and therefore
|
||||
cannot use the logging code.
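For readers who have not met this kind of container before, here is a minimal sketch of a growable array of `void *`. The `demo_list_*` names and the doubling growth policy are assumptions made for the example; they are not taken from the smartlist code.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical stand-in for a dynamic array of void pointers.
typedef struct demo_list_t {
  void **items;
  size_t len;
  size_t capacity;
} demo_list_t;

static demo_list_t *
demo_list_new(void)
{
  demo_list_t *l = malloc(sizeof(*l));
  assert(l != NULL);
  l->items = NULL;
  l->len = 0;
  l->capacity = 0;
  return l;
}

// Append an element, doubling the backing array whenever it is full.
static void
demo_list_add(demo_list_t *l, void *item)
{
  if (l->len == l->capacity) {
    size_t new_cap = l->capacity ? l->capacity * 2 : 8;
    void **grown = realloc(l->items, new_cap * sizeof(void *));
    assert(grown != NULL);
    l->items = grown;
    l->capacity = new_cap;
  }
  l->items[l->len++] = item;
}

// Frees the list itself; the elements still belong to the caller.
static void
demo_list_free(demo_list_t *l)
{
  if (l == NULL)
    return;
  free(l->items);
  free(l);
}
```

Note that nothing in the sketch logs: code at this level has to report problems through assertions or return values, because the logging layer sits above it.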
|
||||
|
||||
**/
|
||||
|
@ -1,4 +0,0 @@
|
||||
/**
|
||||
@dir lib/stats
|
||||
@brief lib/stats
|
||||
**/
|
@ -1,4 +1,15 @@
|
||||
/**
|
||||
@dir lib/string
|
||||
@brief lib/string
|
||||
@dir /lib/string
|
||||
@brief lib/string: Low-level string manipulation.
|
||||
|
||||
We have a number of compatibility functions here: some are for handling
|
||||
functionality that is not implemented (or not implemented the same) on every
|
||||
platform; some are for providing locale-independent versions of libc
|
||||
functions that would otherwise be defined differently for different users.
|
||||
|
||||
Other functions here are for common string-manipulation operations that we do
|
||||
in the rest of the codebase.
|
||||
|
||||
Any string function high-level enough to need logging belongs in a
|
||||
higher-level module.
|
||||
**/
|
||||
|
@ -1,4 +1,34 @@
|
||||
/**
|
||||
@dir lib/subsys
|
||||
@brief lib/subsys
|
||||
@dir /lib/subsys
|
||||
@brief lib/subsys: Types for declaring a "subsystem".
|
||||
|
||||
## Subsystems in Tor
|
||||
|
||||
A subsystem is a module with support for initialization, shutdown,
|
||||
configuration, and so on.
|
||||
|
||||
Many parts of Tor can be initialized, cleaned up, and configured somewhat
|
||||
independently through a table-driven mechanism. Each such part is called a
|
||||
"subsystem".
|
||||
|
||||
To declare a subsystem, make a global `const` instance of the `subsys_fns_t`
|
||||
type, filling in the function pointer fields that you require with ones
|
||||
corresponding to your subsystem. Any function pointers left as "NULL" will
|
||||
be a no-op. Each system must have a name and a "level", which corresponds to
|
||||
the order in which it is initialized. (See `app/main/subsystem_list.c` for a
|
||||
list of current subsystems and their levels.)
|
||||
|
||||
Then, insert your subsystem in the list in `app/main/subsystem_list.c`. It
|
||||
will need to occupy a position corresponding to its level.
|
||||
|
||||
At this point, your subsystem will be handled like the others: it will get
|
||||
initialized at startup, torn down at exit, and so on.
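The table-driven idea can be sketched as follows. This is a simplified illustration, not the real `subsys_fns_t`: the struct layout, the `demo_*` names, the level numbers, and the choice to shut down in reverse order are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical, pared-down analogue of a subsystem declaration.
typedef struct demo_subsys_t {
  const char *name;
  int level;                // lower levels are initialized first
  int (*initialize)(void);  // NULL means "nothing to do"
  void (*shutdown)(void);   // NULL means "nothing to do"
} demo_subsys_t;

static int demo_log_ready = 0;
static int demo_net_ready = 0;

static int
demo_log_init(void)
{
  demo_log_ready = 1;
  return 0;
}

static void
demo_log_shutdown(void)
{
  demo_log_ready = 0;
}

// The network layer assumes that logging is already running.
static int
demo_net_init(void)
{
  if (!demo_log_ready)
    return -1;
  demo_net_ready = 1;
  return 0;
}

static const demo_subsys_t demo_sys_log = {
  "log", 10, demo_log_init, demo_log_shutdown
};
static const demo_subsys_t demo_sys_net = {
  "net", 20, demo_net_init, NULL
};

// The table is kept sorted by level.
static const demo_subsys_t *const demo_subsystems[] = {
  &demo_sys_log,
  &demo_sys_net,
};
#define DEMO_N_SUBSYSTEMS \
  (sizeof(demo_subsystems) / sizeof(demo_subsystems[0]))

// Initialize in table order; return 0 on success, -1 on the first failure.
static int
demo_subsystems_init(void)
{
  for (size_t i = 0; i < DEMO_N_SUBSYSTEMS; ++i) {
    const demo_subsys_t *sys = demo_subsystems[i];
    assert(sys->name != NULL);
    if (sys->initialize && sys->initialize() < 0)
      return -1;
  }
  return 0;
}

// Shut down in reverse order (an assumption of this sketch).
static void
demo_subsystems_shutdown(void)
{
  for (size_t i = DEMO_N_SUBSYSTEMS; i > 0; --i) {
    const demo_subsys_t *sys = demo_subsystems[i - 1];
    if (sys->shutdown)
      sys->shutdown();
  }
}
```

The point of the table is that startup code never names individual modules: adding a subsystem means adding one row at the right level.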
|
||||
|
||||
Historical note: Not all of Tor's code is currently handled as
|
||||
subsystems. As you work with older code, you may see some parts of the code
|
||||
that are initialized from `tor_init()` or `run_tor_main_loop()` or
|
||||
`tor_run_main()`; and torn down from `tor_cleanup()`. We aim to migrate
|
||||
these to subsystems over time; please don't add any new code that follows
|
||||
this pattern.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,4 @@
|
||||
/**
|
||||
@dir lib/term
|
||||
@brief lib/term
|
||||
@dir /lib/term
|
||||
@brief lib/term: Terminal operations (password input).
|
||||
**/
|
||||
|
@ -1,4 +1,4 @@
|
||||
/**
|
||||
@dir lib/testsupport
|
||||
@brief lib/testsupport
|
||||
@dir /lib/testsupport
|
||||
@brief lib/testsupport: Helpers for test-only code and for function mocking.
|
||||
**/
|
||||
|
@ -1,4 +1,9 @@
|
||||
/**
|
||||
@dir lib/thread
|
||||
@brief lib/thread
|
||||
@dir /lib/thread
|
||||
@brief lib/thread: Mid-level threading.
|
||||
|
||||
This module contains compatibility and convenience code for multithreading,
|
||||
except for low-level locks (which are in \refdir{lib/lock}) and
|
||||
workqueue/threadpool code (which belongs in \refdir{lib/evloop}).
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,11 @@
|
||||
/**
|
||||
@dir lib/time
|
||||
@brief lib/time
|
||||
@dir /lib/time
|
||||
@brief lib/time: Higher-level time functions
|
||||
|
||||
This includes both fine-grained timers and monotonic timers, along with
|
||||
wrappers for them to try to improve efficiency.
|
||||
|
||||
For "what time is it" in UTC, see \refdir{lib/wallclock}. For parsing and
|
||||
encoding times and dates, see \refdir{lib/encoding}.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,13 @@
|
||||
/**
|
||||
@dir lib/tls
|
||||
@brief lib/tls
|
||||
@dir /lib/tls
|
||||
@brief lib/tls: TLS library wrappers
|
||||
|
||||
This module has compatibility wrappers around the library (NSS or OpenSSL,
|
||||
depending on configuration) that Tor uses to implement the TLS link security
|
||||
protocol.
|
||||
|
||||
It also implements the logic for some legacy TLS protocol usage we used to
|
||||
support in old versions of Tor, involving conditional delivery of certificate
|
||||
chains (v1 link protocol) and conditional renegotiation (v2 link protocol).
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,8 @@
|
||||
/**
|
||||
@dir lib/trace
|
||||
@brief lib/trace
|
||||
@dir /lib/trace
|
||||
@brief lib/trace: Function-tracing functionality API.
|
||||
|
||||
This module is used for adding "trace" support (low-granularity function
|
||||
logging) to Tor. Right now it doesn't have many users.
|
||||
|
||||
**/
|
||||
|
@ -1,4 +1,4 @@
|
||||
/**
|
||||
@dir lib/version
|
||||
@brief lib/version
|
||||
@dir /lib/version
|
||||
@brief lib/version: holds the current version of Tor.
|
||||
**/
|
||||
|
@ -1,4 +1,13 @@
|
||||
/**
|
||||
@dir lib/wallclock
|
||||
@brief lib/wallclock
|
||||
@dir /lib/wallclock
|
||||
@brief lib/wallclock: Inspect and manipulate the current time.
|
||||
|
||||
This module handles our concept of "what time is it" or "what time does the
|
||||
world agree it is?" Generally, if you want something derived from UTC, this
|
||||
is the module for you.
|
||||
|
||||
For versions of the time that are more local, more monotonic, or more
|
||||
accurate, see \refdir{lib/time}. For parsing and encoding times and dates,
|
||||
see \refdir{lib/encoding}.
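The difference between the two kinds of clock can be illustrated with plain POSIX calls. This sketch does not use Tor's own time APIs; the `demo_*` names are hypothetical, and it assumes a platform that provides `clock_gettime()` with `CLOCK_MONOTONIC`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

// Wall-clock time: seconds since the Unix epoch. It can jump forwards or
// backwards if the system clock is reset.
static time_t
demo_wallclock_now(void)
{
  return time(NULL);
}

// Monotonic time: never goes backwards, but is only meaningful as the
// difference between two readings.
static uint64_t
demo_monotonic_nsec(void)
{
  struct timespec ts;
  int r = clock_gettime(CLOCK_MONOTONIC, &ts);
  assert(r == 0);
  (void)r;
  return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}
```

Use the wall clock for anything that must agree with the rest of the world, such as expiry dates, and the monotonic clock for measuring intervals, such as timeouts.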
|
||||
|
||||
**/
|
||||
|
121
src/mainpage.dox
@ -1,11 +1,122 @@
|
||||
/**
|
||||
@mainpage Tor source reference
|
||||
|
||||
@section intro Getting to know Tor
|
||||
@section intro Welcome to Tor
|
||||
|
||||
Welcome to the Tor source code documentation! Here we have documentation for
|
||||
nearly every function, type, and module in the Tor source code. The high-level
|
||||
documentation is a work in progress. For now, have a look at the source code
|
||||
overview in doc/HACKING/design.
|
||||
This documentation describes the general structure of the Tor codebase, how
|
||||
it fits together, what functionality is available for extending Tor, and
|
||||
gives some notes on how Tor got that way. It also includes a reference for
|
||||
nearly every function, type, file, and module in the Tor source code. The
|
||||
high-level documentation is a work in progress.
|
||||
|
||||
Tor itself remains a work in progress too: We've been working on it for
|
||||
nearly two decades, and we've learned a lot about good coding since we first
|
||||
started. This means, however, that some of the older pieces of Tor will have
|
||||
some "code smell" in them that could stand a brisk refactoring. So when we
|
||||
describe a piece of code, we'll sometimes give a note on how it got that way,
|
||||
and whether we still think that's a good idea.
|
||||
|
||||
This document is not an overview of the Tor protocol. For that, see the
|
||||
design paper and the specifications at https://spec.torproject.org/ .
|
||||
|
||||
For more information about Tor's coding standards and some helpful
|
||||
development tools, see
|
||||
[doc/HACKING](https://gitweb.torproject.org/tor.git/tree/doc/HACKING) in the
|
||||
Tor repository.
|
||||
|
||||
@section highlevel The very high level
|
||||
|
||||
Ultimately, Tor runs as an event-driven network daemon: it responds to
|
||||
network events, signals, and timers by sending and receiving things over
|
||||
the network. Clients, relays, and directory authorities all use the
|
||||
same codebase: the Tor process will run as a client, relay, or authority
|
||||
depending on its configuration.
|
||||
|
||||
Tor has a few major dependencies, including Libevent (used to tell which
|
||||
sockets are readable and writable), OpenSSL or NSS (used for many encryption
|
||||
functions, and to implement the TLS protocol), and zlib (used to
|
||||
compress and uncompress directory information).
|
||||
|
||||
Most of Tor's work today is done in a single event-driven main thread.
|
||||
Tor also spawns one or more worker threads to handle CPU-intensive
|
||||
tasks. (Right now, this only includes circuit encryption and the more
|
||||
expensive compression algorithms.)
|
||||
|
||||
On startup, Tor initializes its libraries, reads and responds to its
|
||||
configuration files, and launches a main event loop. At first, the only
|
||||
events that Tor listens for are a few signals (like TERM and HUP), and
|
||||
one or more listener sockets (for different kinds of incoming
|
||||
connections). Tor also configures several timers to handle periodic
|
||||
events. As Tor runs over time, other events will open, and new events
|
||||
will be scheduled.
|
||||
|
||||
The codebase is divided into a few top-level subdirectories, each of
|
||||
which contains several sub-modules.
|
||||
|
||||
- `ext` -- Code maintained elsewhere that we include in the Tor
|
||||
source distribution.
|
||||
|
||||
- \refdir{lib} -- Lower-level utility code, not necessarily
|
||||
Tor-specific.
|
||||
|
||||
- `trunnel` -- Automatically generated code (from the Trunnel
|
||||
tool): used to parse and encode binary formats.
|
||||
|
||||
- \refdir{core} -- Networking code that implements the central
|
||||
parts of the Tor protocol and main loop.
|
||||
|
||||
- \refdir{feature} -- Aspects of Tor (like directory management,
|
||||
running a relay, running a directory authority, managing a list of
|
||||
nodes, running and using onion services) that are built on top of the
|
||||
mainloop code.
|
||||
|
||||
- \refdir{app} -- Highest-level functionality; responsible for setting
|
||||
up and configuring the Tor daemon, making sure all the lower-level
|
||||
modules start up when required, and so on.
|
||||
|
||||
- \refdir{tools} -- Binaries other than Tor that we produce.
|
||||
Currently this is tor-resolve, tor-gencert, and the tor_runner.o helper
|
||||
module.
|
||||
|
||||
- `test` -- unit tests, regression tests, and a few integration
|
||||
tests.
|
||||
|
||||
In theory, the above parts of the codebase are sorted from highest-level to
|
||||
lowest-level, where high-level code is only allowed to invoke lower-level
|
||||
code, and lower-level code never includes or depends on code of a higher
|
||||
level. In practice, this refactoring is incomplete: The modules in
|
||||
\refdir{lib} are well-factored, but there are many layer violations ("upward
|
||||
dependencies") in \refdir{core} and \refdir{feature}.
|
||||
We aim to eliminate those over time.
|
||||
|
||||
@section keyabstractions Some key high-level abstractions
|
||||
|
||||
The most important abstractions at Tor's high level are Connections,
|
||||
Channels, Circuits, and Nodes.
|
||||
|
||||
A 'Connection' (connection_t) represents a stream-based information flow.
|
||||
Most connections are TCP connections to remote Tor servers and clients. (But
|
||||
as a shortcut, a relay will sometimes make a connection to itself without
|
||||
actually using a TCP connection. More details later on.) Connections exist
|
||||
in different varieties, depending on what functionality they provide. The
|
||||
principal types of connection are edge_connection_t (e.g., a SOCKS connection or
|
||||
a connection from an exit relay to a destination), or_connection_t (a TLS
|
||||
stream connecting to a relay), dir_connection_t (an HTTP connection to learn
|
||||
about the network), and control_connection_t (a connection from a
|
||||
controller).
|
||||
|
||||
A 'Circuit' (circuit_t) is a persistent tunnel through the Tor network,
|
||||
established with public-key cryptography, and used to send cells one or more
|
||||
hops. Clients keep track of multi-hop circuits (origin_circuit_t), and the
|
||||
cryptography associated with each hop. Relays, on the other hand, keep track
|
||||
only of their hop of each circuit (or_circuit_t).
|
||||
|
||||
A 'Channel' (channel_t) is an abstract view of sending cells to and from a
|
||||
Tor relay. Currently, all channels are implemented using OR connections
|
||||
(channel_tls_t). If we switch to other strategies in the future, we'll have
|
||||
more connection types.
|
||||
|
||||
A 'Node' (node_t) is a view of a Tor instance's current knowledge and opinions
|
||||
about a Tor relay or bridge.
|
||||
|
||||
**/
|
||||
|
@ -1,5 +1,5 @@
|
||||
/**
|
||||
@dir tools
|
||||
@dir /tools
|
||||
@brief tools: other command-line tools for use with Tor.
|
||||
|
||||
The "tools" directory has a few other programs that use Tor, but are not part
|
||||
|