Rewrite "common" overview into a "lib" overview.

Nick Mathewson 2019-10-14 13:49:27 -04:00
parent 908070bbd5
commit 8ef5d96c2e


@ -1,121 +1,171 @@
## Utility code in Tor
## Library code in Tor.
Most of Tor's utility code is in modules in the src/common subdirectory.
Most of Tor's utility code is in modules in the `src/lib` subdirectory. In
general, this code is not Tor-specific, and could be useful to other
applications.
These are divided, broadly, into _compatibility_ functions, _utility_
functions, _containers_, and _cryptography_. (Someday in the future, it
would be great to split these modules into separate directories. Also, some
functions are probably put in the wrong modules)
This code includes:
### Compatibility code
* Compatibility wrappers, to provide a uniform API across different
platforms.
These functions live in src/common/compat\*.c; some corresponding macros live
in src/common/compat\*.h. They serve as wrappers around platform-specific or
compiler-specific logic functionality.
* Library wrappers, to provide a Tor-like API over different libraries
that Tor uses for things like compression and cryptography.
In general, the rest of the Tor code *should not* be calling platform-specific
or otherwise non-portable functions. Instead, they should call wrappers from
compat.c, which implement a common cross-platform API. (If you don't know
whether a function is portable, it's usually good enough to see whether it
exists on OSX, Linux, and Windows.)
* Containers, to implement some general-purpose data container types.
Other compatibility modules include backtrace.c, which generates stack traces
for crash reporting; sandbox.c, which implements the Linux seccomp2 sandbox;
and procmon.c, which handles monitoring a child process.
The modules in `src/lib` are currently well-factored: each one depends
only on lower-level modules. You can see an up-to-date list of the
modules sorted from lowest to highest level by running
`./scripts/maint/practracker/includes.py --toposort`.
Parts of address.c are compatibility code for handling network addressing
issues; other parts are in util.c.
As of this writing, the library modules are (from lowest to highest
level):
Notable compatibility areas are:
* `lib/cc` -- Macros for managing the C compiler and
language. Includes macros for improving compatibility and clarity
across different C compilers.
* mmap support for mapping files into the address space (read-only)
* `lib/version` -- Holds the current version of Tor.
* Code to work around the intricacies
* `lib/testsupport` -- Helpers for making test-only code and test
mocking support.
* Workaround code for Windows's horrible winsock incompatibilities and
Linux's intricate socket extensions.
* `lib/defs` -- Lowest-level constants used in many places across the
code.
* Helpful string functions like memmem, memstr, asprintf, strlcpy, and
strlcat that not all platforms have.
* `lib/subsys` -- Types used for declaring a "subsystem". A subsystem
is a module with support for initialization, shutdown,
configuration, and so on.
* Locale-ignoring variants of the ctypes functions.
* `lib/conf` -- Types and macros used for declaring configuration
options.
* Time-manipulation functions
* `lib/arch` -- Compatibility functions and macros for handling
differences in CPU architecture.
* File locking function
* `lib/err` -- Lowest-level error handling code: responsible for
generating stack traces, handling raw assertion failures, and
otherwise reporting problems that might not be safe to report
via the regular logging module.
* IPv6 functions for platforms that don't have enough IPv6 support
* `lib/malloc` -- Wrappers and utilities for memory management.
* Endianness functions
* `lib/intmath` -- Utilities for integer mathematics.
* OS functions
* `lib/fdio` -- Utilities and compatibility code for reading and
writing data on file descriptors (and on sockets, for platforms
where a socket is not a kind of fd).
* Threading and locking functions.
* `lib/lock` -- Compatibility code for declaring and using locks.
Lower-level than the rest of the threading code.
### Utility functions
* `lib/ctime` -- Constant-time implementations for data comparison
and table lookup, used to avoid timing side-channels from standard
implementations of memcmp() and so on.
General-purpose utilities are in util.c; they include higher-level wrappers
around many of the compatibility functions to provide things like
file-at-once access, memory management functions, math, string manipulation,
time manipulation, filesystem manipulation, etc.
* `lib/string` -- Low-level compatibility wrappers and utility
functions for string manipulation.
(Some functionality, like daemon-launching, would be better off in a
compatibility module.)
* `lib/wallclock` -- Compatibility and utility functions for
inspecting and manipulating the current (UTC) time.
In util_format.c, we have code to implement stuff like base-32 and base-64
encoding.
* `lib/osinfo` -- Functions for inspecting the version and
capabilities of the operating system.
The address.c module interfaces with the system resolver and implements
address parsing and formatting functions. It converts sockaddrs to and from
a more compact tor_addr_t type.
* `lib/smartlist_core` -- The bare-bones pieces of our dynamic array
("smartlist") implementation. There are higher-level pieces, but
these ones are used by (and therefore cannot use) the logging code.
The di_ops.c module provides constant-time comparison and associative-array
operations, for side-channel avoidance.
* `lib/log` -- Implements the logging system used by all higher-level
Tor code. You can think of this as the logical "midpoint" of the
library code: much of the higher-level code is higher-level
_because_ it uses the logging module, and much of the lower-level
code is specifically written to avoid having to log, because the
logging module depends on it.
The logging subsystem in log.c supports logging to files, to controllers, to
stdout/stderr, or to the system log.
* `lib/container` -- General-purpose containers, including dynamic arrays,
hashtables, bit arrays, weak-reference-like "handles", bloom
filters, and a bit more.
The abstraction in memarea.c is used in cases when a large amount of
temporary objects need to be allocated, and they can all be freed at the same
time.
* `lib/trace` -- A general-purpose API for introducing
function-tracing functionality into Tor. Currently not much used.
The torgzip.c module wraps the zlib library to implement compression.
* `lib/thread` -- Threading compatibility and utility functionality,
other than low-level locks (which are in `lib/lock`) and
workqueue/threadpool code (which belongs in `lib/evloop`).
Workqueue.c provides a simple multithreaded work-queue implementation.
* `lib/term` -- Code for terminal manipulation functions (like
reading a password from the user).
### Containers
* `lib/memarea` -- A data structure for a fast "arena" style allocator,
where the data is freed all at once. Used for parsing.
The container.c module defines these container types, used throughout the Tor
codebase.
* `lib/encoding` -- Implementations for encoding data in various
formats, datatypes, and transformations.
There is a dynamic array called **smartlist**, used as our general resizeable
array type. It supports sorting, searching, common set operations, and so
on. It has specialized functions for smartlists of strings, and for
heap-based priority queues.
* `lib/dispatch` -- A general-purpose in-process message delivery
system. Used by `lib/pubsub` to implement our inter-module
publish/subscribe system.
There's a bit-array type.
* `lib/sandbox` -- Our Linux seccomp2 sandbox implementation.
A set of mapping types to map strings, 160-bit digests, and 256-bit digests
to void \*. These are what we generally use when we want O(1) lookup.
* `lib/pubsub` -- Code and macros to implement our publish/subscribe
message passing system.
Additionally, for containers, we use the ht.h and tor_queue.h headers, in
src/ext. These provide intrusive hashtable and linked-list macros.
* `lib/fs` -- Utility and compatibility code for manipulating files,
filenames, directories, and so on.
### Cryptography
* `lib/confmgt` -- Code to parse, encode, and manipulate our
configuration files, state files, and so forth.
Once, we tried to keep our cryptography code in a single "crypto.c" file,
with an "aes.c" module containing an AES implementation for use with older
OpenSSLs.
* `lib/crypt_ops` -- Cryptographic operations. This module contains
wrappers around the cryptographic libraries that we support,
and implementations for some higher-level cryptographic
constructions that we use.
Now, our practice has become to introduce crypto_\*.c modules when adding new
cryptography backend code. We have modules for Ed25519, Curve25519,
secret-to-key algorithms, and password-based boxed encryption.
* `lib/meminfo` -- Functions for inspecting our memory usage, if the
malloc implementation exposes that to us.
Our various TLS compatibility code, wrappers, and hacks are kept in
tortls.c, which is probably too full of Tor-specific kludges. I'm
hoping we can eliminate most of those kludges when we finally remove
support for older versions of our TLS handshake.
* `lib/time` -- Higher-level time functions, including fine-grained and
monotonic timers.
* `lib/math` -- Floating-point mathematical utilities, including
compatibility code, and probability distributions.
* `lib/buf` -- A general-purpose queued buffer implementation,
similar to the BSD kernel's "mbuf" structure.
* `lib/net` -- Networking code, including address manipulation and
compatibility wrappers.
* `lib/compress` -- A compatibility wrapper around several
compression libraries, currently including zlib, zstd, and lzma.
* `lib/geoip` -- Utilities to manage geoip (IP to country) lookups
and formats.
* `lib/tls` -- Compatibility wrappers around the library (NSS or
OpenSSL, depending on configuration) that Tor uses to implement the
TLS link security protocol.
* `lib/evloop` -- Tools to manage the event loop and related
functionality, in order to implement asynchronous networking,
timers, periodic events, and other scheduling tasks.
* `lib/process` -- Utilities and compatibility code to launch and
manage subprocesses.
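To make the `lib/ctime` entry in the list above more concrete, here is a minimal sketch of a constant-time equality check. The name `example_ct_memeq` is hypothetical and this is not Tor's actual implementation; it only illustrates the general technique of examining every byte and avoiding data-dependent branches, which is what a standard `memcmp()` does not promise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Tor's API): return 1 if the two
 * len-byte buffers are equal and 0 otherwise.  Every byte is examined
 * no matter where the first difference occurs. */
static int
example_ct_memeq(const void *a, const void *b, size_t len)
{
  const volatile uint8_t *x = a;
  const volatile uint8_t *y = b;
  uint8_t diff = 0;
  size_t i;

  assert(a != NULL && b != NULL);

  /* Accumulate the XOR of each byte pair; diff stays 0 only if all match. */
  for (i = 0; i < len; ++i) {
    diff |= (uint8_t)(x[i] ^ y[i]);
  }

  /* Map diff == 0 to 1 and any nonzero diff to 0 without branching:
   * (0 - 1) wraps to 0xFFFFFFFF, whose bit 8 is set; for diff in 1..255,
   * (diff - 1) is below 256, so bit 8 is clear. */
  return (int)(1 & (((uint32_t)diff - 1) >> 8));
}
```

A caller would use it wherever a secret value (such as a digest or MAC) is compared, for example `if (example_ct_memeq(expected, received, 32)) { ... }`. Whether a compiler preserves the constant-time property is not guaranteed by the C standard, so real code of this kind generally needs to be checked on the platforms it targets.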
### What belongs in lib?
In general, if you can imagine some program wanting the functionality
you're writing, even if that program had nothing to do with Tor, your
functionality belongs in lib.
If it falls into one of the existing "lib" categories, your
functionality belongs in lib.
If you are using platform-specific `#ifdef`s to manage compatibility
issues among platforms, you should probably consider whether you can
put your code into lib.
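
As a rough illustration of the last point, the sketch below moves a platform-specific `#ifdef` into a single wrapper so that callers stay portable. The names `example_get_pid` and `example_print_pid` are hypothetical and do not correspond to functions in Tor; only `getpid()` and `GetCurrentProcessId()` are real platform calls.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Hypothetical compatibility wrapper: the only place that needs to
 * know how each platform reports the current process ID. */
static long
example_get_pid(void)
{
#ifdef _WIN32
  return (long) GetCurrentProcessId();
#else
  return (long) getpid();
#endif
}

/* Hypothetical caller: contains no platform-specific code of its own. */
static void
example_print_pid(void)
{
  long pid = example_get_pid();
  assert(pid > 0);
  printf("running as process %ld\n", pid);
}
```

With this shape, code outside the wrapper never mentions `_WIN32`, which is the property the guideline above is aiming for.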