Rewrite "common" overview into a "lib" overview.

Nick Mathewson 2019-10-14 13:49:27 -04:00
parent 908070bbd5
commit 8ef5d96c2e

@@ -1,121 +1,171 @@
-## Utility code in Tor
-
-Most of Tor's utility code is in modules in the src/common subdirectory.
-
-These are divided, broadly, into _compatibility_ functions, _utility_
-functions, _containers_, and _cryptography_. (Someday in the future, it
-would be great to split these modules into separate directories. Also, some
-functions are probably put in the wrong modules)
-
-### Compatibility code
-
-These functions live in src/common/compat\*.c; some corresponding macros live
-in src/common/compat\*.h. They serve as wrappers around platform-specific or
-compiler-specific logic functionality.
-
-In general, the rest of the Tor code *should not* be calling platform-specific
-or otherwise non-portable functions. Instead, they should call wrappers from
-compat.c, which implement a common cross-platform API. (If you don't know
-whether a function is portable, it's usually good enough to see whether it
-exists on OSX, Linux, and Windows.)
-
-Other compatibility modules include backtrace.c, which generates stack traces
-for crash reporting; sandbox.c, which implements the Linux seccomp2 sandbox;
-and procmon.c, which handles monitoring a child process.
-
-Parts of address.c are compatibility code for handling network addressing
-issues; other parts are in util.c.
-
-Notable compatibility areas are:
-
-  * mmap support for mapping files into the address space (read-only)
-
-  * Code to work around the intricacies
-
-  * Workaround code for Windows's horrible winsock incompatibilities and
-    Linux's intricate socket extensions.
-
-  * Helpful string functions like memmem, memstr, asprintf, strlcpy, and
-    strlcat that not all platforms have.
-
-  * Locale-ignoring variants of the ctypes functions.
-
-  * Time-manipulation functions
-
-  * File locking function
-
-  * IPv6 functions for platforms that don't have enough IPv6 support
-
-  * Endianness functions
-
-  * OS functions
-
-  * Threading and locking functions.
-
-=== Utility functions
-
-General-purpose utilities are in util.c; they include higher-level wrappers
-around many of the compatibility functions to provide things like
-file-at-once access, memory management functions, math, string manipulation,
-time manipulation, filesystem manipulation, etc.
-
-(Some functionality, like daemon-launching, would be better off in a
-compatibility module.)
-
-In util_format.c, we have code to implement stuff like base-32 and base-64
-encoding.
-
-The address.c module interfaces with the system resolver and implements
-address parsing and formatting functions. It converts sockaddrs to and from
-a more compact tor_addr_t type.
-
-The di_ops.c module provides constant-time comparison and associative-array
-operations, for side-channel avoidance.
-
-The logging subsystem in log.c supports logging to files, to controllers, to
-stdout/stderr, or to the system log.
-
-The abstraction in memarea.c is used in cases when a large amount of
-temporary objects need to be allocated, and they can all be freed at the same
-time.
-
-The torgzip.c module wraps the zlib library to implement compression.
-
-Workqueue.c provides a simple multithreaded work-queue implementation.
-
-### Containers
-
-The container.c module defines these container types, used throughout the Tor
-codebase.
-
-There is a dynamic array called **smartlist**, used as our general resizeable
-array type. It supports sorting, searching, common set operations, and so
-on. It has specialized functions for smartlists of strings, and for
-heap-based priority queues.
-
-There's a bit-array type.
-
-A set of mapping types to map strings, 160-bit digests, and 256-bit digests
-to void \*. These are what we generally use when we want O(1) lookup.
-
-Additionally, for containers, we use the ht.h and tor_queue.h headers, in
-src/ext. These provide intrusive hashtable and linked-list macros.
-
-### Cryptography
-
-Once, we tried to keep our cryptography code in a single "crypto.c" file,
-with an "aes.c" module containing an AES implementation for use with older
-OpenSSLs.
-
-Now, our practice has become to introduce crypto_\*.c modules when adding new
-cryptography backend code. We have modules for Ed25519, Curve25519,
-secret-to-key algorithms, and password-based boxed encryption.
-
-Our various TLS compatibility code, wrappers, and hacks are kept in
-tortls.c, which is probably too full of Tor-specific kludges. I'm
-hoping we can eliminate most of those kludges when we finally remove
-support for older versions of our TLS handshake.
+## Library code in Tor.
+
+Most of Tor's utility code is in modules in the `src/lib` subdirectory. In
+general, this code is not necessarily Tor-specific, but is instead possibly
+useful for other applications.
+
+This code includes:
+
+  * Compatibility wrappers, to provide a uniform API across different
+    platforms.
+
+  * Library wrappers, to provide a Tor-like API over different libraries
+    that Tor uses for things like compression and cryptography.
+
+  * Containers, to implement some general-purpose data container types.
+
+The modules in `src/lib` are currently well-factored: each one depends
+only on lower-level modules. You can see an up-to-date list of the
+modules sorted from lowest to highest level by running
+`./scripts/maint/practracker/includes.py --toposort`.
+
+As of this writing, the library modules are (from lowest to highest
+level):
+
+  * `lib/cc` -- Macros for managing the C compiler and
+    language. Includes macros for improving compatibility and clarity
+    across different C compilers.
+
+  * `lib/version` -- Holds the current version of Tor.
+
+  * `lib/testsupport` -- Helpers for making test-only code and test
+    mocking support.
+
+  * `lib/defs` -- Lowest-level constants used in many places across the
+    code.
+
+  * `lib/subsys` -- Types used for declaring a "subsystem". A subsystem
+    is a module with support for initialization, shutdown,
+    configuration, and so on.
+
+  * `lib/conf` -- Types and macros used for declaring configuration
+    options.
+
+  * `lib/arch` -- Compatibility functions and macros for handling
+    differences in CPU architecture.
+
+  * `lib/err` -- Lowest-level error handling code: responsible for
+    generating stack traces, handling raw assertion failures, and
+    otherwise reporting problems that might not be safe to report
+    via the regular logging module.
+
+  * `lib/malloc` -- Wrappers and utilities for memory management.
+
+  * `lib/intmath` -- Utilities for integer mathematics.
+
+  * `lib/fdio` -- Utilities and compatibility code for reading and
+    writing data on file descriptors (and on sockets, for platforms
+    where a socket is not a kind of fd).
+
+  * `lib/lock` -- Compatibility code for declaring and using locks.
+    Lower-level than the rest of the threading code.
+
+  * `lib/ctime` -- Constant-time implementations for data comparison
+    and table lookup, used to avoid timing side-channels from standard
+    implementations of memcmp() and so on.
+
+  * `lib/string` -- Low-level compatibility wrappers and utility
+    functions for string manipulation.
+
+  * `lib/wallclock` -- Compatibility and utility functions for
+    inspecting and manipulating the current (UTC) time.
+
+  * `lib/osinfo` -- Functions for inspecting the version and
+    capabilities of the operating system.
+
+  * `lib/smartlist_core` -- The bare-bones pieces of our dynamic array
+    ("smartlist") implementation. There are higher-level pieces, but
+    these ones are used by (and therefore cannot use) the logging code.
+
+  * `lib/log` -- Implements the logging system used by all higher-level
+    Tor code. You can think of this as the logical "midpoint" of the
+    library code: much of the higher-level code is higher-level
+    _because_ it uses the logging module, and much of the lower-level
+    code is specifically written to avoid having to log, because the
+    logging module depends on it.
+
+  * `lib/container` -- General-purpose containers, including dynamic arrays,
+    hashtables, bit arrays, weak-reference-like "handles", bloom
+    filters, and a bit more.
+
+  * `lib/trace` -- A general-purpose API for introducing
+    function-tracing functionality into Tor. Currently not much used.
+
+  * `lib/thread` -- Threading compatibility and utility functionality,
+    other than low-level locks (which are in `lib/lock`) and
+    workqueue/threadpool code (which belongs in `lib/evloop`).
+
+  * `lib/term` -- Code for terminal manipulation functions (like
+    reading a password from the user).
+
+  * `lib/memarea` -- A data structure for a fast "arena" style allocator,
+    where the data is freed all at once. Used for parsing.
+
+  * `lib/encoding` -- Implementations for encoding data in various
+    formats, datatypes, and transformations.
+
+  * `lib/dispatch` -- A general-purpose in-process message delivery
+    system. Used by `lib/pubsub` to implement our inter-module
+    publish/subscribe system.
+
+  * `lib/sandbox` -- Our Linux seccomp2 sandbox implementation.
+
+  * `lib/pubsub` -- Code and macros to implement our publish/subscribe
+    message passing system.
+
+  * `lib/fs` -- Utility and compatibility code for manipulating files,
+    filenames, directories, and so on.
+
+  * `lib/confmgt` -- Code to parse, encode, and manipulate our
+    configuration files, state files, and so forth.
+
+  * `lib/crypt_ops` -- Cryptographic operations. This module contains
+    wrappers around the cryptographic libraries that we support,
+    and implementations for some higher-level cryptographic
+    constructions that we use.
+
+  * `lib/meminfo` -- Functions for inspecting our memory usage, if the
+    malloc implementation exposes that to us.
+
+  * `lib/time` -- Higher-level time functions, including fine-grained and
+    monotonic timers.
+
+  * `lib/math` -- Floating-point mathematical utilities, including
+    compatibility code, and probability distributions.
+
+  * `lib/buf` -- A general-purpose queued buffer implementation,
+    similar to the BSD kernel's "mbuf" structure.
+
+  * `lib/net` -- Networking code, including address manipulation and
+    compatibility wrappers.
+
+  * `lib/compress` -- A compatibility wrapper around several
+    compression libraries, currently including zlib, zstd, and lzma.
+
+  * `lib/geoip` -- Utilities to manage geoip (IP to country) lookups
+    and formats.
+
+  * `lib/tls` -- Compatibility wrappers around the library (NSS or
+    OpenSSL, depending on configuration) that Tor uses to implement the
+    TLS link security protocol.
+
+  * `lib/evloop` -- Tools to manage the event loop and related
+    functionality, in order to implement asynchronous networking,
+    timers, periodic events, and other scheduling tasks.
+
+  * `lib/process` -- Utilities and compatibility code to launch and
+    manage subprocesses.
+
+### What belongs in lib?
+
+In general, if you can imagine some program wanting the functionality
+you're writing, even if that program had nothing to do with Tor, your
+functionality belongs in lib.
+
+If it falls into one of the existing "lib" categories, your
+functionality belongs in lib.
+
+If you are using platform-specific `#ifdef`s to manage compatibility
+issues among platforms, you should probably consider whether you can
+put your code into lib.
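
To illustrate the last point about platform-specific `#ifdef`s, here is a minimal sketch of the kind of wrapper that would be a candidate for lib. It is not taken from the Tor source: `example_getpid()` and `example_format_pid()` are hypothetical names invented for this example. The idea is that the `#ifdef` lives in one small function, and callers use a single portable API.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Hypothetical compatibility wrapper (not a real Tor function): return
 * the current process ID as an unsigned long on every platform. All of
 * the platform-specific logic is confined to this one function. */
static unsigned long
example_getpid(void)
{
#ifdef _WIN32
  return (unsigned long) GetCurrentProcessId();
#else
  return (unsigned long) getpid();
#endif
}

/* Hypothetical caller: formats the PID into buf without needing any
 * platform checks of its own. Returns the snprintf() result. */
static int
example_format_pid(char *buf, size_t buflen)
{
  assert(buf != NULL && buflen > 0);
  return snprintf(buf, buflen, "pid=%lu", example_getpid());
}
```

Code that needs the process ID would call `example_format_pid(buf, sizeof(buf))` or `example_getpid()` directly. Nothing in the wrapper is Tor-specific, which is the kind of signal the text above suggests for deciding that code belongs in lib.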