Mirror of https://gitlab.torproject.org/tpo/core/tor.git (synced 2024-11-13 06:33:44 +01:00)

Merge branch 'design_revision'

Commit dfe7f004df

```diff
@@ -5,18 +5,20 @@ This document describes the general structure of the Tor codebase, how
 it fits together, what functionality is available for extending Tor,
 and gives some notes on how Tor got that way.
 
-Tor remains a work in progress: We've been working on it for more than a
-decade, and we've learned a lot about good coding since we first
+Tor remains a work in progress: We've been working on it for nearly two
+decades, and we've learned a lot about good coding since we first
 started. This means, however, that some of the older pieces of Tor will
-have some "code smell" in them that could sure stand a brisk
+have some "code smell" in them that could stand a brisk
 refactoring. So when I describe a piece of code, I'll sometimes give a
 note on how it got that way, and whether I still think that's a good
 idea.
 
 The first drafts of this document were written in the Summer and Fall of
 2015, when Tor 0.2.6 was the most recent stable version, and Tor 0.2.7
-was under development. If you're reading this far in the future, some
-things may have changed. Caveat haxxor!
+was under development. There is a revision in progress (as of late
+2019), to bring it up to pace with Tor as of version 0.4.2. If you're
+reading this far in the future, some things may have changed. Caveat
+haxxor!
 
 This document is not an overview of the Tor protocol. For that, see the
 design paper and the specifications at https://spec.torproject.org/ .
```

```diff
@@ -24,8 +26,6 @@ design paper and the specifications at https://spec.torproject.org/ .
 For more information about Tor's coding standards and some helpful
 development tools, see doc/HACKING in the Tor repository.
 
-For more information about writing tests, see doc/HACKING/WritingTests.txt
-in the Tor repository.
 
 ### The very high level ###
 
```

```diff
@@ -36,35 +36,59 @@ same codebase: the Tor process will run as a client, relay, or authority
 depending on its configuration.
 
 Tor has a few major dependencies, including Libevent (used to tell which
-sockets are readable and writable), OpenSSL (used for many encryption
+sockets are readable and writable), OpenSSL or NSS (used for many encryption
 functions, and to implement the TLS protocol), and zlib (used to
 compress and uncompress directory information).
 
 Most of Tor's work today is done in a single event-driven main thread.
 Tor also spawns one or more worker threads to handle CPU-intensive
-tasks. (Right now, this only includes circuit encryption.)
+tasks. (Right now, this only includes circuit encryption and the more
+expensive compression algorithms.)
 
 On startup, Tor initializes its libraries, reads and responds to its
 configuration files, and launches a main event loop. At first, the only
 events that Tor listens for are a few signals (like TERM and HUP), and
 one or more listener sockets (for different kinds of incoming
-connections). Tor also configures a timer function to run once per
-second to handle periodic events. As Tor runs over time, other events
-will open, and new events will be scheduled.
+connections). Tor also configures several timers to handle periodic
+events. As Tor runs over time, other events will open, and new events
+will be scheduled.
 
-The codebase is divided into a few main subdirectories:
+The codebase is divided into a few top-level subdirectories, each of
+which contains several sub-modules.
 
-src/common -- utility functions, not necessarily tor-specific.
+* `src/ext` -- Code maintained elsewhere that we include in the Tor
+source distribution.
 
-src/or -- implements the Tor protocols.
+* src/lib` -- Lower-level utility code, not necessarily tor-specific.
 
-src/test -- unit and regression tests
+* `src/trunnel` -- Automatically generated code (from the Trunnel
+tool): used to parse and encode binary formats.
 
-src/ext -- Code maintained elsewhere that we include in the Tor
-source distribution.
+* `src/core` -- Networking code that is implements the central parts of
+the Tor protocol and main loop.
 
-src/trunnel -- automatically generated code (from the Trunnel)
-tool: used to parse and encode binary formats.
+* `src/feature` -- Aspects of Tor (like directory management, running a
+relay, running a directory authorities, managing a list of nodes,
+running and using onion services) that are built on top of the
+mainloop code.
+
+* `src/app` -- Highest-level functionality; responsible for setting up
+and configuring the Tor daemon, making sure all the lower-level
+modules start up when required, and so on.
+
+* `src/tools` -- Binaries other than Tor that we produce. Currently this
+is tor-resolve, tor-gencert, and the tor_runner.o helper module.
+
+* `src/test` -- unit tests, regression tests, and a few integration
+tests.
+
+In theory, the above parts of the codebase are sorted from highest-level to
+lowest-level, where high-level code is only allowed to invoke lower-level
+code, and lower-level code never includes or depends on code of a higher
+level. In practice, this refactoring is incomplete: The modules in `src/lib`
+are well-factored, but there are many layer violations ("upward
+dependencies") in `src/core` and `src/feature`. We aim to eliminate those
+over time.
 
 ### Some key high-level abstractions ###
 
```
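The layering rule stated in this hunk (higher-level directories may depend on lower-level ones, never the reverse) is mechanical enough to check in code. The following is a minimal C sketch, not Tor's own tooling. The level table and the functions `layer_level()` and `is_upward_dependency()` are hypothetical. Only the relative order lib < core < feature < app comes from the directory descriptions above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical level table: a lower number means a lower level. Only the
 * relative order (lib < core < feature < app) comes from the directory
 * descriptions; the numbers themselves are illustrative. */
struct layer {
  const char *dir;
  int level;
};

static const struct layer layers[] = {
  { "src/lib", 0 },
  { "src/core", 1 },
  { "src/feature", 2 },
  { "src/app", 3 },
};

/* Return the level of a top-level directory, or -1 if it is unknown. */
static int
layer_level(const char *dir)
{
  assert(dir != NULL);
  for (size_t i = 0; i < sizeof(layers) / sizeof(layers[0]); ++i) {
    if (strcmp(layers[i].dir, dir) == 0)
      return layers[i].level;
  }
  return -1;
}

/* Return 1 if code in directory `from` depending on code in directory `to`
 * is an upward dependency (a layering violation), 0 otherwise. Unknown
 * directories and dependencies within one directory are not judged. */
static int
is_upward_dependency(const char *from, const char *to)
{
  int from_level = layer_level(from);
  int to_level = layer_level(to);
  if (from_level < 0 || to_level < 0)
    return 0;
  return to_level > from_level;
}
```

With this table, `is_upward_dependency("src/core", "src/feature")` reports a violation. That is the kind of upward dependency the text says still exists in `src/core` and `src/feature`.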

```diff
@@ -94,31 +118,26 @@ If we switch to other strategies in the future, we'll have more
 connection types.
 
 A 'Node' is a view of a Tor instance's current knowledge and opinions
-about a Tor relay orbridge.
+about a Tor relay or bridge.
 
 ### The rest of this document. ###
 
 > **Note**: This section describes the eventual organization of this
 > document, which is not yet complete.
 
-We'll begin with an overview of the various utility functions available
-in Tor's 'common' directory. Knowing about these is key to writing
-portable, simple code in Tor.
+We'll begin with an overview of the facilities provided by the modules
+in src/lib. Knowing about these is key to writing portable, simple code
+in Tor.
+
+Then we'll move on to a discussion of how parts of the Tor codebase are
+initialized, finalized, configured, and managed.
 
 Then we'll go on and talk about the main data-flow of the Tor network:
 how Tor generates and responds to network traffic. This will occupy a
 chapter for the main overview, with other chapters for special topics.
 
-After that, we'll mention the main modules in Tor, and describe the
-function of each.
+After that, we'll mention the main modules in src/features and describe the
+functions of each.
 
-We'll cover the directory subsystem next: how Tor learns about other
-relays, and how relays advertise themselves.
-
-Then we'll cover a few specialized modules, such as hidden services,
-sandboxing, hibernation, accounting, statistics, guards, path
-generation, pluggable transports, and how they integrate with the rest of Tor.
-
 We'll close with a meandering overview of important pending issues in
 the Tor codebase, and how they affect the future of the Tor software.
 
```
|
```diff
@@ -1,121 +0,0 @@
-
-## Utility code in Tor
-
-Most of Tor's utility code is in modules in the src/common subdirectory.
-
-These are divided, broadly, into _compatibility_ functions, _utility_
-functions, _containers_, and _cryptography_. (Someday in the future, it
-would be great to split these modules into separate directories. Also, some
-functions are probably put in the wrong modules)
-
-### Compatibility code
-
-These functions live in src/common/compat\*.c; some corresponding macros live
-in src/common/compat\*.h. They serve as wrappers around platform-specific or
-compiler-specific logic functionality.
-
-In general, the rest of the Tor code *should not* be calling platform-specific
-or otherwise non-portable functions. Instead, they should call wrappers from
-compat.c, which implement a common cross-platform API. (If you don't know
-whether a function is portable, it's usually good enough to see whether it
-exists on OSX, Linux, and Windows.)
-
-Other compatibility modules include backtrace.c, which generates stack traces
-for crash reporting; sandbox.c, which implements the Linux seccomp2 sandbox;
-and procmon.c, which handles monitoring a child process.
-
-Parts of address.c are compatibility code for handling network addressing
-issues; other parts are in util.c.
-
-Notable compatibility areas are:
-
-* mmap support for mapping files into the address space (read-only)
-
-* Code to work around the intricacies
-
-* Workaround code for Windows's horrible winsock incompatibilities and
-Linux's intricate socket extensions.
-
-* Helpful string functions like memmem, memstr, asprintf, strlcpy, and
-strlcat that not all platforms have.
-
-* Locale-ignoring variants of the ctypes functions.
-
-* Time-manipulation functions
-
-* File locking function
-
-* IPv6 functions for platforms that don't have enough IPv6 support
-
-* Endianness functions
-
-* OS functions
-
-* Threading and locking functions.
-
-=== Utility functions
-
-General-purpose utilities are in util.c; they include higher-level wrappers
-around many of the compatibility functions to provide things like
-file-at-once access, memory management functions, math, string manipulation,
-time manipulation, filesystem manipulation, etc.
-
-(Some functionality, like daemon-launching, would be better off in a
-compatibility module.)
-
-In util_format.c, we have code to implement stuff like base-32 and base-64
-encoding.
-
-The address.c module interfaces with the system resolver and implements
-address parsing and formatting functions. It converts sockaddrs to and from
-a more compact tor_addr_t type.
-
-The di_ops.c module provides constant-time comparison and associative-array
-operations, for side-channel avoidance.
-
-The logging subsystem in log.c supports logging to files, to controllers, to
-stdout/stderr, or to the system log.
-
-The abstraction in memarea.c is used in cases when a large amount of
-temporary objects need to be allocated, and they can all be freed at the same
-time.
-
-The torgzip.c module wraps the zlib library to implement compression.
-
-Workqueue.c provides a simple multithreaded work-queue implementation.
-
-### Containers
-
-The container.c module defines these container types, used throughout the Tor
-codebase.
-
-There is a dynamic array called **smartlist**, used as our general resizeable
-array type. It supports sorting, searching, common set operations, and so
-on. It has specialized functions for smartlists of strings, and for
-heap-based priority queues.
-
-There's a bit-array type.
-
-A set of mapping types to map strings, 160-bit digests, and 256-bit digests
-to void \*. These are what we generally use when we want O(1) lookup.
-
-Additionally, for containers, we use the ht.h and tor_queue.h headers, in
-src/ext. These provide intrusive hashtable and linked-list macros.
-
-### Cryptography
-
-Once, we tried to keep our cryptography code in a single "crypto.c" file,
-with an "aes.c" module containing an AES implementation for use with older
-OpenSSLs.
-
-Now, our practice has become to introduce crypto_\*.c modules when adding new
-cryptography backend code. We have modules for Ed25519, Curve25519,
-secret-to-key algorithms, and password-based boxed encryption.
-
-Our various TLS compatibility code, wrappers, and hacks are kept in
-tortls.c, which is probably too full of Tor-specific kludges. I'm
-hoping we can eliminate most of those kludges when we finally remove
-support for older versions of our TLS handshake.
-
-
-
```

doc/HACKING/design/01.00-lib-overview.md (Normal file, 171 lines)

```diff
@@ -0,0 +1,171 @@
+
+## Library code in Tor.
+
+Most of Tor's utility code is in modules in the `src/lib` subdirectory. In
+general, this code is not necessarily Tor-specific, but is instead possibly
+useful for other applications.
+
+This code includes:
+
+* Compatibility wrappers, to provide a uniform API across different
+platforms.
+
+* Library wrappers, to provide a tor-like API over different libraries
+that Tor uses for things like compression and cryptography.
+
+* Containers, to implement some general-purpose data container types.
+
+The modules in `src/lib` are currently well-factored: each one depends
+only on lower-level modules. You can see an up-to-date list of the
+modules sorted from lowest to highest level by running
+`./scripts/maint/practracker/includes.py --toposort`.
+
+As of this writing, the library modules are (from lowest to highest
+level):
+
+* `lib/cc` -- Macros for managing the C compiler and
+language. Includes macros for improving compatibility and clarity
+across different C compilers.
+
+* `lib/version` -- Holds the current version of Tor.
+
+* `lib/testsupport` -- Helpers for making test-only code and test
+mocking support.
+
+* `lib/defs` -- Lowest-level constants used in many places across the
+code.
+
+* `lib/subsys` -- Types used for declaring a "subsystem". A subsystem
+is a module with support for initialization, shutdown,
+configuration, and so on.
+
+* `lib/conf` -- Types and macros used for declaring configuration
+options.
+
+* `lib/arch` -- Compatibility functions and macros for handling
+differences in CPU architecture.
+
+* `lib/err` -- Lowest-level error handling code: responsible for
+generating stack traces, handling raw assertion failures, and
+otherwise reporting problems that might not be safe to report
+via the regular logging module.
+
+* `lib/malloc` -- Wrappers and utilities for memory management.
+
+* `lib/intmath` -- Utilities for integer mathematics.
+
+* `lib/fdio` -- Utilities and compatibility code for reading and
+writing data on file descriptors (and on sockets, for platforms
+where a socket is not a kind of fd).
+
+* `lib/lock` -- Compatibility code for declaring and using locks.
+Lower-level than the rest of the threading code.
+
+* `lib/ctime` -- Constant-time implementations for data comparison
+and table lookup, used to avoid timing side-channels from standard
+implementations of memcmp() and so on.
+
+* `lib/string` -- Low-level compatibility wrappers and utility
+functions for string manipulation.
+
+* `lib/wallclock` -- Compatibility and utility functions for
+inspecting and manipulating the current (UTC) time.
+
+* `lib/osinfo` -- Functions for inspecting the version and
+capabilities of the operating system.
+
+* `lib/smartlist_core` -- The bare-bones pieces of our dynamic array
+("smartlist") implementation. There are higher-level pieces, but
+these ones are used by (and therefore cannot use) the logging code.
+
+* `lib/log` -- Implements the logging system used by all higher-level
+Tor code. You can think of this as the logical "midpoint" of the
+library code: much of the higher-level code is higher-level
+_because_ it uses the logging module, and much of the lower-level
+code is specifically written to avoid having to log, because the
+logging module depends on it.
+
+* `lib/container` -- General purpose containers, including dynamic arrays
+("smartlists"), hashtables, bit arrays, weak-reference-like "handles",
+bloom filters, and a bit more.
+
+* `lib/trace` -- A general-purpose API for introducing
+function-tracing functionality into Tor. Currently not much used.
+
+* `lib/thread` -- Threading compatibility and utility functionality,
+other than low-level locks (which are in `lib/lock`) and
+workqueue/threadpool code (which belongs in `lib/evloop`).
+
+* `lib/term` -- Code for terminal manipulation functions (like
+reading a password from the user).
+
+* `lib/memarea` -- A data structure for a fast "arena" style allocator,
+where the data is freed all at once. Used for parsing.
+
+* `lib/encoding` -- Implementations for encoding data in various
+formats, datatypes, and transformations.
+
+* `lib/dispatch` -- A general-purpose in-process message delivery
+system. Used by `lib/pubsub` to implement our inter-module
+publish/subscribe system.
+
+* `lib/sandbox` -- Our Linux seccomp2 sandbox implementation.
+
+* `lib/pubsub` -- Code and macros to implement our publish/subscribe
+message passing system.
+
+* `lib/fs` -- Utility and compatibility code for manipulating files,
+filenames, directories, and so on.
+
+* `lib/confmgt` -- Code to parse, encode, and manipulate our
+configuration files, state files, and so forth.
+
+* `lib/crypt_ops` -- Cryptographic operations. This module contains
+wrappers around the cryptographic libraries that we support,
+and implementations for some higher-level cryptographic
+constructions that we use.
+
+* `lib/meminfo` -- Functions for inspecting our memory usage, if the
+malloc implementation exposes that to us.
+
+* `lib/time` -- Higher level time functions, including fine-gained and
+monotonic timers.
+
+* `lib/math` -- Floating-point mathematical utilities, including
+compatibility code, and probability distributions.
+
+* `lib/buf` -- A general purpose queued buffer implementation,
+similar to the BSD kernel's "mbuf" structure.
+
+* `lib/net` -- Networking code, including address manipulation,
+compatibility wrappers,
+
+* `lib/compress` -- A compatibility wrapper around several
+compression libraries, currently including zlib, zstd, and lzma.
+
+* `lib/geoip` -- Utilities to manage geoip (IP to country) lookups
+and formats.
+
+* `lib/tls` -- Compatibility wrappers around the library (NSS or
+OpenSSL, depending on configuration) that Tor uses to implement the
+TLS link security protocol.
+
+* `lib/evloop` -- Tools to manage the event loop and related
+functionality, in order to implement asynchronous networking,
+timers, periodic events, and other scheduling tasks.
+
+* `lib/process` -- Utilities and compatibility code to launch and
+manage subprocesses.
+
+### What belongs in lib?
+
+In general, if you can imagine some program wanting the functionality
+you're writing, even if that program had nothing to do with Tor, your
+functionality belongs in lib.
+
+If it falls into one of the existing "lib" categories, your
+functionality belongs in lib.
+
+If you are using platform-specific `#ifdef`s to manage compatibility
+issues among platforms, you should probably consider whether you can
+put your code into lib.
```
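The `lib/ctime` entry mentions constant-time comparison. The point is that a general-purpose memcmp() may return as soon as it finds a differing byte, so its running time can reveal how long the matching prefix was. A minimal C sketch of the alternative follows. `ct_memeq()` is a hypothetical name, not the lib/ctime API, and real constant-time code also has to account for what the compiler and CPU do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not Tor's lib/ctime API.
 * Return 1 if the n-byte buffers a and b are equal, 0 otherwise.
 * Every byte is examined no matter where the first mismatch is, so the
 * amount of work depends only on n, not on the buffer contents. */
static int
ct_memeq(const void *a, const void *b, size_t n)
{
  /* volatile discourages the compiler from turning the loop back into an
   * early-exit comparison. */
  const volatile uint8_t *pa = a;
  const volatile uint8_t *pb = b;
  uint8_t diff = 0;

  assert(n == 0 || (a != NULL && b != NULL));
  for (size_t i = 0; i < n; ++i)
    diff |= (uint8_t)(pa[i] ^ pb[i]);

  /* Branch-free mapping: diff == 0 becomes all ones after subtracting 1,
   * so bit 8 is set; any diff in 1..255 gives a value below 256. */
  return (int)((((unsigned)diff - 1u) >> 8) & 1u);
}
```

Accumulating differences with OR and converting the result without a data-dependent branch is the usual shape of such functions.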
```diff
@@ -1,7 +1,7 @@
 
 ## Memory management
 
-### Heap-allocation functions
+### Heap-allocation functions: lib/malloc/malloc.h
 
 Tor imposes a few light wrappers over C's native malloc and free
 functions, to improve convenience, and to allow wholescale replacement
```
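This file lists several rules for the `tor_` allocation wrappers: never return NULL, accept zero-byte requests, fail loudly on absurd sizes, and offer a multiplication-safe reallocarray. Those rules can be sketched in standard C. The names `xmalloc()`, `xmalloc_zero()`, `xreallocarray()`, and the `XMALLOC_CEILING` threshold are hypothetical, and this is not the lib/malloc implementation. OpenBSD's reallocarray returns NULL on overflow; this sketch aborts instead, to keep the never-fail rule.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ceiling: a request this large most likely comes from a size
 * computation that underflowed. */
#define XMALLOC_CEILING (SIZE_MAX / 2)

/* Never return NULL to the caller: report and abort instead. */
static void
fatal(const char *msg)
{
  fprintf(stderr, "fatal: %s\n", msg);
  abort();
}

/* malloc() that never fails and is safe for n == 0 on any libc, because a
 * zero-byte request is turned into a one-byte request. */
static void *
xmalloc(size_t n)
{
  /* An assertion, as the text describes; note that assert() is compiled
   * out under NDEBUG, so a production version would check unconditionally. */
  assert(n < XMALLOC_CEILING);
  void *p = malloc(n ? n : 1);
  if (!p)
    fatal("out of memory");
  return p;
}

/* Allocate a single zeroed-out object, like calloc(1, n). */
static void *
xmalloc_zero(size_t n)
{
  void *p = xmalloc(n);
  memset(p, 0, n);
  return p;
}

/* Return 1 if a * b does not fit in a size_t. */
static int
size_mul_overflows(size_t a, size_t b)
{
  return b != 0 && a > SIZE_MAX / b;
}

/* realloc(ptr, nmemb * size) that refuses, rather than silently wrapping
 * around, when the multiplication overflows (cf. OpenBSD reallocarray). */
static void *
xreallocarray(void *ptr, size_t nmemb, size_t size)
{
  if (size_mul_overflows(nmemb, size))
    fatal("reallocarray: size overflow");
  size_t total = nmemb * size;
  void *p = realloc(ptr, total ? total : 1);
  if (!p)
    fatal("out of memory");
  return p;
}
```

Because none of these functions can return NULL, call sites stay free of error-handling branches. That is the trade-off the "Why assert on allocation failure?" discussion argues for.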

```diff
@@ -12,63 +12,71 @@ own; always use the variants prefixed with 'tor_'.
 They are the same as the standard C functions, with the following
 exceptions:
 
-* tor_free(NULL) is a no-op.
-* tor_free() is a macro that takes an lvalue as an argument and sets it to
-NULL after freeing it. To avoid this behavior, you can use tor_free_()
+* `tor_free(NULL)` is a no-op.
+* `tor_free()` is a macro that takes an lvalue as an argument and sets it to
+NULL after freeing it. To avoid this behavior, you can use `tor_free_()`
 instead.
 * tor_malloc() and friends fail with an assertion if they are asked to
 allocate a value so large that it is probably an underflow.
-* It is always safe to tor_malloc(0), regardless of whether your libc
+* It is always safe to `tor_malloc(0)`, regardless of whether your libc
 allows it.
-* tor_malloc(), tor_realloc(), and friends are never allowed to fail.
+* `tor_malloc()`, `tor_realloc()`, and friends are never allowed to fail.
 Instead, Tor will die with an assertion. This means that you never
 need to check their return values. See the next subsection for
 information on why we think this is a good idea.
 
 We define additional general-purpose memory allocation functions as well:
 
-* tor_malloc_zero(x) behaves as calloc(1, x), except the it makes clear
+* `tor_malloc_zero(x)` behaves as `calloc(1, x)`, except the it makes clear
 the intent to allocate a single zeroed-out value.
-* tor_reallocarray(x,y) behaves as the OpenBSD reallocarray function.
+* `tor_reallocarray(x,y)` behaves as the OpenBSD reallocarray function.
 Use it for cases when you need to realloc() in a multiplication-safe
 way.
 
 And specific-purpose functions as well:
 
-* tor_strdup() and tor_strndup() behaves as the underlying libc functions,
-but use tor_malloc() instead of the underlying function.
-* tor_memdup() copies a chunk of memory of a given size.
-* tor_memdup_nulterm() copies a chunk of memory of a given size, then
+* `tor_strdup()` and `tor_strndup()` behaves as the underlying libc
+functions, but use `tor_malloc()` instead of the underlying function.
+* `tor_memdup()` copies a chunk of memory of a given size.
+* `tor_memdup_nulterm()` copies a chunk of memory of a given size, then
 NUL-terminates it just to be safe.
 
-#### Why assert on failure?
+#### Why assert on allocation failure?
 
-Why don't we allow tor_malloc() and its allies to return NULL?
+Why don't we allow `tor_malloc()` and its allies to return NULL?
 
 First, it's error-prone. Many programmers forget to check for NULL return
-values, and testing for malloc() failures is a major pain.
+values, and testing for `malloc()` failures is a major pain.
 
 Second, it's not necessarily a great way to handle OOM conditions. It's
 probably better (we think) to have a memory target where we dynamically free
 things ahead of time in order to stay under the target. Trying to respond to
-an OOM at the point of tor_malloc() failure, on the other hand, would involve
+an OOM at the point of `tor_malloc()` failure, on the other hand, would involve
 a rare operation invoked from deep in the call stack. (Again, that's
 error-prone and hard to debug.)
 
 Third, thanks to the rise of Linux and other operating systems that allow
 memory to be overcommitted, you can't actually ever rely on getting a NULL
-from malloc() when you're out of memory; instead you have to use an approach
+from `malloc()` when you're out of memory; instead you have to use an approach
 closer to tracking the total memory usage.
 
 #### Conventions for your own allocation functions.
 
 Whenever you create a new type, the convention is to give it a pair of
-x_new() and x_free() functions, named after the type.
+`x_new()` and `x_free_()` functions, named after the type.
 
-Calling x_free(NULL) should always be a no-op.
+Calling `x_free(NULL)` should always be a no-op.
 
+There should additionally be an `x_free()` macro, defined in terms of
+`x_free_()`. This macro should set its lvalue to NULL. You can define it
+using the FREE_AND_NULL macro, as follows:
+
+```
+#define x_free(ptr) FREE_AND_NULL(x_t, x_free_, (ptr))
+```
+
 
-### Grow-only memory allocation: memarea.c
+### Grow-only memory allocation: lib/memarea
 
 It's often handy to allocate a large number of tiny objects, all of which
 need to disappear at the same time. You can do this in tor using the
```
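The `x_new()` / `x_free_()` / `x_free()` convention can be sketched as a self-contained C file. Tor's real `FREE_AND_NULL` is defined in its own tree, so the sketch uses a hypothetical stand-in, `MY_FREE_AND_NULL`, which takes its arguments in the same order as the `#define x_free(ptr) ...` line. The `widget_t` type and its functions are also hypothetical.

```c
#include <stdlib.h>

/* Hypothetical stand-in for FREE_AND_NULL: call freefn on the lvalue,
 * then set that same lvalue to NULL. The address is taken once, so the
 * argument expression is evaluated only once. */
#define MY_FREE_AND_NULL(typ, freefn, var)          \
  do {                                              \
    typ **my_free_tmp_ = &(var);                    \
    freefn(*my_free_tmp_);                          \
    *my_free_tmp_ = NULL;                           \
  } while (0)

/* A hypothetical type following the naming convention. */
typedef struct widget_t {
  int *values;
  size_t n_values;
} widget_t;

/* Constructor: never returns NULL in this sketch (aborts instead). */
static widget_t *
widget_new(size_t n_values)
{
  widget_t *w = calloc(1, sizeof(*w));
  if (!w)
    abort();
  w->values = calloc(n_values ? n_values : 1, sizeof(int));
  if (!w->values)
    abort();
  w->n_values = n_values;
  return w;
}

/* Function form: frees w and everything it owns; NULL is a no-op. */
static void
widget_free_(widget_t *w)
{
  if (!w)
    return;
  free(w->values);
  free(w);
}

/* Macro form: frees, then sets the caller's pointer to NULL. */
#define widget_free(ptr) MY_FREE_AND_NULL(widget_t, widget_free_, (ptr))
```

Keeping both forms is useful. Only the function form can be passed where a function pointer is expected, and the macro form is what protects callers from keeping a dangling pointer.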
@ -82,12 +90,14 @@ objects of similar size. But if you use tor_malloc() for the long-lived ones
|
|||||||
and a memarea for the temporary object, the malloc implementation is likelier
|
and a memarea for the temporary object, the malloc implementation is likelier
|
||||||
to do better.
|
to do better.
|
||||||
|
|
||||||
To create a new memarea, use `memarea_new()`. To drop all the storage from a
memarea, and invalidate its pointers, use `memarea_drop_all()`.

The allocation functions `memarea_alloc()`, `memarea_alloc_zero()`,
`memarea_memdup()`, `memarea_strdup()`, and `memarea_strndup()` are analogous
to the similarly named `malloc()`-style functions. There is intentionally no
`memarea_free()` or `memarea_realloc()`.

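To illustrate the grow-only idea, here is a toy bump allocator. The `toy_arena_*` names, the chunk size, and the alignment policy are all hypothetical. This is not Tor's memarea implementation, whose internals may differ. As with `memarea_drop_all()` described above, everything is released at once, and there is no per-object free or realloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_CHUNK_SIZE 4096 /* hypothetical; real allocators tune this */
#define TOY_ALIGN (_Alignof(max_align_t))

/* Chunks are chained so that drop_all can free every one of them. */
typedef struct toy_chunk_t {
  struct toy_chunk_t *next; /* older, already-filled chunk */
  size_t size;              /* usable bytes in mem[] */
  size_t used;              /* bytes handed out so far */
  max_align_t mem[];        /* maximally aligned storage */
} toy_chunk_t;

typedef struct toy_arena_t {
  toy_chunk_t *head; /* chunk currently being filled */
} toy_arena_t;

/* Round n up to a multiple of TOY_ALIGN (a power of two). */
static size_t
toy_round_up(size_t n)
{
  return (n + TOY_ALIGN - 1) & ~(TOY_ALIGN - 1);
}

toy_arena_t *
toy_arena_new(void)
{
  return calloc(1, sizeof(toy_arena_t));
}

/* Bump-allocate sz bytes; they stay valid until toy_arena_drop_all(). */
void *
toy_arena_alloc(toy_arena_t *a, size_t sz)
{
  size_t need = toy_round_up(sz ? sz : 1);
  toy_chunk_t *c = a->head;
  if (!c || c->size - c->used < need) {
    /* Start a new chunk; an oversized request gets a chunk of its own. */
    size_t cap = need > TOY_CHUNK_SIZE ? need : TOY_CHUNK_SIZE;
    toy_chunk_t *fresh = malloc(sizeof(toy_chunk_t) + cap);
    if (!fresh)
      return NULL;
    fresh->next = c;
    fresh->size = cap;
    fresh->used = 0;
    a->head = c = fresh;
  }
  void *p = (char *)c->mem + c->used;
  c->used += need;
  return p;
}

char *
toy_arena_strdup(toy_arena_t *a, const char *s)
{
  size_t n = strlen(s) + 1;
  char *p = toy_arena_alloc(a, n);
  if (p)
    memcpy(p, s, n);
  return p;
}

/* Free every chunk (invalidating all pointers handed out) and the arena.
 * There is deliberately no way to free a single allocation. */
void
toy_arena_drop_all(toy_arena_t *a)
{
  if (!a)
    return;
  toy_chunk_t *c = a->head;
  while (c) {
    toy_chunk_t *next = c->next;
    free(c);
    c = next;
  }
  free(a);
}
```

Allocation is just a pointer bump inside the current chunk, and teardown walks a short list of chunks. That is why this pattern suits many short-lived objects that all die together.
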
### Special allocation: lib/malloc/map_anon.h

TODO: WRITEME.

@ -4,27 +4,27 @@
### Smartlists: Neither lists, nor especially smart.

For historical reasons, we call our dynamically allocated array type
`smartlist_t`. It can grow or shrink as elements are added and removed.

All smartlists hold an array of `void *`. Whenever you expose a smartlist
in an API you *must* document which types its pointers actually hold.

<!-- It would be neat to fix that, wouldn't it? -NM -->

Smartlists are created empty with `smartlist_new()` and freed with
`smartlist_free()`. See the `containers.h` module documentation for more
information; there are many convenience functions for commonly needed
operations.

<!-- TODO: WRITE more about what you can do with smartlists. -->

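To make the `void *` point concrete, here is a minimal growable pointer array in the spirit of `smartlist_t`. The `toy_list_*` names, the initial capacity, the doubling growth policy, and the swap-with-last removal are assumptions chosen for this sketch. They are not the real containers API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of void *, in the spirit of smartlist_t. */
typedef struct toy_list_t {
  void **list;  /* element storage */
  int num_used; /* elements currently stored */
  int capacity; /* slots allocated */
} toy_list_t;

toy_list_t *
toy_list_new(void)
{
  toy_list_t *sl = calloc(1, sizeof(*sl));
  if (!sl)
    return NULL;
  sl->capacity = 16;
  sl->list = calloc((size_t)sl->capacity, sizeof(void *));
  if (!sl->list) {
    free(sl);
    return NULL;
  }
  return sl;
}

/* Append elt, doubling the storage when it is full. Returns 0 on success. */
int
toy_list_add(toy_list_t *sl, void *elt)
{
  if (sl->num_used == sl->capacity) {
    int newcap = sl->capacity * 2;
    void **tmp = realloc(sl->list, (size_t)newcap * sizeof(void *));
    if (!tmp)
      return -1;
    sl->list = tmp;
    sl->capacity = newcap;
  }
  sl->list[sl->num_used++] = elt;
  return 0;
}

int
toy_list_len(const toy_list_t *sl)
{
  return sl->num_used;
}

void *
toy_list_get(const toy_list_t *sl, int idx)
{
  assert(idx >= 0 && idx < sl->num_used);
  return sl->list[idx];
}

/* O(1) removal: move the last element into slot idx (order not kept). */
void
toy_list_del(toy_list_t *sl, int idx)
{
  assert(idx >= 0 && idx < sl->num_used);
  sl->list[idx] = sl->list[--sl->num_used];
}

/* Frees the list itself, not the objects its pointers refer to. */
void
toy_list_free(toy_list_t *sl)
{
  if (!sl)
    return;
  free(sl->list);
  free(sl);
}
```

Nothing in the type records what the pointers refer to. The caller must know whether to free the elements before calling `toy_list_free()`, which is why the element type has to be documented at every API boundary.
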
### Digest maps, string maps, and more.

Tor makes frequent use of maps from 160-bit digests, 256-bit digests,
or nul-terminated strings to `void *`. These types are `digestmap_t`,
`digest256map_t`, and `strmap_t` respectively. See the `containers.h`
module documentation for more information.

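The sketch below shows the general shape of a map from 160-bit keys to `void *`. It is a fixed-size chained hash table whose entries are allocated separately from the values. The `toy_digestmap_*` names, the bucket count, and the hash are assumptions for illustration, not Tor's implementation. See `containers.h` for the real API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_DIGEST_LEN 20 /* 160-bit key, e.g. a SHA-1 digest */
#define TOY_N_BUCKETS 64  /* fixed for brevity; a real map would resize */

/* Separately allocated entry: the map owns these, not the values. */
typedef struct toy_dm_entry_t {
  struct toy_dm_entry_t *next;
  uint8_t key[TOY_DIGEST_LEN];
  void *val;
} toy_dm_entry_t;

typedef struct toy_digestmap_t {
  toy_dm_entry_t *buckets[TOY_N_BUCKETS];
} toy_digestmap_t;

/* Keys are already digests, so take their first 4 bytes as the hash.
 * Only reasonable if nobody can grind inputs to collide on those bits. */
static uint32_t
toy_hash(const uint8_t *key)
{
  uint32_t h;
  memcpy(&h, key, sizeof(h));
  return h;
}

toy_digestmap_t *
toy_digestmap_new(void)
{
  return calloc(1, sizeof(toy_digestmap_t));
}

/* Return the link that points at key's entry, or at the chain's end. */
static toy_dm_entry_t **
toy_find(toy_digestmap_t *m, const uint8_t *key)
{
  toy_dm_entry_t **pp = &m->buckets[toy_hash(key) % TOY_N_BUCKETS];
  while (*pp && memcmp((*pp)->key, key, TOY_DIGEST_LEN) != 0)
    pp = &(*pp)->next;
  return pp;
}

/* Map key to val; return the previous value, or NULL if there was none. */
void *
toy_digestmap_set(toy_digestmap_t *m, const uint8_t *key, void *val)
{
  toy_dm_entry_t **pp = toy_find(m, key);
  if (*pp) {
    void *old = (*pp)->val;
    (*pp)->val = val;
    return old;
  }
  toy_dm_entry_t *e = malloc(sizeof(*e));
  if (!e)
    abort(); /* keep the sketch simple: die on out-of-memory */
  memcpy(e->key, key, TOY_DIGEST_LEN);
  e->val = val;
  e->next = NULL;
  *pp = e;
  return NULL;
}

void *
toy_digestmap_get(toy_digestmap_t *m, const uint8_t *key)
{
  toy_dm_entry_t *e = *toy_find(m, key);
  return e ? e->val : NULL;
}

/* Remove key; return its value, or NULL if it was absent. */
void *
toy_digestmap_remove(toy_digestmap_t *m, const uint8_t *key)
{
  toy_dm_entry_t **pp = toy_find(m, key);
  toy_dm_entry_t *e = *pp;
  if (!e)
    return NULL;
  void *old = e->val;
  *pp = e->next;
  free(e);
  return old;
}

/* Free the map and its entries; the caller still owns the values. */
void
toy_digestmap_free(toy_digestmap_t *m)
{
  if (!m)
    return;
  for (int i = 0; i < TOY_N_BUCKETS; ++i) {
    toy_dm_entry_t *e = m->buckets[i];
    while (e) {
      toy_dm_entry_t *next = e->next;
      free(e);
      e = next;
    }
  }
  free(m);
}
```

Returning the previous value from `set` and `remove` lets callers free or reuse a displaced value without a separate lookup.
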
### Intrusive lists and hashtables

For performance-sensitive cases, we sometimes want to use "intrusive"
@ -32,12 +32,14 @@ collections: ones where the bookkeeping pointers are stuck inside the
structures that belong to the collection. If you've used the
BSD-style `sys/queue.h` macros, you'll be familiar with these.

Unfortunately, the `sys/queue.h` macros vary significantly between the
platforms that have them, so we provide our own variants in
`src/ext/tor_queue.h`.

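Here is what "intrusive" means in practice: a small, self-contained circular doubly linked list whose link fields are embedded in each element. This sketch does not use `tor_queue.h`, and the `ilist_*` and `job_t` names are hypothetical. Insertion needs no allocation, and an element can unlink itself without a pointer to the list head.

```c
#include <assert.h>
#include <stddef.h>

/* Link fields embedded in each element. */
typedef struct ilist_link_t {
  struct ilist_link_t *next;
  struct ilist_link_t *prev;
} ilist_link_t;

/* Circular list with a sentinel: an empty list's sentinel points to itself. */
typedef struct ilist_head_t {
  ilist_link_t sentinel;
} ilist_head_t;

/* Recover the containing struct from a pointer to its embedded link. */
#define ILIST_CONTAINER_OF(ptr, type, member) \
  ((type *)((char *)(ptr) - offsetof(type, member)))

void
ilist_init(ilist_head_t *h)
{
  h->sentinel.next = h->sentinel.prev = &h->sentinel;
}

int
ilist_empty(const ilist_head_t *h)
{
  return h->sentinel.next == &h->sentinel;
}

/* O(1), and no allocation: the element already carries its links. */
void
ilist_insert_tail(ilist_head_t *h, ilist_link_t *l)
{
  l->prev = h->sentinel.prev;
  l->next = &h->sentinel;
  h->sentinel.prev->next = l;
  h->sentinel.prev = l;
}

/* An element can unlink itself in O(1) without knowing the list head. */
void
ilist_remove(ilist_link_t *l)
{
  l->prev->next = l->next;
  l->next->prev = l->prev;
  l->next = l->prev = NULL;
}

/* Hypothetical element type. */
typedef struct job_t {
  int id;
  ilist_link_t link; /* bookkeeping pointers live inside the element */
} job_t;

/* Walk the list, recovering each job_t from its embedded link. */
int
job_list_sum_ids(const ilist_head_t *h)
{
  int sum = 0;
  for (const ilist_link_t *l = h->sentinel.next; l != &h->sentinel;
       l = l->next)
    sum += ILIST_CONTAINER_OF(l, job_t, link)->id;
  return sum;
}
```

The trade-off is that each embedded `link` field lets an element sit in at most one such list at a time. A struct that needs to be in several lists needs one link field per list.
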
We also provide an intrusive hashtable implementation in `src/ext/ht.h`.
When you're using it, you'll need to define your own hash functions. If
attacker-induced collisions are a concern, use the cryptographic (keyed)
`siphash24g` function to compute the hashes.

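The sketch below illustrates the idea behind an intrusive hashtable with caller-defined hash and equality functions. It is a fixed-size chained table whose chain pointer and cached hash live inside each element. It does not reproduce `ht.h`'s macro-based interface. The `ihash_*` and `user_t` names are hypothetical, and FNV-1a is an unkeyed stand-in used only because it is short; where attackers can choose keys, the text above recommends `siphash24g`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Embedded in each element: the chain pointer plus a cached hash. */
typedef struct ihash_entry_t {
  struct ihash_entry_t *next;
  uint32_t hash;
} ihash_entry_t;

/* Caller-supplied hash and equality functions over embedded entries. */
typedef uint32_t (*ihash_fn_t)(const ihash_entry_t *);
typedef int (*ihash_eq_fn_t)(const ihash_entry_t *, const ihash_entry_t *);

#define IHASH_N_BUCKETS 32 /* fixed for brevity; no resizing */

typedef struct ihash_table_t {
  ihash_entry_t *buckets[IHASH_N_BUCKETS];
  ihash_fn_t hashfn;
  ihash_eq_fn_t eqfn;
} ihash_table_t;

/* Return the link pointing at the element equal to probe (or at NULL). */
static ihash_entry_t **
ihash_slot(ihash_table_t *t, const ihash_entry_t *probe, uint32_t h)
{
  ihash_entry_t **pp = &t->buckets[h % IHASH_N_BUCKETS];
  while (*pp && !((*pp)->hash == h && t->eqfn(*pp, probe)))
    pp = &(*pp)->next;
  return pp;
}

ihash_entry_t *
ihash_find(ihash_table_t *t, const ihash_entry_t *probe)
{
  return *ihash_slot(t, probe, t->hashfn(probe));
}

/* Link elt into the table; nothing is allocated because the bookkeeping
 * lives inside elt. Returns -1 if an equal element is already present. */
int
ihash_insert(ihash_table_t *t, ihash_entry_t *elt)
{
  uint32_t h = t->hashfn(elt);
  ihash_entry_t **pp = ihash_slot(t, elt, h);
  if (*pp)
    return -1;
  elt->hash = h;
  elt->next = NULL;
  *pp = elt; /* append at the end of the chain */
  return 0;
}

/* Unlink and return the element equal to probe, or NULL if absent. */
ihash_entry_t *
ihash_remove(ihash_table_t *t, const ihash_entry_t *probe)
{
  ihash_entry_t **pp = ihash_slot(t, probe, t->hashfn(probe));
  ihash_entry_t *e = *pp;
  if (e)
    *pp = e->next;
  return e;
}

/* Hypothetical element type with a user-defined hash and equality. */
typedef struct user_t {
  char name[32];
  ihash_entry_t node; /* embedded bookkeeping */
} user_t;

#define USER_OF(e) ((user_t *)((char *)(e) - offsetof(user_t, node)))

/* FNV-1a: a simple UNKEYED stand-in. If attackers can choose names,
 * use a keyed hash such as siphash24g instead. */
uint32_t
user_hash(const ihash_entry_t *e)
{
  uint32_t h = 2166136261u;
  for (const unsigned char *p = (const unsigned char *)USER_OF(e)->name;
       *p; ++p) {
    h ^= *p;
    h *= 16777619u;
  }
  return h;
}

int
user_eq(const ihash_entry_t *a, const ihash_entry_t *b)
{
  return strcmp(USER_OF(a)->name, USER_OF(b)->name) == 0;
}
```

Lookups use a stack-allocated "probe" element that carries only the key fields. This is a common idiom with intrusive tables, because the table only ever compares embedded entries.
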
<!-- TODO: WRITE about bloom filters, namemaps, bit-arrays, order functions.
-->