tor/src/mainpage.md

@mainpage Tor source reference

@tableofcontents

@section welcome Welcome to Tor

(For an up-to-date rendered copy of this documentation, see
https://src-ref.docs.torproject.org/tor/index.html .)

This documentation describes the general structure of the Tor codebase, how
it fits together, what functionality is available for extending Tor, and
gives some notes on how Tor got that way.  It also includes a reference for
nearly every function, type, file, and module in the Tor source code.  The
high-level documentation is a work in progress.

Tor itself remains a work in progress too: We've been working on it for
nearly two decades, and we've learned a lot about good coding since we first
started.  This means, however, that some of the older pieces of Tor will have
some "code smell" in them that could stand a brisk refactoring.  So when we
describe a piece of code, we'll sometimes give a note on how it got that way,
and whether we still think that's a good idea.

This document is not an overview of the Tor protocol.  For that, see the
design paper and the specifications at https://spec.torproject.org/ .

For more information about Tor's coding standards and some helpful
development tools, see
[doc/HACKING](https://gitweb.torproject.org/tor.git/tree/doc/HACKING) in the
Tor repository.

@section topics Topic-related documentation

@subpage intro

@subpage arch_goals

@subpage initialization

@subpage dataflow

@subpage certificates

@subpage threading

@subpage strings

@subpage time_periodic

@subpage configuration

@subpage publish_subscribe

@page intro A high-level overview

@tableofcontents

@section highlevel The very high level

Ultimately, Tor runs as an event-driven network daemon: it responds to
network events, signals, and timers by sending and receiving things over
the network.  Clients, relays, and directory authorities all use the
same codebase: the Tor process will run as a client, relay, or authority
depending on its configuration.

Tor has a few major dependencies, including Libevent (used to tell which
sockets are readable and writable), OpenSSL or NSS (used for many encryption
functions, and to implement the TLS protocol), and zlib (used to
compress and uncompress directory information).

Most of Tor's work today is done in a single event-driven main thread.
Tor also spawns one or more worker threads to handle CPU-intensive
tasks.  (Right now, this only includes circuit encryption and the more
expensive compression algorithms.)

On startup, Tor initializes its libraries, reads and responds to its
configuration files, and launches a main event loop.  At first, the only
events that Tor listens for are a few signals (like TERM and HUP), and
one or more listener sockets (for different kinds of incoming
connections).  Tor also configures several timers to handle periodic
events.  As Tor runs over time, other events will open, and new events
will be scheduled.

The codebase is divided into a few top-level subdirectories, each of
which contains several sub-modules.

   - \refdir{ext} -- Code maintained elsewhere that we include in the Tor
     source distribution.  You should not edit this code if you can
     avoid it: we try to keep it identical to the upstream versions.

   - \refdir{lib} -- Lower-level utility code, not necessarily
     tor-specific.

   - `trunnel` -- Automatically generated code (from the Trunnel
     tool): used to parse and encode binary formats.

   - \refdir{core} -- Networking code that is implements the central
     parts of the Tor protocol and main loop.

   - \refdir{feature} -- Aspects of Tor (like directory management,
     running a relay, running a directory authorities, managing a list of
     nodes, running and using onion services) that are built on top of the
     mainloop code.

   - \refdir{app} -- Highest-level functionality; responsible for setting
     up and configuring the Tor daemon, making sure all the lower-level
     modules start up when required, and so on.

   - \refdir{tools} -- Binaries other than Tor that we produce.
      Currently this is tor-resolve, tor-gencert, and the tor_runner.o helper
      module.

   - `test` -- unit tests, regression tests, and a few integration
     tests.

In theory, the above parts of the codebase are sorted from highest-level to
lowest-level, where high-level code is only allowed to invoke lower-level
code, and lower-level code never includes or depends on code of a higher
level.  In practice, this refactoring is incomplete: The modules in
\refdir{lib} are well-factored, but there are many layer violations ("upward
dependencies") in \refdir{core} and \refdir{feature}.
We aim to eliminate those over time.

@section keyabstractions Some key high-level abstractions

The most important abstractions at Tor's high-level are Connections,
Channels, Circuits, and Nodes.

A 'Connection' (connection_t) represents a stream-based information flow.
Most connections are TCP connections to remote Tor servers and clients. (But
as a shortcut, a relay will sometimes make a connection to itself without
actually using a TCP connection.  More details later on.)  Connections exist
in different varieties, depending on what functionality they provide.  The
principle types of connection are edge_connection_t (eg a socks connection or
a connection from an exit relay to a destination), or_connection_t (a TLS
stream connecting to a relay), dir_connection_t (an HTTP connection to learn
about the network), and control_connection_t (a connection from a
controller).

A 'Circuit' (circuit_t) is persistent tunnel through the Tor network,
established with public-key cryptography, and used to send cells one or more
hops.  Clients keep track of multi-hop circuits (origin_circuit_t), and the
cryptography associated with each hop.  Relays, on the other hand, keep track
only of their hop of each circuit (or_circuit_t).

A 'Channel' (channel_t) is an abstract view of sending cells to and from a
Tor relay.  Currently, all channels are implemented using OR connections
(channel_tls_t).  If we switch to other strategies in the future, we'll have
more connection types.

A 'Node' (node_t) is a view of a Tor instance's current knowledge and opinions
about a Tor relay or bridge.
Add a stub main page for the documentation. 2019-11-03 21:04:35 +01:00			`@mainpage Tor source reference`

Turn the "dataflow" document into a doxygen page. 2019-11-06 18:50:57 +01:00			`@tableofcontents`

			`@section welcome Welcome to Tor`
Move much of 00-overview.md into doxygen. 2019-11-04 18:22:58 +01:00
doxygen: add a link to src-ref on mainpage.md. Closes 34246. 2020-05-19 15:44:36 +02:00			`(For an up-to-date rendered copy of this documentation, see`
			`https://src-ref.docs.torproject.org/tor/index.html .)`

Move much of 00-overview.md into doxygen. 2019-11-04 18:22:58 +01:00			`This documentation describes the general structure of the Tor codebase, how`
			`it fits together, what functionality is available for extending Tor, and`
			`gives some notes on how Tor got that way. It also includes a reference for`
			`nearly every function, type, file, and module in the Tor source code. The`
			`high-level documentation is a work in progress.`

			`Tor itself remains a work in progress too: We've been working on it for`
			`nearly two decades, and we've learned a lot about good coding since we first`
			`started. This means, however, that some of the older pieces of Tor will have`
			`some "code smell" in them that could stand a brisk refactoring. So when we`
			`describe a piece of code, we'll sometimes give a note on how it got that way,`
			`and whether we still think that's a good idea.`

			`This document is not an overview of the Tor protocol. For that, see the`
			`design paper and the specifications at https://spec.torproject.org/ .`

			`For more information about Tor's coding standards and some helpful`
Doxygen: add several missing links. 2019-11-05 14:05:42 +01:00			`development tools, see`
			`[doc/HACKING](https://gitweb.torproject.org/tor.git/tree/doc/HACKING) in the`
			`Tor repository.`
Move much of 00-overview.md into doxygen. 2019-11-04 18:22:58 +01:00
Turn the "dataflow" document into a doxygen page. 2019-11-06 18:50:57 +01:00			`@section topics Topic-related documentation`

			`@subpage intro`

Document high-level architecture goals Create a high-level description of the long-term software architecture goals. Closes ticket 32206. 2019-12-09 00:07:34 +01:00			`@subpage arch_goals`

Improve subsys documentation; add initialization documentation. 2019-11-10 22:11:40 +01:00			`@subpage initialization`

Turn the "dataflow" document into a doxygen page. 2019-11-06 18:50:57 +01:00			`@subpage dataflow`
HACKING/design: turn the remaining parts of crypto into a certs doc 2019-11-12 18:01:33 +01:00
			`@subpage certificates`
01f-threads.md becomes threading.dox. 2019-11-12 18:05:05 +01:00
			`@subpage threading`

Move 01g-strings.md into doxygen. 2019-11-12 18:09:53 +01:00			`@subpage strings`

HACKING/design: move 01c-time.md into doxygen. 2019-11-12 18:14:14 +01:00			`@subpage time_periodic`

High-level documentation of configuration in Tor. Closes ticket 32209. 2019-11-15 16:49:25 +01:00			`@subpage configuration`

Merge branch 'ticket32207' 2019-12-02 17:41:40 +01:00			`@subpage publish_subscribe`
Turn the "dataflow" document into a doxygen page. 2019-11-06 18:50:57 +01:00
			`@page intro A high-level overview`

			`@tableofcontents`

Move much of 00-overview.md into doxygen. 2019-11-04 18:22:58 +01:00			`@section highlevel The very high level`

			`Ultimately, Tor runs as an event-driven network daemon: it responds to`
			`network events, signals, and timers by sending and receiving things over`
			`the network. Clients, relays, and directory authorities all use the`
			`same codebase: the Tor process will run as a client, relay, or authority`
			`depending on its configuration.`

			`Tor has a few major dependencies, including Libevent (used to tell which`
			`sockets are readable and writable), OpenSSL or NSS (used for many encryption`
			`functions, and to implement the TLS protocol), and zlib (used to`
			`compress and uncompress directory information).`

			`Most of Tor's work today is done in a single event-driven main thread.`
			`Tor also spawns one or more worker threads to handle CPU-intensive`
			`tasks. (Right now, this only includes circuit encryption and the more`
			`expensive compression algorithms.)`

			`On startup, Tor initializes its libraries, reads and responds to its`
			`configuration files, and launches a main event loop. At first, the only`
			`events that Tor listens for are a few signals (like TERM and HUP), and`
			`one or more listener sockets (for different kinds of incoming`
			`connections). Tor also configures several timers to handle periodic`
			`events. As Tor runs over time, other events will open, and new events`
			`will be scheduled.`

			`The codebase is divided into a few top-level subdirectories, each of`
			`which contains several sub-modules.`

Move description of ext contents into an md file. This way, doxygen can include it. 2020-07-02 19:46:09 +02:00			`- \refdir{ext} -- Code maintained elsewhere that we include in the Tor`
Mention that src/ext should not be edited lightly. Closes #32661 2020-07-02 19:38:13 +02:00			`source distribution. You should not edit this code if you can`
			`avoid it: we try to keep it identical to the upstream versions.`
Move much of 00-overview.md into doxygen. 2019-11-04 18:22:58 +01:00
			`- \refdir{lib} -- Lower-level utility code, not necessarily`
			`tor-specific.`

			- `trunnel` -- Automatically generated code (from the Trunnel
			`tool): used to parse and encode binary formats.`

			`- \refdir{core} -- Networking code that is implements the central`
			`parts of the Tor protocol and main loop.`

			`- \refdir{feature} -- Aspects of Tor (like directory management,`
			`running a relay, running a directory authorities, managing a list of`
			`nodes, running and using onion services) that are built on top of the`
			`mainloop code.`

			`- \refdir{app} -- Highest-level functionality; responsible for setting`
			`up and configuring the Tor daemon, making sure all the lower-level`
			`modules start up when required, and so on.`

			`- \refdir{tools} -- Binaries other than Tor that we produce.`
			`Currently this is tor-resolve, tor-gencert, and the tor_runner.o helper`
			`module.`

			- `test` -- unit tests, regression tests, and a few integration
			`tests.`

			`In theory, the above parts of the codebase are sorted from highest-level to`
			`lowest-level, where high-level code is only allowed to invoke lower-level`
			`code, and lower-level code never includes or depends on code of a higher`
			`level. In practice, this refactoring is incomplete: The modules in`
			`\refdir{lib} are well-factored, but there are many layer violations ("upward`
			`dependencies") in \refdir{core} and \refdir{feature}.`
			`We aim to eliminate those over time.`

			`@section keyabstractions Some key high-level abstractions`

			`The most important abstractions at Tor's high-level are Connections,`
			`Channels, Circuits, and Nodes.`

			`A 'Connection' (connection_t) represents a stream-based information flow.`
			`Most connections are TCP connections to remote Tor servers and clients. (But`
			`as a shortcut, a relay will sometimes make a connection to itself without`
			`actually using a TCP connection. More details later on.) Connections exist`
			`in different varieties, depending on what functionality they provide. The`
			`principle types of connection are edge_connection_t (eg a socks connection or`
			`a connection from an exit relay to a destination), or_connection_t (a TLS`
			`stream connecting to a relay), dir_connection_t (an HTTP connection to learn`
			`about the network), and control_connection_t (a connection from a`
			`controller).`

			`A 'Circuit' (circuit_t) is persistent tunnel through the Tor network,`
			`established with public-key cryptography, and used to send cells one or more`
			`hops. Clients keep track of multi-hop circuits (origin_circuit_t), and the`
			`cryptography associated with each hop. Relays, on the other hand, keep track`
			`only of their hop of each circuit (or_circuit_t).`

			`A 'Channel' (channel_t) is an abstract view of sending cells to and from a`
			`Tor relay. Currently, all channels are implemented using OR connections`
			`(channel_tls_t). If we switch to other strategies in the future, we'll have`
			`more connection types.`

			`A 'Node' (node_t) is a view of a Tor instance's current knowledge and opinions`
			`about a Tor relay or bridge.`