From 981350815dfd3f7f7a91eb6cf3f2cbcc9e93a729 Mon Sep 17 00:00:00 2001
From: Mike Perry
Date: Thu, 15 Aug 2019 15:35:06 -0500
Subject: [PATCH] Add circuit padding developer and quickstart docs.

---
 doc/HACKING/CircuitPaddingDevelopment.md | 1230 ++++++++++++++++++++++
 doc/HACKING/CircuitPaddingQuickStart.md  |  263 +++++
 2 files changed, 1493 insertions(+)
 create mode 100644 doc/HACKING/CircuitPaddingDevelopment.md
 create mode 100644 doc/HACKING/CircuitPaddingQuickStart.md

diff --git a/doc/HACKING/CircuitPaddingDevelopment.md b/doc/HACKING/CircuitPaddingDevelopment.md
new file mode 100644
index 0000000000..a0dee68e91
--- /dev/null
+++ b/doc/HACKING/CircuitPaddingDevelopment.md
+# Circuit Padding Developer Documentation
+
+This document is written for researchers who wish to prototype and evaluate circuit-level padding defenses in Tor.
+
+Written by Mike Perry and George Kadianakis.
+
+# Table of Contents
+
+- [0. Background](#0-background)
+- [1. Introduction](#1-introduction)
+  - [1.1. System Overview](#11-system-overview)
+  - [1.2. Layering Model](#12-layering-model)
+  - [1.3. Computation Model](#13-computation-model)
+  - [1.4. Other Deployment Constraints](#14-other-deployment-constraints)
+- [2. Creating New Padding Machines](#2-creating-new-padding-machines)
+  - [2.1. Registering a New Padding Machine](#21-registering-a-new-padding-machine)
+  - [2.2. Machine Activation and Shutdown](#22-machine-activation-and-shutdown)
+- [3. Specifying Padding Machines](#3-specifying-padding-machines)
+  - [3.1. Padding Machine States](#31-padding-machine-states)
+  - [3.2. Padding Machine State Transitions](#32-padding-machine-state-transitions)
+  - [3.3. Specifying Per-State Padding](#33-specifying-per-state-padding)
+  - [3.4. Specifying Precise Cell Counts](#34-specifying-precise-cell-counts)
+  - [3.5. Specifying Overhead Limits](#35-specifying-overhead-limits)
+- [4. Evaluating Padding Machines](#4-evaluating-padding-machines)
+  - [4.1. Pure Simulation](#41-pure-simulation)
+  - [4.2. Testing in Chutney](#42-testing-in-chutney)
+  - [4.3. Testing in Shadow](#43-testing-in-shadow)
+  - [4.4. Testing on the Live Network](#44-testing-on-the-live-network)
+- [5. Example Padding Machines](#5-example-padding-machines)
+  - [5.1. Deployed Circuit Setup Machines](#51-deployed-circuit-setup-machines)
+  - [5.2. Adaptive Padding Early](#52-adaptive-padding-early)
+  - [5.3. Sketch of Tamaraw](#53-sketch-of-tamaraw)
+  - [5.4. Other Padding Machines](#54-other-padding-machines)
+- [6. Framework Implementation Details](#6-framework-implementation-details)
+  - [6.1. Memory Allocation Conventions](#61-memory-allocation-conventions)
+  - [6.2. Machine Application Events](#62-machine-application-events)
+  - [6.3. Internal Machine Events](#63-internal-machine-events)
+- [7. Future Features and Optimizations](#7-future-features-and-optimizations)
+  - [7.1. Load Balancing and Flow Control](#71-load-balancing-and-flow-control)
+  - [7.2. Timing and Queuing Optimizations](#72-timing-and-queuing-optimizations)
+  - [7.3. Better Machine Negotiation](#73-better-machine-negotiation)
+  - [7.4. Probabilistic State Transitions](#74-probabilistic-state-transitions)
+  - [7.5. More Complex Pattern Recognition](#75-more-complex-pattern-recognition)
+- [8. Open Research Problems](#8-open-research-problems)
+  - [8.1. Onion Service Circuit Setup](#81-onion-service-circuit-setup)
+  - [8.2. Onion Service Fingerprinting](#82-onion-service-fingerprinting)
+  - [8.3. Open World Fingerprinting](#83-open-world-fingerprinting)
+  - [8.4. Protocol Usage Fingerprinting](#84-protocol-usage-fingerprinting)
+  - [8.5. Datagram Transport Side Channels](#85-datagram-transport-side-channels)
+- [9. Must Read Papers](#9-must-read-papers)
+
+## 0. Background
+
+Tor supports both connection-level and circuit-level padding, and both
+systems are live on the network today. The connection-level padding behavior
+is described in [section 2 of
+padding-spec.txt](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L47). The
+circuit-level padding behavior is described in [section 3 of
+padding-spec.txt](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L282).
+
+These two systems are orthogonal and should not be confused. The
+connection-level padding system is only active while the TLS connection is
+otherwise idle. Moreover, it regards circuit-level padding as normal data
+traffic, and hence while the circuit-level padding system is actively padding,
+the connection-level padding system will not add any additional overhead.
+
+While the currently deployed circuit-level padding behavior is quite simple,
+it is built on a flexible framework. This framework supports describing
+event-driven finite state machines by filling in the fields of a simple C
+structure, and is designed to support any delay-free statistically shaped
+cover traffic on individual circuits, with cover traffic flowing to and from a
+node of the implementor's choice (Guard, Middle, Exit, Rendezvous, etc.).
+
+This class of system was first proposed in
+[Timing analysis in low-latency mix networks: attacks and defenses](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf)
+by Shmatikov and Wang, and extended for the website traffic fingerprinting
+domain by Juarez et al. in
+[Toward an Efficient Website Fingerprinting Defense](http://arxiv.org/pdf/1512.00524). The
+framework also supports fixed parameterized probability distributions, as
+used in [APE](https://www.cs.kau.se/pulls/hot/thebasketcase-ape/) by Tobias
+Pulls, and many other features.
+
+This document describes how to use Tor's circuit padding framework to
+implement and deploy novel delay-free cover traffic defenses.
+
+## 1. Introduction
+
+The circuit padding framework is the official way to implement padding
+defenses in Tor. It may be used in combination with application-layer
+defenses, and/or obfuscation defenses, or on its own.
+
+Its current design should be enough to deploy most defenses without
+modification, but you can extend it to [provide new
+features](#7-future-features-and-optimizations) as well.
+
+### 1.1. System Overview
+
+Circuit-level padding can occur between Tor clients and relays at any hop of
+one of the client's circuits. Both parties need to support the same padding
+mechanisms for the system to work, and the client must enable it.
+
+We added a padding negotiation relay cell to the Tor protocol that clients use
+to ask a relay to start padding, as well as a torrc directive for researchers
+to pin their clients' relay selection to the subset of Tor nodes that
+implement their custom defenses, to support ethical live network testing and
+evaluation.
+
+Circuit-level padding is performed by 'padding machines'. A padding machine is
+a finite state machine. Every state specifies a different form of
+padding style, or stage of padding, in terms of inter-packet timings and total
+packet counts.
+
+Padding state machines are specified by filling in fields of a C structure,
+which specifies the transitions between padding states based on various events,
+probability distributions of inter-packet delays, and the conditions under
+which padding machines should be applied to circuits.
+
+This compact C structure representation is designed to function as a
+microlanguage, which can be compiled down into a
+bitstring that [can be tuned](#13-computation-model) using various
+optimization methods (such as gradient descent, GAs, or GANs), either in
+bitstring form or C struct form.
+
+The event-driven, self-contained nature of this framework is also designed to
+make [evaluation](#4-evaluating-padding-machines) both expedient and rigorously
+reproducible.
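The toy sketch below makes the "C structure as microlanguage" idea concrete by expressing a small event-driven padding machine purely as data. The types, constants, and the `toy_machine_step()` helper are hypothetical simplifications written for illustration. They are not Tor's `circpad_machine_spec_t` or `circpad_state_t`, whose real fields are described in Section 3. The event names only loosely mirror the framework's `CIRCPAD_EVENT_*` events.

```c
#include <stddef.h>

/* Hypothetical, simplified event set (loosely modeled on CIRCPAD_EVENT_*). */
typedef enum {
  TOY_EVENT_NONPADDING_RECV,
  TOY_EVENT_NONPADDING_SENT,
  TOY_EVENT_PADDING_SENT,
  TOY_EVENT_PADDING_RECV,
  TOY_EVENT_COUNT
} toy_event_t;

/* Sentinel: ignore this event and stay in the current state. */
#define TOY_STATE_IGNORE ((unsigned)-1)

typedef struct {
  /* Padding delay for this state in usec (data only; unused by the step). */
  unsigned delay_usec;
  /* next_state[e] is the state to enter when event e occurs. */
  unsigned next_state[TOY_EVENT_COUNT];
} toy_state_t;

typedef struct {
  const char *name;
  size_t num_states;
  const toy_state_t *states;
} toy_machine_t;

/* Return the state reached from `current` when event `ev` occurs. */
static unsigned toy_machine_step(const toy_machine_t *m, unsigned current,
                                 toy_event_t ev)
{
  unsigned next = m->states[current].next_state[ev];
  return next == TOY_STATE_IGNORE ? current : next;
}

/* START waits for outgoing data; BURST pads until data is received. */
enum { TOY_START = 0, TOY_BURST = 1 };

static const toy_state_t toy_states[] = {
  [TOY_START] = {
    .delay_usec = 0, /* START never pads */
    .next_state = {
      [TOY_EVENT_NONPADDING_RECV] = TOY_STATE_IGNORE,
      [TOY_EVENT_NONPADDING_SENT] = TOY_BURST,
      [TOY_EVENT_PADDING_SENT] = TOY_STATE_IGNORE,
      [TOY_EVENT_PADDING_RECV] = TOY_STATE_IGNORE,
    },
  },
  [TOY_BURST] = {
    .delay_usec = 1000,
    .next_state = {
      [TOY_EVENT_NONPADDING_RECV] = TOY_START,
      [TOY_EVENT_NONPADDING_SENT] = TOY_STATE_IGNORE,
      [TOY_EVENT_PADDING_SENT] = TOY_STATE_IGNORE,
      [TOY_EVENT_PADDING_RECV] = TOY_STATE_IGNORE,
    },
  },
};

static const toy_machine_t toy_machine = {
  .name = "toy_two_state",
  .num_states = sizeof(toy_states) / sizeof(toy_states[0]),
  .states = toy_states,
};
```

Because the whole machine is a constant table, tuning it (for example with an optimizer) amounts to changing numbers in that table rather than changing code.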
+
+This document covers the engineering steps to write, test, and deploy a
+padding machine, as well as how to extend the framework to support new machine
+features.
+
+If you prefer to learn by example, you may want to skip to the
+[QuickStart Guide](CircuitPaddingQuickStart.md) and/or [Section
+5](#5-example-padding-machines) for example machines to get you up and running
+quickly.
+
+### 1.2. Layering Model
+
+The circuit padding framework is designed to provide one layer in a layered
+system of interchangeable components.
+
+The circuit padding framework operates at the Tor circuit layer. It only deals
+with the inter-cell timings and quantity of cells sent on a circuit. It can
+insert cells on a circuit in arbitrary patterns, and in response to arbitrary
+conditions, but it cannot delay cells. It also does not deal with packet
+sizes, how cells are packed into TLS records, or ways that the Tor protocol
+might be recognized on the wire.
+
+The problem of differentiating Tor traffic from non-Tor traffic based on
+TCP/TLS packet sizes, initial handshake patterns, and DPI characteristics is the
+domain of [pluggable
+transports](https://trac.torproject.org/projects/tor/wiki/doc/AChildsGardenOfPluggableTransports),
+which may optionally be used in conjunction with this framework (or without
+it).
+
+This document focuses primarily on the circuit padding framework's cover
+traffic features, and will only briefly touch on the potential obfuscation and
+application-layer coupling points of the framework. Explicit layer coupling
+points can be created by adding either new [machine application
+events](#62-machine-application-events) or new [internal machine
+events](#63-internal-machine-events) to the circuit padding framework, so that
+your padding machines can react to events from other layers.
+
+### 1.3. Computation Model
+
+The circuit padding framework is designed to support succinctly specified
+defenses that can be tuned through [computer-assisted
+optimization](#4-evaluating-padding-machines).
+
+We chose to generalize the original [Adaptive Padding 2-state
+design](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf) into an
+event-driven state machine because state machines are the simplest form of
+sequence recognition devices from [automata
+theory](https://en.wikipedia.org/wiki/Finite-state_machine).
+
+Most importantly: this framing allows cover traffic defenses to be modeled as
+an optimization problem search space, expressed as fields of a C structure
+(which is simultaneously a compact opaque bitstring as well as a symbolic
+vector in an abstract feature space). This kind of space is particularly well
+suited to search by gradient descent, GAs, and GANs.
+
+When performing this optimization search, each padding machine should have a
+fitness function, which will allow two padding machines to be compared for
+relative effectiveness. Optimization searches work best if this fitness can be
+represented as a single number, for example the total amount by which it
+reduces the [Balanced
+Accuracy](https://en.wikipedia.org/wiki/Precision_and_recall#Imbalanced_Data)
+of an adversary's classifier, divided by an amount of traffic overhead.
+
+Before you begin the optimization phase for your defense, you should
+also carefully consider the [features and
+optimizations](#7-future-features-and-optimizations) that we suspect will be
+useful, and also see if you can come up with any more. You should similarly be
+sure to restrict your search space to avoid areas of the bitstring/feature
+vector that you are sure you will not need. For example, some
+[applications](#8-open-research-problems) may not need the histogram
+accounting used by Adaptive Padding, but might need to add other forms of
+[pattern recognition](#75-more-complex-pattern-recognition) to react to
+sequences that resemble HTTP GET and HTTP POST.
+
+### 1.4. Other Deployment Constraints
+
+The framework has some limitations that are the result of deliberate
+choices. We are unlikely to deploy defenses that ignore these limitations.
+
+In particular, we have deliberately not provided any mechanism to delay actual
+user traffic, even though we are keenly aware that if we were to support
+additional delay, defenses would be able to have [more success with less
+bandwidth
+overhead](https://freedom.cs.purdue.edu/anonymity/trilemma/index.html).
+
+In the website traffic fingerprinting domain, [provably optimal
+defenses](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf)
+achieve their bandwidth overhead bounds by ensuring that a non-empty queue is
+maintained, by rate limiting traffic below the actual throughput of a circuit.
+For optimal results, this queue must avoid draining to empty, and yet it
+must also be drained fast enough to avoid tremendous queue overhead in fast
+Tor relays, which carry hundreds of thousands of circuits simultaneously.
+
+Unfortunately, Tor's end-to-end flow control is not congestion control. Its
+window sizes are currently fixed. This means there is no signal when queuing
+occurs, and thus no ability to limit queue size through pushback. As a result,
+there is currently no way to do the fine-grained queue management necessary to
+create such a queue and rate limit traffic effectively enough to keep this
+queue from draining to empty, without also risking that aggregate queuing
+would cause out-of-memory conditions on fast relays.
+
+It may be possible to create a congestion control algorithm that can support
+such fine-grained queue management, but this is a [deeply unsolved area of
+research](https://lists.torproject.org/pipermail/tor-dev/2018-November/013562.html).
+
+Even beyond these major technical hurdles, additional latency is also
+unappealing to the wider Internet community, for the simple reason that
+bandwidth [continues to increase
+exponentially](https://ipcarrier.blogspot.com/2014/02/bandwidth-growth-nearly-what-one-would.html)
+whereas the speed of light is fixed. Significant engineering effort has been
+devoted to optimizations that reduce the effect of latency on Internet
+protocols. To go against this trend would ensure our irrelevance to the wider
+conversation about traffic analysis defenses for low-latency Internet protocols.
+
+On the other hand, through [load
+balancing](https://gitweb.torproject.org/torspec.git/tree/proposals/265-load-balancing-with-overhead.txt)
+and [circuit multiplexing strategies](https://bugs.torproject.org/29494), we
+believe it is possible to add significant bandwidth overhead in the form of
+cover traffic, without significantly impacting end-user performance.
+
+For these reasons, we believe the trade-off should be in favor of adding more
+cover traffic, rather than imposing queuing memory overhead and queuing delay.
+
+As a last resort for narrowly scoped application domains (such as
+shaping Tor service-side onion service traffic to look like other websites or
+different application-layer protocols), delay *may* be added at the
+[application layer](https://petsymposium.org/2017/papers/issue2/paper54-2017-2-source.pdf).
+Any additional cover traffic required by such defenses should still be
+added at the circuit padding layer using this framework, to provide
+engineering efficiency through loose layer coupling and component re-use, as
+well as to provide additional gains against [low
+resolution](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L47)
+end-to-end traffic correlation.
+
+Because such delay-based defenses will impact performance significantly more
+than simply adding cover traffic, they must be optional, and negotiated only
+by the specific application-layer endpoints that want them. This will have
+consequences for anonymity sets and base rates, if such traffic shaping and
+additional cover traffic is not very carefully constructed.
+
+In terms of acceptable overhead, because Tor onion services
+[currently use](https://metrics.torproject.org/hidserv-rend-relayed-cells.html)
+less than 1% of the
+[total consumed bandwidth](https://metrics.torproject.org/bandwidth-flags.html)
+of the Tor network, and because onion services exist to provide higher
+security as compared to Tor Exit traffic, they are an attractive target for
+higher-overhead defenses. We encourage researchers to target this use case
+for defenses that require more overhead, and/or for the deployment of
+optional negotiated application-layer delays on either the server or the
+client side.
+
+## 2. Creating New Padding Machines
+
+This section explains how to use the existing mechanisms in Tor to define a
+new circuit padding machine. We assume here that you know C, and are at
+least somewhat familiar with Tor development. For more information on Tor
+development in general, see the other files in `doc/HACKING/` in a recent Tor
+distribution.
+
+Again, if you prefer to learn by example, you may want to skip to the
+[QuickStart Guide](CircuitPaddingQuickStart.md) and/or [Section
+5](#5-example-padding-machines) for example machines to get up and running
+quickly.
+
+To create a new padding machine, you must:
+
+  1. Define your machine using the fields of a heap-allocated
+     `circpad_machine_spec_t` C structure.
+
+  2. Register this object in the global list of available padding machines,
+     using `circpad_register_padding_machine()`.
+
+  3. Ensure that your machine is properly negotiated under your desired
+     circuit conditions.
+
+### 2.1. Registering a New Padding Machine
+
+Again, a circuit padding machine is designed to be specified entirely as a [single
+C structure](#13-computation-model).
+
+Your machine definitions should go into their own functions in
+[circuitpadding_machines.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c). For
+details on all of the fields involved in specifying a padding machine, see
+[Section 3](#3-specifying-padding-machines).
+
+You must register your machine in `circpad_machines_init()` in
+[circuitpadding.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.c). To
+add a new padding machine specification, you must allocate a
+`circpad_machine_spec_t` on the heap with `tor_malloc_zero()`, give it a
+human-readable name string and a machine number equal to the number of
+machines in the list, and register the structure using
+`circpad_register_padding_machine()`.
+
+Each machine must have a client instance and a relay instance. Register your
+client-side machine instance in the `origin_padding_machines` list, and your
+relay-side machine instance in the `relay_padding_machines` list. Once you
+have registered your instance, you do not need to worry about deallocation;
+this is handled for you automatically.
+
+Both machine lists use registration order to signal machine precedence for a
+given `machine_idx` slot on a circuit. This means that machines that are
+registered last are checked for activation *before* machines that are
+registered first. (This reverse precedence ordering allows us to
+deprecate older machines simply by adding new ones after them.)
+
+### 2.2. Machine Activation and Shutdown
+
+After a machine has been successfully registered with the framework, it will
+be instantiated on any client-side circuits that support it. Only client-side
+circuits may initiate padding machines, but either clients or relays may shut
+down padding machines.
+
+#### 2.2.1. Machine Application Conditions
+
+The
+[circpad_machine_conditions_t conditions field](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L641)
+of your `circpad_machine_spec_t` machine definition instance controls the
+conditions under which your machine will be attached and enabled on a Tor
+circuit, and when it gets shut down.
+
+*All* of your explicitly specified conditions in
+`circpad_machine_spec_t.conditions` *must* be met for the machine to be
+applied to a circuit. If *any* condition ceases to be met, then the machine
+is shut down. (This is checked on every event that arrives, even if the
+condition is unrelated to the event.)
+Another way to look at this is that
+all specified conditions must evaluate to true for the entire duration that
+your machine is running. If any are false, your machine does not run (or
+stops running and shuts down).
+
+In particular, as part of the
+[circpad_machine_conditions_t structure](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L149),
+the circuit padding subsystem gives the developer the option to enable a
+machine based on:
+ - The
+   [length](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L157)
+   of the circuit (via the `min_hops` field).
+ - The
+   [current state](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L174)
+   of the circuit, such as streams, relay_early, etc. (via the
+   `circpad_circuit_state_t state_mask` field).
+ - The
+   [purpose](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L178)
+   (i.e. type) of the circuit (via the `circpad_purpose_mask_t purpose_mask`
+   field).
+
+This condition mechanism is the preferred way to determine if a machine should
+apply to a circuit. For information about potentially useful conditions that
+we have considered but have not yet implemented, see [Section
+7.3](#73-better-machine-negotiation). We will happily accept patches for those
+conditions, or for any other additional conditions that are needed for your
+use case.
+
+#### 2.2.2. Detecting and Negotiating Machine Support
+
+When a new machine specification is added to Tor (or removed from Tor), you
+should bump the Padding subprotocol version in `src/core/or/protover.c` and
+`src/rust/protover/protover.rs`, add a field to `protover_summary_flags_t` in
+`or.h`, and set this field in `memoize_protover_summary()` in `versions.c`. This
+new field must then be checked in `circpad_node_supports_padding()` in
+`circuitpadding.c`.
+
+Note that this protocol version update and associated support check is not
+necessary if your experiments will *only* be using your own relays that
+support your own padding machines. This can be accomplished by using the
+`MiddleNodes` directive; see [Section 4](#4-evaluating-padding-machines) for more information.
+
+If the protocol support check passes for the circuit, then the client sends a
+`RELAY_COMMAND_PADDING_NEGOTIATE` cell towards the
+`circpad_machine_spec_t.target_hop` relay, immediately enables the
+padding machine, and may begin sending padding. (The framework does not wait
+for the `RELAY_COMMAND_PADDING_NEGOTIATED` response to begin padding,
+so that we can switch between machines rapidly.)
+
+#### 2.2.3. Machine Shutdown Mechanisms
+
+Padding machines can be shut down on a circuit in three main ways:
+ 1. During a `circpad_machine_event` callback, when
+    `circpad_machine_spec_t.conditions` no longer applies (client side)
+ 2. After a transition to `CIRCPAD_STATE_END`, if
+    `circpad_machine_spec_t.should_negotiate_end` is set (client or relay
+    side)
+ 3. If there is a `RELAY_COMMAND_PADDING_NEGOTIATED` error response from the
+    relay during negotiation.
+
+Each of these cases causes the originating node to send a relay cell towards
+the other side, indicating that shutdown has occurred. The client side sends
+`RELAY_COMMAND_PADDING_NEGOTIATE`, and the relay side sends
+`RELAY_COMMAND_PADDING_NEGOTIATED`.
+
+Because padding from malicious exit nodes can be used to construct active
+timing-based side channels to malicious guard nodes, the client checks that
+padding-related cells only come from relays with active padding machines.
+For this reason, when a client decides to shut down a padding machine,
+the framework frees the mutable `circuit_t.padding_info`, but leaves the
+`circuit_t.padding_machine` pointer set until the
+`RELAY_COMMAND_PADDING_NEGOTIATED` response comes back, to ensure that any
+remaining in-flight padding packets are recognized as valid. Tor does
+not yet close circuits due to violation of this property, but the
+[vanguards addon component "bandguard"](https://github.com/mikeperry-tor/vanguards/blob/master/README_TECHNICAL.md#the-bandguards-subsystem)
+does.
+
+As an optimization, a client may replace a machine with another, by
+sending a `RELAY_COMMAND_PADDING_NEGOTIATE` cell to shut down a machine, and
+immediately sending a `RELAY_COMMAND_PADDING_NEGOTIATE` to start a new machine
+in the same index, without waiting for the response from the first negotiate
+cell.
+
+Unfortunately, there is a known bug as a consequence of this optimization. If
+your machine depends on repeated shutdown and restart of the same machine
+number on the same circuit, please see [Bug
+30922](https://bugs.torproject.org/30992). Depending on your use case, we may
+need to fix that bug or help you find a workaround. See also [Section
+6.1.3](#613-deallocation-and-shutdown) for some more technical details on this
+mechanism.
+
+## 3. Specifying Padding Machines
+
+By now, you should understand how to register, negotiate, and control the
+lifetime of your padding machine, but you still don't know how to make it do
+anything yet. This section should help you understand how to specify how your
+machine reacts to events and adds padding to the wire.
+
+If you prefer to learn by example first instead, you may wish to skip to
+[Section 5](#5-example-padding-machines).
+
+A padding machine is specified by filling in an instance of
+[circpad_machine_spec_t](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L605). Instances
+of this structure specify the precise functionality of a machine: they are
+what the circuit padding developer is called to write. These instances
+are created only at startup, and are referenced via `const` pointers during
+normal operation.
+
+In this section we will go through the most important elements of this
+structure.
+
+### 3.1. Padding Machine States
+
+A padding machine is a finite state machine where each state
+specifies a different style of padding.
+
+As an example of a simple padding machine, you could have a state machine
+with the following states: `[START] -> [SETUP] -> [HTTP] -> [END]` where the
+`[SETUP]` state pads in a way that obfuscates the "circuit setup" of Tor,
+and the `[HTTP]` state pads in a way that emulates a simple HTTP session. Of
+course, padding machines can be more complicated than that, with dozens of
+states and non-trivial transitions.
+
+Padding developers encode the machine states in the
+[circpad_machine_spec_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L655). Each
Each +machine state is described by a +[circpad_state_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L273) +and each such structure specifies the style and amount of padding to be sent, +as well as the possible state transitions. + +The function `circpad_machine_states_init()` must be used for allocating and +initializing the `circpad_machine_spec_t.states` array before states and +state transitions can be defined, as some of the state object has non-zero +default values. + +### 3.2. Padding Machine State Transitions + +As described above, padding machines can have multiple states, to +support different forms of padding. Machines can transition between +states based on events that occur either on the circuit level or on +the machine level. + +State transitions are specified using the +[next_state field](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L381) +of the `circpad_state_t` structure. As a simple example, to transition +from state `A` to state `B` when event `E` occurs, you would use the +following code: `A.next_state[E] = B`. + +#### 3.2.1. State Transition Events + +Here we will go through +[the various events](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L30) +that can be used to transition between states: + +* Circuit-level events + * `CIRCPAD_EVENT_NONPADDING_RECV`: A non-padding cell is received + * `CIRCPAD_EVENT_NONPADDING_SENT`: A non-adding cell is sent + * `CIRCPAD_EVENT_PADDING_SENT`: A padding cell is sent + * `CIRCPAD_EVENT_PADDING_RECV`: A padding cell is received +* Machine-level events + * `CIRCPAD_EVENT_INFINITY`: Tried to schedule padding using the ''infinity bin''. + * `CIRCPAD_EVENT_BINS_EMPTY`: All histogram bins are empty (out of tokens) + * `CIRCPAD_EVENT_LENGTH_COUNT`: State has used all its padding capacity (see `length_dist` below) + +### 3.3. 
Specifying Per-State Padding + +Each state of a padding machine specifies either: + * A padding histogram describing inter-transmission delays between cells; +d OR + * A parameterized delay probability distribution for inter-transmission + delays between cells. + +Either mechanism specifies essentially the *minimum inter-transmission time* +distribution. If non-padding traffic does not get transmitted from this +endpoint before the delay value sampled from this distribution expires, a +padding packet is sent. + +The choice between histograms and probability distributions can be subtle. A +rule of thumb is that probability distributions are easy to specify and +consume very little memory, but might not be able to describe certain types +of complex padding logic. Histograms, in contrast, can support precise +packet-count oriented or multimodal delay schemes, and can use token removal +logic to reduce overhead and shape the total padding+non-padding inter-packet +delay distribution towards an overall target distribution. + +We suggest that you start with a probability distribution if possible, and +you move to a histogram-based approach only if a probability distribution +does not suit your needs. + +#### 3.3.1. Padding Probability Distributions + +The easiest, most compact way to schedule padding using a machine state is to +use a probability distribution that specifies the possible delays. That can +be done +[using the circpad_state_t fields](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L339) +`iat_dist`, `dist_max_sample_usec` and `dist_added_shift_usec`. 
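The function below is a rough sketch of how those three fields could combine. It draws a raw delay from a uniform distribution, caps it at `dist_max_sample_usec`, and then adds `dist_added_shift_usec`. The `toy_dist_state_t` type, its uniform-distribution bounds, and the cap-then-shift order are assumptions made for this illustration. Check `circuitpadding.c` for the framework's exact sampling code and for the distribution types it actually supports.

```c
#include <stdint.h>

/* Hypothetical simplified state: a uniform inter-arrival distribution over
 * [dist_lo_usec, dist_hi_usec), plus two fields named after the
 * circpad_state_t fields discussed above. */
typedef struct {
  double dist_lo_usec;
  double dist_hi_usec;
  uint32_t dist_max_sample_usec;
  uint32_t dist_added_shift_usec;
} toy_dist_state_t;

/* Map a uniform random number u in [0, 1) to a padding delay in usec.
 * The caller supplies u so that the function is deterministic. */
static uint64_t toy_sample_delay_usec(const toy_dist_state_t *st, double u)
{
  double val = st->dist_lo_usec + u * (st->dist_hi_usec - st->dist_lo_usec);

  if (val < 0)
    val = 0;                              /* never a negative delay */
  if (val > st->dist_max_sample_usec)
    val = st->dist_max_sample_usec;       /* cap the raw sample */

  return (uint64_t)val + st->dist_added_shift_usec;  /* then shift it */
}

/* Usage example: uniform over [0, 10000) usec, capped at 4000, shifted by 500. */
static const toy_dist_state_t example_state = {
  .dist_lo_usec = 0.0,
  .dist_hi_usec = 10000.0,
  .dist_max_sample_usec = 4000,
  .dist_added_shift_usec = 500,
};
```

With these example values every delay lands between 500 and 4500 usec. The shift sets a floor, and the cap plus the shift sets a ceiling.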
+
+The Tor circuit padding framework
+[supports multiple types](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L214)
+of probability distributions, and the developer should use the
+[circpad_distribution_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L240)
+to specify them as well as the required parameters.
+
+#### 3.3.2. Padding Histograms
+
+A more advanced way to schedule padding is to use a "padding
+histogram". The main advantages of a histogram are that it allows you to
+specify distributions that are not easily parameterized in closed form, or
+that require specific packet counts at particular time intervals. Histograms also
+allow you to make use of an optional traffic minimization and shaping
+optimization called *token removal*, which is central to the original
+[Adaptive Padding](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf)
+concept.
+
+If a histogram is used by a state (as opposed to a fixed parameterized
+distribution), then the developer must use the
+[histogram-related fields](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L285)
+of the `circpad_state_t` structure.
+
+The width of a histogram bin specifies the range of inter-packet delay times,
+whereas its height specifies the number of tokens in each bin. To sample a
+padding delay from a histogram, we first randomly pick a bin (weighted by the
+number of tokens in each bin) and then sample a delay from within that bin by
+picking a uniformly random delay using the width of the bin as the range.
+
+Each histogram also has an "infinity bin" as its final bin. If the
+infinity bin is chosen,
+we don't schedule any padding (i.e., we schedule padding with
+infinite delay). If the developer does not want infinite delay, they
+should not give any tokens to the infinity bin.
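The sketch below follows the bin-then-delay sampling procedure just described. It is a hypothetical, simplified model: `toy_histogram_t`, `toy_histogram_sample()` and `TOY_DELAY_INFINITE` are not Tor names. The caller passes in two uniform random numbers in [0, 1), which keeps the sketch deterministic, and the last bin plays the role of the infinity bin. Token removal is not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Returned when the infinity bin is chosen: schedule no padding. */
#define TOY_DELAY_INFINITE UINT64_MAX

/* Hypothetical histogram with nbins bins, the last being the infinity bin.
 * Finite bin i (i < nbins - 1) covers [edges[i], edges[i + 1]) usec. */
typedef struct {
  size_t nbins;            /* total bins, including the infinity bin */
  const uint64_t *edges;   /* nbins entries */
  const uint32_t *tokens;  /* nbins entries; last one is the infinity bin */
} toy_histogram_t;

/* u_bin picks a bin weighted by token count; u_delay picks a uniform
 * delay inside that bin. Both must be in [0, 1). */
static uint64_t toy_histogram_sample(const toy_histogram_t *h,
                                     double u_bin, double u_delay)
{
  uint64_t total = 0;
  for (size_t i = 0; i < h->nbins; i++)
    total += h->tokens[i];
  if (total == 0)
    return TOY_DELAY_INFINITE;  /* toy choice: nothing left to sample */

  /* Walk the cumulative token counts until we pass the random target. */
  uint64_t target = (uint64_t)(u_bin * (double)total);
  uint64_t cumulative = 0;
  size_t bin = 0;
  for (; bin < h->nbins; bin++) {
    cumulative += h->tokens[bin];
    if (cumulative > target)
      break;
  }

  if (bin >= h->nbins - 1)
    return TOY_DELAY_INFINITE;  /* infinity bin chosen */

  uint64_t lo = h->edges[bin], hi = h->edges[bin + 1];
  return lo + (uint64_t)(u_delay * (double)(hi - lo));
}

/* Usage example: bins [0, 1000) and [1000, 5000) usec plus the infinity bin. */
static const uint64_t example_edges[] = { 0, 1000, 5000 };
static const uint32_t burst_tokens[] = { 3, 1, 0 };  /* always pads */
static const uint32_t sparse_tokens[] = { 1, 0, 1 }; /* no padding half the time */
static const toy_histogram_t burst_hist = { 3, example_edges, burst_tokens };
static const toy_histogram_t sparse_hist = { 3, example_edges, sparse_tokens };
```

Passing the random inputs in, rather than drawing them inside the function, keeps the sketch deterministic and easy to test.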
+ +If a token removal strategy is specified (via the +`circpad_state_t.token_removal` field), the padding machine will remove a +token from the appropriate histogram bin whenever this endpoint sends *either +a padding packet or a non-padding packet*. The different removal strategies govern what to do when +the bin corresponding to the current inter-packet delay is empty. + +Token removal is optional. It is useful if you want to do things like specify +that a burst should be at least N packets long, and you only want to add padding +packets if there are not enough non-padding packets. The cost of doing token +removal is additional memory allocations for making per-circuit copies of +your histogram that can be modified. + +### 3.4. Specifying Precise Cell Counts + +Padding machines should be able to specify the exact amount of padding they +send. For histogram-based machines this can be done using a specific number +of tokens, but another (and perhaps easier) way to do this is to use the +[length_dist field](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L362) +of the `circpad_state_t` structure. + +The `length_dist` field is basically a probability distribution similar to the +padding probability distributions, which applies to a specific machine state +and specifies the amount of padding we are willing to send during that state. +This value gets sampled when we transition to that state (TODO document this +in the code). + +### 3.5. Specifying Overhead Limits + +Separately from the length counts, it is possible to rate-limit the overhead +percentage of padding at both the global level across all machines, and on a +per-machine basis. + +At the global level, the overhead percentage of all circuit padding machines +as compared to total traffic can be limited through the Tor consensus +parameter `circpad_global_max_padding_pct`.
This overhead is defined as the +percentage of padding cells *sent* out of the sum of non-padding and padding +cells *sent*, and is applied *only after* at least +`circpad_global_allowed_cells` padding cells are sent by that relay or client +(to allow for small bursts of pure padding on otherwise idle or freshly +restarted relays). When both of these limits are hit by a relay or client, no +further padding cells will be sent until sufficient non-padding traffic is +sent to cause the percentage of padding traffic to fall back below the +threshold. + +Additionally, each individual padding machine can rate-limit itself by +filling in the fields `circpad_machine_spec_t.max_padding_percent` and +`circpad_machine_spec_t.allowed_padding_count`, which behave identically to +the consensus parameters, but only apply to that specific machine. + +## 4. Evaluating Padding Machines + +One of the goals of the circuit padding framework is to provide improved +evaluation and scientific reproducibility at lower cost. This includes both +the [choice](#13-computation-model) of the compact C structure representation +(which has an easy-to-produce bitstring representation for optimization by +gradient descent, GAs, or GANs), as well as rapid prototyping and evaluation. + +So far, whenever evaluation cost has been a barrier, each research group has +developed their own ad-hoc packet-level simulators of various padding +mechanisms for evaluating website fingerprinting attacks and defenses. The +process typically involves doing a crawl of Alexa top sites over Tor, and +recording the Tor cell count and timing information for each page in the +trace. These traces are then fed to simulations of defenses, which output +modified trace files.
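To make this trace-in, trace-out workflow concrete, here is a deliberately tiny simulator of the kind described above. Everything in it is hypothetical: a trace is just a sorted array of cell send times in one direction, and the "defense" is a toy rule that emits a padding cell whenever `gap_usec` passes with no cell sent, rearming after every cell. It is neither circpad-sim nor any published defense.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One cell in a (hypothetical) one-direction trace. */
typedef struct {
  uint64_t time_usec;   /* send time, microseconds since trace start */
  int is_padding;       /* 1 for simulated padding, 0 for real traffic */
} sketch_cell_t;

/* Apply a toy defense to an undefended trace: whenever gap_usec passes with
 * no cell sent, emit a padding cell and rearm the timer. times[] must be
 * sorted. No padding is added after the last real cell. Writes at most
 * out_cap cells to out[] and returns the number written. */
static size_t
sketch_apply_gap_padding(const uint64_t *times, size_t n, uint64_t gap_usec,
                         sketch_cell_t *out, size_t out_cap)
{
  size_t m = 0;

  assert(gap_usec > 0);
  for (size_t i = 0; i < n; i++) {
    if (i > 0) {
      assert(times[i] >= times[i - 1]);
      /* Fill the silence since the previous real cell with padding. */
      for (uint64_t t = times[i - 1] + gap_usec;
           t < times[i] && m < out_cap; t += gap_usec) {
        out[m].time_usec = t;
        out[m].is_padding = 1;
        m++;
      }
    }
    if (m < out_cap) {
      out[m].time_usec = times[i];
      out[m].is_padding = 0;
      m++;
    }
  }
  return m;
}
```

For example, real cells at 0, 250 and 300 usec with a 100 usec gap produce padding at 100 and 200 usec, for five output cells in total. Real traces also carry direction and per-hop information, which this sketch omits.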
+ +Because no standardized simulation and evaluation mechanism exists, it is +often hard to tell if independent implementations of various attacks and +defenses are in fact true-to-form or even properly calibrated for direct +comparison, and discrepancies in results across the literature suggest +this is not always so. + +Our preferred outcome with this framework is that machines are tuned +and optimized on a tracing simulator, but that the final results come from +an actual live network test of the defense. The traces from this final crawl +should be preserved as artifacts to be run on the simulator and reproduced +on the live network by future papers, ideally in journal venues that have an +artifact preservation policy. + +### 4.1. Pure Simulation + +When doing initial tuning of padding machines, especially in adversarial +settings, variations of a padding machine defense may need to be applied to +network activity hundreds or even millions of times. The wall-clock time +required to do this kind of tuning using live testing or even Shadow network +emulation may often be prohibitive. + +To help address this, and to better standardize results, Tobias Pulls has +implemented a [circpad machine trace simulator](https://github.com/pylls/circpad-sim), +which uses Tor's unit test framework to simulate applying padding machines to +circuit packet traces via a combination of Tor patches and Python scripts. This +simulator can be used to record traces from clients, Guards, Middles, Exits, +and any other hop in the path, only for circuits that are run by the +researcher. This makes it possible to safely record baseline traces and +ultimately even mount passive attacks on the live network, without impacting +or recording any normal user traffic. + +In this way, a live crawl of the Alexa top sites could be performed once to +produce a standard "undefended" corpus.
Padding machines can then be quickly +evaluated and tuned on these simulated traces in a standardized way, and +the results can then be [reproduced on the live Tor +network](#44-testing-on-the-live-network) with the machines running on your own relays. + +Please be mindful of the Limitations section of the simulator documentation, +however, to ensure that you are aware of the edge cases and timing +approximations that are introduced by this approach. + +### 4.2. Testing in Chutney + +The Tor Project provides a tool called +[Chutney](https://github.com/torproject/chutney/) which makes it very easy to +set up private Tor networks. While getting it to work for the first time may +take some documentation reading, the final result is well worth it for the +following reasons: + +- You control all the relays and hence you have greater control and debugging + capabilities. +- You control all the relays and hence you can toggle padding support on/off + at will. +- You don't need to be cautious about overhead or damaging the real Tor + network during testing. +- You don't even need to be online; you can do all your testing offline over + localhost. + +A final word of warning: since Chutney runs over localhost, the +packet latencies and delays are completely different from the real Tor +network, so if your padding machines rely on real network timings you will +get different results on Chutney. You can work around this by using a +different set of delays if Chutney is used, or by moving your padding +machines to the real network when you want to do latency-related testing. + +### 4.3. Testing in Shadow + +[Shadow](https://shadow.github.io/) is an environment for running entire Tor +network simulations, similar to Chutney, but designed to be both more memory +efficient and able to provide an accurate Tor network topology and latency +model.
+ +While Shadow is significantly more memory efficient than Chutney, and can make +use of extremely accurate Tor network capacity and latency models, it will not +be as fast or efficient as the [circpad trace simulator](https://github.com/pylls/circpad-sim) +if you need to do many iterations of an experiment to tune your defense. + +### 4.4. Testing on the Live Network + +Live network testing is the gold standard for verifying that any attack or +defense is behaving as expected, since it minimizes the influence of simplifying +assumptions. + +However, it is not ethical, or necessarily possible, to run high-resolution +traffic analysis attacks on the entire Tor network. But it is both ethical +and possible to run small-scale experiments that target only your own +clients, who will only use your own Tor relays that support your new padding +machines. + +We provide the `MiddleNodes` torrc directive to enable this, which will allow +you to specify the fingerprints and/or IP netmasks of relays to be used in +the second hop position. Options to restrict other hops also exist, if your +padding system is padding to a different hop. The `HSLayer2Nodes` option +overrides the `MiddleNodes` option for onion service circuits, if both are +set. (The +[vanguards addon](https://github.com/mikeperry-tor/vanguards/blob/master/README_TECHNICAL.md) +will set `HSLayer2Nodes`.) + +When you run your own clients and use `MiddleNodes` to restrict your clients +to use your relays, you can perform live network evaluations of a defense +applied to whatever traffic crawl or activity your clients do. + +## 5. Example Padding Machines + +### 5.1. Deployed Circuit Setup Machines + +Tor currently has two padding machines enabled by default, which aim to hide +certain features of the client-side onion service circuit setup protocol.
For +more details on their precise goal and function, please see +[proposal 302](https://github.com/torproject/torspec/blob/master/proposals/302-padding-machines-for-onion-clients.txt). +In this section we will go over the code of those machines to clarify some +of the engineering parts: + +#### 5.1.1. Overview + +The codebase of proposal 302 can be found in +[circuitpadding_machines.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c) +and specifies four padding machines: + +- The [client-side introduction](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L60) circuit machine. +- The [relay-side introduction](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L146) circuit machine. +- The [client-side rendezvous](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L257) circuit machine. +- The [relay-side rendezvous](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L374) circuit machine. + +Each of those machines has its own setup function, and +[they are all initialized](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.c#L2718) +by the circuit padding framework. To understand more about them, please +carefully read the setup function of each machine; these functions are +fairly well documented.
Each function goes through the following steps: +- Machine initialization + - Give it a [name](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L70) + - Specify [which hop](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L73) the padding should go to + - Specify whether it should be [client-side](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L75) or relay-side. +- Specify [which types of circuits](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L78) the machine should apply to +- Specify whether the circuits should be [kept alive](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c#L112) until the machine finishes padding. +- Set [padding limits](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L116) to avoid too much overhead in case of bugs or errors. +- Set up [machine states](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L120) + - Specify [state transitions](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L123). +- Finally, [register the machine](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L137) in the global machine list + +### 5.2.
Adaptive Padding Early + +[Adaptive Padding Early](https://www.cs.kau.se/pulls/hot/thebasketcase-ape/) +is a variant of Adaptive Padding/WTF-PAD that does not use histograms or token +removal to shift padding distributions, but instead uses fixed parameterized +distributions to specify inter-packet timing thresholds for burst and gap +inter-packet delays. + +Tobias Pulls's [QuickStart Guide](CircuitPaddingQuickStart.md) describes how +to get this machine up and running, and has links to branches with a working +implementation. + +### 5.3. Sketch of Tamaraw + +The [Tamaraw defense +paper](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf) describes the +only defense to date that provides a proof of optimality for the finite-length +website traffic fingerprinting domain. These bounds assume that a defense is +able to perform a full, arbitrary transform of a trace that is under a fixed +number of packets in length. + +The key to understanding Tamaraw's optimality is that it achieves one +such optimal transform by delaying traffic below a circuit's throughput. By +doing this, it creates a queue that is rarely empty, allowing it to produce +a provably optimal transform with minimal overhead. As [Section +1.4](#14-other-deployment-constraints) explains, this queue +cannot be maintained on the live Tor network without risk of out-of-memory +conditions at relays. + +However, if the queue is not maintained in the Tor network, but instead by the +application layer, it could be deployed by websites that opt in to using it. + +In this case, the application layer component would do *optional* constant-rate +shaping, negotiated between a web browser and a website. The Circuit +Padding Framework can then easily fill any gaps with cover traffic +packets, and also ensure that only a fixed number of packets is sent +in total.
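As a back-of-the-envelope illustration of this division of labor, the sketch below counts the cover packets needed when the application sends at a constant rate of one packet per slot. Every slot without application data needs a padding packet, and the total is then rounded up to a multiple of a padding parameter, in the spirit of Tamaraw's L parameter. The function and parameter names are hypothetical, and the sketch ignores direction, timing, and the anonymity-set concerns that any real deployment would face.

```c
#include <assert.h>
#include <stdint.h>

/* Count the cover packets a constant-rate schedule needs.
 *   real_packets: packets of application data that were sent.
 *   slots_used:   constant-rate slots elapsed up to the last real packet;
 *                 each slot carries exactly one real or one padding packet.
 *   pad_multiple: the total packet count is rounded up to a multiple of
 *                 this value (Tamaraw's L parameter). */
static uint64_t
sketch_cover_packets(uint64_t real_packets, uint64_t slots_used,
                     uint64_t pad_multiple)
{
  assert(pad_multiple > 0);
  assert(slots_used >= real_packets);

  /* Padding that fills empty slots while the application is still sending. */
  uint64_t gap_fill = slots_used - real_packets;

  /* Padding that extends the trace to the next multiple of pad_multiple. */
  uint64_t total = (slots_used + pad_multiple - 1) / pad_multiple *
                   pad_multiple;
  uint64_t tail = total - slots_used;

  return gap_fill + tail;
}
```

For example, 7 real packets spread over 10 slots with a multiple of 8 need 3 gap-filling packets plus 6 tail packets, for 16 packets sent in total.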
+ +However, for such a defense to be safe, additional care must be taken to +ensure that the resulting traffic pattern still has a large +anonymity/confusion set with other traces on the live network. + +Accomplishing this is an unsolved problem. + +### 5.4. Other Padding Machines + +Our partners in this project at RIT have produced a couple of prototypes, based +on their published research designs +[REB and RBB](https://www.researchgate.net/publication/329743510_UNDERSTANDING_FEATURE_DISCOVERY_IN_WEBSITE_FINGERPRINTING_ATTACKS). + +As [their writeup +explains](https://github.com/notem/tor-rbp-padding-machine-doc), because RBB +uses delay, the circuit padding machine that they made is a no-delay version. + +They also ran into an issue with the 0-delay timing workaround for [bug +31653](https://bugs.torproject.org/31653). Keep an eye on that bug for updates +with improved workarounds/fixes. + +Their code is [available on GitHub](https://github.com/notem/tor/tree/circuit_padding_rbp_machine). + + +## 6. Framework Implementation Details + +If you need to add additional events, conditions, or other features to the +circuit padding framework, then this section is for you. + +### 6.1. Memory Allocation Conventions + +If the existing circuit padding features are sufficient for your needs, then +you do not need to worry about memory management or pointer lifespans. The +circuit padding framework should take care of this for you automatically. + +However, if you need to add new padding machine features to support your +padding machines, then it will be helpful to understand how circuits +correspond to the global machine definitions, and how mutable padding machine +memory is managed. + +#### 6.1.1. Circuits and Padding Machines + +In Tor, the +[circuit_t structure](https://github.com/torproject/tor/blob/master/src/core/or/circuit_st.h) +is the superclass structure for circuit-related state that is used on both +clients and relays.
On clients, the actual datatype of the object pointed to +by `circuit_t *` is the subclass structure +[origin_circuit_t](https://github.com/torproject/tor/blob/master/src/core/or/origin_circuit_st.h). The +macros `CIRCUIT_IS_ORIGIN()` and `TO_ORIGIN_CIRCUIT()` are used to determine +if a circuit is a client-side (origin) circuit and to cast the pointer safely +to `origin_circuit_t *`. + +Because circuit padding machines can be present at both clients and relays, +the circuit padding fields are stored in the `circuit_t *` superclass +structure. Notice that there are actually two sets of circuit padding fields: +a `const circpad_machine_spec_t *` array, and a `circpad_machine_runtime_t *` +array. Each of these arrays holds at most two elements, as there can be at +most two padding machines on each circuit. + +Each `const circpad_machine_spec_t *` points to a globally allocated machine +specification. These machine specifications are +allocated and set up during Tor program startup, in `circpad_machines_init()` +in +[circuitpadding.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.c). Because +the machine specification object is shared by all circuits, it must not be +modified or freed until program exit (by `circpad_machines_free()`). The +`const` qualifier should enforce this at compile time. + +The `circpad_machine_runtime_t *` array member points to the mutable runtime +information for the machine specification at that same array index. This runtime +structure keeps track of the current machine state, packet counts, and other +information that must be updated as the machine operates. When a padding +machine is successfully negotiated, `circpad_setup_machine_on_circ()` allocates +the associated runtime information. + +#### 6.1.2.
Histogram Management + +If a `circpad_state_t` of a machine specifies a `token_removal` strategy +other than `CIRCPAD_TOKEN_REMOVAL_NONE`, then every time +there is a state transition into this state, `circpad_machine_setup_tokens()` +will copy the read-only `circpad_state_t.histogram` array into a mutable +version at `circpad_machine_runtime_t.histogram`. This mutable copy is used +to decrement the histogram bin counts as packets are sent, as per the +specified token removal strategy. + +When the state machine transitions out of this state, the mutable histogram copy is freed +by this same `circpad_machine_setup_tokens()` function. + +#### 6.1.3. Deallocation and Shutdown + +As an optimization, padding machines can be swapped in and out by the client +without waiting a full round trip for the relay machine to shut down. + +Internally, this is accomplished by immediately freeing the heap-allocated +`circuit_t.padding_info` field corresponding to that machine, but still preserving the +`circuit_t.padding_machine` pointer to the global padding machine +specification until the response comes back from the relay. Once the response +comes back, that `circuit_t.padding_machine` pointer is set to NULL if the +response's machine number matches that of the current machine. + +Because of this partial shutdown condition, we have two macros for iterating +over machines. `FOR_EACH_ACTIVE_CIRCUIT_MACHINE_BEGIN()` is used to iterate +over machines that have both a `circuit_t.padding_info` slot and a +`circuit_t.padding_machine` slot occupied. `FOR_EACH_CIRCUIT_MACHINE_BEGIN()` +is used when we need to iterate over all machines that are either active or +are simply waiting for a response to a shutdown request. + +If the machine is replaced instead of just shut down, then the client frees +the `circuit_t.padding_info`, and then sets the `circuit_t.padding_machine` +and `circuit_t.padding_info` fields for this next machine immediately.
This is +done in `circpad_add_matching_machines()`. In this case, because the new machine +should have a different machine number, the shutdown response from the relay +will not match it and is silently discarded. + +If this sequence of machine teardown and spin-up happens rapidly enough for +the same machine number (as opposed to different machines), then a race +condition can happen. This is +[known bug #30992](https://bugs.torproject.org/30992). + +When the relay side decides to shut down a machine, it sends a +`RELAY_COMMAND_PADDING_NEGOTIATED` towards the client. If this cell matches the +current machine number on the client, that machine is torn down by freeing +the `circuit_t.padding_info` slot and immediately setting the +`circuit_t.padding_machine` slot to NULL. + +Additionally, if Tor decides to close a circuit forcibly due to an error before +the padding machine is shut down, then `circuit_t.padding_info` is still +properly freed by the call to `circpad_circuit_free_all_machineinfos()` +in `circuit_free_()`. + +### 6.2. Machine Application Events + +The framework checks client-side origin circuits to see if padding machines +should be activated or terminated during specific event callbacks in +`circuitpadding.c`. We list these event callbacks here only for reference. You +should not modify any of these callbacks to get your machine to run; instead, +you should use the `circpad_machine_spec_t.conditions` field. + +However, you may add new event callbacks if you need other activation events, +for example to provide obfuscation-layer or application-layer signaling. Any +new event callbacks should behave exactly like the existing callbacks. + +During each of these event callbacks, the framework checks to see if any +currently running padding machines have conditions that no longer apply as a +result of the event, and shuts those machines down.
Then, it checks to see if +any new padding machines should be activated as a result of the event, based +on their circuit application conditions. **Remember: Machines are checked in +reverse order in the machine list. This means that later, more recently added +machines take precedence over older, earlier entries in each list.** + +Both of these checks are performed using the machine application conditions +that you specify in your machine's `circpad_machine_spec_t.conditions` field. + +The machine application event callbacks are prefixed by `circpad_machine_event_` by convention in circuitpadding.c. As of this writing, these callbacks are: + + - `circpad_machine_event_circ_added_hop()`: Called whenever a new hop is + added to a circuit. + - `circpad_machine_event_circ_built()`: Called when a circuit has completed + construction and is + opened. + - `circpad_machine_event_circ_purpose_changed()`: Called when a circuit + changes purpose. + - `circpad_machine_event_circ_has_no_relay_early()`: Called when a circuit + runs out of `RELAY_EARLY` cells. + - `circpad_machine_event_circ_has_streams()`: Called when a circuit gets a + stream attached. + - `circpad_machine_event_circ_has_no_streams()`: Called when the last + stream is detached from a circuit. + +### 6.3. Internal Machine Events + +To provide for some additional capabilities beyond simple finite state machine +behavior, the circuit padding machines also have internal events that they +emit to themselves when packet count length limits are hit, when the Infinity +bin is sampled, and when the histogram bins are emptied of all tokens. + +These events are implemented as `circpad_internal_event_*` functions in +`circuitpadding.c`, which are called from various areas that determine when +the events should occur. 
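To illustrate how internal events can feed the same per-state transition table as cell events, here is a self-contained sketch. The event and state names, the `ST_IGNORE` sentinel, and the small burst/gap example machine are hypothetical simplifications loosely inspired by the `next_state` array in `circpad_state_t`; the real enums in `circuitpadding.h` differ. Because the sketch models a single machine, delivering internal events only to the current machine is implicit.

```c
#include <assert.h>

/* Hypothetical events: two cell events and three internal events. */
typedef enum {
  EV_NONPADDING_SENT,
  EV_NONPADDING_RECV,
  EV_INFINITY,      /* internal: the infinity bin was sampled */
  EV_BINS_EMPTY,    /* internal: the histogram ran out of tokens */
  EV_LENGTH_COUNT,  /* internal: the state's length limit was reached */
  EV_COUNT
} sketch_event_t;

/* Hypothetical states for a tiny burst/gap machine. */
typedef enum { ST_START, ST_BURST, ST_GAP, ST_END, ST_COUNT } sketch_state_t;

/* Sentinel: this event does not change the state. */
#define ST_IGNORE ((int) ST_COUNT)

typedef struct {
  int next[ST_COUNT][EV_COUNT];   /* next[s][e]: state after event e in s */
} sketch_machine_t;

static void
sketch_machine_init(sketch_machine_t *m)
{
  for (int s = 0; s < ST_COUNT; s++)
    for (int e = 0; e < EV_COUNT; e++)
      m->next[s][e] = ST_IGNORE;

  /* Start padding once we send real traffic. */
  m->next[ST_START][EV_NONPADDING_SENT] = ST_BURST;
  /* Sampling the infinity bin ends a burst; new real traffic restarts it. */
  m->next[ST_BURST][EV_INFINITY] = ST_GAP;
  m->next[ST_GAP][EV_NONPADDING_SENT] = ST_BURST;
  /* Running out of tokens or hitting the length limit ends the machine. */
  m->next[ST_BURST][EV_BINS_EMPTY] = ST_END;
  m->next[ST_BURST][EV_LENGTH_COUNT] = ST_END;
  m->next[ST_GAP][EV_LENGTH_COUNT] = ST_END;
}

/* Cell events and internal events are handled identically: look up the
 * transition for the current state and apply it unless it is ignored. */
static int
sketch_machine_handle(const sketch_machine_t *m, int cur, sketch_event_t ev)
{
  assert(cur >= 0 && cur < (int) ST_COUNT);
  assert((int) ev >= 0 && (int) ev < (int) EV_COUNT);
  int nxt = m->next[cur][ev];
  return nxt == ST_IGNORE ? cur : nxt;
}
```

Keeping a single lookup path for both kinds of events is what lets new internal events be added without touching the dispatch logic itself.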
+ +While the conditions that trigger these internal events to be called may be +complex, they are processed by the state machine definitions in a nearly +identical manner as the cell processing events, with the exception that they +are sent to the current machine only, rather than all machines on the circuit. + + +## 7. Future Features and Optimizations + +While implementing the circuit padding framework, our goal was to deploy a +system that obscured client-side onion service circuit setup and supported +deployment of WTF-PAD and/or APE. Along the way, we noticed several features +that might prove useful to others, but were not essential to implement +immediately. We do not have immediate plans to implement these ideas, but we +would gladly accept patches that do so. + +The following list gives an overview of these improvements, but as this +document ages, it may become stale. The canonical list of improvements that +researchers may find useful is tagged in our bugtracker with +[circpad-researchers](https://trac.torproject.org/projects/tor/query?keywords=~circpad-researchers), +and the list of improvements that are known to be necessary for some research +areas are tagged with +[circpad-researchers-want](https://trac.torproject.org/projects/tor/query?keywords=~circpad-researchers-want). + +Please consult those lists for the latest status of these issues. Note that +not all fixes will be backported to all Tor versions, so be mindful of which +Tor releases receive which fixes as you conduct your experiments. + +### 7.1. Load Balancing and Flow Control + +Fortunately, non-Exit bandwidth is already plentiful and exceeds the Exit +capacity, and we anticipate that if we inform our relay operator community of +the need for non-Exit bandwidth to satisfy padding overhead requirements, +they will be able to provide that with relative ease. 
+ +Unfortunately, padding machines that have large quantities of overhead will +require changes to our load balancing system to account for this +overhead. The necessary changes are documented in +[Proposal 265](https://gitweb.torproject.org/torspec.git/tree/proposals/265-load-balancing-with-overhead.txt). + +Additionally, padding cells are not currently subjected to flow control. For +large amounts of padding, we may want to change this. See [ticket +31782](https://bugs.torproject.org/31782) for details. + +### 7.2. Timing and Queuing Optimizations + +The circuit padding framework has some timing-related issues that may impact +results. If high-resolution timestamps are fed to opaque deep learning +trainers, those training models may end up able to differentiate padding +traffic from non-padding traffic due to these timing bugs. + +The circuit padding cell event callbacks come from post-decryption points in +the cell processing codepath, and from the pre-queue points in the cell send +codepath. This means that our cell events will not reflect the actual time +when packets are read or sent on the wire. This is much worse in the send +direction, as the circuitmux queue, channel outbuf, and kernel TCP buffer will +impose significant additional delay between when we currently report that a +packet was sent and when it actually hits the wire. + +[Ticket 29494](https://bugs.torproject.org/29494) has a more detailed +description of this problem, and an experimental branch that changes the cell +event callback locations to be from circuitmux post-queue, which, with KIST, +should be an accurate reflection of when they are actually sent on the wire. + +If your padding machine and problem space depend on very accurate notions of +relay-side packet timing, please try that branch and let us know on the +ticket if you need any further assistance fixing it up.
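Given the timing imprecision described above, one precaution when preparing traces for a classifier is to coarsen the timestamps so that sub-resolution artifacts are not visible to it. The following is a minimal sketch under our own assumptions: timestamps are in microseconds, and each one is rounded down to a multiple of a chosen resolution (for example 10 ms). The function name is hypothetical, and the right resolution for a given experiment is a judgment call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round each timestamp (in microseconds) down to a multiple of
 * resolution_usec, in place, so that timing differences smaller than the
 * resolution are not visible to a downstream classifier. */
static void
sketch_truncate_timestamps(uint64_t *times_usec, size_t n,
                           uint64_t resolution_usec)
{
  assert(resolution_usec > 0);
  for (size_t i = 0; i < n; i++)
    times_usec[i] -= times_usec[i] % resolution_usec;
}
```

With a 10 ms (10000 usec) resolution, timestamps of 9999 and 25731 usec become 0 and 20000 usec respectively.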
+ +Additionally, with these changes, it will be possible to provide further +overhead-reducing optimizations by letting machines specify flags to indicate +that padding should not be sent if there are any cells pending in the cell +queue, for doing things like extending cell bursts more accurately and with +less overhead. + +However, even if we solve the queuing issues, Tor's current timers are not as +precise as some padding designs may require. We will still have issues of +timing precision to solve. [Ticket 31653](https://bugs.torproject.org/31653) +describes an issue the circuit padding system has with sending 0-delay padding +cells, and [ticket 32670](https://bugs.torproject.org/32670) describes a +libevent timer accuracy issue, which causes callbacks to vary by up to 10ms from +their scheduled time, even in the absence of load. + +All of these issues strongly suggest that you either truncate the resolution +of any timestamps you feed to your classifier, or omit timestamps +entirely from the classification problem until these issues are addressed. + +### 7.3. Better Machine Negotiation + +Circuit padding is applied to circuits through machine conditions.
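As a rough model of how machine conditions and machine ordering combine, the sketch below checks a simplified set of conditions against a circuit and scans the machine list from the end, so that more recently added machines take precedence, as described in Section 6.2. The `min_hops` and `purpose_mask` fields are hypothetical simplifications loosely modeled on the framework's machine conditions; the real `circpad_machine_conditions_t` has more fields, and real circuit purpose values differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified machine conditions. */
typedef struct {
  uint8_t min_hops;        /* circuit must have at least this many hops */
  uint32_t purpose_mask;   /* bit i set: circuit purpose i is acceptable */
} sketch_conditions_t;

/* Hypothetical view of the circuit being considered. */
typedef struct {
  uint8_t n_hops;
  unsigned purpose;        /* assumed to be < 32 */
} sketch_circuit_t;

static int
sketch_conditions_match(const sketch_conditions_t *c,
                        const sketch_circuit_t *circ)
{
  assert(circ->purpose < 32);
  return circ->n_hops >= c->min_hops &&
         (c->purpose_mask & (UINT32_C(1) << circ->purpose)) != 0;
}

/* Return the index of the machine to apply, or -1 if none match. The list
 * is scanned from the end, so more recently added machines win. */
static int
sketch_pick_machine(const sketch_conditions_t *machines, size_t n,
                    const sketch_circuit_t *circ)
{
  for (size_t i = n; i-- > 0; ) {
    if (sketch_conditions_match(&machines[i], circ))
      return (int) i;
  }
  return -1;
}
```

With two machines that accept the same purpose but require 2 and 3 hops, a 3-hop circuit picks the later machine, while a 2-hop circuit falls back to the earlier one.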
+ +The following machine conditions may be useful for some use cases, but have +not been implemented yet: + * [Exit Policy-based Stream Conditions](https://bugs.torproject.org/29083) + * [Probability to apply machine/Cointoss condition](https://bugs.torproject.org/30092) + * [Probability distributions for launching new padding circuit(s)](https://bugs.torproject.org/31783) + * [More flexible purpose handling](https://bugs.torproject.org/32040) + +Additionally, the following features may help to obscure that padding is being +negotiated, and/or streamline that negotiation process: + * [Always send negotiation cell on all circuits](https://bugs.torproject.org/30172) + * [Better shutdown handling](https://bugs.torproject.org/30992) + * [Preference-ordered negotiation menu](https://bugs.torproject.org/30348) + +### 7.4. Probabilistic State Transitions + +Right now, the state machine transitions are fully deterministic. However, +one could imagine a state machine that uses probabilistic transitions between +states to simulate a random walk or Hidden Markov Model traversal across +several pages. + +The simplest way to implement this is to make the `circpad_state_t.next_state` array +into an array of structs that have a next state field, and a probability to +transition to that state. + +If you need this feature, please see [ticket +31787](https://bugs.torproject.org/31787) for more details. + +### 7.5. More Complex Pattern Recognition + +State machines are extremely efficient sequence recognition devices. But they +are not great pattern recognition devices. This is one of the reasons why +[Adaptive Padding](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf) +used state machines in combination with histograms, to model the target +distribution of interpacket delays for transmitted packets. + +However, there currently is no such optimization for reaction to patterns of +*received* traffic. 
There may also be cases where defenses must react to more +complex patterns of sent traffic than can be expressed by our current +histogram and length count events. + +For example: if you wish your machine to react to a certain count of incoming +cells in a row, right now you have to have a state for each cell, and use the +infinity bin to time out of the sequence in each state. We could make this more +compact if each state had an arrival cell counter and inter-cell timeout. Or +we could make more complex mechanisms to recognize certain patterns of arrival +traffic in a state. + +The best way to build recognition primitives like this into the framework is +to add additional [Internal Machine Events](#63-internal-machine-events) for +the pattern in question. + +As another simple example, a useful additional event might be to transition +whenever any one of your histogram bins is empty, rather than all of them. To do +this, you would add `CIRCPAD_EVENT_ANY_BIN_EMPTY` to the enum +`circpad_event_t` in `circuitpadding.h`. You would then create a function +`circpad_internal_event_any_bin_empty()`, which would work just like +`circpad_internal_event_bins_empty()`, and also be called from +`check_machine_token_supply()` in `circuitpadding.c`, but checking whether +any single bin is zero instead of whether the total is zero. With this change, new machines could +react to this new event in the same way as any other. + +If you have any other ideas that may be useful, please comment on [ticket +32680](https://bugs.torproject.org/32680). + + +## 8. Open Research Problems + +### 8.1. Onion Service Circuit Setup + +Our circuit setup padding does not address timing-based features, only +packet counts. Deep learning classifiers can probably detect this. + +However, before going too deep down the timing rabbit hole, we may need to make +[some improvements to Tor](#72-timing-and-queuing-optimizations). Please +comment on those tickets if you need this. + +### 8.2.
Onion Service Fingerprinting + +We have done nothing to obscure the service side of onion service circuit +setup. Because service-side onion services will have the reverse traffic byte +counts as normal clients, they will likely need some kind of [hybrid +application layer traffic shaping](#53-sketch-of-tamaraw), in addition to +simple circuit setup obfuscation. + +Fingerprinting in +[combination](https://github.com/mikeperry-tor/vanguards/blob/master/README_SECURITY.md) +with +[vanguards](https://github.com/mikeperry-tor/vanguards/blob/master/README_TECHNICAL.md) +ia also an open issue. + +### 8.3. Open World Fingerprinting + +Similarly, Open World/clearweb website fingerprinting defenses remain +an unsolved problem from the practicality point of view. The original WTF-PAD +defense was never tuned, and it is showing accuracy issues against deep +learning attacks. + +### 8.4. Protocol Usage Fingerprinting + +Traffic Fingerprinting to determine the protocol in use by a client has not +been studied, either from the attack or defense point of view. + +### 8.5. Datagram Transport Side Channels + +Padding can reduce the accuracy of dropped-cell side channels in such +transports, but we don't know [how to measure +this](https://lists.torproject.org/pipermail/tor-dev/2018-November/013562.html). + +## 9. Must Read Papers + +These are by far the most important papers in the space, to date: + + - [Tamaraw](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf) + - [Bayes, Not Naive](https://www.petsymposium.org/2017/papers/issue4/paper50-2017-4-source.pdf) + - [Anonymity Trilemma](https://eprint.iacr.org/2017/954.pdf) + - [WTF-PAD](http://arxiv.org/pdf/1512.00524) + +Except for WTF-PAD, these papers were selected because they point towards +optimality bounds that can be benchmarked against. 
+
+We cite them even though we are skeptical that provably optimal defenses can
+be constructed, at least not without trivial or impractical transforms (such as
+those that can be created with unbounded queue capacity, or with stored
+knowledge of every possible HTTP trace on the Internet).
+
+We are also not demanding an optimality or security proof for every defense.
+
+Instead, we cite the above as benchmarks. We believe the space, especially the
+open-world case, to be more akin to an optimization problem, where a
+WTF-PAD-like defense must be tuned with an optimizer, through rigorous
+adversarial evaluation, to produce results comparable to provably optimal but
+practically unrealizable defenses.
diff --git a/doc/HACKING/CircuitPaddingQuickStart.md b/doc/HACKING/CircuitPaddingQuickStart.md
new file mode 100644
index 0000000000..167ff9f292
--- /dev/null
+++ b/doc/HACKING/CircuitPaddingQuickStart.md
@@ -0,0 +1,263 @@
+# A Padding Machine from Scratch
+
+A quickstart guide by Tobias Pulls.
+
+This document describes the process of building a "padding machine" in tor's new
+circuit padding framework from scratch. Notes were taken as part of porting
+[Adaptive Padding Early
+(APE)](https://www.cs.kau.se/pulls/hot/thebasketcase-ape/) from basket2 to the
+circuit padding framework. The goal is just to document the process and provide
+useful pointers along the way, not to create a useful machine.
+
+The quick and dirty plan is to:
+1. clone and compile tor
+2. use the newly built tor in Tor Browser (TB) and at a small (non-exit) relay
+   we run
+3. add a bare-bones APE padding machine
+4. run the machine, inspect logs for activity
+5. port APE's state machine without thinking much about parameters
+
+## Clone and compile tor
+
+```bash
+git clone https://git.torproject.org/tor.git
+cd tor
+git checkout tor-0.4.1.5
+```
+Above, we use the tag for tor-0.4.1.5, the version in which the circuit padding
+framework was released. Note that this version of the framework is missing many
+features and fixes that have since been merged into master. If you need the
+newest framework features, you should use master instead.
+
+```bash
+sh autogen.sh
+./configure
+make
+```
+When you run `./configure`, you'll be told about missing dependencies and which
+packages to install on Debian-based distributions. Important: if you plan to run
+`tor` on a relay as part of the real Tor network and your server runs a
+distribution that uses systemd, then I'd recommend that you `apt install dpkg
+dpkg-dev libevent-dev libssl-dev asciidoc quilt dh-apparmor libseccomp-dev
+dh-systemd libsystemd-dev pkg-config dh-autoreconf libfakeroot zlib1g zlib1g-dev
+automake liblzma-dev libzstd-dev` and ensure that tor has systemd support
+enabled: `./configure --enable-systemd`. Without this, on a recent Ubuntu, my
+tor service was forcefully restarted (via SIGINT) by systemd every five minutes.
+
+If you want to install on your local system, run `make install`. In our case,
+we just want the tor binary at `src/app/tor`.
+
+## Use tor in TB and at a relay
+Download and install a fresh Tor Browser from torproject.org. Make sure it
+works. From the command line, relative to the folder created when you extracted
+TB, run `./Browser/start-tor-browser --verbose` to get some basic log output.
+Note the version of tor; in my case, it is `Tor 0.4.0.5 (git-bf071e34aa26e096)`
+as part of TB 8.5.4. Shut down TB and replace `Browser/TorBrowser/Tor/tor` with
+the `tor` binary that you compiled earlier. Start TB from the command line
+again; you should see a different version, in my case
+`Tor 0.4.1.5 (git-439ca48989ece545)`.
+
+The relay we run is also on Linux, and `tor` is located at `/usr/bin/tor`. To
+view relevant logs since the last boot, run `sudo journalctl -b /usr/bin/tor`;
+there we find `Tor 0.4.0.5 running on Linux`. Copy the locally compiled `tor`
+to a temporary location on the relay, and make sure its ownership and access
+rights are identical to those of `/usr/bin/tor`. Next, stop the running tor
+service with `sudo service tor stop`, wait for it to stop (typically 30s),
+replace `/usr/bin/tor` with our locally compiled tor, and then start the service
+again. Checking the logs, we see `Tor 0.4.1.5 (git-439ca48989ece545)`.
+
+Repeatedly shutting down a relay is detrimental to the network and should be
+avoided. Sorry about that.
+
+We have one more step left before we move on to the machine: configure TB to
+always use our middle relay. Edit `Browser/TorBrowser/Data/Tor/torrc` and set
+`MiddleNodes <fingerprint>`, where `<fingerprint>` is the fingerprint of the
+relay. Start TB, visit a website, and manually confirm that our relay is used as
+the middle by looking at the circuit display.
+
+## Add a bare-bones APE padding machine
+Now the fun part. We have several resources at our disposal (note that links
+may break in the future; if they do, search for the headings):
+- The official [Circuit Padding Developer
+  Documentation](https://storm.torproject.org/shared/ChieH_sLU93313A2gopZYT3x2waJ41hz5Hn2uG1Uuh7).
+- Notes we made on the [implementation of the circuit padding
+  framework](https://github.com/pylls/padding-machines-for-tor/blob/master/notes/circuit-padding-framework.md).
+- The implementation of the current circuit padding machines in tor:
+  [circuitpadding_machines.c](https://gitweb.torproject.org/tor.git/tree/src/core/or/circuitpadding_machines.c)
+  and
+  [circuitpadding_machines.h](https://gitweb.torproject.org/tor.git/tree/src/core/or/circuitpadding_machines.h).
+
+Please consult the above links for details. Moving forward, the focus is on
+describing what was done, not necessarily on explaining why in every detail.
+
+Since we plan to make changes to tor, create a new branch with `git checkout -b
+circuit-padding-ape-machine tor-0.4.1.5`.
+
+We start by declaring two functions, one for the client-side machine and one
+for the relay-side machine, in `circuitpadding_machines.h`:
+
+```c
+void circpad_machine_relay_wf_ape(smartlist_t *machines_sl);
+void circpad_machine_client_wf_ape(smartlist_t *machines_sl);
+```
+
+The definitions go into `circuitpadding_machines.c`:
+
+```c
+/**************** Adaptive Padding Early (APE) machine ****************/
+
+/**
+ * Create a relay-side padding machine based on the APE design.
+ */
+void
+circpad_machine_relay_wf_ape(smartlist_t *machines_sl)
+{
+  circpad_machine_spec_t *relay_machine
+      = tor_malloc_zero(sizeof(circpad_machine_spec_t));
+
+  relay_machine->name = "relay_wf_ape";
+  relay_machine->is_origin_side = 0; // relay-side
+
+  // Pad to/from the middle relay, only when the circuit has streams
+  relay_machine->target_hopnum = 2;
+  relay_machine->conditions.min_hops = 2;
+  relay_machine->conditions.state_mask = CIRCPAD_CIRC_STREAMS;
+
+  // limits to help guard against excessive padding
+  relay_machine->allowed_padding_count = 1;
+  relay_machine->max_padding_percent = 1;
+
+  // one state to start with: START (-> END, never takes a slot in states)
+  circpad_machine_states_init(relay_machine, 1);
+  relay_machine->states[CIRCPAD_STATE_START].
+      next_state[CIRCPAD_EVENT_NONPADDING_SENT] =
+      CIRCPAD_STATE_END;
+
+  // register the machine
+  relay_machine->machine_num = smartlist_len(machines_sl);
+  circpad_register_padding_machine(relay_machine, machines_sl);
+
+  log_info(LD_CIRC,
+           "Registered relay WF APE padding machine (%u)",
+           relay_machine->machine_num);
+}
+
+/**
+ * Create a client-side padding machine based on the APE design.
+ */
+void
+circpad_machine_client_wf_ape(smartlist_t *machines_sl)
+{
+  circpad_machine_spec_t *client_machine
+      = tor_malloc_zero(sizeof(circpad_machine_spec_t));
+
+  client_machine->name = "client_wf_ape";
+  client_machine->is_origin_side = 1; // client-side
+
+  /** Pad to/from the middle relay, only when the circuit has streams, and only
+   * for general purpose circuits (typical for web browsing)
+   */
+  client_machine->target_hopnum = 2;
+  client_machine->conditions.min_hops = 2;
+  client_machine->conditions.state_mask = CIRCPAD_CIRC_STREAMS;
+  client_machine->conditions.purpose_mask =
+      circpad_circ_purpose_to_mask(CIRCUIT_PURPOSE_C_GENERAL);
+
+  // limits to help guard against excessive padding
+  client_machine->allowed_padding_count = 1;
+  client_machine->max_padding_percent = 1;
+
+  // one state to start with: START (-> END, never takes a slot in states)
+  circpad_machine_states_init(client_machine, 1);
+  client_machine->states[CIRCPAD_STATE_START].
+      next_state[CIRCPAD_EVENT_NONPADDING_SENT] =
+      CIRCPAD_STATE_END;
+
+  client_machine->machine_num = smartlist_len(machines_sl);
+  circpad_register_padding_machine(client_machine, machines_sl);
+  log_info(LD_CIRC,
+           "Registered client WF APE padding machine (%u)",
+           client_machine->machine_num);
+}
+```
+
+We also have to modify `circpad_machines_init()` in `circuitpadding.c` to
+register our machines:
+
+```c
+  /* Register machines for the APE WF defense */
+  circpad_machine_client_wf_ape(origin_padding_machines);
+  circpad_machine_relay_wf_ape(relay_padding_machines);
+```
+
+We run `make` to get a new `tor` binary and copy it to our local TB.
+
+## Run the machine
+To view circuit info events in the console as we launch TB, we add
+`Log [circ]info notice stdout` to TB's `torrc`.
+
+Running TB to visit example.com, we first find in the log:
+
+```
+Aug 30 18:36:43.000 [info] circpad_machine_client_hide_intro_circuits(): Registered client intro point hiding padding machine (0)
+Aug 30 18:36:43.000 [info] circpad_machine_relay_hide_intro_circuits(): Registered relay intro circuit hiding padding machine (0)
+Aug 30 18:36:43.000 [info] circpad_machine_client_hide_rend_circuits(): Registered client rendezvous circuit hiding padding machine (1)
+Aug 30 18:36:43.000 [info] circpad_machine_relay_hide_rend_circuits(): Registered relay rendezvous circuit hiding padding machine (1)
+Aug 30 18:36:43.000 [info] circpad_machine_client_wf_ape(): Registered client WF APE padding machine (2)
+Aug 30 18:36:43.000 [info] circpad_machine_relay_wf_ape(): Registered relay WF APE padding machine (2)
+```
+
+All good: our machines are registered. Looking further, we find:
+
+```
+Aug 30 18:36:55.000 [info] circpad_setup_machine_on_circ(): Registering machine client_wf_ape to origin circ 2 (5)
+Aug 30 18:36:55.000 [info] circpad_node_supports_padding(): Checking padding: supported
+Aug 30 18:36:55.000 [info] circpad_negotiate_padding(): Negotiating padding on circuit 2 (5), command 2
+Aug 30 18:36:55.000 [info] circpad_machine_spec_transition(): Circuit 2 circpad machine 0 transitioning from 0 to 65535
+Aug 30 18:36:55.000 [info] circpad_machine_spec_transitioned_to_end(): Padding machine in end state on circuit 2 (5)
+Aug 30 18:36:55.000 [info] circpad_circuit_machineinfo_free_idx(): Freeing padding info idx 0 on circuit 2 (5)
+Aug 30 18:36:55.000 [info] circpad_handle_padding_negotiated(): Middle node did not accept our padding request on circuit 2 (5)
+```
+We see that our middle relay supports padding (since we upgraded it to
+tor-0.4.1.5), that we attempt to negotiate, and that our machine starts on the
+client, transitions to the end state, and is freed. The last line shows that
+the middle doesn't have a padding machine that can run.
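
The `transitioning from 0 to 65535` line is the bare-bones machine moving from its START state straight to END. The toy C model below reproduces that single transition. It assumes, based on our reading of `circuitpadding.h`, that state numbers are `uint16_t` and that `CIRCPAD_STATE_END` equals `UINT16_MAX`; everything prefixed `toy_`/`TOY_` is hypothetical and not part of tor's API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to mirror circuitpadding.h: state numbers are 16-bit, START is
 * state 0, and END is the largest representable state number. */
typedef uint16_t circpad_statenum_t;
#define CIRCPAD_STATENUM_MAX (UINT16_MAX)
#define CIRCPAD_STATE_START 0
#define CIRCPAD_STATE_END CIRCPAD_STATENUM_MAX

/* Hypothetical marker meaning "this event does not change the state". */
#define TOY_STATE_STAY (CIRCPAD_STATENUM_MAX - 1)

/* Hypothetical, trimmed-down event list; tor's circpad_event_t has more. */
typedef enum {
  TOY_EVENT_NONPADDING_RECV = 0,
  TOY_EVENT_NONPADDING_SENT,
  TOY_EVENT_NUM
} toy_event_t;

/* A toy state holds only its transition table, indexed by event. */
typedef struct {
  circpad_statenum_t next_state[TOY_EVENT_NUM];
} toy_state_t;

/* The bare-bones machine: one START state that moves to END as soon as a
 * non-padding cell is sent, and ignores received cells. */
static const toy_state_t bare_bones_states[] = {
  [CIRCPAD_STATE_START] = {
    .next_state = {
      [TOY_EVENT_NONPADDING_RECV] = TOY_STATE_STAY,
      [TOY_EVENT_NONPADDING_SENT] = CIRCPAD_STATE_END,
    },
  },
};

#define BARE_BONES_NUM_STATES \
  (sizeof(bare_bones_states) / sizeof(bare_bones_states[0]))

/* Apply one event to a machine in state `current` and return the new state,
 * printing a line shaped like tor's info-level transition log. END is a
 * sentinel, not a slot in bare_bones_states, so it is never a current state. */
static circpad_statenum_t
toy_transition(circpad_statenum_t current, toy_event_t event)
{
  assert((size_t)current < BARE_BONES_NUM_STATES);
  assert(event < TOY_EVENT_NUM);

  circpad_statenum_t next = bare_bones_states[current].next_state[event];
  if (next == TOY_STATE_STAY)
    return current;

  printf("transitioning from %u to %u\n", (unsigned)current, (unsigned)next);
  return next;
}
```

Calling `toy_transition(CIRCPAD_STATE_START, TOY_EVENT_NONPADDING_SENT)` prints `transitioning from 0 to 65535`, the same pair of numbers as in the log above. In tor, reaching END is what leads to the following `transitioned_to_end` and `Freeing padding info` lines; the toy model stops at the transition itself.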
+
+Next, we follow the same steps as earlier and replace the `tor` binary at our
+middle relay with the modified one. We don't update the logging there, to avoid
+logging at the info level on the live network. Looking at the client log again,
+we see that negotiation works as before except for the last line: it's missing,
+so the machine is running at the middle as well.
+
+## Implementing the APE state machine
+
+Porting is fairly straightforward: define the states for all machines, add two
+more machines (for the receive portion of WTF-PAD, beyond AP), and pick
+reasonable parameters for the distributions (I completely winged it, as I did
+when implementing APE). The [circuit-padding-ape-machine
+branch](https://github.com/pylls/tor/tree/circuit-padding-ape-machine) contains
+the commits for the full machines with plenty of comments.
+
+Some comments on the process:
+
+- `tor-0.4.1.5` does not support two machines on the same circuit; the fix from
+  [ticket 31111](https://trac.torproject.org/projects/tor/ticket/31111) has to
+  be applied. The good news is that everything else seems to work after the
+  small change in the fix.
+- APE randomizes its distributions. Currently, this can only be done at `tor`
+  startup. This makes sense in the censorship circumvention setting (`obfs4`),
+  less so for WF defenses: further randomizing each circuit is likely a PITA for
+  attackers, with few downsides.
+- It was annoying to figure out that the lack of systemd support in my compiled
+  tor caused systemd to interrupt (SIGINT) my tor process at the middle relay
+  every five minutes. I updated the build steps above to hopefully save others
+  the pain.
+- There is definitely some bug on relays when sending padding cells too early
+  (?). It can happen with some probability with the APE implementation due to
+  `circpad_machine_relay_wf_ape_send()`. I will investigate this next.
+- Moving the registration of machines from the definition of the machines to + `circpad_machines_init()` makes sense, as suggested in the circuit padding doc + draft. + +Remember that APE is just a proof-of-concept and we make zero claims about its +ability to withstand WF attacks, in particular those based on deep learning.
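
The list above notes that APE can only randomize its distributions at `tor` startup, and that per-circuit randomization would likely be a PITA for attackers. The following C sketch is purely hypothetical and is not how the framework works today: `toy_distribution_t` is a stand-in we made up with a type and two parameters (loosely modeled on the shape we believe padding-machine distributions have), and `toy_randomize_for_circuit()` scales both parameters by a random factor derived from a per-circuit seed. We use splitmix64 only to keep the example self-contained; code inside tor should use tor's own RNG.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-parameter distribution; not tor's actual definition. */
typedef struct {
  int type;      /* distribution family; left untouched by randomization */
  double param1;
  double param2;
} toy_distribution_t;

/* splitmix64: a small, well-known, non-cryptographic PRNG step. */
static uint64_t
toy_splitmix64(uint64_t *state)
{
  uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

/* Uniform double in [0, 1) built from the top 53 bits of a 64-bit draw. */
static double
toy_uniform01(uint64_t *state)
{
  return (double)(toy_splitmix64(state) >> 11) / 9007199254740992.0; /* 2^53 */
}

/* Return a per-circuit copy of `base` whose parameters are each scaled by an
 * independent factor drawn (up to rounding) uniformly from
 * [1 - jitter, 1 + jitter]. The same circ_seed always yields the same copy. */
static toy_distribution_t
toy_randomize_for_circuit(toy_distribution_t base, uint64_t circ_seed,
                          double jitter)
{
  assert(jitter >= 0.0 && jitter < 1.0);

  uint64_t state = circ_seed;
  toy_distribution_t out = base;
  out.param1 *= 1.0 - jitter + 2.0 * jitter * toy_uniform01(&state);
  out.param2 *= 1.0 - jitter + 2.0 * jitter * toy_uniform01(&state);
  return out;
}
```

Seeding with a value drawn fresh when each circuit is set up would give every circuit slightly different padding timing while keeping each machine's overall shape; the `jitter` bound limits how far any single circuit can drift from the tuned parameters.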