# Circuit Padding Developer Documentation

This document is written for researchers who wish to prototype and evaluate circuit-level padding defenses in Tor.

Written by Mike Perry and George Kadianakis.

# Table of Contents

- [0. Background](#0-background)
- [1. Introduction](#1-introduction)
  - [1.1. System Overview](#11-system-overview)
  - [1.2. Layering Model](#12-layering-model)
  - [1.3. Computation Model](#13-computation-model)
  - [1.4. Deployment Constraints](#14-other-deployment-constraints)
- [2. Creating New Padding Machines](#2-creating-new-padding-machines)
  - [2.1. Registering a New Padding Machine](#21-registering-a-new-padding-machine)
  - [2.2. Machine Activation and Shutdown](#22-machine-activation-and-shutdown)
- [3. Specifying Padding Machines](#3-specifying-padding-machines)
  - [3.1. Padding Machine States](#31-padding-machine-states)
  - [3.2. Padding Machine State Transitions](#32-padding-machine-state-transitions)
  - [3.3. Specifying Per-State Padding](#33-specifying-per-state-padding)
  - [3.4. Specifying Precise Cell Counts](#34-specifying-precise-cell-counts)
  - [3.5. Specifying Overhead Limits](#35-specifying-overhead-limits)
- [4. Evaluating Padding Machines](#4-evaluating-padding-machines)
  - [4.1. Pure Simulation](#41-pure-simulation)
  - [4.2. Testing in Chutney](#42-testing-in-chutney)
  - [4.3. Testing in Shadow](#43-testing-in-shadow)
  - [4.4. Testing on the Live Network](#44-testing-on-the-live-network)
- [5. Example Padding Machines](#5-example-padding-machines)
  - [5.1. Deployed Circuit Setup Machines](#51-deployed-circuit-setup-machines)
  - [5.2. Adaptive Padding Early](#52-adaptive-padding-early)
  - [5.3. Sketch of Tamaraw](#53-sketch-of-tamaraw)
  - [5.4. Other Padding Machines](#54-other-padding-machines)
- [6. Framework Implementation Details](#6-framework-implementation-details)
  - [6.1. Memory Allocation Conventions](#61-memory-allocation-conventions)
  - [6.2. Machine Application Events](#62-machine-application-events)
  - [6.3. Internal Machine Events](#63-internal-machine-events)
- [7. Future Features and Optimizations](#7-future-features-and-optimizations)
  - [7.1. Load Balancing and Flow Control](#71-load-balancing-and-flow-control)
  - [7.2. Timing and Queuing Optimizations](#72-timing-and-queuing-optimizations)
  - [7.3. Better Machine Negotiation](#73-better-machine-negotiation)
  - [7.4. Probabilistic State Transitions](#74-probabilistic-state-transitions)
  - [7.5. More Complex Pattern Recognition](#75-more-complex-pattern-recognition)
- [8. Open Research Problems](#8-open-research-problems)
  - [8.1. Onion Service Circuit Setup](#81-onion-service-circuit-setup)
  - [8.2. Onion Service Fingerprinting](#82-onion-service-fingerprinting)
  - [8.3. Open World Fingerprinting](#83-open-world-fingerprinting)
  - [8.4. Protocol Usage Fingerprinting](#84-protocol-usage-fingerprinting)
  - [8.5. Datagram Transport Side Channels](#85-datagram-transport-side-channels)
- [9. Must Read Papers](#9-must-read-papers)

## 0. Background

Tor supports both connection-level and circuit-level padding, and both systems are live on the network today. The connection-level padding behavior is described in [section 2 of padding-spec.txt](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L47). The circuit-level padding behavior is described in [section 3 of padding-spec.txt](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L282).

These two systems are orthogonal and should not be confused. The connection-level padding system is only active while the TLS connection is otherwise idle. Moreover, it regards circuit-level padding as normal data traffic, and hence while the circuit-level padding system is actively padding, the connection-level padding system will not add any additional overhead.

While the currently deployed circuit-level padding behavior is quite simple, it is built on a flexible framework. This framework supports the description of event-driven finite state machines by filling in fields of a simple C structure, and is designed to support any delay-free statistically shaped cover traffic on individual circuits, with cover traffic flowing to and from a node of the implementor's choice (Guard, Middle, Exit, Rendezvous, etc).

This class of system was first proposed in [Timing analysis in low-latency mix networks: attacks and defenses](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf) by Shmatikov and Wang, and extended for the website traffic fingerprinting domain by Juarez et al. in [Toward an Efficient Website Fingerprinting Defense](http://arxiv.org/pdf/1512.00524). The framework also supports fixed parameterized probability distributions, as used in [APE](https://www.cs.kau.se/pulls/hot/thebasketcase-ape/) by Tobias Pulls, and many other features.

This document describes how to use Tor's circuit padding framework to implement and deploy novel delay-free cover traffic defenses.

## 1. Introduction

The circuit padding framework is the official way to implement padding defenses in Tor. It may be used in combination with application-layer defenses, and/or obfuscation defenses, or on its own.

Its current design should be enough to deploy most defenses without modification, but you can extend it to [provide new features](#7-future-features-and-optimizations) as well.

### 1.1. System Overview

Circuit-level padding can occur between Tor clients and relays at any hop of one of the client's circuits. Both parties need to support the same padding mechanisms for the system to work, and the client must enable it.

We added a padding negotiation relay cell to the Tor protocol that clients use to ask a relay to start padding, as well as a torrc directive for researchers to pin their clients' relay selection to the subset of Tor nodes that implement their custom defenses, to support ethical live network testing and evaluation.

Circuit-level padding is performed by 'padding machines'. A padding machine is a finite state machine. Every state specifies a different style of padding, or stage of padding, in terms of inter-packet timings and total packet counts.

Padding state machines are specified by filling in fields of a C structure, which specifies the transitions between padding states based on various events, probability distributions of inter-packet delays, and the conditions under which padding machines should be applied to circuits.

This compact C structure representation is designed to function as a microlanguage, which can be compiled down into a bitstring that [can be tuned](#13-computation-model) using various optimization methods (such as gradient descent, GAs, or GANs), either in bitstring form or C struct form.

The event-driven, self-contained nature of this framework is also designed to make [evaluation](#4-evaluating-padding-machines) both expedient and rigorously reproducible.

This document covers the engineering steps to write, test, and deploy a padding machine, as well as how to extend the framework to support new machine features.

If you prefer to learn by example, you may want to skip to either the [QuickStart Guide](CircuitPaddingQuickStart.md), and/or [Section 5](#5-example-padding-machines) for example machines to get you up and running quickly.

### 1.2. Layering Model

The circuit padding framework is designed to provide one layer in a layered system of interchangeable components.

The circuit padding framework operates at the Tor circuit layer. It only deals with the inter-cell timings and quantity of cells sent on a circuit. It can insert cells on a circuit in arbitrary patterns, and in response to arbitrary conditions, but it cannot delay cells. It also does not deal with packet sizes, how cells are packed into TLS records, or ways that the Tor protocol might be recognized on the wire.

The problem of differentiating Tor traffic from non-Tor traffic based on TCP/TLS packet sizes, initial handshake patterns, and DPI characteristics is the domain of [pluggable transports](https://trac.torproject.org/projects/tor/wiki/doc/AChildsGardenOfPluggableTransports), which may optionally be used in conjunction with this framework (or without it).

This document focuses primarily on the circuit padding framework's cover traffic features, and will only briefly touch on the potential obfuscation and application layer coupling points of the framework. Explicit layer coupling points can be created by adding either new [machine application events](#62-machine-application-events) or new [internal machine events](#63-internal-machine-events) to the circuit padding framework, so that your padding machines can react to events from other layers.

### 1.3. Computation Model

The circuit padding framework is designed to support succinctly specified defenses that can be tuned through [computer-assisted optimization](#4-evaluating-padding-machines).

We chose to generalize the original [Adaptive Padding 2-state design](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf) into an event-driven state machine because state machines are the simplest form of sequence recognition devices from [automata theory](https://en.wikipedia.org/wiki/Finite-state_machine).

Most importantly: this framing allows cover traffic defenses to be modeled as an optimization problem search space, expressed as fields of a C structure (which is simultaneously a compact opaque bitstring as well as a symbolic vector in an abstract feature space). This kind of space is particularly well suited to search by gradient descent, GAs, and GANs.

When performing this optimization search, each padding machine should have a fitness function, which will allow two padding machines to be compared for relative effectiveness. Optimization searches work best if this fitness can be represented as a single number, for example the total amount by which it reduces the [Balanced Accuracy](https://en.wikipedia.org/wiki/Precision_and_recall#Imbalanced_Data) of an adversary's classifier, divided by an amount of traffic overhead.

Before you begin the optimization phase for your defense, you should also carefully consider the [features and optimizations](#7-future-features-and-optimizations) that we suspect will be useful, and also see if you can come up with any more. You should similarly be sure to restrict your search space to avoid areas of the bitstring/feature vector that you are sure you will not need. For example, some [applications](#8-open-research-problems) may not need the histogram accounting used by Adaptive Padding, but might need to add other forms of [pattern recognition](#75-more-complex-pattern-recognition) to react to sequences that resemble HTTP GET and HTTP POST.

### 1.4. Other Deployment Constraints

The framework has some limitations that are the result of deliberate choices. We are unlikely to deploy defenses that ignore these limitations.

In particular, we have deliberately not provided any mechanism to delay actual user traffic, even though we are keenly aware that if we were to support additional delay, defenses would be able to have [more success with less bandwidth overhead](https://freedom.cs.purdue.edu/anonymity/trilemma/index.html).

In the website traffic fingerprinting domain, [provably optimal defenses](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf) achieve their bandwidth overhead bounds by ensuring that a non-empty queue is maintained, by rate limiting traffic below the actual throughput of a circuit. For optimal results, this queue must avoid draining to empty, and yet it must also be drained fast enough to avoid tremendous queue overhead in fast Tor relays, which carry hundreds of thousands of circuits simultaneously.

Unfortunately, Tor's end-to-end flow control is not congestion control. Its window sizes are currently fixed. This means there is no signal when queuing occurs, and thus no ability to limit queue size through pushback. This means there is currently no way to do the fine-grained queue management necessary to create such a queue and rate limit traffic effectively enough to keep this queue from draining to empty, without also risking that aggregate queuing would cause out-of-memory conditions on fast relays.

It may be possible to create a congestion control algorithm that can support such fine grained queue management, but this is a [deeply unsolved area of research](https://lists.torproject.org/pipermail/tor-dev/2018-November/013562.html).

Even beyond these major technical hurdles, additional latency is also unappealing to the wider Internet community, for the simple reason that bandwidth [continues to increase exponentially](https://ipcarrier.blogspot.com/2014/02/bandwidth-growth-nearly-what-one-would.html) whereas the speed of light is fixed. Significant engineering effort has been devoted to optimizations that reduce the effect of latency on Internet protocols. To go against this trend would ensure our irrelevance to the wider conversation about traffic analysis defenses for low latency Internet protocols.

On the other hand, through [load balancing](https://gitweb.torproject.org/torspec.git/tree/proposals/265-load-balancing-with-overhead.txt) and [circuit multiplexing strategies](https://bugs.torproject.org/29494), we believe it is possible to add significant bandwidth overhead in the form of cover traffic, without significantly impacting end-user performance.

For these reasons, we believe the trade-off should be in favor of adding more cover traffic, rather than imposing queuing memory overhead and queuing delay.

As a last resort for narrowly scoped application domains (such as shaping Tor service-side onion service traffic to look like other websites or different application-layer protocols), delay *may* be added at the [application layer](https://petsymposium.org/2017/papers/issue2/paper54-2017-2-source.pdf). Any additional cover traffic required by such defenses should still be added at the circuit padding layer using this framework, to provide engineering efficiency through loose layer coupling and component re-use, as well as to provide additional gains against [low resolution](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L47) end-to-end traffic correlation.

Because such delay-based defenses will impact performance significantly more than simply adding cover traffic, they must be optional, and negotiated by only specific application layer endpoints that want them. This will have consequences for anonymity sets and base rates, if such traffic shaping and additional cover traffic is not very carefully constructed.

In terms of acceptable overhead, because Tor onion services [currently use](https://metrics.torproject.org/hidserv-rend-relayed-cells.html) less than 1% of the [total consumed bandwidth](https://metrics.torproject.org/bandwidth-flags.html) of the Tor network, and because onion services exist to provide higher security as compared to Tor Exit traffic, they are an attractive target for higher-overhead defenses. We encourage researchers to target this use case for defenses that require more overhead, and/or for the deployment of optional negotiated application-layer delays on either the server or the client side.

## 2. Creating New Padding Machines

This section explains how to use the existing mechanisms in Tor to define a new circuit padding machine. We assume here that you know C, and are at least somewhat familiar with Tor development. For more information on Tor development in general, see the other files in doc/HACKING/ in a recent Tor distribution.

Again, if you prefer to learn by example, you may want to skip to either the [QuickStart Guide](CircuitPaddingQuickStart.md), and/or [Section 5](#5-example-padding-machines) for example machines to get up and running quickly.

To create a new padding machine, you must:

1. Define your machine using the fields of a heap-allocated `circpad_machine_spec_t` C structure.

2. Register this object in the global list of available padding machines, using `circpad_register_padding_machine()`.

3. Ensure that your machine is properly negotiated under your desired circuit conditions.

### 2.1. Registering a New Padding Machine

Again, a circuit padding machine is designed to be specified entirely as a [single C structure](#13-computation-model).

Your machine definitions should go into their own functions in [circuitpadding_machines.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c). For details on all of the fields involved in specifying a padding machine, see [Section 3](#3-specifying-padding-machines).

You must register your machine in `circpad_machines_init()` in [circuitpadding.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.c). To add a new padding machine specification, you must allocate a `circpad_machine_spec_t` on the heap with `tor_malloc_zero()`, give it a human readable name string and a machine number equivalent to the number of machines in the list, and register the structure using `circpad_register_padding_machine()`.

Each machine must have a client instance and a relay instance. Register your client-side machine instance in the `origin_padding_machines` list, and your relay-side machine instance in the `relay_padding_machines` list. Once you have registered your instance, you do not need to worry about deallocation; this is handled for you automatically.
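
As a concrete illustration of the steps above, here is a minimal, hypothetical registration sketch. The function name `circpad_machine_example()` and the specific field values are made up for this example; the fields and helpers used (`name`, `is_origin_side`, `target_hopnum`, `machine_num`, `tor_malloc_zero()`, `smartlist_len()`, `circpad_register_padding_machine()`) follow the conventions described in this document, but verify the exact names against the `circpad_machine_spec_t` definition in your Tor version.

```c
/* Hypothetical client-side machine definition, following the registration
 * convention described above. Would live in circuitpadding_machines.c. */
static void
circpad_machine_example(smartlist_t *machines_sl)
{
  /* Allocate the global, immutable machine specification on the heap. */
  circpad_machine_spec_t *machine =
    tor_malloc_zero(sizeof(circpad_machine_spec_t));

  /* Human-readable name, used in logs and tooling. */
  machine->name = "example_client_machine";

  /* This instance runs on the client (origin) side and pads to hop 2. */
  machine->is_origin_side = 1;
  machine->target_hopnum = 2;

  /* States and transitions are covered in Section 3. */

  /* The machine number must equal the current number of machines in the
   * list it is registered into. */
  machine->machine_num = smartlist_len(machines_sl);

  /* Hand ownership to the framework; it frees the machine at exit. */
  circpad_register_padding_machine(machine, machines_sl);
}

/* Called from circpad_machines_init() in circuitpadding.c, e.g.:
 *   circpad_machine_example(origin_padding_machines);
 * A corresponding relay-side instance (is_origin_side = 0) would be
 * registered into relay_padding_machines. */
```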

Both machine lists use registration order to signal machine precedence for a given `machine_idx` slot on a circuit. This means that machines that are registered last are checked for activation *before* machines that are registered first. (This reverse precedence ordering allows us to deprecate older machines simply by adding new ones after them.)

### 2.2. Machine Activation and Shutdown

After a machine has been successfully registered with the framework, it will be instantiated on any client-side circuits that support it. Only client-side circuits may initiate padding machines, but either clients or relays may shut down padding machines.

#### 2.2.1. Machine Application Conditions

The [circpad_machine_conditions_t conditions field](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L641) of your `circpad_machine_spec_t` machine definition instance controls the conditions under which your machine will be attached and enabled on a Tor circuit, and when it gets shut down.

*All* of your explicitly specified conditions in `circpad_machine_spec_t.conditions` *must* be met for the machine to be applied to a circuit. If *any* condition ceases to be met, then the machine is shut down. (This is checked on every event that arrives, even if the condition is unrelated to the event.) Another way to look at this is that all specified conditions must evaluate to true for the entire duration that your machine is running. If any are false, your machine does not run (or stops running and shuts down).

In particular, as part of the [circpad_machine_conditions_t structure](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L149), the circuit padding subsystem gives the developer the option to enable a machine based on (see the sketch after this list):

- The [length](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L157) of the circuit (via the `min_hops` field).
- The [current state](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L174) of the circuit, such as streams, relay_early, etc. (via the `circpad_circuit_state_t state_mask` field).
- The [purpose](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L178) (i.e. type) of the circuit (via the `circpad_purpose_mask_t purpose_mask` field).
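
As a rough sketch of these fields (reusing the hypothetical `machine` pointer from the Section 2.1 example): the mask constants below follow the naming conventions of `circuitpadding.h` at the commit linked above, and exact condition field names can differ between Tor versions, so treat this as illustrative only.

```c
/* Hypothetical conditions: only pad circuits that are at least two hops
 * long, currently have streams attached, and have any purpose. */
machine->conditions.min_hops = 2;

/* Circuit state bitmask (circpad_circuit_state_t). */
machine->conditions.state_mask = CIRCPAD_CIRC_STREAMS;

/* Circuit purpose bitmask (circpad_purpose_mask_t). CIRCPAD_PURPOSE_ALL
 * matches every purpose; a single purpose can be selected with
 * circpad_circ_purpose_to_mask(CIRCUIT_PURPOSE_C_GENERAL). */
machine->conditions.purpose_mask = CIRCPAD_PURPOSE_ALL;
```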

This condition mechanism is the preferred way to determine if a machine should apply to a circuit. For information about potentially useful conditions that we have considered but have not yet implemented, see [Section 7.3](#73-better-machine-negotiation). We will happily accept patches for those conditions, or for any other additional conditions that are needed for your use case.

#### 2.2.2. Detecting and Negotiating Machine Support

When a new machine specification is added to Tor (or removed from Tor), you should bump the Padding subprotocol version in `src/core/or/protover.c` and `src/rust/protover/protover.rs`, add a field to `protover_summary_flags_t` in `or.h`, and set this field in `memoize_protover_summary()` in `versions.c`. This new field must then be checked in `circpad_node_supports_padding()` in `circuitpadding.c`.

Note that this protocol version update and associated support check is not necessary if your experiments will *only* be using your own relays that support your own padding machines. This can be accomplished by using the `MiddleNodes` directive; see [Section 4](#4-evaluating-padding-machines) for more information.

If the protocol support check passes for the circuit, then the client sends a `RELAY_COMMAND_PADDING_NEGOTIATE` cell towards the `circpad_machine_spec_t.target_hop` relay, immediately enables the padding machine, and may begin sending padding. (The framework does not wait for the `RELAY_COMMAND_PADDING_NEGOTIATED` response to begin padding, so that we can switch between machines rapidly.)

#### 2.2.3. Machine Shutdown Mechanisms

Padding machines can be shut down on a circuit in three main ways:

1. During a `circpad_machine_event` callback, when `circpad_machine_spec_t.conditions` no longer applies (client side).
2. After a transition to the `CIRCPAD_STATE_END` state, if `circpad_machine_spec_t.should_negotiate_end` is set (client or relay side).
3. If there is a `RELAY_COMMAND_PADDING_NEGOTIATED` error response from the relay during negotiation.

Each of these cases causes the originating node to send a relay cell towards the other side, indicating that shutdown has occurred. The client side sends `RELAY_COMMAND_PADDING_NEGOTIATE`, and the relay side sends `RELAY_COMMAND_PADDING_NEGOTIATED`.

Because padding from malicious exit nodes can be used to construct active timing-based side channels to malicious guard nodes, the client checks that padding-related cells only come from relays with active padding machines. For this reason, when a client decides to shut down a padding machine, the framework frees the mutable `circuit_t.padding_info`, but leaves the `circuit_t.padding_machine` pointer set until the `RELAY_COMMAND_PADDING_NEGOTIATED` response comes back, to ensure that any remaining in-flight padding packets are recognized as valid. Tor does not yet close circuits due to violation of this property, but the [vanguards addon component "bandguard"](https://github.com/mikeperry-tor/vanguards/blob/master/README_TECHNICAL.md#the-bandguards-subsystem) does.

As an optimization, a client may replace a machine with another, by sending a `RELAY_COMMAND_PADDING_NEGOTIATE` cell to shut down a machine, and immediately sending a `RELAY_COMMAND_PADDING_NEGOTIATE` to start a new machine in the same index, without waiting for the response from the first negotiate cell.

Unfortunately, there is a known bug as a consequence of this optimization. If your machine depends on repeated shutdown and restart of the same machine number on the same circuit, please see [Bug 30992](https://bugs.torproject.org/30992). Depending on your use case, we may need to fix that bug or help you find a workaround. See also [Section 6.1.3](#613-deallocation-and-shutdown) for some more technical details on this mechanism.

## 3. Specifying Padding Machines

By now, you should understand how to register, negotiate, and control the lifetime of your padding machine, but you still don't know how to make it do anything yet. This section should help you understand how to specify how your machine reacts to events and adds padding to the wire.

If you prefer to learn by example first instead, you may wish to skip to [Section 5](#5-example-padding-machines).

A padding machine is specified by filling in an instance of [circpad_machine_spec_t](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L605). Instances of this structure specify the precise functionality of a machine: they are what the circuit padding developer is expected to write. These instances are created only at startup, and are referenced via `const` pointers during normal operation.

In this section we will go through the most important elements of this structure.

### 3.1. Padding Machine States

A padding machine is a finite state machine where each state specifies a different style of padding.

As an example of a simple padding machine, you could have a state machine with the following states: `[START] -> [SETUP] -> [HTTP] -> [END]` where the `[SETUP]` state pads in a way that obfuscates the ''circuit setup'' of Tor, and the `[HTTP]` state pads in a way that emulates a simple HTTP session. Of course, padding machines can be more complicated than that, with dozens of states and non-trivial transitions.

Padding developers encode the machine states in the [circpad_machine_spec_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L655). Each machine state is described by a [circpad_state_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L273) and each such structure specifies the style and amount of padding to be sent, as well as the possible state transitions.

The function `circpad_machine_states_init()` must be used for allocating and initializing the `circpad_machine_spec_t.states` array before states and state transitions can be defined, as some of the state object fields have non-zero default values.

### 3.2. Padding Machine State Transitions

As described above, padding machines can have multiple states, to support different forms of padding. Machines can transition between states based on events that occur either on the circuit level or on the machine level.

State transitions are specified using the [next_state field](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L381) of the `circpad_state_t` structure. As a simple example, to transition from state `A` to state `B` when event `E` occurs, you would use the following code: `A.next_state[E] = B`.

#### 3.2.1. State Transition Events

Here we will go through [the various events](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L30) that can be used to transition between states (a short sketch follows the list):

* Circuit-level events
  * `CIRCPAD_EVENT_NONPADDING_RECV`: A non-padding cell is received
  * `CIRCPAD_EVENT_NONPADDING_SENT`: A non-padding cell is sent
  * `CIRCPAD_EVENT_PADDING_SENT`: A padding cell is sent
  * `CIRCPAD_EVENT_PADDING_RECV`: A padding cell is received
* Machine-level events
  * `CIRCPAD_EVENT_INFINITY`: Tried to schedule padding using the ''infinity bin''
  * `CIRCPAD_EVENT_BINS_EMPTY`: All histogram bins are empty (out of tokens)
  * `CIRCPAD_EVENT_LENGTH_COUNT`: State has used all its padding capacity (see `length_dist` below)
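
Tying Sections 3.1 and 3.2 together, the sketch below sets up a hypothetical two-state machine using the events listed above (again reusing the `machine` pointer from the Section 2.1 example). The layout is purely illustrative, not a recommended design; the state and event constants follow the names used in `circuitpadding.h`.

```c
/* Illustrative only: a two-state machine that enters a BURST state when
 * this endpoint sends a non-padding cell, and ends once the state's
 * sampled padding length is exhausted. */

/* Allocate and initialize the states array (START and BURST). */
circpad_machine_states_init(machine, 2);

/* START -> BURST when we send a non-padding cell. */
machine->states[CIRCPAD_STATE_START].
  next_state[CIRCPAD_EVENT_NONPADDING_SENT] = CIRCPAD_STATE_BURST;

/* Stay in BURST while padding is being exchanged. */
machine->states[CIRCPAD_STATE_BURST].
  next_state[CIRCPAD_EVENT_PADDING_RECV] = CIRCPAD_STATE_BURST;

/* BURST -> END once the sampled length count is used up (see 3.4). */
machine->states[CIRCPAD_STATE_BURST].
  next_state[CIRCPAD_EVENT_LENGTH_COUNT] = CIRCPAD_STATE_END;
```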
### 3.3. Specifying Per-State Padding

Each state of a padding machine specifies either:

* A padding histogram describing inter-transmission delays between cells, OR
* A parameterized delay probability distribution for inter-transmission delays between cells.

Either mechanism specifies essentially the *minimum inter-transmission time* distribution. If non-padding traffic does not get transmitted from this endpoint before the delay value sampled from this distribution expires, a padding packet is sent.

The choice between histograms and probability distributions can be subtle. A rule of thumb is that probability distributions are easy to specify and consume very little memory, but might not be able to describe certain types of complex padding logic. Histograms, in contrast, can support precise packet-count oriented or multimodal delay schemes, and can use token removal logic to reduce overhead and shape the total padding+non-padding inter-packet delay distribution towards an overall target distribution.

We suggest that you start with a probability distribution if possible, and move to a histogram-based approach only if a probability distribution does not suit your needs.

#### 3.3.1. Padding Probability Distributions

The easiest, most compact way to schedule padding using a machine state is to use a probability distribution that specifies the possible delays. That can be done [using the circpad_state_t fields](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L339) `iat_dist`, `dist_max_sample_usec` and `dist_added_shift_usec`.

The Tor circuit padding framework [supports multiple types](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L214) of probability distributions, and the developer should use the [circpad_distribution_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L240) to specify them as well as the required parameters.
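
For example, a state could be given a uniform inter-arrival time distribution roughly as in the following sketch (again using the hypothetical `machine` pointer from earlier). The numbers are placeholders, and the `circpad_distribution_t` parameter semantics (assumed here to be lower/upper bounds for `CIRCPAD_DIST_UNIFORM`) should be checked against the header linked above.

```c
/* Illustrative values: in the BURST state, schedule padding after a
 * delay sampled uniformly from roughly [1000, 10000] microseconds. */
circpad_state_t *burst = &machine->states[CIRCPAD_STATE_BURST];

burst->iat_dist.type = CIRCPAD_DIST_UNIFORM;
burst->iat_dist.param1 = 1000;    /* assumed lower bound (usec) */
burst->iat_dist.param2 = 10000;   /* assumed upper bound (usec) */

/* Clamp sampled delays, and optionally shift them; both in usec. */
burst->dist_max_sample_usec = 10000;
burst->dist_added_shift_usec = 0;
```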
#### 3.3.2. Padding Histograms

A more advanced way to schedule padding is to use a ''padding histogram''. The main advantages of a histogram are that it allows you to specify distributions that are not easily parameterized in closed form, or that require specific packet counts at particular time intervals. Histograms also allow you to make use of an optional traffic minimization and shaping optimization called *token removal*, which is central to the original [Adaptive Padding](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf) concept.

If a histogram is used by a state (as opposed to a fixed parameterized distribution), then the developer must use the [histogram-related fields](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L285) of the `circpad_state_t` structure.

The width of a histogram bin specifies the range of inter-packet delay times, whereas its height specifies the amount of tokens in each bin. To sample a padding delay from a histogram, we first randomly pick a bin (weighted by the amount of tokens in each bin) and then sample a delay from within that bin by picking a uniformly random delay using the width of the bin as the range.

Each histogram also has an ''infinity bin'' as its final bin. If the ''infinity bin'' is chosen, we don't schedule any padding (i.e., we schedule padding with infinite delay). If the developer does not want infinite delay, they should not give any tokens to the ''infinity bin''.

If a token removal strategy is specified (via the `circpad_state_t.token_removal` field), then the padding machine will remove a token from the appropriate histogram bin whenever this endpoint sends *either a padding packet or a non-padding packet*. The different removal strategies govern what to do when the bin corresponding to the current inter-packet delay is empty.

Token removal is optional. It is useful if you want to do things like specify that a burst should be at least N packets long, and you only want to add padding packets if there are not enough non-padding packets. The cost of doing token removal is additional memory allocations for making per-circuit copies of your histogram that can be modified.
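
Below is a rough sketch of what a histogram-based state might look like, under the assumption that the histogram fields are those referenced above (`histogram_len`, `histogram_edges`, `histogram`, `histogram_total_tokens`, and `token_removal`); field names and bin-edge semantics should be double-checked against the `circpad_state_t` definition in your Tor version. The bin edges and token counts are arbitrary illustration values, and a state would use either this histogram mechanism or the `iat_dist` mechanism from Section 3.3.1, not both.

```c
/* Illustrative histogram: two regular bins plus the final infinity bin.
 * Bin 0 is assumed to cover [0, 1000) usec, bin 1 [1000, 10000) usec,
 * and the last bin is the infinity bin. */
circpad_state_t *st = &machine->states[CIRCPAD_STATE_BURST];

st->histogram_len = 3;              /* two real bins + infinity bin */
st->histogram_edges[0] = 0;
st->histogram_edges[1] = 1000;
st->histogram_edges[2] = 10000;

st->histogram[0] = 6;               /* tokens for short delays */
st->histogram[1] = 4;               /* tokens for longer delays */
st->histogram[2] = 0;               /* no tokens in the infinity bin */
st->histogram_total_tokens = 10;    /* must equal the sum of the bins */

/* Remove tokens from the bin closest to the observed delay (in usec). */
st->token_removal = CIRCPAD_TOKEN_REMOVAL_CLOSEST_USEC;
```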
### 3.4. Specifying Precise Cell Counts

Padding machines should be able to specify the exact amount of padding they send. For histogram-based machines this can be done using a specific amount of tokens, but another (and perhaps easier) way to do this is to use the [length_dist field](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L362) of the `circpad_state_t` structure.

The `length_dist` field is basically a probability distribution similar to the padding probability distributions, which applies to a specific machine state and specifies the amount of padding we are willing to send during that state. This value gets sampled when we transition to that state (TODO: document this in the code).
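
A minimal sketch of this field follows, using the same `circpad_distribution_t` layout as in Section 3.3.1 and the `machine` pointer from earlier. The `max_length` cap is a field we believe sits alongside `length_dist` in `circpad_state_t`; verify the exact names against your Tor version, and treat the numbers as placeholders.

```c
/* Illustrative: when entering the BURST state, sample a padding budget
 * of roughly 5 to 10 cells for that state. */
machine->states[CIRCPAD_STATE_BURST].length_dist.type = CIRCPAD_DIST_UNIFORM;
machine->states[CIRCPAD_STATE_BURST].length_dist.param1 = 5;
machine->states[CIRCPAD_STATE_BURST].length_dist.param2 = 10;

/* Assumed hard upper bound on the sampled length, as a safety cap. */
machine->states[CIRCPAD_STATE_BURST].max_length = 10;
```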
### 3.5. Specifying Overhead Limits

Separately from the length counts, it is possible to rate limit the overhead percentage of padding at both the global level across all machines, and on a per-machine basis.

At the global level, the overhead percentage of all circuit padding machines as compared to total traffic can be limited through the Tor consensus parameter `circpad_global_max_padding_pct`. This overhead is defined as the percentage of padding cells *sent* out of the sum of non-padding and padding cells *sent*, and is applied *only after* at least `circpad_global_allowed_cells` padding cells are sent by that relay or client (to allow for small bursts of pure padding on otherwise idle or freshly restarted relays). When both of these limits are hit by a relay or client, no further padding cells will be sent, until sufficient non-padding traffic is sent to cause the percentage of padding traffic to fall back below the threshold.

Additionally, each individual padding machine can rate limit itself by filling in the fields `circpad_machine_spec_t.max_padding_percent` and `circpad_machine_spec_t.allowed_padding_count`, which behave identically to the consensus parameters, but only apply to that specific machine.
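
For example, a per-machine limit using the two fields named above might look like this sketch (the numbers are placeholders, not recommendations):

```c
/* Allow up to 100 padding cells unconditionally; after that, keep this
 * machine's padding at or below 50% of the cells it has sent. */
machine->allowed_padding_count = 100;
machine->max_padding_percent = 50;
```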
## 4. Evaluating Padding Machines

One of the goals of the circuit padding framework is to provide improved evaluation and scientific reproducibility for lower cost. This includes both the [choice](#13-computation-model) of the compact C structure representation (which has an easy-to-produce bitstring representation for optimization by gradient descent, GAs, or GANs), as well as rapid prototyping and evaluation.

So far, whenever evaluation cost has been a barrier, each research group has developed their own ad-hoc packet-level simulators of various padding mechanisms for evaluating website fingerprinting attacks and defenses. The process typically involves doing a crawl of Alexa top sites over Tor, and recording the Tor cell count and timing information for each page in the trace. These traces are then fed to simulations of defenses, which output modified trace files.

Because no standardized simulation and evaluation mechanism exists, it is often hard to tell if independent implementations of various attacks and defenses are in fact true-to-form or even properly calibrated for direct comparison, and discrepancies in results across the literature suggest this is not always so.

Our preferred outcome with this framework is that machines are tuned and optimized on a tracing simulator, but that the final results come from an actual live network test of the defense. The traces from this final crawl should be preserved as artifacts to be run on the simulator and reproduced on the live network by future papers, ideally in journal venues that have an artifact preservation policy.

### 4.1. Pure Simulation

When doing initial tuning of padding machines, especially in adversarial settings, variations of a padding machine defense may need to be applied to network activity hundreds or even millions of times. The wall-clock time required to do this kind of tuning using live testing or even Shadow network emulation may often be prohibitive.

To help address this, and to better standardize results, Tobias Pulls has implemented a [circpad machine trace simulator](https://github.com/pylls/circpad-sim), which uses Tor's unit test framework to simulate applying padding machines to circuit packet traces via a combination of Tor patches and python scripts. This simulator can be used to record traces from clients, Guards, Middles, Exits, and any other hop in the path, only for circuits that are run by the researcher. This makes it possible to safely record baseline traces and ultimately even mount passive attacks on the live network, without impacting or recording any normal user traffic.

In this way, a live crawl of the Alexa top sites could be performed once, to produce a standard "undefended" corpus. Padding machines can then be quickly evaluated and tuned on these simulated traces in a standardized way, and the results can then be [reproduced on the live Tor network](#44-testing-on-the-live-network) with the machines running on your own relays.

Please be mindful of the Limitations section of the simulator documentation, however, to ensure that you are aware of the edge cases and timing approximations that are introduced by this approach.

### 4.2. Testing in Chutney

The Tor Project provides a tool called [Chutney](https://github.com/torproject/chutney/) which makes it very easy to set up private Tor networks. While getting it to work for the first time might take some reading of the documentation, the final result is well worth it for the following reasons:

- You control all the relays and hence you have greater control and debugging capabilities.
- You control all the relays and hence you can toggle padding support on/off at will.
- You don't need to be cautious about overhead or damaging the real Tor network during testing.
- You don't even need to be online; you can do all your testing offline over localhost.

A final word of warning here is that since Chutney runs over localhost, the packet latencies and delays are completely different from the real Tor network, so if your padding machines rely on real network timings you will get different results on Chutney. You can work around this by using a different set of delays if Chutney is used, or by moving your padding machines to the real network when you want to do latency-related testing.

### 4.3. Testing in Shadow

[Shadow](https://shadow.github.io/) is an environment for running entire Tor network simulations, similar to Chutney, but designed to be more memory efficient, as well as to provide an accurate Tor network topology and latency model.

While Shadow is significantly more memory efficient than Chutney, and can make use of extremely accurate Tor network capacity and latency models, it will not be as fast or efficient as the [circpad trace simulator](https://github.com/pylls/circpad-sim) if you need to do many iterations of an experiment to tune your defense.

### 4.4. Testing on the Live Network

Live network testing is the gold standard for verifying that any attack or defense is behaving as expected, to minimize the influence of simplifying assumptions.

However, it is not ethical, or necessarily possible, to run high-resolution traffic analysis attacks on the entire Tor network. But it is both ethical and possible to run small scale experiments that target only your own clients, who will only use your own Tor relays that support your new padding machines.

We provide the `MiddleNodes` torrc directive to enable this, which will allow you to specify the fingerprints and/or IP netmasks of relays to be used in the second hop position. Options to restrict other hops also exist, if your padding system is padding to a different hop. The `HSLayer2Nodes` option overrides the `MiddleNodes` option for onion service circuits, if both are set. (The [vanguards addon](https://github.com/mikeperry-tor/vanguards/README_TECHNICAL.md) will set `HSLayer2Nodes`.)

When you run your own clients, and use `MiddleNodes` to restrict your clients to use your relays, you can perform live network evaluations of a defense applied to whatever traffic crawl or activity your clients do.

## 5. Example Padding Machines

### 5.1. Deployed Circuit Setup Machines

Tor currently has two padding machines enabled by default, which aim to hide certain features of the client-side onion service circuit setup protocol. For more details on their precise goal and function, please see [proposal 302](https://github.com/torproject/torspec/blob/master/proposals/302-padding-machines-for-onion-clients.txt). In this section we will go over the code of those machines to clarify some of the engineering parts.

#### 5.1.1. Overview

The codebase of proposal 302 can be found in [circuitpadding_machines.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c) and specifies four padding machines:

- The [client-side introduction](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L60) circuit machine.
- The [relay-side introduction](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L146) circuit machine.
- The [client-side rendezvous](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L257) circuit machine.
- The [relay-side rendezvous](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L374) circuit machine.

Each of those machines has its own setup function, and [they are all initialized](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.c#L2718) by the circuit padding framework. To understand more about them, please carefully read the individual setup functions for each machine, which are fairly well documented. Each function goes through the following steps:

- Machine initialization
  - Give it a [name](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L70)
  - Specify [which hop](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L73) the padding should go to
  - Specify whether it should be [client-side](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L75) or relay-side.
  - Specify for [which types of circuits](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L78) the machine should apply
  - Specify whether the circuits should be [kept alive](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c#L112) until the machine finishes padding.
  - Set [padding limits](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L116) to avoid too much overhead in case of bugs or errors.
- Set up [machine states](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L120)
  - Specify [state transitions](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L123).
- Finally [register the machine](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L137) to the global machine list

### 5.2. Adaptive Padding Early

[Adaptive Padding Early](https://www.cs.kau.se/pulls/hot/thebasketcase-ape/) is a variant of Adaptive Padding/WTF-PAD that does not use histograms or token removal to shift padding distributions, but instead uses fixed parameterized distributions to specify inter-packet timing thresholds for burst and gap inter-packet delays.

Tobias Pulls's [QuickStart Guide](CircuitPaddingQuickStart.md) describes how to get this machine up and running, and has links to branches with a working implementation.

### 5.3. Sketch of Tamaraw

The [Tamaraw defense paper](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf) is the only defense to date that provides a proof of optimality for the finite-length website traffic fingerprinting domain. These bounds assume that a defense is able to perform a full, arbitrary transform of a trace that is under a fixed number of packets in length.

The key insight to understand Tamaraw's optimality is that it achieves one such optimal transform by delaying traffic below a circuit's throughput. By doing this, it creates a queue that is rarely empty, allowing it to produce a provably optimal transform with minimal overhead. As [Section 1.4](#14-other-deployment-constraints) explains, this queue cannot be maintained on the live Tor network without risk of out-of-memory conditions at relays.

However, if the queue is not maintained in the Tor network, but instead by the application layer, it could be deployed by websites that opt in to using it.

In this case, the application layer component would do *optional* constant rate shaping, negotiated between a web browser and a website. The Circuit Padding Framework can then easily fill in any missing gaps of cover traffic packets, and also ensure that only a fixed number of packets are sent in total.

However, for such a defense to be safe, additional care must be taken to ensure that the resulting traffic pattern still has a large anonymity/confusion set with other traces on the live network.

Accomplishing this is an unsolved problem.

### 5.4. Other Padding Machines

Our partners in this project at RIT have produced a couple of prototypes, based on their published research designs [REB and RBB](https://www.researchgate.net/publication/329743510_UNDERSTANDING_FEATURE_DISCOVERY_IN_WEBSITE_FINGERPRINTING_ATTACKS).

As [their writeup explains](https://github.com/notem/tor-rbp-padding-machine-doc), because RBB uses delay, the circuit padding machine that they made is a no-delay version.

They also ran into an issue with the 0-delay timing workaround for [bug 31653](https://bugs.torproject.org/31653). Keep an eye on that bug for updates with improved workarounds/fixes.

Their code is [available on github](https://github.com/notem/tor/tree/circuit_padding_rbp_machine).

## 6. Framework Implementation Details

If you need to add additional events, conditions, or other features to the circuit padding framework, then this section is for you.

### 6.1. Memory Allocation Conventions

If the existing circuit padding features are sufficient for your needs, then you do not need to worry about memory management or pointer lifespans. The circuit padding framework should take care of this for you automatically.

However, if you need to add new padding machine features to support your padding machines, then it will be helpful to understand how circuits correspond to the global machine definitions, and how mutable padding machine memory is managed.

#### 6.1.1. Circuits and Padding Machines

In Tor, the [circuit_t structure](https://github.com/torproject/tor/blob/master/src/core/or/circuit_st.h) is the superclass structure for circuit-related state that is used on both clients and relays. On clients, the actual datatype of the object pointed to by `circuit_t *` is the subclass structure [origin_circuit_t](https://github.com/torproject/tor/blob/master/src/core/or/origin_circuit_st.h). The macros `CIRCUIT_IS_ORIGIN()` and `TO_ORIGIN_CIRCUIT()` are used to determine if a circuit is a client-side (origin) circuit and to cast the pointer safely to `origin_circuit_t *`.
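
For example, client-side-only code typically guards the downcast with these macros, roughly as follows (sketch only; `circ` is assumed to be a `circuit_t *`):

```c
/* Sketch: act only on client-side (origin) circuits. */
if (CIRCUIT_IS_ORIGIN(circ)) {
  origin_circuit_t *origin_circ = TO_ORIGIN_CIRCUIT(circ);
  /* ... client-side-only handling of origin_circ ... */
  (void)origin_circ;
}
```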

Because circuit padding machines can be present at both clients and relays, the circuit padding fields are stored in the `circuit_t *` superclass structure. Notice that there are actually two sets of circuit padding fields: a `const circpad_machine_spec_t *` array, and a `circpad_machine_runtime_t *` array. Each of these arrays holds at most two elements, as there can be at most two padding machines on each circuit.

The `const circpad_machine_spec_t *` points to a globally allocated machine specification. These machine specifications are allocated and set up during Tor program startup, in `circpad_machines_init()` in [circuitpadding.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.c). Because the machine specification object is shared by all circuits, it must not be modified or freed until program exit (by `circpad_machines_free()`). The `const` qualifier should enforce this at compile time.

The `circpad_machine_runtime_t *` array member points to the mutable runtime information for the machine specification at that same array index. This runtime structure keeps track of the current machine state, packet counts, and other information that must be updated as the machine operates. When a padding machine is successfully negotiated, `circpad_setup_machine_on_circ()` allocates the associated runtime information.

#### 6.1.2. Histogram Management

If a `circpad_state_t` of a machine specifies a `token_removal` strategy other than `CIRCPAD_TOKEN_REMOVAL_NONE`, then every time there is a state transition into this state, `circpad_machine_setup_tokens()` will copy the read-only `circpad_state_t.histogram` array into a mutable version at `circpad_machine_runtime_t.histogram`. This mutable copy is used to decrement the histogram bin counts as packets are sent, as per the specified token removal strategy.

When the state machine transitions out of this state, the mutable histogram copy is freed by this same `circpad_machine_setup_tokens()` function.

#### 6.1.3. Deallocation and Shutdown

As an optimization, padding machines can be swapped in and out by the client without waiting a full round trip for the relay machine to shut down.

Internally, this is accomplished by immediately freeing the heap-allocated `circuit_t.padding_info` field corresponding to that machine, but still preserving the `circuit_t.padding_machine` pointer to the global padding machine specification until the response comes back from the relay. Once the response comes back, that `circuit_t.padding_machine` pointer is set to NULL, if the response machine number matches the current machine present.

Because of this partial shutdown condition, we have two macros for iterating over machines. `FOR_EACH_ACTIVE_CIRCUIT_MACHINE_BEGIN()` is used to iterate over machines that have both a `circuit_t.padding_info` slot and a `circuit_t.padding_machine` slot occupied. `FOR_EACH_CIRCUIT_MACHINE_BEGIN()` is used when we need to iterate over all machines that are either active or are simply waiting for a response to a shutdown request.

If the machine is replaced instead of just shut down, then the client frees the `circuit_t.padding_info`, and then sets the `circuit_t.padding_machine` and `circuit_t.padding_info` fields for this next machine immediately. This is done in `circpad_add_matching_machines()`. In this case, since the new machine should have a different machine number, the shutdown response from the relay is silently discarded, since it will not match the new machine number.

If this sequence of machine teardown and spin-up happens rapidly enough for the same machine number (as opposed to different machines), then a race condition can happen. This is [known bug #30992](https://bugs.torproject.org/30992).

When the relay side decides to shut down a machine, it sends a `RELAY_COMMAND_PADDING_NEGOTIATED` towards the client. If this cell matches the current machine number on the client, that machine is torn down, by freeing the `circuit_t.padding_info` slot and immediately setting the `circuit_t.padding_machine` slot to NULL.

Additionally, if Tor decides to close a circuit forcibly due to error before the padding machine is shut down, then `circuit_t.padding_info` is still properly freed by the call to `circpad_circuit_free_all_machineinfos()` in `circuit_free_()`.

### 6.2. Machine Application Events

The framework checks client-side origin circuits to see if padding machines should be activated or terminated during specific event callbacks in `circuitpadding.c`. We list these event callbacks here only for reference. You should not modify any of these callbacks to get your machine to run; instead, you should use the `circpad_machine_spec_t.conditions` field.

However, you may add new event callbacks if you need other activation events, for example to provide obfuscation-layer or application-layer signaling. Any new event callbacks should behave exactly like the existing callbacks.

During each of these event callbacks, the framework checks to see if any currently running padding machines have conditions that no longer apply as a result of the event, and shuts those machines down. Then, it checks to see if any new padding machines should be activated as a result of the event, based on their circuit application conditions. **Remember: Machines are checked in reverse order in the machine list. This means that later, more recently added machines take precedence over older, earlier entries in each list.**

Both of these checks are performed using the machine application conditions that you specify in your machine's `circpad_machine_spec_t.conditions` field.

The machine application event callbacks are prefixed by `circpad_machine_event_` by convention in `circuitpadding.c`. As of this writing, these callbacks are:

- `circpad_machine_event_circ_added_hop()`: Called whenever a new hop is added to a circuit.
- `circpad_machine_event_circ_built()`: Called when a circuit has completed construction and is opened. <!-- open != ready for traffic. Which do we mean? -nickm -->
- `circpad_machine_event_circ_purpose_changed()`: Called when a circuit changes purpose.
- `circpad_machine_event_circ_has_no_relay_early()`: Called when a circuit runs out of `RELAY_EARLY` cells.
- `circpad_machine_event_circ_has_streams()`: Called when a circuit gets a stream attached.
- `circpad_machine_event_circ_has_no_streams()`: Called when the last stream is detached from a circuit.

### 6.3. Internal Machine Events

To provide for some additional capabilities beyond simple finite state machine
behavior, the circuit padding machines also have internal events that they
emit to themselves when packet count length limits are hit, when the Infinity
bin is sampled, and when the histogram bins are emptied of all tokens.

These events are implemented as `circpad_internal_event_*` functions in
`circuitpadding.c`, which are called from various areas that determine when
the events should occur.

While the conditions that trigger these internal events may be complex, the
state machine definitions process them in nearly the same manner as the cell
processing events, except that they are delivered only to the machine that
emitted them, rather than to all machines on the circuit.

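As a hedged illustration of how a machine consumes these internal events (the
enum values, the `states` array, and the refill-on-re-entry behavior are
assumptions that you should verify against `circuitpadding.h` and
`circuitpadding.c`), a burst state can be wired so that sampling the infinity
bin or exhausting the length count ends padding, while emptying all bins
simply re-enters the state:

```c
/* Sketch: hook the internal events into a machine's state transitions.
 * Assumes circpad_machine_states_init() has already allocated the
 * machine->states array, and that transitioning to the same state
 * refills its histogram tokens. */
static void
example_wire_internal_events(circpad_machine_spec_t *machine)
{
  circpad_state_t *burst = &machine->states[CIRCPAD_STATE_BURST];

  /* Sampling the infinity bin means "stop padding for now". */
  burst->next_state[CIRCPAD_EVENT_INFINITY] = CIRCPAD_STATE_END;

  /* Hitting the per-state packet count limit also ends padding. */
  burst->next_state[CIRCPAD_EVENT_LENGTH_COUNT] = CIRCPAD_STATE_END;

  /* Running out of histogram tokens re-enters the state to refill them. */
  burst->next_state[CIRCPAD_EVENT_BINS_EMPTY] = CIRCPAD_STATE_BURST;
}
```
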
## 7. Future Features and Optimizations

While implementing the circuit padding framework, our goal was to deploy a
system that obscured client-side onion service circuit setup and supported
deployment of WTF-PAD and/or APE. Along the way, we noticed several features
that might prove useful to others, but were not essential to implement
immediately. We do not have immediate plans to implement these ideas, but we
would gladly accept patches that do so.

The following list gives an overview of these improvements, but as this
document ages, it may become stale. The canonical list of improvements that
researchers may find useful is tagged in our bugtracker with
[circpad-researchers](https://trac.torproject.org/projects/tor/query?keywords=~circpad-researchers),
and the list of improvements that are known to be necessary for some research
areas is tagged with
[circpad-researchers-want](https://trac.torproject.org/projects/tor/query?keywords=~circpad-researchers-want).

Please consult those lists for the latest status of these issues. Note that
not all fixes will be backported to all Tor versions, so be mindful of which
Tor releases receive which fixes as you conduct your experiments.

### 7.1. Load Balancing and Flow Control

Fortunately, non-Exit bandwidth is already plentiful and exceeds the Exit
capacity, and we anticipate that if we inform our relay operator community of
the need for non-Exit bandwidth to satisfy padding overhead requirements,
they will be able to provide it with relative ease.

Unfortunately, padding machines that have large quantities of overhead will
require changes to our load balancing system to account for this
overhead. The necessary changes are documented in
[Proposal 265](https://gitweb.torproject.org/torspec.git/tree/proposals/265-load-balancing-with-overhead.txt).

Additionally, padding cells are not currently subject to flow control. For
high amounts of padding, we may want to change this. See [ticket
31782](https://bugs.torproject.org/31782) for details.

### 7.2. Timing and Queuing Optimizations

The circuitpadding framework has some timing-related issues that may impact
results. If high-resolution timestamps are fed to opaque deep learning
trainers, those models may end up able to differentiate padding traffic from
non-padding traffic due to these timing bugs.

The circuit padding cell event callbacks come from post-decryption points in
the cell processing codepath, and from pre-queue points in the cell send
codepath. This means that our cell events will not reflect the actual time
when packets are read or sent on the wire. This is much worse in the send
direction, as the circuitmux queue, channel outbuf, and kernel TCP buffer will
impose significant additional delay between when we currently report that a
packet was sent, and when it actually hits the wire.

[Ticket 29494](https://bugs.torproject.org/29494) has a more detailed
description of this problem, and an experimental branch that moves the cell
event callback locations to circuitmux post-queue, which, with KIST, should
accurately reflect when cells are actually sent on the wire.

If your padding machine and problem space depend on very accurate notions of
relay-side packet timing, please try that branch and let us know on the
ticket if you need any further assistance fixing it up.

Additionally, with those changes in place, it will be possible to provide
further overhead-reducing optimizations by letting machines specify flags
indicating that padding should not be sent if any cells are pending in the
cell queue, for example to extend cell bursts more accurately and with less
overhead.

However, even if we solve the queuing issues, Tor's current timers are not as
precise as some padding designs may require, so we will still have timing
precision issues to solve. [Ticket 31653](https://bugs.torproject.org/31653)
describes an issue the circuit padding system has with sending 0-delay padding
cells, and [ticket 32670](https://bugs.torproject.org/32670) describes a
libevent timer accuracy issue, which causes callbacks to vary by up to 10ms
from their scheduled time, even in the absence of load.

All of these issues strongly suggest that you either truncate the resolution
of any timestamps you feed to your classifier, or omit timing information from
the classification problem entirely until these issues are addressed.

### 7.3. Better Machine Negotiation

Circuit padding is applied to circuits through machine conditions.

The following machine conditions may be useful for some use cases, but have
not been implemented yet:
 * [Exit Policy-based Stream Conditions](https://bugs.torproject.org/29083)
 * [Probability to apply machine/Cointoss condition](https://bugs.torproject.org/30092)
 * [Probability distributions for launching new padding circuit(s)](https://bugs.torproject.org/31783)
 * [More flexible purpose handling](https://bugs.torproject.org/32040)

Additionally, the following features may help to obscure that padding is being
negotiated, and/or streamline that negotiation process:
 * [Always send negotiation cell on all circuits](https://bugs.torproject.org/30172)
 * [Better shutdown handling](https://bugs.torproject.org/30992)
 * [Preference-ordered negotiation menu](https://bugs.torproject.org/30348)

### 7.4. Probabilistic State Transitions

Right now, the state machine transitions are fully deterministic. However,
one could imagine a state machine that uses probabilistic transitions between
states to simulate a random walk or Hidden Markov Model traversal across
several pages.

The simplest way to implement this is to turn the `circpad_state_t.next_state`
array into an array of structs that each hold a next-state field and a
probability of transitioning to that state.

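A minimal sketch of that change follows; none of this exists in Tor today, and
`circpad_statenum_t` and `CIRCPAD_NUM_EVENTS` are assumed to be the existing
type and event-count names in `circuitpadding.h`:

```c
/* Hypothetical replacement for the bare state-number entries currently held
 * in circpad_state_t.next_state[]. */
typedef struct circpad_prob_transition_t {
  /* State to move to when the corresponding event fires... */
  circpad_statenum_t next_state;
  /* ...taken with this probability; otherwise remain in the current state. */
  double probability;
} circpad_prob_transition_t;

/* In circpad_state_t, the per-event transition table would then become:
 *   circpad_prob_transition_t next_state[CIRCPAD_NUM_EVENTS];
 * and the transition code would sample against `probability` before
 * switching states. */
```
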
If you need this feature, please see [ticket
31787](https://bugs.torproject.org/31787) for more details.

### 7.5. More Complex Pattern Recognition

State machines are extremely efficient sequence recognition devices, but they
are not great pattern recognition devices. This is one of the reasons why
[Adaptive Padding](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf)
used state machines in combination with histograms, to model the target
distribution of interpacket delays for transmitted packets.

However, there is currently no such optimization for reacting to patterns of
*received* traffic. There may also be cases where defenses must react to more
complex patterns of sent traffic than can be expressed by our current
histogram and length count events.

For example: if you wish your machine to react to a certain count of incoming
cells in a row, right now you have to have a state for each cell, and use the
infinity bin to time out of the sequence in each state. We could make this
more compact if each state had an arrival cell counter and an inter-cell
timeout. Or we could add more complex mechanisms to recognize certain patterns
of arriving traffic in a state.

The best way to build recognition primitives like this into the framework is
to add additional [Internal Machine Events](#63-internal-machine-events) for
the pattern in question.

As another simple example, a useful additional event might be to transition
whenever any of your histogram bins is empty, rather than all of them. To do
this, you would add `CIRCPAD_EVENT_ANY_BIN_EMPTY` to the enum
`circpad_event_t` in `circuitpadding.h`. You would then create a function
`circpad_internal_event_any_bin_empty()`, which would work just like the
existing `circpad_internal_event_bin_empty()`, and would also be called from
`check_machine_token_supply()` in `circuitpadding.c`, but with the check being
whether each individual bin is zero instead of the total token count. With
this change, new machines could react to this new event in the same way as any
other, as sketched below.

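A hedged sketch of the new helper follows. The event and function are
hypothetical, and the `circpad_machine_runtime_t` and
`circpad_machine_spec_transition()` names are assumptions about the current
code; mirror whatever `circpad_internal_event_bin_empty()` actually does in
your Tor version:

```c
/* Hypothetical internal event helper for circuitpadding.c, modeled on the
 * existing bin-empty event. Type and function names are assumptions. */
static circpad_decision_t
circpad_internal_event_any_bin_empty(circpad_machine_runtime_t *mi)
{
  /* Hand the new event to the current machine's state definitions. */
  return circpad_machine_spec_transition(mi, CIRCPAD_EVENT_ANY_BIN_EMPTY);
}
```

`check_machine_token_supply()` would then call this helper whenever any single
bin reaches zero tokens, in addition to its existing check on the total token
count.
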
If you have any other ideas that may be useful, please comment on [ticket
32680](https://bugs.torproject.org/32680).

## 8. Open Research Problems

### 8.1. Onion Service Circuit Setup

Our circuit setup padding does not address timing-based features, only
packet counts. A deep learning classifier can probably still recognize onion
service circuit setup from timing information alone.

However, before going too deep down the timing rabbit hole, we may need to
make [some improvements to Tor](#72-timing-and-queuing-optimizations) first.
Please comment on those tickets if you need this.

### 8.2. Onion Service Fingerprinting

We have done nothing to obscure the service side of onion service circuit
setup. Because service-side onion services see traffic byte counts that are
the reverse of normal clients', they will likely need some kind of [hybrid
application layer traffic shaping](#53-sketch-of-tamaraw), in addition to
simple circuit setup obfuscation.

Fingerprinting in
[combination](https://github.com/mikeperry-tor/vanguards/blob/master/README_SECURITY.md)
with
[vanguards](https://github.com/mikeperry-tor/vanguards/blob/master/README_TECHNICAL.md)
is also an open issue.

### 8.3. Open World Fingerprinting

Similarly, Open World/clearweb website fingerprinting defenses remain an
unsolved problem from a practicality point of view. The original WTF-PAD
defense was never tuned, and it is showing accuracy issues against deep
learning attacks.

### 8.4. Protocol Usage Fingerprinting

Traffic fingerprinting to determine the protocol in use by a client has not
been studied, either from the attack or the defense point of view.

### 8.5. Datagram Transport Side Channels

Padding can reduce the accuracy of dropped-cell side channels in datagram
transports, but we don't know [how to measure
this](https://lists.torproject.org/pipermail/tor-dev/2018-November/013562.html).

## 9. Must Read Papers

These are by far the most important papers in the space, to date:

- [Tamaraw](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf)
- [Bayes, Not Naive](https://www.petsymposium.org/2017/papers/issue4/paper50-2017-4-source.pdf)
- [Anonymity Trilemma](https://eprint.iacr.org/2017/954.pdf)
- [WTF-PAD](http://arxiv.org/pdf/1512.00524)

Except for WTF-PAD, these papers were selected because they point towards
optimality bounds that can be benchmarked against.

We cite them even though we are skeptical that provably optimal defenses can
be constructed, at least not without trivial or impractical transforms (such
as those that require unbounded queue capacity, or stored knowledge of every
possible HTTP trace on the Internet).

We are also not demanding an optimality or security proof for every defense.

Instead, we cite the above as benchmarks. We believe the space, especially the
open-world case, to be more akin to an optimization problem, where a
WTF-PAD-like defense must be tuned through an optimizer to produce results
comparable to provably optimal but practically unrealizable defenses, through
rigorous adversarial evaluation.

## A. Acknowledgments

This research was supported in part by NSF grants CNS-1619454 and CNS-1526306.