Proposal 289 prevents SENDME-flooding by requiring the other side to
authenticate the data it has received. But this data won't actually
be random if they are downloading a known resource. "No problem",
we said, "let's fell the empty parts of our cells with some
randomness!" and we did that in #26871.
Unfortunately, if the relay data payloads are all completely full,
there won't be any empty parts for us to randomize.
Therefore, we now pick random "randomness windows" between
CIRCWINDOW_INCREMENT/2 and CIRCWINDOW_INCREMENT. We remember whether we have
sent a cell containing at least 16 bytes of randomness in that window. If we
haven't, then when the window is exhausted, we send one. (This window approach
is designed to lower the number of rng checks we have to do. The number 16 is
pulled out of a hat to change the attacker's guessing difficulty to
"impossible".)
Implements 28646.
Because it invokes the Tor mainloop, it does unpredictable things to
test coverage of a lot of code that it doesn't actually test at
all. (It is more an integration test than anything else.)
The ordinary definitions of timeradd() and timersub() contain a
branch. However, in coverage builds, this means that we get spurious
complaints about partially covered basic blocks, in a way that makes
our coverage determinism harder to check.
Ordinarily we skip calling log_fn(LOG_DEBUG,...) if debug logging is
completely disabled. However, in coverage builds, this means that
we get spurious complaints about partially covered basic blocks, in
a way that makes our coverage determinism harder to check.
Two non fatal asserts are added in this commit. First one is to see if the
SENDME digest list kept on the circuit for validation ever grows bigger than
the maximum number of expected SENDME on a circuit (currently 10).
The second one is to know if we ever send more than one SENDME at a time on a
circuit. In theory, we shouldn't but if we ever do, the v1 implementation
wouldn't work because we only keep one single cell digest (the previous cell
to the SENDME) on the circuit/cpath. Thus, sending two SENDME consecutively
will lead to a mismatch on the other side because the same cell digest would
be use and thus the circuit would collapse.
Finally, add an extra debug log in case we emit a v0 which also includes the
consensus emit version in that case.
Part of #30428
Signed-off-by: David Goulet <dgoulet@torproject.org>
We must not accumulate digests on the circuit if the other end point is using
another SENDME version that is not using those digests like v0.
This commit makes it that we always pop the digest regardless of the version.
Part of #30428
Signed-off-by: David Goulet <dgoulet@torproject.org>
Commit 4ef8470fa5480d3b was actually reverted before because in the end we
needed to do this minus 1 check on the window.
This commit clarifies that in the code, takes the useful comment changes from
4ef8470fa5480d3b and makes sendme_circuit_cell_is_next() private since it
behaves in a very specific way that one external caller might expect.
Part of #30428.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Turns out that we were only recording the "b_digest" but to have
bidirectionnal authenticated SENDMEs, we need to use the "f_digest" in the
forward cell situation.
Because of the cpath refactoring, this commit plays with the crypt_path_ and
relay_crypto_t API a little bit in order to respect the abstractions.
Previously, we would record the cell digest as the SENDME digest in the
decrypt cell function but to avoid code duplication (both directions needs to
record), we now do that right after iff the cell is recognized (at the edge).
It is now done in circuit_receive_relay_cell() instead.
We now also record the cell digest as the SENDME digest in both relay cell
encryption functions since they are split depending on the direction.
relay_encrypt_cell_outbound() and relay_encrypt_cell_inbound() need to
consider recording the cell digest depending on their direction (f vs b
digest).
Fixes#30428
Signed-off-by: David Goulet <dgoulet@torproject.org>
There was a missing cell version check against our max supported version. In
other words, we do not fallback to v0 anymore in case we do know the SENDME
version.
We can either handle it or not, never fallback to the unauthenticated version
in order to avoid gaming the authenticated logic.
Add a unit tests making sure we properly test that and also test that we can
always handle the default emit and accepted versions.
Fixes#30428
Signed-off-by: David Goulet <dgoulet@torproject.org>
The validation of the SENDME cell is now done as the very first thing when
receiving it for both client and exit. On failure to validate, the circuit is
closed as detailed in the specification.
Part of #30428
Signed-off-by: David Goulet <dgoulet@torproject.org>
It turns out that only the exit side is validating the authenticated SENDME v1
logic and never the client side. Which means that if a client ever uploaded
data towards an exit, the authenticated SENDME logic wouldn't apply.
For this to work, we have to record the cell digest client side as well which
introduced a new function that supports both type of edges.
This also removes a test that is not valid anymore which was that we didn't
allow cell recording on an origin circuit (client).
Part of #30428
Signed-off-by: David Goulet <dgoulet@torproject.org>
We want to support parsing a cell with unknown status code so we are forward
compatible.
Part of #30454
Signed-off-by: David Goulet <dgoulet@torproject.org>
Like the previous commit about the INTRODUCE_ACK status code, change all auth
key type to use the one defined in the trunnel file.
Standardize the use of these auth type to a common ABI.
Part of #30454
Signed-off-by: David Goulet <dgoulet@torproject.org>
This enum was the exact same as hs_intro_ack_status_t that was removed at the
previous commit. It was used client side when parsing the INTRODUCE_ACK cell.
Now, the entire code dealing with the INTRODUCE_ACK cell (both sending and
receiving) have been modified to all use the same ABI defined in the trunnel
introduce1 file.
Finally, the client will default to the normal behavior when receiving an
unknown NACK status code which is to note down that we've failed and re-extend
to the next intro point. This way, unknown status code won't trigger a
different behavior client side.
Part of #30454.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Remove the hs_intro_ack_status_t enum and move the value into trunnel. Only
use these values from now on in the intro point code.
Interestingly enough, the client side also re-define these values in hs_cell.h
with the hs_cell_introd_ack_status_t enum. Next commit will fix that and force
to use the trunnel ABI.
Part of #30454
Signed-off-by: David Goulet <dgoulet@torproject.org>
Previously we purged it in 1-hour increments -- but one-hour is the
maximum TTL for the cache! Now we do it in 25%-TTL increments.
Fixes bug 29617; bugfix on 0.3.5.1-alpha.
The client side had garbage histograms and deadcode here, too. That code has
been removed.
The tests have also been updated to properly test the intro circ by sending
padding from the relay side to the client, and verifying that both shut down
when padding was up. (The tests previously erroneously tested only the client
side of intro circs, which actually were supposed to be doing nothing).
This just moves the state transition directives into the proper client/relay
side functions. It also allows us to remove some dead-code from the client
side (since the client doesn't send padding).
This is the first half of implementing proposal 301. The
RecommendedPackages torrc option is marked as obsolete and
the test cases for the option removed. Additionally, the code relating
to generating and formatting package lines in votes is removed.
These lines may still appear in votes from other directory authorities
running earlier versions of the code and so consensuses may still
contain package lines. A new consensus method will be needed to stop
including package lines in consensuses.
Fixes: #28465
- Add some more useful logs for future debugging.
- Stop usage of circpad_state_to_string(). It's innacurate.
- Reduce severity and fix up log domain of some logging messages.
To ease debugging of miscount issues, attach vanguards with --loglevel DEBUG
and obtain control port logs (or use any other control port CIRC and
CIRC_MINOR event logging mechanism).
If circuit padding wants to keep a circuit open and pathbias used to ignore
it, pathbias should continue to ignore it.
This may catch other purpose-change related miscounts (such as timeout
measurement, cannibalization, onion service circuit transitions, and
vanguards).
When a circuit is marked for close, check to see if any of our padding
machines want to take ownership of it and continue padding until the machine
hits the END state.
For safety, we also ensure that machines that do not terminate are still
closed as follows: Because padding machine timers are UINT32_MAX in size, if
some sort of network event doesn't happen on a padding-only circuit within
that time, we can conclude it is deadlocked and allow
circuit_expire_old_circuits_clientside() to close it.
If too much network activity happens, then per-machine padding limits can be
used to cease padding, which will cause network cell events to cease, on the
circuit, which will cause circpad to abandon the circuit as per the above time
limit.
We need to check here because otherwise we can try to schedule padding with no
tokens left upon the receipt of a padding event when our bins just became
empty.
Our other tests tested state lengths against padding packets, and token counts
against non-padding packets. This test checks state lengths against
non-padding packets (and also padding packets too), and checks token counts
against padding packets (and also non-padding packets too).
The next three commits are needed to make this test pass (it found 3 bugs).
Yay?
Since the reproducible RNG dumps its own seed, we don't need to do
it for it. Since tinytest can tell us if the test failed, we don't
need our own test_failed booleans.
This commit moves code that updates the state length and padding limit counts
out from the callback to its own function, for clarity.
It does not change functionality.
This commit moves the padding state limit checks and the padding rate limit
checks out of the token removal codepath, and causes all three functions to
get called from a single circpad_machine_count_nonpadding_sent() function.
It does not change functionality.
The code flow in theory can end up with a layer_hint to be NULL but in
practice it should never happen because with an origin circuit, we must have
the layer_hint.
Just in case, BUG() on it if we ever end up in this situation and recover by
closing the circuit.
Fixes#30467.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Fortunately, in 0.3.5.1-alpha we improved logging for various
failure cases involved with onion service client auth.
Unfortunately, for this one, we freed the file right before logging
its name.
Fortunately, tor_free() sets its pointer to NULL, so we didn't have
a use-after-free bug.
Unfortunately, passing NULL to %s is not defined.
Fortunately, GCC 9.1.1 caught the issue!
Unfortunately, nobody has actually tried building Tor with GCC 9.1.1
before. Or if they had, they didn't report the warning.
Fixes bug 30475; bugfix on 0.3.5.1-alpha.
The INTRODUCE1 trunnel definition file doesn't support that value so it can
not be used else it leads to an assert on the intro point side if ever tried.
Fortunately, it was impossible to reach that code path.
Part of #30454
Signed-off-by: David Goulet <dgoulet@torproject.org>
See proposal 289 section 4.3 for more details.
It describes the flow control protocol at the circuit and stream level. If
there is no FlowCtrl protocol version, tor supports the unauthenticated flow
control features from its supported Relay protocols.
At this commit, relay will start advertising FlowCtrl=1 meaning they support
authenticated SENDMEs v1.
Closes#30363
Signed-off-by: David Goulet <dgoulet@torproject.org>
- Move test-only cpath_get_n_hops() to crypt_path.c.
- Move onion_next_hop_in_cpath() and rename to cpath_get_next_non_open_hop().
The latter function was directly accessing cpath->state, and it's a first step
at hiding ->state.
Some of these functions are now public and cpath-specific so their name should
signify the fact they are part of the cpath module:
assert_cpath_layer_ok -> cpath_assert_layer_ok
assert_cpath_ok -> cpath_assert_ok
onion_append_hop -> cpath_append_hop
circuit_init_cpath_crypto -> cpath_init_circuit_crypto
circuit_free_cpath_node -> cpath_free
onion_append_to_cpath -> cpath_extend_linked_list
Now that we are using a constructor we should be more careful that we are
always using the constructor to initialize crypt_path_t, so make sure that
->private is initialized.
We are using an opaque pointer so the structure needs to be allocated on the
heap. This means we now need a constructor for crypt_path_t.
Also modify all places initializing a crypt_path_t to use the constructor.
For various reasons, this was a nontrivial movement. There are
several places in the code where we do something like "update the
flags on this routerstatus or node if we're an authority", and at
least one where we pretended to be an authority when we weren't.
I don't believe any of these represent a real timing vulnerability
(remote timing against memcmp() on a modern CPU is not easy), but
these are the ones where I believe we should be more careful.
For memeq and friends, "tor_" indicates constant-time and "fast_"
indicates optimized. I'm fine with leaving the constant-time
"safe_mem_is_zero" with its current name, but the "tor_" prefix on
the current optimized version is misleading.
Also, make the tor_digest*_is_zero() uniformly constant-time, and
add a fast_digest*_is_zero() version to use as needed.
A later commit in this branch will fix all the users of
tor_mem_is_zero().
Closes ticket 30309.