This patch has no effect on the C tor build.
Adds a function hashx_rng_callback() to the hashx API, defined only
when HASHX_RNG_CALLBACK is defined. This is then used in the Rust
wrapper to implement a similar rng_callback().
Included some minimal test cases. This code is intented for
use in cross-compatibility fuzzing tests which drive multiple
implementations of hashx with the same custom Rng stream.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
The idea behind this is that we may want to start exporting more pieces
of c-tor as Rust crates so that Arti can perform cross compatibility and
comparison testing using Rust tooling.
This turns the 'tor' repo into a Cargo workspace, and adds one crate to
start with: "tor-c-equix", rooted in src/ext/equix. This actually
includes both Equi-X itself and HashX, since there's less overall
duplication if we package these together instead of packaging HashX
separately.
This patch adds a basic safe Rust interface, but doesn't expose any
additional internals for testing purposes.
No changes to the C code here or the normal Tor build system.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
Rotate to a new L2 vanguard whenever an existing one loses the
Stable or Fast flag. Previously, we would leave these relays in the
L2 vanguard list but never use them, and if all of our vanguards
end up like this we wouldn't have any middle nodes left to choose
from so we would fail to make onion-related circuits.
Fixes bug 40805; bugfix on 0.4.7.1-alpha.
This addresses issue #40800 and a couple other problems I noticed while
trying to reproduce that one.
The original issue is just a missing cast to void* on the args of
__builtin___clear_cache(), and clang is picky about the implicit cast
between what it considers to be char of different signedness. Original
report is from MacOS but it's also reproducible on other clang targets.
The cmake-based original build system for equix and hashx was a handy
way to run tests, but it suffered from some warnings due to incorrect
application of include_directories().
And lastly, there were some return codes from hashx_exec() that get
ignored on equix when asserts are disabled. It bugged me too much to
just silence this with a (void) cast, since even though this is in the
realm of low-likelyhood programming errors and not true runtime errors, I
don't want to make it easy for the hashx_exec() wrappers to return
values that are dangerously wrong if an error is ignored. I made sure
that even if asserts are disabled, we return values that will cause the
solver and verifier to both fail to validate a potential solution.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This fixes an "initializer is not a constant" compilation error that manifests
itself on gcc versions < 8.1 and MSVC (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69960#c18).
Fixes bug #40773
Signed-off-by: Gabriela Moldovan <gabi@torproject.org>
Here is the report:
*** CID 1531835: Resource leaks (RESOURCE_LEAK)
/src/test/test_crypto_slow.c: 683 in test_crypto_equix()
677
678 /* Solve phase: Make sure the test vector matches */
679 memset(&output, 0xa5, sizeof output);
680 equix_result result;
681 result = equix_solve(solve_ctx, challenge_literal,
682 challenge_len, &output);
>>> CID 1531835: Resource leaks (RESOURCE_LEAK)
>>> Variable "solve_ctx" going out of scope leaks the storage it points to.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This is a change intended for 0.4.7 maintenance as well as main.
The CI builds use Debian Buster which is now end of life, and I was
experiencing inconsistent CI failures with accessing its security update
server. I wanted to update CI to a distro that isn't EOL, and Bullseye
is the current stable release of Debian.
This opened up a small can of worms that this commit also deals with.
In particular there's a docker engine bug that we work around by
removing the docker-specific apt cleanup script if it exists, and
there's a new incompatibility between tracing and sandbox support.
The tracing/sandbox incompatibility itself had two parts:
- The membarrier() syscall is used to deliver inter-processor
synchronization events, and the external "userspace-rcu"
data structure library would make assumptions that if membarrier
is available at initialization it always will be. This caused
segfaults in some cases when running trace + sandbox. Resolved this
by allowing membarrier entirely, in the sandbox.
- userspace-rcu also assumes it can block signals, and fails
hard if this can't be done. We already include a similar carveout
to allow this in the sandbox for fragile-hardening, so I extended
that to cover tracing as well.
Addresses issue #40799
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This exposes the new fallback behavior in hashx via a new AUTOBOOL
configuration option, available to both clients and services. The
default should be fine for nearly everyone, but it might be necessary
to enable or disable the compiler manually for diagnostic purposes.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This change adapts the hs_pow layer and unit tests to API changes
in hashx and equix which modify the fault recovery responsibilities
and reporting behaivor.
This and the corresponding implementation changes in hashx and equix
form the fix for #40794, both solving the segfault and giving hashx a
way to report those failures up the call chain without them being
mistaken for a different error (unusable seed) that would warrant a
retry.
To handle these new late compiler failures with a minimum of fuss or
inefficiency, the failover is delegated to the internals of hashx and
tor needs only pass in a EQUIX_CTX_TRY_COMPILE flag to get the behavior
that tor was previously responsible for implementing.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This change adapts Equi-X to the corresponding HashX API changes that
added HASHX_TRY_COMPILE. The new regularized HashX return codes are
reflected by revised corresponding Equi-X return codes.
Both solve and verify operations now return an error/success code, and a
new equix_solutions_buffer struct includes both the solution buffer
and information about the solution count and hashx implementation.
With this change, it's possible to discern between hash construction
failures (invalid seed) and some external error like an mprotect()
failure.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This is an API breaking change to hashx, which modifies the error
handling strategy. The main goal here is to allow unproblematic
recovery from hashx_compile failures.
hashx_alloc can no longer fail for reasons other than memory
allocation. All platform-specific compile failures are now reported via
hashx_make(), in order to both allow later failure and avoid requiring
users of the API to maintain and test multiple failure paths.
Note that late failures may be more common in actual use than early
failures. Early failures represent architectures other than x86_64 and
aarch64. Late failures could represent a number of system configurations
where syscalls are restricted.
The definition of a hashx context no longer tries to overlay storage for
the different types of program, and instead allows one context to always
contain an interpretable description of the program as well as an optional
buffer for compiled code.
The hashx_type enum is now used to mean either a specific type of hash
function or a type of hashx context. You can allocate a context for use
only with interpreted or compiled functions, or you can use
HASHX_TRY_COMPILE to prefer the compiler with an automatic fallback on
the interpreter. After calling hashx_make(), the new hashx_query_type()
can be used if needed to determine which implementation was actually
chosen.
The error return types have been overhauled so that everyone uses the
hashx_result enum, and seed failures vs compile failures are always
clearly distinguishable.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This is a minimal portion of the fix for tor issue #40794, in which
hashx segfaults due to denial of mprotect() syscalls at runtime.
Prior to this fix, hashx makes the assumption that if the JIT is
supported on the current architecture, it will also be usable at
runtime. This isn't true if mprotect fails on linux, which it may for
various reasons: the tor built-in sandbox, the shadow simulator, or
external security software that implements a syscall filter.
The necessary error propagation was missing internally in hashx,
causing us to obliviously call into code which was never made
executable. With this fix, hashx_make() will instead fail by returning
zero.
A proper fix will require API changes so that callers can discern
between different types of failures. Zero already means that a program
couldn't be constructed, which requires a different response: choosing a
different seed, vs switching implementations. Callers would also benefit
from a way to use one context (with its already-built program) to
run in either compiled or interpreted mode.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
The code style in equix and hashx sometimes uses bitwise operators
in place of logical ones in cases where it doesn't really matter
either way. This sometimes annoys our static analyzer tools.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This is an additional test case for test_sandbox that runs a small
subset of test_crypto_equix() inside the syscall sandbox, where
mprotect() is filtered.
It's reasonable for the sandbox to disallow JIT. We could revise this
policy if we want, but it seems a good default for now. The problem
in issue 40794 is that both equix and hashx need improvements in their
API to handle failures after allocation time, and this failure occurs
while the hash function is being compiled.
With this commit only, the segfault from issue 40794 is reproduced.
Subsequent commits will fix the segfault and revise the API.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This patch makes Tor log state transitions within the PT layer at the
info log-level. This should make it easier to figure out if Tor ends up
in a strange state.
See: tpo/core/tor#33669
This started as a response to ticket #40792 where Coverity is
complaining about a potential year 2038 bug where we cast time_t from
approx_time() to uint32_t for use in token_bucket_ctr.
There was a larger can of worms though, since token_bucket really
doesn't want to be using wallclock time here. I audited the call sites
for approx_time() and changed any that used a 32-bit cast or made
inappropriate use of wallclock time. Things like certificate lifetime,
consensus intervals, etc. need wallclock time. Measurements of rates
over time, however, are better served with a monotonic timer that does
not try and sync with wallclock ever.
Looking closer at token_bucket, its design is a bit odd because it was
initially intended for use with tick units but later forked into
token_bucket_rw which uses ticks to count bytes per second, and
token_bucket_ctr which uses seconds to count slower events. The rates
represented by either token bucket can't be lower than 1 per second, so
the slower timer in 'ctr' is necessary to represent the slower rates of
things like connections or introduction packets or rendezvous attempts.
I considered modifying token_bucket to use 64-bit timestamps overall
instead of 32-bit, but that seemed like an unnecessarily invasive change
that would grant some peace of mind but probably not help much. I was
more interested in removing the dependency on wallclock time. The
token_bucket_rw timer already uses monotonic time. This patch converts
token_bucket_ctr to use monotonic time as well. It introduces a new
monotime_coarse_absolute_sec(), which is currently the same as nsec
divided by a billion but could be optimized easily if we ever need to.
This patch also might fix a rollover bug.. I haven't tested this
extensively but I don't think the previous version of the rollover code
on either token bucket was correct, and I would expect it to get stuck
after the first rollover.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This adds a bit more to hs_descriptor/test_decode_descriptor, mostly
testing pow-params and triggering the tor_assert() in issue #40793.
There was no mechanism for adding arbitrary test strings to the
encrypted portion of the desc without duplicating encode logic. One
option might be to publicize get_inner_encrypted_layer_plaintext enough
to add a mock implementation. In this patch I opt for what seems like
the simplest solution, at the cost of a small amount of #ifdef noise.
The unpacked descriptor grows a new test-only member that's used for
dropping arbitrary data in at encode time.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
The descriptor validation table had an out of date minimum length
for pow-params (3) whereas the spec and the current code expect at
least 4 parameters. This was an opportunity for a malicious service
to cause an assert failure in clients which attempted to parse its
descriptor.
Addresses issue #40793
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This should fix one of the warnings in issue #40792.
I was sloppy with freeing memory in the failure cases for
test_crypto_hashx. ASAN didn't notice but coverity did. Okay, I'll eat
my vegetables and put hashx_ctx's deinit in an upper scope and use
'goto done' correctly like a properly diligent C programmer.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This addresses one of the warnings in issue #40792. As far as I can tell
this is a false positive, since the use of "ctx->type" in hashx_free()
can only be hit after the unioned code/program pointer is non-NULL. It's
no big deal to zero this value explicitly to silence the warning though.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
Switch rate limiting will likely be helpful for limiting OOQ, but according to
shadow it was the cause of slower performance in Hong Kong endpoints.
So let's disable it, and then optimize for OOQ later.
This is a protocol breaking change that implements nickm's
changes to prop 327 to add an algorithm personalization string
and blinded HS id to the EquiX challenge string for our onion
service client puzzle.
This corresponds with the spec changes in torspec!130,
and it fixes a proposed vulnerability documented in
ticket tor#40789.
Clients and services prior to this patch will no longer
be compatible with the proposed "v1" proof-of-work protocol.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This lets controller apps see the outgoing PoW effort on client
circuits, and the validated effort received on an incoming service
circuit.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This error path with the "PoW cpuworker returned with no solution.
Will retry soon." message was usually lying. It's concerning
now because we expect to always find a solution no matter how
long it takes, rather than re-enter the solver repeatedly, so any
exit without a solution is a sign of a problem.
In fact when this error path gets hit, we are usually missing a
circuit instead because the request is quite old and the circuits
have been destroyed. This is not an emergency, it's just a sign
of client-side overload.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
i think we're done with these?
and swap in a nonfatal assert to replace one of the comments.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This dequeue path has been through a few revisions by now, first
limiting us to a fixed number per event loop callback, then an
additional limit based on a token bucket, then the current version
which has only the token bucket.
The thinking behing processing multiple requests per callback was to
optimize our usage of libevent, but in effect this creates a
prioritization problem. I think even a small fixed limit would be less
reliable than just backing out this optimization and always allowing
other callbacks to interrupt us in-between dequeues.
With this patch I'm seeing much smoother queueing behavior when I add
artificial delays to the main thread in testing.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This centralizes the logic for deciding on these magic thresholds,
and tries to reduce them to just two: a min and max. The min should be a
"nearly empty" threshold, indicating that the queue only contains work
we expect to be able to complete very soon. The max level triggers a
bulk culling process that reduces the queue to half that amount.
This patch calculates both thresholds based on the torrc pqueue rate
settings if they're present, and uses generic defaults if the user asked
for an unlimited dequeue rate in torrc.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
The worker job queue for hs_pow needs what's effectively a weak pointer
to two circuits, but there's not a generic mechanism for this in c-tor.
The previous approach of circuit_get_by_global_id() is straightforward
but not efficient. These global IDs are normally only used by the
control port protocol. To reduce the number of O(N) lookups we have over
the whole circuit list, we can use hs_circuitmap to look up the rend
circuit by its auth cookie.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This is trying to be an AIMD event-driven algorithm, but we ended up with
two different add paths with diverging behavior. This fix makes the AIMD
events more explicit, and it fixes an earlier behavior where the effort
could be decreased (by the add/recalculate branch) even when the pqueue
was not emptying at all. With this patch we shouldn't drop down to an
effort of zero as long as even low-effort attacks are flooding the
pqueue.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
The goal of this patch is to add an additional mechanism for adjusting
PoW effort upwards, where clients rather than services can choose to
solve their puzzles at a higher effort than what was suggested in the
descriptor.
I wanted to use hs_cache's existing unreachability stats to drive this
effort bump, but this revealed some cases where a circuit (intro or
rend) closed early on can end up in hs_cache with an all zero intro
point key, where nobody will find it. This moves intro_auth_pk
initialization earlier in a couple places and adds nonfatal asserts to
catch the problem if it shows up elsewhere.
The actual effort adjustment method I chose is to multiply the suggested
effort by (1 + unresponsive_count), then ensure the result is at least
1. If a service has suggested effort of 0 but we fail to connect,
retries will all use an effort of 1. If the suggestion was 50, we'll try
50, 100, 150, 200, etc. This is bounded both by our client effort limit
and by the limit on unresponsive_count (currently 5).
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
Asan catches this pretty readily when ending a service gracefully while
a DoS is in progress and the queue is full of items that haven't yet
timed out.
The module boundaries in hs_circuit are quite fuzzy here, but I'm trying
to follow the vibe of the existing hs_pow code.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
500 was quite low, but this limit was helpful when the suggested-effort
estimation algorithm was likely to give us large abrupt increases. Now
that this should be fixed, let's allow spending a bit more time on the
client puzzles if it's actually necessary.
Solving a puzzle with effort=10000 usually completes within a minute
on my old x86_64 machine. We may want to fine tune this further, and it
should probably be made into a config option.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
I don't think the concept of "minimum effort" is really useful to us,
so this patch removes it entirely and consequentially changes the way
that "total" effort is calculated so that we don't rely on any minimum
and we instead ramp up effort no faster than necessary.
If at least some portion of the attack is conducted by clients that
avoid PoW or provide incorrect solutions, those (potentially very
cheap) attacks will end up keeping the pqueue full. Prior to this patch,
that would cause suggested efforts to be unnecessarily high, because
rounding these very cheap requests up to even a minimum of 1 will
overestimate how much actual attack effort is being spent.
The result is that this patch is a simplification and it also allows a
slower start, where PoW effort jumps up either by a single unit or by an
amount calculated from actual effort in the queue.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This patch is intended to clarify the points at which we convert
between the internal representation of an equix_solution and a portable
but opaque byte array representation.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This fixes a failure that was showing up on i386 Debian hosts
with sandboxing enabled, now that cpuworker is enabled on clients.
We already had allowances for creating threads and creating stacks
in the sandbox, but prot_none (probably used for a stack guard)
was not allowed so thread creation failed.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This leak was showing up in address sanitizer runs of test_hs_pow,
but it will also happen during normal operation as seeds are rotated.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This is more consistent with the specification, and it's much
less confusing with endianness. This resolves the underlying
cause of the earlier byte-swap. This patch itself does not
change the wire protocol at all, it's just tidying up the
types we use at the trunnel layer.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
We were using a native uint128_t to represent the hs_pow nonce,
but as the comments note it's more portable and more flexible to
use a byte array. Indeed the uint128_t was a problem for 32-bit
platforms. This swaps in a new implementation that uses multiple
machine words to implement the nonce incrementation.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
In proposal 327, "POW_SEED is the first 4 bytes of the seed used".
The proposal doesn't specifically mention the data type of this field,
and the code in hs_pow so far treats it as an integer but semantically
it's more like the first four bytes of an already-encoded little endian
blob. This leads to a byte swap, since the type confusion takes place
in a little-endian subsystem but the wire encoding of seed_head uses
tor's default of big endian.
This patch does not address the underlying type confusion, it's a
minimal change that only swaps the byte order and updates unit tests
accordingly. Further changes will clean up the data types.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
Much faster per-hash, affects both verify and solve.
Only implemented on x86_64 and aarch64, other platforms
always use the interpreted version of hashx.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This adds test vectors for the overall client puzzle at the
hs_pow and hs_cell layers.
These are similar to the crypto/equix tests, but they also cover
particulars of our hs_pow format like the conversion to byte arrays,
the replay cache, the effort test, and the formatting of the equix
challenge string.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
Fixes some type nitpicks that show up in Tor development builds,
which usually run with -Wall -Werror. Tested on x86_64 and aarch64
for clean build and passing equix-tests + hashx-tests.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This replaces the sketchy cmake invocation we had inside configure
The libs are always built and always used in unit tests, but only
included in libtor and tor when --enable-gpl is set.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This forgoes another external library dependency, and instead
introduces a compatibility header so that interested parties
(who already depend on equix, like hs_pow and unit tests) can
use the implementation of blake2b included in hashx.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This adds test vectors for the Equi-X proof of work algorithm and the
Hash-X function it's based on. The overall Equi-X test takes about
10 seconds to run on my machine, so it's in test_crypto_slow. The hashx
test still covers both the compiled and interpreted versions of the
hash function.
There aren't any official test vectors for Equi-X or for its particular
configuration of Hash-X, so I made some up based on the current
implementation.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
I'm planning on swapping blake2b implementations, and this test
is intended to prevent regressions. Right now blake2b is only used by
hs_pow.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This adds a new "pow" module for the user-visible proof
of work support in ./configure, and this disables
src/feature/hs/hs_pow at compile-time.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This change on its own doesn't use the option for anything, but
it includes support for configure and a message in 'tor --version'
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This was apparently misinterpreting "zero solutions" as an error
instead of just moving on to the next nonce. Additionally, equix
could have been returning up to 8 solutions and we would only
give one of those a chance to succeed.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
We may want to choose something larger eventually, but 20 seemed
much too large. Very low nonzero efforts are still useful against
a script kiddie level DoS attack.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This adds a token bucket ratelimiter on the dequeue side
of hs_pow's priority queue. It adds config options and docs
for those options. (HiddenServicePoWQueueRate/Burst)
I'm testing this as a way to limit the overhead of circuit
creation when we're experiencing a flood of rendezvous requests.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
Without this check, we never actually refetch the hs descriptor
when PoW parameters expire, because can_client_refetch_desc
deems the descriptor to be still good.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
Adds two new metrics for hs_pow, and an internal parameter within
hs_metrics for implementing gauge parameters that reset before
every update.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
We mark the intro circuit with a new flag saying that the pow is
in the cpuworker queue. When the cpuworker comes back, it either
has a solution, in which case we proceed with sending the intro1
cell, or it has no solution, in which case we unmark the intro
circuit and let the whole process restart on the next iteration of
connection_ap_handshake_attach_circuit().
into two parts:
* a "consider whether to send an intro2 cell" part (now called
consider_sending_introduce1()), and
* an "actually send it" (now called send_introduce1()).