Commit Graph

38710 Commits

Author SHA1 Message Date
friendly73
3d5d8d59c1 Added relay prefix to new metrics functions 2023-05-25 11:03:35 -04:00
friendly73
7e57b9dbbf Fixed enum type not found in relay_stub 2023-05-25 11:03:35 -04:00
friendly73
5b76ce1843 Added void stubs for the relay metrics functions to fix building without relay module 2023-05-25 11:03:35 -04:00
friendly73
8cbfc90686 Fixed new arguments for metrics_store_add 2023-05-25 11:03:35 -04:00
friendly73
1899b6230d Removed getter abstraction and moved from rephist to relay_metrics. 2023-05-25 11:03:35 -04:00
friendly73
2f8a88448d Fixed est intro getter using wrong array 2023-05-25 11:03:35 -04:00
friendly73
24bc66f663 Fixed REND1 metric label value 2023-05-25 11:03:35 -04:00
friendly73
36076d3c46 Added INTRO and REND metrics for relay. 2023-05-25 11:03:35 -04:00
David Goulet
a1042f4873 Merge branch 'tor-gitlab/mr/443' 2023-05-25 10:50:15 -04:00
Alexander Færøy
5f432bff31 Add missing changes file for tor#33669.
See: tpo/core/tor#33669.
2023-05-25 10:50:11 -04:00
Alexander Færøy
506781d41e Restart PT processes when they die on us.
This patch forces a PT reconfigure of infant PT processes as part of the
PT process' exit handler.

See: tpo/core/tor#33669
2023-05-25 10:50:11 -04:00
Alexander Færøy
58f0e548ff Log at LD_PT instead of LD_GENERAL for PT process stdout lines.
See: tpo/core/tor#33669
2023-05-25 10:50:11 -04:00
Alexander Færøy
3338b34ec9 Only terminate PT processes that are running.
See: tpo/core/tor#33669
2023-05-25 10:50:11 -04:00
Alexander Færøy
0d51dfa605 Log name of managed proxy in exit handler.
This patch ensures that we can figure out which PT that terminated in
the PT exit handler.

See: tpo/core/tor#33669
2023-05-25 10:50:11 -04:00
Alexander Færøy
5118a8003b Log state transitions for Pluggable Transports
This patch makes Tor log state transitions within the PT layer at the
info log-level. This should make it easier to figure out if Tor ends up
in a strange state.

See: tpo/core/tor#33669
2023-05-25 10:50:11 -04:00
David Goulet
970a534f03 test: Fix parseconf to account for ClientUseIPv6 change for dirauth disabled
Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-05-25 10:20:12 -04:00
David Goulet
86bc3cc452 test: Fix parseconf to account for ClientUseIPv6 change
Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-05-25 09:21:23 -04:00
David Goulet
a2ec9a1199 Merge branch 'tor-gitlab/mr/711' 2023-05-24 11:45:40 -04:00
Micah Elizabeth Scott
23f4a28f97 token_bucket_ctr: replace 32-bit wallclock time with monotime
This started as a response to ticket #40792 where Coverity is
complaining about a potential year 2038 bug where we cast time_t from
approx_time() to uint32_t for use in token_bucket_ctr.

There was a larger can of worms though, since token_bucket really
doesn't want to be using wallclock time here. I audited the call sites
for approx_time() and changed any that used a 32-bit cast or made
inappropriate use of wallclock time. Things like certificate lifetime,
consensus intervals, etc. need wallclock time. Measurements of rates
over time, however, are better served with a monotonic timer that does
not try and sync with wallclock ever.

Looking closer at token_bucket, its design is a bit odd because it was
initially intended for use with tick units but later forked into
token_bucket_rw which uses ticks to count bytes per second, and
token_bucket_ctr which uses seconds to count slower events. The rates
represented by either token bucket can't be lower than 1 per second, so
the slower timer in 'ctr' is necessary to represent the slower rates of
things like connections or introduction packets or rendezvous attempts.

I considered modifying token_bucket to use 64-bit timestamps overall
instead of 32-bit, but that seemed like an unnecessarily invasive change
that would grant some peace of mind but probably not help much. I was
more interested in removing the dependency on wallclock time. The
token_bucket_rw timer already uses monotonic time. This patch converts
token_bucket_ctr to use monotonic time as well. It introduces a new
monotime_coarse_absolute_sec(), which is currently the same as nsec
divided by a billion but could be optimized easily if we ever need to.

This patch also might fix a rollover bug.. I haven't tested this
extensively but I don't think the previous version of the rollover code
on either token bucket was correct, and I would expect it to get stuck
after the first rollover.

Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
2023-05-24 11:43:11 -04:00
David Goulet
9976da9367 Merge branch 'tor-gitlab/mr/709' 2023-05-24 11:37:05 -04:00
David Goulet
0781c2968d Merge branch 'tor-gitlab/mr/710' 2023-05-24 11:12:22 -04:00
Micah Elizabeth Scott
71b2958a62 test_hs_descriptor: Add a test case that fails without the fix for 40793
This adds a bit more to hs_descriptor/test_decode_descriptor, mostly
testing pow-params and triggering the tor_assert() in issue #40793.

There was no mechanism for adding arbitrary test strings to the
encrypted portion of the desc without duplicating encode logic. One
option might be to publicize get_inner_encrypted_layer_plaintext enough
to add a mock implementation. In this patch I opt for what seems like
the simplest solution, at the cost of a small amount of #ifdef noise.
The unpacked descriptor grows a new test-only member that's used for
dropping arbitrary data in at encode time.

Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
2023-05-24 11:12:15 -04:00
David Goulet
6afe03aa51 Merge branch 'tor-gitlab/mr/708' 2023-05-24 11:03:47 -04:00
agowa338
ffb764949e ipv6: Flip ClientUseIPv6 to 1
Fixes #40785

Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-05-24 11:03:15 -04:00
David Goulet
8eae9f17ae metrics: Add ticket 40546 changes file and code fix
The MR was using an old function definition so the code fix is for that.

Closes #40546

Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-05-24 10:45:21 -04:00
David Goulet
21ec9017f6 Merge branch 'tor-gitlab/mr/698' 2023-05-24 10:40:25 -04:00
David Goulet
6bf56ac301 Merge branch 'tor-gitlab/mr/703' 2023-05-24 10:38:58 -04:00
Micah Elizabeth Scott
459b775a7e hs_pow: fix insufficient length check in pow-params
The descriptor validation table had an out of date minimum length
for pow-params (3) whereas the spec and the current code expect at
least 4 parameters. This was an opportunity for a malicious service
to cause an assert failure in clients which attempted to parse its
descriptor.

Addresses issue #40793

Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
2023-05-15 12:11:00 -07:00
Mike Perry
032d850a2c Add changes file for conflux. 2023-05-11 19:24:49 +00:00
Micah Elizabeth Scott
a3ff3155c2 test_crypto: avoid memory leak in some hashx test failures
This should fix one of the warnings in issue #40792.

I was sloppy with freeing memory in the failure cases for
test_crypto_hashx. ASAN didn't notice but coverity did. Okay, I'll eat
my vegetables and put hashx_ctx's deinit in an upper scope and use
'goto done' correctly like a properly diligent C programmer.

Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
2023-05-11 11:17:43 -07:00
Micah Elizabeth Scott
c71b6a14a3 equix: avoid a coverity warning in hashx_alloc()
This addresses one of the warnings in issue #40792. As far as I can tell
this is a false positive, since the use of "ctx->type" in hashx_free()
can only be hit after the unioned code/program pointer is non-NULL. It's
no big deal to zero this value explicitly to silence the warning though.

Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
2023-05-11 11:10:15 -07:00
Mike Perry
79dab29a05 Add torrc option for conflux client UX. 2023-05-11 18:05:28 +00:00
Mike Perry
0c11577987 Fix unit tests. 2023-05-11 18:05:28 +00:00
Mike Perry
a340acb492 Clean up UX decision logic; hardcode for browser UX case. 2023-05-11 18:02:51 +00:00
Roger Dingledine
34da50718a fix minor typos in conflux and pow areas 2023-05-11 13:09:34 -04:00
Mike Perry
c8341abf82 Clean up and disable switch rate limiting.
Switch rate limiting will likely be helpful for limiting OOQ, but according to
shadow it was the cause of slower performance in Hong Kong endpoints.

So let's disable it, and then optimize for OOQ later.
2023-05-10 17:49:51 +00:00
Mike Perry
a98b1d3ab6 Remove two conflux algs: maxrate and cwndrate.
Maxrate had slower throughput than lowrtt in Shadow, which is not too
surprising. We just wanted to test it.
2023-05-10 17:49:51 +00:00
Micah Elizabeth Scott
e643a70879 hs_pow: Modify challenge format, include blinded HS id
This is a protocol breaking change that implements nickm's
changes to prop 327 to add an algorithm personalization string
and blinded HS id to the EquiX challenge string for our onion
service client puzzle.

This corresponds with the spec changes in torspec!130,
and it fixes a proposed vulnerability documented in
ticket tor#40789.

Clients and services prior to this patch will no longer
be compatible with the proposed "v1" proof-of-work protocol.

Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
2023-05-10 07:41:37 -07:00
Micah Elizabeth Scott
138fd57072 hs_pow: add per-circuit effort information to control port
This lets controller apps see the outgoing PoW effort on client
circuits, and the validated effort received on an incoming service
circuit.

Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
2023-05-10 07:41:37 -07:00
Micah Elizabeth Scott
971de27c07 hs_pow: fix error path with outdated assumption
This error path with the "PoW cpuworker returned with no solution.
Will retry soon." message was usually lying. It's concerning
now because we expect to always find a solution no matter how
long it takes, rather than re-enter the solver repeatedly, so any
exit without a solution is a sign of a problem.

In fact when this error path gets hit, we are usually missing a
circuit instead because the request is quite old and the circuits
have been destroyed. This is not an emergency, it's just a sign
of client-side overload.

Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
2023-05-10 07:41:37 -07:00
Micah Elizabeth Scott
cba1ffb43a hs_pow: swap out some comments
i think we're done with these?
and swap in a nonfatal assert to replace one of the comments.

Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
2023-05-10 07:41:37 -07:00
Micah Elizabeth Scott
a13d7bd5e9 hs_pow: always give other events a chance to run between rend requests
This dequeue path has been through a few revisions by now, first
limiting us to a fixed number per event loop callback, then an
additional limit based on a token bucket, then the current version
which has only the token bucket.

The thinking behing processing multiple requests per callback was to
optimize our usage of libevent, but in effect this creates a
prioritization problem. I think even a small fixed limit would be less
reliable than just backing out this optimization and always allowing
other callbacks to interrupt us in-between dequeues.

With this patch I'm seeing much smoother queueing behavior when I add
artificial delays to the main thread in testing.

Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
2023-05-10 07:41:37 -07:00
Micah Elizabeth Scott
6023153631 hs_pow: modified approach to pqueue level thresholds
This centralizes the logic for deciding on these magic thresholds,
and tries to reduce them to just two: a min and max. The min should be a
"nearly empty" threshold, indicating that the queue only contains work
we expect to be able to complete very soon. The max level triggers a
bulk culling process that reduces the queue to half that amount.

This patch calculates both thresholds based on the torrc pqueue rate
settings if they're present, and uses generic defaults if the user asked
for an unlimited dequeue rate in torrc.

Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
2023-05-10 07:41:37 -07:00
Micah Elizabeth Scott
50313d114f hs_pow: faster hs_circuitmap lookup for rend in pow_worker_job_t
The worker job queue for hs_pow needs what's effectively a weak pointer
to two circuits, but there's not a generic mechanism for this in c-tor.
The previous approach of circuit_get_by_global_id() is straightforward
but not efficient. These global IDs are normally only used by the
control port protocol. To reduce the number of O(N) lookups we have over
the whole circuit list, we can use hs_circuitmap to look up the rend
circuit by its auth cookie.

Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
2023-05-10 07:41:37 -07:00
Micah Elizabeth Scott
a6138486f7 hs_pow: review feedback, use MAX for max_trimmed_effort
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
2023-05-10 07:41:37 -07:00
Micah Elizabeth Scott
ee63863dca hs_pow: Lower several logs from notice to info
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
2023-05-10 07:41:37 -07:00
Micah Elizabeth Scott
ff678d0fb5 hs_pow: update_suggested_effort fix and cleanup
This is trying to be an AIMD event-driven algorithm, but we ended up with
two different add paths with diverging behavior. This fix makes the AIMD
events more explicit, and it fixes an earlier behavior where the effort
could be decreased (by the add/recalculate branch) even when the pqueue
was not emptying at all. With this patch we shouldn't drop down to an
effort of zero as long as even low-effort attacks are flooding the
pqueue.

Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
2023-05-10 07:40:46 -07:00
Micah Elizabeth Scott
903c6cf1ab hs_pow: client side effort adjustment
The goal of this patch is to add an additional mechanism for adjusting
PoW effort upwards, where clients rather than services can choose to
solve their puzzles at a higher effort than what was suggested in the
descriptor.

I wanted to use hs_cache's existing unreachability stats to drive this
effort bump, but this revealed some cases where a circuit (intro or
rend) closed early on can end up in hs_cache with an all zero intro
point key, where nobody will find it. This moves intro_auth_pk
initialization earlier in a couple places and adds nonfatal asserts to
catch the problem if it shows up elsewhere.

The actual effort adjustment method I chose is to multiply the suggested
effort by (1 + unresponsive_count), then ensure the result is at least
1. If a service has suggested effort of 0 but we fail to connect,
retries will all use an effort of 1. If the suggestion was 50, we'll try
50, 100, 150, 200, etc. This is bounded both by our client effort limit
and by the limit on unresponsive_count (currently 5).

Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
2023-05-10 07:40:46 -07:00
Micah Elizabeth Scott
ac466a2219 hs_pow: leak fix, free the contents of pqueue entries in hs_pow_free_service_state
Asan catches this pretty readily when ending a service gracefully while
a DoS is in progress and the queue is full of items that haven't yet
timed out.

The module boundaries in hs_circuit are quite fuzzy here, but I'm trying
to follow the vibe of the existing hs_pow code.

Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
2023-05-10 07:38:29 -07:00
Micah Elizabeth Scott
ac29c7209d hs_pow: bump client-side effort limit from 500 to 10000
500 was quite low, but this limit was helpful when the suggested-effort
estimation algorithm was likely to give us large abrupt increases. Now
that this should be fixed, let's allow spending a bit more time on the
client puzzles if it's actually necessary.

Solving a puzzle with effort=10000 usually completes within a minute
on my old x86_64 machine. We may want to fine tune this further, and it
should probably be made into a config option.

Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
2023-05-10 07:38:29 -07:00