Commit Graph

38781 Commits

Roger Dingledine
eba9190933 compute the client-side pow in a cpuworker thread
We mark the intro circuit with a new flag saying that the pow is
in the cpuworker queue. When the cpuworker comes back, it either
has a solution, in which case we proceed with sending the intro1
cell, or it has no solution, in which case we unmark the intro
circuit and let the whole process restart on the next iteration of
connection_ap_handshake_attach_circuit().
2023-05-10 07:37:11 -07:00
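
A minimal sketch of the mark-and-requeue flow described above, in C. The identifiers (intro_circ_t, pow_in_cpuworker, queue_pow_work()) are hypothetical stand-ins, not the actual C-Tor names:

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical client-side intro circuit state. */
    typedef struct intro_circ_t {
      bool pow_in_cpuworker;  /* solve request is sitting in the worker queue */
    } intro_circ_t;

    typedef struct pow_solution_t { unsigned char nonce[16]; } pow_solution_t;

    /* Attach-time check: queue the solve once, then wait for the worker. */
    static void
    maybe_queue_pow(intro_circ_t *circ)
    {
      if (circ->pow_in_cpuworker)
        return;  /* already queued; nothing to do on this pass */
      circ->pow_in_cpuworker = true;
      /* queue_pow_work(circ);  -- hand the puzzle to a cpuworker */
    }

    /* Worker reply: either proceed with INTRODUCE1, or unmark the circuit
     * so the next connection_ap_handshake_attach_circuit() pass retries. */
    static void
    pow_worker_done(intro_circ_t *circ, const pow_solution_t *sol)
    {
      circ->pow_in_cpuworker = false;
      if (sol != NULL) {
        /* send_introduce1(circ, sol); */
      }
    }
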
Roger Dingledine
aa41d4b939 refactor send_introduce1()
into two parts:

* a "consider whether to send an intro2 cell" part (now called
consider_sending_introduce1()), and

* an "actually send it" (now called send_introduce1()).
2023-05-10 07:37:11 -07:00
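
The shape of the split, as a hedged sketch; the real C-Tor signatures and precondition checks differ:

    /* Opaque stand-in for C-Tor's origin circuit type. */
    typedef struct origin_circuit_t origin_circuit_t;

    /* The "actually send it" half. */
    static int
    send_introduce1(origin_circuit_t *circ)
    {
      (void)circ;
      /* ... build and relay the INTRODUCE1 cell ... */
      return 0;
    }

    /* The "consider" half: check preconditions (descriptor at hand, pow
     * solved or not required, circuit state sane), then call the sender. */
    static int
    consider_sending_introduce1(origin_circuit_t *circ)
    {
      /* if (pow_still_pending(circ))  -- hypothetical check
       *   return 0;                   -- not ready; retry on a later pass */
      return send_introduce1(circ);
    }
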
Roger Dingledine
a5b0c7b404 start the cpuworkers always, even for clients
prepares the way for client-side pow cpuworkers

also happens to resolve bug https://bugs.torproject.org/tpo/core/tor/40617
(introduced in 0.4.7.4-alpha), because now we survive initializing the
cpuworker subsystem when we're not a relay.
2023-05-10 07:37:11 -07:00
Roger Dingledine
0716cd7cb2 allow suggested effort to be 0
First (both client and service), make descriptor parsing not fail when
suggested_effort is 0.

Second (client side), if we get a descriptor with a pow_params section
but with suggested_effort of 0, treat it as not requiring a pow.

Third (service side), when deciding whether the suggested effort has
changed, don't treat "previous suggested effort 0, new suggested effort 0"
as a change.

An alternative design to resolve 'first' and 'second' above would be
to omit the pow_params from the descriptor when suggested_effort is 0,
so clients never see the pow_params and thus don't compute a pow. But
I decided to include a pow_params with an explicit suggested_effort
of 0, since this way the client knows the seed etc so they can solve
a higher-effort pow if they want. The tradeoff is that the descriptor
reveals whether HiddenServicePoWDefensesEnabled is set to 1 for this onion
service, even if the AIMD calculation is currently requiring effort 0.
2023-05-10 07:37:11 -07:00
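
The three rules above, condensed into a hedged sketch; the struct and field names are illustrative, not C-Tor's:

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct pow_params_t {
      bool present;               /* descriptor carried a pow_params section */
      uint32_t suggested_effort;
    } pow_params_t;

    /* Client side: pow_params with suggested_effort 0 means "no pow
     * required", but the seed is still there for a voluntary solve. */
    static bool
    client_pow_required(const pow_params_t *pp)
    {
      return pp->present && pp->suggested_effort > 0;
    }

    /* Service side: only an actual change of value counts; 0 -> 0 is
     * not a change. */
    static bool
    suggested_effort_changed(uint32_t prev, uint32_t next)
    {
      return prev != next;
    }
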
Mike Perry
d36144ba31 Initialize startup effort at 0.
If it works correctly, auto-tuning should set a non-zero effort once
an attack begins.
2023-05-10 07:37:11 -07:00
Mike Perry
ec9e95cf1e Implement AIMD effort estimation.
Now, pow should auto-enable and auto-disable itself.
2023-05-10 07:37:11 -07:00
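
AIMD here means additive-increase/multiplicative-decrease of the suggested effort, recomputed each descriptor update period. A toy sketch; the step sizes and the overload test are illustrative, not the tuned values from the implementation:

    #include <stdbool.h>
    #include <stdint.h>

    /* Run once per descriptor update period. */
    static uint32_t
    aimd_update_effort(uint32_t effort, bool overloaded)
    {
      if (overloaded)
        effort += 10;   /* additive increase while the queue is saturated */
      else
        effort /= 2;    /* multiplicative decrease as the attack subsides */
      return effort;    /* reaching 0 effectively disables the pow */
    }
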
Mike Perry
5b3a067fe3 Replace the constant bottom-half rate with handled count.
This allows us to more accurately estimate effort, based on real bottom-half
throughput over the duration of a descriptor update.
2023-05-10 07:37:11 -07:00
Mike Perry
121766e6b8 Make the thing compile. 2023-05-10 07:37:11 -07:00
Roger Dingledine
e605620744 clients defend themselves from absurd pow requests
if we're asked for an effort higher than our cap, we just solve it at the cap

i picked 500 for now but maybe we'll pick a better number in the future.
2023-05-10 07:37:11 -07:00
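
The cap itself is just a clamp; a sketch, with the constant name invented here:

    #include <stdint.h>

    #define CLIENT_MAX_POW_EFFORT 500  /* the cap picked above */

    static uint32_t
    clamp_client_effort(uint32_t suggested)
    {
      return suggested > CLIENT_MAX_POW_EFFORT ?
             CLIENT_MAX_POW_EFFORT : suggested;
    }
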
Roger Dingledine
ec7495d35a log_err is reserved for fatal failures 2023-05-10 07:37:11 -07:00
Roger Dingledine
e436ce2a3c drop the default min effort to 20
effort 100 is really quite expensive
2023-05-10 07:37:11 -07:00
Roger Dingledine
a575e35c17 sort pqueue ties by time-added
our pqueue implementation does bizarre unspecified things with the
ordering of equal elements. it certainly doesn't provide the "first in,
first out" behavior i was expecting.

now make it explicit by saying that "equal-effort, added-earlier" is
higher priority.
2023-05-10 07:37:11 -07:00
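
A hedged sketch of such a comparator (negative return means a is dequeued first; the record type is invented here, not C-Tor's):

    #include <stdint.h>
    #include <time.h>

    typedef struct rend_request_t {
      uint32_t effort;  /* pow effort attached to the request */
      time_t added;     /* when it entered the queue */
    } rend_request_t;

    /* Higher effort first; among equal efforts, earlier-added first,
     * which gives the explicit FIFO behavior described above. */
    static int
    rend_request_cmp(const void *a_, const void *b_)
    {
      const rend_request_t *a = a_, *b = b_;
      if (a->effort != b->effort)
        return a->effort > b->effort ? -1 : 1;
      if (a->added != b->added)
        return a->added < b->added ? -1 : 1;
      return 0;
    }
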
Roger Dingledine
13f6258245 rate-limit low-effort rendezvous responses
specifically, if we have 16 in-flight rend circs, and the next request
at the top of the pqueue has effort lower than our suggested effort,
then don't launch it yet.

this way we always launch adequate-effort requests immediately, and
we always handle *some* low-effort requests, but we are ready at any
moment to handle a few new adequate-effort requests.
2023-05-10 07:37:11 -07:00
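
The launch decision reduces to a small predicate; a sketch, with the limit spelled as a constant here:

    #include <stdbool.h>
    #include <stdint.h>

    #define MAX_INFLIGHT_REND_CIRCS 16  /* the in-flight cap named above */

    /* Adequate-effort requests launch immediately; low-effort requests
     * launch only while we have in-flight headroom to spare. */
    static bool
    ok_to_launch_rend(uint32_t top_effort, uint32_t suggested_effort,
                      int inflight)
    {
      if (top_effort >= suggested_effort)
        return true;
      return inflight < MAX_INFLIGHT_REND_CIRCS;
    }
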
Roger Dingledine
dec3a0af7a make the rend_pqueue_cb event be postloop
this change makes us reach the callback *after* each mainloop
run, rather than as the next event to run immediately after
activation.

with the old behavior, we were starving everything else to drain the
pqueue entirely, each time we got a new intro2 cell.

now we will at least get to other activities as well.
2023-05-10 07:37:11 -07:00
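
A toy model of the scheduling difference (C-Tor's actual constructors are the mainloop_event_new() / mainloop_event_postloop_new() wrappers; this sketch only imitates the ordering):

    #include <stdbool.h>

    typedef struct toy_event_t {
      bool postloop;    /* run after the loop pass, not right away */
      bool activated;
      void (*cb)(void);
    } toy_event_t;

    static void
    toy_loop_once(toy_event_t *ev)
    {
      if (ev->activated && !ev->postloop) {
        ev->activated = false;
        ev->cb();   /* immediate: preempts all other pending work */
      }
      /* ... service every other pending event here ... */
      if (ev->activated && ev->postloop) {
        ev->activated = false;
        ev->cb();   /* postloop: other activities got their turn first */
      }
    }
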
Roger Dingledine
b95bd5017f track how many in-flight hs-side rend circs
not used in decision-making yet, but it's all ready to use in a
"don't dequeue any more if we have too many in-flight" kind of way
2023-05-10 07:37:11 -07:00
Roger Dingledine
5e768d5cb9 we were sorting our pqueue the wrong way
i.e. we were putting higher effort intro2 cells at the *end*
2023-05-10 07:37:11 -07:00
Roger Dingledine
d0c2d4cb43 add a log line for when client succeeds 2023-05-10 07:37:11 -07:00
Roger Dingledine
4e55f28220 bump up some log messages for easier debugging 2023-05-10 07:37:11 -07:00
Roger Dingledine
8042379c44 new design for handling too many pending rend reqs
now we let ourselves queue up to twice as many as we expect, and when
we get to the limit we make a new pqueue and move over the first n
elements that we like most.

(the old approach, of calling SMARTLIST_DEL_CURRENT_KEEPORDER() on
elements in a pqueue, will destroy its heap property.)

we also discard elements that are too old, either during the trimming
process or if they come up as the next request to respond to.

lastly, fix a fencepost error on how many rend reqs we would handle
per iteration.
2023-05-10 07:37:11 -07:00
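
A self-contained toy of the trim-by-rebuild idea, using a sorted array of pointers where C-Tor uses its smartlist pqueue; sorting stands in for repeatedly popping the best element:

    #include <stdlib.h>
    #include <string.h>

    typedef struct req_t { unsigned effort; } req_t;

    static int
    cmp_effort_desc(const void *a, const void *b)
    {
      unsigned ea = (*(const req_t *const *)a)->effort;
      unsigned eb = (*(const req_t *const *)b)->effort;
      return ea > eb ? -1 : (ea < eb ? 1 : 0);
    }

    /* Keep the best `keep` of `*n` requests in a fresh array and free
     * the rest; never delete out of the middle of a live heap. */
    static req_t **
    trim_requests(req_t **reqs, size_t *n, size_t keep)
    {
      qsort(reqs, *n, sizeof(req_t *), cmp_effort_desc);
      if (*n > keep) {
        for (size_t i = keep; i < *n; i++)
          free(reqs[i]);          /* discard the low-effort tail */
        *n = keep;
      }
      req_t **fresh = malloc(*n * sizeof(req_t *));
      memcpy(fresh, reqs, *n * sizeof(req_t *));
      free(reqs);
      return fresh;
    }
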
Roger Dingledine
85cba057e7 make a log message clearer about our actual intent 2023-05-10 07:37:11 -07:00
Roger Dingledine
4571faf0c3 pass time around as a parameter
should help with unit testing
2023-05-10 07:37:11 -07:00
David Goulet
047f8c63ee hs: Maximum rend request and trimming of the queue
Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-05-10 07:37:11 -07:00
David Goulet
bc9fe5a6f8 hs: Handle multiple rend requests per mainloop run
Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-05-10 07:37:11 -07:00
David Goulet
c2f6b057b8 hs: Don't expire RP circuits to HS with PoW
Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-05-10 07:37:11 -07:00
David Goulet
35227a7a15 trunnel: Centralize the INTRO1 extension type
Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-05-10 07:37:11 -07:00
David Goulet
4eb783e97b hs: Priority queue for rendezvous requests
If PoW is enabled, use a priority queue ordered by effort for the rendezvous
requests hooked into the mainloop.

Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-05-10 07:37:11 -07:00
David Goulet
f0b63ca242 hs: Move rendezvous circuit data structure
When parsing an INTRODUCE2 cell, we extract data in order to launch the
rendezvous circuit. This commit creates a data structure just for that
data so it can be used by future commits for prop327, in order to copy
that data onto a priority queue instead of the whole intro data
structure, which contains pointers that could disappear.

Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-05-10 07:37:11 -07:00
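
A hedged sketch of the copy-by-value idea; field names and sizes are invented, not C-Tor's:

    #include <stdint.h>
    #include <string.h>
    #include <time.h>

    /* What the parser produced: pointers into memory that may be freed
     * before the queued request is serviced. */
    typedef struct intro_data_t {
      const uint8_t *cookie;
      const uint8_t *onion_key;
      uint32_t effort;
    } intro_data_t;

    /* What sits on the priority queue: an owned, by-value copy. */
    typedef struct pending_rend_t {
      uint8_t rendezvous_cookie[20];
      uint8_t onion_key[32];
      uint32_t pow_effort;
      time_t enqueued_at;
    } pending_rend_t;

    static void
    pending_rend_init(pending_rend_t *out, const intro_data_t *in,
                      time_t now)
    {
      memcpy(out->rendezvous_cookie, in->cookie,
             sizeof(out->rendezvous_cookie));
      memcpy(out->onion_key, in->onion_key, sizeof(out->onion_key));
      out->pow_effort = in->effort;
      out->enqueued_at = now;
    }
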
David Goulet
ca74530b40 hs: Setup service side PoW defenses
Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-05-10 07:37:11 -07:00
David Goulet
8b41e09a77 hs: Client now solves PoW if present
At this commit, the tor main loop solves it. We might consider moving
this to the CPU pool at some point.

Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-05-10 07:37:11 -07:00
David Goulet
26957b47ac hs: Descriptor support for PoW
Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-05-10 07:37:11 -07:00
David Goulet
51ce0bb6ef hs: Add solve and verify PoW functions
Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-05-10 07:37:10 -07:00
David Goulet
c611e328de hs: Add data structure needed for PoW 2023-05-10 07:37:10 -07:00
David Goulet
d79814f1b1 hs: PoW extension encoding
Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-05-10 07:37:10 -07:00
David Goulet
5ef811b7d0 trunnel: INTRODUCE1 PoW cell extension
Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-05-10 07:37:10 -07:00
David Goulet
95445f49f1 ext: Add Equi-X library
Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-05-10 07:37:10 -07:00
Mike Perry
9ee71eaf5a CID 1524707: Quiet coverity noise 2023-05-04 16:31:08 +00:00
Mike Perry
bdf4fef2db CID 1524706: Remove dead assignment 2023-05-04 16:31:08 +00:00
Mike Perry
33c3059c82 Handle infinite loop with only one bridge (or snowflake). 2023-05-04 16:31:08 +00:00
Mike Perry
61aa4c3657 Actually count exits with conflux support, rather than relays. 2023-04-18 16:51:07 +00:00
Hans-Christoph Steiner
7415cdefff gitlab-ci: fix apt conf syntax for Acquire::Retries
Acquire is its own group, not a subgroup of APT:
https://manpages.debian.org/buster/apt/apt.conf.5.en.html#THE_ACQUIRE_GROUP
2023-04-11 16:26:36 +02:00
David Goulet
2bb8988629 Fix cases where edge connections can stall.
During testing, we discovered two cases where edge connections can stall:
  1. Due to final data sitting in the edge inbuf when it was resumed
  2. Due to flag synchronization between the token bucket and XON/XOFF

The first issue has always existed in C-Tor, but we were able to tickle it
in scp testing. If the last data from the protocol fits in the inbuf but
is not large enough to send, and an XOFF or connection block arrives at
exactly that point, then when the edge connection resumes there will be
no data to read from the socket, and the inbuf can just sit there, never
draining.

We noticed the second issue along the way to finding the first. It seems
wrong, but it didn't seem to affect anything in practice.

These are extremely rare in normal operation, but with conflux, XON/XOFF
activity is more common, so we hit these.

Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-04-06 15:57:11 +00:00
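
A sketch of the fix for the first case: on resume, drain whatever is already parked in the inbuf before waiting on the socket. Types and names are illustrative, not C-Tor's:

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct edge_conn_t {
      size_t inbuf_len;   /* bytes parked in the inbuf */
    } edge_conn_t;

    static void
    process_inbuf(edge_conn_t *conn)
    {
      conn->inbuf_len = 0;  /* ... package the buffered data ... */
    }

    static void
    edge_resume(edge_conn_t *conn)
    {
      /* Even if the socket has nothing new to read, the final chunk
       * may already be buffered; drain it now instead of stalling. */
      if (conn->inbuf_len > 0)
        process_inbuf(conn);
      /* ... then re-enable reading on the socket ... */
    }
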
Mike Perry
7c70f713c3 Avoid closing dirty circs with active half-edges
In https://gitlab.torproject.org/tpo/core/tor/-/issues/40623, we changed the
DESTROY propagation to ensure memory was freed quickly at relays. This was a
good move, but it exacerbates the condition where a stream is closed on a
circuit, and then the circuit is immediately closed because it is dirty. This
creates a race between the DESTROY and the last data sent on the stream. This
race is visible in shadow, and does happen.

This could be backported. A better solution to these kinds of problems is to
create an ENDED cell, and not close any circuits until the ENDED comes back.
But this will also require thinking, since this ENDED cell can also get lost,
so some kind of timeout may be needed either way. The ENDED cell could just
allow us to have much longer timeouts for this case.
2023-04-06 15:57:11 +00:00
David Goulet
731a50c8c4 Prop#329: Add conflux to build
Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-04-06 15:57:11 +00:00
Mike Perry
8d4781e730 Prop#329 Tests: Add tests for the conflux pool 2023-04-06 15:57:11 +00:00
David Goulet
39c2927d6f Prop#329 Pool: Handle pre-building and using conflux sets.
Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-04-06 15:57:11 +00:00
Mike Perry
46e473f43e Prop#329 Pool: Avoid sharing Guards and Middles between circuits.
Conflux must not use the same Guard for each leg, nor the same middle.
2023-04-06 15:57:11 +00:00
David Goulet
336a24754d Prop#329 Pool: Handle linking, unlinking, and relaunching conflux circuit legs.
Signed-off-by: David Goulet <dgoulet@torproject.org>
2023-04-06 15:57:11 +00:00
Mike Perry
2f865b4bba Prop#329 streams: Handle stream usage with conflux
This adds utility functions to help with stream blocking decisions, as
well as cpath layer_hint checks for stream cell acceptance, and syncing
of stream lists for conflux circuits.

These functions are then called throughout the codebase to properly manage
conflux streams.
2023-04-06 15:57:11 +00:00
Mike Perry
21c861bfa3 Refactor stream blocking due to channel cell queues
Streams can get blocked on a circuit in two ways:
  1. When the circuit package window is full
  2. When the channel's cell queue is too high

Conflux needs to decouple stream blocking from both of these conditions,
because streams can continue on another circuit, even if the primary circuit
is blocked for either of these cases.

However, both conflux and congestion control need to know if the channel's
cell queue hit the highwatermark and is still draining, because this condition
is used by those components, independent of stream state.

Therefore, this commit renames the 'streams_blocked_on_chan' variable to
signify that it refers to the cell queue state, and also refactors the actual
stream blocking bits out, so they can be handled separately if conflux is
present.
2023-04-06 15:57:10 +00:00
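
A hedged sketch of the decoupling; the types are invented here, and the real code keeps more state than one bit:

    #include <stdbool.h>

    typedef struct chan_t {
      /* Renamed bit: "cell queue hit the highwatermark and is still
       * draining", consumed by conflux and congestion control alike. */
      bool cell_queue_blocked;
    } chan_t;

    typedef struct circ_t {
      chan_t *chan;
      bool conflux_has_other_leg;  /* streams can continue elsewhere */
    } circ_t;

    /* Stream blocking is now a separate decision layered on top of the
     * queue state, so a blocked leg need not block conflux streams. */
    static bool
    should_block_streams(const circ_t *circ)
    {
      if (!circ->chan->cell_queue_blocked)
        return false;
      return !circ->conflux_has_other_leg;
    }
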
Mike Perry
a4ee0c29ee Prop#329: Add purposes for conflux circuits
Because UNLINKED circuits must never be used for streams, but LINKED circuits
can be, we want these separate.
2023-04-06 15:57:10 +00:00