our pqueue implementation does bizarre unspecified things with
ordering of elements that are equal. it certainly doesn't do any
sort of "first in first out" property that i was expecting.
now make it explicit by saying that "equal-effort, added-earlier" is
higher priority.
specifically, if we have 16 in-flight rend circs, and the next
one at the top of the pqueue is lower than our suggested effort,
then don't launch it yet.
this way we always launch adequate-effort requests immediately, and
we always handle *some* low-effort requests, but we are ready at any
moment to handle a few new adequate-effort requests.
this change makes us reach the callback *after* each mainloop
run, rather than as the next event to run immediately after
activation.
with the old behavior, we were starving everything else to drain the
pqueue entirely, each time we got a new intro2 cell.
now we at least will get to other activities as well.
now we let ourselves queue up to twice as many as we expect, and when
we get to the limit we make a new pqueue and move over the first n
elements that we like most.
(the old approach, of calling SMARTLIST_DEL_CURRENT_KEEPORDER() on
elements in a pqueue, will destroy its heapify property.)
we also discard elements that are too old, either during the trimming
process or if they come up as the next request to respond to.
lastly, fix a fencepost error on how many rend reqs we would handle
per iteration.
If PoW are enabled, use a priority queue by effort for the rendezvous
requests hooked into the mainloop.
Signed-off-by: David Goulet <dgoulet@torproject.org>
When parsing an INTRODUCE2 cell, we extract data in order to launch the
rendezvous circuit. This commit creates a data structure just for that
data so it can be used by future commits for prop327 in order to copy
that data over a priority queue instead of the whole intro data data
structure which contains pointers that could dissapear.
Signed-off-by: David Goulet <dgoulet@torproject.org>
At this commit, the tor main loop solves it. We might consider moving
this to the CPU pool at some point.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This adds utility functions to help stream block decisions, as well as cpath
layer_hint checks for stream cell acceptance, and syncing stream lists
for conflux circuits.
These functions are then called throughout the codebase to properly manage
conflux streams.
This adds 2 histogram metrics for hidden services:
* `tor_hs_rend_circ_build_time` - the rendezvous circuit build time in milliseconds
* `tor_hs_intro_circ_build_time` - the introduction circuit build time in milliseconds
The text representation representation of the new metrics looks like this:
```
# HELP tor_hs_rend_circ_build_time The rendezvous circuit build time in milliseconds
# TYPE tor_hs_rend_circ_build_time histogram
tor_hs_rend_circ_build_time_bucket{onion="<elided>",le="1000.00"} 2
tor_hs_rend_circ_build_time_bucket{onion="<elided>",le="5000.00"} 10
tor_hs_rend_circ_build_time_bucket{onion="<elided>",le="10000.00"} 10
tor_hs_rend_circ_build_time_bucket{onion="<elided>",le="30000.00"} 10
tor_hs_rend_circ_build_time_bucket{onion="<elided>",le="60000.00"} 10
tor_hs_rend_circ_build_time_bucket{onion="<elided>",le="+Inf"} 10
tor_hs_rend_circ_build_time_sum{onion="<elided>"} 10824
tor_hs_rend_circ_build_time_count{onion="<elided>"} 10
# HELP tor_hs_intro_circ_build_time The introduction circuit build time in milliseconds
# TYPE tor_hs_intro_circ_build_time histogram
tor_hs_intro_circ_build_time_bucket{onion="<elided>",le="1000.00"} 0
tor_hs_intro_circ_build_time_bucket{onion="<elided>",le="5000.00"} 6
tor_hs_intro_circ_build_time_bucket{onion="<elided>",le="10000.00"} 6
tor_hs_intro_circ_build_time_bucket{onion="<elided>",le="30000.00"} 6
tor_hs_intro_circ_build_time_bucket{onion="<elided>",le="60000.00"} 6
tor_hs_intro_circ_build_time_bucket{onion="<elided>",le="+Inf"} 6
tor_hs_intro_circ_build_time_sum{onion="<elided>"} 9843
tor_hs_intro_circ_build_time_count{onion="<elided>"} 6
```
Signed-off-by: Gabriela Moldovan <gabi@torproject.org>
This adds a `reason` label to the `hs_intro_rejected_intro_req_count` and
`hs_rdv_error_count` metrics introduced in #40755.
Metric look up and intialization is now more a bit more involved. This may be
fine for now, but it will become unwieldy if/when we add more labels (and as
such will need to be refactored).
Also, in the future, we may want to introduce finer grained `reason` labels.
For example, the `invalid_introduce2` label actually covers multiple types of
errors that can happen during the processing of an INTRODUCE2 cell (such as
cell parse errors, replays, decryption errors).
Signed-off-by: Gabriela Moldovan <gabi@torproject.org>
This introduces a couple of new service side metrics:
* `hs_intro_rejected_intro_req_count`, which counts the number of introduction
requests rejected by the hidden service
* `hs_rdv_error_count`, which counts the number of rendezvous errors as seen by
the hidden service (this number includes the number of circuit establishment
failures, failed retries, end-to-end circuit setup failures)
Closes#40755. This partially addresses #40717.
Signed-off-by: Gabriela Moldovan <gabi@torproject.org>
Directory authorities now include their AuthDirMaxServersPerAddr
config option in the consensus parameter section of their vote. Now
external tools can better predict how they will behave.
In particular, the value should make its way to the
https://consensus-health.torproject.org/#consensusparams page.
Once enough dir auths vote this param, they should also compute a
consensus value for it in the consensus document. Nothing uses this
consensus value yet, but we could imagine having dir auths consult it
in the future.
Implements ticket 40753.
This updates the docs to stop suggesting `pk` can be NULL, as that doesn't seem
to be the case anymore (`tor_assert(pk)`).
Signed-off-by: Gabriela Moldovan <gabi@torproject.org>
We cap our number of CPU worker threads to at least 2 even if we have a
single core. But also, before we used to always add one extra thread
regardless of the number of core.
This meant that we were off when re-using the get_num_cpus() function
when calculating our onionskin work overhead because we were always off
by one.
This commit makes it that we always use the number of thread our actual
thread pool was configured with.
Fixes#40719
Signed-off-by: David Goulet <dgoulet@torproject.org>
Created and Rejected connections are ever going up counters. While
Opened connections are gauges going up and down.
Fixes#40712
Signed-off-by: David Goulet <dgoulet@torproject.org>
This change mitigates DNS-based website oracles by making the time that
a domain name is cached uncertain (+- 4 minutes of what's measurable).
Resolves TROVE-2021-009.
Fixes#40674
This is part of the fast path so we need to cache consensus parameters
instead of querying it everytime we need to learn a value.
Part of #40704
Signed-off-by: David Goulet <dgoulet@torproject.org>
Until now, there was this magic number (64) used as the maximum number
of tasks a CPU worker can take at once.
This commit makes it a consensus parameter so our future selves can
think of a better value depending on network conditions.
Part of #40704
Signed-off-by: David Goulet <dgoulet@torproject.org>
Transform the hardcoded value ONIONQUEUE_WAIT_CUTOFF into a consensus
parameter so we can control it network wide.
Closes#40704
Signed-off-by: David Goulet <dgoulet@torproject.org>
This also incidently removes a use of uninitialized stack data from the
connection_or_set_ext_or_identifier() function.
Fixes#40648
Signed-off-by: David Goulet <dgoulet@torproject.org>
Remove a harmless "Bug" log message that can happen in
relay_addr_learn_from_dirauth() on relays during startup:
tor_bug_occurred_(): Bug: ../src/feature/relay/relay_find_addr.c:225: relay_addr_learn_from_dirauth: Non-fatal assertion !(!ei) failed. (on Tor 0.4.7.10 )
Bug: Tor 0.4.7.10: Non-fatal assertion !(!ei) failed in relay_addr_learn_from_dirauth at ../src/feature/relay/relay_find_addr.c:225. Stack trace: (on Tor 0.4.7.10 )
Finishes fixing bug 40231.
Fixes bug 40523; bugfix on 0.4.5.4-rc.
Move the retry from circuit_expire_building() to when the offending
circuit is being closed.
Fixes#40695
Signed-off-by: David Goulet <dgoulet@torproject.org>
Logic is too convoluted and we can't efficiently apply a specific
timeout depending on the purpose.
Remove it and instead rely on the right circuit cutoff instead of
keeping this flagged circuit open forever.
Part of #40694
Signed-off-by: David Goulet <dgoulet@torproject.org>
Change it to an "unreachable" error so the intro point can be retried
and not flagged as a failure and never retried again.
Closes#40692
Signed-off-by: David Goulet <dgoulet@torproject.org>
Directory authorities and relays now interact properly with directory
authorities if they change addresses. In the past, they would continue
to upload votes, signatures, descriptors, etc to the hard-coded address
in the configuration. Now, if the directory authority is listed in
the consensus at a different address, they will direct queries to this
new address.
Specifically, these three activities have changed:
* Posting a vote, a signature, or a relay descriptor to all the dir auths.
* Dir auths fetching missing votes or signatures from all the dir auths.
* Dir auths fetching new descriptors from a specific dir auth when they
just learned about them from that dir auth's vote.
We already do this desired behavior (prefer the address in the consensus,
but fall back to the hard-coded dirservers info if needed) when fetching
missing certs.
There is a fifth case, in router_pick_trusteddirserver(), where clients
and relays are trying to reach a random dir auth to fetch something. I
left that case alone for now because the interaction with fallbackdirs
is complicated.
Implements ticket 40705.
Directory authorities stop voting a consensus "Measured" weight
for relays with the Authority flag. Now these relays will be
considered unmeasured, which should reserve their bandwidth
for their dir auth role and minimize distractions from other roles.
In place of the "Measured" weight, they now include a
"MeasuredButAuthority" weight (not used by anything) so the bandwidth
authority's opinion on this relay can be recorded for posterity.
Resolves ticket 40698.
The AuthDirDontVoteOnDirAuthBandwidth torrc option never worked, and it
was implemented in a way that could have produced consensus conflicts
if it had.
Resolves bug 40700.
Move the retry from circuit_expire_building() to when the offending
circuit is being closed.
Fixes#40695
Signed-off-by: David Goulet <dgoulet@torproject.org>
Logic is too convoluted and we can't efficiently apply a specific
timeout depending on the purpose.
Remove it and instead rely on the right circuit cutoff instead of
keeping this flagged circuit open forever.
Part of #40694
Signed-off-by: David Goulet <dgoulet@torproject.org>
Change it to an "unreachable" error so the intro point can be retried
and not flagged as a failure and never retried again.
Closes#40692
Signed-off-by: David Goulet <dgoulet@torproject.org>
Patch to address #40673. An additional check has been added to
onion_pending_add() in order to ensure that we avoid counting create
cells from clients.
In the cpuworker.c assign_onionskin_to_cpuworker
method if total_pending_tasks >= max_pending_tasks
and channel_is_client(circ->p_chan) returns false then
rep_hist_note_circuit_handshake_dropped() will be called and
rep_hist_note_circuit_handshake_assigned() will not be called. This
causes relays to run into errors due to the fact that the number of
dropped packets exceeds the total number of assigned packets.
To avoid this situation a check has been added to
onion_pending_add() to ensure that these erroneous calls to
rep_hist_note_circuit_handshake_dropped() are not made.
See the #40673 ticket for the conversation with armadev about this issue.
mike is concerned that we would get too much exposure to adversaries,
if we enforce that none of our L2 guards can be in the same family.
this change set now essentially finishes the feature that commit a77727cdc
was attempting to add, but strips the "_and_family" part of that plan.
We had omitted some checks for whether our vanguards (second layer
guards from proposal 333) overlapped or came from the same family.
Now make sure to pick each of them to be independent.
Fixes bug 40639; bugfix on 0.4.7.1-alpha.
Remove UPTIME_TO_GUARANTEE_STABLE, MTBF_TO_GUARANTEE_STABLE,
TIME_KNOWN_TO_GUARANTEE_FAMILIAR WFU_TO_GUARANTEE_GUARD and replace each
of them with a tunnable torrc option.
Related to #40652
Signed-off-by: David Goulet <dgoulet@torproject.org>
This also incidently removes a use of uninitialized stack data from the
connection_or_set_ext_or_identifier() function.
Fixes#40648
Signed-off-by: David Goulet <dgoulet@torproject.org>