Rotate to a new L2 vanguard whenever an existing one loses the
Stable or Fast flag. Previously, we would leave these relays in the
L2 vanguard list but never use them, and if all of our vanguards
ended up like this, we wouldn't have any middle nodes left to choose
from, so we would fail to make onion-related circuits.
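A minimal sketch of the rotation rule, assuming hypothetical types and flag names rather than Tor's actual vanguard code: an entry stays in the L2 list only while it still has both flags, and every dropped entry leaves a slot for the caller to refill.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits, for illustration only. */
enum { FLAG_STABLE = 1 << 0, FLAG_FAST = 1 << 1 };

typedef struct {
    int id;          /* hypothetical relay identifier */
    unsigned flags;  /* flags taken from the latest consensus */
} vanguard_t;

/* Compact `list` in place, dropping entries that lost Stable or Fast.
 * Returns the number of entries kept; the caller is expected to pick
 * replacements for the remaining `n - kept` slots. */
static size_t prune_unusable_vanguards(vanguard_t *list, size_t n)
{
    const unsigned needed = FLAG_STABLE | FLAG_FAST;
    size_t kept = 0;

    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if ((list[i].flags & needed) == needed)
            list[kept++] = list[i];
    }
    return kept;
}
```

Pruning instead of merely skipping unusable entries is what keeps the list from silently filling up with relays that can never be chosen.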
Fixes bug 40805; bugfix on 0.4.7.1-alpha.
This addresses issue #40800 and a couple other problems I noticed while
trying to reproduce that one.
The original issue is just a missing cast to void* on the args of
__builtin___clear_cache(), and clang is picky about the implicit cast
between what it considers to be char of different signedness. Original
report is from macOS but it's also reproducible on other clang targets.
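For illustration, the cast can be sketched as follows. The helper name `flush_icache` is hypothetical and this is not the actual hashx code; it only shows both arguments being cast to void* before they reach the builtin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: flush the instruction cache for a buffer that
 * has just been filled with generated code. */
static void flush_icache(unsigned char *code, size_t len)
{
    assert(code != NULL);
#if defined(__GNUC__) || defined(__clang__)
    /* Casting both ends to void* sidesteps the implicit conversion
     * between char pointers of different signedness. */
    __builtin___clear_cache((void *)code, (void *)(code + len));
#else
    (void)len; /* no builtin available: nothing to do in this sketch */
#endif
}
```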
The cmake-based original build system for equix and hashx was a handy
way to run tests, but it suffered from some warnings due to incorrect
application of include_directories().
And lastly, there were some return codes from hashx_exec() that get
ignored on equix when asserts are disabled. It bugged me too much to
just silence this with a (void) cast, since even though this is in the
realm of low-likelihood programming errors and not true runtime errors, I
don't want to make it easy for the hashx_exec() wrappers to return
values that are dangerously wrong if an error is ignored. I made sure
that even if asserts are disabled, we return values that will cause the
solver and verifier to both fail to validate a potential solution.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
This fixes an "initializer is not a constant" compilation error that manifests
itself on gcc versions < 8.1 and MSVC (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69960#c18).
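A small sketch of this class of error and one portable workaround. The identifiers are hypothetical and are not the ones touched by this fix; the point is that a macro is a true constant expression, while a `static const` variable is not treated as one by older compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Not portable: older GCC and MSVC reject a const variable used inside
 * another static initializer, for example
 *
 *   static const size_t n_buckets = 5;
 *   static const size_t copy = n_buckets;   // "not a constant"
 */

/* Portable alternative: a macro expands to a constant expression. */
#define N_BUCKETS 5

static const int64_t bucket_bounds_ms[N_BUCKETS] = {
    1000, 5000, 10000, 30000, 60000
};

typedef struct {
    const int64_t *bounds;
    size_t n_bounds;
} histogram_config_t;

/* Every element here is a constant expression (an address constant and
 * an integer constant), so this initializer is accepted everywhere. */
static const histogram_config_t build_time_config = {
    bucket_bounds_ms, N_BUCKETS
};

static int64_t bucket_bound(size_t i)
{
    assert(i < build_time_config.n_bounds);
    return build_time_config.bounds[i];
}
```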
Fixes bug #40773
Signed-off-by: Gabriela Moldovan <gabi@torproject.org>
This adds 2 histogram metrics for hidden services:
* `tor_hs_rend_circ_build_time` - the rendezvous circuit build time in milliseconds
* `tor_hs_intro_circ_build_time` - the introduction circuit build time in milliseconds
The text representation of the new metrics looks like this:
```
# HELP tor_hs_rend_circ_build_time The rendezvous circuit build time in milliseconds
# TYPE tor_hs_rend_circ_build_time histogram
tor_hs_rend_circ_build_time_bucket{onion="<elided>",le="1000.00"} 2
tor_hs_rend_circ_build_time_bucket{onion="<elided>",le="5000.00"} 10
tor_hs_rend_circ_build_time_bucket{onion="<elided>",le="10000.00"} 10
tor_hs_rend_circ_build_time_bucket{onion="<elided>",le="30000.00"} 10
tor_hs_rend_circ_build_time_bucket{onion="<elided>",le="60000.00"} 10
tor_hs_rend_circ_build_time_bucket{onion="<elided>",le="+Inf"} 10
tor_hs_rend_circ_build_time_sum{onion="<elided>"} 10824
tor_hs_rend_circ_build_time_count{onion="<elided>"} 10
# HELP tor_hs_intro_circ_build_time The introduction circuit build time in milliseconds
# TYPE tor_hs_intro_circ_build_time histogram
tor_hs_intro_circ_build_time_bucket{onion="<elided>",le="1000.00"} 0
tor_hs_intro_circ_build_time_bucket{onion="<elided>",le="5000.00"} 6
tor_hs_intro_circ_build_time_bucket{onion="<elided>",le="10000.00"} 6
tor_hs_intro_circ_build_time_bucket{onion="<elided>",le="30000.00"} 6
tor_hs_intro_circ_build_time_bucket{onion="<elided>",le="60000.00"} 6
tor_hs_intro_circ_build_time_bucket{onion="<elided>",le="+Inf"} 6
tor_hs_intro_circ_build_time_sum{onion="<elided>"} 9843
tor_hs_intro_circ_build_time_count{onion="<elided>"} 6
```
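When reading the output above, note that the buckets are cumulative: each `le` bucket counts every observation at or below its bound, so `+Inf` always equals `_count`. A minimal sketch of that bookkeeping, using hypothetical names rather than the actual metrics code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_BOUNDS 5

/* Upper bounds in milliseconds, matching the le="..." labels above. */
static const int64_t bounds_ms[N_BOUNDS] = {
    1000, 5000, 10000, 30000, 60000
};

typedef struct {
    uint64_t bucket[N_BOUNDS + 1]; /* last slot is the le="+Inf" bucket */
    int64_t sum;                   /* exported as ..._sum */
    uint64_t count;                /* exported as ..._count */
} histogram_t;

/* Record one build time: bump every bucket whose bound covers it. */
static void histogram_observe(histogram_t *h, int64_t value_ms)
{
    assert(h != NULL);
    for (size_t i = 0; i < N_BOUNDS; i++) {
        if (value_ms <= bounds_ms[i])
            h->bucket[i]++;
    }
    h->bucket[N_BOUNDS]++; /* +Inf covers everything */
    h->sum += value_ms;
    h->count++;
}
```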
Signed-off-by: Gabriela Moldovan <gabi@torproject.org>
This adds a `reason` label to the `hs_intro_rejected_intro_req_count` and
`hs_rdv_error_count` metrics introduced in #40755.
Metric lookup and initialization is now a bit more involved. This may be
fine for now, but it will become unwieldy if/when we add more labels (and as
such will need to be refactored).
Also, in the future, we may want to introduce finer grained `reason` labels.
For example, the `invalid_introduce2` label actually covers multiple types of
errors that can happen during the processing of an INTRODUCE2 cell (such as
cell parse errors, replays, decryption errors).
Signed-off-by: Gabriela Moldovan <gabi@torproject.org>
This introduces a couple of new service side metrics:
* `hs_intro_rejected_intro_req_count`, which counts the number of introduction
requests rejected by the hidden service
* `hs_rdv_error_count`, which counts the number of rendezvous errors as seen by
the hidden service (this number includes the number of circuit establishment
failures, failed retries, end-to-end circuit setup failures)
Closes #40755. This partially addresses #40717.
Signed-off-by: Gabriela Moldovan <gabi@torproject.org>
Directory authorities now include their AuthDirMaxServersPerAddr
config option in the consensus parameter section of their vote. Now
external tools can better predict how they will behave.
In particular, the value should make its way to the
https://consensus-health.torproject.org/#consensusparams page.
Once enough dir auths vote this param, they should also compute a
consensus value for it in the consensus document. Nothing uses this
consensus value yet, but we could imagine having dir auths consult it
in the future.
Implements ticket 40753.
Add new liblzma enums (LZMA_SEEK_NEEDED and LZMA_RET_INTERNAL*)
conditional to the API version they arrived in. The first stable
version of liblzma this affects is 5.4.0
Fixes #40741
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
Having no TotalBuildTimes along with a positive CircuitBuildAbandonedCount
led to a segfault. We now check for that condition and then BUG + log
warn if that is the case.
It should never happen in theory, but if someone modified their state
file, it can lead to this problem, so instead of segfaulting, warn.
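Roughly, the check has the following shape. This is a sketch with a hypothetical struct and function name, not the actual state-parsing code; only the two state-file keys come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical view of the relevant state-file entries. */
typedef struct {
    const char *total_build_times; /* NULL when TotalBuildTimes is absent */
    int abandoned_count;           /* CircuitBuildAbandonedCount */
} cbt_state_t;

/* Returns 0 if the state is consistent, -1 if it must be rejected. */
static int cbt_state_check(const cbt_state_t *st)
{
    assert(st != NULL);
    if (st->abandoned_count > 0 && st->total_build_times == NULL) {
        /* Warn and bail out instead of dereferencing missing data. */
        fprintf(stderr, "warn: CircuitBuildAbandonedCount is set but "
                        "TotalBuildTimes is missing\n");
        return -1;
    }
    return 0;
}
```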
Fixes #40437
Signed-off-by: David Goulet <dgoulet@torproject.org>
The logic was inverted. Introduced in commit
9155e08450.
This was reported through our bug bounty program on H1. It fixes the
TROVE-2022-002.
Fixes #40730
Signed-off-by: David Goulet <dgoulet@torproject.org>
Rotate the relay identity key and v3 identity key for moria1. They
have been online for more than a decade, there was a known potential
compromise, and anyway refreshing keys periodically is good practice.
Advertise new ports too, to avoid confusion.
Closes ticket 40722.
We always use at least 2 CPU worker threads, even if we have a single
core. But before this commit, we also always added one extra thread
regardless of the number of cores.
This meant that reusing the get_num_cpus() function when calculating
our onionskin work overhead gave a result that was always off by one.
This commit makes it so that we always use the number of threads our
actual thread pool was configured with.
Fixes #40719
Signed-off-by: David Goulet <dgoulet@torproject.org>
Cap this to 2 threads always because we need a low and high priority
thread even with a single core.
Fixes #40713
Signed-off-by: David Goulet <dgoulet@torproject.org>
Until now, there was this magic number (64) used as the maximum number
of tasks a CPU worker can take at once.
This commit makes it a consensus parameter so our future selves can
think of a better value depending on network conditions.
Part of #40704
Signed-off-by: David Goulet <dgoulet@torproject.org>
Transform the hardcoded value ONIONQUEUE_WAIT_CUTOFF into a consensus
parameter so we can control it network wide.
Closes #40704
Signed-off-by: David Goulet <dgoulet@torproject.org>
This also incidentally removes a use of uninitialized stack data from the
connection_or_set_ext_or_identifier() function.
Fixes #40648
Signed-off-by: David Goulet <dgoulet@torproject.org>
This BUG() was added when the code was written to see if this callback
was ever executed after we marked the handle as EOF. It turns out, it
does, but we handle it gracefully. We can therefore remove the BUG().
Fixes tpo/core/tor#40596.
Remove a harmless "Bug" log message that can happen in
relay_addr_learn_from_dirauth() on relays during startup:
tor_bug_occurred_(): Bug: ../src/feature/relay/relay_find_addr.c:225: relay_addr_learn_from_dirauth: Non-fatal assertion !(!ei) failed. (on Tor 0.4.7.10 )
Bug: Tor 0.4.7.10: Non-fatal assertion !(!ei) failed in relay_addr_learn_from_dirauth at ../src/feature/relay/relay_find_addr.c:225. Stack trace: (on Tor 0.4.7.10 )
Finishes fixing bug 40231.
Fixes bug 40523; bugfix on 0.4.5.4-rc.
Change it to an "unreachable" error so the intro point can be retried
and not flagged as a failure and never retried again.
Closes #40692
Signed-off-by: David Goulet <dgoulet@torproject.org>
This adds two consensus parameters to control the outbound max circuit
queue cell size limit and how many times it is allowed to reach that
limit for a single client IP.
Closes #40680
Signed-off-by: David Goulet <dgoulet@torproject.org>
Directory authorities and relays now interact properly with directory
authorities if they change addresses. In the past, they would continue
to upload votes, signatures, descriptors, etc to the hard-coded address
in the configuration. Now, if the directory authority is listed in
the consensus at a different address, they will direct queries to this
new address.
Specifically, these three activities have changed:
* Posting a vote, a signature, or a relay descriptor to all the dir auths.
* Dir auths fetching missing votes or signatures from all the dir auths.
* Dir auths fetching new descriptors from a specific dir auth when they
just learned about them from that dir auth's vote.
We already do this desired behavior (prefer the address in the consensus,
but fall back to the hard-coded dirservers info if needed) when fetching
missing certs.
There is a fifth case, in router_pick_trusteddirserver(), where clients
and relays are trying to reach a random dir auth to fetch something. I
left that case alone for now because the interaction with fallbackdirs
is complicated.
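The "prefer the address in the consensus, fall back to the hard-coded one" rule described above can be sketched like this. The type and function names are hypothetical and are not taken from the directory code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical endpoint record, for illustration only. */
typedef struct {
    const char *addr;
    unsigned short port;
} dir_endpoint_t;

/* `from_consensus` is NULL when the authority is not listed in the
 * current consensus; `hardcoded` must always be present. */
static const dir_endpoint_t *
choose_dirauth_endpoint(const dir_endpoint_t *from_consensus,
                        const dir_endpoint_t *hardcoded)
{
    assert(hardcoded != NULL);
    return from_consensus != NULL ? from_consensus : hardcoded;
}
```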
Implements ticket 40705.
Directory authorities stop voting a consensus "Measured" weight
for relays with the Authority flag. Now these relays will be
considered unmeasured, which should reserve their bandwidth
for their dir auth role and minimize distractions from other roles.
In place of the "Measured" weight, they now include a
"MeasuredButAuthority" weight (not used by anything) so the bandwidth
authority's opinion on this relay can be recorded for posterity.
Resolves ticket 40698.
Bug 1: We were purporting to calculate milliseconds per tick, when we
*should* have been computing ticks per millisecond.
Bug 2: Instead of computing either one of those, we were _actually_
computing femtoseconds per tick.
These two bugs covered for one another on x86 hardware, where 1 tick
== 1 nanosecond. But on M1 OSX, 1 tick is about 41 nanoseconds,
causing surprising results.
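A worked sketch of the intended quantity, assuming the timebase is given as a fraction where one tick lasts numer/denom nanoseconds. The struct and function names are hypothetical, and the 125/3 value is an assumption chosen to match the "about 41 nanoseconds" figure above.

```c
#include <assert.h>
#include <stdint.h>

/* One tick lasts numer/denom nanoseconds. */
typedef struct {
    uint32_t numer;
    uint32_t denom;
} timebase_t;

/* Ticks per millisecond = (ns per ms) / (ns per tick)
 *                       = 1000000 * denom / numer. */
static uint64_t ticks_per_msec(timebase_t tb)
{
    const uint64_t ns_per_msec = 1000000;

    assert(tb.numer != 0);
    return ns_per_msec * tb.denom / tb.numer;
}
```

With a 1/1 timebase this gives 1000000 ticks per millisecond; with 125/3 (about 41.67 ns per tick) it gives 24000, which is why a mistake that cancels out at 1 tick == 1 ns becomes visible on other hardware.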
Fixes bug 40684; bugfix on 0.3.3.1-alpha.
Patch to address #40673. An additional check has been added to
onion_pending_add() in order to ensure that we avoid counting create
cells from clients.
In assign_onionskin_to_cpuworker() in cpuworker.c, if
total_pending_tasks >= max_pending_tasks and
channel_is_client(circ->p_chan) returns false, then
rep_hist_note_circuit_handshake_dropped() will be called and
rep_hist_note_circuit_handshake_assigned() will not be called. This
causes relays to run into errors because the number of dropped packets
exceeds the total number of assigned packets.
To avoid this situation a check has been added to
onion_pending_add() to ensure that these erroneous calls to
rep_hist_note_circuit_handshake_dropped() are not made.
See the #40673 ticket for the conversation with armadev about this issue.
Mike is concerned that we would get too much exposure to adversaries
if we enforce that none of our L2 guards can be in the same family.
This change set now essentially finishes the feature that commit a77727cdc
was attempting to add, but strips the "_and_family" part of that plan.
We had omitted some checks for whether our vanguards (second layer
guards from proposal 333) overlapped or came from the same family.
Now make sure to pick each of them to be independent.
Fixes bug 40639; bugfix on 0.4.7.1-alpha.
Remove UPTIME_TO_GUARANTEE_STABLE, MTBF_TO_GUARANTEE_STABLE,
TIME_KNOWN_TO_GUARANTEE_FAMILIAR, WFU_TO_GUARANTEE_GUARD and replace each
of them with a tunable torrc option.
Related to #40652
Signed-off-by: David Goulet <dgoulet@torproject.org>
Previously, `channelpadding_get_netflow_inactive_timeout_ms` would
crash with an assertion failure if `low_timeout` was greater than
`high_timeout`. That wasn't possible in practice because of checks
in `channelpadding_update_padding_for_channel`, but it's better not
to have a function whose correctness is this tricky to prove.
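One way to make such a function total over its inputs is sketched below. The function name is hypothetical, `r` stands in for a random draw, and this is not the actual channelpadding code: when the range is empty or inverted, it simply returns the low bound instead of asserting.

```c
#include <assert.h>
#include <stdint.h>

/* Pick a timeout in [low_timeout, high_timeout), or low_timeout when
 * the range is empty or inverted. */
static int pick_inactive_timeout_ms(int low_timeout, int high_timeout,
                                    uint32_t r)
{
    if (low_timeout >= high_timeout)
        return low_timeout;

    /* Holds by construction now, rather than by the caller's checks. */
    assert(high_timeout - low_timeout > 0);
    return low_timeout + (int)(r % (uint32_t)(high_timeout - low_timeout));
}
```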
Fixes #40645. Bugfix on 0.3.1.1-alpha.
Note that with this commit, TRUNCATED cells won't be used anymore; that
is, clients and relays won't emit them.
Fixes #40623
Signed-off-by: David Goulet <dgoulet@torproject.org>