our pqueue implementation does bizarre unspecified things with
ordering of elements that are equal. it certainly doesn't do any
sort of "first in first out" property that i was expecting.
now make it explicit by saying that "equal-effort, added-earlier" is
higher priority.
specifically, if we have 16 in-flight rend circs, and the next
one at the top of the pqueue is lower than our suggested effort,
then don't launch it yet.
this way we always launch adequate-effort requests immediately, and
we always handle *some* low-effort requests, but we are ready at any
moment to handle a few new adequate-effort requests.
this change makes us reach the callback *after* each mainloop
run, rather than as the next event to run immediately after
activation.
with the old behavior, we were starving everything else to drain the
pqueue entirely, each time we got a new intro2 cell.
now we at least will get to other activities as well.
now we let ourselves queue up to twice as many as we expect, and when
we get to the limit we make a new pqueue and move over the first n
elements that we like most.
(the old approach, of calling SMARTLIST_DEL_CURRENT_KEEPORDER() on
elements in a pqueue, will destroy its heapify property.)
we also discard elements that are too old, either during the trimming
process or if they come up as the next request to respond to.
lastly, fix a fencepost error on how many rend reqs we would handle
per iteration.
If PoW are enabled, use a priority queue by effort for the rendezvous
requests hooked into the mainloop.
Signed-off-by: David Goulet <dgoulet@torproject.org>
When parsing an INTRODUCE2 cell, we extract data in order to launch the
rendezvous circuit. This commit creates a data structure just for that
data so it can be used by future commits for prop327 in order to copy
that data over a priority queue instead of the whole intro data data
structure which contains pointers that could dissapear.
Signed-off-by: David Goulet <dgoulet@torproject.org>
At this commit, the tor main loop solves it. We might consider moving
this to the CPU pool at some point.
Signed-off-by: David Goulet <dgoulet@torproject.org>
We discovered two cases where edge connections can stall during testing:
1. Due to final data sitting in the edge inbuf when it was resumed
2. Due to flag synchronization between the token bucket and XON/XOFF
The first issue has always existed in C-Tor, but we were able to tickle it
in scp testing. If the last data from the protocol is able to fit in the
inbuf, but not large enough to send, if an XOFF or connection block comes in
at exactly that point, when the edge connection resumes, there will be no
data to read from the socket, but the inbuf can just sit there, never
draining.
We noticed the second issue along the way to finding the first. It seems
wrong, but it didn't seem to affect anything in practice.
These are extremely rare in normal operation, but with conflux, XON/XOFF
activity is more common, so we hit these.
Signed-off-by: David Goulet <dgoulet@torproject.org>
In https://gitlab.torproject.org/tpo/core/tor/-/issues/40623, we changed the
DESTROY propogation to ensure memory was freed quickly at relays. This was a
good move, but it exacerbates the condition where a stream is closed on a
circuit, and then it is immediately closed because it is dirty. This creates a
race between the DESTROY and the last data sent on the stream. This race is
visible in shadow, and does happen.
This could be backported. A better solution to these kinds of problems is to
create an ENDED cell, and not close any circuits until the ENDED comes back.
But this will also require thinking, since this ENDED cell can also get lost,
so some kind of timeout may be needed either way. The ENDED cell could just
allow us to have much longer timeouts for this case.
This adds utility functions to help stream block decisions, as well as cpath
layer_hint checks for stream cell acceptance, and syncing stream lists
for conflux circuits.
These functions are then called throughout the codebase to properly manage
conflux streams.
Streams can get blocked on a circuit in two ways:
1. When the circuit package window is full
2. When the channel's cell queue is too high
Conflux needs to decouple stream blocking from both of these conditions,
because streams can continue on another circuit, even if the primary circuit
is blocked for either of these cases.
However, both conflux and congestion control need to know if the channel's
cell queue hit the highwatermark and is still draining, because this condition
is used by those components, independent of stream state.
Therefore, this commit renames the 'streams_blocked_on_chan' variable to
signify that it refers to the cell queue state, and also refactors the actual
stream blocking bits out, so they can be handled separately if conflux is
present.
Because circuit-level sendmes are sent before relay data cells are processed,
we can safely move this to before the conflux decision. In this way,
regardless of conflux being negotiated, we still send sendmes as soon as data
cells are recieved. This avoids introducing conflux queue delay into RTT
measurement, which is important for measuring actual circuit capacity.
The circuit-level tracking must happen inside the call to send a data cell,
since that call now chooses a circuit to send on. Turns out, we were already
doing this kind of here, but only for the digest. Now we do both things here.
We use the oldest-circ-first method here, since that seems good for conflux:
queues could briefly spike, but the bad case is if they are maliciously
bloated to stick around for a long time.
The tradeoff here is that it is possible to kill old circuits on a relay
quickly, but that has always been the case with this algorithm choice.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This adds 2 histogram metrics for hidden services:
* `tor_hs_rend_circ_build_time` - the rendezvous circuit build time in milliseconds
* `tor_hs_intro_circ_build_time` - the introduction circuit build time in milliseconds
The text representation representation of the new metrics looks like this:
```
# HELP tor_hs_rend_circ_build_time The rendezvous circuit build time in milliseconds
# TYPE tor_hs_rend_circ_build_time histogram
tor_hs_rend_circ_build_time_bucket{onion="<elided>",le="1000.00"} 2
tor_hs_rend_circ_build_time_bucket{onion="<elided>",le="5000.00"} 10
tor_hs_rend_circ_build_time_bucket{onion="<elided>",le="10000.00"} 10
tor_hs_rend_circ_build_time_bucket{onion="<elided>",le="30000.00"} 10
tor_hs_rend_circ_build_time_bucket{onion="<elided>",le="60000.00"} 10
tor_hs_rend_circ_build_time_bucket{onion="<elided>",le="+Inf"} 10
tor_hs_rend_circ_build_time_sum{onion="<elided>"} 10824
tor_hs_rend_circ_build_time_count{onion="<elided>"} 10
# HELP tor_hs_intro_circ_build_time The introduction circuit build time in milliseconds
# TYPE tor_hs_intro_circ_build_time histogram
tor_hs_intro_circ_build_time_bucket{onion="<elided>",le="1000.00"} 0
tor_hs_intro_circ_build_time_bucket{onion="<elided>",le="5000.00"} 6
tor_hs_intro_circ_build_time_bucket{onion="<elided>",le="10000.00"} 6
tor_hs_intro_circ_build_time_bucket{onion="<elided>",le="30000.00"} 6
tor_hs_intro_circ_build_time_bucket{onion="<elided>",le="60000.00"} 6
tor_hs_intro_circ_build_time_bucket{onion="<elided>",le="+Inf"} 6
tor_hs_intro_circ_build_time_sum{onion="<elided>"} 9843
tor_hs_intro_circ_build_time_count{onion="<elided>"} 6
```
Signed-off-by: Gabriela Moldovan <gabi@torproject.org>
This adds a `reason` label to the `hs_intro_rejected_intro_req_count` and
`hs_rdv_error_count` metrics introduced in #40755.
Metric look up and intialization is now more a bit more involved. This may be
fine for now, but it will become unwieldy if/when we add more labels (and as
such will need to be refactored).
Also, in the future, we may want to introduce finer grained `reason` labels.
For example, the `invalid_introduce2` label actually covers multiple types of
errors that can happen during the processing of an INTRODUCE2 cell (such as
cell parse errors, replays, decryption errors).
Signed-off-by: Gabriela Moldovan <gabi@torproject.org>
This introduces a couple of new service side metrics:
* `hs_intro_rejected_intro_req_count`, which counts the number of introduction
requests rejected by the hidden service
* `hs_rdv_error_count`, which counts the number of rendezvous errors as seen by
the hidden service (this number includes the number of circuit establishment
failures, failed retries, end-to-end circuit setup failures)
Closes#40755. This partially addresses #40717.
Signed-off-by: Gabriela Moldovan <gabi@torproject.org>
Directory authorities now include their AuthDirMaxServersPerAddr
config option in the consensus parameter section of their vote. Now
external tools can better predict how they will behave.
In particular, the value should make its way to the
https://consensus-health.torproject.org/#consensusparams page.
Once enough dir auths vote this param, they should also compute a
consensus value for it in the consensus document. Nothing uses this
consensus value yet, but we could imagine having dir auths consult it
in the future.
Implements ticket 40753.
This updates the docs to stop suggesting `pk` can be NULL, as that doesn't seem
to be the case anymore (`tor_assert(pk)`).
Signed-off-by: Gabriela Moldovan <gabi@torproject.org>
When I copied this to arti, I messed up and thought that the default
time period was 1440 seconds for some weird testing reason. That led
to confusion.
This commit adds a test case that time period 1440 is May 20, 1973:
now arti and c tor match!
Add new liblzma enums (LZMA_SEEK_NEEDED and LZMA_RET_INTERNAL*)
conditional to the API version they arrived in. The first stable
version of liblzma this affects is 5.4.0
Fixes#40741
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
Add new liblzma enums (LZMA_SEEK_NEEDED and LZMA_RET_INTERNAL*)
conditional to the API version they arrived in. The first stable
version of liblzma this affects is 5.4.0
Fixes#40741
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
If a circuit only sends a tiny amount of data such that its cwnd is not
full, it won't increase its cwnd above the minimum. Since slow start circuits
should never hit the minimum otherwise, we can just ignore them for RTT reset
to handle this.
Since these are derived from the number of SENDMEs in a cwnd/cc update,
and a cwnd should not exceed ~10k, there's plenty of room in uint16_t
for them, even if the network gets significantly faster.
Also provides some stickiness, so that once full, the congestion window is
considered still full for the rest of an update cycle, or the entire
congestion window.
In this way, we avoid increasing the congestion window if it is not fully
utilized, but we can still back off in this case. This substantially reduces
queue use in Shadow.
Since these are derived from the number of SENDMEs in a cwnd/cc update,
and a cwnd should not exceed ~10k, there's plenty of room in uint16_t
for them, even if the network gets significantly faster.
Also provides some stickiness, so that once full, the congestion window is
considered still full for the rest of an update cycle, or the entire
congestion window.
In this way, we avoid increasing the congestion window if it is not fully
utilized, but we can still back off in this case. This substantially reduces
queue use in Shadow.
Having no TotalBuildTimes along a positive CircuitBuildAbandonedCount
count lead to a segfault. We check for that condition and then BUG + log
warn if that is the case.
It should never happened in theory but if someone modified their state
file, it can lead to this problem so instead of segfaulting, warn.
Fixes#40437
Signed-off-by: David Goulet <dgoulet@torproject.org>
The logic was inverted. Introduced in commit
9155e08450.
This was reported through our bug bounty program on H1. It fixes the
TROVE-2022-002.
Fixes#40730
Signed-off-by: David Goulet <dgoulet@torproject.org>
We need the fallbackdir file to be the same so our release CI can
generate a new list and apply it uniformly on all series.
(Same as geoip)
Signed-off-by: David Goulet <dgoulet@torproject.org>
We need all geoip files to be the same so our release CI can generate a
new list and apply it uniformly on all series.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Rotate the relay identity key and v3 identity key for moria1. They
have been online for more than a decade, there was a known potential
compromise, and anyway refreshing keys periodically is good practice.
Advertise new ports too, to avoid confusion.
Closes ticket 40722.
This change mitigates DNS-based website oracles by making the time that
a domain name is cached uncertain (+- 4 minutes of what's measurable).
Resolves TROVE-2021-009.
Fixes#40674
We cap our number of CPU worker threads to at least 2 even if we have a
single core. But also, before we used to always add one extra thread
regardless of the number of core.
This meant that we were off when re-using the get_num_cpus() function
when calculating our onionskin work overhead because we were always off
by one.
This commit makes it that we always use the number of thread our actual
thread pool was configured with.
Fixes#40719
Signed-off-by: David Goulet <dgoulet@torproject.org>
Cap this to 2 threads always because we need a low and high priority
thread even with a single core.
Fixes#40713
Signed-off-by: David Goulet <dgoulet@torproject.org>
Created and Rejected connections are ever going up counters. While
Opened connections are gauges going up and down.
Fixes#40712
Signed-off-by: David Goulet <dgoulet@torproject.org>
This change mitigates DNS-based website oracles by making the time that
a domain name is cached uncertain (+- 4 minutes of what's measurable).
Resolves TROVE-2021-009.
Fixes#40674
This is part of the fast path so we need to cache consensus parameters
instead of querying it everytime we need to learn a value.
Part of #40704
Signed-off-by: David Goulet <dgoulet@torproject.org>
Until now, there was this magic number (64) used as the maximum number
of tasks a CPU worker can take at once.
This commit makes it a consensus parameter so our future selves can
think of a better value depending on network conditions.
Part of #40704
Signed-off-by: David Goulet <dgoulet@torproject.org>
Transform the hardcoded value ONIONQUEUE_WAIT_CUTOFF into a consensus
parameter so we can control it network wide.
Closes#40704
Signed-off-by: David Goulet <dgoulet@torproject.org>
This also incidently removes a use of uninitialized stack data from the
connection_or_set_ext_or_identifier() function.
Fixes#40648
Signed-off-by: David Goulet <dgoulet@torproject.org>
* it also uses sys/param.h to track its version;
* present that to tor_libc_get_version_str() as libc version;
while here, we also fix the return of FreeBSD version
* __FreeBSD_version is the correct var tracking the OSVERSION
* we use OSVERSION here (defined by __FreeBSD__);
* it's part of the <sys/param.h> include;
* that tracks all noteworthy changes made to the base system.
* __BSD_VISIBLE is defined by systems like FreeBSD and OpenBSD;
* that also extends to DragonFlyBSD;
* it's used on stdlib.h and ctypes.h on those systems.
This BUG() was added when the code was written to see if this callback
was ever executed after we marked the handle as EOF. It turns out, it
does, but we handle it gracefully. We can therefore remove the BUG().
Fixes tpo/core/tor#40596.
Remove a harmless "Bug" log message that can happen in
relay_addr_learn_from_dirauth() on relays during startup:
tor_bug_occurred_(): Bug: ../src/feature/relay/relay_find_addr.c:225: relay_addr_learn_from_dirauth: Non-fatal assertion !(!ei) failed. (on Tor 0.4.7.10 )
Bug: Tor 0.4.7.10: Non-fatal assertion !(!ei) failed in relay_addr_learn_from_dirauth at ../src/feature/relay/relay_find_addr.c:225. Stack trace: (on Tor 0.4.7.10 )
Finishes fixing bug 40231.
Fixes bug 40523; bugfix on 0.4.5.4-rc.
Move the retry from circuit_expire_building() to when the offending
circuit is being closed.
Fixes#40695
Signed-off-by: David Goulet <dgoulet@torproject.org>
Logic is too convoluted and we can't efficiently apply a specific
timeout depending on the purpose.
Remove it and instead rely on the right circuit cutoff instead of
keeping this flagged circuit open forever.
Part of #40694
Signed-off-by: David Goulet <dgoulet@torproject.org>
Explicitly set the S_CONNECT_REND purpose to a 4-hop cutoff.
As for the established rendezvous circuit waiting on the RENDEZVOUS2,
set one that is very long considering the possible waiting time for the
service to get the request and join our rendezvous.
Part of #40694
Signed-off-by: David Goulet <dgoulet@torproject.org>
Change it to an "unreachable" error so the intro point can be retried
and not flagged as a failure and never retried again.
Closes#40692
Signed-off-by: David Goulet <dgoulet@torproject.org>
This adds two consensus parameters to control the outbound max circuit
queue cell size limit and how many times it is allowed to reach that
limit for a single client IP.
Closes#40680
Signed-off-by: David Goulet <dgoulet@torproject.org>
Directory authorities and relays now interact properly with directory
authorities if they change addresses. In the past, they would continue
to upload votes, signatures, descriptors, etc to the hard-coded address
in the configuration. Now, if the directory authority is listed in
the consensus at a different address, they will direct queries to this
new address.
Specifically, these three activities have changed:
* Posting a vote, a signature, or a relay descriptor to all the dir auths.
* Dir auths fetching missing votes or signatures from all the dir auths.
* Dir auths fetching new descriptors from a specific dir auth when they
just learned about them from that dir auth's vote.
We already do this desired behavior (prefer the address in the consensus,
but fall back to the hard-coded dirservers info if needed) when fetching
missing certs.
There is a fifth case, in router_pick_trusteddirserver(), where clients
and relays are trying to reach a random dir auth to fetch something. I
left that case alone for now because the interaction with fallbackdirs
is complicated.
Implements ticket 40705.
Directory authorities stop voting a consensus "Measured" weight
for relays with the Authority flag. Now these relays will be
considered unmeasured, which should reserve their bandwidth
for their dir auth role and minimize distractions from other roles.
In place of the "Measured" weight, they now include a
"MeasuredButAuthority" weight (not used by anything) so the bandwidth
authority's opinion on this relay can be recorded for posterity.
Resolves ticket 40698.
The AuthDirDontVoteOnDirAuthBandwidth torrc option never worked, and it
was implemented in a way that could have produced consensus conflicts
if it had.
Resolves bug 40700.
Move the retry from circuit_expire_building() to when the offending
circuit is being closed.
Fixes#40695
Signed-off-by: David Goulet <dgoulet@torproject.org>
Logic is too convoluted and we can't efficiently apply a specific
timeout depending on the purpose.
Remove it and instead rely on the right circuit cutoff instead of
keeping this flagged circuit open forever.
Part of #40694
Signed-off-by: David Goulet <dgoulet@torproject.org>
Explicitly set the S_CONNECT_REND purpose to a 4-hop cutoff.
As for the established rendezvous circuit waiting on the RENDEZVOUS2,
set one that is very long considering the possible waiting time for the
service to get the request and join our rendezvous.
Part of #40694
Signed-off-by: David Goulet <dgoulet@torproject.org>
Change it to an "unreachable" error so the intro point can be retried
and not flagged as a failure and never retried again.
Closes#40692
Signed-off-by: David Goulet <dgoulet@torproject.org>
Bug 1: We were purporting to calculate milliseconds per tick, when we
*should* have been computing ticks per millisecond.
Bug 2: Instead of computing either one of those, we were _actually_
computing femtoseconds per tick.
These two bugs covered for one another on x86 hardware, where 1 tick
== 1 nanosecond. But on M1 OSX, 1 tick is about 41 nanoseconds,
causing surprising results.
Fixes bug 40684; bugfix on 0.3.3.1-alpha.
Patch to address #40673. An additional check has been added to
onion_pending_add() in order to ensure that we avoid counting create
cells from clients.
In the cpuworker.c assign_onionskin_to_cpuworker
method if total_pending_tasks >= max_pending_tasks
and channel_is_client(circ->p_chan) returns false then
rep_hist_note_circuit_handshake_dropped() will be called and
rep_hist_note_circuit_handshake_assigned() will not be called. This
causes relays to run into errors due to the fact that the number of
dropped packets exceeds the total number of assigned packets.
To avoid this situation a check has been added to
onion_pending_add() to ensure that these erroneous calls to
rep_hist_note_circuit_handshake_dropped() are not made.
See the #40673 ticket for the conversation with armadev about this issue.
We need to tune these, but we're not likely to need the subtle differences
between a few of them. Removing them will prevent our consensus parameter
string from becoming too long in the event of tuning.
mike is concerned that we would get too much exposure to adversaries,
if we enforce that none of our L2 guards can be in the same family.
this change set now essentially finishes the feature that commit a77727cdc
was attempting to add, but strips the "_and_family" part of that plan.
We had omitted some checks for whether our vanguards (second layer
guards from proposal 333) overlapped or came from the same family.
Now make sure to pick each of them to be independent.
Fixes bug 40639; bugfix on 0.4.7.1-alpha.
Remove UPTIME_TO_GUARANTEE_STABLE, MTBF_TO_GUARANTEE_STABLE,
TIME_KNOWN_TO_GUARANTEE_FAMILIAR WFU_TO_GUARANTEE_GUARD and replace each
of them with a tunnable torrc option.
Related to #40652
Signed-off-by: David Goulet <dgoulet@torproject.org>
Previously, `channelpadding_get_netflow_inactive_timeout_ms` would
crash with an assertion failure if `low_timeout` was greater than
`high_timeout`. That wasn't possible in practice because of checks
in `channelpadding_update_padding_for_channel`, but it's better not
to have a function whose correctness is this tricky to prove.
Fixes#40645. Bugfix on 0.3.1.1-alpha.
Note that with this commit, TRUNCATED cells won't be used anymore that
is client and relays won't emit them.
Fixes#40623
Signed-off-by: David Goulet <dgoulet@torproject.org>
This also incidently removes a use of uninitialized stack data from the
connection_or_set_ext_or_identifier() function.
Fixes#40648
Signed-off-by: David Goulet <dgoulet@torproject.org>
LibreSSL is now closer to OpenSSL 1.1 than OpenSSL 1.0. According to
https://undeadly.org/cgi?action=article;sid=20220116121253, this is the
intention of OpenBSD developers.
According to #40630, many special cases are needed to compile Tor against
LibreSSL 3.5 when using Tor's OpenSSL 1.0 compatibility mode, whereas only a
small number of #defines are required when using OpenSSL 1.1 compatibility
mode. One additional workaround is required for LibreSSL 3.4 compatibility.
Compiles and passes unit tests with LibreSSL 3.4.3 and 3.5.1.
The escaped() function and its kin already wrap their output in
quotes: there's no reason to do so twice.
I am _NOT_ making a corresponding change in calls that make the same
mistake in controller-related functions, however, due to the risk of
a compatibility break. :(
Closes#22723.
Update the sandbox implementation to allow its use with fragile hardening
enabled on AArch64 (ARM64) and other architectures that use Linux's generic
syscall interface. Note that in this configuration the sandbox is completely
unable to filter requests to open files and directories.
Update the sandbox unit tests to match.
On architectures that use Linux's generic syscall interface the legacy "chown"
call is not available; on these systems glibc uses "fchownat" instead. Modify
the sandbox implementation to match.
On architectures that use Linux's generic syscall interface the legacy "chmod"
call is not available; on these systems glibc uses "fchmodat" instead. Modify
the sandbox implementation to match.
On architectures that use Linux's generic syscall interface the legacy "stat"
and "stat64" calls may not be available; on these systems glibc uses
"newfstatat" instead. Modify the sandbox implementation to match.
Note that on these architectures as on others glibc 2.33 uses "newfstatat" in a
way the sandbox cannot filter, so preserve in add_noparam_filter() the code
that allows the use of this syscall without restriction when glibc version 2.33
is in use.
On architectures where Linux does not provide the legacy "rename" syscall it
offers one or both of "renameat" and "renameat2" instead. Follow glibc's logic
in selecting which syscall to filter.
On architectures where Linux does not provide the legacy "open" syscall glibc
necessarily uses "openat" instead. Omit the unnecessary glibc-version check on
these systems.
For some syscalls the kernel ABI uses 32 bit signed integers. Whether
these 32 bit integer values are sign extended or zero extended to the
native 64 bit register sizes is undefined and dependent on the {arch,
compiler, libc} being used. Instead of trying to detect which cases
zero-extend and which cases sign-extend, this commit uses a masked
equality check on the lower 32 bits of the value.
The chown/chmod/rename syscalls have never existed on AArch64, and libc
implements the POSIX functions via the fchownat/fchmodat/renameat
syscalls instead.
Add new filter functions for fchownat/fchmodat/renameat, not made
architecture specific since the syscalls exists everywhere else too.
However, in order to limit seccomp filter space usage, we only insert
rules for one of {chown, chown32, fchownat} depending on the
architecture (resp. {chmod, fchmodat}, {rename, renameat}).
New glibc versions not sign-extending 32 bit negative constants seems to
not be a thing on AArch64. I suspect that this might not be the only
architecture where the sign-extensions is happening, and the correct fix
might be instead to use a proper 32 bit comparison for the first openat
parameter. For now, band-aid fix this so the sandbox can work again on
AArch64.
Not revalidating keys on every fork speeds up make test from about 45 seconds
to 10 seconds with OpenSSL 1.1.1n and from 6 minutes to 10 seconds with OpenSSL
3.0.2.
MSVC compilation has been broken since at least 1e417b7275 ("All remaining
files in src/common belong to the event loop.") deleted
src/common/Makefile.nmake in 2018.
This rule has not been used since 4ead083dbc ("Do not ship a
fallback-consensus until the related bugs are fixed.") in 2008, and
fallback-consensus support was removed in f742b33d85 ("Drop
FallbackNetworkstatusFile; it never worked.").
Using tor_free is wrong; event_free must be called for objects obtained from
event_new. Additionally, this slightly simplifies the code.
Also, add a static_assert to prevent further instances.