Move the retry from circuit_expire_building() to when the offending
circuit is being closed.
Fixes#40695
Signed-off-by: David Goulet <dgoulet@torproject.org>
Logic is too convoluted and we can't efficiently apply a specific
timeout depending on the purpose.
Remove it and instead rely on the right circuit cutoff instead of
keeping this flagged circuit open forever.
Part of #40694
Signed-off-by: David Goulet <dgoulet@torproject.org>
Explicitly set the S_CONNECT_REND purpose to a 4-hop cutoff.
As for the established rendezvous circuit waiting on the RENDEZVOUS2,
set one that is very long considering the possible waiting time for the
service to get the request and join our rendezvous.
Part of #40694
Signed-off-by: David Goulet <dgoulet@torproject.org>
Change it to an "unreachable" error so the intro point can be retried
and not flagged as a failure and never retried again.
Closes#40692
Signed-off-by: David Goulet <dgoulet@torproject.org>
This adds two consensus parameters to control the outbound max circuit
queue cell size limit and how many times it is allowed to reach that
limit for a single client IP.
Closes#40680
Signed-off-by: David Goulet <dgoulet@torproject.org>
Directory authorities and relays now interact properly with directory
authorities if they change addresses. In the past, they would continue
to upload votes, signatures, descriptors, etc to the hard-coded address
in the configuration. Now, if the directory authority is listed in
the consensus at a different address, they will direct queries to this
new address.
Specifically, these three activities have changed:
* Posting a vote, a signature, or a relay descriptor to all the dir auths.
* Dir auths fetching missing votes or signatures from all the dir auths.
* Dir auths fetching new descriptors from a specific dir auth when they
just learned about them from that dir auth's vote.
We already do this desired behavior (prefer the address in the consensus,
but fall back to the hard-coded dirservers info if needed) when fetching
missing certs.
There is a fifth case, in router_pick_trusteddirserver(), where clients
and relays are trying to reach a random dir auth to fetch something. I
left that case alone for now because the interaction with fallbackdirs
is complicated.
Implements ticket 40705.
Directory authorities stop voting a consensus "Measured" weight
for relays with the Authority flag. Now these relays will be
considered unmeasured, which should reserve their bandwidth
for their dir auth role and minimize distractions from other roles.
In place of the "Measured" weight, they now include a
"MeasuredButAuthority" weight (not used by anything) so the bandwidth
authority's opinion on this relay can be recorded for posterity.
Resolves ticket 40698.
The AuthDirDontVoteOnDirAuthBandwidth torrc option never worked, and it
was implemented in a way that could have produced consensus conflicts
if it had.
Resolves bug 40700.
Move the retry from circuit_expire_building() to when the offending
circuit is being closed.
Fixes#40695
Signed-off-by: David Goulet <dgoulet@torproject.org>
Logic is too convoluted and we can't efficiently apply a specific
timeout depending on the purpose.
Remove it and instead rely on the right circuit cutoff instead of
keeping this flagged circuit open forever.
Part of #40694
Signed-off-by: David Goulet <dgoulet@torproject.org>
Explicitly set the S_CONNECT_REND purpose to a 4-hop cutoff.
As for the established rendezvous circuit waiting on the RENDEZVOUS2,
set one that is very long considering the possible waiting time for the
service to get the request and join our rendezvous.
Part of #40694
Signed-off-by: David Goulet <dgoulet@torproject.org>
Change it to an "unreachable" error so the intro point can be retried
and not flagged as a failure and never retried again.
Closes#40692
Signed-off-by: David Goulet <dgoulet@torproject.org>
Bug 1: We were purporting to calculate milliseconds per tick, when we
*should* have been computing ticks per millisecond.
Bug 2: Instead of computing either one of those, we were _actually_
computing femtoseconds per tick.
These two bugs covered for one another on x86 hardware, where 1 tick
== 1 nanosecond. But on M1 OSX, 1 tick is about 41 nanoseconds,
causing surprising results.
Fixes bug 40684; bugfix on 0.3.3.1-alpha.
Patch to address #40673. An additional check has been added to
onion_pending_add() in order to ensure that we avoid counting create
cells from clients.
In the cpuworker.c assign_onionskin_to_cpuworker
method if total_pending_tasks >= max_pending_tasks
and channel_is_client(circ->p_chan) returns false then
rep_hist_note_circuit_handshake_dropped() will be called and
rep_hist_note_circuit_handshake_assigned() will not be called. This
causes relays to run into errors due to the fact that the number of
dropped packets exceeds the total number of assigned packets.
To avoid this situation a check has been added to
onion_pending_add() to ensure that these erroneous calls to
rep_hist_note_circuit_handshake_dropped() are not made.
See the #40673 ticket for the conversation with armadev about this issue.
We need to tune these, but we're not likely to need the subtle differences
between a few of them. Removing them will prevent our consensus parameter
string from becoming too long in the event of tuning.
mike is concerned that we would get too much exposure to adversaries,
if we enforce that none of our L2 guards can be in the same family.
this change set now essentially finishes the feature that commit a77727cdc
was attempting to add, but strips the "_and_family" part of that plan.
We had omitted some checks for whether our vanguards (second layer
guards from proposal 333) overlapped or came from the same family.
Now make sure to pick each of them to be independent.
Fixes bug 40639; bugfix on 0.4.7.1-alpha.
Remove UPTIME_TO_GUARANTEE_STABLE, MTBF_TO_GUARANTEE_STABLE,
TIME_KNOWN_TO_GUARANTEE_FAMILIAR WFU_TO_GUARANTEE_GUARD and replace each
of them with a tunnable torrc option.
Related to #40652
Signed-off-by: David Goulet <dgoulet@torproject.org>
Previously, `channelpadding_get_netflow_inactive_timeout_ms` would
crash with an assertion failure if `low_timeout` was greater than
`high_timeout`. That wasn't possible in practice because of checks
in `channelpadding_update_padding_for_channel`, but it's better not
to have a function whose correctness is this tricky to prove.
Fixes#40645. Bugfix on 0.3.1.1-alpha.
Note that with this commit, TRUNCATED cells won't be used anymore that
is client and relays won't emit them.
Fixes#40623
Signed-off-by: David Goulet <dgoulet@torproject.org>
This also incidently removes a use of uninitialized stack data from the
connection_or_set_ext_or_identifier() function.
Fixes#40648
Signed-off-by: David Goulet <dgoulet@torproject.org>
LibreSSL is now closer to OpenSSL 1.1 than OpenSSL 1.0. According to
https://undeadly.org/cgi?action=article;sid=20220116121253, this is the
intention of OpenBSD developers.
According to #40630, many special cases are needed to compile Tor against
LibreSSL 3.5 when using Tor's OpenSSL 1.0 compatibility mode, whereas only a
small number of #defines are required when using OpenSSL 1.1 compatibility
mode. One additional workaround is required for LibreSSL 3.4 compatibility.
Compiles and passes unit tests with LibreSSL 3.4.3 and 3.5.1.
The escaped() function and its kin already wrap their output in
quotes: there's no reason to do so twice.
I am _NOT_ making a corresponding change in calls that make the same
mistake in controller-related functions, however, due to the risk of
a compatibility break. :(
Closes#22723.
For some syscalls the kernel ABI uses 32 bit signed integers. Whether
these 32 bit integer values are sign extended or zero extended to the
native 64 bit register sizes is undefined and dependent on the {arch,
compiler, libc} being used. Instead of trying to detect which cases
zero-extend and which cases sign-extend, this commit uses a masked
equality check on the lower 32 bits of the value.
The chown/chmod/rename syscalls have never existed on AArch64, and libc
implements the POSIX functions via the fchownat/fchmodat/renameat
syscalls instead.
Add new filter functions for fchownat/fchmodat/renameat, not made
architecture specific since the syscalls exists everywhere else too.
However, in order to limit seccomp filter space usage, we only insert
rules for one of {chown, chown32, fchownat} depending on the
architecture (resp. {chmod, fchmodat}, {rename, renameat}).
New glibc versions not sign-extending 32 bit negative constants seems to
not be a thing on AArch64. I suspect that this might not be the only
architecture where the sign-extensions is happening, and the correct fix
might be instead to use a proper 32 bit comparison for the first openat
parameter. For now, band-aid fix this so the sandbox can work again on
AArch64.
Not revalidating keys on every fork speeds up make test from about 45 seconds
to 10 seconds with OpenSSL 1.1.1n and from 6 minutes to 10 seconds with OpenSSL
3.0.2.
MSVC compilation has been broken since at least 1e417b7275 ("All remaining
files in src/common belong to the event loop.") deleted
src/common/Makefile.nmake in 2018.
This rule has not been used since 4ead083dbc ("Do not ship a
fallback-consensus until the related bugs are fixed.") in 2008, and
fallback-consensus support was removed in f742b33d85 ("Drop
FallbackNetworkstatusFile; it never worked.").
Using tor_free is wrong; event_free must be called for objects obtained from
event_new. Additionally, this slightly simplifies the code.
Also, add a static_assert to prevent further instances.
It came as a surprise that Serge, the bridge authority, omits the Running
flag for all bridges in its first 30 minutes after a restart:
https://bugs.torproject.org/tpo/anti-censorship/rdsys/102
The fix we're doing for now is to accept it as correct behavior in
Tor, and change all the supporting tools to be able to handle bridge
networkstatus docs that have no Running bridges.
I'm documenting it here inside Tor too so the next person might not
be so surprised.
This code was heavily reused from the previous DNS timeout work done in
ticket #40491 that was removed afterall from our code.
Closes#40560
Signed-off-by: David Goulet <dgoulet@torproject.org>
We had 3 callsites setting up the circuit congestion control and so this
commit consolidates all 3 calls into 1 function.
Related to #40586
Signed-off-by: David Goulet <dgoulet@torproject.org>
Once the cpath is finalized, e2e encryption setup, transfer the ccontrol
from the rendezvous circuit to the cpath.
This allows the congestion control subsystem to properly function for
both upload and download side of onion services.
Closes#40586
Signed-off-by: David Goulet <dgoulet@torproject.org>
This applies only for relays. Previous commit adds two new consensus
parameters that dictate how libevent is configured with DNS resolution.
And so, with a new consensus, we now look at those values in case they
ever change.
Without this, Exit relay would have to HUP or restart to apply any new
Exit DNS consensus parameters.
Related to #40312
Signed-off-by: David Goulet <dgoulet@torproject.org>
Introduces two new consensus parameter:
exit_dns_timeout: Number of seconds before libevent should consider
the DNS request a timeout.
exit_dns_num_attempts: Number of attempts that libeven should retry a
previously failing query before calling it a timeout.
Closes#40312
Signed-off-by: David Goulet <dgoulet@torproject.org>
This code was heavily reused from the previous DNS timeout work done in
ticket #40491 that was removed afterall from our code.
Closes#40560
Signed-off-by: David Goulet <dgoulet@torproject.org>
Due to a possible Guard subsystem recursion, when the HS client gets
notified that the directory information has changed, it must run it in a
seperate mainloop event to avoid such issue.
See the ticket for more information on the recursion. This also fixes a
fatal assert.
Fixes#40579
Signed-off-by: David Goulet <dgoulet@torproject.org>
It is possible to not have the descriptor anymore by the time the
rendezvous circuit opens. Don't BUG() on that.
Instead, when sending the INTRODUCE1 cell, make sure the descriptor we
have (or have just fetched) matches what we setup in the rendezvous
circuit.
If not, the circuit is closed and another one is opened for a retry.
Fixes#40576
Signed-off-by: David Goulet <dgoulet@torproject.org>
Prometheus needs unique labels and so this bug was causing an onion
service with multiple ports to have multiple "port=" label for the
metrics requiring a port label.
Fixes#40581
Signed-off-by: David Goulet <dgoulet@torproject.org>
Prometheus needs unique labels and so this bug was causing an onion
service with multiple ports to have multiple "port=" label for the
metrics requiring a port label.
Fixes#40581
Signed-off-by: David Goulet <dgoulet@torproject.org>