Change it from "timeout" to "tor_timeout" in order to indicate that the
DNS timeout is one from tor's DNS threshold and not the DNS server
itself.
Fixes#40527
Signed-off-by: David Goulet <dgoulet@torproject.org>
Tor has configure libevent to attempt up to 3 times a DNS query for a
maximum of 5 seconds each. Once that 5 seconds has elapsed, it consider
the query "Timed Out" but tor only gets a timeout if all 3 attempts have
failed.
For example, using Unbound, it has a much higher threshold of timeout.
It is well defined in
https://www.nlnetlabs.nl/documentation/unbound/info-timeout/ and has
some complexity to it. But the gist is that if it times out, it will be
much more than 5 seconds.
And so the Tor DNS timeouts are more of a "UX issue" rather than a
"network issue". For this reason, we are removing this metric from the
overload general signal.
See https://gitlab.torproject.org/tpo/network-health/team/-/issues/139
for more information.
Fixes#40527
Signed-off-by: David Goulet <dgoulet@torproject.org>
This avoids performing and then freeing a lot of small mallocs() if
the hash line has too many elements.
Fixes one case of bug 40472; resolves OSS-Fuzz 38363. Bugfix on
0.3.1.1-alpha when the consdiff parsing code was introduced.
Some PT applications support more than one transport. For example,
obfs4proxy supports obfs4, obfs3, and meek. If one or more transports
specified in the torrc file are supported, we shouldn't kill the managed
proxy on a {C,S}METHOD-ERROR. Instead, we should log a warning.
We were already logging warnings on method errors. This change just
makes sure that the managed proxy isn't killed, and then if no
transports are configured for the managed proxy, bumps the log level up
from a notice to a warning.
Closes#7362
Mingw headers sometimes like to define alternative scanf/printf
format attributes depending on whether they're using clang, UCRT,
MINGW_ANSI_STDIO, or the microsoft version of printf/scanf. This
change attempts to use the right one on the given platform.
This is an attempt to fix part of #40355.
glibc versions 2.33 and newer use the modern "statx" system call in their
implementations of stat() and opendir() for Linux on i386. Prevent failures in
the sandbox unit tests by modifying the sandbox to allow this system call
without restriction on i386 when it is available, and update the test suite to
skip the "sandbox/stat_filename" test in this case as it is certain to fail.
On 32-bit architectures where Linux provides the "stat64" system call,
including i386, the sandbox is unable to filter calls to stat() as glibc uses
this system call itself internally and the sandbox must allow it without
restriction.
Update the sandbox unit tests to skip the "sandbox/stat_filename" test on
systems where the "stat64" system call is defined and the test is certain to
fail. Also reorder the "#if" statement's clauses to correspond with the
comment preceding it, for clarity.
On 32-bit architectures where Linux provides the "clock_gettime64" system call,
including i386, glibc uses it in place of "clock_gettime". Modify the sandbox
implementation to match, to prevent Tor's monotonic-time functions (in
src/lib/time/compat_time.c) failing when the sandbox is active.
On i386 glibc uses the "chown32" system call instead of "chown". Prevent
attempts to filter calls to chown() on this architecture from failing by
modifying the sandbox implementation to match.
This also moves the warnings and add some theatrical effect around the
code so anyone modifying those list should notice the warnings signs and
read the comment accordingly.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Our code doesn't allow it and so this prevents an assert() crash if the
DirPort is for instance IPv6 only.
Fixes#40494
Signed-off-by: David Goulet <dgoulet@torproject.org>
While trying to resolve our CI issues, the Windows build broke with an
unused function error:
src/test/test_switch_id.c:37:1: error: ‘unprivileged_port_range_start’
defined but not used [-Werror=unused-function]
We solve this by moving the `#if !defined(_WIN32)` test above the
`unprivileged_port_range_start()` function defintion such that it is
included in its body.
This is an unreviewed commit.
See: tor#40275
When we don't yet have a descriptor for one of our bridges, disable
the entry guard retry schedule on that bridge. The entry guard retry
schedule and the bridge descriptor retry schedule can conflict,
e.g. where we mark a bridge as "maybe up" yet we don't try to fetch
its descriptor yet, leading Tor to wait (refusing to do anything)
until it becomes time to fetch the descriptor.
Fixes bug 40497; bugfix on 0.3.0.3-alpha.
we do a notice-level log when we decide we *don't* have enough dir
info, but in 0.3.5.1-alpha (see commit eee62e13d9, #14950) we lost our
corresponding notice-level log when things come back.
bugfix on 0.3.5.1-alpha; fixes bug 40496.
Specifically, every time a guard moves into or out of state
GUARD_REACHABLE_MAYBE, it is an opportunity for the guard reachability
state to get out of sync with the have-minimum-dir-info state.
Fixes even more of #40396.
When we try to fetch a bridge descriptor and we fail, we mark
the guard as failed, but we never scheduled a re-compute for
router_have_minimum_dir_info().
So if we had already decided we needed to wait for this new descriptor,
we would just wait forever -- even if, counterintuitively, *losing* the
bridge is just what we need to *resume* using the network, if we had it
in state GUARD_REACHABLE_MAYBE and we were stalling to learn this outcome.
See bug 40396 for more details.
The bridge descriptor fetching codes ends up fetching a lot of duplicate
bridge descriptors, because this is how we learn when the descriptor
changes.
This commit only changes comments plus whether we log that one line.
It moves us back to the old behavior, before the previous commit for
30496, where we would only log that line when the bridge descriptor
we're talking about is better than the one we already had (if any).
Without this change, if we have a working bridge, and we add a new bridge,
we will schedule the fetch attempt for that new bridge descriptor for
three hours(!) in the future.
This change is especially needed because of bug #40396, where if you have
one working bridge and one bridge whose descriptor you haven't fetched
yet, your Tor will stall until you have successfully fetched that new
descriptor -- in this case for hours.
In the old design, we would put off all further bridge descriptor fetches
once we had any working bridge descriptor. In this new design, we make the
decision per bridge based on whether we successfully got *its* descriptor.
To make this work, we need to also call learned_bridge_descriptor() every
time we get a bridge descriptor, not just when it's a novel descriptor.
Fixes bug 40396.
Also happens to fix bug 40495 (redundant descriptor fetches for every
bridge) since now we delay fetches once we succeed.
A side effect of this change is that if we have any configured bridges
that *aren't* working, we will keep trying to fetch their descriptors
on the modern directory retry schedule -- every couple of seconds for
the first half minute, then backing off after that -- which is a lot
faster than before.