This code was heavily reused from the previous DNS timeout work done in
ticket #40491 that was removed afterall from our code.
Closes#40560
Signed-off-by: David Goulet <dgoulet@torproject.org>
Change it from "timeout" to "tor_timeout" in order to indicate that the
DNS timeout is one from tor's DNS threshold and not the DNS server
itself.
Fixes#40527
Signed-off-by: David Goulet <dgoulet@torproject.org>
Tor has configure libevent to attempt up to 3 times a DNS query for a
maximum of 5 seconds each. Once that 5 seconds has elapsed, it consider
the query "Timed Out" but tor only gets a timeout if all 3 attempts have
failed.
For example, using Unbound, it has a much higher threshold of timeout.
It is well defined in
https://www.nlnetlabs.nl/documentation/unbound/info-timeout/ and has
some complexity to it. But the gist is that if it times out, it will be
much more than 5 seconds.
And so the Tor DNS timeouts are more of a "UX issue" rather than a
"network issue". For this reason, we are removing this metric from the
overload general signal.
See https://gitlab.torproject.org/tpo/network-health/team/-/issues/139
for more information.
Fixes#40527
Signed-off-by: David Goulet <dgoulet@torproject.org>
This is due to the libevent bug
https://github.com/libevent/libevent/issues/1219 that fails to return
back the DNS record type on error.
And so, the MetricsPort now only reports the errors as a global counter
and not a per record type.
Closes#40490
Signed-off-by: David Goulet <dgoulet@torproject.org>
With this commit, we will only report a general overload state if we've
seen more than X% of DNS timeout errors over Y seconds. Previous
behavior was to report when a single timeout occured which is really too
small of a threshold.
The value X is a consensus parameters called
"overload_dns_timeout_scale_percent" which is a scaled percentage
(factor of 1000) so we can represent decimal points for X like 0.5% for
instance. Its default is 1000 which ends up being 1%.
The value Y is a consensus parameters called
"overload_dns_timeout_period_secs" which is the time period for which
will gather DNS errors and once over, we assess if that X% has been
reached ultimately triggering a general overload signal.
Closes#40491
Signed-off-by: David Goulet <dgoulet@torproject.org>
With this commit, we will only report a general overload state if we've
seen more than X% of DNS timeout errors over Y seconds. Previous
behavior was to report when a single timeout occured which is really too
small of a threshold.
The value X is a consensus parameters called
"overload_dns_timeout_scale_percent" which is a scaled percentage
(factor of 1000) so we can represent decimal points for X like 0.5% for
instance. Its default is 1000 which ends up being 1%.
The value Y is a consensus parameters called
"overload_dns_timeout_period_secs" which is the time period for which
will gather DNS errors and once over, we assess if that X% has been
reached ultimately triggering a general overload signal.
Closes#40491
Signed-off-by: David Goulet <dgoulet@torproject.org>
Current counters are reset every heartbeat. This commit adds two
counters for the assigned and dropped onionskins that are not reset so
they can be exported onto the MetricsPort.
Closes#40387
Signed-off-by: David Goulet <dgoulet@torproject.org>
We now keep track of all errors and total number of request seen. This
is so we can expose those values to the MetricsPort to help Exit
operators monitor the DNS requests and failures.
Related to #40367.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This emits two events (read and write) of the total number that the
global connection limit was reached.
Related to #40367
Signed-off-by: David Goulet <dgoulet@torproject.org>
With this commit, a relay now emits metrics event on the MetricsPort
related to how many onionskins were handled (processed or dropped) for
each handshake type.
Related to #40367
Signed-off-by: David Goulet <dgoulet@torproject.org>
We use it in router.c, where chunks are joined with "", not with
NL... so leaving off the terminating NL will lead to an unparseable
extrainfo.
Found by toralf. Bug not in any released Tor.
```
src/feature/stats/rephist.c: In function ‘overload_happened_recently’:
src/feature/stats/rephist.c:215:21: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
if (overload_time > approx_time() - 3600 * n_hours) {
```
from https://gitlab.torproject.org/tpo/core/tor/-/issues/40341#note_2729364
- Implement overload statistics structure.
- Implement function that keeps track of overload statistics.
- Implement function that writes overload statistics to descriptor.
- Unittest for the whole logic.
This option changes the time for which a bandwidth measurement period
must have been in progress before we include it when reporting our
observed bandwidth in our descriptors. Without this option, we only
consider a time period towards our maximum if it has been running
for a full day. Obviously, that's unacceptable for testing
networks, where we'd like to get results as soon as possible.
For non-testing networks, I've put a (somewhat arbitrary) 2-hour
minimum on the option, since there are traffic analysis concerns
with immediate reporting here.
Closes#40337.
This is a new detection type which is that a relay can now control the rate of
client connections from a single address.
The mechanism is pretty simple, if the rate/burst is reached, the address is
marked for a period of time and any connection from that address is denied.
Closes#40253
Signed-off-by: David Goulet <dgoulet@torproject.org>
Regular relays are about to get their DirPort removed so that reachability
test is not useful anymore
Authorities will still use the DirPort but because network reentry towards
their DirPort is now denied network wide, this test is not useful anymore and
so it should simply be considered reachable at all time.
Part of #40282
Signed-off-by: David Goulet <dgoulet@torproject.org>
We still keep v2 rendezvous stats since we will allow them until the network
has entirely phased out.
Related to #40266
Signed-off-by: David Goulet <dgoulet@torproject.org>
The rest of rephist.c is doing the same kind of unsigned casting. For example
see rep_hist_format_buffer_stats() and rep_hist_format_exit_stats().
The previous switch to %ld made Appveyor fail:
https://ci.appveyor.com/project/torproject/tor/builds/36118502
Typos found with codespell.
Please keep in mind that this should have impact on actual code
and must be carefully evaluated:
src/core/or/lttng_circuit.inc
- ctf_enum_value("CONTROLER", CIRCUIT_PURPOSE_CONTROLLER)
+ ctf_enum_value("CONTROLLER", CIRCUIT_PURPOSE_CONTROLLER)