Rotate the relay identity key and v3 identity key for moria1. They
have been online for more than a decade, there was a known potential
compromise, and anyway refreshing keys periodically is good practice.
Advertise new ports too, to avoid confusion.
Closes ticket 40722.
This change mitigates DNS-based website oracles by making the time that
a domain name is cached uncertain (+- 4 minutes of what's measurable).
Resolves TROVE-2021-009.
Fixes#40674
We cap our number of CPU worker threads to at least 2 even if we have a
single core. But also, before we used to always add one extra thread
regardless of the number of core.
This meant that we were off when re-using the get_num_cpus() function
when calculating our onionskin work overhead because we were always off
by one.
This commit makes it that we always use the number of thread our actual
thread pool was configured with.
Fixes#40719
Signed-off-by: David Goulet <dgoulet@torproject.org>
Cap this to 2 threads always because we need a low and high priority
thread even with a single core.
Fixes#40713
Signed-off-by: David Goulet <dgoulet@torproject.org>
Created and Rejected connections are ever going up counters. While
Opened connections are gauges going up and down.
Fixes#40712
Signed-off-by: David Goulet <dgoulet@torproject.org>
This change mitigates DNS-based website oracles by making the time that
a domain name is cached uncertain (+- 4 minutes of what's measurable).
Resolves TROVE-2021-009.
Fixes#40674
This is part of the fast path so we need to cache consensus parameters
instead of querying it everytime we need to learn a value.
Part of #40704
Signed-off-by: David Goulet <dgoulet@torproject.org>
Until now, there was this magic number (64) used as the maximum number
of tasks a CPU worker can take at once.
This commit makes it a consensus parameter so our future selves can
think of a better value depending on network conditions.
Part of #40704
Signed-off-by: David Goulet <dgoulet@torproject.org>
Transform the hardcoded value ONIONQUEUE_WAIT_CUTOFF into a consensus
parameter so we can control it network wide.
Closes#40704
Signed-off-by: David Goulet <dgoulet@torproject.org>
This also incidently removes a use of uninitialized stack data from the
connection_or_set_ext_or_identifier() function.
Fixes#40648
Signed-off-by: David Goulet <dgoulet@torproject.org>
* it also uses sys/param.h to track its version;
* present that to tor_libc_get_version_str() as libc version;
while here, we also fix the return of FreeBSD version
* __FreeBSD_version is the correct var tracking the OSVERSION
* we use OSVERSION here (defined by __FreeBSD__);
* it's part of the <sys/param.h> include;
* that tracks all noteworthy changes made to the base system.
* __BSD_VISIBLE is defined by systems like FreeBSD and OpenBSD;
* that also extends to DragonFlyBSD;
* it's used on stdlib.h and ctypes.h on those systems.
This BUG() was added when the code was written to see if this callback
was ever executed after we marked the handle as EOF. It turns out, it
does, but we handle it gracefully. We can therefore remove the BUG().
Fixes tpo/core/tor#40596.
Remove a harmless "Bug" log message that can happen in
relay_addr_learn_from_dirauth() on relays during startup:
tor_bug_occurred_(): Bug: ../src/feature/relay/relay_find_addr.c:225: relay_addr_learn_from_dirauth: Non-fatal assertion !(!ei) failed. (on Tor 0.4.7.10 )
Bug: Tor 0.4.7.10: Non-fatal assertion !(!ei) failed in relay_addr_learn_from_dirauth at ../src/feature/relay/relay_find_addr.c:225. Stack trace: (on Tor 0.4.7.10 )
Finishes fixing bug 40231.
Fixes bug 40523; bugfix on 0.4.5.4-rc.
Move the retry from circuit_expire_building() to when the offending
circuit is being closed.
Fixes#40695
Signed-off-by: David Goulet <dgoulet@torproject.org>
Logic is too convoluted and we can't efficiently apply a specific
timeout depending on the purpose.
Remove it and instead rely on the right circuit cutoff instead of
keeping this flagged circuit open forever.
Part of #40694
Signed-off-by: David Goulet <dgoulet@torproject.org>
Explicitly set the S_CONNECT_REND purpose to a 4-hop cutoff.
As for the established rendezvous circuit waiting on the RENDEZVOUS2,
set one that is very long considering the possible waiting time for the
service to get the request and join our rendezvous.
Part of #40694
Signed-off-by: David Goulet <dgoulet@torproject.org>
Change it to an "unreachable" error so the intro point can be retried
and not flagged as a failure and never retried again.
Closes#40692
Signed-off-by: David Goulet <dgoulet@torproject.org>
This adds two consensus parameters to control the outbound max circuit
queue cell size limit and how many times it is allowed to reach that
limit for a single client IP.
Closes#40680
Signed-off-by: David Goulet <dgoulet@torproject.org>
Directory authorities and relays now interact properly with directory
authorities if they change addresses. In the past, they would continue
to upload votes, signatures, descriptors, etc to the hard-coded address
in the configuration. Now, if the directory authority is listed in
the consensus at a different address, they will direct queries to this
new address.
Specifically, these three activities have changed:
* Posting a vote, a signature, or a relay descriptor to all the dir auths.
* Dir auths fetching missing votes or signatures from all the dir auths.
* Dir auths fetching new descriptors from a specific dir auth when they
just learned about them from that dir auth's vote.
We already do this desired behavior (prefer the address in the consensus,
but fall back to the hard-coded dirservers info if needed) when fetching
missing certs.
There is a fifth case, in router_pick_trusteddirserver(), where clients
and relays are trying to reach a random dir auth to fetch something. I
left that case alone for now because the interaction with fallbackdirs
is complicated.
Implements ticket 40705.
Directory authorities stop voting a consensus "Measured" weight
for relays with the Authority flag. Now these relays will be
considered unmeasured, which should reserve their bandwidth
for their dir auth role and minimize distractions from other roles.
In place of the "Measured" weight, they now include a
"MeasuredButAuthority" weight (not used by anything) so the bandwidth
authority's opinion on this relay can be recorded for posterity.
Resolves ticket 40698.
The AuthDirDontVoteOnDirAuthBandwidth torrc option never worked, and it
was implemented in a way that could have produced consensus conflicts
if it had.
Resolves bug 40700.
Move the retry from circuit_expire_building() to when the offending
circuit is being closed.
Fixes#40695
Signed-off-by: David Goulet <dgoulet@torproject.org>
Logic is too convoluted and we can't efficiently apply a specific
timeout depending on the purpose.
Remove it and instead rely on the right circuit cutoff instead of
keeping this flagged circuit open forever.
Part of #40694
Signed-off-by: David Goulet <dgoulet@torproject.org>
Explicitly set the S_CONNECT_REND purpose to a 4-hop cutoff.
As for the established rendezvous circuit waiting on the RENDEZVOUS2,
set one that is very long considering the possible waiting time for the
service to get the request and join our rendezvous.
Part of #40694
Signed-off-by: David Goulet <dgoulet@torproject.org>
Change it to an "unreachable" error so the intro point can be retried
and not flagged as a failure and never retried again.
Closes#40692
Signed-off-by: David Goulet <dgoulet@torproject.org>
Bug 1: We were purporting to calculate milliseconds per tick, when we
*should* have been computing ticks per millisecond.
Bug 2: Instead of computing either one of those, we were _actually_
computing femtoseconds per tick.
These two bugs covered for one another on x86 hardware, where 1 tick
== 1 nanosecond. But on M1 OSX, 1 tick is about 41 nanoseconds,
causing surprising results.
Fixes bug 40684; bugfix on 0.3.3.1-alpha.
Patch to address #40673. An additional check has been added to
onion_pending_add() in order to ensure that we avoid counting create
cells from clients.
In the cpuworker.c assign_onionskin_to_cpuworker
method if total_pending_tasks >= max_pending_tasks
and channel_is_client(circ->p_chan) returns false then
rep_hist_note_circuit_handshake_dropped() will be called and
rep_hist_note_circuit_handshake_assigned() will not be called. This
causes relays to run into errors due to the fact that the number of
dropped packets exceeds the total number of assigned packets.
To avoid this situation a check has been added to
onion_pending_add() to ensure that these erroneous calls to
rep_hist_note_circuit_handshake_dropped() are not made.
See the #40673 ticket for the conversation with armadev about this issue.
We need to tune these, but we're not likely to need the subtle differences
between a few of them. Removing them will prevent our consensus parameter
string from becoming too long in the event of tuning.
mike is concerned that we would get too much exposure to adversaries,
if we enforce that none of our L2 guards can be in the same family.
this change set now essentially finishes the feature that commit a77727cdc
was attempting to add, but strips the "_and_family" part of that plan.
We had omitted some checks for whether our vanguards (second layer
guards from proposal 333) overlapped or came from the same family.
Now make sure to pick each of them to be independent.
Fixes bug 40639; bugfix on 0.4.7.1-alpha.
Remove UPTIME_TO_GUARANTEE_STABLE, MTBF_TO_GUARANTEE_STABLE,
TIME_KNOWN_TO_GUARANTEE_FAMILIAR WFU_TO_GUARANTEE_GUARD and replace each
of them with a tunnable torrc option.
Related to #40652
Signed-off-by: David Goulet <dgoulet@torproject.org>
Previously, `channelpadding_get_netflow_inactive_timeout_ms` would
crash with an assertion failure if `low_timeout` was greater than
`high_timeout`. That wasn't possible in practice because of checks
in `channelpadding_update_padding_for_channel`, but it's better not
to have a function whose correctness is this tricky to prove.
Fixes#40645. Bugfix on 0.3.1.1-alpha.
Note that with this commit, TRUNCATED cells won't be used anymore that
is client and relays won't emit them.
Fixes#40623
Signed-off-by: David Goulet <dgoulet@torproject.org>
This also incidently removes a use of uninitialized stack data from the
connection_or_set_ext_or_identifier() function.
Fixes#40648
Signed-off-by: David Goulet <dgoulet@torproject.org>
LibreSSL is now closer to OpenSSL 1.1 than OpenSSL 1.0. According to
https://undeadly.org/cgi?action=article;sid=20220116121253, this is the
intention of OpenBSD developers.
According to #40630, many special cases are needed to compile Tor against
LibreSSL 3.5 when using Tor's OpenSSL 1.0 compatibility mode, whereas only a
small number of #defines are required when using OpenSSL 1.1 compatibility
mode. One additional workaround is required for LibreSSL 3.4 compatibility.
Compiles and passes unit tests with LibreSSL 3.4.3 and 3.5.1.
The escaped() function and its kin already wrap their output in
quotes: there's no reason to do so twice.
I am _NOT_ making a corresponding change in calls that make the same
mistake in controller-related functions, however, due to the risk of
a compatibility break. :(
Closes#22723.
Update the sandbox implementation to allow its use with fragile hardening
enabled on AArch64 (ARM64) and other architectures that use Linux's generic
syscall interface. Note that in this configuration the sandbox is completely
unable to filter requests to open files and directories.
Update the sandbox unit tests to match.
On architectures that use Linux's generic syscall interface the legacy "chown"
call is not available; on these systems glibc uses "fchownat" instead. Modify
the sandbox implementation to match.
On architectures that use Linux's generic syscall interface the legacy "chmod"
call is not available; on these systems glibc uses "fchmodat" instead. Modify
the sandbox implementation to match.
On architectures that use Linux's generic syscall interface the legacy "stat"
and "stat64" calls may not be available; on these systems glibc uses
"newfstatat" instead. Modify the sandbox implementation to match.
Note that on these architectures as on others glibc 2.33 uses "newfstatat" in a
way the sandbox cannot filter, so preserve in add_noparam_filter() the code
that allows the use of this syscall without restriction when glibc version 2.33
is in use.
On architectures where Linux does not provide the legacy "rename" syscall it
offers one or both of "renameat" and "renameat2" instead. Follow glibc's logic
in selecting which syscall to filter.
On architectures where Linux does not provide the legacy "open" syscall glibc
necessarily uses "openat" instead. Omit the unnecessary glibc-version check on
these systems.
For some syscalls the kernel ABI uses 32 bit signed integers. Whether
these 32 bit integer values are sign extended or zero extended to the
native 64 bit register sizes is undefined and dependent on the {arch,
compiler, libc} being used. Instead of trying to detect which cases
zero-extend and which cases sign-extend, this commit uses a masked
equality check on the lower 32 bits of the value.
The chown/chmod/rename syscalls have never existed on AArch64, and libc
implements the POSIX functions via the fchownat/fchmodat/renameat
syscalls instead.
Add new filter functions for fchownat/fchmodat/renameat, not made
architecture specific since the syscalls exists everywhere else too.
However, in order to limit seccomp filter space usage, we only insert
rules for one of {chown, chown32, fchownat} depending on the
architecture (resp. {chmod, fchmodat}, {rename, renameat}).
New glibc versions not sign-extending 32 bit negative constants seems to
not be a thing on AArch64. I suspect that this might not be the only
architecture where the sign-extensions is happening, and the correct fix
might be instead to use a proper 32 bit comparison for the first openat
parameter. For now, band-aid fix this so the sandbox can work again on
AArch64.
Not revalidating keys on every fork speeds up make test from about 45 seconds
to 10 seconds with OpenSSL 1.1.1n and from 6 minutes to 10 seconds with OpenSSL
3.0.2.
MSVC compilation has been broken since at least 1e417b7275 ("All remaining
files in src/common belong to the event loop.") deleted
src/common/Makefile.nmake in 2018.
This rule has not been used since 4ead083dbc ("Do not ship a
fallback-consensus until the related bugs are fixed.") in 2008, and
fallback-consensus support was removed in f742b33d85 ("Drop
FallbackNetworkstatusFile; it never worked.").
Using tor_free is wrong; event_free must be called for objects obtained from
event_new. Additionally, this slightly simplifies the code.
Also, add a static_assert to prevent further instances.
It came as a surprise that Serge, the bridge authority, omits the Running
flag for all bridges in its first 30 minutes after a restart:
https://bugs.torproject.org/tpo/anti-censorship/rdsys/102
The fix we're doing for now is to accept it as correct behavior in
Tor, and change all the supporting tools to be able to handle bridge
networkstatus docs that have no Running bridges.
I'm documenting it here inside Tor too so the next person might not
be so surprised.
This code was heavily reused from the previous DNS timeout work done in
ticket #40491 that was removed afterall from our code.
Closes#40560
Signed-off-by: David Goulet <dgoulet@torproject.org>
We had 3 callsites setting up the circuit congestion control and so this
commit consolidates all 3 calls into 1 function.
Related to #40586
Signed-off-by: David Goulet <dgoulet@torproject.org>
Once the cpath is finalized, e2e encryption setup, transfer the ccontrol
from the rendezvous circuit to the cpath.
This allows the congestion control subsystem to properly function for
both upload and download side of onion services.
Closes#40586
Signed-off-by: David Goulet <dgoulet@torproject.org>
This applies only for relays. Previous commit adds two new consensus
parameters that dictate how libevent is configured with DNS resolution.
And so, with a new consensus, we now look at those values in case they
ever change.
Without this, Exit relay would have to HUP or restart to apply any new
Exit DNS consensus parameters.
Related to #40312
Signed-off-by: David Goulet <dgoulet@torproject.org>
Introduces two new consensus parameter:
exit_dns_timeout: Number of seconds before libevent should consider
the DNS request a timeout.
exit_dns_num_attempts: Number of attempts that libeven should retry a
previously failing query before calling it a timeout.
Closes#40312
Signed-off-by: David Goulet <dgoulet@torproject.org>
This code was heavily reused from the previous DNS timeout work done in
ticket #40491 that was removed afterall from our code.
Closes#40560
Signed-off-by: David Goulet <dgoulet@torproject.org>
Due to a possible Guard subsystem recursion, when the HS client gets
notified that the directory information has changed, it must run it in a
seperate mainloop event to avoid such issue.
See the ticket for more information on the recursion. This also fixes a
fatal assert.
Fixes#40579
Signed-off-by: David Goulet <dgoulet@torproject.org>
It is possible to not have the descriptor anymore by the time the
rendezvous circuit opens. Don't BUG() on that.
Instead, when sending the INTRODUCE1 cell, make sure the descriptor we
have (or have just fetched) matches what we setup in the rendezvous
circuit.
If not, the circuit is closed and another one is opened for a retry.
Fixes#40576
Signed-off-by: David Goulet <dgoulet@torproject.org>
Prometheus needs unique labels and so this bug was causing an onion
service with multiple ports to have multiple "port=" label for the
metrics requiring a port label.
Fixes#40581
Signed-off-by: David Goulet <dgoulet@torproject.org>
Prometheus needs unique labels and so this bug was causing an onion
service with multiple ports to have multiple "port=" label for the
metrics requiring a port label.
Fixes#40581
Signed-off-by: David Goulet <dgoulet@torproject.org>
It looks like our code actually assumes (by dereferencing it in a
log call) that ed_id will _not_ be NULL, but rather will be a bunch
of zero bytes. Refactor the code accordingly, and stop using NULL
tests on ed_id.
We expect ed_id == NULL here to indicate "no ed id", but other parts
of Tor sometimes use an all-0 ed_id. Here we detect that input and
replace it with what's expected.
The previous code would notice if we were changing from one identity
to another, but not if we were changing from no identity to having
an identity. This problem caused a bug (spotted by cypherpunks in
ticket #40563) where if we created a channel for a circuit request
that doesn't include an Ed25519 identity, we won't be able to use
that channel later for requests that _do_ list Ed25519.
Fix for 40563; bugfix on 0.3.0.1-alpha.
Previously we were using tor_assert() to enforce the documented
invariant here; this commit changes it to use BUG() instead. It
will protect us from crashes if the next commit (on #40563) turns
out to expose a bug somewhere.
Republishing is necessary to ensure that clients connect using the correct
sendme_inc upon any change. Additionally, introduction points must be
re-chosen, so that cached descriptors with old values are not usable.
We do not expect to change sendme_inc, unless cell size or TLS record size
changes, so this should be rare.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Move it to extension.trunnel instead so that extension ABI construction
can be used in other parts of tor than just HS cells.
Specifically, we'll use it in the ntorv3 data payload and make a
congestion control parameter extension using that binary structure.
Only rename. No code behavior changes.
Signed-off-by: David Goulet <dgoulet@torproject.org>
We can now query the protover subsystem to get the current value we
support for a specific protover type.
This will be useful for prop324 onion service part which puts in the
FlowCtrl value in the service descriptor.
No behavior change.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This is for public network testing and for sbws. Should not otherwise be used,
hence it is an undocumented __option.
The option deliberately does not allow force-disabling congestion control, as
this is bad for queueing and fairness.
It is possible that a scheduled channel ended up with 0 bytes in its
outbuf after the scheduling loop and having an outbuf table entry
indicating that we need to flush bytes on the wire after the loop.
This lead to attempt to write 0 bytes up to the TLS layer that would
prevent such action.
All in all, this fixes wasted CPU cycles on attempting to flush nothing.
Fixes#40548
Signed-off-by: David Goulet <dgoulet@torproject.org>
Introduced in bf10206e9e which is not
released yet thus no changes file.
Found by Coverity with cid #1495786.
Fixes#40532
Signed-off-by: David Goulet <dgoulet@torproject.org>
HSv2 is not advertised as a supported protocol version anymore.
This was introduced with the merge-forward of commit 72041c6306
which didn't fix the unit test.
Fixes#40549
Signed-off-by: David Goulet <dgoulet@torproject.org>
We removed HSIntro=3 and HSDir=1 that are v2 specific. Since 0.3.5.17,
we do not support introducing or being a directory for onion service v2.
Closes#40509
Signed-off-by: David Goulet <dgoulet@torproject.org>
Change it from "timeout" to "tor_timeout" in order to indicate that the
DNS timeout is one from tor's DNS threshold and not the DNS server
itself.
Fixes#40527
Signed-off-by: David Goulet <dgoulet@torproject.org>
Tor has configure libevent to attempt up to 3 times a DNS query for a
maximum of 5 seconds each. Once that 5 seconds has elapsed, it consider
the query "Timed Out" but tor only gets a timeout if all 3 attempts have
failed.
For example, using Unbound, it has a much higher threshold of timeout.
It is well defined in
https://www.nlnetlabs.nl/documentation/unbound/info-timeout/ and has
some complexity to it. But the gist is that if it times out, it will be
much more than 5 seconds.
And so the Tor DNS timeouts are more of a "UX issue" rather than a
"network issue". For this reason, we are removing this metric from the
overload general signal.
See https://gitlab.torproject.org/tpo/network-health/team/-/issues/139
for more information.
Fixes#40527
Signed-off-by: David Goulet <dgoulet@torproject.org>
This avoids performing and then freeing a lot of small mallocs() if
the hash line has too many elements.
Fixes one case of bug 40472; resolves OSS-Fuzz 38363. Bugfix on
0.3.1.1-alpha when the consdiff parsing code was introduced.
Some PT applications support more than one transport. For example,
obfs4proxy supports obfs4, obfs3, and meek. If one or more transports
specified in the torrc file are supported, we shouldn't kill the managed
proxy on a {C,S}METHOD-ERROR. Instead, we should log a warning.
We were already logging warnings on method errors. This change just
makes sure that the managed proxy isn't killed, and then if no
transports are configured for the managed proxy, bumps the log level up
from a notice to a warning.
Closes#7362
When a new consensus method is negotiated, these values will all get
replaced with "2038-01-01 00:00:00".
This change should be safe because:
* As of 0.2.9.11 / 0.3.0.7 / 0.3.1.1-alpha, Tor takes no action
about published_on times in the future.
* The only remaining parties relying on published_on values are (we
believe) relays running 0.3.5.x, which rely on the values in a NS
consensus to see whether their descriptors are out of date. But
this patch only changes microdesc consensuses.
* The latest Tor no longer looks at this field in consensuses.
Why make this change? In experiments, replacing these values with a
fixed value made the size of compressed consensus diffs much much
smaller. (Like, by over 50%!)
Implements proposal 275; Implements #40130.
Nothing breaks here, since all non-voting users of
routerstatus_t.published_on have been adjusted or removed in
previous commits.
We have to expand the API of routerstatus_format_entry() a bit,
though, so that it can always get a published time as argument,
since it can't get it from the routerstatus any more.
This should have no effect on voter behavior.
It used to describe when the old and new routerinfos were published
when we'd decide to download a routerinfo. Now it describes what
their descriptor digests are.
The consensus voters shouldn't actually include such old routers in
the consensus anyway, so this logic shouldn't come up...
but if a client _does_ download something it wouldn't use, it won't
retry infinitely: see checks for WRA_NEVER_DOWNLOADABLE.
Previously we'd look at the routerstatus published_on field when
deciding what to dump, which really has no point. If something's in
the consensus with an ancient published date, then we do want to
keep it.
where the or_conn for testing the failure cache would be initialized
with random stack data, so e.g. its potentially_used_for_bootstrapping
field would start out at some random value.
The connect failure cache had a bad interaction with retrying connections
to our guards or bridges when we go offline and then come back online --
while offline we would fail to connect and cache this result, and then
when we return we would decline to even attempt to connect, because our
failure cache said it wouldn't work.
Now only cache connect failures for relays when we connected to them
because of somebody else's EXTEND request.
Fixes bug 40499; bugfix on 0.3.3.4-alpha.
Mingw headers sometimes like to define alternative scanf/printf
format attributes depending on whether they're using clang, UCRT,
MINGW_ANSI_STDIO, or the microsoft version of printf/scanf. This
change attempts to use the right one on the given platform.
This is an attempt to fix part of #40355.
glibc versions 2.33 and newer use the modern "statx" system call in their
implementations of stat() and opendir() for Linux on i386. Prevent failures in
the sandbox unit tests by modifying the sandbox to allow this system call
without restriction on i386 when it is available, and update the test suite to
skip the "sandbox/stat_filename" test in this case as it is certain to fail.
On 32-bit architectures where Linux provides the "stat64" system call,
including i386, the sandbox is unable to filter calls to stat() as glibc uses
this system call itself internally and the sandbox must allow it without
restriction.
Update the sandbox unit tests to skip the "sandbox/stat_filename" test on
systems where the "stat64" system call is defined and the test is certain to
fail. Also reorder the "#if" statement's clauses to correspond with the
comment preceding it, for clarity.
On 32-bit architectures where Linux provides the "clock_gettime64" system call,
including i386, glibc uses it in place of "clock_gettime". Modify the sandbox
implementation to match, to prevent Tor's monotonic-time functions (in
src/lib/time/compat_time.c) failing when the sandbox is active.
On i386 glibc uses the "chown32" system call instead of "chown". Prevent
attempts to filter calls to chown() on this architecture from failing by
modifying the sandbox implementation to match.
This also moves the warnings and add some theatrical effect around the
code so anyone modifying those list should notice the warnings signs and
read the comment accordingly.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Our code doesn't allow it and so this prevents an assert() crash if the
DirPort is for instance IPv6 only.
Fixes#40494
Signed-off-by: David Goulet <dgoulet@torproject.org>
While trying to resolve our CI issues, the Windows build broke with an
unused function error:
src/test/test_switch_id.c:37:1: error: ‘unprivileged_port_range_start’
defined but not used [-Werror=unused-function]
We solve this by moving the `#if !defined(_WIN32)` test above the
`unprivileged_port_range_start()` function defintion such that it is
included in its body.
This is an unreviewed commit.
See: tor#40275
When we don't yet have a descriptor for one of our bridges, disable
the entry guard retry schedule on that bridge. The entry guard retry
schedule and the bridge descriptor retry schedule can conflict,
e.g. where we mark a bridge as "maybe up" yet we don't try to fetch
its descriptor yet, leading Tor to wait (refusing to do anything)
until it becomes time to fetch the descriptor.
Fixes bug 40497; bugfix on 0.3.0.3-alpha.
we do a notice-level log when we decide we *don't* have enough dir
info, but in 0.3.5.1-alpha (see commit eee62e13d9, #14950) we lost our
corresponding notice-level log when things come back.
bugfix on 0.3.5.1-alpha; fixes bug 40496.
Specifically, every time a guard moves into or out of state
GUARD_REACHABLE_MAYBE, it is an opportunity for the guard reachability
state to get out of sync with the have-minimum-dir-info state.
Fixes even more of #40396.
When we try to fetch a bridge descriptor and we fail, we mark
the guard as failed, but we never scheduled a re-compute for
router_have_minimum_dir_info().
So if we had already decided we needed to wait for this new descriptor,
we would just wait forever -- even if, counterintuitively, *losing* the
bridge is just what we need to *resume* using the network, if we had it
in state GUARD_REACHABLE_MAYBE and we were stalling to learn this outcome.
See bug 40396 for more details.
The bridge descriptor fetching codes ends up fetching a lot of duplicate
bridge descriptors, because this is how we learn when the descriptor
changes.
This commit only changes comments plus whether we log that one line.
It moves us back to the old behavior, before the previous commit for
30496, where we would only log that line when the bridge descriptor
we're talking about is better than the one we already had (if any).
Without this change, if we have a working bridge, and we add a new bridge,
we will schedule the fetch attempt for that new bridge descriptor for
three hours(!) in the future.
This change is especially needed because of bug #40396, where if you have
one working bridge and one bridge whose descriptor you haven't fetched
yet, your Tor will stall until you have successfully fetched that new
descriptor -- in this case for hours.
In the old design, we would put off all further bridge descriptor fetches
once we had any working bridge descriptor. In this new design, we make the
decision per bridge based on whether we successfully got *its* descriptor.
To make this work, we need to also call learned_bridge_descriptor() every
time we get a bridge descriptor, not just when it's a novel descriptor.
Fixes bug 40396.
Also happens to fix bug 40495 (redundant descriptor fetches for every
bridge) since now we delay fetches once we succeed.
A side effect of this change is that if we have any configured bridges
that *aren't* working, we will keep trying to fetch their descriptors
on the modern directory retry schedule -- every couple of seconds for
the first half minute, then backing off after that -- which is a lot
faster than before.
When this method is in place, then any relay which is assigned
MiddleOnly has Exit, V2Dir, Guard, and HSDir cleared
(and has BadExit set if appropriate).
This proposal implements part of Prop335; it's based on a patch
from Neel Chauhan.
When configured to do so, authorities will assign a MiddleOnly flag
to certain relays. Any relay which an authority gives this flag
will not get Exit, V2Dir, Guard, or HSDir, and might get BadExit if
the authority votes for that one.
We keep it around until libevent is fixed, it should be used again. In
the meantime, avoid the compiler to complain of this unused variable.
https://gitlab.torproject.org/dgoulet/tor/-/jobs/43358#L1522
Signed-off-by: David Goulet <dgoulet@torproject.org>
When we looked, this was the third most frequent message at
PROTOCOL_WARN, and doesn't actually tell us what to do about it.
Now:
* we just log it at info
* we log it only once per circuit
* we report, in the heartbeat, how many times it happens, how many
cells it happens with per circuit, and how long these circuits
have been alive (on average).
Fixes the final part of #40400.
We don't output per-type DNS errors anymore so avoid looping over the
DNS query type and output each errors for them. Before this commit, it
created 3x the same message because we had A, AAAA and PTR type records.
Fix on previous commit e7abab8782
Signed-off-by: David Goulet <dgoulet@torproject.org>
This is due to the libevent bug
https://github.com/libevent/libevent/issues/1219 that fails to return
back the DNS record type on error.
And so, the MetricsPort now only reports the errors as a global counter
and not a per record type.
Closes#40490
Signed-off-by: David Goulet <dgoulet@torproject.org>
With this commit, we will only report a general overload state if we've
seen more than X% of DNS timeout errors over Y seconds. Previous
behavior was to report when a single timeout occured which is really too
small of a threshold.
The value X is a consensus parameters called
"overload_dns_timeout_scale_percent" which is a scaled percentage
(factor of 1000) so we can represent decimal points for X like 0.5% for
instance. Its default is 1000 which ends up being 1%.
The value Y is a consensus parameters called
"overload_dns_timeout_period_secs" which is the time period for which
will gather DNS errors and once over, we assess if that X% has been
reached ultimately triggering a general overload signal.
Closes#40491
Signed-off-by: David Goulet <dgoulet@torproject.org>
With this commit, we will only report a general overload state if we've
seen more than X% of DNS timeout errors over Y seconds. Previous
behavior was to report when a single timeout occured which is really too
small of a threshold.
The value X is a consensus parameters called
"overload_dns_timeout_scale_percent" which is a scaled percentage
(factor of 1000) so we can represent decimal points for X like 0.5% for
instance. Its default is 1000 which ends up being 1%.
The value Y is a consensus parameters called
"overload_dns_timeout_period_secs" which is the time period for which
will gather DNS errors and once over, we assess if that X% has been
reached ultimately triggering a general overload signal.
Closes#40491
Signed-off-by: David Goulet <dgoulet@torproject.org>
There is no absolute install path that an app can expect data files on
Android. Everything is expected to be a path inside of the app, and those
paths depend on the Application ID of the app. /data/local/tmp is
guaranteed to exist, but will only be usable by the 'shell' and 'root'
users, so this fallback is for debugging only.
This fixes a reproducible issue where the tor-android build harness ended
up including build paths for SHARE_DATADIR.
https://gitlab.torproject.org/tpo/core/tor/-/merge_requests/460
This means that at this commit, tor will stop logging that v2 is
deprecated and treat a v2 address as a bad hostname that we can't use.
Part of #40476
Signed-off-by: David Goulet <dgoulet@torproject.org>
Now that we don't have version 2, it gives us:
[warn] HiddenServiceVersion must be between 3 and 3, not 2.
This commit changes it to:
[warn] HiddenServiceVersion must be 3, not 2.
Part of #40476
Signed-off-by: David Goulet <dgoulet@torproject.org>
Some tests were removed because they were testing something not usable
anymore.
Some tests remains to make sure that things are indeed disabled.
Part of #40476
Signed-off-by: David Goulet <dgoulet@torproject.org>
Now that we don't have version 2, it gives us:
[warn] HiddenServiceVersion must be between 3 and 3, not 2.
This commit changes it to:
[warn] HiddenServiceVersion must be 3, not 2.
Part of #40476
Signed-off-by: David Goulet <dgoulet@torproject.org>
Some tests were removed because they were testing something not usable
anymore.
Some tests remains to make sure that things are indeed disabled.
Part of #40476
Signed-off-by: David Goulet <dgoulet@torproject.org>
Relay do not accept both stores and lookups of version 2 descriptor.
This effectively disable version 2 HSDir supports for relays.
Part of #40476
Signed-off-by: David Goulet <dgoulet@torproject.org>
Now that we don't have version 2, it gives us:
[warn] HiddenServiceVersion must be between 3 and 3, not 2.
This commit changes it to:
[warn] HiddenServiceVersion must be 3, not 2.
Part of #40476
Signed-off-by: David Goulet <dgoulet@torproject.org>
Some tests were removed because they were testing something not usable
anymore.
Some tests remains to make sure that things are indeed disabled.
Part of #40476
Signed-off-by: David Goulet <dgoulet@torproject.org>
Relay do not accept both stores and lookups of version 2 descriptor.
This effectively disable version 2 HSDir supports for relays.
Part of #40476
Signed-off-by: David Goulet <dgoulet@torproject.org>
Upon receiving a v2 introduction request, the relay will close the
circuit and send back a tor protocol error.
Part of #40476
Signed-off-by: David Goulet <dgoulet@torproject.org>
The minimum service version is raised from 2 to 3 which effectively
disable loading or creating an onion service v2.
As for ADD_ONION, for version 2, a 551 error is returned:
"551 Failed to add Onion Service"
Part of #40476
Signed-off-by: David Goulet <dgoulet@torproject.org>
This effectively turns off the ability of tor to use HSv2 as a client by
invalidating the v2 onion hostname passed through a SOCKS request.
Part of #40476
Signed-off-by: David Goulet <dgoulet@torproject.org>
Values greater than 100 would have had the same effect as 100, so
this doesn't actually change Tor's behavior; it just makes the
intent clearer. Fixes#40486; see also torspec#66.
This is the loudest of our LOG_PROTOCOL_WARN messages, it can occur
naturally, and there doesn't seem to be a great response to it.
Partial fix for 40400; bugfix on 0.1.1.13-alpha.
This one happens every time we get a failure from
circuit_receive_relay_cell -- but for all the relevant failing cases
in that function, we already log in that function.
This resolves one case of #40400. Two cases remain.
Series 0.4.2.x, 0.4.3.x and 0.4.4.x will all be rejected at the
authority level at this commit.
Futhermore, the 0.4.5.x alphas and rc will also be rejected.
Closes#40480
Signed-off-by: David Goulet <dgoulet@torproject.org>
Coverity report: CID 1492322
________________________________________________________________________________________________________
*** CID 1492322: Integer handling issues (OVERFLOW_BEFORE_WIDEN)
/src/core/or/congestion_control_flow.c: 399 in circuit_process_stream_xon()
393 }
394
395 log_info(LD_EDGE, "Got XON: %d", xon->kbps_ewma);
396
397 /* Adjust the token bucket of this edge connection with the drain rate in
398 * the XON. Rate is in bytes from kilobit (kpbs). */
>>> CID 1492322: Integer handling issues (OVERFLOW_BEFORE_WIDEN)
>>> Potentially overflowing expression "xon_cell_get_kbps_ewma(xon) * 1000U" with type "unsigned int" (32 bits, unsigned) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type "uint64_t" (64 bits, unsigned).
399 uint64_t rate = xon_cell_get_kbps_ewma(xon) * 1000;
400 if (rate == 0 || INT32_MAX < rate) {
401 /* No rate. */
402 rate = INT32_MAX;
403 }
404 token_bucket_rw_adjust(&conn->bucket, (uint32_t) rate, (uint32_t) rate);
Fixes#40478
Signed-off-by: David Goulet <dgoulet@torproject.org>
Fixes issue #22469 where port strings such as '0x00' get accepted, not
because the string gets converted to hex, but because the string is
silently truncated past the invalid character 'x'. This also causes
issues for strings such as '0x01-0x02' which look like a hex port range,
but in reality gets truncated to '0', which is definitely not what a
user intends.
Warn and reject such port strings as invalid.
Also, since we're throwing that "malformed port" warning a lot in the
function, wrap it up in a nice goto.
Fixes#22469
The connection_ap_attach_pending() function processes all pending
streams in the pending_entry_connections list. It first copy the pointer
and then allocates a brand new empty list.
It then iterates over that copy pointer to try to attach entry
connections onto any fitting circuits using
connection_ap_handshake_attach_circuit().
That very function, for onion service, can lead to flagging _all_
streams of the same onion service to be put in state RENDDESC_WAIT from
CIRCUIT_WAIT. By doing so, it also tries to remove them from the
pending_entry_connections but at that point it is already empty.
Problem is that the we are iterating over the previous
pending_entry_connections which contains the streams that have just
changed state and are no longer in CIRCUIT_WAIT.
This lead to this bug warning occuring a lot on busy services:
May 01 08:55:43.000 [warn] connection_ap_attach_pending(): Bug:
0x55d8764ae550 is no longer in circuit_wait. Its current state is
waiting for rendezvous desc. Why is it on pending_entry_connections?
(on Tor 0.4.4.0-alpha-dev )
This fix is minimal and basically allow a state to be not CIRCUIT_WAIT
and move on to the next one without logging a warning. Because the
pending_entry_connections is emptied before processing, there is no
chance for a streams to be stuck there forever thus it is OK to ignore
streams not in the right state.
Fixes#34083
Signed-off-by: David Goulet <dgoulet@torproject.org>