This is the loudest of our LOG_PROTOCOL_WARN messages, it can occur
naturally, and there doesn't seem to be a great response to it.
Partial fix for 40400; bugfix on 0.1.1.13-alpha.
This one happens every time we get a failure from
circuit_receive_relay_cell -- but for all the relevant failing cases
in that function, we already log in that function.
This resolves one case of #40400. Two cases remain.
Series 0.4.2.x, 0.4.3.x and 0.4.4.x will all be rejected at the
authority level at this commit.
Futhermore, the 0.4.5.x alphas and rc will also be rejected.
Closes#40480
Signed-off-by: David Goulet <dgoulet@torproject.org>
The connection_ap_attach_pending() function processes all pending
streams in the pending_entry_connections list. It first copy the pointer
and then allocates a brand new empty list.
It then iterates over that copy pointer to try to attach entry
connections onto any fitting circuits using
connection_ap_handshake_attach_circuit().
That very function, for onion service, can lead to flagging _all_
streams of the same onion service to be put in state RENDDESC_WAIT from
CIRCUIT_WAIT. By doing so, it also tries to remove them from the
pending_entry_connections but at that point it is already empty.
Problem is that the we are iterating over the previous
pending_entry_connections which contains the streams that have just
changed state and are no longer in CIRCUIT_WAIT.
This lead to this bug warning occuring a lot on busy services:
May 01 08:55:43.000 [warn] connection_ap_attach_pending(): Bug:
0x55d8764ae550 is no longer in circuit_wait. Its current state is
waiting for rendezvous desc. Why is it on pending_entry_connections?
(on Tor 0.4.4.0-alpha-dev )
This fix is minimal and basically allow a state to be not CIRCUIT_WAIT
and move on to the next one without logging a warning. Because the
pending_entry_connections is emptied before processing, there is no
chance for a streams to be stuck there forever thus it is OK to ignore
streams not in the right state.
Fixes#34083
Signed-off-by: David Goulet <dgoulet@torproject.org>
When building with --enable-fragile-hardening, add or relax Linux
seccomp rules to allow AddressSanitizer to execute normally if the
process terminates with the sandbox active.
Further resolves issue 11477.
When a directory request fails, we flag the relay as non Running so we
don't use it anymore.
This can be problematic with onion services because there are cases
where a tor instance could have a lot of services, ephemeral ones, and
keeps failing to upload descriptors, let say due to a bad network, and
thus flag a lot of nodes as non Running which then in turn can not be
used for circuit building.
This commit makes it that we never flag nodes as non Running on a onion
service directory request (upload or fetch) failure as to keep the
hashring intact and not affect other parts of tor.
Fortunately, the onion service hashring is _not_ selected by looking at
the Running flag but since we do a 3-hop circuit to the HSDir, other
services on the same instance can influence each other by removing nodes
from the consensus for path selection.
This was made apparent with a small network that ran out of nodes to
used due to rapid succession of onion services uploading and failing.
See #40434 for details.
Fixes#40434
Signed-off-by: David Goulet <dgoulet@torproject.org>
Fixes bug 40078.
As reported by hdevalence our batch verification logic can cause an assert
crash.
The assert happens because when the batch verification of ed25519-donna fails,
the code in `ed25519_checksig_batch()` falls back to doing a single
verification for each signature.
The crash occurs because batch verification failed, but then all signatures
individually verified just fine.
That's because batch verification and single verification use a different
equation which means that there are sigs that can pass single verification
but fail batch verification.
Fixing this would require modding ed25519-donna which is not in scope for
this ticket, and will be soon deprecated in favor of arti and
ed25519-dalek, so my branch instead removes batch verification.
Continue having a tor_gmtime_impl() unit test so that we can detect
any problems in our replacement function; add a new test function to
make sure that gmtime<->timegm are a round-trip on now-ish times.
This is a fix for bug #40383, wherein we ran into trouble because
tor_timegm() does not believe that time_t should include a count of
leap seconds, but FreeBSD's gmtime believes that it should. This
disagreement meant that for a certain amount of time each day,
instead of calculating the most recent midnight, our voting-schedule
functions would calculate the second-most-recent midnight, and lead
to an assertion failure.
I am calling this a bugfix on 0.2.0.3-alpha when we first started
calculating our voting schedule in this way.
This patch enables the deterministic RNG for address set tests,
including the tests which uses address set indirectly via the nodelist
API.
This should prevent random test failures in the highly unlikely case of
a false positive which was seen in tor#40419.
See: tpo/core/tor#40419.
This issue was reported by Jann Horn part of Google's Project Zero.
Jann's one-sentence summary: entry/middle relays can spoof RELAY_END cells on
half-closed streams, which can lead to stream confusion between OP and
exit.
Fixes#40389
Previously, we would detect errors from a missing RNG
implementation, but not failures from the RNG code itself.
Fortunately, it appears those failures do not happen in practice
when Tor is using OpenSSL's default RNG implementation. Fixes bug
40390; bugfix on 0.2.8.1-alpha. This issue is also tracked as
TROVE-2021-004. Reported by Jann Horn at Google's Project Zero.
Cached_dir_t is a somewhat "legacy" kind of storage when used for
consensus documents, and it appears that there are cases when
changing our settings causes us to stop updating those entries.
This can cause trouble, as @arma found out in #40375, where he
changed his settings around, and consensus diff application got
messed up: consensus diffs were being _requested_ based on the
latest consensus, but were being (incorrectly) applied to a
consensus that was no longer the latest one.
This patch is a minimal fix for backporting purposes: it has Tor do
the same search when applying consensus diffs as we use to request
them. This should be sufficient for correct behavior.
There's a similar case in GETINFO handling; I've fixed that too.
Fixes#40375; bugfix on 0.3.1.1-alpha.
When seccomp sandbox is active, SAVECONF failed because it was not
able to save the backup files for torrc. This commit simplifies
the implementation of SAVECONF and sandbox by making it keep only
one backup of the configuration file.
The connection type for the listener part was missing from the "is
connection a listener" function.
This lead to our periodic event that retries our listeners to keep
trying to bind() again on an already opened MetricsPort.
Closes#40370
Signed-off-by: David Goulet <dgoulet@torproject.org>
This change permits the newfstatat() system call, and fixes issues
40382 (and 40381).
This isn't a free change. From the commit:
// Libc 2.33 uses this syscall to implement both fstat() and stat().
//
// The trouble is that to implement fstat(fd, &st), it calls:
// newfstatat(fs, "", &st, AT_EMPTY_PATH)
// We can't detect this usage in particular, because "" is a pointer
// we don't control. And we can't just look for AT_EMPTY_PATH, since
// AT_EMPTY_PATH only has effect when the path string is empty.
//
// So our only solution seems to be allowing all fstatat calls, which
// means that an attacker can stat() anything on the filesystem. That's
// not a great solution, but I can't find a better one.
As of GCC 11.1.1, the compiler warns us about code like this:
if (a)
b;
c;
and that's a good thing: we wouldn't want to "goto fail". But we
had an instance if this in circuituse.c, which was making our
compilation sad.
Fixes bug 40380; bugfix on 0.3.0.1-alpha.
Turns out that passing client authorization keys to ADD_ONION for v3 was
not working because we were not setting the "is_client_auth_enabled"
flag to true once the clients were configured. This lead to the
descriptor being encoded without the clients.
This patch removes that flag and instead adds an inline function that
can be used to check if a given service has client authorization
enabled.
This will be much less error prone of needing to keep in sync the client
list and a flag instead.
Fixes#40378
Signed-off-by: David Goulet <dgoulet@torproject.org>
This function has been a no-op since Libevent 2.0.4-alpha, when
libevent got an arc4random() implementation. Libevent has finally
removed it, which will break our compilation unless we stop calling
it. (This is currently breaking compilation in OSS-fuzz.)
Closes#40371.
This is related to ticket #40360 which found this problem when a Bridge entry
with a transport name (let say obfs4) is set without a fingerprint:
Bridge obfs4 <IP>:<PORT> cert=<...> iat-mode=0
(Notice, no fingerprint between PORT and "cert=")
Problem: commit 09c6d03246 added a check in
get_sampled_guard_for_bridge() that would return NULL if the selected bridge
did not have a valid transport name (that is the Bridge transport name that
corresponds to a ClientTransportPlugin).
Unfortuantely, this function is also used when selecting our eligible guards
which is done *before* the transport list is populated and so the added check
for the bridge<->transport name is querying an empty list of transports
resulting in always returning NULL.
For completion, the logic is: Pick eligible guards (use bridge(s) if need be)
then for those, initiate a connection to the pluggable transport proxy and
then populate the transport list once we've connected.
Back to get_sampled_guard_for_bridge(). As said earlier, it is used when
selecting our eligible guards in a way that prevents us from selecting
duplicates. In other words, if that function returns non-NULL, the selection
continues considering the bridge was sampled before. But if it returns NULL,
the relay is added to the eligible list.
This bug made it that our eligible guard list was populated with the *same*
bridge 3 times like so (remember no fingerprint):
[info] entry_guards_update_primary(): Primary entry guards have changed. New primary guard list is:
[info] entry_guards_update_primary(): 1/3: [bridge] ($0000000000000000000000000000000000000000)
[info] entry_guards_update_primary(): 2/3: [bridge] ($0000000000000000000000000000000000000000)
[info] entry_guards_update_primary(): 3/3: [bridge] ($0000000000000000000000000000000000000000)
When tor starts, it will find the bridge fingerprint by connecting to it and
will then update the primary guard list by calling
entry_guard_learned_bridge_identity() which then goes and update only 1 single
entry resulting in this list:
[debug] sampled_guards_update_consensus_presence(): Sampled guard [bridge] ($<FINGERPRINT>) is still listed.
[debug] sampled_guards_update_consensus_presence(): Sampled guard [bridge] ($0000000000000000000000000000000000000000) is still listed.
[debug] sampled_guards_update_consensus_presence(): Sampled guard [bridge] ($0000000000000000000000000000000000000000) is still listed.
And here lies the problem, now tor is stuck attempting to wait for a valid
descriptor for at least 2 guards where the second one is a bunch of zeroes and
thus tor will never fully bootstraps:
[info] I learned some more directory information, but not enough to build a
circuit: We're missing descriptors for 1/2 of our primary entry guards
(total microdescriptors: 6671/6703). That's ok. We will try to fetch missing
descriptors soon.
Now, why passing the fingerprint then works? This is because the list of
guards contains 3 times the same bridge but they all have a fingerprint and so
the descriptor can be found and tor can bootstraps.
The solution here is to entirely remove the transport name check in
get_sampled_guard_for_bridge() since the transport_list is empty at that
point. That way, the eligible guard list only gets 1 entry, the bridge, and
can then go on to bootstrap properly.
It is OK to do so since when launching a bridge descriptor fetch, we validate
that the bridge transport name is OK and thus avoid connecting to a bridge
without a ClientTransportPlugin. If we wanted to keep the check in place, we
would need to populate the transport_list much earlier and this would require
a much bigger refactoring.
Fixes#40360
Signed-off-by: David Goulet <dgoulet@torproject.org>
When seccomp sandbox is active, SAVECONF failed because it was not
able to save the backup files for torrc. This commit simplifies
the implementation of SAVECONF and sandbox by making it keep only
one backup of the configuration file.
In versions <=2.69, according to the autoconf docs, AC_PROG_CC_C99
is needed with some compilers, if they require extra arguments to
build C99 programs. In versions >=2.70, AC_PROG_CC checks for these
compilers automatically, and so the AC_PROG_CC_C99 macro is
obsolete.
So, what can you do if you want your script to work right with both
autoconf versions? IIUC, neither including AC_PROG_CC_C99 macro nor
leaving it out will give you the right behavior with both versions.
It looks like you need to look at the autoconf version explicitly.
(Now, the autoconf manual implies that it's "against autoconf
philosophy" to look at the autoconf version rather than trying the
behavior to see if it works, but they don't actually tell you how to
detect recoverably at autoconf-time whether a macro is obsolete or
not, and I can't find a way to do that.)
So, is it safe to use m4_version_prereq, like I do here? It isn't
listed in the autoconf 2.63 manual (which is the oldest version we
support). But a mailing list message [1] (which added the
documentation back in 2008) implies that m4_version_prereq has been
there since "at least back to autoconf 2.59".
https://lists.gnu.org/archive/html/autoconf-patches/2008-12/msg00025.html
So I think this will work.
I am basing this patch against Tor 0.3.5 since, if autoconf 2.70
becomes widespread before 0.3.5 is unsupported, we might need this
patch to continue 0.3.5 development. But I don't think we should
backport farther than 0.4.5 until/unless that actually happens.
This is part of a fix for #40355.
On Linux systems, glob automatically ignores the errors ENOENT and
ENOTDIR because they are expected during glob expansion. But BSD
systems do not ignore these, resulting in glob failing when globs
expand to invalid paths. This is fixed by adding a custom error
handler that ignores only these two errors and removing the
GLOB_ERR flag as it makes glob fail even if the error handler
ignores the error and is unnecessary as the error handler will
make glob fail on all other errors anyway.
Fortunately, our tor_free() is setting the variable to NULL after so we were
in a situation where NULL was always used instead of the transport name.
This first appeared in 894ff2dc84 and results in
basically no bridge with a transport being able to use DoS defenses.
Fixes#40345
Signed-off-by: David Goulet <dgoulet@torproject.org>
Clients now check whether their streams are attempting to re-enter
the Tor network (i.e. to send Tor traffic over Tor), and they close
them preemptively if they think exit relays will refuse them.
See bug 2667 for details. Resolves ticket 40271.
- Implement overload statistics structure.
- Implement function that keeps track of overload statistics.
- Implement function that writes overload statistics to descriptor.
- Unittest for the whole logic.
This option changes the time for which a bandwidth measurement period
must have been in progress before we include it when reporting our
observed bandwidth in our descriptors. Without this option, we only
consider a time period towards our maximum if it has been running
for a full day. Obviously, that's unacceptable for testing
networks, where we'd like to get results as soon as possible.
For non-testing networks, I've put a (somewhat arbitrary) 2-hour
minimum on the option, since there are traffic analysis concerns
with immediate reporting here.
Closes#40337.
We were looking for the first instance of "directory-signature "
when instead the correct behavior is to look for the first instance
of "directory-signature " at the start of a line.
Unfortunately, this can be exploited as to crash authorities while
they're voting.
Fixes#40316; bugfix on 0.2.2.4-alpha. This is TROVE-2021-002,
also tracked as CVE-2021-28090.
When reloading a service, we can re-register a service and thus end up again
in the metrics store initialization code path which is fine. No need to BUG()
anymore.
Fixes#40334
Signed-off-by: David Goulet <dgoulet@torproject.org>
Use find_str_at_start_of_line(), not strstr() here: we don't want
to match "MemTotal: " if it appears in the middle of a line.
Fixes#40315; bugfix on 0.2.5.4-alpha.
The directory_fetches_from_authorities() is used to know if a client or relay
should fetch data from an authority early in the boot process.
We had a condition in that function that made a relay trigger that fetch if it
didn't know its address (so we can learn it). However, when this is called,
the address discovery has not been done yet so it would always return true for
a relay.
Furthermore, it would always trigger a log notice that the IPv4 couldn't be
found which was inevitable because the address discovery process has not been
done yet (done when building our first descriptor).
It is also important to point out that starting in 0.4.5.1-alpha, asking an
authority for an address is done during address discovery time using a one-hop
circuit thus independent from the relay deciding to fetch or not documents
from an authority.
Small fix also is to reverse the "IPv(4|6)Only" flag in the notice so that if
we can't find IPv6 it would output to use IPv4Only.
Fixes#40300
Signed-off-by: David Goulet <dgoulet@torproject.org>
Fix a bug introduced in 94b56eaa75 which
overwrite the connection message line.
Furthermore, improve how we generate that line by using a smartlist and change
the format so it is clearer of what is being rejected/detected and, if
applicable, which option is disabled thus yielding no stats.
Closes#40308
Signed-off-by: David Goulet <dgoulet@torproject.org>
This is a new detection type which is that a relay can now control the rate of
client connections from a single address.
The mechanism is pretty simple, if the rate/burst is reached, the address is
marked for a period of time and any connection from that address is denied.
Closes#40253
Signed-off-by: David Goulet <dgoulet@torproject.org>
When trying to find our address to publish, we would log notice if we couldn't
find it from the cache but then we would look at the suggested cache (which
contains the address from the authorities) in which we might actually have the
address.
Thus that log notice was misplaced. Move it down after the suggested address
cache lookup.
Closes#40300
Signed-off-by: David Goulet <dgoulet@torproject.org>
It can be called with strings that should have been
length-delimited, but which in fact are not. This can cause a
CPU-DoS bug or, in a worse case, a crash.
Since this function isn't essential, the best solution for older
Tors is to just turn it off.
Fixes bug 40286; bugfix on 0.2.2.1-alpha when dump_desc() was
introduced.
Now that exit relays don't allow exit connections to directory authority
DirPorts, the follow-up step is to make directory authorities stop doing
DirPort reachability checks.
Fixes#40287
Signed-off-by: David Goulet <dgoulet@torproject.org>
Turns out, we forgot to add the METRICS connection type fo the finished
flushing handler.
Fixes#40295
Signed-off-by: David Goulet <dgoulet@torproject.org>
We were just looking at the family which is not correct because it is possible
to have two explicit ORPort for the same family but different addresses. One
example is:
ORPort 127.0.0.1:9001 NoAdvertise
ORPort 1.2.3.4:9001 NoListen
Thus, this patch now ignores ports that have different addresses iff they are
both explicits. That is, if we have this example, also two different
addresses:
ORPort 9001
ORPort 127.0.0.1:9001 NoAdvertise
The first one is implicit and second one is explicit and thus we have to
consider them for removal which in this case would remove the "ORPort 9001" in
favor of the second port.
Fixes#40289
Signe-off-by: David Goulet <dgoulet@torproject.org>
In other words, if PublishServerDescriptor is set to 0 and AssumeReachable to
1, then allow a relay to hold a RFC1918 address.
Reasons for this are documented in #40208Fixes#40208
Signed-off-by: David Goulet <dgoulet@torproject.org>
Handle the EOF situation for a metrics connection. Furthermore, if we failed
to fetch the data from the inbuf properly, mark the socket as closed because
the caller, connection_process_inbuf(), assumes that we did so on error.
Fixes#40257
Signed-off-by: David Goulet <dgoulet@torproject.org>
Previously we would warn in this case... but there's really no
justification for doing so, and it can only cause confusion.
Fixes bug #40281; bugfix on 0.4.0.1-alpha.
In two instances we must look at this flag:
1. When we build the descriptor so the IPv6 is NOT added to the descriptor in
case we judge that we need to omit the address but still publish.
2. When we are deciding if the descriptor is publishable. This flags tells us
that the IPv6 was not found reachable but we should still publish.
Fixes#40279
Signed-off-by: David Goulet <dgoulet@torproject.org>