This is the loudest of our LOG_PROTOCOL_WARN messages, it can occur
naturally, and there doesn't seem to be a great response to it.
Partial fix for 40400; bugfix on 0.1.1.13-alpha.
This one happens every time we get a failure from
circuit_receive_relay_cell -- but for all the relevant failing cases
in that function, we already log in that function.
This resolves one case of #40400. Two cases remain.
Series 0.4.2.x, 0.4.3.x and 0.4.4.x will all be rejected at the
authority level at this commit.
Futhermore, the 0.4.5.x alphas and rc will also be rejected.
Closes#40480
Signed-off-by: David Goulet <dgoulet@torproject.org>
The connection_ap_attach_pending() function processes all pending
streams in the pending_entry_connections list. It first copy the pointer
and then allocates a brand new empty list.
It then iterates over that copy pointer to try to attach entry
connections onto any fitting circuits using
connection_ap_handshake_attach_circuit().
That very function, for onion service, can lead to flagging _all_
streams of the same onion service to be put in state RENDDESC_WAIT from
CIRCUIT_WAIT. By doing so, it also tries to remove them from the
pending_entry_connections but at that point it is already empty.
Problem is that the we are iterating over the previous
pending_entry_connections which contains the streams that have just
changed state and are no longer in CIRCUIT_WAIT.
This lead to this bug warning occuring a lot on busy services:
May 01 08:55:43.000 [warn] connection_ap_attach_pending(): Bug:
0x55d8764ae550 is no longer in circuit_wait. Its current state is
waiting for rendezvous desc. Why is it on pending_entry_connections?
(on Tor 0.4.4.0-alpha-dev )
This fix is minimal and basically allow a state to be not CIRCUIT_WAIT
and move on to the next one without logging a warning. Because the
pending_entry_connections is emptied before processing, there is no
chance for a streams to be stuck there forever thus it is OK to ignore
streams not in the right state.
Fixes#34083
Signed-off-by: David Goulet <dgoulet@torproject.org>
When building with --enable-fragile-hardening, add or relax Linux
seccomp rules to allow AddressSanitizer to execute normally if the
process terminates with the sandbox active.
Further resolves issue 11477.
When a directory request fails, we flag the relay as non Running so we
don't use it anymore.
This can be problematic with onion services because there are cases
where a tor instance could have a lot of services, ephemeral ones, and
keeps failing to upload descriptors, let say due to a bad network, and
thus flag a lot of nodes as non Running which then in turn can not be
used for circuit building.
This commit makes it that we never flag nodes as non Running on a onion
service directory request (upload or fetch) failure as to keep the
hashring intact and not affect other parts of tor.
Fortunately, the onion service hashring is _not_ selected by looking at
the Running flag but since we do a 3-hop circuit to the HSDir, other
services on the same instance can influence each other by removing nodes
from the consensus for path selection.
This was made apparent with a small network that ran out of nodes to
used due to rapid succession of onion services uploading and failing.
See #40434 for details.
Fixes#40434
Signed-off-by: David Goulet <dgoulet@torproject.org>
Fixes bug 40078.
As reported by hdevalence our batch verification logic can cause an assert
crash.
The assert happens because when the batch verification of ed25519-donna fails,
the code in `ed25519_checksig_batch()` falls back to doing a single
verification for each signature.
The crash occurs because batch verification failed, but then all signatures
individually verified just fine.
That's because batch verification and single verification use a different
equation which means that there are sigs that can pass single verification
but fail batch verification.
Fixing this would require modding ed25519-donna which is not in scope for
this ticket, and will be soon deprecated in favor of arti and
ed25519-dalek, so my branch instead removes batch verification.
Continue having a tor_gmtime_impl() unit test so that we can detect
any problems in our replacement function; add a new test function to
make sure that gmtime<->timegm are a round-trip on now-ish times.
This is a fix for bug #40383, wherein we ran into trouble because
tor_timegm() does not believe that time_t should include a count of
leap seconds, but FreeBSD's gmtime believes that it should. This
disagreement meant that for a certain amount of time each day,
instead of calculating the most recent midnight, our voting-schedule
functions would calculate the second-most-recent midnight, and lead
to an assertion failure.
I am calling this a bugfix on 0.2.0.3-alpha when we first started
calculating our voting schedule in this way.
This patch enables the deterministic RNG for address set tests,
including the tests which uses address set indirectly via the nodelist
API.
This should prevent random test failures in the highly unlikely case of
a false positive which was seen in tor#40419.
See: tpo/core/tor#40419.
This issue was reported by Jann Horn part of Google's Project Zero.
Jann's one-sentence summary: entry/middle relays can spoof RELAY_END cells on
half-closed streams, which can lead to stream confusion between OP and
exit.
Fixes#40389
Previously, we would detect errors from a missing RNG
implementation, but not failures from the RNG code itself.
Fortunately, it appears those failures do not happen in practice
when Tor is using OpenSSL's default RNG implementation. Fixes bug
40390; bugfix on 0.2.8.1-alpha. This issue is also tracked as
TROVE-2021-004. Reported by Jann Horn at Google's Project Zero.