This one happens every time we get a failure from
circuit_receive_relay_cell -- but all the relevant failing cases in
that function already log on their own.
This resolves one case of #40400. Two cases remain.
The connection_ap_attach_pending() function processes all pending
streams in the pending_entry_connections list. It first copies the
pointer and then allocates a brand new empty list.
It then iterates over that copied list to try to attach entry
connections onto any fitting circuits using
connection_ap_handshake_attach_circuit().
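Roughly, that copy-and-swap pattern looks like the sketch below. This is
a simplified illustration built on Tor's internal names (smartlist_t,
entry_connection_t), not the actual implementation:

    /* Simplified illustration of the copy-and-swap iteration in
     * connection_ap_attach_pending(); not the real code. */
    smartlist_t *pending = pending_entry_connections;  /* copy the pointer */
    pending_entry_connections = smartlist_new();       /* fresh empty list */

    SMARTLIST_FOREACH_BEGIN(pending, entry_connection_t *, entry_conn) {
      /* Try to attach this pending stream to a fitting circuit. */
      connection_ap_handshake_attach_circuit(entry_conn);
    } SMARTLIST_FOREACH_END(entry_conn);

    smartlist_free(pending);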
For onion services, connection_ap_handshake_attach_circuit() can end up
flagging _all_ streams of the same service to move from CIRCUIT_WAIT to
RENDDESC_WAIT. In doing so, it also tries to remove them from
pending_entry_connections, but at that point the list is already empty.
The problem is that we are still iterating over the previous
pending_entry_connections, which contains the streams that have just
changed state and are no longer in CIRCUIT_WAIT.
This leads to this bug warning occurring a lot on busy services:
May 01 08:55:43.000 [warn] connection_ap_attach_pending(): Bug:
0x55d8764ae550 is no longer in circuit_wait. Its current state is
waiting for rendezvous desc. Why is it on pending_entry_connections?
(on Tor 0.4.4.0-alpha-dev )
This fix is minimal: it allows a stream to be in a state other than
CIRCUIT_WAIT and simply moves on to the next one without logging a
warning. Because pending_entry_connections is emptied before
processing, there is no chance for a stream to be stuck there forever,
so it is OK to ignore streams that are not in the right state.
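In other words, the check inside the loop ends up looking something like
this sketch (a simplification of the change, not the exact diff):

    /* Sketch of the minimal fix: a stream that already left
     * CIRCUIT_WAIT (e.g. it was moved to RENDDESC_WAIT by an earlier
     * iteration) is skipped quietly instead of triggering the "Bug"
     * warning. */
    if (ENTRY_TO_CONN(entry_conn)->state != AP_CONN_STATE_CIRCUIT_WAIT)
      continue;  /* No longer pending; safe to ignore. */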
Fixes #34083
Signed-off-by: David Goulet <dgoulet@torproject.org>
When building with --enable-fragile-hardening, add or relax Linux
seccomp rules to allow AddressSanitizer to execute normally if the
process terminates with the sandbox active.
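For illustration only, relaxing the sandbox amounts to adding libseccomp
allow rules for the extra syscalls that ASan's shutdown path needs; the
syscall and build guard below are placeholders, not the exact set the
sandbox allows:

    #include <seccomp.h>

    /* Illustrative sketch: permit one extra syscall when fragile
     * hardening is enabled so AddressSanitizer can run its exit path
     * under the sandbox.  The real rule set differs. */
    static int
    allow_asan_exit_syscalls(scmp_filter_ctx ctx)
    {
    #ifdef ENABLE_FRAGILE_HARDENING  /* hypothetical build guard */
      if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(sigaltstack), 0) < 0)
        return -1;
    #endif
      return 0;
    }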
Further resolves issue 11477.
When a directory request fails, we flag the relay as non-Running so we
don't use it anymore.
This can be problematic with onion services: a tor instance could run a
lot of services, including ephemeral ones, and keep failing to upload
descriptors, say due to a bad network, thus flagging a lot of nodes as
non-Running, which in turn can no longer be used for circuit building.
This commit makes it so that we never flag nodes as non-Running on an
onion service directory request (upload or fetch) failure, in order to
keep the hashring intact and not affect other parts of tor.
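Conceptually the change boils down to a purpose check before marking
the relay down, as in this hedged sketch (the purpose constants and
router_set_status() follow Tor's internals; the surrounding code is
simplified, not the actual diff):

    /* Sketch only: on a failed directory request, skip the non-Running
     * marking when the request was an onion service descriptor upload
     * or fetch. */
    if (conn->base_.purpose == DIR_PURPOSE_FETCH_HSDESC ||
        conn->base_.purpose == DIR_PURPOSE_UPLOAD_HSDESC) {
      /* HSDir request: leave the node's status alone so the hashring
       * and other services on this instance are unaffected. */
    } else {
      /* Any other directory request: keep the old behavior. */
      router_set_status(conn->identity_digest, 0);
    }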
Fortunately, the onion service hashring is _not_ selected by looking at
the Running flag, but since we build a 3-hop circuit to the HSDir, other
services on the same instance can influence each other by removing nodes
from the consensus for path selection.
This was made apparent with a small network that ran out of nodes to
use due to a rapid succession of onion services uploading and failing.
See #40434 for details.
Fixes #40434
Signed-off-by: David Goulet <dgoulet@torproject.org>
Fixes bug 40078.
As reported by hdevalence, our batch verification logic can cause an
assert crash.
The assert happens because when the batch verification of ed25519-donna fails,
the code in `ed25519_checksig_batch()` falls back to doing a single
verification for each signature.
The crash occurs because batch verification failed, but then all signatures
individually verified just fine.
That's because batch verification and single verification use different
equations, which means that there are signatures that can pass single
verification but fail batch verification.
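The failing pattern looks roughly like the sketch below; batch_verify()
and single_verify() are hypothetical stand-ins for the donna batch call
and the per-signature check, not the real function names:

    /* Sketch of the crash: the batch equation says "invalid", the
     * fallback checks each signature alone, and an assert demands that
     * at least one of them fails too -- which the different equations
     * do not guarantee. */
    if (!batch_verify(sigs, n)) {
      int n_bad = 0;
      for (int i = 0; i < n; ++i) {
        if (!single_verify(&sigs[i]))
          ++n_bad;
      }
      tor_assert(n_bad > 0);  /* fails (and crashes) when every
                               * signature passes on its own */
    }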
Fixing this would require modifying ed25519-donna, which is out of
scope for this ticket and will soon be deprecated in favor of arti and
ed25519-dalek, so my branch instead removes batch verification.
Continue having a tor_gmtime_impl() unit test so that we can detect
any problems in our replacement function; add a new test function to
make sure that gmtime<->timegm are a round-trip on now-ish times.
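A minimal sketch of that round-trip check, using Tor's tor_gmtime_r()
and tor_timegm() wrappers plus the tinytest macros (simplified relative
to the real unit test):

    /* Sketch: converting a recent time_t to broken-down UTC and back
     * must give the same value. */
    static void
    test_time_roundtrip_sketch(void *arg)
    {
      time_t now = time(NULL), back = 0;
      struct tm tm;
      (void) arg;

      tor_gmtime_r(&now, &tm);                      /* time_t -> struct tm */
      tt_int_op(tor_timegm(&tm, &back), OP_EQ, 0);  /* struct tm -> time_t */
      tt_i64_op((int64_t)back, OP_EQ, (int64_t)now);
     done:
      ;
    }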
This is a fix for bug #40383, wherein we ran into trouble because
tor_timegm() does not believe that time_t should include a count of
leap seconds, but FreeBSD's gmtime believes that it should. This
disagreement meant that for a certain amount of time each day,
instead of calculating the most recent midnight, our voting-schedule
functions would calculate the second-most-recent midnight, and lead
to an assertion failure.
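As a hypothetical illustration, with a 27-second offset standing in for
the leap-second disagreement (the real count depends on the system's
tables):

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
      /* Not Tor code: show how a ~27 second shift just after midnight
       * makes a "most recent midnight" computation land a day early. */
      const time_t midnight = 1614556800;   /* 2021-03-01 00:00:00 UTC */
      const time_t t = midnight + 10;       /* 00:00:10 UTC */
      const time_t shifted = t - 27;        /* round trip loses ~27s */
      const time_t computed = shifted - (shifted % 86400);

      /* computed == midnight - 86400: the second-most-recent midnight. */
      printf("expected %ld, got %ld\n", (long)midnight, (long)computed);
      return 0;
    }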
I am calling this a bugfix on 0.2.0.3-alpha, when we first started
calculating our voting schedule in this way.