Due to #23662 this can happen under natural causes and does not disturb
the functionality of the service. This is a simple 0.3.2 fix for now,
and we plan to fix this properly in 0.3.3.
On failure to upload, the HS_DESC event would report "UPLOAD_FAILED" as the
Action but it should have reported "FAILED" according to the spec.
Fixes #24230
Signed-off-by: David Goulet <dgoulet@torproject.org>
Commit e67f4441eb introduced a safeguard against
using an uninitialized voting schedule object. However, the dirvote_act() code
was looking roughly at the same thing to know if it had to compute the timings
before voting with this condition:
  if (!voting_schedule.voting_starts) {
    ...
    dirvote_recalculate_timing(options, now);
  }
The sr_init() function is called very early and goes through the safeguard,
so the voting schedule is always initialized before the first vote.
That first vote is a crucial one because we need to have our voting schedule
aligned to the "now" time we are about to use for voting. Then, the schedule
is updated when we publish our consensus and/or when we set a new consensus.
From that point on, we only want to update the voting schedule through that
code flow.
The "created_on_demand" flag indicates that the timings have been recalculated
on demand by another subsystem, so if it is set, we know that we need to
ignore the schedule's values before voting.
Fixes #24186
Signed-off-by: David Goulet <dgoulet@torproject.org>
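The role of the flag can be sketched as follows. This is a minimal illustration, not the dirvote code: the struct, both function names, and the 60-second placeholder timing are assumptions made up for the example.

```c
#include <time.h>

/* Hypothetical, reduced voting schedule; not tor's real structure. */
typedef struct {
  time_t voting_starts;
  int created_on_demand; /* set when another subsystem forced a recalc */
} voting_schedule_sketch_t;

/* Recompute the timings relative to "now". The +60 is a placeholder, not
 * the real interval computation. */
static void
schedule_recalculate_sketch(voting_schedule_sketch_t *s, time_t now,
                            int on_demand)
{
  s->voting_starts = now + 60;
  s->created_on_demand = on_demand;
}

/* Called right before voting: a schedule that was only created on demand
 * is not aligned to our voting "now", so throw it away and recompute. */
static void
before_vote_sketch(voting_schedule_sketch_t *s, time_t now)
{
  if (s->created_on_demand)
    schedule_recalculate_sketch(s, now, 0);
}
```

After the first vote the flag is clear, so later calls leave the schedule alone and it is only updated through the consensus code flow.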
When we have fewer than 15 descriptors to fetch, we will delay the
fetch for a little while. That's fine, if we can go ahead and build
circuits... but if not, it's a poor choice indeed.
Fixes bug 23985; bugfix on 0.1.1.11-alpha.
In 0.3.0.3-alpha, when we made primary guard descriptors necessary
for circuit building, this situation got worse.
When calculating the fraction of nodes that have descriptors, and
all nodes in the network have zero bandwidths, count the number of nodes
instead.
Fixes bug 23318; bugfix on 0.2.4.10-alpha.
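A rough sketch of the fallback described above, assuming a made-up node record and function name (neither is taken from tor): weight by bandwidth when there is any, otherwise fall back to counting nodes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node record, for illustration only. */
typedef struct {
  uint64_t bandwidth;
  int has_descriptor;
} node_sketch_t;

/* Fraction of the network for which we have descriptors. */
static double
frac_with_descriptors_sketch(const node_sketch_t *nodes, size_t n)
{
  uint64_t total_bw = 0, present_bw = 0;
  size_t present_count = 0, i;

  for (i = 0; i < n; ++i) {
    total_bw += nodes[i].bandwidth;
    if (nodes[i].has_descriptor) {
      present_bw += nodes[i].bandwidth;
      ++present_count;
    }
  }
  if (n == 0)
    return 0.0; /* assumption: an empty list counts as "nothing present" */
  if (total_bw == 0)
    return (double)present_count / (double)n; /* all-zero bandwidths */
  return (double)present_bw / (double)total_bw;
}
```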
Back in 0.2.4.3-alpha (e106812a77), when we switched from using
double to using uint64 for selecting by bandwidth, I got the math
wrong: I should have used llround(x), or (uint64_t)(x+0.5), but
instead I wrote llround(x+0.5). That means we would always round
up, rather than rounding to the closest integer.
Fixes bug 23318; bugfix on 0.2.4.3-alpha.
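The difference can be seen in a small sketch. To keep it free of libm, round_nearest_sketch() stands in for llround() and is only valid for non-negative inputs that fit in a uint64_t; all three function names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for llround() on non-negative inputs: truncating x+0.5 rounds
 * to the closest integer, with halves going up. */
static uint64_t
round_nearest_sketch(double x)
{
  return (uint64_t)(x + 0.5);
}

/* Buggy form: the extra 0.5 is added before a function that already
 * rounds, so the result is always the next integer up. */
static uint64_t
scale_buggy_sketch(double x)
{
  return round_nearest_sketch(x + 0.5);
}

/* Intended form: round to the closest integer. */
static uint64_t
scale_fixed_sketch(double x)
{
  return round_nearest_sketch(x);
}
```

With these definitions an input of 2.2 becomes 3 in the buggy form and 2 in the fixed form; even an exact 2.0 is pushed up to 3 by the buggy form.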
The flush cells process can close a channel if the connection write fails but
still return that it flushed at least one cell. This happens because the error
is not propagated up the call stack, so there is no way of knowing whether the
flush actually succeeded.
Because fixing this would require a substantial refactoring touching multiple
subsystems, this patch is a bandaid that keeps the KIST scheduler from
handling a closed channel in its loop.
Bandaid on #23751.
Signed-off-by: David Goulet <dgoulet@torproject.org>
dirvote_get_next_valid_after_time() is the only public function that uses the
voting schedule outside of the dirvote subsystem, so if the schedule is zeroed,
recalculate its timing if we can, that is, if a consensus exists.
Part of #24161
Signed-off-by: David Goulet <dgoulet@torproject.org>
Because the HS and SR subsystems can use the voting schedule early (with the
changes in #23623 making the SR subsystem use the static voting schedule
object), we need to recalculate the schedule very early when setting the new
consensus.
Fixes #24161
Signed-off-by: David Goulet <dgoulet@torproject.org>
If it decrypts something that turns out to start with a NUL byte,
then decrypt_desc_layer() will return 0 to indicate the length of
its result. But 0 also indicates an error, which causes the result
not to be freed by decrypt_desc_layer()'s callers.
Since we're trying to stabilize 0.3.2.x, I've opted for the simplest
possible fix here and made it so that an empty decrypted string will
also count as an error.
Fixes bug 24150 and OSS-Fuzz issue 3994.
The original bug was present but unreachable in 0.3.1.1-alpha. I'm
calling this a bugfix on 0.3.2.1-alpha since that's the first version
where you could actually try to decrypt these descriptors.
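The ambiguity and the chosen fix can be sketched like this. The function below is a hypothetical stand-in, not decrypt_desc_layer() itself: its "decryption" is a plain copy, and its name and signature are assumptions made for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the length of the NUL-terminated result, or 0 on error.
 * On error, *out is NULL and there is nothing for the caller to free. */
static size_t
decrypt_layer_sketch(const char *in, size_t inlen, char **out)
{
  char *buf;
  size_t len;

  *out = NULL;
  buf = malloc(inlen + 1);
  if (!buf)
    return 0;
  memcpy(buf, in, inlen); /* placeholder for the real decryption */
  buf[inlen] = '\0';

  len = strlen(buf);
  if (len == 0) {
    /* A result starting with NUL has length 0, which callers cannot tell
     * apart from an error. Treat it as an error and free it here. */
    free(buf);
    return 0;
  }
  *out = buf;
  return len;
}
```

A caller that only frees the buffer when the return value is nonzero no longer leaks in the empty-result case.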
The node_get_ed25519_id() warning can actually be triggered by a relay flagged
with NoEdConsensus so instead of triggering a warning on all relays of the
network, downgrade it to protocol warning.
Fixes #24025
Signed-off-by: David Goulet <dgoulet@torproject.org>
When a BUG() occurs, this macro will print extra information about the state
of the scheduler and the given channel if any. This will help us greatly to
fix future bugs in the scheduler especially when they occur rarely.
Fixes #23753
Signed-off-by: David Goulet <dgoulet@torproject.org>
They are not yet implemented: they will upload descriptors, but won't be
able to rendezvous, because IPv6 addresses in link specifiers are ignored.
Part of #23820.
The previous version of this function had the following issues:
* it didn't check if the extend_info contained an IPv6 address,
* it didn't check if the ed25519 identity key was valid.
But we can't add IPv6 support in a bugfix release.
Instead, BUG() if the address is an IPv6 address, so we always put IPv4
addresses in link specifiers. And ignore missing ed25519 identifiers,
rather than generating an all-zero link specifier.
This supports v3 hidden services on IPv4, dual-stack, and IPv6, and
v3 single onion services on IPv4 only.
Part of 23820, bugfix on 0.3.2.1-alpha.
When the directory information changes, call back into the HS client subsystem
so it can check if any pending SOCKS connections are waiting for a descriptor.
If so, attempt a refetch for those.
Fixes #23762
Signed-off-by: David Goulet <dgoulet@torproject.org>
Because the HS subsystem needs the voting schedule to compute the time period,
we need all tor types to do that.
Part of #23623
Signed-off-by: David Goulet <dgoulet@torproject.org>
The new decryption function performs no decryption, skips the salt,
and doesn't check the MAC. This allows us to fuzz the
hs_descriptor.c code using unencrypted descriptor text, and exercise
more of the code.
Related to 21509.
The exposed get_voting_schedule() allocates and returns a new object every time
it is called, leading to an awful lot of memory allocation when getting the
start time of the current round, which is done for each node in the consensus.
Closes #23623
Signed-off-by: David Goulet <dgoulet@torproject.org>
If the intro point supports ed25519 link authentication, make sure we don't
have a zeroed key which would lead to a failure to extend to it.
We already check for an empty key if the intro point does not support it so
this makes the check on the key more consistent and symmetric.
Fixes #24002
Signed-off-by: David Goulet <dgoulet@torproject.org>
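A zeroed-key check of the kind described above could look like the sketch below. The helper name and length macro are hypothetical and only serve the example.

```c
#include <stddef.h>
#include <stdint.h>

#define ED25519_KEY_LEN_SKETCH 32 /* ed25519 public keys are 32 bytes */

/* Return 1 if every byte of the key is zero, else 0. OR-ing all bytes
 * together avoids an early exit, so the loop does not branch on key data. */
static int
key_is_zero_sketch(const uint8_t *key)
{
  uint8_t acc = 0;
  size_t i;

  for (i = 0; i < ED25519_KEY_LEN_SKETCH; ++i)
    acc |= key[i];
  return acc == 0;
}
```

An intro point that advertises ed25519 link authentication but whose key makes this return 1 would be rejected rather than used for an extend that is bound to fail.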
The previous version of these functions had the following issues:
* they can't supply both the IPv4 and IPv6 addresses in link specifiers,
* they try to fall back to a 3-hop path when the address for a direct
connection is unreachable, but this isn't supported by
launch_rendezvous_point_circuit(), so it fails.
But we can't fix these things in a bugfix release.
Instead, always put IPv4 addresses in link specifiers.
And if a v3 single onion service can't reach any intro points, fail.
This supports v3 hidden services on IPv4, dual-stack, and IPv6, and
v3 single onion services on IPv4 only.
Part of 23820, bugfix on 0.3.2.1-alpha.
The previous version of this function has the following issues:
* it doesn't choose between IPv4 and IPv6 addresses correctly, and
* it doesn't fall back to a 3-hop path when the address for a direct
connection is unreachable.
But we can't fix these things in a bugfix release.
Instead, treat IPv6 addresses like any other unrecognised link specifier
and ignore them. If there is no IPv4 address, return NULL.
This supports v3 hidden services on IPv4, dual-stack, and IPv6, and
v3 single onion services on IPv4 only.
Part of 23820, bugfix on 0.3.2.1-alpha.
It turns out that when reloading a tor configured with hidden service(s), we
weren't copying all the needed information from the old service object to
the new one.
For instance, the desc_is_dirty timestamp wasn't copied, which could lead to
the service uploading its descriptor much later than it would need to.
The replaycache wasn't moved over either, nor was some intro point
information.
Fixes #23790
Signed-off-by: David Goulet <dgoulet@torproject.org>
Bridge relays can use it to add a "bridge-distribution-request" line
to their bridge descriptor, which tells BridgeDB how they'd like their
bridge address to be given out.
Implements ticket 18329.
Fixes bug 23908; bugfix on 0.3.1.6-rc when we made the keypin
failure message really long.
Backport from 0.3.2's 771fb7e7ba,
where arma said "get rid of the scary 256-byte-buf landmine".
Create a function that tells us whether or not we can fetch the descriptor
for the given service key.
No behavior change. Mostly moving code but with a slight change so the
function can properly work by returning a boolean and also a possible fetch
status code.
Signed-off-by: David Goulet <dgoulet@torproject.org>
When we added HTTPTunnelPort, the answer that we give when you try
to use your SOCKSPort as an HTTP proxy became wrong. Now we explain
that Tor sorta _is_ an HTTP proxy, but a SOCKSPort isn't.
I have left the status line the same, in case anything is depending
on it. I have removed the extra padding for Internet Explorer,
since the message is well over 512 bytes without it.
Fixes bug 23678; bugfix on 0.3.2.1-alpha.
Without this fix, changes from client to bridge don't trigger
transition_affects_workers(), so we would never have actually
initialized the cpuworkers.
Fixes bug 23693. Bugfix on 3bcdb26267 0.2.6.3-alpha, which
fixed bug 14901 in the general case, but not on the case where
public_server_mode() did not change.
Because our monotonic time interface doesn't play well with a value set to 0,
always initialize scheduler_last_run to now() at init() of the KIST
scheduler.
Fixes #23696
Signed-off-by: David Goulet <dgoulet@torproject.org>
When a channel is scheduled and flush cells returns 0, that is, no cells to
flush, we flag it back as waiting for cells so it doesn't get stuck in a
possible infinite loop.
It has been observed on moria1 that a closed channel ended up in the scheduler
where the flush process returned 0 cells but it was ultimately kept in the
scheduling loop forever. We suspect that this is due to a deeper problem
in tor where channel_more_to_flush() is actually looking at the wrong
queue and was returning 1 for an empty channel, thus putting the channel in
"Case 4" of the scheduler, which is to go back to the pending state and be
reconsidered at the next iteration.
This is a fix that allows the KIST scheduler to recover properly from a not
entirely diagnosed problem in tor.
Fixes #23676
Signed-off-by: David Goulet <dgoulet@torproject.org>
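The recovery rule can be reduced to a small decision function. This is a deliberately simplified sketch: the enum, its two states, and the function name are assumptions, and the real scheduler tracks more states than shown here.

```c
/* Hypothetical, reduced set of channel scheduling states. */
typedef enum {
  CHAN_WAITING_FOR_CELLS_SKETCH,
  CHAN_PENDING_SKETCH
} chan_state_sketch_t;

/* Decide where a channel goes after a flush attempt.
 * flushed: number of cells the flush reported.
 * more_to_flush: what the "more to flush" query claims (may be wrong). */
static chan_state_sketch_t
state_after_flush_sketch(int flushed, int more_to_flush)
{
  /* Nothing was flushed: do not trust more_to_flush, or the channel could
   * be rescheduled forever without making progress. */
  if (flushed == 0)
    return CHAN_WAITING_FOR_CELLS_SKETCH;
  return more_to_flush ? CHAN_PENDING_SKETCH : CHAN_WAITING_FOR_CELLS_SKETCH;
}
```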
When we added single_conn_free_bytes(), we cleared the outbuf on a
connection without setting outbuf_flushlen() to 0. This could cause
an assertion failure later on in flush_buf().
Fixes bug 23690; bugfix on 0.2.6.1-alpha.
This caused a BUG log when we noticed that the circuit had no
channel. The likeliest culprit for exposing that behavior is
d769cab3e5, where we made circuit_mark_for_close() NULL out
the n_chan and p_chan fields of the circuit.
Fixes bug 8185; bugfix on 0.2.5.4-alpha, I think.
My current theory is that this is just a marked circuit that hasn't
closed yet, but let's gather more information in case that theory is
wrong.
Diagnostic for 8185.
If 6 SOCKS requests are opened at once, it would have triggered 6 fetches
which ultimately poke all 6 HSDirs. We don't want that: if we have multiple
SOCKS requests for the same service, do one fetch only.
Signed-off-by: David Goulet <dgoulet@torproject.org>
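One way to picture the deduplication, as a sketch only: keep a small table of service keys with a fetch in flight and launch a new fetch only for keys not yet in it. The table type, its size limit, and the function name are all hypothetical.

```c
#include <string.h>

#define MAX_PENDING_SKETCH 16

/* Hypothetical bookkeeping for descriptor fetches that are in flight. */
typedef struct {
  const char *keys[MAX_PENDING_SKETCH];
  int n_pending;
  int fetches_launched;
} fetch_state_sketch_t;

/* Called once per SOCKS request. Returns 1 if a fetch was launched, 0 if
 * one is already pending for this service (or the table is full). */
static int
request_descriptor_sketch(fetch_state_sketch_t *st, const char *service_key)
{
  int i;

  for (i = 0; i < st->n_pending; ++i) {
    if (strcmp(st->keys[i], service_key) == 0)
      return 0; /* same service: reuse the pending fetch */
  }
  if (st->n_pending >= MAX_PENDING_SKETCH)
    return 0;
  st->keys[st->n_pending++] = service_key;
  st->fetches_launched++;
  return 1;
}
```

Six requests for the same key then produce a single fetch, while a request for a different service still launches its own.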
When purging last HSDir requests, we used time(NULL) for computing the
service blinded key but in all other places in our codebase we actually
use the consensus times. That can cause wrong behavior if the consensus
is in a different time period than time(NULL).
This commit is required for proper purging of HSDir requests.
The confparse field has type UINT, which corresponds to an int
type. We had uint32_t.
This shouldn't cause trouble in practice, since int happens to be
4 bytes wide on every platform where an authority is running. It's
still wrong, though.
These should have been int, but we had listed them as unsigned.
That's an easy mistake to make, since "int" corresponds with either
INT or UINT in the configuration file.
This bug cannot have actually caused a problem in practice, since we
check those fields' values on load, and ensure that they are in
range 0..INT32_MAX.
New approach, suggested by Taylor: During testing builds, we
initialize a union member of an appropriate pointer type with the
address of the member field we're trying to test, so we can make
sure that the compiler doesn't warn.
My earlier approach invoked undefined behavior.
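The union trick can be sketched roughly as below. The struct, the union, and every member name are invented for the example, and the real macros are more involved. The idea is that each union member is a pointer to the C type the parser writes for one config type, so initializing it with the address of a mistyped field gives the compiler a chance to complain.

```c
#include <stddef.h>

/* Hypothetical options struct; the field names are made up. */
typedef struct {
  int max_streams; /* filled from a UINT-style config entry */
  char *nickname;  /* filled from a STRING-style config entry */
} options_sketch_t;

/* One member per config type, pointing at the matching C type. */
typedef union {
  int *as_int;
  char **as_string;
} conf_type_check_sketch_t;

static options_sketch_t options_dummy_sketch;

/* If max_streams were declared as an unsigned 32-bit type, this would
 * initialize an "int *" from a pointer to a different type, which
 * compilers can diagnose. No cast is involved, so nothing is hidden. */
static const conf_type_check_sketch_t check_max_streams_sketch =
  { .as_int = &options_dummy_sketch.max_streams };
static const conf_type_check_sketch_t check_nickname_sketch =
  { .as_string = &options_dummy_sketch.nickname };
```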
Also demote a log message that can occur under natural causes
(if the circuit subsystem is missing descriptors/consensus etc.).
The HS subsystem will naturally retry to connect to intro points,
so no need to make that log user-facing.
So we can track them more easily in the logs and match any open/close/free
with those identifiers.
Part of #23645
Signed-off-by: David Goulet <dgoulet@torproject.org>
This removes the "nickname" of the cannibalized circuit's last hop as it is
useless. It now logs the n_circ_id and global identifier so we can match it
with other logging statements.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Prior to the log statement, the circuit n_circ_id value is zeroed so keep a
copy so we can log it at the end.
Part of #23645
Signed-off-by: David Goulet <dgoulet@torproject.org>
Make the "Exit" flag assignment only depend on whether the exit
policy allows connections to ports 80 and 443. Previously relays
would get the Exit flag if they allowed connections to one of
these ports and also port 6667.
Resolves ticket 23637.
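The change in rule can be written out as two predicates. This is a simplified sketch: the struct reduces an exit policy to three booleans, which is an assumption made for the example, and the names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical summary of an exit policy: which ports it allows. */
typedef struct {
  bool allows_80;
  bool allows_443;
  bool allows_6667;
} policy_summary_sketch_t;

/* Previous behavior as described above: both web ports, or one web port
 * plus port 6667. */
static bool
exit_flag_old_sketch(const policy_summary_sketch_t *p)
{
  return (p->allows_80 && p->allows_443) ||
         ((p->allows_80 || p->allows_443) && p->allows_6667);
}

/* New behavior: only ports 80 and 443 matter, and both are required. */
static bool
exit_flag_new_sketch(const policy_summary_sketch_t *p)
{
  return p->allows_80 && p->allows_443;
}
```

A relay allowing only ports 80 and 6667 would have received the flag under the old rule but does not under the new one.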
The is_first_hop field should have been called used_create_fast,
but everywhere that we wanted to check it, we should have been
checking channel_is_client() instead.
The diff is confusing, but there were two static scheduler functions that
needed moving to the static comment block.
No code change. Thanks to dgoulet for the original commit.
The clock_skew_warning() refactoring allowed calls from
or_state_load() to control_event_bootstrap_problem() to occur prior
to bootstrap phase 0, causing an assertion failure. Initialize the
bootstrap status prior to calling clock_skew_warning() from
or_state_load().
or_state_load() was using an incorrect sign convention when calling
clock_skew_warning() to warn about state file clock skew. This caused
the wording of the warning to be incorrect about the direction of the
skew.
is_canonical doesn't mean "am I connected to the one true address of
this relay"; it means "does this relay tell me that the address I'm
connected to belongs to it." The point is to prevent TCP-based MITM,
not to prevent the relay from multi-homing.
Related to 22890.
Authority IPv6 addresses were originally added in 0.2.8.1-alpha.
This leaves 3/8 directory authorities with IPv6 addresses, but there
are also 52 fallback directory mirrors with IPv6 addresses.
Resolves 19760.
Use this value instead of hardcoded values of 32 everywhere. This also
addresses the use of REND_DESC_ID_V2_LEN_BASE32 in
hs_lookup_last_hid_serv_request() for the HSDir encoded identity digest length
which is accurate but semantically wrong.
Fixes #23305.
Signed-off-by: David Goulet <dgoulet@torproject.org>
RENDEZVOUS1 cell is 84 bytes long in v3 and 168 bytes long in v2 so this
commit pads with random bytes the v3 cells up to 168 bytes so they all look
alike at the rendezvous point.
Closes #23420
Signed-off-by: David Goulet <dgoulet@torproject.org>
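The padding step might look like the following sketch. The macro and function names are hypothetical, and rand() is only a placeholder: real code would need a cryptographically strong random source for the padding bytes.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define REND1_V3_LEN_SKETCH 84  /* v3 RENDEZVOUS1 length, per the text */
#define REND1_V2_LEN_SKETCH 168 /* v2 RENDEZVOUS1 length, per the text */

/* Copy a payload into out (which must hold REND1_V2_LEN_SKETCH bytes) and
 * fill the remainder with random bytes. Returns the padded length, or 0 if
 * the payload is already longer than the target. */
static size_t
pad_rendezvous1_sketch(const uint8_t *payload, size_t len, uint8_t *out)
{
  size_t i;

  if (len > REND1_V2_LEN_SKETCH)
    return 0;
  memcpy(out, payload, len);
  for (i = len; i < REND1_V2_LEN_SKETCH; ++i)
    out[i] = (uint8_t)(rand() & 0xff); /* placeholder, NOT a secure RNG */
  return REND1_V2_LEN_SKETCH;
}
```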
This warning is caused by a different tv_usec data type on macOS
compared to the system on which the patch was developed.
Fixes 23575 on 0.3.2.1-alpha.
It is highly unlikely to happen, but if it does, we need to know that it
happened and why. The warning with the next_run values could help.
Signed-off-by: David Goulet <dgoulet@torproject.org>