Fixes bug 23908; bugfix on 0.3.1.6-rc when we made the keypin
failure message really long.
Backport from 0.3.2's 771fb7e7ba,
where arma said "get rid of the scary 256-byte-buf landmine".
It _should_ work, and I don't see a reason that it wouldn't, but
just in case, add a 10 second timer to make tor die with an
assertion failure if it's supposed to exit but it doesn't.
This function was never about 'finishing' the event loop, but rather
about making sure that the code outside the event loop would be run
at least once.
Sometimes when we call exit(), it's because the process is
completely hopeless: openssl has a broken AES-CTR implementation, or
the clock is in the 1960s, or something like that.
But sometimes, we should return cleanly from tor_main() instead, so
that embedders can keep embedding us and start another Tor process.
I've gone through all the exit() and _exit() calls to annotate them
with "exit ok" or "XXXX bad exit" -- the next step will be to fix
the bad exit()s.
First step towards 23848.
At first, we put the tor_git_revision constant in tor_main.c, so
that we wouldn't have to recompile config.o every time the git
revision changed. But putting it there had unintended side effect
of forcing every program that wanted to link libor.a (including
test, test-slow, the fuzzers, the benchmarks, etc) to declare their
own tor_git_revision instance.
That's not very nice, especially since we want to start supporting
others who want to link against Tor (see 23846).
So, create a new git_revision.c file that only contains this
constant, and remove the duplicated boilerplate from everywhere
else.
Part of implementing ticket 23845.
This feature should help programs that want to launch and manage a
Tor process, as well as programs that want to launch and manage a
Tor instance in a separate thread. Right now, they have to open a
controlport, and then connect to it, with attendant authentication
issues. This feature allows them to just start with an
authenticated connection.
Bug 23900.
Create a function that tells us if we can fetch or not the descriptor for the
given service key.
No behavior change. Mostly moving code but with a slight change so the
function can properly work by returning a boolean and also a possible fetch
status code.
Signed-off-by: David Goulet <dgoulet@torproject.org>
When we added HTTPTunnelPort, the answer that we give when you try
to use your SOCKSPort as an HTTP proxy became wrong. Now we explain
that Tor sorta _is_ an HTTP proxy, but a SOCKSPort isn't.
I have left the status line the same, in case anything is depending
on it. I have removed the extra padding for Internet Explorer,
since the message is well over 512 bytes without it.
Fixes bug 23678; bugfix on 0.3.2.1-alpha.
Without this fix, changes from client to bridge don't trigger
transition_affects_workers(), so we would never have actually
initialized the cpuworkers.
Fixes bug 23693. Bugfix on 3bcdb26267 0.2.6.3-alpha, which
fixed bug 14901 in the general case, but not on the case where
public_server_mode() did not change.
Because our monotonic time interface doesn't play well with value set to 0,
always initialize to now() the scheduler_last_run at init() of the KIST
scheduler.
Fixes#23696
Signed-off-by: David Goulet <dgoulet@torproject.org>
When a channel is scheduled and flush cells returns 0 that is no cells to
flush, we flag it back in waiting for cells so it doesn't get stuck in a
possible infinite loop.
It has been observed on moria1 where a closed channel end up in the scheduler
where the flush process returned 0 cells but it was ultimately kept in the
scheduling loop forever. We suspect that this is due to a more deeper problem
in tor where the channel_more_to_flush() is actually looking at the wrong
queue and was returning 1 for an empty channel thus putting the channel in the
"Case 4" of the scheduler which is to go back in pending state thus
re-considered at the next iteration.
This is a fix that allows the KIST scheduler to recover properly from a not
entirelly diagnosed problem in tor.
Fixes#23676
Signed-off-by: David Goulet <dgoulet@torproject.org>
When we added single_conn_free_bytes(), we cleared the outbuf on a
connection without setting outbuf_flushlen() to 0. This could cause
an assertion failure later on in flush_buf().
Fixes bug 23690; bugfix on 0.2.6.1-alpha.
This caused a BUG log when we noticed that the circuit had no
channel. The likeliest culprit for exposing that behavior is
d769cab3e5, where we made circuit_mark_for_close() NULL out
the n_chan and p_chan fields of the circuit.
Fixes bug 8185; bugfix on 0.2.5.4-alpha, I think.
My current theory is that this is just a marked circuit that hasn't
closed yet, but let's gather more information in case that theory is
wrong.
Diagnostic for 8185.
If 6 SOCKS requests are opened at once, it would have triggered 6 fetches
which ultimately poke all 6 HSDir. We don't want that, if we have multiple
SOCKS requests for the same service, do one fetch only.
Signed-off-by: David Goulet <dgoulet@torproject.org>
When purging last HSDir requests, we used time(NULL) for computing the
service blinded key but in all other places in our codebase we actually
use the consensus times. That can cause wrong behavior if the consensus
is in a different time period than time(NULL).
This commit is required for proper purging of HSDir requests.
The confparse field has type UINT, which corresponds to an int
type. We had uint32_t.
This shouldn't cause trouble in practice, since int happens to
4-bytes wide on every platform where an authority is running. It's
still wrong, though.
These should have been int, but we had listed them as unsigned.
That's an easy mistake to make, since "int" corresponds with either
INT or UINT in the configuration file.
This bug cannot have actually caused a problem in practice, since we
check those fields' values on load, and ensure that they are in
range 0..INT32_MAX.
New approach, suggested by Taylor: During testing builds, we
initialize a union member of an appropriate pointer type with the
address of the member field we're trying to test, so we can make
sure that the compiler doesn't warn.
My earlier approach invoked undefined behavior.
Also demote a log message that can occur under natural causes
(if the circuit subsystem is missing descriptors/consensus etc.).
The HS subsystem will naturally retry to connect to intro points,
so no need to make that log user-facing.
So we can track them more easily in the logs and match any open/close/free
with those identifiers.
Part of #23645
Signed-off-by: David Goulet <dgoulet@torproject.org>
This removes the "nickname" of the cannibalized circuit last hop as it is
useless. It now logs the n_circ_id and global identifier so we can match it
with other logging statement.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Prior to the log statement, the circuit n_circ_id value is zeroed so keep a
copy so we can log it at the end.
Part of #23645
Signed-off-by: David Goulet <dgoulet@torproject.org>
Make the "Exit" flag assignment only depend on whether the exit
policy allows connections to ports 80 and 443. Previously relays
would get the Exit flag if they allowed connections to one of
these ports and also port 6667.
Resolves ticket 23637.
Back in 0.2.4.3-alpha (e106812a77), when we switched from using
double to using uint64 for selecting by bandwidth, I got the math
wrong: I should have used llround(x), or (uint64_t)(x+0.5), but
instead I wrote llround(x+0.5). That means we would always round
up, rather than rounding to the closest integer
Fixes bug 23318; bugfix on 0.2.4.3-alpha.
The is_first_hop field should have been called used_create_fast,
but everywhere that we wanted to check it, we should have been
checking channel_is_client() instead.
The diff is confusing, but were two static scheduler functions that
needed moving to static comment block.
No code change. Thanks dgoulet for original commit
The clock_skew_warning() refactoring allowed calls from
or_state_load() to control_event_bootstrap_problem() to occur prior
bootstrap phase 0, causing an assertion failure. Initialize the
bootstrap status prior to calling clock_skew_warning() from
or_state_load().
or_state_load() was using an incorrect sign convention when calling
clock_skew_warning() to warn about state file clock skew. This caused
the wording of the warning to be incorrect about the direction of the
skew.
is_canonical doesn't mean "am I connected to the one true address of
this relay"; it means "does this relay tell me that the address I'm
connected to belong to it." The point is to prevent TCP-based MITM,
not to prevent the relay from multi-homing.
Related to 22890.
Authority IPv6 addresses were originally added in 0.2.8.1-alpha.
This leaves 3/8 directory authorities with IPv6 addresses, but there
are also 52 fallback directory mirrors with IPv6 addresses.
Resolves 19760.
Use this value instead of hardcoded values of 32 everywhere. This also
addresses the use of REND_DESC_ID_V2_LEN_BASE32 in
hs_lookup_last_hid_serv_request() for the HSDir encoded identity digest length
which is accurate but semantically wrong.
Fixes#23305.
Signed-off-by: David Goulet <dgoulet@torproject.org>
RENDEZVOUS1 cell is 84 bytes long in v3 and 168 bytes long in v2 so this
commit pads with random bytes the v3 cells up to 168 bytes so they all look
alike at the rendezvous point.
Closes#23420
Signed-off-by: David Goulet <dgoulet@torproject.org>
This warning is caused by a different tv_usec data type on macOS
compared to the system on which the patch was developed.
Fixes 23575 on 0.3.2.1-alpha.
It is highly unlikely to happen but if so, we need to know and why. The
warning with the next_run values could help.
Signed-off-by: David Goulet <dgoulet@torproject.org>
When the KIST schedule() is called, it computes a diff value between the last
scheduler run and the current monotonic time. If tha value is below the run
interval, the libevent even is updated else the event is run.
It turned out that casting to int32_t the returned int64_t value for the very
first scheduler run (which is set to 0) was creating an overflow on the 32 bit
value leading to adding the event with a gigantic usec value. The scheduler
was simply never running for a while.
First of all, a BUG() is added for a diff value < 0 because if the time is
really monotonic, we should never have a now time that is lower than the last
scheduler run time. And we will try to recover with a diff value to 0.
Second, the diff value is changed to int64_t so we avoid this "bootstrap
overflow" and future casting overflow problems.
Fixes#23558
Signed-off-by: David Goulet <dgoulet@torproject.org>
Otherwise integer overflows can happen. Remember, doing a i32xi32
multiply doesn't actually produce a 64-bit output. You need to do
i64xi32 or i64xi64.
Coverity found this as CID 1417753
Each type of scheduler implements its own static scheduler_t object and
returns a reference to it.
This commit also makes it a const pointer that is it can only change inside
the scheduler type subsystem but not outside for extra protection.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Instead, add wrappers to do the needed action the different scheduler needs
with the libevent object.
Signed-off-by: David Goulet <dgoulet@torproject.org>
A channel can bounce in the scheduler and bounce out with the IDLE state which
means that if it came in the scheduler once, it has socket information that
needs to be freed from the global hash table.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This option is a list of possible scheduler type tor can use ordered by
priority. Its default value is "KIST,KISTLite,Vanilla" which means that KIST
will be used first and if unavailable will fallback to KISTLite and so on.
Signed-off-by: David Goulet <dgoulet@torproject.org>
It is possible that tor was compiled with KIST support but the running kernel
has no support for it. In that case, fallback to a naive approach and flag
that we have no kernel support.
At this commit, if the kernel support is disabled, there are no ways to come
back from it other than restarting tor with a kernel that supporst KIST.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Add a detection for the KIST scheduler in our build system and set
HAVE_KIST_SUPPORT if available.
Adapt the should use kist function with this new compile option.
Signed-off-by: David Goulet <dgoulet@torproject.org>
- HT_FOREACH_FN defined in an additional place because nickm did that
in an old kist prototype
- Make channel_more_to_flush mockable for future sched tests
- Add empty scheduler_{vanilla,kist}.c files and put in include.am
Signed-off-by: David Goulet <dgoulet@torproject.org>
- massive change to src/tgest/test_options.c since the sched options
were added all over the place in it
- removing the sched options caused some tests to pass/fail in new ways
so I assumed current behavior is correct and made them pass again
- ex: "ConnLimit must be greater" lines
- ex: "Authoritative directory servers must" line
- remove test_options_validate__scheduler in prep for new sched tests
Signed-off-by: David Goulet <dgoulet@torproject.org>
An unnecessary routerlist check in the NETINFO clock skew detection in
channel_tls_process_netinfo_cell() was preventing clients from
reporting NETINFO clock skew to controllers.
This patch replaces a few calls to router_get_by_id_digest ("do we
have a routerinfo?") with connection_or_digest_is_known_relay ("do
we know this relay to be in the consensus, or have been there some
time recently?").
Found while doing the 21585 audit; fixes bug 23533. Bugfix on
0.3.0.1-alpha.
The memleak was occuring because of the way ExcludeNodes is handled in
that function. Basically, we were putting excluded intro points extend
infos in a special variable which was never freed. Also, if there were
multiple excluded intro points then that variable was overwritten
everytime leaking more memory. This commit should fix both issues.
This commit adds a pretty advanced test for the client-side making sure that
picking intro is done properly.
This unittest also reveals a memleak on the client_pick_intro() function which
is fixed by the subsequent commit.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Make download status next attempts reported over the control port
consistent with the time used by tor. This issue only occurs if a
download status has not been reset before it is queried over the
control port.
Fixes 23525, not in any released version of tor.
If future code asks if there are any running bridges, without checking
if bridges are enabled, log a BUG warning rather than crashing.
Fixes 23524 on 0.3.0.1-alpha
Change the contract of control_event_bootstrap_problem() to be more
general and to take a connection_t. New function
control_event_bootstrap_prob_or() has the specific or_connection_t
funcionality previously used.
Fix the test_build_address() test and its test vectors python script.
They were both using a bogus pubkey for building an HS address which
does not validate anymore.
Also fix a few more unittests that were using bogus onion addresses
and were failing the validation. I replaced the bogus address with
the one generated from the test vector script.
Directory servers now include a "Date:" http header for response
codes other than 200. Clients starting with a skewed clock and a
recent consensus were getting "304 Not modified" responses from
directory authorities, so without a Date header the client would
never hear about a wrong clock.
Fixes bug 23499; bugfix on 0.0.8rc1.
In #23466 we discovered that cached descriptors can stay around on the
client-side for up to 72 hours. In reality we only want those descs to
get cached for the duration of the current time period, since after that
TP is gone the client needs to compute a new blinded key to use for the HS.
In this commit we start using the consensus time (if available) when
cleaning up cached client descriptor entries. That makes sense because
the client uses consensus time anyway for connecting to hidden
services (e.g. computing blinded keys and time periods).
If no recent consensus is available, we consider descriptors to be
expired since we will want to fetch new ones when we get a live
consensus to avoid the Roger bug. If we didn't do that, when Roger
desuspends his laptop there would be a race between Tor fetching a new
consensus, and Tor connecting to the HS which would still cause
reachability issues.
We also turned a rev counter check into a BUG, since we should never
receive a descriptor with a strictly smaller rev counter than the one we
already have, except if there is a bug or if the HSDir wants to mess
with us. In any case, let's turn this into a BUG so that we can detect
and debug such cases easily.
This change refactors find_dl_schedule() to only call dependent functions
as needed. In particular, directory_fetches_from_authorities() only needs
to be called on clients.
Stopping spurious directory_fetches_from_authorities() calls on every
download on public relays has the following impacts:
* fewer address resolution attempts, particularly those mentioned in 21789
* fewer descriptor rebuilds
* fewer log messages, particularly those limited in 20610
Fixes 23470 in 0.2.8.1-alpha.
The original bug was introduced in commit 35bbf2e as part of prop210.
But when clients are just starting, make them try each bridge a few times
before giving up on it.
These changes make the bridge download schedules more explicit: before
17750, they relied on undocumented behaviour and specific schedule
entries. (And between 17750 and this fix, they were broken.)
Fixes 23347, not in any released version of tor.
We were always incrementing bridge download statuses on each attempt,
but we were using the "increment on failure" functions to do it.
And we never incremented them on failure.
No behaviour change.
The download schedule tells Tor to wait 15 minutes before downloading
bridge descriptors. But 17750 made Tor ignore that and start immediately.
Since we fixed 17750, Tor waits 15 minutes for bridge client bootstrap,
like the schedule says.
This fixes the download schedule to start immediately, and to try each
bridge 3 times in the first 30 seconds. This should make bridge bootstraps
more reliable.
Fixes 23347.
It is possible that two descriptor upload requests are launched in a very
short time frame which can lead to the second request finishing before the
first one and where that first one will make the HSDir send back a 400
malformed descriptor leading to a warning.
To avoid such, cancel all active directory connections for the specific
descriptor we are about to upload.
Note that this race is still possible on the HSDir side which triggers a log
info to be printed out but that is fine.
Fixes#23457
Signed-off-by: David Goulet <dgoulet@torproject.org>
When option validation or transition is happening, there are no
"current options" -- only "old options" and "maybe new options".
Looking at get_options() is likely a mistake, so have a nonfatal
assertion let us know if we do that.
Closes 22281.
Because we can get a RENDEZVOUS2 cell before the INTRODUCE_ACK, we need to
correctly handle the circuit purpose REND_JOINED that is not change its
purpose when we get an INTRODUCE_ACK and simply close the intro circuit
normally.
Fixes#23455
Signed-off-by: David Goulet <dgoulet@torproject.org>
Before, this function meant "can we connect to this node and
authenticate it using its ed25519 key?" Now it can additionally
mean, "when somebody else connects to this node, do we expect that
they can authenticate using the node's ed25519 key"?
This change lets us future-proof our link authentication a bit.
Closes ticket 20895. No backport needed, since ed25519 link
authentication support has not been in any LTS release yet, and
existing releases with it should be obsolete before any releases
without support for linkauth=3 are released.
This test is important because it tests that upload_descriptor_to_all()
is in synch with pick_hsdir_v3(). That's not the case for the
reachability test which just compares the responsible hsdir sets.
There was a bug in upload_descriptor_to_all() where we picked between
first and second hsdir index based on which time segment we are. That's
not right and instead we should be uploading our two descriptors using a
different hsdir index every time. That is, upload first descriptor using
first hsdir index, and upload second descriptor using second hdsir index.
Also simplify stuff in pick_hdsir_v3() since that's only used to fetch
descriptors and hence we can just always use the fetch hsdir index.
This is a large and important unit test for the hidden service version
3! It tests the service reachability for a client using different
consensus timings and makes sure that the computed hashring is the same
on both side so it is actually reachable.
Signed-off-by: David Goulet <dgoulet@torproject.org>
With the latest change on how we use the HSDir index, the client and service
need to pick their responsible HSDir differently that is depending on if they
are before or after a new time period.
The overlap mode is active function has been renamed for this and test added.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Because of #23387, we've realized that there is one scenario that makes
the client unable to reach the service because of a desynch in the time
period used. The scenario is as follows:
+------------------------------------------------------------------+
| |
| 00:00 12:00 00:00 12:00 00:00 12:00 |
| SRV#1 TP#1 SRV#2 TP#2 SRV#3 TP#3 |
| |
| $==========|-----------$===========|-----------$===========| |
| ^ ^ |
| C S |
+------------------------------------------------------------------+
In this scenario the HS has a newer consensus than the client, and the
HS just moved to the next TP but the client is still stuck on the old
one. However, the service is not in any sort of overlap mode so it
doesn't cover the old TP anymore, so the client is unable to fetch a
descriptor.
We've decided to solve this by extending the concept of overlap period
to be permanent so that the service always publishes two descriptors and
aims to cover clients with both older and newer consensuses. See the
spec patch in #23387 for more details.
Based on our #23387 findings, it seems like to maintain 24/7
reachability we need to employ different logic when computing hsdir
indices for fetching vs storing. That's to guarantee that the client
will always fetch the current descriptor, while the service will always
publish two descriptors aiming to cover all possible edge cases.
For more details see the next commit and the spec branch.
Signed-off-by: David Goulet <dgoulet@torproject.org>
The pruning process and the deleting ephemeral service function iterates over
all circuits and were asserting on rend_data for a matching circuit. This is
not good because now we have v3 circuits without a rend_data.
Fixes#23429
Signed-off-by: David Goulet <dgoulet@torproject.org>
Use the valid_after time from the consensus to get the time period number else
we might get out of sync with the overlap period that uses valid_after.
Make it an optional feature since some functions require passing a
specific time (like hs_get_start_time_of_next_time_period()).
Signed-off-by: David Goulet <dgoulet@torproject.org>
Undeprecate it;
rename it to TestingClientDNSRejectInternalAddresses;
add the old name as an alias;
reject configurations where it is set but TestingTorNetwork is not;
change the documentation accordingly.
Closes tickets 21031 and 21522.
Version 3 hidden service needs rendezvous point that have the protocol version
HSRend >= 2 else the rendezvous cells are rejected.
Fixes#23361
Signed-off-by: David Goulet <dgoulet@torproject.org>
The chdir() call in RunAsDaemon makes the behavior here surprising,
and either way of trying to resolve the surprise seems sure to
startle a significant fraction of users. Instead, let's refuse to
guess, and refuse these configurations.
Closes ticket 22731.
I'm doing this using the Proxy-Authorization: header to support
clients that understand it, and with a new tor-specific header that
makes more sense for our use.
By convention, a function that frobs a foo_t should be called
foo_frob, and it should have a foo_t * as its first argument. But
for many of the buf_t functions, the buf_t was the final argument,
which is silly.
Our convention is that functions which manipulate a type T should be
named T_foo. But the buffer functions were super old, and followed
all kinds of conventions. Now they're uniform.
Here's the perl I used to do this:
\#!/usr/bin/perl -w -i -p
s/read_to_buf\(/buf_read_from_socket\(/;
s/flush_buf\(/buf_flush_to_socket\(/;
s/read_to_buf_tls\(/buf_read_from_tls\(/;
s/flush_buf_tls\(/buf_flush_to_tls\(/;
s/write_to_buf\(/buf_add\(/;
s/write_to_buf_compress\(/buf_add_compress\(/;
s/move_buf_to_buf\(/buf_move_to_buf\(/;
s/peek_from_buf\(/buf_peek\(/;
s/fetch_from_buf\(/buf_get_bytes\(/;
s/fetch_from_buf_line\(/buf_get_line\(/;
s/fetch_from_buf_line\(/buf_get_line\(/;
s/buf_remove_from_front\(/buf_drain\(/;
s/peek_buf_startswith\(/buf_peek_startswith\(/;
s/assert_buf_ok\(/buf_assert_ok\(/;