It _should_ work, and I don't see a reason that it wouldn't, but
just in case, add a 10 second timer to make tor die with an
assertion failure if it's supposed to exit but it doesn't.
This function was never about 'finishing' the event loop, but rather
about making sure that the code outside the event loop would be run
at least once.
Sometimes when we call exit(), it's because the process is
completely hopeless: openssl has a broken AES-CTR implementation, or
the clock is in the 1960s, or something like that.
But sometimes, we should return cleanly from tor_main() instead, so
that embedders can keep embedding us and start another Tor process.
I've gone through all the exit() and _exit() calls to annotate them
with "exit ok" or "XXXX bad exit" -- the next step will be to fix
the bad exit()s.
First step towards 23848.
At first, we put the tor_git_revision constant in tor_main.c, so
that we wouldn't have to recompile config.o every time the git
revision changed. But putting it there had unintended side effect
of forcing every program that wanted to link libor.a (including
test, test-slow, the fuzzers, the benchmarks, etc) to declare their
own tor_git_revision instance.
That's not very nice, especially since we want to start supporting
others who want to link against Tor (see 23846).
So, create a new git_revision.c file that only contains this
constant, and remove the duplicated boilerplate from everywhere
else.
Part of implementing ticket 23845.
This feature should help programs that want to launch and manage a
Tor process, as well as programs that want to launch and manage a
Tor instance in a separate thread. Right now, they have to open a
controlport, and then connect to it, with attendant authentication
issues. This feature allows them to just start with an
authenticated connection.
Bug 23900.
Our socket accounting functions assumed that we'd never be asked to
close a socket that we didn't open ourselves. But now we want to
support taking control sockets that we inherit -- so we need a way
of taking ownership of them, so we don't freak out later on when we
close them.
Create a function that tells us if we can fetch or not the descriptor for the
given service key.
No behavior change. Mostly moving code but with a slight change so the
function can properly work by returning a boolean and also a possible fetch
status code.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Skip test_config_include_no_permission() when running as root, because
it will get an unexpected success from config_get_lines_include().
This affects some continuous integration setups. Fixes bug 23758.
When we added HTTPTunnelPort, the answer that we give when you try
to use your SOCKSPort as an HTTP proxy became wrong. Now we explain
that Tor sorta _is_ an HTTP proxy, but a SOCKSPort isn't.
I have left the status line the same, in case anything is depending
on it. I have removed the extra padding for Internet Explorer,
since the message is well over 512 bytes without it.
Fixes bug 23678; bugfix on 0.3.2.1-alpha.
Without this fix, changes from client to bridge don't trigger
transition_affects_workers(), so we would never have actually
initialized the cpuworkers.
Fixes bug 23693. Bugfix on 3bcdb26267 0.2.6.3-alpha, which
fixed bug 14901 in the general case, but not on the case where
public_server_mode() did not change.
Because our monotonic time interface doesn't play well with value set to 0,
always initialize to now() the scheduler_last_run at init() of the KIST
scheduler.
Fixes#23696
Signed-off-by: David Goulet <dgoulet@torproject.org>
When a channel is scheduled and flush cells returns 0 that is no cells to
flush, we flag it back in waiting for cells so it doesn't get stuck in a
possible infinite loop.
It has been observed on moria1 where a closed channel end up in the scheduler
where the flush process returned 0 cells but it was ultimately kept in the
scheduling loop forever. We suspect that this is due to a more deeper problem
in tor where the channel_more_to_flush() is actually looking at the wrong
queue and was returning 1 for an empty channel thus putting the channel in the
"Case 4" of the scheduler which is to go back in pending state thus
re-considered at the next iteration.
This is a fix that allows the KIST scheduler to recover properly from a not
entirelly diagnosed problem in tor.
Fixes#23676
Signed-off-by: David Goulet <dgoulet@torproject.org>
When we added single_conn_free_bytes(), we cleared the outbuf on a
connection without setting outbuf_flushlen() to 0. This could cause
an assertion failure later on in flush_buf().
Fixes bug 23690; bugfix on 0.2.6.1-alpha.
This caused a BUG log when we noticed that the circuit had no
channel. The likeliest culprit for exposing that behavior is
d769cab3e5, where we made circuit_mark_for_close() NULL out
the n_chan and p_chan fields of the circuit.
Fixes bug 8185; bugfix on 0.2.5.4-alpha, I think.
My current theory is that this is just a marked circuit that hasn't
closed yet, but let's gather more information in case that theory is
wrong.
Diagnostic for 8185.
This patch ensures that we return TOR_COMPRESS_BUFFER_FULL in case we
have a input bytes left to process, but are out of output buffer or in
case we need to finish where the compression implementation might need
to write an epilogue.
See: https://bugs.torproject.org/23551
This patch ensures that we return TOR_COMPRESS_BUFFER_FULL in case we
have a input bytes left to process, but are out of output buffer or in
case we need to finish where the compression implementation might need
to write an epilogue.
See: https://bugs.torproject.org/23551
If 6 SOCKS requests are opened at once, it would have triggered 6 fetches
which ultimately poke all 6 HSDir. We don't want that, if we have multiple
SOCKS requests for the same service, do one fetch only.
Signed-off-by: David Goulet <dgoulet@torproject.org>
When purging last HSDir requests, we used time(NULL) for computing the
service blinded key but in all other places in our codebase we actually
use the consensus times. That can cause wrong behavior if the consensus
is in a different time period than time(NULL).
This commit is required for proper purging of HSDir requests.
The confparse field has type UINT, which corresponds to an int
type. We had uint32_t.
This shouldn't cause trouble in practice, since int happens to
4-bytes wide on every platform where an authority is running. It's
still wrong, though.
These should have been int, but we had listed them as unsigned.
That's an easy mistake to make, since "int" corresponds with either
INT or UINT in the configuration file.
This bug cannot have actually caused a problem in practice, since we
check those fields' values on load, and ensure that they are in
range 0..INT32_MAX.
New approach, suggested by Taylor: During testing builds, we
initialize a union member of an appropriate pointer type with the
address of the member field we're trying to test, so we can make
sure that the compiler doesn't warn.
My earlier approach invoked undefined behavior.
Also demote a log message that can occur under natural causes
(if the circuit subsystem is missing descriptors/consensus etc.).
The HS subsystem will naturally retry to connect to intro points,
so no need to make that log user-facing.
So we can track them more easily in the logs and match any open/close/free
with those identifiers.
Part of #23645
Signed-off-by: David Goulet <dgoulet@torproject.org>
This removes the "nickname" of the cannibalized circuit last hop as it is
useless. It now logs the n_circ_id and global identifier so we can match it
with other logging statement.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Prior to the log statement, the circuit n_circ_id value is zeroed so keep a
copy so we can log it at the end.
Part of #23645
Signed-off-by: David Goulet <dgoulet@torproject.org>
Make the "Exit" flag assignment only depend on whether the exit
policy allows connections to ports 80 and 443. Previously relays
would get the Exit flag if they allowed connections to one of
these ports and also port 6667.
Resolves ticket 23637.
Back in 0.2.4.3-alpha (e106812a77), when we switched from using
double to using uint64 for selecting by bandwidth, I got the math
wrong: I should have used llround(x), or (uint64_t)(x+0.5), but
instead I wrote llround(x+0.5). That means we would always round
up, rather than rounding to the closest integer
Fixes bug 23318; bugfix on 0.2.4.3-alpha.
The is_first_hop field should have been called used_create_fast,
but everywhere that we wanted to check it, we should have been
checking channel_is_client() instead.
The diff is confusing, but were two static scheduler functions that
needed moving to static comment block.
No code change. Thanks dgoulet for original commit
The clock_skew_warning() refactoring allowed calls from
or_state_load() to control_event_bootstrap_problem() to occur prior
bootstrap phase 0, causing an assertion failure. Initialize the
bootstrap status prior to calling clock_skew_warning() from
or_state_load().
or_state_load() was using an incorrect sign convention when calling
clock_skew_warning() to warn about state file clock skew. This caused
the wording of the warning to be incorrect about the direction of the
skew.
is_canonical doesn't mean "am I connected to the one true address of
this relay"; it means "does this relay tell me that the address I'm
connected to belong to it." The point is to prevent TCP-based MITM,
not to prevent the relay from multi-homing.
Related to 22890.
Authority IPv6 addresses were originally added in 0.2.8.1-alpha.
This leaves 3/8 directory authorities with IPv6 addresses, but there
are also 52 fallback directory mirrors with IPv6 addresses.
Resolves 19760.
If "1" is not 64 bits wide already, then "1 << i" will not actually
work.
This bug only affects the TEST_BITOPS code, and shouldn't matter for
the actual use of the timeout code (except if/when it causes this
test to fail).
Reported by dcb314@hotmail.com. Fix for bug 23583. Not adding a
changes file, since this code is never compiled into Tor.
Use this value instead of hardcoded values of 32 everywhere. This also
addresses the use of REND_DESC_ID_V2_LEN_BASE32 in
hs_lookup_last_hid_serv_request() for the HSDir encoded identity digest length
which is accurate but semantically wrong.
Fixes#23305.
Signed-off-by: David Goulet <dgoulet@torproject.org>
RENDEZVOUS1 cell is 84 bytes long in v3 and 168 bytes long in v2 so this
commit pads with random bytes the v3 cells up to 168 bytes so they all look
alike at the rendezvous point.
Closes#23420
Signed-off-by: David Goulet <dgoulet@torproject.org>
This warning is caused by a different tv_usec data type on macOS
compared to the system on which the patch was developed.
Fixes 23575 on 0.3.2.1-alpha.
It is highly unlikely to happen but if so, we need to know and why. The
warning with the next_run values could help.
Signed-off-by: David Goulet <dgoulet@torproject.org>
When the KIST schedule() is called, it computes a diff value between the last
scheduler run and the current monotonic time. If tha value is below the run
interval, the libevent even is updated else the event is run.
It turned out that casting to int32_t the returned int64_t value for the very
first scheduler run (which is set to 0) was creating an overflow on the 32 bit
value leading to adding the event with a gigantic usec value. The scheduler
was simply never running for a while.
First of all, a BUG() is added for a diff value < 0 because if the time is
really monotonic, we should never have a now time that is lower than the last
scheduler run time. And we will try to recover with a diff value to 0.
Second, the diff value is changed to int64_t so we avoid this "bootstrap
overflow" and future casting overflow problems.
Fixes#23558
Signed-off-by: David Goulet <dgoulet@torproject.org>
This was introduced in 4ff170d7b1, and is probably
unreachable, but coverity complained about it (CID 1417761). Bug not
in any released Tor, so no changes file.
Otherwise integer overflows can happen. Remember, doing a i32xi32
multiply doesn't actually produce a 64-bit output. You need to do
i64xi32 or i64xi64.
Coverity found this as CID 1417753
Each type of scheduler implements its own static scheduler_t object and
returns a reference to it.
This commit also makes it a const pointer that is it can only change inside
the scheduler type subsystem but not outside for extra protection.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Instead, add wrappers to do the needed action the different scheduler needs
with the libevent object.
Signed-off-by: David Goulet <dgoulet@torproject.org>
A channel can bounce in the scheduler and bounce out with the IDLE state which
means that if it came in the scheduler once, it has socket information that
needs to be freed from the global hash table.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This option is a list of possible scheduler type tor can use ordered by
priority. Its default value is "KIST,KISTLite,Vanilla" which means that KIST
will be used first and if unavailable will fallback to KISTLite and so on.
Signed-off-by: David Goulet <dgoulet@torproject.org>
It is possible that tor was compiled with KIST support but the running kernel
has no support for it. In that case, fallback to a naive approach and flag
that we have no kernel support.
At this commit, if the kernel support is disabled, there are no ways to come
back from it other than restarting tor with a kernel that supporst KIST.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Add a detection for the KIST scheduler in our build system and set
HAVE_KIST_SUPPORT if available.
Adapt the should use kist function with this new compile option.
Signed-off-by: David Goulet <dgoulet@torproject.org>
- HT_FOREACH_FN defined in an additional place because nickm did that
in an old kist prototype
- Make channel_more_to_flush mockable for future sched tests
- Add empty scheduler_{vanilla,kist}.c files and put in include.am
Signed-off-by: David Goulet <dgoulet@torproject.org>
- massive change to src/tgest/test_options.c since the sched options
were added all over the place in it
- removing the sched options caused some tests to pass/fail in new ways
so I assumed current behavior is correct and made them pass again
- ex: "ConnLimit must be greater" lines
- ex: "Authoritative directory servers must" line
- remove test_options_validate__scheduler in prep for new sched tests
Signed-off-by: David Goulet <dgoulet@torproject.org>
An unnecessary routerlist check in the NETINFO clock skew detection in
channel_tls_process_netinfo_cell() was preventing clients from
reporting NETINFO clock skew to controllers.
This patch replaces a few calls to router_get_by_id_digest ("do we
have a routerinfo?") with connection_or_digest_is_known_relay ("do
we know this relay to be in the consensus, or have been there some
time recently?").
Found while doing the 21585 audit; fixes bug 23533. Bugfix on
0.3.0.1-alpha.
The memleak was occuring because of the way ExcludeNodes is handled in
that function. Basically, we were putting excluded intro points extend
infos in a special variable which was never freed. Also, if there were
multiple excluded intro points then that variable was overwritten
everytime leaking more memory. This commit should fix both issues.
This commit adds a pretty advanced test for the client-side making sure that
picking intro is done properly.
This unittest also reveals a memleak on the client_pick_intro() function which
is fixed by the subsequent commit.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Using a test vector in python, test both hs_build_hsdir_index() and
hs_build_hs_index().
This commit also adds the hs_build_address.py to EXTRA_DIST which was missing.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Do two major improvements:
a) Make the client pick 6 HSDirs instead of just 1 and make sure they
all match the service's HSDirs.
b) Test two additional missing scenarios borrowed from the
test_reachability() test.
Make download status next attempts reported over the control port
consistent with the time used by tor. This issue only occurs if a
download status has not been reset before it is queried over the
control port.
Fixes 23525, not in any released version of tor.
If future code asks if there are any running bridges, without checking
if bridges are enabled, log a BUG warning rather than crashing.
Fixes 23524 on 0.3.0.1-alpha
Change the contract of control_event_bootstrap_problem() to be more
general and to take a connection_t. New function
control_event_bootstrap_prob_or() has the specific or_connection_t
funcionality previously used.
Fix the test_build_address() test and its test vectors python script.
They were both using a bogus pubkey for building an HS address which
does not validate anymore.
Also fix a few more unittests that were using bogus onion addresses
and were failing the validation. I replaced the bogus address with
the one generated from the test vector script.
Directory servers now include a "Date:" http header for response
codes other than 200. Clients starting with a skewed clock and a
recent consensus were getting "304 Not modified" responses from
directory authorities, so without a Date header the client would
never hear about a wrong clock.
Fixes bug 23499; bugfix on 0.0.8rc1.
We enrich the test_client_cache() test in two ways:
a) We check that transitioning time periods also cleans up expired
descriptors in client memory.
b) We test hs_cache_lookup_as_client() instead of
lookup_v3_desc_as_client(). The former is a higher level function
which calls the latter and allows us to test deeper into the
subsystem.
In #23466 we discovered that cached descriptors can stay around on the
client-side for up to 72 hours. In reality we only want those descs to
get cached for the duration of the current time period, since after that
TP is gone the client needs to compute a new blinded key to use for the HS.
In this commit we start using the consensus time (if available) when
cleaning up cached client descriptor entries. That makes sense because
the client uses consensus time anyway for connecting to hidden
services (e.g. computing blinded keys and time periods).
If no recent consensus is available, we consider descriptors to be
expired since we will want to fetch new ones when we get a live
consensus to avoid the Roger bug. If we didn't do that, when Roger
desuspends his laptop there would be a race between Tor fetching a new
consensus, and Tor connecting to the HS which would still cause
reachability issues.
We also turned a rev counter check into a BUG, since we should never
receive a descriptor with a strictly smaller rev counter than the one we
already have, except if there is a bug or if the HSDir wants to mess
with us. In any case, let's turn this into a BUG so that we can detect
and debug such cases easily.
This change refactors find_dl_schedule() to only call dependent functions
as needed. In particular, directory_fetches_from_authorities() only needs
to be called on clients.
Stopping spurious directory_fetches_from_authorities() calls on every
download on public relays has the following impacts:
* fewer address resolution attempts, particularly those mentioned in 21789
* fewer descriptor rebuilds
* fewer log messages, particularly those limited in 20610
Fixes 23470 in 0.2.8.1-alpha.
The original bug was introduced in commit 35bbf2e as part of prop210.
OpenBSD doesn't like tricks where you use a too-wide sscanf argument
for a too-narrow array, even when you know the input string
statically. The fix here is just to use bigger buffers.
Fixes 15582; bugfix on a3dafd3f58 in 0.2.6.2-alpha.
But when clients are just starting, make them try each bridge a few times
before giving up on it.
These changes make the bridge download schedules more explicit: before
17750, they relied on undocumented behaviour and specific schedule
entries. (And between 17750 and this fix, they were broken.)
Fixes 23347, not in any released version of tor.
We were always incrementing bridge download statuses on each attempt,
but we were using the "increment on failure" functions to do it.
And we never incremented them on failure.
No behaviour change.
The download schedule tells Tor to wait 15 minutes before downloading
bridge descriptors. But 17750 made Tor ignore that and start immediately.
Since we fixed 17750, Tor waits 15 minutes for bridge client bootstrap,
like the schedule says.
This fixes the download schedule to start immediately, and to try each
bridge 3 times in the first 30 seconds. This should make bridge bootstraps
more reliable.
Fixes 23347.
It is possible that two descriptor upload requests are launched in a very
short time frame which can lead to the second request finishing before the
first one and where that first one will make the HSDir send back a 400
malformed descriptor leading to a warning.
To avoid such, cancel all active directory connections for the specific
descriptor we are about to upload.
Note that this race is still possible on the HSDir side which triggers a log
info to be printed out but that is fine.
Fixes#23457
Signed-off-by: David Goulet <dgoulet@torproject.org>
When option validation or transition is happening, there are no
"current options" -- only "old options" and "maybe new options".
Looking at get_options() is likely a mistake, so have a nonfatal
assertion let us know if we do that.
Closes 22281.
Because we can get a RENDEZVOUS2 cell before the INTRODUCE_ACK, we need to
correctly handle the circuit purpose REND_JOINED that is not change its
purpose when we get an INTRODUCE_ACK and simply close the intro circuit
normally.
Fixes#23455
Signed-off-by: David Goulet <dgoulet@torproject.org>
Before, this function meant "can we connect to this node and
authenticate it using its ed25519 key?" Now it can additionally
mean, "when somebody else connects to this node, do we expect that
they can authenticate using the node's ed25519 key"?
This change lets us future-proof our link authentication a bit.
Closes ticket 20895. No backport needed, since ed25519 link
authentication support has not been in any LTS release yet, and
existing releases with it should be obsolete before any releases
without support for linkauth=3 are released.
This test is important because it tests that upload_descriptor_to_all()
is in synch with pick_hsdir_v3(). That's not the case for the
reachability test which just compares the responsible hsdir sets.
There was a bug in upload_descriptor_to_all() where we picked between
first and second hsdir index based on which time segment we are. That's
not right and instead we should be uploading our two descriptors using a
different hsdir index every time. That is, upload first descriptor using
first hsdir index, and upload second descriptor using second hdsir index.
Also simplify stuff in pick_hdsir_v3() since that's only used to fetch
descriptors and hence we can just always use the fetch hsdir index.
Because of the latest changes on when we rotate, longer lifetime of
descriptors and no more overlap period, the tests needed to be improved to
test more functionnalities.
Signed-off-by: David Goulet <dgoulet@torproject.org>
First, this fixes#23372.
Second, the consensus timings for the build descriptor have been changed to
the current test can pass. More extensive tests of descriptor rotation are
coming in a commit near you because the rotation and time period logic has
been changed.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This is a large and important unit test for the hidden service version
3! It tests the service reachability for a client using different
consensus timings and makes sure that the computed hashring is the same
on both side so it is actually reachable.
Signed-off-by: David Goulet <dgoulet@torproject.org>
With the latest change on how we use the HSDir index, the client and service
need to pick their responsible HSDir differently that is depending on if they
are before or after a new time period.
The overlap mode is active function has been renamed for this and test added.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Because of #23387, we've realized that there is one scenario that makes
the client unable to reach the service because of a desynch in the time
period used. The scenario is as follows:
+------------------------------------------------------------------+
| |
| 00:00 12:00 00:00 12:00 00:00 12:00 |
| SRV#1 TP#1 SRV#2 TP#2 SRV#3 TP#3 |
| |
| $==========|-----------$===========|-----------$===========| |
| ^ ^ |
| C S |
+------------------------------------------------------------------+
In this scenario the HS has a newer consensus than the client, and the
HS just moved to the next TP but the client is still stuck on the old
one. However, the service is not in any sort of overlap mode so it
doesn't cover the old TP anymore, so the client is unable to fetch a
descriptor.
We've decided to solve this by extending the concept of overlap period
to be permanent so that the service always publishes two descriptors and
aims to cover clients with both older and newer consensuses. See the
spec patch in #23387 for more details.