sizeof(ret) is the size of the pointer, not the size of what it
points to. Fortunately, we already have a function to compare
tor_addr_port_t values for equality.
Bugfix on c2c5b13e5d8a77e; bug not in any released Tor. Found by
clang's scan-build.
Now that we update our buckets on demand before reading or writing,
we no longer need to update them all every TokenBucketRefillInterval
msec.
When a connection runs out of bandwidth, we do need a way to
reenable it, however. We do this by scheduling a timer to reenable
all blocked connections for TokenBucketRefillInterval msec after a
connection becomes blocked.
(If we were using PerConnBWRate more, it might make sense to have a
per-connection timer, rather than a single timeout. But since
PerConnBWRate is currently (mostly) unused, I'm going to go for the
simpler approach here, since usually whenever one connection has
become blocked on bandwidth, most connections are blocked on
bandwidth.)
Implements ticket 25373.
Previously this was done as part of the refill callback, but there's
no real reason to do it like that. Since we're trying to remove the
refill callback completely, we can do this work as part of
record_num_bytes_transferred_impl(), which already does quite a lot
of this.
We used to do this 10x per second in connection_buckets_refill();
instead, we now do it when the bucket becomes empty. This change is
part of the work of making connection_buckets_refill() obsolete.
Closes ticket 25828; bugfix on 0.2.3.5-alpha.
We recently merged a circuit cell queue size safeguard. This commit adds the
number of killed circuits that have reached the limit to the DoS heartbeat. It
now looks like this:
[notice] DoS mitigation since startup: 0 circuits killed with too many
cells. 0 circuits rejected, 0 marked addresses. 0 connections closed. 0
single hop clients refused.
Second thing that this patch does. It makes tor always print the DoS
mitigation heartbeat line (for a relay) even though no DoS mitigation have
been enabled. The reason is because we now kill circuits that have too many
cells regardless on if it is enabled or not but also it will give the operator
a chance to learn what is enabled with the heartbeat instead of suddenly
appearing when it is enabled by let say the consensus.
Fixes#25824
Signed-off-by: David Goulet <dgoulet@torproject.org>
Unfortunately, the units passed to
monotime_coarse_stamp_units_to_approx_msec() was always 0 due to a type
conversion.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This commit introduces the consensus parameter "circ_max_cell_queue_size"
which controls the maximum number of cells a circuit queue should have.
The default value is currently 50000 cells which is above what should be
expected but keeps us a margin of error for padding cells.
Related to this is #9072. Back in 0.2.4.14-alpha, we've removed that limit due
to a Guard discovery attack. Ticket #25226 details why we are putting back the
limit due to the memory pressure issue on relays.
Fixes#25226
Signed-off-by: David Goulet <dgoulet@torproject.org>
Both header and code file had some indentation issues after mass renaming.
No code behavior change.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Really, the uint32_t is only an optimization; any kind of unit
should work fine. Some users might want to use time_t or
monotime_coarse_t or something like that.
Begin by creating a lowest-level triple of the types needed to
implement a token bucket: a configuration, a timestamp, and the raw
bucket itself.
Note that for low-level buckets, the units of the timestamp and the
bucket itself are unspecified: each user can use a different type.
(This patch breaks check-spaces; a later patch will fix it)
This is a simple search-and-replace to rename the token bucket type
to indicate that it contains both a read and a write bucket, bundled
with their configuration. It's preliminary to refactoring the
bucket type.
This test works by having two post-loop events activate one another
in a tight loop. If the "post-loop" mechanism didn't work, this
would be enough to starve all other events.
A linked connection_t is one that gets its I/O, not from the
network, but from another connection_t. When such a connection has
something to write, we want the corresponding connection to run its
read callback ... but not immediately, to avoid infinite recursion
and/or event loop starvation.
Previously we handled this case by activating the read events
outside the event loop. Now we use the "postloop event" logic.
This lets us simplify do_main_loop_once() a little.
We've been labeling some events as happening "outside the event
loop", to avoid Libevent starvation. This patch provides a cleaner
mechanism to avoid that starvation.
For background, the problem here is that Libevent only scans for new
events once it has run all its active callbacks. So if the
callbacks keep activating new callbacks, they could potentially
starve Libevent indefinitely and keep it from ever checking for
timed, socket, or signal events.
To solve this, we add the ability to label some events as
"post-loop". The rule for a "post-loop" event is that any events
_it_ activates can only be run after libevent has re-scanned for new
events at least once.
This differs from our previous token bucket abstraction in a few
ways:
1) It is an abstraction, and not a collection of fields.
2) It is meant to be used with monotonic timestamps, which should
produce better results than calling gettimeofday over and over.
In d1874b4339, we adjusted this check so that we insist on
using routerinfos for bridges. That's almost correct... but if we
have a bridge that is also a regular relay, then we should use
insist on its routerinfo when connecting to it as a bridge
(directly), and be willing to use its microdescriptor when
connecting to it elsewhere in our circuits.
This bug is a likely cause of some (all?) of the (exit_ei == NULL)
failures we've been seeing.
Fixes bug 25691; bugfix on 0.3.3.4-alpha
When size_t is 32 bits, the unit tests can't fit anything more than
4GB-1 into a size_t.
Additionally, tt_int_op() uses "long" -- we need tt_u64_op() to
safely test uint64_t values for equality.
Bug caused by tests for #24782 fix; not in any released Tor.
This patch changes the algorithm of compute_real_max_mem_in_queues() to
use 0.4 * RAM iff the system has more than or equal to 8 GB of RAM, but
will continue to use the old value of 0.75 * RAM if the system have less
than * GB of RAM available.
This patch also adds tests for compute_real_max_mem_in_queues().
See: https://bugs.torproject.org/24782
This patch makes get_total_system_memory mockable, which allows us to
alter the return value of the function in tests.
See: https://bugs.torproject.org/24782
This one happens if for some reason you start with DirPort enabled
but server mode turned off entirely.
Fixes a case of bug 23693; bugfix on 0.3.1.1-alpha.
This roughly doubles our test coverage of the bridges.c module.
* ADD new testing module, .../src/test/test_bridges.c.
* CHANGE a few function declarations from `static` to `STATIC`.
* CHANGE one function in transports.c, transport_get_by_name(), to be
mockable.
* CLOSES#25425: https://bugs.torproject.org/25425
This patch lifts the list of default directory authorities from config.c
into their own auth_dirs.inc file, which is then included in config.c
using the C preprocessor.
Patch by beastr0.
See: https://bugs.torproject.org/24854
* ADD new /src/common/crypto_rand.[ch] module.
* ADD new /src/common/crypto_util.[ch] module (contains the memwipe()
function, since all crypto_* modules need this).
* FIXES part of #24658: https://bugs.torproject.org/24658
This module doesn't actually need to mock the libevent mainloop at
all: it can just use the regular mainloop that the test environment
sets up.
Part of ticket 23750.
This change makes cpuworker and test_workqueue no longer need to
include event2/event.h. Now workqueue.c needs to include it, but
that is at least somewhat logical here.
We are going to stop recommending 0.2.5 so there is no reason to keep the
undef statement anymore.
Fixes#20522.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Both in geoip_note_client_seen() and options_need_geoip_info(), switch from
accessing the options directly to using the should_record_bridge_info() helper
function.
Fixes#25290
Signed-off-by: David Goulet <dgoulet@torproject.org>
Next commit is addressing the circuit queue cell limit so cleanup before doing
anything else.
Part of #25226
Signed-off-by: David Goulet <dgoulet@torproject.org>
Previously, the limit for MAX_PROTOCOLS_TO_EXPAND was actually being applied
in Rust to the maximum number of version (total, for all subprotocols).
Whereas in C, it was being applied to the number of subprotocols that were
allowed. This changes the Rust to match C's behaviour.
The behaviours still do not match, unsurprisingly, but now we know where a
primary difference is: the Rust is validating version ranges more than the C,
so in the C it's possible to call protover_all_supported on a ridiculous
version range like "Sleen=0-4294967294" because the C uses
MAX_PROTOCOLS_TO_EXPAND to count the number of *subprotocols* whereas the Rust
uses it to count the total number of *versions* of all subprotocols.
There's now no difference in these tests w.r.t. the C or Rust: both
fail miserably (well, Rust fails with nice descriptive errors, and C
gives you a traceback, because, well, C).
The DoS potential is slightly higher in C now due to some differences to the
Rust code, see the C_RUST_DIFFERS tags in src/rust/protover/tests/protover.rs.
Also, the comment about "failing at the splitting stage" in Rust wasn't true,
since when we split, we ignore empty chunks (e.g. "1--1" parses into
"(1,None),(None,1)" and "None" can't be parsed into an integer).
Finally, the comment about "Rust seems to experience an internal error" is only
true in debug mode, where u32s are bounds-checked at runtime. In release mode,
code expressing the equivalent of this test will error with
`Err(ProtoverError::Unparseable)` because 4294967295 is too large.
Previously, if "Link=1-5" was supported, and you asked protover_all_supported()
(or protover::all_supported() in Rust) if it supported "Link=3-999", the C
version would return "Link=3-999" and the Rust would return "Link=6-999". These
both behave the same now, i.e. both return "Link=6-999".
During code review and discussion with Chelsea Komlo, she pointed out
that protover::compute_for_old_tor() was a public function whose
return type was `&'static CStr`. We both agree that C-like parts of
APIs should:
1. not be exposed publicly (to other Rust crates),
2. only be called in the appropriate FFI code,
3. not expose types which are meant for FFI code (e.g. `*mut char`,
`CString`, `*const c_int`, etc.) to the pure-Rust code of other
crates.
4. FFI code (e.g. things in `ffi.rs` modules) should _never_ be called
from pure-Rust, not even from other modules in its own crate
(i.e. do not call `protover::ffi::*` from anywhere in
`protover::protoset::*`, etc).
With that in mind, this commit makes the following changes:
* CHANGE `protover::compute_for_old_tor()` to be
visible only at the `pub(crate)` level.
* RENAME `protover::compute_for_old_tor()` to
`protover::compute_for_old_tor_cstr()` to reflect the last change.
* ADD a new `protover::compute_for_old_tor()` function wrapper which
is public and intended for other Rust code to use, which returns a
`&str`.
It was changed to take borrows instead of taking ownership.
* REFACTOR `protover::ffi::protover_is_supported_here()` to use changed method
signature on `protover::is_supported_here()`.
This includes a subtle difference in behaviour to the previous Rust
implementation, where, for each vote that we're computing over, if a single one
fails to parse, we skip it. This now matches the current behaviour in the C
implementation.
* REFACTOR `protover::ffi::protover_compute_vote()` to use
new types and methods.
This includes a subtle difference in behaviour, as in 4258f1e18, where we return
(matching the C impl's return behaviour) earlier than before if parsing failed,
saving us computation in parsing the versions into a
protover::protoset::ProtoSet.
* REFACTOR `protover::ffi::protover_list_supports_protocol_or_later()` to use
new types and methods.
This includes a subtle difference in behaviour, as in 4258f1e18, where we return
(matching the C impl's return behaviour) earlier than before if parsing failed,
saving us computation in parsing the versions into a
protover::protoset::ProtoSet.
* REFACTOR `protover::ffi::protover_list_supports_protocol()` to use new types
and methods.
This includes differences in behaviour to before, which should now more closely
match the C version:
- If parsing a protover `char*` from C, and the string is not parseable, this
function will return 1 early, which matches the C behaviour when protocols
are unparseable. Previously, we would parse it and its version numbers
simultaneously, i.e. there was no fail early option, causing us to spend more
time unnecessarily parsing versions.
* REFACTOR `protover::ffi::protover_all_supported()` to use new types and
methods.
Previously, the rust implementation of protover considered an empty string to be
a valid ProtoEntry, while the C version did not (it must have a "=" character).
Other differences include that unknown protocols must now be parsed as
`protover::UnknownProtocol`s, and hence their entries as
`protover::UnvalidatedProtoEntry`s, whereas before (nearly) all protoentries
could be parsed regardless of how erroneous they might be considered by the C
version.
My apologies for this somewhat messy and difficult to read commit, if any part
is frustrating to the reviewer, please feel free to ask me to split this into
smaller changes (possibly hard to do, since so much changed), or ask me to
comment on a specific line/change and clarify how/when the behaviours differ.
The tests here should more closely match the behaviours exhibited by the C
implementation, but I do not yet personally guarantee they match precisely.
* REFACTOR unittests in protover::protover.
* ADD new integration tests for previously untested behaviour.
* FIXES part of #24031: https://bugs.torproject.org/24031.
This changes `protover::is_supported_here()` to be aware of new datatypes
(e.g. don't call `.0` on things which are no longer tuple structs) and also
changes the method signature to take borrows, making it faster, threadable, and
easier to read (i.e. the caller can know from reading the function signature
that the function won't mutate values passed into it).
* CHANGE the `protover::is_supported_here()` function to take borrows.
* REFACTOR the `protover::is_supported_here()` function to be aware of new
datatypes.
* FIXES part of #24031: https://bugs.torproject.org/24031
This adds a new type for votes upon `protover::ProtoEntry`s (technically, on
`protover::UnvalidatedProtoEntry`s, because the C code does not validate based
upon currently known protocols when voting, in order to maintain
future-compatibility), and converts several functions which would have operated
on this datatype into methods for ease-of-use and readability.
This also fixes a behavioural differentce to the C version of
protover_compute_vote(). The C version of protover_compute_vote() calls
expand_protocol_list() which checks if there would be too many subprotocols *or*
expanded individual version numbers, i.e. more than MAX_PROTOCOLS_TO_EXPAND, and
does this *per vote* (but only in compute_vote(), everywhere else in the C seems
to only care about the number of subprotocols, not the number of individual
versions). We need to match its behaviour in Rust and ensure we're not allowing
more than it would to get the votes to match.
* ADD new `protover::ProtoverVote` datatype.
* REMOVE the `protover::compute_vote()` function and refactor it into an
equivalent-in-behaviour albeit more memory-efficient voting algorithm based
on the new underlying `protover::protoset::ProtoSet` datatype, as
`ProtoverVote::compute()`.
* REMOVE the `protover::write_vote_to_string()` function, since this
functionality is now generated by the impl_to_string_for_proto_entry!() macro
for both `ProtoEntry` and `UnvalidatedProtoEntry` (the latter of which is the
correct type to return from a voting protocol instance, since the entity
voting may not know of all protocols being voted upon or known about by other
voting parties).
* FIXES part of #24031: https://bugs.torproject.org/24031
rust: Fix a difference in compute_vote() behaviour to C version.
This implements conversions from either a ProtoEntry or an UnvalidatedProtoEntry
into a String, for use in replacing such functions as
`protover::write_vote_to_string()`.
* ADD macro for implementing ToString trait for ProtoEntry and
UnvalidatedProtoEntry.
* FIXES part of #24031: https://bugs.torproject.org/24031
This adds a new protover::UnvalidatedProtoEntry type, which is the
UnknownProtocol variant of a ProtoEntry, and refactors several functions which
should operate on this type into methods.
This also fixes what was previously another difference to the C implementation:
if you asked the C version of protovet_compute_vote() to compute a single vote
containing "Fribble=", it would return NULL. However, the Rust version would
return "Fribble=" since it didn't check if the versions were empty before
constructing the string of differences. ("Fribble=" is technically a valid
protover string.) This is now fixed, and the Rust version in that case will,
analogous to (although safer than) C returning a NULL, return None.
* REMOVE internal `contains_only_supported_protocols()` function.
* REMOVE `all_supported()` function and refactor it into
`UnvalidatedProtoEntry::all_supported()`.
* REMOVE `parse_protocols_from_string_with_no_validation()` and
refactor it into the more rusty implementation of
`impl FromStr for UnvalidatedProtoEntry`.
* REMOVE `protover_string_supports_protocol()` and refactor it into
`UnvalidatedProtoEntry::supports_protocol()`.
* REMOVE `protover_string_supports_protocol_or_later()` and refactor
it into `UnvalidatedProtoEntry::supports_protocol_or_later()`.
* FIXES part of #24031: https://bugs.torproject.org/24031
rust: Fix another C/Rust different in compute_vote().
This fixes the unittest from the prior commit by checking if the versions are
empty before adding a protocol to a vote.
This replaces the `protover::SupportedProtocols` (why would you have a type just
for things which are supported?) with a new, more generic type,
`protover::ProtoEntry`, which utilises the new, more memory-efficient datatype
in protover::protoset.
* REMOVE `get_supported_protocols()` and `SupportedProtocols::tor_supported()`
(since they were never used separately) and collapse their functionality into
a single `ProtoEntry::supported()` method.
* REMOVE `SupportedProtocols::from_proto_entries()` and reimplement its
functionality as the more rusty `impl FromStr for ProtoEntry`.
* REMOVE `get_proto_and_vers()` function and refactor it into the more rusty
`impl FromStr for ProtoEntry`.
* FIXES part of #24031: https://bugs.torproject.org/24031
* ADD new type, protover::UnknownProtocol, so that we have greater type safety
and our protover functionality which works with unsanitised protocol names is
more clearly demarcated.
* REFACTOR protover::Proto, renaming it protover::Protocol to mirror the new
protover::UnknownProtocol type name.
* ADD a utility conversion of `impl From<Protocol> for UnknownProtocol` so that
we can easily with known protocols and unknown protocols simultaneously
(e.g. doing comparisons, checking their version numbers), while not allowing
UnknownProtocols to be accidentally used in functions which should only take
Protocols.
* FIXES part of #24031: https://bugs.torproject.org/24031
* ADD new protover::protoset module.
* ADD new protover::protoset::ProtoSet class for holding protover versions.
* REMOVE protover::Versions type implementation and its method
`from_version_string()`, and instead implement this behaviour in a more
rust-like manner as `impl FromStr for ProtoSet`.
* MOVE the `find_range()` utility function from protover::protover to
protover::protoset since it's only used internally in the
implementation of ProtoSet.
* REMOVE the `contract_protocol_list()` function from protover::protover and
instead refactor it (reusing nearly the entire thing, with minor superficial,
i.e. non-behavioural, changes) into a more rusty
`impl ToString for ProtoSet`.
* REMOVE the `expand_version_range()` function from protover::protover and
instead refactor it into a more rusty implementation of
`impl Into<Vec<Version>> for ProtoSet` using the new error types in
protover::errors.
* FIXES part of #24031: https://bugs.torproject.org/24031.
This will allow us to do actual error handling intra-crate in a more
rusty manner, e.g. propogating errors in match statements, conversion
between error types, logging messages, etc.
* FIXES part of #24031: https://bugs.torproject.org/24031.
Previously, the limit for MAX_PROTOCOLS_TO_EXPAND was actually being applied
in Rust to the maximum number of version (total, for all subprotocols).
Whereas in C, it was being applied to the number of subprotocols that were
allowed. This changes the Rust to match C's behaviour.
The behaviours still do not match, unsurprisingly, but now we know where a
primary difference is: the Rust is validating version ranges more than the C,
so in the C it's possible to call protover_all_supported on a ridiculous
version range like "Sleen=0-4294967294" because the C uses
MAX_PROTOCOLS_TO_EXPAND to count the number of *subprotocols* whereas the Rust
uses it to count the total number of *versions* of all subprotocols.
There's now no difference in these tests w.r.t. the C or Rust: both
fail miserably (well, Rust fails with nice descriptive errors, and C
gives you a traceback, because, well, C).
The DoS potential is slightly higher in C now due to some differences to the
Rust code, see the C_RUST_DIFFERS tags in src/rust/protover/tests/protover.rs.
Also, the comment about "failing at the splitting stage" in Rust wasn't true,
since when we split, we ignore empty chunks (e.g. "1--1" parses into
"(1,None),(None,1)" and "None" can't be parsed into an integer).
Finally, the comment about "Rust seems to experience an internal error" is only
true in debug mode, where u32s are bounds-checked at runtime. In release mode,
code expressing the equivalent of this test will error with
`Err(ProtoverError::Unparseable)` because 4294967295 is too large.
Previously, if "Link=1-5" was supported, and you asked protover_all_supported()
(or protover::all_supported() in Rust) if it supported "Link=3-999", the C
version would return "Link=3-999" and the Rust would return "Link=6-999". These
both behave the same now, i.e. both return "Link=6-999".
During code review and discussion with Chelsea Komlo, she pointed out
that protover::compute_for_old_tor() was a public function whose
return type was `&'static CStr`. We both agree that C-like parts of
APIs should:
1. not be exposed publicly (to other Rust crates),
2. only be called in the appropriate FFI code,
3. not expose types which are meant for FFI code (e.g. `*mut char`,
`CString`, `*const c_int`, etc.) to the pure-Rust code of other
crates.
4. FFI code (e.g. things in `ffi.rs` modules) should _never_ be called
from pure-Rust, not even from other modules in its own crate
(i.e. do not call `protover::ffi::*` from anywhere in
`protover::protoset::*`, etc).
With that in mind, this commit makes the following changes:
* CHANGE `protover::compute_for_old_tor()` to be
visible only at the `pub(crate)` level.
* RENAME `protover::compute_for_old_tor()` to
`protover::compute_for_old_tor_cstr()` to reflect the last change.
* ADD a new `protover::compute_for_old_tor()` function wrapper which
is public and intended for other Rust code to use, which returns a
`&str`.
It was changed to take borrows instead of taking ownership.
* REFACTOR `protover::ffi::protover_is_supported_here()` to use changed method
signature on `protover::is_supported_here()`.
This includes a subtle difference in behaviour to the previous Rust
implementation, where, for each vote that we're computing over, if a single one
fails to parse, we skip it. This now matches the current behaviour in the C
implementation.
* REFACTOR `protover::ffi::protover_compute_vote()` to use
new types and methods.
This includes a subtle difference in behaviour, as in 4258f1e18, where we return
(matching the C impl's return behaviour) earlier than before if parsing failed,
saving us computation in parsing the versions into a
protover::protoset::ProtoSet.
* REFACTOR `protover::ffi::protover_list_supports_protocol_or_later()` to use
new types and methods.
This includes a subtle difference in behaviour, as in 4258f1e18, where we return
(matching the C impl's return behaviour) earlier than before if parsing failed,
saving us computation in parsing the versions into a
protover::protoset::ProtoSet.
* REFACTOR `protover::ffi::protover_list_supports_protocol()` to use new types
and methods.
This includes differences in behaviour to before, which should now more closely
match the C version:
- If parsing a protover `char*` from C, and the string is not parseable, this
function will return 1 early, which matches the C behaviour when protocols
are unparseable. Previously, we would parse it and its version numbers
simultaneously, i.e. there was no fail early option, causing us to spend more
time unnecessarily parsing versions.
* REFACTOR `protover::ffi::protover_all_supported()` to use new types and
methods.
Previously, the rust implementation of protover considered an empty string to be
a valid ProtoEntry, while the C version did not (it must have a "=" character).
Other differences include that unknown protocols must now be parsed as
`protover::UnknownProtocol`s, and hence their entries as
`protover::UnvalidatedProtoEntry`s, whereas before (nearly) all protoentries
could be parsed regardless of how erroneous they might be considered by the C
version.
My apologies for this somewhat messy and difficult to read commit, if any part
is frustrating to the reviewer, please feel free to ask me to split this into
smaller changes (possibly hard to do, since so much changed), or ask me to
comment on a specific line/change and clarify how/when the behaviours differ.
The tests here should more closely match the behaviours exhibited by the C
implementation, but I do not yet personally guarantee they match precisely.
* REFACTOR unittests in protover::protover.
* ADD new integration tests for previously untested behaviour.
* FIXES part of #24031: https://bugs.torproject.org/24031.
This changes `protover::is_supported_here()` to be aware of new datatypes
(e.g. don't call `.0` on things which are no longer tuple structs) and also
changes the method signature to take borrows, making it faster, threadable, and
easier to read (i.e. the caller can know from reading the function signature
that the function won't mutate values passed into it).
* CHANGE the `protover::is_supported_here()` function to take borrows.
* REFACTOR the `protover::is_supported_here()` function to be aware of new
datatypes.
* FIXES part of #24031: https://bugs.torproject.org/24031
This adds a new type for votes upon `protover::ProtoEntry`s (technically, on
`protover::UnvalidatedProtoEntry`s, because the C code does not validate based
upon currently known protocols when voting, in order to maintain
future-compatibility), and converts several functions which would have operated
on this datatype into methods for ease-of-use and readability.
This also fixes a behavioural differentce to the C version of
protover_compute_vote(). The C version of protover_compute_vote() calls
expand_protocol_list() which checks if there would be too many subprotocols *or*
expanded individual version numbers, i.e. more than MAX_PROTOCOLS_TO_EXPAND, and
does this *per vote* (but only in compute_vote(), everywhere else in the C seems
to only care about the number of subprotocols, not the number of individual
versions). We need to match its behaviour in Rust and ensure we're not allowing
more than it would to get the votes to match.
* ADD new `protover::ProtoverVote` datatype.
* REMOVE the `protover::compute_vote()` function and refactor it into an
equivalent-in-behaviour albeit more memory-efficient voting algorithm based
on the new underlying `protover::protoset::ProtoSet` datatype, as
`ProtoverVote::compute()`.
* REMOVE the `protover::write_vote_to_string()` function, since this
functionality is now generated by the impl_to_string_for_proto_entry!() macro
for both `ProtoEntry` and `UnvalidatedProtoEntry` (the latter of which is the
correct type to return from a voting protocol instance, since the entity
voting may not know of all protocols being voted upon or known about by other
voting parties).
* FIXES part of #24031: https://bugs.torproject.org/24031
rust: Fix a difference in compute_vote() behaviour to C version.
This implements conversions from either a ProtoEntry or an UnvalidatedProtoEntry
into a String, for use in replacing such functions as
`protover::write_vote_to_string()`.
* ADD macro for implementing ToString trait for ProtoEntry and
UnvalidatedProtoEntry.
* FIXES part of #24031: https://bugs.torproject.org/24031
This adds a new protover::UnvalidatedProtoEntry type, which is the
UnknownProtocol variant of a ProtoEntry, and refactors several functions which
should operate on this type into methods.
This also fixes what was previously another difference to the C implementation:
if you asked the C version of protovet_compute_vote() to compute a single vote
containing "Fribble=", it would return NULL. However, the Rust version would
return "Fribble=" since it didn't check if the versions were empty before
constructing the string of differences. ("Fribble=" is technically a valid
protover string.) This is now fixed, and the Rust version in that case will,
analogous to (although safer than) C returning a NULL, return None.
* REMOVE internal `contains_only_supported_protocols()` function.
* REMOVE `all_supported()` function and refactor it into
`UnvalidatedProtoEntry::all_supported()`.
* REMOVE `parse_protocols_from_string_with_no_validation()` and
refactor it into the more rusty implementation of
`impl FromStr for UnvalidatedProtoEntry`.
* REMOVE `protover_string_supports_protocol()` and refactor it into
`UnvalidatedProtoEntry::supports_protocol()`.
* REMOVE `protover_string_supports_protocol_or_later()` and refactor
it into `UnvalidatedProtoEntry::supports_protocol_or_later()`.
* FIXES part of #24031: https://bugs.torproject.org/24031
rust: Fix another C/Rust different in compute_vote().
This fixes the unittest from the prior commit by checking if the versions are
empty before adding a protocol to a vote.
This replaces the `protover::SupportedProtocols` (why would you have a type just
for things which are supported?) with a new, more generic type,
`protover::ProtoEntry`, which utilises the new, more memory-efficient datatype
in protover::protoset.
* REMOVE `get_supported_protocols()` and `SupportedProtocols::tor_supported()`
(since they were never used separately) and collapse their functionality into
a single `ProtoEntry::supported()` method.
* REMOVE `SupportedProtocols::from_proto_entries()` and reimplement its
functionality as the more rusty `impl FromStr for ProtoEntry`.
* REMOVE `get_proto_and_vers()` function and refactor it into the more rusty
`impl FromStr for ProtoEntry`.
* FIXES part of #24031: https://bugs.torproject.org/24031
* ADD new type, protover::UnknownProtocol, so that we have greater type safety
and our protover functionality which works with unsanitised protocol names is
more clearly demarcated.
* REFACTOR protover::Proto, renaming it protover::Protocol to mirror the new
protover::UnknownProtocol type name.
* ADD a utility conversion of `impl From<Protocol> for UnknownProtocol` so that
we can easily with known protocols and unknown protocols simultaneously
(e.g. doing comparisons, checking their version numbers), while not allowing
UnknownProtocols to be accidentally used in functions which should only take
Protocols.
* FIXES part of #24031: https://bugs.torproject.org/24031
* ADD new protover::protoset module.
* ADD new protover::protoset::ProtoSet class for holding protover versions.
* REMOVE protover::Versions type implementation and its method
`from_version_string()`, and instead implement this behaviour in a more
rust-like manner as `impl FromStr for ProtoSet`.
* MOVE the `find_range()` utility function from protover::protover to
protover::protoset since it's only used internally in the
implementation of ProtoSet.
* REMOVE the `contract_protocol_list()` function from protover::protover and
instead refactor it (reusing nearly the entire thing, with minor superficial,
i.e. non-behavioural, changes) into a more rusty
`impl ToString for ProtoSet`.
* REMOVE the `expand_version_range()` function from protover::protover and
instead refactor it into a more rusty implementation of
`impl Into<Vec<Version>> for ProtoSet` using the new error types in
protover::errors.
* FIXES part of #24031: https://bugs.torproject.org/24031.
This will allow us to do actual error handling intra-crate in a more
rusty manner, e.g. propogating errors in match statements, conversion
between error types, logging messages, etc.
* FIXES part of #24031: https://bugs.torproject.org/24031.
When a relay is collecting internal statistics about how many
create cell requests it has seen of each type, accurately count the
requests from relays that temporarily fall out of the consensus.
(To be extra conservative, we were already ignoring requests from clients
in our counts, and we continue ignoring them here.)
Fixes bug 24910; bugfix on 0.2.4.17-rc.
Directory authorities no longer vote in favor of the Guard flag
for relays that don't advertise directory support.
Starting in Tor 0.3.0.1-alpha, Tor clients have been avoiding using
such relays in the Guard position, leading to increasingly broken load
balancing for the 5%-or-so of Guards that don't advertise directory
support.
Fixes bug 22310; bugfix on 0.3.0.6.
Add a missing lock acquisition around access to queued_control_events
in control_free_all(). Use the reassign-and-unlock strategy as in
queued_events_flush_all(). Fixes bug 25675. Coverity found this bug,
but only after we recently added an access to
flush_queued_event_pending.
The earlier checks in this function should ensure that components is
always nonempty. But in case somebody messes with them in the
future, let's add an extra check to make sure we aren't crashing.
Coverity found a null pointer reference in nodelist_add_microdesc().
This is almost certainly impossible assuming that the routerstatus_t
returned by router_get_consensus_status_by_descriptor_digest() always
corresponds to an entry in the nodelist. Fixes bug 25629.
Coverity found a null pointer reference in nodelist_add_microdesc().
This is almost certainly impossible assuming that the routerstatus_t
returned by router_get_consensus_status_by_descriptor_digest() always
corresponds to an entry in the nodelist. Fixes bug 25629.
Coverity found a null pointer reference in nodelist_add_microdesc().
This is almost certainly impossible assuming that the routerstatus_t
returned by router_get_consensus_status_by_descriptor_digest() always
corresponds to an entry in the nodelist. Fixes bug 25629.
If we failed to connect at the TCP level to a relay, note it down and refuse
to connect again for another 60 seconds.
Fixes#24767
Signed-off-by: David Goulet <dgoulet@torproject.org>
This update is needed to make it consistent with the behavior of
node_awaiting_ipv6(), which doesn't believe in the addresses from
routerinfos unless it actually plans to use those routerinfos.
Fixes bug 25213; bugfix on b66b62fb75 in 0.3.3.1-alpha,
which tightened up the definition of node_awaiting_ipv6().
There was a nonfatal assertion in pathbias_should_count that would
trigger if onehop_tunnel was set, but the desired_path_length was
greater than 1. This patch fixes that. Fixes bug 24903; bugfix on
0.2.5.2-alpha.
These tests handle incoming and outgoing cells on a three-hop
circuit, and make sure that the crypto works end-to-end. They don't
yet test spec conformance, leaky-pipe, or various error cases.
Make sure we actually only report client channel to the geoip cache instead of
looking if it is a known relay. Looking if it is a known relay can be
unreliable because they come and go from the consensus.
Fixes#24904
Signed-off-by: David Goulet <dgoulet@torproject.org>
Because of #25306 for which we are unable to reproduce nor understand how it
is possible, this commit removes the asserts() and BUG() on the missing
descriptors instead when rotating them.
This allows us to log more data on error but also to let tor recover
gracefully instead of dying.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This patch adds some additional logging to circuits_handle_oom() to give
us more information about which specific compression backend that is
using a certain amount of memory.
See: https://bugs.torproject.org/25372
These declarations need to exist unconditionally, but they were
trapped inside an "#else /* !(defined(HAVE_SYSLOG_H)) */" block.
Fixes a travis regression caused by 23881; bug not in any released tor.
Additionally, this change extracts the functions that created and
freed these elements.
These structures had common "forward&reverse stream&digest"
elements, but they were initialized and freed through cpath objects,
and different parts of the code depended on them. Now all that code
is extacted, and kept in relay_crypto.c
This should help us improve modularity, and should also make it
easier for people to experiment with other relay crypto strategies
down the road.
This commit is pure function movement.
This function is used upon receiving a cell, and only handles the
decrypting part. The encryption part is currently handled inside
circuit_package_relay_cell.
This should avoid most intermittent test failures on developer and CI machines,
but there could (and probably should) be a more elegant solution.
Also, this test was testing that the IP was created and its expiration time was
set to a time greater than or equal to `now+INTRO_POINT_LIFETIME_MIN_SECONDS+5`:
/* Time to expire MUST also be in that range. We add 5 seconds because
* there could be a gap between setting now and the time taken in
* service_intro_point_new. On ARM, it can be surprisingly slow... */
tt_u64_op(ip->time_to_expire, OP_GE,
now + INTRO_POINT_LIFETIME_MIN_SECONDS + 5);
However, this appears to be a typo, since, according to the comment above it,
adding five seconds was done because the IP creation can be slow on some
systems. But the five seconds is added to the *minimum* time we're comparing
against, and so it actually functions to make this test *more* likely to fail on
slower systems. (It should either subtract five seconds, or instead add it to
time_to_expire.)
* FIXES#25450: https://bugs.torproject.org/25450
These were meant to demonstrate old behavior, or old rust behavior.
One of them _should_ work in Rust, but won't because of
implementation details. We'll fix that up later.
The C code and the rust code had different separate integer overflow
bugs here. That suggests that we're better off just forbidding this
pathological case.
Also, add tests for expected behavior on receiving a bad protocol
list in a consensus.
Fixes another part of 25249.
I've refactored these to be a separate function, to avoid tricky
merge conflicts.
Some of these are disabled with "XXXX" comments; they should get
fixed moving forward.
This one can only be exploited if you can generate a correctly
signed consensus, so it's not as bad as 25074.
Fixes bug 25251; also tracked as TROVE-2018-004.
In some cases we had checked for it, but in others we had not. One
of these cases could have been used to remotely cause
denial-of-service against directory authorities while they attempted
to vote.
Fixes TROVE-2018-001.
* ADD includes for "torint.h" and "container.h" to crypto_digest.h.
* ADD includes for "crypto_digest.h" to a couple places in which
crypto_digest_t was then missing.
* FIXES part of #24658: https://bugs.torproject.org/24658#comment:30
Folks have found two in the past week or so; we may as well fix the
others.
Found with:
\#!/usr/bin/python3
import re
def findMulti(fname):
includes = set()
with open(fname) as f:
for line in f:
m = re.match(r'^\s*#\s*include\s+["<](\S+)[>"]', line)
if m:
inc = m.group(1)
if inc in includes:
print("{}: {}".format(fname, inc))
includes.add(m.group(1))
import sys
for fname in sys.argv[1:]:
findMulti(fname)
We moved the crypto_pk_obselete_* functions into crypto_rsa.[ch] because they fit
better with the RSA module.
Follows #24658.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
since all it does is produce false positives
this commit should get merged into 0.2.9 and 0.3.0 *and* 0.3.1, even
though the code in the previous commit is already present in 0.3.1. sorry
for the mess.
[Cherry-picked]
since all it does is produce false positives
this commit should get merged into 0.2.9 and 0.3.0 *and* 0.3.1, even
though the code in the previous commit is already present in 0.3.1. sorry
for the mess.
This commit takes a piece of commit af8cadf3a9 and a piece of commit
46fe353f25, with the goal of making channel_is_client() be based on what
sort of connection handshake the other side used, rather than seeing
whether the other side ever sent a create_fast cell to us.
We moved the crypto_pk_* digest functions into crypto_rsa.[ch] because they fit
better with the RSA module.
Follows #24658.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Also correct MAX_VERSIONS_TO_EXPAND to match the C.
NOTE that this patch leads to incorrect behavior: the C code allows
huge ranges; it just doesn't allow votes on them (currently). For
full compatibility, we'll need to make the rust code store ranges as
ranges natively, possibly using something like the range_map crate.
Still, this patch is smaller than a "proper" fix.
Fixes TROVE-2018-003.
Since 0.2.4, tor uses EWMA circuit policy to prioritize. The previous
algorithm, round-robin, hasn't been used since then but was still used as a
fallback.
Now that EWMA is mandatory, remove that code entirely and enforce a cmux
policy to be set.
This is part of a circuitmux cleanup to improve performance and reduce
complexity in the code. We'll be able to address future optimization with this
work.
Closes#25268
Signed-off-by: David Goulet <dgoulet@torproject.org>
The reason to do so is because these functions haven't been used in years so
since 0.2.4, every callsite is NOP.
In future commits, we'll remove the round robin circuit policy which is mostly
validated within those function.
This simplifies the code greatly and remove dead code for which we never had a
configure option in the first place nor an easy way to use them in production.
Part of #25268
Signed-off-by: David Goulet <dgoulet@torproject.org>
It is rename to something more meaningful that explains what it does exactly
which is sets the EWMA options (currently only one exists). The new name is
cmux_ewma_set_options().
Also, remove a public function from circuitmux_ewma.h that is only used in the
C file. Make it static inline as well.
Signed-off-by: David Goulet <dgoulet@torproject.org>
To achieve this, a default value for the CircuitPriorityHalflife option was
needed. We still look in the options and then the consensus but in case no
value can be found, the default CircuitPriorityHalflifeMsec=30000 is used. It
it the value we've been using since 0.2.4.4-alpha.
This means that EWMA, our only policy, can not be disabled anymore fallbacking
to the round robin algorithm. Unneeded code to control that is removed in this
commit.
Part of #25268
Signed-off-by: David Goulet <dgoulet@torproject.org>
We had this safeguard around dos_init() but not when the consensus changes
which can modify consensus parameters and possibly enable the DoS mitigation
even if tor wasn't a public relay.
Fixes#25223
Signed-off-by: David Goulet <dgoulet@torproject.org>
Explicitly inform the operator of the rejected relay to set a valid email
address in the ContactInfo field and contact bad-relays@ mailing list.
Fixes#25170
Signed-off-by: David Goulet <dgoulet@torproject.org>
We don't expect this to come up very much, but we may as well make
sure that the value isn't predictable (as we do for the other
addresses) in case the issue ever comes up.
Spotted by teor.
This patch lowers the log-level from warning to info in the cases where
we are going to attempt another method as entropy source to hopefully
make the user feel less concerned.
See: https://bugs.torproject.org/25120
* ADD a new macro, tor_util::string::cstr!() which takes Rust strings,
concatenates them together, appends a NUL byte, and converts it into a
std::ffi::CStr for handing to C.
This is to avoid positively identifying Exit relays if tor client connection
comes from them that is reentering the network.
One thing to note is that this is done only in the DoS subsystem but we'll
still add it to the geoip cache as a "client" seen. This is done that way so
to avoid as much as possible changing the current behavior of the geoip client
cache since this is being backported.
Closes#25193
Signed-off-by: David Goulet <dgoulet@torproject.org>
Rationale: this helps for performance only, but we don't actually
have any reason to think that the checks here are
performance-critical. Let's not normalize the use of unsafe {}.
Explicitly inform the operator of the rejected relay to set a valid email
address in the ContactInfo field and contact bad-relays@ mailing list.
Fixes#25170
Signed-off-by: David Goulet <dgoulet@torproject.org>
* FIXES#25127: https://bugs.torproject.org/25127
* ADDS a new module to the Rust tor_util crate for small utilities
for working with static strings between languages.
* CHANGES the return type of protover_compute_for_old_tor to point to
immutable data.
* CHANGES the code from the previous commit to use the new static
string utilities.
At this commit, the SocksSocketsGroupWritable option is renamed to
UnixSocksGroupWritable. A deprecated warning is triggered if the old option is
used and tor will use it properly.
Fixes#24343
Signed-off-by: David Goulet <dgoulet@torproject.org>
On slow system, 1 msec between one read and the other was too tight. For
instance, it failed on armel with a 4msec gap:
https://buildd.debian.org/status/package.php?p=tor&suite=experimental
Increase to 10 msec for now to address slow system. It is important that we
keep this OP_LE test in so we make sure the msec/usec/nsec read aren't
desynchronized by huge gaps. We'll adjust again if we ever encounter a system
that goes slower than 10 msec between calls.
Fixes#25113
Signed-off-by: David Goulet <dgoulet@torproject.org>
Remove a series of connection counters that were only used when dumping the
rephist statistics with SIGUSR1 signal.
This reduces the or_history_t structure size.
Closes#25163
Signed-off-by: David Goulet <dgoulet@torproject.org>
This removes the code that tracks the extend attemps a client makes. We don't
use it and it was only used to provide statistics on a SIGUSR1 from the
rephist dump stats function.
Part of #25163
Signed-off-by: David Goulet <dgoulet@torproject.org>
Since we're making it so that unstable zstd apis can be disabled,
we need to test them. I do this by adding a variant setup/cleanup
function for the tests, and teaching it about a fake compression
method called "x-zstd:nostatic".
Services can keep rendezvous circuits for a while so don't log them if tor is
a single onion service.
Fixes#25116
Signed-off-by: David Goulet <dgoulet@torproject.org>
The HT_FOREACH() is insanely heavy on the CPU and this is part of the fast
path so make it return the nice memory size counter we added in
4d812e29b9.
Fixes#25148
Signed-off-by: David Goulet <dgoulet@torproject.org>
Included crypto_digest.[ch] into include.am in order to solve a compiling
issue. Also EOF line in crypto_digest.c added.
Follows #24658.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Included crypto_digest.h in some files in order to solve xof+digest module
dependency issues. Removed crypto.h where it isn't needed anymore.
Follows #24658.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Add two new files (crypto_digest.c, crypto_digest.h) as new module of
crypto.[ch]. This new module includes all functions and dependencies related
to digest and xof operations. Those have been removed from crypto.[ch].
Follows #24658.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Becasue the circuit creation burst and rate can change at runtime it is
possible that between two refill of a bucket, we end up setting the bucket
value to less than there currently is.
Fixes#25128
Signed-off-by: David Goulet <dgoulet@torproject.org>
If the cache is using 20% of our maximum allowed memory, clean 10% of it. Same
behavior as the HS descriptor cache.
Closes#25122
Signed-off-by: David Goulet <dgoulet@torproject.org>
The current code flow makes it that we can release a channel in a PENDING
state but not in the pending list. This happens while the channel is being
processed in the scheduler loop.
Fixes#25125
Signed-off-by: David Goulet <dgoulet@torproject.org>
This tests many cases of the KIST scheduler with the pending list state by
calling entry point in the scheduler while channels are scheduled or not.
Also, it adds a test for the bug #24700.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This patch makes the wording around error cases for selecting an entropy
source in Tor slightly more verbose. We also let the user know when
something goes wrong that we are trying out a fallback method instead.
See: https://bugs.torproject.org/25120
Included crypto_rsa.[ch] into include.am in order to resolve a compiling issue.
Follows #24658.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
crypto_get_rsa_padding_overhead() and crypto_get_rsa_padding() are
not static inline anymore in order to split the crypto_rsa module
from crypto.[ch].
Also included necessary modules in order to solve dependency issues.
Also made two functions in crypto.c use crypto_pk_asn1_encdoe()
instead of reaching into the crypto_pk_t struct.
This reverts commit 9a06282546.
It appears that I misunderstood how the seccomp2 filter rules
interact. It appears that `SCMP_ACT_ERRNO()` always takes
precedence over `SCMP_ACT_ALLOW()` -- I had thought instead that
earlier rules would override later ones. But this change caused bug
25115 (not in any released Tor).
The accurate address of a connection is real_addr, not the addr member.
channel_tls_get_remote_addr_method() now returns real_addr instead.
Fixes#24952; bugfix on 707c1e2 in 0.2.4.11-alpha.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Tor preemptiely builds circuits and they can be cannibalized later in their
lifetime. A Guard node can become unusable (from our guard state) but we can
still have circuits using that node opened. It is important to not pick those
circuits for any usage through the cannibalization process.
Fixes#24469
Signed-off-by: David Goulet <dgoulet@torproject.org>
These are no longer meaningful, since there's no longer an upper
limit to how many times (in the exponential-backoff world) one can
retry a download. download_status_is_ready() didn't check these any
more, and neither do we.
In 0.3.2.1-alpha, we've added notify_networkstatus_changed() in order to have
a way to notify other subsystems that the consensus just changed. The old and
new consensus are passed to it.
Before this patch, this was done _before_ the new consensus was set globally
(thus NOT accessible by getting the latest consensus). The scheduler
notification was assuming that it was set and select_scheduler() is looking at
the latest consensus to get the parameters it might needs. This was very wrong
because at that point it is still the old consensus set globally.
This commit changes the notify_networkstatus_changed() to be the "before"
function and adds an "after" notification from which the scheduler subsystem
is notified.
Fixes#24975
When we stopped looking at the "protocols" variable directly, we
broke the hs_service/build_update_descriptors test, since it didn't
actually update any of the flags.
The fix here is to call summarize_protover_flags() from that test,
and to expose summarize_protover_flags() as "STATIC" from
routerparse.c.
This is the quick fix that is keeping the channel in PENDING state so if we
ever try to reschedule the same channel, it won't happened.
Fixes#24700
Signed-off-by: David Goulet <dgoulet@torproject.org>
It is possible in normal circumstances that a client fetches a descriptor
that has a lower revision counter than the one in its cache. This can happen
due to HSDir desync.
Fixes#24976
Signed-off-by: David Goulet <dgoulet@torproject.org>
Setting the default for this at 10 and the learning timeout to 3 minutes means
we will complete our cbt learning in 30 minutes, which is under the reduced
padding connection timeout window.
In 0.3.2.1-alpha, we've added this function in order to have a way to notify
other subsystems that the consensus just changed. The old consensus and the
new one are passed to it.
Before this patch, this was done _before_ the new consensus was set globally
(thus NOT accessible by getting the latest consensus). The scheduler
notification was assuming that it was set and select_scheduler() is looking at
the latest consensus to get the parameters it might needs. This was very wrong
because at that point it is still the old consensus set globally.
With this commit, notify_networkstatus_changed() has been moved _after_ the
new consensus is set globally. The main obvious reasons is to fix the bug
described above and in #24975. The other reason is that this notify function
doesn't return anything which could be allowing the possibility of refusing to
set the new consensus on error. In other words, the new consensus is set right
after the notification whatever happens.
It does no harm or change in behavior to set the new consensus first and then
notify the subsystems. The two functions currently used are for the control
port using the old and new consensus and sending the diff. The second is the
scheduler that needs the new consensus to be set globally before being called.
Of course, the function has been documented accordinly to clearly state it is
done _after_ the new consensus is set.
Fixes#24975
Signed-off-by: David Goulet <dgoulet@torproject.org>
Stop adding unneeded channel padding right after we finish flushing
to a connection that has been trying to flush for many seconds.
Instead, treat all partial or complete flushes as activity on the
channel, which will defer the time until we need to add padding.
This fix should resolve confusing and scary log messages like
"Channel padding timeout scheduled 221453ms in the past."
Fixes bug 22212; bugfix on 0.3.1.1-alpha.
I think technically we could resolve bug 22212 by adding a call to
channel_timestamp_active() only in the finished_flushing case. But I added
a call in the flushed_some case too since that seems to more accurately
reflect the notion of "active".
Using get_uptime() and reset_uptime() instead of
accessing stats_n_seconds_working directly.
stats_n_seconds_working is not extern anymore.
Ticket #25081
Because this touches too many commits at once, it is made into one single
commit.
Remove the use of "tenths" for the circuit rate to simplify things. We can
only refill the buckets at best once every second because of the use of
approx_time() and our token system is set to be 1 token = 1 circuit so make
the rate a flat integer of circuit per second.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Imagine this scenario. We had 10 connections over the 24h lifetime of a geoip
cache entry. The lifetime of the entry has been reached so it is about to get
freed but 2 connections remain for it. After the free, a third connection
comes in thus making us create a new geoip entry for that address matching the
2 previous ones that are still alive. If they end up being closed, we'll have
a concurrent count desynch from what the reality is.
To mitigate this probably very rare scenario in practice, when we free a geoip
entry and it has a concurrent count above 0, we'll go over all connections
matching the address and clear out the tracked flag. So once they are closed,
we don't try to decrement the count.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This option refuses any ESTABLISH_RENDEZVOUS cell arriving from a client
connection. Its default value is "auto" for which we can turn it on or off
with a consensus parameter. Default value is 0.
Signed-off-by: David Goulet <dgoulet@torproject.org>
If the client address was detected as malicious, apply a defense which is at
this commit to return a DESTROY cell.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Add a function that notifies the DoS subsystem that a new CREATE cell has
arrived. The statistics are updated accordingly and the IP address can also be
marked as malicious if it is above threshold.
At this commit, no defense is applied, just detection with a circuit creation
token bucket system.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Implement a basic connection tracking that counts the number of concurrent
connections when they open and close.
This commit also adds the circuit creation mitigation data structure that will
be needed at later commit to keep track of the circuit rate.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This commit introduces the src/or/dos.{c|h} files that contains the code for
the Denial of Service mitigation subsystem. It currently contains basic
functions to initialize and free the subsystem. They are used at this commit.
The torrc options and consensus parameters are defined at this commit and
getters are implemented.
Signed-off-by: David Goulet <dgoulet@torproject.org>
And fix the unsupported protover example so it uses a Link protover much
higher than 5.
Part of #25070, bugfix on 0.3.3.1-alpha, which introduced the protover crate.
I'm leaving the getsockname code in transproxy alone, since it is
comparatively isolated, rather platform-specific, and hard to test.
Implements 18105.
Add two new files (crypto_rsa.c, crypto_rsa.h) as new module of crypto.[ch].
This new module includes all functions and dependencies related to RSA
operations. Those have been removed from crypto.[ch].
All new changes related to RSA operations must be done in these files.
Follows #24658
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
The upcoming DoS mitigation subsytem needs to keep information on a per-IP
basis which is also what the geoip clientmap does.
For another subsystem to access that clientmap, this commit adds a lookup
function that returns the entry. For this, the clientmap_entry_t had to be
moved to the header file.
Signed-off-by: David Goulet <dgoulet@torproject.org>
We'd been using crypto_digest_dup() and crypto_digest_assign() here,
but they aren't necessary. Instead we can just use the stack to
store the previous state of the SHA_CTX and avoid a malloc/free pair.
Closes ticket 24914.
In order to make the OR and dir checking functions in router.c less confusing
we renamed some functions and splitted consider_testing_reachability() into
router_should_check_reachability() and router_do_reachability_checks(). Also we
improved the documentation.
Fixes#18918.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Previously, we wouldn't do this when running with a routerinfo_t in
some cases, leading to many needless calls to the protover module.
This change also cleans up the code in nodelist.c a bit.
Fixes bug 25008; bugfix on 0.2.9.4-alpha.
Without this patch, not only will TLS1.3 not work with Tor, but
OpenSSL 1.1.1 with TLS1.3 enabled won't build any connections at
all: It requires that either TLS1.3 be disabled, or some TLS1.3
ciphersuites be listed.
Closes ticket 24978.
As we're trying not to have all the other modules in Tor, we moved the openssl
namespace includes back into crypto.c and crypto_openssl_mgt.c files.
Follows #24658.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Included crypto_openssl_mgt.[ch] into the appropiate files in order to resolve
compiling and dependencies issues.
Follows #24658.
Signed-off-by: Fernando Fernandez Mancera <ffernandezmancera@gmail.com>
Renamed free_openssl() to crypto_openssl_free_all(). Also we made variables and
functions static again.
Follows #24658.
Signed-off-by: Fernando Fernandez Mancera <ffernandezmancera@gmail.com>
For 23847, we want Tor to be able to shut down and then restart in
the same process. Here's a patch to make the Tor binary do that.
To test it, you need to build with --enable-restart-debugging, and
then you need to set the environment variable TOR_DEBUG_RESTART.
With this option, Tor will then run for 5 seconds, then restart
itself in-process without exiting. This only happens once.
You can change the 5-second interval using
TOR_DEBUG_RESTART_AFTER_SECONDS.
Implements ticket 24583.
Fix an "off by 2" error in counting rendezvous failures on the onion
service side.
While we thought we would stop the rendezvous attempt after one failed
circuit, we were actually making three circuit attempts before giving up.
Fixes bug 24895; bugfix on 0.0.6.
Rename crypto_openssl.[ch] to crypto_openssl_mgt.[ch] because it is possible we
need crypto_openssl.[ch] in the future.
Follows #24658.
Signed-off-by: Fernando Fernandez Mancera <ffernandezmancera@gmail.com>
Since helper_create_introduce1_cell() checks "cell" for nullness,
scan-build is concerned that test_introduce1_validation()
dereferences it without checking it. So, add a check.
Not backporting, since this is spurious, _and_ tests-only.
Fix a set of false positives where relays would consider connections
to other relays as being client-only connections (and thus e.g.
deserving different link padding schemes) if those relays fell out
of the consensus briefly.
Now we look only at the initial handshake and whether the connection
authenticated as a relay.
Fixes bug 24898; bugfix on 0.3.1.1-alpha.
New-style (v3) onion services now obey the "max rendezvous circuit
attempts" logic.
Previously they would make as many rendezvous circuit attempts as they
could fit in the MAX_REND_TIMEOUT second window before giving up.
Fixes bug 24894; bugfix on 0.3.2.1-alpha.
Define TOR_PRIuSZ as minGW compiler doesn't support zu format specifier for
size_t type.
Fixes#24861 on ac9eebd.
Signed-off-by: Fernando Fernandez Mancera <ffernandezmancera@gmail.com>
... in get_interface_addresses_ioctl().
This pointer alignment issue exists on x86_64 macOS, but is unlikely to exist
elsewhere. (i386 macOS only requires 4-byte alignment, and other OSs have
8-byte ints.)
Fixes bug 24733; not in any released version of tor.
Add free_openssl() function to free the memory allocated for OpenSSL version
management variables. It is required since OpenSSL management has been isolated
from the crypto module.
Follows #24658.
Signed-off-by: Fernando Fernandez Mancera <ffernandezmancera@gmail.com>
Add two new files (crypto_openssl.c, crypto_openssl.h) as new module of
crypto.[ch]. This new module includes all functions and dependencies related
to OpenSSL management. Those have been removed from crypto.[ch].
All new changes related to OpenSSL management must be done in these files.
Follows #24658
Signed-off-by: Fernando Fernandez Mancera <ffernandezmancera@gmail.com>
If we tried to move a descriptor from routerlist->old_routers
back into the current routerlist, we were preparing a buffer with
format_iso_time() on ri->cert_expiration_time, and doing it preemptively
since router_add_to_routerlist() might free ri so we wouldn't be able
to get at it later in the function.
But if the descriptor we're moving doesn't have any ed signature, then
its cert will be marked to expire at TIME_MAX, and handing TIME_MAX
to format_iso_time() generates this log warning:
correct_tm(): Bug: gmtime(9223372036854775807) failed with error Value too large for defined data type: Rounding down to 2037
The fix is to preemptively remember the expiry time, but only prepare
the buffer if we know we're going to need it.
Bugfix on commit a1b0a0b9, which came about as part of a fix for bug
20020, and which is not yet in any released version of Tor (hence no
changes file).
Using this script:
sed -i.bak $'s|^,$|/* ===== */\\\n,|' src/or/fallback_dirs.inc
(Due to embedded newlines, this script only works in bash.)
And manually add a delimiter to the end of the header, and the start of
the fallback list.
This allows us to check that the code compiles, and the unit tests pass.
And it allows downstream users stem and atlas to adapt to the new format.
The upcoming fallback rebuild will automatically generate this new format.
Follow-up to 24725.
Using this script:
sed -i.bak $'s|^,$|/* extrainfo=0 */\\\n,|' src/or/fallback_dirs.inc
(Due to embedded newlines, this script only works in bash.)
This allows us to check that the code compiles, and the unit tests pass.
And it allows downstream users stem and atlas to adapt to the new format.
The upcoming fallback rebuild will automatically generate this new format,
with actual relay extrainfo cache flags.
Follow-up to 22759.
Using this script:
sed -i.bak $'s|^,$|/* nickname= */\\\n,|' src/or/fallback_dirs.inc
(Due to embedded newlines, this script only works in bash.)
This allows us to check that the code compiles, and the unit tests pass.
And it allows downstream users stem and atlas to adapt to the new format.
The upcoming fallback rebuild will automatically generate this new format,
with actual relay nicknames.
Follow-up to 24600.
Using this script:
sed -i.bak 's/" weight=10",/,/' src/or/fallback_dirs.inc
This allows us to check that the code compiles, and the unit tests pass.
And it allows downstream users stem and atlas to adapt to the new format.
The upcoming fallback rebuild will automatically generate this new format.
Follow-up to 24679.
The code had been using c_char and i8 interchangeably, but it turns
out that c_char is only i8 on platforms where "char" is signed. On
other platforms, c_char is u8.
Fixes bug 24794; bug not on any released Tor.
These are all about local variables shadowing global
functions. That isn't normally a problem, but at least one
compiler we care about seems to treat this as a case of -Wshadow
violation, so let's fix it.
Fixes bug 24634; bugfix on 0.3.2.1-alpha.
Tor now sets IPv6 preferences on rewrite_node_address_for_bridge() even if
there is only ri or rs. It always warns about them.
Also Tor now sets the IPv6 address in rs as well as it sets the one in ri.
Fixes#24572 on 9e9edf7 in 0.2.4.5-alpha.
Fixes#24573 on c213f27 in 0.2.8.2-alpha.
Signed-off-by: Fernando Fernandez Mancera <ffernandezmancera@gmail.com>
When the fascist_firewall_choose_address_ functions don't find a
reachable address, set the returned address to the null address and port.
This is a precautionary measure, because some callers do not check the
return value.
Fixes bug 24736; bugfix on 0.2.8.2-alpha.
This makes clients on the public tor network prefer to bootstrap off fallback
directory mirrors.
This is a follow-up to 24679, which removed weights from the default fallbacks.
Implements ticket 24681.
Using tt_assert in these helpers was implying to scan-build that our
'new' functions might be returning NULL, which in turn would make it
warn about null-pointer use.
We've been seeing problems with destroy cells queues taking up a
huge amount of RAM. We can mitigate this, since while a full packed
destroy cell takes 514 bytes, we only need 5 bytes to remember a
circuit ID and a reason.
Fixes bug 24666. Bugfix on 0.2.5.1-alpha, when destroy cell queues
were introduced.
With extra_space negative, it means that the "notsent" queue is quite large so
we must consider that value with the current computed tcp_space. If we end up
to have negative space, we should not add more data to the kernel since the
notsent queue is just too filled up.
Fixes#24665
Signed-off-by: David Goulet <dgoulet@torproject.org>
Instead of using INT_MAX as a write limit for KISTLite, use the lower layer
limit which is using the specialized num_cells_writeable() of the channel that
will down the line check the connection's outbuf and limit it to 32KB
(OR_CONN_HIGHWATER).
That way we don't take the chance of bloating the connection's outbuf and we
keep the cells in the circuit queue which our OOM handler can take care of,
not the outbuf.
Finally, this commit adds a log_debug() in the update socket information
function of KIST so we can get the socket information in debug.
Fixes#24671
Signed-off-by: David Goulet <dgoulet@torproject.org>
Exposing cell_queues_get_total_allocation(), buf_get_total_allocation(),
tor_compress_get_total_allocation(), tor_compress_get_total_allocation() when
hit MaxMemInQueues threshold.
Fixes#24501
Signed-off-by: Fernando Fernandez Mancera <ffernandezmancera@gmail.com>
This patch adds support for MainloopStats that allow developers to get
main event loop statistics via Tor's heartbeat status messages. The new
status log message will show how many succesful, erroneous, and idle
event loop iterations we have had.
See: https://bugs.torproject.org/24605
Adding tor_remove_file(filename) and refactoring tor_cleanup().
Removing CookieAuthFile and ExtORPortCookieAuthFile when tor_cleanup() is
called.
Fixes#23271.
Signed-off-by: Fernando Fernandez Mancera <ffernandezmancera@gmail.com>
Using absolute_msec requires a 64-bit division operation every time
we calculate it, which gets expensive on 32-bit architectures.
Instead, just use the lazy "monotime_coarse_get()" operation, and
don't convert to milliseconds until we absolutely must.
In this case, it seemed fine to use a full monotime_coarse_t rather
than a truncated "stamp" as we did to solve this problem for the
timerstamps in buf_t and packed_cell_t: There are vastly more cells
and buffer chunks than there are channels, and using 16 bytes per
channel in the worst case is not a big deal.
There are still more millisecond operations here than strictly
necessary; let's see any divisions show up in profiles.
Retry directory downloads when we get our first bridge descriptor
during bootstrap or while reconnecting to the network. Keep retrying
every time we get a bridge descriptor, until we have a reachable bridge.
Stop delaying bridge descriptor fetches when we have cached bridge
descriptors. Instead, only delay bridge descriptor fetches when we
have at least one reachable bridge.
Fixes bug 24367; bugfix on 0.2.0.3-alpha.
When entry_list_is_constrained() is true, guards_retry_optimistic()
always returns true.
When entry_list_is_constrained() is false,
options->UseBridges is always false,
therefore !options->UseBridges is always true,
therefore (!options->UseBridges || ...) is always true.
Cleanup after #24367.
Commit e80893e51b made tor call
hs_service_intro_circ_has_closed() when we mark for close a circuit.
When we cleanup intro points, we iterate over the descriptor's map of intro
points and we can possibly mark for close a circuit. This was problematic
because we would MAP_DEL_CURRENT() the intro point then free it and finally
mark for close the circuit which would lookup the intro point that we just
free in the map we are iterating over.
This can't be done and leads to a use-after-free because the intro point will
be returned successfully due to the fact that we are still in the loop
iterating. In other words, MAP_DEL_CURRENT() followed by a digest256map_get()
of the same object should never be done in the same loop.
Fixes#24595
Signed-off-by: David Goulet <dgoulet@torproject.org>
This patch lifts the check for whether a given log file (`logfile_t`) is
an "external logfile" (handled by an external logging system such as
syslog, android's logging subsystem, or as an external C callback
function) into a function on its own.
See: https://bugs.torproject.org/24362
In KIST, we could have a small congestion window value than the unacked
packets leading to a integer overflow which leaves the tcp_space value to be
humongous.
This has no security implications but it results in KIST scheduler allowing to
send cells on a potentially saturated connection.
Found by #24423. Fixes#24590.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This function leaks memory when the event_base is freed before the
event itself fires. That's not harmful, but it's annoying when
trying to debug other memory leaks.
Fixes bug 24584; bugfix on 0.2.8.1-alpha.
When we didn't do this before, we'd have some still-reachable memory
warnings, and we'd find ourselves crashing when we tried to
reinitialize libevent.
Part of 24581 (don't crash when restarting Tor in-process)
This patch adds support for Android's logging subsystem in Tor. When
debugging Android applications it is useful to be able to collect
information about the application running on the platform via the
various system services that is available on the platform.
This patch allows you to add "Log notice android" to your torrc and have
Tor send everything above and including the notice severity to Android's
ring buffer which can be inspected using the 'adb logcat' program.
See: https://bugs.torproject.org/24362
This patch ensures that we more easily can extend our log backends that
does not take any additional argument other than a single keyword. This
patch is mostly reindentation of some code which is why it is split out
into its own patch.
See: https://bugs.torproject.org/24362
Also make IPv6-only clients wait for microdescs for relays, even if we were
previously using descriptors (or were using them as a bridge) and have
a cached descriptor for them.
But if node_is_a_configured_bridge(), stop waiting for its IPv6 address in
a microdescriptor, because we'll never use it.
Implements #23827.
networkstatus_consensus_has_ipv6() tells us whether the consensus method of
our current consensus supports IPv6 ORPorts in the consensus.
Part of #23827.
This commit was made mechanically by this perl script:
\#!/usr/bin/perl -w -i -p
next if /^#define FREE_AND_NULL/;
s/\bFREE_AND_NULL\((\w+),/FREE_AND_NULL\(${1}_t, ${1}_free_,/;
s/\bFREE_AND_NULL_UNMATCHED\(/FREE_AND_NULL\(/;
This commit removes the old FREE_AND_NULL, and renames the old
FREE_AND_NULL_UNMATCHED so that it is now called FREE_AND_NULL.
This will break all the FREE_AND_NULL_* users; the next commit will
fix them.
Couple things happen in this commit. First, we do not re-queue a cell back in
the circuit queue if the write packed cell failed. Currently, it is close to
impossible to have it failed but just in case, the channel is mark as closed
and we move on.
The second thing is that the channel_write_packed_cell() always took ownership
of the cell whatever the outcome. This means, on success or failure, it needs
to free it.
It turns out that that we were using the wrong free function in one case and
not freeing it in an other possible code path. So, this commit makes sure we
only free it in one place that is at the very end of
channel_write_packed_cell() which is the top layer of the channel abstraction.
This makes also channel_tls_write_packed_cell_method() return a negative value
on error.
Two unit tests had to be fixed (quite trivial) due to a double free of the
packed cell in the test since now we do free it in all cases correctly.
Part of #23709
Signed-off-by: David Goulet <dgoulet@torproject.org>
Split hs_circuitmap_get_rend_circ_client_side(). One returns only established
circuits (hs_circuitmap_get_established_rend_circ_client_side()) and the other
returns all kinds of circuits.
Fixes#23459
Signed-off-by: Fernando Fernandez Mancera <ffernandezmancera@gmail.com>
Previously, circuit_stream_is_being_handled incorrectly reported
that (1) an exit port was "handled" by a circuit regardless of
whether the circuit was already isolated in some way, and
(2) that a stream could be "handled" by a circuit even if their
isolation settings were incompatible.
As a result of (1), in Tor Browser, circuit_get_unhandled_ports was
reporting that all ports were handled even though all non-internal
circuits had already been isolated by a SOCKS username+password.
Therefore, circuit_predict_and_launch_new was declining to launch
new exit circuits. Then, when the user visited a new site in Tor
Browser, a stream with new SOCKS credentials would be initiated,
and the stream would have to wait while a new circuit with those
credentials could be built. That wait was making the
time-to-first-byte longer than it needed to be.
Now, clean, not-yet-isolated circuit(s) will be automatically
launched ahead of time and be ready for use whenever a new stream
with new SOCKS credentials (or other isolation criteria) is
initiated.
Fixes bug 18859. Thanks to Nick Mathewson for improvements.
This makes sure that a non opened channel is never put back in the channel
pending list and that its state is consistent with what we expect that is
IDLE.
Test the fixes in #24502.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This patch adds a check for the return value of `hs_parse_address()` in
`hs_control_hspost_command()`. Since it should not be possible for
`hs_parse_address()` to fail in this context we wrap the error check
with the `BUG()` macro.
See: https://bugs.torproject.org/24543
This patch is a result of auditing all of our uses of
get_datadir_fname() and its kin, and dividing them into cache vs
keys vs other data.
The new get_keydir_fname() and get_cachedir_fname() functions don't
actually do anything new yet.
This introduces the test_hs_control.c file which at this commit contains basic
unit test for the HS_DESC event.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This is removed for two reasons. First, HSDir accepts descriptor even though
they don't think they are in fact an HSDir. This is to avoid consensus desync
between client/service and directories.
Second, our malicious HSDir scanner uses the HSPOST command to post on all
relays in order to test them before they could become HSDir. We had to remove
that check from the tor code that the scanner uses.
Thus, this check should not be enforced by the control port for the above use
cases. It is also a bit more complex with v3 support for which not all HSDir
support it so basically irrelevant check.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This is groundwork for the HSPOST control port command that needs a way in the
HS subsystem to upload a service descriptor to a specific HSDir.
To do so, we add a public function that takes a series of parameters including
a fully encoded descriptor and initiate a directory request to a specific
routerstatut_t object.
It is for now not used but should be, in future commit, by the HSPOST command.
This commit has no behavior change, only refactoring.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This makes the REPLICA= field optional for the control port event. A v2
service will always pass it and v3 is ignored.
Signed-off-by: David Goulet <dgoulet@torproject.org>
A new v3 specific function has been added named
control_event_hsv3_descriptor_failed().
The HS v3 subsystem now uses it.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This changes the control_event_hs_descriptor_requested() call to add the hsdir
index optional value. v2 passes NULL all the time.
This commit creates hs_control.{c|h} that contains wrappers for the HS
subsystem to interact with the control port subsystem.
The descriptor REQUESTED event is implemented following proposal 284 extension
for v3.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Make control_event_hs_descriptor_received() and
control_event_hs_descriptor_failed() v2 specific because they take a
rend_data_t object and v3 will need to pass a different object.
No behavior change.
Signed-off-by: David Goulet <dgoulet@torproject.org>
First, rename and make that function static because it is internal to
control.c and called by two HS_DESC events.
Second, make it take more basic parameters and thus not a rend_data_t object
so we can still use the function for v3 HS that doesn't use that object.
Third, move the descriptor ID lookup to the two specific events (yes little
code duplication there) because they get a rend_data_t object which won't be
the case for v3.
Finally, through this refactoring, change the pointer check to BUG() and
change some parameter names to reflect what they really are.
No behavior change at this commit.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This is a naming refactor mostly _except_ for a the events' function that take
a rend_data_t which will require much more refactoring.
No behavior change at this commit, cleanup and renaming stuff to not be only
v2 specific.
Signed-off-by: David Goulet <dgoulet@torproject.org>
When an intro circuit has closed, do not warn anymore when we can't find the
service. It is possible to hit that condition if the service is removed before
the circuits were fully closed. This happens in the case of deleting an
ephemeral service.
Signed-off-by: David Goulet <dgoulet@torproject.org>
The functions are now used by the ADD_ONION/DEL_ONION control port command as
well. This commits makes them fully functionnal with hidden service v3.
Part of #20699
Signed-off-by: David Goulet <dgoulet@torproject.org>
Instead of using the cwd to specify the location of Cargo.toml, we
use the --manifest-path option to specify its location explicitly.
This works around the bug that isis diagnosed on our jenkins builds.
Making errno error log more useful for getrandom() call. Adding if statement to
make difference between ENOSYS and other errors.
Fixes#24500
Signed-off-by: Fernando Fernandez Mancera <ffernandezmancera@gmail.com>
First, hs_service_intro_circ_has_closed() is now called in circuit_mark_for
close() because the HS subsystem needs to learn when an intro point is
actually not established anymore as soon as possible. There is a time window
between a close and a free.
Second, when we mark for close, we also remove it from the circuitmap because
between the close and the free, a service can launch an new circuit to that
same intro point and thus register it which only succeeds if the intro point
authentication key is not already in the map.
However, we still do a remove from the circuitmap in circuit_free() in order
to also cleanup the circuit if it wasn't marked for close prior to the free.
Fixes#23603
Signed-off-by: David Goulet <dgoulet@torproject.org>
The hs_service_intro_circ_has_closed() was removing intro point objects if too
many retries.
We shouldn't cleanup those objects in that function at all but rather let
cleanup_intro_points() do its job and clean it properly.
This was causing an issue in #23603.
Furthermore, this moves the logic of remembering failing intro points in the
cleanup_intro_points() function which should really be the only function to
know when to cleanup and thus when an introduction point should be remembered
as a failed one.
Fixes#23603
Signed-off-by: David Goulet <dgoulet@torproject.org>
In the KIST main loop, if the channel happens to be not opened, set its state
to IDLE so we can release it properly later on. Prior to this fix, the channel
was in PENDING state, removed from the channel pending list and then kept in
that state because it is not opened.
This bug was introduced in commit dcabf801e5 for
which we made the scheduler loop not consider unopened channel.
This has no consequences on tor except for an annoying but harmless BUG()
warning.
Fixes#24502
Signed-off-by: David Goulet <dgoulet@torproject.org>
Some platforms don't have good monotonic time support so don't warn when the
diff between the last run of the scheduler time and now is negative. The
scheduler recovers properly from this so no need to be noisy.
Fixes#23696
Signed-off-by: David Goulet <dgoulet@torproject.org>
When creating a routerstatus (vote) from a routerinfo (descriptor),
set the IPv6 address to the unspecified IPv6 address, and explicitly
initialise the port to zero.
Also clarify the documentation for the function.
Fixes bug 24488; bugfix on 0.2.4.1-alpha.
Fortunately, use_cached_ipv4_answers was already 0, so we wouldn't
actually use this info, but it's best not to have it.
Fixes bug 24050; bugfix on 0.2.6.3-alpha
TROVE-2017-12. Severity: Medium
When choosing a random node for a circuit, directly use our router
descriptor to exclude ourself instead of the one in the global
descriptor list. That list could be empty because tor could be
downloading them which could lead to not excluding ourself.
Closes#21534
TROVE-2017-12. Severity: Medium
Thankfully, tor will close any circuits that we try to extend to
ourselves so this is not problematic but annoying.
Part of #21534.
TROVE-2017-13. Severity: High.
In the unlikely case that a hidden service could be missing intro circuit(s),
that it didn't have enough directory information to open new circuits and that
an intro point was about to expire, a use-after-free is possible because of
the intro point object being both in the retry list and expiring list at the
same time.
The intro object would get freed after the circuit failed to open and then
access a second time when cleaned up from the expiring list.
Fixes#24313
Going from 4 hours to 24 hours in order to try reduce the efficiency of guard
discovery attacks.
Closes#23856
Signed-off-by: David Goulet <dgoulet@torproject.org>
The goal here is to replace our use of msec-based timestamps with
something less precise, but easier to calculate. We're doing this
because calculating lots of msec-based timestamps requires lots of
64/32 division operations, which can be inefficient on 32-bit
platforms.
We make sure that these stamps can be calculated using only the
coarse monotonic timer and 32-bit bitwise operations.
First, that test was broken from the previous commit because the
channel_queue_cell() has been removed. This now tests the
channel_process_cell() directly.
Second, it wasn't testing much except if the channel subsystem actually went
through the cell handler. This commit adds more checks on the state of a
channel going from open, receiving a cell and closing.
Third, this and the id_map unit test are working, not the others so they've
been marked as not working and future commit will improve and fix those.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This removed code that was either never reached or irrelevant after the
incoming/outgoing queue removal such as the "timestamp_drained".
Lots of things are also removed from channel.h that do not exists anymore or
not used.
Signed-off-by: David Goulet <dgoulet@torproject.org>
If the channel layer failed to write a cell from the circuit queue, requeue it
so it can be retried on the same channel later.
Signed-off-by: David Goulet <dgoulet@torproject.org>
The channel_write_cell() and channel_write_var_cell() can't be possibly called
nor are used by tor. We only write on the connection outbuf packed cell coming
from the scheduler that takes them from the circuit queue.
This makes channel_write_packed_cell() the only usable function. It is
simplify and now returns a code value. The reason for this is that in the next
commit(s), we'll re-queue the cell onto the circuit queue if the write fails.
Finally, channel unit tests are being removed with this commit because they do
not match the new semantic. They will be re-written in future commits.
Signed-off-by: David Goulet <dgoulet@torproject.org>
The channel subsystem was doing a whole lot to track and try to predict the
channel queue size but they are gone due to previous commit.
Signed-off-by: David Goulet <dgoulet@torproject.org>
For the rationale, see ticket #23709.
This is a pretty massive commit. Those queues were everywhere in channel.c and
it turns out that it was used by lots of dead code.
The channel subsystem *never* handles variable size cell (var_cell_t) or
unpacked cells (cell_t). The variable ones are only handled in channeltls and
outbound cells are always packed from the circuit queue so this commit removes
code related to variable and unpacked cells.
However, inbound cells are unpacked (cell_t), that is untouched and is handled
via channel_process_cell() function.
In order to make the commit compile, test have been modified but not passing
at this commit. Also, many tests have been removed but better improved ones
get added in future commits.
This commit also adds a XXX: which indicates that the handling process of
outbound cells isn't fully working. This as well is fixed in a future commit.
Finally, at this commit, more dead code remains, it will be cleanup in future
commits.
Fixes#23709
Signed-off-by: David Goulet <dgoulet@torproject.org>
This function is part of the tor fast path so this commit adds more
documentation to it as it is critical.
Signed-off-by: David Goulet <dgoulet@torproject.org>
append_cell_to_circuit_queue() had code disabled from commit
2a95f31716
This code is 4+ years old related to bug #9072 so if we ever want to revisit
it, lets inspect/revert this commit.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This applies the changes in 23524 to num_usable_bridges(), because it has
replaced any_bridge_descriptors_known().
The original changes file still applies.
Stop checking for bridge descriptors when we actually want to know if
any bridges are usable. This avoids potential bootstrapping issues.
Fixes bug 24367; bugfix on 0.2.0.3-alpha.
Stop stalling when bridges are changed at runtime. Stop stalling when
old bridge descriptors are cached, but they are not in use.
Fixes bug 24367; bugfix on 23347 in 0.3.2.1-alpha.
We used to check whether we have enough filtered guards (guard set when
torrc is applied) but that's not good enough, since that might be bad in
some cases where many guards are not reachable (might cause overblocking
and hence reacahbility issues).
We now check if we have enough reachable filtered guards before applying
md restrictions which should prevent overblocking.
Previously, if store_multiple() reported a partial success, we would
store all the handles it gave us as if they had succeeded. But it's
possible for the diff to be only partially successful -- for
example, if LZMA failed but the other compressors succeeded.
Fixes bug 24086; bugfix on 0.3.1.1-alpha.
Move it to hs_common.h and rename it "hs_service_add_ephemeral_status_t". It
will be shared between v2 and v3 services.
Part of #20699
Signed-off-by: David Goulet <dgoulet@torproject.org>
At this commit, the key handling and generation is supported for a v3 service
(ED25519-V3). However, the service creation is not yet implemented. This only
adds the interface and code to deal with the new ED25519-V3 key type.
Tests have been updated for RSA key type but nothing yet for ED25519-v3.
Part of #20699
Signed-off-by: David Goulet <dgoulet@torproject.org>
This will be used by the control port command "GETINFO
hs/service/desc/id/<ADDR>" which returns the encoded current descriptor for
the given onion address.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This commit adds hs_cache_lookup_encoded_as_client() function that returns the
encoded descriptor for a given service public key. This will be needed by the
"GETINFO hs/client/desc/id/<ADDR>" control port command.
Signed-off-by: David Goulet <dgoulet@torproject.org>
If we can't read a file because of an FS issue, we say "we can't
read that" and move on. But if we can't read it because it's empty,
because it has no labels, or because its labels are misformatted, we
should remove it.
Fixes bug 24099; bugfix on 0.3.1.1-alpha.
A circuit with purpose C_INTRODUCING means that its state is opened but the
INTRODUCE1 cell hasn't been sent yet. We shouldn't consider that circuit when
looking for timing out "building circuit". We have to wait on the rendezvous
circuit to be opened before sending that cell so the intro circuit needs to be
kept alive for at least that period of time.
This patch makes that the purpose C_INTRODUCING is ignored in the
circuit_expire_building() which means that we let the circuit idle timeout
take care of it if we end up never using it.
Fixes#23681
Signed-off-by: David Goulet <dgoulet@torproject.org>
This check makes it so we can reach "done" without setting "conn",
and so the "if (conn)" check will not be redundant, and so coverity
won't complain. Fixes CID 1422205. Not actually a bug.
There are three changes here:
* We need to allow epoll_pwait.
* We need to allow PF_NETLINK sockets to be opened with SOCK_CLOEXEC.
* We need to use openat() instead of open().
Note that this fix is not complete, since the openat() change is
turned off. The next commit will make the openat() change happen
when we're running glibc 2.26 or later.
Fix for 24315.
We don't want to allow general signals to be sent, but there's no
problem sending a kill(0) to probe whether a process is there.
Fixes bug 24198; bugfix on 0.2.5.1-alpha when the seccomp2 sandbox
was introduced.
When we close a connection via connection_close_immediately, we kill
its events immediately. But if it had been blocked on bandwidth
read/write, we could try to re-add its (nonexistent) events later
from connection_bucket_refill -- if we got to that callback before
we swept the marked connections.
Fixes bug 24167. Fortunately, this hasn't been a crash bug since we
introduced connection_check_event in 0.2.9.10, and backported it.
This is a bugfix on commit 89d422914a, I believe, which
appeared in Tor 0.1.0.1-rc.
Commit 56c5e282a7 suppressed that same log
statement in directory_info_has_arrived() for microdescriptors so do the same
for the descriptors. As the commit says, we already have the bootstrap
progress for this.
Fixes#23861
Signed-off-by: David Goulet <dgoulet@torproject.org>
evdns is allowed to give us unrecognized object types; it is allowed
to give us non-IPv4 answer types, and it is (even) allowed to give
us empty answers without an error.
Closes ticket 24097.
By convention, the torrc options that the user sets are
unchangeable. If we need to change them, we should be using a copy
that's stored in another field
To avoid trouble, I'm keeping DataDirectory as the name for the
field that the rest of Tor uses, and using DataDirectory_option for
the confparse-controlled field.
This commit also modernizes some older string handling code in the
DataDirectory normalization function.
Due to #23662 this can happen under natural causes and does not disturb
the functionality of the service. This is a simple 0.3.2 fix for now,
and we plan to fix this properly in 0.3.3.
Clients add rendezvous point IPv6 addresses to introduce cell link specifiers,
when the node has a valid IPv6 address.
Also check the node's IPv4 address is valid before adding any link specifiers.
Implements #23577.
This function -- a mock replacement used only for fuzzing -- would
have a buffer overflow if it got an RSA key whose modulus was under
20 bytes long.
Fortunately, Tor itself does not appear to have a bug here.
Fixes bug 24247; bugfix on 0.3.0.3-alpha when fuzzing was
introduced. Found by OSS-Fuzz; this is OSS-Fuzz issue 4177.
On failure to upload, the HS_DESC event would report "UPLOAD_FAILED" as the
Action but it should have reported "FAILED" according to the spec.
Fixes#24230
Signed-off-by: David Goulet <dgoulet@torproject.org>
DisableNetwork is a subset of net_is_disabled(), which is (now) a
subset of should_delay_dir_fetches().
Some of these changes are redundant with others higher or lower in
the call stack. The ones that I think are behavior-relevant are
listed in the changes file. I've also added comments in a few
places where the behavior is subtle.
Fixes bug 12062; bugfix on various versions.
Commit e67f4441eb introduced a safeguard against
using an uninitialized voting schedule object. However, the dirvote_act() code
was looking roughly at the same thing to know if it had to compute the timings
before voting with this condition:
if (!voting_schedule.voting_starts) {
...
dirvote_recalculate_timing(options, now);
}
The sr_init() function is called very early and goes through the safeguard
thus the voting schedule is always initilized before the first vote.
That first vote is a crucial one because we need to have our voting schedule
aligned to the "now" time we are about to use for voting. Then, the schedule
is updated when we publish our consensus or/and when we set a new consensus.
From that point on, we only want to update the voting schedule through that
code flow.
This "created_on_demand" is indicating that the timings have been recalculated
on demand by another subsystem so if it is flagged, we know that we need to
ignore its values before voting.
Fixes#24186
Signed-off-by: David Goulet <dgoulet@torproject.org>
When we have fewer than 15 descriptors to fetch, we will delay the
fetch for a little while. That's fine, if we can go ahead and build
circuits... but if not, it's a poor choice indeed.
Fixes bug 23985; bugfix on 0.1.1.11-alpha.
In 0.3.0.3-alpha, when we made primary guard descriptors necessary
for circuit building, this situation got worse.
When calculating the fraction of nodes that have descriptors, and all
all nodes in the network have zero bandwidths, count the number of nodes
instead.
Fixes bug 23318; bugfix on 0.2.4.10-alpha.
Back in 0.2.4.3-alpha (e106812a77), when we switched from using
double to using uint64 for selecting by bandwidth, I got the math
wrong: I should have used llround(x), or (uint64_t)(x+0.5), but
instead I wrote llround(x+0.5). That means we would always round
up, rather than rounding to the closest integer
Fixes bug 23318; bugfix on 0.2.4.3-alpha.
The flush cells process can close a channel if the connection write fails but
still return that it flushed at least one cell. This is due because the error
is not propagated up the call stack so there is no way of knowing if the flush
actually was successful or not.
Because this would require an important refactoring touching multiple
subsystems, this patch is a bandaid to avoid the KIST scheduler to handle
closed channel in its loop.
Bandaid on #23751.
Signed-off-by: David Goulet <dgoulet@torproject.org>
dirvote_get_next_valid_after_time() is the only public function that uses the
voting schedule outside of the dirvote subsystem so if it is zeroed,
recalculate its timing if we can that is if a consensus exists.
Part of #24161
Signed-off-by: David Goulet <dgoulet@torproject.org>
Because the HS and SR subsystems can use the voting schedule early (with the
changes in #23623 making the SR subsystem using the static voting schedule
object), we need to recalculate the schedule very early when setting the new
consensus.
Fixes#24161
Signed-off-by: David Goulet <dgoulet@torproject.org>
If it decrypts something that turns out to start with a NUL byte,
then decrypt_desc_layer() will return 0 to indicate the length of
its result. But 0 also indicates an error, which causes the result
not to be freed by decrypt_desc_layer()'s callers.
Since we're trying to stabilize 0.3.2.x, I've opted for the simpler
possible fix here and made it so that an empty decrypted string will
also count as an error.
Fixes bug 24150 and OSS-Fuzz issue 3994.
The original bug was present but unreachable in 0.3.1.1-alpha. I'm
calling this a bugfix on 0.3.2.1-alpha since that's the first version
where you could actually try to decrypt these descriptors.
The node_get_ed25519_id() warning can actually be triggered by a relay flagged
with NoEdConsensus so instead of triggering a warning on all relays of the
network, downgrade it to protocol warning.
Fixes#24025
Signed-off-by: David Goulet <dgoulet@torproject.org>
When a BUG() occurs, this macro will print extra information about the state
of the scheduler and the given channel if any. This will help us greatly to
fix future bugs in the scheduler especially when they occur rarely.
Fixes#23753
Signed-off-by: David Goulet <dgoulet@torproject.org>
When running "make test-network-all", test that IPv6-only clients can use
microdescriptors. IPv6-only microdescriptor client support was fixed in
tor 0.3.0.1-alpha.
Requires chutney master 61c28b9 or later.
Closes ticket 24109.
They are not yet implemented: they will upload descriptors, but won't be
able to rendezvous, because IPv6 addresses in link specifiers are ignored.
Part of #23820.
The previous version of this function had the following issues:
* it didn't check if the extend_info contained an IPv6 address,
* it didn't check if the ed25519 identity key was valid.
But we can't add IPv6 support in a bugfix release.
Instead, BUG() if the address is an IPv6 address, so we always put IPv4
addresses in link specifiers. And ignore missing ed25519 identifiers,
rather than generating an all-zero link specifier.
This supports v3 hidden services on IPv4, dual-stack, and IPv6, and
v3 single onion services on IPv4 only.
Part of 23820, bugfix on 0.3.2.1-alpha.
When the directory information changes, callback to the HS client subsystem so
it can check if any pending SOCKS connections are waiting for a descriptor. If
yes, attempt a refetch for those.
Fixes#23762
Signed-off-by: David Goulet <dgoulet@torproject.org>
Because the HS subsystem needs the voting schedule to compute time period, we
need all tor type to do that.
Part of #23623
Signed-off-by: David Goulet <dgoulet@torproject.org>
The new decryption function performs no decryption, skips the salt,
and doesn't check the mac. This allows us to fuzz the
hs_descriptor.c code using unencrypted descriptor test, and exercise
more of the code.
Related to 21509.
The exposed get_voting_schedule() allocates and return a new object everytime
it is called leading to an awful lot of memory allocation when getting the
start time of the current round which is done for each node in the consensus.
Closes#23623
Signed-off-by: David Goulet <dgoulet@torproject.org>
If the intro point supports ed25519 link authentication, make sure we don't
have a zeroed key which would lead to a failure to extend to it.
We already check for an empty key if the intro point does not support it so
this makes the check on the key more consistent and symmetric.
Fixes#24002
Signed-off-by: David Goulet <dgoulet@torproject.org>
The previous version of these functions had the following issues:
* they can't supply both the IPv4 and IPv6 addresses in link specifiers,
* they try to fall back to a 3-hop path when the address for a direct
connection is unreachable, but this isn't supported by
launch_rendezvous_point_circuit(), so it fails.
But we can't fix these things in a bugfix release.
Instead, always put IPv4 addresses in link specifiers.
And if a v3 single onion service can't reach any intro points, fail.
This supports v3 hidden services on IPv4, dual-stack, and IPv6, and
v3 single onion services on IPv4 only.
Part of 23820, bugfix on 0.3.2.1-alpha.
The previous version of this function has the following issues:
* it doesn't choose between IPv4 and IPv6 addresses correctly, and
* it doesn't fall back to a 3-hop path when the address for a direct
connection is unreachable.
But we can't fix these things in a bugfix release.
Instead, treat IPv6 addresses like any other unrecognised link specifier
and ignore them. If there is no IPv4 address, return NULL.
This supports v3 hidden services on IPv4, dual-stack, and IPv6, and
v3 single onion services on IPv4 only.
Part of 23820, bugfix on 0.3.2.1-alpha.
Turns out that when reloading a tor configured with hidden service(s), we
weren't copying all the needed information between the old service object to
the new one.
For instance, the desc_is_dirty timestamp wasn't which could lead to the
service uploading its desriptor much later than it would need to.
The replaycache wasn't also moved over and some intro point information as
well.
Fixes#23790
Signed-off-by: David Goulet <dgoulet@torproject.org>
Bridge relays can use it to add a "bridge-distribution-request" line
to their bridge descriptor, which tells BridgeDB how they'd like their
bridge address to be given out.
Implements tickets 18329.
Fixes bug 23908; bugfix on 0.3.1.6-rc when we made the keypin
failure message really long.
Backport from 0.3.2's 771fb7e7ba,
where arma said "get rid of the scary 256-byte-buf landmine".
It _should_ work, and I don't see a reason that it wouldn't, but
just in case, add a 10 second timer to make tor die with an
assertion failure if it's supposed to exit but it doesn't.
This function was never about 'finishing' the event loop, but rather
about making sure that the code outside the event loop would be run
at least once.
Sometimes when we call exit(), it's because the process is
completely hopeless: openssl has a broken AES-CTR implementation, or
the clock is in the 1960s, or something like that.
But sometimes, we should return cleanly from tor_main() instead, so
that embedders can keep embedding us and start another Tor process.
I've gone through all the exit() and _exit() calls to annotate them
with "exit ok" or "XXXX bad exit" -- the next step will be to fix
the bad exit()s.
First step towards 23848.
At first, we put the tor_git_revision constant in tor_main.c, so
that we wouldn't have to recompile config.o every time the git
revision changed. But putting it there had unintended side effect
of forcing every program that wanted to link libor.a (including
test, test-slow, the fuzzers, the benchmarks, etc) to declare their
own tor_git_revision instance.
That's not very nice, especially since we want to start supporting
others who want to link against Tor (see 23846).
So, create a new git_revision.c file that only contains this
constant, and remove the duplicated boilerplate from everywhere
else.
Part of implementing ticket 23845.
This feature should help programs that want to launch and manage a
Tor process, as well as programs that want to launch and manage a
Tor instance in a separate thread. Right now, they have to open a
controlport, and then connect to it, with attendant authentication
issues. This feature allows them to just start with an
authenticated connection.
Bug 23900.
Our socket accounting functions assumed that we'd never be asked to
close a socket that we didn't open ourselves. But now we want to
support taking control sockets that we inherit -- so we need a way
of taking ownership of them, so we don't freak out later on when we
close them.
Create a function that tells us if we can fetch or not the descriptor for the
given service key.
No behavior change. Mostly moving code but with a slight change so the
function can properly work by returning a boolean and also a possible fetch
status code.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Skip test_config_include_no_permission() when running as root, because
it will get an unexpected success from config_get_lines_include().
This affects some continuous integration setups. Fixes bug 23758.
When we added HTTPTunnelPort, the answer that we give when you try
to use your SOCKSPort as an HTTP proxy became wrong. Now we explain
that Tor sorta _is_ an HTTP proxy, but a SOCKSPort isn't.
I have left the status line the same, in case anything is depending
on it. I have removed the extra padding for Internet Explorer,
since the message is well over 512 bytes without it.
Fixes bug 23678; bugfix on 0.3.2.1-alpha.
Without this fix, changes from client to bridge don't trigger
transition_affects_workers(), so we would never have actually
initialized the cpuworkers.
Fixes bug 23693. Bugfix on 3bcdb26267 0.2.6.3-alpha, which
fixed bug 14901 in the general case, but not on the case where
public_server_mode() did not change.
Because our monotonic time interface doesn't play well with value set to 0,
always initialize to now() the scheduler_last_run at init() of the KIST
scheduler.
Fixes#23696
Signed-off-by: David Goulet <dgoulet@torproject.org>
When a channel is scheduled and flush cells returns 0 that is no cells to
flush, we flag it back in waiting for cells so it doesn't get stuck in a
possible infinite loop.
It has been observed on moria1 where a closed channel end up in the scheduler
where the flush process returned 0 cells but it was ultimately kept in the
scheduling loop forever. We suspect that this is due to a more deeper problem
in tor where the channel_more_to_flush() is actually looking at the wrong
queue and was returning 1 for an empty channel thus putting the channel in the
"Case 4" of the scheduler which is to go back in pending state thus
re-considered at the next iteration.
This is a fix that allows the KIST scheduler to recover properly from a not
entirelly diagnosed problem in tor.
Fixes#23676
Signed-off-by: David Goulet <dgoulet@torproject.org>
When we added single_conn_free_bytes(), we cleared the outbuf on a
connection without setting outbuf_flushlen() to 0. This could cause
an assertion failure later on in flush_buf().
Fixes bug 23690; bugfix on 0.2.6.1-alpha.
This caused a BUG log when we noticed that the circuit had no
channel. The likeliest culprit for exposing that behavior is
d769cab3e5, where we made circuit_mark_for_close() NULL out
the n_chan and p_chan fields of the circuit.
Fixes bug 8185; bugfix on 0.2.5.4-alpha, I think.
My current theory is that this is just a marked circuit that hasn't
closed yet, but let's gather more information in case that theory is
wrong.
Diagnostic for 8185.
This patch ensures that we return TOR_COMPRESS_BUFFER_FULL in case we
have a input bytes left to process, but are out of output buffer or in
case we need to finish where the compression implementation might need
to write an epilogue.
See: https://bugs.torproject.org/23551
This patch ensures that we return TOR_COMPRESS_BUFFER_FULL in case we
have a input bytes left to process, but are out of output buffer or in
case we need to finish where the compression implementation might need
to write an epilogue.
See: https://bugs.torproject.org/23551
If 6 SOCKS requests are opened at once, it would have triggered 6 fetches
which ultimately poke all 6 HSDir. We don't want that, if we have multiple
SOCKS requests for the same service, do one fetch only.
Signed-off-by: David Goulet <dgoulet@torproject.org>
When purging last HSDir requests, we used time(NULL) for computing the
service blinded key but in all other places in our codebase we actually
use the consensus times. That can cause wrong behavior if the consensus
is in a different time period than time(NULL).
This commit is required for proper purging of HSDir requests.
The confparse field has type UINT, which corresponds to an int
type. We had uint32_t.
This shouldn't cause trouble in practice, since int happens to
4-bytes wide on every platform where an authority is running. It's
still wrong, though.
These should have been int, but we had listed them as unsigned.
That's an easy mistake to make, since "int" corresponds with either
INT or UINT in the configuration file.
This bug cannot have actually caused a problem in practice, since we
check those fields' values on load, and ensure that they are in
range 0..INT32_MAX.
New approach, suggested by Taylor: During testing builds, we
initialize a union member of an appropriate pointer type with the
address of the member field we're trying to test, so we can make
sure that the compiler doesn't warn.
My earlier approach invoked undefined behavior.
Also demote a log message that can occur under natural causes
(if the circuit subsystem is missing descriptors/consensus etc.).
The HS subsystem will naturally retry to connect to intro points,
so no need to make that log user-facing.
So we can track them more easily in the logs and match any open/close/free
with those identifiers.
Part of #23645
Signed-off-by: David Goulet <dgoulet@torproject.org>
This removes the "nickname" of the cannibalized circuit last hop as it is
useless. It now logs the n_circ_id and global identifier so we can match it
with other logging statement.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Prior to the log statement, the circuit n_circ_id value is zeroed so keep a
copy so we can log it at the end.
Part of #23645
Signed-off-by: David Goulet <dgoulet@torproject.org>
Make the "Exit" flag assignment only depend on whether the exit
policy allows connections to ports 80 and 443. Previously relays
would get the Exit flag if they allowed connections to one of
these ports and also port 6667.
Resolves ticket 23637.
Back in 0.2.4.3-alpha (e106812a77), when we switched from using
double to using uint64 for selecting by bandwidth, I got the math
wrong: I should have used llround(x), or (uint64_t)(x+0.5), but
instead I wrote llround(x+0.5). That means we would always round
up, rather than rounding to the closest integer
Fixes bug 23318; bugfix on 0.2.4.3-alpha.
The is_first_hop field should have been called used_create_fast,
but everywhere that we wanted to check it, we should have been
checking channel_is_client() instead.
The diff is confusing, but were two static scheduler functions that
needed moving to static comment block.
No code change. Thanks dgoulet for original commit
The clock_skew_warning() refactoring allowed calls from
or_state_load() to control_event_bootstrap_problem() to occur prior
bootstrap phase 0, causing an assertion failure. Initialize the
bootstrap status prior to calling clock_skew_warning() from
or_state_load().
or_state_load() was using an incorrect sign convention when calling
clock_skew_warning() to warn about state file clock skew. This caused
the wording of the warning to be incorrect about the direction of the
skew.
is_canonical doesn't mean "am I connected to the one true address of
this relay"; it means "does this relay tell me that the address I'm
connected to belong to it." The point is to prevent TCP-based MITM,
not to prevent the relay from multi-homing.
Related to 22890.
Authority IPv6 addresses were originally added in 0.2.8.1-alpha.
This leaves 3/8 directory authorities with IPv6 addresses, but there
are also 52 fallback directory mirrors with IPv6 addresses.
Resolves 19760.
If "1" is not 64 bits wide already, then "1 << i" will not actually
work.
This bug only affects the TEST_BITOPS code, and shouldn't matter for
the actual use of the timeout code (except if/when it causes this
test to fail).
Reported by dcb314@hotmail.com. Fix for bug 23583. Not adding a
changes file, since this code is never compiled into Tor.
Use this value instead of hardcoded values of 32 everywhere. This also
addresses the use of REND_DESC_ID_V2_LEN_BASE32 in
hs_lookup_last_hid_serv_request() for the HSDir encoded identity digest length
which is accurate but semantically wrong.
Fixes#23305.
Signed-off-by: David Goulet <dgoulet@torproject.org>
RENDEZVOUS1 cell is 84 bytes long in v3 and 168 bytes long in v2 so this
commit pads with random bytes the v3 cells up to 168 bytes so they all look
alike at the rendezvous point.
Closes#23420
Signed-off-by: David Goulet <dgoulet@torproject.org>
This warning is caused by a different tv_usec data type on macOS
compared to the system on which the patch was developed.
Fixes 23575 on 0.3.2.1-alpha.
It is highly unlikely to happen but if so, we need to know and why. The
warning with the next_run values could help.
Signed-off-by: David Goulet <dgoulet@torproject.org>
When the KIST schedule() is called, it computes a diff value between the last
scheduler run and the current monotonic time. If tha value is below the run
interval, the libevent even is updated else the event is run.
It turned out that casting to int32_t the returned int64_t value for the very
first scheduler run (which is set to 0) was creating an overflow on the 32 bit
value leading to adding the event with a gigantic usec value. The scheduler
was simply never running for a while.
First of all, a BUG() is added for a diff value < 0 because if the time is
really monotonic, we should never have a now time that is lower than the last
scheduler run time. And we will try to recover with a diff value to 0.
Second, the diff value is changed to int64_t so we avoid this "bootstrap
overflow" and future casting overflow problems.
Fixes#23558
Signed-off-by: David Goulet <dgoulet@torproject.org>
This was introduced in 4ff170d7b1, and is probably
unreachable, but coverity complained about it (CID 1417761). Bug not
in any released Tor, so no changes file.
Otherwise integer overflows can happen. Remember, doing a i32xi32
multiply doesn't actually produce a 64-bit output. You need to do
i64xi32 or i64xi64.
Coverity found this as CID 1417753
Each type of scheduler implements its own static scheduler_t object and
returns a reference to it.
This commit also makes it a const pointer that is it can only change inside
the scheduler type subsystem but not outside for extra protection.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Instead, add wrappers to do the needed action the different scheduler needs
with the libevent object.
Signed-off-by: David Goulet <dgoulet@torproject.org>
A channel can bounce in the scheduler and bounce out with the IDLE state which
means that if it came in the scheduler once, it has socket information that
needs to be freed from the global hash table.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This option is a list of possible scheduler type tor can use ordered by
priority. Its default value is "KIST,KISTLite,Vanilla" which means that KIST
will be used first and if unavailable will fallback to KISTLite and so on.
Signed-off-by: David Goulet <dgoulet@torproject.org>
It is possible that tor was compiled with KIST support but the running kernel
has no support for it. In that case, fallback to a naive approach and flag
that we have no kernel support.
At this commit, if the kernel support is disabled, there are no ways to come
back from it other than restarting tor with a kernel that supporst KIST.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Add a detection for the KIST scheduler in our build system and set
HAVE_KIST_SUPPORT if available.
Adapt the should use kist function with this new compile option.
Signed-off-by: David Goulet <dgoulet@torproject.org>
- HT_FOREACH_FN defined in an additional place because nickm did that
in an old kist prototype
- Make channel_more_to_flush mockable for future sched tests
- Add empty scheduler_{vanilla,kist}.c files and put in include.am
Signed-off-by: David Goulet <dgoulet@torproject.org>
- massive change to src/tgest/test_options.c since the sched options
were added all over the place in it
- removing the sched options caused some tests to pass/fail in new ways
so I assumed current behavior is correct and made them pass again
- ex: "ConnLimit must be greater" lines
- ex: "Authoritative directory servers must" line
- remove test_options_validate__scheduler in prep for new sched tests
Signed-off-by: David Goulet <dgoulet@torproject.org>
An unnecessary routerlist check in the NETINFO clock skew detection in
channel_tls_process_netinfo_cell() was preventing clients from
reporting NETINFO clock skew to controllers.
This patch replaces a few calls to router_get_by_id_digest ("do we
have a routerinfo?") with connection_or_digest_is_known_relay ("do
we know this relay to be in the consensus, or have been there some
time recently?").
Found while doing the 21585 audit; fixes bug 23533. Bugfix on
0.3.0.1-alpha.
The memleak was occuring because of the way ExcludeNodes is handled in
that function. Basically, we were putting excluded intro points extend
infos in a special variable which was never freed. Also, if there were
multiple excluded intro points then that variable was overwritten
everytime leaking more memory. This commit should fix both issues.
This commit adds a pretty advanced test for the client-side making sure that
picking intro is done properly.
This unittest also reveals a memleak on the client_pick_intro() function which
is fixed by the subsequent commit.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Using a test vector in python, test both hs_build_hsdir_index() and
hs_build_hs_index().
This commit also adds the hs_build_address.py to EXTRA_DIST which was missing.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Do two major improvements:
a) Make the client pick 6 HSDirs instead of just 1 and make sure they
all match the service's HSDirs.
b) Test two additional missing scenarios borrowed from the
test_reachability() test.
Make download status next attempts reported over the control port
consistent with the time used by tor. This issue only occurs if a
download status has not been reset before it is queried over the
control port.
Fixes 23525, not in any released version of tor.
If future code asks if there are any running bridges, without checking
if bridges are enabled, log a BUG warning rather than crashing.
Fixes 23524 on 0.3.0.1-alpha
Change the contract of control_event_bootstrap_problem() to be more
general and to take a connection_t. New function
control_event_bootstrap_prob_or() has the specific or_connection_t
funcionality previously used.
Fix the test_build_address() test and its test vectors python script.
They were both using a bogus pubkey for building an HS address which
does not validate anymore.
Also fix a few more unittests that were using bogus onion addresses
and were failing the validation. I replaced the bogus address with
the one generated from the test vector script.
Directory servers now include a "Date:" http header for response
codes other than 200. Clients starting with a skewed clock and a
recent consensus were getting "304 Not modified" responses from
directory authorities, so without a Date header the client would
never hear about a wrong clock.
Fixes bug 23499; bugfix on 0.0.8rc1.
We enrich the test_client_cache() test in two ways:
a) We check that transitioning time periods also cleans up expired
descriptors in client memory.
b) We test hs_cache_lookup_as_client() instead of
lookup_v3_desc_as_client(). The former is a higher level function
which calls the latter and allows us to test deeper into the
subsystem.
In #23466 we discovered that cached descriptors can stay around on the
client-side for up to 72 hours. In reality we only want those descs to
get cached for the duration of the current time period, since after that
TP is gone the client needs to compute a new blinded key to use for the HS.
In this commit we start using the consensus time (if available) when
cleaning up cached client descriptor entries. That makes sense because
the client uses consensus time anyway for connecting to hidden
services (e.g. computing blinded keys and time periods).
If no recent consensus is available, we consider descriptors to be
expired since we will want to fetch new ones when we get a live
consensus to avoid the Roger bug. If we didn't do that, when Roger
desuspends his laptop there would be a race between Tor fetching a new
consensus, and Tor connecting to the HS which would still cause
reachability issues.
We also turned a rev counter check into a BUG, since we should never
receive a descriptor with a strictly smaller rev counter than the one we
already have, except if there is a bug or if the HSDir wants to mess
with us. In any case, let's turn this into a BUG so that we can detect
and debug such cases easily.
This change refactors find_dl_schedule() to only call dependent functions
as needed. In particular, directory_fetches_from_authorities() only needs
to be called on clients.
Stopping spurious directory_fetches_from_authorities() calls on every
download on public relays has the following impacts:
* fewer address resolution attempts, particularly those mentioned in 21789
* fewer descriptor rebuilds
* fewer log messages, particularly those limited in 20610
Fixes 23470 in 0.2.8.1-alpha.
The original bug was introduced in commit 35bbf2e as part of prop210.
OpenBSD doesn't like tricks where you use a too-wide sscanf argument
for a too-narrow array, even when you know the input string
statically. The fix here is just to use bigger buffers.
Fixes 15582; bugfix on a3dafd3f58 in 0.2.6.2-alpha.
But when clients are just starting, make them try each bridge a few times
before giving up on it.
These changes make the bridge download schedules more explicit: before
17750, they relied on undocumented behaviour and specific schedule
entries. (And between 17750 and this fix, they were broken.)
Fixes 23347, not in any released version of tor.
We were always incrementing bridge download statuses on each attempt,
but we were using the "increment on failure" functions to do it.
And we never incremented them on failure.
No behaviour change.
The download schedule tells Tor to wait 15 minutes before downloading
bridge descriptors. But 17750 made Tor ignore that and start immediately.
Since we fixed 17750, Tor waits 15 minutes for bridge client bootstrap,
like the schedule says.
This fixes the download schedule to start immediately, and to try each
bridge 3 times in the first 30 seconds. This should make bridge bootstraps
more reliable.
Fixes 23347.
It is possible that two descriptor upload requests are launched in a very
short time frame which can lead to the second request finishing before the
first one and where that first one will make the HSDir send back a 400
malformed descriptor leading to a warning.
To avoid such, cancel all active directory connections for the specific
descriptor we are about to upload.
Note that this race is still possible on the HSDir side which triggers a log
info to be printed out but that is fine.
Fixes#23457
Signed-off-by: David Goulet <dgoulet@torproject.org>
When option validation or transition is happening, there are no
"current options" -- only "old options" and "maybe new options".
Looking at get_options() is likely a mistake, so have a nonfatal
assertion let us know if we do that.
Closes 22281.
Because we can get a RENDEZVOUS2 cell before the INTRODUCE_ACK, we need to
correctly handle the circuit purpose REND_JOINED that is not change its
purpose when we get an INTRODUCE_ACK and simply close the intro circuit
normally.
Fixes#23455
Signed-off-by: David Goulet <dgoulet@torproject.org>
Before, this function meant "can we connect to this node and
authenticate it using its ed25519 key?" Now it can additionally
mean, "when somebody else connects to this node, do we expect that
they can authenticate using the node's ed25519 key"?
This change lets us future-proof our link authentication a bit.
Closes ticket 20895. No backport needed, since ed25519 link
authentication support has not been in any LTS release yet, and
existing releases with it should be obsolete before any releases
without support for linkauth=3 are released.
This test is important because it tests that upload_descriptor_to_all()
is in synch with pick_hsdir_v3(). That's not the case for the
reachability test which just compares the responsible hsdir sets.
There was a bug in upload_descriptor_to_all() where we picked between
first and second hsdir index based on which time segment we are. That's
not right and instead we should be uploading our two descriptors using a
different hsdir index every time. That is, upload first descriptor using
first hsdir index, and upload second descriptor using second hdsir index.
Also simplify stuff in pick_hdsir_v3() since that's only used to fetch
descriptors and hence we can just always use the fetch hsdir index.
Because of the latest changes on when we rotate, longer lifetime of
descriptors and no more overlap period, the tests needed to be improved to
test more functionnalities.
Signed-off-by: David Goulet <dgoulet@torproject.org>
First, this fixes#23372.
Second, the consensus timings for the build descriptor have been changed to
the current test can pass. More extensive tests of descriptor rotation are
coming in a commit near you because the rotation and time period logic has
been changed.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This is a large and important unit test for the hidden service version
3! It tests the service reachability for a client using different
consensus timings and makes sure that the computed hashring is the same
on both side so it is actually reachable.
Signed-off-by: David Goulet <dgoulet@torproject.org>
With the latest change on how we use the HSDir index, the client and service
need to pick their responsible HSDir differently that is depending on if they
are before or after a new time period.
The overlap mode is active function has been renamed for this and test added.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Because of #23387, we've realized that there is one scenario that makes
the client unable to reach the service because of a desynch in the time
period used. The scenario is as follows:
+------------------------------------------------------------------+
| |
| 00:00 12:00 00:00 12:00 00:00 12:00 |
| SRV#1 TP#1 SRV#2 TP#2 SRV#3 TP#3 |
| |
| $==========|-----------$===========|-----------$===========| |
| ^ ^ |
| C S |
+------------------------------------------------------------------+
In this scenario the HS has a newer consensus than the client, and the
HS just moved to the next TP but the client is still stuck on the old
one. However, the service is not in any sort of overlap mode so it
doesn't cover the old TP anymore, so the client is unable to fetch a
descriptor.
We've decided to solve this by extending the concept of overlap period
to be permanent so that the service always publishes two descriptors and
aims to cover clients with both older and newer consensuses. See the
spec patch in #23387 for more details.
Based on our #23387 findings, it seems like to maintain 24/7
reachability we need to employ different logic when computing hsdir
indices for fetching vs storing. That's to guarantee that the client
will always fetch the current descriptor, while the service will always
publish two descriptors aiming to cover all possible edge cases.
For more details see the next commit and the spec branch.
Signed-off-by: David Goulet <dgoulet@torproject.org>
The pruning process and the deleting ephemeral service function iterates over
all circuits and were asserting on rend_data for a matching circuit. This is
not good because now we have v3 circuits without a rend_data.
Fixes#23429
Signed-off-by: David Goulet <dgoulet@torproject.org>
Use the valid_after time from the consensus to get the time period number else
we might get out of sync with the overlap period that uses valid_after.
Make it an optional feature since some functions require passing a
specific time (like hs_get_start_time_of_next_time_period()).
Signed-off-by: David Goulet <dgoulet@torproject.org>
Undeprecate it;
rename it to TestingClientDNSRejectInternalAddresses;
add the old name as an alias;
reject configurations where it is set but TestingTorNetwork is not;
change the documentation accordingly.
Closes tickets 21031 and 21522.
Version 3 hidden service needs rendezvous point that have the protocol version
HSRend >= 2 else the rendezvous cells are rejected.
Fixes#23361
Signed-off-by: David Goulet <dgoulet@torproject.org>
There are two reasons this is likeliest to happen -- no kernel
support, and some bug in Tor. We'll ask people to check the former
before they report. Closes 23090.
The chdir() call in RunAsDaemon makes the behavior here surprising,
and either way of trying to resolve the surprise seems sure to
startle a significant fraction of users. Instead, let's refuse to
guess, and refuse these configurations.
Closes ticket 22731.
I'm doing this using the Proxy-Authorization: header to support
clients that understand it, and with a new tor-specific header that
makes more sense for our use.
By convention, a function that frobs a foo_t should be called
foo_frob, and it should have a foo_t * as its first argument. But
for many of the buf_t functions, the buf_t was the final argument,
which is silly.
Our convention is that functions which manipulate a type T should be
named T_foo. But the buffer functions were super old, and followed
all kinds of conventions. Now they're uniform.
Here's the perl I used to do this:
\#!/usr/bin/perl -w -i -p
s/read_to_buf\(/buf_read_from_socket\(/;
s/flush_buf\(/buf_flush_to_socket\(/;
s/read_to_buf_tls\(/buf_read_from_tls\(/;
s/flush_buf_tls\(/buf_flush_to_tls\(/;
s/write_to_buf\(/buf_add\(/;
s/write_to_buf_compress\(/buf_add_compress\(/;
s/move_buf_to_buf\(/buf_move_to_buf\(/;
s/peek_from_buf\(/buf_peek\(/;
s/fetch_from_buf\(/buf_get_bytes\(/;
s/fetch_from_buf_line\(/buf_get_line\(/;
s/fetch_from_buf_line\(/buf_get_line\(/;
s/buf_remove_from_front\(/buf_drain\(/;
s/peek_buf_startswith\(/buf_peek_startswith\(/;
s/assert_buf_ok\(/buf_assert_ok\(/;
This lets us drop the testing-only function buf_get_first_chunk_data(),
and lets us implement proto_http and proto_socks without looking at
buf_t internals.
The service needs the latest SRV and set of relays for the best accurate
hashring to upload its descriptor to so it needs a live consensus thus don't
do anything until we have it.
Fixes#23331
Signed-off-by: David Goulet <dgoulet@torproject.org>
When merging #20657, somehow hs_service_dir_info_changed() became unused
leading to not use the re-upload to HSDir when we were missing information
feature.
Turns out that it is not possible to pick an HSDir with a missing descriptor
because in order to compute the HSDir index, the descriptor is mandatory to
have so we can know its position on the hashring.
This commit removes that dead feature and fix the
hs_service_dir_info_changed() not being used.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Based on questions and comments from dgoulet, I've tried to fill
in the reasoning about why these functions work in the way that they
do, so that it will be easier for future programmers to understand
why this code exists and works the way it does.
We used to check if it was set to 0 which is what unused circuit have but when
the rendezvous circuit was cannibalized, the timestamp_dirty is not 0 but we
still need to reset it so we can actually use it without having the chance of
expiring the next second (or very soon).
Fixes#23123
Signed-off-by: David Goulet <dgoulet@torproject.org>
The function was never returning an error code on failure to parse the
OutboundAddress* options.
In the process, it was making our test_options_validate__outbound_addresses()
not test the right thing.
Fixes#23366
Signed-off-by: David Goulet <dgoulet@torproject.org>
This fixes a serious bug in our hsdir set change logic:
We used to add nodes in the list of previous hsdirs everytime we
uploaded to a new hsdir and we only cleared the list when we built a new
descriptor. This means that our prev_hsdirs list could end up with 7
hsdirs, if for some reason we ended up uploading our desc to 7 hsdirs
before rebuilding our descriptor (e.g. this can happen if the set of
hsdirs changed).
After our previous hdsir set had 7 nodes, then our old algorithm would
always think that the set has changed since it was comparing a smartlist
with 7 elements against a smartlist with 6 elements.
This commit fixes this bug, by clearning the prev_hsdirs list before we
upload to all hsdirs. This makes sure that our prev_hsdirs list always
contains the latest hsdirs!
Our logic for detecting hsdir set changes was needlessly compicated: we
had to sort smartlists and compare them.
Instead, we can simplify things by employing the following logic:
"We should reupload our descriptor if the latest HSDir set contains
nodes that were not previously there"
Since we can't be sure that we can unlink enough files on windows
here, let's let the number of permitted entries grow huge if it
really must.
We do this by letting the storagedir hold lots of entries, but still
trying to keep the number of entries under the configured limit. We
also have to tell consdiffmgr not to freak out if it can't actually
remove enough entries.
Part of a fix for bug 22752
Some parentheses were missing making the rend_max_intro_circs_per_period()
return a lower value than it was suppose to.
The calculation is that a service at most will open a number of intro points
that it wants which is 3 by default or HiddenServiceNumIntroductionPoints. Two
extra are launched for performance reason. Finally, this can happen twice for
two descriptors for the current and next time period.
From:
2 * n_intro_wanted + 2
...which resulted in 8 for 3 intro points, this commit fixes it to:
(n_intro_wanted + 2) * 2
... resulting in 12 possible intro point circuit which is the correct maximum
intro circuit allowed per period.
Last, this commit rate limits the the log message if we ever go above that
limit else over a INTRO_CIRC_RETRY_PERIOD, we can print it often!
Fixes#22159
Signed-off-by: David Goulet <dgoulet@torproject.org>
Since ssize_t is signed and might be 64 bits, we should use
tt_i64_op to make sure it's positive. Otherwise, if it is negative,
and we use tt_u64_op, we'll be treating it as a uint64_t, and we
won't detect negative values.
This fixes CID 1416338 and 1416339. Bug not in any released Tor.
That check was wrong:
a) We should be making sure that the size of `key` is big enough before
proceeding, since that's the buffer that we would overread with the
tor_memeq() below.
The old check used to check that `req_key_str` is big enough which is
not right, since we won't read deep into that buffer.
The new check makes sure that `key` has enough size to survive the
tor_memeq(), and if not it moves to the next element of the strmap.
b) That check shouldn't be a BUG since that strmap contains
variable-sized elements and we should not be bugging out if we happen
to compare a small sized element (v2) to a bigger one (v3).
This way, we can clear off the directory requests from our cache and thus
allow the next client to query those HSDir again at the next SOCKS connection.
Signed-off-by: David Goulet <dgoulet@torproject.org>
v3 client now cleans up the HSDir request cache when a connection to a service
was successful.
Closes#23308
Signed-off-by: David Goulet <dgoulet@torproject.org>
We used to not copy the state which means that after HUP we would forget
if we are in overlap mode or not. That caused bugs where the service
would enter overlap mode twice, and rotate its descs twice, causing all
sorts of bugs.
Apart from the fact that a newly allocated service doesn't have descriptors
thus the move condition can never be true, the service needs the descriptor
signing key to cross-certify the authentication key of each intro point so we
need to move the descriptors between services and not only the intro points.
Fixes#23056
Signed-off-by: David Goulet <dgoulet@torproject.org>
We refactor the descriptor reupload logic to be similar to the v2 logic
where we update a global 'consider_republishing_rend_descriptors' flag
and then we use that to check for hash ring changes during the global
hidden service callbacks.
This fixes bugs where we would inspect the hash ring immediately as we
receive new dirinfo (e.g. consensus) but before running the hidden
service housekeeping events. That was leaving us in an inconsistent
state wrt hsdir indices and causing bugs all around.
The problem was that when we went from overlap mode to non-overlap mode,
we were not wiping the 'desc_next' descriptor and instead we left it on
the service. This meant that all functions that iterated service
descriptors were also inspecting the useless 'desc_next' descriptor that
should have been deleted.
This commit refactors rotate_all_descriptors() so that it rotates
descriptor both when entering overlap mode and also when leaving it.
The `test-operator-cleanup` patch, and related coccinelle patches,
don't do any checks for line length. This patch fixes the line
length issues caused by the previous commits.
This patch fixes the operator usage in src/test/*.c to use the symbolic
operators instead of the normal C comparison operators.
This patch was generated using:
./scripts/coccinelle/test-operator-cleanup src/test/*.[ch]
A client can re-extend up to 3 intro points on the same circuit. This happens
when we get NACKed by the intro point for which we choose a new intro and
re-extend the circuit to it.
That process can be arbitrarly long so reset the dirty timestamp of the
circuit everytime we choose to re-extend so we get a bit more time to actually
do our introduction.
This is a client circuit so it is short live once opened thus giving us a bit
more time to complete the introduction is ok.
Signed-off-by: David Goulet <dgoulet@torproject.org>
When looking for an introduction circuit in circuit_get_best(), we log an info
message if we are about to launch a new intro circuit in parallel. However,
the condition was considering marked for close circuit leading to the function
triggering the log info even though there is actually no valid intro circuit.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Only register the RP circuit when it opens and not when we send the INTRODUCE1
cell else, when re-extending to a new IP, we would register the same RP
circuit with the same cookie twice leading to the circuit being closed.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Changed the assert_intro_circ_ok() to an almost non fatal function so tor can
recover properly. We keep the anonymity assert because if that is not right,
we have much deeper problems and client should stop sending bytes to the
network immediately.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This function has been replaced by hs_client_receive_rendezvous_acked(() doing
the same exact thing for both v2 and v3 service.
Signed-off-by: David Goulet <dgoulet@torproject.org>
The client needs to find the right intro point object from the circuit
identity digest it is opened to. This new function does that.
Signed-off-by: David Goulet <dgoulet@torproject.org>
New function named hs_cell_introduce1_data_clear() is introduced to clear off
an hs_cell_introduce1_data_t object.
Signed-off-by: David Goulet <dgoulet@torproject.org>
When a client decodes a descriptor, make sure it matches the expected blinded
key which is derived from the hidden service identity key.
Signed-off-by: David Goulet <dgoulet@torproject.org>
We can't trigger a valid upload because it would require us to MOCK a long
list of functions ultimately not really testing the upload because we aren't
on a running network.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Conflicts:
src/test/test_hs_service.c
Needed by the client when fetching a descriptor. This function checks the
directory purpose and hard assert if it is not for fetching.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This commit makes the client use the intro point state cache. It notes down
when we get a NACK from the intro point and then uses that cache to decide if
it should either close the circuits or re-extend to a new intro point.
This also introduces a very useful function that checks if an intro point is
usable that is query the state cache and checks a series of requirement.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This cache keeps track of the state of intro points which is needed when we
have failures when using them. It is similar to the failure cache of the
legacy system.
At this commit, it is unused but initialized, cleanup and freed.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This moves it to hs_client.c so it can be used by both system (legacy and
prop224). For now, only the legacy system uses it.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Don't assert() on rend_data when closing circuits to report an IP failure. The
prop224 code doesn't have yet the support for this.
Signed-off-by: David Goulet <dgoulet@torproject.org>
For now, prop224 doesn't have a mechanism to note down connection attempts so
we only do it for legacy system using rend_data.
Signed-off-by: David Goulet <dgoulet@torproject.org>
The client can now handle RENDEZVOUS2 cell when they arrive. This consolidate
both hidden service version in one function.
Signed-off-by: David Goulet <dgoulet@torproject.org>
The client is now able to handle an INTRODUCE_ACK cell and do the appropriate
actions.
An intro point failure cache is missing and a way to close all intro point
that were launched in parallel. Some notes are in the comment for that.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Add a function to parse an INTRODUCE_ACK cell in hs_cell.c. Furthermore, add
an enum that lists all possible expected status code.
Signed-off-by: David Goulet <dgoulet@torproject.org>
From an edge connection object, add a function that randomly pick an
introduction point for the requested service.
This follows the code design of rend_client_get_random_intro() and returns an
extend_info_t object ready to be used to extend to.
At this commit, it is not used yet.
Signed-off-by: David Goulet <dgoulet@torproject.org>
When a descriptor fetch has completed and it has been successfully stored in
the client cache, this callback will take appropriate actions to attach
streams and/or launch neede circuits to connect to the service.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Client now handles a RENDEZVOUS_ESTABLISHED cell when it arrives on the
rendezvous circuit. This new function applies for both the legacy system and
prop224.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Add a function to build the cell.
Add a the logic to send the cell when the rendezvous circuit opens.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Make a single entry point for the entire HS subsystem when a client circuit
opens (every HS version).
Signed-off-by: David Goulet <dgoulet@torproject.org>
Add a function in hs_cell.{c|h} for a client to build an INTRODUCE1 cell using
an object that contains all the needed keys to do so.
Add an entry point in hs_client.c that allows a tor client to send an
INTRODUCE1 cell on a given introduction circuit.
It includes the building of the cell, sending it and the setup of the
rendezvous circuit with the circuit identifier.
The entry point function is still unused at this commit.
Signed-off-by: David Goulet <dgoulet@torproject.org>
The hs circuit file had this function that takes a list of link specifiers and
return a newly allocated extend info object. Make it public so the client side
can also use it to be able to extend to introduction point.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Put all the possible assert() we can do on a client introduction circuit in
one helper function to make sure it is valid and usable.
It is disabled for now so gcc doesn't complain that we have a unused function.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This commit only moves code into a function. The client code will need a way
to take a bunch of descriptor link specifier object and encode them into link
specifiers objects.
Make this a public function so it can be used outside of hs_descriptor.c.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This will be useful to the hidden service subsystem that needs to go over all
connections of a certain state to attach them to a hidden service circuit.
Signed-off-by: David Goulet <dgoulet@torproject.org>
- Add tests that ensure that SOCKS requests for v2/v3 addresses get
intercepted and handled.
- Add test that stores and lookups an HS descriptor in the client-side cache.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Once a descriptor has been successfully downloaded from an HSDir, we flag the
directory connection to "has fetched descriptor" so the connection subsystem
doesn't trigger a new fetch on success.
Same has DIR_PURPOSE_HAS_FETCHED_RENDDESC_V2 but for prop224.
Signed-off-by: David Goulet <dgoulet@torproject.org>
- Also add tests for the hidserv_req subsystem.
- Introduce purge_v2_hidserv_req() wrapper to simplify v2 code.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Specifically move the pick_hsdir() function and all the HSDir request tracking
code. We plan to use all that code both for v2 and v3.
This commit only moves code.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Right now there's a single warn_if_unnamed flag for
router_get_consensus_status_by_nickname() and
node_get_by_nickname(), that is nearly always 1. I've turned it
into an 'unsigned' bitfield, and inverted its sense. I've added the
flags argument to node_get_by_hex_id() too, though it does nothing
there right now.
I've removed the router_get_consensus_status_by_nickname() function,
since it was only used in once place.
This patch changes the warning behavior of GETINFO ns/name/<name>,
since all other name lookups from the controller currently warn.
Later I'm going to add more flags, for ed25519 support.
We will need to edit this function, and it's already pretty huge. Let's make
it a bit smaller.
This commit moves code, fixes a 80 char line and add two lines at the start to
make it compile. Trivial change.
Signed-off-by: David Goulet <dgoulet@torproject.org>
We need this func so that we recognize SOCKS conns to v3 addresses.
- Also rename rend_valid_service_id() to rend_valid_v2_service_id()
- Also move parse_extended_hostname() tests to their own unittest, and
add a v3 address to the test as well.
Signed-off-by: David Goulet <dgoulet@torproject.org>
When we enter overlap mode we start using the next hsdir index of
relays. However, we only compute the next hsdir index of relays when we
receive a consensus or their descriptor. This means that there is a
window of time between entering the overlap period and fetching the
consensus where relays have their next hsdir index uninitialized. This
patch fixes this by recomputing all hsdir indices when we first enter
the overlap period.
We want to reupload our descriptor if its set of responsible HSDirs
changed to minimize reachability issues.
This patch adds a callback everytime we get new dirinfo which checks if
the hash ring changed and reuploads descriptor if needed.
Because the HS subsystem calls it every second, change the log level to debug
so it doesn't spam the info log.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Our Windows compiler treats "time_t" as long long int but Linux likes it
long int so cast those to make Windows happy.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Use the PATH_SEPARATOR for a path comparaison so it works with Windows as
well.
Partially fix#23223
Signed-off-by: David Goulet <dgoulet@torproject.org>
This change should improve overhead for downloading small numbers of
descriptors and microdescriptors by improving compression
performance and lowering directory request overhead.
Closes ticket 23220.
The old implementation would fail with super-long inputs. We never
gave it any, but still, it's nicer to dtrt here.
Reported by Guido Vranken. Fixes bug 19281.
- Fix various ssize_t/size_t confusions in the tests.
- Fix a weird memset argument:
"bad_memset: Argument -16 in memset loses precision in
memset(&desc_two->blinded_kp.pubkey.pubkey, -16, 32UL)."
- Fix check_after_deref instance in check_state_line_for_service_rev_counter():
"check_after_deref: Null-checking items suggests that it may be null,
but it has already been dereferenced on all paths leading to the
check."
Add a common function for both legacy and prop224 hidden service to increment
and decrement the rendezvous stream counter on an origin circuit.
Signed-off-by: David Goulet <dgoulet@torproject.org>
We used to have a small HS desc cert lifetime but those certs can stick
around for 36 hours if they get initialized in the beginning of overlap
period.
[warn] Bug: Non-fatal assertion !(hs_desc_encode_descriptor(desc->desc, &desc->signing_kp, &encoded_desc) < 0) failed in
upload_descriptor_to_hsdir at src/or/hs_service.c:1886. Stack trace: (on Tor 0.3.2.0-alpha-dev b4a14555597fb9b3)
I repurposed the old directory_request_set_hs_ident() into a new
directory_request_upload_set_hs_ident() which is only used for the
upload purpose and so it can assert on the dir_purpose.
When coding the client-side we can make a second function for fetch.
We used to do:
h = H(BLIND_STRING | H(A | s | B | N )
when we should be doing:
h = H(BLIND_STRING | A | s | B | N)
Change the logic so that hs_common.c does the hashing, and our ed25519
libraries just receive the hashed parameter ready-made. That's easier
than doing the hashing on the ed25519 libraries, since that means we
would have to pass them a variable-length param (depending on whether
's' is set or not).
Also fix the ed25519 test vectors since they were also double hashing.
We also had to alter the SRV functions to take a consensus as optional
input, since we might be setting our HSDir index using a consensus that
is currently being processed and won't be returned by the
networkstatus_get_live_consensus() function.
This change has two results:
a) It makes sure we are using a fresh consensus with the right SRV value
when we are calculating the HSDir hash ring.
b) It ensures that we will not use the sr_get_current/previous()
functions when we don't have a consensus which would have falsely
triggered the disaster SRV logic.
In Nick's words:
"We want to always return false if the platform is a Tor version, and it
is not as new as 0.3.0.8 -- but if the platform is not a Tor version, or
if the version is as new as 0.3.0.8, then we want to obey the protocol
list.
That way, other implementations of our protocol won't have to claim any
particular Tor version, and future versions of Tor will have the freedom
to drop this protocol in the distant future."
- Fix log message format string.
- Do extra circuit purpose check.
- wipe memory in a clear function
- Make sure we don't double add intro points in our list
- Make sure we don't double close intro circuits.
- s/tt_u64_op/tt_i64_op/
Turns out that introduction points don't care about the INTRODUCE2 cell
format as long as the top field is LEGACY_KEY_ID as expected. So let's
use a single INTRODUCE format regardless of the introduction point being
legacy or not.
This also removes the polymorphic void* situation.
Signed-off-by: David Goulet <dgoulet@torproject.org>
To upload the descriptor we needed a state file to write the rev counters in,
but that test did not have a state file initialized.
Also fix the typo in its func name.
Signed-off-by: David Goulet <dgoulet@torproject.org>
We consider to be in overlap mode when we are in the period of time between a
fresh SRV and the beginning of the new time period (in the normal network this
is between 00:00 and 12:00 UTC). This commit edits that function to use the
above semantic logic instead of absolute times.
Signed-off-by: David Goulet <dgoulet@torproject.org>
It used to be that time periods were 24 hours long even on chutney,
which made testing harder. With this commit, time periods have the same
length as a full SRV protocol run, which means that they will change
every 4 minutes in a 10-second voting interval chutney network!
Make this function public so we can use it both in hs_circuit.c and
hs_service.c to avoid code duplication.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This introduces a callback to relaunch a service rendezvous circuit when a
previous one failed to build or expired.
It unifies the legacy function rend_service_relaunch_rendezvous() with one for
specific to prop224. There is now only one entry point for that which is
hs_circ_retry_service_rendezvous_point() supporting both legacy and prop224
circuits.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Change the timing for intro point's lifetime and maximum amount of circuit we
are allowed to launch in a TestingNetwork. This is particurlarly useful for
chutney testing to test intro point rotation.
Signed-off-by: David Goulet <dgoulet@torproject.org>
When the circuit is about to be freed which has been marked close before, for
introduction circuit we now call this has_closed() callback so we can cleanup
any introduction point that have retried to many times or at least flag them
that their circuit is not established anymore.
Signed-off-by: David Goulet <dgoulet@torproject.org>
We used to use NULL subcredential which is a terrible terrible idea. Refactor
HS unittests to use subcredentials.
Also add some non-fatal asserts to make sure that we always use subcredentials
when decoding/encoding descs.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This commit refactors the handle_hs_exit_conn() function introduced at a prior
commit that connects the rendezvous circuit to the edge connection used to
connect to the service virtual port requested in a BEGIN cell.
The refactor adds the support for prop224 adding the
hs_service_set_conn_addr_port() function that has the same purpose has
rend_service_set_connection_addr_port() from the legacy code.
The rend_service_set_connection_addr_port() has also been a bit refactored so
the common code can be shared between the two HS subsystems (legacy and
prop224).
In terms of functionallity, nothing has changed, we still close the circuits
in case of failure for the same reasons as the legacy system currently does.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This commit simply moves the code from the if condition of a rendezvous
circuit to a function to handle such a connection. No code was modified
_except_ the use or rh.stream_id changed to n_stream->stream_id so we don't
have to pass the cell header to the function.
This is groundwork for prop224 support which will break down the
handle_hs_exit_conn() depending on the version of hidden service the circuit
and edge connection is for.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Introduction point are rotated either if we get X amounts of INTRODUCE2 cells
on it or a time based expiration. This commit adds two consensus parameters
which are the min and max value bounding the random value X.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Imagine a Tor network where you have only 8 nodes available due to some
reasons. And your hidden service wants 8 introduction points. Everything is
fine but then a node goes down bringing the network to 7. The service will
retry 3 times that node and then give up but keep it in a failure cache for 5
minutes (INTRO_CIRC_RETRY_PERIOD) so it doesn't retry it non stop and exhaust
the maximum number of circuit retry.
In the real public network today, this is unlikely to happen unless the
ExcludeNodes list is extremely restrictive.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This commit adds a directory command function to make an upload directory
request for a service descriptor.
It is not used yet, just the groundwork.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This hsdir index value is used to give an index value to all node_t (relays)
that supports HSDir v3. An index value is then computed using the blinded key
to know where to fetch/upload the service descriptor from/to.
To avoid computing that index value everytime the client/service needs it, we
do that everytime we get a new consensus which then doesn't change until the
next one. The downside is that we need to sort them once we need to compute
the set of responsible HSDir.
Finally, the "hs_index" function is also added but not used. It will be used
in later commits to compute which node_t is a responsible HSDir for the
service we want to fetch/upload the descriptor.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Add this helper function that can lookup and return all the needed object from
a circuit identifier. It is a pattern we do often so make it nicer and avoid
duplicating it everywhere.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Add the entry point from the circuit subsystem of "circuit has opened" which
is for all type of hidden service circuits. For the introduction point, this
commit actually adds the support for handling those circuits when opened and
sending ESTABLISH_INTRO on a circuit.
Rendevzou point circuit aren't supported yet at this commit.
Signed-off-by: David Goulet <dgoulet@torproject.org>
This commit adds the functionality for a service to build its descriptor.
Also, a global call to build all descriptors for all services is added to the
service scheduled events.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Add the main loop entry point to the HS service subsystem. It is run every
second and make sure that all services are in their quiescent state after that
which means valid descriptors, all needed circuits opened and latest
descriptors have been uploaded.
For now, only v2 is supported and placeholders for v3 actions for that main
loop callback.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Add a function for both the client and service side that is building a blinded
key from a keypair (service) and from a public key (client). Those two
functions uses the current time period information to build the key.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Add a new and free function for hs_desc_intro_point_t so the service can use
them to setup those objects properly.
Signed-off-by: David Goulet <dgoulet@torproject.org>
The GNU C Library (glibc) offers an function which allocates the
necessary memory automatically [0]. When it is available, we use that.
Otherwise we depend upon the `getcwd` function which requires a
preallocated buffer (and its size). This function was used incorrectly
by depending on the initial buffer size being big enough and otherwise
failing to return the current working directory. The proper way of
getting the current working directory requires a loop which doubles the
buffer size if `getcwd` requires it. This code was copied from [1] with
modifications to fit the context.
[0] https://www.gnu.org/software/hurd/hurd/porting/guidelines.html
[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/getcwd.html
Fixes bug 23106; bugfix on 0.2.4.8-alpha.
Fortunately, we only support big-endian and little-endian platforms,
and on both of those, hton*() and ntoh*() behave the same. And if
we did start to support middle endian systems (haha, no), most of
_those_ have hton*(x) == ntoh*(x) too.
Since we start with desc_clean_since = 0, we should have been
starting with non-null desc_dirty_reason.
Fixes bug 22884; bugfix on 0.2.3.4-alpha when X-Desc-Gen-Reason was
added.
Patch from Vort; fixes bug 23081; bugfix on fd992deeea in
0.2.1.16-rc when set_main_thread() was introduced.
See the changes file for a list of all the symptoms this bug has
been causing when running Tor as a Windows Service.
The condition was always true meaning that we would reconsider updating our
directory information every 2 minutes.
If valid_until is 6am today, then now - 24h == 1pm yesterday which means that
"valid_until < (now - 24h)" is false. But at 6:01am tomorrow, "valid_until <
(now - 24h)" becomes true which is that point that we shouldn't trust the
consensus anymore.
Fixes#23091
Signed-off-by: David Goulet <dgoulet@torproject.org>
One log statement was a warning and has been forgotten. It is triggered for a
successful attempt at introducting from a client.
It has been reported here:
https://lists.torproject.org/pipermail/tor-relays/2017-August/012689.html
Three other log_warn() statement changed to protocol warning because they are
errors that basically can come from the network and thus triggered by anyone.
Fixes#23078.
Signed-off-by: David Goulet <dgoulet@torproject.org>
We don't actually want Coverity to complain when a BUG() check can
never fail, since such checks can prevent us from introducing bugs
later on.
Closes ticket 23054. Closes CID 1415720, 1415724.
Now that half the threads are permissive and half are strict, we
need to make sure we have at least two threads, so that we'll
have at least one of each kind.
Instead of choosing a lower-priority job with a 1/37 chance, have
the chance be 1/37 for half the threads, and 1/2147483647 for the
other half. This way if there are very slow jobs of low priority,
they shouldn't be able to grab all the threads when there is better
work to do.
Each piece of queued work now has an associated priority value; each
priority goes on a separate queue.
With probability (N-1)/N, the workers will take work from the highest
priority nonempty queue. Otherwise, they'll look for work in a
queue of lower priority. This behavior is meant to prevent
starvation for lower-priority tasks.
In the Linux kernel, the BUG() macro causes an instant panic. Our
BUG() macro is different, however: it generates a nonfatal assertion
failure, and is usable as an expression.
Additionally, this patch tells util_bug.h to make all assertion
failures into fatal conditions when we're building with a static
analysis tool, so that the analysis tool can look for instances
where they're reachable.
Fixes bug 23030.
Wow, it sure seems like some compilers can't implement isnan() and
friends in a way that pleases themselves!
Fixes bug 22915. Bug trigged by 0.2.8.1-alpha and later; caused by
clang 4.
A prop224 descriptor was missing the onion key for an introduction point which
is needed to extend to it by the client.
Closes#22979
Signed-off-by: David Goulet <dgoulet@torproject.org>
Remove the legacy intro point key because both service and client only uses
the ed25519 key even though the intro point chosen is a legacy one.
This also adds the CLIENT_PK key that is needed for the ntor handshake.
Signed-off-by: David Goulet <dgoulet@torproject.org>
We need to keep these around for TAP and old-style hidden services,
but they're obsolete, and we shouldn't encourage anyone to use them.
So I've added "obsolete" to their names, and a comment explaining
what the problem is.
Closes ticket 23026.
Closes bug 22964. Based on Teor's replacement there, but tries
to put the comment in a more logical place, and explain why we're
actually disabling compression in the first place.
There isn't much of a point of this buggy test afterall to add twice the same
service object but with a different key which ultinately can end up failing
the test because 1/N_BUCKETS of probability that we end up to put the service
in the same bucket.
Fixes#23023
Signed-off-by: David Goulet <dgoulet@torproject.org>
In zstd 1.3.0, once you have called ZSTD_endStream and been told
that your putput buffer is full, it really doesn't want you to call
ZSTD_compressStream again. ZSTD 1.2.0 didn't seem to mind about
this.
This patch fixes the issue by making sure never to call
ZSTD_endStream if there's any more data on the input buffer to
process, by flushing even when we're about to call "endStream", and
by never calling "compress" or "flush" after "endStream".
Fix for 22924. Bugfix on 0.2.9.1-alpha when the test was introducd
-- though it couldn't actually overflow until we fixed 17750.
Additionally, this only seems to overflow on 32-bit, and only when
the compiler doesn't re-order the (possibly dead) assignment out of
the way. We ran into it on a 32-bit ubuntu trusty builder.