This patch moves the measured_bw field and the has_measured_bw field
into vote_routerstatus_t, since only votes have 'Measured=XX' set on
their weight line.
I also added a new bw_is_unmeasured flag to routerstatus_t to
represent the Unmeasured=1 flag on a w line. Previously, I was using
has_measured_bw for this, which was quite incorrect: has_measured_bw
means that the measured_bw field is set, and it's probably a mistake
to have it serve double duty as meaning that 'baandwidth' represents a
measured value.
While making this change,I also found a harmless but stupid bug in
dirserv_read_measured_bandwidths: It assumes that it's getting a
smartlist of routerstatus_t, when really it's getting a smartlist of
vote_routerstatus_t. C's struct layout rules mean that we could never
actually get an error because of that, but it's still quite incorrect.
I fixed that, and in the process needed to add two more sorting and
searching helpers.
Finally, I made the Unmeasured=1 flag get parsed. We don't use it for
anything yet, but someday we might.
This isn't complete yet -- the new 2286 unit test doesn't build.
Instead of capping whenever a router has fewer than 3 measurements,
we cap whenever a router has fewer than 3 measurements *AND* there
are at least 3 authorities publishing measured bandwidths.
We also generate bandwidth lines with a new "Unmeasured=1" flag,
meaning that we didn't have enough observations for a node to use
measured bandwidth values in the authority's input, whether we capped
it or not.
Stop marking every relay as having been down for one hour every
time we restart a directory authority. These artificial downtimes
were messing with our Stable and Guard flag calculations.
Fixes bug 8218 (introduced by the fix for 1035). Bugfix on 0.2.2.23-alpha.
Apparently something in the directory guard code made it possible
for the same node to get added as a guard over and over when there
were no actual running guard nodes.
Relays used to check every 10 to 60 seconds, as an accidental side effect
of calling directory_fetches_from_authorities() when considering doing
a directory fetch. The fix for bug 1992 removes that side effect. At the
same time, bridge relays never had the side effect, leading to confused
bridge operators who tried crazy tricks to get their bridges to notice
IP address changes (see ticket 1913).
The new behavior is to reinstate an every-60-seconds check for both
public relays and bridge relays, now that the side effect is gone.
For example, we were doing a resolve every time we think about doing a
directory fetch. Now we reuse the cached answer in some cases.
Fixes bugs 1992 (bugfix on 0.2.0.20-rc) and 2410 (bugfix on
0.1.2.2-alpha).
When we compute the estimated microseconds we need to handle our
pending onionskins, we could (in principle) overflow a uint32_t if
we ever had 4 million pending onionskins before we had any data
about how onionskins take. Nevertheless, let's compute it properly.
Fixes bug 8210; bugfix on 0.2.4.10. Found by coverity; this is CID
980651.
Coverity is worried that we're checking entry_conn in some cases,
but not in the case where we set entry_conn->pending_optimistic_data.
This commit should calm it down (CID 718623).
The refactoring in commit 471ab34032 wasn't complete enough: we
were checking the auth_len variable, but never actually setting it,
so it would never seem that authentication had been provided.
This commit also removes a bunch of unused variables from
rend_service_introduce, whose unusedness we hadn't noticed because
we were wiping them at the end of the function.
Fix for bug 8207; bugfix on 0.2.4.1-alpha.
It returns the method by which we decided our public IP address
(explicitly configured, resolved from explicit hostname, guessed from
interfaces, learned by gethostname).
Now we can provide more helpful log messages when a relay guesses its IP
address incorrectly (e.g. due to unexpected lines in /etc/hosts). Resolves
ticket 2267.
While we're at it, stop sending a stray "(null)" in some cases for the
server status "EXTERNAL_ADDRESS" controller event. Resolves bug 8200.
This check isn't necessary (see comment on #7801), but it took at
least two smart people a little while to see why it wasn't necessary,
so let's have it in to make the code more readable.
We need a weak RNG in a couple of places where the strong RNG is
both needless and too slow. We had been using the weak RNG from our
platform's libc implementation, but that was problematic (because
many platforms have exceptionally horrible weak RNGs -- like, ones
that only return values between 0 and SHORT_MAX) and because we were
using it in a way that was wrong for LCG-based weak RNGs. (We were
counting on the low bits of the LCG output to be as random as the
high ones, which isn't true.)
This patch adds a separate type for a weak RNG, adds an LCG
implementation for it, and uses that exclusively where we had been
using the platform weak RNG.
If we're deciding on a node's bandwidth based on "Bandwidth="
declarations, clip it to "20" or to the maxunmeasuredbw parameter,
if it's voted on.
This adds a new consensus method.
This is "part A" of bug 2286
Authorities don't set is_possible_guard on node_t, so they were
never deciding that they could build enough paths. This is a quick
and dirty fix.
Bug not in any released version of Tor
These seem to have gotten conflicted out of existence while mike was
working on path bias stuff.
Thanks to sysrqb for collecting these in a handy patch.
Now we can specify to skip bridges that wouldn't be able to answer the
type of dir fetch we're launching.
It's still the responsibility of the rest of the code to prevent us from
launching a given dir fetch if we have no bridges that could handle it.
Now as we move into a future where most bridges can handle microdescs
we will generally find ourselves using them, rather than holding back
just because one of our bridges doesn't use them.
When we first implemented TLS, we assumed in conneciton_handle_write
that a TOR_TLS_WANT_WRITE from flush_buf_tls meant that nothing had
been written. But when we moved our buffers to a ring buffer
implementation back in 0.1.0.5-rc (!), we broke that invariant: it's
possible that some bytes have been written but nothing.
That's bad. It means that if we do a sequence of TLS writes that ends
with a WANTWRITE, we don't notice that we flushed any bytes, and we
don't (I think) decrement buckets.
Fixes bug 7708; bugfix on 0.1.0.5-rc
Also, deprecate the torrc options for the scaling values. It's unlikely anyone
but developers will ever tweak them, even if we provided a single ratio value.
This is meant to avoid conflict with the built-in log() function in
math.h. It resolves ticket 7599. First reported by dhill.
This was generated with the following perl script:
#!/usr/bin/perl -w -i -p
s/\blog\(LOG_(ERR|WARN|NOTICE|INFO|DEBUG)\s*,\s*/log_\L$1\(/g;
s/\blog\(/tor_log\(/g;
Instead of hardcoding the minimum fraction of possible paths to 0.6, we
take it from the user, and failing that from the consensus, and
failing that we fall back to 0.6.
Previously we did this based on the fraction of descriptors we
had. But really, we should be going based on what fraction of paths
we're able to build based on weighted bandwidth, since otherwise a
directory guard or two could make us behave quite oddly.
Implementation for feature 5956
If any circuits were opened during a scaling event, we were scaling attempts
and successes by different amounts. This leads to rounding error.
The fix is to record how many circuits are in a state that hasn't been fully
counted yet, and subtract that before scaling, and add it back afterwords.
Since they use RELAY_EARLY (which can be seen by all hops on the path),
it's not safe to say they actually count as a successful use.
There are also problems with trying to allow them to finish extending due to
the circuit purpose state machine logic. It is way less complicated (and
possibly more semantically coherent) to simply wait until we actually try to
do something with them before claiming we 'used' them.
Also, we shouldn't call timed out circuits 'used' either, for semantic
consistency.
An adversary could let the first stream request succeed (ie the resolve), but
then tag and timeout the remainder (via cell dropping), forcing them on new
circuits.
Rolling back the state will cause us to probe such circuits, which should lead
to probe failures in the event of such tagging due to either unrecognized
cells coming in while we wait for the probe, or the cipher state getting out
of sync in the case of dropped cells.
Path use bias measures how often we can actually succeed using the circuits we
actually try to use. It is a subset of path bias accounting, but it is
computed as a separate statistic because the rate of client circuit use may
vary depending on use case.
This is a minimal refactoring to expose the weighted bandwidth
calculations for each node so I can use them to see what fraction of
nodes, weighted by bandwidth, we have descriptors for.
This is ticket 7706, reported by "bugcatcher." The rationale here
is that if somebody says 'ExcludeNodes {tv}', then they probably
don't just want to block definitely Tuvaluan nodes: they also want
to block nodes that have unknown country, since for all they know
such nodes are also in Tuvalu.
This behavior is controlled by a new GeoIPExcludeUnknown autobool
option. With the default (auto) setting, we exclude ?? and A1 if
any country is excluded. If the option is 1, we add ?? and A1
unconditionally; if the option is 0, we never add them.
(Right now our geoip file doesn't actually seem to include A1: I'm
including it here in case it comes back.)
This feature only takes effect if you have a GeoIP file. Otherwise
you'd be excluding every node.
The implementation is pretty straightforward: parse_extended_hostname() is
modified to drop any leading components from an address like
'foo.aaaaaaaaaaaaaaaa.onion'.
This is an automatically generated commit, from the following perl script,
run with the options "-w -i -p".
s/smartlist_string_num_isin/smartlist_contains_int_as_string/g;
s/smartlist_string_isin((?:_case)?)/smartlist_contains_string$1/g;
s/smartlist_digest_isin/smartlist_contains_digest/g;
s/smartlist_isin/smartlist_contains/g;
s/digestset_isin/digestset_contains/g;
In 6fbdf635 we added a couple of statements like:
if (test) {
...
};
The extraneous semicolons there get flagged as worrisome empty
statements by the cparser library, so let's fix them.
Patch by Christian Grothoff; fixes bug 7115.
Otherwise, it's possible to create streams or circuits with these
bogus IDs, leading to orphaned circuits or streams, or to ones that
can cause bandwidth DOS problems.
Fixes bug 7889; bugfix on all released Tors.
In general, if we tried to use a circ for a stream, but then decided to place
that stream on a different circuit, we need to probe the original circuit
before deciding it was a "success".
We also need to do the same for cannibalized circuits that go unused.
This makes removing items from the middle of the queue into an O(1)
operation, which could prove important as we let onionqueues grow
longer.
Doing this actually makes the code slightly smaller, too.
The right way to set "MaxOnionsPending" was to adjust it until the
processing delay was appropriate. So instead, let's measure how long
it takes to process onionskins (sampling them once we have a big
number), and then limit the queue based on its expected time to
finish.
This change is extra-necessary for ntor, since there is no longer a
reasonable way to set MaxOnionsPending without knowing what mix of
onionskins you'll get.
This patch also reserves 1/3 of the onionskin spots for ntor
handshakes, on the theory that TAP handshakes shouldn't be allowed to
starve their speedier cousins. We can change this later if need be.
Resolves 7291.
The unit of work sent to a cpuworker is now a create_cell_t; its
response is now a created_cell_t. Several of the things that call or
get called by this chain of logic now take create_cell_t or
created_cell_t too.
Since all cpuworkers are forked or spawned by Tor, they don't need a
stable wire protocol, so we can just send structs. This saves us some
insanity, and helps p
As elsewhere, it makes sense when adding or extending a cell type to
actually make the code to parse it into a separate tested function.
This commit doesn't actually make anything use these new functions;
that's for a later commit.
The handshake_digest field was never meaningfully a digest *of* the
handshake, but rather is a digest *from* the handshake that we exapted
to prevent replays of ESTABLISH_INTRO cells. The ntor handshake will
generate it as more key material rather than taking it from any part
of the circuit handshake reply..
I'm going to want a generic "onionskin" type and set of wrappers, and
for that, it will be helpful to isolate the different circuit creation
handshakes. Now the original handshake is in onion_tap.[ch], the
CREATE_FAST handshake is in onion_fast.[ch], and onion.[ch] now
handles the onion queue.
This commit does nothing but move code and adjust header files.
Here we try to handle curve25519 onion keys from generating them,
loading and storing them, publishing them in our descriptors, putting
them in microdescriptors, and so on.
This commit is untested and probably buggy like whoa
This patch moves curve25519_keypair_t from src/or/onion_ntor.h to
src/common/crypto_curve25519.h, and adds new functions to generate,
load, and store keypairs.
The ntor handshake--described in proposal 216 and in a paper by
Goldberg, Stebila, and Ustaoglu--gets us much better performance than
our current approach.
Our old warn_nonlocal_client_ports() would give a bogus warning for
every nonlocal port every time it parsed any ports at all. So if it
parsed a nonlocal socksport, it would complain that it had a nonlocal
socksport...and then turn around and complain about the nonlocal
socksport again, calling it a nonlocal transport or nonlocal dnsport,
if it had any of those.
Fixes bug 7836; bugfix on 0.2.3.3-alpha.
mr-4 reports on #7799 that he was seeing it several times per second,
which suggests that things had gone very wrong.
This isn't a real fix, but it should make Tor usable till we can
figure out the real issue.