When we're hibernating, the main reqason we can't bootstrap will
always be that we're hibernating: reporting anything else at severity
WARN is pointless.
Fixes part of 7302.
This time, I'm checking whether our calculated offset matches our
real offset, in each case, as we go along. I don't think this is
the bug, but it can't hurt to check.
This should have been 2 bytes all along, since version numbers can
be 16 bits long. This isn't a live bug, since the call to
is_or_protocol_version_known in channel_tls_process_versions_cell
will reject any version number not in the range 1..4. Still, let's
fix this before we accidentally start supporting version 256.
Reported pseudonymously. Fixes bug 8062; bugfix on 0.2.0.10-alpha --
specifically, on commit 6fcda529, where during development I
increased the width of a version to 16 bits without changing the
type of link_proto.
Our ++ should have been += 2. This means that we'd accept version
numbers even when they started at an odd position.
This bug should be harmless in practice for so long as every version
number we allow begins with a 0 byte, but if we ever have a version
number starting with 1, 2, 3, or 4, there will be trouble here.
Fix for bug 8059, reported pseudonymously. Bugfix on 0.2.0.10-alpha
-- specifically, commit 6fcda529, where during development I
increased the width of a version to 16 bits without changing the
loop step.
I have no idea whether b0rken clients will DoS the network if the v2
authorities all turn this on or not. It's experimental. See #6783 for
a description of how to test it more or less safely, and please be
careful!
Now that circid_t is 4 bytes long, the default integer promotions will
leave it alone when sizeof(int) == 4, which will leave us formatting an
unsigned as an int. That's technically undefined behavior.
Fixes bug 8447 on bfffc1f0fc. Bug not
in any released Tor.
In a number of places, we decrement timestamp_dirty by
MaxCircuitDirtiness in order to mark a stream as "unusable for any
new connections.
This pattern sucks for a few reasons:
* It is nonobvious.
* It is error-prone: decrementing 0 can be a bad choice indeed.
* It really wants to have a function.
It can also introduce bugs if the system time jumps backwards, or if
MaxCircuitDirtiness is increased.
So in this patch, I add an unusable_for_new_conns flag to
origin_circuit_t, make it get checked everywhere it should (I looked
for things that tested timestamp_dirty), and add a new function to
frob it.
For now, the new function does still frob timestamp_dirty (after
checking for underflow and whatnot), in case I missed any cases that
should be checking unusable_for_new_conns.
Fixes bug 6174. We first used this pattern in 516ef41ac1,
which I think was in 0.0.2pre26 (but it could have been 0.0.2pre27).
Without this patch, there's no way to know what went wrong when we
fail to parse a torrc line entirely (that is, we can't turn it into
a K,V pair.) This patch introduces a new function that yields an
error message on failure, so we can at least tell the user what to
look for in their nonfunctional torrc.
(Actually, it's the same function as before with a new name:
parse_config_line_from_str is now a wrapper macro that the unit
tests use.)
Fixes bug 7950; fix on 0.2.0.16-alpha (58de695f90) which first
introduced the possibility of a torrc value not parsing correctly.
This patch moves the measured_bw field and the has_measured_bw field
into vote_routerstatus_t, since only votes have 'Measured=XX' set on
their weight line.
I also added a new bw_is_unmeasured flag to routerstatus_t to
represent the Unmeasured=1 flag on a w line. Previously, I was using
has_measured_bw for this, which was quite incorrect: has_measured_bw
means that the measured_bw field is set, and it's probably a mistake
to have it serve double duty as meaning that 'baandwidth' represents a
measured value.
While making this change,I also found a harmless but stupid bug in
dirserv_read_measured_bandwidths: It assumes that it's getting a
smartlist of routerstatus_t, when really it's getting a smartlist of
vote_routerstatus_t. C's struct layout rules mean that we could never
actually get an error because of that, but it's still quite incorrect.
I fixed that, and in the process needed to add two more sorting and
searching helpers.
Finally, I made the Unmeasured=1 flag get parsed. We don't use it for
anything yet, but someday we might.
This isn't complete yet -- the new 2286 unit test doesn't build.
Instead of capping whenever a router has fewer than 3 measurements,
we cap whenever a router has fewer than 3 measurements *AND* there
are at least 3 authorities publishing measured bandwidths.
We also generate bandwidth lines with a new "Unmeasured=1" flag,
meaning that we didn't have enough observations for a node to use
measured bandwidth values in the authority's input, whether we capped
it or not.
Stop marking every relay as having been down for one hour every
time we restart a directory authority. These artificial downtimes
were messing with our Stable and Guard flag calculations.
Fixes bug 8218 (introduced by the fix for 1035). Bugfix on 0.2.2.23-alpha.
Apparently something in the directory guard code made it possible
for the same node to get added as a guard over and over when there
were no actual running guard nodes.
Relays used to check every 10 to 60 seconds, as an accidental side effect
of calling directory_fetches_from_authorities() when considering doing
a directory fetch. The fix for bug 1992 removes that side effect. At the
same time, bridge relays never had the side effect, leading to confused
bridge operators who tried crazy tricks to get their bridges to notice
IP address changes (see ticket 1913).
The new behavior is to reinstate an every-60-seconds check for both
public relays and bridge relays, now that the side effect is gone.
For example, we were doing a resolve every time we think about doing a
directory fetch. Now we reuse the cached answer in some cases.
Fixes bugs 1992 (bugfix on 0.2.0.20-rc) and 2410 (bugfix on
0.1.2.2-alpha).
When we compute the estimated microseconds we need to handle our
pending onionskins, we could (in principle) overflow a uint32_t if
we ever had 4 million pending onionskins before we had any data
about how onionskins take. Nevertheless, let's compute it properly.
Fixes bug 8210; bugfix on 0.2.4.10. Found by coverity; this is CID
980651.
Coverity is worried that we're checking entry_conn in some cases,
but not in the case where we set entry_conn->pending_optimistic_data.
This commit should calm it down (CID 718623).
The refactoring in commit 471ab34032 wasn't complete enough: we
were checking the auth_len variable, but never actually setting it,
so it would never seem that authentication had been provided.
This commit also removes a bunch of unused variables from
rend_service_introduce, whose unusedness we hadn't noticed because
we were wiping them at the end of the function.
Fix for bug 8207; bugfix on 0.2.4.1-alpha.
It returns the method by which we decided our public IP address
(explicitly configured, resolved from explicit hostname, guessed from
interfaces, learned by gethostname).
Now we can provide more helpful log messages when a relay guesses its IP
address incorrectly (e.g. due to unexpected lines in /etc/hosts). Resolves
ticket 2267.
While we're at it, stop sending a stray "(null)" in some cases for the
server status "EXTERNAL_ADDRESS" controller event. Resolves bug 8200.
This check isn't necessary (see comment on #7801), but it took at
least two smart people a little while to see why it wasn't necessary,
so let's have it in to make the code more readable.
We need a weak RNG in a couple of places where the strong RNG is
both needless and too slow. We had been using the weak RNG from our
platform's libc implementation, but that was problematic (because
many platforms have exceptionally horrible weak RNGs -- like, ones
that only return values between 0 and SHORT_MAX) and because we were
using it in a way that was wrong for LCG-based weak RNGs. (We were
counting on the low bits of the LCG output to be as random as the
high ones, which isn't true.)
This patch adds a separate type for a weak RNG, adds an LCG
implementation for it, and uses that exclusively where we had been
using the platform weak RNG.
If we're deciding on a node's bandwidth based on "Bandwidth="
declarations, clip it to "20" or to the maxunmeasuredbw parameter,
if it's voted on.
This adds a new consensus method.
This is "part A" of bug 2286
Authorities don't set is_possible_guard on node_t, so they were
never deciding that they could build enough paths. This is a quick
and dirty fix.
Bug not in any released version of Tor
These seem to have gotten conflicted out of existence while mike was
working on path bias stuff.
Thanks to sysrqb for collecting these in a handy patch.
Now we can specify to skip bridges that wouldn't be able to answer the
type of dir fetch we're launching.
It's still the responsibility of the rest of the code to prevent us from
launching a given dir fetch if we have no bridges that could handle it.
Now as we move into a future where most bridges can handle microdescs
we will generally find ourselves using them, rather than holding back
just because one of our bridges doesn't use them.
When we first implemented TLS, we assumed in conneciton_handle_write
that a TOR_TLS_WANT_WRITE from flush_buf_tls meant that nothing had
been written. But when we moved our buffers to a ring buffer
implementation back in 0.1.0.5-rc (!), we broke that invariant: it's
possible that some bytes have been written but nothing.
That's bad. It means that if we do a sequence of TLS writes that ends
with a WANTWRITE, we don't notice that we flushed any bytes, and we
don't (I think) decrement buckets.
Fixes bug 7708; bugfix on 0.1.0.5-rc
Also, deprecate the torrc options for the scaling values. It's unlikely anyone
but developers will ever tweak them, even if we provided a single ratio value.
This is meant to avoid conflict with the built-in log() function in
math.h. It resolves ticket 7599. First reported by dhill.
This was generated with the following perl script:
#!/usr/bin/perl -w -i -p
s/\blog\(LOG_(ERR|WARN|NOTICE|INFO|DEBUG)\s*,\s*/log_\L$1\(/g;
s/\blog\(/tor_log\(/g;
Instead of hardcoding the minimum fraction of possible paths to 0.6, we
take it from the user, and failing that from the consensus, and
failing that we fall back to 0.6.
Previously we did this based on the fraction of descriptors we
had. But really, we should be going based on what fraction of paths
we're able to build based on weighted bandwidth, since otherwise a
directory guard or two could make us behave quite oddly.
Implementation for feature 5956
If any circuits were opened during a scaling event, we were scaling attempts
and successes by different amounts. This leads to rounding error.
The fix is to record how many circuits are in a state that hasn't been fully
counted yet, and subtract that before scaling, and add it back afterwords.
Since they use RELAY_EARLY (which can be seen by all hops on the path),
it's not safe to say they actually count as a successful use.
There are also problems with trying to allow them to finish extending due to
the circuit purpose state machine logic. It is way less complicated (and
possibly more semantically coherent) to simply wait until we actually try to
do something with them before claiming we 'used' them.
Also, we shouldn't call timed out circuits 'used' either, for semantic
consistency.
An adversary could let the first stream request succeed (ie the resolve), but
then tag and timeout the remainder (via cell dropping), forcing them on new
circuits.
Rolling back the state will cause us to probe such circuits, which should lead
to probe failures in the event of such tagging due to either unrecognized
cells coming in while we wait for the probe, or the cipher state getting out
of sync in the case of dropped cells.