We really should ignore any timeouts that have *no* network activity for their
entire measured lifetime, now that we have the 95th percentile measurement
changes. Usually this is up to a minute, even on fast connections.
If we really want all this complexity for these stages here, we need to handle
it better for people with large timeouts. It should probably go away, though.
Rechecking the timeout condition was foolish, because it is checked on the
same codepath. It was also wrong, because we didn't round.
Also, the liveness check itself should be <, and not <=, because we only have
1 second resolution.
Specifically, a circ attempt that we'd launched while the network was
down could timeout after we've marked our entrynodes up, marking them
back down again. The fix is to annotate as bad the OR conns that were
around before we did the retry, so if a circuit that's attached to them
times out we don't do anything about it.
This is needed for IOCP, since telling the IOCP backend about all
your CPUs is a good idea. It'll also come in handy with asn's
multithreaded crypto stuff, and for people who run servers without
reading the manual.
automake 1.6 doesn't like using a conditional += to add stuff to foo_LDADD.
Instead you need to conditionally define a variable, then non-conditionally
put that variable in foo_LDADD.
This requires the latest Git version of Libevent as of 24 March 2010.
In the future, we'll just say it requires Libevent 2.0.5-alpha or
later.
Since Libevent doesn't yet support hierarchical rate limit groups,
there isn't yet support for tracking relayed-bytes separately when
using the bufferevent system. If a future version does add support
for hierarchical buckets, we can add that back in.
Roger correctly pointed out that my code was broken for accounting
periods that shifted forwards, since
start_of_accounting_period_containing(interval_start_time) would not
be equal to interval_start_time, but potentially much earlier.
The RefuseUnknownExits config option is now a tristate, with "1"
meaning "enable it no matter what the consensus says", "0" meaning
"disable it no matter what the consensus says", and "auto" meaning "do
what the consensus says". If the consensus is silent, we enable
RefuseUnknownExits.
This patch also changes the dirserv logic so that refuseunknownexits
won't make us cache unless we're an exit.
Do not double-report signatures from unrecognized authorities both as
"from unknown authority" and "not present". Fixes bug 1956, bugfix on
0.2.2.16-alpha.
Our attempt to make compilation work on old versions of Windows
again while keeping wince compatibility broke the build for Win2k+.
helix reports this patch fixes the issue for WinXP. Bugfix on
0.2.2.15-alpha; related to bug 1797.
When the CellStatistics option is off, we don't store cell insertion
times. Doing so would also not be very smart, because there seem to
still be some performance issues with this type of statistics. Nothing
harmful happens when we don't have insertion times, so we don't need to
alarm the user.
Previously[*], the function would start with the first stream on the
circuit, and let it package as many cells as it wanted before
proceeding to the next stream in turn. If a circuit had many live
streams that all wanted to package data, the oldest would get
preference, and the newest would get ignored.
Now, we figure out how many cells we're willing to send per stream,
and try to allocate them fairly.
Roger diagnosed this in the comments for bug 1298.
[*] This bug has existed since before the first-ever public release
of Tor. It was added by r152 of Tor on 26 Jan 2003, which was
the first commit to implement streams (then called "topics").
This is not the oldest bug to be fixed in 0.2.2.x: that honor
goes to the windowing bug in r54, which got fixed in e50b7768 by
Roger with diagnosis by Karsten. This is, however, the most
long-lived bug to be fixed in 0.2.2.x: the r54 bug was fixed
2580 days after it was introduced, whereas I am writing this
commit message 2787 days after r152.
I'm going to use this to implement more fairness in
circuit_resume_edge_reading_helper in an attempt to fix bug 1298.
(Updated with fixes from arma and Sebastian)
An authority should never download a consensus if it has a live one,
but when it doesn't, it should admit that it's not going to get one,
and see if anybody else can give it one.
Fixes 1300, fix on 0.2.0.9-alpha
Bridges and other relays not included in the consensus don't
necessarily have a non-zero bandwidth capacity. If all our
configured bridges had a zero bw capacity we would warn the
user. Change that.
Previously, we were also considering the time spent in
soft-hibernation. If this was a long time, we would wind up
underestimating our bandwidth by a lot, and skewing our wakeup time
towards the start of the accounting interval.
This patch also makes us store a few more fields in the state file,
including the time at which we entered soft hibernation.
Fixes bug 1789. Bugfix on 0.0.9pre5.
This will make changes for DST still work, and avoid double-spending
bytes when there are slight changes to configurations.
Fixes bug 1511; the DST issue is a bugfix on 0.0.9pre5.
When we introduced the code to close non-open OR connections after
KeepalivePeriod had passed, we replaced some code that said
if (!connection_is_open(conn)) {
/* let it keep handshaking forever */
} else if (do other tests here) {
...
with new code that said
if (!connection_is_open(conn) && past_keepalive) {
/* let it keep handshaking forever */
} else if (do other tests here) {
...
This was a mistake, since it made all the other tests start applying
to non-open connections, thus causing bug 1840, where non-open
connections get closed way early.
Fixes bug 1840. Bugfix on 0.2.1.26 (commit 67b38d50).
- Move checks for extra_info to callers
- Change argument name from failed to descs
- Use strlen("fp/") instead of a magic number
- I passed on the suggestion to rename functions from *_failed() to
*_handle_failure(). There are a lot of these so for now just follow
the house style.
It's normal when bootstrapping to have a lot of different certs
missing, so we don't want missing certs to make us warn... unless
the certs we're missing are ones that we've tried to fetch a couple
of times and failed at.
May fix bug 1145.
We frequently add cells to stream-blocked queues for valid reasons
that don't mean we need to block streams. The most obvious reason
is if the cell arrives over a circuit rather than from an edge: we
don't block circuits, no matter how full queues get. The next most
obvious reason is that we allow CONNECTED cells from a newly created
stream to get delivered just fine.
This patch changes the behavior so that we only iterate over the
streams on a circuit when the cell in question came from a stream,
and we only block the stream that generated the cell, so that other
streams can still get their CONNECTEDs in.
This should keep WinCE working (unicode always-on) and get Win98
working again (unicode never-on).
There are two places where we explicitly use ASCII-only APIs, still:
in ntmain.c and in the unit tests.
This patch also fixes a bug in windoes tor_listdir that would cause
the first file to be listed an arbitrary number of times that was
also introduced with WinCE support.
Should fix bug 1797.
Setting CookieAuthFileGroupReadable but without setting CookieAuthFile makes
no sense, because unix directory permissions for the data directory prevent
the group from accessing the file anyways.
When this happens, run through the streams on the circuit and make
sure they're all blocked. If some aren't, that's a bug: block them
all and log it! If they all are, where did the cell come from? Log
it!
(I suspect that this actually happens pretty frequently, so I'm making
these log messages appear at INFO.)
Do not start reading on exit streams when we get a SENDME unless we
have space in the appropriate circuit's cell queue.
Draft fix for bug 1653.
(commit message by nickm)
router_add_to_routerlist() is supposed to be a nice minimal function
that only touches the routerlist structures, but it included a call to
dirserv_single_reachability_test().
We have a function that gets called _after_ adding descriptors
successfully: routerlist_descriptors_added. This patch moves the
responsibility for testing there.
Because the decision of whether to test or not depends on whether
there was an old routerinfo for this router or not, we have to first
detect whether we _will_ want to run the tests if the router is added.
We make this the job of
routers_update_status_from_consensus_networkstatus().
Finally, this patch makes the code notice if a router is going from
hibernating to non-hibernating, and if so causes a reachability test
to get launched.
This solves the problem Roger noted as:
What if the router has a clock that's 5 minutes off, so it
publishes a descriptor for 5 minutes in the future, and we test it
three minutes in. In this edge case, we will continue to advertise
it as Running for the full 45 minute period.
If the voting interval was short enough, the two-minutes delay
of CONSENSUS_MIN_SECONDS_BEFORE_CACHING would confuse bridges
to the point where they would assert before downloading a consensus.
It it was even shorter (<4 minutes, I think), caches would
assert too. This patch fixes that by having replacing the
two-minutes value with MIN(2 minutes, interval/16).
Bugfix for 1141; the cache bug could occur since 0.2.0.8-alpha, so
I'm calling this a bugfix on that. Robert Hogan diagnosed this.
Done as a patch against maint-0.2.1, since it makes it hard to
run some kinds of testing networks.
requests to authorities fail due to a network error.
Bug#1138
"When a Tor client starts up using a bridge, and UpdateBridgesFromAuthority
is set, Tor will go to the authority first and look up the bridge by
fingerprint. If the bridge authority is filtered, Tor will never notice that
the bridge authority lookup failed. So it will never fall back."
Add connection_dir_bridge_routerdesc_failed(), a function for unpacking
the bridge information from a failed request, and ensure
connection_dir_request_failed() calls it if the failed request
was for a bridge descriptor.
Test:
1. for ip in `grep -iR 'router ' cached-descriptors|cut -d ' ' -f 3`;
do sudo iptables -A OUTPUT -p tcp -d $ip -j DROP; done
2. remove all files from user tor directory
3. Put the following in torrc:
UseBridges 1
UpdateBridgesFromAuthority 1
Bridge 85.108.88.19:443 7E1B28DB47C175392A0E8E4A287C7CB8686575B7
4. Launch tor - it should fall back to downloading descriptors
directly from the bridge.
Initial patch reviewed and corrected by mingw-san.
https://trac.torproject.org/projects/tor/ticket/1525
"The codepath taken by the control port "RESOLVE" command to create a
synthetic SOCKS resolve request isn't the same as the path taken by
a real SOCKS request from 'tor-resolve'.
This prevents controllers who set LeaveStreamsUnattached=1 from
being able to attach RESOLVE streams to circuits of their choosing."
Create a new function connection_ap_rewrite_and_attach_if_allowed()
and call that when Tor needs to attach a stream to a circuit but
needs to know if the controller permits it.
No tests added.
With this patch we stop scheduling when we should write statistics using a
single timestamp in run_scheduled_events(). Instead, we remember when a
statistics interval starts separately for each statistic type in geoip.c
and rephist.c. Every time run_scheduled_events() tries to write stats to
disk, it learns when it should schedule the next such attempt.
This patch also enables all statistics to be stopped and restarted at a
later time.
This patch comes with a few refactorings, some of which were not easily
doable without the patch.
We already had the country code ?? indicating an unknown country, so all we
needed to do to make unknown countries excludable was to make the ?? code
discoverable.
It's okay to get (say) a SocksPort line in the torrc, and then a
SocksPort on the command line to override it, and then a SocksPort via
a controller to override *that*. But if there are two occurrences of
SocksPort in the torrc, or on the command line, or in a single SETCONF
command, then the user is likely confused. Our old code would not
help unconfuse the user, but would instead silently ignore all but
the last occurrence.
This patch changes the behavior so that if the some option is passed
more than once to any torrc, command line, or SETCONF (each of which
coincidentally corresponds to a call to config_assign()), and the
option is not a type that allows multiple occurrences (LINELIST or
LINELIST_X), then we can warn the user.
This closes trac entry 1384.
At best, this patch helps us avoid sending queued relayed cells that
would get ignored during the time between when a destroy cell is
sent and when the circuit is finally freed. At worst, it lets us
release some memory a little earlier than it would otherwise.
Fix for bug #1184. Bugfix on 0.2.0.1-alpha.
The next series of commits begins addressing the issue that we're
currently including the complete or.h file in all of our source files.
To change that, we're splitting function definitions into new header
files (one header file per source file).
Right now it says "552 internal error" because there's no way for
getinfo_helper_*() countries to specify an error message. This
patch changes the getinfo_helper_*() interface, and makes most of the
getinfo helpers give useful error messages in response to failures.
This should prevent recurrences of bug 1699, where a missing GeoIPFile
line in the torrc made GETINFO ip-to-county/* fail in a "not obvious
how to fix" way.
V3 authorities no longer decide not to vote on Guard+Exit. The bandwidth
weights should take care of this now.
Also, lower the max threshold for WFU to 0.98, to allow more nodes to become
guards.
This should make us conflict less with system files named "log.h".
Yes, we shouldn't have been conflicting with those anyway, but some
people's compilers act very oddly.
The actual change was done with one "git mv", by editing
Makefile.am, and running
find . -name '*.[ch]' | xargs perl -i -pe 'if (/^#include.*\Wlog.h/) {s/log.h/torlog.h/; }'
We now record large times as abandoned, to prevent a filter step from
happening and skewing our results.
Also, issue a warn for a rare case that can happen for funky values of Xm or
too many abandoned circuits. Can happen (very rarely) during unit tests, but
should not be possble during live operation, due to network liveness filters
and discard logic.
Many friendly operating systems have 64-bit times, and it's not nice
to pass them to an %ld format.
It's also extremely not-nice to write a time to the log as an
integer. Most people think it's 2010 June 29 23:57 UTC+epsilon, not
1277855805+epsilon.
These timers behave better with non-monotonic clocks than our old
ones, and also try harder to make once-per-second events get called
one second apart, rather than one-plus-epsilon seconds apart.
This fixes bug 943 for everybody using Libevent 2.0 or later.
We need to ensure that we close timeout measurement circuits. While
we're at it, we should close really old circuits of certain types that
aren't in use, and log really old circuits of other types.
We need to record different statistics at point of timeout, vs the point
of forcible closing.
Also, give some better names to constants and state file variables
to indicate they are not dealing with timeouts, but abandoned circuits.
In rare cases, we could cannibalize a one-hop circuit, ending up
with a two-hop circuit. This circuit would not be actually used,
but we should prevent its creation in the first place.
Thanks to outofwords and swissknife for helping to analyse this.
Most of the changes here are switches to use APIs available on Windows
CE. The most pervasive change is that Windows CE only provides the
wide-character ("FooW") variants of most of the windows function, and
doesn't support the older ASCII verions at all.
This patch will require use of the wcecompat library to get working
versions of the posix-style fd-based file IO functions.
[commit message by nickm]
Back when we changed the idea of a connection being "too old" for new
circuits into the connection being "bad" for new circuits, we didn't
actually change the info messages. This led to telling the user that
we were labelling connections as "too old" for being worse than
connections that were actually older than them.
Found by Scott on or-talk.
There are now four ways that CBT can be disabled:
1. Network-wide, with the cbtdisabled consensus param.
2. Via config, with "LearnCircuitBuildTimeout 0"
3. Via config, with "AuthoritativeDirectory 1"
4. Via a state file write failure.