We really should ignore any timeouts that have *no* network activity for their
entire measured lifetime, now that we have the 95th percentile measurement
changes. Usually this is up to a minute, even on fast connections.
If we really want all this complexity for these stages here, we need to handle
it better for people with large timeouts. It should probably go away, though.
Rechecking the timeout condition was foolish, because it is checked on the
same codepath. It was also wrong, because we didn't round.
Also, the liveness check itself should be <, and not <=, because we only have
1 second resolution.
Specifically, a circ attempt that we'd launched while the network was
down could timeout after we've marked our entrynodes up, marking them
back down again. The fix is to annotate as bad the OR conns that were
around before we did the retry, so if a circuit that's attached to them
times out we don't do anything about it.
We frequently add cells to stream-blocked queues for valid reasons
that don't mean we need to block streams. The most obvious reason
is if the cell arrives over a circuit rather than from an edge: we
don't block circuits, no matter how full queues get. The next most
obvious reason is that we allow CONNECTED cells from a newly created
stream to get delivered just fine.
This patch changes the behavior so that we only iterate over the
streams on a circuit when the cell in question came from a stream,
and we only block the stream that generated the cell, so that other
streams can still get their CONNECTEDs in.
Right now it says "552 internal error" because there's no way for
getinfo_helper_*() countries to specify an error message. This
patch changes the getinfo_helper_*() interface, and makes most of the
getinfo helpers give useful error messages in response to failures.
This should prevent recurrences of bug 1699, where a missing GeoIPFile
line in the torrc made GETINFO ip-to-county/* fail in a "not obvious
how to fix" way.
We now record large times as abandoned, to prevent a filter step from
happening and skewing our results.
Also, issue a warn for a rare case that can happen for funky values of Xm or
too many abandoned circuits. Can happen (very rarely) during unit tests, but
should not be possble during live operation, due to network liveness filters
and discard logic.
Many friendly operating systems have 64-bit times, and it's not nice
to pass them to an %ld format.
It's also extremely not-nice to write a time to the log as an
integer. Most people think it's 2010 June 29 23:57 UTC+epsilon, not
1277855805+epsilon.
We need to record different statistics at point of timeout, vs the point
of forcible closing.
Also, give some better names to constants and state file variables
to indicate they are not dealing with timeouts, but abandoned circuits.
There are now four ways that CBT can be disabled:
1. Network-wide, with the cbtdisabled consensus param.
2. Via config, with "LearnCircuitBuildTimeout 0"
3. Via config, with "AuthoritativeDirectory 1"
4. Via a state file write failure.
This should prevent some asserts and storage of incorrect build times
for the cases where Tor is suspended during a circuit construction, or
just after completing a circuit. The idea is that if the circuit
build time is much greater than we would have cut it off at, we probably
had a suspend event along this codepath, and we should discard the
value.
In case we decide that the timeout rate is now too high due to our
change of the max synthetic quantile value, this consensus parameter
will allow us to restore it to the previous value.
This is for the other issue we saw in Bug 1335. A large number of high
timeouts were causing the timeout calculation to slowly drift upwards,
especially in conditions of load. This fix repeatedly regenerates all of
our synthetic timeouts whenever the timeout changes, to try to prevent
drift.
It also lowers the timeout cap to help for some cases of Bug 1245, where
some timeout values were so large that we ended up allocating a ton of
scratch memory to count the histogram bins.
The downside is that lowering this cap is affecting our timeout rate.
Unfortunately, the buildtimeout quantile is now higher than the actual
completion rate by what appears to be about 7-10%, which probably
represents the skew in the distribution due to lowering this synthetic
cap.
In my state files, I was seeing several peaks, probably due to different
guards having different latency. This change is meant to better capture
this behavior and generate more reasonable timeouts when it happens. It
is improving the timeout values for my collection of state files.
Another dereference-then-NULL-check sequence. No reports of this bug
triggered in the wild. Fixes bugreport 1256.
Thanks to ekir for discovering and reporting this bug.
asprintf() is a GNU extension that some BSDs have picked up: it does a printf
into a newly allocated chunk of RAM.
Our tor_asprintf() differs from standard asprintf() in that:
- Like our other malloc functions, it asserts on OOM.
- It works on windows.
- It always sets its return-field.
Specifically, there are two cases: a) are we willing to start a new
circuit at a node not in your ExitNodes config option, and b) are we
willing to make use of a circuit that's already established but has an
unsuitable exit.
Now we discard all your circuits when you set ExitNodes, so the only
way you could end up with an exit circuit that ends at an unsuitable
place is if we explicitly ran out of exit nodes, StrictNodes was 0,
and we built this circuit to solve a stream that needs solving.
Fixes bug in dc322931, which would ignore the just-built circuit because
it has an unsuitable exit.
Before it would prepend your requested entrynodes to your list of guard
nodes, but feel free to use others after that. Now it chooses only
from your EntryNodes if any of those are available, and only falls back
to others if a) they're all down and b) StrictNodes is not set.
Also, now we refresh your entry guards from EntryNode at each consensus
fetch (rather than just at startup and then they slowly rot as the
network changes).
The goal here is to make users less likely to set StrictNodes, since
it's doing closer to what they expect it should be doing.
The new rule is: safe_str_X() means "this string is a piece of X
information; make it safe to log." safe_str() on its own means
"this string is a piece of who-knows-what; make it safe to log".
Some *_free functions threw asserts when passed NULL. Now all of them
accept NULL as input and perform no action when called that way.
This gains us consistence for our free functions, and allows some
code simplifications where an explicit null check is no longer necessary.
In C, the code "char x[10]; if (x) {...}" always takes the true branch of
the if statement. Coverity notices this now.
In some cases, we were testing arrays to make sure that an operation
we wanted to do would suceed. Those cases are now always-true.
In some cases, we were testing arrays to see if something was _set_.
Those caes are now tests for strlen(s), or tests for
!tor_mem_is_zero(d,len).
This seems to be happening to me a lot on a garbage DSL line.
We may need to come up with 2 threshholds: a high short onehop
count and a lower longer count.
Don't count one-hop circuits when we're estimating how long it
takes circuits to build on average. Otherwise we'll set our circuit
build timeout lower than we should. Bugfix on 0.2.2.2-alpha.
A) We were considering a circuit had timed out in the special cases
where we close rendezvous circuits because the final rendezvous
circuit couldn't be built in time.
B) We were looking at the wrong timestamp_created when considering
a timeout.
We want it to be under our control so it doesn't mess
up initialization. This is likely the cause for
the bug the previous assert-adding commit (09a75ad) was
trying to address.
Using CircuitBuildTimeout is prone to issues with SIGHUP, etc.
Also, shuffle the circuit build times array after loading it
in so that newer measurements don't replace chunks of
similarly timed measurements.
When we excluded some Exits, we were sometimes warning the user that we
were going to use the node regardless. Many of those warnings were in
fact bogus, because the relay in question was not used to connect to
the outside world.
Based on patch by Rotor, thanks!
Tor now reads the "circwindow" parameter out of the consensus,
and uses that value for its circuit package window rather than the
default of 1000 cells. Begins the implementation of proposal 168.
We were telling the controller about CHECKING_REACHABILITY and
REACHABILITY_FAILED status events whenever we launch a testing
circuit or notice that one has failed. Instead, only tell the
controller when we want to inform the user of overall success or
overall failure. Bugfix on 0.1.2.6-alpha. Fixes bug 1075. Reported
by SwissTorExit.
Provide a useful warning when launch_circuit tries to make us use a
node we don't want to use. Just give an info message when this is a
normal and okay situation. Fix for logging issues in bug 984.
ago. This change should significantly improve client performance,
especially once more people upgrade, since relays that have been
a guard for a long time are currently overloaded.
svn:r19287
The subversion $Id$ fields made every commit force a rebuild of
whatever file got committed. They were not actually useful for
telling the version of Tor files in the wild.
svn:r17867
use the same download mechanism as other places.
i had to make an ugly hack around "IMPOSSIBLE_TO_DOWNLOAD+1".
we should unhack that sometime.
svn:r17834
one guard from a given relay family. Otherwise we could end up with
all of our entry points into the network run by the same operator.
Suggested by Camilo Viecco. Fix on 0.1.1.11-alpha.
Not a backport candidate, since I think this might break for users
who only have a given /16 in their reachableaddresses, or something
like that.
svn:r17514
we were complaining about no support for our one-hop streams,
when in fact choose_good_exit_server_general() has no business
caring about one-hop streams. patch from miner.
svn:r17181
If not enough of our entry guards are available so we add a new
one, we might use the new one even if it overlapped with the
current circuit's exit relay (or its family). Anonymity bugfix
pointed out by rovv.
svn:r16698
Initial conversion of uint32_t addr to tor_addr_t addr in connection_t and related types. Most of the Tor wire formats using these new types are in, but the code to generate and use it is not. This is a big patch. Let me know what it breaks for you.
svn:r16435
Move n_addr, n_port, and n_conn_id_digest fields of circuit_t into a separately allocated extend_info_t. Saves 22 bytes per connected circuit_t on 32-bit platforms, and makes me more comfortable with using tor_addr_t in place of uint32_t n_addr.
svn:r16257
Refactor the router_choose_random_node interface: any function with 10 parameters, most of which are boolean and one of which is unused, should get refactored like this.
svn:r16167
Patch from Christian Wilms: remove (HiddenService|Rend)(Exclude)?Nodes options. They never worked properly, and nobody seems to be using them. Resolves bug 754.
svn:r16144
Never allow a circuit to be created with the same circid as a circuit that has been marked for close. May be a fix for bug 779. Needs testing. Backport candidate.
svn:r16136
Add new ExcludeExitNodes option. Also add a new routerset type to handle Exclude[Exit]Nodes. It is optimized for O(1) membership tests, so as to make choosing a random router run in O(N_routers) time instead of in O(N_routers*N_Excluded_Routers).
svn:r16061
to just our our entry guards for the test circuits. Otherwise we
tend to have multiple test circuits going through a single entry
guard, which makes our bandwidth test less accurate. Fixes part
of bug 654; patch contributed by Josh Albrecht.
(Actually, modify Josh's patch to avoid doing that when you're
a bridge relay, since it would leak more than we want to leak.)
svn:r15850
as soon as you run out of working bridges, rather than waiting
for ten failures -- which will never happen if you have less than
ten bridges.
svn:r15368
If you have more than one bridge but don't know their keys,
you would only learn a request for the descriptor of the first one
on your list. (Tor considered launching requests for the others, but
found that it already had a connection on the way for $0000...0000
so it didn't open another.)
If you have more than one bridge but don't know their keys, and the
connection to one of the bridges failed, you would cancel all
pending bridge connections. (After all, they all have the same
digest.)
svn:r15366
Only dump all guard node status to the log when the guard node status actually changes. Downgrade the 4 most common remaining INFO log messages to DEBUG.
svn:r14069
Add a bunch more code documentation; change the interface of fetch_var_cell_from_buf() so it takes the current link protocol into account and can't get confused by weird command bytes on v1 connections.
svn:r13430
Initial attempts to track down bug 600, and refactor possibly offending code. 1) complain early if circuit state is set to OPEN when an onionskin is pending. 2) refactor onionskin field into one only used when n_conn is pending, and a separate onionskin field waiting for attention by a cpuworker. This might even fix the bug. More likely, it will make it fail with a more useful core.
svn:r13394
When we load a bridge descriptor from the cache,
and it was previously unreachable, mark it as retriable so we won't
just ignore it. Also, try fetching a new copy immediately.
svn:r12950
Refactor circuit_launch* functions to take a bitfield of flags rather than 4 separate nonconsecutive flags arguments. Also, note a possible but in circuit_find_to_cannibalize, which seems to be ignoring its purpose argument.
svn:r12948
unexpected (it used to be in our networkstatus when we started
fetching it, but it isn't in our current networkstatus), and we
aren't using bridges. Bugfix on 0.2.0.x.
svn:r12911
using bridges or we have StrictEntryNodes set), don't mark relays
down when they fail a directory request. Otherwise we're too quick
to mark all our entry points down.
svn:r12755
enough directory information. This was causing us to always pick
two new guards on startup (bugfix on 0.2.0.9-alpha), and it was
causing us to discard all our guards on startup if we hadn't been
running for a few weeks (bugfix on 0.1.2.x). Fixes bug 448.
svn:r12570
the bridge authority could help us (for example, we don't know
a digest, or there is no bridge authority), don't be so eager to
fall back to asking the bridge authority.
svn:r12512
Add a bunch of function documentation; clean up a little code; fix some XXXXs; tag the nonsensical EXTRAINFO_PURPOSE_GENERAL as nonsesnse; note another bit of "do not cache special routers" code to nuke.
svn:r11761
users configure that and specify a bridge with an identity
fingerprint, now they will lookup the bridge descriptor at the
default bridge authority via a one-hop tunnel, but once circuits
are established they will switch to a three-hop tunnel for later
connections to the bridge authority.
svn:r11550
Add a line to the state file for each guard to let us know which version added the guard. If the line is absent, assume the guard was added by whatever version of Tor last wrote the state file. Remove guards if the version that added them was using a bad guard selection algorithm. (Previously, we removed guards if the version that wrote the file was using a bad guard selection algorithm, even if the guards themselves were chosen by a good version.)
svn:r11298
When we are loading state info from disk, never believe any date in the future. Doing so can keep us from retrying guards, rotating onion keys, storing bandwidth info, etc. Fixes bug 434, and others. Backport candidate, once it has been tested.
svn:r11166
never choose any guards past it. This way we don't expand our
guard list unless we need to. [Bugfix in 0.1.2.x]
I'm not sure if this will solve all our problems, but it is at least
something.
svn:r10730
in the networkstatuses, so we'll mark it unusable when we
load it, and then when we get a new routerinfo for it, we'll
still think it's unusable. fix that.
svn:r10612
- Only listen to responses for "authority" fetches if we're configured
to use Bridges. Otherwise it's safe (and maybe smarter) to silently
discard them like we used to.
- React faster to download networkstatuses after the first bridge
descriptor arrives.
- Don't do dir fetches before we have any bridges, even when our
dirport is open.
svn:r10604
- demand options->Bridges and options->TunnelDirConns if
options->UseBridges is set.
- after directory fetches, accept descriptors that aren't referenced by
our networkstatuses, *if* they're for a configured bridge.
- delay directory fetching until we have at least one bridge descriptor.
- learn how to build a one-hop circuit when we have neither routerinfo
nor routerstatus for our destination.
- teach directory connections how to pick a bridge as the destination
directory when doing non-anonymous fetches.
- tolerate directory commands for which the dir_port is 0.
- remember descriptors when the requested_resource was "authority",
rather than just ignoring them.
- put bridges on our entry_guards list once we have a descriptor for them.
When UseBridges is set, only pick entry guards that are bridges. Else
vice versa.
svn:r10571
It is becoming increasingly clear to me that bridges should
be a special case of entry guards, not a whole separate pile
of nearly identical functions.
svn:r10141
Initial version of circuit-based cell queues. Instead of hammering or_conns with piles of cells, queue cells on their corresponding circuits, and append them to the or_conn as needed. This seems to work so far, but needs a bit more work. This will break the memory-use-limitation patch for begin_dir conns: the solution will be a fun but fiddly.
svn:r9904
Make all LD_BUG log messsages get prefixed with "Bug: ". Remove manually-generated "Bug: "s from log-messages. (Apparently, we remembered to add them about 40% of the time.)
svn:r9733
Fix a bug found by Udo van den Heuvel: avoid an assertion failure when a controller sets and clears EntryNodes before the next call to choose_random_entry(). Also make a function static.
svn:r9669
Node-picking fixes: Never warn when a down node is listed in a config option (bug 348); always warn when a node in a config option is unnamed. Also, when entrynodes is configured, then build the guard list as: (parts of EntryNodes that were guards before), (the rest of EntryNodes), (the rest of the old guards). This last point ensures that EntryNode users will get the EntryNodes they want with the minimum change to their guard list.
svn:r9574
Implement proposal 106: stop requiring clients to have certificates, and stop checking for nicknames in certificates. [See proposal 106 for rationale.] Also improve messages when checking TLS handshake, to re-resolve bug 382.
svn:r9568
Resolve some XXXX012 items:
- Remove PathlenCoinWeight: if we want it again, we can add it
back in.
- Ditto with RelayBandwidth*.
- Decide to leave in the "hey, you didn't set end_reason!" BUG log message,
but stop telling people to bug me personally.
- Postpone strengthening assert_connection_ok(): it's important, but
it's also a good way to introduce weird bugs.
- Move some expensive consistency checking from dns_free_all() into
assert_cache_ok().
svn:r9533
Fix an XXXX012, and make circuits_pending_or_conns a static variable. In addition to cleaning up the code, this may also resolve Bug 386 if Roger has the right intuition there.
svn:r9482
Removing the last DOCDOC comment hurt so much that I had to use Doxygen to identify undocumented macros and comments, and add 150 more DOCDOCs to point out where they were. Oops. Hey, kids! Fixing some of these could be your first Tor patch!
svn:r9477
Write the entry guards section of path-spec; note a possible bug in cirbuitbuild.c; add a const; defer work on torrc.complete to be part of a bigger config documentation reorg.
svn:r9465
directory tunnel without knowing a descriptor first. Still not
ready yet. As part of the change, now assume we can use a
create_fast cell if we don't know anything about a router.
svn:r9440
Tidy up ORCONN reason patch from Mike Perry. Changes: make some of the handling of TLS error codes less error prone. Enforce house style wrt spaces. Make it compile with --enable-gcc-warnings. Only set or_conn->tls_error in the case of an actual error. Add a changelog entry.
svn:r9355
Revise logic used to flush state to disk. Now, we try to batch non-urgent changes so that we do not do too many writes, and we save very-non-urgent changes every once in a rare while, and we never save more than once per second.
svn:r9047
Apply patch from Mike Perry: add more reasons for circuit destroys. (Slightly tweaked to avoid allocating a number for an "internal" reason.)
svn:r8739
Add client support for a 'BadExit' flag, so authorities can say "Server X is a poor choise for your nytimes.com connections, as it seems to direct them to HoorayForMao.com or (more likely) WouldYouLikeToBuyTheseFineEncyclopedias.com"
svn:r8690
Touch up last patch (to add REASON to CIRC events): make some reasons
more sensible, send reasons only to controllers that have enabled
extended events, and clean up whitespace.
svn:r8672
Partial implementation of revised nickname syntax for controllers. Implement ability to look up routers by "verbose" nicknames; add a per-v1-control-connection flag to turn the feature on in events. Needs testing, spec, ability to actually turn on the flag, double-checking that we wont overflow any nickname buffers, and changelog.
svn:r8582
Remove/clarify some XXXs for no longer being accurate; for begin things we do not indend to fix; for already being parts of big todo issues (like "/* XXX ipv6 */"); etc. Also fix some spaces.
svn:r8580
Disprefer exit nodes for entry, middle positions (fixes bug 200). Also, switch to using a uint64_t to hold "total bandwidth for all nodes" under consideration; crypt_rand_int would have died at 2GB/s network capacity.
svn:r8571
Another tweak to guard logic: ignore check for the Guard flag if a server is listed on EntryNodes. (Also remove redundant checks for always-set variables.)
svn:r8522
Improvement to last entry guards patch: track when we last attempted to connect to a node in our state file along with how long it has been unreachable. Also clarify behavior of parse_iso_time() when it gets extra characters.
svn:r8520
Refactor entry guard status logic a lot; allow more factors [like not
having a Guard flag or being listed in ExcludeNodes] to render a guard
"unlisted" (now called "unusable"); track guard down status (now
called "unreachable") separately from is_running.
svn:r8519
Refactor connection_t into edge, or, dir, control, and base subtypes. This might save some RAM on busy exit servers, but really matters most in terms of correctness.
svn:r6906
circuit_t into origin_circuit_t and or_circuit_t. I fixed some
segaults; there may be more. We still need to move more rendezvous
stuff into subtypes.
This is a trial run for splitting up connection_t; if the approach is
insane, please say so soon so we can do something smarter.
Also, this discards the old HALF_OPEN code, which nobody seems to
want.
svn:r6817
already picked him, we would cycle endlessly picking him
again, being unhappy about it, and so forth.
now we specifically exclude guards when picking a new guard.
svn:r6575
connect to a different OR than lives on the addr:port we connected
to. we don't actually remember whether that was the case, currently.
so call it END_CIRC_REASON_OR_CONN_CLOSED as a compromise.
svn:r6515
when they created a network status. so if nobody asked for a
network status, they would never discover that any servers are
is_running, so they could never build a circuit.
svn:r6183
picking entry guards, rather than looking at is_fast or is_stable.
Now dirservers can change how they define it and clients will
automatically use their new definition.
svn:r5979
we screwed up the formatting in wild and unpredictable ways.
fix it before it becomes convention to format logs in wild and
unpredictable ways.
still need to do src/common/ someday.
svn:r5551
(intended to be cannibalized later for rendezvous and introduction
circuits), we were picking them so that they had useful exit nodes. There
was no need for this, and it actually aids some statistical attacks.
svn:r5453
don't tell you (it happens!); and rotate TLS connections once a week.
1) If an OR conn becomes more than a week old, make it obsolete.
2) If it's obsolete and empty, kill it.
3) When an OR makes a second connection to you, allow it.
4) If we want to send a new create cell, but the best conn we've
got is obsolete, and the router is 0.1.1.9-alpha-cvs or later, ask
for a new conn instead.
5) When we time out on circuit building on the first hop, make that
connection obsolete.
svn:r5429
first few moments of their existence in CIRCUIT_STATE_OPEN, then
when Alice sent an extend request for a Tor that they're not connected
to, they switched to CIRCUIT_STATE_OR_WAIT and spent the rest of
their sorry little lives in that state, even when the connection
was established and they were shuttling relay cells back and forth.
And I'm not going to backport this (yet), because somehow it worked!
svn:r5427
warn when we'recalling a non-named server by its nickname;
don't warn twice about the same name.
Fix a bug in routers_update_status_from_networkstatus that made nearly
all clients never update routerinfo_t.is_named.
Try to list MyFamily elements by key, not by nickname.
Only warn about names that we generated ourself, or got from the local
user.
On TLS handshake, only check the other router's nickname against its
expected nickname if is_named is set.
svn:r5185