Right now it says "552 internal error" because there's no way for
getinfo_helper_*() countries to specify an error message. This
patch changes the getinfo_helper_*() interface, and makes most of the
getinfo helpers give useful error messages in response to failures.
This should prevent recurrences of bug 1699, where a missing GeoIPFile
line in the torrc made GETINFO ip-to-county/* fail in a "not obvious
how to fix" way.
We now record large times as abandoned, to prevent a filter step from
happening and skewing our results.
Also, issue a warn for a rare case that can happen for funky values of Xm or
too many abandoned circuits. Can happen (very rarely) during unit tests, but
should not be possble during live operation, due to network liveness filters
and discard logic.
Many friendly operating systems have 64-bit times, and it's not nice
to pass them to an %ld format.
It's also extremely not-nice to write a time to the log as an
integer. Most people think it's 2010 June 29 23:57 UTC+epsilon, not
1277855805+epsilon.
We need to record different statistics at point of timeout, vs the point
of forcible closing.
Also, give some better names to constants and state file variables
to indicate they are not dealing with timeouts, but abandoned circuits.
There are now four ways that CBT can be disabled:
1. Network-wide, with the cbtdisabled consensus param.
2. Via config, with "LearnCircuitBuildTimeout 0"
3. Via config, with "AuthoritativeDirectory 1"
4. Via a state file write failure.
This should prevent some asserts and storage of incorrect build times
for the cases where Tor is suspended during a circuit construction, or
just after completing a circuit. The idea is that if the circuit
build time is much greater than we would have cut it off at, we probably
had a suspend event along this codepath, and we should discard the
value.
In case we decide that the timeout rate is now too high due to our
change of the max synthetic quantile value, this consensus parameter
will allow us to restore it to the previous value.
This is for the other issue we saw in Bug 1335. A large number of high
timeouts were causing the timeout calculation to slowly drift upwards,
especially in conditions of load. This fix repeatedly regenerates all of
our synthetic timeouts whenever the timeout changes, to try to prevent
drift.
It also lowers the timeout cap to help for some cases of Bug 1245, where
some timeout values were so large that we ended up allocating a ton of
scratch memory to count the histogram bins.
The downside is that lowering this cap is affecting our timeout rate.
Unfortunately, the buildtimeout quantile is now higher than the actual
completion rate by what appears to be about 7-10%, which probably
represents the skew in the distribution due to lowering this synthetic
cap.
In my state files, I was seeing several peaks, probably due to different
guards having different latency. This change is meant to better capture
this behavior and generate more reasonable timeouts when it happens. It
is improving the timeout values for my collection of state files.
Another dereference-then-NULL-check sequence. No reports of this bug
triggered in the wild. Fixes bugreport 1256.
Thanks to ekir for discovering and reporting this bug.
asprintf() is a GNU extension that some BSDs have picked up: it does a printf
into a newly allocated chunk of RAM.
Our tor_asprintf() differs from standard asprintf() in that:
- Like our other malloc functions, it asserts on OOM.
- It works on windows.
- It always sets its return-field.
Specifically, there are two cases: a) are we willing to start a new
circuit at a node not in your ExitNodes config option, and b) are we
willing to make use of a circuit that's already established but has an
unsuitable exit.
Now we discard all your circuits when you set ExitNodes, so the only
way you could end up with an exit circuit that ends at an unsuitable
place is if we explicitly ran out of exit nodes, StrictNodes was 0,
and we built this circuit to solve a stream that needs solving.
Fixes bug in dc322931, which would ignore the just-built circuit because
it has an unsuitable exit.
Before it would prepend your requested entrynodes to your list of guard
nodes, but feel free to use others after that. Now it chooses only
from your EntryNodes if any of those are available, and only falls back
to others if a) they're all down and b) StrictNodes is not set.
Also, now we refresh your entry guards from EntryNode at each consensus
fetch (rather than just at startup and then they slowly rot as the
network changes).
The goal here is to make users less likely to set StrictNodes, since
it's doing closer to what they expect it should be doing.
The new rule is: safe_str_X() means "this string is a piece of X
information; make it safe to log." safe_str() on its own means
"this string is a piece of who-knows-what; make it safe to log".
Some *_free functions threw asserts when passed NULL. Now all of them
accept NULL as input and perform no action when called that way.
This gains us consistence for our free functions, and allows some
code simplifications where an explicit null check is no longer necessary.
In C, the code "char x[10]; if (x) {...}" always takes the true branch of
the if statement. Coverity notices this now.
In some cases, we were testing arrays to make sure that an operation
we wanted to do would suceed. Those cases are now always-true.
In some cases, we were testing arrays to see if something was _set_.
Those caes are now tests for strlen(s), or tests for
!tor_mem_is_zero(d,len).
This seems to be happening to me a lot on a garbage DSL line.
We may need to come up with 2 threshholds: a high short onehop
count and a lower longer count.
Don't count one-hop circuits when we're estimating how long it
takes circuits to build on average. Otherwise we'll set our circuit
build timeout lower than we should. Bugfix on 0.2.2.2-alpha.
A) We were considering a circuit had timed out in the special cases
where we close rendezvous circuits because the final rendezvous
circuit couldn't be built in time.
B) We were looking at the wrong timestamp_created when considering
a timeout.
We want it to be under our control so it doesn't mess
up initialization. This is likely the cause for
the bug the previous assert-adding commit (09a75ad) was
trying to address.
Using CircuitBuildTimeout is prone to issues with SIGHUP, etc.
Also, shuffle the circuit build times array after loading it
in so that newer measurements don't replace chunks of
similarly timed measurements.
When we excluded some Exits, we were sometimes warning the user that we
were going to use the node regardless. Many of those warnings were in
fact bogus, because the relay in question was not used to connect to
the outside world.
Based on patch by Rotor, thanks!
Tor now reads the "circwindow" parameter out of the consensus,
and uses that value for its circuit package window rather than the
default of 1000 cells. Begins the implementation of proposal 168.
We were telling the controller about CHECKING_REACHABILITY and
REACHABILITY_FAILED status events whenever we launch a testing
circuit or notice that one has failed. Instead, only tell the
controller when we want to inform the user of overall success or
overall failure. Bugfix on 0.1.2.6-alpha. Fixes bug 1075. Reported
by SwissTorExit.
Provide a useful warning when launch_circuit tries to make us use a
node we don't want to use. Just give an info message when this is a
normal and okay situation. Fix for logging issues in bug 984.
ago. This change should significantly improve client performance,
especially once more people upgrade, since relays that have been
a guard for a long time are currently overloaded.
svn:r19287
The subversion $Id$ fields made every commit force a rebuild of
whatever file got committed. They were not actually useful for
telling the version of Tor files in the wild.
svn:r17867
use the same download mechanism as other places.
i had to make an ugly hack around "IMPOSSIBLE_TO_DOWNLOAD+1".
we should unhack that sometime.
svn:r17834
one guard from a given relay family. Otherwise we could end up with
all of our entry points into the network run by the same operator.
Suggested by Camilo Viecco. Fix on 0.1.1.11-alpha.
Not a backport candidate, since I think this might break for users
who only have a given /16 in their reachableaddresses, or something
like that.
svn:r17514
we were complaining about no support for our one-hop streams,
when in fact choose_good_exit_server_general() has no business
caring about one-hop streams. patch from miner.
svn:r17181
If not enough of our entry guards are available so we add a new
one, we might use the new one even if it overlapped with the
current circuit's exit relay (or its family). Anonymity bugfix
pointed out by rovv.
svn:r16698
Initial conversion of uint32_t addr to tor_addr_t addr in connection_t and related types. Most of the Tor wire formats using these new types are in, but the code to generate and use it is not. This is a big patch. Let me know what it breaks for you.
svn:r16435
Move n_addr, n_port, and n_conn_id_digest fields of circuit_t into a separately allocated extend_info_t. Saves 22 bytes per connected circuit_t on 32-bit platforms, and makes me more comfortable with using tor_addr_t in place of uint32_t n_addr.
svn:r16257
Refactor the router_choose_random_node interface: any function with 10 parameters, most of which are boolean and one of which is unused, should get refactored like this.
svn:r16167
Patch from Christian Wilms: remove (HiddenService|Rend)(Exclude)?Nodes options. They never worked properly, and nobody seems to be using them. Resolves bug 754.
svn:r16144
Never allow a circuit to be created with the same circid as a circuit that has been marked for close. May be a fix for bug 779. Needs testing. Backport candidate.
svn:r16136
Add new ExcludeExitNodes option. Also add a new routerset type to handle Exclude[Exit]Nodes. It is optimized for O(1) membership tests, so as to make choosing a random router run in O(N_routers) time instead of in O(N_routers*N_Excluded_Routers).
svn:r16061
to just our our entry guards for the test circuits. Otherwise we
tend to have multiple test circuits going through a single entry
guard, which makes our bandwidth test less accurate. Fixes part
of bug 654; patch contributed by Josh Albrecht.
(Actually, modify Josh's patch to avoid doing that when you're
a bridge relay, since it would leak more than we want to leak.)
svn:r15850
as soon as you run out of working bridges, rather than waiting
for ten failures -- which will never happen if you have less than
ten bridges.
svn:r15368
If you have more than one bridge but don't know their keys,
you would only learn a request for the descriptor of the first one
on your list. (Tor considered launching requests for the others, but
found that it already had a connection on the way for $0000...0000
so it didn't open another.)
If you have more than one bridge but don't know their keys, and the
connection to one of the bridges failed, you would cancel all
pending bridge connections. (After all, they all have the same
digest.)
svn:r15366
Only dump all guard node status to the log when the guard node status actually changes. Downgrade the 4 most common remaining INFO log messages to DEBUG.
svn:r14069
Add a bunch more code documentation; change the interface of fetch_var_cell_from_buf() so it takes the current link protocol into account and can't get confused by weird command bytes on v1 connections.
svn:r13430
Initial attempts to track down bug 600, and refactor possibly offending code. 1) complain early if circuit state is set to OPEN when an onionskin is pending. 2) refactor onionskin field into one only used when n_conn is pending, and a separate onionskin field waiting for attention by a cpuworker. This might even fix the bug. More likely, it will make it fail with a more useful core.
svn:r13394
When we load a bridge descriptor from the cache,
and it was previously unreachable, mark it as retriable so we won't
just ignore it. Also, try fetching a new copy immediately.
svn:r12950
Refactor circuit_launch* functions to take a bitfield of flags rather than 4 separate nonconsecutive flags arguments. Also, note a possible but in circuit_find_to_cannibalize, which seems to be ignoring its purpose argument.
svn:r12948
unexpected (it used to be in our networkstatus when we started
fetching it, but it isn't in our current networkstatus), and we
aren't using bridges. Bugfix on 0.2.0.x.
svn:r12911
using bridges or we have StrictEntryNodes set), don't mark relays
down when they fail a directory request. Otherwise we're too quick
to mark all our entry points down.
svn:r12755
enough directory information. This was causing us to always pick
two new guards on startup (bugfix on 0.2.0.9-alpha), and it was
causing us to discard all our guards on startup if we hadn't been
running for a few weeks (bugfix on 0.1.2.x). Fixes bug 448.
svn:r12570
the bridge authority could help us (for example, we don't know
a digest, or there is no bridge authority), don't be so eager to
fall back to asking the bridge authority.
svn:r12512
Add a bunch of function documentation; clean up a little code; fix some XXXXs; tag the nonsensical EXTRAINFO_PURPOSE_GENERAL as nonsesnse; note another bit of "do not cache special routers" code to nuke.
svn:r11761
users configure that and specify a bridge with an identity
fingerprint, now they will lookup the bridge descriptor at the
default bridge authority via a one-hop tunnel, but once circuits
are established they will switch to a three-hop tunnel for later
connections to the bridge authority.
svn:r11550