The next series of commits begins addressing the issue that we're
currently including the complete or.h file in all of our source files.
To change that, we're splitting function definitions into new header
files (one header file per source file).
Right now it says "552 internal error" because there's no way for
getinfo_helper_*() countries to specify an error message. This
patch changes the getinfo_helper_*() interface, and makes most of the
getinfo helpers give useful error messages in response to failures.
This should prevent recurrences of bug 1699, where a missing GeoIPFile
line in the torrc made GETINFO ip-to-county/* fail in a "not obvious
how to fix" way.
V3 authorities no longer decide not to vote on Guard+Exit. The bandwidth
weights should take care of this now.
Also, lower the max threshold for WFU to 0.98, to allow more nodes to become
guards.
This should make us conflict less with system files named "log.h".
Yes, we shouldn't have been conflicting with those anyway, but some
people's compilers act very oddly.
The actual change was done with one "git mv", by editing
Makefile.am, and running
find . -name '*.[ch]' | xargs perl -i -pe 'if (/^#include.*\Wlog.h/) {s/log.h/torlog.h/; }'
We now record large times as abandoned, to prevent a filter step from
happening and skewing our results.
Also, issue a warn for a rare case that can happen for funky values of Xm or
too many abandoned circuits. Can happen (very rarely) during unit tests, but
should not be possble during live operation, due to network liveness filters
and discard logic.
Many friendly operating systems have 64-bit times, and it's not nice
to pass them to an %ld format.
It's also extremely not-nice to write a time to the log as an
integer. Most people think it's 2010 June 29 23:57 UTC+epsilon, not
1277855805+epsilon.
These timers behave better with non-monotonic clocks than our old
ones, and also try harder to make once-per-second events get called
one second apart, rather than one-plus-epsilon seconds apart.
This fixes bug 943 for everybody using Libevent 2.0 or later.
We need to ensure that we close timeout measurement circuits. While
we're at it, we should close really old circuits of certain types that
aren't in use, and log really old circuits of other types.
We need to record different statistics at point of timeout, vs the point
of forcible closing.
Also, give some better names to constants and state file variables
to indicate they are not dealing with timeouts, but abandoned circuits.
In rare cases, we could cannibalize a one-hop circuit, ending up
with a two-hop circuit. This circuit would not be actually used,
but we should prevent its creation in the first place.
Thanks to outofwords and swissknife for helping to analyse this.
Most of the changes here are switches to use APIs available on Windows
CE. The most pervasive change is that Windows CE only provides the
wide-character ("FooW") variants of most of the windows function, and
doesn't support the older ASCII verions at all.
This patch will require use of the wcecompat library to get working
versions of the posix-style fd-based file IO functions.
[commit message by nickm]
Back when we changed the idea of a connection being "too old" for new
circuits into the connection being "bad" for new circuits, we didn't
actually change the info messages. This led to telling the user that
we were labelling connections as "too old" for being worse than
connections that were actually older than them.
Found by Scott on or-talk.
There are now four ways that CBT can be disabled:
1. Network-wide, with the cbtdisabled consensus param.
2. Via config, with "LearnCircuitBuildTimeout 0"
3. Via config, with "AuthoritativeDirectory 1"
4. Via a state file write failure.
This should prevent some asserts and storage of incorrect build times
for the cases where Tor is suspended during a circuit construction, or
just after completing a circuit. The idea is that if the circuit
build time is much greater than we would have cut it off at, we probably
had a suspend event along this codepath, and we should discard the
value.
In case we decide that the timeout rate is now too high due to our
change of the max synthetic quantile value, this consensus parameter
will allow us to restore it to the previous value.
This is for the other issue we saw in Bug 1335. A large number of high
timeouts were causing the timeout calculation to slowly drift upwards,
especially in conditions of load. This fix repeatedly regenerates all of
our synthetic timeouts whenever the timeout changes, to try to prevent
drift.
It also lowers the timeout cap to help for some cases of Bug 1245, where
some timeout values were so large that we ended up allocating a ton of
scratch memory to count the histogram bins.
The downside is that lowering this cap is affecting our timeout rate.
Unfortunately, the buildtimeout quantile is now higher than the actual
completion rate by what appears to be about 7-10%, which probably
represents the skew in the distribution due to lowering this synthetic
cap.
In my state files, I was seeing several peaks, probably due to different
guards having different latency. This change is meant to better capture
this behavior and generate more reasonable timeouts when it happens. It
is improving the timeout values for my collection of state files.
what's happening here is that we're fetching certs for obsolete
authorities -- probably legacy signers in this case. but try to
remain general in the log message.
It's natural for the definition of bandwidth_rule_t to be with the functions
that actually care about its values. Unfortunately, this means declaring
bandwidth_rate_rule_to_string() out of sequence. Someday we'll just rename
reasons.c to strings.c, and put it at the end of or.h, and this will all be
better.
Works like the --enable-static-openssl/libevent options. Requires
--with-zlib-dir to be set. Note that other dependencies might still
pull in a dynamicly linked zlib, if you don't link them in statically
too.
Everything that accepted the 'Circ' name handled it wrong, so even now
that we fixed the handling of the parameter, we wouldn't be able to
set it without making all the 0.2.2.7..0.2.2.10 relays act wonky.
This patch makes Tors accept the 'Circuit' name instead, so we can
turn on circuit priorities without confusing the versions that treated
the 'Circ' name as occasion to act weird.
I'm adding this because I can never remember what stuff like 'rule 3'
means. That's the one where if somebody goes limp or taps out, the
fight is over, right?
When you mean (a=b(c,d)) >= 0, you had better not say (a=b(c,d)>=0).
We did the latter, and so whenever CircPriorityHalflife was in the
consensus, it was treated as having a value of 1 msec (that is,
boolean true).
We need to make sure we have an event_base in dns.c before we call
anything that wants one. Make sure we always have one in dns_reset()
when we're a client. Fixes bug 1341.
Now if you're a published relay and you set RefuseUnknownExits, even
if your dirport is off, you'll fetch dir info from the authorities,
fetch it early, and cache it.
In the future, RefuseUnknownExits (or something like it) will be on
by default.
From http://archives.seul.org/tor/relays/Mar-2010/msg00006.html :
As I understand it, the bug should show up on relays that don't set
Address to an IP address (so they need to resolve their Address
line or their hostname to guess their IP address), and their
hostname or Address line fails to resolve -- at that point they'll
pick a random 4 bytes out of memory and call that their address. At
the same time, relays that *do* successfully resolve their address
will ignore the result, and only come up with a useful address if
their interface address happens to be a public IP address.
When the bandwidth-weights branch added the "directory-footer"
token, and began parsing the directory footer at the first
occurrence of "directory-footer", it made it possible to fool the
parsing algorithm into accepting unsigned data at the end of a
consensus or vote. This patch fixes that bug by treating the footer
as starting with the first "directory-footer" or the first
"directory-signature", whichever comes first.
Treat strings returned from signed_descriptor_get_body_impl() as not
NUL-terminated. Since the length of the strings is available, this is
not a big problem.
Discovered by rieo.
Don't allow anything but directory-signature tokens in a consensus after
the first directory-signature token. Fixes bug in bandwidth-weights branch.
Found by "outofwords."
Another dereference-then-NULL-check sequence. No reports of this bug
triggered in the wild. Fixes bugreport 1256.
Thanks to ekir for discovering and reporting this bug.
Fix a dereference-then-NULL-check sequence. This bug wasn't triggered
in the wild, but we should fix it anyways in case it ever happens.
Also make sure users get a note about this being a bug when they
see it in their log.
Thanks to ekir for discovering and reporting this bug.
This means that "if (E<G) {abc} else if (E>=G) {def}" can be replaced with
"if (E<G) {abc} else {def}"
Doing the second test explicitly made my mingw gcc nervous that we might
never be initializing casename.
For my 64-bit Linux system running with GCC 4.4.3-fc12-whatever, you
can't do 'printf("%lld", (int64_t)x);' Instead you need to tell the
compiler 'printf("%lld", (long long int)x);' or else it doesn't
believe the types match. This is why we added U64_PRINTF_ARG; it
looks like we needed an I64_PRINTF_ARG too.
asprintf() is a GNU extension that some BSDs have picked up: it does a printf
into a newly allocated chunk of RAM.
Our tor_asprintf() differs from standard asprintf() in that:
- Like our other malloc functions, it asserts on OOM.
- It works on windows.
- It always sets its return-field.
All other bandwidthrate settings are restricted to INT32_MAX, but
this check was forgotten for PerConnBWRate and PerConnBWBurst. Also
update the manpage to reflect the fact that specifying a bandwidth
in terabytes does not make sense, because that value will be too
large.
Fix a dereference-then-NULL-check sequence. This bug wasn't triggered
in the wild, but we should fix it anyways in case it ever happens.
Also make sure users get a note about this being a bug when they
see it in their log.
Thanks to ekir for discovering and reporting this bug.
On Windows, we don't have a notion of ~ meaning "our homedir", so we
were deliberately using an #ifdef to avoid calling expand_filename()
in multiple places. This is silly: The right place to turn a function
into a no-op on a single platform is in the function itself, not in
every single call-site.
Spec conformance issue: The code didn't force the network-status-version
token to be the first token in a v3 vote or consensus.
Problem discovered by Parakeep.
We need to use evdns_add_server_port_with_base() when configuring
our DNS listener, because libevent segfaults otherwise. Add a macro
in compat_libevent.h to pick the correct implementation depending
on the libevent version.
Fixes bug 1143, found by SwissTorExit
The src and dest of a memcpy() call aren't supposed to overlap,
but we were sometimes calling tor_addr_copy() as a no-op.
Also, tor_addr_assign was a redundant copy of tor_addr_copy(); this patch
removes it.
We implemented ratelimiting for warnings going into the logfile, but didn't
rate-limit controller events. Now both log warnings and controller events
are rate-limited.
Tor has tor_lookup_hostname(), which prefers ipv4 addresses automatically.
Bug 1244 occured because gethostbyname() returned an ipv6 address, which
Tor cannot handle currently. Fixes bug 1244; bugfix on 0.0.2pre25.
Reported by Mike Mestnik.
The problem was that we didn't allocate enough memory on 32-bit
platforms with 64-bit time_t. The memory leak occured every time
we fetched a hidden service descriptor we've fetched before.
When calculating the is_exit flag for a routerinfo_t, we don't need
to call exit_policy_is_general_exit() if router_exit_policy_rejects_all()
tells us it definitely is an exit. This check is much cheaper than
running exit_policy_is_general_exit().
exit_policy_is_general_exit() assumed that there are no redundancies
in the passed policy, in the sense that we actively combine entries
in the policy to really get rid of any redundancy. Since we cannot
do that without massively rewriting the policy lines the relay
operators set, fix exit_policy_is_general_exit().
Fixes bug 1238, discovered by Martin Kowalczyk.
We accidentally freed the internal buffer for bridge stats when we
were writing the bridge stats file or honoring a control port
request for said data. Change the interfaces for
geoip_get_bridge_stats* to prevent these problems, and remove the
offending free/add a tor_strdup.
Fixes bug 1208.
It's a bit confusing to have a loop where another function,
confusingly named "*_free", is responsible for advancing the loop
variable (or rather, for altering a structure so that the next time
the loop variable's initializer is evaluated it evaluates to something
different.)
Not only has this confused people: it's also confused coverity scan.
Let's fix that.
This was freaking out some relay operators without good reason, as
it is nothing the relay operator can do anything about anyways.
Quieting this warning suggested by rieo.
The OutboundBindAddress option is useful for making sure that all of
your outbond connections use a given interface. But when connecting
to 127.0.0.1 (or ::1 even) it's important to actually have the
connection come _from_ localhost, since lots of programs running on
localhost use the source address to authenticate that the connection
is really coming from the same host.
Our old code always bound to OutboundBindAddress, whether connecting
to localhost or not. This would potentially break DNS servers on
localhost, and socks proxies on localhost. This patch changes the
behavior so that we only look at OutboundBindAddress when connecting
to a non-loopback address.
this case can now legitimately happen, if you have a cached v2 status
from moria1, and you run with the new list of dirservers that's missing
the old moria1. it's nothing to worry about; the file will die off in
a month or two.
...to let us
rate-limit client connections as they enter the network. It's
controlled in the consensus so we can turn it on and off for
experiments. It's starting out off. Based on proposal 163.
Specifically, there are two cases: a) are we willing to start a new
circuit at a node not in your ExitNodes config option, and b) are we
willing to make use of a circuit that's already established but has an
unsuitable exit.
Now we discard all your circuits when you set ExitNodes, so the only
way you could end up with an exit circuit that ends at an unsuitable
place is if we explicitly ran out of exit nodes, StrictNodes was 0,
and we built this circuit to solve a stream that needs solving.
Fixes bug in dc322931, which would ignore the just-built circuit because
it has an unsuitable exit.
Before it would prepend your requested entrynodes to your list of guard
nodes, but feel free to use others after that. Now it chooses only
from your EntryNodes if any of those are available, and only falls back
to others if a) they're all down and b) StrictNodes is not set.
Also, now we refresh your entry guards from EntryNode at each consensus
fetch (rather than just at startup and then they slowly rot as the
network changes).
The goal here is to make users less likely to set StrictNodes, since
it's doing closer to what they expect it should be doing.
We do this in too many places throughout the code; it's time to start
clamping down.
Also, refactor Karsten's patch to use strchr-then-strndup, rather than
malloc-then-strlcpy-then-strchr-then-clear.
Fix statistics on client numbers by country as seen by bridges that were
broken in 0.2.2.1-alpha. Also switch to reporting full 24-hour intervals
instead of variable 12-to-48-hour intervals.
The HSAuthorityRecordStats option was used to track statistics of overall
hidden service usage on the version 0 hidden service authorities. With the
version 2 hidden service directories being deployed and version 0
descriptors being phased out, these statistics are not as useful anymore.
Goodbye, you fine piece of software; my first major code contribution to
Tor.
The new rule is: safe_str_X() means "this string is a piece of X
information; make it safe to log." safe_str() on its own means
"this string is a piece of who-knows-what; make it safe to log".
The rule is now: take the value from the CircuitPriorityHalflife
config option if it is set. If it zero, disable the cell_ewma
algorithm. If it is set, use it to calculate the scaling factor.
If it is not set, look for a CircPriorityHalflifeMsec parameter in the
consensus networkstatus. If *that* is zero, then disable the cell_ewma
algorithm; if it is set, use it to calculate the scaling factor.
If it is not set at all, disable the algorithm.
In connection_dir_client_reached_eof, we make sure that we either
return when we get an http status code of 503 or handle the problem
and set it to 200. Later we check if the status code is 503. Remove
that check.
There are two big changes here:
- We store active circuits in a priority queue for each or_conn,
rather than doing a linear search over all the active circuits
before we send each cell.
- Rather than multiplying every circuit's cell-ewma by a decay
factor every time we send a cell (thus normalizing the value of a
current cell to 1.0 and a past cell to alpha^t), we instead
only scale down the cell-ewma every tick (ten seconds atm),
normalizing so that a cell sent at the start of the tick has
value 1.0).
Each circuit is ranked in terms of how many cells from it have been
relayed recently, using a time-weighted average.
This patch has been tested this on a private Tor network on PlanetLab,
and gotten improvements of 12-35% in time it takes to fetch a small
web page while there's a simultaneous large data transfer going on
simultaneously.
[Commit msg by nickm based on mail from Ian Goldberg.]
This changes the pqueue API by requiring an additional int in every
structure that we store in a pqueue to hold the index of that structure
within the heap.
Some *_free functions threw asserts when passed NULL. Now all of them
accept NULL as input and perform no action when called that way.
This gains us consistence for our free functions, and allows some
code simplifications where an explicit null check is no longer necessary.
Do not segfault when writing buffer stats when we haven't observed a
single circuit to report about. This is a minor bug that would only show
up in testing environments with no traffic and with reduced stats
intervals.
Avoid crashing if the client is trying to upload many bytes and the
circuit gets torn down at the same time, or if the flip side
happens on the exit relay. Bugfix on 0.2.0.1-alpha; fixes bug 1150.