When I removed version_supports_begindir, I accidentally removed the
mechanism we had been using to make a directory cache self-test its
directory port. This caused bug 6815, which in turn caused bug 6814
(both in 0.2.4.2-alpha).
To fix this bug, I'm replacing the "anonymized_connection" argument to
directory_initiate_command_* with an enumeration to say how indirectly
to connect to a directory server. (I don't want to reinstate the
"version_supports_begindir" argument as "begindir_ok" or anything --
these functions already take too many arguments.)
For safety, I made sure that passing 0 and 1 for 'indirection' gives
the same result as you would have gotten before -- just in case I
missed any 0s or 1s.
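To illustrate the shape of the new argument, here is a minimal
sketch in C (the names here are hypothetical, not the actual tor
source): the first two enum values are pinned to the old boolean's
meanings, and a new value covers the direct self-test case.

    /* Hypothetical sketch of the indirection enumeration. The first
     * two values line up with the old anonymized_connection boolean,
     * so any stray 0 or 1 keeps its old meaning. */
    typedef enum {
      DIR_IND_ONEHOP = 0,     /* 0: non-anonymized, as before */
      DIR_IND_ANONYMOUS = 1,  /* 1: over an anonymizing circuit */
      DIR_IND_DIRECT_CONN = 2 /* new: straight to our own dirport,
                               * for directory-port self-testing */
    } dir_indirection_sketch_t;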
With this patch, I dump the old kludge of using magic negative
numbers to indicate unknown bandwidths. I also compute each node's
weighted bandwidth exactly once, rather than computing it once in
a loop to compute the total weighted bandwidth and a second time in
a loop to find which one we picked.
Previously, we had incremented rand_bw so that when we later tested
"tmp >= rand_bw", we wouldn't have an off-by-one error. But instead,
it makes more sense to leave rand_bw alone and test "tmp > rand_bw".
Note that this is still safe. To take the example from the bug1203
writeup: Suppose that we have 3 nodes with bandwidth 1. So the
bandwidth array is { 1, 1, 1 }, and the total bandwidth is 3. We
choose rand_bw == 0, 1, or 2. With the first iteration of the loop,
tmp is now 1; with the second, tmp is 2; with the third, tmp is 3.
Now that our check is tmp > rand_bw, we will set i in the first
iteration of the loop iff rand_bw == 0; in the second iteration of
the loop iff rand_bw == 1; and in the third iff rand_bw == 2.
That's what we want.
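As a sketch (plain C, not the actual tor code), the selection loop
under the new rule looks like this; running it on {1, 1, 1} with
rand_bw in {0, 1, 2} reproduces the walkthrough above.

    #include <stdint.h>

    /* Pick an index with probability proportional to bandwidths[j],
     * where rand_bw is drawn uniformly from [0, total-1]. */
    static int
    choose_index(const uint64_t *bandwidths, int n, uint64_t rand_bw)
    {
      uint64_t tmp = 0;
      int j;
      for (j = 0; j < n; ++j) {
        tmp += bandwidths[j];
        if (tmp > rand_bw) /* strict test: no pre-incremented rand_bw */
          return j;
      }
      return n - 1; /* unreachable when rand_bw < total bandwidth */
    }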
Incidentally, this change makes the bug 6538 fix more ironclad: once
rand_bw is set to UINT64_MAX, tmp > rand_bw is obviously false
regardless of the value of tmp.
The old approach, because of its "tmp >= rand_bw &&
!i_has_been_chosen" check, would run through the second part of the
loop slightly slower than the first part. Now, we remove
i_has_been_chosen, and instead set rand_bw = UINT64_MAX, so that
every run of the loop does exactly the same amount of work
regardless of the initial value of rand_bw.
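In shape (again a sketch, not the actual code), the loop becomes:

    #include <stdint.h>

    static int
    choose_index_fixed_work(const uint64_t *bandwidths, int n,
                            uint64_t rand_bw)
    {
      uint64_t tmp = 0;
      int chosen = 0, j;
      for (j = 0; j < n; ++j) {
        tmp += bandwidths[j];
        if (tmp > rand_bw) {
          chosen = j;
          /* "tmp > UINT64_MAX" can never be true, so no later
           * iteration can overwrite the choice. */
          rand_bw = UINT64_MAX;
        }
      }
      return chosen;
    }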
Fix for bug 6538.
This should make our preferred solution to #6538 easier to
implement, avoid a bunch of potential nastiness with excessive
int-vs-double math, and generally make the code there a little less
scary.
"But wait!" you say. "Is it really safe to do this? Won't the
results come out differently?"
Yes, but not much. We now round every weighted bandwidth to the
nearest byte before computing on it. This will make every node whose
weighted bandwidth previously had a fractional part either slightly
more likely or slightly less likely to be chosen. Further, the rand_bw
value was only ever set with integer precision, so it can't
accurately sample routers with tiny fractional bandwidth values
anyway. Finally, doing repeated double-vs-uint64 comparisons is
just plain sad; it will involve an implicit cast to double, which is
never a fun thing.
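Concretely, the rounding amounts to something like this (a sketch
with an assumed helper name, not the actual code):

    #include <math.h>
    #include <stdint.h>

    /* Round a fractional weighted bandwidth to the nearest whole
     * byte, so all later arithmetic and comparisons stay in
     * uint64 territory. */
    static uint64_t
    weighted_bw_round(double weight, uint32_t bw)
    {
      double product = weight * bw;
      if (product < 0.0)
        product = 0.0; /* no more magic negative values */
      return (uint64_t) llround(product);
    }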
The SMARTLIST_FOREACH macro is more convenient than BEGIN/END when
you have a nice short loop body, but using it for long bodies makes
your preprocessor tell the compiler that all the code is on the same
line. That causes grief, since compiler warnings and debugger lines
will all refer to that one line.
So, here's a new style rule: SMARTLIST_FOREACH blocks need to be
short.
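For example (assuming Tor's container.h; process_name() is a
hypothetical helper):

    /* Fine: a one-statement body fits in SMARTLIST_FOREACH. */
    SMARTLIST_FOREACH(chunks, char *, cp, tor_free(cp));

    /* Longer bodies should use the BEGIN/END form, so each statement
     * keeps its own line in warnings and under the debugger. */
    SMARTLIST_FOREACH_BEGIN(names, const char *, name) {
      if (!name || !*name)
        continue;
      process_name(name);
    } SMARTLIST_FOREACH_END(name);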
Also, try to resolve some doxygen issues. First, define a magic
"This is doxygen!" macro so that we take the correct branch in
various #if/#else/#endifs in order to get the right documentation.
Second, add a few grouping @{ and @} entries so that some related
variables and fields are documented together.
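A sketch of both techniques (the macro name here is hypothetical;
the grouping syntax is standard doxygen):

    /* In the Doxyfile:  PREDEFINED = THIS_IS_DOXYGEN
     * so that headers expose the branch we want documented: */
    #if defined(THIS_IS_DOXYGEN) || defined(SOME_PLATFORM_FEATURE)
    /** This is the declaration doxygen sees and documents. */
    int frobnicate(void);
    #endif

    /** @name Bandwidth fields
     * Fields documented together as one group. */
    /**@{*/
    int bw_rate;
    int bw_burst;
    /**@}*/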
This patch changes the total serverdesc threshold from 25% to 75%
and the exit threshold from 33% to 50%. The goal is to make
initially constructed circuits less horrible, and to make the
user's initial experience less awful (since fetching directory
information in parallel with
whatever the user is trying to do can hurt their performance).
Implements ticket 3196.
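In shape, the change is just two ratios (a hypothetical sketch; the
variable names are illustrative, not the actual ones):

    /* Don't claim to have enough directory info until we have
     * descriptors for 3/4 of the routers (was 1/4) and for 1/2 of
     * the exits (was 1/3). */
    int enough_dir_info =
      num_present >= (num_usable * 3) / 4 &&
      num_exit_present >= num_exit_usable / 2;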
This is ticket 2479. Roger's original explanation was:
We have a series of bugs where relays publish a descriptor within
12 hours of their last descriptor, but the authorities drop it
because it's not different "enough" from the last one and it's
too close to the last one.
The original goal of this idea was to a) reduce the number of new
descriptors authorities accept (and thus have to store) and b)
reduce the total number of descriptors that clients and mirrors
fetch. It's a defense against bugs where relays publish a new
descriptor every minute.
Now that we're putting out one consensus per hour, we're doing
better at limiting the total damage that 'b' can cause.
There are broader-scale design changes that would help here, and
we've had a trac entry open for years about how relays should
recognize that they're not in the consensus, or recognize when
their publish failed, and republish sooner.
In the meantime, I think we should change some of the parameters
to make the problem less painful.
When we started RefuseUnknownExits back in 0.2.2.11-alpha, we
started making exits act like they cache directory info (since they
need an up-to-date idea of who is really a router). But this also
made them fetch certificates for authorities they don't recognize,
which is needless and makes no sense for them.
This is related to, but not necessarily the same as, the issue that
Ian reported for bug #2297.
(This patch is based on a patch from a user who I believe has asked
not to be named. If I'm wrong about that, please add the
appropriate name onto the changelog.)