Instead of taking the length of a buffer, we were taking the length of
a pointer, so that our debugging log would cover only the first
sizeof(void*) bytes of the client nonce.
If 'intro' is NULL in these functions, I'm pretty sure that the
error message must be set before we hit the end. But scan-build
doesn't notice that, and is worried that we'll do a null-pointer
dereference in the last-chance errormsg generation.
As it stands, it relies on the fact that onion_queue_entry_remove
will magically remove each onionskin from the right list. This
patch changes the logic to be more resilient to possible bugs in
onion_queue_entry_remove, and less confusing to static analysis tools.
scan-build doesn't realize that a request can't be timed at the end
unless it's timed at the start, and so it's not possible for us to
be subtracting start from end without start being set.
Nevertheless, let's not confuse it.
When get_proxy_addrport returned PROXY_NONE, it would leave
addr/port unset. This is inconsistent, and could (if we used the
function in a stupid way) lead to undefined behavior. Bugfix on
5b050a9b0, though I don't think it affects tor-as-it-is.
Throughout circuituse, when we log about a circuit, we log its
desired path length from build_state. scan-build is irrationally
concerned that build_state might be NULL.
In circuitmux_detach_all_circuits, we check whether an HT iterator
gives us NULL. That should be impossible for an HT iterator. But
our checking it has confused scan-build (justly) into thinking that
our later use of HT_NEXT_RMV might not be kosher. I'm taking the
coward's route here and strengthening the check. Bugfix on
fd31dd44. (Not a real bug though)
If we fail in circuit_get_by_rend_token_and_purpose because the
circuit has no rend_info, don't try to reference fiends from its
rend_info when logging an error. Bugfix on 8b9a2cb68, which is
going into Tor 0.2.5.4-alpha.
Fixes a possible root cause of 11553 by only making 64 attempts at
most to pick a circuitID. Previously, we would test every possible
circuit ID until we found one or ran out.
This algorithm succeeds probabilistically. As the comment says:
This potentially causes us to give up early if our circuit ID
space is nearly full. If we have N circuit IDs in use, then we
will reject a new circuit with probability (N / max_range) ^
MAX_CIRCID_ATTEMPTS. This means that in practice, a few percent
of our circuit ID capacity will go unused.
The alternative here, though, is to do a linear search over the
whole circuit ID space every time we extend a circuit, which is
not so great either.
This makes new vs old clients distinguishable, so we should try to
batch it with other patches that do that, like 11438.
This means that tor can run without needing to communicate with ioctls
to the firewall, and therefore doesn't need to run with privileges to
open the /dev/pf device node.
A new TransProxyType is added for this purpose, "pf-divert"; if the user
specifies this TransProxyType in their torrc, then the pf device node is
never opened and the connection destination is determined with getsockname
(as per pf(4)). The default behaviour (ie., when TransProxyType is "default"
when using the pf firewall) is still to assume that pf is configured with
rdr-to rules.
This isn't on by default; to get it, you need to set "TransProxyType
ipfw". (The original patch had automatic detection for whether
/dev/pf is present and openable, but that seems marginally fragile.)
Libevent uses an arc4random implementation (I know, I know) to
generate DNS transaction IDs and capitalization. But it liked to
initialize it either with opening /dev/urandom (which won't work
under the sandbox if it doesn't use the right pointer), or with
sysctl({CTL_KERN,KERN_RANDOM,RANDOM_UUIC}). To make _that_ work, we
were permitting sysctl unconditionally. That's not such a great
idea.
Instead, we try to initialize the libevent PRNG _before_ installing
the sandbox, and make sysctl always fail with EPERM under the
sandbox.
Appearently, the majority of the filenames we pass to
sandbox_cfg_allow() functions are "freeable right after". So, consider
_all_ of them safe-to-steal, and add a tor_strdup() in the few cases
that aren't.
(Maybe buggy; revise when I can test.)
(If we don't restrict rename, there's not much point in restricting
open, since an attacker could always use rename to make us open
whatever they want.)
A new set of unit test cases are provided, as well as introducing
an alternative paradigm and macros to support it. Primarily, each test
case is given its own namespace, in order to isolate tests from each
other. We do this by in the usual fashion, by appending module and
submodule names to our symbols. New macros assist by reducing friction
for this and other tasks, like overriding a function in the global
namespace with one in the current namespace, or declaring integer
variables to assist tracking how many times a mock has been called.
A set of tests for a small-scale module has been included in this
commit, in order to highlight how the paradigm can be used. This
suite gives 100% coverage to status.c in test execution.
My first implementation was broken, since it returned "whether there
is one bridge" rather than "how many bridges."
Also, the implementation for the n_options_out feature in
choose_random_entry_impl was completely broken due to a missing *.
When we successfully create a usable circuit after it previously
timed out for a certain amount of time, we should make sure that
our public IP address hasn't changed and update our descriptor.
In C, it's a bad idea to do this:
char *cp = array;
char *end = array + array_len;
/* .... */
if (cp + 3 >= end) { /* out of bounds */ }
because cp+3 might be more than one off the end of the array, and
you are only allowed to construct pointers to the array elements,
and to an element one past the end. Instead you have to say
if (cp - array + 3 >= array_len) { /* ... */ }
or something like that.
This patch fixes two of these: one in process_versions_cell
introduced in 0.2.0.10-alpha, and one in process_certs_cell
introduced in 0.2.3.6-alpha. These are both tracked under bug
10363. "bobnomnom" found and reported both. See also 10313.
In our code, this is likely to be a problem as we used it only if we
get a nasty allocator that makes allocations end close to (void*)-1.
But it's best not to have to worry about such things at all, so
let's just fix all of these we can find.
According to reports, most programs degrade somewhat gracefully on
getting no answer for an MX or a CERT for www.example.com, but many
flip out completely on a NOTIMPL error.
Also, treat a QTYPE_ALL query as just asking for an A record.
The real fix here is to implement proposal 219 or something like it.
Fixes bug 10268; bugfix on 0.2.0.1-alpha.
Based on a patch from "epoch".
Found by testing with chutney. The old behavior was "fail an
assertion", which obviously isn't optimal.
Bugfix on 8b9a2cb68b290e550695124d7ef0511225b451d5; bug not in any
released version.
Also, stop accepting the old kind of RESOLVED cells with no TTL
fields; they haven't been sent since 0.1.1.6-alpha.
This patch won't work without the fix to #10468 -- it will break
DNSPorts unless they set the proper ipv4/6 flags on entry_connection_t.
Otherwise, it could mung the thing that came over the net on windows,
which would defeat the purpose of recording the unparseable thing.
Fixes bug 11342; bugfix on 0.2.2.1-alpha.
This is a fix for 9963. I say this is a feature, but if it's a
bugfix, it's a bugfix on 0.2.4.18-rc.
Old behavior:
Mar 27 11:02:19.000 [notice] Bootstrapped 50%: Loading relay descriptors.
Mar 27 11:02:20.000 [notice] Bootstrapped 51%: Loading relay descriptors.
Mar 27 11:02:20.000 [notice] Bootstrapped 52%: Loading relay descriptors.
... [Many lines omitted] ...
Mar 27 11:02:29.000 [notice] Bootstrapped 78%: Loading relay descriptors.
Mar 27 11:02:33.000 [notice] We now have enough directory information to build circuits.
New behavior:
Mar 27 11:16:17.000 [notice] Bootstrapped 50%: Loading relay descriptors
Mar 27 11:16:19.000 [notice] Bootstrapped 55%: Loading relay descriptors
Mar 27 11:16:21.000 [notice] Bootstrapped 60%: Loading relay descriptors
Mar 27 11:16:21.000 [notice] Bootstrapped 65%: Loading relay descriptors
Mar 27 11:16:21.000 [notice] Bootstrapped 70%: Loading relay descriptors
Mar 27 11:16:21.000 [notice] Bootstrapped 75%: Loading relay descriptors
Mar 27 11:16:21.000 [notice] We now have enough directory information to build circuits.
Most of these are simple. The only nontrivial part is that our
pattern for using ENUM_BF was confusing doxygen by making declarations
that didn't look like declarations.
In circuitlist_free_all, we free all the circuits, removing them from
the map as we go, but we weren't actually freeing the placeholder
entries that we use to indicate pending DESTROY cells.
Fix for bug 11278; bugfix on the 7912 code that was merged in
0.2.5.1-alpha
There are still quite a few 0.2.3.2x relays running for x<5, and while I
agree they should upgrade, I don't think cutting them out of the network
is a net win on either side.
This contains the obvious implementation using the circuitmux data
structure. It also runs the old (slow) algorithm and compares
the results of the two to make sure that they're the same.
Needs review and testing.
This change prevents LD_BUG warnings and bootstrap failure messages
when we try to do directory fetches when starting with
DisableNetwork == 1, a consensus present, but no descriptors (or
insufficient descriptors) yet.
Fixes bug 11200 and bug 10405. It's a bugfix on 0.2.3.9-alpha.
Thanks to mcs for walking me through the repro instructions!
This is meant to be a better bug 9229 fix -- or at least, one more
in tune with the intent of the original code, which calls
router_retry_directory_downloads() only on the first bridge descriptor.
This prevents long stalls when we're starting with a state file but
with no bridge descriptors. Fixes bug 9229. I believe this bug has
been present since 0.2.0.3-alpha.
By default, after you've made a connection to port XYZ, we assume
you might still want to have an exit ready to connect to XYZ for one
hour. This patch lets you lower that interval.
Implements ticket 91
This should fixes some "hey, that function could have
__attribute__((noreturn))" warnings introduced by f96400d9.
Bug not in any released version of Tor.
We have ignored any ports listed here since 80365b989 (0.0.7rc1),
but we didn't warn the user that we were ignoring them. This patch
adds a warning if you put explicit ports in any of the options
{Socks,Dir}Policy or AuthDir{Reject,Invalid,BadDir,BadExit}. It
also adjusts the manpage to say that ports are ignored.
Fixes ticket 11108.
In a couple of places, to implement the OOM-circuit-killer defense
against sniper attacks, we have counters to remember the age of
cells or data chunks. These timers were based on wall clock time,
which can move backwards, thus giving roll-over results for our age
calculation. This commit creates a low-budget monotonic time, based
on ratcheting gettimeofday(), so that even in the event of a time
rollback, we don't do anything _really_ stupid.
A future version of Tor should update this function to do something
even less stupid here, like employ clock_gettime() or its kin.
Back in 5e762e6a5c, non-exit servers
stopped launching DNS requests for users. So there's no need for them
to see if their DNS answers are hijacked.
Patch from Matt Pagan. I think this is a 965 fix.
For a client using a SocksPort connection and IPv6, the connect reply
from tor daemon did not handle AF_INET6 thus sending back the wrong
payload to the client.
A changes file is provided and this fixes#10987
Signed-off-by: David Goulet <dgoulet@ev0ke.net>
When we have more than two return values, we should really be using
an enum rather than "-2 means this, -1 means that, 0 means this, and
1 or more means a number."
On busy servers, this function takes up something like 3-7% in
different profiles, and gets invoked every time we need to participate
as the midpoint in a hidden service.
So maybe walking through a linked list of all the circuits here wasn't
a good idea.
We log only one message, containing a complete list of what's
wrong. We log the complete list whenever any of the possible things
that could have gotten wrong gets worse.
Fix for #9870. Bugfix on 10480dff01, which we merged in
0.2.5.1-alpha.
This patch splits out some of the functions in OOM handling so that
it's easier to check them without involving the rest of Tor or
requiring that the circuits be "wired up".
It's increasingly apparent that we want to make sure we initialize our
PRNG nice and early, or else OpenSSL will do it for us. (OpenSSL
doesn't do _too_ bad a job, but it's nice to do it ourselves.)
We'll also need this for making sure we initialize the siphash key
before we do any hashes.
I've made an exception for cases where I'm sure that users can't
influence the inputs. This is likely to cause a slowdown somewhere,
but it's safer to siphash everything and *then* look for cases to
optimize.
This patch doesn't actually get us any _benefit_ from siphash yet,
since we don't really randomize the key at any point.
These options were added back in 0.1.2.5-alpha, but no longer make any
sense now that all directories support tunneled connections and
BEGIN_DIR cells. These options were on by default; now they are
always-on.
This is a fix for 10849, where TunnelDirConns 0 would break hidden
services -- and that bug arrived, I think, in 0.2.0.10-alpha.
(There is no longer meaningfully any such thing as a HS authority,
since we stopped uploading or downloading v0 hs descriptors in
0.2.2.1-alpha.)
Implements #10881, and part of #10841.
This patch removes an "if (chan)" that occurred at a place where
chan was definitely non-NULL. Having it there made some static
analysis tools conclude that we were up to shenanigans.
This resolves#9979.
Right now this accounts for about 1% of circuits over all, but if you
pick a guard that's running 0.2.3, it will be about 6% of the circuits
running through that guard.
Making sure that every circuit has at least one ntor link means that
we're getting plausibly good forward secrecy on every circuit.
This implements ticket 9777,
It's possible to set your ExitNodes to contains only exits that don't
have the Exit flag. If you do that, we'll decide that 0 of your exits
are working. Instead, in that case we should look at nodes which have
(or which might have) exit policies that don't reject everything.
Fix for bug 10543; bugfix on 0.2.4.10-alpha.
According to control spec, longname should not contain any spaces and is
consists only of identy_digest + nickname
added two functions:
* node_get_verbose_nickname_by_id()
* node_describe_longname_by_id()
If you want a slow shutdown, send SIGNAL SHUTDOWN.
(Why not just have the default be SIGNAL QUIT? Because this case
should only happen when an owning controller has crashed, and a
crashed controller won't be able to give the user any "tor is
shutting down" feedback, and so the user gets confused for a while.
See bug 10449 for more info)
I'm doing this because:
* User doesn't mean you're running as root, and running as root
doesn't mean you've set User.
* It's possible that the user has done some other
capability-based hack to retain the necessary privileges.
The remaining vestige is that we continue to publish the V2dir flag,
and that, for the controller, we continue to emit v2 directory
formats when requested.
Previously, we would sometimes decide in directory_get_from_hs_dir()
to connect to an excluded node, and then later in
directory_initiate_command_routerstatus_rend() notice that it was
excluded and strictnodes was set, and catch it as a stopgap.
Additionally, this patch preferentially tries to fetch from
non-excluded nodes even when StrictNodes is off.
Fix for bug #10722. Bugfix on 0.2.0.10-alpha (the v2 hidserv directory
system was introduced in e136f00ca). Reported by "mr-4".
If we don't, we can wind up with a wedged cpuworker, and write to it
for ages and ages.
Found by skruffy. This was a bug in 2dda97e8fd, a.k.a. svn
revision 402. It's been there since we have been using cpuworkers.
When I introduced the unusable_for_new_circuits flag in
62fb209d83, I had a spurious ! in the
circuit_stream_is_being_handled loop. This made us decide that
non-unusable circuits (that is, usable ones) were the ones to avoid,
and caused it to launch a bunch of extra circuits.
Fixes bug 10456; bugfix on 0.2.4.12-alpha.
When we wrote the directory request statistics code in August 2009, we
thought that these statistics were only relevant for bridges, and that
bridges should not report them. That's why we added a switch to discard
relevant observations made by bridges. This code was first released in
0.2.2.1-alpha.
In May 2012 we learned that we didn't fully disable directory request
statistics on bridges. Bridges did report directory request statistics,
but these statistics contained empty dirreq-v3-ips and dirreq-v3-reqs
lines. But the remaining dirreq-* lines have always been non-empty. (We
didn't notice for almost three years, because directory-request statistics
were disabled by default until 0.2.3.1-alpha, and all statistics have been
removed from bridge descriptors before publishing them on the metrics
website.)
Proposal 201, created in May 2012, suggests to add a new line called
bridge-v3-reqs that is similar to dirreq-v3-reqs, but that is published
only by bridges. This proposal is still open as of December 2013.
Since October 2012 we're using dirreq-v3-resp (not -reqs) lines in
combination with bridge-ips lines to estimate bridge user numbers; see
task 8462. This estimation method has superseded the older approach that
was only based on bridge-ips lines in November 2013. Using dirreq-v3-resp
and bridge-ips lines is a workaround. The cleaner approach would be to
use dirreq-v3-reqs instead.
This commit makes bridges report the same directory request statistics as
relays, including dirreq-v3-ips and dirreq-v3-reqs lines. It makes
proposal 201 obsolete.
In 0.2.3.8-alpha we attempted to "completely disable stats if we aren't
running as a relay", but instead disabled them only if we aren't running
as a server.
This commit leaves DirReqStatistics enabled on both relays and bridges,
and disables (Cell,Entry,ExitPort)Statistics on bridges.
The 'body' field of a microdesc_t holds a strdup()'d value if the
microdesc's saved_location field is SAVED_IN_JOURNAL or
SAVED_NOWHERE, and holds a pointer to the middle of an mmap if the
microdesc is SAVED_IN_CACHE. But we weren't setting that field
until a while after we parsed the microdescriptor, which left an
interval where microdesc_free() would try to free() the middle of
the mmap().
This patch also includes a regression test.
This is a fix for #10409; bugfix on 0.2.2.6-alpha.
The old behavior was that NULL matched only bridges without known
identities; the correct behavior is that NULL should match all
bridges (assuming that their addr:port matches).
We were checking whether a 8-bit length field had overflowed a
503-byte buffer. Unless somebody has found a way to store "504" in a
single byte, it seems unlikely.
Fix for 10313 and 9980. Based on a pach by Jared L Wong. First found
by David Fifield with STACK.
It's conceivable (but probably impossible given our code) that lseek
could return -1 on an error; when that happens, we don't want off to
become -1.
Fixes CID 1035124.
This was a mistake in the merge commit 7a2b30fe16. It
would have made the CellStatistics code give completely bogus
results. Bug not in any released Tor.
We had accidentially grown two fake ones: one for backtrace.c, and one
for sandbox.c. Let's do this properly instead.
Now, when we configure logs, we keep track of fds that should get told
about bad stuff happening from signal handlers. There's another entry
point for these that avoids using non-signal-handler-safe functions.
On platforms with the backtrace/backtrace_symbols_fd interface, Tor
can now dump stack traces on assertion failure. By default, I log
them to DataDir/stack_dump and to stderr.
Conflicts:
src/or/or.h
src/or/relay.c
Conflicts were simple to resolve. More fixes were needed for
compilation, including: reinstating the tv_to_msec function, and renaming
*_conn_cells to *_chan_cells.
In proposal 157, we added a cross-certification element for
directory authority certificates. We implemented it in
0.2.1.9-alpha. All Tor directory authorities now generate it.
Here, as planned, make it required, so that we can finally close
proposal 157.
The biggest change in the code is in the unit test data, where some
old hardcoded certs that we made long ago have become no longer
valid and now need to be replaced.
Previously, when we ran low on memory, we'd close whichever circuits
had the most queued cells. Now, we close those that have the
*oldest* queued cells, on the theory that those are most responsible
for us running low on memory, and that those are the least likely to
actually drain on their own if we wait a little longer.
Based on analysis from a forthcoming paper by Jansen, Tschorsch,
Johnson, and Scheuermann. Fixes bug 9093.
Roger spotted this on tor-dev in his comments on proposal 221.
We etect DESTROY vs everything else, since arma likes network
timeout indicating failure but not overload indicating failure.
Don't cast uint64_t * to const uint64_t * explicitly. The cast is always
safe, so C does it for us. Doing the cast explitictly can hide bugs if
the input is secretly the wrong type.
Suggested by Nick.
As a bridge authority, before we create our networkstatus document, we
should compute the thresholds needed for the various status flags
assigned to each bridge based on the status of all other bridges. We
then add these thresholds to the networkstatus document for easy access.
Fixes for #1117 and #9859.
Also fix a bug where if the guard we choose first doesn't answer, we
would try the second guard, but once we connected to the second guard
we would abandon it and retry the first one, slowing down bootstrapping.
The fix in both cases is to treat all our initially chosen guards as
acceptable to use.
Fixes bug 9946.
We had updated our "do we have enough microdescs to begin building
circuits?" logic most recently in 0.2.4.10-alpha (see bug 5956), but we
left the bootstrap status event logic at "how far through getting 1/4
of them are we?"
Fixes bug 9958; bugfix on 0.2.2.36, which is where they diverged (see
bug 5343).
The old code had logic to use a shorter path length if we didn't
have enough nodes. But we don't support 2-node networks anwyay.
Fix for #9926. I'm not calling this a bugfix on any particular
version, since a 2-node network would fail to work for you for a lot
of other reasons too, and it's not clear to me when that began, or if
2-node networks would ever have worked.
By calling circuit_n_chan_done() unconditionally on close, we were
closing pending connections that might not have been pending quite for
the connection we were closing. Fix for bug 9880.
Thanks to skruffy for finding this and explaining it patiently until
we understood.