The variable was useless since it was only toggled off in disabled code.
If the 'exit_family' smartlist is empty, we don't consider exit family
anyway.
Check for consistency between the queued destroy cells and the marked
circuit IDs. Check for consistency in the count of queued destroy
cells in several ways. Check to see whether any of the marked circuit
IDs have somehow been marked longer than the channel has existed.
And add a comment about why conditions that would cause us to drop a
cell should get checked before actions that would cause us to send a
destroy cell.
Spotted by 'cypherpunks'.
And note that these issues have been present since 0.0.8pre1 (commit
0da256ef), where we added a "shutting down" state, and started
responding to all create cells with DESTROY when shutting down.
Conflicts:
src/or/channel.c
src/or/circuitlist.c
src/or/connection.c
Conflicts involved removal of next_circ_id and addition of
unusable-circid tracking.
The point of the "idle timeout" for connections is to kill the
connection a while after it has no more circuits. But using "last
added a non-padding cell" as a proxy for that is wrong, since if the
last circuit is closed from the other side of the connection, we
will not have sent anything on that connection since well before the
last circuit closed.
This is part of fixing 6799.
When applied to 0.2.5, it is also a fix for 12023.
Instead of killing an or_connection_t that has had no circuits for
the last 3 minutes, give every or_connection_t a randomized timeout,
so that an observer can't so easily infer from the connection close
time the time at which its last circuit closed.
Also, increase the base timeout for canonical connections from 3
minutes to 15 minutes.
Fix for ticket 6799.
When we find a stranded one-hop circuit, log whether it is dirty,
log information about any streams on it, and log information about
connections they might be linked to.
This code mis-handled the case where a circuit got the same circuit
ID in both directions. I found three instances of it in the
codebase, by grepping for [pn]_circ_id.
Because of the issue in command_process_relay_cell(), this would
have made roughly one circuit in a million completely nonfunctional.
Fixes bug 12195.
On some profiles of Andrea's from #11332, I found that a great deal
of time can still be attributed to functions called from
update_router_have_minimum_dir_info(). This is making our
digestmap, tor_memeq, and siphash functions take a much bigger
portion of runtime than they really should.
If we're calling update_router_have_minimum_dir_info() too often,
that's because we're calling router_dir_info_changed() too often.
And it looks like most of the callers of router_dir_info_changed()
are coming as tail-calls from router_set_status() as invoked by
channel_do_open_actions().
But we don't need to call router_dir_info_changed() so much! (I'm
not quite sure we need to call it from here at all, but...) Surely
we don't need to call it from router_set_status when the router's
status has not actually changed.
This patch makes us call router_dir_info_changed() from
router_set_status only when we are changing the router's status.
Fix for bug 12170. This is leftover from our fix back in 273ee3e81
in 0.1.2.1-alpha, where we started caching the value of
update_router_have_minimum_dir_info().
tor_memeq has started to show up on profiles, and this is one of the
most frequent callers of that function, appearing as it does on every
cell handled for entry or exit.
59f9097d5c introduced tor_memneq here;
it went into Tor 0.2.1.31. Fixes part of 12169.
Without this fix, when running with bridges, we would try fetching
directory info far too early, and have up to a 60 second delay if we
started with bridge descriptors available.
Fixes bug 11965. Fix on 0.2.3.6-alpha, arma thinks.
The old cache had problems:
* It needed to be manually preloaded. (It didn't remember any
address you didn't tell it to remember)
* It was AF_INET only.
* It looked at its cache even if the sandbox wasn't turned on.
* It couldn't remember errors.
* It had some memory management problems. (You can't use memcpy
to copy an addrinfo safely; it has pointers in.)
This patch fixes those issues, and moves to a hash table.
Fixes bug 11970; bugfix on 0.2.5.1-alpha.
The code was not disambiguating ClientTransportPlugin configured and
not used, and ClientTransportPlugin configured, but in a failed state.
The right thing to do is to undo moving the get_transport_by_addrport()
call back into get_proxy_addrport(), and remove and explicit check for
using a Bridge since by the time the check is made, if a Bridge is
being used, it is PT/proxy-less.
This change allows using Socks4Proxy, Socks5Proxy and HTTPSProxy with
ClientTransportPlugins via the TOR_PT_PROXY extension to the
pluggable transport specification.
This fixes bug #8402.
None of the things we might exec() can possibly run under the
sanbox, so rather than crash later, we have to refuse to accept the
configuration nice and early.
The longer-term solution is to have an exec() helper, but wow is
that risky.
fixes 12043; bugfix on 0.2.5.1-alpha
When we converted the horrible set of options that previously
controlled "use ORPort or DirPort? Anonymously or Non-anonymouly?" to
a single 'indirection' argument, we missed
directory_post_to_dirservers.
The problematic code was introduced in 5cbeb6080, which went into
0.2.4.3-alpha. This is a fix for bug 11469.
If somebody has configured a client to use a bridge without setting
an identity digest (not recommended), learn the identity digest from
whatever bridge descriptor we have downloaded or have in our cache.
When running with User set, we frequently try to look up our
information in the user database (e.g., /etc/passwd). The seccomp2
sandbox setup doesn't let us open /etc/passwd, and probably
shouldn't.
To fix this, we have a pair of wrappers for getpwnam and getpwuid.
When a real call to getpwnam or getpwuid fails, they fall back to a
cached value, if the uid/gid matches.
(Granting access to /etc/passwd isn't possible with the way we
handle opening files through the sandbox. It's not desirable either.)
On OpenBSD 5.4, time_t is a 32-bit integer. These instances contain
implicit treatment of long and time_t as comparable types, so explicitly
cast to time_t.
Clients should always believe that v3 directory authorities serve
extra-info documents, regardless of whether their server descriptor
contains a "caches-extra-info" line or not.
Fixes part of #11683.
on #9686, gmorehose reports that the 500 MB lower limit is too high
for raspberry pi users.
This is a backport of 647248729f to 0.2.4.
Note that in 0.2.4, the option is called MaxMemInCellQueues.
This was previously satisfied by using a temporary variable, but there
are three other instances in circuitlist.c that gcc is now bothered by,
so now introduce a CONST_TO_ORIGIN_CIRCUIT that takes a const
circuit_t instead.
When clearing a list of tokens, it's important to do token_clear()
on them first, or else any keys they contain will leak. This didn't
leak memory on any of the successful microdescriptor parsing paths,
but it does leak on some failing paths when the failure happens
during tokenization.
Fixes bug 11618; bugfix on 0.2.2.6-alpha.
If we can't detect the physical memory, the new default is 8 GB on
64-bit architectures, and 1 GB on 32-bit architectures.
If we *can* detect the physical memory, the new default is
CLAMP(256 MB, phys_mem * 0.75, MAX_DFLT)
where MAX_DFLT is 8 GB on 64-bit architectures and 2 GB on 32-bit
architectures.
You can still override the default by hand. The logic here is simply
trying to choose a lower default value on systems with less than 12 GB
of physical RAM.
Use a per-channel ratelim_t to control the rate at which we report
failures for each channel.
Explain why I picked N=32.
Never return a zero circID.
Thanks to Andrea and to cypherpunks.
Instead of taking the length of a buffer, we were taking the length of
a pointer, so that our debugging log would cover only the first
sizeof(void*) bytes of the client nonce.
If 'intro' is NULL in these functions, I'm pretty sure that the
error message must be set before we hit the end. But scan-build
doesn't notice that, and is worried that we'll do a null-pointer
dereference in the last-chance errormsg generation.
As it stands, it relies on the fact that onion_queue_entry_remove
will magically remove each onionskin from the right list. This
patch changes the logic to be more resilient to possible bugs in
onion_queue_entry_remove, and less confusing to static analysis tools.
scan-build doesn't realize that a request can't be timed at the end
unless it's timed at the start, and so it's not possible for us to
be subtracting start from end without start being set.
Nevertheless, let's not confuse it.
When get_proxy_addrport returned PROXY_NONE, it would leave
addr/port unset. This is inconsistent, and could (if we used the
function in a stupid way) lead to undefined behavior. Bugfix on
5b050a9b0, though I don't think it affects tor-as-it-is.
Throughout circuituse, when we log about a circuit, we log its
desired path length from build_state. scan-build is irrationally
concerned that build_state might be NULL.
In circuitmux_detach_all_circuits, we check whether an HT iterator
gives us NULL. That should be impossible for an HT iterator. But
our checking it has confused scan-build (justly) into thinking that
our later use of HT_NEXT_RMV might not be kosher. I'm taking the
coward's route here and strengthening the check. Bugfix on
fd31dd44. (Not a real bug though)
If we fail in circuit_get_by_rend_token_and_purpose because the
circuit has no rend_info, don't try to reference fiends from its
rend_info when logging an error. Bugfix on 8b9a2cb68, which is
going into Tor 0.2.5.4-alpha.
Fixes a possible root cause of 11553 by only making 64 attempts at
most to pick a circuitID. Previously, we would test every possible
circuit ID until we found one or ran out.
This algorithm succeeds probabilistically. As the comment says:
This potentially causes us to give up early if our circuit ID
space is nearly full. If we have N circuit IDs in use, then we
will reject a new circuit with probability (N / max_range) ^
MAX_CIRCID_ATTEMPTS. This means that in practice, a few percent
of our circuit ID capacity will go unused.
The alternative here, though, is to do a linear search over the
whole circuit ID space every time we extend a circuit, which is
not so great either.
This makes new vs old clients distinguishable, so we should try to
batch it with other patches that do that, like 11438.
This means that tor can run without needing to communicate with ioctls
to the firewall, and therefore doesn't need to run with privileges to
open the /dev/pf device node.
A new TransProxyType is added for this purpose, "pf-divert"; if the user
specifies this TransProxyType in their torrc, then the pf device node is
never opened and the connection destination is determined with getsockname
(as per pf(4)). The default behaviour (ie., when TransProxyType is "default"
when using the pf firewall) is still to assume that pf is configured with
rdr-to rules.
This isn't on by default; to get it, you need to set "TransProxyType
ipfw". (The original patch had automatic detection for whether
/dev/pf is present and openable, but that seems marginally fragile.)
Libevent uses an arc4random implementation (I know, I know) to
generate DNS transaction IDs and capitalization. But it liked to
initialize it either with opening /dev/urandom (which won't work
under the sandbox if it doesn't use the right pointer), or with
sysctl({CTL_KERN,KERN_RANDOM,RANDOM_UUIC}). To make _that_ work, we
were permitting sysctl unconditionally. That's not such a great
idea.
Instead, we try to initialize the libevent PRNG _before_ installing
the sandbox, and make sysctl always fail with EPERM under the
sandbox.
Appearently, the majority of the filenames we pass to
sandbox_cfg_allow() functions are "freeable right after". So, consider
_all_ of them safe-to-steal, and add a tor_strdup() in the few cases
that aren't.
(Maybe buggy; revise when I can test.)
(If we don't restrict rename, there's not much point in restricting
open, since an attacker could always use rename to make us open
whatever they want.)
A new set of unit test cases are provided, as well as introducing
an alternative paradigm and macros to support it. Primarily, each test
case is given its own namespace, in order to isolate tests from each
other. We do this by in the usual fashion, by appending module and
submodule names to our symbols. New macros assist by reducing friction
for this and other tasks, like overriding a function in the global
namespace with one in the current namespace, or declaring integer
variables to assist tracking how many times a mock has been called.
A set of tests for a small-scale module has been included in this
commit, in order to highlight how the paradigm can be used. This
suite gives 100% coverage to status.c in test execution.
My first implementation was broken, since it returned "whether there
is one bridge" rather than "how many bridges."
Also, the implementation for the n_options_out feature in
choose_random_entry_impl was completely broken due to a missing *.
When we successfully create a usable circuit after it previously
timed out for a certain amount of time, we should make sure that
our public IP address hasn't changed and update our descriptor.
In C, it's a bad idea to do this:
char *cp = array;
char *end = array + array_len;
/* .... */
if (cp + 3 >= end) { /* out of bounds */ }
because cp+3 might be more than one off the end of the array, and
you are only allowed to construct pointers to the array elements,
and to an element one past the end. Instead you have to say
if (cp - array + 3 >= array_len) { /* ... */ }
or something like that.
This patch fixes two of these: one in process_versions_cell
introduced in 0.2.0.10-alpha, and one in process_certs_cell
introduced in 0.2.3.6-alpha. These are both tracked under bug
10363. "bobnomnom" found and reported both. See also 10313.
In our code, this is likely to be a problem as we used it only if we
get a nasty allocator that makes allocations end close to (void*)-1.
But it's best not to have to worry about such things at all, so
let's just fix all of these we can find.
According to reports, most programs degrade somewhat gracefully on
getting no answer for an MX or a CERT for www.example.com, but many
flip out completely on a NOTIMPL error.
Also, treat a QTYPE_ALL query as just asking for an A record.
The real fix here is to implement proposal 219 or something like it.
Fixes bug 10268; bugfix on 0.2.0.1-alpha.
Based on a patch from "epoch".
Found by testing with chutney. The old behavior was "fail an
assertion", which obviously isn't optimal.
Bugfix on 8b9a2cb68b290e550695124d7ef0511225b451d5; bug not in any
released version.
Also, stop accepting the old kind of RESOLVED cells with no TTL
fields; they haven't been sent since 0.1.1.6-alpha.
This patch won't work without the fix to #10468 -- it will break
DNSPorts unless they set the proper ipv4/6 flags on entry_connection_t.
Otherwise, it could mung the thing that came over the net on windows,
which would defeat the purpose of recording the unparseable thing.
Fixes bug 11342; bugfix on 0.2.2.1-alpha.
This is a fix for 9963. I say this is a feature, but if it's a
bugfix, it's a bugfix on 0.2.4.18-rc.
Old behavior:
Mar 27 11:02:19.000 [notice] Bootstrapped 50%: Loading relay descriptors.
Mar 27 11:02:20.000 [notice] Bootstrapped 51%: Loading relay descriptors.
Mar 27 11:02:20.000 [notice] Bootstrapped 52%: Loading relay descriptors.
... [Many lines omitted] ...
Mar 27 11:02:29.000 [notice] Bootstrapped 78%: Loading relay descriptors.
Mar 27 11:02:33.000 [notice] We now have enough directory information to build circuits.
New behavior:
Mar 27 11:16:17.000 [notice] Bootstrapped 50%: Loading relay descriptors
Mar 27 11:16:19.000 [notice] Bootstrapped 55%: Loading relay descriptors
Mar 27 11:16:21.000 [notice] Bootstrapped 60%: Loading relay descriptors
Mar 27 11:16:21.000 [notice] Bootstrapped 65%: Loading relay descriptors
Mar 27 11:16:21.000 [notice] Bootstrapped 70%: Loading relay descriptors
Mar 27 11:16:21.000 [notice] Bootstrapped 75%: Loading relay descriptors
Mar 27 11:16:21.000 [notice] We now have enough directory information to build circuits.
Most of these are simple. The only nontrivial part is that our
pattern for using ENUM_BF was confusing doxygen by making declarations
that didn't look like declarations.
In circuitlist_free_all, we free all the circuits, removing them from
the map as we go, but we weren't actually freeing the placeholder
entries that we use to indicate pending DESTROY cells.
Fix for bug 11278; bugfix on the 7912 code that was merged in
0.2.5.1-alpha
There are still quite a few 0.2.3.2x relays running for x<5, and while I
agree they should upgrade, I don't think cutting them out of the network
is a net win on either side.
This contains the obvious implementation using the circuitmux data
structure. It also runs the old (slow) algorithm and compares
the results of the two to make sure that they're the same.
Needs review and testing.
This change prevents LD_BUG warnings and bootstrap failure messages
when we try to do directory fetches when starting with
DisableNetwork == 1, a consensus present, but no descriptors (or
insufficient descriptors) yet.
Fixes bug 11200 and bug 10405. It's a bugfix on 0.2.3.9-alpha.
Thanks to mcs for walking me through the repro instructions!
This is meant to be a better bug 9229 fix -- or at least, one more
in tune with the intent of the original code, which calls
router_retry_directory_downloads() only on the first bridge descriptor.
This prevents long stalls when we're starting with a state file but
with no bridge descriptors. Fixes bug 9229. I believe this bug has
been present since 0.2.0.3-alpha.
By default, after you've made a connection to port XYZ, we assume
you might still want to have an exit ready to connect to XYZ for one
hour. This patch lets you lower that interval.
Implements ticket 91
This should fixes some "hey, that function could have
__attribute__((noreturn))" warnings introduced by f96400d9.
Bug not in any released version of Tor.
We have ignored any ports listed here since 80365b989 (0.0.7rc1),
but we didn't warn the user that we were ignoring them. This patch
adds a warning if you put explicit ports in any of the options
{Socks,Dir}Policy or AuthDir{Reject,Invalid,BadDir,BadExit}. It
also adjusts the manpage to say that ports are ignored.
Fixes ticket 11108.
In a couple of places, to implement the OOM-circuit-killer defense
against sniper attacks, we have counters to remember the age of
cells or data chunks. These timers were based on wall clock time,
which can move backwards, thus giving roll-over results for our age
calculation. This commit creates a low-budget monotonic time, based
on ratcheting gettimeofday(), so that even in the event of a time
rollback, we don't do anything _really_ stupid.
A future version of Tor should update this function to do something
even less stupid here, like employ clock_gettime() or its kin.
Back in 5e762e6a5c, non-exit servers
stopped launching DNS requests for users. So there's no need for them
to see if their DNS answers are hijacked.
Patch from Matt Pagan. I think this is a 965 fix.
For a client using a SocksPort connection and IPv6, the connect reply
from tor daemon did not handle AF_INET6 thus sending back the wrong
payload to the client.
A changes file is provided and this fixes#10987
Signed-off-by: David Goulet <dgoulet@ev0ke.net>
When we have more than two return values, we should really be using
an enum rather than "-2 means this, -1 means that, 0 means this, and
1 or more means a number."
On busy servers, this function takes up something like 3-7% in
different profiles, and gets invoked every time we need to participate
as the midpoint in a hidden service.
So maybe walking through a linked list of all the circuits here wasn't
a good idea.
We log only one message, containing a complete list of what's
wrong. We log the complete list whenever any of the possible things
that could have gotten wrong gets worse.
Fix for #9870. Bugfix on 10480dff01, which we merged in
0.2.5.1-alpha.
This patch splits out some of the functions in OOM handling so that
it's easier to check them without involving the rest of Tor or
requiring that the circuits be "wired up".
It's increasingly apparent that we want to make sure we initialize our
PRNG nice and early, or else OpenSSL will do it for us. (OpenSSL
doesn't do _too_ bad a job, but it's nice to do it ourselves.)
We'll also need this for making sure we initialize the siphash key
before we do any hashes.
I've made an exception for cases where I'm sure that users can't
influence the inputs. This is likely to cause a slowdown somewhere,
but it's safer to siphash everything and *then* look for cases to
optimize.
This patch doesn't actually get us any _benefit_ from siphash yet,
since we don't really randomize the key at any point.
These options were added back in 0.1.2.5-alpha, but no longer make any
sense now that all directories support tunneled connections and
BEGIN_DIR cells. These options were on by default; now they are
always-on.
This is a fix for 10849, where TunnelDirConns 0 would break hidden
services -- and that bug arrived, I think, in 0.2.0.10-alpha.
(There is no longer meaningfully any such thing as a HS authority,
since we stopped uploading or downloading v0 hs descriptors in
0.2.2.1-alpha.)
Implements #10881, and part of #10841.
This patch removes an "if (chan)" that occurred at a place where
chan was definitely non-NULL. Having it there made some static
analysis tools conclude that we were up to shenanigans.
This resolves#9979.
Right now this accounts for about 1% of circuits over all, but if you
pick a guard that's running 0.2.3, it will be about 6% of the circuits
running through that guard.
Making sure that every circuit has at least one ntor link means that
we're getting plausibly good forward secrecy on every circuit.
This implements ticket 9777,
It's possible to set your ExitNodes to contains only exits that don't
have the Exit flag. If you do that, we'll decide that 0 of your exits
are working. Instead, in that case we should look at nodes which have
(or which might have) exit policies that don't reject everything.
Fix for bug 10543; bugfix on 0.2.4.10-alpha.
According to control spec, longname should not contain any spaces and is
consists only of identy_digest + nickname
added two functions:
* node_get_verbose_nickname_by_id()
* node_describe_longname_by_id()
If you want a slow shutdown, send SIGNAL SHUTDOWN.
(Why not just have the default be SIGNAL QUIT? Because this case
should only happen when an owning controller has crashed, and a
crashed controller won't be able to give the user any "tor is
shutting down" feedback, and so the user gets confused for a while.
See bug 10449 for more info)
I'm doing this because:
* User doesn't mean you're running as root, and running as root
doesn't mean you've set User.
* It's possible that the user has done some other
capability-based hack to retain the necessary privileges.
The remaining vestige is that we continue to publish the V2dir flag,
and that, for the controller, we continue to emit v2 directory
formats when requested.
Previously, we would sometimes decide in directory_get_from_hs_dir()
to connect to an excluded node, and then later in
directory_initiate_command_routerstatus_rend() notice that it was
excluded and strictnodes was set, and catch it as a stopgap.
Additionally, this patch preferentially tries to fetch from
non-excluded nodes even when StrictNodes is off.
Fix for bug #10722. Bugfix on 0.2.0.10-alpha (the v2 hidserv directory
system was introduced in e136f00ca). Reported by "mr-4".
If we don't, we can wind up with a wedged cpuworker, and write to it
for ages and ages.
Found by skruffy. This was a bug in 2dda97e8fd, a.k.a. svn
revision 402. It's been there since we have been using cpuworkers.
When I introduced the unusable_for_new_circuits flag in
62fb209d83, I had a spurious ! in the
circuit_stream_is_being_handled loop. This made us decide that
non-unusable circuits (that is, usable ones) were the ones to avoid,
and caused it to launch a bunch of extra circuits.
Fixes bug 10456; bugfix on 0.2.4.12-alpha.
When we wrote the directory request statistics code in August 2009, we
thought that these statistics were only relevant for bridges, and that
bridges should not report them. That's why we added a switch to discard
relevant observations made by bridges. This code was first released in
0.2.2.1-alpha.
In May 2012 we learned that we didn't fully disable directory request
statistics on bridges. Bridges did report directory request statistics,
but these statistics contained empty dirreq-v3-ips and dirreq-v3-reqs
lines. But the remaining dirreq-* lines have always been non-empty. (We
didn't notice for almost three years, because directory-request statistics
were disabled by default until 0.2.3.1-alpha, and all statistics have been
removed from bridge descriptors before publishing them on the metrics
website.)
Proposal 201, created in May 2012, suggests to add a new line called
bridge-v3-reqs that is similar to dirreq-v3-reqs, but that is published
only by bridges. This proposal is still open as of December 2013.
Since October 2012 we're using dirreq-v3-resp (not -reqs) lines in
combination with bridge-ips lines to estimate bridge user numbers; see
task 8462. This estimation method has superseded the older approach that
was only based on bridge-ips lines in November 2013. Using dirreq-v3-resp
and bridge-ips lines is a workaround. The cleaner approach would be to
use dirreq-v3-reqs instead.
This commit makes bridges report the same directory request statistics as
relays, including dirreq-v3-ips and dirreq-v3-reqs lines. It makes
proposal 201 obsolete.
In 0.2.3.8-alpha we attempted to "completely disable stats if we aren't
running as a relay", but instead disabled them only if we aren't running
as a server.
This commit leaves DirReqStatistics enabled on both relays and bridges,
and disables (Cell,Entry,ExitPort)Statistics on bridges.
The 'body' field of a microdesc_t holds a strdup()'d value if the
microdesc's saved_location field is SAVED_IN_JOURNAL or
SAVED_NOWHERE, and holds a pointer to the middle of an mmap if the
microdesc is SAVED_IN_CACHE. But we weren't setting that field
until a while after we parsed the microdescriptor, which left an
interval where microdesc_free() would try to free() the middle of
the mmap().
This patch also includes a regression test.
This is a fix for #10409; bugfix on 0.2.2.6-alpha.
The old behavior was that NULL matched only bridges without known
identities; the correct behavior is that NULL should match all
bridges (assuming that their addr:port matches).
We were checking whether a 8-bit length field had overflowed a
503-byte buffer. Unless somebody has found a way to store "504" in a
single byte, it seems unlikely.
Fix for 10313 and 9980. Based on a pach by Jared L Wong. First found
by David Fifield with STACK.
It's conceivable (but probably impossible given our code) that lseek
could return -1 on an error; when that happens, we don't want off to
become -1.
Fixes CID 1035124.
This was a mistake in the merge commit 7a2b30fe16. It
would have made the CellStatistics code give completely bogus
results. Bug not in any released Tor.
We had accidentially grown two fake ones: one for backtrace.c, and one
for sandbox.c. Let's do this properly instead.
Now, when we configure logs, we keep track of fds that should get told
about bad stuff happening from signal handlers. There's another entry
point for these that avoids using non-signal-handler-safe functions.
On platforms with the backtrace/backtrace_symbols_fd interface, Tor
can now dump stack traces on assertion failure. By default, I log
them to DataDir/stack_dump and to stderr.
Conflicts:
src/or/or.h
src/or/relay.c
Conflicts were simple to resolve. More fixes were needed for
compilation, including: reinstating the tv_to_msec function, and renaming
*_conn_cells to *_chan_cells.
In proposal 157, we added a cross-certification element for
directory authority certificates. We implemented it in
0.2.1.9-alpha. All Tor directory authorities now generate it.
Here, as planned, make it required, so that we can finally close
proposal 157.
The biggest change in the code is in the unit test data, where some
old hardcoded certs that we made long ago have become no longer
valid and now need to be replaced.
Previously, when we ran low on memory, we'd close whichever circuits
had the most queued cells. Now, we close those that have the
*oldest* queued cells, on the theory that those are most responsible
for us running low on memory, and that those are the least likely to
actually drain on their own if we wait a little longer.
Based on analysis from a forthcoming paper by Jansen, Tschorsch,
Johnson, and Scheuermann. Fixes bug 9093.
Roger spotted this on tor-dev in his comments on proposal 221.
We etect DESTROY vs everything else, since arma likes network
timeout indicating failure but not overload indicating failure.
Don't cast uint64_t * to const uint64_t * explicitly. The cast is always
safe, so C does it for us. Doing the cast explitictly can hide bugs if
the input is secretly the wrong type.
Suggested by Nick.
As a bridge authority, before we create our networkstatus document, we
should compute the thresholds needed for the various status flags
assigned to each bridge based on the status of all other bridges. We
then add these thresholds to the networkstatus document for easy access.
Fixes for #1117 and #9859.
Also fix a bug where if the guard we choose first doesn't answer, we
would try the second guard, but once we connected to the second guard
we would abandon it and retry the first one, slowing down bootstrapping.
The fix in both cases is to treat all our initially chosen guards as
acceptable to use.
Fixes bug 9946.
We had updated our "do we have enough microdescs to begin building
circuits?" logic most recently in 0.2.4.10-alpha (see bug 5956), but we
left the bootstrap status event logic at "how far through getting 1/4
of them are we?"
Fixes bug 9958; bugfix on 0.2.2.36, which is where they diverged (see
bug 5343).
The old code had logic to use a shorter path length if we didn't
have enough nodes. But we don't support 2-node networks anwyay.
Fix for #9926. I'm not calling this a bugfix on any particular
version, since a 2-node network would fail to work for you for a lot
of other reasons too, and it's not clear to me when that began, or if
2-node networks would ever have worked.
By calling circuit_n_chan_done() unconditionally on close, we were
closing pending connections that might not have been pending quite for
the connection we were closing. Fix for bug 9880.
Thanks to skruffy for finding this and explaining it patiently until
we understood.
this was causing directory authorities to send a time of 0 on all
connections they generated themselves, which means everybody reachability
test caused a time skew warning in the log for that relay.
(i didn't just revert, because the changes file has been modified by
other later commits.)
This isn't actually much of an issue, since only relays send
AUTHENTICATE cells, but while we're removing timestamps, we might as
well do this too.
Part of proposal 222. I didn't take the approach in the proposal of
using a time-based HMAC, since that was a bad-prng-mitigation hack
from SSL3, and in real life, if you don't have a good RNG, you're
hopeless as a Tor server.
For now, round down to the nearest 10 minutes. Later, eliminate entirely by
setting a consensus parameter.
(This rounding is safe because, in 0.2.2, where the timestamp mattered,
REND_REPLAY_TIME_INTERVAL was a nice generous 60 minutes.)
We were freeing these on exit, but when we added the dl_status_map
field to them in fddb814f, we forgot to arrange for it to be freed.
I've moved the cert_list_free() code into its own function, and added
an appropriate dsmap_free() call.
Fixes bug 9644; bugfix on 0.2.4.13-alpha.
The problem was that the server_identity_key_is_set() function could
return true under conditions where we don't really have an identity
key -- specifically, where we used to have one, but we stopped being a
server.
This is a fix for 6979; bugfix on 0.2.2.18-alpha where we added that
assertion to get_server_identity_key().
Whenever we had an non-option commandline arguments *and*
option-bearing commandline arguments on the commandline, we would save
only the latter across invocations of options_init_from_torrc, but
take their existence as license not to re-parse the former. Yuck!
Incidentally, this fix lets us throw away the backup_arg[gv] logic.
Fix for bug 9746; bugfix on d98dfb3746,
not in any released Tor. Found by Damian. Thanks, Damian!
This just goes to show: never cast a function pointer. Found while
testing new command line parse logic.
Bugfix on 1293835440, which implemented
6752: Not in any released tor.
Fall back to SOMAXCONN if INT_MAX doesn't work.
We'd like to do this because the actual maximum is overrideable by the
kernel, and the value in the header file might not be right at all.
All implementations I can find out about claim that this is supported.
Fix for 9716; bugfix on every Tor.
Now we explicitly check for overflow.
This approach seemed smarter than a cascade of "change int to unsigned
int and hope nothing breaks right before the release".
Nick, feel free to fix in a better way, maybe in master.
This would make us do testing circuits "even when cbt is disabled by
consensus, or when we're a directory authority, or when we've failed
to write cbt history to our state file lately." (Roger's words.)
This is a fix for 9671 and an improvement in our fix for 5049.
The original misbehavior was in 0.2.2.14-alpha; the incomplete
fix was in 0.2.3.17-beta.
(In practice they don't exist, but so long as we're making changes for
standards compliance...)
Also add several more unit tests for good and bad URL types.
There were only two functions outside of circuitstats that actually
wanted to know what was inside this. Making the structure itself
hidden should help isolation and prevent us from spaghettifying the
thing more.
The spec requires them to do so, and not doing so creates a situation
where they can't send-test because relays won't extend to them because
of the other part of bug 9546.
Fixes bug 9546; bugfix on 0.2.3.6-alpha.
The spec requires them to do so, and not doing so creates a situation
where they can't send-test because relays won't extend to them because
of the other part of bug 9546.
Fixes bug 9546; bugfix on 0.2.3.6-alpha.
(Backport to Tor 0.2.3)
Relays previously, when initiating a connection, would only send a
NETINFO after sending an AUTHENTICATE. But bridges, when receiving a
connection, would never send AUTH_CHALLENGE. So relays wouldn't
AUTHENTICATE, and wouldn't NETINFO, and then bridges would be
surprised to be receiving CREATE cells on a non-open circuit.
Fixes bug 9546.
Relays previously, when initiating a connection, would only send a
NETINFO after sending an AUTHENTICATE. But bridges, when receiving a
connection, would never send AUTH_CHALLENGE. So relays wouldn't
AUTHENTICATE, and wouldn't NETINFO, and then bridges would be
surprised to be receiving CREATE cells on a non-open circuit.
Fixes bug 9546.
Use the generic function for both the ControlPort cookie and the
ExtORPort cookie.
Also, place the global cookie variables in the heap so that we can
pass them around more easily as pointers.
Also also, fix the unit tests that broke by this change.
Conflicts:
src/or/config.h
src/or/ext_orport.c
memwipe some stack-allocated stuff
Add DOCDOC comments for state machines
Use memdup_nulterm as appropriate
Check for NULs in useraddr
Add a macro so that <= AUTH_MAX has a meaning.
- Don't leak if a transport proxy sends us a TRANSPORT command more
than once.
- Don't use smartlist_string_isin() in geoip_get_transport_history().
(pointed out by Nick)
- Use the 'join' argument of smartlist_join_strings() instead of
trying to write the separator on our own.
(pointed out by Nick)
- Document 'ext_or_transport' a bit better.
(pointed out by Nick)
- Be a bit more consistent with the types of the values of 'transport_counts'.
(pointed out by Nick)
Fortunately, later checks mean that uninitialized data can't get sent
to the network by this bug. Unfortunately, reading uninitialized heap
*can* (in some cases, with some allocators) cause a crash if you get
unlucky and go off the end of a page.
Found by asn. Bugfix on 0.2.4.1-alpha.
Both 'managed_proxy_list' and 'unconfigured_proxies_n' are global
src/or/transports.c variables that are not initialized properly when
unit tests are run.
When we moved channel_matches_target_addr_for_extend() into a separate
function, its sense was inverted from what one might expect, and we
didn't have a ! in one place where we should have.
Found by skruffy.
When we moved channel_matches_target_addr_for_extend() into a separate
function, its sense was inverted from what one might expect, and we
didn't have a ! in one place where we should have.
Found by skruffy.
I added this so I could write a unit test for ServerTransportOptions,
but it incidentally exercises the succeed-on-defaults case of
options_validate too.
Previously we would accept relative paths, but only if they contained a
slash somewhere (not at the end).
Otherwise we would silently not work. Closes: #9258. Bugfix on
0.2.3.16-alpha.
If you pass the --enable-coverage flag on the command line, we build
our testing binaries with appropriate options eo enable coverage
testing. We also build a "tor-cov" binary that has coverage enabled,
for integration tests.
On recent OSX versions, test coverage only works with clang, not gcc.
So we warn about that.
Also add a contrib/coverage script to actually run gcov with the
appropriate options to generate useful .gcov files. (Thanks to
automake, the .o files will not have the names that gcov expects to
find.)
Also, remove generated gcda and gcno files on clean.
We previously used FILENAME_PRIVATE identifiers mostly for
identifiers exposed only to the unit tests... but also for
identifiers exposed to the benchmarker, and sometimes for
identifiers exposed to a similar module, and occasionally for no
really good reason at all.
Now, we use FILENAME_PRIVATE identifiers for identifiers shared by
Tor and the unit tests. They should be defined static when we
aren't building the unit test, and globally visible otherwise. (The
STATIC macro will keep us honest here.)
For identifiers used only by the unit tests and never by Tor at all,
on the other hand, we wrap them in #ifdef TOR_UNIT_TESTS.
This is not the motivating use case for the split test/non-test
build system; it's just a test example to see how it works, and to
take a chance to clean up the code a little.
This is mainly a matter of automake trickery: we build each static
library in two versions now: one with the TOR_UNIT_TESTS macro
defined, and one without. When TOR_UNIT_TESTS is defined, we can
enable mocking and expose more functions. When it's not defined, we
can lock the binary down more.
The alternatives would be to have alternate build modes: a "testing
configuration" for building the libraries with test support, and a
"production configuration" for building them without. I don't favor
that approach, since I think it would mean more people runnning
binaries build for testing, or more people not running unit tests.
Fix a bug in the voting algorithm that could yield incorrect results
when a non-naming authority declared too many flags. Fixes bug 9200;
bugfix on 0.2.0.3-alpha.
Found by coverity scan.
This implements "algorithm 1" from my discussion of bug #9072: on OOM,
find the circuits with the longest queues, and kill them. It's also a
fix for #9063 -- without the side-effects of bug #9072.
The memory bounds aren't perfect here, and you need to be sure to
allow some slack for the rest of Tor's usage.
This isn't a perfect fix; the rest of the solutions I describe on
codeable.
In my #7912 fix, there wasn't any code to remove entries from the
(channel, circuit ID)->circuit map corresponding to queued but un-sent
DESTROYs.
Spotted by skruffy. Fixes bug 9082; bug not in any released Tor.
I added the code to pass a destroy cell to a queueing function rather
than writing it immediately, and the code to remember that we
shouldn't reuse the circuit id until the destroy is actually sent, and
the code to release the circuit id once the destroy has been sent...
and then I finished by hooking destroy_cell_queue into the rest of
Tor.
This is a reprise of the fix in bdff7e3299d78; 6905c1f6 reintroduced
that bug. Briefly: windows doesn't seem to like deleting a mapped
file. I tried adding the PROT_SHARED_DELETE flag to the createfile
all, but that didn't actually fix this issue. Fortunately, the unit
test I added in 4f4fc63fea should
prevent us from making this particular screw-up again.
This patch also tries to limit the crash potential of a failure to
write by a little bit, although it could do a better job of retaining
microdescriptor bodies.
Fix for bug 8822, bugfix on 0.2.4.12-alpha.
There's an assertion failure that can occur if a connection has
optimistic data waiting, and then the connect() call returns 0 on the
first attempt (rather than -1 and EINPROGRESS). That latter behavior
from connect() appears to be an (Open?)BSDism when dealing with remote
addresses in some cases. (At least, I've only seen it reported with
the BSDs under libevent, even when the address was 127.0.0.1. And
we've only seen this problem in Tor with OpenBSD.)
Fixes bug 9017; bugfix on 0.2.3.1-alpha, which first introduced
optimistic data. (Although you could also argue that the commented-out
connection_start_writing in 155c9b80 back in 2002 is the real source
of the issue.)
A new option TestingV3AuthVotingStartOffset is added which offsets the
starting time of the voting interval. This is possible only when
TestingTorNetwork is set.
This patch makes run_scheduled_events() check for new consensus
downloads every second when TestingTorNetwork, instead of every
minute. This should be fine, see #8532 for reasoning.
This patch also brings MIN_VOTE_SECONDS and MIN_DIST_SECONDS down from
20 to 2 seconds, unconditionally. This makes sanity checking of
misconfiguration slightly less sane.
Addresses #8532.
You can't use != to compare arbitary members of or_options_t.
(Also, generate a better error message to say which Testing* option
was set.)
Fix for bug 8992. Bugfix on b0d4ca49. Bug not in any released Tor.
- Rename n_read and n_written in origin_circuit_t to make it clear that
these are only used for CIRC_BW events.
- Extract new code in control_update_global_event_mask to new
clear_circ_bw_fields function.
- Avoid control_event_refill_global function with 13 arguments and
increase code reuse factor by moving more code from control.c to
connection.c.
- Avoid an unsafe uint32_t -> int cast.
- Add TestingEnableTbEmptyEvent option.
- Prepare functions for testing.
- Rename a few functions and improve documentation.
- Move cell_command_to_string from control.c to command.c.
- Use accessor for global_circuitlist instead of extern.
- Add a struct for cell statistics by command instead of six arrays.
- Split up control_event_circuit_cell_stats by using two helper functions.
- Add TestingEnableCellStatsEvent option.
- Prepare functions for testing.
- Rename a few variables and document a few things better.
- Move new ID= parameter in ORCONN event to end. Avoids possible trouble
from controllers that parse parameters by position, even though they
shouldn't.
Create new methods check_or_create_data_subdir() and
write_to_data_subdir() in config.c and use them throughout
rephist.c and geoip.c.
This should solve ticket #4282.
This is a fix for bug 8844, where eugenis correctly notes that there's
a sentinel value at the end of the list-of-freelists that's never
actually checked. It's a bug since the first version of the chunked
buffer code back in 0.2.0.16-alpha.
This would probably be a crash bug if it ever happens, but nobody's
ever reported something like this, so I'm unsure whether it can occur.
It would require write_to_buf, write_to_buf_zlib, read_to_buf, or
read_to_buf_tls to get an input size of more than 32K. Still, it's a
good idea to fix this kind of thing!
It appears that moria1 crashed because of one instance of this (the
one in router_counts_toward_thresholds). The other instance I fixed
won't actually have broken anything, but I think it's more clear this
way.
Fixes bug 8833; bugfix on 0.2.4.12-alpha.
We need to subtract both the current built circuits *and* the attempted
circuits from the attempt count during scaling, since *both* have already been
counted there.
It sure is a good thing we can run each test in its own process, or
else the amount of setup I needed to do to make this thing work
would have broken all the other tests.
Test mocking would have made this easier to write too.
Now we can compute the hash and signature of a dirobj before
concatenating the smartlist, and we don't need to play silly games
with sigbuf and realloc any more.
I believe this was introduced in 6bc071f765, which makes
this a fix on 0.2.0.10-alpha. But my code archeology has not extended
to actually testing that theory.
We were mixing bandwidth file entries (which are in kilobytes) with
router_get_advertised_bw() entries, which were in bytes.
Also, use router_get_advertised_bandwidth_capped() for credible_bandwidth.
Since 7536c40 only DNS results for real SOCKS requests are added to the cache,
but not DNS results for DNSPort queries or control connection RESOLVE queries.
Only cache additions would trigger ADDRMAP events on successful resolve.
Change it so that DNS results received after a RESOLVE command also generate
ADDRMAP events.
There was a bug in Tor prior to 0.2.4.10-alpha that allowed counts to
become invalid. Clipping the counts at startup allows us to rule out
log messages due to corruption from these prior Tor versions.