Long ago we used to call connection_ap_handshake_attach_circuit()
only in a few places, since connection_ap_attach_pending() attaches
all the pending connections, and does so regularly. But this turned
out to have a performance problem: it would introduce a delay to
launching or connecting a stream.
We couldn't just call connection_ap_attach_pending() every time we
make a new connection, since it walks the whole connection list. So
we started calling connection_ap_attach_pending all over, instead!
But that's kind of ugly and messes up our callgraph.
So instead, we now have connection_ap_attach_pending() use a list
only of the pending connections, so we can call it much more
frequently. We have a separate function to scan the whole
connection array to see if we missed adding anything, and log a
warning if so.
Closes ticket #17590
Also, do a little light refactoring to move some variable declarations
around and make a few things const
Also fix an obnoxious bug on checking for the DONE stream end reason.
It's not a flag; it's a possible value or a variable that needs to be
masked.
Also, stop accepting the old kind of RESOLVED cells with no TTL
fields; they haven't been sent since 0.1.1.6-alpha.
This patch won't work without the fix to #10468 -- it will break
DNSPorts unless they set the proper ipv4/6 flags on entry_connection_t.
We previously used FILENAME_PRIVATE identifiers mostly for
identifiers exposed only to the unit tests... but also for
identifiers exposed to the benchmarker, and sometimes for
identifiers exposed to a similar module, and occasionally for no
really good reason at all.
Now, we use FILENAME_PRIVATE identifiers for identifiers shared by
Tor and the unit tests. They should be defined static when we
aren't building the unit test, and globally visible otherwise. (The
STATIC macro will keep us honest here.)
For identifiers used only by the unit tests and never by Tor at all,
on the other hand, we wrap them in #ifdef TOR_UNIT_TESTS.
This is not the motivating use case for the split test/non-test
build system; it's just a test example to see how it works, and to
take a chance to clean up the code a little.
In general, if we tried to use a circ for a stream, but then decided to place
that stream on a different circuit, we need to probe the original circuit
before deciding it was a "success".
We also need to do the same for cannibalized circuits that go unused.
Now, every cached_resolve_t can remember an IPv4 result *and* an IPv6
result. As a light protection against timing-based distinguishers for
IPv6 users (and against complexity!), every forward request generates
an IPv4 *and* an IPv6 request, assuming that we're an IPv6 exit. Once
we have answers or errors for both, we act accordingly.
This patch additionally makes some useful refactorings in the dns.c
code, though there is quite a bit more of useful refactoring that could
be done.
Additionally, have a new interface for the argument passed to the
evdns_callback function. Previously, it was just the original address
we were resolving. But it turns out that, on error, evdns doesn't
tell you the type of the query, so on a failure we didn't know whether
IPv4 or IPv6 queries were failing.
The new convention is to have the first byte of that argument include
the query type. I've refactored the code a bit to make that simpler.
This makes it so we can handle getting an IPv6 in the 3 different
formats we specified it for in RESOLVED cells,
END_STREAM_REASON_EXITPOLICY cells, and CONNECTED cells.
We don't cache IPv6 addresses yet, since proposal 205 isn't
implemented.
There's a refactored function for parsing connected cells; it has unit
tests.
In C, we technically aren't supposed to define our own things that
start with an underscore.
This is a purely machine-generated commit. First, I ran this script
on all the headers in src/{common,or,test,tools/*}/*.h :
==============================
use strict;
my %macros = ();
my %skipped = ();
FILE: for my $fn (@ARGV) {
my $f = $fn;
if ($fn !~ /^\.\//) {
$f = "./$fn";
}
$skipped{$fn} = 0;
open(F, $fn);
while (<F>) {
if (/^#ifndef ([A-Za-z0-9_]+)/) {
$macros{$fn} = $1;
next FILE;
}
}
}
print "#!/usr/bin/perl -w -i -p\n\n";
for my $fn (@ARGV) {
if (! exists $macros{$fn}) {
print "# No macro known for $fn!\n" if (!$skipped{$fn});
next;
}
if ($macros{$fn} !~ /_H_?$/) {
print "# Weird macro for $fn...\n";
}
my $goodmacro = uc $fn;
$goodmacro =~ s#.*/##;
$goodmacro =~ s#[\/\-\.]#_#g;
print "s/(?<![A-Za-z0-9_])$macros{$fn}(?![A-Za-z0-9_])/TOR_${goodmacro}/g;\n"
}
==============================
It produced the following output, which I then re-ran on those same files:
==============================
s/(?<![A-Za-z0-9_])_TOR_ADDRESS_H(?![A-Za-z0-9_])/TOR_ADDRESS_H/g;
s/(?<![A-Za-z0-9_])_TOR_AES_H(?![A-Za-z0-9_])/TOR_AES_H/g;
s/(?<![A-Za-z0-9_])_TOR_COMPAT_H(?![A-Za-z0-9_])/TOR_COMPAT_H/g;
s/(?<![A-Za-z0-9_])_TOR_COMPAT_LIBEVENT_H(?![A-Za-z0-9_])/TOR_COMPAT_LIBEVENT_H/g;
s/(?<![A-Za-z0-9_])_TOR_CONTAINER_H(?![A-Za-z0-9_])/TOR_CONTAINER_H/g;
s/(?<![A-Za-z0-9_])_TOR_CRYPTO_H(?![A-Za-z0-9_])/TOR_CRYPTO_H/g;
s/(?<![A-Za-z0-9_])TOR_DI_OPS_H(?![A-Za-z0-9_])/TOR_DI_OPS_H/g;
s/(?<![A-Za-z0-9_])_TOR_MEMAREA_H(?![A-Za-z0-9_])/TOR_MEMAREA_H/g;
s/(?<![A-Za-z0-9_])_TOR_MEMPOOL_H(?![A-Za-z0-9_])/TOR_MEMPOOL_H/g;
s/(?<![A-Za-z0-9_])TOR_PROCMON_H(?![A-Za-z0-9_])/TOR_PROCMON_H/g;
s/(?<![A-Za-z0-9_])_TOR_TORGZIP_H(?![A-Za-z0-9_])/TOR_TORGZIP_H/g;
s/(?<![A-Za-z0-9_])_TOR_TORINT_H(?![A-Za-z0-9_])/TOR_TORINT_H/g;
s/(?<![A-Za-z0-9_])_TOR_LOG_H(?![A-Za-z0-9_])/TOR_TORLOG_H/g;
s/(?<![A-Za-z0-9_])_TOR_TORTLS_H(?![A-Za-z0-9_])/TOR_TORTLS_H/g;
s/(?<![A-Za-z0-9_])_TOR_UTIL_H(?![A-Za-z0-9_])/TOR_UTIL_H/g;
s/(?<![A-Za-z0-9_])_TOR_BUFFERS_H(?![A-Za-z0-9_])/TOR_BUFFERS_H/g;
s/(?<![A-Za-z0-9_])_TOR_CHANNEL_H(?![A-Za-z0-9_])/TOR_CHANNEL_H/g;
s/(?<![A-Za-z0-9_])_TOR_CHANNEL_TLS_H(?![A-Za-z0-9_])/TOR_CHANNELTLS_H/g;
s/(?<![A-Za-z0-9_])_TOR_CIRCUITBUILD_H(?![A-Za-z0-9_])/TOR_CIRCUITBUILD_H/g;
s/(?<![A-Za-z0-9_])_TOR_CIRCUITLIST_H(?![A-Za-z0-9_])/TOR_CIRCUITLIST_H/g;
s/(?<![A-Za-z0-9_])_TOR_CIRCUITMUX_EWMA_H(?![A-Za-z0-9_])/TOR_CIRCUITMUX_EWMA_H/g;
s/(?<![A-Za-z0-9_])_TOR_CIRCUITMUX_H(?![A-Za-z0-9_])/TOR_CIRCUITMUX_H/g;
s/(?<![A-Za-z0-9_])_TOR_CIRCUITUSE_H(?![A-Za-z0-9_])/TOR_CIRCUITUSE_H/g;
s/(?<![A-Za-z0-9_])_TOR_COMMAND_H(?![A-Za-z0-9_])/TOR_COMMAND_H/g;
s/(?<![A-Za-z0-9_])_TOR_CONFIG_H(?![A-Za-z0-9_])/TOR_CONFIG_H/g;
s/(?<![A-Za-z0-9_])TOR_CONFPARSE_H(?![A-Za-z0-9_])/TOR_CONFPARSE_H/g;
s/(?<![A-Za-z0-9_])_TOR_CONNECTION_EDGE_H(?![A-Za-z0-9_])/TOR_CONNECTION_EDGE_H/g;
s/(?<![A-Za-z0-9_])_TOR_CONNECTION_H(?![A-Za-z0-9_])/TOR_CONNECTION_H/g;
s/(?<![A-Za-z0-9_])_TOR_CONNECTION_OR_H(?![A-Za-z0-9_])/TOR_CONNECTION_OR_H/g;
s/(?<![A-Za-z0-9_])_TOR_CONTROL_H(?![A-Za-z0-9_])/TOR_CONTROL_H/g;
s/(?<![A-Za-z0-9_])_TOR_CPUWORKER_H(?![A-Za-z0-9_])/TOR_CPUWORKER_H/g;
s/(?<![A-Za-z0-9_])_TOR_DIRECTORY_H(?![A-Za-z0-9_])/TOR_DIRECTORY_H/g;
s/(?<![A-Za-z0-9_])_TOR_DIRSERV_H(?![A-Za-z0-9_])/TOR_DIRSERV_H/g;
s/(?<![A-Za-z0-9_])_TOR_DIRVOTE_H(?![A-Za-z0-9_])/TOR_DIRVOTE_H/g;
s/(?<![A-Za-z0-9_])_TOR_DNS_H(?![A-Za-z0-9_])/TOR_DNS_H/g;
s/(?<![A-Za-z0-9_])_TOR_DNSSERV_H(?![A-Za-z0-9_])/TOR_DNSSERV_H/g;
s/(?<![A-Za-z0-9_])TOR_EVENTDNS_TOR_H(?![A-Za-z0-9_])/TOR_EVENTDNS_TOR_H/g;
s/(?<![A-Za-z0-9_])_TOR_GEOIP_H(?![A-Za-z0-9_])/TOR_GEOIP_H/g;
s/(?<![A-Za-z0-9_])_TOR_HIBERNATE_H(?![A-Za-z0-9_])/TOR_HIBERNATE_H/g;
s/(?<![A-Za-z0-9_])_TOR_MAIN_H(?![A-Za-z0-9_])/TOR_MAIN_H/g;
s/(?<![A-Za-z0-9_])_TOR_MICRODESC_H(?![A-Za-z0-9_])/TOR_MICRODESC_H/g;
s/(?<![A-Za-z0-9_])_TOR_NETWORKSTATUS_H(?![A-Za-z0-9_])/TOR_NETWORKSTATUS_H/g;
s/(?<![A-Za-z0-9_])_TOR_NODELIST_H(?![A-Za-z0-9_])/TOR_NODELIST_H/g;
s/(?<![A-Za-z0-9_])_TOR_NTMAIN_H(?![A-Za-z0-9_])/TOR_NTMAIN_H/g;
s/(?<![A-Za-z0-9_])_TOR_ONION_H(?![A-Za-z0-9_])/TOR_ONION_H/g;
s/(?<![A-Za-z0-9_])_TOR_OR_H(?![A-Za-z0-9_])/TOR_OR_H/g;
s/(?<![A-Za-z0-9_])_TOR_POLICIES_H(?![A-Za-z0-9_])/TOR_POLICIES_H/g;
s/(?<![A-Za-z0-9_])_TOR_REASONS_H(?![A-Za-z0-9_])/TOR_REASONS_H/g;
s/(?<![A-Za-z0-9_])_TOR_RELAY_H(?![A-Za-z0-9_])/TOR_RELAY_H/g;
s/(?<![A-Za-z0-9_])_TOR_RENDCLIENT_H(?![A-Za-z0-9_])/TOR_RENDCLIENT_H/g;
s/(?<![A-Za-z0-9_])_TOR_RENDCOMMON_H(?![A-Za-z0-9_])/TOR_RENDCOMMON_H/g;
s/(?<![A-Za-z0-9_])_TOR_RENDMID_H(?![A-Za-z0-9_])/TOR_RENDMID_H/g;
s/(?<![A-Za-z0-9_])_TOR_RENDSERVICE_H(?![A-Za-z0-9_])/TOR_RENDSERVICE_H/g;
s/(?<![A-Za-z0-9_])_TOR_REPHIST_H(?![A-Za-z0-9_])/TOR_REPHIST_H/g;
s/(?<![A-Za-z0-9_])_TOR_REPLAYCACHE_H(?![A-Za-z0-9_])/TOR_REPLAYCACHE_H/g;
s/(?<![A-Za-z0-9_])_TOR_ROUTER_H(?![A-Za-z0-9_])/TOR_ROUTER_H/g;
s/(?<![A-Za-z0-9_])_TOR_ROUTERLIST_H(?![A-Za-z0-9_])/TOR_ROUTERLIST_H/g;
s/(?<![A-Za-z0-9_])_TOR_ROUTERPARSE_H(?![A-Za-z0-9_])/TOR_ROUTERPARSE_H/g;
s/(?<![A-Za-z0-9_])TOR_ROUTERSET_H(?![A-Za-z0-9_])/TOR_ROUTERSET_H/g;
s/(?<![A-Za-z0-9_])TOR_STATEFILE_H(?![A-Za-z0-9_])/TOR_STATEFILE_H/g;
s/(?<![A-Za-z0-9_])_TOR_STATUS_H(?![A-Za-z0-9_])/TOR_STATUS_H/g;
s/(?<![A-Za-z0-9_])TOR_TRANSPORTS_H(?![A-Za-z0-9_])/TOR_TRANSPORTS_H/g;
s/(?<![A-Za-z0-9_])_TOR_TEST_H(?![A-Za-z0-9_])/TOR_TEST_H/g;
s/(?<![A-Za-z0-9_])_TOR_FW_HELPER_H(?![A-Za-z0-9_])/TOR_TOR_FW_HELPER_H/g;
s/(?<![A-Za-z0-9_])_TOR_FW_HELPER_NATPMP_H(?![A-Za-z0-9_])/TOR_TOR_FW_HELPER_NATPMP_H/g;
s/(?<![A-Za-z0-9_])_TOR_FW_HELPER_UPNP_H(?![A-Za-z0-9_])/TOR_TOR_FW_HELPER_UPNP_H/g;
==============================
MSVC warns if you declare a function as having a "int foo" argument
and then implement it with a "const int foo" argument, even though
the latter "const" is not a part of the function's interface.
This time, I follow grarpamp's suggestion and move the check for
.exit+AllowDotExit 0 to the top of connection_ap_rewrite_and_attach,
before any rewriting occurs. This way, .exit addresses are
forbidden as they arrive from a socks connection or a DNSPort
request, and not otherwise.
It _is_ a little more complicated than that, though. We need to
treat any .exit addresses whose source is TrackHostExits as meaning
that we can retry without that exit. We also need to treat any
.exit address that comes from an AutomapHostsOnResolve operation as
user-provided (and thus forbidden if AllowDotExits==0), so that
transitioning from AllowDotExits==1 to AllowDotExits==0 will
actually turn off automapped .exit addresses.
In this new representation for wildcarded addresses, there are no
longer any 'magic addresses': rather, "a.b c.d", "*.a.b c.d" and
"*.a.b *.c.d" are all represented by a mapping from "a.b" to "c.d". we
now distinguish them by setting bits in the addressmap_entry_t
structure, where src_wildcard is set if the source address had a
wildcard, and dst_wildcard is set if the target address had a
wildcard.
This lets the case where "*.a.b *.c.d" or "*.a.b c.d" remap the
address "a.b" get handled trivially, and lets us simplify and improve
the addressmap_match_superdomains implementation: we can now have it
run in O(parts of address) rather than O(entries in addressmap).
Conflicts:
src/or/connection.c
src/or/connection_edge.c
src/or/connection_edge.h
src/or/dnsserv.c
Some of these were a little tricky, since they touched code that
changed because of the prop171 fixes.
One-hop dirconn streams all share a session group, and get the
ISO_SESSIONGRP flag: they may share circuits with each other and
nothing else.
Anonymized dirconn streams get a new internal-use-only ISO_STREAM
flag: they may not share circuits with anything, including each other.
Our old "do we need to launch a circuit for stream S" logic was,
more or less, that if we had a pending circuit that could handle S,
we didn't need to launch a new one.
But now that we have streams isolated from one another, we need
something stronger here: It's possible that some pending C can
handle either S1 or S2, but not both.
This patch reuses the existing isolation logic for a simple
solution: when we decide during circuit launching that some pending
C would satisfy stream S1, we "hypothetically" mark C as though S1
had been connected to it. Now if S2 is incompatible with S1, it
won't be something that can attach to C, and so we'll launch a new
stream.
When the circuit becomes OPEN for the first time (with no streams
attached to it), we reset the circuit's isolation status. I'm not
too sure about this part: I wanted some way to be sure that, if all
streams that would have used a circuit die before the circuit is
done, the circuit can still get used. But I worry that this
approach could also lead to us launching too many circuits. Careful
thought needed here.
This patch adds fields to track how streams should be isolated, and
ensures that those fields are set correctly. It also adds fields to
track what streams can go on a circuit, and adds functions to see
whether a streams can go on a circuit and update the circuit
accordingly. Those functions aren't yet called.
This lets us make a lot of other stuff const, allows the compiler to
generate (slightly) better code, and will make me get slightly fewer
patches from folks who stick mutable stuff into or_options_t.
const: because not every input is an output!
Previously, if they changed in torrc during a SIGHUP, all was well,
since we would just clear all transient entries from the addrmap
thanks to bug 1345. But if you changed them from the controller, Tor
would leave old mappings in place.
The VirtualAddrNetwork bug has been here since 0.1.1.19-rc; the
AutomapHosts* bug has been here since 0.2.0.1-alpha.
Resolved conflicts in:
doc/tor.1.txt
src/or/circuitbuild.c
src/or/circuituse.c
src/or/connection_edge.c
src/or/connection_edge.h
src/or/directory.c
src/or/rendclient.c
src/or/routerlist.c
src/or/routerlist.h
These were mostly releated to the routerinfo_t->node_t conversion.
Now we believe it to be the case that we never build a circuit for our
stream that has an unsuitable exit, so we'll never need to use such
a circuit. The risk is that we have some code that builds the circuit,
but now we refuse to use it, meaning we just build a bazillion circuits
and ignore them all.
IOW, if we were using TrackExitHosts, and we added an excluded node or
removed a node from exitnodes, we wouldn't actually remove the mapping
that points us at the new node.
Also, note with an XXX022 comment a place that I think we are looking
at the wrong string.
Right now, we only consider sending stream-level SENDME cells when we
have completely flushed a connection_edge's outbuf, or when it sends
us a DATA cell. Neither of these is ideal for throughput.
This patch changes the behavior so we now call
connection_edge_consider_sending_sendme when we flush _some_ data from
an edge outbuf.
Fix for bug 2756; bugfix on svn r152.
A node_t is an abstraction over routerstatus_t, routerinfo_t, and
microdesc_t. It should try to present a consistent interface to all
of them. There should be a node_t for a server whenever there is
* A routerinfo_t for it in the routerlist
* A routerstatus_t in the current_consensus.
(note that a microdesc_t alone isn't enough to make a node_t exist,
since microdescriptors aren't usable on their own.)
There are three ways to get a node_t right now: looking it up by ID,
looking it up by nickname, and iterating over the whole list of
microdescriptors.
All (or nearly all) functions that are supposed to return "a router"
-- especially those used in building connections and circuits --
should return a node_t, not a routerinfo_t or a routerstatus_t.
A node_t should hold all the *mutable* flags about a node. This
patch moves the is_foo flags from routerinfo_t into node_t. The
flags in routerstatus_t remain, but they get set from the consensus
and should not change.
Some other highlights of this patch are:
* Looking up routerinfo and routerstatus by nickname is now
unified and based on the "look up a node by nickname" function.
This tries to look only at the values from current consensus,
and not get confused by the routerinfo_t->is_named flag, which
could get set for other weird reasons. This changes the
behavior of how authorities (when acting as clients) deal with
nodes that have been listed by nickname.
* I tried not to artificially increase the size of the diff here
by moving functions around. As a result, some functions that
now operate on nodes are now in the wrong file -- they should
get moved to nodelist.c once this refactoring settles down.
This moving should happen as part of a patch that moves
functions AND NOTHING ELSE.
* Some old code is now left around inside #if 0/1 blocks, and
should get removed once I've verified that I don't want it
sitting around to see how we used to do things.
There are still some unimplemented functions: these are flagged
with "UNIMPLEMENTED_NODELIST()." I'll work on filling in the
implementation here, piece by piece.
I wish this patch could have been smaller, but there did not seem to
be any piece of it that was independent from the rest. Moving flags
forces many functions that once returned routerinfo_t * to return
node_t *, which forces their friends to change, and so on.