Note: this is a squashed commit; see branch bug6465_rebased_v2 of user/andrea/tor.git for full history of the following 10 commits:
Convert relay.c/relay.h to channel_t
Updating the timestamp if n_flushed > 0 at the end of channel_flush_from_first_active_circuit() was redundant since channel_write_cell() et al. do it themselves.
Get rid of now-unnecessary time parameter in channel_flush_from_first_active_circuit()
Get rid of now-unnecessary time parameter in channel_flush_from_first_active_circuit() in connection_or.c
Add non-inlined external call for channeltls.c to free a packed_cell_t
Appease make check-spaces in relay.c
Replace channel_get_write_queue_len() with sufficient and easier to implement channel_has_queued_writes() in relay.c
Rename channel_touched_by_client() and client_used field for consistency with other timestamps in relay.c
Don't double-free packed cells in relay.c (channel_t Tor now bootstraps and works as a client)
Rearrange channel_t struct to use a union distinguishing listener from cell-bearing channels in relay.c
Extend cells aren't allowed to have a stream_id, but we were only
blocking them when they had a stream_id that corresponded to a
connection. As far as I can tell, this change is harmless: it will
make some kinds of broken clients not work any more, but afaik nobody
actually make a client that was broken in that way.
Found while hunting for other places where we made the same mistake
as in 6271.
Bugfix on d7f50337c1 back from May 2003, which introduced
telescoping circuit construction into 0.0.2pre8.
This avoids a possible crash bug in flush_from_first_active_circuit.
Fixes bug 6341; bugfix on 0.2.2.7-alpha.
Bug reported and fixed by a pseudonymous user on IRC.
I only check on circuits, not streams, since bloating your stream
window past the initial circuit window can't help you much.
Also, I compare to CIRCWINDOW_START_MAX so we don't have surprising
races if we lower CIRCWINDOW_START for an experiment.
This could result in bizarre window values. Report and patch
contributed pseudymously. Fixes part of bug 6271. This bug was
introduced before the first Tor release, in svn commit r152.
(bug 6271, part a.)
This reverts commit c32ec9c425.
It turns out the two sides of the circuit don't actually stay in sync,
so it is perfectly normal for the circuit window on the exit relay to
grow to 2000+. We should fix that bug and then reconsider this patch.
I only check on circuits, not streams, since bloating your stream
window past the initial circuit window can't help you much.
Also, I compare to CIRCWINDOW_START_MAX so we don't have surprising
races if we lower CIRCWINDOW_START for an experiment.
Also, try to resolve some doxygen issues. First, define a magic
"This is doxygen!" macro so that we take the correct branch in
various #if/#else/#endifs in order to get the right documentation.
Second, add in a few grouping @{ and @} entries in order to get some
variables and fields to get grouped together.
This would happen if the deliver window could become negative
because of an nonexistent connection. (Fortunately, _that_ can't
occur, thanks to circuit_consider_sending_sendme. Still, if we
change our windowing logic at all, we won't want this to become
triggerable.) Fix for bug 5541. Bugfix on 4a66865d, back from
0.0.2pre14. asn found this. Nice catch, asn!
This time, I follow grarpamp's suggestion and move the check for
.exit+AllowDotExit 0 to the top of connection_ap_rewrite_and_attach,
before any rewriting occurs. This way, .exit addresses are
forbidden as they arrive from a socks connection or a DNSPort
request, and not otherwise.
It _is_ a little more complicated than that, though. We need to
treat any .exit addresses whose source is TrackHostExits as meaning
that we can retry without that exit. We also need to treat any
.exit address that comes from an AutomapHostsOnResolve operation as
user-provided (and thus forbidden if AllowDotExits==0), so that
transitioning from AllowDotExits==1 to AllowDotExits==0 will
actually turn off automapped .exit addresses.
We used to do this as a workaround for older Tors, but now it's never
the correct thing to do (especially since anything that didn't
understand RELAY_EARLY is now deprecated hard).
For printf, %f and %lf are synonymous, since floats are promoted to
doubles when passed as varargs. It's only for scanf that we need to
say "%lf" for doubles and "%f" for floats.
Apparenly, some older compilers think it's naughty to say %lf and like
to spew warnings about it.
Found by grarpamp.
This is a little error-prone when the local has a different type
from the parameter, and is very error-prone with both have the same
type. Let's not do this.
Fixes CID #437,438,439,440,441.
This lets us make a lot of other stuff const, allows the compiler to
generate (slightly) better code, and will make me get slightly fewer
patches from folks who stick mutable stuff into or_options_t.
const: because not every input is an output!
The conflicts were mainly caused by the routerinfo->node transition.
Conflicts:
src/or/circuitbuild.c
src/or/command.c
src/or/connection_edge.c
src/or/directory.c
src/or/dirserv.c
src/or/relay.c
src/or/rendservice.c
src/or/routerlist.c
This patch introduces a few new functions in router.c to produce a
more helpful description of a node than its nickame, and then tweaks
nearly all log messages taking a nickname as an argument to call these
functions instead.
There are a few cases where I left the old log messages alone: in
these cases, the nickname was that of an authority (whose nicknames
are useful and unique), or the message already included an identity
and/or an address. I might have missed a couple more too.
This is a fix for bug 3045.
Conflicts in various places, mainly node-related. Resolved them in
favor of HEAD, with copying of tor_mem* operations from bug3122_memcmp_022.
src/common/Makefile.am
src/or/circuitlist.c
src/or/connection_edge.c
src/or/directory.c
src/or/microdesc.c
src/or/networkstatus.c
src/or/router.c
src/or/routerlist.c
src/test/test_util.c
Conflicts throughout. All resolved in favor of taking HEAD and
adding tor_mem* or fast_mem* ops as appropriate.
src/common/Makefile.am
src/or/circuitbuild.c
src/or/directory.c
src/or/dirserv.c
src/or/dirvote.c
src/or/networkstatus.c
src/or/rendclient.c
src/or/rendservice.c
src/or/router.c
src/or/routerlist.c
src/or/routerparse.c
src/or/test.c
Here I looked at the results of the automated conversion and cleaned
them up as follows:
If there was a tor_memcmp or tor_memeq that was in fact "safe"[*] I
changed it to a fast_memcmp or fast_memeq.
Otherwise if there was a tor_memcmp that could turn into a
tor_memneq or tor_memeq, I converted it.
This wants close attention.
[*] I'm erring on the side of caution here, and leaving some things
as tor_memcmp that could in my opinion use the data-dependent
fast_memcmp variant.
Ian's original message:
The current code actually correctly handles queued data at the
Exit; if there is queued data in a EXIT_CONN_STATE_CONNECTING
stream, that data will be immediately sent when the connection
succeeds. If the connection fails, the data will be correctly
ignored and freed. The problem with the current server code is
that the server currently drops DATA cells on streams in the
EXIT_CONN_STATE_CONNECTING state. Also, if you try to queue data
in the EXIT_CONN_STATE_RESOLVING state, bad things happen because
streams in that state don't yet have conn->write_event set, and so
some existing sanity checks (any stream with queued data is at
least potentially writable) are no longer sound.
The solution is to simply not drop received DATA cells while in
the EXIT_CONN_STATE_CONNECTING state. Also do not send SENDME
cells in this state, so that the OP cannot send more than one
window's worth of data to be queued at the Exit. Finally, patch
the sanity checks so that streams in the EXIT_CONN_STATE_RESOLVING
state that have buffered data can pass.
[...] Here is a simple patch. It seems to work with both regular
streams and hidden services, but there may be other corner cases
I'm not aware of. (Do streams used for directory fetches, hidden
services, etc. take a different code path?)
We need to make sure that the worst thing that a weird consensus param
can do to us is to break our Tor (and only if the other Tors are
reliably broken in the same way) so that the majority of directory
authorities can't pull any attacks that are worse than the DoS that
they can trigger by simply shutting down.
One of these worse things was the cbtnummodes parameter, which could
lead to heap corruption on some systems if the value was sufficiently
large.
This commit fixes this particular issue and also introduces sanity
checking for all consensus parameters.
The reason the "streams problem" occurs is due to the complicated
interaction between Tor's congestion control and libevent. At some point
during the experiment, the circuit window is exhausted, which blocks all
edge streams. When a circuit level sendme is received at Exit, it
resumes edge reading by looping over linked list of edge streams, and
calling connection_start_reading() to inform libevent to resume reading.
When the streams are activated again, Tor gets the chance to service the
first three streams activated before the circuit window is exhausted
again, which causes all streams to be blocked again. As an experiment,
we reversed the order in which the streams are activated, and indeed the
first three streams, rather than the last three, got service, while the
others starved.
Our solution is to change the order in which streams are activated. We
choose a random edge connection from the linked list, and then we
activate streams starting from that chosen stream. When we reach the end
of the list, then we continue from the head of the list until our chosen
stream (treating the linked list as a circular linked list). It would
probably be better to actually remember which streams have received
service recently, but this way is simple and effective.
A node_t is an abstraction over routerstatus_t, routerinfo_t, and
microdesc_t. It should try to present a consistent interface to all
of them. There should be a node_t for a server whenever there is
* A routerinfo_t for it in the routerlist
* A routerstatus_t in the current_consensus.
(note that a microdesc_t alone isn't enough to make a node_t exist,
since microdescriptors aren't usable on their own.)
There are three ways to get a node_t right now: looking it up by ID,
looking it up by nickname, and iterating over the whole list of
microdescriptors.
All (or nearly all) functions that are supposed to return "a router"
-- especially those used in building connections and circuits --
should return a node_t, not a routerinfo_t or a routerstatus_t.
A node_t should hold all the *mutable* flags about a node. This
patch moves the is_foo flags from routerinfo_t into node_t. The
flags in routerstatus_t remain, but they get set from the consensus
and should not change.
Some other highlights of this patch are:
* Looking up routerinfo and routerstatus by nickname is now
unified and based on the "look up a node by nickname" function.
This tries to look only at the values from current consensus,
and not get confused by the routerinfo_t->is_named flag, which
could get set for other weird reasons. This changes the
behavior of how authorities (when acting as clients) deal with
nodes that have been listed by nickname.
* I tried not to artificially increase the size of the diff here
by moving functions around. As a result, some functions that
now operate on nodes are now in the wrong file -- they should
get moved to nodelist.c once this refactoring settles down.
This moving should happen as part of a patch that moves
functions AND NOTHING ELSE.
* Some old code is now left around inside #if 0/1 blocks, and
should get removed once I've verified that I don't want it
sitting around to see how we used to do things.
There are still some unimplemented functions: these are flagged
with "UNIMPLEMENTED_NODELIST()." I'll work on filling in the
implementation here, piece by piece.
I wish this patch could have been smaller, but there did not seem to
be any piece of it that was independent from the rest. Moving flags
forces many functions that once returned routerinfo_t * to return
node_t *, which forces their friends to change, and so on.
When the CellStatistics option is off, we don't store cell insertion
times. Doing so would also not be very smart, because there seem to
still be some performance issues with this type of statistics. Nothing
harmful happens when we don't have insertion times, so we don't need to
alarm the user.
Previously[*], the function would start with the first stream on the
circuit, and let it package as many cells as it wanted before
proceeding to the next stream in turn. If a circuit had many live
streams that all wanted to package data, the oldest would get
preference, and the newest would get ignored.
Now, we figure out how many cells we're willing to send per stream,
and try to allocate them fairly.
Roger diagnosed this in the comments for bug 1298.
[*] This bug has existed since before the first-ever public release
of Tor. It was added by r152 of Tor on 26 Jan 2003, which was
the first commit to implement streams (then called "topics").
This is not the oldest bug to be fixed in 0.2.2.x: that honor
goes to the windowing bug in r54, which got fixed in e50b7768 by
Roger with diagnosis by Karsten. This is, however, the most
long-lived bug to be fixed in 0.2.2.x: the r54 bug was fixed
2580 days after it was introduced, whereas I am writing this
commit message 2787 days after r152.
I'm going to use this to implement more fairness in
circuit_resume_edge_reading_helper in an attempt to fix bug 1298.
(Updated with fixes from arma and Sebastian)
We frequently add cells to stream-blocked queues for valid reasons
that don't mean we need to block streams. The most obvious reason
is if the cell arrives over a circuit rather than from an edge: we
don't block circuits, no matter how full queues get. The next most
obvious reason is that we allow CONNECTED cells from a newly created
stream to get delivered just fine.
This patch changes the behavior so that we only iterate over the
streams on a circuit when the cell in question came from a stream,
and we only block the stream that generated the cell, so that other
streams can still get their CONNECTEDs in.
When this happens, run through the streams on the circuit and make
sure they're all blocked. If some aren't, that's a bug: block them
all and log it! If they all are, where did the cell come from? Log
it!
(I suspect that this actually happens pretty frequently, so I'm making
these log messages appear at INFO.)
Do not start reading on exit streams when we get a SENDME unless we
have space in the appropriate circuit's cell queue.
Draft fix for bug 1653.
(commit message by nickm)
At best, this patch helps us avoid sending queued relayed cells that
would get ignored during the time between when a destroy cell is
sent and when the circuit is finally freed. At worst, it lets us
release some memory a little earlier than it would otherwise.
Fix for bug #1184. Bugfix on 0.2.0.1-alpha.
Everything that accepted the 'Circ' name handled it wrong, so even now
that we fixed the handling of the parameter, we wouldn't be able to
set it without making all the 0.2.2.7..0.2.2.10 relays act wonky.
This patch makes Tors accept the 'Circuit' name instead, so we can
turn on circuit priorities without confusing the versions that treated
the 'Circ' name as occasion to act weird.
When you mean (a=b(c,d)) >= 0, you had better not say (a=b(c,d)>=0).
We did the latter, and so whenever CircPriorityHalflife was in the
consensus, it was treated as having a value of 1 msec (that is,
boolean true).
The rule is now: take the value from the CircuitPriorityHalflife
config option if it is set. If it zero, disable the cell_ewma
algorithm. If it is set, use it to calculate the scaling factor.
If it is not set, look for a CircPriorityHalflifeMsec parameter in the
consensus networkstatus. If *that* is zero, then disable the cell_ewma
algorithm; if it is set, use it to calculate the scaling factor.
If it is not set at all, disable the algorithm.
There are two big changes here:
- We store active circuits in a priority queue for each or_conn,
rather than doing a linear search over all the active circuits
before we send each cell.
- Rather than multiplying every circuit's cell-ewma by a decay
factor every time we send a cell (thus normalizing the value of a
current cell to 1.0 and a past cell to alpha^t), we instead
only scale down the cell-ewma every tick (ten seconds atm),
normalizing so that a cell sent at the start of the tick has
value 1.0).
Each circuit is ranked in terms of how many cells from it have been
relayed recently, using a time-weighted average.
This patch has been tested this on a private Tor network on PlanetLab,
and gotten improvements of 12-35% in time it takes to fetch a small
web page while there's a simultaneous large data transfer going on
simultaneously.
[Commit msg by nickm based on mail from Ian Goldberg.]
Some *_free functions threw asserts when passed NULL. Now all of them
accept NULL as input and perform no action when called that way.
This gains us consistence for our free functions, and allows some
code simplifications where an explicit null check is no longer necessary.
- Refactor geoip.c by moving duplicate code into rotate_request_period().
- Don't leak memory when cleaning up cell queues.
- Make sure that exit_(streams|bytes_(read|written)) are initialized in all
places accessing these arrays.
- Read only the last block from *stats files and ensure that its timestamp
is not more than 25 hours in the past and not more than 1 hour in the
future.
- Stop truncating the last character when reading *stats files.
The only thing that's left now is to avoid reading whole *stats files into
memory.
Send circuit or stream sendme cells when our window has decreased
by 100 cells, not when it has decreased by 101 cells. Bug uncovered
by Karsten when testing the "reduce circuit window" performance
patch. Bugfix on the 54th commit on Tor -- from July 2002,
before the release of Tor 0.0.0. This is the new winner of the
oldest-bug prize.
The problem is that clients and hidden services are receiving
relay_early cells, and they tear down the circuit.
Hack #1 is for rendezvous points to rewrite relay_early cells to
relay cells. That way there are never any incoming relay_early cells.
Hack #2 is for clients and hidden services to never send a relay_early
cell on an established rendezvous circuit. That works around rendezvous
points that haven't upgraded yet.
Hack #3 is for clients and hidden services to not tear down the circuit
when they receive an inbound relay_early cell. We already refuse extend
cells at clients.
When determining how long directory requests take or how long cells spend
in queues, we were comparing timestamps on microsecond detail only to
convert results to second or millisecond detail later on. But on 32-bit
architectures this means that 2^31 microseconds only cover time
differences of up to 36 minutes. Instead, compare timestamps on
millisecond detail.
Changes to directory request statistics:
- Rename GEOIP statistics to DIRREQ statistics, because they now include
more than only GeoIP-based statistics, whereas other statistics are
GeoIP-dependent, too.
- Rename output file from geoip-stats to dirreq-stats.
- Add new config option DirReqStatistics that is required to measure
directory request statistics.
- Clean up ChangeLog.
Also ensure that entry guards statistics have access to a local GeoIP
database.
Fix an edge case where a malicious exit relay could convince a
controller that the client's DNS question resolves to an internal IP
address. Bug found and fixed by "optimist"; bugfix on 0.1.2.8-beta.
Fix an edge case where a malicious exit relay could convince a
controller that the client's DNS question resolves to an internal IP
address. Bug found and fixed by "optimist"; bugfix on 0.1.2.8-beta.
The subversion $Id$ fields made every commit force a rebuild of
whatever file got committed. They were not actually useful for
telling the version of Tor files in the wild.
svn:r17867
"connecting" and it receives an "end" relay cell, the exit relay
would silently ignore the end cell and not close the stream. If
the client never closes the circuit, then the exit relay never
closes the TCP connection. Bug introduced in Tor 0.1.2.1-alpha;
reported by "wood".
svn:r17625
The "ClientDNSRejectInternalAddresses" config option wasn't being
consistently obeyed: if an exit relay refuses a stream because its
exit policy doesn't allow it, we would remember what IP address
the relay said the destination address resolves to, even if it's
an internal IP address. Bugfix on 0.2.0.7-alpha; patch by rovv.
svn:r17135
Initial conversion of uint32_t addr to tor_addr_t addr in connection_t and related types. Most of the Tor wire formats using these new types are in, but the code to generate and use it is not. This is a big patch. Let me know what it breaks for you.
svn:r16435
Part of fix for bug 617: allow connection_ap_handshake_attach_circuit() to mark connections, to avoid double-mark warnings. Note that this is an incomplete refactoring.
svn:r14066
Re-tune mempool parametes based on testing on peacetime: use smaller chuncks, free them a little more aggressively, and try very hard to concentrate allocations on fuller chunks. Also, lots of new documentation.
svn:r13484
Add a couple of (currently disabled) strategies for trying to avoid using too much ram in memory pools: prefer putting new cells in almost-full chunks, and be willing to free the last empty chunk if we have not needed it for a while. Also add better output to mp_pool_log_status to track how many mallocs a given memory pool strategy is saving us, so we can tune the mempool parameters.
svn:r13428
Be more thorough about memory poisoning and clearing. Add an in-place version of aes_crypt in order to remove a memcpy from relay_crypt_one_payload.
svn:r13414