mirror of
https://gitlab.torproject.org/tpo/core/tor.git
synced 2024-11-10 13:13:44 +01:00
sched: Don't get KIST stuck in an infinite loop
When a channel is scheduled and flush cells returns 0 that is no cells to flush, we flag it back in waiting for cells so it doesn't get stuck in a possible infinite loop. It has been observed on moria1 where a closed channel end up in the scheduler where the flush process returned 0 cells but it was ultimately kept in the scheduling loop forever. We suspect that this is due to a more deeper problem in tor where the channel_more_to_flush() is actually looking at the wrong queue and was returning 1 for an empty channel thus putting the channel in the "Case 4" of the scheduler which is to go back in pending state thus re-considered at the next iteration. This is a fix that allows the KIST scheduler to recover properly from a not entirelly diagnosed problem in tor. Fixes #23676 Signed-off-by: David Goulet <dgoulet@torproject.org>
This commit is contained in:
parent
fc6c0b46fb
commit
3ef7e6f187
6
changes/bug23676
Normal file
6
changes/bug23676
Normal file
@ -0,0 +1,6 @@
|
||||
o Major bugfixes (scheduler):
|
||||
If a channel is put into the scheduler's pending list, then it starts
|
||||
closing, and then if the scheduler runs before it finishes closing, the
|
||||
scheduler will get stuck trying to flush its cells while the lower layers
|
||||
refuse to cooperate. Fix that race condition by given the scheduler an
|
||||
escape method. Fixes bug 23676; bugfix on 0.3.2.1-alpha
|
@ -591,8 +591,23 @@ kist_scheduler_run(void)
|
||||
if (flush_result > 0) {
|
||||
update_socket_written(&socket_table, chan, flush_result *
|
||||
(CELL_MAX_NETWORK_SIZE + TLS_PER_CELL_OVERHEAD));
|
||||
} else {
|
||||
/* XXX: This can happen because tor sometimes does flush in an
|
||||
* opportunistic way cells from the circuit to the outbuf so the
|
||||
* channel can end up here without having anything to flush nor needed
|
||||
* to write to the kernel. Hopefully we'll fix that soon but for now
|
||||
* we have to handle this case which happens kind of often. */
|
||||
log_debug(LD_SCHED,
|
||||
"We didn't flush anything on a chan that we think "
|
||||
"can write and wants to write. The channel's state is '%s' "
|
||||
"and in scheduler state %d. We're going to mark it as "
|
||||
"waiting_for_cells (as that's most likely the issue) and "
|
||||
"stop scheduling it this round.",
|
||||
channel_state_to_string(chan->state),
|
||||
chan->scheduler_state);
|
||||
chan->scheduler_state = SCHED_CHAN_WAITING_FOR_CELLS;
|
||||
continue;
|
||||
}
|
||||
/* XXX What if we didn't flush? */
|
||||
}
|
||||
|
||||
/* Decide what to do with the channel now */
|
||||
|
Loading…
Reference in New Issue
Block a user