We used to not set the guard state in launch_direct_bridge_descriptor_fetch().
So when a bridge descriptor fetch failed, the guard subsystem would never
learn about the fail (and hence the guard's reachability state would not
be updated).
The bridges+ipv6-min integration test has a client with bridges:
Bridge 127.0.0.1:5003
Bridge [::1]:5003
which got stuck in guard_selection_have_enough_dir_info_to_build_circuits()
because it couldn't find the descriptor of both bridges.
Specifically, the guard_has_descriptor() function could not find the
node_t of the [::1] bridge, because the [::1] bridge had no identity
digest assigned to it.
After further examination, it seems that during fetching the descriptor
for our bridges, we used the CERTS cell to fill the identity digest of
127.0.0.1:5003 properly. However, when we received a CERTS cell from
[::1]:5003 we actually ignored its identity digest because the
learned_router_identity() function was using
get_configured_bridge_by_addr_port_digest() which was returning the
127.0.0.1 bridge instead of the [::1] bridge (because it prioritizes
digest matching over addrport matching).
The fix replaces get_configured_bridge_by_addr_port_digest() with the
recent get_configured_bridge_by_exact_addr_port_digest() function. It
also relaxes the constraints of the
get_configured_bridge_by_exact_addr_port_digest() function by making it
return bridges whose identity digest is not yet known.
By using the _exact_() function, learned_router_identity() actually
fills in the identity digest of the [::1] bridge, which then allows
guard_has_descriptor() to find the right node_t and verify that the
descriptor is there.
FWIW, in the bridges+ipv6-min test both 127.0.0.1 and [::1] bridges
correspond to the same node_t, which I guess makes sense given that it's
actually the same underlying bridge.
Previously I'd made a bad assumption in the implementation of
prop271 in 0.3.0.1-alpha: I'd assumed that there couldn't be two
guards with the same identity. That's true for non-bridges, but in
the bridge case, we allow two bridges to have the same ID if they
have different addr:port combinations -- in order to have the same
bridge ID running multiple PTs.
Fortunately, this assumption wasn't deeply ingrained: we stop
enforcing the "one guard per ID" rule in the bridge case, and
instead enforce "one guard per <id,addr,port>".
We also needed to tweak our implementation of
get_bridge_info_for_guard, since it made the same incorrect
assumption.
Fixes bug 21027; bugfix on 0.3.0.1-alpha.
Here we handle most (all?) of the remaining tasks, and fix some
bugs, in the prop271 bridge implementation.
* We record bridge identities as we learn them.
* We only call deprecated functions from bridges.c when the
deprecated guard algorithm is in use.
* We update any_bridge_descriptors_known() and
num_bridges_usable() to work correctly with the new backend
code. (Previously, they called into the guard selection logic.
* We update bridge directory fetches to work with the new
guard code.
* We remove some erroneous assertions where we assumed that we'd
never load a guard that wasn't for the current selection.
Also, we fix a couple of typos.
This includes:
* making bridge_info_t exposed but opaque
* allowing guards where we don't know an identity
* making it possible to learn the identity of a guard
* creating a guard that lacks a node_t
* remembering a guard's address and port.
* Looking up a guard by address and port.
* Only enforcing the rule that we need a live consensus to update
the "listed" status for guards when we are not using bridges.
This patch doesn't cover every case; omitted cases are marked with
"XXXX prop271", as usual. It leaves both the old interface and the
new interface for guard status notification, since they don't
actually work in the same way: the new API wants to be told when a
circuit has failed or succeeded, whereas the old API wants to know
when a channel has failed or succeeded.
I ran into some trouble with directory guard stuff, since when we
pick the directory guard, we don't actually have a circuit to
associate it with. I solved that by allowing guard states to be
associated with directory connections, not just circuits.
The previous commit, in moving a bunch of functions to bridges.c,
broke compilation because bridges.c required two entry points to
entrynodes.c it didn't have.
This patch is just:
* Code movement
* Adding headers here and there as needed
* Adding a bridges_free_all() with a call to it.
It breaks compilation, since the bridge code needed to make exactly
2 calls into entrynodes.c internals. I'll fix those in the next
commit.