2009-01-18 10:51:09 +01:00

Filename: xxx-microdescriptors.txt
Title: Clients download consensus + microdescriptors
Version: $Revision$
Last-Modified: $Date$
Author: Roger Dingledine
Created: 17-Jan-2009
Status: Open

1. Overview

This proposal replaces section 3.2 of proposal 141, which was
called "Fetching descriptors on demand". Rather than modifying the
circuit-building protocol to fetch a server descriptor inline at each
circuit extend, we instead put all of the information that clients need
either into the consensus itself, or into a new set of data about each
relay called a microdescriptor.

The goal is that descriptor elements that are small and frequently
changing should go in the consensus itself, descriptor elements that
are small and relatively static should go in the microdescriptor,
and if we ever end up with descriptor elements that aren't small yet
clients need to know them, we'll need to revisit a design like the
one in proposal 141.

2. Motivation

See
  http://archives.seul.org/or/dev/Nov-2008/msg00000.html and
  http://archives.seul.org/or/dev/Nov-2008/msg00001.html and especially
  http://archives.seul.org/or/dev/Nov-2008/msg00007.html
for a discussion of the options and why this is currently the best
approach.

3. Design

There are three pieces to the proposal. First, authorities will list in
their votes (and thus in the consensus) which relay descriptor elements
are included in the microdescriptor, and will also list the expected
hash of the microdescriptor for each relay. Second, directory mirrors
will serve microdescriptors. Third, clients will ask for them and then
cache them.

3.1. Consensus changes

V3 votes should include a new line:

  microdescriptor-elements bar baz foo

We also need to include the hash of each expected microdescriptor in
the routerstatus section. I suggest a new "m" line for each stanza,
with the base64 of the hash of that relay's microdescriptor, computed
over the elements the authority listed in its microdescriptor-elements
line above.
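
A minimal Python sketch of how an authority might produce the "m" line
for one relay. It assumes the descriptor has already been parsed into a
dict mapping each keyword to the full text of that element, and applies
the hash definition from section 3.2 (the hash of the chosen elements,
concatenated alphabetically). SHA-256 and unpadded base64 are
assumptions: this proposal names neither the hash function nor the
padding rule.

```python
import base64
import hashlib


def microdesc_body(descriptor, elements):
    """Concatenate the chosen elements in alphabetical order by keyword.

    `descriptor` is a hypothetical parsed form: keyword -> full element
    text, keyword line included. Elements the relay lacks are skipped.
    """
    parts = []
    for keyword in sorted(set(elements)):
        text = descriptor.get(keyword)
        if text is not None:
            parts.append(text if text.endswith("\n") else text + "\n")
    return "".join(parts)


def m_line(descriptor, elements):
    """Build the routerstatus "m" line for one relay."""
    body = microdesc_body(descriptor, elements)
    # SHA-256 and stripping the "=" padding are assumptions.
    digest = hashlib.sha256(body.encode("utf-8")).digest()
    return "m " + base64.b64encode(digest).decode("ascii").rstrip("=")
```

Because the hash covers only the listed elements, a change to any other
descriptor field (say, the platform line) leaves the "m" line unchanged.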

The consensus microdescriptor-elements and "m" lines are then computed
as described in Section 3.1.2 below.

I believe that means we need a new consensus-method "6" that knows
how to compute the microdescriptor-elements and add "m" lines.

3.1.1. Descriptor elements to include for now

To start, the element list that authorities suggest should be:

  family onion-key

(Note that the or-dev posts above only mention onion-key, but if
we don't also include family then clients will never learn it. Family
should be relatively static, so putting it in the microdescriptor is
smarter than trying to fit it into the consensus.)

3.1.2. Computing consensus for microdescriptor-elements and "m" lines

One approach is for the consensus microdescriptor-elements line to
include all elements listed by a majority of authorities, sorted. The
problem here is that the resulting set may not match the set that any
single authority hashed over, so the votes no longer determine what
the correct hash for the "m" line should be. We could imagine telling
the authority to go look in its descriptor and produce the right hash
itself, but we don't want consensus calculation to be based on external
data like that. (Plus, the authority may not have the descriptor that
everybody else voted to use.)

The better approach is to take the exact set that has the most votes
(breaking ties by the set that has the most elements, and breaking
ties after that by whichever is alphabetically first). That will
increase the odds that we actually get a microdescriptor hash that
is both a) for the descriptor we're putting in the consensus, and b)
over the elements that we're declaring it should be for.
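
One way to express this selection rule in Python. Each authority's vote
is given as a list of element keywords. The canonical form of a set (its
keywords sorted and space-separated, as in the microdescriptor-elements
line) is used both to count identical sets and for the final
"alphabetically first" comparison; that reading of the last tie-breaker
is an assumption, not something the proposal pins down.

```python
from collections import Counter


def choose_elements(votes):
    """Pick the consensus microdescriptor-elements set.

    `votes` holds one keyword list per authority that listed the line.
    Returns the winning set as a sorted list, or None if nobody voted.
    """
    # Count identical sets, keyed by their sorted, space-joined form.
    counts = Counter(" ".join(sorted(set(v))) for v in votes)
    if not counts:
        return None
    # Most votes first, then most elements, then alphabetically first.
    best = min(counts, key=lambda s: (-counts[s], -len(s.split()), s))
    return best.split()
```

Since every authority computed its hashes over exactly one candidate
set, picking a set that some authorities actually voted for (rather
than a merged one) means at least their hashes remain usable.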

Then the "m" line for a given relay is the one that gets the most votes
from authorities that both a) voted for the microdescriptor-elements
line we're using, and b) voted for the descriptor we're using.

(If there's a tie, use the smaller hash. But really, if there are
multiple such votes and they differ about a microdescriptor, we have
caught one of them lying or being buggy. We should log it to track
down why.)

If there are no such votes, then we leave out the "m" line for that
relay. That means clients should avoid it for this time period. (As
an extension it could instead mean that clients should fetch the
descriptor and figure out its microdescriptor themselves. But let's
not get ahead of ourselves.)
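
The "m" line rules above might be sketched as follows. Each vote is a
hypothetical (elements, descriptor_digest, m_hash) tuple for the relay
in question, with m_hash set to None when the authority gave no "m"
line. Hashes are kept as raw digest bytes, an assumption that makes
"the smaller hash" a plain byte comparison. A return value of None means
the consensus omits the "m" line.

```python
import logging
from collections import Counter


def choose_m_line(votes, chosen_elements, chosen_descriptor):
    """Pick the consensus "m" hash for one relay, or None to omit the line.

    `votes` is an iterable of (elements, descriptor_digest, m_hash)
    tuples, one per authority that listed this relay.
    """
    wanted = sorted(chosen_elements)
    counts = Counter(
        m_hash
        for elements, descriptor_digest, m_hash in votes
        if m_hash is not None
        and sorted(elements) == wanted
        and descriptor_digest == chosen_descriptor
    )
    if not counts:
        return None  # no qualifying votes: leave out the "m" line
    if len(counts) > 1:
        # Qualifying authorities disagree: one is lying or buggy.
        logging.warning("microdescriptor hash disagreement: %s", sorted(counts))
    # Most votes wins; a tie goes to the smaller hash.
    return min(counts, key=lambda h: (-counts[h], h))
```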

It would be nice to have a more foolproof way to agree on what
microdescriptor hash each authority should vote for, so we can avoid
missing "m" lines. Just switching to a new consensus-method each time
we change the set of microdescriptor-elements won't help, though, since
each authority will still have to decide what hash to vote for before
knowing what consensus-method will be used.

Here's one way we could do it. Each vote and consensus includes
the microdescriptor-elements that were used to compute the hashes,
and also a preferred-microdescriptor-elements set. If an authority
has a consensus from the previous period, then it should use the
consensus preferred-microdescriptor-elements when computing its votes
for microdescriptor-elements and the appropriate hashes in the upcoming
period. (If it has no previous consensus, then it just puts down its
own preferences in both lines.)
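
A sketch of how an authority could pick the two lines it votes under
this scheme. The previous consensus is represented as a plain dict from
keyword to element list, which is an assumption for illustration. The
function returns the set to compute hashes over (the
microdescriptor-elements line) and the set to advertise as preferred.

```python
def elements_to_vote(previous_consensus, own_preference):
    """Return (microdescriptor-elements, preferred-microdescriptor-elements).

    `previous_consensus` is None if the authority has no consensus from
    the previous period; otherwise a hypothetical dict with a
    "preferred-microdescriptor-elements" entry.
    """
    own = sorted(set(own_preference))
    if previous_consensus is None:
        # No shared history: put our own preferences in both lines.
        return own, list(own)
    # Hash over the set the last consensus preferred, so every authority
    # that saw that consensus hashes over the same elements.
    used = sorted(set(previous_consensus["preferred-microdescriptor-elements"]))
    return used, own
```

Under this reading a change of preference takes effect one period late:
it first appears in the preferred line, and authorities only start
hashing over it once a consensus has adopted it.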

3.2. Directory mirrors serve microdescriptors

Directory mirrors should then read the microdescriptor-elements line
from the consensus, so they know which elements make up each
microdescriptor when answering requests.

The microdescriptors with hashes <D1>,<D2>,<D3> should be available at:
  http://<hostname>/tor/micro/d/<D1>+<D2>+<D3>.z

All the microdescriptors from the current consensus should also be
available at:
  http://<hostname>/tor/micro/all.z
so a client that's bootstrapping doesn't need to send a 70KB URL just
to name every microdescriptor it's looking for.
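
A minimal sketch of both sides of these URLs: a client building the
request, and a mirror parsing the path back into digests. The function
names are hypothetical, and the digests are treated as opaque strings.

```python
MICRO_ALL = "/tor/micro/all.z"
MICRO_BY_DIGEST = "/tor/micro/d/"


def microdesc_url(hostname, digests=None):
    """URL for the given digests, or for 'all' when digests is None."""
    if digests is None:
        return "http://" + hostname + MICRO_ALL
    return "http://" + hostname + MICRO_BY_DIGEST + "+".join(digests) + ".z"


def parse_micro_path(path):
    """Mirror side: None means 'all', otherwise the requested digests."""
    if path == MICRO_ALL:
        return None
    if path.startswith(MICRO_BY_DIGEST) and path.endswith(".z"):
        return path[len(MICRO_BY_DIGEST):-len(".z")].split("+")
    raise ValueError("not a microdescriptor request: " + path)
```

One thing the sketch does not settle: standard base64 output can itself
contain "+" and "/", so digests encoded that way could not be split
reliably on "+" or placed safely in a URL path; a different separator or
encoding would be needed.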

The format of a microdescriptor is the header line
  "microdescriptor 1"
followed by each element (keyword and body), in alphabetical order by
keyword. There's no need to mention what hash it is, since you can hash
the elements to learn this.

(Do we need a footer line to show that it's over, or is the next
"microdescriptor" line or EOF enough of a hint? A footer line wouldn't
hurt much. Also, authorities must not vote for an element called
"microdescriptor", since its line would be indistinguishable from the
header line.)

The hash of the microdescriptor is simply the hash of the concatenated
elements, not counting the header line or the hypothetical footer line.
Is this smart?

Note that I put a "1" up there in the header line. It isn't part
of what's hashed, though. Is there a way to put in a version that's
more useful?
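
The format and hashing rules above, sketched in Python under the
no-footer option: a response is split into microdescriptors at each
header line (or at EOF), and the hash covers only the element text after
the header. SHA-256 is again an assumed choice. As written, the version
number in the header has no effect on the hash, which is exactly the
property questioned above. The splitting also relies on no element line
starting with "microdescriptor ".

```python
import hashlib

HEADER_PREFIX = "microdescriptor "


def serialize(body, version=1):
    """One microdescriptor: header line, then the (already sorted) elements."""
    return HEADER_PREFIX + str(version) + "\n" + body


def split_microdescs(text):
    """Split a response into bodies with the header lines removed.

    No footer is assumed: a body ends at the next header line or at EOF.
    Anything before the first header line is ignored.
    """
    bodies, current = [], None
    for line in text.splitlines(keepends=True):
        if line.startswith(HEADER_PREFIX):
            if current is not None:
                bodies.append("".join(current))
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        bodies.append("".join(current))
    return bodies


def microdesc_digest(body):
    # Only the concatenated elements are hashed, never the header line.
    return hashlib.sha256(body.encode("utf-8")).digest()
```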

Directory mirrors should check that the microdescriptors they're about
to serve match the right hashes (the hashes from the fetch URL for a
request by hash, or the hashes from the consensus for a request for
"all").

We will probably want some sort of smart data structure that can
quickly map microdescriptor hashes to the corresponding
microdescriptors. Clients will want this anyway when they load their
microdescriptor cache and want to match it up with the consensus to
see what's missing.
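
A small Python sketch of such a structure, here simply a dict keyed by
digest. Because each entry is stored under the digest of its own
contents, a lookup by requested or consensus hash can only return a
matching microdescriptor, which covers the mirror-side check; the same
index gives a client its list of missing digests. The class and method
names are hypothetical, and the hash function is again assumed.

```python
import hashlib


def digest(body):
    # SHA-256 is an assumption; the proposal does not fix the hash function.
    return hashlib.sha256(body.encode("utf-8")).digest()


class MicrodescStore:
    """Hypothetical index from microdescriptor digest to microdescriptor."""

    def __init__(self):
        self._by_digest = {}

    def add(self, body, expected=None):
        """Store `body` under its own digest; reject it if `expected` is
        given and doesn't match. Returns True if stored."""
        d = digest(body)
        if expected is not None and d != expected:
            return False
        self._by_digest[d] = body
        return True

    def serve(self, requested):
        """Bodies for the requested digests, skipping ones we don't have."""
        return [self._by_digest[d] for d in requested if d in self._by_digest]

    def missing(self, consensus_digests):
        """Digests listed in the consensus that we don't have yet."""
        return [d for d in consensus_digests if d not in self._by_digest]
```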

3.3. Clients fetch them and cache them

When a client gets a new consensus, it checks whether there are any
microdescriptors it needs to learn. If it needs to learn more than
some threshold of the microdescriptors (half?), it requests 'all';
otherwise it requests only the missing ones.
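
The client's choice might look like this. The threshold of one half is
the tentative value from the text; the comparison uses "at least half",
the wording section 3.3.1 uses, rather than "more than". Both are
assumptions that are easy to adjust.

```python
def plan_fetch(consensus_digests, cached, threshold=0.5):
    """Decide what to request after a new consensus arrives.

    Returns "all", or the (possibly empty) list of missing digests.
    `cached` is a set of digests already in the client's cache.
    """
    needed = [d for d in consensus_digests if d not in cached]
    if needed and len(needed) >= threshold * len(consensus_digests):
        return "all"
    return needed
```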

Clients maintain a cache of microdescriptors along with metadata like
when each one was last referenced by a consensus. They keep a
microdescriptor until it hasn't been mentioned in any consensus for a
week.
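
A minimal sketch of that cache policy, assuming each consensus comes
with a timestamp the client can record as the time its digests were last
referenced. The class is hypothetical, and "a week" is taken as exactly
seven days.

```python
from datetime import datetime, timedelta, timezone

KEEP_FOR = timedelta(weeks=1)


class MicrodescCache:
    """Hypothetical client cache: digest -> body, plus the time each
    digest was last referenced by a consensus."""

    def __init__(self):
        self.bodies = {}
        self.last_listed = {}

    def note_consensus(self, digests, seen_at):
        # Every digest the new consensus mentions is referenced as of seen_at.
        for d in digests:
            self.last_listed[d] = seen_at

    def store(self, d, body):
        self.bodies[d] = body

    def expire(self, now=None):
        """Drop entries no consensus has mentioned for over a week.

        Returns the dropped digests."""
        now = now or datetime.now(timezone.utc)
        stale = [
            d for d in self.bodies
            if d not in self.last_listed or now - self.last_listed[d] > KEEP_FOR
        ]
        for d in stale:
            del self.bodies[d]
            self.last_listed.pop(d, None)
        return stale
```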

3.3.1. Information leaks from clients

If a client asks you for a set of microdescs, then you know she didn't
have them cached before. How much does that leak? What about when
we're all using our entry guards as directory guards, and we've seen
that user make a bunch of circuits already?

Fetching "all" when you need at least half is a good first-order fix,
but might not be all there is to it.

Another future option would be to fetch some of the microdescriptors
anonymously (via a Tor circuit).

4. Transition and deployment

In phase one, the directory authorities should start voting on
microdescriptors and microdescriptor elements, and putting them in the
consensus. This should happen during the 0.2.1.x series, and should
be relatively easy to do.

In phase two, directory mirrors should learn how to serve them, and
learn how to read the consensus to find out what they should be
serving. It would be great if we could squeeze this into 0.2.1.x as
well, so that once clients start to fetch them there will be many
mirrors to choose from.

(Are there reasonable ways to build only part of phase two in 0.2.1.x?)

In phase three, clients should start fetching and caching them instead
of normal descriptors. This should happen after 0.2.1.x.