Major revision of proposal 158.

The big changes are to go from a "caches compute the micro-descriptor"
format to an "authorities generate microdescriptors" format.

See or-dev discussions of January 2009 for full rationales.
Nick Mathewson 2009-05-16 00:42:27 -04:00
parent 143e6677ff
commit 573aeb769e

@@ -4,6 +4,15 @@ Author: Roger Dingledine
Created: 17-Jan-2009
Status: Open
0. History
15 May 2009: Substantially revised based on discussions on or-dev
from late January. Removed the notion of voting on how to choose
microdescriptors; made it just a function of the consensus method.
(This lets us avoid the possibility of "desynchronization.")
Added suggestion to use a new consensus flavor. Specified use of
SHA256 for new hashes. -nickm
1. Overview
This proposal replaces section 3.2 of proposal 141, which was
@@ -22,6 +31,10 @@ Status: Open
them, we'll need to resume considering some design like the one in
proposal 141.
Note also that any descriptor element which clients need to use to
decide which servers to fetch info about, or which servers to fetch
info from, needs to stay in the consensus.
2. Motivation
See
@@ -34,89 +47,69 @@ Status: Open
3. Design
There are three pieces to the proposal. First, authorities will list in
their votes (and thus in the consensus) what relay descriptor elements
are included in the microdescriptor, and also list the expected hash
their votes (and thus in the consensus) the expected hash
of microdescriptor for each relay. Second, directory mirrors will serve
microdescriptors. Third, clients will ask for them and cache them.
3.1. Consensus changes
V3 votes should include a new line:
microdescriptor-elements bar baz foo
listing each descriptor element (sorted alphabetically) that authority
included when it calculated its expected microdescriptor hashes.
If the authorities choose a consensus method of a given version or
later, a microdescriptor format is implicit in that version.
A microdescriptor should in every case be a pure function of the
router descriptor and the consensus method.
We also need to include the hash of each expected microdescriptor in
In votes, we need to include the hash of each expected microdescriptor in
the routerstatus section. I suggest a new "m" line for each stanza,
with the base64 of the hash of the elements that the authority voted
for above.
with the base64 of the SHA256 hash of the router's microdescriptor.
For every consensus method that an authority supports, it includes a
separate "m" line in each router section of its vote, containing:
"m" SP methods SP digest NL
where methods is a comma-separated list of the consensus methods
that the authority believes will produce "digest".
(As with base64 encoding of SHA1 hashes in consensuses, let's
omit the trailing =s.)
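Here's a minimal Python sketch of building a vote's "m" lines. It follows one reading of the rule above. For each consensus method it supports, the authority knows which microdescriptor that method would produce, and methods that yield the same digest share a single line. The input mapping and the function names are hypothetical. Only the "m" SP methods SP digest layout, the use of SHA256, and the stripped =s come from the text.

```python
import base64
import hashlib
from typing import Dict, List


def b64_sha256(data: bytes) -> str:
    """Base64 of the SHA256 digest, with the trailing '=' padding omitted."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii").rstrip("=")


def vote_m_lines(micro_by_method: Dict[int, str]) -> List[str]:
    """Build "m" lines from a hypothetical {consensus method: microdescriptor text} map."""
    methods_by_digest: Dict[str, List[int]] = {}
    for method, micro in micro_by_method.items():
        digest = b64_sha256(micro.encode("ascii"))
        methods_by_digest.setdefault(digest, []).append(method)
    lines = []
    # Emit lines in order of their lowest method number (an arbitrary, stable choice).
    for digest, methods in sorted(methods_by_digest.items(), key=lambda kv: min(kv[1])):
        lines.append("m %s %s" % (",".join(str(m) for m in sorted(methods)), digest))
    return lines
```

Each returned line would then go into the router's section of the vote, followed by NL.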
The consensus microdescriptor-elements and "m" lines are then computed
as described in Section 3.1.2 below.
I believe that means we need a new consensus-method "6" that knows
how to compute the microdescriptor-elements and add "m" lines.
(This means we need a new consensus-method that knows
how to compute the microdescriptor-elements and add "m" lines.)
3.1.1. Descriptor elements to include for now
To start, the element list that authorities suggest should be
family onion-key
(Note that the or-dev posts above only mention onion-key, but if
we don't also include family then clients will never learn it. It
seemed like it should be relatively static, so putting it in the
microdescriptor is smarter than trying to fit it into the consensus.)
We could imagine a config option "family,onion-key" so authorities
could change their voted preferences without needing to upgrade.
In the first version, the microdescriptor should contain the
onion-key element and the family element from the router descriptor.
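Here's a minimal Python sketch of cutting a microdescriptor out of a router descriptor, keeping only the onion-key and family elements. It assumes the usual descriptor layout: a keyword line, optionally followed by a -----BEGIN ... -----END object block. It keeps elements in the order the descriptor lists them. That ordering and the function name are assumptions, not something this proposal fixes.

```python
WANTED = ("onion-key", "family")  # elements for the first microdescriptor version


def extract_microdescriptor(router_desc: str) -> str:
    """Keep only the WANTED elements (keyword line plus any object block)."""
    out = []
    keep = False       # is the current element one we want?
    in_object = False  # are we inside a -----BEGIN/-----END block?
    for line in router_desc.splitlines():
        if in_object:
            if keep:
                out.append(line)
            if line.startswith("-----END"):
                in_object = False
            continue
        if line.startswith("-----BEGIN"):
            in_object = True
            if keep:
                out.append(line)
            continue
        # Any other line starts a new element; its keyword is the first token.
        keep = line.split(" ", 1)[0] in WANTED
        if keep:
            out.append(line)
    return "".join(l + "\n" for l in out)
```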
3.1.2. Computing consensus for microdescriptor-elements and "m" lines
One approach is for the consensus microdescriptor-elements line to
include every element listed by a majority of authorities, sorted. The
problem here is that it will no longer be deterministic what the correct
hash for the "m" line should be. We could imagine telling the authority
to go look in its descriptor and produce the right hash itself, but
we don't want consensus calculation to be based on external data like
that. (Plus, the authority may not have the descriptor that everybody
else voted to use.)
When generating a consensus, we use whichever m line
unambiguously corresponds to the descriptor digest that will be
included in the consensus. (If there are multiple m lines for that
descriptor digest, we use whichever is most common. If they are
equally common, we break ties in the favor of the lexically
earliest. Either way, we should log a warning: That's likely a
bug.)
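Here's the tie-breaking rule above, sketched in Python. It assumes each vote has already been reduced to a (descriptor digest, microdescriptor digest) pair, taken from the "m" line for the consensus method in use. The function name is hypothetical. Returning None when no vote matches is an assumption about the no-match case.

```python
import logging
from collections import Counter
from typing import Iterable, Optional, Tuple


def choose_consensus_m(votes: Iterable[Tuple[str, str]], chosen_desc: str) -> Optional[str]:
    """Pick the consensus "m" digest for one relay.

    votes: (descriptor digest, microdescriptor digest) pairs, one per vote
    chosen_desc: the descriptor digest that will appear in the consensus
    """
    counts = Counter(m for desc, m in votes if desc == chosen_desc)
    if not counts:
        return None  # no vote matched; handling of this case is assumed here
    if len(counts) > 1:
        logging.warning("Conflicting m lines for descriptor %s: %s", chosen_desc, dict(counts))
    # Most common first; among equally common digests, the lexically earliest.
    return min(counts, key=lambda m: (-counts[m], m))
```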
The better approach is to take the exact set that has the most votes
(breaking ties by the set that has the most elements, and breaking
ties after that by whichever is alphabetically first). That will
increase the odds that we actually get a microdescriptor hash that
is both a) for the descriptor we're putting in the consensus, and b)
over the elements that we're declaring it should be for.
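For comparison, here's a Python sketch of the set-selection rule in this paragraph. The history note says voting on how to choose microdescriptors was removed. "Alphabetically first" is read here as comparing the sorted element lists as tuples. That reading and the function name are assumptions.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def choose_element_set(voted: Iterable[Iterable[str]]) -> Optional[Tuple[str, ...]]:
    """Pick the consensus microdescriptor-elements set from each authority's vote."""
    counts = Counter(tuple(sorted(elements)) for elements in voted)
    if not counts:
        return None
    # Most votes, then most elements, then alphabetically first.
    return min(counts, key=lambda s: (-counts[s], -len(s), s))
```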
The "m" lines in a consensus contain only the digest, not a list of
consensus methods.
Then the "m" line for a given relay is the one that gets the most votes
from authorities that both a) voted for the microdescriptor-elements
line we're using, and b) voted for the descriptor we're using.
3.1.3. A new flavor of consensus
(If there's a tie, use the smaller hash. But really, if there are
multiple such votes and they differ about a microdescriptor, we caught
one of them lying or being buggy. We should log it to track down why.)
Rather than inserting "m" lines in the current consensus format,
they should be included in a new consensus flavor (see proposal
162).
If there are no such votes, then we leave out the "m" line for that
relay. That means clients should avoid it for this time period. (As
an extension it could instead mean that clients should fetch the
descriptor and figure out its microdescriptor themselves. But let's
not get ahead of ourselves.)
This flavor can safely omit descriptor digests.
It would be nice to have a more foolproof way to agree on what
microdescriptor hash each authority should vote for, so we can avoid
missing "m" lines. Just switching to a new consensus-method each time
we change the set of microdescriptor-elements won't help though, since
each authority will still have to decide what hash to vote for before
knowing what consensus-method will be used.
We still need to decide whether to move ports into microdescriptors
or not. In either case, they can be removed from the current "ns"
flavor of consensus, since no current clients use them, and they
take up about 5% of the compressed consensus.
Here's one way we could do it. Each vote / consensus includes
the microdescriptor-elements that were used to compute the hashes,
and also a preferred-microdescriptor-elements set. If an authority
has a consensus from the previous period, then it should use the
consensus preferred-microdescriptor-elements when computing its votes
for microdescriptor-elements and the appropriate hashes in the upcoming
period. (If it has no previous consensus, then it just writes its
own preferences in both lines.)
This new consensus flavor should be signed with the SHA256 signature
format as documented in proposal 162.
3.2. Directory mirrors serve microdescriptors
@@ -125,8 +118,10 @@ Status: Open
continue to serve normal relay descriptors too, a) to serve old clients
and b) to be able to construct microdescriptors on the fly.)
The microdescriptors with hashes <D1>,<D2>,<D3> should be available at:
http://<hostname>/tor/micro/d/<D1>+<D2>+<D3>.z
The microdescriptors with base64 hashes <D1>,<D2>,<D3> should be available at:
http://<hostname>/tor/micro/d/<D1>-<D2>-<D3>.z
(We use base64 for size and for consistency with the consensus
format. We use -s instead of +s to separate these items, since
All the microdescriptors from the current consensus should also be
available at:
@@ -134,24 +129,9 @@ Status: Open
so a client that's bootstrapping doesn't need to send a 70KB URL just
to name every microdescriptor it's looking for.
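Here's a Python sketch of naming microdescriptors in a request URL, plus the arithmetic behind the 70KB figure. The hostname is a placeholder. The relay count of 1600 is a hypothetical value chosen only for illustration.

```python
import base64
import hashlib
from typing import Iterable


def b64_sha256(data: bytes) -> str:
    """Base64 SHA256 digest with the trailing '=' omitted, as in the consensus."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii").rstrip("=")


def micro_url(hostname: str, digests: Iterable[str]) -> str:
    """URL for fetching the named microdescriptors, separated by '-'."""
    return "http://%s/tor/micro/d/%s.z" % (hostname, "-".join(digests))


# A 32-byte digest is 43 base64 characters once the single '=' is dropped,
# so naming N microdescriptors costs 44 * N - 1 bytes of path.
relays = 1600  # hypothetical network size, for illustration only
path_bytes = relays * 44 - 1
print(path_bytes)  # 70399
```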
The format of a microdescriptor is the header line
"microdescriptor-header"
followed by each element (keyword and body), alphabetically. There's
no need to mention what hash it's for, since it's self-identifying:
you can hash the elements to learn this.
(Do we need a footer line to show that it's over, or is the next
microdescriptor line or EOF enough of a hint? A footer line wouldn't
hurt much. Also, no fair voting for the microdescriptor-element
"microdescriptor-header".)
Microdescriptors have no header or footer.
The hash of the microdescriptor is simply the hash of the concatenated
elements -- not counting the header line or hypothetical footer line.
Unless you prefer that?
Is there a reasonable way to version these things? We could say that
the microdescriptor-header line can contain arguments which clients
must ignore if they don't understand them. Any better ways?
elements.
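A microdescriptor has no header or footer, so a mirror or client can check one by hashing the served text directly. Here's a minimal Python sketch. It assumes the served text is exactly the concatenated elements, and that digests use the consensus encoding (base64 SHA256, trailing =s dropped). The function names are hypothetical.

```python
import base64
import hashlib


def micro_digest(micro: str) -> str:
    # The digest covers the concatenated elements only; there is no header or footer.
    return base64.b64encode(hashlib.sha256(micro.encode("ascii")).digest()).decode("ascii").rstrip("=")


def matches_expected(micro: str, expected_b64: str) -> bool:
    """True if the microdescriptor hashes to the digest listed in the consensus."""
    return micro_digest(micro) == expected_b64
```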
Directory mirrors should check to make sure that the microdescriptors
they're about to serve match the right hashes (either the hashes from
@@ -168,10 +148,14 @@ Status: Open
When a client gets a new consensus, it looks to see if there are any
microdescriptors it needs to learn. If it needs to learn more than
some threshold of the microdescriptors (half?), it requests 'all',
else it requests only the missing ones.
else it requests only the missing ones. Clients MAY try to
determine whether the upload bandwidth for listing the
microdescriptors they want is more or less than the download
bandwidth for the microdescriptors they do not want.
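Here's a minimal Python sketch of the client's fetch decision, using half as the threshold the text suggests. The return labels and the function name are hypothetical. The sketch does not cover how the 'all' request is actually issued.

```python
from typing import List, Set, Tuple


def plan_fetch(consensus_digests: List[str], cached: Set[str],
               threshold: float = 0.5) -> Tuple[str, List[str]]:
    """Decide whether to request 'all' microdescriptors or only the missing ones."""
    missing = [d for d in consensus_digests if d not in cached]
    if not missing:
        return ("none", [])
    if len(missing) > threshold * len(consensus_digests):
        return ("all", [])
    return ("missing", missing)
```

A client that implements the MAY could replace the count ratio with a byte estimate. It would weigh about 44 bytes of URL per missing digest against the size of the already-cached microdescriptors that an 'all' fetch would download again.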
Clients maintain a cache of microdescriptors along with metadata like
when it was last referenced by a consensus. They keep a microdescriptor
when it was last referenced by a consensus, and which identity key
it corresponds to. They keep a microdescriptor
until it hasn't been mentioned in any consensus for a week. Future
clients might cache them for longer or shorter times.
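Here's a Python sketch of the cache bookkeeping. Each entry records the identity key it corresponds to and when a consensus last listed it. Entries that no consensus has listed for a week are dropped. The class and function names are hypothetical. The one-week window is the one the text gives.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

WEEK = 7 * 24 * 60 * 60  # seconds


@dataclass
class CachedMicrodesc:
    body: str
    identity: str       # identity key this microdescriptor corresponds to
    last_listed: float  # when a consensus last referenced it (Unix time)


def note_consensus(cache: Dict[str, CachedMicrodesc], listed: Iterable[str], now: float) -> None:
    """Refresh last_listed for every cached digest the new consensus mentions."""
    for digest in listed:
        entry = cache.get(digest)
        if entry is not None:
            entry.last_listed = now


def expire(cache: Dict[str, CachedMicrodesc], now: float, max_age: float = WEEK) -> None:
    """Drop microdescriptors that no consensus has mentioned for max_age seconds."""
    for digest in [d for d, e in cache.items() if now - e.last_listed > max_age]:
        del cache[digest]
```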
@@ -192,14 +176,11 @@ Status: Open
Phase one, the directory authorities should start voting on
microdescriptors and microdescriptor elements, and putting them in the
consensus. This should happen during the 0.2.1.x series, and should
be relatively easy to do.
consensus.
Phase two, directory mirrors should learn how to serve them, and learn
how to read the consensus to find out what they should be serving. This
phase could be done either in 0.2.1.x or early in 0.2.2.x, depending
on how messy it turns out to be and how quickly we get around to it.
how to read the consensus to find out what they should be serving.
Phase three, clients should start fetching and caching them instead
of normal descriptors. This should happen post 0.2.1.x.
of normal descriptors.