diff --git a/doc/spec/proposals/104-short-descriptors.txt b/doc/spec/proposals/104-short-descriptors.txt
index 80d16f0c08..5706a43813 100644
--- a/doc/spec/proposals/104-short-descriptors.txt
+++ b/doc/spec/proposals/104-short-descriptors.txt
@@ -34,7 +34,9 @@ Proposal:
  Another possible solution would be to drop these fields from descriptors,
  and have them uploaded as a part of a separate "bandwidth report" to the
  authorities. This could help prevent the mistake of using long descriptors
- in the place of short ones.
+ in the place of short ones. It could also be generalized later to be an
+ overall status report, to include sanitized GeoIP information and whatever
+ else comes up.
 
 Other disposable fields:
 
@@ -49,11 +51,15 @@ Other disposable fields:
  accept
  (Apparently, exit polices are highly compressible.)
 
+ [Does size-on-disk matter to anybody? Some clients and servers don't
+ have much disk, or have really slow disk (e.g. USB). And we don't
+ store caches compressed right now. -RD]
+
 Issues:
 
  Indexing long descriptor or bandwidth reports presents an issue: right
  now the way to make sure you have the same copy of a descriptor as everyone
- else is to request the descriptor by its digest, and to make sure to that
+ else is to request the descriptor by its digest, and to make sure that
  the digest you request is the one that the authorities like.
 
  Authorities should presumably list the digests of short descriptors, since
@@ -62,19 +68,21 @@ Issues:
  with information nobody wants.
 
  Possible solutions are:
- - Drop the property that you can be sure of having the same long
- descriptor as others. This seems unoptimal.
- - Have a separate extra-information-status that also gets generated by the
+ 1) Drop the property that you can be sure of having the same long
+ descriptor as others. This seems unoptimal, but if nobody caches
+ long descriptors so you have to go to the authority to get them,
+ maybe it's not so bad.
+ 2) Have a separate extra-information-status that also gets generated by the
  authorities; use it to tell which long descriptors others have. Also a
  pain.
- - Have short descriptors include a hash of the corresponding long
+ 3) Have short descriptors include a hash of the corresponding long
  descriptor/extra-info. This would keep the same order of magnitude
  performance increase (~59.2% savings as opposed to 61% savings.) This
  would require longdesc/extra-info downloaders to fetch router data before
  they could know which longdescs/extra info to fetch.
- - Have each authority make a signed concatenated "extra info" document,
+ 4) Have each authority make a signed concatenated "extra info" document,
  and hope we never need to reconcile them.
- - ????
+ 5) ????
 
 Migration:
 
@@ -83,12 +91,20 @@ Migration:
  * Authorities should accept both, now, and silently drop short
  descriptors.
  * Routers should upload both once authorities accept them.
- * There should be a "long descriptor" url and the current "normal" URL.
+ * There should be a "long descriptor" url named
+ /tor/server/fp-detailed/ and the current "normal" URL.
  Authorities should serve long descriptors from both URLs.
+ There's no such thing as asking for a long descriptor by
+ its digest.
  * Once tools that want long descriptors support fetching them from the
  "long descriptor" URL:
  * Have authorities remember short descriptors, and serve them from the
  'normal' URL.
+ These tools include:
+ lefkada's exit.py script.
+ tor26's noreply script and general directory cache.
+ https://nighteffect.us/tns/ for its graphs
+ and check with or-talk for the rest, once it's time.
 
  For bandwidth info approach:
  * First:
@@ -99,3 +115,30 @@ Migration:
  * Once tools that want bandwidth info support fetching it:
  * Have routers stop including bandwidth info in their router
  descriptors.
+
+Discussion:
+
+ Solution 4 seems like a nice plan: in many cases, the external services
+ that use read-history and write-history are directory authorities
+ themselves, so they just use their local opinion.
+
+ Roger thinks we should go with the long/short descriptor plan, along
+ with solution 4. We don't want to just upload a bandwidth message,
+ because that involves new data structures for every new piece of
+ information we decide to upload. I suspect we'll realize once this
+ is deployed that there is other info we want to put in the long
+ descriptors.
+
+ This won't solve the future sanitized GeoIP uploading question, but
+ who knows where we'll actually want to send that data, and whether
+ we'll want to handle it with the same privacy constraints as this data,
+ so let's not try to solve that yet.
+
+ However, we may still need some basic reconciling algorithms between
+ authorities -- otherwise, if a router uploads to four authorities
+ and fails to reach the fifth, then that fifth will never have the new
+ descriptor. This will mean that the best strategy for external tools
+ is to fetch full concatenated-style long-descriptor lists from every
+ single authority, and merge them locally. So each authority should
+ periodically fetch the list from the others and take the new ones.
+