2008-06-28 21:25:39 +02:00
Filename: 143-distributed-storage-improvements.txt
Title: Improvements of Distributed Storage for Tor Hidden Service Descriptors
Author: Karsten Loesing
Created: 28-Jun-2008
Status: Open
Target: 0.2.1.x

Change history:

28-Jun-2008  Initial proposal for or-dev

Overview:

An evaluation of the distributed storage for Tor hidden service
descriptors and subsequent discussions have brought up a few improvements
to proposal 114. All improvements are backward compatible with the
implementation of proposal 114.

Design:

1. Report Bad Directory Nodes

Bad hidden service directory nodes could deny the existence of previously
stored descriptors. A bad directory node that does this with all stored
descriptors harms the distributed storage in general, but replication
will cope with this problem in most cases. However, an adversary that
attempts to make a specific hidden service unavailable by running relays
that become responsible for all of that service's descriptors poses a
more serious threat. The distributed storage needs to defend against this
attack by detecting and removing bad directory nodes.

As a countermeasure, hidden services try to download their descriptors
every hour at random times from the hidden service directories that are
responsible for storing them. If a directory node replies with 404 (Not
found), the hidden service reports the supposedly bad directory node to
a random selection of half of the directory authorities (with version
numbers equal to or higher than the first version that implements this
proposal). The hidden service posts a complaint message using HTTP 'POST'
to a URL "/tor/rendezvous/complain" with the following message format:

  "hidden-service-directory-complaint" identifier NL

    [At start, exactly once]

    The identifier of the hidden service directory node to be
    investigated.

  "rendezvous-service-descriptor" descriptor NL

    [At end, exactly once]

    The hidden service descriptor that the supposedly bad directory node
    does not serve.

The directory authority checks whether the descriptor is valid and
whether the hidden service directory is responsible for storing it. It
waits for a random time of up to 30 minutes before posting the descriptor
to the hidden service directory. If the publication is acknowledged, the
directory authority waits another random time of up to 30 minutes before
attempting to request the descriptor that it has posted. If the directory
node replies with 404 (Not found), it will be blacklisted from being a
hidden service directory node for the next 48 hours.

A blacklisted hidden service directory is assigned the new flag BadHSDir
instead of the HSDir flag in the vote that a directory authority creates.
In a consensus, a relay is only assigned the HSDir flag if a majority of
votes contain the HSDir flag and no more than one third of votes contain
the BadHSDir flag. As a result, clients do not have to learn about the
BadHSDir flag. A blacklisted directory node will simply not be assigned
the HSDir flag in the consensus.

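The consensus rule above can be sketched as follows. This is only an
illustration, not the authorities' voting code: the function name and the
representation of votes as sets of flag names are hypothetical, and
"majority" is assumed to mean strictly more than half of the votes.

```python
def consensus_assigns_hsdir(votes):
    """Decide whether a relay gets the HSDir flag in the consensus.

    votes: one set of flag names per directory authority vote
    (hypothetical representation chosen for this sketch).
    """
    total = len(votes)
    if total == 0:
        return False
    hsdir_votes = sum(1 for flags in votes if "HSDir" in flags)
    bad_votes = sum(1 for flags in votes if "BadHSDir" in flags)
    # Assumption: "majority" means strictly more than half of all votes.
    has_majority = 2 * hsdir_votes > total
    # "No more than one third" of the votes may carry BadHSDir.
    few_complaints = 3 * bad_votes <= total
    return has_majority and few_complaints
```

With nine votes, six HSDir votes and three BadHSDir votes still yield the
HSDir flag under these assumptions, while five HSDir votes and four
BadHSDir votes do not.
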
In order to prevent an attacker from setting up new nodes as replacements
for blacklisted directory nodes, all directory nodes in the same /24
subnet are blacklisted, too. Furthermore, if two or more directory nodes
are blacklisted in the same /16 subnet concurrently, all other directory
nodes in that /16 subnet are blacklisted, too. Blacklisting holds for at
most 48 hours.

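A minimal sketch of the subnet rule above, for IPv4 addresses only. The
function name and arguments are hypothetical. It also assumes that only
nodes blacklisted directly (after failing the test) count toward the
"two or more in the same /16" threshold, which the text above leaves
open. Expiry after 48 hours is not modelled.

```python
import ipaddress
from collections import Counter

def expand_blacklist(directly_blacklisted, directory_nodes):
    """Return the addresses of all directory nodes to blacklist.

    directly_blacklisted: IPv4 address strings of nodes that failed the test.
    directory_nodes: IPv4 address strings of all known directory nodes.
    """
    direct = set(directly_blacklisted)
    # Every /24 containing a directly blacklisted node is blacklisted.
    bad24 = {ipaddress.ip_network(f"{a}/24", strict=False) for a in direct}
    # A /16 is blacklisted once it contains two or more such nodes.
    per16 = Counter(ipaddress.ip_network(f"{a}/16", strict=False) for a in direct)
    bad16 = {net for net, count in per16.items() if count >= 2}
    result = set(direct)
    for addr in directory_nodes:
        ip = ipaddress.ip_address(addr)
        if any(ip in net for net in bad24 | bad16):
            result.add(addr)
    return result
```

For example, a single blacklisted node at 10.1.2.3 also takes out
10.1.2.9, but 10.1.3.9 is only affected once a second node in 10.1.0.0/16
has been blacklisted.
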
2. Publish Fewer Replicas

The evaluation has shown that the probability that a directory node
serves a previously stored descriptor is 85.7% (more precisely, this is
the 0.001-quantile of the empirical distribution, with the rationale that
it holds for 99.9% of all empirical cases). If descriptors are replicated
to x directory nodes, the probability that at least one of the replicas
is available to clients is 1 - (1 - 85.7%) ^ x. In order to achieve an
overall availability of 99.9%, x = 3.55 replicas need to be stored. It
follows that 4 replicas are sufficient, rather than the currently stored
6 replicas.

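The arithmetic above can be checked with a few lines of Python. The
function names are chosen for this sketch only. Solving
1 - (1 - p) ^ x >= t for x gives x >= log(1 - t) / log(1 - p).

```python
import math

def availability(p, x):
    """Probability that at least one of x replicas is available,
    given per-node availability p."""
    return 1 - (1 - p) ** x

def replicas_needed(p, target):
    """Smallest real-valued x with availability(p, x) >= target."""
    return math.log(1 - target) / math.log(1 - p)

x = replicas_needed(0.857, 0.999)   # roughly 3.55, as stated above
replicas = math.ceil(x)             # rounds up to 4 whole replicas
```

With p = 85.7%, three replicas stay just below the 99.9% target, while
four replicas exceed it.
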
Further, the current design stores 2 sets of descriptors on 3 directory
nodes with consecutive identities. Originally, this was meant to
facilitate replication between directory nodes, which has not been and
will not be implemented (the selection criterion of 24 hours uptime makes
it unnecessary). As a result, storing descriptors on directory nodes with
consecutive identities is not required. In fact, it should be avoided, so
that an attacker cannot easily create "black holes" in the identifier
ring.

Hidden services should store their descriptors on 4 non-consecutive
directory nodes, and clients should request descriptors from these
directory nodes only. For compatibility reasons, hidden services also
store their descriptors on 2 consecutive directory nodes. Hence, 0.2.0.x
clients will be able to retrieve 4 out of 6 descriptors, but will fail
for the remaining 2 descriptors, which is sufficient for reliability. As
soon as 0.2.0.x is deprecated, hidden services can stop publishing the
additional 2 replicas.

3. Change Default Value of Being Hidden Service Directory

The requirements for becoming a hidden service directory node are an open
directory port and an uptime of at least 24 hours. The evaluation has
shown that there are 300 hidden service directory candidates on average,
but only 6 of them are configured to act as hidden service directories.
This is bad, because those 6 nodes need to serve a large share of all
hidden service descriptors. Optimally, there should be hundreds of hidden
service directories. Having a large number of 0.2.1.x directory nodes
also has a positive effect on 0.2.0.x hidden services and clients.

Therefore, the new default of HidServDirectoryV2 should be 1, so that a
Tor relay that has an open directory port automatically accepts and
serves v2 hidden service descriptors. A relay operator can still opt out
of running a hidden service directory by changing HidServDirectoryV2 to
0. The additional bandwidth requirements for running a hidden service
directory node in addition to being a directory cache are negligible.

4. Make Descriptors Persistent on Directory Nodes

Hidden service directories that are restarted by their operators or after
a failure will not be selected as hidden service directories within the
next 24 hours. However, some clients might still think that these nodes
are responsible for certain descriptors, because they work on the basis
of network consensuses that are up to three hours old. The directory
nodes should be able to serve the previously received descriptors to
these clients. Therefore, directory nodes make all received descriptors
persistent and load previously received descriptors on startup.

5. Store and Serve Descriptors Regardless of Responsibility

Currently, directory nodes only accept descriptors for which they think
they are responsible. This may lead to problems when a directory node
uses an older or newer network consensus than the hidden service or
client, or when a directory node has been restarted recently. In fact,
there are no security issues in storing or serving descriptors for which
a directory node thinks it is not responsible. On the contrary, doing so
may improve reliability in edge cases. As a result, a directory node does
not pay attention to responsibility when receiving a publication or fetch
request, but stores or serves the requested descriptor. Likewise, the
directory node does not remove descriptors when it thinks it is no longer
responsible for them.

6. Avoid Periodic Descriptor Re-Publication

In the current implementation, a hidden service re-publishes its
descriptor either when its content changes or when an hour elapses.
However, the evaluation has shown that failures of hidden service
directory nodes, i.e. of nodes that have not failed within the last 24
hours, are very rare. Combined with making descriptors persistent on
directory nodes, this removes the need to re-publish descriptors hourly.

The only two events leading to descriptor re-publication should be a
change of the descriptor content and a new directory node becoming
responsible for the descriptor. Hidden services should therefore consider
re-publication every time they learn about a new network consensus
instead of hourly.

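The decision described above could be sketched like this, evaluated each
time a new consensus arrives. The function and parameter names are
hypothetical, and how the set of responsible directory nodes is derived
from the consensus is outside the scope of the sketch.

```python
def should_republish(content_changed, published_to, responsible_now):
    """Return True if the hidden service should re-publish its descriptor.

    content_changed: whether the descriptor content differs from the
        last published version.
    published_to: identifiers of directory nodes that already hold the
        current descriptor.
    responsible_now: identifiers of directory nodes responsible under
        the newly learned consensus.
    """
    # A directory node is "new" if it is responsible now but has not
    # received the current descriptor yet.
    new_nodes = set(responsible_now) - set(published_to)
    return bool(content_changed) or bool(new_nodes)
```

Note that no timer appears in the sketch: if neither the content nor the
set of responsible nodes has changed, nothing is re-published.
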
7. Discard Expired Descriptors

The current implementation lets directory nodes keep a descriptor for two
days before discarding it. However, with the v2 design, descriptors are
only valid for at most one day. Directory nodes should determine the
validity of stored descriptors and discard them one hour after they have
expired (to compensate for wrong clocks on clients).

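As an illustration of the cleanup rule above, the following sketch keeps
a descriptor until one hour after its expiry time. The mapping from
descriptor ID to expiry timestamp is a hypothetical simplification; how a
directory node derives the expiry time from a stored descriptor is not
shown.

```python
GRACE_SECONDS = 60 * 60  # keep descriptors one hour past expiry

def discard_expired(expiry_by_id, now):
    """Return only the entries that should still be kept at time `now`.

    expiry_by_id: dict mapping descriptor ID to its expiry time in
        seconds since the epoch (hypothetical representation).
    now: current time in seconds since the epoch.
    """
    return {
        desc_id: expires
        for desc_id, expires in expiry_by_id.items()
        # Assumption: a descriptor is dropped once the full grace
        # period has elapsed.
        if now < expires + GRACE_SECONDS
    }
```
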
8. Shorten Client-Side Descriptor Fetch History

When clients try to download a hidden service descriptor, they memorize
fetch requests to directory nodes for up to 15 minutes. This allows them
to request all replicas of a descriptor to avoid bad or failing directory
nodes, but without querying the same directory node twice.

The downside is that a client that has requested a descriptor without
success will not be able to find a hidden service that is started during
the 15 minutes following the client's last request.

This can be improved by shortening the fetch history to only 5 minutes.
This time should be sufficient to complete requests for all replicas of a
descriptor, but without ending in an infinite request loop.

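A client-side fetch history with a 5-minute lifetime might look like the
sketch below. The class and method names are hypothetical and do not
correspond to the existing client code. Time is passed in explicitly to
keep the example deterministic.

```python
class FetchHistory:
    """Remembers which directory node was asked for which descriptor."""

    def __init__(self, ttl_seconds=5 * 60):
        self.ttl_seconds = ttl_seconds
        self._last_request = {}  # (descriptor ID, directory ID) -> time

    def record(self, desc_id, hsdir_id, now):
        self._last_request[(desc_id, hsdir_id)] = now

    def recently_requested(self, desc_id, hsdir_id, now):
        last = self._last_request.get((desc_id, hsdir_id))
        return last is not None and now - last < self.ttl_seconds

    def candidates(self, desc_id, hsdir_ids, now):
        """Directory nodes not yet asked within the history lifetime."""
        return [h for h in hsdir_ids
                if not self.recently_requested(desc_id, h, now)]
```

Once every directory node for a descriptor has been asked, `candidates`
returns an empty list until the first entry ages out, which is what
prevents a tight request loop.
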
Compatibility:

All proposed improvements are compatible with the currently implemented
design as described in proposal 114.