mirror of
https://gitlab.torproject.org/tpo/core/tor.git
synced 2024-11-30 23:53:32 +01:00
55c3619c23
svn:r15905
442 lines
23 KiB
Plaintext
442 lines
23 KiB
Plaintext
Filename: 114-distributed-storage.txt
|
|
Title: Distributed Storage for Tor Hidden Service Descriptors
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: Karsten Loesing
|
|
Created: 13-May-2007
|
|
Status: Closed
|
|
Implemented-In: 0.2.0.x
|
|
|
|
Change history:
|
|
|
|
13-May-2007 Initial proposal
|
|
14-May-2007 Added changes suggested by Lasse Øverlier
|
|
30-May-2007 Changed descriptor format, key length discussion, typos
|
|
09-Jul-2007 Incorporated suggestions by Roger, added status of specification
|
|
and implementation for upcoming GSoC mid-term evaluation
|
|
11-Aug-2007 Updated implementation statuses, included non-consecutive
|
|
replication to descriptor format
|
|
20-Aug-2007 Renamed config option HSDir as HidServDirectoryV2
|
|
02-Dec-2007 Closed proposal
|
|
|
|
Overview:
|
|
|
|
The basic idea of this proposal is to distribute the tasks of storing and
|
|
serving hidden service descriptors from currently three authoritative
|
|
directory nodes among a large subset of all onion routers. The three
|
|
reasons to do this are better robustness (availability), better
|
|
scalability, and improved security properties. Further,
|
|
this proposal suggests changes to the hidden service descriptor format to
|
|
prevent new security threats coming from decentralization and to gain even
|
|
better security properties.
|
|
|
|
Status:
|
|
|
|
As of December 2007, the new hidden service descriptor format is implemented
|
|
and usable. However, servers and clients do not yet make use of descriptor
|
|
cookies, because there are open usability issues of this feature that might
|
|
be resolved in proposal 121. Further, hidden service directories do not
|
|
perform replication by themselves, because (unauthorized) replica fetch
|
|
requests would allow any attacker to fetch all hidden service descriptors in
|
|
the system. As neither issue is critical to the functioning of v2
|
|
descriptors and their distribution, this proposal is considered as Closed.
|
|
|
|
Motivation:
|
|
|
|
The current design of hidden services exhibits the following performance and
|
|
security problems:
|
|
|
|
First, the three hidden service authoritative directories constitute a
|
|
performance bottleneck in the system. The directory nodes are responsible for
|
|
storing and serving all hidden service descriptors. As of May 2007 there are
|
|
about 1000 descriptors at a time, but this number is assumed to increase in
|
|
the future. Further, there is no replication protocol for descriptors between
|
|
the three directory nodes, so that hidden services must ensure the
|
|
availability of their descriptors by manually publishing them on all
|
|
directory nodes. Whenever a fourth or fifth hidden service authoritative
|
|
directory is added, hidden services will need to maintain an equally
|
|
increasing number of replicas. These scalability issues have an impact on the
|
|
current usage of hidden services and put an even higher burden on the
|
|
development of new kinds of applications for hidden services that might
|
|
require storing even more descriptors.
|
|
|
|
Second, besides posing a limitation to scalability, storing all hidden
|
|
service descriptors on three directory nodes also constitutes a security
|
|
risk. The directory node operators could easily analyze the publish and fetch
|
|
requests to derive information on service activity and usage and read the
|
|
descriptor contents to determine which onion routers work as introduction
|
|
points for a given hidden service and need to be attacked or threatened to
|
|
shut it down. Furthermore, the contents of a hidden service descriptor offer
|
|
only minimal security properties to the hidden service. Whoever gets aware of
|
|
the service ID can easily find out whether the service is active at the
|
|
moment and which introduction points it has. This applies to (former)
|
|
clients, (former) introduction points, and of course to the directory nodes.
|
|
It requires only to request the descriptor for the given service ID, which
|
|
can be performed by anyone anonymously.
|
|
|
|
This proposal suggests two major changes to approach the described
|
|
performance and security problems:
|
|
|
|
The first change affects the storage location for hidden service descriptors.
|
|
Descriptors are distributed among a large subset of all onion routers instead
|
|
of three fixed directory nodes. Each storing node is responsible for a subset
|
|
of descriptors for a limited time only. It is not able to choose which
|
|
descriptors it stores at a certain time, because this is determined by its
|
|
onion ID which is hard to change frequently and in time (only routers which
|
|
are stable for a given time are accepted as storing nodes). In order to
|
|
resist single node failures and untrustworthy nodes, descriptors are
|
|
replicated among a certain number of storing nodes. A first replication
|
|
protocol makes sure that descriptors don't get lost when the node population
|
|
changes; therefore, a storing node periodically requests the descriptors from
|
|
its siblings. A second replication protocol distributes descriptors among
|
|
non-consecutive nodes of the ID ring to prevent a group of adversaries from
|
|
generating new onion keys until they have consecutive IDs to create a 'black
|
|
hole' in the ring and make random services unavailable. Connections to
|
|
storing nodes are established by extending existing circuits by one hop to
|
|
the storing node. This also ensures that contents are encrypted. The effect
|
|
of this first change is that the probability that a single node operator
|
|
learns about a certain hidden service is very small and that it is very hard
|
|
to track a service over time, even when it collaborates with other node
|
|
operators.
|
|
|
|
The second change concerns the content of hidden service descriptors.
|
|
Obviously, security problems cannot be solved only by decentralizing storage;
|
|
in fact, they could also get worse if done without caution. At first, a
|
|
descriptor ID needs to change periodically in order to be stored on changing
|
|
nodes over time. Next, the descriptor ID needs to be computable only for the
|
|
service's clients, but should be unpredictable for all other nodes. Further,
|
|
the storing node needs to be able to verify that the hidden service is the
|
|
true originator of the descriptor with the given ID even though it is not a
|
|
client. Finally, a storing node should learn as little information as
|
|
necessary by storing a descriptor, because it might not be as trustworthy as
|
|
a directory node; for example it does not need to know the list of
|
|
introduction points. Therefore, a second key is applied that is only known to
|
|
the hidden service provider and its clients and that is not included in the
|
|
descriptor. It is used to calculate descriptor IDs and to encrypt the
|
|
introduction points. This second key can either be given to all clients
|
|
together with the hidden service ID, or to a group or a single client as
|
|
an authentication token. In the future this second key could be the result of
|
|
some key agreement protocol between the hidden service and one or more
|
|
clients. A new text-based format is proposed for descriptors instead of an
|
|
extension of the existing binary format for reasons of future extensibility.
|
|
|
|
Design:
|
|
|
|
The proposed design is described by the required changes to the current
|
|
design. These requirements are grouped by content, rather than by affected
|
|
specification documents or code files, and numbered for reference below.
|
|
|
|
Hidden service clients, servers, and directories:
|
|
|
|
/1/ Create routing list
|
|
|
|
All participants can filter the consensus status document received from the
|
|
directory authorities to one routing list containing only those servers
|
|
that store and serve hidden service descriptors and which are running for
|
|
at least 24 hours. A participant only trusts its own routing list and never
|
|
learns about routing information from other parties.
|
|
|
|
/2/ Determine responsible hidden service directory
|
|
|
|
All participants can determine the hidden service directory that is
|
|
responsible for storing and serving a given ID, as well as the hidden
|
|
service directories that replicate its content. Every hidden service
|
|
directory is responsible for the descriptor IDs in the interval from
|
|
its predecessor, exclusive, to its own ID, inclusive. Further, a hidden
|
|
service directory holds replicas for its n predecessors, where n denotes
|
|
the number of consecutive replicas. (requires /1/)
|
|
|
|
[/3/ and /4/ were requirements to use BEGIN_DIR cells for directory
|
|
requests which have not been fulfilled in the course of the implementation
|
|
of this proposal, but elsewhere.]
|
|
|
|
Hidden service directory nodes:
|
|
|
|
/5/ Advertise hidden service directory functionality
|
|
|
|
Every onion router that has its directory port open can decide whether it
|
|
wants to store and serve hidden service descriptors by setting a new config
|
|
option "HidServDirectoryV2" 0|1 to 1. An onion router with this config
|
|
option being set includes the flag "hidden-service-dir" in its router
|
|
descriptors that it sends to directory authorities.
|
|
|
|
/6/ Accept v2 publish requests, parse and store v2 descriptors
|
|
|
|
Hidden service directory nodes accept publish requests for hidden service
|
|
descriptors and store them to their local memory. (It is not necessary to
|
|
make descriptors persistent, because after disconnecting, the onion router
|
|
would not be accepted as storing node anyway, because it has not been
|
|
running for at least 24 hours.) All requests and replies are formatted as
|
|
HTTP messages. Requests are directed to the router's directory port and are
|
|
contained within BEGIN_DIR cells. A hidden service directory node stores a
|
|
descriptor only when it thinks that it is responsible for storing that
|
|
descriptor based on its own routing table. Every hidden service directory
|
|
node is responsible for the descriptor IDs in the interval of its n-th
|
|
predecessor in the ID circle up to its own ID (n denotes the number of
|
|
consecutive replicas). (requires /1/)
|
|
|
|
/7/ Accept v2 fetch requests
|
|
|
|
Same as /6/, but with fetch requests for hidden service descriptors.
|
|
(requires /2/)
|
|
|
|
/8/ Replicate descriptors with neighbors
|
|
|
|
A hidden service directory node replicates descriptors from its two
|
|
predecessors by downloading them once an hour. Further, it checks its
|
|
routing table periodically for changes. Whenever it realizes that a
|
|
predecessor has left the network, it establishes a connection to the new
|
|
n-th predecessor and requests its stored descriptors in the interval of its
|
|
(n+1)-th predecessor and the requested n-th predecessor. Whenever it
|
|
realizes that a new onion router has joined with an ID higher than its
|
|
former n-th predecessor, it adds it to its predecessors and discards all
|
|
descriptors in the interval of its (n+1)-th and its n-th predecessor.
|
|
(requires /1/)
|
|
|
|
[Dec 02: This function has not been implemented, because arbitrary nodes
|
|
what have been able to download the entire set of v2 descriptors. An
|
|
authorized replication request would be necessary. For the moment, the
|
|
system runs without any directory-side replication. -KL]
|
|
|
|
Authoritative directory nodes:
|
|
|
|
/9/ Confirm a router's hidden service directory functionality
|
|
|
|
Directory nodes include a new flag "HSDir" for routers that decided to
|
|
provide storage for hidden service descriptors and that are running for at
|
|
least 24 hours. The last requirement prevents a node from frequently
|
|
changing its onion key to become responsible for an identifier it wants to
|
|
target.
|
|
|
|
Hidden service provider:
|
|
|
|
/10/ Configure v2 hidden service
|
|
|
|
Each hidden service provider that has set the config option
|
|
"PublishV2HidServDescriptors" 0|1 to 1 is configured to publish v2
|
|
descriptors and conform to the v2 connection establishment protocol. When
|
|
configuring a hidden service, a hidden service provider checks if it has
|
|
already created a random secret_cookie and a hostname2 file; if not, it
|
|
creates both of them. (requires /2/)
|
|
|
|
/11/ Establish introduction points with fresh key
|
|
|
|
If configured to publish only v2 descriptors and no v0/v1 descriptors any
|
|
more, a hidden service provider that is setting up the hidden service at
|
|
introduction points does not pass its own public key, but the public key
|
|
of a freshly generated key pair. It also includes these fresh public keys
|
|
in the hidden service descriptor together with the other introduction point
|
|
information. The reason is that the introduction point does not need to and
|
|
therefore should not know for which hidden service it works, so as to
|
|
prevent it from tracking the hidden service's activity. (If a hidden
|
|
service provider supports both, v0/v1 and v2 descriptors, v0/v1 clients
|
|
rely on the fact that all introduction points accept the same public key,
|
|
so that this new feature cannot be used.)
|
|
|
|
/12/ Encode v2 descriptors and send v2 publish requests
|
|
|
|
If configured to publish v2 descriptors, a hidden service provider
|
|
publishes a new descriptor whenever its content changes or a new
|
|
publication period starts for this descriptor. If the current publication
|
|
period would only last for less than 60 minutes (= 2 x 30 minutes to allow
|
|
the server to be 30 minutes behind and the client 30 minutes ahead), the
|
|
hidden service provider publishes both a current descriptor and one for
|
|
the next period. Publication is performed by sending the descriptor to all
|
|
hidden service directories that are responsible for keeping replicas for
|
|
the descriptor ID. This includes two non-consecutive replicas that are
|
|
stored at 3 consecutive nodes each. (requires /1/ and /2/)
|
|
|
|
Hidden service client:
|
|
|
|
/13/ Send v2 fetch requests
|
|
|
|
A hidden service client that has set the config option
|
|
"FetchV2HidServDescriptors" 0|1 to 1 handles SOCKS requests for v2 onion
|
|
addresses by requesting a v2 descriptor from a randomly chosen hidden
|
|
service directory that is responsible for keeping replica for the
|
|
descriptor ID. In total there are six replicas of which the first and the
|
|
last three are stored on consecutive nodes. The probability of picking one
|
|
of the three consecutive replicas is 1/6, 2/6, and 3/6 to incorporate the
|
|
fact that the availability will be the highest on the node with next higher
|
|
ID. A hidden service client relies on the hidden service provider to store
|
|
two sets of descriptors to compensate clock skew between service and
|
|
client. (requires /1/ and /2/)
|
|
|
|
/14/ Process v2 fetch reply and parse v2 descriptors
|
|
|
|
A hidden service client that has sent a request for a v2 descriptor can
|
|
parse it and store it to the local cache of rendezvous service descriptors.
|
|
|
|
/15/ Establish connection to v2 hidden service
|
|
|
|
A hidden service client can establish a connection to a hidden service
|
|
using a v2 descriptor. This includes using the secret cookie for decrypting
|
|
the introduction points contained in the descriptor. When contacting an
|
|
introduction point, the client does not use the public key of the hidden
|
|
service provider, but the freshly-generated public key that is included in
|
|
the hidden service descriptor. Whether or not a fresh key is used instead
|
|
of the key of the hidden service depends on the available protocol versions
|
|
that are included in the descriptor; by this, connection establishment is
|
|
to a certain extend decoupled from fetching the descriptor.
|
|
|
|
Hidden service descriptor:
|
|
|
|
(Requirements concerning the descriptor format are contained in /6/ and /7/.)
|
|
|
|
The new v2 hidden service descriptor format looks like this:
|
|
|
|
onion-address = h(public-key) + cookie
|
|
descriptor-id = h(h(public-key) + h(time-period + cookie + relica))
|
|
descriptor-content = {
|
|
descriptor-id,
|
|
version,
|
|
public-key,
|
|
h(time-period + cookie + replica),
|
|
timestamp,
|
|
protocol-versions,
|
|
{ introduction-points } encrypted with cookie
|
|
} signed with private-key
|
|
|
|
The "descriptor-id" needs to change periodically in order for the
|
|
descriptor to be stored on changing nodes over time. It may only be
|
|
computable by a hidden service provider and all of his clients to prevent
|
|
unauthorized nodes from tracking the service activity by periodically
|
|
checking whether there is a descriptor for this service. Finally, the
|
|
hidden service directory needs to be able to verify that the hidden service
|
|
provider is the true originator of the descriptor with the given ID.
|
|
|
|
Therefore, "descriptor-id" is derived from the "public-key" of the hidden
|
|
service provider, the current "time-period" which changes every 24 hours,
|
|
a secret "cookie" shared between hidden service provider and clients, and
|
|
a "replica" denoting the number of this non-consecutive replica. (The
|
|
"time-period" is constructed in a way that time periods do not change at
|
|
the same moment for all descriptors by deriving a value between 0:00 and
|
|
23:59 hours from h(public-key) and making the descriptors of this hidden
|
|
service provider expire at that time of the day.) The "descriptor-id" is
|
|
defined to be 160 bits long. [extending the "descriptor-id" length
|
|
suggested by LØ]
|
|
|
|
Only the hidden service provider and the clients are able to generate
|
|
future "descriptor-ID"s. Hence, the "onion-address" is extended from now
|
|
the hash value of "public-key" by the secret "cookie". The "public-key" is
|
|
determined to be 80 bits long, whereas the "cookie" is dimensioned to be
|
|
120 bits long. This makes a total of 200 bits or 40 base32 chars, which is
|
|
quite a lot to handle for a human, but necessary to provide sufficient
|
|
protection against an adversary from generating a key pair with same
|
|
"public-key" hash or guessing the "cookie".
|
|
|
|
A hidden service directory can verify that a descriptor was created by the
|
|
hidden service provider by checking if the "descriptor-id" corresponds to
|
|
the "public-key" and if the signature can be verified with the
|
|
"public-key".
|
|
|
|
The "introduction-points" that are included in the descriptor are encrypted
|
|
using the same "cookie" that is shared between hidden service provider and
|
|
clients. [correction to use another key than h(time-period + cookie) as
|
|
encryption key for introduction points made by LØ]
|
|
|
|
A new text-based format is proposed for descriptors instead of an extension
|
|
of the existing binary format for reasons of future extensibility.
|
|
|
|
Security implications:
|
|
|
|
The security implications of the proposed changes are grouped by the roles of
|
|
nodes that could perform attacks or on which attacks could be performed.
|
|
|
|
Attacks by authoritative directory nodes
|
|
|
|
Authoritative directory nodes are no longer the single places in the
|
|
network that know about a hidden service's activity and introduction
|
|
points. Thus, they cannot perform attacks using this information, e.g.
|
|
track a hidden service's activity or usage pattern or attack its
|
|
introduction points. Formerly, it would only require a single corrupted
|
|
authoritative directory operator to perform such an attack.
|
|
|
|
Attacks by hidden service directory nodes
|
|
|
|
A hidden service directory node could misuse a stored descriptor to track a
|
|
hidden service's activity and usage pattern by clients. Though there is no
|
|
countermeasure against this kind of attack, it is very expensive to track a
|
|
certain hidden service over time. An attacker would need to run a large
|
|
number of stable onion routers that work as hidden service directory nodes
|
|
to have a good probability to become responsible for its changing
|
|
descriptor IDs. For each period, the probability is:
|
|
|
|
1-(N-c choose r)/(N choose r) for N-c>=r and 1 otherwise, with N
|
|
as total
|
|
number of hidden service directories, c as compromised nodes, and r as
|
|
number of replicas
|
|
|
|
The hidden service directory nodes could try to make a certain hidden
|
|
service unavailable to its clients. Therefore, they could discard all
|
|
stored descriptors for that hidden service and reply to clients that there
|
|
is no descriptor for the given ID or return an old or false descriptor
|
|
content. The client would detect a false descriptor, because it could not
|
|
contain a correct signature. But an old content or an empty reply could
|
|
confuse the client. Therefore, the countermeasure is to replicate
|
|
descriptors among a small number of hidden service directories, e.g. 5.
|
|
The probability of a group of collaborating nodes to make a hidden service
|
|
completely unavailable is in each period:
|
|
|
|
(c choose r)/(N choose r) for c>=r and N>=r, and 0 otherwise,
|
|
with N as total
|
|
number of hidden service directories, c as compromised nodes, and r as
|
|
number of replicas
|
|
|
|
A hidden service directory could try to find out which introduction points
|
|
are working on behalf of a hidden service. In contrast to the previous
|
|
design, this is not possible anymore, because this information is encrypted
|
|
to the clients of a hidden service.
|
|
|
|
Attacks on hidden service directory nodes
|
|
|
|
An anonymous attacker could try to swamp a hidden service directory with
|
|
false descriptors for a given descriptor ID. This is prevented by requiring
|
|
that descriptors are signed.
|
|
|
|
Anonymous attackers could swamp a hidden service directory with correct
|
|
descriptors for non-existing hidden services. There is no countermeasure
|
|
against this attack. However, the creation of valid descriptors is more
|
|
expensive than verification and storage in local memory. This should make
|
|
this kind of attack unattractive.
|
|
|
|
Attacks by introduction points
|
|
|
|
Current or former introduction points could try to gain information on the
|
|
hidden service they serve. But due to the fresh key pair that is used by
|
|
the hidden service, this attack is not possible anymore.
|
|
|
|
Attacks by clients
|
|
|
|
Current or former clients could track a hidden service's activity, attack
|
|
its introduction points, or determine the responsible hidden service
|
|
directory nodes and attack them. There is nothing that could prevent them
|
|
from doing so, because honest clients need the full descriptor content to
|
|
establish a connection to the hidden service. At the moment, the only
|
|
countermeasure against dishonest clients is to change the secret cookie and
|
|
pass it only to the honest clients.
|
|
|
|
Compatibility:
|
|
|
|
The proposed design is meant to replace the current design for hidden service
|
|
descriptors and their storage in the long run.
|
|
|
|
There should be a first transition phase in which both, the current design
|
|
and the proposed design are served in parallel. Onion routers should start
|
|
serving as hidden service directories, and hidden service providers and
|
|
clients should make use of the new design if both sides support it. Hidden
|
|
service providers should be allowed to publish descriptors of the current
|
|
format in parallel, and authoritative directories should continue storing and
|
|
serving these descriptors.
|
|
|
|
After the first transition phase, hidden service providers should stop
|
|
publishing descriptors on authoritative directories, and hidden service
|
|
clients should not try to fetch descriptors from the authoritative
|
|
directories. However, the authoritative directories should continue serving
|
|
hidden service descriptors for a second transition phase. As of this point,
|
|
all v2 config options should be set to a default value of 1.
|
|
|
|
After the second transition phase, the authoritative directories should stop
|
|
serving hidden service descriptors.
|
|
|