that's your plan, ray? get her?

more work on the discovery section.


svn:r8923
Roger Dingledine 2006-11-09 08:53:13 +00:00
parent 10f58f25fc
commit df183bb75e


@ -88,6 +88,10 @@ leveraged for a new blocking-resistant design; Section~\ref{sec:related}
explains the features and drawbacks of the currently deployed solutions;
and ...
% The other motivation is for places where we're concerned they will
% try to enumerate a list of Tor users. So even if they're not blocking
% the Tor network, it may be smart to not be visible as connecting to it.
%And adding more different classes of users and goals to the Tor network
%improves the anonymity for all Tor users~\cite{econymics,usability:weis2006}.
@ -497,18 +501,19 @@ to get more relay addresses, and to distribute them to users differently.
\subsection{Bridge relays}
Today, Tor servers operate on fewer than a thousand distinct IP addresses;
an adversary
could enumerate and block them all with little trouble. To provide a
means of ingress to the network, we need a larger set of entry points, most
of which an adversary won't be able to enumerate easily. Fortunately, we
have such a set: the Tor users.
Hundreds of thousands of people around the world use Tor. We can leverage
our already self-selected user base to produce a list of thousands of
often-changing IP addresses. Specifically, we can give them a little
button in the GUI that says ``Tor for Freedom'', and users who click
the button will turn into \emph{bridge relays} (or just \emph{bridges}
for short). They can rate limit relayed connections to 10 KB/s (almost
nothing for a broadband user in a free country, but plenty for a user
who otherwise has no access at all), and since they are just relaying
bytes back and forth between blocked users and the main Tor network, they
@ -537,13 +542,13 @@ bridge directory authorities.
The main difference between bridge authorities and the directory
authorities for the main Tor network is that the main authorities provide
a list of every known relay, but the bridge authorities only give
out a server descriptor if you already know its identity key. That is,
you can keep up-to-date on a bridge's location and other information
once you know about it, but you can't just grab a list of all the bridges.
The identity key, IP address, and directory port for each bridge
authority ship by default with the Tor software, so the bridge relays
can be confident they're publishing to the right location, and the
blocked users can establish an encrypted authenticated channel. See
Section~\ref{subsec:trust-chain} for more discussion of the public key
@ -551,8 +556,8 @@ infrastructure and trust chain.
Bridges use Tor to publish their descriptors privately and securely,
so even an attacker monitoring the bridge directory authority's network
can't make a list of all the addresses contacting the authority.
Bridges may publish to only a subset of the
authorities, to limit the potential impact of an authority compromise.
@ -666,7 +671,7 @@ Note that, unlike many settings, the reputation problem should not be
hard here. If a bridge says it is blocked, then it might as well be.
If an adversary can say that the bridge is blocked with respect to
$\mathcal{censor}_i$, then it might as well be, since
$\mathcal{censor}_i$ can presumably then block that bridge if it so
chooses.
11. How much damage can the adversary do by running nodes in the Tor
@ -718,10 +723,9 @@ be most useful, because clients behind standard firewalls will have
the best chance to reach them. Is this the best choice in all cases,
or should we encourage some fraction of them to pick random ports, or other
ports commonly permitted through firewalls like 53 (DNS) or 110
(POP)? Or perhaps we should use other ports where TLS traffic is
expected, like 993 (IMAPS) or 995 (POP3S). We need more research on our
potential users, and their current and anticipated firewall restrictions.
Furthermore, we need to look at the specifics of Tor's TLS handshake.
Right now Tor uses some predictable strings in its TLS handshakes. For
@ -762,11 +766,14 @@ variety of protocols, and we'll want to automatically handle web browsing
differently from, say, instant messaging.
% Tor cells are 512 bytes each. So TLS records will be roughly
% multiples of this size? How bad is this? -RD
% Look at ``Inferring the Source of Encrypted HTTP Connections''
% by Marc Liberatore and Brian Neil Levine (CCS 2006)
% They substantially flesh out the numbers for the web fingerprinting
% attack. -PS
% Yes, but I meant detecting the signature of Tor traffic itself, not
% learning what websites we're going to. I wouldn't be surprised to
% learn that these are related problems, but it's not obvious to me. -RD
\subsection{Identity keys as part of addressing information}
@ -811,29 +818,29 @@ unfortunate fact is that we have no magic bullet for discovery. We're
in the same arms race as all the other designs we described in
Section~\ref{sec:related}.
In this section we describe three approaches to adding discovery
components for our design. Note that we should deploy all the schemes
at once---bridges and blocked users can then use the discovery approach
that is most appropriate for their situation.
\subsection{Independent bridges, no central discovery}
The first design is simply to have no centralized discovery component at
all. Volunteers run bridges, and we assume they have some blocked users
in mind and communicate their address information to them out-of-band
(for example, through Gmail). This design allows for small personal
bridges that have only one or a handful of users in mind, but it can
also support an entire community of users. For example, Citizen Lab's
upcoming Psiphon single-hop proxy tool~\cite{psiphon} plans to use this
\emph{social network} approach as its discovery component.
There are several ways to do bootstrapping in this design. In the simple
case, the operator of the bridge informs each chosen user about his
bridge's address information and/or keys. A different approach involves
blocked users introducing new blocked users to the bridges they know.
That is, somebody in the blocked area can pass along a bridge's address to
somebody else they trust. This scheme brings in appealing but complex game
theoretic properties: the blocked user making the decision has an incentive
only to delegate to trustworthy people, since an adversary who learns
the bridge's address and filters it makes it unavailable for both of them.
@ -860,30 +867,143 @@ recommended bridges from any of the working bridges. Now the client can
learn new additions to the bridge pool, and can expire abandoned bridges
or bridges that the adversary has blocked, without the user ever needing
to care. To simplify maintenance of the community's bridge pool, each
community could run its own bridge directory authority---reachable via
the available bridges, and also mirrored at each bridge.
\subsection{Public bridges with central discovery}
What about people who want to volunteer as bridges but don't know any
suitable blocked users? What about people who are blocked but don't
know anybody on the outside? Here we describe a way to make use of these
\emph{public bridges} in a way that still makes it hard for the attacker
to learn all of them.
The basic idea is to divide public bridges into a set of buckets based on
identity key, where each bucket has a different policy for distributing
its bridge addresses to users. Each of these \emph{distribution policies}
is designed to exercise a different scarce resource or property of
the user.
How do we divide bridges into buckets such that they're evenly distributed
and the allocation is hard to influence or predict, but also in a way
that's amenable to creating more buckets later on without reshuffling
all the bridges? We compute the bucket for a given bridge by hashing the
bridge's identity key along with a secret that only the bridge authority
knows: the first $n$ bits of this hash dictate the bucket number,
where $n$ is a parameter that describes how many buckets we want at this
point. We choose $n=3$ to start, so we have 8 buckets available; but as
we later invent new distribution policies, we can increment $n$ to split
the 8 into 16 buckets. Since a bridge can't predict the next bit in its
hash, it can't anticipate which identity key will correspond to a certain
bucket when the buckets are split. Further, since the bridge authority
doesn't provide any feedback to the bridge about which bucket it's in,
an adversary signing up bridges to fill a certain bucket will be slowed.
% This algorithm is not ideal. When we split buckets, each existing
% bucket is cut in half, where half the bridges remain with the
% old distribution policy, and half will be under what the new one
% is. So the new distribution policy inherits a bunch of blocked
% bridges if the old policy was too loose, or a bunch of unblocked
% bridges if its policy was still secure. -RD
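Here is a minimal Python sketch of the bucket computation. It assumes HMAC-SHA256 as the keyed hash, since the text does not name one, and the helper name, secret, and key values are hypothetical. It shows why raising $n$ from 3 to 4 splits each bucket in two instead of reshuffling: the first $n$ bits of the digest are a prefix of the first $n+1$ bits.

```python
import hashlib
import hmac


def bucket_for(identity_key: bytes, secret: bytes, n: int) -> int:
    """Return the bucket number (0 .. 2**n - 1) for a bridge.

    Keyed hash of the identity key with the authority's secret; the first
    n bits of the digest (read as a big-endian integer) are the bucket.
    HMAC-SHA256 is an assumed choice of keyed hash.
    """
    digest = hmac.new(secret, identity_key, hashlib.sha256).digest()
    digest_bits = len(digest) * 8  # 256 for SHA-256
    if not 0 <= n <= digest_bits:
        raise ValueError("n must be between 0 and the digest length in bits")
    return int.from_bytes(digest, "big") >> (digest_bits - n)


# Hypothetical example values, not real Tor keys or secrets.
SECRET = b"bridge-authority-secret"
key = b"example-bridge-identity-key"

old_bucket = bucket_for(key, SECRET, 3)  # one of 8 buckets
new_bucket = bucket_for(key, SECRET, 4)  # one of 16 buckets after a split
# A split only halves buckets: dropping the new last bit gives the old bucket.
assert new_bucket >> 1 == old_bucket
```

Without the secret, a bridge operator cannot compute this function, so generating identity keys until one lands in a chosen bucket is no better than guessing.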
The first distribution policy (used for the first bucket) publishes bridge
addresses in a time-release fashion. The bridge authority divides the
available bridges into partitions which are deterministically available
only in certain time windows. That is, over the course of a given time
slot (say, an hour), each requestor is given a random bridge from within
that partition. When the next time slot arrives, a new set of bridges
is available for discovery. Thus a bridge is always available when a new
user arrives, but to learn about all bridges the attacker needs to fetch
the new addresses at every new time slot. By varying the length of the
time slots, we can make it harder for the attacker to guess when to check
back. We expect these bridges will be the first to be blocked, but they'll
help the system bootstrap until they \emph{do} get blocked. Further,
remember that we're dealing with different blocking regimes around the
world that will progress at different rates---so this bucket will still
be useful to some users even as the arms race progresses.
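The time-release policy can be sketched in Python under a few assumptions. Partitions are formed round-robin and opened in round-robin order by hourly slot, which is one possible reading of ``deterministically available only in certain time windows''. Varying slot lengths are left out. The helper names and bridge addresses are hypothetical.

```python
from __future__ import annotations

import random


def partition_bridges(bridges: list[str], num_partitions: int) -> list[list[str]]:
    """Deterministically split a bucket's bridges into disjoint partitions."""
    return [bridges[i::num_partitions] for i in range(num_partitions)]


def time_release_answer(partitions: list[list[str]], now: float,
                        slot_seconds: int = 3600,
                        rng: random.Random | None = None) -> str:
    """Give a requester a random bridge from the partition open at `now`.

    The open partition depends only on the time slot (round-robin here, an
    assumption), so everyone asking during one slot learns only that slot's
    bridges; enumerating the bucket means coming back every slot.
    """
    rng = rng or random.Random()
    slot = int(now // slot_seconds)
    open_partition = partitions[slot % len(partitions)]
    if not open_partition:
        raise LookupError("no bridges in the partition for this time slot")
    return rng.choice(open_partition)


# Hypothetical bucket of 12 bridges split into 4 partitions.
bucket = [f"192.0.2.{i}:443" for i in range(1, 13)]
parts = partition_bridges(bucket, 4)
print(time_release_answer(parts, now=0))     # drawn from parts[0]
print(time_release_answer(parts, now=3600))  # next slot: drawn from parts[1]
```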
The second distribution policy publishes bridge addresses based on the IP
address of the requesting user. Specifically, the bridge authority will
divide the available bridges in the bucket into a bunch of partitions
(as in the first distribution scheme), hash the requestor's IP address
with a secret of its own (as in the above allocation scheme for creating
buckets), and give the requestor a random bridge from the appropriate
partition. To raise the bar, we should discard the last octet of the
IP address before inputting it to the hash function, so an attacker
who controls only a single ``/24'' network counts as only one user. A large
attacker like China will still be able to control many addresses, but
the hassle of needing to establish connections from each network (or
spoof TCP connections) may still slow them down. (We could also imagine
a policy that combines the time-based and location-based policies to
further constrain and rate-limit the available bridge addresses.)
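Here is a minimal Python sketch of the location-based policy. It handles IPv4 only and assumes HMAC-SHA256 as the keyed hash. The helper names and secret are hypothetical. Zeroing the last octet before hashing means that every address in one /24 maps to the same partition.

```python
from __future__ import annotations

import hashlib
import hmac
import ipaddress
import random


def ip_partition(requester_ip: str, secret: bytes, num_partitions: int) -> int:
    """Map an IPv4 requester to a partition using only its /24 network.

    Discarding the last octet means every address in one /24 lands in the
    same partition, so that network counts as a single user.
    """
    network = ipaddress.IPv4Network(f"{requester_ip}/24", strict=False)
    digest = hmac.new(secret, network.network_address.packed,
                      hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % num_partitions


def ip_based_answer(partitions: list[list[str]], requester_ip: str,
                    secret: bytes, rng: random.Random | None = None) -> str:
    """Return a random bridge from the requester's partition."""
    rng = rng or random.Random()
    index = ip_partition(requester_ip, secret, len(partitions))
    return rng.choice(partitions[index])
```

A requester who switches to another host in the same /24 keeps drawing from the same small partition. To reach other partitions, an attacker has to find addresses in other networks.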
The third policy is based on Circumventor's discovery strategy. Realizing
that its adoption will remain limited without some central coordination
mechanism, the Circumventor project has started a mailing list to
distribute new proxy addresses every few days. From experimentation it
seems they have concluded that sending updates every three or four days
is sufficient to stay ahead of the current attackers. We could give out
bridge addresses from the third bucket in a similar fashion.
The fourth policy provides an alternative approach to a mailing list:
users provide an email address, and receive an automated response
listing an available bridge address. We could limit each email address
to one response. To further rate limit queries, we could require a CAPTCHA
solution~\cite{captcha} in each case too. In fact, we wouldn't need to
implement the CAPTCHA on our side: if we only deliver bridge addresses
to Yahoo or Gmail addresses, we can leverage the rate-limiting schemes
that other parties already impose for account creation.
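The email policy can be sketched as a small Python responder. Everything in it is hypothetical: the class name, the allowed-domain list, and the round-robin choice of bridge. It also assumes that Gmail ignores dots and ``+tag'' suffixes in the local part, and so it folds those aliases together. Without that step, one account could claim many responses.

```python
from __future__ import annotations


class EmailBridgeResponder:
    """Answer each email address at most once with one bridge address.

    Hypothetical sketch: only providers assumed to rate-limit account
    creation are served, so their signup limits double as our rate limit.
    """

    ALLOWED_DOMAINS = {"gmail.com", "yahoo.com"}

    def __init__(self, bridges: list[str]) -> None:
        if not bridges:
            raise ValueError("need at least one bridge to hand out")
        self._bridges = list(bridges)
        self._next = 0
        self._answered: set[str] = set()

    @staticmethod
    def _canonical(address: str) -> str | None:
        """Lower-case the address and fold common Gmail aliases together."""
        local, sep, domain = address.strip().lower().rpartition("@")
        if not sep or not local:
            return None
        if domain == "gmail.com":
            # Gmail ignores dots and "+tag" suffixes in the local part.
            local = local.split("+", 1)[0].replace(".", "")
            if not local:
                return None
        return f"{local}@{domain}"

    def handle(self, sender: str) -> str | None:
        """Return a bridge address for a new sender, or None if refused."""
        canonical = self._canonical(sender)
        if canonical is None:
            return None
        if canonical.rpartition("@")[2] not in self.ALLOWED_DOMAINS:
            return None  # provider not trusted to rate-limit signups
        if canonical in self._answered:
            return None  # one response per email address
        self._answered.add(canonical)
        bridge = self._bridges[self._next % len(self._bridges)]
        self._next += 1
        return bridge
```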
The fifth policy ties in
...
reputation system
Pick some seeds---trusted people in the blocked area---and give
them each a few hundred bridge addresses. Run a website next to the
bridge authority, where they can log in (they only need persistent
pseudonyms). Give them tokens slowly over time. They can use these
tokens to delegate trust to other people they know. The tokens can
be exchanged for new accounts on the website.
Accounts in ``good standing'' accrue new bridge addresses and new
tokens.
This is great, except how do we decide that an account is in good
standing? One answer is to measure based on whether the bridge addresses
we give it end up blocked. But how do we decide if they get blocked?
Other questions below too.
\ref{sec:accounts}
\subsection{Public bridges, allocated in different ways}
Buckets six through eight are held in reserve, in case our currently
deployed tricks all fail at once---so we can adapt and move to
new approaches quickly, and have some bridges available for the new
schemes. (Bridges that sign up and don't get used yet may be unhappy that
they're not being used; but this is a transient problem: if bridges are
on by default, nobody will mind not being used yet.)
\subsubsection{Bootstrapping: finding your first bridge.}
\label{subsec:first-bridge}
How do users find their first public bridge, so they can reach the
bridge authority to learn more?
Most government firewalls are not perfect. That is, they allow connections to
Google cache or some open proxy servers, or they let file-sharing traffic or
Skype or World-of-Warcraft connections through. We assume that the
users have some mechanism for bypassing the firewall initially.
For users who can't use any of these techniques, hopefully they know
a friend who can---for example, perhaps the friend already knows some
bridge relay addresses.
(If they can't get around it at all, then we can't help them---they
should go meet more people.)
Is it useful to load balance which bridges are handed out? The above
bucket concept makes some bridges wildly popular and others less so.
But I guess that's the point.
Families of bridges: give out 4 or 8 at once, bound together.
\subsection{Advantages of deploying all solutions at once}
For once we're not in the position of the defender: we don't have to
defend against every possible filtering scheme; we just need at least
one of our discovery schemes to keep working.
public proxies. given out like circumventors. or all sorts of other rate
limiting ways.
\subsection{Remaining unsorted notes}
@ -898,67 +1018,32 @@ number of new bridge relays an external attacker can discover.
Going to be an arms race. Need a bag of tricks. Hard to say
which ones will work. Don't spend them all at once.
Some techniques are sufficient to get us an IP address and a port,
and others can get us IP:port:key. Lay out some plausible options
for how users can bootstrap into learning their first bridge.
Round one:
- the bridge authority server will hand some out.
- get one from your friend.
- send us mail with a unique account, and get an automated answer.
-
Round two:
- social network thing
attack: adversary can reconstruct your social network by learning who
knows which bridges.
\subsection{Centrally-distributed personal proxies}
Circumventor, realizing that its adoption will remain limited if would-be
users can't connect with volunteers, has started a mailing list to
distribute new proxy addresses every few days. From experimentation
it seems they have concluded that sending updates every 3 or 4 days is
sufficient to stay ahead of the current attackers.
If there are many volunteer proxies and many interested users, a central
watering hole to connect them is a natural solution. On the other hand,
at first glance it appears that we've inherited the \emph{bad} parts of
each of the above designs: not only do we have to attract many volunteer
proxies, but the users also need to get to a single site that is sure
to be blocked.
There are two reasons why we're in better shape. First, the users don't
actually need to reach the watering hole directly: it can respond to
email, for example. Second,
In fact, the JAP
project~\cite{web-mix,koepsell:wpes2004} suggested an alternative approach
to a mailing list: new users email a central address and get an automated
response listing a proxy for them.
While the exact details of the
proposal are still to be worked out, the idea of giving out
%\section{The account / reputation system}
\section{Social networks with directory-side support}
\label{sec:accounts}
Perhaps each bridge should be known by a single bridge directory
authority. This makes it easier to trace which users have learned about
it, so easier to blame or reward. It also makes things more brittle,
since loss of that authority means its bridges aren't advertised until
they switch, and means its bridge users are sad too.
(Need a slick hash algorithm that will map our identity key to a
bridge authority, in a way that's sticky even when we add bridge
directory authorities, but isn't sticky when our authority goes
away. Does this exist?)
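One known technique with the properties asked for above is rendezvous hashing, also called highest-random-weight hashing. The sketch below swaps it in as a possibility. It is not the design this document adopts. It assumes SHA-256 as the scoring hash, and the authority names and helper name are hypothetical. Each authority gets a pseudorandom score for a given bridge, and the bridge goes to the authority with the highest score.

```python
from __future__ import annotations

import hashlib


def authority_for(identity_key: bytes, authorities: list[str]) -> str:
    """Pick a bridge authority by rendezvous (highest-random-weight) hashing.

    Adding an authority moves only the bridges it now wins, and removing
    one moves only the bridges it had.
    """
    if not authorities:
        raise ValueError("no bridge authorities configured")

    def score(authority: str) -> int:
        # NUL separator: assumes authority names never contain "\x00".
        h = hashlib.sha256(authority.encode() + b"\x00" + identity_key)
        return int.from_bytes(h.digest(), "big")

    return max(authorities, key=score)
```

The mapping needs only the list of authorities, so a bridge or client could compute it locally.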
\subsection{Discovery based on social networks}
A token that can be exchanged at the bridge authority (assuming you
@ -978,55 +1063,6 @@ way that a given transaction was successful or unsuccessful.
(Lesson from designing reputation systems~\cite{rep-anon}: easy to
reward good behavior, hard to punish bad behavior.)
\subsection{How to allocate bridge addresses to users}
Hold a fraction in reserve, in case our currently deployed tricks
all fail at once---so we can move to new approaches quickly.
(Bridges that sign up and don't get used yet will be sad; but this
is a transient problem---if bridges are on by default, nobody will
mind not being used.)
Divide bridges into buckets based on their identity key.
[Design question: need an algorithm to deterministically map a bridge's
identity key into a category that isn't too gameable. Take a keyed
hash of the identity key plus a secret the bridge authority keeps?
An adversary signing up bridges won't easily be able to learn what
category he's been put in, so it's slow to attack.]
One portion of the bridges is the public bucket. If you ask the
bridge account server for a public bridge, it will give you a random
one of these. We expect they'll be the first to be blocked, but they'll
help the system bootstrap until they \emph{do} get blocked, and remember that
we're dealing with different blocking regimes around the world that will
progress at different rates.
The generalization of the public bucket is a bucket based on the bridge
user's IP address: you can learn a random entry only from the subbucket
your IP address (actually, your /24) maps to.
Another portion of the bridges can be sectioned off to be given out on
a time-release basis. The bucket is partitioned into pieces which are
deterministically available only in certain time windows.
And of course another portion is made available for the social network
design above.
Captchas.
\subsection{How do we know if a bridge relay has been blocked?}
We need some mechanism for testing reachability from inside the
@ -1079,12 +1115,6 @@ progress reports.
The above geoip-based approach to detecting blocked bridges gives us a
solution though.
\section{Security considerations}
\label{sec:security}
@ -1397,10 +1427,6 @@ that is, if they want to block our new design, they will need to
add a feature to block exactly this.
strategically speaking, this may come in handy.
Hash identity key + secret that bridge authority knows. Start
out dividing into $2^n$ buckets, where $n$ starts at 0, and we choose
which bucket you're in based on the first $n$ bits of the hash.
Bridges come in clumps of 4 or 8 or whatever. If you know one bridge
in a clump, the authority will tell you the rest. Now bridges can
ask users to test reachability of their buddies.