mirror of
https://gitlab.torproject.org/tpo/core/tor.git
synced 2024-09-20 21:16:22 +02:00
that's your plan, ray? get her?
more work on the discovery section. svn:r8923
parent 10f58f25fc
commit df183bb75e
@ -88,6 +88,10 @@ leveraged for a new blocking-resistant design; Section~\ref{sec:related}
explains the features and drawbacks of the currently deployed solutions;
and ...

% The other motivation is for places where we're concerned they will
% try to enumerate a list of Tor users. So even if they're not blocking
% the Tor network, it may be smart to not be visible as connecting to it.

%And adding more different classes of users and goals to the Tor network
%improves the anonymity for all Tor users~\cite{econymics,usability:weis2006}.

@ -497,18 +501,19 @@ to get more relay addresses, and to distribute them to users differently.

\subsection{Bridge relays}

Today, Tor servers operate on fewer than a thousand distinct IP addresses;
an adversary could enumerate and block them all with little trouble. To provide a
means of ingress to the network, we need a larger set of entry points, most
of which an adversary won't be able to enumerate easily. Fortunately, we
have such a set: the Tor users.

Hundreds of thousands of people around the world use Tor. We can leverage
our already self-selected user base to produce a list of thousands of
often-changing IP addresses. Specifically, we can give them a little
button in the GUI that says ``Tor for Freedom'', and users who click
the button will turn into \emph{bridge relays} (or just \emph{bridges}
for short). They can rate limit relayed connections to 10 KB/s (almost
nothing for a broadband user in a free country, but plenty for a user
who otherwise has no access at all), and since they are just relaying
bytes back and forth between blocked users and the main Tor network, they
@ -537,13 +542,13 @@ bridge directory authorities.

The main difference between bridge authorities and the directory
authorities for the main Tor network is that the main authorities provide
a list of every known relay, but the bridge authorities only give
out a server descriptor if you already know its identity key. That is,
you can keep up-to-date on a bridge's location and other information
once you know about it, but you can't just grab a list of all the bridges.
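The lookup-only interface can be illustrated with a minimal Python sketch. It is not Tor's implementation. The names \texttt{BridgeAuthority}, \texttt{Descriptor}, and \texttt{fingerprint} are hypothetical. Naming a bridge by the hex SHA-1 digest of its identity key is an assumption modeled on how Tor fingerprints ordinary relays. The point is simply that the authority answers ``give me the descriptor for this key'' but exposes no call that enumerates its table.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical, heavily simplified bridge server descriptor."""
    identity_key: bytes
    address: str
    or_port: int


def fingerprint(identity_key: bytes) -> str:
    # Assumption: a bridge is named by the hex SHA-1 digest of its identity
    # key, by analogy with how Tor fingerprints ordinary relays.
    return hashlib.sha1(identity_key).hexdigest().upper()


class BridgeAuthority:
    """Answers lookups by fingerprint; deliberately has no 'list all' method."""

    def __init__(self) -> None:
        self._descriptors: Dict[str, Descriptor] = {}

    def publish(self, desc: Descriptor) -> None:
        # A newer descriptor for the same identity replaces the older one,
        # so clients that know the key can follow a bridge that moves.
        self._descriptors[fingerprint(desc.identity_key)] = desc

    def lookup(self, fp: str) -> Optional[Descriptor]:
        # Unknown fingerprints get nothing back.
        return self._descriptors.get(fp.upper())
```

A client that already knows a bridge's fingerprint can refresh its address whenever the bridge moves. Someone without a fingerprint gets nothing back.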

The identity key, IP address, and directory port for each bridge
authority ship by default with the Tor software, so the bridge relays
can be confident they're publishing to the right location, and the
blocked users can establish an encrypted authenticated channel. See
Section~\ref{subsec:trust-chain} for more discussion of the public key
@ -551,8 +556,8 @@ infrastructure and trust chain.

Bridges use Tor to publish their descriptors privately and securely,
so even an attacker monitoring the bridge directory authority's network
can't make a list of all the addresses contacting the authority.
Bridges may publish to only a subset of the
authorities, to limit the potential impact of an authority compromise.

@ -666,7 +671,7 @@ Note that, unlike many settings, the reputation problem should not be
hard here. If a bridge says it is blocked, then it might as well be.
If an adversary can say that the bridge is blocked with respect to
$\mathcal{censor}_i$, then it might as well be, since
$\mathcal{censor}_i$ can presumably then block that bridge if it so
chooses.

11. How much damage can the adversary do by running nodes in the Tor
@ -718,10 +723,9 @@ be most useful, because clients behind standard firewalls will have
the best chance to reach them. Is this the best choice in all cases,
or should we encourage some fraction of them to pick random ports, or other
ports commonly permitted through firewalls like 53 (DNS) or 110
(POP)? Or perhaps we should use other ports where TLS traffic is
expected, like 993 (IMAPS) or 995 (POP3S). We need more research on our
potential users, and their current and anticipated firewall restrictions.

Furthermore, we need to look at the specifics of Tor's TLS handshake.
Right now Tor uses some predictable strings in its TLS handshakes. For
@ -762,11 +766,14 @@ variety of protocols, and we'll want to automatically handle web browsing
differently from, say, instant messaging.

% Tor cells are 512 bytes each. So TLS records will be roughly
% multiples of this size? How bad is this? -RD
% Look at ``Inferring the Source of Encrypted HTTP Connections''
% by Marc Liberatore and Brian Neil Levine (CCS 2006)
% They substantially flesh out the numbers for the web fingerprinting
% attack. -PS
% Yes, but I meant detecting the signature of Tor traffic itself, not
% learning what websites we're going to. I wouldn't be surprised to
% learn that these are related problems, but it's not obvious to me. -RD

\subsection{Identity keys as part of addressing information}

@ -811,29 +818,29 @@ unfortunate fact is that we have no magic bullet for discovery. We're
in the same arms race as all the other designs we described in
Section~\ref{sec:related}.

In this section we describe three approaches to adding discovery
components for our design. Note that we should deploy all the schemes
at once---bridges and blocked users can then use the discovery approach
that is most appropriate for their situation.

\subsection{Independent bridges, no central discovery}

The first design is simply to have no centralized discovery component at
all. Volunteers run bridges, and we assume they have some blocked users
in mind and communicate their address information to them out-of-band
(for example, through Gmail). This design allows for small personal
bridges that have only one or a handful of users in mind, but it can
also support an entire community of users. For example, Citizen Lab's
upcoming Psiphon single-hop proxy tool~\cite{psiphon} plans to use this
\emph{social network} approach as its discovery component.

There are several ways to do bootstrapping in this design. In the simple
case, the operator of the bridge informs each chosen user about his
bridge's address information and/or keys. A different approach involves
blocked users introducing new blocked users to the bridges they know.
That is, somebody in the blocked area can pass along a bridge's address to
somebody else they trust. This scheme brings in appealing but complex
game-theoretic properties: the blocked user making the decision has an incentive
only to delegate to trustworthy people, since an adversary who learns
the bridge's address and filters it makes it unavailable for both of them.

@ -860,30 +867,143 @@ recommended bridges from any of the working bridges. Now the client can
learn new additions to the bridge pool, and can expire abandoned bridges
or bridges that the adversary has blocked, without the user ever needing
to care. To simplify maintenance of the community's bridge pool, each
community could run its own bridge directory authority---reachable via
the available bridges, and also mirrored at each bridge.

\subsection{Public bridges with central discovery}

What about people who want to volunteer as bridges but don't know any
suitable blocked users? What about people who are blocked but don't
know anybody on the outside? Here we describe a way to make use of these
\emph{public bridges} in a way that still makes it hard for the attacker
to learn all of them.

The basic idea is to divide public bridges into a set of buckets based on
identity key, where each bucket has a different policy for distributing
its bridge addresses to users. Each of these \emph{distribution policies}
is designed to exercise a different scarce resource or property of
the user.

How do we divide bridges into buckets such that they're evenly distributed
and the allocation is hard to influence or predict, but also in a way
that's amenable to creating more buckets later on without reshuffling
all the bridges? We compute the bucket for a given bridge by hashing the
bridge's identity key along with a secret that only the bridge authority
knows: the first $n$ bits of this hash dictate the bucket number,
where $n$ is a parameter that describes how many buckets we want at this
point. We choose $n=3$ to start, so we have 8 buckets available; but as
we later invent new distribution policies, we can increment $n$ to split
the 8 into 16 buckets. Since a bridge can't predict the next bit in its
hash, it can't anticipate which identity key will correspond to a certain
bucket when the buckets are split. Further, since the bridge authority
doesn't provide any feedback to the bridge about which bucket it's in,
an adversary signing up bridges to fill a certain bucket will be slowed.
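Here is a minimal Python sketch of the bucket computation, under stated assumptions. The text does not name a hash function, so HMAC-SHA256 keyed with the authority's secret stands in for ``hashing the identity key along with a secret''. The name \texttt{bucket\_for} is a hypothetical helper. Because the function reads the top $n$ bits, each old bucket splits cleanly into two new ones when $n$ grows by one.

```python
import hashlib
import hmac


def bucket_for(identity_key: bytes, secret: bytes, n_bits: int) -> int:
    """Bucket index = the first n_bits of HMAC-SHA256(secret, identity_key).

    The keyed hash stands in for "hash the identity key plus a secret";
    the choice of HMAC-SHA256 is an assumption, not part of the design.
    """
    digest = hmac.new(secret, identity_key, hashlib.sha256).digest()
    value = int.from_bytes(digest, "big")
    # Shift away everything except the top n_bits of the 256-bit digest.
    return value >> (len(digest) * 8 - n_bits)


secret = b"known only to the bridge authority"
key = b"\x01" * 32                # placeholder identity key
old = bucket_for(key, secret, 3)  # one of 8 buckets
new = bucket_for(key, secret, 4)  # one of 16 buckets
# Splitting keeps each bridge inside its old bucket: new is 2*old or 2*old + 1.
assert new >> 1 == old
```

The secret never leaves the authority, so a bridge operator cannot compute \texttt{bucket\_for} offline to grind for an identity key that lands in a chosen bucket.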

% This algorithm is not ideal. When we split buckets, each existing
% bucket is cut in half, where half the bridges remain with the
% old distribution policy, and half will be under the new one.
% So the new distribution policy inherits a bunch of blocked
% bridges if the old policy was too loose, or a bunch of unblocked
% bridges if its policy was still secure. -RD

The first distribution policy (used for the first bucket) publishes bridge
addresses in a time-release fashion. The bridge authority divides the
available bridges into partitions which are deterministically available
only in certain time windows. That is, over the course of a given time
slot (say, an hour), each requestor is given a random bridge from within
the partition available during that slot. When the next time slot arrives,
a new set of bridges is available for discovery. Thus a bridge is always
available when a new user arrives, but to learn about all bridges the
attacker needs to fetch the new addresses at every new time slot. By varying the length of the
time slots, we can make it harder for the attacker to guess when to check
back. We expect these bridges will be the first to be blocked, but they'll
help the system bootstrap until they \emph{do} get blocked. Further,
remember that we're dealing with different blocking regimes around the
world that will progress at different rates---so this bucket will still
be useful to some users even as the arms race progresses.
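A minimal Python sketch of the time-release policy follows. It assumes several details the text leaves open. Bridges are assigned to partitions by a keyed hash of their identity keys. The active partition simply rotates once per fixed one-hour slot. The function names are hypothetical. The text's suggestion of varying slot lengths is not modeled here.

```python
import hashlib
import hmac
import random


def partition_of(identity_key: bytes, secret: bytes, num_partitions: int) -> int:
    """Deterministic, secret-keyed assignment of a bridge to a partition."""
    digest = hmac.new(secret, identity_key, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % num_partitions


def bridges_for_slot(bridges, secret, num_partitions, now, slot_seconds=3600):
    """Bridges that may be handed out during the time slot containing `now`."""
    slot = int(now // slot_seconds)
    active = slot % num_partitions  # partitions take turns, one per slot
    return [b for b in bridges if partition_of(b, secret, num_partitions) == active]


def answer_request(bridges, secret, num_partitions, now, rng=random):
    """Give one requester a random bridge from the currently active partition."""
    candidates = bridges_for_slot(bridges, secret, num_partitions, now)
    return rng.choice(candidates) if candidates else None
```

In this sketch, an attacker who wants every bridge must come back in each of the \texttt{num\_partitions} slots of a cycle, which is the cost the policy is meant to impose.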

The second distribution policy publishes bridge addresses based on the IP
address of the requesting user. Specifically, the bridge authority will
divide the available bridges in the bucket into a bunch of partitions
(as in the first distribution scheme), hash the requestor's IP address
with a secret of its own (as in the above allocation scheme for creating
buckets), and give the requestor a random bridge from the appropriate
partition. To raise the bar, we should discard the last octet of the
IP address before inputting it to the hash function, so an attacker
who controls only a single ``/24'' network counts as just one user. A large
attacker like China will still be able to control many addresses, but
the hassle of needing to establish connections from each network (or
spoof TCP connections) may still slow them down. (We could also imagine
a policy that combines the time-based and location-based policies to
further constrain and rate-limit the available bridge addresses.)
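The location-based policy can be sketched the same way. This minimal Python version handles IPv4 only. It assumes HMAC-SHA256 for both keyed hashes, with separate hypothetical secrets for requesters and bridges. The names \texttt{ip\_partition} and \texttt{answer\_request} are illustrative. Masking the address to its /24 before hashing is what puts every host in that network into the same partition.

```python
import hashlib
import hmac
import ipaddress
import random


def _keyed_index(data: bytes, secret: bytes, num_partitions: int) -> int:
    digest = hmac.new(secret, data, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % num_partitions


def ip_partition(requester_ip: str, secret: bytes, num_partitions: int) -> int:
    """Partition for a requester, computed from its /24 rather than its full address."""
    # strict=False accepts a host address; the last octet is masked off.
    net = ipaddress.ip_network(f"{requester_ip}/24", strict=False)
    return _keyed_index(net.network_address.packed, secret, num_partitions)


def bridge_partition(identity_key: bytes, secret: bytes, num_partitions: int) -> int:
    return _keyed_index(identity_key, secret, num_partitions)


def answer_request(requester_ip, bridges, ip_secret, bridge_secret,
                   num_partitions, rng=random):
    """Random bridge from the partition that this requester's /24 maps to."""
    want = ip_partition(requester_ip, ip_secret, num_partitions)
    candidates = [b for b in bridges
                  if bridge_partition(b, bridge_secret, num_partitions) == want]
    return rng.choice(candidates) if candidates else None
```

For example, requests from 203.0.113.7 and 203.0.113.250 always draw from the same partition. An attacker therefore needs addresses in many distinct /24 networks to see many partitions.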

The third policy is based on Circumventor's discovery strategy. Realizing
that its adoption will remain limited without some central coordination
mechanism, the Circumventor project has started a mailing list to
distribute new proxy addresses every few days. From experimentation it
seems they have concluded that sending updates every three or four days
is sufficient to stay ahead of the current attackers. We could give out
bridge addresses from the third bucket in a similar fashion.

The fourth policy provides an alternative approach to a mailing list:
users provide an email address, and receive an automated response
listing an available bridge address. We could limit responses to one per
email address. To further rate limit queries, we could require a CAPTCHA
solution~\cite{captcha} in each case too. In fact, we wouldn't need to
implement the CAPTCHA on our side: if we only deliver bridge addresses
to Yahoo or Gmail addresses, we can leverage the rate-limiting schemes
that other parties already impose for account creation.
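The fourth policy needs only a little bookkeeping. In this minimal Python sketch, the class name and the two-domain allow-list are assumptions that mirror the Yahoo/Gmail idea above. A repeated request from the same address gets back the bridge it already received rather than a new one, which is one way to read ``one response per email address''. Real address normalization is not attempted; for instance, Gmail ignores dots in the local part.

```python
import random

# Assumption: providers whose account creation already imposes a CAPTCHA.
ALLOWED_DOMAINS = {"gmail.com", "yahoo.com"}


class EmailDistributor:
    """Hypothetical auto-responder: at most one bridge per email address."""

    def __init__(self, bridges, rng=None):
        self._bridges = list(bridges)
        self._answered = {}  # normalized address -> bridge already sent
        self._rng = rng if rng is not None else random.Random()

    @staticmethod
    def _split(address):
        local, _, domain = address.strip().lower().rpartition("@")
        return local, domain

    def handle_request(self, address):
        local, domain = self._split(address)
        if not local or domain not in ALLOWED_DOMAINS:
            return None  # ignore senders outside the rate-limited providers
        key = f"{local}@{domain}"
        if key not in self._answered:
            self._answered[key] = self._rng.choice(self._bridges)
        # A repeat request gets the same bridge again, never a new one.
        return self._answered[key]
```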

The fifth policy ties in
...
reputation system
Pick some seeds---trusted people in the blocked area---and give
them each a few hundred bridge addresses. Run a website next to the
bridge authority, where they can log in (they only need persistent
pseudonyms). Give them tokens slowly over time. They can use these
tokens to delegate trust to other people they know. The tokens can
be exchanged for new accounts on the website.

Accounts in ``good standing'' accrue new bridge addresses and new
tokens.
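The seed-and-token scheme can be made concrete with a minimal Python sketch. Everything here is hypothetical. That includes the class and method names and the choice to represent a token as a one-time invite code. The unresolved ``good standing'' test is passed in as a function, since the text leaves that question open. Each account records who invited it, because tracing delegation is what would let the authority blame or reward accounts later.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Account:
    name: str
    invited_by: Optional[str] = None
    tokens: int = 0
    bridges: list = field(default_factory=list)


class ReputationSite:
    """Hypothetical account website run next to the bridge authority."""

    def __init__(self):
        self.accounts = {}  # pseudonym -> Account
        self._invites = {}  # one-time invite code -> inviter's pseudonym

    def add_seed(self, name, bridges):
        self.accounts[name] = Account(name, bridges=list(bridges))

    def grant_tokens(self, in_good_standing=lambda acct: True, amount=1):
        # Run periodically, so tokens trickle in slowly over time.
        for acct in self.accounts.values():
            if in_good_standing(acct):
                acct.tokens += amount

    def delegate(self, inviter):
        acct = self.accounts[inviter]
        if acct.tokens < 1:
            raise ValueError("no tokens left to delegate")
        acct.tokens -= 1
        code = secrets.token_hex(16)
        self._invites[code] = inviter
        return code

    def redeem(self, code, new_name):
        if new_name in self.accounts:
            raise ValueError("pseudonym already taken")
        inviter = self._invites.pop(code)  # KeyError if unknown or already used
        self.accounts[new_name] = Account(new_name, invited_by=inviter)
        return inviter
```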

This is great, except how do we decide that an account is in good
standing? One answer is to measure based on whether the bridge addresses
we give it end up blocked. But how do we decide if they get blocked?
Other questions are discussed in Section~\ref{sec:accounts}.

\subsection{Public bridges, allocated in different ways}
Buckets six through eight are held in reserve, in case our currently
deployed tricks all fail at once---so we can adapt and move to
new approaches quickly, and have some bridges available for the new
schemes. (Bridges that sign up and don't get used yet may be unhappy that
they're not being used; but this is a transient problem: if bridges are
on by default, nobody will mind not being used yet.)

\subsubsection{Bootstrapping: finding your first bridge.}
\label{subsec:first-bridge}
How do users find their first public bridge, so they can reach the
bridge authority to learn more?
Most government firewalls are not perfect. That is, they allow connections to
Google cache or some open proxy servers, or they let file-sharing traffic or
Skype or World-of-Warcraft connections through. We assume that the
users have some mechanism for bypassing the firewall initially.
For users who can't use any of these techniques, hopefully they know
a friend who can---for example, perhaps the friend already knows some
bridge relay addresses.
(If they can't get around it at all, then we can't help them---they
should go meet more people.)

Is it useful to load balance which bridges are handed out? The above
bucket concept makes some bridges wildly popular and others less so.
But I guess that's the point.

Families of bridges: give out 4 or 8 at once, bound together.

\subsection{Advantages of deploying all solutions at once}

For once we're not in the position of the defender: we don't have to
defend against every possible filtering scheme, we just have to defend
against at least one.

Public proxies, given out the way Circumventor does, or in all sorts
of other rate-limiting ways.

\subsection{Remaining unsorted notes}
@ -898,67 +1018,32 @@ number of new bridge relays an external attacker can discover.
Going to be an arms race. Need a bag of tricks. Hard to say
which ones will work. Don't spend them all at once.

Some techniques are sufficient to get us an IP address and a port,
and others can get us IP:port:key. Lay out some plausible options
for how users can bootstrap into learning their first bridge.

Round one:

- the bridge authority server will hand some out.

- get one from your friend.

- send us mail with a unique account, and get an automated answer.

-

Round two:

- social network thing

attack: adversary can reconstruct your social network by learning who
knows which bridges.

\subsection{Centrally-distributed personal proxies}

Circumventor, realizing that its adoption will remain limited if would-be
users can't connect with volunteers, has started a mailing list to
distribute new proxy addresses every few days. From experimentation
it seems they have concluded that sending updates every 3 or 4 days is
sufficient to stay ahead of the current attackers.

If there are many volunteer proxies and many interested users, a central
watering hole to connect them is a natural solution. On the other hand,
at first glance it appears that we've inherited the \emph{bad} parts of
each of the above designs: not only do we have to attract many volunteer
proxies, but the users also need to get to a single site that is sure
to be blocked.

There are two reasons why we're in better shape. First, the users don't
actually need to reach the watering hole directly: it can respond to
email, for example. Second,

In fact, the JAP
project~\cite{web-mix,koepsell:wpes2004} suggested an alternative approach
to a mailing list: new users email a central address and get an automated
response listing a proxy for them.
While the exact details of the
proposal are still to be worked out, the idea of giving out

%\section{The account / reputation system}
\section{Social networks with directory-side support}
\label{sec:accounts}

Perhaps each bridge should be known by a single bridge directory
authority. This makes it easier to trace which users have learned about
it, and thus easier to blame or reward. It also makes things more brittle,
since loss of that authority means its bridges aren't advertised until
they switch, and its bridge users suffer too.
(Need a slick hash algorithm that will map our identity key to a
bridge authority, in a way that's sticky even when we add bridge
directory authorities, but isn't sticky when our authority goes
away. Does this exist?)
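The question above appears to be answered by rendezvous hashing, also called highest-random-weight hashing. This is a standard technique that the text itself does not name, and we substitute it here as an assumption. Each bridge scores every authority with a hash of the pair and picks the highest score. Adding an authority moves only the bridges that now score it highest. Removing one moves only the bridges that had chosen it. Those are the two stickiness properties asked for. Below is a minimal Python sketch with hypothetical names.

```python
import hashlib
from typing import List


def _score(identity_key: bytes, authority: str) -> int:
    # Pseudorandom weight for this (bridge, authority) pair.
    digest = hashlib.sha256(authority.encode() + b"|" + identity_key).digest()
    return int.from_bytes(digest, "big")


def authority_for(identity_key: bytes, authorities: List[str]) -> str:
    """Rendezvous (highest-random-weight) choice of a bridge authority."""
    return max(authorities, key=lambda a: _score(identity_key, a))
```

Bridges and clients can compute this locally from the list of authority names, so no extra coordination is needed when that list changes.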

\subsection{Discovery based on social networks}

A token that can be exchanged at the bridge authority (assuming you
@ -978,55 +1063,6 @@ way that a given transaction was successful or unsuccessful.
(Lesson from designing reputation systems~\cite{rep-anon}: easy to
reward good behavior, hard to punish bad behavior.)

\subsection{How do we know if a bridge relay has been blocked?}

We need some mechanism for testing reachability from inside the
@ -1079,12 +1115,6 @@ progress reports.
The above geoip-based approach to detecting blocked bridges gives us a
solution though.

\section{Security considerations}
\label{sec:security}

@ -1397,10 +1427,6 @@ that is, if they want to block our new design, they will need to
add a feature to block exactly this.
Strategically speaking, this may come in handy.

Bridges come in clumps of 4 or 8 or whatever. If you know one bridge
in a clump, the authority will tell you the rest. Now bridges can
ask users to test reachability of their buddies.
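One way to form such clumps is sketched below in Python, under assumptions the notes do not settle. Bridges are ordered by a keyed hash of their identity keys, and the ordering is cut into fixed-size groups. The names \texttt{families} and \texttt{buddies\_of} are hypothetical. A known weakness of this simple version is that adding or removing a bridge can shift clump boundaries, so a real design would need a more stable grouping.

```python
import hashlib
import hmac


def families(identity_keys, secret, size=4):
    """Partition bridges into fixed-size clumps ordered by a keyed hash."""
    ordered = sorted(
        identity_keys,
        key=lambda k: hmac.new(secret, k, hashlib.sha256).digest(),
    )
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def buddies_of(identity_key, clumps):
    """Given one known bridge, return the rest of its clump (empty if unknown)."""
    for clump in clumps:
        if identity_key in clump:
            return [k for k in clump if k != identity_key]
    return []
```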