Filename: 115-two-hop-paths.txt
Title: Two Hop Paths
Version: $Revision$
Last-Modified: $Date$
Author: Mike Perry
Created:
Status: Open
Supersedes: 112

Overview:

The idea is that users should be able to choose whether they would
like to have two or three hop paths through the Tor network.

Let us be clear: the users who would choose this option should be
those that are concerned with IP obfuscation only, i.e. they would not
be targets of a resource-intensive multi-node attack. It is sometimes
said that these users should find some network other than Tor to use.
This is a foolish suggestion: more users improve the security of
everyone, and the current small userbase size is a critical hindrance
to anonymity, as is discussed below and in [1].

This value should be modifiable from the controller, and should be
available from Vidalia.

Motivation:

The Tor network is slow and overloaded. Increasingly often I hear
stories about friends and friends of friends who are behind firewalls,
annoying censorware, or under surveillance that interferes with their
productivity and Internet usage, or chills their speech. These people
know about Tor, but they choose to put up with the censorship because
Tor is too slow to be usable for them. In fact, downloading a fresh,
complete copy of levine-timing.pdf for the Theoretical Argument
section of this proposal over Tor took me 3 tries.

Furthermore, the biggest current problem with Tor's anonymity for
those who really need it is not someone attacking the network to
discover who they are. Instead, it is the extreme danger that so few
people use Tor, because it is so slow, that those who do use it have
essentially no confusion set.

The recent case where the professor and the rogue Tor user were the
only Tor users on campus, and thus suspected in an incident involving
Tor and that university, underscores this point: "That was why the
police had come to see me. They told me that only two people on our
campus were using Tor: me and someone they suspected of engaging in an
online scam. The detectives wanted to know whether the other user was a
former student of mine, and why I was using Tor"[1].

Not only does Tor provide no anonymity if you use it to be anonymous
but are obviously from a certain institution, location or circumstance,
it is also dangerous to use Tor at all: you risk being accused of
having something significant enough to hide that you are willing to put
up with the horrible performance rather than use some weaker
alternative.

There are many ways to address the speed problem, and of course we
should and will implement as many as we can. Johannes's GSoC project
and my reputation system are longer term, higher-effort things that
will still provide benefit independent of this proposal.

However, reducing the path length to 2 for those who do not need the
extra anonymity 3 hops provide not only improves their Tor experience
but also reduces their load on the Tor network by 33%, and should
increase adoption of Tor by a good deal. That's not just Win-Win, it's
Win-Win-Win.

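As a quick check of the 33% figure, the sketch below counts relay
traversals per circuit. It assumes load scales with the number of relays
each cell crosses; the function name is made up for this illustration.

```python
from fractions import Fraction

def relay_load_reduction(old_hops: int, new_hops: int) -> Fraction:
    """Fraction of per-circuit relay traversals saved by a shorter path."""
    return Fraction(old_hops - new_hops, old_hops)

saving = relay_load_reduction(3, 2)
print(saving)                      # 1/3
print(f"{float(saving):.0%}")      # 33%
```
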
Who will enable this option?

This is the crux of the proposal. Admittedly, there is some anonymity
loss and some degree of decreased investment required on the part of
the adversary to attack 2 hop users versus 3 hop users, even if it is
minimal and limited mostly to up-front costs and false positives.

The key questions are:

1. Are these users in a class such that their risk is significantly
   less than the amount of this anonymity loss?

2. Are these users able to identify themselves?

Many, many users of Tor are not at risk of an adversary capturing a
fraction c/n of the network's nodes just to see what they do. These
users use Tor to circumvent aggressive content filters, or simply to
keep their IP out of marketing and search engine databases. Most
content filters have no interest in running Tor nodes to catch
violators, and marketers certainly would never consider such a thing,
both on a cost basis and a legal one.

In a sense, this represents an alternate threat model for these users,
who are not at risk under Tor's normal threat model.

It should be evident to these users that they fall into this class. All
that should be needed is a radio button

* "I use Tor for local content filter circumvention and/or IP
  obfuscation, not anonymity. Speed is more important to me than high
  anonymity. No one will make considerable efforts to determine my real
  IP."
* "I use Tor for anonymity and/or national-level, legally enforced
  censorship. It is possible effort will be taken to identify me,
  including but not limited to network surveillance. I need more
  protection."

and then some explanation in the help for exactly what this means, and
the risks involved with eliminating the adversary's need for timing
attacks with respect to false positives. Ultimately, however, the
decision is a simple one that can be made without this information. The
user does not need Paul Syverson to instruct them on the deep magic of
Onion Routing to make this decision. They just need to know why they
use Tor. If they use it just to stay out of marketing databases and/or
bypass a local content filter, two hops is plenty. This is likely the
vast majority of Tor users, and many non-users we would like to bring
on board.

So, having established this class of users, let us now go on to
examine the theoretical and practical risks we place them at, and
determine if these risks violate the users' needs, or introduce
additional risk to node operators who may be subject to requests from
law enforcement to track users who need 3 hops, but use 2 because they
enjoy the thrill of Russian roulette.

Theoretical Argument:

It has long been established that timing attacks against mix and
onion networks are extremely effective, and that regardless of path
length, if the adversary has compromised the first and last hop of
your path, you can assume they have compromised your identity for that
connection.

In fact, it was demonstrated that for all but the slowest, lossiest
networks, error rates for false positives and false negatives were
very near zero[2]. Only for constant streams of traffic over slow and
(more importantly) extremely lossy network links did the error rate
hit 20%. For loss rates typical of the Internet, even the error rate
for slow nodes with constant traffic streams was 13%.

When you take into account that most Tor streams are not constant,
but probably much more like the "HomeIP" dataset used in [2], which
consists mostly of web traffic that exists over finite intervals at
specific times, error rates drop to fractions of 1%, even for the
"worst" network nodes.

Therefore, the user has little benefit from the extra hop, assuming
the adversary does timing correlation on their nodes. Since timing
correlation is simply an implementation issue and is most likely
a single up-front cost (and one that is likely quite a bit cheaper
than the cost of the machines purchased to host the nodes to mount
an attack), the real protection is the low probability of getting
both the first and last hop of a client's stream.

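To make the last point concrete, the sketch below enumerates every
possible path in a toy network and counts those whose first and last
hop both belong to the adversary. It assumes uniform, independent relay
selection with repeats allowed, which real Tor does not do (it weights
by bandwidth and avoids reusing relays); the function name and the toy
numbers are illustrative only.

```python
from fractions import Fraction
from itertools import product

def end_to_end_risk(n: int, c: int, hops: int) -> Fraction:
    """Share of all paths whose entry and exit are both adversarial.

    Relays are numbered 0..n-1 and relays 0..c-1 are the adversary's.
    """
    paths = list(product(range(n), repeat=hops))
    bad = sum(1 for p in paths if p[0] < c and p[-1] < c)
    return Fraction(bad, len(paths))

# The middle hop does not change the result: both are (c/n)^2.
print(end_to_end_risk(n=10, c=2, hops=2))   # 1/25
print(end_to_end_risk(n=10, c=2, hops=3))   # 1/25
```
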
Practical Issues:

Theoretical issues aside, there are several practical issues with the
implementation of Tor that need to be addressed to ensure that
identity information is not leaked by the implementation.

Exit policy issues:

If a client chooses an exit with a very restrictive exit policy
(such as an IP or IP range), the first hop then knows a good deal
about the destination. For this reason, clients should not select
exits that match their destination IP with anything other than "*".

Partitioning:

Partitioning attacks form another concern. Since Tor uses telescoping
to build circuits, it is possible to tell that a user is constructing
only two hop paths at the entry node and on the local network. An
external adversary can potentially differentiate 2 and 3 hop users, and
decide that all IP addresses connecting to Tor and using 3 hops have
something to hide, and should be scrutinized more closely or outright
apprehended.

One solution to this is to use the "leaky-circuit" method of attaching
streams: The user always creates 3-hop circuits, but if the option
is enabled, they always exit from their 2nd hop. The ideal solution
would be to create a RELAY_SHISHKABOB cell which contains onion
skins for every host along the path, but this requires protocol
changes at the nodes to support.

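A minimal sketch of the leaky-circuit idea, assuming every circuit is
built with 3 hops and only the attach point differs. The Circuit class,
exit_hop_index() and the two_hop_mode flag are hypothetical names for
this illustration and are not Tor code.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Circuit:
    hops: List[str]   # relay nicknames, entry first

def exit_hop_index(circuit: Circuit, two_hop_mode: bool) -> int:
    """Index of the hop a stream is attached to (0-based)."""
    if len(circuit.hops) != 3:
        raise ValueError("this sketch assumes every circuit has 3 hops")
    # Two hop users leave at the 2nd hop; everyone else at the 3rd.
    return 1 if two_hop_mode else 2

circ = Circuit(hops=["guard", "middle", "last"])
print(circ.hops[exit_hop_index(circ, two_hop_mode=True)])    # middle
print(circ.hops[exit_hop_index(circ, two_hop_mode=False)])   # last
```

Because the circuit itself always has 3 hops, the build pattern seen at
the entry node is the same in both modes.
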
Guard nodes:

Since guard nodes can rotate due to client relocation, network
failure, node upgrades and other issues, if you amortize the risk a
mobile, dialup, or otherwise intermittently connected user is exposed to
over any reasonable duration of Tor usage (on the order of a year), it
is the same with or without guard nodes. Assuming an adversary has a
fraction c/n of network bandwidth, and guards rotate on average with
period R, it's merely a question of whether the user wishes their risk
to be concentrated with probability c/n over an expected period of R*c,
and probability 0 over an expected period of R*(n-c), versus a
continuous risk of (c/n)^2. So statistically speaking, guards only
create a time-tradeoff of risk over the long run for normal Tor usage.
Rotating guards do not reduce risk for normal client usage long
term.[3]

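The time-tradeoff claim can be checked with a few lines of arithmetic.
This is only a sketch under the paragraph's own simplifications: relays
are chosen with probability proportional to the adversary's share c/n,
and guard periods are long enough that the share of time spent behind a
bad guard equals c/n. The function names are illustrative.

```python
from fractions import Fraction

def risk_without_guards(c: int, n: int) -> Fraction:
    """Every circuit independently needs a bad entry and a bad exit."""
    p = Fraction(c, n)
    return p * p

def risk_with_rotating_guards(c: int, n: int) -> Fraction:
    """Long-run average risk when the entry is pinned per guard period."""
    p = Fraction(c, n)
    exposed_share = p        # share of guard periods with a bad guard
    risk_if_exposed = p      # chance the exit is bad as well
    risk_if_safe = 0         # a good guard cannot complete the attack
    return exposed_share * risk_if_exposed + (1 - exposed_share) * risk_if_safe

print(risk_without_guards(5, 100))         # 1/400
print(risk_with_rotating_guards(5, 100))   # 1/400
```

The averages agree; what differs is how the risk is distributed over
time.
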
On the other hand, assuming a more stable method of guard selection
and preservation is devised, or a more stable client side network than
my own is typical (which rotates guards frequently due to network issues
and moving about), guard nodes provide a tradeoff in the form of a
fraction c/n of the users being "sacrificial users" who are exposed to
high risk O(c/n) of identification, while the rest of the network is
exposed to zero risk.

The nature of Tor makes it likely an adversary will take a "shock and
awe" approach to suppressing Tor by rounding up a few users whose
browsing activity has been observed, to be made into examples, in an
attempt to prove that Tor is not perfect.

Since this "shock and awe" attack can be applied with or without guard
nodes, stable guard nodes do offer a measure of accountability of sorts.
If a user was using a small set of guard nodes and knows them well, and
then is suddenly apprehended as a result of Tor usage, having a fixed
set of entry points to suspect is a lot better than suspecting the whole
network. Conversely, it can also give non-apprehended users comfort
that they are likely to remain safe indefinitely with their set of (now
presumably trusted) guards. This is probably the most beneficial
property of reliable guards: they deter the adversary from mounting
"shock and awe" attacks because the surviving users will not be
intimidated, but instead made more confident. Of course, guards need to
be made much more stable and users need to be encouraged to know their
guards for this property to really take effect.

This beneficial property of client vigilance also carries over to an
active adversary, except in this case instead of relying on the user
to remember their guard nodes and somehow communicate them after
apprehension, the code can alert them to the presence of an active
adversary before they are apprehended. But only if they use guard nodes.

So let's consider the active adversary: Two hop paths allow malicious
guards to get considerably more benefit from failing circuits if they do
not extend to their colluding peers for the exit hop. Since guards can
detect the number of hops in a path via either timing or by statistical
analysis of the exit policy of the 2nd hop, they can perform this attack
predominantly against 2 hop users.

This can be addressed by completely abandoning an entry guard after a
certain ratio of extend or general circuit failures with respect to
non-failed circuits. The proper value for this ratio can be determined
experimentally with TorFlow. There is the possibility that the local
network can abuse this feature to cause certain guards to be dropped,
but they can do that anyway with the current Tor by just making guards
they don't like unreachable. With this mechanism, Tor will complain
loudly if any guard's failure rate exceeds the expected rate in any
failure case, local or remote.

Eliminating guards entirely would actually not address this issue due
to the time-tradeoff nature of risk. In fact, it would just make it
worse. Without guard nodes, it becomes much more difficult for clients
to become alerted to Tor entry points that are failing circuits to make
sure that they only devote bandwidth to carrying traffic for streams of
which they observe both ends. Yet the rogue entry points are still able
to significantly increase their success rates by failing circuits.

For this reason, guard nodes should remain enabled for 2 hop users,
at least until an IP-independent, undetectable guard scanner can
be created. TorFlow can scan for failing guards, but after a while,
its unique behavior gives away the fact that its IP is a scanner and
it can be given selective service.

Consideration of risks for node operators:

There is a serious risk for two hop users in the form of guard
profiling. If an adversary running an exit node notices that a
particular site is always visited from a fixed previous hop, it is
likely that this is a two hop user using a certain guard, which could be
monitored to determine their identity. Thus, for the protection of both
2 hop users and node operators, 2 hop users should limit their guard
duration to a sufficient number of days to verify reliability of a node,
but not much more. This duration can be determined experimentally by
TorFlow.

Considering a Tor client builds on average 144 circuits/day (10
minutes per circuit), if the adversary owns a fraction c/n of the exits
on the network, they can expect to see 144*c/n circuits per day from
this user, or about 14 minutes of usage per day per percentage of
network penetration. Since it will take several occurrences of
user-linkable exit content from the same predecessor hop for the
adversary to have any confidence this is a 2 hop user, it is very
unlikely that any sort of demands made upon the predecessor node would
be guaranteed to be effective (i.e. that it actually was a guard), let
alone be executed in time to apprehend the user before they rotated
guards.

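The 14 minute figure follows directly from the numbers above. The
sketch below reproduces it, assuming one circuit in use at a time and
exits chosen in proportion to the adversary's share; the constant and
function names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60
CIRCUIT_LIFETIME_MIN = 10

def observed_minutes_per_day(exit_fraction: float) -> float:
    """Expected minutes per day of one user's traffic seen by the adversary."""
    circuits_per_day = MINUTES_PER_DAY / CIRCUIT_LIFETIME_MIN   # 144
    observed_circuits = circuits_per_day * exit_fraction
    return observed_circuits * CIRCUIT_LIFETIME_MIN

# An adversary holding 1% of exits sees roughly 14.4 minutes per day.
print(round(observed_minutes_per_day(0.01), 1))
```
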
The reverse risk also warrants consideration. If a malicious guard has
orders to surveil Mike Perry, it can determine Mike Perry is using two
hops by observing his tendency to choose a 2nd hop with a viable exit
policy. This can be done relatively quickly, unfortunately, and
indicates Mike Perry should spend some of his time building real 3 hop
circuits through the same guards, to require them to at least wait for
him to actually use Tor to determine his style of operation, rather than
collect this information from his passive building patterns.

However, to actively determine where Mike Perry is going, the guard
will need to arrange logging ahead of time at multiple exit nodes that
he may use over the course of the few days while he is at that guard,
and correlate the usage times of the exit node with Mike Perry's
activity at that guard for the few days he uses it. At this point, the
adversary is mounting a scale and method of attack (widespread logging,
timing attacks) that works pretty much just as effectively against 3
hops, so exit node operators are exposed to no more danger than they
normally are.

Why not fix Pathlen=2?:

The main reason I am not advocating that we always use 2 hops is that
in some situations, timing correlation evidence by itself may not be
considered as solid and convincing as an actual, uninterrupted, fully
traced path. Are these timing attacks as effective on a real network as
they are in simulation? Maybe the circuit multiplexing of Tor can serve
to frustrate them to a degree? Would an extralegal adversary or
authoritarian government even care? In the face of these
situation-dependent unknowns, it should be up to the user to decide if
this is a concern for them or not.

It should probably also be noted that even a false positive
rate of 1% for a 200k concurrent-user network could mean that for a
given node, a given stream could be confused with something like 10
users, assuming ~200 nodes carry most of the traffic (i.e. 1000 users
each). Though of course to really know for sure, someone needs to do
an attack on a real network, unfortunately.

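The estimate of about 10 confusable users is simple arithmetic, spelled
out below. It assumes users are spread evenly over the busy nodes; the
function name is made up for the illustration.

```python
def confusable_users(total_users: int, busy_nodes: int,
                     false_positive_rate: float) -> float:
    """Users at one node whose streams could be mistaken for the target's."""
    users_per_node = total_users / busy_nodes
    return users_per_node * false_positive_rate

# 200k users over ~200 nodes is 1000 users each; 1% of those is about 10.
print(round(confusable_users(200_000, 200, 0.01), 2))
```
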
Additionally, at some point cover traffic schemes may be implemented to
frustrate timing attacks on the first hop. It is possible some expert
users may do this ad-hoc already, and may wish to continue using 3 hops
for this reason.

Implementation:

new_route_len() can be modified directly with a check of the
Pathlen option. However, circuit construction logic should be
altered so that both 2 hop and 3 hop users build the same types of
circuits, and the option should ultimately govern circuit selection,
not construction. This improves protection against guard nodes being
able to passively profile users who aren't even actively using Tor.
PathlenCoinWeight, anyone? :)

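The split between construction and selection might look like the sketch
below: circuits of both lengths are assumed to be built regardless of
the option, and Pathlen is consulted only when picking one for a
stream. select_circuit() and the list-of-nicknames circuit
representation are hypothetical and do not mirror Tor's data
structures.

```python
from typing import List, Optional, Sequence

def select_circuit(built: Sequence[List[str]],
                   pathlen: int) -> Optional[List[str]]:
    """Return the first already-built circuit with the requested hop count."""
    for hops in built:
        if len(hops) == pathlen:
            return hops
    return None   # nothing suitable has been built yet

# Both kinds of circuit exist no matter what the user selected.
built = [["guardA", "midB", "exitC"], ["guardA", "exitD"]]
print(select_circuit(built, pathlen=2))   # ['guardA', 'exitD']
print(select_circuit(built, pathlen=3))   # ['guardA', 'midB', 'exitC']
```
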
The exit policy hack is a bit more tricky. compare_addr_to_addr_policy
needs to return an alternate ADDR_POLICY_ACCEPTED_WILDCARD or
ADDR_POLICY_ACCEPTED_SPECIFIC return value for use in
circuit_is_acceptable.

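A rough model of the wildcard/specific distinction, written in Python
rather than Tor's C. The rule format, classify_exit() and the string
constants are assumptions made for this sketch, and it rejects when no
rule matches purely to stay short; none of this is the behavior of
compare_addr_to_addr_policy itself.

```python
import ipaddress

ACCEPTED_WILDCARD = "accepted-wildcard"
ACCEPTED_SPECIFIC = "accepted-specific"
REJECTED = "rejected"

def classify_exit(policy, dest_ip: str, dest_port: int) -> str:
    """Classify how an exit policy matches a destination.

    policy is an ordered list of (action, address_pattern, port) rules;
    address_pattern is "*" or a CIDR string, port is "*" or an int.
    The first matching rule wins.
    """
    addr = ipaddress.ip_address(dest_ip)
    for action, pattern, port in policy:
        addr_ok = pattern == "*" or addr in ipaddress.ip_network(pattern)
        port_ok = port == "*" or port == dest_port
        if addr_ok and port_ok:
            if action != "accept":
                return REJECTED
            # An address-specific accept tells the first hop a good
            # deal about the destination, so 2 hop clients avoid it.
            return ACCEPTED_WILDCARD if pattern == "*" else ACCEPTED_SPECIFIC
    return REJECTED

narrow = [("accept", "18.0.0.0/8", "*"), ("reject", "*", "*")]
broad = [("accept", "*", 80), ("reject", "*", "*")]
print(classify_exit(narrow, "18.244.0.188", 80))   # accepted-specific
print(classify_exit(broad, "18.244.0.188", 80))    # accepted-wildcard
```

A 2 hop client would then treat only the wildcard result as acceptable
when checking a circuit.
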
The leaky exit is trickier still. handle_control_attachstream
does allow paths to exit at a given hop. Presumably something similar
can be done in connection_ap_handshake_process_socks, and elsewhere?
Circuit construction would also have to be performed such that the
2nd hop's exit policy is what is considered, not the 3rd's.

The entry_guard_t structure could have num_circ_failed and
num_circ_succeeded members such that if a guard exceeds F% circuit
extend failure rate to a second hop, it is removed from the entry list.

F should be sufficiently high to avoid churn from normal Tor circuit
failure as determined by TorFlow scans.

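The bookkeeping could be sketched as follows. The two counter names
come from the text above; GuardStats, prune_guards() and the
min_samples cutoff are hypothetical additions for the illustration, and
the threshold used in the example is arbitrary rather than a
TorFlow-derived value of F.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class GuardStats:
    nickname: str
    num_circ_failed: int = 0
    num_circ_succeeded: int = 0

    def failure_rate(self) -> float:
        total = self.num_circ_failed + self.num_circ_succeeded
        return self.num_circ_failed / total if total else 0.0

def prune_guards(guards: List[GuardStats], max_failure_rate: float,
                 min_samples: int = 20) -> List[GuardStats]:
    """Drop guards whose extend-failure rate exceeds the threshold.

    Guards with fewer than min_samples circuits are kept, since there
    is too little data to judge them.
    """
    kept = []
    for g in guards:
        total = g.num_circ_failed + g.num_circ_succeeded
        if total >= min_samples and g.failure_rate() > max_failure_rate:
            continue   # a real client would also warn the user here
        kept.append(g)
    return kept

guards = [GuardStats("good", 2, 98), GuardStats("bad", 60, 40),
          GuardStats("new", 3, 1)]
print([g.nickname for g in prune_guards(guards, max_failure_rate=0.3)])
# ['good', 'new']
```
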
The Vidalia option should be presented as a radio button.

Migration:

Phase 1: Adjust exit policy checks if Pathlen is set, implement leaky
circuit ability, and 2-3 hop circuit selection logic governed by
Pathlen.

Phase 2: Experiment to determine the proper ratio of circuit
failures used to expire garbage or malicious guards via TorFlow
(pending Bug #440 backport+adoption).

Phase 3: Implement guard expiration code to kick off failure-prone
guards and warn the user. Cap 2 hop guard duration to a proper number
of days determined sufficient to establish guard reliability (to be
determined by TorFlow).

Phase 4: Make a radio button in Vidalia, along with a help entry
that explains in layman's terms the risks involved.

Phase 5: Allow user to specify path length by HTTP URL suffix.

[1] http://p2pnet.net/story/11279
[2] http://www.cs.umass.edu/~mwright/papers/levine-timing.pdf
[3] Proof available upon request ;)