\documentclass{llncs}
\usepackage{url}
\usepackage{amsmath}
\usepackage{epsfig}
\newenvironment{tightlist}{\begin{list}{$\bullet$}{
\setlength{\itemsep}{0mm}
\setlength{\parsep}{0mm}
% \setlength{\labelsep}{0mm}
% \setlength{\labelwidth}{0mm}
% \setlength{\topsep}{0mm}
}}{\end{list}}
\begin{document}
\title{Challenges in bringing low-latency stream anonymity to the masses (DRAFT)}
\author{Roger Dingledine and Nick Mathewson}
\institute{The Free Haven Project\\
\email{\{arma,nickm\}@freehaven.net}}
\maketitle
\pagestyle{empty}
\begin{abstract}
foo
\end{abstract}
\section{Introduction}
Anonymous communication on the Internet today

Tor is a low-latency anonymous communication overlay network
\cite{tor-design}. We have been operating a publicly deployed Tor network
since October 2003.

Tor aims to resist observers and insiders by distributing each transaction
over several nodes in the network. This ``distributed trust'' approach
means the Tor network can be safely operated and used by a wide variety
of mutually distrustful users, providing more sustainability and security
than previous attempts at anonymizing networks.

The Tor network has a broad range of users, including ordinary citizens
who want to avoid being profiled for targeted advertisements, corporations
who don't want to reveal information to their competitors, and law
enforcement and government intelligence agencies who need
to do operations on the Internet without being noticed.

Tor has been funded both by the U.S. Navy, for use in securing government
communications, and by the Electronic Frontier Foundation, for use in
maintaining civil liberties for ordinary citizens online.
The Tor protocol is one of the leading choices
to be the anonymizing layer in the European Union's PRIME directive to
help maintain privacy in Europe. The University of Dresden in Germany
has integrated an independent implementation of the Tor protocol into
their popular Java Anon Proxy anonymizing client. This wide variety of
interests helps maintain both the stability and the security of the
network.

We deployed this thing called Tor. It's got all these different types of
users. It's been backed by the Navy and EFF, and PRIME and Anonymizer looked at
it. Because we're this cool, you should believe us when we tell you stuff.

In this paper we give the reader an understanding of Tor's context
in the anonymity space, and then describe the practical challenges
that stand in the way of moving from a practical, useful network to a
practical, useful, anonymous network.
% The goal of the paper is to get the PET-audience reader up to speed
% on all the issues we have with Tor, so he can, if he wants,
% * understand the technical and policy and legal issues and why they're
% tricky in practice
% * help us out with answering some of the technical decisions
% (and in writing it, we'll clarify our own opinions about them)
% * help us out with answering some of the anonymity questions
\section{What Is Tor}
\subsection{Distributed trust: safety in numbers}
Tor provides \emph{forward privacy}, so that users can connect to
Internet sites without revealing their logical or physical locations
to those sites or to observers. It also provides \emph{location-hidden
services}, so that critical servers can support authorized users without
giving adversaries an effective vector for physical or online attacks.
Our design provides this protection even when a portion of its own
infrastructure is controlled by an adversary.

To make private connections in Tor, users incrementally build a path or
\emph{circuit} of encrypted connections through servers on the network,
extending it one step at a time so that each server in the circuit only
learns which server extended to it and which server it has been asked
to extend to. The client negotiates a separate set of encryption keys
for each step along the circuit.

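
As a minimal sketch of the layering this implies, assume a three-hop
circuit. The hop count and the notation are illustrative assumptions:
$k_i$ stands for the key set the client shares with the $i$-th server,
and $E$ and $D$ for symmetric encryption and decryption. The client
wraps a payload $m$ once per hop,
\begin{equation*}
c_1 = E_{k_1}\bigl(E_{k_2}\bigl(E_{k_3}(m)\bigr)\bigr),
\end{equation*}
and the $i$-th server removes exactly one layer,
\begin{equation*}
c_{i+1} = D_{k_i}(c_i), \qquad i = 1, 2, 3,
\end{equation*}
so that $c_4 = m$ appears only after the last hop. In this sketch the
first server sees who sent $c_1$ but not $m$, while the last server sees
$m$ but knows only its predecessor in the circuit.
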
Once a circuit has been established, the client software waits for
applications to request TCP connections, and directs these application
streams along the circuit. Many streams can be multiplexed along a single
circuit, so applications don't need to wait for keys to be negotiated
every time they open a connection. Because each server sees no
more than one end of the connection, a local eavesdropper or a compromised
server cannot use traffic analysis to link the connection's source and
destination. The Tor client software rotates circuits periodically
to prevent long-term linkability between different actions by a
single user.

Tor differs from other deployed systems for traffic analysis resistance
in its security and flexibility. Mix networks such as Mixmaster or its
successor Mixminion \cite{minion-design}
gain the highest degrees of anonymity at the expense of introducing highly
variable delays, thus making them unsuitable for applications such as web
browsing that require quick response times. Commercial single-hop proxies
such as {\url{anonymizer.com}} present a single point of failure, where
a single compromise can expose all users' traffic, and a single-point
eavesdropper can perform traffic analysis on the entire network.
Also, their proprietary implementations place any infrastructure that
depends on these single-hop solutions at the mercy of their providers'
financial health. Tor can handle any TCP-based protocol, such as web
browsing, instant messaging and chat, and secure shell login; and it is
the only implemented anonymizing design with an integrated system for
secure location-hidden services.

No organization can achieve this security on its own. If a single
corporation or government agency were to build a private network to
protect its operations, any connections entering or leaving that network
would be obviously linkable to the controlling organization. The members
and operations of that agency would be easier, not harder, to distinguish.
Instead, to protect our networks from traffic analysis, we must
collaboratively blend the traffic from many organizations and private
citizens, so that an eavesdropper can't tell which users are which,
and who is looking for what information. By bringing more users onto
the network, all users become more secure \cite{econymics}.

Naturally, organizations will not want to depend on others for their
security. If most participating providers are reliable, Tor tolerates
some hostile infiltration of the network. For maximum protection,
the Tor design includes an enclave approach that lets data be encrypted
(and authenticated) end-to-end, so high-sensitivity users can be sure it
hasn't been read or modified. This even works for Internet services that
don't have built-in encryption and authentication, such as unencrypted
HTTP or chat, and it requires no modification of those services to do so.

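
To put a rough number on ``some hostile infiltration'', consider a
simplified model that is an assumption of this sketch rather than a
measured property of the deployed network: an adversary controls a
fraction $f$ of the servers, clients pick the first and last server of
a circuit independently and uniformly at random, and a circuit is
linkable only when both of those servers are hostile. Then
\begin{equation*}
\Pr[\text{circuit linkable}] = f \cdot f = f^2,
\end{equation*}
so an adversary running one server in ten ($f = 0.1$) would expect to
link about $0.1^2 = 0.01$, or one percent, of circuits. The model
ignores non-uniform server selection, external eavesdroppers, and the
effect of rotating circuits over time, each of which changes the figure.
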
weasel's graph of \# nodes and of bandwidth, ideally from week 0.

Tor has the following goals.

and we made these assumptions when trying to design the thing.

\section{Tor's position in the anonymity field}
There are many other classes of systems: single-hop proxies, open proxies,
JAP, Mixminion, flash mixes, Freenet, I2P, MUTE/ANts/etc., Tarzan,
MorphMix, Freedom. Give brief descriptions and brief characterizations
of how we differ. This is not the breakthrough stuff and we only have
a page or two for it.

\section{Crossroads}
Discuss each item that Tor hasn't solved yet that isn't just coding
work. Perhaps we'll have so many that we can pick out the best ones to
discuss, so it's a bit less of a laundry list. Maybe they'll even fit
into categories. The trick to making the paper good will be to find
the right balance between going into depth and breadth of coverage.

Peer-to-peer / practical issues:

Network discovery, Sybil, node admission, scaling. It seems that the code
will ship with something and that's our trust root. We could try to get
people to build a web of trust, but no. Where we go from here depends
on what threats we have in mind. Really decentralized if your threat is
RIAA; less so if threat is to application data or individuals or...

Making use of servers with little bandwidth. How to handle hammering by
certain applications.

Handling servers that are far away from the rest of the network, e.g.\ on
the continents that aren't North America and Europe. High latency,
often high packet loss.

Running Tor servers behind NATs, behind great-firewalls-of-China, etc.
Restricted routes. How to propagate the topology to everybody? BGP
style doesn't work because we don't want just \emph{one} path. Point to
Geoff's stuff.

Routing-zones. It seems that our threat model comes down to diversity and
dispersal. But it is hard for Alice to know how to act. Many questions remain.

The China problem. We have lots of users in Iran and similar (we stopped
logging, so it's hard to know now, but there are many Persian sites on how
to use Tor), and they seem to be doing ok. But the China problem is bigger. Cite
Stefan's paper, and talk about how we need to route through clients,
and maybe we should start with a time-release IP publishing system +
advogato based reputation system, to bound the number of IPs leaked to the
adversary.

Policy issues:

BitTorrent and DMCA. Should we add an IDS to autodetect protocols and
snipe them? Takedowns and EFnet abuse and Wikipedia complaints and IRC
networks. Should we allow revocation of anonymity if a threshold of
servers want to?

Image: substantial non-infringing uses. Image is a security parameter,
since it impacts user base and perceived sustainability.

Sustainability. Previous attempts have been commercial, which we think
adds a lot of unnecessary complexity and accountability. Freedom didn't
collect enough money to pay its servers; JAP bandwidth is supported by
continued money, and they periodically ask what they will do when it
dries up.

Logging. Making logs not revealing. A happy coincidence that verbose
logging is our \#2 performance bottleneck. Is there a way to detect
modified servers, or to have them volunteer the information that they're
logging verbosely? Would that actually solve any attacks?

Anonymity issues:

Transporting the stream vs transporting the packets.

The DNS problem in practice.

Applications that leak data. We can say they're not our problem, but
they're somebody's problem.

How to measure performance without letting people selectively deny service
by distinguishing pings. Heck, just how to measure performance at all. In
practice people have funny firewalls that don't match up to their exit
policies and Tor doesn't deal.

Mid-latency. Can we do traffic shaping to get any defense against George's
PET2004 paper? Will padding or long-range dummies do anything then? Will
it kill the user base or can we get both approaches to play well together?

Does running a server help you or harm you? George's Oakland attack.
Plausible deniability -- without even running your traffic through Tor! We
have to pick the path length so the adversary can't distinguish client from
server (how many hops is good?).

When does fixing your entry or exit node help you?
Helper nodes in the literature don't deal with churn, and
especially active attacks to induce churn.

Survivable services are new in practice, yes? Hidden services seem
less hidden than we'd like, since they stay in one place and get used
a lot. They're the epitome of the need for helper nodes. This means
that using Tor as a building block for Free Haven is going to be really
hard. Also, they're brittle in terms of intersection and observation
attacks. Would be nice to have hot-swap services, but hard to design.

P2P + anonymity issues:

Incentives. Copy the page I wrote for the NSF proposal, and maybe extend
it if we're feeling smart.

Usability: fc03 paper was great, except that the lower-latency you are,
the less useful it seems to be.

A Tor GUI, and how JAP's GUI is nice but does not reflect the security
they provide.

Public perception, and thus advertising, is a security parameter.

Network investigation: Is all this bandwidth publishing thing a good idea?
How can we collect stats better? Note weasel's smokeping, at
\url{http://seppia.noreply.org/cgi-bin/smokeping.cgi?target=Tor}
which probably gives George and Steven enough info to break Tor?

Do general DoS attacks have anonymity implications? See e.g.\ Adam
Back's IH paper, but I think there's more to be pointed out here.

% need to do somewhere in the paper:
Have a serious discussion of MorphMix's assumptions, since they would
seem to be the direct competition. In fact Tor is a flexible architecture
that would encompass MorphMix, and they're nearly identical except for
path selection and node discovery. And the trust system MorphMix has
seems overkill (and/or insecure) based on the threat model we've picked.

Need to discuss how we take the approach of building the thing, and then
assuming that, how much anonymity can we get. We're not here to model or
to simulate or to produce equations and formulae. But those have their
roles too.

%%%
TCP vs UDP

Argument 1: we need to do IP-level packet normalization, to block things like IP
fingerprinting.

Argument 2: we still need to be easy to integrate with applications, so they can do
application-level scrubbing.

Argument 3: we need a block-level encryption approach that can provide security despite
packet loss and out-of-order delivery. I believe you that such a thing can be created,
but nothing has yet been specified. So specify it for me if you want me to believe it.
(Freedom and Cebolla are vulnerable to tagging and malleability attacks, I believe.)

Argument 4: we still need to play with parameters for throughput, congestion control,
etc -- since we need sequence numbers and maybe more to do replay detection,
and just to handle duplicate frames. So we would be reimplementing some subset of TCP
anyway.

Argument 5: TLS over UDP is not implemented or even specified.

Argument 6: exit policies over arbitrary IP packets seem to be an IDS-hard problem. I
don't want to build an IDS into Tor.

Argument 7: certain protocols are going to leak information at the IP layer anyway. For
example, if we anonymize your DNS requests, but they still go to Comcast's DNS servers,
that's bad.

Argument 8: hidden services, .exit addresses, etc are broken unless we have some way to
reach into the application-level protocol and decide the hostname it's trying to get.

\bibliographystyle{plain} \bibliography{tor-design}
\end{document}