diff --git a/doc/spec/proposals/000-index.txt b/doc/spec/proposals/000-index.txt index 8ea73540a3..9fa2868dcd 100644 --- a/doc/spec/proposals/000-index.txt +++ b/doc/spec/proposals/000-index.txt @@ -91,6 +91,7 @@ Proposals by number: 168 Reduce default circuit window [OPEN] 169 Eliminate TLS renegotiation for the Tor connection handshake [DRAFT] 170 Configuration options regarding circuit building [DRAFT] +171 Separate streams across circuits by connection metadata [OPEN] 172 GETINFO controller option for circuit information [ACCEPTED] 173 GETINFO Option Expansion [ACCEPTED] 174 Optimistic Data for Tor: Server Side [OPEN] @@ -107,7 +108,7 @@ Proposals by status: 144 Increase the diversity of circuits by detecting nodes belonging the same provider 169 Eliminate TLS renegotiation for the Tor connection handshake [for 0.2.2] 170 Configuration options regarding circuit building - 175 Automatically promoting Tor clients to nodes [for ] + 175 Automatically promoting Tor clients to nodes NEEDS-REVISION: 131 Help users to verify they are using Tor OPEN: @@ -123,6 +124,7 @@ Proposals by status: 164 Reporting the status of server votes [for 0.2.2] 165 Easy migration for voting authority sets 168 Reduce default circuit window [for 0.2.2] + 171 Separate streams across circuits by connection metadata 174 Optimistic Data for Tor: Server Side ACCEPTED: 110 Avoiding infinite length circuits [for 0.2.1.x] [in 0.2.1.3-alpha] diff --git a/doc/spec/proposals/001-process.txt b/doc/spec/proposals/001-process.txt index e2fe358fed..53ad32ba12 100644 --- a/doc/spec/proposals/001-process.txt +++ b/doc/spec/proposals/001-process.txt @@ -75,7 +75,7 @@ How new proposals get added: To get your proposal in, send it to or-dev. - The current proposal editor is Nick Mathewson. + The current proposal editors are Nick Mathewson and Jacob Appelbaum. What should go in a proposal: diff --git a/doc/spec/proposals/171-separate-streams.txt b/doc/spec/proposals/171-separate-streams.txt new file mode 100644 index 0000000000..958b8f7b6c --- /dev/null +++ b/doc/spec/proposals/171-separate-streams.txt @@ -0,0 +1,350 @@ +Filename: 171-separate-streams.txt +Title: Separate streams across circuits by connection metadata +Author: Robert Hogan, Jacob Appelbaum, Damon McCoy, Nick Mathewson +Created: 21-Oct-2008 +Modified: 7-Dec-2010 +Status: Open + +Summary: + + We propose a new set of options to isolate unrelated streams from one + another, putting them on separate circuits so that semantically + unrelated traffic is not inadvertently made linkable. + +Motivation: + + Currently, Tor attaches regular streams (that is, ones not carrying + rendezvous or directory traffic) to circuits based only on whether Tor + circuit's current exit node supports the destination, and whether the + circuit has been dirty (that is, in use) for too long. + + This means that traffic that would otherwise be unrelated sometimes + gets sent over the same circuit, allowing the exit node to link such + streams with certainty, and allowing other parties to link such + streams probabilistically. + + Older versions of onion routing tried to address this problem by + sending every stream over a separate circuit; performance issues made + this unfeasible. Moreover, in the presence of a localized adversary, + separating streams by circuits increases the odds that, for any given + linked set of streams, at least one will go over a compromised + circuit. + + Therefore we ought to look for ways to allow streams that ought to be + linked to travel over a single circuit, while keeping streams that + ought not be linked isolated to separate circuits. + +Discussion: + + Let's call a series of inherently-linked streams (like a set of + streams downloading objects from the same webpage, or a browsing + session where the user requests several related webpages) a "Session". + + "Sessions" are a necessarily a fuzzy concept. While users typically + consider some activities as wholly unrelated to each other ("My IM + session has nothing to do with my web browsing!"), the boundaries + between activities are sometimes hard to determine. If I'm reading + lolcats in one browser tab and reading about treatments for an + embarrassing disease in another, those are probably separate sessions. + If I search for a forum, log in, read it for a while, and post a few + messages on unrelated topics, that's probably all the same session. + + So with the proviso that no automated process can identify sessions + 100% accurately, let's see which options we have available. + + Generally, all the streams on a session come from a single + application. Unfortunately, isolating streams by application + automatically isn't feasible, given the lack of any nice + cross-platform way to tell which local process originated a given + connection. (Yes, lsof works. But a quick review of the lsof code + should be sufficient to scare you away from thinking there is a + portable option, much less a portable O(1) option.) So instead, we'll + have to use some other aspect of a Tor request as a proxy for the + application. + + Generally, traffic from separate applications is not in the same + session. + + With some applications (IRC, for example), each stream is a session. + + Some applications (most notably web browsing) can't be meaningfully + split into sessions without inspecting the traffic itself and + maintaining a lot of state. + + How well do ports correspond to sessions? Early versions of this + proposal focused on using destination ports as a proxy for + application, since a connection to port 22 for SSH is probably not in + the same session as one to port 80. This only works with some + applications better than others, though: while SSH users typically + know when they're on port 22 and when they aren't, a web browser can + be coaxed (though img urls or any number of releated tricks) into + connecting to any port at all. Moreover, when Tor gets a DNS lookup + request, it doesn't know in advance which port the resulting address + will be used to connect to. + + So in summary, each kind of traffic wants to follow different rules, + and assuming the existence of a web browser and a hostile web page or + exit node, we can't tell one kind of traffic from another by simply + looking at the destination:port of the traffic. + + Fortunately, we're not doomed. + +Design: + + When a stream arrives at Tor, we have the following data to examine: + 1) The destination address + 2) The destination port (unless this a DNS lookup) + 3) The protocol used by the application to send the stream to Tor: + SOCKS4, SOCKS4A, SOCKS5, or whatever local "transparent proxy" + mechanism the kernel gives us. + 4) The port used by the application to send the stream to Tor -- + that is, the SOCKSListenAddress or TransListenAddress that the + application used, if we have more than one. + 5) The SOCKS username and password, if any. + 6) The source address and port for the application. + + We propose to use 3, 4, and 5 as a backchannel for applications to + tell Tor about different sessions. Rather than running only one + SOCKSPort, a Tor user who would prefer better session isolation should + run multiple SOCKSPorts/TransPorts, and configure different + applications to use separate ports. Applications that support SOCKS + authentication can further be separated on a single port by their + choice of username/password. Streams sent to separate ports or using + different authentication information should never be sent over the + same circuit. We allow each port to have its own settings for + isolation based on destination port, destination address, or both. + + Handling DNS can be a challenge. We can get hostnames by one of three + means: + + A) A SOCKS4a request, or a SOCKS5 request with a hostname. This + case is handled trivially using the rules above. + B) A RESOLVE request on a SOCKSPort. This case is handled using the + rules above, except that port isolation can't work to isolate + RESOLVE requests into a proper session, since we don't know which + port will eventually be used when we connect to the returned + address. + C) A request on a DNSPort. We have no way of knowing which + address/port will be used to connect to the requested address. + + When B or C is required but problematic, we could favor the use of + AutomapHostsOnResolve. + +Interface: + + We propose that {SOCKS,Natd,Trans,DNS}ListenAddr be deprecated in + favor of an expanded {SOCKS,Natd,Trans,DNS}Port syntax: + + ClientPortLine = OptionName SP (Addr ":")? Port (SP Options?) + OptionName = "SOCKSPort" / "NatdPort" / "TransPort" / "DNSPort" + Addr = An IPv4 address / an IPv6 address surrounded by brackets. + If optional, we default to 127.0.0.1 + Port = An integer from 1 through 65535 inclusive + Options = Option + Options = Options SP Option + Option = IsolateOption / GroupOption + GroupOption = "SessionGroup=" UINT + IsolateOption = OptNo ("IsolateDestPort" / "IsolateDestAddr" / + "IsolateSOCKSUser"/ "IsolateClientProtocol" / + "IsolateClientAddr") OptPlural + OptNo = "No" ? + OptPlural = "s" ? + SP = " " + UINT = An unsigned integer + + All options are case-insensitive. + + The "IsolateSOCKSUser" and "IsolateClientAddr" options are on by + default; "NoIsolateSOCKSUser" and "NoIsolateClientAddr" respectively + turn them off. The IsolateDestPort and IsolateDestAddr and + IsolateClientProtocol options are off by default. NoIsolateDestPort and + NoIsolateDestAddr and NoIsolateClientProtocol have no effect. + + Given a set of ClientPortLines, streams must NOT be placed on the same + circuit if ANY of the following hold: + + * They were sent to two different client ports, unless the two + client ports both specify a "SessionGroup" option with the same + integer value. + * At least one was sent to a client port with the IsolateDestPort + active, and they have different destination ports. + * At least one was sent to a client port with IsolateDestAddr + active, and they have different destination addresses. + * At least one was sent to a client port with IsolateClientProtocol + active, and they use different protocols (where SOCKS4, SOCKS4a, + SOCKS5, TransPort, NatdPort, and DNS are the protocols in question) + * At least one was sent to a client port with IsolateSOCKSUser + active, and they have different SOCKS username/password values + configurations. (For the purposes of this option, the + username/password pair of ""/"" is distinct from SOCKS without + authentication, and both are distinct from any non-SOCKS client's + non-authentication.) + * At least one was sent to a client port with IsolateClientAddr + active, and they came from different client addresses. (For the + purpose of this option, any local interface counts as the same + address. So if the host is configured with addresses 10.0.0.1, + 192.0.32.10, and 127.0.0.1, then traffic from those addresses can + leave on the same circuit, but traffic to from 10.0.0.2 (for + example) could not share a circuit with any of them.) + + These rules apply regardless of whether the streams are active at the + same time. In other words, if the rules say that streams A and B must + not be on the same circuit, and stream A is attached to circuit X, + then stream B must never be attached to stream X, even if stream A is + closed first. + +Alternative Interface: + + We're cramming a lot onto one line in the design above. Perhaps + instead it would be a better idea to have grouped lines of the form: + + StreamGroup 1 + SOCKSPort 9050 + TransPort 9051 + IsolateDestPort 1 + IsolateClientProtocol 0 + EndStreamGroup + + StreamGroup 2 + SOCKSPort 9052 + DNSPort 9053 + IsolateDestAddr 1 + EndStreamGroup + + This would be equivalent to: + SOCKSPort 9050 SessionGroup=1 IsolateDestPort NoIsolateClientProtocol + TransPort 9051 SessionGroup=1 IsolateDestPort NoIsolateClientProtocol + SOCKSPort 9052 SessionGroup=2 IsolateDestAddr + DNSPort 9053 SessionGroup=2 IsolateDestAddr + + But it would let us extend range of allowed options later without + having client port lines group without bound. For example, we might + give different circuit building parameters to different session + groups. + +Example of use: + + Suppose that we want to use a web browser, an IRC client, and a SSH + client all at the same time. Let's assume that we want web traffic to + be isolated from all other traffic, even if the browser makes + connections to ports usually used for IRC or SSH. Let's also assume + that IRC and SSH are both used for relatively long-lived connections, + and we want to keep all IRC/SSH sessions separate from one another. + + In this case, we could say: + + SOCKSPort 9050 + SOCKSPort 9051 IsolateDestAddr IsolateDestPort + + We would then configure our browser to use 9050 and our IRC/SSH + clients to use 9051. + +Advanced example of use, #2: + + Suppose that we have a bunch of applications, and we launch them all + using torsocks, and we want to keep each applications isolated from + one another. We just create a shell script, "torlaunch": + #!/bin/bash + export TORSOCKS_USERNAME="$1" + exec torsocks $@ + And we configure our SOCKSPort with IsolateSOCKSUser. + + Or if we're on Linux and we want to isolate by application invocation, + we would change the TORSOCKS_USERNAME line to: + + export TORSOCKS_USERNAME="`cat /proc/sys/kernel/random/uuid`" + +Advanced example of use, #2: + + Now suppose that we want to achieve the benefits of the first example + of use, but we are stuck using transparent proxies. Let's suppose + this is Linux. + + TransPort 9090 + TransPort 9091 IsolateDestAddr IsolateDestPort + DNSPort 5353 + AutomapHostsOnResolve 1 + + Here we use the iptables --cmd-owner filter to distinguish which + command is originating the packets, directing traffic from our irc + client and our SSH client to port 9091, and directing other traffic to + 9090. Using AutomapHostsOnResolve will confuse ssh in its default + configuration; we'll need to find a way around that. + +Security Risks: + + Disabling IsolateClientAddr is a pretty bad idea. + + Setting up a set of applications to use this system effectively is a + big problem. It's likely that lots of people who try to do this will + mess it up. We should try to see which setups are sensible, and see + if we can provide good feedback to explain which streams are isolated + how. + +Performance Risks: + + This proposal will result in clients building many more circuits than + they do today. To avoid accidentally hammering the network, we should + have in-process limits on the maximum circuit creation rate and the + total maximum client circuits. + +Specification: + + The Tor client circuit selection process is not entirely specified. + Any client circuit specification must take these changes into account. + +Implementation notes: + + The more obvious ways to implement the "find a good circuit to attach + to" part of this proposal involve doing an O(n_circuits) operation + every time we have a stream to attach. We already do such an + operation, so it's not as if we need to hunt for fancy ways to make it + O(1). What will be harder is implementing the "launch circuits as + needed" part of the proposal. Still, it should come down to "a simple + matter of programming." + + The SOCKS4 spec has the client provide authentication info when it + connects; accepting such info is no problem. But the SOCKS5 spec has + the client send a list of known auth methods, then has the server send + back the authentication method it chooses. We'll need to update the + SOCKS5 implementation so it can accept user/password authentication if + it's offered. + + If we use the second syntax for describing these options, we'll want + to add a new "section-based" entry type for the configuration parser. + Not a huge deal; we already have kludged up something similar for + hidden service configurations. + + Opening circuits for predicted ports has the potential to get a little + more complicated; we can probably get away with the existing + algorithm, though, to see where its weak points are and look for + better ones. + + Perhaps we can get our next-gen HTTP proxy to communicate browser tab + or session into to tor via authentication, or have torbutton do it + directly. More design is needed here, though. + +Alternative designs: + + The implementation of this option may want to consider cases where the + same exit node is shared by two or more circuits and + IsolateStreamsByPort is in force. Since one possible use of the option + is to reduce the opportunity of Exit Nodes to attack traffic from the + same source on multiple ports, the implementation may need to ensure + that circuits reserved for the exclusive use of given ports do not + share the same exit node. On the other hand, if our goal is only that + streams should be unlinkable, deliberately shunting them to different + exit nodes is unnecessary and slightly counterproductive. + + Earlier versions of this design included a mechanism to isolate + _particular_ destination ports and addresses, so that traffic sent to, + say, port 22 would never share a port with any traffic *not* sent to + port 22. You can achieve this here by having all applications that + send traffic to one of these ports use a separate SOCKSPort, and + then setting IsolateDestPorts on that SOCKSPort. + +Lingering questions: + + I suspect there are issues remaining with DNS and TransPort users, and + that my "just use AutomapHostsOnResolve" suggestion may be + insufficient. diff --git a/doc/spec/proposals/ideas/xxx-separate-streams-by-port.txt b/doc/spec/proposals/ideas/xxx-separate-streams-by-port.txt deleted file mode 100644 index f26c1e580f..0000000000 --- a/doc/spec/proposals/ideas/xxx-separate-streams-by-port.txt +++ /dev/null @@ -1,59 +0,0 @@ -Filename: xxx-separate-streams-by-port.txt -Title: Separate streams across circuits by destination port -Author: Robert Hogan -Created: 21-Oct-2008 -Status: Draft - -Here's a patch Robert Hogan wrote to use only one destination port per -circuit. It's based on a wishlist item Roger wrote, to never send AIM -usernames over the same circuit that we're hoping to browse anonymously -through. The remaining open question is: how many extra circuits does this -cause an ordinary user to create? My guess is not very many, but I'm wary -of putting this in until we have some better estimate. On the other hand, -not putting it in means that we have a known security flaw. Hm. - -Index: src/or/or.h -=================================================================== ---- src/or/or.h (revision 17143) -+++ src/or/or.h (working copy) -@@ -1874,6 +1874,7 @@ - - uint8_t state; /**< Current status of this circuit. */ - uint8_t purpose; /**< Why are we creating this circuit? */ -+ uint16_t service; /**< Port conn must have to use this circuit. */ - - /** How many relay data cells can we package (read from edge streams) - * on this circuit before we receive a circuit-level sendme cell asking -Index: src/or/circuituse.c -=================================================================== ---- src/or/circuituse.c (revision 17143) -+++ src/or/circuituse.c (working copy) -@@ -62,10 +62,16 @@ - return 0; - } - -- if (purpose == CIRCUIT_PURPOSE_C_GENERAL) -+ if (purpose == CIRCUIT_PURPOSE_C_GENERAL) { - if (circ->timestamp_dirty && - circ->timestamp_dirty+get_options()->MaxCircuitDirtiness <= now) - return 0; -+ /* If the circuit is dirty and used for services on another port, -+ then it is not suitable. */ -+ if (circ->service && conn->socks_request->port && -+ (circ->service != conn->socks_request->port)) -+ return 0; -+ } - - /* decide if this circ is suitable for this conn */ - -@@ -1351,7 +1357,9 @@ - if (connection_ap_handshake_send_resolve(conn) < 0) - return -1; - } -- -+ if (conn->socks_request->port -+ && (TO_CIRCUIT(circ)->purpose == CIRCUIT_PURPOSE_C_GENERAL)) -+ TO_CIRCUIT(circ)->service = conn->socks_request->port; - return 1; - } -