Revise proposal 171 from start to finish

The big semantic change is to make the IsolateFoo options exist on a per-client-port basis.
2024-11-10 21:23:58 +01:00 · 2010-11-29 14:29:47 -05:00 · 2010-11-29 14:29:47 -05:00 · a1e46c5393
commit a1e46c5393
parent c4d2a55a88
1 changed file with 326 additions and 75 deletions
--- a/doc/spec/proposals/171-separate-streams.txt
+++ b/doc/spec/proposals/171-separate-streams.txt
@@ -1,99 +1,350 @@
-Filename: 171-separate-streams-by-port-or-host.txt
+Filename: 171-separate-streams.txt
-Title: Separate streams across circuits by destination port or destination host
+Title: Separate streams across circuits by connection metadata
-Author: Robert Hogan, Jacob Appelbaum, Damon McCoy
+Author: Robert Hogan, Jacob Appelbaum, Damon McCoy, Nick Mathewson
 Created: 21-Oct-2008
-Modified: 30-Aug-2010
+Modified: 7-Dec-2010
-Status: Draft
+Status: Open
 Summary:
  We propose a new set of options to isolate unrelated streams from one
  another, putting them on separate circuits so that semantically
  unrelated traffic is not inadvertently made linkable.
 Motivation:
-Streams are currently attached to circuits without regard to their content,
+  Currently, Tor attaches regular streams (that is, ones not carrying
-destination host, or destination port. We propose three options,
+  rendezvous or directory traffic) to circuits based only on whether Tor
-IsolateBySOCKSUser, IsolateStreamsByPort and IsolateStreamsByHost to change the
+  circuit's current exit node supports the destination, and whether the
-default behavior.
+  circuit has been dirty (that is, in use) for too long.
-The contents of some streams will always have revealing plain text information;
+  This means that traffic that would otherwise be unrelated sometimes
-these streams should be treated differently than other streams that may or may
+  gets sent over the same circuit, allowing the exit node to link such
-not have unencrypted PII content. DNS, with the exception of DNSCurve, is
+  streams with certainty, and allowing other parties to link such
-always unencrypted. It is reasonable to assume that other protocols may exist
+  streams probabilistically.
 that have a similar issue and may cause user concern. It is also the case that
 we must balance network load issues and stream privacy. The Tor network will not
 currently scale to one circuit per application connection nor should it anytime
 soon.
-Circuits are currently created with a few constraints and are rotated within
+  Older versions of onion routing tried to address this problem by
-a reasonable time window. This allows a rogue exit node to correlate all
+  sending every stream over a separate circuit; performance issues made
-streams on a given circuit.
+  this unfeasible. Moreover, in the presence of a localized adversary,
  separating streams by circuits increases the odds that, for any given
  linked set of streams, at least one will go over a compromised
  circuit.
  Therefore we ought to look for ways to allow streams that ought to be
  linked to travel over a single circuit, while keeping streams that
  ought not be linked isolated to separate circuits.
 Discussion:
  Let's call a series of inherently-linked streams (like a set of
  streams downloading objects from the same webpage, or a browsing
  session where the user requests several related webpages) a "Session".
  "Sessions" are necessarily a fuzzy concept.  While users typically
  consider some activities as wholly unrelated to each other ("My IM
  session has nothing to do with my web browsing!"), the boundaries
  between activities are sometimes hard to determine.  If I'm reading
  lolcats in one browser tab and reading about treatments for an
  embarrassing disease in another, those are probably separate sessions.
  If I search for a forum, log in, read it for a while, and post a few
  messages on unrelated topics, that's probably all the same session.
  So with the proviso that no automated process can identify sessions
  100% accurately, let's see which options we have available.
  Generally, all the streams on a session come from a single
  application.  Unfortunately, isolating streams by application
  automatically isn't feasible, given the lack of any nice
  cross-platform way to tell which local process originated a given
  connection.  (Yes, lsof works.  But a quick review of the lsof code
  should be sufficient to scare you away from thinking there is a
  portable option, much less a portable O(1) option.)  So instead, we'll
  have to use some other aspect of a Tor request as a proxy for the
  application.
  Generally, traffic from separate applications is not in the same
  session.
  With some applications (IRC, for example), each stream is a session.
  Some applications (most notably web browsing) can't be meaningfully
  split into sessions without inspecting the traffic itself and
  maintaining a lot of state.
  How well do ports correspond to sessions?  Early versions of this
  proposal focused on using destination ports as a proxy for
  application, since a connection to port 22 for SSH is probably not in
  the same session as one to port 80. This works better with some
  applications than others, though: while SSH users typically
  know when they're on port 22 and when they aren't, a web browser can
  be coaxed (through img URLs or any number of related tricks) into
  connecting to any port at all.  Moreover, when Tor gets a DNS lookup
  request, it doesn't know in advance which port the resulting address
  will be used to connect to.
  So in summary, each kind of traffic wants to follow different rules,
  and assuming the existence of a web browser and a hostile web page or
  exit node, we can't tell one kind of traffic from another by simply
  looking at the destination:port of the traffic.
  Fortunately, we're not doomed.
 Design:
-We propose two options for isolation of streams that lessen the observability
+  When a stream arrives at Tor, we have the following data to examine:
-and linkability of the Tor client's traffic.
+    1) The destination address
    2) The destination port (unless this is a DNS lookup)
    3) The protocol used by the application to send the stream to Tor:
       SOCKS4, SOCKS4A, SOCKS5, or whatever local "transparent proxy"
       mechanism the kernel gives us.
    4) The port used by the application to send the stream to Tor --
       that is, the SOCKSListenAddress or TransListenAddress that the
       application used, if we have more than one.
    5) The SOCKS username and password, if any.
    6) The source address and port for the application.
-IsolateStreamsByPort will take a list of ports or optionally the keyword 'All'
+  We propose to use 3, 4, and 5 as a backchannel for applications to
-in place of a port list. The use of the keyword 'All' will ensure that all
+  tell Tor about different sessions.  Rather than running only one
-application connections attached to streams will be isolated to separate
+  SOCKSPort, a Tor user who would prefer better session isolation should
-circuits by port number.
+  run multiple SOCKSPorts/TransPorts, and configure different
  applications to use separate ports. Applications that support SOCKS
  authentication can further be separated on a single port by their
  choice of username/password.  Streams sent to separate ports or using
  different authentication information should never be sent over the
  same circuit.  We allow each port to have its own settings for
  isolation based on destination port, destination address, or both.
-IsolateStreamsByHost will take a boolean value. When enabled, all application
+  Handling DNS can be a challenge.  We can get hostnames by one of three
-connections, regardless of port number will be isolated with separate circuits
+  means:
 per host. If this option is enabled, we should ensure that the client has a
 reasonable number of pre-built circuits to ensure perceived performance. This
 should also intentionally limit the total number of circuits a client will
 build to ten circuits to prevent abuse and load on the network. This is a
 trade-off of performance for anonymity. Tor will issue a warning if a client
 encounters this limit.
-IsolateBySOCKSUser will take a boolean value. When enabled, all application
+    A) A SOCKS4a request, or a SOCKS5 request with a hostname.  This
-connections, regardless of port number will be isolated with separate circuits
+       case is handled trivially using the rules above.
-per SOCKS username. This options ensures that any two streams that were created
+    B) A RESOLVE request on a SOCKSPort.  This case is handled using the
-with different SOCKS usernames will be sent over different circuits.  The empty
+       rules above, except that port isolation can't work to isolate
-username will be treated as its own username different from all other usernames.
+       RESOLVE requests into a proper session, since we don't know which
       port will eventually be used when we connect to the returned
       address.
    C) A request on a DNSPort.  We have no way of knowing which
       address/port will be used to connect to the requested address.
-Security implications:
+  When B or C is required but problematic, we could favor the use of
  AutomapHostsOnResolve.
-It is believed that the proposed changes will improve the anonymity for end
+Interface:
 user stream privacy.  The end user will no longer link all streams at a single
 exit node during a given time window.
-There is a possible attack where a hostile web page possibly in collusion with
+  We propose that {SOCKS,Natd,Trans,DNS}ListenAddr be deprecated in
-an exit node contains image links for images at (say) "evil.example.com:53" and
+  favor of an expanded {SOCKS,Natd,Trans,DNS}Port syntax:
-"evil.example.com:31337", and thereby (if they're lucky) correlate port-80
+
-circuits with port-53 and port-31337 circuits.
+  ClientPortLine = OptionName SP (Addr ":")? Port (SP Options)?
  OptionName = "SOCKSPort" / "NatdPort" / "TransPort" / "DNSPort"
  Addr = An IPv4 address / an IPv6 address surrounded by brackets.
         If omitted, we default to 127.0.0.1
  Port = An integer from 1 through 65535 inclusive
  Options = Option
  Options = Options SP Option
  Option = IsolateOption / GroupOption
  GroupOption = "SessionGroup=" UINT
  IsolateOption =  OptNo ("IsolateDestPort" / "IsolateDestAddr" /
         "IsolateSOCKSUser"/ "IsolateClientProtocol" /
         "IsolateClientAddr") OptPlural
  OptNo = "No" ?
  OptPlural = "s" ?
  SP = " "
  UINT = An unsigned integer
  All options are case-insensitive.
  The "IsolateSOCKSUser" and "IsolateClientAddr" options are on by
  default; "NoIsolateSOCKSUser" and "NoIsolateClientAddr" respectively
  turn them off.  The IsolateDestPort and IsolateDestAddr and
  IsolateClientProtocol options are off by default.  NoIsolateDestPort and
  NoIsolateDestAddr and NoIsolateClientProtocol have no effect.
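  As a rough illustration of the grammar and defaults above, here is a
  minimal Python sketch of a ClientPortLine parser.  It is not Tor's
  configuration code: the function names, the lowercase flag keys, and
  the choice to return a plain tuple are all assumptions made for this
  example, and address validation is deliberately omitted.

```python
import re

# Defaults as stated in the text: SOCKSUser and ClientAddr isolation are on.
DEFAULTS = {
    "isolatedestport": False,
    "isolatedestaddr": False,
    "isolatesocksuser": True,
    "isolateclientprotocol": False,
    "isolateclientaddr": True,
}
OPTION_NAMES = {"socksport", "natdport", "transport", "dnsport"}

# OptNo ("No") prefix, option name, OptPlural ("s") suffix; matched in lowercase.
_ISOLATE_RE = re.compile(
    r"(no)?(isolate(?:destport|destaddr|socksuser|clientprotocol|clientaddr))s?"
)
_GROUP_RE = re.compile(r"sessiongroup=(\d+)")


def parse_client_port_options(tokens):
    """Return (flags, session_group) for a list of option tokens."""
    flags = dict(DEFAULTS)
    session_group = None
    for tok in tokens:
        low = tok.lower()  # all options are case-insensitive
        m = _GROUP_RE.fullmatch(low)
        if m:
            session_group = int(m.group(1))
            continue
        m = _ISOLATE_RE.fullmatch(low)
        if not m:
            raise ValueError(f"unrecognized option: {tok!r}")
        flags[m.group(2)] = m.group(1) is None  # "No" prefix turns it off
    return flags, session_group


def parse_client_port_line(line):
    """Parse e.g. 'SOCKSPort [::1]:9050 SessionGroup=1 IsolateDestPort'."""
    parts = line.split()
    if len(parts) < 2 or parts[0].lower() not in OPTION_NAMES:
        raise ValueError(f"not a client port line: {line!r}")
    # rpartition keeps a bracketed IPv6 address intact on the left side.
    addr, _, port_text = parts[1].rpartition(":")
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    flags, group = parse_client_port_options(parts[2:])
    return parts[0], addr or "127.0.0.1", port, flags, group
```

  For example, parse_client_port_line("SOCKSPort 9051 IsolateDestAddr")
  leaves IsolateSOCKSUser and IsolateClientAddr at their default "on"
  values and listens on the default address.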
  Given a set of ClientPortLines, streams must NOT be placed on the same
  circuit if ANY of the following hold:
    * They were sent to two different client ports, unless the two
      client ports both specify a "SessionGroup" option with the same
      integer value.
    * At least one was sent to a client port with IsolateDestPort
      active, and they have different destination ports.
    * At least one was sent to a client port with IsolateDestAddr
      active, and they have different destination addresses.
    * At least one was sent to a client port with IsolateClientProtocol
      active, and they use different protocols (where SOCKS4, SOCKS4a,
      SOCKS5, TransPort, NatdPort, and DNS are the protocols in question)
    * At least one was sent to a client port with IsolateSOCKSUser
      active, and they have different SOCKS username/password
      configurations.  (For the purposes of this option, the
      username/password pair of ""/"" is distinct from SOCKS without
      authentication, and both are distinct from any non-SOCKS client's
      non-authentication.)
    * At least one was sent to a client port with IsolateClientAddr
      active, and they came from different client addresses.  (For the
      purpose of this option, any local interface counts as the same
      address.  So if the host is configured with addresses 10.0.0.1,
      192.0.32.10, and 127.0.0.1, then traffic from those addresses can
      leave on the same circuit, but traffic from 10.0.0.2 (for
      example) could not share a circuit with any of them.)
  These rules apply regardless of whether the streams are active at the
  same time.  In other words, if the rules say that streams A and B must
  not be on the same circuit, and stream A is attached to circuit X,
  then stream B must never be attached to circuit X, even if stream A is
  closed first.
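  The rules above can be read as a pairwise predicate over streams.  The
  following Python sketch is one way to express it; it is illustrative
  only.  The ClientPort and Stream classes, their field names, and the
  convention that client_addr is already normalized (so that every local
  interface maps to the same value) are assumptions of this example.
  Treating a DNS lookup's missing destination port as different from any
  real port is likewise an assumption, not something the rules specify.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ClientPort:
    name: str                          # hypothetical identity, e.g. "SOCKSPort 9050"
    session_group: Optional[int] = None
    isolate_dest_port: bool = False
    isolate_dest_addr: bool = False
    isolate_client_protocol: bool = False
    isolate_socks_user: bool = True    # on by default
    isolate_client_addr: bool = True   # on by default


@dataclass(frozen=True)
class Stream:
    port: ClientPort
    dest_addr: str
    dest_port: Optional[int]           # None for a DNS lookup (assumption)
    protocol: str                      # "SOCKS4", "SOCKS4a", "SOCKS5", "TransPort", ...
    socks_auth: Optional[Tuple[str, str]]  # None means no SOCKS authentication
    client_addr: str                   # pre-normalized: local interfaces share one value


def auth_key(s: Stream):
    """("", "") differs from SOCKS without auth, and both differ from non-SOCKS."""
    if not s.protocol.upper().startswith("SOCKS"):
        return ("non-socks",)
    if s.socks_auth is None:
        return ("socks-noauth",)
    return ("socks-auth",) + s.socks_auth


def must_isolate(a: Stream, b: Stream) -> bool:
    """True if ANY rule forbids placing a and b on the same circuit."""
    pa, pb = a.port, b.port
    if pa != pb:
        same_group = pa.session_group is not None and pa.session_group == pb.session_group
        if not same_group:
            return True
    # "At least one was sent to a client port with ... active"
    if (pa.isolate_dest_port or pb.isolate_dest_port) and a.dest_port != b.dest_port:
        return True
    if (pa.isolate_dest_addr or pb.isolate_dest_addr) and a.dest_addr != b.dest_addr:
        return True
    if (pa.isolate_client_protocol or pb.isolate_client_protocol) and a.protocol != b.protocol:
        return True
    if (pa.isolate_socks_user or pb.isolate_socks_user) and auth_key(a) != auth_key(b):
        return True
    if (pa.isolate_client_addr or pb.isolate_client_addr) and a.client_addr != b.client_addr:
        return True
    return False
```

  Because the rules hold even after a stream closes, a caller would need
  to evaluate this predicate against every stream a circuit has ever
  carried, not only the ones currently open.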
 Alternative Interface:
  We're cramming a lot onto one line in the design above.  Perhaps
  instead it would be a better idea to have grouped lines of the form:
    StreamGroup 1
    SOCKSPort 9050
    TransPort 9051
    IsolateDestPort 1
    IsolateClientProtocol 0
    EndStreamGroup
    StreamGroup 2
    SOCKSPort 9052
    DNSPort 9053
    IsolateDestAddr 1
    EndStreamGroup
  This would be equivalent to:
   SOCKSPort 9050 SessionGroup=1 IsolateDestPort NoIsolateClientProtocol
   TransPort 9051 SessionGroup=1 IsolateDestPort NoIsolateClientProtocol
   SOCKSPort 9052 SessionGroup=2 IsolateDestAddr
   DNSPort   9053 SessionGroup=2 IsolateDestAddr
  But it would let us extend the range of allowed options later without
  having client port lines grow without bound.  For example, we might
  give different circuit building parameters to different session
  groups.
 Example of use:
  Suppose that we want to use a web browser, an IRC client, and an SSH
  client all at the same time.  Let's assume that we want web traffic to
  be isolated from all other traffic, even if the browser makes
  connections to ports usually used for IRC or SSH.  Let's also assume
  that IRC and SSH are both used for relatively long-lived connections,
  and we want to keep all IRC/SSH sessions separate from one another.
  In this case, we could say:
    SOCKSPort 9050
    SOCKSPort 9051 IsolateDestAddr IsolateDestPort
  We would then configure our browser to use 9050 and our IRC/SSH
  clients to use 9051.
 Advanced example of use, #1:
  Suppose that we have a bunch of applications, and we launch them all
  using torsocks, and we want to keep each application isolated from
  one another.  We just create a shell script, "torlaunch":
    #!/bin/bash
    export TORSOCKS_USERNAME="$1"
    exec torsocks "$@"
  And we configure our SOCKSPort with IsolateSOCKSUser.
  Or if we're on Linux and we want to isolate by application invocation,
  we would change the TORSOCKS_USERNAME line to:
    export TORSOCKS_USERNAME="`cat /proc/sys/kernel/random/uuid`"
 Advanced example of use, #2:
  Now suppose that we want to achieve the benefits of the first example
  of use, but we are stuck using transparent proxies.  Let's suppose
  this is Linux.
    TransPort 9090
    TransPort 9091 IsolateDestAddr IsolateDestPort
    DNSPort 5353
    AutomapHostsOnResolve 1
  Here we use the iptables --cmd-owner filter to distinguish which
  command is originating the packets, directing traffic from our irc
  client and our SSH client to port 9091, and directing other traffic to
  9090.  Using AutomapHostsOnResolve will confuse ssh in its default
  configuration; we'll need to find a way around that.
 Security Risks:
  Disabling IsolateClientAddr is a pretty bad idea.
  Setting up a set of applications to use this system effectively is a
  big problem.  It's likely that lots of people who try to do this will
  mess it up.  We should try to see which setups are sensible, and see
  if we can provide good feedback to explain which streams are isolated
  how.
 Performance Risks:
  This proposal will result in clients building many more circuits than
  they do today.  To avoid accidentally hammering the network, we should
  have in-process limits on the maximum circuit creation rate and the
  total maximum client circuits.
 Specification:
-The Tor client circuit selection process is not entirely specified. Any client
+  The Tor client circuit selection process is not entirely specified.
-circuit specification must take these changes into account.
+  Any client circuit specification must take these changes into account.
 Compatibility:
 The proposed changes should not create any compatibility issues. New Tor clients
 will be able to take advantage of this without any modification to the network.
 Implementation:
 It is further proposed that IsolateStreamsByPort will be enabled by default
 for port 22, 53, and port 80.
 It is further proposed that IsolateStreamsByHost will be disabled by default.
 Implementation notes:
-The implementation of this option may want to consider cases where the same
+  The more obvious ways to implement the "find a good circuit to attach
-exit node is shared by two or more circuits and IsolateStreamsByPort is in
+  to" part of this proposal involve doing an O(n_circuits) operation
-force. Since the purpose of the option is to reduce the opportunity of Exit
+  every time we have a stream to attach.  We already do such an
-Nodes to attack traffic from the same source on multiple ports, the
+  operation, so it's not as if we need to hunt for fancy ways to make it
-implementation may need to ensure that circuits reserved for the exclusive use
+  O(1).  What will be harder is implementing the "launch circuits as
-of given ports do not share the same exit node.
+  needed" part of the proposal.  Still, it should come down to "a simple
  matter of programming."
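  A minimal sketch of the O(n_circuits) attach step, under simplifying
  assumptions: it ignores exit policies and circuit dirtiness, stands in
  for "launch circuits as needed" by simply creating a new object, and
  takes the isolation predicate as a parameter.  The Circuit class and
  both function names are hypothetical.

```python
from typing import Callable, List, Optional


class Circuit:
    def __init__(self, ident: int):
        self.ident = ident
        # Every stream ever attached, including closed ones: the isolation
        # rules apply regardless of whether streams are active together.
        self.history: List[object] = []


def find_circuit(circuits: List[Circuit], stream,
                 must_isolate: Callable[[object, object], bool]) -> Optional[Circuit]:
    """Linear scan: return the first circuit with no conflicting stream."""
    for circ in circuits:
        if not any(must_isolate(stream, old) for old in circ.history):
            return circ
    return None


def attach(circuits: List[Circuit], stream,
           must_isolate: Callable[[object, object], bool]) -> Circuit:
    """Attach to a compatible circuit, creating one if none exists."""
    circ = find_circuit(circuits, stream, must_isolate)
    if circ is None:
        circ = Circuit(len(circuits))  # placeholder for launching a real circuit
        circuits.append(circ)
    circ.history.append(stream)
    return circ
```

  With a toy predicate such as lambda a, b: a != b over session labels,
  attaching "web", then "irc", then "web" again would reuse the first
  circuit for the third stream and keep "irc" on its own circuit.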
-Circuits should not be shared by unique clients. Tor should check to ensure
+  The SOCKS4 spec has the client provide authentication info when it
-that peer IP addresses are identical when they connect to the SOCKS listener or
+  connects; accepting such info is no problem.  But the SOCKS5 spec has
-the TransPort listener before sharing a circuit. If the addresses are not
+  the client send a list of known auth methods, then has the server send
-identical, Tor should ensure that the circuits are not shared.
+  back the authentication method it chooses.  We'll need to update the
  SOCKS5 implementation so it can accept user/password authentication if
  it's offered.
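  The SOCKS5 method negotiation could be sketched as below.  The method
  codes come from the SOCKS5 specification (0x00 for no authentication,
  0x02 for username/password, 0xFF for no acceptable method); the
  function name and the policy of preferring username/password whenever
  the client offers it are assumptions of this example, not a
  description of Tor's implementation.

```python
SOCKS_VERSION = 0x05
METHOD_NO_AUTH = 0x00
METHOD_USERPASS = 0x02
METHOD_NONE_ACCEPTABLE = 0xFF


def choose_socks5_method(greeting: bytes) -> bytes:
    """Given the client's greeting (VER, NMETHODS, METHODS...), build the reply."""
    if len(greeting) < 2 or greeting[0] != SOCKS_VERSION:
        raise ValueError("not a SOCKS5 greeting")
    nmethods = greeting[1]
    methods = greeting[2:2 + nmethods]
    if len(methods) != nmethods:
        raise ValueError("truncated method list")
    # Assumed policy: take username/password when offered, so the
    # credentials can serve as an isolation key; else fall back to no auth.
    if METHOD_USERPASS in methods:
        chosen = METHOD_USERPASS
    elif METHOD_NO_AUTH in methods:
        chosen = METHOD_NO_AUTH
    else:
        chosen = METHOD_NONE_ACCEPTABLE
    return bytes([SOCKS_VERSION, chosen])
```

  A client that offers both methods would then be asked for a username
  and password, which IsolateSOCKSUser can use to separate its streams.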
-Performance and scalability notes:
+  If we use the second syntax for describing these options, we'll want
  to add a new "section-based" entry type for the configuration parser.
  Not a huge deal; we already have kludged up something similar for
  hidden service configurations.
-It is further proposed that IsolateStreamsByPort will be enabled by default for
+  Opening circuits for predicted ports has the potential to get a little
-all ports after a reasonable assessment is performed. Specifically, we should
+  more complicated; we can probably get away with the existing
-determine the impact this option has on Tor clients and the Tor network.
+  algorithm, though, to see where its weak points are and look for
  better ones.
  Perhaps we can get our next-gen HTTP proxy to communicate browser tab
  or session info to Tor via authentication, or have torbutton do it
  directly.  More design is needed here, though.
 Alternative designs:
  The implementation of this option may want to consider cases where the
  same exit node is shared by two or more circuits and
  IsolateStreamsByPort is in force.  Since one possible use of the option
  is to reduce the opportunity of Exit Nodes to attack traffic from the
  same source on multiple ports, the implementation may need to ensure
  that circuits reserved for the exclusive use of given ports do not
  share the same exit node.  On the other hand, if our goal is only that
  streams should be unlinkable, deliberately shunting them to different
  exit nodes is unnecessary and slightly counterproductive.
  Earlier versions of this design included a mechanism to isolate
  _particular_ destination ports and addresses, so that traffic sent to,
  say, port 22 would never share a circuit with any traffic *not* sent to
  port 22.  You can achieve this here by having all applications that
  send traffic to one of these ports use a separate SOCKSPort, and
  then setting IsolateDestPorts on that SOCKSPort.
 Lingering questions:
  I suspect there are issues remaining with DNS and TransPort users, and
  that my "just use AutomapHostsOnResolve" suggestion may be
  insufficient.