tor/doc/spec/proposals/ideas/xxx-bridge-disbursement.txt


How to hand out bridges.

Divide bridges into 'strategies' as they come in.  Do this uniformly
at random for now.

For each strategy, we'll hand out bridges in a different way to
clients.  This document describes two strategies: email-based and
IP-based.
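  A minimal Python sketch of the uniform random split. It assumes bridges are keyed by their identity digest (as bytes) and that an assignment is kept once made. `assign_strategy` and the strategy labels are hypothetical names, not part of any existing code.

```python
import secrets

# Hypothetical labels for the two strategies this document describes.
STRATEGIES = ("email", "ip")


def assign_strategy(bridge_id: bytes, assignments: dict) -> str:
    """Assign a newly seen bridge to a strategy uniformly at random.

    The choice is stored in `assignments`, so a bridge that is seen again
    keeps the strategy it was first given.
    """
    if bridge_id not in assignments:
        # secrets uses the OS CSPRNG, so outsiders can't predict the split.
        assignments[bridge_id] = secrets.choice(STRATEGIES)
    return assignments[bridge_id]
```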

0. Notation:

   HMAC(k,v) : an HMAC of v using the key k.

   A|B: The string A concatenated with the string B.
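  A small sketch of the notation in Python. The proposal doesn't name a hash, so HMAC-SHA256 is an assumption here, and `cat` is a hypothetical helper name for "|".

```python
import hashlib
import hmac


def HMAC(k: bytes, v: bytes) -> bytes:
    """HMAC(k, v): an HMAC of v under key k. HMAC-SHA256 is an assumption."""
    return hmac.new(k, v, hashlib.sha256).digest()


def cat(a: bytes, b: bytes) -> bytes:
    """A|B: the byte string A followed by the byte string B."""
    return a + b
```

  Plain concatenation is only unambiguous when the left operand has a fixed width (b"ab" + b"c" == b"a" + b"bc"), so encoding PS at a fixed width, such as an 8-byte integer, keeps PS|E unambiguous.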


1. Email-based.

  Goal: bootstrap based on the Sybil-prevention algorithms of one or
     more popular email services.


  Parameters:
     HMAC -- an HMAC function
     P -- a time period
     K -- the number of bridges to send in a period.

  Setup: Generate two nonces, N and M.

  As bridges arrive, put them into a ring according to HMAC(N,ID)
  where ID is the bridge's identity digest.
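  One way to keep the ring, sketched in Python under assumptions: the ring is a sorted list of (position, ID) pairs whose last entry wraps to the first, and HMAC-SHA256 stands in for the unspecified HMAC. `add_bridge` is a hypothetical name.

```python
import bisect
import hashlib
import hmac


def HMAC(k: bytes, v: bytes) -> bytes:
    # HMAC-SHA256 is an assumption; the proposal only says "an HMAC function".
    return hmac.new(k, v, hashlib.sha256).digest()


def add_bridge(ring: list, n_key: bytes, bridge_id: bytes) -> None:
    """Insert a bridge at position HMAC(N, ID), keeping the ring sorted."""
    bisect.insort(ring, (HMAC(n_key, bridge_id), bridge_id))
```

  Because each position depends only on N and the ID, the ring comes out the same whatever order the bridges arrive in.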

  Divide time into divisions of length P.
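  A sketch of computing PS, the start of the current division, assuming times are Unix seconds and that divisions are aligned to the epoch (the proposal doesn't say where divisions start). `period_start` is a hypothetical name.

```python
def period_start(now: int, period: int) -> int:
    """Return PS, the start of the length-`period` division containing `now`."""
    # Epoch-aligned divisions: [0, P), [P, 2P), ...
    return now - (now % period)
```

  For a one-day P, `period_start(int(time.time()), 86400)` gives the most recent UTC midnight.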

  When we get an email:

     If it's not from a supported email service, reject it.

     If we already sent a response to that email address (normalized)
     in this period, send _exactly_ the same response.

     If it is from a supported service, generate X = HMAC(M,PS|E) where E
     is the lowercased normalized email address for the user, and
     where PS is the start of the current period.  Send
     the first K bridges in the ring after point X.
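  A Python sketch of the lookup, under assumptions: the ring is a sorted list of (position, ID) pairs, HMAC is HMAC-SHA256, PS is encoded as an 8-byte big-endian integer, and E has already been normalized. `bridges_after` and `email_response` are hypothetical names.

```python
import bisect
import hashlib
import hmac


def HMAC(k: bytes, v: bytes) -> bytes:
    # HMAC-SHA256 is an assumption.
    return hmac.new(k, v, hashlib.sha256).digest()


def bridges_after(ring: list, x: bytes, k: int) -> list:
    """Return the IDs of the first k bridges strictly after point x, wrapping around."""
    if not ring:
        return []
    positions = [pos for pos, _ in ring]
    i = bisect.bisect_right(positions, x)
    return [ring[(i + j) % len(ring)][1] for j in range(min(k, len(ring)))]


def email_response(ring: list, m_key: bytes, ps: int, email: str, k: int) -> list:
    """Answer a request from normalized address E during the period starting at PS."""
    # X = HMAC(M, PS|E), with PS as a fixed-width 8-byte integer (an assumption).
    x = HMAC(m_key, ps.to_bytes(8, "big") + email.encode("ascii"))
    return bridges_after(ring, x, k)
```

  The answer is a pure function of M, PS, E and the ring. A repeat query in the same period therefore gets exactly the same bridges only as long as the ring itself doesn't change.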

     [If we want to make sure that repeat queries are given exactly the
      same results, then we can't let the ring change during the
      time period. For a long time period like a month, that's quite a
      hassle. How about instead just keeping a replay cache of addresses
      that have been answered, and sending them a "sorry, you already got
      your addresses for the time period; perhaps you should try these
      other fine distribution strategies while you wait?" response? This
      approach would also resolve the "Make sure you can't construct a
      distinct address to match an existing one" note below. -RD]

        [I think, if we get a replay, we need to send back the same
        answer as we did the first time, not say "try again."
        Otherwise we need to worry that an attacker can keep people
        from getting bridges by preemptively asking for them,
        or that an attacker may force them to prove they haven't
        gotten any bridges by asking. -NM]

     [While we're at it, if we do the replay cache thing and don't need
      repeatable answers, we could just pick K random answers from the
      pool. Is it beneficial that a bridge user who knows about a clump of
      nodes will be sharing them with other users who know about a similar
      (overlapping) clump? One good aspect is against an adversary who
      learns about a clump this way and watches those bridges to learn
      other users and discover *their* bridges: he doesn't learn about
      as many new bridges as he might if they were randomly distributed.
      A drawback is against an adversary who happens to pick two email
      addresses in P that include overlapping answers: he can measure
      the difference in clumps and estimate how quickly the bridge pool
      is growing. -RD]

        [Random is one more darn thing to implement; rings are already
         there. -NM]

     [If we make the period P be mailbox-specific, and make it a random
      value around some mean, then we make it harder for an attacker to
      know when to try using his small army of gmail addresses to gather
      another harvest. But we also make it harder for users to know when
      they can try again. -RD]

        [Letting the users know about when they can try again seems
         worthwhile.  Otherwise users and attackers will all probe and
         probe and probe until they get an answer.  No additional
         security will be achieved, but bandwidth will be lost. -NM]

  To normalize an email address:
     Start with the RFC822 address.  Consider only the mailbox {???}
     portion of the address (username@domain).  Put this into lowercase
     ASCII.
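  A sketch of the normalization using Python's standard `email.utils.parseaddr`. Rejecting any non-ASCII address outright is an assumption, since the encoding question below is still open. `normalize_email` is a hypothetical name.

```python
from email.utils import parseaddr
from typing import Optional


def normalize_email(header_value: str) -> Optional[str]:
    """Reduce an RFC822 address to lowercase ASCII username@domain.

    Returns None unless the parsed address has exactly one '@' with text
    on both sides. Non-ASCII addresses are rejected (an assumption).
    """
    _, addr = parseaddr(header_value)  # drops any display name
    if addr.count("@") != 1 or not addr.isascii():
        return None
    local, domain = addr.split("@")
    if not local or not domain:
        return None
    return f"{local}@{domain}".lower()
```

  Lowercasing merges case variants such as NICKM@ and nickm@, but it does not merge provider-specific aliases. Gmail, for instance, ignores dots in the username and accepts "+tag" suffixes.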

  Questions:
     What to do with weird character encodings?  Look up the RFC.

  Notes:
     Make sure that you can't force a single email address to appear
     in lots of different ways.  IOW, if nickm@freehaven.net and
     NICKM@freehaven.net aren't treated the same, then I can get lots
     more bridges than I should.

     Make sure you can't construct a distinct address to match an
     existing one.  IOW, if we treat nickm@X and nickm@Y as the same
     user, then anybody can register nickm@Z and use it to tell which
     bridges nickm@X got (or would get).

     Make sure that we actually check headers so we can't be trivially
     used to spam people.


2. IP-based.

  Goal: avoid handing out all the bridges to users in a similar IP
  space and time.

  Parameters:

     T_Flush -- how long it should take a user on a single network to
        see a whole cluster of bridges.

     N_C -- the number of clusters into which areas are grouped.

     K -- the number of bridges we hand out in response to a single
     request.

  Setup: using an AS map or a geoip map or some other flawed input
  source, divide IP space into "areas" such that surveying a large
  collection of "areas" is hard.  For v0, use /24 address blocks.
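  A sketch of the v0 area computation with Python's standard `ipaddress` module. It handles IPv4 only, since the /24 rule is IPv4-specific, and encoding the area as its 4-byte network address for HMAC input is an assumption. `area_of` and `area_bytes` are hypothetical names.

```python
import ipaddress


def area_of(addr: str) -> ipaddress.IPv4Network:
    """Return the v0 "area" for a client: the /24 block containing its IPv4 address."""
    return ipaddress.IPv4Network(f"{addr}/24", strict=False)


def area_bytes(addr: str) -> bytes:
    """Encode the area as its 4-byte network address, for use inside an HMAC."""
    return area_of(addr).network_address.packed
```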

  Group areas into N_C clusters.

  Generate secrets L, M, N.

  Set the period P such that P*(bridges-per-cluster/K) = T_Flush.
  Don't set P to more than a week or less than three hours.
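  Solving for P gives P = T_Flush * K / bridges-per-cluster, clamped to the allowed range. With illustrative (hypothetical) numbers T_Flush = 30 days, 120 bridges per cluster and K = 3, P = 30 days * 3 / 120 = 18 hours. A Python sketch, where `choose_period` is a hypothetical name:

```python
from datetime import timedelta

MIN_P = timedelta(hours=3)
MAX_P = timedelta(weeks=1)


def choose_period(t_flush: timedelta, bridges_per_cluster: int, k: int) -> timedelta:
    """Solve P * (bridges_per_cluster / K) = T_Flush for P, then clamp."""
    p = t_flush * k / bridges_per_cluster
    return max(MIN_P, min(MAX_P, p))
```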

  When we get a bridge:

     Based on HMAC(L,ID), assign the bridge to a cluster.  Within each
     cluster, keep the bridges in a ring based on HMAC(M,ID).
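  A sketch of the two HMAC roles when a bridge arrives. It assumes HMAC-SHA256 and reduces HMAC(L, ID) to a cluster index by reading it as an integer mod N_C. The proposal doesn't specify that reduction, and `cluster_of` and `add_bridge` are hypothetical names.

```python
import bisect
import hashlib
import hmac


def HMAC(k: bytes, v: bytes) -> bytes:
    # HMAC-SHA256 is an assumption.
    return hmac.new(k, v, hashlib.sha256).digest()


def cluster_of(l_key: bytes, bridge_id: bytes, n_c: int) -> int:
    """Map HMAC(L, ID) to one of N_C clusters (integer value mod N_C, an assumption)."""
    return int.from_bytes(HMAC(l_key, bridge_id), "big") % n_c


def add_bridge(clusters: list, l_key: bytes, m_key: bytes, bridge_id: bytes) -> int:
    """Place a new bridge in its cluster's ring at position HMAC(M, ID)."""
    c = cluster_of(l_key, bridge_id, len(clusters))
    bisect.insort(clusters[c], (HMAC(m_key, bridge_id), bridge_id))
    return c
```

  Keeping L and M independent means a bridge's position within its ring says nothing about which cluster it landed in. With a 256-bit HMAC and a small N_C, the bias from the modulo is negligible.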

     [Should we re-sort the rings for each new time period, so the ring
      for a given cluster is based on HMAC(M,PS|ID)? -RD]

  When we get a connection:

     If it's HTTP, redirect it to HTTPS.

     Let area be the area (for v0, the /24 block) containing the
     client's IP address.  Let PS be the start of the current period.
     Compute X = HMAC(N, PS|area).  Return the next K bridges in the
     ring after X.
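  A sketch of answering a connection. The text above doesn't say which cluster's ring to use. This version picks it with a separate, hypothetical key (`area_key`), in the spirit of the bracketed -RD note that follows. It also assumes HMAC-SHA256 and an 8-byte encoding of PS. `answer` is a hypothetical name.

```python
import bisect
import hashlib
import hmac


def HMAC(k: bytes, v: bytes) -> bytes:
    # HMAC-SHA256 is an assumption.
    return hmac.new(k, v, hashlib.sha256).digest()


def answer(clusters: list, area_key: bytes, n_key: bytes, ps: int,
           area: bytes, k: int) -> list:
    """Return the next K bridges after X = HMAC(N, PS|area) in the area's cluster."""
    # Hypothetical area-to-cluster mapping; the proposal does not fix one.
    c = int.from_bytes(HMAC(area_key, area), "big") % len(clusters)
    ring = clusters[c]
    if not ring:
        return []
    x = HMAC(n_key, ps.to_bytes(8, "big") + area)
    positions = [pos for pos, _ in ring]
    i = bisect.bisect_right(positions, x)
    return [ring[(i + j) % len(ring)][1] for j in range(min(k, len(ring)))]
```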

     [Don't we want to compute C = HMAC(key, area) to learn what cluster
      to answer from, and then X = HMAC(key, PS|area) to pick a point in
      that ring? -RD]


  We need to clarify that some HMACs are for rings (ordering bridges)
  and some are for partitions (assigning bridges to clusters).  How
  rings scale is clear.  How do we grow the number of partitions?
  Looking at successive bits of the HMAC output is one way.
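  A sketch of the successive-bits idea, assuming the partition index is the leading `bits` bits of the HMAC output, which gives 2**bits partitions. `partition` and `bridge_partition` are hypothetical names, and HMAC-SHA256 is an assumption.

```python
import hashlib
import hmac


def partition(h: bytes, bits: int) -> int:
    """Index of the partition (out of 2**bits) named by the first `bits` bits of h."""
    return int.from_bytes(h, "big") >> (len(h) * 8 - bits)


def bridge_partition(l_key: bytes, bridge_id: bytes, bits: int) -> int:
    """Partition of a bridge, using HMAC-SHA256(L, ID) as the bit source."""
    return partition(hmac.new(l_key, bridge_id, hashlib.sha256).digest(), bits)
```

  Going from 2**b to 2**(b+1) partitions splits each old partition into two halves. A bridge in partition p moves only to 2p or 2p+1, so growing never reshuffles bridges between unrelated partitions.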

3. Open issues

   Denial of service attacks
   A good view of network topology

At some point we should collect reliability statistics on our bridges.
When we say above 'give out K bridges', we might give out 2 reliable ones
and K-2 others.  We would count around the ring the same way we do now to
find them.
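  A sketch of one reading of this idea. It walks the ring from X, takes the first 2 reliable bridges it meets, and then fills the remaining K-2 slots with the next bridges in ring order, whether reliable or not. It assumes the ring is a sorted list of (position, ID) pairs and that "reliable" is a set of IDs from some future statistics. `pick_mixed` is a hypothetical name.

```python
import bisect


def pick_mixed(ring: list, x: bytes, k: int, reliable: set, n_reliable: int = 2) -> list:
    """Walk the ring from x: take the first n_reliable reliable bridges, then
    fill the remaining slots with the next bridges in ring order."""
    if not ring:
        return []
    positions = [pos for pos, _ in ring]
    i = bisect.bisect_right(positions, x)
    order = [ring[(i + j) % len(ring)][1] for j in range(len(ring))]
    chosen = [b for b in order if b in reliable][: min(n_reliable, k)]
    for b in order:
        if len(chosen) >= k:
            break
        if b not in chosen:
            chosen.append(b)
    return chosen
```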