How to hand out bridges. Divide bridges into 'strategies' as they come in. Do this uniformly at random for now. For each strategy, we'll hand out bridges in a different way to clients. This document describes two strategies: email-based and IP-based. 0. Notation: HMAC(k,v) : an HMAC of v using the key k. A|B: The string A concatenated with the string B. 1. Email-based. Goal: bootstrap based on one or more popular email service's sybil prevention algorithms. Parameters: HMAC -- an HMAC function P -- a time period K -- the number of bridges to send in a period. Setup: Generate two nonces, N and M. As bridges arrive, put them into a ring according to HMAC(N,ID) where ID is the bridges's identity digest. Divide time into divisions of length P. When we get an email: If it's not from a supported email service, reject it. If we already sent a response to that email address (normalized) in this period, send _exactly_ the same response. If it is from a supported service, generate X = HMAC(M,PS|E) where E is the lowercased normalized email address for the user, and where PS is the start of the currrent period. Send the first K bridges in the ring after point X. [If we want to make sure that repeat queries are given exactly the same results, then we can't let the ring change during the time period. For a long time period like a month, that's quite a hassle. How about instead just keeping a replay cache of addresses that have been answered, and sending them a "sorry, you already got your addresses for the time period; perhaps you should try these other fine distribution strategies while you wait?" response? This approach would also resolve the "Make sure you can't construct a distinct address to match an existing one" note below. -RD] [While we're at it, if we do the replay cache thing and don't need repeatable answers, we could just pick K random answers from the pool. Is it beneficial that a bridge user who knows about a clump of nodes will be sharing them with other users who know about a similar (overlapping) clump? One good aspect is against an adversary who learns about a clump this way and watches those bridges to learn other users and discover *their* bridges: he doesn't learn about as many new bridges as he might if they were randomly distributed. A drawback is against an adversary who happens to pick two email addresses in P that include overlapping answers: he can measure the difference in clumps and estimate how quickly the bridge pool is growing. -RD] [If we make the period P be mailbox-specific, and make it a random value around some mean, then we make it harder for an attacker to know when to try using his small army of gmail addresses to gather another harvest. But we also make it harder for users to know when they can try again. -RD] To normalize an email address: Start with the RFC822 address. Consider only the mailbox {???} portion of the address (username@domain). Put this into lowercase ascii. Questions: What to do with weird character encodings? Look up the RFC. Notes: Make sure that you can't force a single email address to appear in lots of different ways. IOW, if nickm@freehaven.net and NICKM@freehaven.net aren't treated the same, then I can get lots more bridges than I should. Make sure you can't construct a distinct address to match an existing one. IOW, if we treat nickm@X and nickm@Y as the same user, then anybody can register nickm@Z and use it to tell which bridges nickm@X got (or would get). Make sure that we actually check headers so we can't be trivially used to spam people. 2. IP-based. Goal: avoid handing out all the bridges to users in a similar IP space and time. Parameters: T_Flush -- how long it should take a user on a single network to see a whole cluster of bridges. N_C K -- the number of bridges we hand out in response to a single request. Setup: using an AS map or a geoip map or some other flawed input source, divide IP space into "areas" such that surveying a large collection of "areas" is hard. For v0, use /24 address blocks. Group areas into N_C clusters. Generate secrets L, M, N. Set the period P such that P*(bridges-per-cluster/K) = T_flush. Don't set P to greater than a week, or less than three hours. When we get a bridge: Based on HMAC(L,ID), assign the bridge to a cluster. Within each cluster, keep the bridges in a ring based on HMAC(M,ID). [Should we re-sort the rings for each new time period, so the ring for a given cluster is based on HMAC(M,PS|ID)? -RD] When we get a connection: If it's http, redirect it to https. Let area be the incoming IP network. Let PS be the current period. Compute X = HMAC(N, PS|area). Return the next K bridges in the ring after X. [Don't we want to compute C = HMAC(key, area) to learn what cluster to answer from, and then X = HMAC(key, PS|area) to pick a point in that ring? -RD] Need to clarify that some HMACs are for rings, and some are for partitions. How rings scale is clear. How do we grow the number of partitions? Looking at successive bits from the HMAC output is one way. 3. Open issues Denial of service attacks A good view of network topology