Filename: 166-statistics-extra-info-docs.txt
Title: Including Network Statistics in Extra-Info Documents
Author: Karsten Loesing
Created: 21-Jul-2009
Target: 0.2.2
Status: Accepted

Change history:

  21-Jul-2009  Initial proposal for or-dev

Overview:

  The Tor network has grown to almost two thousand relays and millions
  of casual users over the past few years. With growth have come
  increasing performance problems and attempts by some countries to
  block access to the Tor network. In order to address these problems,
  we need to learn more about the Tor network. This proposal suggests
  measuring additional statistics and including them in extra-info
  documents to help us understand the Tor network better.

Introduction:

  As of May 2009, relays, bridges, and directories gather the following
  data for statistical purposes:

  - Relays and bridges count the number of bytes that they have pushed
    in 15-minute intervals over the past 24 hours. Relays and bridges
    include these data in extra-info documents that they send to the
    directory authorities whenever they publish their server descriptor.

  - Bridges further include a rough number of clients per country that
    they have seen in the past 48 hours in their extra-info documents.

  - Directories can be configured to count the number of clients they
    see per country in the past 24 hours and to write them to a local
    file.

  Since then, we have extended the network statistics in Tor. The new
  statistics include:

  - Directories now gather more precise statistics about connecting
    clients. Fixes include measuring in intervals of exactly 24 hours,
    counting unsuccessful requests, measuring download times, etc. The
    directories append their statistics to a local file every 24 hours.

  - Entry guards count the number of clients per country per day like
    bridges do and write them to a local file every 24 hours.

  - Relays measure statistics of the number of cells in their circuit
    queues and how much time these cells spend waiting there. Relays
    write these statistics to a local file every 24 hours.

  - Exit nodes count the number of read and written bytes on exit
    connections per port as well as the number of opened exit streams
    per port in 24-hour intervals. Exit nodes write their statistics to
    a local file.

  The following four sections contain descriptions for adding these
  statistics to the relays' extra-info documents.

Directory request statistics:

  The first type of statistics aims at measuring directory requests sent
  by clients to a directory mirror or directory authority. More
  precisely, these statistics aim at requests for v2 and v3 network
  statuses only. These directory requests are sent non-anonymously,
  either via HTTP-like requests to a directory's Dir port or tunneled
  over a 1-hop circuit.

  Measuring directory request statistics is useful for several reasons:
  First, the number of locally seen directory requests can be used to
  estimate the total number of clients in the Tor network. Second, the
  country-wise classification of requests using a GeoIP database can
  help count the relative and absolute number of users per country.
  Third, the download times can give hints about the available bandwidth
  capacity at clients.

  Directory requests do not give any hints about the contents that
  clients send or receive over the Tor network. Every client requests
  network statuses from the directories, so there are no
  anonymity-related concerns in gathering these statistics. It might be,
  though, that clients wish to hide the fact that they are connecting to
  the Tor network. Therefore, IP addresses are resolved to country codes
  in memory, events are accumulated over 24 hours, and numbers are
  rounded up to multiples of 4 or 8.

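  The rounding step described in the paragraph above can be sketched as
  follows. This is only an illustration in Python, not the code that Tor
  uses; the function name round_up_to_multiple is hypothetical.

```python
def round_up_to_multiple(count: int, multiple: int) -> int:
    """Round a non-negative count up to the next multiple of `multiple`.

    Hypothetical helper for illustration. A count that already is a
    multiple (including 0) is returned unchanged.
    """
    if count < 0 or multiple <= 0:
        raise ValueError("count must be >= 0 and multiple must be > 0")
    # Integer-only ceiling division, then scale back up.
    return -((-count) // multiple) * multiple
```

  With this helper, 9 unique IP addresses would be reported as 16 when
  rounding to multiples of 8, and 9 responses would be reported as 12
  when rounding to multiples of 4.
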
    "dirreq-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
        [At most once.]

        YYYY-MM-DD HH:MM:SS defines the end of the included measurement
        interval of length NSEC seconds (86400 seconds by default).

        A "dirreq-stats-end" line, as well as any other "dirreq-*" line,
        is only added when the relay has opened its Dir port and after 24
        hours of measuring directory requests.

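  A consumer of extra-info documents might read a "*-stats-end" line as
  sketched below. The regular expression and the function name
  parse_stats_end are assumptions made for this example. The proposal
  does not state a time zone, so the sketch returns naive datetime
  objects.

```python
import re
from datetime import datetime, timedelta

# Matches e.g.: dirreq-stats-end 2009-07-21 12:00:00 (86400 s)
STATS_END_RE = re.compile(
    r"^(?P<keyword>[a-z]+-stats-end) "
    r"(?P<end>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\((?P<nsec>\d+) s\)$"
)

def parse_stats_end(line):
    """Return (keyword, interval_start, interval_end) for a stats-end line."""
    match = STATS_END_RE.match(line.strip())
    if match is None:
        raise ValueError("not a stats-end line: %r" % line)
    end = datetime.strptime(match["end"], "%Y-%m-%d %H:%M:%S")
    # The interval start is not transmitted; derive it from NSEC.
    start = end - timedelta(seconds=int(match["nsec"]))
    return match["keyword"], start, end
```

  The same function applies to the "entry-stats-end", "cell-stats-end",
  and "exit-stats-end" lines, which share this format.
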
    "dirreq-v2-ips" CC=N,CC=N,... NL
        [At most once.]
    "dirreq-v3-ips" CC=N,CC=N,... NL
        [At most once.]

        List of mappings from two-letter country codes to the number of
        unique IP addresses that have connected from that country to
        request a v2/v3 network status, rounded up to the nearest multiple
        of 8. Only those IP addresses are counted that the directory can
        answer with a 200 OK status code.

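  Lines in the CC=N,CC=N,... format can be split into a keyword and a
  mapping with a few lines of Python. The sketch below is illustrative
  only; the function name parse_country_counts is hypothetical, and the
  example performs no validation of country codes.

```python
def parse_country_counts(line):
    """Parse e.g. 'dirreq-v3-ips de=16,us=8' into (keyword, {cc: n})."""
    keyword, _, rest = line.strip().partition(" ")
    counts = {}
    if rest:
        for item in rest.split(","):
            country, _, number = item.partition("=")
            counts[country] = int(number)
    return keyword, counts
```

  The same parser would also handle "dirreq-v2-reqs", "dirreq-v3-reqs",
  and "entry-ips" lines, because they use the same list format.
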
    "dirreq-v2-reqs" CC=N,CC=N,... NL
        [At most once.]
    "dirreq-v3-reqs" CC=N,CC=N,... NL
        [At most once.]

        List of mappings from two-letter country codes to the number of
        requests for v2/v3 network statuses from that country, rounded up
        to the nearest multiple of 8. Only those requests are counted that
        the directory can answer with a 200 OK status code.

    "dirreq-v2-share" num% NL
        [At most once.]
    "dirreq-v3-share" num% NL
        [At most once.]

        The share of v2/v3 network status requests that the directory
        expects to receive from clients based on its advertised bandwidth
        compared to the overall network bandwidth capacity. Shares are
        formatted in percent with two decimal places. Shares are
        calculated as means over the whole 24-hour interval.

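  The share can be used to extrapolate from locally seen requests to the
  whole network. The sketch below assumes, as a simplification, that the
  share is the plain ratio of advertised bandwidth to total network
  bandwidth; the actual weighting used by clients may differ. Both
  function names are hypothetical.

```python
def format_share(advertised_bw, total_bw):
    """Format a bandwidth ratio in percent with two decimal places."""
    return "%.2f%%" % (100.0 * advertised_bw / total_bw)

def estimate_total_requests(local_requests, share_text):
    """Extrapolate local requests to the network, given e.g. '0.50%'."""
    share_percent = float(share_text.rstrip("%"))
    if share_percent <= 0:
        raise ValueError("share must be positive")
    return local_requests * 100.0 / share_percent
```

  For example, a directory that expects 0.50% of all requests and has
  seen 1000 of them would suggest roughly 200000 requests network-wide.
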
    "dirreq-v2-resp" status=num,... NL
        [At most once.]
    "dirreq-v3-resp" status=num,... NL
        [At most once.]

        List of mappings from response statuses to the number of requests
        for v2/v3 network statuses that were answered with that response
        status, rounded up to the nearest multiple of 4. Only response
        statuses with at least 1 response are reported. New response
        statuses can be added at any time. The current list of response
        statuses is as follows:

        "ok": a network status request is answered; this number
           corresponds to the sum of all requests as reported in
           "dirreq-v2-reqs" or "dirreq-v3-reqs", respectively, before
           rounding up.
        "not-enough-sigs": a version 3 network status is not signed by a
           sufficient number of requested authorities.
        "unavailable": a requested network status object is unavailable.
        "not-found": a requested network status is not found.
        "not-modified": a network status has not been modified since the
           If-Modified-Since time that is included in the request.
        "busy": the directory is busy.

    "dirreq-v2-direct-dl" key=val,... NL
        [At most once.]
    "dirreq-v3-direct-dl" key=val,... NL
        [At most once.]
    "dirreq-v2-tunneled-dl" key=val,... NL
        [At most once.]
    "dirreq-v3-tunneled-dl" key=val,... NL
        [At most once.]

        List of statistics about possible failures in the download process
        of v2/v3 network statuses. Requests are either "direct"
        HTTP-encoded requests over the relay's directory port, or
        "tunneled" requests using a BEGIN_DIR cell over the relay's OR
        port. The list of possible statistics can change, and statistics
        can be left out from reporting. The current list of statistics is
        as follows:

        Successful downloads and failures:

        "complete": a client has finished the download successfully.
        "timeout": a download did not finish within 10 minutes after
           starting to send the response.
        "running": a download is still running at the end of the
           measurement period, and the directory started to send the
           response less than 10 minutes before that.

        Download times:

        "min", "max": smallest and largest measured bandwidth in B/s.
        "d[1-4,6-9]": 1st to 4th and 6th to 9th decile of measured
           bandwidth in B/s. For a given decile i, i/10 of all downloads
           had a smaller bandwidth than di, and (10-i)/10 of all downloads
           had a larger bandwidth than di.
        "q[1,3]": 1st and 3rd quartile of measured bandwidth in B/s. One
           fourth of all downloads had a smaller bandwidth than q1, one
           fourth of all downloads had a larger bandwidth than q3, and the
           remaining half of all downloads had a bandwidth between q1 and
           q3.
        "md": median of measured bandwidth in B/s. Half of the downloads
           had a smaller bandwidth than md, the other half had a larger
           bandwidth than md.

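  The bandwidth keys listed above could be derived from a list of
  per-download bandwidths as sketched below. The proposal does not say
  how deciles and quartiles are interpolated, so this example assumes the
  nearest-rank method; the function names and the order of keys are
  likewise assumptions.

```python
def nearest_rank(sorted_values, numerator, denominator):
    """Return the value at quantile numerator/denominator (nearest rank)."""
    n = len(sorted_values)
    # Integer ceiling of numerator * n / denominator, at least rank 1.
    rank = max(1, -((-numerator * n) // denominator))
    return sorted_values[rank - 1]

def bandwidth_summary(bandwidths):
    """Map the keys min, d1-d4, q1, md, q3, d6-d9, max to B/s values."""
    values = sorted(bandwidths)
    if not values:
        return {}
    summary = {"min": values[0]}
    for i in (1, 2, 3, 4):
        summary["d%d" % i] = nearest_rank(values, i, 10)
    summary["q1"] = nearest_rank(values, 1, 4)
    summary["md"] = nearest_rank(values, 1, 2)
    summary["q3"] = nearest_rank(values, 3, 4)
    for i in (6, 7, 8, 9):
        summary["d%d" % i] = nearest_rank(values, i, 10)
    summary["max"] = values[-1]
    return summary

def format_key_values(summary):
    """Render a summary as the key=val,... list used in the "-dl" lines."""
    return ",".join("%s=%d" % (key, value) for key, value in summary.items())
```

  The fifth decile is not reported separately because it equals the
  median "md".
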
Entry guard statistics:

  Entry guard statistics include the number of clients per country and
  per day that are connecting directly to an entry guard.

  Entry guard statistics are important to learn more about the
  distribution of clients to countries. In the future, this knowledge
  can be useful to detect if there are or start to be any restrictions
  for clients connecting from specific countries.

  Information about which clients connect to a given entry guard is very
  sensitive. This information must not be combined with information
  about what contents are leaving the network at the exit nodes.
  Therefore, entry guard statistics need to be aggregated to prevent
  them from becoming useful for de-anonymization. Aggregation includes
  resolving IP addresses to country codes, counting events over 24-hour
  intervals, and rounding up numbers to the next multiple of 8.

    "entry-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
        [At most once.]

        YYYY-MM-DD HH:MM:SS defines the end of the included measurement
        interval of length NSEC seconds (86400 seconds by default).

        An "entry-stats-end" line, as well as any other "entry-*"
        line, is first added after the relay has been running for at least
        24 hours.

    "entry-ips" CC=N,CC=N,... NL
        [At most once.]

        List of mappings from two-letter country codes to the number of
        unique IP addresses that have connected from that country to the
        relay and that are not known to be other relays, rounded up to the
        nearest multiple of 8.

Cell statistics:

  The third type of statistics has to do with the time that cells spend
  in circuit queues. In order to gather these statistics, the relay
  memorizes when it puts a given cell in a circuit queue and when this
  cell is flushed. The relay further notes the lifetime of the circuit.
  These data are sufficient to determine the mean number of cells in a
  queue over time and the mean time that cells spend in a queue.

  Cell statistics are necessary to learn more about possible reasons for
  the poor performance of the Tor network, especially high latencies.
  The same statistics are also useful to determine the effects of design
  changes by comparing today's data with future data.

  There are basically no privacy concerns from measuring cell
  statistics, regardless of a node being an entry, middle, or exit node.

    "cell-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
        [At most once.]

        YYYY-MM-DD HH:MM:SS defines the end of the included measurement
        interval of length NSEC seconds (86400 seconds by default).

        A "cell-stats-end" line, as well as any other "cell-*" line,
        is first added after the relay has been running for at least 24
        hours.

    "cell-processed-cells" num,...,num NL
        [At most once.]

        Mean number of processed cells per circuit, subdivided into
        deciles of circuits by the number of cells they have processed in
        descending order from loudest to quietest circuits.

    "cell-queued-cells" num,...,num NL
        [At most once.]

        Mean number of cells contained in queues by circuit decile. These
        means are calculated by 1) determining the mean number of cells in
        a single circuit between its creation and its termination and 2)
        calculating the mean for all circuits in a given decile as
        determined in "cell-processed-cells". Numbers have a precision of
        two decimal places.

    "cell-time-in-queue" num,...,num NL
        [At most once.]

        Mean time cells spend in circuit queues in milliseconds. Times are
        calculated by 1) determining the mean time cells spend in the
        queue of a single circuit and 2) calculating the mean for all
        circuits in a given decile as determined in
        "cell-processed-cells".

    "cell-circuits-per-decile" num NL
        [At most once.]

        Mean number of circuits that are included in any of the deciles,
        rounded up to the next integer.

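  The two-step averaging used for the "cell-*" lines can be sketched as
  follows. The sketch assumes that per-circuit means (step 1) have
  already been computed, and that circuits are split into deciles of
  ceil(n/10) circuits each, which the proposal does not prescribe. The
  class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CircuitStats:
    processed_cells: int          # cells processed over the circuit's lifetime
    mean_queued_cells: float      # step 1: mean queue length of this circuit
    mean_time_in_queue_ms: float  # step 1: mean queueing time of this circuit

def mean(values):
    return sum(values) / len(values)

def cell_stats_by_decile(circuits):
    """Step 2: average per-circuit values within each decile of circuits."""
    # Loudest circuits first, as required for "cell-processed-cells".
    ordered = sorted(circuits, key=lambda c: c.processed_cells, reverse=True)
    per_decile = -(-len(ordered) // 10)  # integer ceiling of n / 10
    stats = {
        "cell-processed-cells": [],
        "cell-queued-cells": [],
        "cell-time-in-queue": [],
        "cell-circuits-per-decile": per_decile,
    }
    for i in range(10):
        chunk = ordered[i * per_decile:(i + 1) * per_decile]
        if not chunk:
            break  # fewer circuits than deciles
        stats["cell-processed-cells"].append(mean([c.processed_cells for c in chunk]))
        stats["cell-queued-cells"].append(mean([c.mean_queued_cells for c in chunk]))
        stats["cell-time-in-queue"].append(mean([c.mean_time_in_queue_ms for c in chunk]))
    return stats
```

  Because all three lists are ordered by the same deciles, the i-th entry
  of each list describes the same group of circuits.
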
Exit statistics:

  For the last type of statistics, exit nodes count the number of bytes
  written and read and the number of streams opened per port and per 24
  hours. Exit port statistics can be measured by looking at headers of
  BEGIN and DATA cells. A BEGIN cell contains the exit port that is
  required for the exit node to open a new exit stream. Subsequent DATA
  cells coming from the client or being sent back to the client contain
  a length field stating how many bytes of application data are
  contained in the cell.

  Exit port statistics are important to measure in order to identify
  possible load-balancing problems with respect to exit policies. Exit
  nodes that permit more ports than others are very likely overloaded
  with traffic for those ports plus traffic for other ports. Improving
  load balancing in the Tor network improves the overall utilization of
  bandwidth capacity.

  Exit traffic is one of the most sensitive parts of network data in the
  Tor network. Even though these statistics do not require looking at
  traffic contents, statistics are aggregated so that they are not
  useful for de-anonymizing users. Only those ports are reported that
  have seen at least 0.1% of exiting or incoming bytes, numbers of bytes
  are rounded up to full kibibytes (KiB), and stream numbers are rounded
  up to the next multiple of 4.

    "exit-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
        [At most once.]

        YYYY-MM-DD HH:MM:SS defines the end of the included measurement
        interval of length NSEC seconds (86400 seconds by default).

        An "exit-stats-end" line, as well as any other "exit-*" line, is
        first added after the relay has been running for at least 24 hours
        and only if the relay permits exiting (where exiting to a single
        port and IP address is sufficient).

    "exit-kibibytes-written" port=N,port=N,... NL
        [At most once.]
    "exit-kibibytes-read" port=N,port=N,... NL
        [At most once.]

        List of mappings from ports to the number of kibibytes that the
        relay has written to or read from exit connections to that port,
        rounded up to the next full kibibyte.

    "exit-streams-opened" port=N,port=N,... NL
        [At most once.]

        List of mappings from ports to the number of opened exit streams
        to that port, rounded up to the nearest multiple of 4.

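  The aggregation rules for exit statistics can be sketched as follows.
  The sketch makes two assumptions that the proposal leaves open: a port
  is reported if it reaches 0.1% of either the written or the read
  bytes, and the same set of ports is used for all three lines. The
  function names are hypothetical.

```python
def round_up(value, multiple):
    """Round a non-negative integer up to the next multiple."""
    return -((-value) // multiple) * multiple

def exit_stats_lines(written, read, streams):
    """Build the three exit-* lines from per-port byte and stream counts."""
    total_written = sum(written.values())
    total_read = sum(read.values())

    def significant(port):
        # At least 0.1% of all written or of all read bytes (assumption).
        w, r = written.get(port, 0), read.get(port, 0)
        return (w > 0 and w * 1000 >= total_written) or \
               (r > 0 and r * 1000 >= total_read)

    ports = sorted(p for p in set(written) | set(read) if significant(p))

    def line(keyword, values):
        return keyword + " " + ",".join("%d=%d" % (p, values(p)) for p in ports)

    return [
        line("exit-kibibytes-written",
             lambda p: round_up(written.get(p, 0), 1024) // 1024),
        line("exit-kibibytes-read",
             lambda p: round_up(read.get(p, 0), 1024) // 1024),
        line("exit-streams-opened",
             lambda p: round_up(streams.get(p, 0), 4)),
    ]
```

  In this sketch, a port that carries only a few hundred bytes next to
  megabytes of traffic on other ports is dropped from all three lines.
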
Implementation notes:

  Right now, relays that are configured accordingly write similar
  statistics to those described in this proposal to disk every 24 hours.
  With this proposal being implemented, relays include the contents of
  these files in extra-info documents.

  The following steps are necessary to implement this proposal:

  1. The current format of [dirreq|entry|buffer|exit]-stats files needs
     to be adapted to the description in this proposal. This step
     basically means renaming keywords.

  2. The timing of writing the four *-stats files should be unified, so
     that they are written exactly 24 hours after starting the
     relay. Right now, the measurement intervals for dirreq, entry, and
     exit stats start with the first observed request, and files are
     written when observing the first request that occurs more than 24
     hours after the beginning of the measurement interval. With this
     proposal, the measurement intervals should all start at the same
     time, and files should be written exactly 24 hours later.

  3. It is advantageous to cache statistics in local files in the data
     directory until they are included in extra-info documents. The
     reason is that the 24-hour measurement interval can be very
     different from the 18-hour publication interval of extra-info
     documents. When a relay crashes after finishing a measurement
     interval, but before publishing the next extra-info document,
     statistics would get lost. Therefore, statistics are written to
     disk when finishing a measurement interval and read from disk when
     generating an extra-info document. Only the statistics that were
     appended to the *-stats files within the past 24 hours are included
     in extra-info documents. Further, the contents of the *-stats files
     need to be checked in the process of generating extra-info documents.

  4. With the statistics patches being tested, the ./configure options
     should be removed and the statistics code be compiled by default.
     Relay operators are still required to add configuration options
     (DirReqStatistics, ExitPortStatistics, etc.) to enable gathering
     statistics. However, in the near future, statistics shall be
     gathered by all relays by default, and requiring a ./configure
     option would be a barrier for many relay operators.

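  The 24-hour rule from step 3 could be applied when reading a *-stats
  file back from disk, as sketched below. The sketch assumes that the
  file has already been split into blocks, each paired with the end time
  from its "*-stats-end" line, and that the age limit is inclusive. The
  function name latest_recent_block is hypothetical.

```python
from datetime import datetime, timedelta

def latest_recent_block(blocks, now, max_age=timedelta(hours=24)):
    """Return the newest block that ended within max_age, or None.

    blocks is a list of (stats_end, text) pairs. Blocks with an end time
    in the future are ignored as implausible.
    """
    recent = [
        (stats_end, text)
        for stats_end, text in blocks
        if timedelta(0) <= now - stats_end <= max_age
    ]
    if not recent:
        return None
    return max(recent, key=lambda block: block[0])[1]
```

  A relay that restarts after a crash would then still publish the last
  finished interval, provided that it ended less than 24 hours ago.
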