mirror of
https://gitlab.torproject.org/tpo/core/tor.git
synced 2024-11-27 22:03:31 +01:00
Merge remote-tracking branch 'karsten/task-6266'
commit
f366b0112e
3
changes/geoip-dec2012
Normal file
@@ -0,0 +1,3 @@
  o Minor features:
    - Update to the December 5 2012 MaxMind GeoLite Country database.
3
changes/geoip-nov2012
Normal file
@@ -0,0 +1,3 @@
  o Minor features:
    - Update to the November 7 2012 MaxMind GeoLite Country database.
7
changes/task-6266
Normal file
@@ -0,0 +1,7 @@
  o Minor features:
    - Use a script to replace "A1" ("Anonymous Proxy") entries in our
      geoip file with real country codes. This script fixes about 90% of
      "A1" entries automatically and uses manual country code assignments
      to fix the remaining 10%. See src/config/README.geoip for details.
      Fixes #6266.
90
src/config/README.geoip
Normal file
@@ -0,0 +1,90 @@
README.geoip -- information on the IP-to-country-code file shipped with tor
===========================================================================

The IP-to-country-code file in src/config/geoip is based on MaxMind's
GeoLite Country database with the following modifications:

 - Those "A1" ("Anonymous Proxy") entries lying in between two entries with
   the same country code are automatically changed to that country code.
   These changes can be overridden by specifying a different country code
   in src/config/geoip-manual.

 - Other "A1" entries are replaced with country codes specified in
   src/config/geoip-manual, or are left as is if there is no corresponding
   entry in that file. Even non-"A1" entries can be modified by adding a
   replacement entry to src/config/geoip-manual. Handle with care.

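The automatic rule in the first bullet can be illustrated with a small sketch. It is written for Python 3 (the script in this commit targets Python 2) and works on already-parsed tuples rather than CSV lines. The function name `resolve_single_a1` and the tuple layout are hypothetical and not part of deanonymind.py.

```python
def resolve_single_a1(prev_entry, a1_entry, next_entry):
    """Return the country code for a lone A1 entry, or "A1" if unresolved.

    Each entry is a (start_num, end_num, country_code) tuple; this layout
    is an assumption made for the example.
    """
    prev_start, prev_end, prev_cc = prev_entry
    a1_start, a1_end, _ = a1_entry
    next_start, next_end, next_cc = next_entry
    # The A1 block must directly follow the previous block ...
    touches_prev = prev_end + 1 == a1_start
    # ... and be directly followed by the next block.
    touches_next = a1_end + 1 == next_start
    if touches_prev and touches_next and prev_cc == next_cc:
        return prev_cc
    return "A1"
```

With neighbours `(100, 199, "DE")` and `(300, 399, "DE")`, an A1 block `(200, 299, "A1")` would resolve to `"DE"`. If the neighbours disagree or leave a gap, the entry stays `"A1"` and has to be handled through geoip-manual.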

1. Updating the geoip file from a MaxMind database file
-------------------------------------------------------

Download the most recent MaxMind GeoLite Country database:
http://geolite.maxmind.com/download/geoip/database/GeoIPCountryCSV.zip

Run `python deanonymind.py` in the local directory. Review the output to
learn about applied automatic/manual changes and watch out for any
warnings.

Possibly edit geoip-manual to make more/fewer/different manual changes and
re-run `python deanonymind.py`.

When done, prepend a comment like this to the new geoip file:

  # Last updated based on $DATE Maxmind GeoLite Country
  # See README.geoip for details on the conversion.

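The final step above (prepending the comment) could be scripted along the following lines. This is a minimal Python 3 sketch, not something shipped in this commit; `HEADER_TEMPLATE`, `with_header`, and `prepend_header_to_file` are hypothetical names, and the date string is whatever the maintainer fills in for $DATE.

```python
HEADER_TEMPLATE = (
    "# Last updated based on {date} Maxmind GeoLite Country\n"
    "# See README.geoip for details on the conversion.\n"
)


def with_header(geoip_text, date):
    """Return geoip_text with the two-line comment header prepended."""
    return HEADER_TEMPLATE.format(date=date) + geoip_text


def prepend_header_to_file(path, date):
    """Rewrite the file at path in place so that it starts with the header."""
    with open(path) as geoip_file:
        body = geoip_file.read()
    with open(path, "w") as geoip_file:
        geoip_file.write(with_header(body, date))
```

For example, `prepend_header_to_file("geoip", "December 5 2012")` would be run once after the last deanonymind.py pass. Since deanonymind.py skips lines starting with `#` when reading input, the header should not interfere if the file is later read back by the script.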

2. Verifying automatic and manual changes using diff
----------------------------------------------------

To unzip the original MaxMind file and look at the automatic changes, run:

  unzip GeoIPCountryCSV.zip
  diff -U1 GeoIPCountryWhois.csv AutomaticGeoIPCountryWhois.csv

To look at subsequent manual changes, run:

  diff -U1 AutomaticGeoIPCountryWhois.csv ManualGeoIPCountryWhois.csv

To manually generate the geoip file and compare it to the automatically
created one, run:

  cut -d, -f3-5 < ManualGeoIPCountryWhois.csv | sed 's/"//g' > mygeoip
  diff -U1 geoip mygeoip

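For readers who prefer to check the long-to-short conversion without the shell, the `cut`/`sed` pipeline above can be approximated in Python 3. This is only an illustrative sketch; `to_short_format` is a hypothetical helper name, and like `cut` it splits naively on commas.

```python
def to_short_format(long_line):
    """Mimic `cut -d, -f3-5 | sed 's/"//g'` for a single CSV line.

    Keeps the third to fifth comma-separated fields (start number, end
    number, country code) and strips all double quotes.
    """
    fields = long_line.strip().split(",")
    return ",".join(fields[2:5]).replace('"', "")
```

Applied to the line `"31.171.134.0","31.171.135.255","531334656","531335167","NL","Netherlands"`, this yields `531334656,531335167,NL`, which is the three-column format used in the geoip file.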

3. Verifying automatic and manual changes using blockfinder
-----------------------------------------------------------

Blockfinder is a powerful tool to handle multiple IP-to-country data
sources. Blockfinder has a function to specify a country code and compare
conflicting country code assignments in different data sources.

We can use blockfinder to compare A1 entries in the original MaxMind file
with the same or overlapping blocks in the file generated above and in the
RIR delegation files:

  git clone https://github.com/ioerror/blockfinder
  cd blockfinder/
  python blockfinder -i
  python blockfinder -r ../GeoIPCountryWhois.csv
  python blockfinder -r ../ManualGeoIPCountryWhois.csv
  python blockfinder -p A1 > A1-comparison.txt

The output marks conflicts between assignments using either '*' in case of
two different opinions or '#' for three or more different opinions about
the country code for a given block.

The '*' conflicts are most likely harmless, because there will always be
at least two opinions with the original MaxMind file saying A1 and the
other two sources saying something more meaningful.

However, watch out for '#' conflicts. In these cases, the original
MaxMind file ("A1"), the updated MaxMind file (hopefully the correct
country code), and the RIR delegation files (some other country code) all
disagree.

There are perfectly valid cases where the updated MaxMind file and the RIR
delegation files don't agree. But each of those cases must be verified
manually.

194
src/config/deanonymind.py
Executable file
@@ -0,0 +1,194 @@
#!/usr/bin/env python
import optparse
import os
import sys
import zipfile

"""
Take a MaxMind GeoLite Country database as input and replace A1 entries
with the country code and name of the preceding entry iff the preceding
(subsequent) entry ends (starts) directly before (after) the A1 entry and
both preceding and subsequent entries contain the same country code.

Then apply manual changes, either replacing A1 entries that could not be
replaced automatically or overriding previously made automatic changes.
"""

def main():
    options = parse_options()
    assignments = read_file(options.in_maxmind)
    assignments = apply_automatic_changes(assignments)
    write_file(options.out_automatic, assignments)
    manual_assignments = read_file(options.in_manual, must_exist=False)
    assignments = apply_manual_changes(assignments, manual_assignments)
    write_file(options.out_manual, assignments)
    write_file(options.out_geoip, assignments, long_format=False)

def parse_options():
    parser = optparse.OptionParser()
    parser.add_option('-i', action='store', dest='in_maxmind',
            default='GeoIPCountryCSV.zip', metavar='FILE',
            help='use the specified MaxMind GeoLite Country .zip or .csv '
                 'file as input [default: %default]')
    parser.add_option('-g', action='store', dest='in_manual',
            default='geoip-manual', metavar='FILE',
            help='use the specified .csv file for manual changes or to '
                 'override automatic changes [default: %default]')
    parser.add_option('-a', action='store', dest='out_automatic',
            default="AutomaticGeoIPCountryWhois.csv", metavar='FILE',
            help='write full input file plus automatic changes to the '
                 'specified .csv file [default: %default]')
    parser.add_option('-m', action='store', dest='out_manual',
            default='ManualGeoIPCountryWhois.csv', metavar='FILE',
            help='write full input file plus automatic and manual '
                 'changes to the specified .csv file [default: %default]')
    parser.add_option('-o', action='store', dest='out_geoip',
            default='geoip', metavar='FILE',
            help='write full input file plus automatic and manual '
                 'changes to the specified .csv file that can be shipped '
                 'with tor [default: %default]')
    (options, args) = parser.parse_args()
    return options

def read_file(path, must_exist=True):
    if not os.path.exists(path):
        if must_exist:
            print 'File %s does not exist. Exiting.' % (path, )
            sys.exit(1)
        else:
            return
    if path.endswith('.zip'):
        zip_file = zipfile.ZipFile(path)
        csv_content = zip_file.read('GeoIPCountryWhois.csv')
        zip_file.close()
    else:
        csv_file = open(path)
        csv_content = csv_file.read()
        csv_file.close()
    assignments = []
    for line in csv_content.split('\n'):
        stripped_line = line.strip()
        if len(stripped_line) > 0 and not stripped_line.startswith('#'):
            assignments.append(stripped_line)
    return assignments

def apply_automatic_changes(assignments):
    print '\nApplying automatic changes...'
    result_lines = []
    prev_line = None
    a1_lines = []
    for line in assignments:
        if '"A1"' in line:
            a1_lines.append(line)
        else:
            if len(a1_lines) > 0:
                new_a1_lines = process_a1_lines(prev_line, a1_lines, line)
                for new_a1_line in new_a1_lines:
                    result_lines.append(new_a1_line)
                a1_lines = []
            result_lines.append(line)
            prev_line = line
    if len(a1_lines) > 0:
        new_a1_lines = process_a1_lines(prev_line, a1_lines, None)
        for new_a1_line in new_a1_lines:
            result_lines.append(new_a1_line)
    return result_lines

def process_a1_lines(prev_line, a1_lines, next_line):
    if not prev_line or not next_line:
        return a1_lines  # Can't merge first or last line in file.
    if len(a1_lines) > 1:
        return a1_lines  # Can't merge more than 1 line at once.
    a1_line = a1_lines[0].strip()
    prev_entry = parse_line(prev_line)
    a1_entry = parse_line(a1_line)
    next_entry = parse_line(next_line)
    touches_prev_entry = int(prev_entry['end_num']) + 1 == \
            int(a1_entry['start_num'])
    touches_next_entry = int(a1_entry['end_num']) + 1 == \
            int(next_entry['start_num'])
    same_country_code = prev_entry['country_code'] == \
            next_entry['country_code']
    if touches_prev_entry and touches_next_entry and same_country_code:
        new_line = format_line_with_other_country(a1_entry, prev_entry)
        print '-%s\n+%s' % (a1_line, new_line, )
        return [new_line]
    else:
        return a1_lines

def parse_line(line):
    if not line:
        return None
    keys = ['start_str', 'end_str', 'start_num', 'end_num',
            'country_code', 'country_name']
    stripped_line = line.replace('"', '').strip()
    parts = stripped_line.split(',')
    entry = dict((k, v) for k, v in zip(keys, parts))
    return entry

def format_line_with_other_country(original_entry, other_entry):
    return '"%s","%s","%s","%s","%s","%s"' % (original_entry['start_str'],
            original_entry['end_str'], original_entry['start_num'],
            original_entry['end_num'], other_entry['country_code'],
            other_entry['country_name'], )

def apply_manual_changes(assignments, manual_assignments):
    if not manual_assignments:
        return assignments
    print '\nApplying manual changes...'
    manual_dict = {}
    for line in manual_assignments:
        start_num = parse_line(line)['start_num']
        if start_num in manual_dict:
            print ('Warning: duplicate start number in manual '
                   'assignments:\n  %s\n  %s\nDiscarding first entry.' %
                   (manual_dict[start_num], line, ))
        manual_dict[start_num] = line
    result = []
    for line in assignments:
        entry = parse_line(line)
        start_num = entry['start_num']
        if start_num in manual_dict:
            manual_line = manual_dict[start_num]
            manual_entry = parse_line(manual_line)
            if entry['start_str'] == manual_entry['start_str'] and \
                    entry['end_str'] == manual_entry['end_str'] and \
                    entry['end_num'] == manual_entry['end_num']:
                if len(manual_entry['country_code']) != 2:
                    print '-%s' % (line, )  # only remove, don't replace
                else:
                    new_line = format_line_with_other_country(entry,
                            manual_entry)
                    print '-%s\n+%s' % (line, new_line, )
                    result.append(new_line)
                del manual_dict[start_num]
            else:
                print ('Warning: only partial match between '
                       'original/automatically replaced assignment and '
                       'manual assignment:\n  %s\n  %s\nNot applying '
                       'manual change.' % (line, manual_line, ))
                result.append(line)
        else:
            result.append(line)
    if len(manual_dict) > 0:
        print ('Warning: could not apply all manual assignments: %s' %
               ('\n  '.join(manual_dict.values())), )
    return result

def write_file(path, assignments, long_format=True):
    if long_format:
        output_lines = assignments
    else:
        output_lines = []
        for long_line in assignments:
            entry = parse_line(long_line)
            short_line = "%s,%s,%s" % (entry['start_num'],
                    entry['end_num'], entry['country_code'], )
            output_lines.append(short_line)
    out_file = open(path, 'w')
    out_file.write('\n'.join(output_lines))
    out_file.close()

if __name__ == '__main__':
    main()

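The three possible outcomes of a manual override in `apply_manual_changes` (replace, remove, or skip with a warning) may be easier to see in isolation. The following Python 3 sketch restates that decision for a single pair of parsed entries; `classify_override` and its return convention are hypothetical and only meant as a reading aid, while the dictionary keys match those produced by `parse_line`.

```python
RANGE_KEYS = ("start_str", "end_str", "start_num", "end_num")


def classify_override(entry, manual_entry):
    """Decide what a manual entry does to a database entry.

    Returns ("skip", entry) on a partial range match, ("remove", None) if
    the manual country code is not two characters long, and otherwise
    ("replace", new_entry) with the manual country code and name.
    """
    if any(entry[key] != manual_entry[key] for key in RANGE_KEYS):
        return ("skip", entry)
    if len(manual_entry["country_code"]) != 2:
        return ("remove", None)
    new_entry = dict(entry)
    new_entry["country_code"] = manual_entry["country_code"]
    new_entry["country_name"] = manual_entry["country_name"]
    return ("replace", new_entry)
```

Under this reading, the geoip-manual line with empty country code and name for 0.116.0.0-0.119.255.255 falls into the "remove" case, while the other entries in that file fall into the "replace" case as long as their ranges match the database exactly.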
124322
src/config/geoip
File diff suppressed because it is too large
114
src/config/geoip-manual
Normal file
@@ -0,0 +1,114 @@
# This file contains manual overrides of A1 entries (and possibly others)
# in MaxMind's GeoLite Country database. Use deanonymind.py in the same
# directory to process this file when producing a new geoip file. See
# README.geoip in the same directory for details.

# Remove MaxMind entry 0.116.0.0-0.119.255.255 which MaxMind says is AT,
# but which is part of reserved range 0.0.0.0/8. -KL 2012-06-13
"0.116.0.0","0.119.255.255","7602176","7864319","",""

# NL, because previous MaxMind entry 31.171.128.0-31.171.133.255 is NL,
# and RIR delegation files say 31.171.128.0-31.171.135.255 is NL.
# -KL 2012-11-27
"31.171.134.0","31.171.135.255","531334656","531335167","NL","Netherlands"

# EU, because next MaxMind entry 37.139.64.1-37.139.64.9 is EU, because
# RIR delegation files say 37.139.64.0-37.139.71.255 is EU, and because it
# just makes more sense for the next entry to start at .0 and not .1.
# -KL 2012-11-27
"37.139.64.0","37.139.64.0","629882880","629882880","EU","Europe"

# CH, because previous MaxMind entry 46.19.141.0-46.19.142.255 is CH, and
# RIR delegation files say 46.19.136.0-46.19.143.255 is CH.
# -KL 2012-11-27
"46.19.143.0","46.19.143.255","773033728","773033983","CH","Switzerland"

# GB, because next MaxMind entry 46.166.129.0-46.166.134.255 is GB, and
# RIR delegation files say 46.166.128.0-46.166.191.255 is GB.
# -KL 2012-11-27
"46.166.128.0","46.166.128.255","782663680","782663935","GB","United Kingdom"

# US, though could as well be CA. Previous MaxMind entry
# 64.237.32.52-64.237.34.127 is US, next MaxMind entry
# 64.237.34.144-64.237.34.151 is CA, and RIR delegation files say the
# entire block 64.237.32.0-64.237.63.255 is US. -KL 2012-11-27
"64.237.34.128","64.237.34.143","1089282688","1089282703","US","United States"

# US, though could as well be UY. Previous MaxMind entry
# 67.15.170.0-67.15.182.255 is US, next MaxMind entry
# 67.15.183.128-67.15.183.159 is UY, and RIR delegation files say the
# entire block 67.15.0.0-67.15.255.255 is US. -KL 2012-11-27
"67.15.183.0","67.15.183.127","1125103360","1125103487","US","United States"

# US, because next MaxMind entry 67.43.145.0-67.43.155.255 is US, and RIR
# delegation files say 67.43.144.0-67.43.159.255 is US.
# -KL 2012-11-27
"67.43.144.0","67.43.144.255","1126928384","1126928639","US","United States"

# US, because previous MaxMind entry 70.159.21.51-70.232.244.255 is US,
# because next MaxMind entry 70.232.245.58-70.232.245.59 is A2 ("Satellite
# Provider") which is a country information about as useless as A1, and
# because RIR delegation files say 70.224.0.0-70.239.255.255 is US.
# -KL 2012-11-27
"70.232.245.0","70.232.245.57","1189672192","1189672249","US","United States"

# US, because next MaxMind entry 70.232.246.0-70.240.141.255 is US,
# because previous MaxMind entry 70.232.245.58-70.232.245.59 is A2
# ("Satellite Provider") which is a country information about as useless
# as A1, and because RIR delegation files say 70.224.0.0-70.239.255.255 is
# US. -KL 2012-11-27
"70.232.245.60","70.232.245.255","1189672252","1189672447","US","United States"

# GB, despite neither previous (GE) nor next (LV) MaxMind entry being GB,
# but because RIR delegation files agree with both previous and next
# MaxMind entry and say GB for 91.228.0.0-91.228.3.255. -KL 2012-11-27
"91.228.0.0","91.228.3.255","1541668864","1541669887","GB","United Kingdom"

# GB, because next MaxMind entry 91.232.125.0-91.232.125.255 is GB, and
# RIR delegation files say 91.232.124.0-91.232.125.255 is GB.
# -KL 2012-11-27
"91.232.124.0","91.232.124.255","1541962752","1541963007","GB","United Kingdom"

# GB, despite neither previous (RU) nor next (PL) MaxMind entry being GB,
# but because RIR delegation files agree with both previous and next
# MaxMind entry and say GB for 91.238.214.0-91.238.215.255.
# -KL 2012-11-27
"91.238.214.0","91.238.215.255","1542379008","1542379519","GB","United Kingdom"

# US, because next MaxMind entry 173.0.16.0-173.0.65.255 is US, and RIR
# delegation files say 173.0.0.0-173.0.15.255 is US. -KL 2012-11-27
"173.0.0.0","173.0.15.255","2902458368","2902462463","US","United States"

# US, because next MaxMind entry 176.67.84.0-176.67.84.79 is US, and RIR
# delegation files say 176.67.80.0-176.67.87.255 is US. -KL 2012-11-27
"176.67.80.0","176.67.83.255","2957201408","2957202431","US","United States"

# US, because previous MaxMind entry 176.67.84.192-176.67.85.255 is US,
# and RIR delegation files say 176.67.80.0-176.67.87.255 is US.
# -KL 2012-11-27
"176.67.86.0","176.67.87.255","2957202944","2957203455","US","United States"

# EU, despite neither previous (RU) nor next (UA) MaxMind entry being EU,
# but because RIR delegation files agree with both previous and next
# MaxMind entry and say EU for 193.200.150.0-193.200.150.255.
# -KL 2012-11-27
"193.200.150.0","193.200.150.255","3251148288","3251148543","EU","Europe"

# US, because previous MaxMind entry 199.96.68.0-199.96.87.127 is US, and
# RIR delegation files say 199.96.80.0-199.96.87.255 is US.
# -KL 2012-11-27
"199.96.87.128","199.96.87.255","3344979840","3344979967","US","United States"

# US, because previous MaxMind entry 209.58.176.144-209.59.31.255 is US,
# and RIR delegation files say 209.59.32.0-209.59.63.255 is US.
# -KL 2012-11-27
"209.59.32.0","209.59.63.255","3510312960","3510321151","US","United States"

# FR, because previous MaxMind entry 217.15.166.0-217.15.166.255 is FR,
# and RIR delegation files contain a block 217.15.160.0-217.15.175.255
# which, however, is EU, not FR. But merging with next MaxMind entry
# 217.15.176.0-217.15.191.255 which is KZ and which fully matches what
# the RIR delegation files say seems unlikely to be correct.
# -KL 2012-11-27
"217.15.167.0","217.15.175.255","3641681664","3641683967","FR","France"

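Because deanonymind.py only applies a manual entry when its address strings and numbers match the database entry exactly, a typo in the numeric columns of geoip-manual silently turns into a "partial match" or "could not apply" warning. A small consistency check can catch such typos earlier. This Python 3 sketch is an assumption-laden illustration, not part of the commit; `check_manual_line` is a hypothetical name, and it relies on the standard-library `ipaddress` module.

```python
import ipaddress


def check_manual_line(line):
    """Return True if the dotted-quad and integer columns of a line agree.

    Expects one non-comment line in the six-column format used by
    geoip-manual: start_str, end_str, start_num, end_num, code, name.
    """
    fields = [field.strip('"') for field in line.strip().split(",")]
    start_str, end_str, start_num, end_num = fields[:4]
    # Convert the dotted-quad strings to integers and compare them with
    # the numeric columns; also require a non-empty, ordered range.
    return (int(ipaddress.IPv4Address(start_str)) == int(start_num)
            and int(ipaddress.IPv4Address(end_str)) == int(end_num)
            and int(start_num) <= int(end_num))
```

Running this over every non-comment line of geoip-manual before invoking deanonymind.py would flag entries whose integer columns do not correspond to the stated addresses.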