A fast, parallel improved version of the iGreedy algorithm for large-scale anycast-aware geolocation
180 150W 120W 90W 60W 30W 000 30E 60E 90E 120E 150E 180
| | | | | | | | | | | | |
+90N-+-----+-----+-----+-----+----+-----+-----+-----+-----+-----+-----+
| . _..::__: ,-"-"._ |7 , _,.__ |
| _.___ _ _<_>`!(._`.`-. / _._ `_ ,_/ ' '-._.---.-.__|
|.{ " " `-==,',._\{ \ / {) / _ ">_,-' ` mt-2_|
+ \_.:--. `._ )`^-. "' , [_/( G e o __,/-' +
|'"' \ " _L 0o_,--' ) /. (| |
| | A n y,' >_.\\._<> 6 _,' / ' |
| `. c s / [~/_'` `"( l o <'} ) |
+30N \\ a .-.t) / `-'"..' `:._ c _) ' +
| ` \ ( `( / `:\ > \ ,-^. /' ' |
| `._, "" | \`' \| ?_) {\ |
| `=.---. `._._ i ,' "` |' ,- '. |
+000 |a `-._ | / `:`<_|h--._ +
| ( l > . | , `=.__.`-'\ |
| `. / | |{| ,-.,\ .|
| | ,' \ z / `' ," a \ |
+30S | / |_' | __ t/ +
| |o| '-' `-' i\.|
| |/ " n / |
| \. _ _ |
+60S / \ _ __ _ _ ___ __ _ ___| |_ +
| ,/ / _ \ | '_ \| | | |/ __/ _` / __| __| |
| ,-----"-..?----_/ ) / ___ \| | | | |_| | (_| (_| \__ \ |_ _ |
|.._( `----'/_/ \_\_| |_|\__, |\___\__,_|___/\__| -|
+90S-+-----+-----+-----+-----+-----+-----+--___/ /--+-----+-----+-----+
Based on 1998 Map by Matthew Thomas |____/ Hacked on 2015 by 8^/
This repository contains a geolocation algorithm based on iGreedy, published in the paper *Latency-Based Anycast Geolocation: Algorithms, Software, and Data Sets*.
This work adds a performance-aware, multi-threaded implementation in Rust. In addition, we improve the iGreedy algorithm by geolocating IPs using the intersection of discs within each MIS cluster (see the iGreedy paper for details) rather than the lowest circle in each set. This implementation outputs the most likely city (or airport) for each MIS cluster. Unicast targets produce a single location, whereas anycast targets produce multiple locations corresponding to different anycast sites.
The goal of this implementation is to reduce processing time for LACeS (an Open, Fast, Responsible and Efficient Longitudinal Anycast Census System). This code is used to produce daily anycast censuses, which are publicly available. It is designed to run on a single input file (containing latencies from multiple vantage points to targets) and outputs a single file with geolocation results.
Given a set of RTT (round-trip time) measurements from geographically distributed vantage points (VPs) to a target IP, the algorithm determines the target's location(s):

- **RTT to distance** — Each VP's RTT is converted to a maximum geographic radius (a disc) using the speed of light in fiber, centered on the VP's known location. The target must lie somewhere within this disc.
- **Enumeration (MIS)** — Discs are sorted by radius (ascending). A greedy Maximum Independent Set (MIS) is built: each disc that does not overlap with any already-selected disc is added. Each MIS disc represents a distinct network site. A single MIS disc means unicast; multiple discs mean anycast.
- **Clustering** — For each MIS disc, all other discs that overlap with it (and only it; discs overlapping multiple MIS discs are excluded as ambiguous) are collected into a cluster. These discs all likely measured the same site.
- **Intersection & geolocation** — Within the cluster:
  - Find the smallest disc in the cluster (tightest constraint).
  - Collect all candidate cities within this smallest disc's bounding box.
  - Progressively intersect with each cluster disc (smallest to largest). If adding the next disc would remove all candidates, stop and use the last non-empty set.
  - Apply the relative population filter (`--pop-ratio`): keep only cities with `pop >= max_pop × ratio`, where `max_pop` is the largest population among the remaining candidates.
  - Select the best city using `score = α × (pop / Σpop) − (1 − α) × (dist / Σdist)`, where distance is measured from the smallest disc's center.
- **Deduplication** — If two MIS clusters resolve to the same city, only the one with the smaller radius (processed first) is kept.

The result is one output row per detected site, each with the geolocated city, its coordinates, and the defining VP.
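The first two steps and the scoring rule can be sketched as below. This is a minimal illustration, not the actual MiGreedy code: the speed constant, struct, and overlap test are simplifying assumptions.

```rust
// Sketch of RTT-to-disc conversion, the greedy MIS, and the scoring rule.
// Constants and names are illustrative assumptions, not MiGreedy internals.

const KM_PER_MS_IN_FIBER: f64 = 100.0; // ~2/3 of c, a common approximation
const EARTH_RADIUS_KM: f64 = 6371.0;

#[derive(Clone)]
struct Disc {
    lat: f64,
    lon: f64,
    radius_km: f64,
}

/// RTT covers the round trip, so the one-way distance bound halves it.
fn rtt_to_radius_km(rtt_ms: f64) -> f64 {
    (rtt_ms / 2.0) * KM_PER_MS_IN_FIBER
}

/// Great-circle (haversine) distance between two disc centers, in km.
fn distance_km(a: &Disc, b: &Disc) -> f64 {
    let (la1, la2) = (a.lat.to_radians(), b.lat.to_radians());
    let dlat = la2 - la1;
    let dlon = (b.lon - a.lon).to_radians();
    let h = (dlat / 2.0).sin().powi(2)
        + la1.cos() * la2.cos() * (dlon / 2.0).sin().powi(2);
    2.0 * EARTH_RADIUS_KM * h.sqrt().asin()
}

/// Greedy MIS: sort discs by radius ascending, then keep each disc that
/// overlaps no already-selected disc. Each kept disc is one distinct site.
fn greedy_mis(mut discs: Vec<Disc>) -> Vec<Disc> {
    discs.sort_by(|a, b| a.radius_km.total_cmp(&b.radius_km));
    let mut selected: Vec<Disc> = Vec::new();
    for d in discs {
        if selected
            .iter()
            .all(|s| distance_km(s, &d) >= s.radius_km + d.radius_km)
        {
            selected.push(d);
        }
    }
    selected
}

/// Scoring rule from the last step: higher population and smaller
/// distance from the smallest disc's center both improve the score.
fn score(pop: f64, sum_pop: f64, dist: f64, sum_dist: f64, alpha: f64) -> f64 {
    alpha * (pop / sum_pop) - (1.0 - alpha) * (dist / sum_dist)
}
```

With this sketch, two nearby VPs whose discs overlap collapse into one site, while a far-away VP yields a second site, which is exactly the unicast/anycast distinction made in the enumeration step.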
We provide pre-compiled binaries for Linux and macOS.

**Linux (x86_64):**

```sh
curl -LO https://github.com/rhendriks/MiGreedy/releases/latest/download/migreedy-linux-x86_64.tar.gz
tar -xzvf migreedy-linux-x86_64.tar.gz
./migreedy --input path/to/measurements.csv --output path/to/results.csv
```

**macOS (Apple Silicon):**

```sh
curl -LO https://github.com/rhendriks/MiGreedy/releases/latest/download/migreedy-macos-aarch64.tar.gz
tar -xzvf migreedy-macos-aarch64.tar.gz
./migreedy --input path/to/measurements.csv --output path/to/results.csv
```

**macOS (Intel):**

```sh
curl -LO https://github.com/rhendriks/MiGreedy/releases/latest/download/migreedy-macos-x86_64.tar.gz
tar -xzvf migreedy-macos-x86_64.tar.gz
./migreedy --input path/to/measurements.csv --output path/to/results.csv
```

The code can also be run using Docker.
Pull the latest pre-built image from the GitHub Container Registry:

```sh
docker pull ghcr.io/rhendriks/migreedy:main
```

You need a local directory containing your input CSV file (e.g., measurements.csv). This directory will be mounted into the Docker container.

```sh
# Example: Create a directory and move your data into it
mkdir igreedy_data
mv measurements.csv igreedy_data/
```

Execute the docker run command, which mounts your data directory and passes the necessary arguments to MiGreedy.
**Linux/macOS:**

```sh
docker run --rm \
  -v "$(pwd)"/igreedy_data:/app/data \
  ghcr.io/rhendriks/migreedy:main \
  --input /app/data/measurements.csv \
  --output /app/data/results.csv
```

**Windows (PowerShell):**

```powershell
docker run --rm `
  -v "${PWD}\igreedy_data:/app/data" `
  ghcr.io/rhendriks/migreedy:main `
  --input /app/data/measurements.csv `
  --output /app/data/results.csv
```

After the command finishes, the output file results.csv will appear in your local igreedy_data directory.
- Clone this repository:

  ```sh
  git clone https://github.com/rhendriks/MiGreedy
  cd MiGreedy
  ```

- Install Rust:

  ```sh
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  source $HOME/.cargo/env
  rustup update
  ```

- Build the project using Cargo:

  ```sh
  cargo build --release
  ```

- Run the compiled binary with the required arguments:

  ```sh
  ./target/release/migreedy --input path/to/measurements.csv --output path/to/results.csv
  ```
| Argument | Default | Description |
|---|---|---|
| `-i, --input` | (Required) | Path to the input CSV file containing RTT measurements. Mutually exclusive with `--atlas`. |
| `--atlas` | | RIPE Atlas measurement ID or URL (e.g. `11501` or `https://atlas.ripe.net/measurements/11501/`). Mutually exclusive with `--input`. |
| `-o, --output` | (Required) | Path for the output CSV file where results will be saved. Defaults to `atlas_<ID>.csv` when using `--atlas`. |
| `-d, --dataset` | `cities` | Location dataset to use: `cities` (embedded), `airports` (embedded), or a path to a custom CSV file. |
| `-m, --min-pop` | `0` | Absolute minimum population threshold. Cities below this are filtered out at load time. |
| `-p, --pop-ratio` | `0.0` | Relative population threshold (0.0–1.0). During geolocation, keeps only cities with `pop >= max_pop × ratio` among candidates. |
| `-a, --alpha` | `1.0` | A float (0.0 to 1.0) tuning the geolocation scoring. A higher alpha prioritizes population over distance from the disc center. |
| `-t, --threshold` | `0` | Discards measurements with an RTT greater than this value (in ms), bounding the maximum radius and potential error. |
| `--anycast` | `false` | If set, outputs geolocation only for anycast targets. |
| `--accuracy` | `false` | If set, adds `candidate_diameter` (km) and `num_constraints` columns to the output (see below). |
You can geolocate targets directly from a RIPE Atlas measurement without any local data files. For example, measurement 2001 is a traceroute towards K-root:
```sh
./migreedy --atlas 2001
```

This fetches the latest results from the RIPE Atlas API, runs the geolocation algorithm, and writes the output to `atlas_2001.csv`.
You can also pass a full URL instead of a numeric ID:
```sh
./migreedy --atlas https://atlas.ripe.net/measurements/2001/
```

NOTE: RIPE Atlas probes may have incorrect user-reported locations, which can lead to incorrect geolocation results.
MiGreedy ships with embedded airports and cities datasets. The airports dataset is the original airport dataset (as used by iGreedy) with duplicate airports removed. The cities dataset contains all cities with a population of 500 or higher (sourced from GeoNames).
Population filtering:

- `--min-pop <N>` filters cities globally at load time (absolute threshold)
- `--pop-ratio <R>` filters cities per geolocation, keeping only those with `pop >= max_pop × R` (relative threshold)
These can be combined. For example, `--min-pop 10000 --pop-ratio 0.5` first removes all cities under 10k at load time; then, during each geolocation, it keeps only candidates with at least half the population of the most populous candidate.
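The interaction of the two filters can be sketched as follows; the struct and function names are hypothetical, not MiGreedy internals:

```rust
// Sketch of the absolute (--min-pop) and relative (--pop-ratio) filters.
// Names are illustrative assumptions.
struct City {
    name: String,
    pop: u64,
}

/// Absolute filter, applied once when the dataset is loaded (--min-pop).
fn filter_min_pop(cities: Vec<City>, min_pop: u64) -> Vec<City> {
    cities.into_iter().filter(|c| c.pop >= min_pop).collect()
}

/// Relative filter, applied per geolocation to the candidate set
/// (--pop-ratio): keep only cities with pop >= max_pop * ratio.
fn filter_pop_ratio(candidates: Vec<City>, ratio: f64) -> Vec<City> {
    let max_pop = candidates.iter().map(|c| c.pop).max().unwrap_or(0) as f64;
    candidates
        .into_iter()
        .filter(|c| c.pop as f64 >= max_pop * ratio)
        .collect()
}
```

For instance, with `--min-pop 10000 --pop-ratio 0.5` a candidate set with populations 5 000, 20 000, and 100 000 loses the first city at load time, and the relative filter then drops the second because 20 000 < 100 000 × 0.5.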
Select a dataset with the `-d` flag:

```sh
./migreedy --atlas 11501 -d cities
./migreedy --atlas 11501 -d cities --min-pop 15000
./migreedy --input measurements.csv --output results.csv -d airports
```

City datasets are sourced from GeoNames and licensed under CC BY 4.0.
The input CSV file must not have a header and must contain the following columns in this specific order:

| Column | Data Type | Description |
|---|---|---|
| `target` | string | The IP address being measured. |
| `hostname` | string | The hostname or ID of the prober (VP). |
| `lat` | float | The latitude of the prober. |
| `lon` | float | The longitude of the prober. |
| `rtt` | float | The round-trip time (in ms) to the target. |
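For illustration, an input file with three VPs measuring one target might look like this (all values hypothetical):

```csv
192.0.2.1,vp-ams,52.37,4.90,3.2
192.0.2.1,vp-nyc,40.71,-74.01,4.1
192.0.2.1,vp-syd,-33.87,151.21,2.8
```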
The output CSV file will have a header and contain the following columns:

| Column | Description |
|---|---|
| `target` | The IP address. |
| `vp` | The hostname of the vantage point that defined the disc. |
| `vp_lat` | The latitude of the vantage point. |
| `vp_lon` | The longitude of the vantage point. |
| `radius` | The radius of the disc in kilometers. |
| `pop_iata` | The identifier of the geolocated location (IATA code for airports, GeoNames ID for cities). `NoCity` if none found. |
| `pop_lat` | The latitude of the geolocated location. |
| `pop_lon` | The longitude of the geolocated location. |
| `pop_city` | The city name of the geolocated location. |
| `pop_cc` | The country code of the geolocated location. |
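For illustration, output for an anycast target resolved to two sites might look like this (all values hypothetical):

```csv
target,vp,vp_lat,vp_lon,radius,pop_iata,pop_lat,pop_lon,pop_city,pop_cc
192.0.2.1,vp-ams,52.37,4.90,210.4,2759794,52.37,4.89,Amsterdam,NL
192.0.2.1,vp-syd,-33.87,151.21,350.9,2147714,-33.87,151.21,Sydney,AU
```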
When `--accuracy` is set, two additional columns are appended:

| Column | Description |
|---|---|
| `candidate_diameter` | Maximum pairwise distance (km) between surviving candidate cities. Smaller values indicate higher precision. |
| `num_constraints` | Number of discs that narrowed the candidate set. Higher values indicate higher confidence in the result. |
- Remi Hendriks
- GitHub: @rhendriks
- Contact: remi.hendriks@utwente.nl
Issues and pull requests are welcome!
This code was developed for our paper LACeS. Please use the following citation when using this code:
```bibtex
@inproceedings{10.1145/3730567.3764484,
  author = {Hendriks, Remi and Luckie, Matthew and Jonker, Mattijs and Sommese, Raffaele and van Rijswijk-Deij, Roland},
  title = {LACeS: An Open, Fast, Responsible and Efficient Longitudinal Anycast Census System},
  year = {2025},
  isbn = {9798400718601},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3730567.3764484},
  doi = {10.1145/3730567.3764484},
  abstract = {IP anycast replicates an address at multiple locations to reduce latency and enhance resilience. Due to anycast's crucial role in the modern Internet, earlier research introduced tools to perform anycast censuses. The first, iGreedy, uses latency measurements from geographically dispersed locations to map anycast deployments. The second, MAnycast2, uses anycast to perform a census of other anycast networks. MAnycast2's advantage is speed and coverage but suffers from problems with accuracy, while iGreedy is highly accurate but slower using author-defined probing rates and costlier. In this paper we address the shortcomings of both systems and present LACeS (Longitudinal Anycast Census System). Taking MAnycast2 as a basis, we completely redesign its measurement pipeline, and add support for distributed probing, additional protocols (DNS over UDP, TCP SYN/ACK, and IPv6) and latency measurements similar to iGreedy. We validate LACeS on an anycast testbed with 32 globally distributed nodes, compare against an external anycast production deployment, extensive latency measurements with RIPE Atlas and cross-check over 60\% of detected anycast using operator ground truth that shows LACeS achieves high accuracy. Finally, we provide a longitudinal analysis of anycast, covering 17+ months, showing LACeS achieves high precision. We make continual daily LACeS censuses available to the community and release the source code of the tool under a permissive open source license.},
  booktitle = {Proceedings of the 2025 ACM Internet Measurement Conference},
  pages = {445--461},
  numpages = {17},
  keywords = {internet measurement, anycast, internet topology, routing, ip},
  location = {USA},
  series = {IMC '25}
}
```