A fast, parallel improved version of the iGreedy algorithm for large-scale anycast-aware geolocation
180 150W 120W 90W 60W 30W 000 30E 60E 90E 120E 150E 180
| | | | | | | | | | | | |
+90N-+-----+-----+-----+-----+----+-----+-----+-----+-----+-----+-----+
| . _..::__: ,-"-"._ |7 , _,.__ |
| _.___ _ _<_>`!(._`.`-. / _._ `_ ,_/ ' '-._.---.-.__|
|.{ " " `-==,',._\{ \ / {) / _ ">_,-' ` mt-2_|
+ \_.:--. `._ )`^-. "' , [_/( G e o __,/-' +
|'"' \ " _L 0o_,--' ) /. (| |
| | A n y,' >_.\\._<> 6 _,' / ' |
| `. c s / [~/_'` `"( l o <'} ) |
+30N \\ a .-.t) / `-'"..' `:._ c _) ' +
| ` \ ( `( / `:\ > \ ,-^. /' ' |
| `._, "" | \`' \| ?_) {\ |
| `=.---. `._._ i ,' "` |' ,- '. |
+000 |a `-._ | / `:`<_|h--._ +
| ( l > . | , `=.__.`-'\ |
| `. / | |{| ,-.,\ .|
| | ,' \ z / `' ," a \ |
+30S | / |_' | __ t/ +
| |o| '-' `-' i\.|
| |/ " n / |
| \. _ _ |
+60S / \ _ __ _ _ ___ __ _ ___| |_ +
| ,/ / _ \ | '_ \| | | |/ __/ _` / __| __| |
| ,-----"-..?----_/ ) / ___ \| | | | |_| | (_| (_| \__ \ |_ _ |
|.._( `----'/_/ \_\_| |_|\__, |\___\__,_|___/\__| -|
+90S-+-----+-----+-----+-----+-----+-----+--___/ /--+-----+-----+-----+
Based on 1998 Map by Matthew Thomas |____/ Hacked on 2015 by 8^/
This repository contains a geolocation algorithm based on iGreedy, published in the paper *Latency-Based Anycast Geolocation: Algorithms, Software, and Data Sets*.
This work adds a performance-aware, multi-threaded implementation in Rust. In addition, we improve the iGreedy algorithm by geolocating IPs using the intersection of discs within each MIS cluster (see the iGreedy paper for details) rather than the lowest circle in each set. This implementation outputs the most likely city (or airport) for each MIS cluster. Unicast targets produce a single location, whereas anycast targets produce multiple locations corresponding to different anycast sites.
The goal of this implementation is to reduce processing time for LACeS (an Open, Fast, Responsible and Efficient Longitudinal Anycast Census System). This code is used to produce daily anycast censuses, which are publicly available. It is designed to run on a single input file (containing latencies from multiple vantage points to targets) and outputs a single file with geolocation results.
Given a set of RTT (round-trip time) measurements from geographically distributed vantage points (VPs) to a target IP, the algorithm determines the target's location(s):

- **RTT to distance** — Each VP's RTT is converted to a maximum geographic radius (a disc) using the speed of light in fiber, centered on the VP's known location. The target must lie somewhere within this disc.
- **Enumeration (MIS)** — Discs are sorted by radius (ascending). A greedy Maximum Independent Set (MIS) is built: each disc that does not overlap with any already-selected disc is added. Each MIS disc represents a distinct network site. A single MIS disc means unicast; multiple discs mean anycast.
- **Clustering** — For each MIS disc, all other discs that overlap with it (and only it; discs overlapping multiple MIS discs are excluded as ambiguous) are collected into a cluster. These discs all likely measured the same site.
- **Intersection & geolocation** — Within the cluster:
  - Find the smallest disc in the cluster (tightest constraint).
  - Collect all candidate cities within this smallest disc's bounding box.
  - Progressively intersect with each cluster disc (smallest to largest). If adding the next disc would remove all candidates, stop and use the last non-empty set.
  - Apply the relative population filter (`--pop-ratio`): keep only cities with `pop >= max_pop × ratio`, where `max_pop` is the largest population among the remaining candidates.
  - Select the best city using `score = α × (pop / Σpop) − (1 − α) × (dist / Σdist)`, where distance is measured from the smallest disc's center.
- **Deduplication** — If two MIS clusters resolve to the same city, only the one with the smaller radius (processed first) is kept.

The result is one output row per detected site, each with the geolocated city, its coordinates, and the defining VP.
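The first two steps and the scoring rule can be sketched as below. This is a minimal illustration, not the actual MiGreedy code: the speed constant, struct, and overlap test are simplifying assumptions.

```rust
// Sketch of RTT-to-disc conversion, the greedy MIS, and the scoring rule.
// Constants and names are illustrative assumptions, not MiGreedy internals.

const KM_PER_MS_IN_FIBER: f64 = 100.0; // ~2/3 of c, a common approximation
const EARTH_RADIUS_KM: f64 = 6371.0;

#[derive(Clone)]
struct Disc {
    lat: f64,
    lon: f64,
    radius_km: f64,
}

/// RTT covers the round trip, so the one-way distance bound halves it.
fn rtt_to_radius_km(rtt_ms: f64) -> f64 {
    (rtt_ms / 2.0) * KM_PER_MS_IN_FIBER
}

/// Great-circle (haversine) distance between two disc centers, in km.
fn distance_km(a: &Disc, b: &Disc) -> f64 {
    let (la1, la2) = (a.lat.to_radians(), b.lat.to_radians());
    let dlat = la2 - la1;
    let dlon = (b.lon - a.lon).to_radians();
    let h = (dlat / 2.0).sin().powi(2)
        + la1.cos() * la2.cos() * (dlon / 2.0).sin().powi(2);
    2.0 * EARTH_RADIUS_KM * h.sqrt().asin()
}

/// Greedy MIS: sort discs by radius ascending, then keep each disc that
/// overlaps no already-selected disc. Each kept disc is one distinct site.
fn greedy_mis(mut discs: Vec<Disc>) -> Vec<Disc> {
    discs.sort_by(|a, b| a.radius_km.total_cmp(&b.radius_km));
    let mut selected: Vec<Disc> = Vec::new();
    for d in discs {
        if selected
            .iter()
            .all(|s| distance_km(s, &d) >= s.radius_km + d.radius_km)
        {
            selected.push(d);
        }
    }
    selected
}

/// Scoring rule from the last step: higher population and smaller
/// distance from the smallest disc's center both improve the score.
fn score(pop: f64, sum_pop: f64, dist: f64, sum_dist: f64, alpha: f64) -> f64 {
    alpha * (pop / sum_pop) - (1.0 - alpha) * (dist / sum_dist)
}
```

With this sketch, two nearby VPs whose discs overlap collapse into one site, while a far-away VP yields a second site, which is exactly the unicast/anycast distinction made in the enumeration step.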
We provide pre-compiled binaries for Linux and macOS.

**Linux (x86_64):**

```sh
curl -LO https://github.com/rhendriks/MiGreedy/releases/latest/download/migreedy-linux-x86_64.tar.gz
tar -xzvf migreedy-linux-x86_64.tar.gz
./migreedy --input path/to/measurements.csv --output path/to/results.csv
```

**macOS (Apple Silicon):**

```sh
curl -LO https://github.com/rhendriks/MiGreedy/releases/latest/download/migreedy-macos-aarch64.tar.gz
tar -xzvf migreedy-macos-aarch64.tar.gz
./migreedy --input path/to/measurements.csv --output path/to/results.csv
```

**macOS (Intel):**

```sh
curl -LO https://github.com/rhendriks/MiGreedy/releases/latest/download/migreedy-macos-x86_64.tar.gz
tar -xzvf migreedy-macos-x86_64.tar.gz
./migreedy --input path/to/measurements.csv --output path/to/results.csv
```

The code can also be run using Docker.
Pull the latest pre-built image from the GitHub Container Registry:

```sh
docker pull ghcr.io/rhendriks/migreedy:main
```

You need a local directory containing your input CSV file (e.g., measurements.csv). This directory will be mounted into the Docker container.

```sh
# Example: Create a directory and move your data into it
mkdir igreedy_data
mv measurements.csv igreedy_data/
```

Execute the docker run command, which mounts your data directory and passes the necessary arguments to MiGreedy.
**Linux/macOS:**

```sh
docker run --rm \
  -v "$(pwd)"/igreedy_data:/app/data \
  ghcr.io/rhendriks/migreedy:main \
  --input /app/data/measurements.csv \
  --output /app/data/results.csv
```

**Windows (PowerShell):**

```powershell
docker run --rm `
  -v "${PWD}\igreedy_data:/app/data" `
  ghcr.io/rhendriks/migreedy:main `
  --input /app/data/measurements.csv `
  --output /app/data/results.csv
```

After the command finishes, the output file results.csv will appear in your local igreedy_data directory.
- Clone this repository:

  ```sh
  git clone https://github.com/rhendriks/MiGreedy
  cd MiGreedy
  ```

- Install Rust:

  ```sh
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  source $HOME/.cargo/env
  rustup update
  ```

- Build the project using Cargo:

  ```sh
  cargo build --release
  ```

- Run the compiled binary with the required arguments:

  ```sh
  ./target/release/migreedy --input path/to/measurements.csv --output path/to/results.csv
  ```
| Argument | Default | Description |
|---|---|---|
| `-i, --input` | (Required) | Path to the input CSV file containing RTT measurements. Mutually exclusive with `--atlas`. |
| `--atlas` | | RIPE Atlas measurement ID or URL (e.g. `11501` or `https://atlas.ripe.net/measurements/11501/`). Mutually exclusive with `--input`. |
| `-o, --output` | (Required) | Path for the output CSV file where results will be saved. Defaults to `atlas_<ID>.csv` when using `--atlas`. |
| `-d, --dataset` | `cities` | Location dataset to use: `cities` (embedded), `airports` (embedded), or a path to a custom CSV file. |
| `-m, --min-pop` | `0` | Absolute minimum population threshold. Cities below this are filtered out at load time. |
| `-p, --pop-ratio` | `0.0` | Relative population threshold (0.0–1.0). During geolocation, keeps only cities with `pop >= max_pop × ratio` among candidates. |
| `-a, --alpha` | `1.0` | A float (0.0 to 1.0) tuning the geolocation scoring. A higher alpha prioritizes population over distance from the disc center. |
| `-t, --threshold` | `0` | Discards measurements with an RTT greater than this value (in ms), bounding the maximum radius and potential error. |
| `--anycast` | `false` | If set, outputs geolocation only for anycast targets. |
| `--accuracy` | `false` | If set, adds `candidate_diameter` (km) and `num_constraints` columns to the output (see below). |
You can geolocate targets directly from a RIPE Atlas measurement without any local data files. For example, measurement 2001 is a traceroute towards K-root:
```sh
./migreedy --atlas 2001
```

This fetches the latest results from the RIPE Atlas API, runs the geolocation algorithm, and writes the output to `atlas_2001.csv`.
You can also pass a full URL instead of a numeric ID:
```sh
./migreedy --atlas https://atlas.ripe.net/measurements/2001/
```

NOTE: RIPE Atlas probes may have incorrect user-reported locations, which can lead to incorrect geolocation results.
MiGreedy ships with embedded airports and cities datasets. The airports dataset is the original airport dataset (as used by iGreedy) with duplicate airports removed. The cities dataset contains all cities with a population of 500 or higher (sourced from GeoNames).
Population filtering:

- `--min-pop <N>` filters cities globally at load time (absolute threshold)
- `--pop-ratio <R>` filters cities per geolocation, keeping only those with `pop >= max_pop × R` (relative threshold)
These can be combined. For example, `--min-pop 10000 --pop-ratio 0.5` first removes all cities under 10k at load time; then, during each geolocation, it keeps only candidates with at least half the population of the most populous candidate.
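The interaction of the two filters can be sketched as follows; the struct and function names are hypothetical, not MiGreedy internals:

```rust
// Sketch of the absolute (--min-pop) and relative (--pop-ratio) filters.
// Names are illustrative assumptions.
struct City {
    name: String,
    pop: u64,
}

/// Absolute filter, applied once when the dataset is loaded (--min-pop).
fn filter_min_pop(cities: Vec<City>, min_pop: u64) -> Vec<City> {
    cities.into_iter().filter(|c| c.pop >= min_pop).collect()
}

/// Relative filter, applied per geolocation to the candidate set
/// (--pop-ratio): keep only cities with pop >= max_pop * ratio.
fn filter_pop_ratio(candidates: Vec<City>, ratio: f64) -> Vec<City> {
    let max_pop = candidates.iter().map(|c| c.pop).max().unwrap_or(0) as f64;
    candidates
        .into_iter()
        .filter(|c| c.pop as f64 >= max_pop * ratio)
        .collect()
}
```

For instance, with `--min-pop 10000 --pop-ratio 0.5` a candidate set with populations 5 000, 20 000, and 100 000 loses the first city at load time, and the relative filter then drops the second because 20 000 < 100 000 × 0.5.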
Select a dataset with the `-d` flag:

```sh
./migreedy --atlas 11501 -d cities
./migreedy --atlas 11501 -d cities --min-pop 15000
./migreedy --input measurements.csv --output results.csv -d airports
```

City datasets are sourced from GeoNames and licensed under CC BY 4.0.
The input CSV file must not have a header and must contain the following columns in this specific order:

| Column | Data Type | Description |
|---|---|---|
| `target` | string | The IP address being measured. |
| `hostname` | string | The hostname or ID of the prober (VP). |
| `lat` | float | The latitude of the prober. |
| `lon` | float | The longitude of the prober. |
| `rtt` | float | The round-trip time (in ms) to the target. |
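For illustration, an input file with three VPs measuring one target might look like this (all values hypothetical):

```csv
192.0.2.1,vp-ams,52.37,4.90,3.2
192.0.2.1,vp-nyc,40.71,-74.01,4.1
192.0.2.1,vp-syd,-33.87,151.21,2.8
```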
The output CSV file will have a header and contain the following columns:

| Column | Description |
|---|---|
| `target` | The IP address. |
| `vp` | The hostname of the vantage point that defined the disc. |
| `vp_lat` | The latitude of the vantage point. |
| `vp_lon` | The longitude of the vantage point. |
| `radius` | The radius of the disc in kilometers. |
| `pop_iata` | The identifier of the geolocated location (IATA code for airports, GeoNames ID for cities). `NoCity` if none found. |
| `pop_lat` | The latitude of the geolocated location. |
| `pop_lon` | The longitude of the geolocated location. |
| `pop_city` | The city name of the geolocated location. |
| `pop_cc` | The country code of the geolocated location. |
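For illustration, output for an anycast target resolved to two sites might look like this (all values hypothetical):

```csv
target,vp,vp_lat,vp_lon,radius,pop_iata,pop_lat,pop_lon,pop_city,pop_cc
192.0.2.1,vp-ams,52.37,4.90,210.4,2759794,52.37,4.89,Amsterdam,NL
192.0.2.1,vp-syd,-33.87,151.21,350.9,2147714,-33.87,151.21,Sydney,AU
```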
When `--accuracy` is set, two additional columns are appended:

| Column | Description |
|---|---|
| `candidate_diameter` | Maximum pairwise distance (km) between surviving candidate cities. Smaller values indicate higher precision. |
| `num_constraints` | Number of discs that narrowed the candidate set. Higher values indicate higher confidence in the result. |
- Remi Hendriks
- GitHub: @rhendriks
- Contact: remi.hendriks@utwente.nl
Issues and pull requests are welcome!
This code was developed for our paper LACeS. Please use the following citation when using this code:
```bibtex
@inproceedings{10.1145/3730567.3764484,
  author = {Hendriks, Remi and Luckie, Matthew and Jonker, Mattijs and Sommese, Raffaele and van Rijswijk-Deij, Roland},
  title = {LACeS: An Open, Fast, Responsible and Efficient Longitudinal Anycast Census System},
  year = {2025},
  isbn = {9798400718601},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3730567.3764484},
  doi = {10.1145/3730567.3764484},
  abstract = {IP anycast replicates an address at multiple locations to reduce latency and enhance resilience. Due to anycast's crucial role in the modern Internet, earlier research introduced tools to perform anycast censuses. The first, iGreedy, uses latency measurements from geographically dispersed locations to map anycast deployments. The second, MAnycast2, uses anycast to perform a census of other anycast networks. MAnycast2's advantage is speed and coverage but suffers from problems with accuracy, while iGreedy is highly accurate but slower using author-defined probing rates and costlier. In this paper we address the shortcomings of both systems and present LACeS (Longitudinal Anycast Census System). Taking MAnycast2 as a basis, we completely redesign its measurement pipeline, and add support for distributed probing, additional protocols (DNS over UDP, TCP SYN/ACK, and IPv6) and latency measurements similar to iGreedy. We validate LACeS on an anycast testbed with 32 globally distributed nodes, compare against an external anycast production deployment, extensive latency measurements with RIPE Atlas and cross-check over 60\% of detected anycast using operator ground truth that shows LACeS achieves high accuracy. Finally, we provide a longitudinal analysis of anycast, covering 17+ months, showing LACeS achieves high precision. We make continual daily LACeS censuses available to the community and release the source code of the tool under a permissive open source license.},
  booktitle = {Proceedings of the 2025 ACM Internet Measurement Conference},
  pages = {445--461},
  numpages = {17},
  keywords = {internet measurement, anycast, internet topology, routing, ip},
  location = {USA},
  series = {IMC '25}
}
```