Domain Mapper

A Python command-line tool that crawls a website, extracts every third-party domain referenced in its HTML, and resolves the IPv4 and IPv6 addresses of those domains. It is useful for auditing external dependencies, mapping third-party scripts and assets, and understanding a site's infrastructure footprint.

Description

The tool performs a breadth-first crawl of a site, starting from a given URL and limited to a configurable number of pages. While crawling, it follows internal links and collects URLs referenced in common HTML attributes such as href, src, action, data-src, and data-href. Any domain that differs from the starting domain is recorded as external. Once crawling finishes, each external domain is resolved via DNS to obtain its IPv4 and/or IPv6 addresses.
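
The crawl described above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: it uses the standard library's html.parser instead of BeautifulSoup so it has no dependencies, compares hostnames exactly (how the tool treats subdomains is an assumption here), and takes a hypothetical `fetch` callable so it can run against an in-memory site.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Attributes named in the description above.
URL_ATTRS = {"href", "src", "action", "data-src", "data-href"}


class LinkCollector(HTMLParser):
    """Collects the values of URL-bearing attributes from any tag."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URL_ATTRS and value:
                self.urls.append(value)


def crawl(start_url, fetch, max_pages=30):
    """Breadth-first crawl. `fetch(url)` is a hypothetical callable that
    returns HTML text, or None when the page should be skipped."""
    base_host = urlparse(start_url).hostname
    queue = deque([start_url])
    seen = {start_url}
    external = set()
    pages = 0
    while queue and pages < max_pages:
        page_url = queue.popleft()
        html = fetch(page_url)
        if html is None:
            continue
        pages += 1
        collector = LinkCollector()
        collector.feed(html)
        for raw in collector.urls:
            absolute = urljoin(page_url, raw)
            parts = urlparse(absolute)
            if parts.scheme not in ("http", "https") or not parts.hostname:
                continue  # ignore mailto:, javascript:, and similar
            if parts.hostname == base_host:
                if absolute not in seen:  # internal: schedule for crawling
                    seen.add(absolute)
                    queue.append(absolute)
            else:
                external.add(parts.hostname)  # external: record the domain
    return sorted(external)


# In-memory stand-in for a real site (illustrative data only).
SITE = {
    "http://example.com": '<a href="/about">About</a>'
                          '<script src="https://cdn.example.net/app.js"></script>',
    "http://example.com/about": '<img data-src="//img.example.org/logo.png">'
                                '<a href="mailto:hi@example.com">Mail</a>',
}
external_domains = crawl("http://example.com", SITE.get)
print(external_domains)
```

Using a deque as the queue keeps the traversal breadth-first, and counting only successfully fetched pages is one plausible way to apply the page limit.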

The crawler reuses a single HTTP session with automatic retries on transient server errors, skips non-HTML responses, caps the amount of data downloaded per page, and can optionally respect robots.txt and add a delay between requests.
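
As a rough illustration of the optional robots.txt check and request delay, the sketch below filters candidate URLs with the standard library's urllib.robotparser and pauses between the ones it lets through. The helper names (`build_robots`, `allowed_urls`) and the sample rules are hypothetical; the tool's own implementation may differ.

```python
import time
from urllib.robotparser import RobotFileParser


def build_robots(robots_txt):
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(urls, robots, user_agent="*", delay=0.0):
    """Yield URLs permitted by robots.txt, sleeping `delay` seconds
    before every permitted URL except the first."""
    first = True
    for url in urls:
        if not robots.can_fetch(user_agent, url):
            continue  # disallowed: never requested
        if not first and delay > 0:
            time.sleep(delay)
        first = False
        yield url


# Illustrative rules and URLs only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""
robots = build_robots(ROBOTS_TXT)
candidates = [
    "http://example.com/",
    "http://example.com/private/data.html",
    "http://example.com/blog",
]
permitted = list(allowed_urls(candidates, robots, delay=0.01))
print(permitted)
```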

Requirements

Python 3.8 or higher
Libraries: requests, beautifulsoup4, colorama (optional, used only for colored terminal output)

Installation

Clone the repository or download the script directly, then install the dependencies with pip:

```
pip install -r requirements.txt
```

The tool runs without colorama as well; in that case, output is shown without color.

Basic Usage

```
python3 main.py <URL> [options]
```

The URL can be provided with or without a protocol (http:// or https://). If the protocol is omitted, http:// is used.
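
The normalization rule amounts to something like the following sketch. The function name `normalize_url` is hypothetical, and trimming whitespace is an added assumption.

```python
def normalize_url(url, default_scheme="http"):
    """Prefix the URL with http:// when no http(s) scheme is present."""
    url = url.strip()
    if not url.lower().startswith(("http://", "https://")):
        url = f"{default_scheme}://{url}"
    return url


print(normalize_url("example.com"))
print(normalize_url("https://example.com"))
```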

Available Options

| Option | Description |
| --- | --- |
| `-m`, `--max-pages N` | Maximum number of internal pages to crawl (default: 30). |
| `-t`, `--timeout N` | Timeout in seconds for HTTP requests (default: 10). |
| `--user-agent UA` | Custom User-Agent string for requests. |
| `--no-ipv4` | Disables IPv4 address resolution. |
| `--no-ipv6` | Disables IPv6 address resolution. |
| `--include-internal` | Includes the base domain in the results (by default only external domains are listed). |
| `--delay SECONDS` | Adds a delay between requests to reduce load on the target server. |
| `--respect-robots` | Honors `robots.txt` disallow rules while crawling internal pages. |
| `--max-content-size BYTES` | Caps how much of each response body is downloaded (default: 5 MB). |
| `-v`, `--verbose` | Displays detailed information during execution. |
| `--export FILE.csv` | Exports results to a CSV file. |
| `--json FILE.json` | Exports results to a JSON file. |
| `--simple` | Forces simple table output even when an export is requested. |

Examples

Quick analysis of a website:
```
python3 main.py example.com
```
Increase the page limit and show detailed logs:
```
python3 main.py https://example.com -m 50 -v
```

Export results to CSV and JSON at the same time:

```
python3 main.py example.com --export domains.csv --json domains.json
```

Skip IPv6 resolution and include the base domain in the output:
```
python3 main.py example.com --no-ipv6 --include-internal
```
Crawl politely, respecting robots.txt and waiting between requests:
```
python3 main.py example.com --respect-robots --delay 1.5
```

Output

Simple Table (default)

Displays a formatted list with the domain name and the IP addresses found. Domains that could not be resolved are marked as (unresolved).

Example:

```
====================================================
DOMAIN                                    IP ADDRESSES
====================================================
cdn.example.com                           192.0.2.10, 2001:db8::1
api.thirdparty.net                        203.0.113.5
fonts.googleapis.com                      (unresolved)
====================================================
Total: 3 domains.
```

CSV Export

A file with the columns Domain, IPv4, and IPv6. Multiple IP addresses of the same type are separated by semicolons.
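
A file in this layout could be produced along these lines. This is a sketch using Python's csv module; the `write_csv` helper and the sample records are hypothetical, and quoting details in the real export may differ.

```python
import csv
import io


def write_csv(results, handle):
    """Write one row per domain, joining same-type addresses with ';'."""
    writer = csv.writer(handle)
    writer.writerow(["Domain", "IPv4", "IPv6"])
    for entry in results:
        writer.writerow([
            entry["domain"],
            ";".join(entry["ipv4"]),
            ";".join(entry["ipv6"]),
        ])


# Illustrative records using documentation address ranges.
results = [
    {"domain": "cdn.example.com",
     "ipv4": ["192.0.2.10", "192.0.2.11"],
     "ipv6": ["2001:db8::1"]},
    {"domain": "api.thirdparty.net", "ipv4": ["203.0.113.5"], "ipv6": []},
]
buffer = io.StringIO()
write_csv(results, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

When writing to a real file instead of a StringIO buffer, open it with `newline=""` so the csv module controls line endings.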

JSON Export

A file containing a list of objects with the following structure:

```
[
  {
    "domain": "cdn.example.com",
    "ipv4": ["192.0.2.10"],
    "ipv6": ["2001:db8::1"],
    "ips": ["192.0.2.10", "2001:db8::1"],
    "resolved": true
  }
]
```
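
Judging from the example, `ips` looks like the IPv4 list followed by the IPv6 list, and `resolved` is true when at least one address was found. The sketch below builds a record under that assumption; `build_record` is a hypothetical helper, not a function taken from main.py.

```python
import json


def build_record(domain, ipv4, ipv6):
    """Assemble one export record (assumed: ips = ipv4 + ipv6)."""
    ips = list(ipv4) + list(ipv6)
    return {
        "domain": domain,
        "ipv4": list(ipv4),
        "ipv6": list(ipv6),
        "ips": ips,
        "resolved": bool(ips),
    }


records = [
    build_record("cdn.example.com", ["192.0.2.10"], ["2001:db8::1"]),
    build_record("unknown.example.org", [], []),
]
json_text = json.dumps(records, indent=2)
print(json_text)
```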

Internal Operation

URL normalization: adds http:// if no protocol is provided.
Crawling: uses a queue to traverse internal pages, respecting the --max-pages limit and, optionally, robots.txt rules.
Fetching: streams each response, skips content that is not text/html, and stops downloading once --max-content-size is reached.
URL extraction: parses HTML with BeautifulSoup and collects links from common tags (a, link, script, img, iframe, form, source, embed, object) and custom attributes (data-src, data-href).
Classification: separates internal URLs (same domain as the start URL) from external ones.
DNS caching: avoids repeated lookups for domains seen more than once.
Resolution: uses socket.getaddrinfo to obtain IPv4 and/or IPv6 addresses for each external domain.
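
The resolution and caching steps can be approximated as below. This is a sketch under stated assumptions: functools.lru_cache stands in for whatever cache the tool actually uses, and the `resolve` and `_lookup` names are hypothetical. Only socket.getaddrinfo is taken from the description above.

```python
import socket
from functools import lru_cache


def _lookup(host, family):
    """Return the unique addresses of one family, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
    except (socket.gaierror, UnicodeError):
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string. dict.fromkeys keeps order.
    return list(dict.fromkeys(info[4][0] for info in infos))


@lru_cache(maxsize=None)
def resolve(host, want_ipv4=True, want_ipv6=True):
    """Resolve a host once; repeated calls are served from the cache."""
    ipv4 = tuple(_lookup(host, socket.AF_INET)) if want_ipv4 else ()
    ipv6 = tuple(_lookup(host, socket.AF_INET6)) if want_ipv6 else ()
    return ipv4, ipv6


# An IP literal resolves locally, so this demo needs no DNS server.
ipv4, ipv6 = resolve("127.0.0.1")
print(ipv4, ipv6)
```

Passing `want_ipv4=False` or `want_ipv6=False` mirrors the effect of `--no-ipv4` and `--no-ipv6`; returning tuples keeps the cached values immutable.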

Limitations

Crawling follows only links that stay on the same domain as the starting URL.
JavaScript is not executed; only links present in the static HTML are considered.
DNS resolution happens after crawling completes, not while a page is being processed.
robots.txt enforcement, when enabled, only governs which internal pages are crawled; it does not affect which external domains are reported.

Responsible Use

This tool requests only publicly available pages and performs standard DNS lookups, which reveal the same kind of information that any browser or an nslookup/dig command would. Always make sure you are authorized to crawl a target site, and consider using --respect-robots and --delay to avoid placing unnecessary load on the servers you scan.
