
🗺️ shapefile-sanitizer

Automatically fix broken shapefiles in seconds. Repair encoding, projection, and geometry issues with a single command; no GIS experience required.

Python 3.8+ License: MIT Tests PyPI


🎯 Why shapefile-sanitizer?

Shapefiles are ubiquitous in GIS workflows, but often arrive with common problems:

| Problem | Impact | Solution |
| --- | --- | --- |
| Missing encoding metadata (.cpg) | Character corruption, garbled text in non-ASCII regions | Auto-detect or set UTF-8 |
| Missing projection (.prj) | Data rendered in wrong location; analysis results invalid | Apply default WGS84 (EPSG:4326) or custom CRS |
| Invalid geometries | Processing fails in analysis tools; topology errors | Repair/validate with shapely + pyshp |

Instead of manually fixing each file in ArcGIS or QGIS, run one command:

shapefile-sanitizer broken.shp fixed.shp --overwrite

✅ 100% test coverage | ✅ Beginner-friendly | ✅ Scriptable for batch jobs | ✅ Open source


📦 Installation

Minimal (encoding + projection only)

pip install shapefile-sanitizer

With geometry repair (recommended)

pip install "shapefile-sanitizer[geometry]"

For development

git clone https://github.com/SpatialWorkflowIo/shapefile-sanitizer.git
cd shapefile-sanitizer
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

🚀 Quick Start

Basic usage

shapefile-sanitizer input/roads.shp output/roads_clean.shp

Overwrite existing files

shapefile-sanitizer roads.shp roads.shp --overwrite

Batch process (with geometry repair)

mkdir -p output
for file in data/*.shp; do
  shapefile-sanitizer "$file" "output/$(basename "$file")" --overwrite
done
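
For pipelines already written in Python, the same loop can be driven with subprocess. This is a minimal sketch: build_commands and run_batch are hypothetical helper names, not part of the package, and treating a non-zero exit code as a failure is an assumption this README does not confirm.

```python
from pathlib import Path
import subprocess
from typing import List


def build_commands(src_dir: str, out_dir: str) -> List[List[str]]:
    """Build one shapefile-sanitizer argument list per .shp file in src_dir."""
    out = Path(out_dir)
    return [
        ["shapefile-sanitizer", str(shp), str(out / shp.name), "--overwrite"]
        for shp in sorted(Path(src_dir).glob("*.shp"))
    ]


def run_batch(src_dir: str, out_dir: str) -> List[str]:
    """Run every command; return the inputs whose run exited non-zero.

    Assumption: the CLI signals failure through its exit code.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for cmd in build_commands(src_dir, out_dir):
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[1])
    return failed
```

Keeping command construction separate from execution makes it possible to print or review the planned commands before anything is written to disk.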

📋 What It Does

The CLI performs these steps in order:

  1. 📋 Validates input: checks for the required .shp and .dbf files
  2. 📁 Copies shapefile set: transfers .shp, .shx, .dbf, .prj, .cpg to output
  3. 🔤 Ensures encoding: adds/normalizes .cpg (defaults to UTF-8)
  4. 🧭 Ensures projection: adds/normalizes .prj (defaults to EPSG:4326 / WGS84)
  5. ✅ Repairs geometry (optional): validates and fixes invalid shapes (requires [geometry] extra)
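
To make steps 2 to 4 concrete, here is a rough standard-library sketch of what copying a shapefile set and filling in missing metadata can look like. It is an illustration, not the package's implementation: copy_with_defaults is a hypothetical name, and the sketch only adds missing .cpg and .prj files without normalizing existing ones.

```python
from pathlib import Path
import shutil
from typing import List

SIDECAR_SUFFIXES = (".shp", ".shx", ".dbf", ".prj", ".cpg")

# ESRI-style WKT for geographic WGS84 (EPSG:4326).
WGS84_WKT = (
    'GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",'
    'SPHEROID["WGS_1984",6378137.0,298.257223563]],'
    'PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]'
)


def copy_with_defaults(src_shp: str, dst_shp: str) -> List[str]:
    """Copy the shapefile set, then add .cpg/.prj only where they are missing."""
    src, dst = Path(src_shp), Path(dst_shp)
    dst.parent.mkdir(parents=True, exist_ok=True)
    for suffix in SIDECAR_SUFFIXES:
        part, target = src.with_suffix(suffix), dst.with_suffix(suffix)
        # Skip the copy when source and destination are the same file.
        if part.exists() and part.resolve() != target.resolve():
            shutil.copyfile(part, target)

    notes = []
    cpg, prj = dst.with_suffix(".cpg"), dst.with_suffix(".prj")
    if not cpg.exists():
        cpg.write_text("UTF-8", encoding="ascii")
        notes.append("encoding: wrote UTF-8")
    if not prj.exists():
        prj.write_text(WGS84_WKT, encoding="ascii")
        notes.append("projection: wrote EPSG:4326")
    return notes
```

Writing a default .prj only labels the coordinates as WGS84; it does not reproject them, so the default is appropriate only when the data really is in longitude/latitude.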

Example output

$ shapefile-sanitizer parcels.shp parcels_clean.shp --overwrite
Copied: parcels_clean.shp, parcels_clean.dbf
[encoding] changed - wrote UTF-8 to parcels_clean.cpg
[projection] changed - wrote default CRS EPSG:4326 to parcels_clean.prj
[geometry] skipped - optional dependencies missing (install: pip install shapefile-sanitizer[geometry])
Completed.
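
When the CLI runs inside a larger script, the per-stage lines can be turned into structured data. The sketch below assumes every stage line follows the "[stage] status - detail" shape seen in the example above; that format is inferred from this single sample, and parse_report and StageStatus are hypothetical names.

```python
import re
from typing import Dict, NamedTuple


class StageStatus(NamedTuple):
    status: str
    detail: str


# Matches lines such as "[encoding] changed - wrote UTF-8 to parcels_clean.cpg".
LINE_RE = re.compile(r"^\[(?P<stage>\w+)\]\s+(?P<status>\w+)\s+-\s+(?P<detail>.+)$")


def parse_report(output: str) -> Dict[str, StageStatus]:
    """Map each stage name to its status and detail; other lines are ignored."""
    report = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            report[match["stage"]] = StageStatus(match["status"], match["detail"])
    return report
```

A caller could then check, for example, whether parse_report(text)["geometry"].status is "skipped" and prompt for the [geometry] extra.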

🔧 Use Cases

📊 Data Standardization Pipelines

Import shapefiles from multiple sources with inconsistent metadata into a unified GIS or data warehouse.

mkdir -p standardized
for source in vendor_*.shp; do
  shapefile-sanitizer "$source" "standardized/$(basename "$source")" --overwrite
done

🌍 Publishing Open Data

Ensure public GIS datasets conform to best practices before release (UTF-8 encoding, explicit CRS).

🔬 Research Data Prep

Fix geometry and projection issues before running spatial analysis, reducing downstream errors.

🏗️ ETL Workflows

Integrate shapefile repair as a stage in data ingestion pipelines (Docker-friendly).


📖 Full Documentation

Command-line Options

shapefile-sanitizer --help
usage: shapefile-sanitizer INPUT_SHP OUTPUT_SHP [--overwrite]

Fix common shapefile issues.

positional arguments:
  INPUT_SHP              Path to input .shp file
  OUTPUT_SHP             Path to output .shp file

optional arguments:
  --overwrite            Overwrite output if it already exists
  -h, --help             Show this help message and exit

Environment & Requirements

  • Python: 3.8+
  • Core dependencies: pyshp (shapefile reading/writing)
  • Optional (geometry): shapely (geometry validation/repair)

🧪 Development

Run tests

pytest

The test run enforces 100% code coverage: all logic paths are tested.

Add a new fix stage

  1. Create src/shapefile_sanitizer/stages/new_stage.py
  2. Implement repair_new_thing(shapefile_path: str) -> StageResult
  3. Add to pipeline in src/shapefile_sanitizer/pipeline.py
  4. Add tests to tests/test_stages.py
  5. Update README.md docs
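
As a starting point for step 2, a stage might look roughly like the sketch below. The StageResult defined here is a hypothetical stand-in: the real contract lives in models.py and its fields are not shown in this README, and the "unchanged" status is likewise an assumption. The placeholder stage only reports whether the .shx index sidecar is present.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class StageResult:
    """Hypothetical stand-in for the shared contract in models.py."""
    stage: str
    status: str   # assumed values: "changed", "unchanged", "skipped"
    message: str


def repair_new_thing(shapefile_path: str) -> StageResult:
    """Placeholder stage: report whether the .shx index sidecar exists."""
    shx = Path(shapefile_path).with_suffix(".shx")
    if shx.exists():
        return StageResult("index", "unchanged", f"found {shx.name}")
    return StageResult("index", "skipped", f"{shx.name} is missing; nothing to repair")
```

A real stage would import StageResult from the package instead of redefining it and would return a "changed" result after modifying files.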

Project structure

shapefile-sanitizer/
├── src/shapefile_sanitizer/
│   ├── cli.py              # CLI argument parsing + UX
│   ├── pipeline.py         # Stage orchestration
│   ├── models.py           # Shared contracts (StageResult, etc.)
│   └── stages/
│       ├── encoding.py     # .cpg handling
│       ├── projection.py   # .prj handling
│       └── geometry.py     # optional shape validation/repair
├── tests/                  # 100% coverage tests
├── README.md               # This file
└── pyproject.toml          # Package metadata

🐛 Troubleshooting

"Required input file not found"

Ensure you're passing the full .shp file path, not just the stem. Required sidecars (.shx, .dbf) must also exist.

# ✅ Correct
shapefile-sanitizer data/roads.shp output/roads.shp

# ❌ Wrong
shapefile-sanitizer data/roads output/roads
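
The same check can be run from Python before invoking the CLI. This is a minimal sketch: missing_inputs is a hypothetical helper, and the required set (.shp, .shx, .dbf) follows the troubleshooting note above and is not taken from the package source.

```python
from pathlib import Path
from typing import List

# Assumption: the required set named in the troubleshooting note above.
REQUIRED_SUFFIXES = (".shp", ".shx", ".dbf")


def missing_inputs(input_path: str) -> List[str]:
    """Return a list of problems with the input path; empty means it looks usable."""
    path = Path(input_path)
    if path.suffix.lower() != ".shp":
        return [f"{input_path}: expected a path ending in .shp"]
    # Sidecars are looked up with lowercase extensions next to the .shp file.
    return [
        str(path.with_suffix(suffix))
        for suffix in REQUIRED_SUFFIXES
        if not path.with_suffix(suffix).exists()
    ]
```

Passing the stem (data/roads) yields a single message about the missing .shp extension, while a valid path returns one entry per required file that is absent.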

"Output file already exists"

Use --overwrite to replace:

shapefile-sanitizer roads.shp roads_fixed.shp --overwrite

Geometry repair not working

Ensure [geometry] extras are installed:

pip install "shapefile-sanitizer[geometry]"

Verify with:

python -c "import shapely; print(shapely.__version__)"
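
To check both libraries at once without importing them, something like the following can be used. It is a small sketch built on the standard library; dependency_status is a hypothetical helper name. The pyshp distribution is imported under the module name shapefile.

```python
import importlib.util
from typing import Dict

# Distribution name -> importable module name (pyshp installs "shapefile").
MODULES = {"pyshp": "shapefile", "shapely": "shapely"}


def dependency_status() -> Dict[str, bool]:
    """Report which of the libraries named in this README can be imported."""
    return {
        dist: importlib.util.find_spec(module) is not None
        for dist, module in MODULES.items()
    }


if __name__ == "__main__":
    for dist, available in dependency_status().items():
        print(f"{dist}: {'installed' if available else 'missing'}")
```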

🤝 Contributing

Found a bug or have a feature request? Issues & PRs welcome!

For major changes, please open an issue first to discuss proposed changes.


📄 License

This project is licensed under the MIT License; see the LICENSE file for details.


🔗 Author & Resources

Website: https://spatialworkflow.io/

Repository: github.com/SpatialWorkflowIo/shapefile-sanitizer

Learn more about shapefiles:
