Automatically fix broken shapefiles in seconds. Repair encoding, projections, and geometry issues with a single command, no GIS experience required.
Shapefiles are ubiquitous in GIS workflows, but often arrive with common problems:

| Problem | Impact | Solution |
|---|---|---|
| Missing encoding metadata (`.cpg`) | Character corruption, garbled text in non-ASCII regions | Auto-detect or set UTF-8 |
| Missing projection (`.prj`) | Data rendered in wrong location; analysis results invalid | Apply default WGS84 (EPSG:4326) or custom CRS |
| Invalid geometries | Processing fails in analysis tools; topology errors | Repair/validate with shapely + pyshp |

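To make the first row of the table concrete, here is a small illustration of how attribute text gets garbled when a reader guesses the wrong encoding, which is the situation a missing `.cpg` invites. It uses only Python's standard library; the sample string and the choice of Latin-1 as the wrong guess are assumptions for illustration, not behavior taken from the tool.

```python
# Attribute text as it might be stored in a .dbf file, encoded as UTF-8.
stored_bytes = "Zürich".encode("utf-8")

# Without a .cpg, a reader may fall back to a legacy code page (Latin-1 here).
wrong = stored_bytes.decode("latin-1")

# With a .cpg declaring UTF-8, the reader decodes the same bytes correctly.
right = stored_bytes.decode("utf-8")

print(wrong)  # ZÃ¼rich
print(right)  # Zürich
```

The bytes on disk are identical in both cases; only the declared encoding changes what the user sees.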
Instead of manually fixing each file in ArcGIS or QGIS, run one command:

    shapefile-sanitizer broken.shp fixed.shp --overwrite

100% test coverage | Beginner-friendly | Scriptable for batch jobs | Open source

Install from PyPI:

    pip install shapefile-sanitizer

With optional geometry repair support:

    pip install "shapefile-sanitizer[geometry]"

From source, for development:

    git clone https://github.com/SpatialWorkflowIo/shapefile-sanitizer.git
    cd shapefile-sanitizer
    python -m venv .venv
    source .venv/bin/activate
    pip install -e ".[dev]"

Basic usage:

    shapefile-sanitizer input/roads.shp output/roads_clean.shp

Fix a file in place:

    shapefile-sanitizer roads.shp roads.shp --overwrite

Batch-process a directory:

    for file in data/*.shp; do
      shapefile-sanitizer "$file" "output/$(basename "$file")" --overwrite
    done

The CLI performs these steps in order:

- Validates input: checks for the required `.shp` + `.dbf`
- Copies shapefile set: transfers `.shp`, `.shx`, `.dbf`, `.prj`, `.cpg` to output
- Ensures encoding: adds/normalizes `.cpg` (defaults to UTF-8)
- Ensures projection: adds/normalizes `.prj` (defaults to EPSG:4326 / WGS84)
- Repairs geometry (optional): validates and fixes invalid shapes (requires the `[geometry]` extra)
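The first four steps can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: `sanitize_sketch`, `SIDECARS`, and `WGS84_WKT` are hypothetical names, the WKT string is the common ESRI-style definition of WGS84 (the tool's actual `.prj` text may differ), and the sketch only adds missing sidecars rather than normalizing existing ones.

```python
import shutil
from pathlib import Path
from typing import List

# ESRI-style WKT for WGS84 (EPSG:4326); assumed here, the tool may write something else.
WGS84_WKT = (
    'GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",'
    'SPHEROID["WGS_1984",6378137.0,298.257223563]],'
    'PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]'
)
SIDECARS = (".shp", ".shx", ".dbf", ".prj", ".cpg")


def sanitize_sketch(input_shp: str, output_shp: str) -> List[str]:
    """Hypothetical stand-in for steps 1-4; returns a log of what changed."""
    src, dst = Path(input_shp), Path(output_shp)
    log: List[str] = []

    # 1. Validate input: .shp and .dbf must exist.
    for ext in (".shp", ".dbf"):
        if not src.with_suffix(ext).exists():
            raise FileNotFoundError(f"missing required file: {src.with_suffix(ext)}")

    # 2. Copy whichever members of the shapefile set are present.
    dst.parent.mkdir(parents=True, exist_ok=True)
    for ext in SIDECARS:
        part, target = src.with_suffix(ext), dst.with_suffix(ext)
        if part.exists() and part.resolve() != target.resolve():
            shutil.copyfile(part, target)

    # 3. Ensure encoding: write a .cpg only if none is present.
    cpg = dst.with_suffix(".cpg")
    if not cpg.exists():
        cpg.write_text("UTF-8", encoding="ascii")
        log.append("[encoding] changed")

    # 4. Ensure projection: write a default .prj only if none is present.
    prj = dst.with_suffix(".prj")
    if not prj.exists():
        prj.write_text(WGS84_WKT, encoding="ascii")
        log.append("[projection] changed")

    return log
```

Keeping each step as a small, independent check makes the order explicit: copying happens before the sidecar checks, so an existing `.cpg` or `.prj` from the input is preserved rather than replaced by a default.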

Example run:

    $ shapefile-sanitizer parcels.shp parcels_clean.shp --overwrite
    Copied: parcels_clean.shp, parcels_clean.dbf
    [encoding] changed - wrote UTF-8 to parcels_clean.cpg
    [projection] changed - wrote default CRS EPSG:4326 to parcels_clean.prj
    [geometry] skipped - optional dependencies missing (install: pip install shapefile-sanitizer[geometry])
    Completed.

Import shapefiles from multiple sources with inconsistent metadata into a unified GIS or data warehouse.

    for source in vendor_*.shp; do
      shapefile-sanitizer "$source" "standardized/$(basename "$source")" --overwrite
    done

Ensure public GIS datasets conform to best practices before release (UTF-8 encoding, explicit CRS).
Fix geometry and projection issues before running spatial analysis, reducing downstream errors.
Integrate shapefile repair as a stage in data ingestion pipelines (Docker-friendly).
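As one way to wire the CLI into an ingestion pipeline, the sketch below wraps it with `subprocess`. It uses only the documented arguments (`INPUT_SHP`, `OUTPUT_SHP`, `--overwrite`); the function names are hypothetical, and it assumes the CLI exits with a non-zero status on failure, which the source does not state.

```python
import subprocess
from pathlib import Path
from typing import List


def build_command(input_shp: Path, output_dir: Path, overwrite: bool = True) -> List[str]:
    """Build the CLI invocation using only the documented arguments."""
    cmd = ["shapefile-sanitizer", str(input_shp), str(output_dir / input_shp.name)]
    if overwrite:
        cmd.append("--overwrite")
    return cmd


def sanitize_stage(input_dir: Path, output_dir: Path) -> List[Path]:
    """Hypothetical pipeline stage: sanitize every .shp found in input_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for shp in sorted(input_dir.glob("*.shp")):
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(build_command(shp, output_dir), check=True)
        outputs.append(output_dir / shp.name)
    return outputs
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces.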

    $ shapefile-sanitizer --help
    usage: shapefile-sanitizer INPUT_SHP OUTPUT_SHP [--overwrite]

    Fix common shapefile issues.

    positional arguments:
      INPUT_SHP    Path to input .shp file
      OUTPUT_SHP   Path to output .shp file

    optional arguments:
      --overwrite  Overwrite output if it already exists
      -h, --help   Show this help message and exit

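For readers who want to script against the same interface, an `argparse` parser that accepts the arguments listed in the help text might look like the following. This is a reconstruction from the help output only; it is not the project's actual `cli.py`, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented usage line (a sketch, not the real CLI)."""
    parser = argparse.ArgumentParser(
        prog="shapefile-sanitizer",
        description="Fix common shapefile issues.",
    )
    parser.add_argument("input_shp", metavar="INPUT_SHP", help="Path to input .shp file")
    parser.add_argument("output_shp", metavar="OUTPUT_SHP", help="Path to output .shp file")
    parser.add_argument(
        "--overwrite",
        action="store_true",
        help="Overwrite output if it already exists",
    )
    return parser


args = build_parser().parse_args(["roads.shp", "roads_fixed.shp", "--overwrite"])
print(args.input_shp, args.output_shp, args.overwrite)
# roads.shp roads_fixed.shp True
```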
- Python: 3.8+
- Core dependencies: pyshp (shapefile reading/writing)
- Optional (geometry): shapely (geometry validation/repair)
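The optional nature of the geometry dependency can be sketched as follows. The function works on GeoJSON-like geometry dicts (the form pyshp shapes expose through `__geo_interface__`) and degrades to a "skipped" status when shapely cannot be imported. `repair_geometries` and its status strings are hypothetical, and using `make_valid` is an assumption about the repair technique; the source only says that shapely is used for validation and repair.

```python
from typing import Any, Dict, List, Tuple

Geometry = Dict[str, Any]


def repair_geometries(geoms: List[Geometry]) -> Tuple[str, List[Geometry]]:
    """Repair GeoJSON-like geometries; skip cleanly if shapely is unavailable."""
    try:
        from shapely.geometry import mapping, shape
        from shapely.validation import make_valid  # present in shapely >= 1.8
    except ImportError:
        return "skipped", geoms

    repaired, changed = [], False
    for geom in geoms:
        shp = shape(geom)
        if not shp.is_valid:
            shp = make_valid(shp)
            changed = True
        repaired.append(mapping(shp))
    return ("changed" if changed else "unchanged"), repaired


# A self-intersecting "bowtie" polygon, a classic invalid geometry.
bowtie = {
    "type": "Polygon",
    "coordinates": [[(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]],
}
status, fixed = repair_geometries([bowtie])
print(status)
```

Importing shapely inside the function keeps the core install free of the dependency, which matches the split between core and `[geometry]` extras described above.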

Run the test suite:

    pytest

This enforces 100% code coverage: all logic paths are tested.

To add a new stage:

- Create `src/shapefile_sanitizer/stages/new_stage.py`
- Implement `repair_new_thing(shapefile_path: str) -> StageResult`
- Add to pipeline in `src/shapefile_sanitizer/pipeline.py`
- Add tests to `tests/test_stages.py`
- Update `README.md` docs
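A new stage following the steps above might look like the sketch below. The `StageResult` fields are not documented in this README, so the dataclass here is a hypothetical stand-in modeled on the log lines (`[stage] status - message`); the real class in `models.py` may differ. The stage body, a check for the `.shx` index file, is likewise only an illustration.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class StageResult:
    """Hypothetical stand-in; the real class lives in models.py and may differ."""
    stage: str
    status: str  # e.g. "changed", "unchanged", "skipped"
    message: str


def repair_new_thing(shapefile_path: str) -> StageResult:
    """Example stage: report whether the .shx index sidecar is present."""
    shx = Path(shapefile_path).with_suffix(".shx")
    if shx.exists():
        return StageResult("index", "unchanged", f"{shx.name} present")
    return StageResult("index", "skipped", f"{shx.name} missing; nothing repaired")
```

Returning a result object instead of printing lets the pipeline decide how to report each stage, and makes the stage straightforward to assert on in `tests/test_stages.py`.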

Project layout:

    shapefile-sanitizer/
    ├── src/shapefile_sanitizer/
    │   ├── cli.py              # CLI argument parsing + UX
    │   ├── pipeline.py         # Stage orchestration
    │   ├── models.py           # Shared contracts (StageResult, etc.)
    │   └── stages/
    │       ├── encoding.py     # .cpg handling
    │       ├── projection.py   # .prj handling
    │       └── geometry.py     # optional shape validation/repair
    ├── tests/                  # 100% coverage tests
    ├── README.md               # This file
    └── pyproject.toml          # Package metadata

Ensure you're passing the full .shp file path, not just the stem. Required sidecars (.shx, .dbf) must also exist.

    # Correct
    shapefile-sanitizer data/roads.shp output/roads.shp

    # Wrong
    shapefile-sanitizer data/roads output/roads

If the output file already exists, use `--overwrite` to replace it:

    shapefile-sanitizer roads.shp roads_fixed.shp --overwrite

If geometry repair is skipped, ensure the `[geometry]` extra is installed:

    pip install "shapefile-sanitizer[geometry]"

Verify with:

    python -c "import shapely; print(shapely.__version__)"

Found a bug or have a feature request? Issues & PRs welcome!
For major changes, please open an issue first to discuss proposed changes.
This project is licensed under the MIT License; see the LICENSE file for details.
Website: https://spatialworkflow.io/
Repository: github.com/SpatialWorkflowIo/shapefile-sanitizer
Learn more about shapefiles: