# try-point-blank

Try Pointblank: https://posit-dev.github.io/pointblank/

## Status

Status: Working. License: Apache 2.0.

## Overview

This project demonstrates data validation in Python using Posit's pointblank, a new open-source library inspired by its popular R counterpart.

Ensuring data quality is essential for trustworthy analytics, decision-making, and data governance. pointblank makes it easy to define validation rules for Pandas, Polars, and DuckDB tables, generating clear, actionable reports.

Compared to other tools like Great Expectations, Pandera, and Deequ, pointblank stands out for its intuitive API, seamless integration with modern Python data stacks, and robust reporting features. The package enables you to check for missing values, validate ranges and categories, and extract failing records for further inspection. It also supports advanced features like threshold-based monitoring and custom actions on validation failures.
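
For orientation, here is a minimal sketch of the kind of validation plan described above. The column names are illustrative, and the exact keyword arguments (e.g. `set=` in `col_vals_in_set()`) should be checked against the current pointblank docs:

```python
import pandas as pd
import pointblank as pb

# Toy Titanic-style frame; the repo's notebook pulls the real data from DuckDB.
df = pd.DataFrame({
    "age": [22.0, 38.0, None, 35.0],
    "fare": [7.25, 71.28, 8.05, 53.10],
    "embarked": ["S", "C", "S", "S"],
})

validation = (
    pb.Validate(data=df, tbl_name="titanic", label="Titanic checks")
    .col_vals_not_null(columns="age")                          # no missing ages
    .col_vals_between(columns="fare", left=0, right=600)       # plausible fares
    .col_vals_in_set(columns="embarked", set=["C", "Q", "S"])  # known ports
    .interrogate()                                             # run all steps
)

validation                        # in a notebook, renders the HTML report
# validation.get_data_extracts()  # pull the failing rows for each step
```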

This repository includes example validations (using the classic Titanic dataset) and demonstrates how to leverage pointblank for practical data quality workflows. See the accompanying code and documentation for details on setup, rule definition, and report generation.

For a complete write-up, see this blog post.

## Key Artefacts

- `notebooks/titanic.ipynb`: The main notebook that demonstrates the functionality of pointblank with the Titanic dataset.
- `notebooks/titanic.duckdb`: The DuckDB database file containing the Titanic dataset.
- `LICENSE`: The Apache License 2.0 under which this project is distributed.

## Getting Started

### Prerequisites

- Python 3.10 or higher
- pointblank
- duckdb
- pandas
- notebook (including IPython) for displaying Markdown in notebooks

Install the required dependencies using uv:

```bash
uv add pointblank duckdb pandas ipython
```
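
As a quick smoke test, the following sketch reads the bundled Titanic data. The table name `titanic` inside the DuckDB file is an assumption; check the notebook for the actual name:

```python
import duckdb

# Open the bundled database; the path assumes you run from the repo root.
con = duckdb.connect("notebooks/titanic.duckdb")

# "titanic" is an assumed table name -- adjust to whatever the notebook uses.
df = con.execute("SELECT * FROM titanic").fetchdf()
print(df.shape)
print(df.head())
```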

## Appendix: Comparison of Data Validation Libraries

| Feature/Aspect | pointblank (Python) | Great Expectations (GX) | Pandera | Deequ |
|---|---|---|---|---|
| Ease of Use | Simple, user-friendly API; quick to get started; programmatic simplicity [2][3] | Steep learning curve; complex object model; can feel bloated for simple use cases [1][7] | Very simple, pandas-like API; shallow learning curve [7] | More complex; requires Scala/Spark |
| Supported Backends | Pandas, Polars, DuckDB, SQL databases [2] | Pandas, Spark, SQL, cloud data stores [6] | Pandas, Dask, PySpark, Polars (via extensions) [7] | Spark, Scala; some PyDeequ support |
| Reporting | Attractive, detailed HTML reports (gt-based) [3] | Comprehensive, customisable documentation and reports [6] | No built-in reporting; must build reporting manually [7] | Basic anomaly reports |
| Validation Features | Wide range, but still maturing; atomic test units; easy drill-down to failures [2][3] | Very comprehensive; supports complex checks, profiling, and autogenerated tests [5][6][7] | Focused on schema validation and runtime checks; less comprehensive than GX [7] | Focused on anomaly detection |
| Action Triggers | No built-in triggers for follow-up actions on failure; must code manually [2][3] | Supports post-validation actions (Checkpoints, integrations, notifications) [7] | No built-in triggers; manual handling required | Limited; mainly Spark integrations |
| Integration & Ecosystem | Early days; fewer integrations but growing [2] | Mature ecosystem; integrates with many data tools and cloud services [6][7] | Limited integrations; best for local data science workflows [7] | Good for Spark/Scala environments |
| Performance/Scalability | Suitable for moderate data sizes; not yet optimised for huge scale [2] | Scalable; designed for production and large projects [6][7] | Best for small to moderate data; not designed for big data or distributed validation [7] | Scales with Spark |
| Community/Support | Newer, smaller community; backed by Posit [2][3] | Large, active community; extensive documentation and support [6][7] | Smaller, data-science-focused community [7] | Supported by AWS; active in Spark |
| Best For | Quick, readable validations; attractive reports; individual or small-team use [2][3] | Production-grade, complex validation systems; integrations; enterprise use [6][7] | Lightweight, Pythonic validation for data science and ML workflows [7] | Spark/Scala pipelines; anomaly detection |

### Summary of Pros/Cons

- **pointblank**
  - Pros: Simple API, great reports, easy to use, supports multiple backends.
  - Cons: Fewer integrations, lacks automated action triggers, new and still maturing [2][3].
- **Great Expectations**
  - Pros: Feature-rich, highly configurable, strong integrations, production-ready, comprehensive reporting [6][7].
  - Cons: Steep learning curve, can feel bloated for simple tasks, complex object model [1][7].
- **Pandera**
  - Pros: Lightweight, familiar to pandas users, quick to write tests, good for data science workflows [7].
  - Cons: Limited integrations and reporting, not designed for large-scale or production systems [7].
- **Deequ**
  - Pros: Strong for Spark/Scala environments, good for anomaly detection and big data.
  - Cons: Less Python support, basic reporting, less user-friendly for non-Spark users [1].

## References

[1] https://www.reddit.com/r/dataengineering/comments/15a45gt/great_expectations_is_bloaty_what_are_the/
[2] https://aeturrell.com/blog/posts/the-data-validation-landscape-in-2025/
[3] https://www.linkedin.com/posts/richard-iannone-a5640017_i-started-a-new-blog-this-ones-all-about-activity-7314040067340046337-j6KD
[4] https://posit.co/blog/introducing-pointblank-for-python/
[5] https://docs.greatexpectations.io/docs/reference/learn/data_quality_use_cases/distribution/
[6] https://whylabs.ai/blog/posts/choosing-the-right-data-quality-monitoring-solution
[7] https://endjin.com/blog/2023/03/a-look-into-pandera-and-great-expectations-for-data-validation
[8] https://rstudio.github.io/pointblank/reference/tbl_match.html
