Update datafusion-sqlancer to work with latest DataFusion #36
Thank you! This should be good to go after CI passing. However, I don’t think we’re going to continue maintaining this repo. It’s built on top of an existing Java SQL testing framework, which I’ve found hard to maintain due to (1) the cross-language setup and (2) a lot of unnecessary abstractions in the implementation. I’ve put together a Rust re-implementation that should be much simpler. It can reuse DataFusion utilities for most major testing components, and avoids re-implementing things in Java: The core framework is already in place; it mainly needs some cleanup and ongoing maintenance. I’m happy to help maintain it together if there’s interest.
Thanks for the review and for the explanation. Building our own fuzzer sounds like a fun project, although personally I wonder if DF's requirements for a fuzzer are particularly unique. That is, in an ideal world, we would want an upstream project that develops the SQL fuzzer and just supports DF as one target SQL system among others, no? It would be nice to avoid needing to implement new oracles and similar generic fuzzer machinery ourselves.
I was thinking along the same lines. Relying on an existing SQL fuzzing framework (specifically SQLancer) sounds easier in theory: we declare the syntax and it should just work. However, after implementing it on the SQLancer Java framework, I found that in practice you still need to build almost everything from scratch, including query tree generation, query transformations for testing oracles, and expression rendering. The utilities provided by the framework are mostly generic infrastructure such as CLI argument handling and logging. Meanwhile, much of this “generic” fuzzer machinery already exists in DataFusion itself:
Given this, implementing a Rust-native fuzzer tailored to DataFusion may actually be simpler than integrating with an external framework. I hope to find time to continue the Rust version.
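To make the three pieces named above (query tree generation, oracle query transformation, expression rendering) a bit more concrete, here is a minimal standalone sketch in plain Rust. It is not code from SQLancer, DataFusion, or the Rust re-implementation: `Expr`, `render`, `tlp_partitions`, and `tlp_holds` are hypothetical names, and Ternary Logic Partitioning (TLP) is chosen only as one well-known example of a testing oracle. A real fuzzer would presumably reuse DataFusion's own expression types instead of this toy enum.

```rust
// Sketch only: all names are hypothetical, not SQLancer or DataFusion APIs.

/// A tiny expression tree, standing in for what a query generator would produce.
enum Expr {
    Column(String),
    Int(i64),
    Binary(Box<Expr>, &'static str, Box<Expr>),
}

/// Expression rendering: turn the tree back into SQL text.
/// Binary expressions are always parenthesized so precedence is unambiguous.
fn render(expr: &Expr) -> String {
    match expr {
        Expr::Column(name) => name.clone(),
        Expr::Int(value) => value.to_string(),
        Expr::Binary(left, op, right) => {
            format!("({} {} {})", render(left), op, render(right))
        }
    }
}

/// Oracle transformation (TLP): split an unfiltered scan of `table` into three
/// queries whose results together should cover every row exactly once, since
/// a predicate evaluates to TRUE, FALSE, or NULL for each row.
fn tlp_partitions(table: &str, predicate: &Expr) -> [String; 3] {
    let p = render(predicate);
    [
        format!("SELECT * FROM {table} WHERE {p}"),
        format!("SELECT * FROM {table} WHERE NOT {p}"),
        format!("SELECT * FROM {table} WHERE {p} IS NULL"),
    ]
}

/// Oracle check: the partition row counts must add up to the unfiltered count.
/// A mismatch points at a possible logic bug in the engine under test.
fn tlp_holds(total_rows: usize, partition_rows: [usize; 3]) -> bool {
    partition_rows.iter().sum::<usize>() == total_rows
}
```

For a predicate built as `Expr::Binary(Box::new(Expr::Column("a".to_string())), ">", Box::new(Expr::Int(5)))`, `tlp_partitions("t", &predicate)` yields the three queries filtering on `(a > 5)`, `NOT (a > 5)`, and `(a > 5) IS NULL`. Executing them and counting rows is left out here, since that part depends on the engine API.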
@2010YOUY01 Interesting, thanks for the additional context!
Tested briefly by compiling and using datafusion-sqlancer against the latest DataFusion code in git master. Without this PR, this does not work; with this PR, it works as expected.