GFQL Testing & Verification Roadmap
This meta-issue tracks follow-up work to achieve production-grade and publication-quality testing/verification for GFQL.
Context
PRs #846 (cudf same-path executor) and #852 (Alloy verification) add significant GFQL functionality. This roadmap extends testing to meet:
- Production standards: Conformance suites, metamorphic testing, fuzz testing
- Publication standards: Formal verification coverage for SIGMOD/VLDB-tier claims
Bugs Found in PR #846 (Lessons for Verification)
During development of the native vectorized executor in PR #846, several algorithmic bugs were discovered that formal verification could have caught earlier. These inform what the Alloy model should cover.
Bug 1: Backward Traversal Join Direction (commit 71cda418)
Location: _find_multihop_start_nodes in df_executor.py:754
Bug: When doing backward BFS to find which start nodes can reach end nodes, the join was on the wrong column:
# WRONG: joined on __to__ (predecessor) instead of __from__ (target)
new_frontier = edge_pairs.merge(frontier, left_on='__to__', ...)
# CORRECT: join on __from__ to find predecessors
new_frontier = edge_pairs.merge(frontier, left_on='__from__', ...)
Symptom: Multi-hop + WHERE queries returned wrong nodes or empty results.
Alloy gap: The model's AlgoBackward doesn't explicitly verify join direction semantics.
Recommendation: Add assertion that backward reachability computes { n | ∃ path n →* target } not { n | ∃ path target →* n }.
Bug 2: Empty Set Short-Circuit Missing (commit 71cda418)
Location: _materialize_filtered in df_executor.py:1520
Bug: If any step in the chain had an empty allowed node set (e.g., WHERE filtered out all nodes), the executor didn't short-circuit to empty result.
# FIX: Early return when any node step has empty set
if path_state.allowed_nodes:
for node_set in path_state.allowed_nodes.values():
if node_set is not None and len(node_set) == 0:
return empty_result
Symptom: 4-step chains with impossible WHERE returned nodes instead of empty.
Alloy gap: Model implicitly handles via no quantifier but doesn't assert materialization behavior.
Recommendation: Add assertion: (some i: NodeStep | AlgoAllowed[c, i] = none) implies AlgoOutN[c] = none
Bug 3: Wrong Node Source for Non-Adjacent WHERE (commit 71cda418)
Location: _apply_non_adjacent_where_post_prune in df_executor.py:410
Bug: Used alias_frames (populated during forward pass) instead of original graph nodes for value lookups. Forward pass may not populate all aliases.
# WRONG: alias_frames can be incomplete
left_values = alias_frames.get(left_alias)
# CORRECT: Use original graph's node DataFrame
nodes_df = self.inputs.graph._nodes
Symptom: Non-adjacent WHERE comparisons failed silently or used stale values.
Alloy gap: lowerWhere doesn't model where attribute values come from.
Recommendation: Clarify that WHERE value lookups use the full graph's node attributes, filtered to allowed nodes.
Bug 4: Multi-hop Path Tracing Through Intermediates (commit cd579363)
Location: _filter_multihop_edges_by_endpoints in df_executor.py:700
Bug: Backward prune filtered edges by dst in allowed_end_nodes, but for multi-hop, START connects to INTERMEDIATE nodes, not END directly.
Symptom: Start nodes showed as empty for multi-hop chains.
Alloy gap: Hop ranges not modeled at all.
Recommendation: High priority - Add hop range modeling to Alloy (see Sub-Issue #2).
Bug 5: Reverse/Undirected Edge Direction Handling (commit cd579363)
Location: Multiple functions in backward prune phase
Bug: is_reverse and is_undirected flags weren't correctly propagating through path tracing. For reverse edges, src/dst semantics are swapped.
Symptom: Reverse edge chains returned wrong or empty results.
Alloy gap: Model assumes forward edges only (implicit e.src → e.dst).
Recommendation: Add EdgeStep.direction attribute (forward/reverse/undirected) and verify directional semantics.
Tests Added
10 new tests in TestImpossibleConstraints class (test_df_executor_inputs.py):
test_contradictory_lt_gt_same_column
test_contradictory_eq_neq_same_column
test_contradictory_lte_gt_same_column
test_no_paths_satisfy_predicate
test_multihop_no_valid_endpoints
test_contradictory_on_different_columns
test_chain_with_impossible_intermediate
test_non_adjacent_impossible_constraint
test_empty_graph_with_constraints
test_no_edges_with_constraints
Sub-Issues
1. Metamorphic Testing for GFQL
Priority: P2
Add query equivalence testing under graph/query transformations:
- Graph isomorphism:
query(G) ≡ query(G') where G' is isomorphic to G
- Query partitioning:
WHERE x > 5 ≡ (WHERE x > 5 AND x > 3)
- Subgraph preservation: Results on subgraph ⊆ results on full graph
Reference: Gamera (VLDB 2024) - Graph-aware metamorphic relations
2. Extend Alloy Model to Cover Hop Ranges
Priority: P1 ⬆️ (upgraded based on bugs found)
Current Alloy model (alloy/gfql_fbf_where.als) does NOT formally verify:
min_hops / max_hops traversal bounds
output_min_hops / output_max_hops slicing
- Hop labeling (
label_node_hops, label_edge_hops, label_seeds)
- NEW: Backward traversal join direction for multi-hop
- NEW: Intermediate node reachability (start → intermediate → end)
Options:
- A) Explicit modeling: Add hop counter to path semantics, verify bounds
- B) Unrolling approach: Generate fixed-length chain variants, verify each
- C) Abstraction: Prove hop filtering is a sound post-filter on verified core
Rationale for P1: Bugs #1 and #4 above were in multi-hop logic that formal verification should catch.
3. Add Backward Reachability Assertions to Alloy
Priority: P1 (new)
Verify that AlgoBackward correctly computes predecessors:
assert BackwardReachabilityCorrect {
all c: Chain, n: Node, target: Node |
n in AlgoBackward[c, target] iff
(some path: seq Edge | pathConnects[path, n, target])
}
This would have caught Bug #1 (join direction).
4. Add Empty Set Propagation Assertion
Priority: P2 (new)
Verify that if any step has an empty allowed set, the output is empty:
assert EmptyPropagation {
all c: Chain |
(some i: seq/inds[c.seqSteps] | AlgoAllowed[c, i] = none)
implies (AlgoOutN[c] = none and AlgoOutE[c] = none)
}
This would have caught Bug #2.
5. Add Contradictory WHERE Scenarios
Priority: P2 (new)
Verify that contradictory WHERE clauses produce empty results:
pred ContradictoryWhere[c: Chain] {
some w1, w2: c.wheres |
w1.lhs.a = w2.lhs.a and w1.rhs.a = w2.rhs.a and
((w1.op = Lt and w2.op = Gt) or (w1.op = Eq and w2.op = Neq))
}
assert ContradictoryWhereEmpty {
all c: Chain | ContradictoryWhere[c] implies AlgoOutN[c] = none
}
6. Fuzz Testing for Chain + WHERE Combinations
Priority: P3
Random generation of:
- Graph structures (nodes, edges, types, values)
- GFQL chains (varying length, directions, hop params)
- WHERE clauses (random alias pairs, operators, column refs)
Compare cuDF executor vs oracle on generated inputs.
Tools: Hypothesis (Python), custom generators
7. Cypher/GQL Conformance Suite
Priority: P3
Work toward openCypher TCK compatibility:
- Map GFQL operations to Cypher equivalents
- Adapt TCK scenarios for GFQL surface syntax
- Track coverage percentage
Note: GFQL currently lacks Cypher surface syntax; this is aspirational.
Reference: Neo4j GQL Conformance
Current Coverage
| Category |
Status |
Location |
| Hop ranges (parity) |
✅ 11+ tests |
test_enumerator_parity.py |
| WHERE (parity) |
✅ 50+ tests |
test_df_executor_inputs.py |
| Hop + WHERE composition |
✅ 20+ tests |
test_df_executor_inputs.py |
| Impossible constraints |
✅ 10 tests |
test_df_executor_inputs.py |
| Alloy (WHERE lowering) |
✅ Verified |
alloy/gfql_fbf_where.als |
| Alloy (hop ranges) |
❌ Not modeled |
- |
| Alloy (backward reachability) |
❌ Not explicit |
- |
| Alloy (empty propagation) |
❌ Implicit only |
- |
References
Labels
testing, verification, gfql, roadmap
GFQL Testing & Verification Roadmap
This meta-issue tracks follow-up work to achieve production-grade and publication-quality testing/verification for GFQL.
Context
PRs #846 (cudf same-path executor) and #852 (Alloy verification) add significant GFQL functionality. This roadmap extends testing to meet:
Bugs Found in PR #846 (Lessons for Verification)
During development of the native vectorized executor in PR #846, several algorithmic bugs were discovered that formal verification could have caught earlier. These inform what the Alloy model should cover.
Bug 1: Backward Traversal Join Direction (commit
71cda418)Location:
_find_multihop_start_nodesindf_executor.py:754Bug: When doing backward BFS to find which start nodes can reach end nodes, the join was on the wrong column:
Symptom: Multi-hop + WHERE queries returned wrong nodes or empty results.
Alloy gap: The model's
AlgoBackwarddoesn't explicitly verify join direction semantics.Recommendation: Add assertion that backward reachability computes
{ n | ∃ path n →* target }not{ n | ∃ path target →* n }.Bug 2: Empty Set Short-Circuit Missing (commit
71cda418)Location:
_materialize_filteredindf_executor.py:1520Bug: If any step in the chain had an empty allowed node set (e.g., WHERE filtered out all nodes), the executor didn't short-circuit to empty result.
Symptom: 4-step chains with impossible WHERE returned nodes instead of empty.
Alloy gap: Model implicitly handles via
noquantifier but doesn't assert materialization behavior.Recommendation: Add assertion:
(some i: NodeStep | AlgoAllowed[c, i] = none) implies AlgoOutN[c] = noneBug 3: Wrong Node Source for Non-Adjacent WHERE (commit
71cda418)Location:
_apply_non_adjacent_where_post_pruneindf_executor.py:410Bug: Used
alias_frames(populated during forward pass) instead of original graph nodes for value lookups. Forward pass may not populate all aliases.Symptom: Non-adjacent WHERE comparisons failed silently or used stale values.
Alloy gap:
lowerWheredoesn't model where attribute values come from.Recommendation: Clarify that WHERE value lookups use the full graph's node attributes, filtered to allowed nodes.
Bug 4: Multi-hop Path Tracing Through Intermediates (commit
cd579363)Location:
_filter_multihop_edges_by_endpointsindf_executor.py:700Bug: Backward prune filtered edges by
dst in allowed_end_nodes, but for multi-hop, START connects to INTERMEDIATE nodes, not END directly.Symptom: Start nodes showed as empty for multi-hop chains.
Alloy gap: Hop ranges not modeled at all.
Recommendation: High priority - Add hop range modeling to Alloy (see Sub-Issue #2).
Bug 5: Reverse/Undirected Edge Direction Handling (commit
cd579363)Location: Multiple functions in backward prune phase
Bug:
is_reverseandis_undirectedflags weren't correctly propagating through path tracing. For reverse edges, src/dst semantics are swapped.Symptom: Reverse edge chains returned wrong or empty results.
Alloy gap: Model assumes forward edges only (implicit
e.src → e.dst).Recommendation: Add
EdgeStep.directionattribute (forward/reverse/undirected) and verify directional semantics.Tests Added
10 new tests in
TestImpossibleConstraintsclass (test_df_executor_inputs.py):test_contradictory_lt_gt_same_columntest_contradictory_eq_neq_same_columntest_contradictory_lte_gt_same_columntest_no_paths_satisfy_predicatetest_multihop_no_valid_endpointstest_contradictory_on_different_columnstest_chain_with_impossible_intermediatetest_non_adjacent_impossible_constrainttest_empty_graph_with_constraintstest_no_edges_with_constraintsSub-Issues
1. Metamorphic Testing for GFQL
Priority: P2
Add query equivalence testing under graph/query transformations:
query(G) ≡ query(G')whereG'is isomorphic toGWHERE x > 5≡(WHERE x > 5 AND x > 3)Reference: Gamera (VLDB 2024) - Graph-aware metamorphic relations
2. Extend Alloy Model to Cover Hop Ranges
Priority: P1 ⬆️ (upgraded based on bugs found)
Current Alloy model (
alloy/gfql_fbf_where.als) does NOT formally verify:min_hops/max_hopstraversal boundsoutput_min_hops/output_max_hopsslicinglabel_node_hops,label_edge_hops,label_seeds)Options:
Rationale for P1: Bugs #1 and #4 above were in multi-hop logic that formal verification should catch.
3. Add Backward Reachability Assertions to Alloy
Priority: P1 (new)
Verify that
AlgoBackwardcorrectly computes predecessors:This would have caught Bug #1 (join direction).
4. Add Empty Set Propagation Assertion
Priority: P2 (new)
Verify that if any step has an empty allowed set, the output is empty:
This would have caught Bug #2.
5. Add Contradictory WHERE Scenarios
Priority: P2 (new)
Verify that contradictory WHERE clauses produce empty results:
6. Fuzz Testing for Chain + WHERE Combinations
Priority: P3
Random generation of:
Compare cuDF executor vs oracle on generated inputs.
Tools: Hypothesis (Python), custom generators
7. Cypher/GQL Conformance Suite
Priority: P3
Work toward openCypher TCK compatibility:
Note: GFQL currently lacks Cypher surface syntax; this is aspirational.
Reference: Neo4j GQL Conformance
Current Coverage
test_enumerator_parity.pytest_df_executor_inputs.pytest_df_executor_inputs.pytest_df_executor_inputs.pyalloy/gfql_fbf_where.alsReferences
71cda418,cd579363Labels
testing, verification, gfql, roadmap