Skip to content

Meta: GFQL Testing & Verification Roadmap #871

@lmeyerov

Description

@lmeyerov

GFQL Testing & Verification Roadmap

This meta-issue tracks follow-up work to achieve production-grade and publication-quality testing/verification for GFQL.

Context

PRs #846 (cudf same-path executor) and #852 (Alloy verification) add significant GFQL functionality. This roadmap extends testing to meet:

  • Production standards: Conformance suites, metamorphic testing, fuzz testing
  • Publication standards: Formal verification coverage for SIGMOD/VLDB-tier claims

Bugs Found in PR #846 (Lessons for Verification)

During development of the native vectorized executor in PR #846, several algorithmic bugs were discovered that formal verification could have caught earlier. These inform what the Alloy model should cover.

Bug 1: Backward Traversal Join Direction (commit 71cda418)

Location: _find_multihop_start_nodes in df_executor.py:754

Bug: When doing backward BFS to find which start nodes can reach end nodes, the join was on the wrong column:

# WRONG: joined on __to__ (predecessor) instead of __from__ (target)
new_frontier = edge_pairs.merge(frontier, left_on='__to__', ...)

# CORRECT: join on __from__ to find predecessors
new_frontier = edge_pairs.merge(frontier, left_on='__from__', ...)

Symptom: Multi-hop + WHERE queries returned wrong nodes or empty results.

Alloy gap: The model's AlgoBackward doesn't explicitly verify join direction semantics.

Recommendation: Add assertion that backward reachability computes { n | ∃ path n →* target } not { n | ∃ path target →* n }.

Bug 2: Empty Set Short-Circuit Missing (commit 71cda418)

Location: _materialize_filtered in df_executor.py:1520

Bug: If any step in the chain had an empty allowed node set (e.g., WHERE filtered out all nodes), the executor didn't short-circuit to empty result.

# FIX: Early return when any node step has empty set
if path_state.allowed_nodes:
    for node_set in path_state.allowed_nodes.values():
        if node_set is not None and len(node_set) == 0:
            return empty_result

Symptom: 4-step chains with impossible WHERE returned nodes instead of empty.

Alloy gap: Model implicitly handles via no quantifier but doesn't assert materialization behavior.

Recommendation: Add assertion: (some i: NodeStep | AlgoAllowed[c, i] = none) implies AlgoOutN[c] = none

Bug 3: Wrong Node Source for Non-Adjacent WHERE (commit 71cda418)

Location: _apply_non_adjacent_where_post_prune in df_executor.py:410

Bug: Used alias_frames (populated during forward pass) instead of original graph nodes for value lookups. Forward pass may not populate all aliases.

# WRONG: alias_frames can be incomplete
left_values = alias_frames.get(left_alias)

# CORRECT: Use original graph's node DataFrame
nodes_df = self.inputs.graph._nodes

Symptom: Non-adjacent WHERE comparisons failed silently or used stale values.

Alloy gap: lowerWhere doesn't model where attribute values come from.

Recommendation: Clarify that WHERE value lookups use the full graph's node attributes, filtered to allowed nodes.

Bug 4: Multi-hop Path Tracing Through Intermediates (commit cd579363)

Location: _filter_multihop_edges_by_endpoints in df_executor.py:700

Bug: Backward prune filtered edges by dst in allowed_end_nodes, but for multi-hop, START connects to INTERMEDIATE nodes, not END directly.

Symptom: Start nodes showed as empty for multi-hop chains.

Alloy gap: Hop ranges not modeled at all.

Recommendation: High priority - Add hop range modeling to Alloy (see Sub-Issue #2).

Bug 5: Reverse/Undirected Edge Direction Handling (commit cd579363)

Location: Multiple functions in backward prune phase

Bug: is_reverse and is_undirected flags weren't correctly propagating through path tracing. For reverse edges, src/dst semantics are swapped.

Symptom: Reverse edge chains returned wrong or empty results.

Alloy gap: Model assumes forward edges only (implicit e.src → e.dst).

Recommendation: Add EdgeStep.direction attribute (forward/reverse/undirected) and verify directional semantics.

Tests Added

10 new tests in TestImpossibleConstraints class (test_df_executor_inputs.py):

  • test_contradictory_lt_gt_same_column
  • test_contradictory_eq_neq_same_column
  • test_contradictory_lte_gt_same_column
  • test_no_paths_satisfy_predicate
  • test_multihop_no_valid_endpoints
  • test_contradictory_on_different_columns
  • test_chain_with_impossible_intermediate
  • test_non_adjacent_impossible_constraint
  • test_empty_graph_with_constraints
  • test_no_edges_with_constraints

Sub-Issues

1. Metamorphic Testing for GFQL

Priority: P2

Add query equivalence testing under graph/query transformations:

  • Graph isomorphism: query(G) ≡ query(G') where G' is isomorphic to G
  • Query partitioning: WHERE x > 5(WHERE x > 5 AND x > 3)
  • Subgraph preservation: Results on subgraph ⊆ results on full graph

Reference: Gamera (VLDB 2024) - Graph-aware metamorphic relations

2. Extend Alloy Model to Cover Hop Ranges

Priority: P1 ⬆️ (upgraded based on bugs found)

Current Alloy model (alloy/gfql_fbf_where.als) does NOT formally verify:

  • min_hops / max_hops traversal bounds
  • output_min_hops / output_max_hops slicing
  • Hop labeling (label_node_hops, label_edge_hops, label_seeds)
  • NEW: Backward traversal join direction for multi-hop
  • NEW: Intermediate node reachability (start → intermediate → end)

Options:

  • A) Explicit modeling: Add hop counter to path semantics, verify bounds
  • B) Unrolling approach: Generate fixed-length chain variants, verify each
  • C) Abstraction: Prove hop filtering is a sound post-filter on verified core

Rationale for P1: Bugs #1 and #4 above were in multi-hop logic that formal verification should catch.

3. Add Backward Reachability Assertions to Alloy

Priority: P1 (new)

Verify that AlgoBackward correctly computes predecessors:

assert BackwardReachabilityCorrect {
  all c: Chain, n: Node, target: Node |
    n in AlgoBackward[c, target] iff
    (some path: seq Edge | pathConnects[path, n, target])
}

This would have caught Bug #1 (join direction).

4. Add Empty Set Propagation Assertion

Priority: P2 (new)

Verify that if any step has an empty allowed set, the output is empty:

assert EmptyPropagation {
  all c: Chain |
    (some i: seq/inds[c.seqSteps] | AlgoAllowed[c, i] = none)
    implies (AlgoOutN[c] = none and AlgoOutE[c] = none)
}

This would have caught Bug #2.

5. Add Contradictory WHERE Scenarios

Priority: P2 (new)

Verify that contradictory WHERE clauses produce empty results:

pred ContradictoryWhere[c: Chain] {
  some w1, w2: c.wheres |
    w1.lhs.a = w2.lhs.a and w1.rhs.a = w2.rhs.a and
    ((w1.op = Lt and w2.op = Gt) or (w1.op = Eq and w2.op = Neq))
}

assert ContradictoryWhereEmpty {
  all c: Chain | ContradictoryWhere[c] implies AlgoOutN[c] = none
}

6. Fuzz Testing for Chain + WHERE Combinations

Priority: P3

Random generation of:

  • Graph structures (nodes, edges, types, values)
  • GFQL chains (varying length, directions, hop params)
  • WHERE clauses (random alias pairs, operators, column refs)

Compare cuDF executor vs oracle on generated inputs.

Tools: Hypothesis (Python), custom generators

7. Cypher/GQL Conformance Suite

Priority: P3

Work toward openCypher TCK compatibility:

  • Map GFQL operations to Cypher equivalents
  • Adapt TCK scenarios for GFQL surface syntax
  • Track coverage percentage

Note: GFQL currently lacks Cypher surface syntax; this is aspirational.

Reference: Neo4j GQL Conformance


Current Coverage

Category Status Location
Hop ranges (parity) ✅ 11+ tests test_enumerator_parity.py
WHERE (parity) ✅ 50+ tests test_df_executor_inputs.py
Hop + WHERE composition ✅ 20+ tests test_df_executor_inputs.py
Impossible constraints ✅ 10 tests test_df_executor_inputs.py
Alloy (WHERE lowering) ✅ Verified alloy/gfql_fbf_where.als
Alloy (hop ranges) ❌ Not modeled -
Alloy (backward reachability) ❌ Not explicit -
Alloy (empty propagation) ❌ Implicit only -

References

Labels

testing, verification, gfql, roadmap

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions