Implement new query-aware indexing strategies and add synthetic stream benchmark #18

Open
Mirovh wants to merge 27 commits into StreamIntelligenceLab:main from Mirovh:main

@Mirovh Mirovh commented May 17, 2026

Query-aware RDF indexing strategies

Kolibrie currently stores RDF triples using the Hexastore technique.
This PR introduces an index interface that is implemented by several new indexing strategies (triple stores) as well as by the existing Hexastore implementation.
New indexes:

  • Single index: uses one of the internal indexes of a hexastore in isolation
  • Buckets: creates a bucket that materialises the query result of each access pattern present in the query plan
  • Partial Hexastore: uses the access patterns present in the query plan to eliminate unused indexes in a Hexastore
  • Table: stores all triples in a table (hashset), effectively doing no indexing

This allows users to choose which type of index to build.
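
To make the idea of a shared index interface concrete, here is a minimal sketch of what such a trait could look like, with the "Table" strategy as the simplest implementation. Every name in it (`Triple`, `TripleIndex`, `TableIndex`, `IndexKind`, `build_index`) is hypothetical and chosen for illustration; the actual trait and types in this PR may differ.

```rust
use std::collections::HashSet;

// Hypothetical triple type for illustration; Kolibrie's real definition may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Triple {
    pub s: u32,
    pub p: u32,
    pub o: u32,
}

/// Hypothetical common interface that every indexing strategy implements.
pub trait TripleIndex {
    /// Returns true if the triple was not already present.
    fn insert(&mut self, triple: Triple) -> bool;
    /// Returns true if the triple was present and has been removed.
    fn delete(&mut self, triple: &Triple) -> bool;
    /// Returns all triples matching the pattern; `None` acts as a wildcard.
    fn query(&self, s: Option<u32>, p: Option<u32>, o: Option<u32>) -> Vec<Triple>;
}

/// "Table" strategy: a plain hash set, so every query is a full scan.
#[derive(Default)]
pub struct TableIndex {
    triples: HashSet<Triple>,
}

impl TripleIndex for TableIndex {
    fn insert(&mut self, triple: Triple) -> bool {
        self.triples.insert(triple)
    }

    fn delete(&mut self, triple: &Triple) -> bool {
        self.triples.remove(triple)
    }

    fn query(&self, s: Option<u32>, p: Option<u32>, o: Option<u32>) -> Vec<Triple> {
        self.triples
            .iter()
            .filter(|t| s.map_or(true, |v| v == t.s))
            .filter(|t| p.map_or(true, |v| v == t.p))
            .filter(|t| o.map_or(true, |v| v == t.o))
            .copied()
            .collect()
    }
}

/// Hypothetical selector; a full version would list every strategy.
pub enum IndexKind {
    Table,
}

/// Builds the requested index behind the shared trait object.
pub fn build_index(kind: IndexKind) -> Box<dyn TripleIndex> {
    match kind {
        IndexKind::Table => Box::new(TableIndex::default()),
    }
}
```

With a trait object like this, benchmark code can hold a `Box<dyn TripleIndex>` and run unchanged whichever strategy is selected.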

Synthetic stream benchmark

A new synthetic stream benchmark was implemented to compare these indexing strategies.
The data generator used is also included, with tweakable parameters.
A helper shell script is included to run all indexes in succession and print results to files.

Other

A helper shell script was added for the WatDiv benchmark as well; it likewise runs all indexes in succession and writes the results to files.

@Mirovh Mirovh marked this pull request as ready for review May 17, 2026 06:30
}

// Remove from index using helper function
remove_from_index(&mut self.ops, s, p, o);
Collaborator

OPSSingleIndex::delete() checks existence using ops[o][p][s], but removes using remove_from_index(&mut self.ops, s, p, o). The arguments should be o, p, s. As written, delete() can return true while the triple remains in the index, which is a correctness bug.

Take a look at:
1
2
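
A minimal sketch of the corrected deletion, assuming the OPS index is a nested map of the form object -> predicate -> set of subjects and that `remove_from_index` takes its keys in the index's own nesting order. The helper's signature, the `Nested` alias, and the `OpsSingleIndex` struct are assumptions made for illustration, not the PR's actual code.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Assumed layout: outer key -> middle key -> set of inner keys.
type Nested = BTreeMap<u32, BTreeMap<u32, BTreeSet<u32>>>;

/// Hypothetical helper: removes `k3` from `index[k1][k2]` and prunes levels
/// that become empty. Keys must be given in the index's nesting order.
fn remove_from_index(index: &mut Nested, k1: u32, k2: u32, k3: u32) -> bool {
    let Some(mid) = index.get_mut(&k1) else { return false };
    let Some(leaf) = mid.get_mut(&k2) else { return false };
    let removed = leaf.remove(&k3);
    if leaf.is_empty() {
        mid.remove(&k2);
    }
    if mid.is_empty() {
        index.remove(&k1);
    }
    removed
}

#[derive(Default)]
struct OpsSingleIndex {
    ops: Nested, // object -> predicate -> subjects
}

impl OpsSingleIndex {
    fn insert(&mut self, s: u32, p: u32, o: u32) -> bool {
        self.ops.entry(o).or_default().entry(p).or_default().insert(s)
    }

    fn contains(&self, s: u32, p: u32, o: u32) -> bool {
        self.ops
            .get(&o)
            .and_then(|m| m.get(&p))
            .map_or(false, |subjects| subjects.contains(&s))
    }

    fn delete(&mut self, s: u32, p: u32, o: u32) -> bool {
        // Pass the keys in O, P, S order and return what the removal itself
        // reports, so the return value cannot disagree with the index state.
        remove_from_index(&mut self.ops, o, p, s)
    }
}
```

Returning the helper's own result, rather than a separate existence check, keeps the check and the removal from drifting apart.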

}
}

panic!(
Collaborator

BucketIndex::query() can panic! when the query shape was not pre-planned. That violates the trait contract in mod.rs (line 94), which says query should always work.

Moreover, panicking is usually not a good idea in production code; it is better to provide a fallback.
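
One possible shape for such a fallback, sketched under the assumption that the bucket index also keeps (or can reach) the full set of triples: when no bucket was materialised for the requested pattern, the query scans all triples instead of panicking. The `Pattern`, `BucketIndex`, and `plan` names are hypothetical, and the sketch ignores deletion and de-duplication.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Triple {
    s: u32,
    p: u32,
    o: u32,
}

/// A triple pattern used as the bucket key; `None` is a wildcard.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Pattern {
    s: Option<u32>,
    p: Option<u32>,
    o: Option<u32>,
}

impl Pattern {
    fn matches(&self, t: &Triple) -> bool {
        self.s.map_or(true, |v| v == t.s)
            && self.p.map_or(true, |v| v == t.p)
            && self.o.map_or(true, |v| v == t.o)
    }
}

#[derive(Default)]
struct BucketIndex {
    all: Vec<Triple>,                       // assumed: full data kept for fallback
    buckets: HashMap<Pattern, Vec<Triple>>, // one bucket per planned pattern
}

impl BucketIndex {
    /// Registers a pattern from the query plan and materialises its bucket.
    fn plan(&mut self, pattern: Pattern) {
        let hits: Vec<Triple> = self
            .all
            .iter()
            .filter(|t| pattern.matches(t))
            .copied()
            .collect();
        self.buckets.insert(pattern, hits);
    }

    fn insert(&mut self, t: Triple) {
        self.all.push(t);
        for (pattern, bucket) in self.buckets.iter_mut() {
            if pattern.matches(&t) {
                bucket.push(t);
            }
        }
    }

    /// Serves planned patterns from their bucket; unplanned patterns fall
    /// back to a full scan instead of panicking.
    fn query(&self, pattern: Pattern) -> Vec<Triple> {
        match self.buckets.get(&pattern) {
            Some(bucket) => bucket.clone(),
            None => self
                .all
                .iter()
                .filter(|t| pattern.matches(t))
                .copied()
                .collect(),
        }
    }
}
```

The fallback is slower than a bucket hit, but it keeps `query` total. If keeping the full data is not acceptable, returning a `Result` would be another way to avoid the panic.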

inserted
}

fn delete(&mut self, triple: &Triple) -> bool {
Collaborator

delete() can return false even when the triple was actually removed, because only the SPO deletion result is assigned to deleted.

So if the partial index was built without SPO, for example with only POS, the triple can be removed from the available index but delete() still returns false.
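
A minimal sketch of one way to address this, assuming each internal index is an optional nested map: the result of every index that is present is OR-ed into `deleted`, so the return value no longer depends on SPO existing. The struct is reduced to two indexes, and all names here are illustrative rather than taken from the PR.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Assumed layout: outer key -> middle key -> set of inner keys.
type Nested = BTreeMap<u32, BTreeMap<u32, BTreeSet<u32>>>;

/// Inserts `k3` under `index[k1][k2]`; true if it was newly added.
fn insert_nested(index: &mut Nested, k1: u32, k2: u32, k3: u32) -> bool {
    index.entry(k1).or_default().entry(k2).or_default().insert(k3)
}

/// Removes `k3` from `index[k1][k2]`; true if it was present.
/// For brevity this leaves empty inner maps in place.
fn remove_nested(index: &mut Nested, k1: u32, k2: u32, k3: u32) -> bool {
    index
        .get_mut(&k1)
        .and_then(|mid| mid.get_mut(&k2))
        .map_or(false, |leaf| leaf.remove(&k3))
}

/// Reduced partial Hexastore with only two of the six possible indexes.
#[derive(Default)]
struct PartialHexastore {
    spo: Option<Nested>,
    pos: Option<Nested>,
}

impl PartialHexastore {
    fn insert(&mut self, s: u32, p: u32, o: u32) -> bool {
        let mut inserted = false;
        if let Some(idx) = self.spo.as_mut() {
            inserted |= insert_nested(idx, s, p, o);
        }
        if let Some(idx) = self.pos.as_mut() {
            inserted |= insert_nested(idx, p, o, s);
        }
        inserted
    }

    fn delete(&mut self, s: u32, p: u32, o: u32) -> bool {
        // Combine the results of all present indexes instead of SPO only.
        let mut deleted = false;
        if let Some(idx) = self.spo.as_mut() {
            deleted |= remove_nested(idx, s, p, o);
        }
        if let Some(idx) = self.pos.as_mut() {
            deleted |= remove_nested(idx, p, o, s);
        }
        deleted
    }
}
```

Since every present index should hold the same triples, taking the result from any one present index would work equally well; OR-ing them is simply the least surprising form.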

Box::new(self.clone())
}

fn supported_access_patterns(&self) -> AccessPatternSupport {
Collaborator

supported_access_patterns() disagrees with the actual scan_*() methods.
For example, this reports sp = true when either SPO or PSO exists:
sp: self.spo.is_some() || self.pso.is_some(),

But scan_sp() only checks SPO:
self.spo.as_ref().and_then(|idx| idx.get(&s).and_then(|m| m.get(&p)))

So a PSO-only partial index reports that SP scans are supported, but scan_sp(s, p) returns None.
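
One way to bring the two back in line is sketched below, assuming PSO is laid out as predicate -> subject -> set of objects: `scan_sp` tries SPO first and otherwise serves the same lookup from PSO. `AccessPatternSupport` is cut down to a single field here, and the struct holds only the two relevant indexes; both are simplifications for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Assumed layout: outer key -> middle key -> set of inner keys.
type Nested = BTreeMap<u32, BTreeMap<u32, BTreeSet<u32>>>;

/// Reduced, hypothetical version of the support flags (only `sp` shown).
struct AccessPatternSupport {
    sp: bool,
}

#[derive(Default)]
struct PartialHexastore {
    spo: Option<Nested>, // subject -> predicate -> objects
    pso: Option<Nested>, // predicate -> subject -> objects
}

impl PartialHexastore {
    fn supported_access_patterns(&self) -> AccessPatternSupport {
        AccessPatternSupport {
            sp: self.spo.is_some() || self.pso.is_some(),
        }
    }

    /// Returns the objects for a fixed subject and predicate, using
    /// whichever of SPO or PSO is available.
    fn scan_sp(&self, s: u32, p: u32) -> Option<&BTreeSet<u32>> {
        let via_spo = self
            .spo
            .as_ref()
            .and_then(|idx| idx.get(&s).and_then(|m| m.get(&p)));
        via_spo.or_else(|| {
            self.pso
                .as_ref()
                .and_then(|idx| idx.get(&p).and_then(|m| m.get(&s)))
        })
    }
}
```

The alternative is to narrow the report instead, for example `sp: self.spo.is_some()`, so that the flags only promise what the scan methods can deliver.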

@ladroid ladroid assigned ladroid and unassigned ladroid May 19, 2026
2 participants