Implement new query-aware indexing strategies and add synthetic stream benchmark #18
Mirovh wants to merge 27 commits into
Update to 0.1.1
Update to 0.1.1
Optimized rdf indexing
}

// Remove from index using helper function
remove_from_index(&mut self.ops, s, p, o);
}
}

panic!(
BucketIndex::query() can panic! when the query shape was not pre-planned. That violates the trait contract in mod.rs (line 94), which says query should always work.
Moreover, panicking is usually a bad idea in production code; it is better to provide a fallback.
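A minimal sketch of such a fallback, under assumptions: `Triple`, `Pattern`, and `BucketIndexSketch` below are hypothetical stand-ins, not the types used in this PR. The idea is that a query shape with no pre-planned bucket degrades to a linear scan instead of panicking.

```rust
use std::collections::HashMap;

/// Hypothetical triple of dictionary-encoded IDs (not the PR's actual type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Triple {
    pub s: u32,
    pub p: u32,
    pub o: u32,
}

/// Hypothetical query pattern: `None` means the position is unbound.
#[derive(Clone, Copy, Debug)]
pub struct Pattern {
    pub s: Option<u32>,
    pub p: Option<u32>,
    pub o: Option<u32>,
}

impl Pattern {
    /// True when every bound position equals the triple's value.
    fn matches(&self, t: &Triple) -> bool {
        self.s.map_or(true, |s| s == t.s)
            && self.p.map_or(true, |p| p == t.p)
            && self.o.map_or(true, |o| o == t.o)
    }
}

/// Hypothetical index that only pre-plans lookups by predicate.
#[derive(Default)]
pub struct BucketIndexSketch {
    all: Vec<Triple>,
    by_predicate: HashMap<u32, Vec<Triple>>,
}

impl BucketIndexSketch {
    pub fn insert(&mut self, t: Triple) {
        self.all.push(t);
        self.by_predicate.entry(t.p).or_default().push(t);
    }

    /// Never panics: unplanned shapes fall back to a full scan.
    pub fn query(&self, pat: &Pattern) -> Vec<Triple> {
        match pat.p {
            // Planned shape: the predicate is bound, so use its bucket.
            Some(p) => self
                .by_predicate
                .get(&p)
                .map(|bucket| {
                    bucket
                        .iter()
                        .filter(|t| pat.matches(t))
                        .copied()
                        .collect::<Vec<_>>()
                })
                .unwrap_or_default(),
            // Unplanned shape: slower, but still correct.
            None => self
                .all
                .iter()
                .filter(|t| pat.matches(t))
                .copied()
                .collect::<Vec<_>>(),
        }
    }
}
```

The fallback costs a full scan, so it may be worth logging when it is taken; that way an unplanned query shape shows up as a performance signal rather than a crash.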
inserted
}

fn delete(&mut self, triple: &Triple) -> bool {
delete() can return false even when the triple was actually removed, because only the SPO deletion result is assigned to deleted.
So if the partial index was built without SPO, for example with only POS, the triple is removed from the available index but delete() still returns false.
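One possible fix, sketched with hypothetical names (`PartialIndexSketch` and its two optional permutations are assumptions, not the PR's actual layout): OR the result of every index that is present into `deleted`, so the return value reflects any removal.

```rust
use std::collections::BTreeSet;

type Id = u32;

/// Hypothetical partial index: each permutation is optional.
#[derive(Default)]
pub struct PartialIndexSketch {
    spo: Option<BTreeSet<(Id, Id, Id)>>,
    pos: Option<BTreeSet<(Id, Id, Id)>>,
}

impl PartialIndexSketch {
    /// Build an index that keeps only the POS permutation.
    pub fn pos_only() -> Self {
        Self { spo: None, pos: Some(BTreeSet::new()) }
    }

    pub fn insert(&mut self, s: Id, p: Id, o: Id) -> bool {
        let mut inserted = false;
        if let Some(idx) = self.spo.as_mut() {
            inserted |= idx.insert((s, p, o));
        }
        if let Some(idx) = self.pos.as_mut() {
            inserted |= idx.insert((p, o, s));
        }
        inserted
    }

    /// Returns true if the triple was removed from ANY present index,
    /// not just from SPO.
    pub fn delete(&mut self, s: Id, p: Id, o: Id) -> bool {
        let mut deleted = false;
        if let Some(idx) = self.spo.as_mut() {
            deleted |= idx.remove(&(s, p, o));
        }
        if let Some(idx) = self.pos.as_mut() {
            deleted |= idx.remove(&(p, o, s));
        }
        deleted
    }
}
```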
Box::new(self.clone())
}

fn supported_access_patterns(&self) -> AccessPatternSupport {
supported_access_patterns() disagrees with the actual scan_*() methods.
For example, this reports sp = true when either SPO or PSO exists:
sp: self.spo.is_some() || self.pso.is_some(),
But scan_sp() only checks SPO:
self.spo.as_ref().and_then(|idx| idx.get(&s).and_then(|m| m.get(&p)))
So a PSO-only partial index reports that SP scans are supported, but scan_sp(s, p) returns None.
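A sketch of one way to make the two agree, assuming a nested-map layout (`SpIndexSketch`, `Nested`, and `supports_sp` are hypothetical names for illustration): scan_sp tries SPO first and otherwise answers from PSO with the key order swapped. The alternative is to report sp as supported only when SPO exists.

```rust
use std::collections::{BTreeSet, HashMap};

type Id = u32;
/// Two-level map: first key -> second key -> set of third components.
type Nested = HashMap<Id, HashMap<Id, BTreeSet<Id>>>;

/// Hypothetical partial index with optional SPO and PSO permutations.
#[derive(Default)]
pub struct SpIndexSketch {
    spo: Option<Nested>, // s -> p -> {o}
    pso: Option<Nested>, // p -> s -> {o}
}

impl SpIndexSketch {
    /// Build an index that keeps only the PSO permutation.
    pub fn pso_only() -> Self {
        Self { spo: None, pso: Some(Nested::new()) }
    }

    pub fn insert(&mut self, s: Id, p: Id, o: Id) {
        if let Some(spo) = self.spo.as_mut() {
            spo.entry(s).or_default().entry(p).or_default().insert(o);
        }
        if let Some(pso) = self.pso.as_mut() {
            pso.entry(p).or_default().entry(s).or_default().insert(o);
        }
    }

    /// Reported capability: either permutation can answer (s, p, ?).
    pub fn supports_sp(&self) -> bool {
        self.spo.is_some() || self.pso.is_some()
    }

    /// Matches the reported capability: SPO if present, else PSO.
    pub fn scan_sp(&self, s: Id, p: Id) -> Option<&BTreeSet<Id>> {
        if let Some(spo) = self.spo.as_ref() {
            return spo.get(&s).and_then(|m| m.get(&p));
        }
        self.pso
            .as_ref()
            .and_then(|pso| pso.get(&p))
            .and_then(|m| m.get(&s))
    }
}
```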
Query-aware RDF indexing strategies
Kolibrie currently uses an RDF triple store technique called Hexastore.
This PR introduces an index interface that is implemented by several new indexing strategies (triple stores) as well as by the existing Hexastore implementation.
New indexes:
This allows users to choose which type of index to build.
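To illustrate the general shape of such an interface, here is a minimal sketch. Everything in it is an assumption for illustration: `TripleIndex`, `FlatIndex`, `SubjectIndex`, `IndexKind`, and `build_index` are hypothetical names, and the actual trait in this PR has a different, larger surface.

```rust
use std::collections::HashMap;

/// Hypothetical triple of encoded IDs: (subject, predicate, object).
pub type Triple = (u32, u32, u32);

/// Hypothetical common interface shared by all index implementations.
pub trait TripleIndex {
    /// Returns true if the triple was not already present.
    fn insert(&mut self, t: Triple) -> bool;
    /// All triples with the given subject.
    fn by_subject(&self, s: u32) -> Vec<Triple>;
}

/// Toy strategy 1: a flat list answered by linear scan.
#[derive(Default)]
pub struct FlatIndex {
    triples: Vec<Triple>,
}

impl TripleIndex for FlatIndex {
    fn insert(&mut self, t: Triple) -> bool {
        if self.triples.contains(&t) {
            return false;
        }
        self.triples.push(t);
        true
    }

    fn by_subject(&self, s: u32) -> Vec<Triple> {
        self.triples.iter().filter(|t| t.0 == s).copied().collect()
    }
}

/// Toy strategy 2: triples bucketed by subject.
#[derive(Default)]
pub struct SubjectIndex {
    by_s: HashMap<u32, Vec<Triple>>,
}

impl TripleIndex for SubjectIndex {
    fn insert(&mut self, t: Triple) -> bool {
        let bucket = self.by_s.entry(t.0).or_default();
        if bucket.contains(&t) {
            return false;
        }
        bucket.push(t);
        true
    }

    fn by_subject(&self, s: u32) -> Vec<Triple> {
        self.by_s.get(&s).cloned().unwrap_or_default()
    }
}

/// The caller picks a strategy; the rest of the code sees only the trait.
pub enum IndexKind {
    Flat,
    Subject,
}

pub fn build_index(kind: IndexKind) -> Box<dyn TripleIndex> {
    match kind {
        IndexKind::Flat => Box::new(FlatIndex::default()),
        IndexKind::Subject => Box::new(SubjectIndex::default()),
    }
}
```

Returning a boxed trait object keeps the choice of index a runtime decision, which is convenient when a benchmark needs to run the same workload against every strategy in turn.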
Synthetic stream benchmark
A new synthetic stream benchmark was implemented to compare these indexing strategies.
The data generator used is also included, with tweakable parameters.
A helper shell script is included that runs all indexes in succession and writes the results to files.
Other
A helper shell script was also added for the WatDiv benchmark; it likewise runs all indexes in succession and writes the results to files.