Adding node layers to tests and loaders #2597
Draft
arienandalibi wants to merge 96 commits into db_v4 from
…re-compute new IDs and turn them into RecordBatches
…ock the graph to get parallel iterators over edges. We filter to respect GraphView filtering behaviour.
…ill use ArrowWriter<File> for now, but we will add support for loading into a graph
# Conflicts: # raphtory/src/serialise/parquet/mod.rs
…ng explode_layers() on each EdgeView.
… function can now be passed to these functions to determine how the sinks will be created. This will allow us to pass a sink which is a crossbeam_channel to send RecordBatches elsewhere.
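The sink-factory idea in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not Raphtory's code. `Batch` is a hypothetical stand-in for Arrow's `RecordBatch`. `BatchSink`, `VecSink`, `ChannelSink` and `encode_with` are hypothetical names. `std::sync::mpsc` replaces `crossbeam_channel` so that the example has no dependencies.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical stand-in for an Arrow `RecordBatch` (the real type lives in the `arrow` crate).
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub rows: Vec<u64>,
}

/// Anything that can consume encoded batches.
pub trait BatchSink {
    fn write(&mut self, batch: Batch);
}

/// Collects batches in memory; stands in for a file-backed writer.
#[derive(Default)]
pub struct VecSink {
    pub batches: Vec<Batch>,
}

impl BatchSink for VecSink {
    fn write(&mut self, batch: Batch) {
        self.batches.push(batch);
    }
}

/// Forwards batches over a channel to a consumer elsewhere.
/// `std::sync::mpsc` is used here in place of `crossbeam_channel`.
pub struct ChannelSink {
    pub tx: Sender<Batch>,
}

impl BatchSink for ChannelSink {
    fn write(&mut self, batch: Batch) {
        // If the receiver has hung up, nobody wants the data, so the error is ignored.
        let _ = self.tx.send(batch);
    }
}

/// Encode `rows` in chunks, asking `make_sink` for one sink per chunk
/// (e.g. one per output file). The caller decides what a sink is.
pub fn encode_with<S, F>(rows: &[u64], chunk_size: usize, mut make_sink: F) -> Vec<S>
where
    S: BatchSink,
    F: FnMut(usize) -> S,
{
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    rows.chunks(chunk_size)
        .enumerate()
        .map(|(i, chunk)| {
            let mut sink = make_sink(i);
            sink.write(Batch { rows: chunk.to_vec() });
            sink
        })
        .collect()
}
```

The caller supplies the factory, so the encoding loop does not change. It can write batches to files or send them to another thread.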
# Conflicts: # raphtory/src/serialise/parquet/mod.rs
…w materialize function
…dge_meta and node_meta.
…k and reusing the old one.
…f encoding everything and then ingesting everything (which would keep everything in memory at once).
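Streaming batches to the consumer while they are still being encoded keeps memory bounded. Encoding everything first does not. The sketch below is minimal and assumes a bounded `std::sync::mpsc::sync_channel` as the hand-off between encoder and ingester; the PR's actual mechanism is not shown here. `stream_and_ingest` is a hypothetical name, and summing the values stands in for real ingestion.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Stream `num_batches` batches from an encoder thread to the ingesting side
/// through a bounded channel. At most `capacity` batches are buffered at once:
/// `send` blocks when the buffer is full, so the encoder cannot run ahead and
/// hold the whole dataset in memory. Returns the sum of all ingested values.
pub fn stream_and_ingest(num_batches: u64, batch_len: u64, capacity: usize) -> u64 {
    let (tx, rx) = sync_channel::<Vec<u64>>(capacity);

    let producer = thread::spawn(move || {
        for b in 0..num_batches {
            // Hypothetical "encode" step: build one batch of rows.
            let start = b * batch_len;
            let batch: Vec<u64> = (start..start + batch_len).collect();
            if tx.send(batch).is_err() {
                break; // the consumer hung up
            }
        }
        // `tx` is dropped here, which ends the consumer's loop below.
    });

    // Ingest each batch as soon as it arrives.
    let mut total = 0u64;
    for batch in rx {
        total += batch.iter().sum::<u64>();
    }
    producer.join().expect("encoder thread panicked");
    total
}
```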
… when run on a big graph.
…another thread pool.
…rage so that it doesn't run out of memory
…anning each segment for each row. Now using this path in the new materialize_using_recordbatches function.
… as much as possible
…separate out running materialize and parquet decoding. Test using SF10 for now.
…m_df, and internalise otherwise
…ic ordering for events at the same timestamp for Prop::List (Vec and Array should be the same) and Prop::Map (the ordering of elements should be stable; previously it depended on HashMap iteration order, which is unspecified).
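One way to order same-timestamp events deterministically is a total order on property values. In it, lists compare element by element and maps compare their entries sorted by key, so `HashMap` iteration order never leaks into the result. This sketch assumes a simplified, hypothetical `Prop` enum and hypothetical `cmp_prop` / `order_same_timestamp` functions. Raphtory's real `Prop` type has more variants, and its tie-breaking rule may differ.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Hypothetical, simplified property value (the real `Prop` has more variants).
#[derive(Debug, Clone, PartialEq)]
pub enum Prop {
    I64(i64),
    Str(String),
    List(Vec<Prop>),
    Map(HashMap<String, Prop>),
}

/// Fixed rank used to order values of different variants.
fn rank(p: &Prop) -> u8 {
    match p {
        Prop::I64(_) => 0,
        Prop::Str(_) => 1,
        Prop::List(_) => 2,
        Prop::Map(_) => 3,
    }
}

/// Total order on `Prop` that never depends on `HashMap` iteration order.
pub fn cmp_prop(a: &Prop, b: &Prop) -> Ordering {
    match (a, b) {
        (Prop::I64(x), Prop::I64(y)) => x.cmp(y),
        (Prop::Str(x), Prop::Str(y)) => x.cmp(y),
        (Prop::List(x), Prop::List(y)) => {
            // Lexicographic: first differing element decides, then length.
            for (l, r) in x.iter().zip(y.iter()) {
                let o = cmp_prop(l, r);
                if o != Ordering::Equal {
                    return o;
                }
            }
            x.len().cmp(&y.len())
        }
        (Prop::Map(x), Prop::Map(y)) => {
            // Sort entries by key so the result is independent of insertion order.
            let mut xs: Vec<(&String, &Prop)> = x.iter().collect();
            let mut ys: Vec<(&String, &Prop)> = y.iter().collect();
            xs.sort_by(|l, r| l.0.cmp(r.0));
            ys.sort_by(|l, r| l.0.cmp(r.0));
            for ((kx, vx), (ky, vy)) in xs.iter().zip(ys.iter()) {
                let o = kx.cmp(ky).then_with(|| cmp_prop(vx, vy));
                if o != Ordering::Equal {
                    return o;
                }
            }
            xs.len().cmp(&ys.len())
        }
        _ => rank(a).cmp(&rank(b)),
    }
}

/// Put the values recorded at one timestamp into a deterministic order.
pub fn order_same_timestamp(values: &mut [Prop]) {
    values.sort_by(cmp_prop);
}
```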
…alization of ParquetTEdge. Cleaned up materialize tests so that they don't try to call an "old" materialize anymore
…odes_from_df call. We can actually pass a column of layer names to the "layer_id_col" parameter; the name is misleading.
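A column of layer names can be accepted where a layer-id column is expected if each name is resolved to a dense id the first time it is seen. The following is a hypothetical sketch of that resolution step only. `LayerRegistry` and `resolve_layer_column` are illustrative names, not the loader's API.

```rust
use std::collections::HashMap;

/// Hypothetical layer registry: maps layer names to dense ids,
/// assigning the next id the first time a name is seen.
#[derive(Default)]
pub struct LayerRegistry {
    ids: HashMap<String, usize>,
    names: Vec<String>,
}

impl LayerRegistry {
    /// Return the id for `name`, registering it if it is new.
    pub fn resolve(&mut self, name: &str) -> usize {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.names.len();
        self.names.push(name.to_string());
        self.ids.insert(name.to_string(), id);
        id
    }

    /// Look up the name registered for `id`, if any.
    pub fn name(&self, id: usize) -> Option<&str> {
        self.names.get(id).map(String::as_str)
    }
}

/// Resolve a whole column of layer names (one per row) to layer ids.
pub fn resolve_layer_column(reg: &mut LayerRegistry, column: &[&str]) -> Vec<usize> {
    column.iter().map(|name| reg.resolve(name)).collect()
}
```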
…ze) and in proptests. Currently fails.
… was in persistent_semantics.rs, in fn node_updates_window. Proptests still fail.
…p_dst_id". GIDs are now "rap_src_id" and "rap_dst_id". This is inconsistent with the other columns' naming scheme, but it is backwards compatible with already-encoded parquet files.
# Conflicts:
# raphtory/src/arrow_loader/df_loaders/nodes.rs
# raphtory/src/db/api/view/graph.rs
# raphtory/src/io/parquet_loaders.rs
# raphtory/src/parquet_encoder/edges.rs
# raphtory/src/parquet_encoder/mod.rs
# raphtory/src/parquet_encoder/model.rs
# raphtory/src/parquet_encoder/nodes.rs
# raphtory/src/python/graph/io/arrow_loaders.rs
# raphtory/src/serialise/parquet.rs
# raphtory/tests/df_loaders.rs
# raphtory/tests/test_materialize_sf10.rs
…'re now back to ingesting using VIDs instead of resolving GIDs.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Rust Benchmark'.
The benchmark result of this commit is worse than the previous benchmark result, exceeding the threshold of 2.
| Benchmark suite | Current: acdab5b | Previous: 9823ef7 | Ratio |
|---|---|---|---|
| lotr_graph/num_edges | 5 ns/iter (± 0) | 0 ns/iter (± 0) | +∞ |
| lotr_graph/num_nodes | 5 ns/iter (± 0) | 1 ns/iter (± 0) | 5 |
| lotr_graph/graph_latest | 3 ns/iter (± 0) | 0 ns/iter (± 0) | +∞ |
| lotr_graph_materialise/materialize | 7899412 ns/iter (± 51328) | 1564816 ns/iter (± 35303) | 5.05 |
| lotr_graph_window_100/num_nodes | 13 ns/iter (± 0) | 5 ns/iter (± 0) | 2.60 |
| lotr_graph_window_100/iterate_exploded_edges | 791993 ns/iter (± 4342) | 325242 ns/iter (± 847) | 2.44 |
| lotr_graph_window_100_materialise/materialize | 8410279 ns/iter (± 38549) | 1669150 ns/iter (± 10700) | 5.04 |
| lotr_graph_window_10/has_node_existing | 144 ns/iter (± 8) | 62 ns/iter (± 11) | 2.32 |
| lotr_graph_window_10/iterate nodes | 31610 ns/iter (± 128) | 11339 ns/iter (± 40) | 2.79 |
| lotr_graph_window_10/iterate edges | 100182 ns/iter (± 380) | 48684 ns/iter (± 211) | 2.06 |
| lotr_graph_window_10/iterate_exploded_edges | 389238 ns/iter (± 3251) | 155788 ns/iter (± 1001) | 2.50 |
| lotr_graph_window_10_materialise/materialize | 3486386 ns/iter (± 14256) | 971980 ns/iter (± 4278) | 3.59 |
| lotr_graph_subgraph_10pc_materialise/materialize | 1682941 ns/iter (± 9159) | 334634 ns/iter (± 1287) | 5.03 |
| lotr_graph_subgraph_10pc_windowed/has_node_existing | 147 ns/iter (± 8) | 62 ns/iter (± 14) | 2.37 |
| lotr_graph_subgraph_10pc_windowed/iterate nodes | 5360 ns/iter (± 95) | 1365 ns/iter (± 3) | 3.93 |
| lotr_graph_subgraph_10pc_windowed_materialise/materialize | 990344 ns/iter (± 32643) | 230399 ns/iter (± 2617) | 4.30 |
| lotr_graph_window_50_layered/num_edges_temporal | 152746 ns/iter (± 8162) | 70121 ns/iter (± 7586) | 2.18 |
| lotr_graph_window_50_layered/has_node_existing | 418 ns/iter (± 20) | 129 ns/iter (± 12) | 3.24 |
| lotr_graph_window_50_layered/iterate nodes | 73147 ns/iter (± 1234) | 19308 ns/iter (± 47) | 3.79 |
| lotr_graph_window_50_layered/iterate edges | 197171 ns/iter (± 1664) | 83616 ns/iter (± 1318) | 2.36 |
| lotr_graph_window_50_layered/graph_latest | 78056 ns/iter (± 1718) | 36649 ns/iter (± 916) | 2.13 |
| lotr_graph_window_50_layered_materialise/materialize | 26895485 ns/iter (± 276669) | 3488825 ns/iter (± 24948) | 7.71 |
| lotr_graph_persistent_window_50_layered/num_edges_temporal | 600392 ns/iter (± 5483) | 192686 ns/iter (± 1569) | 3.12 |
| lotr_graph_persistent_window_50_layered/has_node_existing | 457 ns/iter (± 288) | 174 ns/iter (± 83) | 2.63 |
| lotr_graph_persistent_window_50_layered/iterate nodes | 97762 ns/iter (± 533) | 35886 ns/iter (± 191) | 2.72 |
| lotr_graph_persistent_window_50_layered/iterate edges | 171473 ns/iter (± 705) | 84161 ns/iter (± 596) | 2.04 |
| lotr_graph_persistent_window_50_layered/iterate_exploded_edges | 4348068 ns/iter (± 13607) | 1659940 ns/iter (± 19402) | 2.62 |
| lotr_graph_persistent_window_50_layered_materialise/materialize | 48794835 ns/iter (± 160949) | 5298035 ns/iter (± 147912) | 9.21 |
| lotr_graph/proto_encode | 9790791 ns/iter (± 142363) | 1157897 ns/iter (± 73709) | 8.46 |
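The Ratio column is consistent with current time divided by previous time per iteration. For example, 7899412 / 1564816 ≈ 5.05 for `lotr_graph_materialise/materialize`. Every listed row exceeds the threshold of 2. Below is a small sketch of that check. It assumes three things: a ratio strictly above the threshold triggers the alert, a previous value of 0 is reported as +∞, and 0 against 0 counts as no change. The function names are hypothetical; this is not github-action-benchmark's code.

```rust
/// Ratio per benchmark: current / previous (ns/iter, smaller is better).
/// Assumption: previous == 0 is shown as +∞, and 0 vs 0 counts as "no change".
pub fn ratio(current_ns: f64, previous_ns: f64) -> f64 {
    if previous_ns == 0.0 {
        if current_ns == 0.0 {
            1.0
        } else {
            f64::INFINITY
        }
    } else {
        current_ns / previous_ns
    }
}

/// Assumption: an alert fires when the ratio is strictly above the threshold.
pub fn is_regression(current_ns: f64, previous_ns: f64, threshold: f64) -> bool {
    ratio(current_ns, previous_ns) > threshold
}
```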
This comment was automatically generated by workflow using github-action-benchmark.
# Conflicts: # raphtory/src/db/api/view/graph.rs
…c. The resolve_layer fast path for when layer ids are present is temporarily removed while debugging; it will be brought back. Fix node_updates_window in persistent_semantics.rs to account for the entire timestamp at the window's beginning, so that persisted properties are handled properly.
…ders, and bringing back the fast path that uses these when resolving layers.
… back to Option; if it's not there, STATIC_GRAPH_LAYER is implied.
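The fallback described in this commit amounts to defaulting a missing layer to the static graph layer. This is a minimal sketch. The `usize` type and the value 0 chosen for `STATIC_GRAPH_LAYER` are assumptions for illustration. `layer_or_default` and `resolve_optional_column` are hypothetical helpers.

```rust
/// Hypothetical id for the implicit default layer; the real
/// `STATIC_GRAPH_LAYER` constant's type and value may differ.
pub const STATIC_GRAPH_LAYER: usize = 0;

/// A missing layer (`None`) means the row belongs to the static graph layer.
pub fn layer_or_default(layer: Option<usize>) -> usize {
    layer.unwrap_or(STATIC_GRAPH_LAYER)
}

/// Apply the fallback to a whole optional layer column.
pub fn resolve_optional_column(column: &[Option<usize>]) -> Vec<usize> {
    column.iter().copied().map(layer_or_default).collect()
}
```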
What changes were proposed in this pull request?
Node layers were previously not tested rigorously. They are now being added to tests, proptests, the loaders, and the parquet encoders. The loaders and parquet encoders are also used by materialize.
Why are the changes needed?
To fix any node-layer-related bugs that we find.
Does this PR introduce any user-facing change? If yes is this documented?
It shouldn't.
How was this patch tested?
Proptests.
Are there any further changes required?
There shouldn't be.