feat: introduce query plan outlines and canonicalization#2901
feat: introduce query plan outlines and canonicalization#2901
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2901 +/- ##
==========================================
+ Coverage 74.54% 74.62% +0.08%
==========================================
Files 484 485 +1
Lines 59228 59972 +744
==========================================
+ Hits 44145 44747 +602
- Misses 11990 12104 +114
- Partials 3093 3121 +28 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
tstirrat15
left a comment
There was a problem hiding this comment.
See comments, otherwise LGTM
pkg/query/canonicalize.go
Outdated
|
|
||
| // extractCaveats recursively extracts all CaveatIteratorType nodes from the tree. | ||
| // Returns the tree with caveats removed and a list of all extracted caveats. | ||
| func extractCaveats(outline Outline) (Outline, []*core.ContextualizedCaveat) { |
There was a problem hiding this comment.
More for my own edification than something that needs to change: what's the tradeoff between using iterators for this kind of work and using slices? Like I had the thought that extractCaveats could be written as an iterator and then you could use slices.SortedFunc above, but I don't know whether one approach is better than the other.
There was a problem hiding this comment.
So the cool thing about Outlines is all in the function signature.
Iterators, for doing the Check/IR/IS, make sense to be purely abstracted. They're a set, and we just want to fetch/evaluate some subset of that set.
But when we're manipulating them directly, like this, the signature func(Iterator) (Iterator, []*core.ContextualizedCaveat) here requires a lot of type switching, especially when replacing subiterators. .ReplaceSubiterators() does some of that, but hopefully that can go away!
So you totally can write extractCaveats this way -- the first version of canonicalization did exactly that -- but it's far more convenient to work with outlines.
(But this is also a good note to clean up the Iterator interface a little in a followup!)
There was a problem hiding this comment.
Oh! When I said "iterator" I meant golang iter seqs or whatever they're called. Like lazy evaluation and yielding instead of operating on slices.
|
|
||
| // Compare compares two BaseRelations, returning -1 if b < other, 0 if b == other, 1 if b > other. | ||
| // Comparison is done lexicographically by: definition name, relation name, type, subrelation, caveat, expiration. | ||
| func (b *BaseRelation) Compare(other *BaseRelation) int { |
There was a problem hiding this comment.
This one I feel more confident that we won't be changing, unless there's something drastic to change in the schema. But! We are testing it at a higher level because otherwise the fuzz for the canonicalization will fail
| } | ||
|
|
||
| // argsCompare returns -1 if a < b, 0 if a == b, 1 if a > b | ||
| func argsCompare(a, b *IteratorArgs) int { |
There was a problem hiding this comment.
Same question here - will we capture it if we add new args?
There was a problem hiding this comment.
Yep; fuzzing on this sort order should find args that wouldn't stably sort
miparnisari
left a comment
There was a problem hiding this comment.
Could you please add descriptions to your PRs? Motivation, etc?
| // 2. Bottom-Up Canonicalization: Apply five transformation steps sequentially | ||
| // | ||
| // The function is idempotent: applying it multiple times produces the same result. | ||
| func CanonicalizeOutline(outline Outline) (Outline, error) { |
There was a problem hiding this comment.
nitpick: why not func (outline *Outline) CanonicalizeOutline() (Outline, error)? makes discovering the func easier
and why return Outline and not a new CanonicalizedOutline?
There was a problem hiding this comment.
Where CanonicalizedOutline is an alias for Outline?
There was a problem hiding this comment.
no, a new struct altogether
| Type IteratorType | ||
| Args *IteratorArgs | ||
| Subiterators []Outline | ||
| CanonicalKey CanonicalKey // Populated only after canonicalization |
There was a problem hiding this comment.
same as my other comment - code smell here? why not make a new type CanonicalizedOutline?
There was a problem hiding this comment.
So I did think about that. The thing is, we need to keep track of the CanonicalKey for a given node in the outline/iterator tree. But at the same time, we want to modify the Outline, restructuring it as we see fit, so that it's not canonical but has a CanonicalKey.
And so that's exactly how it's written. CanonicalOutline is a good idea, but it's not obvious to me how to do that without some indirection; how to keep a CanonicalKey attached to a node in a tree, without keeping it on the node.
It's true that one could generate UUIDs, and then this field is a UUID, and the CanonicalOutline contains the outline, and a map from UUID to CanonicalKey. (Or anything else we want to track). I don't mind this if you prefer!
| Subiterators: decompSubs, | ||
| }, nil | ||
|
|
||
| default: |
There was a problem hiding this comment.
i'm starting to think that we should bring this linter https://golangci-lint.run/docs/linters/configuration/#exhaustive to prevent errors of the form "we added a new type of iterator but we forgot to update a few switches"
There was a problem hiding this comment.
I'm open to the idea, but only in explicit annotation mode. And then annotate places like this. Or, better, if it supports annotating specific type alias enums which, if they are switched over, should be exhaustive; and this would certainly be one of them (then it will enforce new code easier)
pkg/query/path.go
Outdated
| aStr := fmt.Sprintf("%v", a.Metadata) | ||
| bStr := fmt.Sprintf("%v", b.Metadata) |
There was a problem hiding this comment.
when you format a map like this, do you have a guarantee that the keys will be printed in the same order every time? i don't think so.
how about we make Metadata a proper struct that has a comparison method of its own? then you can put all this logic there.
There was a problem hiding this comment.
I think the spec of the language doesn't guarantee it, but the implementation of the language does, similar to javascript and python.
203ac7a to
16c1024
Compare
tstirrat15
left a comment
There was a problem hiding this comment.
LGTM. Curious about whether something like rapid would be useful for writing these tests?
16c1024 to
ae0dd6b
Compare
Description
Testing
References