refactor: introduce datalayer abstraction, remove intermediate interfaces#2912
refactor: introduce datalayer abstraction, remove intermediate interfaces#2912josephschorr wants to merge 5 commits intoauthzed:mainfrom
Conversation
…aces Replace direct datastore.Reader usage with datalayer.DataLayer and datalayer.RevisionedReader throughout the codebase. This hides Legacy* methods behind clean interfaces (SchemaReader, RevisionedReader, ReadWriteTransaction) and routes all schema/relationship access through the datalayer package. Key changes: - Add pkg/datalayer with DataLayer, RevisionedReader, SchemaReader, and ReadWriteTransaction interfaces that wrap the raw datastore - Replace manual context key boilerplate with ctxkey.NewBoxedWithDefault - Rename internal/middleware/datastore and pkg/middleware/datastore packages to datalayer to match the abstraction they serve - Remove DefinitionLookup interface from pkg/schema/resolver.go; use datalayer.SchemaReader directly - Remove RelationshipQuerier interface from pagination/iterator.go; use datalayer.RevisionedReader directly - Remove RelationshipReader interface from pkg/query/context.go; use datalayer.RevisionedReader directly - Delete internal/datastore/schema/schema.go (consolidated into datalayer) - Update all datastore implementations (postgres, crdb, mysql, spanner, memdb) and proxies to work through the new interfaces
32432d3 to
ad17d96
Compare
Codecov Report❌ Patch coverage is ❌ Your project check has failed because the head coverage (74.32%) is below the target coverage (75.00%). You can increase the head coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #2912 +/- ##
==========================================
- Coverage 74.51% 74.32% -0.19%
==========================================
Files 484 486 +2
Lines 59201 59465 +264
==========================================
+ Hits 44109 44191 +82
- Misses 11995 12154 +159
- Partials 3097 3120 +23 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| @@ -0,0 +1,139 @@ | |||
| package datalayer | |||
There was a problem hiding this comment.
| package datalayer | |
| //go:generate go run go.uber.org/mock/mockgen -source datalayer.go -destination ./mocks/mock_datalayer.go | |
| package datalayer |
There was a problem hiding this comment.
As discussed, I highly dislike mocks, so will keep as-is
| dl := datalayer.NewDataLayer(ds) | ||
| sr, err := dl.SnapshotReader(headRevision).ReadSchema() | ||
| req.NoError(err) |
There was a problem hiding this comment.
For a follow up PR:
RunSingleCaveatExpression takes a datastore.SchemaReadable — an interface with 1 method. These tests don't need a real datastore at all. Spinning up a full in-memory datastore, parsing a schema string, writing it to storage, fetching a head revision, and getting a snapshot reader is a sign of "excessive arrange phase" code smell (patent pending 😄 ).
There are downsides to this:
- it distracts me from what is actually being tested: caveat expression evaluation, not datastore behavior.
- it points to a developer issue: we don't have a production-grade helper to create a caveat proto
- it points to a code issue: when i was refactoring this, i noticed this - why does
LookupCaveatDefinitionsByNamesstoredatastore.SchemaDefinitionand not*core.CaveatDefinitiondirectly? i see no less than 4 places where callers are forced to doif caveatDef, ok := def.(*core.CaveatDefinition).
This could be simplified as follows:
ctrl := gomock.NewController(t)
defer ctrl.Finish()
sr := mock_datalayer.NewMockSchemaReader(ctrl)
sr.EXPECT().LookupCaveatDefinitionsByNames(gomock.Any(), gomock.Any()).Return(map[string]datastore.SchemaDefinition{
"firstCaveat": createTestCaveat(t, "firstCaveat", "first", types.Default.IntType, "first == 42"),
"secondCaveat": createTestCaveat(t, "secondCaveat", "second", types.Default.StringType, "second == 'hello'"),
"thirdCaveat": createTestCaveat(t, "thirdCaveat", "third", types.Default.BooleanType, "third"),
}, nil).AnyTimes()and you already wrote createTestCaveat in a different file 😄
func createTestCaveat(t *testing.T, name, paramName string, paramType types.VariableType, expr string) *core.CaveatDefinition {
env, err := pkgcaveats.EnvForVariablesWithDefaultTypeSet(map[string]types.VariableType{
paramName: paramType,
})
require.NoError(t, err)
c, err := pkgcaveats.CompileCaveatWithName(env, expr, name)
require.NoError(t, err)
cBytes, err := c.Serialize()
require.NoError(t, err)
return &core.CaveatDefinition{
Name: name,
SerializedExpression: cBytes,
ParameterTypes: env.EncodedParametersTypes(),
}
}A similar change could be made in TestValidateRelationshipOperations, TestCheckNamespaceAndRelations
There was a problem hiding this comment.
Again, no mocks.
If we wanted to simplify here, we could use a fake implementation of the interface, but I don't really have a strong opinion on that vs keeping as-is, so I'd prefer to keep as-is.
| // ContextWithHandle adds a placeholder to a context that will later be | ||
| // filled by the datalayer. | ||
| func ContextWithHandle(ctx context.Context) context.Context { | ||
| return datalayer.ContextWithHandle(ctx) | ||
| } |
There was a problem hiding this comment.
adds a placeholder?? why do we need a placeholder?datalayerKey = ctxkey.NewBoxedWithDefaultis already setting a default. why do we need this method at all? every usage i see (e.g. in tests), they callContextWithHandleand then immediately callSetInContext. just make one method...- why do we need this file? it is delegating everything to
pkg/datalayer. either remove this file, or put all the logic here and thenpkgcalls intointernal. right now this is backwards (internal calling into pkg)
There was a problem hiding this comment.
Removed and simplified
pkg/validationfile/loader.go
Outdated
| // Write the caveat definitions. | ||
| err := rwt.LegacyWriteCaveats(ctx, caveatDefs) | ||
| revision, err := dl.ReadWriteTx(ctx, func(ctx context.Context, rwt datalayer.ReadWriteTransaction) error { | ||
| ctx = datalayermw.ContextWithDataLayer(ctx, dl) |
There was a problem hiding this comment.
if i remove this line, the tests still pass
| // PopulateFromFiles populates the given datastore with the namespaces and tuples found in | ||
| // the validation file(s) specified. | ||
| func PopulateFromFiles(ctx context.Context, ds datastore.Datastore, caveatTypeSet *caveattypes.TypeSet, filePaths []string) (*PopulatedValidationFile, datastore.Revision, error) { | ||
| func PopulateFromFiles(ctx context.Context, dl datalayer.DataLayer, caveatTypeSet *caveattypes.TypeSet, filePaths []string) (*PopulatedValidationFile, datastore.Revision, error) { |
There was a problem hiding this comment.
for another PR: this feels like an API that should be part of the dataLayer implementation, instead of being here which is very hard to discover.
You could add these methods to DataLayer:
- bulk export(To file)
- bulk export(to response)
- bulk import(from files)
- bulk import(from request)
There was a problem hiding this comment.
Maybe... I can see it both ways, as I'm unsure we want the file format to be defined in the data layer itself
There was a problem hiding this comment.
it could be bulkExport(io.Writer) and bulkImport(io.Reader)
| if aerr != nil { | ||
| return aerr | ||
| } |
There was a problem hiding this comment.
if i remove from line 185 to 188, the tests still pass
There was a problem hiding this comment.
Added a test, although other tests do break if this is removed (any test using the file loader outside of this package)
| // UnaryCountingInterceptor wraps the datalayer in a counting proxy for each unary request. | ||
| // After the request completes, it exports the counts to Prometheus metrics. | ||
| func UnaryCountingInterceptor() grpc.UnaryServerInterceptor { | ||
| return datalayer.UnaryCountingInterceptor(nil) |
There was a problem hiding this comment.
similar to my other comment; this middleware being in internal and delegating to pkg is weird
| return nil, fmt.Errorf("failed to load bootstrap files: %w", err) | ||
| // Combine bootstrap files and direct contents into a single set so that | ||
| // all definitions are written together (WriteSchema replaces the full schema). | ||
| bootstrapContents := make(map[string][]byte, len(opts.BootstrapFiles)+len(opts.BootstrapFileContents)) |
There was a problem hiding this comment.
am i right in understanding that opts.BootstrapFileContents is not really an opt for users; it's to make testing easier?
internal/services/integrationtesting/consistencytestutil/clusteranddata.go
Outdated
Show resolved
Hide resolved
| UniqueID(ctx context.Context) (string, error) | ||
| MetricsID() (string, error) | ||
| Close() error | ||
| } |
There was a problem hiding this comment.
| } | |
| // BulkExport | |
| // BulkImport | |
| // DeleteAllData (i saw the convenience method in datastore/util.go...) | |
| } |
and any other APIs this DataLayer should eventually have, in your mind?
i'd like to add them as comments here for future PRs
There was a problem hiding this comment.
Nothing comes to mind
There was a problem hiding this comment.
can you add a TODO here with the list of methods that we need to add in a follow-up PR?
| ListAllCaveatDefinitions(ctx context.Context) ([]datastore.RevisionedCaveat, error) | ||
|
|
||
| // ListAllSchemaDefinitions lists all type and caveat definitions. | ||
| ListAllSchemaDefinitions(ctx context.Context) (map[string]datastore.SchemaDefinition, error) |
There was a problem hiding this comment.
there is cognitive dissonance here. as i was reading this list of functions, all the ones above return a datastore.Revisioned<something>. This one and the ones below break this rule. Why?
There was a problem hiding this comment.
Because some callers don't need the revision and they usually just need a map
There was a problem hiding this comment.
is that indicative that these methods do not belong inside of RevisionedReader then?
There was a problem hiding this comment.
No - ALL datastore access MUST be revisioned; the caller may not need the revision of when the rel or schema changed, however (most callers do not)
pkg/datalayer/datalayer.go
Outdated
| LookupSchemaDefinitionsByNames(ctx context.Context, names []string) (map[string]datastore.SchemaDefinition, error) | ||
|
|
||
| // LookupTypeDefinitionsByNames looks up type definitions by name. | ||
| LookupTypeDefinitionsByNames(ctx context.Context, names []string) (map[string]datastore.SchemaDefinition, error) |
There was a problem hiding this comment.
why does LookupTypeDefinitionsByNames return a map of SCHEMA definitions and not a map of RevisionedNamespace
same question for LookupCaveatDefinitionsByNames - it could returned RevisionedCaveats
There was a problem hiding this comment.
Because the callers didn't need the revisions
There was a problem hiding this comment.
that doesn't answer the question. What if you returned Namespaces? that is more type-safe than returning SchemaDefinition, which forces callers to disambiguate between Namespaces and Caveats
There was a problem hiding this comment.
Oh, I thought you were asking why it wasn't revisioned. Changed to the specific type
| // RevisionedReader reads data at a specific revision. | ||
| type RevisionedReader interface { |
There was a problem hiding this comment.
- this is called
RevisionedReader yet none of the methods below take arevisionas parameter. I mean, I get it, in order to get aRevisionedReaderyou must doSnapshotReader(revision).ReadSchema(). But: what does having theSnapshotReadermethod buy you? as far as i can tell, it buys you one extra hop for no good reason. you had to writedefaultDataLayerandrevisionedReaderstructs when it could have been all indefaultDataLayer, you had to writecountingDataLayerandcountingRevisionedReader, etc.. i guess it saves you from passing therevisionparameter around, but i wouldn't call this necessarily a bad thing if it makes it super obvious what revision you are using. - funny:
Watchin line 36, takes a revision but is not part ofRevisionedReader😄
There was a problem hiding this comment.
it is reading a specific revision, hence revisioned reader. I could just name it SnapshotReader like we do in datatstore, but I wanted a different interface name to see it at a glance
There was a problem hiding this comment.
The point i was trying to make is that the name RevisionedReader doesn't communicate how the revision got there. When someone holds a RevisionedReader and calls methods on it, the revision is invisible at the call site. You can't tell, without tracing back to where it was constructed, what revision the data is being read at. The revision is implicit state hidden inside the object.
So what I was proposing is adding a new parameter revision to every method in RevisionedReader. Then, it is obvious what revision the data is being read at.
This will have to be done in a follow-up PR though.
Can you address (2)?
There was a problem hiding this comment.
The point i was trying to make is that the name RevisionedReader doesn't communicate how the revision got there
Which is fine... the caller just asked for a specific point-in-time. This is no different from the current datastore interface.
Can you address (2)?
Watch doesn't operate on a revision, it starts at one; it should not be on the reader
| require.NoError(t, err) | ||
|
|
||
| ds, err := dsfortesting.NewMemDBDatastoreForTesting(t, 100, 1*time.Second, 100*time.Minute) | ||
| dl, err := dsfortesting.DataLayerForTesting(t, 100, 1*time.Second, 100*time.Minute) |
There was a problem hiding this comment.
same as #2912 (comment). it's incredible that in the test setup you are building a full blown datastore per test! when all you need is something that responds to .UniqueID(ctx).
There was a problem hiding this comment.
Not really - again, we could use a fake here, but we have an in-memory DS, might as well use it (and it allows us to test for dependencies within the DS as well)
|
Addressed feedback |
Replace direct datastore.Reader usage with datalayer.DataLayer and datalayer.RevisionedReader throughout the codebase. This hides Legacy* methods behind clean interfaces (SchemaReader, RevisionedReader, ReadWriteTransaction) and routes all schema/relationship access through the datalayer package.
Key changes: