Skip to content

Comments

refactor: introduce datalayer abstraction, remove intermediate interfaces#2912

Draft
josephschorr wants to merge 5 commits intoauthzed:mainfrom
josephschorr:datalayer
Draft

refactor: introduce datalayer abstraction, remove intermediate interfaces#2912
josephschorr wants to merge 5 commits intoauthzed:mainfrom
josephschorr:datalayer

Conversation

@josephschorr
Copy link
Member

Replace direct datastore.Reader usage with datalayer.DataLayer and datalayer.RevisionedReader throughout the codebase. This hides Legacy* methods behind clean interfaces (SchemaReader, RevisionedReader, ReadWriteTransaction) and routes all schema/relationship access through the datalayer package.

Key changes:

  • Add pkg/datalayer with DataLayer, RevisionedReader, SchemaReader, and ReadWriteTransaction interfaces that wrap the raw datastore
  • Replace manual context key boilerplate with ctxkey.NewBoxedWithDefault
  • Rename internal/middleware/datastore and pkg/middleware/datastore packages to datalayer to match the abstraction they serve
  • Remove DefinitionLookup interface from pkg/schema/resolver.go; use datalayer.SchemaReader directly
  • Remove RelationshipQuerier interface from pagination/iterator.go; use datalayer.RevisionedReader directly
  • Remove RelationshipReader interface from pkg/query/context.go; use datalayer.RevisionedReader directly
  • Delete internal/datastore/schema/schema.go (consolidated into datalayer)
  • Update all datastore implementations (postgres, crdb, mysql, spanner, memdb) and proxies to work through the new interfaces

@github-actions github-actions bot added area/cli Affects the command line area/api v1 Affects the v1 API area/datastore Affects the storage system area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools) area/dispatch Affects dispatching of requests labels Feb 19, 2026
…aces

Replace direct datastore.Reader usage with datalayer.DataLayer and
datalayer.RevisionedReader throughout the codebase. This hides Legacy*
methods behind clean interfaces (SchemaReader, RevisionedReader,
ReadWriteTransaction) and routes all schema/relationship access through
the datalayer package.

Key changes:
- Add pkg/datalayer with DataLayer, RevisionedReader, SchemaReader, and
  ReadWriteTransaction interfaces that wrap the raw datastore
- Replace manual context key boilerplate with ctxkey.NewBoxedWithDefault
- Rename internal/middleware/datastore and pkg/middleware/datastore
  packages to datalayer to match the abstraction they serve
- Remove DefinitionLookup interface from pkg/schema/resolver.go;
  use datalayer.SchemaReader directly
- Remove RelationshipQuerier interface from pagination/iterator.go;
  use datalayer.RevisionedReader directly
- Remove RelationshipReader interface from pkg/query/context.go;
  use datalayer.RevisionedReader directly
- Delete internal/datastore/schema/schema.go (consolidated into datalayer)
- Update all datastore implementations (postgres, crdb, mysql, spanner,
  memdb) and proxies to work through the new interfaces
@codecov
Copy link

codecov bot commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 59.85401% with 330 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.32%. Comparing base (05f1566) to head (e7c57b7).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
pkg/datalayer/impl.go 42.11% 66 Missing ⚠️
pkg/datalayer/counting.go 57.74% 37 Missing and 4 partials ⚠️
pkg/datalayer/readonly.go 12.50% 28 Missing ⚠️
internal/services/v1/permissions.go 64.41% 13 Missing and 8 partials ⚠️
internal/services/v1/experimental.go 63.64% 13 Missing and 7 partials ⚠️
internal/graph/lookupsubjects.go 50.00% 8 Missing and 6 partials ⚠️
pkg/datalayer/schema_adapter.go 51.73% 10 Missing and 4 partials ⚠️
pkg/middleware/datalayer/datastore.go 0.00% 11 Missing ⚠️
internal/services/v1/relationships.go 71.43% 6 Missing and 4 partials ⚠️
pkg/development/devcontext.go 67.86% 5 Missing and 4 partials ⚠️
... and 24 more

❌ Your project check has failed because the head coverage (74.32%) is below the target coverage (75.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2912      +/-   ##
==========================================
- Coverage   74.51%   74.32%   -0.19%     
==========================================
  Files         484      486       +2     
  Lines       59201    59465     +264     
==========================================
+ Hits        44109    44191      +82     
- Misses      11995    12154     +159     
- Partials     3097     3120      +23     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@@ -0,0 +1,139 @@
package datalayer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
package datalayer
//go:generate go run go.uber.org/mock/mockgen -source datalayer.go -destination ./mocks/mock_datalayer.go
package datalayer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, I highly dislike mocks, so will keep as-is

Comment on lines +473 to +475
dl := datalayer.NewDataLayer(ds)
sr, err := dl.SnapshotReader(headRevision).ReadSchema()
req.NoError(err)
Copy link
Contributor

@miparnisari miparnisari Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a follow up PR:

RunSingleCaveatExpression takes a datastore.SchemaReadable — an interface with 1 method. These tests don't need a real datastore at all. Spinning up a full in-memory datastore, parsing a schema string, writing it to storage, fetching a head revision, and getting a snapshot reader is a sign of "excessive arrange phase" code smell (patent pending 😄 ).

There are downsides to this:

  1. it distracts me from what is actually being tested: caveat expression evaluation, not datastore behavior.
  2. it points to a developer issue: we don't have a production-grade helper to create a caveat proto
  3. it points to a code issue: when i was refactoring this, i noticed this - why does LookupCaveatDefinitionsByNames store datastore.SchemaDefinition and not *core.CaveatDefinition directly? i see no less than 4 places where callers are forced to do if caveatDef, ok := def.(*core.CaveatDefinition).

This could be simplified as follows:

ctrl := gomock.NewController(t)
defer ctrl.Finish()

sr := mock_datalayer.NewMockSchemaReader(ctrl)
sr.EXPECT().LookupCaveatDefinitionsByNames(gomock.Any(), gomock.Any()).Return(map[string]datastore.SchemaDefinition{
	"firstCaveat": createTestCaveat(t, "firstCaveat", "first", types.Default.IntType, "first == 42"),
	"secondCaveat":  createTestCaveat(t, "secondCaveat", "second", types.Default.StringType, "second == 'hello'"),
	"thirdCaveat":  createTestCaveat(t, "thirdCaveat", "third", types.Default.BooleanType, "third"),
}, nil).AnyTimes()

and you already wrote createTestCaveat in a different file 😄

func createTestCaveat(t *testing.T, name, paramName string, paramType types.VariableType, expr string) *core.CaveatDefinition {
	env, err := pkgcaveats.EnvForVariablesWithDefaultTypeSet(map[string]types.VariableType{
		paramName: paramType,
	})
	require.NoError(t, err)

	c, err := pkgcaveats.CompileCaveatWithName(env, expr, name)
	require.NoError(t, err)

	cBytes, err := c.Serialize()
	require.NoError(t, err)

	return &core.CaveatDefinition{
		Name:                 name,
		SerializedExpression: cBytes,
		ParameterTypes:       env.EncodedParametersTypes(),
	}
}

A similar change could be made in TestValidateRelationshipOperations, TestCheckNamespaceAndRelations

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, no mocks.

If we wanted to simplify here, we could use a fake implementation of the interface, but I don't really have a strong opinion on that vs keeping as-is, so I'd prefer to keep as-is.

Comment on lines 11 to 15
// ContextWithHandle adds a placeholder to a context that will later be
// filled by the datalayer.
func ContextWithHandle(ctx context.Context) context.Context {
return datalayer.ContextWithHandle(ctx)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • adds a placeholder ?? why do we need a placeholder? datalayerKey = ctxkey.NewBoxedWithDefault is already setting a default. why do we need this method at all? every usage i see (e.g. in tests), they call ContextWithHandle and then immediately call SetInContext. just make one method...
  • why do we need this file? it is delegating everything to pkg/datalayer. either remove this file, or put all the logic here and then pkg calls into internal. right now this is backwards (internal calling into pkg)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed and simplified

// Write the caveat definitions.
err := rwt.LegacyWriteCaveats(ctx, caveatDefs)
revision, err := dl.ReadWriteTx(ctx, func(ctx context.Context, rwt datalayer.ReadWriteTransaction) error {
ctx = datalayermw.ContextWithDataLayer(ctx, dl)
Copy link
Contributor

@miparnisari miparnisari Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if i remove this line, the tests still pass

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

// PopulateFromFiles populates the given datastore with the namespaces and tuples found in
// the validation file(s) specified.
func PopulateFromFiles(ctx context.Context, ds datastore.Datastore, caveatTypeSet *caveattypes.TypeSet, filePaths []string) (*PopulatedValidationFile, datastore.Revision, error) {
func PopulateFromFiles(ctx context.Context, dl datalayer.DataLayer, caveatTypeSet *caveattypes.TypeSet, filePaths []string) (*PopulatedValidationFile, datastore.Revision, error) {
Copy link
Contributor

@miparnisari miparnisari Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for another PR: this feels like an API that should be part of the dataLayer implementation, instead of being here which is very hard to discover.

You could add these methods to DataLayer:

  • bulk export(To file)
  • bulk export(to response)
  • bulk import(from files)
  • bulk import(from request)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe... I can see it both ways, as I'm unsure we want the file format to be defined in the data layer itself

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could be bulkExport(io.Writer) and bulkImport(io.Reader)

Comment on lines 188 to 190
if aerr != nil {
return aerr
}
Copy link
Contributor

@miparnisari miparnisari Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if i remove from line 185 to 188, the tests still pass

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test, although other tests do break if this is removed (any test using the file loader outside of this package)

// UnaryCountingInterceptor wraps the datalayer in a counting proxy for each unary request.
// After the request completes, it exports the counts to Prometheus metrics.
func UnaryCountingInterceptor() grpc.UnaryServerInterceptor {
return datalayer.UnaryCountingInterceptor(nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to my other comment; this middleware being in internal and delegating to pkg is weird

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

return nil, fmt.Errorf("failed to load bootstrap files: %w", err)
// Combine bootstrap files and direct contents into a single set so that
// all definitions are written together (WriteSchema replaces the full schema).
bootstrapContents := make(map[string][]byte, len(opts.BootstrapFiles)+len(opts.BootstrapFileContents))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am i right in understanding that opts.BootstrapFileContents is not really an opt for users; it's to make testing easier?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

UniqueID(ctx context.Context) (string, error)
MetricsID() (string, error)
Close() error
}
Copy link
Contributor

@miparnisari miparnisari Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
// BulkExport
// BulkImport
// DeleteAllData (i saw the convenience method in datastore/util.go...)
}

and any other APIs this DataLayer should eventually have, in your mind?

i'd like to add them as comments here for future PRs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing comes to mind

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a TODO here with the list of methods that we need to add in a follow-up PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay

ListAllCaveatDefinitions(ctx context.Context) ([]datastore.RevisionedCaveat, error)

// ListAllSchemaDefinitions lists all type and caveat definitions.
ListAllSchemaDefinitions(ctx context.Context) (map[string]datastore.SchemaDefinition, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is cognitive dissonance here. as i was reading this list of functions, all the ones above return a datastore.Revisioned<something>. This one and the ones below break this rule. Why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because some callers don't need the revision and they usually just need a map

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that indicative that these methods do not belong inside of RevisionedReader then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - ALL datastore access MUST be revisioned; the caller may not need the revision of when the rel or schema changed, however (most callers do not)

LookupSchemaDefinitionsByNames(ctx context.Context, names []string) (map[string]datastore.SchemaDefinition, error)

// LookupTypeDefinitionsByNames looks up type definitions by name.
LookupTypeDefinitionsByNames(ctx context.Context, names []string) (map[string]datastore.SchemaDefinition, error)
Copy link
Contributor

@miparnisari miparnisari Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does LookupTypeDefinitionsByNames return a map of SCHEMA definitions and not a map of RevisionedNamespace

same question for LookupCaveatDefinitionsByNames - it could returned RevisionedCaveats

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the callers didn't need the revisions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that doesn't answer the question. What if you returned Namespaces? that is more type-safe than returning SchemaDefinition, which forces callers to disambiguate between Namespaces and Caveats

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I thought you were asking why it wasn't revisioned. Changed to the specific type

Comment on lines +76 to +77
// RevisionedReader reads data at a specific revision.
type RevisionedReader interface {
Copy link
Contributor

@miparnisari miparnisari Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. this is called RevisionedReader yet none of the methods below take a revision as parameter. I mean, I get it, in order to get a RevisionedReader you must do SnapshotReader(revision).ReadSchema(). But: what does having the SnapshotReader method buy you? as far as i can tell, it buys you one extra hop for no good reason. you had to write defaultDataLayer and revisionedReader structs when it could have been all in defaultDataLayer, you had to write countingDataLayer and countingRevisionedReader, etc.. i guess it saves you from passing the revision parameter around, but i wouldn't call this necessarily a bad thing if it makes it super obvious what revision you are using.
  2. funny: Watch in line 36, takes a revision but is not part of RevisionedReader 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is reading a specific revision, hence revisioned reader. I could just name it SnapshotReader like we do in datatstore, but I wanted a different interface name to see it at a glance

Copy link
Contributor

@miparnisari miparnisari Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point i was trying to make is that the name RevisionedReader doesn't communicate how the revision got there. When someone holds a RevisionedReader and calls methods on it, the revision is invisible at the call site. You can't tell, without tracing back to where it was constructed, what revision the data is being read at. The revision is implicit state hidden inside the object.

So what I was proposing is adding a new parameter revision to every method in RevisionedReader. Then, it is obvious what revision the data is being read at.

This will have to be done in a follow-up PR though.

Can you address (2)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point i was trying to make is that the name RevisionedReader doesn't communicate how the revision got there

Which is fine... the caller just asked for a specific point-in-time. This is no different from the current datastore interface.

Can you address (2)?

Watch doesn't operate on a revision, it starts at one; it should not be on the reader

require.NoError(t, err)

ds, err := dsfortesting.NewMemDBDatastoreForTesting(t, 100, 1*time.Second, 100*time.Minute)
dl, err := dsfortesting.DataLayerForTesting(t, 100, 1*time.Second, 100*time.Minute)
Copy link
Contributor

@miparnisari miparnisari Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as #2912 (comment). it's incredible that in the test setup you are building a full blown datastore per test! when all you need is something that responds to .UniqueID(ctx).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really - again, we could use a fake here, but we have an in-memory DS, might as well use it (and it allows us to test for dependencies within the DS as well)

@josephschorr
Copy link
Member Author

Addressed feedback

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/api v1 Affects the v1 API area/cli Affects the command line area/datastore Affects the storage system area/dispatch Affects dispatching of requests area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants