openapi3: cache compiled JSON Schema 2020-12 validator per *Schema #1186
efritz wants to merge 4 commits into
Compiling the JSON Schema 2020-12 validator is expensive: it marshals the OpenAPI Schema, unmarshals it back into a generic map, recursively transforms OpenAPI-specific keywords into JSON Schema 2020-12 keywords, and runs the jsonschema compiler. For long-lived processes that validate many requests against the same schema -- e.g. an HTTP server using openapi3filter against an OpenAPI 3.1 document, where openapi3filter automatically routes through useJSONSchema2020 -- this compilation dominates per-request validation time.

Cache the compiled validator in a package-level sync.Map keyed by *Schema, mirroring the existing compiledPatterns precedent in schema.go. The first call for a given schema pays the compilation cost; subsequent calls reuse the same compiled validator.

Benchmark on a small object schema (Apple M4 Pro):

    before: 39030 ns/op  59628 B/op  714 allocs/op
    after:    939 ns/op   1434 B/op   34 allocs/op

Compilation failures are not cached: the compiler is deterministic given the schema, so the failure path will fail consistently and caching it would require a sentinel value.
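The caching scheme described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Schema`, `validator`, `compile`, `validators`, and `cachedValidator` are hypothetical stand-ins for the kin-openapi types and helpers, and `compileCount` exists only to make the effect observable.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// Schema is a hypothetical stand-in for openapi3.Schema.
type Schema struct {
	Type string
}

// validator is a hypothetical stand-in for the compiled validator.
type validator struct {
	typ string
}

// compileCount records how often compile ran (illustration only).
var compileCount atomic.Int64

// compile simulates the expensive marshal/transform/compile pipeline.
func compile(s *Schema) (*validator, error) {
	compileCount.Add(1)
	if s.Type == "" {
		return nil, errors.New("schema has no type")
	}
	return &validator{typ: s.Type}, nil
}

// validators caches compiled validators keyed by schema pointer identity.
var validators sync.Map // map[*Schema]*validator

// cachedValidator returns the cached validator for s, compiling on first use.
func cachedValidator(s *Schema) (*validator, error) {
	if v, ok := validators.Load(s); ok {
		return v.(*validator), nil
	}
	v, err := compile(s)
	if err != nil {
		return nil, err // failures are not cached
	}
	// If another goroutine stored a validator first, adopt that one so
	// every caller ends up sharing a single instance.
	actual, _ := validators.LoadOrStore(s, v)
	return actual.(*validator), nil
}

func main() {
	s := &Schema{Type: "string"}
	results := make([]*validator, 8)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i], _ = cachedValidator(s)
		}(i)
	}
	wg.Wait()

	same := true
	for _, v := range results {
		same = same && v == results[0]
	}
	fmt.Println("all callers share one validator:", same)
	// May exceed 1 when goroutines race on the first call.
	fmt.Println("compile calls:", compileCount.Load())
}
```

Because the key is the pointer, a schema mutated in place keeps returning the validator compiled from its earlier contents, and entries are never evicted in this sketch.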
    func TestSchemaJSONSchema2020ValidatorCache(t *testing.T) {
    	schema := &Schema{Type: &Types{"string"}}

    	require.NoError(t, schema.useJSONSchema2020(&schemaValidationSettings{useJSONSchema2020: true}, "hello"))

    	// The compiled validator should be cached
    	v, ok := compiledJSONSchemaValidators.Load(schema)
    	require.True(t, ok)
    	require.NotNil(t, v)

    	// A second call should reuse the same compiled validator
    	require.NoError(t, schema.useJSONSchema2020(&schemaValidationSettings{useJSONSchema2020: true}, "world"))
    	v2, ok := compiledJSONSchemaValidators.Load(schema)
    	require.True(t, ok)
    	require.Same(t, v, v2)
    }
Please also demonstrate that updating the schema updates the cache.

Ah; maybe this approach doesn't work in general for an upstream patch.
In my usage, I assumed that once we're on this code path there would be no reasonable updates to the Schema object, so the cache would be useful for the lifetime of the schema.
Should the cache be busted on any change to the Schema, if we can cache validators at all?

Yes. A way to update the cached data needs to exist.
If you can't come up with a satisfying cache key, one way to do this could be to piggyback on the Validate() call so that compilation and caching happen at that time, not "on every invocation".
WRT your other approach, I am not getting it. Please flesh it out more.
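One possible reading of the Validate() suggestion above, sketched under loud assumptions: the types and methods here (`Schema`, `validator`, `compile`, `Validate`, `VisitJSON`, and the `compiled` field) are hypothetical stand-ins with simplified signatures, not kin-openapi's actual API. The idea is that the compiled validator lives on the schema and is refreshed whenever Validate() runs, so calling Validate() again after a mutation acts as the invalidation hook.

```go
package main

import (
	"errors"
	"fmt"
)

// validator is a hypothetical stand-in for a compiled validator.
type validator struct{ typ string }

// check accepts only values matching the compiled type.
func (v *validator) check(value any) error {
	switch value.(type) {
	case string:
		if v.typ == "string" {
			return nil
		}
	case float64:
		if v.typ == "number" {
			return nil
		}
	}
	return fmt.Errorf("value %v is not of type %q", value, v.typ)
}

// Schema is a hypothetical stand-in for openapi3.Schema.
// Not safe for concurrent mutation; a real version would need synchronization.
type Schema struct {
	Type     string
	compiled *validator // set by Validate, reused by VisitJSON
}

// compile simulates the expensive compilation step.
func compile(s *Schema) (*validator, error) {
	if s.Type == "" {
		return nil, errors.New("schema has no type")
	}
	return &validator{typ: s.Type}, nil
}

// Validate checks the schema and (re)compiles its validator.
func (s *Schema) Validate() error {
	v, err := compile(s)
	if err != nil {
		s.compiled = nil // drop any stale validator on failure
		return err
	}
	s.compiled = v
	return nil
}

// VisitJSON validates value with the validator from the last Validate call,
// falling back to a per-call compile if Validate never ran.
func (s *Schema) VisitJSON(value any) error {
	v := s.compiled
	if v == nil {
		var err error
		if v, err = compile(s); err != nil {
			return err
		}
	}
	return v.check(value)
}

func main() {
	s := &Schema{Type: "string"}
	_ = s.Validate()
	fmt.Println("accepts string:", s.VisitJSON("hello") == nil)

	// Mutating without re-validating leaves the old validator in place.
	s.Type = "number"
	fmt.Println("stale validator still accepts string:", s.VisitJSON("hello") == nil)

	// Re-running Validate refreshes the compiled validator.
	_ = s.Validate()
	fmt.Println("accepts string after re-validate:", s.VisitJSON("hello") == nil)
}
```

The trade-off in this sketch is that staleness becomes the caller's responsibility: a schema changed after Validate() keeps its old validator until Validate() is called again.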
@fenollp We could also go with this approach: https://github.com/getkin/kin-openapi/compare/master...efritz:kin-openapi:cache-jsonschema2020-validator-2?expand=1
However, we'd still need an answer to the cache invalidation problem for general application.
useJSONSchema2020 calls newJSONSchemaValidator(schema) on every invocation. Compilation is expensive (marshal → unmarshal → keyword transform → jsonschema.Compile), and openapi3filter automatically takes this path for every OpenAPI 3.1 request, making it the dominant per-request overhead for long-lived servers.

This PR caches the compiled *jsonSchemaValidator in a package-level sync.Map keyed by *Schema: the first call compiles, subsequent calls do a single Load, and races converge via LoadOrStore. Compilation failures are not cached (deterministic given the schema; a sentinel value is not worth the complexity).

Benchmark

Scratch benchmark validating a small object schema repeatedly (darwin/arm64, M4 Pro, -benchtime=2s):
benchmark source