openapi3: cache compiled JSON Schema 2020-12 validator per *Schema #1186

Open
efritz wants to merge 4 commits into getkin:master from efritz:cache-jsonschema2020-validator

Conversation

efritz commented May 16, 2026

useJSONSchema2020 calls newJSONSchemaValidator(schema) on every invocation. Compilation is expensive (marshal → unmarshal → keyword transform → jsonschema.Compile), and openapi3filter automatically takes this path for every OpenAPI 3.1 request, which makes compilation the dominant per-request validation cost for long-lived servers.

This PR caches the compiled *jsonSchemaValidator in a package-level sync.Map keyed by *Schema: the first call compiles, subsequent calls do a single Load, and concurrent first calls converge on one stored value via LoadOrStore. Compilation failures are not cached (the compiler is deterministic given the schema, so a failing schema fails consistently, and a sentinel value is not worth the complexity).
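
For readers who want the shape of the caching pattern without opening the diff, here is a minimal sketch. The `schema`, `validator`, `compile`, and `cachedValidator` names are hypothetical stand-ins, not the library's types or the PR's actual code; only the `sync.Map` `Load` / `LoadOrStore` usage mirrors what the description states.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// schema and validator are hypothetical stand-ins for *openapi3.Schema and
// the compiled validator; they are not the library's types.
type schema struct{ minLength int }

type validator struct{ minLength int }

// compiles counts how often the expensive step runs (illustration only).
var compiles atomic.Int64

// compile stands in for the marshal -> unmarshal -> transform -> compile pipeline.
func compile(s *schema) (*validator, error) {
	if s.minLength < 0 {
		return nil, fmt.Errorf("invalid minLength %d", s.minLength)
	}
	compiles.Add(1)
	return &validator{minLength: s.minLength}, nil
}

// validators caches compiled validators keyed by schema pointer identity.
var validators sync.Map // effectively map[*schema]*validator

func cachedValidator(s *schema) (*validator, error) {
	// Fast path: a single Load once the validator has been compiled.
	if v, ok := validators.Load(s); ok {
		return v.(*validator), nil
	}
	v, err := compile(s)
	if err != nil {
		return nil, err // failures are not cached
	}
	// If another goroutine stored first, LoadOrStore returns its value,
	// so all callers converge on one validator.
	actual, _ := validators.LoadOrStore(s, v)
	return actual.(*validator), nil
}

func main() {
	s := &schema{minLength: 1}
	v1, _ := cachedValidator(s)
	v2, _ := cachedValidator(s)
	fmt.Println(v1 == v2, compiles.Load()) // prints "true 1"
}
```

Under a race, two goroutines may both compile, but only one result is stored and returned to everyone; the extra compilation is wasted work, not a correctness problem.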

Benchmark

Scratch benchmark validating a small object schema repeatedly (darwin/arm64, M4 Pro, -benchtime=2s):

| | ns/op | B/op | allocs/op |
|---|---|---|---|
| Before | 39,030 | 59,628 | 714 |
| After | 832 | 1,434 | 34 |
| Ratio (before/after) | ~47× | ~42× | ~21× |
benchmark source
package openapi3_test

import (
    "testing"

    "github.com/getkin/kin-openapi/openapi3"
)

func BenchmarkJSONSchema2020Validator(b *testing.B) {
    maxLen := uint64(64)
    age0, age150 := float64(0), float64(150)
    schema := &openapi3.Schema{
        Type: &openapi3.Types{"object"},
        Properties: openapi3.Schemas{
            "name":  &openapi3.SchemaRef{Value: &openapi3.Schema{Type: &openapi3.Types{"string"}, MinLength: 1, MaxLength: &maxLen}},
            "age":   &openapi3.SchemaRef{Value: &openapi3.Schema{Type: &openapi3.Types{"integer"}, Min: &age0, Max: &age150}},
            "email": &openapi3.SchemaRef{Value: &openapi3.Schema{Type: &openapi3.Types{"string"}, Format: "email"}},
        },
        Required: []string{"name", "age"},
    }
    value := map[string]any{"name": "Jane Doe", "age": 30, "email": "jane@example.com"}

    if err := schema.VisitJSON(value, openapi3.EnableJSONSchema2020()); err != nil {
        b.Fatal(err)
    }
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        if err := schema.VisitJSON(value, openapi3.EnableJSONSchema2020()); err != nil {
            b.Fatal(err)
        }
    }
}

Compiling the JSON Schema 2020-12 validator is expensive: it marshals
the OpenAPI Schema, unmarshals it back into a generic map, recursively
transforms OpenAPI-specific keywords into JSON Schema 2020-12 keywords,
and runs the jsonschema compiler. For long-lived processes that
validate many requests against the same schema -- e.g. an HTTP server
using openapi3filter against an OpenAPI 3.1 document, where
openapi3filter automatically routes through useJSONSchema2020 -- this
compilation dominates per-request validation time.

Cache the compiled validator in a package-level sync.Map keyed by
*Schema, mirroring the existing compiledPatterns precedent in
schema.go. The first call for a given schema pays the compilation
cost; subsequent calls reuse the same compiled validator.

Benchmark on a small object schema (Apple M4 Pro):

  before: 39030 ns/op  59628 B/op  714 allocs/op
  after:    939 ns/op   1434 B/op   34 allocs/op

Compilation failures are not cached: the compiler is deterministic
given the schema, so the failure path will fail consistently and
caching it would require a sentinel value.
Comment on lines +9 to +24
func TestSchemaJSONSchema2020ValidatorCache(t *testing.T) {
    schema := &Schema{Type: &Types{"string"}}

    require.NoError(t, schema.useJSONSchema2020(&schemaValidationSettings{useJSONSchema2020: true}, "hello"))

    // The compiled validator should be cached
    v, ok := compiledJSONSchemaValidators.Load(schema)
    require.True(t, ok)
    require.NotNil(t, v)

    // A second call should reuse the same compiled validator
    require.NoError(t, schema.useJSONSchema2020(&schemaValidationSettings{useJSONSchema2020: true}, "world"))
    v2, ok := compiledJSONSchemaValidators.Load(schema)
    require.True(t, ok)
    require.Same(t, v, v2)
}
Collaborator

Please also demonstrate that updating the schema updates the cache.
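
To make the concern concrete, here is a toy sketch of what happens when a schema is mutated in place while a pointer-keyed cache holds its compiled form. The `schema`, `validator`, and `validatorFor` names are hypothetical and do not correspond to the library's code; the sketch assumes the compiled validator snapshots the schema at compile time.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-ins, not the library's types.
type schema struct{ maxLength int }

// validator snapshots the schema's constraints at compile time.
type validator struct{ maxLength int }

func (v *validator) valid(s string) bool { return len(s) <= v.maxLength }

// cache is keyed by pointer identity, as in the approach under review.
var cache sync.Map

func validatorFor(s *schema) *validator {
	if v, ok := cache.Load(s); ok {
		return v.(*validator)
	}
	v, _ := cache.LoadOrStore(s, &validator{maxLength: s.maxLength})
	return v.(*validator)
}

func main() {
	s := &schema{maxLength: 5}
	fmt.Println(validatorFor(s).valid("hello world")) // false: 11 > 5

	// Mutate in place: the pointer (and so the cache key) is unchanged.
	s.maxLength = 20
	fmt.Println(validatorFor(s).valid("hello world")) // false: stale validator

	// Only an explicit invalidation makes the change visible.
	cache.Delete(s)
	fmt.Println(validatorFor(s).valid("hello world")) // true
}
```

A test along these lines would fail against a pointer-keyed cache that has no invalidation hook, which is the behavior the request above is probing.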

Author

Ah; maybe this approach doesn't work in general for an upstream patch.

In my usage, I assumed that once we're on this code path there would be no reasonable updates to the Schema object, so the cache would be useful for the lifetime of the schema.

Should the cache be busted on any change to the Schema, if we can cache validators at all?
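
One possible way to get "busted on any change" is to key the cache by the schema's serialized content instead of its pointer. The sketch below is an illustration under stated assumptions, not something proposed in this PR: the `schema` struct, `validatorFor`, and the choice of SHA-256 over the JSON encoding are all hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
	"sync"
)

// Hypothetical stand-ins, not the library's types.
type schema struct {
	Type      string `json:"type"`
	MaxLength int    `json:"maxLength"`
}

type validator struct{ maxLength int }

// byContent is keyed by a hash of the serialized schema, so any change
// to the schema's content yields a different key.
var byContent sync.Map // effectively map[[32]byte]*validator

func validatorFor(s *schema) (*validator, error) {
	raw, err := json.Marshal(s)
	if err != nil {
		return nil, err
	}
	key := sha256.Sum256(raw)
	if v, ok := byContent.Load(key); ok {
		return v.(*validator), nil
	}
	v, _ := byContent.LoadOrStore(key, &validator{maxLength: s.MaxLength})
	return v.(*validator), nil
}

func main() {
	a := &schema{Type: "string", MaxLength: 5}
	b := &schema{Type: "string", MaxLength: 5}
	va, _ := validatorFor(a)
	vb, _ := validatorFor(b)

	a.MaxLength = 10 // a content change produces a new key
	vc, _ := validatorFor(a)

	fmt.Println(va == vb, vc.maxLength) // prints "true 10"
}
```

The trade-off is that every call still pays for the marshal and the hash, so only the later pipeline stages are saved, and entries for superseded content stay in the map until something evicts them.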

Collaborator

Yes. A way to update the cached data needs to exist.

If you can't come up with a satisfying cache key, one way to do this could be to piggyback on the Validate() call so that it does the compilation and caching at that time, not "on every invocation".

Regarding your other approach, I am not following it. Please flesh it out more.
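
A rough sketch of the "compile during Validate()" idea follows, using hypothetical stand-in types; the `compiled` field, the `compile` helper, and the simplified `Validate` / `VisitJSON` signatures are assumptions for illustration and are not the library's API. The point is that Validate() becomes the explicit refresh hook, so a caller who mutates the schema re-validates to update the cached validator.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins, not the library's types.
type validator struct{ maxLength int }

func (v *validator) check(s string) error {
	if len(s) > v.maxLength {
		return fmt.Errorf("length %d exceeds maxLength %d", len(s), v.maxLength)
	}
	return nil
}

type schema struct {
	MaxLength int
	compiled  *validator // populated by Validate; nil until then
}

func compile(s *schema) (*validator, error) {
	if s.MaxLength < 0 {
		return nil, errors.New("maxLength must be non-negative")
	}
	return &validator{maxLength: s.MaxLength}, nil
}

// Validate checks the schema and (re)builds the cached validator.
// Not safe to call concurrently with VisitJSON in this sketch.
func (s *schema) Validate() error {
	v, err := compile(s)
	if err != nil {
		s.compiled = nil
		return err
	}
	s.compiled = v
	return nil
}

// VisitJSON reuses the validator built by Validate, and falls back to
// compiling per call when Validate has not been run.
func (s *schema) VisitJSON(value string) error {
	v := s.compiled
	if v == nil {
		var err error
		if v, err = compile(s); err != nil {
			return err
		}
	}
	return v.check(value)
}

func main() {
	s := &schema{MaxLength: 5}
	_ = s.Validate()
	fmt.Println(s.VisitJSON("hello world") != nil) // true: rejected

	s.MaxLength = 20
	_ = s.Validate() // re-validating refreshes the cached validator
	fmt.Println(s.VisitJSON("hello world") != nil) // false: accepted
}
```

Callers who never run Validate() keep today's compile-per-call behavior in this sketch, so the cache is opt-in and its lifetime is tied to the schema value itself.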

Comment thread openapi3/schema_jsonschema_validator.go Outdated
Author

efritz commented May 18, 2026

@fenollp We could also go with this approach: https://github.com/getkin/kin-openapi/compare/master...efritz:kin-openapi:cache-jsonschema2020-validator-2?expand=1

However, we'd still need an answer to the cache invalidation problem for general application.
