Skip to content

Conversation

@ashley-taylor
Copy link
Contributor

What is the purpose of the change

Allows custom encodings to support file changes.

Used #1692 as inspiration

Verifying this change

(Please pick one of the following options)

This change added tests and can be verified as follows:
Test custom encodings with several different versions, confirming that the encoder is linked to the schema

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? JavaDocs

@github-actions github-actions bot added the Java Pull Requests for Java binding label Jul 30, 2025
@ashley-taylor
Copy link
Contributor Author

@opwvhk If you could review would be awesome thanks

@ashley-taylor
Copy link
Contributor Author

@opwvhk fixed prettier thanks

@ashley-taylor ashley-taylor force-pushed the AVRO-3520--Expose-read-schema-in-custom-encoding branch from 76ebd9a to f378f08 Compare August 2, 2025 09:36
@ashley-taylor
Copy link
Contributor Author

@opwvhk @martin-g anything I can do to help this one get reviewed? Thanks

@itstheceo
Copy link

@KalleOlaviNiemitalo @opwvhk could you please have a look? Much appreciated

@ashley-taylor
Copy link
Contributor Author

@opwvhk, following up to see if you have any time to review this PR. Thanks

* opportunity to detect schema changes and behave accordingly. Useful for
* maintaining backwards compatibility.
*
* @param schema the schema detected during read.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#withSchema() is used also at https://github.com/apache/avro/pull/3445/files#diff-87216da701d0b2e3135497e3a9ffc4065a56d31142729d9867e0ca2e7dd2b2d1R1072 where the schema is write. The javadoc seems not correct by saying read

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated comment


private FieldAccessor[] createAccessorsFor(Schema schema) {

var byNameSchema = buildByName(clazz, schema);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how often #createAccessorsFor(Schema) is called but buildByName(clazz, schema) seems to be wasteful. It uses reflection to extract the field names -> accessor map and then drops it. Next time this method is used again it does it again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is only called by #getAccessorsFor(Schema schema which caches this

if (enc != null) {
try {
return new CustomEncodingWrapper(enc.using().getDeclaredConstructor().newInstance());
var customEncodingClass = enc.using().getDeclaredConstructor().newInstance();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...Class suggests that this is a java.lang.Class, but actually it is an instance of CustomEncoding.class.

* @param schema the schema detected during read.
* @return custom encoding to be used.
*/
public CustomEncoding<T> withSchema(Schema schema) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API bothers me a bit.
Does it need to be thread-safe ? Because the overrides of this method could do something like this.schema = schema.
IMO this method should make sure somehow that it always returns a new instance of CustomEncoding that uses the provided schema.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, it is only called right after creation, so the libraries use is thread safe.
We could make it so that instead of a zero-argument constructor, it looks for a single-argument constructor and passes the schema in that way. It might be a little less discoverable as a feature, but safer.
They would need two constructors, as the zero-argument constructor will be needed to identify the schema

Alternativly perform an identity comparison after calling this method. Would mean default method would perform a reflection creation which I don't paticually like.

@ashley-taylor
Copy link
Contributor Author

@martin-g thank you for the review. Have addressed or replied to the feedback. If you would like to see the multiple constructor approach happy to make that change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Java Pull Requests for Java binding

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants