-
Notifications
You must be signed in to change notification settings - Fork 1.7k
AVRO-4135: [C] full type name for json encoded unions #3373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Can you show an example of the ambiguous JSON data and the Avro schema with which was encoded? https://avro.apache.org/docs/1.12.0/specification/#json-encoding stipulates:
That specifically says "name" rather than "fullname". If the JSON encoders now should output the fullname of the type as the property name, then the specification should be changed. This likewise says "name" rather than "fullname":
Moreover, the schema resolution part of the specification says that schemas match if:
Suppose the specification were changed to allow multiple identically named record schemas as members of the same union, as long as they are in different namespaces. Then, if such a union schema is the reader's schema, then both record schemas can match the writer's schema, in which case the first matching schema must be used in schema resolution — even if the other record schema has the same namespace as in the writer's schema. That seems an undesirable result to me. |
|
@KalleOlaviNiemitalo the ambiguity you describe is contained in AVRO-2287. What we are seeing is that the C library goes for the name interpretation and the java library goes for the fullname description. To reproduce this with an example, I used the schema at the bottom of this message. To encode:
On the decoding side: C does not have a implementation for decoding the json encoding. However when you try with java to to read Which shows the implementation differences for both languages. And java is clearly expecting a fully qualified path for unions. This patch proposes to line up the behavior for C with that of Java. Would you prefer the other way around? Are you willing to accept a patch on the Java side which allows reading the C encoding? Example schema: Java code use to generate exception: |
|
@KalleOlaviNiemitalo meanwhile I checked all the other language bindings on how they interpret the spec. All the language bindings which have a JSON encoder seem to fully qualify their types. It is only the C binding which deviates here. So this patch lines up this behavior with the other language bindings.
[1]: possible to configure a non standard encoding where the union value is not wrapped in an object. This variant is not decodable. Again, if you think decoders should, like the js decoder, be more lenient I do not mind to workout a patch for the Java bindings. |
Support unqualified type references for unions when decoding them from json. AVRO-2287 makes it unclear if type reference for a JSON encoded union needs to be qualified or not. Today all encoders use the fully qualified types except the C JSON encoder which uses the unqualified type. In this patch we make the java JSON Decoder more lenient and let it fallback to unqualified types names when no qualified type name matches. Which matches the behavior currently implemented in the Javascript Json decoder. This patch is an alternative for apache#3373 where it is proposed to update the C library instead.
|
@KalleOlaviNiemitalo in #3374 I propose to add support for non qualified type resolution in the java |
| check_exit(avro_generic_value_new(iface, &val)); | ||
|
|
||
| #define TEST_AVRO_4135 (1) | ||
| #if TEST_AVRO_4135 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this done here ?
Do you need to disable the rest sometimes ?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied this style from test_avro_766.c I can remove it from both if desired.
| if (json_object_set_new(result, branch_name, branch_json)) { | ||
| avro_set_error("Cannot append branch to union"); | ||
| json_decref(result); | ||
| json_decref(full_name_buf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to do the same for branch_json too ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No jansson guarantees to take ownership of branch_json and will decref on errors.
| json_t *result = json_object(); | ||
| if (result == NULL) { | ||
| avro_set_error("Cannot allocate JSON union"); | ||
| json_decref(full_name_buf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
full_name_buf could be NULL if there is no namespace. Maybe we need to check before calling json_decref() ?!
Same issues below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
json_decref supports dereferencing NULL.
So I do not think this is a problem.
| branch_schema = | ||
| avro_schema_union_branch(union_schema, disc); | ||
|
|
||
| namespace = avro_schema_namespace(branch_schema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems avro_schema_namespace does not support schema reference, so it will return NULL even if the referenced schema has a namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not exactly clear to me what you mean here. In the use cases I tested it with it works.
Not clear which use case is missing.
Like java use the full type name when encoding the types for unions. This to prevent ambiguities when decoding this json encoded field back to avro using for example the java library.
badada3 to
ba1ea28
Compare
steven-aerts
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the thorough review.
| json_t *result = json_object(); | ||
| if (result == NULL) { | ||
| avro_set_error("Cannot allocate JSON union"); | ||
| json_decref(full_name_buf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
json_decref supports dereferencing NULL.
So I do not think this is a problem.
| if (json_object_set_new(result, branch_name, branch_json)) { | ||
| avro_set_error("Cannot append branch to union"); | ||
| json_decref(result); | ||
| json_decref(full_name_buf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No jansson guarantees to take ownership of branch_json and will decref on errors.
| check_exit(avro_generic_value_new(iface, &val)); | ||
|
|
||
| #define TEST_AVRO_4135 (1) | ||
| #if TEST_AVRO_4135 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied this style from test_avro_766.c I can remove it from both if desired.
What is the purpose of the change
Like java use the full type name when encoding the types for unions.
This to prevent ambiguities when decoding this json encoded field back to avro using for example the java library.
Without this patch you cannot use the Java
JsonDecoderto decode union with records having anamespaceentry generated by the avro C library.Verifying this change
This change added tests and can be verified by running the
test_avro_4135Documentation