From 6dcc8fc37de9e85da7a0995b1c5849365ea9b073 Mon Sep 17 00:00:00 2001 From: ndp-opendap Date: Fri, 11 Jul 2025 07:33:10 -0700 Subject: [PATCH 1/9] Draft Checksums Change --- 01_data-model-and-serialized-rep.adoc | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/01_data-model-and-serialized-rep.adoc b/01_data-model-and-serialized-rep.adoc index b489b1d..4a00be3 100644 --- a/01_data-model-and-serialized-rep.adoc +++ b/01_data-model-and-serialized-rep.adoc @@ -1555,11 +1555,7 @@ the client MAY use it for that purpose if it chooses. Note that the value of the checksum will change depending on the byte order used to serialize the data. -The checksum is made visible to the client by adding an attribute to -each top-level variable in the DMR. This attribute is named -"`_DAP4_Checksum_CRC32`". - -In all cases, the checksum is computed over the serialized +The checksum is computed over the serialized representation of each top-level variable. The checksum is computed before any chunking Section link:#_dap4_chunked_data_representation[[1.7]]) is applied. @@ -1571,12 +1567,22 @@ can have significant performance consequences since the server may need to read and serialize all of the data for all of the variables mentioned in the DMR even though that data is not transmitted to the client. +For a dmr response the checksum is made visible to the client by the addition +of an Attribute to each top-level variable in the DMR. This attribute is named +"`_DAP4_Checksum_CRC32`". + If the request to the server is a data request, then the checksum value will follow the value of the variable in the data part of the response. The computed checksum is appended to the serialized representation for transmission to the client. Note that in this case, the client is expected to add the "`_DAP4_Checksum_CRC32`" attribute to the DMR. 
+The serialized data response indicates to the client that checksums are +part of the serialized binary data by setting bit 3 in the very first +chunk header. +See link:#_dap4_chunked_data_representation[ Section 1.7, DAP4 +Chunked Data Representaion ]. + The default checksum algorithm is CRC32. So the size of each checksum inserted in the serialization will be a 32 bit integer. The checksum integer will use the same endian representation as for the all other @@ -1819,6 +1825,7 @@ the possible flags are as follows: | *0* | A data containing chunk | The last data chunk | *1* | The current chunk is not an error chunk. | The current chunk is an "`error chunk`" and contains an error message | *2* | The data in this response is encoded using Big-Endian (i.e. network byte order) | The data in this response is encoded using Little-Endian +| *3* | There are no checksums in the response. | The response includes 32 bit CRC checksum values in the serialized binary data. |=== It is possible for a chunk type to have more than one of the flags. So, From bbbee20527b1181cfa5f31b344b7324be76c71cf Mon Sep 17 00:00:00 2001 From: ndp-opendap Date: Fri, 8 Aug 2025 08:52:54 -0700 Subject: [PATCH 2/9] Updated Lexical Structure --- 01_data-model-and-serialized-rep.adoc | 41 +++++++++++++++++++++------ 1 file changed, 33 insertions(+), 8 deletions(-) diff --git a/01_data-model-and-serialized-rep.adoc b/01_data-model-and-serialized-rep.adoc index 4a00be3..e069099 100644 --- a/01_data-model-and-serialized-rep.adoc +++ b/01_data-model-and-serialized-rep.adoc @@ -1854,16 +1854,41 @@ Note that there is semantic limitation in the definition of '`chunk`': the number of bytes in the CHUNKDATA must be equal to SIZE. === Lexical Structure === +Each chunk header is defined by 4 8 bit byte values. One 8-bit byte for *Chunk Type*, and three 8-bit bytes +for *Chunk Size*. -[source,xml] +==== Chunk Type +The Chunk Type is held in the first, single, 8-bit byte of a *Chunk Header*. 
It is bitwise encoded and multiple +bits may be set. The bitwise encoding is as follows: +[source,c++] +---- +/* +Chunk Type Encoding: A single 8-bit byte, with the bitwise encoding: + 0 = data (0x00, 00000000) + 1 = end (0x01, 00000001) + 2 = error (0x02, 00000010) + 4 = Little-Endian (0x04, 00000100) + 8 = Checksums-Present (0x08, 00001000) (New! Now with checksum identification!) +*/ + +CHUNKTYPE = [0x00-0x0F] +---- + +==== Chunk Size +Chunk Size ia expressed as a sequence of three 8-bit bytes, interpreted as an integer value in network byte order. + +[source,c++] +---- +SIZE = [0x00-0xFF][0x00-0xFF][0x00-0xFF] +---- + +==== Chunk Data +The data content of a chunk is expressed as a sequence of 8-bit byte values whose length is encoded in the SIZE +section of the Chunk Header. + +[source,c++] ---- -/* A single 8-bit byte, - with the encoding 0 = data, 1 = end, 2 = error, 4 = Little-Endian */ -CHUNKTYPE = '\x00'|'\x01'|'\x02'|'\x4'|'\x06' -/* A sequence of three 8-bit bytes, - interpreted as an integer on network byte order */ -SIZE = [\0x00-\0xFF][\0x00-\0xFF][\0x00-\0xFF] -CHUNKDATA = [\0x00-\0xFF]* +CHUNKDATA = [0x00-0xFF]* ---- == Constraints == From b7336e2dd5a107b300f368ac98e2475dfd0f0a8b Mon Sep 17 00:00:00 2001 From: ndp-opendap Date: Fri, 8 Aug 2025 14:42:37 -0700 Subject: [PATCH 3/9] wip --- 01_data-model-and-serialized-rep.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/01_data-model-and-serialized-rep.adoc b/01_data-model-and-serialized-rep.adoc index e069099..5b5828f 100644 --- a/01_data-model-and-serialized-rep.adoc +++ b/01_data-model-and-serialized-rep.adoc @@ -1854,7 +1854,7 @@ Note that there is semantic limitation in the definition of '`chunk`': the number of bytes in the CHUNKDATA must be equal to SIZE. === Lexical Structure === -Each chunk header is defined by 4 8 bit byte values. One 8-bit byte for *Chunk Type*, and three 8-bit bytes +Each chunk header is defined by four 8-bit byte values. 
One 8-bit byte for *Chunk Type*, and three 8-bit bytes for *Chunk Size*. ==== Chunk Type From c71dc32e370723290dcd17090e3cd123ee4028b1 Mon Sep 17 00:00:00 2001 From: ndp-opendap Date: Fri, 8 Aug 2025 14:43:22 -0700 Subject: [PATCH 4/9] wip --- 01_data-model-and-serialized-rep.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/01_data-model-and-serialized-rep.adoc b/01_data-model-and-serialized-rep.adoc index 5b5828f..ffdd1f7 100644 --- a/01_data-model-and-serialized-rep.adoc +++ b/01_data-model-and-serialized-rep.adoc @@ -1875,7 +1875,7 @@ CHUNKTYPE = [0x00-0x0F] ---- ==== Chunk Size -Chunk Size ia expressed as a sequence of three 8-bit bytes, interpreted as an integer value in network byte order. +Chunk Size is expressed as a sequence of three 8-bit bytes, interpreted as an integer value in network byte order. [source,c++] ---- From 19df3a6625dc952f6d8435afd7533b26f2b400e2 Mon Sep 17 00:00:00 2001 From: ndp-opendap Date: Wed, 27 Aug 2025 15:07:14 -0700 Subject: [PATCH 5/9] wip --- 01_data-model-and-serialized-rep.adoc | 35 +++++++++++++-------------- 1 file changed, 17 insertions(+), 18 deletions(-) diff --git a/01_data-model-and-serialized-rep.adoc b/01_data-model-and-serialized-rep.adoc index ffdd1f7..692e946 100644 --- a/01_data-model-and-serialized-rep.adoc +++ b/01_data-model-and-serialized-rep.adoc @@ -1560,24 +1560,23 @@ representation of each top-level variable. The checksum is computed before any chunking Section link:#_dap4_chunked_data_representation[[1.7]]) is applied. -If the request to the server is a dmr-only request, then the server will -compute the checksum for each variable mentioned in the DMR and will -insert the "`_DAP4_Checksum_CRC32`" attribute in the DMR. Note that this -can have significant performance consequences since the server may need -to read and serialize all of the data for all of the variables mentioned -in the DMR even though that data is not transmitted to the client. 
- -For a dmr response the checksum is made visible to the client by the addition -of an Attribute to each top-level variable in the DMR. This attribute is named -"`_DAP4_Checksum_CRC32`". - -If the request to the server is a data request, then the checksum value -will follow the value of the variable in the data part of the response. -The computed checksum is appended to the serialized representation for -transmission to the client. Note that in this case, the client is -expected to add the "`_DAP4_Checksum_CRC32`" attribute to the DMR. - -The serialized data response indicates to the client that checksums are +If the request to the server is for a DMR with checksums (no data), then the +server will compute the checksum for each top level variable in the DMR and will +add an Attribute named `_DAP4_Checksum_CRC32_` into the variable's AttributeTable +in the DMR. Note: _This can have significant performance consequences, since +the server may need to read and serialize all the data for all the variables mentioned +in the DMR even though that data is not transmitted to the client_. + + +If the request to the server is for a DAP4 Data Response with checksums, then +the checksum value will follow the value of the variable in the data part of +the response. The attribute `_DAP4_Checksum_CRC32_` is NOT added +to the DMR included in a DAP4 Data Response. Instead, the client is expected to +retrieve the checksum value for each top-level variable in the DMR from the +serialized data response and add an Attribute named `_DAP4_Checksum_CRC32_` to +the associated top-level variable's AttributeTable in DMR. + +The DAP4 Data Response with checksums indicates to the client that checksums are part of the serialized binary data by setting bit 3 in the very first chunk header. 
See link:#_dap4_chunked_data_representation[ Section 1.7, DAP4 From 9a74c94572d73514c6e7f1689404756de5a4eae3 Mon Sep 17 00:00:00 2001 From: ndp-opendap Date: Thu, 28 Aug 2025 09:38:51 -0700 Subject: [PATCH 6/9] wip --- 01_data-model-and-serialized-rep.adoc | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/01_data-model-and-serialized-rep.adoc b/01_data-model-and-serialized-rep.adoc index 692e946..d88da9e 100644 --- a/01_data-model-and-serialized-rep.adoc +++ b/01_data-model-and-serialized-rep.adoc @@ -1573,12 +1573,11 @@ the checksum value will follow the value of the variable in the data part of the response. The attribute `_DAP4_Checksum_CRC32_` is NOT added to the DMR included in a DAP4 Data Response. Instead, the client is expected to retrieve the checksum value for each top-level variable in the DMR from the -serialized data response and add an Attribute named `_DAP4_Checksum_CRC32_` to -the associated top-level variable's AttributeTable in DMR. +serialized data response and (optionally) add an Attribute named `_DAP4_Checksum_CRC32_` to +the associated top-level variable's AttributeTable in the in memory DMR object. The DAP4 Data Response with checksums indicates to the client that checksums are -part of the serialized binary data by setting bit 3 in the very first -chunk header. +part of the serialized binary data by setting bit 3 in the very first chunk header. See link:#_dap4_chunked_data_representation[ Section 1.7, DAP4 Chunked Data Representaion ]. 
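The checksum rules in this hunk can be illustrated with a minimal sketch. It assumes that `zlib.crc32` matches the CRC32 variant the draft intends, and that the top-level variable is a plain array of 32-bit integers; the helper names `serialize_int32` and `with_checksum` are hypothetical and do not come from the specification.

```python
import struct
import zlib


def serialize_int32(values, little_endian):
    """Pack 32-bit signed integers in the chosen byte order (hypothetical helper)."""
    fmt = ("<" if little_endian else ">") + f"{len(values)}i"
    return struct.pack(fmt, *values)


def with_checksum(serialized, little_endian):
    """Append the CRC32 of `serialized` as a 32-bit unsigned integer.

    The checksum is written in the same byte order as the data, and it is
    computed over the serialized bytes before any chunking is applied.
    Assumption: zlib.crc32 is the CRC32 variant meant by the draft.
    """
    crc = zlib.crc32(serialized)
    return serialized + struct.pack("<I" if little_endian else ">I", crc)


values = [1, 2, 3]
le = serialize_int32(values, little_endian=True)
be = serialize_int32(values, little_endian=False)

# The same values serialize to different bytes, so the checksum of a
# variable depends on the byte order chosen for the response.
framed_le = with_checksum(le, little_endian=True)
framed_be = with_checksum(be, little_endian=False)
```

A client reading a Data Response with checksums would do the reverse: read the variable's bytes, read the four bytes that follow, and compare them with its own CRC32 of the variable's bytes.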
From 5838cd44d6bffc0810098a0018399f9c59cff124 Mon Sep 17 00:00:00 2001 From: ndp-opendap Date: Fri, 29 Aug 2025 10:06:17 -0700 Subject: [PATCH 7/9] wip --- 01_data-model-and-serialized-rep.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/01_data-model-and-serialized-rep.adoc b/01_data-model-and-serialized-rep.adoc index d88da9e..294de65 100644 --- a/01_data-model-and-serialized-rep.adoc +++ b/01_data-model-and-serialized-rep.adoc @@ -1560,7 +1560,7 @@ representation of each top-level variable. The checksum is computed before any chunking Section link:#_dap4_chunked_data_representation[[1.7]]) is applied. -If the request to the server is for a DMR with checksums (no data), then the +If the request to the server is for a DMR with checksums added (no data), then the server will compute the checksum for each top level variable in the DMR and will add an Attribute named `_DAP4_Checksum_CRC32_` into the variable's AttributeTable in the DMR. Note: _This can have significant performance consequences, since From 18d1de75d99758ba104b936c3599cb5e65e4b402 Mon Sep 17 00:00:00 2001 From: ndp-opendap Date: Tue, 2 Sep 2025 16:05:28 -0700 Subject: [PATCH 8/9] wip --- 01_data-model-and-serialized-rep.adoc | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/01_data-model-and-serialized-rep.adoc b/01_data-model-and-serialized-rep.adoc index 294de65..99dcd48 100644 --- a/01_data-model-and-serialized-rep.adoc +++ b/01_data-model-and-serialized-rep.adoc @@ -1560,10 +1560,12 @@ representation of each top-level variable. The checksum is computed before any chunking Section link:#_dap4_chunked_data_representation[[1.7]]) is applied. 
-If the request to the server is for a DMR with checksums added (no data), then the +If the request to the server is for a DMR-only response with checksums added (no data), then the server will compute the checksum for each top level variable in the DMR and will add an Attribute named `_DAP4_Checksum_CRC32_` into the variable's AttributeTable -in the DMR. Note: _This can have significant performance consequences, since +in the DMR. + +NOTE: _This can have significant performance consequences, since the server may need to read and serialize all the data for all the variables mentioned in the DMR even though that data is not transmitted to the client_. @@ -1579,7 +1581,7 @@ the associated top-level variable's AttributeTable in the in memory DMR object. The DAP4 Data Response with checksums indicates to the client that checksums are part of the serialized binary data by setting bit 3 in the very first chunk header. See link:#_dap4_chunked_data_representation[ Section 1.7, DAP4 -Chunked Data Representaion ]. +Chunked Data Representation ]. The default checksum algorithm is CRC32. So the size of each checksum inserted in the serialization will be a 32 bit integer. The checksum From 4352e083847d2d6c914fe504b142bcfeb959c7f5 Mon Sep 17 00:00:00 2001 From: ndp-opendap Date: Wed, 10 Sep 2025 15:48:58 -0700 Subject: [PATCH 9/9] wip --- 01_data-model-and-serialized-rep.adoc | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/01_data-model-and-serialized-rep.adoc b/01_data-model-and-serialized-rep.adoc index 99dcd48..8665797 100644 --- a/01_data-model-and-serialized-rep.adoc +++ b/01_data-model-and-serialized-rep.adoc @@ -1868,7 +1868,14 @@ Chunk Type Encoding: A single 8-bit byte, with the bitwise encoding: 1 = end (0x01, 00000001) 2 = error (0x02, 00000010) 4 = Little-Endian (0x04, 00000100) - 8 = Checksums-Present (0x08, 00001000) (New! Now with checksum identification!) 
+ 8 = Checksums-Present (0x08, 00001000) + 16 = ReservedForFuture (0x10, 00010000) + 32 = ReservedForFuture (0x20, 00100000) + 64 = ReservedForFuture (0x40, 01000000) + 128 = ReservedForFuture (0x80, 10000000) + +The Chunk Type Encoding byte should be evaluated not as a single value +but rather by using & for each bit map value. */ CHUNKTYPE = [0x00-0x0F]
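The header layout described in this series (one Chunk Type byte followed by a three-byte Chunk Size in network byte order) could be decoded roughly as follows. This is only a sketch: `ChunkHeader` and `parse_chunk_header` are hypothetical names, and rejecting reserved bits is an assumption of the example, since the draft does not say how a reader should treat them.

```python
from dataclasses import dataclass

# Flag bits of the Chunk Type byte, as listed in the comment block above.
FLAG_END = 0x01
FLAG_ERROR = 0x02
FLAG_LITTLE_ENDIAN = 0x04
FLAG_CHECKSUMS = 0x08
RESERVED_MASK = 0xF0  # bits 4-7 are reserved for future use


@dataclass(frozen=True)
class ChunkHeader:
    is_last: bool
    is_error: bool
    little_endian: bool
    has_checksums: bool
    size: int


def parse_chunk_header(header: bytes) -> ChunkHeader:
    """Decode a four-byte chunk header (hypothetical helper)."""
    if len(header) != 4:
        raise ValueError("a chunk header is exactly four bytes")
    chunk_type = header[0]
    if chunk_type & RESERVED_MASK:
        # Assumption of this sketch: reserved bits are treated as an error.
        raise ValueError("reserved Chunk Type bits are set")
    # Chunk Size is three bytes in network (big-endian) byte order,
    # regardless of the byte order used for the data itself.
    size = int.from_bytes(header[1:4], "big")
    # Each flag is tested with a bitwise AND rather than by comparing
    # the whole byte against a single value.
    return ChunkHeader(
        is_last=bool(chunk_type & FLAG_END),
        is_error=bool(chunk_type & FLAG_ERROR),
        little_endian=bool(chunk_type & FLAG_LITTLE_ENDIAN),
        has_checksums=bool(chunk_type & FLAG_CHECKSUMS),
        size=size,
    )


# 0x0D = 0x08 | 0x04 | 0x01: checksums present, Little-Endian, last chunk.
example = parse_chunk_header(bytes([0x0D, 0x00, 0x01, 0x00]))
```

In this example `example.size` is 256 and the error flag is clear. Testing each bit separately is what lets a single header byte carry several flags at once, which a comparison against fixed values such as `0x06` could not express once the checksum bit is added.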