From 45ead86489bda317e575e506a7e947997b4868dc Mon Sep 17 00:00:00 2001
From: Derek Bruening
Date: Mon, 23 May 2022 13:40:19 -0400
Subject: [PATCH] Add new framing chunk types without checksums

Adds two new chunk types to the Snappy framing format: compressed data
without a checksum, and uncompressed data without a checksum. These types
are identical to their existing counterparts except that they do not
contain a CRC-32C checksum. In effect, this makes per-chunk checksums
optional rather than required.

In some use cases, computing the CRC-32C checksums for the data chunks in
the Snappy framing format dominates execution time. Eliminating the
checksums yields a 2.5x performance improvement in our use of Snappy for
compressing address trace data before storing it to disk.

Existing readers of the Snappy framing format are expected to fail up
front when they encounter the new types, because 0x02 and 0x03 fall in the
previously reserved unskippable range (0x02-0x7f), for which a decoder
must return an error. Updating a reader to handle them should be a simple
code change.
---
 framing_format.txt | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/framing_format.txt b/framing_format.txt
index 9764e83..9d98706 100644
--- a/framing_format.txt
+++ b/framing_format.txt
@@ -1,9 +1,9 @@
 Snappy framing format description
-Last revised: 2013-10-25
+Last revised: 2022-05-23
 
 This format decribes a framing format for Snappy, allowing compressing to
 files or streams that can then more easily be decompressed without having
-to hold the entire stream in memory. It also provides data checksums to
+to hold the entire stream in memory. It also provides optional data checksums to
 help verify integrity. It does not provide metadata checksums, so it does
 not protect against e.g. all forms of truncations.
 
@@ -106,7 +106,21 @@ no more than 65536 data bytes, so the maximum legal chunk length with the
 checksum is 65540.
 
 
-4.4. Padding (chunk type 0xfe)
+4.4. Compressed data without checksum (chunk type 0x02)
+
+This chunk type is identical to "Compressed data" (type 0x00) except that
+the compressed data is _not_ preceded by a checksum. The same size and
+other limitations apply.
+
+
+4.5. Uncompressed data without checksum (chunk type 0x03)
+
+This chunk type is identical to "Uncompressed data" (type 0x01) except that
+the data is _not_ preceded by a checksum. The same size and other
+limitations apply.
+
+
+4.6. Padding (chunk type 0xfe)
 
 Padding chunks allow a compressor to increase the size of the data stream
 so that it complies with external demands, e.g. that the total number of
@@ -117,7 +131,7 @@ should be zero, but decompressors must not try to interpret or verify the
 padding data in any way.
 
 
-4.5. Reserved unskippable chunks (chunk types 0x02-0x7f)
+4.7. Reserved unskippable chunks (chunk types 0x04-0x7f)
 
 These are reserved for future expansion. A decoder that sees such a chunk
 should immediately return an error, as it must assume it cannot decode the
@@ -126,7 +140,7 @@ stream correctly.
 Future versions of this specification may define meanings for these chunks.
 
 
-4.6. Reserved skippable chunks (chunk types 0x80-0xfd)
+4.8. Reserved skippable chunks (chunk types 0x80-0xfd)
 
 These are also reserved for future expansion, but unlike the chunks
 described in 4.5, a decoder seeing these must skip them and continue
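The commit message describes teaching a reader about the new types as a simple change. As an illustration only, here is a minimal Python sketch of a chunk reader that accepts types 0x02 and 0x03. It assumes the chunk layout and checksum masking defined elsewhere in framing_format.txt (outside this diff): a 1-byte chunk type, a 3-byte little-endian length, and, for types 0x00 and 0x01, a 4-byte little-endian masked CRC-32C of the uncompressed data. The names `read_chunks`, `crc32c`, and `masked_crc32c` and the caller-supplied `decompress` callback are hypothetical. A bitwise CRC-32C is written out because the Python standard library does not provide one (`zlib.crc32` uses the different CRC-32 polynomial).

```python
from typing import Callable, Iterator

# Chunk types; 0x02 and 0x03 are the checksum-free types added by this patch.
COMPRESSED = 0x00
UNCOMPRESSED = 0x01
COMPRESSED_NO_CRC = 0x02
UNCOMPRESSED_NO_CRC = 0x03
STREAM_IDENTIFIER = 0xFF
STREAM_MAGIC = b"sNaPpY"


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data: bytes) -> int:
    """Rotate the CRC right by 15 bits, then add 0xa282ead8 (mod 2**32)."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def read_chunks(stream: bytes,
                decompress: Callable[[bytes], bytes]) -> Iterator[bytes]:
    """Yield the uncompressed payload of each data chunk in `stream`.

    `decompress` is a hypothetical caller-supplied raw Snappy decompressor.
    This sketch does not check that the stream starts with an identifier
    chunk, and it does not enforce the per-chunk size limits.
    """
    pos = 0
    while pos < len(stream):
        if len(stream) - pos < 4:
            raise ValueError("truncated chunk header")
        # Header: 1-byte chunk type, then a 3-byte little-endian length.
        chunk_type = stream[pos]
        length = int.from_bytes(stream[pos + 1:pos + 4], "little")
        body = stream[pos + 4:pos + 4 + length]
        if len(body) != length:
            raise ValueError("truncated chunk body")
        pos += 4 + length

        if chunk_type == STREAM_IDENTIFIER:
            if body != STREAM_MAGIC:
                raise ValueError("bad stream identifier")
        elif chunk_type in (COMPRESSED, UNCOMPRESSED):
            # Existing types: a masked CRC-32C of the uncompressed data
            # (4 bytes, little-endian) precedes the payload.
            if length < 4:
                raise ValueError("chunk too short for checksum")
            expected = int.from_bytes(body[:4], "little")
            data = body[4:]
            if chunk_type == COMPRESSED:
                data = decompress(data)
            if masked_crc32c(data) != expected:
                raise ValueError("checksum mismatch")
            yield data
        elif chunk_type == COMPRESSED_NO_CRC:
            # New type: like 0x00, but there is no checksum to verify.
            yield decompress(body)
        elif chunk_type == UNCOMPRESSED_NO_CRC:
            # New type: like 0x01, but there is no checksum to verify.
            yield body
        elif chunk_type <= 0x7F:
            # 0x04-0x7f remain reserved and unskippable: fail immediately.
            raise ValueError(f"reserved unskippable chunk type {chunk_type:#04x}")
        # Anything else (0x80-0xfd reserved skippable, 0xfe padding) is skipped.
```

For type 0x00 the checksum covers the uncompressed data, so it can only be verified after decompression; the two new types simply drop that verification step, and otherwise share the same code paths as their checksummed counterparts.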