Add new framing chunk types without checksums

Adds two new chunk types to the Snappy framing format: compressed data
without a checksum, and uncompressed data without a checksum.  These
types are identical to their existing counterparts except they do not
contain a CRC-32C checksum.  In effect, this makes the checksum on
each data chunk optional rather than required.
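
To illustrate the layout difference, here is a minimal Python sketch of
writing an uncompressed chunk with and without the checksum.  It assumes
the framing format's usual chunk header (1-byte type, 3-byte
little-endian length) and its masked CRC-32C definition.  The function
names are hypothetical and not part of any Snappy library.

```python
import struct

CHUNK_UNCOMPRESSED = 0x01         # existing type: 4-byte masked CRC-32C, then data
CHUNK_UNCOMPRESSED_NO_CRC = 0x03  # type added by this commit: data only


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli); slow but dependency-free."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data: bytes) -> int:
    """Rotate right by 15 bits and add a constant, as the format specifies."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def encode_uncompressed_chunk(data: bytes, with_checksum: bool) -> bytes:
    """Build one uncompressed chunk (hypothetical helper, for illustration)."""
    if len(data) > 65536:
        raise ValueError("a chunk may hold at most 65536 data bytes")
    if with_checksum:
        chunk_type = CHUNK_UNCOMPRESSED
        body = struct.pack("<I", masked_crc32c(data)) + data
    else:
        chunk_type = CHUNK_UNCOMPRESSED_NO_CRC
        body = data
    # Header: 1-byte type followed by the 3-byte little-endian body length.
    return bytes([chunk_type]) + len(body).to_bytes(3, "little") + body
```

The only differences between the two branches are the type byte and the
four checksum bytes, so the unchecksummed chunk is 4 bytes shorter and
skips the CRC computation entirely.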

In some use cases, computing the CRC-32C checksums for the data chunks
in the Snappy framing format dominates execution time.  Eliminating
the checksums yields a 2.5x performance improvement in our use of
Snappy to compress address trace data before storing it to disk.

Existing readers of the Snappy framing format are expected to fail up
front with an unknown-chunk-type error when they encounter the new
types.  Updating a reader to handle them should be a simple code
change.
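
As a rough sketch of what that reader update might involve (not taken
from any existing implementation; iter_chunks and the table names are
hypothetical), a chunk dispatcher could treat types 0x02 and 0x03 like
0x00 and 0x01 minus the 4-byte checksum prefix.  Decompression and
checksum verification are deliberately left out.

```python
CHECKSUMMED = {0x00: "compressed", 0x01: "uncompressed"}
UNCHECKSUMMED = {0x02: "compressed", 0x03: "uncompressed"}  # added by this commit


def iter_chunks(stream: bytes):
    """Yield (kind, checksum_or_None, payload) for each data chunk."""
    pos = 0
    while pos < len(stream):
        if len(stream) - pos < 4:
            raise ValueError("truncated chunk header")
        chunk_type = stream[pos]
        length = int.from_bytes(stream[pos + 1:pos + 4], "little")
        body = stream[pos + 4:pos + 4 + length]
        if len(body) != length:
            raise ValueError("truncated chunk body")
        pos += 4 + length

        if chunk_type in CHECKSUMMED:
            if length < 4:
                raise ValueError("checksummed chunk shorter than its checksum")
            # First 4 bytes are the masked CRC-32C; the rest is the payload.
            yield CHECKSUMMED[chunk_type], body[:4], body[4:]
        elif chunk_type in UNCHECKSUMMED:
            # New types: the whole body is payload, with nothing to verify.
            yield UNCHECKSUMMED[chunk_type], None, body
        elif chunk_type >= 0x80:
            # Skippable, padding, and stream identifier chunks.  A real
            # reader would also validate the stream identifier (0xff).
            continue
        else:
            raise ValueError(f"unknown unskippable chunk type 0x{chunk_type:02x}")
```

A reader that has not been updated would hit the final branch for 0x02
and 0x03, which is the up-front failure described above.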
Derek Bruening 2022-05-23 13:40:19 -04:00
parent 6a2b78a379
commit 45ead86489
1 changed files with 19 additions and 5 deletions

@@ -1,9 +1,9 @@
Snappy framing format description
-Last revised: 2013-10-25
+Last revised: 2022-05-23
This format describes a framing format for Snappy, allowing compressing to
files or streams that can then more easily be decompressed without having
-to hold the entire stream in memory. It also provides data checksums to
+to hold the entire stream in memory. It also provides optional data checksums to
help verify integrity. It does not provide metadata checksums, so it does
not protect against e.g. all forms of truncations.
@@ -106,7 +106,21 @@ no more than 65536 data bytes, so the maximum legal chunk length with the
checksum is 65540.
-4.4. Padding (chunk type 0xfe)
+4.4. Compressed data without checksum (chunk type 0x02)
+This chunk type is identical to "Compressed data" (type 0x00) except that
+the compressed data is _not_ preceded by a checksum. The same size and
+other limitations apply.
+4.5. Uncompressed data without checksum (chunk type 0x03)
+This chunk type is identical to "Uncompressed data" (type 0x01) except that
+the data is _not_ preceded by a checksum. The same size and other
+limitations apply.
+4.6. Padding (chunk type 0xfe)
Padding chunks allow a compressor to increase the size of the data stream
so that it complies with external demands, e.g. that the total number of
@@ -117,7 +131,7 @@ should be zero, but decompressors must not try to interpret or verify the
padding data in any way.
-4.5. Reserved unskippable chunks (chunk types 0x02-0x7f)
+4.7. Reserved unskippable chunks (chunk types 0x04-0x7f)
These are reserved for future expansion. A decoder that sees such a chunk
should immediately return an error, as it must assume it cannot decode the
@@ -126,7 +140,7 @@ stream correctly.
Future versions of this specification may define meanings for these chunks.
-4.6. Reserved skippable chunks (chunk types 0x80-0xfd)
+4.8. Reserved skippable chunks (chunk types 0x80-0xfd)
These are also reserved for future expansion, but unlike the chunks
described in 4.5, a decoder seeing these must skip them and continue