Commit Graph

64 Commits

Author SHA1 Message Date
Marcin Kowalczyk 984b191f0f Fix the remaining occurrence of non-const `std::string::data()`.
PiperOrigin-RevId: 479818960
2022-10-08 21:59:12 +02:00
Matt Callanan 974fcc49e8 Fix compilation errors under C++11.
`std::string::data()` is const-only until C++17.
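For illustration, a minimal sketch of the usual pre-C++17 workaround (the helper name is hypothetical, not part of the Snappy API):

  #include <string>

  // Before C++17, std::string::data() returns only const char*, so take the
  // address of the first element to obtain a writable pointer instead.
  char* MutableData(std::string* s) {
    return s->empty() ? nullptr : &(*s)[0];
  }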

PiperOrigin-RevId: 479708109
2022-10-08 08:41:35 +02:00
Matt Callanan 9758c9dfd7 Add `snappy::CompressFromIOVec`.
This reads from an `iovec` array rather than from a `char` array as in `snappy::Compress`.
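A hedged usage sketch, assuming the new function mirrors `snappy::Compress` and takes the iovec array, its element count, and an output string (the helper below is hypothetical):

  #include <sys/uio.h>  // struct iovec
  #include <string>
  #include "snappy.h"

  // Compress two non-contiguous buffers in one call.
  std::string CompressTwoBuffers(char* a, size_t a_len, char* b, size_t b_len) {
    struct iovec iov[2] = {{a, a_len}, {b, b_len}};
    std::string compressed;
    snappy::CompressFromIOVec(iov, 2, &compressed);
    return compressed;
  }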

PiperOrigin-RevId: 476930623
2022-09-29 09:32:28 -07:00
Victor Costan cbb83a1d64 Migrate feature detection macro checks from #ifdef to #if.
The #if predicate evaluates to false if the macro is undefined, or
defined to 0. #ifdef (and its synonym #if defined) evaluates to false
only if the macro is undefined.

The new setup allows differentiating between setting a macro to 0 (to
express that the capability definitely does not exist / should not be
used) and leaving a macro undefined (to express not knowing whether a
capability exists / not caring if a capability is used).
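A small illustration of the distinction (the macro name here is hypothetical, not one of snappy's feature macros):

  #if SNAPPY_HAVE_SOME_FEATURE     // false when undefined OR defined to 0
    // use the capability
  #endif

  #ifdef SNAPPY_HAVE_SOME_FEATURE  // false only when undefined; 0 still passes
    // use the capability
  #endif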

PiperOrigin-RevId: 391094241
2021-08-16 18:26:33 +00:00
Snappy Team d8f5dd8eca Clarify, in a comment, that offset/256 fits in 3 bits. It has to in this context, because the other 5 bits in the byte are used for len-4 and the tag.
PiperOrigin-RevId: 374926553
2021-05-25 02:20:42 +00:00
Victor Costan 5e7c14bd05 Add stubs for abseil flags.
This CL also removes support for using the gflags library to modify the
flags.

PiperOrigin-RevId: 361583626
2021-03-08 17:26:48 +00:00
Snappy Team 453942b38f Add absl::GetFlag and absl::SetFlag to uses of flags.
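A hedged sketch of the Abseil flag accessors being adopted (the flag shown is hypothetical, not one of snappy's actual flags):

  #include <cstdint>
  #include "absl/flags/flag.h"

  ABSL_FLAG(int32_t, snappy_example_level, 1, "Illustrative flag only.");

  void Example() {
    int32_t level = absl::GetFlag(FLAGS_snappy_example_level);  // read
    absl::SetFlag(&FLAGS_snappy_example_level, level + 1);      // write
  }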
PiperOrigin-RevId: 357807059
2021-02-17 04:41:41 +00:00
Victor Costan 4ebd8b2f23 Split benchmarks and test tools into separate targets.
This lets us remove main() from snappy_bench.cc and snappy_unittest.cc,
which simplifies integrating these tests and benchmarks with other
suites.

PiperOrigin-RevId: 347857427
2020-12-16 19:09:56 +00:00
Victor Costan 6aa79cb471 Wrap snappy_unittest in an anonymous namespace and remove static from functions.
PiperOrigin-RevId: 347541028
2020-12-15 06:18:35 +00:00
Victor Costan 549685a598 Remove custom testing and benchmarking code.
Snappy includes a testing framework, which implements a subset of the
Google Test API, and can be used when Google Test is not available.
Snappy also includes a micro-benchmark framework, which implements an
old version of the Google Benchmark API.

This CL replaces the custom test and micro-benchmark frameworks with
google/googletest and google/benchmark. The code is vendored in
third_party/ via git submodules. The setup is similar to google/crc32c
and google/leveldb.

This CL also updates the benchmarking code to the modern Google
Benchmark API.

Benchmark results are expected to be more precise, as the old framework
ran each benchmark with a fixed number of iterations, whereas Google
Benchmark keeps iterating until the noise is low.
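A minimal sketch of the modern Google Benchmark API referred to here: the framework owns the iteration count, so the benchmark body just loops over `state` (the benchmark shown is hypothetical):

  #include "benchmark/benchmark.h"

  void BM_Example(benchmark::State& state) {
    for (auto _ : state) {                // runs until timing noise is low enough
      benchmark::DoNotOptimize(42 * 42);  // stand-in for the measured work
    }
  }
  BENCHMARK(BM_Example);
  BENCHMARK_MAIN();  // supplies main()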

PiperOrigin-RevId: 347456142
2020-12-14 21:27:31 +00:00
Snappy Team 3b571656fa 1) Improve the lookup table data to require fewer instructions to extract the necessary data. We now store len - offset in a signed int16; this happens to remove the offset masking from the calculations, and the calculations that do need to be exact produce the flags we need for correctness checks.
2) Replace offset extraction with a lookup mask. This takes fewer uops and is needed because type 3 must be special-cased to always return 0 in order to properly trigger the fallback.
3) Unroll the loop twice. This removes some loop-condition checks and improves the generated assembly: the loop variables tend to end up in different registers, requiring movs, and having two consecutive copies of the body allows those movs to be elided.

PiperOrigin-RevId: 346663328
2020-12-14 02:48:03 +00:00
Shahriar Rouf a9730ed505 Optimize zippy decompression by making IncrementalCopy faster.
When SSSE3 is available:
- Use PSHUFB (_mm_shuffle_epi8) to handle pattern size 1 to 15 (previously it handled size 1 to 7).
- This enables us to do 16-byte copies instead of 8-byte copies, because the replicated pattern now spans at least 16 bytes.
- Use a shuffle-reshuffle strategy to generate the next pattern after loading the initial pattern. This enables us to write 4 conditionals (similar to when pattern size >= 16), which allows FDO to lay out the code with respect to the actual probabilities of each length.
- The PSHUFB masks are now generated programmatically at compile-time.

When SSSE3 is unavailable:
- No change.

In both cases:
- assert(op < op_limit) in IncrementalCopy so that we can check 'op_limit <= buf_limit - 15' instead of 'op_limit <= buf_limit - 16'. All existing call sites of IncrementalCopy guarantee this.

The 'bin' case is notably >20% faster because it has many repeated character patterns (i.e. pattern_size = 1).
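A hedged sketch of the PSHUFB idea described above: replicate a short pattern across a full 16-byte register so the copy loop can store 16 bytes at a time. Unlike the real code, the mask here is built at run time, and the helper name is hypothetical.

  #include <tmmintrin.h>  // SSSE3: _mm_shuffle_epi8
  #include <cstdint>

  // Assumes at least 16 readable bytes at `pattern` (snappy relies on slop).
  static inline __m128i ReplicatePattern(const char* pattern, int pattern_size) {
    alignas(16) uint8_t mask[16];
    for (int i = 0; i < 16; ++i) {
      mask[i] = static_cast<uint8_t>(i % pattern_size);  // 0,1,...,size-1,0,1,...
    }
    __m128i bytes = _mm_loadu_si128(reinterpret_cast<const __m128i*>(pattern));
    return _mm_shuffle_epi8(
        bytes, _mm_load_si128(reinterpret_cast<const __m128i*>(mask)));
  }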

PiperOrigin-RevId: 346454471
2020-12-14 02:47:49 +00:00
Snappy Team 01a566f825 Fix opensource version
PiperOrigin-RevId: 343272548
2020-11-19 17:06:26 +00:00
Snappy Team 616b8229b6 Add LZ4 as a benchmark option. Snappy is starting to look really good compared to LZ4, which many on the internet consider the fastest solution. We now see that Snappy is actually becoming very competitive: compression is a little faster and decompression is slower, but certainly not terribly so.
PiperOrigin-RevId: 343140860
2020-11-18 23:22:04 +00:00
Snappy Team e4a6e97b91 Extend validate benchmarks over all types and also add a medley for validation.
I also made the compression happen only once per benchmark. This way we get a cleaner measurement of #branch-misses using "perf stat". Compression naturally suffers from a large number of branch misses, which was polluting the measurements.

This showed that with the new decompression the branch-miss rate is actually much lower than initially reported, only 0.2%, and very stable, i.e. it doesn't really fluctuate with how you execute the benchmarks.

PiperOrigin-RevId: 342628576
2020-11-18 23:21:55 +00:00
Snappy Team 11e5165b98 Add a benchmark that decreases branch-prediction memorization by increasing the number of independent branches executed per benchmark iteration.
PiperOrigin-RevId: 342242843
2020-11-18 23:21:12 +00:00
Victor Costan c98344f626 Fix Clang/GCC compilation warnings.
This makes it easier to adopt snappy in other projects.

PiperOrigin-RevId: 309958249
2020-05-05 16:15:02 +00:00
Victor Costan 113cd97ab3 Tighten types on a few for loops.
* Replace post-increment with pre-increment in for loops.
* Replace unsigned int counters with precise types, like uint8_t.
* Switch to C++11 range-based for loops when possible.
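A small before/after sketch of the loop cleanups listed above (names are illustrative):

  #include <cstdint>
  #include <vector>

  inline void Consume(uint8_t) {}

  void Example(const std::vector<uint8_t>& bytes) {
    // Before: unsigned int counter with post-increment.
    for (unsigned int i = 0; i < bytes.size(); i++) Consume(bytes[i]);
    // After: C++11 range-based loop with a precise element type.
    for (uint8_t b : bytes) Consume(b);
  }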

PiperOrigin-RevId: 309724233
2020-05-04 12:32:00 +00:00
Victor Costan 63620c06d2 Add some std:: qualifiers to types and functions.
PiperOrigin-RevId: 309110343
2020-04-29 22:31:55 +00:00
Victor Costan 5417da69b7 Switch from C headers to C++ headers.
This CL makes the following substitutions.

* assert.h -> cassert
* math.h -> cmath
* stdarg.h -> cstdarg
* stdio.h -> cstdio
* stdlib.h -> cstdlib
* string.h -> cstring

stddef.h and stdint.h are not migrated to C++ headers.

PiperOrigin-RevId: 309074805
2020-04-29 19:38:03 +00:00
Victor Costan 231b8be076 Migrate to standard integral types.
The following changes are done via find/replace.
* int8 -> int8_t
* int16 -> int16_t
* int32 -> int32_t
* int64 -> int64_t

The aliases were removed from snappy-stubs-public.h.

PiperOrigin-RevId: 306141557
2020-04-12 20:10:03 +00:00
Victor Costan 14bef66290 Modernize memcpy() and memmove() usage.
This CL replaces memcpy() with std::memcpy()
and memmove() with std::memmove(), and #includes
<cstring> in files that use either function.
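For illustration, a minimal sketch of the substitution (the function is hypothetical):

  #include <cstddef>
  #include <cstring>  // was <string.h>

  void CopyBlock(char* dst, const char* src, std::size_t n) {
    std::memcpy(dst, src, n);  // was an unqualified memcpy(dst, src, n)
  }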

PiperOrigin-RevId: 306067788
2020-04-12 00:06:15 +00:00
Snappy Team 4dfcad9f4e assertion failure on darwin_x86_64, have to investigate
PiperOrigin-RevId: 303428229
2020-04-11 04:41:07 +00:00
Snappy Team e19178748f assertion failure on darwin_x86_64, have to investigate
PiperOrigin-RevId: 303346402
2020-04-11 04:40:57 +00:00
Snappy Team 0faf56378e This CL does two things:
1) It shaves off a few cycles from the data-dependency chain by using "shrd" instead of a load.
2) The important loop is finding small copies (4-12), which are either "copy 1" or "copy 2" depending on whether the offset fits below 2048. It turns out that this branch is mispredicted often. Due to the long dependency chain the CPU is running at IPC~1 anyway, so we can freely add instructions to instead emit copies branch-free. This reduces the branch mispredicts from 15% to 11% (for BM_ZFlat/6 txt1) and from 5.6% to 4% (for BM_ZFlat/10 pb).

PiperOrigin-RevId: 303328967
2020-04-11 04:40:48 +00:00
Victor Costan f48c38f91a Fix one forgotten instance of StringPrintf -> StrFormat.
PiperOrigin-RevId: 278315159
2019-11-04 00:09:19 -08:00
Victor Costan c9212708b2 Fix build errors.
PiperOrigin-RevId: 278310119
2019-11-03 23:24:02 -08:00
Snappy Team 8f32e3fbc0 Internal changes
PiperOrigin-RevId: 277555451
2019-11-03 21:51:08 -08:00
Victor Costan 62363d9a79 Fully qualify std::string.
This is in preparation for removing the snappy::string alias
of std::string.

PiperOrigin-RevId: 271383199
2019-09-26 10:57:29 -07:00
Victor Costan 44d84addf2 Fix benchmarks.
PiperOrigin-RevId: 264501168
2019-08-20 17:17:53 -07:00
Victor Costan c6bf1170d8 Fix benchmarks.
PiperOrigin-RevId: 264420835
2019-08-20 13:16:53 -07:00
Shahriar Rouf 4c7f2d5dfb Add BM_ZFlatAll, BM_ZFlatIncreasingTableSize benchmarks to see how well zippy performs when processing different data one after another.
PiperOrigin-RevId: 257518137
2019-08-19 14:30:00 -07:00
Chris Mumford c76b053449 Sync TODO and comment processing with external repo.
Copybara transforms code slightly differently than MOE. One
example is TODO username stripping, where Copybara
produces different results than MOE did. This change
moves the Copybara versions of comments to the public
repository.

Note: These changes didn't originate in cl/247950252.

PiperOrigin-RevId: 247950252
2019-05-14 11:02:57 -07:00
costan 9a6fa91217 Remove use of std::uniform_int_distribution<uint8_t>.
A previous CL removed use of Google-specific random number generating
functionality, such as ACMRandom, and used the C++11 standard library
instead. The CL used std::uniform_distribution<uint8_t> to generate
random bytes, which seems to be unsupported by the standard [1, 2].

For better or for worse, our toolchain does not complain. However,
Visual Studio errors out with "invalid template argument for
uniform_int_distribution: N4659 29.6.1.1 [rand.req.genl]/1e requires one
of short, int, long, long long, unsigned short, unsigned int, unsigned
long, or unsigned long long".

This CL replaces std::uniform_int_distribution<uint8_t> with
std::uniform_int_distribution<int>(0, 255) and appropriate static_cast<>s.
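A hedged sketch of the replacement described above (helper name and generator choice are illustrative):

  #include <cstdint>
  #include <random>

  // uint8_t is not an allowed result type, so draw an int in [0, 255] and
  // narrow it explicitly.
  uint8_t RandomByte(std::mt19937* rng) {
    std::uniform_int_distribution<int> dist(0, 255);
    return static_cast<uint8_t>(dist(*rng));
  }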

[1] http://eel.is/c++draft/rand.req.genl#1.6
[2] be83c0b472/source/numerics.tex (L1807-L1817)
2019-01-06 12:48:39 -08:00
costan 3fcbc47f99 Use std random number generators in tests.
An earlier CL introduced absl::Uniform, which is not yet open sourced,
and therefore unavailable in the open source build.

This CL removes absl::Uniform and ACMRandom in favor of equivalent C++11
standard random generators. Abseil promises to be faster than the
standard library, but we can afford a speed hit in tests in return for
an easier open sourcing story.
2019-01-04 19:09:39 -08:00
jueminyang 254966c71e Migrate to use absl::random 2019-01-04 19:08:11 -08:00
alkis 53a38e5e33 Reduce number of allocations when compressing and simplify the code.
Before, we were allocating at least once: twice with a large table and
thrice when we used a scratch buffer. With this approach we always
allocate once.

  name                                          old speed               new speed               delta
  BM_UFlat/0      [html             ]           2.45GB/s ± 0%           2.45GB/s ± 0%   -0.13%        (p=0.000 n=11+11)
  BM_UFlat/1      [urls             ]           1.19GB/s ± 0%           1.22GB/s ± 0%   +2.48%        (p=0.000 n=11+11)
  BM_UFlat/2      [jpg              ]           17.2GB/s ± 2%           17.3GB/s ± 1%     ~           (p=0.193 n=11+11)
  BM_UFlat/3      [jpg_200          ]           1.52GB/s ± 0%           1.51GB/s ± 0%   -0.78%         (p=0.000 n=10+9)
  BM_UFlat/4      [pdf              ]           12.5GB/s ± 1%           12.5GB/s ± 1%     ~             (p=0.881 n=9+9)
  BM_UFlat/5      [html4            ]           1.86GB/s ± 0%           1.86GB/s ± 0%     ~           (p=0.123 n=11+11)
  BM_UFlat/6      [txt1             ]            793MB/s ± 0%            799MB/s ± 0%   +0.78%         (p=0.000 n=11+9)
  BM_UFlat/7      [txt2             ]            739MB/s ± 0%            744MB/s ± 0%   +0.77%        (p=0.000 n=11+11)
  BM_UFlat/8      [txt3             ]            839MB/s ± 0%            845MB/s ± 0%   +0.71%        (p=0.000 n=11+11)
  BM_UFlat/9      [txt4             ]            678MB/s ± 0%            685MB/s ± 0%   +1.01%        (p=0.000 n=11+11)
  BM_UFlat/10     [pb               ]           3.08GB/s ± 0%           3.12GB/s ± 0%   +1.21%        (p=0.000 n=11+11)
  BM_UFlat/11     [gaviota          ]            975MB/s ± 0%            976MB/s ± 0%   +0.11%        (p=0.000 n=11+11)
  BM_UFlat/12     [cp               ]           1.73GB/s ± 1%           1.74GB/s ± 1%   +0.46%        (p=0.010 n=11+11)
  BM_UFlat/13     [c                ]           1.53GB/s ± 0%           1.53GB/s ± 0%     ~           (p=0.987 n=11+10)
  BM_UFlat/14     [lsp              ]           1.65GB/s ± 0%           1.63GB/s ± 1%   -1.04%        (p=0.000 n=11+11)
  BM_UFlat/15     [xls              ]           1.08GB/s ± 0%           1.15GB/s ± 0%   +6.12%        (p=0.000 n=10+11)
  BM_UFlat/16     [xls_200          ]            944MB/s ± 0%            920MB/s ± 3%   -2.51%         (p=0.000 n=9+11)
  BM_UFlat/17     [bin              ]           1.86GB/s ± 0%           1.87GB/s ± 0%   +0.68%        (p=0.000 n=10+11)
  BM_UFlat/18     [bin_200          ]           1.91GB/s ± 3%           1.92GB/s ± 5%     ~           (p=0.356 n=11+11)
  BM_UFlat/19     [sum              ]           1.31GB/s ± 0%           1.40GB/s ± 0%   +6.53%        (p=0.000 n=11+11)
  BM_UFlat/20     [man              ]           1.42GB/s ± 0%           1.42GB/s ± 0%   +0.33%        (p=0.000 n=10+10)
2019-01-04 19:07:49 -08:00
jefflim 27ff0af12a Improve performance of zippy decompression to IOVecs by up to almost 50%
1) Simplify loop condition for small pattern IncrementalCopy
2) Use pointers rather than indices to track current iovec.
3) Use fast IncrementalCopy
4) Bypass Append check from within AppendFromSelf

While this code greatly improves the performance of ZippyIOVecWriter, a
bigger question is whether IOVec writing should be improved, or removed.

Perf tests:

name                                 old speed      new speed      delta
BM_UFlat/0      [html             ]  2.13GB/s ± 0%  2.14GB/s ± 1%     ~
BM_UFlat/1      [urls             ]  1.22GB/s ± 0%  1.24GB/s ± 0%   +1.87%
BM_UFlat/2      [jpg              ]  17.2GB/s ± 1%  17.1GB/s ± 0%     ~
BM_UFlat/3      [jpg_200          ]  1.55GB/s ± 0%  1.53GB/s ± 2%     ~
BM_UFlat/4      [pdf              ]  12.8GB/s ± 1%  12.7GB/s ± 2%   -0.36%
BM_UFlat/5      [html4            ]  1.89GB/s ± 0%  1.90GB/s ± 1%     ~
BM_UFlat/6      [txt1             ]   811MB/s ± 0%   829MB/s ± 1%   +2.24%
BM_UFlat/7      [txt2             ]   756MB/s ± 0%   774MB/s ± 1%   +2.41%
BM_UFlat/8      [txt3             ]   860MB/s ± 0%   879MB/s ± 1%   +2.16%
BM_UFlat/9      [txt4             ]   699MB/s ± 0%   715MB/s ± 1%   +2.31%
BM_UFlat/10     [pb               ]  2.64GB/s ± 0%  2.65GB/s ± 1%     ~
BM_UFlat/11     [gaviota          ]  1.00GB/s ± 0%  0.99GB/s ± 2%     ~
BM_UFlat/12     [cp               ]  1.66GB/s ± 1%  1.66GB/s ± 2%     ~
BM_UFlat/13     [c                ]  1.53GB/s ± 0%  1.47GB/s ± 5%   -3.97%
BM_UFlat/14     [lsp              ]  1.60GB/s ± 1%  1.55GB/s ± 5%   -3.41%
BM_UFlat/15     [xls              ]  1.12GB/s ± 0%  1.15GB/s ± 0%   +1.93%
BM_UFlat/16     [xls_200          ]   918MB/s ± 2%   929MB/s ± 1%   +1.15%
BM_UFlat/17     [bin              ]  1.86GB/s ± 0%  1.89GB/s ± 1%   +1.61%
BM_UFlat/18     [bin_200          ]  1.90GB/s ± 1%  1.97GB/s ± 1%   +3.67%
BM_UFlat/19     [sum              ]  1.32GB/s ± 0%  1.33GB/s ± 1%     ~
BM_UFlat/20     [man              ]  1.39GB/s ± 0%  1.36GB/s ± 3%     ~
BM_UValidate/0  [html             ]  2.85GB/s ± 3%  2.90GB/s ± 0%     ~
BM_UValidate/1  [urls             ]  1.57GB/s ± 0%  1.56GB/s ± 0%   -0.20%
BM_UValidate/2  [jpg              ]   824GB/s ± 0%   825GB/s ± 0%   +0.11%
BM_UValidate/3  [jpg_200          ]  2.01GB/s ± 0%  2.02GB/s ± 0%   +0.10%
BM_UValidate/4  [pdf              ]  30.4GB/s ±11%  33.5GB/s ± 0%     ~
BM_UIOVec/0     [html             ]   604MB/s ± 0%   856MB/s ± 0%  +41.70%
BM_UIOVec/1     [urls             ]   440MB/s ± 0%   660MB/s ± 0%  +49.91%
BM_UIOVec/2     [jpg              ]  15.1GB/s ± 1%  15.3GB/s ± 1%   +1.22%
BM_UIOVec/3     [jpg_200          ]   567MB/s ± 1%   629MB/s ± 0%  +10.89%
BM_UIOVec/4     [pdf              ]  7.16GB/s ± 2%  8.56GB/s ± 1%  +19.64%
BM_UFlatSink/0  [html             ]  2.13GB/s ± 0%  2.16GB/s ± 0%   +1.47%
BM_UFlatSink/1  [urls             ]  1.22GB/s ± 0%  1.25GB/s ± 0%   +2.18%
BM_UFlatSink/2  [jpg              ]  17.1GB/s ± 2%  17.1GB/s ± 2%     ~
BM_UFlatSink/3  [jpg_200          ]  1.51GB/s ± 1%  1.53GB/s ± 2%   +1.11%
BM_UFlatSink/4  [pdf              ]  12.7GB/s ± 2%  12.8GB/s ± 1%   +0.67%
BM_UFlatSink/5  [html4            ]  1.90GB/s ± 0%  1.92GB/s ± 0%   +1.31%
BM_UFlatSink/6  [txt1             ]   810MB/s ± 0%   835MB/s ± 0%   +3.04%
BM_UFlatSink/7  [txt2             ]   755MB/s ± 0%   779MB/s ± 0%   +3.19%
BM_UFlatSink/8  [txt3             ]   859MB/s ± 0%   884MB/s ± 0%   +2.86%
BM_UFlatSink/9  [txt4             ]   698MB/s ± 0%   718MB/s ± 0%   +2.96%
BM_UFlatSink/10 [pb               ]  2.64GB/s ± 0%  2.67GB/s ± 0%   +1.16%
BM_UFlatSink/11 [gaviota          ]  1.00GB/s ± 0%  1.01GB/s ± 0%   +1.04%
BM_UFlatSink/12 [cp               ]  1.66GB/s ± 1%  1.68GB/s ± 1%   +0.83%
BM_UFlatSink/13 [c                ]  1.52GB/s ± 1%  1.53GB/s ± 0%   +0.38%
BM_UFlatSink/14 [lsp              ]  1.60GB/s ± 1%  1.61GB/s ± 0%   +0.91%
BM_UFlatSink/15 [xls              ]  1.12GB/s ± 0%  1.15GB/s ± 0%   +1.96%
BM_UFlatSink/16 [xls_200          ]   906MB/s ± 3%   920MB/s ± 1%   +1.55%
BM_UFlatSink/17 [bin              ]  1.86GB/s ± 0%  1.90GB/s ± 0%   +2.15%
BM_UFlatSink/18 [bin_200          ]  1.85GB/s ± 2%  1.92GB/s ± 2%   +4.01%
BM_UFlatSink/19 [sum              ]  1.32GB/s ± 1%  1.35GB/s ± 0%   +2.23%
BM_UFlatSink/20 [man              ]  1.39GB/s ± 1%  1.40GB/s ± 0%   +1.12%
BM_ZFlat/0      [html (22.31 %)   ]   800MB/s ± 0%   793MB/s ± 0%   -0.95%
BM_ZFlat/1      [urls (47.78 %)   ]   423MB/s ± 0%   424MB/s ± 0%   +0.11%
BM_ZFlat/2      [jpg (99.95 %)    ]  12.0GB/s ± 2%  12.0GB/s ± 4%     ~
BM_ZFlat/3      [jpg_200 (73.00 %)]   592MB/s ± 3%   594MB/s ± 2%     ~
BM_ZFlat/4      [pdf (83.30 %)    ]  7.26GB/s ± 1%  7.23GB/s ± 2%   -0.49%
BM_ZFlat/5      [html4 (22.52 %)  ]   738MB/s ± 0%   739MB/s ± 0%   +0.17%
BM_ZFlat/6      [txt1 (57.88 %)   ]   286MB/s ± 0%   285MB/s ± 0%   -0.09%
BM_ZFlat/7      [txt2 (61.91 %)   ]   264MB/s ± 0%   264MB/s ± 0%   +0.08%
BM_ZFlat/8      [txt3 (54.99 %)   ]   300MB/s ± 0%   300MB/s ± 0%     ~
BM_ZFlat/9      [txt4 (66.26 %)   ]   248MB/s ± 0%   247MB/s ± 0%   -0.20%
BM_ZFlat/10     [pb (19.68 %)     ]  1.04GB/s ± 0%  1.03GB/s ± 0%   -1.17%
BM_ZFlat/11     [gaviota (37.72 %)]   451MB/s ± 0%   450MB/s ± 0%   -0.35%
BM_ZFlat/12     [cp (48.12 %)     ]   543MB/s ± 0%   538MB/s ± 0%   -1.04%
BM_ZFlat/13     [c (42.47 %)      ]   638MB/s ± 1%   643MB/s ± 0%   +0.68%
BM_ZFlat/14     [lsp (48.37 %)    ]   686MB/s ± 0%   691MB/s ± 1%   +0.76%
BM_ZFlat/15     [xls (41.23 %)    ]   636MB/s ± 0%   633MB/s ± 0%   -0.52%
BM_ZFlat/16     [xls_200 (78.00 %)]   523MB/s ± 2%   520MB/s ± 2%   -0.56%
BM_ZFlat/17     [bin (18.11 %)    ]  1.01GB/s ± 0%  1.01GB/s ± 0%   +0.50%
BM_ZFlat/18     [bin_200 (7.50 %) ]  2.45GB/s ± 1%  2.44GB/s ± 1%   -0.54%
BM_ZFlat/19     [sum (48.96 %)    ]   487MB/s ± 0%   478MB/s ± 0%   -1.89%
BM_ZFlat/20     [man (59.21 %)    ]   567MB/s ± 1%   566MB/s ± 1%     ~

The BM_UFlat/13 and BM_UFlat/14 results showed high variance, so I reran them:

name               old speed      new speed      delta
BM_UFlat/13 [c  ]  1.53GB/s ± 0%  1.53GB/s ± 1%    ~
BM_UFlat/14 [lsp]  1.61GB/s ± 1%  1.61GB/s ± 1%  +0.25%
2018-08-07 23:41:17 -07:00
costan c8049c5827 Replace getpagesize() with sysconf(_SC_PAGESIZE).
getpagesize() has been removed from POSIX.1-2001. Its recommended
replacement is sysconf(_SC_PAGESIZE).
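A minimal sketch of the substitution (the wrapper name is hypothetical):

  #include <unistd.h>

  long GetPageSize() {
    return sysconf(_SC_PAGESIZE);  // portable successor to getpagesize()
  }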
2017-08-01 14:38:57 -07:00
ysaed 82deffcde7 Remove benchmarking support for fastlz. 2017-06-28 18:33:55 -07:00
jyrki 83179dd8be Remove quicklz and lzf support in benchmarks. 2017-06-05 13:54:10 -07:00
costan ed3b7b242b Clean up unused function warnings in snappy. 2017-03-17 13:59:03 -07:00
costan 8b60aac4fd Remove "using namespace std;" from zippy-stubs-internal.h.
This makes it easier to build zippy, as some compilers require a warning
suppression to accept "using namespace std".
2017-03-13 13:03:01 -07:00
scrubbed 039b3a7ace Add std:: prefix to STL non-type names.
In order to disable global using declarations, this CL qualifies
STL names with the std namespace.
2017-03-08 11:42:30 -08:00
Behzad Nouri 818b583387 adds std:: to stl types (#061) 2017-01-26 21:43:13 +01:00
Geoff Pike 38a5ec5fca Re-work fast path that emits copies in zippy compression.
The primary motivation for the change is that FindMatchLength is
likely to discover a difference in the first 8 bytes it compares.
If that occurs then we know the length of the match is less than 12,
because FindMatchLength is invoked after a 4-byte match is found.
When emitting a copy, it is useful to know that the length is less
than 12 because the two-byte variant of an emitted copy requires that.

This is a performance-tuning change that should not affect the
library's behavior.

With FDO on perflab/Haswell the geometric mean for ZFlat/* went from
47,290ns to 45,741ns, an improvement of 3.4%.
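A rough sketch of the two copy encodings involved (copies of at most 64 bytes; the function is illustrative, not the library's actual emitter). The two-byte "copy with 1-byte offset" form only holds lengths 4..11 and offsets below 2048, which is why knowing len < 12 up front lets the fast path choose it without further length checks.

  #include <cstddef>

  // Assumes 4 <= len <= 64 and offset < 65536.
  char* EmitCopySketch(char* op, std::size_t offset, std::size_t len) {
    if (len < 12 && offset < 2048) {
      // Tag 0b01: 3 bits of (len - 4) plus the top 3 bits of the offset.
      *op++ = static_cast<char>(1 | ((len - 4) << 2) | ((offset >> 8) << 5));
      *op++ = static_cast<char>(offset & 0xff);
    } else {
      // Tag 0b10: 6 bits of (len - 1), then a 2-byte little-endian offset.
      *op++ = static_cast<char>(2 | ((len - 1) << 2));
      *op++ = static_cast<char>(offset & 0xff);
      *op++ = static_cast<char>(offset >> 8);
    }
    return op;
  }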

SAMPLE (before)

BM_ZFlat/0      102824     102650      40691 951.4MB/s  html (22.31 %)
BM_ZFlat/1     1293512    1290442       3225 518.9MB/s  urls (47.78 %)
BM_ZFlat/2       10373      10353     417959 11.1GB/s  jpg (99.95 %)
BM_ZFlat/3         268        268   15745324 712.4MB/s  jpg_200 (73.00 %)
BM_ZFlat/4       12137      12113     342462 7.9GB/s  pdf (83.30 %)
BM_ZFlat/5      430672     429720       9724 909.0MB/s  html4 (22.52 %)
BM_ZFlat/6      420541     419636       9833 345.6MB/s  txt1 (57.88 %)
BM_ZFlat/7      373829     373158      10000 319.9MB/s  txt2 (61.91 %)
BM_ZFlat/8     1119014    1116604       3755 364.5MB/s  txt3 (54.99 %)
BM_ZFlat/9     1544203    1540657       2748 298.3MB/s  txt4 (66.26 %)
BM_ZFlat/10      91041      90866      46002 1.2GB/s  pb (19.68 %)
BM_ZFlat/11     332766     331990      10000 529.5MB/s  gaviota (37.72 %)
BM_ZFlat/12      39960      39886     100000 588.3MB/s  cp (48.12 %)
BM_ZFlat/13      14493      14465     287181 735.1MB/s  c (42.47 %)
BM_ZFlat/14       4447       4440     947927 799.3MB/s  lsp (48.37 %)
BM_ZFlat/15    1316362    1313350       3196 747.7MB/s  xls (41.23 %)
BM_ZFlat/16        312        311   10000000 613.0MB/s  xls_200 (78.00 %)
BM_ZFlat/17     388471     387502      10000 1.2GB/s  bin (18.11 %)
BM_ZFlat/18         65         64   64838208 2.9GB/s  bin_200 (7.50 %)
BM_ZFlat/19      65900      65787      63099 554.3MB/s  sum (48.96 %)
BM_ZFlat/20       6188       6177     681951 652.6MB/s  man (59.21 %)

SAMPLE (after)

Benchmark     Time(ns)    CPU(ns) Iterations
--------------------------------------------
BM_ZFlat/0       99259      99044      42428 986.0MB/s  html (22.31 %)
BM_ZFlat/1     1257039    1255276       3341 533.4MB/s  urls (47.78 %)
BM_ZFlat/2       10044      10030     405781 11.4GB/s  jpg (99.95 %)
BM_ZFlat/3         268        267   15732282 713.3MB/s  jpg_200 (73.00 %)
BM_ZFlat/4       11675      11657     358629 8.2GB/s  pdf (83.30 %)
BM_ZFlat/5      420951     419818       9739 930.5MB/s  html4 (22.52 %)
BM_ZFlat/6      415460     414632      10000 349.8MB/s  txt1 (57.88 %)
BM_ZFlat/7      367191     366436      10000 325.8MB/s  txt2 (61.91 %)
BM_ZFlat/8     1098345    1096036       3819 371.3MB/s  txt3 (54.99 %)
BM_ZFlat/9     1508701    1505306       2758 305.3MB/s  txt4 (66.26 %)
BM_ZFlat/10      87195      87031      47289 1.3GB/s  pb (19.68 %)
BM_ZFlat/11     322338     321637      10000 546.5MB/s  gaviota (37.72 %)
BM_ZFlat/12      36739      36668     100000 639.9MB/s  cp (48.12 %)
BM_ZFlat/13      13646      13618     304009 780.9MB/s  c (42.47 %)
BM_ZFlat/14       4249       4240     992456 837.0MB/s  lsp (48.37 %)
BM_ZFlat/15    1262925    1260012       3314 779.4MB/s  xls (41.23 %)
BM_ZFlat/16        308        308   10000000 619.8MB/s  xls_200 (78.00 %)
BM_ZFlat/17     379750     378944      10000 1.3GB/s  bin (18.11 %)
BM_ZFlat/18         62         62   67443280 3.0GB/s  bin_200 (7.50 %)
BM_ZFlat/19      61706      61587      67645 592.1MB/s  sum (48.96 %)
BM_ZFlat/20       5968       5958     698974 676.6MB/s  man (59.21 %)
2017-01-26 21:39:39 +01:00
Steinar H. Gunderson 7525a1600d Fix an issue where the ByteSource path (used for parsing std::string)
would incorrectly accept some invalid varints that the other path would not,
causing potential CHECK-failures if the unit test were run with
--write_uncompressed and a corrupted input file.

Found by the afl fuzzer.
2016-01-04 12:52:15 +01:00
Steinar H. Gunderson 0852af7606 Move the logic from ComputeTable into the unit test, which means it's run
automatically together with the other tests, and also removes the stray
function ComputeTable() (which was never referenced by anything else
in the open-source version, causing compiler warnings for some)
from the core library.

Fixes public issue 96.

A=sesse
R=sanjay
2015-08-19 11:37:51 +02:00
Steinar H. Gunderson b2312c4c25 Add support for Uncompress(source, sink). Various changes to allow
Uncompress(source, sink) to get the same performance as the different
variants of Uncompress to Cord/DataBuffer/String/FlatBuffer.

Changes to efficiently support Uncompress(source, sink)
--------

a) For strings - we add support to StringByteSink to do GetAppendBuffer so we
   can write to it without copying.
b) For flat array buffers, we do GetAppendBuffer and see if we can get a full buffer.

With the above changes we get performance with ByteSource/ByteSink
that is very close to directly using flat arrays and strings.

We add various benchmark cases to demonstrate that.
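A hedged usage sketch of the (source, sink) path (assumes the caller has sized `out` via snappy::GetUncompressedLength; the wrapper is illustrative):

  #include <string>
  #include "snappy.h"
  #include "snappy-sinksource.h"

  bool UncompressToBuffer(const std::string& compressed, char* out) {
    snappy::ByteArraySource source(compressed.data(), compressed.size());
    snappy::UncheckedByteArraySink sink(out);
    return snappy::Uncompress(&source, &sink);
  }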

Orthogonal change
------------------

Add support for TryFastAppend() for SnappyScatteredWriter.

Benchmark results are below

CPU: Intel Core2 dL1:32KB dL2:4096KB
Benchmark              Time(ns)    CPU(ns) Iterations
-----------------------------------------------------
BM_UFlat/0               109065     108996       6410 896.0MB/s  html
BM_UFlat/1              1012175    1012343        691 661.4MB/s  urls
BM_UFlat/2                26775      26771      26149 4.4GB/s  jpg
BM_UFlat/3                48947      48940      14363 1.8GB/s  pdf
BM_UFlat/4               441029     440835       1589 886.1MB/s  html4
BM_UFlat/5                39861      39880      17823 588.3MB/s  cp
BM_UFlat/6                18315      18300      38126 581.1MB/s  c
BM_UFlat/7                 5254       5254     100000 675.4MB/s  lsp
BM_UFlat/8              1568060    1567376        447 626.6MB/s  xls
BM_UFlat/9               337512     337734       2073 429.5MB/s  txt1
BM_UFlat/10              287269     287054       2434 415.9MB/s  txt2
BM_UFlat/11              890098     890219        787 457.2MB/s  txt3
BM_UFlat/12             1186593    1186863        590 387.2MB/s  txt4
BM_UFlat/13              573927     573318       1000 853.7MB/s  bin
BM_UFlat/14               64250      64294      10000 567.2MB/s  sum
BM_UFlat/15                7301       7300      96153 552.2MB/s  man
BM_UFlat/16              109617     109636       6375 1031.5MB/s  pb
BM_UFlat/17              364438     364497       1921 482.3MB/s  gaviota
BM_UFlatSink/0           108518     108465       6450 900.4MB/s  html
BM_UFlatSink/1           991952     991997        705 675.0MB/s  urls
BM_UFlatSink/2            26815      26798      26065 4.4GB/s  jpg
BM_UFlatSink/3            49127      49122      14255 1.8GB/s  pdf
BM_UFlatSink/4           436674     436731       1604 894.4MB/s  html4
BM_UFlatSink/5            39738      39733      17345 590.5MB/s  cp
BM_UFlatSink/6            18413      18416      37962 577.4MB/s  c
BM_UFlatSink/7             5677       5676     100000 625.2MB/s  lsp
BM_UFlatSink/8          1552175    1551026        451 633.2MB/s  xls
BM_UFlatSink/9           338526     338489       2065 428.5MB/s  txt1
BM_UFlatSink/10          289387     289307       2420 412.6MB/s  txt2
BM_UFlatSink/11          893803     893706        783 455.4MB/s  txt3
BM_UFlatSink/12         1195919    1195459        586 384.4MB/s  txt4
BM_UFlatSink/13          559637     559779       1000 874.3MB/s  bin
BM_UFlatSink/14           65073      65094      10000 560.2MB/s  sum
BM_UFlatSink/15            7618       7614      92823 529.5MB/s  man
BM_UFlatSink/16          110085     110121       6352 1027.0MB/s  pb
BM_UFlatSink/17          369196     368915       1896 476.5MB/s  gaviota
BM_UValidate/0            46954      46957      14899 2.0GB/s  html
BM_UValidate/1           500621     500868       1000 1.3GB/s  urls
BM_UValidate/2              283        283    2481447 417.2GB/s  jpg
BM_UValidate/3            16230      16228      43137 5.4GB/s  pdf
BM_UValidate/4           189129     189193       3701 2.0GB/s  html4

A=uday
R=sanjay
2015-07-06 14:21:00 +02:00
Steinar H. Gunderson b2ad960067 Changes to eliminate compiler warnings on MSVC
This code was not compiling under Visual Studio 2013 with warnings being treated
as errors. Specifically:

1. Changed int -> size_t to eliminate signed/unsigned mismatch warning.
2. Added some missing return values to functions.
3. Inserted character literals instead of integer literals into strings to
   avoid type conversions.

A=cmumford
R=jeff
2015-06-22 16:09:56 +02:00