Ilya Tokar
92f18e66fd
Add prefetch to zippy compress
...
PiperOrigin-RevId: 518358512
2023-03-29 17:31:17 -07:00
Snappy Team
f603a02008
Explicitly #include <utility> in snappy-internal.h
...
snappy-internal.h uses std::pair, which is defined in the <utility>
header. Typically, this works because existing C++ standard library
implementations provide <utility> via other transitive includes;
however, these transitive includes are not guaranteed to exist, and
don't exist in certain contexts (e.g. compiling against LLVM's libc++
with Clang modules.)
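A minimal sketch of the include-what-you-use rule described above. The helper CountMatchingPrefix is hypothetical and is not taken from snappy-internal.h; it only shows a header-style function that names std::pair and therefore includes <utility> itself.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>  // std::pair is used below, so include its header directly

// Hypothetical helper, not part of Snappy. Returns how many leading bytes of
// a and b match, and whether all n bytes matched.
inline std::pair<std::size_t, bool> CountMatchingPrefix(const char* a,
                                                        const char* b,
                                                        std::size_t n) {
  assert(a != nullptr && b != nullptr);
  std::size_t matched = 0;
  while (matched < n && a[matched] == b[matched]) ++matched;
  return std::pair<std::size_t, bool>(matched, matched == n);
}
```

With the direct include, the function compiles regardless of which other standard headers happen to pull in <utility>.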
PiperOrigin-RevId: 517213822
2023-03-29 17:31:10 -07:00
Snappy Team
15e2a0e13d
Add "cc" clobbers to inline asm that modifies flags.
...
As far as we know, the lack of "cc" in the clobbers hasn't caused
problems yet, but it could. This change is to improve correctness,
and is also almost certainly performance neutral.
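As an illustration only (this is not the asm touched by the commit), a sketch of an inline asm statement that modifies the flags and says so in its clobber list. AddWithAsm is a hypothetical name; the asm path assumes GCC or Clang on x86-64 with the default AT&T syntax, and other targets fall back to plain C++.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical example, not code from Snappy.
inline uint32_t AddWithAsm(uint32_t a, uint32_t b) {
#if defined(__GNUC__) && defined(__x86_64__)
  // "addl" writes the flags register, so "cc" is listed as clobbered.
  __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
  return a;
#else
  // Fallback for other compilers and architectures.
  return a + b;
#endif
}
```

Some targets already treat the flags as clobbered by every asm statement, which may be why the omission caused no visible problems; listing "cc" states the dependency explicitly.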
PiperOrigin-RevId: 487133620
2023-01-12 13:33:01 +00:00
Snappy Team
6a2b78a379
Optimize Zippy compression for ARM by 5-10% by choosing csel instructions
...
PiperOrigin-RevId: 444863689
2022-05-09 16:19:11 +00:00
Victor Costan
7062d7f1d8
Merge pull request #133 from JunHe77:simd
...
PiperOrigin-RevId: 393681630
2021-08-30 01:36:24 +00:00
Victor Costan
cbb83a1d64
Migrate feature detection macro checks from #ifdef to #if.
...
The #if predicate evaluates to false if the macro is undefined, or
defined to 0. #ifdef (and its synonym #if defined) evaluates to false
only if the macro is undefined.
The new setup allows differentiating between setting a macro to 0 (to
express that the capability definitely does not exist / should not be
used) and leaving a macro undefined (to express not knowing whether a
capability exists / not caring if a capability is used).
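The difference can be sketched with two hypothetical macros (EXAMPLE_HAVE_FAST_PATH and EXAMPLE_HAVE_OTHER_PATH are made up for this example and are not Snappy's macro names):

```cpp
#include <cassert>

// Hypothetical macros for illustration only.
#define EXAMPLE_HAVE_FAST_PATH 0  // explicitly disabled
// EXAMPLE_HAVE_OTHER_PATH is deliberately left undefined.

inline bool FastPathSeenByIfdef() {
#ifdef EXAMPLE_HAVE_FAST_PATH
  return true;  // #ifdef only asks "is it defined?", so a value of 0 passes
#else
  return false;
#endif
}

inline bool FastPathSeenByIf() {
#if EXAMPLE_HAVE_FAST_PATH
  return true;
#else
  return false;  // #if evaluates the value, so 0 disables the path
#endif
}

inline bool OtherPathSeenByIf() {
#if EXAMPLE_HAVE_OTHER_PATH  // an undefined identifier evaluates to 0 in #if
  return true;
#else
  return false;
#endif
}
```

Under these assumptions FastPathSeenByIfdef() returns true while FastPathSeenByIf() and OtherPathSeenByIf() return false. A configuration step can still auto-detect a capability by testing !defined(...) before assigning a default value.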
PiperOrigin-RevId: 391094241
2021-08-16 18:26:33 +00:00
Jun He
ab9a57280d
Fix SSE3 and BMI2 compile error
...
After the SHUFFLE code blocks were refactored, the "tmmintrin.h"
include went missing, and the BMI2 code path fails to build
because of type conflicts.
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I7800cd7e050f4d349e5a227206b14b9c566e547f
2021-08-12 15:45:41 +08:00
Snappy Team
9cc3689b21
Optimize memset to pure SIMD because compilers generate consistently bad code. clang for ARM and gcc for x86 https://gcc.godbolt.org/z/oxeGG7aEx
...
PiperOrigin-RevId: 383467656
2021-08-02 14:49:57 +00:00
atdt
b3fb0b5b4b
Enable vector byte shuffle optimizations on ARM NEON
...
The SSSE3 intrinsics we use have their direct analogues in NEON, so making this optimization portable requires a very thin translation layer.
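A sketch of what such a translation layer could look like. ShuffleBytes16 is a hypothetical name and this is not Snappy's actual layer; it maps one byte-shuffle operation onto _mm_shuffle_epi8 (SSSE3) or vqtbl1q_u8 (AArch64 NEON), with a scalar fallback so the example builds without either.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__SSSE3__)
#include <tmmintrin.h>
#elif defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical shim: out[i] = in[mask[i]] for mask values in [0, 15].
inline void ShuffleBytes16(const uint8_t* in, const uint8_t* mask,
                           uint8_t* out) {
  assert(in != nullptr && mask != nullptr && out != nullptr);
#if defined(__SSSE3__)
  const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
  const __m128i m = _mm_loadu_si128(reinterpret_cast<const __m128i*>(mask));
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_shuffle_epi8(v, m));
#elif defined(__aarch64__) && defined(__ARM_NEON)
  vst1q_u8(out, vqtbl1q_u8(vld1q_u8(in), vld1q_u8(mask)));
#else
  // Scalar fallback; a temporary keeps it correct if in and out alias.
  uint8_t tmp[16];
  for (int i = 0; i < 16; ++i) tmp[i] = in[mask[i] & 15];
  std::memcpy(out, tmp, 16);
#endif
}
```

The two instructions treat out-of-range indices differently (PSHUFB zeroes a lane when the index has its high bit set, TBL zeroes it when the index is 16 or more), so the sketch restricts mask values to [0, 15], where all three paths agree.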
PiperOrigin-RevId: 381280165
2021-07-05 01:05:44 +00:00
Snappy Team
289c8a3c0a
Make zippy decompression branchless
...
PiperOrigin-RevId: 342423961
2020-11-18 23:21:38 +00:00
Chris Kennelly
7ffaf77cf4
Replace ARCH_K8 with __x86_64__.
...
PiperOrigin-RevId: 321389098
2020-10-07 21:12:27 +00:00
Victor Costan
231b8be076
Migrate to standard integral types.
...
The following changes are done via find/replace.
* int8 -> int8_t
* int16 -> int16_t
* int32 -> int32_t
* int64 -> int64_t
The aliases were removed from snappy-stubs-public.h.
PiperOrigin-RevId: 306141557
2020-04-12 20:10:03 +00:00
Snappy Team
d674348a0c
Improve zippy with 5-10%.
...
BM_ZCord/0 [html ] 1.26GB/s ± 0% 1.35GB/s ± 0% +7.90% (p=0.008 n=5+5)
BM_ZCord/1 [urls ] 535MB/s ± 0% 562MB/s ± 0% +5.05% (p=0.008 n=5+5)
BM_ZCord/2 [jpg ] 10.2GB/s ± 1% 10.2GB/s ± 0% ~ (p=0.310 n=5+5)
BM_ZCord/3 [jpg_200] 841MB/s ± 1% 846MB/s ± 1% ~ (p=0.421 n=5+5)
BM_ZCord/4 [pdf ] 6.77GB/s ± 1% 7.06GB/s ± 1% +4.28% (p=0.008 n=5+5)
BM_ZCord/5 [html4 ] 1.00GB/s ± 0% 1.08GB/s ± 0% +7.94% (p=0.008 n=5+5)
BM_ZCord/6 [txt1 ] 391MB/s ± 0% 417MB/s ± 0% +6.71% (p=0.008 n=5+5)
BM_ZCord/7 [txt2 ] 363MB/s ± 0% 388MB/s ± 0% +6.73% (p=0.016 n=5+4)
BM_ZCord/8 [txt3 ] 400MB/s ± 0% 426MB/s ± 0% +6.55% (p=0.008 n=5+5)
BM_ZCord/9 [txt4 ] 328MB/s ± 0% 350MB/s ± 0% +6.66% (p=0.008 n=5+5)
BM_ZCord/10 [pb ] 1.67GB/s ± 1% 1.80GB/s ± 0% +7.52% (p=0.008 n=5+5)
1) A key bottleneck in the data dependency chain is figuring out how many bytes are matched and loading the data for the next hash value. The load-to-use latency is 5 cycles; in the previous cl/303353110 we replaced the load with "shrd" to align previous loads. Unfortunately "shrd" itself has a latency of 4 cycles, so we'd prefer "shrx", which takes 1 cycle for variable shifts.
2) Maximally use data already computed. The above trick calculates 5 bytes of useful data, so if we need to search for a new match we can use this data for the first search (which is one byte further).
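The idea of replacing a dependent load with a shift of data that is already in a register can be sketched in portable C++. The function names are hypothetical, and the sketch does not reproduce the commit's actual instruction selection; whether a compiler emits "shrx" for the variable shift depends on the target flags.

```cpp
#include <cassert>
#include <cstdint>

// Portable little-endian loads, written byte by byte for clarity.
inline uint64_t Load64LE(const unsigned char* p) {
  uint64_t v = 0;
  for (int i = 7; i >= 0; --i) v = (v << 8) | p[i];
  return v;
}

inline uint32_t Load32LE(const unsigned char* p) {
  uint32_t v = 0;
  for (int i = 3; i >= 0; --i) v = (v << 8) | p[i];
  return v;
}

// Given `data`, the 8 bytes already loaded from position p, returns the
// 4 bytes that a fresh load from p + matched would produce (matched <= 4),
// using only a shift instead of another memory access.
inline uint32_t NextBytesByShift(uint64_t data, unsigned matched) {
  assert(matched <= 4);
  return static_cast<uint32_t>(data >> (8 * matched));
}
```

For any matched in [0, 4], NextBytesByShift(Load64LE(p), matched) equals Load32LE(p + matched), so the next hash input is available without waiting on load-to-use latency.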
PiperOrigin-RevId: 303875535
2020-04-11 04:41:15 +00:00
Snappy Team
4dfcad9f4e
assertion failure on darwin_x86_64, have to investigate
...
PiperOrigin-RevId: 303428229
2020-04-11 04:41:07 +00:00
Snappy Team
e19178748f
assertion failure on darwin_x86_64, have to investigate
...
PiperOrigin-RevId: 303346402
2020-04-11 04:40:57 +00:00
Snappy Team
0faf56378e
This cl does two things
...
1) It shaves off a few cycles from the data dependency chain by using "shrd" instead of a load.
2) The important loop is finding small copies (4-12) which are either "copy 1" or "copy 2", depending on whether the offset is < 2048. It turns out that this is a branch that is mispredicted often. Due to the long dependency chain the CPU is running with IPC~1 anyway, so we can freely add instructions to emit copies branch-free instead. This reduces the branch mispredicts from 15% to 11% (for BM_ZFlat/6 txt1) and from 5.6% to 4% (for BM_ZFlat/10 or pb).
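A sketch of branch-free emission for short copies, following the tag layouts in the Snappy format description ("copy 1": 2 bytes, offset < 2048; "copy 2": 3 bytes). EmitShortCopy is a hypothetical name and this is not the code from the commit; both encodings are computed and one is selected, which compilers can lower to a conditional move, although that is not guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch only. Writes a copy element for 4 <= len < 12 and offset < 65536
// into op, which must have room for 3 bytes. Returns the element size (2 or 3).
inline std::size_t EmitShortCopy(uint8_t* op, uint32_t offset, uint32_t len) {
  assert(len >= 4 && len < 12 && offset < 65536);
  // "copy 1": tag bits 01, (len - 4) in bits 2..4, offset bits 8..10 in
  // bits 5..7, followed by the low 8 bits of the offset.
  const uint32_t copy1 =
      (1u | ((len - 4) << 2) | ((offset >> 3) & 0xe0u)) | ((offset & 0xffu) << 8);
  // "copy 2": tag bits 10, (len - 1) in bits 2..7, followed by the offset as
  // a 16-bit little-endian integer.
  const uint32_t copy2 = (2u | ((len - 1) << 2)) | (offset << 8);
  const bool fits = offset < 2048;
  const uint32_t word = fits ? copy1 : copy2;  // select between two values
  op[0] = static_cast<uint8_t>(word);
  op[1] = static_cast<uint8_t>(word >> 8);
  op[2] = static_cast<uint8_t>(word >> 16);  // scratch byte when fits is true
  return fits ? 2 : 3;
}
```

The unconditional third store trades one extra byte written for the absence of a data-dependent jump, which matches the reasoning above that spare issue slots are available.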
PiperOrigin-RevId: 303328967
2020-04-11 04:40:48 +00:00
alkis
53a38e5e33
Reduce number of allocations when compressing and simplify the code.
...
Before we were allocating at least once: twice with large table and
thrice when we used a scratch buffer. With this approach we always
allocate once.
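One way to get down to a single allocation is to size one buffer for the hash table plus both scratch areas and hand out pointers into it. The class below is a hypothetical sketch of that layout and is not Snappy's actual working-memory class; the names and the uint16_t table entry type are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical sketch: one heap allocation backs the hash table and both
// scratch buffers.
class SingleAllocWorkingMemory {
 public:
  SingleAllocWorkingMemory(std::size_t table_entries, std::size_t input_bytes,
                           std::size_t output_bytes)
      : table_bytes_(table_entries * sizeof(uint16_t)),
        input_bytes_(input_bytes),
        mem_(new char[table_bytes_ + input_bytes + output_bytes]) {
    assert(table_entries > 0);
  }

  // The table sits at the start of the allocation, which operator new[]
  // aligns suitably for uint16_t.
  uint16_t* table() { return reinterpret_cast<uint16_t*>(mem_.get()); }
  char* input() { return mem_.get() + table_bytes_; }
  char* output() { return input() + input_bytes_; }

 private:
  std::size_t table_bytes_;
  std::size_t input_bytes_;
  std::unique_ptr<char[]> mem_;  // declared last: its size uses table_bytes_
};
```

The three accessors return disjoint regions of the same block, so the number of allocations no longer depends on the table size or on whether scratch buffers are needed.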
name old speed new speed delta
BM_UFlat/0 [html ] 2.45GB/s ± 0% 2.45GB/s ± 0% -0.13% (p=0.000 n=11+11)
BM_UFlat/1 [urls ] 1.19GB/s ± 0% 1.22GB/s ± 0% +2.48% (p=0.000 n=11+11)
BM_UFlat/2 [jpg ] 17.2GB/s ± 2% 17.3GB/s ± 1% ~ (p=0.193 n=11+11)
BM_UFlat/3 [jpg_200 ] 1.52GB/s ± 0% 1.51GB/s ± 0% -0.78% (p=0.000 n=10+9)
BM_UFlat/4 [pdf ] 12.5GB/s ± 1% 12.5GB/s ± 1% ~ (p=0.881 n=9+9)
BM_UFlat/5 [html4 ] 1.86GB/s ± 0% 1.86GB/s ± 0% ~ (p=0.123 n=11+11)
BM_UFlat/6 [txt1 ] 793MB/s ± 0% 799MB/s ± 0% +0.78% (p=0.000 n=11+9)
BM_UFlat/7 [txt2 ] 739MB/s ± 0% 744MB/s ± 0% +0.77% (p=0.000 n=11+11)
BM_UFlat/8 [txt3 ] 839MB/s ± 0% 845MB/s ± 0% +0.71% (p=0.000 n=11+11)
BM_UFlat/9 [txt4 ] 678MB/s ± 0% 685MB/s ± 0% +1.01% (p=0.000 n=11+11)
BM_UFlat/10 [pb ] 3.08GB/s ± 0% 3.12GB/s ± 0% +1.21% (p=0.000 n=11+11)
BM_UFlat/11 [gaviota ] 975MB/s ± 0% 976MB/s ± 0% +0.11% (p=0.000 n=11+11)
BM_UFlat/12 [cp ] 1.73GB/s ± 1% 1.74GB/s ± 1% +0.46% (p=0.010 n=11+11)
BM_UFlat/13 [c ] 1.53GB/s ± 0% 1.53GB/s ± 0% ~ (p=0.987 n=11+10)
BM_UFlat/14 [lsp ] 1.65GB/s ± 0% 1.63GB/s ± 1% -1.04% (p=0.000 n=11+11)
BM_UFlat/15 [xls ] 1.08GB/s ± 0% 1.15GB/s ± 0% +6.12% (p=0.000 n=10+11)
BM_UFlat/16 [xls_200 ] 944MB/s ± 0% 920MB/s ± 3% -2.51% (p=0.000 n=9+11)
BM_UFlat/17 [bin ] 1.86GB/s ± 0% 1.87GB/s ± 0% +0.68% (p=0.000 n=10+11)
BM_UFlat/18 [bin_200 ] 1.91GB/s ± 3% 1.92GB/s ± 5% ~ (p=0.356 n=11+11)
BM_UFlat/19 [sum ] 1.31GB/s ± 0% 1.40GB/s ± 0% +6.53% (p=0.000 n=11+11)
BM_UFlat/20 [man ] 1.42GB/s ± 0% 1.42GB/s ± 0% +0.33% (p=0.000 n=10+10)
2019-01-04 19:07:49 -08:00
costan
ad82620f6f
Move pshufb_fill_patterns from snappy-internal.h to snappy.cc.
...
The array of constants is only used in the SSSE3 fast-path in IncrementalCopy.
2018-08-09 12:08:12 -07:00
atdt
8f469d97e2
Avoid store-forwarding stalls in Zippy's IncrementalCopy
...
NEW: Annotate `pattern` as initialized, for MSan.
Snappy's IncrementalCopy routine optimizes for speed by reading and writing
memory in blocks of eight or sixteen bytes. If the gap between the source
and destination pointers is smaller than eight bytes, snappy's strategy is
to expand the gap by issuing a series of partly-overlapping eight-byte
loads+stores. Because the range of each load partly overlaps that of the
store which preceded it, the store buffer cannot be forwarded to the load,
and the load stalls while it waits for the store to retire. This is called a
store-forwarding stall.
We can use fewer loads and avoid most of the stalls by loading the first
eight bytes into a 128-bit XMM register, then using PSHUFB to permute the
register's contents in-place into the desired repeating sequence of bytes.
When falling back to IncrementalCopySlow, use memset if the pattern size == 1.
This eliminates around 60% of the stalls.
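A scalar model of the pattern expansion, for illustration. ExpandPattern16 is a hypothetical name and this is not the IncrementalCopy implementation; the index i % pattern_size stands in for the contents of a PSHUFB control mask, and the one-byte case goes to memset as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scalar model only. Fills dst[0..16) with the first pattern_size bytes of
// src repeated, for 1 <= pattern_size < 8. src must have 8 readable bytes.
inline void ExpandPattern16(const uint8_t* src, std::size_t pattern_size,
                            uint8_t* dst) {
  assert(pattern_size >= 1 && pattern_size < 8);
  if (pattern_size == 1) {
    std::memset(dst, src[0], 16);  // a one-byte pattern is just a fill
    return;
  }
  uint8_t loaded[8];
  std::memcpy(loaded, src, 8);  // a single 8-byte load of the source
  for (std::size_t i = 0; i < 16; ++i) {
    // i % pattern_size is the value a shuffle control mask would hold.
    dst[i] = loaded[i % pattern_size];
  }
}
```

Because the repeated bytes come from one load rather than a chain of overlapping loads and stores, no load in this model has to wait on a partly overlapping earlier store.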
name old time/op new time/op delta
BM_UFlat/0 [html] 48.6µs ± 0% 48.2µs ± 0% -0.92% (p=0.000 n=19+18)
BM_UFlat/1 [urls] 589µs ± 0% 576µs ± 0% -2.17% (p=0.000 n=19+18)
BM_UFlat/2 [jpg] 7.12µs ± 0% 7.10µs ± 0% ~ (p=0.071 n=19+18)
BM_UFlat/3 [jpg_200] 162ns ± 0% 151ns ± 0% -7.06% (p=0.000 n=19+18)
BM_UFlat/4 [pdf] 8.25µs ± 0% 8.19µs ± 0% -0.74% (p=0.000 n=19+18)
BM_UFlat/5 [html4] 218µs ± 0% 218µs ± 0% +0.09% (p=0.000 n=17+18)
BM_UFlat/6 [txt1] 191µs ± 0% 189µs ± 0% -1.12% (p=0.000 n=19+18)
BM_UFlat/7 [txt2] 168µs ± 0% 167µs ± 0% -1.01% (p=0.000 n=19+18)
BM_UFlat/8 [txt3] 502µs ± 0% 499µs ± 0% -0.52% (p=0.000 n=19+18)
BM_UFlat/9 [txt4] 704µs ± 0% 695µs ± 0% -1.26% (p=0.000 n=19+18)
BM_UFlat/10 [pb] 45.6µs ± 0% 44.2µs ± 0% -3.13% (p=0.000 n=19+15)
BM_UFlat/11 [gaviota] 188µs ± 0% 194µs ± 0% +3.06% (p=0.000 n=15+18)
BM_UFlat/12 [cp] 15.1µs ± 2% 14.7µs ± 1% -2.09% (p=0.000 n=18+18)
BM_UFlat/13 [c] 7.38µs ± 0% 7.36µs ± 0% -0.28% (p=0.000 n=16+18)
BM_UFlat/14 [lsp] 2.31µs ± 0% 2.37µs ± 0% +2.64% (p=0.000 n=19+18)
BM_UFlat/15 [xls] 984µs ± 0% 909µs ± 0% -7.59% (p=0.000 n=19+18)
BM_UFlat/16 [xls_200] 215ns ± 0% 217ns ± 0% +0.71% (p=0.000 n=19+15)
BM_UFlat/17 [bin] 289µs ± 0% 287µs ± 0% -0.71% (p=0.000 n=19+18)
BM_UFlat/18 [bin_200] 161ns ± 0% 116ns ± 0% -28.09% (p=0.000 n=19+16)
BM_UFlat/19 [sum] 31.9µs ± 0% 29.2µs ± 0% -8.37% (p=0.000 n=19+18)
BM_UFlat/20 [man] 3.13µs ± 1% 3.07µs ± 0% -1.79% (p=0.000 n=19+18)
name old allocs/op new allocs/op delta
BM_UFlat/0 [html] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/1 [urls] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/2 [jpg] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/3 [jpg_200] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/4 [pdf] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/5 [html4] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/6 [txt1] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/7 [txt2] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/8 [txt3] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/9 [txt4] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/10 [pb] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/11 [gaviota] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/12 [cp] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/13 [c] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/14 [lsp] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/15 [xls] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/16 [xls_200] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/17 [bin] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/18 [bin_200] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/19 [sum] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/20 [man] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
name old speed new speed delta
BM_UFlat/0 [html] 2.11GB/s ± 0% 2.13GB/s ± 0% +0.92% (p=0.000 n=19+18)
BM_UFlat/1 [urls] 1.19GB/s ± 0% 1.22GB/s ± 0% +2.22% (p=0.000 n=16+17)
BM_UFlat/2 [jpg] 17.3GB/s ± 0% 17.3GB/s ± 0% ~ (p=0.074 n=19+18)
BM_UFlat/3 [jpg_200] 1.23GB/s ± 0% 1.33GB/s ± 0% +7.58% (p=0.000 n=19+18)
BM_UFlat/4 [pdf] 12.4GB/s ± 0% 12.5GB/s ± 0% +0.74% (p=0.000 n=19+18)
BM_UFlat/5 [html4] 1.88GB/s ± 0% 1.88GB/s ± 0% -0.09% (p=0.000 n=18+18)
BM_UFlat/6 [txt1] 798MB/s ± 0% 807MB/s ± 0% +1.13% (p=0.000 n=19+18)
BM_UFlat/7 [txt2] 743MB/s ± 0% 751MB/s ± 0% +1.02% (p=0.000 n=19+18)
BM_UFlat/8 [txt3] 850MB/s ± 0% 855MB/s ± 0% +0.52% (p=0.000 n=19+18)
BM_UFlat/9 [txt4] 684MB/s ± 0% 693MB/s ± 0% +1.28% (p=0.000 n=19+18)
BM_UFlat/10 [pb] 2.60GB/s ± 0% 2.69GB/s ± 0% +3.25% (p=0.000 n=19+16)
BM_UFlat/11 [gaviota] 979MB/s ± 0% 950MB/s ± 0% -2.97% (p=0.000 n=15+18)
BM_UFlat/12 [cp] 1.63GB/s ± 2% 1.67GB/s ± 1% +2.13% (p=0.000 n=18+18)
BM_UFlat/13 [c] 1.51GB/s ± 0% 1.52GB/s ± 0% +0.29% (p=0.000 n=16+18)
BM_UFlat/14 [lsp] 1.61GB/s ± 1% 1.57GB/s ± 0% -2.57% (p=0.000 n=19+18)
BM_UFlat/15 [xls] 1.05GB/s ± 0% 1.13GB/s ± 0% +8.22% (p=0.000 n=19+18)
BM_UFlat/16 [xls_200] 928MB/s ± 0% 921MB/s ± 0% -0.81% (p=0.000 n=19+17)
BM_UFlat/17 [bin] 1.78GB/s ± 0% 1.79GB/s ± 0% +0.71% (p=0.000 n=19+18)
BM_UFlat/18 [bin_200] 1.24GB/s ± 0% 1.72GB/s ± 0% +38.92% (p=0.000 n=19+18)
BM_UFlat/19 [sum] 1.20GB/s ± 0% 1.31GB/s ± 0% +9.15% (p=0.000 n=19+18)
BM_UFlat/20 [man] 1.35GB/s ± 1% 1.38GB/s ± 0% +1.84% (p=0.000 n=19+18)
2018-08-04 18:51:07 -07:00
costan
632cd0f128
Use 64-bit optimized code path for ARM64.
...
This is inspired by https://github.com/google/snappy/pull/22 .
Benchmark results with the change, Pixel C with Android N2G48B
Benchmark Time(ns) CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0 119544 119253 1501 818.9MB/s html
BM_UFlat/1 1223950 1208588 163 554.0MB/s urls
BM_UFlat/2 16081 15962 11527 7.2GB/s jpg
BM_UFlat/3 356 352 416666 540.6MB/s jpg_200
BM_UFlat/4 25010 24860 7683 3.8GB/s pdf
BM_UFlat/5 484832 481572 407 811.1MB/s html4
BM_UFlat/6 408410 408713 482 354.9MB/s txt1
BM_UFlat/7 361714 361663 553 330.1MB/s txt2
BM_UFlat/8 1090582 1087912 182 374.1MB/s txt3
BM_UFlat/9 1503127 1503759 133 305.6MB/s txt4
BM_UFlat/10 114183 114285 1715 989.6MB/s pb
BM_UFlat/11 406714 407331 491 431.5MB/s gaviota
BM_UIOVec/0 370397 369888 538 264.0MB/s html
BM_UIOVec/1 3207510 3190000 100 209.9MB/s urls
BM_UIOVec/2 16589 16573 11223 6.9GB/s jpg
BM_UIOVec/3 1052 1052 165289 181.2MB/s jpg_200
BM_UIOVec/4 49151 49184 3985 1.9GB/s pdf
BM_UValidate/0 68115 68095 2893 1.4GB/s html
BM_UValidate/1 792652 792000 250 845.4MB/s urls
BM_UValidate/2 334 334 487804 343.1GB/s jpg
BM_UValidate/3 235 235 666666 809.9MB/s jpg_200
BM_UValidate/4 6126 6130 32626 15.6GB/s pdf
BM_ZFlat/0 292697 290560 678 336.1MB/s html (22.31 %)
BM_ZFlat/1 4062080 4050000 100 165.3MB/s urls (47.78 %)
BM_ZFlat/2 29225 29274 6422 3.9GB/s jpg (99.95 %)
BM_ZFlat/3 1099 1098 163934 173.7MB/s jpg_200 (73.00 %)
BM_ZFlat/4 44117 44233 4205 2.2GB/s pdf (83.30 %)
BM_ZFlat/5 1158058 1157894 171 337.4MB/s html4 (22.52 %)
BM_ZFlat/6 1102983 1093922 181 132.6MB/s txt1 (57.88 %)
BM_ZFlat/7 974142 975490 204 122.4MB/s txt2 (61.91 %)
BM_ZFlat/8 2984670 2990000 100 136.1MB/s txt3 (54.99 %)
BM_ZFlat/9 4100130 4090000 100 112.4MB/s txt4 (66.26 %)
BM_ZFlat/10 276236 275139 716 411.0MB/s pb (19.68 %)
BM_ZFlat/11 760091 759541 262 231.4MB/s gaviota (37.72 %)
Baseline benchmark results, Pixel C with Android N2G48B
Benchmark Time(ns) CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0 148957 147565 1335 661.8MB/s html
BM_UFlat/1 1527257 1500000 132 446.4MB/s urls
BM_UFlat/2 19589 19397 8764 5.9GB/s jpg
BM_UFlat/3 425 418 408163 455.3MB/s jpg_200
BM_UFlat/4 30096 29552 6497 3.2GB/s pdf
BM_UFlat/5 595933 594594 333 657.0MB/s html4
BM_UFlat/6 516315 514360 383 282.0MB/s txt1
BM_UFlat/7 454653 453514 441 263.2MB/s txt2
BM_UFlat/8 1382687 1361111 144 299.0MB/s txt3
BM_UFlat/9 1967590 1904761 105 241.3MB/s txt4
BM_UFlat/10 148271 144560 1342 782.3MB/s pb
BM_UFlat/11 523997 510471 382 344.4MB/s gaviota
BM_UIOVec/0 478443 465227 417 209.9MB/s html
BM_UIOVec/1 4172860 4060000 100 164.9MB/s urls
BM_UIOVec/2 21470 20975 7342 5.5GB/s jpg
BM_UIOVec/3 1357 1330 75187 143.4MB/s jpg_200
BM_UIOVec/4 63143 61365 3031 1.6GB/s pdf
BM_UValidate/0 86910 85125 2279 1.1GB/s html
BM_UValidate/1 1022256 1000000 195 669.6MB/s urls
BM_UValidate/2 420 417 400000 274.6GB/s jpg
BM_UValidate/3 311 302 571428 630.0MB/s jpg_200
BM_UValidate/4 7778 7584 25445 12.6GB/s pdf
BM_ZFlat/0 469209 457547 424 213.4MB/s html (22.31 %)
BM_ZFlat/1 5633510 5460000 100 122.6MB/s urls (47.78 %)
BM_ZFlat/2 37896 36693 4524 3.1GB/s jpg (99.95 %)
BM_ZFlat/3 1485 1441 123456 132.3MB/s jpg_200 (73.00 %)
BM_ZFlat/4 74870 72775 2652 1.3GB/s pdf (83.30 %)
BM_ZFlat/5 1857321 1785714 112 218.8MB/s html4 (22.52 %)
BM_ZFlat/6 1538723 1492307 130 97.2MB/s txt1 (57.88 %)
BM_ZFlat/7 1338236 1310810 148 91.1MB/s txt2 (61.91 %)
BM_ZFlat/8 4050820 4040000 100 100.7MB/s txt3 (54.99 %)
BM_ZFlat/9 5234940 5230000 100 87.9MB/s txt4 (66.26 %)
BM_ZFlat/10 400309 400000 495 282.7MB/s pb (19.68 %)
BM_ZFlat/11 1063042 1058510 188 166.1MB/s gaviota (37.72 %)
2017-08-16 19:18:22 -07:00
jueminyang
71b8f86887
Add SNAPPY_ prefix to PREDICT_{TRUE,FALSE} macros.
2017-08-01 14:36:26 -07:00
costan
038a3329b1
Inline DISALLOW_COPY_AND_ASSIGN.
...
snappy-stubs-public.h defined the DISALLOW_COPY_AND_ASSIGN macro, so the
definition propagated to all translation units that included the open
source headers. The macro is now inlined, thus avoiding polluting the
macro environment of snappy users.
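A sketch of what inlining the macro amounts to, assuming C++11 deleted functions (the commit does not say which spelling it used). ExampleSink is a hypothetical class, not one from Snappy.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class. The copy operations are declared directly in the class
// body, so no helper macro has to be visible to users of the header.
class ExampleSink {
 public:
  ExampleSink() = default;
  ExampleSink(const ExampleSink&) = delete;
  ExampleSink& operator=(const ExampleSink&) = delete;
};

static_assert(!std::is_copy_constructible<ExampleSink>::value,
              "ExampleSink must not be copy-constructible");
static_assert(!std::is_copy_assignable<ExampleSink>::value,
              "ExampleSink must not be copy-assignable");
```

The class behaves the same as with the macro, and a user's own DISALLOW_COPY_AND_ASSIGN definition can no longer collide with one from the library headers.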
2017-07-27 16:46:42 -07:00
alkis
18488d6212
Use 64 bit little endian on ppc64le.
...
This has tangible performance benefits.
This lands https://github.com/google/snappy/pull/27
2017-06-28 18:33:13 -07:00
costan
ed3b7b242b
Clean up unused function warnings in snappy.
2017-03-17 13:59:03 -07:00
Behzad Nouri
818b583387
adds std:: to stl types ( #061 )
2017-01-26 21:43:13 +01:00
Geoff Pike
38a5ec5fca
Re-work fast path that emits copies in zippy compression.
...
The primary motivation for the change is that FindMatchLength is
likely to discover a difference in the first 8 bytes it compares.
If that occurs then we know the length of the match is less than 12,
because FindMatchLength is invoked after a 4-byte match is found.
When emitting a copy, it is useful to know that the length is less
than 12 because the two-byte variant of an emitted copy requires that.
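The bound can be sketched as follows, assuming a little-endian host and GCC or Clang for __builtin_ctzll. MatchLength8 and FitsShortCopy are hypothetical names and this is not Snappy's FindMatchLength.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Sketch only. Compares 8 bytes at s1 and s2 and returns how many leading
// bytes match (0..8). Assumes little-endian byte order.
inline std::size_t MatchLength8(const char* s1, const char* s2) {
  assert(s1 != nullptr && s2 != nullptr);
  uint64_t a, b;
  std::memcpy(&a, s1, sizeof(a));
  std::memcpy(&b, s2, sizeof(b));
  const uint64_t diff = a ^ b;
  if (diff == 0) return 8;
  // On little-endian, the lowest set bit lies in the first differing byte.
  return static_cast<std::size_t>(__builtin_ctzll(diff)) >> 3;
}

// After a 4-byte match, a mismatch within the next 8 bytes bounds the total
// match length below 12.
inline bool FitsShortCopy(const char* s1, const char* s2) {
  return 4 + MatchLength8(s1 + 4, s2 + 4) < 12;
}
```

When the first 8-byte comparison finds a difference, the total is at most 4 + 7 = 11, so the caller knows the length condition of the two-byte copy holds without a separate length check.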
This is a performance-tuning change that should not affect the
library's behavior.
With FDO on perflab/Haswell the geometric mean for ZFlat/* went from
47,290ns to 45,741ns, an improvement of 3.4%.
SAMPLE (before)
BM_ZFlat/0 102824 102650 40691 951.4MB/s html (22.31 %)
BM_ZFlat/1 1293512 1290442 3225 518.9MB/s urls (47.78 %)
BM_ZFlat/2 10373 10353 417959 11.1GB/s jpg (99.95 %)
BM_ZFlat/3 268 268 15745324 712.4MB/s jpg_200 (73.00 %)
BM_ZFlat/4 12137 12113 342462 7.9GB/s pdf (83.30 %)
BM_ZFlat/5 430672 429720 9724 909.0MB/s html4 (22.52 %)
BM_ZFlat/6 420541 419636 9833 345.6MB/s txt1 (57.88 %)
BM_ZFlat/7 373829 373158 10000 319.9MB/s txt2 (61.91 %)
BM_ZFlat/8 1119014 1116604 3755 364.5MB/s txt3 (54.99 %)
BM_ZFlat/9 1544203 1540657 2748 298.3MB/s txt4 (66.26 %)
BM_ZFlat/10 91041 90866 46002 1.2GB/s pb (19.68 %)
BM_ZFlat/11 332766 331990 10000 529.5MB/s gaviota (37.72 %)
BM_ZFlat/12 39960 39886 100000 588.3MB/s cp (48.12 %)
BM_ZFlat/13 14493 14465 287181 735.1MB/s c (42.47 %)
BM_ZFlat/14 4447 4440 947927 799.3MB/s lsp (48.37 %)
BM_ZFlat/15 1316362 1313350 3196 747.7MB/s xls (41.23 %)
BM_ZFlat/16 312 311 10000000 613.0MB/s xls_200 (78.00 %)
BM_ZFlat/17 388471 387502 10000 1.2GB/s bin (18.11 %)
BM_ZFlat/18 65 64 64838208 2.9GB/s bin_200 (7.50 %)
BM_ZFlat/19 65900 65787 63099 554.3MB/s sum (48.96 %)
BM_ZFlat/20 6188 6177 681951 652.6MB/s man (59.21 %)
SAMPLE (after)
Benchmark Time(ns) CPU(ns) Iterations
--------------------------------------------
BM_ZFlat/0 99259 99044 42428 986.0MB/s html (22.31 %)
BM_ZFlat/1 1257039 1255276 3341 533.4MB/s urls (47.78 %)
BM_ZFlat/2 10044 10030 405781 11.4GB/s jpg (99.95 %)
BM_ZFlat/3 268 267 15732282 713.3MB/s jpg_200 (73.00 %)
BM_ZFlat/4 11675 11657 358629 8.2GB/s pdf (83.30 %)
BM_ZFlat/5 420951 419818 9739 930.5MB/s html4 (22.52 %)
BM_ZFlat/6 415460 414632 10000 349.8MB/s txt1 (57.88 %)
BM_ZFlat/7 367191 366436 10000 325.8MB/s txt2 (61.91 %)
BM_ZFlat/8 1098345 1096036 3819 371.3MB/s txt3 (54.99 %)
BM_ZFlat/9 1508701 1505306 2758 305.3MB/s txt4 (66.26 %)
BM_ZFlat/10 87195 87031 47289 1.3GB/s pb (19.68 %)
BM_ZFlat/11 322338 321637 10000 546.5MB/s gaviota (37.72 %)
BM_ZFlat/12 36739 36668 100000 639.9MB/s cp (48.12 %)
BM_ZFlat/13 13646 13618 304009 780.9MB/s c (42.47 %)
BM_ZFlat/14 4249 4240 992456 837.0MB/s lsp (48.37 %)
BM_ZFlat/15 1262925 1260012 3314 779.4MB/s xls (41.23 %)
BM_ZFlat/16 308 308 10000000 619.8MB/s xls_200 (78.00 %)
BM_ZFlat/17 379750 378944 10000 1.3GB/s bin (18.11 %)
BM_ZFlat/18 62 62 67443280 3.0GB/s bin_200 (7.50 %)
BM_ZFlat/19 61706 61587 67645 592.1MB/s sum (48.96 %)
BM_ZFlat/20 5968 5958 698974 676.6MB/s man (59.21 %)
2017-01-26 21:39:39 +01:00
Steinar H. Gunderson
0852af7606
Move the logic from ComputeTable into the unit test, which means it's run
automatically together with the other tests, and also removes the stray
function ComputeTable() (which was never referenced by anything else
in the open-source version, causing compiler warnings for some)
out of the core library.
Fixes public issue 96.
A=sesse
R=sanjay
2015-08-19 11:37:51 +02:00
Steinar H. Gunderson
86eb8b152b
Change a few branch annotations that profiling found to be wrong.
...
Overall performance is neutral or slightly positive.
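For readers unfamiliar with branch annotations, a minimal sketch of the mechanism, assuming GCC or Clang for __builtin_expect. EXAMPLE_PREDICT_FALSE and SumNonNegative are hypothetical names; the sketch does not show which annotations the commit changed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro; other compilers get a plain pass-through.
#if defined(__GNUC__)
#define EXAMPLE_PREDICT_FALSE(x) (__builtin_expect(!!(x), 0))
#else
#define EXAMPLE_PREDICT_FALSE(x) (x)
#endif

// Sums the non-negative entries of v. Negative entries are assumed to be
// rare, so that branch is annotated as unlikely.
inline long SumNonNegative(const int* v, std::size_t n, bool* saw_negative) {
  assert(saw_negative != nullptr);
  long sum = 0;
  *saw_negative = false;
  for (std::size_t i = 0; i < n; ++i) {
    if (EXAMPLE_PREDICT_FALSE(v[i] < 0)) {
      *saw_negative = true;
      continue;
    }
    sum += v[i];
  }
  return sum;
}
```

The annotation only affects code layout and never the result, which is why a wrong hint shows up in profiles rather than in tests.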
Westmere (64-bit, opt):
Benchmark Base (ns) New (ns) Improvement
--------------------------------------------------------------------------------------
BM_UFlat/0 73798 71464 1.3GB/s html +3.3%
BM_UFlat/1 715223 704318 953.5MB/s urls +1.5%
BM_UFlat/2 8137 8871 13.0GB/s jpg -8.3%
BM_UFlat/3 200 204 935.5MB/s jpg_200 -2.0%
BM_UFlat/4 21627 21281 4.5GB/s pdf +1.6%
BM_UFlat/5 302806 290350 1.3GB/s html4 +4.3%
BM_UFlat/6 218920 219017 664.1MB/s txt1 -0.0%
BM_UFlat/7 190437 191212 626.1MB/s txt2 -0.4%
BM_UFlat/8 584192 580484 703.4MB/s txt3 +0.6%
BM_UFlat/9 776537 779055 591.6MB/s txt4 -0.3%
BM_UFlat/10 76056 72606 1.5GB/s pb +4.8%
BM_UFlat/11 235962 239043 737.4MB/s gaviota -1.3%
BM_UFlat/12 28049 28000 840.1MB/s cp +0.2%
BM_UFlat/13 12225 12021 886.9MB/s c +1.7%
BM_UFlat/14 3362 3544 1004.0MB/s lsp -5.1%
BM_UFlat/15 937015 939206 1048.9MB/s xls -0.2%
BM_UFlat/16 236 233 823.1MB/s xls_200 +1.3%
BM_UFlat/17 373170 361947 1.3GB/s bin +3.1%
BM_UFlat/18 264 264 725.5MB/s bin_200 +0.0%
BM_UFlat/19 42834 43577 839.2MB/s sum -1.7%
BM_UFlat/20 4770 4736 853.6MB/s man +0.7%
BM_UValidate/0 39671 39944 2.4GB/s html -0.7%
BM_UValidate/1 443391 443391 1.5GB/s urls +0.0%
BM_UValidate/2 163 163 703.3GB/s jpg +0.0%
BM_UValidate/3 113 112 1.7GB/s jpg_200 +0.9%
BM_UValidate/4 7555 7608 12.6GB/s pdf -0.7%
BM_ZFlat/0 157616 157568 621.5MB/s html (22.31 %) +0.0%
BM_ZFlat/1 1997290 2014486 333.4MB/s urls (47.77 %) -0.9%
BM_ZFlat/2 23035 22237 5.2GB/s jpg (99.95 %) +3.6%
BM_ZFlat/3 539 540 354.5MB/s jpg_200 (73.00 %) -0.2%
BM_ZFlat/4 80709 81369 1.2GB/s pdf (81.85 %) -0.8%
BM_ZFlat/5 639059 639220 613.0MB/s html4 (22.51 %) -0.0%
BM_ZFlat/6 577203 583370 249.3MB/s txt1 (57.87 %) -1.1%
BM_ZFlat/7 510887 516094 232.0MB/s txt2 (61.93 %) -1.0%
BM_ZFlat/8 1535843 1556973 262.2MB/s txt3 (54.92 %) -1.4%
BM_ZFlat/9 2070068 2102380 219.3MB/s txt4 (66.22 %) -1.5%
BM_ZFlat/10 152396 152148 745.5MB/s pb (19.64 %) +0.2%
BM_ZFlat/11 447367 445859 395.4MB/s gaviota (37.72 %) +0.3%
BM_ZFlat/12 76375 76797 306.3MB/s cp (48.12 %) -0.5%
BM_ZFlat/13 31518 31987 333.3MB/s c (42.40 %) -1.5%
BM_ZFlat/14 10598 10827 328.6MB/s lsp (48.37 %) -2.1%
BM_ZFlat/15 1782243 1802728 546.5MB/s xls (41.23 %) -1.1%
BM_ZFlat/16 526 539 355.0MB/s xls_200 (78.00 %) -2.4%
BM_ZFlat/17 598141 597311 822.1MB/s bin (18.11 %) +0.1%
BM_ZFlat/18 121 120 1.6GB/s bin_200 (7.50 %) +0.8%
BM_ZFlat/19 109981 112173 326.0MB/s sum (48.96 %) -2.0%
BM_ZFlat/20 14355 14575 277.4MB/s man (59.36 %) -1.5%
Sum of all benchmarks 33882722 33879325 +0.0%
Sandy Bridge (64-bit, opt):
Benchmark Base (ns) New (ns) Improvement
--------------------------------------------------------------------------------------
BM_UFlat/0 43764 41600 2.3GB/s html +5.2%
BM_UFlat/1 517990 507058 1.3GB/s urls +2.2%
BM_UFlat/2 6625 5529 20.8GB/s jpg +19.8%
BM_UFlat/3 154 155 1.2GB/s jpg_200 -0.6%
BM_UFlat/4 12795 11747 8.1GB/s pdf +8.9%
BM_UFlat/5 200335 193413 2.0GB/s html4 +3.6%
BM_UFlat/6 156574 156426 929.2MB/s txt1 +0.1%
BM_UFlat/7 137574 137464 870.4MB/s txt2 +0.1%
BM_UFlat/8 422551 421603 967.4MB/s txt3 +0.2%
BM_UFlat/9 577749 578985 795.6MB/s txt4 -0.2%
BM_UFlat/10 42329 39362 2.8GB/s pb +7.5%
BM_UFlat/11 170615 169751 1037.9MB/s gaviota +0.5%
BM_UFlat/12 12800 12719 1.8GB/s cp +0.6%
BM_UFlat/13 6585 6579 1.6GB/s c +0.1%
BM_UFlat/14 2066 2044 1.7GB/s lsp +1.1%
BM_UFlat/15 750861 746911 1.3GB/s xls +0.5%
BM_UFlat/16 188 192 996.0MB/s xls_200 -2.1%
BM_UFlat/17 271622 264333 1.8GB/s bin +2.8%
BM_UFlat/18 208 207 923.6MB/s bin_200 +0.5%
BM_UFlat/19 24667 24845 1.4GB/s sum -0.7%
BM_UFlat/20 2663 2662 1.5GB/s man +0.0%
BM_ZFlat/0 115173 115624 846.5MB/s html (22.31 %) -0.4%
BM_ZFlat/1 1530331 1537769 436.5MB/s urls (47.77 %) -0.5%
BM_ZFlat/2 17503 17013 6.8GB/s jpg (99.95 %) +2.9%
BM_ZFlat/3 385 385 496.3MB/s jpg_200 (73.00 %) +0.0%
BM_ZFlat/4 61753 61540 1.6GB/s pdf (81.85 %) +0.3%
BM_ZFlat/5 484806 483356 810.1MB/s html4 (22.51 %) +0.3%
BM_ZFlat/6 464143 467609 310.9MB/s txt1 (57.87 %) -0.7%
BM_ZFlat/7 410315 413319 289.5MB/s txt2 (61.93 %) -0.7%
BM_ZFlat/8 1244082 1249381 326.5MB/s txt3 (54.92 %) -0.4%
BM_ZFlat/9 1696914 1709685 269.4MB/s txt4 (66.22 %) -0.7%
BM_ZFlat/10 104148 103372 1096.7MB/s pb (19.64 %) +0.8%
BM_ZFlat/11 363522 359722 489.8MB/s gaviota (37.72 %) +1.1%
BM_ZFlat/12 47021 50095 469.3MB/s cp (48.12 %) -6.1%
BM_ZFlat/13 16888 16985 627.4MB/s c (42.40 %) -0.6%
BM_ZFlat/14 5496 5469 650.3MB/s lsp (48.37 %) +0.5%
BM_ZFlat/15 1460713 1448760 679.5MB/s xls (41.23 %) +0.8%
BM_ZFlat/16 387 393 486.8MB/s xls_200 (78.00 %) -1.5%
BM_ZFlat/17 457654 451462 1086.6MB/s bin (18.11 %) +1.4%
BM_ZFlat/18 97 87 2.1GB/s bin_200 (7.50 %) +11.5%
BM_ZFlat/19 77904 80924 451.7MB/s sum (48.96 %) -3.7%
BM_ZFlat/20 7648 7663 527.1MB/s man (59.36 %) -0.2%
Sum of all benchmarks 25493635 25482069 +0.0%
A=dehao
R=sesse
2015-06-22 16:09:56 +02:00
Steinar H. Gunderson
22acaf438e
Change some internal path names.
...
This is mostly to sync up with some changes from Google's internal
repositories; it does not affect the open-source distribution in itself.
2015-06-22 15:39:08 +02:00
snappy.mirrorbot@gmail.com
8b95464146
Snappy library no longer depends on iostream.
...
Achieved by moving logging macro definitions to a test-only
header file, and by changing non-test code to use assert,
fprintf, and abort instead of LOG/CHECK macros.
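A sketch of the replacement pattern using only <cassert>, <cstdio> and <cstdlib>. AllocateOrDie is a hypothetical function and is not taken from Snappy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical function showing checks that need no <iostream>.
inline char* AllocateOrDie(std::size_t n) {
  assert(n > 0);  // debug-only invariant, in place of a DCHECK-style macro
  char* p = static_cast<char*>(std::malloc(n));
  if (p == nullptr) {
    // Fatal error path, in place of a fatal LOG/CHECK macro.
    std::fprintf(stderr, "allocation of %zu bytes failed\n", n);
    std::abort();
  }
  return p;
}
```

The caller releases the buffer with std::free. Nothing in the function pulls in stream machinery, so non-test code written this way carries no iostream dependency.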
R=sesse
git-svn-id: https://snappy.googlecode.com/svn/trunk@62 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-05-22 09:32:50 +00:00
snappy.mirrorbot@gmail.com
f19fb07e6d
Put back the final few lines of what was truncated during the
license header change.
R=csilvers
DELTA=5 (4 added, 0 deleted, 1 changed)
Revision created by MOE tool push_codebase.
MOE_MIGRATION=1094
git-svn-id: https://snappy.googlecode.com/svn/trunk@22 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-03-28 22:17:04 +00:00
snappy.mirrorbot@gmail.com
7e8ca8f831
Change on 2011-03-25 19:18:00-07:00 by sesse
...
Replace the Apache 2.0 license header by the BSD-type license header;
somehow a lot of the files were missed in the last round.
R=dannyb,csilvers
DELTA=147 (74 added, 2 deleted, 71 changed)
Change on 2011-03-25 19:25:07-07:00 by sesse
Unbreak the build; the relicensing removed a bit too much (only comments
were intended, but I also accidentally removed some of the top lines of
the actual source).
Revision created by MOE tool push_codebase.
MOE_MIGRATION=1072
git-svn-id: https://snappy.googlecode.com/svn/trunk@21 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-03-26 02:34:34 +00:00
snappy.mirrorbot@gmail.com
28a6440239
Revision created by MOE tool push_codebase.
...
MOE_MIGRATION=
git-svn-id: https://snappy.googlecode.com/svn/trunk@2 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-03-18 17:14:15 +00:00