Commit Graph

31 Commits

Author SHA1 Message Date
Snappy Team 15e2a0e13d Add "cc" clobbers to inline asm that modifies flags.
As far as we know, the lack of "cc" in the clobbers hasn't caused
problems yet, but it could.  This change is to improve correctness,
and is also almost certainly performance neutral.

PiperOrigin-RevId: 487133620
2023-01-12 13:33:01 +00:00
Snappy Team 6a2b78a379 Optimize Zippy compression for ARM by 5-10% by choosing csel instructions
PiperOrigin-RevId: 444863689
2022-05-09 16:19:11 +00:00
Victor Costan 7062d7f1d8 Merge pull request #133 from JunHe77:simd
PiperOrigin-RevId: 393681630
2021-08-30 01:36:24 +00:00
Victor Costan cbb83a1d64 Migrate feature detection macro checks from #ifdef to #if.
The #if predicate evaluates to false if the macro is undefined, or
defined to 0. #ifdef (and its synonym #if defined) evaluates to false
only if the macro is undefined.

The new setup allows differentiating between setting a macro to 0 (to
express that the capability definitely does not exist / should not be
used) and leaving a macro undefined (to express not knowing whether a
capability exists / not caring if a capability is used).

PiperOrigin-RevId: 391094241
2021-08-16 18:26:33 +00:00
Jun He ab9a57280d Fix SSE3 and BMI2 compile error
After SHUFFLE code blocks are refactored, "tmmintrin.h"
is missed, and bmi2 code part will have build failure
as type conflicts.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I7800cd7e050f4d349e5a227206b14b9c566e547f
2021-08-12 15:45:41 +08:00
Snappy Team 9cc3689b21 Optimize memset to pure SIMD because compilers generate consistently bad code. clang for ARM and gcc for x86 https://gcc.godbolt.org/z/oxeGG7aEx
PiperOrigin-RevId: 383467656
2021-08-02 14:49:57 +00:00
atdt b3fb0b5b4b Enable vector byte shuffle optimizations on ARM NEON
The SSSE3 intrinsics we use have their direct analogues in NEON, so making this optimization portable requires a very thin translation layer.

PiperOrigin-RevId: 381280165
2021-07-05 01:05:44 +00:00
Snappy Team 289c8a3c0a Make zippy decompression branchless
PiperOrigin-RevId: 342423961
2020-11-18 23:21:38 +00:00
Chris Kennelly 7ffaf77cf4 Replace ARCH_K8 with __x86_64__.
PiperOrigin-RevId: 321389098
2020-10-07 21:12:27 +00:00
Victor Costan 231b8be076 Migrate to standard integral types.
The following changes are done via find/replace.
* int8 -> int8_t
* int16 -> int16_t
* int32 -> int32_t
* int64 -> int64_t

The aliases were removed from snappy-stubs-public.h.

PiperOrigin-RevId: 306141557
2020-04-12 20:10:03 +00:00
Snappy Team d674348a0c Improve zippy with 5-10%.
BM_ZCord/0        [html   ]            1.26GB/s ± 0%           1.35GB/s ± 0%   +7.90%          (p=0.008 n=5+5)
BM_ZCord/1        [urls   ]             535MB/s ± 0%            562MB/s ± 0%   +5.05%          (p=0.008 n=5+5)
BM_ZCord/2        [jpg    ]            10.2GB/s ± 1%           10.2GB/s ± 0%     ~             (p=0.310 n=5+5)
BM_ZCord/3        [jpg_200]             841MB/s ± 1%            846MB/s ± 1%     ~             (p=0.421 n=5+5)
BM_ZCord/4        [pdf    ]            6.77GB/s ± 1%           7.06GB/s ± 1%   +4.28%          (p=0.008 n=5+5)
BM_ZCord/5        [html4  ]            1.00GB/s ± 0%           1.08GB/s ± 0%   +7.94%          (p=0.008 n=5+5)
BM_ZCord/6        [txt1   ]             391MB/s ± 0%            417MB/s ± 0%   +6.71%          (p=0.008 n=5+5)
BM_ZCord/7        [txt2   ]             363MB/s ± 0%            388MB/s ± 0%   +6.73%          (p=0.016 n=5+4)
BM_ZCord/8        [txt3   ]             400MB/s ± 0%            426MB/s ± 0%   +6.55%          (p=0.008 n=5+5)
BM_ZCord/9        [txt4   ]             328MB/s ± 0%            350MB/s ± 0%   +6.66%          (p=0.008 n=5+5)
BM_ZCord/10       [pb     ]            1.67GB/s ± 1%           1.80GB/s ± 0%   +7.52%          (p=0.008 n=5+5)

1) A key bottleneck in the data dependency chain is figuring out how many bytes are matched and loading the data for next hash value. The load-to-use latency is 5 cycles, in previous cl/303353110 we removed the load in lieu of "shrd" to align previous loads. Unfortunately "shrd" itself has a latency of 4 cycles, we'd prefer "shrx" which takes 1 cycle for variable shifts.
2)Maximally use data already computed. The above trick calculates 5 bytes of useful data. So in case we need to search for new match we can use this for the first search (which is one byte further).

PiperOrigin-RevId: 303875535
2020-04-11 04:41:15 +00:00
Snappy Team 4dfcad9f4e assertion failure on darwin_x86_64, have to investigage
PiperOrigin-RevId: 303428229
2020-04-11 04:41:07 +00:00
Snappy Team e19178748f assertion failure on darwin_x86_64, have to investigage
PiperOrigin-RevId: 303346402
2020-04-11 04:40:57 +00:00
Snappy Team 0faf56378e This cl does two things
1) It shaves of a few cycles from the data dependency chain. By using "shrd" instead of a load.
2) The important loop is finding small copies (4-12) which are either "copy 1", or "copy 2" depending if the offset fits <2048. It turns out that this is a branch that is mispredicted often. Due to the long dependency chain the CPU is running with IPC~1 anyway so we can freely add instructions to instead emit copies branchfree. This reduces the branch misspredicts from 15% to 11% (for BM_ZFlat/6 txt1) and from 5.6% to 4% (for BM_ZFlat/10 or pb).

PiperOrigin-RevId: 303328967
2020-04-11 04:40:48 +00:00
alkis 53a38e5e33 Reduce number of allocations when compressing and simplify the code.
Before we were allocating at least once: twice with large table and
thrice when we used a scratch buffer. With this approach we always
allocate once.

  name                                          old speed               new speed               delta
  BM_UFlat/0      [html             ]           2.45GB/s ± 0%           2.45GB/s ± 0%   -0.13%        (p=0.000 n=11+11)
  BM_UFlat/1      [urls             ]           1.19GB/s ± 0%           1.22GB/s ± 0%   +2.48%        (p=0.000 n=11+11)
  BM_UFlat/2      [jpg              ]           17.2GB/s ± 2%           17.3GB/s ± 1%     ~           (p=0.193 n=11+11)
  BM_UFlat/3      [jpg_200          ]           1.52GB/s ± 0%           1.51GB/s ± 0%   -0.78%         (p=0.000 n=10+9)
  BM_UFlat/4      [pdf              ]           12.5GB/s ± 1%           12.5GB/s ± 1%     ~             (p=0.881 n=9+9)
  BM_UFlat/5      [html4            ]           1.86GB/s ± 0%           1.86GB/s ± 0%     ~           (p=0.123 n=11+11)
  BM_UFlat/6      [txt1             ]            793MB/s ± 0%            799MB/s ± 0%   +0.78%         (p=0.000 n=11+9)
  BM_UFlat/7      [txt2             ]            739MB/s ± 0%            744MB/s ± 0%   +0.77%        (p=0.000 n=11+11)
  BM_UFlat/8      [txt3             ]            839MB/s ± 0%            845MB/s ± 0%   +0.71%        (p=0.000 n=11+11)
  BM_UFlat/9      [txt4             ]            678MB/s ± 0%            685MB/s ± 0%   +1.01%        (p=0.000 n=11+11)
  BM_UFlat/10     [pb               ]           3.08GB/s ± 0%           3.12GB/s ± 0%   +1.21%        (p=0.000 n=11+11)
  BM_UFlat/11     [gaviota          ]            975MB/s ± 0%            976MB/s ± 0%   +0.11%        (p=0.000 n=11+11)
  BM_UFlat/12     [cp               ]           1.73GB/s ± 1%           1.74GB/s ± 1%   +0.46%        (p=0.010 n=11+11)
  BM_UFlat/13     [c                ]           1.53GB/s ± 0%           1.53GB/s ± 0%     ~           (p=0.987 n=11+10)
  BM_UFlat/14     [lsp              ]           1.65GB/s ± 0%           1.63GB/s ± 1%   -1.04%        (p=0.000 n=11+11)
  BM_UFlat/15     [xls              ]           1.08GB/s ± 0%           1.15GB/s ± 0%   +6.12%        (p=0.000 n=10+11)
  BM_UFlat/16     [xls_200          ]            944MB/s ± 0%            920MB/s ± 3%   -2.51%         (p=0.000 n=9+11)
  BM_UFlat/17     [bin              ]           1.86GB/s ± 0%           1.87GB/s ± 0%   +0.68%        (p=0.000 n=10+11)
  BM_UFlat/18     [bin_200          ]           1.91GB/s ± 3%           1.92GB/s ± 5%     ~           (p=0.356 n=11+11)
  BM_UFlat/19     [sum              ]           1.31GB/s ± 0%           1.40GB/s ± 0%   +6.53%        (p=0.000 n=11+11)
  BM_UFlat/20     [man              ]           1.42GB/s ± 0%           1.42GB/s ± 0%   +0.33%        (p=0.000 n=10+10)
2019-01-04 19:07:49 -08:00
costan ad82620f6f Move pshufb_fill_patterns from snappy-internal.h to snappy.cc.
The array of constants is only used in the SSSE3 fast-path in IncrementalCopy.
2018-08-09 12:08:12 -07:00
atdt 8f469d97e2 Avoid store-forwarding stalls in Zippy's IncrementalCopy
NEW: Annotate `pattern` as initialized, for MSan.

Snappy's IncrementalCopy routine optimizes for speed by reading and writing
memory in blocks of eight or sixteen bytes. If the gap between the source
and destination pointers is smaller than eight bytes, snappy's strategy is
to expand the gap by issuing a series of partly-overlapping eight-byte
loads+stores. Because the range of each load partly overlaps that of the
store which preceded it, the store buffer cannot be forwarded to the load,
and the load stalls while it waits for the store to retire. This is called a
store-forwarding stall.

We can use fewer loads and avoid most of the stalls by loading the first
eight bytes into an 128-bit XMM register, then using PSHUFB to permute the
register's contents in-place into the desired repeating sequence of bytes.
When falling back to IncrementalCopySlow, use memset if the pattern size == 1.
This eliminates around 60% of the stalls.

name                       old time/op    new time/op    delta
BM_UFlat/0 [html]        48.6µs ± 0%    48.2µs ± 0%   -0.92%        (p=0.000 n=19+18)
BM_UFlat/1 [urls]         589µs ± 0%     576µs ± 0%   -2.17%        (p=0.000 n=19+18)
BM_UFlat/2 [jpg]         7.12µs ± 0%    7.10µs ± 0%     ~           (p=0.071 n=19+18)
BM_UFlat/3 [jpg_200]      162ns ± 0%     151ns ± 0%   -7.06%        (p=0.000 n=19+18)
BM_UFlat/4 [pdf]         8.25µs ± 0%    8.19µs ± 0%   -0.74%        (p=0.000 n=19+18)
BM_UFlat/5 [html4]        218µs ± 0%     218µs ± 0%   +0.09%        (p=0.000 n=17+18)
BM_UFlat/6 [txt1]         191µs ± 0%     189µs ± 0%   -1.12%        (p=0.000 n=19+18)
BM_UFlat/7 [txt2]         168µs ± 0%     167µs ± 0%   -1.01%        (p=0.000 n=19+18)
BM_UFlat/8 [txt3]         502µs ± 0%     499µs ± 0%   -0.52%        (p=0.000 n=19+18)
BM_UFlat/9 [txt4]         704µs ± 0%     695µs ± 0%   -1.26%        (p=0.000 n=19+18)
BM_UFlat/10 [pb]         45.6µs ± 0%    44.2µs ± 0%   -3.13%        (p=0.000 n=19+15)
BM_UFlat/11 [gaviota]     188µs ± 0%     194µs ± 0%   +3.06%        (p=0.000 n=15+18)
BM_UFlat/12 [cp]         15.1µs ± 2%    14.7µs ± 1%   -2.09%        (p=0.000 n=18+18)
BM_UFlat/13 [c]          7.38µs ± 0%    7.36µs ± 0%   -0.28%        (p=0.000 n=16+18)
BM_UFlat/14 [lsp]        2.31µs ± 0%    2.37µs ± 0%   +2.64%        (p=0.000 n=19+18)
BM_UFlat/15 [xls]         984µs ± 0%     909µs ± 0%   -7.59%        (p=0.000 n=19+18)
BM_UFlat/16 [xls_200]     215ns ± 0%     217ns ± 0%   +0.71%        (p=0.000 n=19+15)
BM_UFlat/17 [bin]         289µs ± 0%     287µs ± 0%   -0.71%        (p=0.000 n=19+18)
BM_UFlat/18 [bin_200]     161ns ± 0%     116ns ± 0%  -28.09%        (p=0.000 n=19+16)
BM_UFlat/19 [sum]        31.9µs ± 0%    29.2µs ± 0%   -8.37%        (p=0.000 n=19+18)
BM_UFlat/20 [man]        3.13µs ± 1%    3.07µs ± 0%   -1.79%        (p=0.000 n=19+18)

name                       old allocs/op  new allocs/op  delta
BM_UFlat/0 [html]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/1 [urls]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/2 [jpg]          0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/3 [jpg_200]      0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/4 [pdf]          0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/5 [html4]        0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/6 [txt1]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/7 [txt2]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/8 [txt3]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/9 [txt4]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/10 [pb]          0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/11 [gaviota]     0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/12 [cp]          0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/13 [c]           0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/14 [lsp]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/15 [xls]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/16 [xls_200]     0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/17 [bin]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/18 [bin_200]     0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/19 [sum]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/20 [man]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)

name                       old speed      new speed      delta
BM_UFlat/0 [html]      2.11GB/s ± 0%  2.13GB/s ± 0%   +0.92%        (p=0.000 n=19+18)
BM_UFlat/1 [urls]      1.19GB/s ± 0%  1.22GB/s ± 0%   +2.22%        (p=0.000 n=16+17)
BM_UFlat/2 [jpg]       17.3GB/s ± 0%  17.3GB/s ± 0%     ~           (p=0.074 n=19+18)
BM_UFlat/3 [jpg_200]   1.23GB/s ± 0%  1.33GB/s ± 0%   +7.58%        (p=0.000 n=19+18)
BM_UFlat/4 [pdf]       12.4GB/s ± 0%  12.5GB/s ± 0%   +0.74%        (p=0.000 n=19+18)
BM_UFlat/5 [html4]     1.88GB/s ± 0%  1.88GB/s ± 0%   -0.09%        (p=0.000 n=18+18)
BM_UFlat/6 [txt1]       798MB/s ± 0%   807MB/s ± 0%   +1.13%        (p=0.000 n=19+18)
BM_UFlat/7 [txt2]       743MB/s ± 0%   751MB/s ± 0%   +1.02%        (p=0.000 n=19+18)
BM_UFlat/8 [txt3]       850MB/s ± 0%   855MB/s ± 0%   +0.52%        (p=0.000 n=19+18)
BM_UFlat/9 [txt4]       684MB/s ± 0%   693MB/s ± 0%   +1.28%        (p=0.000 n=19+18)
BM_UFlat/10 [pb]       2.60GB/s ± 0%  2.69GB/s ± 0%   +3.25%        (p=0.000 n=19+16)
BM_UFlat/11 [gaviota]   979MB/s ± 0%   950MB/s ± 0%   -2.97%        (p=0.000 n=15+18)
BM_UFlat/12 [cp]       1.63GB/s ± 2%  1.67GB/s ± 1%   +2.13%        (p=0.000 n=18+18)
BM_UFlat/13 [c]        1.51GB/s ± 0%  1.52GB/s ± 0%   +0.29%        (p=0.000 n=16+18)
BM_UFlat/14 [lsp]      1.61GB/s ± 1%  1.57GB/s ± 0%   -2.57%        (p=0.000 n=19+18)
BM_UFlat/15 [xls]      1.05GB/s ± 0%  1.13GB/s ± 0%   +8.22%        (p=0.000 n=19+18)
BM_UFlat/16 [xls_200]   928MB/s ± 0%   921MB/s ± 0%   -0.81%        (p=0.000 n=19+17)
BM_UFlat/17 [bin]      1.78GB/s ± 0%  1.79GB/s ± 0%   +0.71%        (p=0.000 n=19+18)
BM_UFlat/18 [bin_200]  1.24GB/s ± 0%  1.72GB/s ± 0%  +38.92%        (p=0.000 n=19+18)
BM_UFlat/19 [sum]      1.20GB/s ± 0%  1.31GB/s ± 0%   +9.15%        (p=0.000 n=19+18)
BM_UFlat/20 [man]      1.35GB/s ± 1%  1.38GB/s ± 0%   +1.84%        (p=0.000 n=19+18)
2018-08-04 18:51:07 -07:00
costan 632cd0f128 Use 64-bit optimized code path for ARM64.
This is inspired by https://github.com/google/snappy/pull/22.

Benchmark results with the change, Pixel C with Android N2G48B

Benchmark            Time(ns)    CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0             119544     119253       1501 818.9MB/s  html
BM_UFlat/1            1223950    1208588        163 554.0MB/s  urls
BM_UFlat/2              16081      15962      11527 7.2GB/s  jpg
BM_UFlat/3                356        352     416666 540.6MB/s  jpg_200
BM_UFlat/4              25010      24860       7683 3.8GB/s  pdf
BM_UFlat/5             484832     481572        407 811.1MB/s  html4
BM_UFlat/6             408410     408713        482 354.9MB/s  txt1
BM_UFlat/7             361714     361663        553 330.1MB/s  txt2
BM_UFlat/8            1090582    1087912        182 374.1MB/s  txt3
BM_UFlat/9            1503127    1503759        133 305.6MB/s  txt4
BM_UFlat/10            114183     114285       1715 989.6MB/s  pb
BM_UFlat/11            406714     407331        491 431.5MB/s  gaviota
BM_UIOVec/0            370397     369888        538 264.0MB/s  html
BM_UIOVec/1           3207510    3190000        100 209.9MB/s  urls
BM_UIOVec/2             16589      16573      11223 6.9GB/s  jpg
BM_UIOVec/3              1052       1052     165289 181.2MB/s  jpg_200
BM_UIOVec/4             49151      49184       3985 1.9GB/s  pdf
BM_UValidate/0          68115      68095       2893 1.4GB/s  html
BM_UValidate/1         792652     792000        250 845.4MB/s  urls
BM_UValidate/2            334        334     487804 343.1GB/s  jpg
BM_UValidate/3            235        235     666666 809.9MB/s  jpg_200
BM_UValidate/4           6126       6130      32626 15.6GB/s  pdf
BM_ZFlat/0             292697     290560        678 336.1MB/s  html (22.31 %)
BM_ZFlat/1            4062080    4050000        100 165.3MB/s  urls (47.78 %)
BM_ZFlat/2              29225      29274       6422 3.9GB/s  jpg (99.95 %)
BM_ZFlat/3               1099       1098     163934 173.7MB/s  jpg_200 (73.00 %)
BM_ZFlat/4              44117      44233       4205 2.2GB/s  pdf (83.30 %)
BM_ZFlat/5            1158058    1157894        171 337.4MB/s  html4 (22.52 %)
BM_ZFlat/6            1102983    1093922        181 132.6MB/s  txt1 (57.88 %)
BM_ZFlat/7             974142     975490        204 122.4MB/s  txt2 (61.91 %)
BM_ZFlat/8            2984670    2990000        100 136.1MB/s  txt3 (54.99 %)
BM_ZFlat/9            4100130    4090000        100 112.4MB/s  txt4 (66.26 %)
BM_ZFlat/10            276236     275139        716 411.0MB/s  pb (19.68 %)
BM_ZFlat/11            760091     759541        262 231.4MB/s  gaviota (37.72 %)

Baseline benchmark results, Pixel C with Android N2G48B

Benchmark            Time(ns)    CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0             148957     147565       1335 661.8MB/s  html
BM_UFlat/1            1527257    1500000        132 446.4MB/s  urls
BM_UFlat/2              19589      19397       8764 5.9GB/s  jpg
BM_UFlat/3                425        418     408163 455.3MB/s  jpg_200
BM_UFlat/4              30096      29552       6497 3.2GB/s  pdf
BM_UFlat/5             595933     594594        333 657.0MB/s  html4
BM_UFlat/6             516315     514360        383 282.0MB/s  txt1
BM_UFlat/7             454653     453514        441 263.2MB/s  txt2
BM_UFlat/8            1382687    1361111        144 299.0MB/s  txt3
BM_UFlat/9            1967590    1904761        105 241.3MB/s  txt4
BM_UFlat/10            148271     144560       1342 782.3MB/s  pb
BM_UFlat/11            523997     510471        382 344.4MB/s  gaviota
BM_UIOVec/0            478443     465227        417 209.9MB/s  html
BM_UIOVec/1           4172860    4060000        100 164.9MB/s  urls
BM_UIOVec/2             21470      20975       7342 5.5GB/s  jpg
BM_UIOVec/3              1357       1330      75187 143.4MB/s  jpg_200
BM_UIOVec/4             63143      61365       3031 1.6GB/s  pdf
BM_UValidate/0          86910      85125       2279 1.1GB/s  html
BM_UValidate/1        1022256    1000000        195 669.6MB/s  urls
BM_UValidate/2            420        417     400000 274.6GB/s  jpg
BM_UValidate/3            311        302     571428 630.0MB/s  jpg_200
BM_UValidate/4           7778       7584      25445 12.6GB/s  pdf
BM_ZFlat/0             469209     457547        424 213.4MB/s  html (22.31 %)
BM_ZFlat/1            5633510    5460000        100 122.6MB/s  urls (47.78 %)
BM_ZFlat/2              37896      36693       4524 3.1GB/s  jpg (99.95 %)
BM_ZFlat/3               1485       1441     123456 132.3MB/s  jpg_200 (73.00 %)
BM_ZFlat/4              74870      72775       2652 1.3GB/s  pdf (83.30 %)
BM_ZFlat/5            1857321    1785714        112 218.8MB/s  html4 (22.52 %)
BM_ZFlat/6            1538723    1492307        130 97.2MB/s  txt1 (57.88 %)
BM_ZFlat/7            1338236    1310810        148 91.1MB/s  txt2 (61.91 %)
BM_ZFlat/8            4050820    4040000        100 100.7MB/s  txt3 (54.99 %)
BM_ZFlat/9            5234940    5230000        100 87.9MB/s  txt4 (66.26 %)
BM_ZFlat/10            400309     400000        495 282.7MB/s  pb (19.68 %)
BM_ZFlat/11           1063042    1058510        188 166.1MB/s  gaviota (37.72 %)
2017-08-16 19:18:22 -07:00
jueminyang 71b8f86887 Add SNAPPY_ prefix to PREDICT_{TRUE,FALSE} macros. 2017-08-01 14:36:26 -07:00
costan 038a3329b1 Inline DISALLOW_COPY_AND_ASSIGN.
snappy-stubs-public.h defined the DISALLOW_COPY_AND_ASSIGN macro, so the
definition propagated to all translation units that included the open
source headers. The macro is now inlined, thus avoiding polluting the
macro environment of snappy users.
2017-07-27 16:46:42 -07:00
alkis 18488d6212 Use 64 bit little endian on ppc64le.
This has tangible performance benefits.

This lands https://github.com/google/snappy/pull/27
2017-06-28 18:33:13 -07:00
costan ed3b7b242b Clean up unused function warnings in snappy. 2017-03-17 13:59:03 -07:00
Behzad Nouri 818b583387 adds std:: to stl types (#061) 2017-01-26 21:43:13 +01:00
Geoff Pike 38a5ec5fca Re-work fast path that emits copies in zippy compression.
The primary motivation for the change is that FindMatchLength is
likely to discover a difference in the first 8 bytes it compares.
If that occurs then we know the length of the match is less than 12,
because FindMatchLength is invoked after a 4-byte match is found.
When emitting a copy, it is useful to know that the length is less
than 12 because the two-byte variant of an emitted copy requires that.

This is a performance-tuning change that should not affect the
library's behavior.

With FDO on perflab/Haswell the geometric mean for ZFlat/* went from
47,290ns to 45,741ns, an improvement of 3.4%.

SAMPLE (before)

BM_ZFlat/0      102824     102650      40691 951.4MB/s  html (22.31 %)
BM_ZFlat/1     1293512    1290442       3225 518.9MB/s  urls (47.78 %)
BM_ZFlat/2       10373      10353     417959 11.1GB/s  jpg (99.95 %)
BM_ZFlat/3         268        268   15745324 712.4MB/s  jpg_200 (73.00 %)
BM_ZFlat/4       12137      12113     342462 7.9GB/s  pdf (83.30 %)
BM_ZFlat/5      430672     429720       9724 909.0MB/s  html4 (22.52 %)
BM_ZFlat/6      420541     419636       9833 345.6MB/s  txt1 (57.88 %)
BM_ZFlat/7      373829     373158      10000 319.9MB/s  txt2 (61.91 %)
BM_ZFlat/8     1119014    1116604       3755 364.5MB/s  txt3 (54.99 %)
BM_ZFlat/9     1544203    1540657       2748 298.3MB/s  txt4 (66.26 %)
BM_ZFlat/10      91041      90866      46002 1.2GB/s  pb (19.68 %)
BM_ZFlat/11     332766     331990      10000 529.5MB/s  gaviota (37.72 %)
BM_ZFlat/12      39960      39886     100000 588.3MB/s  cp (48.12 %)
BM_ZFlat/13      14493      14465     287181 735.1MB/s  c (42.47 %)
BM_ZFlat/14       4447       4440     947927 799.3MB/s  lsp (48.37 %)
BM_ZFlat/15    1316362    1313350       3196 747.7MB/s  xls (41.23 %)
BM_ZFlat/16        312        311   10000000 613.0MB/s  xls_200 (78.00 %)
BM_ZFlat/17     388471     387502      10000 1.2GB/s  bin (18.11 %)
BM_ZFlat/18         65         64   64838208 2.9GB/s  bin_200 (7.50 %)
BM_ZFlat/19      65900      65787      63099 554.3MB/s  sum (48.96 %)
BM_ZFlat/20       6188       6177     681951 652.6MB/s  man (59.21 %)

SAMPLE (after)

Benchmark     Time(ns)    CPU(ns) Iterations
--------------------------------------------
BM_ZFlat/0       99259      99044      42428 986.0MB/s  html (22.31 %)
BM_ZFlat/1     1257039    1255276       3341 533.4MB/s  urls (47.78 %)
BM_ZFlat/2       10044      10030     405781 11.4GB/s  jpg (99.95 %)
BM_ZFlat/3         268        267   15732282 713.3MB/s  jpg_200 (73.00 %)
BM_ZFlat/4       11675      11657     358629 8.2GB/s  pdf (83.30 %)
BM_ZFlat/5      420951     419818       9739 930.5MB/s  html4 (22.52 %)
BM_ZFlat/6      415460     414632      10000 349.8MB/s  txt1 (57.88 %)
BM_ZFlat/7      367191     366436      10000 325.8MB/s  txt2 (61.91 %)
BM_ZFlat/8     1098345    1096036       3819 371.3MB/s  txt3 (54.99 %)
BM_ZFlat/9     1508701    1505306       2758 305.3MB/s  txt4 (66.26 %)
BM_ZFlat/10      87195      87031      47289 1.3GB/s  pb (19.68 %)
BM_ZFlat/11     322338     321637      10000 546.5MB/s  gaviota (37.72 %)
BM_ZFlat/12      36739      36668     100000 639.9MB/s  cp (48.12 %)
BM_ZFlat/13      13646      13618     304009 780.9MB/s  c (42.47 %)
BM_ZFlat/14       4249       4240     992456 837.0MB/s  lsp (48.37 %)
BM_ZFlat/15    1262925    1260012       3314 779.4MB/s  xls (41.23 %)
BM_ZFlat/16        308        308   10000000 619.8MB/s  xls_200 (78.00 %)
BM_ZFlat/17     379750     378944      10000 1.3GB/s  bin (18.11 %)
BM_ZFlat/18         62         62   67443280 3.0GB/s  bin_200 (7.50 %)
BM_ZFlat/19      61706      61587      67645 592.1MB/s  sum (48.96 %)
BM_ZFlat/20       5968       5958     698974 676.6MB/s  man (59.21 %)
2017-01-26 21:39:39 +01:00
Steinar H. Gunderson 0852af7606 Move the logic from ComputeTable into the unit test, which means it's run
automatically together with the other tests, and also removes the stray
function ComputeTable() (which was never referenced by anything else
in the open-source version, causing compiler warnings for some)
out of the core library.

Fixes public issue 96.

A=sesse
R=sanjay
2015-08-19 11:37:51 +02:00
Steinar H. Gunderson 86eb8b152b Change a few branch annotations that profiling found to be wrong.
Overall performance is neutral or slightly positive.

Westmere (64-bit, opt):

Benchmark               Base (ns)  New (ns)                                Improvement
--------------------------------------------------------------------------------------
BM_UFlat/0                  73798     71464  1.3GB/s  html                    +3.3%
BM_UFlat/1                 715223    704318  953.5MB/s  urls                  +1.5%
BM_UFlat/2                   8137      8871  13.0GB/s  jpg                    -8.3%
BM_UFlat/3                    200       204  935.5MB/s  jpg_200               -2.0%
BM_UFlat/4                  21627     21281  4.5GB/s  pdf                     +1.6%
BM_UFlat/5                 302806    290350  1.3GB/s  html4                   +4.3%
BM_UFlat/6                 218920    219017  664.1MB/s  txt1                  -0.0%
BM_UFlat/7                 190437    191212  626.1MB/s  txt2                  -0.4%
BM_UFlat/8                 584192    580484  703.4MB/s  txt3                  +0.6%
BM_UFlat/9                 776537    779055  591.6MB/s  txt4                  -0.3%
BM_UFlat/10                 76056     72606  1.5GB/s  pb                      +4.8%
BM_UFlat/11                235962    239043  737.4MB/s  gaviota               -1.3%
BM_UFlat/12                 28049     28000  840.1MB/s  cp                    +0.2%
BM_UFlat/13                 12225     12021  886.9MB/s  c                     +1.7%
BM_UFlat/14                  3362      3544  1004.0MB/s  lsp                  -5.1%
BM_UFlat/15                937015    939206  1048.9MB/s  xls                  -0.2%
BM_UFlat/16                   236       233  823.1MB/s  xls_200               +1.3%
BM_UFlat/17                373170    361947  1.3GB/s  bin                     +3.1%
BM_UFlat/18                   264       264  725.5MB/s  bin_200               +0.0%
BM_UFlat/19                 42834     43577  839.2MB/s  sum                   -1.7%
BM_UFlat/20                  4770      4736  853.6MB/s  man                   +0.7%
BM_UValidate/0              39671     39944  2.4GB/s  html                    -0.7%
BM_UValidate/1             443391    443391  1.5GB/s  urls                    +0.0%
BM_UValidate/2                163       163  703.3GB/s  jpg                   +0.0%
BM_UValidate/3                113       112  1.7GB/s  jpg_200                 +0.9%
BM_UValidate/4               7555      7608  12.6GB/s  pdf                    -0.7%
BM_ZFlat/0                 157616    157568  621.5MB/s  html (22.31 %)        +0.0%
BM_ZFlat/1                1997290   2014486  333.4MB/s  urls (47.77 %)        -0.9%
BM_ZFlat/2                  23035     22237  5.2GB/s  jpg (99.95 %)           +3.6%
BM_ZFlat/3                    539       540  354.5MB/s  jpg_200 (73.00 %)     -0.2%
BM_ZFlat/4                  80709     81369  1.2GB/s  pdf (81.85 %)           -0.8%
BM_ZFlat/5                 639059    639220  613.0MB/s  html4 (22.51 %)       -0.0%
BM_ZFlat/6                 577203    583370  249.3MB/s  txt1 (57.87 %)        -1.1%
BM_ZFlat/7                 510887    516094  232.0MB/s  txt2 (61.93 %)        -1.0%
BM_ZFlat/8                1535843   1556973  262.2MB/s  txt3 (54.92 %)        -1.4%
BM_ZFlat/9                2070068   2102380  219.3MB/s  txt4 (66.22 %)        -1.5%
BM_ZFlat/10                152396    152148  745.5MB/s  pb (19.64 %)          +0.2%
BM_ZFlat/11                447367    445859  395.4MB/s  gaviota (37.72 %)     +0.3%
BM_ZFlat/12                 76375     76797  306.3MB/s  cp (48.12 %)          -0.5%
BM_ZFlat/13                 31518     31987  333.3MB/s  c (42.40 %)           -1.5%
BM_ZFlat/14                 10598     10827  328.6MB/s  lsp (48.37 %)         -2.1%
BM_ZFlat/15               1782243   1802728  546.5MB/s  xls (41.23 %)         -1.1%
BM_ZFlat/16                   526       539  355.0MB/s  xls_200 (78.00 %)     -2.4%
BM_ZFlat/17                598141    597311  822.1MB/s  bin (18.11 %)         +0.1%
BM_ZFlat/18                   121       120  1.6GB/s  bin_200 (7.50 %)        +0.8%
BM_ZFlat/19                109981    112173  326.0MB/s  sum (48.96 %)         -2.0%
BM_ZFlat/20                 14355     14575  277.4MB/s  man (59.36 %)         -1.5%
Sum of all benchmarks    33882722  33879325                                   +0.0%

Sandy Bridge (64-bit, opt):

Benchmark               Base (ns)  New (ns)                                Improvement
--------------------------------------------------------------------------------------
BM_UFlat/0                  43764     41600  2.3GB/s  html                    +5.2%
BM_UFlat/1                 517990    507058  1.3GB/s  urls                    +2.2%
BM_UFlat/2                   6625      5529  20.8GB/s  jpg                   +19.8%
BM_UFlat/3                    154       155  1.2GB/s  jpg_200                 -0.6%
BM_UFlat/4                  12795     11747  8.1GB/s  pdf                     +8.9%
BM_UFlat/5                 200335    193413  2.0GB/s  html4                   +3.6%
BM_UFlat/6                 156574    156426  929.2MB/s  txt1                  +0.1%
BM_UFlat/7                 137574    137464  870.4MB/s  txt2                  +0.1%
BM_UFlat/8                 422551    421603  967.4MB/s  txt3                  +0.2%
BM_UFlat/9                 577749    578985  795.6MB/s  txt4                  -0.2%
BM_UFlat/10                 42329     39362  2.8GB/s  pb                      +7.5%
BM_UFlat/11                170615    169751  1037.9MB/s  gaviota              +0.5%
BM_UFlat/12                 12800     12719  1.8GB/s  cp                      +0.6%
BM_UFlat/13                  6585      6579  1.6GB/s  c                       +0.1%
BM_UFlat/14                  2066      2044  1.7GB/s  lsp                     +1.1%
BM_UFlat/15                750861    746911  1.3GB/s  xls                     +0.5%
BM_UFlat/16                   188       192  996.0MB/s  xls_200               -2.1%
BM_UFlat/17                271622    264333  1.8GB/s  bin                     +2.8%
BM_UFlat/18                   208       207  923.6MB/s  bin_200               +0.5%
BM_UFlat/19                 24667     24845  1.4GB/s  sum                     -0.7%
BM_UFlat/20                  2663      2662  1.5GB/s  man                     +0.0%
BM_ZFlat/0                 115173    115624  846.5MB/s  html (22.31 %)        -0.4%
BM_ZFlat/1                1530331   1537769  436.5MB/s  urls (47.77 %)        -0.5%
BM_ZFlat/2                  17503     17013  6.8GB/s  jpg (99.95 %)           +2.9%
BM_ZFlat/3                    385       385  496.3MB/s  jpg_200 (73.00 %)     +0.0%
BM_ZFlat/4                  61753     61540  1.6GB/s  pdf (81.85 %)           +0.3%
BM_ZFlat/5                 484806    483356  810.1MB/s  html4 (22.51 %)       +0.3%
BM_ZFlat/6                 464143    467609  310.9MB/s  txt1 (57.87 %)        -0.7%
BM_ZFlat/7                 410315    413319  289.5MB/s  txt2 (61.93 %)        -0.7%
BM_ZFlat/8                1244082   1249381  326.5MB/s  txt3 (54.92 %)        -0.4%
BM_ZFlat/9                1696914   1709685  269.4MB/s  txt4 (66.22 %)        -0.7%
BM_ZFlat/10                104148    103372  1096.7MB/s  pb (19.64 %)         +0.8%
BM_ZFlat/11                363522    359722  489.8MB/s  gaviota (37.72 %)     +1.1%
BM_ZFlat/12                 47021     50095  469.3MB/s  cp (48.12 %)          -6.1%
BM_ZFlat/13                 16888     16985  627.4MB/s  c (42.40 %)           -0.6%
BM_ZFlat/14                  5496      5469  650.3MB/s  lsp (48.37 %)         +0.5%
BM_ZFlat/15               1460713   1448760  679.5MB/s  xls (41.23 %)         +0.8%
BM_ZFlat/16                   387       393  486.8MB/s  xls_200 (78.00 %)     -1.5%
BM_ZFlat/17                457654    451462  1086.6MB/s  bin (18.11 %)        +1.4%
BM_ZFlat/18                    97        87  2.1GB/s  bin_200 (7.50 %)       +11.5%
BM_ZFlat/19                 77904     80924  451.7MB/s  sum (48.96 %)         -3.7%
BM_ZFlat/20                  7648      7663  527.1MB/s  man (59.36 %)         -0.2%
Sum of all benchmarks    25493635  25482069                                   +0.0%

A=dehao
R=sesse
2015-06-22 16:09:56 +02:00
Steinar H. Gunderson 22acaf438e Change some internal path names.
This is mostly to sync up with some changes from Google's internal
repositories; it does not affect the open-source distribution in itself.
2015-06-22 15:39:08 +02:00
snappy.mirrorbot@gmail.com 8b95464146 Snappy library no longer depends on iostream.
Achieved by moving logging macro definitions to a test-only
header file, and by changing non-test code to use assert,
fprintf, and abort instead of LOG/CHECK macros.

R=sesse


git-svn-id: https://snappy.googlecode.com/svn/trunk@62 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-05-22 09:32:50 +00:00
snappy.mirrorbot@gmail.com f19fb07e6d Put back the final few lines of what was truncated during the
license header change.

R=csilvers
DELTA=5  (4 added, 0 deleted, 1 changed)


Revision created by MOE tool push_codebase.
MOE_MIGRATION=1094


git-svn-id: https://snappy.googlecode.com/svn/trunk@22 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-03-28 22:17:04 +00:00
snappy.mirrorbot@gmail.com 7e8ca8f831 Change on 2011-03-25 19:18:00-07:00 by sesse
Replace the Apache 2.0 license header by the BSD-type license header;
	somehow a lot of the files were missed in the last round.

	R=dannyb,csilvers
	DELTA=147  (74 added, 2 deleted, 71 changed)

Change on 2011-03-25 19:25:07-07:00 by sesse

	Unbreak the build; the relicensing removed a bit too much (only comments
	were intended, but I also accidentially removed some of the top lines of
	the actual source).



Revision created by MOE tool push_codebase.
MOE_MIGRATION=1072


git-svn-id: https://snappy.googlecode.com/svn/trunk@21 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-03-26 02:34:34 +00:00
snappy.mirrorbot@gmail.com 28a6440239 Revision created by MOE tool push_codebase.
MOE_MIGRATION=


git-svn-id: https://snappy.googlecode.com/svn/trunk@2 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-03-18 17:14:15 +00:00