Commit Graph

22 Commits

Author SHA1 Message Date
costan 632cd0f128 Use 64-bit optimized code path for ARM64.
This is inspired by https://github.com/google/snappy/pull/22.

Benchmark results with the change, Pixel C with Android N2G48B

Benchmark            Time(ns)    CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0             119544     119253       1501 818.9MB/s  html
BM_UFlat/1            1223950    1208588        163 554.0MB/s  urls
BM_UFlat/2              16081      15962      11527 7.2GB/s  jpg
BM_UFlat/3                356        352     416666 540.6MB/s  jpg_200
BM_UFlat/4              25010      24860       7683 3.8GB/s  pdf
BM_UFlat/5             484832     481572        407 811.1MB/s  html4
BM_UFlat/6             408410     408713        482 354.9MB/s  txt1
BM_UFlat/7             361714     361663        553 330.1MB/s  txt2
BM_UFlat/8            1090582    1087912        182 374.1MB/s  txt3
BM_UFlat/9            1503127    1503759        133 305.6MB/s  txt4
BM_UFlat/10            114183     114285       1715 989.6MB/s  pb
BM_UFlat/11            406714     407331        491 431.5MB/s  gaviota
BM_UIOVec/0            370397     369888        538 264.0MB/s  html
BM_UIOVec/1           3207510    3190000        100 209.9MB/s  urls
BM_UIOVec/2             16589      16573      11223 6.9GB/s  jpg
BM_UIOVec/3              1052       1052     165289 181.2MB/s  jpg_200
BM_UIOVec/4             49151      49184       3985 1.9GB/s  pdf
BM_UValidate/0          68115      68095       2893 1.4GB/s  html
BM_UValidate/1         792652     792000        250 845.4MB/s  urls
BM_UValidate/2            334        334     487804 343.1GB/s  jpg
BM_UValidate/3            235        235     666666 809.9MB/s  jpg_200
BM_UValidate/4           6126       6130      32626 15.6GB/s  pdf
BM_ZFlat/0             292697     290560        678 336.1MB/s  html (22.31 %)
BM_ZFlat/1            4062080    4050000        100 165.3MB/s  urls (47.78 %)
BM_ZFlat/2              29225      29274       6422 3.9GB/s  jpg (99.95 %)
BM_ZFlat/3               1099       1098     163934 173.7MB/s  jpg_200 (73.00 %)
BM_ZFlat/4              44117      44233       4205 2.2GB/s  pdf (83.30 %)
BM_ZFlat/5            1158058    1157894        171 337.4MB/s  html4 (22.52 %)
BM_ZFlat/6            1102983    1093922        181 132.6MB/s  txt1 (57.88 %)
BM_ZFlat/7             974142     975490        204 122.4MB/s  txt2 (61.91 %)
BM_ZFlat/8            2984670    2990000        100 136.1MB/s  txt3 (54.99 %)
BM_ZFlat/9            4100130    4090000        100 112.4MB/s  txt4 (66.26 %)
BM_ZFlat/10            276236     275139        716 411.0MB/s  pb (19.68 %)
BM_ZFlat/11            760091     759541        262 231.4MB/s  gaviota (37.72 %)

Baseline benchmark results, Pixel C with Android N2G48B

Benchmark            Time(ns)    CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0             148957     147565       1335 661.8MB/s  html
BM_UFlat/1            1527257    1500000        132 446.4MB/s  urls
BM_UFlat/2              19589      19397       8764 5.9GB/s  jpg
BM_UFlat/3                425        418     408163 455.3MB/s  jpg_200
BM_UFlat/4              30096      29552       6497 3.2GB/s  pdf
BM_UFlat/5             595933     594594        333 657.0MB/s  html4
BM_UFlat/6             516315     514360        383 282.0MB/s  txt1
BM_UFlat/7             454653     453514        441 263.2MB/s  txt2
BM_UFlat/8            1382687    1361111        144 299.0MB/s  txt3
BM_UFlat/9            1967590    1904761        105 241.3MB/s  txt4
BM_UFlat/10            148271     144560       1342 782.3MB/s  pb
BM_UFlat/11            523997     510471        382 344.4MB/s  gaviota
BM_UIOVec/0            478443     465227        417 209.9MB/s  html
BM_UIOVec/1           4172860    4060000        100 164.9MB/s  urls
BM_UIOVec/2             21470      20975       7342 5.5GB/s  jpg
BM_UIOVec/3              1357       1330      75187 143.4MB/s  jpg_200
BM_UIOVec/4             63143      61365       3031 1.6GB/s  pdf
BM_UValidate/0          86910      85125       2279 1.1GB/s  html
BM_UValidate/1        1022256    1000000        195 669.6MB/s  urls
BM_UValidate/2            420        417     400000 274.6GB/s  jpg
BM_UValidate/3            311        302     571428 630.0MB/s  jpg_200
BM_UValidate/4           7778       7584      25445 12.6GB/s  pdf
BM_ZFlat/0             469209     457547        424 213.4MB/s  html (22.31 %)
BM_ZFlat/1            5633510    5460000        100 122.6MB/s  urls (47.78 %)
BM_ZFlat/2              37896      36693       4524 3.1GB/s  jpg (99.95 %)
BM_ZFlat/3               1485       1441     123456 132.3MB/s  jpg_200 (73.00 %)
BM_ZFlat/4              74870      72775       2652 1.3GB/s  pdf (83.30 %)
BM_ZFlat/5            1857321    1785714        112 218.8MB/s  html4 (22.52 %)
BM_ZFlat/6            1538723    1492307        130 97.2MB/s  txt1 (57.88 %)
BM_ZFlat/7            1338236    1310810        148 91.1MB/s  txt2 (61.91 %)
BM_ZFlat/8            4050820    4040000        100 100.7MB/s  txt3 (54.99 %)
BM_ZFlat/9            5234940    5230000        100 87.9MB/s  txt4 (66.26 %)
BM_ZFlat/10            400309     400000        495 282.7MB/s  pb (19.68 %)
BM_ZFlat/11           1063042    1058510        188 166.1MB/s  gaviota (37.72 %)
2017-08-16 19:18:22 -07:00
costan 77c12adc19 Add unistd.h checks back to the CMake build.
getpagesize(), as well as its POSIX.2001 replacement
sysconf(_SC_PAGESIZE), is defined in <unistd.h>. On Linux and OS X,
including <sys/mman.h> is sufficient to get a definition for
getpagesize(). However, this is not true for the Android NDK. This CL
brings back the HAVE_UNISTD_H definition and its associated header
check.

This also adds a HAVE_FUNC_SYSCONF definition, which checks for the
presence of sysconf(). The definition can be used later to replace
getpagesize() with sysconf().
2017-08-02 10:56:06 -07:00
costan f0d3237c32 Use _BitScanForward and _BitScanReverse on MSVC.
Based on https://github.com/google/snappy/pull/30
2017-08-01 14:38:02 -07:00
jueminyang 71b8f86887 Add SNAPPY_ prefix to PREDICT_{TRUE,FALSE} macros. 2017-08-01 14:36:26 -07:00
costan 038a3329b1 Inline DISALLOW_COPY_AND_ASSIGN.
snappy-stubs-public.h defined the DISALLOW_COPY_AND_ASSIGN macro, so the
definition propagated to all translation units that included the open
source headers. The macro is now inlined, thus avoiding polluting the
macro environment of snappy users.
2017-07-27 16:46:42 -07:00
alkis 18488d6212 Use 64 bit little endian on ppc64le.
This has tangible performance benefits.

This lands https://github.com/google/snappy/pull/27
2017-06-28 18:33:13 -07:00
costan 8b60aac4fd Remove "using namespace std;" from zippy-stubs-internal.h.
This makes it easier to build zippy, as some compiles require a warning
suppression to accept "using namespace std".
2017-03-13 13:03:01 -07:00
alkis 597fa795de Delete UnalignedCopy64 from snappy-stubs since the version in snappy.cc is more robust and possibly faster (assuming the compiler knows how to best copy 8 bytes between locations in memory the fastest way possible - a rather safe bet). 2017-03-08 11:42:30 -08:00
Steinar H. Gunderson 0800b1e4c7 Work around an issue where some compilers interpret <:: as a trigraph.
Also correct the namespace name.
2016-01-08 15:05:44 +01:00
Steinar H. Gunderson e7d2818d1e Unbreak the open-source build for ARM due to missing ATTRIBUTE_PACKED
declaration.
2016-01-08 11:40:06 +01:00
Steinar H. Gunderson ef5598aa0e Make UNALIGNED_LOAD16/32 on ARMv7 go through an explicitly unaligned struct,
to avoid the compiler coalescing multiple loads into a single load instruction
(which only work for aligned accesses).

A typical example where GCC would coalesce:

  uint8* p = ...;
  uint32 a = UNALIGNED_LOAD32(p);
  uint32 b = UNALIGNED_LOAD32(p + 4);
  uint32 c = a | b;
2016-01-04 12:51:31 +01:00
Steinar H. Gunderson 22acaf438e Change some internal path names.
This is mostly to sync up with some changes from Google's internal
repositories; it does not affect the open-source distribution in itself.
2015-06-22 15:39:08 +02:00
snappy.mirrorbot@gmail.com 3ec60ac987 Mark ARMv4 as not supporting unaligned accesses (not just ARMv5 and ARMv6);
apparently Debian still targets these by default, giving us segfaults on
armel.

R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@64 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-07-04 09:28:33 +00:00
snappy.mirrorbot@gmail.com 8b95464146 Snappy library no longer depends on iostream.
Achieved by moving logging macro definitions to a test-only
header file, and by changing non-test code to use assert,
fprintf, and abort instead of LOG/CHECK macros.

R=sesse


git-svn-id: https://snappy.googlecode.com/svn/trunk@62 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-05-22 09:32:50 +00:00
snappy.mirrorbot@gmail.com f8829ea39d Enable the use of unaligned loads and stores for ARM-based architectures
where they are available (ARMv7 and higher). This gives a significant 
speed boost on ARM, both for compression and decompression. 
It should not affect x86 at all. 
 
There are more changes possible to speed up ARM, but it might not be 
that easy to do without hurting x86 or making the code uglier. 
Also, we de not try to use NEON yet. 
 
Microbenchmark results on a Cortex-A9 1GHz, using g++ 4.6.2 (from Ubuntu/Linaro), 
-O2 -DNDEBUG -Wa,-march=armv7a -mtune=cortex-a9 -mthumb-interwork: 
 
Benchmark            Time(ns)    CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0             524806     529100        378 184.6MB/s  html            [+33.6%]
BM_UFlat/1            5139790    5200000        100 128.8MB/s  urls            [+28.8%]
BM_UFlat/2              86540      84166       1901 1.4GB/s  jpg               [ +0.6%]
BM_UFlat/3             215351     210176        904 428.0MB/s  pdf             [+29.8%]
BM_UFlat/4            2144490    2100000        100 186.0MB/s  html4           [+33.3%]
BM_UFlat/5             194482     190000       1000 123.5MB/s  cp              [+36.2%]
BM_UFlat/6              91843      90175       2107 117.9MB/s  c               [+38.6%]
BM_UFlat/7              28535      28426       6684 124.8MB/s  lsp             [+34.7%]
BM_UFlat/8            9206600    9200000        100 106.7MB/s  xls             [+42.4%]
BM_UFlat/9            1865273    1886792        106 76.9MB/s  txt1             [+32.5%]
BM_UFlat/10           1576809    1587301        126 75.2MB/s  txt2             [+32.3%]
BM_UFlat/11           4968450    4900000        100 83.1MB/s  txt3             [+32.7%]
BM_UFlat/12           6673970    6700000        100 68.6MB/s  txt4             [+32.8%]
BM_UFlat/13           2391470    2400000        100 203.9MB/s  bin             [+29.2%]
BM_UFlat/14            334601     344827        522 105.8MB/s  sum             [+30.6%]
BM_UFlat/15             37404      38080       5252 105.9MB/s  man             [+33.8%]
BM_UFlat/16            535470     540540        370 209.2MB/s  pb              [+31.2%]
BM_UFlat/17           1875245    1886792        106 93.2MB/s  gaviota          [+37.8%]
BM_UValidate/0         178425     179533       1114 543.9MB/s  html            [ +2.7%]
BM_UValidate/1        2100450    2000000        100 334.8MB/s  urls            [ +5.0%]
BM_UValidate/2           1039       1044     172413 113.3GB/s  jpg             [ +3.4%]
BM_UValidate/3          59423      59470       3363 1.5GB/s  pdf               [ +7.8%]
BM_UValidate/4         760716     766283        261 509.8MB/s  html4           [ +6.5%]
BM_ZFlat/0            1204632    1204819        166 81.1MB/s  html (23.57 %)   [+32.8%]
BM_ZFlat/1           15656190   15600000        100 42.9MB/s  urls (50.89 %)   [+27.6%]
BM_ZFlat/2             403336     410677        487 294.8MB/s  jpg (99.88 %)   [+16.5%]
BM_ZFlat/3             664073     671140        298 134.0MB/s  pdf (82.13 %)   [+28.4%]
BM_ZFlat/4            4961940    4900000        100 79.7MB/s  html4 (23.55 %)  [+30.6%]
BM_ZFlat/5             500664     501253        399 46.8MB/s  cp (48.12 %)     [+33.4%]
BM_ZFlat/6             217276     215982        926 49.2MB/s  c (42.40 %)      [+25.0%]
BM_ZFlat/7              64122      65487       3054 54.2MB/s  lsp (48.37 %)    [+36.1%]
BM_ZFlat/8           18045730   18000000        100 54.6MB/s  xls (41.34 %)    [+34.4%]
BM_ZFlat/9            4051530    4000000        100 36.3MB/s  txt1 (59.81 %)   [+25.0%]
BM_ZFlat/10           3451800    3500000        100 34.1MB/s  txt2 (64.07 %)   [+25.7%]
BM_ZFlat/11          11052340   11100000        100 36.7MB/s  txt3 (57.11 %)   [+24.3%]
BM_ZFlat/12          14538690   14600000        100 31.5MB/s  txt4 (68.35 %)   [+24.7%]
BM_ZFlat/13           5041850    5000000        100 97.9MB/s  bin (18.21 %)    [+32.0%]
BM_ZFlat/14            908840     909090        220 40.1MB/s  sum (51.88 %)    [+22.2%]
BM_ZFlat/15             86921      86206       1972 46.8MB/s  man (59.36 %)    [+42.2%]
BM_ZFlat/16           1312315    1315789        152 86.0MB/s  pb (23.15 %)     [+34.5%]
BM_ZFlat/17           3173120    3200000        100 54.9MB/s  gaviota (38.27%) [+28.1%]


The move from 64-bit to 32-bit operations for the copies also affected 32-bit x86;
positive on the decompression side, and slightly negative on the compression side
(unless that is noise; I only ran once):

Benchmark              Time(ns)    CPU(ns) Iterations
-----------------------------------------------------
BM_UFlat/0                86279      86140       7778 1.1GB/s  html             [ +7.5%]
BM_UFlat/1               839265     822622        778 813.9MB/s  urls           [ +9.4%]
BM_UFlat/2                 9180       9143      87500 12.9GB/s  jpg             [ +1.2%]
BM_UFlat/3                35080      35000      20000 2.5GB/s  pdf              [+10.1%]
BM_UFlat/4               350318     345000       2000 1.1GB/s  html4            [ +7.0%]
BM_UFlat/5                33808      33472      21212 701.0MB/s  cp             [ +9.0%]
BM_UFlat/6                15201      15214      46667 698.9MB/s  c              [+14.9%]
BM_UFlat/7                 4652       4651     159091 762.9MB/s  lsp            [ +7.5%]
BM_UFlat/8              1285551    1282528        538 765.7MB/s  xls            [+10.7%]
BM_UFlat/9               282510     281690       2414 514.9MB/s  txt1           [+13.6%]
BM_UFlat/10              243494     239286       2800 498.9MB/s  txt2           [+14.4%]
BM_UFlat/11              743625     740000       1000 550.0MB/s  txt3           [+14.3%]
BM_UFlat/12              999441     989717        778 464.3MB/s  txt4           [+16.1%]
BM_UFlat/13              412402     410076       1707 1.2GB/s  bin              [ +7.3%]
BM_UFlat/14               54876      54000      10000 675.3MB/s  sum            [+13.0%]
BM_UFlat/15                6146       6100     100000 660.8MB/s  man            [+14.8%]
BM_UFlat/16               90496      90286       8750 1.2GB/s  pb               [ +4.0%]
BM_UFlat/17              292650     292000       2500 602.0MB/s  gaviota        [+18.1%]
BM_UValidate/0            49620      49699      14286 1.9GB/s  html             [ +0.0%]
BM_UValidate/1           501371     500000       1000 1.3GB/s  urls             [ +0.0%]
BM_UValidate/2              232        227    3043478 521.5GB/s  jpg            [ +1.3%]
BM_UValidate/3            17250      17143      43750 5.1GB/s  pdf              [ -1.3%]
BM_UValidate/4           198643     200000       3500 1.9GB/s  html4            [ -0.9%]
BM_ZFlat/0               227128     229415       3182 425.7MB/s  html (23.57 %) [ -1.4%]
BM_ZFlat/1              2970089    2960000        250 226.2MB/s  urls (50.89 %) [ -1.9%]
BM_ZFlat/2                45683      44999      15556 2.6GB/s  jpg (99.88 %)    [ +2.2%]
BM_ZFlat/3               114661     113136       6364 795.1MB/s  pdf (82.13 %)  [ -1.5%]
BM_ZFlat/4               919702     914286        875 427.2MB/s  html4 (23.55%) [ -1.3%]
BM_ZFlat/5               108189     108422       6364 216.4MB/s  cp (48.12 %)   [ -1.2%]
BM_ZFlat/6                44525      44000      15909 241.7MB/s  c (42.40 %)    [ -2.9%]
BM_ZFlat/7                15973      15857      46667 223.8MB/s  lsp (48.37 %)  [ +0.0%]
BM_ZFlat/8              2677888    2639405        269 372.1MB/s  xls (41.34 %)  [ -1.4%]
BM_ZFlat/9               800715     780000       1000 186.0MB/s  txt1 (59.81 %) [ -0.4%]
BM_ZFlat/10              700089     700000       1000 170.5MB/s  txt2 (64.07 %) [ -2.9%]
BM_ZFlat/11             2159356    2138365        318 190.3MB/s  txt3 (57.11 %) [ -0.3%]
BM_ZFlat/12             2796143    2779923        259 165.3MB/s  txt4 (68.35 %) [ -1.4%]
BM_ZFlat/13              856458     835476        778 585.8MB/s  bin (18.21 %)  [ -0.1%]
BM_ZFlat/14              166908     166857       4375 218.6MB/s  sum (51.88 %)  [ -1.4%]
BM_ZFlat/15               21181      20857      35000 193.3MB/s  man (59.36 %)  [ -0.8%]
BM_ZFlat/16              244009     239973       2917 471.3MB/s  pb (23.15 %)   [ -1.4%]
BM_ZFlat/17              596362     590000       1000 297.9MB/s  gaviota (38.27%) [ +0.0%]

R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@59 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-02-21 17:02:17 +00:00
snappy.mirrorbot@gmail.com d9068ee301 Fix public issue r57: Fix most warnings with -Wall, mostly signed/unsigned
warnings. There are still some in the unit test, but the main .cc file should
be clean. We haven't enabled -Wall for the default build, since the unit test
is still not clean.

This also fixes a real bug in the open-source implementation of
ReadFileToStringOrDie(); it would not detect errors correctly.

I had to go through some pains to avoid performance loss as the types
were changed; I think there might still be some with 32-bit if and only if LFS
is enabled (ie., size_t is 64-bit), but for regular 32-bit and 64-bit I can't
see any losses, and I've diffed the generated GCC assembler between the old and
new code without seeing any significant choices. If anything, it's ever so
slightly faster.

This may or may not enable compression of very large blocks (>2^32 bytes)
when size_t is 64-bit, but I haven't checked, and it is still not a supported
case.


git-svn-id: https://snappy.googlecode.com/svn/trunk@56 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-01-04 13:10:46 +00:00
snappy.mirrorbot@gmail.com e2e3032868 Fix public issue #50: Include generic byteswap macros.
Also include Solaris 10 and FreeBSD versions.

R=csilvers


git-svn-id: https://snappy.googlecode.com/svn/trunk@49 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-09-15 09:50:05 +00:00
snappy.mirrorbot@gmail.com f1063a5dc4 Use the right #ifdef test for sys/mman.h.
Based on patch by Travis Vitek.


git-svn-id: https://snappy.googlecode.com/svn/trunk@47 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-08-10 18:44:16 +00:00
snappy.mirrorbot@gmail.com 3dd93f3ec7 Fix public issue #27: Add HAVE_CONFIG_H tests around the config.h
inclusion in snappy-stubs-internal.h, which eases compiling outside the
automake/autoconf framework.

R=csilvers
DELTA=5  (4 added, 1 deleted, 0 changed)


Revision created by MOE tool push_codebase.
MOE_MIGRATION=1152


git-svn-id: https://snappy.googlecode.com/svn/trunk@25 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-03-30 20:27:53 +00:00
snappy.mirrorbot@gmail.com b4bbc1041b Change Snappy from the Apache 2.0 to a BSD-type license.
R=dannyb
DELTA=328  (80 added, 184 deleted, 64 changed)


Revision created by MOE tool push_codebase.
MOE_MIGRATION=1061


git-svn-id: https://snappy.googlecode.com/svn/trunk@20 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-03-25 16:14:41 +00:00
snappy.mirrorbot@gmail.com 2e182e9bb8 Make the unit test work on systems without mmap(). This is required for,
among others, Windows support. For Windows in specific, we could have used
CreateFileMapping/MapViewOfFile, but this should at least get us a bit closer
to compiling, and is of course also relevant for embedded systems with no MMU.

(Part 1/2)

R=csilvers
DELTA=9  (8 added, 0 deleted, 1 changed)


Revision created by MOE tool push_codebase.
MOE_MIGRATION=1031


git-svn-id: https://snappy.googlecode.com/svn/trunk@15 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-03-24 19:12:27 +00:00
snappy.mirrorbot@gmail.com 28a6440239 Revision created by MOE tool push_codebase.
MOE_MIGRATION=


git-svn-id: https://snappy.googlecode.com/svn/trunk@2 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-03-18 17:14:15 +00:00