Commit Graph

81 Commits

Author SHA1 Message Date
snappy.mirrorbot@gmail.com eeead8dc38 Release Snappy 1.1.1.
R=jeff


git-svn-id: https://snappy.googlecode.com/svn/trunk@81 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2013-10-15 15:21:31 +00:00
snappy.mirrorbot@gmail.com 6bc39e24c7 Add autoconf tests for size_t and ssize_t. Sort-of resolves public issue 79;
it would solve the problem if MSVC typically used autoconf. However, it gives
a natural place (config.h) to put the typedef even for MSVC.

R=jsbell


git-svn-id: https://snappy.googlecode.com/svn/trunk@80 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2013-08-13 12:55:00 +00:00
snappy.mirrorbot@gmail.com 7c3c01df77 When we compare the number of bytes produced with the offset for a
backreference, make the signedness of the bytes produced clear,
by sticking it into a size_t. This avoids a signed/unsigned compare
warning from MSVC (public issue 71), and also is slightly clearer.

Since the line is now so long the explanatory comment about the -1u
trick has to go somewhere else anyway, I used the opportunity to
explain it in slightly more detail.

This is a purely stylistic change; the emitted assembler from GCC
is identical.
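
As a rough illustration of the -1u trick mentioned above, here is a minimal sketch. The helper name OffsetIsValid is hypothetical and not taken from the Snappy source, and it assumes the number of bytes produced fits in 32 bits. Subtracting 1u from an unsigned offset of 0 wraps around to the largest 32-bit value, so one unsigned comparison rejects both a zero offset and an offset that reaches back past the start of the output.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, for illustration only.
// A back-reference is valid when 1 <= offset <= produced.
// Assumes produced <= 0xFFFFFFFF.
inline bool OffsetIsValid(std::size_t produced, std::uint32_t offset) {
  // For offset == 0 the subtraction wraps to 0xFFFFFFFF, so the
  // comparison below is true and the offset is rejected.
  const std::uint32_t offset_minus_one = static_cast<std::uint32_t>(offset - 1u);
  // Both operands are unsigned, so there is no signed/unsigned compare.
  return !(produced <= offset_minus_one);
}
```

With 10 bytes produced, offsets 1 through 10 are accepted, while 0 and 11 are rejected.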

R=jeff


git-svn-id: https://snappy.googlecode.com/svn/trunk@79 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2013-07-29 11:06:44 +00:00
snappy.mirrorbot@gmail.com 2f0aaf8631 In the fast path for decompressing literals, instead of checking
whether there's 16 bytes free and then checking right afterwards
(when having subtracted the literal size) that there are now 
5 bytes free, just check once for 21 bytes. This skips a compare
and a branch; although it is easily predictable, it is still
a few cycles on a fast path that we would like to get rid of.
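
The idea can be sketched roughly as follows. The function and constant names are hypothetical, and the sketch assumes the literal on this path is at most 16 bytes long; it is not the actual decompressor code. It only shows that a single check for 21 bytes implies both of the earlier checks.

```cpp
#include <cstddef>

constexpr std::size_t kFastLiteralBytes = 16;  // first check in the message
constexpr std::size_t kTagLookahead = 5;       // second check in the message

// Old shape: check for 16 bytes, consume the literal, then check for 5.
// Assumes literal_len <= 16.
inline bool TwoChecks(std::size_t avail, std::size_t literal_len) {
  if (avail < kFastLiteralBytes) return false;
  avail -= literal_len;
  return avail >= kTagLookahead;
}

// New shape: one compare for 16 + 5 = 21 bytes up front.
inline bool OneCheck(std::size_t avail) {
  return avail >= kFastLiteralBytes + kTagLookahead;
}
```

Whenever OneCheck passes, TwoChecks passes for any literal of up to 16 bytes. The single check is slightly more conservative: with 20 bytes available and a 4-byte literal, only the two-step version passes.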

Benchmarking this yields very confusing results. On open-source
GCC 4.8.1 on Haswell, we get exactly the expected results; the
benchmarks where we hit the fast path for literals (in particular
the two HTML benchmarks and the protobuf benchmark) give very nice
speedups, and the others are not really affected.

However, benchmarks with Google's GCC branch on other hardware
are much less clear. It seems that we have a weak loss in some cases
(and the wins for the “typical” win cases are not nearly as clear),
but that it depends on microarchitecture and plain luck in how we run
the benchmark. Looking at the generated assembler, it seems that
the removal of the if causes other large-scale changes in how the
function is laid out, which makes it likely that this is just bad luck.

Thus, we should keep this change, even though its exact current impact is
unclear; it's a sensible change per se, and dropping it on the basis of
microoptimization for a given compiler (or even branch of a compiler)
would seem like a bad strategy in the long run.

Microbenchmark results (all in 64-bit, opt mode):

  Nehalem, Google GCC:

  Benchmark                Base (ns)  New (ns)                       Improvement
  ------------------------------------------------------------------------------
  BM_UFlat/0                   76747     75591  1.3GB/s  html           +1.5%
  BM_UFlat/1                  765756    757040  886.3MB/s  urls         +1.2%
  BM_UFlat/2                   10867     10893  10.9GB/s  jpg           -0.2%
  BM_UFlat/3                     124       131  1.4GB/s  jpg_200        -5.3%
  BM_UFlat/4                   31663     31596  2.8GB/s  pdf            +0.2%
  BM_UFlat/5                  314162    308176  1.2GB/s  html4          +1.9%
  BM_UFlat/6                   29668     29746  790.6MB/s  cp           -0.3%
  BM_UFlat/7                   12958     13386  796.4MB/s  c            -3.2%
  BM_UFlat/8                    3596      3682  966.0MB/s  lsp          -2.3%
  BM_UFlat/9                 1019193   1033493  953.3MB/s  xls          -1.4%
  BM_UFlat/10                    239       247  775.3MB/s  xls_200      -3.2%
  BM_UFlat/11                 236411    240271  606.9MB/s  txt1         -1.6%
  BM_UFlat/12                 206639    209768  571.2MB/s  txt2         -1.5%
  BM_UFlat/13                 627803    635722  641.4MB/s  txt3         -1.2%
  BM_UFlat/14                 845932    857816  538.2MB/s  txt4         -1.4%
  BM_UFlat/15                 402107    391670  1.2GB/s  bin            +2.7%
  BM_UFlat/16                    283       279  683.6MB/s  bin_200      +1.4%
  BM_UFlat/17                  46070     46815  781.5MB/s  sum          -1.6%
  BM_UFlat/18                   5053      5163  782.0MB/s  man          -2.1%
  BM_UFlat/19                  79721     76581  1.4GB/s  pb             +4.1%
  BM_UFlat/20                 251158    252330  697.5MB/s  gaviota      -0.5%
  Sum of all benchmarks      4966150   4980396                          -0.3%


  Sandy Bridge, Google GCC:
  
  Benchmark                Base (ns)  New (ns)                       Improvement
  ------------------------------------------------------------------------------
  BM_UFlat/0                   42850     42182  2.3GB/s  html           +1.6%
  BM_UFlat/1                  525660    515816  1.3GB/s  urls           +1.9%
  BM_UFlat/2                    7173      7283  16.3GB/s  jpg           -1.5%
  BM_UFlat/3                      92        91  2.1GB/s  jpg_200        +1.1%
  BM_UFlat/4                   15147     14872  5.9GB/s  pdf            +1.8%
  BM_UFlat/5                  199936    192116  2.0GB/s  html4          +4.1%
  BM_UFlat/6                   12796     12443  1.8GB/s  cp             +2.8%
  BM_UFlat/7                    6588      6400  1.6GB/s  c              +2.9%
  BM_UFlat/8                    2010      1951  1.8GB/s  lsp            +3.0%
  BM_UFlat/9                  761124    763049  1.3GB/s  xls            -0.3%
  BM_UFlat/10                    186       189  1016.1MB/s  xls_200     -1.6%
  BM_UFlat/11                 159354    158460  918.6MB/s  txt1         +0.6%
  BM_UFlat/12                 139732    139950  856.1MB/s  txt2         -0.2%
  BM_UFlat/13                 429917    425027  961.7MB/s  txt3         +1.2%
  BM_UFlat/14                 585255    587324  785.8MB/s  txt4         -0.4%
  BM_UFlat/15                 276186    266173  1.8GB/s  bin            +3.8%
  BM_UFlat/16                    205       207  925.5MB/s  bin_200      -1.0%
  BM_UFlat/17                  24925     24935  1.4GB/s  sum            -0.0%
  BM_UFlat/18                   2632      2576  1.5GB/s  man            +2.2%
  BM_UFlat/19                  40546     39108  2.8GB/s  pb             +3.7%
  BM_UFlat/20                 175803    168209  1048.9MB/s  gaviota     +4.5%
  Sum of all benchmarks      3408117   3368361                          +1.2%


  Haswell, upstream GCC 4.8.1:

  Benchmark                Base (ns)  New (ns)                       Improvement
  ------------------------------------------------------------------------------
  BM_UFlat/0                   46308     40641  2.3GB/s  html          +13.9%
  BM_UFlat/1                  513385    514706  1.3GB/s  urls           -0.3%
  BM_UFlat/2                    6197      6151  19.2GB/s  jpg           +0.7%
  BM_UFlat/3                      61        61  3.0GB/s  jpg_200        +0.0%
  BM_UFlat/4                   13551     13429  6.5GB/s  pdf            +0.9%
  BM_UFlat/5                  198317    190243  2.0GB/s  html4          +4.2%
  BM_UFlat/6                   14768     12560  1.8GB/s  cp            +17.6%
  BM_UFlat/7                    6453      6447  1.6GB/s  c              +0.1%
  BM_UFlat/8                    1991      1980  1.8GB/s  lsp            +0.6%
  BM_UFlat/9                  766947    770424  1.2GB/s  xls            -0.5%
  BM_UFlat/10                    170       169  1.1GB/s  xls_200        +0.6%
  BM_UFlat/11                 164350    163554  888.7MB/s  txt1         +0.5%
  BM_UFlat/12                 145444    143830  832.1MB/s  txt2         +1.1%
  BM_UFlat/13                 437849    438413  929.2MB/s  txt3         -0.1%
  BM_UFlat/14                 603587    605309  759.8MB/s  txt4         -0.3%
  BM_UFlat/15                 249799    248067  1.9GB/s  bin            +0.7%
  BM_UFlat/16                    191       188  1011.4MB/s  bin_200     +1.6%
  BM_UFlat/17                  26064     24778  1.4GB/s  sum            +5.2%
  BM_UFlat/18                   2620      2601  1.5GB/s  man            +0.7%
  BM_UFlat/19                  44551     37373  3.0GB/s  pb            +19.2%
  BM_UFlat/20                 165408    164584  1.0GB/s  gaviota        +0.5%
  Sum of all benchmarks      3408011   3385508                          +0.7%


git-svn-id: https://snappy.googlecode.com/svn/trunk@78 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2013-06-30 19:24:03 +00:00
snappy.mirrorbot@gmail.com 062bf544a6 Make the two IncrementalCopy* functions take in an ssize_t instead of a len,
in order to avoid having to do 32-to-64-bit signed conversions on a hot path
during decompression. (Also fixes some MSVC warnings, mentioned in public
issue 75, but more of those remain.) They cannot be size_t because we expect
them to go negative and test for that.

This saves a few movzwl instructions, yielding ~2% speedup in decompression.
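
A minimal sketch of why the length has to be signed. ChunkedCopy is a hypothetical stand-in for the IncrementalCopy* functions and only handles non-overlapping buffers. It uses std::ptrdiff_t in place of ssize_t so that it also compiles where ssize_t is not defined. The loop copies 8 bytes at a time and relies on the remaining length dropping below zero on the last iteration.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in, for illustration only.
// Copies in 8-byte chunks and may write up to 7 bytes past `len`,
// so the caller must provide that much slack in both buffers.
inline void ChunkedCopy(const char* src, char* dst, std::ptrdiff_t len) {
  while (len > 0) {
    std::memcpy(dst, src, 8);
    src += 8;
    dst += 8;
    len -= 8;  // may go negative; an unsigned type would wrap instead
  }
}
```

With an unsigned length, the final subtraction would wrap to a huge value and the loop would not terminate when len is not a multiple of 8.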


Sandy Bridge:

Benchmark                          Base (ns)  New (ns)                                Improvement
-------------------------------------------------------------------------------------------------
BM_UFlat/0                             48009     41283  2.3GB/s  html                   +16.3%
BM_UFlat/1                            531274    513419  1.3GB/s  urls                    +3.5%
BM_UFlat/2                              7378      7062  16.8GB/s  jpg                    +4.5%
BM_UFlat/3                                92        92  2.0GB/s  jpg_200                 +0.0%
BM_UFlat/4                             15057     14974  5.9GB/s  pdf                     +0.6%
BM_UFlat/5                            204323    193140  2.0GB/s  html4                   +5.8%
BM_UFlat/6                             13282     12611  1.8GB/s  cp                      +5.3%
BM_UFlat/7                              6511      6504  1.6GB/s  c                       +0.1%
BM_UFlat/8                              2014      2030  1.7GB/s  lsp                     -0.8%
BM_UFlat/9                            775909    768336  1.3GB/s  xls                     +1.0%
BM_UFlat/10                              182       184  1043.2MB/s  xls_200              -1.1%
BM_UFlat/11                           167352    161630  901.2MB/s  txt1                  +3.5%
BM_UFlat/12                           147393    142246  842.8MB/s  txt2                  +3.6%
BM_UFlat/13                           449960    432853  944.4MB/s  txt3                  +4.0%
BM_UFlat/14                           620497    594845  775.9MB/s  txt4                  +4.3%
BM_UFlat/15                           265610    267356  1.8GB/s  bin                     -0.7%
BM_UFlat/16                              206       205  932.7MB/s  bin_200               +0.5%
BM_UFlat/17                            25561     24730  1.4GB/s  sum                     +3.4%
BM_UFlat/18                             2620      2644  1.5GB/s  man                     -0.9%
BM_UFlat/19                            45766     38589  2.9GB/s  pb                     +18.6%
BM_UFlat/20                           171107    169832  1039.5MB/s  gaviota              +0.8%
Sum of all benchmarks                3500103   3394565                                   +3.1%


Westmere:

Benchmark                          Base (ns)  New (ns)                                Improvement
-------------------------------------------------------------------------------------------------
BM_UFlat/0                             72624     71526  1.3GB/s  html                    +1.5%
BM_UFlat/1                            735821    722917  930.8MB/s  urls                  +1.8%
BM_UFlat/2                             10450     10172  11.7GB/s  jpg                    +2.7%
BM_UFlat/3                               117       117  1.6GB/s  jpg_200                 +0.0%
BM_UFlat/4                             29817     29648  3.0GB/s  pdf                     +0.6%
BM_UFlat/5                            297126    293073  1.3GB/s  html4                   +1.4%
BM_UFlat/6                             28252     27994  842.0MB/s  cp                    +0.9%
BM_UFlat/7                             12672     12391  862.1MB/s  c                     +2.3%
BM_UFlat/8                              3507      3425  1040.9MB/s  lsp                  +2.4%
BM_UFlat/9                           1004268    969395  1018.0MB/s  xls                  +3.6%
BM_UFlat/10                              233       227  844.8MB/s  xls_200               +2.6%
BM_UFlat/11                           230054    224981  647.8MB/s  txt1                  +2.3%
BM_UFlat/12                           201229    196447  610.5MB/s  txt2                  +2.4%
BM_UFlat/13                           609547    596761  685.3MB/s  txt3                  +2.1%
BM_UFlat/14                           824362    804821  573.8MB/s  txt4                  +2.4%
BM_UFlat/15                           371095    374899  1.3GB/s  bin                     -1.0%
BM_UFlat/16                              267       267  717.8MB/s  bin_200               +0.0%
BM_UFlat/17                            44623     43828  835.9MB/s  sum                   +1.8%
BM_UFlat/18                             5077      4815  841.0MB/s  man                   +5.4%
BM_UFlat/19                            74964     73210  1.5GB/s  pb                      +2.4%
BM_UFlat/20                           237987    236745  746.0MB/s  gaviota               +0.5%
Sum of all benchmarks                4794092   4697659                                   +2.1%


Istanbul:

Benchmark                          Base (ns)  New (ns)                                Improvement
-------------------------------------------------------------------------------------------------
BM_UFlat/0                             98614     96376  1020.4MB/s  html                 +2.3%
BM_UFlat/1                            963740    953241  707.2MB/s  urls                  +1.1%
BM_UFlat/2                             25042     24769  4.8GB/s  jpg                     +1.1%
BM_UFlat/3                               180       180  1065.6MB/s  jpg_200              +0.0%
BM_UFlat/4                             45942     45403  1.9GB/s  pdf                     +1.2%
BM_UFlat/5                            400135    390226  1008.2MB/s  html4                +2.5%
BM_UFlat/6                             37768     37392  631.9MB/s  cp                    +1.0%
BM_UFlat/7                             18585     18200  588.2MB/s  c                     +2.1%
BM_UFlat/8                              5751      5690  627.7MB/s  lsp                   +1.1%
BM_UFlat/9                           1543154   1542209  641.4MB/s  xls                   +0.1%
BM_UFlat/10                              381       388  494.6MB/s  xls_200               -1.8%
BM_UFlat/11                           339715    331973  440.1MB/s  txt1                  +2.3%
BM_UFlat/12                           294807    289418  415.4MB/s  txt2                  +1.9%
BM_UFlat/13                           906160    884094  463.3MB/s  txt3                  +2.5%
BM_UFlat/14                          1224221   1198435  386.1MB/s  txt4                  +2.2%
BM_UFlat/15                           516277    502923  979.5MB/s  bin                   +2.7%
BM_UFlat/16                              405       402  477.2MB/s  bin_200               +0.7%
BM_UFlat/17                            61640     60621  605.6MB/s  sum                   +1.7%
BM_UFlat/18                             7326      7383  549.5MB/s  man                   -0.8%
BM_UFlat/19                            94720     92653  1.2GB/s  pb                      +2.2%
BM_UFlat/20                           360435    346687  510.6MB/s  gaviota               +4.0%
Sum of all benchmarks                6944998   6828663                                   +1.7%


git-svn-id: https://snappy.googlecode.com/svn/trunk@77 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2013-06-14 21:42:26 +00:00
snappy.mirrorbot@gmail.com 328aafa198 Add support for uncompressing to iovecs (scatter I/O).
Windows does not have struct iovec defined anywhere,
so we define our own version that's equal to what UNIX
typically has.
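
As a rough sketch of what scatter output means here: the struct below mirrors the usual UNIX layout of struct iovec (a base pointer and a length), and ScatterCopy spreads one contiguous buffer over several such regions. Both names are hypothetical and are not the API added by this patch.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in mirroring the typical UNIX struct iovec.
struct IoVec {
  void* iov_base;       // start of the region
  std::size_t iov_len;  // size of the region in bytes
};

// Copies `len` bytes from `data` into the regions in order.
// Returns false if the regions are too small to hold everything.
inline bool ScatterCopy(const char* data, std::size_t len,
                        const IoVec* iov, std::size_t iov_count) {
  for (std::size_t i = 0; i < iov_count && len > 0; ++i) {
    const std::size_t n = len < iov[i].iov_len ? len : iov[i].iov_len;
    if (n > 0) std::memcpy(iov[i].iov_base, data, n);
    data += n;
    len -= n;
  }
  return len == 0;
}
```

Scattering the 11 bytes of "hello world" over regions of 4, 3 and 8 bytes fills them with "hell", "o w" and "orld" respectively.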

The bulk of this patch was contributed by Mohit Aron.

R=jeff


git-svn-id: https://snappy.googlecode.com/svn/trunk@76 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2013-06-13 16:19:52 +00:00
snappy.mirrorbot@gmail.com cd92eb0852 Some code reorganization needed for an internal change.
R=fikes


git-svn-id: https://snappy.googlecode.com/svn/trunk@75 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2013-06-12 19:51:15 +00:00
snappy.mirrorbot@gmail.com a3e928d62b Supports truncated test data in zippy benchmark.
R=sesse


git-svn-id: https://snappy.googlecode.com/svn/trunk@74 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2013-04-09 15:33:30 +00:00
snappy.mirrorbot@gmail.com bde324c016 Release Snappy 1.1.0.
R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@73 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2013-02-05 14:36:15 +00:00
snappy.mirrorbot@gmail.com 8168446c7e Make ./snappy_unittest pass without "srcdir" being defined.
Previously, snappy_unittest would read from an absolute path /testdata/..;
convert it to use a relative path instead.

Patch from Marc-Antoine Ruel.

R=maruel


git-svn-id: https://snappy.googlecode.com/svn/trunk@72 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2013-02-05 14:30:05 +00:00
snappy.mirrorbot@gmail.com 27a0cc3949 Increase the Zippy block size from 32 kB to 64 kB, winning ~3% density
while being effectively performance neutral.

The longer story about density is that we win 3-6% density on the benchmarks
where this has any effect at all; many of the benchmarks (cp, c, lsp, man)
are smaller than 32 kB and thus see no effect. Binary data also seems
to win little or nothing; of course, the already-compressed data wins nothing.
The protobuf benchmark wins as much as ~18% depending on architecture,
but I wouldn't be too sure that this is representative of protobuf data in
general.

As for performance, we lose a tiny amount since we get more tags (e.g., a long
literal might be broken up into literal-copy-literal), but we win it back with
less clearing of the hash table, and more opportunities to skip incompressible
data (e.g. in the jpg benchmark). Decompression seems to get ever so slightly
slower, again due to more tags. The total net change is about as close to zero
as we can get, so the end effect seems to be simply more density and no
real performance change.

The comment about not changing kBlockSize, scary as it is, is not really
relevant, since we're never going to have a block-level decompressor without
explicitly marked blocks. Replace it with something more appropriate.

This affects the framing format, but it's okay to change it since it basically
has no users yet.


Density (note that cp, c, lsp and man are all smaller than 32 kB):

   Benchmark         Description   Base (%)  New (%)  Improvement
   --------------------------------------------------------------
   ZFlat/0           html            22.57    22.31     +5.6%
   ZFlat/1           urls            50.89    47.77     +6.5%
   ZFlat/2           jpg             99.88    99.87     +0.0%
   ZFlat/3           pdf             82.13    82.07     +0.1%
   ZFlat/4           html4           23.55    22.51     +4.6%
   ZFlat/5           cp              48.12    48.12     +0.0%
   ZFlat/6           c               42.40    42.40     +0.0%
   ZFlat/7           lsp             48.37    48.37     +0.0%
   ZFlat/8           xls             41.34    41.23     +0.3%
   ZFlat/9           txt1            59.81    57.87     +3.4%
   ZFlat/10          txt2            64.07    61.93     +3.5%
   ZFlat/11          txt3            57.11    54.92     +4.0%
   ZFlat/12          txt4            68.35    66.22     +3.2%
   ZFlat/13          bin             18.21    18.11     +0.6%
   ZFlat/14          sum             51.88    48.96     +6.0%
   ZFlat/15          man             59.36    59.36     +0.0%
   ZFlat/16          pb              23.15    19.64    +17.9%
   ZFlat/17          gaviota         38.27    37.72     +1.5%
   Geometric mean                    45.51    44.15     +3.1%


Microbenchmarks (64-bit, opt):

Westmere 2.8 GHz:

   Benchmark                          Base (ns)  New (ns)                                Improvement
   -------------------------------------------------------------------------------------------------
   BM_UFlat/0                             75342     75027  1.3GB/s  html                    +0.4%
   BM_UFlat/1                            723767    744269  899.6MB/s  urls                  -2.8%
   BM_UFlat/2                             10072     10072  11.7GB/s  jpg                    +0.0%
   BM_UFlat/3                             30747     30388  2.9GB/s  pdf                     +1.2%
   BM_UFlat/4                            307353    306063  1.2GB/s  html4                   +0.4%
   BM_UFlat/5                             28593     28743  816.3MB/s  cp                    -0.5%
   BM_UFlat/6                             12958     12998  818.1MB/s  c                     -0.3%
   BM_UFlat/7                              3700      3792  935.8MB/s  lsp                   -2.4%
   BM_UFlat/8                            999685    999905  982.1MB/s  xls                   -0.0%
   BM_UFlat/9                            232954    230079  630.4MB/s  txt1                  +1.2%
   BM_UFlat/10                           200785    201468  592.6MB/s  txt2                  -0.3%
   BM_UFlat/11                           617267    610968  666.1MB/s  txt3                  +1.0%
   BM_UFlat/12                           821595    822475  558.7MB/s  txt4                  -0.1%
   BM_UFlat/13                           377097    377632  1.3GB/s  bin                     -0.1%
   BM_UFlat/14                            45476     45260  805.8MB/s  sum                   +0.5%
   BM_UFlat/15                             4985      5003  805.7MB/s  man                   -0.4%
   BM_UFlat/16                            80813     77494  1.4GB/s  pb                      +4.3%
   BM_UFlat/17                           251792    241553  727.7MB/s  gaviota               +4.2%
   BM_UValidate/0                         40343     40354  2.4GB/s  html                    -0.0%
   BM_UValidate/1                        426890    451574  1.4GB/s  urls                    -5.5%
   BM_UValidate/2                           187       179  661.9GB/s  jpg                   +4.5%
   BM_UValidate/3                         13783     13827  6.4GB/s  pdf                     -0.3%
   BM_UValidate/4                        162393    163335  2.3GB/s  html4                   -0.6%
   BM_UDataBuffer/0                       93756     93302  1046.7MB/s  html                 +0.5%
   BM_UDataBuffer/1                      886714    916292  730.7MB/s  urls                  -3.2%
   BM_UDataBuffer/2                       15861     16401  7.2GB/s  jpg                     -3.3%
   BM_UDataBuffer/3                       38934     39224  2.2GB/s  pdf                     -0.7%
   BM_UDataBuffer/4                      381008    379428  1029.5MB/s  html4                +0.4%
   BM_UCord/0                             92528     91098  1072.0MB/s  html                 +1.6%
   BM_UCord/1                            858421    885287  756.3MB/s  urls                  -3.0%
   BM_UCord/2                             13140     13464  8.8GB/s  jpg                     -2.4%
   BM_UCord/3                             39012     37773  2.3GB/s  pdf                     +3.3%
   BM_UCord/4                            376869    371267  1052.1MB/s  html4                +1.5%
   BM_UCordString/0                       75810     75303  1.3GB/s  html                    +0.7%
   BM_UCordString/1                      735290    753841  888.2MB/s  urls                  -2.5%
   BM_UCordString/2                       11945     13113  9.0GB/s  jpg                     -8.9%
   BM_UCordString/3                       33901     32562  2.7GB/s  pdf                     +4.1%
   BM_UCordString/4                      310985    309390  1.2GB/s  html4                   +0.5%
   BM_UCordValidate/0                     40952     40450  2.4GB/s  html                    +1.2%
   BM_UCordValidate/1                    433842    456531  1.4GB/s  urls                    -5.0%
   BM_UCordValidate/2                      1179      1173  100.8GB/s  jpg                   +0.5%
   BM_UCordValidate/3                     14481     14392  6.1GB/s  pdf                     +0.6%
   BM_UCordValidate/4                    164364    164151  2.3GB/s  html4                   +0.1%
   BM_ZFlat/0                            160610    156601  623.6MB/s  html (22.31 %)        +2.6%
   BM_ZFlat/1                           1995238   1993582  335.9MB/s  urls (47.77 %)        +0.1%
   BM_ZFlat/2                             30133     24983  4.7GB/s  jpg (99.87 %)          +20.6%
   BM_ZFlat/3                             74453     73128  1.2GB/s  pdf (82.07 %)           +1.8%
   BM_ZFlat/4                            647674    633729  616.4MB/s  html4 (22.51 %)       +2.2%
   BM_ZFlat/5                             76259     76090  308.4MB/s  cp (48.12 %)          +0.2%
   BM_ZFlat/6                             31106     31084  342.1MB/s  c (42.40 %)           +0.1%
   BM_ZFlat/7                             10507     10443  339.8MB/s  lsp (48.37 %)         +0.6%
   BM_ZFlat/8                           1811047   1793325  547.6MB/s  xls (41.23 %)         +1.0%
   BM_ZFlat/9                            597903    581793  249.3MB/s  txt1 (57.87 %)        +2.8%
   BM_ZFlat/10                           525320    514522  232.0MB/s  txt2 (61.93 %)        +2.1%
   BM_ZFlat/11                          1596591   1551636  262.3MB/s  txt3 (54.92 %)        +2.9%
   BM_ZFlat/12                          2134523   2094033  219.5MB/s  txt4 (66.22 %)        +1.9%
   BM_ZFlat/13                           593024    587869  832.6MB/s  bin (18.11 %)         +0.9%
   BM_ZFlat/14                           114746    110666  329.5MB/s  sum (48.96 %)         +3.7%
   BM_ZFlat/15                            14376     14485  278.3MB/s  man (59.36 %)         -0.8%
   BM_ZFlat/16                           167908    150070  753.6MB/s  pb (19.64 %)         +11.9%
   BM_ZFlat/17                           460228    442253  397.5MB/s  gaviota (37.72 %)     +4.1%
   BM_ZCord/0                            164896    160241  609.4MB/s  html                  +2.9%
   BM_ZCord/1                           2070239   2043492  327.7MB/s  urls                  +1.3%
   BM_ZCord/2                             54402     47002  2.5GB/s  jpg                    +15.7%
   BM_ZCord/3                             85871     83832  1073.1MB/s  pdf                  +2.4%
   BM_ZCord/4                            664078    648825  602.0MB/s  html4                 +2.4%
   BM_ZDataBuffer/0                      174874    172549  566.0MB/s  html                  +1.3%
   BM_ZDataBuffer/1                     2134410   2139173  313.0MB/s  urls                  -0.2%
   BM_ZDataBuffer/2                       71911     69551  1.7GB/s  jpg                     +3.4%
   BM_ZDataBuffer/3                       98236     99727  902.1MB/s  pdf                   -1.5%
   BM_ZDataBuffer/4                      710776    699104  558.8MB/s  html4                 +1.7%
   Sum of all benchmarks               27358908  27200688                                   +0.6%


Sandy Bridge 2.6 GHz:

   Benchmark                          Base (ns)  New (ns)                                Improvement
   -------------------------------------------------------------------------------------------------
   BM_UFlat/0                             49356     49018  1.9GB/s  html                    +0.7%
   BM_UFlat/1                            516764    531955  1.2GB/s  urls                    -2.9%
   BM_UFlat/2                              6982      7304  16.2GB/s  jpg                    -4.4%
   BM_UFlat/3                             15285     15598  5.6GB/s  pdf                     -2.0%
   BM_UFlat/4                            206557    206669  1.8GB/s  html4                   -0.1%
   BM_UFlat/5                             13681     13567  1.7GB/s  cp                      +0.8%
   BM_UFlat/6                              6571      6592  1.6GB/s  c                       -0.3%
   BM_UFlat/7                              2008      1994  1.7GB/s  lsp                     +0.7%
   BM_UFlat/8                            775700    773286  1.2GB/s  xls                     +0.3%
   BM_UFlat/9                            165578    164480  881.8MB/s  txt1                  +0.7%
   BM_UFlat/10                           143707    144139  828.2MB/s  txt2                  -0.3%
   BM_UFlat/11                           443026    436281  932.8MB/s  txt3                  +1.5%
   BM_UFlat/12                           603129    595856  771.2MB/s  txt4                  +1.2%
   BM_UFlat/13                           271682    270450  1.8GB/s  bin                     +0.5%
   BM_UFlat/14                            26200     25666  1.4GB/s  sum                     +2.1%
   BM_UFlat/15                             2620      2608  1.5GB/s  man                     +0.5%
   BM_UFlat/16                            48908     47756  2.3GB/s  pb                      +2.4%
   BM_UFlat/17                           174638    170346  1031.9MB/s  gaviota              +2.5%
   BM_UValidate/0                         31922     31898  3.0GB/s  html                    +0.1%
   BM_UValidate/1                        341265    363554  1.8GB/s  urls                    -6.1%
   BM_UValidate/2                           160       151  782.8GB/s  jpg                   +6.0%
   BM_UValidate/3                         10402     10380  8.5GB/s  pdf                     +0.2%
   BM_UValidate/4                        129490    130587  2.9GB/s  html4                   -0.8%
   BM_UDataBuffer/0                       59383     58736  1.6GB/s  html                    +1.1%
   BM_UDataBuffer/1                      619222    637786  1049.8MB/s  urls                 -2.9%
   BM_UDataBuffer/2                       10775     11941  9.9GB/s  jpg                     -9.8%
   BM_UDataBuffer/3                       18002     17930  4.9GB/s  pdf                     +0.4%
   BM_UDataBuffer/4                      259182    259306  1.5GB/s  html4                   -0.0%
   BM_UCord/0                             59379     57814  1.6GB/s  html                    +2.7%
   BM_UCord/1                            598456    615162  1088.4MB/s  urls                 -2.7%
   BM_UCord/2                              8519      8628  13.7GB/s  jpg                    -1.3%
   BM_UCord/3                             18123     17537  5.0GB/s  pdf                     +3.3%
   BM_UCord/4                            252375    252331  1.5GB/s  html4                   +0.0%
   BM_UCordString/0                       49494     49790  1.9GB/s  html                    -0.6%
   BM_UCordString/1                      524659    541803  1.2GB/s  urls                    -3.2%
   BM_UCordString/2                        8206      8354  14.2GB/s  jpg                    -1.8%
   BM_UCordString/3                       17235     16537  5.3GB/s  pdf                     +4.2%
   BM_UCordString/4                      210188    211072  1.8GB/s  html4                   -0.4%
   BM_UCordValidate/0                     31956     31587  3.0GB/s  html                    +1.2%
   BM_UCordValidate/1                    340828    362141  1.8GB/s  urls                    -5.9%
   BM_UCordValidate/2                       783       744  158.9GB/s  jpg                   +5.2%
   BM_UCordValidate/3                     10543     10462  8.4GB/s  pdf                     +0.8%
   BM_UCordValidate/4                    130150    129789  2.9GB/s  html4                   +0.3%
   BM_ZFlat/0                            113873    111200  878.2MB/s  html (22.31 %)        +2.4%
   BM_ZFlat/1                           1473023   1489858  449.4MB/s  urls (47.77 %)        -1.1%
   BM_ZFlat/2                             23569     19486  6.1GB/s  jpg (99.87 %)          +21.0%
   BM_ZFlat/3                             49178     48046  1.8GB/s  pdf (82.07 %)           +2.4%
   BM_ZFlat/4                            475063    469394  832.2MB/s  html4 (22.51 %)       +1.2%
   BM_ZFlat/5                             46910     46816  501.2MB/s  cp (48.12 %)          +0.2%
   BM_ZFlat/6                             16883     16916  628.6MB/s  c (42.40 %)           -0.2%
   BM_ZFlat/7                              5381      5447  651.5MB/s  lsp (48.37 %)         -1.2%
   BM_ZFlat/8                           1466870   1473861  666.3MB/s  xls (41.23 %)         -0.5%
   BM_ZFlat/9                            468006    464101  312.5MB/s  txt1 (57.87 %)        +0.8%
   BM_ZFlat/10                           408157    408957  291.9MB/s  txt2 (61.93 %)        -0.2%
   BM_ZFlat/11                          1253348   1232910  330.1MB/s  txt3 (54.92 %)        +1.7%
   BM_ZFlat/12                          1702373   1702977  269.8MB/s  txt4 (66.22 %)        -0.0%
   BM_ZFlat/13                           439792    438557  1116.0MB/s  bin (18.11 %)        +0.3%
   BM_ZFlat/14                            80766     78851  462.5MB/s  sum (48.96 %)         +2.4%
   BM_ZFlat/15                             7420      7542  534.5MB/s  man (59.36 %)         -1.6%
   BM_ZFlat/16                           112043    100126  1.1GB/s  pb (19.64 %)           +11.9%
   BM_ZFlat/17                           368877    357703  491.4MB/s  gaviota (37.72 %)     +3.1%
   BM_ZCord/0                            116402    113564  859.9MB/s  html                  +2.5%
   BM_ZCord/1                           1507156   1519911  440.5MB/s  urls                  -0.8%
   BM_ZCord/2                             39860     33686  3.5GB/s  jpg                    +18.3%
   BM_ZCord/3                             56211     54694  1.6GB/s  pdf                     +2.8%
   BM_ZCord/4                            485594    479212  815.1MB/s  html4                 +1.3%
   BM_ZDataBuffer/0                      123185    121572  803.3MB/s  html                  +1.3%
   BM_ZDataBuffer/1                     1569111   1589380  421.3MB/s  urls                  -1.3%
   BM_ZDataBuffer/2                       53143     49556  2.4GB/s  jpg                     +7.2%
   BM_ZDataBuffer/3                       65725     66826  1.3GB/s  pdf                     -1.6%
   BM_ZDataBuffer/4                      517871    514750  758.9MB/s  html4                 +0.6%
   Sum of all benchmarks               20258879  20315484                                   -0.3%


AMD Istanbul 2.4 GHz:

   Benchmark                          Base (ns)  New (ns)                                Improvement
   -------------------------------------------------------------------------------------------------
   BM_UFlat/0                             97120     96585  1011.1MB/s  html                 +0.6%
   BM_UFlat/1                            917473    948016  706.3MB/s  urls                  -3.2%
   BM_UFlat/2                             21496     23938  4.9GB/s  jpg                    -10.2%
   BM_UFlat/3                             44751     45639  1.9GB/s  pdf                     -1.9%
   BM_UFlat/4                            391950    391413  998.0MB/s  html4                 +0.1%
   BM_UFlat/5                             37366     37201  630.7MB/s  cp                    +0.4%
   BM_UFlat/6                             18350     18318  580.5MB/s  c                     +0.2%
   BM_UFlat/7                              5672      5661  626.9MB/s  lsp                   +0.2%
   BM_UFlat/8                           1533390   1529441  642.1MB/s  xls                   +0.3%
   BM_UFlat/9                            335477    336553  431.0MB/s  txt1                  -0.3%
   BM_UFlat/10                           285140    292080  408.7MB/s  txt2                  -2.4%
   BM_UFlat/11                           888507    894758  454.9MB/s  txt3                  -0.7%
   BM_UFlat/12                          1187643   1210928  379.5MB/s  txt4                  -1.9%
   BM_UFlat/13                           493717    507447  964.5MB/s  bin                   -2.7%
   BM_UFlat/14                            61740     60870  599.1MB/s  sum                   +1.4%
   BM_UFlat/15                             7211      7187  560.9MB/s  man                   +0.3%
   BM_UFlat/16                            97435     93100  1.2GB/s  pb                      +4.7%
   BM_UFlat/17                           362662    356395  493.2MB/s  gaviota               +1.8%
   BM_UValidate/0                         47475     47118  2.0GB/s  html                    +0.8%
   BM_UValidate/1                        501304    529741  1.2GB/s  urls                    -5.4%
   BM_UValidate/2                           276       243  486.2GB/s  jpg                  +13.6%
   BM_UValidate/3                         16361     16261  5.4GB/s  pdf                     +0.6%
   BM_UValidate/4                        190741    190353  2.0GB/s  html4                   +0.2%
   BM_UDataBuffer/0                      111080    109771  889.6MB/s  html                  +1.2%
   BM_UDataBuffer/1                     1051035   1085999  616.5MB/s  urls                  -3.2%
   BM_UDataBuffer/2                       25801     25463  4.6GB/s  jpg                     +1.3%
   BM_UDataBuffer/3                       50493     49946  1.8GB/s  pdf                     +1.1%
   BM_UDataBuffer/4                      447258    444138  879.5MB/s  html4                 +0.7%
   BM_UCord/0                            109350    107909  905.0MB/s  html                  +1.3%
   BM_UCord/1                           1023396   1054964  634.7MB/s  urls                  -3.0%
   BM_UCord/2                             25292     24371  4.9GB/s  jpg                     +3.8%
   BM_UCord/3                             48955     49736  1.8GB/s  pdf                     -1.6%
   BM_UCord/4                            440452    437331  893.2MB/s  html4                 +0.7%
   BM_UCordString/0                       98511     98031  996.2MB/s  html                  +0.5%
   BM_UCordString/1                      933230    963495  694.9MB/s  urls                  -3.1%
   BM_UCordString/2                       23311     24076  4.9GB/s  jpg                     -3.2%
   BM_UCordString/3                       45568     46196  1.9GB/s  pdf                     -1.4%
   BM_UCordString/4                      397791    396934  984.1MB/s  html4                 +0.2%
   BM_UCordValidate/0                     47537     46921  2.0GB/s  html                    +1.3%
   BM_UCordValidate/1                    505071    532716  1.2GB/s  urls                    -5.2%
   BM_UCordValidate/2                      1663      1621  72.9GB/s  jpg                    +2.6%
   BM_UCordValidate/3                     16890     16926  5.2GB/s  pdf                     -0.2%
   BM_UCordValidate/4                    192365    191984  2.0GB/s  html4                   +0.2%
   BM_ZFlat/0                            184708    179103  545.3MB/s  html (22.31 %)        +3.1%
   BM_ZFlat/1                           2293864   2302950  290.7MB/s  urls (47.77 %)        -0.4%
   BM_ZFlat/2                             52852     47618  2.5GB/s  jpg (99.87 %)          +11.0%
   BM_ZFlat/3                            100766     96179  935.3MB/s  pdf (82.07 %)         +4.8%
   BM_ZFlat/4                            741220    727977  536.6MB/s  html4 (22.51 %)       +1.8%
   BM_ZFlat/5                             85402     85418  274.7MB/s  cp (48.12 %)          -0.0%
   BM_ZFlat/6                             36558     36494  291.4MB/s  c (42.40 %)           +0.2%
   BM_ZFlat/7                             12706     12507  283.7MB/s  lsp (48.37 %)         +1.6%
   BM_ZFlat/8                           2336823   2335688  420.5MB/s  xls (41.23 %)         +0.0%
   BM_ZFlat/9                            701804    681153  212.9MB/s  txt1 (57.87 %)        +3.0%
   BM_ZFlat/10                           606700    597194  199.9MB/s  txt2 (61.93 %)        +1.6%
   BM_ZFlat/11                          1852283   1803238  225.7MB/s  txt3 (54.92 %)        +2.7%
   BM_ZFlat/12                          2475527   2443354  188.1MB/s  txt4 (66.22 %)        +1.3%
   BM_ZFlat/13                           694497    696654  702.6MB/s  bin (18.11 %)         -0.3%
   BM_ZFlat/14                           136929    129855  280.8MB/s  sum (48.96 %)         +5.4%
   BM_ZFlat/15                            17172     17124  235.4MB/s  man (59.36 %)         +0.3%
   BM_ZFlat/16                           190364    171763  658.4MB/s  pb (19.64 %)         +10.8%
   BM_ZFlat/17                           567285    555190  316.6MB/s  gaviota (37.72 %)     +2.2%
   BM_ZCord/0                            193490    187031  522.1MB/s  html                  +3.5%
   BM_ZCord/1                           2427537   2415315  277.2MB/s  urls                  +0.5%
   BM_ZCord/2                             85378     81412  1.5GB/s  jpg                     +4.9%
   BM_ZCord/3                            121898    119419  753.3MB/s  pdf                   +2.1%
   BM_ZCord/4                            779564    762961  512.0MB/s  html4                 +2.2%
   BM_ZDataBuffer/0                      213820    207272  471.1MB/s  html                  +3.2%
   BM_ZDataBuffer/1                     2589010   2586495  258.9MB/s  urls                  +0.1%
   BM_ZDataBuffer/2                      121871    118885  1018.4MB/s  jpg                  +2.5%
   BM_ZDataBuffer/3                      145382    145986  616.2MB/s  pdf                   -0.4%
   BM_ZDataBuffer/4                      868117    852754  458.1MB/s  html4                 +1.8%
   Sum of all benchmarks               33771833  33744763                                   +0.1%


git-svn-id: https://snappy.googlecode.com/svn/trunk@71 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2013-01-18 12:16:36 +00:00
snappy.mirrorbot@gmail.com 81f34784b7 Adjust the Snappy open-source distribution for the changes in Google's
internal file API.

R=sanjay



git-svn-id: https://snappy.googlecode.com/svn/trunk@70 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2013-01-06 19:21:26 +00:00
snappy.mirrorbot@gmail.com 698af469b4 Change a few ORs to additions where they don't matter. This helps the compiler
use the LEA instruction more efficiently, since e.g. a + (b << 2) can be encoded
as one instruction. Even more importantly, it can constant-fold the
COPY_* enums together with the shifted negative constants, which also saves
some instructions. (We don't need it for LITERAL, since it happens to be 0.)

I am unsure why the compiler couldn't do this itself, but the theory is that
it cannot prove that len-1 and len-4 cannot underflow/wrap, and thus can't
do the optimization safely.

The gains are small but measurable; 0.5-1.0% over the BM_Z* benchmarks
(measured on Westmere, Sandy Bridge and Istanbul).
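
As an illustration only (this is not the code from the commit), the sketch
below shows why OR can be swapped for addition when the operands share no set
bits: the shifted length always has its two low bits clear, so adding a two-bit
tag can never carry. The enum values and function names are assumptions made
for this example; only "LITERAL happens to be 0" comes from the message above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tag values, chosen for illustration.
enum { LITERAL = 0, COPY_1_BYTE_OFFSET = 1, COPY_2_BYTE_OFFSET = 2 };

// Tag built with OR: the compiler has to keep the subtraction, the shift
// and the OR as separate steps.
inline uint32_t TagWithOr(uint32_t len) {
  return COPY_2_BYTE_OFFSET | ((len - 1) << 2);
}

// Same tag built with addition. The low two bits of the shifted value are
// zero, so no carry can occur and the result matches the OR form.
inline uint32_t TagWithAdd(uint32_t len) {
  return COPY_2_BYTE_OFFSET + ((len - 1) << 2);
}

// What the addition form lets a compiler fold to: ((len - 1) << 2) equals
// (len << 2) - 4 in modular arithmetic, so the tag and the -4 merge into
// a single constant.
inline uint32_t TagFolded(uint32_t len) {
  return (len << 2) + (static_cast<uint32_t>(COPY_2_BYTE_OFFSET) - 4u);
}
```

The folded form is a shift-and-add with a constant, which is the shape that a
single LEA can encode on x86.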

R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@69 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2013-01-04 11:54:20 +00:00
snappy.mirrorbot@gmail.com 55209f9b92 Stop giving -Werror to automake, due to an incompatibility between current
versions of libtool and automake on non-GNU platforms (e.g. Mac OS X).

R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@68 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-10-08 11:37:16 +00:00
snappy.mirrorbot@gmail.com b86e81c8b3 Fix public issue 66: Document GetUncompressedLength better, in particular that
it leaves the source in a state that's not appropriate for RawUncompress.

R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@67 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-08-17 13:54:47 +00:00
snappy.mirrorbot@gmail.com 2e225ba821 Fix public issue 64: Check for <sys/time.h> at configure time,
since MSVC seemingly does not have it.

R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@66 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-07-31 11:44:44 +00:00
snappy.mirrorbot@gmail.com e89f20ab46 Handle the case where gettimeofday() goes backwards or returns the same value
twice; it could cause division by zero in the unit test framework.
(We already had one fix for this in place, but it was incomplete.)

This could in theory happen on any system, since there are few guarantees
about gettimeofday(), but seems to only happen in practice on GNU/Hurd, where
gettimeofday() is cached and only updated every so often.
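
A minimal sketch of the kind of guard described above, assuming timestamps
have already been converted to microseconds. The function names are
hypothetical and this is not the actual change made to the test framework; it
only shows how clamping the elapsed time avoids a zero or negative divisor.

```cpp
#include <cassert>
#include <cstdint>

// Returns stop - start in microseconds, but never less than 1, so that a
// clock that stood still or stepped backwards cannot yield a zero or
// negative interval.
inline int64_t SafeElapsedMicros(int64_t start_us, int64_t stop_us) {
  const int64_t elapsed = stop_us - start_us;
  return elapsed < 1 ? 1 : elapsed;
}

// Throughput computed from the clamped interval; the divisor is always >= 1.
inline double BytesPerSecond(int64_t bytes, int64_t start_us, int64_t stop_us) {
  return static_cast<double>(bytes) * 1e6 /
         static_cast<double>(SafeElapsedMicros(start_us, stop_us));
}
```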

R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@65 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-07-04 09:34:48 +00:00
snappy.mirrorbot@gmail.com 3ec60ac987 Mark ARMv4 as not supporting unaligned accesses (not just ARMv5 and ARMv6);
apparently Debian still targets these by default, giving us segfaults on
armel.

R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@64 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-07-04 09:28:33 +00:00
snappy.mirrorbot@gmail.com be80d6f74f Fix public bug #62: Remove an extraneous comma at the end of an enum list,
causing compile errors when embedded in Mozilla on OpenBSD.

R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@63 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-05-22 09:46:05 +00:00
snappy.mirrorbot@gmail.com 8b95464146 Snappy library no longer depends on iostream.
Achieved by moving logging macro definitions to a test-only
header file, and by changing non-test code to use assert,
fprintf, and abort instead of LOG/CHECK macros.
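
For illustration, a check built only from fprintf and abort might look like
the sketch below. The helper name CheckOrDie is an assumption for this
example and does not appear in the library.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: reports a failed check on stderr and aborts, without
// pulling in iostream or a logging framework.
inline void CheckOrDie(bool ok, const char* what) {
  if (!ok) {
    std::fprintf(stderr, "check failed: %s\n", what);
    std::abort();
  }
}
```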

R=sesse


git-svn-id: https://snappy.googlecode.com/svn/trunk@62 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-05-22 09:32:50 +00:00
snappy.mirrorbot@gmail.com fc723b212d Release Snappy 1.0.5.
R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@61 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-02-24 15:46:37 +00:00
snappy.mirrorbot@gmail.com dc63e0ad96 For 32-bit platforms, do not try to accelerate multiple neighboring
32-bit loads with a 64-bit load during compression (it's not a win).

The main target for this optimization is ARM, but 32-bit x86 gets
a small gain, too, although there is noise in the microbenchmarks.
It's a no-op for 64-bit x86. It does not affect decompression.
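
The idea can be sketched as follows, assuming a little-endian machine and
hypothetical helper names (none of these functions are taken from the Snappy
source). On a 64-bit platform one wide load serves both neighboring 32-bit
values; on a 32-bit platform the wide load is skipped in favor of two plain
loads.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

inline uint32_t Load32(const char* p) {
  uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

inline uint64_t Load64(const char* p) {
  uint64_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Little-endian only: the 32-bit value that starts byte_offset (0..4) bytes
// into an already loaded 64-bit word.
inline uint32_t Extract32(uint64_t wide, int byte_offset) {
  return static_cast<uint32_t>(wide >> (8 * byte_offset));
}

// Reads the 32-bit values at p and p + 1. The 64-bit branch reads 8 bytes,
// so p must point to at least 8 readable bytes.
inline void LoadNeighbors(const char* p, uint32_t* a, uint32_t* b) {
  if (sizeof(void*) == 8) {
    const uint64_t wide = Load64(p);  // one load, two extractions
    *a = Extract32(wide, 0);
    *b = Extract32(wide, 1);
  } else {
    *a = Load32(p);  // 32-bit platforms: two ordinary loads
    *b = Load32(p + 1);
  }
}
```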

Microbenchmark results on a Cortex-A9 1GHz, using g++ 4.6.2 (from
Ubuntu/Linaro), -O2 -DNDEBUG -Wa,-march=armv7a -mtune=cortex-a9
-mthumb-interwork, minimum 1000 iterations:

  Benchmark            Time(ns)    CPU(ns) Iterations
  ---------------------------------------------------
  BM_ZFlat/0            1158277    1160000       1000 84.2MB/s  html (23.57 %)    [ +4.3%]
  BM_ZFlat/1           14861782   14860000       1000 45.1MB/s  urls (50.89 %)    [ +1.1%]
  BM_ZFlat/2             393595     390000       1000 310.5MB/s  jpg (99.88 %)    [ +0.0%]
  BM_ZFlat/3             650583     650000       1000 138.4MB/s  pdf (82.13 %)    [ +3.1%]
  BM_ZFlat/4            4661480    4660000       1000 83.8MB/s  html4 (23.55 %)   [ +4.3%]
  BM_ZFlat/5             491973     490000       1000 47.9MB/s  cp (48.12 %)      [ +2.0%]
  BM_ZFlat/6             193575     192678       1038 55.2MB/s  c (42.40 %)       [ +9.0%]
  BM_ZFlat/7              62343      62754       3187 56.5MB/s  lsp (48.37 %)     [ +2.6%]
  BM_ZFlat/8           17708468   17710000       1000 55.5MB/s  xls (41.34 %)     [ -0.3%]
  BM_ZFlat/9            3755345    3760000       1000 38.6MB/s  txt1 (59.81 %)    [ +8.2%]
  BM_ZFlat/10           3324217    3320000       1000 36.0MB/s  txt2 (64.07 %)    [ +4.2%]
  BM_ZFlat/11          10139932   10140000       1000 40.1MB/s  txt3 (57.11 %)    [ +6.4%]
  BM_ZFlat/12          13532109   13530000       1000 34.0MB/s  txt4 (68.35 %)    [ +5.0%]
  BM_ZFlat/13           4690847    4690000       1000 104.4MB/s  bin (18.21 %)    [ +4.1%]
  BM_ZFlat/14            830682     830000       1000 43.9MB/s  sum (51.88 %)     [ +1.2%]
  BM_ZFlat/15             84784      85011       2235 47.4MB/s  man (59.36 %)     [ +1.1%]
  BM_ZFlat/16           1293254    1290000       1000 87.7MB/s  pb (23.15 %)      [ +2.3%]
  BM_ZFlat/17           2775155    2780000       1000 63.2MB/s  gaviota (38.27 %) [+12.2%]

Core i7 in 32-bit mode (only one run and 100 iterations, though, so noisy):

  Benchmark            Time(ns)    CPU(ns) Iterations
  ---------------------------------------------------
  BM_ZFlat/0             227582     223464       3043 437.0MB/s  html (23.57 %)    [ +7.4%]
  BM_ZFlat/1            2982430    2918455        233 229.4MB/s  urls (50.89 %)    [ +2.9%]
  BM_ZFlat/2              46967      46658      15217 2.5GB/s  jpg (99.88 %)       [ +0.0%]
  BM_ZFlat/3             115298     114864       5833 783.2MB/s  pdf (82.13 %)     [ +1.5%]
  BM_ZFlat/4             913440     899743        778 434.2MB/s  html4 (23.55 %)   [ +0.3%]
  BM_ZFlat/5             110302     108571       7000 216.1MB/s  cp (48.12 %)      [ +0.0%]
  BM_ZFlat/6              44409      43372      15909 245.2MB/s  c (42.40 %)       [ +0.8%]
  BM_ZFlat/7              15713      15643      46667 226.9MB/s  lsp (48.37 %)     [ +2.7%]
  BM_ZFlat/8            2625539    2602230        269 377.4MB/s  xls (41.34 %)     [ +1.4%]
  BM_ZFlat/9             808884     811429        875 178.8MB/s  txt1 (59.81 %)    [ -3.9%]
  BM_ZFlat/10            709532     700000       1000 170.5MB/s  txt2 (64.07 %)    [ +0.0%]
  BM_ZFlat/11           2177682    2162162        333 188.2MB/s  txt3 (57.11 %)    [ -1.4%]
  BM_ZFlat/12           2849640    2840000        250 161.8MB/s  txt4 (68.35 %)    [ -1.4%]
  BM_ZFlat/13            849760     835476        778 585.8MB/s  bin (18.21 %)     [ +1.2%]
  BM_ZFlat/14            165940     164571       4375 221.6MB/s  sum (51.88 %)     [ +1.4%]
  BM_ZFlat/15             20939      20571      35000 196.0MB/s  man (59.36 %)     [ +2.1%]
  BM_ZFlat/16            239209     236544       2917 478.1MB/s  pb (23.15 %)      [ +4.2%]
  BM_ZFlat/17            616206     610000       1000 288.2MB/s  gaviota (38.27 %) [ -1.6%]

R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@60 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-02-23 17:00:36 +00:00
snappy.mirrorbot@gmail.com f8829ea39d Enable the use of unaligned loads and stores for ARM-based architectures
where they are available (ARMv7 and higher). This gives a significant 
speed boost on ARM, both for compression and decompression. 
It should not affect x86 at all. 
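
As a rough, portable sketch of what an unaligned 32-bit access looks like
(not the macros used in the library; the function names are assumptions), a
fixed-size memcpy expresses the load or store without undefined behavior.
Compilers typically lower it to a single instruction on targets that permit
unaligned access.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads a 32-bit value from an address with no alignment requirement.
inline uint32_t UnalignedLoad32(const void* p) {
  uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Writes a 32-bit value to an address with no alignment requirement.
inline void UnalignedStore32(void* p, uint32_t v) {
  std::memcpy(p, &v, sizeof(v));
}
```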
 
There are more changes possible to speed up ARM, but it might not be 
that easy to do without hurting x86 or making the code uglier. 
Also, we do not try to use NEON yet.
 
Microbenchmark results on a Cortex-A9 1GHz, using g++ 4.6.2 (from Ubuntu/Linaro), 
-O2 -DNDEBUG -Wa,-march=armv7a -mtune=cortex-a9 -mthumb-interwork: 
 
Benchmark            Time(ns)    CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0             524806     529100        378 184.6MB/s  html            [+33.6%]
BM_UFlat/1            5139790    5200000        100 128.8MB/s  urls            [+28.8%]
BM_UFlat/2              86540      84166       1901 1.4GB/s  jpg               [ +0.6%]
BM_UFlat/3             215351     210176        904 428.0MB/s  pdf             [+29.8%]
BM_UFlat/4            2144490    2100000        100 186.0MB/s  html4           [+33.3%]
BM_UFlat/5             194482     190000       1000 123.5MB/s  cp              [+36.2%]
BM_UFlat/6              91843      90175       2107 117.9MB/s  c               [+38.6%]
BM_UFlat/7              28535      28426       6684 124.8MB/s  lsp             [+34.7%]
BM_UFlat/8            9206600    9200000        100 106.7MB/s  xls             [+42.4%]
BM_UFlat/9            1865273    1886792        106 76.9MB/s  txt1             [+32.5%]
BM_UFlat/10           1576809    1587301        126 75.2MB/s  txt2             [+32.3%]
BM_UFlat/11           4968450    4900000        100 83.1MB/s  txt3             [+32.7%]
BM_UFlat/12           6673970    6700000        100 68.6MB/s  txt4             [+32.8%]
BM_UFlat/13           2391470    2400000        100 203.9MB/s  bin             [+29.2%]
BM_UFlat/14            334601     344827        522 105.8MB/s  sum             [+30.6%]
BM_UFlat/15             37404      38080       5252 105.9MB/s  man             [+33.8%]
BM_UFlat/16            535470     540540        370 209.2MB/s  pb              [+31.2%]
BM_UFlat/17           1875245    1886792        106 93.2MB/s  gaviota          [+37.8%]
BM_UValidate/0         178425     179533       1114 543.9MB/s  html            [ +2.7%]
BM_UValidate/1        2100450    2000000        100 334.8MB/s  urls            [ +5.0%]
BM_UValidate/2           1039       1044     172413 113.3GB/s  jpg             [ +3.4%]
BM_UValidate/3          59423      59470       3363 1.5GB/s  pdf               [ +7.8%]
BM_UValidate/4         760716     766283        261 509.8MB/s  html4           [ +6.5%]
BM_ZFlat/0            1204632    1204819        166 81.1MB/s  html (23.57 %)   [+32.8%]
BM_ZFlat/1           15656190   15600000        100 42.9MB/s  urls (50.89 %)   [+27.6%]
BM_ZFlat/2             403336     410677        487 294.8MB/s  jpg (99.88 %)   [+16.5%]
BM_ZFlat/3             664073     671140        298 134.0MB/s  pdf (82.13 %)   [+28.4%]
BM_ZFlat/4            4961940    4900000        100 79.7MB/s  html4 (23.55 %)  [+30.6%]
BM_ZFlat/5             500664     501253        399 46.8MB/s  cp (48.12 %)     [+33.4%]
BM_ZFlat/6             217276     215982        926 49.2MB/s  c (42.40 %)      [+25.0%]
BM_ZFlat/7              64122      65487       3054 54.2MB/s  lsp (48.37 %)    [+36.1%]
BM_ZFlat/8           18045730   18000000        100 54.6MB/s  xls (41.34 %)    [+34.4%]
BM_ZFlat/9            4051530    4000000        100 36.3MB/s  txt1 (59.81 %)   [+25.0%]
BM_ZFlat/10           3451800    3500000        100 34.1MB/s  txt2 (64.07 %)   [+25.7%]
BM_ZFlat/11          11052340   11100000        100 36.7MB/s  txt3 (57.11 %)   [+24.3%]
BM_ZFlat/12          14538690   14600000        100 31.5MB/s  txt4 (68.35 %)   [+24.7%]
BM_ZFlat/13           5041850    5000000        100 97.9MB/s  bin (18.21 %)    [+32.0%]
BM_ZFlat/14            908840     909090        220 40.1MB/s  sum (51.88 %)    [+22.2%]
BM_ZFlat/15             86921      86206       1972 46.8MB/s  man (59.36 %)    [+42.2%]
BM_ZFlat/16           1312315    1315789        152 86.0MB/s  pb (23.15 %)     [+34.5%]
BM_ZFlat/17           3173120    3200000        100 54.9MB/s  gaviota (38.27%) [+28.1%]


The move from 64-bit to 32-bit operations for the copies also affected 32-bit x86;
positive on the decompression side, and slightly negative on the compression side
(unless that is noise; I only ran once):

Benchmark              Time(ns)    CPU(ns) Iterations
-----------------------------------------------------
BM_UFlat/0                86279      86140       7778 1.1GB/s  html             [ +7.5%]
BM_UFlat/1               839265     822622        778 813.9MB/s  urls           [ +9.4%]
BM_UFlat/2                 9180       9143      87500 12.9GB/s  jpg             [ +1.2%]
BM_UFlat/3                35080      35000      20000 2.5GB/s  pdf              [+10.1%]
BM_UFlat/4               350318     345000       2000 1.1GB/s  html4            [ +7.0%]
BM_UFlat/5                33808      33472      21212 701.0MB/s  cp             [ +9.0%]
BM_UFlat/6                15201      15214      46667 698.9MB/s  c              [+14.9%]
BM_UFlat/7                 4652       4651     159091 762.9MB/s  lsp            [ +7.5%]
BM_UFlat/8              1285551    1282528        538 765.7MB/s  xls            [+10.7%]
BM_UFlat/9               282510     281690       2414 514.9MB/s  txt1           [+13.6%]
BM_UFlat/10              243494     239286       2800 498.9MB/s  txt2           [+14.4%]
BM_UFlat/11              743625     740000       1000 550.0MB/s  txt3           [+14.3%]
BM_UFlat/12              999441     989717        778 464.3MB/s  txt4           [+16.1%]
BM_UFlat/13              412402     410076       1707 1.2GB/s  bin              [ +7.3%]
BM_UFlat/14               54876      54000      10000 675.3MB/s  sum            [+13.0%]
BM_UFlat/15                6146       6100     100000 660.8MB/s  man            [+14.8%]
BM_UFlat/16               90496      90286       8750 1.2GB/s  pb               [ +4.0%]
BM_UFlat/17              292650     292000       2500 602.0MB/s  gaviota        [+18.1%]
BM_UValidate/0            49620      49699      14286 1.9GB/s  html             [ +0.0%]
BM_UValidate/1           501371     500000       1000 1.3GB/s  urls             [ +0.0%]
BM_UValidate/2              232        227    3043478 521.5GB/s  jpg            [ +1.3%]
BM_UValidate/3            17250      17143      43750 5.1GB/s  pdf              [ -1.3%]
BM_UValidate/4           198643     200000       3500 1.9GB/s  html4            [ -0.9%]
BM_ZFlat/0               227128     229415       3182 425.7MB/s  html (23.57 %) [ -1.4%]
BM_ZFlat/1              2970089    2960000        250 226.2MB/s  urls (50.89 %) [ -1.9%]
BM_ZFlat/2                45683      44999      15556 2.6GB/s  jpg (99.88 %)    [ +2.2%]
BM_ZFlat/3               114661     113136       6364 795.1MB/s  pdf (82.13 %)  [ -1.5%]
BM_ZFlat/4               919702     914286        875 427.2MB/s  html4 (23.55%) [ -1.3%]
BM_ZFlat/5               108189     108422       6364 216.4MB/s  cp (48.12 %)   [ -1.2%]
BM_ZFlat/6                44525      44000      15909 241.7MB/s  c (42.40 %)    [ -2.9%]
BM_ZFlat/7                15973      15857      46667 223.8MB/s  lsp (48.37 %)  [ +0.0%]
BM_ZFlat/8              2677888    2639405        269 372.1MB/s  xls (41.34 %)  [ -1.4%]
BM_ZFlat/9               800715     780000       1000 186.0MB/s  txt1 (59.81 %) [ -0.4%]
BM_ZFlat/10              700089     700000       1000 170.5MB/s  txt2 (64.07 %) [ -2.9%]
BM_ZFlat/11             2159356    2138365        318 190.3MB/s  txt3 (57.11 %) [ -0.3%]
BM_ZFlat/12             2796143    2779923        259 165.3MB/s  txt4 (68.35 %) [ -1.4%]
BM_ZFlat/13              856458     835476        778 585.8MB/s  bin (18.21 %)  [ -0.1%]
BM_ZFlat/14              166908     166857       4375 218.6MB/s  sum (51.88 %)  [ -1.4%]
BM_ZFlat/15               21181      20857      35000 193.3MB/s  man (59.36 %)  [ -0.8%]
BM_ZFlat/16              244009     239973       2917 471.3MB/s  pb (23.15 %)   [ -1.4%]
BM_ZFlat/17              596362     590000       1000 297.9MB/s  gaviota (38.27%) [ +0.0%]

R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@59 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-02-21 17:02:17 +00:00
snappy.mirrorbot@gmail.com f2e184f638 Lower the size allocated in the "corrupted input" unit test from 256 MB
to 2 MB. This fixes issues with running the unit test on platforms with
little RAM (e.g. some ARM boards).

Also, reactivate the 2 MB test for 64-bit platforms; there's no good
reason why it shouldn't be.

R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@58 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-02-11 22:11:22 +00:00
snappy.mirrorbot@gmail.com e750dc0f05 Minor refactoring to accommodate changes in Google's internal code tree.
git-svn-id: https://snappy.googlecode.com/svn/trunk@57 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-01-08 17:55:48 +00:00
snappy.mirrorbot@gmail.com d9068ee301 Fix public issue r57: Fix most warnings with -Wall, mostly signed/unsigned
warnings. There are still some in the unit test, but the main .cc file should
be clean. We haven't enabled -Wall for the default build, since the unit test
is still not clean.

This also fixes a real bug in the open-source implementation of
ReadFileToStringOrDie(); it would not detect errors correctly.

I had to go through some pains to avoid performance loss as the types
were changed; I think there might still be some with 32-bit if and only if LFS
is enabled (ie., size_t is 64-bit), but for regular 32-bit and 64-bit I can't
see any losses, and I've diffed the generated GCC assembler between the old and
new code without seeing any significant differences. If anything, it's ever so
slightly faster.

This may or may not enable compression of very large blocks (>2^32 bytes)
when size_t is 64-bit, but I haven't checked, and it is still not a supported
case.


git-svn-id: https://snappy.googlecode.com/svn/trunk@56 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-01-04 13:10:46 +00:00
snappy.mirrorbot@gmail.com 0755c81519 Add a framing format description. We do not have any implementation of this at
the current point, but there seems to be enough of a general interest in the
topic (cf. public bug #34).

R=csilvers,sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@55 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-01-04 10:46:39 +00:00
snappy.mirrorbot@gmail.com d7eb2dc413 Speed up decompression by moving the refill check to the end of the loop.
This seems to work because in most of the branches, the compiler can evaluate
“ip_limit_ - ip” in a more efficient way than reloading ip_limit_ from memory
(either by already having the entire expression in a register, or reconstructing
it from “avail”, or something else). Memory loads, even from L1, are seemingly
costly in the big picture at the current decompression speeds.
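
The loop shape can be illustrated with a toy reader, which is an assumption
for this example and not the Snappy decompressor: it sums bytes from a
chunked source and checks for a refill at the bottom of each iteration,
right after ip has been advanced, instead of at the top.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical input source that hands out one non-empty chunk at a time.
struct ChunkedSource {
  std::vector<std::string> chunks;
  size_t next = 0;

  // Points [*ip, *ip_limit) at the next non-empty chunk; false when exhausted.
  bool Refill(const char** ip, const char** ip_limit) {
    while (next < chunks.size()) {
      const std::string& c = chunks[next++];
      if (!c.empty()) {
        *ip = c.data();
        *ip_limit = c.data() + c.size();
        return true;
      }
    }
    return false;
  }
};

// Sums every byte in the source. The refill check sits at the end of the
// loop body, where "ip_limit - ip" is computed from values that were just
// used and are likely still in registers.
inline uint64_t SumBytes(ChunkedSource* src) {
  const char* ip = nullptr;
  const char* ip_limit = nullptr;
  uint64_t sum = 0;
  if (!src->Refill(&ip, &ip_limit)) return 0;
  for (;;) {
    sum += static_cast<unsigned char>(*ip++);
    if (ip_limit - ip < 1) {  // refill check moved to the loop tail
      if (!src->Refill(&ip, &ip_limit)) break;
    }
  }
  return sum;
}
```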

Microbenchmarks (64-bit, opt mode):

Westmere (Intel Core i7):

  Benchmark     Time(ns)    CPU(ns) Iterations
  --------------------------------------------
  BM_UFlat/0       74492      74491     187894 1.3GB/s  html      [ +5.9%]
  BM_UFlat/1      712268     712263      19644 940.0MB/s  urls    [ +3.8%]
  BM_UFlat/2       10591      10590    1000000 11.2GB/s  jpg      [ -6.8%]
  BM_UFlat/3       29643      29643     469915 3.0GB/s  pdf       [ +7.9%]
  BM_UFlat/4      304669     304667      45930 1.3GB/s  html4     [ +4.8%]
  BM_UFlat/5       28508      28507     490077 823.1MB/s  cp      [ +4.0%]
  BM_UFlat/6       12415      12415    1000000 856.5MB/s  c       [ +8.6%]
  BM_UFlat/7        3415       3415    4084723 1039.0MB/s  lsp    [+18.0%]
  BM_UFlat/8      979569     979563      14261 1002.5MB/s  xls    [ +5.8%]
  BM_UFlat/9      230150     230148      60934 630.2MB/s  txt1    [ +5.2%]
  BM_UFlat/10     197167     197166      71135 605.5MB/s  txt2    [ +4.7%]
  BM_UFlat/11     607394     607390      23041 670.1MB/s  txt3    [ +5.6%]
  BM_UFlat/12     808502     808496      17316 568.4MB/s  txt4    [ +5.0%]
  BM_UFlat/13     372791     372788      37564 1.3GB/s  bin       [ +3.3%]
  BM_UFlat/14      44541      44541     313969 818.8MB/s  sum     [ +5.7%]
  BM_UFlat/15       4833       4833    2898697 834.1MB/s  man     [ +4.8%]
  BM_UFlat/16      79855      79855     175356 1.4GB/s  pb        [ +4.8%]
  BM_UFlat/17     245845     245843      56838 715.0MB/s  gaviota [ +5.8%]

Clovertown (Intel Core 2):

  Benchmark     Time(ns)    CPU(ns) Iterations
  --------------------------------------------
  BM_UFlat/0      107911     107890     100000 905.1MB/s  html    [ +2.2%]
  BM_UFlat/1     1011237    1011041      10000 662.3MB/s  urls    [ +2.5%]
  BM_UFlat/2       26775      26770     523089 4.4GB/s  jpg       [ +0.0%]
  BM_UFlat/3       48103      48095     290618 1.8GB/s  pdf       [ +3.4%]
  BM_UFlat/4      437724     437644      31937 892.6MB/s  html4   [ +2.1%]
  BM_UFlat/5       39607      39600     358284 592.5MB/s  cp      [ +2.4%]
  BM_UFlat/6       18227      18224     768191 583.5MB/s  c       [ +2.7%]
  BM_UFlat/7        5171       5170    2709437 686.4MB/s  lsp     [ +3.9%]
  BM_UFlat/8     1560291    1559989       8970 629.5MB/s  xls     [ +3.6%]
  BM_UFlat/9      335401     335343      41731 432.5MB/s  txt1    [ +3.0%]
  BM_UFlat/10     287014     286963      48758 416.0MB/s  txt2    [ +2.8%]
  BM_UFlat/11     888522     888356      15752 458.1MB/s  txt3    [ +2.9%]
  BM_UFlat/12    1186600    1186378      10000 387.3MB/s  txt4    [ +3.1%]
  BM_UFlat/13     572295     572188      24468 855.4MB/s  bin     [ +2.1%]
  BM_UFlat/14      64060      64049     218401 569.4MB/s  sum     [ +4.1%]
  BM_UFlat/15       7264       7263    1916168 555.0MB/s  man     [ +1.4%]
  BM_UFlat/16     108853     108836     100000 1039.1MB/s  pb     [ +1.7%]
  BM_UFlat/17     364289     364223      38419 482.6MB/s  gaviota [ +4.9%]

Barcelona (AMD Opteron):

  Benchmark     Time(ns)    CPU(ns) Iterations
  --------------------------------------------
  BM_UFlat/0      103900     103871     100000 940.2MB/s  html    [ +8.3%]
  BM_UFlat/1     1000435    1000107      10000 669.5MB/s  urls    [ +6.6%]
  BM_UFlat/2       24659      24652     567362 4.8GB/s  jpg       [ +0.1%]
  BM_UFlat/3       48206      48193     291121 1.8GB/s  pdf       [ +5.0%]
  BM_UFlat/4      421980     421850      33174 926.0MB/s  html4   [ +7.3%]
  BM_UFlat/5       40368      40357     346994 581.4MB/s  cp      [ +8.7%]
  BM_UFlat/6       19836      19830     708695 536.2MB/s  c       [ +8.0%]
  BM_UFlat/7        6100       6098    2292774 581.9MB/s  lsp     [ +9.0%]
  BM_UFlat/8     1693093    1692514       8261 580.2MB/s  xls     [ +8.0%]
  BM_UFlat/9      365991     365886      38225 396.4MB/s  txt1    [ +7.1%]
  BM_UFlat/10     311330     311238      44950 383.6MB/s  txt2    [ +7.6%]
  BM_UFlat/11     975037     974737      14376 417.5MB/s  txt3    [ +6.9%]
  BM_UFlat/12    1303558    1303175      10000 352.6MB/s  txt4    [ +7.3%]
  BM_UFlat/13     517448     517290      27144 946.2MB/s  bin     [ +5.5%]
  BM_UFlat/14      66537      66518     210352 548.3MB/s  sum     [ +7.5%]
  BM_UFlat/15       7976       7974    1760383 505.6MB/s  man     [ +5.6%]
  BM_UFlat/16     103121     103092     100000 1097.0MB/s  pb     [ +8.7%]
  BM_UFlat/17     391431     391314      35733 449.2MB/s  gaviota [ +6.5%]

R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@54 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-12-05 21:27:26 +00:00
snappy.mirrorbot@gmail.com 5ed51ce15f Speed up decompression by making the fast path for literals faster.
We do the fast-path step as soon as possible; in fact, as soon as we know the
literal length. Since we usually hit the fast path, we can then skip the checks
for long literals and available input space (beyond what the fast path check
already does).

Note that this changes the decompression Writer API; however, it does not
change the ABI, since writers are always templatized and as such never
cross compilation units. The new API is slightly more general, in that it
doesn't hard-code the value 16. Note that we also take care to check
for len <= 16 first, since the other two checks almost always succeed
(so we don't want to waste time checking for them until we have to).
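
As a rough illustration of the check order described above, here is a minimal sketch of a writer with a fast-append method. The class name SketchArrayWriter, the constant kFastPathBytes and the exact signature are assumptions made for this example; they are not taken from the Snappy sources.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical, simplified array writer; not Snappy's actual class.
class SketchArrayWriter {
 public:
  SketchArrayWriter(char* dst, size_t capacity)
      : base_(dst), op_(dst), op_limit_(dst + capacity) {}

  // Tries to append a literal of `len` bytes starting at `ip`, where
  // `available` input bytes are readable. Returns false (writing nothing)
  // when the fast path does not apply, so the caller can fall back.
  bool TryFastAppend(const char* ip, size_t available, size_t len) {
    const size_t space_left = static_cast<size_t>(op_limit_ - op_);
    // len is tested first: it is the check most likely to fail, while the
    // other two almost always succeed.
    if (len <= kFastPathBytes && available >= kFastPathBytes &&
        space_left >= kFastPathBytes) {
      // Fixed-size copy; bytes past `len` are scratch and get overwritten
      // by whatever is appended next.
      std::memcpy(op_, ip, kFastPathBytes);
      op_ += len;
      return true;
    }
    return false;
  }

  size_t Produced() const { return static_cast<size_t>(op_ - base_); }

 private:
  static const size_t kFastPathBytes = 16;  // assumed fast-path width
  char* base_;
  char* op_;
  char* op_limit_;
};
```

Because the width lives in the writer rather than in the caller, a different writer could pick another value without the decompression loop changing.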

The improvements are most marked on Nehalem, but are generally positive
on other platforms as well. All microbenchmarks are 64-bit, opt.

Clovertown (Core 2):

  Benchmark     Time(ns)    CPU(ns) Iterations
  --------------------------------------------
  BM_UFlat/0      110226     110224     100000 886.0MB/s  html    [ +1.5%]
  BM_UFlat/1     1036523    1036508      10000 646.0MB/s  urls    [ -0.8%]
  BM_UFlat/2       26775      26775     522570 4.4GB/s  jpg       [ +0.0%]
  BM_UFlat/3       49738      49737     280974 1.8GB/s  pdf       [ +0.3%]
  BM_UFlat/4      446790     446792      31334 874.3MB/s  html4   [ +0.8%]
  BM_UFlat/5       40561      40562     350424 578.5MB/s  cp      [ +1.3%]
  BM_UFlat/6       18722      18722     746903 568.0MB/s  c       [ +1.4%]
  BM_UFlat/7        5373       5373    2608632 660.5MB/s  lsp     [ +8.3%]
  BM_UFlat/8     1615716    1615718       8670 607.8MB/s  xls     [ +2.0%]
  BM_UFlat/9      345278     345281      40481 420.1MB/s  txt1    [ +1.4%]
  BM_UFlat/10     294855     294855      47452 404.9MB/s  txt2    [ +1.6%]
  BM_UFlat/11     914263     914263      15316 445.2MB/s  txt3    [ +1.1%]
  BM_UFlat/12    1222694    1222691      10000 375.8MB/s  txt4    [ +1.4%]
  BM_UFlat/13     584495     584489      23954 837.4MB/s  bin     [ -0.6%]
  BM_UFlat/14      66662      66662     210123 547.1MB/s  sum     [ +1.2%]
  BM_UFlat/15       7368       7368    1881856 547.1MB/s  man     [ +4.0%]
  BM_UFlat/16     110727     110726     100000 1021.4MB/s  pb     [ +2.3%]
  BM_UFlat/17     382138     382141      36616 460.0MB/s  gaviota [ -0.7%]

Westmere (Core i7):

  Benchmark     Time(ns)    CPU(ns) Iterations
  --------------------------------------------
  BM_UFlat/0       78861      78853     177703 1.2GB/s  html      [ +2.1%]
  BM_UFlat/1      739560     739491      18912 905.4MB/s  urls    [ +3.4%]
  BM_UFlat/2        9867       9866    1419014 12.0GB/s  jpg      [ +3.4%]
  BM_UFlat/3       31989      31986     438385 2.7GB/s  pdf       [ +0.2%]
  BM_UFlat/4      319406     319380      43771 1.2GB/s  html4     [ +1.9%]
  BM_UFlat/5       29639      29636     472862 791.7MB/s  cp      [ +5.2%]
  BM_UFlat/6       13478      13477    1000000 789.0MB/s  c       [ +2.3%]
  BM_UFlat/7        4030       4029    3475364 880.7MB/s  lsp     [ +8.7%]
  BM_UFlat/8     1036585    1036492      10000 947.5MB/s  xls     [ +6.9%]
  BM_UFlat/9      242127     242105      57838 599.1MB/s  txt1    [ +3.0%]
  BM_UFlat/10     206499     206480      67595 578.2MB/s  txt2    [ +3.4%]
  BM_UFlat/11     641635     641570      21811 634.4MB/s  txt3    [ +2.4%]
  BM_UFlat/12     848847     848769      16443 541.4MB/s  txt4    [ +3.1%]
  BM_UFlat/13     384968     384938      36366 1.2GB/s  bin       [ +0.3%]
  BM_UFlat/14      47106      47101     297770 774.3MB/s  sum     [ +4.4%]
  BM_UFlat/15       5063       5063    2772202 796.2MB/s  man     [ +7.7%]
  BM_UFlat/16      83663      83656     167697 1.3GB/s  pb        [ +1.8%]
  BM_UFlat/17     260224     260198      53823 675.6MB/s  gaviota [ -0.5%]

Barcelona (Opteron):

  Benchmark     Time(ns)    CPU(ns) Iterations
  --------------------------------------------
  BM_UFlat/0      112490     112457     100000 868.4MB/s  html    [ -0.4%]
  BM_UFlat/1     1066719    1066339      10000 627.9MB/s  urls    [ +1.0%]
  BM_UFlat/2       24679      24672     563802 4.8GB/s  jpg       [ +0.7%]
  BM_UFlat/3       50603      50589     277285 1.7GB/s  pdf       [ +2.6%]
  BM_UFlat/4      452982     452849      30900 862.6MB/s  html4   [ -0.2%]
  BM_UFlat/5       43860      43848     319554 535.1MB/s  cp      [ +1.2%]
  BM_UFlat/6       21419      21413     653573 496.6MB/s  c       [ +1.0%]
  BM_UFlat/7        6646       6645    2105405 534.1MB/s  lsp     [ +0.3%]
  BM_UFlat/8     1828487    1827886       7658 537.3MB/s  xls     [ +2.6%]
  BM_UFlat/9      391824     391714      35708 370.3MB/s  txt1    [ +2.2%]
  BM_UFlat/10     334913     334816      41885 356.6MB/s  txt2    [ +1.7%]
  BM_UFlat/11    1042062    1041674      10000 390.7MB/s  txt3    [ +1.1%]
  BM_UFlat/12    1398902    1398456      10000 328.6MB/s  txt4    [ +1.7%]
  BM_UFlat/13     545706     545530      25669 897.2MB/s  bin     [ -0.4%]
  BM_UFlat/14      71512      71505     196035 510.0MB/s  sum     [ +1.4%]
  BM_UFlat/15       8422       8421    1665036 478.7MB/s  man     [ +2.6%]
  BM_UFlat/16     112053     112048     100000 1009.3MB/s  pb     [ -0.4%]
  BM_UFlat/17     416723     416713      33612 421.8MB/s  gaviota [ -2.0%]

R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@53 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-11-23 11:14:17 +00:00
snappy.mirrorbot@gmail.com 0c1b9c3904 Fix public issue #53: Update the README to the API we actually open-sourced
with.

R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@52 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-11-08 14:46:39 +00:00
snappy.mirrorbot@gmail.com b61134bc0a In the format description, use a clearer example to emphasize that varints are
stored in little-endian. Patch from Christian von Roques.
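
For readers unfamiliar with the encoding, the following is a minimal sketch of a little-endian varint encoder: the low seven bits are emitted first, and the high bit of each byte marks that more bytes follow. The function name EncodeVarint32 is chosen for this example only.

```cpp
#include <cstdint>
#include <vector>

// Encodes `value` as a varint: 7 payload bits per byte, least significant
// group first, with the top bit set on every byte except the last.
std::vector<uint8_t> EncodeVarint32(uint32_t value) {
  std::vector<uint8_t> out;
  while (value >= 0x80) {
    // Low 7 bits plus the continuation bit.
    out.push_back(static_cast<uint8_t>(value | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));  // final byte, top bit clear
  return out;
}
```

With this sketch, 64 encodes as the single byte 0x40, and 2097150 (0x1FFFFE) encodes as 0xFE 0xFF 0x7F.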

R=csilvers


git-svn-id: https://snappy.googlecode.com/svn/trunk@51 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-10-05 12:27:12 +00:00
snappy.mirrorbot@gmail.com 21a2e4f557 Release Snappy 1.0.4.
R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@50 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-09-15 19:34:06 +00:00
snappy.mirrorbot@gmail.com e2e3032868 Fix public issue #50: Include generic byteswap macros.
Also include Solaris 10 and FreeBSD versions.
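
A portable fallback for platforms without a native byteswap header might look like the sketch below. It uses inline functions rather than macros, and the GenericBswap* names are assumptions for this example, not the identifiers used in the patch.

```cpp
#include <cstdint>

// Swaps the two bytes of a 16-bit value.
inline uint16_t GenericBswap16(uint16_t x) {
  return static_cast<uint16_t>((x << 8) | (x >> 8));
}

// Swaps bytes within each 16-bit half, then swaps the halves.
inline uint32_t GenericBswap32(uint32_t x) {
  x = ((x & 0xff00ff00u) >> 8) | ((x & 0x00ff00ffu) << 8);
  return (x >> 16) | (x << 16);
}

// Same idea in three steps: bytes, 16-bit words, then 32-bit halves.
inline uint64_t GenericBswap64(uint64_t x) {
  x = ((x & 0xff00ff00ff00ff00ull) >> 8) | ((x & 0x00ff00ff00ff00ffull) << 8);
  x = ((x & 0xffff0000ffff0000ull) >> 16) |
      ((x & 0x0000ffff0000ffffull) << 16);
  return (x >> 32) | (x << 32);
}
```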

R=csilvers


git-svn-id: https://snappy.googlecode.com/svn/trunk@49 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-09-15 09:50:05 +00:00
snappy.mirrorbot@gmail.com 593002da3c Partially fix public issue 50: Remove an extra comma from the end of some
enum declarations, as it seems the Sun compiler does not like it.

Based on patch by Travis Vitek.


git-svn-id: https://snappy.googlecode.com/svn/trunk@48 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-08-10 18:57:27 +00:00
snappy.mirrorbot@gmail.com f1063a5dc4 Use the right #ifdef test for sys/mman.h.
Based on patch by Travis Vitek.


git-svn-id: https://snappy.googlecode.com/svn/trunk@47 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-08-10 18:44:16 +00:00
snappy.mirrorbot@gmail.com 41c827a2fa Fix public issue #47: Small comment cleanups in the unit test.
Originally based on a patch by Patrick Pelletier.

R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@46 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-08-10 01:22:09 +00:00
snappy.mirrorbot@gmail.com 59aeffa604 Fix public issue #46: Format description said "3-byte offset"
instead of "4-byte offset" for the longest copies.

Also fix an inconsistency in the heading for section 2.2.3.
Both patches by Patrick Pelletier.

R=csilvers


git-svn-id: https://snappy.googlecode.com/svn/trunk@45 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-08-10 01:14:43 +00:00
snappy.mirrorbot@gmail.com 57e7cd7255 Fix public issue #44: Make the definition and declaration of CompressFragment
identical, even regarding cv-qualifiers.

This is required to work around a bug in the Solaris Studio C++ compiler
(it does not properly disregard cv-qualifiers when doing name mangling).
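
In standard C++, top-level const on a by-value parameter is not part of the function's type, so a declaration and a definition that differ only there name the same function. The sketch below shows the workaround of spelling both identically; CopyFragment is a hypothetical stand-in, not the real CompressFragment signature.

```cpp
#include <cstddef>
#include <cstring>

// Declaration, as it might appear in a header (hypothetical function).
char* CopyFragment(const char* input, const size_t input_size, char* op);

// Definition: the parameter list repeats the declaration token for token,
// including the top-level const on input_size, so even a compiler that
// wrongly mangles cv-qualifiers produces the same symbol for both.
char* CopyFragment(const char* input, const size_t input_size, char* op) {
  std::memcpy(op, input, input_size);
  return op + input_size;
}
```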

R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@44 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-06-28 11:40:25 +00:00
snappy.mirrorbot@gmail.com 13c4a449a8 Correct an inaccuracy in the Snappy format description.
(I stumbled into this when changing the way we decompress literals.) 

R=csilvers

Revision created by MOE tool push_codebase.


git-svn-id: https://snappy.googlecode.com/svn/trunk@43 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-06-04 10:19:05 +00:00
snappy.mirrorbot@gmail.com f540673740 Speed up decompression by removing a fast-path attempt.
Whenever we try to enter a copy fast-path, there is a certain cost in checking
that all the preconditions are in place, but it's normally offset by the fact
that we can usually take the cheaper path. However, in a certain path we've
already established that "avail < literal_length", which usually means that
either the available space is small, or the literal is big. Both will disqualify
us from taking the fast path, and thus we take the hit from the precondition
checking without gaining much from having a fast path. Thus, simply don't try
the fast path in this situation -- we're already on a slow path anyway
(one where we need to refill more data from the reader).

I'm a bit surprised at how much this gained; it could be that this path is
more common than I thought, or that the simpler structure somehow makes the
compiler happier. I haven't looked at the assembler, but it's a win across
the board on Core 2, Core i7 and Opteron, at least for the cases we
typically care about. The gains seem to be the largest on Core i7, though.
Results from my Core i7 workstation:


  Benchmark            Time(ns)    CPU(ns) Iterations
  ---------------------------------------------------
  BM_UFlat/0              73337      73091     190996 1.3GB/s  html      [ +1.7%]
  BM_UFlat/1             696379     693501      20173 965.5MB/s  urls    [ +2.7%]
  BM_UFlat/2               9765       9734    1472135 12.1GB/s  jpg      [ +0.7%]
  BM_UFlat/3              29720      29621     472973 3.0GB/s  pdf       [ +1.8%]
  BM_UFlat/4             294636     293834      47782 1.3GB/s  html4     [ +2.3%]
  BM_UFlat/5              28399      28320     494700 828.5MB/s  cp      [ +3.5%]
  BM_UFlat/6              12795      12760    1000000 833.3MB/s  c       [ +1.2%]
  BM_UFlat/7               3984       3973    3526448 893.2MB/s  lsp     [ +5.7%]
  BM_UFlat/8             991996     989322      14141 992.6MB/s  xls     [ +3.3%]
  BM_UFlat/9             228620     227835      61404 636.6MB/s  txt1    [ +4.0%]
  BM_UFlat/10            197114     196494      72165 607.5MB/s  txt2    [ +3.5%]
  BM_UFlat/11            605240     603437      23217 674.4MB/s  txt3    [ +3.7%]
  BM_UFlat/12            804157     802016      17456 573.0MB/s  txt4    [ +3.9%]
  BM_UFlat/13            347860     346998      40346 1.4GB/s  bin       [ +1.2%]
  BM_UFlat/14             44684      44559     315315 818.4MB/s  sum     [ +2.3%]
  BM_UFlat/15              5120       5106    2739726 789.4MB/s  man     [ +3.3%]
  BM_UFlat/16             76591      76355     183486 1.4GB/s  pb        [ +2.8%]
  BM_UFlat/17            238564     237828      58824 739.1MB/s  gaviota [ +1.6%]
  BM_UValidate/0          42194      42060     333333 2.3GB/s  html      [ -0.1%]
  BM_UValidate/1         433182     432005      32407 1.5GB/s  urls      [ -0.1%]
  BM_UValidate/2            197        196   71428571 603.3GB/s  jpg     [ +0.5%]
  BM_UValidate/3          14494      14462     972222 6.1GB/s  pdf       [ +0.5%]
  BM_UValidate/4         168444     167836      83832 2.3GB/s  html4     [ +0.1%]
	
R=jeff

Revision created by MOE tool push_codebase.


git-svn-id: https://snappy.googlecode.com/svn/trunk@42 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-06-03 20:53:06 +00:00
snappy.mirrorbot@gmail.com 197f3ee9f9 Speed up decompression by not needing a lookup table for literal items.
Looking up into and decoding the values from char_table has long shown up as a
hotspot in the decompressor. While it turns out that it's hard to make a more
efficient decoder for the copy ops, the literals are simple enough that we can
decode them without needing a table lookup. (This means that 1/4 of the table
is now unused, although that in itself doesn't buy us anything.)
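
To illustrate why literals need no table, here is a minimal sketch of decoding a literal header directly from the tag byte, following the published format description: the low two bits are 00 for a literal, and the upper six bits hold length-1, or 60..63 to signal that 1..4 little-endian length bytes follow. The struct and function names are assumptions for this example, and the sketch is not the decompressor's actual code.

```cpp
#include <cstddef>
#include <cstdint>

struct LiteralHeader {
  size_t header_bytes;    // tag byte plus any extra length bytes
  size_t literal_length;  // number of literal bytes that follow the header
};

// Returns false if ip[0] is not a literal tag or the header is truncated.
bool DecodeLiteralHeader(const uint8_t* ip, size_t available,
                         LiteralHeader* out) {
  if (available == 0 || (ip[0] & 0x3) != 0) return false;
  size_t len = (ip[0] >> 2) + 1u;  // short literals: length is in the tag
  size_t header = 1;
  if (len >= 61) {
    const size_t extra = len - 60;  // 1..4 length bytes follow the tag
    if (available < 1 + extra) return false;
    uint32_t stored = 0;
    for (size_t i = 0; i < extra; ++i) {
      stored |= static_cast<uint32_t>(ip[1 + i]) << (8 * i);
    }
    len = static_cast<size_t>(stored) + 1;  // stored value is length-1
    header += extra;
  }
  out->header_bytes = header;
  out->literal_length = len;
  return true;
}
```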

The gains are small, but definitely present; some tests win as much as 10%,
but 1-4% is more typical. These results are from Core i7, in 64-bit mode;
Core 2 and Opteron show similar results. (I've run with more iterations
than usual to make sure the smaller gains don't drown entirely in noise.)

  Benchmark            Time(ns)    CPU(ns) Iterations
  ---------------------------------------------------
  BM_UFlat/0              74665      74428     182055 1.3GB/s  html      [ +3.1%]
  BM_UFlat/1             714106     711997      19663 940.4MB/s  urls    [ +4.4%]
  BM_UFlat/2               9820       9789    1427115 12.1GB/s  jpg      [ -1.2%]
  BM_UFlat/3              30461      30380     465116 2.9GB/s  pdf       [ +0.8%]
  BM_UFlat/4             301445     300568      46512 1.3GB/s  html4     [ +2.2%]
  BM_UFlat/5              29338      29263     479452 801.8MB/s  cp      [ +1.6%]
  BM_UFlat/6              13004      12970    1000000 819.9MB/s  c       [ +2.1%]
  BM_UFlat/7               4180       4168    3349282 851.4MB/s  lsp     [ +1.3%]
  BM_UFlat/8            1026149    1024000      10000 959.0MB/s  xls     [+10.7%]
  BM_UFlat/9             237441     236830      59072 612.4MB/s  txt1    [ +0.3%]
  BM_UFlat/10            203966     203298      69307 587.2MB/s  txt2    [ +0.8%]
  BM_UFlat/11            627230     625000      22400 651.2MB/s  txt3    [ +0.7%]
  BM_UFlat/12            836188     833979      16787 551.0MB/s  txt4    [ +1.3%]
  BM_UFlat/13            351904     350750      39886 1.4GB/s  bin       [ +3.8%]
  BM_UFlat/14             45685      45562     308370 800.4MB/s  sum     [ +5.9%]
  BM_UFlat/15              5286       5270    2656546 764.9MB/s  man     [ +1.5%]
  BM_UFlat/16             78774      78544     178117 1.4GB/s  pb        [ +4.3%]
  BM_UFlat/17            242270     241345      58091 728.3MB/s  gaviota [ +1.2%]
  BM_UValidate/0          42149      42000     333333 2.3GB/s  html      [ -3.0%]
  BM_UValidate/1         432741     431303      32483 1.5GB/s  urls      [ +7.8%]
  BM_UValidate/2            198        197   71428571 600.7GB/s  jpg     [+16.8%]
  BM_UValidate/3          14560      14521     965517 6.1GB/s  pdf       [ -4.1%]
  BM_UValidate/4         169065     168671      83832 2.3GB/s  html4     [ -2.9%]

R=jeff

Revision created by MOE tool push_codebase.


git-svn-id: https://snappy.googlecode.com/svn/trunk@41 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-06-03 20:47:14 +00:00
snappy.mirrorbot@gmail.com 8efa2639e8 Release Snappy 1.0.3.
git-svn-id: https://snappy.googlecode.com/svn/trunk@40 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-06-02 22:57:41 +00:00
snappy.mirrorbot@gmail.com 2e12124bd8 Remove an unneeded goto in the decompressor; it turns out that the
state of ip_ after decompression (or attempted decompression) is
completely irrelevant, so we don't need the trailer.

Performance is, as expected, mostly flat -- there's a curious ~3-5%
loss in the "lsp" test, but that test case is so short it is hard to say
anything definitive about why (most likely, it's some sort of
unrelated effect).

R=jeff


git-svn-id: https://snappy.googlecode.com/svn/trunk@39 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-06-02 18:06:54 +00:00
snappy.mirrorbot@gmail.com c266bbf321 Speed up decompression by caching ip_.
It is seemingly hard for the compiler to understand that ip_, the current input
pointer into the compressed data stream, cannot alias anything else, and
thus using it directly will incur memory traffic as it cannot be kept in a
register. The code already knew about this and cached it into a local
variable, but since Step() only decoded one tag, it had to move ip_ back into
place between every tag. This seems to have cost us a significant amount of
performance, so Step() is now a function that decodes as much as it can
before it saves ip_ back and returns. (Note that Step() was already inlined,
so it is not the manual inlining that buys the performance here.)
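
The pattern can be sketched on a toy decoder. The format here (a count byte followed by that many literal bytes) and the class SketchDecoder are inventions for this example and bear no relation to the real tag decoding; the point is only that the member ip_ is copied into a local once, the whole loop runs on the local, and the member is written back once at the end.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical toy decoder used only to show the caching pattern.
class SketchDecoder {
 public:
  SketchDecoder(const char* in, size_t n) : ip_(in), ip_limit_(in + n) {}

  // Decodes every complete item into `out` and returns the bytes written.
  size_t DecodeAll(char* out) {
    const char* ip = ip_;  // local copy; its address is never taken
    char* op = out;
    while (ip < ip_limit_) {
      const size_t len = static_cast<unsigned char>(*ip++);
      if (static_cast<size_t>(ip_limit_ - ip) < len) break;  // truncated item
      // Stores through a char* may alias the members of *this as far as the
      // compiler knows, but they cannot alias the local `ip`.
      std::memcpy(op, ip, len);
      op += len;
      ip += len;
    }
    ip_ = ip;  // save the position back once, after the loop
    return static_cast<size_t>(op - out);
  }

  const char* position() const { return ip_; }

 private:
  const char* ip_;
  const char* ip_limit_;
};
```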

The wins are about 3-6% for Core 2, 6-13% on Core i7 and 5-12% on Opteron
(for plain array-to-array decompression, in 64-bit opt mode).

There is a tiny difference in the behavior here; if an invalid literal is
encountered (i.e., the writer refuses the Append() operation), ip_ will now
point to the byte past the tag byte, instead of where the literal was
originally thought to end. However, we don't use ip_ for anything after
DecompressAllTags() has returned, so this should not change external behavior
in any way.

Microbenchmark results for Core i7, 64-bit (Opteron results are similar):

Benchmark            Time(ns)    CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0              79134      79110       8835 1.2GB/s  html      [ +6.2%]
BM_UFlat/1             786126     786096        891 851.8MB/s  urls    [+10.0%]
BM_UFlat/2               9948       9948      69125 11.9GB/s  jpg      [ -1.3%]
BM_UFlat/3              31999      31998      21898 2.7GB/s  pdf       [ +6.5%]
BM_UFlat/4             318909     318829       2204 1.2GB/s  html4     [ +6.5%]
BM_UFlat/5              31384      31390      22363 747.5MB/s  cp      [ +9.2%]
BM_UFlat/6              14037      14034      49858 757.7MB/s  c       [+10.6%]
BM_UFlat/7               4612       4612     151395 769.5MB/s  lsp     [ +9.5%]
BM_UFlat/8            1203174    1203007        582 816.3MB/s  xls     [+19.3%]
BM_UFlat/9             253869     253955       2757 571.1MB/s  txt1    [+11.4%]
BM_UFlat/10            219292     219290       3194 544.4MB/s  txt2    [+12.1%]
BM_UFlat/11            672135     672131       1000 605.5MB/s  txt3    [+11.2%]
BM_UFlat/12            902512     902492        776 509.2MB/s  txt4    [+12.5%]
BM_UFlat/13            372110     371998       1881 1.3GB/s  bin       [ +5.8%]
BM_UFlat/14             50407      50407      10000 723.5MB/s  sum     [+13.5%]
BM_UFlat/15              5699       5701     100000 707.2MB/s  man     [+12.4%]
BM_UFlat/16             83448      83424       8383 1.3GB/s  pb        [ +5.7%]
BM_UFlat/17            256958     256963       2723 684.1MB/s  gaviota [ +7.9%]
BM_UValidate/0          42795      42796      16351 2.2GB/s  html      [+25.8%]
BM_UValidate/1         490672     490622       1427 1.3GB/s  urls      [+22.7%]
BM_UValidate/2            237        237    2950297 499.0GB/s  jpg     [+24.9%]
BM_UValidate/3          14610      14611      47901 6.0GB/s  pdf       [+26.8%]
BM_UValidate/4         171973     171990       4071 2.2GB/s  html4     [+25.7%]




git-svn-id: https://snappy.googlecode.com/svn/trunk@38 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-06-02 17:59:40 +00:00
snappy.mirrorbot@gmail.com d0ee043bc5 Fix the numbering of the headlines in the Snappy format description.
R=csilvers
DELTA=4  (0 added, 0 deleted, 4 changed)


Revision created by MOE tool push_codebase.
MOE_MIGRATION=1906


git-svn-id: https://snappy.googlecode.com/svn/trunk@37 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-05-17 08:48:25 +00:00
snappy.mirrorbot@gmail.com 6c7053871f Fix public issue #32: Add compressed format documentation for Snappy.
This text is new, but an earlier version from Zeev Tarantov was used
as reference.

R=csilvers
DELTA=112  (111 added, 0 deleted, 1 changed)


Revision created by MOE tool push_codebase.
MOE_MIGRATION=1867


git-svn-id: https://snappy.googlecode.com/svn/trunk@36 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-05-16 08:59:18 +00:00
snappy.mirrorbot@gmail.com a1f9f9973d Fix public issue #39: Pick out the median runs based on CPU time,
not real time. Also, use nth_element instead of sort, since we
only need one element.
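
A minimal sketch of the selection, assuming a hypothetical BenchmarkRun record with both timings (the struct and function names are not from the benchmark framework): std::nth_element places the median at the middle position without fully sorting the runs.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record for one benchmark run.
struct BenchmarkRun {
  double real_time_us;
  double cpu_time_us;
};

static bool LessByCpuTime(const BenchmarkRun& a, const BenchmarkRun& b) {
  return a.cpu_time_us < b.cpu_time_us;
}

// Returns the run with the median CPU time. Takes the vector by value
// because nth_element reorders its input.
BenchmarkRun MedianByCpuTime(std::vector<BenchmarkRun> runs) {
  assert(!runs.empty());
  std::vector<BenchmarkRun>::iterator mid = runs.begin() + runs.size() / 2;
  std::nth_element(runs.begin(), mid, runs.end(), LessByCpuTime);
  return *mid;
}
```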

R=csilvers
DELTA=5  (3 added, 0 deleted, 2 changed)


Revision created by MOE tool push_codebase.
MOE_MIGRATION=1799


git-svn-id: https://snappy.googlecode.com/svn/trunk@35 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-05-09 21:29:02 +00:00
snappy.mirrorbot@gmail.com f7b105683c Fix public issue #38: Make the microbenchmark framework properly
handle cases where gettimeofday() can return the same
result twice (as sometimes on GNU/Hurd) or go backwards
(as when the user adjusts the clock). We avoid a division-by-zero,
and put a lower bound on the number of iterations -- the same
amount as we use to calibrate.

We should probably use CLOCK_MONOTONIC for platforms that support
it, to be robust against clock adjustments; we already use Windows'
monotonic timers. However, that's for a later changelist.
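
The two guards described above can be sketched as follows. The function name, its parameters and the scaling formula are assumptions for illustration, not the framework's actual code: a zero or negative calibration time skips the division, and the result never drops below the calibration iteration count.

```cpp
#include <algorithm>
#include <cstdint>

// Picks how many iterations to run so that the benchmark takes roughly
// `target_us`, given that `calibration_iterations` took
// `calibration_elapsed_us` (which may be zero or negative if the clock
// stood still or went backwards).
int64_t ChooseIterations(int64_t calibration_elapsed_us,
                         int64_t calibration_iterations,
                         int64_t target_us) {
  int64_t iterations = 0;
  if (calibration_elapsed_us > 0) {  // avoids division by zero
    iterations = target_us * calibration_iterations / calibration_elapsed_us;
  }
  // Lower bound: never run fewer iterations than the calibration did.
  return std::max(iterations, calibration_iterations);
}
```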

R=csilvers
DELTA=7  (5 added, 0 deleted, 2 changed)


Revision created by MOE tool push_codebase.
MOE_MIGRATION=1798


git-svn-id: https://snappy.googlecode.com/svn/trunk@34 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-05-09 21:28:45 +00:00
snappy.mirrorbot@gmail.com d8d481427a Fix public issue #37: Only link snappy_unittest against -lz and other autodetected
libraries, not libsnappy.so (which doesn't need any such dependency).

R=csilvers
DELTA=20  (14 added, 0 deleted, 6 changed)


Revision created by MOE tool push_codebase.
MOE_MIGRATION=1710


git-svn-id: https://snappy.googlecode.com/svn/trunk@33 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-05-03 23:22:52 +00:00
snappy.mirrorbot@gmail.com bcecf195c0 Release Snappy 1.0.2, to get the license change and various other fixes into
a release.

R=csilvers
DELTA=239  (236 added, 0 deleted, 3 changed)


Revision created by MOE tool push_codebase.
MOE_MIGRATION=1709


git-svn-id: https://snappy.googlecode.com/svn/trunk@32 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-05-03 23:22:33 +00:00