Snappy update w.r.t. LZO 2.05

Zeev Tarantov zeev.tarantov at gmail.com
Sat Apr 30 20:13:28 UTC 2011


I have submitted a kernel port of Google's Snappy compression library:
http://driverdev.linuxdriverproject.org/pipermail/devel/2011-April/015122.html
http://driverdev.linuxdriverproject.org/pipermail/devel/2011-April/015126.html

It is significantly (x4) faster than the LZO code currently in the kernel.

LZO 2.05 was recently released. I assume the kernel will upgrade once a kernel
port is ready. The port is not completely trivial because of its use of
unaligned memory access, endianness handling and bitops.
This version adopts the following optimizations, which first appeared in Snappy:

1. 64 bit memory access
2. unaligned multibyte memory access
3. 32bit multiplication in hash function
4. ctz or clz of xor for determining match length
5. when compressing, skip one additional byte of input between match attempts
   for every 32 incompressible bytes of input seen.
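
For readers unfamiliar with these tricks, here is a minimal sketch of items 2 to 5 in C. It is not taken from the LZO or Snappy sources: the function names, the hash constant and the exact bookkeeping of the skip counter are assumptions made for illustration, and the bit-scan builtins assume GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned multibyte loads. memcpy is the portable spelling; compilers
 * turn it into a single load on machines that allow unaligned access. */
static inline uint32_t load32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

static inline uint64_t load64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hash 4 input bytes into a table of 2^bits slots with one 32-bit
 * multiplication. The constant is illustrative (an assumption). */
static inline uint32_t hash4(const unsigned char *p, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    return (load32(p) * 0x1e35a7bdu) >> (32 - bits);
}

/* Length of the common prefix of a and b, at most limit bytes.
 * Compare 8 bytes at a time; on a mismatch the xor is non-zero and
 * a bit scan locates the first differing byte. */
static size_t match_length(const unsigned char *a, const unsigned char *b,
                           size_t limit)
{
    size_t n = 0;

    while (n + 8 <= limit) {
        uint64_t x = load64(a + n) ^ load64(b + n);
        if (x != 0) {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
            /* big endian: first byte in memory is the most significant */
            return n + ((size_t)__builtin_clzll(x) >> 3);
#else
            /* little endian: first byte in memory is the least significant */
            return n + ((size_t)__builtin_ctzll(x) >> 3);
#endif
        }
        n += 8;
    }
    while (n < limit && a[n] == b[n])   /* byte-wise tail */
        n++;
    return n;
}

/* Skip heuristic: the distance between hash probes starts at 1 byte and
 * grows by one after every 32 probes that found no match. Reset the
 * counter whenever a match is emitted. */
static inline void scan_reset(uint32_t *skip)
{
    *skip = 32;
}

static inline uint32_t next_step(uint32_t *skip)
{
    return (*skip)++ >> 5;
}
```

A compressor loop would call next_step() after each failed probe to decide how far to advance, and scan_reset() after each match, so incompressible input (such as house.jpg below) is scanned with ever larger strides.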

Updated benchmark results, Google's Snappy test suite:

testdata/alice29.txt                     :
ZLIB:    [b 1M] bytes 152089 ->  54404 35.8%  comp   9.8 MB/s  uncomp 138.0 MB/s
LZO204:  [b 1M] bytes 152089 ->  82691 54.4%  comp  64.6 MB/s  uncomp 206.3 MB/s
LZO205:  [b 1M] bytes 152089 ->  87825 57.7%  comp 175.4 MB/s  uncomp 240.0 MB/s
CSNAPPY: [b 1M] bytes 152089 ->  90965 59.8%  comp 173.7 MB/s  uncomp 409.6 MB/s
SNAPPY:  [b 4M] bytes 152089 ->  90965 59.8%  comp 174.9 MB/s  uncomp 401.6 MB/s
testdata/asyoulik.txt                    :
ZLIB:    [b 1M] bytes 125179 ->  48897 39.1%  comp   9.0 MB/s  uncomp 131.0 MB/s
LZO204:  [b 1M] bytes 125179 ->  73217 58.5%  comp  59.6 MB/s  uncomp 202.1 MB/s
LZO205:  [b 1M] bytes 125179 ->  77041 61.5%  comp 164.4 MB/s  uncomp 237.4 MB/s
CSNAPPY: [b 1M] bytes 125179 ->  80207 64.1%  comp 163.6 MB/s  uncomp 387.7 MB/s
SNAPPY:  [b 4M] bytes 125179 ->  80207 64.1%  comp 164.6 MB/s  uncomp 378.9 MB/s
testdata/cp.html                         :
ZLIB:    [b 1M] bytes  24603 ->   7961 32.4%  comp  23.0 MB/s  uncomp 142.0 MB/s
LZO204:  [b 1M] bytes  24603 ->  11621 47.2%  comp  66.8 MB/s  uncomp 300.0 MB/s
LZO205:  [b 1M] bytes  24603 ->  11909 48.4%  comp 218.1 MB/s  uncomp 336.9 MB/s
CSNAPPY: [b 1M] bytes  24603 ->  11838 48.1%  comp 228.9 MB/s  uncomp 548.1 MB/s
SNAPPY:  [b 4M] bytes  24603 ->  11838 48.1%  comp 227.6 MB/s  uncomp 523.3 MB/s
testdata/fields.c                        :
ZLIB:    [b 1M] bytes  11150 ->   3122 28.0%  comp  25.2 MB/s  uncomp 147.5 MB/s
LZO204:  [b 1M] bytes  11150 ->   4663 41.8%  comp  86.2 MB/s  uncomp 304.5 MB/s
LZO205:  [b 1M] bytes  11150 ->   4711 42.3%  comp 253.3 MB/s  uncomp 346.1 MB/s
CSNAPPY: [b 1M] bytes  11150 ->   4728 42.4%  comp 251.7 MB/s  uncomp 536.5 MB/s
SNAPPY:  [b 4M] bytes  11150 ->   4728 42.4%  comp 249.6 MB/s  uncomp 515.2 MB/s
testdata/geo.protodata                   :
ZLIB:    [b 1M] bytes 118588 ->  15131 12.8%  comp  43.2 MB/s  uncomp 310.1 MB/s
LZO204:  [b 1M] bytes 118588 ->  20026 16.9%  comp 150.2 MB/s  uncomp 639.7 MB/s
LZO205:  [b 1M] bytes 118588 ->  23965 20.2%  comp 487.6 MB/s  uncomp 705.7 MB/s
CSNAPPY: [b 1M] bytes 118588 ->  27459 23.2%  comp 469.0 MB/s  uncomp 985.8 MB/s
SNAPPY:  [b 4M] bytes 118588 ->  27459 23.2%  comp 466.1 MB/s  uncomp 954.6 MB/s
testdata/grammar.lsp                     :
ZLIB:    [b 1M] bytes   3721 ->   1222 32.8%  comp  24.0 MB/s  uncomp 109.3 MB/s
LZO204:  [b 1M] bytes   3721 ->   1781 47.9%  comp  79.2 MB/s  uncomp 360.8 MB/s
LZO205:  [b 1M] bytes   3721 ->   1811 48.7%  comp 232.3 MB/s  uncomp 442.2 MB/s
CSNAPPY: [b 1M] bytes   3721 ->   1800 48.4%  comp 257.6 MB/s  uncomp 612.8 MB/s
SNAPPY:  [b 4M] bytes   3721 ->   1800 48.4%  comp 250.1 MB/s  uncomp 570.9 MB/s
testdata/house.jpg                       :
ZLIB:    [b 1M] bytes 126958 -> 126513  99.6%  comp   19.0 MB/s  uncomp  231.8 MB/s
LZO204:  [b 1M] bytes 126958 -> 127173 100.2%  comp   23.5 MB/s  uncomp 1635.4 MB/s
LZO205:  [b 1M] bytes 126958 -> 127303 100.3%  comp 1051.1 MB/s  uncomp 3762.4 MB/s
CSNAPPY: [b 1M] bytes 126958 -> 126803  99.9%  comp 2365.1 MB/s  uncomp 8190.2 MB/s
SNAPPY:  [b 4M] bytes 126958 -> 126803  99.9%  comp 2326.8 MB/s  uncomp 8402.5 MB/s
testdata/html                            :
ZLIB:    [b 1M] bytes 102400 ->  13699 13.4%  comp  35.6 MB/s  uncomp 273.4 MB/s
LZO204:  [b 1M] bytes 102400 ->  21027 20.5%  comp 135.7 MB/s  uncomp 494.3 MB/s
LZO205:  [b 1M] bytes 102400 ->  22547 22.0%  comp 421.6 MB/s  uncomp 557.5 MB/s
CSNAPPY: [b 1M] bytes 102400 ->  24140 23.6%  comp 425.8 MB/s  uncomp 873.0 MB/s
SNAPPY:  [b 4M] bytes 102400 ->  24140 23.6%  comp 422.9 MB/s  uncomp 845.4 MB/s
testdata/html_x_4                        :
ZLIB:    [b 1M] bytes 409600 ->  53367 13.0%  comp  32.1 MB/s  uncomp 277.7 MB/s
LZO204:  [b 1M] bytes 409600 ->  82980 20.3%  comp 143.3 MB/s  uncomp 487.0 MB/s
LZO205:  [b 1M] bytes 409600 ->  89475 21.8%  comp 428.2 MB/s  uncomp 556.1 MB/s
CSNAPPY: [b 1M] bytes 409600 ->  96472 23.6%  comp 423.4 MB/s  uncomp 870.8 MB/s
SNAPPY:  [b 4M] bytes 409600 ->  96472 23.6%  comp 418.3 MB/s  uncomp 830.5 MB/s
testdata/kennedy.xls                     :
ZLIB:    [b 1M] bytes 1029744 -> 203992 19.8%  comp  15.8 MB/s uncomp 230.0 MB/s
LZO204:  [b 1M] bytes 1029744 -> 357315 34.7%  comp 159.1 MB/s uncomp 624.6 MB/s
LZO205:  [b 1M] bytes 1029744 -> 362984 35.2%  comp 413.2 MB/s uncomp 736.1 MB/s
CSNAPPY: [b 1M] bytes 1029744 -> 425735 41.3%  comp 354.9 MB/s uncomp 564.4 MB/s
SNAPPY:  [b 4M] bytes 1029744 -> 425735 41.3%  comp 350.0 MB/s uncomp 513.0 MB/s
testdata/kppkn.gtb                       :
ZLIB:    [b 1M] bytes 184320 ->  38751 21.0%  comp   7.2 MB/s  uncomp 180.9 MB/s
LZO204:  [b 1M] bytes 184320 ->  71671 38.9%  comp  98.6 MB/s  uncomp 274.8 MB/s
LZO205:  [b 1M] bytes 184320 ->  71445 38.8%  comp 295.0 MB/s  uncomp 321.9 MB/s
CSNAPPY: [b 1M] bytes 184320 ->  70535 38.3%  comp 271.8 MB/s  uncomp 483.8 MB/s
SNAPPY:  [b 4M] bytes 184320 ->  70535 38.3%  comp 273.9 MB/s  uncomp 464.5 MB/s
testdata/lcet10.txt                      :
ZLIB:    [b 1M] bytes 426754 -> 144904 34.0%  comp  10.0 MB/s  uncomp 142.8 MB/s
LZO204:  [b 1M] bytes 426754 -> 221290 51.9%  comp  67.3 MB/s  uncomp 212.3 MB/s
LZO205:  [b 1M] bytes 426754 -> 236699 55.5%  comp 182.2 MB/s  uncomp 248.3 MB/s
CSNAPPY: [b 1M] bytes 426754 -> 243710 57.1%  comp 181.7 MB/s  uncomp 437.4 MB/s
SNAPPY:  [b 4M] bytes 426754 -> 243710 57.1%  comp 183.0 MB/s  uncomp 428.3 MB/s
testdata/mapreduce-osdi-1.pdf            :
ZLIB:    [b 1M] bytes  94330 ->  74928  79.4%  comp   22.4 MB/s  uncomp  177.9 MB/s
LZO204:  [b 1M] bytes  94330 ->  76999  81.6%  comp   29.0 MB/s  uncomp  938.7 MB/s
LZO205:  [b 1M] bytes  94330 ->  94704 100.4%  comp 1057.4 MB/s  uncomp 3974.6 MB/s
CSNAPPY: [b 1M] bytes  94330 ->  77477  82.1%  comp  833.6 MB/s  uncomp 2115.4 MB/s
SNAPPY:  [b 4M] bytes  94330 ->  77477  82.1%  comp  832.2 MB/s  uncomp 1997.5 MB/s
testdata/plrabn12.txt                    :
ZLIB:    [b 1M] bytes 481861 -> 195261 40.5%  comp   7.5 MB/s  uncomp 130.1 MB/s
LZO204:  [b 1M] bytes 481861 -> 294610 61.1%  comp  59.1 MB/s  uncomp 192.3 MB/s
LZO205:  [b 1M] bytes 481861 -> 314012 65.2%  comp 155.7 MB/s  uncomp 229.7 MB/s
CSNAPPY: [b 1M] bytes 481861 -> 329339 68.3%  comp 153.4 MB/s  uncomp 363.5 MB/s
SNAPPY:  [b 4M] bytes 481861 -> 329339 68.3%  comp 154.5 MB/s  uncomp 354.9 MB/s
testdata/ptt5                            :
ZLIB:    [b 1M] bytes 513216 ->  56465 11.0%  comp  25.8 MB/s  uncomp 269.0 MB/s
LZO204:  [b 1M] bytes 513216 ->  86232 16.8%  comp 139.7 MB/s  uncomp 590.6 MB/s
LZO205:  [b 1M] bytes 513216 ->  87278 17.0%  comp 551.6 MB/s  uncomp 667.6 MB/s
CSNAPPY: [b 1M] bytes 513216 ->  93455 18.2%  comp 555.0 MB/s  uncomp 845.6 MB/s
SNAPPY:  [b 4M] bytes 513216 ->  93455 18.2%  comp 553.1 MB/s  uncomp 795.0 MB/s
testdata/sum                             :
ZLIB:    [b 1M] bytes  38240 ->  12990 34.0%  comp  13.9 MB/s  uncomp 144.6 MB/s
LZO204:  [b 1M] bytes  38240 ->  17686 46.2%  comp  67.1 MB/s  uncomp 311.0 MB/s
LZO205:  [b 1M] bytes  38240 ->  18086 47.3%  comp 230.6 MB/s  uncomp 373.5 MB/s
CSNAPPY: [b 1M] bytes  38240 ->  19837 51.9%  comp 228.7 MB/s  uncomp 513.1 MB/s
SNAPPY:  [b 4M] bytes  38240 ->  19837 51.9%  comp 226.7 MB/s  uncomp 479.2 MB/s
testdata/urls.10K                        :
ZLIB:    [b 1M] bytes 702087 -> 222613 31.7%  comp  18.2 MB/s  uncomp 160.0 MB/s
LZO204:  [b 1M] bytes 702087 -> 309320 44.1%  comp  64.5 MB/s  uncomp 309.2 MB/s
LZO205:  [b 1M] bytes 702087 -> 345814 49.3%  comp 226.3 MB/s  uncomp 376.5 MB/s
CSNAPPY: [b 1M] bytes 702087 -> 357267 50.9%  comp 240.1 MB/s  uncomp 645.5 MB/s
SNAPPY:  [b 4M] bytes 702087 -> 357267 50.9%  comp 239.3 MB/s  uncomp 598.7 MB/s
testdata/xargs.1                         :
ZLIB:    [b 1M] bytes   4227 ->   1736 41.1%  comp  23.2 MB/s  uncomp 104.0 MB/s
LZO204:  [b 1M] bytes   4227 ->   2450 58.0%  comp  65.2 MB/s  uncomp 333.1 MB/s
LZO205:  [b 1M] bytes   4227 ->   2468 58.4%  comp 192.3 MB/s  uncomp 392.1 MB/s
CSNAPPY: [b 1M] bytes   4227 ->   2509 59.4%  comp 215.9 MB/s  uncomp 499.1 MB/s
SNAPPY:  [b 4M] bytes   4227 ->   2509 59.4%  comp 208.7 MB/s  uncomp 477.0 MB/s

These results show that Snappy decompresses ~50% faster than LZO 2.05, while
compression speed is about the same. LZO gave up some of its compression ratio
advantage: 2.05 sits roughly halfway between 2.04 and Snappy.

Results from my block compressor, which works on 4KB at a time (simulating
zram), on some big files from my /usr directory:

compressing: /usr/lib64/chromium-browser/chrome
compressor: SNAPPY
#pages: 10392
> 100%	:341
> 50%	:8445
<= 50%	:1606
0.174652181 seconds
ratio: 27932299 * 100 / 42562848 = 65 %
compressor: LZO
#pages: 10392
> 100%	:495
> 50%	:8080
<= 50%	:1817
0.220447504 seconds
ratio: 27150908 * 100 / 42562848 = 63 %
compressor: ZLIB
#pages: 10392
> 100%	:0
> 50%	:5800
<= 50%	:4592
2.395360610 seconds
ratio: 20904235 * 100 / 42562848 = 49 %
compressing: /usr/lib64/qt4/libQtWebKit.so.4.7.2
compressor: SNAPPY
#pages: 5342
> 100%	:219
> 50%	:3405
<= 50%	:1718
0.080079531 seconds
ratio: 13290800 * 100 / 21877760 = 60 %
compressor: LZO
#pages: 5342
> 100%	:272
> 50%	:3281
<= 50%	:1789
0.100200702 seconds
ratio: 12737811 * 100 / 21877760 = 58 %
compressor: ZLIB
#pages: 5342
> 100%	:142
> 50%	:2464
<= 50%	:2736
1.147235809 seconds
ratio: 9903402 * 100 / 21877760 = 45 %
compressing: /usr/lib64/llvm/libLLVM-2.9.so
compressor: SNAPPY
#pages: 3472
> 100%	:44
> 50%	:2384
<= 50%	:1044
0.055121943 seconds
ratio: 8493554 * 100 / 14219992 = 59 %
compressor: LZO
#pages: 3472
> 100%	:53
> 50%	:2355
<= 50%	:1064
0.068662186 seconds
ratio: 8213334 * 100 / 14219992 = 57 %
compressor: ZLIB
#pages: 3472
> 100%	:12
> 50%	:1728
<= 50%	:1732
0.766150075 seconds
ratio: 6221694 * 100 / 14219992 = 43 %
compressing: /usr/lib64/xulrunner-2.0/libxul.so
compressor: SNAPPY
#pages: 7187
> 100%	:229
> 50%	:4432
<= 50%	:2526
0.108149693 seconds
ratio: 17455680 * 100 / 29433888 = 59 %
compressor: LZO
#pages: 7187
> 100%	:253
> 50%	:4287
<= 50%	:2647
0.135244136 seconds
ratio: 16596460 * 100 / 29433888 = 56 %
compressor: ZLIB
#pages: 7187
> 100%	:1
> 50%	:3021
<= 50%	:4165
1.610910737 seconds
ratio: 12248775 * 100 / 29433888 = 41 %
compressing: /usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.1-pre9999/cc1
compressor: SNAPPY
#pages: 3608
> 100%	:68
> 50%	:2168
<= 50%	:1372
0.056032193 seconds
ratio: 7728033 * 100 / 14775120 = 52 %
compressor: LZO
#pages: 3608
> 100%	:72
> 50%	:1975
<= 50%	:1561
0.069680830 seconds
ratio: 7384676 * 100 / 14775120 = 49 %
compressor: ZLIB
#pages: 3608
> 100%	:2
> 50%	:306
<= 50%	:3300
0.789069806 seconds
ratio: 5493265 * 100 / 14775120 = 37 %
compressing: /usr/lib64/libnvidia-glcore.so.270.41.03
compressor: SNAPPY
#pages: 6710
> 100%	:74
> 50%	:2614
<= 50%	:4022
0.084111724 seconds
ratio: 12860385 * 100 / 27481328 = 46 %
compressor: LZO
#pages: 6710
> 100%	:89
> 50%	:2436
<= 50%	:4185
0.103006618 seconds
ratio: 12051888 * 100 / 27481328 = 43 %
compressor: ZLIB
#pages: 6710
> 100%	:1
> 50%	:1633
<= 50%	:5076
1.216785009 seconds
ratio: 8641291 * 100 / 27481328 = 31 %
compressing: /usr/lib64/gcc/x86_64-pc-linux-gnu/4.6.1-pre9999/libgcj.so.12.0.0
compressor: SNAPPY
#pages: 15133
> 100%	:190
> 50%	:5105
<= 50%	:9838
0.193854352 seconds
ratio: 27131163 * 100 / 61982968 = 43 %
compressor: LZO
#pages: 15133
> 100%	:201
> 50%	:4323
<= 50%	:10609
0.235593989 seconds
ratio: 24944283 * 100 / 61982968 = 40 %
compressor: ZLIB
#pages: 15133
> 100%	:63
> 50%	:317
<= 50%	:14753
2.943011502 seconds
ratio: 18266667 * 100 / 61982968 = 29 %
compressing: /usr/lib64/libwireshark.so.0.0.1
compressor: SNAPPY
#pages: 11341
> 100%	:64
> 50%	:2982
<= 50%	:8295
0.130238274 seconds
ratio: 19576418 * 100 / 46449592 = 42 %
compressor: LZO
#pages: 11341
> 100%	:86
> 50%	:2565
<= 50%	:8690
0.157854033 seconds
ratio: 17765477 * 100 / 46449592 = 38 %
compressor: ZLIB
#pages: 11341
> 100%	:1
> 50%	:1219
<= 50%	:10121
2.020140289 seconds
ratio: 12565102 * 100 / 46449592 = 27 %
compressing: /usr/share/icons/oxygen/icon-theme.cache
compressor: SNAPPY
#pages: 43411
> 100%	:0
> 50%	:7777
<= 50%	:35634
0.441581102 seconds
ratio: 60247441 * 100 / 177810480 = 33 %
compressor: LZO
#pages: 43411
> 100%	:31
> 50%	:7801
<= 50%	:35579
0.547072992 seconds
ratio: 59064132 * 100 / 177810480 = 33 %
compressor: ZLIB
#pages: 43411
> 100%	:0
> 50%	:2464
<= 50%	:40947
6.256616084 seconds
ratio: 42305375 * 100 / 177810480 = 23 %

This shows that Snappy in the configuration for zram (4KB at a time, 8KB working
memory) is about 20% faster than LZO 2.05, while its compression ratios are up
to 4 percentage points worse (42% vs. 38% for libwireshark).
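
For anyone who wants to reproduce the per-file reports above with another compressor, the bookkeeping could look roughly like the sketch below. This is not my actual test program: the struct and function names are made up for illustration, and I am assuming that "> 100%" means the output was larger than the input page, that "> 50%" means larger than half the page but not larger than the page, and that the printed ratio is a truncating integer division of total output bytes by total input bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u

/* Running totals for one (file, compressor) pair. */
struct page_stats {
    uint64_t pages;      /* "#pages"                                   */
    uint64_t over_100;   /* "> 100%": output larger than the input     */
    uint64_t over_50;    /* "> 50%":  more than half, at most the page */
    uint64_t upto_50;    /* "<= 50%": at most half of the input        */
    uint64_t bytes_in;   /* total uncompressed bytes                   */
    uint64_t bytes_out;  /* total compressed bytes                     */
};

/* Record one page: in_len is the page length (the last page of a file
 * may be short), out_len is what the compressor produced for it. */
static void tally_page(struct page_stats *s, size_t in_len, size_t out_len)
{
    assert(in_len > 0 && in_len <= PAGE_SIZE_BYTES);

    s->pages++;
    s->bytes_in += in_len;
    s->bytes_out += out_len;

    if (out_len > in_len)
        s->over_100++;
    else if (2 * out_len > in_len)
        s->over_50++;
    else
        s->upto_50++;
}

/* Overall ratio in percent, truncated like "27932299 * 100 / 42562848". */
static unsigned ratio_percent(const struct page_stats *s)
{
    assert(s->bytes_in > 0);
    return (unsigned)(s->bytes_out * 100 / s->bytes_in);
}
```

Because the percentage is truncated, two compressors that print the same ratio (as for icon-theme.cache) can still differ by almost a full point; compare the byte totals when the difference matters.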

It seems LZO now has a state of the art LZ implementation optimized for the
currently popular platform. Benchmarks on other architectures, for both Snappy
and the new LZO code, are welcome.

In light of these developments I agree that upgrading to Snappy is not worth the
 potential trouble, though it is faster and is tested in kernel-space on ppc32
 and arm (in qemu).

I would like to thank Nitin Gupta for his ack:
http://driverdev.linuxdriverproject.org/pipermail/devel/2011-April/015546.html

If there is interest in merging Snappy, I am more than willing to continue
 working to address any issues anyone cares to raise (currently I am aware of
 none).

Zram still needs a faster entropy-coder than zlib. Maybe something can be done.

-Z.T.