[PATCHv4 0/7] zswap: compressed swap caching

Seth Jennings sjenning at linux.vnet.ibm.com
Mon Feb 4 21:45:12 UTC 2013


On 02/04/2013 03:30 PM, Seth Jennings wrote:
> On 01/29/2013 03:40 PM, Seth Jennings wrote:
>> Sorry for the churn but just this set might be easier to review.
>> The code required for the flushing is in a separate patch now
>> as requested.

I've got a large and valuable body of feedback to integrate for v5.
Thanks to all that reviewed/commented!

It will take a little time to compile it all and coordinate with
Minchan and Nitin on the additional documentation and rationale for
zsmalloc.

I just wanted to acknowledge the feedback and state that I'm working
on v5 and I'll get it out as soon as I can.

Thanks,
Seth

>
>>
>> Changelog:
>>
>> v4:
>> * Added Acks (Minchan)
>> * Separated flushing functionality into standalone patch
>>   for easier review (Minchan)
>> * fix comment on zswap enabled attribute (Minchan)
>> * add TODO for dynamic mempool size (Minchan)
>> * and check for NULL in zswap_free_page() (Minchan)
>> * add missing zs_free() in error path (Minchan)
>> * TODO: add comments for flushing/refcounting (Minchan)
>>
>> NOTE: To build, read this:
>> http://lkml.org/lkml/2013/1/28/586
>>
>> v3:
>> * Dropped the zsmalloc patches from the set, except the promotion patch
>>   which has be converted to a rename patch (vs full diff).  The dropped
>>   patches have been Acked and are going into Greg's staging tree soon.
>> * Separated [PATCHv2 7/9] into two patches since it makes changes for two
>>   different reasons (Minchan)
>> * Moved ZSWAP_MAX_OUTSTANDING_FLUSHES near the top in zswap.c (Rik)
>> * Rebase to v3.8-rc5. linux-next is a little volatile with the
>>   swapper_space per type changes which will effect this patchset.
>> * TODO: Move some stats from debugfs to sysfs. Which ones? (Rik)
>>
>> v2:
>> * Rename zswap_fs_* functions to zswap_frontswap_* to avoid
>>   confusion with "filesystem"
>> * Add comment about what the tree lock protects
>> * Remove "#if 0" code (should have been done before)
>> * Break out changes to existing swap code into separate patch
>> * Fix blank line EOF warning on documentation file
>> * Rebase to next-20130107
>>
>> Zswap Overview:
>>
>> Zswap is a lightweight compressed cache for swap pages. It takes
>> pages that are in the process of being swapped out and attempts to
>> compress them into a dynamically allocated RAM-based memory pool.
>> If this process is successful, the writeback to the swap device is
>> deferred and, in many cases, avoided completely.  This results in
>> a significant I/O reduction and performance gains for systems that
>> are swapping.
>>
>> The results of a kernel building benchmark indicate a
>> runtime reduction of 53% and an I/O reduction 76% with zswap vs normal
>> swapping with a kernel build under heavy memory pressure (see
>> Performance section for more).
>>
>> Some addition performance metrics regarding the performance
>> improvements and I/O reductions that can be achieved using zswap as
>> measured by SPECjbb are provided here:
>>
>> http://ibm.co/VCgHvM
>>
>> These results include runs on x86 and new results on Power7+ with
>> hardware compression acceleration.
>>
>> Of particular note is that zswap is able to evict pages from the compressed
>> cache, on an LRU basis, to the backing swap device when the compressed pool
>> reaches it size limit or the pool is unable to obtain additional pages
>> from the buddy allocator.  This eviction functionality had been identified
>> as a requirement in prior community discussions.
>>
>> Patchset Structure:
>> 1:   add atomic_t get/set to debugfs
>> 2:   promote zsmalloc to /lib
>> 3,4: changes to existing swap code for zswap
>> 5,6: add zswap and documentation
>>
>> Rationale:
>>
>> Zswap provides compressed swap caching that basically trades CPU cycles
>> for reduced swap I/O.  This trade-off can result in a significant
>> performance improvement as reads to/writes from to the compressed
>> cache almost always faster that reading from a swap device
>> which incurs the latency of an asynchronous block I/O read.
>>
>> Some potential benefits:
>> * Desktop/laptop users with limited RAM capacities can mitigate the
>>     performance impact of swapping.
>> * Overcommitted guests that share a common I/O resource can
>>     dramatically reduce their swap I/O pressure, avoiding heavy
>>     handed I/O throttling by the hypervisor.  This allows more work
>>     to get done with less impact to the guest workload and guests
>>     sharing the I/O subsystem
>> * Users with SSDs as swap devices can extend the life of the device by
>>     drastically reducing life-shortening writes.
>>
>> Compressed swap is also provided in zcache, along with page cache
>> compression and RAM clustering through RAMSter. Zswap seeks to deliver
>> the benefit of swap  compression to users in a discrete function.
>> This design decision is akin to Unix design philosophy of doing one
>> thing well, it leaves file cache compression and other features
>> for separate code.
>>
>> Design:
>>
>> Zswap receives pages for compression through the Frontswap API and
>> is able to evict pages from its own compressed pool on an LRU basis
>> and write them back to the backing swap device in the case that the
>> compressed pool is full or unable to secure additional pages from
>> the buddy allocator.
>>
>> Zswap makes use of zsmalloc for the managing the compressed memory
>> pool.  This is because zsmalloc is specifically designed to minimize
>> fragmentation on large (> PAGE_SIZE/2) allocation sizes.  Each
>> allocation in zsmalloc is not directly accessible by address.
>> Rather, a handle is return by the allocation routine and that handle
>> must be mapped before being accessed.  The compressed memory pool grows
>> on demand and shrinks as compressed pages are freed.  The pool is
>> not preallocated.
>>
>> When a swap page is passed from frontswap to zswap, zswap maintains
>> a mapping of the swap entry, a combination of the swap type and swap
>> offset, to the zsmalloc handle that references that compressed swap
>> page.  This mapping is achieved with a red-black tree per swap type.
>> The swap offset is the search key for the tree nodes.
>>
>> Zswap seeks to be simple in its policies.  Sysfs attributes allow for
>> two user controlled policies:
>> * max_compression_ratio - Maximum compression ratio, as as percentage,
>>     for an acceptable compressed page. Any page that does not compress
>>     by at least this ratio will be rejected.
>> * max_pool_percent - The maximum percentage of memory that the compressed
>>     pool can occupy.
>>
>> To enabled zswap, the "enabled" attribute must be set to 1 at boot time.
>>
>> Zswap allows the compressor to be selected at kernel boot time by
>> setting the “compressor” attribute.  The default compressor is lzo.
>>
>> A debugfs interface is provided for various statistic about pool size,
>> number of pages stored, and various counters for the reasons pages
>> are rejected.
>>
>> Performance, Kernel Building:
>>
>> Setup
>> ========
>> Gentoo w/ kernel v3.7-rc7
>> Quad-core i5-2500 @ 3.3GHz
>> 512MB DDR3 1600MHz (limited with mem=512m on boot)
>> Filesystem and swap on 80GB HDD (about 58MB/s with hdparm -t)
>> majflt are major page faults reported by the time command
>> pswpin/out is the delta of pswpin/out from /proc/vmstat before and after
>> the make -jN
>>
>> Summary
>> ========
>> * Zswap reduces I/O and improves performance at all swap pressure levels.
>>
>> * Under heavy swaping at 24 threads, zswap reduced I/O by 76%, saving
>>   over 1.5GB of I/O, and cut runtime in half.
>>
>> Details
>> ========
>> I/O (in pages)
>> 	base				zswap				change	change
>> N	pswpin	pswpout	majflt	I/O sum	pswpin	pswpout	majflt	I/O sum	%I/O	MB
>> 8	1	335	291	627	0	0	249	249	-60%	1
>> 12	3688	14315	5290	23293	123	860	5954	6937	-70%	64
>> 16	12711	46179	16803	75693	2936	7390	46092	56418	-25%	75
>> 20	42178	133781	49898	225857	9460	28382	92951	130793	-42%	371
>> 24	96079	357280	105242	558601	7719	18484	109309	135512	-76%	1653
>>
>> Runtime (in seconds)
>> N	base	zswap	%change
>> 8	107	107	0%
>> 12	128	110	-14%
>> 16	191	179	-6%
>> 20	371	240	-35%
>> 24	570	267	-53%
>>
>> %CPU utilization (out of 400% on 4 cpus)
>> N	base	zswap	%change
>> 8	317	319	1%
>> 12	267	311	16%
>> 16	179	191	7%
>> 20	94	143	52%
>> 24	60	128	113%
>>
>>
>> Seth Jennings (7):
>>   debugfs: add get/set for atomic types
>>   zsmalloc: promote to lib/
>>   zswap: add to mm/
>>   mm: break up swap_writepage() for frontswap backends
>>   mm: allow for outstanding swap writeback accounting
>>   zswap: add flushing support
>>   zswap: add documentation
>>
>>  Documentation/vm/zswap.txt                         |   73 ++
>>  drivers/staging/Kconfig                            |    2 -
>>  drivers/staging/Makefile                           |    1 -
>>  drivers/staging/zcache/zcache-main.c               |    3 +-
>>  drivers/staging/zram/zram_drv.h                    |    3 +-
>>  drivers/staging/zsmalloc/Kconfig                   |   10 -
>>  drivers/staging/zsmalloc/Makefile                  |    3 -
>>  fs/debugfs/file.c                                  |   42 +
>>  include/linux/debugfs.h                            |    2 +
>>  include/linux/swap.h                               |    4 +
>>  .../staging/zsmalloc => include/linux}/zsmalloc.h  |    0
>>  lib/Kconfig                                        |   18 +
>>  lib/Makefile                                       |    1 +
>>  .../zsmalloc/zsmalloc-main.c => lib/zsmalloc.c     |    3 +-
>>  mm/Kconfig                                         |   15 +
>>  mm/Makefile                                        |    1 +
>>  mm/page_io.c                                       |   22 +-
>>  mm/swap_state.c                                    |    2 +-
>>  mm/zswap.c                                         | 1073 ++++++++++++++++++++
>>  19 files changed, 1250 insertions(+), 28 deletions(-)
>>  create mode 100644 Documentation/vm/zswap.txt
>>  delete mode 100644 drivers/staging/zsmalloc/Kconfig
>>  delete mode 100644 drivers/staging/zsmalloc/Makefile
>>  rename {drivers/staging/zsmalloc => include/linux}/zsmalloc.h (100%)
>>  rename drivers/staging/zsmalloc/zsmalloc-main.c => lib/zsmalloc.c (99%)
>>  create mode 100644 mm/zswap.c
>>
> 




More information about the devel mailing list