[PATCH 0/4] zsmalloc improvements

Wed Jul 11 20:48:28 UTC 2012

On 07/11/2012 02:42 PM, Konrad Rzeszutek Wilk wrote:
>>>> Which architecture was this under? It sounds x86-ish? Is this on
>>>> Westmere and more modern machines? What about Core2 architecture?
>>>>
>>>> Oh how did it work on AMD Phenom boxes?
>>>
>>> I don't have a Phenom box but I have an Athlon X2 I can try out.
>>> I'll get this information next Monday.
>>
>> Actually, I'm running some production stuff on that box, so
>> I'd rather not put testing stuff on it.  Is there any
>> particular reason that you wanted this information? Do you
>> have a reason to believe that mapping will be faster than
>> copy for AMD procs?
> 
> Sorry for the late response. I'm working on an ugly bug that is taking
> more time than anticipated.
> My thoughts were that these findings are based on the hardware memory
> prefetcher. The Intel machines, especially starting with Nehalem, have
> some pretty impressive prefetchers, to the point where even doing a
> 'prefetch' on the next node of a linked list is not beneficial.
> 
> Perhaps the way to leverage this is to use different modes depending
> on the bulk of data? When there is a huge amount, use the old method,
> but for small amounts use copy (as it would in theory stay in the
> cache longer).

Not sure what you mean by "bulk" or "huge amount", but the
maximum size of a mapped object is PAGE_SIZE and the typical
size is more around PAGE_SIZE/2. So that is what I'm
considering.  Do you think it makes a difference with copies
that small?
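
For scale, here is a rough user-space sketch of the kind of copy being
discussed. It is not the zsmalloc code. It assumes a 4096-byte page and
an object that may straddle two non-contiguous pages, and the names
SKETCH_PAGE_SIZE and copy_object() are made up for illustration. The
point is only that the copy path is at most two memcpy() calls that
together move no more than one page.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size; hypothetical name */

/*
 * Copy an object of `size` bytes that starts at byte `off` of page0 and,
 * if it does not fit there, continues at the start of page1.
 * Returns 0 on success, -1 if the arguments are out of range.
 */
static int copy_object(unsigned char *dst,
		       const unsigned char *page0,
		       const unsigned char *page1,
		       size_t off, size_t size)
{
	size_t first;

	if (off >= SKETCH_PAGE_SIZE || size > SKETCH_PAGE_SIZE)
		return -1;

	/* Bytes of the object that live in page0. */
	first = SKETCH_PAGE_SIZE - off;
	if (first > size)
		first = size;

	memcpy(dst, page0 + off, first);

	/* Remainder, if any, comes from the start of page1. */
	if (size > first)
		memcpy(dst + first, page1, size - first);

	return 0;
}
```

With a typical object of PAGE_SIZE/2, that is about 2K moved per
access in this sketch.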

Thanks,
Seth