[PATCH v2 0/3] staging: zcache: xcfmalloc support
dave at linux.vnet.ibm.com
Thu Sep 15 15:17:42 PDT 2011
On Thu, 2011-09-15 at 14:24 -0500, Seth Jennings wrote:
> How would you suggest that I measure xcfmalloc performance on a "very
> large set of workloads". I guess another form of that question is: How
> did xvmalloc do this?
Well, it didn't have a competitor, so this probably wasn't done. :)
I'd like to see a microbenchmarky sort of thing. Do a million (or 100
million, whatever) allocations, and time it for both allocators doing
the same thing. You just need to do the *same* allocations for both.
It'd be interesting to see the shape of a graph if you did:
	for (i = 0; i < BIG_NUMBER; i++)
		for (j = MIN_ALLOC; j < MAX_ALLOC; j += BLOCK_SIZE)
... basically for both allocators. Let's see how the graphs look. You
could do it a lot of different ways: alloc all, then free all, or alloc
one free one, etc... Maybe it will surprise us. Maybe the page
allocator overhead will dominate _everything_, and we won't even see the
x*malloc() functions show up.
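A rough user-space sketch of that loop, in case it helps to have something concrete to argue about. Everything here is an assumption: the constant values are arbitrary, alloc_fn/free_fn and the two bench_* helpers are made-up names, and clock() is only a stand-in for whatever timer the real in-kernel test would use. The allocator is passed in as a pair of function pointers so that both allocators see exactly the same sequence of requests.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical values; only the names come from the loop above. */
#define BIG_NUMBER	1000
#define MIN_ALLOC	64
#define MAX_ALLOC	4096
#define BLOCK_SIZE	64
#define NR_SIZES	((MAX_ALLOC - MIN_ALLOC) / BLOCK_SIZE)

/* Allocator under test, so both allocators run the same sequence. */
typedef void *(*alloc_fn)(size_t size);
typedef void (*free_fn)(void *ptr);

/* CPU seconds consumed since 'start'. */
static double elapsed_sec(clock_t start)
{
	return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Pattern 1: alloc one, free one. Returns seconds, or -1.0 on failure. */
static double bench_alloc_one_free_one(alloc_fn do_alloc, free_fn do_free)
{
	clock_t start = clock();
	long i;
	size_t j;

	for (i = 0; i < BIG_NUMBER; i++)
		for (j = MIN_ALLOC; j < MAX_ALLOC; j += BLOCK_SIZE) {
			void *p = do_alloc(j);

			if (!p)
				return -1.0;
			do_free(p);
		}
	return elapsed_sec(start);
}

/* Pattern 2: alloc all sizes, then free all. Same return convention. */
static double bench_alloc_all_free_all(alloc_fn do_alloc, free_fn do_free)
{
	void *ptrs[NR_SIZES];
	clock_t start = clock();
	double result = 0.0;
	long i;
	size_t j, n;

	for (i = 0; i < BIG_NUMBER && result == 0.0; i++) {
		n = 0;
		for (j = MIN_ALLOC; j < MAX_ALLOC; j += BLOCK_SIZE) {
			assert(n < NR_SIZES);	/* ptrs[] bounds check */
			ptrs[n] = do_alloc(j);
			if (!ptrs[n]) {
				result = -1.0;
				break;
			}
			n++;
		}
		/* Free whatever was allocated in this round, newest first. */
		while (n > 0)
			do_free(ptrs[--n]);
	}
	return result < 0.0 ? result : elapsed_sec(start);
}
```

Calling bench_alloc_one_free_one(malloc, free) and bench_alloc_all_free_all(malloc, free) gives a libc baseline. The x*malloc() functions would need thin wrappers with these signatures around whatever extra arguments they take, which is not modeled here. Build without optimization, or the compiler may drop some alloc/free pairs.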
The other thing that's important is to think of cases like I described
that would cause either allocator to do extra splits/joins or be slow in
other ways. I expect xcfmalloc() to be slowest when it is allocating
and has to break down a reserve page. Let's say it does a bunch of ~3kb
allocations and has no pages on the freelists. It will:
1. scan each of the 64 freelist heads (512 bytes of cache)
2. split a 4k page
3. reinsert the 1k remainder
Next time, it will:
1. scan, and find the 1k bit
2. continue scanning, eventually touching each freelist...
3. split a 4k page
4. reinsert the 2k remainder
It'll end up doing a scan/split/reinsert in 3/4 of the cases, I think.
The case of the freelists being quite empty will also be quite common
during times the pool is expanding. I think xvmalloc() will have some
of the same problems, but let's see if it does in practice.
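The 3/4 figure can be checked with a toy model. This is only a sketch of the arithmetic, not of the real allocator: it assumes, as the 1k then 2k remainder sequence above implies, that an allocation may consume an existing fragment and take the rest from a freshly split page, and it tracks the freelists as a single byte count. The names simulate(), split_stats and PAGE_SIZE_BYTES are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES	4096u
#define ALLOC_SIZE	3072u	/* the "~3kb" allocation */

struct split_stats {
	unsigned long allocs;	/* allocations performed */
	unsigned long splits;	/* allocations that had to split a page */
};

/*
 * Toy model: 'leftover' is the total number of free bytes sitting on the
 * freelists. Each allocation uses the leftover first and splits a new
 * page only when the leftover is too small.
 */
static struct split_stats simulate(unsigned long nr_allocs, size_t alloc_size)
{
	struct split_stats st = { 0, 0 };
	size_t leftover = 0;
	unsigned long i;

	assert(alloc_size > 0 && alloc_size <= PAGE_SIZE_BYTES);

	for (i = 0; i < nr_allocs; i++) {
		size_t from_page;

		st.allocs++;
		if (leftover >= alloc_size) {
			/* Satisfied from the freelists: no split needed. */
			leftover -= alloc_size;
			continue;
		}
		/* Scan came up short: split a page, reinsert the remainder. */
		from_page = alloc_size - leftover;
		leftover = PAGE_SIZE_BYTES - from_page;
		st.splits++;
	}
	return st;
}
```

With ALLOC_SIZE the leftover cycles through 1k, 2k, 3k, 0, so simulate(4, ALLOC_SIZE) reports 3 splits in 4 allocations. Whether the real code behaves this way is exactly what the benchmark should show.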