[PATCH RFC] mm/memory_hotplug: Introduce memory block types

David Hildenbrand david at redhat.com
Mon Nov 26 13:33:29 UTC 2018


On 26.11.18 13:30, David Hildenbrand wrote:
> On 23.11.18 19:06, Michal Suchánek wrote:
>> On Fri, 23 Nov 2018 12:13:58 +0100
>> David Hildenbrand <david at redhat.com> wrote:
>>
>>> On 28.09.18 17:03, David Hildenbrand wrote:
>>>> How and when to online hotplugged memory is hard for distributions to
>>>> manage, because different memory types have to be treated differently.
>>>> Right now, we need complicated udev rules that e.g. check if we are
>>>> running on s390x, on a physical system or on a virtualized system. But
>>>> sometimes there is also a need to online memory immediately while the
>>>> kernel adds it, instead of waiting for user space to make a
>>>> decision. And on virtualized systems there might be different
>>>> requirements, depending on "how" the memory was added (and if it will
>>>> eventually get unplugged again - DIMM vs. paravirtualized mechanisms).
>>>>
>>>> On the one hand, we have physical systems where we sometimes
>>>> want to be able to unplug memory again - e.g. a DIMM - so we have to online
>>>> it to the MOVABLE zone optionally. That decision is usually made in user
>>>> space.
>>>>
>>>> On the other hand, we have memory that should never be onlined
>>>> automatically, only when asked for by an administrator. Such memory only
>>>> applies to virtualized environments like s390x, where the concept of
>>>> "standby" memory exists. Memory is detected and added during boot, so it
>>>> can be onlined when requested by the administrator or some tooling.
>>>> Only when it is onlined will the memory be allocated in the hypervisor.
>>>>
>>>> But then, we also have paravirtualized devices (namely xen and hyper-v
>>>> balloons), that hotplug memory that will never ever be removed from a
>>>> system right now using offline_pages/remove_memory. If at all, this memory
>>>> is logically unplugged and handed back to the hypervisor via ballooning.
>>>>
>>>> For paravirtualized devices it is relevant that memory is onlined as
>>>> quickly as possible after adding - and that it is added to the NORMAL
>>>> zone. Otherwise, it could happen that too much memory in a row is added
>>>> (but not onlined), resulting in out-of-memory conditions due to the
>>>> additional memory for "struct pages" and friends. MOVABLE zone as well
>>>> as delays might be very problematic and lead to crashes (e.g. zone
>>>> imbalance).
>>>>
>>>> Therefore, introduce memory block types and online memory depending on
>>>> it when adding the memory. Expose the memory type to user space, so user
>>>> space handlers can start to process only "normal" memory. Other memory
>>>> block types can be ignored. One thing less to worry about in user space.
>>>>   
>>>
>>> So I was looking into alternatives.
>>>
>>> 1. Provide only "normal" and "standby" memory types to user space. This
>>> way user space can make smarter decisions about how to online memory.
>>> Not really sure if this is the right way to go.
>>>
>>>
>>> 2. Use device driver information (as mentioned by Michal S.).
>>>
>>> The problem right now is that there are no drivers for memory block
>>> devices. The "memory" subsystem has no drivers, so the KOBJ_ADD uevent
>>> will not contain any "DRIVER" information and we have no idea what kind of
>>> memory block device we hold in our hands.
>>>
>>> $ udevadm info -q all -a /sys/devices/system/memory/memory0
>>>
>>>   looking at device '/devices/system/memory/memory0':
>>>     KERNEL=="memory0"
>>>     SUBSYSTEM=="memory"
>>>     DRIVER==""
>>>     ATTR{online}=="1"
>>>     ATTR{phys_device}=="0"
>>>     ATTR{phys_index}=="00000000"
>>>     ATTR{removable}=="0"
>>>     ATTR{state}=="online"
>>>     ATTR{valid_zones}=="none"
>>>
>>>
>>> If we provided "fake" drivers for the memory block devices we want
>>> to treat in a special way in user space (e.g. standby memory on s390x),
>>> user space could use that information to make smarter decisions.
>>>
>>> Adding such drivers might work. My suggestion would be to let ordinary
>>> DIMMs be without a driver for now and only special case standby memory
>>> and eventually paravirtualized memory devices (XEN and Hyper-V).
>>>
>>> Any thoughts?
>>
>> If we are going to fake the driver information we may as well add the
>> type attribute and be done with it.
>>
>> I think the problem with the patch was more with the semantics than the
>> attribute itself.
>>
>> What is normal, paravirtualized, and standby memory?
>>
>> I can understand a DIMM device, a balloon device, or whatever mechanism for
>> adding memory you might have.
>>
>> I can understand "memory designated as standby by the cluster
>> administrator".
>>
>> However, DIMM vs balloon is orthogonal to standby and should not be
>> conflated into one property.
>>
>> "paravirtualized" means nothing at all to me in relation to memory type
>> and the desired online policy.
> 
> Right, so whatever we come up with should allow user space to decide
> - whether memory is to be onlined automatically

And I will think about whether we really should model standby memory. Maybe
it is really better to have in user space something like (as Dan noted)

if (isS390x() && strcmp(type, "dimm") == 0) {
	/* don't online: on s390x, DIMMs are standby memory */
}

Then we could have in addition

if (strcmp(type, "balloon") == 0) {
	/*
	 * Balloon will not be unplugged by offlining the whole block at
	 * once, online as !movable.
	 */
}

But I'll have to think about the wording / types etc. (I like neither
"dimm" nor "balloon").

-- 

Thanks,

David / dhildenb

