[PATCH] Drivers: hv: vmbus: prevent new subchannel creation on device shutdown

Dexuan Cui decui at microsoft.com
Tue Jul 14 11:41:44 UTC 2015


> -----Original Message-----
> From: Vitaly Kuznetsov
> Sent: Monday, July 13, 2015 20:19
> Subject: [PATCH] Drivers: hv: vmbus: prevent new subchannel creation on device
> shutdown
> 
> When a new subchannel offer from host comes during device shutdown (e.g.
> when a netvsc/storvsc module is unloaded shortly after it was loaded) a
> crash can happen as vmbus_process_offer() is not serialized with
> vmbus_remove() in any way.

How about vmbus_onoffer_rescind()?
It's not serialized with vmbus_remove() either, so I think it has the same issue?

I remember that when we run 'rmmod hv_netvsc', we get a rescind-offer message
for each subchannel.

> As an example we can try calling the subchannel create
> callback when the module is already unloaded.
> The following crash was observed while repeatedly loading/unloading the
> hv_netvsc module on a 64-CPU guest:
> 
>   hv_netvsc vmbus_14: net device safe to remove
>   BUG: unable to handle kernel paging request at 000000000000a118
>   IP: [<ffffffffa01c1b21>] netvsc_sc_open+0x31/0xb0 [hv_netvsc]
>   PGD 1f3946a067 PUD 1f38a5f067 PMD 0
>   Oops: 0000 [#1] SMP
>   ...
>   Call Trace:
>   [<ffffffffa0055cd7>] vmbus_onoffer+0x477/0x530 [hv_vmbus]
>   [<ffffffff81092e1f>] ? move_linked_works+0x5f/0x80
>   [<ffffffffa0055fd3>] vmbus_onmessage+0x33/0xa0 [hv_vmbus]
>   [<ffffffffa0052f81>] vmbus_onmessage_work+0x21/0x30 [hv_vmbus]
>   [<ffffffff81095cde>] process_one_work+0x18e/0x4e0
>   ...
> 
> The issue cannot be solved by just resetting sc_creation_callback on
> driver removal: while we search for the parent channel with channel_lock
> held, we release it after the channel is found, and the channel can
> disappear beneath our feet while we're still in vmbus_process_offer().
> 
> Introduce new sc_create_lock mutex and take it in vmbus_remove() to ensure
> no new subchannels are created after we started the removal procedure.
> Check its state with mutex_trylock in vmbus_process_offer().
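
To make the proposed scheme concrete, here is a minimal userspace sketch of
the "hold the lock during removal, trylock in the offer path" idea. It is not
the patch itself: it uses pthreads instead of the kernel mutex API, and the
demo_* names and struct are hypothetical stand-ins. Note that
pthread_mutex_trylock() returns 0 on success, whereas the kernel's
mutex_trylock() returns 1 on success.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device; not the kernel's data structure. */
struct demo_device {
	pthread_mutex_t sc_create_lock; /* held for the whole removal */
	int num_sc;                     /* subchannels created so far */
};

void demo_device_init(struct demo_device *dev)
{
	pthread_mutex_init(&dev->sc_create_lock, NULL);
	dev->num_sc = 0;
}

/* Removal path: take the lock and keep it until removal is finished. */
void demo_remove_begin(struct demo_device *dev)
{
	pthread_mutex_lock(&dev->sc_create_lock);
}

/* Removal path: release the lock once teardown is complete. */
void demo_remove_end(struct demo_device *dev)
{
	pthread_mutex_unlock(&dev->sc_create_lock);
}

/*
 * Offer path: never block. If the lock is busy, removal is in progress,
 * so refuse to create the subchannel. Returns true if one was created.
 */
bool demo_process_offer(struct demo_device *dev)
{
	if (pthread_mutex_trylock(&dev->sc_create_lock) != 0)
		return false;

	dev->num_sc++; /* stands in for adding to sc_list and running the callback */
	pthread_mutex_unlock(&dev->sc_create_lock);
	return true;
}
```

In this sketch an offer that arrives between demo_remove_begin() and
demo_remove_end() is simply dropped, which is the behaviour the patch
description asks for; whether dropping is acceptable on the rescind path is a
separate question.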

In my 8-CPU VM, I can very easily reproduce the panic by
1. running
  while ((1)); do modprobe -r hv_netvsc; modprobe hv_netvsc; sleep 10; done

and
2. in vmbus_process_offer(), we sleep 3s after a subchannel is added into
the primary channel's sc_list (and before the sc_creation_callback is invoked):
(I added line 275)
 
262         if (!fnew) {
263                 /*
264                  * Check to see if this is a sub-channel.
265                  */
266                 if (newchannel->offermsg.offer.sub_channel_index != 0) {
267                         /*
268                          * Process the sub-channel.
269                          */
270                         newchannel->primary_channel = channel;
271                         spin_lock_irqsave(&channel->lock, flags);
272                         list_add_tail(&newchannel->sc_list, &channel->sc_list);
273                         channel->num_sc++;
274                         spin_unlock_irqrestore(&channel->lock, flags);
275                         ssleep(3);
276                 } else
277                         goto err_free_chan;
278         }



