[PATCH 1/4] Drivers: hv: vmbus: implement get/put usage workflow for vmbus channels

Vitaly Kuznetsov vkuznets at redhat.com
Wed Feb 4 09:32:26 UTC 2015


Dexuan Cui <decui at microsoft.com> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets at redhat.com]
>> Sent: Wednesday, February 4, 2015 1:01 AM
>> To: KY Srinivasan; devel at linuxdriverproject.org
>> Cc: Haiyang Zhang; linux-kernel at vger.kernel.org; Dexuan Cui; Jason Wang
>> Subject: [PATCH 1/4] Drivers: hv: vmbus: implement get/put usage workflow for
>> vmbus channels
>> 
>> free_channel() function frees the channel unconditionally so we need to make
>> sure nobody has any link to it. This is not trivial and there are several
>> examples of races we have:
>> 
>> 1) In vmbus_onoffer_rescind() we check for channel existence with
>>    relid2channel() and then use it. This can go wrong if we're in the middle
>>    of channel removal (free_channel() was already called).
>> 
>> 2) In process_chn_event() we check for channel existence with
>>    pcpu_relid2channel() and then use it. This can also go wrong.
>> 
>> 3) vmbus_free_channels() just frees all channels, in case we're in the middle
>>    of vmbus_process_rescind_offer() crash is possible.
>> 
>> The issue can be solved by holding vmbus_connection.channel_lock everywhere,
>> however, it looks like a way to deadlocks and performance degradation. Get/put
>> workflow fits here the best.
>> 
>> Implement vmbus_get_channel()/vmbus_put_channel() pair instead of
>> free_channel().
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets at redhat.com>
>> ---
>>  drivers/hv/channel_mgmt.c | 45
>> ++++++++++++++++++++++++++++++++++++++-------
>>  drivers/hv/connection.c   |  7 +++++--
>>  drivers/hv/hyperv_vmbus.h |  4 ++++
>>  include/linux/hyperv.h    | 13 +++++++++++++
>>  4 files changed, 60 insertions(+), 9 deletions(-)
>> 
>> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
>> index 36bacc7..eb9ce94 100644
>> --- a/drivers/hv/channel_mgmt.c
>> +++ b/drivers/hv/channel_mgmt.c
>> @@ -147,6 +147,8 @@ static struct vmbus_channel *alloc_channel(void)
>>  		return NULL;
>> 
>>  	channel->id = atomic_inc_return(&chan_num);
>> +	atomic_set(&channel->count, 1);
>> +
>>  	spin_lock_init(&channel->inbound_lock);
>>  	spin_lock_init(&channel->lock);
>> 
>> @@ -178,19 +180,47 @@ static void release_channel(struct work_struct *work)
>>  }
>> 
>>  /*
>> - * free_channel - Release the resources used by the vmbus channel object
>> + * vmbus_put_channel - Decrease the channel usage counter and release the
>> + * resources when this counter reaches zero.
>>   */
>> -static void free_channel(struct vmbus_channel *channel)
>> +void vmbus_put_channel(struct vmbus_channel *channel)
>>  {
>> +	unsigned long flags;
>> 
>>  	/*
>>  	 * We have to release the channel's workqueue/thread in the vmbus's
>>  	 * workqueue/thread context
>>  	 * ie we can't destroy ourselves.
>>  	 */
>> -	INIT_WORK(&channel->work, release_channel);
>> -	queue_work(vmbus_connection.work_queue, &channel->work);
>> +	spin_lock_irqsave(&channel->lock, flags);
>> +	if (atomic_dec_and_test(&channel->count)) {
>> +		channel->dying = true;
>> +		INIT_WORK(&channel->work, release_channel);
>> +		spin_unlock_irqrestore(&channel->lock, flags);
>> +		queue_work(vmbus_connection.work_queue, &channel->work);
>> +	} else
>> +		spin_unlock_irqrestore(&channel->lock, flags);
>> +}
>> +EXPORT_SYMBOL_GPL(vmbus_put_channel);
>> +
>> +/* vmbus_get_channel - Get additional reference to the channel */
>> +struct vmbus_channel *vmbus_get_channel(struct vmbus_channel *channel)
>> +{
>> +	unsigned long flags;
>> +	struct vmbus_channel *ret = NULL;
>> +
>> +	if (!channel)
>> +		return NULL;
>> +
>> +	spin_lock_irqsave(&channel->lock, flags);
>> +	if (!channel->dying) {
>> +		atomic_inc(&channel->count);
>> +		ret = channel;
>> +	}
>> +	spin_unlock_irqrestore(&channel->lock, flags);
>> +	return ret;
>>  }
>> +EXPORT_SYMBOL_GPL(vmbus_get_channel);
>> 
>>  static void percpu_channel_enq(void *arg)
>>  {
>> @@ -253,7 +283,7 @@ static void vmbus_process_rescind_offer(struct
>> work_struct *work)
>>  		list_del(&channel->sc_list);
>>  		spin_unlock_irqrestore(&primary_channel->lock, flags);
>>  	}
>> -	free_channel(channel);
>> +	vmbus_put_channel(channel);
>>  }
>> 
>>  void vmbus_free_channels(void)
>> @@ -262,7 +292,7 @@ void vmbus_free_channels(void)
>> 
>>  	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
>>  		vmbus_device_unregister(channel->device_obj);
>> -		free_channel(channel);
>> +		vmbus_put_channel(channel);
>>  	}
>>  }
>> 
>> @@ -391,7 +421,7 @@ done_init_rescind:
>>  	spin_unlock_irqrestore(&newchannel->lock, flags);
>>  	return;
>>  err_free_chan:
>> -	free_channel(newchannel);
>> +	vmbus_put_channel(newchannel);
>>  }
>> 
>>  enum {
>> @@ -549,6 +579,7 @@ static void vmbus_onoffer_rescind(struct
>> vmbus_channel_message_header *hdr)
>>  		queue_work(channel->controlwq, &channel->work);
>> 
>>  	spin_unlock_irqrestore(&channel->lock, flags);
>> +	vmbus_put_channel(channel);
>>  }
>> 
>>  /*
>> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
>> index c4acd1c..d1ce134 100644
>> --- a/drivers/hv/connection.c
>> +++ b/drivers/hv/connection.c
>> @@ -247,7 +247,8 @@ void vmbus_disconnect(void)
>>   * Map the given relid to the corresponding channel based on the
>>   * per-cpu list of channels that have been affinitized to this CPU.
>>   * This will be used in the channel callback path as we can do this
>> - * mapping in a lock-free fashion.
>> + * mapping in a lock-free fashion. Takes additional reference to the
>> + * channel, all users are supposed to do vmbus_put_channel().
>>   */
>>  static struct vmbus_channel *pcpu_relid2channel(u32 relid)
>>  {
>> @@ -263,7 +264,7 @@ static struct vmbus_channel *pcpu_relid2channel(u32
>> relid)
>>  		}
>>  	}
>> 
>> -	return found_channel;
>> +	return vmbus_get_channel(found_channel);
>>  }
>> 
>>  /*
>> @@ -297,6 +298,7 @@ struct vmbus_channel *relid2channel(u32 relid)
>>  			}
>>  		}
>>  	}
>> +	found_channel = vmbus_get_channel(found_channel);
>>  	spin_unlock_irqrestore(&vmbus_connection.channel_lock, flags);
>> 
>>  	return found_channel;
>> @@ -360,6 +362,7 @@ static void process_chn_event(u32 relid)
>>  		pr_err("no channel callback for relid - %u\n", relid);
>>  	}
>> 
>> +	vmbus_put_channel(channel);
>>  }
>> 
>>  /*
>> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
>> index b055e53..40d70f0 100644
>> --- a/drivers/hv/hyperv_vmbus.h
>> +++ b/drivers/hv/hyperv_vmbus.h
>> @@ -687,6 +687,10 @@ void vmbus_device_unregister(struct hv_device
>> *device_obj);
>>  /* VmbusChildDeviceDestroy( */
>>  /* struct hv_device *); */
>> 
>> +/*
>> + * Get the channel by its relid. Takes additional reference to the channel so
>> + * all users are supposed to do vmbus_put_channel() when they're done.
>> + */
>>  struct vmbus_channel *relid2channel(u32 relid);
>> 
>>  void vmbus_free_channels(void);
>> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
>> index e73cfeb..c576d2d 100644
>> --- a/include/linux/hyperv.h
>> +++ b/include/linux/hyperv.h
>> @@ -649,6 +649,9 @@ struct vmbus_channel {
>>  	/* Unique channel id */
>>  	int id;
>> 
>> +	/* Active reference count */
>> +	atomic_t count;
>> +
>>  	struct list_head listentry;
>> 
>>  	struct hv_device *device_obj;
>> @@ -666,6 +669,7 @@ struct vmbus_channel {
>>  	u8 monitor_bit;
>> 
>>  	bool rescind; /* got rescind msg */
>> +	bool dying; /* channel is dying */
>> 
>>  	u32 ringbuffer_gpadlhandle;
>> 
>> @@ -907,6 +911,15 @@ extern int vmbus_recvpacket_raw(struct
>> vmbus_channel *channel,
>> 
>>  extern void vmbus_ontimer(unsigned long data);
>> 
>> +/*
>> + * Decrease reference count for the channel. Frees the channel when its usage
>> + * count reaches zero.
>> + */
>> +extern void vmbus_put_channel(struct vmbus_channel *channel);
>> +
>> +/* Get additional reference to the channel */
>> +extern struct vmbus_channel *vmbus_get_channel(struct vmbus_channel
>> *channel);
>> +
>>  /* Base driver object */
>>  struct hv_driver {
>>  	const char *name;
>> --
>
> Hi Vitaly,
> Thanks for the patchset! 
> I once tried to do the same work, but just didn't find enough time. :-)
>
> Please see my below questions:

Thanks for your detailed review! I should have added 'RFC' to my series :-)

>
> Since we always get channel->lock when accessing channel->count, I don't
> think the counter has to be of atomic_t?

Agreed

>
> I suggest we add vmbus_get/put_channel() in vmbus_open/close().
> In this way, IMO patch 3/4 and 4/4 would be unnecessary?
>

I was thinking about that but I wasn't sure it is going to be enough for
sub-channels. I'll look again and report.

> In vmbus_exit(), I suspect we should swap the order of the 2 lines:
>         vmbus_free_channels();
>         bus_unregister(&hv_bus);
> I suppose in bus_unregister(), the .remove callbacks of all the drivers are
> invoked, and vmbus_close() is  invoked for every channel.

Makes sense!

>
> BTW, I just noticed: alloc_channel() -> alloc_workqueue(,,max_active==1,):
> due to max_active==1 here, a channel's process_offer() and
> process_rescind_offer() is already serialized.
> I'm not sure if this fact can be used to simplify the logic?

Workers are serialized but e.g. vmbus_onoffer_rescind() is not
serialized with them, e.g. there is no protection against channel
dessapearance after we found it with relid2channel() (e.g. we're on
failure path in vmbus_process_offer()).

Anyway, we have another examples of (theoretically) possible issues and
get/put workflow should simplify the syncronization.

> Is the previous commit "Drivers: hv: vmbus: serialize Offer and Rescind offer"
> necessary?

It is as not checking work function is bad anyway as (in theory) we can
run vmbus_process_offer() twice if vmbus_onoffer_rescind() runs before
vmbus_process_offer() was started.

>
> BTW2, I think there is an unsafe race in the 2 paths (this is an existing issue,
> not related to this patch of yours):
>
> vmbus_on_event() -> process_chn_event() -> pcpu_relid2channel -> 
> 	list_for_each_entry(channel, pcpu_head, percpu_list) 
>
> and
>
> vmbus_process_rescind_offer() -> percpu_channel_deq() ->
> 	list_del(&channel->percpu_list)
>
> because the former is running in tasklet and hence can preempt the latter (running
> in process context).  percpu_channel_enq() should have the same
> issue(?)

As I said in my cover letter I don't believe my series solves all issues :-)

>
> We should add local_bh_disable()/enable() accordingly?
> Can you please help to fix this?

Yes, sure, I'll take a look. 

>
> vmbus_exit() -> vmbus_free_channels() -> vmbus_put_channel() may have
> a race with vmbus_process_rescind_offer()?
> After vmbus_process_rescind_offer() ->  vmbus_device_unregister(), the count
> can already be the initial 1, later vmbus_free_channels() -> vmbus_put_channel() can
> kfree the channel before vmbus_process_rescind_offer() completely
> finishes?

I was going to rewrite vmbus_free_channels() not only because of
that. E.g. vmbus_get_outgoing_channel() presumes all lists (in both
channel.sc_list and vmbus_connection.chn_list) are valid and we don't
take care of that. Not that I believe vmbus_get_outgoing_channel()
may happen when vmbus_free_channels() is active (and we call
vmbus_free_channels() only if we unload the module)


Thanks for your thorough review, it gave me further inspiration :-)

-- 
  Vitaly


More information about the devel mailing list