[PATCH] Drivers: hv: util: on deinit, don't wait the release event, if we shouldn't

Vitaly Kuznetsov vkuznets at redhat.com
Tue Feb 28 12:31:09 UTC 2017


Dexuan Cui <decui at microsoft.com> writes:

> If the daemon is NOT running at all, when we disable the util device from
> Hyper-V Manager (or sometimes the host can rescind a util device and then
> re-offer it), we'll hang in util_remove -> hv_kvp_deinit ->
> wait_for_completion(&release_event), because this code path doesn't run:
> hvt_op_release -> ... -> kvp_on_reset -> complete(&release_event).
>
> Due to this, we even can't reboot the VM properly.
>
> The patch tracks if the dev file is opened or not, and we only need to
> wait if it's opened.
>
> Fixes: 5a66fecbf6aa ("Drivers: hv: util: kvp: Fix a rescind processing issue")
> Signed-off-by: Dexuan Cui <decui at microsoft.com>
> Cc: Vitaly Kuznetsov <vkuznets at redhat.com>
> Cc: "K. Y. Srinivasan" <kys at microsoft.com>
> Cc: Haiyang Zhang <haiyangz at microsoft.com>
> Cc: Stephen Hemminger <sthemmin at microsoft.com>
> ---
>  drivers/hv/hv_fcopy.c           | 5 ++++-
>  drivers/hv/hv_kvp.c             | 6 +++++-
>  drivers/hv/hv_snapshot.c        | 5 ++++-
>  drivers/hv/hv_utils_transport.c | 2 ++
>  drivers/hv/hv_utils_transport.h | 1 +
>  5 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
> index 9aee601..545cf43 100644
> --- a/drivers/hv/hv_fcopy.c
> +++ b/drivers/hv/hv_fcopy.c
> @@ -358,8 +358,11 @@ int hv_fcopy_init(struct hv_util_service *srv)
>
>  void hv_fcopy_deinit(void)
>  {
> +	bool wait = hvt->dev_opened;
> +
>  	fcopy_transaction.state = HVUTIL_DEVICE_DYING;
>  	cancel_delayed_work_sync(&fcopy_timeout_work);
>  	hvutil_transport_destroy(hvt);
> -	wait_for_completion(&release_event);
> +	if (wait)
> +		wait_for_completion(&release_event);

This is racy I think. We need to prevent openning the device first and
then query its state:

	bool wait;

 	fcopy_transaction.state = HVUTIL_DEVICE_DYING;
        /* make sure state is set */
        mb();
        wait = hvt->dev_opened;
  	cancel_delayed_work_sync(&fcopy_timeout_work);
  	hvutil_transport_destroy(hvt);
        if (wait)
		wait_for_completion(&release_event);

otherwise someone could open the device before we manage to update its
state.

>  }
> diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
> index de26371..15c7873 100644
> --- a/drivers/hv/hv_kvp.c
> +++ b/drivers/hv/hv_kvp.c
> @@ -742,10 +742,14 @@ hv_kvp_init(struct hv_util_service *srv)
>
>  void hv_kvp_deinit(void)
>  {
> +	bool wait = hvt->dev_opened;
> +
>  	kvp_transaction.state = HVUTIL_DEVICE_DYING;
>  	cancel_delayed_work_sync(&kvp_host_handshake_work);
>  	cancel_delayed_work_sync(&kvp_timeout_work);
>  	cancel_work_sync(&kvp_sendkey_work);
>  	hvutil_transport_destroy(hvt);
> -	wait_for_completion(&release_event);
> +
> +	if (wait)
> +		wait_for_completion(&release_event);
>  }
> diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c
> index bcc03f0..3847f19 100644
> --- a/drivers/hv/hv_snapshot.c
> +++ b/drivers/hv/hv_snapshot.c
> @@ -396,9 +396,12 @@ hv_vss_init(struct hv_util_service *srv)
>
>  void hv_vss_deinit(void)
>  {
> +	bool wait = hvt->dev_opened;
> +
>  	vss_transaction.state = HVUTIL_DEVICE_DYING;
>  	cancel_delayed_work_sync(&vss_timeout_work);
>  	cancel_work_sync(&vss_handle_request_work);
>  	hvutil_transport_destroy(hvt);
> -	wait_for_completion(&release_event);
> +	if (wait)
> +		wait_for_completion(&release_event);
>  }
> diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
> index c235a95..05e0648 100644
> --- a/drivers/hv/hv_utils_transport.c
> +++ b/drivers/hv/hv_utils_transport.c
> @@ -153,6 +153,7 @@ static int hvt_op_open(struct inode *inode, struct file *file)
>
>  	if (issue_reset)
>  		hvt_reset(hvt);
> +	hvt->dev_opened = (hvt->mode == HVUTIL_TRANSPORT_CHARDEV) && !ret;
>
>  	mutex_unlock(&hvt->lock);
>
> @@ -182,6 +183,7 @@ static int hvt_op_release(struct inode *inode, struct file *file)
>  	 * connects back.
>  	 */
>  	hvt_reset(hvt);
> +	hvt->dev_opened = false;
>  	mutex_unlock(&hvt->lock);
>

Not sure but it seems this may also be racy (what if we query the state
just before we reset it?).

>  	if (mode_old == HVUTIL_TRANSPORT_DESTROY)
> diff --git a/drivers/hv/hv_utils_transport.h b/drivers/hv/hv_utils_transport.h
> index d98f522..9871283 100644
> --- a/drivers/hv/hv_utils_transport.h
> +++ b/drivers/hv/hv_utils_transport.h
> @@ -32,6 +32,7 @@ struct hvutil_transport {
>  	int mode;                           /* hvutil_transport_mode */
>  	struct file_operations fops;        /* file operations */
>  	struct miscdevice mdev;             /* misc device */
> +	bool   dev_opened;                  /* Is the device opened? */
>  	struct cb_id cn_id;                 /* CN_*_IDX/CN_*_VAL */
>  	struct list_head list;              /* hvt_list */
>  	int (*on_msg)(void *, int);         /* callback on new user message */

I think we can get away without introducing this new flag, e.g. if we
replace release_event with an atomic which will hold the state
(open/closed). This will also elimenate possible races above. I can try
prototyping a patch if you want me to.

Thanks,

-- 
  Vitaly


More information about the devel mailing list