[PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions

Dexuan Cui decui at microsoft.com
Tue Jan 22 18:40:30 UTC 2019


> From: Kimberly Brown <kimbrownkd at gmail.com>
> Sent: Monday, January 21, 2019 10:43 PM
> > > @@ -1421,7 +1422,10 @@ static ssize_t vmbus_chan_attr_show(struct
> > > kobject *kobj,
> > >  	if (chan->state != CHANNEL_OPENED_STATE)
> > >  		return -EINVAL;
> > >
> > > -	return attribute->show(chan, buf);
> > > +	mutex_lock(&vmbus_connection.channel_mutex);
> > > +	ret = attribute->show(chan, buf);
> > > +	mutex_unlock(&vmbus_connection.channel_mutex);
> > > +	return ret;
> > >  }
> >
> > It looks this patch is already able to fix the original issue:
> > 6712cc9c2211 ("vmbus: don't return values for uninitalized channels"),
> > as chan->state can't be CHANNEL_OPENED_STATE when
> > channel->ringbuffer_page is not set up yet, or has been freed.
> > -- Dexuan
> 
> I think that patch 6712cc9c2211 fixes the problem when the channel is
> not set up yet, but it doesn't fix the problem when the channel is being
> closed
Yeah, now I see your point.

> The channel could be freed after the check that "chan->state" is
> CHANNEL_OPENED_STATE, while the "attribute->show()" function is running.

IMO the channel struct itself can't be freed while the "attribute->show()"
function is running, because I suppose the sysfs interface should have an extra
reference to channel->kobj (see vmbus_add_channel_kobj()), so before the sysfs
files are removed, the channel struct itself can't disappear.
(I didn't check the related code very carefully, so I could be wrong. :-) )

But as you pointed, at least for sub-channels, channel->ringbuffer_page
can indeed disappear in vmbus_close() -> ... -> vmbus_free_ring(), and
the "attribute->show()" could crash when the race happens.
Adding channel_mutex here seems to be able to fix the race for
sub-channels, as the same channel_mutex is used in vmbus_disconnect_ring().

For a primary channel, vmbus_close() -> vmbus_free_ring() can still
free channel->ringbuffer_page without notifying the "attribute->show()".
We may also need to acquire/release the channel_mutex in vmbus_close()?
 
> Actually, there should be checks that "chan" is not null and that
> "chan->state" is CHANNEL_OPENED_STATE within the locked section. I'll
> need to fix that.
I suppose "chan" can not be NULL here (see the above).

Checking "chan->state" may not help to completely fix the race, because
there is no locking/synchronization code in
vmbus_close_internal() when we test and change "channel->state".

I guess we may need to check if channel->ringbuffer_page is NULL in
the "attribute->show()". 

PS, to prove that a race condition does exist and can really cause a panic or
something, I usually add some msleep() delays in different paths so that I
can reproduce the crash every time I take a special action, e.g. when I read
the sysfs files of a NIC, I try to remove hv_netvsc driver. This way, I can prove
a patch can indeed help, at least it can fix the crash which would happen
without the patch. :-)

-- Dexuan


More information about the devel mailing list