[PATCH 3/3] hv_netvsc: Implement VF matching based on serial numbers

Haiyang Zhang haiyangz at microsoft.com
Fri Dec 9 20:09:49 UTC 2016



> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Friday, December 9, 2016 1:21 PM
> To: Greg KH <gregkh at linuxfoundation.org>
> Cc: KY Srinivasan <kys at microsoft.com>; olaf at aepfle.de; Haiyang Zhang
> <haiyangz at microsoft.com>; linux-kernel at vger.kernel.org;
> bjorn.helgaas at gmail.com; apw at canonical.com; devel at linuxdriverproject.org;
> leann.ogasawara at canonical.com; jasowang at redhat.com
> Subject: Re: [PATCH 3/3] hv_netvsc: Implement VF matching based on
> serial numbers
> 
> On Fri, 9 Dec 2016 08:31:22 +0100
> Greg KH <gregkh at linuxfoundation.org> wrote:
> 
> > On Fri, Dec 09, 2016 at 12:05:53AM +0000, KY Srinivasan wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Greg KH [mailto:gregkh at linuxfoundation.org]
> > > > Sent: Thursday, December 8, 2016 7:56 AM
> > > > To: KY Srinivasan <kys at microsoft.com>
> > > > Cc: linux-kernel at vger.kernel.org; devel at linuxdriverproject.org;
> > > > olaf at aepfle.de; apw at canonical.com; vkuznets at redhat.com;
> > > > jasowang at redhat.com; leann.ogasawara at canonical.com;
> > > > bjorn.helgaas at gmail.com; Haiyang Zhang <haiyangz at microsoft.com>
> > > > Subject: Re: [PATCH 3/3] hv_netvsc: Implement VF matching based on
> serial
> > > > numbers
> > > >
> > > > On Thu, Dec 08, 2016 at 12:33:43AM -0800,
> kys at exchange.microsoft.com
> > > > wrote:
> > > > > From: Haiyang Zhang <haiyangz at microsoft.com>
> > > > >
> > > > > We currently use MAC address to match VF and synthetic NICs.
> Hyper-V
> > > > > provides a serial number to both devices for this purpose. This
> patch
> > > > > implements the matching based on VF serial numbers. This is the
> way
> > > > > specified by the protocol and more reliable.
> > > > >
> > > > > Signed-off-by: Haiyang Zhang <haiyangz at microsoft.com>
> > > > > Signed-off-by: K. Y. Srinivasan <kys at microsoft.com>
> > > > > ---
> > > > >  drivers/net/hyperv/netvsc_drv.c |   55
> > > > ++++++++++++++++++++++++++++++++++++---
> > > > >  1 files changed, 51 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/hyperv/netvsc_drv.c
> > > > b/drivers/net/hyperv/netvsc_drv.c
> > > > > index 9522763..c5778cf 100644
> > > > > --- a/drivers/net/hyperv/netvsc_drv.c
> > > > > +++ b/drivers/net/hyperv/netvsc_drv.c
> > > > > @@ -1165,9 +1165,10 @@ static void netvsc_free_netdev(struct
> > > > net_device *netdev)
> > > > >  	free_netdev(netdev);
> > > > >  }
> > > > >
> > > > > -static struct net_device *get_netvsc_bymac(const u8 *mac)
> > > > > +static struct net_device *get_netvsc_byvfser(u32 vfser)
> > > > >  {
> > > > >  	struct net_device *dev;
> > > > > +	struct net_device_context *ndev_ctx;
> > > > >
> > > > >  	ASSERT_RTNL();
> > > > >
> > > > > @@ -1175,7 +1176,8 @@ static void netvsc_free_netdev(struct
> net_device
> > > > *netdev)
> > > > >  		if (dev->netdev_ops != &device_ops)
> > > > >  			continue;	/* not a netvsc device */
> > > > >
> > > > > -		if (ether_addr_equal(mac, dev->perm_addr))
> > > > > +		ndev_ctx = netdev_priv(dev);
> > > > > +		if (ndev_ctx->vf_serial == vfser)
> > > > >  			return dev;
> > > > >  	}
> > > > >
> > > > > @@ -1205,21 +1207,66 @@ static void netvsc_free_netdev(struct
> > > > net_device *netdev)
> > > > >  	return NULL;
> > > > >  }
> > > > >
> > > > > +static u32 netvsc_get_vfser(struct net_device *vf_netdev)
> > > > > +{
> > > > > +	struct device *dev;
> > > > > +	struct hv_device *hdev;
> > > > > +	struct hv_pcibus_device *hbus = NULL;
> > > > > +	struct list_head *iter;
> > > > > +	struct hv_pci_dev *hpdev;
> > > > > +	unsigned long flags;
> > > > > +	u32 vfser = 0;
> > > > > +	u32 count = 0;
> > > > > +
> > > > > +	for (dev = &vf_netdev->dev; dev; dev = dev->parent) {
> > > >
> > > > You are going to walk the whole device tree backwards?  That's
> crazy.
> > > > And foolish.  And racy and broken (what happens if the tree
> changes
> > > > while you do this?)  Where is the lock being grabbed while this
> happens?
> > > > What about reference counts?  Do you see other drivers ever doing
> this
> > > > (if you do, point them out and I'll go yell at them too...)
> > >
> > > Greg,
> > >
> > > We are registering for netdev events. Coming into this function, the
> caller
> > > guarantees that the list of netdevs does not change - we assert this
> on entry:
> > > ASSERT_RTNL(). We are only walking up the device tree for the
> netdevs whose
> > > state change is being notified to us - the device tree being walked
> here is limited to
> > > netdevs under question.
> >
> > But a netdev is a child of some type of "real" device, and you are now
> > walking the tree of all devices up to the "root" parent device, which
> > means you will hit PCI bridges, USB controllers, and all sorts of fun
> > things if you are a child of those types of devices.
> >
> > And can't you tell if the netdev for this event, really is "your"
> > netdev?  Or are you getting called this for "all" netdevs?  Sorry, I
> > don't know this api, any pointers to it would be appreciated.
> >
> > > We have a reference to the device and we know the device is not
> going away. Is it not
> > > safe to dereference the parent pointer - after all the child has
> taken a reference on
> > > the parent as part of  device_add() call.
> >
> > It might be, and might not be.  There's a reason you don't see this
> > pattern anywhere in the kernel because of this...
> >
> > > > > +		if (!dev_is_vmbus(dev))
> > > > > +			continue;
> > > >
> > > > Ick.
> > > >
> > > > Why isn't your parent pointer a vmbus device all the time?  How
> could
> > > > you get burried down in the device hierarchy when you are the
> driver for
> > > > a specific bus type in the first place?  How could this function
> ever be
> > > > called for a device that is NOT of this type?
> > >
> > > We get notified when state changes on any of the netdev devices in
> the system.
> > > Not all netdevs in the system belong to vmbus. Consider for instance
> the
> > > emulated NIC that can be configured. This is an emulated PCI NIC. We
> are only
> > > interested in netdevs that correspond to the VF instance that we are
> interested in.
> >
> > Can you "know" this is your netdev by some other way than having to
> walk
> > the device tree?  Name?  local device type?  Something else?  This
> seems
> > like an odd api in that everyone would have to do gyrations like this
> in
> > order to determine if the netdev is "theirs" or not...
> 
> The scenario is SR-IOV on Hyper-V. In the case of VF device, the host
> hands the
> guest OS a PCI device for the virtual function device. The VF device is
> placed
> on a special synthetic PCI bus (ie not part of the other buses on the
> system).
> The VF device also has a synthetic network interface (netvsc) which
> lives
> on VMBUS.  This code is about managing the interaction between the two.
> 
> The association between VF and synthetic NIC is done in response to the
> VF network device being registered. Initial version was based on MAC
> address
> which is the same.  Later refinement used permanent MAC address to
> avoid bugs if MAC address changed.  This version is to use serial number
> instead which is safer than MAC address.
> 
> The code to walk up/down maybe not be needed to find serial number.
> Perhaps a more direct single set of conditions is possible?
> 
> Something like:
> 
> In pci-hyperv.c
> 
> u32 hv_pcifront_get_serial(struct pci_bus *bus, unsigned int devfn)
> {
> 	struct hv_pcibus_device *hbus
> 		= container_of(bus->sysdata,
> 			       struct hv_pcibus_device, sysdata);
> 	struct hf_pci_dev *hpdev;
> 	u32 serial;
> 
> 	hpdev = get_pcichild_wslot(hbus, devfn_to_wslot(pdev->devfn));
> 	if (!hpdev)
> 		return 0;
> 
> 	serial = hpdev->devs.ser;
> 	put_pcichild(hpdev, hv_pcidev_ref_by_slot);
> 	return serial;
> }
> 
> In netvsc_drv.c
> 
> static u32 netvsc_get_vfser(struct net_device *vf_netdev)
> {
> 	struct device *dev = vf_netdev->dev.parent;
> 	struct pci_dev *pdev;
> 	u32 wslot;
> 
> 	if (!dev || !dev_is_pci(dev))
> 		return 0;
> 
> 	pdev = container_of(dev, struct pci_device, dev);
> 
> 	return hv_pcifront_get_serial(pdev->bus, pdev->devfn);
> }
> 
> 
> 
> 
> 
> P.S: it would be good to be able to get win_slot out through sysfs as
> well for systemd/udev.

Stephen,

Thanks for suggestion. Actually, in my earlier implementation of this 
feature (VF serial based matching), I thought about export a function
from vPCI driver, then calling it from netvsc. So I don't need to 
move structs between headers... But, it creates a dependency of netvsc
on vPCI driver's symbol. So, even if on a VM without SRIOV, we have to
load vPCI driver, which we don't want.

Also, hv_vpci device is 3 parent layers above the vf_netdevice:
Here is the VF drv hierarchy --
Should we assume it's always 3 parents above vf_netdevice, or search for it?

[  368.185259] HZINFO:NETDEV_REGISTER:
[  368.185261] HZINFO: dev:ffff88007c10d518, bus:          (null), busName:(null), drvName:(null)
[  368.185262] HZINFO: dev:ffff88007c10c0a0, bus:ffffffff81ce4b60, busName:pci, drvName:ixgbevf
[  368.185263] HZINFO: dev:ffff8800355c0000, bus:          (null), busName:(null), drvName:(null)
[  368.185264] HZINFO: dev:ffff8800355c5428, bus:ffffffffa0008160, busName:vmbus, drvName:hv_pci
[  368.185264] HZINFO: dev:ffff88007c49e268, bus:ffffffff81ce9800, busName:acpi, drvName:vmbus
[  368.185265] HZINFO: dev:ffff88007c48ea68, bus:ffffffff81ce9800, busName:acpi, drvName:(null)
[  368.185266] HZINFO: dev:ffff88007c48aa68, bus:ffffffff81ce9800, busName:acpi, drvName:(null)
[  368.185266] HZINFO: dev:ffff88007c48a268, bus:ffffffff81ce9800, busName:acpi, drvName:(null)
[  368.185267] HZINFO: dev:ffff88007c489a68, bus:ffffffff81ce9800, busName:acpi, drvName:(null)

Thanks,
- Haiyang



More information about the devel mailing list