[PATCH v2] PCI: PM: Move to D0 before calling pci_legacy_resume_early()

Dexuan Cui decui at microsoft.com
Wed Oct 9 00:16:05 UTC 2019


> From: Bjorn Helgaas <helgaas at kernel.org>
> Sent: Tuesday, October 8, 2019 12:56 PM
> ...
> Wordsmithing nit: what the patch does is not "fix the error message";
> what it does is fix the *problem*, i.e., the fact that we can't
> operate the device because we can't enable MSI-X.  The message is only
> a symptom.

I totally agree. :-)

> IIUC the relevant part of the system hibernation sequence is:
> 
>   pci_pm_freeze_noirq
>   pci_pm_thaw_noirq
>   pci_pm_thaw
> 
> And the execution flow is:
> 
>   pci_pm_freeze_noirq
>     if (pci_has_legacy_pm_support(pci_dev)) # true for mlx4
>       pci_legacy_suspend_late(dev, PMSG_FREEZE)
> 	pci_pm_set_unknown_state
> 	  dev->current_state = PCI_UNKNOWN  # <---
>   pci_pm_thaw_noirq
>     if (pci_has_legacy_pm_support(pci_dev)) # true
>       pci_legacy_resume_early(dev)          # noop; mlx4 doesn't implement
>   pci_pm_thaw                               # returns -95 EOPNOTSUPP
>     if (pci_has_legacy_pm_support(pci_dev)) # true
>       pci_legacy_resume
> 	drv->resume
> 	  mlx4_resume                       # mlx4_driver.resume (legacy)
> 	    mlx4_load_one
> 	      mlx4_enable_msi_x
> 		pci_enable_msix_range
> 		  __pci_enable_msix_range
> 		    __pci_enable_msix
> 		      if (!pci_msi_supported())
> 			if (dev->current_state != PCI_D0)  # <---
> 			  return 0
> 			return -EINVAL
> 		err = -EOPNOTSUPP
> 		"INTx is not supported ..."
> 
> (These are just my notes; you don't need to put them all into the
> commit message.  I'm just sharing them in case I'm not understanding
> correctly.)

Yes, these notes are accurate.
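
For readers following along, the flow in the notes can be approximated by a small user-space C model. This is only a sketch and not kernel code: every `model_*` name is hypothetical, the device is reduced to a single power-state field, and the MSI-X check is simplified to the `current_state != PCI_D0` test shown in the notes.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for PCI_D0 and PCI_UNKNOWN; not the kernel's types. */
enum model_power_state { MODEL_D0, MODEL_UNKNOWN };

struct model_dev {
	enum model_power_state current_state;
};

/* Models pci_legacy_suspend_late() -> pci_pm_set_unknown_state(). */
static void model_freeze_noirq(struct model_dev *dev)
{
	assert(dev);
	dev->current_state = MODEL_UNKNOWN;
}

/*
 * Models the pre-patch pci_pm_thaw_noirq() for a legacy-PM driver:
 * it returns through pci_legacy_resume_early(), which is a no-op for
 * mlx4, so the device is never moved back to D0.
 */
static int model_thaw_noirq_old(struct model_dev *dev)
{
	assert(dev);
	return 0;
}

/* Models the MSI-X enable path failing when the device is not in D0. */
static int model_enable_msix(const struct model_dev *dev)
{
	assert(dev);
	if (dev->current_state != MODEL_D0)
		return -EINVAL;
	return 0;
}

/* Models pci_pm_thaw() -> mlx4_resume() reporting -EOPNOTSUPP. */
static int model_thaw(const struct model_dev *dev)
{
	if (model_enable_msix(dev) < 0)
		return -EOPNOTSUPP;
	return 0;
}
```

Calling `model_freeze_noirq()`, `model_thaw_noirq_old()` and `model_thaw()` in that order on a device that started in `MODEL_D0` makes the last call return `-EOPNOTSUPP`, which mirrors the symptom described above.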

> > > > > > When the system starts again, a fresh kernel starts to run, and when the
> > > > > > kernel detects that a hibernation image was saved, the kernel "quiesces"
> > > > > > the devices, and then "restores" the devices from the saved image. In this
> > > > > > path:
> > > > > > device_resume_noirq() -> ... ->
> > > > > >    pci_pm_restore_noirq() ->
> > > > > >      pci_pm_default_resume_early() ->
> > > > > >        pci_power_up() moves the device states back to PCI_D0. This path is
> > > > > > not broken and doesn't need my patch.
> > > > >
> 
> The cc list suggests that this might be a fix for a user-reported
> problem.  Is there a launchpad or similar link you could include here?

I guess I'm the first one to notice the issue, and AFAIK there is no bug report to link to.

The hibernation process usually saves the system state to a local disk (before
the system is powered off), and the Mellanox NIC is not needed during that
process, so it is not a real problem that the NIC cannot work between
pci_pm_thaw() and power_down(). This may explain why nobody else noticed the
issue. I happened to see the error message, and hence investigated it.

> Should this be marked for stable?

Yes, I think it should be marked for stable.
 
> > > > > --- a/drivers/pci/pci-driver.c
> > > > > +++ b/drivers/pci/pci-driver.c
> > > > > @@ -1074,15 +1074,16 @@ static int pci_pm_thaw_noirq(struct device
> > > > *dev)
> > > > >   			return error;
> > > > >   	}
> > > > >
> > > > > -	if (pci_has_legacy_pm_support(pci_dev))
> > > > > -		return pci_legacy_resume_early(dev);
> > > > > -
> > > > >   	/*
> > > > > >   	 * pci_restore_state() requires the device to be in D0 (because of MSI
> > > > > >   	 * restoration among other things), so force it into D0 in case the
> > > > > >   	 * driver's "freeze" callbacks put it into a low-power state directly.
> > > > >   	 */
> > > > >   	pci_set_power_state(pci_dev, PCI_D0);
> > > > > +
> > > > > +	if (pci_has_legacy_pm_support(pci_dev))
> > > > > +		return pci_legacy_resume_early(dev);
> > > > > +
> > > > >   	pci_restore_state(pci_dev);
> > > > >
> > > > >   	if (drv && drv->pm && drv->pm->thaw_noirq)
> > > > > --
> > > > > 2.19.1
> > > > >
> > The patch looks reasonable to me, but the comment above the
> > pci_set_power_state() call needs to be updated too IMO.
> 
> Hmm.
> 
> 1) pci_restore_state() mainly writes config space, which doesn't
> require the device to be in D0.  The only thing I see that would
> require D0 is the MSI-X MMIO space, so to be more specific, the
> comment could say "restoring the MSI-X *MMIO* state requires the
> device to be in D0".
> 
> But I think you meant some other comment change.  Did you mean
> something along the lines of "a legacy drv->resume_early() callback
> and pci_restore_state() both require the device to be in D0"?
> 
> If something else, maybe you could propose some text?
> 
> 2) I assume pci_pm_thaw_noirq() should leave the device in a
> functionally equivalent state, whether it uses legacy PM or not.  Do
> we want something like the patch below instead?  If we *do* want to
> skip pci_restore_state() for legacy PM, maybe we should add a comment.
> 
> 3) Documentation/power/pci.rst says:
> 
>   ... devices have to be brought back to the fully functional
>   state ...
> 
>   pci_pm_thaw_noirq() ... doesn't put the device into the full power
>   state and doesn't attempt to restore its standard configuration
>   registers.
> 
> That doesn't seem consistent, and it looks like pci_pm_thaw_noirq()
> actually *does* put the device in full power (D0) state and restore
> config registers.

I would leave these questions to Rafael.
 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index a8124e47bf6e..30c721fd6bcf 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1068,7 +1068,7 @@ static int pci_pm_thaw_noirq(struct device *dev)
>  {
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
>  	struct device_driver *drv = dev->driver;
> -	int error = 0;
> +	int error;
> 
>  	if (pcibios_pm_ops.thaw_noirq) {
>  		error = pcibios_pm_ops.thaw_noirq(dev);
> @@ -1076,9 +1076,6 @@ static int pci_pm_thaw_noirq(struct device *dev)
>  			return error;
>  	}
> 
> -	if (pci_has_legacy_pm_support(pci_dev))
> -		return pci_legacy_resume_early(dev);
> -
>  	/*
>  	 * pci_restore_state() requires the device to be in D0 (because of MSI
>  	 * restoration among other things), so force it into D0 in case the
> @@ -1087,10 +1084,13 @@ static int pci_pm_thaw_noirq(struct device *dev)
>  	pci_set_power_state(pci_dev, PCI_D0);
>  	pci_restore_state(pci_dev);
> 
> +	if (pci_has_legacy_pm_support(pci_dev))
> +		return pci_legacy_resume_early(dev);
> +
>  	if (drv && drv->pm && drv->pm->thaw_noirq)
> -		error = drv->pm->thaw_noirq(dev);
> +		return drv->pm->thaw_noirq(dev);
> 
> -	return error;
> +	return 0;
>  }
> 
>  static int pci_pm_thaw(struct device *dev)

The only real difference from my patch is that you moved

 +	if (pci_has_legacy_pm_support(pci_dev))
 +		return pci_legacy_resume_early(dev);

to after the line "pci_restore_state(pci_dev);"

This change looks good to me, and should also resolve the error message I saw.
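
To make the ordering difference concrete, here is a rough user-space C sketch that records the order of steps in each variant of pci_pm_thaw_noirq(). It is not kernel code: the `model_*` and `STEP_*` names are hypothetical, the pcibios hook is left out, and the non-legacy path assumes the driver provides a thaw_noirq callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the calls made inside pci_pm_thaw_noirq(). */
enum model_step {
	STEP_SET_D0,              /* pci_set_power_state(pci_dev, PCI_D0) */
	STEP_RESTORE_STATE,       /* pci_restore_state(pci_dev) */
	STEP_LEGACY_RESUME_EARLY, /* pci_legacy_resume_early(dev) */
	STEP_THAW_NOIRQ           /* drv->pm->thaw_noirq(dev), assumed present */
};

struct model_trace {
	enum model_step steps[4];
	int count;
};

static void model_record(struct model_trace *t, enum model_step s)
{
	assert(t && t->count < 4);
	t->steps[t->count++] = s;
}

/* Ordering in the v2 patch: D0 first, then the legacy early return. */
static void model_thaw_noirq_v2(bool legacy, struct model_trace *t)
{
	model_record(t, STEP_SET_D0);
	if (legacy) {
		model_record(t, STEP_LEGACY_RESUME_EARLY);
		return;
	}
	model_record(t, STEP_RESTORE_STATE);
	model_record(t, STEP_THAW_NOIRQ);
}

/* Ordering in the alternative: D0 and state restore before the legacy check. */
static void model_thaw_noirq_alt(bool legacy, struct model_trace *t)
{
	model_record(t, STEP_SET_D0);
	model_record(t, STEP_RESTORE_STATE);
	if (legacy) {
		model_record(t, STEP_LEGACY_RESUME_EARLY);
		return;
	}
	model_record(t, STEP_THAW_NOIRQ);
}
```

In this model both variants put the device in D0 before the legacy callback runs, which is what the MSI-X path needs; the alternative additionally restores the saved state for legacy drivers, and the two variants are identical for non-legacy drivers.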

Thanks,
-- Dexuan

