Discussion:
[PATCH] ARM: OMAP2+: omap_device: drop broken RPM status update from suspend_noirq
Johan Hovold
2017-07-24 09:52:01 UTC
Permalink
Since commit a8636c89648a ("PM / Runtime: Don't allow to suspend a
device with an active child"), which went into 4.10, it is no longer
permitted to set RPM_SUSPENDED state for a device with active children
(unless power.ignore_children is set).

This specifically means that the attempts to do just that from the omap
pm-domain suspend_noirq callback have since been failing whenever a
child is active, for example:

am335x-usb-childs 47400000.usb: runtime PM trying to suspend
device but active child

Silence this warning by dropping the broken pm_runtime_set_suspended()
call from the omap suspend_noirq callback along with the redundant
pm_runtime_set_active() in resume_noirq.

This effectively reverts commit 3522bf7bfa24 ("ARM: OMAP2+: omap_device:
maintain sane runtime pm status around suspend/resume"), which started
updating the RPM state after the runtime_suspend callback (!) for active
omap devices had been called during system suspend. The rationale was
that a later pm_runtime_get_sync() would then fail (even after runtime
pm had been disabled) and that this in turn would avoid any external
aborts when accessing registers with clocks disabled. (See also commit
6f3c77b040fc ("PM / Runtime: let rpm_resume() succeed if RPM_ACTIVE,
even when disabled, v2").

But during the suspend_noirq phase all children would already have been
suspended and their drivers would specifically not attempt any further
register accesses. And if this was all just a workaround for random
device drivers doing cross-tree calls during system suspend, those
drivers should be fixed and updated to explicitly model such
dependencies using device-links instead (and either way, any such calls
have been causing crashes since 4.10).

Fixes: 3522bf7bfa24 ("ARM: OMAP2+: omap_device: maintain sane runtime pm status around suspend/resume")
Fixes: a8636c89648a ("PM / Runtime: Don't allow to suspend a device with an active child")
Cc: Alan Stern <***@rowland.harvard.edu>
Cc: Dave Gerlach <d-***@ti.com>
Cc: Kevin Hilman <***@baylibre.com>
Cc: Nishanth Menon <***@ti.com>
Cc: Rafael J. Wysocki <***@rjwysocki.net>
Cc: Tony Lindgren <***@atomide.com>
Cc: Ulf Hansson <***@linaro.org>
Signed-off-by: Johan Hovold <***@kernel.org>
---
arch/arm/mach-omap2/omap_device.c | 10 ----------
1 file changed, 10 deletions(-)

diff --git a/arch/arm/mach-omap2/omap_device.c b/arch/arm/mach-omap2/omap_device.c
index ef9ffb8ac912..acbede082b5b 100644
--- a/arch/arm/mach-omap2/omap_device.c
+++ b/arch/arm/mach-omap2/omap_device.c
@@ -672,7 +672,6 @@ static int _od_suspend_noirq(struct device *dev)

if (!ret && !pm_runtime_status_suspended(dev)) {
if (pm_generic_runtime_suspend(dev) == 0) {
- pm_runtime_set_suspended(dev);
omap_device_idle(pdev);
od->flags |= OMAP_DEVICE_SUSPENDED;
}
@@ -689,15 +688,6 @@ static int _od_resume_noirq(struct device *dev)
if (od->flags & OMAP_DEVICE_SUSPENDED) {
od->flags &= ~OMAP_DEVICE_SUSPENDED;
omap_device_enable(pdev);
- /*
- * XXX: we run before core runtime pm has resumed itself. At
- * this point in time, we just restore the runtime pm state and
- * considering symmetric operations in resume, we donot expect
- * to fail. If we failed, something changed in core runtime_pm
- * framework OR some device driver messed things up, hence, WARN
- */
- WARN(pm_runtime_set_active(dev),
- "Could not set %s runtime state active\n", dev_name(dev));
pm_generic_runtime_resume(dev);
}
--
2.13.3
Grygorii Strashko
2017-07-24 22:16:02 UTC
Permalink
Post by Johan Hovold
Since commit a8636c89648a ("PM / Runtime: Don't allow to suspend a
device with an active child"), which went into 4.10, it is no longer
permitted to set RPM_SUSPENDED state for a device with active children
(unless power.ignore_children is set).
This specifically means that the attempts to do just that from the omap
pm-domain suspend_noirq callback have since been failing whenever a
am335x-usb-childs 47400000.usb: runtime PM trying to suspend
device but active child
Silence this warning by dropping the broken pm_runtime_set_suspended()
call from the omap suspend_noirq callback along with the redundant
pm_runtime_set_active() in resume_noirq.
maintain sane runtime pm status around suspend/resume"), which started
updating the RPM state after the runtime_suspend callback (!) for active
omap devices had been called during system suspend. The rationale was
that a later pm_runtime_get_sync() would then fail (even after runtime
pm had been disabled) and that this in turn would avoid any external
aborts when accessing registers with clocks disabled. (See also commit
6f3c77b040fc ("PM / Runtime: let rpm_resume() succeed if RPM_ACTIVE,
even when disabled, v2").
But during the suspend_noirq phase all children would already have been
suspended and their drivers would specifically not attempt any further
register accesses. And if this was all just a workaround for random
device drivers doing cross-tree calls during system suspend, those
drivers should be fixed and updated to explicitly model such
dependencies using device-links instead (and either way, any such calls
have been causing crashes since 4.10).
I'd like to provide some background and comments here.

1) OMAP platform is PM runtime centric - in other words all device's
PM operations (power on/off) are done through PM runtime calls and devices
drivers do not really change device's Power states during suspend/resume
- just do preparation for suspend. The final disabling of devices is done
form _od_suspend_noirq() or if device driver calls pm_runtime_force_suspend()
directly during suspend.

2) Initially all this suspend_noirq code was introduced due to commit

commit 1e2ef05bb8cf851a694d38e9170c89e7ff052741
Author: Rafael J. Wysocki <***@sisk.pl>
Date: Wed Jul 6 10:51:58 2011 +0200

PM: Limit race conditions between runtime PM and system sleep (v2)

which blocks PM runtime during suspend resume. And, at that time, there were
no pm_runtime_force_suspend/resume() neither proper support for power domains.

As result, below two calls really switch OFF device and its state should be
RPM_SUSPENDED as it will reflect real PM state of HW:
m_generic_runtime_suspend(dev)
omap_device_idle(pdev);

In general, now the code in _od_resume/resume_noirq() callback is
equivalent to pm_runtime_force_suspend/resume().

3) It's expected to ignore -EBUSY return code from pm_generic_runtime_suspend(dev)
in _od_suspend_noirq(), as such code (-EBUSY) tells OMAP device framework that
device should be kept active during suspend (for example
serial device in case of "no_console_suspend").

commit 4b7ec5accecdb136c7afaf8739a06d5335cc05aa
Author: Sourav Poddar <***@ti.com>
Date: Sat Apr 27 01:55:34 2013 +0530

arm: omap2+: omap_device: remove no_idle_on_suspend

As result, when "runtime PM trying to suspend device but active child" happens
the OMAP device framework will ignore it and gracefully continue to suspend where
target device will finally powered off (depending on suspend mode and device).
On resume, target device will not be powered on, because its state was not marked as
OMAP_DEVICE_SUSPENDED, so further attempts to power on device with pm_runtime_get_sync()
will be a NOP (because device state is RPM_ACTIVE) and any access to the device will fail
(system crash).

4) Unfortunately, commit a8636c89648a ("PM / Runtime: Don't allow to suspend a
device with an active child") "breaks" OMAP suspend functionality (or
makes visible more fundamental issues of PM runtime vs Sys suspend), because
not all Kernel subsystem (especially USB) maintain their devices PM runtime
state properly during suspend, which cause pm_runtime_set_suspended() (or
pm_runtime_force_suspend()) to return -EBUSY and, as result, misbehave of
OMAP device framework.


My personal thought here is that removing of pm_runtime_set_active() will not fix
root cause of the problem, but rather hide it :( and, probably, real fix will be
to update USB framework to ensure that all suspend devices are also PM runtime suspend
(not sure how) or add few more pm_suspend_ignore_children() calls
(for example as I've tried to do in [2], but this was unfinished).

I've found very simple steps to reproduce suspend failure on am335x-evm (should also
work on BBB) - do below sequence with USB device plugged:

echo platform > /sys/power/pm_test
echo 1 > /sys/power/pm_print_times
[ echo 0 > /sys/module/printk/parameters/console_suspend ]
echo mem > /sys/power/state

[ 95.499685] calling 47400000.usb+ @ 733, parent: ocp
[ 95.504818] am335x-usb-childs 47400000.usb: runtime PM trying to suspend device but active child
[ 95.513750] am335x-usb-childs 47400000.usb: omap device suspend failure 0

Below I've attached possible patch which converts OMAP device to
use pm_runtime_force_suspend/resume().


Related discussion:
[1] https://patchwork.kernel.org/patch/9642979/
[2] https://patchwork.kernel.org/patch/9660517/

regards,
-grygorii

----
From 05cd6c19f898ca28480b9ca8dca5b987190351d9 Mon Sep 17 00:00:00 2001
From: Grygorii Strashko <***@ti.com>
Date: Mon, 24 Jul 2017 16:54:48 -0500
Subject: [PATCH] ARM: OMAP2+: omap_device: use pm runtime force api in noirq
callbacks

The current implementation of _od_suspend_noirq and _od_resume_noirq
callbacks is duing actually the same things as
pm_runtime_force_suspend/resume() when it disables currently active
devices, so convert those callbacks to use
pm_runtime_force_suspend/resume().

Signed-off-by: Grygorii Strashko <***@ti.com>
---
arch/arm/mach-omap2/omap_device.c | 41 +++++++++++++++++++++------------------
1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/arch/arm/mach-omap2/omap_device.c b/arch/arm/mach-omap2/omap_device.c
index ef9ffb8..b91b0cd 100644
--- a/arch/arm/mach-omap2/omap_device.c
+++ b/arch/arm/mach-omap2/omap_device.c
@@ -670,12 +670,21 @@ static int _od_suspend_noirq(struct device *dev)

ret = pm_generic_suspend_noirq(dev);

- if (!ret && !pm_runtime_status_suspended(dev)) {
- if (pm_generic_runtime_suspend(dev) == 0) {
- pm_runtime_set_suspended(dev);
- omap_device_idle(pdev);
- od->flags |= OMAP_DEVICE_SUSPENDED;
+ if (!ret) {
+ ret = pm_runtime_force_suspend(dev);
+ if (ret == -EBUSY) {
+ /*
+ * In case of error pm_runtime_force_suspend will
+ * not call pm_runtime_disable() and it's required to
+ * make additional call to pm_runtime_disable() here
+ * to balance it with pm_runtime_enable() call in
+ * pm_runtime_force_resume()
+ */
+ pm_runtime_disable(dev);
+ return 0;
}
+ if (ret)
+ dev_err(dev, "omap device suspend failure %d", ret);
}

return ret;
@@ -685,21 +694,15 @@ static int _od_resume_noirq(struct device *dev)
{
struct platform_device *pdev = to_platform_device(dev);
struct omap_device *od = to_omap_device(pdev);
+ int ret;

- if (od->flags & OMAP_DEVICE_SUSPENDED) {
- od->flags &= ~OMAP_DEVICE_SUSPENDED;
- omap_device_enable(pdev);
- /*
- * XXX: we run before core runtime pm has resumed itself. At
- * this point in time, we just restore the runtime pm state and
- * considering symmetric operations in resume, we donot expect
- * to fail. If we failed, something changed in core runtime_pm
- * framework OR some device driver messed things up, hence, WARN
- */
- WARN(pm_runtime_set_active(dev),
- "Could not set %s runtime state active\n", dev_name(dev));
- pm_generic_runtime_resume(dev);
- }
+ /* Don't attempt late suspend on a driver that is not bound */
+ if (od->_driver_status != BUS_NOTIFY_BOUND_DRIVER)
+ return 0;
+
+ ret = pm_runtime_force_resume(dev);
+ if (ret)
+ dev_err(dev, "omap device resume failure %d", ret);

return pm_generic_resume_noirq(dev);
}
--
2.10.1
Tony Lindgren
2017-07-25 07:10:49 UTC
Permalink
Post by Grygorii Strashko
My personal thought here is that removing of pm_runtime_set_active() will not fix
root cause of the problem, but rather hide it :( and, probably, real fix will be
to update USB framework to ensure that all suspend devices are also PM runtime suspend
(not sure how) or add few more pm_suspend_ignore_children() calls
(for example as I've tried to do in [2], but this was unfinished).
I've found very simple steps to reproduce suspend failure on am335x-evm (should also
echo platform > /sys/power/pm_test
echo 1 > /sys/power/pm_print_times
[ echo 0 > /sys/module/printk/parameters/console_suspend ]
echo mem > /sys/power/state
[ 95.504818] am335x-usb-childs 47400000.usb: runtime PM trying to suspend device but active child
[ 95.513750] am335x-usb-childs 47400000.usb: omap device suspend failure 0
Below I've attached possible patch which converts OMAP device to
use pm_runtime_force_suspend/resume().
It seems to almost work for my PM test cases.. It seems that serial console
somehow won't get restored after suspend/resume cycle on omap3 though.

The system enters off mode during suspend, and wakes up properly so I can
ssh to it after resume. But the serial console no longer works after resume.
This is with 8250-omap driver.

Regards,

Tony
Tony Lindgren
2017-07-25 08:56:18 UTC
Permalink
Post by Tony Lindgren
Post by Grygorii Strashko
My personal thought here is that removing of pm_runtime_set_active() will not fix
root cause of the problem, but rather hide it :( and, probably, real fix will be
to update USB framework to ensure that all suspend devices are also PM runtime suspend
(not sure how) or add few more pm_suspend_ignore_children() calls
(for example as I've tried to do in [2], but this was unfinished).
I've found very simple steps to reproduce suspend failure on am335x-evm (should also
echo platform > /sys/power/pm_test
echo 1 > /sys/power/pm_print_times
[ echo 0 > /sys/module/printk/parameters/console_suspend ]
echo mem > /sys/power/state
[ 95.504818] am335x-usb-childs 47400000.usb: runtime PM trying to suspend device but active child
[ 95.513750] am335x-usb-childs 47400000.usb: omap device suspend failure 0
Below I've attached possible patch which converts OMAP device to
use pm_runtime_force_suspend/resume().
It seems to almost work for my PM test cases.. It seems that serial console
somehow won't get restored after suspend/resume cycle on omap3 though.
The system enters off mode during suspend, and wakes up properly so I can
ssh to it after resume. But the serial console no longer works after resume.
This is with 8250-omap driver.
And FYI, on omap4 this produces a bunch of L3 irq errors on suspend.

Tony
Grygorii Strashko
2017-07-25 17:41:27 UTC
Permalink
Post by Tony Lindgren
Post by Tony Lindgren
Post by Grygorii Strashko
My personal thought here is that removing of pm_runtime_set_active() will not fix
root cause of the problem, but rather hide it :( and, probably, real fix will be
to update USB framework to ensure that all suspend devices are also PM runtime suspend
(not sure how) or add few more pm_suspend_ignore_children() calls
(for example as I've tried to do in [2], but this was unfinished).
I've found very simple steps to reproduce suspend failure on am335x-evm (should also
echo platform > /sys/power/pm_test
echo 1 > /sys/power/pm_print_times
[ echo 0 > /sys/module/printk/parameters/console_suspend ]
echo mem > /sys/power/state
[ 95.504818] am335x-usb-childs 47400000.usb: runtime PM trying to suspend device but active child
[ 95.513750] am335x-usb-childs 47400000.usb: omap device suspend failure 0
Below I've attached possible patch which converts OMAP device to
use pm_runtime_force_suspend/resume().
It seems to almost work for my PM test cases.. It seems that serial console
somehow won't get restored after suspend/resume cycle on omap3 though.
The system enters off mode during suspend, and wakes up properly so I can
ssh to it after resume. But the serial console no longer works after resume.
This is with 8250-omap driver.
And FYI, on omap4 this produces a bunch of L3 irq errors on suspend.
Ok. This was a risky change even if it looks obvious :(
--
regards,
-grygorii
Johan Hovold
2017-07-25 08:24:44 UTC
Permalink
Post by Grygorii Strashko
Post by Johan Hovold
Since commit a8636c89648a ("PM / Runtime: Don't allow to suspend a
device with an active child"), which went into 4.10, it is no longer
permitted to set RPM_SUSPENDED state for a device with active children
(unless power.ignore_children is set).
This specifically means that the attempts to do just that from the omap
pm-domain suspend_noirq callback have since been failing whenever a
am335x-usb-childs 47400000.usb: runtime PM trying to suspend
device but active child
Silence this warning by dropping the broken pm_runtime_set_suspended()
call from the omap suspend_noirq callback along with the redundant
pm_runtime_set_active() in resume_noirq.
maintain sane runtime pm status around suspend/resume"), which started
updating the RPM state after the runtime_suspend callback (!) for active
omap devices had been called during system suspend. The rationale was
that a later pm_runtime_get_sync() would then fail (even after runtime
pm had been disabled) and that this in turn would avoid any external
aborts when accessing registers with clocks disabled. (See also commit
6f3c77b040fc ("PM / Runtime: let rpm_resume() succeed if RPM_ACTIVE,
even when disabled, v2").
But during the suspend_noirq phase all children would already have been
suspended and their drivers would specifically not attempt any further
register accesses. And if this was all just a workaround for random
device drivers doing cross-tree calls during system suspend, those
drivers should be fixed and updated to explicitly model such
dependencies using device-links instead (and either way, any such calls
have been causing crashes since 4.10).
I'd like to provide some background and comments here.
1) OMAP platform is PM runtime centric - in other words all device's
PM operations (power on/off) are done through PM runtime calls and
devices drivers do not really change device's Power states during
suspend/resume - just do preparation for suspend. The final disabling
of devices is done form _od_suspend_noirq() or if device driver calls
pm_runtime_force_suspend() directly during suspend.
2) Initially all this suspend_noirq code was introduced due to commit
commit 1e2ef05bb8cf851a694d38e9170c89e7ff052741
Date: Wed Jul 6 10:51:58 2011 +0200
PM: Limit race conditions between runtime PM and system sleep (v2)
which blocks PM runtime during suspend resume. And, at that time,
there were no pm_runtime_force_suspend/resume() neither proper support
for power domains.
As result, below two calls really switch OFF device and its state
m_generic_runtime_suspend(dev)
omap_device_idle(pdev);
That's the point to be discussed; does the runtime status really need to
reflect the hardware state *during* suspend?

As you've noticed, not all subsystems enforce that, even if omap seems
to have been expecting it.

And as I mentioned in the commit message, at least the point about
wanting to have late pm_runtime_get_sync() fail during suspend appears
to be moot.
Post by Grygorii Strashko
In general, now the code in _od_resume/resume_noirq() callback is
equivalent to pm_runtime_force_suspend/resume().
3) It's expected to ignore -EBUSY return code from
pm_generic_runtime_suspend(dev) in _od_suspend_noirq(), as such code
(-EBUSY) tells OMAP device framework that device should be kept active
during suspend (for example serial device in case of
"no_console_suspend").
commit 4b7ec5accecdb136c7afaf8739a06d5335cc05aa
Date: Sat Apr 27 01:55:34 2013 +0530
arm: omap2+: omap_device: remove no_idle_on_suspend
As result, when "runtime PM trying to suspend device but active child"
happens the OMAP device framework will ignore it and gracefully
continue to suspend where target device will finally powered off
(depending on suspend mode and device). On resume, target device will
not be powered on, because its state was not marked as
OMAP_DEVICE_SUSPENDED, so further attempts to power on device with
pm_runtime_get_sync() will be a NOP (because device state is
RPM_ACTIVE) and any access to the device will fail (system crash).
I believe you're mistaken here; _od_suspend_noirq() will continue to
power-off *and* set OMAP_DEVICE_SUSPENDED, so that device is again
powered during resume_noirq also after this patch has been applied.

Specifically, note that the return value of pm_runtime_set_suspended()
was never checked, so the only difference here would be that the error
message is suppressed.
Post by Grygorii Strashko
4) Unfortunately, commit a8636c89648a ("PM / Runtime: Don't allow to
suspend a device with an active child") "breaks" OMAP suspend
functionality (or makes visible more fundamental issues of PM runtime
vs Sys suspend), because not all Kernel subsystem (especially USB)
maintain their devices PM runtime state properly during suspend, which
cause pm_runtime_set_suspended() (or pm_runtime_force_suspend()) to
return -EBUSY and, as result, misbehave of OMAP device framework.
Again, the return value of pm_runtime_set_suspended() was never checked
so the only thing commit a8636c89648a ("PM / Runtime: Don't allow to
suspend a device with an active child") did was to trigger those error
messages and leave the runtime PM state at RPM_ACTIVE. The latter does
not seems to have any other side effects, and specifically, the device
would still be powered off.
Post by Grygorii Strashko
My personal thought here is that removing of pm_runtime_set_active()
will not fix root cause of the problem, but rather hide it :( and,
probably, real fix will be to update USB framework to ensure that all
suspend devices are also PM runtime suspend (not sure how) or add few
more pm_suspend_ignore_children() calls (for example as I've tried to
do in [2], but this was unfinished).
I've found very simple steps to reproduce suspend failure on
am335x-evm (should also > work on BBB) - do below sequence with USB
echo platform > /sys/power/pm_test
echo 1 > /sys/power/pm_print_times
[ echo 0 > /sys/module/printk/parameters/console_suspend ]
echo mem > /sys/power/state
[ 95.504818] am335x-usb-childs 47400000.usb: runtime PM trying to suspend device but active child
[ 95.513750] am335x-usb-childs 47400000.usb: omap device suspend failure 0
Below I've attached possible patch which converts OMAP device to
use pm_runtime_force_suspend/resume().
What code are you running above? The OMAP PM code should not be failing
due to that error message at least (as mentioned twice above).

I have musb suspend working on BBB with the patch I posted yesterday
which keeps the controller at RPM_ACTIVE during suspend:

https://marc.info/?i=20170724094939.21477-1-johan%40kernel.org

Thanks,
Johan
Grygorii Strashko
2017-07-25 17:48:40 UTC
Permalink
Hi Johan,
Post by Johan Hovold
Post by Grygorii Strashko
Post by Johan Hovold
Since commit a8636c89648a ("PM / Runtime: Don't allow to suspend a
device with an active child"), which went into 4.10, it is no longer
permitted to set RPM_SUSPENDED state for a device with active children
(unless power.ignore_children is set).
This specifically means that the attempts to do just that from the omap
pm-domain suspend_noirq callback have since been failing whenever a
am335x-usb-childs 47400000.usb: runtime PM trying to suspend
device but active child
Silence this warning by dropping the broken pm_runtime_set_suspended()
call from the omap suspend_noirq callback along with the redundant
pm_runtime_set_active() in resume_noirq.
maintain sane runtime pm status around suspend/resume"), which started
updating the RPM state after the runtime_suspend callback (!) for active
omap devices had been called during system suspend. The rationale was
that a later pm_runtime_get_sync() would then fail (even after runtime
pm had been disabled) and that this in turn would avoid any external
aborts when accessing registers with clocks disabled. (See also commit
6f3c77b040fc ("PM / Runtime: let rpm_resume() succeed if RPM_ACTIVE,
even when disabled, v2").
But during the suspend_noirq phase all children would already have been
suspended and their drivers would specifically not attempt any further
register accesses. And if this was all just a workaround for random
device drivers doing cross-tree calls during system suspend, those
drivers should be fixed and updated to explicitly model such
dependencies using device-links instead (and either way, any such calls
have been causing crashes since 4.10).
I'd like to provide some background and comments here.
1) OMAP platform is PM runtime centric - in other words all device's
PM operations (power on/off) are done through PM runtime calls and
devices drivers do not really change device's Power states during
suspend/resume - just do preparation for suspend. The final disabling
of devices is done form _od_suspend_noirq() or if device driver calls
pm_runtime_force_suspend() directly during suspend.
2) Initially all this suspend_noirq code was introduced due to commit
commit 1e2ef05bb8cf851a694d38e9170c89e7ff052741
Date: Wed Jul 6 10:51:58 2011 +0200
PM: Limit race conditions between runtime PM and system sleep (v2)
which blocks PM runtime during suspend resume. And, at that time,
there were no pm_runtime_force_suspend/resume() neither proper support
for power domains.
As result, below two calls really switch OFF device and its state
m_generic_runtime_suspend(dev)
omap_device_idle(pdev);
That's the point to be discussed; does the runtime status really need to
reflect the hardware state *during* suspend?
At least there are no restriction, as I know.
Post by Johan Hovold
As you've noticed, not all subsystems enforce that, even if omap seems
to have been expecting it.
True.
Post by Johan Hovold
And as I mentioned in the commit message, at least the point about
wanting to have late pm_runtime_get_sync() fail during suspend appears
to be moot.
Correct. It was introduced long time ago (before device-links).
Post by Johan Hovold
Post by Grygorii Strashko
In general, now the code in _od_resume/resume_noirq() callback is
equivalent to pm_runtime_force_suspend/resume().
3) It's expected to ignore -EBUSY return code from
pm_generic_runtime_suspend(dev) in _od_suspend_noirq(), as such code
(-EBUSY) tells OMAP device framework that device should be kept active
during suspend (for example serial device in case of
"no_console_suspend").
commit 4b7ec5accecdb136c7afaf8739a06d5335cc05aa
Date: Sat Apr 27 01:55:34 2013 +0530
arm: omap2+: omap_device: remove no_idle_on_suspend
As result, when "runtime PM trying to suspend device but active child"
happens the OMAP device framework will ignore it and gracefully
continue to suspend where target device will finally powered off
(depending on suspend mode and device). On resume, target device will
not be powered on, because its state was not marked as
OMAP_DEVICE_SUSPENDED, so further attempts to power on device with
pm_runtime_get_sync() will be a NOP (because device state is
RPM_ACTIVE) and any access to the device will fail (system crash).
I believe you're mistaken here; _od_suspend_noirq() will continue to
power-off *and* set OMAP_DEVICE_SUSPENDED, so that device is again
powered during resume_noirq also after this patch has been applied.
Specifically, note that the return value of pm_runtime_set_suspended()
was never checked, so the only difference here would be that the error
message is suppressed.
Correct. Sorry. My bad - it was Monday.
Post by Johan Hovold
Post by Grygorii Strashko
4) Unfortunately, commit a8636c89648a ("PM / Runtime: Don't allow to
suspend a device with an active child") "breaks" OMAP suspend
functionality (or makes visible more fundamental issues of PM runtime
vs Sys suspend), because not all Kernel subsystem (especially USB)
maintain their devices PM runtime state properly during suspend, which
cause pm_runtime_set_suspended() (or pm_runtime_force_suspend()) to
return -EBUSY and, as result, misbehave of OMAP device framework.
Again, the return value of pm_runtime_set_suspended() was never checked
so the only thing commit a8636c89648a ("PM / Runtime: Don't allow to
suspend a device with an active child") did was to trigger those error
messages and leave the runtime PM state at RPM_ACTIVE. The latter does
not seems to have any other side effects, and specifically, the device
would still be powered off.
Again sorry for the noise.
Post by Johan Hovold
Post by Grygorii Strashko
My personal thought here is that removing of pm_runtime_set_active()
will not fix root cause of the problem, but rather hide it :( and,
probably, real fix will be to update USB framework to ensure that all
suspend devices are also PM runtime suspend (not sure how) or add few
more pm_suspend_ignore_children() calls (for example as I've tried to
do in [2], but this was unfinished).
I've found very simple steps to reproduce suspend failure on
am335x-evm (should also > work on BBB) - do below sequence with USB
echo platform > /sys/power/pm_test
echo 1 > /sys/power/pm_print_times
[ echo 0 > /sys/module/printk/parameters/console_suspend ]
echo mem > /sys/power/state
[ 95.504818] am335x-usb-childs 47400000.usb: runtime PM trying to suspend device but active child
[ 95.513750] am335x-usb-childs 47400000.usb: omap device suspend failure 0
Below I've attached possible patch which converts OMAP device to
use pm_runtime_force_suspend/resume().
What code are you running above? The OMAP PM code should not be failing
due to that error message at least (as mentioned twice above).
I have musb suspend working on BBB with the patch I posted yesterday
https://marc.info/?i=20170724094939.21477-1-johan%40kernel.org
I re-checked it on clean master (Linux 4.13-rc2) with and without your patches
and see no issues, just "runtime PM trying to suspend device but active child"
message is gone. Above test sequence still can be used without any additional patches.

So, thank you for your patches and sorry for the noise.

Tested-by: Grygorii Strashko <***@ti.com>
--
regards,
-grygorii
Tony Lindgren
2017-07-25 08:55:17 UTC
Permalink
Post by Johan Hovold
Since commit a8636c89648a ("PM / Runtime: Don't allow to suspend a
device with an active child"), which went into 4.10, it is no longer
permitted to set RPM_SUSPENDED state for a device with active children
(unless power.ignore_children is set).
This specifically means that the attempts to do just that from the omap
pm-domain suspend_noirq callback have since been failing whenever a
am335x-usb-childs 47400000.usb: runtime PM trying to suspend
device but active child
Silence this warning by dropping the broken pm_runtime_set_suspended()
call from the omap suspend_noirq callback along with the redundant
pm_runtime_set_active() in resume_noirq.
With this things still work for me for my PM tests on omap3. Device
suspends and resumes just fine hitting off mode during suspend.

Regards,

Tony
Loading...