Philipp Zabel
2017-07-21 10:30:47 UTC
Hi Maxim,
thank you for the patch and the analysis. I have some comments below.
be easier to comment on specifics. Also, as a thermal patch, this should
be sent to linux-***@vger.kernel.org. The scripts/get_maintainers.pl
script in the kernel sources can help to find the relevant maintainers
and mailing lists.
I think the data->irq_enabled assignment should be moved up before the
call to devm_request_threaded_irq as well. If the interrupt triggers
immediately (and sets data->irq_enabled=false), and then
imx_thermal_probe returns (after setting data->irq_enabled=true) before
the threaded irq handler gets to run, imx_get_temp will not reenable the
interrupt at the end, if the temperature has just fallen below the alarm
temperature.
Reviewed-by: Philipp Zabel <***@pengutronix.de>
best regards
Philipp
thank you for the patch and the analysis. I have some comments below.
Hello, Shawn, Sascha,
We use yocto (with mainline kernel 4.4.18) for our i.MX 6SOLO-based
custom board. We use in-kernel thermal management framework with
default thermal governor (step_wise) (drivers/thermal/imx-thermal.c)
imx_thermal 2000000.aips-bus:tempmon: Commercial CPU temperature grade -
max:95C critical:90C passive:85C
We heat up the board in climate chamber until temperature reaches
critical value (90C) and in-kernel thermal management powers it off.
After short period of time (when temperature is in range
passive(85C)...critical(90C), we power up the board again so the alarm
condition is met and imx_thermal interrupt fires up. When we try to read
out the temperature from corresponding sysfs file we permanently get the
/sys/class/thermal/thermal_zone0/temp
...
open("/sys/class/thermal/thermal_zone0/temp", O_RDONLY|O_LARGEFILE) = 3
sendfile64(1, 3, NULL, 16777216) = -1 EAGAIN (Resource
temporarily unavailable)
read(3, 0x7ee5bc00, 4096) = -1 EAGAIN (Resource
temporarily unavailable)
brk(NULL) = 0x1ad9000
brk(0x1afa000) = 0x1afa000
Resource temporarily unavailable
CPU0 <snip>
271: 2 GPC 49 Level imx_thermal
<snip>
There are a couple of workarounds to enforce the temperature file to be
readable. If we explicitly enable the thermal's mode via sysfs (echo
enabled > /sys/sys/class/therma/thermal_zone0/mode) the temperature file
becomes readable. The same applies to suspend/resume cycles - after
resume the temperature file is readable.
Having analyzed/debugged the code (for both mainline and freescale's
In imx_thermal_probe() thermal alarm interrupt is enabled before
device's 'mode' field is set to THERMAL_DEVICE_ENABLED while the sensor
hardware is already powered up. If alarm condition is met -
the interrupt immediately fires up. During (threaded) interrupt
processing imx_get_temp() is called. The field 'mode' is still set to
DISABLED, so imx_get_temp() processes such case by special way: sensor
is powered up,
measurement is enabled, a reading is taken, after that measurement is
DISABLED and temperature sensor is POWERED DOWN.
When processing of alarm interrupt ends, imx_thermal_probe() continues
and sets mode field to ENABLED, but in fact the device is powered off!
This leads to broken logic of further calls of imx_get_temp().
The consequences of this bug could be quite serious - the temperature is
not read out from the sensor, so in-kernel thermal management is useless
- the board is not powered off by thermal management when the CPU is
overheated.
Attached is patch against current mainline kernel tree.
It would be preferable to have the patch sent inline. That way it wouldWe use yocto (with mainline kernel 4.4.18) for our i.MX 6SOLO-based
custom board. We use in-kernel thermal management framework with
default thermal governor (step_wise) (drivers/thermal/imx-thermal.c)
imx_thermal 2000000.aips-bus:tempmon: Commercial CPU temperature grade -
max:95C critical:90C passive:85C
We heat up the board in climate chamber until temperature reaches
critical value (90C) and in-kernel thermal management powers it off.
After short period of time (when temperature is in range
passive(85C)...critical(90C), we power up the board again so the alarm
condition is met and imx_thermal interrupt fires up. When we try to read
out the temperature from corresponding sysfs file we permanently get the
/sys/class/thermal/thermal_zone0/temp
...
open("/sys/class/thermal/thermal_zone0/temp", O_RDONLY|O_LARGEFILE) = 3
sendfile64(1, 3, NULL, 16777216) = -1 EAGAIN (Resource
temporarily unavailable)
read(3, 0x7ee5bc00, 4096) = -1 EAGAIN (Resource
temporarily unavailable)
brk(NULL) = 0x1ad9000
brk(0x1afa000) = 0x1afa000
Resource temporarily unavailable
CPU0 <snip>
271: 2 GPC 49 Level imx_thermal
<snip>
There are a couple of workarounds to enforce the temperature file to be
readable. If we explicitly enable the thermal's mode via sysfs (echo
enabled > /sys/sys/class/therma/thermal_zone0/mode) the temperature file
becomes readable. The same applies to suspend/resume cycles - after
resume the temperature file is readable.
Having analyzed/debugged the code (for both mainline and freescale's
In imx_thermal_probe() thermal alarm interrupt is enabled before
device's 'mode' field is set to THERMAL_DEVICE_ENABLED while the sensor
hardware is already powered up. If alarm condition is met -
the interrupt immediately fires up. During (threaded) interrupt
processing imx_get_temp() is called. The field 'mode' is still set to
DISABLED, so imx_get_temp() processes such case by special way: sensor
is powered up,
measurement is enabled, a reading is taken, after that measurement is
DISABLED and temperature sensor is POWERED DOWN.
When processing of alarm interrupt ends, imx_thermal_probe() continues
and sets mode field to ENABLED, but in fact the device is powered off!
This leads to broken logic of further calls of imx_get_temp().
The consequences of this bug could be quite serious - the temperature is
not read out from the sensor, so in-kernel thermal management is useless
- the board is not powered off by thermal management when the CPU is
overheated.
Attached is patch against current mainline kernel tree.
be easier to comment on specifics. Also, as a thermal patch, this should
be sent to linux-***@vger.kernel.org. The scripts/get_maintainers.pl
script in the kernel sources can help to find the relevant maintainers
and mailing lists.
I think the data->irq_enabled assignment should be moved up before the
call to devm_request_threaded_irq as well. If the interrupt triggers
immediately (and sets data->irq_enabled=false), and then
imx_thermal_probe returns (after setting data->irq_enabled=true) before
the threaded irq handler gets to run, imx_get_temp will not reenable the
interrupt at the end, if the temperature has just fallen below the alarm
temperature.
Reviewed-by: Philipp Zabel <***@pengutronix.de>
best regards
Philipp