Discussion:
RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - anyone else seeing this?
Jonathan Cameron
2017-07-25 11:32:10 UTC
Hi All,

We observed a regression on our d05 boards (but curiously not
the fairly similar but single socket / smaller core count
d03), initially seen with linux-next prior to the merge window
and still present in v4.13-rc2.

The symptom is:

[ 1982.959365] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1982.965021] 2-...: (10 GPs behind) idle=1d4/0/0 softirq=306/306 fqs=0
[ 1982.971624] 3-...: (2 GPs behind) idle=700/0/0 softirq=307/307 fqs=0
[ 1982.978139] 4-...: (20 GPs behind) idle=9f4/0/0 softirq=651/652 fqs=0
[ 1982.984740] 5-...: (18 GPs behind) idle=a78/0/0 softirq=369/371 fqs=0
[ 1982.991342] 6-...: (26 GPs behind) idle=e5c/0/0 softirq=217/219 fqs=0
[ 1982.997944] 7-...: (1438 GPs behind) idle=eb4/0/0 softirq=260/260 fqs=0
[ 1983.004719] 8-...: (18 GPs behind) idle=830/0/0 softirq=1609/1609 fqs=0
[ 1983.011494] 9-...: (18 GPs behind) idle=e9c/0/0 softirq=242/242 fqs=0
[ 1983.018095] 10-...: (1434 GPs behind) idle=ca0/0/0 softirq=210/212 fqs=0
[ 1983.024957] 11-...: (1106 GPs behind) idle=ee0/0/0 softirq=188/191 fqs=0
[ 1983.031819] 12-...: (1636 GPs behind) idle=c58/0/0 softirq=215/216 fqs=0
[ 1983.038680] 13-...: (1114 GPs behind) idle=c20/0/0 softirq=170/170 fqs=0
[ 1983.045542] 14-...: (1106 GPs behind) idle=d90/0/0 softirq=176/178 fqs=0
[ 1983.052403] 15-...: (1858 GPs behind) idle=900/0/0 softirq=184/185 fqs=0
[ 1983.059266] 16-...: (1621 GPs behind) idle=f04/0/0 softirq=204/206 fqs=0
[ 1983.066127] 17-...: (1433 GPs behind) idle=d30/0/0 softirq=202/202 fqs=0
[ 1983.072988] 18-...: (18 GPs behind) idle=2d4/0/0 softirq=218/220 fqs=0
[ 1983.079676] 19-...: (19 GPs behind) idle=bbc/0/0 softirq=178/180 fqs=0
[ 1983.086364] 20-...: (0 ticks this GP) idle=ee0/0/0 softirq=231/231 fqs=0
[ 1983.093226] 21-...: (4 GPs behind) idle=140/0/0 softirq=208/208 fqs=0
[ 1983.099827] 22-...: (5 GPs behind) idle=100/0/0 softirq=186/188 fqs=0
[ 1983.106428] 23-...: (1635 GPs behind) idle=fd4/0/0 softirq=1220/1221 fqs=0
[ 1983.113463] 24-...: (1112 GPs behind) idle=ca8/0/0 softirq=231/233 fqs=0
[ 1983.120325] 25-...: (1637 GPs behind) idle=9c4/0/0 softirq=164/166 fqs=0
[ 1983.127187] 27-...: (0 ticks this GP) idle=b08/0/0 softirq=182/182 fqs=0
[ 1983.134048] 28-...: (1110 GPs behind) idle=d28/0/0 softirq=179/181 fqs=0
[ 1983.140909] 29-...: (8 GPs behind) idle=1dc/0/0 softirq=196/198 fqs=0
[ 1983.147511] 31-...: (1434 GPs behind) idle=74c/0/0 softirq=160/161 fqs=0
[ 1983.154373] 32-...: (1432 GPs behind) idle=7d4/0/0 softirq=164/164 fqs=0
[ 1983.161234] 33-...: (1632 GPs behind) idle=4dc/0/0 softirq=130/132 fqs=0
[ 1983.168096] 34-...: (57 GPs behind) idle=3b0/0/0 softirq=411/411 fqs=0
[ 1983.174784] 35-...: (1599 GPs behind) idle=8a0/0/0 softirq=177/179 fqs=0
[ 1983.181646] 36-...: (1603 GPs behind) idle=520/0/0 softirq=132/134 fqs=0
[ 1983.188507] 37-...: (8 GPs behind) idle=02c/0/0 softirq=176/178 fqs=0
[ 1983.195108] 38-...: (1442 GPs behind) idle=d8c/0/0 softirq=3189/3190 fqs=0
[ 1983.202144] 39-...: (1431 GPs behind) idle=444/0/0 softirq=117/117 fqs=0
[ 1983.209005] 40-...: (4 GPs behind) idle=688/0/0 softirq=134/136 fqs=0
[ 1983.215606] 41-...: (1599 GPs behind) idle=554/0/0 softirq=2707/2711 fqs=0
[ 1983.222642] 42-...: (1430 GPs behind) idle=15c/0/0 softirq=110/111 fqs=0
[ 1983.229503] 43-...: (4 GPs behind) idle=054/0/0 softirq=101/103 fqs=0
[ 1983.236104] 46-...: (1117 GPs behind) idle=558/0/0 softirq=251/253 fqs=0
[ 1983.242966] 47-...: (1118 GPs behind) idle=5f0/0/0 softirq=110/112 fqs=0
[ 1983.249827] 48-...: (1621 GPs behind) idle=ef4/0/0 softirq=241/242 fqs=0
[ 1983.256689] 49-...: (1648 GPs behind) idle=92c/0/0 softirq=207/208 fqs=0
[ 1983.263550] 52-...: (1439 GPs behind) idle=e40/0/0 softirq=261/263 fqs=0
[ 1983.270412] 54-...: (1434 GPs behind) idle=650/0/0 softirq=258/260 fqs=0
[ 1983.277273] 55-...: (1646 GPs behind) idle=5e0/0/0 softirq=178/178 fqs=0
[ 1983.284135] 56-...: (1646 GPs behind) idle=800/0/0 softirq=249/249 fqs=0
[ 1983.290996] 57-...: (1599 GPs behind) idle=c48/0/0 softirq=222/224 fqs=0
[ 1983.297858] 58-...: (1648 GPs behind) idle=3e8/0/0 softirq=235/235 fqs=0
[ 1983.304719] 59-...: (1434 GPs behind) idle=720/0/0 softirq=201/203 fqs=0
[ 1983.311581] 60-...: (1647 GPs behind) idle=c80/0/0 softirq=250/250 fqs=0
[ 1983.318443] 61-...: (1598 GPs behind) idle=b18/0/0 softirq=208/208 fqs=0
[ 1983.325304] 62-...: (1112 GPs behind) idle=0a4/0/0 softirq=620/620 fqs=0
[ 1983.332166] 63-...: (1109 GPs behind) idle=4b0/0/0 softirq=187/188 fqs=0
[ 1983.339026] (detected by 44, t=5335 jiffies, g=1566, c=1565, q=220)
[ 1983.345371] Task dump for CPU 2:
[ 1983.348587] swapper/2 R running task 0 0 1 0x00000000
[ 1983.355626] Call trace:
[ 1983.358072] [<ffff000008084fb0>] __switch_to+0x90/0xa8
[ 1983.363199] [<7fffffffffffffff>] 0x7fffffffffffffff
[ 1983.368062] Task dump for CPU 3:
[ 1983.371278] swapper/3 R running task 0 0 1 0x00000000
[ 1983.378315] Call trace:
[ 1983.380750] [<ffff000008084fb0>] __switch_to+0x90/0xa8
[ 1983.385881] [<ffff000008ea9000>] page_wait_table+0x1280/0x1800
[ 1983.391699] Task dump for CPU 4:
[ 1983.394915] swapper/4 R running task 0 0 1 0x00000000
[ 1983.401951] Call trace:
[ 1983.404386] [<ffff000008084fb0>] __switch_to+0x90/0xa8
[ 1983.409510] [<7fffffffffffffff>] 0x7fffffffffffffff
[ 1983.414374] Task dump for CPU 5:
[ 1983.417590] swapper/5 R running task 0 0 1 0x00000000
[ 1983.424626] Call trace:
[ 1983.427060] [<ffff000008084fb0>] __switch_to+0x90/0xa8
[ 1983.432185] [<7fffffffffffffff>] 0x7fffffffffffffff
[ 1983.437049] Task dump for CPU 6:
[ 1983.440265] swapper/6 R running task 0 0 1 0x00000000

<snip> Mixture of the two forms above for all the cpus

[ 1984.568746] Call trace:
[ 1984.571180] [<ffff000008084fb0>] __switch_to+0x90/0xa8
[ 1984.576305] [<ffff000008ea9000>] page_wait_table+0x1280/0x1800
[ 1984.582124] Task dump for CPU 62:
[ 1984.585426] swapper/62 R running task 0 0 1 0x00000000
[ 1984.592461] Call trace:
[ 1984.594895] [<ffff000008084fb0>] __switch_to+0x90/0xa8
[ 1984.600020] [<ffff000008ea9000>] page_wait_table+0x1280/0x1800
[ 1984.605839] Task dump for CPU 63:
[ 1984.609141] swapper/63 R running task 0 0 1 0x00000000
[ 1984.616176] Call trace:
[ 1984.618611] [<ffff000008084fb0>] __switch_to+0x90/0xa8
[ 1984.623735] [<7fffffffffffffff>] 0x7fffffffffffffff
[ 1984.628602] rcu_preempt kthread starved for 5663 jiffies! g1566 c1565 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
[ 1984.638153] rcu_preempt S 0 9 2 0x00000000
[ 1984.643626] Call trace:
[ 1984.646059] [<ffff000008084fb0>] __switch_to+0x90/0xa8
[ 1984.651189] [<ffff000008962274>] __schedule+0x19c/0x5d8
[ 1984.656400] [<ffff0000089626e8>] schedule+0x38/0xa0
[ 1984.661266] [<ffff000008965844>] schedule_timeout+0x124/0x218
[ 1984.667002] [<ffff000008121424>] rcu_gp_kthread+0x4fc/0x748
[ 1984.672564] [<ffff0000080df0b4>] kthread+0xfc/0x128
[ 1984.677429] [<ffff000008082ec0>] ret_from_fork+0x10/0x50


Reducing the RCU CPU stall timeout makes it happen more often,
but we are seeing it even with the default value of 24 seconds.
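
To reproduce more quickly we shrink the timeout via the usual rcupdate
module parameter - roughly as below, assuming it is exposed the same way
on your kernels:

  rcupdate.rcu_cpu_stall_timeout=5    (on the kernel command line)
  echo 5 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout    (at runtime)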

It tends to occur after a period of relatively low usage, but has
also been seen midway through performance tests.

This was not seen with v4.12, so a bisection run later led to
commit 05a4a9527 (kernel/watchdog: split up config options).

That seemed odd until we discovered that a side effect of this patch
was to change whether the softlockup detector was enabled or not in
the arm64 defconfig.

On 4.13-rc2, enabling the softlockup detector indeed stopped us
seeing the RCU issue. Disabling the equivalent option on 4.12 made
the issue occur there as well.
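
For anyone wanting to check their own configs, the relevant fragment looks
roughly like this - assuming we have the Kconfig history right, the old
combined option was split up by that commit:

  CONFIG_LOCKUP_DETECTOR=y       # v4.12 and earlier: covered the softlockup case
  CONFIG_SOFTLOCKUP_DETECTOR=y   # v4.13-rc: now needs selecting explicitly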

Clearly the softlockup detector results in a thread on every cpu,
which might be related but beyond that we are still looking into
the issue.

So the obvious question is whether anyone else is seeing this as
it might help us to focus in on where to look!

In the meantime we'll carry on digging.

Thanks,

Jonathan

p.s. As a more general question, do we want to have the
soft lockup detector enabled on arm64 by default?
Nicholas Piggin
2017-07-25 12:26:54 UTC
On Tue, 25 Jul 2017 19:32:10 +0800
Post by Jonathan Cameron
Hi All,
We observed a regression on our d05 boards (but curiously not
the fairly similar but single socket / smaller core count
d03), initially seen with linux-next prior to the merge window
and still present in v4.13-rc2.
[ . . . ]
[ 1984.628602] rcu_preempt kthread starved for 5663 jiffies! g1566 c1565 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
[ 1984.638153] rcu_preempt S 0 9 2 0x00000000
[ 1984.646059] [<ffff000008084fb0>] __switch_to+0x90/0xa8
[ 1984.651189] [<ffff000008962274>] __schedule+0x19c/0x5d8
[ 1984.656400] [<ffff0000089626e8>] schedule+0x38/0xa0
[ 1984.661266] [<ffff000008965844>] schedule_timeout+0x124/0x218
[ 1984.667002] [<ffff000008121424>] rcu_gp_kthread+0x4fc/0x748
[ 1984.672564] [<ffff0000080df0b4>] kthread+0xfc/0x128
[ 1984.677429] [<ffff000008082ec0>] ret_from_fork+0x10/0x50
Reducing the RCU CPU stall timeout makes it happen more often,
but we are seeing it even with the default value of 24 seconds.
It tends to occur after a period of relatively low usage, but has
also been seen midway through performance tests.
This was not seen with v4.12, so a bisection run later led to
commit 05a4a9527 (kernel/watchdog: split up config options).
That seemed odd until we discovered that a side effect of this patch
was to change whether the softlockup detector was enabled or not in
the arm64 defconfig.
On 4.13-rc2, enabling the softlockup detector indeed stopped us
seeing the RCU issue. Disabling the equivalent option on 4.12 made
the issue occur there as well.
Clearly the softlockup detector results in a thread on every cpu,
which might be related but beyond that we are still looking into
the issue.
So the obvious question is whether anyone else is seeing this as
it might help us to focus in on where to look!
Huh. Something similar has been seen very intermittently on powerpc
as well. We couldn't reproduce it reliably enough to bisect it, so
this is a good help.

http://marc.info/?l=linuxppc-embedded&m=149872815523646&w=2

It looks like the watchdog patch has a similar effect on powerpc in
that it stops enabling the softlockup detector by default. Haven't
confirmed, but it looks like the same thing.

A bug in RCU stall detection?
Post by Jonathan Cameron
In the meantime we'll carry on digging.
Thanks,
Jonathan
p.s. As a more general question, do we want to have the
soft lockup detector enabled on arm64 by default?
I've cc'ed Don. My patch should not have changed defconfigs, I
should have been more careful with that.

Thanks,
Nick
Paul E. McKenney
2017-07-25 13:46:26 UTC
Post by Nicholas Piggin
On Tue, 25 Jul 2017 19:32:10 +0800
Post by Jonathan Cameron
Hi All,
We observed a regression on our d05 boards (but curiously not
the fairly similar but single socket / smaller core count
d03), initially seen with linux-next prior to the merge window
and still present in v4.13-rc2.
Adding Dave Miller and the ***@vger.kernel.org email on CC, as
they have been seeing something similar, and you might well have saved
them the trouble of bisecting.

[ . . . ]
Post by Nicholas Piggin
Post by Jonathan Cameron
[ 1984.628602] rcu_preempt kthread starved for 5663 jiffies! g1566 c1565 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
This is the cause from an RCU perspective. You had a lot of idle CPUs,
and RCU is not permitted to disturb them -- the battery-powered embedded
guys get very annoyed by that sort of thing. What happens instead is
that each CPU updates a per-CPU state variable when entering or exiting
idle, and the grace-period kthread ("rcu_preempt kthread" in the above
message) checks these state variables, and when it sees an idle CPU,
it reports a quiescent state on that CPU's behalf.

But the grace-period kthread can only do this work if it gets a chance
to run. And the message above says that this kthread hasn't had a chance
to run for a full 5,663 jiffies. For completeness, the "g1566 c1565"
says that grace period #1566 is in progress, the "f0x0" says that no one
is needing another grace period #1567. The "RCU_GP_WAIT_FQS(3)" says
that the grace-period kthread has fully initialized the current grace
period and is sleeping for a few jiffies waiting to scan for idle tasks.
Finally, the "->state=0x1" says that the grace-period kthread is in
TASK_INTERRUPTIBLE state, in other words, still sleeping.

So my first question is "What did commit 05a4a9527 (kernel/watchdog:
split up config options) do to prevent the grace-period kthread from
getting a chance to run?" I must confess that I don't see anything
obvious in that commit, so my second question is "Are we sure that
reverting this commit makes the problem go away?" and my third is "Is
this an intermittent problem that led to a false bisection?"

[ . . . ]
Post by Nicholas Piggin
[ . . . ]
Huh. Something similar has been seen very intermittently on powerpc
as well. We couldn't reproduce it reliably to bisect it already, so
this is a good help.
http://marc.info/?l=linuxppc-embedded&m=149872815523646&w=2
It looks like the watchdog patch has a similar effect on powerpc in
that it stops enabling the softlockup detector by default. Haven't
confirmed, but it looks like the same thing.
A bug in RCU stall detection?
Well, if I am expected to make grace periods complete when my grace-period
kthreads aren't getting any CPU time, I will have to make some substantial
changes. ;-)

One possibility is that the timer isn't firing and another is that the
timer's wakeup is being lost somehow.

So another thing to try is to boot with rcutree.rcu_kick_kthreads=1.
This will cause RCU to do redundant wakeups on the grace-period kthread
if the grace period is moving slowly. This is of course a crude hack,
which is why this boot parameter will also cause a splat if it ever has
to do anything.
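
Concretely, that is just an addition to the kernel command line (exact
mechanism depends on your bootloader), optionally together with a shorter
stall timeout to speed up reproduction:

  rcutree.rcu_kick_kthreads=1 rcupdate.rcu_cpu_stall_timeout=5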

Does this help at all?

Thanx, Paul
Jonathan Cameron
2017-07-25 14:42:45 UTC
On Tue, 25 Jul 2017 06:46:26 -0700
Post by Paul E. McKenney
[ . . . ]
Thanks for the explanation!
Post by Paul E. McKenney
So my first question is "What did commit 05a4a9527 (kernel/watchdog:
split up config options) do to prevent the grace-period kthread from
getting a chance to run?"
As far as we can tell it was a side effect of that patch.

The real cause is that the patch changed the resulting defconfig so that
the softlockup detector - now CONFIG_SOFTLOCKUP_DETECTOR - is no longer
enabled.

Enabling that on 4.13-rc2 (and presumably everything in between)
means we don't see the problem any more.
Post by Paul E. McKenney
I must confess that I don't see anything
obvious in that commit, so my second question is "Are we sure that
reverting this commit makes the problem go away?"
Simply enabling CONFIG_SOFTLOCKUP_DETECTOR seems to make it go away.
That detector fires up a thread on every cpu, which may be relevant.
Post by Paul E. McKenney
and my third is "Is
this an intermittent problem that led to a false bisection?"
Whilst it is a bit slow to occur, we verified with long runs on either
side of that patch, and since then with the option enabled on the latest
mainline.

We can also cause the issue before that patch by disabling the equivalent
option on 4.12.
Post by Paul E. McKenney
[ . . . ]
So another thing to try is to boot with rcutree.rcu_kick_kthreads=1.
This will cause RCU to do redundant wakeups on the grace-period kthread
if the grace period is moving slowly. This is of course a crude hack,
which is why this boot parameter will also cause a splat if it ever has
to do anything.
Running that now - will let you know how it goes. Not seen the issue yet,
but it might just be a 'lucky' run - will give it a few hours.

Jonathan
Paul E. McKenney
2017-07-25 15:12:45 UTC
Post by Jonathan Cameron
[ . . . ]
Simply enabling CONFIG_SOFTLOCKUP_DETECTOR seems to make it go away.
That detector fires up a thread on every cpu, which may be relevant.
Interesting... Why should it be necessary to fire up a thread on every
CPU in order to make sure that RCU's grace-period kthreads get some
CPU time? Especially given how many idle CPUs you had on your system.

So I have to ask if there is some other bug that the softlockup detector
is masking.
Post by Jonathan Cameron
Post by Paul E. McKenney
and my third is "Is
this an intermittent problem that led to a false bisection?"
Whilst it is a bit slow to occur, we verified with long runs on either
side of that patch, and since then with the option enabled on the latest
mainline.
We can also cause the issue before that patch by disabling the equivalent
option on 4.12.
OK, thank you -- hard to argue with that! ;-)

Except that I am still puzzled as to why per-CPU softlockup threads
are needed for RCU's kthreads to get their wakeups. We really should
be able to disable softlockup and still have kthreads get wakeups and
access to CPU, after all.
Post by Jonathan Cameron
[ . . . ]
Running that now - will let you know how it goes. Not seen the issue yet,
but it might just be a 'lucky' run - will give it a few hours.
Thank you very much!

Thanx, Paul
Jonathan Cameron
2017-07-25 16:52:07 UTC
On Tue, 25 Jul 2017 08:12:45 -0700
Post by Paul E. McKenney
[ . . . ]
Interesting... Why should it be necessary to fire up a thread on every
CPU in order to make sure that RCU's grace-period kthreads get some
CPU time? Especially given how many idle CPUs you had on your system.
So I have to ask if there is some other bug that the softlockup detector
is masking.
I am thinking the same. We can try going back further than 4.12 tomorrow
(we think we can realistically go back to 4.8 and possibly 4.6
with this board)
Post by Paul E. McKenney
[ . . . ]
OK, thank you -- hard to argue with that! ;-)
We thought it was a pretty unlikely bisection result,
hence we hammered it thoroughly ;)
Post by Paul E. McKenney
[ . . . ]
Thank you very much!
So far it's not actually shown any splats. I did a quick drop back to running
without the parameter and got the original splat in less than 5 minutes.

I've spun up another board with this parameter set as well and will leave
them both running overnight to see if anything interesting happens.

Thanks for your help with this,

Jonathan
David Miller
2017-07-25 21:10:29 UTC
From: Jonathan Cameron <***@huawei.com>
Date: Wed, 26 Jul 2017 00:52:07 +0800
Post by Jonathan Cameron
[ . . . ]
I am thinking the same. We can try going back further than 4.12 tomorrow
(we think we can realistically go back to 4.8 and possibly 4.6
with this board)
Just to report, turning softlockup back on fixes things for me on
sparc64 too.

The thing about softlockup is that it runs an hrtimer, which seems to run
about every 4 seconds.

So I wonder if this is a NO_HZ problem.
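
If I remember kernel/watchdog.c correctly, that period comes from roughly
the following, which works out to 4 seconds with the default watchdog_thresh
of 10 - so the detector guarantees an hrtimer fires on every CPU at least
that often:

  /* kernel/watchdog.c, from memory - softlockup hrtimer sample period */
  sample_period = get_softlockup_thresh() * ((u64)NSEC_PER_SEC / 5);
  /* get_softlockup_thresh() == watchdog_thresh * 2 == 20, so 20/5 = 4s */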
Paul E. McKenney
2017-07-26 03:55:45 UTC
Post by David Miller
[ . . . ]
Just to report, turning softlockup back on fixes things for me on
sparc64 too.
Very good!
Post by David Miller
The thing about softlockup is that it runs an hrtimer, which seems to run
about every 4 seconds.
I could see where that could shake things loose, but I am surprised that
it would be needed. I ran a short run with CONFIG_SOFTLOCKUP_DETECTOR=y
with no trouble, but I will be running a longer test later on.
Post by David Miller
So I wonder if this is a NO_HZ problem.
Might be. My tests run with NO_HZ_FULL=n and NO_HZ_IDLE=y. What are
you running? (Again, my symptoms are slightly different, so I might
be seeing a different bug.)

Thanx, Paul
David Miller
2017-07-26 04:02:33 UTC
From: "Paul E. McKenney" <***@linux.vnet.ibm.com>
Date: Tue, 25 Jul 2017 20:55:45 -0700
Post by Paul E. McKenney
[ . . . ]
Might be. My tests run with NO_HZ_FULL=n and NO_HZ_IDLE=y. What are
you running? (Again, my symptoms are slightly different, so I might
be seeing a different bug.)
I run with NO_HZ_FULL=n and NO_HZ_IDLE=y, just like you.

To clarify, the symptoms show up with SOFTLOCKUP_DETECTOR disabled.
Paul E. McKenney
2017-07-26 04:12:17 UTC
Post by David Miller
[ . . . ]
I run with NO_HZ_FULL=n and NO_HZ_IDLE=y, just like you.
To clarify, the symptoms show up with SOFTLOCKUP_DETECTOR disabled.
Same here -- but my failure case happens fairly rarely, so it will take
some time to gain reasonable confidence that enabling SOFTLOCKUP_DETECTOR
had an effect.

But you are right, it might be interesting to try NO_HZ_PERIODIC=y
or NO_HZ_FULL=y. So many possible tests, and so little time. ;-)
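
For reference, the timer-tick Kconfig choice is one of the following (option
names from memory, so double-check against kernel/time/Kconfig):

  CONFIG_HZ_PERIODIC=y   # tick always on
  CONFIG_NO_HZ_IDLE=y    # dyntick-idle, what we are both running
  CONFIG_NO_HZ_FULL=y    # full tickless on nohz_full CPUs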

Thanx, Paul

Paul E. McKenney
2017-07-26 03:53:06 UTC
Post by Jonathan Cameron
[ . . . ]
I am thinking the same. We can try going back further than 4.12 tomorrow
(we think we can realistically go back to 4.8 and possibly 4.6
with this board)
Looking forward to seeing results!
Post by Jonathan Cameron
[ . . . ]
We thought it was a pretty unlikely bisection result,
hence we hammered it thoroughly ;)
Glad that I am not the only paranoid one out here. ;-)
Post by Jonathan Cameron
[ . . . ]
So far it's not actually shown any splats. I did a quick drop back to running
without the parameter and got the original splat in less than 5 minutes.
That is a bit strange. Sensitive to code position or some such???
Post by Jonathan Cameron
I've spun up another board with this parameter set as well and will leave
them both running overnight to see if anything interesting happens.
Thanks for your help with this,
And thanks to you as well!!!

Thanx, Paul