Discussion:
[PATCH] ARM: change definition of cpu_relax() for ARM11MPCore
Will Deacon
2010-05-27 15:11:58 UTC
Permalink
The cpu_relax() macro is often used in the body of busy-wait loops to ensure
that the variable being spun on is re-loaded for each iteration. On the
ARM11MPCore processor [where loads are prioritised over stores], spinning in
such a loop will prevent the write buffer from draining. If a write contained
in the write buffer indirectly affects the variable being spun on, there is a
potential for deadlock. This deadlock is experienced when executing the KGDB
testsuite on an SMP ARM11MPCore configuration.

This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores,
forcing a flushing of the write buffer on SMP systems before the next load
takes place. If the Kernel is not compiled for SMP support, this will expand
to a barrier() as before.

Cc: Russell King - ARM Linux <***@arm.linux.org.uk>
Acked-by: Catalin Marinas <***@arm.com>
Signed-off-by: Will Deacon <***@arm.com>
---

Russell - the discussion on this last month seemed to draw to a close.
Can I submit this to the patch system please, or are there any outstanding
issues?

arch/arm/include/asm/processor.h | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index 6a89567..7bed3da 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -91,7 +91,11 @@ extern void release_thread(struct task_struct *);

unsigned long get_wchan(struct task_struct *p);

+#if __LINUX_ARM_ARCH__ == 6
+#define cpu_relax() smp_mb()
+#else
#define cpu_relax() barrier()
+#endif

/*
* Create a new kernel thread
--
1.6.3.3
Shilimkar, Santosh
2010-05-27 15:20:47 UTC
Permalink
Will,
-----Original Message-----
Sent: Thursday, May 27, 2010 8:42 PM
Cc: Russell King - ARM Linux; Will Deacon
Subject: [PATCH] ARM: change definition of cpu_relax() for ARM11MPCore
The cpu_relax() macro is often used in the body of busy-wait loops to ensure
that the variable being spun on is re-loaded for each iteration. On the
ARM11MPCore processor [where loads are prioritised over stores], spinning in
such a loop will prevent the write buffer from draining. If a write contained
in the write buffer indirectly affects the variable being spun on, there is a
potential for deadlock. This deadlock is experienced when executing the KGDB
testsuite on an SMP ARM11MPCore configuration.
This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores,
forcing a flushing of the write buffer on SMP systems before the next load
takes place. If the Kernel is not compiled for SMP support, this will expand
to a barrier() as before.
I have missed this thread. Is this not applicable for ARMv7 MP Cores as well??
---
Russell - the discussion on this last month seemed to draw to a close.
Can I submit this to the patch system please, or are there any outstanding
issues?
arch/arm/include/asm/processor.h | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index 6a89567..7bed3da 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -91,7 +91,11 @@ extern void release_thread(struct task_struct *);
unsigned long get_wchan(struct task_struct *p);
+#if __LINUX_ARM_ARCH__ == 6
+#define cpu_relax() smp_mb()
+#else
#define cpu_relax() barrier()
+#endif
/*
* Create a new kernel thread
--
1.6.3.3
Regards,
Santosh
Will Deacon
2010-05-27 16:53:33 UTC
Permalink
Hi Santosh,
Post by Shilimkar, Santosh
Post by Will Deacon
This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores,
forcing a flushing of the write buffer on SMP systems before the next load
takes place. If the Kernel is not compiled for SMP support, this will expand
to a barrier() as before.
I have missed this thread. Is this not applicable for ARMv7 MP Cores as well??
You can pick up on some of the old thread here:

http://lists.infradead.org/pipermail/linux-arm-kernel/2010-April/012874.html

Actually, the ChangeLog I wrote for that patch is better, so I'll update
my current one.

To answer your question; this problem only occurs on ARM11MPCore. For ARMv7MP,
stores are guaranteed to become visible eventually [i.e. they can't be held up
indefinitely by a sequence of aggresive loads].

An addition to the ARM ARM [Section A3.8.2] says:

* For an implementation that does not include the Multiprocessing Extensions,
it is IMPLEMENTATION DEFINED whether all writes become globally observed in
their shareability domain in a finite period of time, or whether in some cases,
software must execute a DSB to ensure the visibility of some writes.

* For an implementation that includes the Multiprocessing Extensions, all writes
become globally observed in their shareability domain in a finite period of time.

Will
Shilimkar, Santosh
2010-05-28 04:33:57 UTC
Permalink
-----Original Message-----
Sent: Thursday, May 27, 2010 10:24 PM
Cc: Russell King - ARM Linux
Subject: RE: [PATCH] ARM: change definition of cpu_relax() for ARM11MPCore
Hi Santosh,
Post by Shilimkar, Santosh
Post by Will Deacon
This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores,
forcing a flushing of the write buffer on SMP systems before the next load
takes place. If the Kernel is not compiled for SMP support, this will expand
to a barrier() as before.
I have missed this thread. Is this not applicable for ARMv7 MP Cores as well??
http://lists.infradead.org/pipermail/linux-arm-kernel/2010-April/012874.html
Actually, the ChangeLog I wrote for that patch is better, so I'll update
my current one.
To answer your question; this problem only occurs on ARM11MPCore. For ARMv7MP,
stores are guaranteed to become visible eventually [i.e. they can't be held up
indefinitely by a sequence of aggresive loads].
* For an implementation that does not include the Multiprocessing Extensions,
it is IMPLEMENTATION DEFINED whether all writes become globally observed in
their shareability domain in a finite period of time, or whether in some cases,
software must execute a DSB to ensure the visibility of some writes.
* For an implementation that includes the Multiprocessing Extensions, all writes
become globally observed in their shareability domain in a finite period of time.
Thanks. It's clear to me now.

Regards,
Santosh

Loading...