Discussion:
[RFC PATCH 0/7] Pseudo-NMI for arm64 using ICC_PMR_EL1 (GICv3)
Daniel Thompson
2015-03-18 14:20:21 UTC
This patchset provides a pseudo-NMI for arm64 kernels by reimplementing
the irqflags macros to modify the GIC PMR (the priority mask register is
accessible as a system register on GICv3 and later) rather than the
PSR. The pseudo-NMI changes are supported by a prototype implementation of
arch_trigger_all_cpu_backtrace that allows the new code to be exercised.

In addition to the arm64 changes I've bundled in a few patches from
other patchsets to make the patchset self-contained. Of particular note
is the serial break emulation patch, which allows ^B^R^K to be used
instead of a serial break to trigger SysRq-L (FVP UART sockets don't
seem to support serial breaks). This makes it easy to run
arch_trigger_all_cpu_backtrace from an IRQ handler (i.e. somewhere with
interrupts masked so we are forced to preempt and take the NMI).

The code works-for-me (tm) but there are currently some pretty serious
limitations.

1. Exercised only on the foundation model with gicv3 support. It has
not been exercised on real silicon or even on the more advanced
ARM models.

2. It has been written without any documentation describing GICv3
architecture (which has not yet been released by ARM). I've been
guessing about the behaviour based on the ARMv8 and GICv2
architecture specs. The code works on the foundation model but
I cannot check that it conforms architecturally.

3. Requires GICv3+ hardware together with firmware support to enable
GICv3 features at EL3. If CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS is
enabled the kernel will not boot on older hardware. It will be hard
to diagnose because we will crash very early in the boot (i.e.
before the call to start_kernel). Auto-detection might be possible
but the performance and code size cost of adding conditional code to
the irqflags macros probably makes it impractical. As such it may
never be possible to remove this limitation (although it might be
possible to find a way to survive long enough to panic and show the
results on the console).

4. No benchmarking (see #1 above). Unlike the PSR, updates to the PMR
do not self synchronize which requires me to sprinkle isb
instructions fairly liberally. I've been told the cost of isb varies
from almost-free (A53) to somewhat costly (A57), thus we expect this
code to reduce kernel performance. However this needs to be
quantified and I am currently unable to do this. I'd really like to
but don't have any suitable hardware.

5. There is no code in el1_irq to detect NMI and switch from IRQ to NMI
handling. This means all the irq handling machinery is re-entered in
order to handle the NMI. This is not safe and deadlocks are likely.
This is a severe limitation although, in this proof-of-concept
work, NMI can only be triggered by SysRq-L or severe kernel damage.
This means we just about get away with it for simple tests (lockdep
detects that we are doing something wrong and shows a backtrace). This is
definitely the first thing that needs to be tackled to take this
code further.

Note also that alternative approaches to implementing a pseudo-NMI on
arm64 are possible but only through runtime cooperation with other
software components in the system, potentially both those running at EL3
and at secure EL1. I should like to explore these options in future but,
as far as I know, this is the only sane way to provide NMI-like features
whilst being implementable entirely in non-secure EL1 [1].

[1] Except for a single register write to ICC_SRE_EL3 by the EL3
firmware (and already implemented by ARM trusted firmware).


Daniel Thompson (7):
serial: Emulate break using control characters
printk: Simple implementation for NMI backtracing
irqchip: gic-v3: Reset BPR during initialization
arm64: irqflags: Reorder the fiq & async macros
arm64: irqflags: Use ICC sysregs to implement IRQ masking
arm64: irqflags: Automatically identify I bit mis-management
arm64: Add support for on-demand backtrace of other CPUs

arch/arm64/Kconfig | 16 ++++
arch/arm64/include/asm/assembler.h | 72 +++++++++++++++--
arch/arm64/include/asm/hardirq.h | 2 +-
arch/arm64/include/asm/irq.h | 5 ++
arch/arm64/include/asm/irqflags.h | 130 ++++++++++++++++++++++++++++--
arch/arm64/include/asm/ptrace.h | 10 +++
arch/arm64/include/asm/smp.h | 4 +
arch/arm64/include/uapi/asm/ptrace.h | 8 ++
arch/arm64/kernel/entry.S | 70 ++++++++++++++---
arch/arm64/kernel/head.S | 27 +++++++
arch/arm64/kernel/irq.c | 6 ++
arch/arm64/kernel/smp.c | 70 +++++++++++++++++
arch/arm64/mm/cache.S | 4 +-
arch/arm64/mm/proc.S | 19 +++++
drivers/irqchip/irq-gic-v3.c | 70 ++++++++++++++++-
include/linux/irqchip/arm-gic-v3.h | 6 +-
include/linux/irqchip/arm-gic.h | 2 +-
include/linux/printk.h | 20 +++++
include/linux/serial_core.h | 83 +++++++++++++++-----
init/Kconfig | 3 +
kernel/printk/Makefile | 1 +
kernel/printk/nmi_backtrace.c | 148 +++++++++++++++++++++++++++++++++++
lib/Kconfig.debug | 15 ++++
23 files changed, 746 insertions(+), 45 deletions(-)
create mode 100644 kernel/printk/nmi_backtrace.c

--
2.1.0
Daniel Thompson
2015-03-18 14:20:22 UTC
Currently the magic SysRq functions can be accessed by sending a break on
the serial port. Unfortunately some networked serial proxies make it
difficult to send a break meaning SysRq functions cannot be used. This
patch provides a workaround by allowing the (fairly unlikely) sequence
of ^B^R^K characters to emulate a real break.

This approach is very nearly as robust as normal sysrq/break handling
because all trigger recognition happens during interrupt handling. The
only major difference is that to emulate a break we must enter the ISR
four times (instead of twice) and manage an extra byte of state.

No means is provided to escape the trigger sequence (and pass ^B^R^K to
the underlying process), however the sequence has proved reasonably
collision resistant in practice. The most significant consequence is
that ^B and ^B^R are delayed until a new character is observed.

The most significant collision I am aware of is with emacs-like
backward-char bindings (^B) because the character movement will become
lumpy (two characters every two key presses rather than one character
per key press). Arrow keys or ^B^B^F provide workarounds.

Special note for tmux users:
tmux defaults to using ^B as its escape character but does not have a
default binding for ^B^R. Likewise tmux has no visual indicator
showing the beginning of a break sequence, so the delayed delivery
of ^B is not observable. Thus serial break emulation does not interfere
with the use of tmux's default key bindings.

Signed-off-by: Daniel Thompson <***@linaro.org>
---
include/linux/serial_core.h | 83 +++++++++++++++++++++++++++++++++++----------
lib/Kconfig.debug | 15 ++++++++
2 files changed, 80 insertions(+), 18 deletions(-)

diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index cf9c2ce9479d..56f8e3daf42c 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -160,6 +160,9 @@ struct uart_port {
struct console *cons; /* struct console, if any */
#if defined(CONFIG_SERIAL_CORE_CONSOLE) || defined(SUPPORT_SYSRQ)
unsigned long sysrq; /* sysrq timeout */
+#ifdef CONFIG_MAGIC_SYSRQ_BREAK_EMULATION
+ char sysrq_emul; /* emulation state */
+#endif
#endif

/* flags must be updated while holding port mutex */
@@ -420,24 +423,6 @@ extern void uart_handle_cts_change(struct uart_port *uport,
extern void uart_insert_char(struct uart_port *port, unsigned int status,
unsigned int overrun, unsigned int ch, unsigned int flag);

-#ifdef SUPPORT_SYSRQ
-static inline int
-uart_handle_sysrq_char(struct uart_port *port, unsigned int ch)
-{
- if (port->sysrq) {
- if (ch && time_before(jiffies, port->sysrq)) {
- handle_sysrq(ch);
- port->sysrq = 0;
- return 1;
- }
- port->sysrq = 0;
- }
- return 0;
-}
-#else
-#define uart_handle_sysrq_char(port,ch) ({ (void)port; 0; })
-#endif
-
/*
* We do the SysRQ and SAK checking like this...
*/
@@ -462,6 +447,68 @@ static inline int uart_handle_break(struct uart_port *port)
return 0;
}

+#if defined(SUPPORT_SYSRQ) && defined(CONFIG_MAGIC_SYSRQ_BREAK_EMULATION)
+/*
+ * Emulate a break if we are the serial console and receive ^B, ^R, ^K.
+ */
+static inline int
+uart_handle_sysrq_break_emulation(struct uart_port *port, unsigned int ch)
+{
+ const unsigned int ctrlb = 'B' & 31;
+ const unsigned int ctrlr = 'R' & 31;
+ const unsigned int ctrlk = 'K' & 31;
+
+ if (uart_console(port)) {
+ if ((port->sysrq_emul == 0 && ch == ctrlb) ||
+ (port->sysrq_emul == ctrlb && ch == ctrlr)) {
+ /* for either of the first two trigger characters
+ * update the state variable and move on.
+ */
+ port->sysrq_emul = ch;
+ return 1;
+ } else if (port->sysrq_emul == ctrlr && ch == ctrlk &&
+ uart_handle_break(port)) {
+ /* the break has already been emulated whilst
+ * evaluating the condition, tidy up and move on
+ */
+ port->sysrq_emul = 0;
+ return 1;
+ }
+ }
+
+ if (port->sysrq_emul) {
+ /* received a partial (false) trigger, tidy up and move on */
+ uart_insert_char(port, 0, 0, ctrlb, TTY_NORMAL);
+ if (port->sysrq_emul == ctrlr)
+ uart_insert_char(port, 0, 0, ctrlr, TTY_NORMAL);
+ port->sysrq_emul = 0;
+ }
+
+ return 0;
+}
+#else
+#define uart_handle_sysrq_break_emulation(port, ch) ({ (void)port; 0; })
+#endif
+
+#ifdef SUPPORT_SYSRQ
+static inline int
+uart_handle_sysrq_char(struct uart_port *port, unsigned int ch)
+{
+ if (port->sysrq) {
+ if (ch && time_before(jiffies, port->sysrq)) {
+ handle_sysrq(ch);
+ port->sysrq = 0;
+ return 1;
+ }
+ port->sysrq = 0;
+ }
+
+ return uart_handle_sysrq_break_emulation(port, ch);
+}
+#else
+#define uart_handle_sysrq_char(port, ch) ({ (void)port; 0; })
+#endif
+
/*
* UART_ENABLE_MS - determine if port should enable modem status irqs
*/
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c2d51af327bc..3f54e85c27d2 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -372,6 +372,21 @@ config MAGIC_SYSRQ_DEFAULT_ENABLE
This may be set to 1 or 0 to enable or disable them all, or
to a bitmask as described in Documentation/sysrq.txt.

+config MAGIC_SYSRQ_BREAK_EMULATION
+ bool "Enable magic SysRq serial break emulation"
+ depends on MAGIC_SYSRQ && SERIAL_CORE_CONSOLE
+ default n
+ help
+ If you say Y here, then you can use the character sequence ^B^R^K
+ to simulate a BREAK on the serial console. This is useful if for
+ some reason you cannot send a BREAK to your console's serial port.
+ For example, if you have a serial device server that cannot
+ send a BREAK. Enabling this feature can delay the delivery of
+ characters to the TTY because the ^B and a subsequent ^R will be
+ delayed until we know what the next character is.
+
+ If unsure, say N.
+
config DEBUG_KERNEL
bool "Kernel debugging"
help
--
2.1.0
Peter Hurley
2015-03-19 22:05:21 UTC
[ + linux-serial ]

Hi Daniel,

I apologize for not reviewing this back in Sept when you first
RFC'd this patch.
Post by Daniel Thompson
Currently the magic SysRq functions can accessed by sending a break on
the serial port. Unfortunately some networked serial proxies make it
difficult to send a break meaning SysRq functions cannot be used. This
patch provides a workaround by allowing the (fairly unlikely) sequence
of ^B^R^K characters to emulate a real break.
I really feel that support for this belongs in the terminal server;
terminal servers designed to be console servers already provide this
functionality.

That said, my review comments are below in case others feel differently.
Post by Daniel Thompson
This approach is very nearly as robust as normal sysrq/break handling
because all trigger recognition happens during interrupt handling. Only
major difference is that to emulate a break we must enter the ISR four
times (instead of twice) and manage an extra byte of state.
It may not be immediately obvious to others that this means that portions
of the sequence are not processed as input until the sequence fails to match.

So, ^B is not available for reading at the tty until the next char is
received. If no next char is sent, ^B is _never_ received.
This can cause all kinds of weird process behavior, like no
context help.

This will also interfere with any portion of the process key bindings
(as opposed to only the first). For example, the default emacs key-binding
for list-buffers is ^X ^B.
Post by Daniel Thompson
No means is provided to escape the trigger sequence (and pass ^B^R^K to
the underlying process) however the sequence is proved reasonably pretty
collision resistant in practice. The most significant consequence is
that ^B and ^B^R are delayed until a new character is observed.
The most significant collision I am aware of is with emacs-like
backward-char bindings (^B) because the character movement will become
lumpy (two characters every two key presses rather than one character
per key press). Arrow keys or ^B^B^F provide workarounds.
tmux defaults to using ^B as its escape character but does not have a
default binding for ^B^R. Likewise tmux had no visual indicator
showing the beginning of break sequence meaning delayed the delivery
of ^B is not observable. Thus serial break emulation does not interfere
with the use of tmux's default key bindings.
Cataloging the user-space collisions here is really pointless;
it's best to make it clear that widespread userspace key binding
interference is likely.
Post by Daniel Thompson
---
include/linux/serial_core.h | 83 +++++++++++++++++++++++++++++++++++----------
lib/Kconfig.debug | 15 ++++++++
2 files changed, 80 insertions(+), 18 deletions(-)
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index cf9c2ce9479d..56f8e3daf42c 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -160,6 +160,9 @@ struct uart_port {
struct console *cons; /* struct console, if any */
#if defined(CONFIG_SERIAL_CORE_CONSOLE) || defined(SUPPORT_SYSRQ)
unsigned long sysrq; /* sysrq timeout */
+#ifdef CONFIG_MAGIC_SYSRQ_BREAK_EMULATION
+ char sysrq_emul; /* emulation state */
+#endif
#endif
/* flags must be updated while holding port mutex */
@@ -420,24 +423,6 @@ extern void uart_handle_cts_change(struct uart_port *uport,
extern void uart_insert_char(struct uart_port *port, unsigned int status,
unsigned int overrun, unsigned int ch, unsigned int flag);
-#ifdef SUPPORT_SYSRQ
-static inline int
-uart_handle_sysrq_char(struct uart_port *port, unsigned int ch)
-{
- if (port->sysrq) {
- if (ch && time_before(jiffies, port->sysrq)) {
- handle_sysrq(ch);
- port->sysrq = 0;
- return 1;
- }
- port->sysrq = 0;
- }
- return 0;
-}
-#else
-#define uart_handle_sysrq_char(port,ch) ({ (void)port; 0; })
-#endif
-
/*
* We do the SysRQ and SAK checking like this...
*/
@@ -462,6 +447,68 @@ static inline int uart_handle_break(struct uart_port *port)
return 0;
}
+#if defined(SUPPORT_SYSRQ) && defined(CONFIG_MAGIC_SYSRQ_BREAK_EMULATION)
+/*
+ * Emulate a break if we are the serial console and receive ^B, ^R, ^K.
+ */
+static inline int
+uart_handle_sysrq_break_emulation(struct uart_port *port, unsigned int ch)
+{
+ const unsigned int ctrlb = 'B' & 31;
+ const unsigned int ctrlr = 'R' & 31;
+ const unsigned int ctrlk = 'K' & 31;
+
+ if (uart_console(port)) {
+ if ((port->sysrq_emul == 0 && ch == ctrlb) ||
+ (port->sysrq_emul == ctrlb && ch == ctrlr)) {
+ /* for either of the first two trigger characters
+ * update the state variable and move on.
+ */
+ port->sysrq_emul = ch;
+ return 1;
+ } else if (port->sysrq_emul == ctrlr && ch == ctrlk &&
+ uart_handle_break(port)) {
+ /* the break has already been emulated whilst
+ * evaluating the condition, tidy up and move on
+ */
+ port->sysrq_emul = 0;
+ return 1;
+ }
+ }
+
+ if (port->sysrq_emul) {
+ /* received a partial (false) trigger, tidy up and move on */
+ uart_insert_char(port, 0, 0, ctrlb, TTY_NORMAL);
+ if (port->sysrq_emul == ctrlr)
+ uart_insert_char(port, 0, 0, ctrlr, TTY_NORMAL);
+ port->sysrq_emul = 0;
+ }
I think this function would be shorter and clearer if the sequence
was in a byte array and the state variable was an index.

That would also allow the sequence to be configurable.
Post by Daniel Thompson
+
+ return 0;
+}
+#else
+#define uart_handle_sysrq_break_emulation(port, ch) ({ (void)port; 0; })
+#endif
+
+#ifdef SUPPORT_SYSRQ
+static inline int
+uart_handle_sysrq_char(struct uart_port *port, unsigned int ch)
+{
+ if (port->sysrq) {
+ if (ch && time_before(jiffies, port->sysrq)) {
+ handle_sysrq(ch);
+ port->sysrq = 0;
+ return 1;
+ }
+ port->sysrq = 0;
+ }
+
+ return uart_handle_sysrq_break_emulation(port, ch);
+}
+#else
+#define uart_handle_sysrq_char(port, ch) ({ (void)port; 0; })
+#endif
Why did your diff re-insert this whole function for a 1-line change?
Post by Daniel Thompson
+
/*
* UART_ENABLE_MS - determine if port should enable modem status irqs
*/
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c2d51af327bc..3f54e85c27d2 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -372,6 +372,21 @@ config MAGIC_SYSRQ_DEFAULT_ENABLE
This may be set to 1 or 0 to enable or disable them all, or
to a bitmask as described in Documentation/sysrq.txt.
+config MAGIC_SYSRQ_BREAK_EMULATION
+ bool "Enable magic SysRq serial break emulation"
+ depends on MAGIC_SYSRQ && SERIAL_CORE_CONSOLE
I think this option should depend on DEBUG_KERNEL.
Post by Daniel Thompson
+ default n
+ help
+ If you say Y here, then you can use the character sequence ^B^R^K
+ to simulate a BREAK on the serial console. This is useful if for
+ some reason you cannot send a BREAK to your console's serial port.
+ For example, if you have a serial device server that cannot
+ send a BREAK. Enabling this feature can delay the delivery of
+ characters to the TTY because the ^B and a subsequent ^R will be
+ delayed until we know what the next character is.
This help text should be more pessimistic and suggest this option _only_
if an inadequate terminal server is actually encountered and other
known methods of triggering sysrq remotely have failed.

And perhaps contain a warning.

Regards,
Peter Hurley
Post by Daniel Thompson
+
+ If unsure, say N.
+
config DEBUG_KERNEL
bool "Kernel debugging"
help
Daniel Thompson
2015-03-23 15:14:28 UTC
Post by Peter Hurley
[ + linux-serial ]
Hi Daniel,
I apologize for not reviewing this back in Sept when you first
RFC'd this patch.
No worries.

The original RFC really was a "would anyone else find this useful?"
question and that question seemed to be answered by the absence of
responses!
Post by Peter Hurley
Post by Daniel Thompson
Currently the magic SysRq functions can accessed by sending a break on
the serial port. Unfortunately some networked serial proxies make it
difficult to send a break meaning SysRq functions cannot be used. This
patch provides a workaround by allowing the (fairly unlikely) sequence
of ^B^R^K characters to emulate a real break.
I really feel that support for this belongs in the terminal server;
terminal servers designed to be console servers already provide this
functionality.
I agree that the best place to fix this is in the terminal server.

However I wrote (and still personally use) this patch to cover those
occasions where that is not possible, typically because the terminal
server is implemented in a closed source firmware that I can't modify.
In one notable case I have a networked JTAG debugger that has a built in
serial proxy. This is *such* a great way to simplify wiring up a lab
that I am prepared to tolerate the limitations and/or bugs in its serial
proxy.

More recently with the arm64 foundation model, I've been using a closed
source simulator where serial break doesn't work (or at least where
sending a break via telnet protocol did not work for me). Therefore I
have rolled the patch out again because without it my code cannot be tested.
Post by Peter Hurley
That said, my review comments are below in case others feel differently.
Thanks for the review and, for the record, I don't think I disagree with
anything you say below. However unless others really do feel
differently (or are persuaded by the above that it is a worthwhile idea)
then I'll probably just keep the patch to myself.
Post by Peter Hurley
Post by Daniel Thompson
This approach is very nearly as robust as normal sysrq/break handling
because all trigger recognition happens during interrupt handling. Only
major difference is that to emulate a break we must enter the ISR four
times (instead of twice) and manage an extra byte of state.
It may not be immediately obvious to others that this means that portions
of the sequence are not processed as input until the sequence is not matched.
So, ^B is not available for reading at the tty until the next char is
received. If no next char is sent, ^B is _never_ received.
This can cause all kinds of weird process behavior, like no
context help.
This will also interfere with any portion of the process key bindings
(as opposed to only the first). For example, the default emacs key-binding
for list-buffers is ^X ^B.
Post by Daniel Thompson
No means is provided to escape the trigger sequence (and pass ^B^R^K to
the underlying process) however the sequence is proved reasonably pretty
collision resistant in practice. The most significant consequence is
that ^B and ^B^R are delayed until a new character is observed.
The most significant collision I am aware of is with emacs-like
backward-char bindings (^B) because the character movement will become
lumpy (two characters every two key presses rather than one character
per key press). Arrow keys or ^B^B^F provide workarounds.
tmux defaults to using ^B as its escape character but does not have a
default binding for ^B^R. Likewise tmux had no visual indicator
showing the beginning of break sequence meaning delayed the delivery
of ^B is not observable. Thus serial break emulation does not interfere
with the use of tmux's default key bindings.
Cataloging the user-space collisions here is really pointless;
it's best to make it clear that widespread userspace key binding
interference is likely.
Post by Daniel Thompson
---
include/linux/serial_core.h | 83 +++++++++++++++++++++++++++++++++++----------
lib/Kconfig.debug | 15 ++++++++
2 files changed, 80 insertions(+), 18 deletions(-)
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index cf9c2ce9479d..56f8e3daf42c 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -160,6 +160,9 @@ struct uart_port {
struct console *cons; /* struct console, if any */
#if defined(CONFIG_SERIAL_CORE_CONSOLE) || defined(SUPPORT_SYSRQ)
unsigned long sysrq; /* sysrq timeout */
+#ifdef CONFIG_MAGIC_SYSRQ_BREAK_EMULATION
+ char sysrq_emul; /* emulation state */
+#endif
#endif
/* flags must be updated while holding port mutex */
@@ -420,24 +423,6 @@ extern void uart_handle_cts_change(struct uart_port *uport,
extern void uart_insert_char(struct uart_port *port, unsigned int status,
unsigned int overrun, unsigned int ch, unsigned int flag);
-#ifdef SUPPORT_SYSRQ
-static inline int
-uart_handle_sysrq_char(struct uart_port *port, unsigned int ch)
-{
- if (port->sysrq) {
- if (ch && time_before(jiffies, port->sysrq)) {
- handle_sysrq(ch);
- port->sysrq = 0;
- return 1;
- }
- port->sysrq = 0;
- }
- return 0;
-}
-#else
-#define uart_handle_sysrq_char(port,ch) ({ (void)port; 0; })
-#endif
-
/*
* We do the SysRQ and SAK checking like this...
*/
@@ -462,6 +447,68 @@ static inline int uart_handle_break(struct uart_port *port)
return 0;
}
+#if defined(SUPPORT_SYSRQ) && defined(CONFIG_MAGIC_SYSRQ_BREAK_EMULATION)
+/*
+ * Emulate a break if we are the serial console and receive ^B, ^R, ^K.
+ */
+static inline int
+uart_handle_sysrq_break_emulation(struct uart_port *port, unsigned int ch)
+{
+ const unsigned int ctrlb = 'B' & 31;
+ const unsigned int ctrlr = 'R' & 31;
+ const unsigned int ctrlk = 'K' & 31;
+
+ if (uart_console(port)) {
+ if ((port->sysrq_emul == 0 && ch == ctrlb) ||
+ (port->sysrq_emul == ctrlb && ch == ctrlr)) {
+ /* for either of the first two trigger characters
+ * update the state variable and move on.
+ */
+ port->sysrq_emul = ch;
+ return 1;
+ } else if (port->sysrq_emul == ctrlr && ch == ctrlk &&
+ uart_handle_break(port)) {
+ /* the break has already been emulated whilst
+ * evaluating the condition, tidy up and move on
+ */
+ port->sysrq_emul = 0;
+ return 1;
+ }
+ }
+
+ if (port->sysrq_emul) {
+ /* received a partial (false) trigger, tidy up and move on */
+ uart_insert_char(port, 0, 0, ctrlb, TTY_NORMAL);
+ if (port->sysrq_emul == ctrlr)
+ uart_insert_char(port, 0, 0, ctrlr, TTY_NORMAL);
+ port->sysrq_emul = 0;
+ }
I think this function would be shorter and clearer if the sequence
was in a byte array and the state variable was an index.
That would also allow the sequence to be configurable.
Post by Daniel Thompson
+
+ return 0;
+}
+#else
+#define uart_handle_sysrq_break_emulation(port, ch) ({ (void)port; 0; })
+#endif
+
+#ifdef SUPPORT_SYSRQ
+static inline int
+uart_handle_sysrq_char(struct uart_port *port, unsigned int ch)
+{
+ if (port->sysrq) {
+ if (ch && time_before(jiffies, port->sysrq)) {
+ handle_sysrq(ch);
+ port->sysrq = 0;
+ return 1;
+ }
+ port->sysrq = 0;
+ }
+
+ return uart_handle_sysrq_break_emulation(port, ch);
+}
+#else
+#define uart_handle_sysrq_char(port, ch) ({ (void)port; 0; })
+#endif
Why did your diff re-insert this whole function for a 1-line change?
Post by Daniel Thompson
+
/*
* UART_ENABLE_MS - determine if port should enable modem status irqs
*/
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c2d51af327bc..3f54e85c27d2 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -372,6 +372,21 @@ config MAGIC_SYSRQ_DEFAULT_ENABLE
This may be set to 1 or 0 to enable or disable them all, or
to a bitmask as described in Documentation/sysrq.txt.
+config MAGIC_SYSRQ_BREAK_EMULATION
+ bool "Enable magic SysRq serial break emulation"
+ depends on MAGIC_SYSRQ && SERIAL_CORE_CONSOLE
I think this option should depend on DEBUG_KERNEL.
Post by Daniel Thompson
+ default n
+ help
+ If you say Y here, then you can use the character sequence ^B^R^K
+ to simulate a BREAK on the serial console. This is useful if for
+ some reason you cannot send a BREAK to your console's serial port.
+ For example, if you have a serial device server that cannot
+ send a BREAK. Enabling this feature can delay the delivery of
+ characters to the TTY because the ^B and a subsequent ^R will be
+ delayed until we know what the next character is.
This help text should be more pessimistic and suggest this option _only_
if an inadequate terminal server is actually encountered and other
known methods of triggering sysrq remotely have failed.
And perhaps contain a warning.
Regards,
Peter Hurley
Post by Daniel Thompson
+
+ If unsure, say N.
+
config DEBUG_KERNEL
bool "Kernel debugging"
help
Dave Martin
2015-03-20 14:28:49 UTC
Post by Daniel Thompson
Currently the magic SysRq functions can accessed by sending a break on
the serial port. Unfortunately some networked serial proxies make it
difficult to send a break meaning SysRq functions cannot be used. This
patch provides a workaround by allowing the (fairly unlikely) sequence
of ^B^R^K characters to emulate a real break.
This is neat, but as it stands it feels like a bit of a hack. It would
be preferable to make the magic string configurable, since almost any
choice is going to upset somebody.

Since this also breaks the console (i.e., changes the behaviour),
it should probably not be on by default: a command-line option or
/proc/sys/kernel tweak should be required in order to turn it on.
Otherwise, this is likely to get activated unconditionally in production
kernels.

A particular concern is that something other than a human user could be
connected to the UART.

I also feel it doesn't really belong in this series. NMI doesn't
require this in order to be useful, this patch doesn't require NMI, and
anyway it's not specific to arm.


I suggest posting this separately, CCing linux-serial.

Cheers
--Dave
Post by Daniel Thompson
This approach is very nearly as robust as normal sysrq/break handling
because all trigger recognition happens during interrupt handling. Only
major difference is that to emulate a break we must enter the ISR four
times (instead of twice) and manage an extra byte of state.
No means is provided to escape the trigger sequence (and pass ^B^R^K to
the underlying process) however the sequence is proved reasonably pretty
collision resistant in practice. The most significant consequence is
that ^B and ^B^R are delayed until a new character is observed.
The most significant collision I am aware of is with emacs-like
backward-char bindings (^B) because the character movement will become
lumpy (two characters every two key presses rather than one character
per key press). Arrow keys or ^B^B^F provide workarounds.
tmux defaults to using ^B as its escape character but does not have a
default binding for ^B^R. Likewise tmux had no visual indicator
showing the beginning of break sequence meaning delayed the delivery
of ^B is not observable. Thus serial break emulation does not interfere
with the use of tmux's default key bindings.
---
include/linux/serial_core.h | 83 +++++++++++++++++++++++++++++++++++----------
lib/Kconfig.debug | 15 ++++++++
2 files changed, 80 insertions(+), 18 deletions(-)
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index cf9c2ce9479d..56f8e3daf42c 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -160,6 +160,9 @@ struct uart_port {
struct console *cons; /* struct console, if any */
#if defined(CONFIG_SERIAL_CORE_CONSOLE) || defined(SUPPORT_SYSRQ)
unsigned long sysrq; /* sysrq timeout */
+#ifdef CONFIG_MAGIC_SYSRQ_BREAK_EMULATION
+ char sysrq_emul; /* emulation state */
+#endif
#endif
/* flags must be updated while holding port mutex */
@@ -420,24 +423,6 @@ extern void uart_handle_cts_change(struct uart_port *uport,
extern void uart_insert_char(struct uart_port *port, unsigned int status,
unsigned int overrun, unsigned int ch, unsigned int flag);
-#ifdef SUPPORT_SYSRQ
-static inline int
-uart_handle_sysrq_char(struct uart_port *port, unsigned int ch)
-{
- if (port->sysrq) {
- if (ch && time_before(jiffies, port->sysrq)) {
- handle_sysrq(ch);
- port->sysrq = 0;
- return 1;
- }
- port->sysrq = 0;
- }
- return 0;
-}
-#else
-#define uart_handle_sysrq_char(port,ch) ({ (void)port; 0; })
-#endif
-
/*
* We do the SysRQ and SAK checking like this...
*/
@@ -462,6 +447,68 @@ static inline int uart_handle_break(struct uart_port *port)
return 0;
}
+#if defined(SUPPORT_SYSRQ) && defined(CONFIG_MAGIC_SYSRQ_BREAK_EMULATION)
+/*
+ * Emulate a break if we are the serial console and receive ^B, ^R, ^K.
+ */
+static inline int
+uart_handle_sysrq_break_emulation(struct uart_port *port, unsigned int ch)
+{
+ const unsigned int ctrlb = 'B' & 31;
+ const unsigned int ctrlr = 'R' & 31;
+ const unsigned int ctrlk = 'K' & 31;
+
+ if (uart_console(port)) {
+ if ((port->sysrq_emul == 0 && ch == ctrlb) ||
+ (port->sysrq_emul == ctrlb && ch == ctrlr)) {
+ /* for either of the first two trigger characters
+ * update the state variable and move on.
+ */
+ port->sysrq_emul = ch;
+ return 1;
+ } else if (port->sysrq_emul == ctrlr && ch == ctrlk &&
+ uart_handle_break(port)) {
+ /* the break has already been emulated whilst
+ * evaluating the condition, tidy up and move on
+ */
+ port->sysrq_emul = 0;
+ return 1;
+ }
+ }
+
+ if (port->sysrq_emul) {
+ /* received a partial (false) trigger, tidy up and move on */
+ uart_insert_char(port, 0, 0, ctrlb, TTY_NORMAL);
+ if (port->sysrq_emul == ctrlr)
+ uart_insert_char(port, 0, 0, ctrlr, TTY_NORMAL);
+ port->sysrq_emul = 0;
+ }
+
+ return 0;
+}
+#else
+#define uart_handle_sysrq_break_emulation(port, ch) ({ (void)port; 0; })
+#endif
+
+#ifdef SUPPORT_SYSRQ
+static inline int
+uart_handle_sysrq_char(struct uart_port *port, unsigned int ch)
+{
+ if (port->sysrq) {
+ if (ch && time_before(jiffies, port->sysrq)) {
+ handle_sysrq(ch);
+ port->sysrq = 0;
+ return 1;
+ }
+ port->sysrq = 0;
+ }
+
+ return uart_handle_sysrq_break_emulation(port, ch);
+}
+#else
+#define uart_handle_sysrq_char(port, ch) ({ (void)port; 0; })
+#endif
+
/*
* UART_ENABLE_MS - determine if port should enable modem status irqs
*/
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c2d51af327bc..3f54e85c27d2 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -372,6 +372,21 @@ config MAGIC_SYSRQ_DEFAULT_ENABLE
This may be set to 1 or 0 to enable or disable them all, or
to a bitmask as described in Documentation/sysrq.txt.
+config MAGIC_SYSRQ_BREAK_EMULATION
+ bool "Enable magic SysRq serial break emulation"
+ depends on MAGIC_SYSRQ && SERIAL_CORE_CONSOLE
+ default n
+ help
+ If you say Y here, then you can use the character sequence ^B^R^K
+	  to simulate a BREAK on the serial console. This is useful if, for
+	  some reason, you cannot send a BREAK to your console's serial port
+	  (for example, if you have a serial device server that cannot
+	  send a BREAK). Enabling this feature can delay the delivery of
+ characters to the TTY because the ^B and a subsequent ^R will be
+ delayed until we know what the next character is.
+
+ If unsure, say N.
+
config DEBUG_KERNEL
bool "Kernel debugging"
help
--
2.1.0
_______________________________________________
linux-arm-kernel mailing list
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Daniel Thompson
2015-03-23 15:28:37 UTC
Post by Dave Martin
Post by Daniel Thompson
Currently the magic SysRq functions can be accessed by sending a break on
the serial port. Unfortunately some networked serial proxies make it
difficult to send a break, meaning SysRq functions cannot be used. This
patch provides a workaround by allowing the (fairly unlikely) sequence
of ^B^R^K characters to emulate a real break.
This is neat, but as it stands it feels like a bit of a hack. It would
be preferable to make the magic string configurable, since almost any
choice is going to upset somebody.
Since this also breaks the console (i.e., changes the behaviour),
it should probably not be on by default: a command-line option or
/proc/sys/kernel tweak should be required in order to turn it on.
Otherwise, this is likely to get activated unconditionally in production
kernels.
It hadn't really occurred to me that it would ever be a good idea to
activate this for production kernels. Aren't these code paths rather hot
when the serial ports are running at super high speeds?

That said if the magic string were configurable then it could simply
default to the empty string and that would result in the serial break
emulation being disabled.
Post by Dave Martin
A particular concern is that something other than a human user could be
connected to the UART.
I also feel it doesn't really belong in this series. NMI doesn't
require this in order to be useful, this patch doesn't require NMI, and
anyway it's not specific to arm.
To be clear I included the patch in this series only because:

1. I couldn't figure out any way to send a serial break to the ARM
Foundation Model, making it impossible for me to provoke SysRq actions
from interrupt context,

2. SysRq-L is a really good way to test the code and, when launched
from interrupt context, proves that pre-emption by pseudo-NMI works.

I only really included the patch as a convenience for anyone wanting to
do any runtime testing.
Post by Dave Martin
I suggest posting this separately, CCing linux-serial.
Don't worry, I shared the patch on linux-serial quite some time ago
although, as it happens, the patch has got a lot more review comments
when I included it as a convenience in an unrelated patchset than it did
when I RFCed it separately.
Post by Dave Martin
Post by Daniel Thompson
This approach is very nearly as robust as normal sysrq/break handling
because all trigger recognition happens during interrupt handling. The only
major difference is that to emulate a break we must enter the ISR four
times (instead of twice) and manage an extra byte of state.
No means is provided to escape the trigger sequence (and pass ^B^R^K to
the underlying process); however, the sequence has proved reasonably
collision-resistant in practice. The most significant consequence is
that ^B and ^B^R are delayed until a new character is observed.
The most significant collision I am aware of is with emacs-like
backward-char bindings (^B) because the character movement will become
lumpy (two characters every two key presses rather than one character
per key press). Arrow keys or ^B^B^F provide workarounds.
tmux defaults to using ^B as its escape character but does not have a
default binding for ^B^R. Likewise, tmux has no visual indicator
showing the beginning of a break sequence, meaning the delayed delivery
of ^B is not observable. Thus serial break emulation does not interfere
with the use of tmux's default key bindings.
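For readers following the trigger-recognition logic described above, it can be modelled as a tiny state machine in plain C. This is an illustrative user-space sketch, not the kernel code; the function name and interface are invented for the example.

```c
#include <assert.h>

#define CTRL(c) ((c) & 31)

/*
 * Hypothetical user-space model of the ^B^R^K recogniser. Returns 1 if
 * ch was consumed as (part of) the trigger; sets *brk when a break is
 * emulated; releases any held characters of a false trigger into
 * out[] / *n_out so they can be delivered late.
 */
static int emul_step(unsigned int *state, unsigned int ch,
                     int *brk, unsigned int out[2], int *n_out)
{
	*brk = 0;
	*n_out = 0;

	if ((*state == 0 && ch == CTRL('B')) ||
	    (*state == CTRL('B') && ch == CTRL('R'))) {
		*state = ch;		/* partial trigger: hold and wait */
		return 1;
	}
	if (*state == CTRL('R') && ch == CTRL('K')) {
		*state = 0;		/* full ^B^R^K: emulate the break */
		*brk = 1;
		return 1;
	}
	if (*state) {			/* false trigger: release held chars */
		out[(*n_out)++] = CTRL('B');
		if (*state == CTRL('R'))
			out[(*n_out)++] = CTRL('R');
		*state = 0;
	}
	return 0;			/* ch itself is delivered normally */
}
```

Feeding ^B, ^R, ^K consumes all three characters and reports a break; feeding ^B followed by anything else releases the held ^B for late delivery, which is the "lumpy" behaviour mentioned above.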
---
include/linux/serial_core.h | 83 +++++++++++++++++++++++++++++++++++----------
lib/Kconfig.debug | 15 ++++++++
2 files changed, 80 insertions(+), 18 deletions(-)
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index cf9c2ce9479d..56f8e3daf42c 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -160,6 +160,9 @@ struct uart_port {
struct console *cons; /* struct console, if any */
#if defined(CONFIG_SERIAL_CORE_CONSOLE) || defined(SUPPORT_SYSRQ)
unsigned long sysrq; /* sysrq timeout */
+#ifdef CONFIG_MAGIC_SYSRQ_BREAK_EMULATION
+ char sysrq_emul; /* emulation state */
+#endif
#endif
/* flags must be updated while holding port mutex */
[...]
Dave Martin
2015-03-23 16:28:14 UTC
Post by Daniel Thompson
Post by Dave Martin
Post by Daniel Thompson
Currently the magic SysRq functions can be accessed by sending a break on
the serial port. Unfortunately some networked serial proxies make it
difficult to send a break, meaning SysRq functions cannot be used. This
patch provides a workaround by allowing the (fairly unlikely) sequence
of ^B^R^K characters to emulate a real break.
This is neat, but as it stands it feels like a bit of a hack. It would
be preferable to make the magic string configurable, since almost any
choice is going to upset somebody.
Since this also breaks the console (i.e., changes the behaviour),
it should probably not be on by default: a command-line option or
/proc/sys/kernel tweak should be required in order to turn it on.
Otherwise, this is likely to get activated unconditionally in production
kernels.
It hadn't really occurred to me that it would ever be a good idea to
activate this for production kernels. Aren't these code paths rather
Maybe, but there's no way to force people not to. In general, many
useful debugging options are left on in production kernels :)

This is useful (since production kernels are not really bug-free, and
debugging tools are therefore useful), but it means that debugging
options should be low-overhead and unobtrusive unless something extra
is done explicitly to turn them on...
Post by Daniel Thompson
hot when the serial ports are running at super high speeds?
It's probably not that bad, since this processing is only done for the
console and not other ports -- I don't know whether a serial port would
ever be used for the console and simultaneously for some other high-
throughput purpose, since random printk spam is going to break most
protocols...
Post by Daniel Thompson
That said if the magic string were configurable then it could simply
default to the empty string and that would result in the serial
break emulation being disabled.
Post by Dave Martin
A particular concern is that something other than a human user could be
connected to the UART.
I also feel it doesn't really belong in this series. NMI doesn't
require this in order to be useful, this patch doesn't require NMI, and
anyway it's not specific to arm.
1. I couldn't figure out any way to send a serial break to the ARM
Foundation Model making it impossible for me to provoke SysRq actions
from interrupt context,
Agreed, there's no direct way to do it (annoyingly).

Arguably that's a deficiency in the model, though that's not much help to
you right now.
Post by Daniel Thompson
2. SysRq-L is a really good way to test the code and, when launched
from interrupt context, proves that pre-emption by pseudo-NMI works.
I only really included the patch as a convenience for anyone wanting
to do any runtime testing.
Sure, that's all fine.

[...]

Cheers
---Dave
One Thousand Gnomes
2015-03-23 19:05:11 UTC
Post by Dave Martin
Post by Daniel Thompson
1. I couldn't figure out any way to send a serial break to the ARM
Foundation Model making it impossible for me to provoke SysRq actions
from interrupt context,
Agreed, there's no direct way to do it (annoyingly).
Arguably that's a deficiency in the model, though that's not much help to
you right now.
If it isn't affecting real hardware and it's just for a flawed model then
hack a check for a suitable symbol into the specific patches for
the model and its serial driver and claim it was a break - don't dump it
into the mainstream.

There are some "conventional" escapes platforms have used (ctrl and
symbols, for example ctrl-^) which are less likely to cause
conflicts than sequences. Nevertheless it's not something you want anywhere
mainstream if you can avoid it, because those symbols could be sent over a
remote management modem or similar.

They may be better if it does need to end up in the core kernel.

Alan
Daniel Thompson
2015-03-18 14:20:23 UTC
Currently there is quite a pile of code sitting in
arch/x86/kernel/apic/hw_nmi.c to support safe all-cpu backtracing from NMI.
The code is inaccessible to backtrace implementations for other
architectures, which is a shame because they would probably like to be
safe too.

Copy this code into printk, reworking it a little as we do so to make
it easier to exploit as library code.

We'll port the x86 NMI backtrace logic to it in a later patch.
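As a rough illustration of the divert/buffer/replay flow this library provides, here is a single-CPU user-space sketch. The names are invented for the example; the real code uses per-cpu seq_buf buffers and the kernel's printk_func pointer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical single-CPU model: while diverted, printk-style output
 * lands in a fixed buffer instead of the console; it can be replayed
 * later from a safe (non-NMI) context. */
static char nmi_buf[4096];
static size_t nmi_used;

static int nmi_vprintf(const char *fmt, va_list ap)
{
	int n = vsnprintf(nmi_buf + nmi_used, sizeof(nmi_buf) - nmi_used,
			  fmt, ap);
	if (n > 0)
		nmi_used += (size_t)n;
	return n;
}

/* NULL means "print directly"; non-NULL diverts into the buffer. */
static int (*printk_func)(const char *fmt, va_list ap);

static int my_printk(const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = printk_func ? printk_func(fmt, ap) : vprintf(fmt, ap);
	va_end(ap);
	return n;
}
```

In this model, begin() would save the old pointer and install nmi_vprintf, end() would restore it, and complete() would walk the buffer splitting on '\n' and print each line, mirroring the functions declared above.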

Signed-off-by: Daniel Thompson <***@linaro.org>
Cc: Steven Rostedt <***@goodmis.org>
---
include/linux/printk.h | 20 ++++++
init/Kconfig | 3 +
kernel/printk/Makefile | 1 +
kernel/printk/nmi_backtrace.c | 148 ++++++++++++++++++++++++++++++++++++++++++
4 files changed, 172 insertions(+)
create mode 100644 kernel/printk/nmi_backtrace.c

diff --git a/include/linux/printk.h b/include/linux/printk.h
index baa3f97d8ce8..44bb85ad1f62 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -228,6 +228,26 @@ static inline void show_regs_print_info(const char *log_lvl)
}
#endif

+#ifdef CONFIG_PRINTK_NMI_BACKTRACE
+/*
+ * printk_nmi_backtrace_prepare/complete are called to prepare the
+ * system for some or all cores to issue trace from NMI.
+ * printk_nmi_backtrace_complete will print buffered output and cannot
+ * (safely) be called from NMI.
+ */
+extern int printk_nmi_backtrace_prepare(void);
+extern void printk_nmi_backtrace_complete(void);
+
+/*
+ * printk_nmi_backtrace_this_cpu_begin/end are used to divert/restore printk
+ * on this cpu. The result is that the output of printk() (by this CPU) will be
+ * stored in temporary buffers for later printing by
+ * printk_nmi_backtrace_complete.
+ */
+extern void printk_nmi_backtrace_this_cpu_begin(void);
+extern void printk_nmi_backtrace_this_cpu_end(void);
+#endif
+
extern asmlinkage void dump_stack(void) __cold;

#ifndef pr_fmt
diff --git a/init/Kconfig b/init/Kconfig
index f5dbc6d4261b..0107e9b4d2cf 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1421,6 +1421,9 @@ config PRINTK
very difficult to diagnose system problems, saying N here is
strongly discouraged.

+config PRINTK_NMI_BACKTRACE
+ bool
+
config BUG
bool "BUG() support" if EXPERT
default y
diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index 85405bdcf2b3..1849b001384a 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -1,2 +1,3 @@
obj-y = printk.o
+obj-$(CONFIG_PRINTK_NMI_BACKTRACE) += nmi_backtrace.o
obj-$(CONFIG_A11Y_BRAILLE_CONSOLE) += braille.o
diff --git a/kernel/printk/nmi_backtrace.c b/kernel/printk/nmi_backtrace.c
new file mode 100644
index 000000000000..e9a06929c4f3
--- /dev/null
+++ b/kernel/printk/nmi_backtrace.c
@@ -0,0 +1,148 @@
+#include <linux/kernel.h>
+#include <linux/seq_buf.h>
+
+#define NMI_BUF_SIZE 4096
+
+struct nmi_seq_buf {
+ unsigned char buffer[NMI_BUF_SIZE];
+ struct seq_buf seq;
+};
+
+/* Safe printing in NMI context */
+static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
+
+static DEFINE_PER_CPU(printk_func_t, nmi_print_saved_print_func);
+
+/* "in progress" flag of NMI printing */
+static unsigned long nmi_print_flag;
+
+static int __init printk_nmi_backtrace_init(void)
+{
+ struct nmi_seq_buf *s;
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ s = &per_cpu(nmi_print_seq, cpu);
+ seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE);
+ }
+
+ return 0;
+}
+pure_initcall(printk_nmi_backtrace_init);
+
+/*
+ * It is not safe to call printk() directly from NMI handlers.
+ * It may be fine if the NMI detected a lock up and we have no choice
+ * but to do so, but doing a NMI on all other CPUs to get a back trace
+ * can be done with a sysrq-l. We don't want that to lock up, which
+ * can happen if the NMI interrupts a printk in progress.
+ *
+ * Instead, we redirect the vprintk() to this nmi_vprintk() that writes
+ * the content into a per cpu seq_buf buffer. Then when the NMIs are
+ * all done, we can safely dump the contents of the seq_buf to a printk()
+ * from a non NMI context.
+ *
+ * This is not a generic printk() implementation and must be used with
+ * great care. In particular there is a static limit on the quantity of
+ * data that may be emitted during NMI, only one client can be active at
+ * one time (arbitrated by the return value of printk_nmi_backtrace_prepare())
+ * and it is required that something at task or interrupt context be scheduled
+ * to issue the output.
+ */
+static int nmi_vprintk(const char *fmt, va_list args)
+{
+ struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
+ unsigned int len = seq_buf_used(&s->seq);
+
+ seq_buf_vprintf(&s->seq, fmt, args);
+ return seq_buf_used(&s->seq) - len;
+}
+
+/*
+ * Reserve the NMI printk mechanism. Return an error if some other component
+ * is already using it.
+ */
+int printk_nmi_backtrace_prepare(void)
+{
+ if (test_and_set_bit(0, &nmi_print_flag)) {
+ /*
+ * If something is already using the NMI print facility we
+ * can't allow a second one...
+ */
+ return -EBUSY;
+ }
+
+ return 0;
+}
+
+static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
+{
+ const char *buf = s->buffer + start;
+
+ printk("%.*s", (end - start) + 1, buf);
+}
+
+void printk_nmi_backtrace_complete(void)
+{
+ struct nmi_seq_buf *s;
+ int len, cpu, i, last_i;
+
+ /*
+ * Now that all the NMIs have triggered, we can dump out their
+ * back traces safely to the console.
+ */
+ for_each_possible_cpu(cpu) {
+ s = &per_cpu(nmi_print_seq, cpu);
+ last_i = 0;
+
+ len = seq_buf_used(&s->seq);
+ if (!len)
+ continue;
+
+ /* Print line by line. */
+ for (i = 0; i < len; i++) {
+ if (s->buffer[i] == '\n') {
+ print_seq_line(s, last_i, i);
+ last_i = i + 1;
+ }
+ }
+ /* Check if there was a partial line. */
+ if (last_i < len) {
+ print_seq_line(s, last_i, len - 1);
+ pr_cont("\n");
+ }
+
+ /* Wipe out the buffer ready for the next time around. */
+ seq_buf_clear(&s->seq);
+ }
+
+ clear_bit(0, &nmi_print_flag);
+ smp_mb__after_atomic();
+}
+
+void printk_nmi_backtrace_this_cpu_begin(void)
+{
+ /*
+ * Detect double-begins and report them. This code is unsafe (because
+ * it will print from NMI) but things are pretty badly damaged if the
+ * NMI re-enters and is somehow granted permission to use NMI printk,
+ * so how much worse can it get? Also since this code interferes with
+ * the operation of printk it is unlikely that any consequential
+ * failures will be able to log anything making this our last
+ * opportunity to tell anyone that something is wrong.
+ */
+ if (this_cpu_read(nmi_print_saved_print_func)) {
+ this_cpu_write(printk_func,
+ this_cpu_read(nmi_print_saved_print_func));
+ BUG();
+ }
+
+ this_cpu_write(nmi_print_saved_print_func, this_cpu_read(printk_func));
+ this_cpu_write(printk_func, nmi_vprintk);
+}
+
+void printk_nmi_backtrace_this_cpu_end(void)
+{
+ this_cpu_write(printk_func, this_cpu_read(nmi_print_saved_print_func));
+ this_cpu_write(nmi_print_saved_print_func, NULL);
+}
--
2.1.0
Steven Rostedt
2015-03-19 17:39:58 UTC
On Wed, 18 Mar 2015 14:20:23 +0000
Post by Daniel Thompson
Currently there is quite a pile of code sitting in
arch/x86/kernel/apic/hw_nmi.c to support safe all-cpu backtracing from NMI.
The code is inaccessible to backtrace implementations for other
architectures, which is a shame because they would probably like to be
safe too.
Copy this code into printk, reworking it a little as we do so to make
it easier to exploit as library code.
We'll port the x86 NMI backtrace logic to it in a later patch.
[...]
+static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
+{
+ const char *buf = s->buffer + start;
+
+ printk("%.*s", (end - start) + 1, buf);
+}
+
+void printk_nmi_backtrace_complete(void)
+{
+ struct nmi_seq_buf *s;
+ int len, cpu, i, last_i;
+
+ /*
+ * Now that all the NMIs have triggered, we can dump out their
+ * back traces safely to the console.
+ */
+ for_each_possible_cpu(cpu) {
+ s = &per_cpu(nmi_print_seq, cpu);
+ last_i = 0;
+
+ len = seq_buf_used(&s->seq);
+ if (!len)
+ continue;
+
+ /* Print line by line. */
+ for (i = 0; i < len; i++) {
+ if (s->buffer[i] == '\n') {
+ print_seq_line(s, last_i, i);
+ last_i = i + 1;
+ }
+ }
+ /* Check if there was a partial line. */
+ if (last_i < len) {
+ print_seq_line(s, last_i, len - 1);
+ pr_cont("\n");
+ }
+
+ /* Wipe out the buffer ready for the next time around. */
+ seq_buf_clear(&s->seq);
+ }
+
+ clear_bit(0, &nmi_print_flag);
+ smp_mb__after_atomic();
Is this really necessary? What is the mb synchronizing?

[ Added Peter Zijlstra to confirm it's not needed ]

-- Steve
Peter Zijlstra
2015-03-19 18:30:37 UTC
Post by Steven Rostedt
Post by Daniel Thompson
+void printk_nmi_backtrace_complete(void)
+{
+ struct nmi_seq_buf *s;
+ int len, cpu, i, last_i;
+
+ /*
+ * Now that all the NMIs have triggered, we can dump out their
+ * back traces safely to the console.
+ */
+ for_each_possible_cpu(cpu) {
+ s = &per_cpu(nmi_print_seq, cpu);
+ last_i = 0;
+
+ len = seq_buf_used(&s->seq);
+ if (!len)
+ continue;
+
+ /* Print line by line. */
+ for (i = 0; i < len; i++) {
+ if (s->buffer[i] == '\n') {
+ print_seq_line(s, last_i, i);
+ last_i = i + 1;
+ }
+ }
+ /* Check if there was a partial line. */
+ if (last_i < len) {
+ print_seq_line(s, last_i, len - 1);
+ pr_cont("\n");
+ }
+
+ /* Wipe out the buffer ready for the next time around. */
+ seq_buf_clear(&s->seq);
+ }
+
+ clear_bit(0, &nmi_print_flag);
+ smp_mb__after_atomic();
Is this really necessary? What is the mb synchronizing?
[ Added Peter Zijlstra to confirm it's not needed ]
It surely looks suspect; and it lacks a comment, which is a clear sign
it's buggy.
 
Now, if it tries to order the accesses to the seqbuf against the clearing
of the bit, one would have expected a _before_ barrier, not an _after_.
Daniel Thompson
2015-03-19 18:48:10 UTC
Post by Peter Zijlstra
Post by Steven Rostedt
Post by Daniel Thompson
[...]
+ clear_bit(0, &nmi_print_flag);
+ smp_mb__after_atomic();
Is this really necessary? What is the mb synchronizing?
[ Added Peter Zijlstra to confirm it's not needed ]
It surely looks suspect; and it lacks a comment, which is a clear sign
it's buggy.
Now, if it tries to order the accesses to the seqbuf against the clearing
of the bit, one would have expected a _before_ barrier, not an _after_.
It's nothing to do with the seqbuf, since I added the seqbuf code myself,
but the barrier was already in the code that I copied it from.

In the mainline code today it looks like this as part of the x86 code
(note the call to put_cpu(); in my patchset it lives in the
arch-specific code rather than the generic code):

: /* Check if there was a partial line. */
: if (last_i < len) {
: print_seq_line(s, last_i, len - 1);
: pr_cont("\n");
: }
: }
:
: clear_bit(0, &backtrace_flag);
: smp_mb__after_atomic();
: put_cpu();
: }

The barrier was not intended to have anything to do with put_cpu()
either though since the barrier was added before put_cpu() arrived:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=554ec063982752e9a569ab9189eeffa3d96731b2

There's nothing in the commit comment explaining the barrier and I
really can't see what it is for.


Daniel.
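The point made above, that ordering the seq_buf accesses against the flag clear would call for a barrier before the clear rather than after it, can be illustrated with a user-space analogue using C11 release/acquire atomics. This is a sketch of the ordering idea only, not the kernel primitives.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Illustrative analogue: if the intent is that the buffer writes become
 * visible before the flag is cleared, the ordering belongs on (before)
 * the clear itself, i.e. release semantics on the store. */
static int buffer;			/* stands in for the seq_buf data */
static atomic_int flag = 1;		/* stands in for nmi_print_flag   */

static void *writer(void *arg)
{
	(void)arg;
	buffer = 42;			/* fill the buffer */
	/* release store: all writes above are visible once flag reads 0 */
	atomic_store_explicit(&flag, 0, memory_order_release);
	return NULL;
}

static int reader(void)
{
	/* acquire loads pair with the release store above */
	while (atomic_load_explicit(&flag, memory_order_acquire))
		;
	return buffer;			/* guaranteed to observe 42 */
}
```

A barrier placed after the clear, by contrast, orders nothing that the reader of the flag can rely on, which is consistent with the conclusion that the smp_mb__after_atomic() here was never needed.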
Steven Rostedt
2015-03-19 19:01:11 UTC
On Thu, 19 Mar 2015 18:48:10 +0000
Daniel Thompson <***@linaro.org> wrote:
Post by Daniel Thompson
The barrier was not intended to have anything to do with put_cpu()
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=554ec063982752e9a569ab9189eeffa3d96731b2
There's nothing in the commit comment explaining the barrier and I
really can't see what it is for.
Looks like it wasn't needed then either.

-- Steve
Daniel Thompson
2015-03-23 14:51:56 UTC
Post by Steven Rostedt
On Thu, 19 Mar 2015 18:48:10 +0000
Post by Daniel Thompson
The barrier was not intended to have anything to do with put_cpu()
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=554ec063982752e9a569ab9189eeffa3d96731b2
There's nothing in the commit comment explaining the barrier and I
really can't see what it is for.
Looks like it wasn't needed then either.
Agreed. I'll respin the patchset with the barrier removed.


Daniel.
Daniel Thompson
2015-03-18 14:20:25 UTC
Separate out the local fiq & async macros from the various arch inlines.
This makes it easier for us (in a later patch) to provide an alternative
implementation of these inlines.

Signed-off-by: Daniel Thompson <***@linaro.org>
---
arch/arm64/include/asm/irqflags.h | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
index 11cc941bd107..df7477af6389 100644
--- a/arch/arm64/include/asm/irqflags.h
+++ b/arch/arm64/include/asm/irqflags.h
@@ -53,12 +53,6 @@ static inline void arch_local_irq_disable(void)
: "memory");
}

-#define local_fiq_enable() asm("msr daifclr, #1" : : : "memory")
-#define local_fiq_disable() asm("msr daifset, #1" : : : "memory")
-
-#define local_async_enable() asm("msr daifclr, #4" : : : "memory")
-#define local_async_disable() asm("msr daifset, #4" : : : "memory")
-
/*
* Save the current interrupt enable state.
*/
@@ -90,6 +84,12 @@ static inline int arch_irqs_disabled_flags(unsigned long flags)
return flags & PSR_I_BIT;
}

+#define local_fiq_enable() asm("msr daifclr, #1" : : : "memory")
+#define local_fiq_disable() asm("msr daifset, #1" : : : "memory")
+
+#define local_async_enable() asm("msr daifclr, #4" : : : "memory")
+#define local_async_disable() asm("msr daifset, #4" : : : "memory")
+
/*
* save and restore debug state
*/
--
2.1.0
Daniel Thompson
2015-03-18 14:20:24 UTC
Permalink
Currently, when running on FVP, CPU 0 boots up with its BPR changed from
the reset value. This renders it impossible to (preemptively) prioritize
interrupts on CPU 0.

This is harmless on normal systems but prevents preemption by NMIs on
systems with CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS enabled.

Many thanks to Andrew Thoelke for suggesting that the BPR had the
potential to harm preemption.

Suggested-by: Andrew Thoelke <***@arm.com>
Signed-off-by: Daniel Thompson <***@linaro.org>
---
drivers/irqchip/irq-gic-v3.c | 13 +++++++++++++
include/linux/irqchip/arm-gic-v3.h | 2 ++
2 files changed, 15 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index fd8850def1b8..32533650494c 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -120,6 +120,11 @@ static void __maybe_unused gic_write_pmr(u64 val)
asm volatile("msr_s " __stringify(ICC_PMR_EL1) ", %0" : : "r" (val));
}

+static void __maybe_unused gic_write_bpr1(u64 val)
+{
+ asm volatile("msr_s " __stringify(ICC_BPR1_EL1) ", %0" : : "r" (val));
+}
+
static void __maybe_unused gic_write_ctlr(u64 val)
{
asm volatile("msr_s " __stringify(ICC_CTLR_EL1) ", %0" : : "r" (val));
@@ -383,6 +388,14 @@ static void gic_cpu_sys_reg_init(void)
/* Set priority mask register */
gic_write_pmr(DEFAULT_PMR_VALUE);

+ /*
+ * On FVP, CPU 0 arrives in the kernel with its BPR changed from the
+ * reset value (and the value is large enough to prevent pre-emptive
+ * interrupts from working at all). Writing a zero to BPR restores the
+ * reset value.
+ */
+ gic_write_bpr1(0);
+
/* EOI deactivates interrupt too (mode 0) */
gic_write_ctlr(ICC_CTLR_EL1_EOImode_drop_dir);

diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index 781974afff9f..79d6645897e6 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -270,6 +270,8 @@
#define ICH_VMCR_PMR_SHIFT 24
#define ICH_VMCR_PMR_MASK (0xffUL << ICH_VMCR_PMR_SHIFT)

+#define ICC_BPR0_EL1 sys_reg(3, 0, 12, 8, 3)
+#define ICC_BPR1_EL1 sys_reg(3, 0, 12, 12, 3)
#define ICC_EOIR1_EL1 sys_reg(3, 0, 12, 12, 1)
#define ICC_IAR1_EL1 sys_reg(3, 0, 12, 12, 0)
#define ICC_SGI1R_EL1 sys_reg(3, 0, 12, 11, 5)
--
2.1.0
Daniel Thompson
2015-03-18 14:20:27 UTC
Permalink
This is self-test code to identify circumstances where the I bit is
set by hardware but no software exists to copy its state to the PMR.

I don't really expect this patch to be retained much after the RFC stage.
However, I have included it in this RFC series to document the testing I
have done and to allow further testing under different workloads.

Signed-off-by: Daniel Thompson <***@linaro.org>
---
arch/arm64/include/asm/irqflags.h | 29 +++++++++++++++++++++++++++++
arch/arm64/kernel/irq.c | 6 ++++++
2 files changed, 35 insertions(+)

diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
index 7b6866022f82..89be5f830857 100644
--- a/arch/arm64/include/asm/irqflags.h
+++ b/arch/arm64/include/asm/irqflags.h
@@ -18,6 +18,7 @@

#ifdef __KERNEL__

+#include <asm/bug.h>
#include <asm/ptrace.h>

#ifndef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
@@ -90,6 +91,23 @@ static inline int arch_irqs_disabled_flags(unsigned long flags)

#include <linux/irqchip/arm-gic-v3.h>

+extern bool enable_i_bit_check;
+
+static inline void check_for_i_bit(void)
+{
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS_SELF_TEST
+ unsigned long psr;
+
+ if (enable_i_bit_check) {
+ asm volatile("mrs %0, daif" : "=r"(psr));
+ if (psr & PSR_I_BIT) {
+ enable_i_bit_check = false;
+ WARN(true, "I bit is set: %08lx\n", psr);
+ }
+ }
+#endif
+}
+
/*
* CPU interrupt mask handling.
*/
@@ -97,6 +115,8 @@ static inline unsigned long arch_local_irq_save(void)
{
unsigned long flags, masked = ICC_PMR_EL1_MASKED;

+ check_for_i_bit();
+
asm volatile(
"// arch_local_irq_save\n"
"mrs_s %0, " __stringify(ICC_PMR_EL1) "\n"
@@ -113,6 +133,8 @@ static inline void arch_local_irq_enable(void)
{
unsigned long unmasked = ICC_PMR_EL1_UNMASKED;

+ check_for_i_bit();
+
asm volatile(
"// arch_local_irq_enable\n"
"msr_s " __stringify(ICC_PMR_EL1) ",%0\n"
@@ -126,6 +148,8 @@ static inline void arch_local_irq_disable(void)
{
unsigned long masked = ICC_PMR_EL1_MASKED;

+ check_for_i_bit();
+
asm volatile(
"// arch_local_irq_disable\n"
"msr_s " __stringify(ICC_PMR_EL1) ",%0\n"
@@ -142,6 +166,8 @@ static inline unsigned long arch_local_save_flags(void)
{
unsigned long flags;

+ check_for_i_bit();
+
asm volatile(
"// arch_local_save_flags\n"
"mrs_s %0, " __stringify(ICC_PMR_EL1) "\n"
@@ -157,6 +183,8 @@ static inline unsigned long arch_local_save_flags(void)
*/
static inline void arch_local_irq_restore(unsigned long flags)
{
+ check_for_i_bit();
+
asm volatile(
"// arch_local_irq_restore\n"
"msr_s " __stringify(ICC_PMR_EL1) ",%0\n"
@@ -168,6 +196,7 @@ static inline void arch_local_irq_restore(unsigned long flags)

static inline int arch_irqs_disabled_flags(unsigned long flags)
{
+ check_for_i_bit();
return !(flags & ICC_PMR_EL1_G_BIT);
}

diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index 240b75c0e94f..7d68193af26c 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -31,6 +31,12 @@

unsigned long irq_err_count;

+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS_SELF_TEST
+/* enable_i_bit_check is declared in asm/irqflags.h */
+bool enable_i_bit_check = true;
+EXPORT_SYMBOL(enable_i_bit_check);
+#endif
+
int arch_show_interrupts(struct seq_file *p, int prec)
{
#ifdef CONFIG_SMP
--
2.1.0
Daniel Thompson
2015-03-18 14:20:28 UTC
Permalink
Currently arm64 has no implementation of arch_trigger_all_cpu_backtrace.
The patch provides one for arm64 systems that are built with
CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS (i.e. those that have a pseudo-NMI).

Signed-off-by: Daniel Thompson <***@linaro.org>
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/hardirq.h | 2 +-
arch/arm64/include/asm/irq.h | 5 +++
arch/arm64/include/asm/smp.h | 4 +++
arch/arm64/kernel/smp.c | 70 ++++++++++++++++++++++++++++++++++++++++
drivers/irqchip/irq-gic-v3.c | 28 ++++++++++++++++
6 files changed, 109 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 91c0babb6132..cb8d3dce73c3 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -76,6 +76,7 @@ config ARM64
select PERF_USE_VMALLOC
select POWER_RESET
select POWER_SUPPLY
+ select PRINTK_NMI_BACKTRACE if USE_ICC_SYSREGS_FOR_IRQFLAGS
select RTC_LIB
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
index 6aae421f4d73..e8a3268a891c 100644
--- a/arch/arm64/include/asm/hardirq.h
+++ b/arch/arm64/include/asm/hardirq.h
@@ -20,7 +20,7 @@
#include <linux/threads.h>
#include <asm/irq.h>

-#define NR_IPI 5
+#define NR_IPI 6

typedef struct {
unsigned int __softirq_pending;
diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
index 94c53674a31d..4f974a2959bb 100644
--- a/arch/arm64/include/asm/irq.h
+++ b/arch/arm64/include/asm/irq.h
@@ -8,4 +8,9 @@ struct pt_regs;
extern void migrate_irqs(void);
extern void set_handle_irq(void (*handle_irq)(struct pt_regs *));

+#if defined CONFIG_SMP && defined CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+extern void arch_trigger_all_cpu_backtrace(bool);
+#define arch_trigger_all_cpu_backtrace(x) arch_trigger_all_cpu_backtrace(x)
+#endif
+
#endif
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 780f82c827b6..9b3b5663b88b 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -24,6 +24,10 @@
# error "<asm/smp.h> included in non-SMP build"
#endif

+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+#define SMP_IPI_NMI_MASK (1 << 5)
+#endif
+
#define raw_smp_processor_id() (current_thread_info()->cpu)

struct seq_file;
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 328b8ce4b007..9369eee1fa9f 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -68,6 +68,7 @@ enum ipi_msg_type {
IPI_CPU_STOP,
IPI_TIMER,
IPI_IRQ_WORK,
+ IPI_CPU_BACKTRACE,
};

/*
@@ -485,6 +486,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
S(IPI_CPU_STOP, "CPU stop interrupts"),
S(IPI_TIMER, "Timer broadcast interrupts"),
S(IPI_IRQ_WORK, "IRQ work interrupts"),
+ S(IPI_CPU_BACKTRACE, "backtrace interrupts"),
};

static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
@@ -559,6 +561,66 @@ static void ipi_cpu_stop(unsigned int cpu)
cpu_relax();
}

+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+
+/* For reliability, we're prepared to waste bits here. */
+static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
+
+void arch_trigger_all_cpu_backtrace(bool include_self)
+{
+ int i;
+ int this_cpu = get_cpu();
+
+ if (0 != printk_nmi_backtrace_prepare()) {
+ /*
+ * If there is already an nmi printk sequence in
+ * progress then just give up...
+ */
+ put_cpu();
+ return;
+ }
+
+ cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
+ if (!include_self)
+ cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask));
+
+ if (!cpumask_empty(to_cpumask(backtrace_mask))) {
+ pr_info("Sending NMI to %s CPUs:\n",
+ (include_self ? "all" : "other"));
+ smp_cross_call(to_cpumask(backtrace_mask), IPI_CPU_BACKTRACE);
+ }
+
+	/* Wait for up to 3 seconds for all CPUs to do the backtrace */
+ for (i = 0; i < 3 * 1000; i++) {
+ if (cpumask_empty(to_cpumask(backtrace_mask)))
+ break;
+ mdelay(1);
+ touch_softlockup_watchdog();
+ }
+
+ printk_nmi_backtrace_complete();
+ put_cpu();
+}
+
+void ipi_cpu_backtrace(struct pt_regs *regs)
+{
+ int cpu = smp_processor_id();
+
+ BUILD_BUG_ON(SMP_IPI_NMI_MASK != BIT(IPI_CPU_BACKTRACE));
+
+ if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
+ printk_nmi_backtrace_this_cpu_begin();
+ pr_warn("NMI backtrace for cpu %d\n", cpu);
+ show_regs(regs);
+ show_stack(NULL, NULL);
+ printk_nmi_backtrace_this_cpu_end();
+
+ cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
+ }
+}
+
+#endif /* CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS */
+
/*
* Main handler for inter-processor interrupts
*/
@@ -605,6 +667,14 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
break;
#endif

+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ case IPI_CPU_BACKTRACE:
+ nmi_enter();
+ ipi_cpu_backtrace(regs);
+ nmi_exit();
+ break;
+#endif
+
default:
pr_crit("CPU%u: Unknown IPI message 0x%x\n", cpu, ipinr);
break;
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 3923b2a2150c..bdb134ed4519 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -34,6 +34,16 @@
#include "irq-gic-common.h"
#include "irqchip.h"

+#ifndef SMP_IPI_NMI_MASK
+#define SMP_IPI_NMI_MASK 0
+#endif
+
+/*
+ * Copied from arm-gic.h (which we cannot include here because it conflicts
+ * with arm-gic-v3.h)
+ */
+#define GIC_DIST_PRI 0x400
+
struct redist_region {
void __iomem *redist_base;
phys_addr_t phys_base;
@@ -438,6 +448,7 @@ static int gic_dist_supports_lpis(void)
static void gic_cpu_init(void)
{
void __iomem *rbase;
+ unsigned long nmimask, hwirq;

/* Register ourselves with the rest of the world */
if (gic_populate_rdist())
@@ -455,6 +466,23 @@ static void gic_cpu_init(void)

/* initialise system registers */
gic_cpu_sys_reg_init();
+
+ /* Boost the priority of any IPI in the mask */
+ nmimask = SMP_IPI_NMI_MASK;
+ for_each_set_bit(hwirq, &nmimask, 16) {
+ unsigned int pri_reg = (hwirq / 4) * 4;
+ u32 pri_mask = BIT(6 + ((hwirq % 4) * 8));
+ u32 pri_val = readl_relaxed(rbase + GIC_DIST_PRI + pri_reg);
+ u32 actual;
+
+ pri_mask |= BIT(7 + ((hwirq % 4) * 8));
+ pri_val &= ~pri_mask; /* priority boost */
+ writel_relaxed(pri_val, rbase + GIC_DIST_PRI + pri_reg);
+
+ actual = readl_relaxed(rbase + GIC_DIST_PRI + pri_reg);
+ }
+ gic_dist_wait_for_rwp();
+ gic_redist_wait_for_rwp();
}

#ifdef CONFIG_SMP
--
2.1.0
Daniel Thompson
2015-03-18 14:20:26 UTC
Permalink
Currently irqflags is implemented using the PSR's I bit. It is possible
to implement irqflags by using the system register interface to the GIC
instead. Using this interface makes it feasible to simulate NMIs
using GIC interrupt prioritization.

This patch changes the irqflags macros to modify, save and restore
ICC_PMR_EL1. This has a substantial knock-on effect for the rest of
the kernel. There are three reasons for this:

1. The state of the ICC_PMR_EL1_G_BIT becomes part of the CPU context
and must be saved and restored during traps. To simplify the
additional context management the ICC_PMR_EL1_G_BIT is converted
into a fake (reserved) bit within the PSR (PSR_G_BIT). Naturally
this approach will need to be changed if future ARM architecture
extensions make use of this bit.

2. The hardware automatically masks the I bit (at boot, during traps, etc).
When the I bit is set by hardware we must add code to switch from I
bit masking to PMR masking.

3. Some instructions, notably wfi, require that the PMR not be used
for interrupt masking. Before calling these instructions we must
switch from PMR masking to I bit masking.

Signed-off-by: Daniel Thompson <***@linaro.org>
---
arch/arm64/Kconfig | 15 ++++++
arch/arm64/include/asm/assembler.h | 72 ++++++++++++++++++++++++++---
arch/arm64/include/asm/irqflags.h | 89 ++++++++++++++++++++++++++++++++++++
arch/arm64/include/asm/ptrace.h | 10 ++++
arch/arm64/include/uapi/asm/ptrace.h | 8 ++++
arch/arm64/kernel/entry.S | 70 ++++++++++++++++++++++++----
arch/arm64/kernel/head.S | 27 +++++++++++
arch/arm64/mm/cache.S | 4 +-
arch/arm64/mm/proc.S | 19 ++++++++
drivers/irqchip/irq-gic-v3.c | 29 +++++++++++-
include/linux/irqchip/arm-gic-v3.h | 4 +-
include/linux/irqchip/arm-gic.h | 2 +-
12 files changed, 329 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0725a6051872..91c0babb6132 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -611,6 +611,21 @@ config SETEND_EMULATION
If unsure, say Y
endif

+config USE_ICC_SYSREGS_FOR_IRQFLAGS
+ bool "Use ICC system registers for IRQ masking"
+	select ARM_GIC_V3
+ help
+ Using the ICC system registers for IRQ masking makes it possible
+ to simulate NMI on ARM64 systems. This allows several interesting
+ features (especially debug features) to be used on these systems.
+
+ Say Y here to implement IRQ masking using ICC system
+ registers. This will result in an unbootable kernel if these
+ registers are not implemented or made inaccessible by the
+	  EL3 firmware or EL2 hypervisor (if present).
+
+ If unsure, say N
+
endmenu

menu "Boot options"
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 750bac4e637e..0a0a97f3c3c1 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -23,6 +23,7 @@
#ifndef __ASM_ASSEMBLER_H
#define __ASM_ASSEMBLER_H

+#include <linux/irqchip/arm-gic-v3.h>
#include <asm/ptrace.h>
#include <asm/thread_info.h>

@@ -39,26 +40,79 @@
.endm

/*
+ * Enable and disable pseudo NMI.
+ */
+ .macro disable_nmi
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ msr daifset, #2
+#endif
+ .endm
+
+ .macro enable_nmi
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ msr daifclr, #2
+#endif
+ .endm
+
+/*
+ * Save/disable and restore pseudo NMI.
+ */
+ .macro save_and_disable_nmis, olddaif
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ mrs \olddaif, daif /* Get flags */
+ disable_nmi
+#endif
+ .endm
+
+ .macro restore_nmis, olddaif
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ msr daif, \olddaif
+#endif
+ .endm
+
+/*
* Enable and disable interrupts.
*/
- .macro disable_irq
+ .macro disable_irq, tmp
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ mov \tmp, #ICC_PMR_EL1_MASKED
+ msr_s ICC_PMR_EL1, \tmp
+ isb
+#else
msr daifset, #2
+#endif
.endm

- .macro enable_irq
+ .macro enable_irq, tmp
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ enable_nmi
+ mov \tmp, #ICC_PMR_EL1_UNMASKED
+ msr_s ICC_PMR_EL1, \tmp
+ isb
+#else
msr daifclr, #2
+#endif
.endm

/*
* Save/disable and restore interrupts.
*/
- .macro save_and_disable_irqs, olddaif
- mrs \olddaif, daif
- disable_irq
+ .macro save_and_disable_irqs, olddaif, tmp
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ mrs_s \olddaif, ICC_PMR_EL1 /* Get PMR */
+#else
+ mrs \olddaif, daif /* Get flags */
+#endif
+ disable_irq \tmp
.endm

.macro restore_irqs, olddaif
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ msr_s ICC_PMR_EL1, \olddaif /* Write to PMR */
+ isb
+#else
msr daif, \olddaif
+#endif
.endm

/*
@@ -90,13 +144,19 @@
9990:
.endm

+
/*
* Enable both debug exceptions and interrupts. This is likely to be
* faster than two daifclr operations, since writes to this register
* are self-synchronising.
*/
- .macro enable_dbg_and_irq
+ .macro enable_dbg_and_irq, tmp
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ enable_dbg
+ enable_irq \tmp
+#else
msr daifclr, #(8 | 2)
+#endif
.endm

/*
diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
index df7477af6389..7b6866022f82 100644
--- a/arch/arm64/include/asm/irqflags.h
+++ b/arch/arm64/include/asm/irqflags.h
@@ -20,6 +20,8 @@

#include <asm/ptrace.h>

+#ifndef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+
/*
* CPU interrupt mask handling.
*/
@@ -84,6 +86,93 @@ static inline int arch_irqs_disabled_flags(unsigned long flags)
return flags & PSR_I_BIT;
}

+#else /* CONFIG_IRQFLAGS_GIC_MASKING */
+
+#include <linux/irqchip/arm-gic-v3.h>
+
+/*
+ * CPU interrupt mask handling.
+ */
+static inline unsigned long arch_local_irq_save(void)
+{
+ unsigned long flags, masked = ICC_PMR_EL1_MASKED;
+
+ asm volatile(
+ "// arch_local_irq_save\n"
+ "mrs_s %0, " __stringify(ICC_PMR_EL1) "\n"
+ "msr_s " __stringify(ICC_PMR_EL1) ",%1\n"
+ "isb\n"
+ : "=&r" (flags)
+ : "r" (masked)
+ : "memory");
+
+ return flags;
+}
+
+static inline void arch_local_irq_enable(void)
+{
+ unsigned long unmasked = ICC_PMR_EL1_UNMASKED;
+
+ asm volatile(
+ "// arch_local_irq_enable\n"
+ "msr_s " __stringify(ICC_PMR_EL1) ",%0\n"
+ "isb\n"
+ :
+ : "r" (unmasked)
+ : "memory");
+}
+
+static inline void arch_local_irq_disable(void)
+{
+ unsigned long masked = ICC_PMR_EL1_MASKED;
+
+ asm volatile(
+ "// arch_local_irq_disable\n"
+ "msr_s " __stringify(ICC_PMR_EL1) ",%0\n"
+ "isb\n"
+ :
+ : "r" (masked)
+ : "memory");
+}
+
+/*
+ * Save the current interrupt enable state.
+ */
+static inline unsigned long arch_local_save_flags(void)
+{
+ unsigned long flags;
+
+ asm volatile(
+ "// arch_local_save_flags\n"
+ "mrs_s %0, " __stringify(ICC_PMR_EL1) "\n"
+ : "=r" (flags)
+ :
+ : "memory");
+
+ return flags;
+}
+
+/*
+ * restore saved IRQ state
+ */
+static inline void arch_local_irq_restore(unsigned long flags)
+{
+ asm volatile(
+ "// arch_local_irq_restore\n"
+ "msr_s " __stringify(ICC_PMR_EL1) ",%0\n"
+ "isb\n"
+ :
+ : "r" (flags)
+ : "memory");
+}
+
+static inline int arch_irqs_disabled_flags(unsigned long flags)
+{
+ return !(flags & ICC_PMR_EL1_G_BIT);
+}
+
+#endif /* CONFIG_IRQFLAGS_GIC_MASKING */
+
#define local_fiq_enable() asm("msr daifclr, #1" : : : "memory")
#define local_fiq_disable() asm("msr daifset, #1" : : : "memory")

diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index a379596e0888..da4c593b596d 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -25,6 +25,16 @@
#define CurrentEL_EL1 (1 << 2)
#define CurrentEL_EL2 (2 << 2)

+/* PMR values used to mask/unmask interrupts */
+#define ICC_PMR_EL1_G_SHIFT 6
+#define ICC_PMR_EL1_G_BIT (1 << ICC_PMR_EL1_G_SHIFT)
+#define ICC_PMR_EL1_UNMASKED 0xf0
+#define ICC_PMR_EL1_MASKED (ICC_PMR_EL1_UNMASKED ^ ICC_PMR_EL1_G_BIT)
+
+#define PSR_G_SHIFT 22
+#define PSR_G_PMR_G_SHIFT (PSR_G_SHIFT - ICC_PMR_EL1_G_SHIFT)
+#define PSR_I_PMR_G_SHIFT (7 - ICC_PMR_EL1_G_SHIFT)
+
/* AArch32-specific ptrace requests */
#define COMPAT_PTRACE_GETREGS 12
#define COMPAT_PTRACE_SETREGS 13
diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index 6913643bbe54..6c640d6430d8 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -51,6 +51,14 @@
#define PSR_N_BIT 0x80000000

/*
+ * This is the GIC interrupt mask bit. It is not actually part of the
+ * PSR; we are simply using some reserved bits in the PSR to store some state
+ * from the interrupt controller. The context save/restore functions will
+ * extract the ICC_PMR_EL1_G_BIT and save it as the PSR_G_BIT.
+ */
+#define PSR_G_BIT 0x00400000
+
+/*
* Groups of PSR bits
*/
#define PSR_f 0xff000000 /* Flags */
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 0aa5747639f8..964573308e68 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -20,6 +20,7 @@

#include <linux/init.h>
#include <linux/linkage.h>
+#include <linux/irqchip/arm-gic-v3.h>

#include <asm/assembler.h>
#include <asm/asm-offsets.h>
@@ -94,6 +95,26 @@
.endif
mrs x22, elr_el1
mrs x23, spsr_el1
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ /*
+ * Save the context held in the PMR register and copy the current
+ * I bit state to the PMR. Re-enable of the I bit happens in later
+ * code that knows what type of trap we are handling.
+ */
+ mrs_s x20, ICC_PMR_EL1 // Get PMR
+ and x20, x20, #ICC_PMR_EL1_G_BIT // Extract mask bit
+ lsl x20, x20, #PSR_G_PMR_G_SHIFT // Shift to a PSTATE RES0 bit
+ eor x20, x20, #PSR_G_BIT // Invert bit
+ orr x23, x20, x23 // Store PMR within PSTATE
+ orr x20, x23, #PSR_I_BIT // Extract I bit
+ .if PSR_I_PMR_G_SHIFT != 0
+ lsr x20, x20, #PSR_I_PMR_G_SHIFT // Shift down to meet mask bit
+ .endif
+ eor x20, x20, #ICC_PMR_EL1_UNMASKED // If I set: 0xb0 else: 0xf0
+ msr_s ICC_PMR_EL1, x20 // Write to PMR
+ isb
+#endif
+
stp lr, x21, [sp, #S_LR]
stp x22, x23, [sp, #S_PC]

@@ -121,6 +142,18 @@
ldr x23, [sp, #S_SP] // load return stack pointer
msr sp_el0, x23
.endif
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ /*
+ * Restore the context to the PMR (and ensure the reserved bit is
+ * restored to zero before being copied back to the PSR).
+ */
+ and x20, x22, #PSR_G_BIT // Get stolen PSTATE bit
+ and x22, x22, #~PSR_G_BIT // Clear stolen bit
+ lsr x20, x20, #PSR_G_PMR_G_SHIFT // Shift back to PMR mask
+ eor x20, x20, #ICC_PMR_EL1_UNMASKED // x20 gets 0xf0 or 0xb0
+ msr_s ICC_PMR_EL1, x20 // Write to PMR
+ isb
+#endif
msr elr_el1, x21 // set up the return data
msr spsr_el1, x22
.if \ret
@@ -288,16 +321,22 @@ el1_da:
* Data abort handling
*/
mrs x0, far_el1
+ enable_nmi
enable_dbg
// re-enable interrupts if they were enabled in the aborted context
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ tbnz x23, #PSR_G_SHIFT, 1f // PSR_G_BIT
+#else
tbnz x23, #7, 1f // PSR_I_BIT
- enable_irq
+#endif
+ enable_irq x2
1:
mov x2, sp // struct pt_regs
bl do_mem_abort

// disable interrupts before pulling preserved data off the stack
- disable_irq
+ disable_irq x21
+ disable_nmi
kernel_exit 1
el1_sp_pc:
/*
@@ -337,6 +376,7 @@ ENDPROC(el1_sync)
.align 6
el1_irq:
kernel_entry 1
+ enable_nmi
enable_dbg
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_off
@@ -356,6 +396,7 @@ el1_irq:
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_on
#endif
+ disable_nmi
kernel_exit 1
ENDPROC(el1_irq)

@@ -450,7 +491,8 @@ el0_da:
*/
mrs x26, far_el1
// enable interrupts before calling the main handler
- enable_dbg_and_irq
+ enable_nmi
+ enable_dbg_and_irq x0
ct_user_exit
bic x0, x26, #(0xff << 56)
mov x1, x25
@@ -463,7 +505,8 @@ el0_ia:
*/
mrs x26, far_el1
// enable interrupts before calling the main handler
- enable_dbg_and_irq
+ enable_nmi
+ enable_dbg_and_irq x0
ct_user_exit
mov x0, x26
orr x1, x25, #1 << 24 // use reserved ISS bit for instruction aborts
@@ -496,7 +539,8 @@ el0_sp_pc:
*/
mrs x26, far_el1
// enable interrupts before calling the main handler
- enable_dbg_and_irq
+ enable_nmi
+ enable_dbg_and_irq x0
mov x0, x26
mov x1, x25
mov x2, sp
@@ -507,7 +551,8 @@ el0_undef:
* Undefined instruction
*/
// enable interrupts before calling the main handler
- enable_dbg_and_irq
+ enable_nmi
+ enable_dbg_and_irq x0
ct_user_exit
mov x0, sp
bl do_undefinstr
@@ -538,6 +583,7 @@ ENDPROC(el0_sync)
el0_irq:
kernel_entry 0
el0_irq_naked:
+ enable_nmi
enable_dbg
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_off
@@ -587,11 +633,12 @@ ENDPROC(cpu_switch_to)
* and this includes saving x0 back into the kernel stack.
*/
ret_fast_syscall:
- disable_irq // disable interrupts
+ disable_irq x21 // disable interrupts
ldr x1, [tsk, #TI_FLAGS]
and x2, x1, #_TIF_WORK_MASK
cbnz x2, fast_work_pending
enable_step_tsk x1, x2
+ disable_nmi
kernel_exit 0, ret = 1

/*
@@ -606,7 +653,8 @@ work_pending:
mov x0, sp // 'regs'
tst x2, #PSR_MODE_MASK // user mode regs?
b.ne no_work_pending // returning to kernel
- enable_irq // enable interrupts for do_notify_resume()
+ enable_nmi
+ enable_irq x21 // enable interrupts for do_notify_resume()
bl do_notify_resume
b ret_to_user
work_resched:
@@ -616,12 +664,13 @@ work_resched:
* "slow" syscall return path.
*/
ret_to_user:
- disable_irq // disable interrupts
+ disable_irq x21 // disable interrupts
ldr x1, [tsk, #TI_FLAGS]
and x2, x1, #_TIF_WORK_MASK
cbnz x2, work_pending
enable_step_tsk x1, x2
no_work_pending:
+ disable_nmi
kernel_exit 0, ret = 0
ENDPROC(ret_to_user)

@@ -652,7 +701,8 @@ el0_svc:
adrp stbl, sys_call_table // load syscall table pointer
el0_svc_naked: // compat entry point
stp x0, scno, [sp, #S_ORIG_X0] // save the original x0 and syscall number
- enable_dbg_and_irq
+ enable_nmi
+ enable_dbg_and_irq x16
ct_user_exit 1

ldr x16, [tsk, #TI_FLAGS] // check for syscall hooks
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 9a4ef52abf7c..6226d24f8107 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -445,6 +445,30 @@ __switch_data:
.quad init_thread_union + THREAD_START_SP // sp

/*
+ * Conditional macro to enable interrupt controller system register access.
+ *
+ * Before we jump into generic code we must enable interrupt controller system
+ * register access because this is required by the irqflags macros. We must
+ * also mask interrupts at the PMR and unmask them within the PSR. That leaves
+ * us set up and ready for the kernel to make its first call to
+ * arch_local_irq_enable().
+ *
+ * Corrupts: tmp
+ */
+ .macro enable_icc_sysregs, tmp
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ mrs_s \tmp, ICC_SRE_EL1
+ orr \tmp, \tmp, #ICC_SRE_EL1_SRE
+ msr_s ICC_SRE_EL1, \tmp // Set ICC_SRE_EL1.SRE==1
+ isb // Make sure SRE is now set
+ mov \tmp, ICC_PMR_EL1_MASKED
+ msr_s ICC_PMR_EL1, \tmp // Prepare for unmask of I bit
+ isb
+ msr daifclr, #2 // Clear the I bit
+#endif
+ .endm
+
+/*
* The following fragment of code is executed with the MMU on in MMU mode, and
* uses absolute addresses; this is not position independent.
*/
@@ -464,6 +488,8 @@ __mmap_switched:
str x22, [x4] // Save processor ID
str x21, [x5] // Save FDT pointer
str x24, [x6] // Save PHYS_OFFSET
+
+ enable_icc_sysregs x29 // May be nop
mov x29, #0
b start_kernel
ENDPROC(__mmap_switched)
@@ -652,6 +678,7 @@ ENDPROC(secondary_startup)
ENTRY(__secondary_switched)
ldr x0, [x21] // get secondary_data.stack
mov sp, x0
+ enable_icc_sysregs x29 // may be nop
mov x29, #0
b secondary_start_kernel
ENDPROC(__secondary_switched)
diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
index 2560e1e1562e..f34aab45f948 100644
--- a/arch/arm64/mm/cache.S
+++ b/arch/arm64/mm/cache.S
@@ -46,10 +46,12 @@ loop1:
and x1, x1, #7 // mask of the bits for current cache only
cmp x1, #2 // see what cache we have at this level
b.lt skip // skip if no cache, or just i-cache
- save_and_disable_irqs x9 // make CSSELR and CCSIDR access atomic
+ save_and_disable_irqs x9, x4 // make CSSELR and CCSIDR access atomic
+ save_and_disable_nmis x4
msr csselr_el1, x10 // select current cache level in csselr
isb // isb to sych the new cssr&csidr
mrs x1, ccsidr_el1 // read the new ccsidr
+ restore_nmis x4
restore_irqs x9
and x2, x1, #7 // extract the length of the cache lines
add x2, x2, #4 // add 4 (line length offset)
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 28eebfb6af76..f91e66ecef80 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -20,6 +20,7 @@

#include <linux/init.h>
#include <linux/linkage.h>
+#include <linux/irqchip/arm-gic-v3.h>
#include <asm/assembler.h>
#include <asm/asm-offsets.h>
#include <asm/hwcap.h>
@@ -95,10 +96,28 @@ ENDPROC(cpu_soft_restart)
* cpu_do_idle()
*
* Idle the processor (wait for interrupt).
+ *
+ * If CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS is set we must do additional
+ * work to ensure that interrupts are not masked at the PMR (because the
+ * core will not wake up if we block the wake up signal in the interrupt
+ * controller).
*/
ENTRY(cpu_do_idle)
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ mrs x0, daif // save I bit
+ msr daifset, #2 // set I bit
+ mrs_s x1, ICC_PMR_EL1 // save PMR
+ mov x2, #ICC_PMR_EL1_UNMASKED
+ msr_s ICC_PMR_EL1, x2 // unmask at PMR
+ isb
+#endif
dsb sy // WFI may enter a low-power mode
wfi
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ msr_s ICC_PMR_EL1, x1 // restore PMR
+ isb
+ msr daif, x0 // restore I bit
+#endif
ret
ENDPROC(cpu_do_idle)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 32533650494c..3923b2a2150c 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -110,8 +110,33 @@ static void gic_redist_wait_for_rwp(void)
static u64 __maybe_unused gic_read_iar(void)
{
u64 irqstat;
+ u64 __maybe_unused daif;
+ u64 __maybe_unused pmr;
+ u64 __maybe_unused default_pmr_value = DEFAULT_PMR_VALUE;

+#ifndef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
asm volatile("mrs_s %0, " __stringify(ICC_IAR1_EL1) : "=r" (irqstat));
+#else
+ /*
+ * The PMR may be configured to mask interrupts when this code is
+ * called, thus in order to acknowledge interrupts we must set the
+ * PMR to its default value before reading from the IAR.
+ *
+ * To do this without taking an interrupt we also ensure the I bit
+ * is set whilst we are interfering with the value of the PMR.
+ */
+ asm volatile(
+ "mrs %1, daif\n" /* save I bit */
+ "msr daifset, #2\n" /* set I bit */
+ "mrs_s %2, " __stringify(ICC_PMR_EL1) "\n" /* save PMR */
+ "msr_s " __stringify(ICC_PMR_EL1) ",%3\n" /* set PMR */
+ "mrs_s %0, " __stringify(ICC_IAR1_EL1) "\n" /* ack int */
+ "msr_s " __stringify(ICC_PMR_EL1) ",%2\n" /* restore PMR */
+ "isb\n"
+ "msr daif, %1" /* restore I */
+ : "=r" (irqstat), "=&r" (daif), "=&r" (pmr)
+ : "r" (default_pmr_value));
+#endif
return irqstat;
}

@@ -142,7 +167,7 @@ static void __maybe_unused gic_write_sgi1r(u64 val)
asm volatile("msr_s " __stringify(ICC_SGI1R_EL1) ", %0" : : "r" (val));
}

-static void gic_enable_sre(void)
+static void __maybe_unused gic_enable_sre(void)
{
u64 val;

@@ -382,11 +407,13 @@ static int gic_populate_rdist(void)

static void gic_cpu_sys_reg_init(void)
{
+#ifndef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
/* Enable system registers */
gic_enable_sre();

/* Set priority mask register */
gic_write_pmr(DEFAULT_PMR_VALUE);
+#endif

/*
* On FVP, CPU 0 arrives in the kernel with its BPR changed from the
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index 79d6645897e6..67b90813b619 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -242,7 +242,7 @@
*/
#define ICC_CTLR_EL1_EOImode_drop_dir (0U << 1)
#define ICC_CTLR_EL1_EOImode_drop (1U << 1)
-#define ICC_SRE_EL1_SRE (1U << 0)
+#define ICC_SRE_EL1_SRE (1 << 0)

/*
* Hypervisor interface registers (SRE only)
@@ -346,6 +346,8 @@

#include <linux/stringify.h>

+struct device_node;
+
/*
* We need a value to serve as a irq-type for LPIs. Choose one that will
* hopefully pique the interest of the reviewer.
diff --git a/include/linux/irqchip/arm-gic.h b/include/linux/irqchip/arm-gic.h
index 71d706d5f169..65340ce7b5af 100644
--- a/include/linux/irqchip/arm-gic.h
+++ b/include/linux/irqchip/arm-gic.h
@@ -49,7 +49,7 @@
#define GICD_INT_EN_CLR_X32 0xffffffff
#define GICD_INT_EN_SET_SGI 0x0000ffff
#define GICD_INT_EN_CLR_PPI 0xffff0000
-#define GICD_INT_DEF_PRI 0xa0
+#define GICD_INT_DEF_PRI 0xc0
#define GICD_INT_DEF_PRI_X4 ((GICD_INT_DEF_PRI << 24) |\
(GICD_INT_DEF_PRI << 16) |\
(GICD_INT_DEF_PRI << 8) |\
--
2.1.0
Dave Martin
2015-03-20 15:45:44 UTC
Permalink
Post by Daniel Thompson
This patchset provides a pseudo-NMI for arm64 kernels by reimplementing
the irqflags macros to modify the GIC PMR (the priority mask register is
accessible as a system register on GICv3 and later) rather than the
PSR. The pseudo-NMI changes are supported by a prototype implementation of
arch_trigger_all_cpu_backtrace that allows the new code to be exercised.
Hi there,

Interesting series -- I've not gone into all the code in detail yet,
but I have some high-level comments and questions first.
Post by Daniel Thompson
In addition to the arm64 changes I've bundled in a few patches from
other patchsets to make the patchset self-contained. Of particular note
is the serial break emulation patch, which allows ^B^R^K to be used
instead of a serial break to trigger SysRq-L (FVP UART sockets don't
seem to support serial breaks). This makes it easy to run
arch_trigger_all_cpu_backtrace from an IRQ handler (i.e. somewhere with
interrupts masked so we are forced to preempt and take the NMI).
The code works-for-me (tm) but there are currently some pretty serious
limitations.
1. Exercised only on the foundation model with gicv3 support. It has
not been exercised on real silicon or even on the more advanced
ARM models.
2. It has been written without any documentation describing GICv3
architecture (which has not yet been released by ARM). I've been
guessing about the behaviour based on the ARMv8 and GICv2
architecture specs. The code works on the foundation model but
I cannot check that it conforms architecturally.
Review is the best way to approach that for now. If the code can be
demonstrated on the model, that's a good start.
Post by Daniel Thompson
3. Requires GICv3+ hardware together with firmware support to enable
GICv3 features at EL3. If CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS is
enabled the kernel will not boot on older hardware. It will be hard
to diagnose because we will crash very early in the boot (i.e.
before the call to start_kernel). Auto-detection might be possible
but the performance and code size cost of adding conditional code to
the irqflags macros probably makes it impractical. As such it may
never be possible to remove this limitation (although it might be
possible to find a way to survive long enough to panic and show the
results on the console).
This can (and should) be done via patching -- otherwise we risk breaking
single kernel image for GICv2+v3.
Post by Daniel Thompson
4. No benchmarking (see #1 above). Unlike the PSR, updates to the PMR
do not self-synchronize, which requires me to sprinkle isb
instructions fairly liberally. I've been told the cost of isb varies
from almost-free (A53) to somewhat costly (A57) thus we expect this
code to reduce kernel performance. However this needs to be
quantified and I am currently unable to do this. I'd really like to
but don't have any suitable hardware.
5. There is no code in el1_irq to detect NMI and switch from IRQ to NMI
handling. This means all the irq handling machinery is re-entered in
order to handle the NMI. This is not safe and deadlocks are likely.
This is a severe limitation although, in this proof-of-concept
work, NMI can only be triggered by SysRq-L or severe kernel damage.
This means we just about get away with it for simple tests (lockdep
detects that we are doing something wrong and shows a backtrace). This is
definitely the first thing that needs to be tackled to take this
code further.
Indeed, and this does look a bit weird at present... it took me a
while to figure out where NMIs could possibly be coming from in
this series.
Post by Daniel Thompson
Note also that alternative approaches to implementing a pseudo-NMI on
arm64 are possible but only through runtime cooperation with other
software components in the system, potentially both those running at EL3
and at secure EL1. I should like to explore these options in future but,
For the KVM case, vFIQ is an obvious choice, but you're correct that
all other scenarios would require cooperation from a separate hypervisor/
firmware etc.

Ideally, we should avoid having multiple ways of implementing the same
thing.
Post by Daniel Thompson
as far as I know, this is the only sane way to provide NMI-like features
whilst being implementable entirely in non-secure EL1[1]
[1] Except for a single register write to ICC_SRE_EL3 by the EL3
firmware (and already implemented by ARM trusted firmware).
Even that would require more of the memory-mapped GIC CPU interface
to be NS-accessible than is likely to be the case on product
platforms. Note also that the memory-mapped interface is not
mandated for GICv3, so some platforms may simply not have it.


Some other generalities that don't seem to be addressed yet:

* How are NMIs prioritised with respect to other interrupts and
exceptions? This needs to be concretely specified. A sensible
answer would probably be that the effect is to split the
existing single-priority IRQ into two bands: ordinary IRQs
and NMIs. Prioritisation against FIQ and other exceptions
would be unaffected.

I think this is effectively what you've implemented so far.

* Should it be possible to map SPIs as NMIs? How would they
be configured/registered? Should it be possible to register
multiple interrupts as NMIs?

* What about interrupt affinity?


Some other points:

* I feel uneasy about using reserved SPSR fields to store
information. This is probably OK for now, but it might
be cleaner simply to save/restore the PMR directly.

Providing that the affected bit is cleared before writing
to the SPSR (as you do already in kernel_exit) looks
workable, but I wonder whether the choice of bit should be
UAPI -- it may have to change in the future.


* You can probably thin out the ISBs.

I believe that, via the system register interface,
the GICC PMR is intended to be self-synchronising.

* The value BPR resets to is implementation-dependent.
It should be initialised on each CPU if we are going to rely
on its value, on all platforms. This isn't specific to FVP.

* Is ICC_CTLR_EL1.CBPR set correctly?


Cheers
---Dave
Daniel Thompson
2015-03-23 18:47:53 UTC
Permalink
Post by Dave Martin
Post by Daniel Thompson
This patchset provides a pseudo-NMI for arm64 kernels by reimplementing
the irqflags macros to modify the GIC PMR (the priority mask register is
accessible as a system register on GICv3 and later) rather than the
PSR. The pseudo-NMI changes are supported by a prototype implementation of
arch_trigger_all_cpu_backtrace that allows the new code to be exercised.
Hi there,
Interesting series -- I've not gone into all the code in detail yet,
but I have some high-level comments and questions first.
Post by Daniel Thompson
In addition to the arm64 changes I've bundled in a few patches from
other patchsets to make the patchset self-contained. Of particular note
is the serial break emulation patch, which allows ^B^R^K to be used
instead of a serial break to trigger SysRq-L (FVP UART sockets don't
seem to support serial breaks). This makes it easy to run
arch_trigger_all_cpu_backtrace from an IRQ handler (i.e. somewhere with
interrupts masked so we are forced to preempt and take the NMI).
The code works-for-me (tm) but there are currently some pretty serious
limitations.
1. Exercised only on the foundation model with gicv3 support. It has
not been exercised on real silicon or even on the more advanced
ARM models.
2. It has been written without any documentation describing GICv3
architecture (which has not yet been released by ARM). I've been
guessing about the behaviour based on the ARMv8 and GICv2
architecture specs. The code works on the foundation model but
I cannot check that it conforms architecturally.
Review is the best way to approach that for now. If the code can be
demonstrated on the model, that's a good start.
Post by Daniel Thompson
3. Requires GICv3+ hardware together with firmware support to enable
GICv3 features at EL3. If CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS is
enabled the kernel will not boot on older hardware. It will be hard
to diagnose because we will crash very early in the boot (i.e.
before the call to start_kernel). Auto-detection might be possible
but the performance and code size cost of adding conditional code to
the irqflags macros probably makes it impractical. As such it may
never be possible to remove this limitation (although it might be
possible to find a way to survive long enough to panic and show the
results on the console).
This can (and should) be done via patching -- otherwise we risk breaking
single kernel image for GICv2+v3.
Do you mean real patching (hunting down all those inlines and rewriting
them) or simply implementing irqflags with an ops table? If the former I
didn't look at this because I didn't realise we could do that...
Post by Dave Martin
Post by Daniel Thompson
4. No benchmarking (see #1 above). Unlike the PSR, updates to the PMR
do not self-synchronize, which requires me to sprinkle isb
instructions fairly liberally. I've been told the cost of isb varies
from almost-free (A53) to somewhat costly (A57) thus we expect this
code to reduce kernel performance. However this needs to be
quantified and I am currently unable to do this. I'd really like to
but don't have any suitable hardware.
5. There is no code in el1_irq to detect NMI and switch from IRQ to NMI
handling. This means all the irq handling machinery is re-entered in
order to handle the NMI. This is not safe and deadlocks are likely.
This is a severe limitation although, in this proof-of-concept
work, NMI can only be triggered by SysRq-L or severe kernel damage.
This means we just about get away with it for simple tests (lockdep
detects that we are doing something wrong and shows a backtrace). This is
definitely the first thing that needs to be tackled to take this
code further.
Indeed, and this does look a bit weird at present... it took me a
while to figure out where NMIs could possibly be coming from in
this series.
My plan was to check the running priority register early in el1_irq and
branch to a handler specific to NMI when the priority indicates we are
handling a pseudo-NMI.
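
The check described above can be sketched in plain C. This is a userspace model, not kernel code; the names and values are assumptions based on the priority scheme discussed later in the thread (ordinary interrupts at 0xc0, NMIs at the same priority with bit 6 cleared), and `running_priority` stands in for a read of the GIC running priority register:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed priority bands, following the scheme in this thread: bit 6
 * distinguishes the NMI band from ordinary interrupts. */
#define PRI_IRQ 0xc0 /* ordinary interrupts */
#define PRI_NMI 0x80 /* same priority with bit 6 cleared */

/* Stand-in for reading the GIC running priority register. */
static uint8_t running_priority;

/* The early el1_irq-time test: does the active priority indicate that
 * we are handling a pseudo-NMI rather than an ordinary IRQ? */
static bool handling_nmi(void)
{
    /* Lower numeric value means higher priority on the GIC; in this
     * scheme only the NMI band has bit 6 clear. */
    return (running_priority & 0x40) == 0;
}
```

The real entry-path check would of course be assembly reading the system register, but the decision it makes is exactly this bit test.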
Post by Dave Martin
Post by Daniel Thompson
Note also that alternative approaches to implementing a pseudo-NMI on
arm64 are possible but only through runtime cooperation with other
software components in the system, potentially both those running at EL3
and at secure EL1. I should like to explore these options in future but,
For the KVM case, vFIQ is an obvious choice, but you're correct that
all other scenarios would require cooperation from a separate hypervisor/
firmware etc.
Ideally, we should avoid having multiple ways of implementing the same
thing.
Post by Daniel Thompson
as far as I know, this is the only sane way to provide NMI-like features
whilst being implementable entirely in non-secure EL1[1]
[1] Except for a single register write to ICC_SRE_EL3 by the EL3
firmware (and already implemented by ARM trusted firmware).
Even that would require more of the memory-mapped GIC CPU interface
to be NS-accessible than is likely to be the case on product
platforms. Note also that the memory-mapped interface is not
mandated for GICv3, so some platforms may simply not have it.
Perhaps I used clumsy phrasing here.

The main difference I care about is between runtime cooperation
and boot-time cooperation. The approach I have taken in the patchset
requires boot time cooperation (to configure GIC appropriately) but no
runtime cooperation.
Post by Dave Martin
* How are NMIs prioritised with respect to other interrupts and
exceptions? This needs to be concretely specified. A sensible
answer would probably be that the effect is to split the
existing single-priority IRQ into two bands: ordinary IRQs
and NMIs. Prioritisation against FIQ and other exceptions
would be unaffected.
I think this is effectively what you've implemented so far.
Pretty much. Normal interrupts run at the default priority and NMIs run
at default priority but with bit 6 cleared.

In addition I would expect most kernel exception handlers to unmask the
I-bit as soon as the context has been saved. This allows them to be
pre-empted by an NMI.
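
The two-band effect can be checked with a little arithmetic. In the sketch below, `PMR_UNMASKED` is an assumed stand-in for DEFAULT_PMR_VALUE (any value numerically above 0xc0 behaves the same way); the GIC signals an interrupt only when its priority is numerically below the PMR:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* From the patch: GICD_INT_DEF_PRI is 0xc0, and the NMI band is the
 * same priority with bit 6 cleared. PMR_UNMASKED is an assumption. */
#define PRI_IRQ      0xc0u            /* ordinary interrupts */
#define PRI_NMI      (0xc0u & ~0x40u) /* 0x80: bit 6 cleared */
#define PMR_UNMASKED 0xf0u            /* assumed DEFAULT_PMR_VALUE */
#define PMR_MASKED   0xc0u            /* PMR-based local_irq_disable() */

/* On the GIC, lower numbers mean higher priority: an interrupt is
 * signalled only if its priority is numerically below the PMR. */
static bool signalled(uint8_t prio, uint8_t pmr)
{
    return prio < pmr;
}
```

With the PMR dropped to 0xc0, ordinary interrupts are masked while the NMI band still gets through, which is the whole point of the scheme.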
Post by Dave Martin
* Should it be possible to map SPIs as NMIs? How would they
be configured/registered? Should it be possible to register
multiple interrupts as NMIs?
Yes, although not quite yet.

The work on arm64 is following in the footsteps of similar work for arm.

My initial ideas are here (although as you can see from the review I've
got a *long* way to go):
http://thread.gmane.org/gmane.linux.kernel/1871163

However the basic theme is:

1. Use existing interrupt API to register NMI handlers (with special
flag).

2. Flag makes callback to irqchip driver. In case of GICv3 this would
alter the priority of interrupt (for ARMv7+FIQ it would also change
interrupt group but this is not needed on ARMv8+GICv3).

3. "Simple" demux. We cannot traverse all the IRQ data structures from
NMI due to locks within the structures so we need some simplifying
assumptions. My initial simplifying assumptions were:

a) NMI only for first level interrupt controller (i.e. peripherals
directly attached to this controller).

b) No shared interrupt lines.


Based on tglx's review I'm working on the basis that b) above is
one simplification too many but I've not yet had the chance to go back and
have another go.
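
Steps 1 and 2 above might look roughly like the following userspace model. All names here (`request_irq_sketch`, `IRQF_NMI_SKETCH`, and so on) are invented for illustration and are not the kernel API; the point is only that a registration flag triggers an irqchip-side priority change:

```c
#include <stdint.h>

/* Hypothetical names, modelling the proposed flow: request_irq() with
 * an NMI flag makes the "irqchip" promote that interrupt into the NMI
 * priority band by clearing bit 6. */
#define NR_IRQS_SKETCH  8
#define IRQF_NMI_SKETCH 0x1u
#define DEF_PRI         0xc0u /* GICD_INT_DEF_PRI from the patch */

struct irq_desc_sketch {
    void (*handler)(int irq);
    uint8_t prio;
};

static struct irq_desc_sketch irqs[NR_IRQS_SKETCH];

static int request_irq_sketch(int irq, void (*handler)(int),
                              unsigned int flags)
{
    if (irq < 0 || irq >= NR_IRQS_SKETCH)
        return -1;
    irqs[irq].handler = handler;
    irqs[irq].prio = DEF_PRI;
    if (flags & IRQF_NMI_SKETCH)
        irqs[irq].prio &= ~0x40u; /* irqchip callback: NMI band */
    return 0;
}
```

On ARMv7 with FIQ the equivalent callback would also move the interrupt to Group 0, as noted above; on ARMv8+GICv3 the priority change alone suffices.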


As you can see from the reviews I have a bit of work to do.
Post by Dave Martin
* What about interrupt affinity?
It should "just work" as normal if I can get the rest of the interrupt
system right. Do you foresee a specific problem?
Post by Dave Martin
* I feel uneasy about using reserved SPSR fields to store
information. This is probably OK for now, but it might
be cleaner simply to save/restore the PMR directly.
Providing that the affected bit is cleared before writing
to the SPSR (as you do already in kernel_exit) looks
workable, but I wonder whether the choice of bit should be
UAPI -- it may have to change in the future.
I agree we ought to keep this out of the uapi.

Regarding stealing a bit from the SPSR this was mostly an implementation
convenience. It might be interesting (eventually) to benchmark it
against a more obvious approach.
Post by Dave Martin
* You can probably thin out the ISBs.
I believe that, via the system register interface,
the GICC PMR is intended to be self-synchronising.
That sounds great. I've just found the relevant line in the ARMv8
manual... I'd overlooked that before.
Post by Dave Martin
* The value BPR resets to is implementation-dependent.
It should be initialised on each CPU if we are going to rely
on its value, on all platforms. This isn't specific to FVP.
Really? As mentioned I only have a GICv2 spec but on that revision the
reset value is the minimum supported value (i.e. the same effect as
attempting to set it to zero). In other words it is technically
implementation-dependent but nevertheless defaults to a setting that
avoids any weird "globbing" effect on the interrupt priorities.

On FVP something has reached in and changed the BPR for CPU0 from its
proper reset value (all other CPUs have the correct reset value). Of course
that could be the firmware rather than the FVP itself that has caused this.

I guess it is good practice for a driver to re-establish the reset value
for registers it owns and cares about but nevertheless I still expect this
register to be as-reset when we hand over to the kernel.
Post by Dave Martin
* Is ICC_CTLR_EL1.CBPR set correctly?
I've never checked. On GICv2 that would be secure state so to be honest
I didn't think its value was any of my business.
Post by Dave Martin
Cheers
---Dave
Dave Martin
2015-04-01 15:15:41 UTC
Permalink
Apologies for the slow reply... :/

Anyway,
Post by Daniel Thompson
Post by Dave Martin
Post by Daniel Thompson
This patchset provides a pseudo-NMI for arm64 kernels by reimplementing
the irqflags macros to modify the GIC PMR (the priority mask register is
accessible as a system register on GICv3 and later) rather than the
PSR. The pseudo-NMI changes are supported by a prototype implementation of
arch_trigger_all_cpu_backtrace that allows the new code to be exercised.
Minor nit: the "pseudo NMI" terminology could lead to confusion if
something more closely resembling a real NMI comes along.

I'll have to have a think, but nothing comes to mind right now...

[...]
Post by Daniel Thompson
Post by Dave Martin
Post by Daniel Thompson
3. Requires GICv3+ hardware together with firmware support to enable
GICv3 features at EL3. If CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS is
enabled the kernel will not boot on older hardware. It will be hard
to diagnose because we will crash very early in the boot (i.e.
before the call to start_kernel). Auto-detection might be possible
but the performance and code size cost of adding conditional code to
the irqflags macros probably makes it impractical. As such it may
never be possible to remove this limitation (although it might be
possible to find a way to survive long enough to panic and show the
results on the console).
This can (and should) be done via patching -- otherwise we risk breaking
single kernel image for GICv2+v3.
Do you mean real patching (hunting down all those inlines and
rewriting them) or simply implementing irqflags with an ops table? If
the former I didn't look at this because I didn't realise we could
do that...
A generic patching framework was introduced by Andre Przywara in this
patch:

e039ee4 arm64: add alternative runtime patching

I believe you should be able to use this to patch between DAIF and
ICC_PMR accesses.

You should be able to find examples of this framework being used by
grepping. I've not played with it myself yet.
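
For what it's worth, the idea behind such patching can be modelled in userspace as a one-time selection made at boot. This is only a sketch of the concept under assumed names; the real framework rewrites the instructions in place rather than calling through a pointer, so the fast path pays no indirection cost:

```c
#include <stdbool.h>

/* Flags standing in for the two masking mechanisms being chosen
 * between: DAIF (PSR I-bit) versus the GICv3 PMR system register. */
static bool daif_masked, pmr_masked;

static void mask_via_daif(void) { daif_masked = true; }
static void mask_via_pmr(void)  { pmr_masked  = true; }

/* Default implementation works everywhere (DAIF). */
static void (*local_irq_disable_impl)(void) = mask_via_daif;

/* Done once during early boot, before the choice matters: if the CPU
 * exposes the GICv3 system register interface, switch implementations. */
static void apply_alternatives_sketch(bool have_gicv3_sysregs)
{
    if (have_gicv3_sysregs)
        local_irq_disable_impl = mask_via_pmr;
}
```

A single kernel image can then boot on both GICv2 and GICv3 hardware, which is the concern Dave raises above.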

[...]
Post by Daniel Thompson
Post by Dave Martin
Post by Daniel Thompson
5. There is no code in el1_irq to detect NMI and switch from IRQ to NMI
handling. This means all the irq handling machinery is re-entered in
order to handle the NMI. This is not safe and deadlocks are likely.
This is a severe limitation although, in this proof-of-concept
work, NMI can only be triggered by SysRq-L or severe kernel damage.
This means we just about get away with it for simple tests (lockdep
detects that we are doing something wrong and shows a backtrace). This is
definitely the first thing that needs to be tackled to take this
code further.
Indeed, and this does look a bit weird at present... it took me a
while to figure out where NMIs could possibly be coming from in
this series.
My plan was to check the running priority register early in el1_irq
and branch to a handler specific to NMI when the priority indicates
we are handling a pseudo-NMI.
Sounds reasonable.
Post by Daniel Thompson
Post by Dave Martin
Post by Daniel Thompson
Note also that alternative approaches to implementing a pseudo-NMI on
arm64 are possible but only through runtime cooperation with other
software components in the system, potentially both those running at EL3
and at secure EL1. I should like to explore these options in future but,
For the KVM case, vFIQ is an obvious choice, but you're correct that
all other scenarios would require cooperation from a separate hypervisor/
firmware etc.
Ideally, we should avoid having multiple ways of implementing the same
thing.
Post by Daniel Thompson
as far as I know, this is the only sane way to provide NMI-like features
whilst being implementable entirely in non-secure EL1[1]
[1] Except for a single register write to ICC_SRE_EL3 by the EL3
firmware (and already implemented by ARM trusted firmware).
Even that would require more of the memory-mapped GIC CPU interface
to be NS-accessible than is likely to be the case on product
platforms. Note also that the memory-mapped interface is not
mandated for GICv3, so some platforms may simply not have it.
Perhaps I used clumsy phrasing here.
The main difference I care about is between runtime
cooperation and boot-time cooperation. The approach I have taken in
the patchset requires boot time cooperation (to configure GIC
appropriately) but no runtime cooperation.
I think that's reasonable. Any new boot requirements will need to be
documented (probably in booting.txt) as part of the final series
and alongside the relevant Kconfig option.
Post by Daniel Thompson
Post by Dave Martin
* How are NMIs prioritised with respect to other interrupts and
exceptions? This needs to be concretely specified. A sensible
answer would probably be that the effect is to split the
existing single-priority IRQ into two bands: ordinary IRQs
and NMIs. Prioritisation against FIQ and other exceptions
would be unaffected.
I think this is effectively what you've implemented so far.
Pretty much. Normal interrupts run at the default priority and NMIs
run at default priority but with bit 6 cleared.
In addition I would expect most kernel exception handlers to unmask
the I-bit as soon as the context has been saved. This allows them to
be pre-empted by an NMI.
Yep, that matches my expectation.
Post by Daniel Thompson
Post by Dave Martin
* Should it be possible to map SPIs as NMIs? How would they
be configured/registered? Should it be possible to register
multiple interrupts as NMIs?
Yes, although not quite yet.
The work on arm64 is following in the footsteps of similar work for arm.
My initial ideas are here (although as you can see from the review
http://thread.gmane.org/gmane.linux.kernel/1871163
1. Use existing interrupt API to register NMI handlers (with special
flag).
2. Flag makes callback to irqchip driver. In case of GICv3 this would
alter the priority of interrupt (for ARMv7+FIQ it would also change
interrupt group but this is not needed on ARMv8+GICv3).
3. "Simple" demux. We cannot traverse all the IRQ data structures from
NMI due to locks within the structures so we need some simplifying
a) NMI only for first level interrupt controller (i.e. peripherals
directly attached to this controller).
b) No shared interrupt lines.
Do other arches have ways of addressing the same problems?
Post by Daniel Thompson
Based on tglx's review I'm working on the basis that b) above is
one simplification too many but I've not yet had the chance to go back and
have another go.
I think that it's best to avoid adding arbitrary restrictions that
make this look excessively different from working with a regular
irqchip, unless there is really some fundamental constraint in play.
Post by Daniel Thompson
As you can see from the reviews I have a bit of work to do.
Post by Dave Martin
* What about interrupt affinity?
It should "just work" as normal if I can get the rest of the
interrupt system right. Do you foresee a specific problem?
So long as NMI-ness is just an extra flag when registering an
interrupt, things should probably work. I was wondering about
special cases like perf (PPI on sensible SoCs) versus, say,
debug UART (SPI).
Post by Daniel Thompson
Post by Dave Martin
* I feel uneasy about using reserved SPSR fields to store
information. This is probably OK for now, but it might
be cleaner simply to save/restore the PMR directly.
Providing that the affected bit is cleared before writing
to the SPSR (as you do already in kernel_exit) looks
workable, but I wonder whether the choice of bit should be
UAPI -- it may have to change in the future.
I agree we ought to keep this out of the uapi.
Regarding stealing a bit from the SPSR this was mostly an
implementation convenience. It might be interesting (eventually) to
benchmark it against a more obvious approach.
I think your current approach is OK for now, at least while the
series is under development.
Post by Daniel Thompson
Post by Dave Martin
* You can probably thin out the ISBs.
I believe that, via the system register interface,
the GICC PMR is intended to be self-synchronising.
That sounds great. I've just found the relevant line in the ARMv8
manual... I'd overlooked that before.
Post by Dave Martin
* The value BPR resets to is implementation-dependent.
It should be initialised on each CPU if we are going to rely
on its value, on all platforms. This isn't specific to FVP.
Really? As mentioned I only have a GICv2 spec but on that revision
the reset value is the minimum supported value (i.e. the same effect
as attempting to set it to zero). In other words it is technically
implementation-dependent but nevertheless defaults to a setting that
avoids any weird "globbing" effect on the interrupt priorities.
On FVP something has reached in and changed the BPR for CPU0 from
its proper reset value (all other CPUs have the correct reset value). Of
course that could be the firmware rather than the FVP itself that
has caused this.
Quite possibly. Of course, there is a strong possibility that some
real firmware will also do this (and never get fixed).

Forcing BPR to a sane state from Linux makes sense, since we can
do it.
Post by Daniel Thompson
I guess it is good practice for a driver to re-establish the reset
value for registers it owns and cares about but nevertheless I still
expect this register to be as-reset when we hand over to the kernel.
Post by Dave Martin
* Is ICC_CTLR_EL1.CBPR set correctly?
I've never checked. On GICv2 that would be secure state so to be
honest I didn't think its value was any of my business.
If we have a dependency on how this is set up, it needs to be
documented alongside the other booting requirements.

Cheers
---Dave
Marc Zyngier
2015-04-01 15:29:38 UTC
Permalink
Post by Dave Martin
Apologies for the slow reply... :/
Anyway,
Post by Daniel Thompson
Post by Dave Martin
Post by Daniel Thompson
This patchset provides a pseudo-NMI for arm64 kernels by reimplementing
the irqflags macros to modify the GIC PMR (the priority mask register is
accessible as a system register on GICv3 and later) rather than the
PSR. The pseudo-NMI changes are supported by a prototype implementation of
arch_trigger_all_cpu_backtrace that allows the new code to be exercised.
Minor nit: the "pseudo NMI" terminology could lead to confusion if
something more closely resembling a real NMI comes along.
I'll have to have a think, but nothing comes to mind right now...
[...]
Post by Daniel Thompson
Post by Dave Martin
Post by Daniel Thompson
3. Requires GICv3+ hardware together with firmware support to enable
GICv3 features at EL3. If CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS is
enabled the kernel will not boot on older hardware. It will be hard
to diagnose because we will crash very early in the boot (i.e.
before the call to start_kernel). Auto-detection might be possible
but the performance and code size cost of adding conditional code to
the irqflags macros probably makes it impractical. As such it may
never be possible to remove this limitation (although it might be
possible to find a way to survive long enough to panic and show the
results on the console).
This can (and should) be done via patching -- otherwise we risk breaking
single kernel image for GICv2+v3.
Do you mean real patching (hunting down all those inlines and
rewrite them) or simply implementing irqflags with an ops table? If
the former I didn't look at this because I didn't release we could
do that...
A generic patching framework was introduced by Andre Przywara in this
e039ee4 arm64: add alternative runtime patching
I believe you should be able to use this to patch between DAIF and
ICC_PMR accesses.
You should be able to find examples of this framework being used by
grepping. I've not played with it myself yet.
To follow up on this, I have a few patches queued that use the runtime
patching code to deal with GICv3 in KVM:

http://thread.gmane.org/gmane.comp.emulators.kvm.arm.devel/616

The first few patches are already queued for v4.1, and the rest should
follow shortly after.

Cheers,

M.
--
Jazz is not dead. It just smells funny...
Daniel Thompson
2015-04-08 12:27:39 UTC
Permalink
Post by Marc Zyngier
Post by Dave Martin
Apologies for the slow reply... :/
Anyway,
Post by Daniel Thompson
Post by Dave Martin
Post by Daniel Thompson
This patchset provides a pseudo-NMI for arm64 kernels by reimplementing
the irqflags macros to modify the GIC PMR (the priority mask register is
accessible as a system register on GICv3 and later) rather than the
PSR. The pseudo-NMI changes are supported by a prototype implementation of
arch_trigger_all_cpu_backtrace that allows the new code to be exercised.
Minor nit: the "pseudo NMI" terminology could lead to confusion if
something more closely resembling a real NMI comes along.
I'll have to have a think, but nothing comes to mind right now...
[...]
Post by Daniel Thompson
Post by Dave Martin
Post by Daniel Thompson
3. Requires GICv3+ hardware together with firmware support to enable
GICv3 features at EL3. If CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS is
enabled the kernel will not boot on older hardware. It will be hard
to diagnose because we will crash very early in the boot (i.e.
before the call to start_kernel). Auto-detection might be possible
but the performance and code size cost of adding conditional code to
the irqflags macros probably makes it impractical. As such it may
never be possible to remove this limitation (although it might be
possible to find a way to survive long enough to panic and show the
results on the console).
This can (and should) be done via patching -- otherwise we risk breaking
single kernel image for GICv2+v3.
Do you mean real patching (hunting down all those inlines and
rewriting them) or simply implementing irqflags with an ops table? If
the former I didn't look at this because I didn't realise we could
do that...
A generic patching framework was introduced by Andre Przywara in this
e039ee4 arm64: add alternative runtime patching
I believe you should be able to use this to patch between DAIF and
ICC_PMR accesses.
You should be able to find examples of this framework being used by
grepping. I've not played with it myself yet.
To follow up on this, I have a few patches queued that use the runtime
http://thread.gmane.org/gmane.comp.emulators.kvm.arm.devel/616
The first few patches are already queued for v4.1, and the rest should
follow shortly after.
Thanks (both).

That's really helpful: links for that sort of thing are not easily
googleable (things like kpatch flood the results).
