Discussion:
[PATCH 0/4] arm64: unwind: fix broken exception stack dump
Ard Biesheuvel
2017-07-24 11:26:19 UTC
This fixes an issue in the code that dumps the exception stack. As it
turns out, it doesn't display what we think it does. Please refer to
patch #3 for the details.

Patches #1 and #2 are preparatory patches. Patch #4 removes the sp
member from struct stackframe because it is broken and no longer
needed after patch #3.

Mark and I ran into this issue while working on the vmapped stacks code,
and so these changes are now prerequisites for it, especially since they
simplify dealing with multiple stacks.

Ard Biesheuvel (3):
arm64: unwind: disregard frame.sp when validating frame pointer
arm64: unwind: reference pt_regs via embedded stack frame
arm64: unwind: remove sp from struct stackframe

Mark Rutland (1):
arm64: avoid percpu indirection for irq stack ops

arch/arm64/include/asm/irq.h | 39 +++++----------
arch/arm64/include/asm/ptrace.h | 1 +
arch/arm64/include/asm/stacktrace.h | 1 -
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kernel/entry.S | 16 +++---
arch/arm64/kernel/perf_callchain.c | 1 -
arch/arm64/kernel/process.c | 5 +-
arch/arm64/kernel/ptrace.c | 2 +-
arch/arm64/kernel/return_address.c | 1 -
arch/arm64/kernel/stacktrace.c | 51 ++------------------
arch/arm64/kernel/time.c | 1 -
arch/arm64/kernel/traps.c | 29 ++---------
12 files changed, 33 insertions(+), 115 deletions(-)
--
2.9.3
Ard Biesheuvel
2017-07-24 11:26:20 UTC
From: Mark Rutland <***@arm.com>

Our IRQ_STACK_PTR() and on_irq_stack() helpers both take a cpu argument,
used to generate a percpu address. In all cases, they are passed
{raw_,}smp_processor_id(), so this parameter is redundant.

Since {raw_,}smp_processor_id() use a percpu variable internally, this
approach means we generate a percpu offset to find the current cpu, then
use this to index an array of percpu offsets, which we then use to find
the current CPU's IRQ stack pointer. Thus, most of the work is
redundant.

Instead, we can consistently use raw_cpu_ptr() to generate the CPU's
irq_stack pointer by simply adding the percpu offset to the irq_stack
address, which is simpler in both respects.
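
To illustrate (a sketch only, not part of the patch; the helper names are
made up), the old and new lookups amount to:

/*
 * Old: one percpu access inside raw_smp_processor_id() to find the CPU
 * number, followed by an indexed percpu lookup for that CPU's irq_stack.
 */
static inline unsigned long irq_stack_base_old(void)
{
	return (unsigned long)per_cpu(irq_stack, raw_smp_processor_id());
}

/*
 * New: add this CPU's percpu offset to the irq_stack address directly.
 */
static inline unsigned long irq_stack_base_new(void)
{
	return (unsigned long)raw_cpu_ptr(irq_stack);
}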

Signed-off-by: Mark Rutland <***@arm.com>
Cc: Catalin Marinas <***@arm.com>
Cc: James Morse <***@arm.com>
Cc: Will Deacon <***@arm.com>
Signed-off-by: Ard Biesheuvel <***@linaro.org>
---
arch/arm64/include/asm/irq.h | 4 ++--
arch/arm64/kernel/ptrace.c | 2 +-
arch/arm64/kernel/stacktrace.c | 4 ++--
arch/arm64/kernel/traps.c | 2 +-
4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
index b77197d941fc..c01a912fbb1e 100644
--- a/arch/arm64/include/asm/irq.h
+++ b/arch/arm64/include/asm/irq.h
@@ -47,10 +47,10 @@ static inline int nr_legacy_irqs(void)
return 0;
}

-static inline bool on_irq_stack(unsigned long sp, int cpu)
+static inline bool on_irq_stack(unsigned long sp)
{
/* variable names the same as kernel/stacktrace.c */
- unsigned long low = (unsigned long)per_cpu(irq_stack, cpu);
+ unsigned long low = (unsigned long)raw_cpu_ptr(irq_stack);
unsigned long high = low + IRQ_STACK_START_SP;

return (low <= sp && sp <= high);
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 1b38c0150aec..baf0838205c7 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -127,7 +127,7 @@ static bool regs_within_kernel_stack(struct pt_regs *regs, unsigned long addr)
{
return ((addr & ~(THREAD_SIZE - 1)) ==
(kernel_stack_pointer(regs) & ~(THREAD_SIZE - 1))) ||
- on_irq_stack(addr, raw_smp_processor_id());
+ on_irq_stack(addr);
}

/**
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 09d37d66b630..6ffb965be641 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -54,13 +54,13 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
* non-preemptible context.
*/
if (tsk == current && !preemptible())
- irq_stack_ptr = IRQ_STACK_PTR(smp_processor_id());
+ irq_stack_ptr = IRQ_STACK_PTR();
else
irq_stack_ptr = 0;

low = frame->sp;
/* irq stacks are not THREAD_SIZE aligned */
- if (on_irq_stack(frame->sp, raw_smp_processor_id()))
+ if (on_irq_stack(frame->sp))
high = irq_stack_ptr;
else
high = ALIGN(low, THREAD_SIZE) - 0x20;
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index c7c7088097be..7ba80f2a6177 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -159,7 +159,7 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
* non-preemptible context.
*/
if (tsk == current && !preemptible())
- irq_stack_ptr = IRQ_STACK_PTR(smp_processor_id());
+ irq_stack_ptr = IRQ_STACK_PTR();
else
irq_stack_ptr = 0;
--
2.9.3
Ard Biesheuvel
2017-07-24 11:26:21 UTC
Currently, when unwinding the call stack, we validate the frame pointer
of each frame against frame.sp, whose value is not clearly defined, and
which makes it more difficult to link stack frames together across
different stacks. It is far better to simply check whether the frame
pointer itself points into a valid stack.
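
Schematically (a sketch, in terms of the on_irq_stack()/on_task_stack()
helpers this patch uses; the wrapper name is made up), the new check
amounts to:

static bool fp_is_valid(struct task_struct *tsk, unsigned long fp)
{
	if (fp & 0xf)		/* AAPCS64 frame records are 16-byte aligned */
		return false;

	/* the IRQ stack only counts when unwinding current, non-preemptibly */
	if (tsk == current && !preemptible() && on_irq_stack(fp))
		return true;

	return on_task_stack(tsk, fp);
}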

Cc: Mark Rutland <***@arm.com>
Cc: Catalin Marinas <***@arm.com>
Cc: James Morse <***@arm.com>
Cc: Will Deacon <***@arm.com>
Signed-off-by: Ard Biesheuvel <***@linaro.org>
---
arch/arm64/include/asm/irq.h | 10 +++++++-
arch/arm64/kernel/stacktrace.c | 24 ++++++--------------
2 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
index c01a912fbb1e..57deb5abfc24 100644
--- a/arch/arm64/include/asm/irq.h
+++ b/arch/arm64/include/asm/irq.h
@@ -7,6 +7,7 @@
#ifndef __ASSEMBLER__

#include <linux/percpu.h>
+#include <linux/sched/task_stack.h>

#include <asm-generic/irq.h>
#include <asm/thread_info.h>
@@ -49,12 +50,19 @@ static inline int nr_legacy_irqs(void)

static inline bool on_irq_stack(unsigned long sp)
{
- /* variable names the same as kernel/stacktrace.c */
unsigned long low = (unsigned long)raw_cpu_ptr(irq_stack);
unsigned long high = low + IRQ_STACK_START_SP;

return (low <= sp && sp <= high);
}

+static inline bool on_task_stack(struct task_struct *tsk, unsigned long sp)
+{
+ unsigned long low = (unsigned long)task_stack_page(tsk);
+ unsigned long high = low + THREAD_SIZE;
+
+ return (low <= sp && sp < high);
+}
+
#endif /* !__ASSEMBLER__ */
#endif
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 6ffb965be641..beaf51fb3088 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -42,9 +42,10 @@
*/
int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
{
- unsigned long high, low;
unsigned long fp = frame->fp;
- unsigned long irq_stack_ptr;
+
+ if (fp & 0xf)
+ return -EINVAL;

if (!tsk)
tsk = current;
@@ -53,19 +54,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
* Switching between stacks is valid when tracing current and in
* non-preemptible context.
*/
- if (tsk == current && !preemptible())
- irq_stack_ptr = IRQ_STACK_PTR();
- else
- irq_stack_ptr = 0;
-
- low = frame->sp;
- /* irq stacks are not THREAD_SIZE aligned */
- if (on_irq_stack(frame->sp))
- high = irq_stack_ptr;
- else
- high = ALIGN(low, THREAD_SIZE) - 0x20;
-
- if (fp < low || fp > high || fp & 0xf)
+ if (!(tsk == current && !preemptible() && on_irq_stack(fp)) &&
+ !on_task_stack(tsk, fp))
return -EINVAL;

frame->sp = fp + 0x10;
@@ -94,9 +84,9 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
* Check the frame->fp we read from the bottom of the irq_stack,
* and the original task stack pointer are both in current->stack.
*/
- if (frame->sp == irq_stack_ptr) {
+ if (frame->sp == IRQ_STACK_PTR()) {
struct pt_regs *irq_args;
- unsigned long orig_sp = IRQ_STACK_TO_TASK_STACK(irq_stack_ptr);
+ unsigned long orig_sp = IRQ_STACK_TO_TASK_STACK(frame->sp);

if (object_is_on_stack((void *)orig_sp) &&
object_is_on_stack((void *)frame->fp)) {
--
2.9.3
Ard Biesheuvel
2017-07-24 11:26:23 UTC
The unwind code sets the sp member of struct stackframe to
'frame pointer + 0x10' unconditionally, without regard for whether
doing so produces a legal value. So let's simply remove it now that
we have stopped using it anyway.
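
For reference, all the unwinder actually needs is the frame record that
the frame pointer points at (sketch; the struct is for illustration only):

/*
 * An AArch64 frame record: x29 points at a {previous fp, return address}
 * pair. The removed 'sp' member was simply this address plus 0x10, which
 * is not guaranteed to mean anything.
 */
struct frame_record {
	unsigned long fp;	/* previous x29 */
	unsigned long lr;	/* x30, the return address */
};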

Cc: Mark Rutland <***@arm.com>
Cc: Catalin Marinas <***@arm.com>
Cc: James Morse <***@arm.com>
Cc: Will Deacon <***@arm.com>
Signed-off-by: Ard Biesheuvel <***@linaro.org>
---
arch/arm64/include/asm/stacktrace.h | 1 -
arch/arm64/kernel/perf_callchain.c | 1 -
arch/arm64/kernel/process.c | 5 +----
arch/arm64/kernel/return_address.c | 1 -
arch/arm64/kernel/stacktrace.c | 1 -
arch/arm64/kernel/time.c | 1 -
arch/arm64/kernel/traps.c | 2 --
7 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 5b6eafccc5d8..3bebab378c72 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -20,7 +20,6 @@ struct task_struct;

struct stackframe {
unsigned long fp;
- unsigned long sp;
unsigned long pc;
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
unsigned int graph;
diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c
index 713ca824f266..bcafd7dcfe8b 100644
--- a/arch/arm64/kernel/perf_callchain.c
+++ b/arch/arm64/kernel/perf_callchain.c
@@ -162,7 +162,6 @@ void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
}

frame.fp = regs->regs[29];
- frame.sp = regs->sp;
frame.pc = regs->pc;
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
frame.graph = current->curr_ret_stack;
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 659ae8094ed5..85b953dd023a 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -382,15 +382,12 @@ unsigned long get_wchan(struct task_struct *p)
return 0;

frame.fp = thread_saved_fp(p);
- frame.sp = thread_saved_sp(p);
frame.pc = thread_saved_pc(p);
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
frame.graph = p->curr_ret_stack;
#endif
do {
- if (frame.sp < stack_page ||
- frame.sp >= stack_page + THREAD_SIZE ||
- unwind_frame(p, &frame))
+ if (unwind_frame(p, &frame))
goto out;
if (!in_sched_functions(frame.pc)) {
ret = frame.pc;
diff --git a/arch/arm64/kernel/return_address.c b/arch/arm64/kernel/return_address.c
index 12a87f2600f2..933adbc0f654 100644
--- a/arch/arm64/kernel/return_address.c
+++ b/arch/arm64/kernel/return_address.c
@@ -42,7 +42,6 @@ void *return_address(unsigned int level)
data.addr = NULL;

frame.fp = (unsigned long)__builtin_frame_address(0);
- frame.sp = current_stack_pointer;
frame.pc = (unsigned long)return_address; /* dummy */
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
frame.graph = current->curr_ret_stack;
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index b079af1ad145..cd5771d4e2d2 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -58,7 +58,6 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
!on_task_stack(tsk, fp))
return -EINVAL;

- frame->sp = fp + 0x10;
frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));

diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
index da33c90248e9..a4391280fba9 100644
--- a/arch/arm64/kernel/time.c
+++ b/arch/arm64/kernel/time.c
@@ -50,7 +50,6 @@ unsigned long profile_pc(struct pt_regs *regs)
return regs->pc;

frame.fp = regs->regs[29];
- frame.sp = regs->sp;
frame.pc = regs->pc;
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
frame.graph = -1; /* no task info */
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index f2b284669587..9ba060b62837 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -155,14 +155,12 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)

if (tsk == current) {
frame.fp = (unsigned long)__builtin_frame_address(0);
- frame.sp = current_stack_pointer;
frame.pc = (unsigned long)dump_backtrace;
} else {
/*
* task blocked in __switch_to
*/
frame.fp = thread_saved_fp(tsk);
- frame.sp = thread_saved_sp(tsk);
frame.pc = thread_saved_pc(tsk);
}
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
--
2.9.3
Ard Biesheuvel
2017-07-24 11:26:22 UTC
As it turns out, the unwind code is slightly broken, and probably has
been for a while. The problem is in the dumping of the exception stack,
which is intended to dump the contents of the pt_regs struct at each
level in the call stack where an exception was taken and routed to a
routine marked as __exception (which means its stack frame is right
below the pt_regs struct on the stack).

'Right below the pt_regs struct' is ill-defined, though: the unwind
code assigns 'frame pointer + 0x10' to the .sp member of the stackframe
struct at each level, and dump_backtrace() happily dereferences that as
the pt_regs pointer when encountering an __exception routine. However,
the actual size of the stack frame created by this routine (which could
be one of many __exception routines we have in the kernel) is not known,
and so frame.sp is pretty useless to figure out where struct pt_regs
really is.

So it seems the only way to ensure that we can find our struct pt_regs
when walking the stack frames is to put it at a known fixed offset of
the stack frame pointer that is passed to such __exception routines.
The simplest way to do that is to put it inside pt_regs itself, which is
the main change implemented by this patch. As a bonus, doing this allows
us to get rid of a fair amount of cruft related to walking from one stack
to the other, which is especially nice since we intend to introduce yet
another stack for overflow handling once we add support for vmapped
stacks. It also fixes an inconsistency where we only add a stack frame
pointing to ELR_EL1 when executing from the IRQ stack, but not from the
task stack.
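
With the frame record at a fixed offset inside pt_regs, recovering the
regs from a frame pointer becomes a trivial computation (sketch; the
helper name is made up, and the dump_backtrace() hunk below does this
inline):

static struct pt_regs *regs_from_frame(unsigned long fp)
{
	/* fp points at the stackframe member embedded in struct pt_regs */
	return (struct pt_regs *)(fp - offsetof(struct pt_regs, stackframe));
}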

Cc: Mark Rutland <***@arm.com>
Cc: Catalin Marinas <***@arm.com>
Cc: James Morse <***@arm.com>
Cc: Will Deacon <***@arm.com>
Signed-off-by: Ard Biesheuvel <***@linaro.org>
---
arch/arm64/include/asm/irq.h | 25 ----------------
arch/arm64/include/asm/ptrace.h | 1 +
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kernel/entry.S | 16 +++++------
arch/arm64/kernel/stacktrace.c | 30 --------------------
arch/arm64/kernel/traps.c | 27 ++++--------------
6 files changed, 15 insertions(+), 85 deletions(-)

diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
index 57deb5abfc24..8ba89c4ca183 100644
--- a/arch/arm64/include/asm/irq.h
+++ b/arch/arm64/include/asm/irq.h
@@ -16,31 +16,6 @@ struct pt_regs;

DECLARE_PER_CPU(unsigned long [IRQ_STACK_SIZE/sizeof(long)], irq_stack);

-/*
- * The highest address on the stack, and the first to be used. Used to
- * find the dummy-stack frame put down by el?_irq() in entry.S, which
- * is structured as follows:
- *
- * ------------
- * | | <- irq_stack_ptr
- * top ------------
- * | x19 | <- irq_stack_ptr - 0x08
- * ------------
- * | x29 | <- irq_stack_ptr - 0x10
- * ------------
- *
- * where x19 holds a copy of the task stack pointer where the struct pt_regs
- * from kernel_entry can be found.
- *
- */
-#define IRQ_STACK_PTR(cpu) ((unsigned long)per_cpu(irq_stack, cpu) + IRQ_STACK_START_SP)
-
-/*
- * The offset from irq_stack_ptr where entry.S will store the original
- * stack pointer. Used by unwind_frame() and dump_backtrace().
- */
-#define IRQ_STACK_TO_TASK_STACK(ptr) (*((unsigned long *)((ptr) - 0x08)))
-
extern void set_handle_irq(void (*handle_irq)(struct pt_regs *));

static inline int nr_legacy_irqs(void)
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index 11403fdd0a50..ee72aa979078 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -119,6 +119,7 @@ struct pt_regs {
u64 syscallno;
u64 orig_addr_limit;
u64 unused; // maintain 16 byte alignment
+ u64 stackframe[2];
};

#define MAX_REG_OFFSET offsetof(struct pt_regs, pstate)
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index b3bb7ef97bc8..71bf088f1e4b 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -75,6 +75,7 @@ int main(void)
DEFINE(S_ORIG_X0, offsetof(struct pt_regs, orig_x0));
DEFINE(S_SYSCALLNO, offsetof(struct pt_regs, syscallno));
DEFINE(S_ORIG_ADDR_LIMIT, offsetof(struct pt_regs, orig_addr_limit));
+ DEFINE(S_STACKFRAME, offsetof(struct pt_regs, stackframe));
DEFINE(S_FRAME_SIZE, sizeof(struct pt_regs));
BLANK();
DEFINE(MM_CONTEXT_ID, offsetof(struct mm_struct, context.id.counter));
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index b738880350f9..32f3b2413e5f 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -111,6 +111,14 @@
mrs x23, spsr_el1
stp lr, x21, [sp, #S_LR]

+ /*
+ * In order to be able to dump the contents of struct pt_regs at the
+ * time the exception was taken (in case we attempt to walk the call
+ * stack later), chain it together with the stack frames.
+ */
+ stp x29, x22, [sp, #S_STACKFRAME]
+ add x29, sp, #S_STACKFRAME
+
#ifdef CONFIG_ARM64_SW_TTBR0_PAN
/*
* Set the TTBR0 PAN bit in SPSR. When the exception is taken from
@@ -265,14 +273,6 @@ alternative_else_nop_endif

/* switch to the irq stack */
mov sp, x26
-
- /*
- * Add a dummy stack frame, this non-standard format is fixed up
- * by unwind_frame()
- */
- stp x29, x19, [sp, #-16]!
- mov x29, sp
-
9998:
.endm

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index beaf51fb3088..b079af1ad145 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -75,36 +75,6 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
}
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */

- /*
- * Check whether we are going to walk through from interrupt stack
- * to task stack.
- * If we reach the end of the stack - and its an interrupt stack,
- * unpack the dummy frame to find the original elr.
- *
- * Check the frame->fp we read from the bottom of the irq_stack,
- * and the original task stack pointer are both in current->stack.
- */
- if (frame->sp == IRQ_STACK_PTR()) {
- struct pt_regs *irq_args;
- unsigned long orig_sp = IRQ_STACK_TO_TASK_STACK(frame->sp);
-
- if (object_is_on_stack((void *)orig_sp) &&
- object_is_on_stack((void *)frame->fp)) {
- frame->sp = orig_sp;
-
- /* orig_sp is the saved pt_regs, find the elr */
- irq_args = (struct pt_regs *)orig_sp;
- frame->pc = irq_args->pc;
- } else {
- /*
- * This frame has a non-standard format, and we
- * didn't fix it, because the data looked wrong.
- * Refuse to output this frame.
- */
- return -EINVAL;
- }
- }
-
return 0;
}

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 7ba80f2a6177..f2b284669587 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -143,7 +143,6 @@ static void dump_instr(const char *lvl, struct pt_regs *regs)
void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
{
struct stackframe frame;
- unsigned long irq_stack_ptr;
int skip;

pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
@@ -154,15 +153,6 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
if (!try_get_task_stack(tsk))
return;

- /*
- * Switching between stacks is valid when tracing current and in
- * non-preemptible context.
- */
- if (tsk == current && !preemptible())
- irq_stack_ptr = IRQ_STACK_PTR();
- else
- irq_stack_ptr = 0;
-
if (tsk == current) {
frame.fp = (unsigned long)__builtin_frame_address(0);
frame.sp = current_stack_pointer;
@@ -203,20 +193,13 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
ret = unwind_frame(tsk, &frame);
if (ret < 0)
break;
- stack = frame.sp;
if (in_exception_text(where)) {
- /*
- * If we switched to the irq_stack before calling this
- * exception handler, then the pt_regs will be on the
- * task stack. The easiest way to tell is if the large
- * pt_regs would overlap with the end of the irq_stack.
- */
- if (stack < irq_stack_ptr &&
- (stack + sizeof(struct pt_regs)) > irq_stack_ptr)
- stack = IRQ_STACK_TO_TASK_STACK(irq_stack_ptr);
+ stack = frame.fp - offsetof(struct pt_regs, stackframe);

- dump_mem("", "Exception stack", stack,
- stack + sizeof(struct pt_regs));
+ if (on_task_stack(tsk, stack) ||
+ (tsk == current && !preemptible() && on_irq_stack(stack)))
+ dump_mem("", "Exception stack", stack,
+ stack + sizeof(struct pt_regs));
}
}
--
2.9.3
Mark Rutland
2017-07-24 17:54:46 UTC
Hi Ard,

This series looks good. I've tested the whole thing with perf top -g and
sysrq-L.

In the process of reviewing this, I spotted some other (existing) bugs
in this area.
Post by Ard Biesheuvel
@@ -203,20 +193,13 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
ret = unwind_frame(tsk, &frame);
if (ret < 0)
break;
- stack = frame.sp;
if (in_exception_text(where)) {
- /*
- * If we switched to the irq_stack before calling this
- * exception handler, then the pt_regs will be on the
- * task stack. The easiest way to tell is if the large
- * pt_regs would overlap with the end of the irq_stack.
- */
- if (stack < irq_stack_ptr &&
- (stack + sizeof(struct pt_regs)) > irq_stack_ptr)
- stack = IRQ_STACK_TO_TASK_STACK(irq_stack_ptr);
+ stack = frame.fp - offsetof(struct pt_regs, stackframe);
- dump_mem("", "Exception stack", stack,
- stack + sizeof(struct pt_regs));
+ if (on_task_stack(tsk, stack) ||
+ (tsk == current && !preemptible() && on_irq_stack(stack)))
+ dump_mem("", "Exception stack", stack,
+ stack + sizeof(struct pt_regs));
So here, we're using information from three records (where the last
might have been faked from the current FP + LR). We look at the previous
frame's LR value to determine whether the current frame's fp points to a
next frame that's associated with some regs:

+----+----------------+
| fp | interrupted PC | /* embedded in pt_regs */
+----+----------------+
/\
||
+----+----------------+
where => | fp | < entry PC > |
+----+----------------+
/\
||
+----+----------------+
| fp | where | /* "where" is in exception text */
+----+----------------+

Which (IIUC) has three problems, inherited from the existing approach.

1) Several C functions called from exception entry assembly are missing
__exception annotations. e.g. bad_mode.

So we can miss exception regs in some cases.

2) Several functions may be called either from the exception entry
assembly, or from other C code. e.g. do_undefinstr, and any cascaded
irqchip handlers.

So we can spuriously decide part of the stack is a pt_regs.

3) The entry assembly can enable exceptions before calling C code, so if
an exception is taken at the right time, its regs will be in the
backtrace without a surrounding frame in the exception text.

So we can miss exception regs in some cases (distinct from (1)).

... given that, I don't think we can rely on in_exception_text(). I
think we have to look at whether the current frame's PC/LR is in
.entry.text.

Fixup to that effect below, tested with sysrq-L and perf top -g. Note
that cpu_switch_to and ret_from_fork have to be moved into .text to
avoid false positives. I also killed 'where' since it's no longer
necessary to remember the previous frame.

There's another bug -- we always dump the exception regs, even when
skipping frames. The sysrq-L handler tries to skip frames, so you get
the regs for irq_entry, but without an irq_entry backtrace, which is
confusing. That's simple enough to fix up separately, I guess.

Thanks,
Mark.

---->8----
diff --git a/arch/arm64/include/asm/traps.h b/arch/arm64/include/asm/traps.h
index 02e9035..4136168 100644
--- a/arch/arm64/include/asm/traps.h
+++ b/arch/arm64/include/asm/traps.h
@@ -60,4 +60,9 @@ static inline int in_exception_text(unsigned long ptr)
return in ? : __in_irqentry_text(ptr);
}

+static inline int in_entry_text(unsigned long ptr)
+{
+ return ptr >= (unsigned long)&__entry_text_start &&
+ ptr < (unsigned long)&__entry_text_end;
+}
#endif
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 32f3b24..3d25ec5 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -707,38 +707,6 @@ el0_irq_naked:
ENDPROC(el0_irq)

/*
- * Register switch for AArch64. The callee-saved registers need to be saved
- * and restored. On entry:
- * x0 = previous task_struct (must be preserved across the switch)
- * x1 = next task_struct
- * Previous and next are guaranteed not to be the same.
- *
- */
-ENTRY(cpu_switch_to)
- mov x10, #THREAD_CPU_CONTEXT
- add x8, x0, x10
- mov x9, sp
- stp x19, x20, [x8], #16 // store callee-saved registers
- stp x21, x22, [x8], #16
- stp x23, x24, [x8], #16
- stp x25, x26, [x8], #16
- stp x27, x28, [x8], #16
- stp x29, x9, [x8], #16
- str lr, [x8]
- add x8, x1, x10
- ldp x19, x20, [x8], #16 // restore callee-saved registers
- ldp x21, x22, [x8], #16
- ldp x23, x24, [x8], #16
- ldp x25, x26, [x8], #16
- ldp x27, x28, [x8], #16
- ldp x29, x9, [x8], #16
- ldr lr, [x8]
- mov sp, x9
- msr sp_el0, x1
- ret
-ENDPROC(cpu_switch_to)
-
-/*
* This is the fast syscall return path. We do as little as possible here,
* and this includes saving x0 back into the kernel stack.
*/
@@ -781,18 +749,6 @@ finish_ret_to_user:
ENDPROC(ret_to_user)

/*
- * This is how we return from a fork.
- */
-ENTRY(ret_from_fork)
- bl schedule_tail
- cbz x19, 1f // not a kernel thread
- mov x0, x20
- blr x19
-1: get_thread_info tsk
- b ret_to_user
-ENDPROC(ret_from_fork)
-
-/*
* SVC handler.
*/
.align 6
@@ -865,3 +821,47 @@ ENTRY(sys_rt_sigreturn_wrapper)
mov x0, sp
b sys_rt_sigreturn
ENDPROC(sys_rt_sigreturn_wrapper)
+
+/*
+ * Register switch for AArch64. The callee-saved registers need to be saved
+ * and restored. On entry:
+ * x0 = previous task_struct (must be preserved across the switch)
+ * x1 = next task_struct
+ * Previous and next are guaranteed not to be the same.
+ *
+ */
+ENTRY(cpu_switch_to)
+ mov x10, #THREAD_CPU_CONTEXT
+ add x8, x0, x10
+ mov x9, sp
+ stp x19, x20, [x8], #16 // store callee-saved registers
+ stp x21, x22, [x8], #16
+ stp x23, x24, [x8], #16
+ stp x25, x26, [x8], #16
+ stp x27, x28, [x8], #16
+ stp x29, x9, [x8], #16
+ str lr, [x8]
+ add x8, x1, x10
+ ldp x19, x20, [x8], #16 // restore callee-saved registers
+ ldp x21, x22, [x8], #16
+ ldp x23, x24, [x8], #16
+ ldp x25, x26, [x8], #16
+ ldp x27, x28, [x8], #16
+ ldp x29, x9, [x8], #16
+ ldr lr, [x8]
+ mov sp, x9
+ msr sp_el0, x1
+ ret
+ENDPROC(cpu_switch_to)
+
+/*
+ * This is how we return from a fork.
+ */
+ENTRY(ret_from_fork)
+ bl schedule_tail
+ cbz x19, 1f // not a kernel thread
+ mov x0, x20
+ blr x19
+1: get_thread_info tsk
+ b ret_to_user
+ENDPROC(ret_from_fork)
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 9ba060b..63cbfb0 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -170,13 +170,12 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
skip = !!regs;
printk("Call trace:\n");
while (1) {
- unsigned long where = frame.pc;
unsigned long stack;
int ret;

/* skip until specified stack frame */
if (!skip) {
- dump_backtrace_entry(where);
+ dump_backtrace_entry(frame.pc);
} else if (frame.fp == regs->regs[29]) {
skip = 0;
/*
@@ -191,7 +190,7 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
ret = unwind_frame(tsk, &frame);
if (ret < 0)
break;
- if (in_exception_text(where)) {
+ if (in_entry_text(frame.pc)) {
stack = frame.fp - offsetof(struct pt_regs, stackframe);

if (on_task_stack(tsk, stack) ||
Ard Biesheuvel
2017-07-24 18:34:19 UTC
Post by Mark Rutland
Hi Ard,
This series looks good. I've tested the whole thing with perf top -g and
sysrq-L.
In the process of reviewing this, I spotted some other (existing) bugs
in this area.
Post by Ard Biesheuvel
@@ -203,20 +193,13 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
ret = unwind_frame(tsk, &frame);
if (ret < 0)
break;
- stack = frame.sp;
if (in_exception_text(where)) {
- /*
- * If we switched to the irq_stack before calling this
- * exception handler, then the pt_regs will be on the
- * task stack. The easiest way to tell is if the large
- * pt_regs would overlap with the end of the irq_stack.
- */
- if (stack < irq_stack_ptr &&
- (stack + sizeof(struct pt_regs)) > irq_stack_ptr)
- stack = IRQ_STACK_TO_TASK_STACK(irq_stack_ptr);
+ stack = frame.fp - offsetof(struct pt_regs, stackframe);
- dump_mem("", "Exception stack", stack,
- stack + sizeof(struct pt_regs));
+ if (on_task_stack(tsk, stack) ||
+ (tsk == current && !preemptible() && on_irq_stack(stack)))
+ dump_mem("", "Exception stack", stack,
+ stack + sizeof(struct pt_regs));
So here, we're using information from three records (where the last
might have been faked from the current FP + LR). We look at the previous
frame's LR value to determine whether the current frame's fp points to a
+----+----------------+
| fp | interrupted PC | /* embedded in pt_regs */
+----+----------------+
/\
||
+----+----------------+
where => | fp | < entry PC > |
+----+----------------+
/\
||
+----+----------------+
| fp | where | /* "where" is in exception text */
+----+----------------+
Which (IIUC) has three problems, inherited from the existing approach.
1) Several C functions called from exception entry assembly are missing
__exception annotations. e.g. bad_mode.
So we can miss exception regs in some cases.
2) Several functions may be called either from the exception entry
assembly, or from other C code. e.g. do_undefinstr, and any cascaded
irqchip handlers.
So we can spuriously decide part of the stack is a pt_regs.
3) The entry assembly can enable exceptions before calling C code, so if
an exception is taken at the right time, its regs will be in the
backtrace without a surrounding frame in the exception text.
So we can miss exception regs in some cases (distinct from (1)).
... given that, I don't think we can rely on in_exception_text(). I
think we have to look at whether the current frame's PC/LR is in
.entry.text.
I guess that's an improvement, yes. At least we won't have false
positives where a stack frame is misidentified as being embedded in
pt_regs. But it does rely on the bl instruction being used even in
cases where we know we won't be returning, i.e., jumps to bad_mode,
do_sp_pc_abort and do_undefinstr from .entry.text
Post by Mark Rutland
Fixup to that effect below, tested with sysrq-L and perf top -g. Note
that cpu_switch_to and ret_from_fork have to be moved into .text to
avoid false positives. I also killed 'where' since it's no longer
necessary to remember the previous frame.
There's another bug -- we always dump the exception regs, even when
skipping frames. The sysrq-L handler tries to skip frames, so you get
the regs for irq_entry, but without an irq_entry backtrace, which is
confusing. That's simple enough to fix up separately, I guess.
Patch looks good to me (modulo the comment above). Re skipping frames,
that's a trivial one-liner fix, but needed regardless.
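
Presumably something like the following, gating the dump on the skip
flag (just a sketch):

if (!skip && in_entry_text(frame.pc)) {
	stack = frame.fp - offsetof(struct pt_regs, stackframe);

	if (on_task_stack(tsk, stack) ||
	    (tsk == current && !preemptible() && on_irq_stack(stack)))
		dump_mem("", "Exception stack", stack,
			 stack + sizeof(struct pt_regs));
}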
Post by Mark Rutland
---->8----
diff --git a/arch/arm64/include/asm/traps.h b/arch/arm64/include/asm/traps.h
index 02e9035..4136168 100644
--- a/arch/arm64/include/asm/traps.h
+++ b/arch/arm64/include/asm/traps.h
@@ -60,4 +60,9 @@ static inline int in_exception_text(unsigned long ptr)
return in ? : __in_irqentry_text(ptr);
}
+static inline int in_entry_text(unsigned long ptr)
+{
+ return ptr >= (unsigned long)&__entry_text_start &&
+ ptr < (unsigned long)&__entry_text_end;
+}
#endif
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 32f3b24..3d25ec5 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
ENDPROC(el0_irq)
/*
- * Register switch for AArch64. The callee-saved registers need to be saved
- * and restored. On entry:
- * x0 = previous task_struct (must be preserved across the switch)
- * x1 = next task_struct
- * Previous and next are guaranteed not to be the same.
- *
- */
-ENTRY(cpu_switch_to)
- mov x10, #THREAD_CPU_CONTEXT
- add x8, x0, x10
- mov x9, sp
- stp x19, x20, [x8], #16 // store callee-saved registers
- stp x21, x22, [x8], #16
- stp x23, x24, [x8], #16
- stp x25, x26, [x8], #16
- stp x27, x28, [x8], #16
- stp x29, x9, [x8], #16
- str lr, [x8]
- add x8, x1, x10
- ldp x19, x20, [x8], #16 // restore callee-saved registers
- ldp x21, x22, [x8], #16
- ldp x23, x24, [x8], #16
- ldp x25, x26, [x8], #16
- ldp x27, x28, [x8], #16
- ldp x29, x9, [x8], #16
- ldr lr, [x8]
- mov sp, x9
- msr sp_el0, x1
- ret
-ENDPROC(cpu_switch_to)
-
-/*
* This is the fast syscall return path. We do as little as possible here,
* and this includes saving x0 back into the kernel stack.
*/
ENDPROC(ret_to_user)
/*
- * This is how we return from a fork.
- */
-ENTRY(ret_from_fork)
- bl schedule_tail
- cbz x19, 1f // not a kernel thread
- mov x0, x20
- blr x19
-1: get_thread_info tsk
- b ret_to_user
-ENDPROC(ret_from_fork)
-
-/*
* SVC handler.
*/
.align 6
@@ -865,3 +821,47 @@ ENTRY(sys_rt_sigreturn_wrapper)
mov x0, sp
b sys_rt_sigreturn
ENDPROC(sys_rt_sigreturn_wrapper)
+
+/*
+ * Register switch for AArch64. The callee-saved registers need to be saved
+ * and restored. On entry:
+ * x0 = previous task_struct (must be preserved across the switch)
+ * x1 = next task_struct
+ * Previous and next are guaranteed not to be the same.
+ *
+ */
+ENTRY(cpu_switch_to)
+ mov x10, #THREAD_CPU_CONTEXT
+ add x8, x0, x10
+ mov x9, sp
+ stp x19, x20, [x8], #16 // store callee-saved registers
+ stp x21, x22, [x8], #16
+ stp x23, x24, [x8], #16
+ stp x25, x26, [x8], #16
+ stp x27, x28, [x8], #16
+ stp x29, x9, [x8], #16
+ str lr, [x8]
+ add x8, x1, x10
+ ldp x19, x20, [x8], #16 // restore callee-saved registers
+ ldp x21, x22, [x8], #16
+ ldp x23, x24, [x8], #16
+ ldp x25, x26, [x8], #16
+ ldp x27, x28, [x8], #16
+ ldp x29, x9, [x8], #16
+ ldr lr, [x8]
+ mov sp, x9
+ msr sp_el0, x1
+ ret
+ENDPROC(cpu_switch_to)
+
+/*
+ * This is how we return from a fork.
+ */
+ENTRY(ret_from_fork)
+ bl schedule_tail
+ cbz x19, 1f // not a kernel thread
+ mov x0, x20
+ blr x19
+1: get_thread_info tsk
+ b ret_to_user
+ENDPROC(ret_from_fork)
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 9ba060b..63cbfb0 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -170,13 +170,12 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
skip = !!regs;
printk("Call trace:\n");
while (1) {
- unsigned long where = frame.pc;
unsigned long stack;
int ret;
/* skip until specified stack frame */
if (!skip) {
- dump_backtrace_entry(where);
+ dump_backtrace_entry(frame.pc);
} else if (frame.fp == regs->regs[29]) {
skip = 0;
/*
@@ -191,7 +190,7 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
ret = unwind_frame(tsk, &frame);
if (ret < 0)
break;
- if (in_exception_text(where)) {
+ if (in_entry_text(frame.pc)) {
stack = frame.fp - offsetof(struct pt_regs, stackframe);
if (on_task_stack(tsk, stack) ||
Ard Biesheuvel
2017-07-24 19:09:08 UTC
Post by Ard Biesheuvel
Post by Mark Rutland
Hi Ard,
This series looks good. I've tested the whole thing with perf top -g and
sysrq-L.
In the process of reviewing this, I spotted some other (existing) bugs
in this area.
Post by Ard Biesheuvel
@@ -203,20 +193,13 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
ret = unwind_frame(tsk, &frame);
if (ret < 0)
break;
- stack = frame.sp;
if (in_exception_text(where)) {
- /*
- * If we switched to the irq_stack before calling this
- * exception handler, then the pt_regs will be on the
- * task stack. The easiest way to tell is if the large
- * pt_regs would overlap with the end of the irq_stack.
- */
- if (stack < irq_stack_ptr &&
- (stack + sizeof(struct pt_regs)) > irq_stack_ptr)
- stack = IRQ_STACK_TO_TASK_STACK(irq_stack_ptr);
+ stack = frame.fp - offsetof(struct pt_regs, stackframe);
- dump_mem("", "Exception stack", stack,
- stack + sizeof(struct pt_regs));
+ if (on_task_stack(tsk, stack) ||
+ (tsk == current && !preemptible() && on_irq_stack(stack)))
+ dump_mem("", "Exception stack", stack,
+ stack + sizeof(struct pt_regs));
So here, we're using information from three records (where the last
might have been faked from the current FP + LR). We look at the previous
frame's LR value to determine whether the current frame's fp points to a
+----+----------------+
| fp | interrupted PC | /* embedded in pt_regs */
+----+----------------+
/\
||
+----+----------------+
where => | fp | < entry PC > |
+----+----------------+
/\
||
+----+----------------+
| fp | where | /* "where" is in exception text */
+----+----------------+
Which (IIUC) has three problems, inherited from the existing approach.
1) Several C functions called from exception entry assembly are missing
__exception annotations. e.g. bad_mode.
So we can miss exception regs in some cases.
2) Several functions may be called either from the exception entry
assembly, or from other C code. e.g. do_undefinstr, and any cascaded
irqchip handlers.
So we can spuriously decide part of the stack is a pt_regs.
3) The entry assembly can enable exceptions before calling C code, so if
an exception is taken at the right time, its regs will be in the
backtrace without a surrounding frame in the exception text.
So we can miss exception regs in some cases (distinct from (1)).
... given that, I don't think we can rely on in_exception_text(). I
think we have to look at whether the current frame's PC/LR is in
.entry.text.
I guess that's an improvement, yes. At least we won't have false
positives where a stack frame is misidentified as being embedded in
pt_regs. But it does rely on the bl instruction being used even in
cases where we know we won't be returning, i.e., jumps to bad_mode,
do_sp_pc_abort and do_undefinstr from .entry.text
I guess we could do something along the lines of

"""
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 45947b36f1ff..37f67c2080dc 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -60,6 +60,22 @@
#endif
.endm

+ .macro b_e, target, b=b
+ .pushsection ".rodata.entrycalls", "a", @progbits
+ .align 2
+ .long 6767f - .
+ .popsection
+6767: \b \target
+ .endm
+
+ .macro bl_e, target
+ b_e \target, bl
+ .endm
+
+ .macro blr_e, target
+ b_e \target, blr
+ .endm
+
/*
* Bad Abort numbers
*-----------------
@@ -352,7 +368,7 @@ tsk .req x28 // current thread_info
ldr_l x1, handle_arch_irq
mov x0, sp
irq_stack_entry
- blr x1
+ blr_e x1
irq_stack_exit
.endm

@@ -381,7 +397,7 @@ __bad_stack:
mov x0, sp

/* Time to die */
- bl handle_bad_stack
+ bl_e handle_bad_stack
nop // ensure lr points somewhere sane
ENDPROC(__bad_stack)
#endif /* CONFIG_VMAP_STACK */
@@ -429,7 +445,7 @@ END(vectors)
mov x0, sp
mov x1, #\reason
mrs x2, esr_el1
- b bad_mode
+ b_e bad_mode
.endm

el0_sync_invalid:
"""

etc to explicitly record the places where we call out of the entry
code with a pt_regs struct on the top of the stack.

Then, we can use

static bool is_entry_call(unsigned long pc)
{
extern s32 __entrycalls_start[], __entrycalls_end[];
s32 *p;

for (p = __entrycalls_start; p < __entrycalls_end; p++)
if ((unsigned long)p + *p == pc - 4)
return true;
return false;
}

to identify values of frame.pc that coincide with such calls.
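
dump_backtrace() could then test for these call sites instead of using a
section check, something like (sketch):

if (is_entry_call(frame.pc)) {
	stack = frame.fp - offsetof(struct pt_regs, stackframe);
	/* dump the exception stack as before */
}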
Mark Rutland
2017-07-25 09:53:55 UTC
Post by Ard Biesheuvel
Post by Ard Biesheuvel
Post by Mark Rutland
So here, we're using information from three records (where the last
might have been faked from the current FP + LR). We look at the previous
frame's LR value to determine whether the current frame's fp points to a
+----+----------------+
| fp | interrupted PC | /* embedded in pt_regs */
+----+----------------+
/\
||
+----+----------------+
where => | fp | < entry PC > |
+----+----------------+
/\
||
+----+----------------+
| fp | where | /* "where" is in exception text */
+----+----------------+
Which (IIUC) has three problems, inherited from the existing approach.
3) The entry assembly can enable exceptions before calling C code, so if
an exception is taken at the right time, its regs will be in the
backtrace without a surrounding frame in the exception text.
[...]
Post by Ard Biesheuvel
Post by Ard Biesheuvel
I guess that's an improvement, yes. At least we won't have false
positives where a stack frame is misidentified as being on embedded in
pt_regs. But it does rely on the bl instruction being used even in
cases where we know we won't be returning, i.e., jumps to bad_mode,
do_sp_pc_abort and do_undef_instr from .entry.text
Good point, yes.

This would be good for backtracing, regardless.
Post by Ard Biesheuvel
I guess we could do something along the lines of
"""
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 45947b36f1ff..37f67c2080dc 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -60,6 +60,22 @@
#endif
.endm
+ .macro b_e, target, b=b
+ .pushsection ".rodata.entrycalls", "a", @progbits
+ .align 2
+ .long 6767f - .
+ .popsection
+6767: \b \target
+ .endm
+
+ .macro bl_e, target
+ b_e \target, bl
+ .endm
+
+ .macro blr_e, target
+ b_e \target, blr
+ .endm
+
/*
* Bad Abort numbers
*-----------------
@@ -352,7 +368,7 @@ tsk .req x28 // current thread_info
ldr_l x1, handle_arch_irq
mov x0, sp
irq_stack_entry
- blr x1
+ blr_e x1
irq_stack_exit
.endm
mov x0, sp
/* Time to die */
- bl handle_bad_stack
+ bl_e handle_bad_stack
nop // ensure lr points somewhere sane
ENDPROC(__bad_stack)
#endif /* CONFIG_VMAP_STACK */
@@ -429,7 +445,7 @@ END(vectors)
mov x0, sp
mov x1, #\reason
mrs x2, esr_el1
- b bad_mode
+ b_e bad_mode
.endm
"""
etc to explicitly record the places where we call out of the entry
code with a pt_regs struct on the top of the stack.
Then, we can use
static bool is_entry_call(unsigned long pc)
{
extern s32 __entrycalls_start[], __entrycalls_end[];
s32 *p;
for (p = __entrycalls_start; p < __entrycalls_end; p++)
if ((unsigned long)p + *p == pc - 4)
return true;
return false;
}
to identify values of frame.pc that coincide with such calls.
Unfortunately, I believe that fails for case (3), where entry code is
interrupted before calling C code. e.g. if an interrupt was taken during
data abort handling:

kernel_entry 1
...
el1_da:
...
enable_irq
< interrupted here >
...
bl do_mem_abort

... as we'd dump the regs for the interrupt, but not those for the data
abort.

Thanks,
Mark.
Ard Biesheuvel
2017-07-25 10:07:40 UTC
Post by Mark Rutland
Post by Ard Biesheuvel
Post by Ard Biesheuvel
Post by Mark Rutland
So here, we're using information from three records (where the last
might have been faked from the current FP + LR). We look at the previous
frame's LR value to determine whether the current frame's fp points to a
+----+----------------+
| fp | interrupted PC | /* embedded in pt_regs */
+----+----------------+
/\
||
+----+----------------+
where => | fp | < entry PC > |
+----+----------------+
/\
||
+----+----------------+
| fp | where | /* "where" is in exception text */
+----+----------------+
Which (IIUC) has three problems, inherited from the existing approach.
3) The entry assembly can enable exceptions before calling C code, so if
an exception is taken at the right time, its regs will be in the
backtrace without a surrounding frame in the exception text.
[...]
Post by Ard Biesheuvel
Post by Ard Biesheuvel
I guess that's an improvement, yes. At least we won't have false
positives where a stack frame is misidentified as being embedded in
pt_regs. But it does rely on the bl instruction being used even in
cases where we know we won't be returning, i.e., jumps to bad_mode,
do_sp_pc_abort and do_undefinstr from .entry.text
Good point, yes.
This would be good for backtracing, regardless.
Post by Ard Biesheuvel
I guess we could do something along the lines of
"""
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 45947b36f1ff..37f67c2080dc 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -60,6 +60,22 @@
#endif
.endm
+ .macro b_e, target, b=b
+ .pushsection ".rodata.entrycalls", "a", @progbits
+ .align 2
+ .long 6767f - .
+ .popsection
+6767: \b \target
+ .endm
+
+ .macro bl_e, target
+ b_e \target, bl
+ .endm
+
+ .macro blr_e, target
+ b_e \target, blr
+ .endm
+
/*
* Bad Abort numbers
*-----------------
@@ -352,7 +368,7 @@ tsk .req x28 // current thread_info
ldr_l x1, handle_arch_irq
mov x0, sp
irq_stack_entry
- blr x1
+ blr_e x1
irq_stack_exit
.endm
mov x0, sp
/* Time to die */
- bl handle_bad_stack
+ bl_e handle_bad_stack
nop // ensure lr points somewhere sane
ENDPROC(__bad_stack)
#endif /* CONFIG_VMAP_STACK */
@@ -429,7 +445,7 @@ END(vectors)
mov x0, sp
mov x1, #\reason
mrs x2, esr_el1
- b bad_mode
+ b_e bad_mode
.endm
"""
etc to explicitly record the places where we call out of the entry
code with a pt_regs struct on the top of the stack.
Then, we can use
static bool is_entry_call(unsigned long pc)
{
extern s32 __entrycalls_start[], __entrycalls_end[];
s32 *p;
for (p = __entrycalls_start; p < __entrycalls_end; p++)
if ((unsigned long)p + *p == pc - 4)
return true;
return false;
}
to identify values of frame.pc that coincide with such calls.
Unfortunately, I believe that fails for case (3), where entry code is
interrupted before calling C code. e.g. if an interrupt was taken during
data abort handling:
kernel_entry 1
...
el1_da:
...
enable_irq
< interrupted here >
...
bl do_mem_abort
... as we'd dump the regs for the interrupt, but not those for the data
abort.
True.
Will Deacon
2017-07-25 13:13:03 UTC
Mark Rutland
2017-07-25 13:58:34 UTC
Post by Will Deacon
Post by Mark Rutland
So here, we're using information from three records (where the last
might have been faked from the current FP + LR). We look at the previous
frame's LR value to determine whether the current frame's fp points to a
+----+----------------+
| fp | interrupted PC | /* embedded in pt_regs */
+----+----------------+
/\
||
+----+----------------+
where => | fp | < entry PC > |
+----+----------------+
/\
||
+----+----------------+
| fp | where | /* "where" is in exception text */
+----+----------------+
Which (IIUC) has three problems, inherited from the existing approach.
1) Several C functions called from exception entry assembly are missing
__exception annotations. e.g. bad_mode.
So we can miss exception regs in some cases.
2) Several functions may be called either from the exception entry
assembly, or from other C code. e.g. do_undefinstr, and any cascaded
irqchip handlers.
So we can spuriously decide part of the stack is a pt_regs.
3) The entry assembly can enable exceptions before calling C code, so if
an exception is taken at the right time, these will eb in the
backtrace without a surrounding frame in the exception text.
So we can miss exception regs in some cases (distinct from (1)).
... given that, I don't think we can rely on in_exception_text(). I
think we have to look at whether the current frame's PC/LR is in
.entry.text.
Looking around, I think most (all?) users of in_exception_text and the
associated linker script labels suffer from similar problems. AFAICT, these are:
kprobes - Used to blacklist probes, but omits entry code
ftrace - For graph tracer backtracing; attempted to hack around the
problem in 9a5ad7d0e3e1 but still excludes .entry.text
kasan - Duplicates ftrace stuff in core code, but also checks
against softirqentry
So it looks like we should either include the entry text in these sections
or avoid using these functions altogether (eventually removing them). It
needn't hold this series up, but worth considering.
What do you think?
Agreed; this is currently a mess, and I don't think it's possible to
make portable inferences based on these sections. This will all need an
audit.

As you say, I think that needn't hold this series up, as those are
orthogonal issues. I'm happy to dig into those in the coming weeks.

The kasan case isn't too problematic. It only affects where an
alloc/free stacktrace gets trimmed for an alloc/free in an IRQ handler,
so in the worst case we just lose a few frames beyond the inner irqchip
handler in the dumped backtrace.

For kprobes/ftrace, I'd need to do some further digging.

I'd looked at cleaning up .irqentry.text in the past (as part of further
arm64/entry-deasm work). It's typically used to find the task->IRQ (or
X->IRQ) transition. That can be cleaned up by having a single
(annotated) arch-specific entry point that invokes (unannotated) irqchip
handlers. This would fix kasan, at least.
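
For instance (a hypothetical sketch; the entry point name is made up),
something like:

/*
 * A single annotated task->IRQ transition point; the cascaded irqchip
 * handlers it calls would no longer need .irqentry.text annotations.
 */
asmlinkage void __exception_irq_entry arch_irq_handler(struct pt_regs *regs)
{
	handle_arch_irq(regs);
}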

Thanks,
Mark.
