Discussion: [PATCH 01/16] arm64: Use arch_timer_get_rate when trapping CNTFRQ_EL0
Marc Zyngier
2017-07-21 17:14:33 UTC
In an ideal world, CNTFRQ_EL0 always contains the timer frequency
for the kernel to use. Sadly, we get quite a few broken systems
where the firmware authors cannot be bothered to program that
register on all CPUs, and rely on DT to provide that frequency.

So when trapping CNTFRQ_EL0, make sure to return the actual rate
(as known by the kernel), rather than the raw CNTFRQ_EL0 value.

Acked-by: Mark Rutland <***@arm.com>
Signed-off-by: Marc Zyngier <***@arm.com>
---
arch/arm64/kernel/traps.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index c7c7088097be..b02295673216 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -519,7 +519,7 @@ static void cntfrq_read_handler(unsigned int esr, struct pt_regs *regs)
{
int rt = (esr & ESR_ELx_SYS64_ISS_RT_MASK) >> ESR_ELx_SYS64_ISS_RT_SHIFT;

- pt_regs_write_reg(regs, rt, read_sysreg(cntfrq_el0));
+ pt_regs_write_reg(regs, rt, arch_timer_get_rate());
regs->pc += 4;
}
--
2.11.0
Marc Zyngier
2017-07-21 17:14:34 UTC
So far, we don't have anything to help decoding ESR_ELx when dealing
with ESR_ELx_EC_CP15_{32,64}. As we're about to handle some of those,
let's add some useful macros.

Signed-off-by: Marc Zyngier <***@arm.com>
---
arch/arm64/include/asm/esr.h | 52 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 8cabd57b6348..fc71b89c99f9 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -206,6 +206,58 @@
(((e) & ESR_ELx_SYS64_ISS_OP2_MASK) >> \
ESR_ELx_SYS64_ISS_OP2_SHIFT))

+#define ESR_ELx_CP15_32_ISS_COND_SHIFT 20
+#define ESR_ELx_CP15_32_ISS_COND_MASK (UL(0xf) << ESR_ELx_CP15_32_ISS_COND_SHIFT)
+
+#define ESR_ELx_CP15_32_ISS_DIR_MASK 0x1
+#define ESR_ELx_CP15_32_ISS_DIR_READ 0x1
+#define ESR_ELx_CP15_32_ISS_DIR_WRITE 0x0
+
+#define ESR_ELx_CP15_32_ISS_RT_SHIFT 5
+#define ESR_ELx_CP15_32_ISS_RT_MASK (UL(0x1f) << ESR_ELx_CP15_32_ISS_RT_SHIFT)
+#define ESR_ELx_CP15_32_ISS_CRM_SHIFT 1
+#define ESR_ELx_CP15_32_ISS_CRM_MASK (UL(0xf) << ESR_ELx_CP15_32_ISS_CRM_SHIFT)
+#define ESR_ELx_CP15_32_ISS_CRN_SHIFT 10
+#define ESR_ELx_CP15_32_ISS_CRN_MASK (UL(0xf) << ESR_ELx_CP15_32_ISS_CRN_SHIFT)
+#define ESR_ELx_CP15_32_ISS_OP1_SHIFT 14
+#define ESR_ELx_CP15_32_ISS_OP1_MASK (UL(0x7) << ESR_ELx_CP15_32_ISS_OP1_SHIFT)
+#define ESR_ELx_CP15_32_ISS_OP2_SHIFT 17
+#define ESR_ELx_CP15_32_ISS_OP2_MASK (UL(0x7) << ESR_ELx_CP15_32_ISS_OP2_SHIFT)
+
+#define ESR_ELx_CP15_32_ISS_SYS_MASK (ESR_ELx_CP15_32_ISS_OP1_MASK | \
+ ESR_ELx_CP15_32_ISS_OP2_MASK | \
+ ESR_ELx_CP15_32_ISS_CRN_MASK | \
+ ESR_ELx_CP15_32_ISS_CRM_MASK | \
+ ESR_ELx_CP15_32_ISS_DIR_MASK)
+#define ESR_ELx_CP15_32_ISS_SYS_VAL(op1, op2, crn, crm) \
+ (((op1) << ESR_ELx_CP15_32_ISS_OP1_SHIFT) | \
+ ((op2) << ESR_ELx_CP15_32_ISS_OP2_SHIFT) | \
+ ((crn) << ESR_ELx_CP15_32_ISS_CRN_SHIFT) | \
+ ((crm) << ESR_ELx_CP15_32_ISS_CRM_SHIFT))
+
+#define ESR_ELx_CP15_64_ISS_DIR_MASK 0x1
+#define ESR_ELx_CP15_64_ISS_DIR_READ 0x1
+#define ESR_ELx_CP15_64_ISS_DIR_WRITE 0x0
+
+#define ESR_ELx_CP15_64_ISS_RT_SHIFT 5
+#define ESR_ELx_CP15_64_ISS_RT_MASK (UL(0x1f) << ESR_ELx_CP15_64_ISS_RT_SHIFT)
+
+#define ESR_ELx_CP15_64_ISS_RT2_SHIFT 10
+#define ESR_ELx_CP15_64_ISS_RT2_MASK (UL(0x1f) << ESR_ELx_CP15_64_ISS_RT2_SHIFT)
+
+#define ESR_ELx_CP15_64_ISS_OP1_SHIFT 16
+#define ESR_ELx_CP15_64_ISS_OP1_MASK (UL(0xf) << ESR_ELx_CP15_64_ISS_OP1_SHIFT)
+#define ESR_ELx_CP15_64_ISS_CRM_SHIFT 1
+#define ESR_ELx_CP15_64_ISS_CRM_MASK (UL(0xf) << ESR_ELx_CP15_64_ISS_CRM_SHIFT)
+
+#define ESR_ELx_CP15_64_ISS_SYS_VAL(op1, crm) \
+ (((op1) << ESR_ELx_CP15_64_ISS_OP1_SHIFT) | \
+ ((crm) << ESR_ELx_CP15_64_ISS_CRM_SHIFT))
+
+#define ESR_ELx_CP15_64_ISS_SYS_MASK (ESR_ELx_CP15_64_ISS_OP1_MASK | \
+ ESR_ELx_CP15_64_ISS_CRM_MASK | \
+ ESR_ELx_CP15_64_ISS_DIR_MASK)
+
#ifndef __ASSEMBLY__
#include <asm/types.h>
--
2.11.0
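
As a rough illustration (not part of the patch) of how these macros are
meant to be combined, here is a sketch of recognising a trapped 32-bit
CNTFRQ read and extracting its destination register; later patches in
the series do the same thing through a hook table:

static bool is_cntfrq_read(unsigned int esr)
{
        /* mrc p15, 0, <Rt>, c14, c0, 0 : op1=0, op2=0, CRn=14, CRm=0, read */
        return (esr & ESR_ELx_CP15_32_ISS_SYS_MASK) ==
               (ESR_ELx_CP15_32_ISS_SYS_VAL(0, 0, 14, 0) |
                ESR_ELx_CP15_32_ISS_DIR_READ);
}

static int cp15_32_transfer_reg(unsigned int esr)
{
        /* Rt is the AArch32 register named in the trapped MRC/MCR */
        return (esr & ESR_ELx_CP15_32_ISS_RT_MASK) >>
                ESR_ELx_CP15_32_ISS_RT_SHIFT;
}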
Marc Zyngier
2017-07-21 17:15:29 UTC
Instead of directly generating an UNDEF when trapping a CP15 access,
let's add a new entry point to that effect (which only generates an
UNDEF for now).

Acked-by: Mark Rutland <***@arm.com>
Signed-off-by: Marc Zyngier <***@arm.com>
---
arch/arm64/kernel/entry.S | 14 ++++++++++++--
arch/arm64/kernel/traps.c | 13 +++++++++++++
2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index b738880350f9..d78fe4594338 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -560,9 +560,9 @@ el0_sync_compat:
cmp x24, #ESR_ELx_EC_UNKNOWN // unknown exception in EL0
b.eq el0_undef
cmp x24, #ESR_ELx_EC_CP15_32 // CP15 MRC/MCR trap
- b.eq el0_undef
+ b.eq el0_cp15
cmp x24, #ESR_ELx_EC_CP15_64 // CP15 MRRC/MCRR trap
- b.eq el0_undef
+ b.eq el0_cp15
cmp x24, #ESR_ELx_EC_CP14_MR // CP14 MRC/MCR trap
b.eq el0_undef
cmp x24, #ESR_ELx_EC_CP14_LS // CP14 LDC/STC trap
@@ -585,6 +585,16 @@ el0_svc_compat:
el0_irq_compat:
kernel_entry 0, 32
b el0_irq_naked
+el0_cp15:
+ /*
+ * Trapped CP15 (MRC, MCR, MRRC, MCRR) instructions
+ */
+ enable_dbg_and_irq
+ ct_user_exit
+ mov x0, x25
+ mov x1, sp
+ bl do_cp15instr
+ b ret_to_user
#endif

el0_da:
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index b02295673216..19e4f8d9a73c 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -556,6 +556,19 @@ static struct sys64_hook sys64_hooks[] = {
{},
};

+
+#ifdef CONFIG_COMPAT
+asmlinkage void __exception do_cp15instr(unsigned int esr, struct pt_regs *regs)
+{
+ /*
+ * New cp15 instructions may previously have been undefined at
+ * EL0. Fall back to our usual undefined instruction handler
+ * so that we handle these consistently.
+ */
+ do_undefinstr(regs);
+}
+#endif
+
asmlinkage void __exception do_sysinstr(unsigned int esr, struct pt_regs *regs)
{
struct sys64_hook *hook;
--
2.11.0
Marc Zyngier
2017-07-21 17:15:36 UTC
In order to deal with conditional Thumb code, let's add a helper
to advance the IT state when an instruction is either skipped
or emulated.

Signed-off-by: Marc Zyngier <***@arm.com>
---
arch/arm/include/asm/opcodes.h | 2 ++
arch/arm/kernel/opcodes.c | 34 ++++++++++++++++++++++++++++++++++
2 files changed, 36 insertions(+)

diff --git a/arch/arm/include/asm/opcodes.h b/arch/arm/include/asm/opcodes.h
index e796c598513b..6ff7a04e8530 100644
--- a/arch/arm/include/asm/opcodes.h
+++ b/arch/arm/include/asm/opcodes.h
@@ -12,6 +12,8 @@
#ifndef __ASSEMBLY__
#include <linux/linkage.h>
extern asmlinkage unsigned int arm_check_condition(u32 opcode, u32 psr);
+struct pt_regs;
+extern void arm_advance_itstate(struct pt_regs *regs);
#endif

#define ARM_OPCODE_CONDTEST_FAIL 0
diff --git a/arch/arm/kernel/opcodes.c b/arch/arm/kernel/opcodes.c
index 827318ee5ff7..58cb36d0ee55 100644
--- a/arch/arm/kernel/opcodes.c
+++ b/arch/arm/kernel/opcodes.c
@@ -53,6 +53,40 @@ static u32 psr_get_it_state(u32 psr)
return it;
}

+static void psr_set_it_state(struct pt_regs *regs, u32 it)
+{
+ u32 cpsr_it;
+
+ cpsr_it = (it << PSR_IT_1_0_SHIFT) & PSR_IT_1_0_MASK;
+ cpsr_it |= ((it >> 2) << PSR_IT_7_2_SHIFT) & PSR_IT_7_2_MASK;
+
+ regs->ARM_cpsr &= ~PSR_IT_MASK;
+ regs->ARM_cpsr |= cpsr_it;
+}
+
+void arm_advance_itstate(struct pt_regs *regs)
+{
+ u32 it;
+
+ /* ARM mode or no conditional */
+ if (thumb_mode(regs) || !(regs->ARM_cpsr & PSR_IT_MASK))
+ return;
+
+ it = psr_get_it_state(regs->ARM_cpsr);
+
+ /*
+ * If this is the last instruction of the block, wipe the IT
+ * state. Otherwise advance it. See the ITAdvance() pseudocode
+ * for reference.
+ */
+ if (!(it & 7))
+ it = 0;
+ else
+ it = (it & 0xe0) | ((it << 1) & 0x1f);
+
+ psr_set_it_state(regs, it);
+}
+
/*
* Returns:
* ARM_OPCODE_CONDTEST_FAIL - if condition fails
--
2.11.0
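
For anyone wanting to poke at the state machine outside the kernel,
here is a tiny standalone sketch (plain userspace C, starting value
picked arbitrarily) of the same ITAdvance() transformation:

#include <stdio.h>
#include <stdint.h>

static uint8_t it_advance(uint8_t it)
{
        /* Same transformation as above: wipe the state when the low
         * three bits are clear, otherwise shift the mask left. */
        if (!(it & 7))
                return 0;
        return (it & 0xe0) | ((it << 1) & 0x1f);
}

int main(void)
{
        uint8_t it = 0x0c;      /* arbitrary non-zero ITSTATE for the demo */

        /* The low mask bits drain until ITSTATE<2:0> hits zero. */
        while (it) {
                printf("ITSTATE = 0x%02x\n", it);
                it = it_advance(it);
        }
        return 0;
}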
Marc Zyngier
2017-07-21 17:15:33 UTC
Just like CNTVCT, we need to handle userspace trapping into the
kernel if we've decided that the timer isn't fit for purpose.
64bit userspace is already dealt with, but we're missing the
equivalent compat handling.

Acked-by: Mark Rutland <***@arm.com>
Signed-off-by: Marc Zyngier <***@arm.com>
---
arch/arm64/include/asm/esr.h | 3 +++
arch/arm64/kernel/traps.c | 13 +++++++++++++
2 files changed, 16 insertions(+)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 43e629fe73dd..64ef4075b597 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -261,6 +261,9 @@
#define ESR_ELx_CP15_64_ISS_SYS_CNTVCT (ESR_ELx_CP15_64_ISS_SYS_VAL(1, 14) | \
ESR_ELx_CP15_64_ISS_DIR_READ)

+#define ESR_ELx_CP15_32_ISS_SYS_CNTFRQ (ESR_ELx_CP15_32_ISS_SYS_VAL(0, 0, 14, 0) |\
+ ESR_ELx_CP15_32_ISS_DIR_READ)
+
#ifndef __ASSEMBLY__
#include <asm/types.h>

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 70efd63a5545..07846d747188 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -558,7 +558,20 @@ static struct sys64_hook sys64_hooks[] = {


#ifdef CONFIG_COMPAT
+static void compat_cntfrq_read_handler(unsigned int esr, struct pt_regs *regs)
+{
+ int reg = (esr & ESR_ELx_CP15_32_ISS_RT_MASK) >> ESR_ELx_CP15_32_ISS_RT_SHIFT;
+
+ regs->user_regs.regs[reg] = arch_timer_get_rate();
+ regs->pc += 4;
+}
+
static struct sys64_hook cp15_32_hooks[] = {
+ {
+ .esr_mask = ESR_ELx_CP15_32_ISS_SYS_MASK,
+ .esr_val = ESR_ELx_CP15_32_ISS_SYS_CNTFRQ,
+ .handler = compat_cntfrq_read_handler,
+ },
{},
};
--
2.11.0
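
For context, a minimal sketch (GCC inline asm, illustrative only) of
the kind of 32-bit userspace access this handler ends up emulating:

#include <stdint.h>

static inline uint32_t read_cntfrq(void)
{
        uint32_t frq;

        /* CNTFRQ: mrc p15, 0, <Rt>, c14, c0, 0 */
        asm volatile("mrc p15, 0, %0, c14, c0, 0" : "=r" (frq));
        return frq;
}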
Marc Zyngier
2017-07-21 17:15:39 UTC
As we're about to enable trapping of some of the userspace-visible
timer registers, let's add a handler for CNTVCT.

Signed-off-by: Marc Zyngier <***@arm.com>
---
arch/arm/kernel/traps.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index b697e9234a07..bda12a243b40 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -30,6 +30,7 @@
#include <linux/irq.h>

#include <linux/atomic.h>
+#include <asm/arch_timer.h>
#include <asm/cacheflush.h>
#include <asm/exception.h>
#include <asm/unistd.h>
@@ -735,6 +736,34 @@ late_initcall(arm_mrc_hook_init);

#endif

+#ifdef CONFIG_ARM_ARCH_TIMER
+static int read_cntvct_trap(struct pt_regs *regs, unsigned int instr)
+{
+ int rt = (instr >> 12) & 15;
+ int rt2 = (instr >> 16) & 15;
+ u64 val = arch_counter_get_cntvct();
+
+ regs->uregs[rt] = lower_32_bits(val);
+ regs->uregs[rt2] = upper_32_bits(val);
+ regs->ARM_pc += 4;
+ return 0;
+}
+
+static struct undef_hook cntvct_hook = {
+ .instr_mask = 0x0ff00fff,
+ .instr_val = 0x0c500f1e,
+ .fn = read_cntvct_trap,
+};
+
+static int __init arch_timer_hook_init(void)
+{
+ register_undef_hook(&cntvct_hook);
+ return 0;
+}
+
+late_initcall(arch_timer_hook_init);
+#endif
+
/*
* A data abort trap was taken, but we did not handle the instruction.
* Try to abort the user program, or panic if it was the kernel.
--
2.11.0
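
For reference, a quick hand-worked check of that mask/value pair
(illustrative only, decoded from the ARM ARM MRRC encoding):

  cond[31:28] 11000101[27:20] Rt2[19:16] Rt[15:12] coproc[11:8] opc1[7:4] CRm[3:0]

so "mrrc p15, 1, Rt, Rt2, c14" with cond=AL assembles to 0xec5xxf1e
(the x's being the two register fields). Masking off the condition and
both register fields with 0x0ff00fff leaves 0x0c500f1e, which is what
the hook matches regardless of the registers or condition used.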
Marc Zyngier
2017-07-21 17:15:32 UTC
Since people seem to make a point of breaking the userspace-visible
counter, we have no choice but to trap the access. We already do this
for 64bit userspace, but this is lacking for compat. Let's provide
the required handler.

Acked-by: Mark Rutland <***@arm.com>
Signed-off-by: Marc Zyngier <***@arm.com>
---
arch/arm64/include/asm/esr.h | 3 +++
arch/arm64/kernel/traps.c | 16 ++++++++++++++++
2 files changed, 19 insertions(+)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index fc71b89c99f9..43e629fe73dd 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -258,6 +258,9 @@
ESR_ELx_CP15_64_ISS_CRM_MASK | \
ESR_ELx_CP15_64_ISS_DIR_MASK)

+#define ESR_ELx_CP15_64_ISS_SYS_CNTVCT (ESR_ELx_CP15_64_ISS_SYS_VAL(1, 14) | \
+ ESR_ELx_CP15_64_ISS_DIR_READ)
+
#ifndef __ASSEMBLY__
#include <asm/types.h>

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index ac91f4ead737..70efd63a5545 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -562,7 +562,23 @@ static struct sys64_hook cp15_32_hooks[] = {
{},
};

+static void compat_cntvct_read_handler(unsigned int esr, struct pt_regs *regs)
+{
+ int rt = (esr & ESR_ELx_CP15_64_ISS_RT_MASK) >> ESR_ELx_CP15_64_ISS_RT_SHIFT;
+ int rt2 = (esr & ESR_ELx_CP15_64_ISS_RT2_MASK) >> ESR_ELx_CP15_64_ISS_RT2_SHIFT;
+ u64 val = arch_counter_get_cntvct();
+
+ regs->user_regs.regs[rt] = lower_32_bits(val);
+ regs->user_regs.regs[rt2] = upper_32_bits(val);
+ regs->pc += 4;
+}
+
static struct sys64_hook cp15_64_hooks[] = {
+ {
+ .esr_mask = ESR_ELx_CP15_64_ISS_SYS_MASK,
+ .esr_val = ESR_ELx_CP15_64_ISS_SYS_CNTVCT,
+ .handler = compat_cntvct_read_handler,
+ },
{},
};
--
2.11.0
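
Again for context, a minimal sketch (GCC inline asm, illustrative only)
of the 64-bit counter read this handler emulates:

#include <stdint.h>

static inline uint64_t read_cntvct(void)
{
        uint64_t cval;

        /* CNTVCT: mrrc p15, 1, <Rt>, <Rt2>, c14 */
        asm volatile("mrrc p15, 1, %Q0, %R0, c14" : "=r" (cval));
        return cval;
}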
Marc Zyngier
2017-07-21 17:15:35 UTC
As indicated in the ARMv7 ARM (B1.9.2), an instruction that
would cause an UNDEF can trap even if it fails its condition
check. Yes, this is annoying.

Add an arm_check_condition call before trying to handle an
UNDEF, and move the PC by the right amount if the condition
check fails.

Signed-off-by: Marc Zyngier <***@arm.com>
---
arch/arm/kernel/traps.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 948c648fea00..ca51e80a60b6 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -462,6 +462,19 @@ asmlinkage void __exception do_undefinstr(struct pt_regs *regs)
instr = __mem_to_opcode_arm(instr);
}

+ /*
+ * An UNDEF is allowed to trap even when failing its condition
+ * code (see B1.9.2 in the ARMv7 ARM).
+ */
+ if (arm_check_condition(instr, regs->ARM_cpsr) == ARM_OPCODE_CONDTEST_FAIL) {
+ if (thumb_mode(regs) && !is_wide_instruction(instr))
+ regs->ARM_pc += 2;
+ else
+ regs->ARM_pc += 4;
+
+ return;
+ }
+
if (call_undef_hook(regs, instr) == 0)
return;
--
2.11.0
Marc Zyngier
2017-07-21 17:15:34 UTC
arm_check_condition indicates whether a trapped conditional
instruction would have failed or passed its condition check.

This works perfectly well for ARM code, but it entirely ignores
Thumb mode, where the condition code is not encoded in the
instruction, but in the IT state contained in the PSR.

This patch adds the necessary decoding, allowing arm_check_condition
to correctly behave when presented with a Thumb instruction.

Signed-off-by: Marc Zyngier <***@arm.com>
---
arch/arm/kernel/opcodes.c | 34 +++++++++++++++++++++++++++++++++-
1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kernel/opcodes.c b/arch/arm/kernel/opcodes.c
index f8179c6a817f..827318ee5ff7 100644
--- a/arch/arm/kernel/opcodes.c
+++ b/arch/arm/kernel/opcodes.c
@@ -38,6 +38,21 @@ static const unsigned short cc_map[16] = {
0 /* NV */
};

+#define PSR_IT_1_0_SHIFT 25
+#define PSR_IT_1_0_MASK (0x3 << PSR_IT_1_0_SHIFT)
+#define PSR_IT_7_2_SHIFT 10
+#define PSR_IT_7_2_MASK (0x3f << PSR_IT_7_2_SHIFT)
+
+static u32 psr_get_it_state(u32 psr)
+{
+ u32 it;
+
+ it = (psr & PSR_IT_1_0_MASK) >> PSR_IT_1_0_SHIFT;
+ it |= ((psr & PSR_IT_7_2_MASK) >> PSR_IT_7_2_SHIFT) << 2;
+
+ return it;
+}
+
/*
* Returns:
* ARM_OPCODE_CONDTEST_FAIL - if condition fails
@@ -54,10 +69,27 @@ static const unsigned short cc_map[16] = {
*/
asmlinkage unsigned int arm_check_condition(u32 opcode, u32 psr)
{
- u32 cc_bits = opcode >> 28;
+ u32 cc_bits;
u32 psr_cond = psr >> 28;
unsigned int ret;

+ /*
+ * If the CPU is in Thumb mode, extract the condition
+ * code from psr. Otherwise, extract the condition code from
+ * the instruction itself.
+ */
+ if (psr & PSR_T_BIT) {
+ u32 it;
+
+ it = psr_get_it_state(psr);
+ if (!it)
+ return ARM_OPCODE_CONDTEST_PASS;
+
+ cc_bits = it >> 4;
+ } else {
+ cc_bits = opcode >> 28;
+ }
+
if (cc_bits != ARM_OPCODE_CONDITION_UNCOND) {
if ((cc_map[cc_bits] >> (psr_cond)) & 1)
ret = ARM_OPCODE_CONDTEST_PASS;
--
2.11.0
Marc Zyngier
2017-07-21 17:15:42 UTC
If we end up in a situation where any of the CPUs doesn't have its
CNTFRQ register correctly programmed, we cannot reliably expose this
register to userspace. It means that in this case we have to trap
it, together with CNTVCT (since they are controlled by the same
trapping bit).

It also means that there is no point in enabling the VDSO access to
the counter, since we're going to enter the kernel.

Acked-by: Mark Rutland <***@arm.com>
Signed-off-by: Marc Zyngier <***@arm.com>
---
drivers/clocksource/arm_arch_timer.c | 28 ++++++++++++++++++++++++----
1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 4f4633157978..37404a6a1bfa 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -759,6 +759,25 @@ static void arch_timer_configure_evtstream(void)
arch_timer_evtstrm_enable(min(pos, 15));
}

+static bool cntfrq_valid(void)
+{
+ u32 freq = arch_timer_get_cntfrq();
+
+ if (freq != arch_timer_rate) {
+ /*
+ * If any of the CPUs disagrees on what CNTFRQ should
+ * actually be, we end-up disabling the vdso fastpath
+ * for the whole system.
+ */
+ pr_warn("CPU%d: Invalid CNTFRQ (%u, expected %u)\n",
+ smp_processor_id(), freq, arch_timer_rate);
+ disable_vdso();
+ return false;
+ }
+
+ return true;
+}
+
static void arch_counter_set_user_access(void)
{
u32 cntkctl = arch_timer_get_cntkctl();
@@ -773,11 +792,12 @@ static void arch_counter_set_user_access(void)

/*
* Enable user access to the virtual counter if it doesn't
- * need to be workaround. The vdso may have been already
- * disabled though.
+ * need to be worked around, and the frequency has been
+ * correctly set. The vdso may already have been disabled,
+ * though.
*/
- if (arch_timer_this_cpu_has_cntvct_wa())
- pr_info("CPU%d: Trapping CNTVCT access\n", smp_processor_id());
+ if (arch_timer_this_cpu_has_cntvct_wa() || !cntfrq_valid())
+ pr_info("CPU%d: Trapping CNTVCT/CNTFRQ access\n", smp_processor_id());
else
cntkctl |= ARCH_TIMER_USR_VCT_ACCESS_EN;
--
2.11.0
Marc Zyngier
2017-07-21 17:15:31 UTC
We're now ready to start handling CP15 access. Let's add (empty)
arrays for both 32 and 64bit accessors, and the code that deals
with them.

Signed-off-by: Marc Zyngier <***@arm.com>
---
arch/arm64/kernel/traps.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index c89458f975bf..ac91f4ead737 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -558,6 +558,14 @@ static struct sys64_hook sys64_hooks[] = {


#ifdef CONFIG_COMPAT
+static struct sys64_hook cp15_32_hooks[] = {
+ {},
+};
+
+static struct sys64_hook cp15_64_hooks[] = {
+ {},
+};
+
#define PSTATE_IT_1_0_SHIFT 25
#define PSTATE_IT_1_0_MASK (0x3 << PSTATE_IT_1_0_SHIFT)
#define PSTATE_IT_7_2_SHIFT 10
@@ -629,6 +637,8 @@ static void advance_itstate(struct pt_regs *regs)

asmlinkage void __exception do_cp15instr(unsigned int esr, struct pt_regs *regs)
{
+ struct sys64_hook *hook, *hook_base;
+
if (!cp15_cond_valid(esr, regs)) {
advance_itstate(regs);
/*
@@ -639,6 +649,25 @@ asmlinkage void __exception do_cp15instr(unsigned int esr, struct pt_regs *regs)
return;
}

+ switch (ESR_ELx_EC(esr)) {
+ case ESR_ELx_EC_CP15_32:
+ hook_base = cp15_32_hooks;
+ break;
+ case ESR_ELx_EC_CP15_64:
+ hook_base = cp15_64_hooks;
+ break;
+ default:
+ do_undefinstr(regs);
+ return;
+ }
+
+ for (hook = hook_base; hook->handler; hook++)
+ if ((hook->esr_mask & esr) == hook->esr_val) {
+ hook->handler(esr, regs);
+ advance_itstate(regs);
+ return;
+ }
+
/*
* New cp15 instructions may previously have been undefined at
* EL0. Fall back to our usual undefined instruction handler
--
2.11.0
Marc Zyngier
2017-07-21 17:15:41 UTC
As we are going to add more paths where we have to disable the
counter access in the VDSO, let's add a helper that does exactly that.

Acked-by: Mark Rutland <***@arm.com>
Signed-off-by: Marc Zyngier <***@arm.com>
---
drivers/clocksource/arm_arch_timer.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index aae87c4c546e..4f4633157978 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -188,6 +188,22 @@ struct ate_acpi_oem_info {
u32 oem_revision;
};

+static void __maybe_unused disable_vdso(void)
+{
+ /*
+ * Don't use the vdso fastpath if errata require using the
+ * out-of-line counter accessor and/or trap its access. We may
+ * change our mind pretty late in the game (with a per-CPU
+ * erratum, for example), so change both the default value and
+ * the vdso itself.
+ */
+ if (vdso_default) {
+ pr_info("Disabling VDSO direct access\n");
+ clocksource_counter.archdata.vdso_direct = false;
+ vdso_default = false;
+ }
+}
+
#ifdef CONFIG_FSL_ERRATUM_A008585
/*
* The number of retries is an arbitrary value well beyond the highest number
@@ -457,16 +473,8 @@ void arch_timer_enable_workaround(const struct arch_timer_erratum_workaround *wa

static_branch_enable(&arch_timer_read_ool_enabled);

- /*
- * Don't use the vdso fastpath if errata require using the
- * out-of-line counter accessor. We may change our mind pretty
- * late in the game (with a per-CPU erratum, for example), so
- * change both the default value and the vdso itself.
- */
- if (wa->read_cntvct_el0) {
- clocksource_counter.archdata.vdso_direct = false;
- vdso_default = false;
- }
+ if (wa->read_cntvct_el0)
+ disable_vdso();
}

static void arch_timer_check_ool_workaround(enum arch_timer_erratum_match_type type,
--
2.11.0
Marc Zyngier
2017-07-21 17:15:40 UTC
As we're about to enable trapping of some of the userspace-visible
timer registers, let's add a handler for CNTFRQ.

Signed-off-by: Marc Zyngier <***@arm.com>
---
arch/arm/kernel/traps.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index bda12a243b40..1c1b86af20e9 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -755,9 +755,26 @@ static struct undef_hook cntvct_hook = {
.fn = read_cntvct_trap,
};

+static int read_cntfrq_trap(struct pt_regs *regs, unsigned int instr)
+{
+ int reg = (instr >> 12) & 15;
+ if (reg == 15)
+ return 1;
+ regs->uregs[reg] = arch_timer_get_rate();
+ regs->ARM_pc += 4;
+ return 0;
+}
+
+static struct undef_hook cntfrq_hook = {
+ .instr_mask = 0x0fff0fff,
+ .instr_val = 0x0e1e0f10,
+ .fn = read_cntfrq_trap,
+};
+
static int __init arch_timer_hook_init(void)
{
register_undef_hook(&cntvct_hook);
+ register_undef_hook(&cntfrq_hook);
return 0;
}
--
2.11.0
Marc Zyngier
2017-07-21 17:15:38 UTC
We now check the validity of the condition code before calling
the UNDEF handlers, so we can simplify swp_handler by only
checking for ARM_OPCODE_CONDTEST_UNCOND, which indicates that
this is not a SWP instruction.

Signed-off-by: Marc Zyngier <***@arm.com>
---
arch/arm/kernel/swp_emulate.c | 15 ++-------------
1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/arch/arm/kernel/swp_emulate.c b/arch/arm/kernel/swp_emulate.c
index 3bda08bee674..2683c559e0ff 100644
--- a/arch/arm/kernel/swp_emulate.c
+++ b/arch/arm/kernel/swp_emulate.c
@@ -179,20 +179,9 @@ static int swp_handler(struct pt_regs *regs, unsigned int instr)

perf_sw_event(PERF_COUNT_SW_EMULATION_FAULTS, 1, regs, regs->ARM_pc);

- res = arm_check_condition(instr, regs->ARM_cpsr);
- switch (res) {
- case ARM_OPCODE_CONDTEST_PASS:
- break;
- case ARM_OPCODE_CONDTEST_FAIL:
- /* Condition failed - return to next instruction */
- regs->ARM_pc += 4;
- return 0;
- case ARM_OPCODE_CONDTEST_UNCOND:
- /* If unconditional encoding - not a SWP, undef */
+ /* If unconditional encoding - not a SWP, undef */
+ if (arm_check_condition(instr, regs->ARM_cpsr) == ARM_OPCODE_CONDTEST_UNCOND)
return -EFAULT;
- default:
- return -EINVAL;
- }

if (current->pid != previous_pid) {
pr_debug("\"%s\" (%ld) uses deprecated SWP{B} instruction\n",
--
2.11.0
Marc Zyngier
2017-07-21 17:15:30 UTC
Here's a /really nice/ part of the architecture: a CP15 access is
allowed to trap even if it fails its condition check, and SW must
handle it. This includes decoding the IT state if this happens in
an IT block. As a consequence, SW must also deal with advancing
the IT state machine.

Signed-off-by: Marc Zyngier <***@arm.com>
---
arch/arm64/kernel/traps.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 79 insertions(+)

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 19e4f8d9a73c..c89458f975bf 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -558,8 +558,87 @@ static struct sys64_hook sys64_hooks[] = {


#ifdef CONFIG_COMPAT
+#define PSTATE_IT_1_0_SHIFT 25
+#define PSTATE_IT_1_0_MASK (0x3 << PSTATE_IT_1_0_SHIFT)
+#define PSTATE_IT_7_2_SHIFT 10
+#define PSTATE_IT_7_2_MASK (0x3f << PSTATE_IT_7_2_SHIFT)
+
+static u32 compat_get_it_state(struct pt_regs *regs)
+{
+ u32 it, pstate = regs->pstate;
+
+ it = (pstate & PSTATE_IT_1_0_MASK) >> PSTATE_IT_1_0_SHIFT;
+ it |= ((pstate & PSTATE_IT_7_2_MASK) >> PSTATE_IT_7_2_SHIFT) << 2;
+
+ return it;
+}
+
+static void compat_set_it_state(struct pt_regs *regs, u32 it)
+{
+ u32 pstate_it;
+
+ pstate_it = (it << PSTATE_IT_1_0_SHIFT) & PSTATE_IT_1_0_MASK;
+ pstate_it |= ((it >> 2) << PSTATE_IT_7_2_SHIFT) & PSTATE_IT_7_2_MASK;
+
+ regs->pstate &= ~COMPAT_PSR_IT_MASK;
+ regs->pstate |= pstate_it;
+}
+
+static bool cp15_cond_valid(unsigned int esr, struct pt_regs *regs)
+{
+ int cond;
+
+ /* Only a T32 instruction can trap without CV being set */
+ if (!(esr & ESR_ELx_CV)) {
+ u32 it;
+
+ it = compat_get_it_state(regs);
+ if (!it)
+ return true;
+
+ cond = it >> 4;
+ } else {
+ cond = (esr & ESR_ELx_CP15_32_ISS_COND_MASK) >> ESR_ELx_CP15_32_ISS_COND_SHIFT;
+ }
+
+ return aarch32_opcode_cond_checks[cond](regs->pstate);
+}
+
+static void advance_itstate(struct pt_regs *regs)
+{
+ u32 it;
+
+ /* ARM mode */
+ if (!(regs->pstate & COMPAT_PSR_T_BIT) ||
+ !(regs->pstate & COMPAT_PSR_IT_MASK))
+ return;
+
+ it = compat_get_it_state(regs);
+
+ /*
+ * If this is the last instruction of the block, wipe the IT
+ * state. Otherwise advance it.
+ */
+ if (!(it & 7))
+ it = 0;
+ else
+ it = (it & 0xe0) | ((it << 1) & 0x1f);
+
+ compat_set_it_state(regs, it);
+}
+
asmlinkage void __exception do_cp15instr(unsigned int esr, struct pt_regs *regs)
{
+ if (!cp15_cond_valid(esr, regs)) {
+ advance_itstate(regs);
+ /*
+ * There is no T16 variant of a CP access, so we
+ * always advance PC by 4 bytes.
+ */
+ regs->pc += 4;
+ return;
+ }
+
/*
* New cp15 instructions may previously have been undefined at
* EL0. Fall back to our usual undefined instruction handler
--
2.11.0
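
To make the problem concrete, here is the sort of sequence that
exercises this path; a sketch only (Thumb-2 userspace, GCC inline asm),
where the MRC's condition lives entirely in the IT state:

#include <stdint.h>

/*
 * Illustrative only: a conditional CNTFRQ read inside a Thumb IT
 * block. If the access traps, the exception can be taken whether or
 * not NE holds, so the handler has to fish the condition out of
 * ITSTATE and then advance it.
 */
static inline uint32_t read_cntfrq_if_nonzero(int cond)
{
        uint32_t frq = 0;

        asm volatile("cmp       %1, #0\n\t"
                     "it        ne\n\t"
                     "mrcne     p15, 0, %0, c14, c0, 0"
                     : "+r" (frq) : "r" (cond) : "cc");
        return frq;
}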
Marc Zyngier
2017-07-21 17:15:37 UTC
When trapping a conditional Thumb instruction, we need to
advance the IT state accordingly, or we'll end up corrupting
the execution of a subsequent instruction.

Let's add calls to arm_advance_itstate() in the relevant
spots.

Signed-off-by: Marc Zyngier <***@arm.com>
---
arch/arm/kernel/traps.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index ca51e80a60b6..b697e9234a07 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -472,11 +472,14 @@ asmlinkage void __exception do_undefinstr(struct pt_regs *regs)
else
regs->ARM_pc += 4;

+ arm_advance_itstate(regs);
return;
}

- if (call_undef_hook(regs, instr) == 0)
+ if (call_undef_hook(regs, instr) == 0) {
+ arm_advance_itstate(regs);
return;
+ }

die_sig:
#ifdef CONFIG_DEBUG_USER
--
2.11.0
Will Deacon
2017-07-24 12:19:59 UTC
Post by Marc Zyngier
It is an unfortunate situation that CNTFRQ{,_EL0} is often
misprogrammed from the firmware side, leaving it up to the kernel to
work around it. This is usually done by providing an alternative
frequency in the Device Tree.
Unfortunately, CNTFRQ is accessible from EL0, giving userspace the
wrong frequency, and potentially a different frequency per CPU, which
is definitely not what you want. A possible workaround is to trap this
into the kernel and to emulate it (together with the VDSO being
disabled), and this is what this series is achieving.
Which userspace is actually affected by a broken CNTFRQ register? I suspect
most users will be more upset at losing their (perfectly functional) vDSO
acceleration than they are about having a broken CNTFRQ value that is hardly
ever used, especially since this affects quite a few systems.

Will
Marc Zyngier
2017-07-24 12:48:37 UTC
Post by Will Deacon
It is an unfortunate situation that CNTFRQ{,_EL0} is often
misprogrammed from the firmware side, leaving it up to the kernel to
work around it. This is usually done by providing an alternative
frequency in the Device Tree.
Unfortunately, CNTFRQ is accessible from EL0, giving userspace the
wrong frequency, and potentially a different frequency per CPU, which
is definitely not what you want. A possible workaround is to trap this
into the kernel and to emulate it (together with the VDSO being
disabled), and this is what this series is achieving.
Which userspace is actually affected by a broken CNTFRQ register? I suspect
most users will be more upset at losing their (perfectly functional) vDSO
acceleration than they are about having a broken CNTFRQ value that is hardly
ever used, especially since this affects quite a few systems.
OpenMPI is one of the things I'm aware of (we broke it when implementing
the first set of timer workarounds), and from trawling the Debian code
search, at least HHVM is another candidate. How this will affect them is
anybody's guess.

M.
--
Jazz is not dead. It just smells funny...
Will Deacon
2017-07-24 13:12:53 UTC
Post by Marc Zyngier
Post by Will Deacon
It is an unfortunate situation that CNTFRQ{,_EL0} is often
misprogrammed from the firmware side, leaving it up to the kernel to
work around it. This is usually done by providing an alternative
frequency in the Device Tree.
Unfortunately, CNTFRQ is accessible from EL0, giving userspace the
wrong frequency, and potentially a different frequency per CPU, which
is definitely not what you want. A possible workaround is to trap this
into the kernel and to emulate it (together with the VDSO being
disabled), and this is what this series is achieving.
Which userspace is actually affected by a broken CNTFRQ register? I suspect
most users will be more upset at losing their (perfectly functional) vDSO
acceleration than they are about having a broken CNTFRQ value that is hardly
ever used, especially since this affects quite a few systems.
OpenMPI is one of the things I'm aware of (we broke it when implementing
the first set of timer workarounds), and from trawling the Debian code
search, at least HHVM is another candidate. How this will affect them is
anybody's guess.
The latest mcrouter sources pulled into HHVM don't use cntfrq, but you're
right about OpenMPI. However, these things are using the counter directly
as a performance optimisation: the moment we start trapping then they've
lost. I doubt it's much better than giving the wrong data for the
frequency (i.e. they're just as broken in both cases).

So, if they want to run on these systems, their best bet is to use the
vDSO-accelerated clock_gettime implementation. Yes, there's a dispatch cost
compared to an inline asm, but it will beat the pants off a trap to the
kernel. The problem is that this patch series prevents them from doing that
and just means they're screwed whatever they do. We can point at the broken
firmware, but it doesn't feel to me like this workaround is really helping
anybody :/.

Will
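
For completeness, the vDSO-friendly alternative being suggested here
looks roughly like this (standard userspace C, illustrative only):

#include <stdint.h>
#include <time.h>

static uint64_t monotonic_ns(void)
{
        struct timespec ts;

        /*
         * clock_gettime() goes through the vDSO; on systems where the
         * counter is usable from userspace it doesn't enter the kernel
         * at all, and on the broken systems discussed here it still
         * returns correct time via the syscall fallback.
         */
        clock_gettime(CLOCK_MONOTONIC, &ts);

        return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}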
Marc Zyngier
2017-07-24 13:35:08 UTC
Post by Will Deacon
Post by Marc Zyngier
Post by Will Deacon
It is an unfortunate situation that CNTFRQ{,_EL0} is often
misprogrammed from the firmware side, leaving it up to the kernel to
work around it. This is usually done by providing an alternative
frequency in the Device Tree.
Unfortunately, CNTFRQ is accessible from EL0, giving userspace the
wrong frequency, and potentially a different frequency per CPU, which
is definitely not what you want. A possible workaround is to trap this
into the kernel and to emulate it (together with the VDSO being
disabled), and this is what this series is achieving.
Which userspace is actually affected by a broken CNTFRQ register? I suspect
most users will be more upset at losing their (perfectly functional) vDSO
acceleration than they are about having a broken CNTFRQ value that is hardly
ever used, especially since this affects quite a few systems.
OpenMPI is one of the things I'm aware of (we broke it when implementing
the first set of timer workarounds), and from trawling the Debian code
search, at least HHVM is another candidate. How this will affect them is
anybody's guess.
The latest mcrouter sources pulled into HHVM don't use cntfrq, but you're
right about OpenMPI. However, these things are using the counter directly
as a performance optimisation: the moment we start trapping then they've
lost. I doubt it's much better than giving the wrong data for the
frequency (i.e. they're just as broken in both cases).
So, if they want to run on these systems, their best bet is to use the
vDSO-accelerated clock_gettime implementation. Yes, there's a dispatch cost
compared to an inline asm, but it will beat the pants off a trap to the
kernel. The problem is that this patch series prevents them from doing that
and just means they're screwed whatever they do. We can point at the broken
firmware, but it doesn't feel to me like this workaround is really helping
anybody :/.
Fair enough.

It still remains that our trapping story is inconsistent. When we
work around timer issues, we do disable the VDSO and trap both cntvct and
cntfrq. These will trap to EL1, and we'll deliver a SIGILL to userspace,
while 64bit userspace will just work. Is that something we want to address?

With a 32bit kernel, we don't trap anything yet (because we don't work
around any of the errata existing on 64bit systems), so I guess this
could be postponed until we actually make these workarounds available on
32bit (for KVM guests).

Thanks,

M.
--
Jazz is not dead. It just smells funny...
Will Deacon
2017-07-25 11:46:57 UTC
Post by Marc Zyngier
Post by Will Deacon
Post by Marc Zyngier
Post by Will Deacon
It is an unfortunate situation that CNTFRQ{,_EL0} is often
misprogrammed from the firmware side, leaving it up to the kernel to
work around it. This is usually done by providing an alternative
frequency in the Device Tree.
Unfortunately, CNTFRQ is accessible from EL0, giving userspace the
wrong frequency, and potentially a different frequency per CPU, which
is definitely not what you want. A possible workaround is to trap this
into the kernel and to emulate it (together with the VDSO being
disabled), and this is what this series is achieving.
Which userspace is actually affected by a broken CNTFRQ register? I suspect
most users will be more upset at losing their (perfectly functional) vDSO
acceleration than they are about having a broken CNTFRQ value that is hardly
ever used, especially since this affects quite a few systems.
OpenMPI is one of the things I'm aware of (we broke it when implementing
the first set of timer workarounds), and from trawling the Debian code
search, at least HHVM is another candidate. How this will affect them is
anybody's guess.
The latest mcrouter sources pulled into HHVM don't use cntfrq, but you're
right about OpenMPI. However, these things are using the counter directly
as a performance optimisation: the moment we start trapping then they've
lost. I doubt it's much better than giving the wrong data for the
frequency (i.e. they're just as broken in both cases).
So, if they want to run on these systems, their best bet is to use the
vDSO-accelerated clock_gettime implementation. Yes, there's a dispatch cost
compared to an inline asm, but it will beat the pants off a trap to the
kernel. The problem is that this patch series prevents them from doing that
and just means they're screwed whatever they do. We can point at the broken
firmware, but it doesn't feel to me like this workaround is really helping
anybody :/.
Fair enough.
It still remains that our trapping story is inconsistent. When we
workaround timer issues, we do disable the VDSO and trap both cntvct and
cntfrq. These will trap to EL1, and we'll deliver a SIGILL to userspace,
while 64bit userspace will just work. Is that something we want to address?
That's a good question, and I guess it depends on whether or not we have
people asking for that. Are we aware of userspace that is being broken by
this? If so, then we can address it, otherwise we can wait until somebody
shouts. It's not unlikely that any 32-bit code trying to use the counter
registers directly already handles SIGILL, since that's what will happen
on older CPUs without the arch timer anyway.

Will
