Discussion:
[RFC PATCH v2 00/38] Nested Virtualization on KVM/ARM
Jintack Lim
2017-07-18 16:58:26 UTC
Nested virtualization is the ability to run a virtual machine inside another
virtual machine. In other words, it’s about running a hypervisor (the guest
hypervisor) on top of another hypervisor (the host hypervisor).

Supporting nested virtualization on ARM means that the hypervisor provides not
only the EL0/EL1 execution environment to VMs, as it usually does, but also the
virtualization extensions, including an EL2 execution environment. Once the
host hypervisor provides those execution environments to a VM, the guest
hypervisor can naturally run its own VMs (nested VMs).

This series supports nested virtualization on arm64. ARM recently announced an
extension (ARMv8.3) which adds support for nested virtualization[1]. This patch
set is based on the ARMv8.3 specification and tested on the FastModel with the
ARMv8.3 extension.

The whole patch set to support nested virtualization is huge (over 70
patches), so I categorized it into four parts: CPU, memory, VGIC, and timer
virtualization. This patch series is the first part.

The CPU virtualization patch series provides the basic nested virtualization
framework and instruction emulation, including the v8.1 VHE feature and the
v8.3 nested virtualization feature, for VMs.

This patch series can itself be divided into four parts. Patches 1 to 5
introduce nested virtualization by discovering the hardware feature, adding a
kernel parameter, and allowing userspace to set the initial CPU mode to EL2.

Patches 6 to 25 support the EL2 execution environment, the virtual EL2, for a
VM on the v8.0 architecture. We de-privilege the guest hypervisor and emulate
the virtual EL2 mode in EL1 using the hardware features provided by ARMv8.3;
the host hypervisor manages virtual EL2 register state for the guest
hypervisor, and shadow EL1 register state that reflects the virtual EL2
register state, to run the guest hypervisor in EL1.
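
At a high level, each world switch then follows the flow sketched below (a
simplified sketch using the entry points introduced later in this series;
the non-nested fast path and error handling are omitted):

	/* before entering the VM */
	kvm_arm_setup_shadow_state(vcpu);	/* in a virtual EL2 context,
						 * flush virtual EL2 state
						 * into the shadow EL1 regs */

	/* ... run the guest; the guest hypervisor executes in EL1 ... */

	/* after exiting the VM */
	kvm_arm_restore_shadow_state(vcpu);	/* sync shadow EL1 changes
						 * back into virtual EL2
						 * state */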

Patches 26 to 33 add support for the virtual EL2 with the Virtualization Host
Extensions. These patches emulate the registers and bits newly defined in v8.1
and allow the virtual EL2 to access EL2 register state via EL1 register
accesses, as on a real EL2.

Patches 34 to 38 support the virtual EL2 with nested virtualization, enabling
recursive nested virtualization.

This patch set is tested on the FastModel with the v8.3 extension for arm64
and on a Cubietruck for arm32. On the FastModel, the host and the guest
kernels are compiled with and without VHE, giving four combinations. I was
able to boot SMP Linux in the nested VM in all four configurations and to run
hackbench. I also checked that regular VMs could boot when the nested
virtualization kernel parameter was not set. On the Cubietruck, I verified
that regular VMs could boot as well.

I'll share my experiment setup shortly.

Even though this work has some limitations and TODOs, I'd appreciate early
feedback on this RFC. Specifically, I'm interested in:

- Overall design to manage vcpu context for the virtual EL2
- Verifying correct EL2 register configurations such as HCR_EL2 and CPTR_EL2
(Patches 30 and 32)
- Patch organization and coding style

This patch series is based on kvm/next d38338e.
The whole patch series including memory, VGIC, and timer patches is available
here:

***@github.com:columbia/nesting-pub.git rfc-v2

Limitations:
- There are some cases in which the target exception level of a VM is
ambiguous when emulating the eret instruction. I'm discussing this issue with
Christoffer and Marc. Meanwhile, I added a temporary patch (not included in
this series; f1beaba in the repo) and used a 4.10.0 kernel when testing the
guest hypervisor with VHE.
- Recursive nested virtualization is not tested yet.
- Other hypervisors (such as Xen) on KVM are not tested.

TODO:
- Submit memory, VGIC, and timer patches
- Evaluate regular VM performance to see if there's a negative impact.
- Test other hypervisors such as Xen on KVM
- Test recursive nested virtualization

v1-->v2:
- Added support for the virtual EL2 with VHE
- Rewrote commit messages and comments from the perspective of providing
execution environments to VMs, rather than from the perspective of the guest
hypervisor running in them.
- Fixed a few bugs to make it run on the FastModel.
- Tested on ARMv8.3 with four configurations (host/guest, with/without VHE).
- Rebased to kvm/next

[1] https://www.community.arm.com/processors/b/blog/posts/armv8-a-architecture-2016-additions

Christoffer Dall (7):
KVM: arm64: Add KVM nesting feature
KVM: arm64: Allow userspace to set PSR_MODE_EL2x
KVM: arm64: Add vcpu_mode_el2 primitive to support nesting
KVM: arm/arm64: Add a framework to prepare virtual EL2 execution
arm64: Add missing TCR hw defines
KVM: arm64: Create shadow EL1 registers
KVM: arm64: Trap EL1 VM register accesses in virtual EL2

Jintack Lim (31):
arm64: Add ARM64_HAS_NESTED_VIRT feature
KVM: arm/arm64: Enable nested virtualization via command-line
KVM: arm/arm64: Check if nested virtualization is in use
KVM: arm64: Add EL2 system registers to vcpu context
KVM: arm64: Add EL2 special registers to vcpu context
KVM: arm64: Add the shadow context for virtual EL2 execution
KVM: arm64: Set vcpu context depending on the guest exception level
KVM: arm64: Synchronize EL1 system registers on virtual EL2 entry and
exit
KVM: arm64: Move exception macros and enums to a common file
KVM: arm64: Support to inject exceptions to the virtual EL2
KVM: arm64: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
KVM: arm64: Trap CPACR_EL1 access in virtual EL2
KVM: arm64: Handle eret instruction traps
KVM: arm64: Set a handler for the system instruction traps
KVM: arm64: Handle PSCI call via smc from the guest
KVM: arm64: Inject HVC exceptions to the virtual EL2
KVM: arm64: Respect virtual HCR_EL2.TWX setting
KVM: arm64: Respect virtual CPTR_EL2.TFP setting
KVM: arm64: Add macros to support the virtual EL2 with VHE
KVM: arm64: Add EL2 registers defined in ARMv8.1 to vcpu context
KVM: arm64: Emulate EL12 register accesses from the virtual EL2
KVM: arm64: Support a VM with VHE considering EL0 of the VHE host
KVM: arm64: Allow the virtual EL2 to access EL2 states without trap
KVM: arm64: Manage the shadow states when virtual E2H bit enabled
KVM: arm64: Trap and emulate CPTR_EL2 accesses via CPACR_EL1 from the
virtual EL2 with VHE
KVM: arm64: Emulate appropriate VM control system registers
KVM: arm64: Respect the virtual HCR_EL2.NV bit setting
KVM: arm64: Respect the virtual HCR_EL2.NV bit setting for EL12
register traps
KVM: arm64: Respect virtual HCR_EL2.TVM and TRVM settings
KVM: arm64: Respect the virtual HCR_EL2.NV1 bit setting
KVM: arm64: Respect the virtual CPTR_EL2.TCPAC setting

Documentation/admin-guide/kernel-parameters.txt | 4 +
arch/arm/include/asm/kvm_emulate.h | 17 ++
arch/arm/include/asm/kvm_host.h | 15 +
arch/arm64/include/asm/cpucaps.h | 3 +-
arch/arm64/include/asm/esr.h | 1 +
arch/arm64/include/asm/kvm_arm.h | 2 +
arch/arm64/include/asm/kvm_coproc.h | 3 +-
arch/arm64/include/asm/kvm_emulate.h | 56 ++++
arch/arm64/include/asm/kvm_host.h | 64 ++++-
arch/arm64/include/asm/kvm_hyp.h | 24 --
arch/arm64/include/asm/pgtable-hwdef.h | 6 +
arch/arm64/include/asm/sysreg.h | 70 +++++
arch/arm64/include/uapi/asm/kvm.h | 1 +
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kernel/cpufeature.c | 11 +
arch/arm64/kvm/Makefile | 5 +-
arch/arm64/kvm/context.c | 346 +++++++++++++++++++++++
arch/arm64/kvm/emulate-nested.c | 83 ++++++
arch/arm64/kvm/guest.c | 2 +
arch/arm64/kvm/handle_exit.c | 89 +++++-
arch/arm64/kvm/hyp/entry.S | 13 +
arch/arm64/kvm/hyp/hyp-entry.S | 2 +-
arch/arm64/kvm/hyp/switch.c | 33 ++-
arch/arm64/kvm/hyp/sysreg-sr.c | 117 ++++----
arch/arm64/kvm/inject_fault.c | 12 -
arch/arm64/kvm/nested.c | 63 +++++
arch/arm64/kvm/reset.c | 8 +
arch/arm64/kvm/sys_regs.c | 359 +++++++++++++++++++++++-
arch/arm64/kvm/sys_regs.h | 8 +
arch/arm64/kvm/trace.h | 43 ++-
virt/kvm/arm/arm.c | 20 ++
31 files changed, 1363 insertions(+), 118 deletions(-)
create mode 100644 arch/arm64/kvm/context.c
create mode 100644 arch/arm64/kvm/emulate-nested.c
create mode 100644 arch/arm64/kvm/nested.c
--
1.9.1
Jintack Lim
2017-07-18 16:58:30 UTC
Nested virtualization is in use only if all three conditions are met:
- The architecture supports nested virtualization.
- The kernel parameter is set.
- Userspace uses the nested virtualization feature.

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm/include/asm/kvm_host.h | 11 +++++++++++
arch/arm64/include/asm/kvm_host.h | 2 ++
arch/arm64/kvm/nested.c | 17 +++++++++++++++++
virt/kvm/arm/arm.c | 4 ++++
4 files changed, 34 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 00b0f97..7e9e6c8 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -303,4 +303,15 @@ static inline int __init kvmarm_nested_cfg(char *buf)
{
return 0;
}
+
+static inline int init_nested_virt(void)
+{
+ return 0;
+}
+
+static inline bool nested_virt_in_use(struct kvm_vcpu *vcpu)
+{
+ return false;
+}
+
#endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 6df0c7c..86d4b6c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -387,5 +387,7 @@ static inline void __cpu_init_stage2(void)
}

int __init kvmarm_nested_cfg(char *buf);
+int init_nested_virt(void);
+bool nested_virt_in_use(struct kvm_vcpu *vcpu);

#endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 79f38da..9a05c76 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -24,3 +24,20 @@ int __init kvmarm_nested_cfg(char *buf)
{
return strtobool(buf, &nested_param);
}
+
+int init_nested_virt(void)
+{
+ if (nested_param && cpus_have_const_cap(ARM64_HAS_NESTED_VIRT))
+ kvm_info("Nested virtualization is supported\n");
+
+ return 0;
+}
+
+bool nested_virt_in_use(struct kvm_vcpu *vcpu)
+{
+ if (nested_param && cpus_have_const_cap(ARM64_HAS_NESTED_VIRT)
+ && test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features))
+ return true;
+
+ return false;
+}
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 1c1c772..36aae3a 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -1478,6 +1478,10 @@ int kvm_arch_init(void *opaque)
if (err)
goto out_err;

+ err = init_nested_virt();
+ if (err)
+ goto out_err;
+
err = init_subsystems();
if (err)
goto out_hyp;
--
1.9.1
Jintack Lim
2017-07-18 16:58:28 UTC
Add a new kernel parameter (kvm-arm.nested) to enable KVM/ARM nested
virtualization support. On the arm architecture this kernel parameter is
ignored, since nested virtualization is not supported on arm.

Note that this kernel parameter will not have any impact until nested
virtualization support is complete. We add the parameter first so that it
can be used while implementing nested virtualization support.
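
For example, once support is complete, nested virtualization would be
requested by appending the parameter to the kernel command line
(illustrative):

	kvm-arm.nested=1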

Signed-off-by: Jintack Lim <***@linaro.org>
---
Documentation/admin-guide/kernel-parameters.txt | 4 ++++
arch/arm/include/asm/kvm_host.h | 4 ++++
arch/arm64/include/asm/kvm_host.h | 2 ++
arch/arm64/kvm/Makefile | 2 ++
arch/arm64/kvm/nested.c | 26 +++++++++++++++++++++++++
virt/kvm/arm/arm.c | 2 ++
6 files changed, 40 insertions(+)
create mode 100644 arch/arm64/kvm/nested.c

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index aa8341e..8fb152d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1841,6 +1841,10 @@
[KVM,ARM] Trap guest accesses to GICv3 common
system registers

+ kvm-arm.nested=
+ [KVM,ARM] Allow nested virtualization in KVM/ARM.
+ Default is 0 (disabled)
+
kvm-intel.ept= [KVM,Intel] Disable extended page tables
(virtualized MMU) support on capable Intel chips.
Default is 1 (enabled)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 127e2dd..00b0f97 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -299,4 +299,8 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr);

+static inline int __init kvmarm_nested_cfg(char *buf)
+{
+ return 0;
+}
#endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 0c4fd1f..dcc4df8 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -386,4 +386,6 @@ static inline void __cpu_init_stage2(void)
"PARange is %d bits, unsupported configuration!", parange);
}

+int __init kvmarm_nested_cfg(char *buf);
+
#endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 5d98100..f513047 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -35,3 +35,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-debug.o
kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
+
+kvm-$(CONFIG_KVM_ARM_HOST) += nested.o
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
new file mode 100644
index 0000000..79f38da
--- /dev/null
+++ b/arch/arm64/kvm/nested.c
@@ -0,0 +1,26 @@
+/*
+ * Copyright (C) 2017 - Columbia University and Linaro Ltd.
+ * Author: Jintack Lim <***@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+
+static bool nested_param;
+
+int __init kvmarm_nested_cfg(char *buf)
+{
+ return strtobool(buf, &nested_param);
+}
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index a39a1e1..1c1c772 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -67,6 +67,8 @@

static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);

+early_param("kvm-arm.nested", kvmarm_nested_cfg);
+
static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
{
BUG_ON(preemptible());
--
1.9.1
Jintack Lim
2017-07-18 16:58:27 UTC
Add a new ARM64_HAS_NESTED_VIRT feature to indicate that the
CPU has the ARMv8.3 nested virtualization capability.

This will be used to support nested virtualization in KVM.

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/include/asm/cpucaps.h | 3 ++-
arch/arm64/include/asm/sysreg.h | 1 +
arch/arm64/kernel/cpufeature.c | 11 +++++++++++
3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 8d2272c..64df263 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -39,7 +39,8 @@
#define ARM64_WORKAROUND_QCOM_FALKOR_E1003 18
#define ARM64_WORKAROUND_858921 19
#define ARM64_WORKAROUND_CAVIUM_30115 20
+#define ARM64_HAS_NESTED_VIRT 21

-#define ARM64_NCAPS 21
+#define ARM64_NCAPS 22

#endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 16e44fa..737ca30 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -376,6 +376,7 @@
#define ID_AA64MMFR1_VMIDBITS_16 2

/* id_aa64mmfr2 */
+#define ID_AA64MMFR2_NV_SHIFT 24
#define ID_AA64MMFR2_LVA_SHIFT 16
#define ID_AA64MMFR2_IESB_SHIFT 12
#define ID_AA64MMFR2_LSM_SHIFT 8
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 94b8f7f..523f998 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -144,6 +144,7 @@
};

static const struct arm64_ftr_bits ftr_id_aa64mmfr2[] = {
+ ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, ID_AA64MMFR2_NV_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, ID_AA64MMFR2_LVA_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, ID_AA64MMFR2_IESB_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, ID_AA64MMFR2_LSM_SHIFT, 4, 0),
@@ -867,6 +868,16 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus
.min_field_value = 0,
.matches = has_no_fpsimd,
},
+ {
+ .desc = "Nested Virtualization Support",
+ .capability = ARM64_HAS_NESTED_VIRT,
+ .def_scope = SCOPE_SYSTEM,
+ .matches = has_cpuid_feature,
+ .sys_reg = SYS_ID_AA64MMFR2_EL1,
+ .sign = FTR_UNSIGNED,
+ .field_pos = ID_AA64MMFR2_NV_SHIFT,
+ .min_field_value = 1,
+ },
{},
};
--
1.9.1
Jintack Lim
2017-07-18 16:58:31 UTC
From: Christoffer Dall <***@linaro.org>

We were not allowing userspace to set a more privileged mode for the VCPU
than EL1, but now that we support nesting with a virtual EL2 mode, do
allow this!

Signed-off-by: Christoffer Dall <***@linaro.org>
---
arch/arm64/kvm/guest.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 5c7f657..5e673ae 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -117,6 +117,8 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
case PSR_MODE_EL0t:
case PSR_MODE_EL1t:
case PSR_MODE_EL1h:
+ case PSR_MODE_EL2h:
+ case PSR_MODE_EL2t:
break;
default:
err = -EINVAL;
--
1.9.1
Jintack Lim
2017-07-18 16:58:29 UTC
From: Christoffer Dall <***@linaro.org>

Set the initial exception level of the guest to EL2 if the nested
virtualization feature is enabled.

Signed-off-by: Christoffer Dall <***@linaro.org>
Signed-off-by: Jintack Lim <***@cs.columbia.edu>
---
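Note for reviewers: userspace opts in to this behavior roughly as in the
sketch below. This is illustrative and not part of the patch; it assumes
<linux/kvm.h> and <sys/ioctl.h>, and that vm_fd and vcpu_fd are
already-open KVM file descriptors.

	struct kvm_vcpu_init init;

	/* Query the preferred target, then request the nested feature. */
	ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init);
	init.features[0] |= 1 << KVM_ARM_VCPU_NESTED_VIRT;
	ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init);

With the feature bit set, kvm_reset_vcpu() below selects the EL2 reset state.
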
arch/arm64/include/asm/kvm_host.h | 2 +-
arch/arm64/include/uapi/asm/kvm.h | 1 +
arch/arm64/kvm/reset.c | 8 ++++++++
3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index dcc4df8..6df0c7c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -39,7 +39,7 @@

#define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS

-#define KVM_VCPU_MAX_FEATURES 4
+#define KVM_VCPU_MAX_FEATURES 5

#define KVM_REQ_SLEEP \
KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 9f3ca24..4a71a72 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -99,6 +99,7 @@ struct kvm_regs {
#define KVM_ARM_VCPU_EL1_32BIT 1 /* CPU running a 32bit VM */
#define KVM_ARM_VCPU_PSCI_0_2 2 /* CPU uses PSCI v0.2 */
#define KVM_ARM_VCPU_PMU_V3 3 /* Support guest PMUv3 */
+#define KVM_ARM_VCPU_NESTED_VIRT 4 /* Support nested virtualization */

struct kvm_vcpu_init {
__u32 target;
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 3256b92..1353516 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -41,6 +41,11 @@
PSR_F_BIT | PSR_D_BIT),
};

+static const struct kvm_regs default_regs_reset_el2 = {
+ .regs.pstate = (PSR_MODE_EL2h | PSR_A_BIT | PSR_I_BIT |
+ PSR_F_BIT | PSR_D_BIT),
+};
+
static const struct kvm_regs default_regs_reset32 = {
.regs.pstate = (COMPAT_PSR_MODE_SVC | COMPAT_PSR_A_BIT |
COMPAT_PSR_I_BIT | COMPAT_PSR_F_BIT),
@@ -106,6 +111,9 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
if (!cpu_has_32bit_el1())
return -EINVAL;
cpu_reset = &default_regs_reset32;
+ } else if (test_bit(KVM_ARM_VCPU_NESTED_VIRT,
+ vcpu->arch.features)) {
+ cpu_reset = &default_regs_reset_el2;
} else {
cpu_reset = &default_regs_reset;
}
--
1.9.1
Jintack Lim
2017-07-18 16:58:32 UTC
From: Christoffer Dall <***@linaro.org>

When running a nested hypervisor, we occasionally have to figure out whether
the mode we are switching into is the virtual EL2 mode or a regular
EL0/1 mode.

Signed-off-by: Christoffer Dall <***@linaro.org>
---
arch/arm/include/asm/kvm_emulate.h | 6 ++++++
arch/arm64/include/asm/kvm_emulate.h | 12 ++++++++++++
2 files changed, 18 insertions(+)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 9a8a45a..399cd75e 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -77,6 +77,12 @@ static inline bool vcpu_mode_is_32bit(const struct kvm_vcpu *vcpu)
return 1;
}

+/* We don't support nesting on arm */
+static inline bool vcpu_mode_el2(const struct kvm_vcpu *vcpu)
+{
+ return false;
+}
+
static inline unsigned long *vcpu_pc(struct kvm_vcpu *vcpu)
{
return &vcpu->arch.ctxt.gp_regs.usr_regs.ARM_pc;
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index fe39e68..5d6f3d0 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -143,6 +143,18 @@ static inline bool vcpu_mode_priv(const struct kvm_vcpu *vcpu)
return mode != PSR_MODE_EL0t;
}

+static inline bool vcpu_mode_el2(const struct kvm_vcpu *vcpu)
+{
+ u32 mode;
+
+ if (vcpu_mode_is_32bit(vcpu))
+ return false;
+
+ mode = *vcpu_cpsr(vcpu) & PSR_MODE_MASK;
+
+ return mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t;
+}
+
static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
{
return vcpu->arch.fault.esr_el2;
--
1.9.1
Jintack Lim
2017-07-18 16:58:43 UTC
From: Christoffer Dall <***@linaro.org>

When running in virtual EL2 mode, we actually run the hardware in EL1
and therefore have to use the EL1 registers to ensure correct operation.

By setting HCR.TVM and HCR.TRVM we ensure that the virtual EL2 mode
doesn't shoot itself in the foot when setting up what it believes to be
a different mode's system register state (for example when preparing to
switch to a VM).

We can leverage the existing sysregs infrastructure to support trapped
accesses to these registers.

Signed-off-by: Christoffer Dall <***@linaro.org>
---
arch/arm64/kvm/hyp/switch.c | 2 ++
arch/arm64/kvm/sys_regs.c | 7 ++++++-
2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 945e79c..ec91cd08 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -85,6 +85,8 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
write_sysreg(1 << 30, fpexc32_el2);
isb();
}
+ if (vcpu_mode_el2(vcpu))
+ val |= HCR_TVM | HCR_TRVM;
write_sysreg(val, hcr_el2);
/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
write_sysreg(1 << 15, hstr_el2);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 2b3ed70..d8b1d4b 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -121,7 +121,12 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
{
bool was_enabled = vcpu_has_cache_enabled(vcpu);

- BUG_ON(!p->is_write);
+ BUG_ON(!vcpu_mode_el2(vcpu) && !p->is_write);
+
+ if (!p->is_write) {
+ p->regval = vcpu_sys_reg(vcpu, r->reg);
+ return true;
+ }

if (!p->is_aarch32) {
vcpu_sys_reg(vcpu, r->reg) = p->regval;
--
1.9.1
Jintack Lim
2017-07-18 16:58:51 UTC
Forward traps due to FP/ASIMD register accesses to the virtual EL2 if the
virtual CPTR_EL2.TFP bit is set. Note that if the TFP bit is set, then even
accesses to FP/ASIMD registers from EL2, as well as from NS EL0/1, trap to
EL2, so we don't check the VM's exception level.

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kvm/handle_exit.c | 15 +++++++++++----
arch/arm64/kvm/hyp/entry.S | 13 +++++++++++++
arch/arm64/kvm/hyp/hyp-entry.S | 2 +-
4 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index b3bb7ef..f5117a3 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -134,6 +134,7 @@ int main(void)
DEFINE(CPU_FP_REGS, offsetof(struct kvm_regs, fp_regs));
DEFINE(VCPU_FPEXC32_EL2, offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2]));
DEFINE(VCPU_HOST_CONTEXT, offsetof(struct kvm_vcpu, arch.host_cpu_context));
+ DEFINE(VIRTUAL_CPTR_EL2, offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[CPTR_EL2]));
#endif
#ifdef CONFIG_CPU_PM
DEFINE(CPU_SUSPEND_SZ, sizeof(struct cpu_suspend_ctx));
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 25ec824..d4e7b2b 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -84,11 +84,18 @@ static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
}

/*
- * Guest access to FP/ASIMD registers are routed to this handler only
- * when the system doesn't support FP/ASIMD.
+ * When the system supports FP/ASIMD and we are NOT running nested
+ * virtualization, FP/ASIMD traps are handled directly in EL2.
+ * This handler handles the remaining cases.
*/
-static int handle_no_fpsimd(struct kvm_vcpu *vcpu, struct kvm_run *run)
+static int kvm_handle_fpasimd(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
+
+ /* This is for nested virtualization */
+ if (vcpu_sys_reg(vcpu, CPTR_EL2) & CPTR_EL2_TFP)
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
+ /* This is the case when the system doesn't support FP/ASIMD. */
kvm_inject_undefined(vcpu);
return 1;
}
@@ -220,7 +227,7 @@ static int kvm_handle_eret(struct kvm_vcpu *vcpu, struct kvm_run *run)
[ESR_ELx_EC_BREAKPT_LOW]= kvm_handle_guest_debug,
[ESR_ELx_EC_BKPT32] = kvm_handle_guest_debug,
[ESR_ELx_EC_BRK64] = kvm_handle_guest_debug,
- [ESR_ELx_EC_FP_ASIMD] = handle_no_fpsimd,
+ [ESR_ELx_EC_FP_ASIMD] = kvm_handle_fpasimd,
};

static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index 12ee62d..95af673 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -158,6 +158,19 @@ abort_guest_exit_end:
1: ret
ENDPROC(__guest_exit)

+ENTRY(__fpsimd_guest_trap)
+ // If virtual CPTR_EL2.TFP is set, then forward the trap to the
+ // virtual EL2. For the non-nested case, this bit is always 0.
+ mrs x1, tpidr_el2
+ ldr x0, [x1, #VIRTUAL_CPTR_EL2]
+ and x0, x0, #CPTR_EL2_TFP
+ cbnz x0, 1f
+ b __fpsimd_guest_restore
+1:
+ mov x0, #ARM_EXCEPTION_TRAP
+ b __guest_exit
+ENDPROC(__fpsimd_guest_trap)
+
ENTRY(__fpsimd_guest_restore)
stp x2, x3, [sp, #-16]!
stp x4, lr, [sp, #-16]!
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index 5170ce1..ab169fd 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -113,7 +113,7 @@ el1_trap:
*/
alternative_if_not ARM64_HAS_NO_FPSIMD
cmp x0, #ESR_ELx_EC_FP_ASIMD
- b.eq __fpsimd_guest_restore
+ b.eq __fpsimd_guest_trap
alternative_else_nop_endif

mrs x1, tpidr_el2
--
1.9.1
Jintack Lim
2017-07-18 16:58:57 UTC
When creating the shadow context for the virtual EL2 execution, we can
directly copy the EL2 register state to the shadow EL1 register state if
the virtual HCR_EL2.E2H bit is set, because the EL1 and EL2 system register
formats are compatible when E2H=1.

Now that we allow the virtual EL2 to modify its EL2 registers without trap,
via physical EL1 system register accesses, we need to reflect the changes
made to the EL1 system registers back to the virtual EL2 register state.
This is not required for the virtual EL2 without VHE, since in that case
the virtual EL2 always uses the _EL2 accessors, which trap to EL2.

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/kvm/context.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index 39bd92d..9947bc8 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -39,6 +39,27 @@ struct el1_el2_map {
{ VBAR_EL1, VBAR_EL2 },
};

+/*
+ * List of pair of EL1/EL2 registers which are used to access real EL2
+ * registers in EL2 with E2H bit set.
+ */
+static const struct el1_el2_map vhe_map[] = {
+ { SCTLR_EL1, SCTLR_EL2 },
+ { CPACR_EL1, CPTR_EL2 },
+ { TTBR0_EL1, TTBR0_EL2 },
+ { TTBR1_EL1, TTBR1_EL2 },
+ { TCR_EL1, TCR_EL2},
+ { AFSR0_EL1, AFSR0_EL2 },
+ { AFSR1_EL1, AFSR1_EL2 },
+ { ESR_EL1, ESR_EL2},
+ { FAR_EL1, FAR_EL2},
+ { MAIR_EL1, MAIR_EL2 },
+ { AMAIR_EL1, AMAIR_EL2 },
+ { VBAR_EL1, VBAR_EL2 },
+ { CONTEXTIDR_EL1, CONTEXTIDR_EL2 },
+ { CNTKCTL_EL1, CNTHCTL_EL2 },
+};
+
static inline u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
{
return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
@@ -57,7 +78,27 @@ static inline u64 cptr_to_cpacr(u64 cptr_el2)
return cpacr_el1;
}

-static void flush_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
+static void sync_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
+{
+ u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
+ int i;
+
+ /*
+ * In the virtual EL2 without VHE no EL1 system registers can't be
+ * changed without trap except el1_non_trap_regs[]. So we have nothing
+ * to sync on exit from a guest.
+ */
+ if (!vcpu_el2_e2h_is_set(vcpu))
+ return;
+
+ for (i = 0; i < ARRAY_SIZE(vhe_map); i++) {
+ const struct el1_el2_map *map = &vhe_map[i];
+
+ vcpu_sys_reg(vcpu, map->el2) = s_sys_regs[map->el1];
+ }
+}
+
+static void flush_shadow_el1_sysregs_nvhe(struct kvm_vcpu *vcpu)
{
u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
u64 tcr_el2;
@@ -86,6 +127,29 @@ static void flush_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
s_sys_regs[CPACR_EL1] = cptr_to_cpacr(vcpu_sys_reg(vcpu, CPTR_EL2));
}

+static void flush_shadow_el1_sysregs_vhe(struct kvm_vcpu *vcpu)
+{
+ u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
+ int i;
+
+ /*
+ * When the E2H bit is set, the EL2 registers become compatible
+ * with the corresponding EL1 registers, so no conversion is required.
+ */
+ for (i = 0; i < ARRAY_SIZE(vhe_map); i++) {
+ const struct el1_el2_map *map = &vhe_map[i];
+
+ s_sys_regs[map->el1] = vcpu_sys_reg(vcpu, map->el2);
+ }
+}
+
+static void flush_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
+{
+ if (vcpu_el2_e2h_is_set(vcpu))
+ flush_shadow_el1_sysregs_vhe(vcpu);
+ else
+ flush_shadow_el1_sysregs_nvhe(vcpu);
+}

/*
* List of EL0 and EL1 registers which we allow the virtual EL2 mode to access
@@ -247,6 +311,7 @@ void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
if (unlikely(is_hyp_ctxt(vcpu))) {
sync_shadow_special_regs(vcpu);
sync_shadow_non_trap_el1_state(vcpu);
+ sync_shadow_el1_sysregs(vcpu);
} else
sync_special_regs(vcpu);
}
--
1.9.1
Jintack Lim
2017-07-18 16:58:58 UTC
While the EL1 virtual memory control registers can be accessed in the
virtual EL2 with VHE without trap, to manipulate the virtual EL2 state, we
can't do that for CPTR_EL2 for an unfortunate reason.

This is because the top bit of CPTR_EL2, which is TCPAC, will be ignored
if it is accessed via CPACR_EL1 in the virtual EL2 without trap, since
the top bit of CPACR_EL1 is RES0. Therefore we need to trap CPACR_EL1
accesses from the virtual EL2 to emulate this bit correctly.

Set the CPTR_EL2.TCPAC bit to trap CPACR_EL1 accesses and handle them in
the existing handler, considering that, in the virtual EL2 with VHE, they
could be meant to access CPTR_EL2 instead.

Note that the CPTR_EL2 format depends on the HCR_EL2.E2H bit. We always
keep it in the v8.0 format for convenience. Otherwise, we would need to
check the E2H bit and use different bit masks in entry.S, and we would also
need to check the E2H bit everywhere we access the virtual CPTR_EL2. The
downside of using the v8.0 format is having to convert the format when
copying state between CPTR_EL2 and CPACR_EL1 to support the virtual EL2
with VHE. This decision is subject to change depending on future discussion.

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/include/asm/kvm_emulate.h | 2 ++
arch/arm64/kvm/context.c | 29 ++++++++++++++++++++++++++---
arch/arm64/kvm/hyp/switch.c | 2 ++
arch/arm64/kvm/sys_regs.c | 18 +++++++++++++++++-
4 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 68aafbd..4776bfc 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -59,6 +59,8 @@ enum exception_type {
void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu);
void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu);
void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt);
+u64 cptr_to_cpacr(u64 cptr_el2);
+u64 cpacr_to_cptr(u64 cpacr_el1);

static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
{
diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index 9947bc8..a7811e1 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -66,7 +66,7 @@ static inline u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
<< TCR_IPS_SHIFT;
}

-static inline u64 cptr_to_cpacr(u64 cptr_el2)
+u64 cptr_to_cpacr(u64 cptr_el2)
{
u64 cpacr_el1 = 0;

@@ -78,6 +78,21 @@ static inline u64 cptr_to_cpacr(u64 cptr_el2)
return cpacr_el1;
}

+u64 cpacr_to_cptr(u64 cpacr_el1)
+{
+ u64 cptr_el2;
+
+ cptr_el2 = CPTR_EL2_DEFAULT;
+ if (!(cpacr_el1 & CPACR_EL1_FPEN))
+ cptr_el2 |= CPTR_EL2_TFP;
+ if (cpacr_el1 & CPACR_EL1_TTA)
+ cptr_el2 |= CPTR_EL2_TTA;
+ if (cpacr_el1 & CPTR_EL2_TCPAC)
+ cptr_el2 |= CPTR_EL2_TCPAC;
+
+ return cptr_el2;
+}
+
static void sync_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
{
u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
@@ -93,8 +108,12 @@ static void sync_shadow_el1_sysregs(struct kvm_vcpu *vcpu)

for (i = 0; i < ARRAY_SIZE(vhe_map); i++) {
const struct el1_el2_map *map = &vhe_map[i];
+ u64 *el2_reg = &vcpu_sys_reg(vcpu, map->el2);

- vcpu_sys_reg(vcpu, map->el2) = s_sys_regs[map->el1];
+ /* We do trap-and-emulate CPACR_EL1 accesses. So, don't sync */
+ if (map->el2 == CPTR_EL2)
+ continue;
+ *el2_reg = s_sys_regs[map->el1];
}
}

@@ -138,8 +157,12 @@ static void flush_shadow_el1_sysregs_vhe(struct kvm_vcpu *vcpu)
*/
for (i = 0; i < ARRAY_SIZE(vhe_map); i++) {
const struct el1_el2_map *map = &vhe_map[i];
+ u64 *el1_reg = &s_sys_regs[map->el1];

- s_sys_regs[map->el1] = vcpu_sys_reg(vcpu, map->el2);
+ if (map->el2 == CPTR_EL2)
+ *el1_reg = cptr_to_cpacr(vcpu_sys_reg(vcpu, map->el2));
+ else
+ *el1_reg = vcpu_sys_reg(vcpu, map->el2);
}
}

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index fffd0c7..50c90f2 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -50,6 +50,8 @@ static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
val = read_sysreg(cpacr_el1);
val |= CPACR_EL1_TTA;
val &= ~CPACR_EL1_FPEN;
+ if (is_hyp_ctxt(vcpu))
+ val |= CPTR_EL2_TCPAC;
write_sysreg(val, cpacr_el1);

write_sysreg(__kvm_hyp_vector, vbar_el1);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 2aa922c..79980be 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -972,7 +972,23 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
- access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
+ u64 reg = sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+ /*
+ * When the virtual HCR_EL2.E2H == 1, an access to CPACR_EL1
+ * in the virtual EL2 really accesses CPTR_EL2.
+ */
+ if (vcpu_el2_e2h_is_set(vcpu) && (reg == SYS_CPACR_EL1)) {
+ u64 *sysreg = &vcpu_sys_reg(vcpu, CPTR_EL2);
+
+ /* We keep the value in ARMv8.0 CPTR_EL2 format. */
+ if (!p->is_write)
+ p->regval = cptr_to_cpacr(*sysreg);
+ else
+ *sysreg = cpacr_to_cptr(p->regval);
+ } else /* CPACR_EL1 access with E2H == 0 or CPACR_EL12 access */
+ access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
+
return true;
}
--
1.9.1
Jintack Lim
2017-07-18 16:59:04 UTC
Forward CPACR_EL1 traps to the virtual EL2 if virtual CPTR_EL2 is
configured to trap CPACR_EL1 accesses from EL1.

This is for recursive nested virtualization.

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/kvm/sys_regs.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 6f67666..ba2966d 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1091,6 +1091,11 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
if (el12_reg(p) && forward_nv_traps(vcpu))
return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));

+ /* Forward this trap to the virtual EL2 if CPTR_EL2.TCPAC is set */
+ if (!el12_reg(p) && !vcpu_mode_el2(vcpu) &&
+ (vcpu_sys_reg(vcpu, CPTR_EL2) & CPTR_EL2_TCPAC))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
/*
* When the virtual HCR_EL2.E2H == 1, an access to CPACR_EL1
* in the virtual EL2 really accesses CPTR_EL2.
--
1.9.1
Jintack Lim
2017-07-18 16:59:00 UTC
Forward traps caused by the HCR_EL2.NV bit to the virtual EL2 if they did
not come from the virtual EL2 and the virtual HCR_EL2.NV bit is set.

This is for recursive nested virtualization.

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/include/asm/kvm_arm.h | 1 +
arch/arm64/include/asm/kvm_coproc.h | 1 +
arch/arm64/kvm/handle_exit.c | 13 +++++++++++++
arch/arm64/kvm/sys_regs.c | 22 ++++++++++++++++++++++
4 files changed, 37 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 6e99978..aeaac4e 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -23,6 +23,7 @@
#include <asm/types.h>

/* Hyp Configuration Register (HCR) bits */
+#define HCR_NV (UL(1) << 42)
#define HCR_E2H (UL(1) << 34)
#define HCR_ID (UL(1) << 33)
#define HCR_CD (UL(1) << 32)
diff --git a/arch/arm64/include/asm/kvm_coproc.h b/arch/arm64/include/asm/kvm_coproc.h
index 1b3d21b..6223df6 100644
--- a/arch/arm64/include/asm/kvm_coproc.h
+++ b/arch/arm64/include/asm/kvm_coproc.h
@@ -44,6 +44,7 @@ void kvm_register_target_sys_reg_table(unsigned int target,
int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
int kvm_handle_sys(struct kvm_vcpu *vcpu, struct kvm_run *run);
+bool forward_nv_traps(struct kvm_vcpu *vcpu);

#define kvm_coproc_table_init kvm_sys_reg_table_init
void kvm_sys_reg_table_init(void);
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index d4e7b2b..fccd9d6 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -61,6 +61,12 @@ static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
int ret;

+ /*
+ * Forward this trapped smc instruction to the virtual EL2.
+ */
+ if (forward_nv_traps(vcpu) && (vcpu_sys_reg(vcpu, HCR_EL2) & HCR_TSC))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
/* If imm is non-zero, it's not defined */
if (kvm_vcpu_hvc_get_imm(vcpu)) {
kvm_inject_undefined(vcpu);
@@ -197,6 +203,13 @@ static int kvm_handle_eret(struct kvm_vcpu *vcpu, struct kvm_run *run)
vcpu_el2_sreg(vcpu, SPSR_EL2));

/*
+ * Forward this trap to the virtual EL2 if the virtual HCR_EL2.NV
+ * bit is set.
+ */
+ if (forward_nv_traps(vcpu))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
+ /*
* Note that the current exception level is always the virtual EL2,
* since we set HCR_EL2.NV bit only when entering the virtual EL2.
*/
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 910b50d..4fd7090 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -939,6 +939,14 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
return true;
}

+/* This function supports recursive nested virtualization */
+bool forward_nv_traps(struct kvm_vcpu *vcpu)
+{
+ if (!vcpu_mode_el2(vcpu) && (vcpu_sys_reg(vcpu, HCR_EL2) & HCR_NV))
+ return true;
+ return false;
+}
+
static inline void access_rw(struct sys_reg_params *p, u64 *sysreg)
{
if (!p->is_write)
@@ -977,6 +985,13 @@ static bool trap_el2_regs(struct kvm_vcpu *vcpu,
{
u64 *sys_reg;

+ /*
+ * Forward this trap to the virtual EL2 if the virtual HCR_EL2.NV
+ * bit is set.
+ */
+ if (forward_nv_traps(vcpu))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
sys_reg = get_special_reg(vcpu, p);
if (!sys_reg)
sys_reg = &vcpu_sys_reg(vcpu, r->reg);
@@ -1914,6 +1929,13 @@ static int emulate_sys_instr(struct kvm_vcpu *vcpu,
{
int ret = 0;

+ /*
+ * Forward this trap to the virtual EL2 if the virtual HCR_EL2.NV
+ * bit is set.
+ */
+ if (forward_nv_traps(vcpu))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
/* TLB maintenance instructions */
if (params->CRn == 0b1000)
ret = emulate_tlbi(vcpu, params);
--
1.9.1
Jintack Lim
2017-07-18 16:58:49 UTC
Now that the psci call is done via the smc instruction when nested
virtualization is enabled, it is clear that all hvc instructions from the
VM (including from the virtual EL2) are supposed to be handled in the
virtual EL2.

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/kvm/handle_exit.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 6cf6b93..8b398b2 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -42,6 +42,12 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
kvm_vcpu_hvc_get_imm(vcpu));
vcpu->stat.hvc_exit_stat++;

+ /* Forward hvc instructions to the virtual EL2 if the guest has EL2. */
+ if (nested_virt_in_use(vcpu)) {
+ kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+ return 1;
+ }
+
ret = kvm_psci_call(vcpu);
if (ret < 0) {
kvm_inject_undefined(vcpu);
--
1.9.1
Jintack Lim
2017-07-18 16:58:48 UTC
VMs used to execute hvc #0 for the psci call if EL3 is not implemented.
However, when we come to provide the virtual EL2 mode to the VM, the
host OS inside the VM calls kvm_call_hyp(), which is also hvc #0. So,
it's hard to differentiate between the two from the host hypervisor's
point of view.

So, let the VM execute the smc instruction for the psci call. On ARMv8.3,
even if EL3 is not implemented, an smc instruction executed at non-secure
EL1 is trapped to EL2 if HCR_EL2.TSC==1, rather than being treated as
UNDEFINED. So, the host hypervisor can handle this psci call without any
confusion.
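
For example, a guest using the SMC conduit would issue a PSCI call roughly
as follows (an illustrative snippet; 0x84000000 is the PSCI v0.2
PSCI_VERSION function ID, and any arguments go in x1-x3):

	mov	w0, #0x84000000		// PSCI_0_2_FN_PSCI_VERSION
	smc	#0			// traps to EL2 when HCR_EL2.TSC==1
	// the result is returned in x0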

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/kvm/handle_exit.c | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index d19e253..6cf6b93 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -53,8 +53,28 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)

static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
- kvm_inject_undefined(vcpu);
- return 1;
+ int ret;
+
+ /* If imm is non-zero, it's not defined */
+ if (kvm_vcpu_hvc_get_imm(vcpu)) {
+ kvm_inject_undefined(vcpu);
+ return 1;
+ }
+
+ /*
+ * If imm is zero, it's a psci call.
+ * Note that on ARMv8.3, even if EL3 is not implemented, SMC executed
+ * at Non-secure EL1 is trapped to EL2 if HCR_EL2.TSC==1, rather than
+ * being treated as UNDEFINED.
+ */
+ ret = kvm_psci_call(vcpu);
+ if (ret < 0) {
+ kvm_inject_undefined(vcpu);
+ return 1;
+ }
+ kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+
+ return ret;
}

/*
--
1.9.1
Jintack Lim
2017-07-18 16:58:47 UTC
When the HCR.NV bit is set, execution of the EL2 translation regime address
translation instructions and TLB maintenance instructions is trapped to
EL2. In addition, execution of the EL1 translation regime address
translation instructions and TLB maintenance instructions that are only
accessible from EL2 and above is trapped to EL2. In these cases,
ESR_EL2.EC will be set to 0x18.

Change the existing handler to handle those system instructions as well
as MRS/MSR instructions. Emulation of each system instruction will be
done in separate patches.

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/include/asm/kvm_coproc.h | 2 +-
arch/arm64/kvm/handle_exit.c | 2 +-
arch/arm64/kvm/sys_regs.c | 53 ++++++++++++++++++++++++++++++++-----
arch/arm64/kvm/trace.h | 2 +-
4 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_coproc.h b/arch/arm64/include/asm/kvm_coproc.h
index 0b52377..1b3d21b 100644
--- a/arch/arm64/include/asm/kvm_coproc.h
+++ b/arch/arm64/include/asm/kvm_coproc.h
@@ -43,7 +43,7 @@ void kvm_register_target_sys_reg_table(unsigned int target,
int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
-int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_sys(struct kvm_vcpu *vcpu, struct kvm_run *run);

#define kvm_coproc_table_init kvm_sys_reg_table_init
void kvm_sys_reg_table_init(void);
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 9259881..d19e253 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -174,7 +174,7 @@ static int kvm_handle_eret(struct kvm_vcpu *vcpu, struct kvm_run *run)
[ESR_ELx_EC_SMC32] = handle_smc,
[ESR_ELx_EC_HVC64] = handle_hvc,
[ESR_ELx_EC_SMC64] = handle_smc,
- [ESR_ELx_EC_SYS64] = kvm_handle_sys_reg,
+ [ESR_ELx_EC_SYS64] = kvm_handle_sys,
[ESR_ELx_EC_ERET] = kvm_handle_eret,
[ESR_ELx_EC_IABT_LOW] = kvm_handle_guest_abort,
[ESR_ELx_EC_DABT_LOW] = kvm_handle_guest_abort,
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 7062645..dbf5022 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1808,6 +1808,40 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
return 1;
}

+static int emulate_tlbi(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *params)
+{
+ /* TODO: support tlbi instruction emulation */
+ kvm_inject_undefined(vcpu);
+ return 1;
+}
+
+static int emulate_at(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *params)
+{
+ /* TODO: support address translation instruction emulation */
+ kvm_inject_undefined(vcpu);
+ return 1;
+}
+
+static int emulate_sys_instr(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *params)
+{
+ int ret = 0;
+
+ /* TLB maintenance instructions */
+ if (params->CRn == 0b1000)
+ ret = emulate_tlbi(vcpu, params);
+ /* Address Translation instructions */
+ else if (params->CRn == 0b0111 && params->CRm == 0b1000)
+ ret = emulate_at(vcpu, params);
+
+ if (ret)
+ kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+
+ return ret;
+}
+
static void reset_sys_reg_descs(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *table, size_t num)
{
@@ -1819,18 +1853,19 @@ static void reset_sys_reg_descs(struct kvm_vcpu *vcpu,
}

/**
- * kvm_handle_sys_reg -- handles a mrs/msr trap on a guest sys_reg access
+ * kvm_handle_sys -- handles a system instruction or mrs/msr instruction trap
+ * on a guest execution
* @vcpu: The VCPU pointer
* @run: The kvm_run struct
*/
-int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run)
+int kvm_handle_sys(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
struct sys_reg_params params;
unsigned long esr = kvm_vcpu_get_hsr(vcpu);
int Rt = kvm_vcpu_sys_get_rt(vcpu);
int ret;

- trace_kvm_handle_sys_reg(esr);
+ trace_kvm_handle_sys(esr);

params.is_aarch32 = false;
params.is_32bit = false;
@@ -1842,10 +1877,16 @@ int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run)
params.regval = vcpu_get_reg(vcpu, Rt);
params.is_write = !(esr & 1);

- ret = emulate_sys_reg(vcpu, &params);
+ if (params.Op0 == 1) {
+ /* System instructions */
+ ret = emulate_sys_instr(vcpu, &params);
+ } else {
+ /* MRS/MSR instructions */
+ ret = emulate_sys_reg(vcpu, &params);
+ if (!params.is_write)
+ vcpu_set_reg(vcpu, Rt, params.regval);
+ }

- if (!params.is_write)
- vcpu_set_reg(vcpu, Rt, params.regval);
return ret;
}

diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
index 5f40987..192708e 100644
--- a/arch/arm64/kvm/trace.h
+++ b/arch/arm64/kvm/trace.h
@@ -134,7 +134,7 @@
TP_printk("%s %s reg %d (0x%08llx)", __entry->fn, __entry->is_write?"write to":"read from", __entry->reg, __entry->write_value)
);

-TRACE_EVENT(kvm_handle_sys_reg,
+TRACE_EVENT(kvm_handle_sys,
TP_PROTO(unsigned long hsr),
TP_ARGS(hsr),
--
1.9.1
Jintack Lim
2017-07-18 16:58:54 UTC
Permalink
With the HCR_EL2.NV bit set, accesses to EL12 registers in the virtual EL2
trap to EL2. Handle those traps just like we do for EL1 registers.

One exception is CNTKCTL_EL12. We don't trap on CNTKCTL_EL1 for the non-VHE
virtual EL2 because we don't have to. However, accessing CNTKCTL_EL12
will trap, since it is one of the EL12 registers controlled by the
HCR_EL2.NV bit. Therefore, add a handler for it and don't treat it as a
non-trapped register when preparing the shadow context.

Move EL12 system register macros to a common place to reuse them.

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/include/asm/kvm_hyp.h | 24 ------------------------
arch/arm64/include/asm/sysreg.h | 24 ++++++++++++++++++++++++
arch/arm64/kvm/context.c | 7 +++++++
arch/arm64/kvm/sys_regs.c | 25 +++++++++++++++++++++++++
4 files changed, 56 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 4572a9b..353b895 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -73,30 +73,6 @@
#define read_sysreg_el1(r) read_sysreg_elx(r, _EL1, _EL12)
#define write_sysreg_el1(v,r) write_sysreg_elx(v, r, _EL1, _EL12)

-/* The VHE specific system registers and their encoding */
-#define sctlr_EL12 sys_reg(3, 5, 1, 0, 0)
-#define cpacr_EL12 sys_reg(3, 5, 1, 0, 2)
-#define ttbr0_EL12 sys_reg(3, 5, 2, 0, 0)
-#define ttbr1_EL12 sys_reg(3, 5, 2, 0, 1)
-#define tcr_EL12 sys_reg(3, 5, 2, 0, 2)
-#define afsr0_EL12 sys_reg(3, 5, 5, 1, 0)
-#define afsr1_EL12 sys_reg(3, 5, 5, 1, 1)
-#define esr_EL12 sys_reg(3, 5, 5, 2, 0)
-#define far_EL12 sys_reg(3, 5, 6, 0, 0)
-#define mair_EL12 sys_reg(3, 5, 10, 2, 0)
-#define amair_EL12 sys_reg(3, 5, 10, 3, 0)
-#define vbar_EL12 sys_reg(3, 5, 12, 0, 0)
-#define contextidr_EL12 sys_reg(3, 5, 13, 0, 1)
-#define cntkctl_EL12 sys_reg(3, 5, 14, 1, 0)
-#define cntp_tval_EL02 sys_reg(3, 5, 14, 2, 0)
-#define cntp_ctl_EL02 sys_reg(3, 5, 14, 2, 1)
-#define cntp_cval_EL02 sys_reg(3, 5, 14, 2, 2)
-#define cntv_tval_EL02 sys_reg(3, 5, 14, 3, 0)
-#define cntv_ctl_EL02 sys_reg(3, 5, 14, 3, 1)
-#define cntv_cval_EL02 sys_reg(3, 5, 14, 3, 2)
-#define spsr_EL12 sys_reg(3, 5, 4, 0, 0)
-#define elr_EL12 sys_reg(3, 5, 4, 0, 1)
-
/**
* hyp_alternate_select - Generates patchable code sequences that are
* used to switch between two implementations of a function, depending
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index b01c608..b8d4d0c 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -338,6 +338,30 @@
#define SYS_CNTVOFF_EL2 sys_reg(3, 4, 14, 0, 3)
#define SYS_CNTHCTL_EL2 sys_reg(3, 4, 14, 1, 0)

+/* The VHE specific system registers and their encoding */
+#define sctlr_EL12 sys_reg(3, 5, 1, 0, 0)
+#define cpacr_EL12 sys_reg(3, 5, 1, 0, 2)
+#define ttbr0_EL12 sys_reg(3, 5, 2, 0, 0)
+#define ttbr1_EL12 sys_reg(3, 5, 2, 0, 1)
+#define tcr_EL12 sys_reg(3, 5, 2, 0, 2)
+#define afsr0_EL12 sys_reg(3, 5, 5, 1, 0)
+#define afsr1_EL12 sys_reg(3, 5, 5, 1, 1)
+#define esr_EL12 sys_reg(3, 5, 5, 2, 0)
+#define far_EL12 sys_reg(3, 5, 6, 0, 0)
+#define mair_EL12 sys_reg(3, 5, 10, 2, 0)
+#define amair_EL12 sys_reg(3, 5, 10, 3, 0)
+#define vbar_EL12 sys_reg(3, 5, 12, 0, 0)
+#define contextidr_EL12 sys_reg(3, 5, 13, 0, 1)
+#define cntkctl_EL12 sys_reg(3, 5, 14, 1, 0)
+#define cntp_tval_EL02 sys_reg(3, 5, 14, 2, 0)
+#define cntp_ctl_EL02 sys_reg(3, 5, 14, 2, 1)
+#define cntp_cval_EL02 sys_reg(3, 5, 14, 2, 2)
+#define cntv_tval_EL02 sys_reg(3, 5, 14, 3, 0)
+#define cntv_ctl_EL02 sys_reg(3, 5, 14, 3, 1)
+#define cntv_cval_EL02 sys_reg(3, 5, 14, 3, 2)
+#define spsr_EL12 sys_reg(3, 5, 4, 0, 0)
+#define elr_EL12 sys_reg(3, 5, 4, 0, 1)
+
#define SYS_SP_EL2 sys_reg(3, 6, 4, 1, 0)

/* Common SCTLR_ELx flags. */
diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index e1bc753..f3d3398 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -121,6 +121,13 @@ static void copy_shadow_non_trap_el1_state(struct kvm_vcpu *vcpu, bool setup)
for (i = 0; i < ARRAY_SIZE(el1_non_trap_regs); i++) {
const int sr = el1_non_trap_regs[i];

+ /*
+ * We trap on cntkctl_el12 accesses from virtual EL2, as opposed
+ * to not trapping on cntkctl_el1 accesses.
+ */
+ if (vcpu_el2_e2h_is_set(vcpu) && sr == CNTKCTL_EL1)
+ continue;
+
if (setup)
s_sys_regs[sr] = vcpu_sys_reg(vcpu, sr);
else
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index b3e0cb8..2aa922c 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -905,6 +905,14 @@ static inline void access_rw(struct sys_reg_params *p, u64 *sysreg)
*sysreg = p->regval;
}

+static bool access_cntkctl_el12(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
+ return true;
+}
+
static u64 *get_special_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
{
u64 reg = sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
@@ -1201,6 +1209,23 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
{ SYS_DESC(SYS_CNTVOFF_EL2), trap_el2_regs, reset_val, CNTVOFF_EL2, 0 },
{ SYS_DESC(SYS_CNTHCTL_EL2), trap_el2_regs, reset_val, CNTHCTL_EL2, 0 },

+ { SYS_DESC(sctlr_EL12), access_vm_reg, reset_val, SCTLR_EL1, 0x00C50078 },
+ { SYS_DESC(cpacr_EL12), access_cpacr, reset_val, CPACR_EL1, 0 },
+ { SYS_DESC(ttbr0_EL12), access_vm_reg, reset_unknown, TTBR0_EL1 },
+ { SYS_DESC(ttbr1_EL12), access_vm_reg, reset_unknown, TTBR1_EL1 },
+ { SYS_DESC(tcr_EL12), access_vm_reg, reset_val, TCR_EL1, 0 },
+ { SYS_DESC(spsr_EL12), access_spsr},
+ { SYS_DESC(elr_EL12), access_elr},
+ { SYS_DESC(afsr0_EL12), access_vm_reg, reset_unknown, AFSR0_EL1 },
+ { SYS_DESC(afsr1_EL12), access_vm_reg, reset_unknown, AFSR1_EL1 },
+ { SYS_DESC(esr_EL12), access_vm_reg, reset_unknown, ESR_EL1 },
+ { SYS_DESC(far_EL12), access_vm_reg, reset_unknown, FAR_EL1 },
+ { SYS_DESC(mair_EL12), access_vm_reg, reset_unknown, MAIR_EL1 },
+ { SYS_DESC(amair_EL12), access_vm_reg, reset_amair_el1, AMAIR_EL1 },
+ { SYS_DESC(vbar_EL12), access_vbar, reset_val, VBAR_EL1, 0 },
+ { SYS_DESC(contextidr_EL12), access_vm_reg, reset_val, CONTEXTIDR_EL1, 0 },
+ { SYS_DESC(cntkctl_EL12), access_cntkctl_el12, reset_val, CNTKCTL_EL1, 0 },
+
{ SYS_DESC(SYS_SP_EL2), NULL, reset_special, SP_EL2, 0},
};
--
1.9.1
Jintack Lim
2017-07-18 16:58:53 UTC
ARMv8.1 added more EL2 registers: TTBR1_EL2, CONTEXTIDR_EL2, and three
EL2 virtual timer registers. Add the first two registers to the vcpu
context and set their handlers. The timer registers and their handlers
will be added in a separate patch.

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/include/asm/kvm_host.h | 2 ++
arch/arm64/include/asm/sysreg.h | 2 ++
arch/arm64/kvm/sys_regs.c | 2 ++
3 files changed, 6 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 53b0b33..373235c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -151,6 +151,7 @@ enum vcpu_sysreg {
HSTR_EL2, /* Hypervisor System Trap Register */
HACR_EL2, /* Hypervisor Auxiliary Control Register */
TTBR0_EL2, /* Translation Table Base Register 0 (EL2) */
+ TTBR1_EL2, /* Translation Table Base Register 1 (EL2) */
TCR_EL2, /* Translation Control Register (EL2) */
VTTBR_EL2, /* Virtualization Translation Table Base Register */
VTCR_EL2, /* Virtualization Translation Control Register */
@@ -164,6 +165,7 @@ enum vcpu_sysreg {
VBAR_EL2, /* Vector Base Address Register (EL2) */
RVBAR_EL2, /* Reset Vector Base Address Register */
RMR_EL2, /* Reset Management Register */
+ CONTEXTIDR_EL2, /* Context ID Register (EL2) */
TPIDR_EL2, /* EL2 Software Thread ID Register */
CNTVOFF_EL2, /* Counter-timer Virtual Offset register */
CNTHCTL_EL2, /* Counter-timer Hypervisor Control register */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 6373d3d..b01c608 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -264,6 +264,7 @@
#define SYS_HACR_EL2 sys_reg(3, 4, 1, 1, 7)

#define SYS_TTBR0_EL2 sys_reg(3, 4, 2, 0, 0)
+#define SYS_TTBR1_EL2 sys_reg(3, 4, 2, 0, 1)
#define SYS_TCR_EL2 sys_reg(3, 4, 2, 0, 2)
#define SYS_VTTBR_EL2 sys_reg(3, 4, 2, 1, 0)
#define SYS_VTCR_EL2 sys_reg(3, 4, 2, 1, 2)
@@ -331,6 +332,7 @@
#define SYS_ICH_LR14_EL2 __SYS__LR8_EL2(6)
#define SYS_ICH_LR15_EL2 __SYS__LR8_EL2(7)

+#define SYS_CONTEXTIDR_EL2 sys_reg(3, 4, 13, 0, 1)
#define SYS_TPIDR_EL2 sys_reg(3, 4, 13, 0, 2)

#define SYS_CNTVOFF_EL2 sys_reg(3, 4, 14, 0, 3)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index dbf5022..b3e0cb8 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1168,6 +1168,7 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
{ SYS_DESC(SYS_HACR_EL2), trap_el2_regs, reset_val, HACR_EL2, 0 },

{ SYS_DESC(SYS_TTBR0_EL2), trap_el2_regs, reset_val, TTBR0_EL2, 0 },
+ { SYS_DESC(SYS_TTBR1_EL2), trap_el2_regs, reset_val, TTBR1_EL2, 0 },
{ SYS_DESC(SYS_TCR_EL2), trap_el2_regs, reset_val, TCR_EL2, 0 },
{ SYS_DESC(SYS_VTTBR_EL2), trap_el2_regs, reset_val, VTTBR_EL2, 0 },
{ SYS_DESC(SYS_VTCR_EL2), trap_el2_regs, reset_val, VTCR_EL2, 0 },
@@ -1194,6 +1195,7 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
{ SYS_DESC(SYS_RVBAR_EL2), trap_el2_regs, reset_val, RVBAR_EL2, 0 },
{ SYS_DESC(SYS_RMR_EL2), trap_el2_regs, reset_val, RMR_EL2, 0 },

+ { SYS_DESC(SYS_CONTEXTIDR_EL2), trap_el2_regs, reset_val, CONTEXTIDR_EL2, 0 },
{ SYS_DESC(SYS_TPIDR_EL2), trap_el2_regs, reset_val, TPIDR_EL2, 0 },

{ SYS_DESC(SYS_CNTVOFF_EL2), trap_el2_regs, reset_val, CNTVOFF_EL2, 0 },
--
1.9.1
Jintack Lim
2017-07-18 16:58:56 UTC
Permalink
When the virtual E2H bit is set, we can support EL2 register accesses
via EL1 registers from the virtual EL2 by doing trap-and-emulate. A
better alternative, however, is to allow the virtual EL2 to access EL2
register states without trapping. This can easily be achieved by not
trapping those EL1 registers, since they already hold the EL2 register
states.
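
As a minimal standalone model of the resulting trap configuration (the
HCR bit positions are assumptions for the sketch; the real logic in the
switch.c hunk below operates on the vcpu state):

#include <stdbool.h>
#include <stdint.h>

#define HCR_TVM		(1ULL << 26)	/* assumed: trap VM-register writes */
#define HCR_TRVM	(1ULL << 30)	/* assumed: trap VM-register reads  */

/* Which TVM/TRVM bits the host sets in the hardware HCR_EL2 while a
 * guest hypervisor context is running. */
static uint64_t vm_trap_bits(bool virt_e2h, uint64_t virt_hcr_el2)
{
	if (!virt_e2h)				/* v8.0 guest hypervisor:  */
		return HCR_TVM | HCR_TRVM;	/* trap-and-emulate all    */

	/* VHE guest hypervisor: no traps needed for its own EL2 state,
	 * but keep honoring the traps it requested itself. */
	return virt_hcr_el2 & (HCR_TVM | HCR_TRVM);
}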

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/kvm/hyp/switch.c | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index d513da9..fffd0c7 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -74,6 +74,7 @@ static hyp_alternate_select(__activate_traps_arch,
static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
{
u64 val;
+ u64 vhcr_el2;

/*
* We are about to set CPTR_EL2.TFP to trap all floating point
@@ -89,8 +90,26 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
write_sysreg(1 << 30, fpexc32_el2);
isb();
}
- if (vcpu_mode_el2(vcpu))
- val |= HCR_TVM | HCR_TRVM;
+
+ if (is_hyp_ctxt(vcpu)) {
+ /*
+ * For a guest hypervisor on v8.0, trap and emulate the EL1
+ * virtual memory control register accesses.
+ */
+ if (!vcpu_el2_e2h_is_set(vcpu))
+ val |= HCR_TVM | HCR_TRVM;
+ /*
+ * For a guest hypervisor on v8.1 (VHE), allow to access the
+ * EL1 virtual memory control registers natively. These accesses
+ * are to access EL2 register states.
+ * Note that we still need to respect the virtual HCR_EL2 state.
+ */
+ else {
+ vhcr_el2 = vcpu_sys_reg(vcpu, HCR_EL2);
+ val |= vhcr_el2 & (HCR_TVM | HCR_TRVM);
+ }
+ }
+
write_sysreg(val, hcr_el2);
/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
write_sysreg(1 << 15, hstr_el2);
--
1.9.1
Jintack Lim
2017-07-18 16:58:34 UTC
Permalink
To support the virtual EL2 execution, we need to maintain the EL2
special registers such as SPSR_EL2, ELR_EL2 and SP_EL2 in vcpu context.

Note that SP_EL2 is not accessible in EL2, so we don't need a trap
handler for this register.
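
A minimal model of the new storage (illustrative only; the real accessor
is the vcpu_el2_sreg() macro added below, and index 0 is reserved so a
descriptor with an unset reg field can be caught by reset_special()):

#include <stdint.h>

enum { INVALID_EL2_SREG, SPSR_EL2, ELR_EL2, SP_EL2, NR_EL2_SPECIAL_REGS };

struct vcpu_ctxt {
	uint64_t el2_special_regs[NR_EL2_SPECIAL_REGS];	/* separate from sys_regs */
};

/* Mirrors vcpu_el2_sreg(vcpu, reg). */
static uint64_t read_el2_sreg(struct vcpu_ctxt *c, int reg)
{
	return c->el2_special_regs[reg];
}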

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/include/asm/kvm_host.h | 12 ++++++++++++
arch/arm64/include/asm/sysreg.h | 4 ++++
arch/arm64/kvm/sys_regs.c | 38 +++++++++++++++++++++++++++++++++-----
arch/arm64/kvm/sys_regs.h | 8 ++++++++
4 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 1dc4ed6..57dccde 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -171,6 +171,15 @@ enum vcpu_sysreg {
NR_SYS_REGS /* Nothing after this line! */
};

+enum el2_special_regs {
+ __INVALID_EL2_SPECIAL_REG__,
+ SPSR_EL2, /* Saved Program Status Register (EL2) */
+ ELR_EL2, /* Exception Link Register (EL2) */
+ SP_EL2, /* Stack Pointer (EL2) */
+
+ NR_EL2_SPECIAL_REGS
+};
+
/* 32bit mapping */
#define c0_MPIDR (MPIDR_EL1 * 2) /* MultiProcessor ID Register */
#define c0_CSSELR (CSSELR_EL1 * 2)/* Cache Size Selection Register */
@@ -218,6 +227,8 @@ struct kvm_cpu_context {
u64 sys_regs[NR_SYS_REGS];
u32 copro[NR_COPRO_REGS];
};
+
+ u64 el2_special_regs[NR_EL2_SPECIAL_REGS];
};

typedef struct kvm_cpu_context kvm_cpu_context_t;
@@ -307,6 +318,7 @@ struct kvm_vcpu_arch {

#define vcpu_gp_regs(v) (&(v)->arch.ctxt.gp_regs)
#define vcpu_sys_reg(v,r) ((v)->arch.ctxt.sys_regs[(r)])
+#define vcpu_el2_sreg(v,r) ((v)->arch.ctxt.el2_special_regs[(r)])
/*
* CP14 and CP15 live in the same array, as they are backed by the
* same system registers.
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 9277c4a..98c32ef 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -268,6 +268,8 @@

#define SYS_DACR32_EL2 sys_reg(3, 4, 3, 0, 0)

+#define SYS_SPSR_EL2 sys_reg(3, 4, 4, 0, 0)
+#define SYS_ELR_EL2 sys_reg(3, 4, 4, 0, 1)
#define SYS_SP_EL1 sys_reg(3, 4, 4, 1, 0)

#define SYS_IFSR32_EL2 sys_reg(3, 4, 5, 0, 1)
@@ -332,6 +334,8 @@
#define SYS_CNTVOFF_EL2 sys_reg(3, 4, 14, 0, 3)
#define SYS_CNTHCTL_EL2 sys_reg(3, 4, 14, 1, 0)

+#define SYS_SP_EL2 sys_reg(3, 6, 4, 1, 0)
+
/* Common SCTLR_ELx flags. */
#define SCTLR_ELx_EE (1 << 25)
#define SCTLR_ELx_I (1 << 12)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1568f8b..2b3ed70 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -900,15 +900,33 @@ static inline void access_rw(struct sys_reg_params *p, u64 *sysreg)
*sysreg = p->regval;
}

+static u64 *get_special_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
+{
+ u64 reg = sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+ switch (reg) {
+ case SYS_SP_EL1:
+ return &vcpu->arch.ctxt.gp_regs.sp_el1;
+ case SYS_ELR_EL2:
+ return &vcpu_el2_sreg(vcpu, ELR_EL2);
+ case SYS_SPSR_EL2:
+ return &vcpu_el2_sreg(vcpu, SPSR_EL2);
+ default:
+ return NULL;
+ }
+}
+
static bool trap_el2_regs(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
- /* SP_EL1 is NOT maintained in sys_regs array */
- if (sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2) == SYS_SP_EL1)
- access_rw(p, &vcpu->arch.ctxt.gp_regs.sp_el1);
- else
- access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
+ u64 *sys_reg;
+
+ sys_reg = get_special_reg(vcpu, p);
+ if (!sys_reg)
+ sys_reg = &vcpu_sys_reg(vcpu, r->reg);
+
+ access_rw(p, sys_reg);

return true;
}
@@ -1116,6 +1134,8 @@ static bool trap_el2_regs(struct kvm_vcpu *vcpu,

{ SYS_DESC(SYS_DACR32_EL2), NULL, reset_unknown, DACR32_EL2 },

+ { SYS_DESC(SYS_SPSR_EL2), trap_el2_regs, reset_special, SPSR_EL2, 0 },
+ { SYS_DESC(SYS_ELR_EL2), trap_el2_regs, reset_special, ELR_EL2, 0 },
{ SYS_DESC(SYS_SP_EL1), trap_el2_regs },

{ SYS_DESC(SYS_IFSR32_EL2), NULL, reset_unknown, IFSR32_EL2 },
@@ -1138,6 +1158,8 @@ static bool trap_el2_regs(struct kvm_vcpu *vcpu,

{ SYS_DESC(SYS_CNTVOFF_EL2), trap_el2_regs, reset_val, CNTVOFF_EL2, 0 },
{ SYS_DESC(SYS_CNTHCTL_EL2), trap_el2_regs, reset_val, CNTHCTL_EL2, 0 },
+
+ { SYS_DESC(SYS_SP_EL2), NULL, reset_special, SP_EL2, 0},
};

static bool trap_dbgidr(struct kvm_vcpu *vcpu,
@@ -2271,6 +2293,8 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)

/* Catch someone adding a register without putting in reset entry. */
memset(&vcpu->arch.ctxt.sys_regs, 0x42, sizeof(vcpu->arch.ctxt.sys_regs));
+ memset(&vcpu->arch.ctxt.el2_special_regs, 0x42,
+ sizeof(vcpu->arch.ctxt.el2_special_regs));

/* Generic chip reset first (so target could override). */
reset_sys_reg_descs(vcpu, sys_reg_descs, ARRAY_SIZE(sys_reg_descs));
@@ -2281,4 +2305,8 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
for (num = 1; num < NR_SYS_REGS; num++)
if (vcpu_sys_reg(vcpu, num) == 0x4242424242424242)
panic("Didn't reset vcpu_sys_reg(%zi)", num);
+
+ for (num = 1; num < NR_EL2_SPECIAL_REGS; num++)
+ if (vcpu_el2_sreg(vcpu, num) == 0x4242424242424242)
+ panic("Didn't reset vcpu_el2_sreg(%zi)", num);
}
diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
index 060f534..827717b 100644
--- a/arch/arm64/kvm/sys_regs.h
+++ b/arch/arm64/kvm/sys_regs.h
@@ -99,6 +99,14 @@ static inline void reset_val(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r
vcpu_sys_reg(vcpu, r->reg) = r->val;
}

+static inline void reset_special(struct kvm_vcpu *vcpu,
+ const struct sys_reg_desc *r)
+{
+ BUG_ON(!r->reg);
+ BUG_ON(r->reg >= NR_EL2_SPECIAL_REGS);
+ vcpu_el2_sreg(vcpu, r->reg) = r->val;
+}
+
static inline int cmp_sys_reg(const struct sys_reg_desc *i1,
const struct sys_reg_desc *i2)
{
--
1.9.1
Jintack Lim
2017-07-18 16:58:38 UTC
Permalink
From: Christoffer Dall <***@linaro.org>

Some bits of the TCR weren't defined, and since we're about to use
them in KVM, add the missing defines.

Signed-off-by: Christoffer Dall <***@linaro.org>
---
arch/arm64/include/asm/pgtable-hwdef.h | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index eb0c2bd..d26cab7 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -272,9 +272,15 @@
#define TCR_TG1_4K (UL(2) << TCR_TG1_SHIFT)
#define TCR_TG1_64K (UL(3) << TCR_TG1_SHIFT)

+#define TCR_IPS_SHIFT 32
+#define TCR_IPS_MASK (UL(7) << TCR_IPS_SHIFT)
+
#define TCR_ASID16 (UL(1) << 36)
#define TCR_TBI0 (UL(1) << 37)
#define TCR_HA (UL(1) << 39)
#define TCR_HD (UL(1) << 40)

+#define TCR_EPD1 (UL(1) << 23)
+#define TCR_EPD0 (UL(1) << 7)
+
#endif
--
1.9.1
Jintack Lim
2017-07-18 16:58:40 UTC
Permalink
When running in virtual EL2 we use the shadow EL1 system register array
for the save/restore process, so that the hardware, and especially the
memory subsystem, behaves as code written for EL2 expects while really
running in EL1.

This works great for EL1 system register accesses that we trap, because
these accesses will be written into the virtual state for the EL1 system
registers used when eventually switching the VCPU mode to EL1.

However, there is a collection of EL1 system registers which we do not
trap, and as a consequence all save/restore operations on these
registers happened locally in the shadow array, with no benefit to
software actually running in virtual EL1 at all.

To fix this, simply synchronize the shadow and real EL1 state for these
registers on entry/exit to/from virtual EL2 state.

Signed-off-by: Christoffer Dall <***@linaro.org>
Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/kvm/context.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 56 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index e965049..e1bc753 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -86,6 +86,58 @@ static void flush_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
s_sys_regs[CPACR_EL1] = cptr_to_cpacr(vcpu_sys_reg(vcpu, CPTR_EL2));
}

+
+/*
+ * List of EL0 and EL1 registers which we allow the virtual EL2 mode to access
+ * directly without trapping. This is possible because the effect of
+ * accessing those registers is the same regardless of the current
+ * exception level.
+ */
+static const int el1_non_trap_regs[] = {
+ CNTKCTL_EL1,
+ CSSELR_EL1,
+ PAR_EL1,
+ TPIDR_EL0,
+ TPIDR_EL1,
+ TPIDRRO_EL0
+};
+
+/**
+ * copy_shadow_non_trap_el1_state
+ * @vcpu: The VCPU pointer
+ * @setup: True, if on the way to the guest (called from setup)
+ * False, if returning from the guest (called from restore)
+ *
+ * Some EL1 registers are accessed directly by the virtual EL2 mode because
+ * they in no way affect execution state in virtual EL2. However, we must
+ * still ensure that virtual EL2 observes the same state of the EL1 registers
+ * as the normal VM's EL1 mode, so copy this state as needed on setup/restore.
+ */
+static void copy_shadow_non_trap_el1_state(struct kvm_vcpu *vcpu, bool setup)
+{
+ u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(el1_non_trap_regs); i++) {
+ const int sr = el1_non_trap_regs[i];
+
+ if (setup)
+ s_sys_regs[sr] = vcpu_sys_reg(vcpu, sr);
+ else
+ vcpu_sys_reg(vcpu, sr) = s_sys_regs[sr];
+ }
+}
+
+static void sync_shadow_non_trap_el1_state(struct kvm_vcpu *vcpu)
+{
+ copy_shadow_non_trap_el1_state(vcpu, false);
+}
+
+static void flush_shadow_non_trap_el1_state(struct kvm_vcpu *vcpu)
+{
+ copy_shadow_non_trap_el1_state(vcpu, true);
+}
+
static void flush_shadow_special_regs(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
@@ -162,6 +214,7 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
if (unlikely(vcpu_mode_el2(vcpu))) {
flush_shadow_special_regs(vcpu);
flush_shadow_el1_sysregs(vcpu);
+ flush_shadow_non_trap_el1_state(vcpu);
ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
} else {
flush_special_regs(vcpu);
@@ -176,9 +229,10 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
*/
void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
{
- if (unlikely(vcpu_mode_el2(vcpu)))
+ if (unlikely(vcpu_mode_el2(vcpu))) {
sync_shadow_special_regs(vcpu);
- else
+ sync_shadow_non_trap_el1_state(vcpu);
+ } else
sync_special_regs(vcpu);
}
--
1.9.1
Jintack Lim
2017-07-18 16:58:41 UTC
Permalink
These macros and enums can be reused to inject exceptions
for nested virtualization.

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/include/asm/kvm_emulate.h | 12 ++++++++++++
arch/arm64/kvm/inject_fault.c | 12 ------------
2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 14c4ce9..94f98cc 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -31,6 +31,18 @@
#include <asm/cputype.h>
#include <asm/virt.h>

+#define CURRENT_EL_SP_EL0_VECTOR 0x0
+#define CURRENT_EL_SP_ELx_VECTOR 0x200
+#define LOWER_EL_AArch64_VECTOR 0x400
+#define LOWER_EL_AArch32_VECTOR 0x600
+
+enum exception_type {
+ except_type_sync = 0,
+ except_type_irq = 0x80,
+ except_type_fiq = 0x100,
+ except_type_serror = 0x180,
+};
+
unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu);

diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index da6a8cf..94679fb 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -28,11 +28,6 @@
#define PSTATE_FAULT_BITS_64 (PSR_MODE_EL1h | PSR_A_BIT | PSR_F_BIT | \
PSR_I_BIT | PSR_D_BIT)

-#define CURRENT_EL_SP_EL0_VECTOR 0x0
-#define CURRENT_EL_SP_ELx_VECTOR 0x200
-#define LOWER_EL_AArch64_VECTOR 0x400
-#define LOWER_EL_AArch32_VECTOR 0x600
-
static void prepare_fault32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
{
unsigned long cpsr;
@@ -101,13 +96,6 @@ static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt,
*fsr = 0x14;
}

-enum exception_type {
- except_type_sync = 0,
- except_type_irq = 0x80,
- except_type_fiq = 0x100,
- except_type_serror = 0x180,
-};
-
static u64 get_except_vector(struct kvm_vcpu *vcpu, enum exception_type type)
{
u64 exc_offset;
--
1.9.1
Jintack Lim
2017-07-18 16:58:39 UTC
Permalink
From: Christoffer Dall <***@linaro.org>

When entering virtual EL2, we need to reflect the virtual EL2 register
state in the corresponding shadow EL1 registers. We can simply copy a
register if the two formats are identical; otherwise, we need to convert
the EL2 register state to the EL1 register format.

When entering EL1/EL0, we need special care for MPIDR_EL1. A read of
this register returns the value of VMPIDR_EL2, so when a VM has the
virtual EL2, the value of MPIDR_EL1 should come from the virtual
VMPIDR_EL2.
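
For example, the translation control formats differ: TCR_EL2 keeps the
physical address size in PS (bits [18:16]) while TCR_EL1 keeps it in IPS
(bits [34:32]). A standalone sketch of that conversion (field offsets
assumed from the architectural layouts; the patch uses the kernel's
named masks instead):

#include <stdint.h>

#define TCR_EL2_PS_SHIFT	16	/* assumed: TCR_EL2.PS at [18:16] */
#define TCR_EL2_PS_MASK		(0x7ULL << TCR_EL2_PS_SHIFT)
#define TCR_EL1_IPS_SHIFT	32	/* assumed: TCR_EL1.IPS at [34:32] */

/* Move the EL2 physical-address-size field into its EL1 position. */
static uint64_t el2_ps_to_el1_ips(uint64_t tcr_el2)
{
	return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
		<< TCR_EL1_IPS_SHIFT;
}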

Signed-off-by: Christoffer Dall <***@linaro.org>
Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/kvm/context.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 81 insertions(+)

diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index 2645787..e965049 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -17,6 +17,74 @@

#include <linux/kvm_host.h>
#include <asm/kvm_emulate.h>
+#include <asm/esr.h>
+
+struct el1_el2_map {
+ enum vcpu_sysreg el1;
+ enum vcpu_sysreg el2;
+};
+
+/*
+ * List of EL2 registers which can be directly applied to EL1 registers to
+ * emulate running EL2 in EL1.
+ */
+static const struct el1_el2_map el1_el2_map[] = {
+ { AMAIR_EL1, AMAIR_EL2 },
+ { MAIR_EL1, MAIR_EL2 },
+ { TTBR0_EL1, TTBR0_EL2 },
+ { ACTLR_EL1, ACTLR_EL2 },
+ { AFSR0_EL1, AFSR0_EL2 },
+ { AFSR1_EL1, AFSR1_EL2 },
+ { SCTLR_EL1, SCTLR_EL2 },
+ { VBAR_EL1, VBAR_EL2 },
+};
+
+static inline u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
+{
+ return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
+ << TCR_IPS_SHIFT;
+}
+
+static inline u64 cptr_to_cpacr(u64 cptr_el2)
+{
+ u64 cpacr_el1 = 0;
+
+ if (!(cptr_el2 & CPTR_EL2_TFP))
+ cpacr_el1 |= CPACR_EL1_FPEN;
+ if (cptr_el2 & CPTR_EL2_TTA)
+ cpacr_el1 |= CPACR_EL1_TTA;
+
+ return cpacr_el1;
+}
+
+static void flush_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
+{
+ u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
+ u64 tcr_el2;
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(el1_el2_map); i++) {
+ const struct el1_el2_map *map = &el1_el2_map[i];
+
+ s_sys_regs[map->el1] = vcpu_sys_reg(vcpu, map->el2);
+ }
+
+ tcr_el2 = vcpu_sys_reg(vcpu, TCR_EL2);
+ s_sys_regs[TCR_EL1] =
+ TCR_EPD1 | /* disable TTBR1_EL1 */
+ ((tcr_el2 & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
+ tcr_el2_ips_to_tcr_el1_ps(tcr_el2) |
+ (tcr_el2 & TCR_EL2_TG0_MASK) |
+ (tcr_el2 & TCR_EL2_ORGN0_MASK) |
+ (tcr_el2 & TCR_EL2_IRGN0_MASK) |
+ (tcr_el2 & TCR_EL2_T0SZ_MASK);
+
+ /* Rely on separate VMID for VA context, always use ASID 0 */
+ s_sys_regs[TTBR0_EL1] &= ~GENMASK_ULL(63, 48);
+ s_sys_regs[TTBR1_EL1] = 0;
+
+ s_sys_regs[CPACR_EL1] = cptr_to_cpacr(vcpu_sys_reg(vcpu, CPTR_EL2));
+}

static void flush_shadow_special_regs(struct kvm_vcpu *vcpu)
{
@@ -72,6 +140,17 @@ static void sync_special_regs(struct kvm_vcpu *vcpu)
ctxt->gp_regs.spsr[KVM_SPSR_EL1] = ctxt->hw_spsr_el1;
}

+static void setup_mpidr_el1(struct kvm_vcpu *vcpu)
+{
+ /*
+ * A non-secure EL0 or EL1 read of MPIDR_EL1 returns
+ * the value of VMPIDR_EL2. For nested virtualization,
+ * it comes from the virtual VMPIDR_EL2.
+ */
+ if (nested_virt_in_use(vcpu))
+ vcpu_sys_reg(vcpu, MPIDR_EL1) = vcpu_sys_reg(vcpu, VMPIDR_EL2);
+}
+
/**
* kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
* @vcpu: The VCPU pointer
@@ -82,9 +161,11 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)

if (unlikely(vcpu_mode_el2(vcpu))) {
flush_shadow_special_regs(vcpu);
+ flush_shadow_el1_sysregs(vcpu);
ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
} else {
flush_special_regs(vcpu);
+ setup_mpidr_el1(vcpu);
ctxt->hw_sys_regs = ctxt->sys_regs;
}
}
--
1.9.1
Jintack Lim
2017-07-18 16:58:44 UTC
Permalink
For the same reason we trap virtual memory register accesses at virtual
EL2, we need to trap SPSR_EL1, ELR_EL1 and VBAR_EL1 accesses. ARMv8.3
introduces the HCR_EL2.NV1 bit, which makes it possible to trap those
register accesses in EL1. Do not set this bit until the whole nesting
support is completed.

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/include/asm/sysreg.h | 2 ++
arch/arm64/kvm/sys_regs.c | 29 ++++++++++++++++++++++++++++-
2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 98c32ef..6373d3d 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -164,6 +164,8 @@
#define SYS_TTBR1_EL1 sys_reg(3, 0, 2, 0, 1)
#define SYS_TCR_EL1 sys_reg(3, 0, 2, 0, 2)

+#define SYS_SPSR_EL1 sys_reg(3, 0, 4, 0, 0)
+#define SYS_ELR_EL1 sys_reg(3, 0, 4, 0, 1)
#define SYS_ICC_PMR_EL1 sys_reg(3, 0, 4, 6, 0)

#define SYS_AFSR0_EL1 sys_reg(3, 0, 5, 1, 0)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index d8b1d4b..b83fef2 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -936,6 +936,30 @@ static bool trap_el2_regs(struct kvm_vcpu *vcpu,
return true;
}

+static bool access_elr(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ access_rw(p, &vcpu->arch.ctxt.gp_regs.elr_el1);
+ return true;
+}
+
+static bool access_spsr(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ access_rw(p, &vcpu->arch.ctxt.gp_regs.spsr[KVM_SPSR_EL1]);
+ return true;
+}
+
+static bool access_vbar(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
+ return true;
+}
+
/*
* Architected system registers.
* Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -994,6 +1018,9 @@ static bool trap_el2_regs(struct kvm_vcpu *vcpu,
{ SYS_DESC(SYS_TTBR1_EL1), access_vm_reg, reset_unknown, TTBR1_EL1 },
{ SYS_DESC(SYS_TCR_EL1), access_vm_reg, reset_val, TCR_EL1, 0 },

+ { SYS_DESC(SYS_SPSR_EL1), access_spsr},
+ { SYS_DESC(SYS_ELR_EL1), access_elr},
+
{ SYS_DESC(SYS_AFSR0_EL1), access_vm_reg, reset_unknown, AFSR0_EL1 },
{ SYS_DESC(SYS_AFSR1_EL1), access_vm_reg, reset_unknown, AFSR1_EL1 },
{ SYS_DESC(SYS_ESR_EL1), access_vm_reg, reset_unknown, ESR_EL1 },
@@ -1006,7 +1033,7 @@ static bool trap_el2_regs(struct kvm_vcpu *vcpu,
{ SYS_DESC(SYS_MAIR_EL1), access_vm_reg, reset_unknown, MAIR_EL1 },
{ SYS_DESC(SYS_AMAIR_EL1), access_vm_reg, reset_amair_el1, AMAIR_EL1 },

- { SYS_DESC(SYS_VBAR_EL1), NULL, reset_val, VBAR_EL1, 0 },
+ { SYS_DESC(SYS_VBAR_EL1), access_vbar, reset_val, VBAR_EL1, 0 },

{ SYS_DESC(SYS_ICC_IAR0_EL1), write_to_read_only },
{ SYS_DESC(SYS_ICC_EOIR0_EL1), read_from_write_only },
--
1.9.1
Jintack Lim
2017-07-18 16:58:50 UTC
Permalink
Forward exceptions due to WFI or WFE instructions to the virtual EL2 if
they are not coming from the virtual EL2 and the virtual HCR_EL2.TWX bit
is set.
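
The forwarding decision itself is small; a standalone model (the HCR bit
positions are assumptions for the sketch):

#include <stdbool.h>
#include <stdint.h>

#define HCR_TWI	(1ULL << 13)	/* assumed: trap WFI */
#define HCR_TWE	(1ULL << 14)	/* assumed: trap WFE */

/* True when a WFI/WFE trapped from the nested VM's EL1/EL0 should be
 * re-injected into the virtual EL2 (the guest hypervisor). */
static bool forward_wfx(uint64_t virt_hcr_el2, bool is_wfe, bool from_virt_el2)
{
	if (from_virt_el2)	/* the guest hypervisor itself executed it */
		return false;
	return is_wfe ? !!(virt_hcr_el2 & HCR_TWE)
		      : !!(virt_hcr_el2 & HCR_TWI);
}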

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/kvm/handle_exit.c | 13 ++++++++++++-
arch/arm64/kvm/nested.c | 20 ++++++++++++++++++++
3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 46880c3..53b0b33 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -442,5 +442,6 @@ static inline void __cpu_init_stage2(void)
int __init kvmarm_nested_cfg(char *buf);
int init_nested_virt(void);
bool nested_virt_in_use(struct kvm_vcpu *vcpu);
+int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);

#endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 8b398b2..25ec824 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -107,7 +107,18 @@ static int handle_no_fpsimd(struct kvm_vcpu *vcpu, struct kvm_run *run)
*/
static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
- if (kvm_vcpu_get_hsr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
+ bool is_wfe = !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_WFx_ISS_WFE);
+
+ if (nested_virt_in_use(vcpu)) {
+ int ret = handle_wfx_nested(vcpu, is_wfe);
+
+ if (ret < 0 && ret != -EINVAL)
+ return ret;
+ else if (ret >= 0)
+ return ret;
+ }
+
+ if (is_wfe) {
trace_kvm_wfx_arm64(*vcpu_pc(vcpu), true);
vcpu->stat.wfe_exit_stat++;
kvm_vcpu_on_spin(vcpu);
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 9a05c76..042d304 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -18,6 +18,8 @@
#include <linux/kvm.h>
#include <linux/kvm_host.h>

+#include <asm/kvm_emulate.h>
+
static bool nested_param;

int __init kvmarm_nested_cfg(char *buf)
@@ -41,3 +43,21 @@ bool nested_virt_in_use(struct kvm_vcpu *vcpu)

return false;
}
+
+/*
+ * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
+ * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
+ * handle this.
+ */
+int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
+{
+ u64 hcr_el2 = vcpu_sys_reg(vcpu, HCR_EL2);
+
+ if (vcpu_mode_el2(vcpu))
+ return -EINVAL;
+
+ if ((is_wfe && (hcr_el2 & HCR_TWE)) || (!is_wfe && (hcr_el2 & HCR_TWI)))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
+ return -EINVAL;
+}
--
1.9.1
Jintack Lim
2017-07-18 16:58:52 UTC
Permalink
These macros will be used to support the virtual EL2 with VHE.

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/include/asm/kvm_emulate.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 3017234..68aafbd 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -173,6 +173,30 @@ static inline bool vcpu_mode_el2(const struct kvm_vcpu *vcpu)
return mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t;
}

+static inline bool vcpu_el2_e2h_is_set(const struct kvm_vcpu *vcpu)
+{
+ return (vcpu_sys_reg(vcpu, HCR_EL2) & HCR_E2H);
+}
+
+static inline bool vcpu_el2_tge_is_set(const struct kvm_vcpu *vcpu)
+{
+ return (vcpu_sys_reg(vcpu, HCR_EL2) & HCR_TGE);
+}
+
+static inline bool is_hyp_ctxt(const struct kvm_vcpu *vcpu)
+{
+ /*
+ * We are in a hypervisor context if the vcpu mode is EL2, or if the
+ * E2H and TGE bits are both set. The latter means we are in the user
+ * space of the VHE kernel; the ARMv8.1 ARM describes this as 'InHost'.
+ */
+ if (vcpu_mode_el2(vcpu) ||
+ (vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)))
+ return true;
+
+ return false;
+}
+
static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
{
return vcpu->arch.fault.esr_el2;
--
1.9.1
Jintack Lim
2017-07-18 16:59:02 UTC
Permalink
Forward the EL1 virtual memory register traps to the virtual EL2 if they
are not coming from the virtual EL2 and the virtual HCR_EL2.TVM or TRVM
bit is set.

This is for recursive nested virtualization.

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/kvm/sys_regs.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 3559cf7..3e4ec5e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -135,6 +135,27 @@ static inline bool el12_reg(struct sys_reg_params *p)
return (p->Op1 == 5);
}

+/* This function is needed to support recursive nested virtualization. */
+static bool forward_vm_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
+{
+ u64 hcr_el2 = vcpu_sys_reg(vcpu, HCR_EL2);
+
+ /* If a trap comes from the virtual EL2, the host hypervisor handles it. */
+ if (vcpu_mode_el2(vcpu))
+ return false;
+
+ /*
+ * If the virtual HCR_EL2.TVM or TRVM bit is set, we need to forward
+ * this trap to the virtual EL2.
+ */
+ if ((hcr_el2 & HCR_TVM) && p->is_write)
+ return true;
+ else if ((hcr_el2 & HCR_TRVM) && !p->is_write)
+ return true;
+
+ return false;
+}
+
/*
* Generic accessor for VM registers. Only called as long as HCR_TVM
* is set. If the guest enables the MMU, we stop trapping the VM
@@ -152,6 +173,9 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
if (el12_reg(p) && forward_nv_traps(vcpu))
return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));

+ if (!el12_reg(p) && forward_vm_traps(vcpu, p))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
/*
* Redirect EL1 register accesses to the corresponding EL2 registers if
* they are meant to access EL2 registers.
--
1.9.1
Jintack Lim
2017-07-18 16:59:01 UTC
Permalink
In addition to EL2 register accesses, setting the NV bit also makes EL12
register accesses trap to EL2. To emulate this for the virtual EL2,
forward traps due to EL12 register accesses to the virtual EL2 if the
virtual HCR_EL2.NV bit is set.

This is for recursive nested virtualization.

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/kvm/sys_regs.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4fd7090..3559cf7 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -149,6 +149,9 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
int i;
const struct el1_el2_map *map;

+ if (el12_reg(p) && forward_nv_traps(vcpu))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
/*
* Redirect EL1 register accesses to the corresponding EL2 registers if
* they are meant to access EL2 registers.
@@ -959,6 +962,9 @@ static bool access_cntkctl_el12(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
+ if (forward_nv_traps(vcpu))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
return true;
}
@@ -1005,6 +1011,9 @@ static bool access_elr(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
+ if (el12_reg(p) && forward_nv_traps(vcpu))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
access_rw(p, &vcpu->arch.ctxt.gp_regs.elr_el1);
return true;
}
@@ -1013,6 +1022,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
+ if (el12_reg(p) && forward_nv_traps(vcpu))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
access_rw(p, &vcpu->arch.ctxt.gp_regs.spsr[KVM_SPSR_EL1]);
return true;
}
@@ -1021,6 +1033,9 @@ static bool access_vbar(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
+ if (el12_reg(p) && forward_nv_traps(vcpu))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
return true;
}
@@ -1031,6 +1046,9 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
{
u64 reg = sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);

+ if (el12_reg(p) && forward_nv_traps(vcpu))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
/*
* When the virtual HCR_EL2.E2H == 1, an access to CPACR_EL1
* in the virtual EL2 is to access CPTR_EL2.
--
1.9.1
Jintack Lim
2017-07-18 16:58:59 UTC
Permalink
Now that the virtual EL2 can access EL2 register states via EL1
registers, we need to take this into account when selecting the register
to emulate.
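
A sketch of the selection rule (register indices here are illustrative,
not the kernel's):

#include <stdbool.h>

enum { TTBR0_EL1_IDX, TTBR0_EL2_IDX /* , ... one pair per vm_map entry */ };

/* With the virtual E2H bit set, a plain EL1 encoding (Op1 != 5) used by
 * the guest hypervisor really names EL2 state, so redirect it; an _EL12
 * encoding (Op1 == 5) keeps naming the EL1 state. */
static int effective_index(int el1_idx, bool virt_e2h, bool is_el12)
{
	if (virt_e2h && !is_el12 && el1_idx == TTBR0_EL1_IDX)
		return TTBR0_EL2_IDX;
	return el1_idx;
}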

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/kvm/sys_regs.c | 46 ++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 44 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 79980be..910b50d 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -110,6 +110,31 @@ static bool access_dcsw(struct kvm_vcpu *vcpu,
return true;
}

+struct el1_el2_map {
+ int el1;
+ int el2;
+};
+
+static const struct el1_el2_map vm_map[] = {
+ {SCTLR_EL1, SCTLR_EL2},
+ {TTBR0_EL1, TTBR0_EL2},
+ {TTBR1_EL1, TTBR1_EL2},
+ {TCR_EL1, TCR_EL2},
+ {ESR_EL1, ESR_EL2},
+ {FAR_EL1, FAR_EL2},
+ {AFSR0_EL1, AFSR0_EL2},
+ {AFSR1_EL1, AFSR1_EL2},
+ {MAIR_EL1, MAIR_EL2},
+ {AMAIR_EL1, AMAIR_EL2},
+ {CONTEXTIDR_EL1, CONTEXTIDR_EL2},
+};
+
+static inline bool el12_reg(struct sys_reg_params *p)
+{
+ /* All *_EL12 registers have Op1=5. */
+ return (p->Op1 == 5);
+}
+
/*
* Generic accessor for VM registers. Only called as long as HCR_TVM
* is set. If the guest enables the MMU, we stop trapping the VM
@@ -120,16 +145,33 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *r)
{
bool was_enabled = vcpu_has_cache_enabled(vcpu);
+ u64 *sysreg = &vcpu_sys_reg(vcpu, r->reg);
+ int i;
+ const struct el1_el2_map *map;
+
+ /*
+ * Redirect EL1 register accesses to the corresponding EL2 registers if
+ * they are meant to access EL2 registers.
+ */
+ if (vcpu_el2_e2h_is_set(vcpu) && !el12_reg(p)) {
+ for (i = 0; i < ARRAY_SIZE(vm_map); i++) {
+ map = &vm_map[i];
+ if (map->el1 == r->reg) {
+ sysreg = &vcpu_sys_reg(vcpu, map->el2);
+ break;
+ }
+ }
+ }

BUG_ON(!vcpu_mode_el2(vcpu) && !p->is_write);

if (!p->is_write) {
- p->regval = vcpu_sys_reg(vcpu, r->reg);
+ p->regval = *sysreg;
return true;
}

if (!p->is_aarch32) {
- vcpu_sys_reg(vcpu, r->reg) = p->regval;
+ *sysreg = p->regval;
} else {
if (!p->is_32bit)
vcpu_cp15_64_high(vcpu, r->reg) = upper_32_bits(p->regval);
--
1.9.1
Jintack Lim
2017-07-18 16:58:46 UTC
Permalink
When the HCR_EL2.NV bit is set, eret instructions trap to EL2 with EC
code 0x1A. Emulate eret instructions by setting pc and pstate.

Note that the current exception level is always the virtual EL2, since
we set the HCR_EL2.NV bit only when entering the virtual EL2. So, we
take the spsr and elr states from the virtual SPSR_EL2 and ELR_EL2
registers.
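
What the emulation amounts to, as a minimal sketch (the handler below
does exactly this on the vcpu):

#include <stdint.h>

/* An eret trapped from virtual EL2 (EC 0x1A) completes by loading
 * PC/PSTATE from the *virtual* EL2 link registers; this is how the
 * guest hypervisor enters its nested VM, or returns within itself. */
static void emulate_eret(uint64_t *pc, uint64_t *pstate,
			 uint64_t virt_elr_el2, uint64_t virt_spsr_el2)
{
	*pc = virt_elr_el2;
	*pstate = virt_spsr_el2;
}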

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/include/asm/esr.h | 1 +
arch/arm64/kvm/handle_exit.c | 16 ++++++++++++++++
arch/arm64/kvm/trace.h | 21 +++++++++++++++++++++
3 files changed, 38 insertions(+)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index e7d8e28..210fde6 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -43,6 +43,7 @@
#define ESR_ELx_EC_HVC64 (0x16)
#define ESR_ELx_EC_SMC64 (0x17)
#define ESR_ELx_EC_SYS64 (0x18)
+#define ESR_ELx_EC_ERET (0x1A)
/* Unallocated EC: 0x19 - 0x1E */
#define ESR_ELx_EC_IMP_DEF (0x1f)
#define ESR_ELx_EC_IABT_LOW (0x20)
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 17d8a16..9259881 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -147,6 +147,21 @@ static int kvm_handle_unknown_ec(struct kvm_vcpu *vcpu, struct kvm_run *run)
return 1;
}

+static int kvm_handle_eret(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+ trace_kvm_nested_eret(vcpu, vcpu_el2_sreg(vcpu, ELR_EL2),
+ vcpu_el2_sreg(vcpu, SPSR_EL2));
+
+ /*
+ * Note that the current exception level is always the virtual EL2,
+ * since we set HCR_EL2.NV bit only when entering the virtual EL2.
+ */
+ *vcpu_pc(vcpu) = vcpu_el2_sreg(vcpu, ELR_EL2);
+ *vcpu_cpsr(vcpu) = vcpu_el2_sreg(vcpu, SPSR_EL2);
+
+ return 1;
+}
+
static exit_handle_fn arm_exit_handlers[] = {
[0 ... ESR_ELx_EC_MAX] = kvm_handle_unknown_ec,
[ESR_ELx_EC_WFx] = kvm_handle_wfx,
@@ -160,6 +175,7 @@ static int kvm_handle_unknown_ec(struct kvm_vcpu *vcpu, struct kvm_run *run)
[ESR_ELx_EC_HVC64] = handle_hvc,
[ESR_ELx_EC_SMC64] = handle_smc,
[ESR_ELx_EC_SYS64] = kvm_handle_sys_reg,
+ [ESR_ELx_EC_ERET] = kvm_handle_eret,
[ESR_ELx_EC_IABT_LOW] = kvm_handle_guest_abort,
[ESR_ELx_EC_DABT_LOW] = kvm_handle_guest_abort,
[ESR_ELx_EC_SOFTSTP_LOW]= kvm_handle_guest_debug,
diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
index 7c86cfb..5f40987 100644
--- a/arch/arm64/kvm/trace.h
+++ b/arch/arm64/kvm/trace.h
@@ -187,6 +187,27 @@
TP_printk("vcpu: %p, inject exception to vEL2: ESR_EL2 0x%lx, vector: 0x%016lx",
__entry->vcpu, __entry->esr_el2, __entry->pc)
);
+
+TRACE_EVENT(kvm_nested_eret,
+ TP_PROTO(struct kvm_vcpu *vcpu, unsigned long elr_el2,
+ unsigned long spsr_el2),
+ TP_ARGS(vcpu, elr_el2, spsr_el2),
+
+ TP_STRUCT__entry(
+ __field(struct kvm_vcpu *, vcpu)
+ __field(unsigned long, elr_el2)
+ __field(unsigned long, spsr_el2)
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu = vcpu;
+ __entry->elr_el2 = elr_el2;
+ __entry->spsr_el2 = spsr_el2;
+ ),
+
+ TP_printk("vcpu: %p, eret to elr_el2: 0x%016lx, with spsr_el2: 0x%08lx",
+ __entry->vcpu, __entry->elr_el2, __entry->spsr_el2)
+);
#endif /* _TRACE_ARM64_KVM_H */

#undef TRACE_INCLUDE_PATH
--
1.9.1
Jintack Lim
2017-07-18 16:58:45 UTC
Permalink
For the same reason we trap virtual memory register accesses in virtual
EL2, we trap CPACR_EL1 accesses too; without the trap, the virtual EL2
mode would access the EL1 system register state instead of the virtual
EL2 one.

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/kvm/hyp/switch.c | 10 +++++++---
arch/arm64/kvm/sys_regs.c | 10 +++++++++-
2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index ec91cd08..d513da9 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -42,7 +42,8 @@ bool __hyp_text __fpsimd_enabled(void)
return __fpsimd_is_enabled()();
}

-static void __hyp_text __activate_traps_vhe(void)
+static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
+
{
u64 val;

@@ -54,12 +55,15 @@ static void __hyp_text __activate_traps_vhe(void)
write_sysreg(__kvm_hyp_vector, vbar_el1);
}

-static void __hyp_text __activate_traps_nvhe(void)
+static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
+
{
u64 val;

val = CPTR_EL2_DEFAULT;
val |= CPTR_EL2_TTA | CPTR_EL2_TFP;
+ if (vcpu_mode_el2(vcpu))
+ val |= CPTR_EL2_TCPAC;
write_sysreg(val, cptr_el2);
}

@@ -99,7 +103,7 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
write_sysreg(0, pmselr_el0);
write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
- __activate_traps_arch()();
+ __activate_traps_arch()(vcpu);
}

static void __hyp_text __deactivate_traps_vhe(void)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index b83fef2..7062645 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -960,6 +960,14 @@ static bool access_vbar(struct kvm_vcpu *vcpu,
return true;
}

+static bool access_cpacr(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
+ return true;
+}
+
/*
* Architected system registers.
* Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -1013,7 +1021,7 @@ static bool access_vbar(struct kvm_vcpu *vcpu,

{ SYS_DESC(SYS_MPIDR_EL1), NULL, reset_mpidr, MPIDR_EL1 },
{ SYS_DESC(SYS_SCTLR_EL1), access_vm_reg, reset_val, SCTLR_EL1, 0x00C50078 },
- { SYS_DESC(SYS_CPACR_EL1), NULL, reset_val, CPACR_EL1, 0 },
+ { SYS_DESC(SYS_CPACR_EL1), access_cpacr, reset_val, CPACR_EL1, 0 },
{ SYS_DESC(SYS_TTBR0_EL1), access_vm_reg, reset_unknown, TTBR0_EL1 },
{ SYS_DESC(SYS_TTBR1_EL1), access_vm_reg, reset_unknown, TTBR1_EL1 },
{ SYS_DESC(SYS_TCR_EL1), access_vm_reg, reset_val, TCR_EL1, 0 },
--
1.9.1
Jintack Lim
2017-07-18 16:58:36 UTC
Permalink
From: Christoffer Dall <***@linaro.org>

Add functions that set up and restore the guest's context on each entry
and exit. These functions will come in handy when we want to use
different contexts for normal EL0/EL1 and virtual EL2 execution.

No functional change yet.
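
For orientation, the per-run ordering these hooks establish (a sketch;
bodies elided, see the arm.c hunk below):

struct kvm_vcpu;

void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu);
void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu);

static void vcpu_run_once(struct kvm_vcpu *vcpu)
{
	kvm_arm_setup_shadow_state(vcpu);	/* choose the hw_* views */
	/* ... enter the guest, run until an exit ... */
	kvm_arm_restore_shadow_state(vcpu);	/* write the views back  */
}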

Signed-off-by: Christoffer Dall <***@linaro.org>
Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm/include/asm/kvm_emulate.h | 4 ++
arch/arm64/include/asm/kvm_emulate.h | 4 ++
arch/arm64/kvm/Makefile | 2 +-
arch/arm64/kvm/context.c | 54 ++++++++++++++++
arch/arm64/kvm/hyp/sysreg-sr.c | 117 +++++++++++++++++++----------------
virt/kvm/arm/arm.c | 14 +++++
6 files changed, 140 insertions(+), 55 deletions(-)
create mode 100644 arch/arm64/kvm/context.c

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 399cd75e..0a03b7d 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -47,6 +47,10 @@ static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);

+static inline void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu) { }
+static inline void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu) { }
+static inline void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt) { }
+
static inline bool kvm_condition_valid(const struct kvm_vcpu *vcpu)
{
return kvm_condition_valid32(vcpu);
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 5d6f3d0..14c4ce9 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -42,6 +42,10 @@
void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);

+void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu);
+void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu);
+void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt);
+
static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
{
vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index f513047..5762337 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -15,7 +15,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/e
kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arm.o $(KVM)/arm/mmu.o $(KVM)/arm/mmio.o
kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o

-kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o
+kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o context.o
kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o
diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
new file mode 100644
index 0000000..bc43e66
--- /dev/null
+++ b/arch/arm64/kvm/context.c
@@ -0,0 +1,54 @@
+/*
+ * Copyright (C) 2016 - Linaro Ltd.
+ * Author: Christoffer Dall <***@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_emulate.h>
+
+/**
+ * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
+ * @vcpu: The VCPU pointer
+ */
+void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
+{
+ struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+
+ ctxt->hw_pstate = *vcpu_cpsr(vcpu);
+ ctxt->hw_sys_regs = ctxt->sys_regs;
+ ctxt->hw_sp_el1 = ctxt->gp_regs.sp_el1;
+ ctxt->hw_elr_el1 = ctxt->gp_regs.elr_el1;
+ ctxt->hw_spsr_el1 = ctxt->gp_regs.spsr[KVM_SPSR_EL1];
+}
+
+/**
+ * kvm_arm_restore_shadow_state -- write back shadow state from guest
+ * @vcpu: The VCPU pointer
+ */
+void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
+{
+ struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+
+ *vcpu_cpsr(vcpu) = ctxt->hw_pstate;
+ ctxt->gp_regs.sp_el1 = ctxt->hw_sp_el1;
+ ctxt->gp_regs.elr_el1 = ctxt->hw_elr_el1;
+ ctxt->gp_regs.spsr[KVM_SPSR_EL1] = ctxt->hw_spsr_el1;
+}
+
+void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt)
+{
+ /* This is to set hw_sys_regs of host_cpu_context */
+ cpu_ctxt->hw_sys_regs = cpu_ctxt->sys_regs;
+}
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 9341376..b7a67b1 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -19,6 +19,7 @@
#include <linux/kvm_host.h>

#include <asm/kvm_asm.h>
+#include <asm/kvm_emulate.h>
#include <asm/kvm_hyp.h>

/* Yes, this does nothing, on purpose */
@@ -33,39 +34,43 @@ static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { }

static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
{
- ctxt->sys_regs[ACTLR_EL1] = read_sysreg(actlr_el1);
- ctxt->sys_regs[TPIDR_EL0] = read_sysreg(tpidr_el0);
- ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
- ctxt->sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1);
- ctxt->sys_regs[MDSCR_EL1] = read_sysreg(mdscr_el1);
+ u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
+
+ sys_regs[ACTLR_EL1] = read_sysreg(actlr_el1);
+ sys_regs[TPIDR_EL0] = read_sysreg(tpidr_el0);
+ sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
+ sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1);
+ sys_regs[MDSCR_EL1] = read_sysreg(mdscr_el1);
ctxt->gp_regs.regs.sp = read_sysreg(sp_el0);
ctxt->gp_regs.regs.pc = read_sysreg_el2(elr);
- ctxt->gp_regs.regs.pstate = read_sysreg_el2(spsr);
+ ctxt->hw_pstate = read_sysreg_el2(spsr);
}

static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
{
- ctxt->sys_regs[MPIDR_EL1] = read_sysreg(vmpidr_el2);
- ctxt->sys_regs[CSSELR_EL1] = read_sysreg(csselr_el1);
- ctxt->sys_regs[SCTLR_EL1] = read_sysreg_el1(sctlr);
- ctxt->sys_regs[CPACR_EL1] = read_sysreg_el1(cpacr);
- ctxt->sys_regs[TTBR0_EL1] = read_sysreg_el1(ttbr0);
- ctxt->sys_regs[TTBR1_EL1] = read_sysreg_el1(ttbr1);
- ctxt->sys_regs[TCR_EL1] = read_sysreg_el1(tcr);
- ctxt->sys_regs[ESR_EL1] = read_sysreg_el1(esr);
- ctxt->sys_regs[AFSR0_EL1] = read_sysreg_el1(afsr0);
- ctxt->sys_regs[AFSR1_EL1] = read_sysreg_el1(afsr1);
- ctxt->sys_regs[FAR_EL1] = read_sysreg_el1(far);
- ctxt->sys_regs[MAIR_EL1] = read_sysreg_el1(mair);
- ctxt->sys_regs[VBAR_EL1] = read_sysreg_el1(vbar);
- ctxt->sys_regs[CONTEXTIDR_EL1] = read_sysreg_el1(contextidr);
- ctxt->sys_regs[AMAIR_EL1] = read_sysreg_el1(amair);
- ctxt->sys_regs[CNTKCTL_EL1] = read_sysreg_el1(cntkctl);
- ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1);
-
- ctxt->gp_regs.sp_el1 = read_sysreg(sp_el1);
- ctxt->gp_regs.elr_el1 = read_sysreg_el1(elr);
- ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr);
+ u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
+
+ sys_regs[MPIDR_EL1] = read_sysreg(vmpidr_el2);
+ sys_regs[CSSELR_EL1] = read_sysreg(csselr_el1);
+ sys_regs[SCTLR_EL1] = read_sysreg_el1(sctlr);
+ sys_regs[CPACR_EL1] = read_sysreg_el1(cpacr);
+ sys_regs[TTBR0_EL1] = read_sysreg_el1(ttbr0);
+ sys_regs[TTBR1_EL1] = read_sysreg_el1(ttbr1);
+ sys_regs[TCR_EL1] = read_sysreg_el1(tcr);
+ sys_regs[ESR_EL1] = read_sysreg_el1(esr);
+ sys_regs[AFSR0_EL1] = read_sysreg_el1(afsr0);
+ sys_regs[AFSR1_EL1] = read_sysreg_el1(afsr1);
+ sys_regs[FAR_EL1] = read_sysreg_el1(far);
+ sys_regs[MAIR_EL1] = read_sysreg_el1(mair);
+ sys_regs[VBAR_EL1] = read_sysreg_el1(vbar);
+ sys_regs[CONTEXTIDR_EL1] = read_sysreg_el1(contextidr);
+ sys_regs[AMAIR_EL1] = read_sysreg_el1(amair);
+ sys_regs[CNTKCTL_EL1] = read_sysreg_el1(cntkctl);
+ sys_regs[PAR_EL1] = read_sysreg(par_el1);
+
+ ctxt->hw_sp_el1 = read_sysreg(sp_el1);
+ ctxt->hw_elr_el1 = read_sysreg_el1(elr);
+ ctxt->hw_spsr_el1 = read_sysreg_el1(spsr);
}

static hyp_alternate_select(__sysreg_call_save_host_state,
@@ -86,39 +91,43 @@ void __hyp_text __sysreg_save_guest_state(struct kvm_cpu_context *ctxt)

static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
{
- write_sysreg(ctxt->sys_regs[ACTLR_EL1], actlr_el1);
- write_sysreg(ctxt->sys_regs[TPIDR_EL0], tpidr_el0);
- write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0);
- write_sysreg(ctxt->sys_regs[TPIDR_EL1], tpidr_el1);
- write_sysreg(ctxt->sys_regs[MDSCR_EL1], mdscr_el1);
+ u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
+
+ write_sysreg(sys_regs[ACTLR_EL1], actlr_el1);
+ write_sysreg(sys_regs[TPIDR_EL0], tpidr_el0);
+ write_sysreg(sys_regs[TPIDRRO_EL0], tpidrro_el0);
+ write_sysreg(sys_regs[TPIDR_EL1], tpidr_el1);
+ write_sysreg(sys_regs[MDSCR_EL1], mdscr_el1);
write_sysreg(ctxt->gp_regs.regs.sp, sp_el0);
write_sysreg_el2(ctxt->gp_regs.regs.pc, elr);
- write_sysreg_el2(ctxt->gp_regs.regs.pstate, spsr);
+ write_sysreg_el2(ctxt->hw_pstate, spsr);
}

static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
{
- write_sysreg(ctxt->sys_regs[MPIDR_EL1], vmpidr_el2);
- write_sysreg(ctxt->sys_regs[CSSELR_EL1], csselr_el1);
- write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1], sctlr);
- write_sysreg_el1(ctxt->sys_regs[CPACR_EL1], cpacr);
- write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1], ttbr0);
- write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1], ttbr1);
- write_sysreg_el1(ctxt->sys_regs[TCR_EL1], tcr);
- write_sysreg_el1(ctxt->sys_regs[ESR_EL1], esr);
- write_sysreg_el1(ctxt->sys_regs[AFSR0_EL1], afsr0);
- write_sysreg_el1(ctxt->sys_regs[AFSR1_EL1], afsr1);
- write_sysreg_el1(ctxt->sys_regs[FAR_EL1], far);
- write_sysreg_el1(ctxt->sys_regs[MAIR_EL1], mair);
- write_sysreg_el1(ctxt->sys_regs[VBAR_EL1], vbar);
- write_sysreg_el1(ctxt->sys_regs[CONTEXTIDR_EL1],contextidr);
- write_sysreg_el1(ctxt->sys_regs[AMAIR_EL1], amair);
- write_sysreg_el1(ctxt->sys_regs[CNTKCTL_EL1], cntkctl);
- write_sysreg(ctxt->sys_regs[PAR_EL1], par_el1);
-
- write_sysreg(ctxt->gp_regs.sp_el1, sp_el1);
- write_sysreg_el1(ctxt->gp_regs.elr_el1, elr);
- write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr);
+ u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
+
+ write_sysreg(sys_regs[MPIDR_EL1], vmpidr_el2);
+ write_sysreg(sys_regs[CSSELR_EL1], csselr_el1);
+ write_sysreg_el1(sys_regs[SCTLR_EL1], sctlr);
+ write_sysreg_el1(sys_regs[CPACR_EL1], cpacr);
+ write_sysreg_el1(sys_regs[TTBR0_EL1], ttbr0);
+ write_sysreg_el1(sys_regs[TTBR1_EL1], ttbr1);
+ write_sysreg_el1(sys_regs[TCR_EL1], tcr);
+ write_sysreg_el1(sys_regs[ESR_EL1], esr);
+ write_sysreg_el1(sys_regs[AFSR0_EL1], afsr0);
+ write_sysreg_el1(sys_regs[AFSR1_EL1], afsr1);
+ write_sysreg_el1(sys_regs[FAR_EL1], far);
+ write_sysreg_el1(sys_regs[MAIR_EL1], mair);
+ write_sysreg_el1(sys_regs[VBAR_EL1], vbar);
+ write_sysreg_el1(sys_regs[CONTEXTIDR_EL1], contextidr);
+ write_sysreg_el1(sys_regs[AMAIR_EL1], amair);
+ write_sysreg_el1(sys_regs[CNTKCTL_EL1], cntkctl);
+ write_sysreg(sys_regs[PAR_EL1], par_el1);
+
+ write_sysreg(ctxt->hw_sp_el1, sp_el1);
+ write_sysreg_el1(ctxt->hw_elr_el1, elr);
+ write_sysreg_el1(ctxt->hw_spsr_el1, spsr);
}

static hyp_alternate_select(__sysreg_call_restore_host_state,
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 36aae3a..0ff2997 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -689,6 +689,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
}

kvm_arm_setup_debug(vcpu);
+ kvm_arm_setup_shadow_state(vcpu);

/**************************************************************
* Enter the guest
@@ -704,6 +705,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
* Back from guest
*************************************************************/

+ kvm_arm_restore_shadow_state(vcpu);
kvm_arm_clear_debug(vcpu);

/*
@@ -1334,6 +1336,16 @@ static void teardown_hyp_mode(void)

static int init_vhe_mode(void)
{
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ kvm_cpu_context_t *cpu_ctxt;
+
+ cpu_ctxt = per_cpu_ptr(kvm_host_cpu_state, cpu);
+
+ kvm_arm_init_cpu_context(cpu_ctxt);
+ }
+
kvm_info("VHE mode initialized successfully\n");
return 0;
}
@@ -1416,6 +1428,8 @@ static int init_hyp_mode(void)
kvm_err("Cannot map host CPU state: %d\n", err);
goto out_err;
}
+
+ kvm_arm_init_cpu_context(cpu_ctxt);
}

kvm_info("Hyp mode initialized successfully\n");
--
1.9.1
Jintack Lim
2017-07-18 16:58:33 UTC
Permalink
ARM v8.3 introduces a new bit in the HCR_EL2, which is the NV bit. When
this bit is set, accessing EL2 registers in EL1 traps to EL2. In
addition, executing the following instructions in EL1 will trap to EL2:
tlbi, at, eret, and msr/mrs instructions to access SP_EL1. Most of the
instructions that trap to EL2 with the NV bit were undef at EL1 prior to
ARM v8.3. The only instruction that was not undef is eret.

This patch sets up a handler for EL2 registers and SP_EL1 register
accesses at EL1. The host hypervisor keeps those register values in
memory, and will emulate their behavior.

This patch doesn't set the NV bit yet. It will be set in a later patch
once nested virtualization support is completed.
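
The emulation itself just completes the trapped access against in-memory
state; a standalone model (the real handler, access_rw()/trap_el2_regs()
below, operates on sys_reg_params and the vcpu context):

#include <stdbool.h>
#include <stdint.h>

struct trapped_access {
	bool is_write;		/* MSR (write) vs. MRS (read) */
	uint64_t regval;	/* value moving to/from the guest GPR */
};

/* Complete a trapped system register access against the in-memory copy. */
static void emulate_sysreg_access(struct trapped_access *p, uint64_t *backing)
{
	if (p->is_write)
		*backing = p->regval;	/* guest wrote: update vcpu state */
	else
		p->regval = *backing;	/* guest read: return vcpu state  */
}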

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/include/asm/kvm_host.h | 30 +++++++++++++++++++-
arch/arm64/include/asm/sysreg.h | 37 +++++++++++++++++++++++++
arch/arm64/kvm/sys_regs.c | 58 +++++++++++++++++++++++++++++++++++++++
3 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 86d4b6c..1dc4ed6 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -134,12 +134,40 @@ enum vcpu_sysreg {
PMSWINC_EL0, /* Software Increment Register */
PMUSERENR_EL0, /* User Enable Register */

- /* 32bit specific registers. Keep them at the end of the range */
+ /* 32bit specific registers. */
DACR32_EL2, /* Domain Access Control Register */
IFSR32_EL2, /* Instruction Fault Status Register */
FPEXC32_EL2, /* Floating-Point Exception Control Register */
DBGVCR32_EL2, /* Debug Vector Catch Register */

+ /* EL2 registers sorted ascending by Op0, Op1, CRn, CRm, Op2 */
+ VPIDR_EL2, /* Virtualization Processor ID Register */
+ VMPIDR_EL2, /* Virtualization Multiprocessor ID Register */
+ SCTLR_EL2, /* System Control Register (EL2) */
+ ACTLR_EL2, /* Auxiliary Control Register (EL2) */
+ HCR_EL2, /* Hypervisor Configuration Register */
+ MDCR_EL2, /* Monitor Debug Configuration Register (EL2) */
+ CPTR_EL2, /* Architectural Feature Trap Register (EL2) */
+ HSTR_EL2, /* Hypervisor System Trap Register */
+ HACR_EL2, /* Hypervisor Auxiliary Control Register */
+ TTBR0_EL2, /* Translation Table Base Register 0 (EL2) */
+ TCR_EL2, /* Translation Control Register (EL2) */
+ VTTBR_EL2, /* Virtualization Translation Table Base Register */
+ VTCR_EL2, /* Virtualization Translation Control Register */
+ AFSR0_EL2, /* Auxiliary Fault Status Register 0 (EL2) */
+ AFSR1_EL2, /* Auxiliary Fault Status Register 1 (EL2) */
+ ESR_EL2, /* Exception Syndrome Register (EL2) */
+ FAR_EL2, /* Hypervisor IPA Fault Address Register */
+ HPFAR_EL2, /* Hypervisor IPA Fault Address Register */
+ MAIR_EL2, /* Memory Attribute Indirection Register (EL2) */
+ AMAIR_EL2, /* Auxiliary Memory Attribute Indirection Register (EL2) */
+ VBAR_EL2, /* Vector Base Address Register (EL2) */
+ RVBAR_EL2, /* Reset Vector Base Address Register */
+ RMR_EL2, /* Reset Management Register */
+ TPIDR_EL2, /* EL2 Software Thread ID Register */
+ CNTVOFF_EL2, /* Counter-timer Virtual Offset register */
+ CNTHCTL_EL2, /* Counter-timer Hypervisor Control register */
+
NR_SYS_REGS /* Nothing after this line! */
};

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 737ca30..9277c4a 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -250,10 +250,42 @@

#define SYS_PMCCFILTR_EL0 sys_reg (3, 3, 14, 15, 7)

+#define SYS_VPIDR_EL2 sys_reg(3, 4, 0, 0, 0)
+#define SYS_VMPIDR_EL2 sys_reg(3, 4, 0, 0, 5)
+
+#define SYS_SCTLR_EL2 sys_reg(3, 4, 1, 0, 0)
+#define SYS_ACTLR_EL2 sys_reg(3, 4, 1, 0, 1)
+#define SYS_HCR_EL2 sys_reg(3, 4, 1, 1, 0)
+#define SYS_MDCR_EL2 sys_reg(3, 4, 1, 1, 1)
+#define SYS_CPTR_EL2 sys_reg(3, 4, 1, 1, 2)
+#define SYS_HSTR_EL2 sys_reg(3, 4, 1, 1, 3)
+#define SYS_HACR_EL2 sys_reg(3, 4, 1, 1, 7)
+
+#define SYS_TTBR0_EL2 sys_reg(3, 4, 2, 0, 0)
+#define SYS_TCR_EL2 sys_reg(3, 4, 2, 0, 2)
+#define SYS_VTTBR_EL2 sys_reg(3, 4, 2, 1, 0)
+#define SYS_VTCR_EL2 sys_reg(3, 4, 2, 1, 2)
+
#define SYS_DACR32_EL2 sys_reg(3, 4, 3, 0, 0)
+
+#define SYS_SP_EL1 sys_reg(3, 4, 4, 1, 0)
+
#define SYS_IFSR32_EL2 sys_reg(3, 4, 5, 0, 1)
+#define SYS_AFSR0_EL2 sys_reg(3, 4, 5, 1, 0)
+#define SYS_AFSR1_EL2 sys_reg(3, 4, 5, 1, 1)
+#define SYS_ESR_EL2 sys_reg(3, 4, 5, 2, 0)
#define SYS_FPEXC32_EL2 sys_reg(3, 4, 5, 3, 0)

+#define SYS_FAR_EL2 sys_reg(3, 4, 6, 0, 0)
+#define SYS_HPFAR_EL2 sys_reg(3, 4, 6, 0, 4)
+
+#define SYS_MAIR_EL2 sys_reg(3, 4, 10, 2, 0)
+#define SYS_AMAIR_EL2 sys_reg(3, 4, 10, 3, 0)
+
+#define SYS_VBAR_EL2 sys_reg(3, 4, 12, 0, 0)
+#define SYS_RVBAR_EL2 sys_reg(3, 4, 12, 0, 1)
+#define SYS_RMR_EL2 sys_reg(3, 4, 12, 0, 2)
+
#define __SYS__AP0Rx_EL2(x) sys_reg(3, 4, 12, 8, x)
#define SYS_ICH_AP0R0_EL2 __SYS__AP0Rx_EL2(0)
#define SYS_ICH_AP0R1_EL2 __SYS__AP0Rx_EL2(1)
@@ -295,6 +327,11 @@
#define SYS_ICH_LR14_EL2 __SYS__LR8_EL2(6)
#define SYS_ICH_LR15_EL2 __SYS__LR8_EL2(7)

+#define SYS_TPIDR_EL2 sys_reg(3, 4, 13, 0, 2)
+
+#define SYS_CNTVOFF_EL2 sys_reg(3, 4, 14, 0, 3)
+#define SYS_CNTHCTL_EL2 sys_reg(3, 4, 14, 1, 0)
+
/* Common SCTLR_ELx flags. */
#define SCTLR_ELx_EE (1 << 25)
#define SCTLR_ELx_I (1 << 12)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 7786288..1568f8b 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -892,6 +892,27 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
return true;
}

+static inline void access_rw(struct sys_reg_params *p, u64 *sysreg)
+{
+ if (!p->is_write)
+ p->regval = *sysreg;
+ else
+ *sysreg = p->regval;
+}
+
+static bool trap_el2_regs(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ /* SP_EL1 is NOT maintained in sys_regs array */
+ if (sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2) == SYS_SP_EL1)
+ access_rw(p, &vcpu->arch.ctxt.gp_regs.sp_el1);
+ else
+ access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
+
+ return true;
+}
+
/*
* Architected system registers.
* Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -1077,9 +1098,46 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
*/
{ SYS_DESC(SYS_PMCCFILTR_EL0), access_pmu_evtyper, reset_val, PMCCFILTR_EL0, 0 },

+ { SYS_DESC(SYS_VPIDR_EL2), trap_el2_regs, reset_val, VPIDR_EL2, 0 },
+ { SYS_DESC(SYS_VMPIDR_EL2), trap_el2_regs, reset_val, VMPIDR_EL2, 0 },
+
+ { SYS_DESC(SYS_SCTLR_EL2), trap_el2_regs, reset_val, SCTLR_EL2, 0 },
+ { SYS_DESC(SYS_ACTLR_EL2), trap_el2_regs, reset_val, ACTLR_EL2, 0 },
+ { SYS_DESC(SYS_HCR_EL2), trap_el2_regs, reset_val, HCR_EL2, 0 },
+ { SYS_DESC(SYS_MDCR_EL2), trap_el2_regs, reset_val, MDCR_EL2, 0 },
+ { SYS_DESC(SYS_CPTR_EL2), trap_el2_regs, reset_val, CPTR_EL2, 0 },
+ { SYS_DESC(SYS_HSTR_EL2), trap_el2_regs, reset_val, HSTR_EL2, 0 },
+ { SYS_DESC(SYS_HACR_EL2), trap_el2_regs, reset_val, HACR_EL2, 0 },
+
+ { SYS_DESC(SYS_TTBR0_EL2), trap_el2_regs, reset_val, TTBR0_EL2, 0 },
+ { SYS_DESC(SYS_TCR_EL2), trap_el2_regs, reset_val, TCR_EL2, 0 },
+ { SYS_DESC(SYS_VTTBR_EL2), trap_el2_regs, reset_val, VTTBR_EL2, 0 },
+ { SYS_DESC(SYS_VTCR_EL2), trap_el2_regs, reset_val, VTCR_EL2, 0 },
+
{ SYS_DESC(SYS_DACR32_EL2), NULL, reset_unknown, DACR32_EL2 },
+
+ { SYS_DESC(SYS_SP_EL1), trap_el2_regs },
+
{ SYS_DESC(SYS_IFSR32_EL2), NULL, reset_unknown, IFSR32_EL2 },
+ { SYS_DESC(SYS_AFSR0_EL2), trap_el2_regs, reset_val, AFSR0_EL2, 0 },
+ { SYS_DESC(SYS_AFSR1_EL2), trap_el2_regs, reset_val, AFSR1_EL2, 0 },
+ { SYS_DESC(SYS_ESR_EL2), trap_el2_regs, reset_val, ESR_EL2, 0 },
{ SYS_DESC(SYS_FPEXC32_EL2), NULL, reset_val, FPEXC32_EL2, 0x70 },
+
+ { SYS_DESC(SYS_FAR_EL2), trap_el2_regs, reset_val, FAR_EL2, 0 },
+ { SYS_DESC(SYS_HPFAR_EL2), trap_el2_regs, reset_val, HPFAR_EL2, 0 },
+
+ { SYS_DESC(SYS_MAIR_EL2), trap_el2_regs, reset_val, MAIR_EL2, 0 },
+ { SYS_DESC(SYS_AMAIR_EL2), trap_el2_regs, reset_val, AMAIR_EL2, 0 },
+
+ { SYS_DESC(SYS_VBAR_EL2), trap_el2_regs, reset_val, VBAR_EL2, 0 },
+ { SYS_DESC(SYS_RVBAR_EL2), trap_el2_regs, reset_val, RVBAR_EL2, 0 },
+ { SYS_DESC(SYS_RMR_EL2), trap_el2_regs, reset_val, RMR_EL2, 0 },
+
+ { SYS_DESC(SYS_TPIDR_EL2), trap_el2_regs, reset_val, TPIDR_EL2, 0 },
+
+ { SYS_DESC(SYS_CNTVOFF_EL2), trap_el2_regs, reset_val, CNTVOFF_EL2, 0 },
+ { SYS_DESC(SYS_CNTHCTL_EL2), trap_el2_regs, reset_val, CNTHCTL_EL2, 0 },
};

static bool trap_dbgidr(struct kvm_vcpu *vcpu,
--
1.9.1
Jintack Lim
2017-07-18 16:58:37 UTC
Permalink
If the guest exception level is EL2, then load the shadow context for
the virtual EL2 onto the hardware. Otherwise, load the regular EL0/EL1
context.

Note that the contents of the shadow context will be populated in
subsequent patches.
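
As an illustration (not part of this patch), the intended call flow
around the world switch is roughly the following; the actual call
sites were added by an earlier patch in this series:

	kvm_arm_setup_shadow_state(vcpu);	/* vcpu state -> hw_* fields */

	ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);

	kvm_arm_restore_shadow_state(vcpu);	/* hw_* fields -> vcpu state */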

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/kvm/context.c | 74 +++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 64 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index bc43e66..2645787 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -18,11 +18,29 @@
#include <linux/kvm_host.h>
#include <asm/kvm_emulate.h>

-/**
- * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
- * @vcpu: The VCPU pointer
- */
-void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
+static void flush_shadow_special_regs(struct kvm_vcpu *vcpu)
+{
+ struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+
+ ctxt->hw_pstate = *vcpu_cpsr(vcpu) & ~PSR_MODE_MASK;
+ /*
+ * We can emulate the guest's configuration of which
+ * stack pointer to use when executing in virtual EL2 by
+ * using the equivalent feature in EL1 to point to
+ * either the EL1 or EL0 stack pointer.
+ */
+ if ((*vcpu_cpsr(vcpu) & PSR_MODE_MASK) == PSR_MODE_EL2h)
+ ctxt->hw_pstate |= PSR_MODE_EL1h;
+ else
+ ctxt->hw_pstate |= PSR_MODE_EL1t;
+
+ ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
+ ctxt->hw_sp_el1 = vcpu_el2_sreg(vcpu, SP_EL2);
+ ctxt->hw_elr_el1 = vcpu_el2_sreg(vcpu, ELR_EL2);
+ ctxt->hw_spsr_el1 = vcpu_el2_sreg(vcpu, SPSR_EL2);
+}
+
+static void flush_special_regs(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;

@@ -33,11 +51,18 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
ctxt->hw_spsr_el1 = ctxt->gp_regs.spsr[KVM_SPSR_EL1];
}

-/**
- * kvm_arm_restore_shadow_state -- write back shadow state from guest
- * @vcpu: The VCPU pointer
- */
-void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
+static void sync_shadow_special_regs(struct kvm_vcpu *vcpu)
+{
+ struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+
+ *vcpu_cpsr(vcpu) &= PSR_MODE_MASK;
+ *vcpu_cpsr(vcpu) |= ctxt->hw_pstate & ~PSR_MODE_MASK;
+ vcpu_el2_sreg(vcpu, SP_EL2) = ctxt->hw_sp_el1;
+ vcpu_el2_sreg(vcpu, ELR_EL2) = ctxt->hw_elr_el1;
+ vcpu_el2_sreg(vcpu, SPSR_EL2) = ctxt->hw_spsr_el1;
+}
+
+static void sync_special_regs(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;

@@ -47,6 +72,35 @@ void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
ctxt->gp_regs.spsr[KVM_SPSR_EL1] = ctxt->hw_spsr_el1;
}

+/**
+ * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
+ * @vcpu: The VCPU pointer
+ */
+void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
+{
+ struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+
+ if (unlikely(vcpu_mode_el2(vcpu))) {
+ flush_shadow_special_regs(vcpu);
+ ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
+ } else {
+ flush_special_regs(vcpu);
+ ctxt->hw_sys_regs = ctxt->sys_regs;
+ }
+}
+
+/**
+ * kvm_arm_restore_shadow_state -- write back shadow state from guest
+ * @vcpu: The VCPU pointer
+ */
+void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
+{
+ if (unlikely(vcpu_mode_el2(vcpu)))
+ sync_shadow_special_regs(vcpu);
+ else
+ sync_special_regs(vcpu);
+}
+
void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt)
{
/* This is to set hw_sys_regs of host_cpu_context */
--
1.9.1
Jintack Lim
2017-07-18 16:58:35 UTC
Permalink
With nested virtualization support, a hypervisor running inside a VM
(i.e. a guest hypervisor) is deprivileged and runs in EL1 instead of
EL2. The host hypervisor therefore manages a shadow context for the
virtual EL2 execution.

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/include/asm/kvm_host.h | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 57dccde..46880c3 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -229,6 +229,19 @@ struct kvm_cpu_context {
};

u64 el2_special_regs[NR_EL2_SPECIAL_REGS];
+
+ u64 shadow_sys_regs[NR_SYS_REGS]; /* only used for virtual EL2 */
+
+ /*
+ * hw_* will be written to the hardware when entering to a VM.
+ * They have either the virtual EL2 or EL1/EL0 context depending
+ * on the vcpu mode.
+ */
+ u64 *hw_sys_regs;
+ u64 hw_sp_el1;
+ u64 hw_pstate;
+ u64 hw_elr_el1;
+ u64 hw_spsr_el1;
};

typedef struct kvm_cpu_context kvm_cpu_context_t;
--
1.9.1
Jintack Lim
2017-07-18 16:58:42 UTC
Permalink
Support injecting synchronous exceptions into the virtual EL2, as
described in the ARM ARM AArch64.TakeException().

This can easily be extended to inject asynchronous exceptions into the
virtual EL2 as well, but that will be added in a later patch when
appropriate.
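
As a worked example (illustrative only, reusing the vector offsets
defined alongside inject_fault.c): when a guest hypervisor running in
virtual EL2h takes an emulated synchronous exception,
CURRENT_EL_SP_ELx_VECTOR (0x200) and except_type_sync (0x0) give

	*vcpu_pc(vcpu) = vcpu_sys_reg(vcpu, VBAR_EL2) + 0x200 + 0x0;

so the guest hypervisor resumes at its own current-EL/SP_ELx
synchronous vector.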

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm/include/asm/kvm_emulate.h | 7 +++
arch/arm64/include/asm/kvm_emulate.h | 2 +
arch/arm64/kvm/Makefile | 1 +
arch/arm64/kvm/emulate-nested.c | 83 ++++++++++++++++++++++++++++++++++++
arch/arm64/kvm/trace.h | 20 +++++++++
5 files changed, 113 insertions(+)
create mode 100644 arch/arm64/kvm/emulate-nested.c

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 0a03b7d..29a4dec 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -47,6 +47,13 @@ static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);

+static inline int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
+{
+ kvm_err("Unexpected call to %s for the non-nesting configuration\n",
+ __func__);
+ return -EINVAL;
+}
+
static inline void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu) { };
static inline void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu) { };
static inline void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt) { };
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 94f98cc..3017234 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -54,6 +54,8 @@ enum exception_type {
void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);

+int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
+
void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu);
void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu);
void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt);
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 5762337..0263ef0 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -37,3 +37,4 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o

kvm-$(CONFIG_KVM_ARM_HOST) += nested.o
+kvm-$(CONFIG_KVM_ARM_HOST) += emulate-nested.o
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
new file mode 100644
index 0000000..48b84cc
--- /dev/null
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -0,0 +1,83 @@
+/*
+ * Copyright (C) 2016 - Linaro and Columbia University
+ * Author: Jintack Lim <***@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_emulate.h>
+
+#include "trace.h"
+
+/* This is borrowed from get_except_vector in inject_fault.c */
+static u64 get_el2_except_vector(struct kvm_vcpu *vcpu,
+ enum exception_type type)
+{
+ u64 exc_offset;
+
+ switch (*vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT)) {
+ case PSR_MODE_EL2t:
+ exc_offset = CURRENT_EL_SP_EL0_VECTOR;
+ break;
+ case PSR_MODE_EL2h:
+ exc_offset = CURRENT_EL_SP_ELx_VECTOR;
+ break;
+ case PSR_MODE_EL1t:
+ case PSR_MODE_EL1h:
+ case PSR_MODE_EL0t:
+ exc_offset = LOWER_EL_AArch64_VECTOR;
+ break;
+ default:
+ kvm_err("Unexpected previous exception level: aarch32\n");
+ exc_offset = LOWER_EL_AArch32_VECTOR;
+ }
+
+ return vcpu_sys_reg(vcpu, VBAR_EL2) + exc_offset + type;
+}
+
+/*
+ * Emulate taking an exception to EL2.
+ * See ARM ARM J8.1.2 AArch64.TakeException()
+ */
+static int kvm_inject_nested(struct kvm_vcpu *vcpu, u64 esr_el2,
+ enum exception_type type)
+{
+ int ret = 1;
+
+ if (!nested_virt_in_use(vcpu)) {
+ kvm_err("Unexpected call to %s for the non-nesting configuration\n",
+ __func__);
+ return -EINVAL;
+ }
+
+ vcpu_el2_sreg(vcpu, SPSR_EL2) = *vcpu_cpsr(vcpu);
+ vcpu_el2_sreg(vcpu, ELR_EL2) = *vcpu_pc(vcpu);
+ vcpu_sys_reg(vcpu, ESR_EL2) = esr_el2;
+
+ *vcpu_pc(vcpu) = get_el2_except_vector(vcpu, type);
+ /* On an exception, PSTATE.SP becomes 1 */
+ *vcpu_cpsr(vcpu) = PSR_MODE_EL2h;
+ *vcpu_cpsr(vcpu) |= (PSR_A_BIT | PSR_F_BIT | PSR_I_BIT | PSR_D_BIT);
+
+ trace_kvm_inject_nested_exception(vcpu, esr_el2, *vcpu_pc(vcpu));
+
+ return ret;
+}
+
+int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
+{
+ return kvm_inject_nested(vcpu, esr_el2, except_type_sync);
+}
diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
index 7fb0008..7c86cfb 100644
--- a/arch/arm64/kvm/trace.h
+++ b/arch/arm64/kvm/trace.h
@@ -167,6 +167,26 @@
);


+TRACE_EVENT(kvm_inject_nested_exception,
+ TP_PROTO(struct kvm_vcpu *vcpu, unsigned long esr_el2,
+ unsigned long pc),
+ TP_ARGS(vcpu, esr_el2, pc),
+
+ TP_STRUCT__entry(
+ __field(struct kvm_vcpu *, vcpu)
+ __field(unsigned long, esr_el2)
+ __field(unsigned long, pc)
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu = vcpu;
+ __entry->esr_el2 = esr_el2;
+ __entry->pc = pc;
+ ),
+
+ TP_printk("vcpu: %p, inject exception to vEL2: ESR_EL2 0x%lx, vector: 0x%016lx",
+ __entry->vcpu, __entry->esr_el2, __entry->pc)
+);
#endif /* _TRACE_ARM64_KVM_H */

#undef TRACE_INCLUDE_PATH
--
1.9.1
Jintack Lim
2017-07-18 16:58:55 UTC
Permalink
On VHE systems, EL0 of the host kernel is considered part of the 'VHE
host': the execution of EL0 is affected by system registers set by the
VHE kernel, including the hypervisor. To emulate this for a VM, we use
the same set of system registers (i.e. the shadow registers) for the
virtual EL2 and EL0 execution.

Note that the assumption so far has been that a hypervisor in a VM
always runs in the virtual EL2, and that an exception level change
from/to the virtual EL2 always goes through the host hypervisor. With
VHE support for a VM, however, the exception level can change from EL0
to the virtual EL2 without trapping to the host hypervisor. So, when
returning from the VHE host mode, set the vcpu mode depending on the
physical exception level.
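
For reference, a simplified sketch of the is_hyp_ctxt() predicate used
below (the real definition is introduced elsewhere in this series):

	static inline bool is_hyp_ctxt(struct kvm_vcpu *vcpu)
	{
		u64 hcr = vcpu_sys_reg(vcpu, HCR_EL2);

		if (vcpu_mode_el2(vcpu))
			return true;

		/* EL0 with the virtual E2H+TGE set is part of the VHE host */
		return (hcr & HCR_E2H) && (hcr & HCR_TGE) &&
		       ((*vcpu_cpsr(vcpu) & PSR_MODE_MASK) == PSR_MODE_EL0t);
	}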

Signed-off-by: Jintack Lim <***@linaro.org>
---
arch/arm64/kvm/context.c | 36 ++++++++++++++++++++++--------------
1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index f3d3398..39bd92d 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -150,16 +150,18 @@ static void flush_shadow_special_regs(struct kvm_vcpu *vcpu)
struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;

ctxt->hw_pstate = *vcpu_cpsr(vcpu) & ~PSR_MODE_MASK;
- /*
- * We can emulate the guest's configuration of which
- * stack pointer to use when executing in virtual EL2 by
- * using the equivalent feature in EL1 to point to
- * either the EL1 or EL0 stack pointer.
- */
- if ((*vcpu_cpsr(vcpu) & PSR_MODE_MASK) == PSR_MODE_EL2h)
- ctxt->hw_pstate |= PSR_MODE_EL1h;
- else
- ctxt->hw_pstate |= PSR_MODE_EL1t;
+ if (vcpu_mode_el2(vcpu)) {
+ /*
+ * We can emulate the guest's configuration of which
+ * stack pointer to use when executing in virtual EL2 by
+ * using the equivalent feature in EL1 to point to
+ * either the EL1 or EL0 stack pointer.
+ */
+ if ((*vcpu_cpsr(vcpu) & PSR_MODE_MASK) == PSR_MODE_EL2h)
+ ctxt->hw_pstate |= PSR_MODE_EL1h;
+ else
+ ctxt->hw_pstate |= PSR_MODE_EL1t;
+ }

ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
ctxt->hw_sp_el1 = vcpu_el2_sreg(vcpu, SP_EL2);
@@ -182,8 +184,14 @@ static void sync_shadow_special_regs(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;

- *vcpu_cpsr(vcpu) &= PSR_MODE_MASK;
- *vcpu_cpsr(vcpu) |= ctxt->hw_pstate & ~PSR_MODE_MASK;
+ *vcpu_cpsr(vcpu) = ctxt->hw_pstate;
+ *vcpu_cpsr(vcpu) &= ~PSR_MODE_MASK;
+ /* Set vcpu exception level depending on the physical EL */
+ if ((ctxt->hw_pstate & PSR_MODE_MASK) == PSR_MODE_EL0t)
+ *vcpu_cpsr(vcpu) |= PSR_MODE_EL0t;
+ else
+ *vcpu_cpsr(vcpu) |= PSR_MODE_EL2h;
+
vcpu_el2_sreg(vcpu, SP_EL2) = ctxt->hw_sp_el1;
vcpu_el2_sreg(vcpu, ELR_EL2) = ctxt->hw_elr_el1;
vcpu_el2_sreg(vcpu, SPSR_EL2) = ctxt->hw_spsr_el1;
@@ -218,7 +226,7 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;

- if (unlikely(vcpu_mode_el2(vcpu))) {
+ if (unlikely(is_hyp_ctxt(vcpu))) {
flush_shadow_special_regs(vcpu);
flush_shadow_el1_sysregs(vcpu);
flush_shadow_non_trap_el1_state(vcpu);
@@ -236,7 +244,7 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
*/
void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
{
- if (unlikely(vcpu_mode_el2(vcpu))) {
+ if (unlikely(is_hyp_ctxt(vcpu))) {
sync_shadow_special_regs(vcpu);
sync_shadow_non_trap_el1_state(vcpu);
} else
--
1.9.1
Jintack Lim
2017-07-18 16:59:03 UTC
Permalink
Forward ELR_EL1, SPSR_EL1 and VBAR_EL1 traps to the virtual EL2 if the
virtual HCR_EL2.NV1 bit is set.

This supports recursive nested virtualization: a guest hypervisor
running in the virtual EL2 can set NV1 for its own guest hypervisor,
and accesses to these registers from that (virtual EL1) guest then
need to be reflected back to the virtual EL2.

Signed-off-by: Jintack Lim <***@cs.columbia.edu>
---
arch/arm64/include/asm/kvm_arm.h | 1 +
arch/arm64/kvm/sys_regs.c | 18 ++++++++++++++++++
2 files changed, 19 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index aeaac4e..a1274b7 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -23,6 +23,7 @@
#include <asm/types.h>

/* Hyp Configuration Register (HCR) bits */
+#define HCR_NV1 (UL(1) << 43)
#define HCR_NV (UL(1) << 42)
#define HCR_E2H (UL(1) << 34)
#define HCR_ID (UL(1) << 33)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 3e4ec5e..6f67666 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1031,6 +1031,15 @@ static bool trap_el2_regs(struct kvm_vcpu *vcpu,
return true;
}

+/* This function is to support the recursive nested virtualization */
+static bool forward_nv1_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
+{
+ if (!vcpu_mode_el2(vcpu) && (vcpu_sys_reg(vcpu, HCR_EL2) & HCR_NV1))
+ return true;
+
+ return false;
+}
+
static bool access_elr(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
@@ -1038,6 +1047,9 @@ static bool access_elr(struct kvm_vcpu *vcpu,
if (el12_reg(p) && forward_nv_traps(vcpu))
return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));

+ if (!el12_reg(p) && forward_nv1_traps(vcpu, p))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
access_rw(p, &vcpu->arch.ctxt.gp_regs.elr_el1);
return true;
}
@@ -1049,6 +1061,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
if (el12_reg(p) && forward_nv_traps(vcpu))
return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));

+ if (!el12_reg(p) && forward_nv1_traps(vcpu, p))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
access_rw(p, &vcpu->arch.ctxt.gp_regs.spsr[KVM_SPSR_EL1]);
return true;
}
@@ -1060,6 +1075,9 @@ static bool access_vbar(struct kvm_vcpu *vcpu,
if (el12_reg(p) && forward_nv_traps(vcpu))
return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));

+ if (!el12_reg(p) && forward_nv1_traps(vcpu, p))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
return true;
}
--
1.9.1
Jintack Lim
2017-07-19 02:24:37 UTC
Permalink
Post by Jintack Lim
Forward ELR_EL1, SPSR_EL1 and VBAR_EL1 traps to the virtual EL2 if the
virtual HCR_EL2.NV1 bit is set.
This supports recursive nested virtualization.
The Signed-off-by should be my Linaro e-mail address. Will fix it.
Jintack Lim
2017-07-19 02:23:05 UTC
Permalink
Post by Jintack Lim
[...]
I'll share my experiment setup shortly.
I summarized my experiment setup here.

https://github.com/columbia/nesting-pub/wiki/Nested-virtualization-on-ARM-setup
Post by Jintack Lim
Even though this work has some limitations and TODOs, I'd appreciate early
feedback on this RFC. Specifically, I'm interested in:
- Overall design to manage vcpu context for the virtual EL2
- Verifying correct EL2 register configurations such as HCR_EL2, CPTR_EL2
(Patch 30 and 32)
- Patch organization and coding style
I also wonder, when the hardware and/or KVM do not support nested
virtualization but userspace requests the nested virtualization
option, which is better: returning an error or silently launching a
regular VM.
Post by Jintack Lim
This patch series is based on kvm/next d38338e.
The whole patch series including memory, VGIC, and timer patches is available
here:

Limitations:
- There are some cases that the target exception level of a VM is ambiguous when
emulating eret instruction. I'm discussing this issue with Christoffer and
Marc. Meanwhile, I added a temporary patch (not included in this
series. f1beaba in the repo) and used 4.10.0 kernel when testing the guest
hypervisor with VHE.
- Recursive nested virtualization is not tested yet.
- Other hypervisors (such as Xen) on KVM are not tested.

TODO:
- Submit memory, VGIC, and timer patches
- Evaluate regular VM performance to see if there's a negative impact.
- Test other hypervisors such as Xen on KVM
- Test recursive nested virtualization

Changes since v1:
- Added support for the virtual EL2 with VHE
- Rewrote commit messages and comments from the perspective of supporting
execution environments to VMs, rather than from the perspective of the guest
hypervisor running in them.
- Fixed a few bugs to make it run on the FastModel.
- Tested on ARMv8.3 with four configurations. (host/guest. with/without VHE.)
- Rebased to kvm/next

[1] https://www.community.arm.com/processors/b/blog/posts/armv8-a-architecture-2016-additions
KVM: arm64: Add KVM nesting feature
KVM: arm64: Allow userspace to set PSR_MODE_EL2x
KVM: arm64: Add vcpu_mode_el2 primitive to support nesting
KVM: arm/arm64: Add a framework to prepare virtual EL2 execution
arm64: Add missing TCR hw defines
KVM: arm64: Create shadow EL1 registers
KVM: arm64: Trap EL1 VM register accesses in virtual EL2
arm64: Add ARM64_HAS_NESTED_VIRT feature
KVM: arm/arm64: Enable nested virtualization via command-line
KVM: arm/arm64: Check if nested virtualization is in use
KVM: arm64: Add EL2 system registers to vcpu context
KVM: arm64: Add EL2 special registers to vcpu context
KVM: arm64: Add the shadow context for virtual EL2 execution
KVM: arm64: Set vcpu context depending on the guest exception level
KVM: arm64: Synchronize EL1 system registers on virtual EL2 entry and
exit
KVM: arm64: Move exception macros and enums to a common file
KVM: arm64: Support to inject exceptions to the virtual EL2
KVM: arm64: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
KVM: arm64: Trap CPACR_EL1 access in virtual EL2
KVM: arm64: Handle eret instruction traps
KVM: arm64: Set a handler for the system instruction traps
KVM: arm64: Handle PSCI call via smc from the guest
KVM: arm64: Inject HVC exceptions to the virtual EL2
KVM: arm64: Respect virtual HCR_EL2.TWX setting
KVM: arm64: Respect virtual CPTR_EL2.TFP setting
KVM: arm64: Add macros to support the virtual EL2 with VHE
KVM: arm64: Add EL2 registers defined in ARMv8.1 to vcpu context
KVM: arm64: Emulate EL12 register accesses from the virtual EL2
KVM: arm64: Support a VM with VHE considering EL0 of the VHE host
KVM: arm64: Allow the virtual EL2 to access EL2 states without trap
KVM: arm64: Manage the shadow states when virtual E2H bit enabled
KVM: arm64: Trap and emulate CPTR_EL2 accesses via CPACR_EL1 from the
virtual EL2 with VHE
KVM: arm64: Emulate appropriate VM control system registers
KVM: arm64: Respect the virtual HCR_EL2.NV bit setting
KVM: arm64: Respect the virtual HCR_EL2.NV bit setting for EL12
register traps
KVM: arm64: Respect virtual HCR_EL2.TVM and TRVM settings
KVM: arm64: Respect the virtual HCR_EL2.NV1 bit setting
KVM: arm64: Respect the virtual CPTR_EL2.TCPAC setting
Documentation/admin-guide/kernel-parameters.txt | 4 +
arch/arm/include/asm/kvm_emulate.h | 17 ++
arch/arm/include/asm/kvm_host.h | 15 +
arch/arm64/include/asm/cpucaps.h | 3 +-
arch/arm64/include/asm/esr.h | 1 +
arch/arm64/include/asm/kvm_arm.h | 2 +
arch/arm64/include/asm/kvm_coproc.h | 3 +-
arch/arm64/include/asm/kvm_emulate.h | 56 ++++
arch/arm64/include/asm/kvm_host.h | 64 ++++-
arch/arm64/include/asm/kvm_hyp.h | 24 --
arch/arm64/include/asm/pgtable-hwdef.h | 6 +
arch/arm64/include/asm/sysreg.h | 70 +++++
arch/arm64/include/uapi/asm/kvm.h | 1 +
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kernel/cpufeature.c | 11 +
arch/arm64/kvm/Makefile | 5 +-
arch/arm64/kvm/context.c | 346 +++++++++++++++++++++++
arch/arm64/kvm/emulate-nested.c | 83 ++++++
arch/arm64/kvm/guest.c | 2 +
arch/arm64/kvm/handle_exit.c | 89 +++++-
arch/arm64/kvm/hyp/entry.S | 13 +
arch/arm64/kvm/hyp/hyp-entry.S | 2 +-
arch/arm64/kvm/hyp/switch.c | 33 ++-
arch/arm64/kvm/hyp/sysreg-sr.c | 117 ++++----
arch/arm64/kvm/inject_fault.c | 12 -
arch/arm64/kvm/nested.c | 63 +++++
arch/arm64/kvm/reset.c | 8 +
arch/arm64/kvm/sys_regs.c | 359 +++++++++++++++++++++++-
arch/arm64/kvm/sys_regs.h | 8 +
arch/arm64/kvm/trace.h | 43 ++-
virt/kvm/arm/arm.c | 20 ++
31 files changed, 1363 insertions(+), 118 deletions(-)
create mode 100644 arch/arm64/kvm/context.c
create mode 100644 arch/arm64/kvm/emulate-nested.c
create mode 100644 arch/arm64/kvm/nested.c
--
1.9.1
Christoffer Dall
2017-07-19 08:49:38 UTC
Permalink
Hi Jintack,
Post by Jintack Lim
Post by Jintack Lim
[...]
I'll share my experiment setup shortly.
I summarized my experiment setup here.
https://github.com/columbia/nesting-pub/wiki/Nested-virtualization-on-ARM-setup
Thanks for sharing this.
Post by Jintack Lim
Post by Jintack Lim
Even though this work has some limitations and TODOs, I'd appreciate early
feedback on this RFC. Specifically, I'm interested in:
- Overall design to manage vcpu context for the virtual EL2
- Verifying correct EL2 register configurations such as HCR_EL2, CPTR_EL2
(Patch 30 and 32)
- Patch organization and coding style
I also wonder, when the hardware and/or KVM do not support nested
virtualization but userspace requests the nested virtualization
option, which is better: returning an error or silently launching a
regular VM.
I think KVM should complain to userspace if userspace tries to set a
feature it does not support, and I think userspace should give as
meaningful an error message as possible to the user when that happens.

Thanks,
-Christoffer
Jintack Lim
2017-07-19 14:35:30 UTC
Permalink
Post by Christoffer Dall
Hi Jintack,
Post by Jintack Lim
Post by Jintack Lim
[...]
I'll share my experiment setup shortly.
I summarized my experiment setup here.
https://github.com/columbia/nesting-pub/wiki/Nested-virtualization-on-ARM-setup
Thanks for sharing this.
Post by Jintack Lim
Post by Jintack Lim
Even though this work has some limitations and TODOs, I'd appreciate early
feedback on this RFC. Specifically, I'm interested in:
- Overall design to manage vcpu context for the virtual EL2
- Verifying correct EL2 register configurations such as HCR_EL2, CPTR_EL2
(Patch 30 and 32)
- Patch organization and coding style
I also wonder, when the hardware and/or KVM do not support nested
virtualization but userspace requests the nested virtualization
option, which is better: returning an error or silently launching a
regular VM.
I think KVM should complain to userspace if userspace tries to set a
feature it does not support, and I think userspace should give as
meaningful an error message as possible to the user when that happens.
Ok, thanks. I'll work this out.
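
A sketch of what I have in mind (the feature bit and helper names are
tentative): reject the feature at vcpu init time when nesting is not
available, so userspace gets an explicit -EINVAL instead of a silent
fall-back to a regular VM.

	if (test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features) &&
	    !nested_virt_allowed(vcpu->kvm))	/* tentative helper */
		return -EINVAL;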
Post by Christoffer Dall
Thanks,
-Christoffer