Commit Graph

201429 Commits

Author SHA1 Message Date
Lai Jiangshan
3af1817a0d KVM: MMU: calculate correct gfn for small host pages backing large guest pages
In Documentation/kvm/mmu.txt:
  gfn:
    Either the guest page table containing the translations shadowed by this
    page, or the base page frame for linear translations. See role.direct.

But in function FNAME(fetch)(), sp->gfn is incorrect when one of following
situations occurred:

 1) guest is 32bit paging and the guest PDE maps a 4-MByte page
    (backed by 4k host pages), FNAME(fetch)() miss handling the quadrant.

    And if guest use pse-36, "table_gfn = gpte_to_gfn(gw->ptes[level - delta]);"
    is incorrect.

 2) guest is long mode paging and the guest PDPTE maps a 1-GByte page
    (backed by 4k or 2M host pages).

So we fix it to suit to the document and suit to the code which
requires sp->gfn correct when sp->role.direct=1.

We use the goal mapping gfn(gw->gfn) to calculate the base page frame
for linear translations, it is simple and easy to be understood.

Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Reported-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:39:21 +03:00
Lai Jiangshan
c9fa0b3bef KVM: MMU: Calculate correct base gfn for direct non-DIR level
In Document/kvm/mmu.txt:
  gfn:
    Either the guest page table containing the translations shadowed by this
    page, or the base page frame for linear translations. See role.direct.

But in __direct_map(), the base gfn calculation is incorrect,
it does not calculate correctly when level=3 or 4.

Fix by using PT64_LVL_ADDR_MASK() which accounts for all levels correctly.

Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:53 +03:00
Lai Jiangshan
2032a93d66 KVM: MMU: Don't allocate gfns page for direct mmu pages
When sp->role.direct is set, sp->gfns does not contain any essential
information, leaf sptes reachable from this sp are for a continuous
guest physical memory range (a linear range).
So sp->gfns[i] (if it was set) equals to sp->gfn + i. (PT_PAGE_TABLE_LEVEL)
Obviously, it is not essential information, we can calculate it when need.

It means we don't need sp->gfns when sp->role.direct=1,
Thus we can save one page usage for every kvm_mmu_page.

Note:
  Access to sp->gfns must be wrapped by kvm_mmu_page_get_gfn()
  or kvm_mmu_page_set_gfn().
  It is only exposed in FNAME(sync_page).

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:52 +03:00
Mohammed Gamal
c8174f7b35 KVM: VMX: Add constant for invalid guest state exit reason
For the sake of completeness, this patch adds a symbolic
constant for VMX exit reason 0x21 (invalid guest state).

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:52 +03:00
Xiao Guangrong
9f1a122f97 KVM: MMU: allow more page become unsync at getting sp time
Allow more page become asynchronous at getting sp time, if need create new
shadow page for gfn but it not allow unsync(level > 1), we should unsync all
gfn's unsync page

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:52 +03:00
Xiao Guangrong
9cf5cf5ad4 KVM: MMU: allow more page become unsync at gfn mapping time
In current code, shadow page can become asynchronous only if one
shadow page for a gfn, this rule is too strict, in fact, we can
let all last mapping page(i.e, it's the pte page) become unsync,
and sync them at invlpg or flush tlb time.

This patch allow more page become asynchronous at gfn mapping time

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:51 +03:00
Avi Kivity
221d059d15 KVM: Update Red Hat copyrights
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:51 +03:00
Gleb Natapov
9fb2d2b4ff KVM: SVM: correctly trace irq injection
On SVM interrupts are injected by svm_set_irq() not svm_inject_irq().
The later is used only to wait for irq window.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:51 +03:00
Xiao Guangrong
f78978aa3a KVM: MMU: only update unsync page in invlpg path
Only unsync pages need updated at invlpg time since other shadow
pages are write-protected

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:50 +03:00
Xiao Guangrong
e02aa901b1 KVM: MMU: don't write-protect if have new mapping to unsync page
Two cases maybe happen in kvm_mmu_get_page() function:

- one case is, the goal sp is already in cache, if the sp is unsync,
  we only need update it to assure this mapping is valid, but not
  mark it sync and not write-protect sp->gfn since it not broke unsync
  rule(one shadow page for a gfn)

- another case is, the goal sp not existed, we need create a new sp
  for gfn, i.e, gfn (may)has another shadow page, to keep unsync rule,
  we should sync(mark sync and write-protect) gfn's unsync shadow page.
  After enabling multiple unsync shadows, we sync those shadow pages
  only when the new sp not allow to become unsync(also for the unsyc
  rule, the new rule is: allow all pte page become unsync)

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:50 +03:00
Xiao Guangrong
1d9dc7e000 KVM: MMU: split kvm_sync_page() function
Split kvm_sync_page() into kvm_sync_page() and kvm_sync_page_transient()
to clarify the code address Avi's suggestion

kvm_sync_page_transient() function only update shadow page but not mark
it sync and not write protect sp->gfn. it will be used by later patch

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:49 +03:00
Sheng Yang
98918833a3 KVM: x86: Use FPU API
Convert KVM to use generic FPU API.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:49 +03:00
Sheng Yang
7cf30855e0 KVM: x86: Use unlazy_fpu() for host FPU
We can avoid unnecessary fpu load when userspace process
didn't use FPU frequently.

Derived from Avi's idea.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:48 +03:00
Sheng Yang
5ee481da7b x86: Export FPU API for KVM use
Also add some constants.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:48 +03:00
Avi Kivity
9373662463 KVM: Consolidate arch specific vcpu ioctl locking
Now that all arch specific ioctls have centralized locking, it is easy to
move it to the central dispatcher.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:48 +03:00
Avi Kivity
19483d1440 KVM: PPC: Centralize locking of arch specific vcpu ioctls
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:48 +03:00
Avi Kivity
bc923cc93b KVM: s390: Centrally lock arch specific vcpu ioctls
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:47 +03:00
Avi Kivity
526b78ad1a KVM: x86: Lock arch specific vcpu ioctls centrally
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:47 +03:00
Avi Kivity
2122ff5eab KVM: move vcpu locking to dispatcher for generic vcpu ioctls
All vcpu ioctls need to be locked, so instead of locking each one specifically
we lock at the generic dispatcher.

This patch only updates generic ioctls and leaves arch specific ioctls alone.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:47 +03:00
Xiao Guangrong
1683b2416e KVM: x86: cleanup unused local variable
fix:
 arch/x86/kvm/x86.c: In function ‘handle_emulation_failure’:
 arch/x86/kvm/x86.c:3844: warning: unused variable ‘ctxt’

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01 10:35:46 +03:00
Xiao Guangrong
f55c3f419a KVM: MMU: unalias gfn before sp->gfns[] comparison in sync_page
sp->gfns[] contain unaliased gfns, but gpte might contain pointer
to aliased region.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01 10:35:46 +03:00
Xiao Guangrong
6d74229f01 KVM: MMU: remove rmap before clear spte
Remove rmap before clear spte otherwise it will trigger BUG_ON() in
some functions such as rmap_write_protect().

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01 10:35:46 +03:00
Xiao Guangrong
e8ad9a7074 KVM: MMU: use proper cache object freeing function
Use kmem_cache_free to free objects allocated by kmem_cache_alloc.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01 10:35:46 +03:00
Alex Williamson
48bb09eee4 KVM: remove CAP_SYS_RAWIO requirement from kvm_vm_ioctl_assign_irq
Remove this check in an effort to allow kvm guests to run without
root privileges.  This capability check doesn't seem to add any
security since the device needs to have already been added via the
assign device ioctl and the io actually occurs through the pci
sysfs interface.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01 10:35:45 +03:00
Sheng Yang
aad827034e KVM: VMX: Only reset MMU when necessary
Only modifying some bits of CR0/CR4 needs paging mode switch.

Modify EFER.NXE bit would result in reserved bit updates.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01 10:35:45 +03:00
Sheng Yang
62ad07551a KVM: x86: Clean up duplicate assignment
mmu.free() already set root_hpa to INVALID_PAGE, no need to do it again in the
destory_kvm_mmu().

kvm_x86_ops->set_cr4() and set_efer() already assign cr4/efer to
vcpu->arch.cr4/efer, no need to do it again later.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01 10:35:44 +03:00
Mohammed Gamal
222b7c52c3 KVM: x86 emulator: Add missing decoder flags for xor instructions
This adds missing decoder flags for xor instructions (opcodes 0x34 - 0x35)

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01 10:35:44 +03:00
Mohammed Gamal
abc190830f KVM: x86 emulator: Add missing decoder flags for sub instruction
This adds missing decoder flags for sub instructions (opcodes 0x2c - 0x2d)

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01 10:35:44 +03:00
Mohammed Gamal
dfb507c41d KVM: x86 emulator: Add test acc, imm instruction (opcodes 0xA8 - 0xA9)
This adds test acc, imm instruction to the x86 emulator

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01 10:35:43 +03:00
Marcelo Tosatti
24955b6c90 KVM: pass correct parameter to kvm_mmu_free_some_pages
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01 10:35:43 +03:00
Dongxiao Xu
4610c9cc6d KVM: VMX: VMXON/VMXOFF usage changes
SDM suggests VMXON should be called before VMPTRLD, and VMXOFF
should be called after doing VMCLEAR.

Therefore in vmm coexistence case, we should firstly call VMXON
before any VMCS operation, and then call VMXOFF after the
operation is done.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01 10:35:43 +03:00
Dongxiao Xu
b923e62e4d KVM: VMX: VMCLEAR/VMPTRLD usage changes
Originally VMCLEAR/VMPTRLD is called on vcpu migration. To
support hosted VMM coexistance, VMCLEAR is executed on vcpu
schedule out, and VMPTRLD is executed on vcpu schedule in.
This could also eliminate the IPI when doing VMCLEAR.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01 10:35:42 +03:00
Dongxiao Xu
92fe13be74 KVM: VMX: Some minor changes to code structure
Do some preparations for vmm coexistence support.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01 10:35:42 +03:00
Dongxiao Xu
7725b89414 KVM: VMX: Define new functions to wrapper direct call of asm code
Define vmcs_load() and kvm_cpu_vmxon() to avoid direct call of asm
code. Also move VMXE bit operation out of kvm_cpu_vmxoff().

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01 10:35:41 +03:00
Gui Jianfeng
6859762e8a KVM: update mmu documetation for role.nxe
There's no member "cr4_nxe" in struct kvm_mmu_page_role, it names "nxe" now.
Update mmu document.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01 10:35:41 +03:00
Avi Kivity
f0f5933a16 KVM: MMU: Fix free memory accounting race in mmu_alloc_roots()
We drop the mmu lock between freeing memory and allocating the roots; this
allows some other vcpu to sneak in and allocate memory.

While the race is benign (resulting only in temporary overallocation, not oom)
it is simple and easy to fix by moving the freeing close to the allocation.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01 10:35:41 +03:00
Gleb Natapov
6d77dbfc88 KVM: inject #UD if instruction emulation fails and exit to userspace
Do not kill VM when instruction emulation fails. Inject #UD and report
failure to userspace instead. Userspace may choose to reenter guest if
vcpu is in userspace (cpl == 3) in which case guest OS will kill
offending process and continue running.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01 10:35:40 +03:00
Avi Kivity
57bc24cfd6 KVM: Document KVM_SET_BOOT_CPU_ID
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:40 +03:00
Avi Kivity
47dbb84f9e KVM: Document KVM_SET_IDENTITY_MAP ioctl
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:40 +03:00
Gui Jianfeng
54a4f0239f KVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freed
Currently, kvm_mmu_zap_page() returning the number of freed children sp.
This might confuse the caller, because caller don't know the actual freed
number. Let's make kvm_mmu_zap_page() return the number of pages it actually
freed.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:39 +03:00
Gui Jianfeng
518c5a05e8 KVM: MMU: Fix debug output error in walk_addr()
Fix a debug output error in walk_addr

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:39 +03:00
Gui Jianfeng
f3b8c964a9 KVM: MMU: mark page table dirty when a pte is actually modified
Sometime cmpxchg_gpte doesn't modify gpte, in such case, don't mark
page table page as dirty.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:38 +03:00
Joerg Roedel
eec4b140c9 KVM: SVM: Allow EFER.LMSLE to be set with nested svm
This patch enables setting of efer bit 13 which is allowed
in all SVM capable processors. This is necessary for the
SLES11 version of Xen 4.0 to boot with nested svm.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:38 +03:00
Joerg Roedel
3f10c846f8 KVM: SVM: Dump vmcb contents on failed vmrun
This patch adds a function to dump the vmcb into the kernel
log and calls it after a failed vmrun to ease debugging.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:38 +03:00
Avi Kivity
d94e1dc9af KVM: Get rid of KVM_REQ_KICK
KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal
operation.  This causes the fast path check for a clear vcpu->requests
to fail all the time, triggering tons of atomic operations.

Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:37 +03:00
Gleb Natapov
54b8486f46 KVM: x86 emulator: do not inject exception directly into vcpu
Return exception as a result of instruction emulation and handle
injection in KVM code.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:37 +03:00
Gleb Natapov
95cb229530 KVM: x86 emulator: move interruptibility state tracking out of emulator
Emulator shouldn't access vcpu directly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:36 +03:00
Gleb Natapov
4d2179e1e9 KVM: x86 emulator: handle shadowed registers outside emulator
Emulator shouldn't access vcpu directly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:36 +03:00
Gleb Natapov
bdb475a323 KVM: x86 emulator: use shadowed register in emulate_sysexit()
emulate_sysexit() should use shadowed registers copy instead of
looking into vcpu state directly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:36 +03:00
Gleb Natapov
ef050dc039 KVM: x86 emulator: set RFLAGS outside x86 emulator code
Removes the need for set_flags() callback.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:35 +03:00