1) LDC_MODE_RELIABLE is deprecated an unused by anything, plus
it and LDC_MODE_STREAM were mis-numbered.
2) read_stream() should try to read as much as possible into
the per-LDC stream buffer area, so do not trim the read_nonraw()
length by the caller's size parameter.
3) Send data ACKs when necessary in read_nonraw().
4) In read_nonraw() when we get a pure ACK, advance the RX head
unconditionally past it.
5) Provide the ACKID field in the ldcdgb() packet dump in read_nonraw().
This helps debugging stream mode LDC channel problems.
6) Decrease verbosity of rx_data_wait() so that it is more useful.
A debugging message each loop iteration is too much.
7) In process_data_ack() stop the loop checking when we hit lp->tx_tail
not lp->tx_head.
8) Set the seqid field properly in send_data_nack().
Signed-off-by: David S. Miller <davem@davemloft.net>
This is also a partial workaround for a bug in the LDOM firmware which
double-transmits RX inos during high load. Without this, such an
event causes the kernel to loop forever in the interrupt call chain
ACK'ing but never actually running the IRQ handler (and thus clearing
the interrupt condition in the device).
There is still a bad potential effect when double INOs occur,
not covered by this changeset. Namely, if the INO is already on
the per-cpu INO vector list, we still blindly re-insert it and
thus we can end up losing interrupts already linked in after
it.
We could deal with that by traversing the list before insertion,
but that's too expensive for this edge case.
Signed-off-by: David S. Miller <davem@davemloft.net>
Virtual devices on Sun Logical Domains are built on top
of a virtual channel framework. This, with help of hypervisor
interfaces, provides a link layer protocol with basic
handshaking over which virtual device clients and servers
communicate.
Built on top of this is a VIO device protocol which has it's
own handshaking and message types. At this layer attributes
are exchanged (disk size, network device addresses, etc.)
descriptor rings are registered, and data transfers are
triggers and replied to.
Signed-off-by: David S. Miller <davem@davemloft.net>
Only at the CPU_DYING stage can we be sure that no user process will
be scheduled onto the cpu and oops when trying to use virtualization
extensions.
Signed-off-by: Avi Kivity <avi@qumranet.com>
The hotplug IPIs can be called from the cpu on which we are currently
running on, so use on_cpu(). Similarly, drop on_each_cpu() for the
suspend/resume callbacks, as we're in atomic context here and only one
cpu is up anyway.
Signed-off-by: Avi Kivity <avi@qumranet.com>
By keeping track of which cpus have virtualization enabled, we
prevent double-enable or double-disable during hotplug, which is a
very fatal oops.
Signed-off-by: Avi Kivity <avi@qumranet.com>
This removes the requirement for callers to get_cpu() to check in simple
cases. This patch is for !CONFIG_SMP.
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This removes the requirement for callers to get_cpu() to check in simple
cases.
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This removes the requirement for callers to get_cpu() to check in simple
cases.
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
KVM wants a notification when a cpu is about to die, so it can disable
hardware extensions, but at a time when user processes cannot be scheduled
on the cpu, so it doesn't try to use virtualization extensions after they
have been disabled.
This adds a CPU_DYING notification. The notification is called in atomic
context on the doomed cpu.
Signed-off-by: Avi Kivity <avi@qumranet.com>
kvm uses a pseudo filesystem, kvmfs, to generate inodes, a job that the
new anonymous inodes source does much better.
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch adds an implementation to the svm is_disabled function to
detect reliably if the BIOS disabled the SVM feature in the CPU. This
fixes the issues with kernel panics when loading the kvm-amd module on
machines where SVM is available but disabled.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Protected mode code may have corrupted the real-mode tss, so re-initialize
it when switching to real mode.
Signed-off-by: Avi Kivity <avi@qumranet.com>
When writing to normal memory and the memory area is unchanged the write
can be safely skipped, avoiding the costly kvm_mmu_pte_write.
Signed-Off-By: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
When the old value and new one are the same the emulator skips the
write; this is undesirable when the destination is a MMIO area and the
write shall be performed regardless of the previous value. This
optimization breaks e.g. a Linux guest APIC compiled without
X86_GOOD_APIC.
Remove the check and perform the writeback stage in the emulation unless
it's explicitly disabled (currently push and some 2 bytes instructions
may disable the writeback).
Signed-Off-By: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
With kernel-injected interrupts, we need to check for interrupts on
lightweight exits too.
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
If the time stamp counter goes backwards, a guest delay loop can become
infinite. This can happen if a vcpu is migrated to another cpu, where
the counter has a lower value than the first cpu.
Since we're doing an IPI to the first cpu anyway, we can use that to pick
up the old tsc, and use that to calculate the adjustment we need to make
to the tsc offset.
Signed-off-by: Avi Kivity <avi@qumranet.com>
When a vcpu causes a shadow tlb entry to have reduced permissions, it
must also clear the tlb on remote vcpus. We do that by:
- setting a bit on the vcpu that requests a tlb flush before the next entry
- if the vcpu is currently executing, we send an ipi to make sure it
exits before we continue
Signed-off-by: Avi Kivity <avi@qumranet.com>
A vcpu can pin up to four mmu shadow pages, which means the freeing
loop will never terminate. Fix by first unpinning shadow pages on
all vcpus, then freeing shadow pages.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Switch guest paging context may require us to allocate memory, which
might fail. Instead of wiring up error paths everywhere, make context
switching lazy and actually do the switch before the next guest entry,
where we can return an error if allocation fails.
Signed-off-by: Avi Kivity <avi@qumranet.com>
This was once used to avoid accessing the guest pte when upgrading
the shadow pte from read-only to read-write. But usually we need
to set the guest pte dirty or accessed bits anyway, so this wasn't
really exploited.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Always set the accessed and dirty bit (since having them cleared causes
a read-modify-write cycle), always set the present bit, and copy the
nx bit from the guest.
Signed-off-by: Avi Kivity <avi@qumranet.com>
With guest smp, a second vcpu might see partial updates when the first
vcpu services a page fault. So delay all updates until we have figured
out what the pte should look like.
Note that on i386, this is still not completely atomic as a 64-bit write
will be split into two on a 32-bit machine.
Signed-off-by: Avi Kivity <avi@qumranet.com>
This prevents some work from being performed twice, and, more importantly,
reduces the number of places where we modify shadow ptes.
Signed-off-by: Avi Kivity <avi@qumranet.com>