linux-kernel-test/include/asm-i386
Jeremy Fitzhardinge 5ead97c84f xen: Core Xen implementation
This patch is a rollup of all the core pieces of the Xen
implementation, including:
 - booting and setup
 - pagetable setup
 - privileged instructions
 - segmentation
 - interrupt flags
 - upcalls
 - multicall batching

BOOTING AND SETUP

The vmlinux image is decorated with ELF notes which tell the Xen
domain builder what the kernel's requirements are; the domain builder
then constructs the address space accordingly and starts the kernel.

Xen has its own entrypoint for the kernel (contained in an ELF note).
The ELF notes are set up by xen-head.S, which is included into head.S.
In principle it could be linked separately, but it seems to provoke
lots of binutils bugs.

Because the domain builder starts the kernel in a fairly sane state
(32-bit protected mode, paging enabled, flat segments set up), there's
not a lot of setup needed before starting the kernel proper.  The main
steps are:
  1. Install the Xen paravirt_ops, which is simply a matter of a
     structure assignment.
  2. Set init_mm to use the Xen-supplied pagetables (analogous to the
     head.S generated pagetables in a native boot).
  3. Reserve address space for Xen, since it takes a chunk at the top
     of the address space for its own use.
  4. Call start_kernel()
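
Step 1 can be pictured with the following minimal sketch.  It is not
the code from this patch: struct pv_ops_sketch, its single flush_tlb
member and every *_sketch name are hypothetical stand-ins, assumed
only to show that switching backends is a plain structure assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for the kernel's paravirt_ops. */
struct pv_ops_sketch {
	const char *name;
	void (*flush_tlb)(void);
};

/* Counters so the effect of each backend is observable. */
static unsigned int native_flushes;
static unsigned int hyper_flushes;

static void native_flush_tlb_sketch(void) { native_flushes++; }
static void xen_flush_tlb_sketch(void)    { hyper_flushes++; }

/* The table the rest of the "kernel" calls through; native by default. */
static struct pv_ops_sketch active_ops = {
	"native", native_flush_tlb_sketch
};

/* The replacement table supplied by the (assumed) Xen backend. */
static const struct pv_ops_sketch xen_ops_sketch = {
	"xen", xen_flush_tlb_sketch
};

/* Installing the backend copies the whole table in one assignment. */
static void install_xen_ops_sketch(void)
{
	assert(xen_ops_sketch.flush_tlb != NULL);
	active_ops = xen_ops_sketch;
}
```

Callers keep using active_ops.flush_tlb() and never need to know which
backend is installed.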

PAGETABLE SETUP

Once we hit the main kernel boot sequence, it will end up calling back
via paravirt_ops to set up various pieces of Xen specific state.  One
of the critical things which requires a bit of extra care is the
construction of the initial init_mm pagetable.  Because Xen places
tight constraints on pagetables (an active pagetable must always be
valid, and must always be mapped read-only to the guest domain), we
need to be careful when constructing the new pagetable to keep these
constraints in mind.  It turns out that the easiest way to do this is
to use the initial Xen-provided pagetable as a template, and then just
insert new mappings for memory where a mapping doesn't already exist.

This means that during pagetable setup, the kernel uses a special
version of xen_set_pte which ignores any attempt to remap a read-only
page as read-write (since Xen will map its own initial pagetable as
RO), but lets other changes to the ptes happen, so that things like NX
are set properly.
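
The filtering rule can be sketched as below.  This is an illustration
under stated assumptions, not the patch's implementation: a pte is
modelled as a bare 64-bit value, the bit positions are the usual x86
PAE ones, and mask_rw_sketch() and set_pte_init_sketch() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT (UINT64_C(1) << 0)	/* mapping is valid */
#define PTE_RW      (UINT64_C(1) << 1)	/* writable */
#define PTE_NX      (UINT64_C(1) << 63)	/* no-execute (PAE) */

/*
 * If the slot already holds a present, read-only mapping, drop the RW
 * bit from the new value; every other bit passes through untouched.
 */
static uint64_t mask_rw_sketch(uint64_t old_pte, uint64_t new_pte)
{
	if ((old_pte & PTE_PRESENT) && !(old_pte & PTE_RW))
		new_pte &= ~PTE_RW;
	return new_pte;
}

/* Boot-time setter: filter first, then store into the pagetable slot. */
static void set_pte_init_sketch(uint64_t *ptep, uint64_t new_pte)
{
	assert(ptep != NULL);
	*ptep = mask_rw_sketch(*ptep, new_pte);
}
```

An empty slot accepts the new pte unchanged, while a slot that Xen has
mapped read-only stays read-only but can still gain NX.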

PRIVILEGED INSTRUCTIONS AND SEGMENTATION

When the kernel runs under Xen, it runs in ring 1 rather than ring 0.
This means that it is more privileged than user-mode in ring 3, but it
still can't run privileged instructions directly.  Instructions which
are not performance-critical are dealt with by taking a privilege
exception, trapping into the hypervisor and emulating the instruction
there, but more performance-critical instructions have their own
specific paravirt_ops.  In many cases we can avoid having to do any
hypercalls for these instructions, or the Xen implementation is quite
different from the normal native version.

The privileged instructions fall into the broad classes of:
  Segmentation: setting up the GDT and the GDT entries, LDT,
     TLS and so on.  Xen doesn't allow the GDT to be directly
     modified; all GDT updates are done via hypercalls where the new
     entries can be validated.  This is important because Xen uses
     segment limits to prevent the guest kernel from damaging the
     hypervisor itself.
  Traps and exceptions: Xen uses a special format for trap entrypoints,
     so when the kernel wants to set an IDT entry, it needs to be
     converted to the form Xen expects.  Xen sets int 0x80 up specially
     so that the trap goes straight from userspace into the guest kernel
     without going via the hypervisor.  sysenter isn't supported.
  Kernel stack: The esp0 entry is extracted from the tss and provided to
     Xen.
  TLB operations: the various TLB calls are mapped into corresponding
     Xen hypercalls.
  Control registers: all the control registers are privileged.  The most
     important is cr3, which points to the base of the current pagetable,
     and we handle it specially.

Another instruction we treat specially is CPUID, even though it's not
privileged.  We want to control what CPU features are visible to the
rest of the kernel, and so CPUID ends up going into a paravirt_op.
Xen implements this mainly to disable the ACPI and APIC subsystems.

INTERRUPT FLAGS

Xen maintains its own separate flag for masking events, which is
contained within the per-cpu vcpu_info structure.  Because the guest
kernel runs in ring 1 and not 0, the IF flag in EFLAGS is completely
ignored (and must be, because even if a guest domain disables
interrupts for itself, it can't disable them overall).

(A note on terminology: "events" and interrupts are effectively
synonymous.  However, rather than using an "enable flag", Xen uses a
"mask flag", which blocks event delivery when it is non-zero.)

There are paravirt_ops for each of cli/sti/save_fl/restore_fl, which
are implemented to manage the Xen event mask state.  The only thing
worth noting is that when events are unmasked, we need to explicitly
see if there's a pending event and call into the hypervisor to make
sure it gets delivered.
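
A minimal sketch of the four operations follows.  It is a
single-threaded model, not the patch's code: the two fields mirror the
names in Xen's public vcpu_info, but struct vcpu_info_sketch, the
global vcpu and force_evtchn_callback_sketch() (a stand-in for calling
into the hypervisor) are assumptions, and the compiler barriers and
per-cpu accessors real code would need are left out.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_IF_SKETCH 0x200UL	/* bit 9, reported to callers as "enabled" */

/* Reduced model of the per-cpu structure shared with Xen. */
struct vcpu_info_sketch {
	uint8_t evtchn_upcall_pending;	/* an event is waiting */
	uint8_t evtchn_upcall_mask;	/* non-zero blocks delivery */
};

static struct vcpu_info_sketch vcpu;
static unsigned int forced_callbacks;

/* Stand-in for the hypercall that makes Xen deliver pending events. */
static void force_evtchn_callback_sketch(void)
{
	assert(vcpu.evtchn_upcall_mask == 0);	/* only valid when unmasked */
	forced_callbacks++;
	vcpu.evtchn_upcall_pending = 0;
}

/* save_fl: translate the mask flag into an IF-style enable bit. */
static unsigned long save_fl_sketch(void)
{
	return vcpu.evtchn_upcall_mask ? 0 : EFLAGS_IF_SKETCH;
}

/* cli: masking never needs the hypervisor. */
static void irq_disable_sketch(void)
{
	vcpu.evtchn_upcall_mask = 1;
}

/* sti: unmask, then explicitly check for an event that arrived meanwhile. */
static void irq_enable_sketch(void)
{
	vcpu.evtchn_upcall_mask = 0;
	if (vcpu.evtchn_upcall_pending)
		force_evtchn_callback_sketch();
}

/* restore_fl: re-apply a value previously returned by save_fl. */
static void restore_fl_sketch(unsigned long flags)
{
	if (flags & EFLAGS_IF_SKETCH)
		irq_enable_sketch();
	else
		irq_disable_sketch();
}
```

Only the unmask path can reach the hypervisor stand-in; masking and
saving are plain memory accesses.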

UPCALLS

Xen needs a couple of upcall (or callback) functions to be implemented
by each guest.  One is the event upcall, which is how events
(interrupts, effectively) are delivered to the guests.  The other is
the failsafe callback, which is used to report errors that occur
either when reloading a segment register or during iret.  These are
implemented in i386/kernel/entry.S so they can jump into the normal
iret_exc path when necessary.

MULTICALL BATCHING

Xen provides a multicall mechanism, which allows multiple hypercalls
to be issued at once in order to mitigate the cost of trapping into
the hypervisor.  This is particularly useful for context switches,
since the 4-5 hypercalls they would normally need (reload cr3, update
TLS, maybe update LDT) can be reduced to one.  This patch implements a
generic batching mechanism for hypercalls, which gets used in many
places in the Xen code.
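
The idea can be sketched as a small queue that is flushed with a
single trap.  This is a toy model rather than the mechanism added by
this patch: struct mc_entry_sketch, struct mc_batch_sketch, BATCH_MAX
and the traps counter (which stands in for actually entering the
hypervisor) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 8	/* assumed capacity of one batch */

/* One queued hypercall: an operation number and its first argument. */
struct mc_entry_sketch {
	unsigned long op;
	unsigned long arg;
};

struct mc_batch_sketch {
	struct mc_entry_sketch entries[BATCH_MAX];
	size_t count;		/* entries currently queued */
	unsigned int traps;	/* simulated entries into the hypervisor */
	unsigned int ops_done;	/* operations the hypervisor has processed */
};

/* Issue everything queued so far as one (simulated) multicall. */
static void mc_flush_sketch(struct mc_batch_sketch *b)
{
	if (b->count == 0)
		return;		/* nothing queued, so no trap at all */
	b->traps++;
	b->ops_done += (unsigned int)b->count;
	b->count = 0;
}

/* Queue one operation, flushing first if the batch is already full. */
static void mc_queue_sketch(struct mc_batch_sketch *b,
			    unsigned long op, unsigned long arg)
{
	if (b->count == BATCH_MAX)
		mc_flush_sketch(b);
	assert(b->count < BATCH_MAX);
	b->entries[b->count].op = op;
	b->entries[b->count].arg = arg;
	b->count++;
}
```

A context switch would queue its cr3, TLS and LDT updates and then
flush once, paying for one trap instead of one per operation.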

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Ian Pratt <ian.pratt@xensource.com>
Cc: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Cc: Adrian Bunk <bunk@stusta.de>
2007-07-18 08:47:42 -07:00
..
mach-bigsmp [PATCH] x86: default to physical mode on hotplug CPU kernels 2007-05-02 19:27:04 +02:00
mach-default paravirt: increase IRQ limit 2007-07-18 08:47:41 -07:00
mach-es7000 i386: es7000 build breakage fix 2007-07-06 10:23:43 -07:00
mach-generic [PATCH] x86: default to physical mode on hotplug CPU kernels 2007-05-02 19:27:04 +02:00
mach-numaq [PATCH] x86: default to physical mode on hotplug CPU kernels 2007-05-02 19:27:04 +02:00
mach-summit [PATCH] x86: Log reason why TSC was marked unstable 2007-05-02 19:27:08 +02:00
mach-visws [PATCH] x86: default to physical mode on hotplug CPU kernels 2007-05-02 19:27:04 +02:00
mach-voyager [PATCH] clockevents: i386 drivers 2007-02-16 08:13:59 -08:00
xen xen: Core Xen implementation 2007-07-18 08:47:42 -07:00
8253pit.h
a.out.h
acpi.h ACPI: cleanup: make disable_acpi() valid w/o CONFIG_ACPI 2007-02-13 00:09:13 -05:00
agp.h [AGPGART] Move [un]map_page_into_agp into asm/agp.h 2007-04-26 14:22:50 -04:00
alternative-asm.i Remove all inclusions of <linux/config.h> 2006-10-04 03:38:54 -04:00
alternative.h i386: work around miscompilation of alternatives code 2007-05-11 08:29:32 -07:00
apic.h [PATCH] i386: safe_apic_wait_icr_idle - i386 2007-05-02 19:27:17 +02:00
apicdef.h x86_64: Remove stale lapic definition from apicdef.h 2006-04-01 22:50:03 -05:00
arch_hooks.h IRQ: Maintain regs pointer globally rather than passing to IRQ handlers 2006-10-05 15:10:12 +01:00
atomic.h i386: fix early usage of atomic_add_return and local_add_return on real i386 2007-05-23 20:14:15 -07:00
auxvec.h
bitops.h Fix misspellings collected by members of KJ list. 2007-05-09 07:14:03 +02:00
boot.h include/asm-i386/boot.h: This is <asm/boot.h>, not <linux/boot.h> 2007-07-12 10:55:54 -07:00
bootparam.h Make struct boot_params a real structure, and remove obsolete fields 2007-07-12 10:55:54 -07:00
bug.h [PATCH] Generic BUG for i386 2006-12-08 08:28:39 -08:00
bugs.h [PATCH] x86: update for i386 and x86-64 check_bugs 2007-05-02 19:27:16 +02:00
byteorder.h Don't include linux/config.h from anywhere else in include/ 2006-04-26 12:56:16 +01:00
cache.h Don't include linux/config.h from anywhere else in include/ 2006-04-26 12:56:16 +01:00
cacheflush.h [PATCH] Optimize D-cache alias handling on fork 2006-12-13 09:27:08 -08:00
checksum.h [NET]: I386 checksum annotations and cleanups. 2006-12-02 21:23:19 -08:00
cmpxchg.h x86: create asm/cmpxchg.h 2007-05-08 11:15:20 -07:00
cpu.h [PATCH] i386: introduce the mechanism of disabling cpu hotplug control 2006-12-07 02:14:10 +01:00
cpufeature.h Use a new CPU feature word to cover features that are spread around 2007-07-12 10:55:54 -07:00
cputime.h
current.h [PATCH] i386: Convert PDA into the percpu section 2007-05-02 19:27:16 +02:00
debugreg.h
delay.h [PATCH] vmi: paravirt drop udelay op 2007-03-05 07:57:52 -08:00
desc.h [PATCH] i386: Page-align the GDT 2007-05-02 19:27:15 +02:00
device.h ACPI: Change ACPI to use dev_archdata instead of firmware_data 2006-12-01 14:52:01 -08:00
div64.h [NET]: div64_64 consolidate (rev3) 2007-04-25 22:23:33 -07:00
dma-mapping.h x86: Disable DAC on VIA bridges 2007-06-20 14:27:25 -07:00
dma.h Don't include linux/config.h from anywhere else in include/ 2006-04-26 12:56:16 +01:00
dmi.h [PATCH] x86_64: Implement early DMI scanning 2006-03-25 09:10:55 -08:00
dwarf2.h [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder 2006-09-26 10:52:41 +02:00
e820.h Make definitions for struct e820entry and struct e820map consistent 2007-07-12 10:55:54 -07:00
edac.h [PATCH] EDAC: core EDAC support code 2006-01-18 19:20:31 -08:00
elf.h i386: sched.h inclusion from module.h is baack 2007-05-08 11:15:08 -07:00
emergency-restart.h
errno.h
fb.h fbdev: detect primary display device 2007-07-17 10:23:11 -07:00
fcntl.h
fixmap.h serial: convert early_uart to earlycon for 8250 2007-07-16 09:05:35 -07:00
floppy.h IRQ: Maintain regs pointer globally rather than passing to IRQ handlers 2006-10-05 15:10:12 +01:00
frame.i Remove all inclusions of <linux/config.h> 2006-10-04 03:38:54 -04:00
futex.h [PATCH] mm: pagefault_{disable,enable}() 2006-12-07 08:39:21 -08:00
genapic.h [PATCH] x86: default to physical mode on hotplug CPU kernels 2007-05-02 19:27:04 +02:00
hardirq.h Don't include linux/config.h from anywhere else in include/ 2006-04-26 12:56:16 +01:00
highmem.h [PATCH] i386: PARAVIRT: add kmap_atomic_pte for mapping highpte pages 2007-05-02 19:27:15 +02:00
hpet.h [PATCH] x86: adjust inclusion of asm/fixmap.h 2007-05-02 19:27:04 +02:00
hw_irq.h [PATCH] i386/x86_64: Remove global IO_APIC_VECTOR 2006-10-08 12:24:02 -07:00
hypertransport.h [PATCH] Initial generic hypertransport interrupt support 2006-10-04 07:55:29 -07:00
i387.h [PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu 2007-05-02 19:27:21 +02:00
i8253.h [PATCH] clockevents: i386 drivers 2007-02-16 08:13:59 -08:00
i8259.h
ide.h fix jvc cdrom drive lockup 2007-07-16 09:05:40 -07:00
intel_arch_perfmon.h [PATCH] x86: i386/x86-64 Add nmi watchdog support for new Intel CPUs 2006-09-26 10:52:27 +02:00
io_apic.h [PATCH] io_apic.h needs apicdef.h 2007-03-05 07:57:50 -08:00
io.h serial: convert early_uart to earlycon for 8250 2007-07-16 09:05:35 -07:00
ioctl.h [PATCH] Generic ioctl.h 2006-01-10 08:01:34 -08:00
ioctls.h tty: i386/x86_64 arbitary speed support 2007-05-08 11:15:03 -07:00
ipc.h
ipcbuf.h
irq_regs.h [PATCH] i386: Convert PDA into the percpu section 2007-05-02 19:27:16 +02:00
irq.h xen: Core Xen implementation 2007-07-18 08:47:42 -07:00
irqflags.h [PATCH] i386: Use X86_EFLAGS_IF in irqflags.h. 2007-05-02 19:27:10 +02:00
ist.h
k8.h [PATCH] x86_64: Clean and enhance up K8 northbridge access code 2006-06-26 10:48:15 -07:00
Kbuild [PATCH] x86: Clean up x86 control register and MSR macros (corrected) 2007-05-02 19:27:12 +02:00
kdebug.h Revert "ipmi: add new IPMI nmi watchdog handling" 2007-05-14 15:24:24 -07:00
kexec.h kdump/kexec: calculate note size at compile time 2007-05-08 11:15:07 -07:00
kmap_types.h Don't include linux/config.h from anywhere else in include/ 2006-04-26 12:56:16 +01:00
kprobes.h [PATCH] IA64: kprobe invalidate icache of jump buffer 2006-07-31 13:28:38 -07:00
ldt.h
linkage.h
local.h i386: fix early usage of atomic_add_return and local_add_return on real i386 2007-05-23 20:14:15 -07:00
math_emu.h [PATCH] i386: PDA: Fix math emulator for new pt_regs 2006-12-07 02:14:03 +01:00
mc146818rtc.h
mca_dma.h [PATCH] kernel-doc for kernel/dma.c 2006-10-03 08:03:41 -07:00
mca.h
mce.h [PATCH] i386: Move mce_disabled to asm/mce.h 2007-02-13 13:26:26 +01:00
mman.h [PATCH] add asm-generic/mman.h 2006-02-15 15:32:22 -08:00
mmu_context.h paravirt: unstatic leave_mm 2007-07-18 08:47:41 -07:00
mmu.h [PATCH] vdso: randomize the i386 vDSO by moving it into a vma 2006-06-27 17:32:38 -07:00
mmx.h
mmzone.h i386 mmzone: use __maybe_unused 2007-05-09 12:30:57 -07:00
module.h [PATCH] i386: Add an option for the VIA C7 which sets appropriate L1 cache 2007-05-02 19:27:05 +02:00
mpspec_def.h [PATCH] x86-64: remove remaining pc98 code 2006-12-07 02:14:19 +01:00
mpspec.h [PATCH] clockevents: i386 drivers 2007-02-16 08:13:59 -08:00
msgbuf.h
msidef.h [PATCH] genirq: i386 irq: Move msi message composition into io_apic.c 2006-10-04 07:55:28 -07:00
msr-index.h [PATCH] i386: Enable support for fixed-range IORRs to keep RdMem & WrMem in sync 2007-05-02 19:27:17 +02:00
msr.h i386: msr.h: be paranoid about types and parentheses 2007-05-09 12:49:33 -07:00
mtrr.h [PATCH] x86: Save the MTRRs of the BSP before booting an AP 2007-05-02 19:27:17 +02:00
mutex.h [PATCH] i386: Remove lock section support in mutex.h 2006-09-26 10:52:31 +02:00
namei.h
nmi.h [PATCH] i386: Clean up NMI watchdog code 2007-05-02 19:27:20 +02:00
numa.h
numaq.h
page.h Add __GFP_MOVABLE for callers to flag allocations from high memory that may be migrated 2007-07-17 10:22:59 -07:00
param.h [PATCH] cleanup asm/setup.h userspace visibility 2006-12-07 08:39:46 -08:00
paravirt.h Add a sched_clock paravirt_op 2007-07-18 08:47:42 -07:00
parport.h
pci-direct.h
pci.h PCI: remove pci_dac_dma_... APIs 2007-07-11 16:02:11 -07:00
percpu.h [PATCH] i386: Define per_cpu_offset 2007-05-02 19:27:16 +02:00
pgalloc.h paravirt: add an "mm" argument to alloc_pt 2007-07-18 08:47:40 -07:00
pgtable-2level-defs.h [PATCH] i386: PARAVIRT: Allow paravirt backend to choose kernel PMD sharing 2007-05-02 19:27:13 +02:00
pgtable-2level.h page table handling cleanup 2007-07-16 09:05:36 -07:00
pgtable-3level-defs.h [PATCH] i386: PARAVIRT: Allow paravirt backend to choose kernel PMD sharing 2007-05-02 19:27:13 +02:00
pgtable-3level.h page table handling cleanup 2007-07-16 09:05:36 -07:00
pgtable.h mm: remove ptep_test_and_clear_dirty and ptep_clear_flush_dirty 2007-07-17 10:22:59 -07:00
poll.h Consolidate asm/poll.h 2007-05-11 08:29:34 -07:00
posix_types.h i386: improve and correct inline asm memory constraints 2006-07-08 15:24:18 -07:00
processor-flags.h [PATCH] x86: Clean up x86 control register and MSR macros (corrected) 2007-05-02 19:27:12 +02:00
processor.h make seccomp zerocost in schedule 2007-07-16 09:05:50 -07:00
ptrace-abi.h [PATCH] Split i386 and x86_64 ptrace.h 2006-09-26 08:49:10 -07:00
ptrace.h [PATCH] i386: Profile pc badness 2007-02-13 13:26:21 +01:00
reboot_fixups.h [PATCH] i386: clean up mach_reboot_fixups 2007-05-02 19:27:06 +02:00
reboot.h [PATCH] i386: Add machine_ops interface to abstract halting and rebooting 2007-05-02 19:27:11 +02:00
required-features.h Use a new CPU feature word to cover features that are spread around 2007-07-12 10:55:54 -07:00
resource.h
rtc.h
rwlock.h [PATCH] i386: Clean up spin/rwlocks 2006-09-26 10:52:32 +02:00
rwsem.h [PATCH] lockdep: name some old style locks 2006-12-07 08:39:36 -08:00
scatterlist.h PCI: scatterlist.h needs types.h 2007-05-02 19:02:34 -07:00
seccomp.h
sections.h
segment.h [PATCH] i386: Fix UP gdt bugs 2007-05-02 19:27:16 +02:00
semaphore.h [PATCH] i386: Use early clobbers for semaphores now 2006-09-27 14:39:51 -07:00
sembuf.h
serial.h x86, serial: convert legacy COM ports to platform devices 2007-05-08 11:15:23 -07:00
setup.h paravirt: add a hook for once the allocator is ready 2007-07-18 08:47:41 -07:00
shmbuf.h
shmparam.h
sigcontext.h
siginfo.h
signal.h [PATCH] headers_check: move inclusion of <linux/linkage.h> in <asm-i386/signal.h> 2006-09-13 07:32:15 -07:00
smp.h paravirt: make siblingmap functions visible 2007-07-18 08:47:41 -07:00
socket.h [NET]: Adding SO_TIMESTAMPNS / SCM_TIMESTAMPNS support 2007-04-25 22:24:21 -07:00
sockios.h [NET]: Introduce SIOCGSTAMPNS ioctl to get timestamps with nanosec resolution 2007-04-25 22:24:04 -07:00
sparsemem.h
spinlock_types.h [PATCH] Remove 'volatile' from spinlock_types 2006-12-06 14:39:53 -08:00
spinlock.h [PATCH] paravirt: Patch inline replacements for paravirt intercepts 2006-12-07 02:14:08 +01:00
srat.h
stacktrace.h [PATCH] i386: Do stacktracer conversion too 2006-09-26 10:52:34 +02:00
stat.h [PATCH] 2TB files: st_blocks is invalid when calling stat64 2006-03-26 08:57:00 -08:00
statfs.h
string.h Don't include linux/config.h from anywhere else in include/ 2006-04-26 12:56:16 +01:00
suspend.h Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6 2006-12-07 08:59:11 -08:00
sync_bitops.h Fix misspellings collected by members of KJ list. 2007-05-09 07:14:03 +02:00
system.h x86: create asm/cmpxchg.h 2007-05-08 11:15:20 -07:00
termbits.h tty: i386/x86_64 arbitary speed support 2007-05-08 11:15:03 -07:00
termios.h tty: i386/x86_64 arbitary speed support 2007-05-08 11:15:03 -07:00
therm_throt.h [PATCH] x86: Add a cumulative thermal throttle event counter. 2006-09-26 10:52:42 +02:00
thread_info.h make seccomp zerocost in schedule 2007-07-16 09:05:50 -07:00
time.h [PATCH] vmi: pit override 2007-03-05 07:57:52 -08:00
timer.h Add a sched_clock paravirt_op 2007-07-18 08:47:42 -07:00
timex.h [PATCH] Time: i386 Conversion - part 2: Rework TSC Support 2006-06-26 09:58:21 -07:00
tlb.h
tlbflush.h Detach sched.h from mm.h 2007-05-21 09:18:19 -07:00
topology.h [PATCH] sched: remove SMT nice 2007-03-05 07:57:51 -08:00
tsc.h i386: work around miscompilation of alternatives code 2007-05-11 08:29:32 -07:00
types.h [PATCH] Centralise definitions of sector_t and blkcnt_t 2006-12-04 19:41:15 -08:00
uaccess.h [PATCH] i386: Update __copy_to_user_inatomic linuxdoc description 2007-05-02 19:27:06 +02:00
ucontext.h
unaligned.h
unistd.h signal/timer/event: eventfd wire up x86 arches 2007-05-11 08:29:37 -07:00
unwind.h Remove stack unwinder for now 2006-12-15 08:47:51 -08:00
user.h
vga.h [PATCH] vgacon: make VGA_MAP_MEM take size, remove extra use 2006-06-22 15:05:58 -07:00
vic.h [VOYAGER] fix up ptregs removal mess 2006-10-12 22:25:03 -05:00
vm86.h [PATCH] i386: Update sys_vm86 to cope with changed pt_regs and %gs usage 2006-12-07 02:14:03 +01:00
vmi_time.h Add a sched_clock paravirt_op 2007-07-18 08:47:42 -07:00
vmi.h [PATCH] vmi: apic ops 2007-03-05 07:57:52 -08:00
voyager.h [VOYAGER] Convert the monitor thread to use the kthread API 2007-05-01 10:09:29 -05:00
xor.h