Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
This commit is contained in:
Linus Torvalds
2005-04-16 15:20:36 -07:00
commit 1da177e4c3
17291 changed files with 6718755 additions and 0 deletions

View File

@@ -0,0 +1,69 @@
IRQ affinity on IA64 platforms
------------------------------
07.01.2002, Erich Focht <efocht@ess.nec.de>
By writing to /proc/irq/IRQ#/smp_affinity the interrupt routing can be
controlled. The behavior on IA64 platforms is slightly different from
that described in Documentation/IRQ-affinity.txt for i386 systems.
Because of the usage of SAPIC mode and physical destination mode the
IRQ target is one particular CPU and cannot be a mask of several
CPUs. Only the first non-zero bit is taken into account.
Usage examples:
The target CPU has to be specified as a hexadecimal CPU mask. The
first non-zero bit is the selected CPU. This format has been kept for
compatibility reasons with i386.
Set the delivery mode of interrupt 41 to fixed and route the
interrupts to CPU #3 (logical CPU number) (2^3=0x08):
echo "8" >/proc/irq/41/smp_affinity
Set the default route for IRQ number 41 to CPU 6 in lowest priority
delivery mode (redirectable):
echo "r 40" >/proc/irq/41/smp_affinity
The output of the command
cat /proc/irq/IRQ#/smp_affinity
gives the target CPU mask for the specified interrupt vector. If the CPU
mask is preceded by the character "r", the interrupt is redirectable
(i.e. lowest priority mode routing is used), otherwise its route is
fixed.
Initialization and default behavior:
If the platform features IRQ redirection (info provided by SAL) all
IO-SAPIC interrupts are initialized with CPU#0 as their default target
and the routing is the so called "lowest priority mode" (actually
fixed SAPIC mode with hint). The XTP chipset registers are used as hints
for the IRQ routing. Currently in Linux XTP registers can have three
values:
- minimal for an idle task,
- normal if any other task runs,
- maximal if the CPU is going to be switched off.
The IRQ is routed to the CPU with lowest XTP register value, the
search begins at the default CPU. Therefore most of the interrupts
will be handled by CPU #0.
If the platform doesn't feature interrupt redirection IOSAPIC fixed
routing is used. The target CPUs are distributed in a round robin
manner. IRQs will be routed only to the selected target CPUs. Check
with
cat /proc/interrupts
Comments:
On large (multi-node) systems it is recommended to route the IRQs to
the node to which the corresponding device is connected.
For systems like the NEC AzusA we get IRQ node-affinity for free. This
is because usually the chipsets on each node redirect the interrupts
only to their own CPUs (as they cannot see the XTP registers on the
other nodes).

43
Documentation/ia64/README Normal file
View File

@@ -0,0 +1,43 @@
Linux kernel release 2.4.xx for the IA-64 Platform
These are the release notes for Linux version 2.4 for IA-64
platform. This document provides information specific to IA-64
ONLY, to get additional information about the Linux kernel also
read the original Linux README provided with the kernel.
INSTALLING the kernel:
- IA-64 kernel installation is the same as the other platforms, see
original README for details.
SOFTWARE REQUIREMENTS
Compiling and running this kernel requires an IA-64 compliant GCC
compiler. And various software packages also compiled with an
IA-64 compliant GCC compiler.
CONFIGURING the kernel:
Configuration is the same, see original README for details.
COMPILING the kernel:
- Compiling this kernel doesn't differ from other platform so read
the original README for details BUT make sure you have an IA-64
compliant GCC compiler.
IA-64 SPECIFICS
- General issues:
o Hardly any performance tuning has been done. Obvious targets
include the library routines (IP checksum, etc.). Less
obvious targets include making sure we don't flush the TLB
needlessly, etc.
o SMP locks cleanup/optimization
o IA32 support. Currently experimental. It mostly works.

View File

@@ -0,0 +1,128 @@
EFI Real Time Clock driver
-------------------------------
S. Eranian <eranian@hpl.hp.com>
March 2000
I/ Introduction
This document describes the efirtc.c driver has provided for
the IA-64 platform.
The purpose of this driver is to supply an API for kernel and user applications
to get access to the Time Service offered by EFI version 0.92.
EFI provides 4 calls one can make once the OS is booted: GetTime(),
SetTime(), GetWakeupTime(), SetWakeupTime() which are all supported by this
driver. We describe those calls as well the design of the driver in the
following sections.
II/ Design Decisions
The original ideas was to provide a very simple driver to get access to,
at first, the time of day service. This is required in order to access, in a
portable way, the CMOS clock. A program like /sbin/hwclock uses such a clock
to initialize the system view of the time during boot.
Because we wanted to minimize the impact on existing user-level apps using
the CMOS clock, we decided to expose an API that was very similar to the one
used today with the legacy RTC driver (driver/char/rtc.c). However, because
EFI provides a simpler services, not all all ioctl() are available. Also
new ioctl()s have been introduced for things that EFI provides but not the
legacy.
EFI uses a slightly different way of representing the time, noticeably
the reference date is different. Year is the using the full 4-digit format.
The Epoch is January 1st 1998. For backward compatibility reasons we don't
expose this new way of representing time. Instead we use something very
similar to the struct tm, i.e. struct rtc_time, as used by hwclock.
One of the reasons for doing it this way is to allow for EFI to still evolve
without necessarily impacting any of the user applications. The decoupling
enables flexibility and permits writing wrapper code is ncase things change.
The driver exposes two interfaces, one via the device file and a set of
ioctl()s. The other is read-only via the /proc filesystem.
As of today we don't offer a /proc/sys interface.
To allow for a uniform interface between the legacy RTC and EFI time service,
we have created the include/linux/rtc.h header file to contain only the
"public" API of the two drivers. The specifics of the legacy RTC are still
in include/linux/mc146818rtc.h.
III/ Time of day service
The part of the driver gives access to the time of day service of EFI.
Two ioctl()s, compatible with the legacy RTC calls:
Read the CMOS clock: ioctl(d, RTC_RD_TIME, &rtc);
Write the CMOS clock: ioctl(d, RTC_SET_TIME, &rtc);
The rtc is a pointer to a data structure defined in rtc.h which is close
to a struct tm:
struct rtc_time {
int tm_sec;
int tm_min;
int tm_hour;
int tm_mday;
int tm_mon;
int tm_year;
int tm_wday;
int tm_yday;
int tm_isdst;
};
The driver takes care of converting back an forth between the EFI time and
this format.
Those two ioctl()s can be exercised with the hwclock command:
For reading:
# /sbin/hwclock --show
Mon Mar 6 15:32:32 2000 -0.910248 seconds
For setting:
# /sbin/hwclock --systohc
Root privileges are required to be able to set the time of day.
IV/ Wakeup Alarm service
EFI provides an API by which one can program when a machine should wakeup,
i.e. reboot. This is very different from the alarm provided by the legacy
RTC which is some kind of interval timer alarm. For this reason we don't use
the same ioctl()s to get access to the service. Instead we have
introduced 2 news ioctl()s to the interface of an RTC.
We have added 2 new ioctl()s that are specific to the EFI driver:
Read the current state of the alarm
ioctl(d, RTC_WKLAM_RD, &wkt)
Set the alarm or change its status
ioctl(d, RTC_WKALM_SET, &wkt)
The wkt structure encapsulates a struct rtc_time + 2 extra fields to get
status information:
struct rtc_wkalrm {
unsigned char enabled; /* =1 if alarm is enabled */
unsigned char pending; /* =1 if alarm is pending */
struct rtc_time time;
}
As of today, none of the existing user-level apps supports this feature.
However writing such a program should be hard by simply using those two
ioctl().
Root privileges are required to be able to set the alarm.
V/ References.
Checkout the following Web site for more information on EFI:
http://developer.intel.com/technology/efi/

286
Documentation/ia64/fsys.txt Normal file
View File

@@ -0,0 +1,286 @@
-*-Mode: outline-*-
Light-weight System Calls for IA-64
-----------------------------------
Started: 13-Jan-2003
Last update: 27-Sep-2003
David Mosberger-Tang
<davidm@hpl.hp.com>
Using the "epc" instruction effectively introduces a new mode of
execution to the ia64 linux kernel. We call this mode the
"fsys-mode". To recap, the normal states of execution are:
- kernel mode:
Both the register stack and the memory stack have been
switched over to kernel memory. The user-level state is saved
in a pt-regs structure at the top of the kernel memory stack.
- user mode:
Both the register stack and the kernel stack are in
user memory. The user-level state is contained in the
CPU registers.
- bank 0 interruption-handling mode:
This is the non-interruptible state which all
interruption-handlers start execution in. The user-level
state remains in the CPU registers and some kernel state may
be stored in bank 0 of registers r16-r31.
In contrast, fsys-mode has the following special properties:
- execution is at privilege level 0 (most-privileged)
- CPU registers may contain a mixture of user-level and kernel-level
state (it is the responsibility of the kernel to ensure that no
security-sensitive kernel-level state is leaked back to
user-level)
- execution is interruptible and preemptible (an fsys-mode handler
can disable interrupts and avoid all other interruption-sources
to avoid preemption)
- neither the memory-stack nor the register-stack can be trusted while
in fsys-mode (they point to the user-level stacks, which may
be invalid, or completely bogus addresses)
In summary, fsys-mode is much more similar to running in user-mode
than it is to running in kernel-mode. Of course, given that the
privilege level is at level 0, this means that fsys-mode requires some
care (see below).
* How to tell fsys-mode
Linux operates in fsys-mode when (a) the privilege level is 0 (most
privileged) and (b) the stacks have NOT been switched to kernel memory
yet. For convenience, the header file <asm-ia64/ptrace.h> provides
three macros:
user_mode(regs)
user_stack(task,regs)
fsys_mode(task,regs)
The "regs" argument is a pointer to a pt_regs structure. The "task"
argument is a pointer to the task structure to which the "regs"
pointer belongs to. user_mode() returns TRUE if the CPU state pointed
to by "regs" was executing in user mode (privilege level 3).
user_stack() returns TRUE if the state pointed to by "regs" was
executing on the user-level stack(s). Finally, fsys_mode() returns
TRUE if the CPU state pointed to by "regs" was executing in fsys-mode.
The fsys_mode() macro is equivalent to the expression:
!user_mode(regs) && user_stack(task,regs)
* How to write an fsyscall handler
The file arch/ia64/kernel/fsys.S contains a table of fsyscall-handlers
(fsyscall_table). This table contains one entry for each system call.
By default, a system call is handled by fsys_fallback_syscall(). This
routine takes care of entering (full) kernel mode and calling the
normal Linux system call handler. For performance-critical system
calls, it is possible to write a hand-tuned fsyscall_handler. For
example, fsys.S contains fsys_getpid(), which is a hand-tuned version
of the getpid() system call.
The entry and exit-state of an fsyscall handler is as follows:
** Machine state on entry to fsyscall handler:
- r10 = 0
- r11 = saved ar.pfs (a user-level value)
- r15 = system call number
- r16 = "current" task pointer (in normal kernel-mode, this is in r13)
- r32-r39 = system call arguments
- b6 = return address (a user-level value)
- ar.pfs = previous frame-state (a user-level value)
- PSR.be = cleared to zero (i.e., little-endian byte order is in effect)
- all other registers may contain values passed in from user-mode
** Required machine state on exit to fsyscall handler:
- r11 = saved ar.pfs (as passed into the fsyscall handler)
- r15 = system call number (as passed into the fsyscall handler)
- r32-r39 = system call arguments (as passed into the fsyscall handler)
- b6 = return address (as passed into the fsyscall handler)
- ar.pfs = previous frame-state (as passed into the fsyscall handler)
Fsyscall handlers can execute with very little overhead, but with that
speed comes a set of restrictions:
o Fsyscall-handlers MUST check for any pending work in the flags
member of the thread-info structure and if any of the
TIF_ALLWORK_MASK flags are set, the handler needs to fall back on
doing a full system call (by calling fsys_fallback_syscall).
o Fsyscall-handlers MUST preserve incoming arguments (r32-r39, r11,
r15, b6, and ar.pfs) because they will be needed in case of a
system call restart. Of course, all "preserved" registers also
must be preserved, in accordance to the normal calling conventions.
o Fsyscall-handlers MUST check argument registers for containing a
NaT value before using them in any way that could trigger a
NaT-consumption fault. If a system call argument is found to
contain a NaT value, an fsyscall-handler may return immediately
with r8=EINVAL, r10=-1.
o Fsyscall-handlers MUST NOT use the "alloc" instruction or perform
any other operation that would trigger mandatory RSE
(register-stack engine) traffic.
o Fsyscall-handlers MUST NOT write to any stacked registers because
it is not safe to assume that user-level called a handler with the
proper number of arguments.
o Fsyscall-handlers need to be careful when accessing per-CPU variables:
unless proper safe-guards are taken (e.g., interruptions are avoided),
execution may be pre-empted and resumed on another CPU at any given
time.
o Fsyscall-handlers must be careful not to leak sensitive kernel'
information back to user-level. In particular, before returning to
user-level, care needs to be taken to clear any scratch registers
that could contain sensitive information (note that the current
task pointer is not considered sensitive: it's already exposed
through ar.k6).
o Fsyscall-handlers MUST NOT access user-memory without first
validating access-permission (this can be done typically via
probe.r.fault and/or probe.w.fault) and without guarding against
memory access exceptions (this can be done with the EX() macros
defined by asmmacro.h).
The above restrictions may seem draconian, but remember that it's
possible to trade off some of the restrictions by paying a slightly
higher overhead. For example, if an fsyscall-handler could benefit
from the shadow register bank, it could temporarily disable PSR.i and
PSR.ic, switch to bank 0 (bsw.0) and then use the shadow registers as
needed. In other words, following the above rules yields extremely
fast system call execution (while fully preserving system call
semantics), but there is also a lot of flexibility in handling more
complicated cases.
* Signal handling
The delivery of (asynchronous) signals must be delayed until fsys-mode
is exited. This is acomplished with the help of the lower-privilege
transfer trap: arch/ia64/kernel/process.c:do_notify_resume_user()
checks whether the interrupted task was in fsys-mode and, if so, sets
PSR.lp and returns immediately. When fsys-mode is exited via the
"br.ret" instruction that lowers the privilege level, a trap will
occur. The trap handler clears PSR.lp again and returns immediately.
The kernel exit path then checks for and delivers any pending signals.
* PSR Handling
The "epc" instruction doesn't change the contents of PSR at all. This
is in contrast to a regular interruption, which clears almost all
bits. Because of that, some care needs to be taken to ensure things
work as expected. The following discussion describes how each PSR bit
is handled.
PSR.be Cleared when entering fsys-mode. A srlz.d instruction is used
to ensure the CPU is in little-endian mode before the first
load/store instruction is executed. PSR.be is normally NOT
restored upon return from an fsys-mode handler. In other
words, user-level code must not rely on PSR.be being preserved
across a system call.
PSR.up Unchanged.
PSR.ac Unchanged.
PSR.mfl Unchanged. Note: fsys-mode handlers must not write-registers!
PSR.mfh Unchanged. Note: fsys-mode handlers must not write-registers!
PSR.ic Unchanged. Note: fsys-mode handlers can clear the bit, if needed.
PSR.i Unchanged. Note: fsys-mode handlers can clear the bit, if needed.
PSR.pk Unchanged.
PSR.dt Unchanged.
PSR.dfl Unchanged. Note: fsys-mode handlers must not write-registers!
PSR.dfh Unchanged. Note: fsys-mode handlers must not write-registers!
PSR.sp Unchanged.
PSR.pp Unchanged.
PSR.di Unchanged.
PSR.si Unchanged.
PSR.db Unchanged. The kernel prevents user-level from setting a hardware
breakpoint that triggers at any privilege level other than 3 (user-mode).
PSR.lp Unchanged.
PSR.tb Lazy redirect. If a taken-branch trap occurs while in
fsys-mode, the trap-handler modifies the saved machine state
such that execution resumes in the gate page at
syscall_via_break(), with privilege level 3. Note: the
taken branch would occur on the branch invoking the
fsyscall-handler, at which point, by definition, a syscall
restart is still safe. If the system call number is invalid,
the fsys-mode handler will return directly to user-level. This
return will trigger a taken-branch trap, but since the trap is
taken _after_ restoring the privilege level, the CPU has already
left fsys-mode, so no special treatment is needed.
PSR.rt Unchanged.
PSR.cpl Cleared to 0.
PSR.is Unchanged (guaranteed to be 0 on entry to the gate page).
PSR.mc Unchanged.
PSR.it Unchanged (guaranteed to be 1).
PSR.id Unchanged. Note: the ia64 linux kernel never sets this bit.
PSR.da Unchanged. Note: the ia64 linux kernel never sets this bit.
PSR.dd Unchanged. Note: the ia64 linux kernel never sets this bit.
PSR.ss Lazy redirect. If set, "epc" will cause a Single Step Trap to
be taken. The trap handler then modifies the saved machine
state such that execution resumes in the gate page at
syscall_via_break(), with privilege level 3.
PSR.ri Unchanged.
PSR.ed Unchanged. Note: This bit could only have an effect if an fsys-mode
handler performed a speculative load that gets NaTted. If so, this
would be the normal & expected behavior, so no special treatment is
needed.
PSR.bn Unchanged. Note: fsys-mode handlers may clear the bit, if needed.
Doing so requires clearing PSR.i and PSR.ic as well.
PSR.ia Unchanged. Note: the ia64 linux kernel never sets this bit.
* Using fast system calls
To use fast system calls, userspace applications need simply call
__kernel_syscall_via_epc(). For example
-- example fgettimeofday() call --
-- fgettimeofday.S --
#include <asm/asmmacro.h>
GLOBAL_ENTRY(fgettimeofday)
.prologue
.save ar.pfs, r11
mov r11 = ar.pfs
.body
mov r2 = 0xa000000000020660;; // gate address
// found by inspection of System.map for the
// __kernel_syscall_via_epc() function. See
// below for how to do this for real.
mov b7 = r2
mov r15 = 1087 // gettimeofday syscall
;;
br.call.sptk.many b6 = b7
;;
.restore sp
mov ar.pfs = r11
br.ret.sptk.many rp;; // return to caller
END(fgettimeofday)
-- end fgettimeofday.S --
In reality, getting the gate address is accomplished by two extra
values passed via the ELF auxiliary vector (include/asm-ia64/elf.h)
o AT_SYSINFO : is the address of __kernel_syscall_via_epc()
o AT_SYSINFO_EHDR : is the address of the kernel gate ELF DSO
The ELF DSO is a pre-linked library that is mapped in by the kernel at
the gate page. It is a proper ELF shared object so, with a dynamic
loader that recognises the library, you should be able to make calls to
the exported functions within it as with any other shared library.
AT_SYSINFO points into the kernel DSO at the
__kernel_syscall_via_epc() function for historical reasons (it was
used before the kernel DSO) and as a convenience.

View File

@@ -0,0 +1,144 @@
SERIAL DEVICE NAMING
As of 2.6.10, serial devices on ia64 are named based on the
order of ACPI and PCI enumeration. The first device in the
ACPI namespace (if any) becomes /dev/ttyS0, the second becomes
/dev/ttyS1, etc., and PCI devices are named sequentially
starting after the ACPI devices.
Prior to 2.6.10, there were confusing exceptions to this:
- Firmware on some machines (mostly from HP) provides an HCDP
table[1] that tells the kernel about devices that can be used
as a serial console. If the user specified "console=ttyS0"
or the EFI ConOut path contained only UART devices, the
kernel registered the device described by the HCDP as
/dev/ttyS0.
- If there was no HCDP, we assumed there were UARTs at the
legacy COM port addresses (I/O ports 0x3f8 and 0x2f8), so
the kernel registered those as /dev/ttyS0 and /dev/ttyS1.
Any additional ACPI or PCI devices were registered sequentially
after /dev/ttyS0 as they were discovered.
With an HCDP, device names changed depending on EFI configuration
and "console=" arguments. Without an HCDP, device names didn't
change, but we registered devices that might not really exist.
For example, an HP rx1600 with a single built-in serial port
(described in the ACPI namespace) plus an MP[2] (a PCI device) has
these ports:
pre-2.6.10 pre-2.6.10
MMIO (EFI console (EFI console
address on builtin) on MP port) 2.6.10
========== ========== ========== ======
builtin 0xff5e0000 ttyS0 ttyS1 ttyS0
MP UPS 0xf8031000 ttyS1 ttyS2 ttyS1
MP Console 0xf8030000 ttyS2 ttyS0 ttyS2
MP 2 0xf8030010 ttyS3 ttyS3 ttyS3
MP 3 0xf8030038 ttyS4 ttyS4 ttyS4
CONSOLE SELECTION
EFI knows what your console devices are, but it doesn't tell the
kernel quite enough to actually locate them. The DIG64 HCDP
table[1] does tell the kernel where potential serial console
devices are, but not all firmware supplies it. Also, EFI supports
multiple simultaneous consoles and doesn't tell the kernel which
should be the "primary" one.
So how do you tell Linux which console device to use?
- If your firmware supplies the HCDP, it is simplest to
configure EFI with a single device (either a UART or a VGA
card) as the console. Then you don't need to tell Linux
anything; the kernel will automatically use the EFI console.
(This works only in 2.6.6 or later; prior to that you had
to specify "console=ttyS0" to get a serial console.)
- Without an HCDP, Linux defaults to a VGA console unless you
specify a "console=" argument.
NOTE: Don't assume that a serial console device will be /dev/ttyS0.
It might be ttyS1, ttyS2, etc. Make sure you have the appropriate
entries in /etc/inittab (for getty) and /etc/securetty (to allow
root login).
EARLY SERIAL CONSOLE
The kernel can't start using a serial console until it knows where
the device lives. Normally this happens when the driver enumerates
all the serial devices, which can happen a minute or more after the
kernel starts booting.
2.6.10 and later kernels have an "early uart" driver that works
very early in the boot process. The kernel will automatically use
this if the user supplies an argument like "console=uart,io,0x3f8",
or if the EFI console path contains only a UART device and the
firmware supplies an HCDP.
TROUBLESHOOTING SERIAL CONSOLE PROBLEMS
No kernel output after elilo prints "Uncompressing Linux... done":
- You specified "console=ttyS0" but Linux changed the device
to which ttyS0 refers. Configure exactly one EFI console
device[3] and remove the "console=" option.
- The EFI console path contains both a VGA device and a UART.
EFI and elilo use both, but Linux defaults to VGA. Remove
the VGA device from the EFI console path[3].
- Multiple UARTs selected as EFI console devices. EFI and
elilo use all selected devices, but Linux uses only one.
Make sure only one UART is selected in the EFI console
path[3].
- You're connected to an HP MP port[2] but have a non-MP UART
selected as EFI console device. EFI uses the MP as a
console device even when it isn't explicitly selected.
Either move the console cable to the non-MP UART, or change
the EFI console path[3] to the MP UART.
Long pause (60+ seconds) between "Uncompressing Linux... done" and
start of kernel output:
- No early console because you used "console=ttyS<n>". Remove
the "console=" option if your firmware supplies an HCDP.
- If you don't have an HCDP, the kernel doesn't know where
your console lives until the driver discovers serial
devices. Use "console=uart, io,0x3f8" (or appropriate
address for your machine).
Kernel and init script output works fine, but no "login:" prompt:
- Add getty entry to /etc/inittab for console tty. Look for
the "Adding console on ttyS<n>" message that tells you which
device is the console.
"login:" prompt, but can't login as root:
- Add entry to /etc/securetty for console tty.
[1] http://www.dig64.org/specifications/DIG64_PCDPv20.pdf
The table was originally defined as the "HCDP" for "Headless
Console/Debug Port." The current version is the "PCDP" for
"Primary Console and Debug Port Devices."
[2] The HP MP (management processor) is a PCI device that provides
several UARTs. One of the UARTs is often used as a console; the
EFI Boot Manager identifies it as "Acpi(HWP0002,700)/Pci(...)/Uart".
The external connection is usually a 25-pin connector, and a
special dongle converts that to three 9-pin connectors, one of
which is labelled "Console."
[3] EFI console devices are configured using the EFI Boot Manager
"Boot option maintenance" menu. You may have to interrupt the
boot sequence to use this menu, and you will have to reset the
box after changing console configuration.