linux-kernel-test/kernel
Eric W. Biederman 50804fe373 pidns: Support unsharing the pid namespace.
Unsharing of the pid namespace unlike unsharing of other namespaces
does not take affect immediately.  Instead it affects the children
created with fork and clone.  The first of these children becomes the init
process of the new pid namespace, the rest become oddball children
of pid 0.  From the point of view of the new pid namespace the process
that created it is pid 0, as it's pid does not map.

A couple of different semantics were considered but this one was
settled on because it is easy to implement and it is usable from
pam modules.  The core reasons for the existence of unshare.

I took a survey of the callers of pam modules and the following
appears to be a representative sample of their logic.
{
	setup stuff include pam
	child = fork();
	if (!child) {
		setuid()
                exec /bin/bash
        }
        waitpid(child);

        pam and other cleanup
}

As you can see there is a fork to create the unprivileged user
space process.  Which means that the unprivileged user space
process will appear as pid 1 in the new pid namespace.  Further
most login processes do not cope with extraneous children which
means shifting the duty of reaping extraneous child process to
the creator of those extraneous children makes the system more
comprehensible.

The practical reason for this set of pid namespace semantics is
that it is simple to implement and verify they work correctly.
Whereas an implementation that requres changing the struct
pid on a process comes with a lot more races and pain.  Not
the least of which is that glibc caches getpid().

These semantics are implemented by having two notions
of the pid namespace of a proces.  There is task_active_pid_ns
which is the pid namspace the process was created with
and the pid namespace that all pids are presented to
that process in.  The task_active_pid_ns is stored
in the struct pid of the task.

Then there is the pid namespace that will be used for children
that pid namespace is stored in task->nsproxy->pid_ns.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-19 05:59:16 -08:00
..
debug KGDB/KDB fixes and cleanups 2012-10-13 11:16:58 +09:00
events pidns: Use task_active_pid_ns where appropriate 2012-11-19 05:59:09 -08:00
gcov
irq irqdomain: augment add_simple() to allocate descs 2012-10-10 08:57:26 +02:00
power Merge branch 'pm-qos' 2012-09-17 20:25:51 +02:00
sched Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-10-12 22:13:05 +09:00
time Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-10-12 22:17:48 +09:00
trace Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/urgent 2012-10-21 19:53:34 +02:00
.gitignore
acct.c vfs: make path_openat take a struct filename pointer 2012-10-12 20:15:09 -04:00
async.c [SCSI] async: make async_synchronize_full() flush all work regardless of domain 2012-07-20 09:07:37 +01:00
audit_tree.c audit: clean up refcounting in audit-tree 2012-08-15 12:55:22 +02:00
audit_watch.c audit: optimize audit_compare_dname_path 2012-10-12 00:32:02 -04:00
audit.c fs: handle failed audit_log_start properly 2012-10-09 23:33:37 -04:00
audit.h audit: optimize audit_compare_dname_path 2012-10-12 00:32:02 -04:00
auditfilter.c audit: optimize audit_compare_dname_path 2012-10-12 00:32:02 -04:00
auditsc.c audit: make audit_inode take struct filename 2012-10-12 20:15:09 -04:00
backtracetest.c
bounds.c
capability.c userns: Teach inode_capable to understand inodes whose uids map to other namespaces. 2012-05-15 14:59:24 -07:00
cgroup_freezer.c cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them 2012-09-14 12:01:16 -07:00
cgroup.c pidns: Use task_active_pid_ns where appropriate 2012-11-19 05:59:09 -08:00
compat.c new helper: sigsuspend() 2012-05-21 23:52:30 -04:00
configs.c
cpu_pm.c kernel/cpu_pm.c: fix various typos 2012-05-31 17:49:27 -07:00
cpu.c CPU hotplug, debug: detect imbalance between get_online_cpus() and put_online_cpus() 2012-10-09 16:22:15 +09:00
cpuset.c cpusets: Remove/update outdated comments 2012-07-24 13:53:28 +02:00
crash_dump.c
cred.c userns: Make credential debugging user namespace safe. 2012-08-23 22:54:18 -07:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c pidns: Wait in zap_pid_ns_processes until pid_ns->nr_hashed == 1 2012-11-19 05:59:12 -08:00
extable.c extable: Skip sorting if sorted at build time. 2012-04-19 15:06:55 -07:00
fork.c pidns: Support unsharing the pid namespace. 2012-11-19 05:59:16 -08:00
freezer.c
futex_compat.c
futex.c futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi() 2012-07-24 16:02:57 +02:00
groups.c userns: Convert in_group_p and in_egroup_p to use kgid_t 2012-05-03 03:29:33 -07:00
hrtimer.c hrtimer: Update hrtimer base offsets each hrtimer_interrupt 2012-07-11 23:34:39 +02:00
hung_task.c hung task debugging: Inject NMI when hung and going to panic 2012-04-25 12:39:25 +02:00
irq_work.c irq_work: fix compile failure on tile from missing include 2012-04-13 13:15:16 -04:00
itimer.c itimer: Use printk_once instead of WARN_ONCE 2012-04-10 11:00:30 +02:00
jump_label.c jump_label: Export jump_label_rate_limit() 2012-08-06 19:00:35 +03:00
kallsyms.c vsprintf: fix %ps on non symbols when using kallsyms 2012-05-29 16:22:32 -07:00
kcmp.c syscalls, x86: add __NR_kcmp syscall 2012-05-31 17:49:32 -07:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking: Adjust spin lock inlining Kconfig options 2012-09-13 17:56:13 +02:00
Kconfig.preempt
kexec.c kdump: remove unneeded include 2012-10-06 03:05:19 +09:00
kfifo.c [media] kernel:kfifo: export __kfifo_max_r symbol 2012-04-11 18:24:37 -03:00
kmod.c infrastructure for saner ret_from_kernel_thread semantics 2012-10-12 13:35:07 -04:00
kprobes.c kprobes/x86: Fix to support jprobes on ftrace-based kprobe 2012-09-13 22:52:11 -04:00
ksysfs.c
kthread.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2012-10-13 10:05:52 +09:00
latencytop.c
lglock.c brlocks/lglocks: turn into functions 2012-05-29 23:28:41 -04:00
lockdep_internals.h
lockdep_proc.c
lockdep_states.h
lockdep.c lockdep: Check if nested lock is actually held 2012-09-13 17:00:44 +02:00
Makefile Makefile: Documentation for external tool should be correct 2012-10-25 16:00:53 -07:00
modsign_pubkey.c MODSIGN: Provide module signing public keys to the kernel 2012-10-10 20:01:22 +10:30
module_signing.c module_signing: fix printk format warning 2012-10-22 08:56:34 +03:00
module-internal.h MODSIGN: Move the magic string to the end of a module and eliminate the search 2012-10-19 17:30:40 -07:00
module.c MODSIGN: Move the magic string to the end of a module and eliminate the search 2012-10-19 17:30:40 -07:00
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c
nsproxy.c pidns: Support unsharing the pid namespace. 2012-11-19 05:59:16 -08:00
padata.c
panic.c panic: fix a possible deadlock in panic() 2012-07-30 17:25:13 -07:00
params.c params: replace printk(KERN_<LVL>...) with pr_<lvl>(...) 2012-05-04 17:28:18 -07:00
pid_namespace.c pidns: Support unsharing the pid namespace. 2012-11-19 05:59:16 -08:00
pid.c pidns: Wait in zap_pid_ns_processes until pid_ns->nr_hashed == 1 2012-11-19 05:59:12 -08:00
posix-cpu-timers.c
posix-timers.c
printk.c printk: Fix scheduling-while-atomic problem in console_cpu_notify() 2012-10-16 18:17:44 -07:00
profile.c
ptrace.c ptrace: mark __ptrace_may_access() static 2012-08-03 14:47:17 +10:00
range.c
rcu.h
rcupdate.c rcu: Add PROVE_RCU_DELAY to provoke difficult races 2012-09-23 07:42:49 -07:00
rcutiny_plugin.h rcu: Move TINY_PREEMPT_RCU away from raw_local_irq_save() 2012-09-23 07:42:51 -07:00
rcutiny.c rcu: Move TINY_RCU quiescent state out of extended quiescent state 2012-09-23 07:42:52 -07:00
rcutorture.c rcu: Prevent initialization race in rcutorture kthreads 2012-09-23 07:42:23 -07:00
rcutree_plugin.h rcu: Make RCU_FAST_NO_HZ handle adaptive ticks 2012-09-26 15:44:02 +02:00
rcutree_trace.c Merge remote-tracking branch 'tip/smp/hotplug' into next.2012.09.25b 2012-09-25 10:01:45 -07:00
rcutree.c rcu: Grace-period initialization excludes only RCU notifier 2012-10-08 09:06:38 -07:00
rcutree.h rcu: Grace-period initialization excludes only RCU notifier 2012-10-08 09:06:38 -07:00
relay.c splice: fix racy pipe->buffers uses 2012-06-13 21:16:42 +02:00
res_counter.c rescounters: add res_counter_uncharge_until() 2012-05-29 16:22:27 -07:00
resource.c kernel/resource.c: fix stack overflow in __reserve_region_with_split() 2012-10-06 03:05:31 +09:00
rtmutex_common.h
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rwsem.c
seccomp.c seccomp: fix build warnings when there is no CONFIG_SECCOMP_FILTER 2012-04-18 12:24:52 +10:00
semaphore.c semaphore: fix improper comment reference to mutex 2012-04-05 17:15:55 -07:00
signal.c pidns: Use task_active_pid_ns where appropriate 2012-11-19 05:59:09 -08:00
smp.c smp: Remove ipi_call_lock[_irq]()/ipi_call_unlock[_irq]() 2012-06-05 17:27:14 +02:00
smpboot.c hotplug: Fix UP bug in smpboot hotplug code 2012-08-13 17:01:07 +02:00
smpboot.h smpboot: Provide infrastructure for percpu hotplug threads 2012-08-13 17:01:07 +02:00
softirq.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-10-01 10:43:39 -07:00
spinlock.c
srcu.c workqueue: deprecate system_nrt[_freezable]_wq 2012-08-20 14:51:24 -07:00
stacktrace.c
stop_machine.c
sys_ni.c syscalls, x86: add __NR_kcmp syscall 2012-05-31 17:49:32 -07:00
sys.c use clamp_t in UNAME26 fix 2012-10-19 18:51:17 -07:00
sysctl_binary.c pidns: Use task_active_pid_ns where appropriate 2012-11-19 05:59:09 -08:00
sysctl.c Kconfig: clean up the "#if defined(arch)" list for exception-trace sysctl entry 2012-10-09 16:22:14 +09:00
task_work.c task_work: task_work_add() should not succeed after exit_task_work() 2012-09-13 16:47:34 +02:00
taskstats.c taskstats: cgroupstats_user_cmd() may leak on error 2012-10-06 03:05:31 +09:00
test_kprobes.c
time.c time: Move update_vsyscall definitions to timekeeper_internal.h 2012-09-24 12:38:06 -04:00
timeconst.pl
timer.c timers: Fix endless looping between cascade() and internal_add_timer() 2012-10-09 21:27:14 +02:00
tracepoint.c
tsacct.c userns: Convert taskstats to handle the user and pid namespaces. 2012-09-18 01:01:32 -07:00
uid16.c userns: Convert setting and getting uid and gid system calls to use kuid and kgid 2012-05-03 03:28:41 -07:00
up.c
user_namespace.c userns: Add kprojid_t and associated infrastructure in projid.h 2012-09-18 01:01:37 -07:00
user-return-notifier.c
user.c userns: Add kprojid_t and associated infrastructure in projid.h 2012-09-18 01:01:37 -07:00
utsname_sysctl.c
utsname.c userns: Use cred->user_ns instead of cred->user->user_ns 2012-04-07 16:55:51 -07:00
wait.c
watchdog.c watchdog: Use hotplug thread infrastructure 2012-08-13 17:01:07 +02:00
workqueue_sched.h
workqueue.c workqueue: cancel_delayed_work() should return %false if work item is idle 2012-10-24 12:38:16 -07:00