wonderfly Blog

UTLK - CHAPTER 3 Processes

Processes, Lightweight Processes, and Threads

In this book, the term “process” refers to an instance of a program in execution. From the kernel’s point of view, a process is an entity to which system resources (CPU time, memory, etc.) are allocated.

In older Unix systems, when a process is created, it shares the same code as its parent (the text segment) while having its own data segment so that changes it makes won’t affect the parent and vice versa. Modern Unix systems need to support multithreaded applications, i.e., multiple execution flows of the same program need to share some section of the data segment. In earlier Linux kernel versions, multithreaded applications are implemented in User Mode, by the POSIX threading libraries (pthread).

In newer Linux versions, Light Weight Processes (LWP) are used to create threads. LWPs are processes that can share some resources like open files, the address space, and so on. Whenever one of them modifies a shared resource, the others immediately see the change. A straight forward way to implement multithreaded applications is to associate a LWP to each user thread. Many pthread libraries do this.

Thread Groups. Linux uses thread groups to manage a set of LWPs, which together act as a whole with regards to some system calls like getpid(), kill(), _exit(). We are going to discuss these at length later, but basically these syscalls all use the thread group leader’s ID as process ID, treating a group of LWPs as one process.

Process Descriptor

Process descriptor is the data structure that encodes everything the kernel needs to know about a process - whether is running on a CPU or blocked on some events, its scheduling priority, address space allocated, open files, etc. In Linux, the type task_struct is defined for process descriptors (figure 3-1). The six data structures on the right hand side refer to specific resources owned by the process, and will be covered in future chapters. This chapter focuses on two of them: process state and process parent/child relationships.

fig-3-1 Figure 3-1. The Linux Process Descriptor

Process State

This field describes the state of the process - running, stopped etc. The current implementation defines it as an array of flags (bits), each describes a possible state. All states are mutually exclusive, hence at any given time, only one flag is set and others are all cleared. Possible states (source):

  • TASK_RUNNING. The process is either running on a CPU or waiting to be executed.
  • TASK_INTERRUPTIBLE. The process is suspended (sleeping) until some condition is met - an hardware interrupt (e.g., disk controller), a lock is released, a signal is received, etc.
  • TASK_UNINTERRUPTIBLE. Similar to above, except that delivering a signal to the sleeping process leaves its state unchanged. That is, it can be waken up only when if certain condition is met, say, a hardware interrupt.
  • TASK_STOPPED. Process execution has been stopped; upon receiving SIGSTOP, SIGTSTP, SIGTTIN or SIGTTOU.
  • TASK_TRACED. Process execution has been stopped by a debugger - think ptrace.

Apart from these five states, the state field (and the exit_state field) can have two additional states. As their name suggests, a process can reach on of these states only when they are terminated:

  • EXIT_ZOMBIE. Process is terminated, but its parent hasn’t called wait4() or waitpid() yet. This state is important because the kernel cannot discard a process’s descriptor until its parent has called a wait()-like syscall as per a Unit design principle. This will be detailed later in this chapter.
  • EXIT_DEAD. The final state: the process is being removed from the system because its parent has called wait4() or waitpid() for it. The next phase of the process’ life is the descriptor being deleted. So you may ask, “why not just delete the descriptor already?”. Well, having this last state is useful to avoid race conditions due to other threads that execute wait()-like calls on the same process (see Chapter 5).

The value of the state field is usually set with a simple assignment, e.g.:

p->state = TASK_RUNNING;

The kernel also uses the set_task_state and set_current_state macros to set the state of a specified process or the current running process. These macros also ensure atomicity of these instructions. See Chapter 5 for more details on this.

Identifying a Process

Process descriptor pointers. Each execution context that can be scheduled independently must have its own process descriptor; even lightweight processes (threads) have their own task_struct. The strict one-to-one mapping between processes and process descriptors makes the address of the task_struct structure a useful means for the kernel to identify processes. These addresses are referred to as process descriptor pointers.

PIDs. On the other hand, users like to address a process by an integer called the Process ID or PID. To satisfy this requirement the kernel stores PID in the pid field of the task_struct. PIDs are numbered sequentially and there is an upper limit (determined by /proc/sys/kernel/pid_max). When the kernel reaches this limit, it must start recycling the lower, unused PIDs. If the rate of process/thread creation exceeds the rate of PID recycle, process creation is going to fail with likely an EAGAIN (-11) error. The default for /proc/sys/kernel/pid_max is usually 32K, and can be lifted up to 4,194,303 on a 64-bit system.

pidmap_array. When recycling PID numbers, the kernel needs to know which of the 32K numbers are currently assigned and which are unused. It uses a bit map for that, called the pidmap_array. On 32-bit systems, a page contains 32K bits (4K bytes), which is perfect for storage of a pidmap_array. On 64-bit systems, multiple pages might be needed for the pidmap_array depending on the value of /proc/sys/kernel/pid_max. These pages are never released.

Threads. Each lightweight process has its own PID. To satisfy the user application requirement that multi-threaded applications should have only one PID, the kernel groups lightweight processes that belong to the same application (thread group), and makes system calls like getpid() return the PID of the thread group leader. In each LWP’s task_struct, the tgid field stores the PID of the thread group leader and for the thread group leader itself, that value is equal to its pid value.

PID to process descriptor pointer. Many system calls like kill() use PID to denote the affected process. It is thus important to make the process descriptor lookup efficient. As we will see soon enough, Linux has a few tricks to do that.

Relationships Among Processes