Linux: managing child processes that execute concurrently

Concurrent processing is complicated, and it is better avoided when possible. But when faster execution is a priority, or several programs must run at the same time, we have to use it. With the ampersand (&) we can run processes in the background, so that they run concurrently. When running multiple processes we have to keep track of the status of each one (exit code, stdout, stderr and so on), and we should be able to control its life cycle: kill it, stop it, pause it and so on.

To learn the exit code of a child process, store the value of “$?”; to capture its stdout and stderr, redirect them to a file.

“$?” holds the exit status of the last executed command.
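
As a minimal sketch of how “$?” behaves (the variable names are made up for this illustration): it has to be read right after the command of interest, and for a background job it is "wait" that hands back the job's own exit status.

```shell
#!/bin/sh
# "false" always exits with status 1, "true" with status 0.
false
status_false=$?    # read it immediately; the next command overwrites $?

true
status_true=$?

# For a background job, wait blocks until the job ends and
# returns that job's exit status.
sh -c 'exit 7' &
wait $!
status_bg=$?

echo "false=$status_false true=$status_true background=$status_bg"
```

Storing the value in a variable straight away is the safe habit, because even an echo in between resets “$?”.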

Example: Monitoring child process status:

Consider a script that just sleeps and then writes the exit code of sleep to a file.

#!/bin/sh

sleep 10

echo $? > proc1

Save the file as ‘script1.sh’, then make it executable.

Similarly, create another file, script2.sh, with the following contents.

#!/bin/sh

sleep 5

echo $? > proc2

Now run the two scripts concurrently using the ampersand.

./script1.sh & ./script2.sh &

This way the parent process (here, the running shell session) can learn the exit codes of these background processes by reading the proc1 and proc2 files. In the same way we can redirect stdout and stderr to files and learn the status of these child processes.
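
The same idea can be sketched in a single script, assuming two inline stand-ins for script1.sh and script2.sh (the job commands, file names and variables here are hypothetical). Each job gets its own stdout and stderr file, and the parent collects the exit codes with wait instead of having the children write them out.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Two stand-in jobs run concurrently; the second one fails on purpose.
sh -c 'echo "job1 working"; sleep 2; exit 0' \
    >"$workdir/job1.out" 2>"$workdir/job1.err" &
pid1=$!
sh -c 'echo "job2 failed" >&2; sleep 1; exit 3' \
    >"$workdir/job2.out" 2>"$workdir/job2.err" &
pid2=$!

# wait <pid> returns the exit status of that particular child.
wait "$pid1"; status1=$?
wait "$pid2"; status2=$?

out1=$(cat "$workdir/job1.out")
err2=$(cat "$workdir/job2.err")
rm -rf "$workdir"

echo "job1: exit=$status1 stdout=$out1"
echo "job2: exit=$status2 stderr=$err2"
```

Keeping one output file per job avoids interleaved lines, which is what tends to happen when several background processes share the terminal.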

Example: Controlling child process:

Suppose we want to kill a child process. We can do that if we have its pid, since we can then send it SIGTERM or SIGKILL. To store the pid of each child process as it is spawned, use the “$!” notation, which holds the pid of the most recent command started in the background.

“$!” is set in the shell that launches the background job, so read it there, right after starting each script: ./script1.sh & echo $! > pid1; ./script2.sh & echo $! > pid2. The pids can then be read from the files, and with the pids of the child processes we can stop them, kill them and so on.
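
A small sketch of that control, using a plain sleep as a stand-in for a real child (the variable names are illustrative): the pid from “$!” is used to pause, resume and finally terminate the process, and wait reports how it ended.

```shell
#!/bin/sh
sleep 60 &
child=$!                 # pid of the job just started in the background

kill -STOP "$child"      # pause the child
kill -CONT "$child"      # let it continue
kill -TERM "$child"      # ask it to terminate

# For a child killed by a signal, the shell reports a status above 128
# (commonly 128 + the signal number).
wait "$child"
status=$?
echo "child $child ended with status $status"
```

SIGTERM is tried first because the child can catch it and clean up; SIGKILL cannot be caught and is best kept as a last resort.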

Always kill a child process from its direct parent:

Linux reuses the pids of dead processes. If we store the pids of the children of our child processes, these grandchildren may have died a long time ago by the time we try to kill them, and that can be fatal because the kernel may have assigned those pids to other processes.

The case is different for direct child processes. A child that dies before its parent becomes a zombie process: it does nothing, but its pid is not released until the parent reaps it (with wait), so until then the direct parent can send it a kill signal any number of times without hitting another process.
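
One visible consequence for a direct parent, sketched here with a throwaway child (names are illustrative): the shell can still collect the exit status of a child that finished a while ago, because the child's bookkeeping is kept for its parent.

```shell
#!/bin/sh
sh -c 'exit 3' &
child=$!

sleep 1                  # the child has long since exited by now

# The exit status is still available to the direct parent.
wait "$child"
status=$?
echo "child $child had exited with status $status"
```

No such guarantee exists for a grandchild, since its status is held for its own parent and not for us.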

Example:

The following script creates a child process that runs for a long time.

#!/bin/sh

echo $$ >> pid.txt # current script pid

sleep 315360000 & # I will be alive for 10 years…

echo $! >> pid.txt # 10 years sleep command pid

Name the script long_script.sh and make it executable.

Run: echo $$ > pid.txt; ./long_script.sh &

Content of pid.txt:

1125

6980

6981

1125 - Current terminal shell session pid.

6980 - Child of 1125, the script pid

6981 - Child of 6980, the long sleeping command pid

Do not try to kill 6981 from process 1125 just because 6981 descends from its child 6980. Once 6980 and 6981 have exited, pid 6981 is reaped (by 6980, or by init if 6980 died first) and can be assigned to some other process. Pid 6980 is different: it is a direct child of 1125, which is still alive, so until 1125 reaps it 6980 stays around as a zombie process and its pid is not reused. So control child 6980 from its parent 1125, and child 6981 from its parent 6980.
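
To check who is the parent of whom, the following Linux-specific sketch reads the PPid field from /proc/<pid>/status. The helper ppid_of and the inline child script are made up for this illustration; the grandchild is kept short-lived so nothing is left behind.

```shell
#!/bin/sh
# Hypothetical helper: print the parent pid of the given pid (Linux /proc).
ppid_of() {
    while read -r key value; do
        if [ "$key" = "PPid:" ]; then
            echo "$value"
            return 0
        fi
    done < "/proc/$1/status"
    return 1
}

pidfile=$(mktemp)

# The child shell starts a grandchild, records its pid, then waits for it.
sh -c 'sleep 5 & echo $! > "$1"; wait' sh "$pidfile" &
child=$!

# Give the child up to five seconds to write the pid file.
tries=0
while [ ! -s "$pidfile" ] && [ "$tries" -lt 5 ]; do
    sleep 1
    tries=$((tries + 1))
done
grandchild=$(cat "$pidfile")
rm -f "$pidfile"

grandchild_parent=$(ppid_of "$grandchild")
echo "child=$child grandchild=$grandchild parent_of_grandchild=$grandchild_parent"

wait "$child"            # the child reaps the grandchild, then we reap the child
```

The grandchild's parent is the child, not the shell that started everything, which is why only the child is in a position to manage it safely.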

Exec Command:

When a program is run with ‘exec’, it replaces the current shell without creating a new process. So when you run ‘exec <some_command>’ in a terminal, the terminal shell session is replaced by some_command and is gone once that command exits. This helps in the scenario above, where we want to control the long sleep from 1125, which is not its direct parent. To make this happen, run the long sleep command with exec: sleep then does not get a new process but takes over pid 6980, i.e. the script's shell process 6980 is replaced by the sleep program and no separate process such as 6981 is created.

#!/bin/sh

echo $$ >> pid.txt # current script pid

exec sleep 315360000 # I will be alive for 10 years…

Run: echo $$ > pid.txt; ./long_script.sh &

Content of pid.txt:

7414

7696

7414 is the pid of the current terminal shell session and 7696 is the pid of the script. The script process is then replaced by the sleep command, so 7696 is now the pid of sleep. 7696 is a direct child of 7414, so it can be controlled from 7414.
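
The effect can be checked with a short sketch, assuming an inline stand-in for long_script.sh with a much shorter sleep (the file and variable names are illustrative). The pid the script writes for itself matches the pid the parent got from “$!”, and that same pid can be signalled directly to end the sleep.

```shell
#!/bin/sh
pidfile=$(mktemp)

# Stand-in for the script: record our own pid, then become sleep.
sh -c 'echo $$ > "$1"; exec sleep 60' sh "$pidfile" &
child=$!

# Give the script up to five seconds to write the pid file.
tries=0
while [ ! -s "$pidfile" ] && [ "$tries" -lt 5 ]; do
    sleep 1
    tries=$((tries + 1))
done
script_pid=$(cat "$pidfile")
rm -f "$pidfile"

# The same pid now belongs to sleep, a direct child of this shell.
kill -TERM "$child"
wait "$child"
status=$?
echo "script pid=$script_pid, pid from \$!=$child, status after TERM=$status"
```

Because exec never returns, anything written after the exec line in such a script would not run.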

When Parent Process dies:

When a parent process dies, its child processes become orphan processes. The init process (or a process designated as a subreaper) adopts these orphans and becomes their new parent.

To make sure the child processes die when the parent dies, each child can call prctl with PR_SET_PDEATHSIG (Linux-specific) and name a signal. The kernel then sends that signal to the child when its parent dies, so the child knows about it and can exit.
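
prctl is a C-level call that a shell script cannot make directly, so the sketch below swaps in a different and weaker technique: a trap in the parent that kills and reaps its child before exiting. It only covers signals the parent can catch (not SIGKILL), and the inline parent script and all names are assumptions for illustration. Where util-linux's setpriv is installed, its --pdeathsig option is, as far as I know, a way to request the real PR_SET_PDEATHSIG behaviour from a shell.

```shell
#!/bin/sh
pidfile=$(mktemp)

# Stand-in parent: starts a long-running child and cleans it up on TERM/HUP.
sh -c '
    sleep 300 &
    child=$!
    cleanup() {
        kill "$child" 2>/dev/null    # stop the child
        wait "$child" 2>/dev/null    # reap it so no zombie is left
        exit 0
    }
    trap cleanup TERM HUP
    echo "$child" > "$1"             # written only after the trap is installed
    wait "$child"
' sh "$pidfile" &
parent=$!

# Give the parent up to five seconds to write the pid file.
tries=0
while [ ! -s "$pidfile" ] && [ "$tries" -lt 5 ]; do
    sleep 1
    tries=$((tries + 1))
done
child=$(cat "$pidfile")
rm -f "$pidfile"

kill -TERM "$parent"                 # the parent "dies"
wait "$parent"
parent_status=$?

# kill -0 sends no signal; it only checks whether the pid still exists.
if kill -0 "$child" 2>/dev/null; then child_alive=yes; else child_alive=no; fi
echo "parent status=$parent_status child alive=$child_alive"
```

The pid file is written after the trap is set up, so the outer script cannot signal the parent before the cleanup handler exists.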