Friday, January 18, 2013

How to find and kill zombie processes on Linux

SkyHi @ Friday, January 18, 2013

A process is called a zombie process if the process has been completed, but its PID and process entry remains in the Linux process table. A process is removed from the process table when the process is completed, and its parent process reads the completed process’ exit status by using the wait() system call. If a parent process fails to call wait() for whatever reason, its child process will be left in the process table, becoming a zombie.
To find zombie processes on Linux:
$ ps axo stat,ppid,pid,comm | grep -w defunct
Z     250 10242 fake-prog 
The above command searches for processes with zombie (defunct) state, and displays them in (state, PPID, PID, command-name) format. The sample output shows that there is one zombie process associated with “fake-prog”, and it was spawned by a parent process with PID 250.
Killing zombie processes is not obvious since zombie processes are already dead. You can try two options to kill a zombie process on Linux as follows.
First, you can try sending SIGCHLD signal to the zombie’s parent process using the kill command. Note that the above command gives you PPID (PID of parent process) of each zombie. In our example, PPID of the zombie is 250.
$ sudo kill -s SIGCHLD 250
If a zombie process still does not go away, you can kill the parent process (e.g., 250) of the zombie.
$ sudo kill -9 250
Once its parent process gets killed, the zombie will be adopted by the init process, which is a parent of all processes in Linux. The init process periodically calls wait() to reap any zombie process.

Zombie process is an inactive computer process, according to wikipedia article, "...On Unix operating systems, a zombie process or defunct process is a process that has completed execution but still has an entry in the process table, allowing the process that started it to read its exit status. In the term's colorful metaphor, the child process has died but has not yet been reaped..."

So how do I find out zombie process?

Use top or ps command:
# top
OR
# ps aux | awk '{ print $8 " " $2 }' | grep -w Z
Output:
Z 4104
Z 5320
Z 2945

How do I kill zombie process?

You cannot kill zombies, as they are already dead. But if you have too many zombies then kill parent process or restart service.
You can kill zombie process using PID obtained from any one of the above command. For example kill zombie proces having PID 4104:
# kill -9 4104
Please note that kill -9 does not guarantee to kill a zombie process (see below for more info).

How do I automate zombie process killing?

Write a script and schedule as a cron job.




Zombies don’t just appear in scary movies anymore, sometimes they also appear on your Linux systems; but don’t fret they are mostly harmless.

What is a Zombie Process?

Before we get started I wanted to first cover what exactly a Zombie process is.
Linux and Unix both have the ability for a process to create a sub process otherwise known as a “Child Process”. Once a process creates a new sub process the first process then becomes a “Parent Process” as it has spawned a child process during its execution.
A Zombie or defunct process is a process that has finished its execution and is waiting for its Parent Process to read its exit status. Because the child process has finished, it is technically a “dead” process however since it is waiting for its parent there is still an entry in the process table. The zombie’s parent process does not necessarily need to be running for a zombie to appear, however it is most common to see a zombie process whose parent has died unexpectedly.

How to spot a Zombie Process

Zombie processes can be found easily with the ps command. Within the ps output there is a STAT column which will show the processes current status, a zombie process will have Z as the status. In addition to the STAT column zombies commonly have the words in the CMD column as well.
Example:
$ ps -elf | grep Z
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
1 Z madflojo 28827 28821 0 80 0 - 0 exit 12:28 pts/4 00:00:00 [zombies.aahhh]
1 Z madflojo 28828 28821 0 80 0 - 0 exit 12:28 pts/4 00:00:00 [zombies.aahhh]
1 Z madflojo 28829 28821 0 80 0 - 0 exit 12:28 pts/4 00:00:00 [zombies.aahhh]
1 Z madflojo 28830 28821 0 80 0 - 0 exit 12:28 pts/4 00:00:00 [zombies.aahhh]
1 Z madflojo 28831 28821 0 80 0 - 0 exit 12:28 pts/4 00:00:00 [zombies.aahhh]

What is the difference between a Zombie and Orphaned Process?

Orphaned processes are very similar to Zombie processes; however there is one major difference. An Orphaned process is a child process that is still an active process whose parent has died. Unlike zombies the orphaned process will be reclaimed or adopted by the init process.

How to spot an Orphaned Process

Orphaned processes can be found easily with the ps command as well. Within the ps output there is a PPID column which will show the processes parent process id; a orphaned process will have the PPID of 1 which is the init process.
You may be thinking to yourself, how do I differentiate an Orphaned process from a Daemon process? Well in short, there is no difference. For all intents and purposes a daemon process is a orphaned process, however the exiting of the parent process is on purpose rather than by error.
Example:
$ ps -elf | grep sshd
4 S root 718 1 0 80 0 - 12487 poll_s Jun07 ? 00:00:00 /usr/sbin/sshd -D

What to do about Zombie Processes?

Before performing any activity to clean up zombie processes it is best to identify the root cause of the issue. Zombie processes do not indicate a normal state for your system, they may be benign for now however like real zombies they become more troublesome when they are in large numbers. They also indicate either a system issue or an application issue depending on the source of the processes.
The steps necessary to clean up zombie processes is complicated and very situational, below are a couple of high level answers that can guide you to a solution.

If the parent process is still active

If the parent process of the zombie or zombies is still active (not process id 1) than this is an indication that the parent process is stalled on a certain task and has not yet read the exit status of the child processes. At this point the resolution is extremely situational, you can use the strace command to attach to the parent process and troubleshoot from there.
You may also be able to make the parent process exit cleanly taking its zombie children with it by issuing the kill command. If you do run the kill command I suggest that you run a kill with the default signal -15 (SIGTERM) rather than using a -9 (SIGKILL); as SIGTERM will tell the parent process to exit cleanly which is more likely to read the exit status of the zombie children.

If the parent process is no longer active

If  the parent process is no longer active than the clean up activity becomes a choice; at this point you can leave the zombie processes on your system, or you can simply reboot. A Zombie process whose parent is no longer active is not going to be cleaned up without rebooting the system. If the zombie processes are only in small numbers and not reoccurring or multiplying than it may be best to leave these processes be until the next reboot. If however they are multiplying or in a large number than this is an indication that there is a significant issue with your system.



REFERENCES