Find out what's causing the high load
On *nix systems, there's a really handy command called top. It displays information about the currently running processes. With some of its options and a little output redirection, we can get a glimpse into what's causing our high load. Here's the command:
top -b -i -n 20 >> ./top_procs
That tells top to run in "batch" mode (-b, so it doesn't wait for any user input), hide idle processes (-i, so only tasks that have recently used CPU are listed), loop 20 times (-n 20), and append the output to the file top_procs in the current directory. Run the command while you are experiencing a high server load, then look at the contents of the file. You can either open it in your favorite editor (vim?) or simply use "less ./top_procs". That will give you a bunch of output like this:
top - 11:06:36 up 69 days, 2:53, 0 users, load average: 0.02, 0.05, 0.07
Tasks: 137 total, 1 running, 136 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.3% us, 0.5% sy, 0.0% ni, 97.1% id, 0.2% wa, 0.0% hi, 0.0% si
Mem: 12278340k total, 12230332k used, 48008k free, 363352k buffers
Swap: 16386292k total, 157092k used, 16229200k free, 2699912k cached

  PID USER     PR  NI   VIRT   RES   SHR S %CPU %MEM   TIME+  COMMAND
 8066 root     15   0   1888  1032   776 R  0.1  0.0  0:00.02 top

Tasks: 137 total, 1 running, 136 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.8% us, 1.5% sy, 0.0% ni, 94.6% id, 1.1% wa, 0.0% hi, 0.0% si
Mem: 12278340k total, 12230740k used, 47600k free, 361956k buffers
Swap: 16386292k total, 157092k used, 16229200k free, 2696368k cached

  PID USER     PR  NI   VIRT   RES   SHR S %CPU %MEM   TIME+  COMMAND
 8066 root     15   0   1880   944   704 R  2.0  0.0  0:00.01 top

top - 11:06:46 up 69 days, 2:53, 0 users, load average: 0.09, 0.06, 0.07
Tasks: 137 total, 3 running, 134 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.2% us, 0.3% sy, 0.0% ni, 97.2% id, 0.3% wa, 0.0% hi, 0.0% si
Mem: 12278340k total, 12173908k used, 104432k free, 363416k buffers
Swap: 16386292k total, 157092k used, 16229200k free, 2696988k cached

  PID USER     PR  NI   VIRT   RES   SHR S %CPU %MEM   TIME+  COMMAND
 8066 root     16   0   1888  1032   776 R  0.1  0.0  0:00.03 top
 6103 mailman  16   0  10536  7204  2744 R  0.0  0.1  0:33.08 python2.4
 6108 mailman  16   0  10172  6904  2648 R  0.0  0.1  0:37.92 python2.4
What does all of that mean? It's really not as bad as it seems. If you break it down, it's really just 3 repetitions of almost the same information. Here's what it means, line by line.
- Line 1 - General server information - Current time, uptime (since the last restart of the server), number of users logged on, and the load average for the last 1, 5, and 15 minutes
- Line 2 - Tasks - Total number of processes, followed by the number of running, sleeping, stopped, and zombie processes
- Line 3 - CPU usage info (User, System, Nice, Idle, I/O Wait, Hardware Interrupts, Software Interrupts). Just worry about idle, user, system, and wait.
- Line 4 - Memory usage
- Line 5 - Swap usage (used should be almost 0 if not 0)
- Line 6 - Table header for the process list (Process ID, User, Priority, Nice, Virtual Memory, Resident Size, Shared Size, State, CPU %, Memory %, CPU Time used, Command)
- The processes themselves...
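To see how the load moved over the course of the capture, you can pull the timestamp and 1-minute load average out of each "top -" header line. This is only a rough sketch: load_history is a made-up helper name, and it assumes the header layout shown in the sample above, where the time is the third field and the three load averages are the last three fields.

```shell
# Hypothetical helper: print "time 1-minute-load" for every header line
# in a capture file produced by "top -b".
load_history() {
    awk '/^top - / {
        # The last three fields are the 1, 5, and 15 minute averages.
        one = $(NF - 2)
        sub(/,$/, "", one)   # drop the trailing comma
        print $3, one
    }' "$1"
}
```

Calling "load_history ./top_procs" prints one line per repetition that has a header, which makes it easy to spot when the load started climbing.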
What you're looking for is a process that shows a high CPU % in multiple repetitions and also has a high CPU time. Be aware that you'll more than likely find a few of them. Write down the highest ones (most likely MySQL, Apache, etc). Now that you know what you need to tweak, let's look at how to do it.
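If the file is too long to scan by eye, the same check can be roughed out with awk. This is a sketch, not a polished tool: summarize_top is a made-up name, and it assumes the 12-column process table shown above (PID in column 1, %CPU in column 9, TIME+ in column 11, COMMAND in column 12). For each process it prints how many repetitions it appeared in, its peak CPU %, its last recorded CPU time, and its name, sorted by peak CPU %.

```shell
# Hypothetical helper: summarize the process rows of a "top -b" capture.
summarize_top() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 12 {
        # Only process rows start with a numeric PID.
        key = $12 " (pid " $1 ")"
        seen[key]++                                  # repetitions it appeared in
        if ($9 + 0 > peak[key]) peak[key] = $9 + 0   # highest CPU % observed
        cputime[key] = $11                           # last recorded TIME+
    }
    END {
        for (key in seen)
            printf "%d\t%.1f\t%s\t%s\n", seen[key], peak[key], cputime[key], key
    }' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1nr
}
```

Run it as "summarize_top ./top_procs". Processes near the top of the list with a high appearance count are the ones worth writing down.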
If one of the top processes is MySQL, you may need to tweak MySQL for the load. There are a whole bunch of articles out there on tweaking MySQL, so I'm not going to go into too much detail here. The main things you will want to do are raise key_buffer_size, query_cache_size, thread_cache_size, and table_cache (named table_open_cache in newer MySQL versions) to larger values. Be careful not to go too big, as they can easily eat up all available RAM. If you want to read more, take a look at Performance Tuning MySQL For Load.
Apache may appear in the list as apache or httpd. Now, I'm not going to get into tweaking Apache for two reasons. First, I don't use Apache, so I'm not familiar with tweaking it, and second, there is a whole host of articles on the internet devoted to tweaking Apache. Here's a decent article on Tweaking Apache For Load.
What if it's something else?
Now this is where things get interesting. Are you noticing something else using your CPU time? There are a few common culprits that like to cause high load. The two biggest ones are SpamAssassin and Sendmail. If you need to have SpamAssassin running, you should set things up so that all messages marked as spam are discarded to /dev/null (a black hole). If you don't need it, disable it. It's a great program, but it uses a lot of CPU time to do what it does. Also disable all "catch-all" e-mail accounts, as every message they accept adds time to the spool.
Ok, so now what?
So you've tweaked the server. It's running faster and more efficiently. For now. As time goes on, you may need to tweak some more (as your load changes, or resources change, etc). That's what administering a server is all about: your job is never done. However, you really should install some kind of server monitoring tool, such as SICM or MRTG, and let it watch your server load. That way you can identify patterns in the load and determine whether the problem is too many users or something else. I also suggest moving away from Apache and using Lighttpd, as it uses less memory and less CPU time, and is significantly faster. There you have it!