Friday, August 21, 2009

About Linux Processor load average

SkyHi @ Friday, August 21, 2009
About Linux Processor loadavgs

Processor load averages are those numbers you get when you use the uptime command. Three loadavgs are returned. Each is the result of performing the computation with a different half life.

On a normal users linux box, the load average is usually something pretty low, such as 0.03. This means that on average there are 0.03 processes ready to run at any one time.

The loadavg can be compared to a "percentage of CPU used" metric as found, for example, in the Windows NT Task Manager. However, whilst a CPU percentage measure can only go up to 100% (or 1.00 on the loadavg scale), the loadavg can go arbitrarily high. The reason for this is that the loadavg measures the average number of processes that are ready to run, rather than the average number that are actually running. Obviously, you can only have a maximum of one running process per processor at any given instant.

On an asynchronous server (one that is not interacting directly with users; for example, a mail server or upstream news server), it might be desirable to have the loadavg at 1.00 (call it perfectly loaded). This means that no processor capacity is wasted (or more specifically, no money has been wasted buying a fast processor that is not being used), but the system is not overloaded. It is possible to use loadavgs to determine if this is the case, whereas a simple "percentage of CPU used" metric can not distinguish between an overloaded and perfectly loaded server. (actually this is not true - processes can be ready to run even if they can't immediately use the CPU, I think)
Calculating loadavgs

Loadavg values use an exponentially-weighted average, with increasingly smaller weights over a (theoretically) infinite period of time extending from the present into the past. More recent measurements have larger weight than previous readings.

The theoretical calculation of the load average is as follows:

We have the following values:

1. A (possibly infinite) series of readings labelled x n where n starts at 0 for the most recent reading and increases into the past. Readings before the start of the "universe" (in the case of a unix processor loadavg, before the machine was booted) should be set to 0.
2. A decay factor, d, satisfying 0 < d < 1

Then, we can define the loadavg at time t as follows:

L t = Σ n = 0 ∞ 1 d n ( 1 - d ) x n + t

For practical calculation, note that the present loadavg can be computed iteratively from the present reading x 1 , the decay factor and the loadavg of the previous period as follows:

L t = ( 1 - 1 d ) x 0 + 1 d L t - 1

The initial value of L should be set to 0.

This permits the loadavg to be computing very efficiently on an on-going basis with only a small, fixed number of data.

Decay factors have a length of time associated with them, called the half life. This is the period of time it takes for the loadavg to halve in value if all future input values are 0. No matter what the particular value of the loadavg is at the start of the decay, the time taken for it to half, and hence the length of the half life, is constant.

Some approximate values of half-life are given below:
Decay constants and their associated half-lives. Decay constant d Half life
0.5 1
0.25 <1
0.75 2.5
0.1 <<1
0.9 7
0.95 14
0.965 20

Anyone who has studied A-level physics should find the concept of half-lives familiar.
How the linux kernel actually computes the loadavg

As mentioned at the start of this document, linux provides three load averages, with different decay constants and half lifes. The relevant values are listed in the following table:
Standard decay values in sched.h Decay constant Decay time (not half life) Kernel constant
1 min (12 periods) 1884
5 min (60 periods) 2014
15 min (180 periods) 2037

The code is defined in sched.c and sched.h.

Loadavgs are stored in the three element array avenrun[] as fixed point numbers, with 11 bits for the fractional part. That means, to convert an integer i into this representation, write i<<11 and to extract the integer part from a number in this fractional form, write i>>11.

Readings are taken every 5 seconds, by calling the count_active_tasks() function. This counts the number of tasks that are running, swapping or uninterruptible.

In UNIX computing, the system load is a measure of the amount of work that a computer system performs. The load average represents the average system load over a period of time. It conventionally appears in the form of three numbers which represent the system load during the last one, five, and fifteen -minute periods.

* 1 Unix-style load calculation
* 2 Load average is not CPU utilization
* 3 CPU load vs CPU utilization
* 4 See also
* 5 External links
* 6 References

[edit] Unix-style load calculation

All Unix and Unix-like systems generate a metric of three "load average" numbers in the kernel. Users can easily query the current result from a Unix shell by running the uptime command:

$ uptime
09:53:15 up 119 days, 19:08, 10 users, load average: 3.73 7.98 0.50

The w and top commands show the same three load average numbers, as do a range of graphical user interface utilities.

An idle computer has a load number of 0 and each process using or waiting for CPU adds to the load number by 1. Most UNIX systems count only processes in the running (on CPU) or runnable (waiting for CPU) states. However, Linux also includes processes in uninterruptible sleep states (usually waiting for disk activity), which can lead to markedly different results if many processes remain blocked in I/O due to a busy or stalled I/O system. This, for example, includes processes blocking due to an NFS server failure or to slow media (e.g., USB 1.x storage devices). Such circumstances can result in an elevated load average, which does not reflect an actual increase in CPU use (but still gives an idea on how long users have to wait).

Systems calculate the load average as the exponentially damped/weighted moving average of the load number. The three values of load average refer to the past one, five, and fifteen minutes of system operation.

For single-CPU systems that are CPU-bound, one can think of load average as a percentage of system utilization during the respective time period. For systems with multiple CPUs, one must divide the number by the number of processors in order to get a comparable percentage.

For example, one can interpret a load average of "1.73 0.50 7.98" on a single-CPU system as:

* during the last minute, the CPU was overloaded by 73% (1 CPU with 1.73 runnable processes, so that 0.73 processes had to wait for a turn)
* during the last 5 minutes, the CPU was underloaded 50% (no processes had to wait for a turn)
* during the last 15 minutes, the CPU was overloaded 698% (1 CPU with 7.98 runnable processes, so that 6.98 processes had to wait for a turn)

This means that this CPU could have handled all of the work scheduled for the last minute if it were 1.73 times as fast, or if there were two (1.73 rounded up) times as many CPUs, but that over the last five minutes it was twice as fast as necessary to prevent runnable processes from waiting their turn.

In a system with four CPUs, a load average of 3.73 would indicate that there were, on average, 3.73 processes ready to run, and each one could be scheduled into a CPU.

On modern UNIX systems, the treatment of threading with respect to load averages varies. Some systems treat threads as processes for the purposes of load average calculation: each thread waiting to run will add 1 to the load. However, other systems, especially systems implementing so-called M:N threading, use different strategies, such as counting the process exactly once for the purpose of load (regardless of the number of threads), or counting only threads currently exposed by the user-thread scheduler to the kernel, which may depend on the level of concurrency set on the process.

Many systems generate the load average by sampling the state of the scheduler periodically, rather than recalculating on all pertinent scheduler events. They adopt this approach for performance reasons, as scheduler events occur frequently, and scheduler efficiency impacts significantly on system efficiency. As a result, sampling error can lead to load averages inaccurately representing actual system behavior. This can pose a particular problem for programs that wake up at a fixed interval that aligns with the load-average sampling, in which case a process may be under- or over-represented in the load average numbers.

[edit] Load average is not CPU utilization

Even though the statements in the previous section might suggest that load average is related to CPU utilization because the section relates CPU to load average, load average does not measure CPU utilization of processes. One reason it does not do this is because load averages computations of processes are in a wrong order to relate to trend information of CPU utilization. In other words, the calculations and numbers directly produced by load averages do not compute numbers in an order from more CPU intensive to less CPU intensive or vice-a-versa, nor do they give computations of numbers that would give another way that would result in direct information about CPU utilization. In summary, the functions of load average give numbers based on load queue of processes. [1] The next section uses the same reference to suggest that load average is not very important or vital to system performance information until or unless the a system's CPU is heavily loaded to around 100%. Then, at levels close to 100%, load average could be very important or significant to determinacy of system performance; however, this would be because average load numbers give direct information about process queue length not CPU utilization -- which is something they do not give direct information about.

[edit] CPU load vs CPU utilization

A comparative study of different load indices carried out by Domenico et al.[1] reported that CPU load information based upon the CPU queue length does much better in load balancing compared to CPU utilization. The reason CPU queue length did better is probably because, when a host is heavily loaded, its CPU utilization is likely to be close to 100% and it is unable to reflect the exact load level of the utilization. In contrast, CPU queue lengths can directly reflect the amount of load on a CPU. As an example, two systems, one with 3 and the other with 6 processes in the queue, will probably have utilizations close to 100% although they obviously differ.