Thursday, December 31, 2009

Managing your log files

Are you responsible for any Linux systems that are important to the running of your business? A web server, database server or mail server, perhaps, or some edge-of-network appliance like a firewall? If so, it's important to monitor the health of these machines, and the log files are perhaps your first port of call. Log files can tell you if things are misconfigured, alert you to break-in attempts, or simply reassure you that all is well.
In this tutorial we'll begin by taking a peek inside a few log files to get a feel for the kind of thing you'll find there; then we'll move on to examine some tools for summarising and managing the files.
Logged messages fall into two broad types: low-volume messages that you might want to read line by line, and high-volume messages that really need to be summarised. Let's take a look at a couple of examples of the low-volume type first.
e1000: 0000:04:03.0: e1000_probe: (PCI:66MHz:32-bit) 00:0e:0c:4a:42:4e
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
e1000: eth0: e1000_watchdog_task: NIC Link is Up 10 Mbps Full Duplex
Exhibit A is taken from the kernel's ring buffer - these three lines show a kernel module discovering and initialising a network interface. Such messages are encouraging if you have added new hardware and are trying to establish if Linux can find it.
Feb  6 15:11:56 shrek named: zone localdomain/IN: loaded serial 42
Feb  6 15:11:56 shrek named: zone localhost/IN: loaded serial 42
Feb  6 15:11:56 shrek named: named.local:9: unknown RR type 'PT'
Exhibit B is taken from that all-purpose log file, /var/log/messages, and shows a DNS server tripping over a syntax error in one of its zone files. Feb 6 15:11:56 is the date and time of the event; shrek is the name of the computer; named is the service doing the reporting. The first two lines show two zones loading successfully (each at serial number 42), while the third line goes into detail about the exact error, and tells us the line number and filename in question.
These next three examples are of messages that are likely to occur in large numbers, even on a normal healthy system.
Feb  6 15:08:06 shrek su: pam_unix(su:session):
session opened for user root by pete (uid=571)
Exhibit C, taken from /var/log/secure, shows an ordinary user gaining root access with su. On this particular machine, direct root logins are disabled, forcing a user to reveal their identity (and leave an audit trail) when they want to become root.
Feb  2 18:54:16 shrek sshd[4244]:
Failed password for root from port 39409 ssh2
Exhibit D represents a failed SSH login attempt as root. Taken individually, it could just be a fumble-fingered user mis-typing the password, until you notice that it is one of almost 10,000 such messages occurring at roughly five-second intervals over a period of more than 12 hours...
- - [05/Feb/2007:08:31:38 +0000]
"GET /media/images/premiumskin07d/bg.gif HTTP/1.1" 404 320
Finally, exhibit E shows a line from an Apache access log. Of course, you expect your web server to get accessed, so lines in the access log should not come as a great surprise; however, this one is interesting because of the 404 (file not found) response, which probably indicates a broken link on the site.
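If you wanted to chase up such 404s, a one-line awk summary goes a long way. Here's a self-contained sketch (the sample log lines are fabricated for illustration) that tallies 404 responses per URL from an access log in the common log format:

```shell
# Build a tiny sample access log (invented data)
cat > access.sample <<'EOF'
10.0.0.1 - - [05/Feb/2007:08:31:38 +0000] "GET /media/images/premiumskin07d/bg.gif HTTP/1.1" 404 320
10.0.0.2 - - [05/Feb/2007:08:32:01 +0000] "GET /index.html HTTP/1.1" 200 5120
10.0.0.3 - - [05/Feb/2007:08:33:12 +0000] "GET /media/images/premiumskin07d/bg.gif HTTP/1.1" 404 320
EOF

# In the common log format, field 9 is the status code and field 7 the URL
awk '$9 == 404 { count[$7]++ } END { for (u in count) print count[u], u }' access.sample
```

Run against the sample above, this prints `2 /media/images/premiumskin07d/bg.gif` - exactly the kind of condensed view we're after.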
By the way, if you define a custom log format, Apache can write much more detailed logs than this, including the contents of any fields in the HTTP request header. In fact, there is a fairly standard extension called the combined log format, which many log file analysis tools can understand.
Generally speaking, you do not want to read log messages such as exhibits C, D, and E line by line. You would prefer to have them summarised for you, providing a report that simply says, for example, 'pete used the su command 26 times to become root'.
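That kind of report is only a short pipeline away. Here's a minimal sketch (the sample lines are fabricated; on a real system you'd read /var/log/secure) counting su-to-root sessions for a particular user:

```shell
# Build a tiny sample secure log (invented data)
cat > secure.sample <<'EOF'
Feb  6 15:08:06 shrek su: pam_unix(su:session): session opened for user root by pete (uid=571)
Feb  6 16:02:11 shrek su: pam_unix(su:session): session opened for user root by pete (uid=571)
Feb  6 17:30:45 shrek sshd[4244]: Failed password for root from port 39409 ssh2
EOF

# Count how many times pete became root via su; prints 2 for the sample above
grep -c 'session opened for user root by pete' secure.sample
```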

Become a champion birler

Without intervention, log files will grow without bound, and much of the data in each file is too old to be useful. To help you keep on top of all this informational lumber and send the old logs downstream, Linux distributions include a tool called logrotate. This, ah, rotates the log files; that is, it periodically divides them into manageable pieces and, eventually, deletes the oldest ones.
Typically, logrotate is run daily as a Cron job. It has its own configuration file specifying which logs to manage; you can choose to rotate logs after a set period of time (daily, weekly or monthly) or when they exceed a size you specify. The old logs can be compressed, and files can be mailed to specified users.
The top-level configuration file for logrotate is /etc/logrotate.conf - let's take a look at it. This file defines some default settings for logrotate, and uses an include directive to include config files for individual services from the directory /etc/logrotate.d. Excluding comment lines, on a Fedora test installation logrotate.conf looks like this (note that the line numbers are just here for reference):
1. weekly
2. rotate 4
3. create
4. include /etc/logrotate.d
5. # no packages own wtmp -- we'll rotate them here
6. /var/log/wtmp {
7.    monthly
8.    create 0664 root utmp
9.    rotate 1
10. }
Line 1 instructs logrotate to rotate the files every week. Line 2 says that the last four log files should be kept; in the context of this example, this means the log files for the last four weeks. As log files are rotated, the current log file (for example /var/log/messages) is renamed as messages.1; the file messages.1 is renamed as messages.2, and so on.
Line 3 tells us that after each log has been rotated, a new log file will be created. You can configure logrotate's handling of each log file separately. For example, you might want to rotate /var/log/messages every day and keep seven generations, but you might want to rotate /var/log/mail.log every week and keep four generations.
On to line 4, which tells logrotate to read all the config files in /etc/logrotate.d. In that directory you'll find a collection of files, each of which relates to a specific service (or to be more accurate, a specific log file). Splitting up the configuration into individual service-specific files in this way allows each service to simply drop its own config file into /etc/logrotate.d as it is installed.
However, it is quite possible to put service-specific configurations (or even the entire configuration) directly into logrotate.conf, and indeed we see an example controlling the rotation of the wtmp file at lines 5-10 of the file.
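The renaming merry-go-round is worth seeing spelled out. Here's a shell sketch of one rotation step for a log keeping four generations (a simplification, of course - the real logrotate also handles compression, permissions and signalling the logging daemon):

```shell
# Set up a demo directory with a current log and one old generation
mkdir -p rotdemo
echo "old week"     > rotdemo/messages.1
echo "current week" > rotdemo/messages

# One rotation step, keeping 4 generations:
rm -f rotdemo/messages.4                 # the oldest generation falls off the end
for i in 3 2 1; do
    if [ -f "rotdemo/messages.$i" ]; then
        mv "rotdemo/messages.$i" "rotdemo/messages.$((i+1))"
    fi
done
mv rotdemo/messages rotdemo/messages.1   # the current log becomes generation 1
: > rotdemo/messages                     # 'create': start a fresh, empty log
```

After this runs, messages.1 holds last week's entries, messages.2 the week before, and messages itself is empty, ready for new entries.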
As an example of a service-specific file, here's /etc/logrotate.d/ppp (also from Fedora). This file controls the rotation of the log file generated by the ppp daemon, which manages dial-up connections.
1. /var/log/ppp/connect-errors {
2.         missingok
3.         compress
4.         notifempty
5.         daily
6.         rotate 5
7.         create 0600 root root
8. }
Let's run through this. Line 1 lists the log files to which the following entries apply. In line 2, logrotate is told not to complain if the log file is missing. Line 3 requests that the old (rotated) log files are to be compressed - in this example they will end up with names of the form connect-errors.1.gz.
In lines 4 and 5, the administrator has configured things so that rotation is skipped if the file is empty, but that otherwise rotation should be done daily (note that this overrides the default weekly setting in logrotate.conf). For this to make any sense, of course, logrotate must be run at least once a day.
The remaining settings should be clear to you by now: line 6 says to retain five generations of the log (again overriding the default setting), while line 7 (also overriding the default) says that after rotating, the log file should be recreated with the specified mode, ownership and group. There are quite a few more logrotate directives to pinpoint how a service's log file is dealt with; consult the logrotate man page for details.
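For instance, rotation can be driven by file size rather than the calendar. Here's a hypothetical drop-in stanza (the log path and the limits are our own invention, purely for illustration) combining size-based rotation with compression and mailing of expired logs:

```
# Rotate a hypothetical application log (path invented for illustration)
/var/log/example/app.log {
    # rotate when the file grows past 100 kB, rather than on a schedule
    size 100k
    # keep six old generations
    rotate 6
    # gzip the rotated files
    compress
    # mail a log to this user just before it is deleted
    mail root
    # don't complain if the file is absent
    missingok
}
```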
Here we can see the result of log file rotation by listing the directory /var/log. Below, we're looking at the four rotations of the log file maillog:
$ ls -l /var/log/maillog*
-rw------- 1 root root 4112 Jan 19 04:02 /var/log/maillog
-rw------- 1 root root 6584 Jan 14 04:02 /var/log/maillog.1
-rw------- 1 root root 5771 Jan  7 04:02 /var/log/maillog.2
-rw------- 1 root root 7794 Dec 31 04:02 /var/log/maillog.3
-rw------- 1 root root 5561 Dec 24 04:02 /var/log/maillog.4
You can see that all these files were created at 04:02 in the morning (not because a neurotic sysadmin felt the need to run logrotate, but because this is the time that the sysadmin has chosen to run the daily Cron jobs), though you'll notice that the rotation occurs only once a week. Because we are keeping only four weekly generations, any mail logs dated prior to 17 December (when maillog.4 was started) have been deleted. In this example, the files are not compressed.

Watching logs

Although logrotate takes care of the problem of parcelling the log files into manageable pieces, it does not actually report on their contents. There are a number of open source applications (and several proprietary offerings, too) that can help with this.
One of the most widely used is logwatch. This is a simple but useful program that undertakes the tedious activity of reading the log files and summarising them, so that instead of looking through tens of thousands of lines relating to individual page fetches by Apache, you can see text reports or bar charts showing site usage by day, by hour, by URL and so on.
Logwatch is available on most Linux distributions. Like logrotate, it is usually run daily as a Cron job. Typically, it is configured to mail its output to root; indeed, if you log in as root and read your mail on a long-neglected server, you are quite likely to find a long list of daily messages from logwatch waiting for you (and very little else). The heart of logwatch is a Perl script, /usr/share/logwatch/scripts/
This is supplemented by a rather confusing hierarchy of config files and a collection of service-specific filters to cut out logged information about the services you're not interested in. On Fedora, logwatch is run daily as a result of a symlink to the main script in the /etc/cron.daily directory.
Most people will be content to stick with the default configuration of logwatch, but once you've mastered its rather labyrinthine assortment of config files and scripts, there's a lot you can do to configure it, including writing your own filter scripts if you have appropriate programming skills - probably with Perl or PHP.
Directory structure of logwatch
The Directory Structure Of Logwatch diagram above illustrates the main directory hierarchy of logwatch, which is rooted at /usr/share/logwatch. The left-hand ellipse in the figure (the directory /usr/share/logwatch/default.conf) contains the files that define a base-line configuration. The files here are common to all installations of logwatch. The middle ellipse (directory dist.conf) contains subdirectories that parallel those in default.conf. These directories are intended to contain distro-specific configuration.
There is actually a third directory, /etc/logwatch/conf (not shown in the diagram) that again parallels the default.conf directory and is intended for site-specific configuration. The idea is that you should leave the default configuration alone, and use the distro-specific and site-specific configurations to override or extend it, but on our test Fedora system the distro- and site-specific configurations are essentially empty. Not all distros do things quite this way. Red Hat Enterprise Linux, for example, has only one of these directory hierarchies, rooted at /etc/log.d.
Central to the configuration of logwatch is the concept of a 'service'. We feel obliged to put the word into quotes because we are not (necessarily) talking about a single network service such as DNS or FTP that is provided by a specific Linux daemon. In logwatch-speak, a service is simply some source of logging information, identified by a name. All clear?
Because you're probably suffering from exposure out here nattering about services, let's get back into the warmth of a configuration file: specifically, the top-level logwatch file, /usr/share/logwatch/default.conf/logwatch.conf. Here it is, minus comments (but plus line numbers):
1. LogDir = /var/log
2. TmpDir = /var/cache/logwatch
3. MailTo = root
4. Print = No
5. Range = yesterday
6. Detail = Low 
7. Service = All
8. Service = "-zz-network"
9. Mailer = /bin/mail
Line 1 tells logwatch where to look for the log files. We'll refer to this when we see how to define log-file groups in a minute. Lines 3 and 4 explain that the output from logwatch should be mailed to root and not written to standard output.
Line 5 specifies a time range from which to process log entries (valid values are yesterday, today and all), while line 6 determines the level of detail you'd like in the report - the values Low, Med and High are defined, and it is up to the filters for the individual services to decide how to interpret this setting.
If all services are to be processed you'll have line 7 set to All as in the example here, but you can set exceptions, as in line 8, which explicitly excludes the zz-network service. Line 9 tells us where the mailer is.
Before we go any further it's worth mentioning that you can override most of these settings with command-line options when you invoke logwatch. For example, the command
# logwatch --detail High --service cron --print
...processes only the service Cron, at a high level of output detail, with output written to standard output rather than being mailed. Try man logwatch or logwatch --help for details on command line options.

Get into groups

Next on our config file tour we'll stop by at one of the service definition files in default.conf/services. The files in this directory define the services that logwatch knows about. For example, here's the file sshd.conf, minus comments:
1. Title = "SSHD"
2. LogFile = secure
3. LogFile = messages
4. *OnlyService = sshd 
5. *RemoveHeaders
Line 1 defines a title for the service. Any output generated by the filter for sshd will be bracketed thus:
-------- SSHD Begin ---------
--------- SSHD End ----------
In lines 2 and 3, secure and messages are the names of a couple of log file groups. The log file groups simply serve to factor out configuration information that might be common to several service definitions. These groups are defined by files in default.conf/logfiles; in this case, the files secure.conf and messages.conf. We'll look at those next on our tour, but basically, each log file group names one or more files that will be taken as a source of logged messages.
Line 4 introduces a filter, OnlyService, to be run with the parameter sshd. This is one of the shared scripts in the scripts/shared directory; the purpose of this particular filter is to select only those lines that appear to come from sshd. Another shared filter appears in line 5: this one removes the 'garbage' at the beginning of each syslog-style logged message, by which we mean the time stamp, the host name and the facility name - so much flotsam and jetsam to sysadmin lumberjacks.
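Conceptually, those two filters behave like the following pipeline (our input is a fabricated sample, and the real filters are Perl scripts rather than grep and sed):

```shell
# A couple of invented syslog-style lines to play with
cat > messages.sample <<'EOF'
Feb  2 18:54:16 shrek sshd[4244]: Failed password for root from port 39409 ssh2
Feb  2 18:54:20 shrek named: zone localhost/IN: loaded serial 42
EOF

# 'OnlyService sshd': keep only sshd lines; 'RemoveHeaders': strip the
# timestamp, hostname and service name, leaving just the message body
grep ' sshd\[' messages.sample | sed 's/^.*sshd\[[0-9]*\]: //'
```

For the sample above, only the bare message `Failed password for root from port 39409 ssh2` survives to be counted by the service-specific script.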
Time now to visit one of the log file group definitions. We'll take secure.conf as our example. Here it is:
1. LogFile = secure
2. LogFile = authlog
3. Archive = secure.*
4. Archive = secure.*.gz
5. Archive = archiv/secure.*
6. Archive = archiv/secure.*.gz
7. Archive = authlog.*
8. Archive = authlog.*.gz
9. *ExpandRepeats
10. *OnlyHost
11. *ApplyStdDate
Lines 1 and 2 specify the names of the actual log files. You can use absolute path names here, but if you don't, the names are taken relative to LogDir as defined in the top-level config file, which you may remember was the directory /var/log. In this example, the contents of both /var/log/secure and /var/log/authlog are passed into the filter chain for this service.
Lines 3-8 define sources of archived logs; these are used only if logwatch is started with the --archives option or if the line Archives = yes appears in the top-level config file.
Lines 9, 10 and 11 specify a further three filters (also in the scripts/shared directory) that will be placed at the head of the filter chain. The ExpandRepeats filter was originally intended to expand the 'last message repeated N times' messages that syslog generates, but the authors have evidently had a change of heart and the filter now merely suppresses these messages.
OnlyHost filters messages according to the host from which they originated - syslog can accept messages from other hosts on the network. This filter is used in conjunction with the --splithosts command line option, which tells logwatch to create a separate report for each host found in the raw logs. Finally, the ApplyStdDate filter discards those lines whose syslog-style time stamps do not lie within the day being processed.
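A crude stand-in for that date filter is a simple match on the syslog date prefix (sample data invented; the real filter also copes with year boundaries and differing date formats):

```shell
# Invented sample spanning two days
cat > mixed.sample <<'EOF'
Feb  5 23:59:59 shrek sshd[1000]: Accepted password for pete
Feb  6 00:00:01 shrek sshd[1001]: Accepted password for pete
Feb  6 09:15:00 shrek sshd[1002]: Failed password for root
EOF

# Keep only messages logged on 6 February (note the two spaces
# syslog puts before a single-digit day of the month)
grep '^Feb  6 ' mixed.sample
```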
All of this configuration stitches together a processing chain of filters, which runs under the control of the main logwatch script, as shown below. The shared filters shown here live in /usr/share/logwatch/scripts/shared; the service-specific scripts are in /usr/share/logwatch/scripts/services. It is the service-specific scripts that ultimately generate output.
The chain of filters for log data.

One of our files is missing

If you want to see logwatch in action you can run it directly from the command line, courtesy of another symbolic link in /usr/sbin. You will need to specify the --print option to see the output, because the default action is to mail it to root. Here's a sample run:
########### Logwatch 7.2.1 (01/18/06) #############
      Processing Initiated: Thu Feb  1 04:02:02 2007
      Date Range Processed: yesterday
                            ( 2007-Jan-31 )
                            Period is day.
    Detail Level of Output: 0
            Type of Output: unformatted
         Logfiles for Host:
 ------------------- httpd Begin -------------------

 Requests with error response codes
    400 Bad Request
       /: 1 Time(s)
    404 Not Found
       /media/images/premiumskin07d/bg.gif: 51 Time(s)
       /favicon.ico: 3 Time(s)
  -------------------- httpd End --------------------
 ------------------ pam_unix Begin ----------------- 
    Authentication Failures:
       unknown ( 27 Time(s)
       admin ( 3 Time(s)
       admin ( 1 Time(s)
       root ( 1 Time(s)
       unknown ( 1 Time(s)
    Invalid Users:
       Unknown Account: 28 Time(s)
    Authentication Failures:
       pete(571) -> root: 1 Time(s)
    Sessions Opened:
       pete(uid=571) -> root: 3 Time(s)
What are we to make of this output? Well, we appear to have a missing file on our website. Judging by the name, it's probably just cosmetic. Second, there have been a number of failed attempts to log in to sshd (sshd and Apache are the only services enabled on this particular machine).
Third, user Pete has logged in as root three times (and apparently fumbled the password once; we all do it, Pete). Because direct root logins are disabled on the machine, all attempts (successful and unsuccessful) to become root will leave an audit trail like the one shown here. All in all, there's nothing very exciting here. Just the daemons going about their day-to-day business.
Logwatch is not the only fish in this particular sea. Debian users would probably draw our attention to logcheck, which filters out 'normal' lines from log files on the basis of a regular expression match against a database of regular expressions, then mails the remaining 'abnormal' lines to the system administrator. And of course, if you have decent Perl or PHP programming skills, these kinds of tools are not that hard to custom build.
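To give the flavour of the logcheck approach, here's a tiny sketch (the file names and patterns are our own invention): keep a file of regular expressions describing 'normal' traffic, and report whatever fails to match:

```shell
# Patterns describing messages we consider routine (invented)
cat > normal.patterns <<'EOF'
session opened for user root by pete
loaded serial
EOF

# An invented day's worth of log lines
cat > todays.log <<'EOF'
Feb  6 15:08:06 shrek su: pam_unix(su:session): session opened for user root by pete (uid=571)
Feb  6 15:11:56 shrek named: zone localhost/IN: loaded serial 42
Feb  6 15:20:00 shrek sshd[4244]: Failed password for root from port 39409 ssh2
EOF

# Everything NOT matching a 'normal' pattern is worth a human's attention;
# a real deployment would pipe this to something like: mail -s "log alert" root
grep -v -f normal.patterns todays.log
```

Only the failed-password line survives the filter - precisely the line you'd want brought to your attention.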
Well, there you are. Log files. We warned you they aren't exciting! But on a rainy afternoon, if you've finished the paper, read your copy of Linux Format magazine from cover to cover, and need an excuse not to shampoo the cat, you could always disappear into your office and read a few.

Web analytics

When it comes to the analysis of web server log files, you could use one of the tools at the top end of the scale such as Google Analytics and Clicktracks. These are intended to help the marketing department figure out how people travel through your site and how they got there in the first place.
The Webalizer is a more traditional open source tool, which simply feeds off the log files written by Apache. What you'll get are usage statistics in HTML format for viewing with a browser. There are yearly, monthly, daily and hourly usage stats, and it can also display usage by site, URL, referrer, user agent (ie browser), username, search strings, entry/exit pages and country.
The Webalizer: each of the underlined links in the table leads to a more detailed analysis for that month, including day-by-day and hour-by-hour usage charts, and a list of popular URLs.
Usefully, The Webalizer maintains its own 'running total' summaries of site activity so that it can present statistics that reach further back in time than your log file rotation regime would preserve. It also supports FTP transfer logs from the wu-ftpd server, and will automatically uncompress gzipped log files if necessary.