Monday, October 17, 2011

LAMP Server Tuning

SkyHi @ Monday, October 17, 2011

Getting the most from your Drupal site means getting the most from your server – optimizing the various layers of the the LAMP stack. This includes the filesystem, database, web server, PHP, RAM and CPU. Tuning the LAMP stack is a major subject requiring a lot of study and practice to become proficient. It’s something you will probably never completely master :) Try Googling lamp performance tune for a few articles to whet your appetite. For now, we’ll cover a few of the major considerations for Drupal, although most of this advice would apply to any PHP web app running on Linux.
Server tuning considerations
Drupal documentation covering the basics.
Tuning LAMP systems, Part 1: Understanding the LAMP architecture
Intermediate article covering LAMP.
Tuning LAMP systems, Part 2: Optimizing Apache and PHP
Intermediate article covering Apache and PHP.
Tuning LAMP systems, Part 3: Tuning your MySQL server
Intermediate article covering MySQL.

Opcode cache

Opcode caches cache the compiled form of a PHP script in shared memory to avoid the overhead of parsing and compiling the code every time the script runs. This saves RAM and reduces script execution time.
Quite a bit of benchmarking has been done in the Drupal and PHP communities betweenAPCeAccelerator and XCache. eAccelerator may have the edge in raw performance, but it appears that APC is the preferred opcode cache in the Drupal community because it is well maintained and less buggy.
All sites: faster and less RAM. Moderate install.
Drupal web server configurations compared
APC gives 2x to 4x increase in throughput under load. PHP5 is around 10% slower.
PHP op-code caches / accelerators: Drupal large site case study
Op-code caches are a must for large sites serving many pages.
Benchmarking APC vs. eAccelerator using Drupal
eAccelerator is faster and smaller than APC. Both offer around 6x – 7x times speedup over PHP.
High PHP execution times for Drupal, and tuning APC for include_once() performance
Make sure apc.shm_size can fit the whole page else there will be no caching.

Database

There are a number of choices to be made when tuning your MySQL database server. The MySQLTuner script can be helpful for identifying outstanding issues you may be unaware of. It can be run on a functioning production server to see how your database is performing in the wild. It’s possible to take a best guess at config options on your dev machine but you aren’t going to know how things are going to shape up until real users start hitting the DB.
MySQLTuner
Perl script which is able to report on the operation of your MySQL installation and offer suggestions as to what can be fixed.
Tuning MySQL Performance with MySQLTuner
Helpful tutorial.

STORAGE ENGINE: INNODB VS MYISAM

A default install of Drupal 6 installs the DB tables as MyISAM. This will change in Drupal 7 with the default set to InnoDB. A Drupal 6 installation may well have some InnoDB tables as modules may create new tables in the InnoBD engine. Your installation may therefore be a mix between the two engines.
In many places on the web you will read statements such as ‘All high performance Drupal sites run InnoDB”. This is not necessarily so as there are some cases where MyISAM may still be preferred although with recent changes to Drupal core the pendulum has swung to InnoDB as a sensible default.
A list of the main difference between the engines is as follows:/p>
  • InnoDB is transactional (better integrity), MyISAM isn’t
  • InnoDB more reliable (better recovery), MyISAM can be repaired
  • InnoDB has row level locking (better concurrency), MyISAM locks tables
  • InnoDB uses clustered indexes (faster access to data), MyISAM indexes just the keys
  • InnoDB has a bigger memory footprint
In general, you would consider sticking with MyISAM if
  • Memory footprint was an issue. If you have very big indexes which might only just fit into the key buffer then MyISAM could offer faster lookups.
  • Most activity is read only.
InnoDB tables definitely should be used for all of the Drupal cache tables since this is where most contention is likely to occur.
Finally, it must be noted that Drupal was written based on the MyISAM engine and as such many queries were not optimized for InnoDB. The SELECT COUNT(*) is particularly slow in InnoDB because it must scan all rows to calculate the count. Many of these shortcomings have been removed in the PressFlow distribution and have since made their way back into core.
All sites: InnoDB for less contention on cache
Most sites: InnoDB for everything else
Big unchanging sites: MyISAM faster reads less RAM
MySQL Engines: MyISAM vs. InnoDB
InnoDB is a good fit for many cases and “in most cases, InnoDB is the correct choice for a Drupal site”. Very good comparison between the two engines.
MySQL InnoDB: performance gains as well as some pitfalls
InnoDB does row level locking but lookup is slower for some slow queries. NB. Pressflow distribution fixes some slow InnoDB queries.
InnoDB vs MyISAM vs Falcon benchmarks – part 1
Myth that MyISAM is faster than InnoDB in all cases.
Which Tables can be converted to InnoDB
High Performance discussion emphasizing that InnoDB should definitely be used for cache tables and complex joins in CCK if memory allows.

MYSQL CONFIGURATION

There are a number of MySQL config variables which must be tweaked to suit your data. It is impossible to specify one set of options to suit all sites. A few rules of thumb are offered below.
Optimizing the mysqld variables
Clear article with some good rules of thumb for MySQL variables.

KEY BUFFER

If you are running MyISAM tables then the key buffer is a very important variable to set. The key buffer stores table indexes in memory, allowing for fast lookups and joins. For large node, node_version and url_alias tables it is a must to have enough room to fit these tables into memory, otherwise your site will very slow on the most basic of operations: looking up nodes, titles and paths.
One rule of thumb is to set this buffer to somewhere between 25% and 50% of the memory on the server. To determine the best value up front sum the size of all the .MYI files.
MyISAM sites: most queries faster. Essential.
key_buffer_size
Documentation on the use of key_buffer_size.

QUERY CACHE

MySQL has a query cache which stores results up to a certain size in memory. The cache is very handy for quickly returning commonly accessed data when all other forms of caching (reverse proxies, page cache, Drupal caches) have not been invoked. Queries which may take sometime return almost instantly.
MySQL’s Query Cache
Covers config and operation of the query cache.
During the development and testing of a site the query cache can catch developers out since a query may appear to be performing quite well the second and subsequent times through. To really test a query you need to fire up mysql client (or phpmyadmin) and add the SQL_NO_CACHE option to the query to see the real time it takes. Don’t be fooled!
Query Cache SELECT Options
Documentation on the use of SQL_NO_CACHE.
The query cache is destroyed if any row in the table is changed and so it cannot be relied upon if tables are changing frequently. The cache shines when the are big tables which don’t change that often. Unless your site has such characteristics it is best to limit it so that it fits small unchanging tables and then some for the most popular queries. Examination of cache hit rates will show you if it needs to be extended or reduced.
query_cache_size
Documentation on the use of query_cache_size.
All sites: common queries faster

INNODB BUFFER POOL SIZE

If you are running InnoDB tables then it is essential to optimize the InnoDB Buffer Pool Size, increasing the memory to reduce query time. InnoDB is more memory intensive and so the pool will be larger than that used for MyISAM tables. MySQL documentation suggests that the size can be upped to 80% of physical memory. Anymore could lead to swap issues.
InnoDBsites: most queries faster. Essential.
innodb_buffer_pool_size
Documentation on the use of innodb_buffer_pool_size.

OTHER VARIABLES

Other variables worth tweaking include the following. See Optimizing the mysqld variablesfor more.
  • table cache
  • sort buffer
  • read_rnd_buffer_size
  • tmp_table_size

DATABASE WARMUP

A warm database will perform much better than a recently started one because its caches and buffers will be primed with keys and data. It therefore makes sense to warm up a DB every time the database is restarted. The best way to do this is to load in the indexes of commonly used tables. This guide recommends loading in node, node_revisions and url_alias. Taxonomy information could be good candidates as well.

USE drupal6;
LOAD INDEX INTO CACHE node;
LOAD INDEX INTO CACHE node_revisions;
LOAD INDEX INTO CACHE url_alias;
LOAD INDEX INTO CACHE term_data;
LOAD INDEX INTO CACHE term_node;
This SQL code can then be put in a script and run when MySQL restarts. It is possible to configure the init_file variable in my.cnf to tell mysql where to find the startup SQL.
init-file = /etc/mysql/init-file.sql
Many nodes: Most queries where index relied upon.
Index Preloading
How to use LOAD INDEX INTO CACHE t1.
init_file
How to use init_file variable to specify startup SQL.

DATABASE INDEXES

Indexes on columns can dramatically speed up queries if the columns are used for filtering, sorting or joining. Generally, Drupal has most of the indexes you need covered, however, there are some areas where standard tables can benefit from an additional index. It is recommended that you profile your queries to see where things are slow before adding indexes in a scattergun approach because adding indexes can harm performance if they are not being used properly. You can use MySQL’s slow query log for queries with no index to identify areas for improvement.
mikeytown2 has come up with a list of tables which could do with an index:
  • All CCK fields that you use in a view. File Field: create an index on the fid; date: index on date; index on value; etc…
  • access: type, mask, status
  • comments: timestamp
  • node_comment_statistics: comment_count
  • menu_links: external, updated, customized, depth
  • users: pass, status
  • menu_custom: title
  • date_format_types: title
  • filter_formats: roles
  • content_group: weight, type_name, group_name
  • term_data: name
  • system: name
  • imagecache_preset: presetname
  • blocks: module, delta
  • system: status, type
  • content_node_field: type, widget_type

Web server

Apache + MPM Prefork + mod_php is the default web server configuration in the LAMP stack. This combination does consume large amounts of RAM which can be a problem for handling many requests. It can also be quite heavy and slow for serving static content. Many administrators have looked to replace it with other combinations including multithreaded processes (MPM Worker) and external PHP (mod_fcgid) as well as swapping it out completely for another server such as Nginx. This guide has adopted the position that Apache problems can be ameliorated somewhat by removing unneeded modules, running fcgid to connect with PHP and using MPM Worker to enable multithreading per process. However, in some cases this won’t be enough and Nginx is a must.

APACHE VS NGINX

Other Drupal users have replaced Apache with faster more lightweight (RAM and CPU) web servers such as Nginx and Lighttpd. Nginx is generally preferred over Lighttpd because of memory leaks in the latter. It is currently possible to run Nginx without losing any functionality in Drupal. Boost, a module based on .htaccess rules, now supports Nginx so it is feasible to run Nginx as the main web server. If you are constrained by CPU or have high loads then this certainly is an option worth considering.
Setting up Nginx is not trivial but it is reasonably straight forward if you are comfortable with compiling and patching. There are some good tutorials on the Web for user who want to do this.
Low resources, High Traffic, Many logged in: Possible to get more for less with Nginx.
Apache vs Nginx : Web Server Performance Deathmatch
“Nginx seems to compete pretty well with Apache and there doesn’t seem like there is a good reason not to use it especially in CPU usage constrained situations (ie. huge traffic, slow machines and etc).”
In reply to kbahey: apache vs nginx
Discussion and results over the pros and cons of Nginx vs various Apache setups.
How to get Drupal working with Nginx
Simple guide for installing and configuring Nginx on a server with only 256MB RAM. Uses FastCGI which may not be preferred method.
NGINX + PHP-FPM + APC = Awesome
“The following guide will walk you through setting up possibly the fastest way to serve PHP known to man…In this article, we’ll be installing nginx http server, PHP with the PHP-FPM patches, as well as APC.”
PHP-FPM – A simple and robust FastCGI Process Manager for PHP
Preferred way of connecting Nginx with PHP. Currently in PHP core for 5.3.2+ but not yet released. Requires patch to PHP 5.2.

APACHE: UNNEEDED MODULES

It is possible to turn off unneeded modules in Apache to reduce memory footprint. The modules you require depends very much on your setup.
The traditional way of controlling modules in Apache has been through the LoadModule directive in httpd.conf. Ubuntu and Debian do it differently with the /etc.apache2/mods-available directory and the a2enmod command. To list all modules to enable try:

$ sudo a2enmod

$ sudo /etc/init.d/apache2 force-reload
And to see what you have enabled you can do $ sudo a2dismod.
All sites: Good savings in RAM
What Apache2 modules can be disabled?
Lists of modules which should be enabled in Apache2.

APACHE THREADING: MPM WORKER (MULTI THREADED) MPM PREFORK

The use of MPM Worker allows for the handling of more requests due to multithreading in each process. It has a smaller memory footprint than Prefork and is faster. According to docs, Apache must be compiled with the --with-mpm argument in order to install Worker as “prefork” is the default on Unix systems.
RAM limited: Worker preferable to Prefork.
Compile-Time Configuration Issues
“Choosing an MPM” section covers differences between the two models.
Multi-Processing Modules (MPMs)
Apache documentation on installation.
Installing Apache2 and PHP5 using mod_fcgid
Hey, you don’t have to recompile Apache. Tutorial on how to install MPM Worker using apt-get with Apache2 on Ubuntu. Just the ticket.
mpm-worker versus mpm-prefork, and mod_php versus fastcgi
PreFork and FastCGI is still a win if you find that Worker is unstable due to long downloads as this person did.

CONNECTORS: MOD_PHP, FASTCGI, MOD_FCGID

The use of mod_php with Apache is the most common setup for calling PHP. mod_php works by embedding PHP into every Apache process. This has the disadvantage of a large memory footprint for each Apache process. FastCGI and mod_fcgid overcomes this problem and reduces resource utilization with no gains in performance.
  • All PHP loaded into the process
  • Heavy process even if flat file
  • Many processes will hog RAM
Use mod_fcgid for lower memory and DB/Network connections
Drupal webserver configurations compared
The most common, Apache+mod_php is the slowest. Tests conducted with FastCGI which is faster. NB: FastCGI has subsequently suffered from stability issues.
Apache with fcgid: acceptable performance and better resource utilization
Informative article which comes out in favor of mod_fcgid over FastCGI and mod_php. This is the must read article if you wish to attempt fcgid.
Configure Apache for high performance on drupal 6
Some solid comments from kbahey from 2bits regarding stable setup: Apache, MPM Worker, fcgid, APC (code cache), memcache No SQL.

APACHE MAXCLIENTS

The MaxClients parameter controls how many simultaneous clients Apache is able to serve. If it is set to high RAM will be chewed up and the Machine will go into swap. If it is set to low then your site will be unnecessarily limited by the number of clients it can serve. The setting of this value should be determined after consideration of (i) how much spare RAM is available on the server and (ii) how much RAM each Apache process consumes. Obviously you will want to maximize available RAM through frugal allocation of RAM to MySQL, JVM, etc and minimize the size of Apache process through techniques described above.
2bits provide the following formula:
MaxClients = (Total Memory - Operating System Memory - MySQL memory) / Size Per Apache process.
The only addition this guide would make is that it is important to leave some RAM free for the OS file buffer to allow efficient operation of the OS.
Tuning the Apache MaxClients parameter
How to set MaxClients param.

.HTACCESS

If you are running Apache then it is possible to either use .htaccess or the apache conf file to specify directives such as rewrite rules, etc. If you use .htaccess then Apache must look for .htaccess rules in the directory hierarchy for every request. This can take some time even if no rules are found. You may consider putting the rules in httpd.conf/apache2.conf if you are looking to eek out the most performance from your site.
.htaccess can slow down site if performance is crucial.
.htaccess vs httpd.conf
Evidence that .htaccess can slow a site by 6.6%.

RAM: A precious resource

Given the above, serious thought should be given to how the RAM on your box is to be divided up. In a nutshell we have the following apps contesting for their fair share:
  • The JVM if you are running Solr
  • MySQL query cache and key buffers
  • Apache processes for client requests
  • PHP if it runs outside Apache
  • Memcached for holding Drupal caches
  • The file system cache
Consider the following when deciding how to divide up your box:
  • The JVM needs a certain amount or else Solr will crash.
  • MySQL really should have indexes buffered for MyISAM and InnoDB. Use MySQLTuner. If they can’t fit then buy more RAM or (i) reduce max clients and (ii) forget about CacheRouter.
  • Apache MaxClients should be set to consume available RAM.
  • The file system cache needs to be big enough to allow smooth running of system.

This article forms part of a series on Drupal performance and scalability. The first article in the series is Squeezing the last drop from Drupal: Performance and Scalability.


REFERENCES
http://cruncht.com/89/drupal-lamp-server-tuning/
http://cruncht.com/tag/performance/page/2/