Thursday, June 2, 2011

PHP performance tips

SkyHi @ Thursday, June 02, 2011
Recommended experience: Beginner to intermediate PHP knowledge
PHP is a very popular scripting language, used on many popular sites across the web. In this article, we hope to help you improve the performance of your PHP scripts with some changes that you can make quickly and painlessly. Keep in mind that your performance gains may vary greatly, depending on which version of PHP you are running, your web server environment, and the complexity of your code.

Profile your code to pinpoint bottlenecks

Hoare's dictum states that "premature optimization is the root of all evil", an important thing to keep in mind when trying to make your web sites faster. Before changing your code, you need to determine what is actually causing it to be slow. You may work through this guide, and many others on optimizing PHP, when the issue is really database-related or network-related. By profiling your PHP code, you can pinpoint the real bottlenecks.
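A dedicated profiler such as Xdebug or XHProf will give you per-function detail, but even a quick microtime() check can tell you whether a block of code is worth optimizing. A minimal sketch (the loop is just a stand-in for whatever code you suspect is slow):

$start = microtime(true);

// Stand-in for the block you suspect is slow.
for ($i = 0; $i < 100000; $i++) {
  md5($i);
}

printf("Block took %.4f seconds\n", microtime(true) - $start);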

Upgrade your version of PHP

The team of developers who maintain the PHP engine have made a number of significant performance improvements over the years. If your web server is still running an older version, such as PHP 3 or PHP 4, you may want to investigate upgrading before you try to optimize your code.

Use caching

Making use of a caching module, such as Memcache, or a templating system that supports caching, such as Smarty, can improve the performance of your website by caching database results and rendered pages.
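As a sketch of the idea using the Memcache extension (the host, port, key name and the fetch_users_from_database() helper are all placeholders, not part of any particular application):

$memcache = new Memcache();
$memcache->connect('127.0.0.1', 11211);

$users = $memcache->get('user_list');
if ($users === false) {
  // Cache miss: query the database once, then cache the result for 5 minutes.
  $users = fetch_users_from_database(); // hypothetical helper
  $memcache->set('user_list', $users, 0, 300);
}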

Use output buffering

PHP uses a memory buffer to store all of the data that your script tries to print. This buffer can make your pages seem slow, because your users have to wait for the buffer to fill up before it sends them any data. Fortunately, you can make some changes that will force PHP to flush the output buffers sooner, and more often, making your site feel faster to your users.
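For example, calling flush() after sending the top of the page lets the browser start fetching stylesheets and rendering while PHP builds the rest. This is only a sketch; whether the data actually reaches the client early also depends on buffering in your web server and any proxies in between:

// Send the top of the page first so the browser can start working...
echo '<html><head><link rel="stylesheet" href="/style.css"></head><body>';
flush(); // push the buffered output toward the client now

// ...then do the slower work and send the rest.
echo build_report_table(); // hypothetical: the expensive part of the page
flush();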

Avoid writing naive setters and getters

When writing classes in PHP, you can save time and speed up your scripts by working with object properties directly, rather than writing naive setters and getters. In the following example, the dog class uses the setName() and getName() methods for accessing the name property.
class dog {
  public $name = '';

  public function setName($name) {
    $this->name = $name;
  }

  public function getName() {
    return $this->name;
  }
}
Notice that setName() and getName() do nothing more than store and return the name property, respectively.
$rover = new dog();
$rover->setName('rover');
echo $rover->getName();
Setting and reading the name property directly can run up to 100% faster, and it cuts down on development time.
$rover = new dog();
$rover->name = 'rover';
echo $rover->name;

Don't copy variables for no reason

Sometimes PHP novices attempt to make their code "cleaner" by copying predefined variables to variables with shorter names before working with them. What this actually results in is doubled memory consumption when the variable is altered, and therefore slower scripts. In the following example, if a user had inserted 512KB worth of characters into a textarea field, this implementation would result in nearly 1MB of memory being used.
$description = strip_tags($_POST['description']);
echo $description;
There's no reason to copy the variable above. You can simply do this operation inline and avoid the extra memory consumption:
echo strip_tags($_POST['description']);

Avoid doing SQL queries within a loop

A common mistake is placing a SQL query inside a loop. This results in multiple round trips to the database and significantly slower scripts. In the example below, you can change the loop to build a single SQL query and insert all of your users at once.
foreach ($userList as $user) {
  $query = 'INSERT INTO users (first_name,last_name) VALUES("' . $user['first_name'] . '", "' . $user['last_name'] . '")';
  mysql_query($query);
}
Each iteration sends its own query to the database:
INSERT INTO users (first_name,last_name) VALUES("John", "Doe")
Instead of using a loop, you can combine the data into a single database query.

$userData = array();
foreach ($userList as $user) {
    $userData[] = '("' . $user['first_name'] . '", "' . $user['last_name'] . '")';
 }
$query = 'INSERT INTO users (first_name,last_name) VALUES' . implode(',', $userData);
mysql_query($query);
Produces:
INSERT INTO users (first_name,last_name) VALUES("John", "Doe"),("Jane", "Doe")...

REFERENCES
http://code.google.com/speed/articles/optimizing-php.html

HOWTO: Configure linux sendmail to use external ISP as SMTP mail relay

SkyHi @ Thursday, June 02, 2011
Many server-based applications on linux need to send email from the localhost, but unfortunately, out of the box, linux sendmail doesn't work as expected. Additionally, if you host a server on an in-house machine with a dynamically assigned IP address (e.g. a Joomla website using dynDNS.org) but use an external ISP to handle email, you need to set up sendmail on your local machine so that it relays email through your ISP. My example here is for a server that is hosted locally while email is handled by an ISP, namely Network Solutions. This article describes the basic steps you need to carry out to get sendmail working so that applications on your localhost can send email.

To set up sendmail on Red Hat, CentOS or Fedora so that mail can be dispatched through a local SMTP server that relays through your ISP (e.g. for the PHP mail() function used by Joomla, Trac, SugarCRM etc.), you need to perform the following steps.

Step 1 - Install all the packages that you are going to need

Use yum to install all the packages that you are going to need. If they are already installed yum will tell you so, if they are not already installed, then yum will install them for you.
# yum install sendmail sendmail-devel sendmail-cf bind bind-chroot

NOTE: Omit Step 3 if your ISP (e.g. Telus or Shaw) does not require SMTP authentication from clients on its own network.

Step 2 - Setup sendmail client-side SMTP AUTH

sendmail client-side SMTP AUTH allows us to authenticate in order to relay all outgoing mail to our ISP's SMTP mail server via SMART_HOST and have them send the mail on our behalf. Note that SMART_HOST works just fine without SMTP AUTH if your ISP's SMTP server doesn't require authentication.

Now edit /etc/mail/sendmail.mc to contain the following. I have removed all lines that are commented out to make this more human readable.
divert(-1)dnl
include(`/usr/share/sendmail-cf/m4/cf.m4')dnl
VERSIONID(`setup for linux')dnl
OSTYPE(`linux')dnl
define(`SMART_HOST', `smtp.katriatechnology.com.netsolmail.net')dnl
define(`confDEF_USER_ID', ``8:12'')dnl
define(`confTO_CONNECT', `1m')dnl
define(`confTRY_NULL_MX_LIST', `True')dnl
define(`confDONT_PROBE_INTERFACES', `True')dnl
define(`PROCMAIL_MAILER_PATH', `/usr/bin/procmail')dnl
define(`ALIAS_FILE', `/etc/aliases')dnl
define(`STATUS_FILE', `/var/log/mail/statistics')dnl
define(`UUCP_MAILER_MAX', `2000000')dnl
define(`confUSERDB_SPEC', `/etc/mail/userdb.db')dnl
define(`confPRIVACY_FLAGS', `authwarnings,novrfy,noexpn,restrictqrun')dnl
define(`confAUTH_OPTIONS', `A')dnl
define(`confTO_IDENT', `0')dnl
FEATURE(`authinfo',`hash -o /etc/mail/authinfo.db')dnl
FEATURE(`no_default_msa', `dnl')dnl
FEATURE(`smrsh', `/usr/sbin/smrsh')dnl
FEATURE(`mailertable', `hash -o /etc/mail/mailertable.db')dnl
FEATURE(`virtusertable', `hash -o /etc/mail/virtusertable.db')dnl
FEATURE(redirect)dnl
FEATURE(always_add_domain)dnl
FEATURE(use_cw_file)dnl
FEATURE(use_ct_file)dnl
FEATURE(local_procmail, `', `procmail -t -Y -a $h -d $u')dnl
FEATURE(`access_db', `hash -T -o /etc/mail/access.db')dnl  
FEATURE(`blacklist_recipients')dnl 
EXPOSED_USER(`root')dnl  
FEATURE(`accept_unresolvable_domains')dnl  
FEATURE(`relay_based_on_MX')dnl  
LOCAL_DOMAIN(`localhost.localdomain')dnl  
MASQUERADE_AS(`katriatechnology.com')dnl  
FEATURE(masquerade_envelope)dnl  
FEATURE(masquerade_entire_domain)dnl  
MASQUERADE_DOMAIN(`katriatechnology.com')dnl  
MAILER(smtp)dnl  
MAILER(procmail)dnl
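After editing sendmail.mc, regenerate sendmail.cf so that the changes take effect; the sendmail-cf package installed in Step 1 provides the required m4 macros. On Red Hat style systems either of the following typically works:

# m4 /etc/mail/sendmail.mc > /etc/mail/sendmail.cf

or:

# make -C /etc/mail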

Step 3 -  Setup the authinfo file

Now edit /etc/mail/authinfo to contain the following.
AuthInfo:katriatechnology.com "U:user@katriatechnology.com" "P:password" "M:PLAIN"
AuthInfo: "U:user@katriatechnology.com" "P:password" "M:PLAIN"
# chmod 660 /etc/mail/authinfo
# makemap hash /etc/mail/authinfo < /etc/mail/authinfo

Step 4 - Setup the access file

Now edit the /etc/mail/access file so that it contains the following.
# by default we allow relaying from localhost...
Connect:localhost.localdomain   RELAY
Connect:localhost               RELAY
Connect:127.0.0.1               RELAY
# chmod 660 /etc/mail/access
# makemap hash /etc/mail/access < /etc/mail/access

Step 5 - Now you need to install bind to get sendmail to work

For some reason sendmail does not use /etc/hosts for local name resolution, so name resolution always fails (and sendmail is unbearably slow to start at boot because it cannot resolve localhost.localdomain). To solve this problem you need to run bind on your local machine.

Now edit the /etc/named.conf file so that it contains the following.
options {
       listen-on port 53 { 127.0.0.1; };
};
As we are using bind-chroot, we need to copy named.conf to where it expects it to be.
# cp -f /etc/named.conf /var/named/chroot/etc/
Now edit /etc/resolv.conf so that it contains the following. Change 192.168.2.50 to the IP address of your primary DNS server.
nameserver 127.0.0.1
nameserver 192.168.2.50
domain localdomain

Step 6 - Start the required services

# chkconfig named on
# chkconfig saslauthd on
# chkconfig sendmail on
# service named restart
# service saslauthd restart
# service sendmail restart

Test that it works

To check that sendmail is working properly, login to your server and fire up a terminal window.
# mailq
The mailq command will display a list of messages that are held in the outgoing mail queue. It should be empty. Now send an email, and then check the mail queue using mailq.
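If you don't have a mail client handy, a quick test message can be generated from the command line like this (assuming the mailx package is installed; the address is a placeholder):

# echo "relay test body" | mail -s "sendmail relay test" user@example.com
# mailq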

You can also get better diagnostic information by manually sending a message like this:
# sendmail -Am -t -v
To: user@domain.com
From: user@domain.com

test message body
.
 
REFERENCES
http://www.recital.com/index.php?option=com_content&view=article&id=71%3Ahowto-configure-linux-sendmail-to-use-external-isp-as-smtp-mail-relay&Itemid=59

Wednesday, June 1, 2011

MySQL Backup: Table By Table Backup With Auto Rotation, For Easy Restoration Of Partial/Full Database

SkyHi @ Wednesday, June 01, 2011
Here is a MySQL backup script which takes table-by-table backups (individual backup files for each table of each database) in a compressed format. It also rotates old backup files automatically. The script handles InnoDB and MyISAM tables separately, applying the appropriate dump options to each.

You have to set the following variables prior to running the backup script.

DB_USER

The database user who has access to all databases and its tables. I used "root" for my deployment.

DB_PASS

Password of the above user, prefixed with "-p". For example if the password is Secret, then you should write the password as "-pSecret".

BAKUP_LOG

File to which the backup log will be written. It should be writable by the user who is running the script.

BASE_BAK_FLDR

The backup folder. It should be writable by the user who is running the script.

RM_FLDR_DAYS

Backup rotation period. +30 is 30 days.

#!/bin/bash
# Database Backup script.
# Created By:    Mohammed Salih
#                 Senior System Administrator
#                Date: 21/06/2007
#
# Database credentials
DB_USER=root
#Please append password in the xxxxx section below, note that there is
# no space between -p and xxxxx
DB_PASS="-pxxxxxxx"
# Get the list of databases
DBS_LIST=$(echo "show databases;"|mysql -u $DB_USER $DB_PASS -N)
# Log file
BAKUP_LOG=/backup/log/db-backup.log
# Backup Base directory
BASE_BAK_FLDR=/backup/db
# Backup rotation period.
RM_FLDR_DAYS="+30"
# From here, only edit if you know what you are doing.
index=0
# Only the 'mysql' user may run this script.
if [ ! "$(id -u -n)" = "mysql" ]; then
        echo -e "Error:: $0 : Only user 'mysql' can run this script"
        exit 100
fi
# Check that we can connect to the MySQL server; otherwise die.
PING=$(mysqladmin ping -u $DB_USER $DB_PASS 2>/dev/null)
if [ "$PING" != "mysqld is alive" ]; then
        echo "Error:: Unable to connect to MySQL Server, exiting !!"
        exit 101
fi
# Backup process starts here.
# Flush logs prior to the backup.
mysql -u $DB_USER $DB_PASS -e "FLUSH LOGS"
# Loop through the DB list and create table level backup,
# applying appropriate option for MyISAM and InnoDB tables.
for DB in $DBS_LIST; do
    DB_BKP_FLDR=$BASE_BAK_FLDR/$(date +%d-%m-%Y)/$DB
    [ ! -d $DB_BKP_FLDR ]  && mkdir -p $DB_BKP_FLDR
    # Get the schema of database with the stored procedures.
    # This will be the first file in the database backup folder
    mysqldump -u $DB_USER $DB_PASS -R -d --single-transaction $DB | \
            gzip -c > $DB_BKP_FLDR/000-DB_SCHEMA.sql.gz
    index=0
    # Get the tables and their types; store them in a flat array.
    table_types=($(mysql -u $DB_USER $DB_PASS -e "show table status from $DB" | \
            awk '{ if ($2 == "MyISAM" || $2 == "InnoDB") print $1,$2}'))
    table_type_count=${#table_types[@]}
    # Loop through the tables and apply the mysqldump option according to the table type
    # The table specific SQL files will not contain any create info for the table schema.
    # It will be available in the SCHEMA file.
    while [ "$index" -lt "$table_type_count" ]; do
        START=$(date +%s)
        TYPE=${table_types[$index + 1]}
        table=${table_types[$index]}
        echo -en "$(date) : backup $DB : $table : $TYPE "
        if [ "$TYPE" = "MyISAM" ]; then
            DUMP_OPT="-u $DB_USER $DB_PASS $DB --no-create-info --tables "
        else
            DUMP_OPT="-u $DB_USER $DB_PASS $DB --no-create-info --single-transaction --tables"
        fi
        mysqldump  $DUMP_OPT $table |gzip -c > $DB_BKP_FLDR/$table.sql.gz
        index=$(($index + 2))
        echo -e " - Total time : $(($(date +%s) - $START))\n"
    done
done
# Rotate old backups according to RM_FLDR_DAYS.
if [ ! -z "$RM_FLDR_DAYS" ]; then
    echo -en "$(date) : removing folder : "
    find $BASE_BAK_FLDR/ -maxdepth 1 -mtime $RM_FLDR_DAYS -type d -exec rm -rf {} \;
    echo
fi
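The script is written to be run by the mysql user (it exits otherwise), so a nightly entry in that user's crontab is the natural way to schedule it. A sketch, assuming you saved the script as /usr/local/bin/db-backup.sh (the path and time are placeholders):

# crontab -u mysql -e
30 2 * * * /usr/local/bin/db-backup.sh >> /backup/log/db-backup.log 2>&1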

The Backup Location

For example, if you took a backup of "bigdb" on 1st Jan 2007, the backup will be kept in

$BASE_BAK_FLDR/01-01-2007/bigdb


The Restore Script

The following commands are an example of restoring a database called bigdb whose backup was taken on 1st Jan 2007.

cd /backup/db/01-01-2007/bigdb

for table in *; do gunzip -c $table | mysql -u root -pSecret bigdb_new; done

The above command iterates through the files in the directory and restores all of the tables to the bigdb_new database. It is assumed that you created the bigdb_new database before running it. Note that the schema file sorts first (000-), so the tables are created before their data is loaded.
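Because every table lives in its own file, a partial restore is just as easy. For example, to bring back a single table (the table name orders here is hypothetical), restore the schema file first so the table exists, then load its data:

cd /backup/db/01-01-2007/bigdb

gunzip -c 000-DB_SCHEMA.sql.gz | mysql -u root -pSecret bigdb_new
gunzip -c orders.sql.gz | mysql -u root -pSecret bigdb_new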

REFERENCES
http://www.howtoforge.com/mysql-backup-table-by-table-backup-with-auto-rotation-for-easy-restoration-of-partial-full-database
http://sourceforge.net/projects/automysqlbackup/

Monday, May 30, 2011

How to handle 1000s of concurrent users on a 360MB VPS

SkyHi @ Monday, May 30, 2011
There has been some recent confusion about how much memory you need in a web server to handle a huge number of concurrent requests. I also made a performance claim on the STS list that got me an unusual number of private emails.

Here’s how you run a highly concurrent website on a shoe-string budget:
The first thing you’ll do is get a Linode server because they have the fastest CPU and disk.
Install Apache with your web application running under mod_php, mod_perl or some other persistence engine for your language. Then you get famous and start getting emails about people not being able to access your website.

You increase the number of Apache threads or processes (depending on which Apache MPM you're using) until you can't anymore, because you only have 360MB of memory in your server.
Then you'll lower the KeepAliveTimeout and eventually disable KeepAlive so that more users can access your website without tying up your Apache processes. Your users will slow down a little, because they now have to establish a new connection for every piece of your website they want to fetch, but you'll be able to serve more of them.

But as you scale up, you will get a few more emails about your server being down. Even though you've disabled keepalive, it still takes time for each Apache child to send data to users, especially if they're on slow connections or connections with high latency. Here's what you do next:

Install Nginx on your new Linode box and get it to listen on Port 80. Then reconfigure Apache so that it listens on another port – say port 81 – and can only be accessed from the local machine. Configure Nginx as a reverse proxy to Apache listening on port 81 so that it sits in front of Apache like so:

YourVisitor <---> Nginx:Port80 <---> Apache:Port81
Enable keepalive on Nginx and set the keepalive timeout as high as you'd like. Disable KeepAlive on Apache; this is just in case, because Nginx's proxy engine doesn't support keepalive to the back-end servers anyway.
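A minimal sketch of that front-end configuration (the values are illustrative and this is not a complete drop-in nginx.conf; see the Nginx wiki for the full directive reference):

# /etc/nginx/nginx.conf (sketch)
events {
  worker_connections 4096; # one worker can juggle thousands of sockets
}

http {
  keepalive_timeout 300s; # client keepalive is now Nginx's problem

  server {
    listen 80;
    location / {
      proxy_pass http://127.0.0.1:81; # Apache, reachable locally only
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
    }
  }
}

On the Apache side, change the Listen directive to 127.0.0.1:81 so it only accepts connections from the local machine.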

The 10 or so Apache children you’re running will be getting requests from a client (Nginx) that is running locally. Because there is zero latency and a huge amount of bandwidth (it’s a loopback request), the only time Apache takes to handle the request is the amount of CPU time it actually takes to handle the request. Apache children are no longer tied up with clients on slow connections. So each request is handled in a few microseconds, freeing up each child to do a hell of a lot more work.

Nginx will occupy about 5 to 10 megs of memory. You'll see thousands of users concurrently connected to it. If you have Munin loaded on your server, check out the netstat graph. Bitchin isn't it? You'll also notice that Nginx uses very little CPU, almost nothing in fact. That's because Nginx is designed using a single-threaded model where one thread handles a huge number of connections. It can do this with little CPU usage because it uses a feature of the Linux kernel called epoll.

Footnotes:
Lack of time forced me to leave out all explanations on how to install and configure Nginx (I'm assuming you know Apache already), but the Nginx wiki is excellent, even if the Russian translation is a little rough.
I’ve also purposely left out all references to solving disk bottlenecks (as I’ve left out a discussion about browser caching) because there has been a lot written about this and depending on what app or app-server you’re running, there are some very standard ways to solve IO problems already. e.g. Memcached, the InnoDB cache for MySQL, PHP’s Alternative PHP Cache, perstence engines that keep your compiled code in memory, etc..etc..

This technique works to speed up any back-end application server that uses a one-thread-per-connection model. It doesn’t matter if it’s Ruby via FastCGI, Mod_Perl on Apache or some crappy little Bash script spitting out data on a socket.

This is a very standard config for most high traffic websites today. It’s how they are able to leave keepalive enabled and handle a huge number of concurrent users with a relatively small app server cluster.  Lighttpd and Nginx are the two most popular free FSM/epoll web servers out there and Nginx is the fastest growing, best designed (IMHO) and the one I use to serve 400 requests per second on a small Apache cluster. It’s also what guys like WordPress.com use.


Very high performance web servers

Have you ever tried to get Apache to handle 10,000 concurrent connections? For example, you have a very busy website and you enable keepalive on your web server. Then you set the timeout to something high like 300 seconds for ridiculously slow clients (sounds crazy; Apache's actual default is only 5 to 15 seconds, depending on the version). All of a sudden, when you run netstat, it tells you that you have thousands of clients with established connections to your machine.

Apache can’t handle 10,000 connections efficiently because it uses a one-thread-per-connection model (or if you’re using prefork then one process per connection).
If you want to allow your clients to use keepalive on your very busy website you need to use a server that uses an event notification model. That means that you have a single thread or process that manages thousands of sockets or connections. The sockets don’t block the execution of the thread but instead sit quietly until something happens and then have a way of notifying the thread that something happened and it better come take a look.

Most of us use Linux these days (of course there are the BSD diehards, but whatever). The Linux 2.6 kernel introduced something called epoll, an event notification system for applications that want to manage lots of file descriptors without blocking execution, and be notified when something changes.
lighttpd and nginx are two very fast web servers that use epoll and a non-blocking event notification model to manage thousands of connections with a single thread and just a few megs of RAM (RAM consumption is the real reason you can't use Apache for high concurrency). You can also spawn more than one thread on both servers if you'd like them to use more than one processor or CPU core.

I used to use lighttpd 1.4.x, but its configuration really sucks because it's so inflexible. I love nginx's configuration because it's very intuitive and very flexible. It also has some very cool modules, including an experimental embedded Perl module. The performance I'm getting out of it is nothing short of spectacular. I run 8 worker processes, and each process consumes about 7 megs of RAM with a few modules loaded.
So my config looks like:
request ==> nginx_with_keepalive ==> apache/appserver_nokeepalive

If you’d like to read more about server models for handling huge numbers of clients, check out Dan Kegel’s page on the so called c10k problem where he documents a few other event models for servers and has a history lesson on event driven IO.
Also, if you’re planning on running a high traffic server with high concurrency you should probably optimize your IP stack – here are a few suggestions I made a while back on how to do that.

REFERENCES
http://markmaunder.com/2009/how-to-handle-1000s-of-concurrent-users-on-a-360mb-vps/
http://news.ycombinator.com/item?id=2594050 
http://markmaunder.com/category/scaling/