Thursday, November 10, 2011

MySQL wait_timeout default is set too high!

SkyHi @ Thursday, November 10, 2011
Recently an OKPublic server was experiencing fairly sporadic bouts of extreme CPU loads (upwards of 80!) every 6 to 12 hours or so. It was severely limiting the web server. As I was monitoring the running processes, I witnessed several of these bouts and noticed that MySQL processes were spawned repeatedly and in great number; about 3-5 new processes per second.

So I initially thought that these processes were the result of a popular account on the server, but digging through Apache domain logs, there didn’t seem to be any that corresponded with the timing of these events.

Also, looking at how long the processes had been alive, I noticed that the "new" processes weren't new at all; some were 30 minutes to 3 hours old, so they were actually just coming out of sleep.

While reading MySQL optimization guides on the web, I found that the wait_timeout variable stood out, as it represents the amount of time that MySQL will wait before killing an idle connection. What really shocked me was that the default wait_timeout is 28800 seconds, or 8 hours. Why would anyone want to keep idle MySQL connections alive for 8 hours?

So I changed the setting to 300 seconds (which is still probably more than necessary) and we haven’t experienced the problem since.
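For reference, here is roughly what that change looks like. The value can be set persistently in my.cnf and also applied to the running server without a restart; the 300-second figure is just the value used above, not a universal recommendation:

```ini
# /etc/my.cnf -- sketch; tune the value for your own workload
[mysqld]
wait_timeout = 300   # kill idle (sleeping) connections after 5 minutes
```

To apply it immediately (for new connections) without restarting, a user with the SUPER privilege can run `SET GLOBAL wait_timeout = 300;` from the MySQL client.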

The only case where a wait_timeout as high as the default (28800) might make sense is if there are a lot of recurring visitors (or connections to the database through other means) and the applications handling those connections are optimized for persistent connections, which most aren't.

So a wait_timeout set this high doesn't act as a cache so much as a pool of dead connections. And when MySQL goes into kill mode to wipe the idle connections clean (I'm not sure about the technicalities of this), it seems the pool can grow so large that clearing it all at once consumes a lot of CPU and RAM.

It seems that a better way to save resources through MySQL configuration is to set up a thread_cache and/or query_cache.
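As a sketch, both can be configured in my.cnf; the sizes below are illustrative starting points for a MySQL 5.x server, not tuned values:

```ini
[mysqld]
# Reuse server threads instead of creating a new one per connection
thread_cache_size = 16
# Cache results of identical SELECT statements (query cache, MySQL 5.x)
query_cache_type  = 1
query_cache_size  = 32M
```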

REFERENCES
http://www.eliotk.net/12/21/mysql-wait_timeout-default-is-set-too-high/
http://chrisjean.com/2011/01/21/fix-amember-mysql-error-mysql-server-has-gone-away/

MySQL server has gone away: max_allowed_packet

SkyHi @ Thursday, November 10, 2011

Most MySQL users have at some point encountered this rather cryptic error message: “MySQL server has gone away”. The MySQL documentation describes lots of possible reasons for it here: http://dev.mysql.com/doc/refman/5.1/en/gone-away.html
However, this page is of little help for most users, I think. Dozens of reasons are listed, but aside from the trivial ones (the physical connection was lost, the MySQL server or the machine it runs on has crashed, etc.), only a few of them are common in our experience, and many of those listed are not.
Here we will discuss one situation that in our experience happens very frequently for people working across multiple servers: if a client sends an SQL statement longer than the server’s max_allowed_packet setting, the server will simply disconnect the client. The next query from the same client instance will then find that the ‘MySQL server has gone away’. At least that is the behavior with recent server versions.
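A quick way to check whether this is the cause is to compare the server’s current limit against the size of the failing statement. Both statements below are standard MySQL; the 64M value is only an example:

```sql
-- Show the current limit (reported in bytes)
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise it for all new connections (requires the SUPER privilege;
-- add the same value under [mysqld] in my.cnf to survive a restart)
SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;
```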
1)
But the documentation at
http://dev.mysql.com/doc/refman/5.1/en/error-messages-client.html
.. also lists another client error:
Error: 2020 (CR_NET_PACKET_TOO_LARGE)  Message: Got packet bigger than ‘max_allowed_packet’ bytes
along with
Error: 2006 (CR_SERVER_GONE_ERROR): Message: MySQL server has gone away.
Actually, I have not seen the ‘got packet bigger ..’ error myself for many years – not since MySQL 3.23 or 4.0. I am uncertain whether a recent server will sometimes still return ‘got packet bigger’, or whether this error message has itself ‘gone away’. If the ‘got packet bigger’ message is still relevant with recent servers, it would be nice to have it specified under which conditions it occurs and when only ‘gone away’ will. If this error message is now historical, it should either be removed from the documentation, or it should be mentioned that the error number is reserved for this message but no longer used. But it would of course be much preferable to have the ‘got packet bigger’ error returned when that is the problem: it tells you what the problem is, whereas “MySQL server has gone away” does not tell you anything specific. So ‘got packet bigger’ is a *much* better message than ‘gone away’. Also, ‘got packet bigger’ is listed among client errors and not server errors, which is not what I would expect. So maybe there is some problem with my understanding of things here?
Does anybody have any idea about if and why ‘got packet bigger’ now effectively seems to have ‘gone away’ too?
And most important: why disconnect the client? There are reconnect options of course, but it does not really help here. After a reconnect and executing the same query things just repeat themselves.
2)
Basically, I never understood why MySQL sticks with the default 1M setting for [mysqld] when it is 16M for [mysqldump] in the configuration ‘templates’ shipped with the MySQL server (I have tried to ‘hint’ at this to them several times over the last 3-4 years). Obviously they realize that 1M is often too little for backup/restore, since they use a larger setting for mysqldump. However, users use all sorts of other tools for backup: other script-based tools running on the server, third-party (and MySQL) GUI clients, web-based tools (hosting control panels, phpMyAdmin), backup/restore routines shipped with or built into applications, etc. Often users do not have access to run mysqldump on hosted servers at all (at least not on shared servers). Furthermore, sysadmins are often unwilling to change configuration settings, and users are left with the option of generating SINGLE INSERTS – with horrible restore performance as a consequence – to ensure cross-server exports/imports (and even then it fails with a well-grown MEDIUMBLOB). I deliberately use the term ‘exports/imports’ and not ‘backup/restore’ because this also applies to various tools that can connect to two or more servers at a time and copy data using various techniques without actually generating a file.
The max_allowed_packet problem as described here has been a big problem for us over time. I do not think MySQL fully realises the importance of the problem – mostly because our tools and the tools/clients shipped with the server are used primarily by different segments of users (with some significant overlap, of course). We now handle this problem 100% in SQLyog (we generate the largest BULK INSERTs possible, up to 16M, everywhere when transferring data from one server to another with all the methods available), but we cannot prevent a user – if he wants to use BULK INSERTs – from generating a SQL dump on one server that will not import on another because the BULK INSERTs are too large. We can of course only handle it if we are connected to both servers.
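The batching approach described above can be sketched in a few lines. This is not SQLyog’s actual code, just an illustration of the idea: group rows into multi-row INSERT statements while keeping each statement below the packet limit (a real implementation would also escape string values properly):

```python
def batch_inserts(table, columns, rows, max_packet=16 * 1024 * 1024):
    """Group rows into multi-row INSERT statements, keeping each
    statement's length below max_packet bytes (illustrative sketch)."""
    header = "INSERT INTO %s (%s) VALUES " % (table, ", ".join(columns))
    statements, current, size = [], [], len(header)
    for row in rows:
        tup = "(%s)" % ", ".join(repr(v) for v in row)
        # +2 accounts for the ", " separator between value tuples
        if current and size + len(tup) + 2 > max_packet:
            statements.append(header + ", ".join(current))
            current, size = [], len(header)
        current.append(tup)
        size += len(tup) + 2
    if current:
        statements.append(header + ", ".join(current))
    return statements
```

With a tiny max_packet (say 60 bytes) and ten single-column rows, this yields two statements instead of one oversized one; with the real 16M default it would usually produce a single large BULK INSERT.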
3)
One solution would be to allow max_allowed_packet as a SESSION variable. After a long period of uncertainty about this – refer to http://bugs.mysql.com/bug.php?id=22891 and http://bugs.mysql.com/bug.php?id=32223 – it is now clear that it is not and will not be possible to override the GLOBAL setting for the SESSION. I regret this! It would be very nice to be able to “SET max_allowed_packet ..” at the top of a SQL script, for instance.
4)
And actually – most basically – I also do not really understand why a max_allowed_packet setting is required at all, except that it of course makes sense for a server admin to be able to stop badly behaved users from bombing the server with statements containing 1 GB WHERE clauses! But then we are not talking about 1M, but rather something like 16-64-100M as a critical threshold, I think.
Also, I am not sure whether the setting is used to allocate a fixed-size memory buffer for handling the query string, whether it is related to handling network packets, or something else. I have wondered for quite some time whether such a restriction could be avoided, and whether this implementation is a deliberate choice for some reason or rather a consequence of the coding techniques currently used. I would like to get rid of it!



================================================================

max_allowed_packet= 64M
wait_timeout= 6000
solution worked for me as well – thanks a lot


REFERENCES
http://www.webyog.com/blog/2009/08/10/mysql-server-has-gone-away-part-1-max_allowed_packet/
http://bogdan.org.ua/2008/12/25/how-to-fix-mysql-server-has-gone-away-error-2006.html
http://dev.mysql.com/doc/refman/5.1/en/gone-away.html
http://chrisjean.com/2011/01/21/fix-amember-mysql-error-mysql-server-has-gone-away/
http://stackoverflow.com/questions/1644432/mysql-server-has-gone-away-in-exactly-60-seconds
http://robsnotebook.com/wordpress-mysql-gone-away

CentOS 6 RPMForge/EPEL Repositories

SkyHi @ Thursday, November 10, 2011

Server CentOS
The primary reason for adding these repositories is that neither the CentOS repository nor the rpmforge repository contains the same packages that were available for CentOS 5.x. There is some serious lag time which will cripple your ability to move to CentOS 6. Therefore, in order to gain access to all of the necessary packages, you will need to add these two repositories. Unfortunately, this is an unsatisfactory solution, as the more repositories you add, the greater the chance of package conflicts. At this point in time, however, this cannot be avoided. In order to minimize the impact of using multiple repositories it is important to use yum-priorities.
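If the priorities plugin is not already installed, it needs to be added before the priority= lines below will have any effect. On CentOS 6 the package is named yum-plugin-priorities (on CentOS 5 it was called yum-priorities):

```shell
# Install the yum priorities plugin (CentOS 6 package name)
yum install yum-plugin-priorities
```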
wget http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.2-2.el6.rf.i686.rpm

rpm -ivh rpmforge-release-0.5.2-2.el6.rf.i686.rpm
warning: rpmforge-release-0.5.2-2.el6.rf.i686.rpm: Header V3 DSA/SHA1 Signature, key ID 6b8d79e6: NOKEY
Preparing... ########################################### [100%]
1:rpmforge-release ########################################### [100%]

For 64-bit systems, use the x86_64 package instead: http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.2-2.el6.rf.x86_64.rpm

wget http://download.fedora.redhat.com/pub/epel/6/i386/epel-release-6-5.noarch.rpm

rpm -ivh epel-release-6-5.noarch.rpm
warning: epel-release-6-5.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID 0608b895: NOKEY
Preparing... ########################################### [100%]
1:epel-release ########################################### [100%]

Once you have added the repositories verify they exist in /etc/yum.repos.d

CentOS-Base.repo
CentOS-Debuginfo.repo
CentOS-Media.repo
epel.repo
epel-testing.repo
mirrors-rpmforge
mirrors-rpmforge-extras
mirrors-rpmforge-testing
rpmforge.repo

BEWARE: There are testing repos that are also added. These repos are dangerous to use on a production machine.

Edit the /etc/yum.repos.d/CentOS-Base.repo and add priorities to these three:

[base]
priority=1

[updates]
priority=1

[extras]
priority=1

Next, edit rpmforge.repo and set the one enabled repository (the others are disabled) to priority 5.
[rpmforge]
priority=5

Finally, edit epel.repo and make it 11 in priority.

[epel]
priority=11

The purpose of the priorities on a production server is to maintain as much stability as possible while still having access to packages that are not in the CentOS 6 repository. The numbers rank the CentOS repo as the most important; if a package is not found there, rpmforge.repo is consulted next, and epel.repo last. The epel.repo, which comes from the Fedora project, is likely to carry packages slightly ahead of the current CentOS packages. Again, this is not a perfect solution, but if you need packages outside of the CentOS repositories you do not have many options.

REFERENCES

Famous Perl One-Liners Explained

SkyHi @ Thursday, November 10, 2011

Here is the general plan:
REFERENCES

Tuesday, November 8, 2011

Diagnosing slow PHP execution with Xdebug and KCachegrind

SkyHi @ Tuesday, November 08, 2011

Diagnosing slow PHP execution with Xdebug and KCachegrind
Tracking down a performance issue to the actual PHP app can be hard enough by itself, but what do you do once you’re sure that the app itself is the bottleneck? The excellent Xdebug extension has many useful features for assisting application developers, but today we’ll be looking at one specific feature that can help us see exactly what is slow in the application: profiling.
Profiling a PHP application can explain how much time the server spent in each function, each file, and each code path. It can also show you how many times a function or method was called, which is useful for diagnosing programming errors involving pointless looping. Xdebug generates cachegrind-compatible files (part of the Valgrind tool suite) which can also be used to create easy-to-understand graphs using KCachegrind.
Let’s start off with some very simple PHP code that just loops 10 times so we can get some output out of KCachegrind. I won’t cover the installation of Xdebug here, but here is the configuration that I’m using for it:
; Enable xdebug extension module
zend_extension=/usr/lib64/php/modules/xdebug.so
xdebug.profiler_output_dir="/dev/shm/trace"
xdebug.profiler_append=On
xdebug.profiler_enable_trigger=On
xdebug.profiler_output_name="%R-%u.trace"
xdebug.trace_options=1
xdebug.collect_params=4
xdebug.collect_return=1
xdebug.collect_vars=0
 
xdebug.profiler_enable=0
;xdebug.auto_trace=Off
Notice that I'm storing the cachegrind output in /dev/shm/trace. This is so I don't kill performance on a production system when using Xdebug profiling with auto-trace while trying to get a large sampling of data. It's not really necessary here, but it doesn't hurt anything either, so we'll just stick with it.
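One practical detail: the profiler output directory has to exist and be writable by the web-server user, or Xdebug will not be able to write the trace files there. A minimal setup, assuming the web server runs as the apache user (adjust for your distribution):

```shell
# Create the trace directory in shared memory and hand it to the
# web-server user (user/group name "apache" is an assumption)
mkdir -p /dev/shm/trace
chown apache:apache /dev/shm/trace
```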
Here is the PHP code that we’ll be executing:
for ($i = 1; $i <= 10; $i++) {
    echo $i;
}
This will just print ‘12345678910’ in our browser window when we load the page up. Because of the way we have Xdebug configured, it will only profile PHP execution when we either GET or POST the special string ‘XDEBUG_PROFILE’. So, assuming our small test PHP script is hosted at http://example.com/test.php , we would want to hit http://example.com/test.php?XDEBUG_PROFILE . This will generate a file called something like this: ‘_test_php_XDEBUG_PROFILE-1296249911_052628.trace’. Notice how it has the name of the PHP file that was executed, the query string, and the UNIX timestamp (including sub-second timing info). If we open up the file in a text editor, we can see the contents, which aren’t really meant to be human-readable in this mode (although you can change that; refer to the Xdebug documentation for more info).
==== NEW PROFILING FILE ==============================================
version: 0.9.6
cmd: /chroot/home/magbench/magbench1.nexcess.net/html/test.php
part: 1
 
events: Time
 
fl=/chroot/home/examplec/example.com/html/test.php
fn={main}
 
summary: 23
 
0 23
Let’s see what happens when we run this cachegrind output file through KCachegrind. Because I really don’t like wasting space installing the KDE libraries on my Ubuntu system (which runs Gnome), I just set up an Ubuntu Server virtual machine and installed KCachegrind and all the KDE libraries on that. I then connect to it using ‘ssh -X user@vmhostname kcachegrind’, which opens KCachegrind in a new window, much as if I were running it locally.
Image 1
As you can see, test.php took up 100% of the time and there were no callers or calls to external scripts / functions / methods.
Now, let’s see what happens when we test this with some simple PHP software that we love: DokuWiki (http://www.dokuwiki.org/dokuwiki ):
Image 2
Ahh, now we’re on to something. We can see on the chart that the ‘langsel’ function took ~63% of the processing time when executing the script, so if we wanted to optimise, we would want to look there first.
Here is another example: The main doku.php index page once DokuWiki is installed:
Image 3
We can see the structure of the program from the graph, and while most of it looks normal, I’d be a bit suspicious as to why php::date_default_timezone_get took so long, and would consider just making that a constant to speed things up (by almost 13%!). This is, of course, with zero real wiki content on the page, so things would change as the content changes, but it’s still a very valuable tool for digging deep into why an application is slow.
Next time, I’ll cover the other side of application performance tuning: SQL query profiling with Maatkit.

REFERENCES