Monday, March 7, 2011

An Nginx load balancing, caching, reverse proxy

SkyHi @ Monday, March 07, 2011
Continuing the evaluation of clustering our main website on Linux KVM virtual machines, below is our test nginx reverse proxy cache config.

The back end is 2 Apache servers, one on the local host and one on a remote host.

The main site files live on NFS. We set up the cache file system in tmpfs. Our site is dynamic, using PHP for most pages. The caching appears to eliminate most of the PHP overhead and keeps the number of Apache processes at bay, reducing the overall RAM needed on the VMs. We think this configuration will let us run 2 VMs while still using fewer resources than our old dual CPU, 3GB RAM physical web server.

Our nginx disk cache file system was set to 50 MB for testing.


tmpfs /var/lib/nginx tmpfs size=50M,uid=33 0 0

We used the main nginx config file to define default caching parameters.


# Two processes work well for a single CPU
  user www-data;
  worker_processes  2;

  error_log  /var/log/nginx/error.log;
  pid        /var/run/nginx.pid;

  events {
     worker_connections  1024;
     use epoll;
  }

  http {
    include       /etc/nginx/mime.types;
    include       /etc/nginx/mime.types;

    # Nginx does the logging
    access_log /var/log/nginx/access.log;

    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  65;
    tcp_nodelay        on;

    server_names_hash_bucket_size 64;

    # Have nginx do the compression, turn off Apache's mod_deflate
    gzip                on;
    gzip_comp_level     1;
    gzip_disable        msie6;
    gzip_proxied        any;

    # text/html mime type is automatically included for gzip, have to add the rest
    gzip_types          text/plain text/css application/x-javascript text/xml application/xml application/rss+xml text/javascript;

    # Default cache parameters for use by virtual hosts
    # Set the cache path to tmpfs mounted disk, and the zone name
    # Set the maximum size of the on disk cache to less than the tmpfs file system size
    proxy_cache_path  /var/lib/nginx/cache  levels=1:2  keys_zone=adams:10m max_size=45m;
    proxy_temp_path   /var/lib/nginx/proxy;

    # Putting the host name in the cache key allows different virtual hosts to share the same cache zone
    proxy_cache_key "$scheme://$host$request_uri";
    proxy_redirect off;

    # Pass some client identification headers back to Apache  
    proxy_set_header        Host            $host;
    proxy_set_header        X-Real-IP       $remote_addr;
    proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;

    # Cache different return codes for different lengths of time 
    # We cached normal pages for 10 minutes
    proxy_cache_valid 200 302  10m;
    proxy_cache_valid 404      1m;

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
  }
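One optional line, not part of the original config, makes it easy to confirm the cache is working: nginx sets $upstream_cache_status whenever proxy_cache is in use, and it can be exposed as a response header.

    # Optional addition (not in the original config): report cache status to the client
    add_header X-Cache-Status $upstream_cache_status;

Placed in the http or server block, the first request for a page should come back with X-Cache-Status: MISS and a repeat request within the 10 minute window with X-Cache-Status: HIT (easy to check with curl -I).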

Following is an nginx virtual host config with the reverse proxy cache enabled. The host name of the server is megalon-test; adams-bal is the name of the load balanced upstream group. The proxy also handles SSL, and can selectively decide what to cache and for how long.

# The Apache load balancing proxy targets for port 80 traffic
# (the server lines were lost from the original post; reconstructed here as one
#  local Apache and one remote Apache per the text above, with an example
#  private IP standing in for the remote host)
  upstream adams-bal {
    server 127.0.0.1:80;
    server 192.168.122.11:80;
  }

  # Proxies for port 443 traffic
  upstream sadams-bal {
    server 127.0.0.1:443;
    server 192.168.122.11:443;
  }

  # Virtual host definition
  server {
    # The listen/server_name lines were lost from the original post; nginx
    # listens on the public address (example IP) while Apache binds 127.0.0.1
    listen 192.0.2.10:80;
    server_name megalon-test;

    error_page 404 = /about/searchasc/notfound.php;

    # The default location definition
    location / {
      # We do some rewrites via nginx as well
      include adams-rewrite;

      # Do caching using the adams zone, with the settings defined in /etc/nginx.conf
      proxy_cache adams;

      # If it's not in the cache pass back to the adams-bal load balanced targets defined above
      proxy_pass  http://adams-bal$request_uri;
    }

    # Serve static files directly via nginx, set an expires header for the browser
    location ~* \.(pdf|css|js|png|gif|jpg|ico|swf|mov|doc|xls|ppt|docx|pptx|xlsx)$ {
      root /home/www/adams;
      expires max;
    }
  }

  # The SSL virtual host definition, setting up the SSL proxy end to middle to end.
  # client -> SSL -> nginx -> SSL -> Apache
  server {
    # listen/server_name lines were lost from the original post (example public IP)
    listen 192.0.2.10:443;
    server_name megalon-test;

    # We use a domain wild card cert, under nginx the intermediate lives
    # in the same file as the domain cert
    ssl  on;
    ssl_certificate  adams.edu_wildcard_chain.crt;
    ssl_certificate_key  adams.edu_wildcard.key;
    ssl_session_timeout  5m;
    ssl_protocols  SSLv2 SSLv3 TLSv1;
    ssl_ciphers  ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP;
    ssl_prefer_server_ciphers   on;

    # Don't cache ssl pages, just proxy them
    location / {
      proxy_pass  https://sadams-bal$request_uri;
    }
  }

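The "selectively decide what to cache and for how long" part happens per location. As a sketch, not from the original post, a hypothetical admin area could skip the cache entirely while search pages get a shorter lifetime than the 10 minute default:

    # Hypothetical example: never cache the admin area, just proxy it
    location /admin/ {
      proxy_pass  http://adams-bal$request_uri;
    }

    # Hypothetical example: cache search results for only one minute
    location /search/ {
      proxy_cache        adams;
      proxy_cache_valid  200  1m;
      proxy_pass         http://adams-bal$request_uri;
    }

Any location without a proxy_cache directive is simply passed through to the back end on every request.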
The Apache back end configuration on Ubuntu uses a couple of files; here are the configs for the localhost server, running on the same VM as nginx.

# Set up name virtual host for localhost port 80
# (the actual directives were lost from the original post; the lines below are
#  a reconstruction, and the extra SSL port is an example)
  NameVirtualHost 127.0.0.1:80
  Listen 127.0.0.1:80

  # Set up server for SSL
  # SSL name based virtual hosts are not yet supported, therefore no
  # NameVirtualHost statement here

    # SSL for adams-bal
    Listen 127.0.0.1:443

    # SSL for another virtual host through nginx
    Listen 127.0.0.1:444

The Apache virtual host definition.

# For port 80, server name the same as nginx
# (the VirtualHost/Directory tags were stripped from the original post;
#  reconstructed here, with the server name assumed to match nginx's)
  <VirtualHost 127.0.0.1:80>
    ServerName megalon-test
    DocumentRoot /home/www/adams

    <Directory /home/www/adams>
      Options -Indexes FollowSymLinks
      AllowOverride AuthConfig Limit
      Order allow,deny
      Allow from all
    </Directory>
  </VirtualHost>

  # For SSL, notice Apache requires the intermediate cert in a separate file
  # (the VirtualHost tags were stripped from the original post; reconstructed)
  <VirtualHost 127.0.0.1:443>
    ServerName megalon-test
    DocumentRoot /home/www/adams
    SSLEngine on
    SSLCertificateFile /etc/nginx/adams.edu_wildcard.crt
    SSLCertificateKeyFile /etc/nginx/adams.edu_wildcard.key
    SSLCertificateChainFile /etc/nginx/GoDaddy_Intermediate.crt
  </VirtualHost>

The Apache configs on the remote host are almost identical, using the actual server IP address rather than localhost. One can envision a swarm of VMs talking to the proxy through a virtual network on private IPs...
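That swarm can be sketched simply by growing the upstream pool; the addresses here are hypothetical, not from the original post:

    # Hypothetical: more Apache VMs on a private virtual network
    upstream adams-bal {
      server 127.0.0.1:80;
      server 192.168.122.11:80;
      server 192.168.122.12:80;
      server 192.168.122.13:80;
    }

nginx round-robins requests (and cache misses) across all listed servers by default.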

Performance and resource usage are very good due to the RAM disk caching. We did notice a slowdown when the nginx cache filled during a site crawl by a search engine, so make sure there is enough cache to cover the site. Some sites may need a mix of on disk and RAM disk caches.
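For that mixed setup, one hedged sketch (the zone names and the disk path are invented here, not taken from the original post) is to declare two cache paths and let each virtual host pick a zone with proxy_cache:

    # Hypothetical: a small RAM-backed zone plus a larger disk-backed zone
    proxy_cache_path  /var/lib/nginx/cache  levels=1:2  keys_zone=adams_ram:10m   max_size=45m;
    proxy_cache_path  /var/cache/nginx      levels=1:2  keys_zone=adams_disk:10m  max_size=2g;

    # Then, per virtual host or location:
    #   proxy_cache adams_ram;    # small, hot site that fits in the tmpfs
    #   proxy_cache adams_disk;   # large site that would overflow the tmpfs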