Wednesday, January 16, 2013

Linux Iptables ip_conntrack: table full, dropping packet error and solution

SkyHi @ Wednesday, January 16, 2013

Some readers may be interested to know what ip_conntrack is in the first place, and why it fills up. If you run an iptables firewall and have rules that act on the state of a packet, the kernel uses ip_conntrack to keep track of the state of each connection so that the firewall rule logic can be applied to it. If your system gets a lot of network activity (high connection rates, lots of concurrent connections, etc.), the table will accumulate entries.
The entries remain until the connection is closed (for example, by an RST packet sent from the original IP address) or the entry times out. If you have a flaky network somewhere between you and the clients accessing your server, packet loss can cause those closing packets to be dropped and leave orphaned entries in your ip_conntrack table. This can also happen if you have a malfunctioning switch or NIC; it is not necessarily a routing problem out on the internet somewhere.
I have typically seen this trouble crop up when a server is the target of a DDoS attack. Filling up the ip_conntrack table is a relatively easy way to knock a server offline, and attackers know this.
As Major suggested, you can get short term relief by increasing the size of the table. However, these entries are held in memory by the kernel. The bigger you make the table, the more memory it will consume. That memory could be used by your server to serve requests if you really don't need the stateful firewall capability. Don't waste resources on this feature if you really don't need it.
Another option to consider is turning OFF iptables rules that use ip_conntrack so the state table is not used at all. Anything with "-m state" or "-t nat" can be turned off. If you want to just flush all your iptables rules, you can use "iptables -P" to set a default ACCEPT policy on each chain (for example "iptables -P INPUT ACCEPT") and then "iptables -F" to flush all the rules. On an RHEL or CentOS system you can just do "service iptables stop".
Once iptables is no longer using ip_conntrack, you can reclaim the memory the table was using by unloading the related kernel modules.
rmmod ipt_MASQUERADE
rmmod iptable_nat
rmmod ipt_state
rmmod ip_conntrack
Then you will have an empty ip_conntrack that will stay empty. I mention this because a lot of sysadmins have hordes of iptables rules installed as a matter of course, and don't recognize the downside of having them present. You can still use iptables, but to avoid the use of ip_conntrack simply don't use rules based on stateful logic.
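To see which rules would have to go, one option is to filter the output of iptables-save. The snippet below is only a rough sketch: the function name conntrack_rules is made up for this example, and it looks only for the two cases mentioned above ("-m state" matches and anything in the nat table).

```shell
# Hypothetical helper (not part of iptables): reads iptables-save output on
# stdin and prints the rules that pull in connection tracking, i.e. rules in
# the nat table and rules that use "-m state".
conntrack_rules() {
    awk '
        /^\*/ { table = substr($0, 2) }   # "*nat", "*filter": start of a table
        /^-A/ && (table == "nat" || /-m state/) { print table ": " $0 }
    '
}
```

On a live system you would run it as root with "iptables-save | conntrack_rules". If it prints nothing, none of the loaded rules fall into those two cases.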

One other aspect to consider when raising your max conntrack setting is the depth of the memory objects used to track these connections, henceforth referred to as "buckets".
On Red Hat the default hashsize for the conntrack module is 8192. The rule of thumb is to allow for no more than 8 connections per bucket, so you would set your conntrack max to 8 * hashsize. This is why Red Hat defaults ip_conntrack_max to 65536.
You can tweak these settings by adjusting not just the ip_conntrack_max setting but the hashsize option to the ip_conntrack module.
So, for example, if you were to set your ip_conntrack_max to 131072 without modifying the default hashsize of 8k, you are allowing for a bucket depth of 16 entries. Thus the kernel potentially has to dig deeper to find that one connection object in its bucket.
There are a number of schools of thought on how best to address this but in practice I have found that, given the resources, a shallower bucket is better.
For a server that does extremely heavy network traffic, and of course has the memory to spare, you would want to keep the average bucket depth to 2 or 4.
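The bucket depth arithmetic is easy to check in the shell. This is just a sketch of the division described above; bucket_depth is an illustrative name, not an existing tool.

```shell
# Average bucket depth = ip_conntrack_max / hashsize (integer division).
# bucket_depth is a made-up helper name for this example.
bucket_depth() {
    echo $(( $1 / $2 ))
}

bucket_depth 65536 8192     # the defaults quoted above, prints 8
bucket_depth 131072 8192    # max doubled, hashsize unchanged, prints 16
```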
Hashsize, to my knowledge, isn't a dynamic setting so you will need to load the ip_conntrack module with the option:
hashsize =
So in Major's example above, if you want to double your server's capacity for tracked connections while not doubling the lookups you would reload the module with:
options ip_conntrack hashsize=16384
This keeps the items per bucket to 8. I have seen machines with a depth beyond 8 get completely cowed under heavy network load. Since memory is relatively plentiful nowadays, you can increase the efficiency of the lookups by making this 4 connections per bucket, or even 2, by doing the simple math and reloading the module with the right options.
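The "simple math" can be sketched as follows, assuming you have already picked a value for ip_conntrack_max (131072 here, as in the example). hashsize_for is a made-up helper name; the "options" line it prints has the same form as the one shown above.

```shell
# hashsize needed so that max_entries / hashsize equals the target depth.
# hashsize_for is a made-up helper name for this example.
hashsize_for() {
    max_entries=$1
    depth=$2
    echo $(( max_entries / depth ))
}

# Print the module option for 8, 4 and 2 connections per bucket.
for target in 8 4 2; do
    echo "options ip_conntrack hashsize=$(hashsize_for 131072 "$target")"
done
```

For 131072 tracked connections this gives hashsize values of 16384, 32768 and 65536 respectively.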
Hope that helps.
Here is some relatively dated yet still applicable information on the subject:

My Red Hat Enterprise Linux 5 server is reporting the following message in /var/log/messages (syslog):
ip_conntrack: table full, dropping packet.
How do I fix this error?
A. If you notice the above message in syslog, it looks like the conntrack database doesn't have enough entries for your environment. By default, connection tracking handles up to a certain number of simultaneous connections. This number depends on your system's maximum memory size.
You can easily increase the number of maximal tracked connections, but be aware that each tracked connection eats about 350 bytes of non-swappable kernel memory!
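To get a feel for what a given limit could cost, you can multiply it out. This is only a back-of-the-envelope sketch that takes the figure of about 350 bytes per entry at face value (the actual per-entry size may differ on your kernel); conntrack_mem_kib is a made-up helper name.

```shell
# Rough kernel memory estimate in KiB for a full table,
# assuming about 350 bytes per tracked connection.
conntrack_mem_kib() {
    echo $(( $1 * 350 / 1024 ))
}

conntrack_mem_kib 12000    # prints 4101  (about 4 MiB)
conntrack_mem_kib 65536    # prints 22400 (about 22 MiB)
```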
To print current limit type:
# sysctl net.ipv4.netfilter.ip_conntrack_max
To increase this limit to e.g. 12000, type:
# sysctl -w net.ipv4.netfilter.ip_conntrack_max=12000
Alternatively, add the following line to the /etc/sysctl.conf file:
net.ipv4.netfilter.ip_conntrack_max=12000
The following will tell you how many sessions are open right now:
# wc -l /proc/net/ip_conntrack
5000 /proc/net/ip_conntrack
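Comparing that count against the limit tells you how close the table is to filling up. A minimal sketch, using the 5000 entries and the limit of 12000 from the examples above; conntrack_usage_pct is a made-up helper name.

```shell
# Percentage of the conntrack table in use (integer, rounded down).
# Arguments: current entry count, configured maximum.
conntrack_usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

conntrack_usage_pct 5000 12000    # prints 41
```

On a live system the two arguments could come from "wc -l < /proc/net/ip_conntrack" and "sysctl -n net.ipv4.netfilter.ip_conntrack_max".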

Regarding open connections, the best way to keep track of them is by checking /proc/net/ip_conntrack. For example, to count the entries involving ports 80 and 443:
grep -Ec 'dport=(80|443)' /proc/net/ip_conntrack
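A breakdown by TCP state can also help show whether the table is full of live connections or of entries waiting to time out. The sketch below assumes the usual ip_conntrack line layout, where the first field is the protocol name and the fourth field is the TCP state; conntrack_states is a made-up helper name.

```shell
# Hypothetical helper: reads ip_conntrack-style lines on stdin and prints a
# count per TCP state, busiest first. Assumes field 1 is the protocol name
# and field 4 is the TCP state (e.g. ESTABLISHED, TIME_WAIT).
conntrack_states() {
    awk '$1 == "tcp" { n[$4]++ } END { for (s in n) print n[s], s }' | sort -rn
}
```

On a live system you would run "conntrack_states < /proc/net/ip_conntrack".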