Wednesday, 2 June 2010

M. T. Urself

So, I had a server that was misbehaving in a truly weird way. I was accessing this server via an ADSL line, connecting into it using Secure Shell. Any large amount of data pulled from it caused the network connection to drop. But I could push as much data to it as I liked. Huge uploads worked fine, but downloads would get me disconnected instantly. Logging in and typing 'ps ax' would normally only get me half the page of output before the connection froze up. Also, anything that posted information to websites, like logging into googlemail for instance, seemed to freeze as well.

I tried altering the firewall, the router, the routing table on the server, recompiling the kernel, etc, etc, all to no avail. After each new attempt the problem would seem to go away, cruelly making me think I'd fixed it, before returning again.

Now, often in the past when there have been network problems, some old Unix hacker with open-toed sandals and a Grateful Dead t-shirt has been around to say: "You know man, could it be, like, the MTU?"

The MTU. That thing they had on modems, remember? No, I don't suppose you would.



The proper reply to this statement is, of course: "Get real grandad! It's the effin' nineteen-nineties, innit? 'MTU', ha! You iz well outta touch, m8."

And then, sometime in the noughties, he just wasn't there any more. I miss that dude now.

PARTICULARLY AS, THIS TIME, IT WAS THE ****ING MTU! TWO DAYS OF BANGING MY HEAD AGAINST A BRICK FIREWALL WITH MY BOSS RIDING MY ASS SAYING THINGS LIKE "Haven't you fixed it yet? I thought you were good at this stuff?" AND IT TURNS OUT TO BE THE GODDAMN LEFT-OVER-FROM-THE-DARK-AGES-OF-ARPANET SMALL-PRINT-OF-NETWORKING MTU!!

So, being a civic-minded soul I'm posting the information here, so others don't have to suffer as I did.

The MTU is the "Maximum Transmission Unit". It's a value that tells the computer the maximum size of network packet that can be transmitted on a given network. There's a "Maximum Receive Unit" too, but this is less often messed with. The MRU is the maximum packet size that a network device will accept. Obviously, people transmitting to you have to be using an MTU no larger than your MRU, otherwise you'll reject their packets as 'too big'.

The reason you can't transmit packets (sometimes called 'frames') of any old size is that most networks can only actually carry one chunk of data at a time. They achieve the 'magic' of many people using the network at once by a simple method of breaking messages up into packets, and letting everyone take it in turns to send one packet. If the packets are small, and the network moves them fast, it seems like everyone is using it at the same time. But if one person could make a huge packet, I mean really humungous, the mother of all packets, then that single packet could clog up the network and everyone would have to wait for it to finish transmitting before they could get a slice of the action. The slower the network, the smaller the packets have to be to provide the illusion of multi-user networking.

Another issue with MTU size is noisy network connections. If you have networks where there is a lot of noise, and packets are likely to get corrupted, then the bigger the packet, the more likely it is to be 'hit' by corruption. This will mean that the packet has to be re-sent, but because it's a big packet, it's quite likely to get hit by corruption again!
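
To put rough numbers on the noise argument, here's a back-of-the-envelope sketch. It assumes every bit on the line has the same, independent chance of being flipped (a "bit error rate"), which real lines don't strictly obey, and the function name survival_pct is just something made up for this example.

```shell
# Percentage chance that a packet of $1 bytes arrives with no flipped bits,
# given a bit error rate of $2. Assumes bit errors are independent, which is
# a simplification: real line noise tends to come in bursts.
survival_pct() {
    awk -v bytes="$1" -v ber="$2" \
        'BEGIN { printf "%.1f\n", 100 * (1 - ber) ^ (bytes * 8) }'
}

# Compare a full-size Ethernet packet with a small modem-era one
# on a line that corrupts one bit in 100,000:
#   survival_pct 1500 0.00001
#   survival_pct 576  0.00001
```

Under those assumptions the 1500 byte packet gets through clean a little under 89 percent of the time, and the 576 byte one about 95.5 percent of the time. The gap widens quickly as the line gets noisier, which is why the modem crowd cared so much.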

Modems and serial links were generally noisy links, and they were slow too. As a result they generally had small MTU sizes, and much time was spent tweaking the MTU size to get the best results on a particular line. However, ADSL/Cable and Ethernet are much faster and more reliable, and generally all seem to agree on a default MTU of 1500 bytes.

There's also some magic in TCP/IP called 'Path MTU discovery', which works like this: if there's a network device between you and the machine you are exchanging data with that cannot handle your chosen MTU/packet size, it will drop your packet and send a message back to you saying 'packet too big'. This message is sent using the ICMP protocol (the same protocol that powers 'ping'). Hence, most modern systems should automagically shrink the packets they send to that destination to whatever size fits.

However, if you block ICMP with your firewall, because you're thinking "I don't need ping", then this magic won't be able to happen. If you then also have an ADSL router that is using an MRU smaller than your MTU, it will drop any large packets you send it, and your system won't know, because the router's ICMP-based complaints won't get through your firewall. Hence, you'll be able to connect to your computer from outside, or connect to websites and pull data from them, but the first time you pull more than a little data from your machine, or push more than a little data to the net, everything will lock up.
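
If you suspect you're in this hole, one way to check is to send pings with the "don't fragment" flag set and see what size stops getting through. What follows is only a sketch: it assumes the Linux iputils version of ping (where '-M do' sets "don't fragment"), it assumes ping replies can actually reach you, and the function names and the host in the example are made up for illustration.

```shell
# Largest ping payload that fits in a single packet of the given MTU:
# the IPv4 header takes 20 bytes and the ICMP header another 8.
payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send one ping of exactly MTU size with "don't fragment" set.
# Returns success only if a reply comes back within 2 seconds.
# Assumes Linux iputils ping; other pings spell these options differently.
probe_mtu() {
    host="$1"
    mtu="$2"
    ping -c 1 -W 2 -M do -s "$(payload_for_mtu "$mtu")" "$host" >/dev/null 2>&1
}

# Example, run from the misbehaving machine (example.com is a placeholder):
#   for mtu in 1500 1496 1492 1480; do
#       probe_mtu example.com "$mtu" && { echo "MTU $mtu gets through"; break; }
#   done
```

If small sizes get through and big ones silently vanish, with no 'packet too big' complaint ever arriving, that's a strong hint the MTU is your problem too.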

The Cure

There are two things to do:

1) Find out what MTU/MRU the device you are talking through (in my case ADSL router) is using, and use /sbin/ifconfig to set the network card to use it.

/sbin/ifconfig eth0 mtu 1496

2) Make sure you let 'icmp' through your firewall so your system can adjust the MTU automagically.

/sbin/iptables -I INPUT -i eth0 -p icmp -j ACCEPT
/sbin/iptables -I OUTPUT -o eth0 -p icmp -j ACCEPT

Of course, doing item '2' allows people to 'ping' your machine. There are ways to prevent that with iptables, but these are left as an exercise for the reader.
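
For the impatient, here's one possible answer to that exercise, offered as a sketch rather than gospel. It lets in only the 'fragmentation needed' message that Path MTU discovery depends on (ICMP type 3, code 4) and drops incoming ping requests. The function name icmp_rules is made up, and it only prints the commands so you can read them before running anything as root.

```shell
# Print (don't run) iptables rules for the given interface.
# Both rules use -I, so they land at the top of the INPUT chain and are
# matched before any broader ICMP rule that is already in place.
icmp_rules() {
    iface="$1"
    # Allow "fragmentation needed" so Path MTU discovery keeps working.
    echo "/sbin/iptables -I INPUT -i $iface -p icmp --icmp-type fragmentation-needed -j ACCEPT"
    # Drop incoming ping requests.
    echo "/sbin/iptables -I INPUT -i $iface -p icmp --icmp-type echo-request -j DROP"
}

# Review the output first, then feed it to a root shell if you're happy:
#   icmp_rules eth0
#   icmp_rules eth0 | sh
```

Bear in mind that other ICMP messages are useful too, so blocking everything except this one type may cause different headaches. Check the rules against your own setup before trusting them.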
