So I’ve just discovered an interesting issue with an Ubuntu 12.04 server crashing under high IO load.
It appears that the default IO scheduler (CFQ) can cause a complete system lockup when it’s getting flogged.
Combined with OCFS2 it gets worse: the stalled IO can trigger OCFS2’s fencing, which reboots the node. This appears to be the solution: http://techtitbits.com/2010/04/get-rid-of-freeze-ups-during-disk-io-activity-in-ubuntu/
I’ve just switched two of my boxes to the deadline scheduler instead and will report back on how it works.
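For reference, here is a rough sketch of checking and switching the scheduler on a running box. It is not necessarily what the linked article does. It assumes the standard sysfs interface `/sys/block/<device>/queue/scheduler`, where the kernel lists the available schedulers and puts the active one in square brackets (e.g. `noop deadline [cfq]`). Writing a scheduler name to that file changes it straight away. You need root, and the change is lost on reboot. The helper names and the device name `sda` are just for illustration.

```python
import os

SYSFS_BLOCK = "/sys/block"


def parse_scheduler(text):
    """Parse the contents of /sys/block/<dev>/queue/scheduler.

    The kernel separates the available schedulers with spaces and wraps the
    active one in square brackets, e.g. "noop deadline [cfq]".
    Returns (active, available).
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets marking the active one
            active = token
        available.append(token)
    return active, available


def scheduler_path(device):
    # e.g. /sys/block/sda/queue/scheduler
    return os.path.join(SYSFS_BLOCK, device, "queue", "scheduler")


def get_scheduler(device):
    with open(scheduler_path(device)) as f:
        return parse_scheduler(f.read())


def set_scheduler(device, name):
    """Switch a device to the named scheduler at runtime (root only, not persistent)."""
    _, available = get_scheduler(device)
    if name not in available:
        raise ValueError(f"{name!r} not available for {device}: {available}")
    with open(scheduler_path(device), "w") as f:
        f.write(name)
```

As root, `set_scheduler("sda", "deadline")` switches that disk over, and `get_scheduler("sda")` should then report `deadline` as active. To make it stick across reboots on Ubuntu 12.04, one option is to add `elevator=deadline` to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub` and run `update-grub`.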