We run ESX 4.0 on Dell R710s connected back to an IBM N-Series SAN over FC.
This week we had a major outage caused by a Red Hat MySQL server falling over.
First off, the server would not boot past the virtual POST, so, knowing it had an RDM disk, I removed it from the configuration. The server would then boot to the OS, but every time the RDM was reconnected it would either fall over or fail to boot.
I remembered we had fairly recently expanded the space on this server and the RDM was now 300GB in size. However, only 85% of the LUN was in use (around 255GB). Then I remembered the block size limitations. The server was sitting on a VMFS partition with a 1MB block size (256GB maximum file size). So I removed the RDM, svMotion'd the server to a datastore with a 2MB block size, reconnected the RDM, and the server booted fine. After a DB consistency check we were back up and running.
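For anyone who wants to sanity-check the numbers themselves, here is a trivial Python sketch of the commonly quoted VMFS-3 block size to maximum file size mapping (the 300GB and 1MB/2MB figures are just my case; adjust for your own):

```python
# Commonly quoted VMFS-3 (ESX 4.0 era) block size -> max file size, in GB.
VMFS3_MAX_FILE_GB = {1: 256, 2: 512, 4: 1024, 8: 2048}

def fits(disk_size_gb, block_size_mb):
    """True if a file of disk_size_gb fits on a VMFS-3 datastore
    formatted with block_size_mb blocks."""
    return disk_size_gb <= VMFS3_MAX_FILE_GB[block_size_mb]

print(fits(300, 1))  # False - the 300GB RDM pointer is over the 256GB cap
print(fits(300, 2))  # True  - fine after the svMotion to the 2MB datastore
```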
Following this fault investigation I went on to read the vSphere Configuration Maximums document
http://www.vmware.com/pdf/vsphere4/r40/vsp_40_config_max.pdf
which, to be honest, has only vague information on RDMs,
and then I found a VMware KB article
which, surprisingly to me, clearly states that RDM pointer files are affected by VMFS block sizes.
However, there is a difference of opinion in this thread.
And looking further at my own environment, we have multiple running production servers with connected RDMs whose sizes exceed the stated block size limits.
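In the meantime I have started auditing the environment with a rough pyVmomi sketch along these lines (the vCenter hostname and credentials are placeholders, and it simply compares each mapped LUN's size against the maxFileSize reported for the datastore holding the pointer file, so treat it as a starting point rather than a definitive check):

```python
# Rough audit: list RDMs whose mapped LUN size exceeds the maximum file
# size of the VMFS datastore that holds the RDM pointer file.
# The vCenter hostname and credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.local", user="admin",
                  pwd="secret", sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)

for vm in view.view:
    if not vm.config:
        continue
    for dev in vm.config.hardware.device:
        if not isinstance(dev, vim.vm.device.VirtualDisk):
            continue
        backing = dev.backing
        if not isinstance(backing,
                          vim.vm.device.VirtualDisk.RawDiskMappingVer1BackingInfo):
            continue
        ds = backing.datastore                     # datastore holding the pointer file
        size_gb = dev.capacityInKB / (1024.0 * 1024.0)
        max_gb = ds.info.maxFileSize / (1024.0 ** 3)
        if size_gb > max_gb:
            print("%s: RDM %s is %.0fGB but %s allows %.0fGB per file"
                  % (vm.name, backing.fileName, size_gb, ds.name, max_gb))

Disconnect(si)
```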
I need definitive clarity on this, because as it stands we could have many production machines in an unstable state, or else I am simply left with no explanation for the disk lock we witnessed and an outage that caused a substantial loss to the business.
regards
wolfsonmicro