Hey, I have a very frustrating problem which I wondered if anyone could help with.
We store our VMs on an EMC2 Isilon cluster, accessed over NFS. We see an issue where the storage network will frequently 'hang' for between 4 and 5 seconds. It's not hardware related, as I can reproduce the issue on various different servers and workstations.
So far I've boiled it down to a very simple setup to recreate the issue:
1) Install ESXi 4.1U1 on a server or workstation connected to the network with a single 1 Gbit link.
2) Set up a VMkernel port for storage and management traffic (either untagged or on a VLAN).
3) Set up a datastore on the Isilon cluster, mounted over NFS.
4) Create 2 CentOS 5.5 Linux VMs (they don't need network). Boot them into runlevel 1 (i.e. no network, minimum services).
5) On one VM, run ioping to measure latency to its virtual disk. E.g.:
ioping -c 1000 /tmp
On the other VM, write some data to its own virtual disk. E.g.:
dd if=/dev/zero of=/tmp/test bs=1024 count=40000
Most often when you run the command on the 2nd machine, a few seconds later both VMs (or all VMs on the host) will hang for 4-5 seconds. When it returns from hanging, ioping always reports a ping time between 4000 and 5000 ms. Both machines are frozen during this period, but the network is OK: I can still ping the ESXi host over the same link.
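If ioping isn't available on a test VM, a rough equivalent can be sketched in bash. This is only an illustration, not the ioping implementation: the function name probe_stalls and its arguments are made up here, and it assumes GNU date (for %N nanoseconds) and a dd that supports conv=fsync. It times one small synced write per probe and prints a timestamped line whenever a write takes at least the given number of milliseconds.

```shell
#!/bin/bash
# Hypothetical helper (not part of ioping): time small synced writes
# and report any that take at least THRESHOLD_MS milliseconds.
# Usage: probe_stalls DIR COUNT THRESHOLD_MS [INTERVAL_SECONDS]
probe_stalls() {
    local dir=$1 count=$2 threshold_ms=$3 interval=${4:-1}
    local probe="$dir/.stall_probe.$$"
    local stalls=0 i start end elapsed_ms

    for ((i = 1; i <= count; i++)); do
        start=$(date +%s%N)    # nanoseconds since epoch (GNU date assumed)
        # Write one 4 KiB block and fsync it so the write has to reach the disk.
        dd if=/dev/zero of="$probe" bs=4096 count=1 conv=fsync 2>/dev/null
        end=$(date +%s%N)
        elapsed_ms=$(( (end - start) / 1000000 ))

        if (( elapsed_ms >= threshold_ms )); then
            stalls=$((stalls + 1))
            echo "$(date '+%H:%M:%S') probe $i took ${elapsed_ms} ms"
        fi
        sleep "$interval"
    done

    rm -f "$probe"
    # Summary line: how many probes ran and how many crossed the threshold.
    echo "probes=$count stalls=$stalls"
}
```

For example, running `probe_stalls /tmp 1000 1000` on the first VM while the dd runs on the second should print one line per stall of a second or more, with the wall-clock time it was seen, which makes it easier to line the hangs up against logs on the ESXi host or the Isilon side.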
Any ideas?
Nick