Channel: VMware Communities : All Content - All Communities

ESXi 4.1 U1 host becomes unresponsive


I'm having this problem with an ESXi 4.1 host on an almost weekly basis: the host suddenly becomes unresponsive and the guests appear to be completely dead. I can still navigate through the vSphere Client but cannot perform any tasks, and the VM console screens are blank. The VMs appear to be running (according to their status in the vSphere Client), but I cannot ping them or connect to them in any way. Rebooting from the DCUI does nothing, so I end up having to power cycle the server to get it working again.

 

I have looked in /scratch/log/messages on the host and do not see anything obvious. Here are the last few minutes of the log before the most recent hang:

 

Oct 24 14:22:48 Hostd: [2011-10-24 14:22:48.055 343F0B90 error 'App'] Failed to read header on stream TCP(local=127.0.0.1:51337, peer=127.0.0.1:0): N7Vmacore15SystemExceptionE(Connection reset by p
Oct 24 14:22:48 Hostd: [2011-10-24 14:22:48.068 33F2EB90 verbose 'Proxysvc Req01002'] New proxy client SSL(TCP(local=193.120.91.121:60914, peer=193.120.91.2:443))                                  
Oct 24 14:22:58 nssquery: Group lookup failed for 'S3\ESX Admins'                                                                                                                                   
Oct 24 14:22:58 Hostd: [2011-10-24 14:22:58.866 33F2EB90 warning 'UserDirectory'] Group lookup failed for 'S3\ESX Admins'                                                                           
Oct 24 14:23:27 Hostd: [2011-10-24 14:23:27.863 33F2EB90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root                                                                           
Oct 24 14:23:58 nssquery: Group lookup failed for 'S3\ESX Admins'                                                                                                                                   
Oct 24 14:23:58 Hostd: [2011-10-24 14:23:58.923 33F6FB90 warning 'UserDirectory'] Group lookup failed for 'S3\ESX Admins'                                                                           
Oct 24 14:24:58 nssquery: Group lookup failed for 'S3\ESX Admins'                                                                                                                                   
Oct 24 14:24:58 Hostd: [2011-10-24 14:24:58.983 342DBB90 warning 'UserDirectory'] Group lookup failed for 'S3\ESX Admins'                                                                           
Oct 24 14:24:59 Hostd: [2011-10-24 14:24:59.304 FFEC5E80 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root                                                                           
Oct 24 14:25:59 nssquery: Group lookup failed for 'S3\ESX Admins'                                                                                                                                   
Oct 24 14:25:59 Hostd: [2011-10-24 14:25:59.038 33F2EB90 warning 'UserDirectory'] Group lookup failed for 'S3\ESX Admins'                                                                           
Oct 24 14:26:29 Hostd: [2011-10-24 14:26:29.926 33EEDB90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root                                                                           
Oct 24 14:26:59 nssquery: Group lookup failed for 'S3\ESX Admins'                                                                                                                                   
Oct 24 14:26:59 Hostd: [2011-10-24 14:26:59.092 343F0B90 warning 'UserDirectory'] Group lookup failed for 'S3\ESX Admins'                                                                           
Oct 24 14:27:13 Hostd: [2011-10-24 14:27:13.010 33F2EB90 verbose 'Proxysvc Req01003'] New proxy client TCP(local=127.0.0.1:57757, peer=127.0.0.1:80)                                                
Oct 24 14:27:13 Hostd: [2011-10-24 14:27:13.011 344B1B90 info 'Vmomi'] Activation [N5Vmomi10ActivationE:0x34708c28] : Invoke done [waitForUpdates] on [vmodl.query.PropertyCollector:ha-property-coll
Oct 24 14:27:13 Hostd: [2011-10-24 14:27:13.011 344B1B90 verbose 'Vmomi'] Arg version:                                                                                                              
Oct 24 14:27:13 Hostd: "50"                                                                                                                                                                         
Oct 24 14:27:13 Hostd: [2011-10-24 14:27:13.012 344B1B90 info 'Vmomi'] Throw vmodl.fault.RequestCanceled                                                                                            
Oct 24 14:27:13 Hostd: [2011-10-24 14:27:13.012 344B1B90 info 'Vmomi'] Result:                                                                                                                      
Oct 24 14:27:13 Hostd: (vmodl.fault.RequestCanceled) {                                                                                                                                              
Oct 24 14:27:13 Hostd:    dynamicType = <unset>,                                                                                                                                                    
Oct 24 14:27:13 Hostd:    faultCause = (vmodl.MethodFault) null,                                                                                                                                    
Oct 24 14:27:13 Hostd:    msg = "",                                                                                                                                                                 
Oct 24 14:27:13 Hostd: }                                                                                                                                                                            
Oct 24 14:27:13 Hostd: [2011-10-24 14:27:13.012 342DBB90 error 'App'] Failed to read header on stream TCP(local=127.0.0.1:62851, peer=127.0.0.1:0): N7Vmacore15SystemExceptionE(Connection reset by p
Oct 24 14:27:13 sfcb-vmware_base[5907]: LsaFindUserByName: 40008                                                                                                                                    
Oct 24 14:27:13 sfcb-vmware_base[5907]: LsaFindUserByName: 40008                                                                                                                                    
Oct 24 14:27:13 sfcb-vmware_base[5907]: LsaFindUserByName: 40008                                                                                                                                    
Oct 24 14:27:13 sfcb-vmware_base[5907]: LsaFindUserByName: 40008                                                                                                                                    
Oct 24 14:27:13 sfcb-vmware_base[5907]: LsaFindUserByName: 40008                                                                                                                                    
Oct 24 14:27:13 sfcb-vmware_base[5907]: LsaFindUserByName: 40008                                                                                                                                    
Oct 24 14:27:13 sfcb-vmware_base[5907]: LsaFindUserByName: 40008                                                                                                                                    
Oct 24 14:27:13 sfcb-vmware_base[5907]: LsaFindUserByName: 40008                                                                                                                                    
Oct 24 14:27:13 sfcb-vmware_base[5907]: LsaFindUserByName: 40008                                                                                                                                    
Oct 24 14:27:13 sfcb-vmware_base[5907]: LsaFindUserByName: 40008                                                                                                                                    
Oct 24 14:27:46 Hostd: [2011-10-24 14:27:46.929 342DBB90 verbose 'DvsManager'] PersistAllDvsInfo called                                                                                             
Oct 24 14:27:59 nssquery: Group lookup failed for 'S3\ESX Admins'                                                                                                                                   
Oct 24 14:27:59 Hostd: [2011-10-24 14:27:59.148 3436DB90 warning 'UserDirectory'] Group lookup failed for 'S3\ESX Admins'                                                                           
Oct 24 14:28:00 Hostd: [2011-10-24 14:28:00.549 33EEDB90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root                                                                           
Oct 24 14:28:59 nssquery: Group lookup failed for 'S3\ESX Admins'                                                                                                                                   
Oct 24 14:28:59 Hostd: [2011-10-24 14:28:59.203 33F2EB90 warning 'UserDirectory'] Group lookup failed for 'S3\ESX Admins'                                                                           
Oct 24 14:29:31 Hostd: [2011-10-24 14:29:31.171 33EEDB90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root                                                                           
Oct 24 14:29:59 nssquery: Group lookup failed for 'S3\ESX Admins'                                                                                                                                   
Oct 24 14:29:59 Hostd: [2011-10-24 14:29:59.260 33F2EB90 warning 'UserDirectory'] Group lookup failed for 'S3\ESX Admins'                                                                           
Oct 24 14:30:59 nssquery: Group lookup failed for 'S3\ESX Admins'                                                                                                                                   
Oct 24 14:30:59 Hostd: [2011-10-24 14:30:59.316 342DBB90 warning 'UserDirectory'] Group lookup failed for 'S3\ESX Admins'                                                                           
Oct 24 14:31:02 Hostd: [2011-10-24 14:31:02.675 343F0B90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root                                                                           
Oct 24 14:31:59 nssquery: Group lookup failed for 'S3\ESX Admins'                                                                                                                                   
Oct 24 14:31:59 Hostd: [2011-10-24 14:31:59.374 34431B90 warning 'UserDirectory'] Group lookup failed for 'S3\ESX Admins'

 

After that there is nothing in the log until I rebooted the machine at 14:43:35. I don't really understand most of the above log, but none of it looks critical to me. I can't see any obvious errors on the VMs either, and they're not under any high load.
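In case it helps anyone looking at a similar log, here is a rough Python sketch for tallying entries by source and severity and finding the last timestamp written before the hang. The `summarize` function and the regular expressions are my own illustration, not a VMware tool. They assume the syslog-style prefix shown in the excerpt above and would need adjusting for other log formats.

```python
import re
from collections import Counter

# Syslog-style prefix as seen in the excerpt: "Oct 24 14:22:58 Hostd: ..."
# The optional "[5907]" covers sources that log a PID, such as sfcb-vmware_base.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<source>[^:\[]+)(?:\[\d+\])?:\s*(?P<rest>.*)$"
)

# hostd detail block: "[2011-10-24 14:22:58.866 33F2EB90 warning 'UserDirectory']"
LEVEL_RE = re.compile(r"^\[\S+ \S+ \S+ (?P<level>\w+) '(?P<component>[^']+)'\]")


def summarize(lines):
    """Count entries per (source, level) and return the last timestamp seen."""
    counts = Counter()
    last_stamp = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognised lines
        last_stamp = match.group("stamp")
        source = match.group("source").strip()
        detail = LEVEL_RE.match(match.group("rest"))
        level = detail.group("level") if detail else "-"
        counts[(source, level)] += 1
    return counts, last_stamp


if __name__ == "__main__":
    # A few lines copied from the excerpt above; in practice, read the log file.
    sample = [
        r"Oct 24 14:31:02 Hostd: [2011-10-24 14:31:02.675 343F0B90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root",
        r"Oct 24 14:27:13 sfcb-vmware_base[5907]: LsaFindUserByName: 40008",
        r"Oct 24 14:31:59 Hostd: [2011-10-24 14:31:59.374 34431B90 warning 'UserDirectory'] Group lookup failed for 'S3\ESX Admins'",
    ]
    counts, last_stamp = summarize(sample)
    for (source, level), n in sorted(counts.items()):
        print(source, level, n)
    print("last entry:", last_stamp)
```

To run it against the real file, pass an open file handle instead of the sample list, for example `summarize(open("/scratch/log/messages"))` on a copy of the log.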

 

Host hardware:

  • Dell PowerEdge R210
  • Xeon X3440 (quad core + HT)
  • 8 GB RAM
  • Dell SAS 6/iR RAID controller
  • 2x 250 GB SATA disks in RAID 1 array
  • Broadcom BCM5716 onboard NIC (NIC teaming set up in ESXi)
  • BIOS 1.8.2, iDRAC 6 Express firmware 1.80, Lifecycle Controller firmware 1.4.0.445, RAID controller firmware up-to-date

 

VMs:

  • RHEL 5.7 desktop (64-bit)
  • CentOS 5.7 (64-bit)
  • Windows XP Professional SP3 (32-bit) - this is only used on occasions and was not running the last time the host failed

 

I have an iSCSI target set up on this host, but the host was failing before that was configured. There was also a Windows 2008 R2 domain controller on it, which I moved to another host because this one was unreliable. I have installed patches on the host, so it is currently running 4.1.0 build 433742, and the guests are also reasonably up to date. However, this problem has been happening for a few months, since before I upgraded to U1.

 

On one occasion after the system failed, I noticed upon restarting that ESXi was reporting (in Configuration -> Health Status) that one of the disks in the array was rebuilding. I have not seen this happen again, and the rebuild was successful.

 

I ran Dell Diagnostics and MemTest (one pass) on the machine and everything seemed OK. There are no errors in the iDRAC event logs.

 

Any ideas what could be wrong?

