VM3463 – Monitoring Hardware Health With vCenter 4

This VMworld 2009 session took place Thursday at 9:30 am in room 134.

Points made by the presenter worth remembering:

  • Physical failure is unavoidable, and an ounce of prevention is worth a pound of cure
  • There is a 50% chance that pieces of an ESX Cluster will fail and take down critical services and servers.
  • You’re not usually staring at a monitoring screen, and you want to be notified as the hardware degrades, not afterwards.
  • You want as much hardware info as possible about a host, across multiple vendor platforms, on a single screen
  • Physical failure is a fact of virtual life
  • Be proactive about hardware failure and use DRS + hardware monitoring + Alarms

An interesting demo in this session showed the built-in vCenter 4 host hardware temperature status alarm generating SNMP traps and automatically putting a host into maintenance mode so an administrator can investigate. This action triggered a VMotion evacuation of the VMs on the affected host and effectively isolated the hardware issue with minimal or zero impact on the environment.

My key takeaway from this session is that numerous “out of the box” vCenter event-based alarms can be leveraged during the warning phase of hardware failures. This includes alerts covering power, fans, CPUs, memory, batteries, etc. ESX host hardware monitoring is detected and available automatically in vSphere 4.

My notes: 

New Hardware Status Hardware Monitoring service in vCenter 4.0

  • Trigger alarms on hardware health changes
  • Expose more sensor information at the VC level
    • Storage profile
    • Firmware info
  • Support industry standards
    • CIM SMASH
    • IBM, HP, Dell, LSI, etc. out of the box
  • Separate from the VIM release cycle
    • Tomcat webapp – no load on vpxd
    • Not dependent on VIM API changes
  • HSM – hardware status monitoring service
  • CIM server: SFCBD / Pegasus
  • CIM provider

Alerts on hardware health changes

  • Service divides into various hardware groups
  • Detect changes to health state of individual sensors
  • Service maintains group health
  • Posts event to VC on HW group health change
  • Triggered alarms execute remediate/alert actions
    • SNMP traps
    • Enter/exit maintenance mode
    • Enter/exit standby mode
    • Power off/reboot host
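
The roll-up described in this list can be sketched in a few lines of Python. This is only an illustration of the idea, not the vCenter implementation: the names Health, HardwareGroup, and update_sensor are hypothetical, and the "worst sensor wins" rule is an assumption about how group health is derived from individual sensors.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Health(IntEnum):
    """Hypothetical health states, ordered from best to worst."""
    GREEN = 0
    YELLOW = 1
    RED = 2


@dataclass
class HardwareGroup:
    """One hardware group (fan, temp, power, ...) and its sensors."""
    name: str
    sensors: dict = field(default_factory=dict)  # sensor name -> Health
    health: Health = Health.GREEN

    def update_sensor(self, sensor, state):
        """Record a sensor state; return an event only if group health changed."""
        self.sensors[sensor] = state
        rolled_up = max(self.sensors.values())  # assumption: worst sensor wins
        if rolled_up == self.health:
            return None  # no group-level change, so nothing is posted
        event = {"group": self.name, "old": self.health.name, "new": rolled_up.name}
        self.health = rolled_up
        return event


fans = HardwareGroup("fan")
events = [
    fans.update_sensor("fan1", Health.GREEN),
    fans.update_sensor("fan2", Health.YELLOW),  # group goes green -> yellow
    fans.update_sensor("fan1", Health.YELLOW),  # group already yellow
    fans.update_sensor("fan2", Health.GREEN),   # fan1 still yellow
    fans.update_sensor("fan1", Health.GREEN),   # group goes yellow -> green
]
```

Only the second and fifth updates produce an event, which mirrors the point that an event is posted on a group health change rather than on every sensor reading.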

Demo video: hardware health monitoring system and alerts

  • 2 ESX hosts managed by vCenter 4
  • Temperature of a host triggers an alarm so you can take action to prevent downtime
  • VC is configured to send SNMP traps
  • Receiving traps on trap receiver
  • Host hardware temperature status alarm is shipped as a default in vCenter
    • Action: send notification trap
    • Action: enter maintenance mode when the alarm goes from green to yellow state
    • Action: exit maintenance mode when the alarm returns to green normal state. This may not be desirable as an automated action; you may want to perform additional steps and exit MM manually
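
The alarm actions from the demo boil down to a lookup keyed on the state transition. The Python below is a rough sketch of that idea: ALARM_ACTIONS, actions_for, and the action names are made up for illustration and are not vCenter API calls. It assumes a trap is sent on both transitions, which the session notes do not spell out. The auto_exit flag models the caveat about leaving maintenance mode manually.

```python
# Hypothetical action table: (old state, new state) -> ordered action names.
ALARM_ACTIONS = {
    ("green", "yellow"): ["send_snmp_trap", "enter_maintenance_mode"],
    ("yellow", "green"): ["send_snmp_trap", "exit_maintenance_mode"],
}


def actions_for(old, new, auto_exit=True):
    """Return the actions to run for a state transition.

    With auto_exit=False the host stays in maintenance mode when the alarm
    clears, so an administrator can investigate and exit manually.
    """
    actions = list(ALARM_ACTIONS.get((old, new), []))
    if not auto_exit and "exit_maintenance_mode" in actions:
        actions.remove("exit_maintenance_mode")
    return actions


on_warning = actions_for("green", "yellow")
on_clear_manual = actions_for("yellow", "green", auto_exit=False)
```

Here on_warning holds both the trap and the maintenance mode action, while on_clear_manual holds only the trap.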

New hardware status tab in vCenter

  • You get the old view if you connect the vSphere client directly to a host
  • New tab / view from vCenter
    • Sensors update every 5 mins
    • You can reset sensors
    • Print and export page in XML format
    • All alerts and warnings, and details about them, are stored on this page
    • System event log with drop-down filter for all, informational, or just alerts and warnings

More details exposed

  • 16 groups – BIOS, CPU, RAM, fan, voltage, temp, power, sys board/chassis, network, & more
  • 10 hardware alarms
  • 90 sensor types/classes
  • 160 sensor property types/classes

CIM Model (Common Information Model)

  • DB-like modeling language
  • Supports query modification language
  • XML-based transport called CIM-XML (over HTTPS)
  • Industry standard supported by many hardware vendors – Dell, HP, IBM, LSI & more
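
To give a feel for what travels over that XML transport, here is a small Python sketch that builds the body of a CIM-XML EnumerateInstances request with the standard library. It was not shown in the session. The helper name build_enumerate_instances, the root/cimv2 namespace, the message ID, and the choice of CIM_NumericSensor as the class to query are all assumptions for illustration; the sketch only constructs the XML and does not contact a host.

```python
import xml.etree.ElementTree as ET


def build_enumerate_instances(class_name, namespace="root/cimv2", message_id="1001"):
    """Build a CIM-XML request body asking for all instances of a class."""
    cim = ET.Element("CIM", CIMVERSION="2.0", DTDVERSION="2.0")
    message = ET.SubElement(cim, "MESSAGE", ID=message_id, PROTOCOLVERSION="1.0")
    simple_req = ET.SubElement(message, "SIMPLEREQ")
    call = ET.SubElement(simple_req, "IMETHODCALL", NAME="EnumerateInstances")

    # The namespace is sent as one NAMESPACE element per path component.
    ns_path = ET.SubElement(call, "LOCALNAMESPACEPATH")
    for part in namespace.split("/"):
        ET.SubElement(ns_path, "NAMESPACE", NAME=part)

    # The class to enumerate is passed as the ClassName parameter.
    param = ET.SubElement(call, "IPARAMVALUE", NAME="ClassName")
    ET.SubElement(param, "CLASSNAME", NAME=class_name)
    return ET.tostring(cim, encoding="unicode")


body = build_enumerate_instances("CIM_NumericSensor")
```

In practice a WBEM client library would build and send this request over HTTPS for you; the point here is only that each request is an ordinary XML document.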

CIM Monitor

  • Monitors every host in inventory
  • Calculates / rolls up any changes in status states
  • Client interaction is secure. A user needs the System.View privilege to see the hardware status

Comments

  • Tom

    This looks great!! Did the presenter say what must be installed on an HP server to make this workable?? Is this included with vCenter 4?? Or is it something expensive?? If it's included and not excessively difficult to set up and work with, this will be really helpful to many of us. Thank you, Tom

  • http://vmetc.com rbrambley

    Tom,

    Nothing to install. The hardware health monitoring works out of the
    box with Dell, IBM, HP, and others. vCenter 4 has numerous default
    alerts enabled. It's up to you to add actions like the SNMP traps,
    enter maintenance mode, or just plain SMTP notification.

  • Tom

    Awesome!! Looking forward to this a lot!! Thank you, Tom
