Monday, November 11, 2013

PSOD (Purple Screen of Death)

Example of a PSOD kernel stack trace screen
A PSOD (Purple Screen of Death) is the VMware ESX equivalent of a Windows BSOD (Blue Screen of Death). It occurs when the kernel panics and can no longer function. The most common causes of a PSOD are:
  • Hardware failure
  • Out of memory
  • Hung CPU conditions
  • Misbehaving drivers (null pointers, invalid memory access, etc.)
  • NMIs (Non-Maskable Interrupts)
When a PSOD occurs, one should collect the following:
  1. Screenshot of PSOD kernel stack trace screen (if possible)
  2. Support logs from the vm-support command
  3. Kernel log (should be included in vm-support, but better safe than sorry)
  4. Kernel core dump (only needed if a developer asks for it)
If the cause of the PSOD isn't obvious from the PSOD kernel stack trace screen, then the kernel log is the second best place to look for the cause of a kernel panic. To manually collect the kernel log:
## Kernel log - will output: vmkernel-log.1
# esxcfg-dumppart -L /vmfs/devices/disks/$( \
  esxcfg-dumppart --get-active | awk '{print $1}' )
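
One weakness of the one-liner above: if esxcfg-dumppart --get-active prints nothing, the path collapses to /vmfs/devices/disks/ with no partition name. Here is a minimal sketch of a guard, assuming (as the command above does) that the first whitespace-separated field of the output is the partition name. dump_partition_path is a made-up helper name, not part of ESX.

```shell
#!/bin/sh
# dump_partition_path: hypothetical helper, not part of ESX/ESXi.
# Reads `esxcfg-dumppart --get-active` style output on stdin and prints
# the full device path. Fails if no partition name is present.
# Assumption: the partition name is the first field of the first
# non-empty line, mirroring the awk '{print $1}' step above.
dump_partition_path() {
    name=$(awk 'NF { print $1; exit }')
    if [ -z "$name" ]; then
        echo "no active dump partition found" >&2
        return 1
    fi
    echo "/vmfs/devices/disks/$name"
}

# On the host this could be used as:
#   part=$(esxcfg-dumppart --get-active | dump_partition_path) \
#     && esxcfg-dumppart -L "$part"
```

With this, the log extraction only runs when a partition name was actually found.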

To manually collect the kernel core dump (only if a developer asks for it):
## Kernel core dump - will output: vmkernel-zdump.1
## Note: ESXi 5.x will put kernel dump here:
##     /scratch/core/vmkernel-zdump.*
# esxcfg-dumppart -C -D /vmfs/devices/disks/$( \
  esxcfg-dumppart --get-active | awk '{print $1}' )
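
Since the dump can land either in the current directory or, on ESXi 5.x, under /scratch/core, a small helper can check both places. This is a rough sketch only; list_zdumps is a hypothetical name used for illustration and is not a VMware tool.

```shell
#!/bin/sh
# list_zdumps: hypothetical helper, not part of ESX/ESXi.
# Prints every vmkernel-zdump.* file found in the given directories.
list_zdumps() {
    for dir in "$@"; do
        for f in "$dir"/vmkernel-zdump.*; do
            # An unmatched glob stays literal, so confirm the file exists.
            [ -f "$f" ] && echo "$f"
        done
    done
    return 0
}

# Check the current directory and the ESXi 5.x location noted above.
list_zdumps . /scratch/core
```

No output means no dump file was found in either location.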

For testing purposes, one can manually trigger a PSOD (this crashes the host, so only do it on a test system):
# vsish -e set /reliability/crashMe/Panic
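
Because that command takes the whole host down, it may be worth wrapping it in a guard so it cannot be pasted into the wrong terminal by accident. Below is one possible sketch, not an official procedure. run_if_confirmed is a hypothetical helper, and esx-test-01 in the usage comment is a made-up hostname.

```shell
#!/bin/sh
# run_if_confirmed: hypothetical guard, not part of ESX/ESXi.
# Runs the given command only when the first argument matches this
# machine's node name, as reported by `uname -n`.
run_if_confirmed() {
    expected=$(uname -n)
    typed="$1"
    shift
    if [ "$typed" != "$expected" ]; then
        echo "refusing: '$typed' does not match host '$expected'" >&2
        return 1
    fi
    "$@"
}

# On a test host named esx-test-01 (made-up name) this could look like:
#   run_if_confirmed esx-test-01 vsish -e set /reliability/crashMe/Panic
```

Typing the hostname by hand is the point: it forces a check of which machine is about to panic.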


