EMC VNX systems with iSCSI-attached hosts may have a storage processor (SP) reboot after 248 days of runtime.

Not so long ago we experienced a very strange outage on our old EMC VNX 5300. All NFS shares went down and the following alerts were present in Unisphere:

[Screenshot: EMC VNX Storage Processor Reboot - 1]

Data Mover failed over to standby:

[Screenshot: EMC VNX Storage Processor Reboot - 2]

EMC responded to the ESRS alert and, after a short investigation, advised that both storage processors had rebooted themselves, as described in the following EMC Knowledgebase article.

“ETA emc291837: VNX: VNX systems with iSCSI-attached hosts may have a single or dual storage processor (SP) reboot after 248 days of runtime.”

ID: emc291837
Usage: 161
Date Created: 04/04/2012
Last Modified: 01/04/2013
STATUS: Approved
Audience: Customer

Knowledgebase Solution

Question: ETA emc291837: VNX: VNX systems with iSCSI-attached hosts may have a single or dual storage processor (SP) reboot after 248 days of runtime.
Environment: EMC Technical Advisory (ETA)
Environment: Product: VNX Series
Environment: EMC SW: VNX Operating Environment (OE) 05.31.000.5.0xx
Environment: EMC SW: VNX Operating Environment (OE) 05.31.000.5.5xx
Environment: iSCSI connection usage
Environment: EMC SW: VNX Operating Environment (OE) 05.31.000.5.716 or earlier
Problem: Single or Dual HEMI_CPU1_WATCHDOG SP, NMI_HARDWARE_FAILURE and/or CMID_BUGCHECK_PARTITION_FROM_LIVE_PEER_DETECTED, or 05900000 FF_ASSERT_PANIC reboot after 248 days of storage processor runtime.
Root Cause: A timer overflow within the VNX operating system occurs after approximately 248 days of runtime. When this overflow happens, iSCSI-based network traffic can be interrupted. Depending on the length and severity of this interruption, the VNX OE software can interpret it as non-responsive driver software and induce a reset (reboot) to clear the issue.

This timer is started when the SP is booted, so any non-disruptive upgrade (NDU) or controlled shutdown/reboot sequence resets this timer and starts the clock anew for that SP.

The reset does not happen at exactly 248.000 days of runtime because there is some variance depending on the network history of the array. However, all arrays that have so far encountered the issue have reset themselves between 248 and 249 days of runtime. The likelihood of this issue affecting your storage system or environment is extremely low.

Note: Not all systems with iSCSI connection usage will necessarily reset themselves after running 248 days.
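The EMC article does not say which counter overflows. As a rough sanity check only, the sketch below assumes a signed 32-bit tick counter incremented 100 times per second. That is a common pattern in operating systems, but it is an assumption here, not something EMC confirms. Under that assumption the overflow falls inside the 248 to 249 day window the article describes. The function name and its parameters are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_until_overflow(counter_bits: int, ticks_per_second: int, signed: bool = True) -> float:
    """Return how many days a tick counter runs before it overflows.

    ASSUMPTION: counter width and tick rate are illustrative guesses;
    the EMC article does not document the actual timer.
    """
    # A signed counter loses one bit to the sign.
    usable_bits = counter_bits - 1 if signed else counter_bits
    max_ticks = 2 ** usable_bits
    return max_ticks / ticks_per_second / SECONDS_PER_DAY


# Assumed: signed 32-bit counter, 100 ticks per second (10 ms per tick).
overflow_days = days_until_overflow(32, 100)
print(f"{overflow_days:.2f} days")  # prints "248.55 days"
```

If the real counter were unsigned, or ticked at a different rate, the result would be very different (for example about 497 days for an unsigned 32-bit counter at 100 Hz), so the match with 248 days only makes the assumption plausible; it does not prove it.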
Fix: Upgrade VNX Block OE to 05.31.000.5.726 or later and VNX File OE to 7.0.54.3 or later (if applicable).

To determine your system storage processor runtime, you can download the SP Runtime tool. The SP Runtime tool requires that your client system has the latest version of Navisphere CLI installed and that the SP RemotelyAnywhere IP filters are set to allow your client system to connect to the storage processors. (See the SP Runtime User Manual for more detailed information on how to use the tool.)
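To compare an installed Block OE revision against the fixed revision named above, a simple numeric comparison of the dotted fields is probably enough. The sketch below is only an illustration: the helper names are hypothetical, it assumes the revision is a purely numeric dotted string such as 05.31.000.5.716, and it does not query the array in any way.

```python
from typing import Tuple

# First Block OE revision containing the fix, per the EMC article.
FIXED_BLOCK_OE = "05.31.000.5.726"


def parse_oe_version(version: str) -> Tuple[int, ...]:
    """Split a dotted revision string into integers so fields compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def has_fix(installed: str, fixed: str = FIXED_BLOCK_OE) -> bool:
    """Return True if the installed revision is the fixed revision or later.

    ASSUMPTION: both strings contain only digits and dots.
    """
    return parse_oe_version(installed) >= parse_oe_version(fixed)


print(has_fix("05.31.000.5.716"))  # prints "False" (listed as affected)
print(has_fix("05.31.000.5.726"))  # prints "True" (first fixed revision)
```

Comparing integer tuples rather than raw strings avoids surprises when fields have different widths or leading zeros.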

Workaround:

If upgrading VNX OE is not an option at this time, the storage processors may be rebooted one at a time to reset the 248 day timer. However, before the 248 day timer expires, you must upgrade your VNX OE to the version containing the permanent fix.
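If you rely on the reboot workaround, it helps to know the latest date by which each SP should be rebooted or upgraded. The sketch below does only the date arithmetic. The function names and the 14-day safety margin are assumptions made for illustration, and the last boot time has to come from your own records or from the SP Runtime tool. Each SP has its own timer, so run the calculation once per SP.

```python
from datetime import datetime, timedelta

# Arrays that hit the issue reset between 248 and 249 days of SP runtime.
RISK_WINDOW_START = timedelta(days=248)


def reboot_deadline(last_boot: datetime, safety_margin_days: int = 14) -> datetime:
    """Latest suggested date to reboot or upgrade one SP.

    ASSUMPTION: the 14-day default margin is arbitrary, not an EMC recommendation.
    """
    return last_boot + RISK_WINDOW_START - timedelta(days=safety_margin_days)


def days_remaining(last_boot: datetime, now: datetime, safety_margin_days: int = 14) -> int:
    """Whole days left before the deadline; negative means it has passed."""
    return (reboot_deadline(last_boot, safety_margin_days) - now).days


# Hypothetical boot time for one SP.
spa_boot = datetime(2013, 1, 1)
print(reboot_deadline(spa_boot).date())
print(days_remaining(spa_boot, now=datetime(2013, 6, 1)))
```

Staggering the two SP reboots also keeps their timers from expiring at the same moment, which reduces the chance of the dual-SP reboot described above.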

Notes: To check if this issue affects your system: From the Unisphere Manager task bar, select Hosts > Initiators. Check the Protocol column of the Initiator table. If any initiators list the iSCSI protocol, your system is affected.
Notes: AR526528, AR531278
Notes: EMC Confidential

When you contact EMC to arrange a VNX upgrade, please bear the following in mind:

  1. The VNX Block and File OE code versions must be compatible with all applications and solutions you have in your environment.
    1. VMware Site Recovery Manager and SRAs – check the VMware Compatibility Guides;
    2. EMC RecoverPoint – check the EMC RecoverPoint documentation on the EMC Support web site (EMC Simple Support Matrix / EMC RecoverPoint 3.5).
  2. Before upgrading VNX OE from 5.31 to 5.32, make sure you run the SMLink_check tool (EMC Primus emc308955) and confirm that all your Virtual Provisioning (VP) pools with auto-tiering enabled are OK.
