How to Avoid System Outages & Most Common Severity Level 1 Issues: Part 1
When you and your users cannot access a system or its data, the cost can go beyond lost revenue: it can damage your organization’s reputation. At DataCore we are committed to delivering the highest availability possible. In this series, our goal is to define severity levels, clarify why outages happen, and explain how you can avoid them.
At DataCore, a Severity 1 issue is one in which no data can be accessed. We are constantly improving our documentation and products to prevent outages that affect product data. In a study of the most common Severity 1 issues that left product data unavailable, the top two causes were a system-wide power outage at the site and all the DataCore thin-provisioned pools becoming full at the same time.
When two DataCore servers whose virtual disks are in a synchronous mirror configuration lose power at the same time (or nearly the same time), the result can be a double failure. Recovering from a double failure requires manual intervention to decide which side of the virtual disk held the last known good data. Once this has been decided, the user must select the ‘Force Online’ option so that hosts can regain access and the mirrors can begin synchronizing. The last known good side may not be easy to determine if, for instance, power was restored and then lost again, or one side of the mirror already had an issue when power was lost. In such cases, we recommend opening an incident with our technical support team so that the last known good side of the virtual disk can be determined.
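The decision described above can be sketched as a small piece of logic. This is only an illustration of the reasoning, not DataCore software or its API: the function name, its parameters, and the returned strings are all hypothetical, and the actual ‘Force Online’ step is performed by the user in the product.

```python
from typing import Optional


def recovery_action(
    power_lost_more_than_once: bool,
    side_had_prior_issue: bool,
    known_good_side: Optional[str],
) -> str:
    """Suggest the next step after a double failure of a mirrored virtual disk.

    All inputs are hypothetical operator observations, not values read
    from any DataCore interface:
      power_lost_more_than_once -- power was restored and then lost again
      side_had_prior_issue      -- one mirror side already had an issue
                                   when power was lost
      known_good_side           -- the side believed to hold the last known
                                   good data, or None if unknown
    """
    # Any of these conditions makes the last known good side hard to
    # establish, so the safe step is to involve technical support.
    ambiguous = (
        power_lost_more_than_once
        or side_had_prior_issue
        or known_good_side is None
    )
    if ambiguous:
        return "Open an incident with technical support"

    # Clear-cut case: the operator knows which side has the good data.
    return f"Force Online: {known_good_side}"


if __name__ == "__main__":
    # Single, clean power loss and the good side is known.
    print(recovery_action(False, False, "Server A"))
    # Power came back and dropped again, so escalate.
    print(recovery_action(True, False, "Server A"))
```

The sketch deliberately escalates whenever there is any doubt, because forcing the wrong side online would make stale data the source for synchronization.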
Read the entire article: How to Avoid System Outages & Most Common Severity Level 1 Issues: Part 1
Via the fine folks at DataCore Software.