Event Start: Thursday, September 17th, 5:19pm Eastern
Event Resolution: Thursday, September 17th, 6:22pm Eastern
Summary
A work error was made during a change to the VLAN configuration of a switch trunk port in the core networking infrastructure of the Greenville datacenter. Additionally, a switch terminating a small subset of colocation customers remained impacted after the configuration change on the core networking switch was reverted, because a suspected Cisco bug prevented its uplinks from coming back up.
Timeline of Events
5:19pm - Erroneous configuration change entered.
5:23pm - Issue was identified.
5:28pm - Statuspage notice posted.
5:37pm - Configuration change reverted.
5:55pm - Additional impact detected on colocation switch.
6:01pm - Colocation switch issue identified.
6:18pm - Colocation switch rebooted.
6:22pm - Service restored after colocation switch completed boot.
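The intervals implied by the timeline above can be checked with a short Python sketch. This is illustrative only: the dictionary keys and function names are hypothetical and are not part of any Green Cloud tooling, and every stamp is assumed to be a same-day pm Eastern time, as listed.

```python
# Timeline stamps copied from the report; all are same-day pm Eastern times.
TIMELINE = {
    "change_entered": "5:19pm",
    "issue_identified": "5:23pm",
    "change_reverted": "5:37pm",
    "colo_impact_detected": "5:55pm",
    "colo_switch_rebooted": "6:18pm",
    "service_restored": "6:22pm",
}


def minutes_after_noon(stamp: str) -> int:
    """Convert a stamp such as '5:19pm' to minutes after 12:00pm."""
    if not stamp.endswith("pm"):
        raise ValueError(f"expected a pm stamp, got {stamp!r}")
    hours, minutes = stamp[:-2].split(":")
    return (int(hours) % 12) * 60 + int(minutes)


def elapsed(start_key: str, end_key: str) -> int:
    """Return the minutes between two timeline entries."""
    start = minutes_after_noon(TIMELINE[start_key])
    end = minutes_after_noon(TIMELINE[end_key])
    return end - start


if __name__ == "__main__":
    print("Change entered to issue identified:",
          elapsed("change_entered", "issue_identified"), "min")
    print("Change entered to change reverted:",
          elapsed("change_entered", "change_reverted"), "min")
    print("Change reverted to service restored:",
          elapsed("change_reverted", "service_restored"), "min")
    print("Change entered to service restored:",
          elapsed("change_entered", "service_restored"), "min")
```

Under these assumptions, the erroneous change was in place for 18 minutes, and the full event, including the colocation switch reboot, lasted 63 minutes.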
Root Cause Analysis
During a configuration change on a port-channel connecting two core network switches, a work error was made that caused all VLANs on that port-channel to be suspended. This caused an immediate loss of upstream network connectivity for the impacted services in Greenville. The change was reverted 18 minutes later, at 5:37pm, and connectivity was restored for all services except those behind one colocation switch. When the VLANs were suspended, that Green Cloud colocation switch blocked all connectivity on its uplinks into the core, due to what appears to be a Cisco bug, and the uplinks did not recover after the revert. Once the ongoing impact was identified, Green Cloud rebooted the switch to clear the issue.
Remediation
Green Cloud will reinforce the existing policies and procedures around change management and code review on core networking devices, and will provide additional training for team members to prevent similar errors. Additionally, we are working with Cisco's TAC on the switch issue, and will likely upgrade the software on that switch during a future published maintenance event.