Resolved
This incident has been resolved for all applications.
Monitoring
We are continuing to monitor, but the system looks stable.
Monitoring
We've finished rolling out the fix. We are continuing to monitor, but it appears to have stabilized the system as expected.
Monitoring
We are rolling out a change that we expect to fix the problem, and we are monitoring to make sure the latency spikes do not resume.
Identified
We are seeing two of our Redis nodes fail over every 10-20 minutes. Each failover causes a burst of latency lasting 2-3 minutes, which may show up as user apps being slower to load. We are looking into what is causing the failovers and how best to remediate them. Most user apps should still be accessible, and between failovers we are seeing normal system performance.
Investigating
We are investigating sporadic degraded performance on some apps.