Around 10:00 AM UTC we noticed an increase in errors in our internal error reporting tool. A quick investigation revealed that one of our database clusters, managed by an external provider, could not handle the current load. This affected our dashboard as well as incoming chats.
Further investigation showed that the database had been automatically scaled down to a less performant instance based on load from the previous 24 hours. We reacted by manually scaling it back up to the previous cluster size using the external provider’s console; however, due to a malfunction on the provider’s side, this process did not complete as it normally would.
At 10:40 AM UTC we contacted the external provider’s support team, and they force-applied our change from their end.
At 11:57 AM UTC our dashboard became operational again.
At 12:12 PM UTC chats on all CRMs became operational again.
As a first measure to avoid a similar situation in the future, we now enforce a higher minimum cluster size to prevent excessive automatic downscaling. Furthermore, we are in contact with the provider’s support team to investigate why our changes were not applied.
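In practice, the safeguard amounts to clamping any automatic scaling decision to a configured floor. The sketch below is a simplified illustration under assumptions: the tier names and the clamping helper are hypothetical stand-ins, and the real setting is configured in the provider’s console rather than in our code.

```python
# Hypothetical sketch: tier names and this helper are illustrative only;
# the actual minimum cluster size is enforced via the provider's console.

# Ordered database tiers, smallest to largest (illustrative names).
TIERS = ["db-small", "db-medium", "db-large", "db-xlarge"]

# Enforced floor: never scale below this tier, regardless of what the
# autoscaler recommends based on the previous 24 hours of load.
MINIMUM_TIER = "db-large"


def clamp_to_minimum(recommended_tier: str, minimum_tier: str = MINIMUM_TIER) -> str:
    """Return the tier to apply, never smaller than the enforced minimum."""
    if TIERS.index(recommended_tier) < TIERS.index(minimum_tier):
        return minimum_tier
    return recommended_tier


if __name__ == "__main__":
    # After a quiet 24 hours the autoscaler may suggest scaling down;
    # the guard keeps the cluster at or above the enforced minimum.
    print(clamp_to_minimum("db-small"))   # -> db-large
    print(clamp_to_minimum("db-xlarge"))  # -> db-xlarge
```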