Bots using our Zero-Training AI model were experiencing slow responses and, in some extreme cases, were replying with “technical error” due to high latency on requests to the OpenAI Scale Tier API.
As a first attempt to mitigate the issue, we restarted our LLM-related services, but to no avail. Moving on to a different approach, we disabled our Scale Tier for a few minutes and then re-enabled it. Shortly afterwards we saw performance improve and request latency return to normal values.
Our initial investigation found no unusual behaviour on our systems, so we have opened an ongoing investigation together with OpenAI to determine the root cause.
To increase our awareness of this type of issue, we are implementing the following improvements:
We also found that some SNGP bots were affected as a side effect of this incident. This was because our internal processing queue is shared between models: the high latency on Zero-Training requests was blocking queued requests, resulting in a cascading failure.
To mitigate this in the future, we have increased the maximum number of queued requests, ensuring that SNGP requests can still be processed even if Zero-Training requests encounter delays due to OpenAI-related issues.
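For illustration, the sketch below is a hypothetical Python/asyncio model of this failure mode, not our actual service code; the model names, latencies, worker count, and queue size are assumptions. It shows how a small shared queue lets slow Zero-Training requests crowd out SNGP requests, and why raising the queue's maximum size allows SNGP requests to keep being accepted while Zero-Training requests are delayed.

```python
import asyncio

# A minimal sketch of the shared-queue failure mode described above.
# Model names, latencies, worker count, and queue size are illustrative
# assumptions, not our production configuration.

QUEUE_MAXSIZE = 8  # assumed small shared-queue capacity; raising this
                   # is the mitigation described above


async def worker(queue: asyncio.Queue) -> None:
    """Pull requests off the shared queue and simulate a model call."""
    while True:
        model, latency = await queue.get()
        # A slow upstream call (e.g. high Zero-Training latency) keeps
        # this worker busy while queued items pile up behind it.
        await asyncio.sleep(latency)
        print(f"processed {model} request ({latency:.1f}s)")
        queue.task_done()


def submit(queue: asyncio.Queue, model: str, latency: float) -> None:
    """Enqueue a request, rejecting it when the shared queue is full."""
    try:
        queue.put_nowait((model, latency))
    except asyncio.QueueFull:
        # With a small shared queue, slow Zero-Training requests fill it
        # up and unrelated SNGP requests are rejected as well -- the
        # cascading failure described above.
        print(f"rejected {model} request: queue full")


async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue(maxsize=QUEUE_MAXSIZE)
    workers = [asyncio.create_task(worker(queue)) for _ in range(2)]

    # Slow Zero-Training requests occupy the workers and the queue...
    for _ in range(10):
        submit(queue, "zero-training", latency=2.0)
    # ...so cheap SNGP requests are rejected even though the workers
    # could have handled them quickly.
    for _ in range(3):
        submit(queue, "sngp", latency=0.1)

    await queue.join()
    for w in workers:
        w.cancel()


if __name__ == "__main__":
    asyncio.run(main())
```

Re-running the sketch with a larger QUEUE_MAXSIZE (e.g. 32) shows the SNGP requests being accepted and processed despite the slow Zero-Training calls, which is the effect the queue-size increase is intended to achieve.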