We’ve resolved the issue that affected GOV.UK PaaS today, and full service has been restored to development and production applications running on the platform.
The routers in the production environment exhausted their available memory, causing them to provide a degraded service. Eventually the router processes were stopped by the operating system's out-of-memory killer. Once all of the routers had restarted, normal service was restored.
Tenant applications experienced slow responses and occasional HTTP 502 (Bad Gateway) errors. These will have been visible to end users. The incident lasted around 5.5 hours (from 04:45 to 10:22).
We’ll now start looking into why and how this happened. In the coming days, we’ll publish an incident report describing the timeline of the event, the root cause of the problem, the lessons we’ve learned and the actions we’ll take to ensure it doesn’t happen again.
I’m sorry for the impact that this has had on your users and the service you provide, and the problems this has caused for your team.
We identified that some of our routers had exhausted their allocated memory. The operating system’s out-of-memory killer eventually killed the routers, causing them to restart. During this time, users will have experienced slow requests.
We expect apps and the API to be performing normally again now. We will continue to monitor the situation.
We’re investigating a possible problem with GOV.UK PaaS.
A problem has been reported and we’re investigating whether it’s an issue with the platform or with certain services or apps running on it, and whether end users are affected.