A change to the rds-broker introduced a problem with removing existing bindings between an application and an RDS (MySQL or Postgres) service.
What caused the original problem?
The rds-broker change altered the way that usernames and passwords are generated when an application is bound to an RDS service (using SHA256 instead of MD5).
For RDS services that were bound before the change, unbinding failed because the username for the binding had been created the old way, while the rds-broker tried to delete a (nonexistent) username generated the new way.
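The mismatch can be illustrated with a minimal sketch. This is not the rds-broker's actual code: the function names, the "u" prefix, the 16-character username length, and deriving the username from the binding ID are all assumptions made for illustration. Only the switch from MD5 to SHA256 comes from the description above.

```python
import hashlib

# Hypothetical username length; the real broker's limit is not stated here.
USERNAME_LENGTH = 16


def username_md5(binding_id: str) -> str:
    """Old scheme (assumed shape): username derived from an MD5 hash."""
    digest = hashlib.md5(binding_id.encode("utf-8")).hexdigest()
    return "u" + digest[: USERNAME_LENGTH - 1]


def username_sha256(binding_id: str) -> str:
    """New scheme (assumed shape): username derived from a SHA256 hash."""
    digest = hashlib.sha256(binding_id.encode("utf-8")).hexdigest()
    return "u" + digest[: USERNAME_LENGTH - 1]


def bind(db_users: set, binding_id: str, generate) -> str:
    """Create the database user for a binding using the given scheme."""
    username = generate(binding_id)
    db_users.add(username)
    return username


def unbind(db_users: set, binding_id: str, generate) -> None:
    """Regenerate the username and delete it; fail if it does not exist."""
    username = generate(binding_id)
    if username not in db_users:
        raise LookupError(f"user {username} does not exist")
    db_users.remove(username)


db_users = set()

# A binding created before the change used the MD5-derived username.
bind(db_users, "binding-1", username_md5)

# After the change, unbind regenerates the name with SHA256 and cannot find it.
try:
    unbind(db_users, "binding-1", username_sha256)
    old_binding_unbound = True
except LookupError:
    old_binding_unbound = False

# A binding created after the change binds and unbinds consistently.
bind(db_users, "binding-2", username_sha256)
unbind(db_users, "binding-2", username_sha256)
```

In this sketch the old binding's user is left behind in the database, while the new binding is created and removed cleanly, which matches the behaviour described above: only bindings made before the change were affected.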
Why wasn’t this caught before it reached prod?
Checking for this was not part of our acceptance tests, and there are no straightforward automated tests for errors of this kind. The error was reported by a user.
We were alerted by a ticket from a user that there was a problem with deleting an app.
This did not affect the availability of applications hosted on GOV.UK PaaS.
This would have caused errors for tenants who tried to unbind most existing Postgres or MySQL databases, but did not affect new bindings or the subsequent unbinding of those new bindings.
We put updates about this issue on Statuspage, but realised that users weren’t receiving notifications.