Issues with an automatic Kubernetes upgrade

Started at

API

Resolved

Our API and website were down after an automatic node upgrade on one of our DB nodes. We found that our DB had only one read replica, so pgbouncer returned connection errors while the upgrade was running.

These connection errors led to cascading failures in the API.

Steps to improve:

  • We'll introduce much better readiness and liveness probes for our API. During the incident, the API tried to scale (and restarted all existing pods) to recover, which only made the issue worse.
  • We have increased our DB instances from 2 to 3, so that at least one read replica is always available, even while a node is being upgraded.
  • We'll plan better for when automatic upgrades are triggered.

These changes should help prevent similar issues in the future.
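For context on the first step: in Kubernetes, a failing liveness probe causes the container to be restarted, while a failing readiness probe only removes the pod from its Service's endpoints until the probe passes again. The following is a minimal Python sketch, not our actual API code, of probe handlers that keep the database check out of liveness and put it in readiness instead. The endpoint paths `/livez` and `/readyz` and the `check_db` callback are hypothetical names chosen for illustration.

```python
from http.server import BaseHTTPRequestHandler
from typing import Callable


def liveness_status() -> int:
    """Liveness: only reports that the process can still answer HTTP.

    It deliberately does NOT check the database. If it did, a database
    outage would make Kubernetes restart every API pod, and restarting
    the API cannot fix the database.
    """
    return 200


def readiness_status(check_db: Callable[[], None]) -> int:
    """Readiness: report 503 while the database (via pgbouncer) is unreachable.

    A failing readiness probe only takes the pod out of Service endpoints;
    the pod keeps running and is added back once the check passes again.
    """
    try:
        check_db()  # hypothetical callback: should raise if the DB is unreachable
    except Exception:
        return 503
    return 200


def make_handler(check_db: Callable[[], None]):
    """Build a request handler serving the two (hypothetical) probe paths."""

    class ProbeHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path == "/livez":
                status = liveness_status()
            elif self.path == "/readyz":
                status = readiness_status(check_db)
            else:
                status = 404
            self.send_response(status)
            self.end_headers()

    return ProbeHandler
```

To serve it, one could run `HTTPServer(("", 8080), make_handler(check_db)).serve_forever()` (from `http.server`; port 8080 is also an assumption) and point the pod's `livenessProbe` and `readinessProbe` at the two paths. With this split, a pgbouncer outage would mark API pods as not ready instead of restarting them.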