Deploying our Kubernetes-based solution in a new environment uncovered a tricky difference in underlying software configurations. Be ready to dive into solving an exciting SRE mystery involving Rook, Ceph, containerd, and Linux systemd!
Publications by tag: troubleshooting
How can API Priority and Fairness help your Kubernetes workloads? Here's a real-life case where its flow control features helped us bring a production application back to life.
A fascinating story of a recent incident that resulted in a couple of pull requests to Kubernetes-related projects. Be ready to dive into some intricacies of the Kubernetes API and its interaction with etcd.
Renewing Let’s Encrypt root certificates for legacy CentOS, handling an error in DNS records with Ingress, restoring a PgSQL table from a backup, dealing with a tricky sharding problem in Elasticsearch, and more.
Our new exciting SRE/Ops stories include a complicated Linux server migration (bare metal to VM), a large PostgreSQL replica recovery, a small case with the ClickHouse Kubernetes operator, and an abnormal result of a CockroachDB upgrade.
Our stories include a poorly prepared Kafka in Docker, an unexpected network issue for ZooKeeper & ClickHouse, faulty hardware in the data center, and PgSQL database optimization.