Deploying our Kubernetes-based solution in a new environment uncovered a tricky difference in underlying software configurations. Be ready to dive into solving an exciting SRE mystery involving Rook, Ceph, containerd, and Linux systemd!
Publications by tag failures
How can API Priority and Fairness help your Kubernetes workloads? Here's a real-life case where its flow control features helped us bring a production application back to life.
A fascinating story of a recent incident that resulted in a couple of pull requests to Kubernetes-related projects. Be ready to dive into some intricacies of the Kubernetes API as well as its interaction with etcd.
Our recent experience with Chaos Mesh as a way to test an application running in Kubernetes against various disruptive scenarios.
The disastrous fire at OVHcloud data centers this March badly affected our monitoring system. Here is how it challenged us and what we did to keep everything working smoothly.
Here is another failure story from our SREs that is worth sharing. It involves the migration of an Elasticsearch cluster from one storage system to another inside a Kubernetes cluster.