Senior Site Reliability Engineer
Michael Page
- Mauritius
- Permanent
- Full-time
- Site Reliability Engineer
- DevOps
- Own reliability, availability, scalability, and security of production systems
- Design and operate highly available, fault-tolerant, multi-region cloud architectures
- Define and manage SLOs, SLIs, SLAs, and error budgets for critical services
- Lead high-severity incidents and drive effective post-incident reviews
- Reduce MTTD and MTTR through automation, tooling, and runbooks
- Operate and evolve Kubernetes (EKS) platforms and multi-tenant deployments
- Work with Infrastructure-as-Code (Terraform, CloudFormation, Pulumi) at scale
- Build and improve CI/CD pipelines and deployment safeguards
- Design and maintain observability (metrics, logs, traces, alerting)
- Drive capacity planning, performance optimisation, and cloud cost efficiency
- Partner with Security & Compliance on SOC 2, ISO 27001, GDPR, and DORA controls
- Mentor SREs and influence reliability-first engineering practices across teams
- 6+ years in SRE, DevOps, or cloud infrastructure roles (2+ years in a senior/lead capacity)
- Strong AWS experience (EKS, RDS/Aurora, S3, MSK, VPC, IAM, ALB/NLB)
- Deep Kubernetes operational expertise
- Proven incident management and post-mortem leadership
- Solid experience with IaC, CI/CD, and automation
- Strong scripting or programming skills (Python, Go, Bash)
- Hands-on observability experience (Prometheus, Grafana, Datadog, ELK, OpenTelemetry)
- Excellent communication and cross-team collaboration skills