Site Reliability Engineer (SRE) - Multi-Cloud Expert

Posted 2 days ago

Job Description

We are seeking an experienced Senior Site Reliability Engineer (SRE) with a proven track record of 6+ years in designing, implementing, and maintaining large-scale, highly available systems across multiple cloud platforms. The ideal candidate will have deep expertise in cloud infrastructure, automation, and DevOps practices, with a strong focus on system reliability, performance, and scalability.

Key Responsibilities:

1) Lead the design and implementation of robust, scalable, and fault-tolerant systems across multiple cloud platforms (AWS, Azure, GCP)
2) Develop and maintain infrastructure as code using tools like Terraform, CloudFormation, or ARM templates
3) Implement and manage CI/CD pipelines for seamless application deployment across various cloud environments
4) Design and implement monitoring, alerting, and observability solutions for complex distributed systems
5) Lead incident response efforts and conduct thorough post-mortems to prevent recurring issues
6) Optimise system performance, cost, and resource utilisation across different cloud platforms
7) Mentor junior team members and promote best practices in SRE and DevOps methodologies
8) Collaborate with development teams to improve application reliability and performance
9) Contribute to the company's multi-cloud strategy and cloud migration efforts
10) Stay current with emerging technologies and evaluate their potential impact on our infrastructure

Requirements:

1) 4+ years of experience in SRE, DevOps, or similar roles
2) Extensive hands-on experience with at least two major cloud platforms (AWS, Azure, GCP)
3) Strong programming skills in languages such as Python, Go, or Java
4) Deep understanding of Linux/Unix systems and networking principles
5) Expertise in containerization technologies (Docker, Kubernetes) and orchestration
6) Extensive experience with Kubernetes for container orchestration and microservices management
7) Proficiency in infrastructure as code tools (Terraform, CloudFormation, ARM templates)
8) Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack)
9) Strong knowledge of CI/CD practices and tools (Jenkins, GitLab CI, GitHub Actions)
10) Experience with database technologies (both SQL and NoSQL)
11) Excellent problem-solving skills and ability to troubleshoot complex systems
12) Strong communication skills and experience leading technical teams

Preferred Skills:

1) Relevant certifications from major cloud providers (AWS Solutions Architect, Azure Solutions Architect Expert, Google Cloud Professional)
2) Experience with service mesh technologies (e.g., Istio, Linkerd)
3) Knowledge of security best practices and compliance requirements in cloud environments
4) Contributions to open-source projects or speaking engagements at technical conferences
5) Familiarity with serverless architectures and FaaS platforms
6) Experience with chaos engineering practices and tools

What We Offer:

1) Competitive salary and benefits package
2) Continuous learning opportunities and budget for professional development
3) Collaborative and innovative work environment
4) Chance to work on cutting-edge technologies and shape the future of our infrastructure
5) Flexible work arrangements
6) Regular team-building activities and events