Job Description
We are seeking an experienced Senior Site Reliability Engineer (SRE) with 6+ years of proven experience designing, implementing, and maintaining large-scale, highly available systems across multiple cloud platforms. The ideal candidate will have deep expertise in cloud infrastructure, automation, and DevOps practices, with a strong focus on system reliability, performance, and scalability.
Key Responsibilities:
2) Develop and maintain infrastructure as code using tools like Terraform, CloudFormation, or ARM templates.
3) Implement and manage CI/CD pipelines for seamless application deployment across various cloud environments.
4) Design and implement monitoring, alerting, and observability solutions for complex distributed systems.
5) Lead incident response efforts and conduct thorough post-mortems to prevent recurring issues.
6) Optimise system performance, cost, and resource utilisation across different cloud platforms.
7) Mentor junior team members and promote best practices in SRE and DevOps methodologies.
8) Collaborate with development teams to improve application reliability and performance.
9) Contribute to the company's multi-cloud strategy and cloud migration efforts.
10) Stay current with emerging technologies and evaluate their potential impact on our infrastructure.
Requirements:
2) Extensive hands-on experience with at least two major cloud platforms (AWS, Azure, GCP).
3) Strong programming skills in languages such as Python, Go, or Java.
4) Deep understanding of Linux/Unix systems and networking principles.
5) Expertise in containerization technologies (Docker, Kubernetes) and orchestration.
6) Extensive experience with Kubernetes for container orchestration and microservices management.
7) Proficiency in infrastructure as code tools (Terraform, CloudFormation, ARM templates).
8) Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack).
9) Strong knowledge of CI/CD practices and tools (Jenkins, GitLab CI, GitHub Actions).
10) Experience with database technologies (both SQL and NoSQL).
11) Excellent problem-solving skills and ability to troubleshoot complex systems.
12) Strong communication skills and experience leading technical teams.
Preferred Skills:
2) Experience with service mesh technologies (e.g., Istio, Linkerd).
3) Knowledge of security best practices and compliance requirements in cloud environments.
4) Contributions to open-source projects or speaking engagements at technical conferences.
5) Familiarity with serverless architectures and FaaS platforms.
6) Experience with chaos engineering practices and tools.
What We Offer:
2) Continuous learning opportunities and a budget for professional development.
3) Collaborative and innovative work environment.
4) Chance to work on cutting-edge technologies and shape the future of our infrastructure.
5) Flexible work arrangements.
6) Regular team-building activities and events.
If you’re passionate about building and maintaining highly reliable, scalable systems across multiple cloud platforms and ready to take on a leadership role in shaping our infrastructure, we want to hear from you!