Senior Site Reliability Engineer at IMG ARENA

  • Anywhere (100% Remote) Only
  • IMG ARENA
Job Description:

About the role:

IMG ARENA’s engineering community is looking to minimise the challenges of shipping, rapidly iterating on and securing its applications, and to ensure they run reliably and perform well. As a senior SRE, you will join a growing SRE team that works closely with an established DevOps team, multiple agile software development teams who embrace DevOps culture, and business stakeholders. You will be responsible for designing, implementing and maintaining the cloud infrastructure that powers services used by millions of people, whilst adopting and evangelising SRE principles and best practices. You will ensure that all application development, deployment and testing practices adhere to security standards and best practices. You will design, develop and maintain high-quality tooling and automation frameworks that help track the services' SLOs and ensure they are met. You will help IMG ARENA’s engineering community learn and grow as industry best practices for DevOps and SRE evolve.

Responsibilities:

  • Ensure services meet their SLOs for reliability and performance; support software development teams in meeting their SLOs where necessary
  • Partner with DevOps and development teams to establish SLIs and SLOs for IMG ARENA’s internal and external-facing services
  • Implement tooling to automate away repetitive tasks and improve observability (such as tracking SLOs and monitoring custom platform metrics)
  • Work closely with the DevOps team to manage and improve IMG ARENA’s cloud infrastructure, which mainly uses AWS, Terraform and Kubernetes
  • Work closely with the Security and QA teams to implement and improve automated testing frameworks that are used by IMG ARENA services
  • Improve IMG ARENA’s GitOps-based Continuous Delivery pipelines
  • Establish and maintain SRE best practices across the organisation through high-quality documentation, knowledge-sharing workshops and training sessions
  • Contribute to the design of project solutions and architectures that directly impact the business
  • Participate in on-call/support rota

Essential requirements:

  • Strong track record as an SRE or software engineer who has managed large-scale applications in production whilst adopting SRE principles and best practices
  • Proficiency in at least one modern programming language (full-stack development experience is a strong plus)
  • Experience in AWS services (including EC2, EBS, EKS, S3, CloudFront, VPC, IAM, CloudWatch, Lambda) and Infrastructure-as-Code (Terraform)
  • Good knowledge of TCP/IP, HTTP, WebSockets, load balancers and DNS
  • Strong expertise in Kubernetes
  • Passion for SRE principles and best practices
  • Strong communication (written and verbal) and collaboration skills with both technical and non-technical stakeholders

Nice-to-have skills:

  • Experience with GitOps and Kubernetes operator implementation
  • Experience with automated tests (such as load tests and chaos tests)
  • Experience with modern observability tools (Prometheus/Thanos, Loki, Grafana, Tempo)
  • Basic working knowledge of SQL and NoSQL databases

Company Benefits

  • Life Insurance
  • Pension
  • Private Medical Insurance
  • Income Protection
  • Season Ticket Loan
  • Dental Insurance
  • Cycle To Work
  • Eye Care
  • Will Writing
  • Give As You Earn
  • Employee Assistance Programme
  • Wellness
  • Gym Membership

Interview Process

  • 1st Stage - 45-minute initial call with SRE Lead
  • 2nd Stage - Technical task
  • 3rd Stage - Present technical task via Microsoft Teams
  • Final Stage - Call with Head of Core Services
