Service Reliability Engineer

  • Australia Only
  • Amadeus Careers
Job Description:

The Role

The Service Reliability Engineer (SRE) helps application teams deliver applications reliably and consistently by developing standardized automation that forms a common continuous deployment pipeline for the functional engineering teams. Other responsibilities include ongoing work in change management, problem management, incident management, performance improvement, and automation/tool development.

The SRE is expected to excel under pressure, work well with others, be self-motivated, and be able to manage short- and long-term projects. Implementing automation for kickstarting, monitoring, management, and support will be a key component of the position. The SRE will actively work with software developers, network engineers, systems and storage teams, project managers, and database administrators on projects and provide support as required. The SRE will troubleshoot and resolve issues quickly and effectively. Good communication and teamwork are extremely important. The role also involves participating in the team's 24x7 pager rotation.

Main Responsibilities

Application Support

  • Proactive incident management in coordination with frontline services and the Incidents Response Team
  • Incident response: Monitor and build/define alerting to enable auto-recovery, and provide the automation that performs it. When an issue does not recover automatically, first ensure full recovery. Once it is resolved, analyze the root cause, liaising with the development teams if needed, then implement specific monitoring and an automated response so that the same issue recovers automatically if it happens again
  • Assist developers in debugging application & performance issues
  • Support application deployments, build new systems, and upgrade and patch existing ones.
  • Operate the platform within our security and privacy guidelines.

Service Automation

  • Participate in the design and building of tools and processes to support operations. Leverage scripting to build required automation and tools on an ad-hoc basis.
  • Build and develop automation to enable quick & safe instance deployment
  • Design, drive, develop and use monitoring tools to find problems, resolve them and/or escalate them to development, and ensure that we exceed our Service Level Agreements

Continuous Improvement

  • Be accountable for an application platform according to SLA, NFR and operability criteria
  • Contribute to the definition of SLAs, OLAs and NFRs
  • Adopt and ensure usage of monitoring tools to find problems, raise alerts, and ensure that we meet our SLAs/OLAs
  • Ensure process reengineering and optimization
  • Provide proactive thought leadership on creative and efficient technology solutions.
  • Drive continuous improvement of the service delivered to customers (agility, stability).
  • Drive the enforcement and definition of operational requirements / non-functional requirements in collaboration with application owners and middleware organizations
  • Document configuration processes and policies

Required Experience

  • University degree (or equivalent) in Computer Science or related technical field
  • Experience in an operational ITSM role, ideally in a mission-critical environment
  • Exposure to operations in open-source and cloud stacks
  • Experience in operational automation
  • ITIL and/or Cloud technology certifications are a plus
  • Lean management experience or similar is a plus
