Vice President, Site Reliability Engineering, Google Cloud

  • USA Only
  • Google Careers
Job Description:

Minimum qualifications:

• Bachelor's degree in Computer Science, related technical field, or equivalent professional experience.
• 5 years of experience with software development (e.g., algorithms, data structures, complexity analysis) and/or systems design (e.g., Unix/Linux, IP networking, performance/reliability).
• 15 years of experience managing a team and experience managing multiple cross-functional projects.

Preferred qualifications:

• Master's degree or PhD in a related technical field.
• Industry leadership experience in a reliability function for products at scale.
• Experience growing and building highly effective teams.
• Experience collaborating across organizational boundaries, forming alliances with other members of the executive leadership team, and building bridges that support functional as well as company goals.
• Ability to identify trends and promote solutions that solve challenges efficiently across multiple product areas.

About the job

As a Vice President, Site Reliability Engineering, Google Cloud, you will apply your engineering leadership skills and knowledge of infrastructure and software development to drive ultra-scalable and highly reliable software systems for Google’s products and services.

Site Reliability Engineering (SRE) holds the responsibility for the big picture: determining how our systems relate to each other and using a breadth of tools and approaches to solve a broad spectrum of problems. Practices such as limiting time spent on operations, holding blameless postmortems, and proactively identifying potential outages drive the iterative improvement that is key to both product quality and interesting, dynamic day-to-day work. SRE’s culture of diversity, intellectual curiosity, problem solving, and openness unlocks its success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage collaboration, thinking big, and taking risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also create an environment that provides the support and mentorship to learn and grow.

Google’s diverse range of integrated products makes reliable, high-performance infrastructure more difficult to achieve and more important than ever. SRE addresses these challenges by driving industry best practices for reliability and scalability while enabling feature velocity and coordination and reducing outages. As these systems, services, and products have thrived, we must now scale to better serve our internal and external customers while maintaining the reliability our users expect.

Google Cloud helps millions of employees and organizations empower their employees, serve their customers, and build what’s next for their business, all with technology built in the cloud. Our products are engineered for security, reliability, and scalability, running the full stack from infrastructure to applications to devices and hardware. Our teams are dedicated to helping our customers and developers see the benefits of our technology come to life.

Responsibilities

• Manage, innovate, and create programs, new software, and analytics that drive improvements to the availability, scalability, latency, and efficiency of Google’s products and services.
• Work cross-functionally in close partnership with product group leads from technologies up and down Google's very tall and deep stack to guide product engineering to build fast, reliable, and durable production systems.
• Develop strategic directions, workforce plans, and organizational structure for the reliability teams within each product group.
• Be a key strategic leader of Google’s SRE organization.
