Senior Level Site Reliability Engineer Jobs

Senior level site reliability engineer jobs place engineers in charge of platform reliability at scale, setting the architectural direction, owning incident response programs, and leading the teams that keep critical systems running. Hiring is concentrated in Technology & Software, Electronics & Hardware, and Manufacturing, with 34% of roles remote or hybrid, and employers like MongoDB, NVIDIA, and Applied Materials hiring at this level now.

Overview

Open roles: 312+

Top state: California

Top employer: MongoDB

Top city: Santa Clara, CA

Work type: 66% On-site

Top industry: Technology

Showing 5 of 312+ Senior Level Site Reliability Engineer jobs

Senior Site Reliability Engineer

New 1m ago

Okta

San Francisco, California

Software Engineering

Technical Product & Program Management

UI/UX Design

Salary TBD

7+ yrs exp.

5,001-10,000

Manager, Software Engineering (Reliability Platform)

New 1m ago

Affirm

South San Francisco, California

Backend Engineering

Cloud Engineering

$204k - $290k/yr

7+ yrs exp.

Remote (US)

Bachelor's degree

1,001-5,000

SRE Engineer / Site Reliability Engineer Specialist

New 1m ago

NTT DATA

Addison, Texas

Creative & Design

Product Management

UI/UX Design

$60 - $65/hr

8+ yrs exp.

10,000+

Engineering Manager SRE - Reading, PA

New 1m ago

AkzoNobel

Reading, Pennsylvania

Chemical Engineering

Engineering (Non-Software)

Manufacturing Engineering

Salary TBD

7+ yrs exp.

On-Site

Bachelor's degree in mechanical engineering

10,000+

Sr. Infrastructure Site Reliability Engineer

New 1m ago

Charles Schwab

Southlake, Texas

Cybersecurity Operations

Security Engineering

$139k - $161k/yr

10+ yrs exp.

Master of Science

10,000+

New 1m ago

Senior Site Reliability Engineer

Okta·San Francisco, California

Job functions:

Software Engineering

Technical Product & Program Management

UI/UX Design

7+ yrs exp.

Full Time

5,001-10,000 employees

Job Description

Secure Every Identity, from AI to Human

Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence.

This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.

Get to know Okta

Okta is The World’s Identity Company. We free everyone to safely use any technology—anywhere, on any device or app. Our Workforce and Customer Identity Clouds enable secure yet flexible access, authentication, and automation that transforms how people move through the digital world, putting Identity at the heart of business security and growth.

At Okta, we celebrate a variety of perspectives and experiences. We are not looking for someone who checks every single box; we’re looking for lifelong learners and people who can make us better with their unique experiences.

Join our team! We’re building a world where Identity belongs to you.

The Engineering Opportunity

We are looking for an experienced Senior Site Reliability Engineer to join Okta's Emerging Products Group (EPG). Our mission is to build highly reliable, scalable, and secure cloud services that our customers can trust. We embrace an automation-first mindset and continuously invest in platform engineering, observability, and operational excellence to enable our engineering teams to move quickly and safely.

This role is ideal for an experienced Site Reliability Engineer who enjoys solving complex technical challenges at scale, building automation, and improving the reliability of production systems. You will serve as a key contributor within the EPG SRE organization, partnering closely with software engineers, architects, and product teams to design, build, and operate world-class cloud services.

The ideal candidate exemplifies the philosophy of "if you have to do it more than once, automate it" and possesses a strong passion for continuous improvement, operational excellence, and software engineering.

What You'll Be Doing

Reliability & Operations

Design, build, and operate large-scale cloud infrastructure and production services.
Participate in an on-call rotation supporting highly available customer-facing systems.
Lead incident response efforts and drive post-incident reviews focused on systemic improvements.
Define, measure, and improve Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets.
Partner with engineering teams to improve service availability, scalability, performance, and resilience.
Continuously improve observability through metrics, logging, tracing, dashboards, and alerting.

Engineering & Automation

Develop software, automation, and infrastructure using Go, Python, Terraform, and related technologies.
Eliminate operational toil through automation, tooling, and platform engineering.
Improve deployment safety and operational workflows through CI/CD and GitOps practices.
Collaborate on modernizing existing workloads and aligning them with evolving platform capabilities.
Build self-service platforms, operational guardrails, and automation that improve developer velocity while maintaining reliability and security.

Technical Leadership

Contribute to and drive reliability initiatives within the product group.
Guide engineers in adopting operational best practices and reliability engineering principles.
Mentor engineers through technical collaboration, design reviews, incident analysis, and knowledge sharing.
Support architecture and operational decisions through data-driven recommendations and engineering expertise.
Execute projects from conception through production rollout and long-term operational ownership.

Innovation

Explore and apply AI-assisted engineering techniques to improve operational efficiency, incident response, troubleshooting, and automation.
Identify opportunities to leverage emerging technologies to reduce toil and improve engineering productivity.

Our Tech Stack

Infrastructure/Orchestration: Kubernetes (EKS/GKE), Terraform, Helm, Git, ArgoCD, GitOps
Programming: Golang, Python
Observability: Datadog, Splunk
Data Stores: PostgreSQL, Redis, OpenSearch

What We Are Looking For

Technical Excellence

Strong experience operating large-scale production services in AWS and/or GCP.
Deep expertise with Kubernetes in production environments.
Experience troubleshooting Kubernetes networking, storage, scheduling, scaling, and workload lifecycle issues.
Extensive experience with Infrastructure as Code technologies such as Terraform and Helm.
Strong software engineering skills in Golang and/or Python.
Experience building automation and internal engineering platforms.
Experience operating and troubleshooting distributed data platforms such as PostgreSQL, Redis, OpenSearch, MySQL, Cassandra, or similar technologies.
Strong understanding of cloud networking fundamentals including DNS, load balancing, ingress, TLS, service networking, and traffic management.
Experience with observability platforms, monitoring strategies, and production telemetry.
Experience with or strong interest in AI-assisted engineering and operational automation.

Operational Excellence

Strong expertise operating customer-facing production systems.
Experience leading incident response and driving operational improvements.
Deep understanding of reliability engineering concepts including SLIs, SLOs, error budgets, and capacity planning.
Strong understanding of CI/CD pipelines, deployment strategies, and automation-first operational practices.
Proven ability to balance reliability, scalability, security, and engineering velocity.

Security & Compliance

Understanding of cloud security fundamentals, IAM, secrets management, and secure infrastructure design.
Experience implementing operational controls and best practices in regulated or security-sensitive environments is a plus.

Leadership

Demonstrated experience contributing to complex engineering initiatives.
Strong collaboration and communication skills.
Experience working effectively within globally distributed engineering organizations spanning multiple timezones and cultures.
Experience mentoring engineers and elevating technical capabilities within an organization.
Ability to collaborate on technical direction through expertise, partnership, and execution.

Preferred Qualifications

Experience operating SaaS platforms serving large-scale customer workloads.
Experience working within Kubernetes-based microservices environments.
Experience supporting globally distributed production environments.
Experience with GitOps and ArgoCD.
Experience implementing AI-assisted operational tooling or automation workflows.

The Okta Experience

Supporting Your Well-Being
Driving Social Impact
Developing Talent and Fostering Connection + Community

We are intentional about connection. Our global community, spanning over 20 offices worldwide, is united by a drive to innovate. Your journey begins with an immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one.

Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and conviction records, consistent with applicable laws.

If reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this Form to request an accommodation.

Notice for New York City Applicants & Employees: Okta may use Automated Employment Decision Tools (AEDT), as defined by New York City Local Law 144, that use artificial intelligence, machine learning, or other automated processes to assist in our recruitment and hiring process. In accordance with NYC Local Law 144, if you are an applicant or employee residing in New York City, please

Okta

The foundation for secure connections between people and technology

Okta is the leading independent provider of identity for the enterprise. The Okta Identity Cloud enables organizations to securely connect the right people to the right technologies at the right time. With over 7,000 pre-built integrations to applications and infrastructure providers, Okta customers can easily and securely use the best technologies for their business. More than 19,300 organizations, including JetBlue, Nordstrom, Slack, T-Mobile, Takeda, Teach for America, and Twilio, trust Okta to help protect the identities of their workforces and customers.

Senior Level Site Reliability Engineer Job Market

Who's Hiring

MongoDB: 13
NVIDIA: 8
Applied Materials: 6
Apple: 5
International Paper: 5

Top Industries Hiring

Technology & Software: 116
Electronics & Hardware: 31
Manufacturing: 22
Chemicals & Materials: 14
Consulting & Professional Services: 13

Senior Level Site Reliability Engineer Jobs: Frequently Asked Questions

How do I get a senior level site reliability engineer job?

Employers at this level look for engineers who have owned reliability outcomes end to end, not just executed tasks. Demonstrated experience designing SLOs, leading postmortems, and building observability infrastructure from the ground up signals readiness. Equally important is evidence of cross-functional influence, whether that means driving adoption of reliability practices across engineering teams or mentoring junior engineers through complex incidents.

Which companies hire senior level site reliability engineers?

Companies hiring senior level site reliability engineers right now include MongoDB, NVIDIA, and Applied Materials, based on current listings on Migrate Mate as of June 2026. Hiring at this level tends to concentrate at technology-driven organizations where platform uptime and engineering velocity are directly tied to business outcomes.

Are there remote senior level site reliability engineer jobs?

Yes. About 34% of senior level site reliability engineer openings are remote or hybrid as of June 2026, so roughly two-thirds remain on-site. The remote share reflects how many engineering teams have built distributed on-call and incident response workflows that function effectively without a shared physical location.

What makes a site reliability engineer role senior level?

Senior level site reliability engineer roles are defined by ownership and scope rather than task execution. Engineers at this stage are expected to set reliability standards for entire platforms or product lines, lead incident command during major outages, evaluate and adopt tooling at an organizational level, and actively develop the engineers around them through mentorship, code review, and design feedback.

Which industries hire the most senior level site reliability engineers?

Senior level site reliability engineer roles concentrate in Technology & Software, Electronics & Hardware, and Manufacturing, based on current listings on Migrate Mate as of June 2026. These sectors share a dependence on high-availability systems where downtime carries significant financial or operational cost, making experienced reliability engineers a consistent hiring priority.

Related Senior Level Roles

Senior Level React Developer Jobs, United States

Senior Level Mobile Applications Developer Jobs, United States

Senior Level DevOps Lead Jobs, United States

Senior Level DevOps Manager Jobs, United States

Senior Level Full Stack Web Engineer Jobs, United States

Senior Level Security Engineer Jobs, United States

Senior Level Senior Software Developer Jobs, United States

Senior Level Infrastructure Software Engineer Jobs, United States

Senior Level Staff Engineer Jobs, United States