Site Reliability Engineer Jobs in San Francisco, CA

Site Reliability Engineer jobs in San Francisco are in high demand, concentrated in SoMa, the Financial District, and Mission Bay across cloud infrastructure, fintech, and enterprise software. Employers hiring right now include Braze, vapi, and Unify. Scan the live roles below and apply to whichever ones fit.


Overview

Open roles: 49

Top employer: Braze

Work type: 63% On-site

Companies hiring: 34

Showing 5 of 49+ Site Reliability Engineer jobs

Staff Site Reliability Engineer

Company · San Francisco, California · Added 1w ago

Software Engineering · Cloud & DevOps · Technical Product & Program Management · DevOps · Cloud Engineering

$210k - $250k/yr

Senior Site Reliability Engineer

Anyscale · San Francisco, California · Added 1w ago

Cloud & DevOps · Cybersecurity · Technical Product & Program Management · Cloud Engineering · DevOps · Technical Program Management

Salary TBD · 3+ yrs exp. · On-Site · Bachelor's

Senior Site Reliability Engineer

Autodesk · San Francisco, California · Added 1w ago

Software Engineering · Cloud & DevOps · Technical Product & Program Management · DevOps · Cloud Engineering

$117k - $209k/yr · 7+ yrs exp. · Bachelor's · 10,000+ employees

Principal Site Reliability Engineer - AI Infrastructure Operations

nSCALE · San Francisco, California · Added 1w ago

Cloud & DevOps · Technical Product & Program Management · Cybersecurity · Cloud Engineering · DevOps

$150k - $2150k/yr · 10+ yrs exp. · Remote (US)

Senior Software Engineer, Site Reliability Engineer

Harvey · San Francisco, California · Added 1w ago

Software Engineering · Cloud & DevOps · Technical Product & Program Management · Cloud Engineering · DevOps · Backend Engineering

$200k - $260k/yr · 5+ yrs exp. · On-Site · Degree: None

Staff Site Reliability Engineer

Company · San Francisco, California · Added 1w ago

Job functions: Software Engineering · Cloud & DevOps · Technical Product & Program Management · DevOps · Cloud Engineering

$210k - $250k/yr · Full Time

Job Description

This is not a ticket-taking SRE role.

You will define how mission-critical machine learning and real-time analytics systems operate in production — influencing reliability strategy, deployment standards, and infrastructure architecture across engineering.

This team operates in a highly collaborative, in-person engineering environment in SoMa. Infrastructure, ML, and engineering leaders work side by side to design, build, and operate complex systems in real time. The pace is fast, the feedback loops are tight, and decisions happen quickly.

If you’ve grown from Linux systems through DevOps to Staff-level SRE and now think in terms of systemic risk, scalability, and long-term reliability strategy, this role gives you direct influence and visibility.

This role is intentionally in-person because:

Reliability decisions happen at architectural depth — not over Slack threads
ML, data, and infrastructure teams collaborate continuously in real time
Post-incident reviews, system design debates, and performance tuning sessions are hands-on and high impact
You will have direct access to engineering leadership and decision-makers
The infrastructure you’re operating is mission-critical and evolving quickly

If you value deep technical collaboration, tight feedback loops, and being at the center of high-scale ML systems — this environment is built for that.

What You’ll Own

Production reliability for ML and real-time analytics workloads
CI/CD strategy, deployment automation, and rollback design
Observability frameworks (SLOs, alerting, monitoring, incident response)
Infrastructure-as-Code and Kubernetes environments
Capacity planning and performance optimization
Post-incident reviews that drive measurable, long-term reliability improvements
Reliability standards across teams — not just within a single service

You’ll partner directly with engineering and data science teams to ensure ML workloads are production-ready and reliable by design.

What We’re Looking For

Deep experience operating Linux infrastructure and networking in production environments
Proven impact as a Staff SRE, Senior SRE, or senior-level DevOps/Platform Engineer supporting distributed systems
Experience supporting complex, data-intensive or ML-driven systems in production
Strong hands-on experience with Docker and Kubernetes
Infrastructure-as-Code expertise
Strong scripting ability (Bash and/or Python)
CI/CD ownership experience (GitHub Actions, ArgoCD, or similar)
Experience with modern observability stacks (Prometheus, Grafana, Datadog, ELK, OpenTelemetry)
Ability to debug systemic failures across infrastructure, deployments, and workloads
Clear communicator who works effectively across engineering and data teams

Engineers who have evolved from infrastructure foundations into strategic reliability leaders will thrive here.

These Skills Are a Plus

Experience operating ML platforms at scale (training + inference)
AWS or cloud-managed services experience
Exposure to data platforms such as Spark, Airflow, or Kafka
Experience in SOC 2 or regulated environments

Why This Opportunity

Staff-level ownership of mission-critical ML infrastructure
Direct influence over reliability standards across engineering
High-visibility role with architectural impact
Collaborative engineering culture designed for speed and depth
Competitive base compensation ($210K–$250K)

If you're a Staff-level reliability engineer who wants real ownership and architectural influence — let’s start the conversation.

StratITech is partnering with our San Francisco client to build the next generation of high-scale ML infrastructure.


See All 49 Site Reliability Engineer Jobs in San Francisco

Find roles in San Francisco that match your experience and apply in just a few clicks.


Site Reliability Engineer Job Market in San Francisco

Who's Hiring

Braze: 3
vapi: 2
Unify: 2
Harvey: 2
Airbyte: 2

Top Industries Hiring

Technology & Software: 26
Banking & Financial Services: 4
Social Media: 2
Media & Entertainment: 2
Cybersecurity: 1

Other California Cities Hiring Site Reliability Engineers

Other cities in California with the most openings

See all Site Reliability Engineer jobs across California

Cupertino: 9 jobs

Irvine: 7 jobs

Site Reliability Engineer Jobs in San Francisco: Frequently Asked Questions

How do I get a site reliability engineer job in San Francisco?

Target San Francisco's density of cloud-native companies in SoMa and Mission Bay, plus the fintech and enterprise software firms in the Financial District. Hands-on experience with Kubernetes, Terraform, and observability tooling stands out in this market. Contributing to open-source infrastructure projects and networking through local DevOps and SRE meetups give candidates a concrete edge over remote applicants.

Which companies hire site reliability engineers in San Francisco?

Braze, vapi, Unify, and others are currently posting San Francisco site reliability engineer roles, based on listings on Migrate Mate as of June 2026. The city's hiring base is especially strong among high-growth cloud platforms, consumer tech companies, and financial technology firms headquartered in SoMa and the Financial District.

Are there remote site reliability engineer jobs in San Francisco?

Yes, though many SRE roles require at least some on-site presence given hands-on infrastructure and incident response responsibilities. About 37% of site reliability engineer openings tied to San Francisco are remote or hybrid as of June 2026, skewing toward hybrid schedules. Automation, monitoring, and on-call engineering tasks are most commonly performed remotely within San Francisco-based teams.

How can I get a site reliability engineer job in San Francisco with little or no experience?

The most realistic entry path in San Francisco is moving laterally from a systems administration or DevOps engineer role at one of the city's mid-size SaaS or fintech companies, which tend to hire junior SRE talent more actively than large enterprises. Building skills in Linux, cloud platforms like AWS or GCP, and basic scripting, then applying to associate infrastructure or platform engineer roles, is the clearest route into the field locally.

Which industries hire the most site reliability engineers in San Francisco?

San Francisco site reliability engineer roles concentrate in Technology & Software, Banking & Financial Services, and Social Media, based on current listings on Migrate Mate as of June 2026. San Francisco's position as a global hub for cloud-native software, financial technology, and consumer platforms means these sectors consistently generate the highest volume of SRE openings in the city.

Related Jobs in California

Backend Infrastructure Engineer Jobs, California

Senior Fullstack Developer Jobs, California

Backend Software Engineer Jobs, California

Software Implementation Consultant Jobs, California

Staff Software Engineer Jobs, California

Customer Support Engineer Jobs, California

Principal Full Stack Engineer Jobs, California

Developer Jobs, California

Junior Software Developer Jobs, California
