Site Reliability Engineer Jobs in San Francisco, CA
Site Reliability Engineer jobs in San Francisco are in high demand, concentrated in SoMa, the Financial District, and Mission Bay across cloud infrastructure, fintech, and enterprise software. Employers hiring right now include Braze, vapi, and Unify. Scan the live roles below and apply to whichever ones fit.
Showing 5 of 49+ Site Reliability Engineer jobs
This is not a ticket-taking SRE role.
You will define how mission-critical machine learning and real-time analytics systems operate in production — influencing reliability strategy, deployment standards, and infrastructure architecture across engineering.
This team operates in a highly collaborative, in-person engineering environment in SoMa. Infrastructure, ML, and engineering leaders work side by side to design, build, and operate complex systems in real time. The pace is fast, the feedback loops are tight, and decisions happen quickly.
If you’ve grown from Linux systems to DevOps to Staff-level SRE, and you now think in terms of systemic risk, scalability, and long-term reliability strategy, this role gives you direct influence and visibility.
This role is intentionally in-person because:
- Reliability decisions happen at architectural depth, not over Slack threads
- ML, data, and infrastructure teams collaborate continuously in real time
- Post-incident reviews, system design debates, and performance tuning sessions are hands-on and high impact
- You will have direct access to engineering leadership and decision-makers
- The infrastructure you’re operating is mission-critical and evolving quickly
If you value deep technical collaboration, tight feedback loops, and being at the center of high-scale ML systems — this environment is built for that.
What You’ll Own
- Production reliability for ML and real-time analytics workloads
- CI/CD strategy, deployment automation, and rollback design
- Observability frameworks (SLOs, alerting, monitoring, incident response)
- Infrastructure-as-Code and Kubernetes environments
- Capacity planning and performance optimization
- Post-incident reviews that drive measurable, long-term reliability improvements
- Reliability standards across teams, not just within a single service
You’ll partner directly with engineering and data science teams to ensure ML workloads are production-ready and reliable by design.
What We’re Looking For
- Deep experience operating Linux infrastructure and networking in production environments
- Proven impact as a Staff SRE, Senior SRE, or senior-level DevOps/Platform Engineer supporting distributed systems
- Experience supporting complex, data-intensive or ML-driven systems in production
- Strong hands-on experience with Docker and Kubernetes
- Infrastructure-as-Code expertise
- Strong scripting ability (Bash and/or Python)
- CI/CD ownership experience (GitHub Actions, ArgoCD, or similar)
- Experience with modern observability stacks (Prometheus, Grafana, Datadog, ELK, OpenTelemetry)
- Ability to debug systemic failures across infrastructure, deployments, and workloads
- Clear communicator who works effectively across engineering and data teams
Engineers who have evolved from infrastructure foundations into strategic reliability leaders will thrive here.
These Skills Are a Plus
- Experience operating ML platforms at scale (training + inference)
- AWS or cloud-managed services experience
- Exposure to data platforms such as Spark, Airflow, or Kafka
- Experience in SOC 2 or regulated environments
Why This Opportunity
- Staff-level ownership of mission-critical ML infrastructure
- Direct influence over reliability standards across engineering
- High-visibility role with architectural impact
- Collaborative engineering culture designed for speed and depth
- Competitive base compensation ($210K–$250K)
If you're a Staff-level reliability engineer who wants real ownership and architectural influence — let’s start the conversation.
StratITech is partnering with our San Francisco client to build the next generation of high-scale ML infrastructure.
See All 49 Site Reliability Engineer Jobs in San Francisco
Find roles in San Francisco that match your experience and apply in just a few clicks.
Site Reliability Engineer Job Market in San Francisco
Who's Hiring
- Braze (3)
- vapi (2)
- Unify (2)
- Harvey (2)
- Airbyte (2)
Top Industries Hiring
- Technology & Software (26)
- Banking & Financial Services (4)
- Social Media (2)
- Media & Entertainment (2)
- Cybersecurity (1)
Site Reliability Engineer Jobs in San Francisco: Frequently Asked Questions
How do I get a site reliability engineer job in San Francisco?
Target San Francisco's density of cloud-native companies in SoMa and Mission Bay, plus the fintech and enterprise software firms in the Financial District. Hands-on experience with Kubernetes, Terraform, and observability tooling stands out in this market. Contributing to open-source infrastructure projects and networking through local DevOps and SRE meetups in the city give candidates a concrete edge over remote applicants.
Which companies hire site reliability engineers in San Francisco?
San Francisco site reliability engineer roles are currently posted by Braze, vapi, Unify, and others, based on listings on Migrate Mate as of June 2026. The city's hiring base is especially strong among high-growth cloud platforms, consumer tech companies, and financial technology firms headquartered in SoMa and the Financial District.
Are there remote site reliability engineer jobs in San Francisco?
Yes, though many SRE roles require at least some on-site presence given hands-on infrastructure and incident response responsibilities. About 37% of site reliability engineer openings tied to San Francisco are remote or hybrid as of June 2026, skewing toward hybrid schedules. Automation, monitoring, and on-call engineering tasks are most commonly performed remotely within San Francisco-based teams.
How can I get a site reliability engineer job in San Francisco with little or no experience?
The most realistic entry path in San Francisco is moving laterally from a systems administration or DevOps engineer role at one of the city's mid-size SaaS or fintech companies, which tend to hire junior SRE talent more actively than large enterprises. Building skills in Linux, cloud platforms like AWS or GCP, and basic scripting, then applying to associate infrastructure or platform engineer roles, is the clearest route into the field locally.
Which industries hire the most site reliability engineers in San Francisco?
San Francisco site reliability engineer roles concentrate in Technology & Software, Banking & Financial Services, and Social Media, based on current listings on Migrate Mate as of June 2026. San Francisco's position as a global hub for cloud-native software, financial technology, and consumer platforms means these sectors consistently generate the highest volume of SRE openings in the city.