Incident Manager Jobs in California
Incident Manager jobs in California are among the most active in the country, concentrated in technology, financial services, healthcare, and state and local government, with openings from junior IT service desk coordinators through senior enterprise incident management leads. The biggest hiring markets are San Jose, San Francisco, and Los Angeles, where companies like Apple, Kaiser Permanente, and Wells Fargo maintain large operations and consistently hire for incident management functions. Demand is strongest for professionals specializing in IT service management, cybersecurity incident response, and critical infrastructure operations. Find a role that fits below and apply directly.
Showing 5 of 9+ Incident Manager jobs
INTRODUCTION
Lambda, The Superintelligence Cloud, is a leader in AI cloud infrastructure serving tens of thousands of customers. Our customers range from AI researchers to enterprises and hyperscalers. Lambda's mission is to make compute as ubiquitous as electricity and give everyone the power of superintelligence. One person, one GPU.
If you'd like to build the world's best AI cloud, join us.
We are seeking a Senior Incident Manager to lead critical incident response across our AI data center infrastructure. This role is responsible for coordinating rapid resolution of service-impacting events, improving operational resilience, and driving incident management best practices across infrastructure, networking, platform engineering, and data center operations.
ROLE OVERVIEW
The Senior Incident Manager is responsible for leading the end-to-end lifecycle of operational incidents impacting AI infrastructure and data center services. This individual acts as the central command point during major incidents, ensuring rapid triage, cross-team coordination, effective communication, and structured post-incident analysis.
This role requires deep operational expertise in high-availability infrastructure, large-scale GPU clusters, networking, and cloud platforms, along with strong leadership and communication skills.
What You’ll Do
Incident Leadership
- Lead the response to critical (SEV-1 / SEV-2) incidents impacting AI infrastructure, GPU clusters, networking, storage, and data center operations.
- Serve as the Incident Commander during major outages, coordinating engineering, networking, facilities, and vendor teams.
- Act as the liaison between leadership and external teams during and after incidents, providing updates and status summaries.
- Establish clear incident timelines, triage actions, and resolution plans.
Incident Management Operations
- Own the incident response lifecycle, including:
  - Assisting Technical Triage
  - Escalation
  - Coordination
  - Resolution
  - Post-incident review
- Ensure timely and accurate communication with internal stakeholders and leadership.
- Maintain incident response documentation and operational playbooks.
- Analyze incidents to identify patterns and trends that can improve response and system reliability.
- Participate in an on-call rotation to respond to, lead, and coordinate incidents.
Cross-Functional Coordination
- Work closely with:
  - Data center operations
  - Infrastructure engineering & operations
  - Network engineering
  - Platform reliability engineering
  - Security operations
  - Hardware and facility vendors
- Drive alignment during outages involving multiple infrastructure layers.
Post-Incident Analysis & Continuous Improvement
- Lead post-incident reviews (PIRs) and root cause analysis.
- Identify systemic reliability gaps and implement corrective actions.
- Track incident metrics including MTTR, MTTD, and incident recurrence rates.
Operational Excellence
- Improve incident response processes, escalation paths, and tooling by working with technical support and engineering teams.
- Contribute to runbooks, operational standards, and reliability frameworks.
- Support implementation of automation and observability improvements.
Communication & Reporting
- Provide executive-level incident summaries and reports.
- Deliver clear, concise updates during active incidents.
- Maintain incident dashboards and operational health reporting.
YOU
- 8+ years of experience in incident management, site reliability engineering, or infrastructure operations.
- Experience managing incidents in large-scale distributed infrastructure environments.
- Strong understanding of:
  - Data center operations
  - GPU compute clusters
  - Networking and storage infrastructure
  - Cloud or hybrid infrastructure platforms
- Proven ability to lead high-pressure incident response situations.
- Experience with incident management frameworks (ITIL, SRE, or equivalent).
- Excellent communication and stakeholder management skills.
- Experience with incident tracking and monitoring tools such as:
  - PagerDuty
  - ServiceNow
  - Jira
  - Datadog
  - Prometheus / Grafana
NICE TO HAVE
- Experience operating AI or HPC infrastructure.
- Background in SRE, infrastructure engineering, or data center operations.
- Familiarity with high-density GPU environments (NVIDIA clusters, InfiniBand networks).
- Experience with hyperscale or colocation data center environments.
- Knowledge of automation and incident response tooling.
- Knowledge of and experience with the Incident Command System (ICS).
- Experience building and leading an incident command function from scratch.
KEY COMPETENCIES
- Incident Command & Leadership
- Operational Decision Making
- Cross-Team Coordination
- Root Cause Analysis
- Crisis Communication
- Infrastructure Reliability
WHAT SUCCESS LOOKS LIKE IN THIS ROLE
- Reduced Mean Time to Resolution (MTTR) for critical incidents.
- Improved cross-team incident coordination.
- High-quality post-incident reviews and corrective actions.
- Increased infrastructure reliability and operational maturity.
SALARY RANGE INFORMATION
The annual salary range for this position has been set based on market data and other factors. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.
ABOUT LAMBDA
- Founded in 2012, with 500+ employees, and growing fast.
- Our investors notably include TWG Global, US Innovative Technology Fund (USIT), Andra Capital, SGW, Andrej Karpathy, ARK Invest, Fincadia Advisors, G Squared, In-Q-Tel (IQT), KHK & Partners, NVIDIA, Pegatron, Supermicro, Wistron, Wiwynn, Gradient Ventures, Mercato Partners, SVB, 1517, and Crescent Cove.
- We have research papers accepted at top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG.
- Our values are publicly available: https://lambda.ai/careers.
- We offer generous cash & equity compensation.
- Health, dental, and vision coverage for you and your dependents.
- Wellness and commuter stipends for select roles.
- 401k Plan with 2% company match (USA employees).
- Flexible paid time off plan that we all actually use.
A FINAL NOTE:
You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.
EQUAL OPPORTUNITY EMPLOYER
Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
Compensation Range: $125K - $195K
See All 9 Incident Manager Jobs in California
Find roles in California that match your experience and apply in just a few clicks.
Incident Manager Jobs by City in California
Where California roles are concentrated, by current openings.
Incident Manager Job Market in California
A snapshot from current California openings, updated as new roles post.
Who's Hiring
- Block: 2
- Anthropic: 1
- Apple: 1
- CRUSOE: 1
- Cockroach Labs: 1
Top Industries Hiring
- Consulting & Professional Services: 2
- Electronics & Hardware: 2
- Technology & Software: 2
- Science & Research: 1
What California Employers Look For
The qualifications that appear most often in incident manager jobs across California.
- ITIL 4 Foundation certification or higher required by most California enterprise employers
- Three or more years of IT incident management or IT operations experience in a production environment
- Proficiency with ITSM platforms such as ServiceNow, Jira Service Management, or PagerDuty
- Demonstrated ability to lead major incident response calls and coordinate cross-functional technical teams
- Experience writing post-incident reviews, root cause analyses, and stakeholder communications
- Bachelor's degree in information technology, computer science, or a related technical field preferred
Incident Manager Jobs in California: Frequently Asked Questions
How do you become an incident manager in California?
Most California employers expect a combination of hands-on IT operations experience and formal certification, with ITIL 4 Foundation being the most widely required credential in job listings. A bachelor's degree in information technology or computer science is the common educational baseline, though candidates who progress from roles like systems administrator, NOC analyst, or IT support engineer often qualify without one. California has no state-issued license specific to incident management, so certifications from recognized bodies and a documented record of leading major incidents carry the most weight with hiring managers.
Which companies hire incident managers in California?
Employers hiring incident managers in California right now include Block, Anthropic, and Apple, based on current listings on Migrate Mate as of June 2026. California's density of large tech headquarters, regional bank operations centers, and integrated health systems like Kaiser Permanente makes it one of the most consistent states for incident management hiring year-round.
Which California cities have the most incident manager jobs?
San Francisco, San Jose, and Cupertino account for the largest share of incident manager openings in California. The San Jose and San Francisco concentration reflects the Bay Area's technology industry, where major platforms and cloud providers maintain large site reliability and IT operations teams, while Los Angeles draws demand from media, finance, and healthcare organizations that require dedicated incident management functions.
Are there remote incident manager jobs in California?
Yes. Incident management is largely coordination and communication work that can run over video and chat tooling without a physical presence, which makes it more remote-friendly than many technical roles. About 11% of incident manager openings tied to California are remote or hybrid as of June 2026, which points to employer appetite for distributed operations coverage. The most remote-friendly sub-roles tend to be major incident managers and IT service management leads, where the work centers on bridge calls, runbooks, and stakeholder updates rather than on-site infrastructure.
How can I get hired as an incident manager in California with little or no experience?
The most realistic entry path is through a NOC analyst, IT help desk, or systems administrator role at a mid-size or large California employer, where handling escalations and on-call rotation builds the foundational skills hiring managers look for. Companies like Kaiser Permanente and large Bay Area tech firms run associate IT operations programs that place entry-level candidates into tiered support functions with a clear progression toward incident management. Earning an ITIL 4 Foundation certification before applying signals seriousness without requiring hands-on experience, and candidates who can show documented participation in even minor incident response drills or tabletop exercises stand out at the resume screen.
Where can I find and apply to incident manager jobs in California?
You can find and apply to incident manager jobs in California on Migrate Mate, which lists current California openings and updates them regularly. Search the listings for roles that match your experience level and specialization, then apply directly to the ones that fit.