Reliability Engineer Jobs in USA with Visa Sponsorship
Reliability engineers build and maintain systems that keep critical infrastructure running smoothly. For H-1B visa sponsorship, your role needs to require specialized engineering knowledge typically gained through a bachelor's degree in engineering, computer science, or a related field. Most positions qualify as specialty occupations under immigration law. For detailed occupation requirements, see the O*NET profile.
Overview
See all 955+ Reliability Engineer jobs
Sign up for free to unlock all listings, filter by visa type, and get alerts for new Reliability Engineer roles.
INTRODUCTION
At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm combines creativity and innovative technology to deliver industry-leading digital solutions. Synechron’s progressive technologies and optimization strategies span end-to-end Artificial Intelligence, Consulting, Digital, Cloud & DevOps, Data, and Software Engineering, servicing an array of noteworthy financial services and technology firms. Through research and development initiatives in our FinLabs, we develop solutions for modernization, from Artificial Intelligence and Blockchain to Data Science models, Digital Underwriting, mobile-first applications and more. Over the last 20+ years, our company has been honored with multiple employer awards, recognizing our commitment to our talented teams. With top clients to boast about, Synechron has a global workforce of 14,500+, and has 58 offices in 21 countries within key global markets.
ROLE AND RESPONSIBILITIES
The Resiliency & Recovery Engineer is a senior, hands-on engineering role focused on improving production resiliency and recovery outcomes across critical services and payment rails. This role is responsible for driving measurable improvements such as faster recovery (reduced time to restore service), stronger and actionable alert coverage, increased automation to reduce manual toil, and safer releases with repeatable rollback/cutback readiness. The engineer will partner closely with application teams, DevOps, Infrastructure, Database teams, and operational stakeholders to identify resiliency gaps, prioritize remediation, and implement durable solutions that improve stability and reduce customer impact.
- Work across all MMC payment rails to develop faster, more repeatable resiliency and recovery processes that benefit every platform, ensuring these enhancements are adopted broadly across the organization rather than siloed on any single platform.
- Identify resiliency gaps based on incident patterns and recurring failures; turn findings into prioritized remediation work.
- Build/strengthen monitoring, alerting, and dashboards that are actually used by engineers and leadership.
- Create runbooks and automate recovery actions to reduce manual toil and human error during incidents.
- Improve release safety and rollback/fallback readiness (clear, repeatable cutback procedures).
- Support SQL reliability efforts (SQL Server 2022 focus) in partnership with DB/infrastructure teams.
- Own the backlog, prioritization, design reviews, and cross-team coordination (Ops/Product/Tech).
- Run the weekly standup and prepare the bi-weekly executive readout.
- Integrate resilience testing into CI/CD pipelines and DevOps workflows to catch issues early and ensure robust, automated releases.
- Conduct chaos engineering experiments (failure injections, game days) to proactively uncover system weaknesses and validate recovery processes under real-world failure scenarios.
- Document and share resiliency best practices; mentor and train engineering teams to foster a culture of reliability and continuous improvement across the organization.
- Ensure a seamless handoff of all newly created resiliency and recovery practices (once mature and repeatable) to the MMC Engineering team by thoroughly documenting the improvements and conducting knowledge transfer, so that the permanent team can sustain and build upon these enhancements after the contract period.
BASIC QUALIFICATIONS
- Proven experience in high-availability, high-transaction environments (preferably payments or financial services).
- Strong background in production resiliency and recovery (recovery execution, runbooks/playbooks, RCA mindset).
- Experience with incident pattern analysis, MTTR baselines (P2 Major/Minor), and recurring-failure taxonomy (by rail/service).
- Senior-level observability expertise: dashboards, monitors, and alerts (Datadog preferred; similar tools considered).
- Tools: Splunk, Datadog, SQL, JQL (Jira Query Language), GitLab.
- Experience with CI/CD metrics and with generating executive reports on code quality, changes, and test automation from GitLab.
- Understanding of story quality, metrics, and monitoring practices, with the ability to gather data that showcases deficiencies.
- Senior CI/CD experience: pipeline design/operation, release safety patterns, and rollback readiness.
- Experience using metrics and monitoring data to identify and communicate deficiencies.
- Automation skills: Python and/or PowerShell (or equivalent) for building repeatable recovery workflows and operational tooling.
- Kubernetes/container platform production troubleshooting (deployments, pods, config drift, safe restarts, and “why did this change break prod” investigations).
- Experience with identity/credentials/certificate & secret-rotation resilience (preventing outages during password rotations, certificate upgrades, and secret propagation; implementing guardrails and monitoring for these events).
- Batch/scheduler/job-execution reliability (detecting/preventing silent job failures, validating multi-DC scenarios, and building controls to ensure scheduled processing does not impact customers).
- Distributed integration failure-handling (timeouts, retries, backpressure, idempotency, duplicate prevention, and reconciliation—especially across vendor/downstream dependencies).
PREFERRED QUALIFICATIONS
- Experience with SRE-style reliability practices (SLO/SLI thinking, error budgets, operational metrics).
- Experience with failover / DC flip / active-active or active-passive recovery concepts and scenario-based runbooks.
- Cloud Engineering (Azure, AWS).
- DevOps tools expertise (Jenkins, Terraform, SonarQube, Helm charts).
- Network & traffic-management incident triage (load balancers/firewalls/VLAN changes, DC traffic flips, and rapid isolation of “app vs infra vs network” to stabilize service).
COMPENSATION
- A highly competitive compensation and benefits package.
- A multinational organization with 58 offices in 21 countries and the possibility to work abroad.
- 10 days of paid annual leave (plus sick leave and national holidays).
- Maternity & paternity leave plans.
- A comprehensive insurance plan including medical, dental, vision, life insurance, and long-/short-term disability (plans vary by region).
- Retirement savings plans.
- A higher education certification policy.
- Commuter benefits (varies by region).
- Extensive training opportunities, focused on skills, substantive knowledge, and personal development.
- On-demand Udemy for Business for all Synechron employees with free access to more than 5000 curated courses.
- Coaching opportunities with experienced colleagues from our Financial Innovation Labs (FinLabs) and Centers of Excellence (CoE) groups.
- Cutting-edge projects at the world’s leading tier-one banks, financial institutions and insurance firms.
- A flat and approachable organization.
- A truly diverse, fun-loving, and global work culture.
SYNECHRON’S DIVERSITY & INCLUSION STATEMENT
Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture – promoting equality, diversity and an environment that is respectful to all. We strongly believe that, as a global company, a diverse workforce helps build stronger, more successful businesses. We encourage applicants of all backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, and disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more. All employment decisions at Synechron are based on business needs, job requirements and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.

Tips for Finding Visa Sponsorship as a Reliability Engineer
Emphasize systems engineering expertise
Highlight experience with complex distributed systems, monitoring tools, and failure analysis. USCIS values technical depth that demonstrates specialized knowledge beyond general IT support roles.
Document automation and tooling skills
Detail your work with infrastructure as code, CI/CD pipelines, and custom monitoring solutions. These technical specializations strengthen the specialty occupation case for visa petitions.
Target companies with proven SRE practices
Look for employers with established site reliability engineering teams and mature DevOps cultures. These companies understand the role's technical requirements and sponsor more confidently.
Showcase incident response leadership
Demonstrate experience leading post-mortems, designing fault-tolerant systems, and reducing mean time to recovery. This operational expertise differentiates reliability engineers from general software developers.
Highlight cross-functional collaboration
Document your work bridging development and operations teams, establishing SLAs, and implementing observability practices. This systems-thinking approach supports the argument that the role is specialized.
Consider related engineering titles
Site Reliability Engineer, DevOps Engineer, and Platform Engineer roles often have similar responsibilities and sponsor H-1B visas. Don't limit your search to exact title matches.
Frequently Asked Questions
Does a reliability engineer role qualify for H-1B sponsorship?
Yes, reliability engineer positions typically qualify as specialty occupations because they require specialized engineering knowledge in systems design, automation, and infrastructure management. The role demands technical expertise typically gained through an engineering or computer science degree, meeting H-1B requirements.
What degree do I need for reliability engineer H-1B sponsorship?
A bachelor's degree in computer science, software engineering, electrical engineering, or related technical field is typically required. Some employers accept equivalent combinations of education and experience, but a relevant engineering degree provides the strongest foundation for visa approval.
How do reliability engineers compare to software engineers for visa sponsorship?
Both roles commonly receive H-1B sponsorship, but reliability engineers may face additional scrutiny to distinguish their specialized systems focus from general software development. Emphasizing infrastructure expertise, automation skills, and operational responsibilities helps establish the role's technical depth and specialty occupation status.
Do startups sponsor reliability engineers for work visas?
Yes, but less frequently than established tech companies. Startups may not have dedicated SRE teams or the resources for visa sponsorship. Target growth-stage companies that have scaled beyond initial product development and need specialized reliability expertise to maintain their infrastructure.
How do I find Reliability Engineer jobs with visa sponsorship?
To find Reliability Engineer jobs with visa sponsorship, use Migrate Mate, which specializes in connecting international candidates with sponsoring employers. Focus on manufacturing companies, tech firms, aerospace, and energy sectors that frequently hire Reliability Engineers on H-1B, O-1, or EB-3 visas. Filter searches specifically for positions requiring reliability analysis, maintenance optimization, and failure prevention expertise.
Can I transfer from a software developer role to reliability engineer on the same H-1B?
You'll likely need to file an H-1B amendment if the job duties change significantly, even within the same company. While both are engineering roles, reliability engineering's focus on operations and infrastructure may require documenting the role's continued specialty occupation status through the amendment process.
What is the prevailing wage requirement for sponsored Reliability Engineer jobs?
U.S. employers sponsoring a visa must pay at least the prevailing wage, which is what workers in the same role, area, and experience level typically earn. The Department of Labor sets this rate to make sure companies aren't hiring foreign workers simply because they'd accept lower pay than a U.S. worker. It varies by job title, location, and experience. You can look up current prevailing wage rates for any occupation and location using the OFLC Wage Search page.