TN Visa Senior Cloud Platform Engineer Jobs
Senior Cloud Platform Engineer roles can qualify for TN status under the USMCA's Engineer category, which requires a bachelor's degree in computer science, engineering, or a related field. Canadian citizens can apply for TN status directly at a U.S. port of entry, with no lottery and no annual cap. Mexican nationals must first obtain a TN visa through a consular interview; the annual cap that once applied to them expired in 2004.
INTRODUCTION
The era of pervasive AI has arrived. In this era, organizations will use generative AI to unlock hidden value in their data, accelerate processes, reduce costs, and drive efficiency and innovation, fundamentally transforming their businesses and operations at scale.
SambaNova Suite™ is the first full-stack, generative AI platform, from chip to model, optimized for enterprise and government organizations. Powered by the intelligent SN40L chip, the SambaNova Suite is a fully integrated platform, delivered on-premises or in the cloud, combined with state-of-the-art open-source models that can be easily and securely fine-tuned using customer data for greater accuracy. Once adapted with customer data, customers retain model ownership in perpetuity, so they can turn generative AI into one of their most valuable assets.
ABOUT SAMBANOVA SYSTEMS
Join the company that's building the future of AI computing. At SambaNova, we are disrupting the AI and high-performance computing space with our integrated hardware and software platform. Our DataScale systems and SambaFlow software are pushing the boundaries of what's possible with generative AI and large language models. We are a team of passionate innovators tackling some of the world's most challenging computational problems.
THE ROLE
As a Senior Cloud Site Reliability Engineer (SRE) specializing in our AI Inferencing Service, you will be the guardian of its reliability, performance, and scalability. You will bridge the gap between software development and operations, applying an engineering mindset to solve operational challenges. Your primary focus will be ensuring our inference endpoints have exceptional uptime, low-latency response times, and efficient resource utilization, directly impacting the experience of our customers and the success of our AI products. This role includes participating in a shared on-call rotation to maintain 24/7 service reliability.
Service Ownership & On-Call: Take shared ownership of the production inferencing service, including its availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning across multiple regions. This includes implementing and supporting AI infrastructure in new regions, such as Asia, Europe, and Latin America, to support the growth of our business. Participate in a balanced on-call rotation to provide 24/7 support for the service.
ON-CALL & WORK-LIFE BALANCE
We believe a sustainable on-call schedule is critical for long-term success and team health. Our on-call philosophy is built on the following principles:
- Balanced Rotation: The on-call rotation is shared equally across the team, typically following a primary/secondary (follow-the-sun) model to ensure no single person bears a disproportionate burden.
- Focus on Prevention: We invest heavily in automation, robust testing, and system design to prevent pages before they happen. The goal of on-call is not to heroically fight fires, but to manage rare, complex failures and use those learnings to make the system more resilient.
- Actionable Alerts: We have a strict policy against alert fatigue. Alerts must be actionable and require immediate human intervention.
- Incident Management: Lead the response to incidents affecting the inferencing service, driving blameless post-mortems and implementing corrective actions to prevent recurrence.
- Monitoring & Alerting: Develop and maintain advanced monitoring, alerting, and dashboarding (using tools like Prometheus, Grafana, Datadog) to gain deep insights into service health, model performance (e.g., latency, throughput, error rates), and accelerator utilization. A key responsibility is ensuring alerts are actionable and have a low false-positive rate, minimizing on-call fatigue.
- Performance & Scalability: Proactively identify and eliminate performance bottlenecks. Design and implement auto-scaling policies to handle variable inference loads cost-effectively. Use insights from on-call incidents to drive improvements that enhance system stability and scalability.
- Infrastructure as Code (IaC): Manage and evolve our cloud infrastructure (on AWS, GCP, and/or Azure along with on-prem) using tools like Terraform and Ansible, ensuring it is secure, repeatable, and scalable.
- CI/CD & Automation: Champion automation by building and improving CI/CD pipelines for the seamless and safe deployment of new model versions and service updates. A core goal is to automate manual toil identified during on-call shifts, reducing future operational overhead.
- Capacity Planning: Forecast infrastructure needs based on product roadmaps and usage trends. Work with finance and engineering teams to manage cloud costs and optimize spending.
- SLOs & SLIs: Define, measure, and report on Service Level Objectives (SLOs) and Indicators (SLIs) for the inferencing platform, using data to drive prioritization and reliability investments.
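As a rough illustration of the SLOs & SLIs item above, the sketch below computes an availability SLI and the share of the error budget consumed over a reporting window. The function name, the request counts, and the 99.9% target are all hypothetical values chosen for the example; they are not taken from SambaNova's service.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float              # observed fraction of good requests
    error_budget: float     # allowed fraction of bad requests (1 - target)
    budget_consumed: float  # share of the error budget already spent


def availability_slo_report(good: int, total: int, target: float) -> SloReport:
    """Summarize an availability SLO for one window (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= good <= total:
        raise ValueError("good must be between 0 and total")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")

    sli = good / total
    error_budget = 1.0 - target
    # Bad requests observed, divided by the bad requests the SLO allows.
    budget_consumed = (total - good) / (total * error_budget)
    return SloReport(sli, error_budget, budget_consumed)


# Assumed numbers: 1,000,000 inference requests, 500 failures, 99.9% target.
report = availability_slo_report(good=999_500, total=1_000_000, target=0.999)
print(f"SLI: {report.sli:.4%}")
print(f"Error budget consumed: {report.budget_consumed:.0%}")
```

A budget consumption figure like this is one common way to turn SLO data into a prioritization signal: when it approaches 100%, reliability work takes precedence over new releases.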
WHAT WE'RE LOOKING FOR (MUST-HAVES)
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
- 5-8+ years of experience in a Site Reliability Engineer, DevOps, or related role supporting a large-scale, customer-facing service in a public cloud environment (AWS, GCP, Azure).
- Strong programming/scripting skills in languages like Python, Go, or Java.
- Proven experience with containerization and orchestration technologies (Docker, Kubernetes).
- Deep understanding of monitoring and observability principles and tools (e.g., Prometheus, Grafana, ELK Stack, Datadog).
- Solid experience with Infrastructure as Code (e.g., Terraform, CloudFormation).
- Familiarity with CI/CD principles and tools (e.g., Jenkins, GitHub Actions, ArgoCD).
- Excellent problem-solving skills and a systematic approach to troubleshooting complex distributed systems.
WHAT WILL MAKE YOU STAND OUT (NICE-TO-HAVES)
- Experience in a hybrid environment bridging cloud and on-premise/data center infrastructure.
- Direct experience supporting ML/AI inferencing services in production.
- Familiarity with GPU-accelerated computing and optimizing workloads for NVIDIA GPUs for purposes of mapping to RDUs.
- Knowledge of model serving frameworks like vLLM, SGLang or Ray.
- Understanding of MLOps principles and practices.
- Experience with managing and tuning databases (SQL or NoSQL) and caching systems (Redis, Memcached).
- Strong Linux/Unix system administration fundamentals.
WHY SAMBANOVA?
- Massive Impact: You will be a key part of a critical platform with high visibility and direct impact on our product and engineers.
- Cutting-Edge Technology: Work with a world-class team on one of the most advanced AI stacks in the industry.
- Autonomy and Growth: We trust you to make technical decisions. This is a greenfield opportunity to build something remarkable from the ground up.
- Competitive Compensation: Including equity, excellent benefits, and a flexible work environment.
SUBMISSION GUIDELINES
Please note that in order to be considered an applicant for any position at SambaNova Systems, you must submit an application form for each position for which you believe you are qualified.
EEO POLICY
SambaNova Systems is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to age (40 and over), color, disability, gender identity, genetic information, marital status, military or veteran status, national origin/ancestry, race, religion, creed, sex (including pregnancy, childbirth, breastfeeding), sexual orientation, or any other applicable status protected by federal, state, or local laws.
BENEFITS SUMMARY FOR US-BASED, FULL-TIME EMPLOYMENT POSITIONS
SambaNova offers a competitive total rewards package, including base salary, equity, and benefits. We cover 95% of the medical insurance premium for employees and 77% for dependents, and offer a Health Savings Account (HSA) with an employer contribution. We also offer Dental, Vision, Short/Long Term Disability, Basic Life, Voluntary Life, and AD&D insurance plans, in addition to Flexible Spending Account (FSA) options such as Health Care, Limited Purpose, and Dependent Care. Our library of well-being benefits available to you and your dependents includes a full subscription to Headspace, Gympass+ membership with access to physical gyms, One Medical membership, counseling services with an Employee Assistance Program, and much more.

Tips for Finding TN Visa Sponsorship as a Senior Cloud Platform Engineer
Align your credentials to TN engineer requirements
TN classification for engineers requires a degree in a field directly related to the role. A computer science or software engineering degree maps cleanly to cloud platform work. A business or unrelated degree creates a documentation problem you'll need to address before applying.
Target employers with recent visa filing experience
Focus your job search on employers with recent visa filings for technical roles, which signals that they are experienced with work visa sponsorship and understand the documentation process. Once you identify a suitable role, your employer will prepare a support letter outlining your qualifications and the details of the role. You present this letter at the U.S. port of entry (for Canadian citizens) or the U.S. consulate (for Mexican citizens) as part of your TN application.
Search TN-sponsored roles through Migrate Mate
Migrate Mate filters Senior Cloud Platform Engineer jobs by employers with TN visa sponsorship history, so you're not cold-applying to companies that will decline at the offer stage. Start your search there to concentrate on roles where sponsorship is already part of the hiring conversation.
Clarify your job title before the offer is signed
USCIS and CBP scrutinize job title alignment with TN engineer criteria. If your offer letter says 'Cloud Operations Manager' instead of 'Cloud Platform Engineer,' you may face a status denial. Negotiate the title to match your actual engineering duties before finalizing the offer.
Prepare a technical duties letter for port-of-entry
If you are a Canadian applying for TN status at a land border or airport, you must present a detailed employer support letter outlining your engineering responsibilities, the required degree, and the duration of employment. A vague letter is a common reason for CBP delays on otherwise clean TN applications.
Understand the difference in processing paths by nationality
Canadian citizens can apply for TN status directly at a U.S. port of entry with no advance petition filing. Mexican nationals must schedule a consular interview and obtain a TN visa before traveling; they are no longer subject to an annual TN cap, which expired in 2004. Confirm your nationality's pathway with USCIS guidance before you accept an offer timeline from your employer.
Senior Cloud Platform Engineer TN Visa: Frequently Asked Questions
Does a Senior Cloud Platform Engineer role qualify for TN visa status?
Yes, provided your employer's job description frames the role as engineering work requiring a bachelor's degree in computer science, engineering, or a closely related field. Roles weighted toward management, operations strategy, or business oversight without clear engineering duties can fail TN classification. The technical depth of your day-to-day responsibilities, not just your title, determines eligibility.
How does TN visa sponsorship compare to H-1B for cloud platform engineering jobs?
TN is significantly more accessible for Canadian and Mexican engineers. It has no annual cap and no lottery, and Canadians can be processed at a port of entry, often in a single day. H-1B requires selection in the annual lottery, a wait of roughly six months between the March registration and the earliest October 1 start date, and USCIS petition adjudication. For engineers who qualify under USMCA, TN removes the uncertainty that makes H-1B sponsorship a bottleneck for many employers.
What documents does my employer need to prepare for my TN sponsorship?
Your employer must provide a support letter on company letterhead that specifies your job title, a description of engineering duties, your required qualifications, the length of employment, and your compensation. Canadians presenting at a port of entry need this letter in hand. Mexican nationals need it as part of the consular packet submitted to a U.S. consulate. The letter serves as evidence of your job offer and establishes that you meet the TN occupational requirements for your role.
Where can I find Senior Cloud Platform Engineer jobs with TN visa sponsorship?
Migrate Mate is built specifically for Canadian and Mexican professionals searching for TN-eligible engineering roles in the United States. It surfaces employers who have active sponsorship history for TN visa categories, so you can focus your applications on companies that are already prepared to support the process rather than those encountering it for the first time.
Can I switch employers after starting a Senior Cloud Platform Engineer role on TN status?
Yes, but TN status is tied to a specific employer. If you change jobs, your new employer must support a fresh TN application, and you cannot begin the new job until it is approved. Canadians can handle this at a port of entry with a new support letter. Alternatively, the new employer can file Form I-129 with USCIS while you remain in the United States, an option that is also open to Mexican nationals who would otherwise reapply at a consulate. There is no portability mechanism equivalent to what H-1B holders have under AC21, so plan your transition timeline carefully.