Cloud Infrastructure Engineer Jobs in Massachusetts

Cloud Infrastructure Engineer jobs in Massachusetts are open across Boston, Burlington, Cambridge, and other Massachusetts metros, with employers like State Street, Humana, and AcuityMD hiring at every experience level. Find a role that fits below and apply directly.

Find Jobs

Overview

Open roles: 36+

Top employer: State Street

Top city: Boston, MA

Cities hiring: 10

Work type: 56% On-site

Top industry: Technology

Showing 5 of 36+ Cloud Infrastructure Engineer jobs

ML Systems Engineer, Infrastructure & Cloud

Added 7mo ago

Basis Research Institute

Cambridge, Massachusetts

Cloud & DevOps

Software Engineering

Data Science & Analytics

Technical Product & Program Management

Cloud Engineering

Data Science

DevOps

Technical Program Management

Salary TBD

On-Site

None

Senior Infrastructure Engineer

Added 1w ago

Definitive Healthcare

Framingham, Massachusetts

Cloud & DevOps

IT Support & Systems Administration

Quality Assurance & Testing (QA Testing)

Cloud Engineering

Network Engineering

Systems Administration

Automation QA

$173k/yr

4+ yrs exp.

On-Site

None

Infrastructure Engineer

Added 3w ago

RTX Corporation

Gloucester, Massachusetts

IT Support & Systems Administration

Cloud & DevOps

IT Support

Systems Administration

DevOps

Salary TBD

On-Site

Associate's

Senior ASIC Front End Infrastructure Engineer

Added 3w ago

NVIDIA

Westford, Massachusetts

Software Engineering

Data Science & Analytics

Cloud & DevOps

Data Science

DevOps

Frontend Engineering

$184k - $357k/yr

8+ yrs exp.

On-Site

Master's

Build Infrastructure Engineer

Added 1mo ago

Realtime Robotics, Inc.

Boston, Massachusetts

Cloud & DevOps

Technical Product & Program Management

DevOps

Cloud Engineering

Technical Program Management

Salary TBD

Hybrid

Bachelor's

Added 7mo ago

ML Systems Engineer, Infrastructure & Cloud

Basis Research Institute · Cambridge, Massachusetts

Job functions:

Cloud & DevOps

Software Engineering

Data Science & Analytics

Technical Product & Program Management

Cloud Engineering

Data Science

DevOps

Technical Program Management

On-Site

None

Full Time

Job Description

About Basis

Basis is a nonprofit applied AI research organization with two mutually reinforcing goals.

The first is to understand and build intelligence. This means to establish the mathematical principles of what it means to reason, to learn, to make decisions, to understand, and to explain; and to construct software that implements these principles.

The second is to advance society’s ability to solve intractable problems. This means expanding the scale, complexity, and breadth of problems that we can solve today, and even more importantly, accelerating our ability to solve problems in the future.

To achieve these goals, we’re building both a new technological foundation that draws inspiration from how humans reason, and a new kind of collaborative organization that puts human values first.

About the Role

ML Systems Engineers at Basis ensure training and evaluation infrastructure is fast, reliable, and scalable. You will own the full stack from distributed training frameworks through cloud administration, making it possible for researchers to iterate quickly on complex models while managing computational resources efficiently.

We are looking for engineers who combine deep understanding of ML systems with operational excellence. The ideal ML Systems Engineer has experience with distributed training at scale, understands the intricacies of debugging numerical instabilities, and can manage cloud infrastructure that scales from experiments to production. You will be the guardian of training stability, the optimizer of compute costs, and the enabler of reproducible research.

This role spans traditional ML engineering and cloud/DevOps responsibilities. You will manage GPU clusters, optimize cloud spending, ensure security and compliance, and build the infrastructure that lets researchers focus on algorithms rather than operations.

We seek individuals who aspire to build robust ML infrastructure, maintain a “logbook culture” for documenting issues and solutions, and treat operational excellence as a first-class concern.

We expect you to:

Have demonstrated expertise in ML systems engineering. Examples include:
Managing distributed training jobs across hundreds of GPUs
Debugging and fixing numerical instabilities in large-scale training
Building infrastructure for reproducible ML experiments
Optimizing training throughput and resource utilization
Possess deep knowledge of distributed training frameworks including PyTorch/JAX distributed strategies (DDP, FSDP, ZeRO), gradient accumulation, mixed precision training, and checkpoint/recovery systems.
Have strong cloud administration skills including AWS/GCP/Azure services, infrastructure as code (Terraform), Kubernetes orchestration, cost optimization, security best practices, and compliance requirements.
Understand the full ML stack from hardware (GPUs, interconnects, storage) through frameworks (PyTorch, JAX) to high-level training loops and evaluation pipelines.
Be skilled at debugging complex failures across the stack—GPU/NCCL issues, data loading bottlenecks, memory leaks, gradient explosions, and convergence problems.
Value documentation and knowledge sharing. You maintain comprehensive logs of issues encountered, solutions found, and lessons learned, building institutional knowledge.
Progress with autonomy while coordinating closely with researchers. You can anticipate infrastructure needs, prevent problems before they occur, and respond quickly when issues arise.

In addition, the following would be an advantage:

Experience at organizations training large models (OpenAI, Anthropic, Google, Meta).
Background in both ML research and production systems.
Contributions to ML frameworks or distributed training libraries.
Experience with on-premise GPU cluster management.
Knowledge of optimization theory and numerical methods.
Understanding of robotics-specific infrastructure requirements.

Responsibilities:

Own distributed training infrastructure including job launchers, checkpointing systems, recovery mechanisms, and monitoring that ensures experiments run reliably at scale.
Debug and resolve training failures by diagnosing issues across GPUs, networking, numerics, and data pipelines, maintaining detailed logs of problems and solutions.
Profile and optimize training performance by identifying bottlenecks in data loading, gradient computation, and communication overhead, and implementing solutions that improve step time.
Manage cloud infrastructure and costs including capacity planning, spot instance strategies, storage optimization, and building tools that give researchers visibility into resource usage.
Implement security and compliance measures including access controls, data encryption, audit logging, and ensuring infrastructure meets requirements for handling sensitive data.
Build evaluation and benchmarking infrastructure that enables consistent, reproducible measurement of model performance across different conditions and datasets.
Develop monitoring and alerting systems that detect anomalies in training metrics, resource utilization, or system health, enabling rapid response to issues.
Maintain development environments including containerization, dependency management, and tools that ensure researchers can reproduce results across different systems.
Document and share knowledge through runbooks, post-mortems, and training materials that help the team understand and operate ML infrastructure effectively.
Collaborate with researchers to understand requirements, suggest infrastructure solutions, and ensure systems support rather than constrain research goals.

Role Details

Exceptional candidates who may not meet all of the following criteria are still encouraged to apply.

FT/PT: Full-time.
In-person Policy: We are in the office four days a week. Be prepared to attend multi-day Basis-wide in-person events.
Location: New York City or Cambridge, MA.
Salary range: Competitive salary.

Privacy Notice

By submitting your application, you grant Basis permission to use your materials for both hiring evaluation and recruitment-related research and development purposes. Your information may be processed in different countries, including the US. You retain copyright while providing Basis a license to use these materials for the stated purposes.

Read our full Global Data Privacy Notice here.

See All 36 Cloud Infrastructure Engineer Jobs in Massachusetts

Find roles in Massachusetts that match your experience and apply in just a few clicks.

Cloud Infrastructure Engineer Jobs by City in Massachusetts

Where Massachusetts roles are concentrated, by current openings.

Boston: 20 jobs
Burlington: 4 jobs
Cambridge: 3 jobs
Somerville: 3 jobs
Beverly: 1 job
Canton: 1 job
Framingham: 1 job
Gloucester: 1 job
Natick: 1 job

Cloud Infrastructure Engineer Job Market in Massachusetts

A snapshot from current Massachusetts openings, updated as new roles post.

Who's Hiring

State Street: 6
Humana: 4
AcuityMD: 2
Klaviyo: 2
10Beauty: 1

Top Industries Hiring

Technology & Software: 12
Banking & Financial Services: 6
Biotechnology & Pharmaceuticals: 3
Consulting & Professional Services: 2
Electronics & Hardware: 2

What Massachusetts Employers Look For

The qualifications that appear most often in cloud infrastructure engineer jobs across Massachusetts.

Hands-on experience with at least one major cloud platform such as AWS, Azure, or GCP
Proficiency in infrastructure-as-code tools such as Terraform, CloudFormation, or Pulumi
Experience designing and managing containerized workloads using Kubernetes or Docker
Familiarity with CI/CD pipelines and automation tooling such as Jenkins, GitLab CI, or GitHub Actions
Strong understanding of cloud networking concepts including VPCs, subnets, load balancers, and DNS
Relevant cloud certification such as AWS Solutions Architect, Google Professional Cloud Architect, or Azure Administrator

Cloud Infrastructure Engineer Jobs in Massachusetts: Frequently Asked Questions

How many cloud infrastructure engineer jobs are there in Massachusetts?

There are 36+ cloud infrastructure engineer openings in Massachusetts on Migrate Mate as of June 2026, with the most roles in Boston, Burlington, and Cambridge. New positions post regularly as employers across Massachusetts hire.

How much do cloud infrastructure engineers make in Massachusetts?

Cloud infrastructure engineers in Massachusetts earn a median of about $109,590 a year, based on May 2025 Bureau of Labor Statistics wage data, ranging from around $55,740 for the lowest 10% to over $183,460 for the top 10%. Pay rises with experience, specialty, and employer.

Which Massachusetts cities have the most cloud infrastructure engineer jobs?

Boston, Burlington, and Cambridge have the most cloud infrastructure engineer openings in Massachusetts right now, with additional roles spread across smaller metros statewide.

Which companies hire cloud infrastructure engineers in Massachusetts?

Employers hiring cloud infrastructure engineers in Massachusetts include State Street, Humana, and AcuityMD, based on current listings on Migrate Mate as of June 2026.

Are there remote cloud infrastructure engineer jobs in Massachusetts?

Yes. About 44% of cloud infrastructure engineer openings tied to Massachusetts are remote or hybrid as of June 2026. The rest are on-site roles based in Massachusetts metros.

How do I apply for cloud infrastructure engineer jobs in Massachusetts?

You can apply to cloud infrastructure engineer jobs in Massachusetts directly on Migrate Mate. Search the listings above, find roles that match your experience and preferred Massachusetts location, then apply to each one that fits.
