Data Engineer Jobs at Basis Research Institute with Visa Sponsorship
Basis Research Institute hires Data Engineers to build and maintain research data infrastructure, and the organization has a track record of supporting international candidates through visa sponsorship. If you hold F-1, TN, or H-1B status, this is a team worth targeting.
About Basis
Basis is a nonprofit applied AI research organization with two mutually reinforcing goals. The first is to understand and build intelligence. This means to establish the mathematical principles of what it means to reason, to learn, to make decisions, to understand, and to explain; and to construct software that implements these principles. The second is to advance society’s ability to solve intractable problems. This means expanding the scale, complexity, and breadth of problems that we can solve today, and even more importantly, accelerating our ability to solve problems in the future. To achieve these goals, we’re building both a new technological foundation that draws inspiration from how humans reason, and a new kind of collaborative organization that puts human values first.
About The Role
Data Engineers on the Platform team at Basis build trustworthy data pipelines with comprehensive provenance and quality gates, curate documented datasets for training and evaluation, and ensure data infrastructure scales reliably. You will work on both platform-specific data needs and cross-project data coordination, preventing duplicate work and facilitating shared datasets.
We are looking for people who are technically excellent and treat data quality as a first-class concern. The ideal Data Engineer has experience with ML data pipelines, understands the full lifecycle from raw data through model training and evaluation, and brings rigor to data provenance, lineage tracking, and quality assurance. You combine software engineering discipline with a deep understanding of data systems and ML requirements.
This role is embedded across the Platform and Research teams, working on infrastructure that supports both commercial offerings and internal research. You will help Basis scale data operations to support medium-scale models, ensure data governance as we serve external customers, and build systems that researchers can trust for reproducible experiments. We seek individuals who aspire to do rigorous, high-quality, robust data engineering but are not afraid to iterate, learn from real usage, and explore different approaches to achieve excellence. Basis is a collaborative effort, both internally and with our external partners; we are looking for people who enjoy building data foundations for problems larger than those they can tackle alone.
We expect you to:
- Have demonstrated significant achievements in data engineering for ML/AI systems. Examples include:
+ Building data pipelines for model training or evaluation at scale
+ Developing feature stores or data platforms serving multiple teams
+ Creating data quality frameworks and implementing governance systems
+ Designing data architectures that enabled new ML capabilities
- Possess strong proficiency in data technologies including SQL (expert level), Python for data processing, distributed computing frameworks (Spark, Dask), and workflow orchestration tools (Airflow, Dagster, Prefect).
- Have experience with cloud data platforms including data warehouses (Snowflake, BigQuery, Redshift), data lakes, object storage (S3), and streaming systems (Kafka, Kinesis, Flink) for both batch and real-time processing.
- Understand ML data requirements including feature engineering, training/validation/test splits, data versioning, experiment reproducibility, and the specific data needs of different model types and training procedures.
- Be skilled at data quality and governance including implementing validation frameworks, anomaly detection, data lineage tracking, metadata management, and ensuring compliance with privacy and security policies.
- Know data modeling principles for both relational and NoSQL systems, including schema design, normalization/denormalization tradeoffs, and performance optimization.
- Value data provenance and documentation. You ensure data pipelines are transparent, decisions are documented, and others can understand and trust the data you deliver.
- Progress with autonomy on complex data challenges. You can scope data projects, make sound architectural decisions, and deliver complete solutions from ingestion through consumption.
- Be excited about enabling rigorous research through trustworthy data infrastructure that advances our ability to solve intractable problems.
In addition, the following would be an advantage:
- Experience with feature stores (Tecton, Feast) or building feature platforms.
- Background in ML research or research engineering that gives you an understanding of data needs across the experiment lifecycle.
- Experience with data lineage tools (Apache Atlas, DataHub, Monte Carlo) and metadata management.
- Knowledge of vector databases and embedding pipelines for modern AI applications.
- Contributions to data engineering open-source projects (Airflow, dbt, Great Expectations).
- Understanding of responsible AI and data governance practices.
Responsibilities:
- Design and build data pipelines for training and evaluation across Basis research projects and platform offerings, ensuring reliability, performance, and scalability.
- Implement data quality frameworks including validation rules, quality gates, anomaly detection, and monitoring that catch data issues before they impact research or production systems.
- Develop and maintain feature stores or equivalent systems that enable consistent feature access across training and serving environments, preventing train-serve skew.
- Ensure data provenance and lineage tracking so researchers and engineers can understand data origins, transformations applied, and dependencies, enabling reproducible experiments and debugging.
- Curate documented datasets for model training and evaluation, including dataset versioning, comprehensive documentation, quality metrics, and metadata that enables appropriate usage.
- Coordinate cross-project data initiatives to prevent duplicate data work, facilitate shared datasets, and ensure consistent data practices across Basis as the organization scales.
- Optimize data infrastructure for scale as compute grows, including cost optimization, performance tuning, caching strategies, and efficient data access patterns.
- Collaborate with research and engineering teams to understand data needs, translate requirements into technical solutions, and provide consultation on data architecture and best practices.
- Implement data governance policies ensuring compliance with privacy regulations, security requirements, and responsible AI practices as Basis serves external customers.
- Contribute to the culture and direction of Basis by modeling data quality rigor, documentation excellence, and focus on trustworthy data infrastructure.
Role Details
Exceptional candidates who may not meet all of the following criteria are still encouraged to apply.
- FT/PT: Full-time.
- In-person Policy: We are in the office four days a week. Be prepared to attend multi-day Basis-wide in-person events.
- Location: New York City.
- Salary range: Competitive salary.
Privacy Notice
By submitting your application, you grant Basis permission to use your materials for both hiring evaluation and recruitment-related research and development purposes. Your information may be processed in different countries, including the US. You retain copyright while providing Basis a license to use these materials for the stated purposes.
Read our full Global Data Privacy Notice here.

Tips for Finding Data Engineer Jobs at Basis Research Institute
Frame your portfolio around research data pipelines
Basis Research Institute is an applied AI research organization. Frame your Data Engineer experience around large-scale research datasets, ML training and evaluation data, ETL workflows, and reproducible data pipelines rather than purely commercial analytics projects.
Confirm OPT or CPT eligibility before applying
If you're on an F-1 visa, confirm your OPT start date and remaining authorization window with your DSO before you apply. If you plan to use CPT instead, your school's DSO must authorize it before you start work. Research organizations often run slower hiring cycles, so make sure your authorization will still be valid when an offer arrives.
Ask directly about H-1B filing timelines
During the offer stage, ask the hiring team whether they file H-1B petitions on a regular cycle or as needed. Cap-exempt employers, such as universities and certain nonprofit research organizations, can file H-1B petitions year-round rather than going through the annual cap lottery and April filing window. Knowing which path applies will shape your start-date planning.
Verify E-Verify enrollment before accepting an offer
Before you reach the offer stage, use the E-Verify employer search to check that Basis Research Institute is enrolled in E-Verify. Any employer that supports an F-1 STEM OPT extension must be enrolled, so this matters if you plan to extend your OPT in this role.
Target open Data Engineer roles using Migrate Mate
Filter your job search by visa type and employer to surface Data Engineer openings at Basis Research Institute that are open to sponsored candidates. Migrate Mate lets you search specifically by sponsorship history so you're not cold-applying to roles unlikely to move forward.
Prepare your TN documentation before the offer letter
If you're a Canadian or Mexican citizen, TN status for Data Engineers typically falls under Computer Systems Analyst or a similar qualifying USMCA profession. Have your degree certificates and a detailed job duties letter ready before the offer is finalized. Canadian citizens can usually apply for TN status directly at a port of entry. Mexican citizens must first obtain a TN visa at a U.S. consulate.
Frequently Asked Questions
Does Basis Research Institute sponsor H-1B visas for Data Engineers?
Yes, Basis Research Institute has sponsored H-1B visas for Data Engineer roles. If you will need a new H-1B, confirm during the interview process whether the organization files cap-subject petitions through the annual lottery or qualifies as cap-exempt. The answer can change your expected start date significantly. If you already hold H-1B status, ask about the timeline for the change-of-employer petition.
Which visa types are commonly used for Data Engineer roles at Basis Research Institute?
Basis Research Institute supports several visa categories for Data Engineers, including H-1B, F-1 OPT, F-1 CPT, TN, and J-1. F-1 candidates working through OPT or CPT are a common fit for research-focused data roles. TN status is a practical option for Canadian and Mexican citizens in qualifying technical professions.
What qualifications or experience are expected for a Data Engineer at Basis Research Institute?
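The posting does not list a formal degree requirement. Even so, a bachelor's or master's degree in computer science, data engineering, or a closely related field supports H-1B specialty occupation eligibility. The posting asks for:
- demonstrated achievements in ML/AI data engineering
- expert-level SQL and Python
- distributed frameworks such as Spark or Dask
- orchestration tools such as Airflow, Dagster, or Prefect
- experience with cloud warehouses, object storage, and streaming systems
Experience with feature stores, data lineage tools, or the data needs of ML research can strengthen your application meaningfully.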
How do I apply for Data Engineer jobs at Basis Research Institute?
Browse open Data Engineer positions at Basis Research Institute through Migrate Mate, which filters for roles with confirmed sponsorship history so you're applying to positions where your visa situation is already a known factor. From there, tailor your application to highlight research data infrastructure experience and confirm sponsorship scope directly with the recruiter early in the process.
How do I manage the filing timeline if I receive a Data Engineer offer from Basis Research Institute?
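Once you have a written offer, your employer must file a Labor Condition Application (LCA) with the Department of Labor (DOL) before submitting an H-1B petition to USCIS. Build in at least 30 to 60 days for LCA certification plus petition preparation. For a cap-subject petition, your registration must also be selected in the annual H-1B lottery. If you're bridging from F-1 OPT, confirm your authorization end date. Then ask whether the employer will file a timely cap-subject H-1B petition requesting a change of status, which can automatically extend your F-1 status and OPT work authorization under the cap-gap rule.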