Data Analytics Engineer Jobs at Basis Research Institute with Visa Sponsorship
Basis Research Institute hires Data Analytics Engineers to support rigorous, evidence-based research programs. The institute has an established path for sponsoring work visas in this function, making it a viable target for international candidates with strong technical and analytical backgrounds in research-driven environments.
About Basis
Basis is a nonprofit applied AI research organization with two mutually reinforcing goals. The first is to understand and build intelligence. This means to establish the mathematical principles of what it means to reason, to learn, to make decisions, to understand, and to explain; and to construct software that implements these principles. The second is to advance society’s ability to solve intractable problems. This means expanding the scale, complexity, and breadth of problems that we can solve today, and even more importantly, accelerating our ability to solve problems in the future. To achieve these goals, we’re building both a new technological foundation that draws inspiration from how humans reason, and a new kind of collaborative organization that puts human values first.
About The Role
Data Engineers on the Platform team at Basis build trustworthy data pipelines with comprehensive provenance and quality gates, curate documented datasets for training and evaluation, and ensure data infrastructure scales reliably. You will work on both platform-specific data needs and cross-project data coordination, preventing duplicate work and facilitating shared datasets.
We are looking for people who are technically excellent and treat data quality as a first-class concern. The ideal Data Engineer has experience with ML data pipelines, understands the full lifecycle from raw data through model training and evaluation, and brings rigor to data provenance, lineage tracking, and quality assurance. You combine software engineering discipline with deep understanding of data systems and ML requirements.
This role is embedded across Platform and Research teams, working on infrastructure that supports both commercial offerings and internal research. You will help Basis scale data operations to support medium-scale models, ensure data governance as we serve external customers, and build systems that researchers can trust for reproducible experiments.
We seek individuals who aspire to do rigorous, high-quality, robust data engineering, but are not afraid to iterate, learn from real usage, and explore different approaches to achieve excellence. Basis is a collaborative effort, both internally and with our external partners; we are looking for people who enjoy building data foundations for problems larger than ones they can tackle alone.
We expect you to:
- Have demonstrated significant achievements in data engineering for ML/AI systems. Examples include:
  + Building data pipelines for model training or evaluation at scale
  + Developing feature stores or data platforms serving multiple teams
  + Creating data quality frameworks and implementing governance systems
  + Designing data architectures that enabled new ML capabilities
- Possess strong proficiency in data technologies including SQL (expert level), Python for data processing, distributed computing frameworks (Spark, Dask), and workflow orchestration tools (Airflow, Dagster, Prefect).
- Have experience with cloud data platforms including data warehouses (Snowflake, BigQuery, Redshift), data lakes, object storage (S3), and streaming systems (Kafka, Kinesis, Flink) for both batch and real-time processing.
- Understand ML data requirements including feature engineering, training/validation/test splits, data versioning, experiment reproducibility, and the specific data needs of different model types and training procedures.
- Be skilled at data quality and governance including implementing validation frameworks, anomaly detection, data lineage tracking, metadata management, and ensuring compliance with privacy and security policies.
- Have knowledge of data modeling principles for both relational and NoSQL systems, understanding of schema design, normalization/denormalization tradeoffs, and performance optimization.
- Value data provenance and documentation. You ensure data pipelines are transparent, decisions are documented, and others can understand and trust the data you deliver.
- Progress with autonomy on complex data challenges. You can scope data projects, make sound architectural decisions, and deliver complete solutions from ingestion through consumption.
- Be excited about enabling rigorous research through trustworthy data infrastructure that advances our ability to solve intractable problems.
In addition, the following would be an advantage:
- Experience with feature stores (Tecton, Feast) or building feature platforms.
- Background in ML research or research engineering, providing an understanding of data needs across the experiment lifecycle.
- Experience with data lineage tools (Apache Atlas, DataHub, Monte Carlo) and metadata management.
- Knowledge of vector databases and embedding pipelines for modern AI applications.
- Contributions to data engineering open-source projects (Airflow, dbt, Great Expectations).
- Understanding of responsible AI and data governance practices.
Responsibilities:
- Design and build data pipelines for training and evaluation across Basis research projects and platform offerings, ensuring reliability, performance, and scalability.
- Implement data quality frameworks including validation rules, quality gates, anomaly detection, and monitoring that catch data issues before they impact research or production systems.
- Develop and maintain feature stores or equivalent systems that enable consistent feature access across training and serving environments, preventing train-serve skew.
- Ensure data provenance and lineage tracking so researchers and engineers can understand data origins, transformations applied, and dependencies, enabling reproducible experiments and debugging.
- Curate documented datasets for model training and evaluation, including dataset versioning, comprehensive documentation, quality metrics, and metadata that enables appropriate usage.
- Coordinate cross-project data initiatives to prevent duplicate data work, facilitate shared datasets, and ensure consistent data practices across Basis as the organization scales.
- Optimize data infrastructure for scale as compute grows, including cost optimization, performance tuning, caching strategies, and efficient data access patterns.
- Collaborate with research and engineering teams to understand data needs, translate requirements into technical solutions, and provide consultation on data architecture and best practices.
- Implement data governance policies ensuring compliance with privacy regulations, security requirements, and responsible AI practices as Basis serves external customers.
- Contribute to the culture and direction of Basis by modeling data quality rigor, documentation excellence, and focus on trustworthy data infrastructure.
Role Details
Exceptional candidates who may not meet all of the following criteria are still encouraged to apply.
- FT/PT: Full-time.
- In-person Policy: We are in the office four days a week. Be prepared to attend multi-day Basis-wide in-person events.
- Location: New York City.
- Salary range: Competitive salary.
Privacy Notice
By submitting your application, you grant Basis permission to use your materials for both hiring evaluation and recruitment-related research and development purposes. Your information may be processed in different countries, including the US. You retain copyright while providing Basis a license to use these materials for the stated purposes.
Read our full Global Data Privacy Notice here.

Tips for Finding Data Analytics Engineer Jobs at Basis Research Institute
Align Your Credentials to Research Analytics Standards
Basis Research Institute operates in a scientific research context, so frame your experience around data pipelines, reproducible analysis, and research infrastructure rather than commercial product metrics. Highlight work that supported peer-reviewed or evidence-based outputs.
Confirm OPT Authorization Before Applying
If you're on F-1 OPT, verify your Employment Authorization Document is valid and that your role qualifies under your degree field before submitting applications. A mismatch between your authorized field and the Data Analytics Engineer job description can create compliance issues.
Target Open Roles with Migrate Mate
Use Migrate Mate to filter Data Analytics Engineer openings at Basis Research Institute by visa type, so you're only reviewing positions where your sponsorship pathway is confirmed before you invest time in the application.
Prepare for an LCA-Driven Salary Conversation
Before your offer stage, research DOL prevailing wage levels for Data Analytics Engineers in the institute's location. An H-1B offer must meet the wage certified on the Labor Condition Application (TN status does not involve an LCA), so understanding the floor prevents surprises during negotiation.
Understand the J-1 Research Scholar Distinction
Basis Research Institute sponsors J-1 visas, which in research settings often carry a two-year home-country physical presence requirement. If you're considering this pathway, confirm with the institute whether your specific program category triggers that requirement before accepting an offer.
Build a Portfolio Around Research Data Infrastructure
Data Analytics Engineer roles at research institutes frequently involve building systems that handle sensitive or longitudinal datasets. Document projects where you designed compliant, auditable data pipelines, as this directly addresses what hiring teams in science-focused organizations evaluate.
Frequently Asked Questions
Does Basis Research Institute sponsor H-1B visas for Data Analytics Engineers?
Yes, Basis Research Institute sponsors H-1B visas for Data Analytics Engineers. As a research-focused organization, it has an established process for filing H-1B petitions through USCIS, including the required Labor Condition Application through DOL. Because H-1B availability depends on the annual lottery for cap-subject candidates, timing your application around the registration window each spring is critical.
How do I apply for Data Analytics Engineer jobs at Basis Research Institute?
Browse current Data Analytics Engineer openings at Basis Research Institute through Migrate Mate, which filters roles by visa sponsorship type so you can confirm eligibility before applying. Once you identify a relevant posting, apply directly through the institute's careers portal. Tailor your application to highlight experience with research data systems, analytical pipelines, and any domain knowledge relevant to the institute's research focus.
Which visa types does Basis Research Institute commonly use for Data Analytics Engineers?
Basis Research Institute sponsors H-1B, TN, J-1, F-1 OPT, and F-1 CPT for Data Analytics Engineer roles. H-1B is the most common long-term work visa path. TN is available to Canadian and Mexican nationals in qualifying technical roles. F-1 OPT and CPT are practical short-term options for students, and J-1 suits candidates entering through exchange or research programs.
What qualifications does Basis Research Institute expect for Data Analytics Engineer roles?
Basis Research Institute typically expects a bachelor's degree or higher in computer science, statistics, data science, or a closely related field for Data Analytics Engineer positions. Practical experience with data pipeline tools, SQL, Python, and working in research or scientific data environments strengthens your candidacy. Familiarity with handling structured research datasets and supporting analytical reproducibility is especially relevant given the institute's science and research focus.
How long does the H-1B sponsorship process take at Basis Research Institute?
The H-1B process at Basis Research Institute generally spans several months from offer to approved status. Your employer first files the Labor Condition Application with DOL, which typically takes seven business days to certify. USCIS standard processing for the I-129 petition takes three to five months. Premium processing can reduce the USCIS adjudication window to around 15 business days, but the DOL step cannot be expedited.