TetraScience

Sr. Platform and Data Lake Engineer

United States

Compensation: Not Specified
Experience Level: Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: Scientific Data, AI Cloud, Data Management

Senior Platform and Data Lake Engineer

Position Overview

TetraScience is the Scientific Data and AI Cloud company, catalyzing the Scientific AI revolution by designing and industrializing AI-native scientific data sets. We are building next-generation lab data management solutions and AI-enabled outcomes. As a Senior Platform and Data Lake Engineer, you will play a critical role in building and maintaining our data infrastructure, working closely with cross-functional teams to ensure the seamless ingestion, processing, and storage of significant volumes of scientific data. This role requires extensive experience building and supporting data pipeline infrastructure and hands-on experience in the Databricks ecosystem.

The Role

  • Design, develop, and optimize data lake solutions to support our scientific data pipelines and analytics capabilities.
  • Design, develop, and optimize data pipelines and workflows within the Databricks platform.
  • Design and architect services to meet customer data processing needs.
  • Implement data quality and governance frameworks to ensure data integrity and compliance.

Requirements

  • Experience: 8+ years of experience in the software development industry, preferably in data engineering, data warehousing, or data analytics companies and teams.
  • Data Pipelines: Experienced in designing and implementing complex, scalable data pipelines/ETL services.
  • Programming Languages: Expert-level proficiency in Python, Java, and TypeScript.
  • Cloud Technologies: Extensive experience with cloud-based data storage and processing technologies, particularly AWS services such as S3, Step Functions, and Lambda, as well as Airflow.
  • Lake House Architecture: Expert-level understanding of, and hands-on experience with, Lake House architecture.
  • Communication: Ability to articulate ideas clearly, present findings persuasively, and build rapport with clients and team members.

Nice to Have

  • Knowledge of basic DevOps and MLOps principles.
  • 3+ years of experience with the Databricks ecosystem.
  • Expert-level experience with Spark/Glue and Delta tables/Iceberg.
  • Working knowledge of Snowflake.
  • Experience working with Data Scientists and ML Developers.
  • Experience in management and lead developer roles at technology services companies.
  • Hands-on experience with data warehousing solutions and ETL tools.

Company Information

TetraScience is the category leader in the Scientific Data and AI Cloud market. In the last year, dominant players in compute, cloud, data, and AI infrastructure have converged on TetraScience as the de facto standard, entering into co-innovation and go-to-market partnerships.

The Tetra Way: Candidates will be asked to review the "Tetra Way" letter, authored by the CEO, to understand the company's values and ethos. Alignment with this document is crucial for potential employees.

Benefits

  • 100% employer-paid benefits for all eligible employees and immediate family members.
  • Unlimited paid time off (PTO).
  • 401(k).
  • Flexible working arrangements, including remote work.
  • Company-paid Life Insurance, LTD/STD.
  • A culture of continuous improvement with opportunities for career growth and coaching.

Additional Details

  • Employment Type: Full-time
  • Workplace: Remote
  • Language: English
  • Department: Engineering
  • Published: 2024-12-02
  • Remote: Yes
  • Visa Sponsorship: Not currently provided.

Skills

Data Lake
Data Pipelines
Databricks
Cloud
AI
Data Infrastructure
Data Ingestion
Data Processing
Data Storage
Analytics

TetraScience

Cloud platform for scientific data management

About TetraScience

TetraScience offers a cloud-based platform called the Scientific Data Cloud, which helps biopharmaceutical companies manage and harmonize their scientific data for research and development, quality assurance, and manufacturing. The platform connects various lab instruments and software, streamlining data management and significantly reducing task completion time. TetraScience's vendor-neutral and open design allows it to work with any lab equipment, making it a flexible solution in the life sciences sector. The company's goal is to enhance scientific outcomes by preparing data for artificial intelligence and machine learning applications.

Headquarters: Boston, Massachusetts
Year Founded: 2019
Total Funding: $113.8M
Company Stage: Series B
Industries: AI & Machine Learning, Biotechnology, Healthcare
Employees: 51-200

Benefits

Unlimited PTO
100% company-paid health, dental, & vision
Company-paid life insurance
401(k) savings
Company-paid disability insurance
Equity program
Flexible work arrangements

Risks

Rapid AI development may outpace TetraScience's integration capabilities, risking obsolescence.
Dependency on partners like Google Cloud and NVIDIA could pose risks if disrupted.
International expansion may expose TetraScience to regulatory and compliance challenges.

Differentiation

TetraScience offers a vendor-neutral, open, cloud-native platform for scientific data management.
The platform integrates with any lab equipment or software, enhancing flexibility and adaptability.
TetraScience's Scientific Data Cloud centralizes and harmonizes data, preparing it for AI/ML applications.

Upsides

Partnerships with NVIDIA and Google Cloud enhance AI-native scientific datasets and capabilities.
Collaboration with Databricks accelerates the Scientific AI revolution in life sciences.
Bayer AG partnership maximizes scientific data value, driving innovation in biopharma.
