Senior Machine Learning Engineer - Machine Learning Infrastructure
Full Time
Senior (5 to 8 years)
Candidates should have 7+ years of professional experience in software engineering and infrastructure engineering. Extensive experience building and maintaining AI/ML infrastructure in production, including model deployment and lifecycle management, is required. Strong knowledge of AWS and infrastructure-as-code frameworks, ideally CDK, is necessary. Expert-level coding skills in TypeScript and Python for building robust APIs and backend services are essential. Production-level experience with Databricks MLflow, including model registration, versioning, asset bundles, and model serving workflows, is required. Expert-level understanding of containerization (Docker) and hands-on experience with CI/CD pipelines and orchestration tools (e.g., ECS) is a plus. Proven ability to design reliable, secure, and scalable infrastructure for both real-time and batch ML workloads is needed. The ability to articulate ideas clearly, present findings persuasively, and build rapport with clients and team members is also required.
The Senior AI Infrastructure Engineer will design, implement, and maintain cloud-native infrastructure to support AI and data workloads, with a focus on platforms such as Databricks and AWS Bedrock. They will build and manage scalable data pipelines to ingest, transform, and serve data for ML and analytics. Responsibilities include developing infrastructure-as-code using tools like CloudFormation and AWS CDK to ensure repeatable and secure deployments. The engineer will collaborate with AI engineers, data engineers, and platform teams to improve the performance, reliability, and cost-efficiency of AI models in production. Driving best practices for observability, including monitoring, alerting, and logging for AI platforms, is a key duty. The role also involves contributing to the design and evolution of the AI platform to support new ML frameworks, workflows, and data types, and staying current with new tools and technologies to recommend improvements to architecture and operations. Integrating AI models and large language models (LLMs) into production systems, using architectures such as retrieval-augmented generation (RAG), is also part of the role.
Cloud platform for scientific data management
TetraScience offers a cloud-based platform called the Scientific Data Cloud, which helps biopharmaceutical companies manage and harmonize their scientific data for research and development, quality assurance, and manufacturing. The platform connects various lab instruments and software, streamlining data management and significantly reducing task completion time. TetraScience's vendor-neutral and open design allows it to work with any lab equipment, making it a flexible solution in the life sciences sector. The company's goal is to enhance scientific outcomes by preparing data for artificial intelligence and machine learning applications.