ML Systems Engineer, Infrastructure & Cloud at Basis

New York, New York, United States

Compensation: Not Specified
Experience Level: Mid-level (3 to 4 years), Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: AI Research, Nonprofit, Machine Learning

Requirements

  • Demonstrated expertise in ML systems engineering, such as managing distributed training jobs across hundreds of GPUs, debugging and fixing numerical instabilities in large-scale training, building infrastructure for reproducible ML experiments, and optimizing training throughput and resource utilization
  • Deep knowledge of distributed training frameworks including PyTorch/JAX distributed strategies (DDP, FSDP, ZeRO), gradient accumulation, mixed precision training, and checkpoint/recovery systems
  • Strong cloud administration skills including AWS/GCP/Azure services, infrastructure as code (Terraform), Kubernetes orchestration, cost optimization, security best practices, and compliance requirements
  • Understanding of the full ML stack from hardware (GPUs, interconnects, storage) through frameworks (PyTorch, JAX) to high-level training loops and evaluation pipelines
  • Skilled at debugging complex failures across the stack, including GPU/NCCL issues, data loading bottlenecks, memory leaks, gradient explosions, and convergence problems
  • Value documentation and knowledge sharing, maintaining comprehensive logs of issues, solutions, and lessons learned
  • Work autonomously while coordinating closely with researchers: anticipating infrastructure needs, preventing problems, and responding quickly to issues

Responsibilities

  • Own distributed training infrastructure including job launchers, checkpointing systems

Skills

ML Systems
Distributed Training
GPU Clusters
Cloud Infrastructure
DevOps
Numerical Debugging
Compute Optimization
Training Frameworks
Cloud Administration
Security Compliance

Basis

Platform for developing financial applications

About Basis

Basis provides a platform that assists businesses in creating financial applications. The platform includes a variety of tools and integrations that simplify the process of building, testing, and deploying these applications. Clients, which range from financial institutions to fintech startups, can access the platform through a subscription model, paying for its features and support. Basis distinguishes itself from competitors by focusing on streamlining the development process and offering custom solutions tailored to the specific needs of its clients. The company's goal is to empower businesses to innovate and enhance their financial operations through improved application offerings.

Headquarters: 1688 Pine St UNIT E211, San Francisco, CA 94109, USA
Year Founded: 2022
Total Funding: $6M
Company Stage: Seed
Industries: Enterprise Software, Fintech
Employees: 1-10

Risks

Emerging fintech startups offer similar tools at lower costs.
Regulatory scrutiny on data privacy may increase compliance costs.
Economic downturns could reduce demand for Basis's services.

Differentiation

Basis provides real-time cash flow profiles for B2B lenders.
The platform integrates data from revenue, accounting, and banking sources.
Basis offers a subscription-based model with premium services and custom solutions.

Upsides

Increased demand for real-time financial data analytics benefits Basis's offerings.
The rise of embedded finance aligns with Basis's focus on lending stacks.
Open banking trends enhance Basis's cash flow profile accuracy.
