Staff Engineer, Distributed Systems
LaunchDarkly
Full Time
Senior (5 to 8 years)
Common questions about this position
Required skills include demonstrated expertise in ML systems engineering, such as managing distributed training jobs across hundreds of GPUs, debugging numerical instabilities, and building reproducible ML infrastructure; deep knowledge of PyTorch/JAX distributed strategies such as DDP, FSDP, and ZeRO; strong cloud skills with AWS/GCP/Azure, Terraform, and Kubernetes; and an understanding of the full ML stack, from hardware to evaluation pipelines.
This information is not specified in the job description.
Basis emphasizes a 'logbook culture' for documenting issues and solutions, treats operational excellence as a first-class concern, and is building a collaborative organization that puts human values first.
Strong candidates combine a deep understanding of ML systems with operational excellence. They have experience with distributed training at scale, debugging numerical instabilities, and managing cloud infrastructure, and they are passionate about enabling reproducible research and optimizing compute costs.
Platform for developing financial applications
Basis provides a platform that helps businesses build financial applications. The platform includes a variety of tools and integrations that simplify building, testing, and deploying these applications. Clients, ranging from financial institutions to fintech startups, access the platform through a subscription that covers its features and support. Basis distinguishes itself from competitors by streamlining the development process and offering custom solutions tailored to each client's specific needs. The company's goal is to help businesses innovate and improve their financial operations through better applications.