Senior Software Engineer, Distributed Systems
Mixpanel
Full Time
Senior (5 to 8 years)
Candidates should have 5+ years of experience in infrastructure/platform engineering or large-scale distributed systems, with at least 2 years of hands-on experience with the Ray platform. A strong understanding of distributed computing principles, experience with distributed storage systems and large-scale data processing, and proven ability to debug and profile distributed jobs are required. Experience with deep learning frameworks like PyTorch or TensorFlow is a significant advantage, and experience with model optimization for distributed training or Ads ML is a bonus.
The Senior Software Engineer will design, build, and maintain large-scale distributed training infrastructure for Ads ML models. They will develop tools and frameworks on top of the Ray platform, and build tools to debug, profile, and tune distributed training jobs for performance and reliability. Responsibilities also include integrating with object storage systems, improving data access patterns, and collaborating with ML engineers to shorten model training time, improve training efficiency, and reduce GPU training costs. Driving improvements in scheduling, state management, and fault tolerance within the training platform is also a key duty.
Online platform for community discussions and content
Reddit is an online platform that allows users to post, vote, and comment on content within various communities based on shared interests. Users can engage in discussions on a wide range of topics, from news and sports to entertainment and hobbies. The platform features a unique voting system where content can receive upvotes or downvotes, helping the most popular posts gain visibility. Reddit generates revenue primarily through advertising, premium memberships, and the sale of virtual goods like Reddit Coins. Unlike many other social media platforms, Reddit emphasizes community-driven content and discussions, creating a space for authentic interactions among users. The goal of Reddit is to foster engagement and connection among its diverse user base.