ML Platform Engineer at eBay

Bengaluru, Karnataka, India

Compensation: Not specified
Experience Level: Mid-level (3 to 4 years), Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: Ecommerce, Technology, AI

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related technical discipline (or equivalent experience)
  • 5+ years of experience in ML operations, DevOps, or platform support for distributed AI/ML systems
  • Proven experience providing L1/L2 and on-call support for Ray.io and Kubernetes-based clusters supporting ML training and inference workloads
  • Strong understanding of Ray cluster operations, including autoscaling, job scheduling, and workload orchestration

Responsibilities

  • Serve as the first point of contact (L1) for all support requests related to the AI/ML Platform, including ML training, inference, model deployment, and GPU allocation
  • Provide operational and on-call (PagerDuty) support for Ray.io and Kubernetes clusters running distributed ML workloads across cloud and on-prem environments
  • Monitor, triage, and resolve platform incidents involving job failures, scaling errors, cluster instability, or GPU resource contention
  • Manage GPU quota allocation and scheduling across multiple user teams, ensuring compliance with approved quotas and optimal resource utilization
  • Support Ray Train/Tune for large-scale distributed training and Ray Serve for autoscaled inference, maintaining performance and service reliability
  • Troubleshoot Kubernetes workloads, including pod scheduling, networking, image issues, and resource exhaustion in multi-tenant namespaces
  • Collaborate with platform engineers, SREs, and ML practitioners to resolve infrastructure, orchestration, and dependency issues impacting ML workloads
  • Improve observability, monitoring, and alerting for Ray and Kubernetes clusters using Prometheus, Grafana, and OpenTelemetry to enable proactive issue detection
  • Maintain and enhance runbooks, automation scripts, and knowledge base documentation to accelerate incident resolution and reduce recurring support requests
  • Participate in root cause analysis (RCA) and post-incident reviews, contributing to platform improvements and automation initiatives to minimize downtime

Skills

Kubernetes
Ray.io
Machine Learning
ML Training
ML Inference
Model Deployment
GPU
PagerDuty
Monitoring
Triaging

eBay

Global online marketplace for buying and selling

About eBay

eBay is an online marketplace where individuals and businesses can buy and sell a wide range of products, including electronics, fashion, and collectibles. Users can list items for auction or use the 'Buy It Now' option for immediate purchases. eBay stands out from competitors with its vast selection, user-friendly interface, and strong buyer and seller protections, as well as free shipping on many items. The company aims to facilitate seamless transactions and generates revenue through transaction fees, advertising, and subscription services.

Headquarters: San Jose, California
Year Founded: 1995
Total Funding: $6.5M
Company Stage: IPO
Industries: Consumer Software, Consumer Goods
Employees: 10,001+

Benefits

Health Insurance
401(k) Company Match
Paid Vacation
Parental Leave

Risks

Niche e-commerce platforms like StockX and Poshmark may draw away eBay's customers.
Social media platforms with shopping features could divert traffic from eBay.
Direct-to-consumer brands bypassing marketplaces may reduce eBay's seller base.

Differentiation

eBay offers auction-style listings alongside 'Buy It Now' options.
The platform provides robust buyer and seller protections, enhancing trust in transactions.
eBay's vast product selection includes new, refurbished, and second-hand items.

Upsides

eBay's acquisition of GSI Commerce strengthens its position as a global commerce partner.
The rise of social commerce offers eBay opportunities to integrate social shopping features.
Increased focus on sustainability aligns with eBay's refurbished and second-hand product categories.
