Director of Engineering – Platform Engineering at Hippocratic AI

Palo Alto, California, United States

Compensation: Not Specified
Experience Level: Senior (5 to 8 years), Expert & Leadership (9+ years)
Job Type: Full Time
Visa: Unknown
Industries: Healthcare, Artificial Intelligence

Requirements

  • Experience leading multidisciplinary engineering teams in cloud operations, SRE, and GPU orchestration
  • Expertise in designing, implementing, and operating cloud infrastructure, observability systems, and GPU control planes
  • Strong background in scaling global compute fabrics for LLM workloads with reliability, security, and cost efficiency
  • Proficiency in multi-cloud (AWS, GCP, Azure), multi-region environments, infrastructure-as-code, and CI/CD practices
  • Knowledge of security and compliance standards (HIPAA, SOC 2) for healthcare data
  • Ability to define infrastructure roadmaps, SLOs/SLIs/SLAs, and incident management processes
  • Skills in architecting observability (telemetry, tracing, logging, alerting) and developer platforms
  • Capability to evaluate emerging technologies and collaborate with AI, product, and compliance teams

Responsibilities

  • Recruit and develop a world-class platform engineering team
  • Foster a culture of innovation, accountability, and technical excellence
  • Mentor and coach engineers and managers for high performance and career growth
  • Build and scale a high-performing team for infrastructure operations and systems reliability
  • Define and execute the long-term infrastructure roadmap for multi-cloud, multi-region GPU and compute environments
  • Drive excellence in cloud cost optimization, capacity planning, and service reliability
  • Architect and manage the global GPU control plane for dynamic provisioning, scheduling, and monitoring of inference workloads
  • Lead design and automation of deployments across AWS, GCP, Azure, and on-prem using infrastructure-as-code and CI/CD
  • Ensure strong security posture and compliance across all environments (HIPAA, SOC 2)
  • Develop and scale comprehensive observability systems (telemetry, tracing, logging, alerting)
  • Establish SLOs, SLIs, and SLAs for mission-critical services and infrastructure
  • Implement robust incident management, root cause analysis, and continuous improvement processes
  • Partner with AI and product teams to anticipate needs and design scalable architectures
  • Contribute to internal developer platforms for improved productivity and standardization
  • Evaluate emerging technologies

Skills

Cloud Infrastructure
Observability Systems
GPU Control Plane
LLM Workloads
Compute Scaling
Reliability Engineering
Security
Cost Optimization
Platform Engineering
Kubernetes
AWS
GCP

About Hippocratic AI

Headquarters: N/A
Year Founded: N/A
Company Stage: N/A
