AI/HPC Network Development Engineer - Networking at xAI

Dublin, County Dublin, Ireland

Compensation: Not Specified
Experience Level: Senior (5 to 8 years), Expert & Leadership (9+ years)
Job Type: Full Time
Visa: Unknown
Industries: AI, High Performance Computing

Requirements

  • Minimum of 10 years designing and operating large-scale networks, including 5 years in the Ethernet AI/HPC space
  • Deep understanding of congestion control on Ethernet; InfiniBand experience is an added bonus
  • Deep understanding of AI training and inference workloads and how they operate on the network
  • Ability to use and debug NCCL, and ideally to contribute to the library
  • Expertise in creating a portfolio of metrics for performance and operations to optimize the fleet for training and inference traffic
  • Experience with Python to automate away repetitive tasks and facilitate working with and analyzing large sets of data
  • Deep experience in RoCEv2
  • Strong communication skills to concisely and accurately share knowledge with teammates
  • Willingness to travel significantly to Memphis for data center buildouts and to Palo Alto for team collaboration
  • Availability for team on-call rotation and helping with scaling and maintenance efforts

Responsibilities

  • Develop the network at hyperscale while optimizing performance and availability
  • Understand current network performance and availability, and optimize both for model training and customer inference queries
  • Spend time deep inside NCCL, building metric dashboards, and tweaking configurations to ensure no performance is left on the table
  • Help design the next iteration of the backend and frontend networks so that new GPU infrastructure can be built out seamlessly with little to no engineering assistance
  • Travel to Memphis for building more capacity
  • Participate in team on-call rotation and assist with scaling and maintenance efforts
  • Contribute to deployment and operations frameworks to remove repetitive tasks

Skills

RoCEv2
NCCL
Ethernet
HPC Networking
GPU Clusters
Network Optimization
Metric Dashboards
Performance Tuning
Configuration Management

xAI

AI tools for research and information retrieval

About xAI

xAI develops AI tools aimed at enhancing research and information retrieval. Its main product, Grok, is designed to answer a wide variety of questions, including unconventional ones that other AI systems might not handle. Grok provides real-time knowledge, making it a useful resource for researchers, academics, and professionals who need quick access to relevant information. Grok also stands out for its ability to suggest follow-up questions and provide nuanced answers across a diverse range of inquiries. xAI's goal is to empower users by streamlining their research processes and fostering innovation through reliable access to information.

Headquarters: Burlingame, California
Year Founded: 2023
Total Funding: $11,803.1M
Company Stage: Series C
Industries: Data & Analytics, AI & Machine Learning
Employees: 1,001-5,000

Benefits

Health Insurance
Remote Work Options

Risks

Increased competition from Anthropic could challenge xAI's market position.
Legal battles involving Elon Musk may divert resources from xAI's operations.
Reliance on Nvidia GPUs poses risks if supply chain issues arise.

Differentiation

Grok answers unconventional questions, unlike many AI systems.
xAI's Grok provides real-time knowledge, enhancing research efficiency.
Grok's ability to generate striking images sets it apart in visual data processing.

Upsides

xAI secured $6 billion funding, boosting AI infrastructure and R&D.
Grok's iOS app launch expands user accessibility and engagement.
AI-driven research tools are increasingly integrated with cloud platforms, aiding xAI's growth.
