Pod Software Engineer at Etched.ai

Cupertino, California, United States

Compensation: Not Specified

Experience Level: Senior (5 to 8 years)

Job Type: Full Time

Visa: Unknown

Industries: Artificial Intelligence, High Performance Computing

Requirements

Strong programming skills in C/C++
Experience with at least one scripting language (e.g., Python, Bash, Go)
Strong experience with device-to-device networking technologies, including RDMA, RoCE, GPUDirect, queue pairs, completion queues, and transport types
Solid understanding of operating systems (Linux preferred) and server hardware architectures
Ability to analyze complex technical problems and provide effective solutions
Excellent communication and collaboration skills
Ability to work independently and as part of a team
Experience with version control systems (e.g., Git)

Responsibilities

Design, develop, and implement RDMA-based peer-to-peer networking for high-bandwidth, low-latency communication
Work across operating systems, kernel drivers, embedded software, and system software
Develop tests to qualify host processors, NICs, and device network interfaces
Furnish burn-in teams with tests representing real-world use cases and extreme-load stress testing
Define key metrics for system software to collect for high availability and performance
Analyze performance deviations and optimize network stack configurations
Propose kernel tuning parameters for low-latency, high-bandwidth inference workloads
Design and execute automated qualification tests for RDMA NICs and interconnects
Identify and root-cause firmware, driver, and hardware issues impacting RDMA performance and reliability
Collaborate with ODMs and silicon vendors to validate new RDMA features
Implement and validate peer RDMA support for GPU-to-GPU and accelerator-to-accelerator communication
Modify kernel drivers and user-space libraries to optimize direct memory access
Profile and benchmark inter-node RDMA latency and bandwidth
Optimize NIC and switch configurations for throughput, congestion control, and reliability

Skills

Key technologies and capabilities for this role

C++, C, Python, Bash, Go, RDMA, RoCE, GPUDirect, Linux, Git, Kernel Drivers

Questions & Answers

Common questions about this position

Is this position remote or onsite?

This is an onsite position.

What is the salary for this Pod Software Engineer role?

This information is not specified in the job description.

What skills are required for this role?

Required skills include proficiency in C/C++, experience with at least one scripting language like Python or Bash, strong RDMA expertise including RoCE and GPUDirect, solid understanding of operating systems like Linux, and experience with version control such as Git.

What kind of teamwork is expected in this role?

The role requires the ability to work independently and as part of a team, with collaboration across various teams, ODMs, and silicon vendors.

What makes a strong candidate for this Pod Software Engineer position?

A strong candidate has solid problem-solving skills, communicates and collaborates well, and brings hands-on experience with RDMA technologies, C/C++ programming, and operating systems.

Etched.ai

Develops servers for transformer inference

About Etched.ai

Etched.ai develops servers for transformer inference. Its chips integrate the transformer architecture into the hardware itself, with the aim of running these workloads efficiently.

Headquarters: Cupertino, CA, USA

Year Founded: 2022

Total Funding: $5.4M

Company Stage: Seed

Industries: Hardware

Employees: 11-50