Senior Site Reliability Engineer, DGX Cloud
NVIDIA
Full Time
Senior (5 to 8 years)
Candidates must have hands-on experience operating cloud-based systems, preferably AWS, and proficiency with Kubernetes, Helm, and Docker. A strong understanding of observability tools such as Datadog, Grafana, or Prometheus is required, along with familiarity with CI/CD tooling and deployment pipelines. Solid scripting or programming fundamentals are necessary (Node.js experience is a plus), as are good instincts for systems design, incident management, and reliability practices. Comfort working in high-speed, high-scale environments is essential. Experience with messaging systems or internal developer platforms, or previous roles on platform, DevOps, or infrastructure teams, is a nice-to-have.
The Site Reliability Engineer will maintain uptime and reliability across critical systems, focusing on scalability, observability, and incident prevention. Responsibilities include designing and managing cloud infrastructure using tools like Terraform, Kubernetes, and CI/CD pipelines, and automating operations for routine tasks, monitoring, deployment, and disaster recovery. The role also involves supporting and improving on-call processes, collaborating with cross-functional teams to implement best practices, building systems for visibility through dashboards and alerts, and contributing to infrastructure projects that enhance security, performance, and developer velocity.
HR platform for managing global workforces
Deel provides a platform that helps businesses manage international workforces. It integrates payroll processing, compliance monitoring, and immigration support into one system, so companies can handle their HR functions from a single place, which is especially useful for those with employees in different countries. Deel differentiates itself from competitors by combining multiple HR services into one cohesive solution. Its goal is to streamline the management of global teams so that companies can focus on their core operations while staying compliant with local laws and regulations.