Senior Site Reliability Engineer, DGX Cloud
NVIDIA
Full Time
Senior (5 to 8 years)
Candidates should possess 3-5 years of experience in systems engineering and architecture, including requirements gathering, basic architecture development, systems integration, and testing. They should also have 3-5 years of experience in cloud computing (AWS, Azure, or GCP), covering design, deployment, operations, and basic automation, along with 3-5 years of experience working with Infrastructure-as-Code (for example, Terraform, Ansible) to provision and manage cloud resources. Furthermore, candidates are expected to have experience meeting SLAs through effective issue identification, escalation, and resolution in a fast-paced environment and a proven track record of contributing to operational improvements and supporting compliance requirements (for example, FedRAMP).
The Site Reliability Engineer will perform hands-on engineering work, including developing new deployments, automation scripts, and tooling to meet client needs. They will also manage and maintain patch management processes, ensuring timely updates, security compliance, and system stability across cloud and on-prem environments; oversee Identity and Access Management (IAM), implementing and enforcing security best practices to protect sensitive data and ensure proper access controls; and perform cloud administration and system administration tasks, such as provisioning resources, optimizing performance, and troubleshooting infrastructure issues. The role involves collaborating with senior engineers and solutions architecture teams to address complex technical issues, ensuring timely resolutions and maintaining client satisfaction; adhering to established quality standards for engineering deliverables, in line with internal protocols, compliance regulations, and project deadlines; identifying and communicating potential risks, and working with relevant stakeholders to incorporate mitigation strategies that meet regulatory and client expectations; and contributing to day-to-day project tasks, including tracking progress, providing updates, and ensuring assigned activities are completed on schedule.
Cybersecurity advisory and managed services provider
Coalfire provides cybersecurity advisory services to help businesses safeguard their digital assets and enhance their security protocols. The company offers a range of services, including risk assessments, threat management, compliance evaluations, and third-party risk management. Coalfire also specializes in cloud security consulting, assisting clients in securing their cloud environments and ensuring they meet compliance requirements. What sets Coalfire apart from its competitors is its focus on both large enterprises and highly regulated industries, such as healthcare and finance, along with its commitment to advancing cybersecurity education through initiatives like the Richard E. Dakin Fund. The goal of Coalfire is to empower organizations to effectively manage cyber risks and achieve compliance with industry standards.