PCAI and AI Factory at Hewlett Packard Enterprise

Bengaluru, Karnataka, India

Compensation: Not Specified
Experience Level: Senior (5 to 8 years), Expert & Leadership (9+ years)
Job Type: Full Time
Visa: Unknown
Industries: Technology, AI, HPC

Requirements

  • Deep hands-on expertise in AI, HPC, and GPU-accelerated environments
  • Strong knowledge of HPE Ezmeral, NVIDIA AI Enterprise, containerized workloads, and automation frameworks
  • Expertise in managing compute nodes (HPE DL380a, DL325, Cray XD670), GPU clusters (NVIDIA L40S/H100/H200), and InfiniBand NDR networks
  • Knowledge of virtualization and container platforms such as vSphere, RHEL/RHOS, Ezmeral Runtime Enterprise, Kubernetes, and Rancher Harvester

Responsibilities

  • Administer and maintain HPE PCAI and AI Factory environments, ensuring optimal uptime and performance
  • Manage compute nodes (HPE DL380a, DL325, Cray XD670), GPU clusters (NVIDIA L40S/H100/H200), and InfiniBand NDR networks
  • Administer virtualization and container platforms such as vSphere, RHEL/RHOS, Ezmeral Runtime Enterprise, Kubernetes, and Rancher Harvester
  • Perform configuration, patching, version upgrades, and firmware updates across hardware and software layers
  • Proactively monitor system health using DCGM, NetQ, Grafana, and Exivity dashboards
  • Handle alerts, performance anomalies, and incidents across GPU, network, and storage layers
  • Lead root cause analysis (RCA) and corrective action plans to prevent recurring issues
  • Maintain operational documentation, runbooks, and incident logs
  • Manage cluster lifecycle through Ansible, AWX, HPE Performance Cluster Manager (HPCM), and Slurm
  • Oversee automation for provisioning, scaling, and patch management of compute and containerized workloads
  • Manage configuration changes, infrastructure templates, and version baselines in production and staging environments
  • Operate HPE Ezmeral Unified Analytics, Data Fabric, and AI Essentials platforms
  • Support NVIDIA AI Enterprise (NVAIE) components, including NIMs, NeMo frameworks, and the RAPIDS runtime
  • Manage and monitor AI/ML workloads (LLM, NLP, Computer Vision, Chatbots) on containerized clusters
  • Ensure smooth operation of development tools like Jupyter, Spark, Airflow, MLflow, Kubeflow, and Ray
  • Administer VAST, WEKA, and Alletra MP storage solutions for file, object, and distributed storage
  • Monitor storage performance, replication, and capacity utilization
  • Coordinate with storage engineering teams for performance optimization

Skills

Key technologies and capabilities for this role

HPE Ezmeral, NVIDIA AI Enterprise, AI Infrastructure, HPC, GPU, Containerized Workloads, Automation Frameworks, Private Cloud AI, PCAI, AI Factory

Questions & Answers

Common questions about this position

Is this role remote or hybrid?

This role is hybrid, with an expectation of working from an HPE office on average 2 days per week.

What is the salary range for this position?

This information is not specified in the job description.

What key skills and expertise are required for this role?

The role requires deep hands-on expertise in AI, HPC, and GPU-accelerated environments and strong knowledge of HPE Ezmeral, NVIDIA AI Enterprise, containerized workloads, and automation frameworks. It also calls for experience administering vSphere, RHEL/RHOS, Kubernetes, and Rancher Harvester, and managing HPE servers such as the DL380a, DL325, and Cray XD670 with NVIDIA GPUs.

What is the company culture like at HPE?

HPE's culture centers on finding new and better ways to accelerate what’s next. The company values varied backgrounds, offers flexibility to balance work and personal needs, encourages teams to make bold moves together, and supports career growth.

What makes a strong candidate for this position?

A strong candidate is a Subject Matter Expert with deep hands-on expertise in administering and operating large-scale AI infrastructure, including HPE PCAI, AI Factory, GPU clusters, container platforms like Kubernetes, and tools for monitoring and lifecycle management.

Hewlett Packard Enterprise

Provides enterprise IT solutions and services

About Hewlett Packard Enterprise

Hewlett Packard Enterprise provides enterprise IT solutions with a focus on cloud services, artificial intelligence, and edge computing. Its products include HPE Ezmeral for managing containers, HPE GreenLake for cloud services, and HPE Aruba for networking. These solutions help businesses improve performance and adapt to digital transformation. HPE's business model includes selling hardware, software, and services, as well as offering subscription-based services and long-term contracts. What sets HPE apart from competitors is its commitment to open-source projects and its active developer community, which supports collaboration and innovation. The company's goal is to empower organizations to transform digitally and optimize their operations.

Headquarters: Houston, Texas
Year Founded: 1939
Company Stage: IPO
Industries: Hardware, Enterprise Software, AI & Machine Learning
Employees: 10,001+

Risks

Integration challenges with Juniper Networks may delay AI-driven networking benefits.
Competition from startups like Flywheel could impact HPE's AI and cloud services.
HPE's acquisition strategy may strain resources and distract from core operations.

Differentiation

HPE's GreenLake offers a unique hybrid cloud platform for diverse IT environments.
HPE Ezmeral provides advanced container management, enhancing enterprise AI and analytics capabilities.
HPE's Aruba solutions integrate cloud security and networking for seamless, secure connectivity.

Upsides

HPE's acquisition of Juniper Networks boosts AI-driven innovation in networking.
OpsRamp acquisition enhances HPE's IT management with AI-based automation capabilities.
Axis Security integration strengthens HPE's cloud security offerings with SASE solutions.
