Experience building reliable ML pipelines in production that manage the ML lifecycle on AWS and/or other cloud providers
Understanding of the requirements of a well-managed ML lifecycle, such as model and data versioning, experiment tracking, a feature store, CI/CD, and continuous training (CT) on NVIDIA GPUs
Mastery of Python and at least one other programming language such as Java, Kotlin, Scala, Golang, Rust, or C++
Mastery of model serving practices for batch and stream processing
Hands-on experience building and operating a data lake using one or more of the following big data frameworks or services: Spark, Kafka, Airflow, DBT, Debezium, AWS Athena, AWS Glue, Delta Lake/Iceberg, etc.
Experience with Kubernetes, Docker, Terraform, or other cluster management tools
Responsibilities
Unifying ML system development and operations
Designing and building a robust ML lifecycle pipeline
Identifying opportunities to improve data handling
Promoting best practices for ML lifecycle management