Senior Platform Engineer, Machine Learning
Fieldguide - Full Time
- Senior (5 to 8 years), Mid-level (3 to 4 years)
Candidates should have professional experience with model training, Hugging Face Transformers, PyTorch, vLLM, TensorRT, infrastructure-as-code tools like Terraform, scripting languages such as Python or Bash, and cloud platforms such as Google Cloud, AWS, or Azure. Familiarity with high-performance, large-scale ML systems and monitoring tools such as Prometheus or Grafana is also required. A proactive approach to troubleshooting complex systems and identifying performance bottlenecks is essential, along with a strong desire to build and operate scalable, reliable systems. Preferred qualifications include 5+ years of building core infrastructure and experience running inference clusters at scale.
The Platform Engineer, MLOps, will work closely with AI/ML engineers and researchers to design and deploy a CI/CD pipeline for safe and reproducible experiments. They will set up and manage monitoring, logging, and alerting systems for extensive training runs and client-facing APIs. Key duties include keeping training environments consistently available across multiple clusters, developing and managing containerization and orchestration systems using Docker and Kubernetes, and operating large Kubernetes clusters with GPU workloads. Additionally, they will improve the reliability, quality, and time-to-market of software solutions and provide operational support for large-scale distributed applications.
AI platform for business process optimization
Writer.com provides a platform that uses artificial intelligence to enhance business processes for various clients, including major companies. The platform integrates Large Language Models (LLMs), Natural Language Processing (NLP), and Machine Learning (ML) to tailor AI solutions to a client's specific brand and knowledge. LLMs generate human-like text, NLP helps computers understand human language, and ML enables learning from data. Writer's platform is built on secure, enterprise-grade LLMs called Palmyra, which are designed to be open and transparent, ensuring that client data is never used for training. This allows clients to self-host the models, giving them control and visibility over their data. The platform assists businesses in speeding up processes and generating customized outputs, such as summaries and insights, while adhering to legal and brand guidelines. Writer.com operates on a subscription-based model, where clients pay a recurring fee for access to its features.