Develop software, typically in Python, to independently acquire data from disparate sources (databases, files, APIs, etc.) and combine them into appropriate training, validation, and testing datasets
Analyze raw datasets using descriptive statistics, working directly with domain experts to understand the meaning of data fields
Have a deep understanding of Generative AI and LLMs, their evaluation metrics, and the theory behind LLMs and conversational AI
Be able to explain the full Retrieval-Augmented Generation (RAG) pattern and recommend specific technologies and techniques for building solutions based on it
Leverage open-source AI software like TensorFlow, PyTorch, HuggingFace, LangChain, AutoGen, LangGraph, and LlamaIndex for solution development and evaluation
Evaluate various AI frameworks and services for efficacy and make recommendations on their inclusion as standardized tooling for AI development
Integrate tools and software with Azure services for seamless development and deployment
Use Terraform to automate the provisioning of cloud resources and other infrastructure tasks (infrastructure as code)
Build unit tests, data quality checks, and data pipelines to ensure algorithms use trusted data
Implement CI/CD pipelines following MLOps practices, using Azure DevOps, GitHub Actions, and Terraform for automated deployment of AI solutions
Utilize Azure Kubernetes Service (AKS) for managing and scaling containerized AI applications
Monitor deployed AI models with Azure Monitor tools, and regularly update and maintain them
Enforce strict security measures and controls in Azure, including network security configurations, identity management, data encryption, and privacy
Comply with industry standards, best practices, and regulations for AI solutions
Work across multiple projects in a fluid environment, covering the full research lifecycle (forming hypotheses, acquiring data, developing ETL-style software, presenting findings)
Provide guidance to others
Responsibilities
Develop, deploy, and maintain state-of-the-art Generative AI solutions within the Azure cloud environment
Build and refine Large Language Models (LLMs) and RAG systems on Azure infrastructure, ensuring they meet business requirements and enterprise-level standards
Develop and integrate Retrieval-Augmented Generation (RAG) systems on Azure cloud
Perform data science research and develop custom prototypes for problems not addressed by commercial solutions, enhancing customer/internal experience and providing competitive advantage
Roll out Generative AI solutions and frameworks, from proofs of concept to full-stack development and vendor evaluations
Inspire and empower applied data science practitioners through high-quality data science education and collaboration within Northern Trust and with top universities/research institutions