Senior AI Engineer, GenAI & ML Evaluation Frameworks - Grafana Ops, AI/ML | USA | Remote at Grafana Labs

United States

Compensation: Not Specified
Experience Level: Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: Software

Requirements

  • Experience designing and implementing evaluation frameworks for AI/ML systems
  • Familiarity with prompt engineering, structured output evaluation, and context-window management in LLM systems
  • High autonomy to collaborate and translate team goals into clear, testable criteria supported by effective tooling
  • Experience working in environments with rapid iteration and experimental development (bonus point)
  • Pragmatic mindset that values reproducibility, developer experience, and thoughtful trade-offs when scaling GenAI systems (bonus point)
  • Passion for minimizing human toil and building AI systems that actively support engineers (bonus point)

Responsibilities

  • Design and implement robust evaluation frameworks for GenAI and LLM-based systems, including golden test sets, regression tracking, LLM-as-judge methods, and structured output verification
  • Develop tooling to enable automated, low-friction evaluation of model outputs, prompts, and agent behaviors
  • Define and refine metrics for both structure and semantics, ensuring alignment with realistic use cases and operational constraints
  • Lead the development of dataset management processes and guide teams across Grafana in best practices for GenAI evaluation

Skills

Key technologies and capabilities for this role

Generative AI, Large Language Models (LLMs), Evaluation Frameworks, Observability, AI/ML

Questions & Answers

Common questions about this position

Is this position remote?

Yes, this is a remote role, and the company is remote-first. Grafana Labs is considering applicants in USA time zones only.

What salary or compensation does this role offer?

This information is not specified in the job description.

What skills and experience are required for this Senior AI Engineer role?

The role requires experience designing and implementing evaluation frameworks for GenAI and LLM-based systems. Expertise in evaluating Generative AI systems, particularly Large Language Models (LLMs), is essential.

What is the company culture like at Grafana Labs?

Grafana Labs has a global collaborative culture with transparency, autonomy, and trust. The team thrives in an innovation-driven environment.

What makes a strong candidate for this role?

Candidates with experience designing and implementing evaluation frameworks for GenAI and LLMs who are excited about the role will be a great fit, even if they don't meet every requirement.

Grafana Labs

Observability and monitoring solutions provider

About Grafana Labs

Grafana Labs specializes in observability and monitoring solutions for cloud infrastructure and applications. Its main product, Grafana, is an open-source metrics dashboard that allows users to visualize and analyze data from various sources. This helps businesses monitor the performance and health of their systems in real time. Grafana Labs serves a wide range of clients, from large enterprises to individual developers, particularly in sectors like technology, finance, healthcare, and retail. Unlike many competitors, Grafana Labs offers both open-source and commercial products, generating revenue through premium features, enterprise support, and managed cloud services. The company's goal is to provide essential tools for monitoring and visualizing data, ensuring that digital services are reliable and efficient.

Headquarters: New York City, New York
Year Founded: 2014
Total Funding: $783.2M
Company Stage: Series D
Industries: Data & Analytics, Enterprise Software
Employees: 1,001-5,000

Benefits

30 days of paid vacation each year on top of national holidays, parental leave, & sick leave
Health coverage
4% contribution match on our 401(k)
$1,500 learning and development stipend
Udemy subscription
Complimentary subscription to Headspace
Discounts on a wide variety of services, including entertainment, food, and fitness
Remote Work Option
Global Employee Assistance Program

Risks

Increased competition from Datadog, Dynatrace, and New Relic pressures Grafana to innovate.
Recent critical security vulnerabilities could affect customer trust and adoption.
Rapid regional expansion may strain resources and impact service quality.

Differentiation

Grafana Labs offers a unique open-source observability platform with extensive customization.
The company provides both self-managed and fully managed observability solutions for diverse needs.
Grafana Labs integrates AI to enhance data analysis and monitoring capabilities.

Upsides

Grafana Labs raised $270M, boosting its valuation to over $6 billion in 2024.
The company surpassed $250M in annual recurring revenue with over 5,000 customers.
Expansion into Southeast Asia shows Grafana's commitment to localized cloud services.
