\u003C/p>\u003C/li>\u003C/ul>\u003Ch2>\u003Cstrong>Desired Capabilities\u003C/strong>\u003C/h2>\u003Cul style=\"min-height:1.5em\">\u003Cli>\u003Cp style=\"min-height:1.5em\">PhD or equivalent research experience in machine learning, computer science, cognitive science, or related fields with focus on AI evaluation, interpretability, or model understanding.\u003C/p>\u003C/li>\u003Cli>\u003Cp style=\"min-height:1.5em\">6+ years of academic or industry experience post-doc in a research-first environment\u003C/p>\u003C/li>\u003Cli>\u003Cp style=\"min-height:1.5em\">Strong background in LLM research, evaluation methodologies, and/or foundational AI assessment techniques.\u003C/p>\u003C/li>\u003Cli>\u003Cp style=\"min-height:1.5em\">Proven ability to independently design, lead, and execute evaluation research programs with novel data types end-to-end.\u003C/p>\u003C/li>\u003Cli>\u003Cp style=\"min-height:1.5em\">Deep proficiency in Python and PyTorch for large-scale model analysis, benchmarking, and evaluation.\u003C/p>\u003C/li>\u003Cli>\u003Cp style=\"min-height:1.5em\">Experience building or leading novel benchmark development, systematic model assessment, or interpretability studies.\u003C/p>\u003C/li>\u003Cli>\u003Cp style=\"min-height:1.5em\">Strong publication record in post-training, evaluation, or interpretability that demonstrates field-defining contributions.\u003C/p>\u003C/li>\u003Cli>\u003Cp style=\"min-height:1.5em\">Ability to clearly communicate complex insights and influence both technical and non-technical stakeholders.\u003C/p>\u003Cp style=\"min-height:1.5em\">\u003C/p>\u003C/li>\u003C/ul>\u003Ch2>\u003Cstrong>Extra Credit\u003C/strong>\u003C/h2>\u003Cul style=\"min-height:1.5em\">\u003Cli>\u003Cp style=\"min-height:1.5em\">Experience with RLHF, agent modeling, or AI alignment research.\u003C/p>\u003C/li>\u003Cli>\u003Cp style=\"min-height:1.5em\">Familiarity with data-centric AI approaches, synthetic data generation, or human-in-the-loop 
systems.\u003C/p>\u003C/li>\u003Cli>\u003Cp style=\"min-height:1.5em\">Understanding of challenges in scaling foundation models (training stability, safety, inference efficiency).\u003C/p>\u003C/li>\u003Cli>\u003Cp style=\"min-height:1.5em\">Contributions to open-source libraries or research tooling.\u003C/p>\u003C/li>\u003Cli>\u003Cp style=\"min-height:1.5em\">Interest in the societal impact, deployment ethics, and governance of frontier AI systems.\u003C/p>\u003C/li>\u003C/ul>\u003Ch1>\u003Cstrong>Perks\u003C/strong>\u003C/h1>\u003Cp style=\"min-height:1.5em\">Handshake delivers benefits that help you feel supported and thrive at work and in life.\u003C/p>\u003Cp style=\"min-height:1.5em\">\u003C/p>\u003Cp style=\"min-height:1.5em\">\u003Cem>\u003Cstrong>The benefits below are for full-time US employees.\u003C/strong>\u003C/em>\u003C/p>\u003Cp style=\"min-height:1.5em\">🎯 \u003Cstrong>Ownership:\u003C/strong> Equity in a fast-growing company\u003C/p>\u003Cp style=\"min-height:1.5em\">💰 \u003Cstrong>Financial Wellness:\u003C/strong> 401(k) match, competitive compensation, financial coaching\u003C/p>\u003Cp style=\"min-height:1.5em\">🍼 \u003Cstrong>Family Support:\u003C/strong> Paid parental leave, fertility benefits, parental coaching\u003C/p>\u003Cp style=\"min-height:1.5em\">💝 \u003Cstrong>Wellbeing:\u003C/strong> Medical, dental, and vision coverage, mental health support, $500 wellness stipend\u003C/p>\u003Cp style=\"min-height:1.5em\">📚 \u003Cstrong>Growth:\u003C/strong> $2,000 learning stipend, ongoing development\u003C/p>\u003Cp style=\"min-height:1.5em\">💻 \u003Cstrong>Remote & Office:\u003C/strong> Stipends for home office setup, internet, and commuting, plus free lunch and gym access in our SF office\u003C/p>\u003Cp style=\"min-height:1.5em\">🏝 \u003Cstrong>Time Off:\u003C/strong> Flexible PTO, 15 holidays + 2 flex days, and a winter #ShakeBreak when our whole office closes for a week!\u003C/p>\u003Cp style=\"min-height:1.5em\">🤝 \u003Cstrong>Connection:\u003C/strong> Team outings & 
referral bonuses\u003C/p>\u003Cp style=\"min-height:1.5em\">Explore our mission, values, and comprehensive US benefits at \u003Ca target=\"_blank\" rel=\"noopener noreferrer nofollow\" href=\"http://joinhandshake.com/careers.Your\">\u003Cu>joinhandshake.com/careers\u003C/u>.\u003C/a>\u003C/p>\u003Ch1>\u003C/h1>","https://jobs.ashbyhq.com/handshake/dc125e6b-86ef-409e-9b54-8c92b95f9cf7",{"id":222,"name":223,"urlSafeSlug":223,"logo":224},[681,682,683],{"city":35,"region":36,"country":37},{"city":256,"region":256,"country":37},{"city":31,"region":31,"country":37},"2025-10-07T07:20:21.655Z","- PhD or equivalent research experience in machine learning, computer science, cognitive science, or related fields with focus on AI evaluation, interpretability, or model understanding\n- 6+ years of academic or industry experience post-doc in a research-first environment\n- Strong background in LLM research, evaluation methodologies, and/or foundational AI assessment techniques\n- Proven ability to independently design, lead, and execute evaluation research programs with novel data types end-to-end\n- Deep proficiency in Python and PyTorch for large-scale model analysis, benchmarking, and evaluation\n- Experience building or leading novel benchmark development, systematic model assessment, or interpretability studies\n- Strong publication record in post-training, evaluation, or interpretability that demonstrates field-defining contributions\n- Ability to clearly communicate complex insights and influence both technical and non-technical stakeholders","- Lead teams of researchers to produce original research in LLM evaluation methodologies, interpretability, and human-AI knowledge alignment\n- Develop novel frameworks and assessment techniques that reveal deep insights into model capabilities, limitations, and emergent behaviors\n- Collaborate with engineers to translate research breakthroughs into scalable benchmarks, evaluation systems, and standards\n- Pioneer new approaches to 
measuring reasoning, alignment, and trustworthiness in frontier AI systems\n- Author high-quality code to enable large-scale experimentation, reproducible evaluation, and knowledge assessment workflows\n- Publish in top-tier conferences and journals, establishing new directions in the science of AI evaluation\n- Work cross-functionally with leadership, engineers, and external partners to set industry standards for responsible AI evaluation and alignment",{"employment":688,"compensation":690,"experience":693,"visaSponsorship":696,"location":697,"skills":698,"industries":704},{"type":689},{"id":265,"name":266,"description":267},{"minAnnualSalary":691,"maxAnnualSalary":692,"currency":271,"details":31},350000,420000,{"experienceLevels":694},[695],{"id":323,"name":324,"description":452},{"type":279},{"type":279},[699,700,701,394,702,703],"LLM Evaluation","LLM Interpretability","Human-AI Knowledge Alignment","Framework Development","Assessment Techniques",[705],{"id":291,"name":292},["Reactive",707],{"$ssite-config":708},{"env":709,"name":710,"url":711},"production","nuxt-app","https://jobo.world",["Set"],["ShallowReactive",714],{"company-Handshake":-1,"company-jobs-5fa6c79c-f9bd-4013-b9de-ca0aa5a81c72-carousel":-1},"/company/Handshake",{}]