About the Role
We’re seeking a deeply experienced Principal Engineer to lead architecture and engineering strategy across our Risk, Fraud, and Disputes platforms. This is a high-impact, hands-on technical leadership role focused on building ultra-low-latency, high-availability (99.999%), and scalable systems that power real-time transaction processing, fraud detection, and dispute workflows handling hundreds of billions of dollars.
You'll partner with product, platform, and data/ML teams to deliver resilient infrastructure, embed AI/LLM automation, and ensure secure, compliant integrations with global payment networks and identity vendors.
We work Flexible First. This role can be performed remotely anywhere within the United States or from our headquarters in Oakland, CA. We’d love for you to join us!
The Impact You’ll Have
- Architect and scale real-time, high-throughput systems for fraud detection, dispute handling, and identity verification.
- Build and optimize systems with sub-50ms latency SLAs to support real-time decisioning across fraud, 3DS, and dispute domains.
- Partner with ML/AI teams to integrate model pipelines and LLM-powered automation for intelligent classification, summarization, and recommendations.
- Design secure, fault-tolerant systems using event-driven architectures, Kafka, Flink, gRPC, and microservices.
- Design and launch AI agents that drive step changes in how risk and fraud are mitigated.
- Integrate with card networks, including Visa, Mastercard, and Pulse.
- Lead infrastructure-as-code implementation with Terraform and scale CI/CD pipelines using modern tooling.
- Implement observability, SLOs, tracing, and operational controls to ensure resilience and performance under high load.
- Drive integrations with specialized third-party providers for data enrichment and compliance automation.
- Mentor senior engineers, review high-risk architecture, and help steer technical decision-making across product teams.
- Apply and evangelize modern best practices in your software designs and implementations to increase the platform's reliability, resiliency, and scalability.
Who You Are
- 15+ years of software engineering experience, including designing, building, and operating real-time distributed systems at scale.
- Team-oriented approach: you can effectively lead a project, participate as an influential team member, and work cross-functionally with other organizations.
- Experience partnering with product teams to ideate and bring innovative solutions to market.
- Strong engineering skills in Java, with solid knowledge of microservices, message queues, circuit breakers, retries, rate limiting, and asynchronous systems.
- Hands-on experience with CI/CD systems, infrastructure-as-code using Terraform, containerization (Docker), and DevSecOps best practices.
- Proven experience with AI/ML and LLMs in production environments, especially for risk decisioning flows or agentic automation.
- Deep expertise in cloud-native architectures on AWS, including services like Lambda, ECS, and EKS; real-time data streaming using Kafka; and experience building high-performance APIs (REST/gRPC) under tight latency SLAs.
- Knowledge of card network specifications.
- Hands-on technical ability to write code, debug issues, or write queries to unblock teams, validate new ideas, and integrate emerging technologies.
- Proficiency in SQL and data analysis using tools such as MySQL and Snowflake, as well as scripting languages.
Nice to Haves
- Deep experience in risk, fraud, 3DS, or dispute platforms, with high sensitivity to performance, reliability, and compliance.
- Demonstrated success partnering with ML and fraud teams to deploy real-time models into production (e.g., SageMaker, Vertex AI, or Ray Serve) under tight latency constraints.
- Integration experience with U.S. card networks and fraud/identity platforms to incorporate inline risk signals during authorization.
- Proficiency with observability tools such as Datadog and AWS X-Ray to diagnose latency and system issues.
- Familiarity with