Lead Data Pipeline Engineer
Two Six Technologies
Full Time
Junior (1 to 2 years)
Candidates should have 3+ years of experience in a technical role, including writing Python scripts for data processing and interfacing with technical and non-technical teams. A Bachelor's Degree in Engineering, Computer Science, or a technical field is required. Additionally, 2+ years of experience using LLMs in prompting frameworks and some experience with machine learning models in scripts or data pipelines are necessary. Practical experience using LLMs or traditional models to assist annotation QA or generate/transform data is also required.
The Data Operations Engineer will build, deploy, and maintain Python automation scripts and tools to streamline data annotation, automate tasks, and reduce manual effort. They will identify and resolve bottlenecks in the data labeling pipeline to enhance throughput, accuracy, and scalability. This role involves working with the Project Management team to ensure data labeling accuracy, troubleshooting data quality issues, and planning quality assurance workflows using GenAI and open-source models. Responsibilities also include setting up monitoring tools, reporting key metrics, integrating and managing third-party API tools with Labelbox, building and maintaining internal tools with Retool, and providing technical support to project managers and labelers.
Provides data labeling solutions for AI
Labelbox offers data labeling solutions for artificial intelligence applications, enabling businesses to label images, videos, text, and documents efficiently. Its platform lets users create workflows that manage labeling tasks, a capability that matters for industries such as agriculture and healthcare, which need large-scale labeled data for AI model training. Operating on a software-as-a-service (SaaS) model, Labelbox generates revenue through subscription fees and additional workforce services. The company's goal is to advance AI development by providing high-quality data labeling solutions that streamline workflows.