Senior Data Engineer at Datafold

İstanbul, İstanbul, Türkiye

Compensation: Not Specified
Experience Level: Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: Finance, Airlines, Retail

Requirements

  • BSc/MSc/PhD degree in Computer Science or a related field, or equivalent work experience
  • 5+ years of experience in Data Engineering, Data Architecture or a similar role building production systems at scale
  • Proven experience architecting and operating real-time analytics systems handling large volumes of data, with a demonstrated ability to discuss technical tradeoffs and scalability challenges
  • Strong experience with real-time data modeling, ETL/ELT practices, and streaming architectures at enterprise scale
  • Expert-level proficiency in one or more high-level Python- or Java-based batch and/or stream processing frameworks such as Apache Spark, Apache Flink or Kafka Streams (see the sketch after this list)
  • Production experience with columnar stores and real-time analytics databases (Druid, ClickHouse strongly preferred)
  • Strong experience with relational and non-relational data stores, key-value stores and search engines (PostgreSQL, ScyllaDB, Redis, Hazelcast, Elasticsearch, etc.)
  • Hands-on experience with data workflow orchestration tools like Airflow or dbt
  • Deep understanding of storage formats such as Parquet, ORC and/or Avro
  • Strong experience with distributed systems, concurrent programming, and real-time data processing at scale
  • Experience with distributed storage systems like HDFS and/or S3
  • Familiarity with data lake and data warehouse solutions including Hive, Iceberg, Hudi and/or Delta Lake
  • Strong analytical thinking and problem-solving skills, with the ability to debug complex distributed systems
  • Strong verbal and written communication skills, with the ability to explain technical decisions to both engineers and business stakeholders
  • Nice to have: Familiarity with containerization and orchestration (Docker and/or Kubernetes)
  • Nice to have: Experience with product analytics and behavioral analytics
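
As a minimal, hedged illustration of the stream-processing stack named above, the sketch below shows a PySpark Structured Streaming job that reads JSON events from Kafka and lands them as Parquet. It is not part of the job description; the broker address, topic name, schema, and storage paths are hypothetical placeholders.

```python
# Sketch only: requires the spark-sql-kafka connector package at submit time.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = (
    SparkSession.builder
    .appName("events-to-parquet")  # hypothetical application name
    .getOrCreate()
)

# Hypothetical event schema; real payloads would be defined by the ingestion SDK.
event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_name", StringType()),
    StructField("event_time", TimestampType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical brokers
    .option("subscribe", "user-events")                # hypothetical topic
    .load()
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Append each micro-batch as Parquet files; the checkpoint makes the sink resumable.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3a://datalake/events/")                         # hypothetical sink path
    .option("checkpointLocation", "s3a://datalake/checkpoints/events/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```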

Responsibilities

  • Design, architect and build large-scale resilient real-time data pipelines processing billions of events daily from 120M+ users using Apache Spark, Apache Flink, Kafka and other tools and frameworks
  • Own end-to-end data platform architecture from SDK data ingestion to real-time analytics APIs serving major enterprises across banking, airlines, and retail, making key technical decisions on storage, processing, and serving layers
  • Build unified customer profile systems that consolidate fragmented user data across web, mobile and IoT channels in real time (see the sketch after this list)
  • Write well-designed, reusable, testable, secure and scalable high-quality code that powers mission-critical analytics for major enterprise brands
  • Drive architectural decisions on streaming, storage, and processing layers as data volumes and client requirements grow
  • Collaborate with cross-functional teams including product, ML, and analytics to shape data strategy and enable data-driven insights
  • Mentor engineers and establish data engineering best practices across the organization
  • Ensure platform reliability and performance to meet enterprise SLAs for systems processing millions of events per second
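
As a minimal, hedged illustration of the profile-consolidation responsibility above (not the team's actual design), the sketch below merges events from several channel topics into a Redis hash keyed by user ID. Topic names, field names, and hosts are hypothetical, and a production pipeline would more likely use Flink or Spark with managed state rather than a single consumer loop.

```python
import json

import redis
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "broker:9092",   # hypothetical brokers
    "group.id": "profile-builder",        # hypothetical consumer group
    "auto.offset.reset": "earliest",
})
# Hypothetical per-channel topics for web, mobile, and IoT events.
consumer.subscribe(["web-events", "mobile-events", "iot-events"])

profiles = redis.Redis(host="redis", port=6379)  # hypothetical profile store

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    user_id = event["user_id"]  # assumes a shared identifier across channels
    # Merge the latest event attributes into the unified profile hash.
    profiles.hset(f"profile:{user_id}", mapping={
        "last_channel": event.get("channel", "unknown"),
        "last_event": event.get("event_name", ""),
        "last_seen": event.get("event_time", ""),
    })
```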

Skills

Key technologies and capabilities for this role

Apache Spark, Apache Flink, Kafka, Data Pipelines, Real-time Streaming, Data Architecture, Customer Profiling, Scalable Systems

Questions & Answers

Common questions about this position

What experience level is required for the Senior Data Engineer role?

The role requires 5+ years of experience in Data Engineering, Data Architecture or a similar role building production systems at scale.

What key technical skills are needed for this position?

Candidates need expert-level proficiency in Python- or Java-based frameworks such as Apache Spark, Apache Flink or Kafka Streams, along with strong experience in real-time data modeling, ETL/ELT practices, streaming architectures, and columnar stores.

What is the salary or compensation for this role?

This information is not specified in the job description.

Is this a remote position or does it require office work?

This information is not specified in the job description.

What education is required for the Senior Data Engineer position?

A BSc/MSc/PhD degree in Computer Science or a related field, or equivalent work experience, is required.

What does the company culture emphasize for developers?

Developers drive innovation, stay ahead of technology trends, embrace challenges, explore new technologies, and deliver simple and seamless solutions.

What makes a strong candidate for this Senior Data Engineer role?

Strong candidates have proven experience architecting real-time analytics systems at scale, the ability to discuss technical tradeoffs, and experience with production systems handling large data volumes.

Datafold

Unified data quality platform for testing

About Datafold

Datafold provides a platform that focuses on maintaining high data quality through proactive and automated testing. The platform works by integrating into the development cycle, allowing data teams to test their data at various stages, such as during code deployments and migrations, to catch potential issues before they affect the data warehouse. This approach is different from traditional data observability tools, which mainly identify problems after they occur. Datafold aims to help data teams across different industries ensure the integrity and reliability of their data, ultimately speeding up their development processes. The company operates on a subscription-based model, generating revenue through recurring payments from its clients.
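
As a hedged illustration of the table-comparison idea behind this kind of proactive testing (not Datafold's actual implementation), the sketch below compares row counts and cheap per-column fingerprints between a "production" and a "staging" version of a table and reports what drifted. Table and column names are hypothetical, and a real check would run against the warehouse over a DB-API or warehouse-native connection.

```python
import sqlite3

def table_summary(conn, table, columns):
    """Row count plus a cheap per-column fingerprint (distinct count, total text length)."""
    summary = {"row_count": conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]}
    for col in columns:
        row = conn.execute(
            f"SELECT COUNT(DISTINCT {col}), SUM(LENGTH(CAST({col} AS TEXT))) FROM {table}"
        ).fetchone()
        summary[col] = tuple(row)
    return summary

def diff_tables(conn, prod_table, staging_table, columns):
    """Return every summary entry that differs between the two table versions."""
    prod = table_summary(conn, prod_table, columns)
    staging = table_summary(conn, staging_table, columns)
    return {k: (prod[k], staging[k]) for k in prod if prod[k] != staging[k]}

if __name__ == "__main__":
    # Hypothetical in-memory example: the staging copy has lost a row.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders_prod (id INTEGER, amount REAL);
        CREATE TABLE orders_staging (id INTEGER, amount REAL);
        INSERT INTO orders_prod VALUES (1, 10.0), (2, 20.0);
        INSERT INTO orders_staging VALUES (1, 10.0);
    """)
    print(diff_tables(conn, "orders_prod", "orders_staging", ["id", "amount"]))
```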

Headquarters: San Francisco, California
Year Founded: 2020
Total Funding: $21.7M
Company Stage: Series A
Industries: Data & Analytics, Enterprise Software
Employees: 11-50

Benefits

Remote Work Options

Risks

Increased competition from companies like Monte Carlo focusing on data observability.
Complex data pipelines lead to more frequent data quality incidents.
Open-source tools may cannibalize the subscription-based revenue model.

Differentiation

Datafold integrates deeply into the development cycle, preventing bad code deployments.
It offers automated testing during deployment, migrations, and ongoing monitoring stages.
Datafold's platform supports instant table comparison and profiling for data QA.

Upsides

Growing demand for automated data quality solutions as data complexity increases.
Partnerships with companies like dbt Labs enhance data testing capabilities.
Investment interest highlighted by Datafold's $20M Series A funding.
