Senior Data Engineer at Datafold

İstanbul, İstanbul, Türkiye

Compensation: Not Specified
Experience Level: Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: Finance, Airlines, Retail

Requirements

  • BSc/MSc/PhD degree in Computer Science or a related field or equivalent work experience
  • 5+ years of experience in Data Engineering, Data Architecture or similar role building production systems at scale
  • Proven experience architecting and operating real-time analytics systems handling large volumes of data with demonstrated ability to discuss technical tradeoffs and scalability challenges
  • Strong experience with real-time data modeling, ETL/ELT practices, and streaming architectures at enterprise scale
  • Expert-level proficiency in one or more Python- or Java-based batch and/or stream processing frameworks such as Apache Spark, Apache Flink, or Kafka Streams (see the Spark sketch after this list)
  • Production experience with columnar stores and real-time analytics databases (Druid, ClickHouse strongly preferred)
  • Strong experience with relational and non-relational data stores, key-value stores and search engines (PostgreSQL, ScyllaDB, Redis, Hazelcast, Elasticsearch etc.)
  • Hands-on experience with data workflow orchestration tools such as Airflow or dbt (see the Airflow sketch after this list)
  • Deep understanding of storage formats such as Parquet, ORC and/or Avro
  • Strong experience with distributed systems, concurrent programming, and real-time data processing at scale
  • Experience with distributed storage systems like HDFS and/or S3
  • Familiarity with data lake and data warehouse solutions including Hive, Iceberg, Hudi and/or Delta Lake
  • Strong analytical thinking and problem-solving skills, with the ability to debug complex distributed systems
  • Strong verbal and written communication skills, with the ability to explain technical decisions to both engineers and business stakeholders
  • Nice to have: Familiarity with containerization & orchestration - Docker and/or Kubernetes
  • Nice to have: Experience with product analytics and behavioral analytics
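
A minimal, hypothetical sketch of the kind of Spark-based streaming work described above: a PySpark Structured Streaming job that reads events from a Kafka topic and lands them as Parquet. The broker address, topic name, event schema, and storage paths are placeholders chosen for illustration, not details from the job description; running it also requires the spark-sql-kafka connector package.

```python
# Illustrative sketch only: broker, topic, schema, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("event-ingest-sketch").getOrCreate()

# Assumed event payload: user_id, event_name, occurred_at.
event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_name", StringType()),
    StructField("occurred_at", TimestampType()),
])

# Parse the Kafka value bytes into typed columns.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "user-events")                  # placeholder topic
    .load()
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Append the parsed stream to a Parquet sink, with checkpointing for recovery.
query = (
    events.writeStream.format("parquet")
    .option("path", "s3a://data-lake/events/")                     # placeholder
    .option("checkpointLocation", "s3a://data-lake/checkpoints/")  # placeholder
    .outputMode("append")
    .start()
)
query.awaitTermination()
```

For the orchestration requirement, a similarly hedged sketch of an Airflow DAG that schedules a daily extract-and-load job. The DAG id, schedule, and task bodies are hypothetical, and the `schedule` argument assumes Airflow 2.4+.

```python
# Illustrative Airflow DAG sketch: dag_id, schedule, and tasks are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    """Pull the day's raw events from the source system (placeholder)."""


def transform_and_load():
    """Transform the extracted batch and load it into the warehouse (placeholder)."""


with DAG(
    dag_id="daily_events_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # Airflow 2.4+; older releases use schedule_interval
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(
        task_id="transform_and_load", python_callable=transform_and_load
    )

    # Load runs only after extraction succeeds.
    extract_task >> load_task
```
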

Responsibilities

  • Design, architect and build large-scale resilient real-time data pipelines processing billions of events daily from 120M+ users using Apache Spark, Apache Flink, Kafka and other tools and frameworks
  • Own end-to-end data platform architecture from SDK data ingestion to real-time analytics APIs serving major enterprises across banking, airlines, and retail, making key technical decisions on storage, processing, and serving layers
  • Build unified customer profile systems that consolidate fragmented user data across web, mobile, and IoT channels in real time (see the sketch after this list)
  • Write well-designed, reusable, testable, secure and scalable high-quality code that powers mission-critical analytics for major enterprise brands
  • Drive architectural decisions on streaming, storage, and processing layers as data volumes and client requirements grow
  • Collaborate with cross-functional teams including product, ML, and analytics to shape data strategy and enable data-driven insights
  • Mentor engineers and establish data engineering best practices across the organization
  • Ensure platform reliability and performance to meet enterprise SLAs for systems processing millions of events per second
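
A toy sketch of the unified-profile responsibility above: events arriving from different channels are folded into a single per-user profile keyed by user_id. The field names and the in-memory dictionary are illustrative only; a production system would keep this state inside a streaming job or a low-latency key-value store.

```python
# Illustrative sketch: merge multi-channel events into one profile per user.
from collections import defaultdict
from typing import Any, Dict

profiles: Dict[str, Dict[str, Any]] = defaultdict(
    lambda: {"channels": set(), "event_count": 0, "last_seen": None}
)


def merge_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Fold one incoming event into the user's unified profile."""
    profile = profiles[event["user_id"]]
    profile["channels"].add(event["channel"])  # e.g. "web", "mobile", "iot"
    profile["event_count"] += 1
    # Keep the most recent activity timestamp seen across all channels.
    ts = event["occurred_at"]
    if profile["last_seen"] is None or ts > profile["last_seen"]:
        profile["last_seen"] = ts
    return profile


# The same user appearing on two channels ends up with a single profile.
merge_event({"user_id": "u1", "channel": "web", "occurred_at": "2024-01-01T10:00:00"})
merge_event({"user_id": "u1", "channel": "mobile", "occurred_at": "2024-01-01T11:00:00"})
print(profiles["u1"]["channels"])  # prints both channels, e.g. {'web', 'mobile'}
```
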

Skills

Apache Spark
Apache Flink
Kafka
Data Pipelines
Real-time Streaming
Data Architecture
Customer Profiling
Scalable Systems

Datafold

Unified data quality platform for testing

About Datafold

Datafold provides a platform for maintaining high data quality through proactive, automated testing. It integrates into the development cycle so that data teams can test their data at key stages, such as code deployments and migrations, and catch issues before they reach the data warehouse. This approach differs from traditional data observability tools, which mainly surface problems after they occur. Datafold helps data teams across industries ensure the integrity and reliability of their data while speeding up their development processes. The company operates on a subscription model, generating revenue through recurring payments from its clients.
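
As a generic illustration of the proactive-testing idea described above (not Datafold's actual API), the snippet below compares cheap fingerprints of a staging table against its production counterpart before a change is promoted; the in-memory database and table names are placeholders.

```python
# Generic data-diff-style check; connection, tables, and data are placeholders.
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for real warehouse connections
conn.executescript("""
    CREATE TABLE orders_prod    (id INTEGER, amount REAL);
    CREATE TABLE orders_staging (id INTEGER, amount REAL);
    INSERT INTO orders_prod    VALUES (1, 9.99), (2, 5.00);
    INSERT INTO orders_staging VALUES (1, 9.99);  -- a row dropped by the change
""")


def table_fingerprint(table: str) -> tuple:
    """Return a cheap fingerprint (row count, distinct keys) for a table."""
    return tuple(
        conn.execute(f"SELECT COUNT(*), COUNT(DISTINCT id) FROM {table}").fetchone()
    )


# Flag the change if the staging build diverges from production.
if table_fingerprint("orders_staging") != table_fingerprint("orders_prod"):
    print("Data diff detected: review the change before promoting it")
```
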

Headquarters: San Francisco, California
Year Founded: 2020
Total Funding: $21.7M
Company Stage: Series A
Industries: Data & Analytics, Enterprise Software
Employees: 11-50

Benefits

Remote Work Options

Risks

Increased competition from companies like Monte Carlo focusing on data observability.
Complex data pipelines lead to more frequent data quality incidents.
Open-source tools may cannibalize the subscription-based revenue model.

Differentiation

Datafold integrates deeply into the development cycle, preventing bad code deployments.
It offers automated testing during deployment, migrations, and ongoing monitoring stages.
Datafold's platform supports instant table comparison and profiling for data QA.

Upsides

Growing demand for automated data quality solutions as data complexity increases.
Partnerships with companies like dbt Labs enhance data testing capabilities.
Investment interest highlighted by Datafold's $20M Series A funding.
