Spark Data Engineer at Mactores

Mumbai, Maharashtra, India

Compensation: Not specified
Experience Level: Mid-level (3 to 4 years), Senior (5 to 8 years)
Job Type: Full time
Visa: Unknown
Industries: Technology, Data

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field
  • 4+ years of experience building distributed data processing pipelines, with deep expertise in Apache Spark
  • Strong understanding of Spark internals (Catalyst optimizer, DAG scheduling, shuffle, partitioning, caching)
  • Proficiency in Scala and/or PySpark with strong software engineering fundamentals
  • Solid expertise in ETL/ELT, distributed computing, and large-scale data processing
  • Experience with cluster and job orchestration frameworks
  • Strong ability to identify and resolve performance bottlenecks and production issues
  • Familiarity with data security, governance, and data quality frameworks
  • Excellent communication and collaboration skills to work with distributed engineering teams
  • Ability to work independently and deliver scalable solutions in a fast-paced environment

Preferred

  • Experience with Databricks, AWS EMR, Glue Spark, or GCP Dataproc
  • Familiarity with workflow orchestration tools like Apache Airflow, Dagster, or Prefect
  • Exposure to streaming platforms such as Kafka, Kinesis, or Pub/Sub
  • Experience running Spark workloads on Kubernetes
  • Familiarity with data warehouse ecosystems (Snowflake, BigQuery, Redshift, Iceberg, Delta Lake, Hudi)
  • Understanding of DevOps practices, CI/CD, and IaC (Terraform, CloudFormation)
  • Knowledge of distributed logging and monitoring tools (Grafana, Prometheus, CloudWatch, ELK)
  • Prior experience in high-scale production environments or data platform teams

Responsibilities

  • Architect, design, and build scalable data pipelines and distributed applications using Apache Spark (Spark SQL, DataFrames, RDDs)
  • Develop and manage ETL/ELT pipelines to process structured and unstructured data at scale
  • Write high-performance code in Scala or PySpark for distributed data processing workloads
  • Optimize Spark jobs by tuning shuffle, caching, partitioning, memory, executor cores, and cluster resource allocation
  • Monitor and troubleshoot Spark job failures, cluster performance, bottlenecks, and degraded workloads
  • Debug production issues using logs, metrics, and execution plans to maintain SLA-driven pipeline reliability
  • Deploy and manage Spark applications on on-prem or cloud platforms (AWS, Azure, or GCP)
  • Collaborate with data scientists, analysts, and engineers to design data models and enable self-serve analytics
  • Implement best practices around data quality, data reliability, security, and observability
  • Support cluster provisioning, configuration, and workload optimization on platforms like Kubernetes, YARN, or EMR/Databricks
  • Maintain version-controlled codebases, CI/CD pipelines, and deployment automation
  • Document architecture, data flows, pipelines, and runbooks for operational excellence

Skills

Apache Spark
Spark SQL
DataFrames
RDDs
Scala
PySpark
ETL
ELT
AWS
Azure
GCP
Kubernetes
YARN
EMR
Databricks

Mactores

Technology consulting in AI and IoT

About Mactores

Mactores provides technology consulting and engineering services to help businesses tackle complex challenges. The company specializes in artificial intelligence, fast data processing, industrial Internet of Things (IoT), and cloud solutions, and works closely with organizations to integrate these technologies into their operations. What sets Mactores apart from its competitors is its strong focus on understanding customer needs and industries, so that its solutions deliver real business value. Its stated goal is to achieve exceptional results that positively impact businesses, communities, and individuals around the world.

Headquarters: Seattle, Washington
Year Founded: 2008
Company Stage: Unknown
Industries: Consulting, Industrial & Manufacturing, AI & Machine Learning
Employees: 51-200

Risks

Increased competition from major tech firms like Google and Microsoft.
Potential talent shortages in AI and data science fields.
Economic downturns may reduce client budgets for technology consulting.

Differentiation

Mactores specializes in AI, Fast Data, Industrial IoT, and Cloud technologies.
The company has achieved AWS Generative AI Competency, showcasing its expertise.
Mactores focuses on technology as a strategic driver for business success.

Upsides

Growing demand for AI-driven data analytics boosts Mactores' market potential.
Rising interest in Industrial IoT enhances Mactores' service offerings.
Cloud-native technology adoption supports Mactores' scalability solutions.
