Lead Data Engineer
Access Systems, Internship
Mid-level (3 to 4 years), Senior (5 to 8 years)
Candidates should have 5+ years of data engineering experience with a significant focus on on-premises systems such as Hadoop and HDFS, and a record of consistent tenure, averaging 2+ years per employer. Advanced proficiency in Python and Java/Scala is required, along with expertise in tools like Airflow, Kafka, Spark, and Hive, and working knowledge of SQL across various database dialects. Experience in data governance, data lineage, and data quality initiatives is desirable, as is exposure to architectural/system design or technical leadership tasks.
As a Senior Data Engineer, you will design and build scalable data pipelines using tools like Airflow, Spark, and Kafka; monitor and alert on data quality issues; support data governance and lineage solutions; and contribute to the development of the shared data platform for use cases like product analytics and bot detection. You will also strengthen operational excellence by identifying and implementing improvements in system reliability and performance. The role involves working effectively within and across teams, producing clear technical designs, and articulating ideas to both technical and non-technical stakeholders.
Operates Wikipedia and free knowledge projects
The Wikimedia Foundation operates Wikipedia and other free knowledge projects, aiming to create a world where everyone can freely access and share knowledge. It provides a platform for users to read, contribute, and share content, while also supporting the volunteer communities that help maintain these projects. As a nonprofit, the foundation is funded by donations from individuals and institutions. Unlike many other organizations, it focuses on making knowledge accessible to all without charge, and it advocates for policies that support free knowledge initiatives. Its goal is to empower individuals to contribute to and benefit from a collective pool of knowledge.