Senior Data Engineer
We are looking for a Senior Data Engineer with strong experience in building and optimising data pipelines using Databricks, Apache Spark, and PySpark. The ideal candidate is passionate about data architecture, performance optimisation, and working with high-scale distributed data systems.
You will play a key role in designing and developing scalable data ingestion, transformation, and processing pipelines, enabling reliable and timely data for downstream analytics, reporting, and machine learning.
About the Client
Our client is the world's largest human resources consulting firm, headquartered in New York City with main branches in 40+ countries. Its more than 20,500 employees serve clients in over 130 countries, and its services are used by 97% of Fortune 500 companies.
What You’ll Do
- Design, develop, and maintain scalable and efficient data pipelines using Databricks, Apache Spark, and PySpark
- Collaborate with data scientists, analysts, and product teams to understand data requirements and ensure reliable data delivery
- Implement ETL/ELT workflows to extract, cleanse, transform, and load data from various structured and unstructured sources
- Optimise Spark jobs and workflows for performance, scalability, and cost-efficiency
- Develop reusable components, frameworks, and libraries to accelerate pipeline development
- Monitor data quality and pipeline health; implement data validation and error-handling mechanisms
- Ensure compliance with security, privacy, and governance policies
- Contribute to best practices in data engineering and cloud-native data architecture
What You Bring
- 3–6+ years of experience in data engineering or software engineering with a focus on large-scale data processing
- Strong hands-on experience with Apache Spark and PySpark
- Proficiency with the Databricks platform (including notebooks, jobs, clusters, and workspace management)
- Solid knowledge of data formats (Parquet, Avro, JSON, etc.) and data modeling concepts
- Experience building and orchestrating ETL/ELT pipelines (e.g., Airflow, Databricks Workflows, or Azure Data Factory)
- Familiarity with cloud platforms (Azure, AWS, or GCP) and their data services
- Strong programming skills in Python; SQL expertise is a must
- Understanding of CI/CD practices and version control (Git)
- Ability to work in Agile development environments and collaborate with cross-functional teams
Nice to Have
- Experience with Delta Lake or other transactional data lake technologies
- Familiarity with data lakehouse architecture
- Exposure to data warehousing tools and MPP databases (Snowflake, Redshift, BigQuery, etc.)
- Knowledge of data governance, lineage, and cataloging tools (e.g., Unity Catalog, DataHub, Collibra)
- Experience with streaming data (Kafka, Spark Structured Streaming)
- English level: Upper-Intermediate
- Department: Data Engineering
- Locations: Armenia, Bulgaria, Latvia, Poland, Serbia, Spain, Turkey, Uzbekistan
- Remote status: Fully Remote
About Bonapolia
For job seekers, Bonapolia offers a gateway to exciting career prospects and the chance to thrive in a fulfilling work environment. We believe that the right job can transform lives, and we are committed to making that happen for you.