Data Engineering

SPAN Technology Services

STS-DATA ENGINEERING-1


Erode

Job Summary:
We are looking for motivated Data Engineers to join our team and help design, build, and maintain scalable data pipelines and warehouse solutions. The ideal candidates will have hands-on experience with modern data engineering tools and a strong interest in real-time and batch data processing.

Responsibilities

Design and implement data models to support analytical and reporting needs, ensuring efficient schema design for warehouse and transactional systems
Build, maintain, and optimize ETL/ELT pipelines for batch and real-time data processing using Python, PySpark, Apache Kafka, and Apache Flink
Set up and manage data warehouse infrastructure (Doris), including table design, partitioning strategies, and performance tuning
Develop and schedule data workflows using Apache Airflow, ensuring the reliability and monitoring of pipeline jobs
Integrate data from multiple sources, including PostgreSQL, MS SQL, and MinIO, into the data warehouse
Containerize data pipeline components using Docker for consistent deployment across environments
Ensure data quality, consistency, and integrity across pipelines through validation and testing
Collaborate with analysts, data scientists, and other engineering teams to understand data requirements and deliver solutions
Troubleshoot and resolve issues related to data pipelines, storage, and processing performance
Document data flows, schemas, and pipeline architecture for team reference

Required Qualifications

Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field
1 to 3 years of hands-on experience in data engineering
Proficiency in Python and SQL for data processing and transformation
Experience with Apache Kafka for stream processing and messaging
Experience with Apache Flink or similar stream/batch processing frameworks
Hands-on experience with PySpark for large-scale data processing
Familiarity with Apache Airflow for workflow orchestration and scheduling
Working knowledge of relational databases (PostgreSQL, MS SQL)
Exposure to OLAP/data warehouse systems (Doris or similar columnar databases)
Experience with Docker for containerization of data pipelines
Familiarity with object storage solutions (MinIO or similar S3-compatible storage)
Understanding of data warehousing concepts, data modeling, and ETL/ELT pipeline design

Preferred Skills

Experience building real-time data pipelines using Kafka + Flink
Knowledge of distributed computing concepts (partitioning, parallelism, fault tolerance)
Familiarity with CI/CD practices for deploying data pipelines
Experience with version control (Git)
Understanding of data quality, validation, and monitoring practices
Basic knowledge of cloud platforms (AWS, Azure, or GCP) is a plus

Soft Skills

Strong problem-solving and debugging skills
Good communication and collaboration abilities
Eagerness to learn and adapt to new tools in a fast-paced environment