Data Engineering
SPAN Technology Services
STS-DATA ENGINEERING-1
Erode
Job Summary
We are looking for motivated Data Engineers to join our team and help design, build, and maintain scalable data pipelines and warehouse solutions. Ideal candidates will have hands-on experience with modern data engineering tools and a strong interest in real-time and batch data processing.
Responsibilities
- Design and implement data models to support analytical and reporting needs, ensuring efficient schema design for warehouse and transactional systems
- Build, maintain, and optimize ETL/ELT pipelines for batch and real-time data processing using Python, PySpark, Apache Kafka, and Apache Flink
- Set up and manage data warehouse infrastructure (Doris), including table design, partitioning strategies, and performance tuning
- Develop and schedule data workflows using Apache Airflow, ensuring reliability and monitoring of pipeline jobs
- Integrate data from multiple sources including PostgreSQL, MS SQL, and MinIO into the data warehouse
- Containerize data pipeline components using Docker for consistent deployment across environments
- Ensure data quality, consistency, and integrity across pipelines through validation and testing
- Collaborate with analysts, data scientists, and other engineering teams to understand data requirements and deliver solutions
- Troubleshoot and resolve issues related to data pipelines, storage, and processing performance
- Document data flows, schemas, and pipeline architecture for team reference
Required Qualifications
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field
- 1 to 3 years of hands-on experience in data engineering
- Proficiency in Python and SQL for data processing and transformation
- Experience with Apache Kafka for stream processing and messaging
- Experience with Apache Flink or similar stream/batch processing frameworks
- Hands-on experience with PySpark for large-scale data processing
- Familiarity with Apache Airflow for workflow orchestration and scheduling
- Working knowledge of relational databases (PostgreSQL, MS SQL)
- Exposure to OLAP/data warehouse systems (Doris or similar columnar databases)
- Experience with Docker for containerization of data pipelines
- Familiarity with object storage solutions (MinIO or similar S3-compatible storage)
- Understanding of data warehousing concepts, data modeling, and ETL/ELT pipeline design
Preferred Skills
- Experience building real-time data pipelines using Kafka + Flink
- Knowledge of distributed computing concepts (partitioning, parallelism, fault tolerance)
- Familiarity with CI/CD practices for deploying data pipelines
- Experience with version control (Git)
- Understanding of data quality, validation, and monitoring practices
- Basic knowledge of cloud platforms (AWS, Azure, or GCP) is a plus
Soft Skills
- Strong problem-solving and debugging skills
- Good communication and collaboration abilities
- Eagerness to learn and adapt to new tools in a fast-paced environment