Python+ETL+PySpark
Infosys
ETL Development
Design, develop, and maintain ETL pipelines using Python and PySpark. Extract, transform, and load data from multiple structured and unstructured sources. Build reusable and scalable data processing frameworks. Ensure data quality, validation, and consistency.
PySpark / Big Data Processing
Develop and optimize PySpark jobs for large-scale data processing. Work with Spark DataFrames and RDDs. Implement transformations, aggregations, and joins in Spark. Optimize jobs for performance and scalability.
Python Development
Develop backend logic and data processing scripts using Python. Write modular, reusable, and efficient code. Integrate APIs and automate workflows.
Data Management & Integration
Work with data lakes and warehouses (S3, HDFS, Redshift, Hive). Handle file formats such as Parquet, ORC, JSON, and CSV. Perform data cleansing, enrichment, and transformation.
Collaboration & Support
Work with data engineers, analysts, and business stakeholders. Debug and troubleshoot ETL/data pipeline issues. Participate in Agile/Scrum ceremonies. Maintain documentation and coding standards.
Core Skills
2–5 years of experience in Python and ETL development. Hands-on experience with PySpark (mandatory). Strong understanding of data processing and pipelines. Solid knowledge of SQL and database concepts.
Technical Skills
Experience with the Apache Spark ecosystem. Good knowledge of data structures and algorithms (basic to intermediate). Familiarity with Big Data technologies (Hadoop, Hive). Experience with version control (Git). Understanding of REST APIs and integrations.