via Career pages · 3d ago

Python + ETL + PySpark

Infosys

Full-time · On-site

Location: Hyderabad, India
Type: Full-time
Posted: 3d ago

ETL Development

Design, develop, and maintain ETL pipelines using Python and PySpark
Extract, transform, and load data from multiple structured and unstructured sources
Build reusable and scalable data processing frameworks
Ensure data quality, validation, and consistency

PySpark / Big Data Processing

Develop and optimize PySpark jobs for large-scale data processing
Work with Spark DataFrames and RDDs
Implement transformations, aggregations, and joins in Spark
Optimize jobs for performance and scalability

Python Development

Develop backend logic and data processing scripts using Python
Write modular, reusable, and efficient code
Integrate APIs and automate workflows

Data Management & Integration

Work with data lakes and warehouses (S3, HDFS, Redshift, Hive)
Handle file formats such as Parquet, ORC, JSON, and CSV
Perform data cleansing, enrichment, and transformation

Collaboration & Support

Work with data engineers, analysts, and business stakeholders
Debug and troubleshoot ETL/data pipeline issues
Participate in Agile/Scrum ceremonies
Maintain documentation and coding standards

Core Skills

2–5 years of experience in Python and ETL development
Hands-on experience with PySpark (mandatory)
Strong understanding of data processing and pipelines
Solid knowledge of SQL and database concepts

Technical Skills

Experience with the Apache Spark ecosystem
Good knowledge of data structures and algorithms (basic to intermediate)
Familiarity with Big Data technologies (Hadoop, Hive)
Experience with version control (Git)
Understanding of REST APIs and integrations
