Python-PySpark Developer
Infosys
We are looking for an experienced Python PySpark Developer to design, develop, and optimize large-scale data processing systems. The ideal candidate will work on big data platforms, build scalable ETL pipelines, and process high-volume datasets using Spark and Python.
Key Responsibilities
Data Engineering & Development
Develop and maintain data pipelines using Python and PySpark
Process and transform large datasets in distributed environments
Build scalable ETL/ELT workflows
Big Data Processing
Work with Apache Spark (PySpark) for batch and real-time processing
Optimize Spark jobs for performance and efficiency
Handle structured and unstructured data
Data Integration
Ingest data from multiple sources: databases (SQL/NoSQL), APIs, and files (CSV, JSON, Parquet)
Integrate with data platforms such as Hadoop (HDFS) and cloud services (AWS, Azure, GCP)
Performance Optimization
Tune Spark jobs (partitioning, caching, parallelism)
Optimize SQL queries and transformations
Improve data processing efficiency and reduce cost
Collaboration & Support
Work with data engineers, data scientists, and analysts
Translate business requirements into technical solutions
Participate in code reviews and agile development practices
Monitoring & Troubleshooting
Debug and resolve issues in data pipelines
Monitor job execution and data quality
Ensure reliability and availability of data workflows