via Career pages · 3d ago

PySpark Developer

Infosys

Full-time, On-site

Location: Bangalore, India
Type: Full-time
Posted: 3d ago

Key Responsibilities

- Develop and maintain data pipelines using PySpark
- Process and analyze large-scale datasets in distributed environments
- Design and implement ETL/ELT workflows
- Optimize Spark jobs for performance and scalability
- Work with data stored in HDFS, Hive, or cloud storage (S3, ADLS)
- Collaborate with data engineers, analysts, and business teams
- Ensure data quality, integrity, and governance
- Debug and troubleshoot data processing issues
- Automate workflows using scheduling tools (Airflow, Oozie, etc.)
- Write clean, scalable, and efficient code

Required Skills & Qualifications

Technical Skills

- Strong proficiency in Python and PySpark
- Good experience with Apache Spark (RDDs, DataFrames, Spark SQL)
- Knowledge of the Hadoop ecosystem (HDFS, Hive)
- Experience in ETL pipeline development
- Familiarity with SQL and database concepts
- Experience with data formats (Parquet, ORC, JSON, CSV)
- Basic understanding of distributed computing concepts
- Exposure to version control tools (Git)

Preferred Skills (Nice-to-Have)

- Experience with cloud platforms (AWS, Azure, GCP)
- Knowledge of Databricks or EMR environments
- Familiarity with workflow orchestration tools (Airflow)
- Exposure to Kafka or real-time data streaming
- Understanding of Delta Lake / Lakehouse architecture
- Experience with NoSQL databases (MongoDB, Cassandra)
- Knowledge of CI/CD pipelines and DevOps practices
- Basic understanding of machine learning workflows
