
Python+Spark Scala

Infosys

Full-time, On-site

Location: Hyderabad, India
Type: Full-time
Posted: 3d ago

Big Data & Spark Development

- Develop and maintain data processing pipelines using Apache Spark (PySpark & Scala)
- Work with Spark DataFrames, RDDs, and Spark SQL
- Implement transformations, joins, aggregations, and optimizations
- Tune Spark jobs for performance, scalability, and reliability

Python & Scala Programming

- Write clean, efficient, and scalable code in Python and Scala
- Develop modular and reusable components
- Integrate data pipelines with various applications and APIs

ETL & Data Engineering

- Design and build ETL workflows for structured and unstructured data
- Extract data from multiple sources (databases, APIs, flat files)
- Perform data cleansing, transformation, and validation
- Ensure data accuracy, consistency, and completeness

Data Platforms & Integration

- Work with the Hadoop ecosystem (HDFS, Hive, Spark)
- Handle large datasets in data lakes and warehouses
- Process data in formats such as Parquet, ORC, JSON, and CSV

Collaboration & Support

- Work with data engineers, analysts, and business stakeholders
- Troubleshoot pipeline issues and provide production support
- Participate in Agile/Scrum processes
- Maintain technical documentation

Core Skills

- 2–5 years of experience in Python development
- Hands-on experience with Apache Spark (PySpark and/or Scala)
- Strong understanding of data processing and ETL concepts
- Good knowledge of SQL and relational databases
