Python+Spark Scala
Infosys
Big Data & Spark Development
- Develop and maintain data processing pipelines using Apache Spark (PySpark & Scala)
- Work with Spark DataFrames, RDDs, and Spark SQL
- Implement transformations, joins, aggregations, and optimizations
- Tune Spark jobs for performance, scalability, and reliability
Python & Scala Programming
- Write clean, efficient, and scalable code in Python and Scala
- Develop modular and reusable components
- Integrate data pipelines with various applications and APIs
ETL & Data Engineering
- Design and build ETL workflows for structured and unstructured data
- Extract data from multiple sources (databases, APIs, flat files)
- Perform data cleansing, transformation, and validation
- Ensure data accuracy, consistency, and completeness
Data Platforms & Integration
- Work with the Hadoop ecosystem (HDFS, Hive, Spark)
- Handle large datasets in data lakes and warehouses
- Process data in formats such as Parquet, ORC, JSON, and CSV
Collaboration & Support
- Work with data engineers, analysts, and business stakeholders
- Troubleshoot pipeline issues and provide production support
- Participate in Agile/Scrum processes
- Maintain technical documentation
Core Skills
- 2–5 years of experience in Python development
- Hands-on experience with Apache Spark (PySpark and/or Scala)
- Strong understanding of data processing and ETL concepts
- Good knowledge of SQL and relational databases