Scala, Spark/PySpark
Infosys
Big Data & Spark Development
- Design and implement scalable data pipelines using Apache Spark (Scala and/or PySpark)
- Work extensively with Spark Core, Spark SQL, DataFrames, and Datasets
- Develop batch and real-time data processing solutions using Spark Streaming / Structured Streaming
- Optimize Spark jobs for performance, memory management, and parallel processing
Scala & Python Development
- Develop robust and efficient applications using Scala and Python
- Write reusable, modular, and maintainable code
- Implement business logic and transformations on large datasets
Data Engineering & ETL
- Build and maintain ETL/ELT pipelines for large-scale data ingestion and transformation
- Process structured and unstructured data from multiple sources
- Ensure data validation, quality, and consistency
- Work with file formats such as Parquet, ORC, Avro, JSON, and CSV
Big Data Ecosystem
- Work with the Hadoop ecosystem (HDFS, Hive, YARN)
- Integrate Spark jobs with data lakes and warehouses
- Handle large datasets with distributed computing techniques
Cloud & Integration (Optional but Preferred)
- Work with cloud platforms (AWS/Azure/GCP) for big data solutions
- Use services such as AWS EMR, Glue, and S3, or Azure Databricks and Synapse
- Integrate pipelines with APIs and external systems
Collaboration & Leadership
- Collaborate with data engineers, architects, and business teams
- Lead technical discussions and provide guidance to junior developers
- Participate in code reviews and best-practice implementation
- Work in Agile/Scrum environments
Core Skills
- 5–9 years of experience in data engineering / big data development
- Strong hands-on expertise in Scala (mandatory for this role)
- Extensive experience with Apache Spark (Scala and/or PySpark)
- Solid understanding of ETL processes and data pipelines
- Strong proficiency in SQL and database concepts
Technical Skills
- Deep knowledge of Spark architecture and execution model
- Experience with Spark performance tuning and optimization
- Strong data modeling and warehousing concepts
- Familiarity with version control tools (Git)
- Understanding of distributed computing principles