Associate Data Engineer
JOB ID:
R-229068
LOCATION:
India - Hyderabad
WORK LOCATION TYPE:
On Site
DATE POSTED:
May 01, 2026
CATEGORY:
Information Systems
Roles & Responsibilities
- Develop, test, and maintain data pipelines using Databricks, PySpark, and Python.
- Ingest, transform, and process structured and semi-structured data from multiple sources.
- Support the development of scalable ETL/ELT workflows for analytics, reporting, and machine learning use cases.
- Work with data engineers, analysts, and data scientists to understand data requirements and deliver reliable datasets.
- Perform data cleansing, validation, and quality checks to ensure accuracy and consistency.
- Optimize Spark jobs and Databricks notebooks for performance, reliability, and cost efficiency.
- Create and maintain documentation for data pipelines, workflows, data definitions, and processes.
- Assist in troubleshooting pipeline failures, data issues, and performance bottlenecks.
- Follow best practices for version control, code quality, testing, and deployment.
- Support basic AI/ML data preparation activities, including feature engineering, dataset creation, and model input preparation.
- Monitor scheduled jobs and workflows to ensure timely and successful data delivery.
- Collaborate with cross-functional teams in an Agile or iterative development environment.
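To make the cleansing and validation duties above concrete: in Databricks this logic would normally run as PySpark transformations, but the same cleanse-and-validate pattern can be sketched in plain Python. All field names and rules below are illustrative, not part of the role's actual pipelines:

```python
import csv
import io

# Illustrative quality rules: required fields plus a simple type check.
REQUIRED = ("id", "amount")

def validate(row):
    """Return an error message for a bad row, or None if it passes."""
    for field in REQUIRED:
        if not row.get(field, "").strip():
            return f"missing {field}"
    try:
        float(row["amount"])
    except ValueError:
        return "amount is not numeric"
    return None

def cleanse(raw_csv):
    """Split raw CSV text into clean rows and rejected rows with reasons."""
    clean, rejects = [], []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        error = validate(row)
        if error:
            rejects.append((row, error))
        else:
            row["amount"] = float(row["amount"])  # normalize the type
            clean.append(row)
    return clean, rejects

raw = "id,amount\n1,10.5\n2,\n3,abc\n"
good, bad = cleanse(raw)
print(len(good), len(bad))  # 1 clean row, 2 rejects
```

In a real pipeline the rejected rows would typically be written to a quarantine table with their reasons, so data issues can be triaged without blocking the load.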
Basic Qualifications and Experience
2-6 years of experience, plus a Bachelor's degree in Computer Science, Data Engineering, Information Systems, Engineering, Mathematics, or a related field, or equivalent practical experience.
Must-Have Qualifications
- Bachelor’s degree in Computer Science, Data Engineering, Information Systems, Engineering, Mathematics, or a related field, or equivalent practical experience.
- Hands-on experience with Python for data processing, scripting, and automation.
- Strong working knowledge of PySpark and distributed data processing concepts.
- Proven hands-on experience using Databricks for data engineering, including notebooks, clusters, jobs, workflows, Delta tables, and performance optimization.
- Ability to build, maintain, and troubleshoot scalable ETL/ELT pipelines in Databricks.
- Experience working with Delta Lake and lakehouse architecture concepts.
- Working knowledge of SQL for querying, transforming, and validating data.
- Ability to work with structured and semi-structured data formats such as CSV, JSON, Parquet, and Delta.
- Understanding of data engineering concepts such as ETL/ELT, data pipelines, data lakes, data warehouses, batch processing, and data quality.
- Basic understanding of AI and machine learning concepts, including features, training datasets, model inputs/outputs, and model evaluation basics.
- Experience supporting data preparation or feature engineering for AI/ML use cases.
- Familiarity with cloud-based data platforms, preferably AWS, Azure, or GCP.
- Understanding of Git or other version control tools.
- Strong analytical, problem-solving, and troubleshooting skills.
- Good communication skills and ability to work collaboratively with technical and non-technical stakeholders.
- Willingness to learn new tools, technologies, and data engineering best practices.
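As one concrete illustration of the SQL querying and data-validation skills listed above, the sketch below uses Python's built-in sqlite3 module to run a duplicate-key quality check, a common step before loading a staging table into a warehouse. The table and column names are hypothetical:

```python
import sqlite3

# In-memory database standing in for a staging table (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_orders (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO staging_orders VALUES (?, ?)",
    [(1, 10.0), (2, 5.5), (2, 5.5), (3, 7.25)],  # order_id 2 is duplicated
)

# Quality check: find keys that violate the uniqueness expectation.
dupes = conn.execute(
    """
    SELECT order_id, COUNT(*) AS n
    FROM staging_orders
    GROUP BY order_id
    HAVING COUNT(*) > 1
    """
).fetchall()

print(dupes)  # [(2, 2)]
```

The same GROUP BY / HAVING pattern carries over directly to Spark SQL against Delta tables.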
Preferred Qualifications
- Exposure to Delta Lake, Unity Catalog, or Lakehouse architecture.
- Experience with workflow orchestration tools or Databricks Jobs.
- Familiarity with CI/CD practices for data engineering projects.
- Exposure to machine learning workflows using MLflow, scikit-learn, or similar tools.
- Experience with Tableau, Power BI, or similar data visualization tools to create dashboards, support reporting needs, validate datasets, and perform exploratory analysis.
- Understanding of data governance, security, and access control concepts.
- Experience working in an Agile/Scrum environment.
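The AI/ML data-preparation work mentioned in this posting (feature engineering, dataset creation, model inputs) can be sketched without any ML library. The records, column names, and encoding choices below are purely illustrative:

```python
# Minimal feature-engineering sketch: turn raw records into numeric
# feature vectors suitable as model inputs (all names are illustrative).

RAW = [
    {"country": "IN", "visits": 10},
    {"country": "US", "visits": 4},
    {"country": "IN", "visits": 6},
]

def build_features(records):
    """One-hot encode a categorical column and max-scale a numeric one."""
    categories = sorted({r["country"] for r in records})
    max_visits = max(r["visits"] for r in records)
    features = []
    for r in records:
        one_hot = [1.0 if r["country"] == c else 0.0 for c in categories]
        features.append(one_hot + [r["visits"] / max_visits])
    return categories, features

cats, X = build_features(RAW)
print(cats)  # ['IN', 'US']
print(X[0])  # [1.0, 0.0, 1.0]
```

In practice the same transformation would be expressed with PySpark or scikit-learn encoders, but the underlying idea (categoricals to one-hot columns, numerics to a common scale) is what the role's data-preparation tasks involve.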
Shift Information
This position may require working during later shifts (evening or night) depending on business needs.