Data Engineer

India - Hyderabad
Role Description:
The role is responsible for designing, building, maintaining, analyzing, and interpreting data to provide actionable insights that drive business decisions. This role involves working with large datasets, developing reports, supporting and executing data governance initiatives, and visualizing data to ensure that Clinical Study Planning, Design Optimization & Startup (CSPDS) data is accessible, reliable, and efficiently managed. The ideal candidate has strong technical skills, experience with big data technologies, and a deep understanding of data architecture and ETL processes.
Roles & Responsibilities:
- Develop and maintain data solutions for data generation, collection, and processing of various data sets from different source systems (like Vault Clinical, Vault RIM, Clinical Trial Data Foundation and other Amgen systems)
- Be a key team member who assists in the design and development of data pipelines
- Create data pipelines and ensure data quality by implementing ETL processes to migrate and deploy data across systems
- Schedule and manage workflows to ensure pipelines run on schedule and are monitored for failures.
- Contribute to the design, development, and implementation of data pipelines, ETL/ELT processes, and data integration solutions
- Take ownership of data pipeline projects from inception to deployment, and manage scope, timelines, and risks
- Collaborate with cross-functional teams to understand data requirements and design solutions that meet business needs
- Develop and maintain data models, data dictionaries, and other documentation to ensure data accuracy and consistency
- Implement data security and privacy measures to protect sensitive data
- Leverage cloud platforms (AWS preferred) to build scalable and efficient data solutions
- Collaborate and communicate effectively with product teams
- Stay up to date with the latest technology trends and adhere to best practices for coding, testing, and designing reusable code and components
Basic Qualifications and Experience:
- Master’s degree with 4 - 6 years of experience in Computer Science, IT or related field OR
- Bachelor’s degree with 6 - 8 years of experience in Computer Science, IT or related field OR
- Diploma with 10 - 12 years of experience in Computer Science, IT or related field
Functional Skills:
Must-Have Skills:
- Hands-on experience with big data technologies and platforms, such as Databricks, Apache Spark (PySpark, SparkSQL), workflow orchestration (Apache Airflow), and performance tuning of big data processing
- Proficiency in data analysis tools (e.g., SQL) and experience with data visualization tools
- Excellent problem-solving skills and the ability to work with large, complex datasets
- Strong understanding of data governance frameworks, tools, and best practices.
- Knowledge of data protection regulations and compliance requirements (e.g., GDPR, CCPA)
- Experienced with software engineering best practices, including but not limited to version control (Git, Subversion, etc.), CI/CD (Jenkins, Maven, etc.), automated unit testing, and DevOps
Good-to-Have Skills:
- Experience with ETL tools such as Apache Spark, and with Python packages for data processing and machine learning model development
- Strong understanding of data modeling, data warehousing, and data integration concepts
- Knowledge of Amazon SageMaker and cloud data platforms
- Familiarity with SQL/NoSQL databases and vector databases for large language models
Professional Certifications:
- Databricks Certification (Preferred)
Soft Skills:
- Excellent critical-thinking and problem-solving skills
- Strong communication and collaboration skills
- Demonstrated ability to work effectively in a team setting
- Demonstrated presentation skills
Shift Information:
This position requires you to work a later shift and may be assigned a second or third shift schedule. Candidates must be willing and able to work evening or night shifts as business needs require.