Site Reliability Engineer III
India - Hyderabad
Position Overview
The GCF5 Track Lead is the senior technical leader for one capability pillar—Enterprise Data Fabric (EDF)/Common Data Model, Agentic/ML Platform, or High Performance Computing (HPC) Enablement. They define and socialize standards and patterns, lead multi‑team delivery, and mentor GCF4 engineers. They translate scientific needs into scalable platform designs, own pillar‑level adoption, reliability, and Service Level Agreement (SLA)/Service Level Objective (SLO) outcomes, and influence cross‑team engineering quality.
Core Responsibilities
• Own the pillar roadmap and backlog; plan, prioritize, and deliver multi‑team initiatives to agreed Objectives and Key Results (OKRs).
• Define, document, and govern standards/patterns (APIs, schemas, contracts, security, observability, testing).
• Lead designs and architecture reviews; ensure solutions align with enterprise guardrails and regulatory posture.
• Mentor and develop GCF4 engineers; set expectations for code quality, reviews, testing, and incident response.
• Establish SLAs/SLOs and error budgets; drive reliability, performance, and cost efficiency for the pillar.
• Partner with scientists and platform teams to translate lab/scientific workflows into scalable data/ML/HPC solutions.
• Manage technical risks, trade‑offs, and dependencies; communicate status and decisions to stakeholders.
• Contribute to hiring, onboarding, and capability growth; support diversity and inclusion goals.
Core Competencies
• Deep expertise in the assigned pillar (EDF/Common Data Model (CDM), Agentic‑ML, or HPC) with evidence of standard‑setting and reuse.
• Systems design at scale (data/ML/HPC); performance, security, and observability fundamentals.
• Product/engineering thinking: roadmapping, prioritization, and outcome‑oriented delivery.
• Stakeholder influence across science, engineering, and governance forums; crisp written/verbal communication.
Core Success Measures
• Standards adoption (% of services/datasets conformant) and SLO attainment over rolling quarters.
• Platform usage and reliability KPIs (uptime, latency, error rate, Mean Time to Recovery (MTTR)) within the pillar.
• Time‑to‑delivery for new capabilities; reduction in toil; cost/performance efficiency achieved.
• Mentorship outcomes: GCF4 progression, PR quality, design review quality; stakeholder satisfaction (NPS).
Track-Specific Responsibilities
• Responsibilities: onboarding playbooks; containerization standards; scheduler policies; performance profiling and tuning; cost/throughput optimization.
• Success measures: onboarding throughput; job success rate; performance gains; cost per compute hour; quota adherence.
Key Relationships
• Collaborates with GCF6 Group Lead and cross‑functional leaders (R&D/PD/Dev).
• Mentors and develops GCF4 Data and Software Engineers; partners with platform, data, ML, and research teams.
• Interfaces with governance (architecture, security, compliance) and vendor/partner teams.
Decision Authority
• Approve designs within the pillar; define and waive standards/patterns with rationale.
• Recommend buy‑vs‑build; commit pillar resources to meet SLAs/SLOs; escalate risks.
• Prioritize pillar backlog and roadmap in alignment with strategy and OKRs.
Qualifications
Basic Qualifications:
• BS with 8 years of experience, MS with 6 years of experience, or PhD in CS/Engineering/Data disciplines.
• Demonstrated production delivery experience in data/ML/HPC at scale.
• Demonstrated literacy in a relevant scientific domain (e.g., biology, chemistry, therapeutic discovery).
Preferred Qualifications:
• Depth in the assigned pillar (EDF/CDM, Agentic‑ML, or HPC).
• Kubernetes and continuous integration/continuous delivery (CI/CD) at scale; observability, performance tuning, and security-by-design.
• Evidence of standard‑setting and cross‑team influence; mentoring experience.