
Senior Site Reliability Engineer

Experian

Posted about 21 hours ago

Job Description

 

You will collaborate with Cloud Engineering & Operations to develop a unified self-service observability platform that enables teams to instrument, monitor, and troubleshoot applications across on-prem, cloud, and hybrid environments. 

Your work will involve integrating telemetry pipelines, event management workflows, and automation frameworks. You'll standardize observability practices by building reusable templates, automation scripts, and onboarding accelerators, embedding observability into CI/CD pipelines, and driving OpenTelemetry adoption. These efforts will enhance developer experience, reduce operational overhead, and improve system reliability across Experian’s global technology ecosystem. 

Primary Responsibilities:

  • Design and implement monitoring and observability solutions using Dynatrace and Splunk (must-have), along with Datadog and OpenTelemetry, to build scalable, automated, and developer-friendly platforms.

  • Develop reusable patterns, templates, and automation scripts to drive consistency across observability practices and reduce manual effort in telemetry onboarding. 

  • Build and maintain dashboards that deliver actionable insights into system performance, reliability, and user experience. 

  • Integrate observability into CI/CD workflows using Jenkins and related tooling to enable continuous feedback and faster incident detection. 

  • Automate infrastructure provisioning and deployment using Terraform and Ansible to support observability at scale. 

  • Implement and manage OpenTelemetry pipelines for standardized collection of traces, metrics, and logs, supporting vendor-agnostic ingestion strategies.

  • Collaborate with Business Units (BUs), Developers, and Platform Engineers to embed observability into the software delivery lifecycle and improve developer experience. 

  • Define and implement SLIs/SLOs and error budgets with Business Units to support reliability engineering and improve service health visibility. 

  • Enhance operational excellence by enabling proactive monitoring, reducing customer pain points, and streamlining incident workflows. 

  • Amplify AIOps outcomes by integrating observability data into intelligent automation and decision-making across technology and business teams. 

AWS & Cloud Operations:

  • Manage and operate systems hosted on AWS (EC2, EKS/ECS, RDS, S3, Lambda, CloudWatch, IAM, VPC) 

  • Support cloud deployments and infrastructure changes following best practices 

  • Assist with backup, disaster recovery, and resiliency planning 

Incident Management:

  • Participate in production incident response, troubleshooting, and service restoration

  • Perform root cause analysis (RCA) and contribute to post‑incident reviews 

  • Help implement preventive actions to avoid incident recurrence 

Secondary Skills:

Reliability & Operations:

  • Support high availability, scalability, and performance of production systems 

  • Implement and maintain SLIs, SLOs, and SLAs for services 

  • Identify and reduce operational toil through automation and process improvement 

  • Support the design and implementation of fault-tolerant and resilient systems

Collaboration:

  • Work closely with application and engineering teams to embed reliability into system design

  • Act as a strong team player, sharing knowledge and supporting team goals 

  • Communicate effectively with technical and nontechnical stakeholders 

Qualifications

  • Overall: 5+ years of experience in production support and building scalable, automated, and developer-friendly observability platforms. 

  • Cloud Expertise: At least 3 years of hands-on experience with AWS environments

  • Must-have skills:

           - Cloud platform expertise: AWS, with 3+ years of experience

           - Monitoring & observability: Over 3 years of hands-on experience with Dynatrace, with expertise in creating dashboards leveraging logs, metrics, and traces    

Additional Information

Our uniqueness is that we celebrate yours. Experian's culture and people are important differentiators. We take our people agenda very seriously and focus on what matters: DEI, work/life balance, development, authenticity, collaboration, wellness, reward & recognition, volunteering... the list goes on. Experian's people-first approach is award-winning: World's Best Workplaces™ 2024 (Fortune Top 25), Great Place To Work™ in 24 countries, and Glassdoor Best Places to Work 2024, to name a few. Check out Experian Life on social or our Careers Site to understand why.

Experian is proud to be an Equal Opportunity and Affirmative Action employer. Innovation is an important part of Experian's DNA and practices, and our diverse workforce drives our success. Everyone can succeed at Experian and bring their whole self to work, irrespective of their gender, ethnicity, religion, colour, sexuality, physical ability or age.


Job details

Workplace: Office

Location: Hyderabad, India

Experience: SE
