Senior Site Reliability Engineer
Experian
Posted about 21 hours ago
Job Description
You will collaborate with Cloud Engineering & Operations to develop a unified self-service observability platform that enables teams to instrument, monitor, and troubleshoot applications across on-prem, cloud, and hybrid environments.
Your work will involve integrating telemetry pipelines, event management workflows, and automation frameworks. You'll standardize observability practices by building reusable templates, automation scripts, and onboarding accelerators, embedding observability into CI/CD pipelines, and driving OpenTelemetry adoption. These efforts will enhance developer experience, reduce operational overhead, and improve system reliability across Experian’s global technology ecosystem.
Primary Responsibilities:
Design and implement monitoring and observability solutions using Dynatrace and Splunk (must-have), along with Datadog and OpenTelemetry, to build scalable, automated, and developer-friendly platforms.
Develop reusable patterns, templates, and automation scripts to drive consistency across observability practices and reduce manual effort in telemetry onboarding.
Build and maintain dashboards that deliver actionable insights into system performance, reliability, and user experience.
Integrate observability into CI/CD workflows using Jenkins and related tooling to enable continuous feedback and faster incident detection.
Automate infrastructure provisioning and deployment using Terraform and Ansible to support observability at scale.
Implement and manage OpenTelemetry pipelines for standardized collection of traces, metrics, and logs, supporting vendor-agnostic ingestion strategies.
Collaborate with Business Units (BUs), Developers, and Platform Engineers to embed observability into the software delivery lifecycle and improve developer experience.
Define and implement SLIs/SLOs and error budgets with Business Units to support reliability engineering and improve service health visibility.
Enhance operational excellence by enabling proactive monitoring, reducing customer pain points, and streamlining incident workflows.
Amplify AIOps outcomes by integrating observability data into intelligent automation and decision-making across technology and business teams.
AWS & Cloud Operations:
Manage and operate systems hosted on AWS (EC2, EKS/ECS, RDS, S3, Lambda, CloudWatch, IAM, VPC)
Support cloud deployments and infrastructure changes following best practices
Assist with backup, disaster recovery, and resiliency planning
Incident Management:
Participate in production incident response, troubleshooting, and service restoration
Perform root cause analysis (RCA) and contribute to post‑incident reviews
Help implement preventive actions to avoid incident recurrence
Secondary Skills:
Reliability & Operations:
Support high availability, scalability, and performance of production systems
Implement and maintain SLIs, SLOs, and SLAs for services
Identify and reduce operational toil through automation and process improvement
Support design and implementation of fault-tolerant and resilient systems
Collaboration:
Work closely with application and engineering teams to embed reliability into system design
Act as a strong team player, sharing knowledge and supporting team goals
Communicate effectively with technical and nontechnical stakeholders
Qualifications
Overall: 5+ years of experience in production support and building scalable, automated, and developer-friendly observability platforms.
Cloud Expertise: 3+ years of hands-on experience with AWS environments
Must have Skills:
- Cloud platform expertise: AWS, with 3+ years of experience
- Monitoring & observability: Over 3 years of hands-on experience with Dynatrace, with expertise in creating dashboards leveraging logs, metrics, and traces
Additional Information
Our uniqueness is that we celebrate yours. Experian's culture and people are important differentiators. We take our people agenda very seriously and focus on what matters: DEI, work/life balance, development, authenticity, collaboration, wellness, reward & recognition, volunteering... the list goes on. Experian's people-first approach is award-winning: World's Best Workplaces™ 2024 (Fortune Top 25), Great Place To Work™ in 24 countries, and Glassdoor Best Places to Work 2024, to name a few. Check out Experian Life on social or our Careers Site to understand why.
Experian is proud to be an Equal Opportunity and Affirmative Action employer. Innovation is an important part of Experian's DNA and practices, and our diverse workforce drives our success. Everyone can succeed at Experian and bring their whole self to work, irrespective of their gender, ethnicity, religion, colour, sexuality, physical ability or age.