Senior Site Reliability Engineer

DigiCert.com

Office

Lehi, Utah

Full Time

Who we are

We're a leading global security authority that's disrupting our own category. Our encryption is trusted by major e-commerce brands, the world's largest companies, the major cloud providers, entire national financial systems, entire Internet of Things ecosystems, and even small things like surgically embedded pacemakers. We help companies put trust, an abstract idea, to work. That's digital trust for the real world.

Job summary

As a Senior Site Reliability Engineer, you’ll play a key role in designing, operating, and scaling reliable, high-performing systems that support critical business services. You’ll partner closely with engineering teams to improve system resilience, observability, and deployment practices while driving automation across infrastructure and operations. This role is hands-on and highly technical, with ownership over Kubernetes-based platforms, cloud infrastructure, and CI/CD pipelines. You’ll also help shape engineering best practices, mentor teammates, and contribute to a culture of continuous improvement and operational excellence.

What you will do

Design, implement, and maintain highly available and scalable systems
Improve system reliability, performance, and observability
Automate infrastructure provisioning, configuration, and operational tasks
Support and evolve Kubernetes-based platforms, including cluster management
Collaborate with development teams to enable CI/CD and deployment best practices
Participate in an on-call rotation to support production systems and respond to incidents
Troubleshoot production issues across distributed systems
Tutor and mentor other team members by sharing knowledge, best practices, and guidance
Contribute to infrastructure standards, documentation, and continuous improvement initiatives

Technologies you’ll work with

Operating Systems: Linux & Windows
Scripting: Bash
Version Control: Git
Configuration Management: Salt
Container Orchestration & Management: Kubernetes, Rancher
CI/CD & Delivery: Harness, GitHub Actions
Infrastructure as Code: Terraform
Cloud Platforms: Private and public cloud environments (e.g., AWS, Azure, GCP, or equivalents)

What you will have

5+ years of experience in Site Reliability Engineering, DevOps, or similar roles
Strong Linux systems administration and Bash scripting experience
Hands-on experience running and supporting Kubernetes in production
Experience with Kubernetes management platforms such as Rancher
Proven experience with Infrastructure as Code (Terraform preferred)
Experience building, maintaining, and supporting CI/CD pipelines
Solid understanding of cloud infrastructure (public and/or private)
Strong troubleshooting skills across complex, distributed systems
Comfortable collaborating across teams and mentoring junior or mid-level engineers

Nice to have

Experience with observability tools (monitoring, logging, alerting)
Experience with large-scale or high-availability systems
Familiarity with security best practices in cloud-native environments
AEM Cloud management experience
Experience supporting regulated or mission-critical environments

Benefits

Generous time-off policies
Top-shelf benefits
Education, wellness and lifestyle support
