
Site Reliability Engineer (SRE) - Hybrid Cloud Storage

Qumulo

Posted 2 days ago

Qumulo's cloud data platform manages exabytes of the world's most demanding data, unifying files, objects, and every workload across edge, core, and cloud.

About the Role:

You'll be one of the first hires on a team with a single mandate: find out how Qumulo breaks before our customers do. The platform manages exabytes of data for more than 1,100 customers across on-prem and every major cloud, and these are mission-critical workloads where a missed edge case becomes a customer's bad day.

This is a test-centric SRE role for an engineer who thinks like a breaker. You'll put on the customer's hat, work out how a feature will really get used, and design the tests that push it past its limits across hardware and cloud. You'll automate the testing our principal engineers run by hand today, and decide what gets tested, how often, and why. You'll help build this function from the ground up, including where we set the quality bar and which builds are good enough to ship.


As a Site Reliability Engineer at Qumulo, you will:

  • Design and operationalize testing for new features: work out how customers will actually use them, how to scale-test them, and how to break them

  • Automate the repetitive testing our principal engineers run by hand today, using Python and our in-house frameworks on Jenkins and Argo

  • Build a data-driven plan for which tests run, how often, and why, plus the framework to schedule and rerun them

  • Troubleshoot build and test failures across VM instances and Qumulo-qualified hardware, from compile-time errors to integration failures

  • Read cluster output and C error logs to tell a test problem from an infrastructure problem from a real bug

  • Set up monitoring and alerting so problems surface early (we use OpenMetrics, Grafana, InfluxDB, and Prometheus alongside home-grown tooling)

  • Help set the quality bar for releases, including a real say in what ships

  • Take part in an on-call rotation for the systems your team owns


Our ideal SRE will have:

  • 3+ years building and operating automated testing, validation, and/or certification for complex software systems

  • Strong programming ability in C. Experience with Qumulo’s distributed file system, or with parallel file systems, would be a major plus

  • A real breaker's instinct. You go looking for edge cases and ask "what happens if I do this?" before anyone asks you to

  • A track record of building tests yourself, not just running test plans handed to you

  • Hands-on experience across both on-premises infrastructure and cloud (AWS, GCP, or Azure), with a real grasp of where each one's limits are

  • Working fluency in Linux (we run Ubuntu) and Python

  • A data-driven approach to deciding what to test and how often

  • Experience with orchestration tools (Ansible, Terraform), containers, and Kubernetes

  • A solid understanding of networking (routing, firewalls, security inspection devices, switch configuration) is a plus

  • Storage experience (IOPS, latency, read/write patterns) or protocol experience (NFS, SMB, S3) is a strong plus

The annual pay range for this role is $140,000 - $210,000 USD. Individual pay depends on factors such as role level, relevant experience, and skills.

Benefits & Perks:

  • Pre-IPO stock options

  • Flexible time-off policy

  • HSA and PPO health insurance options

  • Dental and Vision insurance

  • 401(k) plan

  • Choice of an ORCA card or parking subsidy

About Qumulo:
Built for the most demanding enterprise workloads, from AI and HPC simulations to Splunk Observability, genomics and PACS medical imaging, geospatial datasets, media editing and rendering, and video surveillance, Qumulo unlocks the full power of an organization’s information. Our platform unifies file and object storage across data centers, edge, and public clouds, enabling efficient, accelerated computing and extending the reach of data unbound by protocol or transport limitations.

With more than 1,100 customers and exabytes of data under management, Qumulo powers mission-critical workloads anywhere real-time access to massive file datasets is non-negotiable. Qumulo delivers radical simplicity, hardware freedom, exceptional customer support, and a true hybrid-cloud architecture.

Our Values:
At Qumulo, we are building an open and collaborative culture where people can do their best work with customers as our magnetic field. We act as owners, we share by default, and we are data-driven and experimental. As an inclusive workplace, we encourage and celebrate multiple points of view, and as part of our culture, we believe diversity drives innovation.

Equal Opportunity Employer:
Qumulo is an Equal Opportunity Employer.


Job details

Workplace: Hybrid

Location: Seattle (hybrid)

Salary: 140k - 210k USD per year
