SRE Engineer (Cloud Infrastructure) m/f/d

Autodoc

Posted 2 days ago

Job Description

We are looking for an SRE Engineer to join one of our Cloud Infrastructure teams. Our Cloud Infrastructure department consists of 4 specialized SRE teams. Each team is dedicated to a specific area, supporting its own group of services and development units.

In this role, you will be part of a team responsible for an area that supports over 200 services running in GCP/GKE. As an SRE, you will balance maintaining high system availability for your domain with engineering new solutions to enhance our global infrastructure.

Responsibilities

  • Service Ownership: Act as the primary point of contact for developers within your domain, handling service-related queries in chats and managing SRE-specific tasks.
  • Infrastructure Evolution: Maintain and improve current cloud infrastructure, ensuring high availability and scalability.
  • Embedded DevOps: Integrate SRE/DevOps best practices into the development lifecycle, from architecture planning to deployment.
  • Innovation & PoC: Research, develop, and implement new infrastructure tools; conduct Proof of Concept (PoC) projects to drive technical excellence.
  • Automation: Partner with the Automation team to build efficient CI/CD pipelines and custom automated workflows.
  • Reliability & Metrics: Participate in developing quality metrics (SLIs/SLOs) and maintain comprehensive project documentation.
  • On-call Support: Join the on-call rotation to ensure 24/7 stability of our mission-critical services.

Qualifications

  • 3+ years as an SRE/DevOps Engineer.
  • Proven experience with containerization and orchestration tools; Kubernetes is a must (GKE is preferred).
  • Knowledge of SRE/DevOps methodologies, such as CI/CD, IaC, GitOps, etc.
  • Knowledge of at least one GitOps tool (FluxCD is preferred).
  • Experience with cloud-based infrastructure (GCP is preferred).
  • Research and troubleshooting skills.
  • Experience in administering and tuning relational and columnar databases, specifically PostgreSQL, MySQL, and ClickHouse.
  • Experience in deployment and maintenance of distributed high-load systems.
  • Experience in developing fault-tolerance mechanisms: clustering, replication, scaling approaches, etc.
  • Configuration of monitoring solutions (Grafana and VictoriaMetrics (operator) are preferred).
  • Good scripting skills (Bash/Python are preferred).

Nice to have:

  • Configuration of logging/tracing solutions (OpenTelemetry stack, VictoriaLogs, Grafana Loki, and Grafana Tempo are preferred).
  • Hands-on knowledge of maintaining and scaling Elasticsearch.
  • Proficiency with message brokers and event-streaming platforms such as Kafka and RabbitMQ.
  • Proficiency with GitLab CI.
  • Proficiency in developing, maintaining, and refactoring complex Helm charts.
  • Experience in migrating applications to Kubernetes.
  • Deep understanding of process management in Linux and Unix-like operating systems.
  • Experience in implementing security controls in containerized environments.
  • A strong drive to automate processes, with an emphasis on improving security.
  • Excellent communication skills.
  • Spoken English.


Job details

Workplace

Office

Location

Chișinău, Republic of Moldova
