Senior Site Reliability Engineer at 100Networks

San Francisco, CA · Noida · full-time · senior level

About the role

Join our SRE team to ensure the reliability and performance of systems serving millions of users across the globe.

Key Responsibilities:
- Design and implement SLO/SLI frameworks for critical services
- Build automated incident response and recovery systems
- Optimize system performance and reduce operational toil
- Design disaster recovery and business continuity solutions
- Lead capacity planning and infrastructure scaling initiatives

Requirements:
- 5+ years of SRE or DevOps experience
- Strong background in Linux systems and networking
- Experience with Infrastructure as Code (Terraform, Pulumi)
- Proficiency in Python, Go, or Bash scripting
- Experience with monitoring systems (Prometheus, Grafana, Datadog)

Responsibilities

SLO/SLI design
Incident response
Capacity planning

Requirements

5+ years SRE experience
Infrastructure automation
Monitoring expertise

Skills: Kubernetes, Terraform, Prometheus, Python, Linux, AWS, Incident Management
