Company: Burlywall LLC
Job Type: W2
Job Title: Site Reliability Engineer
Location: Bellevue, WA
Positions: 1
We’re seeking a skilled Site Reliability Engineer (SRE) to join our growing engineering team. As an SRE, you will be responsible for building, maintaining, and scaling our production systems while improving reliability, availability, and performance. You’ll work at the intersection of software engineering and infrastructure, automating everything from deployment to monitoring and incident response.
This role is ideal for someone with a passion for operational excellence, infrastructure as code, and a deep understanding of distributed systems.
Key Responsibilities:
- Design, implement, and maintain scalable and reliable infrastructure using automation tools.
- Develop and manage monitoring, alerting, and incident response systems to ensure high availability and performance of services.
- Collaborate with development teams to ensure production readiness and enforce best practices for CI/CD, observability, and fault tolerance.
- Troubleshoot and resolve production issues, conducting root cause analysis and implementing postmortem processes.
- Continuously improve deployment pipelines, configuration management, and system orchestration tools.
- Manage cloud infrastructure (e.g. AWS, GCP, Azure), Kubernetes clusters, and containerized applications.
- Define and enforce SLOs/SLIs/SLAs and work proactively to maintain service health and uptime.
- Participate in an on-call rotation, working to minimize pager fatigue through proactive systems improvements.
- Support security, compliance, and audit readiness efforts through automation and monitoring.
Required Qualifications:
- 3–7 years of experience in SRE, DevOps, or backend infrastructure roles.
- Strong understanding of Linux systems administration, networking, and performance tuning.
- Proficiency in scripting and automation using Python, Go, Bash, or similar.
- Experience with CI/CD pipelines (e.g. GitLab CI, Jenkins, ArgoCD).
- Expertise in monitoring and observability tools (e.g. Prometheus, Grafana, ELK/EFK, Datadog).
- Hands-on experience with cloud providers like AWS, GCP, or Azure.
- Strong knowledge of Kubernetes, Docker, and container orchestration best practices.
- Familiarity with infrastructure as code (IaC) using Terraform, Pulumi, or CloudFormation.
- Excellent communication and collaboration skills; ability to work cross-functionally.
Preferred Qualifications:
- Experience in high-scale, high-availability environments.
- Background in incident management, chaos engineering, or resilience testing.
- Familiarity with service mesh technologies (e.g. Istio, Linkerd).
- Experience working in regulated industries (e.g., fintech, healthcare, telecom).
- Contributions to open-source SRE, DevOps, or cloud-native projects.
To apply for this job, email your details to jobs@burlywall.com.