Staff SRE - Observability
Chicago, Illinois, United States or New York City, New York, United States | Remote | Full Time | $160,000–$200,000 per year | Mid Level | Not Specified | Small
Job Summary
Seeking an experienced Staff Observability Consultant with expertise in OpenTelemetry and strong Platform Engineering skills to help organizations optimize their observability infrastructure. Candidates should have a solid background in monitoring, distributed systems, and cloud engineering.
Required Qualifications
- 3-7 years of experience in observability, monitoring, and distributed systems
- Deep hands-on experience with the OpenTelemetry ecosystem, including SDKs, APIs, and specifications
- Proficiency with OpenTelemetry Collector configuration, processors, exporters, and receivers
- Strong understanding of telemetry data models, semantic conventions, and instrumentation best practices
- 5+ years of Platform Engineering or DevOps experience with a focus on site reliability, observability, and incident response
- Proficiency with Infrastructure as Code tools (Terraform, Pulumi, CloudFormation, CDK)
- Strong experience with CI/CD platforms (GitHub Actions, GitLab CI, Jenkins, ArgoCD)
- Hands-on experience with major cloud providers (AWS, GCP, Azure) and their observability services
- Experience with container technologies (Docker, Podman) and container registries
- Knowledge of networking, security, load balancing, and distributed systems concepts
- Experience implementing SRE practices including error budgets and toil metrics
- Proficiency in incident management, on-call procedures, and post-mortem culture
- Experience with capacity planning, performance optimization, and scalability design
- Proficiency in multiple programming languages (Go, Python, Java, Node.js, Rust) preferred
- Strong scripting and automation skills (Bash, Python, PowerShell)
- Understanding of software engineering best practices and testing methodologies
Desired Qualifications
- Understanding of Large Language Models (LLMs) and their application in DevOps
- Knowledge of vector databases, embeddings, and retrieval-augmented generation (RAG)
- Experience with AI/ML model deployment and monitoring in production environments
- Strong technical writing and documentation skills
- Ability to present complex technical concepts to diverse stakeholders
- A passion for knowledge sharing
Additional Requirements
- This role requires working from the Chicago office three days per week and up to 20% travel within the United States.
- Focused is unable to sponsor, or take over sponsorship of, an employment visa at this time.