Observability Engineer (Prometheus / Grafana / Datadog)
$100,000–$150,000 a year
Remote · United States
Job Summary
The Observability Engineer designs and operates enterprise-grade observability platforms spanning metrics, logs, traces, events, and alerts. The role involves architecting highly available, scalable deployments of Prometheus/Thanos/Mimir, Grafana, Loki, Tempo, OpenTelemetry, and Datadog; developing instrumentation standards and SLO/SLI frameworks; building dashboards and alerting pipelines; leading incident readiness and on-call workflows; and mentoring engineering teams while producing runbooks and documentation. Requirements include 5+ years in SRE, platform, or observability roles; deep knowledge of OpenTelemetry and distributed tracing; proficiency in Go, Python, or Java; and experience with high-cardinality, high-throughput telemetry pipelines.
Required Qualifications
- Bachelor’s degree in Computer Science or a related field
- Five or more years of experience in SRE, platform engineering, or observability roles
- Deep hands-on experience with Prometheus, Grafana, and at least one major commercial observability platform such as Datadog, New Relic, or Splunk
- Strong understanding of OpenTelemetry, distributed tracing, and structured logging
- Proficiency in Go, Python, or Java
- Experience operating high-cardinality, high-throughput metrics and log pipelines
- Strong understanding of SLOs, error budgets, and SRE principles
- Experience integrating observability with CI/CD and incident management tooling
- Solid grasp of Linux internals, networking, and container platforms
- Excellent communication and collaboration skills