Senior DevOps Engineer
Hybrid · Austin, Texas, United States or Reston, Virginia, United States
Job Summary
The Senior DevOps Engineer will design, architect, and operate highly available, multi-tenant Kubernetes platforms across cloud and on-premises environments. The role owns the full networking stack (CNI, service mesh, ingress, DNS, load balancing, network policy) and database operations for production Postgres and OpenSearch clusters. Responsibilities include collaborating with engineering to define platform standards, automating provisioning and lifecycle management with Terraform and Ansible, enforcing security best practices, leading incident response and post-mortems, and identifying and remediating scalability and reliability risks. The role requires 5+ years in platform, infrastructure, or DevOps engineering; extensive Kubernetes and Linux networking expertise; hybrid-cloud experience; infrastructure as code with Terraform and Ansible; Helm/Kustomize; Python, Go, and Bash; and experience with Postgres and Elasticsearch operators. Familiarity with storage (CSI, Rook/Ceph) and observability (Prometheus, Grafana, Loki, Tempo) is expected, along with strong communication and teamwork. Candidates must be eligible for a security clearance; Top Secret/SCI is preferred. Work is flexible hybrid, with offices in Reston, VA and Austin, TX, and remote options.
Required Qualifications
- 5+ years in Platform Engineering, Infrastructure Engineering, or DevOps supporting large-scale distributed systems
- 5+ years of Kubernetes experience — cluster architecture, multi-tenancy, RBAC, scheduling, and autoscaling across cloud and bare-metal
- Strong Kubernetes networking knowledge — CNI (Calico, Cilium), service mesh (Istio, Traefik), ingress controllers, and NetworkPolicy
- Linux networking fundamentals — TCP/IP, DNS, BGP, and network troubleshooting
- Experience designing and operating infrastructure across hybrid environments — on-premises, edge, and multiple cloud providers (AWS, Azure, OCI)
- Infrastructure as code proficiency — Terraform and Ansible
- Working knowledge of Helm/Kustomize for application packaging and deployment
- Proficiency in Python, Go, and Bash for automation and tooling
- Postgres experience — replication, HA/failover, connection pooling (PgBouncer), query tuning, backup/recovery, and Kubernetes operators (CloudNativePG)
- Experience operating stateful workloads on Kubernetes, including Elasticsearch/OpenSearch and Postgres
- Experience operating storage solutions (CSI, Rook/Ceph)
- Cloud-native observability experience — Prometheus, Grafana, Loki, and Tempo
- Demonstrated ability to deliver results on time and with high quality
- Strong communication skills and the ability to work effectively across multiple business and technical teams
- Active U.S. security clearance, or eligibility and willingness to obtain one; Top Secret with SCI eligibility highly preferred