American Airlines · Posted 3 days ago

Engineer/Sr Engineer, IT Site Reliability

On-site · Fort Worth, Texas, United States

Type: Full Time
Level: Senior Level
Education: Not Specified
Company size: Unknown

Job Summary

Build end-to-end monitoring infrastructure (logging, metrics, tracing) and work closely with other product teams to provide the right tooling to measure the reliability of our systems. Collaborate with development and operations teams to ensure the availability and reliability of applications, hardware, and infrastructure. Manage physical servers, virtual machines, network equipment, hardware control systems, autonomous mobile robots (AMRs), and autonomous guided vehicles (AGVs). Administer SQL Server instances, including backups, restores, data purges, and failovers. Handle live production incidents efficiently, debug and troubleshoot application, hardware, and infrastructure issues, and implement SRE best practices. Implement and improve continuous integration / continuous deployment automation using DevOps tools. Facilitate incident management, post-incident reviews, and remediation tasks to reduce the frequency and severity of incidents.

Required Qualifications

  • 4 years of experience in a software engineering, SRE, or performance engineering role
  • 2 years of experience in Azure cloud architecture, networking, security and administration
  • Expertise in Terraform and CI/CD tools like Jenkins and GitHub
  • Experience with Event Hub client configuration and monitoring
  • Experience with SQL Server and Mongo databases
  • Hands-on expertise with monitoring and logging tools such as Dynatrace, Mezmo, LogInsight, and ThousandEyes