Senior Manager, Systems Engineering
Remote · Colombia
Job Summary
The Senior Manager, Systems Engineering at GoDaddy leads 24/7 production operations for a Data & AI ecosystem, owning MTTR/MTTD targets and driving reliability, cost optimization, and runbook automation across multiple platforms (Redshift, QuickSight, DPaaS, Protegrity, Alation). The role defines and advances Forge Ops, an agent-based operating model that moves operations from hero-based to system-based. Responsibilities include incident response, translating business problems into scalable technical designs, optimizing cost per query and per workload, and executive reporting. The position requires hands-on AWS expertise, experience operating data platforms at scale, and technical leadership of an engineering team to deliver measurable outcomes and governance for enterprise analytics infrastructure.
Required Qualifications
- 5+ years of proven 24/7 production operations leadership — leading incident response end-to-end, owning MTTR performance, running post-mortems (AARs) that produce controls, and driving systemic fixes that reduce incident recurrence
- Hands-on AWS architecture/platform expertise — Redshift, EMR/Airflow, Lambda, EKS, S3, IAM/RBAC, and CDK/CloudFormation — with end-to-end operational and cost ownership of at least two production data or analytics platforms
- Systems and software architecture fluency — translate business requirements into scalable designs and decompose solutions into actionable tasks
- Data platform operations at scale — ETL/ELT pipelines, data lakes, orchestration frameworks (Airflow, EMR), BI tooling; understanding of data quality, SLAs, lineage
- Technical team leadership with operational rigor — lead engineers through sprint-based planning, capacity management, and cross-functional delivery
- Experience with AI/agentic operations — building or operating LLM-based tools such as automated runbooks, incident response agents, AAR generation systems, or bounded auto-recovery workflows
- Familiarity with graph databases or lineage/observability architectures for dependency mapping and early warning in large data ecosystems
- Hands-on experience with Databricks or analytical compute platforms in a production operations context
- Experience with data protection platforms (e.g., Protegrity) and PII/tokenization workflows in large-scale data lake or analytics environments
- Familiarity with ServiceNow/CMDB or equivalent incident management systems (Jira, PagerDuty) as operational systems of record — including MTTR/MTTD tracking and CI/lineage integration