GoDaddy · 1 week ago

Senior Manager, Systems Engineering

GoDaddy

Remote · Colombia

Type

Full Time

Level

Senior Level

Education

Not Specified

Company size

Enterprise

Industry

E-commerce Services

Job Summary

The Senior Manager, Systems Engineering at GoDaddy leads 24/7 production operations for a Data & AI ecosystem, owns MTTR/MTTD targets, and drives reliability, cost optimization, and runbook automation across multiple platforms (Redshift, QuickSight, DPaaS, Protegrity, Alation). The role defines and advances Forge Ops, an agent-based operating model that moves operations from a hero-based to a system-based approach. Responsibilities include incident response, translating business problems into scalable technical designs, cost-per-query/workload optimization, and executive reporting. The position requires hands-on AWS expertise, experience operating data platforms at scale, and the technical leadership to guide a team toward measurable outcomes and governance for enterprise analytics infrastructure.

Required Qualifications

5+ years of demonstrated 24/7 production operations leadership: leading incident response end-to-end, owning MTTR performance, leading post-mortems (AARs) that produce controls, and driving systemic fixes that reduce incident recurrence
Hands-on AWS architecture/platform expertise — Redshift, EMR/Airflow, Lambda, EKS, S3, IAM/RBAC, and CDK/CloudFormation — with end-to-end operational and cost ownership of at least two production data or analytics platforms
Systems and software architecture fluency — translate business requirements into scalable designs and decompose solutions into actionable tasks
Data platform operations at scale — ETL/ELT pipelines, data lakes, orchestration frameworks (Airflow, EMR), BI tooling; understanding of data quality, SLAs, lineage
Technical team leadership with operational rigor — lead engineers through sprint-based planning, capacity management, and cross-functional delivery
Experience with AI/agentic operations — building or operating LLM-based tools such as automated runbooks, incident response agents, AAR generation systems, or bounded auto-recovery workflows
Familiarity with graph databases or lineage/observability architectures for dependency mapping and early warning in large data ecosystems
Hands-on experience with Databricks or analytical compute platforms in a production operations context
Experience with data protection platforms (e.g., Protegrity) and PII/tokenization workflows in large-scale data lake or analytics environments
Familiarity with ServiceNow/CMDB or equivalent incident management systems (Jira, PagerDuty) as operational systems of record — including MTTR/MTTD tracking and CI/lineage integration
