Leantech IO · 1 week ago

Senior Site Reliability Engineer

Leantech IO

Remote · Medellín, Antioquia, Colombia

Type

Full Time

Level

Senior Level

Education

Not Specified

Company size

Unknown

Job Summary

The Senior Site Reliability Engineer owns and evolves the reliability, security, observability, and operational maturity of the cloud platform with an AI-native mindset. The role drives AI-powered automation across infrastructure, incident response, compliance, and operational excellence. It draws on extensive AWS expertise (VPC, ECS, IAM, RDS, S3, CloudFront, Route53, ALB, API Gateway, Lambda), Terraform IaC, and observability tooling (Grafana, log analysis, distributed tracing) to improve production uptime. Responsibilities include leading incident response and postmortems, optimizing CI/CD pipelines, and ensuring security and governance (SOC-2, ISO 27001, HIPAA, PCI). The role requires strong Linux, Docker, scripting (Bash, Python/Go/TypeScript), and networking fundamentals, and involves collaborating with global teams across LATAM, the US, and beyond.

Required Qualifications

AI-Native SRE Operations (Hard Requirement)
Expert-level proficiency using AI to automate SRE and infrastructure operations
Daily use of AI assistants and agentic workflows in engineering practice
Hands-on experience applying AI to Terraform authoring and review, incident triage, log analysis, runbook generation, operational automation, postmortem drafting, Lambda automation, and pipeline generation
Strong understanding of where AI is effective and where human validation is critical
Ability to articulate AI workflows, tooling choices, safeguards, and production outcomes
Cloud Infrastructure & AWS (Hard Requirement)
10+ years of professional experience operating production infrastructure for SaaS platforms
Minimum of 5 years of senior-level AWS operational ownership
Deep expertise across AWS services (VPC, ECS, IAM, RDS, S3, CloudFront, Route53, ACM, CloudWatch, Secrets Manager, SSM, ALB, API Gateway, Lambda)
Familiarity with AWS security and governance tooling (WAF, GuardDuty, CloudTrail, Inspector, Security Hub, AWS Config, AWS Backup)
Terraform & Infrastructure as Code (Advanced Terraform, multi-account/multi-workspace)
Experience resolving production infrastructure drift safely
Incident Response & Operational Leadership
Observability & Monitoring (Grafana, distributed tracing, log aggregation, alert tuning)
Experience owning CI/CD pipelines end-to-end
Linux, Containers & Networking (Bash, Python/Go/TypeScript, Docker, networking fundamentals)
Security & Compliance (IAM least privilege, encryption, network isolation, vulnerability management, OWASP Top 10 for infra, SAML/OIDC/SCIM provisioning)
Experience implementing and maintaining compliance controls (SOC-2, ISO 27001, HIPAA, PCI)
Experience engaging with auditors and evidencing controls
Nice to Have: Spring Boot/JVM production experience, runtime security/EDR tooling (Falco), SCIM/IdP tooling
AWS certifications (Architect/DevOps/Security)
Kotlin/Java backend understanding from SRE perspective
Soft skills: communication, autonomy, rapid learning, reliability
