Chegg · today

Senior Software Engineer - Model Training & AI Evals

Chegg

Remote · India

Type: Full Time

Level: Senior Level

Education: Not Specified

Company size: Enterprise

Job Summary

Senior Software Engineer to own end-to-end evaluation and benchmarking infrastructure for LLMs and base models, contribute hands-on to post-training pipelines, and lead domain-specific benchmarks and synthetic data generation to drive model improvements. Responsibilities include designing task-level evaluation frameworks, building comparative benchmarking pipelines, producing capability gap reports, tracking model-version regressions, and collaborating with product, curriculum, and research teams to translate eval insights into post-training and data flywheel decisions. Requires hands-on experience with SFT, RLHF, RLAIF, DPO, PPO, reward modeling, and data quality criteria, plus strong software engineering skills (Python, PyTorch/JAX) and experience with CI/CD and experiment tracking.

Required Qualifications

5+ years of ML/AI engineering experience, with at least 2–3 years focused on large language models
Direct, hands-on experience at an LLM lab, AI research organization, or equivalent frontier AI team
Familiarity with the full model lifecycle: pre-training data, post-training alignment, eval, and production deployment
Deep practical expertise in post-training methods: SFT, RLHF, RLAIF, DPO, PPO
Experience with reward modeling, preference data curation, and quality control for alignment pipelines
Demonstrated experience designing LLM evaluation frameworks beyond standard benchmarks
Hands-on experience building synthetic data generation pipelines for addressing model capability gaps
Experience validating synthetic data quality through downstream model performance experiments
Proven track record of comparative benchmarking across leading foundation models
Experience training or fine-tuning vertical/industry-specific foundation models
Strong software engineering fundamentals: Python, PyTorch or JAX, distributed training
Publications or applied research contributions in LLM evaluation or alignment (preferred)
Experience with multi-modal models or agents with external tool/API use
Exposure to red-teaming, adversarial evaluation, or safety benchmarking
Experience with model distillation, speculative decoding, or inference optimization
Prior experience in an education, STEM, legal, biomedical, or enterprise software vertical
