Freelance Agent Evaluation Engineer
$104,000 per year
Remote · Japan or Osaka, Osaka, Japan
Job Summary
Project-based evaluation engineering role: building realistic developer environments, crafting tasks from intermediate states, and writing tests to evaluate AI agents. You will design tasks, define success criteria, ensure the tasks are solvable by AI agents, and iterate based on QA feedback. Requires 5+ years of software development experience; proficiency with Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, and Redis; strong test-writing experience; and English proficiency (B2+). The role is remote-friendly, freelance, and part-time, with flexible scheduling and a project-based workload.
Required Qualifications
- 5+ years in software development
- Core stack: Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, Redis
- Experience writing tests (functional, integration)
- English proficiency - B2+