Centific · 2 days ago

Speech Research Intern-2

$93,600 per year

Remote · United States or Redmond, Washington, United States

Type: Full Time
Level: Entry Level
Education: Doctorate Or Professional Degree
Company size: Unknown

Job Summary

PhD Research Intern role focused on designing and evaluating speech-first models, including spoken language models that reason over audio and interact conversationally.

Responsibilities include prototyping end-to-end speech dialogue systems, aligning speech encoders with text backbones via lightweight adapters, developing efficient speech tokenization and temporal compression for long-form audio, and building an evaluation harness covering ASR/ST/SLU and speech QA with streaming metrics.

Projects involve prototyping a conversational SLM with a speech encoder and adapters, creating data recipes that blend speech with instruction-following corpora, and shipping a minimal demo with streaming inference and logging.

Required skills include PhD candidacy in CS/EE (or related), Python/PyTorch with GPU experience, knowledge of Transformers/SSMs, and experience in at least one area such as discrete speech tokens, modality alignment via adapters, or post-training/instruction tuning for speech tasks.

Preferred qualifications cover experience with neural speech codecs/vocoders, multilingual or code-switching speech, robustness and safety evaluation, distributed training (FSDP/DeepSpeed), speech toolkits such as ESPnet, SpeechBrain, and NVIDIA NeMo, and PyTorch ecosystem tools (CUDA, torchaudio/librosa, ONNX/TensorRT).

Location options are Redmond, WA or Remote, with flexible scheduling. Compensation is $45 per hour (annualized to $93,600) plus a stipend, with opportunities to publish, mentor, and access GPU infrastructure.

Required Qualifications

  • PhD candidate in CS/EE (or related) with research in speech, audio ML, or multimodal LMs
  • Fluency in Python and PyTorch, with hands-on GPU training; familiarity with torchaudio or librosa
  • Working knowledge of modern sequence models (Transformers or SSMs) and training best practices
  • Depth in at least one area: (a) discrete speech tokens/temporal compression, (b) modality alignment to LLMs via adapters, or (c) post-training/instruction tuning for speech tasks
  • Strong experimentation habits: clean code, ablations, reproducibility, and clear reporting