eesungkim

Notes

  • ML-engineering
    • Profiling
      • Modern Memory Snapshot in PyTorch
      • NeMo PytorchProfilerCallback — Chakra Traces and Execution Profiling
      • Perfetto Trace Debugging for Distributed Training
      • TensorBoard PyTorch Profiler — Distributed Debugging
      • Reading the PyTorch Profiler in TensorBoard
      • Understanding GPU Memory in PyTorch
    • Training
      • Finding the Bottleneck in Distributed Training
      • Merging NeMo FSDP Distributed Checkpoints into a Single File
      • Distributed Training with torchrun

Log

  • Review: Streaming SpeechLLM Back-end Architecture for ASR & Translation Mar 5, 2026
  • qwen3_asr_triton vs speechLLM (vLLM): Performance & Architecture Comparison Mar 2, 2026
  • SALM-Duplex: Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model Feb 16, 2026
  • Triton devlog: first steps Jan 16, 2026

© 2026 eesungkim.com