Trending

llm-d

llm-dApache-2.02026.05.19

LLM3.2K Stars480 Forks4.9천 조회

llm-d는 Red Hat, Google Cloud, IBM Research, CoreWeave, NVIDIA가 공동 설립한 Kubernetes 네이티브 분산 LLM 추론 서빙 스택입니다. prefill/decode 분리(disaggregation), 계층형 KV 캐시 오프로딩, prefix-cache 및 load-aware 라우팅을 결합해 대규모 모델을 다양한 가속기에서 효율적으로 서비스합니다. OpenAI 호환 API와 SLO 기반 오토스케일링을 제공해 프로덕션 환경에서 안정적인 LLM 서빙 인프라를 구축할 수 있도록 설계되었습니다.

주요 특징

Prefill/Decode 분리(disaggregation) 기반 대형 모델 서빙
계층형 KV 캐시 오프로딩 및 prefix-cache 관리
Prefix-cache 및 부하 인식 인텔리전트 라우팅
OpenAI 호환 API와 배치 처리 지원
SLO 기반 오토스케일링과 흐름 제어

Open Source

llm-d

주요 특징

태그

관련 프로젝트

Hugging Face Transformers

Gemini CLI

LLMs from Scratch

Awesome MCP Servers