vLLM - ARM64 Docker Build
- Docker image: https://hub.docker.com/r/dwani/vllm-arm64

Pull the prebuilt image:

```bash
docker pull dwani/vllm-arm64:latest
```
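To confirm the pulled image really is arm64 (easy to miss when pulling on an x86 host), inspect its platform:

```bash
docker image inspect dwani/vllm-arm64:latest --format '{{.Os}}/{{.Architecture}}'
# expected output: linux/arm64
```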
Create the vLLM arm64 Docker image:

```bash
git clone https://github.com/vllm-project/vllm.git
cd vllm
DOCKER_BUILDKIT=1 sudo docker build . \
    --file docker/Dockerfile \
    --target vllm-openai \
    --platform "linux/arm64" \
    -t vllm/vllm-openai:latest \
    --build-arg max_jobs=16 \
    --build-arg nvcc_threads=4 \
    --build-arg VLLM_MAX_SIZE_MB=1000 \
    --build-arg torch_cuda_arch_list=""
sudo docker tag vllm/vllm-openai:latest dwani/vllm-arm64:latest
sudo docker push dwani/vllm-arm64:latest
```
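A quick smoke test that needs no GPU; this sketch assumes the vllm-openai target's entrypoint forwards its arguments to the API server, so `--help` just prints the server options and exits:

```bash
sudo docker run --rm dwani/vllm-arm64:latest --help
```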
Test the build:

```bash
sudo docker run --gpus all \
    -p 8000:8000 \
    dwani/vllm-arm64:latest \
    --model Qwen/Qwen3-0.6B \
    --port 8000
```
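Once the container reports the server is up, the OpenAI-compatible API can be exercised from the host. A minimal check, assuming the default endpoints and the model name used above:

```bash
# list the models the server exposes
curl http://localhost:8000/v1/models

# one-shot chat completion
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "Qwen/Qwen3-0.6B",
          "messages": [{"role": "user", "content": "Say hello in one word."}]
        }'
```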
A fuller example, serving RedHatAI/gemma-3-27b-it-FP8-dynamic on port 9000:

```bash
sudo docker run --runtime nvidia -it --rm \
    -p 9000:9000 \
    dwani/vllm-arm64:latest \
    --model RedHatAI/gemma-3-27b-it-FP8-dynamic \
    --served-model-name gemma3 \
    --host 0.0.0.0 \
    --port 9000 \
    --gpu-memory-utilization 0.7 \
    --tensor-parallel-size 1 \
    --max-model-len 65536 \
    --dtype bfloat16 \
    --disable-log-requests
```
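Because of `--served-model-name gemma3`, clients address the model as `gemma3` rather than the Hugging Face repo name; for example:

```bash
curl http://localhost:9000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "gemma3", "messages": [{"role": "user", "content": "Hello"}]}'
```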
- Docs: https://docs.vllm.ai/en/latest/deployment/docker.html#building-vllms-docker-image-from-source
-- Extra: the OpenAI-compatible server can also be started directly with Python, outside Docker:

```bash
python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000
```
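As with the Docker runs above, pass `--model <repo>` to choose the model. Once the server is up, a lightweight liveness check (vLLM's OpenAI server exposes a `/health` endpoint; port 8000 assumed from the command above):

```bash
# returns HTTP 200 once the engine is ready to serve
curl -i http://localhost:8000/health
```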
- Open WebUI as a frontend for the vLLM server above. Note: inside the container, http://0.0.0.0:8000 would point at the Open WebUI container itself, not the host, so the host-gateway alias is used instead (Docker 20.10+ on Linux):

```bash
docker run -d \
    --name open-webui \
    -p 3000:8080 \
    --add-host=host.docker.internal:host-gateway \
    -v open-webui:/app/backend/data \
    -e OPENAI_API_BASE_URL=http://host.docker.internal:8000/v1 \
    --restart always \
    ghcr.io/open-webui/open-webui:main
```
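With the ports mapped as above, the Open WebUI frontend is then reachable at http://localhost:3000 (the container listens on 8080 internally).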
The same test also works against the locally tagged vllm/vllm-openai image:

```bash
sudo docker run --gpus all \
    -p 8000:8000 \
    vllm/vllm-openai \
    --model Qwen/Qwen3-0.6B \
    --port 8000
```
--

- Docs: https://docs.vllm.ai/en/latest/deployment/docker.html#building-for-arm64aarch64