Commit 4fcc24f

BBR multi lora guide (#1940)
* Extending serving multiple AI models guide with an example of how to serve multiple LoRAs (many LoRAs per one model while having multiple models)

* Changes to PR to address feedback of the reviewers

* Address review comments from PR #1859:
  -- The BBR guide is aligned with Getting Started (Main/Latest)
  -- There are only two models deployed, with the second one being a simulator
  -- Formatting issues and style fixed
  -- Typos and dangling sentences fixed
  -- The LoRA names are completely different
  -- The Routing example simplified: one HTTPRoute with matchers

* Adds missing Kgateway and Nginx tabs for the second EPP model deployment

* fixes formatting typos

* Update config/manifests/vllm/sim-deployment-1.yaml

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>

* Update site-src/guides/serve-multiple-genai-models.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>

* Update site-src/guides/serve-multiple-genai-models.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>

* Update site-src/guides/serve-multiple-genai-models.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>

* Update site-src/guides/serve-multiple-genai-models.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>

* Update site-src/guides/serve-multiple-genai-models.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>

* Addressing reviewer (shmuelk) comment to include an explicit setting of PORT and IP when trying out multiple LLM setup

---------

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
1 parent 3635574 commit 4fcc24f

3 files changed

+451
-135
lines changed

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-llama-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct
    matches:
    - path:
        type: PathPrefix
        value: /
      headers:
      - type: Exact
        name: X-Gateway-Model-Name
        value: 'meta-llama/Llama-3.1-8B-Instruct'
    - path:
        type: PathPrefix
        value: /
      headers:
      - type: Exact
        name: X-Gateway-Model-Name
        value: 'food-review-1'
    timeouts:
      request: 300s
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-deepseek-route # give this HTTPRoute any name that helps you to group and track the matchers
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-deepseek-r1
    matches:
    - path:
        type: PathPrefix
        value: /
      headers:
      - type: Exact
        name: X-Gateway-Model-Name
        value: 'deepseek/vllm-deepseek-r1'
    - path:
        type: PathPrefix
        value: /
      headers:
      - type: Exact
        name: X-Gateway-Model-Name
        value: 'ski-resorts'
    - path:
        type: PathPrefix
        value: /
      headers:
      - type: Exact
        name: X-Gateway-Model-Name
        value: 'movie-critique'
    timeouts:
      request: 300s
---
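The two HTTPRoutes above select an InferencePool purely by an Exact match on the X-Gateway-Model-Name header. The Python sketch below mirrors that lookup so the routing table is easy to reason about. It is an illustration only, not gateway code: `ROUTES`, `model_header_from_body`, and `resolve_pool` are hypothetical names, and the assumption that the header is populated from the `model` field of the JSON request body (the role the commit title attributes to BBR) is not shown in this diff.

```python
import json

# Header the routes match on (taken from the manifests above).
MODEL_HEADER = "X-Gateway-Model-Name"

# Exact header values per InferencePool, copied from the two HTTPRoutes.
ROUTES = {
    "vllm-llama3-8b-instruct": ["meta-llama/Llama-3.1-8B-Instruct", "food-review-1"],
    "vllm-deepseek-r1": ["deepseek/vllm-deepseek-r1", "ski-resorts", "movie-critique"],
}


def model_header_from_body(body: bytes) -> dict:
    """Hypothetical stand-in for body-based routing: copy "model" into a header."""
    model = json.loads(body).get("model")
    return {MODEL_HEADER: model} if isinstance(model, str) else {}


def resolve_pool(headers: dict):
    """Return the InferencePool whose Exact header match fits, or None."""
    # Header names compare case-insensitively; Exact values are case-sensitive.
    wanted = MODEL_HEADER.lower()
    value = next((v for k, v in headers.items() if k.lower() == wanted), None)
    for pool, names in ROUTES.items():
        if value in names:
            return pool
    return None


if __name__ == "__main__":
    body = b'{"model": "ski-resorts", "prompt": "hi"}'
    print(resolve_pool(model_header_from_body(body)))
```

A request whose model name has no matcher resolves to `None` here, which corresponds to a request that neither route accepts.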
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-deepseek-r1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-deepseek-r1
  template:
    metadata:
      labels:
        app: vllm-deepseek-r1
    spec:
      containers:
      - name: vllm-sim
        image: ghcr.io/llm-d/llm-d-inference-sim:v0.6.1
        imagePullPolicy: Always
        args:
        - --model
        - deepseek/vllm-deepseek-r1
        - --port
        - "8000"
        - --max-loras
        - "2"
        - --lora-modules
        - '{"name": "ski-resorts"}'
        - '{"name": "movie-critique"}'
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        ports:
        - containerPort: 8000
          name: http
          protocol: TCP
        resources:
          requests:
            cpu: 10m
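The names served by this Deployment (the base model plus the two LoRA adapters) have to line up with the header values matched by the llm-deepseek-route HTTPRoute, otherwise a request for an adapter never reaches this pool. The Python sketch below checks that correspondence. It is a hedged illustration: `parse_args`, `served_model_names`, and `unrouted_names` are hypothetical helpers, and the flag grouping is a simplification for this example, not the simulator's actual argument parser.

```python
import json

# Container args copied from the Deployment above.
ARGS = [
    "--model", "deepseek/vllm-deepseek-r1",
    "--port", "8000",
    "--max-loras", "2",
    "--lora-modules", '{"name": "ski-resorts"}', '{"name": "movie-critique"}',
]

# Exact header values copied from the llm-deepseek-route HTTPRoute.
ROUTE_VALUES = ["deepseek/vllm-deepseek-r1", "ski-resorts", "movie-critique"]


def parse_args(args):
    """Group each --flag with the values that follow it (illustrative only)."""
    parsed, flag = {}, None
    for token in args:
        if token.startswith("--"):
            flag = token[2:]
            parsed[flag] = []
        elif flag is not None:
            parsed[flag].append(token)
    return parsed


def served_model_names(args):
    """Base model name followed by the names of the declared LoRA adapters."""
    parsed = parse_args(args)
    loras = [json.loads(entry)["name"] for entry in parsed.get("lora-modules", [])]
    return parsed["model"] + loras


def unrouted_names(args, route_values):
    """Names the container serves that no route matcher would select."""
    return [name for name in served_model_names(args) if name not in route_values]


if __name__ == "__main__":
    print(served_model_names(ARGS))
    print(unrouted_names(ARGS, ROUTE_VALUES))
```

An empty result from `unrouted_names` means every served name has a matching header rule; adding an adapter to the args without adding a matcher would show up in that list.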
