11 changes: 3 additions & 8 deletions demos/README.md
@@ -5,16 +5,13 @@
maxdepth: 1
hidden:
---
ovms_demos_continuous_batching_agent
ovms_demos_continuous_batching
ovms_demos_integration_with_open_webui
ovms_demos_code_completion_vsc
ovms_demos_audio
ovms_demos_rerank
ovms_demos_embeddings
ovms_demos_continuous_batching
ovms_demo_long_context
ovms_demos_continuous_batching_vlm
ovms_demos_llm_npu
ovms_demos_vlm_npu
ovms_demos_code_completion_vsc
ovms_demos_image_generation
ovms_demo_clip_image_classification
ovms_demo_age_gender_guide
@@ -40,10 +37,8 @@ ovms_demo_real_time_stream_analysis
ovms_demo_using_paddlepaddle_model
ovms_demo_bert
ovms_demo_universal-sentence-encoder
ovms_demo_benchmark_client
ovms_string_output_model_demo
ovms_demos_gguf
ovms_demos_audio

```

337 changes: 114 additions & 223 deletions demos/continuous_batching/README.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion demos/continuous_batching/speculative_decoding/README.md
@@ -1,4 +1,4 @@
# How to serve LLM Models in Speculative Decoding Pipeline{#ovms_demos_continuous_batching_speculative_decoding}
# LLM Models in Speculative Decoding Pipeline{#ovms_demos_continuous_batching_speculative_decoding}

Following [OpenVINO GenAI docs](https://docs.openvino.ai/2026/openvino-workflow-generative/inference-with-genai.html#efficient-text-generation-via-speculative-decoding):
> Speculative decoding (or assisted-generation) enables faster token generation when an additional smaller draft model is used alongside the main model. This reduces the number of infer requests to the main model, increasing performance.
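Speculative decoding runs entirely server-side (main model plus draft model), so the client sends an ordinary OpenAI-style request. A minimal sketch of building such a request body; the endpoint URL and model name below are hypothetical placeholders, not values from this demo:

```python
import json

# Hypothetical endpoint and model name -- substitute the ones from your deployment.
ENDPOINT = "http://localhost:8000/v3/chat/completions"

def build_chat_request(model: str, user_prompt: str, max_tokens: int = 128) -> str:
    """Serialize an OpenAI-style chat completions payload. Because the draft
    model is configured on the server, this request is identical to a
    regular generation request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "max_tokens": max_tokens,
        "stream": False,
    }
    return json.dumps(payload)

body = build_chat_request("meta-llama/CodeLlama-7b-hf", "def quicksort(arr):")
```

The serialized `body` can then be POSTed to the server's chat completions endpoint with any HTTP client.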
205 changes: 36 additions & 169 deletions demos/continuous_batching/vlm/README.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion demos/embeddings/README.md
@@ -1,4 +1,4 @@
# How to serve Embeddings models via OpenAI API {#ovms_demos_embeddings}
# Text Embeddings models via OpenAI API {#ovms_demos_embeddings}
This demo shows how to deploy embeddings models in the OpenVINO Model Server for text feature extraction.
The embeddings use case is exposed via the OpenAI API `embeddings` endpoint.
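A minimal sketch of an OpenAI-style `embeddings` request body; the model name is a hypothetical placeholder, not one prescribed by this demo:

```python
import json

def build_embeddings_request(model: str, texts: list[str]) -> str:
    """Serialize an OpenAI-style embeddings payload: a served model name
    plus one or more input strings to embed."""
    return json.dumps({"model": model, "input": texts})

body = build_embeddings_request(
    "Alibaba-NLP/gte-large-en-v1.5",  # hypothetical model name
    ["OpenVINO Model Server", "serves embeddings"],
)
```

The response, per the OpenAI API convention, carries one embedding vector per input string.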

6 changes: 4 additions & 2 deletions demos/integration_with_OpenWebUI/README.md
@@ -1,4 +1,4 @@
# Demonstrating integration of Open WebUI with OpenVINO Model Server {#ovms_demos_integration_with_open_webui}
# Open WebUI with OpenVINO Model Server {#ovms_demos_integration_with_open_webui}

## Description

@@ -70,7 +70,9 @@ Go to [http://localhost:8080](http://localhost:8080) and create admin account to

![get started with Open WebUI](./get_started_with_Open_WebUI.png)

### Reference
> **Important Note**: When using an NPU device for acceleration, or the gpt-oss-20b model on GPU, it is recommended to disable `Follow-Up Auto-Generation` in the `Settings > Interface` menu. This improves response time and avoids queuing requests. For the gpt-oss model it also avoids concurrent execution, which has an accuracy issue in version 2026.0.
### References
[https://docs.openvino.ai/2026/model-server/ovms_demos_continuous_batching.html](https://docs.openvino.ai/2026/model-server/ovms_demos_continuous_batching.html#model-preparation)

[https://docs.openwebui.com](https://docs.openwebui.com/#installation-with-pip)
Expand Down
4 changes: 2 additions & 2 deletions demos/rerank/README.md
@@ -1,8 +1,8 @@
# How to serve Rerank models via Cohere API {#ovms_demos_rerank}
# Documents Reranking via Cohere API {#ovms_demos_rerank}

## Prerequisites

**Model preparation**: Python 3.9 or higher with pip
**Model preparation**: Python 3.10 or higher with pip

**Model Server deployment**: Installed Docker Engine or OVMS binary package according to the [baremetal deployment guide](../../docs/deploying_server_baremetal.md)
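A minimal sketch of a Cohere-style `rerank` request body; the model name is a hypothetical placeholder, not one prescribed by this demo:

```python
import json

def build_rerank_request(model: str, query: str,
                         documents: list[str], top_n: int = 3) -> str:
    """Serialize a Cohere-style rerank payload: the query is scored against
    each candidate document and the top_n best matches are returned."""
    payload = {
        "model": model,
        "query": query,
        "documents": documents,
        "top_n": min(top_n, len(documents)),  # never ask for more than we sent
    }
    return json.dumps(payload)

body = build_rerank_request(
    "BAAI/bge-reranker-large",  # hypothetical model name
    "What is OpenVINO?",
    ["OpenVINO is an inference toolkit.", "Bananas are yellow."],
)
```

The server responds with relevance scores per document, sorted so that retrieval pipelines can keep only the best matches.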

6 changes: 2 additions & 4 deletions demos/vlm_npu/README.md
@@ -1,4 +1,4 @@
# Serving for Text generation with Visual Language Models with NPU acceleration {#ovms_demos_vlm_npu}
# NPU for Visual Language Models {#ovms_demos_vlm_npu}


This demo shows how to deploy VLM models in the OpenVINO Model Server with NPU acceleration.
@@ -11,9 +11,7 @@ It is targeted on client machines equipped with NPU accelerator.

## Prerequisites

**OVMS 2025.1 or higher**

**Model preparation**: Python 3.9 or higher with pip and HuggingFace account
**Model preparation**: Python 3.10 or higher with pip and HuggingFace account

**Model Server deployment**: Installed Docker Engine or OVMS binary package according to the [baremetal deployment guide](../../docs/deploying_server_baremetal.md)
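A VLM request pairs a text question with an image in an OpenAI-style chat payload, with the image embedded as a base64 data URI. A minimal sketch under that assumption; the model name and image bytes below are hypothetical placeholders:

```python
import base64
import json

def build_vlm_request(model: str, question: str, image_bytes: bytes) -> str:
    """Serialize an OpenAI-style chat payload combining a text part and an
    image part, with the image inlined as a base64 data URI."""
    data_uri = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        }],
        "max_tokens": 128,
    }
    return json.dumps(payload)

# Hypothetical model name; image_bytes would normally come from reading a file.
body = build_vlm_request("OpenGVLab/InternVL2-2B", "What is in this picture?", b"\xff\xd8fake")
```

In practice `image_bytes` is read from disk (e.g. `open("photo.jpg", "rb").read()`) before being passed in.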
