Qwen3-Embedding fails on optimum/emb path — OVModelForFeatureExtraction doesn't provide position_ids

Hi,
I'm hitting a bug when serving Qwen3-Embedding models (tested with `Echo9Zulu/Qwen3-Embedding-0.6B-int8_asym-ov`) through the `optimum` engine with `"model_type": "emb"`.

### What happens

- **On GPU**: immediate crash with  
  `Got unexpected inputs: expecting {'input_ids', 'attention_mask', 'position_ids'} but got {'input_ids', 'attention_mask'}`

- **On CPU**: the forward pass actually runs, but everything comes back as `NaN`, then the server blows up with  
  `ValueError: Out of range float values are not JSON compliant: nan`

Interestingly, the **Qwen3-Reranker** (same model family, same IR input shape) works perfectly fine because it goes through `optimum_rr.py` which uses `OVModelForCausalLM`.
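
For context on the CPU failure: that `ValueError` text is what Python's `json.dumps` raises when it is asked to serialize `nan` with `allow_nan=False`. Below is a minimal standard-library sketch of the failure. The `assert_finite` guard is hypothetical (it is not part of OpenArc); it only illustrates how a non-finite embedding could be caught before it reaches the response encoder.

```python
import json
import math


def assert_finite(embedding):
    """Hypothetical guard: fail early with a clear message if any value is NaN/inf."""
    bad_idx = [i for i, v in enumerate(embedding) if not math.isfinite(v)]
    if bad_idx:
        raise ValueError(
            f"embedding has {len(bad_idx)} non-finite values (first at index {bad_idx[0]})"
        )
    return embedding


def strict_dumps(payload):
    # allow_nan=False makes json.dumps reject nan/inf instead of emitting invalid JSON.
    return json.dumps(payload, allow_nan=False)


good = [0.1, -0.2, 0.3]
bad = [0.1, float("nan"), 0.3]

print(strict_dumps({"embedding": assert_finite(good)}))

try:
    strict_dumps({"embedding": bad})
except ValueError as exc:
    print("serializer:", exc)

try:
    assert_finite(bad)
except ValueError as exc:
    print("guard:", exc)
```

A guard like this would not fix the root cause, but it would turn the opaque serialization error into one that names the model output.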

### Environment
- Windows 11 Pro  
- Intel Core Ultra 9 285 + Arc iGPU  
- Python 3.12  
- OpenArc commit: `8856d1d9c2b8a04c1a03143ed0c633a9ebf40987`  
- openvino: `2026.2.0-21876-6b2466c964b`  
- optimum-intel: `1.27.0.dev0+c877c15`

### Quick reproduction

```bash
openarc serve start --host 127.0.0.1
```

Then load the model with this payload (set `"device"` to `"GPU"` to reproduce the GPU failure):
```json
{
  "model_name": "qwen3-emb-test",
  "model_path": "C:\\path\\to\\Qwen3-Embedding-0.6B-int8_asym-ov",
  "model_type": "emb",
  "engine": "optimum",
  "device": "CPU"
}
```

Then send any request to `/v1/embeddings`: on CPU it fails with the NaN error, on GPU with the input mismatch error.

I confirmed that the IR exposes `position_ids` as a required input (three inputs total: `input_ids`, `attention_mask`, `position_ids`).
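
For anyone who wants to repeat that check without loading the model, here is a rough standard-library sketch that lists the inputs straight from the IR XML. It assumes the usual IR layout, where each model input is a `<layer type="Parameter">` whose output port carries the tensor name. `SAMPLE_IR` is a tiny made-up stand-in, not the real `openvino_model.xml`, and `ir_input_names` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for openvino_model.xml; a real IR has many more layers and attributes.
SAMPLE_IR = """
<net name="Model0" version="11">
  <layers>
    <layer id="0" name="input_ids" type="Parameter" version="opset1">
      <output><port id="0" precision="I64" names="input_ids"/></output>
    </layer>
    <layer id="1" name="attention_mask" type="Parameter" version="opset1">
      <output><port id="0" precision="I64" names="attention_mask"/></output>
    </layer>
    <layer id="2" name="position_ids" type="Parameter" version="opset1">
      <output><port id="0" precision="I64" names="position_ids"/></output>
    </layer>
    <layer id="3" name="MatMul_1" type="MatMul" version="opset1"/>
  </layers>
</net>
"""


def ir_input_names(xml_text):
    """Return the tensor name of every Parameter (model input) layer."""
    root = ET.fromstring(xml_text)
    names = []
    for layer in root.iter("layer"):
        if layer.get("type") != "Parameter":
            continue
        port = layer.find("./output/port")
        raw = port.get("names") if port is not None else None
        # Fall back to the layer's friendly name if the port has no names attribute.
        names.append(raw.split(",")[0] if raw else layer.get("name"))
    return names


print(ir_input_names(SAMPLE_IR))
```

To run it against a real export, pass the text of `openvino_model.xml` instead of `SAMPLE_IR`.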

### Root cause (as I understand it)

`src/engine/optimum/optimum_emb.py:84` loads the model with:

```python
self.model = OVModelForFeatureExtraction.from_pretrained(...)
```

`OVModelForFeatureExtraction` is designed for **encoder-only** models (BERT-style), where `position_ids` are baked into the IR. Decoder-style embedding models like Qwen3-Embedding (causal LM + last-token pooling + RoPE) export `position_ids` as an explicit input.

The reranker path avoids this because `OVModelForCausalLM` automatically builds `position_ids` from the attention mask. That's why the reranker works but the embedding path doesn't.


### Current fix I'm trying

In `generate_embeddings`, right before calling the model, add a small check:

```python
import torch

batch_dict = self.tokenizer(...)  # unchanged

# Add position_ids if the IR expects it
expected = {name for inp in self.model.model.inputs for name in inp.get_names()}
if "position_ids" in expected and "position_ids" not in batch_dict:
    attn = batch_dict["attention_mask"]
    batch_dict["position_ids"] = (attn.long().cumsum(-1) - 1).clamp(min=0) * attn

outputs = self.model(**batch_dict)
```

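This mirrors the cumulative-sum approach that Hugging Face generation code uses to derive `position_ids` from the attention mask (and, as far as I can tell, what `OVModelForCausalLM` does on the reranker path). It produces valid embeddings when I test it with raw OpenVINO inference.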
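
To make the formula concrete, here is a dependency-free sketch of the same arithmetic on plain lists, checked against both padding sides. `position_ids_from_mask` is a hypothetical helper written for illustration only; the actual patch above operates on torch tensors.

```python
from itertools import accumulate


def position_ids_from_mask(attention_mask):
    """Pure-Python equivalent of (mask.cumsum(-1) - 1).clamp(min=0) * mask, per row."""
    rows = []
    for row in attention_mask:
        running = accumulate(row)  # cumulative count of real tokens so far
        rows.append([max(count - 1, 0) * m for count, m in zip(running, row)])
    return rows


right_padded = [[1, 1, 1, 0, 0]]
left_padded = [[0, 0, 1, 1, 1]]

print(position_ids_from_mask(right_padded))  # [[0, 1, 2, 0, 0]]
print(position_ids_from_mask(left_padded))   # [[0, 0, 0, 1, 2]]
```

Real tokens are numbered from 0 regardless of padding side. Padded slots get position 0, which should be harmless because those tokens are masked out of attention.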

A cleaner long-term fix would be to detect decoder-style embeddings and use `OVModelForCausalLM` + `output_hidden_states=True` instead, but the 4-line patch above is low-risk and fixes it immediately.

### I confirm this is not a model conversion issue

- The exported `openvino_model.xml` / `.bin` are fine.
- Same IR works perfectly when I manually supply `position_ids` via raw `openvino.Core`.
- The reranker has the **exact same** 3-input signature and works today.

### Small secondary bug

On GPU, the real error gets logged in `openarc.log`, but the client request just hangs until timeout. The FastAPI exception handler doesn't seem to propagate it properly for the embeddings endpoint. Not critical, but worth fixing later.
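
I have not traced the hang, so the following is only a guess at the failure mode, not a description of OpenArc's code. One common cause of "logged but never returned" errors is a background worker that catches the exception without resolving the future the request handler is awaiting. A minimal `asyncio` sketch of the pattern that does propagate the error (all names here are hypothetical):

```python
import asyncio


async def worker(queue, run_inference):
    """Hypothetical worker loop: every job carries a future the handler awaits."""
    while True:
        payload, future = await queue.get()
        try:
            future.set_result(run_inference(payload))
        except Exception as exc:
            # Without this line the caller would wait until its own timeout.
            future.set_exception(exc)
        finally:
            queue.task_done()


async def submit(queue, payload):
    # Stand-in for the request handler: enqueue the job and await its outcome.
    future = asyncio.get_running_loop().create_future()
    await queue.put((payload, future))
    return await future


def failing_inference(payload):
    raise RuntimeError("Got unexpected inputs: missing position_ids")


async def main():
    queue = asyncio.Queue()
    task = asyncio.create_task(worker(queue, failing_inference))
    try:
        await asyncio.wait_for(submit(queue, {"input": "hello"}), timeout=1.0)
    except RuntimeError as exc:
        result = f"propagated: {exc}"
    else:
        result = "no error"
    finally:
        task.cancel()
    return result


print(asyncio.run(main()))
```

If the embeddings path uses a similar queue, checking that the error branch resolves the future would be the first thing I'd look at.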

--

Hope someone can take a look. Happy to provide more logs or test any patch.

Thanks!

P

Qwen3-Embedding fails on optimum/emb path — OVModelForFeatureExtraction doesn't provide position_ids #106
