The Problem:
Currently, the only way to pass audio data to multimodal models such as gemma4:e2b is through the images key.
Why this is an issue:
- Naming Confusion: Audio files are not images, so sending them in a field called images is misleading.
- Future Multimodal Support: For models that accept image and audio input in the same request, a single images bucket makes it ambiguous which entries are images and which are audio.
Suggested Solution:
Add a dedicated 'audio' key.
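
To make the proposal concrete, here is a rough sketch of how a request body could look with the two modalities kept apart. This is only an illustration: the 'audio' key does not exist today, and the helper name build_payload, the model and prompt fields, and the base64 encoding are all assumptions made for the example rather than a description of the current API.

```python
import base64
import json


def build_payload(model, prompt, images=(), audio=()):
    """Build a request body that keeps each modality under its own key.

    Hypothetical helper: the "audio" key is the proposed addition and is
    not part of any existing API. The other field names are assumed.
    """
    payload = {"model": model, "prompt": prompt}
    if images:
        # Image bytes stay under the existing "images" key.
        payload["images"] = [base64.b64encode(b).decode("ascii") for b in images]
    if audio:
        # Proposed: audio bytes get their own key instead of reusing "images".
        payload["audio"] = [base64.b64encode(b).decode("ascii") for b in audio]
    return payload


# Placeholder bytes stand in for a real audio file.
payload = build_payload("gemma4:e2b", "Transcribe this clip.", audio=[b"fake-bytes"])
print(json.dumps(payload, indent=2))
```

With separate keys, a request that carries both an image and an audio clip would no longer need the server to guess the type of each entry.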