Tell us about your request
SGLang is a highly optimized LLM serving runtime that often outperforms llama.cpp and vLLM. It supports numerous quantization formats, can quantize models itself, and exposes many tuning parameters. It would be great to be able to use it as a backend for Docker Model Runner.
Which service(s) is this request for?
Docker Model Runner (Docker models).
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Getting better performance when running LLMs on consumer hardware. It's hard because Docker Model Runner doesn't support alternative inference backends, so taking advantage of SGLang's optimizations means abandoning the Model Runner workflow entirely.
Are you currently working around the issue?
Yes: not using Docker Model Runner at all, and running SGLang manually instead (see the example under Additional context below).
Additional context
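For reference, a rough sketch of what the manual workaround looks like today, using SGLang's published Docker image. The model name and port here are illustrative placeholders; adjust for your hardware:

```shell
# Run SGLang's OpenAI-compatible server directly, outside Docker Model Runner.
docker run --gpus all \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path Qwen/Qwen2.5-7B-Instruct \
    --host 0.0.0.0 --port 30000
```

The request is for `docker model` to be able to drive a server like this directly, the way it drives llama.cpp today, instead of requiring a hand-managed container.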