feat: Add support for OpenAI-compatible LLM endpoints (Groq, Ollama, Azure OpenAI, etc.) #1198

@Shaamam

Description

🔴 Required Information

Is your feature request related to a specific problem?

Yes. Currently, ADK-Java only supports Google's Gemini and Anthropic's Claude models out of the box. Developers cannot easily use
other LLM providers that implement the OpenAI Chat Completions API format, such as:

  • Groq (fast inference with open models)
  • Ollama (local models for offline development)
  • Azure OpenAI (enterprise deployments)
  • Perplexity (search-augmented responses)
  • Any custom OpenAI-compatible endpoint

Python ADK supports this via LiteLLM integration, but Java ADK lacks a native solution. This creates friction for developers who
want to:

  • Experiment with different model providers
  • Reduce inference costs by using alternative providers
  • Develop/test locally without external API calls
  • Support enterprise requirements (Azure OpenAI)

Describe the Solution You'd Like

Add a new OpenAiCompatibleLlm class that wraps the existing ChatCompletionsHttpClient to support any OpenAI-compatible endpoint
with a simple builder pattern.

Key capabilities:

  1. Builder pattern for easy configuration (baseUrl, headers, timeout)
  2. Pattern-based registry integration (e.g., groq-.*, ollama-.*)
  3. Reuses existing ChatCompletionsHttpClient infrastructure
  4. Non-streaming requests (matches current ChatCompletionsHttpClient capabilities)
  5. Comprehensive tests (unit + integration + manual verification)

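To make the builder configuration and validation concrete, here is a minimal, self-contained Java sketch; the class, method, and default values mirror the proposal but are illustrative, not the actual ADK implementation:

```java
import java.util.Map;

/** Minimal sketch of the proposed builder; names are illustrative, not the real ADK class. */
public class OpenAiCompatibleLlmSketch {

  private final String baseUrl;
  private final Map<String, String> headers;
  private final String modelName;
  private final long timeoutMillis;

  private OpenAiCompatibleLlmSketch(Builder builder) {
    this.baseUrl = builder.baseUrl;
    this.headers = builder.headers;
    this.modelName = builder.modelName;
    this.timeoutMillis = builder.timeoutMillis;
  }

  public static Builder builder() {
    return new Builder();
  }

  public String modelName() {
    return modelName;
  }

  public static final class Builder {
    private String baseUrl;
    private Map<String, String> headers = Map.of();
    private String modelName;
    private long timeoutMillis = 60_000; // assumed default; the real class may differ

    public Builder baseUrl(String baseUrl) { this.baseUrl = baseUrl; return this; }
    public Builder headers(Map<String, String> headers) { this.headers = headers; return this; }
    public Builder modelName(String modelName) { this.modelName = modelName; return this; }
    public Builder timeoutMillis(long timeoutMillis) { this.timeoutMillis = timeoutMillis; return this; }

    /** Fails fast on missing required fields, the kind of check the builder-validation unit tests cover. */
    public OpenAiCompatibleLlmSketch build() {
      if (baseUrl == null || baseUrl.isBlank()) {
        throw new IllegalStateException("baseUrl is required");
      }
      if (modelName == null || modelName.isBlank()) {
        throw new IllegalStateException("modelName is required");
      }
      return new OpenAiCompatibleLlmSketch(this);
    }
  }
}
```

The fail-fast checks in build() are what make misconfiguration surface at construction time rather than on the first HTTP call.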
Impact on your work

Impact Level: High

This feature is critical for:

  • My current project: Building a multi-model agent system that needs to switch between Gemini (complex reasoning) and Groq
    (fast responses) based on task type
  • Cost optimization: Groq is roughly 10x cheaper than Gemini for simple tasks
  • Local development: Need Ollama support for offline testing without API costs

Timeline: Would like this in the next release if possible. Currently using a workaround with direct HTTP calls, but it's not
maintainable.

Willingness to contribute

Yes - I have already implemented this feature with:

  • ✅ Core OpenAiCompatibleLlm class with builder pattern
  • ✅ 13 unit tests (builder validation, registry integration, error handling)
  • ✅ Integration tests with Ollama
  • ✅ Manual verification with Groq API (tested successfully)
  • ✅ README documentation with examples
  • ✅ Follows Google Java Style Guide (google-java-format applied)
  • ✅ Aligned with CONTRIBUTING.md requirements

Can submit PR immediately if maintainers approve this approach.


🟡 Recommended Information

Describe Alternatives You've Considered

  1. Direct HTTP calls: Manually building requests to OpenAI-compatible endpoints

    • ❌ Doesn't integrate with LlmRegistry
    • ❌ No pattern matching for model resolution
    • ❌ Duplicates HTTP client logic
    • ❌ Not maintainable
  2. Separate library wrapper: Create external library for each provider

    • ❌ Fragments ecosystem
    • ❌ Each provider needs separate maintenance
    • ❌ Doesn't leverage existing ADK infrastructure
  3. Use Python ADK instead: Switch to Python for LiteLLM support

    • ❌ Not viable for Java-based projects
    • ❌ Team expertise is in Java
  4. Wait for official provider support: Request Google add each provider individually

    • ❌ Slow (requires coordination with each provider)
    • ❌ Doesn't scale to custom/internal endpoints

Why the proposed solution is better: Single implementation supports all OpenAI-compatible providers, reuses existing
infrastructure, and allows custom endpoints.

Proposed API / Implementation

// Example 1: Groq (fast inference)
OpenAiCompatibleLlm groq = OpenAiCompatibleLlm.builder()
    .baseUrl("https://api.groq.com/openai/v1/")
    .headers(ImmutableMap.of("Authorization", "Bearer " + apiKey))
    .modelName("llama-3.3-70b-versatile")
    .timeoutMillis(30_000)
    .build();

// Register pattern for model resolution
groq.registerWithPattern("groq-.*");

// Use with LlmAgent
LlmAgent agent = LlmAgent.builder()
    .model("groq-llama-3.3-70b-versatile")
    .instruction("You are a helpful assistant.")
    .build();

// runAsync emits Event objects, so the first result is an Event rather than a raw String
Event response = agent.runAsync(invocationContext).blockingFirst();

// Example 2: Ollama (local models)
OpenAiCompatibleLlm ollama = OpenAiCompatibleLlm.builder()
    .baseUrl("http://localhost:11434/v1/")
    .headers(ImmutableMap.of())  // No auth for local
    .modelName("ollama-llama2")
    .build();

ollama.registerWithPattern("ollama-.*");

// Example 3: Azure OpenAI (enterprise)
OpenAiCompatibleLlm azure = OpenAiCompatibleLlm.builder()
    .baseUrl("https://<resource>.openai.azure.com/openai/deployments/<deployment>/")
    .headers(ImmutableMap.of("api-key", azureApiKey))
    .modelName("azure-gpt-4")
    .build();

azure.registerWithPattern("azure-.*");

Implementation Architecture:

  • OpenAiCompatibleLlm extends BaseLlm
  • Wraps existing ChatCompletionsHttpClient (no code duplication)
  • Uses LlmRegistry.registerLlm(pattern, factory) for pattern matching
  • Throws UnsupportedOperationException for live connections (OpenAI API limitation)
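The pattern-matching step above can be sketched in plain Java; this is a simplified stand-in for LlmRegistry (hypothetical class and method names, with a String factory in place of a real LLM client):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

/** Simplified stand-in for pattern-based model resolution; not the real LlmRegistry API. */
public class PatternRegistrySketch {

  // Insertion-ordered map from compiled pattern to a factory keyed by the matched model name.
  private static final Map<Pattern, Function<String, String>> FACTORIES = new LinkedHashMap<>();

  /** Registers a factory for every model name matching the given regex, e.g. "groq-.*". */
  public static void register(String regex, Function<String, String> factory) {
    FACTORIES.put(Pattern.compile(regex), factory);
  }

  /** Resolves a model name to a provider by scanning patterns in registration order. */
  public static String resolve(String modelName) {
    for (Map.Entry<Pattern, Function<String, String>> entry : FACTORIES.entrySet()) {
      if (entry.getKey().matcher(modelName).matches()) {
        return entry.getValue().apply(modelName);
      }
    }
    throw new IllegalArgumentException("No provider registered for model: " + modelName);
  }
}
```

Registration order matters when patterns overlap, which is one reason distinct provider prefixes (groq-, ollama-) keep resolution unambiguous.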

Files to be added/modified:
M README.md
A core/src/main/java/com/google/adk/models/OpenAiCompatibleLlm.java
A core/src/test/java/com/google/adk/models/OpenAiCompatibleLlmTest.java
A core/src/test/java/com/google/adk/models/OpenAiCompatibleLlmIntegrationTest.java
A core/src/test/java/com/google/adk/models/ManualGroqTest.java

Additional Context

Testing completed:

  • ✅ Unit tests: 13 tests, all passing
  • ✅ Integration tests: Ollama-based tests for real endpoints
  • ✅ Manual verification: Successfully tested with Groq API (llama-3.3-70b-versatile)
    • Simple completion test
    • Registry integration test
    • Multi-turn conversation with context retention

Design decisions:

  • Non-streaming only: Matches ChatCompletionsHttpClient capabilities (streaming can be added in future PR)
  • Pattern-based registration: Allows multiple providers with distinct prefixes (e.g., groq-.*, ollama-.*, azure-.*)
  • No live connections: OpenAI Chat Completions API doesn't support bidirectional live connections

Questions for maintainers:

  1. Should we add streaming support in a follow-up PR, or include it now?
  2. Any concerns with the pattern-based registration approach?
  3. Should integration tests be tagged/skipped in CI (currently requires manual Ollama setup)?
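On question 3, one option is gating the integration tests on an environment variable. A minimal helper sketch (variable and class names are illustrative; in JUnit 5 this check could feed Assumptions.assumeTrue so gated tests report as skipped rather than failed):

```java
import java.util.Map;

/** Hypothetical helper for deciding whether Ollama-backed integration tests can run. */
public final class IntegrationTestGuard {

  private IntegrationTestGuard() {}

  /**
   * Returns true only when a base URL for a local Ollama server is configured,
   * e.g. OLLAMA_BASE_URL=http://localhost:11434/v1/ (the variable name is an assumption).
   * Pass System.getenv() in production code; passing a Map keeps the helper testable.
   */
  public static boolean ollamaConfigured(Map<String, String> env) {
    String url = env.get("OLLAMA_BASE_URL");
    return url != null && !url.isBlank();
  }
}
```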

Related: Provides functionality similar to Python ADK's LiteLLM integration, but as a native Java implementation that reuses
existing ADK infrastructure.


Implementation ready: I have a fully tested, working implementation and can submit a PR immediately upon approval of this approach.
