quantcdn.AIInferenceApi

All URIs are relative to https://dashboard.quantcdn.io

| Method | HTTP request | Description |
|--------|--------------|-------------|
| chat_inference | POST /api/v3/organizations/{organisation}/ai/chat | Chat inference via API Gateway (buffered responses) with multimodal support |
| chat_inference_stream | POST /api/v3/organizations/{organisation}/ai/chat/stream | Chat inference via streaming endpoint (true HTTP streaming) with multimodal support |
| embeddings | POST /api/v3/organizations/{organisation}/ai/embeddings | Generate text embeddings for semantic search and RAG applications |
| get_durable_execution_status | GET /api/v3/organizations/{organisation}/ai/chat/executions/{identifier} | Get Durable Execution Status |
| image_generation | POST /api/v3/organizations/{organisation}/ai/image-generation | Generate images with Amazon Nova Canvas |
| submit_tool_callback | POST /api/v3/organizations/{organisation}/ai/chat/callback | Submit Client Tool Results (Callback) |

chat_inference

ChatInference200Response chat_inference(organisation, chat_inference_request)

Chat inference via API Gateway (buffered responses) with multimodal support

Sends requests to the AI API Gateway endpoint, which buffers responses. Supports text, images, videos, and documents via base64 encoding.

Execution Modes:
  • Sync Mode (default): Standard JSON response, waits for completion (200 response)
  • Async Mode: Set async: true for long-running tasks with polling (202 response)

Async/Durable Mode (async: true):
  • Returns immediately with requestId and pollUrl (HTTP 202)
  • Uses AWS Lambda Durable Functions for long-running inference
  • Supports client-executed tools via waiting_callback state
  • Poll /ai/chat/executions/{requestId} for status
  • Submit client tool results via /ai/chat/callback
  • Ideal for complex prompts, large contexts, or client-side tools

Multimodal Support:
  • Text: Simple string content
  • Images: Base64-encoded PNG, JPEG, GIF, WebP (up to 25MB)
  • Videos: Base64-encoded MP4, MOV, WebM, etc. (up to 25MB)
  • Documents: Base64-encoded PDF, DOCX, CSV, etc. (up to 25MB)

Supported Models (Multimodal):
  • Claude 4.5 Series: Sonnet 4.5, Haiku 4.5, Opus 4.5 (images, up to 20 per request)
  • Claude 3.5 Series: Sonnet v1/v2 (images, up to 20 per request)
  • Amazon Nova: Lite, Pro, Micro (images, videos, documents)

Usage Tips:
  • Use base64 encoding for images/videos under 5-10MB
  • Place media before text prompts for best results
  • Label multiple media files (e.g., 'Image 1:', 'Image 2:')
  • Maximum 25MB total payload size

Response Patterns:
  • Text-only: Returns simple text response when no tools requested
  • Single tool: Returns toolUse object when AI requests one tool
  • Multiple tools: Returns toolUse array when AI requests multiple tools
  • Auto-execute sync: Automatically executes tool and returns final text response
  • Auto-execute async: Returns toolUse with executionId and status for polling
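The multimodal rules above (base64 encoding, media before text, 25MB total payload) can be sketched as a small helper that assembles a content list. The block shape (the "type", "mediaType", and "data" keys) is an assumption for illustration, not the exact ChatInferenceRequest schema; check the generated model for the real field names.

```python
import base64

MAX_PAYLOAD_BYTES = 25 * 1024 * 1024  # 25MB total payload limit

def build_image_content(image_bytes, media_type, prompt):
    """Build a multimodal content list: media blocks first, text prompt last.

    The dict keys here are illustrative assumptions -- consult the
    ChatInferenceRequest model for the exact content-block schema.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    if len(encoded) > MAX_PAYLOAD_BYTES:
        raise ValueError("encoded media exceeds the 25MB payload limit")
    return [
        {"type": "image", "mediaType": media_type, "data": encoded},
        {"type": "text", "text": prompt},
    ]
```

Labeling the prompt ("Image 1:", "Image 2:") when sending several media blocks follows the usage tips above.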

Example

  • Bearer (JWT) Authentication (BearerAuth):
import os

import quantcdn
from quantcdn.models.chat_inference200_response import ChatInference200Response
from quantcdn.models.chat_inference_request import ChatInferenceRequest
from quantcdn.rest import ApiException
from pprint import pprint

# Defining the host is optional and defaults to https://dashboard.quantcdn.io
# See configuration.py for a list of all supported configuration parameters.
configuration = quantcdn.Configuration(
    host = "https://dashboard.quantcdn.io"
)

# The client must configure the authentication and authorization parameters
# in accordance with the API server security policy.
# Examples for each auth method are provided below, use the example that
# satisfies your auth use case.

# Configure Bearer authorization (JWT): BearerAuth
configuration = quantcdn.Configuration(
    access_token = os.environ["BEARER_TOKEN"]
)

# Enter a context with an instance of the API client
with quantcdn.ApiClient(configuration) as api_client:
    # Create an instance of the API class
    api_instance = quantcdn.AIInferenceApi(api_client)
    organisation = 'organisation_example' # str | The organisation ID
    chat_inference_request = quantcdn.ChatInferenceRequest() # ChatInferenceRequest | Chat request with optional multimodal content blocks

    try:
        # Chat inference via API Gateway (buffered responses) with multimodal support
        api_response = api_instance.chat_inference(organisation, chat_inference_request)
        print("The response of AIInferenceApi->chat_inference:\n")
        pprint(api_response)
    except Exception as e:
        print("Exception when calling AIInferenceApi->chat_inference: %s\n" % e)

Parameters

| Name | Type | Description | Notes |
|------|------|-------------|-------|
| organisation | str | The organisation ID | |
| chat_inference_request | ChatInferenceRequest | Chat request with optional multimodal content blocks | |

Return type

ChatInference200Response

Authorization

BearerAuth

HTTP request headers

  • Content-Type: application/json
  • Accept: application/json

HTTP response details

| Status code | Description | Response headers |
|-------------|-------------|------------------|
| 200 | Chat inference completed (buffered response, sync mode) | - |
| 202 | Async execution started (when `async: true` in request) | - |
| 500 | Failed to perform chat inference | - |

[Back to top] [Back to API list] [Back to Model list] [Back to README]

chat_inference_stream

str chat_inference_stream(organisation, chat_inference_stream_request)

Chat inference via streaming endpoint (true HTTP streaming) with multimodal support

Streams responses from the AI streaming subdomain using Server-Sent Events (SSE). Tokens are streamed in real time as they are generated.

Execution Modes:
  • Streaming Mode (default): Real-time SSE token-by-token responses
  • Async Mode: Set async: true for long-running tasks with polling (202 response)

Async/Durable Mode (async: true):
  • Returns immediately with requestId and pollUrl (HTTP 202)
  • Uses AWS Lambda Durable Functions for long-running inference
  • Supports client-executed tools via waiting_callback state
  • Poll /ai/chat/executions/{requestId} for status
  • Submit client tool results via /ai/chat/callback

Multimodal Support:
  • Text: Simple string content
  • Images: Base64-encoded PNG, JPEG, GIF, WebP (up to 25MB)
  • Videos: Base64-encoded MP4, MOV, WebM, etc. (up to 25MB)
  • Documents: Base64-encoded PDF, DOCX, CSV, etc. (up to 25MB)

Supported Models (Multimodal):
  • Claude 4.5 Series: Sonnet 4.5, Haiku 4.5, Opus 4.5 (images, up to 20 per request)
  • Claude 3.5 Series: Sonnet v1/v2 (images, up to 20 per request)
  • Amazon Nova: Lite, Pro, Micro (images, videos, documents)

Usage Tips:
  • Use base64 encoding for images/videos under 5-10MB
  • Place media before text prompts for best results
  • Label multiple media files (e.g., 'Image 1:', 'Image 2:')
  • Maximum 25MB total payload size
  • Streaming works with all content types (text, image, video, document)
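Since this SDK method returns the body as a plain str, extracting tokens means parsing the text/event-stream format yourself: events are separated by blank lines, and each `data:` line carries a payload. A minimal parser, assuming the endpoint emits standard SSE `data:` fields (the exact event shape is not specified here):

```python
def parse_sse_events(raw):
    """Extract the data payload of each Server-Sent Event from a raw
    text/event-stream body. Only `data:` fields are collected; other
    fields (event:, id:, retry:) are ignored. The precise event shape
    emitted by the streaming endpoint is an assumption.
    """
    events = []
    for block in raw.split("\n\n"):          # events are blank-line separated
        data_lines = [
            line[len("data:"):].strip()
            for line in block.splitlines()
            if line.startswith("data:")
        ]
        if data_lines:
            # Multiple data: lines in one event join with a newline per SSE rules
            events.append("\n".join(data_lines))
    return events
```

For example, `parse_sse_events(api_response)` would yield one string per streamed event.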

Example

  • Bearer (JWT) Authentication (BearerAuth):
import os

import quantcdn
from quantcdn.models.chat_inference_stream_request import ChatInferenceStreamRequest
from quantcdn.rest import ApiException
from pprint import pprint

# Defining the host is optional and defaults to https://dashboard.quantcdn.io
# See configuration.py for a list of all supported configuration parameters.
configuration = quantcdn.Configuration(
    host = "https://dashboard.quantcdn.io"
)

# The client must configure the authentication and authorization parameters
# in accordance with the API server security policy.
# Examples for each auth method are provided below, use the example that
# satisfies your auth use case.

# Configure Bearer authorization (JWT): BearerAuth
configuration = quantcdn.Configuration(
    access_token = os.environ["BEARER_TOKEN"]
)

# Enter a context with an instance of the API client
with quantcdn.ApiClient(configuration) as api_client:
    # Create an instance of the API class
    api_instance = quantcdn.AIInferenceApi(api_client)
    organisation = 'organisation_example' # str | The organisation ID
    chat_inference_stream_request = quantcdn.ChatInferenceStreamRequest() # ChatInferenceStreamRequest | Chat request with optional multimodal content blocks

    try:
        # Chat inference via streaming endpoint (true HTTP streaming) with multimodal support
        api_response = api_instance.chat_inference_stream(organisation, chat_inference_stream_request)
        print("The response of AIInferenceApi->chat_inference_stream:\n")
        pprint(api_response)
    except Exception as e:
        print("Exception when calling AIInferenceApi->chat_inference_stream: %s\n" % e)

Parameters

| Name | Type | Description | Notes |
|------|------|-------------|-------|
| organisation | str | The organisation ID | |
| chat_inference_stream_request | ChatInferenceStreamRequest | Chat request with optional multimodal content blocks | |

Return type

str

Authorization

BearerAuth

HTTP request headers

  • Content-Type: application/json
  • Accept: text/event-stream, application/json

HTTP response details

| Status code | Description | Response headers |
|-------------|-------------|------------------|
| 200 | Streaming response (text/event-stream, sync mode) | - |
| 202 | Async execution started (when `async: true` in request) | - |
| 500 | Failed to perform streaming inference | - |

[Back to top] [Back to API list] [Back to Model list] [Back to README]

embeddings

Embeddings200Response embeddings(organisation, embeddings_request)

Generate text embeddings for semantic search and RAG applications

Generates vector embeddings for text content using embedding models. Used for semantic search, document similarity, and RAG applications.

Features:
  • Single text or batch processing (up to 100 texts)
  • Configurable dimensions (256, 512, 1024, 8192 for Titan v2)
  • Optional normalization to unit length
  • Usage tracking for billing

Use Cases:
  • Semantic search across documents
  • Similarity matching for content recommendations
  • RAG (Retrieval-Augmented Generation) pipelines
  • Clustering and classification

Available Embedding Models:
  • amazon.titan-embed-text-v2:0 (default, supports 256-8192 dimensions)
  • amazon.titan-embed-text-v1:0 (1536 dimensions fixed)
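The semantic-search use case above boils down to comparing returned vectors, typically with cosine similarity. A minimal sketch, independent of the API (if you requested normalize: true, the vectors are unit length and the dot product alone is the cosine similarity):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors.

    Returns 1.0 for identical directions, 0.0 for orthogonal vectors.
    With normalized (unit-length) embeddings the dot product suffices.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Ranking documents by `cosine_similarity(query_embedding, doc_embedding)` is the usual retrieval step in a RAG pipeline.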

Example

  • Bearer (JWT) Authentication (BearerAuth):
import os

import quantcdn
from quantcdn.models.embeddings200_response import Embeddings200Response
from quantcdn.models.embeddings_request import EmbeddingsRequest
from quantcdn.rest import ApiException
from pprint import pprint

# Defining the host is optional and defaults to https://dashboard.quantcdn.io
# See configuration.py for a list of all supported configuration parameters.
configuration = quantcdn.Configuration(
    host = "https://dashboard.quantcdn.io"
)

# The client must configure the authentication and authorization parameters
# in accordance with the API server security policy.
# Examples for each auth method are provided below, use the example that
# satisfies your auth use case.

# Configure Bearer authorization (JWT): BearerAuth
configuration = quantcdn.Configuration(
    access_token = os.environ["BEARER_TOKEN"]
)

# Enter a context with an instance of the API client
with quantcdn.ApiClient(configuration) as api_client:
    # Create an instance of the API class
    api_instance = quantcdn.AIInferenceApi(api_client)
    organisation = 'organisation_example' # str | The organisation ID
    embeddings_request = {"input":"The Australian government announced new climate policy","modelId":"amazon.titan-embed-text-v2:0","dimensions":1024,"normalize":True} # EmbeddingsRequest | Embedding request with single or multiple texts

    try:
        # Generate text embeddings for semantic search and RAG applications
        api_response = api_instance.embeddings(organisation, embeddings_request)
        print("The response of AIInferenceApi->embeddings:\n")
        pprint(api_response)
    except Exception as e:
        print("Exception when calling AIInferenceApi->embeddings: %s\n" % e)

Parameters

| Name | Type | Description | Notes |
|------|------|-------------|-------|
| organisation | str | The organisation ID | |
| embeddings_request | EmbeddingsRequest | Embedding request with single or multiple texts | |

Return type

Embeddings200Response

Authorization

BearerAuth

HTTP request headers

  • Content-Type: application/json
  • Accept: application/json

HTTP response details

| Status code | Description | Response headers |
|-------------|-------------|------------------|
| 200 | Embeddings generated successfully | - |
| 400 | Invalid request parameters | - |
| 403 | Access denied | - |
| 500 | Failed to generate embeddings | - |

[Back to top] [Back to API list] [Back to Model list] [Back to README]

get_durable_execution_status

GetDurableExecutionStatus200Response get_durable_execution_status(organisation, identifier)

Get Durable Execution Status

Poll the status of an async/durable chat execution.

When to use: After starting chat inference with async: true, poll this endpoint to check execution status and retrieve results when complete.

Identifier: Accepts either:
  • requestId (recommended): The short ID returned from the async request
  • executionArn: The full AWS Lambda durable execution ARN (must be URL-encoded)

Statuses:
  • pending: Execution is starting (retry shortly)
  • running: Execution is in progress
  • waiting_callback: Execution paused, waiting for client tool results
  • complete: Execution finished successfully
  • failed: Execution failed with error

Client Tool Callback: When status is waiting_callback, submit tool results via POST /ai/chat/callback.

Polling Recommendations:
  • Start with a 1-second delay, exponential backoff up to 30 seconds
  • Stop polling after 15 minutes (consider the execution failed)
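The polling recommendations above (1s initial delay, exponential backoff capped at 30s, give up after 15 minutes) can be sketched as a generic loop. `fetch_status` stands in for any callable that wraps get_durable_execution_status and returns the status string; the injectable `sleep` is just for testability.

```python
import time

def poll_execution(fetch_status, sleep=time.sleep,
                   initial=1.0, cap=30.0, deadline=900.0):
    """Poll a durable execution until it reaches a terminal state or
    pauses for a client tool callback.

    Delays start at `initial` seconds and double up to `cap`; after
    `deadline` seconds (15 minutes) the execution is treated as failed,
    per the polling recommendations.
    """
    elapsed, delay = 0.0, initial
    while True:
        status = fetch_status()
        if status in ("complete", "failed", "waiting_callback"):
            return status
        if elapsed + delay > deadline:
            return "failed"  # timed out: consider the execution failed
        sleep(delay)
        elapsed += delay
        delay = min(delay * 2, cap)
```

On "waiting_callback", execute the pending tools locally and submit results via submit_tool_callback, then resume polling.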

Example

  • Bearer (JWT) Authentication (BearerAuth):
import os

import quantcdn
from quantcdn.models.get_durable_execution_status200_response import GetDurableExecutionStatus200Response
from quantcdn.rest import ApiException
from pprint import pprint

# Defining the host is optional and defaults to https://dashboard.quantcdn.io
# See configuration.py for a list of all supported configuration parameters.
configuration = quantcdn.Configuration(
    host = "https://dashboard.quantcdn.io"
)

# The client must configure the authentication and authorization parameters
# in accordance with the API server security policy.
# Examples for each auth method are provided below, use the example that
# satisfies your auth use case.

# Configure Bearer authorization (JWT): BearerAuth
configuration = quantcdn.Configuration(
    access_token = os.environ["BEARER_TOKEN"]
)

# Enter a context with an instance of the API client
with quantcdn.ApiClient(configuration) as api_client:
    # Create an instance of the API class
    api_instance = quantcdn.AIInferenceApi(api_client)
    organisation = 'organisation_example' # str | The organisation ID
    identifier = 'XkdVWiEfSwMEPrw=' # str | Either the requestId from async response, or full executionArn (URL-encoded)

    try:
        # Get Durable Execution Status
        api_response = api_instance.get_durable_execution_status(organisation, identifier)
        print("The response of AIInferenceApi->get_durable_execution_status:\n")
        pprint(api_response)
    except Exception as e:
        print("Exception when calling AIInferenceApi->get_durable_execution_status: %s\n" % e)

Parameters

| Name | Type | Description | Notes |
|------|------|-------------|-------|
| organisation | str | The organisation ID | |
| identifier | str | Either the requestId from the async response, or the full executionArn (URL-encoded) | |

Return type

GetDurableExecutionStatus200Response

Authorization

BearerAuth

HTTP request headers

  • Content-Type: Not defined
  • Accept: application/json

HTTP response details

| Status code | Description | Response headers |
|-------------|-------------|------------------|
| 200 | Execution status retrieved | - |
| 404 | Execution not found | - |
| 403 | Access denied | - |
| 500 | Failed to retrieve execution status | - |

[Back to top] [Back to API list] [Back to Model list] [Back to README]

image_generation

ImageGeneration200Response image_generation(organisation, image_generation_request)

Generate images with Amazon Nova Canvas

Generates images using the Amazon Nova Canvas image generation model.

Region Restriction: Nova Canvas is ONLY available in:
  • us-east-1 (US East, N. Virginia)
  • ap-northeast-1 (Asia Pacific, Tokyo)
  • eu-west-1 (Europe, Ireland)
  • ❌ NOT available in ap-southeast-2 (Sydney)

Supported Task Types:
  • TEXT_IMAGE: Basic text-to-image generation
  • TEXT_IMAGE with Conditioning: Layout-guided generation using edge detection or segmentation
  • COLOR_GUIDED_GENERATION: Generate images with specific color palettes
  • IMAGE_VARIATION: Create variations of existing images
  • INPAINTING: Fill masked areas in images
  • OUTPAINTING: Extend images beyond their borders
  • BACKGROUND_REMOVAL: Remove backgrounds from images
  • VIRTUAL_TRY_ON: Try on garments/objects on people

Quality Options:
  • standard: Faster generation, lower cost
  • premium: Higher quality, slower generation

Timeout: Image generation can take up to 5 minutes.
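Two small helpers sketch the constraints above: a pre-flight check mirroring the region restriction, and a decoder for base64 image payloads. That the response carries images as base64-encoded strings is an assumption here; inspect ImageGeneration200Response for the actual field name.

```python
import base64

# Regions where Nova Canvas is deployed, per the restriction above.
NOVA_CANVAS_REGIONS = {"us-east-1", "ap-northeast-1", "eu-west-1"}

def validate_region(region):
    """Fail fast before calling the API: Nova Canvas is notably NOT
    available in ap-southeast-2 (Sydney)."""
    if region not in NOVA_CANVAS_REGIONS:
        raise ValueError(
            f"Nova Canvas is not available in {region}; "
            f"choose one of {sorted(NOVA_CANVAS_REGIONS)}"
        )
    return region

def decode_images(images_base64):
    """Decode a list of base64-encoded image strings into raw bytes,
    ready to write to .png files."""
    return [base64.b64decode(data) for data in images_base64]
```

Validating the region client-side avoids waiting on a request (image generation can take up to 5 minutes) that is bound to fail.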

Example

  • Bearer (JWT) Authentication (BearerAuth):
import os

import quantcdn
from quantcdn.models.image_generation200_response import ImageGeneration200Response
from quantcdn.models.image_generation_request import ImageGenerationRequest
from quantcdn.rest import ApiException
from pprint import pprint

# Defining the host is optional and defaults to https://dashboard.quantcdn.io
# See configuration.py for a list of all supported configuration parameters.
configuration = quantcdn.Configuration(
    host = "https://dashboard.quantcdn.io"
)

# The client must configure the authentication and authorization parameters
# in accordance with the API server security policy.
# Examples for each auth method are provided below, use the example that
# satisfies your auth use case.

# Configure Bearer authorization (JWT): BearerAuth
configuration = quantcdn.Configuration(
    access_token = os.environ["BEARER_TOKEN"]
)

# Enter a context with an instance of the API client
with quantcdn.ApiClient(configuration) as api_client:
    # Create an instance of the API class
    api_instance = quantcdn.AIInferenceApi(api_client)
    organisation = 'organisation_example' # str | The organisation ID
    image_generation_request = {"taskType":"TEXT_IMAGE","textToImageParams":{"text":"A serene mountain landscape at sunset with snow-capped peaks","negativeText":"blurry, low quality, distorted","style":"PHOTOREALISM"},"imageGenerationConfig":{"width":1024,"height":1024,"quality":"premium","numberOfImages":1,"cfgScale":7},"region":"us-east-1"} # ImageGenerationRequest | Image generation request

    try:
        # Generate images with Amazon Nova Canvas
        api_response = api_instance.image_generation(organisation, image_generation_request)
        print("The response of AIInferenceApi->image_generation:\n")
        pprint(api_response)
    except Exception as e:
        print("Exception when calling AIInferenceApi->image_generation: %s\n" % e)

Parameters

| Name | Type | Description | Notes |
|------|------|-------------|-------|
| organisation | str | The organisation ID | |
| image_generation_request | ImageGenerationRequest | Image generation request | |

Return type

ImageGeneration200Response

Authorization

BearerAuth

HTTP request headers

  • Content-Type: application/json
  • Accept: application/json

HTTP response details

| Status code | Description | Response headers |
|-------------|-------------|------------------|
| 200 | Image(s) generated successfully | - |
| 400 | Invalid request parameters | - |
| 403 | Access denied | - |
| 500 | Failed to generate images | - |

[Back to top] [Back to API list] [Back to Model list] [Back to README]

submit_tool_callback

SubmitToolCallback200Response submit_tool_callback(organisation, submit_tool_callback_request)

Submit Client Tool Results (Callback)

Submit tool execution results to resume a suspended durable execution.

When to use: When polling the execution status returns waiting_callback, use this endpoint to submit the results of client-executed tools. The execution will then resume.

Flow:
  1. Start async chat with client-executed tools (autoExecute: [] or tools not in the autoExecute list)
  2. Poll status until waiting_callback
  3. Execute tools locally using pendingTools from the status response
  4. Submit results here with the callbackId
  5. Poll status until complete

Important: Each callbackId can only be used once. After submission, poll the execution status to see the updated state.
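Step 3 of the flow above (execute tools locally from pendingTools) can be sketched as a dispatcher that maps tool names to local handlers and shapes the results for the callback. The pendingTools and result field shapes ("name", "toolUseId", "input", "output") are assumptions for illustration; check the execution-status response and SubmitToolCallbackRequest models for the exact schema.

```python
def run_client_tools(pending_tools, handlers):
    """Execute client-side tools named in a waiting_callback status.

    `pending_tools` is the list from the status response; `handlers`
    maps tool name -> callable taking the tool input dict. Field names
    here are illustrative assumptions, not the confirmed schema.
    """
    results = []
    for tool in pending_tools:
        handler = handlers[tool["name"]]          # KeyError = unknown tool
        results.append({
            "toolUseId": tool["toolUseId"],
            "output": handler(tool.get("input", {})),
        })
    return results
```

The resulting list, together with the one-time callbackId, forms the body submitted to submit_tool_callback before polling resumes.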

Example

  • Bearer (JWT) Authentication (BearerAuth):
import os

import quantcdn
from quantcdn.models.submit_tool_callback200_response import SubmitToolCallback200Response
from quantcdn.models.submit_tool_callback_request import SubmitToolCallbackRequest
from quantcdn.rest import ApiException
from pprint import pprint

# Defining the host is optional and defaults to https://dashboard.quantcdn.io
# See configuration.py for a list of all supported configuration parameters.
configuration = quantcdn.Configuration(
    host = "https://dashboard.quantcdn.io"
)

# The client must configure the authentication and authorization parameters
# in accordance with the API server security policy.
# Examples for each auth method are provided below, use the example that
# satisfies your auth use case.

# Configure Bearer authorization (JWT): BearerAuth
configuration = quantcdn.Configuration(
    access_token = os.environ["BEARER_TOKEN"]
)

# Enter a context with an instance of the API client
with quantcdn.ApiClient(configuration) as api_client:
    # Create an instance of the API class
    api_instance = quantcdn.AIInferenceApi(api_client)
    organisation = 'organisation_example' # str | The organisation ID
    submit_tool_callback_request = quantcdn.SubmitToolCallbackRequest() # SubmitToolCallbackRequest | 

    try:
        # Submit Client Tool Results (Callback)
        api_response = api_instance.submit_tool_callback(organisation, submit_tool_callback_request)
        print("The response of AIInferenceApi->submit_tool_callback:\n")
        pprint(api_response)
    except Exception as e:
        print("Exception when calling AIInferenceApi->submit_tool_callback: %s\n" % e)

Parameters

| Name | Type | Description | Notes |
|------|------|-------------|-------|
| organisation | str | The organisation ID | |
| submit_tool_callback_request | SubmitToolCallbackRequest | | |

Return type

SubmitToolCallback200Response

Authorization

BearerAuth

HTTP request headers

  • Content-Type: application/json
  • Accept: application/json

HTTP response details

| Status code | Description | Response headers |
|-------------|-------------|------------------|
| 200 | Callback submitted successfully, execution will resume | - |
| 400 | Invalid request (missing callbackId or toolResults) | - |
| 404 | Callback not found or already processed | - |
| 403 | Access denied | - |
| 500 | Failed to submit callback | - |

[Back to top] [Back to API list] [Back to Model list] [Back to README]