All URIs are relative to https://dashboard.quantcdn.io
| Method | HTTP request | Description |
|---|---|---|
| chat_inference | POST /api/v3/organizations/{organisation}/ai/chat | Chat inference via API Gateway (buffered responses) with multimodal support |
| chat_inference_stream | POST /api/v3/organizations/{organisation}/ai/chat/stream | Chat inference via streaming endpoint (true HTTP streaming) with multimodal support |
| embeddings | POST /api/v3/organizations/{organisation}/ai/embeddings | Generate text embeddings for semantic search and RAG applications |
| get_durable_execution_status | GET /api/v3/organizations/{organisation}/ai/chat/executions/{identifier} | Get Durable Execution Status |
| image_generation | POST /api/v3/organizations/{organisation}/ai/image-generation | Generate images with Amazon Nova Canvas |
| submit_tool_callback | POST /api/v3/organizations/{organisation}/ai/chat/callback | Submit Client Tool Results (Callback) |
ChatInference200Response chat_inference(organisation, chat_inference_request)
Chat inference via API Gateway (buffered responses) with multimodal support
Sends requests to the AI API Gateway endpoint which buffers responses. Supports text, images, videos, and documents via base64 encoding.
**Execution Modes:**
- Sync Mode (default): Standard JSON response, waits for completion (200 response)
- Async Mode: Set `async: true` for long-running tasks with polling (202 response)

**Async/Durable Mode (`async: true`):**
- Returns immediately with `requestId` and `pollUrl` (HTTP 202)
- Uses AWS Lambda Durable Functions for long-running inference
- Supports client-executed tools via the `waiting_callback` state
- Poll `/ai/chat/executions/{requestId}` for status
- Submit client tool results via `/ai/chat/callback`
- Ideal for complex prompts, large contexts, or client-side tools

**Multimodal Support:**
- Text: Simple string content
- Images: Base64-encoded PNG, JPEG, GIF, WebP (up to 25MB)
- Videos: Base64-encoded MP4, MOV, WebM, etc. (up to 25MB)
- Documents: Base64-encoded PDF, DOCX, CSV, etc. (up to 25MB)

**Supported Models (Multimodal):**
- Claude 4.5 Series: Sonnet 4.5, Haiku 4.5, Opus 4.5 (images, up to 20 per request)
- Claude 3.5 Series: Sonnet v1/v2 (images, up to 20 per request)
- Amazon Nova: Lite, Pro, Micro (images, videos, documents)

**Usage Tips:**
- Use base64 encoding for images/videos under 5-10MB
- Place media before text prompts for best results
- Label multiple media files (e.g., 'Image 1:', 'Image 2:')
- Maximum 25MB total payload size

**Response Patterns:**
- Text-only: Returns a simple text response when no tools are requested
- Single tool: Returns a `toolUse` object when the AI requests one tool
- Multiple tools: Returns a `toolUse` array when the AI requests multiple tools
- Auto-execute sync: Automatically executes the tool and returns the final text response
- Auto-execute async: Returns `toolUse` with `executionId` and `status` for polling
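Since images, videos, and documents must be base64-encoded before they go into the request, a small helper is useful. The sketch below shows the encoding step only; the block field names (`type`, `source`, `media_type`, `data`) are illustrative assumptions, so consult the `ChatInferenceRequest` model for the actual schema.

```python
import base64

def image_content_block(image_bytes: bytes, media_type: str = "image/png") -> dict:
    """Wrap raw image bytes as a base64-encoded content block.

    NOTE: the field names used here are illustrative assumptions, not the
    confirmed ChatInferenceRequest schema.
    """
    return {
        "type": "image",
        "source": {
            "media_type": media_type,
            # base64-encode the raw bytes and decode to an ASCII string for JSON
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
    }

raw = b"\x89PNG\r\n\x1a\n"  # stand-in for real image bytes
block = image_content_block(raw)
print(block["type"], block["source"]["data"])
```

Per the usage tips above, place blocks like this before the text prompt and keep the total payload under 25MB.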
- Bearer (JWT) Authentication (BearerAuth):
```python
import os

import quantcdn
from quantcdn.models.chat_inference200_response import ChatInference200Response
from quantcdn.models.chat_inference_request import ChatInferenceRequest
from quantcdn.rest import ApiException
from pprint import pprint

# Defining the host is optional and defaults to https://dashboard.quantcdn.io
# See configuration.py for a list of all supported configuration parameters.
configuration = quantcdn.Configuration(
    host = "https://dashboard.quantcdn.io"
)

# The client must configure the authentication and authorization parameters
# in accordance with the API server security policy.
# Examples for each auth method are provided below; use the example that
# satisfies your auth use case.

# Configure Bearer authorization (JWT): BearerAuth
configuration = quantcdn.Configuration(
    access_token = os.environ["BEARER_TOKEN"]
)

# Enter a context with an instance of the API client
with quantcdn.ApiClient(configuration) as api_client:
    # Create an instance of the API class
    api_instance = quantcdn.AIInferenceApi(api_client)
    organisation = 'organisation_example' # str | The organisation ID
    chat_inference_request = quantcdn.ChatInferenceRequest() # ChatInferenceRequest | Chat request with optional multimodal content blocks

    try:
        # Chat inference via API Gateway (buffered responses) with multimodal support
        api_response = api_instance.chat_inference(organisation, chat_inference_request)
        print("The response of AIInferenceApi->chat_inference:\n")
        pprint(api_response)
    except ApiException as e:
        print("Exception when calling AIInferenceApi->chat_inference: %s\n" % e)
```

| Name | Type | Description | Notes |
|---|---|---|---|
| organisation | str | The organisation ID | |
| chat_inference_request | ChatInferenceRequest | Chat request with optional multimodal content blocks | |
- Content-Type: application/json
- Accept: application/json
| Status code | Description | Response headers |
|---|---|---|
| 200 | Chat inference completed (buffered response, sync mode) | - |
| 202 | Async execution started (when `async: true` in request) | - |
| 500 | Failed to perform chat inference | - |
[Back to top] [Back to API list] [Back to Model list] [Back to README]
str chat_inference_stream(organisation, chat_inference_stream_request)
Chat inference via streaming endpoint (true HTTP streaming) with multimodal support
Streams responses from the AI streaming subdomain using Server-Sent Events (SSE). Tokens are streamed in real-time as they are generated.
**Execution Modes:**
- Streaming Mode (default): Real-time SSE token-by-token responses
- Async Mode: Set `async: true` for long-running tasks with polling (202 response)

**Async/Durable Mode (`async: true`):**
- Returns immediately with `requestId` and `pollUrl` (HTTP 202)
- Uses AWS Lambda Durable Functions for long-running inference
- Supports client-executed tools via the `waiting_callback` state
- Poll `/ai/chat/executions/{requestId}` for status
- Submit client tool results via `/ai/chat/callback`

**Multimodal Support:**
- Text: Simple string content
- Images: Base64-encoded PNG, JPEG, GIF, WebP (up to 25MB)
- Videos: Base64-encoded MP4, MOV, WebM, etc. (up to 25MB)
- Documents: Base64-encoded PDF, DOCX, CSV, etc. (up to 25MB)

**Supported Models (Multimodal):**
- Claude 4.5 Series: Sonnet 4.5, Haiku 4.5, Opus 4.5 (images, up to 20 per request)
- Claude 3.5 Series: Sonnet v1/v2 (images, up to 20 per request)
- Amazon Nova: Lite, Pro, Micro (images, videos, documents)

**Usage Tips:**
- Use base64 encoding for images/videos under 5-10MB
- Place media before text prompts for best results
- Label multiple media files (e.g., 'Image 1:', 'Image 2:')
- Maximum 25MB total payload size
- Streaming works with all content types (text, image, video, document)
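Because this endpoint emits Server-Sent Events, a client consuming the raw stream needs to reassemble `data:` lines into events. The sketch below is a minimal, generic SSE line parser; it assumes the stream uses standard `data: ...` lines separated by blank lines, since the exact event format is not documented here.

```python
def parse_sse(lines):
    """Minimal SSE parser: yields the data payload of each event.

    Assumes standard text/event-stream framing: one or more `data:` lines
    per event, events separated by a blank line.
    """
    data = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("data:"):
            # strip the "data:" field name and one optional leading space
            data.append(line[5:].lstrip())
        elif line == "" and data:
            # blank line terminates the event; join multi-line data payloads
            yield "\n".join(data)
            data = []
    if data:  # flush a trailing event with no final blank line
        yield "\n".join(data)

events = list(parse_sse(["data: Hello", "", "data: world", ""]))
print(events)  # → ['Hello', 'world']
```

In practice you would feed this the decoded lines of the HTTP response body and concatenate the yielded tokens as they arrive.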
- Bearer (JWT) Authentication (BearerAuth):
```python
import os

import quantcdn
from quantcdn.models.chat_inference_stream_request import ChatInferenceStreamRequest
from quantcdn.rest import ApiException
from pprint import pprint

# Defining the host is optional and defaults to https://dashboard.quantcdn.io
# See configuration.py for a list of all supported configuration parameters.
configuration = quantcdn.Configuration(
    host = "https://dashboard.quantcdn.io"
)

# The client must configure the authentication and authorization parameters
# in accordance with the API server security policy.
# Examples for each auth method are provided below; use the example that
# satisfies your auth use case.

# Configure Bearer authorization (JWT): BearerAuth
configuration = quantcdn.Configuration(
    access_token = os.environ["BEARER_TOKEN"]
)

# Enter a context with an instance of the API client
with quantcdn.ApiClient(configuration) as api_client:
    # Create an instance of the API class
    api_instance = quantcdn.AIInferenceApi(api_client)
    organisation = 'organisation_example' # str | The organisation ID
    chat_inference_stream_request = quantcdn.ChatInferenceStreamRequest() # ChatInferenceStreamRequest | Chat request with optional multimodal content blocks

    try:
        # Chat inference via streaming endpoint (true HTTP streaming) with multimodal support
        api_response = api_instance.chat_inference_stream(organisation, chat_inference_stream_request)
        print("The response of AIInferenceApi->chat_inference_stream:\n")
        pprint(api_response)
    except ApiException as e:
        print("Exception when calling AIInferenceApi->chat_inference_stream: %s\n" % e)
```

| Name | Type | Description | Notes |
|---|---|---|---|
| organisation | str | The organisation ID | |
| chat_inference_stream_request | ChatInferenceStreamRequest | Chat request with optional multimodal content blocks | |
Return type: **str**
- Content-Type: application/json
- Accept: text/event-stream, application/json
| Status code | Description | Response headers |
|---|---|---|
| 200 | Streaming response (text/event-stream, sync mode) | - |
| 202 | Async execution started (when `async: true` in request) | - |
| 500 | Failed to perform streaming inference | - |
[Back to top] [Back to API list] [Back to Model list] [Back to README]
Embeddings200Response embeddings(organisation, embeddings_request)
Generate text embeddings for semantic search and RAG applications
Generates vector embeddings for text content using embedding models. Used for semantic search, document similarity, and RAG applications.

**Features:**
- Single text or batch processing (up to 100 texts)
- Configurable dimensions (256, 512, 1024, 8192 for Titan v2)
- Optional normalization to unit length
- Usage tracking for billing

**Use Cases:**
- Semantic search across documents
- Similarity matching for content recommendations
- RAG (Retrieval-Augmented Generation) pipelines
- Clustering and classification

**Available Embedding Models:**
- `amazon.titan-embed-text-v2:0` (default, supports 256-8192 dimensions)
- `amazon.titan-embed-text-v1:0` (1536 dimensions fixed)
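Once embeddings are returned, semantic search reduces to comparing vectors. The sketch below shows the two standard operations: unit-length normalization (what the `normalize` option does server-side) and cosine similarity for ranking documents against a query.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (the effect of the `normalize` option)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors standing in for real 1024-dim embeddings
doc_vec = [0.2, 0.1, 0.7]
query_vec = [0.2, 0.1, 0.7]
print(round(cosine_similarity(normalize(doc_vec), normalize(query_vec)), 6))  # → 1.0
```

For normalized vectors the cosine similarity is just the dot product, which is why pre-normalizing embeddings is a common optimization in search indexes.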
- Bearer (JWT) Authentication (BearerAuth):
```python
import os

import quantcdn
from quantcdn.models.embeddings200_response import Embeddings200Response
from quantcdn.models.embeddings_request import EmbeddingsRequest
from quantcdn.rest import ApiException
from pprint import pprint

# Defining the host is optional and defaults to https://dashboard.quantcdn.io
# See configuration.py for a list of all supported configuration parameters.
configuration = quantcdn.Configuration(
    host = "https://dashboard.quantcdn.io"
)

# The client must configure the authentication and authorization parameters
# in accordance with the API server security policy.
# Examples for each auth method are provided below; use the example that
# satisfies your auth use case.

# Configure Bearer authorization (JWT): BearerAuth
configuration = quantcdn.Configuration(
    access_token = os.environ["BEARER_TOKEN"]
)

# Enter a context with an instance of the API client
with quantcdn.ApiClient(configuration) as api_client:
    # Create an instance of the API class
    api_instance = quantcdn.AIInferenceApi(api_client)
    organisation = 'organisation_example' # str | The organisation ID
    # EmbeddingsRequest | Embedding request with single or multiple texts
    embeddings_request = {
        "input": "The Australian government announced new climate policy",
        "modelId": "amazon.titan-embed-text-v2:0",
        "dimensions": 1024,
        "normalize": True,
    }

    try:
        # Generate text embeddings for semantic search and RAG applications
        api_response = api_instance.embeddings(organisation, embeddings_request)
        print("The response of AIInferenceApi->embeddings:\n")
        pprint(api_response)
    except ApiException as e:
        print("Exception when calling AIInferenceApi->embeddings: %s\n" % e)
```

| Name | Type | Description | Notes |
|---|---|---|---|
| organisation | str | The organisation ID | |
| embeddings_request | EmbeddingsRequest | Embedding request with single or multiple texts | |
- Content-Type: application/json
- Accept: application/json
| Status code | Description | Response headers |
|---|---|---|
| 200 | Embeddings generated successfully | - |
| 400 | Invalid request parameters | - |
| 403 | Access denied | - |
| 500 | Failed to generate embeddings | - |
[Back to top] [Back to API list] [Back to Model list] [Back to README]
GetDurableExecutionStatus200Response get_durable_execution_status(organisation, identifier)
Get Durable Execution Status
Poll the status of an async/durable chat execution.
**When to use:** After starting chat inference with `async: true`, poll this endpoint to check execution status and retrieve results when complete.

**Identifier:** Accepts either:
- `requestId` (recommended): The short ID returned from the async request
- `executionArn`: The full AWS Lambda durable execution ARN (must be URL-encoded)

**Statuses:**
- `pending`: Execution is starting (retry shortly)
- `running`: Execution is in progress
- `waiting_callback`: Execution paused, waiting for client tool results
- `complete`: Execution finished successfully
- `failed`: Execution failed with an error

**Client Tool Callback:** When the status is `waiting_callback`, submit tool results via POST `/ai/chat/callback`.

**Polling Recommendations:**
- Start with a 1 second delay, with exponential backoff up to 30 seconds
- Stop polling after 15 minutes (consider the execution failed)
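The polling recommendations above translate directly into a delay schedule. The sketch below generates that schedule: exponential growth from 1 second, capped at 30 seconds, cut off once the cumulative wait would exceed 15 minutes. The poll itself (sleep, then GET this endpoint) is left to the caller.

```python
import itertools

def backoff_delays(initial=1.0, cap=30.0, max_total=900.0):
    """Yield poll delays: exponential from `initial`, capped at `cap`,
    stopping once the cumulative wait would exceed `max_total` (15 minutes)."""
    total = 0.0
    for attempt in itertools.count():
        delay = min(initial * (2 ** attempt), cap)
        if total + delay > max_total:
            return  # give up: treat the execution as failed
        total += delay
        yield delay

delays = list(backoff_delays())
print(delays[:6])  # → [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

A real poll loop would `time.sleep(delay)` between calls to `get_durable_execution_status` and exit early on `complete`, `failed`, or `waiting_callback`.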
- Bearer (JWT) Authentication (BearerAuth):
```python
import os

import quantcdn
from quantcdn.models.get_durable_execution_status200_response import GetDurableExecutionStatus200Response
from quantcdn.rest import ApiException
from pprint import pprint

# Defining the host is optional and defaults to https://dashboard.quantcdn.io
# See configuration.py for a list of all supported configuration parameters.
configuration = quantcdn.Configuration(
    host = "https://dashboard.quantcdn.io"
)

# The client must configure the authentication and authorization parameters
# in accordance with the API server security policy.
# Examples for each auth method are provided below; use the example that
# satisfies your auth use case.

# Configure Bearer authorization (JWT): BearerAuth
configuration = quantcdn.Configuration(
    access_token = os.environ["BEARER_TOKEN"]
)

# Enter a context with an instance of the API client
with quantcdn.ApiClient(configuration) as api_client:
    # Create an instance of the API class
    api_instance = quantcdn.AIInferenceApi(api_client)
    organisation = 'organisation_example' # str | The organisation ID
    identifier = 'XkdVWiEfSwMEPrw=' # str | Either the requestId from the async response, or the full executionArn (URL-encoded)

    try:
        # Get Durable Execution Status
        api_response = api_instance.get_durable_execution_status(organisation, identifier)
        print("The response of AIInferenceApi->get_durable_execution_status:\n")
        pprint(api_response)
    except ApiException as e:
        print("Exception when calling AIInferenceApi->get_durable_execution_status: %s\n" % e)
```

| Name | Type | Description | Notes |
|---|---|---|---|
| organisation | str | The organisation ID | |
| identifier | str | Either the requestId from the async response, or the full executionArn (URL-encoded) | |
Return type: **GetDurableExecutionStatus200Response**
- Content-Type: Not defined
- Accept: application/json
| Status code | Description | Response headers |
|---|---|---|
| 200 | Execution status retrieved | - |
| 404 | Execution not found | - |
| 403 | Access denied | - |
| 500 | Failed to retrieve execution status | - |
[Back to top] [Back to API list] [Back to Model list] [Back to README]
ImageGeneration200Response image_generation(organisation, image_generation_request)
Generate images with Amazon Nova Canvas
Generates images using Amazon Nova Canvas image generation model.
**Region Restriction:** Nova Canvas is ONLY available in:
- us-east-1 (US East, N. Virginia)
- ap-northeast-1 (Asia Pacific, Tokyo)
- eu-west-1 (Europe, Ireland)
- ❌ NOT available in ap-southeast-2 (Sydney)

**Supported Task Types:**
- TEXT_IMAGE: Basic text-to-image generation
- TEXT_IMAGE with Conditioning: Layout-guided generation using edge detection or segmentation
- COLOR_GUIDED_GENERATION: Generate images with specific color palettes
- IMAGE_VARIATION: Create variations of existing images
- INPAINTING: Fill masked areas in images
- OUTPAINTING: Extend images beyond their borders
- BACKGROUND_REMOVAL: Remove backgrounds from images
- VIRTUAL_TRY_ON: Try on garments/objects on people

**Quality Options:**
- standard: Faster generation, lower cost
- premium: Higher quality, slower generation

**Timeout:** Image generation can take up to 5 minutes
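Generated images are typically returned base64-encoded, so the client's last step is decoding and writing them to disk. The sketch below assumes the response carries a list of base64 strings; the exact field name in `ImageGeneration200Response` may differ, so treat this as illustrative.

```python
import base64
import pathlib
import tempfile

def save_images(images_b64, out_dir, prefix="nova-canvas"):
    """Decode base64 image payloads and write them as numbered PNG files.

    `images_b64` is assumed to be a list of base64 strings extracted from
    the response; the actual response field name is not confirmed here.
    """
    paths = []
    for i, data in enumerate(images_b64):
        path = pathlib.Path(out_dir) / f"{prefix}-{i}.png"
        path.write_bytes(base64.b64decode(data))
        paths.append(path)
    return paths

# Demo with a fake payload standing in for a real generated image
fake_images = [base64.b64encode(b"\x89PNG fake image bytes").decode("ascii")]
paths = save_images(fake_images, tempfile.mkdtemp())
print(paths[0].name)  # → nova-canvas-0.png
```

Given the up-to-5-minute generation time noted above, make sure the HTTP client timeout is raised accordingly before calling this endpoint.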
- Bearer (JWT) Authentication (BearerAuth):
```python
import os

import quantcdn
from quantcdn.models.image_generation200_response import ImageGeneration200Response
from quantcdn.models.image_generation_request import ImageGenerationRequest
from quantcdn.rest import ApiException
from pprint import pprint

# Defining the host is optional and defaults to https://dashboard.quantcdn.io
# See configuration.py for a list of all supported configuration parameters.
configuration = quantcdn.Configuration(
    host = "https://dashboard.quantcdn.io"
)

# The client must configure the authentication and authorization parameters
# in accordance with the API server security policy.
# Examples for each auth method are provided below; use the example that
# satisfies your auth use case.

# Configure Bearer authorization (JWT): BearerAuth
configuration = quantcdn.Configuration(
    access_token = os.environ["BEARER_TOKEN"]
)

# Enter a context with an instance of the API client
with quantcdn.ApiClient(configuration) as api_client:
    # Create an instance of the API class
    api_instance = quantcdn.AIInferenceApi(api_client)
    organisation = 'organisation_example' # str | The organisation ID
    # ImageGenerationRequest | Image generation request
    image_generation_request = {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {
            "text": "A serene mountain landscape at sunset with snow-capped peaks",
            "negativeText": "blurry, low quality, distorted",
            "style": "PHOTOREALISM",
        },
        "imageGenerationConfig": {
            "width": 1024,
            "height": 1024,
            "quality": "premium",
            "numberOfImages": 1,
            "cfgScale": 7,
        },
        "region": "us-east-1",
    }

    try:
        # Generate images with Amazon Nova Canvas
        api_response = api_instance.image_generation(organisation, image_generation_request)
        print("The response of AIInferenceApi->image_generation:\n")
        pprint(api_response)
    except ApiException as e:
        print("Exception when calling AIInferenceApi->image_generation: %s\n" % e)
```

| Name | Type | Description | Notes |
|---|---|---|---|
| organisation | str | The organisation ID | |
| image_generation_request | ImageGenerationRequest | Image generation request | |
- Content-Type: application/json
- Accept: application/json
| Status code | Description | Response headers |
|---|---|---|
| 200 | Image(s) generated successfully | - |
| 400 | Invalid request parameters | - |
| 403 | Access denied | - |
| 500 | Failed to generate images | - |
[Back to top] [Back to API list] [Back to Model list] [Back to README]
SubmitToolCallback200Response submit_tool_callback(organisation, submit_tool_callback_request)
Submit Client Tool Results (Callback)
Submit tool execution results to resume a suspended durable execution.
**When to use:** When polling the execution status returns `waiting_callback`, use this endpoint to submit the results of client-executed tools. The execution will then resume.

**Flow:**
1. Start an async chat with client-executed tools (`autoExecute: []`, or tools not in the autoExecute list)
2. Poll status until `waiting_callback`
3. Execute the tools locally using `pendingTools` from the status response
4. Submit the results here with the `callbackId`
5. Poll status until `complete`

**Important:** Each `callbackId` can only be used once. After submission, poll the execution status to see the updated state.
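Steps 3-4 of the flow above amount to running each pending tool locally and packaging the results into one callback payload. The sketch below shows that shape; the field names (`callbackId`, `toolResults`, `toolUseId`, `name`, `input`) are assumptions based on the flow description, so check `SubmitToolCallbackRequest` and the status response for the real schema.

```python
def build_callback_payload(callback_id, pending_tools, executor):
    """Run each pending tool locally and assemble one callback payload.

    NOTE: the field names here are illustrative assumptions, not the
    confirmed SubmitToolCallbackRequest schema. `executor` is any callable
    (tool_name, tool_input) -> result.
    """
    results = []
    for tool in pending_tools:
        output = executor(tool["name"], tool.get("input", {}))
        results.append({"toolUseId": tool["toolUseId"], "output": output})
    return {"callbackId": callback_id, "toolResults": results}

# Demo: one pending tool executed by a stub that echoes its arguments
pending = [{"toolUseId": "tu-1", "name": "get_weather", "input": {"city": "Sydney"}}]
payload = build_callback_payload("cb-123", pending,
                                 lambda name, args: {"ok": True, "args": args})
print(payload["toolResults"][0]["toolUseId"])  # → tu-1
```

Because a `callbackId` is single-use, build the complete payload for all pending tools and submit it once, then return to polling.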
- Bearer (JWT) Authentication (BearerAuth):
```python
import os

import quantcdn
from quantcdn.models.submit_tool_callback200_response import SubmitToolCallback200Response
from quantcdn.models.submit_tool_callback_request import SubmitToolCallbackRequest
from quantcdn.rest import ApiException
from pprint import pprint

# Defining the host is optional and defaults to https://dashboard.quantcdn.io
# See configuration.py for a list of all supported configuration parameters.
configuration = quantcdn.Configuration(
    host = "https://dashboard.quantcdn.io"
)

# The client must configure the authentication and authorization parameters
# in accordance with the API server security policy.
# Examples for each auth method are provided below; use the example that
# satisfies your auth use case.

# Configure Bearer authorization (JWT): BearerAuth
configuration = quantcdn.Configuration(
    access_token = os.environ["BEARER_TOKEN"]
)

# Enter a context with an instance of the API client
with quantcdn.ApiClient(configuration) as api_client:
    # Create an instance of the API class
    api_instance = quantcdn.AIInferenceApi(api_client)
    organisation = 'organisation_example' # str | The organisation ID
    submit_tool_callback_request = quantcdn.SubmitToolCallbackRequest() # SubmitToolCallbackRequest |

    try:
        # Submit Client Tool Results (Callback)
        api_response = api_instance.submit_tool_callback(organisation, submit_tool_callback_request)
        print("The response of AIInferenceApi->submit_tool_callback:\n")
        pprint(api_response)
    except ApiException as e:
        print("Exception when calling AIInferenceApi->submit_tool_callback: %s\n" % e)
```

| Name | Type | Description | Notes |
|---|---|---|---|
| organisation | str | The organisation ID | |
| submit_tool_callback_request | SubmitToolCallbackRequest |  | |
- Content-Type: application/json
- Accept: application/json
| Status code | Description | Response headers |
|---|---|---|
| 200 | Callback submitted successfully, execution will resume | - |
| 400 | Invalid request (missing callbackId or toolResults) | - |
| 404 | Callback not found or already processed | - |
| 403 | Access denied | - |
| 500 | Failed to submit callback | - |
[Back to top] [Back to API list] [Back to Model list] [Back to README]