
Use provisioned concurrency to help with cold start #3

@bnusunny

Description

Thanks for this great example!

To help with cold start, I ran some experiments with provisioned concurrency: lazy-loading the transformers module inside sentiment, and priming the sentiment function at init time when provisioned concurrency is enabled. This reduced the cold-start time to 1 ~ 2 seconds, and the first predict call completes in about 1 second.

import json
import os

# Cache the pipeline at module level so the primed model is reused
# across invocations instead of being rebuilt on every call
clf = None

def sentiment(payload):
    global clf
    if clf is None:
        # Lazy import keeps on-demand cold starts fast
        from transformers import pipeline
        clf = pipeline("sentiment-analysis", model="model/")
    prediction = clf(payload, return_all_scores=True)

    # convert list to dict
    result = {}
    for pred in prediction[0]:
        result[pred["label"]] = pred["score"]
    return result

# Prime the sentiment function for provisioned concurrency
init_type = os.environ.get("AWS_LAMBDA_INITIALIZATION_TYPE", "on-demand")
if init_type == "provisioned-concurrency":
    payload = json.dumps({"fn_index": 0, "data": [
        "Running Gradio on AWS Lambda is amazing"], "session_hash": "fpx8ngrma3d"})
    sentiment(payload)
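The priming check above can be factored into a small reusable helper. This is just a sketch of the same pattern; warm_predict is a hypothetical name, not part of the example repo, and it relies on the documented AWS_LAMBDA_INITIALIZATION_TYPE environment variable, which is set to "provisioned-concurrency" for instances started by provisioned concurrency:

```python
import os

def warm_predict(predict_fn, sample):
    """Call predict_fn once at init time, but only when this instance
    was started by provisioned concurrency; on-demand cold starts
    stay lazy and skip the warm-up call."""
    init_type = os.environ.get("AWS_LAMBDA_INITIALIZATION_TYPE", "on-demand")
    if init_type == "provisioned-concurrency":
        predict_fn(sample)  # loads the model and warms any caches
        return True
    return False
```

Because the initialization-type check lives in one place, the same helper can prime any number of model functions at import time without slowing down on-demand cold starts.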

