Merged
40 changes: 40 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,45 @@
# Changelog

## 1.0.1

### Added

- **`llm_gateway_generate_text()` UDF wrapper for AI-powered DataFrame transformations.**

New method on proxy providers to generate AI completions in DataFrame operations via a built-in UDF.

```python
from datacustomcode import Client
from pyspark.sql.functions import col

client = Client()

# Generate summaries in a DataFrame column
df = df.withColumn(
"summary",
client._proxy.llm_gateway_generate_text(
"Summarize {company}: revenue={revenue}, CEO={ceo}",
{
"company": col("company"),
"revenue": col("revenue"),
"ceo": col("ceo")
},
llmModelId="sfdc_ai__DefaultGPT4Omni",
maxTokens=200
)
)
```

**Local Development:** Returns a placeholder string; no LLM call is made and no tokens are spent.
**Production:** Calls the real LLM Gateway service via a built-in UDF.
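Because the local stub's return value is deterministic, transforms can be exercised end to end without spending tokens. A minimal, self-contained sketch of the stub's behavior, mirroring `LocalProxyClientProvider.llm_gateway_generate_text` from this PR:

```python
# Standalone sketch of the local stub: it ignores the template and values
# and returns a fixed placeholder string (no LLM call, no tokens spent).
def llm_gateway_generate_text(template, values, llmModelId, maxTokens):
    return f"Using Generate Text with {llmModelId} and maxTokens: {maxTokens}"

placeholder = llm_gateway_generate_text(
    "Summarize {company}",
    {"company": None},
    llmModelId="sfdc_ai__DefaultGPT4Omni",
    maxTokens=200,
)
print(placeholder)  # Using Generate Text with sfdc_ai__DefaultGPT4Omni and maxTokens: 200
```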

**Parameters:**
- `template` (str): Prompt template using `{placeholder}` syntax
- `values` (dict or Column): Dict mapping placeholder names to Columns, or a pre-built `named_struct` Column
- `llmModelId` (str): Model identifier (required), e.g. `"sfdc_ai__DefaultGPT4Omni"`
- `maxTokens` (int): Maximum number of tokens to spend on each completion
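The pairing of `template` and `values` works like ordinary placeholder substitution; the production UDF presumably performs an equivalent expansion per row. Illustrative only, with plain Python strings standing in for the per-row Column values:

```python
# Hypothetical illustration of template expansion; in the real UDF each
# value is a Spark Column evaluated per row, not a Python string.
template = "Summarize {company}: revenue={revenue}, CEO={ceo}"
values = {"company": "Acme", "revenue": "10M", "ceo": "Jo Smith"}
prompt = template.format(**values)
print(prompt)  # Summarize Acme: revenue=10M, CEO=Jo Smith
```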


## 1.0.0

### Breaking Changes
Expand Down
31 changes: 29 additions & 2 deletions README.md
@@ -155,7 +155,7 @@ You should only need the following methods:
* `write_to_dmo(name, spark_dataframe, write_mode)` – Write to a Data Lake Object by name with a Spark dataframe

For example:
```python
from datacustomcode import Client

client = Client()
sdf = client.read_dlo('my_DLO')
client.write_to_dlo('output_DLO', sdf, 'overwrite')  # write_mode 'overwrite' shown for illustration
```


> [!WARNING]
> Currently you can only read from DMOs and write to DMOs, or read from DLOs and write to DLOs; the two object types cannot be mixed.
@markdlv-sf (Contributor) commented on Apr 6, 2026:
Why remove this warning? I think it should stay in the section above this one.


### LLM Gateway

Generate AI completions in DataFrame transformations using the LLM gateway UDF.

```python
from datacustomcode import Client
from pyspark.sql.functions import col

client = Client()

# Use template with placeholders
df = df.withColumn(
"summary",
client._proxy.llm_gateway_generate_text(
"Summarize {company}: revenue={revenue}, CEO={ceo}",
{
"company": col("company"),
"revenue": col("revenue"),
"ceo": col("ceo")
},
llmModelId="sfdc_ai__DefaultGPT4Omni",
maxTokens=200
)
)
```

> [!WARNING]
> This method returns a placeholder string in local development; no LLM call is made and no tokens are spent. Only when deployed does it call the real LLM Gateway service via a built-in UDF.

## CLI

Expand Down
5 changes: 5 additions & 0 deletions src/datacustomcode/proxy/client/LocalProxyClientProvider.py
@@ -27,3 +27,8 @@ def __init__(self, **kwargs: object) -> None:

def call_llm_gateway(self, llmModelId: str, prompt: str, maxTokens: int) -> str:
return f"Hello, thanks for using {llmModelId}. So many tokens: {maxTokens}"

def llm_gateway_generate_text(
    self, template: str, values: object, llmModelId: str, maxTokens: int
) -> str:
    return f"Using Generate Text with {llmModelId} and maxTokens: {maxTokens}"
5 changes: 5 additions & 0 deletions src/datacustomcode/proxy/client/base.py
@@ -25,3 +25,8 @@ def __init__(self):

@abstractmethod
def call_llm_gateway(self, llmModelId: str, prompt: str, maxTokens: int) -> str: ...

@abstractmethod
def llm_gateway_generate_text(
    self, template: str, values: object, llmModelId: str, maxTokens: int
) -> str: ...
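Any concrete provider must implement both abstract methods. A minimal, hypothetical subclass sketch (the base class is re-declared here so the example is self-contained, and `EchoProxyClientProvider` is illustrative, not part of the package):

```python
from abc import ABC, abstractmethod

# Mirrors the abstract methods added to base.py in this PR.
class BaseProxyClientProvider(ABC):
    @abstractmethod
    def call_llm_gateway(self, llmModelId: str, prompt: str, maxTokens: int) -> str: ...

    @abstractmethod
    def llm_gateway_generate_text(
        self, template: str, values: object, llmModelId: str, maxTokens: int
    ) -> str: ...

# Hypothetical provider that echoes its inputs -- useful as a test double.
class EchoProxyClientProvider(BaseProxyClientProvider):
    def call_llm_gateway(self, llmModelId: str, prompt: str, maxTokens: int) -> str:
        return prompt

    def llm_gateway_generate_text(
        self, template: str, values: object, llmModelId: str, maxTokens: int
    ) -> str:
        return template
```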