Commit cd91b31

Merge pull request #1589 from gooddata/jacek/meta
feat(gooddata-sdk): deliver HLL / aggregate-aware LDM surfaces
2 parents efc6b2a + 27dca51 commit cd91b31

2,238 files changed

Lines changed: 58802 additions & 378211 deletions


Makefile

Lines changed: 15 additions & 6 deletions
@@ -61,13 +61,22 @@ endef
 .PHONY: _api-client-generate
 _api-client-generate:
 	rm -f schemas/gooddata-api-client.json
-	cat schemas/gooddata-*.json | jq -S -s 'reduce .[] as $$item ({}; . * $$item) + { tags : ( reduce .[].tags as $$item (null; . + $$item) | unique_by(.name) ) }' | sed '/\u0000/d' > "schemas/gooddata-api-client.json"
+	# Merge per-domain specs and strip literal NUL bytes that jq decoded from
+	# escapes in the source (the previous `sed '/.../d'` pattern was a no-op:
+	# sed BRE/ERE doesn't interpret \uNNNN, so it never matched anything).
+	cat schemas/gooddata-*.json | jq -S -s 'reduce .[] as $$item ({}; . * $$item) + { tags : ( reduce .[].tags as $$item (null; . + $$item) | unique_by(.name) ) }' | tr -d '\000' > "schemas/gooddata-api-client.json"
+	# Break the DashboardCompoundConditionItem ↔ children oneOf/allOf cycle that
+	# crashes openapi-generator-cli v6.6.0 with StackOverflowError in
+	# recursiveGetDiscriminator (its walker has no visited-set). Parent has no
+	# own properties, so dropping the redundant `allOf: [{$$ref: parent}]` from
+	# each child is semantically a no-op.
+	jq '(.components.schemas.DashboardCompoundComparisonCondition.allOf) |= map(select(.["$$ref"] != "#/components/schemas/DashboardCompoundConditionItem")) | (.components.schemas.DashboardCompoundRangeCondition.allOf) |= map(select(.["$$ref"] != "#/components/schemas/DashboardCompoundConditionItem"))' schemas/gooddata-api-client.json > schemas/gooddata-api-client.json.tmp && mv schemas/gooddata-api-client.json.tmp schemas/gooddata-api-client.json
 	$(call generate_client,api)
-	# OpenAPI Generator drops the \x00 literal from regex patterns like ^[^\x00]*$,
-	# producing the invalid Python regex ^[^]*$. Restore the null-byte escape.
-	find gooddata-api-client/gooddata_api_client -name '*.py' -exec \
-		sed -i.bak 's/\^\[\^\]\*\$$/^[^\\x00]*$$/g' {} + && \
-	find gooddata-api-client/gooddata_api_client -name '*.py.bak' -delete
+	# Repair regex patterns of the form ^[^\x00]*$ that openapi-generator mangles
+	# in two ways: sometimes it drops the NUL (leaving the invalid `^[^]*$`),
+	# sometimes it embeds a literal NUL byte (producing a Python source that
+	# fails to import with SyntaxError). The helper handles both shapes.
+	./scripts/postprocess_api_client.py gooddata-api-client/gooddata_api_client

 .PHONY: api-client
 api-client: download _api-client-generate
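The two mangled shapes the Makefile comment describes can be sketched in a few lines of Python. This is a hypothetical illustration of what `scripts/postprocess_api_client.py` might do; the actual script is not part of this diff excerpt and may differ:

```python
import re

# Hypothetical sketch of the repair step the Makefile delegates to
# scripts/postprocess_api_client.py (assumed behavior, not the real script).

# Shape 1: the NUL escape was dropped, leaving the invalid regex ^[^]*$.
DROPPED_NUL = re.compile(r"\^\[\^\]\*\$")
# Shape 2: a literal NUL byte was embedded in the pattern source.
EMBEDDED_NUL = re.compile("\\^\\[\\^\x00\\]\\*\\$")


def repair_source(text: str) -> str:
    """Restore the intended ^[^\\x00]*$ pattern in generated sources."""
    text = DROPPED_NUL.sub(r"^[^\\x00]*$", text)
    return EMBEDDED_NUL.sub(r"^[^\\x00]*$", text)
```

Run over every generated `.py` file, this handles both shapes in one pass and leaves already-correct patterns untouched.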

docs/content/en/latest/administration/organization/_index.md

Lines changed: 2 additions & 0 deletions
@@ -16,6 +16,8 @@ See [Manage Organizations](https://www.gooddata.ai/docs/cloud/manage-deployment/
 * [delete_jwk](./delete_jwk/)
 * [get_jwk](./get_jwk/)
 * [list_jwks](./list_jwks/)
+* [set_hll_type](./set_hll_type/)
+* [get_hll_type](./get_hll_type/)

 ## Example
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
---
title: "get_hll_type"
linkTitle: "get_hll_type"
weight: 31
no_list: true
superheading: "catalog_organization."
api_ref: "CatalogOrganizationService.get_hll_type"
---

``get_hll_type() -> HLLType | None``

Reads the organization-level `hyperLogLogType` setting. Returns the configured value (`"Native"` or `"Presto"`), or `None` when the setting is unset or carries an unrecognized value.

See [`set_hll_type`](../set_hll_type/) for the meaning of each value and when to choose `"Native"` versus `"Presto"`.

{{% parameters-block title="Returns"%}}
{{< parameter p_name="value" p_type="HLLType | None" >}}
`"Native"`, `"Presto"`, or `None` if unset.
{{< /parameter >}}
{{% /parameters-block %}}

## Example

```python
current = sdk.catalog_organization.get_hll_type()
if current is None:
    sdk.catalog_organization.set_hll_type("Native")
```
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
---
title: "set_hll_type"
linkTitle: "set_hll_type"
weight: 30
no_list: true
superheading: "catalog_organization."
api_ref: "CatalogOrganizationService.set_hll_type"
---

``set_hll_type(value: HLLType)``

Sets the organization-level `hyperLogLogType` setting that controls which HyperLogLog function family the platform uses when generating SQL over HLL synopses.

The call is idempotent: it updates the existing setting or creates it if absent.

| value | when to use |
| -- | -- |
| `"Native"` | StarRocks-native HLL functions. The default. Use when synopses are produced by the platform itself or by a StarRocks-native pipeline. |
| `"Presto"` | Presto-compatible HLL functions. Use when synopses arrive from an upstream Presto pipeline — the binary layout and hash family of Presto HLL synopses differ from StarRocks-native. Requires the StarRocks deployment to carry the Presto HLL UDFs. |

{{% parameters-block title="Parameters"%}}

{{< parameter p_name="value" p_type="HLLType" >}}
Either `"Native"` or `"Presto"` (a `Literal` type re-exported from `gooddata_sdk` as `HLLType`).
{{< /parameter >}}
{{% /parameters-block %}}

{{% parameters-block title="Returns" None="yes"%}}
{{% /parameters-block %}}

## Example

```python
from gooddata_sdk import GoodDataSdk

sdk = GoodDataSdk.create(host="https://demo.gooddata.com", token="<token>")

# Customer ingests pre-aggregated tables whose HLL columns were produced by
# Presto. Switch the org so calcique emits Presto-compatible HLL SQL.
sdk.catalog_organization.set_hll_type("Presto")
```
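Because the call is idempotent, a read-before-write guard is optional, but it keeps audit trails quiet when nothing changes. A minimal sketch written against plain callables so it runs without a live backend; `ensure_hll_type` is illustrative and not part of the SDK:

```python
from typing import Callable, Literal, Optional

# Assumed to mirror the SDK's re-exported HLLType Literal.
HLLType = Literal["Native", "Presto"]


def ensure_hll_type(
    get: Callable[[], Optional[str]],
    set_: Callable[[str], None],
    desired: HLLType,
) -> bool:
    """Write the setting only when it differs; return True if a write happened."""
    if get() == desired:
        return False
    set_(desired)
    return True
```

Wired up against the SDK, the callables would be `sdk.catalog_organization.get_hll_type` and `sdk.catalog_organization.set_hll_type`.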
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
---
title: "AI Lake"
linkTitle: "AI Lake"
weight: 20
no_list: true
---

Drive AI Lake long-running operations from the SDK. Today the surface covers the actions needed by aggregate-aware logical data models — most notably the `ANALYZE TABLE` refresh that pre-aggregation workflows rely on so the cost-based optimizer picks up new statistics.

The AI Lake API uses long-running operations: an action returns immediately with an `operation_id`, and the client polls until the operation reaches a terminal status (`succeeded` or `failed`).

### Action Methods

* [analyze_statistics](./analyze_statistics/)

### Operation Methods

* [get_operation](./get_operation/)
* [wait_for_operation](./wait_for_operation/)

## Example

```python
from gooddata_sdk import GoodDataSdk

sdk = GoodDataSdk.create(host="https://demo.gooddata.com", token="<token>")

# Refresh CBO statistics for a single table after a bulk load.
operation_id = sdk.catalog_ai_lake.analyze_statistics(
    instance_id="warehouse-prod",
    table_names=["agg_orders_country_daily"],
)

# Block until the operation finishes; raises CatalogAILakeOperationError
# on failure and TimeoutError if it doesn't finish in time.
op = sdk.catalog_ai_lake.wait_for_operation(operation_id, timeout_s=600.0)
assert op.is_succeeded
```
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
---
title: "analyze_statistics"
linkTitle: "analyze_statistics"
weight: 10
no_list: true
superheading: "catalog_ai_lake."
api_ref: "CatalogAILakeService.analyze_statistics"
---

``analyze_statistics(instance_id: str, table_names: list[str] | None = None, operation_id: str | None = None) -> str``

Triggers `ANALYZE TABLE` over an AI Lake database instance so the cost-based optimizer (CBO) picks up fresh statistics. Required after schema or bulk-data changes — most importantly after registering a pre-aggregation table whose dimension attributes the platform will later resolve via filter pushdown.

The call returns immediately with an operation ID; the actual analyze runs asynchronously. Use [`get_operation`](../get_operation/) or [`wait_for_operation`](../wait_for_operation/) to poll for completion.

{{% parameters-block title="Parameters"%}}

{{< parameter p_name="instance_id" p_type="str" >}}
Database instance name (preferred) or UUID.
{{< /parameter >}}
{{< parameter p_name="table_names" p_type="list[str] | None" >}}
Tables to analyze. If `None` or empty, every table in the instance is analyzed. Defaults to `None`.
{{< /parameter >}}
{{< parameter p_name="operation_id" p_type="str | None" >}}
Optional client-supplied operation identifier. If omitted, a fresh UUID is generated. Pass the same value that subsequent `get_operation` / `wait_for_operation` calls will poll on.
{{< /parameter >}}
{{% /parameters-block %}}

{{% parameters-block title="Returns"%}}
{{< parameter p_name="operation_id" p_type="str" >}}
The operation ID (UUID string) the platform will track this run under.
{{< /parameter >}}
{{% /parameters-block %}}

## Example

```python
operation_id = sdk.catalog_ai_lake.analyze_statistics(
    instance_id="warehouse-prod",
    table_names=["agg_orders_country_daily", "agg_orders_country_monthly"],
)
sdk.catalog_ai_lake.wait_for_operation(operation_id)
```
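Because `operation_id` can be supplied by the client, a caller can pre-generate it and safely resubmit the action after a transport failure without risking a duplicate run. A sketch under that assumption; `start_with_retries` is illustrative, not an SDK helper:

```python
import uuid
from typing import Callable, Optional


def start_with_retries(action: Callable[[str], str], attempts: int = 3) -> str:
    """Submit an action idempotently by reusing one operation ID across retries."""
    operation_id = str(uuid.uuid4())
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            # The backend tracks the run under operation_id, so resubmitting
            # after a dropped connection attaches to the same operation.
            return action(operation_id)
        except ConnectionError as exc:  # retry only transient transport errors
            last_error = exc
    raise last_error
```

Against the SDK this would be called as `start_with_retries(lambda oid: sdk.catalog_ai_lake.analyze_statistics(instance_id="warehouse-prod", operation_id=oid))`.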
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
---
title: "get_operation"
linkTitle: "get_operation"
weight: 20
no_list: true
superheading: "catalog_ai_lake."
api_ref: "CatalogAILakeService.get_operation"
---

``get_operation(operation_id: str) -> CatalogAILakeOperation``

Fetches the current state of a long-running AI Lake operation.

The returned `CatalogAILakeOperation` carries `id`, `kind`, `status` (`"pending"`, `"succeeded"`, or `"failed"`), an optional `result` dict on success, and an optional `error` dict on failure. Use the `is_terminal` / `is_succeeded` / `is_failed` properties to branch on status.

{{% parameters-block title="Parameters"%}}

{{< parameter p_name="operation_id" p_type="str" >}}
The operation ID returned by the action that started the operation (e.g. `analyze_statistics`).
{{< /parameter >}}
{{% /parameters-block %}}

{{% parameters-block title="Returns"%}}
{{< parameter p_name="operation" p_type="CatalogAILakeOperation" >}}
Snapshot of the operation's current status and payload.
{{< /parameter >}}
{{% /parameters-block %}}

## Example

```python
operation_id = sdk.catalog_ai_lake.analyze_statistics(instance_id="warehouse-prod")
op = sdk.catalog_ai_lake.get_operation(operation_id)
if op.is_terminal:
    print(op.status, op.result or op.error)
```
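For callers that cannot block (a UI tick, an async scheduler), `get_operation` also supports a hand-rolled poll loop in the spirit of `wait_for_operation`. A minimal sketch over a generic `fetch` callable; the deadline handling here is illustrative, not the SDK's implementation:

```python
import time
from typing import Any, Callable


def poll_until_terminal(
    fetch: Callable[[], Any],
    timeout_s: float = 300.0,
    poll_s: float = 2.0,
) -> Any:
    """Call fetch() until the returned operation reports is_terminal."""
    deadline = time.monotonic() + timeout_s
    while True:
        op = fetch()
        if op.is_terminal:
            return op
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation still pending after {timeout_s}s")
        time.sleep(poll_s)
```

Against the SDK the callable would be `lambda: sdk.catalog_ai_lake.get_operation(operation_id)`.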
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
---
title: "wait_for_operation"
linkTitle: "wait_for_operation"
weight: 30
no_list: true
superheading: "catalog_ai_lake."
api_ref: "CatalogAILakeService.wait_for_operation"
---

``wait_for_operation(operation_id: str, timeout_s: float = 300.0, poll_s: float = 2.0) -> CatalogAILakeOperation``

Blocks until an AI Lake operation reaches a terminal status, polling every `poll_s` seconds. Returns the final `CatalogAILakeOperation` on success, raises `CatalogAILakeOperationError` if the operation finishes in `failed` state, and raises `TimeoutError` if the operation does not finish within `timeout_s`.

{{% parameters-block title="Parameters"%}}

{{< parameter p_name="operation_id" p_type="str" >}}
The operation ID returned by the action that started the operation.
{{< /parameter >}}
{{< parameter p_name="timeout_s" p_type="float" >}}
Maximum time to wait, in seconds. Defaults to `300.0`.
{{< /parameter >}}
{{< parameter p_name="poll_s" p_type="float" >}}
Sleep between polls, in seconds. Defaults to `2.0`.
{{< /parameter >}}
{{% /parameters-block %}}

{{% parameters-block title="Returns"%}}
{{< parameter p_name="operation" p_type="CatalogAILakeOperation" >}}
The terminal-state operation; `op.is_succeeded` is guaranteed `True`.
{{< /parameter >}}
{{% /parameters-block %}}

{{% parameters-block title="Raises"%}}
{{< parameter p_type="CatalogAILakeOperationError" >}}
Operation finished with `status="failed"`. The exception carries the full operation snapshot on its `operation` attribute.
{{< /parameter >}}
{{< parameter p_type="TimeoutError" >}}
Operation did not reach a terminal state within `timeout_s`.
{{< /parameter >}}
{{% /parameters-block %}}

## Example

```python
from gooddata_sdk import CatalogAILakeOperationError

operation_id = sdk.catalog_ai_lake.analyze_statistics(
    instance_id="warehouse-prod",
    table_names=["agg_orders_country_daily"],
)
try:
    op = sdk.catalog_ai_lake.wait_for_operation(operation_id, timeout_s=600.0)
except CatalogAILakeOperationError as exc:
    print(f"analyze failed: {exc.operation.error}")
except TimeoutError:
    print("analyze still pending; resume polling later with get_operation()")
```
