1 change: 1 addition & 0 deletions pom.xml
@@ -30,6 +30,7 @@
        <module>url-shortener-demo</module>
        <module>jdbc</module>
        <module>project-course</module>
        <module>slo</module>
    </modules>

    <dependencyManagement>
90 changes: 90 additions & 0 deletions slo/Dockerfile
@@ -0,0 +1,90 @@
# Multi-stage Dockerfile for the YDB Java SDK SLO workload.
#
# The image is consumed by the YDB SLO action (`ydb-platform/ydb-slo-action`).
# Two instances run in parallel — current and baseline — under the same chaos
# conditions, and their metrics are compared by the report action.
#
# Build context: a directory that contains TWO checkouts side by side:
#   ./ydb-java-sdk        the SDK source under test (current or baseline)
#   ./ydb-java-examples   the SLO workload sources
#
# The CI workflow assembles this layout in a temp directory and passes it as
# the docker build context, so this Dockerfile can build the SDK from source
# and then the workload against that exact build — without ever needing the
# SDK to be published to a remote Maven repository.
#
# Optional build args:
#   MAVEN_IMAGE    Builder image. Defaults to `maven:3.9-eclipse-temurin-17`.
#   RUNTIME_IMAGE  Runtime image. Defaults to `eclipse-temurin:17-jre`.

ARG MAVEN_IMAGE=maven:3.9-eclipse-temurin-17
ARG RUNTIME_IMAGE=eclipse-temurin:17-jre

# ---------- builder: install the SDK ---------------------------------------
FROM ${MAVEN_IMAGE} AS sdk-build

WORKDIR /src

# Copy only the SDK checkout into the builder so changes elsewhere in the
# context don't invalidate this layer's cache.
COPY ydb-java-sdk /src/ydb-java-sdk

# Install the SDK (and its BOM) into the in-image local Maven repository at
# /root/.m2/repository. Tests are skipped — the SDK has its own CI for that;
# here we only need the artifacts. We also skip javadoc/source jars because
# the workload doesn't need them.
RUN cd /src/ydb-java-sdk && \
    mvn -B -q \
        -DskipTests \
        -Dmaven.javadoc.skip=true \
        -Dmaven.source.skip=true \
        -Dgpg.skip=true \
        install

# Capture the SDK version into a small file so the next stage can read it
# without parsing the pom again. `help:evaluate` is quiet enough to be safe in
# scripts.
RUN cd /src/ydb-java-sdk && \
    mvn -B -q help:evaluate -Dexpression=project.version -DforceStdout > /tmp/sdk.version && \
    echo "Built SDK version: $(cat /tmp/sdk.version)"

# ---------- builder: build the workload ------------------------------------
FROM sdk-build AS workload-build

# Copy the examples checkout. We do this in a separate stage so changes to
# the workload code don't invalidate the SDK install layer above.
COPY ydb-java-examples /src/ydb-java-examples

# Override the SDK version pinned in the examples parent pom to point at the
# version we just installed. This lets us test SDK SNAPSHOTs without
# publishing anywhere.
RUN cd /src/ydb-java-examples && \
    SDK_VERSION="$(cat /tmp/sdk.version)" && \
    echo "Pinning ydb-java-examples to SDK ${SDK_VERSION}" && \
    mvn -B -q versions:set-property \
        -Dproperty=ydb.sdk.version \
        -DnewVersion="${SDK_VERSION}" \
        -DgenerateBackupPoms=false

# Build only the slo module (and its required parent/BOM context). The
# examples parent pom lists many modules; `-pl slo -am` keeps the build
# focused on what the workload actually needs.
RUN cd /src/ydb-java-examples && \
    mvn -B -q \
        -pl slo -am \
        -DskipTests \
        -Dmaven.javadoc.skip=true \
        package

# ---------- runtime --------------------------------------------------------
FROM ${RUNTIME_IMAGE}

WORKDIR /app

# Copy the executable jar plus its transitive dependencies. The slo pom is
# configured to drop dependencies into target/libs and to set the manifest
# Class-Path to libs/, so a single `java -jar` call is enough.
COPY --from=workload-build /src/ydb-java-examples/slo/target/ydb-slo-workload.jar /app/ydb-slo-workload.jar
COPY --from=workload-build /src/ydb-java-examples/slo/target/libs /app/libs

ENTRYPOINT ["java", "-jar", "/app/ydb-slo-workload.jar"]
119 changes: 119 additions & 0 deletions slo/README.md
@@ -0,0 +1,119 @@
# YDB Java SDK SLO workload

This module contains the workload application used by the [YDB SLO Action](https://github.com/ydb-platform/ydb-slo-action) to test the reliability of the YDB Java SDK under load and chaos.

It is a sibling of the SLO workloads in [`ydb-go-sdk`](https://github.com/ydb-platform/ydb-go-sdk/tree/master/tests/slo) and [`ydb-js-sdk`](https://github.com/ydb-platform/ydb-js-sdk/tree/main/tests/slo): the schema, queries and metrics are kept compatible so reports across SDKs are directly comparable.

## What it does

The workload runs three phases:

1. **Setup** — creates a partitioned KV table and prefills it with rows.
2. **Run** — drives concurrent read and write loops at fixed RPS for the configured duration. Each operation is timed and retried via `tech.ydb.query.tools.SessionRetryContext`; the outcome is recorded as Prometheus-compatible metrics that the workload pushes to the action over OTLP.
3. **Teardown** — drops the workload table even if the run failed, so the cluster is left clean.
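
The three phases above can be sketched as a `try`/`finally` around setup and run, so that teardown is attempted even when the run throws. This is a minimal illustration only: `PhaseSketch`, `Phase` and `execute` are hypothetical names, not taken from `KvWorkload.java`, and running teardown after a failed setup is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the setup/run/teardown lifecycle. The names are
// illustrative and are not the workload's actual classes.
final class PhaseSketch {
    // A phase that may fail with any exception.
    interface Phase {
        void run() throws Exception;
    }

    // Runs setup and run in order and always attempts teardown afterwards,
    // so a failure in either phase still leaves the cluster clean.
    static void execute(Phase setup, Phase run, Phase teardown) throws Exception {
        try {
            setup.run();
            run.run();
        } finally {
            teardown.run();
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        try {
            execute(
                () -> log.add("setup"),
                () -> { throw new IllegalStateException("simulated run failure"); },
                () -> log.add("teardown"));
        } catch (Exception e) {
            log.add("failed: " + e.getMessage());
        }
        // The log shows that teardown ran before the failure was reported.
        System.out.println(log);
    }
}
```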

While the workload runs, the SLO action injects chaos (node restarts, network black holes, container pauses). The metrics show how well the SDK copes with those failures.
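
One way to drive a loop "at fixed RPS" is to give every operation a deadline computed from the start time rather than sleeping a fixed interval after each call, so that a slow operation does not shift all later ones. The sketch below shows only that arithmetic; `PacingSketch` and `deadlineNanos` are hypothetical names, and the actual workload may pace its loops differently (for example with a rate limiter).

```java
// Hypothetical pacing helper: computes when the i-th operation should start
// for a given target rate. Not taken from the workload sources.
final class PacingSketch {
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    // Deadline of operation number `index` (0-based), measured in nanoseconds
    // on the same clock as `startNanos`. Multiplying before dividing avoids
    // accumulating rounding drift when the rate does not divide one second.
    static long deadlineNanos(long startNanos, int rps, long index) {
        if (rps <= 0) {
            throw new IllegalArgumentException("rps must be positive");
        }
        return startNanos + (index * NANOS_PER_SECOND) / rps;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        for (long i = 0; i < 5; i++) {
            long offset = deadlineNanos(start, 1000, i) - start;
            System.out.println("operation " + i + " starts at +" + offset + " ns");
        }
    }
}
```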

## Metrics

Every metric carries a `ref` label whose value is taken from the `WORKLOAD_REF` environment variable. This is how the report action separates the **current** PR run from the **baseline** run.

Names below are shown in Prometheus form (with underscores). Internally the workload uses the OpenTelemetry naming convention with dots (e.g. `sdk.operations.total`); the OTLP → Prometheus conversion replaces dots with underscores automatically, so this is what you see when you query Prometheus or write rules in `metrics.yaml`.
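
The renaming described above amounts to a character substitution. The helper below is a simplified sketch of just that step, with the hypothetical name `toPrometheusName`; the real OTLP to Prometheus conversion is done by the metrics pipeline, not by workload code, and may apply further normalization that is not modelled here.

```java
// Hypothetical helper that mirrors the dot-to-underscore renaming only.
final class MetricNameSketch {
    // Converts an OpenTelemetry-style dotted name to its Prometheus form.
    static String toPrometheusName(String otelName) {
        return otelName.replace('.', '_');
    }

    public static void main(String[] args) {
        System.out.println(toPrometheusName("sdk.operations.total"));
    }
}
```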

| Metric | Type | Labels |
| ----------------------------------- | --------------- | ------------------------------------------------------- |
| `sdk_operations_total` | counter | `operation_type`, `operation_status` |
| `sdk_errors_total` | counter | `operation_type`, `error_kind` |
| `sdk_retry_attempts_total` | counter | `operation_type`, `operation_status` |
| `sdk_pending_operations` | up/down counter | `operation_type` |
| `sdk_operation_latency_p50_seconds` | gauge | `operation_type`, `operation_status` (always `success`) |
| `sdk_operation_latency_p95_seconds` | gauge | `operation_type`, `operation_status` (always `success`) |
| `sdk_operation_latency_p99_seconds` | gauge | `operation_type`, `operation_status` (always `success`) |

Latency percentiles are computed from per-operation HDR histograms and reflect only successful operations, because failure latency is dominated by retry budgets and timeouts and would mask real SDK regressions during chaos. The counters (`sdk_operations_total`, `sdk_errors_total`) count both successful and failed operations, so availability can still be computed from them.
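
To make the two kinds of numbers concrete, the sketch below computes availability from success and failure counts and a percentile from a list of successful-operation latencies. It is an illustration under assumptions: it uses a plain nearest-rank percentile over a sorted array instead of an HDR histogram, and `SloMathSketch`, `availability` and `percentile` are hypothetical names that do not appear in the workload.

```java
import java.util.Arrays;

// Hypothetical illustration of the SLO arithmetic. The workload itself uses
// HDR histograms; this sketch swaps in a nearest-rank percentile instead.
final class SloMathSketch {
    // Share of operations that succeeded; NaN when nothing was recorded.
    static double availability(long successes, long failures) {
        long total = successes + failures;
        return total == 0 ? Double.NaN : (double) successes / total;
    }

    // Nearest-rank percentile (p in 0..100) over latencies of successful
    // operations only. The input array is not modified.
    static double percentile(double[] latenciesSeconds, double p) {
        if (latenciesSeconds.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = latenciesSeconds.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        double[] latencies = {0.030, 0.010, 0.040, 0.020};
        System.out.println("availability = " + availability(990, 10));
        System.out.println("p50 = " + percentile(latencies, 50));
        System.out.println("p99 = " + percentile(latencies, 99));
    }
}
```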

## Inputs

The workload reads connection details and run parameters from environment variables provided by the action:

| Variable | Description |
| ------------------------------- | ------------------------------------------------ |
| `YDB_CONNECTION_STRING` | YDB connection string (preferred) |
| `YDB_ENDPOINT` + `YDB_DATABASE` | Legacy, used if `YDB_CONNECTION_STRING` is unset |
| `WORKLOAD_REF` | Value of the `ref` label on every metric |
| `WORKLOAD_NAME` | Workload name (used to compose the table name) |
| `WORKLOAD_DURATION` | Run duration in seconds |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP HTTP endpoint to push metrics to |
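
The precedence between the preferred and legacy connection variables can be sketched as below. This is an illustration, not the code in `Config.java`: `ConnectionSketch` and `resolve` are hypothetical names, treating an empty value as unset is an assumption, and so is the way the legacy pair is joined (it simply mirrors the `endpoint?database=...` shape of the connection string used elsewhere in this README).

```java
import java.util.Map;

// Hypothetical sketch of how the connection variables could be resolved.
final class ConnectionSketch {
    // Prefers YDB_CONNECTION_STRING and falls back to the legacy pair.
    static String resolve(Map<String, String> env) {
        String connectionString = env.get("YDB_CONNECTION_STRING");
        if (connectionString != null && !connectionString.isEmpty()) {
            return connectionString;
        }
        String endpoint = env.get("YDB_ENDPOINT");
        String database = env.get("YDB_DATABASE");
        if (endpoint == null || database == null) {
            throw new IllegalStateException(
                "Set YDB_CONNECTION_STRING, or both YDB_ENDPOINT and YDB_DATABASE");
        }
        // Assumption: the legacy values are combined into one connection string.
        return endpoint + "?database=" + database;
    }

    public static void main(String[] args) {
        try {
            System.out.println(resolve(System.getenv()));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```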

KV-specific tunables are passed via the command line and parsed by JCommander:

```
--read-rps <int>             Target read RPS (default 1000)
--write-rps <int>            Target write RPS (default 100)
--read-timeout-ms <int>      Per-attempt read timeout in milliseconds (default 10000)
--write-timeout-ms <int>     Per-attempt write timeout in milliseconds (default 10000)
--prefill-count <long>       Rows to prefill before the run phase (default 1000)
--partition-size <int>       Auto-partitioning partition size in MB (default 1)
--min-partition-count <int>  Minimum number of table partitions (default 6)
--max-partition-count <int>  Maximum number of table partitions (default 1000)
--duration <int>             Override WORKLOAD_DURATION when > 0
```

Unknown flags are ignored, so the workload accepts `workload_current_command` strings designed for other SDKs without erroring.
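
The effect of ignoring unknown flags can be illustrated with a small hand-rolled parser. This sketch deliberately does not use JCommander and is not the code in `KvWorkloadParams.java`; `LenientArgsSketch` and `parse` are hypothetical names, and only three of the flags listed above are modelled, with the defaults taken from that list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lenient parser: known flags override defaults, everything
// else (unknown flags and their values) is skipped without an error.
final class LenientArgsSketch {
    static Map<String, Long> parse(String[] args) {
        Map<String, Long> values = new LinkedHashMap<>();
        values.put("--read-rps", 1000L);
        values.put("--write-rps", 100L);
        values.put("--prefill-count", 1000L);

        for (int i = 0; i < args.length; i++) {
            boolean known = values.containsKey(args[i]);
            if (known && i + 1 < args.length) {
                // Consume the flag's value and move past it.
                values.put(args[i], Long.parseLong(args[++i]));
            }
            // Tokens that are not known flags are ignored.
        }
        return values;
    }

    public static void main(String[] args) {
        String[] example = {"--read-rps", "100", "--threads", "8", "--write-rps", "10"};
        System.out.println(parse(example));
    }
}
```

Here `--threads 8` stands in for a flag meant for another SDK's workload; it is skipped and the remaining flags are still applied.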

## How CI uses this module

The CI lives in [`ydb-java-sdk/.github/workflows/slo.yml`](https://github.com/ydb-platform/ydb-java-sdk/blob/master/.github/workflows/slo.yml), not here. The flow is:

1. Check out the SDK PR (`current`) and the merge-base SDK commit (`baseline`).
2. Check out `ydb-java-examples` for the workload sources.
3. For each version, run `.github/scripts/build-slo-image.sh` from the SDK repo. The script assembles a build context with the SDK and examples checkouts side by side and feeds it to [`slo/Dockerfile`](Dockerfile), which:
   - Builds the SDK from source and installs it into an in-image local Maven repository.
   - Pins `ydb.sdk.version` in the examples parent pom to that version.
   - Builds the `slo` module against the freshly-installed SDK.
4. Pass the two images (`ydb-app-current`, `ydb-app-baseline`) to `ydb-platform/ydb-slo-action/init@v2`.
5. After the run, [`ydb-platform/ydb-slo-action/report@v2`](https://github.com/ydb-platform/ydb-slo-action) compares the two and posts a summary to the PR.

The build is fully self-contained — the SDK under test does not need to be published to a remote Maven repository.

## Building locally

The workload can be built standalone against a published SDK version. From the `ydb-java-examples` repository root:

```bash
mvn -pl slo -am -DskipTests package
```

The resulting jar is at `slo/target/ydb-slo-workload.jar`. To run it against a local YDB:

```bash
export YDB_CONNECTION_STRING="grpc://localhost:2136?database=/local"
export WORKLOAD_REF=local
export WORKLOAD_NAME=java-query-kv
export WORKLOAD_DURATION=60

java -jar slo/target/ydb-slo-workload.jar --read-rps 100 --write-rps 10 --prefill-count 100
```

If `OTEL_EXPORTER_OTLP_ENDPOINT` is not set, metrics are still recorded in-process but never exported — handy for verifying that the workload itself runs cleanly before pushing to CI.

## Files

```
slo/
├── Dockerfile                          Multi-stage build (SDK + workload)
├── pom.xml                             Maven module descriptor
├── README.md                           This file
└── src/main/
    ├── java/tech/ydb/slo/
    │   ├── Config.java                 Reads action env vars
    │   ├── Main.java                   Entry point
    │   ├── Metrics.java                OTLP metrics + HDR histograms
    │   └── kv/
    │       ├── KvWorkload.java         Setup/run/teardown loop
    │       ├── KvWorkloadParams.java   JCommander-bound CLI flags
    │       ├── Row.java                Row data class
    │       └── RowGenerator.java       Random payload generator
    └── resources/
        └── log4j2.xml                  Console logging config
```
90 changes: 90 additions & 0 deletions slo/pom.xml
@@ -0,0 +1,90 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>tech.ydb.examples</groupId>
        <artifactId>ydb-sdk-examples</artifactId>
        <version>1.1.0-SNAPSHOT</version>
    </parent>

    <artifactId>ydb-slo-workload</artifactId>
    <name>YDB SLO workload</name>
    <description>SLO workload application for testing YDB Java SDK reliability under load and chaos</description>

    <properties>
        <jcommander.version>1.82</jcommander.version>
        <opentelemetry.version>1.59.0</opentelemetry.version>
        <hdrhistogram.version>2.2.2</hdrhistogram.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>tech.ydb</groupId>
            <artifactId>ydb-sdk-query</artifactId>
        </dependency>

        <dependency>
            <groupId>com.beust</groupId>
            <artifactId>jcommander</artifactId>
            <version>${jcommander.version}</version>
        </dependency>

        <dependency>
            <groupId>org.hdrhistogram</groupId>
            <artifactId>HdrHistogram</artifactId>
            <version>${hdrhistogram.version}</version>
        </dependency>

        <dependency>
            <groupId>io.opentelemetry</groupId>
            <artifactId>opentelemetry-api</artifactId>
            <version>${opentelemetry.version}</version>
        </dependency>
        <dependency>
            <groupId>io.opentelemetry</groupId>
            <artifactId>opentelemetry-sdk</artifactId>
            <version>${opentelemetry.version}</version>
        </dependency>
        <dependency>
            <groupId>io.opentelemetry</groupId>
            <artifactId>opentelemetry-sdk-metrics</artifactId>
            <version>${opentelemetry.version}</version>
        </dependency>
        <dependency>
            <groupId>io.opentelemetry</groupId>
            <artifactId>opentelemetry-exporter-otlp</artifactId>
            <version>${opentelemetry.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-slf4j2-impl</artifactId>
        </dependency>
    </dependencies>

    <build>
        <finalName>ydb-slo-workload</finalName>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <addClasspath>true</addClasspath>
                            <classpathPrefix>libs/</classpathPrefix>
                            <mainClass>tech.ydb.slo.Main</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>