Polish Java / JVM consumption surface for 0.19.0 (#400) #498

Merged
michalharakal merged 5 commits into develop from feature/400-jvm-java-release-polish
Apr 13, 2026

@michalharakal
Contributor

Refs #400.

Summary

Makes the three new Kotlin surfaces from 0.19.0 (`StableHloConverterFactory`, `TokenizerFactory`, the `TensorEncoding` metadata helpers on `TensorSpec`) cleanly callable from Java, adds `skainet-backend-api` to the BOM so Java consumers get pinned versions for every module, and extends `skainet-test-java` with a `ReleaseApiJavaTest` that exercises each entry point from real Java code — so any future regression surfaces at compile time instead of at runtime in a downstream consumer's project.

The gap before this PR

`SKaiNET` (the pre-existing `sk.ainet.java.SKaiNET` facade in `skainet-backend-cpu/jvmMain`) was already nicely annotated with `@JvmStatic` / `@JvmOverloads`, and `skainet-test-java` already had 3 Java test files exercising `SKaiNET.context()`, `tensor()`, `zeros()`, model building, and tensor ops. But none of the 0.19.0 additions had gotten the same treatment:

  1. `StableHloConverterFactory` — `public object`, no `@JvmStatic`. Java call sites had to write `StableHloConverterFactory.INSTANCE.createExtended()`.
  2. `TokenizerFactory` — same shape, same problem: `TokenizerFactory.INSTANCE.fromGguf(...)`.
  3. `TensorSpecEncoding.kt` — file of top-level extension functions. Java callers saw the default synthetic class name `TensorSpecEncodingKt`, producing verbose sites like `TensorSpecEncodingKt.getTensorEncoding(spec)`.
  4. The BOM was missing `skainet-backend-api` (the neutral module added in #470, "Add skainet-backend-api module (#468)"), so Java consumers depending on the BOM didn't get a pinned version for it.
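
For readers less familiar with Kotlin/Java interop, items 1 and 2 come down to how a Kotlin `object` appears on the JVM: a final class with a static `INSTANCE` field, whose functions are instance methods unless they carry `@JvmStatic`. The sketch below is a hand-written Java approximation of that shape, not the SKaiNET code. `PlainFactory`, `StaticFactory`, and the `String` return value are hypothetical stand-ins.

```java
// Hand-written approximation of how a Kotlin `object` looks to Java.
// All names here are hypothetical stand-ins, not SKaiNET classes.

// Roughly what `object PlainFactory { fun createExtended() = ... }` exposes.
final class PlainFactory {
    public static final PlainFactory INSTANCE = new PlainFactory();

    private PlainFactory() {}

    // Instance method: Java must go through INSTANCE to reach it.
    public String createExtended() {
        return "extended";
    }
}

// Roughly what the same object exposes once the function has @JvmStatic.
final class StaticFactory {
    public static final StaticFactory INSTANCE = new StaticFactory();

    private StaticFactory() {}

    // Static method: callable directly on the class name.
    public static String createExtended() {
        return "extended";
    }
}

class ObjectInteropDemo {
    public static void main(String[] args) {
        String viaInstance = PlainFactory.INSTANCE.createExtended();
        String viaStatic = StaticFactory.createExtended();
        System.out.println(viaInstance.equals(viaStatic));
    }
}
```

Both calls reach the same logic; the difference is only in what the Java call site has to spell out.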

The five commits

1. `1253b42f` — BOM adds skainet-backend-api

`skainet-bom/build.gradle.kts` gains one `api(project(":skainet-backends:skainet-backend-api"))` constraint, grouped with `skainet-backend-cpu` under the backend section. BOM still builds clean.

2. `25be9dc7` — @JvmStatic on StableHloConverterFactory

Every factory method gets `@JvmStatic`, and `createCustom` gets `@JvmOverloads` so every parameter default generates a JVM overload. The annotations live in `commonMain` — Kotlin 1.9+ accepts JVM-specific annotations in common code and treats them as no-ops on non-JVM targets. Verified across jvmTest, wasmJsTest, wasmJsBrowserTest, wasmWasiTest, wasmWasiNodeTest, macosArm64Test, and iosSimulatorArm64Test.
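
As a rough illustration of what `@JvmOverloads` means for Java callers, here is a Java approximation of the overload set the compiler would emit for a function with two defaulted parameters. The real `createCustom` signature is not shown in this PR, so the parameter names (`validate`, `maxPasses`), their defaults, and the `String` return type are invented for the sketch.

```java
// Approximation of the overloads @JvmOverloads would emit for a
// hypothetical Kotlin signature:
//   @JvmStatic @JvmOverloads
//   fun createCustom(validate: Boolean = true, maxPasses: Int = 3): String
// Parameter names, defaults, and return type are invented for illustration.
final class OverloadsSketch {
    private OverloadsSketch() {}

    // Full signature, as declared.
    public static String createCustom(boolean validate, int maxPasses) {
        return "validate=" + validate + ", maxPasses=" + maxPasses;
    }

    // Trailing default dropped: maxPasses falls back to 3.
    public static String createCustom(boolean validate) {
        return createCustom(validate, 3);
    }

    // Both defaults dropped.
    public static String createCustom() {
        return createCustom(true, 3);
    }
}

class OverloadsDemo {
    public static void main(String[] args) {
        System.out.println(OverloadsSketch.createCustom());
        System.out.println(OverloadsSketch.createCustom(false));
        System.out.println(OverloadsSketch.createCustom(false, 7));
    }
}
```

Overloads are generated by dropping trailing defaulted parameters one at a time, so a Java caller cannot skip an earlier parameter while supplying a later one.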

3. `1ebd21b4` — @JvmStatic on TokenizerFactory

Same treatment for both factory entry points (`fromGguf(Map)`, `fromTokenizerJson(String)`). This is the canonical entry for the new Qwen byte-level BPE + SentencePiece tokenizers from #463 / #464, so it's a meaningful Java-side win: Qwen / LLaMA / Gemma / TinyLlama tokenization without Kotlin-specific interop glue.

4. `76cfea29` — @file:JvmName("TensorSpecs") on TensorSpecEncoding.kt

Java callers now see `TensorSpecs.getTensorEncoding(spec)` / `TensorSpecs.withTensorEncoding(spec, TensorEncoding.Q8_0.INSTANCE)` / `TensorSpecs.inferTensorEncoding(tensorData)`, with a class name that mirrors the `TensorSpec` type the helpers operate on. Kotlin call sites stay unchanged (they go through the extension syntax either way). This is purely a JVM-side binary-name change.
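
To make the mapping concrete, the sketch below approximates the class Java sees for a file of top-level extensions: one static method per extension, with the receiver passed as the first argument, and an extension property surfacing as a `get...` method. `SpecSketch`, `TensorSpecsSketch`, the `String` encoding, and the metadata map with its `"encoding"` key are all assumptions made for the example; how `TensorSpec` actually stores its encoding is not shown in this PR.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for TensorSpec; the metadata map is an assumption.
final class SpecSketch {
    private final String name;
    private final Map<String, String> metadata;

    SpecSketch(String name, Map<String, String> metadata) {
        this.name = name;
        this.metadata = Collections.unmodifiableMap(new HashMap<>(metadata));
    }

    String name() { return name; }
    Map<String, String> metadata() { return metadata; }
}

// Approximates the facade class produced for a Kotlin file of top-level
// extensions: static methods whose first parameter is the receiver.
final class TensorSpecsSketch {
    private static final String KEY = "encoding"; // invented key

    private TensorSpecsSketch() {}

    // Kotlin extension property `spec.tensorEncoding` surfaces as a getter.
    public static String getTensorEncoding(SpecSketch spec) {
        return spec.metadata().get(KEY);
    }

    // Kotlin `spec.withTensorEncoding(e)`; returns a copy, source untouched.
    public static SpecSketch withTensorEncoding(SpecSketch spec, String encoding) {
        Map<String, String> copy = new HashMap<>(spec.metadata());
        copy.put(KEY, encoding);
        return new SpecSketch(spec.name(), copy);
    }
}

class FacadeDemo {
    public static void main(String[] args) {
        SpecSketch bare = new SpecSketch("weights", new HashMap<String, String>());
        SpecSketch annotated = TensorSpecsSketch.withTensorEncoding(bare, "Q8_0");
        System.out.println(TensorSpecsSketch.getTensorEncoding(annotated));
        System.out.println(TensorSpecsSketch.getTensorEncoding(bare));
    }
}
```

Without `@file:JvmName`, the same methods would live on a class named after the source file with a `Kt` suffix, which is the `TensorSpecEncodingKt` name this commit replaces.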

5. `d6e5f226` — ReleaseApiJavaTest

New JUnit5 Java test covering all three surfaces:

```java
// 1. Converter factory via idiomatic static form
StableHloConverter converter = StableHloConverterFactory.createExtended();

// 2. Tokenizer factory via idiomatic static form (error-path
// invocation — cleanest way to prove static dispatch without a
// real GGUF fixture in the test classpath)
assertThrows(UnsupportedTokenizerException.class,
    () -> TokenizerFactory.fromGguf(Collections.emptyMap()));

// 3. TensorSpecs facade for the TensorEncoding helpers
TensorSpec annotated = TensorSpecs.withTensorEncoding(
    bare, TensorEncoding.Q8_0.INSTANCE);
assertSame(TensorEncoding.Q8_0.INSTANCE,
    TensorSpecs.getTensorEncoding(annotated));
```

`skainet-test-java/build.gradle.kts` gains `skainet-compile-hlo` and `skainet-io-core` on the test classpath so the new test can reference the factories and encoding helpers.
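
The error-path trick used for the tokenizer factory can be reproduced without JUnit or a GGUF fixture. The sketch below is a plain-Java approximation: `FactorySketch`, `UnsupportedInputException`, the `"model"` key, and the `expectThrows` helper are hypothetical stand-ins, not the SKaiNET API or the JUnit `assertThrows` implementation.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-in for UnsupportedTokenizerException.
class UnsupportedInputException extends RuntimeException {
    UnsupportedInputException(String message) {
        super(message);
    }
}

// Hypothetical static factory that rejects input it cannot interpret.
final class FactorySketch {
    private FactorySketch() {}

    public static String fromFields(Map<String, Object> fields) {
        Object model = fields.get("model"); // invented key
        if (model == null) {
            throw new UnsupportedInputException("no tokenizer model declared");
        }
        return "tokenizer:" + model;
    }
}

final class ErrorPathCheck {
    private ErrorPathCheck() {}

    // Minimal assertThrows-style helper: returns the exception if it has
    // the expected type, otherwise fails with an AssertionError.
    static <T extends Throwable> T expectThrows(Class<T> type, Runnable body) {
        try {
            body.run();
        } catch (Throwable t) {
            if (type.isInstance(t)) {
                return type.cast(t);
            }
            throw new AssertionError("unexpected exception: " + t.getClass().getName(), t);
        }
        throw new AssertionError("expected " + type.getName() + " but nothing was thrown");
    }

    public static void main(String[] args) {
        // Compiles only if fromFields is reachable as a static method.
        UnsupportedInputException e = expectThrows(
            UnsupportedInputException.class,
            () -> FactorySketch.fromFields(Collections.<String, Object>emptyMap()));
        System.out.println(e.getMessage());
    }
}
```

The point of the pattern is that the static call has to resolve at compile time, so the check fails before any assertion runs if the static entry point disappears.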

Test plan

  • `./gradlew :skainet-bom:build` — green (BOM constraint resolves)
  • `./gradlew :skainet-compile:skainet-compile-hlo:allTests -x kotlinWasmStoreYarnLock` — green across jvmTest, wasmJsTest, wasmJsBrowserTest, wasmWasiTest, wasmWasiNodeTest, macosArm64Test, iosSimulatorArm64Test
  • `./gradlew :skainet-io:skainet-io-core:{jvmTest,compileKotlinWasmJs,macosArm64Test}` — green
  • `./gradlew :skainet-lang:skainet-lang-core:{jvmTest,compileKotlinWasmJs,macosArm64Test}` — green
  • `./gradlew :skainet-test:skainet-test-java:test` — green, including the 4 new tests in `ReleaseApiJavaTest` alongside the 3 pre-existing test classes
  • CI: full multiplatform build

Net effect for Java consumers of 0.19.0

A Java app pulling in the 0.19.0 BOM can now:

```java
// Execute tensors (pre-existing)
var ctx = SKaiNET.context();
var t = SKaiNET.zeros(ctx, new int[]{2, 3}, DType.fp32());

// Export via StableHLO (0.19.0)
var converter = StableHloConverterFactory.createExtended();
var module = converter.convert(graph, "main");

// Tokenize for Qwen / LLaMA / Gemma / TinyLlama (0.19.0)
var tokenizer = TokenizerFactory.fromGguf(ggufFields);

// Read / write TensorEncoding metadata (0.19.0)
var encoding = TensorSpecs.getTensorEncoding(spec);
var annotated = TensorSpecs.withTensorEncoding(
    spec, TensorEncoding.Q8_0.INSTANCE);
```

No `.INSTANCE.` noise, no `TensorSpecEncodingKt` naming, no manual version pins for `skainet-backend-api`.

🤖 Generated with Claude Code

michalharakal and others added 5 commits April 13, 2026 16:47
The neutral backend api module landed in #470 as the integration
seam for future backends (IREE, Metal, NPU, the NNAPI-Amlogic
sibling repo) but it was never added to the BOM's version-
alignment constraints. Java / JVM consumers that depend on the
BOM were therefore not getting a pinned version for
skainet-backend-api, so anyone referencing the module from a
Maven / Gradle project had to either spell out the version
manually or drop the BOM reliance for that coordinate.

Adding the missing `api(project(":skainet-backends:skainet-backend-api"))`
constraint groups it with skainet-backend-cpu under the
backend section. BOM still builds clean.

First of five commits polishing the Java / JVM consumption
story for the upcoming 0.19.0 release. See #400.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds `@JvmStatic` to every factory method on the
`StableHloConverterFactory` object (`createBasic`, `createExtended`,
`createFast`, `createCustom`) plus `@JvmOverloads` on `createCustom`
so every parameter default generates a separate JVM overload.

Before: Java call sites had to go through the Kotlin singleton
marker:

    var converter = StableHloConverterFactory.INSTANCE.createExtended();

After: Java callers can use the idiomatic static form:

    var converter = StableHloConverterFactory.createExtended();

The `@JvmStatic` annotation lives in `commonMain` — Kotlin 1.9+
accepts JVM-specific annotations in common code and treats them
as no-ops on non-JVM targets. Verified across all Kotlin
Multiplatform targets (jvmTest, wasmJsTest, wasmJsBrowserTest,
wasmWasiTest, wasmWasiNodeTest, macosArm64Test,
iosSimulatorArm64Test) — zero regressions.

Second of five commits polishing the Java / JVM consumption
story for the upcoming 0.19.0 release. See #400.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds `@JvmStatic` to both factory entry points on the
`TokenizerFactory` object (`fromGguf(Map)`, `fromTokenizerJson(String)`).
Same motivation as StableHloConverterFactory in the previous
commit — without the annotation, Java consumers had to navigate
through the Kotlin object's `INSTANCE` marker:

    var tokenizer =
        TokenizerFactory.INSTANCE.fromGguf(ggufFields);

With the annotation they get the idiomatic static form:

    var tokenizer =
        TokenizerFactory.fromGguf(ggufFields);

The factory is the canonical entry point for the new Qwen
byte-level BPE + SentencePiece tokenizers that landed in #463
and #464, so this is a meaningful win for Java consumers of
the upcoming 0.19.0 release — they get Qwen / Llama / Gemma /
TinyLlama tokenization without any Kotlin-specific interop
glue.

Verified across jvmTest, compileKotlinWasmJs, and macosArm64Test
for skainet-io-core — no regressions.

Third of five commits polishing the Java / JVM consumption
story for the upcoming 0.19.0 release. See #400.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds `@file:JvmName("TensorSpecs")` to
`skainet-lang-core/.../tensor/ops/TensorSpecEncoding.kt`.

The file declares three top-level extensions used to read and
write the `TensorEncoding` metadata that #469 plumbed onto
`TensorSpec`: the `tensorEncoding` property plus the
`withTensorEncoding` and `inferTensorEncoding` functions.
Top-level extensions in Kotlin compile to
static methods on a synthetic class named after the source file
— by default `TensorSpecEncodingKt`. Java call sites ended up
looking like:

    TensorEncoding encoding =
        TensorSpecEncodingKt.getTensorEncoding(spec);
    TensorSpec annotated =
        TensorSpecEncodingKt.withTensorEncoding(spec, TensorEncoding.Q8_0.INSTANCE);
    TensorEncoding data =
        TensorSpecEncodingKt.inferTensorEncoding(tensorData);

With `@file:JvmName("TensorSpecs")` they become:

    TensorEncoding encoding = TensorSpecs.getTensorEncoding(spec);
    TensorSpec annotated =
        TensorSpecs.withTensorEncoding(spec, TensorEncoding.Q8_0.INSTANCE);
    TensorEncoding data = TensorSpecs.inferTensorEncoding(tensorData);

Kotlin call sites are unaffected (they use the extension
syntax either way): `spec.tensorEncoding` and
`spec.withTensorEncoding(TensorEncoding.Q8_0)` still work
unchanged. This is purely a JVM-side binary name change.

Verified with jvmTest, compileKotlinWasmJs, macosArm64Test on
skainet-lang-core — no regressions.

Fourth of five commits polishing the Java / JVM consumption
story for the upcoming 0.19.0 release. See #400.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New JUnit5 Java test in skainet-test-java exercising each of the
three Kotlin surfaces polished in the earlier commits of this
branch, so that each is usable as a first-class API from Java:

  - StableHloConverterFactory.createBasic/Extended/Fast() — must
    be reachable via the idiomatic `Factory.create*()` static
    form, never through `Factory.INSTANCE.create*()`. The test
    is effectively a compile-time smoke check: if someone drops
    the @JvmStatic annotations it fails to compile before any
    assertion runs.

  - TokenizerFactory.fromGguf(Map) / fromTokenizerJson(String) —
    same pattern. Passing empty inputs exercises the error
    path (UnsupportedTokenizerException), which is the cleanest
    way to prove static dispatch without needing a real GGUF
    fixture in the test classpath.

  - TensorSpecs (the new JvmName-bound class for
    TensorSpecEncoding.kt): getTensorEncoding / withTensorEncoding
    called via `TensorSpecs.<name>(spec, ...)` in Java syntax.
    Verifies the round-trip of TensorEncoding.Q8_0.INSTANCE and
    confirms withTensorEncoding does not mutate the source spec.

Adds skainet-compile-hlo and skainet-io-core to the Java test
module's `testImplementation` classpath so the new test can
reference the factories + encoding helpers. Existing Java tests
(SKaiNETTest, ModelBuilderTest, TensorJavaOpsTest) are untouched.

Verified: `./gradlew :skainet-test:skainet-test-java:test` green,
covering all 3 pre-existing test classes plus the 4 new tests in
ReleaseApiJavaTest.

Fifth and final commit polishing the Java / JVM consumption
story for the upcoming 0.19.0 release. See #400.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@michalharakal michalharakal merged commit f69444b into develop Apr 13, 2026
7 checks passed
@michalharakal michalharakal deleted the feature/400-jvm-java-release-polish branch April 13, 2026 15:01
@github-actions

📖 Documentation Preview

The documentation has been built successfully for this PR.

Generated Files:

  • Operator documentation: docs/modules/operators/_generated_/
  • JSON schema output: operators.json

Artifacts:

  • Download the documentation-preview-498 artifact to view the complete documentation locally.

This comment will be updated automatically when the PR is updated.
