Polish Java / JVM consumption surface for 0.19.0 (#400) by michalharakal · Pull Request #498 · SKaiNET-developers/SKaiNET

michalharakal · 2026-04-13T15:00:41Z

Refs #400.

Summary

Makes the three new Kotlin surfaces from 0.19.0 (`StableHloConverterFactory`, `TokenizerFactory`, the `TensorEncoding` metadata helpers on `TensorSpec`) cleanly callable from Java, adds `skainet-backend-api` to the BOM so Java consumers get pinned versions for every module, and extends `skainet-test-java` with a `ReleaseApiJavaTest` that exercises each entry point from real Java code — so any future regression surfaces at compile time instead of at runtime in a downstream consumer's project.

The gap before this PR

`SKaiNET` (the pre-existing `sk.ainet.java.SKaiNET` facade in `skainet-backend-cpu/jvmMain`) was already nicely annotated with `@JvmStatic` / `@JvmOverloads`, and `skainet-test-java` already had 3 Java test files exercising `SKaiNET.context()`, `tensor()`, `zeros()`, model building, and tensor ops. But none of the 0.19.0 additions had gotten the same treatment:

- `StableHloConverterFactory`: `public object`, no `@JvmStatic`. Java call sites had to write `StableHloConverterFactory.INSTANCE.createExtended()`.
- `TokenizerFactory`: same shape, same problem: `TokenizerFactory.INSTANCE.fromGguf(...)`.
- `TensorSpecEncoding.kt`: a file of top-level extension functions. Java callers saw the default synthetic class name `TensorSpecEncodingKt`, producing verbose sites like `TensorSpecEncodingKt.getTensorEncoding(spec)`.
- The BOM was missing `skainet-backend-api` (the neutral module added in #470, "Add skainet-backend-api module (#468)"), so Java consumers depending on the BOM didn't get a pinned version for it.
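
The `.INSTANCE.` call sites listed above follow from how a Kotlin `object` appears on the JVM: a final class with a single static `INSTANCE` field, whose methods are instance methods unless annotated with `@JvmStatic`. The sketch below is a hand-written Java approximation of that shape, not compiler output and not SKaiNET code; `DemoFactoryPlain` and `DemoFactoryStatic` are hypothetical names used only for illustration.

```java
// Hand-written Java approximation of Kotlin `object` interop.
// DemoFactoryPlain / DemoFactoryStatic are hypothetical, not SKaiNET classes.
class ObjectInteropSketch {

    // Roughly how `object DemoFactoryPlain { fun createExtended() = ... }`
    // looks to Java: a singleton held in INSTANCE, with an instance method.
    static final class DemoFactoryPlain {
        public static final DemoFactoryPlain INSTANCE = new DemoFactoryPlain();
        private DemoFactoryPlain() {}
        public String createExtended() { return "extended"; }
    }

    // Roughly how the same object looks once the method carries @JvmStatic:
    // the method becomes static, so no INSTANCE hop is needed.
    static final class DemoFactoryStatic {
        public static final DemoFactoryStatic INSTANCE = new DemoFactoryStatic();
        private DemoFactoryStatic() {}
        public static String createExtended() { return "extended"; }
    }

    public static void main(String[] args) {
        // Without @JvmStatic, Java must go through the singleton.
        String viaInstance = DemoFactoryPlain.INSTANCE.createExtended();
        // With @JvmStatic, the call reads like any Java static factory.
        String viaStatic = DemoFactoryStatic.createExtended();
        System.out.println(viaInstance.equals(viaStatic)); // prints "true"
    }
}
```

Both paths return the same value; the difference is only in what the Java call site has to spell out.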

The five commits

1. `1253b42f` — BOM adds skainet-backend-api

`skainet-bom/build.gradle.kts` gains one `api(project(":skainet-backends:skainet-backend-api"))` constraint, grouped with `skainet-backend-cpu` under the backend section. BOM still builds clean.

2. `25be9dc7` — @JvmStatic on StableHloConverterFactory

Every factory method gets `@JvmStatic`, and `createCustom` gets `@JvmOverloads` so every parameter default generates a JVM overload. The annotations live in `commonMain` — Kotlin 1.9+ accepts JVM-specific annotations in common code and treats them as no-ops on non-JVM targets. Verified across jvmTest, wasmJsTest, wasmJsBrowserTest, wasmWasiTest, wasmWasiNodeTest, macosArm64Test, and iosSimulatorArm64Test.
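
To make the `@JvmOverloads` effect concrete, here is a minimal Java sketch of the overload set the Kotlin compiler generates for a function with two defaulted parameters. The signature is an assumption chosen for illustration: the real parameters of `createCustom` are not shown in this PR, and `name` / `strict` are hypothetical.

```java
// Java view of a hypothetical Kotlin declaration (NOT SKaiNET's real signature):
//   @JvmStatic @JvmOverloads
//   fun createCustom(name: String = "custom", strict: Boolean = false): String
class JvmOverloadsSketch {

    // The full-arity method always exists.
    static String createCustom(String name, boolean strict) {
        return name + (strict ? ":strict" : ":lenient");
    }

    // @JvmOverloads adds one overload per defaulted parameter,
    // dropping parameters from the end of the list.
    static String createCustom(String name) {
        return createCustom(name, false);
    }

    static String createCustom() {
        return createCustom("custom", false);
    }

    public static void main(String[] args) {
        System.out.println(createCustom());             // prints "custom:lenient"
        System.out.println(createCustom("fast"));       // prints "fast:lenient"
        System.out.println(createCustom("fast", true)); // prints "fast:strict"
    }
}
```

Because overloads only drop trailing parameters, a Java caller in this sketch cannot supply `strict` while leaving `name` at its default; Kotlin callers with named arguments can.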

3. `1ebd21b4` — @JvmStatic on TokenizerFactory

Same treatment for both factory entry points (`fromGguf(Map)`, `fromTokenizerJson(String)`). This is the canonical entry for the new Qwen byte-level BPE + SentencePiece tokenizers from #463 / #464, so it's a meaningful Java-side win: Qwen / LLaMA / Gemma / TinyLlama tokenization without Kotlin-specific interop glue.

4. `76cfea29` — @file:JvmName("TensorSpecs") on TensorSpecEncoding.kt

Java callers now see `TensorSpecs.getTensorEncoding(spec)` / `TensorSpecs.withTensorEncoding(spec, TensorEncoding.Q8_0.INSTANCE)` / `TensorSpecs.inferTensorEncoding(tensorData)`, a class name that echoes the `TensorSpec` receiver type of the extensions. Kotlin call sites stay unchanged (they go through the extension syntax either way). This is purely a JVM-side binary-name change.
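
The mapping from Kotlin extensions to Java statics can be sketched in plain Java: each extension becomes a static method on the facade class, with the receiver passed as the first argument, and an extension property becomes a `get...` method. The types below are hypothetical stand-ins (`Spec`, `Specs`, a string encoding, a metadata map); how `TensorSpec` actually stores its encoding is an assumption made only for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; this is not SKaiNET's TensorSpec implementation.
class JvmNameFacadeSketch {

    // Immutable spec: a name plus read-only metadata (assumed storage model).
    static final class Spec {
        final String name;
        final Map<String, Object> metadata;

        Spec(String name, Map<String, Object> metadata) {
            this.name = name;
            this.metadata = Collections.unmodifiableMap(new HashMap<>(metadata));
        }
    }

    // Plays the role of the class named by @file:JvmName("...").
    static final class Specs {
        private Specs() {}

        // Kotlin `val Spec.encoding` surfaces in Java as getEncoding(spec).
        static Object getEncoding(Spec spec) {
            return spec.metadata.get("encoding");
        }

        // Kotlin `fun Spec.withEncoding(e)` surfaces as withEncoding(spec, e).
        // Returns a copy, so the source spec is left untouched.
        static Spec withEncoding(Spec spec, Object encoding) {
            Map<String, Object> copy = new HashMap<>(spec.metadata);
            copy.put("encoding", encoding);
            return new Spec(spec.name, copy);
        }
    }

    public static void main(String[] args) {
        Spec bare = new Spec("weights", new HashMap<>());
        Spec annotated = Specs.withEncoding(bare, "Q8_0");
        System.out.println(Specs.getEncoding(bare));      // prints "null"
        System.out.println(Specs.getEncoding(annotated)); // prints "Q8_0"
    }
}
```

Renaming the facade with `@file:JvmName` changes only the class name in front of these static calls; the method names and argument order stay the same.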

5. `d6e5f226` — ReleaseApiJavaTest

New JUnit5 Java test covering all three surfaces:

```java
// 1. Converter factory via idiomatic static form
StableHloConverter converter = StableHloConverterFactory.createExtended();

// 2. Tokenizer factory via idiomatic static form (error-path
//    invocation: the cleanest way to prove static dispatch without a
//    real GGUF fixture in the test classpath)
assertThrows(UnsupportedTokenizerException.class,
        () -> TokenizerFactory.fromGguf(Collections.emptyMap()));

// 3. TensorSpecs facade for the TensorEncoding helpers
TensorSpec annotated = TensorSpecs.withTensorEncoding(
        bare, TensorEncoding.Q8_0.INSTANCE);
assertSame(TensorEncoding.Q8_0.INSTANCE,
        TensorSpecs.getTensorEncoding(annotated));
```

`skainet-test-java/build.gradle.kts` gains `skainet-compile-hlo` and `skainet-io-core` on the test classpath so the new test can reference the factories and encoding helpers.

Test plan

- `./gradlew :skainet-bom:build`: green (BOM constraint resolves)
- `./gradlew :skainet-compile:skainet-compile-hlo:allTests -x kotlinWasmStoreYarnLock`: green across jvmTest, wasmJsTest, wasmJsBrowserTest, wasmWasiTest, wasmWasiNodeTest, macosArm64Test, iosSimulatorArm64Test
- `./gradlew :skainet-io:skainet-io-core:{jvmTest,compileKotlinWasmJs,macosArm64Test}`: green
- `./gradlew :skainet-lang:skainet-lang-core:{jvmTest,compileKotlinWasmJs,macosArm64Test}`: green
- `./gradlew :skainet-test:skainet-test-java:test`: green, including the 4 new tests in `ReleaseApiJavaTest` alongside the 3 pre-existing test classes
- CI: full multiplatform build

Net effect for Java consumers of 0.19.0

A Java app pulling in the 0.19.0 BOM can now:

```java
// Execute tensors (pre-existing)
var ctx = SKaiNET.context();
var t = SKaiNET.zeros(ctx, new int[]{2, 3}, DType.fp32());

// Export via StableHLO (0.19.0)
var converter = StableHloConverterFactory.createExtended();
var module = converter.convert(graph, "main");

// Tokenize for Qwen / LLaMA / Gemma / TinyLlama (0.19.0)
var tokenizer = TokenizerFactory.fromGguf(ggufFields);

// Read / write TensorEncoding metadata (0.19.0)
var encoding = TensorSpecs.getTensorEncoding(spec);
var annotated = TensorSpecs.withTensorEncoding(
        spec, TensorEncoding.Q8_0.INSTANCE);
```

No `.INSTANCE.` noise, no `TensorSpecEncodingKt` naming, no manual version pins for `skainet-backend-api`.

🤖 Generated with Claude Code

The neutral backend API module landed in #470 as the integration seam for future backends (IREE, Metal, NPU, the NNAPI-Amlogic sibling repo), but it was never added to the BOM's version-alignment constraints. Java / JVM consumers that depend on the BOM were therefore not getting a pinned version for `skainet-backend-api`, so anyone referencing the module from a Maven / Gradle project had to either spell out the version manually or drop the BOM reliance for that coordinate.

Adding the missing `api(project(":skainet-backends:skainet-backend-api"))` constraint groups it with `skainet-backend-cpu` under the backend section. BOM still builds clean.

First of five commits polishing the Java / JVM consumption story for the upcoming 0.19.0 release. See #400.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds `@JvmStatic` to every factory method on the `StableHloConverterFactory` object (`createBasic`, `createExtended`, `createFast`, `createCustom`) plus `@JvmOverloads` on `createCustom` so every parameter default generates a separate JVM overload.

Before: Java call sites had to go through the Kotlin singleton marker: `var converter = StableHloConverterFactory.INSTANCE.createExtended();`

After: Java callers can use the idiomatic static form: `var converter = StableHloConverterFactory.createExtended();`

The `@JvmStatic` annotation lives in `commonMain`: Kotlin 1.9+ accepts JVM-specific annotations in common code and treats them as no-ops on non-JVM targets. Verified across all Kotlin Multiplatform targets (jvmTest, wasmJsTest, wasmJsBrowserTest, wasmWasiTest, wasmWasiNodeTest, macosArm64Test, iosSimulatorArm64Test) with zero regressions.

Second of five commits polishing the Java / JVM consumption story for the upcoming 0.19.0 release. See #400.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds `@JvmStatic` to both factory entry points on the `TokenizerFactory` object (`fromGguf(Map)`, `fromTokenizerJson(String)`). Same motivation as `StableHloConverterFactory` in the previous commit: without the annotation, Java consumers had to navigate through the Kotlin object's `INSTANCE` marker: `var tokenizer = TokenizerFactory.INSTANCE.fromGguf(ggufFields);`

With the annotation they get the idiomatic static form: `var tokenizer = TokenizerFactory.fromGguf(ggufFields);`

The factory is the canonical entry point for the new Qwen byte-level BPE + SentencePiece tokenizers that landed in #463 and #464, so this is a meaningful win for Java consumers of the upcoming 0.19.0 release: they get Qwen / Llama / Gemma / TinyLlama tokenization without any Kotlin-specific interop glue.

Verified across jvmTest, compileKotlinWasmJs, and macosArm64Test for skainet-io-core, with no regressions.

Third of five commits polishing the Java / JVM consumption story for the upcoming 0.19.0 release. See #400.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds `@file:JvmName("TensorSpecs")` to `skainet-lang-core/.../tensor/ops/TensorSpecEncoding.kt`. The file declares three top-level extensions used to read and write the `TensorEncoding` metadata that #469 plumbed onto `TensorSpec`: `tensorEncoding`, `withTensorEncoding`, and `inferTensorEncoding`.

Top-level extensions in Kotlin compile to static methods on a synthetic class named after the source file, by default `TensorSpecEncodingKt`. Java call sites ended up looking like:

- `TensorEncoding encoding = TensorSpecEncodingKt.getTensorEncoding(spec);`
- `TensorSpec annotated = TensorSpecEncodingKt.withTensorEncoding(spec, TensorEncoding.Q8_0.INSTANCE);`
- `TensorEncoding data = TensorSpecEncodingKt.inferTensorEncoding(tensorData);`

With `@file:JvmName("TensorSpecs")` they become:

- `TensorEncoding encoding = TensorSpecs.getTensorEncoding(spec);`
- `TensorSpec annotated = TensorSpecs.withTensorEncoding(spec, TensorEncoding.Q8_0.INSTANCE);`
- `TensorEncoding data = TensorSpecs.inferTensorEncoding(tensorData);`

Kotlin call sites are unaffected (they use the extension syntax either way): `spec.tensorEncoding` and `spec.withTensorEncoding(TensorEncoding.Q8_0)` still work unchanged. This is purely a JVM-side binary-name change.

Verified with jvmTest, compileKotlinWasmJs, and macosArm64Test on skainet-lang-core, with no regressions.

Fourth of five commits polishing the Java / JVM consumption story for the upcoming 0.19.0 release. See #400.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New JUnit5 Java test in skainet-test-java exercising each of the three Kotlin surfaces polished in the earlier commits of this branch for Java-first-citizenship:

- `StableHloConverterFactory.createBasic/Extended/Fast()`: must be reachable via the idiomatic `Factory.create*()` static form, never through `Factory.INSTANCE.create*()`. The test is effectively a compile-time smoke check: if someone drops the `@JvmStatic` annotations it fails to compile before any assertion runs.
- `TokenizerFactory.fromGguf(Map)` / `fromTokenizerJson(String)`: same pattern. Passing empty inputs exercises the error path (`UnsupportedTokenizerException`), which is the cleanest way to prove static dispatch without needing a real GGUF fixture in the test classpath.
- `TensorSpecs` (the new JvmName-bound class for `TensorSpecEncoding.kt`): `getTensorEncoding` / `withTensorEncoding` called via `TensorSpecs.<name>(spec, ...)` in Java syntax. Verifies the round-trip of `TensorEncoding.Q8_0.INSTANCE` and confirms `withTensorEncoding` does not mutate the source spec.

Adds skainet-compile-hlo and skainet-io-core to the Java test module's `testImplementation` classpath so the new test can reference the factories and encoding helpers. Existing Java tests (SKaiNETTest, ModelBuilderTest, TensorJavaOpsTest) are untouched.

Verified: `./gradlew :skainet-test:skainet-test-java:test` green, with all 3 pre-existing tests plus the 4 new tests in ReleaseApiJavaTest.

Fifth and final commit polishing the Java / JVM consumption story for the upcoming 0.19.0 release. See #400.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-13T15:03:20Z

📖 Documentation Preview

The documentation has been built successfully for this PR.

Generated Files:

Operator documentation: docs/modules/operators/_generated_/
JSON schema output: operators.json

Artifacts:

Download the documentation-preview-498 artifact to view the complete documentation locally.

This comment will be updated automatically when the PR is updated.

michalharakal and others added 5 commits April 13, 2026 16:47

michalharakal merged commit f69444b into develop Apr 13, 2026
7 checks passed

michalharakal deleted the feature/400-jvm-java-release-polish branch April 13, 2026 15:01

michalharakal mentioned this pull request Apr 13, 2026

Make JVM/Java first class citizen with SKaiNET #400

Open
