Merged
5 changes: 1 addition & 4 deletions docs/modules/ROOT/nav.adoc
@@ -2,16 +2,13 @@

.Tutorials
* xref:tutorials/java-getting-started.adoc[Java getting started]
-* xref:tutorials/kllama-getting-started.adoc[KLlama getting started]
* xref:tutorials/hlo-getting-started.adoc[StableHLO getting started]
* xref:tutorials/graph-dsl.adoc[Graph DSL]

.How-to guides
* xref:how-to/build.adoc[Build from source]
* xref:how-to/io-readers.adoc[Load models (GGUF, SafeTensors, ONNX)]
-* xref:how-to/java-cli-app.adoc[Build a Java CLI app]
-* xref:how-to/java-llm-inference.adoc[Run LLM inference]
-* xref:how-to/java-model-training.adoc[Train a model]
+* xref:how-to/java-model-training.adoc[Train a model from Java]
* xref:how-to/arduino-c-codegen.adoc[Generate C for Arduino]

.Reference
6 changes: 3 additions & 3 deletions docs/modules/ROOT/pages/explanation/perf/jvm-cpu.adoc
@@ -20,12 +20,12 @@ Source files:

===== Prerequisites

-* JDK 21{plus} (JDK 22 toolchain configured by Gradle)
-* Gradle will pass required JVM flags:
+* JDK 21{plus} (CI builds on JDK 25)
+* Gradle passes the required JVM flags automatically:
** `--enable-preview`
** `--add-modules jdk.incubator.vector`
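
For builds outside this repository, the same two flags can be wired up by hand; a minimal Gradle Kotlin DSL sketch using only standard Gradle task types (nothing below is taken from this project's actual build scripts):

[source,kotlin]
----
// build.gradle.kts — apply the preview and Vector API flags to
// compilation, test execution, and any JavaExec run tasks.
val previewFlags = listOf("--enable-preview", "--add-modules", "jdk.incubator.vector")

tasks.withType<JavaCompile>().configureEach {
    options.compilerArgs.addAll(previewFlags)
}
tasks.withType<Test>().configureEach {
    jvmArgs(previewFlags)
}
tasks.withType<JavaExec>().configureEach {
    jvmArgs(previewFlags)
}
----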

-For Java 25-specific performance advantages, see link:java-25-cpu-backend.md[Java 25 CPU Backend].
+For JDK 25-specific performance advantages, see xref:explanation/perf/java-25-cpu-backend.adoc[Java 25 CPU Backend notes].

===== Feature flags

4 changes: 2 additions & 2 deletions docs/modules/ROOT/pages/how-to/io-readers.adoc
@@ -20,7 +20,7 @@ Add the following dependencies to your `build.gradle.kts`:
[source,kotlin]
----
dependencies {
-    implementation("sk.ainet.core:skainet-io-gguf:0.5.0")
+    implementation("sk.ainet:skainet-io-gguf:0.19.0")
    implementation("org.jetbrains.kotlinx:kotlinx-io-core:0.8.2")
}
----
@@ -30,7 +30,7 @@ dependencies {
[source,kotlin]
----
dependencies {
-    implementation("sk.ainet.core:skainet-io-onnx:0.5.0")
+    implementation("sk.ainet:skainet-io-onnx:0.19.0")
    implementation("org.jetbrains.kotlinx:kotlinx-io-core:0.8.2")
    implementation("pro.streem.pbandk:pbandk-runtime:0.16.0")
}
311 changes: 22 additions & 289 deletions docs/modules/ROOT/pages/how-to/java-cli-app.adoc
@@ -1,289 +1,22 @@
== Building a Java CLI App with KLlama

This guide walks you through creating a standalone Java 21{plus} command-line application that loads a LLaMA model and generates text using the KLlama library.

=== Prerequisites

* *JDK 21 or later* (required for Vector API and virtual threads)
* *Maven 3.8{plus}* or *Gradle 8.4{plus}*
* A GGUF model file (e.g., https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF[TinyLlama-1.1B-Chat GGUF])

'''''

=== Project Setup

==== Maven

Create a `pom.xml`:

[source,xml]
----
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example</groupId>
    <artifactId>kllama-cli</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <properties>
        <maven.compiler.source>21</maven.compiler.source>
        <maven.compiler.target>21</maven.compiler.target>
        <skainet.version>0.13.0</skainet.version>
    </properties>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>sk.ainet</groupId>
                <artifactId>skainet-bom</artifactId>
                <version>${skainet.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <dependencies>
        <!-- LLaMA inference -->
        <dependency>
            <groupId>sk.ainet</groupId>
            <artifactId>skainet-kllama-jvm</artifactId>
        </dependency>

        <!-- CPU backend (SIMD-accelerated) -->
        <dependency>
            <groupId>sk.ainet</groupId>
            <artifactId>skainet-backend-cpu-jvm</artifactId>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.11.0</version>
                <configuration>
                    <source>21</source>
                    <target>21</target>
                    <compilerArgs>
                        <arg>--enable-preview</arg>
                    </compilerArgs>
                </configuration>
            </plugin>

            <!-- Run with: mvn compile exec:java -Dexec.args="..." -->
            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>exec-maven-plugin</artifactId>
                <version>3.1.0</version>
                <configuration>
                    <mainClass>com.example.KLlamaCli</mainClass>
                    <jvmArgs>
                        <jvmArg>--enable-preview</jvmArg>
                        <jvmArg>--add-modules</jvmArg>
                        <jvmArg>jdk.incubator.vector</jvmArg>
                    </jvmArgs>
                </configuration>
            </plugin>

            <!-- Fat JAR for distribution -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.5.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals><goal>shade</goal></goals>
                        <configuration>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>com.example.KLlamaCli</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
----

==== Gradle

Create a `build.gradle` (Groovy DSL):

[source,groovy]
----
plugins {
    id 'java'
    id 'application'
}

java {
    toolchain {
        languageVersion = JavaLanguageVersion.of(21)
    }
}

repositories {
    mavenCentral()
}

dependencies {
    implementation platform('sk.ainet:skainet-bom:0.13.0')
    implementation 'sk.ainet:skainet-kllama-jvm'
    implementation 'sk.ainet:skainet-backend-cpu-jvm'
}

application {
    mainClass = 'com.example.KLlamaCli'
    applicationDefaultJvmArgs = [
        '--enable-preview',
        '--add-modules', 'jdk.incubator.vector'
    ]
}

tasks.withType(JavaCompile).configureEach {
    options.compilerArgs.add('--enable-preview')
}
----

'''''

=== Source Code

Create `src/main/java/com/example/KLlamaCli.java`:

[source,java]
----
package com.example;

import sk.ainet.apps.kllama.java.GenerationConfig;
import sk.ainet.apps.kllama.java.KLlamaJava;
import sk.ainet.apps.kllama.java.KLlamaSession;
import java.nio.file.Path;

public class KLlamaCli {

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("Usage: kllama-cli <model.gguf> \"<prompt>\" [maxTokens] [temperature]");
            System.exit(1);
        }

        Path modelPath = Path.of(args[0]);
        String prompt = args[1];
        int maxTokens = args.length > 2 ? Integer.parseInt(args[2]) : 128;
        float temperature = args.length > 3 ? Float.parseFloat(args[3]) : 0.8f;

        GenerationConfig config = GenerationConfig.builder()
                .maxTokens(maxTokens)
                .temperature(temperature)
                .build();

        System.out.println("Loading model from " + modelPath + " ...");

        try (KLlamaSession session = KLlamaJava.loadGGUF(modelPath)) {
            // Stream tokens to stdout as they are generated
            session.generate(prompt, config, token -> System.out.print(token));
            System.out.println();
        }
    }
}
----

'''''

=== Building and Running

==== With Maven

[source,bash]
----
# Run directly
mvn compile exec:java -Dexec.args="model.gguf 'Once upon a time' 128 0.7"

# Build fat JAR
mvn package

# Run from JAR
java --enable-preview --add-modules jdk.incubator.vector \
    -jar target/kllama-cli-1.0-SNAPSHOT.jar \
    model.gguf "Once upon a time" 128 0.7
----

==== With Gradle

[source,bash]
----
# Run directly
./gradlew run --args="model.gguf 'Once upon a time' 128 0.7"

# Build distribution
./gradlew installDist

# Run from distribution
./build/install/kllama-cli/bin/kllama-cli \
    model.gguf "Once upon a time" 128 0.7
----

'''''

=== Loading SafeTensors Models

To load a HuggingFace model directory instead of GGUF, use `loadSafeTensors` and point to the directory containing `model.safetensors`, `config.json`, and `tokenizer.json`:

[source,java]
----
try (KLlamaSession session = KLlamaJava.loadSafeTensors(Path.of("./my-llama-model/"))) {
    session.generate("Hello", config, token -> System.out.print(token));
    System.out.println();
}
----

'''''

=== Async Generation

Use `generateAsync` to run generation on a virtual thread and get a `CompletableFuture`:

[source,java]
----
import java.util.concurrent.CompletableFuture;

try (KLlamaSession session = KLlamaJava.loadGGUF(modelPath)) {
    CompletableFuture<String> future = session.generateAsync(
        "Explain quantum computing in one sentence",
        GenerationConfig.builder().maxTokens(64).build()
    );

    // Do other work while generation runs...

    String result = future.join();
    System.out.println(result);
}
----

You can also compose futures:

[source,java]
----
session.generateAsync("Translate to French: Hello world")
    .thenAccept(translation -> System.out.println("Translation: " + translation))
    .exceptionally(ex -> { ex.printStackTrace(); return null; });
----
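
Because the result is a plain `CompletableFuture`, every standard `java.util.concurrent` composition pattern applies. A self-contained sketch using only the JDK — `fakeGenerate` is a hypothetical stand-in for `session.generateAsync`, so the example runs without a model:

[source,java]
----
import java.util.concurrent.CompletableFuture;

public class ComposeDemo {

    // Hypothetical stand-in for session.generateAsync(prompt):
    // completes on another thread, like real generation would.
    static CompletableFuture<String> fakeGenerate(String prompt) {
        return CompletableFuture.supplyAsync(() -> "echo: " + prompt);
    }

    public static void main(String[] args) {
        String result = fakeGenerate("Hello world")
                .thenApply(String::toUpperCase)   // post-process the completion
                .exceptionally(ex -> "fallback")  // recover instead of propagating
                .join();                          // block until done (fine in a CLI)
        System.out.println(result);
    }
}
----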

'''''

=== Next Steps

* link:java-llm-inference.md[Java LLM Inference Guide] — BERT embeddings, agent/tool-calling, and more.
* link:java-getting-started.md[Java Getting Started] — tensor operations, full Maven/Gradle setup.
* link:../skainet-apps/skainet-kllama/README.md[KLlama Library] — custom backends and Kotlin embedding.
= Build a Java CLI app with KLlama — moved
:description: LLM CLI content moved to SKaiNET-transformers on 2026-04-13.

[CAUTION]
====
**This how-to moved.** The KLlama Java CLI example described here
depends on `sk.ainet:skainet-kllama-jvm` and the
`sk.ainet.apps.kllama.java` package, both of which now live in the
sibling https://github.com/SKaiNET-developers/SKaiNET-transformers[`SKaiNET-transformers`]
repository. Mainline `SKaiNET` kept the engine layer only.
====

Start here instead:

* https://skainet-developers.github.io/SKaiNET-transformers/[`SKaiNET-transformers` documentation site]
* https://github.com/SKaiNET-developers/SKaiNET-transformers[`SKaiNET-transformers` GitHub]

For the **non-LLM Java entry point** (tensor ops, models, training
loops on the CPU backend), the original content from this page's
predecessor guide still applies — it just doesn't cover LLM
inference. See xref:tutorials/java-getting-started.adoc[Java getting started]
and xref:how-to/java-model-training.adoc[Train a model from Java].