@yiseungmi87 yiseungmi87 commented Dec 18, 2025

This PR introduces improved documentation for new users of SystemDS:

Added

  • quickstart_extended.md - Overview page linking installation and execution docs
  • release_install.md - Clean, updated installation guide for release users
  • source_install.md - Updated guide for building SystemDS from source
  • run_extended.md - Comprehensive execution guide (local, Spark, federated)
  • run.md - Slightly modified

Scope

  • Documentation-only changes
  • No changes to SystemDS code or runtime behavior
  • Existing run.md is intentionally left for compatibility

Purpose

These changes provide clearer onboarding for new SystemDS users and consolidate documentation into a consistent structure.

Let me know if adjustments are desired before merging.

Contributor

@janniklinde janniklinde left a comment

Thank you @yiseungmi87 for the good first PR. It seems quite clear and understandable so far.

I did not manage to set up SystemDS on Ubuntu by following only your guide, which should be the goal of an install guide, so please look into that. You can follow your guide in a clean Docker image to identify possible points of failure. Similarly, please check that no such weak points exist for the other operating systems (if you have Windows, maybe try the setup as a new user). Also, I noticed that cloning the SystemDS source code via GitHub Desktop on Windows can get stuck during the cloning process, so we should provide a solution for that (e.g., use the `git` CLI for cloning rather than the app). I have not yet tested the install on Windows or macOS but will do so once my current comments are resolved.


Download the official release archive from the Apache SystemDS website:

https://apache.org/dyn/closer.lua/systemds/

Contributor

Maybe rather point to https://systemds.apache.org/download

Comment on lines 59 to 73

### 3.1 Extract the Release

```bash
cd /path/to/install
tar -xvf systemds-<VERSION>.tar.gz
cd systemds-<VERSION>
```

### 3.2 Add SystemDS to PATH

```bash
export SYSTEMDS_ROOT=$(pwd)
export PATH="$SYSTEMDS_ROOT/bin:$PATH"
```

Contributor

I tried to follow the guide on Ubuntu 22.04 (I set up a fresh Docker image with Java, tar, and wget installed). After downloading and extracting the release, I got stuck with the `Unable to access jarfile` and `Invalid or corrupt jarfile` errors shown in the session below.

Docker image I tested on:

```dockerfile
FROM ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install -y \
        openjdk-17-jdk \
        ca-certificates \
        wget \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /opt

RUN wget https://dlcdn.apache.org/systemds/3.3.0/systemds-3.3.0-bin.tgz && \
    tar -xzf systemds-3.3.0-bin.tgz && \
    rm systemds-3.3.0-bin.tgz

CMD ["bash"]
```

```bash
root@9385e1a25ddd:/opt# ls
systemds-3.3.0-bin
root@9385e1a25ddd:/opt# cd systemds-3.3.0-bin
root@9385e1a25ddd:/opt/systemds-3.3.0-bin# java -version
openjdk version "17.0.17" 2025-10-21
OpenJDK Runtime Environment (build 17.0.17+10-Ubuntu-122.04)
OpenJDK 64-Bit Server VM (build 17.0.17+10-Ubuntu-122.04, mixed mode, sharing)
root@9385e1a25ddd:/opt/systemds-3.3.0-bin# export SYSTEMDS_ROOT=$(pwd)
root@9385e1a25ddd:/opt/systemds-3.3.0-bin# export PATH="$SYSTEMDS_ROOT/bin:$PATH"
root@9385e1a25ddd:/opt/systemds-3.3.0-bin# systemds -help
Help requested. Will exit after extended usage message!

Usage: /opt/systemds-3.3.0-bin/bin/systemds [-r] [SystemDS.jar] [-f] <dml-filename> [arguments] [-help]

    SystemDS.jar : Specify a custom SystemDS.jar file (this will be prepended
                   to the classpath
                   or fed to spark-submit
    -r           : Spawn a debug server for remote debugging (standalone and
                   spark driver only atm). Default port is 8787 - change within
                   this script if necessary. See SystemDS documentation on how
                   to attach a remote debugger.
    -f           : Optional prefix to the dml-filename for consistency with
                   previous behavior dml-filename : The script file to run.
                   This is mandatory unless running as a federated worker
                   (see below).
    arguments    : The arguments specified after the DML script are passed to
                   SystemDS. Specify parameters that need to go to
                   java/spark-submit by editing this run script.
    -help        : Print this usage message and SystemDS parameter info

Worker Usage: /opt/systemds-3.3.0-bin/bin/systemds [-r] WORKER [SystemDS.jar] <portnumber> [arguments] [-help]

    port         : The port to open for the federated worker.

Federated Monitoring Usage: /opt/systemds-3.3.0-bin/bin/systemds [-r] FEDMONITORING [SystemDS.jar] <portnumber> [arguments] [-help]

    port         : The port to open for the federated monitoring tool.

Set custom launch configuration by setting/editing SYSTEMDS_STANDALONE_OPTS
and/or SYSTEMDS_DISTRIBUTED_OPTS.

Set the environment variable SYSDS_DISTRIBUTED=1 to run spark-submit instead of
local java Set SYSDS_QUIET=1 to omit extra information printed by this run
script.

----------------------------------------------------------------------
Further help on SystemDS arguments:
Error: Unable to access jarfile org.apache.sysds.api.DMLScript

root@9385e1a25ddd:/opt/systemds-3.3.0-bin# cd ..
root@9385e1a25ddd:/opt# echo 'print("Hello World!")' > hello.dml
root@9385e1a25ddd:/opt# systemds -f hello.dml
###############################################################################
#  SYSTEMDS_ROOT= /opt/systemds-3.3.0-bin
#  SYSTEMDS_JAR_FILE= 
#  SYSDS_EXEC_MODE= singlenode
#  CONFIG_FILE= -config /opt/systemds-3.3.0-bin/conf/SystemDS-config.xml
#  LOG4JPROP= -Dlog4j.configuration=file:/opt/systemds-3.3.0-bin/conf/log4j.properties
#  HADOOP_HOME= /opt/systemds-3.3.0-bin/lib/hadoop
#
#  Running script hello.dml locally with opts: 
#  Executing command:    java -Xmx4g -Xms4g -Xmn400m    -Dlog4j.configuration=file:/opt/systemds-3.3.0-bin/conf/log4j.properties   -jar    -f hello.dml   -exec singlenode   -config /opt/systemds-3.3.0-bin/conf/SystemDS-config.xml   
###############################################################################
Error: Invalid or corrupt jarfile hello.dml
```

Comment on lines 48 to 55
It can be beneficial to enter these into your `~/.profile` or `~/.bashrc` for linux,
(but remember to change `$(pwd)` to the full folder path)
or your environment variables in windows to enable reuse between terminals and restarts.

```bash
echo 'export SYSTEMDS_ROOT='$(pwd) >> ~/.bashrc
echo 'export PATH=$SYSTEMDS_ROOT/bin:$PATH' >> ~/.bashrc
```

Contributor

@janniklinde janniklinde Dec 18, 2025

I would mention that in release_install as well. Otherwise, people who only follow the quickstart might get confused after restarting the terminal. Also, for prerequisites that are already covered in the install guides, reference them rather than repeating the same thing.

Contributor

It would be nice to mention that you can also add the bin folder to PATH. Then you can directly access your last local build through the CLI.

@yiseungmi87
Author

Thanks again for the detailed feedback @janniklinde !

In addition to the changes in this PR, I verified the guides end-to-end on clean OS environments (Windows, Ubuntu, macOS).
Following only the Quickstart and installation guides, I was able to successfully complete the setup from installation through running the basic Hello World example on all tested systems.

Also, I’ve addressed the remaining minor issues, such as Markdown rendering problems and duplicated explanations.

Happy to adjust further if anything is still unclear or fails in your setup.

Contributor

@janniklinde janniklinde left a comment

Thank you for the improvements, @yiseungmi87. I left you some more feedback on where you could further refine the quickstart guide.

Comment on lines +139 to +150
On some Ubuntu setups (including clean Docker images), running SystemDS directly may fail with `Invalid or corrupt jarfile hello.dml` Error. In this case, explicitly pass the SystemDS JAR shipped with the release.

Locate the JAR in the release root:
```bash
SYSTEMDS_JAR=$(find "$SYSTEMDS_ROOT" -maxdepth 1 -type f -name "systemds-*.jar" | head -n 1)
echo "Using SystemDS JAR: $SYSTEMDS_JAR"
```

Then run:
```bash
systemds "$SYSTEMDS_JAR" -f hello.dml
```

Contributor

The core issue is that the script requires an exported `SYSTEMDS_JAR_FILE`. Once that is exported, it should work with `systemds hello.dml`. Please fix this in 3.2.
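
A minimal sketch of what that fix could look like, assuming the release keeps its executable JAR directly in the release root under a name matching `systemds-*.jar` (the same pattern used in the workaround quoted above). `find_systemds_jar` is a hypothetical helper name for illustration and is not part of SystemDS:

```shell
# Hypothetical helper (not part of SystemDS): print the first
# systemds-*.jar located directly inside the directory given as $1.
find_systemds_jar() {
  find "$1" -maxdepth 1 -type f -name 'systemds-*.jar' | head -n 1
}

# Export the variable the launcher script reads (per the review comment).
# Assumption: SYSTEMDS_ROOT already points at the extracted release.
if [ -n "${SYSTEMDS_ROOT:-}" ]; then
  SYSTEMDS_JAR_FILE="$(find_systemds_jar "$SYSTEMDS_ROOT")"
  export SYSTEMDS_JAR_FILE
  echo "SYSTEMDS_JAR_FILE=$SYSTEMDS_JAR_FILE"
fi
```

With the variable exported in the same shell (or persisted in `~/.bashrc`), the expectation is that `systemds hello.dml` resolves the JAR without passing it explicitly.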

Comment on lines +42 to +54
### 2.2 Set Environment Variables

To run SystemDS from the command line, configure:
- `SYSTEMDS_ROOT` -> the extracted folder
- Add `%SYSTEMDS_ROOT%\bin` to your `PATH`

Example (PowerShell):
```bash
setx SYSTEMDS_ROOT "C:\path\to\systemds-<VERSION>"
setx PATH "$env:SYSTEMDS_ROOT\bin;$env:PATH"
```

Restart the terminal afterward.

Contributor

This step is unnecessary as we don't use the systemds script anyway


# 1. Install on Windows

First setup java and maven to compile the system note the java version is 17, we suggest using Java OpenJDK 17.

Contributor

Please be more consistent with uppercase/lowercase in general (e.g., java vs Java, windows vs Windows)

```bash
cd systemds
```

Finally if you want to run systemds from command line, add a SYSTEMDS_ROOT that points to the repository root, and add the bin folder to the path.

Contributor

That would not work anyway so leave this out

Comment on lines +44 to +91
# 2. Install on Ubuntu 22.04

First setup java and maven to compile the system note that the java version is 17.

```bash
sudo apt install openjdk-17-jdk
sudo apt install maven
```

Verify the install with:
```bash
java -version
mvn -version
```

This should return something like:
```bash
openjdk 17.0.11 2024-04-16
OpenJDK Runtime Environment Temurin-17.0.11+9 (build 17.0.11+9)
OpenJDK 64-Bit Server VM Temurin-17.0.11+9 (build 17.0.11+9, mixed mode, sharing)

Apache Maven 3.9.9 (8e8579a9e76f7d015ee5ec7bfcdc97d260186937)
Maven home: /home/usr/Programs/maven
Java version: 17.0.11, vendor: Eclipse Adoptium, runtime: /home/usr/Programs/jdk-17.0.11+9
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "6.8.0-59-generic", arch: "amd64", family: "unix"
```

#### Testing

R should be installed to run the test suite, since many tests are constructed to compare output with common R packages.
One option to install this is to follow the guide on the following link: <https://linuxize.com/post/how-to-install-r-on-ubuntu-20-04/>

At the time of writing the commands to install R 4.0.2 are:

```bash
sudo apt install dirmngr gnupg apt-transport-https ca-certificates software-properties-common
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/'
sudo apt install r-base
```

Optionally, you need to install the R dependencies for integration tests, like this:
(use `sudo` mode if the script couldn't write to local R library)

```bash
Rscript ./src/test/scripts/installDependencies.R
```

Contributor

Please check that this guide works for Ubuntu 24.04 as well (you can again check using docker)

Comment on lines +29 to +45
# 2. Set SYSTEMDS_ROOT and PATH

This step is required for both Release and Source-build installations. Run the following in the root directory of your SystemDS installation:

```bash
export SYSTEMDS_ROOT=$(pwd)
export PATH="$SYSTEMDS_ROOT/bin:$PATH"
```

It can be beneficial to persist these variables in your `~/.profile` or `~/.bashrc` (Linux/macOS) or as environment variables on Windows, so that SystemDS is available across terminal sessions. Make sure to replace the path below with the absolute path to your SystemDS installation.

```bash
echo 'export SYSTEMDS_ROOT=/absolute/path/to/systemds-<VERSION>' >> ~/.bashrc
echo 'export PATH="$SYSTEMDS_ROOT/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
```
---

Contributor

Redundant; just add one sentence or bullet point to the prerequisites to make sure the environment variables have been set according to the corresponding install guides.
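
If a quick self-check would help readers confirm that prerequisite, something along these lines might work. This is only a sketch: `check_systemds_env` is a hypothetical function name, and it assumes the install guides set `SYSTEMDS_ROOT` and add `$SYSTEMDS_ROOT/bin` to `PATH` exactly as in the hunk above.

```shell
# Hypothetical pre-flight check (not shipped with SystemDS).
# Returns non-zero if the environment from the install guide looks incomplete.
check_systemds_env() {
  if [ -z "${SYSTEMDS_ROOT:-}" ]; then
    echo "SYSTEMDS_ROOT is not set; see the install guide" >&2
    return 1
  fi
  # Wrap PATH in colons so the bin directory matches only as a whole entry.
  case ":$PATH:" in
    *":$SYSTEMDS_ROOT/bin:"*) ;;
    *)
      echo "$SYSTEMDS_ROOT/bin is not on PATH; see the install guide" >&2
      return 1
      ;;
  esac
  echo "SystemDS environment looks configured"
}
```

Calling `check_systemds_env` in a new terminal would then show whether the variables survived the restart.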

Run:

```bash
systemds -f hello.dml
```

Contributor

-f is not needed if set up correctly

Comment on lines +97 to +104
Execute the DML Script:
```bash
systemds -f scripts/algorithms/Univar-Stats.dml -nvargs \
X=data/haberman.data \
TYPES=data/types.csv \
STATS=data/univarOut.mtx \
CONSOLE_OUTPUT=TRUE
```

Contributor

Maybe add `<path-to-systemds>/scripts/algorithms/...` to make it work outside of that directory. Also, `-f` can be removed.

Comment on lines +116 to +129
### (Optional) Ubuntu Note: `NoClassDefFoundError` Error / JAR Resolution Issues
On some Ubuntu setups, executing the example may fail with a class loading error such as `NoClassDefFoundError: org/apache/commons/cli/AlreadySelectedException`. This happens when the SystemDS launcher script does not automatically resolve the correct executable JAR. In this case, explicitly pass the SystemDS JAR located in the release root directory:
```bash
SYSTEMDS_JAR=$(find "$SYSTEMDS_ROOT" -maxdepth 1 -type f -name "systemds-*.jar" | head -n 1)
echo "Using SystemDS JAR: $SYSTEMDS_JAR"
```
Then run the example again:
```bash
systemds "$SYSTEMDS_JAR" -f scripts/algorithms/Univar-Stats.dml -nvargs \
X=data/haberman.data \
TYPES=data/types.csv \
STATS=data/univarOut.mtx \
CONSOLE_OUTPUT=TRUE
```

Contributor

This should be resolved by exporting SYSTEMDS_JAR_FILE, so just point to that fix
