Docs: Add extended quickstart and installation guides (release + source) #2388
janniklinde
left a comment
Thank you @yiseungmi87 for the good first PR; it seems quite clear and understandable so far.
I did not manage to set up SystemDS on Ubuntu by following only your guide (which should be the goal of an install guide), so please have a look into that. You can use a clean Docker image to follow your guide and identify possible points of failure. Similarly, please check that no such weak points exist for the other operating systems (if you have Windows, maybe try the setup as a new user). Also, I noticed that cloning the SystemDS source code via GitHub Desktop on Windows can get stuck during the cloning process, so we should provide a workaround (e.g., use the `git` CLI for cloning rather than the app). I have not yet tested the install on Windows / macOS, but will do so once my current comments are resolved.
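The `git` CLI workaround mentioned above could look roughly like the sketch below. The helper name `clone_systemds` and its two optional arguments are hypothetical and only serve illustration; the default URL assumes the Apache SystemDS repository on GitHub.

```bash
# Sketch: clone the SystemDS sources with the git CLI instead of GitHub Desktop.
# clone_systemds is a hypothetical helper; both arguments are optional.
clone_systemds() {
  local repo_url="${1:-https://github.com/apache/systemds.git}"
  local target_dir="${2:-systemds}"
  git clone "$repo_url" "$target_dir"
}

# Usage:
#   clone_systemds && cd systemds
```

Calling the helper without arguments clones into a `systemds` folder in the current directory, which matches the `cd systemds` step used in the source install guide.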
docs/site/release_install.md
Outdated
| Download the official release archive from the Apache SystemDS website:
|
| https://apache.org/dyn/closer.lua/systemds/
Maybe rather point to https://systemds.apache.org/download
| ### 3.1 Extract the Release
|
| ```bash
| cd /path/to/install
| tar -xvf systemds-<VERSION>.tar.gz
| cd systemds-<VERSION>
| ```
|
| ### 3.2 Add SystemDS to PATH
|
| ```bash
| export SYSTEMDS_ROOT=$(pwd)
| export PATH="$SYSTEMDS_ROOT/bin:$PATH"
| ```
I tried to follow the guide on Ubuntu 22.04 (I set up a fresh Docker image with Java, tar, and wget installed). After downloading and extracting the release, I got stuck with the error shown below.
Docker image I tested on:
FROM ubuntu:22.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
    apt-get install -y \
        openjdk-17-jdk \
        ca-certificates \
        wget \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /opt
RUN wget https://dlcdn.apache.org/systemds/3.3.0/systemds-3.3.0-bin.tgz && \
    tar -xzf systemds-3.3.0-bin.tgz && \
    rm systemds-3.3.0-bin.tgz
CMD ["bash"]
root@9385e1a25ddd:/opt# ls
systemds-3.3.0-bin
root@9385e1a25ddd:/opt# cd systemds-3.3.0-bin
root@9385e1a25ddd:/opt/systemds-3.3.0-bin# java -version
openjdk version "17.0.17" 2025-10-21
OpenJDK Runtime Environment (build 17.0.17+10-Ubuntu-122.04)
OpenJDK 64-Bit Server VM (build 17.0.17+10-Ubuntu-122.04, mixed mode, sharing)
root@9385e1a25ddd:/opt/systemds-3.3.0-bin# export SYSTEMDS_ROOT=$(pwd)
root@9385e1a25ddd:/opt/systemds-3.3.0-bin# export PATH="$SYSTEMDS_ROOT/bin:$PATH"
root@9385e1a25ddd:/opt/systemds-3.3.0-bin# systemds -help
Help requested. Will exit after extended usage message!
Usage: /opt/systemds-3.3.0-bin/bin/systemds [-r] [SystemDS.jar] [-f] <dml-filename> [arguments] [-help]
SystemDS.jar : Specify a custom SystemDS.jar file (this will be prepended
to the classpath
or fed to spark-submit
-r : Spawn a debug server for remote debugging (standalone and
spark driver only atm). Default port is 8787 - change within
this script if necessary. See SystemDS documentation on how
to attach a remote debugger.
-f : Optional prefix to the dml-filename for consistency with
previous behavior dml-filename : The script file to run.
This is mandatory unless running as a federated worker
(see below).
arguments : The arguments specified after the DML script are passed to
SystemDS. Specify parameters that need to go to
java/spark-submit by editing this run script.
-help : Print this usage message and SystemDS parameter info
Worker Usage: /opt/systemds-3.3.0-bin/bin/systemds [-r] WORKER [SystemDS.jar] <portnumber> [arguments] [-help]
port : The port to open for the federated worker.
Federated Monitoring Usage: /opt/systemds-3.3.0-bin/bin/systemds [-r] FEDMONITORING [SystemDS.jar] <portnumber> [arguments] [-help]
port : The port to open for the federated monitoring tool.
Set custom launch configuration by setting/editing SYSTEMDS_STANDALONE_OPTS
and/or SYSTEMDS_DISTRIBUTED_OPTS.
Set the environment variable SYSDS_DISTRIBUTED=1 to run spark-submit instead of
local java Set SYSDS_QUIET=1 to omit extra information printed by this run
script.
----------------------------------------------------------------------
Further help on SystemDS arguments:
Error: Unable to access jarfile org.apache.sysds.api.DMLScript
root@9385e1a25ddd:/opt/systemds-3.3.0-bin# cd ..
root@9385e1a25ddd:/opt# echo 'print("Hello World!")' > hello.dml
root@9385e1a25ddd:/opt# systemds -f hello.dml
###############################################################################
# SYSTEMDS_ROOT= /opt/systemds-3.3.0-bin
# SYSTEMDS_JAR_FILE=
# SYSDS_EXEC_MODE= singlenode
# CONFIG_FILE= -config /opt/systemds-3.3.0-bin/conf/SystemDS-config.xml
# LOG4JPROP= -Dlog4j.configuration=file:/opt/systemds-3.3.0-bin/conf/log4j.properties
# HADOOP_HOME= /opt/systemds-3.3.0-bin/lib/hadoop
#
# Running script hello.dml locally with opts:
# Executing command: java -Xmx4g -Xms4g -Xmn400m -Dlog4j.configuration=file:/opt/systemds-3.3.0-bin/conf/log4j.properties -jar -f hello.dml -exec singlenode -config /opt/systemds-3.3.0-bin/conf/SystemDS-config.xml
###############################################################################
Error: Invalid or corrupt jarfile hello.dml
docs/site/run_extended.md
Outdated
| It can be beneficial to enter these into your `~/.profile` or `~/.bashrc` for linux,
| (but remember to change `$(pwd` to the full folder path)
| or your environment variables in windows to enable reuse between terminals and restarts.
|
| ```bash
| echo 'export SYSTEMDS_ROOT='$(pwd) >> ~/.bashrc
| echo 'export PATH=$SYSTEMDS_ROOT/bin:$PATH' >> ~/.bashrc
| ```
I would mention that in release_install as well. Otherwise, people who only follow the quickstart might get confused after restarting the terminal. Also, for prerequisites that are already covered in the install guides, reference them rather than repeating the same thing.
Would be nice to mention that you can also add the bin folder to PATH. Then you can directly access your last local build through CLI.
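One way to persist both variables without duplicating entries on repeated runs is sketched below. The helper name `persist_systemds_env` and its arguments are hypothetical; the sketch assumes a Bash-style rc file and that the given root directory (release folder or local source checkout) contains the `bin` folder.

```bash
# Sketch: append SYSTEMDS_ROOT and PATH exports to a shell rc file exactly once.
# persist_systemds_env is a hypothetical helper:
#   $1 = SystemDS root directory, $2 = rc file (default: ~/.bashrc).
persist_systemds_env() {
  local root="$1"
  local rc_file="${2:-$HOME/.bashrc}"
  # Skip if an earlier run already added the entry.
  if grep -qs 'export SYSTEMDS_ROOT=' "$rc_file"; then
    return 0
  fi
  {
    printf 'export SYSTEMDS_ROOT="%s"\n' "$root"
    # Written literally so that it expands when the rc file is sourced.
    printf 'export PATH="$SYSTEMDS_ROOT/bin:$PATH"\n'
  } >> "$rc_file"
}

# Usage (run from the SystemDS root, then reload the shell configuration):
#   persist_systemds_env "$(pwd)" && source ~/.bashrc
```

Passing the absolute path explicitly avoids the problem of `$(pwd)` being evaluated in the wrong directory later.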
Thanks again for the detailed feedback, @janniklinde! In addition to the changes in this PR, I verified the guides end-to-end on clean OS environments (Windows, Ubuntu, macOS). I have also addressed the remaining minor issues, such as Markdown rendering problems and duplicate explanations. Happy to adjust further if anything is still unclear or fails in your setup.
janniklinde
left a comment
Thank you for the improvements @yiseungmi87. I left some more feedback on where you could further refine the quickstart guide.
| On some Ubuntu setups (including clean Docker images), running SystemDS directly may fail with `Invalid or corrupt jarfile hello.dml` Error. In this case, explicitly pass the SystemDS JAR shipped with the release.
|
| Locate the JAR in the release root:
| ```bash
| SYSTEMDS_JAR=$(find "$SYSTEMDS_ROOT" -maxdepth 1 -type f -name "systemds-*.jar" | head -n 1)
| echo "Using SystemDS JAR: $SYSTEMDS_JAR"
| ```
|
| Then run:
| ```bash
| systemds "$SYSTEMDS_JAR" -f hello.dml
| ```
The core issue is that the script requires an exported `SYSTEMDS_JAR_FILE`. Once that is set, `systemds hello.dml` should work. Please fix this in section 3.2.
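A possible shape for that fix is sketched below. It reuses the `find` pattern from the PR's own snippet and therefore assumes the release JAR sits directly in the release root and matches `systemds-*.jar`; the helper name `export_systemds_jar` is hypothetical.

```bash
# Sketch: export SYSTEMDS_JAR_FILE so the launcher script can locate the JAR.
# export_systemds_jar is a hypothetical helper; $1 defaults to $SYSTEMDS_ROOT.
export_systemds_jar() {
  local root="${1:-${SYSTEMDS_ROOT:-.}}"
  local jar
  # Assumption: the JAR lies directly in the release root (as in the PR snippet).
  jar="$(find "$root" -maxdepth 1 -type f -name 'systemds-*.jar' | head -n 1)"
  if [ -z "$jar" ]; then
    echo "No systemds-*.jar found in $root" >&2
    return 1
  fi
  export SYSTEMDS_JAR_FILE="$jar"
}

# Usage:
#   export_systemds_jar && systemds hello.dml
```

Failing loudly when no JAR is found makes the cause visible before the launcher reports a less obvious error such as `Invalid or corrupt jarfile`.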
| ### 2.2 Set Evironment Variables
|
| To run SystemDS from the command line, configure:
| - `SYSTEMDS_ROOT`-> the extracted folder
| - Add `%SYSTEMDS_ROOT%\bin` to your `PATH`
|
| Example (PowerShell):
| ```bash
| setx SYSTEMDS_ROOT "C:\path\to\systemds-<VERSION>"
| setx PATH "$env:SYSTEMDS_ROOT\bin;$env:PATH"
| ```
|
| Restart the terminal afterward.
This step is unnecessary as we don't use the systemds script anyway
| # 1. Install on Windows
|
| First setup java and maven to compile the system note the java version is 17, we suggest using Java OpenJDK 17.
Please be more consistent with uppercase/lowercase in general (e.g., java vs Java, windows vs Windows)
| cd systemds
| ```
|
| Finally if you want to run systemds from command line, add a SYSTEMDS_ROOT that points to the repository root, and add the bin folder to the path.
That would not work anyway, so leave this out.
| # 2. Install on Ubuntu 22.04
|
| First setup java and maven to compile the system note that the java version is 17.
|
| ```bash
| sudo apt install openjdk-17-jdk
| sudo apt install maven
| ```
|
| Verify the install with:
| ```bash
| java -version
| mvn -version
| ```
|
| This should return something like:
| ```bash
| openjdk 17.0.11 2024-04-16
| OpenJDK Runtime Environment Temurin-17.0.11+9 (build 17.0.11+9)
| OpenJDK 64-Bit Server VM Temurin-17.0.11+9 (build 17.0.11+9, mixed mode, sharing)
|
| Apache Maven 3.9.9 (8e8579a9e76f7d015ee5ec7bfcdc97d260186937)
| Maven home: /home/usr/Programs/maven
| Java version: 17.0.11, vendor: Eclipse Adoptium, runtime: /home/usr/Programs/jdk-17.0.11+9
| Default locale: en_US, platform encoding: UTF-8
| OS name: "linux", version: "6.8.0-59-generic", arch: "amd64", family: "unix"
| ```
|
| #### Testing
|
| R should be installed to run the test suite, since many tests are constructed to compare output with common R packages.
| One option to install this is to follow the guide on the following link: <https://linuxize.com/post/how-to-install-r-on-ubuntu-20-04/>
|
| At the time of writing the commands to install R 4.0.2 are:
|
| ```bash
| sudo apt install dirmngr gnupg apt-transport-https ca-certificates software-properties-common
| sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
| sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/'
| sudo apt install r-base
| ```
|
| Optionally, you need to install the R dependencies for integration tests, like this:
| (use `sudo` mode if the script couldn't write to local R library)
|
| ```bash
| Rscript ./src/test/scripts/installDependencies.R
| ```
Please check that this guide works for Ubuntu 24.04 as well (you can again check using docker)
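A clean test image for that check could be generated along the lines of the sketch below, which mirrors the Ubuntu 22.04 Dockerfile used earlier in this review. The helper name `write_test_dockerfile` and the image tag in the usage note are hypothetical, and installing `git` is an assumption made so the sources can be cloned inside the container.

```bash
# Sketch: print a Dockerfile for a clean Ubuntu image with the source-build
# prerequisites from the guide (Java 17 and Maven).
# write_test_dockerfile is a hypothetical helper; $1 = Ubuntu version tag.
write_test_dockerfile() {
  local ubuntu_version="${1:-24.04}"
  cat <<EOF
FROM ubuntu:${ubuntu_version}
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \\
    apt-get install -y openjdk-17-jdk maven git && \\
    rm -rf /var/lib/apt/lists/*
WORKDIR /opt
CMD ["bash"]
EOF
}

# Usage:
#   write_test_dockerfile 24.04 > Dockerfile
#   docker build -t systemds-guide-test .
#   docker run --rm -it systemds-guide-test
```

Generating the file from a version argument makes it easy to repeat the same walkthrough of the guide on both 22.04 and 24.04.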
| # 2. Set SYSTEMDS_ROOT and PATH
|
| This step is required for both Release and Source-build installations. Run the following in the root directory of your SystemDS installation:
|
| ```bash
| export SYSTEMDS_ROOT=$(pwd)
| export PATH="$SYSTEMDS_ROOT/bin:$PATH"
| ```
|
| It can be beneficial to persist these variables in your `~/.profile` or `~/.bashrc`(Linux/macOS) or as environment variables on Windows, so that SystemDS is available across terminal sessions. Make sure to replace the path below with the absolute path to your SystemDS installation.
|
| ```bash
| echo 'export SYSTEMDS_ROOT=/absolute/path/to/systemds-<VERSION>' >> ~/.bashrc
| echo 'export PATH="$SYSTEMDS_ROOT/bin:$PATH"' >> ~/.bashrc
| source ~/.bashrc
| ```
| ---
Redundant; just add one sentence or bullet point to the prerequisites to make sure the environment variables have been set according to the corresponding install guides.
| Run:
|
| ```bash
| systemds -f hello.dml
`-f` is not needed if SystemDS is set up correctly.
| Execute the DML Script:
| ```bash
| systemds -f scripts/algorithms/Univar-Stats.dml -nvargs \
| X=data/haberman.data \
| TYPES=data/types.csv \
| STATS=data/univarOut.mtx \
| CONSOLE_OUTPUT=TRUE
| ```
Maybe add <path-to-systemds>/scripts/algorithms/... to make it work outside of that dir. Also -f can be removed
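A directory-independent variant of the example might look like the sketch below. It assumes that `SYSTEMDS_ROOT` points at the SystemDS folder containing `scripts/algorithms/`, which stands in for `<path-to-systemds>` here. The wrapper name `run_univar_stats` and the `SYSTEMDS_CMD` override are hypothetical and exist only to allow a dry run; the data paths remain relative to the current directory, as in the original example.

```bash
# Sketch: run Univar-Stats via an absolute script path and without -f.
# run_univar_stats is a hypothetical wrapper; SYSTEMDS_CMD is a hypothetical
# override (default: systemds) that allows a dry run with `echo`.
run_univar_stats() {
  local root="${SYSTEMDS_ROOT:?SYSTEMDS_ROOT must be set}"
  "${SYSTEMDS_CMD:-systemds}" "$root/scripts/algorithms/Univar-Stats.dml" -nvargs \
    X=data/haberman.data \
    TYPES=data/types.csv \
    STATS=data/univarOut.mtx \
    CONSOLE_OUTPUT=TRUE
}

# Dry run that only prints the assembled command line:
#   SYSTEMDS_CMD=echo run_univar_stats
```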
| ### (Optional) Ubuntu Note: `NoClassDefFoundError` Error / JAR Resolution Issues
| On some Ubuntu setups, executing the example may fail with a class loading error such as `NoClassDefFoundError: org/apache/commons/cli/AlreadySelectedException`. This happens when the SystemDS launcher script does not automatically resolve the correct executable JAR. In this case, explicitly pass the SystemDS JAR located in the release root directory:
| ```bash
| SYSTEMDS_JAR=$(find "$SYSTEMDS_ROOT" -maxdepth 1 -type f -name "systemds-*.jar" | head -n 1)
| echo "Using SystemDS JAR: $SYSTEMDS_JAR"
| ```
| Then run the example again:
| ```bash
| systemds "$SYSTEMDS_JAR" -f scripts/algorithms/Univar-Stats.dml -nvargs \
| X=data/haberman.data \
| TYPES=data/types.csv \
| STATS=data/univarOut.mtx \
| CONSOLE_OUTPUT=TRUE
| ```
This should be resolved by exporting SYSTEMDS_JAR_FILE, so just point to that fix
This PR introduces improved documentation for new users of SystemDS:
Added
- quickstart_extended.md - Overview page linking installation and execution docs
- release_install.md - Clean, updated installation guide for release users
- source_install.md - Updated guide for building SystemDS from source
- run_extended.md - Comprehensive execution guide (local, Spark, federated)
- run.md - Slightly modified
Scope
Purpose
These changes provide clearer onboarding for new SystemDS users and consolidate documentation into a consistent structure.
Let me know if adjustments are desired before merging.