41 changes: 41 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,41 @@
# Contributing

## Contributing Guidelines

Thank you for considering contributing to the ARX project! We welcome any and all contributions, no matter how big or small. Whether you're fixing a typo or refactoring the entire backend, your contributions are valuable to us.

To ensure a positive and inclusive environment, we kindly ask all contributors to adhere to the following guidelines:

### Code of Conduct

Please review and abide by our [Code of Conduct](./CODE_OF_CONDUCT.md) in all discussions and interactions related to ARX, both within and outside of GitHub. We strive to maintain a safe and respectful space for everyone involved.

## Getting Started

ARX is written in the [Rust](https://rust-lang.org/) programming language; familiarity with it is a prerequisite for writing code.

Before submitting any changes, please ensure they meet the following requirements:

* **It Builds:** the repository builds with a single `cargo build` run from the root folder
* **It's Formatted:** all code is formatted according to the default Rust formatting guidelines via `cargo fmt`
* **It's Linted:** `cargo clippy` reports no issues

Patches can be submitted either directly via GitHub or via email to [git@cebbinghaus.com](mailto:git@cebbinghaus.com).

## Documentation

If you would like to contribute to the ARX documentation, please ensure that your changes follow the guidelines outlined in [docs/README.md](docs/README.md).

## Project Layout

### /common
This is the core implementation of ARX. All of the primitives are defined here, along with many helpers. It will likely be published as a library eventually for programmatic use by other applications.

### /client
The client is the CLI used to generate archives and manage committing / restoring indexes to stores. It uses the [clap](https://github.com/clap-rs/clap) command-line parser library and deals primarily with the filesystem.

### /server
This is the HTTP server implementation, built on the [axum](https://github.com/tokio-rs/axum) framework and using Tokio's async runtime for high concurrent throughput. It is a lightweight wrapper around the common Store implementation that provides the ability to read and write archives.

### /docs
This directory contains all of the ARX documentation, as well as the RFCs defining the ARX system's behavior and data layouts.
85 changes: 24 additions & 61 deletions README.md
@@ -1,76 +1,39 @@
# ArtifactRepository
<div align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="docs/assets/banner-dark.svg">
<source media="(prefers-color-scheme: light)" srcset="docs/assets/banner-light.svg">
<img alt="arx icon of cube"
src="docs/assets/icon.svg"
width="70%">
</picture>
</div>

This project aims to create a service, command-line tool, and package/archive format allowing for the creation and distribution of "Artifacts". An Artifact is any kind of file structure with some metadata attached to it. It could represent anything, as long as it can be defined through a file hierarchy.
The home of the ARX archiving system. It includes the CLI, server, and documentation / designs.

## Design
---

The ArtifactRepository design is based on a SHA-512 Merkle tree, which serves as the foundation of the Artifact. Each file within an Artifact is content-addressed by its SHA-512 hash, and directories are stored in a format identical to Git's. In general, the object structure closely matches that of Git.
## Why ARX?

The very top level of an Artifact is called an **Index** (Git calls it a commit). It defines the artifact and contains all relevant metadata, such as the creation timestamp. Because metadata is stored in a simple key-value format similar to HTTP headers, arbitrary fields can be attached to an index with the same flexibility.
Some possibilities include:
* Git hash of the source that produced the output
* Version number of produced artifact
* Deployment configuration (Debug vs Release)
* GPG Signature
* Artifact type
* **Deduplicating:** ARX deduplicates all files being archived before they are compressed. This results in smaller archive files and no wasted bandwidth.

One key is required, however: the `tree` key, which holds the hash of the top-level tree that forms the root of the artifact. This tree can be traversed to discover the further trees and blobs that together make up the entire artifact.
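
As a rough illustration of the index described above, the following sketch parses an index into a key-value map and rejects one that lacks the `tree` key. The `key: value` line syntax and the name `parse_index` are assumptions made for this example; the text only says the format is similar to HTTP headers.

```rust
use std::collections::BTreeMap;

/// Parses an index written as `key: value` lines.
/// The line syntax is an assumption of this sketch, not a documented format.
/// Fails if a line has no `:` separator or the required `tree` key is absent.
pub fn parse_index(text: &str) -> Result<BTreeMap<String, String>, String> {
    let mut fields = BTreeMap::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue; // tolerate blank lines
        }
        // Split at the first `:` only, so values may themselves contain colons.
        let (key, value) = match line.split_once(':') {
            Some(pair) => pair,
            None => return Err(format!("malformed line: {}", line)),
        };
        fields.insert(key.trim().to_string(), value.trim().to_string());
    }
    if !fields.contains_key("tree") {
        return Err("missing required `tree` key".to_string());
    }
    Ok(fields)
}
```

Calling `parse_index` on text that contains a `tree` line returns the full map, while text without one yields an error, mirroring the rule that `tree` is the only mandatory key.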
* **Reliable/Resilient:** Built on the same architecture as Git, ARX provides data integrity guarantees, ensuring all extracted data is identical to the original.

Trees & Blobs are directly inherited from Git's design and would be interoperable if it weren't for the differing hash sizes.
* **Fast:** High-performance Rust and a simple core design keep archive generation fast and efficient.

## Benefits
## Quick Start

One of the key benefits of a Merkle tree approach very similar to Git's is the automatic deduplication that occurs. Since Artifacts are stored on the server as individual files indexed by their content, a file contained in multiple artifacts (for example, multiple copies of a library) is deduplicated amongst them all and only one copy is stored. This also works well for horizontal scaling, because multiple servers can operate on the exact same data store without running into conflicts (deletion can still cause problems, but that is not the primary focus).
// TODO: Link to Installation documentation / setup instructions

This deduplication extends all the way into the Artifact archive format, which, thanks to the hashes, also stores only a single copy of each file. This makes for a highly efficient archive format when many duplicate files are expected.
## Getting Help

## Client
// TODO: Create Community channel

The AR Client is a small command-line utility for creating Artifacts locally, either in the `.ar` archive format or as a local artifact store similar to the one in the server (useful when creating multiple artifacts on the same machine and deduplication is desired).
## Contributing

It supports uploading and downloading Indexes to/from the ArtifactRepository Server, which allows artifacts to be distributed amongst clients via the index hash.
See the [Contributing](CONTRIBUTING.md) guidelines.

## Server
For a detailed explanation of the ARX design and the archive format, see the [design](docs/designs/design.md) document.

The ArtifactRepository Server is a simple HTTP server that allows uploading & downloading artifacts through basic REST calls. Internally it stores them in its content-addressed store, which allows multiple servers to back onto the same data source, enabling horizontal scaling.
## License

One of the key functionalities of the server is that it allows exactly one upstream to be defined, which makes the server act in a sort of relay mode. Any artifacts uploaded to it are mirrored to the upstream, and any requested artifacts that are not present locally are queried against the upstream. This allows for multi-tiered caching through machine-local, region-local and global instances, which mirror data between them depending on where the data is required.
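
The relay behaviour described above can be sketched in a few lines. This is a minimal in-memory model, not the actual server: the `Store` and `Server` types, the use of string hashes as keys, and the choice to keep a local copy after an upstream fetch are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// In-memory stand-in for a content-addressed store: object hash -> bytes.
#[derive(Default)]
pub struct Store {
    objects: HashMap<String, Vec<u8>>,
}

/// A server with at most one upstream (hypothetical type for this sketch).
#[derive(Default)]
pub struct Server {
    local: Store,
    upstream: Option<Box<Server>>,
}

impl Server {
    /// Creates a server that relays to the given upstream.
    pub fn with_upstream(upstream: Server) -> Self {
        Server {
            local: Store::default(),
            upstream: Some(Box::new(upstream)),
        }
    }

    /// Stores the object locally and mirrors it to the upstream, if any.
    pub fn upload(&mut self, hash: &str, data: &[u8]) {
        self.local.objects.insert(hash.to_string(), data.to_vec());
        if let Some(up) = self.upstream.as_mut() {
            up.upload(hash, data);
        }
    }

    /// Serves from the local store, falling back to the upstream on a miss.
    /// Caching the fetched object locally is an assumption of this sketch.
    pub fn download(&mut self, hash: &str) -> Option<Vec<u8>> {
        if let Some(data) = self.local.objects.get(hash) {
            return Some(data.clone());
        }
        let data = self.upstream.as_mut()?.download(hash)?;
        self.local.objects.insert(hash.to_string(), data.clone());
        Some(data)
    }

    /// Reports whether the object is present in this server's own store.
    pub fn has_local(&self, hash: &str) -> bool {
        self.local.objects.contains_key(hash)
    }
}
```

Chaining `Server::with_upstream` calls models the machine-local, region-local and global tiers: a download walks up the chain until some tier holds the object, and an upload propagates all the way to the top.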

Another key consideration is the ability to back a server onto local file systems as well as S3-compatible object storage APIs for global replication and high availability.
## Artifact File Format

The artifact file format `.ar` is an archive format purpose-built for artifacts.

It is structured as follows:

| Data | Description |
| ------- | ----------------------------------- |
| [u8; 4] | header / magic number |
| [u8; 2] | compression method |
| [u8; N] | data (compressed with above method) |

The data is laid out as follows:

| Section | Description |
| ------------- | ------------- |
| [HEADER] | Archive header |
| [INDEX] | Index file |
| [BLOBS/TREES] | Collection of Blobs & Trees |

The HEADER must be laid out as follows:

| Data | Description |
| ---- | ----------- |
| [entry; N] | Entries |
| [u8; 1] | Null Terminator |

Each entry is laid out as follows:

| Data | Description |
| ------- | ----------- |
| [u8;64] | Hash |
| [u64] | Offset |
| [u64] | Length |

All objects within the data section should be stored in their uncompressed form, taken directly from the binary object records.
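
To make the entry layout from the table above concrete, here is a minimal sketch that encodes and decodes a single HEADER entry. The `HeaderEntry` type and its method names are hypothetical, and little-endian byte order is an assumption, since the table does not state one.

```rust
/// Size in bytes of one HEADER entry: 64-byte hash + u64 offset + u64 length.
pub const ENTRY_SIZE: usize = 64 + 8 + 8;

/// One HEADER entry (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HeaderEntry {
    pub hash: [u8; 64],
    pub offset: u64,
    pub length: u64,
}

/// Reads a u64 from the first 8 bytes of `bytes`.
/// Little-endian is an assumption; the format table does not state a byte order.
fn read_u64_le(bytes: &[u8]) -> u64 {
    let mut word = [0u8; 8];
    word.copy_from_slice(&bytes[..8]);
    u64::from_le_bytes(word)
}

impl HeaderEntry {
    /// Decodes one entry, or returns `None` if fewer than ENTRY_SIZE bytes are given.
    pub fn from_bytes(bytes: &[u8]) -> Option<HeaderEntry> {
        if bytes.len() < ENTRY_SIZE {
            return None;
        }
        let mut hash = [0u8; 64];
        hash.copy_from_slice(&bytes[..64]);
        Some(HeaderEntry {
            hash,
            offset: read_u64_le(&bytes[64..72]),
            length: read_u64_le(&bytes[72..80]),
        })
    }

    /// Encodes the entry in the same field order as the table: hash, offset, length.
    pub fn to_bytes(&self) -> [u8; ENTRY_SIZE] {
        let mut out = [0u8; ENTRY_SIZE];
        out[..64].copy_from_slice(&self.hash);
        out[64..72].copy_from_slice(&self.offset.to_le_bytes());
        out[72..80].copy_from_slice(&self.length.to_le_bytes());
        out
    }
}
```

A reader would call `HeaderEntry::from_bytes` repeatedly over the HEADER bytes and then use each entry's `offset` and `length` to slice the matching object out of the data. How the null terminator is distinguished from the start of another entry is not specified here, so the sketch leaves that out.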

A supplementary artifact format `.sar` is identical but does not require every blob/tree to be present. Only those listed in the HEADER are guaranteed to exist within the archive, which cuts down on the data transmitted when a server/client is only missing a small number of files.
ARX is distributed under the terms of the [GPL-2.0 License](https://github.com/ArtifactRepository/arx/blob/master/LICENSE) and derives from [Git](https://github.com/git/git)'s design, which is also licensed under GPL-2.0.