Merged
110 changes: 73 additions & 37 deletions docs/admin/administration.md

Large diffs are not rendered by default.

18 changes: 12 additions & 6 deletions docs/admin/common_tasks.md
@@ -1,6 +1,10 @@
# DAOS Common Tasks

This section describes some of the common tasks handled by admins at a high level. See [System Deployment](./deployment.md#system-deployment), [DAOS System Administration](./administration.md#daos-system-administration), and [Pool Operations](./pool_operations.md#pool-operations) for more detailed explanations about each step.
This section describes some of the common tasks handled by admins at a high level.
See [System Deployment](./deployment.md#system-deployment),
[DAOS System Administration](./administration.md#daos-system-administration),
and [Pool Operations](./pool_operations.md#pool-operations)
for more detailed explanations about each step.

## Single host setup with PMEM and NVMe

@@ -9,13 +13,12 @@ This section describes some of the common tasks handled by admins at a high leve
3. Install `daos-server` and `daos-client` RPMs.
4. Generate certificate files.
5. Copy one of the example configs from `utils/config/examples` to
`/etc/daos` and adjust it based on the environment. E.g., `mgmt_svc_replicas`,
`class`.
`/etc/daos` and adjust it based on the environment. E.g., `mgmt_svc_replicas`,
`class`.
6. Check that the directory where the log files will be created exists. E.g.,
`control_log_file`, `log_file` field in `engines` section.
`control_log_file`, `log_file` field in `engines` section.
7. Start `daos_server`.
8. Use `dmg config generate` to generate the config file that contains PMEM and
NVMe.
8. Use `dmg config generate` to generate the config file that contains PMEM and NVMe.
9. Define the certificate files in the server config.
10. Start server with the generated config file.
11. Check that the server is waiting for SCM format. Call `dmg storage format`.
@@ -62,6 +65,7 @@ server hosts.

1. Start DAOS server with PMEM + NVMe and format.
2. Create a pool with a size percentage. For example,

```
dmg pool create --size=50%
```
@@ -72,6 +76,7 @@ The percentage is applied to the usable space.
1. Start DAOS server on one host.
2. Create a file that specifies the server host in `/etc/daos`. It's usually
called `daos_control.yml`. Add the following:

```
hostlist:
- <server_host>
@@ -83,6 +88,7 @@ transport_config:
cert: /etc/daos/certs/admin.crt
key: /etc/daos/certs/admin.key
```

`server_host` is the hostname where the server is running. `group_name` is
usually `daos_server`. Match the `port` field defined in the server config.
Adjust `transport_config` accordingly.
64 changes: 29 additions & 35 deletions docs/admin/deployment.md
@@ -74,7 +74,7 @@ The configuration file location can be specified on the command line
(`/etc/daos/daos_server.yml`).

Parameter descriptions are specified in
[`daos_server.yml`](https://github.com/daos-stack/daos/blob/master/utils/config/daos_server.yml)
[daos\_server.yml](https://github.com/daos-stack/daos/blob/master/utils/config/daos_server.yml)
and example configuration files in the
[examples](https://github.com/daos-stack/daos/tree/master/utils/config/examples)
directory.
@@ -97,30 +97,30 @@ for the path specified through the -o option of the `daos_server` command
line, if unspecified then `/etc/daos/daos_server.yml` is used.

Refer to the example configuration file
[`daos_server.yml`](https://github.com/daos-stack/daos/blob/master/utils/config/daos_server.yml)
[daos\_server.yml](https://github.com/daos-stack/daos/blob/master/utils/config/daos_server.yml)
for latest information and examples.

#### MD-on-SSD Configuration

To enable MD-on-SSD, the Control-Plane-Metadata ('control_metadata') global section of the
To enable MD-on-SSD, the Control-Plane-Metadata (`control_metadata`) global section of the
configuration file
[`daos_server.yml`](https://github.com/daos-stack/daos/blob/master/utils/config/daos_server.yml)
[daos\_server.yml](https://github.com/daos-stack/daos/blob/master/utils/config/daos_server.yml)
needs to specify a persistent location to store control-plane specific metadata (which would be
stored on PMem in non MD-on-SSD mode). Either set 'control_metadata:path' to an existing (mounted)
local filesystem path or set 'control_metadata:device' to a storage partition which can be mounted
stored on PMem in non MD-on-SSD mode). Either set `control_metadata:path` to an existing (mounted)
local filesystem path or set `control_metadata:device` to a storage partition which can be mounted
and formatted by the control-plane during storage format. In the latter case when specifying a
device the path parameter value will be used as the mountpoint path.

The MD-on-SSD code path will only be used if it is explicitly enabled by specifying the new
'bdev_role' property for the NVMe storage tier(s) in the 'daos_server.yml' file. There are three
types of 'bdev_role': wal, meta, and data. Each role must be assigned to exactly one NVMe tier.
`bdev_role` property for the NVMe storage tier(s) in the `daos_server.yml` file. There are three
types of `bdev_role`: wal, meta, and data. Each role must be assigned to exactly one NVMe tier.
Depending on the number of NVMe SSDs per DAOS engine there may be one, two or three NVMe tiers with
different 'bdev_role' assignments.
different `bdev_role` assignments.
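
The "each role on exactly one tier" rule above can be checked mechanically. The following is a minimal sketch, not DAOS code: the function name `check_bdev_roles` and the list-of-lists input shape are hypothetical, chosen only to illustrate the rule as stated in this section.

```python
from collections import Counter

# The three role names listed in the text above.
ROLES = ("wal", "meta", "data")


def check_bdev_roles(tiers):
    """Return a list of problems for a proposed role layout.

    `tiers` is a hypothetical representation: one list of role names per
    NVMe tier, e.g. [["wal"], ["meta", "data"]].
    """
    counts = Counter(role for tier in tiers for role in tier)
    problems = []
    for role in ROLES:
        if counts[role] == 0:
            problems.append(f"role '{role}' is not assigned to any tier")
        elif counts[role] > 1:
            problems.append(f"role '{role}' is assigned to {counts[role]} tiers")
    for role in sorted(set(counts) - set(ROLES)):
        problems.append(f"unknown role '{role}'")
    return problems


if __name__ == "__main__":
    print(check_bdev_roles([["wal", "meta", "data"]]))   # single tier, all roles
    print(check_bdev_roles([["wal"], ["meta", "data"]]))  # two tiers
    print(check_bdev_roles([["wal"], ["data"]]))          # meta is missing
```

An empty result means the layout satisfies the rule; the server's own config validation remains the authority.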

For a complete server configuration file example enabling MD-on-SSD, see
[`daos_server_mdonssd.yml`](https://github.com/daos-stack/daos/blob/master/utils/config/daos_server.yml).
[daos\_server\_mdonssd.yml](https://github.com/daos-stack/daos/blob/master/utils/config/daos_server.yml).

Below are four different 'daos_server.yml' storage configuration snippets that represent scenarios
Below are four different `daos_server.yml` storage configuration snippets that represent scenarios
for a DAOS engine with four NVMe SSDs and MD-on-SSD enabled.


@@ -148,7 +148,6 @@ This example shows the typical use case for a DAOS server with a small number of
only four or five NVMe SSDs per engine, it is natural to assign all three roles to all NVMe SSDs
configured as a single NVMe tier.


2. Two NVMe tiers, one SSD assigned wal role (tier-1) and three SSDs assigned both meta and data
roles (tier-2):

@@ -254,11 +253,11 @@ engine but maybe practical with a larger number of SSDs and so illustrated here
DAOS can attempt to produce a server configuration file that makes optimal use of hardware on a
given set of hosts either through the `dmg` or `daos_server` tools.

To generate an MD-on-SSD configurations set both '--control-metadata-path' and '--use-tmpfs-scm'
To generate an MD-on-SSD configurations set both `--control-metadata-path` and `--use-tmpfs-scm`
options as detailed below. Note that due to the number of variables considered when generating a
configuration automatically the result may not be the most optimal in all situations.

##### Generating Configuration File Using daos_server Tool
##### Generating Configuration File Using daos\_server Tool

To generate a configuration file for a single storage server, run the `daos_server config generate`
command locally. In this case, the `daos_server` service should not be running on the local host.
@@ -399,7 +398,7 @@ storage tier. The RAM-disk sizes will be calculated based on the host's total me
by `/proc/meminfo`).

- `--control-metadata-path` specifies a persistent location to store control-plane metadata which
allows MD-on-SSD DAOS deployments to survive without data loss over 'daos_server' restarts. If
allows MD-on-SSD DAOS deployments to survive without data loss over `daos_server` restarts. If
this option is set then a MD-on-SSD config will be generated.

- `--fabric-ports` enables custom port numbers to be assigned to each engine's fabric settings.
@@ -844,7 +843,7 @@ To set the addresses of which DAOS Servers to task, provide either:
- `-l <hostlist>` on the commandline when invoking, or

- `hostlist: <hostlist>` in the control configuration file
[`daos_control.yml`](https://github.com/daos-stack/daos/blob/master/utils/config/daos_control.yml)
[daos_control.yml](https://github.com/daos-stack/daos/blob/master/utils/config/daos_control.yml)

Where `<hostlist>` represents a slurm-style hostlist string e.g.
`foo-1[28-63],bar[256-511]`.
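
To make the hostlist notation concrete, here is a minimal sketch of how such a string expands into individual host names. It is an illustration only, not the parser used by `dmg`: the function name `expand_hostlist` is hypothetical, and the sketch assumes at most one bracketed group per host pattern and no port suffixes.

```python
import re


def expand_hostlist(hostlist):
    """Expand a slurm-style hostlist string into a list of host names."""
    hosts = []
    # Split on commas that are not inside a [...] group.
    for token in re.split(r",(?![^\[]*\])", hostlist):
        match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", token)
        if match is None:
            hosts.append(token)  # plain host name, nothing to expand
            continue
        prefix, body, suffix = match.groups()
        for part in body.split(","):
            low, _, high = part.partition("-")
            high = high or low  # a single number such as "7"
            width = len(low)    # keep leading zeros, e.g. "01-03"
            for number in range(int(low), int(high) + 1):
                hosts.append(f"{prefix}{str(number).zfill(width)}{suffix}")
    return hosts


if __name__ == "__main__":
    expanded = expand_hostlist("foo-1[28-63],bar[256-511]")
    print(len(expanded), expanded[0], expanded[-1])
```

For the example string in the text, the first pattern covers `foo-128` through `foo-163` and the second covers `bar256` through `bar511`.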
@@ -978,8 +977,8 @@ network.
`daos_server (nvme|scm) scan` can be used to query storage on the local host directly.

!!! note
'daos_server' commands will refuse to run if a process with the same name exists (e.g. as a
systemd service under the 'daos_server' userid).
`daos_server` commands will refuse to run if a process with the same name exists (e.g. as a
systemd service under the `daos_server` userid).

NVMe SSDs no longer need to be made accessible first by running `daos_server nvme prepare`,
`daos_server nvme scan` will take the necessary steps to prepare the devices unless `--skip-prep`
@@ -990,7 +989,7 @@ To use an alternative driver with SPDK, set `--disable-vfio` in the nvme prepare
fallback to using UIO user-space driver with SPDK instead.

!!! note
If UIO user-space driver is used instead of VFIO, 'daos_server' needs to be run as root.
If UIO user-space driver is used instead of VFIO, `daos_server` needs to be run as root.

The output will be equivalent to running `dmg storage scan --verbose` remotely.

@@ -1035,7 +1034,7 @@ manual reset to do so.

!!! warning
Due to [SPDK issue 2926](https://github.com/spdk/spdk/issues/2926), if VMD is enabled and
PCI_ALLOWED list is set to a subset of available VMD controllers (as specified in the server
PCI\_ALLOWED list is set to a subset of available VMD controllers (as specified in the server
config file) then the backing devices of the unselected VMD controllers will be bound to no
driver and therefore inaccessible from both OS and SPDK. Workaround is to run
`daos_server nvme scan --ignore-config` to reset driver bindings for all VMD controllers.
@@ -1279,7 +1278,7 @@ For class == "nvme", the following parameters should be populated:
- `bdev_list` should be populated with NVMe PCI addresses.
- `bdev_roles` optionally specifies a list of roles for this tier.
By default, the DAOS server will assign roles to bdev tiers
automatically, so the bdev_roles directive is only needed when that
automatically, so the `bdev_roles` directive is only needed when that
assignment doesn't match your use case.

When "dcpm" is used for the first tier, this list should be omitted or
@@ -1293,7 +1292,7 @@ For class == "nvme", the following parameters should be populated:
will assign them. Otherwise all roles must be assigned to a tier.

See the sample configuration file
[`daos_server.yml`](https://github.com/daos-stack/daos/blob/master/utils/config/daos_server.yml)
[daos\_server.yml](https://github.com/daos-stack/daos/blob/master/utils/config/daos_server.yml)
and example configuration files in the
[examples](https://github.com/daos-stack/daos/tree/master/utils/config/examples)
directory for more details.
@@ -1303,22 +1302,14 @@ To use an alternative driver with SPDK, set `disable_vfio: true` in the global s
server config file to fallback to using UIO user-space driver with SPDK instead.

!!! note
If UIO user-space driver is used instead of VFIO, 'daos_server' needs to be run as root.
If UIO user-space driver is used instead of VFIO, `daos_server` needs to be run as root.

If VMD is enabled on a host, its usage will be enabled by default meaning that the `bdev_list`
device addresses will be interpreted as VMD endpoints and storage scan will report the details of
the physical NVMe backing devices that belong to each VMD endpoint. To disable the use of VMD on a
VMD-enabled host, set `disable_vmd: true` in the global section of the config to fallback to using
physical NVMe devices only.

!!! warning
If upgrading from DAOS 2.0 to a greater version, the old 'enable_vmd' server config file
parameter is no longer honored and instead should be removed (or replaced by
`disable_vmd: true` if VMD is to be explicitly disabled).

Otherwise 'daos_server' may fail config validation and not start after an update from 2.0 to a
greater version.

#### Example Configurations

To illustrate, assume a cluster with homogeneous hardware configurations that
@@ -1498,7 +1489,8 @@ This configuration yields the fastest access to that network device.
Information about the network configuration is stored as metadata on the DAOS
storage.

If, after initial deployment, the provider must be changed, please follow the directions to [`change fabric provider`](https://github.com/daos-stack/daos/blob/master/docs/admin/common_tasks.md#change-fabric-provider-on-a-daos-system).
If, after initial deployment, the provider must be changed, please follow the directions to
[change fabric provider](https://github.com/daos-stack/daos/blob/master/docs/admin/common_tasks.md#change-fabric-provider-on-a-daos-system).

#### Provider Testing

@@ -1547,10 +1539,10 @@ per four target threads, for example `targets: 16` and `nr_xs_helpers: 4`.
The server should have sufficiently many physical cores to support the
number of targets plus the additional service threads.

The 'targets:' and 'nr_xs_helpers:' requirement are mandatory, if the number
The `targets:` and `nr_xs_helpers:` requirement are mandatory, if the number
of physical cores are not enough it will fail the starting of the daos engine
(notes that 2 cores reserved for system service), or configures with ENV
"DAOS_TARGET_OVERSUBSCRIBE=1" to force starting daos engine (possibly hurts
`DAOS_TARGET_OVERSUBSCRIBE=1` to force starting daos engine (possibly hurts
performance as multiple XS compete on same core).
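
The core requirement can be sketched as simple arithmetic. This is an illustration under stated assumptions, not the engine's actual check: the text does not give the exact accounting, so the formula `targets + nr_xs_helpers + 2` and the function names below are assumptions made for this example.

```python
import os

# Assumption: two cores are set aside for system services, per the note above.
RESERVED_SYSTEM_CORES = 2


def required_cores(targets, nr_xs_helpers):
    """Cores needed under the assumed accounting: targets + helpers + reserved."""
    return targets + nr_xs_helpers + RESERVED_SYSTEM_CORES


def can_start_engine(physical_cores, targets, nr_xs_helpers, env=None):
    """Return True if the engine would be allowed to start in this sketch."""
    env = os.environ if env is None else env
    # The override named in the text skips the check, at a possible performance cost.
    if env.get("DAOS_TARGET_OVERSUBSCRIBE") == "1":
        return True
    return physical_cores >= required_cores(targets, nr_xs_helpers)


if __name__ == "__main__":
    # The example values from the text: targets: 16, nr_xs_helpers: 4.
    print(required_cores(16, 4))
    print(can_start_engine(20, 16, 4, env={}))
    print(can_start_engine(20, 16, 4, env={"DAOS_TARGET_OVERSUBSCRIBE": "1"}))
```

With the example values, a host would need 22 physical cores under this assumed accounting; with fewer, only the oversubscribe override lets the engine start.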


@@ -1742,7 +1734,9 @@ If you wish to use systemd with a development build, you must copy the Agent ser
file from `utils/systemd/` to `/usr/lib/systemd/system/`.
Then modify the `ExecStart` line to point to your Agent configuration file:

`ExecStart=/usr/bin/daos_agent -o <'path to agent configuration file/daos_agent.yml'>`
```
ExecStart=/usr/bin/daos_agent -o <'path to agent configuration file/daos_agent.yml'>
```

Once the service file is installed and `systemctl daemon-reload` has been run to
reload the configuration, the `daos_agent` can be started through systemd
6 changes: 3 additions & 3 deletions docs/admin/env_variables.md
@@ -49,7 +49,7 @@ Environment variables in this section only apply to the server side.
|DAOS\_SCHED\_RELAX\_MODE|The mode of CPU relaxing on idle. "disabled":disable relaxing; "net":wait on network request for INTVL; "sleep":sleep for INTVL. STRING. Default to "net"|
|DAOS\_SCHED\_RELAX\_INTVL|CPU relax interval in milliseconds. INTEGER. Default to 1 ms.|
|DAOS\_STRICT\_SHUTDOWN|Use the strict mode when shutting down engines. BOOL. Default to 0. In the strict mode, when certain resource leaks are detected, for instance, the engine will raise an assertion failure.|
|DAOS\_DTX\_AGG\_THD\_CNT|DTX aggregation count threshold. The valid range is [2^20, 2^24]. The default value is 2^19*7.|
|DAOS\_DTX\_AGG\_THD\_CNT|DTX aggregation count threshold. The valid range is [2^20, 2^24]. The default value is 2^19.|
|DAOS\_DTX\_AGG\_THD\_AGE|DTX aggregation age threshold in seconds. The valid range is [210, 1830]. The default value is 630.|
|DAOS\_DTX\_RPC\_HELPER\_THD|DTX RPC helper threshold. The valid range is [18, unlimited). The default value is 513.|
|DAOS\_DTX\_BATCHED\_ULT\_MAX|The max count of DTX batched commit ULTs. The valid range is [0, unlimited). 0 means to commit DTX synchronously. The default value is 32.|
@@ -80,10 +80,10 @@ Environment variables in this section only apply to the client side.

|Variable |Description|
|------------|-----------|
|D\_LOG\_FILE|DAOS debug logs (both server and client) are written to stdout by default. The debug location can be modified by setting this environment variable ("D\_LOG\_FILE=/tmp/daos_debug.log").|
|D\_LOG\_FILE|DAOS debug logs (both server and client) are written to stdout by default. The debug location can be modified by setting this environment variable (`D\_LOG\_FILE=/tmp/daos_debug.log`).|
|D\_LOG\_FILE\_APPEND\_PID|If set and not 0, causes the main PID to be appended at the end of D\_LOG\_FILE path name (both server and client).|
|D\_LOG\_STDERR\_IN\_LOG|If set and not 0, causes stderr messages to be merged in D\_LOG\_FILE.|
|D\_LOG\_SIZE|DAOS debug logs (both server and client) have a 1GB file size limit by default. When this limit is reached, the current log file is closed and renamed with a .old suffix, and a new one is opened. This mechanism will repeat each time the limit is reached, meaning that available saved log records could be found in both ${D_LOG_FILE} and last generation of ${D_LOG_FILE}.old files, to a maximum of the most recent 2*D_LOG_SIZE records. This can be modified by setting this environment variable ("D_LOG_SIZE=536870912"). Sizes can also be specified in human-readable form using `k`, `m`, `g`, `K`, `M`, and `G`. The lower-case specifiers are base-10 multipliers and the upper case specifiers are base-2 multipliers.|
|D\_LOG\_SIZE|DAOS debug logs (both server and client) have a 1GB file size limit by default. When this limit is reached, the current log file is closed and renamed with a .old suffix, and a new one is opened. This mechanism will repeat each time the limit is reached, meaning that available saved log records could be found in both ${D_LOG_FILE} and last generation of ${D_LOG_FILE}.old files, to a maximum of the most recent `2*D_LOG_SIZE` records. This can be modified by setting this environment variable ("D_LOG_SIZE=536870912"). Sizes can also be specified in human-readable form using `k`, `m`, `g`, `K`, `M`, and `G`. The lower-case specifiers are base-10 multipliers and the upper case specifiers are base-2 multipliers.|
|D\_LOG\_FLUSH|Allows to specify a non-default logging level where flushing will occur. By default, only levels above WARN will cause an immediate flush instead of buffering.|
|D\_LOG\_TRUNCATE|By default log is appended. But if set this variable will cause log to be truncated upon first open and logging start.|
|DD\_SUBSYS |Used to specify which subsystems to enable. DD\_SUBSYS can be set to individual subsystems for finer-grained debugging ("DD\_SUBSYS=vos"), multiple facilities ("DD\_SUBSYS=bio,mgmt,misc,mem"), or all facilities ("DD\_SUBSYS=all") which is also the default setting. If a facility is not enabled, then only ERR messages or more severe messages will print.|
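
The size suffixes described for `D_LOG_SIZE` can be illustrated with a short sketch. It is not the DAOS parser: the function name `parse_log_size` is hypothetical, and it only encodes what the table states, namely that lower-case `k`, `m`, `g` are base-10 multipliers and upper-case `K`, `M`, `G` are base-2 multipliers.

```python
# Multipliers as described in the D_LOG_SIZE row above.
_MULTIPLIERS = {
    "k": 10**3, "m": 10**6, "g": 10**9,  # lower case: base-10
    "K": 2**10, "M": 2**20, "G": 2**30,  # upper case: base-2
}


def parse_log_size(value):
    """Convert a size string such as '512M' or '1g' into a byte count."""
    value = value.strip()
    if value and value[-1] in _MULTIPLIERS:
        return int(value[:-1]) * _MULTIPLIERS[value[-1]]
    return int(value)  # plain byte count, e.g. "536870912"


if __name__ == "__main__":
    for text in ("536870912", "512M", "1g", "1G"):
        print(text, parse_log_size(text))
```

Under this reading, `512M` is the same limit as the `536870912` example in the table, while `1g` and `1G` differ by about 7%.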