diff --git a/installing/installing_two_node_cluster/installing_tnf/install-post-tnf.adoc b/installing/installing_two_node_cluster/installing_tnf/install-post-tnf.adoc
index 68031e4ae08c..9390d7b08247 100644
--- a/installing/installing_two_node_cluster/installing_tnf/install-post-tnf.adoc
+++ b/installing/installing_two_node_cluster/installing_tnf/install-post-tnf.adoc
@@ -6,16 +6,14 @@ include::_attributes/common-attributes.adoc[]
 
 toc::[]
 
-:FeatureName: Two-node OpenShift cluster with fencing
-include::snippets/technology-preview.adoc[leveloffset=+1]
-
+[role="_abstract"]
 Use the following sections help you with recovering from issues in a two-node OpenShift cluster with fencing.
 
 // Manually recovering from a disruption event when automated recovery is unavailable
 include::modules/installation-manual-recovering-when-auto-recovery-is-unavail.adoc[leveloffset=+1]
 
 [role="_additional-resources"]
-== Additional resources
+.Additional resources
 
 * xref:../../../backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.adoc#backup-etcd-restoring_backing-up-etcd[Restoring etcd from a backup].
 
@@ -25,7 +23,7 @@ include::modules/installation-manual-recovering-when-auto-recovery-is-unavail.ad
 include::modules/installation-replacing-control-plane-nodes.adoc[leveloffset=+1]
 
 [role="_additional-resources"]
-== Additional resources
+.Additional resources
 
 * xref:../../../backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.adoc#backup-etcd-restoring_backing-up-etcd[Restoring etcd from a backup].
 
diff --git a/modules/installation-manual-recovering-when-auto-recovery-is-unavail.adoc b/modules/installation-manual-recovering-when-auto-recovery-is-unavail.adoc
index 72f9cc99da10..5c8ebd910508 100644
--- a/modules/installation-manual-recovering-when-auto-recovery-is-unavail.adoc
+++ b/modules/installation-manual-recovering-when-auto-recovery-is-unavail.adoc
@@ -1,7 +1,12 @@
+//Modules included in
+//
+// *installing_tnf/install-post-tnf.adoc
+
 :_mod-docs-content-type: PROCEDURE
 [id="installation-manual-recovering-when-auto-recovery-is-unavail_{context}"]
 = Manually recovering from a disruption event when automated recovery is unavailable
 
+[role="_abstract"]
 You might need to perform manual recovery steps if a disruption event prevents fencing from functioning correctly. In this case, you can run commands directly on the control plane nodes to recover the cluster. There are four main recovery scenarios, which should be attempted in the following order:
 
 . Update fencing secrets: Refresh the Baseboard Management Console (BMC) credentials if they are incorrect or outdated.
@@ -23,7 +28,7 @@ Do an etcd backup before proceeding to ensure that you can restore the cluster i
 
 . Update the fencing secrets:
 
-.. If the Cluster API is unavilable, update fencing secret by running the following command on one of the cluster nodes:
+.. If the Cluster API is unavailable, update fencing secret by running the following command on one of the cluster nodes:
 +
 [source,terminal]
 ----
@@ -32,7 +37,7 @@ $ sudo pcs stonith update _redfish username= password=-fencing
+$ oc edit secret fencing-credentials-
+----
++
+The secret contains the following data keys:
++
+.Data keys
+[cols="1,1,2",options="header"]
+|===
+| Key | Description | Changes during credential rotation?
+
+| `username`
+| BMC username
+| Yes
+
+| `password`
+| BMC password
+| Yes
+
+| `address`
+| Full Redfish URL (e.g., `redfish+https://192.168.1.10:443/redfish/v1/Systems/1`)
+| Only if BMC address changed
+
+| `certificateVerification`
+| `Disabled` or `Enabled`
+| Only if TLS settings changed
+
+|===
++
+[NOTE]
+====
+The `oc edit secret` command displays base64-encoded values, and any new values must also be base64-encoded before editing.
+====
++
+The following command avoids manual encoding:
++
+[source,terminal]
+----
+$ oc create secret generic \
+ --from-literal=username= \
+ --from-literal=password= \
+ --dry-run=client -o yaml | oc apply -f -
+----
++
+All four keys must be present. The cluster etcd Operator rejects secrets with missing keys.
+
+.. Verify that the new credentials can reach the BMC:
++
+[source,terminal]
+----
+$ sudo pcs stonith config _redfish
+----
++
+.. Verify that no STONITH resources are blocked:
++
+[source,terminal]
+----
+$ sudo pcs status --full
+----
++
+The cluster etcd Operator performs this validation automatically when it applies credentials from the secret by using the following command:
++
+[source,terminal]
+----
+`fence_redfish --action status`
 ----
 +
 If the cluster recovers after updating the fencing secrets, no further action is required. If the issue persists, proceed to the next step.
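The note in this patch says that `oc edit secret` displays base64-encoded values and expects new values in the same encoding. As a minimal sketch of that manual encoding step, assuming a POSIX shell with the `base64` utility available, the following encodes a value and checks the round trip. The username `admin` is a hypothetical placeholder, not a value taken from the patch.

```shell
# Hypothetical example value; substitute the real BMC username.
bmc_username='admin'

# printf '%s' avoids the trailing newline that echo would add,
# which would otherwise end up inside the encoded value.
encoded=$(printf '%s' "$bmc_username" | base64)
echo "$encoded"

# Decode again to confirm the round trip before pasting the value into the secret.
decoded=$(printf '%s' "$encoded" | base64 --decode)
echo "$decoded"
```

Using `printf '%s'` rather than `echo` matters here because a stray newline in the encoded username or password would likely cause authentication against the BMC to fail.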