@@ -6,16 +6,14 @@ include::_attributes/common-attributes.adoc[]

toc::[]

:FeatureName: Two-node OpenShift cluster with fencing
include::snippets/technology-preview.adoc[leveloffset=+1]

[role="_abstract"]
Use the following sections to help you recover from issues in a two-node OpenShift cluster with fencing.

// Manually recovering from a disruption event when automated recovery is unavailable
include::modules/installation-manual-recovering-when-auto-recovery-is-unavail.adoc[leveloffset=+1]

[role="_additional-resources"]
== Additional resources
.Additional resources

* xref:../../../backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.adoc#backup-etcd-restoring_backing-up-etcd[Restoring etcd from a backup].

@@ -25,7 +23,7 @@ include::modules/installation-manual-recovering-when-auto-recovery-is-unavail.ad
include::modules/installation-replacing-control-plane-nodes.adoc[leveloffset=+1]

[role="_additional-resources"]
== Additional resources
.Additional resources

* xref:../../../backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.adoc#backup-etcd-restoring_backing-up-etcd[Restoring etcd from a backup].

@@ -1,7 +1,12 @@
//Modules included in
//
// *installing_tnf/install-post-tnf.adoc

:_mod-docs-content-type: PROCEDURE
[id="installation-manual-recovering-when-auto-recovery-is-unavail_{context}"]
= Manually recovering from a disruption event when automated recovery is unavailable

[role="_abstract"]
You might need to perform manual recovery steps if a disruption event prevents fencing from functioning correctly. In this case, you can run commands directly on the control plane nodes to recover the cluster. There are four main recovery scenarios, which should be attempted in the following order:

. Update fencing secrets: Refresh the Baseboard Management Controller (BMC) credentials if they are incorrect or outdated.
@@ -23,7 +28,7 @@ Do an etcd backup before proceeding to ensure that you can restore the cluster i

. Update the fencing secrets:

.. If the Cluster API is unavilable, update fencing secret by running the following command on one of the cluster nodes:
.. If the Cluster API is unavailable, update the fencing secret by running the following command on one of the cluster nodes:
+
[source,terminal]
----
@@ -32,7 +37,7 @@ $ sudo pcs stonith update <node_name>_redfish username=<user_name> password=<pas
+
After the Cluster API recovers, or if the Cluster API is already available, update the fencing secret in the cluster to keep it in sync, as described in the following step.

.. Edit the username and password for the existing fencing secret for the control plane node by running the following commads:
.. Edit the username and password for the existing fencing secret for the control plane node by running the following commands:
+
[source,terminal]
----
@@ -41,7 +46,70 @@ $ oc project openshift-etcd
+
[source,terminal]
----
$ oc edit secret <node_name>-fencing
$ oc edit secret fencing-credentials-<node_name>
----
+
The secret contains the following data keys:
+
.Data keys
[cols="1,1,2",options="header"]
|===
| Key | Description | Changes during credential rotation?

| `username`
| BMC username
| Yes

| `password`
| BMC password
| Yes

| `address`
| Full Redfish URL (e.g., `redfish+https://192.168.1.10:443/redfish/v1/Systems/1`)
| Only if BMC address changed

| `certificateVerification`
| `Disabled` or `Enabled`
| Only if TLS settings changed

|===
+
[NOTE]
====
The `oc edit secret` command displays base64-encoded values. Any new values that you enter must also be base64-encoded.
====
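+
As a minimal sketch of manual encoding, the following commands encode a value and decode it again to confirm the round trip. The `new_password` value is a hypothetical placeholder, and the `-d` decode option assumes the GNU coreutils `base64` utility.
+
```shell
# Hypothetical example value; substitute the real BMC password.
new_password='example-password'

# printf avoids the trailing newline that echo would add to the encoded value.
encoded=$(printf '%s' "$new_password" | base64)
echo "$encoded"

# Decode the result to confirm the round trip before pasting it into the editor.
decoded=$(printf '%s' "$encoded" | base64 -d)
```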
+
The following command avoids manual encoding:
+
[source,terminal]
----
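$ oc create secret generic fencing-credentials-<node_name> \
--from-literal=username=<user> \
--from-literal=password=<password> \
--from-literal=address=<redfish_url> \
--from-literal=certificateVerification=<Disabled_or_Enabled> \
--dry-run=client -o yaml | oc apply -f -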
----
+
All four keys must be present. The cluster etcd Operator rejects secrets with missing keys.

.. Verify that the new credentials can reach the BMC:
+
[source,terminal]
----
$ sudo pcs stonith config <node_name>_redfish
----
+
.. Verify that no STONITH resources are blocked:
+
[source,terminal]
----
$ sudo pcs status --full
----
+
When the cluster etcd Operator applies credentials from the secret, it validates them automatically by running the following command:
+
[source,terminal]
----
fence_redfish --action status
----
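+
To run a comparable check by hand, the `address` value from the secret can be split into its parts first. The following sketch uses the example URL from the data keys table and prints a candidate command rather than running it. The mapping of URL parts to the `--ip`, `--ipport`, and `--systems-uri` options is an assumption, not something this procedure confirms; check `fence_redfish --help` on the node before relying on it.
+
```shell
# Hypothetical address value, copied from the example in the data keys table.
# This sketch assumes an explicit port and an IPv4 address or hostname.
address='redfish+https://192.168.1.10:443/redfish/v1/Systems/1'

rest=${address#*://}       # 192.168.1.10:443/redfish/v1/Systems/1
hostport=${rest%%/*}       # 192.168.1.10:443
systems_uri=/${rest#*/}    # /redfish/v1/Systems/1
host=${hostport%%:*}       # 192.168.1.10
port=${hostport##*:}       # 443

# Print the candidate command instead of running it; the option mapping is an assumption.
printf 'fence_redfish --ip %s --ipport %s --systems-uri %s --username <user_name> --password <password> --action status\n' \
  "$host" "$port" "$systems_uri"
```
+
If certificate verification is disabled for the BMC, the `--ssl-insecure` option of the fence agent is likely also needed.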
+
If the cluster recovers after updating the fencing secrets, no further action is required. If the issue persists, proceed to the next step.