feat: add rotate expiring certificates (#905)

FrankYang0529 · jillian-maroket · web-flow · commit 7e70bc25151a · 2025-11-11T11:16:31.000+08:00
* feat: add rotate expiring certificates

Signed-off-by: PoAn Yang <poan.yang@suse.com>

* Update docs/advanced/settings.md

Co-authored-by: Jillian Maroket <67180770+jillian-maroket@users.noreply.github.com>
Signed-off-by: PoAn Yang <yangpoan@gmail.com>

* Update docs/advanced/settings.md

Co-authored-by: Jillian Maroket <67180770+jillian-maroket@users.noreply.github.com>
Signed-off-by: PoAn Yang <yangpoan@gmail.com>

* Update docs/host/host.md

Co-authored-by: Jillian Maroket <67180770+jillian-maroket@users.noreply.github.com>
Signed-off-by: PoAn Yang <yangpoan@gmail.com>

* Update docs/host/host.md

Co-authored-by: Jillian Maroket <67180770+jillian-maroket@users.noreply.github.com>
Signed-off-by: PoAn Yang <yangpoan@gmail.com>

* address comment

Signed-off-by: PoAn Yang <poan.yang@suse.com>

---------

Signed-off-by: PoAn Yang <poan.yang@suse.com>
Signed-off-by: PoAn Yang <yangpoan@gmail.com>
Co-authored-by: Jillian Maroket <67180770+jillian-maroket@users.noreply.github.com>
diff --git a/docs/advanced/settings.md b/docs/advanced/settings.md
@@ -64,10 +64,12 @@ The following example adds disks that match the glob pattern `/dev/sd*` or `/dev
 
 **Definition**: Setting that allows you to automatically rotate certificates for RKE2 services. This setting is disabled by default.
 
-Use the field `expiringInHours` to specify the validity period of each certificate (`1` to `8759` hours). Harvester automatically replaces the certificate before the specified period ends.
+Use the field `expiringInHours` to specify the expiration threshold (`1` to `8759` hours). If a certificate expires within the specified period, Harvester automatically replaces it.
 
 For more information, see the **Certificate Rotation** section of the [Rancher](https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/manage-clusters/rotate-certificates) and [RKE2](https://docs.rke2.io/advanced#certificate-rotation) documentation.
 
+If your certificates have expired, you can [rotate them manually](../host/host.md#rotate-expired-certificates).
+
 **Default value**: `{"enable":false,"expiringInHours":240}`
 
 **Example**:
@@ -823,7 +825,7 @@ https://your.upgrade.checker-url/v99/checkupgrade
 - `restoreVM`: Option that enables Harvester to automatically restore previously running [non-migratable virtual machines](../vm/live-migration.md#non-migratable-virtual-machines) after the upgrade is *successfully* completed. You can specify either of the following values:
 
   - `true`: Harvester forcibly shuts down *running* and *paused* non-migratable virtual machines on each node. After the upgrade is completed, the previously running virtual machines are automatically restarted, while the paused virtual machines remain shut down.
-  
+
   - `false`: This is the default value. In multi-node environments, you must stop all non-migratable virtual machines to allow the upgrade to proceed. Harvester does not restart these virtual machines after the upgrade is completed.
 
   :::note
diff --git a/docs/host/host.md b/docs/host/host.md
@@ -589,3 +589,51 @@ You can configure the URL of the console for remote server management. This cons
 1. Click **Console** to access the remote server.
 
   ![](/img/remote_console_button.png)
+
+## Rotate Expired Certificates
+
+If the RKE2 certificates have expired, you cannot use the `auto-rotate-rke2-certificates` setting to rotate them. The setting only works when the cluster (`cluster.provisioning`) is marked `Ready`. You can check this condition using the following command:
+
+```
+> kubectl get cluster.provisioning -n fleet-local local -o yaml | yq -e '.status.conditions[] | select(.type=="Ready")'
+lastUpdateTime: "2025-10-22T06:41:33Z"
+status: "True"
+type: Ready
+```
+
+If the `status` field's value is `False`, you must manually rotate the certificates by following these steps on each node:
+
+1. Log in to the node using the root account.
+
+1. Stop the RKE2 service.
+
+    - Management nodes
+      ```
+      systemctl stop rke2-server
+      ```
+    - Worker nodes
+      ```
+      systemctl stop rke2-agent
+      ```
+
+1. Rotate the RKE2 certificates.
+
+    ```
+    /opt/rke2/bin/rke2 certificate rotate
+    ```
+
+1. Start the RKE2 service.
+
+    - Management nodes
+      ```
+      systemctl start rke2-server
+      ```
+    - Worker nodes
+      ```
+      systemctl start rke2-agent
+      ```
+
+1. Restart the `rancher-system-agent` service.
+    ```
+    systemctl restart rancher-system-agent
+    ```
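The manual steps added to `docs/host/host.md` above can be sketched as a single per-node script. This is only an illustration, not part of the change: the function names `rke2_unit`, `run`, and `rotate_node` and the variables `ROLE` and `APPLY` are hypothetical, while the unit names and the `rke2 certificate rotate` path are taken from the steps themselves. By default the script only prints the commands in order. It runs them only when `APPLY=1` is set, which assumes a root shell on the node.

```shell
#!/bin/sh
# Sketch only. ROLE and APPLY are hypothetical variables for this example.

# Map a node role to its RKE2 systemd unit (unit names come from the steps above).
rke2_unit() {
  case "$1" in
    management) echo "rke2-server" ;;
    worker)     echo "rke2-agent" ;;
    *)          echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# Print a command, and execute it only when APPLY=1.
run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

# Stop RKE2, rotate the certificates, start RKE2, then restart the agent.
# The chain stops at the first command that fails.
rotate_node() {
  unit="$(rke2_unit "$1")" || return 1
  run systemctl stop "$unit" &&
  run /opt/rke2/bin/rke2 certificate rotate &&
  run systemctl start "$unit" &&
  run systemctl restart rancher-system-agent
}

rotate_node "${ROLE:-management}"
```

Printing the plan first makes it easy to confirm that the role maps to the intended service before anything is stopped on the node.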
diff --git a/versioned_docs/version-v1.4/advanced/settings.md b/versioned_docs/version-v1.4/advanced/settings.md
@@ -64,10 +64,12 @@ The following example adds disks that match the glob pattern `/dev/sd*` or `/dev
 
 **Definition**: Setting that allows you to automatically rotate certificates for RKE2 services. This setting is disabled by default.
 
-Use the field `expiringInHours` to specify the validity period of each certificate (`1` to `8759` hours). Harvester automatically replaces the certificate before the specified period ends.
+Use the field `expiringInHours` to specify the expiration threshold (`1` to `8759` hours). If a certificate expires within the specified period, Harvester automatically replaces it.
 
 For more information, see the **Certificate Rotation** section of the [Rancher](https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/manage-clusters/rotate-certificates) and [RKE2](https://docs.rke2.io/advanced#certificate-rotation) documentation.
 
+If your certificates have expired, you can [rotate them manually](../host/host.md#rotate-expired-certificates).
+
 **Default value**: `{"enable":false,"expiringInHours":240}`
 
 **Example**:
@@ -78,7 +80,7 @@ For more information, see the **Certificate Rotation** section of the [Rancher](
 
 ### `backup-target`
 
-**Definition**: Custom backup target used to store VM backups. 
+**Definition**: Custom backup target used to store VM backups.
 
 For more information, see the [Longhorn documentation](https://longhorn.io/docs/1.6.0/snapshots-and-backups/backup-and-restore/set-backup-target/#set-up-aws-s3-backupstore).
 
@@ -122,7 +124,7 @@ https://172.16.0.1/v3/import/w6tp7dgwjj549l88pr7xmxb4x6m54v5kcplvhbp9vv2wzqrrjhr
 
 ### `containerd-registry`
 
-**Definition**: Configuration of a private registry created for the Harvester cluster. 
+**Definition**: Configuration of a private registry created for the Harvester cluster.
 
 The value is stored in the `registries.yaml` file of each node (path: `/etc/rancher/rke2/registries.yaml`). For more information, see [Containerd Registry Configuration](https://docs.rke2.io/install/private_registry) in the RKE2 documentation.
 
@@ -207,7 +209,7 @@ Changing this setting might cause single-node clusters to temporarily become una
 - Proxy URL for HTTPS requests: `"httpsProxy": "https://<username>:<pswd>@<ip>:<port>"`
 - Comma-separated list of hostnames and/or CIDRs: `"noProxy": "<hostname | CIDR>"`
 
-You must specify key information in the `noProxy` field if you configured the following options or settings: 
+You must specify key information in the `noProxy` field if you configured the following options or settings:
 
 | Configured option/setting | Required value in `noProxy` | Reason |
 | --- | --- | --- |
@@ -254,7 +256,7 @@ debug
 
 **Definition**: Setting that enables and disables the Longhorn V2 Data Engine.
 
-When set to `true`, Harvester automatically loads the kernel modules required by the Longhorn V2 Data Engine, and attempts to allocate 1024 × 2 MiB-sized huge pages (for example, 2 GiB of RAM) on all nodes. 
+When set to `true`, Harvester automatically loads the kernel modules required by the Longhorn V2 Data Engine, and attempts to allocate 1024 × 2 MiB-sized huge pages (for example, 2 GiB of RAM) on all nodes.
 
 Changing this setting automatically restarts RKE2 on all nodes but does not affect running virtual machine workloads.
 
@@ -263,7 +265,7 @@ Changing this setting automatically restarts RKE2 on all nodes but does not affe
 If you encounter error messages that include the phrase "not enough hugepages-2Mi capacity", allow some time for the error to be resolved. If the error persists, reboot the affected nodes.
 
 To disable the Longhorn V2 Data Engine on specific nodes (for example, nodes with less processing and memory resources), go to the **Hosts** screen and add the following label to the target nodes:
-  
+
 - label: `node.longhorn.io/disable-v2-data-engine`
 - value: `true`
 
@@ -308,7 +310,7 @@ Changes to the server address list are applied to all nodes.
 
 **Definition**: Percentage of physical compute, memory, and storage resources that can be allocated for VM use.
 
-Overcommitting is used to optimize physical resource allocation, particularly when VMs are not expected to fully consume the allocated resources most of the time. Setting values greater than 100% allows scheduling of multiple VMs even when physical resources are notionally fully allocated. 
+Overcommitting is used to optimize physical resource allocation, particularly when VMs are not expected to fully consume the allocated resources most of the time. Setting values greater than 100% allows scheduling of multiple VMs even when physical resources are notionally fully allocated.
 
 **Default values**: `{ "cpu":1600, "memory":150, "storage":200 }`
 
@@ -517,7 +519,7 @@ If you misconfigure this setting and are unable to access the Harvester UI and A
 
 **Supported options and values**:
 
-- `protocols`: Enabled protocols. 
+- `protocols`: Enabled protocols.
 - `ciphers`: Enabled ciphers.
 
 For more information about the supported options, see [`ssl-protocols`](https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#ssl-protocols) and [`ssl-ciphers`](https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#ssl-ciphers) in the Ingress-Nginx Controller documentation.
@@ -688,7 +690,7 @@ When the cluster is upgraded in the future, the contents of the `value` field ma
 
 **Versions**: v1.2.0 and later
 
-**Definition**: Additional namespaces that you can use when [generating a support bundle](../troubleshooting/harvester.md#generate-a-support-bundle). 
+**Definition**: Additional namespaces that you can use when [generating a support bundle](../troubleshooting/harvester.md#generate-a-support-bundle).
 
 By default, the support bundle only collects resources from the following predefined namespaces:
 
@@ -731,7 +733,7 @@ You can specify a value greater than or equal to 0. When the value is 0, Harvest
 
 **Versions**: v1.3.1 and later
 
-**Definition**: Number of minutes Harvester allows for collection of logs and configurations (Harvester) on the nodes for the support bundle. 
+**Definition**: Number of minutes Harvester allows for collection of logs and configurations (Harvester) on the nodes for the support bundle.
 
 If the collection process is not completed within the allotted time, Harvester still allows you to download the support bundle (without the uncollected data). You can specify a value greater than or equal to 0. When the value is 0, Harvester uses the default value.
 
@@ -772,7 +774,7 @@ https://your.upgrade.checker-url/v99/checkupgrade
 **Supported options and fields**:
 
 - `imagePreloadOption`: Options for the image preloading phase.
-  
+
   The full ISO contains the core operating system components and all required container images. Harvester can preload these container images to each node during installation and upgrades. When workloads are scheduled to management and worker nodes, the container images are ready to use.
 
 - `strategy`: Image preload strategy.
@@ -788,10 +790,10 @@ https://your.upgrade.checker-url/v99/checkupgrade
       If you decide to use `skip`, ensure that the following requirements are met:
 
       - You have a private container registry that contains all required images.
-      - Your cluster has high-speed internet access and is able to pull all images from Docker Hub when necessary. 
-        
+      - Your cluster has high-speed internet access and is able to pull all images from Docker Hub when necessary.
+
       Note any potential internet service interruptions and how close you are to reaching your [Docker Hub rate limit](https://www.docker.com/increase-rate-limits/). Failure to download any of the required images may cause the upgrade to fail and may leave the cluster in a middle state.
-    
+
       :::
 
     - `parallel` (**experimental**): Nodes preload images in batches. You can adjust this using the `concurrency` option.
@@ -841,7 +843,7 @@ https://your.upgrade.checker-url/v99/checkupgrade
 
 ### `vm-force-reset-policy`
 
-**Definition**: Setting that allows you to force rescheduling of a VM when the node that it is running on becomes unavailable. 
+**Definition**: Setting that allows you to force rescheduling of a VM when the node that it is running on becomes unavailable.
 
 When the state of the node changes to `Not Ready`, the VM is force deleted and rescheduled to an available node after the configured number of seconds.
 
diff --git a/versioned_docs/version-v1.4/host/host.md b/versioned_docs/version-v1.4/host/host.md
@@ -625,3 +625,42 @@ You can configure the URL of the console for remote server management. This cons
 1. Click **Console** to access the remote server.
 
   ![](/img/remote_console_button.png)
+
+## Rotate Expired Certificates
+
+If the RKE2 certificates have expired, you cannot use the `auto-rotate-rke2-certificates` setting to rotate them. The setting only works when the cluster (`cluster.provisioning`) is marked `Ready`. You can check this condition using the following command:
+
+```
+> kubectl get cluster.provisioning -n fleet-local local -o yaml | yq -e '.status.conditions[] | select(.type=="Ready")'
+lastUpdateTime: "2025-10-22T06:41:33Z"
+status: "True"
+type: Ready
+```
+If the `status` field's value is `False`, you must manually rotate the certificates by following these steps on each node:
+1. Log in to the node using the root account.
+1. Stop the RKE2 service.
+    - Management nodes
+      ```
+      systemctl stop rke2-server
+      ```
+    - Worker nodes
+      ```
+      systemctl stop rke2-agent
+      ```
+1. Rotate the RKE2 certificates.
+    ```
+    /opt/rke2/bin/rke2 certificate rotate
+    ```
+1. Start the RKE2 service.
+    - Management nodes
+      ```
+      systemctl start rke2-server
+      ```
+    - Worker nodes
+      ```
+      systemctl start rke2-agent
+      ```
+1. Restart the `rancher-system-agent` service.
+    ```
+    systemctl restart rancher-system-agent
+    ```
diff --git a/versioned_docs/version-v1.5/advanced/settings.md b/versioned_docs/version-v1.5/advanced/settings.md
@@ -64,10 +64,12 @@ The following example adds disks that match the glob pattern `/dev/sd*` or `/dev
 
 **Definition**: Setting that allows you to automatically rotate certificates for RKE2 services. This setting is disabled by default.
 
-Use the field `expiringInHours` to specify the validity period of each certificate (`1` to `8759` hours). Harvester automatically replaces the certificate before the specified period ends.
+Use the field `expiringInHours` to specify the expiration threshold (`1` to `8759` hours). If a certificate expires within the specified period, Harvester automatically replaces it.
 
 For more information, see the **Certificate Rotation** section of the [Rancher](https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/manage-clusters/rotate-certificates) and [RKE2](https://docs.rke2.io/advanced#certificate-rotation) documentation.
 
+If your certificates have expired, you can [rotate them manually](../host/host.md#rotate-expired-certificates).
+
 **Default value**: `{"enable":false,"expiringInHours":240}`
 
 **Example**:
diff --git a/versioned_docs/version-v1.5/host/host.md b/versioned_docs/version-v1.5/host/host.md
@@ -593,3 +593,42 @@ You can configure the URL of the console for remote server management. This cons
 1. Click **Console** to access the remote server.
 
   ![](/img/remote_console_button.png)
+
+## Rotate Expired Certificates
+
+If the RKE2 certificates have expired, you cannot use the `auto-rotate-rke2-certificates` setting to rotate them. The setting only works when the cluster (`cluster.provisioning`) is marked `Ready`. You can check this condition using the following command:
+
+```
+> kubectl get cluster.provisioning -n fleet-local local -o yaml | yq -e '.status.conditions[] | select(.type=="Ready")'
+lastUpdateTime: "2025-10-22T06:41:33Z"
+status: "True"
+type: Ready
+```
+If the `status` field's value is `False`, you must manually rotate the certificates by following these steps on each node:
+1. Log in to the node using the root account.
+1. Stop the RKE2 service.
+    - Management nodes
+      ```
+      systemctl stop rke2-server
+      ```
+    - Worker nodes
+      ```
+      systemctl stop rke2-agent
+      ```
+1. Rotate the RKE2 certificates.
+    ```
+    /opt/rke2/bin/rke2 certificate rotate
+    ```
+1. Start the RKE2 service.
+    - Management nodes
+      ```
+      systemctl start rke2-server
+      ```
+    - Worker nodes
+      ```
+      systemctl start rke2-agent
+      ```
+1. Restart the `rancher-system-agent` service.
+    ```
+    systemctl restart rancher-system-agent
+    ```
diff --git a/versioned_docs/version-v1.6/advanced/settings.md b/versioned_docs/version-v1.6/advanced/settings.md
@@ -64,10 +64,12 @@ The following example adds disks that match the glob pattern `/dev/sd*` or `/dev
 
 **Definition**: Setting that allows you to automatically rotate certificates for RKE2 services. This setting is disabled by default.
 
-Use the field `expiringInHours` to specify the validity period of each certificate (`1` to `8759` hours). Harvester automatically replaces the certificate before the specified period ends.
+Use the field `expiringInHours` to specify the expiration threshold (`1` to `8759` hours). If a certificate expires within the specified period, Harvester automatically replaces it.
 
 For more information, see the **Certificate Rotation** section of the [Rancher](https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/manage-clusters/rotate-certificates) and [RKE2](https://docs.rke2.io/advanced#certificate-rotation) documentation.
 
+If your certificates have expired, you can [rotate them manually](../host/host.md#rotate-expired-certificates).
+
 **Default value**: `{"enable":false,"expiringInHours":240}`
 
 **Example**:
@@ -823,7 +825,7 @@ https://your.upgrade.checker-url/v99/checkupgrade
 - `restoreVM`: Option that enables Harvester to automatically restore previously running [non-migratable virtual machines](../vm/live-migration.md#non-migratable-virtual-machines) after the upgrade is *successfully* completed. You can specify either of the following values:
 
   - `true`: Harvester forcibly shuts down *running* and *paused* non-migratable virtual machines on each node. After the upgrade is completed, the previously running virtual machines are automatically restarted, while the paused virtual machines remain shut down.
-  
+
   - `false`: This is the default value. In multi-node environments, you must stop all non-migratable virtual machines to allow the upgrade to proceed. Harvester does not restart these virtual machines after the upgrade is completed.
 
   :::note
diff --git a/versioned_docs/version-v1.6/host/host.md b/versioned_docs/version-v1.6/host/host.md
@@ -589,3 +589,42 @@ You can configure the URL of the console for remote server management. This cons
 1. Click **Console** to access the remote server.
 
   ![](/img/remote_console_button.png)
+
+## Rotate Expired Certificates
+
+If the RKE2 certificates have expired, you cannot use the `auto-rotate-rke2-certificates` setting to rotate them. The setting only works when the cluster (`cluster.provisioning`) is marked `Ready`. You can check this condition using the following command:
+
+```
+> kubectl get cluster.provisioning -n fleet-local local -o yaml | yq -e '.status.conditions[] | select(.type=="Ready")'
+lastUpdateTime: "2025-10-22T06:41:33Z"
+status: "True"
+type: Ready
+```
+If the `status` field's value is `False`, you must manually rotate the certificates by following these steps on each node:
+1. Log in to the node using the root account.
+1. Stop the RKE2 service.
+    - Management nodes
+      ```
+      systemctl stop rke2-server
+      ```
+    - Worker nodes
+      ```
+      systemctl stop rke2-agent
+      ```
+1. Rotate the RKE2 certificates.
+    ```
+    /opt/rke2/bin/rke2 certificate rotate
+    ```
+1. Start the RKE2 service.
+    - Management nodes
+      ```
+      systemctl start rke2-server
+      ```
+    - Worker nodes
+      ```
+      systemctl start rke2-agent
+      ```
+1. Restart the `rancher-system-agent` service.
+    ```
+    systemctl restart rancher-system-agent
+    ```