Replacing or Removing Devices
This document describes how to remove a storage device (disk) from a Rook-Ceph cluster managed by the Container Platform. Depending on whether the remaining OSDs have sufficient capacity to absorb the data from the disk being removed, you may need to add a replacement disk first.
Prerequisites
- All cluster components are functioning properly.
- The storage cluster was not created with the "add all empty disks" option. Verify that the CephCluster resource shows `useAllDevices: false`.
- Applicable to platform version 3.8 and above.
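One way to check the `useAllDevices` setting (a sketch, assuming the common Rook defaults of namespace `rook-ceph` and CephCluster name `rook-ceph`; adjust for your platform):

```shell
# Print the device-selection setting of the CephCluster resource
kubectl -n rook-ceph get cephcluster rook-ceph \
  -o jsonpath='{.spec.storage.useAllDevices}{"\n"}'
# The output must be: false
```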
Constraints and Limitations
- During data rebalancing, cluster performance may be temporarily degraded. Avoid operating on multiple disks simultaneously unless absolutely necessary.
- Do not proceed if the cluster is in `HEALTH_ERR` due to reasons other than the disk being removed. Proceeding in that state may further compromise data resilience.
- If the disk being removed is the last disk of a particular device class, that device class will cease to exist. Any storage pools or policies that depend on it will be affected. Ensure no pools are tied exclusively to this device class before proceeding.
Procedure
Check Cluster State and Capacity
- Verify overall cluster health.
- Identify the OSD ID and usage of the disk to be removed. Note the USE value of the target OSD, then confirm that the sum of AVAIL across all remaining OSDs (excluding the target) is greater than the target OSD's USE value. This ensures the remaining OSDs have enough free space to absorb the data after removal.
If remaining capacity is insufficient, proceed to the next step to add a replacement disk first. Otherwise, skip to Scale Down the Rook Operator.
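The checks above can be run from the rook-ceph-tools pod (a sketch; entering the tools pod is covered later in this procedure):

```shell
# Overall cluster health
ceph status
# Per-OSD usage: note the USE column of the target OSD, and sum
# the AVAIL column of the remaining OSDs to check free capacity
ceph osd df tree
```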
Add a Replacement Disk (If Needed)
If the remaining OSDs do not have enough free capacity, add a replacement disk before removing the old one. The Rook operator must be running during this step.
- Enter the Container Platform.
- In the left navigation bar, click Storage Management > Distributed Storage > Device Classes.
- Click Add Device, select the node where the replacement disk is installed, choose the new disk, and assign it to the same device class as the disk being removed.
- Wait for the new OSD to be created and for data rebalancing to complete. Monitor progress until the cluster reports `HEALTH_OK` with no misplaced or recovering PGs.
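Rebalancing can be monitored from the tools pod, for example:

```shell
# Repeat until health is HEALTH_OK and no PGs are misplaced or recovering
ceph -s
```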
Scale Down the Rook Operator
Scale down the Rook operator to prevent it from interfering with the removal process (for example, by recreating deleted OSD deployments mid-procedure).
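A minimal sketch, assuming the operator runs as the standard `rook-ceph-operator` deployment in the `rook-ceph` namespace:

```shell
kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=0
# Confirm the operator pod has terminated
kubectl -n rook-ceph get pods -l app=rook-ceph-operator
```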
Mark the OSD Out and Wait for Data Migration
- Enable the rook-ceph-tools pod if it is not already running.
- Enter the tools pod.
- Mark the OSD as `out`. This instructs Ceph to migrate all data off the OSD onto the remaining OSDs.
- Monitor rebalancing progress until the cluster returns to `HEALTH_OK` with no misplaced or recovering PGs. Do not proceed until data migration is fully complete; removing the OSD before migration finishes will result in data loss.
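The steps above can be sketched as follows (assuming the standard `rook-ceph-tools` deployment; replace `<id>` with the OSD ID):

```shell
# Enter the tools pod
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
# Inside the tools pod: mark the OSD out to start data migration
ceph osd out osd.<id>
# Watch migration; wait for HEALTH_OK with no misplaced/recovering PGs
ceph -s
```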
Remove the OSD
- Edit the CephCluster resource to remove the disk entry. Locate the disk under `spec.storage.nodes`, delete its entry, then save and exit.
- Delete the OSD deployment.
- Enter the tools pod and permanently remove the OSD from the cluster. Replace `<id>` with the OSD ID.
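A sketch of the removal steps (assumed namespace and CephCluster name `rook-ceph`; replace `<id>` with the OSD ID):

```shell
# Remove the disk entry under spec.storage.nodes
kubectl -n rook-ceph edit cephcluster rook-ceph
# Delete the OSD deployment
kubectl -n rook-ceph delete deployment rook-ceph-osd-<id>
# Inside the tools pod: purge the OSD from the cluster map
ceph osd purge <id> --yes-i-really-mean-it
```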
Clean Up the Disk
If the disk will remain physically attached to the node, wipe its metadata to prevent Rook from accidentally picking it up. Run the following commands on the node where the disk is located. Replace /dev/vdb with the actual device path.
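A sketch based on Rook's documented disk-cleanup steps; double-check the device path before running, as these commands irreversibly destroy data on the disk:

```shell
DISK="/dev/vdb"   # replace with the actual device path
# Zap partition tables and wipe Ceph metadata at the start of the disk
sgdisk --zap-all "$DISK"
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
# Remove any leftover ceph-volume LVM mappings, if present
ls /dev/mapper/ceph-* | xargs -r -I% -- dmsetup remove %
rm -rf /dev/ceph-*
```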
Scale Up the Rook Operator
Once the cluster is healthy, restore the Rook operator.
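Assuming the same operator deployment as above:

```shell
kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=1
```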
Verify Cluster Health
- Confirm that the removed OSD no longer appears in the cluster.
- Verify that the cluster has returned to a healthy state. The output should show `HEALTH_OK` with all PGs in the `active+clean` state.
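Both checks can be performed from the tools pod:

```shell
# The removed OSD should be absent from the tree
ceph osd tree
# Health should be HEALTH_OK with all PGs active+clean
ceph -s
```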