Setting Ceph OSD Full Thresholds
This topic describes how to adjust Ceph OSD capacity thresholds for ACP distributed storage. You can change the thresholds directly in Ceph, or manage them declaratively in the CephCluster custom resource.
Ceph uses three thresholds to control how the cluster behaves as OSD usage increases:

- `nearfull_ratio` (default 0.85): the cluster reports a health warning when an OSD passes this level.
- `backfillfull_ratio` (default 0.90): Ceph stops backfilling data to OSDs above this level.
- `full_ratio` (default 0.95): Ceph blocks client writes when any OSD reaches this level.
Always keep the thresholds in ascending order: nearfull < backfillfull < full. Setting values too close to 1.0 can leave the cluster with no room to recover.
Prerequisites
- You have cluster-admin access to the ACP cluster.
- The `rook-ceph-tools` deployment is available in the `rook-ceph` namespace, or you are allowed to start it temporarily.
- You understand why the cluster is approaching full capacity and have a plan to add storage or remove data after the emergency adjustment.
Procedure
Check the current cluster state
If the rook-ceph-tools Pod is not running, start it first:
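One way to start it, assuming the standard Rook toolbox deployment name and the `rook-ceph` namespace (adjust both if your ACP install differs), is to scale the deployment up:

```shell
# Scale up the Rook toolbox deployment and wait for it to become ready
kubectl -n rook-ceph scale deployment rook-ceph-tools --replicas=1
kubectl -n rook-ceph rollout status deployment rook-ceph-tools
```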
Check overall cluster health:
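For example, running the Ceph CLI from inside the toolbox Pod (deployment name and namespace assume a default Rook install):

```shell
# Overall status, then detailed health messages (e.g. OSD_NEARFULL, OSD_FULL)
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health detail
```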
Check the current threshold values:
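The full ratios are stored in the OSD map, so one way to read them (assuming the toolbox deployment described above) is:

```shell
# The three ratios appear near the top of the OSD map dump
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd dump | grep -i ratio
```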
Example output:
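With the Ceph default settings, the output looks similar to:

```
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
```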
Setting the thresholds via the Ceph CLI
Use Ceph commands to change the effective cluster values directly.
For example, to raise the thresholds slightly:
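A sketch of such an adjustment, run through the toolbox Pod; the values below are examples only and the deployment name assumes a default Rook install:

```shell
# Raise each threshold slightly, keeping nearfull < backfillfull < full
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd set-nearfull-ratio 0.88
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd set-backfillfull-ratio 0.92
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd set-full-ratio 0.96
```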
Use the smallest increase that restores cluster progress. After changing the values, continue with capacity expansion or data cleanup as soon as possible.
If writes are blocked and OSDs remain stuck, pending, or cannot come back up, stop application I/O first, raise only the full threshold by a small amount, wait for the rebalance to finish, and restore the threshold after the cluster returns to a stable state.
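In that emergency scenario, the adjustment can be limited to the full ratio alone, for example (value and toolbox deployment name are assumptions):

```shell
# Emergency only: raise just the full ratio slightly, then watch recovery progress
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd set-full-ratio 0.96
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph -s
```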
Setting the thresholds by updating the CephCluster CR
You can set the Ceph OSD full thresholds declaratively by updating the CephCluster CR. Use this procedure if you want to override the default settings persistently.
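Recent Rook versions expose the three ratios under `spec.storage` of the CephCluster CR. A minimal sketch, assuming the CR is named `rook-ceph` in the `rook-ceph` namespace and your Rook version supports these fields (the ratio values are examples):

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  storage:
    nearFullRatio: 0.88
    backfillFullRatio: 0.92
    fullRatio: 0.96
```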
If you only need to change one threshold, patch only that field. For example:
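A sketch of such a patch, assuming the `fullRatio` field under `spec.storage` and a CR named `rook-ceph` in the `rook-ceph` namespace:

```shell
# Patch only the full ratio; the other thresholds keep their current values
kubectl -n rook-ceph patch cephcluster rook-ceph --type merge \
  -p '{"spec":{"storage":{"fullRatio":0.96}}}'
```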
Verify the applied values
Verify the persisted settings in Kubernetes:
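For example, printing the `spec.storage` section of the CR (names assume a default install):

```shell
kubectl -n rook-ceph get cephcluster rook-ceph \
  -o jsonpath='{.spec.storage}{"\n"}'
```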
Verify the effective runtime values in Ceph:
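The runtime values come from the OSD map, so the same check used earlier applies (toolbox deployment name assumed):

```shell
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd dump | grep -i ratio
```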
Recheck cluster health and rebalance status:
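For example, from the toolbox Pod:

```shell
# Watch for HEALTH_OK and for recovery/backfill to complete
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph -s
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health detail
```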
When these fields are set in CephCluster, that resource becomes the declarative source for the threshold values. If ACP or Rook reconciles the cluster configuration later, the values in CephCluster should be treated as the intended baseline.
Restore or rebaseline the thresholds
After you add capacity, remove data, or complete rebalancing, decide whether to keep the new thresholds. If the higher values were used only as an emergency workaround, patch the CephCluster resource back to your standard baseline and confirm that the runtime values also return to the expected state.
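One way to restore the Ceph defaults, assuming the `spec.storage` ratio fields and a CR named `rook-ceph` (adjust the values if your standard baseline differs):

```shell
# Return the thresholds to the Ceph defaults after the cluster has recovered
kubectl -n rook-ceph patch cephcluster rook-ceph --type merge \
  -p '{"spec":{"storage":{"nearFullRatio":0.85,"backfillFullRatio":0.90,"fullRatio":0.95}}}'
```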
Recommendations
- Treat threshold changes as a temporary mitigation, not as a substitute for capacity planning.
- Review OSD utilization distribution if only a few OSDs are much fuller than the rest.
- Record the original threshold values before making changes so you can restore them after the cluster stabilizes.