Installation

Overview

The monitoring component provides the infrastructure for monitoring, alerting, inspection, and health checking functions within the observability module. This document describes how to install ACP Monitoring with Prometheus or ACP Monitoring with VictoriaMetrics in a cluster.

To decide which plugin to install, first review the Monitoring Component Selection Guide and choose the option that best fits your cluster scale, storage plan, and operational requirements.

Before You Begin

INFO

Some Monitoring components are resource-intensive. We recommend placing them on infra nodes through plugin configuration. Prometheus and VictoriaMetrics both support plugin-level nodeSelector and tolerations settings. If you are evaluating the product and have not provisioned infra nodes, you can leave these settings empty so the components run on general nodes.

For guidance on planning infra nodes, see Cluster Node Planning.

Before installing the monitoring components, make sure the following conditions are met:

  • You have selected the appropriate monitoring component by referring to the Monitoring Component Selection Guide.
  • When installing in a workload cluster, the global cluster can access port 11780 of the workload cluster.
  • If you need storage classes or persistent volumes for monitoring data, you have created the corresponding resources in the Storage section in advance.

ACP Monitoring with Prometheus

Install from the Console

  1. Navigate to App Store Management > Cluster Plugins and select the target cluster.

  2. Locate the ACP Monitoring with Prometheus plugin and click Install.

  3. Configure the following parameters:

    The console highlights the most common installation options. For detailed configurable fields, see the YAML reference in this section.

    • Scale Configuration: Supports three configurations: Small Scale, Medium Scale, and Large Scale:
      - Default values are set based on the recommended load test values of the platform
      - You can choose or customize quotas based on the actual cluster scale
      - Default values will be updated with platform versions; for fixed configurations, custom settings are recommended
    • Storage Type:
      - LocalVolume: Local storage with data stored on specified nodes
      - StorageClass: Automatically generates persistent volumes using a storage class
      - PV: Utilizes existing persistent volumes
      Note: Storage configuration cannot be modified after installation
    • Replica Count: Sets the number of monitoring component pods
      Note: Prometheus supports only single-node installation
    • Advanced Configuration: Collapsible section for plugin-level scheduling parameters.
    • Node Selectors: Displayed in Advanced Configuration. Configure plugin-level node selector rules for the Prometheus plugin workloads.
    • Node Tolerations: Displayed in Advanced Configuration. Configure plugin-level toleration rules for the Prometheus plugin workloads.
    • Parameter Configuration: Data parameters for the monitoring component can be adjusted as needed

  4. Click Install to complete the installation.

Install with YAML

Check available versions

Ensure the plugin has been published by checking for ModulePlugin and ModuleConfig resources in the global cluster:


# kubectl get moduleplugin | grep prometheus
prometheus                       30h
# kubectl get moduleconfig | grep prometheus
prometheus-v4.1.0                30h

This indicates that the ModulePlugin prometheus exists in the cluster and version v4.1.0 is published.

Create a ModuleInfo

Create a ModuleInfo resource to install the plugin. The example below spells out the configuration parameters described in the YAML field reference; adjust the values to your environment:

kind: ModuleInfo
apiVersion: cluster.alauda.io/v1alpha1
metadata:
  name: global-prometheus
  labels:
    cpaas.io/cluster-name: global
    cpaas.io/module-name: prometheus
    cpaas.io/module-type: plugin
spec:
  version: v4.1.0
  config:
    storage:
      type: LocalVolume
      capacity: 40
      nodes:
        - xxx.xxx.xxx.xx
      path: /cpaas/monitoring
      storageClass: ""
      pvSelectorK: ""
      pvSelectorV: ""
    replicas: 1
    components:
      nodeSelector:
        - key: kubernetes.io/os
          value: linux
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          operator: Exists
      prometheus:
        retention: 7
        scrapeInterval: 60
        scrapeTimeout: 45
        resources: null
      nodeExporter:
        port: 9100
        resources: null
      alertmanager:
        resources: null
      kubeStateExporter:
        resources: null
      prometheusAdapter:
        resources: null
      thanosQuery:
        resources: null
    size: Small
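
The example above stores data on a LocalVolume. As a rough sketch of the StorageClass alternative, the storage block might instead look like the fragment below. It uses only fields listed in the YAML field reference in this section. The storage class name example-sc is a hypothetical placeholder, and it is an assumption that the unused keys (nodes, path, pvSelectorK, pvSelectorV) can be omitted rather than left as empty strings as in the full example.

```yaml
# Fragment of a ModuleInfo spec, not a complete resource.
spec:
  config:
    storage:
      type: StorageClass        # instead of LocalVolume
      capacity: 40              # storage size in Gi
      storageClass: example-sc  # hypothetical name; use a StorageClass that exists in the target cluster
    replicas: 1                 # Prometheus supports only single-node installation
```

Because the storage configuration cannot be modified after installation, it is worth confirming the storage class exists before creating the ModuleInfo.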

Resource settings example (Prometheus):

spec:
  config:
    components:
      prometheus:
        resources:
          limits:
            cpu: 2000m
            memory: 2000Mi
          requests:
            cpu: 1000m
            memory: 1000Mi

For more details, refer to Monitor Component Capacity Planning.

YAML field reference (Prometheus):

| Field path | Description |
| --- | --- |
| metadata.labels.cpaas.io/cluster-name | Target cluster name where the plugin is installed. |
| metadata.labels.cpaas.io/module-name | Must be prometheus. |
| metadata.labels.cpaas.io/module-type | Must be plugin. |
| metadata.name | ModuleInfo name (e.g., <cluster>-prometheus). |
| spec.version | Plugin version to install. |
| spec.config.storage.type | Storage type: LocalVolume, StorageClass, or PV. |
| spec.config.storage.capacity | Storage size for Prometheus, in Gi. A minimum of 30 Gi is recommended. |
| spec.config.storage.nodes | Node list when storage.type=LocalVolume. Only one node is supported. |
| spec.config.storage.path | Base LocalVolume path when storage.type=LocalVolume. Default: /cpaas/monitoring. |
| spec.config.storage.storageClass | StorageClass name when storage.type=StorageClass. |
| spec.config.storage.pvSelectorK | PV selector key when storage.type=PV. |
| spec.config.storage.pvSelectorV | PV selector value when storage.type=PV. |
| spec.config.replicas | Replica count; only applicable to StorageClass and PV types. |
| spec.config.components.nodeSelector | Optional. Plugin-level node selector rules for the Prometheus plugin workloads. |
| spec.config.components.tolerations | Optional. Plugin-level toleration rules for the Prometheus plugin workloads. |
| spec.config.components.prometheus.retention | Data retention period, in days. |
| spec.config.components.prometheus.scrapeInterval | Scrape interval, in seconds; applies to ServiceMonitors that do not set their own interval. |
| spec.config.components.prometheus.scrapeTimeout | Scrape timeout, in seconds; must be less than scrapeInterval. |
| spec.config.components.prometheus.resources | Resource settings for Prometheus. |
| spec.config.components.nodeExporter.port | Node Exporter port (default 9100). |
| spec.config.components.nodeExporter.resources | Resource settings for Node Exporter. |
| spec.config.components.alertmanager.resources | Resource settings for Alertmanager. |
| spec.config.components.kubeStateExporter.resources | Resource settings for Kube State Exporter. |
| spec.config.components.prometheusAdapter.resources | Resource settings for Prometheus Adapter. |
| spec.config.components.thanosQuery.resources | Resource settings for Thanos Query. |
| spec.config.size | Monitoring scale: Small, Medium, or Large. |
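
To illustrate how the retention and scrape fields relate, here is a hedged fragment that keeps data for 15 days and scrapes every 30 seconds. The values are illustrative assumptions, not recommendations. The only constraint taken from the field reference is that scrapeTimeout must be less than scrapeInterval.

```yaml
# Fragment of a ModuleInfo spec, not a complete resource.
spec:
  config:
    components:
      prometheus:
        retention: 15        # days (illustrative value)
        scrapeInterval: 30   # seconds; used by ServiceMonitors without their own interval
        scrapeTimeout: 25    # seconds; must stay below scrapeInterval
```

Longer retention and shorter scrape intervals both increase stored data, so the storage capacity and resource settings may need to be revisited together with these values.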

Verify the installation

Because the ModuleInfo name changes after creation (note the generated suffix in the sample output), locate the resource by label to check the plugin status and version:

kubectl get moduleinfo -l cpaas.io/module-name=prometheus
NAME                                             CLUSTER         MODULE        DISPLAY_NAME   STATUS    TARGET_VERSION   CURRENT_VERSION   NEW_VERSION
global-e671599464a5b1717732c5ba36079795          global          prometheus    prometheus     Running   v4.1.0           v4.1.0            v4.1.0

Field explanations:

  • NAME: ModuleInfo resource name
  • CLUSTER: Cluster where the plugin is installed
  • MODULE: Plugin name
  • DISPLAY_NAME: Display name of the plugin
  • STATUS: Installation status; Running means successfully installed and running
  • TARGET_VERSION: Intended installation version
  • CURRENT_VERSION: Version currently installed; during an upgrade, the version before the upgrade
  • NEW_VERSION: Latest available version for installation

Place Prometheus Workloads on Infra Nodes

If you want the Prometheus plugin workloads to run on dedicated infra nodes, configure plugin-level scheduling rules during installation or upgrade instead of patching generated workloads after installation.

  • In the console, use Advanced Configuration to set Node Selectors and Node Tolerations.
  • In YAML, set spec.config.components.nodeSelector and spec.config.components.tolerations.

Example:

config:
  components:
    nodeSelector:
      - key: kubernetes.io/os
        value: linux
    tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists

Before applying these scheduling rules, make sure your infra node planning and storage placement are compatible. For the planning considerations, see the Monitoring guides in How To, including Planning Infra Nodes for Monitoring.
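
Note that the selector in the example above matches any Linux node, so on its own it only allows, rather than forces, placement on infra nodes. A sketch that restricts scheduling to infra nodes could select on an infra label instead. This assumes your infra nodes carry the label node-role.kubernetes.io/infra with an empty value and a NoSchedule taint with the same key. Both are assumptions about your cluster (check with kubectl get nodes --show-labels), and this document does not confirm that the plugin accepts an empty value.

```yaml
# Fragment only; label key, label value, and taint key are assumptions about your cluster.
config:
  components:
    nodeSelector:
      - key: node-role.kubernetes.io/infra   # assumed infra node label
        value: ""                            # assumed empty label value
    tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra   # assumed infra node taint key
        operator: Exists
```

With LocalVolume storage, the node listed under spec.config.storage.nodes should also satisfy these rules, otherwise the workload and its data would be planned for different nodes.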

Access the Installed Components

Once installation is complete, the components can be accessed at the following addresses (replace <> with actual values):

| Component | Access Address |
| --- | --- |
| Thanos | <platform_access_address>/clusters/<cluster>/prometheus |
| Prometheus | <platform_access_address>/clusters/<cluster>/prometheus-0 |
| Alertmanager | <platform_access_address>/clusters/<cluster>/alertmanager |

ACP Monitoring with VictoriaMetrics

Prerequisites

  • If you install only the VictoriaMetrics agent, ensure that the VictoriaMetrics Center has been installed in another cluster.

Install from the Console

  1. Navigate to App Store Management > Cluster Plugins and select the target cluster.

  2. Locate the ACP Monitoring with VictoriaMetrics plugin and click Install.

  3. Configure the following parameters:

    The console highlights the most common installation options. For detailed configurable fields, see the YAML reference in this section.

    • Scale Configuration: Supports three configurations: Small Scale, Medium Scale, and Large Scale:
      - Default values are set based on the recommended load test values of the platform
      - You can choose or customize quotas based on the actual cluster scale
      - Default values will be updated with platform versions; for fixed configurations, custom settings are recommended
    • Install Agent Only:
      - Off: Install the complete VictoriaMetrics component suite
      - On: Install only the VMAgent collection component, which relies on the VictoriaMetrics Center
    • VictoriaMetrics Center: Select the cluster where the complete VictoriaMetrics component has been installed
    • Storage Type:
      - LocalVolume: Local storage with data stored on specified nodes
      - StorageClass: Automatically generates persistent volumes using a storage class
      - PV: Utilizes existing persistent volumes
    • Storage Path: Displayed when Storage Type is LocalVolume. Specify the base storage path for monitoring data. Default: /cpaas/monitoring.
    • Replica Count: Displayed when Storage Type is StorageClass or PV. Sets the number of monitoring component pods. When Storage Type is LocalVolume, the number of selected nodes determines the number of VMStorage replicas.
    • Advanced Configuration: Collapsible section for plugin-level scheduling parameters.
    • Node Selectors: Displayed in Advanced Configuration. Configure plugin-level node selector rules for the VictoriaMetrics plugin workloads.
    • Node Tolerations: Displayed in Advanced Configuration. Configure plugin-level toleration rules for the VictoriaMetrics plugin workloads.
    • Parameter Configuration: Data parameters for the monitoring component can be adjusted
      Note: Data may temporarily exceed the retention period before being deleted

  4. Click Install to complete the installation.

Install with YAML

Check available versions

Ensure the plugin has been published by checking for ModulePlugin and ModuleConfig resources in the global cluster:


# kubectl get moduleplugin | grep victoriametrics
victoriametrics                       30h
# kubectl get moduleconfig | grep victoriametrics
victoriametrics-v4.1.0                30h

This indicates that the ModulePlugin victoriametrics exists in the cluster and version v4.1.0 is published.

Create a ModuleInfo

Create a ModuleInfo resource to install the plugin. The example below spells out the configuration parameters described in the YAML field reference; adjust the values to your environment:

kind: ModuleInfo
apiVersion: cluster.alauda.io/v1alpha1
metadata:
  name: business-1-victoriametrics
  labels:
    cpaas.io/cluster-name: business-1
    cpaas.io/module-name: victoriametrics
    cpaas.io/module-type: plugin
spec:
  version: v4.1.0
  config:
    storage:
      type: LocalVolume
      capacity: 40
      nodes:
        - xxx.xxx.xxx.xx
      path: /cpaas/monitoring
      storageClass: ""
      pvSelectorK: ""
      pvSelectorV: ""
    replicas: 1
    agentOnly: false
    agentReplicas: 1
    crossClusterDependency:
      victoriametrics: ""
    components:
      nodeSelector:
        - key: kubernetes.io/os
          value: linux
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          operator: Exists
      nodeExporter:
        port: 9100
        resources: null
      vmstorage:
        retention: 7
        resources: null
      kubeStateExporter:
        resources: null
      vmalert:
        resources: null
      prometheusAdapter:
        resources: null
      vmagent:
        scrapeInterval: 60
        scrapeTimeout: 45
        resources: null
      vminsert:
        resources: null
      alertmanager:
        resources: null
      vmselect:
        resources: null
    size: Small
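
The example above installs the full component suite (agentOnly: false). As a minimal sketch of the agent-only variant, the fragment below changes only the three agent-related fields. It assumes the VictoriaMetrics Center runs in the cluster business-1 from the example above and that the ModuleInfo is created for a different cluster. This document does not state whether the storage fields are still required in agent-only mode, so they are left out here.

```yaml
# Fragment of a ModuleInfo spec for a second cluster, not a complete resource.
spec:
  config:
    agentOnly: true                 # install only vmagent
    agentReplicas: 1                # vmagent replica count in agent-only mode
    crossClusterDependency:
      victoriametrics: business-1   # assumed: cluster that provides the VictoriaMetrics Center
```

The Center must already be installed in the referenced cluster, as noted in the prerequisites.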

Resource settings example (vmagent):

spec:
  config:
    components:
      vmagent:
        resources:
          limits:
            cpu: 2000m
            memory: 2000Mi
          requests:
            cpu: 1000m
            memory: 1000Mi

For more details, refer to Monitor Component Capacity Planning.

YAML field reference (VictoriaMetrics):

| Field path | Description |
| --- | --- |
| metadata.labels.cpaas.io/cluster-name | Target cluster name where the plugin is installed. |
| metadata.labels.cpaas.io/module-name | Must be victoriametrics. |
| metadata.labels.cpaas.io/module-type | Must be plugin. |
| metadata.name | ModuleInfo name (e.g., <cluster>-victoriametrics). |
| spec.version | Plugin version to install. |
| spec.config.storage.type | Storage type: LocalVolume, StorageClass, or PV. |
| spec.config.storage.capacity | Storage size for VictoriaMetrics, in Gi. A minimum of 30 Gi is recommended. |
| spec.config.storage.nodes | Node list when storage.type=LocalVolume. You can select one or more nodes. |
| spec.config.storage.path | Base LocalVolume path when storage.type=LocalVolume. Default: /cpaas/monitoring. |
| spec.config.storage.storageClass | StorageClass name when storage.type=StorageClass. |
| spec.config.storage.pvSelectorK | PV selector key when storage.type=PV. |
| spec.config.storage.pvSelectorV | PV selector value when storage.type=PV. |
| spec.config.replicas | Replica count; applicable to StorageClass and PV types. |
| spec.config.agentOnly | Whether to install only vmagent instead of the full VictoriaMetrics component set. |
| spec.config.agentReplicas | Replica count for vmagent when agent-only mode is enabled. |
| spec.config.crossClusterDependency.victoriametrics | Target cluster that provides the VictoriaMetrics Center when agent-only mode is enabled. |
| spec.config.components.nodeSelector | Optional. Plugin-level node selector rules for the VictoriaMetrics plugin workloads. |
| spec.config.components.tolerations | Optional. Plugin-level toleration rules for the VictoriaMetrics plugin workloads. |
| spec.config.components.vmstorage.retention | Data retention period for vmstorage, in days. |
| spec.config.components.vmagent.scrapeInterval | Scrape interval, in seconds; applies to ServiceMonitors that do not set their own interval. |
| spec.config.components.vmagent.scrapeTimeout | Scrape timeout, in seconds; must be less than scrapeInterval. |
| spec.config.components.vmstorage.resources | Resource settings for vmstorage. |
| spec.config.components.vmalert.resources | Resource settings for vmalert. |
| spec.config.components.nodeExporter.port | Node Exporter port (default 9100). |
| spec.config.components.nodeExporter.resources | Resource settings for Node Exporter. |
| spec.config.components.alertmanager.resources | Resource settings for Alertmanager. |
| spec.config.components.kubeStateExporter.resources | Resource settings for Kube State Exporter. |
| spec.config.components.prometheusAdapter.resources | Resource settings for Prometheus Adapter (used for HPA/custom metrics). |
| spec.config.components.vmagent.resources | Resource settings for vmagent. |
| spec.config.components.vminsert.resources | Resource settings for vminsert. |
| spec.config.components.vmselect.resources | Resource settings for vmselect. |
| spec.config.size | Monitoring scale: Small, Medium, or Large. |
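
Unlike the Prometheus plugin, VictoriaMetrics accepts more than one LocalVolume node, and the number of selected nodes determines the number of VMStorage replicas. A minimal sketch of a three-node layout follows. The node entries are placeholders to be replaced with your own node identifiers in the same form as in the full example, and the capacity value is simply carried over from that example.

```yaml
# Fragment of a ModuleInfo spec, not a complete resource.
spec:
  config:
    storage:
      type: LocalVolume
      capacity: 40              # storage size in Gi
      nodes:                    # three nodes selected, so three vmstorage replicas
        - <node-1>
        - <node-2>
        - <node-3>
      path: /cpaas/monitoring   # default base path
```

In this mode spec.config.replicas is not what sets the vmstorage count, since the replica field applies to the StorageClass and PV types.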

Verify the installation

Because the ModuleInfo name changes after creation (note the generated suffix in the sample output), locate the resource by label to check the plugin status and version:

kubectl get moduleinfo -l cpaas.io/module-name=victoriametrics
NAME                                             CLUSTER         MODULE            DISPLAY_NAME     STATUS    TARGET_VERSION   CURRENT_VERSION   NEW_VERSION
global-e671599464a5b1717732c5ba36079795          global          victoriametrics   victoriametrics  Running   v4.1.0           v4.1.0            v4.1.0

Field explanations:

  • NAME: ModuleInfo resource name
  • CLUSTER: Cluster where the plugin is installed
  • MODULE: Plugin name
  • DISPLAY_NAME: Display name of the plugin
  • STATUS: Installation status; Running means successfully installed and running
  • TARGET_VERSION: Intended installation version
  • CURRENT_VERSION: Version currently installed; during an upgrade, the version before the upgrade
  • NEW_VERSION: Latest available version for installation

Place VictoriaMetrics Workloads on Infra Nodes

If you want the VictoriaMetrics plugin workloads to run on dedicated infra nodes, configure plugin-level scheduling rules during installation or upgrade instead of patching generated workloads after installation.

  • In the console, use Advanced Configuration to set Node Selectors and Node Tolerations.
  • In YAML, set spec.config.components.nodeSelector and spec.config.components.tolerations.

Example:

config:
  components:
    nodeSelector:
      - key: kubernetes.io/os
        value: linux
    tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists

Before applying these scheduling rules, make sure your infra node planning and storage placement are compatible. For the planning considerations, see the Monitoring guides in How To, including Planning Infra Nodes for Monitoring.

Access the Installed Components

Once installation is complete, the components can be accessed at the following address (replace <> with actual values):

| Component | Access Address |
| --- | --- |
| VictoriaMetrics UI | <platform_access_address>/clusters/<cluster>/vmselect-ui/vmui/?#/metrics |

INFO

If Install Agent Only is enabled, the cluster does not deploy the vmselect component locally, so the VictoriaMetrics UI address is unavailable in that cluster.