The module lifecycle stage: General Availability
The module has requirements for installation
Purpose
The Billing and cost management functionality answers the question “how much does the infrastructure cost” and presents that cost in a form useful for finance, business unit owners, and project stakeholders.
Commander collects actual resource consumption data (CPU, memory, storage) from every managed cluster, applies administrator-defined tariffs, and shows the result in three forms:
- an interactive dashboard with breakdowns by workspace, cluster, project, namespace, controller, resource kind, and hardware type;
- reports — on demand (instant) and on schedule (recurring), exportable to CSV;
- a single tariffs and tariffed resources view with warnings about unbound resources and discrepancies with actual consumption.
The functionality is disabled by default. See the administrator guide for how to enable it.
Key concepts
Billing involves three entities: clusters with compute resources, tariffs with prices for those resources, and tariffed resources that link the two together.
A tariffed resource is the linking entity: on one side it is bound to a tariff (providing the price), on the other — to a cluster (providing the consumption data stream). Once the link is established, Commander automatically labels the cluster’s nodes and starts receiving metrics. The dashboard and reports take those metrics, multiply them by the prices from the active tariff, and show the cost.
Binding to a cluster happens automatically in most cases — when a cloud cluster is connected, Commander creates classes and links them to node groups on its own. For static clusters the binding must be done manually.
For storage classes binding works strictly by name: if a cluster has a Kubernetes StorageClass named network-ssd, it is automatically matched with the identically named storage class in billing. Manual override of storage class binding is not yet supported. An active tariff can include multiple storage classes with different prices, but the price is set per class name globally: you cannot assign a different cost to network-ssd in a cluster at one site versus network-ssd in a cluster at another.
More on each term:
- Tariff: a price list that states how much each compute and storage resource type costs per unit per hour. Only one tariff is active at any moment. Historical tariffs are preserved so past periods do not need to be recalculated.
- Compute class: a named compute resource type with its own price. It is analogous to a “virtual machine size” in a cloud provider catalog: one class groups nodes with identical characteristics for which a single CPU and memory price is reasonable. Example names: standard-v3-cf-100, m1.large, c4-16gb.
- Storage class: a named storage type with its own per-gigabyte price. It matches by name the Kubernetes StorageClass through which volumes are created in the cluster. Examples: network-ssd, linstor-thin-r2, ceph-block-gold.
- Tariffed resource: a resource that has a price in the tariff and actual consumption data. In the current release these are compute classes and storage classes; the list may expand in future releases.
- Agent (commander-agent): a component installed in every managed cluster that links it to Commander. For billing, the agent performs two tasks: it sends Commander information about the cluster’s node groups and storage classes, and it adds a service label to nodes so Commander can match each node with its compute class.
How Commander calculates cost
The calculation is intentionally transparent and relies on the Prometheus integration already familiar to Kubernetes administrators:
- Every managed cluster must have the built-in prometheus Deckhouse module enabled and running — it is enabled by default in Deckhouse Kubernetes Platform. Prometheus collects resource consumption metrics (CPU, memory, storage) — standard metrics available in any cluster with this module.
- Each node automatically receives a service label billing.commander.deckhouse.io/name with the compute class name as its value. The same name is used in the active tariff as the key for looking up the price.
- For storage the key is the Kubernetes storage class name, which is already present in metrics by default.
- Consumption metrics are shipped from Prometheus in the managed cluster to Prometheus in the management cluster via the PrometheusRemoteWrite mechanism. No manual setup is required: when billing is enabled, Commander creates the required PrometheusRemoteWrite in the managed cluster on its own. Commander multiplies the received metrics by the prices from the active tariff and builds cost time series.
- Those series feed the dashboard and reports: any slice is a sum of cost over the selected dimensions.
No pricing decisions happen at report-build time — the price comes from the tariff that was active for each hour in the period.
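As a rough illustration of the arithmetic described above, the sketch below multiplies hourly consumption by per-class prices and sums the result over selected dimensions. The record layout, the PRICES table, and the function names are assumptions made for this example, not Commander's internal data model; a single tariff is assumed to be active for the whole period.

```python
# Hypothetical hourly prices keyed by compute class name, i.e. the value of
# the billing.commander.deckhouse.io/name label. Not real Commander data.
PRICES = {
    "standard-v3-cf-100": {"cpu": 1.50, "memory": 0.40},
    "standard-v3-cf-50": {"cpu": 0.75, "memory": 0.40},
}


def hourly_cost(record):
    """Cost of one hourly record: cores * CPU price + GiB * memory price."""
    price = PRICES[record["compute_class"]]
    return (record["cpu_cores"] * price["cpu"]
            + record["memory_gib"] * price["memory"])


def slice_cost(records, **dimensions):
    """Sum the cost of all records whose fields match the given dimensions."""
    return sum(
        hourly_cost(r)
        for r in records
        if all(r.get(name) == value for name, value in dimensions.items())
    )


# Hypothetical consumption for a single hour.
records = [
    {"cluster": "prod", "namespace": "web",
     "compute_class": "standard-v3-cf-100", "cpu_cores": 2.0, "memory_gib": 4.0},
    {"cluster": "prod", "namespace": "db",
     "compute_class": "standard-v3-cf-50", "cpu_cores": 4.0, "memory_gib": 8.0},
    {"cluster": "dev", "namespace": "web",
     "compute_class": "standard-v3-cf-50", "cpu_cores": 1.0, "memory_gib": 2.0},
]

prod_total = slice_cost(records, cluster="prod")
web_total = slice_cost(records, namespace="web")
```

Because every slice is a plain sum, the same records can be regrouped by any combination of dimensions without recalculating prices.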
Enabling and initial steps
These steps are typically performed once, when enabling the functionality:
- The administrator enables billing in the Commander module settings.
- The agent (commander-agent), already running in every managed cluster, starts additionally sending Deckhouse Commander information about node groups and storage classes. At the same time, Deckhouse Commander creates a PrometheusRemoteWrite resource in the managed cluster, pointing at the management cluster’s Prometheus; consumption metrics are shipped through it. No manual changes to the Prometheus configuration in managed clusters are required.
- For cloud clusters Commander automatically creates compute classes and storage classes from the received data.
- For static clusters storage classes are also created automatically. Compute classes for a static cluster are either created manually by the administrator, or a billing.commander.deckhouse.io/name: <class-name> label is pre-set in the node group template. In the latter case Commander will offer to create the matching class as soon as it sees the label on the resulting nodes.
- The administrator creates a tariff and defines prices for the compute classes and storage classes that have appeared.
After this the dashboard and reports start showing cost, effective from the tariff’s start date.
Access control
Access is configured in the “Users and permissions” section. Billing is split into four independent resources that can be assigned to different roles:
- billingdashboard: access to the dashboard and analytics.
- billingtariffs: tariff management (the “Tariffs” tab in the “Tariffication” section).
- billingresources: management of compute classes, storage classes, and their assignments to cluster resources (the “Compute classes” and “Storage classes” tabs).
- billingreports: report management.
The side menu shows only the sections the user has access to.
Compute classes and storage classes
Why classes
For cloud clusters compute classes and storage classes are created and bound automatically — Deckhouse Commander determines instance types and storage classes from agent data. No manual setup is required, but you can always adjust the automatically created classes and their bindings if needed.
For static clusters automatic compute class detection is not possible because there is no cloud instance metadata, so compute classes must be created and bound manually. Storage classes are still detected automatically.
In previous versions the administrator had to manually add the billing.commander.deckhouse.io/name label to every node group in every cluster — both cloud and static. Classes remove this manual work for cloud clusters:
- Commander forms class names from cloud instance parameters (or picks them up from the node group template), adds the corresponding label to nodes via the agent, and maintains it when nodes are re-created. Editing cluster manifests is no longer necessary.
- A class has two attributes: a technical identifier and a human-readable name. The identifier is used as the value of the billing.commander.deckhouse.io/name label on nodes and as the reference to the class from the tariff, so it must follow Kubernetes label-value rules. The name is shown in the UI and reports; it can be freely edited without changing the identifier: for example, for the identifier standard-v3-cf-100 you can set the name “Standard (2 vCPU, 4 GiB)”.
- A class is a single entity visible in the billing section: identifier, name, description, list of clusters where it is used. Opening a class shows which node groups are assigned to it.
- One class can be used across several clusters (for example, identical node configuration in three clusters of the same site) and in several tariffs.
- The price is set once in the tariff against the class identifier.
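Since the identifier is used as a label value, it can be checked against the Kubernetes label-value rules before a class is created. The sketch below is a minimal illustration: the function name is hypothetical, and rejecting the empty string (which Kubernetes itself permits as a label value) is an assumption made here because an empty identifier could not serve as a tariff key.

```python
import re

# Kubernetes label values: at most 63 characters, starting and ending with an
# alphanumeric character, with dashes, underscores, and dots allowed between.
_LABEL_VALUE = re.compile(r"[A-Za-z0-9]([A-Za-z0-9_.-]*[A-Za-z0-9])?")


def is_valid_class_identifier(value: str) -> bool:
    """Return True if value can be used as a non-empty label value."""
    return len(value) <= 63 and _LABEL_VALUE.fullmatch(value) is not None


# The identifier passes; the human-readable name from the text would not.
identifier_ok = is_valid_class_identifier("standard-v3-cf-100")
display_name_ok = is_valid_class_identifier("Standard (2 vCPU, 4 GiB)")
```

This is why the display name is kept as a separate attribute: it can contain spaces, parentheses, and commas that a label value cannot.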
Where classes come from
In most cases classes are created automatically:
- Cloud clusters. As soon as a cloud node group (CloudEphemeral, CloudPermanent) appears in a cluster, Commander receives its InstanceClass description from the agent, computes the compute class name using provider-specific rules, and creates the class if it does not yet exist. Right after a cloud cluster is created, the user sees the full set of compute classes used by the cluster in the “Tariffication” section.
- Storage classes are created automatically in every cluster, cloud and static alike. Whenever the agent reports a Kubernetes storage class for which there is no corresponding billing storage class yet, Commander creates one.
For static clusters compute classes do not appear automatically (the agent cannot derive them from hardware configuration). There are two options:
- Create compute classes manually in “Tariffication → Compute classes”.
- Pre-set the billing.commander.deckhouse.io/name: <class-name> label in the node group template. When the agent sees this label on the resulting nodes, Commander offers to create a matching compute class; accepting the suggestion is a one-click action.
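For the second option, the label belongs in the node group template under spec.nodeTemplate.labels. The sketch below shows the placement on a NodeGroup manifest represented as a Python dictionary; the node group name worker-bm, the class name bm-cpu-2x32, and the helper function are hypothetical, and the manifest is reduced to the fields relevant here.

```python
import copy
import json

BILLING_LABEL = "billing.commander.deckhouse.io/name"


def with_billing_label(node_group: dict, class_name: str) -> dict:
    """Return a copy of the manifest with the billing label in the node template."""
    updated = copy.deepcopy(node_group)
    labels = (
        updated.setdefault("spec", {})
        .setdefault("nodeTemplate", {})
        .setdefault("labels", {})
    )
    labels[BILLING_LABEL] = class_name
    return updated


# Hypothetical static node group, trimmed to the relevant fields.
node_group = {
    "apiVersion": "deckhouse.io/v1",
    "kind": "NodeGroup",
    "metadata": {"name": "worker-bm"},
    "spec": {"nodeType": "Static"},
}

labeled = with_billing_label(node_group, "bm-cpu-2x32")
print(json.dumps(labeled, indent=2))
```

The original manifest is left untouched, so the labeled copy can be reviewed before it is applied.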
Automatically generated class names
The compute class name is not arbitrary — it is built from instance characteristics that influence price. This way the administrator immediately sees which resource type is meant, and Commander guarantees that nodes with identical configuration land in the same class.
| Provider | Naming rule | Example |
|---|---|---|
| Yandex Cloud | {platformID}-cf-{coreFraction} | standard-v3-cf-100 |
| AWS | {instanceType} | m5.xlarge |
| GCP | {machineType} | n1-standard-4 |
| Azure | {machineSize} | standard-f4 |
| OpenStack | {flavorName} | m1.large |
| vSphere / zVirt / Dynamix | {numCPUs}c-{memory}m | 4c-8192m |
| VMware Cloud Director | {sizingPolicy} | 4cpu-8mem |
| Huawei Cloud | {flavorName} | s6.xlarge.2 |
| DVP (Deckhouse Virtualization Platform) | {virtualMachineClassName}-cf-{coreFraction} | generic-cf-100 |
If the auto-generated name is not suitable, it can be changed when accepting Commander’s suggestion. After a class is linked to a cluster, this name (the class identifier) can no longer be edited: it is used as a key in Prometheus metrics, and renaming it would break the link to historical consumption data.
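The naming rules can be read as simple string templates over instance parameters. The sketch below encodes them that way; the provider keys, the parameter dictionaries, and the function name are assumptions for illustration, and any normalization Commander may apply to the raw values (for example, lowercasing) is not modeled.

```python
# Templates copied from the naming rules; placeholder names match the table.
NAMING_RULES = {
    "Yandex Cloud": "{platformID}-cf-{coreFraction}",
    "AWS": "{instanceType}",
    "GCP": "{machineType}",
    "Azure": "{machineSize}",
    "OpenStack": "{flavorName}",
    "vSphere": "{numCPUs}c-{memory}m",
    "zVirt": "{numCPUs}c-{memory}m",
    "Dynamix": "{numCPUs}c-{memory}m",
    "VMware Cloud Director": "{sizingPolicy}",
    "Huawei Cloud": "{flavorName}",
    "DVP": "{virtualMachineClassName}-cf-{coreFraction}",
}


def compute_class_name(provider: str, instance_params: dict) -> str:
    """Build a compute class name from instance parameters (hypothetical helper)."""
    return NAMING_RULES[provider].format(**instance_params)


yandex = compute_class_name(
    "Yandex Cloud", {"platformID": "standard-v3", "coreFraction": 100}
)
vsphere = compute_class_name("vSphere", {"numCPUs": 4, "memory": 8192})
```

Two node groups with the same parameters always produce the same string, which is what places identically configured nodes in one class.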
Manual assignments take precedence
If the administrator manually changed the “node group ↔ class” assignment (for example, moved a node group from standard-v3-cf-100 into a separate gpu-enabled class), automatic matching will not revert the change. Manual assignments always take precedence over automatic ones.
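The precedence rule amounts to a lookup that consults manual assignments first. A minimal sketch, assuming assignments are kept as two hypothetical mappings from node group to class identifier:

```python
def effective_class(node_group, manual, automatic):
    """Return the class for a node group; a manual assignment always wins."""
    if node_group in manual:
        return manual[node_group]
    return automatic.get(node_group)


# Hypothetical assignments: automation matched both groups to the same class,
# then the administrator moved one of them to a separate class.
automatic = {"worker": "standard-v3-cf-100", "worker-gpu": "standard-v3-cf-100"}
manual = {"worker-gpu": "gpu-enabled"}

gpu_class = effective_class("worker-gpu", manual, automatic)
worker_class = effective_class("worker", manual, automatic)
```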
“Tariffication” section
The “Tariffication” section in the side menu contains three tabs:
- Compute classes — list of classes, create/edit/delete actions, class detail page.
- Storage classes — list of classes, create/edit/delete actions, class detail page (read-only).
- Tariffs — list of tariffs, tariff creation and editing.
When a tab has a mismatch (node groups without a class, Prometheus metrics without a matching class, resources without a price in the tariff, etc.), an orange indicator appears on the tab title, and inside — a warning with a description of the issue and hints for resolution.
Compute classes
The tab shows a table with class name, description, and the list of clusters where the class is used. Clicking a name opens the detail page with the list of node groups across all linked clusters. There you can:
- manually assign the class to an additional node group (for example, in a static cluster);
- remove an assignment if the class was applied automatically but is not a fit for that particular cluster;
- see which assignments are manual (take precedence) and which are automatic.
The same tab shows Commander’s suggestions:
- “Create a compute class for this node group”: shown if Commander has observed an instance type in the cluster that does not yet have a matching class;
- “Class for a label from Prometheus”: shown if historical metrics contain values of the billing.commander.deckhouse.io/name label with no matching class (typical after an upgrade from previous versions).
Accepting a suggestion creates the class automatically; before accepting, the generated name can be edited.
Storage classes
The tab layout is similar. One difference: storage class assignments to clusters are fully automatic by name match with Kubernetes storage classes. In this Commander version the storage class detail page is read-only — the list of clusters where a Kubernetes storage class with this name exists is maintained automatically. Manual assignment is not provided for storage classes.
This is a technical limitation of the current release: storage consumption metrics are keyed by the Kubernetes storage class name, and there is no way yet to tag a StorageClass object with an arbitrary billing label (similar to the billing.commander.deckhouse.io/name label on nodes). Therefore, if a separate price for a specific cluster is required, create a Kubernetes storage class with a distinct name in that cluster — it will appear in billing as a separate class automatically.
Tariffs
The tab shows the tariff list: name, calculation model, start date, status. The Create tariff button opens the form, where:
- the “Compute resources” block takes rows of the form “compute class + CPU price + memory price” — the class is selected from a dropdown of existing classes;
- the “Storage” block takes rows of the form “storage class + per-gigabyte price” — again, the class is selected from the list;
- the calculation model, aggregation model, and tariff start date are set.
Only one tariff is active at any given moment — the one with the latest start date among those whose start dates have already arrived. All other tariffs in the list are preserved as price history: they apply to past periods whose hours fall within their effective date ranges. To “update prices”, do not edit the active tariff — create a new tariff with a later start date, and it automatically becomes the active one.
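The selection rule can be sketched as follows: among the tariffs whose start date has already arrived, take the one with the latest start date. The dictionary layout and tariff names are hypothetical.

```python
from datetime import datetime


def active_tariff(tariffs, moment):
    """Return the tariff in effect at `moment`, or None if none has started yet."""
    started = [t for t in tariffs if t["start"] <= moment]
    return max(started, key=lambda t: t["start"], default=None)


# Hypothetical tariff history.
tariffs = [
    {"name": "Prices 2025", "start": datetime(2025, 1, 1)},
    {"name": "Prices 2026", "start": datetime(2026, 1, 1)},
]

mid_2025 = active_tariff(tariffs, datetime(2025, 6, 1))
early_2026 = active_tariff(tariffs, datetime(2026, 2, 1))
before_any = active_tariff(tariffs, datetime(2024, 6, 1))
```

Applying the same function to each hour of a past period yields the historical price for that hour, which is why older tariffs are kept instead of being edited.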
If some compute classes or storage classes exist in the system but are not listed in the current tariff, a warning appears on the tab so the administrator does not forget to price them.
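The condition behind this warning is a set difference between the classes that exist and the classes priced in the current tariff. A minimal sketch with hypothetical data structures:

```python
def unpriced_classes(existing_classes, tariff_prices):
    """Return identifiers of classes that have no price in the tariff."""
    return sorted(set(existing_classes) - set(tariff_prices))


# Hypothetical state: three classes exist, two of them are priced.
existing = ["m1.large", "m1.xlarge", "c4.large"]
prices = {"m1.large": 0.80, "m1.xlarge": 0.80}

missing = unpriced_classes(existing, prices)
```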
Creating a tariff
Creating a tariff comes down to one large decision — “what price is fair” — and a handful of mechanical steps. The examples below cover typical situations.
Common steps
- Go to Tariffication → Tariffs and click Create tariff.
- Enter the tariff name (for example, “Yandex Cloud — 2026” or “Internal R&D rate, Q1”).
- Select the calculation and aggregation model.
- Set the tariff start date (today by default).
- In the Compute resources block, add each compute class one by one and set the CPU and memory price for each.
- In the Storage block, add storage classes and the per-gigabyte price.
- Save the tariff.
The dashboard and reports start using the tariff from its start date; tariffs can be entered or canceled retroactively.
Example: Yandex Cloud (or any public provider)
Prices are taken from the provider’s price list and normalized to one hour.
| Compute class | CPU, ₽/core·h | Memory, ₽/GiB·h |
|---|---|---|
| standard-v3-cf-100 | 1.50 | 0.40 |
| standard-v3-cf-50 | 0.75 | 0.40 |

| Storage class | ₽/GiB·h |
|---|---|
| network-ssd | 0.02 |
| network-hdd | 0.005 |
Rates are illustrative; for a public cloud it is convenient to keep them in sync with the provider’s official price list.
Example: private cloud (vSphere, OpenStack)
A private cloud has no external price from a provider — the organization owns the infrastructure. The typical approach is to build an internal price list (chargeback) based on:
- depreciation of servers and storage over the expected service life (usually 3–5 years);
- electricity, cooling, and data center placement costs;
- hypervisor, storage, and related software license costs;
- proportional cost of operations staff;
- an overhead for idle capacity and reserves.
The resulting hourly CPU and per-gigabyte memory prices go into the tariff and are applied to the private cloud’s classes exactly as for a public cloud.
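One possible way to turn the cost components listed above into hourly prices is to divide the yearly cost attributed to a resource by its forecast consumption. The sketch below is an illustration under stated assumptions: all figures are invented, and the 70/30 split of cost between CPU and memory is an arbitrary choice that each organization would make for itself.

```python
HOURS_PER_YEAR = 365 * 24


def hourly_unit_price(annual_cost, forecast_units):
    """Price per unit-hour given yearly cost and average units consumed."""
    if forecast_units <= 0:
        raise ValueError("forecast_units must be positive")
    return annual_cost / (forecast_units * HOURS_PER_YEAR)


# Hypothetical yearly figures in rubles.
annual_costs = {
    "depreciation": 4_000_000,
    "power_and_hosting": 1_200_000,
    "licenses": 800_000,
    "operations_staff": 2_000_000,
}
total = sum(annual_costs.values())

CPU_SHARE = 0.7  # assumption: share of total cost attributed to CPU

# Forecasts are for consumed capacity, not installed capacity, so idle
# reserves are paid for through a higher unit price.
cpu_price = hourly_unit_price(total * CPU_SHARE, forecast_units=800)          # cores
memory_price = hourly_unit_price(total * (1 - CPU_SHARE), forecast_units=1100)  # GiB
```

With these inputs the CPU price comes to about 0.80 per core-hour and the memory price to about 0.25 per GiB-hour.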
Example (OpenStack):
| Compute class | CPU, ₽/core·h | Memory, ₽/GiB·h |
|---|---|---|
| m1.large | 0.80 | 0.25 |
| m1.xlarge | 0.80 | 0.25 |
| c4.large (compute-optimized flavor) | 1.00 | 0.25 |
Example (vSphere, class names are formed as {numCPUs}c-{memory}m):
| Compute class | CPU, ₽/core·h | Memory, ₽/GiB·h |
|---|---|---|
| 4c-8192m | 0.70 | 0.20 |
| 8c-16384m | 0.70 | 0.20 |
| 16c-32768m | 0.70 | 0.20 |
For a private cloud, tariff rates are not required to match market prices. The important thing is that the total computed by the tariff for a period matches the amount the organization expects to charge back to consuming business units.
For private cloud storage, the tariff usually lists:
- a separate rate for “fast” storage (NVMe, SSD) with a higher price;
- a separate rate for “bulk” storage (HDD, object storage) with a lower price;
- where necessary, a separate rate for replicated or encrypted classes.
If the hardware formally belongs to another legal entity (a parent company, a group service company), the tariff’s role is to fix a transfer price: it is agreed with the finance function and does not have to match either the market or the internal cost of service.
Example: static cluster (bare-metal)
For a static cluster Commander does not create compute classes automatically. The workflow is:
- In Tariffication → Compute classes create classes that reflect the actual node types (for example, bm-cpu-2x32, bm-gpu-a100).
- Additionally, the billing.commander.deckhouse.io/name label with the same class name can be pre-set in the node group template. Commander will then link the class to the node group automatically when nodes appear in the cluster.
- Go to the class detail page and manually assign the class to the corresponding node groups.
- No action is required for storage classes — they are created automatically by Kubernetes storage class names.
- Create the tariff and set the prices.
The pricing methodology for a static cluster is typically the same as for a private cloud: depreciation, electricity, licensing, and operations divided by the forecast total consumption.
Calculation and aggregation models
The calculation model defines which consumption value is multiplied by the price:
- By requests: uses resources.requests declared on pods. Suitable when planned load matters more than actual load. Required for correct billing: every pod whose cost should be included must declare resources.requests for CPU and memory. Pods without requests are counted as zero consumption in this model and their resources are not billed.
- By real usage: uses actual CPU and memory consumption by pods. Closer to the public-cloud model; requests are not required for the calculation.
The aggregation model defines how the hourly consumption value is obtained from more frequent measurements:
- By average — hourly average;
- 95th percentile — aggregation by peak load, smoothing out rare spikes.
The minimum tariffication unit is one hour.
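The two models can be illustrated with a short sketch. The field names and functions are hypothetical, and the percentile uses the nearest-rank method as an assumption; the exact percentile definition Commander applies is not stated here and may differ (for example, by interpolating between samples).

```python
from statistics import fmean


def billed_cpu(pod, model):
    """Pick the CPU value to bill for one pod under the calculation model."""
    if model == "requests":
        # Pods without requests count as zero consumption in this model.
        return pod.get("cpu_request") or 0.0
    if model == "usage":
        return pod["cpu_usage"]
    raise ValueError(f"unknown calculation model: {model}")


def hourly_average(samples):
    """Aggregate one hour of samples by their arithmetic mean."""
    return fmean(samples)


def hourly_p95(samples):
    """Aggregate one hour of samples by the nearest-rank 95th percentile."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


# Twenty hypothetical samples within one hour: steady load with one spike.
samples = [1.0] * 19 + [10.0]
average = hourly_average(samples)
p95 = hourly_p95(samples)
```

In this example the single spike raises the average but does not affect the 95th percentile; a spike lasting two or more of the twenty samples would.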
“Dashboard” section
The dashboard is intended for real-time cost analysis.
The dashboard and reports display only those resources for which metrics are actually arriving. If a resource has been created but data has not yet reached Prometheus, it will not appear in filters or tables. This is not a bug: the system shows the actual consumption picture, not the declarative configuration state. Typical reasons for a delay:
- the agent has not yet transmitted node group information;
- the billing.commander.deckhouse.io/name label has not yet been applied to nodes (class not linked to a tariff);
- Prometheus has not yet performed the first scrape with the new label value;
- the remote write interval has not yet elapsed.
Typically, data appears within a few minutes after bindings are configured.
Available slices and filters:
- workspace;
- cluster;
- project (for workspaces where projects are enabled);
- namespace;
- controller (Deployment, StatefulSet, DaemonSet, etc.);
- resource kind — CPU, memory, storage;
- compute class;
- storage class.
For the selected filter the dashboard shows charts and tables of cost grouped by day, week, or month, together with the total for the period. All filter values are derived only from resources the current user has access to.
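Grouping by day, week, or month can be pictured as bucketing hourly cost points by a period key and summing each bucket. The sketch below is an illustration only: the function names and data are hypothetical, and ISO week numbering is an assumption about how weeks are delimited.

```python
from collections import defaultdict
from datetime import datetime


def bucket_key(moment, granularity):
    """Return the period label an hourly data point falls into."""
    if granularity == "day":
        return moment.strftime("%Y-%m-%d")
    if granularity == "week":
        year, week, _ = moment.isocalendar()  # ISO weeks start on Monday
        return f"{year}-W{week:02d}"
    if granularity == "month":
        return moment.strftime("%Y-%m")
    raise ValueError(f"unknown granularity: {granularity}")


def group_cost(points, granularity):
    """Sum (timestamp, cost) pairs per period and return the overall total."""
    buckets = defaultdict(float)
    for moment, cost in points:
        buckets[bucket_key(moment, granularity)] += cost
    return dict(buckets), sum(buckets.values())


# Hypothetical hourly cost points.
points = [
    (datetime(2026, 1, 5, 10), 4.0),
    (datetime(2026, 1, 5, 11), 6.0),
    (datetime(2026, 2, 1, 0), 2.5),
]

by_day, total = group_cost(points, "day")
by_month, _ = group_cost(points, "month")
```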
“Reports” section
The reports section is used to generate analytical reports on demand and on schedule.
Main elements:
- the All reports tab — a list of already generated reports available for download;
- the Manage recurring reports tab — a list of schedules that produce periodic reports.
Creating an instant report
- Click Create.
- Enter the report name.
- Choose type Instant.
- Set the period.
- Select nested columns for detail (for example, workspace, project, cluster, namespace).
- To split cost by resource kind, enable Break down by resources.
- Click Save. The report appears in the list and starts generating.
- The completed report is available for download as CSV (comma separator, first row contains column names).
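Because the export is a comma-separated file with a header row, it can be processed with standard CSV tooling. The sketch below totals cost per value of one column; the column names workspace, namespace, and cost and the sample rows are hypothetical, since the actual columns depend on the detail options chosen for the report.

```python
import csv
import io
from collections import defaultdict

# Hypothetical report content; a real report would be opened from a file.
SAMPLE_REPORT = (
    "workspace,namespace,cost\n"
    "team-a,web,12.5\n"
    "team-a,db,7.5\n"
    "team-b,web,3.0\n"
)


def cost_by(column, report_file, cost_column="cost"):
    """Sum the cost column per distinct value of `column`."""
    totals = defaultdict(float)
    # DictReader takes field names from the first row of the file.
    for row in csv.DictReader(report_file):
        totals[row[column]] += float(row[cost_column])
    return dict(totals)


per_workspace = cost_by("workspace", io.StringIO(SAMPLE_REPORT))
```

For a downloaded report, pass a file object opened with newline="" instead of the in-memory sample.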
Creating a recurring report
- Click Create.
- Enter the report name.
- Choose type Recurring.
- Set the start date and frequency.
- Select columns for detail; enable Break down by resources if needed.
- Click Save. The schedule appears in the Manage recurring reports tab.
Reports are generated automatically on schedule and land in the All reports tab.
Migration from earlier versions
If billing in your installation was set up before compute classes and storage classes existed as separate entities (for example, you manually added the billing.commander.deckhouse.io/name label to node groups and specified names in the tariff), nothing breaks after the upgrade — billing continues to work without interruption. No action is required. Historical dashboard and report data is preserved in full.
What happens automatically
- Data migration: for every label name that was specified in a tariff, a compute class with the same name is created and its link to the tariff (price) is preserved.
- Resource discovery: when generating a compute class for a cloud node group, the automation checks whether the node group already has a billing.commander.deckhouse.io/name label. If the label exists, the class is created with the same name. This ensures the auto-created class matches what is already in use, and no conflicts arise.
- Storage classes are created immediately from agent data.
After the upgrade you can remove the billing.commander.deckhouse.io/name label from node group templates (spec.nodeTemplate.labels). The agent now applies the label to nodes on its own, and the manual template label is no longer needed. If left in place, the label is simply duplicated with no negative effects.