Module lifecycle stage: Experimental
The module has requirements for installation

This guide covers administrative tasks: enabling the module, connecting S3-compatible storage, choosing model delivery, preparing node-local cache, configuring distribution, and checking runtime health.

Requirements

  • Deckhouse Kubernetes Platform >= 1.74.
  • Kubernetes >= 1.30.
  • S3-compatible object storage and a bucket for DMCR data and model preparation staging data.
  • Secret in d8-system with accessKey and secretKey.
  • RWX StorageClass for SharedPVC.
  • sds-node-configurator and sds-local-volume modules for NodeCache.

Enablement

Create a Secret with object storage credentials:

apiVersion: v1
kind: Secret
metadata:
  name: ai-models-artifacts
  namespace: d8-system
type: Opaque
stringData:
  accessKey: "<access-key>"
  secretKey: "<secret-key>"

Enable the module:

apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: ai-models
spec:
  enabled: true
  version: 1
  settings:
    logLevel: Info
    artifacts:
      bucket: ai-models
      endpoint: https://s3.example.com
      region: us-east-1
      credentialsSecretName: ai-models-artifacts
      usePathStyle: true

If object storage uses a custom CA, add ca.crt to a separate Secret in d8-system and set artifacts.caSecretName. You can also put ca.crt into the credentials Secret; the module uses it as a trust source.
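A minimal sketch of the separate-Secret variant follows. The Secret name ai-models-artifacts-ca is hypothetical; only the ca.crt key, the d8-system namespace, and the artifacts.caSecretName setting come from this guide. The remaining artifacts settings repeat the example above.

```yaml
# Hypothetical Secret name; the certificate body is a placeholder.
apiVersion: v1
kind: Secret
metadata:
  name: ai-models-artifacts-ca
  namespace: d8-system
type: Opaque
stringData:
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    <PEM-encoded CA certificate>
    -----END CERTIFICATE-----
---
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: ai-models
spec:
  enabled: true
  version: 1
  settings:
    artifacts:
      bucket: ai-models
      endpoint: https://s3.example.com
      region: us-east-1
      credentialsSecretName: ai-models-artifacts
      usePathStyle: true
      # Points at the CA Secret defined above.
      caSecretName: ai-models-artifacts-ca
```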

The module-local Secret in d8-ai-models is rendered by Helm from data prepared by the synchronization hook. The administrator manages only the source Secret in d8-system.

Model Delivery

delivery.type selects how models in the Ready phase are attached to workloads. This setting is local to one cluster; it is not a model source and not an external catalog. If the delivery block is omitted, SharedPVC is used.

SharedPVC

SharedPVC fits clusters with storage that supports ReadWriteMany:

spec:
  settings:
    delivery:
      type: SharedPVC
      sharedPVCStorageClassName: rwx-storage-class

When sharedPVCStorageClassName is empty, storage class resolution uses:

  1. global.modules.storageClass;
  2. global.defaultClusterStorageClass;
  3. Kubernetes default StorageClass.

The selected class must exist. The storage provisioner must then bind a ReadWriteMany PVC. If the class is not found, the controller keeps the workload gated with reason SharedPVCStorageClassMissing. If more than one Kubernetes default StorageClass exists, the controller keeps the workload gated with reason SharedPVCStorageClassAmbiguous; set an explicit module or Deckhouse global storage class to make the choice deterministic. If the provisioner cannot bind the PVC, inspect the PVC events for the storage-side reason.
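As a sketch of the Deckhouse global option, the fragment below pins the module storage class in the global ModuleConfig so that resolution stops at the first step of the list above. The class name is a placeholder, and spec.version is an assumption that should be checked against your platform release. This setting also affects other modules, so setting delivery.sharedPVCStorageClassName is the narrower choice when only this module needs an RWX class.

```yaml
# Assumption: spec.version matches the global settings schema of your release.
# rwx-storage-class is a placeholder for an existing ReadWriteMany-capable class.
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: global
spec:
  version: 2
  settings:
    modules:
      storageClass: rwx-storage-class
```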

A local RWO PVC is not a separate delivery mode. If the model must be kept near applications on selected nodes, use NodeCache: the module creates a node cache and exposes the model to workloads through a read-only CSI mount.

NodeCache

NodeCache is intended for large models and repeated model reuse by multiple workloads on the same node.

  1. Enable sds-node-configurator and sds-local-volume.

  2. Label cache nodes:

    d8 k label node <node-name> ai.deckhouse.io/model-cache=true

  3. Label free BlockDevice objects:

    d8 k label blockdevice <block-device-name> ai.deckhouse.io/model-cache=true

  4. Enable NodeCache delivery:

    spec:
      settings:
        delivery:
          type: NodeCache
          nodeCacheSize: 200Gi

By default, nodes and block devices are selected by ai.deckhouse.io/model-cache=true. If the cluster has a different labeling scheme, set delivery.nodeCacheNodeSelector and delivery.nodeCacheBlockDeviceSelector.
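The fragment below illustrates what custom selectors might look like. This guide does not describe the selector schema, so the matchLabels form is an assumption, and the label key and value are hypothetical. Check the module settings schema before applying anything like this.

```yaml
# Assumption: both selectors take a Kubernetes-style label selector (matchLabels).
# example.com/model-cache-pool is a hypothetical label.
spec:
  settings:
    delivery:
      type: NodeCache
      nodeCacheSize: 200Gi
      nodeCacheNodeSelector:
        matchLabels:
          example.com/model-cache-pool: inference
      nodeCacheBlockDeviceSelector:
        matchLabels:
          example.com/model-cache-pool: inference
```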

Check substrate state:

d8 k get blockdevices.storage.deckhouse.io -o wide
d8 k get lvmvolumegroupsets.storage.deckhouse.io
d8 k get lvmvolumegroups.storage.deckhouse.io
d8 k get localstorageclasses.storage.deckhouse.io
d8 k -n d8-ai-models get pods,pvc -l app=ai-models-node-cache-runtime -o wide

The selected disk must be free and have consumable=true.

Storage Limit

artifacts.capacityLimit sets the total budget for module-owned artifacts:

spec:
  settings:
    artifacts:
      capacityLimit: 500Gi

When the limit is set, the upload gateway accepts an upload only if the payload size is known in advance. A regular curl -T sends Content-Length; multipart clients report the size through /probe.

Object Storage and DMCR

The bucket from artifacts.bucket is owned by the module. Do not store unrelated data there and do not delete objects manually: the controller and DMCR keep their own references between objects, and manual deletion can break a local model copy or a later workload delivery retry.

DMCR (Deckhouse Model Container Registry) is the module’s internal OCI registry. It stores prepared models as OCI artifacts on top of the configured S3-compatible bucket. The administrator configures the bucket and credentials, but does not manage OCI paths, tags, service links, or DMCR objects by hand.

A model in DMCR is not stored as a single file. The controller packages the source model files as an internal OCI ModelPack artifact without changing the model weight format. That is why one model can appear in the object storage UI as dozens or hundreds of objects. Some objects are manifests, configs, layers, and registry links; some are source upload or source mirror staging data; some are service markers that allow preparation replay and safe cleanup.

The following groups are useful for orientation in an S3-compatible storage UI. Prefix names are internal structure and are not a stable API.

| Object group | Purpose | Cleanup behavior |
|---|---|---|
| docker/registry/... | Internal OCI registry data and metadata: manifests, configs, repository links, and model layers. | Removed after owner deletion and a successful garbage-collection cycle. Shared layers stay while another model needs them. |
| raw/... | Preparation staging data: uploaded files, HuggingFace/Ollama source snapshots, or replay data for preparation. | Removed after model deletion or the related cleanup procedure. |
| _ai_models/direct-upload/... | Physical direct-upload and multipart objects that are later attached to the OCI artifact. | Removed after successful finalization or as stale orphaned data. |
| Open multipart uploads | Unfinished upload parts that may not appear as ordinary objects in every UI. | Garbage collection aborts stale multipart uploads separately from object deletion. |

Object count is not the model count. A small object can be only a service link, while one large model layer can occupy gigabytes. One layer can also be shared by several artifacts, so deleting one model does not always free exactly that model size immediately.

Deleting a Model or ClusterModel starts asynchronous cleanup:

  1. The controller removes the model reference from the catalog and queues a cleanup request.
  2. The DMCR cleanup helper coalesces requests, opens a maintenance gate, and waits for replica acknowledgements.
  3. It then removes stale staging prefixes, aborts stale multipart uploads, and runs OCI registry garbage collection.
  4. The cleanup result is published through metrics and logs.

Objects can therefore remain in the bucket right after model deletion. This is normal while alerts D8AIModelsPublicationCleanupBacklogStale and D8AIModelsPublicationCleanupFailed are not firing and the dashboard shows completed cleanup cycles.

Safe checks:

d8 k get models.ai.deckhouse.io -A
d8 k get clustermodels.ai.deckhouse.io
d8 k -n d8-ai-models get secrets -l ai.deckhouse.io/dmcr-gc-request=true
d8 k -n d8-ai-models logs deploy/dmcr -c dmcr-garbage-collection --since=2h

In the logs, look for the message dmcr garbage collection completed; it reports the deleted object count, reclaimed bytes, and deleted registry blob count. If the number of bucket objects keeps growing while cleanup requests are stuck or failed, fix the cause of the alert first. Manual object deletion from the bucket is allowed only as a separate emergency procedure with a verified prefix list.

Model Data Flow

The module has one model preparation path and two workload delivery paths.

Preparation reads a model source, verifies the data, and packages the source files as an internal OCI ModelPack artifact. This is not model weight conversion: GGUF stays GGUF, Safetensors stays Safetensors. DMCR stores that artifact as the local verified copy. Monitoring shows both logical model bytes and stored artifact bytes, so operators can see whether chunking and compression save space in the bucket.

SharedPVC materializes the model from DMCR into a controller-owned RWX PVC in the workload namespace. The materializer Job does not use the Kubernetes API. Progress is measured by DMCR from the signed read grant issued for that Job, so the dashboard can show expected bytes, pulled bytes and throughput per materializer Job.

NodeCache materializes the model from DMCR into the node-local cache. The long-running node-cache runtime reports expected and downloaded bytes per node and artifact identity, plus cache footprint and CSI request latency.

Distribution uses a separate byte path. A consuming cluster first reads the semantic catalog, then imports selected OCI artifacts as local copies. DMCR logs distribution pulls with the consumer identity and exposes pull-rate and byte metrics grouped by transfer purpose.

Cross-Perimeter Distribution

Distribution is a catalog/import plane. It exposes ClusterModel objects in the Ready phase between registry tiers and network zones. It does not change how a model is attached to a workload.

Typical topology:

  1. A publishing cluster in the DMZ exposes ClusterModel objects in the Ready phase.
  2. A consuming cluster in the internal perimeter imports selected models as local copies.
  3. Workload delivery in the internal cluster remains SharedPVC or NodeCache.

This topology is useful when a DMZ cluster is only a distribution tier: it prepares and serves model artifacts, but has no annotated workloads. Distribution therefore remains a separate axis, not a third delivery.type value.

Enable the public catalog mode on the publishing tier:

spec:
  settings:
    distribution:
      mode: PublicCatalog

After the setting is applied, the module prepares the distribution transport on the module public host:

https://ai-models.example.com/api/distribution/v1/models
https://ai-models.example.com/v2

/api/distribution/v1/models serves the semantic catalog: only ClusterModel objects in the Ready phase, without internal registry names or object UIDs. /v2 remains the OCI byte path for controller-owned copy/import workflows.

Consumer Access

Public catalog access uses Kubernetes authentication and authorization in the publishing cluster. The module does not create a separate consumer CRD.

Create a ServiceAccount for each consuming cluster, organization, or perimeter and bind it to the module distribution reader role:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: perimeter-a
  namespace: d8-ai-models
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ai-models-distribution-reader-perimeter-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: d8:ai-models:distribution:reader
subjects:
  - kind: ServiceAccount
    name: perimeter-a
    namespace: d8-ai-models

Issue a token in the publishing cluster and pass only the token value to the consuming administrator through a protected external channel:

d8 k -n d8-ai-models create token perimeter-a --duration=720h

For a long-lived operational credential, create a Kubernetes service-account-token Secret in the publishing cluster and read the token key after Kubernetes fills it:

apiVersion: v1
kind: Secret
metadata:
  name: perimeter-a-token
  namespace: d8-ai-models
  annotations:
    kubernetes.io/service-account.name: perimeter-a
type: kubernetes.io/service-account-token

The module does not issue or move this token automatically between clusters: that requires an external trusted channel or a secret manager. Rotate by issuing a new token in the publishing cluster and updating the consuming cluster Secret. After the Secret is updated, the controller rereads ModelCatalogSource and continues with the new token.

Revoke access by deleting the ClusterRoleBinding or the ServiceAccount. Revocation blocks new catalog requests and new pull grants; short-lived pull grants that were already issued expire by TTL.

Consuming Cluster Setup

Consuming clusters describe upstream catalogs with ModelCatalogSource:

apiVersion: ai.deckhouse.io/v1alpha1
kind: ModelCatalogSource
metadata:
  name: dmz
spec:
  url: https://ai-models.dmz.example.com
  credentialsSecretName: ai-models-dmz-read
  caSecretName: ai-models-dmz-ca

The Secrets named by credentialsSecretName and caSecretName live in d8-system. The controller reads them directly for catalog refresh and pull grant exchange. The credentialsSecretName Secret contains the publishing-cluster ServiceAccount token key; caSecretName contains ca.crt and is needed only for a private CA on the external catalog. These Secrets are not copied to workload namespaces.
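A sketch of the two Secrets referenced by the ModelCatalogSource example above. The key name token is an assumption based on the service-account-token Secret described earlier, and both values are placeholders. The CA Secret can be omitted when the external catalog uses a publicly trusted certificate.

```yaml
# Assumption: the controller reads the bearer token from the "token" key.
apiVersion: v1
kind: Secret
metadata:
  name: ai-models-dmz-read
  namespace: d8-system
type: Opaque
stringData:
  token: "<service-account-token-from-publishing-cluster>"
---
# Needed only for a private CA on the external catalog.
apiVersion: v1
kind: Secret
metadata:
  name: ai-models-dmz-ca
  namespace: d8-system
type: Opaque
stringData:
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    <PEM-encoded CA certificate>
    -----END CERTIFICATE-----
```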

Models are not selected in ModuleConfig. The publishing cluster exports all ClusterModel objects in the Ready phase, and consuming users import selected models through spec.source.catalog.name.

The consuming catalog view is cluster-scoped. It shows remote catalog entries and local copies as Model or ClusterModel references, including the namespace for namespaced Model objects. This projection is available only to module management personas and does not include source URLs, Secret names, tokens, OCI repositories, tags, or blob lists.

Audit

The catalog API authenticates bearer tokens with TokenReview and authorizes requests with SubjectAccessReview: catalog list requires list on clustermodels.ai.deckhouse.io; item lookup and pull grants require get on the selected ClusterModel. It writes structured audit events for catalog_list, catalog_get, pull_grant_issued, catalog_auth_denied, and conflict/error paths with the Kubernetes username and UID. DMCR writes manifest/blob pull events with the same identity, grantID, modelName, digest, and the authorization result. Raw bearer tokens and token hashes are never written to audit.
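Because authorization is expressed as ordinary Kubernetes verbs, a narrower role than the module's reader role could in principle be written with standard RBAC. The sketch below is hypothetical: the ClusterRole name and model name are invented, and it assumes that the SubjectAccessReview for item lookup and pull grants carries the ClusterModel name, so that resourceNames takes effect.

```yaml
# Hypothetical ClusterRole: the consumer may list the catalog but may fetch
# item details and pull grants only for the named ClusterModel.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ai-models-distribution-reader-limited
rules:
  - apiGroups: ["ai.deckhouse.io"]
    resources: ["clustermodels"]
    verbs: ["list"]
  - apiGroups: ["ai.deckhouse.io"]
    resources: ["clustermodels"]
    resourceNames: ["example-model"]
    verbs: ["get"]
```

Bind such a role to the consumer ServiceAccount with a ClusterRoleBinding in the same way as the reader role shown in Consumer Access.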

RBAC

The module uses the Deckhouse access-level model:

| Level | Access |
|---|---|
| User | read Model, ClusterModel, and their statuses |
| Editor | manage namespaced Model |
| ClusterEditor | manage ClusterModel |
| ClusterAdmin | manage ModelCatalogSource, distribution reader bindings, and catalog import projections |
| rbacv2/use | use namespaced Model |
| rbacv2/manage | manage Model, ClusterModel, ModelCatalogSource, ModuleConfig, and catalog import projections |

Upload credentials are exposed through a dedicated Role for one Secret from status.upload.secretName. The Role is created in the model namespace and is named ai-model-upload-reader-<model-name> or gets a stable hash for long names.
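If the administrator grants a user access to the upload credentials manually, a standard RoleBinding to that Role is enough. The sketch below assumes a Model named example-model in the namespace team-a and a user alice@example.com; all three names are hypothetical.

```yaml
# Hypothetical names: namespace team-a, model example-model, user alice@example.com.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: upload-example-model
  namespace: team-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  # Role created by the module for the Secret in status.upload.secretName.
  name: ai-model-upload-reader-example-model
subjects:
  - kind: User
    apiGroup: rbac.authorization.k8s.io
    name: alice@example.com
```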

External Catalog Import Recovery

Catalog import stores frozen provenance: catalog source, model name, catalog revision, and remote digest. This prevents a workload from silently moving to another model version during a later reconcile.

The following failures are recoverable after the administrator fixes the source-side problem:

  • CatalogAuthFailed: the token expired, the Secret was updated, or RBAC on the publishing cluster was fixed;
  • CatalogTLSInvalid: caSecretName or ca.crt was fixed;
  • CatalogSourceNotReady: the external catalog source returned to the Ready phase.

After ModelCatalogSource becomes healthy, the controller retries importing the same frozen model. ManifestInvalid, InsufficientStorage, and an invalid catalog contract are not automatic retry paths: fix the source artifact, storage limits, or the catalog specification first.

Check:

d8 k get modelcatalogsources.ai.deckhouse.io
d8 k describe modelcatalogsource <name>
d8 k -n <namespace> describe model <name>

Monitoring

Check monitoring resources:

d8 k -n d8-ai-models get podmonitor,prometheusrule

Main dashboard sections:

  • Cluster overview. Shows the Model and ClusterModel inventory, objects in Publishing, Ready, and Failed phases, total prepared local copy size, managed workload count, and model references that the controller could not resolve. Start diagnostics here: non-zero Failed objects and unresolved references mean that you should drill down to a specific model or workload.
  • Catalog state. Separate dashboards for namespace-scoped Model objects and cluster-scoped ClusterModel objects help locate whether the problem is in one namespace, in the shared cluster catalog, or in one selected model. Check phase, readiness, conditions, source, format, local copy size, model consumers, and workloads with unresolved delivery.
  • Model preparation. Shows objects currently being prepared, upload and packaging progress, transfer throughput, completion or verification errors, and retries. If progress does not move for a long time, compare transfer throughput with DMCR/bucket state and inspect events on the related Model or ClusterModel.
  • DMCR and bucket. Capacity panels show the configured limit, used, reserved, and available space for prepared local copies. Storage efficiency is shown separately: the logical model size and actual stored bytes can differ because of layer chunking, archiving, and data reuse. The cleanup queue shows pending, active, and failed cleanup requests after model deletion.
  • Workload delivery. Shows which workloads are managed by the module, how many Pods are ready, which delivery mode was selected, and why. For SharedPVC, check PVC state, copy queue, and materializer Job throughput. For unknown delivery mode or unresolved references, verify model name, namespace, and permission to use the model.
  • Node cache. Used for NodeCache mode: shows runtime Pods, bound PVCs, used and available space per node, cache entry count, copy throughput, materialization concurrency, and CSI mount/unmount request latency. Growing latency or low effective free space usually points to a local disk, PVC, or node-cache runtime issue.
  • Catalog distribution. If cross-perimeter distribution is enabled, check public catalog request rate, pull grant issuance, API latency, and import throughput. Authorization errors or growing latency should be compared with consumer RBAC, API-server state, and audit events in the publishing cluster.

Operational Checks

Check components:

d8 k -n d8-ai-models get pods -o wide
d8 k get models.ai.deckhouse.io -A
d8 k get clustermodels.ai.deckhouse.io

Check a model:

d8 k -n <namespace> describe model <name>
d8 k get clustermodel <name> -o yaml

Useful fields:

  • status.phase;
  • status.conditions;
  • status.artifact.digest;
  • status.artifact.sizeBytes;
  • status.resolved.format;
  • status.resolved.supportedEndpointTypes;
  • status.resolved.supportedFeatures.

Disable

When spec.enabled=false, module-owned volatile runtime resources are removed: node-cache runtime Pod/PVC, CSIDriver, LocalStorageClass, LVMVolumeGroupSet, managed LVMVolumeGroup, and StorageClass ai-models-node-cache.
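For reference, disabling the module is the inverse of the enablement step. This minimal sketch reuses the ModuleConfig name from the Enablement section; in an existing ModuleConfig, the settings block can stay in place and only enabled changes.

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: ai-models
spec:
  # Removes the volatile runtime resources listed above.
  enabled: false
```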

Model, ClusterModel, and already prepared local model copies remain. To delete a model, delete the corresponding Model or ClusterModel; the controller finishes cleanup through a finalizer and GC request.