Module lifecycle stage: Experimental.
The module has installation requirements.
This guide covers administrative tasks: enabling the module, connecting S3-compatible storage, choosing a model delivery mode, preparing the node-local cache, configuring distribution, and checking runtime health.
Requirements
- Deckhouse Kubernetes Platform >= 1.74.
- Kubernetes >= 1.30.
- S3-compatible object storage and a bucket for DMCR data and model preparation staging data.
- A Secret in d8-system with accessKey and secretKey.
- An RWX StorageClass for SharedPVC.
- The sds-node-configurator and sds-local-volume modules for NodeCache.
Enablement
Create a Secret with object storage credentials:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: ai-models-artifacts
  namespace: d8-system
type: Opaque
stringData:
  accessKey: "<access-key>"
  secretKey: "<secret-key>"
```

Enable the module:
```yaml
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: ai-models
spec:
  enabled: true
  version: 1
  settings:
    logLevel: Info
    artifacts:
      bucket: ai-models
      endpoint: https://s3.example.com
      region: us-east-1
      credentialsSecretName: ai-models-artifacts
      usePathStyle: true
```

If object storage uses a custom CA, add ca.crt to a separate Secret in
d8-system and set artifacts.caSecretName. Alternatively, put ca.crt into
the credentials Secret; the module also uses it as a trust source.
The module-local Secret in d8-ai-models is rendered by Helm from data
prepared by the synchronization hook. The administrator manages only the source
Secret in d8-system.
Model Delivery
delivery.type selects how models in the Ready phase are attached to
workloads. This setting is local to one cluster; it is not a model source and
not an external catalog. If the delivery block is omitted, SharedPVC is
used.
SharedPVC
SharedPVC fits clusters with storage that supports ReadWriteMany:
```yaml
spec:
  settings:
    delivery:
      type: SharedPVC
      sharedPVCStorageClassName: rwx-storage-class
```

When sharedPVCStorageClassName is empty, storage class resolution falls back, in order, to:

- global.modules.storageClass;
- global.defaultClusterStorageClass;
- the Kubernetes default StorageClass.
The selected class must exist. The storage provisioner must then bind a
ReadWriteMany PVC. If the class is not found, the controller keeps the
workload gated with reason SharedPVCStorageClassMissing. If more than one
Kubernetes default StorageClass exists, the controller keeps the workload
gated with reason SharedPVCStorageClassAmbiguous; set an explicit module or
Deckhouse global storage class to make the choice deterministic. If the
provisioner cannot bind the PVC, inspect the PVC events for the storage-side
reason.
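The resolution rules and gate reasons above can be modeled as a small pure function. This is a minimal sketch, not the controller's code: the function name and arguments are hypothetical, and it assumes that a configured but nonexistent class does not fall through to the next source, and that the absence of any default class is also reported as SharedPVCStorageClassMissing.

```python
from typing import Optional, Tuple

# Gate reasons taken from the text; the function itself is hypothetical.
MISSING = "SharedPVCStorageClassMissing"
AMBIGUOUS = "SharedPVCStorageClassAmbiguous"


def resolve_shared_pvc_class(
    explicit: str,
    modules_storage_class: str,
    global_default: str,
    cluster_classes: dict[str, bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (class_name, None) on success or (None, gate_reason).

    cluster_classes maps every existing StorageClass name to whether it is
    marked as the Kubernetes default StorageClass.
    """
    # Configured sources in priority order: the first non-empty value wins.
    for candidate in (explicit, modules_storage_class, global_default):
        if candidate:
            if candidate in cluster_classes:
                return candidate, None
            return None, MISSING  # assumption: no fall-through on a bad name

    # Last resort: the Kubernetes default StorageClass, which must be unique.
    defaults = [name for name, is_default in cluster_classes.items() if is_default]
    if len(defaults) == 1:
        return defaults[0], None
    if len(defaults) > 1:
        return None, AMBIGUOUS
    return None, MISSING
```

The sketch shows why an explicit module or global class removes the ambiguity: the Kubernetes default is consulted only when every configured source is empty.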
A local RWO PVC is not a separate delivery mode. If the model must be kept near
applications on selected nodes, use NodeCache: the module creates a node
cache and exposes the model to workloads through a read-only CSI mount.
NodeCache
NodeCache is intended for large models and repeated model reuse by multiple
workloads on the same node.
- Enable the sds-node-configurator and sds-local-volume modules.
- Label cache nodes:

  ```shell
  d8 k label node <node-name> ai.deckhouse.io/model-cache=true
  ```

- Label free BlockDevice objects:

  ```shell
  d8 k label blockdevice <block-device-name> ai.deckhouse.io/model-cache=true
  ```

- Enable NodeCache delivery:

  ```yaml
  spec:
    settings:
      delivery:
        type: NodeCache
        nodeCacheSize: 200Gi
  ```
By default, nodes and block devices are selected by
ai.deckhouse.io/model-cache=true. If the cluster has a different labeling
scheme, set delivery.nodeCacheNodeSelector and
delivery.nodeCacheBlockDeviceSelector.
Check substrate state:

```shell
d8 k get blockdevices.storage.deckhouse.io -o wide
d8 k get lvmvolumegroupsets.storage.deckhouse.io
d8 k get lvmvolumegroups.storage.deckhouse.io
d8 k get localstorageclasses.storage.deckhouse.io
d8 k -n d8-ai-models get pods,pvc -l app=ai-models-node-cache-runtime -o wide
```

Each selected disk must be free and have consumable=true.
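To spot labeled disks that cannot be used, the JSON list returned by d8 k get blockdevices.storage.deckhouse.io -o json can be filtered offline. A minimal sketch, assuming the consumable flag is exposed as status.consumable on each BlockDevice and that the default label selector is in use; unusable_cache_devices is a hypothetical helper, and LABEL must be adjusted if delivery.nodeCacheBlockDeviceSelector is customized.

```python
LABEL = "ai.deckhouse.io/model-cache"  # default selector label from this guide


def unusable_cache_devices(block_device_list: dict) -> list[str]:
    """Return names of labeled BlockDevice objects that are not consumable."""
    problems = []
    for item in block_device_list.get("items", []):
        labels = item.get("metadata", {}).get("labels", {})
        if labels.get(LABEL) != "true":
            continue  # not selected for the model cache
        # Assumption: the flag lives at status.consumable.
        if not item.get("status", {}).get("consumable", False):
            problems.append(item["metadata"]["name"])
    return problems
```

Feed it the result of json.load on the command output; an empty list means every labeled device is available for the cache.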
Storage Limit
artifacts.capacityLimit sets the total budget for module-owned artifacts:
```yaml
spec:
  settings:
    artifacts:
      capacityLimit: 500Gi
```

When the limit is set, the upload gateway accepts an upload only if the payload
size is known in advance. A regular curl -T upload sends Content-Length;
multipart clients report the size through /probe.
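The admission rule can be expressed as a budget check. This is a hypothetical sketch, not the gateway's implementation: it assumes the budget is the limit minus used and reserved bytes (the quantities shown on the DMCR capacity panels), and its quantity parser handles only integer values with binary suffixes such as 500Gi, not the full Kubernetes quantity syntax.

```python
from typing import Optional

# Binary suffixes used by Kubernetes quantities such as "500Gi".
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50}


def parse_binary_quantity(q: str) -> int:
    """Parse an integer quantity with an optional binary suffix into bytes."""
    for suffix, factor in _BINARY.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)


def admit_upload(limit: str, used: int, reserved: int,
                 size: Optional[int]) -> tuple[bool, str]:
    """Decide whether an upload fits the artifacts.capacityLimit budget."""
    if size is None:
        # With a limit set, an upload of unknown size cannot be budgeted.
        return False, "payload size unknown"
    available = parse_binary_quantity(limit) - used - reserved
    if size > available:
        return False, f"needs {size} bytes, only {available} available"
    return True, "accepted"
```

The unknown-size branch is the reason a client must send Content-Length or use /probe.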
Object Storage and DMCR
The bucket from artifacts.bucket is owned by the module. Do not store
unrelated data there and do not delete objects manually: the controller and
DMCR keep their own references between objects, and manual deletion can break
a local model copy or a later workload delivery retry.
DMCR (Deckhouse Model Container Registry) is the module’s internal OCI
registry. It stores prepared models as OCI artifacts on top of the configured
S3-compatible bucket. The administrator configures the bucket and credentials,
but does not manage OCI paths, tags, service links, or DMCR objects by hand.
A model in DMCR is not stored as a single file. The controller packages the
source model files as an internal OCI ModelPack artifact without changing the
model weight format. That is why one model can appear in the object storage UI
as dozens or hundreds of objects. Some objects are manifests, configs, layers,
and registry links; some are source upload or source mirror staging data; some
are service markers that allow preparation replay and safe cleanup.
The following groups are useful for orientation in an S3-compatible storage UI. Prefix names are internal structure and are not a stable API.
| Object group | Purpose | Cleanup behavior |
|---|---|---|
| `docker/registry/...` | Internal OCI registry data and metadata: manifests, configs, repository links, and model layers. | Removed after owner deletion and a successful garbage-collection cycle. Shared layers stay while another model needs them. |
| `raw/...` | Preparation staging data: uploaded files, HuggingFace/Ollama source snapshots, or replay data for preparation. | Removed after model deletion or the related cleanup procedure. |
| `_ai_models/direct-upload/...` | Physical direct-upload and multipart objects that are later attached to the OCI artifact. | Removed after successful finalization or as stale orphaned data. |
| Open multipart uploads | Unfinished upload parts that may not appear as ordinary objects in every UI. | Garbage collection aborts stale multipart uploads separately from object deletion. |
Object count is not the model count. A small object can be only a service link, while one large model layer can occupy gigabytes. One layer can also be shared by several artifacts, so deleting one model does not always free exactly that model size immediately.
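Shared layers explain why deleting a model can free less space than its size. A minimal sketch, assuming each artifact is reduced to a set of layer digests and ignoring manifests and staging data; the helper name and digests are illustrative only.

```python
def reclaimable_bytes(artifacts: dict[str, set[str]],
                      layer_sizes: dict[str, int],
                      deleted: str) -> int:
    """Bytes that garbage collection can free after `deleted` is removed."""
    still_referenced: set[str] = set()
    for name, layers in artifacts.items():
        if name != deleted:
            still_referenced |= layers  # layers other artifacts still need
    orphaned = artifacts[deleted] - still_referenced
    return sum(layer_sizes[digest] for digest in orphaned)
```

For example, if m1 uses layers a (10 bytes) and b (5 bytes) and m2 uses b and c, deleting m1 reclaims only 10 bytes, because b stays referenced by m2.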
Deleting a Model or ClusterModel starts asynchronous cleanup:
- The controller removes the model reference from the catalog and queues a cleanup request.
- The DMCR cleanup helper coalesces requests, opens a maintenance gate, and waits for replica acknowledgements.
- It then removes stale staging prefixes, aborts stale multipart uploads, and runs OCI registry garbage collection.
- The cleanup result is published through metrics and logs.
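The ordering of these steps can be illustrated with a toy state machine. This is a hypothetical model, not the DMCR cleanup helper: the class, fields, and replica names are invented, and the only behavior it reproduces is that requests are coalesced and no cleanup step runs until every replica has acknowledged the maintenance gate.

```python
from dataclasses import dataclass, field


@dataclass
class CleanupHelper:
    """Toy model of the cleanup sequence; all names are hypothetical."""

    replicas: set[str]
    pending: list[str] = field(default_factory=list)
    acks: set[str] = field(default_factory=set)
    gate_open: bool = False
    log: list[str] = field(default_factory=list)

    def request(self, model: str) -> None:
        # Deleting a Model or ClusterModel only queues a request.
        self.pending.append(model)

    def acknowledge(self, replica: str) -> None:
        # A DMCR replica confirms that it has seen the maintenance gate.
        self.acks.add(replica)

    def run_cycle(self) -> bool:
        """Try one cleanup cycle; return True only if cleanup ran."""
        if not self.pending:
            return False
        self.gate_open = True  # ask every replica to enter maintenance mode
        if not self.replicas <= self.acks:
            return False  # still waiting; objects stay in the bucket meanwhile
        batch, self.pending = self.pending, []  # coalesce queued requests
        for step in ("remove stale staging prefixes",
                     "abort stale multipart uploads",
                     "run OCI registry garbage collection"):
            self.log.append(f"{step} ({len(batch)} request(s))")
        self.gate_open, self.acks = False, set()
        return True
```

The waiting branch is why objects can remain in the bucket for a while after deletion without indicating a failure.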
Objects can therefore remain in the bucket right after model deletion. This is
normal while alerts D8AIModelsPublicationCleanupBacklogStale and
D8AIModelsPublicationCleanupFailed are not firing and the dashboard shows
completed cleanup cycles.
Safe checks:
```shell
d8 k get models.ai.deckhouse.io -A
d8 k get clustermodels.ai.deckhouse.io
d8 k -n d8-ai-models get secrets -l ai.deckhouse.io/dmcr-gc-request=true
d8 k -n d8-ai-models logs deploy/dmcr -c dmcr-garbage-collection --since=2h
```

In the logs, look for the dmcr garbage collection completed message: it reports
the deleted object count, reclaimed bytes, and deleted registry blob count. If
bucket objects keep growing while cleanup requests are stuck or failed, fix the
cause of the alert first. Manual deletion of bucket objects is allowed only as
a separate emergency procedure with a verified prefix list.
Model Data Flow
The module has one model preparation path and two workload delivery paths.
Preparation reads a model source, verifies the data, and packages the source
files as an internal OCI ModelPack artifact. This is not model weight
conversion: GGUF stays GGUF, Safetensors stays Safetensors. DMCR stores
that artifact as the local verified copy. Monitoring shows both logical model
bytes and stored artifact bytes, so operators can see whether chunking and
compression save space in the bucket.
SharedPVC materializes the model from DMCR into a controller-owned RWX PVC
in the workload namespace. The materializer Job does not use the Kubernetes
API. Progress is measured by DMCR from the signed read grant issued for that
Job, so the dashboard can show expected bytes, pulled bytes and throughput per
materializer Job.
NodeCache materializes the model from DMCR into the node-local cache. The
long-running node-cache runtime reports expected and downloaded bytes per node
and artifact identity, plus cache footprint and CSI request latency.
Distribution uses a separate byte path. A consuming cluster first reads the semantic catalog, then imports selected OCI artifacts as local copies. DMCR logs distribution pulls with the consumer identity and exposes pull rate and byte metrics grouped by transfer purpose.
Cross-Perimeter Distribution
Distribution is a catalog/import plane. It exposes ClusterModel objects in the
Ready phase between registry tiers and network zones. It does not change how a
model is attached to a workload.
Typical topology:
- A publishing cluster in the DMZ exposes ClusterModel objects in the Ready phase.
- A consuming cluster in the internal perimeter imports selected models as local copies.
- Workload delivery in the internal cluster remains SharedPVC or NodeCache.
This topology is useful when a DMZ cluster is only a distribution tier: it
prepares and serves model artifacts, but has no annotated workloads. Distribution
therefore remains a separate axis, not a third delivery.type value.
Enable the public catalog mode on the publishing tier:
```yaml
spec:
  settings:
    distribution:
      mode: PublicCatalog
```

After the setting is applied, the module prepares the distribution transport on the module public host:

- https://ai-models.example.com/api/distribution/v1/models
- https://ai-models.example.com/v2

/api/distribution/v1/models serves the semantic catalog: only ClusterModel
objects in the Ready phase, without internal registry names or object UIDs.
/v2 remains the OCI byte path for controller-owned copy/import workflows.
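A consumer reaches the semantic catalog over HTTPS with a bearer token. A minimal standard-library sketch, assuming only what this guide states (the path and bearer-token authentication); the helper names are hypothetical, the response body format is not documented here, so the sketch returns raw bytes, and ca_file stands for a private CA bundle when one is needed.

```python
import ssl
import urllib.request
from typing import Optional


def catalog_request(public_host: str, token: str) -> urllib.request.Request:
    """Build a GET request for the semantic catalog with a bearer token."""
    url = f"https://{public_host}/api/distribution/v1/models"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


def fetch_catalog(public_host: str, token: str,
                  ca_file: Optional[str] = None) -> bytes:
    """Send the request; pass ca_file when the catalog uses a private CA."""
    context = ssl.create_default_context(cafile=ca_file)
    request = catalog_request(public_host, token)
    with urllib.request.urlopen(request, context=context, timeout=30) as resp:
        return resp.read()
```

Keeping request construction separate from sending makes it easy to check the URL and headers without network access.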
Consumer Access
Public catalog access uses Kubernetes authentication and authorization in the publishing cluster. The module does not create a separate consumer CRD.
Create a ServiceAccount for each consuming cluster, organization, or perimeter and bind it to the module distribution reader role:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: perimeter-a
  namespace: d8-ai-models
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ai-models-distribution-reader-perimeter-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: d8:ai-models:distribution:reader
subjects:
  - kind: ServiceAccount
    name: perimeter-a
    namespace: d8-ai-models
```

Issue a token in the publishing cluster and pass only the token value to the consuming administrator through a protected external channel:
```shell
d8 k -n d8-ai-models create token perimeter-a --duration=720h
```

For a long-lived operational credential, create a Kubernetes
service-account-token Secret in the publishing cluster and read its token
key after Kubernetes populates it:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: perimeter-a-token
  namespace: d8-ai-models
  annotations:
    kubernetes.io/service-account.name: perimeter-a
type: kubernetes.io/service-account-token
```

The module does not issue or move this token between clusters automatically:
that requires an external trusted channel or a secret manager. To rotate the
token, issue a new one in the publishing cluster and update the consuming
cluster Secret. After the Secret is updated, the controller rereads
ModelCatalogSource and continues with the new token.
Revoke access by deleting the ClusterRoleBinding or the ServiceAccount. Revocation blocks new catalog requests and new pull grants; pull grants that were already issued are short-lived and expire by their TTL.
Consuming Cluster Setup
Consuming clusters describe upstream catalogs with ModelCatalogSource:
```yaml
apiVersion: ai.deckhouse.io/v1alpha1
kind: ModelCatalogSource
metadata:
  name: dmz
spec:
  url: https://ai-models.dmz.example.com
  credentialsSecretName: ai-models-dmz-read
  caSecretName: ai-models-dmz-ca
```

The Secrets named by credentialsSecretName and caSecretName live in
d8-system. The controller reads them directly for catalog refresh and pull
grant exchange. The credentialsSecretName Secret contains the
publishing-cluster ServiceAccount token key; the caSecretName Secret contains
ca.crt and is needed only if the external catalog uses a private CA. These
Secrets are not copied to workload namespaces.
Models are not selected in ModuleConfig. The publishing cluster exports all
ClusterModel objects in the Ready phase, and consuming users import
selected models through spec.source.catalog.name.
The consuming catalog view is cluster-scoped. It shows remote catalog entries
and local copies as Model or ClusterModel references, including the
namespace for namespaced Model objects. This projection is available only to
module management personas and does not include source URLs, Secret names,
tokens, OCI repositories, tags, or blob lists.
Audit
The catalog API authenticates bearer tokens with TokenReview and authorizes
requests with SubjectAccessReview: catalog list requires list on
clustermodels.ai.deckhouse.io; item lookup and pull grants require get on
the selected ClusterModel. It writes structured audit events for
catalog_list, catalog_get, pull_grant_issued, catalog_auth_denied, and
conflict/error paths with the Kubernetes username and UID. DMCR writes
manifest/blob pull events with the same identity, grantID,
modelName, digest, and the authorization result. Raw bearer tokens and
token hashes are never written to audit.
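The authorization mapping can be written as the SubjectAccessReview body the API would submit for each request type. A sketch under stated assumptions: the request shape follows the Kubernetes authorization.k8s.io/v1 API, but the function and the action labels (including pull_grant) are hypothetical, and the user, UID, and groups are assumed to come from the preceding TokenReview. The token itself is never an input, matching the rule that tokens are not written to audit.

```python
def catalog_access_review(user: str, uid: str, groups: list[str],
                          action: str, model: str = "") -> dict:
    """Build the SubjectAccessReview that matches one catalog request."""
    attrs = {"group": "ai.deckhouse.io", "resource": "clustermodels"}
    if action == "catalog_list":
        attrs["verb"] = "list"
    elif action in ("catalog_get", "pull_grant"):
        attrs["verb"] = "get"
        attrs["name"] = model  # checked against the selected ClusterModel
    else:
        raise ValueError(f"unknown action: {action}")
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SubjectAccessReview",
        "spec": {"user": user, "uid": uid, "groups": groups,
                 "resourceAttributes": attrs},
    }
```

Because item lookups and pull grants carry the model name, RBAC can grant access to individual ClusterModel objects instead of the whole catalog.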
RBAC
The module uses the Deckhouse access-level model:
| Level | Access |
|---|---|
| User | read Model, ClusterModel, and their statuses |
| Editor | manage namespaced Model |
| ClusterEditor | manage ClusterModel |
| ClusterAdmin | manage ModelCatalogSource, distribution reader bindings, and catalog import projections |
| rbacv2/use | use namespaced Model |
| rbacv2/manage | manage Model, ClusterModel, ModelCatalogSource, ModuleConfig, and catalog import projections |
Upload credentials are exposed through a dedicated Role that grants access to
a single Secret, the one named in status.upload.secretName. The Role is
created in the model namespace and is named ai-model-upload-reader-<model-name>;
for long model names, a stable hash is used in the name instead.
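"Stable hash" means the same model name always yields the same Role name. The sketch below is hypothetical: the 63-character limit, the SHA-256 digest, and the 8-character suffix are assumptions chosen for illustration, not the module's documented scheme.

```python
import hashlib

PREFIX = "ai-model-upload-reader-"
MAX_LEN = 63  # hypothetical length limit used only for this illustration


def upload_reader_role_name(model_name: str) -> str:
    """Return a deterministic Role name for the model's upload Secret."""
    name = PREFIX + model_name
    if len(name) <= MAX_LEN:
        return name
    # Truncate the model name and append a short digest of the full name,
    # so two long names with a common prefix still get different Roles.
    digest = hashlib.sha256(model_name.encode()).hexdigest()[:8]
    keep = MAX_LEN - len(PREFIX) - len(digest) - 1
    return f"{PREFIX}{model_name[:keep].rstrip('-')}-{digest}"
```

Hashing the full name rather than generating a random suffix keeps the name reproducible across reconciles.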
External Catalog Import Recovery
Catalog import stores frozen provenance: catalog source, model name, catalog revision, and remote digest. This prevents a workload from silently moving to another model version during a later reconcile.
The following failures are recoverable after the administrator fixes the source-side problem:
- CatalogAuthFailed: the token expired, the Secret was updated, or RBAC on the publishing cluster was fixed;
- CatalogTLSInvalid: caSecretName or ca.crt was fixed;
- CatalogSourceNotReady: the external catalog source returned to the Ready phase.
After ModelCatalogSource becomes healthy, the controller retries importing
the same frozen model. ManifestInvalid, InsufficientStorage, and an invalid
catalog contract are not automatic retry paths: fix the source artifact,
storage limits, or the catalog specification first.
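The split between automatic and manual recovery can be summarized as a lookup. A minimal sketch using the reason names from this section; the helper and its messages are hypothetical, and because the condition reason for an invalid catalog contract is not named here, it falls into the generic branch.

```python
# Retried by the controller once ModelCatalogSource is healthy again.
RETRY_AFTER_SOURCE_FIX = {"CatalogAuthFailed", "CatalogTLSInvalid",
                          "CatalogSourceNotReady"}
# Not retried automatically: an operator must fix the cause first.
NEEDS_MANUAL_FIX = {"ManifestInvalid", "InsufficientStorage"}


def next_step(reason: str, source_ready: bool) -> str:
    """Suggest the next operator action for a failed catalog import."""
    if reason in RETRY_AFTER_SOURCE_FIX:
        if source_ready:
            return "controller retries import"
        return "fix the source, then wait for Ready"
    if reason in NEEDS_MANUAL_FIX:
        return "fix artifact or storage limits first"
    return "inspect conditions and events"
```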
Check:
```shell
d8 k get modelcatalogsources.ai.deckhouse.io
d8 k describe modelcatalogsource <name>
d8 k -n <namespace> describe model <name>
```

Monitoring
Check monitoring resources:
```shell
d8 k -n d8-ai-models get podmonitor,prometheusrule
```

Main dashboard sections:
- Cluster overview. Shows the Model and ClusterModel inventory, objects in the Publishing, Ready, and Failed phases, the total size of prepared local copies, the managed workload count, and model references that the controller could not resolve. Start diagnostics here: a non-zero Failed count or unresolved references means you should drill down to a specific model or workload.
- Catalog state. Separate dashboards for namespace-scoped Model objects and cluster-scoped ClusterModel objects help determine whether the problem is in one namespace, in the shared cluster catalog, or in one specific model. Check phase, readiness, conditions, source, format, local copy size, model consumers, and workloads with unresolved delivery.
- Model preparation. Shows objects currently being prepared, upload and packaging progress, transfer throughput, completion or verification errors, and retries. If progress stalls for a long time, compare transfer throughput with the DMCR and bucket state and inspect events on the related Model or ClusterModel.
- DMCR and bucket. Capacity panels show the configured limit and the used, reserved, and available space for prepared local copies. Storage efficiency is shown separately: the logical model size and the actual stored bytes can differ because of layer chunking, archiving, and data reuse. The cleanup queue shows pending, active, and failed cleanup requests after model deletion.
- Workload delivery. Shows which workloads are managed by the module, how many Pods are ready, and which delivery mode was selected and why. For SharedPVC, check the PVC state, copy queue, and materializer Job throughput. For an unknown delivery mode or unresolved references, verify the model name, namespace, and permission to use the model.
- Node cache. Used in NodeCache mode: shows runtime Pods, bound PVCs, used and available space per node, cache entry count, copy throughput, materialization concurrency, and CSI mount/unmount request latency. Growing latency or low effective free space usually points to a local disk, PVC, or node-cache runtime issue.
- Catalog distribution. If cross-perimeter distribution is enabled, check the public catalog request rate, pull grant issuance, API latency, and import throughput. Compare authorization errors or growing latency with consumer RBAC, API server state, and audit events in the publishing cluster.
Operational Checks
Check components:
```shell
d8 k -n d8-ai-models get pods -o wide
d8 k get models.ai.deckhouse.io -A
d8 k get clustermodels.ai.deckhouse.io
```

Check a model:

```shell
d8 k -n <namespace> describe model <name>
d8 k get clustermodel <name> -o yaml
```

Useful fields:

- status.phase;
- status.conditions;
- status.artifact.digest;
- status.artifact.sizeBytes;
- status.resolved.format;
- status.resolved.supportedEndpointTypes;
- status.resolved.supportedFeatures.
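The fields listed above can also be extracted programmatically, for example when checking many models at once. A minimal sketch, assuming the object is fetched as JSON (d8 k get clustermodel <name> -o json, parsed with json.loads) and that conditions use the usual Kubernetes type/status layout; summarize_model is a hypothetical helper, not part of the module.

```python
def summarize_model(obj: dict) -> dict:
    """Pick the diagnostic status fields from a Model or ClusterModel."""
    status = obj.get("status", {})
    artifact = status.get("artifact", {})
    resolved = status.get("resolved", {})
    return {
        "phase": status.get("phase"),
        # Collapse conditions into {type: status} for a quick overview.
        "conditions": {c.get("type"): c.get("status")
                       for c in status.get("conditions", [])},
        "digest": artifact.get("digest"),
        "sizeBytes": artifact.get("sizeBytes"),
        "format": resolved.get("format"),
        "endpoints": resolved.get("supportedEndpointTypes", []),
        "features": resolved.get("supportedFeatures", []),
    }
```

Missing fields come back as None or empty lists, which is typical for objects that have not reached the Ready phase yet.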
Disable
When spec.enabled=false, module-owned volatile runtime resources are removed:
node-cache runtime Pod/PVC, CSIDriver, LocalStorageClass,
LVMVolumeGroupSet, managed LVMVolumeGroup, and StorageClass
ai-models-node-cache.
Model, ClusterModel, and already prepared local model copies remain. To
delete a model, delete the corresponding Model or ClusterModel; the controller
finishes cleanup through a finalizer and GC request.