Module lifecycle stage: Experimental.
The module has installation requirements.

Which Delivery Modes Are Supported?

Two modes are supported:

  • SharedPVC — the default mode for clusters with RWX storage;
  • NodeCache — node-local cache backed by SDS and a CSI mount.

SharedPVC creates an RWX PVC in the workload namespace and a materializer Job for each requested model. NodeCache keeps models in a shared per-node cache and mounts them read-only through CSI.

Is The Module Only For LLMs?

No. The module stores and delivers model files in supported formats such as Safetensors, GGUF, and Diffusers. These can be LLMs, embedding models, or any other model whose format the controller supports.

The module does not run an inference runtime and does not decide how the application uses the model. It prepares verified files and mounts them into a Pod.

How Is Distribution Different From Delivery?

Delivery attaches a model to a workload inside the cluster.

Distribution is the catalog/import plane between clusters or network zones. The publishing cluster exposes ClusterModel objects in the Ready phase, and the consuming cluster imports a local copy before workload delivery. Distribution is therefore not a third delivery.type value; it is enabled through distribution.mode=PublicCatalog.

What Exactly Happens After Model Creation?

The controller reads spec.source, receives model data, verifies format and digest, stores a local copy in DMCR, and updates status.

status.phase: Ready means the local copy is verified and stored. Only then can the model be delivered to workloads or exposed through the public catalog.
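The preparation flow above (fetch, verify the digest, store a local copy, then mark the model Ready) can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the controller's code: the `Model` class, the dict standing in for DMCR, the `Failed` phase, and the `DigestMismatch` reason are hypothetical; only the `sha256:<hex>` digest form follows the OCI convention.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Model:
    # Hypothetical stand-in for a Model/ClusterModel object.
    name: str
    expected_digest: str  # OCI-style "sha256:<hex>"
    status: dict = field(default_factory=lambda: {"phase": "Pending"})


def prepare(model: Model, data: bytes, store: dict) -> None:
    """Verify fetched bytes, store a local copy, and only then mark the model Ready."""
    actual = "sha256:" + hashlib.sha256(data).hexdigest()
    if actual != model.expected_digest:
        # Hypothetical failure status: nothing is stored, the model never becomes Ready.
        model.status = {"phase": "Failed", "reason": "DigestMismatch"}
        return
    store[actual] = data  # stands in for the verified local copy in DMCR
    model.status = {"phase": "Ready", "digest": actual}


def can_deliver(model: Model) -> bool:
    # Delivery and catalog publication are allowed only for a verified, stored copy.
    return model.status.get("phase") == "Ready"
```

The point of the ordering is that `status.phase` changes to Ready only after the digest check and the store both succeed.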

What Is ModelPack For?

ModelPack is the module’s internal packaging of source model files as an OCI artifact in DMCR. It is used for verification, replay after failure, cleanup, and repeatable delivery. It is not weight conversion: source file formats do not change. Users do not choose ModelPack, a digest, or a registry path.

For large object-source models, preparation uses the chunked path: the controller reads the source in byte ranges, stores completed chunks in DMCR, and after a restart resumes from the last committed layer instead of starting over.
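The resume behavior of the chunked path can be illustrated with a small sketch. It assumes a fixed chunk size and a `read_range(start, end)` callable (for example, backed by an HTTP Range request); both names are hypothetical, and the list of committed chunks stands in for the layers already stored in DMCR.

```python
from typing import Callable, List


def chunked_copy(read_range: Callable[[int, int], bytes], total_size: int,
                 chunk_size: int, committed: List[bytes]) -> List[bytes]:
    """Copy a source in byte ranges, resuming after the last committed chunk.

    `committed` holds chunks already stored; after a restart the copy continues
    from offset len(committed) * chunk_size instead of from zero.
    """
    offset = len(committed) * chunk_size
    while offset < total_size:
        end = min(offset + chunk_size, total_size)
        chunk = read_range(offset, end)  # e.g. "Range: bytes=offset-(end-1)"
        committed.append(chunk)          # commit only a fully read chunk
        offset = end
    return committed
```

Because a chunk is appended only after it has been read completely, an interruption loses at most the chunk in flight.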

Compression is applied automatically where it reduces stored bytes: chunked payload is compressed with zstd, and a chunk is stored raw when compression would make it larger. Archive layers support the gzip and zstd media types. Monitoring compares logical bytes with stored bytes to show the actual savings.
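The "compress, but keep raw if it does not help" rule and the logical-versus-stored comparison can be sketched like this. The sketch swaps in `zlib` for zstd only because zlib ships with the Python standard library; the function names are hypothetical.

```python
import zlib
from typing import Iterable, Tuple


def encode_chunk(raw: bytes) -> Tuple[str, bytes]:
    """Compress a chunk, but keep the raw bytes when compression does not shrink it.

    zlib stands in for zstd here; the decision rule is the same.
    """
    compressed = zlib.compress(raw, 6)
    if len(compressed) >= len(raw):
        return "raw", raw
    return "zlib", compressed


def saving(chunks: Iterable[bytes]) -> float:
    """Fraction of logical bytes saved in storage (0.0 means no saving)."""
    chunks = list(chunks)
    logical = sum(len(c) for c in chunks)
    stored = sum(len(encode_chunk(c)[1]) for c in chunks)
    return 1 - stored / logical
```

Already-compressed or high-entropy data (typical for quantized weights) falls back to raw, so the stored size never exceeds the logical size.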

Why Are There Thousands Of Objects In The S3 Bucket?

This is normal for prepared models. The module stores an OCI ModelPack, not one file per model: manifest, config, layers, repository links, preparation staging data, and service markers used for replay and cleanup. In the storage UI you may see groups such as docker/registry/..., raw/..., and _ai_models/direct-upload/...; these are internal structure, not a user API.

Object count is not the model count. Deleting a Model or ClusterModel queues cleanup, but physical space is reclaimed later by garbage collection. Do not delete objects manually: check the storage dashboard, cleanup requests, and the dmcr-garbage-collection container logs first.

How Do I See Runtime Download Progress?

For SharedPVC, use the SharedPVC materializer queue and throughput panels on the overview dashboard. The controller publishes the expected byte count for every materializer Job, and DMCR counts the OCI blob bytes pulled under the signed read grant that the Job uses.

For NodeCache, use the node-cache materialization throughput and footprint panels. The node-cache runtime reports expected and downloaded bytes per node and artifact identity while it fills the node-local cache.
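Both delivery modes reduce to the same ratio of downloaded to expected bytes per transfer. A minimal sketch of that arithmetic, assuming a hypothetical `Transfer` record (the actual metric names behind the dashboard panels are not shown here):

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    # One materializer Job (SharedPVC) or one node/artifact pair (NodeCache).
    key: str
    expected_bytes: int
    downloaded_bytes: int


def progress(t: Transfer) -> float:
    """Percent complete, clamped to 100 in case the counter overshoots."""
    if t.expected_bytes <= 0:
        return 0.0  # expected size not published yet
    return min(100.0, 100.0 * t.downloaded_bytes / t.expected_bytes)
```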

How Do I Import From A Distribution Catalog?

Create a cluster-scoped ModelCatalogSource for the upstream catalog. Then create a local Model or ClusterModel with spec.source.catalog.name. Set spec.source.catalog.sourceName only when more than one catalog source is available.
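The rule for spec.source.catalog.sourceName can be shown by building the manifest programmatically. This sketch uses only the fields named above; the apiVersion `ai.deckhouse.io/v1alpha1`, the function name, and the example names are assumptions for illustration.

```python
from typing import List, Optional


def catalog_model(name: str, namespace: str, catalog_name: str,
                  available_sources: List[str],
                  source_name: Optional[str] = None) -> dict:
    """Build a Model manifest that imports `catalog_name` from a catalog."""
    catalog = {"name": catalog_name}
    if len(available_sources) > 1:
        # sourceName is needed only to choose between several catalog sources.
        if source_name not in available_sources:
            raise ValueError("several catalog sources exist; set sourceName")
        catalog["sourceName"] = source_name
    return {
        "apiVersion": "ai.deckhouse.io/v1alpha1",  # assumed API version
        "kind": "Model",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"source": {"catalog": catalog}},
    }
```

With a single ModelCatalogSource the result contains only `catalog.name`; for a ClusterModel, the same `spec` applies without a namespace.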

Will Catalog Import Recover After Token Or CA Problems?

Yes, for source-side failures. CatalogAuthFailed, CatalogTLSInvalid, and CatalogSourceNotReady are retried after the ModelCatalogSource becomes healthy again. The controller keeps the selected catalog revision and remote digest frozen, so recovery does not silently switch to another model version.

ManifestInvalid, InsufficientStorage, and an invalid catalog contract are not automatic retry paths. Fix the artifact, storage limit, or catalog specification first.
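The retry policy above can be summarized as a decision function. This is a sketch, not the controller's logic: the function name and the returned action strings are hypothetical, and treating any reason not listed above as "manual fix" is an assumption.

```python
from typing import Optional, Tuple

# Source-side reasons retried once the ModelCatalogSource is healthy again.
SOURCE_SIDE = {"CatalogAuthFailed", "CatalogTLSInvalid", "CatalogSourceNotReady"}


def next_step(reason: str, source_ready: bool,
              frozen: Tuple[str, str]) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Decide what happens to a failed catalog import.

    `frozen` is the (catalog revision, remote digest) pair selected on the first
    attempt; a retry reuses it, so recovery never switches model versions.
    """
    if reason in SOURCE_SIDE:
        return ("retry", frozen) if source_ready else ("wait", frozen)
    # ManifestInvalid, InsufficientStorage, an invalid catalog contract, and
    # (assumption) any other reason need a manual fix first.
    return ("manual-fix", None)
```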

Where Do I Get A Token For ModelCatalogSource?

On the publishing cluster, the administrator creates a ServiceAccount for the consuming cluster and binds it to ClusterRole d8:ai-models:distribution:reader. Then the administrator issues a token with d8 k -n d8-ai-models create token <service-account> and passes only the token value through a protected external channel. On the consuming cluster, put this token into a Secret in d8-system and set its name in ModelCatalogSource.spec.credentialsSecretName.

The module does not rotate this token by itself: it does not have trusted access to both the publishing and consuming clusters. Rotate by issuing a new token in the publishing cluster and updating the Secret in the consuming cluster. The controller rereads ModelCatalogSource after the Secret update.

Revoke by deleting the RoleBinding or ServiceAccount in the publishing cluster.

How Do I Configure A Private CA For An External Catalog?

Create a Secret with ca.crt in d8-system and set ModelCatalogSource.spec.caSecretName. TLS validation is fail-closed: a wrong or missing CA sets a condition with reason CatalogTLSInvalid, CASecretMissing, or InvalidCABundle.
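A fail-closed CA check can be sketched with Python's `ssl` module. The mapping from failures to reasons follows the text above; treating a Secret without a `ca.crt` key as InvalidCABundle is an assumption, and `secret` is a hypothetical dict of already-decoded Secret data.

```python
import ssl
from typing import Optional


def ca_condition(secret: Optional[dict]) -> Optional[str]:
    """Return a failure reason for the CA Secret, or None if the bundle loads.

    `secret` maps keys to PEM text; None means the Secret named in
    caSecretName does not exist.
    """
    if secret is None:
        return "CASecretMissing"
    pem = secret.get("ca.crt")
    if not pem:
        return "InvalidCABundle"  # assumption: a missing key counts as invalid
    try:
        # Trusts only this bundle (not the system store) and requires a valid
        # certificate and hostname, so a server signed by another CA is rejected.
        ssl.create_default_context(cadata=pem)
    except (ssl.SSLError, ValueError):
        return "InvalidCABundle"
    return None
```

A bundle that loads but does not match the server certificate would fail later, during the TLS handshake, which corresponds to CatalogTLSInvalid.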

Is SDS Required?

SDS is not required for publishing Model and ClusterModel. SDS is required only for delivery.type=NodeCache. SharedPVC requires a StorageClass that can provision ReadWriteMany PVCs.

How Do I Select A StorageClass For SharedPVC?

Set delivery.sharedPVCStorageClassName:

delivery:
  type: SharedPVC
  sharedPVCStorageClassName: rwx-storage-class

An empty value falls back to global.modules.storageClass, then global.defaultClusterStorageClass, then the Kubernetes default StorageClass. If more than one StorageClass is marked as the Kubernetes default, set the value explicitly in the module or Deckhouse global settings.
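The fallback order can be expressed as a small resolver. The `storageclass.kubernetes.io/is-default-class` annotation is the standard Kubernetes marker for the default class; the function name is hypothetical, and raising an error when no default exists is an assumption.

```python
from typing import List

DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def resolve_storage_class(module_value: str, global_modules: str,
                          global_default: str,
                          storage_classes: List[dict]) -> str:
    """Pick the SharedPVC StorageClass using the documented fallback order."""
    for candidate in (module_value, global_modules, global_default):
        if candidate:
            return candidate
    defaults = [
        sc["metadata"]["name"]
        for sc in storage_classes
        if sc["metadata"].get("annotations", {}).get(DEFAULT_ANNOTATION) == "true"
    ]
    if len(defaults) != 1:
        # Ambiguous (or missing) default: the name must be set explicitly.
        raise ValueError(f"expected one default StorageClass, found {len(defaults)}")
    return defaults[0]
```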

How Do I Set Node-Cache Size?

delivery:
  type: NodeCache
  nodeCacheSize: 200Gi

This is the per-node cache size. The same value is used for managed local storage and the node-cache runtime PVC.

Why Does Node-Cache Not Start?

Check selected nodes and BlockDevice objects:

d8 k get nodes --show-labels | grep ai.deckhouse.io/model-cache
d8 k get blockdevices.storage.deckhouse.io -o wide
d8 k describe blockdevice <block-device-name>

The disk must be free, with no leftover LVM or filesystem signatures, must report consumable=true, and must carry labels that match delivery.nodeCacheBlockDeviceSelector.
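The eligibility rule can be checked against the output of the commands above with a short filter. This sketch assumes that BlockDevice objects expose `status.consumable` and that the selector behaves like a Kubernetes `matchLabels` map; the label key in the example is hypothetical.

```python
from typing import Dict, List


def eligible_devices(devices: List[dict], match_labels: Dict[str, str]) -> List[str]:
    """Return the names of BlockDevices that node-cache could consume."""
    names = []
    for bd in devices:
        labels = bd.get("metadata", {}).get("labels", {})
        if not bd.get("status", {}).get("consumable", False):
            continue  # in use, or still carries LVM/filesystem signatures
        if all(labels.get(k) == v for k, v in match_labels.items()):
            names.append(bd["metadata"]["name"])
    return names
```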

Where Is The Upload URL?

status.upload.secretName points to a Secret in the model namespace. The Secret contains:

  • url — upload endpoint without token in the path;
  • token — bearer token.

Access to the Secret is granted through the Role ai-model-upload-reader-<model-name>, which the controller creates.
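A client-side sketch of using the Secret: decode `url` and `token` from the Secret's base64-encoded data, then build an upload request with the bearer token in a header. The function names are hypothetical, and using PUT (the method `curl -T` uses for HTTP) is an assumption about the endpoint.

```python
import base64
import urllib.request
from typing import BinaryIO, Tuple


def decode_upload_secret(data: dict) -> Tuple[str, str]:
    """Decode the url and token keys from the Secret's base64-encoded data."""
    url = base64.b64decode(data["url"]).decode()
    token = base64.b64decode(data["token"]).decode()
    return url, token


def upload_request(url: str, token: str, body: BinaryIO, size: int) -> urllib.request.Request:
    """Build a PUT request comparable to `curl -T model.gguf <url>`."""
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",  # token travels in a header, not the URL
            # urllib cannot size a file object and would fall back to chunked
            # transfer encoding, so declare the payload size explicitly.
            "Content-Length": str(size),
        },
    )
```

To send it, open the file in binary mode, pass `os.path.getsize(path)` as `size`, and call `urllib.request.urlopen` on the request while the file is still open.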

What Does InsufficientStorage Mean?

The new model does not fit: artifacts.capacityLimit is lower than the storage already used and reserved plus the new model size. For uploads, the client must declare the payload size; a regular curl -T upload does this through the Content-Length header.
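The admission arithmetic is a single inequality. In this sketch all values are bytes; the function name and the "Accepted" result are hypothetical, and rejecting an upload with no declared size is an assumption drawn from the Content-Length requirement above.

```python
from typing import Optional

GiB = 2**30


def check_capacity(capacity_limit: int, used: int, reserved: int,
                   new_size: Optional[int]) -> str:
    """Admission rule for a new model under artifacts.capacityLimit."""
    if new_size is None:
        # Without a declared size (e.g. no Content-Length) nothing can be reserved.
        raise ValueError("payload size is required")
    if capacity_limit < used + reserved + new_size:
        return "InsufficientStorage"
    return "Accepted"
```

For example, with a 500 GiB limit, 300 GiB used, and 100 GiB reserved, a 100 GiB model still fits exactly, while a 101 GiB model is rejected.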

How Does An Ollama URL Work?

The controller accepts URLs in the form https://ollama.com/library/<name>[:tag], reads the registry manifest, config, and blob, selects a GGUF layer, and verifies the digest. The Ollama HTML page and a local Ollama daemon are not used for model preparation.
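The steps above (parse the URL, read the registry manifest, select the GGUF layer, verify its digest) can be sketched as follows. Several details are assumptions, not confirmed by this page: the registry endpoint `registry.ollama.ai`, the layer media type `application/vnd.ollama.image.model`, and defaulting a missing tag to `latest`.

```python
import hashlib
import re
from typing import Tuple

OLLAMA_URL = re.compile(
    r"https://ollama\.com/library/([a-z0-9][a-z0-9._-]*)(?::([A-Za-z0-9._-]+))?")
GGUF_MEDIA_TYPE = "application/vnd.ollama.image.model"  # assumed GGUF layer type


def parse_ollama_url(url: str) -> Tuple[str, str]:
    m = OLLAMA_URL.fullmatch(url)
    if not m:
        raise ValueError(f"not an ollama.com library URL: {url}")
    return m.group(1), m.group(2) or "latest"  # assumption: no tag means "latest"


def manifest_url(name: str, tag: str) -> str:
    # Assumed OCI-style registry endpoint; the HTML page is never fetched.
    return f"https://registry.ollama.ai/v2/library/{name}/manifests/{tag}"


def pick_gguf_layer(manifest: dict) -> dict:
    layers = [layer for layer in manifest.get("layers", [])
              if layer.get("mediaType") == GGUF_MEDIA_TYPE]
    if len(layers) != 1:
        raise ValueError(f"expected one GGUF layer, found {len(layers)}")
    return layers[0]


def verify_blob(layer: dict, blob: bytes) -> None:
    actual = "sha256:" + hashlib.sha256(blob).hexdigest()
    if actual != layer["digest"]:
        raise ValueError("blob digest mismatch")
```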

Why Does A Workload Wait For The Model?

The controller adds a scheduling gate and removes it only after the model is resolved and the selected delivery mode is ready. This protects the application from starting with an empty model directory.
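Kubernetes Pod scheduling gates can only be set when a Pod is created and removed afterwards, not added later, so the gate presumably has to be attached at creation time. A sketch of that lifecycle on a Pod dict; the gate name and function names are hypothetical.

```python
GATE = "example.com/model-ready"  # hypothetical gate name


def gate_on_create(pod: dict) -> dict:
    """Attach the gate when the Pod is created (gates cannot be added later)."""
    gates = pod.setdefault("spec", {}).setdefault("schedulingGates", [])
    if {"name": GATE} not in gates:
        gates.append({"name": GATE})
    return pod


def release_gate(pod: dict, model_ready: bool, delivery_ready: bool) -> dict:
    """Remove the gate only when the model and its delivery mode are both ready."""
    if model_ready and delivery_ready:
        spec = pod["spec"]
        spec["schedulingGates"] = [
            g for g in spec.get("schedulingGates", []) if g["name"] != GATE
        ]
    return pod
```

While any gate remains, the scheduler leaves the Pod in the SchedulingGated state, so no container starts against an empty model directory.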

How Do I Avoid Drift In GitOps?

Keep only the source workload in Git, with the module's top-level metadata annotation. The controller-owned volumes, mounts, environment variables, and resolved annotations appear in the live object after reconciliation; do not commit them to Git.

Where Do I Check Diagnostics?

d8 k -n d8-ai-models get pods -o wide
d8 k get models.ai.deckhouse.io -A
d8 k get clustermodels.ai.deckhouse.io
d8 k -n <namespace> describe model <name>

In Grafana, use the module dashboard group: overview, catalog object, model preparation, runtime delivery, and storage.