The module lifecycle stage: Experimental
The module has requirements for installation
Which Delivery Modes Are Supported?
Two modes are supported:
SharedPVC: the default mode for clusters with RWX storage;
NodeCache: a node-local cache backed by SDS and a CSI mount.
SharedPVC creates an RWX PVC in the workload namespace and a materializer Job
for each requested model. NodeCache keeps models in a shared per-node cache
and mounts them read-only through CSI.
Is The Module Only For LLMs?
No. The module stores and delivers model files in supported formats such as
Safetensors, GGUF, and Diffusers. These can be LLMs, embedding models,
or other models when their format is supported by the controller.
The module does not run an inference runtime and does not decide how the application uses the model. It prepares verified files and mounts them into a Pod.
How Is Distribution Different From Delivery?
Delivery attaches a model to a workload inside the cluster.
Distribution is the catalog/import plane between clusters or network zones.
The publishing cluster exposes ClusterModel objects in the Ready phase, and
the consuming cluster imports a local copy before workload delivery. Distribution
is therefore not a third delivery.type value; it is enabled through
distribution.mode=PublicCatalog.
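As a minimal sketch, the two settings can sit side by side in the module settings. The fragment below assumes that distribution lives at the same level as the delivery block shown elsewhere in this FAQ; only the distribution.mode and delivery.type names come from the text above.

```yaml
# Module settings fragment (placement is an assumption, see the note above).
delivery:
  type: SharedPVC          # delivery mode: SharedPVC or NodeCache
distribution:
  mode: PublicCatalog      # enables the catalog plane; not a delivery.type value
```

The point of the layout is that the two keys are independent: changing distribution.mode does not change how workloads receive models.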
What Exactly Happens After Model Creation?
The controller reads spec.source, receives model data, verifies format and
digest, stores a local copy in DMCR, and updates status.
status.phase: Ready means the local copy is verified and stored. Only then
can the model be delivered to workloads or exposed through the public catalog.
What Is ModelPack For?
ModelPack is the module’s internal packaging of source model files as an OCI
artifact in DMCR. It is used for verification, replay after failure, cleanup,
and repeatable delivery. It is not weight conversion: source file formats do
not change. Users do not choose ModelPack, a digest, or a registry path.
For large object-source models, preparation uses the chunked path: the controller reads the source by ranges, stores completed chunks in DMCR, and can continue after restart from the last committed layer.
Compression is automatic where it reduces stored bytes: chunked payload is compressed with zstd, and the raw chunk is kept when compression would make it larger. Archive layers support gzip and zstd media types. Monitoring compares logical bytes with stored bytes to show the actual saving.
Why Are There Thousands Of Objects In The S3 Bucket?
This is normal for prepared models. The module stores each model as an OCI
ModelPack rather than as a single file, so one model produces many objects:
manifest, config, layers, repository links, preparation staging data, and
service markers used for replay and cleanup. In the storage
UI you may see groups such as docker/registry/..., raw/..., and
_ai_models/direct-upload/...; these are internal structure, not a user API.
Object count is not the model count. Deleting a Model or ClusterModel
queues cleanup, but physical space is reclaimed later by garbage collection.
Do not delete objects manually: check the storage dashboard, cleanup requests,
and the dmcr-garbage-collection container logs first.
How Do I See Runtime Download Progress?
For SharedPVC, use the overview dashboard panels for SharedPVC materializer
queue and throughput. The controller publishes the expected bytes for every
materializer Job; DMCR counts pulled OCI blob bytes from the signed read grant
used by that Job.
For NodeCache, use the node-cache materialization throughput and footprint
panels. The node-cache runtime reports expected and downloaded bytes per node
and artifact identity while it fills the node-local cache.
How Do I Import From A Distribution Catalog?
Create a cluster-scoped ModelCatalogSource for the upstream catalog. Then
create a local Model or ClusterModel with spec.source.catalog.name.
Set spec.source.catalog.sourceName only when more than one catalog source is
available.
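A local Model that imports from a catalog might look like the sketch below. The API version v1alpha1, the object names, and the namespace are assumptions for illustration. The reading of name as the catalog entry and of sourceName as the ModelCatalogSource name is also an assumption; check the resource reference before applying.

```yaml
apiVersion: ai.deckhouse.io/v1alpha1   # assumption: the version is not stated in this FAQ
kind: Model
metadata:
  name: example-model                  # hypothetical local name
  namespace: example-namespace         # hypothetical workload namespace
spec:
  source:
    catalog:
      name: example-upstream-model     # hypothetical catalog entry
      # Uncomment only when more than one catalog source is available:
      # sourceName: example-catalog-source   # hypothetical ModelCatalogSource name
```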
Will Catalog Import Recover After Token Or CA Problems?
Yes, for source-side failures. CatalogAuthFailed, CatalogTLSInvalid, and
CatalogSourceNotReady are retried after the ModelCatalogSource becomes
healthy again. The controller keeps the selected catalog revision and remote
digest frozen, so recovery does not silently switch to another model version.
ManifestInvalid, InsufficientStorage, and an invalid catalog contract are
not automatic retry paths. Fix the artifact, storage limit, or catalog
specification first.
Where Do I Get A Token For ModelCatalogSource?
On the publishing cluster, the administrator creates a ServiceAccount for the
consuming cluster and binds it to
ClusterRole d8:ai-models:distribution:reader. Then the administrator issues
a token with d8 k -n d8-ai-models create token <service-account> and
passes only the token value through a protected external channel. On the
consuming cluster, put this token into a Secret in d8-system and set its name
in ModelCatalogSource.spec.credentialsSecretName.
The module does not rotate this token by itself: it does not have trusted
access to both the publishing and consuming clusters. Rotate by issuing a new
token in the publishing cluster and updating the Secret in the consuming
cluster. The controller rereads ModelCatalogSource after the Secret update.
Revoke by deleting the RoleBinding or ServiceAccount in the publishing cluster.
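The publishing-side objects described above can be sketched with standard Kubernetes RBAC manifests. The ServiceAccount and RoleBinding names are hypothetical. Placing both in d8-ai-models and using a RoleBinding (rather than a ClusterRoleBinding) are assumptions based on the token command and the revocation note above.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: catalog-consumer-a             # hypothetical: one account per consuming cluster
  namespace: d8-ai-models
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: catalog-consumer-a-reader      # hypothetical
  namespace: d8-ai-models
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: d8:ai-models:distribution:reader
subjects:
  - kind: ServiceAccount
    name: catalog-consumer-a
    namespace: d8-ai-models
```

With one ServiceAccount per consuming cluster, deleting a single binding revokes access for that cluster only.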
How Do I Configure A Private CA For An External Catalog?
Create a Secret with ca.crt in d8-system and set
ModelCatalogSource.spec.caSecretName. TLS validation is fail-closed: a wrong
or missing CA sets a condition with reason CatalogTLSInvalid,
CASecretMissing, or InvalidCABundle.
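A minimal sketch of the CA Secret and the matching ModelCatalogSource fragment follows. The Secret name is hypothetical and the certificate body is a placeholder; only the ca.crt key, the d8-system namespace, and spec.caSecretName come from the text above.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: upstream-catalog-ca            # hypothetical name
  namespace: d8-system
type: Opaque
stringData:
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    <PEM body of the private CA certificate>
    -----END CERTIFICATE-----
---
# ModelCatalogSource fragment; all other fields are omitted here.
spec:
  caSecretName: upstream-catalog-ca    # must match the Secret name above
```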
Is SDS Required?
SDS is not required for publishing Model and ClusterModel. SDS is required
only for delivery.type=NodeCache. SharedPVC requires a StorageClass that
can provision ReadWriteMany PVCs.
How Do I Select A StorageClass For SharedPVC?
Set delivery.sharedPVCStorageClassName:
delivery:
  type: SharedPVC
  sharedPVCStorageClassName: rwx-storage-class
An empty value uses global.modules.storageClass, then
global.defaultClusterStorageClass, then Kubernetes default StorageClass.
If several Kubernetes default StorageClass objects exist, set the value
explicitly in the module or Deckhouse global settings.
How Do I Set Node-Cache Size?
delivery:
  type: NodeCache
  nodeCacheSize: 200Gi
This is the per-node cache size. The same value is used for managed local storage and the node-cache runtime PVC.
Why Does Node-Cache Not Start?
Check selected nodes and BlockDevice objects:
d8 k get nodes --show-labels | grep ai.deckhouse.io/model-cache
d8 k get blockdevices.storage.deckhouse.io -o wide
d8 k describe blockdevice <block-device-name>
The disk must be free, without old LVM or filesystem signatures, with
consumable=true, and with labels matching
delivery.nodeCacheBlockDeviceSelector.
Where Is The Upload URL?
status.upload.secretName points to a Secret in the model namespace. The
Secret contains:
url: the upload endpoint, without a token in the path;
token: the bearer token.
Read the Secret through Role ai-model-upload-reader-<model-name> created by
the controller.
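For orientation, the upload Secret has roughly the shape sketched below when read back from the cluster. The name and namespace are hypothetical and the values are placeholders; only the url and token keys come from the text above.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example-model-upload           # hypothetical: use the value of status.upload.secretName
  namespace: example-namespace         # hypothetical: the namespace of the Model
data:
  url: "<base64-encoded upload endpoint>"
  token: "<base64-encoded bearer token>"
```

Both values are base64-encoded in the data section, as with any Kubernetes Secret, so decode them before use.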
What Does InsufficientStorage Mean?
InsufficientStorage means that artifacts.capacityLimit is lower than the sum
of the storage already used, the storage already reserved, and the size of the
new model. For uploads, the client must declare the payload size; a regular
curl -T does this through the Content-Length header.
How Does An Ollama URL Work?
The controller accepts URLs in the form
https://ollama.com/library/<name>[:tag], reads the registry manifest,
config, and blob, selects a GGUF layer, and verifies the digest. The Ollama
HTML page and a local Ollama daemon are not used for model preparation.
Why Does A Workload Wait For The Model?
The controller adds a scheduling gate and removes it only after the model is resolved and the selected delivery mode is ready. This protects the application from starting with an empty model directory.
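While the gate is present, the Pod stays unscheduled and its live spec carries a schedulingGates entry. The sketch below uses the standard Kubernetes field; the gate name, Pod name, and image are hypothetical, since the FAQ does not state which gate name the controller uses.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-app                  # hypothetical workload
  namespace: example-namespace         # hypothetical
spec:
  schedulingGates:
    - name: example.ai.deckhouse.io/model-ready   # hypothetical gate name
  containers:
    - name: app
      image: registry.example.com/inference:latest   # hypothetical image
```

Once the gate list is empty, the scheduler treats the Pod like any other, so a long wait usually points to the model or the delivery mode rather than to the scheduler.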
How Do I Avoid Drift In GitOps?
Store only the source workload, with its top-level metadata annotation, in Git. The controller-owned volumes, mounts, environment variables, and resolved annotations appear in the live object after reconciliation and are not part of the manifest kept in Git.
Where Do I Check Diagnostics?
d8 k -n d8-ai-models get pods -o wide
d8 k get models.ai.deckhouse.io -A
d8 k get clustermodels.ai.deckhouse.io
d8 k -n <namespace> describe model <name>
In Grafana, use the module dashboard group: overview, catalog object, model preparation, runtime delivery, and storage.