The module lifecycle stage: Experimental
The module has requirements for installation
This guide is for namespace users and application operators. It shows how to
create a model, import a model from an external catalog, and attach a
model in the Ready phase to a Kubernetes workload without manual registry
credentials, init containers, or Secrets.
The module does not run an inference runtime. It prepares model files in a supported format and mounts them into a Pod; the application reads those files.
Quick Start
- Create a Model or ClusterModel with one source: url, upload, or catalog.
- Wait for status.phase=Ready.
- Add ai.deckhouse.io/model or ai.deckhouse.io/clustermodel to the top-level metadata of a workload.
- Check that the workload Pod starts and sees the model under /data/modelcache/models/<model-name>.
Model Or ClusterModel
| Resource | Scope | Use When |
|---|---|---|
| Model | namespace | The model belongs to one namespace. |
| ClusterModel | cluster | The model is curated by an administrator and shared across namespaces. |
Model can use a Secret from its namespace for a private Hugging Face
repository. ClusterModel does not reference namespaced Secrets.
What Happens After Model Creation
After Model or ClusterModel creation, the module prepares a local model
copy that workloads consume later.
Steps:
- The controller reads spec.source and determines where to get data.
- A worker receives bytes from Hugging Face, Ollama, an upload session, or an external catalog import.
- The worker verifies format, digest, and metadata.
- The controller writes status.artifact.digest, status.artifact.sizeBytes, status.resolved.*, and conditions.
- status.phase becomes Ready, and the model can then be attached to workloads.
Internal DMCR paths, tags, and runtime digests remain controller-owned runtime
and status data. Users work with the Model or ClusterModel name.
status.phase: Ready means the model data is verified, a local copy is stored
in DMCR, and the selected delivery mode can prepare a Pod mount.
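The wait for the Ready phase can be scripted. The following is a minimal sketch, not part of the module: `wait_until_ready` and `phase_from_cluster` are hypothetical helper names, the timeout and polling interval are arbitrary, and treating Failed as a stop condition is a simplification, because some failures are retried by the controller.

```python
import subprocess
import time
from typing import Callable


def phase_from_cluster(namespace: str, name: str) -> str:
    """Read status.phase of a Model with the same CLI the guide uses (d8 k)."""
    result = subprocess.run(
        ["d8", "k", "-n", namespace, "get", "model", name,
         "-o", "jsonpath={.status.phase}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_until_ready(
    get_phase: Callable[[], str],
    timeout_s: float = 600.0,      # arbitrary default, tune for model size
    interval_s: float = 5.0,       # arbitrary polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_phase() until it returns Ready, fails, or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        phase = get_phase()
        if phase == "Ready":
            return phase
        if phase == "Failed":
            # Simplification: stop and let the caller inspect conditions.
            raise RuntimeError("model preparation failed; inspect status conditions")
        if clock() >= deadline:
            raise TimeoutError(f"still in phase {phase!r} after {timeout_s} s")
        sleep(interval_s)
```

A call such as `wait_until_ready(lambda: phase_from_cluster("ai-demo", "bge-m3"))` would block until the model can be attached. Passing the phase getter as a parameter keeps the loop independent of the CLI.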
Model From Hugging Face
apiVersion: ai.deckhouse.io/v1alpha1
kind: Model
metadata:
  name: bge-m3
  namespace: ai-demo
spec:
  source:
    url: https://huggingface.co/BAAI/bge-m3

For a private repository, create a Secret in the same namespace. Supported keys
are token, HF_TOKEN, and HUGGING_FACE_HUB_TOKEN.
apiVersion: v1
kind: Secret
metadata:
  name: hf-private-token
  namespace: ai-demo
type: Opaque
stringData:
  token: hf_xxx
---
apiVersion: ai.deckhouse.io/v1alpha1
kind: Model
metadata:
  name: private-llm
  namespace: ai-demo
spec:
  source:
    url: https://huggingface.co/acme/private-llm
    authSecretRef:
      name: hf-private-token

Check preparation:
d8 k -n ai-demo get model bge-m3
d8 k -n ai-demo describe model bge-m3

Model From Ollama
apiVersion: ai.deckhouse.io/v1alpha1
kind: ClusterModel
metadata:
  name: qwen-gguf
spec:
  source:
    url: https://ollama.com/library/qwen3.6:latest

The URL must have the form https://ollama.com/library/<name>[:tag]. The
controller reads the registry manifest, config, and blob, selects one GGUF
layer, verifies descriptor digest and GGUF magic header, then stores the model
as a regular catalog OCI artifact.
Malformed URLs are reported as status.phase=Failed with reason
UnsupportedSource. Registry, tag, manifest, descriptor, or GGUF payload
errors are reported as Failed with reason PublicationFailed; details are
stored in conditions.
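A client-side pre-check can catch a malformed URL before the object is created. The sketch below is an illustration only, not the controller's validation: `parse_ollama_url` and `has_gguf_magic` are hypothetical names, and the characters accepted in the name and tag are a guess. The magic check relies on the fact that GGUF files begin with the four ASCII bytes `GGUF`.

```python
import re
from typing import NamedTuple, Optional

# Shape taken from the text: https://ollama.com/library/<name>[:tag].
# The allowed characters for <name> and <tag> are an assumption.
_OLLAMA_URL = re.compile(
    r"https://ollama\.com/library/"
    r"(?P<name>[A-Za-z0-9._-]+)(?::(?P<tag>[A-Za-z0-9._-]+))?"
)

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


class OllamaRef(NamedTuple):
    name: str
    tag: Optional[str]  # None when the URL carries no tag


def parse_ollama_url(url: str) -> Optional[OllamaRef]:
    """Return the name and tag, or None when the URL does not fit the form."""
    match = _OLLAMA_URL.fullmatch(url)
    if match is None:
        return None
    return OllamaRef(match.group("name"), match.group("tag"))


def has_gguf_magic(head: bytes) -> bool:
    """Check that the leading bytes of a payload carry the GGUF magic."""
    return head[:4] == GGUF_MAGIC
```

A URL for which `parse_ollama_url` returns `None` is likely to end in UnsupportedSource, although the controller's own rules may be stricter or looser than this pattern.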
Upload Model
Use upload for a local file or archive:
- direct file: a GGUF model, usually a *.gguf file;
- archives: tar, tar.gz, tgz, zip, tar.zst, tar.zstd, tzst;
- archive contents: GGUF, Safetensors, or Diffusers.
A Safetensors archive must contain root config.json and one or more
*.safetensors files. A Diffusers archive must contain root
model_index.json and *.safetensors or *.bin weight files. A GGUF
archive only needs a *.gguf file.
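The archive rules above can be checked locally before an upload. This is a rough sketch under stated assumptions: `archive_suffix` and `guess_format` are hypothetical helpers, weight files are accepted at any depth, matching is case-sensitive, and the order in which formats are tested is a guess rather than the controller's behavior.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional

# Archive suffixes listed in the text, longest first.
ARCHIVE_SUFFIXES = (".tar.zstd", ".tar.zst", ".tar.gz", ".tzst", ".tgz", ".tar", ".zip")


def archive_suffix(filename: str) -> Optional[str]:
    """Return the supported archive suffix of filename, or None."""
    for suffix in ARCHIVE_SUFFIXES:
        if filename.endswith(suffix):
            return suffix
    return None


def guess_format(members: Iterable[str]) -> Optional[str]:
    """Guess the model format from archive member paths (POSIX-style)."""
    paths = [PurePosixPath(m) for m in members]
    root_files = {p.name for p in paths if len(p.parts) == 1}
    suffixes = {p.suffix for p in paths}
    # Diffusers: root model_index.json plus *.safetensors or *.bin weights.
    if "model_index.json" in root_files and suffixes & {".safetensors", ".bin"}:
        return "Diffusers"
    # Safetensors: root config.json plus at least one *.safetensors file.
    if "config.json" in root_files and ".safetensors" in suffixes:
        return "Safetensors"
    # GGUF: any *.gguf file is enough.
    if ".gguf" in suffixes:
        return "GGUF"
    return None
```

Member paths can come from `tarfile.TarFile.getnames()` or `zipfile.ZipFile.namelist()`. A `None` result suggests the archive would not pass format verification.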
apiVersion: ai.deckhouse.io/v1alpha1
kind: Model
metadata:
  name: uploaded-model
  namespace: ai-demo
spec:
  source:
    upload: {}

After creation, the object moves to the WaitForUpload phase. The name of the
Secret that holds the upload URL and bearer token is in status.upload.secretName:

d8 k -n ai-demo get model uploaded-model -o jsonpath='{.status.upload.secretName}{"\n"}'

Reading this Secret requires get access. The controller creates the Role
ai-model-upload-reader-<model-name>, which allows reading only this Secret.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: uploaded-model-uploader
  namespace: ai-demo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ai-model-upload-reader-uploaded-model
subjects:
  - kind: ServiceAccount
    name: model-uploader
    namespace: ai-demo

Upload a file:
# Resolve the upload Secret, then decode the URL and bearer token from it.
UPLOAD_SECRET=$(d8 k -n ai-demo get model uploaded-model -o jsonpath='{.status.upload.secretName}')
UPLOAD_URL=$(d8 k -n ai-demo get secret "$UPLOAD_SECRET" -o jsonpath='{.data.url}' | base64 -d)
UPLOAD_TOKEN=$(d8 k -n ai-demo get secret "$UPLOAD_SECRET" -o jsonpath='{.data.token}' | base64 -d)
# Piping the response through cat keeps stdout off the terminal, so curl shows the upload progress bar.
curl -fS --progress-bar \
  -H "Authorization: Bearer ${UPLOAD_TOKEN}" \
  -T ./model.gguf \
  "$UPLOAD_URL" | cat

For archives, pass the file name explicitly. The controller detects the archive type from the extension:
curl -fS --progress-bar \
  -H "Authorization: Bearer ${UPLOAD_TOKEN}" \
  -T ./model-bundle.zip \
  "${UPLOAD_URL}?filename=model-bundle.zip" | cat

Import From An External Catalog
The administrator configures external catalog sources. Users select a model by catalog name, without a digest or OCI reference:
apiVersion: ai.deckhouse.io/v1alpha1
kind: Model
metadata:
  name: qwen3-8b
  namespace: ai-demo
spec:
  source:
    catalog:
      sourceName: dmz
      name: qwen3-8b

If the cluster has exactly one ModelCatalogSource in the Ready phase, omit
sourceName:
spec:
  source:
    catalog:
      name: qwen3-8b

The controller freezes the upstream revision and remote digest in
status.source.catalog, copies the artifact into the local DMCR,
and only then marks the object Ready. Workload delivery always uses the
local copy.
If the upstream token, CA, or catalog source temporarily fails, the object can
show Failed with CatalogAuthFailed, CatalogTLSInvalid, or
CatalogSourceNotReady. These states are recoverable: after the administrator
fixes the source Secret, CA, RBAC, or source health, the controller retries the
same catalog revision and remote digest.
Status
| Phase | Meaning |
|---|---|
| Pending | The controller is waiting for preflight or has not started preparation yet. |
| WaitForUpload | The upload session is ready and is waiting for a file. |
| Publishing | Model preparation is in progress: receive, verify, and store a local copy. |
| Ready | Model data is verified, the local copy is stored, and metadata is resolved. |
| Failed | Preparation failed; the reason is in conditions. |
| Deleting | Cleanup is in progress. |
Useful fields:
- status.artifact.digest: digest of the prepared OCI artifact;
- status.artifact.sizeBytes: artifact size;
- status.resolved.format: Safetensors, GGUF, or Diffusers;
- status.resolved.family: model family;
- status.resolved.supportedEndpointTypes: serving endpoint types;
- status.resolved.supportedFeatures: model capabilities.
For catalog imports, also check:
- status.source.catalog.sourceName: selected ModelCatalogSource;
- status.source.catalog.catalogRevision: catalog revision used for selection;
- status.source.catalog.remoteDigest: upstream artifact digest being imported locally.
Common catalog import reasons:
- CatalogAuthFailed: credentials or RBAC for the external catalog failed;
- CatalogTLSInvalid: CA for the external catalog is missing or invalid;
- CatalogSourceNotReady: the external catalog source is not healthy yet;
- ManifestInvalid: the imported artifact cannot be verified;
- InsufficientStorage: the local DMCR has no free capacity.
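The fields above can be condensed into a short summary for scripts or dashboards. The sketch below works on the object as returned by `d8 k get model <name> -o json` and parsed with `json.loads`. It is an assumption that conditions follow the usual Kubernetes shape (`type`, `status`, `reason`, `message`), and `summarize_status` is a hypothetical helper, not part of the module.

```python
from typing import Any, Dict, List

# Reasons the guide describes as recoverable once the administrator fixes the source.
RECOVERABLE_CATALOG_REASONS = {
    "CatalogAuthFailed",
    "CatalogTLSInvalid",
    "CatalogSourceNotReady",
}


def summarize_status(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Condense a Model or ClusterModel object into a few status facts."""
    status = obj.get("status") or {}
    phase = status.get("phase")
    # Assumed condition shape: reasons are read from conditions with status "False".
    reasons: List[str] = [
        c["reason"]
        for c in status.get("conditions") or []
        if c.get("status") == "False" and c.get("reason")
    ]
    return {
        "phase": phase,
        "ready": phase == "Ready",
        "digest": (status.get("artifact") or {}).get("digest"),
        "format": (status.get("resolved") or {}).get("format"),
        "reasons": reasons,
        # True only for the reasons the guide explicitly lists as recoverable.
        "listed_as_recoverable": phase == "Failed"
        and any(r in RECOVERABLE_CATALOG_REASONS for r in reasons),
    }
```

A `False` value for `listed_as_recoverable` does not mean the failure is permanent. It only means the reason is not among the three that the guide names as retried automatically.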
Attach A Model To A Workload
Add an annotation to top-level workload metadata. This is the source of truth
for model selection.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: embedder
  namespace: ai-demo
  annotations:
    ai.deckhouse.io/model: bge-m3
spec:
  selector:
    matchLabels:
      app: embedder
  template:
    metadata:
      labels:
        app: embedder
    spec:
      containers:
        - name: app
          image: registry.example.com/embedder:latest

For a cluster-wide model:
metadata:
  annotations:
    ai.deckhouse.io/clustermodel: gemma-small

For multiple models:
metadata:
  annotations:
    ai.deckhouse.io/model: bge-m3,bge-reranker
    ai.deckhouse.io/clustermodel: gemma-small

The controller injects the selected delivery mode. The container receives:
- AI_MODELS_MODELS_DIR=/data/modelcache/models;
- AI_MODELS_MODELS: JSON with model descriptors;
- each model path: /data/modelcache/models/<model-name>.
Model names in one workload must be unique.
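On the application side, the annotations and environment variables can be handled as sketched below. All function names are hypothetical. It is an assumption that the uniqueness rule spans both annotations and that whitespace around names is ignored. The layout of the `AI_MODELS_MODELS` JSON is not described in this guide, so the sketch returns it unparsed beyond `json.loads`.

```python
import json
import os
from pathlib import PurePosixPath
from typing import Any, List, Mapping

MODEL_ANNOTATION = "ai.deckhouse.io/model"
CLUSTERMODEL_ANNOTATION = "ai.deckhouse.io/clustermodel"
DEFAULT_MODELS_DIR = "/data/modelcache/models"  # path given in the guide


def selected_models(annotations: Mapping[str, str]) -> List[str]:
    """List model names from both annotations and reject duplicates."""
    names: List[str] = []
    for key in (MODEL_ANNOTATION, CLUSTERMODEL_ANNOTATION):
        value = annotations.get(key, "")
        names += [part.strip() for part in value.split(",") if part.strip()]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"model names must be unique in one workload: {duplicates}")
    return names


def model_path(name: str, env: Mapping[str, str] = os.environ) -> PurePosixPath:
    """Build the mount path of one model inside the container."""
    return PurePosixPath(env.get("AI_MODELS_MODELS_DIR", DEFAULT_MODELS_DIR)) / name


def model_descriptors(env: Mapping[str, str] = os.environ) -> Any:
    """Parse AI_MODELS_MODELS; the JSON structure is left to the caller."""
    return json.loads(env["AI_MODELS_MODELS"])
```

For the multi-model example above, `selected_models` yields `bge-m3`, `bge-reranker`, and `gemma-small`, and `model_path("bge-m3")` points at `/data/modelcache/models/bge-m3`.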
Delete
d8 k -n ai-demo delete model bge-m3
d8 k delete clustermodel gemma-small

After deletion, the controller performs cleanup through a finalizer. During
cleanup, the object can remain in phase Deleting.