Module lifecycle stage: Experimental.
The module has installation requirements.

This guide is for namespace users and application operators. It shows how to create a model, import a model from an external catalog, and attach a Ready model to a Kubernetes workload without configuring registry credentials, init containers, or Secrets by hand.

The module does not run an inference runtime. It prepares model files in a supported format and mounts them into a Pod; the application reads those files.

Quick Start

  1. Create a Model or ClusterModel with one source: url, upload, or catalog.
  2. Wait for status.phase=Ready.
  3. Add the ai.deckhouse.io/model or ai.deckhouse.io/clustermodel annotation to the workload's top-level metadata.
  4. Check that the workload Pod starts and sees the model under /data/modelcache/models/<model-name>.

Model Or ClusterModel

Resource      Scope      Use when
Model         Namespace  The model belongs to a single namespace.
ClusterModel  Cluster    The model is curated by an administrator and shared across namespaces.

A Model can reference a Secret in its own namespace to access a private Hugging Face repository. A ClusterModel cannot reference namespaced Secrets.

What Happens After Model Creation

After a Model or ClusterModel is created, the module prepares a local copy of the model for workloads to consume.

Steps:

  1. The controller reads spec.source and determines where to get data.
  2. A worker obtains the model data from Hugging Face, Ollama, an upload session, or an external catalog import.
  3. The worker verifies format, digest, and metadata.
  4. The controller writes status.artifact.digest, status.artifact.sizeBytes, status.resolved.*, and conditions.
  5. status.phase becomes Ready, and the model can then be attached to workloads.

Internal DMCR paths, tags, and runtime digests are owned by the controller and appear only in runtime and status data. Users refer to a model only by its Model or ClusterModel name.

status.phase=Ready means that the model data has been verified, a local copy is stored in DMCR, and the selected delivery mode can prepare a Pod mount.

Model From Hugging Face

apiVersion: ai.deckhouse.io/v1alpha1
kind: Model
metadata:
  name: bge-m3
  namespace: ai-demo
spec:
  source:
    url: https://huggingface.co/BAAI/bge-m3

For a private repository, create a Secret in the same namespace as the Model and reference it in spec.source.authSecretRef. The token can be stored under any of these keys: token, HF_TOKEN, or HUGGING_FACE_HUB_TOKEN.

apiVersion: v1
kind: Secret
metadata:
  name: hf-private-token
  namespace: ai-demo
type: Opaque
stringData:
  token: hf_xxx
---
apiVersion: ai.deckhouse.io/v1alpha1
kind: Model
metadata:
  name: private-llm
  namespace: ai-demo
spec:
  source:
    url: https://huggingface.co/acme/private-llm
    authSecretRef:
      name: hf-private-token

Check preparation:

d8 k -n ai-demo get model bge-m3
d8 k -n ai-demo describe model bge-m3
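To script the "wait for Ready" step, the get command above can be wrapped in a polling loop. This is a minimal bash sketch; the function name wait_for_model and its arguments are hypothetical, and it assumes status.conditions uses the usual Kubernetes type/reason fields. Unlike `d8 k wait --for=jsonpath='{.status.phase}'=Ready model/bge-m3`, it stops early on Failed and prints the condition reasons.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a Model or ClusterModel until status.phase is Ready or Failed.
# Usage: wait_for_model <model|clustermodel> <name> [namespace] [timeout_s] [interval_s]
# Return codes: 0 = Ready, 1 = Failed, 2 = get command failed, 3 = timed out.
wait_for_model() {
  local kind="$1" name="$2" ns="${3:-}" timeout="${4:-1800}" interval="${5:-10}"
  local phase
  local deadline=$(( SECONDS + timeout ))
  while :; do
    # ${ns:+-n "$ns"} adds "-n <ns>" only when a namespace is given (ClusterModel has none).
    phase=$(d8 k ${ns:+-n "$ns"} get "$kind" "$name" -o jsonpath='{.status.phase}') || return 2
    case "$phase" in
      Ready)
        echo "$kind/$name is Ready"
        return 0
        ;;
      Failed)
        # Assumption: conditions carry the standard type/reason fields.
        echo "$kind/$name is Failed:" >&2
        d8 k ${ns:+-n "$ns"} get "$kind" "$name" \
          -o jsonpath='{range .status.conditions[*]}{.type}: {.reason}{"\n"}{end}' >&2
        return 1
        ;;
    esac
    if (( SECONDS >= deadline )); then
      echo "timed out waiting for $kind/$name (last phase: ${phase:-<none>})" >&2
      return 3
    fi
    sleep "$interval"
  done
}
```

For example, `wait_for_model model bge-m3 ai-demo` or `wait_for_model clustermodel qwen-gguf "" 3600`. For an upload model, run it after the upload, because the loop keeps waiting while the object is in WaitForUpload.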

Model From Ollama

apiVersion: ai.deckhouse.io/v1alpha1
kind: ClusterModel
metadata:
  name: qwen-gguf
spec:
  source:
    url: https://ollama.com/library/qwen3.6:latest

The URL must have the form https://ollama.com/library/<name>[:tag]. The controller reads the registry manifest, config, and blob, selects one GGUF layer, verifies the descriptor digest and the GGUF magic header, and then stores the model as a regular catalog OCI artifact.

Malformed URLs are reported as status.phase=Failed with reason UnsupportedSource. Registry, tag, manifest, descriptor, or GGUF payload errors are reported as Failed with reason PublicationFailed; details are stored in conditions.
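A client-side check of the documented URL form can catch the UnsupportedSource case before the object is created. This is a minimal bash sketch; the function name parse_ollama_url is hypothetical, and the character set accepted for <name> and <tag> is an assumption, so the controller may accept or reject URLs that this check does not.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check for https://ollama.com/library/<name>[:tag].
# Prints "name=<name> tag=<tag>" (tag is empty when omitted) or fails with a message.
# Assumption: name and tag consist of letters, digits, '.', '_' and '-'.
parse_ollama_url() {
  local url="$1"
  local re='^https://ollama\.com/library/([A-Za-z0-9][A-Za-z0-9._-]*)(:([A-Za-z0-9][A-Za-z0-9._-]*))?$'
  if [[ "$url" =~ $re ]]; then
    # Group 1 is the name; group 3 is the tag without the leading ':'.
    printf 'name=%s tag=%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[3]:-}"
  else
    echo "expected https://ollama.com/library/<name>[:tag], got: $url" >&2
    return 1
  fi
}
```

For example, `parse_ollama_url https://ollama.com/library/qwen3.6:latest` prints `name=qwen3.6 tag=latest`.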

Upload Model

Use upload for a local file or archive:

  • direct file: a GGUF model, usually a *.gguf file;
  • archives: tar, tar.gz, tgz, zip, tar.zst, tar.zstd, tzst;
  • archive contents: GGUF, Safetensors, or Diffusers.
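Before uploading a direct file, it can be worth checking locally that it really is GGUF: GGUF files start with the four ASCII bytes GGUF, the same magic header the controller verifies for Ollama models. This is a minimal bash sketch; the function name is_gguf_file is hypothetical, and it inspects only the header, not the rest of the file.

```bash
#!/usr/bin/env bash
# Hypothetical local pre-check: does the file start with the GGUF magic bytes "GGUF"?
# Only the 4-byte header is read; the rest of the file is not validated.
is_gguf_file() {
  local file="$1" magic
  if [ ! -f "$file" ]; then
    echo "not a regular file: $file" >&2
    return 1
  fi
  magic=$(head -c 4 -- "$file")
  [ "$magic" = "GGUF" ]
}
```

For example, `is_gguf_file ./model.gguf && echo "looks like GGUF"`.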

A Safetensors archive must contain config.json at the archive root and one or more *.safetensors files. A Diffusers archive must contain model_index.json at the archive root and *.safetensors or *.bin weight files. A GGUF archive only needs a *.gguf file.
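The layout rules above can be checked locally before uploading. This sketch lists a tar archive and reports which layout it matches, assuming GNU or BSD tar (both detect gzip compression when listing); zip and zstd archives would need unzip or zstd and are left out. The function name guess_archive_layout is hypothetical, and the controller's own detection may differ in edge cases.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check: guess the model layout of a .tar/.tar.gz/.tgz archive
# using the rules described above. Prints GGUF, Diffusers, or Safetensors.
guess_archive_layout() {
  local archive="$1" entries
  entries=$(tar -tf "$archive") || return 1
  entries=$(sed -e 's#^\./##' <<<"$entries")   # "./config.json" -> "config.json"
  if grep -qE '\.gguf$' <<<"$entries"; then
    echo GGUF                                  # any *.gguf file is enough
  elif grep -qxF 'model_index.json' <<<"$entries" &&
       grep -qE '\.(safetensors|bin)$' <<<"$entries"; then
    echo Diffusers                             # root model_index.json plus weights
  elif grep -qxF 'config.json' <<<"$entries" &&
       grep -qE '\.safetensors$' <<<"$entries"; then
    echo Safetensors                           # root config.json plus *.safetensors
  else
    echo "no supported layout found in $archive" >&2
    return 1
  fi
}
```

Diffusers is checked before Safetensors because a Diffusers archive may contain *.safetensors files as well; only the root marker file tells the two apart.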

apiVersion: ai.deckhouse.io/v1alpha1
kind: Model
metadata:
  name: uploaded-model
  namespace: ai-demo
spec:
  source:
    upload: {}

After creation, the object moves to the WaitForUpload phase. The name of the Secret holding the upload URL and bearer token is published in status.upload.secretName:

d8 k -n ai-demo get model uploaded-model -o jsonpath='{.status.upload.secretName}{"\n"}'

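Reading this Secret requires get permission on it. The controller creates a Role named ai-model-upload-reader-<model-name> that grants read access to this Secret only. Bind it to the user or ServiceAccount that performs the upload, for example: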

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: uploaded-model-uploader
  namespace: ai-demo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ai-model-upload-reader-uploaded-model
subjects:
  - kind: ServiceAccount
    name: model-uploader
    namespace: ai-demo

Upload a file:

UPLOAD_SECRET=$(d8 k -n ai-demo get model uploaded-model -o jsonpath='{.status.upload.secretName}')
UPLOAD_URL=$(d8 k -n ai-demo get secret "$UPLOAD_SECRET" -o jsonpath='{.data.url}' | base64 -d)
UPLOAD_TOKEN=$(d8 k -n ai-demo get secret "$UPLOAD_SECRET" -o jsonpath='{.data.token}' | base64 -d)

curl -fS --progress-bar \
  -H "Authorization: Bearer ${UPLOAD_TOKEN}" \
  -T ./model.gguf \
  "$UPLOAD_URL" | cat

For archives, pass the file name explicitly in the filename query parameter. The controller detects the archive type from the extension:

curl -fS --progress-bar \
  -H "Authorization: Bearer ${UPLOAD_TOKEN}" \
  -T ./model-bundle.zip \
  "$UPLOAD_URL?filename=model-bundle.zip" | cat
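When uploads are scripted, a small helper can reject file names outside the supported list and append the filename parameter for archives. This is a minimal bash sketch; the function name upload_url_for is hypothetical, and two behaviors are assumptions not described above: appending with & when the URL from the Secret already contains a query string, and restricting file names to URL-safe characters so no percent-encoding is needed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the URL to PUT a file to, given the upload URL from the Secret.
# Direct *.gguf files use the URL as-is; archives get filename=<name> appended.
upload_url_for() {
  local base_url="$1" file="$2" name sep
  local safe='^[A-Za-z0-9._-]+$'
  name=${file##*/}                      # strip any directory part
  # Assumption: keep names URL-safe so they need no percent-encoding.
  if ! [[ "$name" =~ $safe ]]; then
    echo "rename the file to use only letters, digits, '.', '_' or '-': $name" >&2
    return 1
  fi
  case "$name" in
    *.gguf)
      printf '%s\n' "$base_url"
      return 0
      ;;
    *.tar|*.tar.gz|*.tgz|*.zip|*.tar.zst|*.tar.zstd|*.tzst) ;;
    *)
      echo "unsupported upload file type: $name" >&2
      return 1
      ;;
  esac
  # Assumption: use '&' if the upload URL already has a query string.
  case "$base_url" in
    *\?*) sep='&' ;;
    *)    sep='?' ;;
  esac
  printf '%s%sfilename=%s\n' "$base_url" "$sep" "$name"
}
```

For example, pass `"$(upload_url_for "$UPLOAD_URL" ./model-bundle.zip)"` as the last curl argument instead of the hand-built URL.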

Import From An External Catalog

An administrator configures external catalog sources (ModelCatalogSource objects). Users select a model by its catalog name, without specifying a digest or OCI reference:

apiVersion: ai.deckhouse.io/v1alpha1
kind: Model
metadata:
  name: qwen3-8b
  namespace: ai-demo
spec:
  source:
    catalog:
      sourceName: dmz
      name: qwen3-8b

If the cluster has exactly one ModelCatalogSource in the Ready phase, sourceName can be omitted:

spec:
  source:
    catalog:
      name: qwen3-8b

The controller freezes the upstream revision and remote digest in status.source.catalog, copies the artifact into the local DMCR, and only then marks the object Ready. Workload delivery always uses the local copy.

If the upstream token, CA, or catalog source fails temporarily, the object can enter the Failed phase with reason CatalogAuthFailed, CatalogTLSInvalid, or CatalogSourceNotReady. These states are recoverable: once the administrator fixes the source Secret, CA, RBAC, or source health, the controller retries with the same catalog revision and remote digest.

Status

Phase          Meaning
Pending        The controller is waiting for preflight or has not started preparation yet.
WaitForUpload  The upload session is ready and is waiting for a file.
Publishing     Model preparation is in progress: receiving, verifying, and storing a local copy.
Ready          The model data is verified, the local copy is stored, and metadata is resolved.
Failed         Preparation failed; the reason is in conditions.
Deleting       Cleanup is in progress.

Useful fields:

  • status.artifact.digest — digest of the prepared OCI artifact;
  • status.artifact.sizeBytes — artifact size;
  • status.resolved.format — Safetensors, GGUF, or Diffusers;
  • status.resolved.family — model family;
  • status.resolved.supportedEndpointTypes — serving endpoint types;
  • status.resolved.supportedFeatures — model capabilities.
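To see these fields for several models at once, kubectl's custom-columns output can be used through d8 k. This is a minimal bash sketch; the function name model_overview and the column selection are illustrative, and fields that are not set yet (for example, before Ready) are printed as <none>.

```bash
#!/usr/bin/env bash
# Hypothetical helper: tabulate key status fields with kubectl custom-columns.
# Usage: model_overview <model|clustermodel> [namespace]
model_overview() {
  local kind="${1:-model}" ns="${2:-}"
  local cols='NAME:.metadata.name,PHASE:.status.phase'
  cols+=',FORMAT:.status.resolved.format,FAMILY:.status.resolved.family'
  cols+=',SIZE:.status.artifact.sizeBytes,DIGEST:.status.artifact.digest'
  # ${ns:+-n "$ns"} adds "-n <ns>" only for namespaced Model objects.
  d8 k ${ns:+-n "$ns"} get "$kind" -o "custom-columns=$cols"
}
```

For example, `model_overview model ai-demo` for Models and `model_overview clustermodel` for ClusterModels.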

For catalog imports, also check:

  • status.source.catalog.sourceName — selected ModelCatalogSource;
  • status.source.catalog.catalogRevision — catalog revision used for selection;
  • status.source.catalog.remoteDigest — upstream artifact digest being imported locally.

Common catalog import reasons:

  • CatalogAuthFailed — credentials or RBAC for the external catalog failed;
  • CatalogTLSInvalid — CA for the external catalog is missing or invalid;
  • CatalogSourceNotReady — the external catalog source is not healthy yet;
  • ManifestInvalid — the imported artifact cannot be verified;
  • InsufficientStorage — the local DMCR has no free capacity.
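When a catalog import fails, the reason can be read from conditions and mapped to a next step. This is a minimal bash sketch; the function name catalog_reason_hint and the hint wording are illustrative, and only the three reasons that this guide describes as recoverable are marked as such.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a catalog import reason to a next step.
# Reasons can be listed with:
#   d8 k -n <ns> get model <name> -o jsonpath='{.status.conditions[*].reason}'
catalog_reason_hint() {
  case "$1" in
    CatalogAuthFailed|CatalogTLSInvalid|CatalogSourceNotReady)
      # Documented as recoverable: the controller retries the same revision and digest.
      echo "recoverable: ask the administrator to fix the source Secret, CA, RBAC, or source health"
      ;;
    ManifestInvalid)
      echo "check status.source.catalog: the imported artifact cannot be verified"
      ;;
    InsufficientStorage)
      echo "check DMCR capacity: the local registry has no free space"
      ;;
    *)
      echo "not a catalog import reason listed here: $1; see conditions" >&2
      return 1
      ;;
  esac
}
```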

Attach A Model To A Workload

Add an annotation to the workload's top-level metadata, not to the Pod template. This annotation is the source of truth for model selection.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: embedder
  namespace: ai-demo
  annotations:
    ai.deckhouse.io/model: bge-m3
spec:
  selector:
    matchLabels:
      app: embedder
  template:
    metadata:
      labels:
        app: embedder
    spec:
      containers:
        - name: app
          image: registry.example.com/embedder:latest

For a cluster-wide model:

metadata:
  annotations:
    ai.deckhouse.io/clustermodel: gemma-small

For multiple models, use a comma-separated list:

metadata:
  annotations:
    ai.deckhouse.io/model: bge-m3,bge-reranker
    ai.deckhouse.io/clustermodel: gemma-small

The controller configures the Pod for the selected delivery mode. The container receives:

  • AI_MODELS_MODELS_DIR=/data/modelcache/models;
  • AI_MODELS_MODELS — JSON with model descriptors;
  • each model path: /data/modelcache/models/<model-name>.
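Step 4 of the quick start can be checked from inside the running container. This is a minimal sketch written for bash in the image; the function name check_models_mounted is hypothetical. It relies only on AI_MODELS_MODELS_DIR and the per-model path; the JSON layout of AI_MODELS_MODELS is not described here, so it is not parsed.

```bash
#!/usr/bin/env bash
# Hypothetical in-container check: is each expected model present under AI_MODELS_MODELS_DIR?
# Usage: check_models_mounted <model-name>...
check_models_mounted() {
  local dir="${AI_MODELS_MODELS_DIR:-/data/modelcache/models}" name missing=0
  for name in "$@"; do
    if [ -e "$dir/$name" ]; then
      echo "ok: $dir/$name"
    else
      echo "missing: $dir/$name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For a quick manual look without the helper, `d8 k -n ai-demo exec deploy/embedder -c app -- ls /data/modelcache/models` lists the mounted models.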

Model names in one workload must be unique.
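Before applying a workload, the two annotation values can be checked for repeated names. This is a minimal bash sketch; the function name duplicate_model_names is hypothetical, and it assumes uniqueness applies across both annotations combined, since every model is mounted under the same /data/modelcache/models/<model-name> directory. Whitespace around commas is trimmed for the check only; whether the controller accepts such whitespace is not covered here.

```bash
#!/usr/bin/env bash
# Hypothetical check: print every name that appears more than once across the
# ai.deckhouse.io/model and ai.deckhouse.io/clustermodel annotation values.
duplicate_model_names() {
  local model_ann="$1" clustermodel_ann="$2"
  printf '%s,%s\n' "$model_ann" "$clustermodel_ann" \
    | tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' \
    | sort \
    | uniq -d
}
```

Empty output means no name is repeated; for example, `duplicate_model_names 'bge-m3,bge-reranker' 'gemma-small'` prints nothing.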

Delete

d8 k -n ai-demo delete model bge-m3
d8 k delete clustermodel gemma-small

After deletion is requested, the controller performs cleanup through a finalizer. Until cleanup finishes, the object can remain in the Deleting phase.