The module lifecycle stage: Experimental
The module has requirements for installation
Use this page as a copy-paste manifest catalog. The user and administrator guides explain when to choose each scenario and how to diagnose failures.
Minimal ModuleConfig
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: ai-models
spec:
  enabled: true
  version: 1
  settings:
    artifacts:
      bucket: ai-models
      endpoint: https://s3.example.com
      region: us-east-1
      credentialsSecretName: ai-models-artifacts
      usePathStyle: true

ModuleConfig With Capacity Limit
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: ai-models
spec:
  enabled: true
  version: 1
  settings:
    artifacts:
      bucket: ai-models
      endpoint: https://s3.example.com
      region: us-east-1
      credentialsSecretName: ai-models-artifacts
      capacityLimit: 1Ti

SharedPVC Delivery
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: ai-models
spec:
  enabled: true
  version: 1
  settings:
    artifacts:
      bucket: ai-models
      endpoint: https://s3.example.com
      credentialsSecretName: ai-models-artifacts
    delivery:
      type: SharedPVC
      sharedPVCStorageClassName: rwx-storage-class

NodeCache Delivery
d8 k label node k8s-w3-gpu ai.deckhouse.io/model-cache=true
d8 k label blockdevice <block-device-name> ai.deckhouse.io/model-cache=true

apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: ai-models
spec:
  enabled: true
  version: 1
  settings:
    artifacts:
      bucket: ai-models
      endpoint: https://s3.example.com
      credentialsSecretName: ai-models-artifacts
    delivery:
      type: NodeCache
      nodeCacheSize: 200Gi

Hugging Face Model
apiVersion: ai.deckhouse.io/v1alpha1
kind: Model
metadata:
  name: bge-m3
  namespace: ai-demo
spec:
  source:
    url: https://huggingface.co/BAAI/bge-m3

Private Hugging Face Model
apiVersion: v1
kind: Secret
metadata:
  name: hf-token
  namespace: ai-demo
type: Opaque
stringData:
  token: hf_xxx
---
apiVersion: ai.deckhouse.io/v1alpha1
kind: Model
metadata:
  name: private-model
  namespace: ai-demo
spec:
  source:
    url: https://huggingface.co/acme/private-model
    authSecretRef:
      name: hf-token

ClusterModel
apiVersion: ai.deckhouse.io/v1alpha1
kind: ClusterModel
metadata:
  name: gemma-small
spec:
  source:
    url: https://huggingface.co/google/gemma-3-4b-it

GGUF Model From Ollama
apiVersion: ai.deckhouse.io/v1alpha1
kind: ClusterModel
metadata:
  name: qwen-gguf
spec:
  source:
    url: https://ollama.com/library/qwen3.6:latest

Upload Model
apiVersion: ai.deckhouse.io/v1alpha1
kind: Model
metadata:
  name: uploaded-safetensors
  namespace: ai-demo
spec:
  source:
    upload: {}

d8 k -n ai-demo wait --for=jsonpath='{.status.phase}'=WaitForUpload model/uploaded-safetensors
UPLOAD_SECRET=$(d8 k -n ai-demo get model uploaded-safetensors -o jsonpath='{.status.upload.secretName}')
UPLOAD_URL=$(d8 k -n ai-demo get secret "$UPLOAD_SECRET" -o jsonpath='{.data.url}' | base64 -d)
UPLOAD_TOKEN=$(d8 k -n ai-demo get secret "$UPLOAD_SECRET" -o jsonpath='{.data.token}' | base64 -d)
curl -fS --progress-bar -H "Authorization: Bearer ${UPLOAD_TOKEN}" -T ./model-bundle.zip "${UPLOAD_URL}?filename=model-bundle.zip" | cat

Deployment With Model
apiVersion: apps/v1
kind: Deployment
metadata:
  name: embedder
  namespace: ai-demo
  annotations:
    ai.deckhouse.io/model: bge-m3
spec:
  replicas: 2
  selector:
    matchLabels:
      app: embedder
  template:
    metadata:
      labels:
        app: embedder
    spec:
      containers:
      - name: embedder
        image: registry.example.com/embedder:latest

Deployment With ClusterModel
apiVersion: apps/v1
kind: Deployment
metadata:
  name: generator
  namespace: ai-demo
  annotations:
    ai.deckhouse.io/clustermodel: gemma-small
spec:
  selector:
    matchLabels:
      app: generator
  template:
    metadata:
      labels:
        app: generator
    spec:
      containers:
      - name: generator
        image: registry.example.com/generator:latest

Workload With Multiple Models
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-service
  namespace: ai-demo
  annotations:
    ai.deckhouse.io/clustermodel: gemma-small
    ai.deckhouse.io/model: bge-m3
spec:
  selector:
    matchLabels:
      app: rag-service
  template:
    metadata:
      labels:
        app: rag-service
    spec:
      containers:
      - name: rag-service
        image: registry.example.com/rag-service:latest

Inside the container, models are available under
/data/modelcache/models/<model-name>.
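
To confirm that a model actually landed at the path above, something like the following sketch may help. The helper names `model_path` and `list_model_files` are hypothetical, the sketch assumes the container image ships `ls`, and it assumes `<model-name>` matches the name given in the workload annotation (for example `bge-m3`).

```shell
#!/bin/sh
# Base directory taken from the text above.
MODEL_ROOT=/data/modelcache/models

# model_path NAME
# Prints the expected in-container directory for the model NAME.
model_path() {
  printf '%s/%s\n' "$MODEL_ROOT" "$1"
}

# list_model_files NAMESPACE WORKLOAD NAME
# Lists the model directory inside one pod of WORKLOAD (e.g. deploy/rag-service).
# Assumption: the container image provides `ls`.
list_model_files() {
  d8 k -n "$1" exec "$2" -- ls -la "$(model_path "$3")"
}

# Example (requires cluster access):
#   list_model_files ai-demo deploy/rag-service bge-m3
```

An empty or missing directory suggests that delivery has not finished yet; the user and administrator guides cover how to diagnose that.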
Perimeter Distribution Tier
Perimeter distribution is a separate catalog/import axis, not a
delivery.type value. Enable public catalog mode in the publishing tier:

apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: ai-models
spec:
  enabled: true
  version: 1
  settings:
    artifacts:
      bucket: ai-models
      endpoint: https://s3.example.com
      credentialsSecretName: ai-models-artifacts
    distribution:
      mode: PublicCatalog

All ClusterModel objects in the Ready phase appear in the public catalog.
Create a ServiceAccount identity for the consuming cluster in the publishing
cluster and bind it to the distribution reader role:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: perimeter-a
  namespace: d8-ai-models
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ai-models-distribution-reader-perimeter-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: d8:ai-models:distribution:reader
subjects:
- kind: ServiceAccount
  name: perimeter-a
  namespace: d8-ai-models

Issue a token in the publishing cluster and pass it to the consuming administrator:
d8 k -n d8-ai-models create token perimeter-a --duration=720h

On the consuming cluster, create the source Secret in d8-system and describe
the external catalog source:
apiVersion: v1
kind: Secret
metadata:
  name: ai-models-dmz-read
  namespace: d8-system
type: Opaque
stringData:
  token: "<publishing-cluster-service-account-token>"
---
apiVersion: v1
kind: Secret
metadata:
  name: ai-models-dmz-ca
  namespace: d8-system
type: Opaque
stringData:
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
---
apiVersion: ai.deckhouse.io/v1alpha1
kind: ModelCatalogSource
metadata:
  name: dmz
spec:
  url: https://ai-models.dmz.example.com
  credentialsSecretName: ai-models-dmz-read
  caSecretName: ai-models-dmz-ca
---
apiVersion: ai.deckhouse.io/v1alpha1
kind: Model
metadata:
  name: qwen3-8b
  namespace: ai-demo
spec:
  source:
    catalog:
      sourceName: dmz
      name: qwen3-8b

If the cluster has only one ready ModelCatalogSource, sourceName can be
omitted.
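
As an illustration of the single-source case, the sketch below renders a Model manifest that leaves sourceName unset. The helper name `render_catalog_model` is hypothetical, the field nesting under spec.source.catalog is assumed from the example above, and the Model is assumed to use the same name as the catalog entry.

```shell
#!/bin/sh
# render_catalog_model NAMESPACE NAME
# Prints a Model manifest that imports NAME from the catalog without sourceName.
# Assumption: exactly one ModelCatalogSource in the cluster is ready.
render_catalog_model() {
  cat <<EOF
apiVersion: ai.deckhouse.io/v1alpha1
kind: Model
metadata:
  name: $2
  namespace: $1
spec:
  source:
    catalog:
      name: $2
EOF
}

# Example (requires cluster access):
#   render_catalog_model ai-demo qwen3-8b | d8 k apply -f -
```

Keeping sourceName explicit is likely the safer choice once a second ModelCatalogSource may be added to the cluster.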
Workload delivery in the consuming cluster remains a regular delivery mode:

delivery:
  type: SharedPVC

or NodeCache when node-local cache is required.