The module lifecycle stage: Experimental
The module has requirements for installation

The module adds an AI/ML model file catalog to DKP and mounts those files into Kubernetes applications. A user creates a Model or ClusterModel with a model source. The controller receives files, verifies the format, packages them as an internal OCI artifact, stores a local copy in DMCR, and exposes progress in status.

The module does not run an inference runtime and is not limited to LLMs. It works with model artifacts in supported formats such as Safetensors, GGUF, and Diffusers. The application receives a model file path and decides how to use it.
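To illustrate what telling these formats apart can involve, here is a minimal sketch that guesses a file's format from its leading bytes. It relies only on the public file layouts (GGUF files start with the ASCII magic `GGUF`; Safetensors files start with an 8-byte little-endian header length followed by a JSON header). The function name `sniff_model_format` is hypothetical, and this is not the module's own verifier, whose checks are not described here.

```python
import json
import struct


def sniff_model_format(data: bytes) -> str:
    """Guess a model file format from its leading bytes (illustrative only)."""
    # GGUF files begin with the 4-byte ASCII magic "GGUF".
    if data[:4] == b"GGUF":
        return "gguf"
    # Safetensors files begin with an 8-byte little-endian header length,
    # followed by that many bytes of JSON metadata.
    if len(data) >= 8:
        (header_len,) = struct.unpack("<Q", data[:8])
        header = data[8 : 8 + header_len]
        if header_len > 0 and len(header) == header_len:
            try:
                if isinstance(json.loads(header), dict):
                    return "safetensors"
            except ValueError:
                # Not valid JSON (or not decodable text): fall through.
                pass
    return "unknown"
```

A Diffusers model is a directory layout rather than a single file, so a check like this would apply only to the individual weight files inside it.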

Use Cases

  • Create a Model from Hugging Face, Ollama, or a local upload and receive a model file directory inside a Pod.
  • Create a ClusterModel so several teams can use one prepared model by a stable name.
  • Attach a model to a Deployment, StatefulSet, DaemonSet, or CronJob with one top-level metadata annotation.
  • Deliver a prepared model through SharedPVC when the cluster has ReadWriteMany storage.
  • Deliver a large model through NodeCache when several workloads read it on the same nodes.
  • Expose cluster-wide models in the Ready phase through a public catalog and import a local copy in another cluster.

Roles

| Role | Responsibilities | Start Here |
|---|---|---|
| Cluster administrator | Enable the module, configure object storage, choose delivery mode, expose distribution, grant catalog access, monitor runtime health. | Administration Guide |
| Namespace user | Create or import models, upload local files, attach models in the Ready phase to workloads. | User Guide |
| Application operator | Keep workload manifests in Git and consume models through annotations. | User Guide |

Resources

| Resource | Scope | Purpose |
|---|---|---|
| Model | namespace | A model owned by one namespace. |
| ClusterModel | cluster | A shared model curated by a cluster administrator. |
| ModelCatalogSource | cluster | External ClusterModel catalog imported as local copies. |

Model supports private Hugging Face sources through a Secret in the same namespace. ClusterModel is intended for shared remote sources and does not reference namespaced Secrets. ModelCatalogSource references source credentials in d8-system. Public catalog access is authorized by Kubernetes RBAC in the publishing cluster.

What Happens After Model Creation

  flowchart LR
  User["User"] --> API["Model / ClusterModel"]
  API --> Controller["Controller"]
  Controller --> Worker["Preparation worker"]
  Worker --> Store["OCI artifact in DMCR"]
  Controller --> Status["status / conditions / metadata"]
  Workload["Annotated workload"] --> Delivery["Delivery controller"]
  Delivery --> SharedPVC["RWX PVC"]
  Delivery --> NodeCache["NodeCache"]

Users choose only the model source: a URL, an upload session, or an external catalog entry. The controller chooses the internal DMCR path, verifies the data, packages the source files as an OCI artifact, and writes the result into status. The digest appears in status.artifact.digest after verification.
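As an illustration of what a content digest represents, the sketch below computes a digest in the common OCI `sha256:<hex>` form and compares it with an expected value. It assumes that `status.artifact.digest` uses this form, which the text does not state explicitly, and both function names are hypothetical. The controller performs its own verification; this only shows the idea.

```python
import hashlib


def oci_digest(blob: bytes) -> str:
    """Return the SHA-256 digest of blob in the usual OCI 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def digest_matches(blob: bytes, expected: str) -> bool:
    """Check that blob hashes to the expected digest string."""
    return oci_digest(blob) == expected
```

Because the digest is derived from the bytes themselves, two copies of an artifact with the same digest have identical content, which is what makes replay and repeatable delivery checkable.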

ModelPack is the module’s internal format for this OCI packaging. It is used for verification, replay after failure, cleanup, and repeatable delivery. This is not weight conversion: GGUF stays GGUF, and Safetensors stays Safetensors. Users do not choose ModelPack, a digest, a tag, or a registry path.

status.phase: Ready means the local model copy is verified and stored, and the controller can start workload delivery or catalog import. Workloads receive only the stable runtime contract:

  • model directory: /data/modelcache/models;
  • AI_MODELS_MODELS_DIR environment variable;
  • AI_MODELS_MODELS environment variable with model names, paths, digests, and families.
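An application could consume this contract roughly as follows. This is a minimal sketch: it reads `AI_MODELS_MODELS_DIR`, falls back to the documented directory, and lists the top-level entries. The layout inside the directory and the encoding of `AI_MODELS_MODELS` are not specified in this section, so the sketch makes no assumption about them and returns the latter as a raw string. The function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Documented default location of the mounted models.
DEFAULT_MODELS_DIR = "/data/modelcache/models"


def models_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Resolve the model directory from the environment, with the documented fallback."""
    return Path(env.get("AI_MODELS_MODELS_DIR") or DEFAULT_MODELS_DIR)


def list_model_entries(env: Mapping[str, str] = os.environ) -> list[str]:
    """List top-level entries in the model directory (empty if it is not mounted)."""
    root = models_dir(env)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir())


def raw_models_metadata(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return AI_MODELS_MODELS unparsed; its encoding is not assumed here."""
    return env.get("AI_MODELS_MODELS")
```

Passing the environment as a parameter keeps the helpers easy to test outside a Pod.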

Delivery And Distribution

Delivery attaches a model in the Ready phase to a workload inside the cluster.

Delivery modes:

  • SharedPVC is the default mode. The controller creates a ReadWriteMany PVC in the workload namespace, a materializer Job downloads the model into it, and Pods mount the model read-only. A local RWO PVC is not a separate delivery mode.
  • NodeCache is intended for SDS-backed node-local cache. Selected nodes get a shared cache, and workloads receive a read-only CSI mount.

Distribution belongs to the catalog/import plane, not to workload delivery. It is used for DMZ, perimeter, or external verified catalog topologies: a publishing cluster exposes a list of ClusterModel objects in the Ready phase, a consuming cluster imports the selected model into its local DMCR, and only then uses normal delivery. Internal @sha256 values and OCI paths stay inside controller-owned copy workflows.

The public distribution surface is enabled with distribution.mode=PublicCatalog and uses the module public host from global Deckhouse settings. The publishing administrator grants access to a Kubernetes subject, usually a ServiceAccount, with ClusterRole d8:ai-models:distribution:reader. The consuming cluster stores that token in a d8-system Secret and describes the upstream with ModelCatalogSource.
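The access grant described above could be expressed as a standard Kubernetes ClusterRoleBinding. The sketch below builds such a manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f -` accepts. Only the ClusterRole name comes from this document. The binding name, the ServiceAccount name `catalog-consumer`, and its namespace are hypothetical, and the choice of a ClusterRoleBinding is an assumption; the Administration Guide may prescribe a different procedure.

```python
import json


def distribution_reader_binding(sa_name: str, sa_namespace: str) -> dict:
    """Build a ClusterRoleBinding that grants catalog read access to a ServiceAccount."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        # Hypothetical binding name derived from the ServiceAccount name.
        "metadata": {"name": f"ai-models-distribution-reader-{sa_name}"},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            # ClusterRole name as given in the module documentation.
            "name": "d8:ai-models:distribution:reader",
        },
        "subjects": [
            {"kind": "ServiceAccount", "name": sa_name, "namespace": sa_namespace}
        ],
    }


if __name__ == "__main__":
    # Hypothetical ServiceAccount used by the consuming cluster.
    print(json.dumps(distribution_reader_binding("catalog-consumer", "default"), indent=2))
```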

Catalog import is recoverable for source-side failures. If a token expires, the CA configuration is wrong, or the source temporarily becomes unready, the selected catalog revision and remote digest stay frozen, and the controller retries the import once the source is healthy again.
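The recovery behavior can be pictured as a retry loop around an immutable selection. The sketch below is purely illustrative: `Selection`, `SourceUnavailable`, `import_with_retry`, and the fixed delay are all hypothetical and do not describe the controller's actual implementation or retry policy. The point it shows is that every attempt reuses the same frozen revision and digest.

```python
import time
from dataclasses import dataclass
from typing import Callable


class SourceUnavailable(Exception):
    """Hypothetical error for source-side failures (expired token, bad CA, not ready)."""


@dataclass(frozen=True)
class Selection:
    """The frozen import target: it never changes between attempts."""
    revision: str
    digest: str


def import_with_retry(
    selection: Selection,
    fetch: Callable[[Selection], bytes],
    max_attempts: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Retry fetch with the same selection until it succeeds or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(selection)
        except SourceUnavailable:
            if attempt == max_attempts:
                raise
            sleep(delay_seconds)
    raise ValueError("max_attempts must be at least 1")
```

Freezing the dataclass makes accidental mutation of the revision or digest between attempts an error rather than a silent change of target.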

Components

| Component | Namespace | Purpose |
|---|---|---|
| ai-models-controller | d8-ai-models | Manages Model / ClusterModel resources, upload sessions, delivery, and metrics. |
| publish-worker | d8-ai-models | Reads model sources and stores verified OCI artifacts in DMCR. |
| upload-gateway | d8-ai-models | Accepts direct file or archive uploads. |
| DMCR | d8-ai-models | Deckhouse Model Container Registry: the module’s internal OCI registry that stores prepared models on top of the configured object storage. |
| node-cache-runtime | selected nodes | Prepares node-local cache and CSI mounts for NodeCache. |

Third-party components

List of third-party software used in the ai-models module:

  • AI Models 0.0.1

    License: Apache License 2.0

    Deckhouse module for AI/ML model registry and catalog services.