This section describes the operation of logging system components in Deckhouse Virtualization Platform (DVP).

Log collection and delivery mechanism

DVP uses the log-shipper module for log collection and delivery. A separate log-shipper instance runs on each cluster node and is configured based on Deckhouse resources. The log-shipper module uses Vector as a logging agent. The combination of settings for log collection and delivery forms a pipeline.

[Figure: log-shipper architecture]

  1. Deckhouse monitors ClusterLoggingConfig, ClusterLogDestination, and PodLoggingConfig resources:

    • ClusterLoggingConfig: Describes log sources at the cluster level, including collection, filtering, and parsing rules.
    • PodLoggingConfig: Describes log sources within a specified namespace, including collection, filtering, and parsing rules.
    • ClusterLogDestination: Sets log storage parameters.
  2. Based on the specified parameters, Deckhouse automatically creates a configuration file and saves it in a Kubernetes Secret.
  3. The Secret is mounted on all log-shipper agent pods. When the configuration changes, updates occur automatically using the reloader sidecar container.
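
For illustration, a minimal pipeline could be built from a pair of these resources, as in the sketch below (based on the log-shipper module's v1alpha1 API; the names, namespace, and endpoint are placeholder assumptions, and exact field names may vary between module versions):

  apiVersion: deckhouse.io/v1alpha1
  kind: ClusterLoggingConfig
  metadata:
    name: production-logs
  spec:
    # Collect logs from pods in the "production" namespace.
    type: KubernetesPods
    kubernetesPods:
      namespaceSelector:
        matchNames:
          - production
    # Send collected logs to the destination defined below.
    destinationRefs:
      - loki-storage
  ---
  apiVersion: deckhouse.io/v1alpha1
  kind: ClusterLogDestination
  metadata:
    name: loki-storage
  spec:
    type: Loki
    loki:
      endpoint: http://loki.loki:3100

Once such resources are applied, Deckhouse renders the Vector configuration into the Secret and rolls it out to the agents as described above.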

Log delivery schemes

DVP supports various log delivery topologies depending on reliability requirements and resource consumption.

Distributed

log-shipper agents send logs directly to storage, such as Loki or Elasticsearch.

[Figure: log-shipper distributed]

Advantages:

  • Simple configuration.
  • Available “out of the box” without additional dependencies, except for storage.

Disadvantages:

  • Complex transformations consume more resources on application nodes.
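
As a sketch, a direct-to-storage destination for this scheme might look like the following (the Elasticsearch endpoint and index name are assumptions):

  apiVersion: deckhouse.io/v1alpha1
  kind: ClusterLogDestination
  metadata:
    name: es-storage
  spec:
    type: Elasticsearch
    elasticsearch:
      # Agents on every node write to the storage directly.
      endpoint: http://elasticsearch.monitoring:9200
      index: "k8s-logs"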

Centralized

All logs are sent to one of the available aggregators, such as Logstash or Vector. Agents on nodes send logs as quickly as possible, consuming minimal resources. Complex transformations are performed on the aggregator side.

[Figure: log-shipper centralized]

Advantages:

  • Reduces resource consumption on application nodes.
  • Users can configure any transformations in the aggregator and send logs to a much larger number of storage systems.

Disadvantages:

  • Requires dedicated nodes for aggregators. Their number may increase depending on the load.
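
In this scheme, the destination points the node agents at an aggregator instead of storage, roughly as follows (a sketch; the aggregator address is an assumption):

  apiVersion: deckhouse.io/v1alpha1
  kind: ClusterLogDestination
  metadata:
    name: vector-aggregator
  spec:
    # Forward events to a central Vector aggregator,
    # which performs the heavy transformations.
    type: Vector
    vector:
      endpoint: vector-aggregator.logging:9000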

Streaming

The main task of this architecture is to deliver logs to a message queue (e.g., Kafka) as quickly as possible; from there, downstream services transfer them to long-term storage for further analysis.

[Figure: log-shipper stream]

Advantages:

  • Reduces resource consumption on application nodes.
  • Users can configure any transformations in the aggregator and send logs to a much larger number of storage systems.
  • High reliability. Suitable for infrastructure where log delivery is a priority task.

Disadvantages:

  • Adds an intermediate link (message queue).
  • Requires dedicated nodes for aggregators. Their number may increase depending on the load.
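
For this scheme, a message-queue destination might be defined as follows (a sketch; the broker address and topic name are assumptions):

  apiVersion: deckhouse.io/v1alpha1
  kind: ClusterLogDestination
  metadata:
    name: kafka-logs
  spec:
    # Push events to a Kafka topic as fast as possible;
    # downstream services move them to long-term storage.
    type: Kafka
    kafka:
      bootstrapServers:
        - kafka-0.kafka.logging:9092
      topic: cluster-logs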

Log processing

Message filters

Before sending logs, DVP can filter out unnecessary records to reduce the number of messages sent to storage. For this, the log-shipper module provides the labelFilter and logFilter filters.

[Figure: log-shipper pipeline]

Filters run immediately after lines are combined by multiline parsing.

  • labelFilter:
    • Rules are applied to message metadata.
    • Fields for metadata (or labels) are populated based on the log source, so different sources will have different sets of fields.
    • Rules are used, for example, to exclude messages from a specific container or pod matching a given label.
  • logFilter:
    • Rules are applied to the original message.
    • Allows excluding a message based on the value of a JSON field.
    • If the message is not in JSON format, you can use a regular expression to search by string.

Both filters have a unified configuration structure:

  • field: The data source to filter on. Most often this is a label value or a field of a JSON document.
  • operator: The comparison action. Available options: In, NotIn, Regex, NotRegex, Exists, DoesNotExist.
  • values: The meaning of this option depends on the operator:
    • In, NotIn: The field value must equal (or must not equal) one of the values in the values list.
    • Regex, NotRegex: The field value must match at least one (or must match none) of the regular expressions in the values list.
    • Exists, DoesNotExist: The values option is not supported.
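
For example, both filter types can be combined in one source definition, as in this sketch (the container name, JSON field name, and destination are assumptions):

  apiVersion: deckhouse.io/v1alpha1
  kind: ClusterLoggingConfig
  metadata:
    name: filtered-logs
  spec:
    type: KubernetesPods
    labelFilter:
      # Metadata rule: keep only messages from the "nginx" container.
      - field: container
        operator: In
        values:
          - nginx
    logFilter:
      # Message rule: drop JSON messages whose "level" field is "debug".
      - field: level
        operator: NotIn
        values:
          - debug
    destinationRefs:
      - loki-storage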

Additional labels (extraLabels) are added at the Destination stage, so filtering logs by them is not possible.

Metadata

When processing logs, log-shipper automatically enriches messages with metadata depending on their source. Enrichment occurs at the Source stage.

Kubernetes

When collecting logs from Kubernetes pods and nodes, the following fields are automatically exported:

Label        Pod spec path
pod          metadata.name
namespace    metadata.namespace
pod_labels   metadata.labels
pod_ip       status.podIP
image        spec.containers[].image
container    spec.containers[].name
node         spec.nodeName
pod_owner    metadata.ownerRef[0]

Label        Node spec path
node_group   metadata.labels[].node.deckhouse.io/group

For Splunk, the pod_labels field is not exported because it is a nested object that Splunk does not support.
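
An enriched message might therefore look like this (an illustrative event; all values are invented):

  {
    "message": "GET /healthz 200",
    "pod": "backend-5f7b4d9c6b-x2x6k",
    "namespace": "production",
    "container": "backend",
    "image": "registry.example.com/backend:1.4.2",
    "node": "worker-01",
    "pod_ip": "10.111.3.42",
    "pod_owner": "ReplicaSet/backend-5f7b4d9c6b",
    "pod_labels": {"app": "backend"},
    "node_group": "worker"
  }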

File

When collecting logs from file sources, only the host label is available, which contains the hostname of the server from which the log was received.
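
A file source might be configured like this (a sketch; the paths and destination are assumptions):

  apiVersion: deckhouse.io/v1alpha1
  kind: ClusterLoggingConfig
  metadata:
    name: audit-files
  spec:
    # Read log files directly from the node filesystem.
    type: File
    file:
      include:
        - /var/log/audit/*.log
    destinationRefs:
      - loki-storage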