Log Forwarding#

Subsystem Goal#

This subsystem is responsible for forwarding all application logs to the VT Central Logging System (CLS). Tenants can specify the index their logs should be sent to, giving them access to their own logs.

Components in Use#

  • Filebeat - collects log messages from each node, enriches them with Kubernetes-specific metadata, and forwards them on to CLS
  • Gatekeeper - using Gatekeeper's mutation support, we add the necessary annotations to all tenant pods so their logs end up in the correct CLS index with zero effort from the tenant

Background#

Collecting Logs#

When using containers, the preferred approach for logging is to simply write log messages to stdout/stderr. This allows the container orchestration tooling to collect the log messages in a standardized manner. In the case of our platform, that means each container's stdout/stderr ends up as log files on the node's filesystem.
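
For example, on a typical Kubernetes node the runtime writes (or symlinks) every container's output into a single, predictable directory. This is a rough illustration assuming the common /var/log/containers layout; exact file names depend on your container runtime:

# Each container's stdout/stderr shows up as a log file named
# <pod-name>_<namespace>_<container-name>-<container-id>.log
ls /var/log/containers/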

The idea for log collection is to run a log collector that has access to this log directory and watches it for new log messages. The collector can also query the Kubernetes API to gather additional information about each running container and use it to enrich the log messages.

To ensure all messages are collected across all nodes, this log collector runs as a DaemonSet, causing an instance of the log collector to run on every node in the cluster.

flowchart LR
  subgraph cluster [Cluster]
    subgraph node1 [Node 1]
      C1[Container]-->L1[Log directory]
      C2[Container]-->L1
      L1<-. Watches changes .->LC1[Log collector]
    end
    subgraph node2 [Node 2]
      C3[Container]-->L2[Log directory]
      C4[Container]-->L2
      L2<-. Watches changes .->LC2[Log collector]
    end
  end
  LC1-->CLS[VT CLS System]
  LC2-->CLS
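
To make this concrete, here's a minimal, hypothetical DaemonSet that simply mounts the node's log directory into a collector container. This is only a sketch with placeholder names and image; the Filebeat Helm chart we install below generates the real manifest for us.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector               # placeholder name
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
        - name: collector
          image: example.com/log-collector:latest   # placeholder image
          volumeMounts:
            # Read-only view of the node's logs so the collector can tail
            # /var/log/containers/*.log (and the files those entries link to)
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log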

Mutating Admission Controllers#

While discussing the Policy Enforcement subsystem, we explored how validating controllers can be used to perform additional validation as objects are created, updated, and deleted. During the API request process, there is also an opportunity for objects to be mutated using mutating admission controllers.

Using Gatekeeper's mutation support, we can mutate objects as they are created. For example, with the following definition, all pods in the sample-tenant namespace will have a tenant label added with a value of sample-tenant.

apiVersion: mutations.gatekeeper.sh/v1beta1
kind: AssignMetadata
metadata:
  name: sample-tenant-label
spec:
  match:
    scope: Namespaced
    kinds:
      - apiGroups: ["*"]
        kinds: ["Pod"]
    namespaces: ["sample-tenant"]
  location: "metadata.labels.tenant"
  parameters:
    assign:
      value: "sample-tenant"

Since these mutations occur during object creation, existing pods won't be mutated, but all new pods will receive the label. The nice thing about mutations is that we can stamp every pod in a namespace with the annotations the log collector uses to decide where to send the log messages, without requiring tenants to update their pod specs themselves. It almost feels like magic to them!
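
If you do need existing pods to pick up a newly created mutation, the simplest option is to recreate them so they pass back through the admission webhooks. For example, assuming the sample-app deployment used later in this walkthrough:

# Recreate the deployment's pods; the replacements are mutated at creation
# time and come up with the new label
kubectl rollout restart deployment sample-app -n sample-tenant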

Deploying it Yourself#

Setting up Filebeat#

  1. In order to deploy Filebeat, we need to create a config file and pass that along as part of the Helm chart install.

    Create a file named filebeat-values.yaml with the following YAML:

    daemonset:
      filebeatConfig:
        filebeat.yml: |
          filebeat.inputs:
            - type: container
              paths:
                - /var/log/containers/*.log
              processors:
                # Enrich each event with pod/namespace metadata and the
                # platform-logging-* annotations declared on the pod
                - add_kubernetes_metadata:
                    default_indexers.enabled: true
                    default_matchers.enabled: true
                    annotations.dedot: false
                    include_annotations:
                      - "platform-logging-droplogging"
                      - "platform-logging-splunkindex"
                    host: ${NODE_NAME}
                    matchers:
                      - logs_path:
                          logs_path: "/var/log/containers/"
                # Drop events from pods that have opted out of log forwarding
                - drop_event:
                    when:
                      and:
                        - has_fields:
                            - kubernetes.annotations.platform-logging-droplogging
                        - equals:
                            kubernetes.annotations.platform-logging-droplogging: "true"
                # Fall back to the default index when a pod doesn't specify one
                - add_fields:
                    when:
                      not:
                        has_fields:
                          - kubernetes.annotations.platform-logging-splunkindex
                    fields:
                      index: default-index
                # Move the pod's splunkindex annotation value into fields.index
                - rename:
                    when:
                      not:
                        has_fields:
                          - fields.index
                    fields:
                      - from: "kubernetes.annotations.platform-logging-splunkindex"
                        to: "fields.index"
                    ignore_missing: true
                # Tag every event with the name of the cluster it came from
                - add_fields:
                    target: kubernetes
                    fields: 
                      clustername: onboarding
          # Write events to local files instead of a real output (demo only)
          output.file.path: "/tmp/filebeat/"
    
  2. Now, let's install the Helm chart using the custom values file.

    helm repo add filebeat https://helm.elastic.co
    helm repo update
    helm install filebeat filebeat/filebeat --namespace platform-filebeat --create-namespace -f filebeat-values.yaml
    

    After a moment, you should see the filebeat container start up!

    kubectl get pods -n platform-filebeat
    

    And you should see output similar to the following:

    NAME                      READY   STATUS    RESTARTS   AGE
    filebeat-filebeat-4xpcw   0/1     Running   0          50s
    

    You'll see the pod isn't listed as ready, but that's ok because the file output we're testing with doesn't support readiness probes. It should still work for us though!

Viewing the Logs#

Since we are only using the file output (it requires less infrastructure to deploy), we will simply exec into the pod to view the logs. Once inside, we'll use jq to filter and view them.

  1. Exec into the pod using the pod's name.

    kubectl exec -it -n platform-filebeat $(kubectl get pods -n platform-filebeat --output=jsonpath="{.items[0].metadata.name}") -- bash
    
  2. Now, let's install jq! Run the following inside your pod shell.

    yum install epel-release -y
    yum install jq -y
    
  3. Now, let's view some log messages! Run the following command to view all log messages using the default-index index (which is the default when a pod doesn't specify an index).

    cat /tmp/filebeat/filebeat | jq -c '. | select( .fields.index == "default-index" ) | .message'
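
    If you want a quick overview of how messages are being routed, you can also count events per index. A small sketch against the same file output:

    # Count how many events Filebeat has written to each index
    cat /tmp/filebeat/filebeat | jq -r '.fields.index' | sort | uniq -c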
    

Updating our Sample Tenant#

Let's assume our sample tenant wants to send logs to their own index. Based on the Filebeat config, all they need to do is add the platform-logging-splunkindex annotation to their pods and the messages will be routed to the specified index.

  1. Outside of your pod shell, run the following to adjust the sample-tenant deployment to add an annotation of platform-logging-splunkindex=sample-tenant.

    About making manual changes

    Although the sample app is managed via GitOps, we're making this change manually purely for demo purposes. In fact, Flux will revert it the next time it reconciles its sources. We'll switch to mutating the pod in the next section.

    kubectl patch -n sample-tenant deployment sample-app -p '{"spec":{"template":{"metadata":{"annotations":{"platform-logging-splunkindex":"sample-tenant"}}}}}' --type=merge
    

    You should see the pod in the sample-tenant namespace restart. By describing the pod, you should see the new annotation:

    kubectl describe pods -n sample-tenant
    

    And the output should look similar to the following:

    Name:         sample-app-7c5565d7cc-jcfbd
    Namespace:    sample-tenant
    Priority:     0
    Node:         docker-desktop/192.168.65.4
    Start Time:   Tue, 22 Feb 2022 11:46:34 -0500
    Labels:       app=sample-app
                  pod-template-hash=7c5565d7cc
                  tenant=sample-tenant
    Annotations:  platform-logging-splunkindex: sample-tenant
    Status:       Running
    ...
    
  2. If we jump back into the filebeat shell, we can adjust our jq filter to look for log messages with fields.index set to sample-tenant.

    cat /tmp/filebeat/filebeat | jq -c '. | select( .fields.index == "sample-tenant" ) | .message'
    

    We should see the startup messages from our sample app!

    "Using sqlite database at /etc/todos/todo.db"
    "Listening on port 3000"
    

Mutating on the Log Index Info#

As mentioned earlier, we can use Gatekeeper to mutate pods and add their log index information for them. That way, teams don't need to remember to add the necessary annotations!

  1. Revert the manual annotation we added by running the following command:

    kubectl patch -n sample-tenant deployment sample-app -p '{"spec":{"template":{"metadata":{"annotations":{"platform-logging-splunkindex":null}}}}}' --type=merge
    
  2. Define a Gatekeeper mutation by running the following command:

    cat <<EOF | kubectl apply -f -
    apiVersion: mutations.gatekeeper.sh/v1beta1
    kind: AssignMetadata
    metadata:
      name: sample-tenant-log-index
    spec:
      match:
        scope: Namespaced
        kinds:
          - apiGroups: ["*"]
            kinds: ["Pod"]
        namespaces: ["sample-tenant"]
      location: "metadata.annotations.platform-logging-splunkindex"
      parameters:
        assign:
          value: "sample-tenant"
    EOF
    
  3. Now, we need to restart the pods in the sample-tenant namespace to get the new annotation mutated onto the pod.

    kubectl delete --all pods --namespace=sample-tenant
    
  4. When they start back up, you should be able to see the annotation on the pod, even though it's not defined by the deployment's pod spec.

    If we look at the messages in Filebeat, we should see the sample app's logs arriving under the sample-tenant index too!
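
    For example, you can read the annotation straight off the running pod and then re-run the earlier jq filter inside the Filebeat pod:

    # Print the mutated annotation from the running pod
    kubectl get pods -n sample-tenant -o jsonpath='{.items[0].metadata.annotations.platform-logging-splunkindex}'

    # Back inside the Filebeat pod's shell, filter for the tenant's index
    cat /tmp/filebeat/filebeat | jq -c '. | select( .fields.index == "sample-tenant" ) | .message'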

Wrapping Up!#

Hopefully, you now understand how log forwarding works and how we try to keep things mostly invisible and automatic for tenants. That's one of the main goals... lower the barrier to entry and make things as automatic as possible!

Later, we'll talk about how the Landlord chart helps us by templating out the Gatekeeper mutations, making them significantly easier to manage.

What's next?#

Next, we're going to talk about node pool management, which helps us isolate workloads from various tenants and provide a cost accounting mechanism.

Go to the Node Pool Management subsystem