First of all, we use Helm, so we start with the official guide: https://docs.datadoghq.com/agent/kubernetes/?tab=helm
Add the repository
```shell
helm repo add datadog https://helm.datadoghq.com
helm repo add stable https://charts.helm.sh/stable
helm repo update
```
Install the chart
```shell
helm install datadog -f values.yaml --set datadog.apiKey=<DATADOG_API_KEY> datadog/datadog --set targetSystem=linux
```
Special case: AWS
And then, because we use EKS and thus Amazon Linux 2, we need to add the following to our values.yaml (source: https://artifacthub.io/packages/helm/datadog/datadog#configuration-required-for-amazon-linux-2-based-nodes):
```yaml
agents:
  # (...)
  podSecurity:
    # (...)
    apparmor:
      # (...)
      enabled: false
      # (...)
```
Then we need to tell the pods where the Datadog agent is listening (it is no longer on localhost inside the container, as we were used to from the Datadog buildpack, but on the node the pod is scheduled on). The admission controller can inject this into pods on creation (existing pods won't get the value, though, but we deploy changes to all applications anyway and change a hardcoded localhost to a read of the environment variable DD_AGENT_HOST):
```yaml
clusterAgent:
  admissionController:
    # clusterAgent.admissionController.enabled -- Enable the admissionController to be able to inject APM/Dogstatsd config and standard tags (env, service, version) automatically into your pods
    enabled: true
    # clusterAgent.admissionController.mutateUnlabelled -- Enable injecting config without having the pod label 'admission.datadoghq.com/enabled="true"'
    mutateUnlabelled: true
```
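For completeness: if a workload can't go through the admission controller, the same variable can be set by hand with the Kubernetes downward API, since the agent runs on every node. A minimal sketch — the pod name, container name, and image here are placeholders, not anything from our setup:

```yaml
# Manually exposing the node's IP as DD_AGENT_HOST via the downward API
# (the admission controller does this for you on pod creation).
apiVersion: v1
kind: Pod
metadata:
  name: example-app          # placeholder
spec:
  containers:
    - name: app
      image: example/app:latest   # placeholder
      env:
        - name: DD_AGENT_HOST
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP   # IP of the node the pod is scheduled on
```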
Finding all nodes: taints and tolerations
Finally, we ran into a minor issue which took forever to figure out. Our cluster has two autoscaling groups: one for long-running pods, and one for pods that we allow to get rescheduled (most of our apps are OK with that). This is set up as a taint in Kubernetes, and we need to tell Datadog about the toleration; otherwise only the nodes without the taint will get a Datadog agent.
```yaml
tolerations:
  - key: "removable"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
```
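In the chart's values.yaml this list sits under the node agent's key — a sketch assuming the standard datadog/datadog chart layout:

```yaml
agents:
  tolerations:
    - key: "removable"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
```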
A quick test
```shell
echo "some.new.to.send.to:1|c" | nc -w0 -u $DD_AGENT_HOST 8125
```
We ran this on a newly scheduled pod, and everything worked!
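For reference, the string in that test is the DogStatsD wire format — `metric.name:value|type`, where `c` marks a counter — shipped as a single UDP datagram to port 8125. The same thing can be done from application code without a client library. A minimal Python sketch (the function names are ours, purely illustrative):

```python
import os
import socket


def dogstatsd_payload(metric: str, value: int = 1, metric_type: str = "c") -> bytes:
    """Build a DogStatsD datagram: "<metric>:<value>|<type>" ("c" = counter)."""
    return f"{metric}:{value}|{metric_type}".encode("ascii")


def send_metric(metric: str, value: int = 1) -> None:
    """Fire-and-forget UDP send, equivalent to the nc one-liner above."""
    # DD_AGENT_HOST is injected by the admission controller; fall back to localhost.
    host = os.environ.get("DD_AGENT_HOST", "localhost")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(dogstatsd_payload(metric, value), (host, 8125))
```

In production you would use an official Datadog client library, but this shows exactly what travels over the wire.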