Target Allocator
If you’ve enabled Target Allocator service discovery on the OpenTelemetry Operator, and the Target Allocator is failing to discover scrape targets, there are a few troubleshooting steps that you can take to help you understand what’s going on and restore normal operation.
Troubleshooting steps
Did you deploy all of your resources to Kubernetes?
As a first step, make sure that you have deployed all relevant resources to your Kubernetes cluster.
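For example, a quick sketch of that check, assuming the opentelemetry namespace used in the examples below and that both the OpenTelemetry Operator and the Prometheus Operator CRDs (ServiceMonitor/PodMonitor) are installed:

kubectl get opentelemetrycollectors,servicemonitors,podmonitors -n opentelemetry
kubectl get pods -n opentelemetry

All of the resources you expect should be listed, and the collector and Target Allocator pods should be in the Running state.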
Do you know if metrics are actually being scraped?
After you’ve deployed all of your resources to Kubernetes, make sure that the Target Allocator is discovering scrape targets from your ServiceMonitor(s) or PodMonitor(s).
Suppose that you have this ServiceMonitor definition:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sm-example
  namespace: opentelemetry
  labels:
    app.kubernetes.io/name: py-prometheus-app
    release: prometheus
spec:
  selector:
    matchLabels:
      app: my-app
  namespaceSelector:
    matchNames:
      - opentelemetry
  endpoints:
    - port: prom
      path: /metrics
    - port: py-client-port
      interval: 15s
    - port: py-server-port
this Service definition:
apiVersion: v1
kind: Service
metadata:
  name: py-prometheus-app
  namespace: opentelemetry
  labels:
    app: my-app
    app.kubernetes.io/name: py-prometheus-app
spec:
  selector:
    app: my-app
    app.kubernetes.io/name: py-prometheus-app
  ports:
    - name: prom
      port: 8080
and this OpenTelemetryCollector definition:
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
  namespace: opentelemetry
spec:
  mode: statefulset
  targetAllocator:
    enabled: true
    serviceAccount: opentelemetry-targetallocator-sa
    prometheusCR:
      enabled: true
      podMonitorSelector: {}
      serviceMonitorSelector: {}
  config:
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
      prometheus:
        config:
          scrape_configs:
            - job_name: 'otel-collector'
              scrape_interval: 10s
              static_configs:
                - targets: ['0.0.0.0:8888']
    processors:
      batch: {}
    exporters:
      debug:
        verbosity: detailed
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug]
        metrics:
          receivers: [otlp, prometheus]
          processors: []
          exporters: [debug]
        logs:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug]
First, set up a port-forward in Kubernetes, so that you can expose the Target Allocator service:
kubectl port-forward svc/otelcol-targetallocator -n opentelemetry 8080:80
Where otelcol-targetallocator is the value of metadata.name in your OpenTelemetryCollector CR concatenated with the -targetallocator suffix, and opentelemetry is the namespace to which the OpenTelemetryCollector CR is deployed.
Tip
You can also get the service name by running:
kubectl get svc -l app.kubernetes.io/component=opentelemetry-targetallocator -n <namespace>
Next, get a list of jobs registered with the Target Allocator:
curl localhost:8080/jobs | jq
The output should look something like this:
{
  "serviceMonitor/opentelemetry/sm-example/1": {
    "_link": "/jobs/serviceMonitor%2Fopentelemetry%2Fsm-example%2F1/targets"
  },
  "serviceMonitor/opentelemetry/sm-example/2": {
    "_link": "/jobs/serviceMonitor%2Fopentelemetry%2Fsm-example%2F2/targets"
  },
  "otel-collector": {
    "_link": "/jobs/otel-collector/targets"
  },
  "serviceMonitor/opentelemetry/sm-example/0": {
    "_link": "/jobs/serviceMonitor%2Fopentelemetry%2Fsm-example%2F0/targets"
  },
  "podMonitor/opentelemetry/pm-example/0": {
    "_link": "/jobs/podMonitor%2Fopentelemetry%2Fpm-example%2F0/targets"
  }
}
Where serviceMonitor/opentelemetry/sm-example/0 represents one of the Service ports that the ServiceMonitor picked up:
- opentelemetry is the namespace in which the ServiceMonitor resource resides.
- sm-example is the name of the ServiceMonitor.
- 0 is one of the port endpoints matched between the ServiceMonitor and the Service.
Similarly, the PodMonitor shows up as podMonitor/opentelemetry/pm-example/0 in the curl output.
This is good news, because it tells us that the scrape config discovery is working!
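You can drill into the PodMonitor job in the same way by following its _link from the jobs output above, for example:

curl localhost:8080/jobs/podMonitor%2Fopentelemetry%2Fpm-example%2F0/targets | jq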
You might also be wondering about the otel-collector entry. It shows up because the prometheus receiver under spec.config.receivers in the OpenTelemetryCollector resource defines a self-scrape job named otel-collector:
prometheus:
  config:
    scrape_configs:
      - job_name: 'otel-collector'
        scrape_interval: 10s
        static_configs:
          - targets: ['0.0.0.0:8888']
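If you want to confirm that this self-scrape job is also being assigned targets, you can follow its _link from the jobs output in the same way:

curl localhost:8080/jobs/otel-collector/targets | jq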
We can take a deeper look into serviceMonitor/opentelemetry/sm-example/0 to see which scrape targets are getting picked up, by running curl against the value of the _link field in the output above:
curl localhost:8080/jobs/serviceMonitor%2Fopentelemetry%2Fsm-example%2F0/targets | jq
Sample output:
{
  "otelcol-collector-0": {
    "_link": "/jobs/serviceMonitor%2Fopentelemetry%2Fsm-example%2F0/targets?collector_id=otelcol-collector-0",
    "targets": [
      {
        "targets": ["10.244.0.11:8080"],
        "labels": {
          "__meta_kubernetes_endpointslice_port_name": "prom",
          "__meta_kubernetes_pod_labelpresent_app_kubernetes_io_name": "true",
          "__meta_kubernetes_endpointslice_port_protocol": "TCP",
          "__meta_kubernetes_endpointslice_address_target_name": "py-prometheus-app-575cfdd46-nfttj",
          "__meta_kubernetes_endpointslice_annotation_endpoints_kubernetes_io_last_change_trigger_time": "2024-06-21T20:01:37Z",
          "__meta_kubernetes_endpointslice_labelpresent_app_kubernetes_io_name": "true",
          "__meta_kubernetes_pod_name": "py-prometheus-app-575cfdd46-nfttj",
          "__meta_kubernetes_pod_controller_name": "py-prometheus-app-575cfdd46",
          "__meta_kubernetes_pod_label_app_kubernetes_io_name": "py-prometheus-app",
          "__meta_kubernetes_endpointslice_address_target_kind": "Pod",
          "__meta_kubernetes_pod_node_name": "otel-target-allocator-talk-control-plane",
          "__meta_kubernetes_pod_labelpresent_pod_template_hash": "true",
          "__meta_kubernetes_endpointslice_label_kubernetes_io_service_name": "py-prometheus-app",
          "__meta_kubernetes_endpointslice_annotationpresent_endpoints_kubernetes_io_last_change_trigger_time": "true",
          "__meta_kubernetes_service_name": "py-prometheus-app",
          "__meta_kubernetes_pod_ready": "true",
          "__meta_kubernetes_pod_labelpresent_app": "true",
          "__meta_kubernetes_pod_controller_kind": "ReplicaSet",
          "__meta_kubernetes_endpointslice_labelpresent_app": "true",
          "__meta_kubernetes_pod_container_image": "otel-target-allocator-talk:0.1.0-py-prometheus-app",
          "__address__": "10.244.0.11:8080",
          "__meta_kubernetes_service_label_app_kubernetes_io_name": "py-prometheus-app",
          "__meta_kubernetes_pod_uid": "495d47ee-9a0e-49df-9b41-fe9e6f70090b",
          "__meta_kubernetes_endpointslice_port": "8080",
          "__meta_kubernetes_endpointslice_label_endpointslice_kubernetes_io_managed_by": "endpointslice-controller.k8s.io",
          "__meta_kubernetes_endpointslice_label_app": "my-app",
          "__meta_kubernetes_service_labelpresent_app_kubernetes_io_name": "true",
          "__meta_kubernetes_pod_host_ip": "172.24.0.2",
          "__meta_kubernetes_namespace": "opentelemetry",
          "__meta_kubernetes_endpointslice_endpoint_conditions_serving": "true",
          "__meta_kubernetes_endpointslice_labelpresent_kubernetes_io_service_name": "true",
          "__meta_kubernetes_endpointslice_endpoint_conditions_ready": "true",
          "__meta_kubernetes_service_annotation_kubectl_kubernetes_io_last_applied_configuration": "{\"apiVersion\":\"v1\",\"kind\":\"Service\",\"metadata\":{\"annotations\":{},\"labels\":{\"app\":\"my-app\",\"app.kubernetes.io/name\":\"py-prometheus-app\"},\"name\":\"py-prometheus-app\",\"namespace\":\"opentelemetry\"},\"spec\":{\"ports\":[{\"name\":\"prom\",\"port\":8080}],\"selector\":{\"app\":\"my-app\",\"app.kubernetes.io/name\":\"py-prometheus-app\"}}}\n",
          "__meta_kubernetes_endpointslice_endpoint_conditions_terminating": "false",
          "__meta_kubernetes_pod_container_port_protocol": "TCP",
          "__meta_kubernetes_pod_phase": "Running",
          "__meta_kubernetes_pod_container_name": "my-app",
          "__meta_kubernetes_pod_container_port_name": "prom",
          "__meta_kubernetes_pod_ip": "10.244.0.11",
          "__meta_kubernetes_service_annotationpresent_kubectl_kubernetes_io_last_applied_configuration": "true",
          "__meta_kubernetes_service_labelpresent_app": "true",
          "__meta_kubernetes_endpointslice_address_type": "IPv4",
          "__meta_kubernetes_service_label_app": "my-app",
          "__meta_kubernetes_pod_label_app": "my-app",
          "__meta_kubernetes_pod_container_port_number": "8080",
          "__meta_kubernetes_endpointslice_name": "py-prometheus-app-bwbvn",
          "__meta_kubernetes_pod_label_pod_template_hash": "575cfdd46",
          "__meta_kubernetes_endpointslice_endpoint_node_name": "otel-target-allocator-talk-control-plane",
          "__meta_kubernetes_endpointslice_labelpresent_endpointslice_kubernetes_io_managed_by": "true",
          "__meta_kubernetes_endpointslice_label_app_kubernetes_io_name": "py-prometheus-app"
        }
      }
    ]
  }
}
The collector_id query parameter in the _link field of the output above indicates that these targets pertain to otelcol-collector-0 (a pod of the StatefulSet created for the OpenTelemetryCollector resource).
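Because the Target Allocator distributes targets across collector pods, you can also query the targets assigned to a specific collector by following that _link directly, for example:

curl "localhost:8080/jobs/serviceMonitor%2Fopentelemetry%2Fsm-example%2F0/targets?collector_id=otelcol-collector-0" | jq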
Is the Target Allocator enabled? Is Prometheus service discovery enabled?
If the curl commands above don’t show a list of the expected ServiceMonitors and PodMonitors, you need to check whether the features that populate those values are turned on.
One thing to remember is that including the targetAllocator section in the OpenTelemetryCollector CR doesn’t mean that it’s enabled; you need to enable it explicitly. Furthermore, if you want to use Prometheus service discovery, you must explicitly enable that as well:
- Set spec.targetAllocator.enabled to true
- Set spec.targetAllocator.prometheusCR.enabled to true
So that your OpenTelemetryCollector resource looks like this:
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
  namespace: opentelemetry
spec:
  mode: statefulset
  targetAllocator:
    enabled: true
    serviceAccount: opentelemetry-targetallocator-sa
    prometheusCR:
      enabled: true
See the full OpenTelemetryCollector resource definition in “Do you know if metrics are actually being scraped?”.
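One quick way to verify both settings on a running cluster (a sketch, assuming the otelcol resource in the opentelemetry namespace from the example above) is to print them with kubectl:

kubectl get opentelemetrycollector otelcol -n opentelemetry \
  -o jsonpath='{.spec.targetAllocator.enabled}{"\n"}{.spec.targetAllocator.prometheusCR.enabled}{"\n"}'

Both values should print true.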
Did you configure a ServiceMonitor (or PodMonitor) selector?
If you configured a ServiceMonitor selector, it means that the Target Allocator only looks for ServiceMonitors having a metadata.label that matches the value in serviceMonitorSelector.
Suppose that you configured a serviceMonitorSelector for your Target Allocator, like in the following example:
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
  namespace: opentelemetry
spec:
  mode: statefulset
  targetAllocator:
    enabled: true
    serviceAccount: opentelemetry-targetallocator-sa
    prometheusCR:
      enabled: true
      serviceMonitorSelector:
        matchLabels:
          app: my-app
Because spec.targetAllocator.prometheusCR.serviceMonitorSelector.matchLabels is set to app: my-app, your ServiceMonitor resource must in turn have that same value in metadata.labels:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sm-example
  labels:
    app: my-app
    release: prometheus
spec:
See the full ServiceMonitor resource definition in “Do you know if metrics are actually being scraped?”.
In this case, the OpenTelemetryCollector resource’s prometheusCR.serviceMonitorSelector.matchLabels is looking only for ServiceMonitors having the label app: my-app, which we see in the previous example.
If your ServiceMonitor resource is missing that label, then the Target Allocator will fail to discover scrape targets from that ServiceMonitor.
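To check which labels your ServiceMonitor actually carries, you can print them with kubectl (a sketch using the sm-example resource from this page, assuming it’s deployed to the opentelemetry namespace):

kubectl get servicemonitor sm-example -n opentelemetry --show-labels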
Tip
The same applies if you’re using a PodMonitor. In that case, you would use a podMonitorSelector instead of a serviceMonitorSelector.
Did you leave out the serviceMonitorSelector and/or podMonitorSelector configuration altogether?
As mentioned in “Did you configure a ServiceMonitor (or PodMonitor) selector?”, setting mismatched values for serviceMonitorSelector and podMonitorSelector results in the Target Allocator failing to discover scrape targets from your ServiceMonitors and PodMonitors, respectively.
Similarly, in v1beta1 of the OpenTelemetryCollector CR, leaving out this configuration altogether also results in the Target Allocator failing to discover scrape targets from your ServiceMonitors and PodMonitors.
As of v1beta1 of the OpenTelemetry Operator, a serviceMonitorSelector and podMonitorSelector must be included, even if you don’t intend to use them, like this:
prometheusCR:
  enabled: true
  podMonitorSelector: {}
  serviceMonitorSelector: {}
This configuration means that the Target Allocator will match all PodMonitor and ServiceMonitor resources. See the full OpenTelemetryCollector definition in “Do you know if metrics are actually being scraped?”.
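With the port-forward from earlier still running, you can also inspect the scrape configurations that the Target Allocator has assembled from your selectors; the Target Allocator serves them on its /scrape_configs endpoint:

curl localhost:8080/scrape_configs | jq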
Do your labels, namespaces, and ports match for your ServiceMonitor and your Service (or PodMonitor and your Pod)?
The ServiceMonitor is configured to pick up Kubernetes Services that match on:
- Labels
- Namespaces (optional)
- Ports (endpoints)
Suppose that you have this ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sm-example
  labels:
    app: my-app
    release: prometheus
spec:
  selector:
    matchLabels:
      app: my-app
  namespaceSelector:
    matchNames:
      - opentelemetry
  endpoints:
    - port: prom
      path: /metrics
    - port: py-client-port
      interval: 15s
    - port: py-server-port
The previous ServiceMonitor is looking for any services that:
- have the label app: my-app
- reside in a namespace called opentelemetry
- have a port named prom, py-client-port, or py-server-port
For example, the following Service resource would get picked up by the ServiceMonitor, because it matches the previous criteria:
apiVersion: v1
kind: Service
metadata:
  name: py-prometheus-app
  namespace: opentelemetry
  labels:
    app: my-app
    app.kubernetes.io/name: py-prometheus-app
spec:
  selector:
    app: my-app
    app.kubernetes.io/name: py-prometheus-app
  ports:
    - name: prom
      port: 8080
The following Service resource would not be picked up, because the ServiceMonitor is looking for ports named prom, py-client-port, or py-server-port, and this service’s port is called bleh.
apiVersion: v1
kind: Service
metadata:
  name: py-prometheus-app
  namespace: opentelemetry
  labels:
    app: my-app
    app.kubernetes.io/name: py-prometheus-app
spec:
  selector:
    app: my-app
    app.kubernetes.io/name: py-prometheus-app
  ports:
    - name: bleh
      port: 8080
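A quick way to compare the two sides (a sketch, assuming the resources shown above) is to print the port names defined on the Service next to the endpoint ports the ServiceMonitor expects:

kubectl get svc py-prometheus-app -n opentelemetry -o jsonpath='{.spec.ports[*].name}{"\n"}'
kubectl get servicemonitor sm-example -n opentelemetry -o jsonpath='{.spec.endpoints[*].port}{"\n"}'

If there is no overlap between the two lists, the ServiceMonitor won’t pick up the Service.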
Tip
If you’re using a PodMonitor, the same applies, except that it picks up Kubernetes pods that match on labels, namespaces, and named ports.
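For reference, here is a minimal PodMonitor sketch that would be picked up under the same rules. It mirrors the pm-example name seen in the jobs output earlier and assumes pods labeled app: my-app that expose a container port named prom; adjust the selector, namespace, and port to your own workload:

# Hypothetical PodMonitor: the selector labels and port name are assumptions,
# not taken from this page's manifests.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: pm-example
  namespace: opentelemetry
  labels:
    app: my-app
    release: prometheus
spec:
  selector:
    matchLabels:
      app: my-app
  namespaceSelector:
    matchNames:
      - opentelemetry
  podMetricsEndpoints:
    - port: prom
      path: /metrics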