Service Discovery
Service discovery is one of the major features of Prometheus. As a monitoring tool, being up to date with the infrastructure is critical. Configuring targets manually and keeping the list up to date is tedious work; not doing it creates a partial view of the infrastructure and alert fatigue.
That is why Prometheus can take away this task and use sources of truth instead. At the time of writing this blog post, it supports 22 service discovery mechanisms.
Over the last year, 10 new service discovery mechanisms were added. We keep adding new ones, according to a few criteria. When accepting new service discoveries, we look at community interest, technical details, and how we can, as Prometheus maintainers, develop and support the service discovery in the future.
We cannot support all the service discovery mechanisms. In particular, we can’t maintain service discoveries we don’t have access to. In that category, you would find your in-house closed-source CMDB, and cloud providers who do not provide an open-source tier.
Another thing that can block us from merging a new service discovery is technical difficulty. An example is NetBox. We have received multiple requests in the past for a NetBox service discovery, but when I looked into the implementation, the Go bindings were not recommended by the NetBox community. Therefore, a native implementation was quite difficult.
File-based service discovery
For the service discovery mechanisms that are not integrated natively in Prometheus, the solution has always been file_sd. File SD is a service discovery based on files.
Prometheus can read YAML and JSON files and update its targets accordingly. This is a powerful way to configure Prometheus. This method has a few advantages:
You can generate the file the way you want, e.g. with configuration management software, or with dedicated sidecars.
Inotify makes this approach event-based. As soon as the file changes, we can pick up the changes.
And, over the years, many people have developed sidecars that enable a rich SD integration with Prometheus.
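As an illustration of the sidecar approach, here is a minimal sketch of how such a process might write a file_sd JSON file. The function name `write_file_sd` and the write-then-rename strategy are assumptions made for this example, not something Prometheus requires; the idea is simply that a reader should never observe a half-written file.

```python
import json
import os
import tempfile


def write_file_sd(path, target_groups):
    """Atomically write a list of target groups to a file_sd JSON file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, so that the final
    # rename stays on one filesystem and never exposes a partial file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(target_groups, tmp, indent=2)
        # mkstemp creates the file readable by its owner only; relax that
        # in case Prometheus runs as a different user (an assumption).
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A sidecar could call this each time its source of truth changes, for example `write_file_sd("targets.json", groups)`, where the file name is whatever your file_sd configuration points at.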
Drawbacks of file service discovery
The file-based service discovery mechanism has two major drawbacks: the first one is that in most cases you need an extra process running next to your Prometheus server to generate the file.
The second one is that this sidecar must share a filesystem with the Prometheus server.
Those drawbacks did not stop people from implementing service discoveries, but in some cases we have seen people mock other network-based service discoveries to avoid them. This is obviously not recommended.
As Prometheus matured, it was time to add a generic network-based service discovery.
http_sd
The HTTP Service Discovery enables the discovery of targets over the HTTP protocol. The discovery source has to expose targets over an HTTP endpoint.
This approach lifts some of the limitations of file_sd. The sources of truth do not need to share a filesystem with Prometheus, and therefore sidecars are not needed.
It also uses the Prometheus HTTP client, which means that we can use features like Authentication (Basic, Bearer Token, OAuth2, Client certificate), TLS and HTTP proxy.
Format
The HTTP Service Discovery body format is the same as the file_sd JSON format:
[
  {
    "targets": [ "<host>", ... ],
    "labels": {
      "<labelname>": "<labelvalue>", ...
    }
  },
  ...
]
Which translates to:
[
  {
    "targets": ["10.0.10.2:9100", "10.0.10.3:9100"],
    "labels": {
      "__meta_datacenter": "london"
    }
  },
  {
    "targets": ["10.0.40.2:9100", "10.0.40.3:9100"],
    "labels": {
      "__meta_datacenter": "london"
    }
  }
]
This is actually a list of groups, and you can have multiple groups of one item each if you want more detailed labels:
[
  {
    "targets": ["10.0.10.2:9100"],
    "labels": {
      "datacenter": "london",
      "hostname": "frontend03"
    }
  },
  {
    "targets": ["10.0.10.4:9100"],
    "labels": {
      "datacenter": "london",
      "hostname": "frontend04"
    }
  }
]
Note that this last example does not have the __meta prefix on the labels. The prefix is not mandatory, but I would recommend using it if you publish software for others to use, and letting users use relabeling to extract the labels they need. For internal projects, it’s fine to remove the __meta prefix.
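To make the "one group per target" layout concrete, here is a small sketch that turns inventory records into target groups with prefixed labels. The record shape (an `address` key plus arbitrary attributes), the function name `to_target_groups`, and the default port are all hypothetical choices for this example.

```python
def to_target_groups(hosts, port=9100, prefix="__meta_"):
    """Build one target group per host, labelling it with its attributes."""
    groups = []
    for host in hosts:
        # Every attribute except the address becomes a prefixed label.
        # Label values are converted to strings, as the format expects.
        labels = {
            prefix + key: str(value)
            for key, value in host.items()
            if key != "address"
        }
        groups.append({
            "targets": [f"{host['address']}:{port}"],
            "labels": labels,
        })
    return groups
```

Passing `prefix=""` would produce unprefixed labels like those in the last example, which may be acceptable for an internal project.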
If no targets are discoverable, you can return an empty JSON list: [].
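If you generate this payload yourself, a quick self-check can catch shape mistakes before Prometheus sees them. The sketch below is a hypothetical helper, not part of Prometheus; it assumes that `targets` is a list of strings and that `labels`, when present, maps strings to strings. It is deliberately strict, for instance in requiring `targets` on every group.

```python
import json


def validate_target_groups(body):
    """Parse a JSON body and return a list of problems (empty if none)."""
    try:
        groups = json.loads(body)
    except ValueError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(groups, list):
        return ["top-level value must be a JSON list"]
    problems = []
    for i, group in enumerate(groups):
        if not isinstance(group, dict):
            problems.append(f"group {i}: must be a JSON object")
            continue
        targets = group.get("targets")
        if not isinstance(targets, list) or not all(
            isinstance(t, str) for t in targets
        ):
            problems.append(f"group {i}: 'targets' must be a list of strings")
        # 'labels' is treated as optional here (an assumption of this sketch).
        labels = group.get("labels", {})
        if not isinstance(labels, dict) or not all(
            isinstance(v, str) for v in labels.values()
        ):
            problems.append(f"group {i}: 'labels' must map strings to strings")
    return problems
```

An empty list, `[]`, passes this check, matching the "no targets" case described above.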
Prometheus configuration
On the Prometheus server, you only need to specify the URL of your endpoint in an http_sd_configs section:
scrape_configs:
  - job_name: mycmdb
    http_sd_configs:
      - url: http://mycmdb.internal/prometheus-sd-targets
You can easily add authentication and use TLS:
scrape_configs:
  - job_name: mycmdb
    http_sd_configs:
      - url: https://mycmdb.internal/prometheus-sd-targets
        basic_auth:
          username: prometheus
          password: changeme
Implementation
When implementing such a service discovery, there are a few things you should know.
You must send the Content-Type: application/json HTTP header. This ensures that we do not try to decode arbitrary endpoints.
Every response should contain all the targets. We do not cache targets across restarts. You could cache them in your endpoint if it costs too much to recompute them each time. The HTTP response code should be HTTP 200.
As explained above, we support multiple authentication mechanisms: OAuth 2, Basic Authentication, Bearer token, and Client certificate. The URL of the http_sd, however, is not considered secret. Please use an existing authentication mechanism to secure your endpoint.
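The requirements above can be sketched with nothing but the Python standard library. This is a minimal, unauthenticated example under stated assumptions: the `TARGET_GROUPS` list stands in for a real source of truth, `SDHandler` and `make_server` are names invented for this sketch, and the path reuses the one from the configuration example. A production endpoint would add authentication and TLS.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical static inventory; a real endpoint would query its
# source of truth (or a cache of it) on each request.
TARGET_GROUPS = [
    {"targets": ["10.0.10.2:9100"], "labels": {"__meta_datacenter": "london"}},
]


class SDHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/prometheus-sd-targets":
            self.send_error(404)
            return
        # Always send the complete list of targets, never a partial update.
        body = json.dumps(TARGET_GROUPS).encode("utf-8")
        self.send_response(200)
        # The JSON content type is mandatory for http_sd.
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host="127.0.0.1", port=8080):
    """Create (but do not start) an HTTP server exposing the SD endpoint."""
    return HTTPServer((host, port), SDHandler)
```

Calling `make_server().serve_forever()` starts the endpoint; Prometheus would then be pointed at `http://<host>:8080/prometheus-sd-targets` in its http_sd_configs section.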
Differences with file_sd
There are a few differences between http_sd and file_sd. Both service discoveries are supported and valid.
file_sd supports YAML in addition to JSON. http_sd is limited to JSON.
Inotify makes file_sd “event-based”, meaning we update the targets as soon as the file has changed. http_sd just polls at regular intervals.
When should you use http_sd or file_sd?
I expect that http_sd will at some point become a new point of Prometheus integration for third-party software. It is convenient, and does not require extra binaries to share a filesystem with Prometheus.
It is a great option when your service discovery has not been accepted in Prometheus itself. We can’t have every service discovery in Prometheus because it causes a lot of work, and we need to have access to supported service discoveries to maintain them in the long term.
It’s also a viable option if your service discovery source is not written in Go. Adding an HTTP endpoint directly within your application can be a lot better than trying to create mappers in Go.
The last use case is when you want to combine exporters and service discovery. This is a new idea that people are likely to experiment with soon.
Community showcase
I tried to list some early implementations I’ve found on GitHub.
Prometheus vCD SD is a community-built service discovery mechanism for VMware vCloud Director. This is an interesting one because it supports both file_sd and http_sd. Because we made the file_sd and http_sd formats the same, the discovery was adapted to serve its file_sd files over HTTP in order to support HTTP SD.
netbox-plugin-prometheus-sd is a plugin that can be installed in your NetBox installation to add an SD endpoint to NetBox. This is much more convenient for the NetBox community than the file_sd sidecar, as plugins are quite the norm in the NetBox ecosystem.
fastly-exporter might gain the ability to tell Prometheus its targets of interest. This would help to spread the queries of the different services, rather than doing one big scrape.
Conclusion
http_sd is mainly the JSON file_sd over HTTP. We did not create a new, fancy service discovery mechanism. Instead, we have built a pragmatic solution, which is easy to use and understand. It will help a lot of people, and open new possibilities for Prometheus users.