Add a Prometheus [HTTP service discovery](https://prometheus.io/docs/prometheus/latest/http_sd/) endpoint for easy discovery of all workers in the Docker image. Follow-up to https://github.com/element-hq/synapse/pull/19324

This spawned from wanting to [run a load test](https://github.com/element-hq/synapse-rust-apps/pull/397) against the Complement Docker image of Synapse and see metrics from the homeserver.

`GET http://<synapse_container>:9469/metrics/service_discovery`

```json5
[
  {
    "targets": [ "<host>", ... ],
    "labels": {
      "<labelname>": "<labelvalue>", ...
    }
  },
  ...
]
```

The metrics from each worker can also be accessed via `http://<synapse_container>:9469/metrics/worker/<worker_name>`, which is what the service discovery response points to behind the scenes. This way, you only need to expose a single port (9469) to access all metrics.

<details>
<summary>Real HTTP service discovery response</summary>

```json5
[
  { "targets": ["localhost:9469"], "labels": { "job": "event_persister", "index": "1", "__metrics_path__": "/metrics/worker/event_persister1" } },
  { "targets": ["localhost:9469"], "labels": { "job": "event_persister", "index": "2", "__metrics_path__": "/metrics/worker/event_persister2" } },
  { "targets": ["localhost:9469"], "labels": { "job": "background_worker", "index": "1", "__metrics_path__": "/metrics/worker/background_worker1" } },
  { "targets": ["localhost:9469"], "labels": { "job": "event_creator", "index": "1", "__metrics_path__": "/metrics/worker/event_creator1" } },
  { "targets": ["localhost:9469"], "labels": { "job": "user_dir", "index": "1", "__metrics_path__": "/metrics/worker/user_dir1" } },
  { "targets": ["localhost:9469"], "labels": { "job": "media_repository", "index": "1", "__metrics_path__": "/metrics/worker/media_repository1" } },
  { "targets": ["localhost:9469"], "labels": { "job": "federation_inbound", "index": "1", "__metrics_path__": "/metrics/worker/federation_inbound1" } },
  { "targets": ["localhost:9469"], "labels": { "job": "federation_reader", "index": "1", "__metrics_path__": "/metrics/worker/federation_reader1" } },
  { "targets": ["localhost:9469"], "labels": { "job": "federation_sender", "index": "1", "__metrics_path__": "/metrics/worker/federation_sender1" } },
  { "targets": ["localhost:9469"], "labels": { "job": "synchrotron", "index": "1", "__metrics_path__": "/metrics/worker/synchrotron1" } },
  { "targets": ["localhost:9469"], "labels": { "job": "client_reader", "index": "1", "__metrics_path__": "/metrics/worker/client_reader1" } },
  { "targets": ["localhost:9469"], "labels": { "job": "appservice", "index": "1", "__metrics_path__": "/metrics/worker/appservice1" } },
  { "targets": ["localhost:9469"], "labels": { "job": "pusher", "index": "1", "__metrics_path__": "/metrics/worker/pusher1" } },
  { "targets": ["localhost:9469"], "labels": { "job": "device_lists", "index": "1", "__metrics_path__": "/metrics/worker/device_lists1" } },
  { "targets": ["localhost:9469"], "labels": { "job": "device_lists", "index": "2", "__metrics_path__": "/metrics/worker/device_lists2" } },
  { "targets": ["localhost:9469"], "labels": { "job": "stream_writers", "index": "1", "__metrics_path__": "/metrics/worker/stream_writers1" } },
  { "targets": ["localhost:9469"], "labels": { "job": "main", "index": "1", "__metrics_path__": "/metrics/worker/main" } }
]
```

</details>

And how it ends up as targets in Prometheus (http://localhost:9090/targets):

(image)
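As a quick way to see what Prometheus consumes from this endpoint, here is a small hypothetical Python snippet (not part of this PR) that fetches the discovery document and prints each worker's metrics URL. It assumes the container's port 9469 is published on `localhost`:

```python
import json
from urllib.request import urlopen

# Hypothetical helper: fetch the HTTP service discovery document and list the
# per-worker metrics URLs that Prometheus would end up scraping.
SD_URL = "http://localhost:9469/metrics/service_discovery"

with urlopen(SD_URL) as response:
    target_groups = json.load(response)

for group in target_groups:
    labels = group["labels"]
    for host in group["targets"]:
        # `__metrics_path__` overrides Prometheus' default `/metrics` path,
        # pointing at the per-worker proxy route on port 9469.
        print(f"{labels['job']} (index {labels.get('index', '?')}): "
              f"http://{host}{labels['__metrics_path__']}")
```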
### Testing strategy

1. Make sure your firewall allows the Docker containers to communicate with the host (`host.docker.internal`) so they can access exposed ports of other Docker containers. We want to allow Synapse to access the Prometheus container and Grafana to access the Prometheus container.
   - `sudo ufw allow in on docker0 comment "Allow traffic from the default Docker network to the host machine (host.docker.internal)"`
   - `sudo ufw allow in on br-+ comment "(from Matrix Complement testing) Allow traffic from custom Docker networks to the host machine (host.docker.internal)"`
   - [Complement firewall docs](ee6acd9154/README.md#potential-conflict-with-firewall-software)
1. Build the Docker images for Synapse: `docker build -t matrixdotorg/synapse -f docker/Dockerfile . && docker build -t matrixdotorg/synapse-workers -f docker/Dockerfile-workers .` ([docs](7a24fafbc3/docker/README-testing.md#building-and-running-the-images-manually))
1. Start Synapse:
   ```
   docker run -d --name synapse \
       --mount type=volume,src=synapse-data,dst=/data \
       -e SYNAPSE_SERVER_NAME=my.docker.synapse.server \
       -e SYNAPSE_REPORT_STATS=no \
       -e SYNAPSE_ENABLE_METRICS=1 \
       -p 8008:8008 \
       -p 9469:9469 \
       matrixdotorg/synapse-workers:latest
   ```
   - Also try with workers:
     ```
     docker run -d --name synapse \
         --mount type=volume,src=synapse-data,dst=/data \
         -e SYNAPSE_SERVER_NAME=my.docker.synapse.server \
         -e SYNAPSE_REPORT_STATS=no \
         -e SYNAPSE_ENABLE_METRICS=1 \
         -e SYNAPSE_WORKER_TYPES="\
             event_persister:2, \
             background_worker, \
             event_creator, \
             user_dir, \
             media_repository, \
             federation_inbound, \
             federation_reader, \
             federation_sender, \
             synchrotron, \
             client_reader, \
             appservice, \
             pusher, \
             device_lists:2, \
             stream_writers=account_data+presence+receipts+to_device+typing" \
         -p 8008:8008 \
         -p 9469:9469 \
         matrixdotorg/synapse-workers:latest
     ```
1. You should be able to see the Prometheus service discovery endpoint at http://localhost:9469/metrics/service_discovery
1. Create a Prometheus config (`prometheus.yml`):
   ```yaml
   global:
     scrape_interval: 15s
     scrape_timeout: 15s
     evaluation_interval: 15s

   scrape_configs:
     - job_name: synapse
       scrape_interval: 15s
       metrics_path: /_synapse/metrics
       scheme: http
       # We set `honor_labels` so that each service can set their own `job` label
       #
       # > honor_labels controls how Prometheus handles conflicts between labels that are
       # > already present in scraped data and labels that Prometheus would attach
       # > server-side ("job" and "instance" labels, manually configured target
       # > labels, and labels generated by service discovery implementations).
       # >
       # > *-- https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config*
       honor_labels: true
       # Use HTTP service discovery
       #
       # Reference:
       #  - https://prometheus.io/docs/prometheus/latest/http_sd/
       #  - https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config
       http_sd_configs:
         - url: 'http://localhost:9469/metrics/service_discovery'
   ```
1. Start Prometheus (update the volume bind mount to point at the config you just saved somewhere):
   ```
   docker run \
       --detach \
       --name=prometheus \
       --add-host host.docker.internal:host-gateway \
       -p 9090:9090 \
       -v ~/Documents/code/random/prometheus-config/prometheus.yml:/etc/prometheus/prometheus.yml \
       prom/prometheus
   ```
1. Make sure you're seeing some data in Prometheus. On http://localhost:9090/query, search for `synapse_build_info` (a scripted version of this check is sketched after this list).
1. Start [Grafana](https://hub.docker.com/r/grafana/grafana):
   ```
   docker run -d --name=grafana --add-host host.docker.internal:host-gateway -p 3000:3000 grafana/grafana
   ```
1. Visit the Grafana dashboard, http://localhost:3000/ (Credentials: `admin`/`admin`)
1. **Connections** -> **Data Sources** -> **Add data source** -> **Prometheus**
   - Prometheus server URL: `http://host.docker.internal:9090`
1. Import the Synapse dashboard: https://github.com/element-hq/synapse/blob/develop/contrib/grafana/synapse.json
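As a scripted alternative to the "make sure you're seeing some data" step above, you could also query Prometheus' standard `/api/v1/query` HTTP API and check which jobs are reporting `synapse_build_info`. This is only an illustrative check (not part of this PR), assuming Prometheus is reachable on `localhost:9090`:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Illustrative check: ask Prometheus which jobs are reporting
# `synapse_build_info`, which should cover the main process and every worker
# discovered via /metrics/service_discovery.
PROMETHEUS_URL = "http://localhost:9090"
query = urlencode({"query": "synapse_build_info"})

with urlopen(f"{PROMETHEUS_URL}/api/v1/query?{query}") as response:
    result = json.load(response)["data"]["result"]

jobs = sorted({sample["metric"].get("job", "<unknown>") for sample in result})
print(f"{len(result)} series across jobs: {', '.join(jobs)}")
```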
`docker/Dockerfile-workers` (92 lines, 4.1 KiB):
```dockerfile
# syntax=docker/dockerfile:1-labs

ARG SYNAPSE_VERSION=latest
ARG FROM=matrixdotorg/synapse:$SYNAPSE_VERSION
ARG DEBIAN_VERSION=trixie
ARG PYTHON_VERSION=3.13
ARG REDIS_VERSION=7.2

# first of all, we create a base image with dependencies which we can copy into the
# target image. For repeated rebuilds, this is much faster than apt installing
# each time.

FROM ghcr.io/astral-sh/uv:python${PYTHON_VERSION}-${DEBIAN_VERSION} AS deps_base

    ARG DEBIAN_VERSION
    ARG REDIS_VERSION

    # Tell apt to keep downloaded package files, as we're using cache mounts.
    RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache

    # The upstream redis-server deb has fewer dynamic libraries than Debian's package which makes it easier to copy later on
    RUN \
        curl -fsSL https://packages.redis.io/gpg | gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg && \
        chmod 644 /usr/share/keyrings/redis-archive-keyring.gpg && \
        echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb ${DEBIAN_VERSION} main" | tee /etc/apt/sources.list.d/redis.list

    RUN \
        --mount=type=cache,target=/var/cache/apt,sharing=locked \
        --mount=type=cache,target=/var/lib/apt,sharing=locked \
        apt-get update -qq && \
        DEBIAN_FRONTEND=noninteractive apt-get install -yqq --no-install-recommends \
            nginx-light \
            redis-server="6:${REDIS_VERSION}.*" redis-tools="6:${REDIS_VERSION}.*" \
            # libicu is required by postgres, see `docker/complement/Dockerfile`
            libicu76

    RUN \
        # remove default page
        rm /etc/nginx/sites-enabled/default && \
        # have nginx log to stderr/out
        ln -sf /dev/stdout /var/log/nginx/access.log && \
        ln -sf /dev/stderr /var/log/nginx/error.log

    # --link-mode=copy silences a warning as uv isn't able to do hardlinks between its cache
    # (mounted as --mount=type=cache) and the target directory.
    RUN --mount=type=cache,target=/root/.cache/uv \
        uv pip install --link-mode=copy --prefix="/uv/usr/local" supervisor~=4.2

    RUN mkdir -p /uv/etc/supervisor/conf.d

# now build the final image, based on the regular Synapse docker image
FROM $FROM

    # Copy over dependencies
    COPY --from=deps_base --parents /usr/lib/*-linux-gnu/libicu* /
    COPY --from=deps_base /usr/bin/redis-server /usr/local/bin
    COPY --from=deps_base /uv /
    COPY --from=deps_base /usr/sbin/nginx /usr/sbin
    COPY --from=deps_base /usr/share/nginx /usr/share/nginx
    COPY --from=deps_base /usr/lib/nginx /usr/lib/nginx
    COPY --from=deps_base /etc/nginx /etc/nginx
    COPY --from=deps_base /var/log/nginx /var/log/nginx
    # chown to allow non-root user to write to http-*-temp-path dirs
    COPY --from=deps_base --chown=www-data:root /var/lib/nginx /var/lib/nginx

    # Copy Synapse worker, nginx and supervisord configuration template files
    COPY ./docker/conf-workers/* /conf/

    # Copy a script to prefix log lines with the supervisor program name
    COPY ./docker/prefix-log /usr/local/bin/

    # Expose nginx listener port
    EXPOSE 8080/tcp
    # Metrics for workers are on ports starting from 19091 but since these are dynamic
    # we don't expose them by default (metrics must be enabled with
    # SYNAPSE_ENABLE_METRICS=1)
    #
    # Instead, we expose a single port used for Prometheus HTTP service discovery
    # (`http://<synapse_container>:9469/metrics/service_discovery`) and proxy all of the
    # workers' metrics endpoints through that
    # (`http://<synapse_container>:9469/metrics/worker/<worker_name>`).
    EXPOSE 9469/tcp

    # A script to read environment variables and create the necessary
    # files to run the desired worker configuration. Will start supervisord.
    COPY ./docker/configure_workers_and_start.py /configure_workers_and_start.py
    ENTRYPOINT ["/configure_workers_and_start.py"]

    # Replace the healthcheck with one which checks *all* the workers. The script
    # is generated by configure_workers_and_start.py.
    HEALTHCHECK --start-period=5s --interval=15s --timeout=5s \
        CMD ["/healthcheck.sh"]
```
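The discovery document served behind the `EXPOSE 9469/tcp` port is produced at container start-up by `configure_workers_and_start.py`. The following is only a rough sketch of how such a document could be assembled from a list of configured workers, matching the response shape shown earlier; the worker dictionaries and function name are hypothetical, not the actual implementation:

```python
import json

# Rough sketch only (hypothetical names, not the code in
# configure_workers_and_start.py): build a Prometheus HTTP SD document for a
# set of workers, matching the shape shown in the PR description above.
def build_service_discovery(workers: list[dict], port: int = 9469) -> str:
    target_groups = []
    for worker in workers:
        target_groups.append({
            "targets": [f"localhost:{port}"],
            "labels": {
                "job": worker["app"],
                "index": str(worker["index"]),
                # Point Prometheus at the route that proxies this worker's
                # metrics listener through the single exposed port.
                "__metrics_path__": f"/metrics/worker/{worker['name']}",
            },
        })
    return json.dumps(target_groups, indent=2)

if __name__ == "__main__":
    print(build_service_discovery([
        {"app": "event_persister", "index": 1, "name": "event_persister1"},
        {"app": "main", "index": 1, "name": "main"},
    ]))
```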