Commit graph

27 commits

Author SHA1 Message Date
Eric Eastwood
87d93b1ae6
Latest changes from importing/exporting from Grafana 12.3.1 (#19381)
These are automatic changes from importing/exporting from Grafana
12.3.1.

In order to verify that I'm not sneaking in any changes, you can follow
these steps to get the same output.

Reproduction instructions:

 1. Start [Grafana](https://hub.docker.com/r/grafana/grafana)
    ```
    docker run -d --name=grafana \
        --add-host host.docker.internal:host-gateway \
        -p 3000:3000 \
        grafana/grafana
    ```
 1. Visit the Grafana dashboard, http://localhost:3000/ (Credentials: `admin`/`admin`)
 1. Import the Synapse dashboard: `contrib/grafana/synapse.json`
 1. Export the Synapse dashboard: on the dashboard page -> **Export** -> **Export as code** -> using the **Classic** model -> check **Export for sharing externally** -> **Copy**
 1. Paste into `contrib/grafana/synapse.json`
 1. `git status`/`git diff` to check if there is any diff

Sanity-checked the dashboard itself by importing it on https://grafana.matrix.org/ (Grafana 10.4.1, according to https://grafana.matrix.org/api/health). The process-level metrics won't work there because https://github.com/element-hq/synapse/pull/19337 only just merged and isn't deployed on `matrix.org` yet. More generally, this dashboard works for me locally with the [load-tests](https://github.com/element-hq/synapse-rust-apps/pull/397) I've been doing.


### Motivation

There are a few fixes I want to make to the Grafana dashboard, and it sucks having to manually translate everything back over because our `synapse.json` is formatted differently from Grafana's export output.

Hopefully, after this bulk change, future exports will contain exactly the changes we intend to make.
2026-01-16 11:36:49 -06:00
Eric Eastwood
58f59ffbcb
Refactor Grafana dashboard to use server_name label (#19337)
- Update `synapse_xxx` (server-level) metrics to use `server_name="$server_name"` instead of `instance="$instance"`
- Add a `synapse_server_name_info` metric to map Synapse `server_name`s to the `instance`s they're hosted on
- For process-level metrics, update to use `xxx * on (instance, job, index) group_left(server_name) synapse_server_name_info{server_name="$server_name"}` (see the sketch below)
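
As a concrete sketch of that join, here is the process-level pattern queried through the Prometheus HTTP API, using `process_cpu_seconds_total` as a stand-in process-level metric and the example server name from the testing steps below (in Grafana, the literal value would be the `$server_name` variable):

```
# group_left copies the server_name label from synapse_server_name_info onto
# the matching process-level series, so they can be filtered per server.
curl -s 'http://localhost:9090/api/v1/query' \
    --data-urlencode 'query=rate(process_cpu_seconds_total[1m]) * on (instance, job, index) group_left (server_name) synapse_server_name_info{server_name="my.docker.synapse.server"}'
```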

All of the changes here are backwards compatible with whatever people
were doing before with their Prometheus/Grafana dashboards.

Previously, the recommendation was to use the `instance` label to group everything under the same server (see `docs/metrics-howto.md` at `803e4b4d88`, lines 93-147).

But the `instance` label has a special meaning in Prometheus, and we were abusing it by using it that way:

> `instance`: The `<host>:<port>` part of the target's URL that was scraped.
>
> *-- https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series*

Since https://github.com/element-hq/synapse/issues/18592 (Synapse
`v1.139.0`), we now have the `server_name` label to use instead.


---

Additionally, the assumption that a single process is serving a single
server is no longer true with [Synapse Pro for small
hosts](https://docs.element.io/latest/element-server-suite-pro/synapse-pro-for-small-hosts/overview/).

Part of https://github.com/element-hq/synapse-small-hosts/issues/106

### Motivating use case

Although this change also benefits [Synapse Pro for small hosts](https://docs.element.io/latest/element-server-suite-pro/synapse-pro-for-small-hosts/overview/) (https://github.com/element-hq/synapse-small-hosts/issues/106), it actually grew out of adding Prometheus metrics to our workerized Docker image (https://github.com/element-hq/synapse/pull/19324, https://github.com/element-hq/synapse/pull/19336) with a more correct label setup (without `instance`), and wanting the dashboard to be better.



### Testing strategy

 1. Make sure your firewall allows the Docker containers to communicate with the host (`host.docker.internal`) so they can access exposed ports of other Docker containers. We want to allow Synapse to access the Prometheus container, and Grafana to access the Prometheus container.
    - `sudo ufw allow in on docker0 comment "Allow traffic from the default Docker network to the host machine (host.docker.internal)"`
    - `sudo ufw allow in on br-+ comment "(from Matrix Complement testing) Allow traffic from custom Docker networks to the host machine (host.docker.internal)"`
    - Complement firewall docs: `README.md` at `ee6acd9154`, section "Potential conflict with firewall software"
 1. Build the Docker image for Synapse: `docker build -t matrixdotorg/synapse -f docker/Dockerfile .` (docs: `docker/README-testing.md` at `7a24fafbc3`, section "Building and running the images manually")
 1. Generate config for Synapse:
    ```
    docker run -it --rm \
        --mount type=volume,src=synapse-data,dst=/data \
        -e SYNAPSE_SERVER_NAME=my.docker.synapse.server \
        -e SYNAPSE_REPORT_STATS=yes \
        -e SYNAPSE_ENABLE_METRICS=1 \
        matrixdotorg/synapse:latest generate
    ```
 1. Start Synapse:
    ```
    docker run -d --name synapse \
        --mount type=volume,src=synapse-data,dst=/data \
        -p 8008:8008 \
        -p 19090:19090 \
        matrixdotorg/synapse:latest
    ```
 1. You should be able to see metrics from Synapse at http://localhost:19090/_synapse/metrics
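    For example, a quick check from the host (the `grep` pattern is just illustrative):
    ```
    curl -s http://localhost:19090/_synapse/metrics | grep '^synapse_build_info'
    ```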
 1. Create a Prometheus config (`prometheus.yml`)
    ```yaml
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
      evaluation_interval: 15s
    
    scrape_configs:
      - job_name: prometheus
        scrape_interval: 15s
        metrics_path: /_synapse/metrics
        scheme: http
        static_configs:
          - targets:
              # This should point to the Synapse metrics listener (we're using
              # `host.docker.internal` because this is from within the Prometheus
              # container)
              - host.docker.internal:19090
    ```
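    Optionally, sanity-check the config with `promtool` (a sketch: `promtool` is bundled in the `prom/prometheus` image; the path matches the bind mount used in the next step):
    ```
    docker run --rm \
        -v ~/Documents/code/random/prometheus-config/prometheus.yml:/etc/prometheus/prometheus.yml \
        --entrypoint promtool \
        prom/prometheus check config /etc/prometheus/prometheus.yml
    ```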
 1. Start Prometheus (update the volume bind mount to point at the config you just saved):
    ```
    docker run \
        --detach \
        --name=prometheus \
        --add-host host.docker.internal:host-gateway \
        -p 9090:9090 \
        -v ~/Documents/code/random/prometheus-config/prometheus.yml:/etc/prometheus/prometheus.yml \
        prom/prometheus
    ```
 1. Make sure you're seeing some data in Prometheus: on http://localhost:9090/query, search for `synapse_build_info`
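    Equivalently, query the Prometheus HTTP API from the command line:
    ```
    curl -s 'http://localhost:9090/api/v1/query' \
        --data-urlencode 'query=synapse_build_info'
    ```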
 1. Start [Grafana](https://hub.docker.com/r/grafana/grafana)
    ```
    docker run -d --name=grafana \
        --add-host host.docker.internal:host-gateway \
        -p 3000:3000 \
        grafana/grafana
    ```
 1. Visit the Grafana dashboard, http://localhost:3000/ (Credentials: `admin`/`admin`)
 1. **Connections** -> **Data Sources** -> **Add data source** -> **Prometheus**
    - Prometheus server URL: `http://host.docker.internal:9090`
 1. Import the Synapse dashboard: `contrib/grafana/synapse.json`

To test workers, you can use the testing strategy from https://github.com/element-hq/synapse/pull/19336 (this assumes the changes from both PRs are combined).
2026-01-14 17:57:42 -06:00
Erik Johnston
3ba3c7fe7d
Reduce cardinality of metrics on event persister (#19133)
This reduces the size of the metrics output by ~80%; responding with the metrics was taking a significant amount of time.
2025-11-12 13:41:58 +00:00
Eric Eastwood
f13a136396
Refactor Gauge metrics to be homeserver-scoped (#18725)
Bulk refactor `Gauge` metrics to be homeserver-scoped. We also add lints
to make sure that new `Gauge` metrics don't sneak in without using the
`server_name` label (`SERVER_NAME_LABEL`).

Part of https://github.com/element-hq/synapse/issues/18592



### Testing strategy

 1. Add the `metrics` listener in your `homeserver.yaml`
    ```yaml
    listeners:
      # This is just showing how to configure metrics either way
      #
      # `http` `metrics` resource
      - port: 9322
        type: http
        bind_addresses: ['127.0.0.1']
        resources:
          - names: [metrics]
            compress: false
      # `metrics` listener
      - port: 9323
        type: metrics
        bind_addresses: ['127.0.0.1']
    ```
 1. Start the homeserver: `poetry run synapse_homeserver --config-path homeserver.yaml`
 1. Fetch `http://localhost:9322/_synapse/metrics` and/or `http://localhost:9323/metrics`
 1. Observe that the response includes the refactored `Gauge` metrics with the `server_name` label
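    For example, a quick check that the scoped metrics now carry the label (the `grep` pattern is just illustrative):
    ```
    curl -s http://localhost:9322/_synapse/metrics | grep 'server_name="'
    ```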

2025-07-29 10:37:59 -05:00
Eric Eastwood
e80bc4b062
Distinguish all vs local events being persisted in the "Event Send Time Quantiles" graph (#18510)
(Applies to the Grafana graphs)

As discovered by @devonh, we use `synapse_storage_events_persisted_events_total` (which tracks *all* persisted events) for the "Events" rate in the "Event Send Time Quantiles" graph. This is pretty misleading as I would expect it to be the rate of events being sent given the graph title, "Event Send Time Quantiles".

Since the event persistence queues are shared for local and remote events from federation and will block local events being sent, I think it does still make sense to have the event persist rate. I've updated the graph to include the rate of "Local events being persisted" and the rate of "All events being persisted". I think this properly disambiguates and clarifies what the graph is trying to show.
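
For reference, the two rates can be sketched as Prometheus queries via the HTTP API (this assumes the per-origin counter `synapse_storage_events_persisted_events_sep_total` and its `origin_type` label; the dashboard's actual expressions may differ):

```
# Rate of all events being persisted (local + incoming over federation):
curl -s 'http://localhost:9090/api/v1/query' \
    --data-urlencode 'query=sum(rate(synapse_storage_events_persisted_events_total[1m]))'

# Rate of local events being persisted:
curl -s 'http://localhost:9090/api/v1/query' \
    --data-urlencode 'query=sum(rate(synapse_storage_events_persisted_events_sep_total{origin_type="local"}[1m]))'
```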
2025-06-05 15:30:28 -05:00
Erik Johnston
0455c40085 Update book location 2023-12-13 16:15:22 +00:00
Michael Sasser
3df70aa800
Replace all Prometheus datasource UIDs of the Grafana Dashboard with the variable ${DS_PROMETHEUS} and remove __inputs (#16471) 2023-10-23 19:50:50 +01:00
Will Lewis
835174180b
Fixed grafana deploy annotations in the dashboard config, so they show for those not managing matrix.org (#15957)
Removed the 'matrix.org' hardcoded instance setting

Originally introduced in #15674

Co-authored-by: wrjlewis <will.lewis@askattest.com>
2023-07-20 12:33:06 +00:00
Eric Eastwood
0b5f64ff09
Add Synapse version deploy annotations to Grafana dashboard (#15674)
Fix https://github.com/matrix-org/synapse/issues/15662

This manifests as purple lines that show up on all time series panels; you can hover over them to see which version was deployed.

Also added a new "Deployed Synapse versions over time" panel, where the color block changes with each version, and mixed this color block into the "Up" time series panel.

To get the Grafana dashboard JSON to copy here: use the **Share** icon at the top -> **Export** -> check the **Export for sharing externally** option -> **View JSON** or **Save to file**
2023-05-31 14:35:49 -05:00
reivilibre
51e7255fbb
Fix the *MAU Limits* section of the Grafana dashboard relying on a specific job name for the workers of a Synapse deployment. (#14644) 2022-12-13 14:19:43 +00:00
reivilibre
a6514792b2
Update forgotten references to legacy metrics in the included Grafana dashboard. (#14477)
Fixes https://github.com/matrix-org/synapse/issues/14465
2022-11-22 10:51:01 +00:00
reivilibre
b455c2a5ec
Update Grafana dashboard to not use legacy metric names. (#13714) 2022-09-06 12:21:21 +01:00
reivilibre
f48f4dd59e
Update the Grafana dashboard that is included with Synapse in the contrib directory. (#13697)
* Add missing graph to contrib

* Update with minor but plausible changes, including positioning changes

* Newsfile

Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
2022-09-01 16:27:06 +01:00
Richard van der Hoff
434fd82d5f Update grafana dashboard 2022-08-13 21:50:20 +01:00
Sheogorath
77258b6725
docs(contrib): Add link to documentation in dashboard (#12602) 2022-05-09 10:08:31 +00:00
David Robertson
a10988983a
Break down cache expiry reasons in grafana (#10880)
A follow-up to #10829
2021-09-23 14:45:32 +01:00
Brendan Abolivier
d1f43b731c
Update the Synapse Grafana dashboard (#10570) 2021-08-16 12:57:09 +02:00
Erik Johnston
9c76d0561b
Update the contrib grafana dashboard (#10001) 2021-05-19 11:47:16 +01:00
Richard van der Hoff
fa361c8f65 Update grafana dashboard 2020-07-13 14:48:21 +01:00
Richard van der Hoff
816589b09a update grafana dashboard 2020-06-02 12:44:36 +01:00
Richard van der Hoff
782b811789 update grafana dashboard 2020-03-19 10:45:40 +00:00
Richard van der Hoff
abd334d27b Add extremities graphs to grafana dashboard 2019-06-25 11:51:32 +01:00
Richard van der Hoff
dd1c722a39 format json for grafana dashboard 2019-06-25 08:59:19 +01:00
Richard van der Hoff
6b0ddf8ee5 update grafana dashboard 2019-04-13 13:10:46 +01:00
Richard van der Hoff
0649306fde Update grafana dashboard 2018-09-25 13:29:28 +01:00
Richard van der Hoff
9b92720d88 fix event lag graph 2018-08-07 12:27:34 +01:00
Richard van der Hoff
6aab397ada synapse grafana dashboard 2018-07-31 09:45:58 +01:00