This changes the arguments in clock functions to be `Duration` and
converts call sites and constants into `Duration`. There are still some
more functions around that should be converted (e.g.
`timeout_deferred`), but we leave that to another PR.
We also change `.as_secs()` to return a float, as the rounding broke
things subtly. The only reason to keep the method around (it's now the
same as `timedelta.total_seconds()`) is for symmetry with `as_millis()`.
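To see why the rounding was hazardous, the same behaviour can be shown with the analogous `timedelta` API (standard library only; illustrative, not this PR's code):

```python
from datetime import timedelta

# A 1.5s duration: `total_seconds()` (which the new `as_secs()` mirrors)
# keeps the fractional part, while truncating to an int silently drops it.
d = timedelta(milliseconds=1500)
assert d.total_seconds() == 1.5     # float, like the new `as_secs()`
assert int(d.total_seconds()) == 1  # truncation loses 500ms
```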
Follows on from https://github.com/element-hq/synapse/pull/19223
### Background
As part of Element's plan to support a light form of vhosting (virtual
hosting, i.e. multiple instances of Synapse in the same Python process),
we're currently diving into the details and implications of that setup.
"Per-tenant logging" tracked internally by
https://github.com/element-hq/synapse-small-hosts/issues/48
### Prior art
Previously, we exposed `server_name` by providing a static logging
`MetadataFilter` that injected the values:
205d9e4fc4/synapse/config/logger.py (L216)
While this works fine for the normal case of one Synapse instance per
Python process, it configures things globally and breaks down when we
try to start multiple Synapse instances, because each subsequent tenant
overwrites the previous tenant's value.
### What does this PR do?
We remove the `MetadataFilter` and instead track the `server_name` in
the `LoggingContext`, exposing it via the
[`LoggingContextFilter`](205d9e4fc4/synapse/logging/context.py (L584-L622))
that we already use to expose information about the `request`.
This means the `server_name` value follows along wherever we log, as
expected, even when multiple Synapse instances are running in the same
process.
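Roughly, the mechanism looks like this (a simplified sketch, not the exact Synapse code; `current_context` is Synapse's accessor for the active logcontext):

```python
import logging

from synapse.logging.context import current_context


class LoggingContextFilter(logging.Filter):
    """Copies fields from the current logging context onto each log
    record, so formatters can reference them as e.g. %(server_name)s."""

    def filter(self, record: logging.LogRecord) -> bool:
        context = current_context()
        record.server_name = getattr(
            context, "server_name", "unknown_server_from_sentinel_context"
        )
        record.request = getattr(context, "request", None)
        return True  # never drop the record, only annotate it
```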
### A note on logcontext
Anywhere Synapse mistakenly uses the `sentinel` logcontext to log
something, we won't know which server emitted the log. We've been fixing up
`sentinel` logcontext usage as tracked by
https://github.com/element-hq/synapse/issues/18905
Any further `sentinel` logcontext usage we find in the future can be
fixed piecemeal as normal.
d2a966f922/docs/log_contexts.md (L71-L81)
### Testing strategy
1. Adjust your logging config to include `%(server_name)s` in the format
```yaml
formatters:
  precise:
    format: '%(asctime)s - %(server_name)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s - %(message)s'
```
1. Start Synapse: `poetry run synapse_homeserver --config-path
homeserver.yaml`
1. Make some requests (`curl
http://localhost:8008/_matrix/client/versions`, etc)
1. Open the homeserver logs and notice the `server_name` in the logs as
expected. `unknown_server_from_sentinel_context` is expected for the
`sentinel` logcontext (things outside of Synapse).
Bulk refactor `Histogram` metrics to be homeserver-scoped. We also add
lints to make sure that new `Histogram` metrics don't sneak in without
using the `server_name` label (`SERVER_NAME_LABEL`).
Part of https://github.com/element-hq/synapse/issues/18592
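Concretely, a homeserver-scoped metric is just a Prometheus metric that carries the `server_name` label. A hedged sketch (the metric name below is invented for illustration; only the label pattern comes from this work):

```python
from prometheus_client import Histogram

SERVER_NAME_LABEL = "server_name"

# Hypothetical metric name, for illustration only.
example_response_time = Histogram(
    "synapse_example_response_time_seconds",
    "Time spent handling an example request",
    labelnames=[SERVER_NAME_LABEL],
)

# Each homeserver records observations under its own label value, so
# multiple Synapse instances in one process can be told apart.
example_response_time.labels(server_name="hs1.example.com").observe(0.123)
```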
### Testing strategy
1. Add the `metrics` listener in your `homeserver.yaml`
```yaml
listeners:
  # This is just showing how to configure metrics either way
  #
  # `http` `metrics` resource
  - port: 9322
    type: http
    bind_addresses: ['127.0.0.1']
    resources:
      - names: [metrics]
        compress: false
  # `metrics` listener
  - port: 9323
    type: metrics
    bind_addresses: ['127.0.0.1']
```
1. Start the homeserver: `poetry run synapse_homeserver --config-path
homeserver.yaml`
1. Fetch `http://localhost:9322/_synapse/metrics` and/or
`http://localhost:9323/metrics`
1. Observe that the response includes the refactored `Histogram` metrics
with the `server_name` label
### Todo
- [x] Wait for https://github.com/element-hq/synapse/pull/18656 to merge
### Dev notes
```
LoggingDatabaseConnection
make_conn
make_pool
make_fake_db_pool
```
Bulk refactor `Gauge` metrics to be homeserver-scoped. We also add lints
to make sure that new `Gauge` metrics don't sneak in without using the
`server_name` label (`SERVER_NAME_LABEL`).
Part of https://github.com/element-hq/synapse/issues/18592
### Testing strategy
1. Add the `metrics` listener in your `homeserver.yaml`
```yaml
listeners:
  # This is just showing how to configure metrics either way
  #
  # `http` `metrics` resource
  - port: 9322
    type: http
    bind_addresses: ['127.0.0.1']
    resources:
      - names: [metrics]
        compress: false
  # `metrics` listener
  - port: 9323
    type: metrics
    bind_addresses: ['127.0.0.1']
```
1. Start the homeserver: `poetry run synapse_homeserver --config-path
homeserver.yaml`
1. Fetch `http://localhost:9322/_synapse/metrics` and/or
`http://localhost:9323/metrics`
1. Observe that the response includes the refactored `Gauge` metrics
with the `server_name` label
Bulk refactor `Counter` metrics to be homeserver-scoped. We also add
lints to make sure that new `Counter` metrics don't sneak in without
using the `server_name` label (`SERVER_NAME_LABEL`).
All of the "Fill in" commits are just bulk refactor.
Part of https://github.com/element-hq/synapse/issues/18592
### Testing strategy
1. Add the `metrics` listener in your `homeserver.yaml`
```yaml
listeners:
  # This is just showing how to configure metrics either way
  #
  # `http` `metrics` resource
  - port: 9322
    type: http
    bind_addresses: ['127.0.0.1']
    resources:
      - names: [metrics]
        compress: false
  # `metrics` listener
  - port: 9323
    type: metrics
    bind_addresses: ['127.0.0.1']
```
1. Start the homeserver: `poetry run synapse_homeserver --config-path
homeserver.yaml`
1. Fetch `http://localhost:9322/_synapse/metrics` and/or
`http://localhost:9323/metrics`
1. Observe response includes the `synapse_user_registrations_total`,
`synapse_http_server_response_count_total`, etc metrics with the
`server_name` label
This should be reviewed commit by commit.
Nowadays it's trivial to propagate cache invalidations, which means we
can move some things off the main process, and not go through HTTP
replication.
`ReplicationGetQueryRestServlet` appeared to be unused, and was very
weird, as it was only being called when the current instance was the
main one… in order to RPC to the main one (if no instance is set on a
replication client, the call goes to the main process).
The other two handlers could be moved to any worker relatively
trivially, by moving some methods to the worker store.
**I've intentionally not removed the replication servlets yet** so that
the rollout is safe; a follow-up PR will clean them up for removal in
the N+1 version.
This can be reviewed commit by commit.
This enables the `flake8-logging` and `flake8-logging-format` rules in
Ruff, as well as logging exception stack traces in a few places where it
makes sense:
- https://docs.astral.sh/ruff/rules/#flake8-logging-log
- https://docs.astral.sh/ruff/rules/#flake8-logging-format-g
### Linting to avoid pre-formatting log messages
See [`adamchainz/flake8-logging` -> *LOG011 avoid pre-formatting log
messages*](152db2f167/README.rst (log011-avoid-pre-formatting-log-messages))
Practically, this means preferring placeholders (`%s`) over f-strings
for logging, because placeholders are passed as args to the logger,
which can then handle them specially.
For example, Sentry records the args separately in its logging
integration:
c15b390dfe/sentry_sdk/integrations/logging.py (L280-L284)
One small theoretical perf benefit is that messages for disabled log
levels never get formatted, so we don't unnecessarily build formatted
strings.
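Concretely, the rule pushes us from the first form to the second:

```python
import logging

logger = logging.getLogger(__name__)
user_id = "@alice:example.com"

# Discouraged (LOG011): the f-string is formatted eagerly, even when
# DEBUG is disabled, and the logger only ever sees the final string.
logger.debug(f"Fetching devices for {user_id}")

# Preferred: interpolation only happens if the record is emitted, and
# integrations (e.g. Sentry) receive `user_id` as a separate argument.
logger.debug("Fetching devices for %s", user_id)
```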
Use fully-qualified `PersistedEventPosition` (`instance_name` and `stream_ordering`) when returning `RoomsForUser` to facilitate proper comparisons and `RoomStreamToken` generation.
Spawning from https://github.com/element-hq/synapse/pull/17187 where we want to utilize this change
... when workers are unreachable, etc.
Fixes https://github.com/element-hq/synapse/issues/17117.
The general principle is just to make sure that we propagate any
exceptions to the JsonResource, so that we return an error code to the
sending server. That means that the sending server no longer considers
the message safely sent, so it will retry later.
In the issue, Erik mentions that an alternative solution would be to
persist the to-device messages into a table so that they can be retried.
This might be an improvement for performance, but even if we did that,
we still need this mechanism, since we might be unable to reach the
database. So, if we want to do that, it can be a later follow-up.
---------
Co-authored-by: Erik Johnston <erik@matrix.org>
During the migration, the automated script used to update the copyright
headers accidentally removed some of the existing copyright lines.
Reinstate them.
This adds a module API which allows a module to update a user's
presence state/status message. This is useful for controlling presence
from an external system.
To fully control presence from the module, the `presence.enabled` config
parameter gains a new state of "untracked", which disables internal
tracking of presence changes via user actions, etc. Only updates from
the module will be persisted and sent down sync.
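For example, a deployment that hands presence entirely over to a module would use (a minimal sketch of the new option, with the YAML shape assumed from the description above):

```yaml
presence:
  enabled: untracked
```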
Synapse was incorrectly implemented with a `knock_state_events`
property on some APIs (instead of `knock_room_state`). This was
corrected in Synapse 1.70.0, but *both* fields were sent in order to
stay compatible with Synapse versions expecting the wrong field.
Enough time has passed that only the correct field needs to be
included/handled.
And fix a bug in the implementation of the updated redaction
format (MSC2174) where the top-level `redacts` field was not
properly added for backwards-compatibility.
Synapse will no longer send (or respond to) the unstable flags
for faster joins. These were only available behind a configuration
flag and handled in parallel with the stable flags.
Add an `is_mine_server_name` method, similar to `is_mine_id`.
Ideally we would use this consistently, instead of sometimes comparing
against `hs.hostname` and other times reaching into
`hs.config.server.server_name`.
Also fix a bug in the tests where `hs.hostname` would sometimes differ
from `hs.config.server.server_name`.
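As a hypothetical before/after of the comparisons described above (illustrative only, not code from this PR; `handle_local_event` is made up for the example):

```python
# Before: reaching into the config directly.
if origin == hs.config.server.server_name:
    handle_local_event(event)

# After: using the new helper, mirroring `is_mine_id`.
if hs.is_mine_server_name(origin):
    handle_local_event(event)
```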
Signed-off-by: Sean Quah <seanq@matrix.org>
MSC3983 provides a way to request multiple OTKs at once from appservices,
this extends this concept to the Client-Server API.
Note that this will likely be split out into a separate MSC, but it is
currently part of MSC3983.
It can be useful to always return the fallback key when attempting to
claim keys. This adds an unstable endpoint for `/keys/claim` which
always returns fallback keys in addition to one-time-keys.
The fallback key(s) are not marked as "used" unless there are no
corresponding OTKs.
This is currently defined in MSC3983 (although likely to be split out
to a separate MSC). The endpoint shape may change or it may be requested
differently (e.g. via a keyword parameter on the current endpoint), but
the core logic should be reasonable.
Experimental support for MSC3983 is behind a configuration flag.
If enabled, then for users who are exclusively owned by an application
service, the appservice will be queried for one-time keys *if* there
are none uploaded to Synapse.
Fixes #12801.
Complement tests are at
https://github.com/matrix-org/complement/pull/567.
Avoid blocking on full state when handling a subsequent join into a
partial state room.
Also always perform a remote join into partial state rooms, since we do
not know whether the joining user has been banned and want to avoid
leaking history to banned users.
Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
Co-authored-by: Sean Quah <seanq@matrix.org>
Co-authored-by: David Robertson <davidr@element.io>
It's important that collections returned from `@cached` methods are not
modified, otherwise future retrievals from the cache will return the
modified collection.
This applies to the return values from `@cached` methods and the values
inside the dictionaries returned by `@cachedList` methods. It's not
necessary for the dictionaries returned by `@cachedList` methods
themselves to be read-only.
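The pitfall is the same one you hit when mutating a memoized value; here is a self-contained analogue using `functools.lru_cache`:

```python
import functools


@functools.lru_cache(maxsize=None)
def get_room_ids() -> list:
    return ["!a:example.com", "!b:example.com"]


rooms = get_room_ids()
rooms.append("!c:example.com")  # mutates the object stored in the cache

# Every future caller now sees the modified collection.
assert get_room_ids() == ["!a:example.com", "!b:example.com", "!c:example.com"]
```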
Signed-off-by: Sean Quah <seanq@matrix.org>
Co-authored-by: David Robertson <davidr@element.io>
The `parse_enum` helper pulls an enum value from the query string
(by delegating to the `parse_string` helper with values generated
from the enum).
This is used to pull out `"f"` and `"b"` in most places, and then we
thread the resulting `Direction` enum throughout more code.
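A rough sketch of the idea (names and signatures are illustrative, not the exact Synapse API):

```python
from enum import Enum
from typing import Dict, Type, TypeVar

E = TypeVar("E", bound=Enum)


class Direction(Enum):
    FORWARDS = "f"
    BACKWARDS = "b"


def parse_enum(args: Dict[str, str], name: str, enum_type: Type[E], default: E) -> E:
    """Pull an enum value from parsed query args, allowing only the
    values generated from the enum."""
    allowed = [e.value for e in enum_type]
    raw = args.get(name)
    if raw is None:
        return default
    if raw not in allowed:
        raise ValueError(f"Query parameter {name!r} must be one of {allowed}")
    return enum_type(raw)


# e.g. ?dir=b -> Direction.BACKWARDS
assert parse_enum({"dir": "b"}, "dir", Direction, Direction.FORWARDS) is Direction.BACKWARDS
```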
* Also use stable name in SendJoinResponse struct
follow-up to #14832
* Changelog
* Fix a rename I missed
* Run black
* Update synapse/federation/federation_client.py
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Use new query param when requesting a partial join
* Read new query param when serving partial join
* Provide new field names when serving partial joins
* Read new field names from partial join response
* Changelog