Commit graph

182 commits

Author SHA1 Message Date
Devon Hudson
df24e0f302
Fix support for older versions of zope-interface (#19274)
Some checks failed
Schema / Ensure generated documentation is up-to-date (push) Waiting to run
Tests / lint-crlf (push) Waiting to run
Tests / changes (push) Waiting to run
Tests / check-sampleconfig (push) Blocked by required conditions
Tests / check-schema-delta (push) Blocked by required conditions
Tests / check-lockfile (push) Waiting to run
Tests / lint (push) Blocked by required conditions
Tests / Typechecking (push) Blocked by required conditions
Tests / lint-newsfile (push) Waiting to run
Tests / lint-clippy (push) Blocked by required conditions
Tests / lint-clippy-nightly (push) Blocked by required conditions
Tests / lint-rust (push) Blocked by required conditions
Tests / lint-rustfmt (push) Blocked by required conditions
Tests / lint-readme (push) Blocked by required conditions
Tests / linting-done (push) Blocked by required conditions
Tests / calculate-test-jobs (push) Blocked by required conditions
Tests / trial (push) Blocked by required conditions
Tests / trial-olddeps (push) Blocked by required conditions
Tests / trial-pypy (all, pypy-3.10) (push) Blocked by required conditions
Tests / sytest (push) Blocked by required conditions
Tests / export-data (push) Blocked by required conditions
Tests / portdb (14, 3.10) (push) Blocked by required conditions
Tests / portdb (17, 3.14) (push) Blocked by required conditions
Tests / complement (monolith, Postgres) (push) Blocked by required conditions
Tests / complement (monolith, SQLite) (push) Blocked by required conditions
Tests / complement (workers, Postgres) (push) Blocked by required conditions
Tests / cargo-test (push) Blocked by required conditions
Tests / cargo-bench (push) Blocked by required conditions
Tests / tests-done (push) Blocked by required conditions
/ Check locked dependencies have sdists (push) Has been cancelled
Fixes #19269 

Versions of zope-interface from RHEL, Ubuntu LTS 22 & 24 and OpenSuse
don't support the new python union `X | Y` syntax for interfaces. This
PR partially reverts the change over to fully use the new syntax, adds a
minimum supported version of zope-interface to Synapse's dependency
list, and removes the linter auto-upgrades which prefer the newer
syntax.

### Pull Request Checklist

<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->

* [X] Pull request is based on the develop branch
* [X] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
  - Use markdown where necessary, mostly for `code blocks`.
  - End with either a period (.) or an exclamation mark (!).
  - Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [X] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct (run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))

---------

Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2025-12-12 15:34:13 +00:00
Eric Eastwood
b9dda0ff22
Restore printing sentinel for log_record.request (#19172)
This was unintentionally changed in
https://github.com/element-hq/synapse/pull/19068.

There is no real bug here. Without this PR, we just printed an empty
string for the `sentinel` logcontext whereas the prior art behavior was
to print `sentinel` which this PR restores.

Found while staring at the logs in
https://github.com/element-hq/synapse/issues/19165


### Reproduction strategy

1. Configure Synapse with
[logging](df802882bb/docs/sample_log_config.yaml)
1. Start Synapse: `poetry run synapse_homeserver --config-path
homeserver.yaml`
1. Notice the `asyncio - 64 - DEBUG - - Using selector: EpollSelector`
log line (notice empty string `- -`)
1. With this PR, the log line will be `asyncio - 64 - DEBUG - sentinel -
Using selector: EpollSelector` (notice `sentinel`)
2025-11-13 09:57:56 -06:00
Andrew Ferrazzutti
fcac7e0282
Write union types as X | Y where possible (#19111)
Some checks are pending
Schema / Ensure Synapse config schema is valid (push) Waiting to run
Schema / Ensure generated documentation is up-to-date (push) Waiting to run
Tests / lint (push) Blocked by required conditions
Tests / lint-readme (push) Blocked by required conditions
Tests / linting-done (push) Blocked by required conditions
Tests / calculate-test-jobs (push) Blocked by required conditions
Tests / changes (push) Waiting to run
Tests / check-sampleconfig (push) Blocked by required conditions
Tests / check-schema-delta (push) Blocked by required conditions
Tests / check-lockfile (push) Waiting to run
Tests / Typechecking (push) Blocked by required conditions
Tests / lint-crlf (push) Waiting to run
Tests / lint-newsfile (push) Waiting to run
Tests / lint-clippy (push) Blocked by required conditions
Tests / lint-clippy-nightly (push) Blocked by required conditions
Tests / lint-rust (push) Blocked by required conditions
Tests / lint-rustfmt (push) Blocked by required conditions
Tests / trial (push) Blocked by required conditions
Tests / trial-olddeps (push) Blocked by required conditions
Tests / trial-pypy (all, pypy-3.10) (push) Blocked by required conditions
Tests / sytest (push) Blocked by required conditions
Tests / export-data (push) Blocked by required conditions
Tests / portdb (13, 3.10) (push) Blocked by required conditions
Tests / portdb (17, 3.14) (push) Blocked by required conditions
Tests / complement (monolith, Postgres) (push) Blocked by required conditions
Tests / complement (monolith, SQLite) (push) Blocked by required conditions
Tests / complement (workers, Postgres) (push) Blocked by required conditions
Tests / cargo-test (push) Blocked by required conditions
Tests / cargo-bench (push) Blocked by required conditions
Tests / tests-done (push) Blocked by required conditions
aka PEP 604, added in Python 3.10
2025-11-06 14:02:33 -06:00
Eric Eastwood
41a2762e58
Be mindful of other logging context filters in 3rd-party code (#19068)
Some checks failed
Tests / lint-newsfile (push) Has been cancelled
Tests / sytest (push) Has been cancelled
Tests / lint-clippy-nightly (push) Has been cancelled
Build docker images / Push merged images to docker.io/matrixdotorg/synapse (push) Has been cancelled
Build docker images / Push merged images to ghcr.io/element-hq/synapse (push) Has been cancelled
Deploy the documentation / GitHub Pages (push) Has been cancelled
Build release artifacts / Build .deb packages (push) Has been cancelled
Build release artifacts / Attach assets to release (push) Has been cancelled
Tests / lint-clippy (push) Has been cancelled
Tests / check-sampleconfig (push) Has been cancelled
Tests / check-schema-delta (push) Has been cancelled
Tests / lint (push) Has been cancelled
Tests / Typechecking (push) Has been cancelled
Tests / lint-rust (push) Has been cancelled
Tests / lint-rustfmt (push) Has been cancelled
Tests / lint-readme (push) Has been cancelled
Tests / linting-done (push) Has been cancelled
Tests / calculate-test-jobs (push) Has been cancelled
Tests / trial (push) Has been cancelled
Tests / trial-olddeps (push) Has been cancelled
Tests / trial-pypy (all, pypy-3.10) (push) Has been cancelled
Tests / export-data (push) Has been cancelled
Tests / portdb (13, 3.10) (push) Has been cancelled
Tests / portdb (17, 3.13) (push) Has been cancelled
Tests / complement (monolith, Postgres) (push) Has been cancelled
Tests / complement (monolith, SQLite) (push) Has been cancelled
Tests / complement (workers, Postgres) (push) Has been cancelled
Tests / cargo-test (push) Has been cancelled
Tests / cargo-bench (push) Has been cancelled
Tests / tests-done (push) Has been cancelled
Be mindful that Synapse can be run alongside other code in the same
Python process. We shouldn't overwrite fields on given log record unless
we know it's relevant to Synapse.

(no clobber)


### Background

As part of Element's plan to support a light form of vhosting (virtual
host) (multiple instances of Synapse in the same Python process), we're
currently diving into the details and implications of running multiple
instances of Synapse in the same Python process.

"Per-tenant logging" tracked internally by
https://github.com/element-hq/synapse-small-hosts/issues/48
2025-10-31 10:12:05 -05:00
Eric Eastwood
c0b9437ab6
Fix lost logcontext when using timeout_deferred(...) (#19090)
Some checks are pending
Schema / Ensure generated documentation is up-to-date (push) Waiting to run
Tests / lint-rustfmt (push) Blocked by required conditions
Tests / lint-readme (push) Blocked by required conditions
Tests / tests-done (push) Blocked by required conditions
Tests / lint (push) Blocked by required conditions
Tests / changes (push) Waiting to run
Tests / check-sampleconfig (push) Blocked by required conditions
Tests / check-schema-delta (push) Blocked by required conditions
Tests / check-lockfile (push) Waiting to run
Tests / Typechecking (push) Blocked by required conditions
Tests / lint-crlf (push) Waiting to run
Tests / lint-newsfile (push) Waiting to run
Tests / lint-pydantic (push) Blocked by required conditions
Tests / lint-clippy (push) Blocked by required conditions
Tests / lint-clippy-nightly (push) Blocked by required conditions
Tests / lint-rust (push) Blocked by required conditions
Tests / linting-done (push) Blocked by required conditions
Tests / calculate-test-jobs (push) Blocked by required conditions
Tests / trial (push) Blocked by required conditions
Tests / trial-olddeps (push) Blocked by required conditions
Tests / trial-pypy (all, pypy-3.10) (push) Blocked by required conditions
Tests / sytest (push) Blocked by required conditions
Tests / export-data (push) Blocked by required conditions
Tests / portdb (13, 3.10) (push) Blocked by required conditions
Tests / portdb (17, 3.13) (push) Blocked by required conditions
Tests / complement (monolith, Postgres) (push) Blocked by required conditions
Tests / complement (monolith, SQLite) (push) Blocked by required conditions
Tests / complement (workers, Postgres) (push) Blocked by required conditions
Tests / cargo-test (push) Blocked by required conditions
Tests / cargo-bench (push) Blocked by required conditions
Fix lost logcontext when using `timeout_deferred(...)` and things
actually timeout.

Fix https://github.com/element-hq/synapse/issues/19087 (our HTTP client
times out requests using `timeout_deferred(...)`
Fix https://github.com/element-hq/synapse/issues/19066 (`/sync` uses
`notifier.wait_for_events()` which uses `timeout_deferred(...)` under
the hood)


### When/why did these lost logcontext warnings start happening?

```
synapse.logging.context - 107 - WARNING - sentinel - Expected logging context call_later but found POST-2453

synapse.logging.context - 107 - WARNING - sentinel - Expected logging context call_later was lost
```

In https://github.com/element-hq/synapse/pull/18828, we switched
`timeout_deferred(...)` from using `reactor.callLater(...)` to
[`clock.call_later(...)`](3b59ac3b69/synapse/util/clock.py (L224-L313))
under the hood. This meant it started dealing with logcontexts but our
`time_it_out()` callback didn't follow our [Synapse logcontext
rules](3b59ac3b69/docs/log_contexts.md).
2025-10-30 11:49:15 -05:00
Eric Eastwood
f0aae62f85
Cheaper logcontext debug logs (random_string_insecure_fast(...)) (#19094)
Follow-up to https://github.com/element-hq/synapse/pull/18966

During the weekly Backend team meeting, it was mentioned that
`random_string(...)` was taking a significant amount of CPU on
`matrix.org`. This makes sense as it relies on
[`secrets.choice(...)`](https://docs.python.org/3/library/secrets.html#secrets.choice),
a cryptographically secure function that is inherently computationally
expensive. And since https://github.com/element-hq/synapse/pull/18966,
we're calling `random_string(...)` as part of a bunch of logcontext
utilities.

Since we don't need cryptographically secure random strings for our
debug logs, this PR is introducing a new `random_string_insecure_fast(...)`
function that uses
[`random.choice(...)`](https://docs.python.org/3/library/random.html#random.choice)
which uses pseudo-random numbers that are "both fast and threadsafe".
2025-10-30 11:47:53 -05:00
Andrew Ferrazzutti
fc244bb592
Use type hinting generics in standard collections (#19046)
aka PEP 585, added in Python 3.9

 - https://peps.python.org/pep-0585/
 - https://docs.astral.sh/ruff/rules/non-pep585-annotation/
2025-10-22 16:48:19 -05:00
Eric Eastwood
70c044db8e
Remove deprecated LoggingContext.set_current_context/LoggingContext.current_context methods (#18989)
These were added for backwards compatibility (and essentially
deprecated) in https://github.com/matrix-org/synapse/pull/7408
(2020-05-04) because
[`synapse-s3-storage-provider`](https://github.com/matrix-org/synapse-s3-storage-provider)
previously relied on them -- but `synapse-s3-storage-provider` since
been
[updated](https://github.com/matrix-org/synapse-s3-storage-provider/pull/36)
to no longer use them.
2025-10-02 13:21:37 -05:00
Eric Eastwood
d27ff161f5
Add debug logs wherever we change current logcontext (#18966)
Add debug logs wherever we change current logcontext (`LoggingContext`).
I've had to make this same set of changes over and over as I've been
debugging things so it seems useful enough to include by default.

Instead of tracing things at the `set_current_context(...)` level, I've
added the debug logging on all of the utilities that utilize
`set_current_context(...)`. It's much easier to reason about the log
context changing because of `PreserveLoggingContext` changing things
than an opaque `set_current_context(...)` call.
2025-10-02 11:51:17 -05:00
Eric Eastwood
06a84f4fe0
Revert "Switch to OpenTracing's ContextVarsScopeManager (#18849)" (#19007)
Revert https://github.com/element-hq/synapse/pull/18849

Go back to our custom `LogContextScopeManager` after trying
OpenTracing's `ContextVarsScopeManager`.

Fix https://github.com/element-hq/synapse/issues/19004

### Why revert?

For reference, with the normal reactor, `ContextVarsScopeManager` worked
just as good as our custom `LogContextScopeManager` as far as I can tell
(and even better in some cases). But since Twisted appears to not fully
support `ContextVar`'s, it doesn't work as expected in all cases.
Compounding things, `ContextVarsScopeManager` was causing errors with
the experimental `SYNAPSE_ASYNC_IO_REACTOR` option.

Since we're not getting the full benefit that we originally desired, we
might as well revert and figure out alternatives for extending the
logcontext lifetimes to support the use case we were trying to unlock
(c.f. https://github.com/element-hq/synapse/pull/18804).

See
https://github.com/element-hq/synapse/issues/19004#issuecomment-3358052171
for more info.


### Does this require backporting and patch releases?

No. Since `ContextVarsScopeManager` operates just as good with the
normal reactor and was only causing actual errors with the experimental
`SYNAPSE_ASYNC_IO_REACTOR` option, I don't think this requires us to
backport and make patch releases at all.



### Maintain cross-links between main trace and background process work

In order to maintain the functionality introduced in https://github.com/element-hq/synapse/pull/18932 (cross-links between the background process trace and currently active trace), we also needed a small change.

Previously, when we were using `ContextVarsScopeManager`, it tracked the tracing scope across the logcontext changes without issue. Now that we're using our own custom `LogContextScopeManager` again, we need to capture the active span from the logcontext before we reset to the sentinel context because of the `PreserveLoggingContext()` below.

Added some tests to ensure we maintain the `run_as_background` tracing behavior regardless of the tracing scope manager we use.
2025-10-02 11:27:26 -05:00
Eric Eastwood
5143f93dc9
Fix server_name in logging context for multiple Synapse instances in one process (#18868)
### Background

As part of Element's plan to support a light form of vhosting (virtual
host) (multiple instances of Synapse in the same Python process), we're
currently diving into the details and implications of running multiple
instances of Synapse in the same Python process.

"Per-tenant logging" tracked internally by
https://github.com/element-hq/synapse-small-hosts/issues/48

### Prior art

Previously, we exposed `server_name` by providing a static logging
`MetadataFilter` that injected the values:


205d9e4fc4/synapse/config/logger.py (L216)

While this can work fine for the normal case of one Synapse instance per
Python process, this configures things globally and isn't compatible
when we try to start multiple Synapse instances because each subsequent
tenant will overwrite the previous tenant.


### What does this PR do?

We remove the `MetadataFilter` and replace it by tracking the
`server_name` in the `LoggingContext` and expose it with our existing
[`LoggingContextFilter`](205d9e4fc4/synapse/logging/context.py (L584-L622))
that we already use to expose information about the `request`.

This means that the `server_name` value follows wherever we log as
expected even when we have multiple Synapse instances running in the
same process.


### A note on logcontext

Anywhere, Synapse mistakenly uses the `sentinel` logcontext to log
something, we won't know which server sent the log. We've been fixing up
`sentinel` logcontext usage as tracked by
https://github.com/element-hq/synapse/issues/18905

Any further `sentinel` logcontext usage we find in the future can be
fixed piecemeal as normal.


d2a966f922/docs/log_contexts.md (L71-L81)


### Testing strategy

1. Adjust your logging config to include `%(server_name)s` in the format
    ```yaml
    formatters:
        precise:
format: '%(asctime)s - %(server_name)s - %(name)s - %(lineno)d -
%(levelname)s - %(request)s - %(message)s'
    ```
1. Start Synapse: `poetry run synapse_homeserver --config-path
homeserver.yaml`
1. Make some requests (`curl
http://localhost:8008/_matrix/client/versions`, etc)
1. Open the homeserver logs and notice the `server_name` in the logs as
expected. `unknown_server_from_sentinel_context` is expected for the
`sentinel` logcontext (things outside of Synapse).
2025-09-26 17:10:48 -05:00
Eric Eastwood
04721c85e6
Disconnect background process work from request trace (#18932)
Before https://github.com/element-hq/synapse/pull/18849, we we're using
our own custom `LogContextScopeManager` which tied the tracing scope to
the `LoggingContext`. Since we created a new
`BackgroundProcessLoggingContext` any time we
`run_as_background_process(...)`, the trace for the background work was
separate from the trace that kicked of the work as expected (e.g.
request trace is separate from the background process we kicked to fetch
more messages from the federation).

Since we've now switched to the `ContextVarsScopeManager` (in
https://github.com/element-hq/synapse/pull/18849), the tracing scope now
crosses the `LoggingContext` boundaries (and thread boundaries) without
a problem. This means we end up with request traces that include all of
the background work that we've kicked off bloating the trace and making
it hard to understand what's going on.

This PR separates the traces again to how things were before.
Additionally, things are even better now since I added some cross-link
references between the traces to easily be able to jump between.

Follow-up to https://github.com/element-hq/synapse/pull/18849

---

In the before, you can see that the trace is blown up by the background
process (`bgproc.qwer`).

In the after, we now only have a little cross-link marker span
(`start_bgproc.qwer`) to jump to background process trace.

Before | After
---  | ---
<some image> | <some image>



### Testing strategy

1. Run a Jaeger instance
(https://www.jaegertracing.io/docs/1.6/getting-started/)
    ```shell
    $ docker run -d --name jaeger \
      -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \
      -p 5775:5775/udp \
      -p 6831:6831/udp \
      -p 6832:6832/udp \
      -p 5778:5778 \
      -p 16686:16686 \
      -p 14268:14268 \
      -p 9411:9411 \
      jaegertracing/all-in-one:1.59.0
    ```
 1. Configure Synapse to use tracing:
     `homeserver.yaml`
     ```yaml
    ## Tracing ##
    opentracing:
      enabled: true
      jaeger_config:
        sampler:
          type: const
          param: 1
        logging:
          false
    ```
1. Make sure the optional `opentracing` dependency is installed: `poetry
install --extras all`
1. In the `VersionsRestServlet`, modify it to kick off a dummy
background process (easy to test this way)
    ```python
from synapse.metrics.background_process_metrics import
run_as_background_process

    async def _qwer() -> None:
        await self.clock.sleep(1)

    run_as_background_process("qwer", "test_server", _qwer)
    ```
1. Run Synapse: `poetry run synapse_homeserver --config-path
homeserver.yaml`
1. Fire off a version requests: `curl
http://localhost:8008/_matrix/client/versions`
 1. Visit http://localhost:16686/search to view the traces
     - Select the correct service
     - Look for the  `VersionsRestServlet` operation
     - Press 'Find Traces' button
     - Select the relevant trace
     - Notice how the trace isn't bloated
- Look for the `start_bgproc.qwer` span cross-linking to the background
process
- Jump to the other trace using the cross-link reference ->
`bgproc.qwer`
2025-09-25 21:45:18 -05:00
Eric Eastwood
0458f691b6
Fix run_coroutine_in_background(...) incorrectly handling logcontext (#18964)
Regressed in
https://github.com/element-hq/synapse/pull/18900#discussion_r2331554278
(see conversation there for more context)


### How is this a regression?

> To give this an update with more hindsight; this logic *was* redundant
with the early return and it is safe to remove this complexity

> 
> It seems like this actually has to do with completed vs incomplete
deferreds...
> 
> To explain how things previously worked *without* the early-return
shortcut:
> 
> With the normal case of **incomplete awaitable**, we store the
`calling_context` and the `f` function is called and runs until it
yields to the reactor. Because `f` follows the logcontext rules, it sets
the `sentinel` logcontext. Then in `run_in_background(...)`, we restore
the `calling_context`, store the current `ctx` (which is `sentinel`) and
return. When the deferred completes, we restore `ctx` (which is
`sentinel`) before yielding to the reactor again (all good
)
> 
> With the other case where we see a **completed awaitable**, we store
the `calling_context` and the `f` function is called and runs to
completion (no logcontext change). *This is where the shortcut would
kick in but I'm going to continue explaining as if we commented out the
shortcut.* -- Then in `run_in_background(...)`, we restore the
`calling_context`, store the current `ctx` (which is same as the
`calling_context`). Because the deferred is already completed, our extra
callback is called immediately and we restore `ctx` (which is same as
the `calling_context`). Since we never yield to the reactor, the
`calling_context` is perfect as that's what we want again (all good
)
> 
> ---
> 
> But this also means that our early-return shortcut is no longer just
an optimization and is *necessary* to act correctly in the **completed
awaitable** case as we want to return with the `calling_context` and not
reset to the `sentinel` context. I've updated the comment in
https://github.com/element-hq/synapse/pull/18964 to explain the
necessity as it's currently just described as an optimization.
> 
> But because we made the same change to
`run_coroutine_in_background(...)` which didn't have the same
early-return shortcut, we regressed the correct behavior  . This is
being fixed in https://github.com/element-hq/synapse/pull/18964
>
>
> *-- @MadLittleMods,
https://github.com/element-hq/synapse/pull/18900#discussion_r2373582917*

### How did we find this problem?

Spawning from @wrjlewis
[seeing](https://matrix.to/#/!SGNQGPGUwtcPBUotTL:matrix.org/$h3TxxPVlqC6BTL07dbrsz6PmaUoZxLiXnSTEY-QYDtA?via=jki.re&via=matrix.org&via=element.io)
`Starting metrics collection 'typing.get_new_events' from sentinel
context: metrics will be lost` in the logs:

<details>
<summary>More logs</summary>

```
synapse.http.request_metrics - 222 - ERROR - sentinel - Trying to stop RequestMetrics in the sentinel context.
2025-09-23 14:43:19,712 - synapse.util.metrics - 212 - WARNING - sentinel - Starting metrics collection 'typing.get_new_events' from sentinel context: metrics will be lost
2025-09-23 14:43:19,713 - synapse.rest.client.sync - 851 - INFO - sentinel - Client has disconnected; not serializing response.
2025-09-23 14:43:19,713 - synapse.http.server - 825 - WARNING - sentinel - Not sending response to request <XForwardedForRequest at 0x7f23e8111ed0 method='POST' uri='/_matrix/client/unstable/org.matrix.simplified_msc3575/sync?pos=281963%2Fs929324_147053_10_2652457_147960_2013_25554_4709564_0_164_2&timeout=30000' clientproto='HTTP/1.1' site='8008'>, already dis
connected.
2025-09-23 14:43:19,713 - synapse.access.http.8008 - 515 - INFO - sentinel - 92.40.194.87 - 8008 - {@me:wi11.co.uk} Processed request: 30.005sec/-8.041sec (0.001sec, 0.000sec) (0.000sec/0.002sec/2) 0B 200! "POST /_matrix/client/unstable/org.matrix.simplified_msc3575/
```

</details>

From the logs there, we can see things relating to
`typing.get_new_events` and
`/_matrix/client/unstable/org.matrix.simplified_msc3575/sync` which led
me to trying out Sliding Sync with the typing extension enabled and
allowed me to reproduce the problem locally. Sliding Sync is a unique
scenario as it's the only place we use `gather_optional_coroutines(...)`
-> `run_coroutine_in_background(...)` (introduced in
https://github.com/element-hq/synapse/pull/17884) to exhibit this
behavior.


### Testing strategy

1. Configure Synapse to enable
[MSC4186](https://github.com/matrix-org/matrix-spec-proposals/pull/4186):
Simplified Sliding Sync which is actually under
[MSC3575](https://github.com/matrix-org/matrix-spec-proposals/pull/3575)
    ```yaml
    experimental_features:
      msc3575_enabled: true
    ```
1. Start synapse: `poetry run synapse_homeserver --config-path
homeserver.yaml`
 1. Make a Sliding Sync request with one of the extensions enabled
    ```http
POST
http://localhost:8008/_matrix/client/unstable/org.matrix.simplified_msc3575/sync
    {
      "lists": {},
      "room_subscriptions": {
            "!FlgJYGQKAIvAscfBhq:my.synapse.linux.server": {
                "required_state": [],
                "timeline_limit": 1
            }
        },
        "extensions": {
            "typing": {
                "enabled": true
            }
        }
    }
    ```
1. Open your homeserver logs and notice warnings about `Starting ...
from sentinel context: metrics will be lost`
2025-09-24 15:24:47 +00:00
Eric Eastwood
5a9ca1e3d9
Introduce Clock.call_when_running(...) to include logcontext by default (#18944)
Introduce `Clock.call_when_running(...)` to wrap startup code in a
logcontext, ensuring we can identify which server generated the logs.

Background:

>  Ideally, nothing from the Synapse homeserver would be logged against the `sentinel` 
>  logcontext as we want to know which server the logs came from. In practice, this is not 
>  always the case yet especially outside of request handling. 
>   
>  Global things outside of Synapse (e.g. Twisted reactor code) should run in the 
>  `sentinel` logcontext. It's only when it calls into application code that a logcontext 
>  gets activated. This means the reactor should be started in the `sentinel` logcontext, 
>  and any time an awaitable yields control back to the reactor, it should reset the 
>  logcontext to be the `sentinel` logcontext. This is important to avoid leaking the 
>  current logcontext to the reactor (which would then get picked up and associated with 
>  the next thing the reactor does). 
>
> *-- `docs/log_contexts.md`

Also adds a lint to prefer `Clock.call_when_running(...)` over
`reactor.callWhenRunning(...)`

Part of https://github.com/element-hq/synapse/issues/18905
2025-09-22 10:27:59 -05:00
Eric Eastwood
7ecfe8b1a8
Better explain which context the task is run in when using run_in_background(...) or run_as_background_process(...) (#18906)
Follow-up to https://github.com/element-hq/synapse/pull/18900
2025-09-12 09:29:35 -05:00
Eric Eastwood
9cc4001778
Better explain logcontext in run_in_background(...) and run_as_background_process(...) (#18900)
Also adds a section in the docs explaining the `sentinel` logcontext.

Spawning from https://github.com/element-hq/synapse/pull/18870


### Testing strategy

1. Run Synapse normally and with `daemonize: true`: `poetry run
synapse_homeserver --config-path homeserver.yaml`
 1. Execute some requests
 1. Shutdown the server
 1. Look for any bad log entries in your homeserver logs:
    - `Expected logging context sentinel but found main`
    - `Expected logging context main was lost`
    - `Expected previous context`
    - `utime went backwards!`/`stime went backwards!`
- `Called stop on logcontext POST-0 without recording a start rusage`
    - `Background process re-entered without a proc`

Twisted trial tests:

 1. Run full Twisted trial test suite.
1. Check the logs for `Test starting with non-sentinel logging context ...`
2025-09-10 10:22:53 -05:00
Eric Eastwood
27fc3389f3
Switch to OpenTracing's ContextVarsScopeManager (#18849)
Switch to OpenTracing's `ContextVarsScopeManager` instead of our own
custom `LogContextScopeManager`.

This is now possible because the linked Twisted issue from the comment
in our custom `LogContextScopeManager` is resolved:
https://twistedmatrix.com/trac/ticket/10301

This PR is spawning from exploring different possibilities to solve the
`scope` loss problem I was encountering in
https://github.com/element-hq/synapse/pull/18804#discussion_r2268254424.
This appears to solve the problem and I've added the additional test
from there to this PR 
2025-08-27 11:41:00 -05:00
Eric Eastwood
0be7fe926d
Add debug log when HMAC incorrect (#18474)
Spawning from getting `HMAC incorrect` errors that seem unexplainable
except for the `registration_shared_secret` being misconfigured. It's
also possible my HMAC calculation is incorrect but every time I
double-check the result with the [known-good Python
example](553e124f76/docs/admin_api/register_api.md)
(which matches [Synapse's
source](24e849e483/synapse/rest/admin/users.py (L618-L633))),
it's as expected.

With these logs, we can actually debug whether
`registration_shared_secret` is being configured correctly or not.

It also helps specifically when using `registration_shared_secret_path`
since the default Synapse behavior (of creating the file and secret if
it doesn't exist) can mask deployment race condition where we would
start up Synapse before the `registration_shared_secret_path` file was
put in place:

> **`registration_shared_secret_path`**
>
> [...]
>
> If this file does not exist, Synapse will create a new shared secret
on startup and store it in this file.
>
> *-- [Synapse config
docs](6521406a37/docs/usage/configuration/config_documentation.md (registration_shared_secret_path))*


This only applies to the [`POST
/_synapse/admin/v1/register`](553e124f76/docs/admin_api/register_api.md)
endpoint but does log very sensitive information so we've made it so you
have to explicitly enable the logs by configuring
`synapse.rest.admin.users.registration_debug` (does not inherit root log
level) (via our new `ExplicitlyConfiguredLogger`)


`homeserver.yaml`
```yaml
log_config: "/myserver.log.config.yaml"
```

`myserver.log.config.yaml`
```yaml
version: 1

formatters:
    precise:
        format: '%(asctime)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s - %(message)s'
        

handlers:
    # ... file/buffer handler (see `sample_log_config.yaml`)

    # A handler that writes logs to stderr. Unused by default, but can be used
    # instead of "buffer" and "file" in the logger handlers.
    console:
        class: logging.StreamHandler
        formatter: precise

loggers:
    synapse.storage.SQL:
        # beware: increasing this to DEBUG will make synapse log sensitive
        # information such as access tokens.
        level: INFO

    # Has to be explicitly configured as such. Will not inherit from the root level even if it's set to DEBUG
    synapse.rest.admin.users.registration_debug:
        level: DEBUG

root:
    level: INFO

    handlers: [console]

disable_existing_loggers: false
```
2025-07-22 11:09:45 -05:00
Erik Johnston
f500c7d982
Speed up MAS token introspection (#18357)
We do this by shoving it into Rust. We believe our python http client is
a bit slow.

Also bumps minimum rust version to 1.81.0, released last September (over
six months ago)

To allow for async Rust, includes some adapters between Tokio in Rust
and the Twisted reactor in Python.
2025-06-16 16:41:35 +01:00
V02460
068e22b4b7
Cleanup Python 3.8 leftovers (#17967)
Some small cleanups after Python3.8 became EOL.

- Move some type imports from `typing_extensions` to `typing`
- Remove the `abi3-py38` feature from pyo3

### Pull Request Checklist

<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->

* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
  - Use markdown where necessary, mostly for `code blocks`.
  - End with either a period (.) or an exclamation mark (!).
  - Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))

---------

Co-authored-by: Quentin Gliech <quenting@element.io>
2025-02-10 16:53:24 +00:00
Colin Watson
d69c00b5a1
Stop using twisted.internet.defer.returnValue (#18020)
`defer.returnValue` was only needed in Python 2; in Python 3, a simple
`return` is fine.

`twisted.internet.defer.returnValue` is deprecated as of Twisted 24.7.0.

Most uses of `returnValue` in synapse were removed a while back; this
cleans up some remaining bits.
2024-12-20 10:57:59 +00:00
Erik Johnston
77eafd47df
Fix other unit tests with latest twisted (#17907)
There's also https://github.com/element-hq/synapse/pull/17906
2024-11-07 10:11:13 +00:00
Erik Johnston
83513b75f7
Speed up sliding sync by computing extensions in parallel (#17884)
The main change here is to add a helper function
`gather_optional_coroutines`, which works in a similar way as
`yieldable_gather_results` but takes a set of coroutines rather than a
function
2024-10-30 10:51:04 +00:00
Quentin Gliech
7d52ce7d4b
Format files with Ruff (#17643)
I thought ruff check would also format, but it doesn't.

This runs ruff format in CI and dev scripts. The first commit is just a
run of `ruff format .` in the root directory.
2024-09-02 12:39:04 +01:00
Erik Johnston
b4d95409fb
Fix @tag_args for non-methods (#17604)
The decorator assumed we were always wrapping function methods
2024-08-27 11:47:28 +01:00
dependabot[bot]
e161103b46
Bump mypy from 1.5.1 to 1.8.0 (#16901) 2024-03-13 17:05:57 +00:00
dependabot[bot]
1e68b56a62
Bump black from 23.10.1 to 24.2.0 (#16936) 2024-03-13 16:46:44 +00:00
Erik Johnston
23740eaa3d
Correctly mention previous copyright (#16820)
During the migration the automated script to update the copyright
headers accidentally got rid of some of the existing copyright lines.
Reinstate them.
2024-01-23 11:26:48 +00:00
Patrick Cloke
8e1e62c9e0 Update license headers 2023-11-21 15:29:58 -05:00
Patrick Cloke
ed1b879576
Do not call getfullargspec on every call. (#16589)
getfullargspec is relatively expensive and the results will
not change between calls, so precalculate it outside the
wrapper.
2023-10-31 20:16:17 +00:00
Patrick Cloke
ba48c563c9
Bump mypy from 1.4.1 to 1.5.1. (#16300) 2023-09-12 07:16:09 -04:00
dependabot[bot]
757010905e
Bump twisted from 22.10.0 to 23.8.0 (#16235)
* Bump twisted from 22.10.0 to 23.8.0

Bumps [twisted](https://github.com/twisted/twisted) from 22.10.0 to 23.8.0.
- [Release notes](https://github.com/twisted/twisted/releases)
- [Changelog](https://github.com/twisted/twisted/blob/trunk/NEWS.rst)
- [Commits](https://github.com/twisted/twisted/compare/twisted-22.10.0...twisted-23.8.0)

---
updated-dependencies:
- dependency-name: twisted
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fix types

* Fix lint

* Newsfile

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
2023-09-05 11:14:14 +00:00
Patrick Cloke
9ec3da06da
Bump mypy-zope & mypy. (#16188) 2023-08-29 10:38:56 -04:00
V02460
84f441f88f
Prepare unit tests for Python 3.12 (#16099) 2023-08-25 15:05:10 -04:00
David Robertson
e691243e19
Fix typechecking with twisted trunk (#16121) 2023-08-24 14:53:07 +00:00
Patrick Cloke
ad3f43be9a
Run pyupgrade for python 3.7 & 3.8. (#16110) 2023-08-15 08:11:20 -04:00
Jason Little
224ef0b669
Unix Sockets for HTTP Replication (#15708)
Unix socket support for `federation` and `client` Listeners has existed now for a little while(since [1.81.0](https://github.com/matrix-org/synapse/pull/15353)), but there was one last hold out before it could be complete: HTTP Replication communication. This should finish it up. The Listeners would have always worked, but would have had no way to be talked to/at.

---------

Co-authored-by: Eric Eastwood <madlittlemods@gmail.com>
Co-authored-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Co-authored-by: Eric Eastwood <erice@element.io>
2023-07-11 13:08:06 -05:00
Eric Eastwood
8bfded81f3
Trace functions which return Awaitable (#15650) 2023-06-06 17:39:22 -05:00
dependabot[bot]
9bb2eac719
Bump black from 22.12.0 to 23.1.0 (#15103) 2023-02-22 15:29:09 -05:00
David Robertson
ffc2ee521d
Use mypy 1.0 (#15052)
* Update mypy and mypy-zope
* Remove unused ignores

These used to suppress

```
synapse/storage/engines/__init__.py:28: error: "__new__" must return a
class instance (got "NoReturn")  [misc]
```

and

```
synapse/http/matrixfederationclient.py:1270: error: "BaseException" has no attribute "reasons"  [attr-defined]
```

(note that we check `hasattr(e, "reasons")` above)

* Avoid empty body warnings, sometimes by marking methods as abstract

E.g.

```
tests/handlers/test_register.py:58: error: Missing return statement  [empty-body]
tests/handlers/test_register.py:108: error: Missing return statement  [empty-body]
```

* Suppress false positive about `JaegerConfig`

Complaint was

```
synapse/logging/opentracing.py:450: error: Function "Type[Config]" could always be true in boolean context  [truthy-function]
```

* Fix not calling `is_state()`

Oops!

```
tests/rest/client/test_third_party_rules.py:428: error: Function "Callable[[], bool]" could always be true in boolean context  [truthy-function]
```

* Suppress false positives from ParamSpecs

````
synapse/logging/opentracing.py:971: error: Argument 2 to "_custom_sync_async_decorator" has incompatible type "Callable[[Arg(Callable[P, R], 'func'), **P], _GeneratorContextManager[None]]"; expected "Callable[[Callable[P, R], **P], _GeneratorContextManager[None]]"  [arg-type]
synapse/logging/opentracing.py:1017: error: Argument 2 to "_custom_sync_async_decorator" has incompatible type "Callable[[Arg(Callable[P, R], 'func'), **P], _GeneratorContextManager[None]]"; expected "Callable[[Callable[P, R], **P], _GeneratorContextManager[None]]"  [arg-type]
````

* Drive-by improvement to `wrapping_logic` annotation

* Workaround false "unreachable" positives

See https://github.com/Shoobx/mypy-zope/issues/91

```
tests/http/test_proxyagent.py:626: error: Statement is unreachable  [unreachable]
tests/http/test_proxyagent.py:762: error: Statement is unreachable  [unreachable]
tests/http/test_proxyagent.py:826: error: Statement is unreachable  [unreachable]
tests/http/test_proxyagent.py:838: error: Statement is unreachable  [unreachable]
tests/http/test_proxyagent.py:845: error: Statement is unreachable  [unreachable]
tests/http/federation/test_matrix_federation_agent.py:151: error: Statement is unreachable  [unreachable]
tests/http/federation/test_matrix_federation_agent.py:452: error: Statement is unreachable  [unreachable]
tests/logging/test_remote_handler.py:60: error: Statement is unreachable  [unreachable]
tests/logging/test_remote_handler.py:93: error: Statement is unreachable  [unreachable]
tests/logging/test_remote_handler.py:127: error: Statement is unreachable  [unreachable]
tests/logging/test_remote_handler.py:152: error: Statement is unreachable  [unreachable]
```

* Changelog

* Tweak DBAPI2 Protocol to be accepted by mypy 1.0

Some extra context in:
- https://github.com/matrix-org/python-canonicaljson/pull/57
- https://github.com/python/mypy/issues/6002
- https://mypy.readthedocs.io/en/latest/common_issues.html#covariant-subtyping-of-mutable-protocol-members-is-rejected

* Pull in updated canonicaljson lib

so the protocol check just works

* Improve comments in opentracing

I tried to workaround the ignores but found it too much trouble.

I think the corresponding issue is
https://github.com/python/mypy/issues/12909. The mypy repo has a PR
claiming to fix this (https://github.com/python/mypy/pull/14677) which
might mean this gets resolved soon?

* Better annotation for INTERACTIVE_AUTH_CHECKERS

* Drive-by AUTH_TYPE annotation, to remove an ignore
2023-02-16 16:09:11 +00:00
David Robertson
a5a799722d
Tag federation request spans with the worker name (#15042)
* Systematically include worker name as process info

* Changelog

* don't bother with inner setdefault
2023-02-09 22:33:39 +00:00
David Robertson
3b8574b4f2
Tag /send_join responses to detect faster joins (#14950)
* Tag /send_join responses to detect faster joins

* Changelog

* Define a proper SynapseTag

* isort
2023-01-31 12:43:20 +00:00
Richard van der Hoff
cb59e08062
Improve logging and opentracing for to-device message handling (#14598)
A batch of changes intended to make it easier to trace to-device messages through the system.

The intention here is that a client can set a property org.matrix.msgid in any to-device message it sends. That ID is then included in any tracing or logging related to the message. (Suggestions as to where this field should be documented welcome. I'm not enthusiastic about speccing it - it's very much an optional extra to help with debugging.)

I've also generally improved the data we send to opentracing for these messages.
2022-12-06 09:52:55 +00:00
Patrick Cloke
d8cc86eff4
Remove redundant types from comments. (#14412)
Remove type hints from comments which have been added
as Python type hints. This helps avoid drift between comments
and reality, as well as removing redundant information.

Also adds some missing type hints which were simple to fill in.
2022-11-16 15:25:24 +00:00
David Robertson
285d72556b
Update mypy and mypy-zope, attempt 3 (#13993)
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
2022-09-30 17:36:28 +01:00
David Robertson
8e52cb0bce
Revert "Update mypy and mypy-zope (#13925)"
This reverts commit 6d543d6d9f.
2022-09-30 16:37:48 +01:00
David Robertson
6d543d6d9f
Update mypy and mypy-zope (#13925)
* Update mypy and mypy-zope

* Unignore assigning to LogRecord attributes

Presumably https://github.com/python/typeshed/pull/8064 makes this ok

Cherry-picked from #13521

* Remove unused ignores due to mypy ParamSpec fixes

https://github.com/python/mypy/pull/12668

Cherry-picked from #13521

* Remove additional unused ignores

* Fix new mypy complaints related to `assertGreater`

Presumably due to https://github.com/python/typeshed/pull/8077

* Changelog

* Reword changelog

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
2022-09-30 16:34:47 +01:00
Eric Eastwood
a911ffb42c
Tag trace with instance name (#13761)
We tag the Synapse instance name so that it's an easy jumping off point into the logs. Can also be used to filter for an instance that is under load.

As suggested by @clokep and @reivilibre in,

 - https://github.com/matrix-org/synapse/pull/13729#discussion_r964719258
 - https://github.com/matrix-org/synapse/pull/13729#discussion_r964733578
2022-09-09 11:31:37 -05:00
Eric Eastwood
f694bb71b7
Strip number suffix from instance name to consolidate services that traces are spread over (#13729)
The problem with many services is that it makes it hard to find which service has the trace you want, see https://github.com/jaegertracing/jaeger-ui/issues/985

Previously, we split traces out into services based on their instance name like `matrix.org client_reader-1`, etc but there are many worker instances of the same `client_reader` so there is a lot to click through.

With this PR, all of the traces are just collected under the worker type like `client_reader`, `event_persister` 😇

Note: A Synapse worker instance name is an opaque string with the number convention only being our own thing for the `matrix.org` deployment. But seems pretty sensible to group things this way.
2022-09-09 11:30:06 -05:00
Eric Eastwood
0a4efbc1dd
Instrument the federation/backfill part of /messages (#13489)
Instrument the federation/backfill part of `/messages` so it's easier to follow what's going on in Jaeger when viewing a trace.

Split out from https://github.com/matrix-org/synapse/pull/13440

Follow-up from https://github.com/matrix-org/synapse/pull/13368

Part of https://github.com/matrix-org/synapse/issues/13356
2022-08-16 12:39:40 -05:00