Looking for some advice from #MastodonAdmins
What setting do you have WEB_CONCURRENCY and MAX_THREADS for your puma (mastodon-web) service?
I've set them to 50 and 20 respectively, for a pool of 1,000 web threads, but I've been getting constant overloading with tens of thousands of backlogged connections.
Do I need to adjust the mastodon-streaming service settings also? I increased my database connection limit to 1500 to handle the connections plus extras.
Any ideas?
I'm running:
MAX_THREADS=5
WEB_CONCURRENCY=4

Wonder if you're running too many processes? Have you tuned your database? https://pgtune.leopard.in.ua - that helped mine out a lot two years ago.
I also don't remember which setting it was, but I had to tune a network setting in Ubuntu too; I was getting a lot of rejected/dropped connections (it's been a while).
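The exact Ubuntu setting isn't recalled above, but the kernel's listen-queue limits are a common culprit for dropped/rejected connections under this kind of load. A guessed sketch (the values are examples, not recommendations; requires root):

```shell
# Raise the accept-queue and SYN-backlog limits that commonly cause
# dropped connections on busy servers (defaults are often 128/1024).
sysctl -w net.core.somaxconn=4096
sysctl -w net.ipv4.tcp_max_syn_backlog=4096
# Persist across reboots (file name is an arbitrary example):
printf 'net.core.somaxconn = 4096\nnet.ipv4.tcp_max_syn_backlog = 4096\n' \
  > /etc/sysctl.d/99-mastodon.conf
```

Whether this matches the setting the poster tuned is an assumption; check your own dropped-connection counters (e.g. `ss -s`, `netstat -s`) before changing anything.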
-
@KuJoe 1500 connection limit?
I have double your active users running on under 30 direct Postgres connections most of the time. I use WEB_CONCURRENCY=24 and MAX_THREADS=15 (that's specific to my 44-core machine, which is also shared with Sidekiq and Postgres, hence only 24; don't overdo it...)
This might be over-explaining, but this is my setup:
1. You need to install pgBouncer https://docs.joinmastodon.org/admin/scaling/#pgbouncer
I have max_client_conn = 10000 and default_pool_size = 30 and reserve_pool_size = 120 set in pgbouncer.ini
This allows practically unlimited connections from Mastodon, but only 30 will be opened with postgres. It will scale up to 150 if absolutely required but that very rarely ever happens.
2. You should set max_connections in Postgres itself to 200 and run https://pgtune.leopard.in.ua again.
Use OLTP instead of Web as the DB type. If your DB is on the same server as Mastodon, don't just enter your total core count and RAM; enter only what you want dedicated to Postgres (so... probably about half).
Some settings depend on what max_connections is, so if you ran pgTune before and then increased it, they're probably wrong; best to double-check them all.
3. Profit
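Pulling the values from step 1 together, a pgbouncer.ini fragment might look like this (the `pool_mode = transaction` line is from Mastodon's scaling docs, not stated above; everything else mirrors the reply):

```ini
; pgbouncer.ini — sketch matching the settings described above
[pgbouncer]
pool_mode = transaction       ; required by Mastodon's PgBouncer docs
max_client_conn = 10000       ; practically unlimited clients from Mastodon
default_pool_size = 30        ; only 30 real Postgres connections per db/user
reserve_pool_size = 120       ; extra headroom, used only under pressure
```

Note that with transaction pooling Mastodon also needs `PREPARED_STATEMENTS=false` in its environment, per the same docs.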
-
@KuJoe in my subjective opinion, the streaming service can take a serious beating compared to the other services. I think the default values for it will get you very far.
-
@jonah I've been using PgBouncer and it's been basically magic since running it. I didn't think it was a database issue because PgHero rarely shows over 50 connections, but my web connections (Puma) will hit over 1,000 active a few times an hour now, sometimes queuing up 10-40k backlogs.
I never had this issue with >30k active users so something must have changed recently with my user activity.
-
@KuJoe oh, if you have pgBouncer already then that is good. I see... where are you getting this backlog number from?
-
@jonah I enabled stats on the puma service and monitor it with a PHP script that generates alerts when the backlog number gets higher than 1000.
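For reference, Puma's stats can be pulled with `pumactl` once its control server is configured (the URL and token below are placeholders, and the JSON shape is a trimmed example of clustered-mode output, not real data):

```shell
# With a control server configured, something like this fetches live stats:
#   pumactl --control-url tcp://127.0.0.1:9293 --control-token mytoken stats
# Example payload (clustered mode; numbers made up for illustration):
stats='{"workers": 2, "worker_status": [{"last_status": {"backlog": 12, "running": 5}}, {"last_status": {"backlog": 8, "running": 5}}]}'
# Sum the per-worker backlog without needing jq:
backlog=$(printf '%s' "$stats" | grep -o '"backlog": *[0-9]*' | awk -F': *' '{s+=$2} END {print s}')
echo "total backlog: $backlog"
```

An alert threshold like the 1,000 mentioned above would then just compare `$backlog` against it.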
-
@KuJoe you're aware Puma's backlog is requests that have been accepted but not yet picked up by a worker thread, not simply a count of open connections, right?
What are the requests waiting for when they're in this queue? Like, if you look at the logs for the mastodon-web service, each line should include something like: duration=5.78 view=0.00 db=1.27
Do you have Prometheus metrics enabled?
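Pulling those timing fields out of a log line is a one-liner; here's a sketch against a fabricated example line (the field names come from the format quoted above, the rest of the line is invented):

```shell
# Example mastodon-web log line (made up, but with the real timing fields):
line='method=GET path=/api/v1/timelines/home status=200 duration=5.78 view=0.00 db=1.27'
# Print just the timings. If duration is much larger than view + db,
# the time is going somewhere else (Redis, thread-pool contention, etc.).
printf '%s\n' "$line" | awk '{for(i=1;i<=NF;i++) if ($i ~ /^(duration|view|db)=/) print $i}'
```

In practice you would feed it from the journal, e.g. `journalctl -u mastodon-web | awk ...` (assuming the unit is named mastodon-web on your system).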
-
@jonah I didn't realize that; I guess that explains why my changes to the web service don't impact it. I don't have Prometheus metrics enabled because I found it extremely confusing, but I guess I should look into enabling that to help track down the bottleneck?
-
@KuJoe well, check those timings in the logs with journalctl first, good chance they'll say what's up.
The built-in metrics Mastodon added in 4.4 (iirc) are pretty nice to have in general though.
-
@KuJoe when 4.4 came out I made these Grafana dashboards to connect to Prometheus which were invaluable for troubleshooting database problems I was having: https://github.com/jonaharagon/mastodon-grafana
-
@KuJoe that seems like way too many. Try one Puma worker (WEB_CONCURRENCY) per CPU core, with a few threads per worker, like 10 or 20. It depends on your system and the number of users, of course. How many parallel jobs do you think each of your CPUs can actually run?
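The sizing rule above can be sketched in a couple of lines (MAX_THREADS=5 here is Mastodon's default, the rest follows the one-worker-per-core suggestion; treat it as a starting point, not a tuned value):

```shell
# One Puma worker per CPU core, a handful of threads per worker.
CORES=$(nproc)              # nproc is from GNU coreutils
WEB_CONCURRENCY=$CORES
MAX_THREADS=5
echo "WEB_CONCURRENCY=$WEB_CONCURRENCY MAX_THREADS=$MAX_THREADS (pool of $((WEB_CONCURRENCY * MAX_THREADS)) threads)"
```

On a shared machine (Sidekiq, Postgres on the same host, as in the earlier reply) you would want noticeably fewer workers than cores.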