Looking for some advice from #MastodonAdmins
What setting do you have WEB_CONCURRENCY and MAX_THREADS for your puma (mastodon-web) service?
I've set them to 50 and 20 respectively, for a pool of 1,000 web threads, but I've been getting constant overloading with tens of thousands of backlogged connections.
Do I need to adjust the mastodon-streaming service settings also? I increased my database connection limit to 1500 to handle the connections plus extras.
Any ideas?
I'm running:
MAX_THREADS=5
WEB_CONCURRENCY=4

Wonder if you're running too many processes? Have you tuned your database? https://pgtune.leopard.in.ua - that helped mine out a lot two years ago.
I also don't remember which setting it was, but I had to tune a network setting in Ubuntu too; I was getting a lot of rejected/dropped connections (it's been a while).
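The exact Ubuntu setting isn't recalled above, but the kernel's listen-queue limits are a common culprit for dropped/rejected connections under this kind of load. A guessed sketch (the values are examples, not recommendations; requires root):

```shell
# Raise the accept-queue and SYN-backlog limits that commonly cause
# dropped connections on busy servers (defaults are often 128/1024).
sysctl -w net.core.somaxconn=4096
sysctl -w net.ipv4.tcp_max_syn_backlog=4096
# Persist across reboots (file name is an arbitrary example):
printf 'net.core.somaxconn = 4096\nnet.ipv4.tcp_max_syn_backlog = 4096\n' \
  > /etc/sysctl.d/99-mastodon.conf
```

Whether this matches the setting the poster tuned is an assumption; check your own dropped-connection counters (e.g. `ss -s`, `netstat -s`) before changing anything.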
-
@KuJoe 1500 connection limit?
I have double your active users running on under 30 direct Postgres connections most of the time. I use WEB_CONCURRENCY=24 and MAX_THREADS=15 (that's specific to my 44-core machine, which is also shared with Sidekiq and Postgres, hence only 24; don't overdo it...)
This might be over-explaining, but this is my setup:
1. You need to install pgBouncer https://docs.joinmastodon.org/admin/scaling/#pgbouncer
I have max_client_conn = 10000 and default_pool_size = 30 and reserve_pool_size = 120 set in pgbouncer.ini
This allows practically unlimited connections from Mastodon, but only 30 will be opened with postgres. It will scale up to 150 if absolutely required but that very rarely ever happens.
2. You should set max_connections in Postgres itself to 200 and run https://pgtune.leopard.in.ua again.
Use OLTP instead of Web as the DB type. If your DB is on the same server as Mastodon, don't just enter your total core count and RAM; enter only what you want dedicated to Postgres (so... probably about half).
Some settings depend on what max_connections is, so if you ran pgTune before and then increased it, they're probably wrong; best to double-check them all.
3. Profit
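Pulling the values from step 1 together, a pgbouncer.ini fragment might look like this (the `pool_mode = transaction` line is from Mastodon's scaling docs, not stated above; everything else mirrors the reply):

```ini
; pgbouncer.ini — sketch matching the settings described above
[pgbouncer]
pool_mode = transaction       ; required by Mastodon's PgBouncer docs
max_client_conn = 10000       ; practically unlimited clients from Mastodon
default_pool_size = 30        ; only 30 real Postgres connections per db/user
reserve_pool_size = 120       ; extra headroom, used only under pressure
```

Note that with transaction pooling Mastodon also needs `PREPARED_STATEMENTS=false` in its environment, per the same docs.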
-
@KuJoe in my subjective opinion, the streaming service can take a serious beating compared to the other services. I think the default values for it will get you very far.
-
@jonah I've been using PgBouncer and it's been basically magic since running it. I didn't think it was a database issue because PgHero rarely shows over 50 connections, but my web connections (Puma) will hit over 1,000 active a few times an hour now, sometimes queuing up 10-40k backlogs.
I never had this issue with >30k active users so something must have changed recently with my user activity.
-
@KuJoe oh, if you have pgBouncer already then that is good. I see... where are you getting this backlog number from?
-
@jonah I enabled stats on the puma service and monitor it with a PHP script that generates alerts when the backlog number gets higher than 1000.
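For reference, Puma's stats can be pulled with `pumactl` once its control server is configured (the URL and token below are placeholders, and the JSON shape is a trimmed example of clustered-mode output, not real data):

```shell
# With a control server configured, something like this fetches live stats:
#   pumactl --control-url tcp://127.0.0.1:9293 --control-token mytoken stats
# Example payload (clustered mode; numbers made up for illustration):
stats='{"workers": 2, "worker_status": [{"last_status": {"backlog": 12, "running": 5}}, {"last_status": {"backlog": 8, "running": 5}}]}'
# Sum the per-worker backlog without needing jq:
backlog=$(printf '%s' "$stats" | grep -o '"backlog": *[0-9]*' | awk -F': *' '{s+=$2} END {print s}')
echo "total backlog: $backlog"
```

An alert threshold like the 1,000 mentioned above would then just compare `$backlog` against it.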
-
@KuJoe you're aware Puma's backlog is requests that have been accepted but not yet picked up by a worker thread, not simply a count of open connections, right?
What are the requests waiting for when they're in this queue? Like, if you look at the logs for the mastodon-web service, each line should include something like: duration=5.78 view=0.00 db=1.27
Do you have Prometheus metrics enabled?
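Pulling those timing fields out of a log line is a one-liner; here's a sketch against a fabricated example line (the field names come from the format quoted above, the rest of the line is invented):

```shell
# Example mastodon-web log line (made up, but with the real timing fields):
line='method=GET path=/api/v1/timelines/home status=200 duration=5.78 view=0.00 db=1.27'
# Print just the timings. If duration is much larger than view + db,
# the time is going somewhere else (Redis, thread-pool contention, etc.).
printf '%s\n' "$line" | awk '{for(i=1;i<=NF;i++) if ($i ~ /^(duration|view|db)=/) print $i}'
```

In practice you would feed it from the journal, e.g. `journalctl -u mastodon-web | awk ...` (assuming the unit is named mastodon-web on your system).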
-
@jonah I didn't realize that; I guess that explains why my changes to the web service don't impact it. I don't have Prometheus metrics enabled because I found it extremely confusing, but I guess I should look into enabling that to help track down the bottleneck?
-
@KuJoe well, check those timings in the logs with journalctl first, good chance they'll say what's up.
The built-in metrics Mastodon added in 4.4 (iirc) are pretty nice to have in general though.
-
@KuJoe when 4.4 came out I made these Grafana dashboards to connect to Prometheus which were invaluable for troubleshooting database problems I was having: https://github.com/jonaharagon/mastodon-grafana
-
@KuJoe that seems like way too many. Try one Puma worker (WEB_CONCURRENCY) per CPU core, with a few threads per worker, like 10 or 20. It depends on your system and the number of users, of course. How many parallel jobs do you think each of your CPUs can actually run?
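The sizing rule above can be sketched in a couple of lines (MAX_THREADS=5 here is Mastodon's default, the rest follows the one-worker-per-core suggestion; treat it as a starting point, not a tuned value):

```shell
# One Puma worker per CPU core, a handful of threads per worker.
CORES=$(nproc)              # nproc is from GNU coreutils
WEB_CONCURRENCY=$CORES
MAX_THREADS=5
echo "WEB_CONCURRENCY=$WEB_CONCURRENCY MAX_THREADS=$MAX_THREADS (pool of $((WEB_CONCURRENCY * MAX_THREADS)) threads)"
```

On a shared machine (Sidekiq, Postgres on the same host, as in the earlier reply) you would want noticeably fewer workers than cores.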