Since yesterday my server has again been getting absolutely obliterated by AI scrapers. This time, though, load is below 1, but I'm getting up to 10 requests a second and all of my Apache workers are in state "R". What levers do I have to pull on this? ...
UPDATE:
Please do not hit reply to tell me about robots.txt or the first shit you googled. I had specific technical questions and would like specific technical answers, not vibes.
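One commonly pulled lever for workers piling up in state "R" (reading the request) is mod_reqtimeout, which drops clients that are slow to send headers or a body. A minimal sketch, assuming Apache 2.4 with mod_reqtimeout loaded; the thresholds are illustrative starting points, not tuned values:

    # Allow 10s (extendable to 20s) for request headers, then require at
    # least 500 bytes/sec; apply the same limits to request bodies.
    RequestReadTimeout header=10-20,MinRate=500 body=10,MinRate=500

    # Don't hold workers open long for idle keep-alive connections.
    KeepAliveTimeout 2

If the traffic is identifiable by user agent or source network, per-IP rate limiting (mod_qos, or a firewall-level limit) is the other usual lever, though distributed scrapers tend to route around per-IP limits.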
-
@jwz You're asking people for help, and being a giant dickhole in return. So why don't you *pay* someone to help, or do your own research?
-
@not3ottersinacoat Why don't you go piss up a flagpole? *plonk*
-
@jwz I wasn't aware the problem was so bad. All solutions seem to suck for users and/or admins - so I'm doing a bit of research.
You've looked at this, I assume, as immediate triage / prophylactic?
https://github.com/ai-robots-txt/ai.robots.txt/blob/main/FAQ.md
Your questions point to a unique approach. Which is to say, we don't know, but you want practical solutions to lessen the fuckery.
With that in mind, I asked particularly detailed questions of Gemini, then asked ChatGPT to improve on it.
(We've all been reduced to this.) 🧵
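For reference, the list in that repo can also be enforced at the server rather than via robots.txt; a rough sketch of the Apache 2.4 form, where the user-agent names are only a small sample of the list and the match-and-deny pattern is the assumed approach:

    # Tag requests whose User-Agent matches known AI crawlers (sample names
    # only), then refuse them; BrowserMatchNoCase comes from mod_setenvif.
    BrowserMatchNoCase "GPTBot|ClaudeBot|CCBot|Bytespider" ai_bot
    <Location "/">
        <RequireAll>
            Require all granted
            Require not env ai_bot
        </RequireAll>
    </Location>

This only catches crawlers that identify themselves; it does nothing about bots that spoof browser user agents.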
-
@jwz Again, my sympathies. Hopefully something(s) in all of this is helpful for you.
One of the things I have been trying to learn more about is the process of actually knowing what to ask AI in the first place, in order to get more useful answers. Experts seem to think part of the reason it sucks so much for most people is how people ask it for help, oftentimes in ways that encourage it to be vague and utterly useless. I see actual value, but it's like communicating with an idiot savant.
-
@jwz Also got a chance to read up on this a bit more.
Nepenthes sounds interesting... not just because it tortures the AI crawlers, but more because it claims to greatly help with the problem, without imposing on people legitimately visiting your site.
I think the solution you suggested is better though, especially if you want your content indexed by search engines with less unnecessary load. Honestly, I'm surprised it's not more of a gold standard.
-
@KraftTea No, we absolutely have not "all" been reduced to this.
You made a choice.
Do not ever reply to me with AI slop again.
If someone didn't bother to write it, there's no reason to bother to read it.
-
@jwz I'm not an AI, and I used search engines first to familiarize myself more, as well as actually READING your questions, which is why my initial suggestion was blocking, while also acknowledging you likely tried that.
If you know what you are looking for, and ask detailed questions specific to your problem, you can get answers that either help - which you can try - or don't - which can also be informative. Failing that, you need to find one of the rare people who has done this before.
-
@KraftTea And we're done here *plonk*
-
Today my server is getting slammed by a very wide botnet that is really interested in such hits as
/var/www/jwz/..%2f..%2f..%2f..%2f..%2f..%2f..%2f..%2f..%2f..%2f..%2f..%2f..%2f..%2fetc%2fpasswd/emacs-timeline.html
Not an AI training bot! How quaint! How retro!
It seems to be associated with something called "bxss dot me" and to them I would like to sincerely say, I hate you please die.
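One blunt lever for probe traffic like this is to short-circuit requests whose raw request line contains obvious traversal or bxss markers; a rough sketch assuming Apache with mod_rewrite enabled, with patterns that are illustrative and would need tuning to avoid false positives:

    # Reject requests whose raw (undecoded) request line contains obvious
    # traversal or bxss-style probe markers; THE_REQUEST is matched as sent.
    RewriteEngine On
    RewriteCond %{THE_REQUEST} (\.\.%2f|%2e%2e%2f|etc%2fpasswd|bxss\.me) [NC]
    RewriteRule ^ - [F]

Against a wide botnet this only cheapens the response (a flat 403 instead of whatever the 404 path does); it does not reduce the request volume.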