Since yesterday my server has again been getting absolutely obliterated by AI scrapers.

    jwz wrote (#1):

    Since yesterday my server has again been getting absolutely obliterated by AI scrapers. This time, though, load is below 1, but I'm getting up to 10 requests a second and all of my Apache workers are in state "R". What levers do I have to pull on this? ...

    UPDATE:

    Please do not hit reply to tell me about robots.txt or the first shit you googled. I had specific technical questions and would like specific technical answers, not vibes.

    https://jwz.org/b/yk0T
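
For readers following along: in Apache's mod_status scoreboard, "R" is the key for a worker that is reading a request. A minimal sketch of tallying worker states from a scoreboard string follows. The `sample` string is made up for illustration and is not taken from the server in question; `summarize_scoreboard` is a hypothetical helper name.

```python
from collections import Counter

# Scoreboard keys as listed in the Apache mod_status documentation.
STATE_NAMES = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def summarize_scoreboard(scoreboard: str) -> dict[str, int]:
    """Count how many worker slots are in each state."""
    counts = Counter(scoreboard.strip())
    return {STATE_NAMES.get(key, f"unknown ({key})"): n for key, n in counts.items()}


# Hypothetical scoreboard: most workers stuck reading requests.
sample = "RRRRRRRW_K.."
summary = summarize_scoreboard(sample)
for state, n in sorted(summary.items(), key=lambda item: -item[1]):
    print(f"{n:3d}  {state}")
```

A scoreboard dominated by "R" while load stays low is consistent with clients that open connections and send their requests slowly, though the sketch only counts states and does not diagnose the cause.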

      Emma Liv :pensive_party_blob: wrote (#2):

      @jwz You're asking people for help, and being a giant dickhole in return. So why don't you *pay* someone to help, or do your own research?

        jwz wrote (#3):

        @not3ottersinacoat Why don't you go piss up a flagpole? *plonk*

          Tea time. wrote (#4):

          @jwz I wasn't aware the problem was so bad. All solutions seem to suck for users and/or admins - so I'm doing a bit of research.
          You've looked at this, I assume, as immediate triage / prophylactic?
          https://github.com/ai-robots-txt/ai.robots.txt/blob/main/FAQ.md

          Your questions point to a unique approach.
          Which is to say, we don't know, but you want practical solutions to lessen the fuckery.
          With that in mind, I asked particularly detailed questions of Gemini, then asked ChatGPT to improve on it.
          (We've all been reduced to this.) 🧵

            Tea time. wrote (#5):

            @jwz Again, my sympathies. Hopefully something(s) in all of this is helpful for you.

            One of the things I have been trying to learn more about is the process of actually knowing what to ask AI in the first place, in order to get more useful answers. Experts seem to think part of the reason it sucks so much for most people is how people ask it for help, oftentimes in ways that encourage it to be vague and utterly useless. I see actual value, but it's like communicating with an idiot savant.

              Tea time. wrote (#6):

              @jwz Also got a chance to read up on this a bit more.

              Nepenthes sounds interesting... not just because it tortures the AI crawlers, but more because it claims to greatly help with the problem, without imposing on people legitimately visiting your site.

              I think the solution you suggested is better though, especially if you want your content indexed by search engines, with a lower unnecessary burden. Honestly, surprised it's not more of a gold standard.

              https://zadzmo.org/code/nepenthes/

                jwz wrote (#7):

                @KraftTea No, we absolutely have not "all" been reduced to this.

                You made a choice.

                Do not ever reply to me with AI slop again.

                If someone didn't bother to write it, there's no reason to bother to read it.

                  Tea time. wrote (#8):

                  @jwz I'm not an AI, and I used search engines first to familiarize myself more, as well as actually READING your questions, which is why my initial suggestion was blocking, while also acknowledging you likely tried that.
                  If you know what you are looking for, and ask detailed questions specific to your problem, you can get answers that either help when you try them, or don't, which can also help.

                  Failing that, you need to contact someone rare who's done it before.

                    jwz wrote (#9):

                    @KraftTea And we're done here *plonk*

                      jwz wrote (#10):

                      Today my server is getting slammed by a very wide botnet that is really interested in such hits as

                      /var/www/jwz/..%2f..%2f..%2f..%2f..%2f..%2f..%2f..%2f..%2f..%2f..%2f..%2f..%2f..%2fetc%2fpasswd/emacs-timeline.html

                      Not an AI training bot! How quaint! How retro!

                      It seems to be associated with something called "bxss dot me" and to them I would like to sincerely say, I hate you please die.
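
The request above is a percent-encoded path traversal probe: `%2f` decodes to `/`, so the path contains a run of `..` segments leading to `etc/passwd`. A minimal sketch of flagging such paths in Python follows. The function name `looks_like_traversal` and the `max_rounds` limit are assumptions for illustration, not anything the server in question is known to run.

```python
from urllib.parse import unquote


def looks_like_traversal(path: str, max_rounds: int = 3) -> bool:
    """Return True if the path contains a '..' segment after percent-decoding."""
    decoded = path
    for _ in range(max_rounds):
        # Decode repeatedly so double-encoded payloads (%252f) are caught too.
        next_decoded = unquote(decoded)
        if next_decoded == decoded:
            break
        decoded = next_decoded
    # Treat backslashes as separators as well, then look for a bare '..' segment.
    segments = decoded.replace("\\", "/").split("/")
    return ".." in segments


probe = "/var/www/jwz/..%2f..%2f..%2fetc%2fpasswd/emacs-timeline.html"
print(looks_like_traversal(probe))
print(looks_like_traversal("/emacs-timeline.html"))
```

Matching on whole segments rather than the substring `..` avoids flagging ordinary file names that happen to contain two dots.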
