The one-liner:

dd if=/dev/zero bs=1G count=10 | gzip -c > 10GB.gz

This is brilliant.

    • 👍Maximum Derek👍@discuss.tchncs.de
      link
      fedilink
      English
      arrow-up
      43
      ·
      17 hours ago

      Most often because they don’t download any of the css of external js files from the pages they scrape. But there are a lot of other patterns you can detect once you have their traffic logs loaded in a time series database. I used an ELK stack back in the day.

      • sugar_in_your_tea@sh.itjust.works
        link
        fedilink
        English
        arrow-up
        6
        ·
        17 hours ago

        That sounds like a lot of effort. Are there any tools that get like 80% of the way there? Like something I could plug into Caddy, nginx, or haproxy?

        • 👍Maximum Derek👍@discuss.tchncs.de
          link
          fedilink
          English
          arrow-up
          15
          ·
          16 hours ago

          My experience is with systems that handle nearly 1000 pageviews per second. We did use a spread of haproxy servers to handle routing and SNI, but they were being fed offender lists by external analysis tools (built in-house).

          • sugar_in_your_tea@sh.itjust.works
            link
            fedilink
            English
            arrow-up
            4
            ·
            16 hours ago

            Dang, I was hoping for a FOSS project that would do most of the heavy lifting for me. Maybe such a thing exists, idk, but it would be pretty cool to have a pluggable system that analyzes activity and tags connections w/ some kind of identifier so I could configure a web server to either send it nonsense (i.e. poison AI scrapers), zip bombs (i.e. bots that aren’t respectful of resources), or redirect to a honey pot (i.e. malicious actors).

            A quick search didn’t yield anything immediately, but I wasn’t that thorough. I’d be interested if anyone knows of such a project that’s pretty easy to play with.

            • A Basil Plant@lemmy.world
              link
              fedilink
              English
              arrow-up
              6
              ·
              edit-2
              15 hours ago

              Not exactly what you asked, but do you know about ufw-blocklist?

              I’ve been using this on my multiple VPSes for some time now and the number of fail2ban failed/banned has gone down like crazy. Previously, I had 20k failed attempts after a few months and 30-50 currently-banned IPs at all times; now it’s less than 1k failed after a year and maybe 3-ish banned at any time.

              There was also that paid service where users share their spammy IP address attempts with a centralized network, which does some dynamic intelligence monitoring. I forgot the name and search these days isn’t great. Something to do with “Sense”? It was paid, but well recommended as far as I remember.

              Edit: seems like the keyword is " threat intelligence platform"