From NGINX Ratelimits to Anubis: A New Approach Against AI Crawlers

TLDR

AI-powered crawlers were overloading our public Git repos. We replaced NGINX rate limits (which sometimes blocked real users) with Anubis, a proof-of-work anti-scraper proxy. Now, only suspicious requests get challenged, keeping the site fast for everyone else.

Introduction

In my previous post, "‘AI’ Crawlers Hammering Git Repos Across the Web – A Rate Limiting Approach", I shared how we tackled the surge of AI-powered crawlers scraping our public Git repositories using NGINX’s rate limiting capabilities. While effective at reducing load, that method had a downside: it occasionally penalized legitimate users, especially those browsing files tied to specific revisions. To improve the experience without compromising protection, we adopted a new approach using Anubis, an anti-scraper web proxy that challenges clients with browser-based proof-of-work. In this follow-up, I’ll explain why we made the switch, how Anubis integrates with our stack, and the impact it has had so far.

Installing and Configuring Anubis on FreeBSD

To get started, install the go-anubis package, add it to rc.conf, and then start the service.

pkg install go-anubis
sysrc anubis_enable=YES
sysrc anubis_args="-target http://127.0.1.XXX:81 -bind :8923 -difficulty 2 -policy-fname /usr/local/etc/nginx/botPolicies.json"
service anubis start

The -target flag defines where Anubis should forward requests after a client successfully passes the challenge. In our setup, this points back to the Varnish service that handles the actual traffic.

The -policy-fname flag is optional and only needed if you want to customize Anubis’ behavior. In our case, we provided a custom policy file to fine-tune the rules. However, you can start using Anubis with its default configuration and adjust it later as needed.
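
If you do end up customizing it, the file follows the same structure as the default botPolicies.json shipped with the package: a list of bot matchers, each paired with an ALLOW, DENY or CHALLENGE action. Here is a minimal sketch of that shape; the names and regexes are purely illustrative, so use the shipped default file as the authoritative reference for the exact fields supported by your installed version:

{
  "bots": [
    {
      "name": "well-known",
      "path_regex": "^/\\.well-known/.*$",
      "action": "ALLOW"
    },
    {
      "name": "example-ai-scraper",
      "user_agent_regex": "ExampleAICrawler",
      "action": "DENY"
    }
  ]
}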

We also chose to lower the challenge difficulty using -difficulty 2 (the default is 4), making it less aggressive for legitimate users while still blocking abusive bots.
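
Once the service is running, it is easy to confirm that the daemon is up and listening on the port given to -bind, using the standard rc and sockstat tools:

service anubis status
sockstat -l | grep 8923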

Configuring NGINX for Anubis

We decided not to forward all requests directly to Anubis. The goal is to avoid annoying legitimate users making simple or infrequent requests, while still stopping abusive bots and keeping server load under control.

Instead, we continue with the same approach described in our previous post, "‘AI’ Crawlers Hammering Git Repos Across the Web – A Rate Limiting Approach", by targeting only specific request patterns — particularly those containing a commit ID, which are frequently abused by crawlers. These paths are now routed through Anubis instead of being rate-limited, allowing us to challenge suspicious traffic without affecting normal users. This strikes a good balance between usability and protection. If crawler behavior intensifies, we may need to expand this filtering — but for now, it’s been effective.

Adding the Anubis upstream:

http {
  upstream anubis {
    server 127.0.1.XXX:8923 max_fails=99999 max_conns=1024 fail_timeout=1s;
    keepalive 256;
    keepalive_timeout 120s;
    keepalive_requests 256;
  }
}
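
One thing to keep in mind with the keepalive pool above: NGINX only reuses upstream connections when the proxied requests use HTTP/1.1 with the Connection header cleared. If your configuration does not already set this globally, adding the following to the locations that proxy to Anubis (or at the server level) takes care of it:

proxy_http_version 1.1;
proxy_set_header Connection "";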

Since we’re not sending all traffic directly to Anubis, it’s important to ensure that the Anubis service still receives all the required request data to perform its validation correctly.

# Anubis requirement
location /.within.website/ {
  proxy_pass  http://anubis;
  proxy_redirect default;
}

The next step is configuring NGINX to forward only specific requests to the Anubis service.

# URIs that are frequently targeted by crawlers, send to Anubis
location ~ ^/(ports|src|doc)/(log|plain)/ {
  set $backend "http://varnish";

  # We want only URIs with the 'id' parameter
  # /ports/log/graphics/exiftags?id=89e548d925ddd9bbb696fc30b6bcc87dac79b776
  # /ports/plain/mail/pear-Mail_Mbox/?id=4f22ec1271a760327feacd3845ae4fae3d455199
  access_by_lua_block {
    local args = ngx.req.get_uri_args()
    if args["id"] then
      ngx.var.backend = "http://anubis"
    end
  }
  proxy_pass $backend;
}
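
A quick way to sanity-check the routing is to compare the same path with and without the id parameter (git.example.org is a placeholder; substitute one of your own repository URLs). The request carrying an id should come back as the Anubis challenge page, which pulls its assets from /.within.website/ (the reason that location block is required), while the plain request should still return the regular cgit output from Varnish:

curl -s 'https://git.example.org/ports/log/graphics/exiftags?id=89e548d925ddd9bbb696fc30b6bcc87dac79b776' | grep -c 'within.website'
curl -s 'https://git.example.org/ports/log/graphics/exiftags' | grep -c 'within.website'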

We’re leveraging Lua to selectively route traffic, but if you prefer to send all your website traffic through the Anubis service, the configuration is much simpler. Just update your main location / block like this:

location / {
  proxy_pass  http://anubis;
  proxy_redirect default;
}

If you’re using this method, you can also remove the initial configuration block that handles the default Anubis requirement.

Customizing Anubis Images

As an extra step, if you want to customize Anubis’ default challenge images, you can host your own versions and configure Anubis to serve them by redirecting the image requests.

For example, store your custom images in a path like /usr/local/www/anubis/ and use NGINX to rewrite the requests:

# Serve Anubis images directly instead of having it pass through to the service
# Place `happy.webp`, `pensive.webp` and `reject.webp` (formerly `sad.webp`) in the directory.
location /.within.website/x/cmd/anubis/static/img/ {
  alias /usr/local/www/anubis/;
}
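
With this in place, a request for one of the images should be served by NGINX straight from that directory rather than proxied to Anubis. A quick check (again with a placeholder hostname) should return HTTP 200 with the image itself:

curl -sI 'https://git.example.org/.within.website/x/cmd/anubis/static/img/happy.webp'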

Final Thoughts

That’s it. Replacing the rate limits we had before with Anubis challenges hasn’t impacted our overall server load — at least not so far. The experience for legitimate users remains smooth, and abusive traffic is effectively challenged. We hope things stay that way, but we’re ready to adjust if needed.

– dbaio

References & Further Reading