Why Google does not index you: bots are eating your crawl budget (and how to fix it)
Your hosting is saturated with malicious bots eating your crawl budget. That is why Googlebot reduces its visit frequency. Real data and a 30-min fix.
Summary
- Imperva 2025 data: 49.6% of all web traffic is bot traffic, and 60%+ of that comes from malicious bots (scrapers, vuln scanners, AI crawlers, ad-stat inflators).
- How it hurts Google: Googlebot measures your response time and lowers crawl rate if your server is slow or returns 5xx (official docs).
- Symptoms: new pages take +2 weeks to index, GSC shows "Discovered — not indexed", response time > 800 ms.
- 4 fix levels:
1. Free Cloudflare + Bot Fight Mode (5 min, blocks 30-40%).
2. WAF custom rules (20 min, up to 60-70%).
3. SeoNova WPO Toolkit security module: pre-PHP bouncer, frees 70-80% CPU.
4. Reverse proxy + dedicated WAF for large sites.
- How to measure the fix: record requests/day, response time and not-indexed URLs in GSC; wait 14 days and compare.
You publish a new page. A week passes. Two. Four. You still do not see it in Google.
You blame “the algorithm,” content quality, a hidden penalty. Most of the time none of that is the cause. Googlebot has not even come back to read it. And if it did, it left in a hurry.
Let us break down why.
The number nobody is telling you
According to the Imperva Bad Bot Report 2025, in 2024 49.6% of all web traffic worldwide was bot traffic. Half of the internet, robots. And of that 49.6%, over 60% came from malicious bots (scrapers, vuln scanners, ad-stat inflators, runaway AI crawlers).
Translation: if your hosting receives 10,000 requests per day, 5,000 are bots. Of those 5,000, around 3,000 are bots that bring you zero human visitors. They just consume CPU, RAM, SQL connections and bandwidth.
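The rounded figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function name and its parameters are ours, and the two default shares are simply the percentages quoted in the text (using exactly 60% for the malicious share, which the text gives as "over 60%").

```python
def bot_breakdown(requests_per_day, bot_share=0.496, malicious_share=0.60):
    """Split a daily request count into human, bot and malicious-bot requests."""
    bots = round(requests_per_day * bot_share)
    malicious = round(bots * malicious_share)
    return {
        "human": requests_per_day - bots,
        "bots": bots,
        "malicious": malicious,
    }


print(bot_breakdown(10_000))
# {'human': 5040, 'bots': 4960, 'malicious': 2976}
```

So "5,000 bots, 3,000 of them malicious" is the rounded version of 4,960 and 2,976.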
Who pays the bill? Your hosting. Your PHP. Your database.
How this affects Google
Google has a technical concept called Crawl Budget (official docs since 2017). The short version:
Googlebot measures how much it can request from you without overloading your server. If your server responds fast, Googlebot asks for more. If your server responds slowly or returns 5xx errors, Googlebot automatically lowers how often it comes back.
So: Google does not penalize you for being slow. But it stops coming as often. If it does not come often, it does not index new pages. If it does not index new pages, you do not rank. If you do not rank, no traffic.
What is the typical result for a small-to-mid site on shared hosting with no bot filtering?
- Google Search Console shows “Discovered — not indexed” on 40-70% of new URLs
- “Average server response time” in GSC: 1.5-4 seconds (Google wants < 600 ms)
- Pages that take 6-12 weeks to index, if they index at all
All while Bing — which has a more generous crawl budget and supports IndexNow for on-demand indexing — is indexing the same URLs in hours.
(That is why, across our 8 sites, 74.5% of SEO traffic comes from Bing and not Google.)
Who is eating your crawl budget
Not all bots are equal. There are 4 categories we see repeated on every hosting we audit:
1. Content scrapers
Sites that copy your content to republish it on affiliate sites, fake aggregators, or direct competitors. They usually come from rotating residential IPs so you cannot identify them easily. They can make hundreds of requests per minute for hours.
2. Vulnerability scanners
Bots that look for known holes: /wp-admin/, /.env, /wp-login.php?action=lostpassword, /xmlrpc.php, /wp-config.php.bak. They hit your site with thousands of nonexistent URLs, hoping one of them gets through. They generate huge numbers of 404 errors, which also consume resources.
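To see whether scanners like these are hitting you, you can count probe requests per IP in your access log. The sketch below assumes the common Apache/Nginx "combined" log format. The regexes, function name and sample lines are illustrative, not taken from any particular tool. /wp-admin/ is left out on purpose because real administrators request it too.

```python
import re
from collections import Counter

# Assumes the "combined" access log format used by default in Apache and Nginx.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)
# Probe paths from the examples in the text; extend the list for your own site.
PROBE_PATH = re.compile(
    r"^/(\.env|xmlrpc\.php|wp-config\.php|wp-login\.php\?action=lostpassword)"
)


def count_probes_by_ip(lines):
    """Return a Counter of probe requests per client IP."""
    hits = Counter()
    for line in lines:
        parsed = LOG_LINE.match(line)
        if parsed and PROBE_PATH.match(parsed.group("path")):
            hits[parsed.group("ip")] += 1
    return hits


sample = [
    '203.0.113.7 - - [10/Mar/2025:10:00:00 +0000] "GET /.env HTTP/1.1" 404 153 "-" "curl/8.0"',
    '203.0.113.7 - - [10/Mar/2025:10:00:01 +0000] "GET /wp-config.php.bak HTTP/1.1" 404 153 "-" "curl/8.0"',
    '198.51.100.4 - - [10/Mar/2025:10:00:02 +0000] "GET /blog/hello/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
print(count_probes_by_ip(sample).most_common())
# [('203.0.113.7', 2)]
```

On a real server you would pass an open log file instead of the sample list, since the function only needs an iterable of lines.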
3. Runaway AI crawlers
Some AI crawlers (not the official ones like GPTBot or ClaudeBot, which respect robots.txt) go full throttle: GPTBot clones, dataset scrapers, “research” crawlers with no legitimate user agent. They can make 20-50 requests per second for minutes at a time.
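Bursts of that size are easy to spot once you have timestamps per IP. The following is a rough sketch of a sliding-window check. The function name, the threshold of 20 requests and the one-second window are our assumptions, chosen to match the lower end of the range mentioned above.

```python
from collections import defaultdict, deque


def find_bursty_ips(events, max_per_window=20, window_seconds=1.0):
    """Flag IPs that exceed max_per_window requests inside any sliding window.

    events: iterable of (unix_timestamp, ip) tuples, sorted by timestamp.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = set()
    for ts, ip in events:
        window = recent[ip]
        window.append(ts)
        # Drop timestamps that have fallen out of the window.
        while window and ts - window[0] >= window_seconds:
            window.popleft()
        if len(window) > max_per_window:
            flagged.add(ip)
    return flagged


# Hypothetical traffic: one client firing 25 requests in half a second,
# another making one request per second.
burst = [(i * 0.02, "203.0.113.50") for i in range(25)]
calm = [(float(i), "198.51.100.4") for i in range(3)]
print(find_bursty_ips(sorted(burst + calm)))
# {'203.0.113.50'}
```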
4. Ad-stat inflators
Bots that simulate clicks or impressions on ads to artificially inflate stats from ad networks or competitors. They are the stealthiest because they imitate real browser user-agents.
Add all 4 → your hosting is saturated 50-80% of the day serving trash.
Symptoms that tell you that you have this problem
If any of these sound familiar, you almost certainly have bots eating your crawl budget:
- 🚨 New pages take more than 2 weeks to appear in Google
- 🚨 GSC shows “Discovered — not indexed” or “Crawled — not indexed” on many URLs
- 🚨 GSC “Average server response time” > 800 ms
- 🚨 Your cPanel / hosting panel reports CPU spikes you cannot explain
- 🚨 Your hosting bill keeps growing while human traffic does not
- 🚨 Your database (MySQL/MariaDB) hits 80% load outside peak hours
- 🚨 Visits in your stats plugin (Jetpack, Plausible, etc.) do NOT match server requests
- 🚨 Your CDN/Cloudflare reports a very high percentage of cache MISS responses
If 3 or more apply: your crawl budget is being raided.
How to fix it (4 levels by case)
Level 1: Free Cloudflare (5 minutes)
The easiest thing you can do if you are not on Cloudflare yet:
- Sign up free at cloudflare.com
- Point your domain to their nameservers
- Enable Bot Fight Mode (Security → Bots → ON)
- Enable Browser Integrity Check (Security → Settings → ON)
This blocks 30-40% of the most obvious bot traffic. Free. Without touching WordPress.
Level 2: Basic free WAF (20 minutes)
If Cloudflare is on but not enough:
- Cloudflare Dashboard → WAF → Custom Rules
- Create rules to block:
- User-Agent containing “scrapy”, “selenium”, “headless”, “puppeteer”, “phantom”
- URLs like /xmlrpc.php, /wp-config.php, /.env, /.git/
- Countries where you have no audience or customers (Block from the editor)
- Enable Super Bot Fight Mode if your plan allows ($20/mo Pro)
This pushes blocking up to 60-70% of malicious bot traffic.
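Before turning rules like these on, it can help to dry-run them against your own logs to check that they do not catch real visitors. The sketch below mirrors the three rule types in plain Python. It is not Cloudflare's rule syntax, and the function name and the empty country set are placeholders of ours.

```python
# Tokens and paths taken from the rule list above.
BLOCKED_UA_TOKENS = ("scrapy", "selenium", "headless", "puppeteer", "phantom")
BLOCKED_PATH_PREFIXES = ("/xmlrpc.php", "/wp-config.php", "/.env", "/.git/")
# Hypothetical: fill with ISO country codes where you have no audience.
BLOCKED_COUNTRIES = frozenset()


def would_block(user_agent, path, country=""):
    """Return the name of the rule that would block the request, or None."""
    ua = user_agent.lower()
    if any(token in ua for token in BLOCKED_UA_TOKENS):
        return "user-agent"
    if path.lower().startswith(BLOCKED_PATH_PREFIXES):
        return "path"
    if country.upper() in BLOCKED_COUNTRIES:
        return "country"
    return None


print(would_block("Mozilla/5.0 HeadlessChrome/120.0", "/"))      # user-agent
print(would_block("Mozilla/5.0 (Windows NT 10.0)", "/.env"))     # path
print(would_block("Mozilla/5.0 (Windows NT 10.0)", "/blog/"))    # None
```

If a known-good request comes back as blocked, narrow the rule before deploying it at the edge.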
Level 3: SeoNova WPO Toolkit, security module
This is where a WordPress-specific solution that goes beyond Cloudflare comes in:
SeoNova WPO Toolkit includes a security module designed to act as a bouncer at the door of your WordPress: it identifies malicious bots on the first PHP line, before any plugin loads, before any theme runs, before any SQL query.
Specifically it:
- Filters known bot patterns without slowing real users (and lets through legitimate Googlebot and Bingbot)
- Blocks ad-stat inflators without touching your ad insertion
- Stops scrapers that copy your content in bulk
- Saves up to 70-80% of useless PHP/MySQL load
- Result: your server responds fast to Googlebot → crawl budget recovered
Unlike generic security plugins (Wordfence, iThemes, Sucuri), it does not load full PHP to decide whether to block: it decides before. That is why it does not hurt your Core Web Vitals.
Useful in cases where Cloudflare is already maxed out but the server is still saturated, or where you want protection without paying Cloudflare Enterprise.
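Whatever filter you use, letting legitimate Googlebot and Bingbot through means checking more than the user-agent string, which anyone can fake. A common approach is a reverse DNS lookup followed by a forward confirmation. The sketch below illustrates that general technique and is not a description of how the WPO Toolkit works internally. The function name and the injectable lookup parameters are ours, and the hostname suffixes are the ones we understand Google and Bing to use for their crawlers.

```python
import socket

# Hostname suffixes assumed for Googlebot and Bingbot reverse DNS records.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def is_verified_crawler(
    ip,
    reverse_lookup=socket.gethostbyaddr,
    forward_lookup=socket.gethostbyname_ex,
):
    """Reverse-resolve the IP, check the domain, then confirm it resolves back.

    Note: gethostbyname_ex only handles IPv4; IPv6 would need getaddrinfo.
    """
    try:
        hostname = reverse_lookup(ip)[0]
        if not hostname.lower().endswith(TRUSTED_SUFFIXES):
            return False
        # The forward lookup must include the original IP, otherwise the
        # reverse record could have been forged by whoever controls the IP.
        return ip in forward_lookup(hostname)[2]
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

The lookup functions are parameters so the check can be tested without network access. In production you would also cache results, because DNS lookups on every request would cost more than they save.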
Level 4: Reverse proxy + dedicated WAF (advanced)
For large sites (>500k pageviews/month), the serious option is:
- Nginx or Caddy as reverse proxy in front of WordPress
- Dedicated WAF: ModSecurity with the OWASP Core Rule Set (CRS), or a managed one (Sucuri, StackPath)
- Aggressive rate limiting at the proxy level
- TLS fingerprinting to detect headless browsers
This needs DevOps. If you are here, hire someone or email us.
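To give an idea of what rate limiting at the proxy level does, here is a minimal token-bucket sketch in Python. In practice this lives in the proxy's own configuration rather than in application code. The class name and the parameters below are illustrative assumptions, not the configuration of any specific proxy.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, then at most `rate` requests/second."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self._now = now  # injectable clock, handy for testing
        self.last = now()

    def allow(self):
        current = self._now()
        # Refill tokens for the time elapsed, never exceeding the capacity.
        elapsed = current - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = current
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per client IP: 2 requests/second sustained, bursts of up to 5.
buckets = {}


def allow_request(ip):
    bucket = buckets.setdefault(ip, TokenBucket(rate=2, capacity=5))
    return bucket.allow()
```

A normal visitor rarely empties the bucket, while a crawler making dozens of requests per second is refused after its first few.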
How to measure if what you did works
Before applying any fix, write these numbers down:
- Google Search Console → Crawl stats: requests/day and average response time
- GSC → Index Coverage: URLs “Discovered — not indexed” + “Crawled — not indexed”
- GSC → Pages → Not indexed: total
- Your hosting panel: average daily CPU + transfer
Apply the fix. Wait 14 days (Google needs time to readjust crawl rate).
Look at the same numbers again. If errors went down and indexed URLs went up, it worked.
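If you keep the before and after numbers in a spreadsheet or a small script, the comparison is a percentage change per metric. A minimal sketch follows. The metric names and all the values are made-up placeholders for illustration, not measurements from a real site.

```python
def compare_snapshots(before, after):
    """Return the percentage change of each metric, rounded to one decimal."""
    return {
        name: round((after[name] - before[name]) / before[name] * 100, 1)
        for name in before
        if before[name]  # skip metrics that started at zero
    }


# Placeholder values only: replace with your own GSC and hosting numbers.
before = {"avg_response_ms": 1800, "not_indexed_urls": 420, "crawl_requests_per_day": 300}
after = {"avg_response_ms": 600, "not_indexed_urls": 310, "crawl_requests_per_day": 520}

print(compare_snapshots(before, after))
# {'avg_response_ms': -66.7, 'not_indexed_urls': -26.2, 'crawl_requests_per_day': 73.3}
```

Negative changes in response time and not-indexed URLs, together with a positive change in crawl requests, are the pattern you want to see.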
The wrap
“Google is not indexing me” — 9 out of 10 times Google is NOT the one to blame. Your hosting is overrun by malicious bots while Googlebot is talking to a server that takes 3 seconds to reply.
Take those bots out of the picture, let your hosting respond fast, and Google comes back. No tricks, no waiting for the next dance, no agency contract.
It is one of the best technical ROI fixes you can do on your site this year. An hour and a half of work, benefits for months.
If you want to see how bot-heavy your hosting is right now and which WPO Toolkit module fits you, drop us a line at [email protected] or visit seonova.pro/en/wordpress-optimization.
Data from Imperva Bad Bot Report 2025. Crawl budget documentation: Google Search Central, 2017.
— The SeoNova team