How to prevent bots from crawling my WordPress search results?

December 2, 2016 (updated July 13, 2019) by admin

Prevent bots crawling your WordPress search pages

The problem

If your database contains a large number of posts or products, then your search can generate many pages of results.
Search engine bots will crawl all those search pages, which can lead to tens of thousands of page loads every single day.

Perhaps you want your search pages crawled, perhaps you don't. There are many discussions about the SEO benefits and drawbacks of letting search pages be crawled and indexed by bots, but that is not the subject here.

Anyway, we will assume in the following discussion that your SEO specialist, in-house or consultant, does not want them crawled.

The solution: meta tags

Bots can read your HTML and interpret specific meta tags related to crawling and indexing.

By adding a "NOFOLLOW" meta tag to all your search pages, you tell every bot that discovers one of those pages not to follow the links it contains:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
or
<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">
The first variant also tells bots not to index the search page itself; the second lets the page be indexed but still not followed.

Again, ask your SEO expert before doing something you could regret.

The solution: robots.txt

The advantage of this solution is that you do not have to change the content of your search pages. The drawback: you have to rely on the good faith of bots.

Well-behaved bots follow the instructions declared in a file named robots.txt, placed at the root of your website.

We will add a few lines to tell all robots not to crawl any URL corresponding to a search page.
For instance, URLs like:
?
?s
?s=
?s=test
?s=test&post_type=product
?post_type=product&s=test
?post_type=product&s=

 

Add the following block to your own robots.txt:
User-agent: *
Disallow: /*?s$
Disallow: /*?s=*
Disallow: /*?*&s=*
Disallow: /*?*&s$

You can and should test your URLs with a robots.txt tester, such as https://tools.seobook.com/robots-txt/analyzer/.
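As a rough local sanity check, the four Disallow rules can also be exercised in a few lines of Python. This is a sketch assuming Google-style wildcard semantics, where `*` matches any character sequence and a trailing `$` anchors the end of the URL; otherwise a rule matches as a prefix. The `rule_to_regex` and `is_blocked` helpers are illustrative names, not part of any library:

```python
import re

# The Disallow patterns from the robots.txt block above
RULES = ["/*?s$", "/*?s=*", "/*?*&s=*", "/*?*&s$"]

def rule_to_regex(rule):
    """Translate a Google-style robots.txt path pattern into a regex:
    '*' matches any sequence, a trailing '$' anchors the end,
    and everything else is a literal prefix match."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + ("$" if anchored else ""))

COMPILED = [rule_to_regex(r) for r in RULES]

def is_blocked(path):
    """Return True if any Disallow rule matches the URL path (with query string)."""
    return any(rx.match(path) for rx in COMPILED)

for path in ["/?s=test", "/?post_type=product&s=test", "/shop/?s=", "/?page_id=2"]:
    print(path, "->", "blocked" if is_blocked(path) else "allowed")
```

With these rules, all the search-page examples listed earlier are blocked, while an ordinary URL such as `/?page_id=2` remains allowed.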

It does not work!

Legitimate bots will follow your robots.txt instructions, but not instantly. Wait 3-4 days, and you should see the number of pages crawled daily drop.

I have personally seen a WooCommerce site with 60 thousand products drop from 50 thousand pages crawled a day to a few hundred, just by using this robots.txt!

It does not work, again!

Bad bots simply ignore robots.txt, so you'll have to stop them actively. WordPress plugins like Wordfence can identify bad bots and neutralize them.

Some CDNs can also do the trick.
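The core idea behind such tools is simple: block clients that request pages far faster than any human would. Here is a toy sketch of a sliding-window rate limiter in Python to illustrate the principle (the names, window size, and threshold are made up for the example; this is not how Wordfence actually works):

```python
import time
from collections import defaultdict, deque

# Illustrative thresholds: more than 30 requests per minute gets blocked
WINDOW_SECONDS = 60
MAX_REQUESTS = 30

_hits = defaultdict(deque)  # per-IP timestamps of recent requests

def allow_request(ip, now=None):
    """Return False once an IP exceeds MAX_REQUESTS within the window."""
    now = time.time() if now is None else now
    q = _hits[ip]
    # Drop timestamps that have fallen out of the sliding window
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    if len(q) >= MAX_REQUESTS:
        return False
    q.append(now)
    return True
```

A real bot blocker adds plenty on top (user-agent heuristics, IP reputation lists, CAPTCHAs), but rate limiting like this is usually the first line of defense.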
