If your database contains a large number of posts or products, then your search can generate many pages of results.
The bots will crawl all those search pages, which can lead to tens of thousands of pages loaded every single day.
Perhaps you want your search pages crawled, perhaps you don’t. There are many discussions about the SEO benefits and drawbacks of letting bots crawl and index search pages, and that is not the subject here.
In any case, the following discussion assumes that your SEO specialist, in-house or consultant, does not want them crawled.
The solution: meta tags
Bots can read your HTML and interpret specific meta tags related to crawling and indexing.
By adding a “NOFOLLOW” meta tag to all your search pages, you tell every bot that discovers one of those search pages not to crawl the pages it links to:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">
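Use only one of the two tags: the first also keeps the search page itself out of the index, while the second lets it be indexed.
Again, ask your SEO expert before doing something you could regret.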
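Once the tag is in your theme, it is worth checking that your search pages really serve it. Here is a minimal sketch in Python that extracts the robots directives from a page's HTML. The `sample_page` markup and the `robots_directives` helper are made up for the illustration; in practice you would feed it the HTML of one of your own search pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for directive in attributes.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical search-results page, reduced to its <head>.
sample_page = """
<html><head>
<title>Search results</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head><body></body></html>
"""

directives = robots_directives(sample_page)
print(sorted(directives))
print("links will not be followed:", "nofollow" in directives)
```

An empty result on a search page means the tag is missing, and bots will treat the page like any other.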
The solution: robots.txt
The advantage of this solution is that you do not have to change the content of your search pages. The drawback is that you have to rely on the good behavior of all bots.
Well-behaved bots follow the instructions declared in a file named robots.txt, at the root of your website.
We will add a few directives to tell all robots not to crawl any URL corresponding to a search page.
For instance, URLs like:
Add the following block to your own robots.txt:
You can and should test your page URLs with a robots.txt tester, like http://tools.seobook.com/robots-txt/analyzer/.
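If you prefer to test locally, Python's standard library ships a robots.txt parser. The sketch below is only an illustration: the rules in `ROBOTS_TXT` are my assumption, based on WordPress serving search results at `/?s=<query>` by default and at `/search/<query>/` on some sites with pretty permalinks, and the example.com URLs are placeholders. Replace both with your own rules and URLs.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules, not taken from any particular site: block the default
# WordPress search URL (/?s=...) and the pretty-permalink variant (/search/...).
ROBOTS_TXT = """\
User-agent: *
Disallow: /?s=
Disallow: /search/
"""


def is_crawl_allowed(url, user_agent="*"):
    """Return True if ROBOTS_TXT lets the given user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URLs: two search pages and one regular product page.
for url in (
    "https://www.example.com/?s=shoes&paged=2",
    "https://www.example.com/search/shoes/",
    "https://www.example.com/product/red-shoes/",
):
    verdict = "allowed" if is_crawl_allowed(url) else "blocked"
    print(verdict, url)
```

Note that this parser matches rules by simple prefix. It does not understand the `*` and `$` wildcards that the major search engines accept, so rules using them need an online tester instead.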
It does not work!
Bots from the major search engines will follow your robots.txt instructions. Just wait 3-4 days, and you should see the number of pages crawled per day drop.
I have seen a WooCommerce site with 60 thousand products drop from 50 thousand pages crawled a day to a few hundred by using this robots.txt!
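One way to watch for that drop yourself is to count bot requests to search pages in your web server's access log, day by day. The sketch below rests on several assumptions of mine: the log uses the common "combined" format, search URLs start with `/?s=` or `/search/`, and a bot is anything whose user agent contains "bot", "crawler" or "spider", which is a crude heuristic. The sample lines are invented.

```python
import re
from collections import Counter

# Assumed "combined" access log format (Apache and Nginx default style).
LOG_LINE = re.compile(
    r'\[(?P<day>[^:\]]+):[^\]]*\] '            # [10/Oct/2023:13:55:36 +0000]
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" '     # "GET /?s=shoes HTTP/1.1"
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'    # status, size, referrer, agent
)

BOT_MARKERS = ("bot", "crawler", "spider")     # crude heuristic, adjust freely
SEARCH_PREFIXES = ("/?s=", "/search/")         # assumed search URL patterns


def daily_bot_search_hits(lines):
    """Count requests per day made by bots to search pages."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent").lower()
        is_bot = any(marker in agent for marker in BOT_MARKERS)
        is_search = match.group("path").startswith(SEARCH_PREFIXES)
        if is_bot and is_search:
            hits[match.group("day")] += 1
    return hits


# Invented sample lines; in practice, pass an open log file instead.
sample_log = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /?s=shoes HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Oct/2023:13:55:40 +0000] "GET /?s=shoes&paged=2 HTTP/1.1" 200 2411 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2023:14:02:11 +0000] "GET /?s=boots HTTP/1.1" 200 1980 "-" "Mozilla/5.0 (Windows NT 10.0; rv:118.0) Gecko/20100101 Firefox/118.0"',
    '157.55.39.1 - - [11/Oct/2023:09:12:01 +0000] "GET /product/red-shoes/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '157.55.39.1 - - [11/Oct/2023:09:12:05 +0000] "GET /search/boots/ HTTP/1.1" 200 2200 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
]

for day, count in daily_bot_search_hits(sample_log).items():
    print(day, count)
```

Run against a few days of real logs, the per-day counts should fall once the bots have picked up the new robots.txt.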
It does not work, again!
Bad robots ignore robots.txt, so you will have to stop them actively. WordPress plugins like Wordfence can identify those bots and block them.
Some CDNs can do the trick as well.