Google’s search index is its database of the billions of web pages it has crawled. When a new website or a page within it is “indexed”, it has been added to that database, so it can be found with the right search on Google.
If a page isn’t indexed in Google, it may as well not exist as far as organic traffic is concerned.
I recently talked about what to do if Google isn’t indexing your site, but there are times when you don’t want a page of yours indexed.
In other words, there are some pages you don’t want to appear in Google.
How to NoIndex a Page
Before we get into the pages and content you don’t want indexed, let’s talk about how to noindex.
There are two main ways to achieve this, and each has a slightly different purpose.
NoIndex Meta Tag in Head Section
In WordPress, it’s a simple process to noindex a page, which tells Google you don’t want it to appear in search results.
The Yoast SEO plugin for WordPress gives you an option on every individual page on your site to noindex.
Once the plugin is installed, navigate to the page in WordPress. In the Yoast section below the content itself, click on the “advanced” tab.
Change the setting to “No” in the drop-down if you don’t want the page to appear in Google.
This adds a special noindex meta tag to the <head> section of the page, telling Google to leave the page out of its index.
If you’re not working in WordPress or simply don’t want to use a plugin, it’s still an easy process.
In the <head> section of the page you want to be noindexed, paste the following code in:
<meta name="robots" content="noindex">
This will tell most search engines to disregard that page and consequently will keep it from appearing in their index.
If you only want to block Google’s bots, replace “robots” with “googlebot” like so:
<meta name="googlebot" content="noindex">
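If you want to double-check that a page really carries the tag, here’s a minimal Python sketch that scans a page’s HTML for a noindex directive. It uses only the standard library; the class and function names (RobotsMetaParser, has_noindex) are my own placeholders, and it only looks at meta tags, nothing else.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in robots-style <meta> tags."""

    def __init__(self, bot_names=("robots", "googlebot")):
        super().__init__()
        self.bot_names = {name.lower() for name in bot_names}
        self.directives = {}  # e.g. {"robots": {"noindex", "nofollow"}}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (valueless attributes), so default to "".
        attr_map = {key: (value or "") for key, value in attrs}
        name = attr_map.get("name", "").strip().lower()
        if name in self.bot_names:
            parts = attr_map.get("content", "").split(",")
            found = {part.strip().lower() for part in parts if part.strip()}
            self.directives.setdefault(name, set()).update(found)


def has_noindex(html, bot="robots"):
    """Return True if the HTML has a noindex directive addressed to `bot`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives.get(bot, set())


page = '<html><head><meta name="robots" content="noindex"></head></html>'
print(has_noindex(page))                   # True
print(has_noindex(page, bot="googlebot"))  # False
```

The second check comes back False because the sample page only addresses “robots”, not “googlebot” by name.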
NoIndex a Page With Robots.txt
The other option is to utilize your site’s robots.txt file.
This is a special file located in the root directory of your website which crawlers from Google and other search engines consult to know which content to ignore.
It lets you block specific files or even entire directories on your website more efficiently than adding the noindex tag just discussed to every page.
If you don’t have a robots.txt file yet, create a plain text file named robots.txt on your computer. If you already have one, just edit the existing file and add the restrictions which I’ll mention next.
If you’re not sure whether you have a robots.txt file, type your site’s address followed by /robots.txt into your browser. If the file exists, it will load and show your current rules.
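The same check can be scripted. This is a rough Python sketch, assuming the file lives at the root of the site; the function names are hypothetical and example.com is just a stand-in for your own domain.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def robots_txt_url(page_url):
    """Return the robots.txt URL for the site that hosts page_url."""
    # A leading slash makes urljoin resolve against the site root.
    return urljoin(page_url, "/robots.txt")


def fetch_robots_txt(page_url, timeout=10):
    """Return the robots.txt text, or None if it could not be fetched."""
    try:
        with urlopen(robots_txt_url(page_url), timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:
        # Covers HTTP errors (such as a 404), DNS failures, and timeouts.
        return None


print(robots_txt_url("https://example.com/blog/some-post/"))
# https://example.com/robots.txt
```

Calling fetch_robots_txt with your own site’s address returns the file’s contents, or None if nothing could be loaded.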
Once you have your robots.txt ready to edit, you can disallow any URLs, files, directories, etc. you want the search engine to exclude:
User-agent: *
Disallow: /wp-admin/
Disallow: /thank-you/
Disallow: /free-template.jpg
Allow: /wp-admin/admin-ajax.php
User-agent: Googlebot
Disallow: /whatever-page/
Let’s dissect the sample lines from the code above:
User-agent: *
You’ll typically want to start your robots.txt with this line. The “User-agent” refers to the specific search engine bot that should obey the rules below it.
The “*” is a wildcard that matches every bot. All of the rules listed under it, up to the next “User-agent” line, apply to all bots.
Later, we created a separate “User-agent: Googlebot” group, instructing just Google (Googlebot is its user-agent) to not crawl “/whatever-page/”. Keep in mind that a bot follows only the most specific group that matches it, so Googlebot obeys its own group and skips the “*” rules. Repeat any rules you want Google to follow inside the Googlebot group.
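To see which group a given bot ends up following, here’s a small Python sketch that feeds the sample file to the standard library’s robots.txt parser. “SomeOtherBot” is a made-up bot name, and the blank line between the two groups is only there for readability.

```python
from urllib.robotparser import RobotFileParser

SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /thank-you/
Disallow: /free-template.jpg
Allow: /wp-admin/admin-ajax.php

User-agent: Googlebot
Disallow: /whatever-page/
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())


def may_crawl(bot, path):
    """Return True if the parsed rules let `bot` crawl `path`."""
    return parser.can_fetch(bot, path)


# A bot without its own group falls back to the "*" group.
print(may_crawl("SomeOtherBot", "/thank-you/"))     # False
print(may_crawl("SomeOtherBot", "/whatever-page/")) # True

# Googlebot has its own group, so only that group applies to it.
print(may_crawl("Googlebot", "/whatever-page/"))    # False
print(may_crawl("Googlebot", "/thank-you/"))        # True
```

Notice that Googlebot is allowed on “/thank-you/” here, because a bot with its own group doesn’t inherit the “*” rules. One caution: Python’s parser checks rules in file order, so I wouldn’t rely on it to model how Google weighs an Allow against a Disallow.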
Disallow: /wp-admin/
We’re instructing all bots to not crawl “wp-admin” or any content contained within that folder.
Disallow: /thank-you/
We’re instructing all bots to not crawl our “thank-you” page on our site.
Disallow: /free-template.jpg
We’re instructing all bots to not crawl a specific image on our site, found at that URL.
Allow: /wp-admin/admin-ajax.php
This tells all bots they may crawl this one URL, even though the rest of “wp-admin” is disallowed.
Simple enough, right?
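If you’re wondering how an Allow and a Disallow can both cover the same URL, the robots.txt standard (RFC 9309) and Google’s documentation both say the most specific rule wins, meaning the one with the longest matching path, with Allow winning a tie. Here’s a simplified Python sketch of that idea for the “*” group above. It only does plain prefix matching and ignores the “*” and “$” wildcards, and the function name is my own.

```python
# The rules from the "User-agent: *" group in the sample file.
RULES = [
    ("Disallow", "/wp-admin/"),
    ("Disallow", "/thank-you/"),
    ("Disallow", "/free-template.jpg"),
    ("Allow", "/wp-admin/admin-ajax.php"),
]


def is_allowed(path, rules=RULES):
    """Apply the longest-match rule; a tie goes to Allow."""
    best_length = -1
    allowed = True  # no matching rule means crawling is allowed
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        longer = len(prefix) > best_length
        tie_goes_to_allow = len(prefix) == best_length and is_allow
        if longer or tie_goes_to_allow:
            best_length = len(prefix)
            allowed = is_allow
    return allowed


print(is_allowed("/wp-admin/admin-ajax.php"))  # True
print(is_allowed("/wp-admin/options.php"))     # False
print(is_allowed("/blog/"))                    # True
```

The Allow line wins for admin-ajax.php because its path is longer than “/wp-admin/”, while everything else in that folder stays blocked.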
Should I Use a Meta Tag or Robots.txt for NoIndexing?
With both options available, you might be wondering if you should use a meta tag or a robots.txt to noindex something.
The best answer is that it depends on what you want to noindex.
For Specific Pages – I recommend the Meta/NoIndex tag.
For a Folder or Image/Other Type of Content – Use the robots.txt option.
The robots.txt option also comes with a caveat.
It’s possible that Google or other search engines will STILL index content you specifically set to disallow in your robots.txt file if other pages link to it.
Typically this isn’t an issue, since most content you don’t want indexed won’t have any links pointing to it.
If it does, go with the Meta/NoIndex tag, which is why I recommend that option in general for pages.
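One more wrinkle worth knowing: a bot can only obey a noindex tag if it’s allowed to crawl the page and read it. If robots.txt blocks the same URL, the tag is never seen. Here’s a rough Python sketch that flags that conflict. The function name and the status messages are my own, and it only checks the generic “robots” meta tag.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _NoindexFinder(HTMLParser):
    """Sets `noindex` to True when a robots meta tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr_map = {key: (value or "").lower() for key, value in attrs}
        if tag == "meta" and attr_map.get("name") == "robots":
            content = attr_map.get("content", "")
            if "noindex" in [part.strip() for part in content.split(",")]:
                self.noindex = True


def noindex_status(robots_txt, path, html, bot="*"):
    """Describe how robots.txt and the page's meta tag interact for `path`."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    finder = _NoindexFinder()
    finder.feed(html)
    crawlable = rules.can_fetch(bot, path)
    if finder.noindex and crawlable:
        return "noindex tag is visible to crawlers"
    if finder.noindex:
        return "noindex tag is hidden: robots.txt blocks the crawl"
    if not crawlable:
        return "crawl blocked, but the URL can still be indexed via links"
    return "page is crawlable and indexable"


robots = "User-agent: *\nDisallow: /thank-you/\n"
tagged = '<head><meta name="robots" content="noindex"></head>'
print(noindex_status(robots, "/thank-you/", tagged))
# noindex tag is hidden: robots.txt blocks the crawl
```

In other words, pick one method per page. For a page you want kept out of the index, use the meta tag and leave the page crawlable.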
NoIndex SEO – 4 Types of Content to NoIndex
Now that you know HOW to noindex content on your site with a couple of different options, let’s get practical.
Here are 4 types of content or pages which you likely don’t want indexed in Google.
Duplicate Content
When I defined canonical URLs, I explained that Google only indexes one instance of the same piece of content.
There might be instances on your site when you have multiple versions of the same content by design.
The classic example is a printer-friendly copy of the original content.
You only want that main version to appear in Google, so you would noindex that duplicate.
Admin Pages
For security purposes, you don’t want login or administration pages appearing in Google for your website.
In the above robots.txt example, I blocked all bots from crawling the wp-admin directory and all the content therein.
Typically WordPress has this set up by default, which is why if you’re running a WordPress site you likely already have a robots.txt file without creating one yourself.
Any Private Content
Let’s say you’re not working in WordPress or a system where you can quietly create a draft of something before it goes live.
If you have live content you’re still working on, bots can find and index it even if visitors can’t reach it by browsing your website.
For this reason, you’ll want to noindex any content which you don’t want people to see on your site as a general rule.
Thank You Page
Let’s say you are giving away a download as an incentive to encourage someone to sign up for your email list.
The easiest way to do this is to send people to a designated “thank you” page on your site once they sign up.
On that page there’s a download link for the promised incentive.
As such, you only want people to be able to access this page if they’ve signed up for your list.
If Google indexes this page, it will begin to appear in the SERPs in connection with your site. This would allow people to sidestep your signup entirely and still get the freebie.
With that in mind, make sure that your thank you page is set to be noindexed.
Now you know both the types of content you want search engines to ignore AND how to get them to do it!