What is the X-Robots-Tag?
When a browser or a search engine bot requests a resource from your server, the server sends back an HTTP response, which can include various headers. The X-Robots-Tag is one of those headers: it gives search engines specific instructions on whether and how to index the requested resource.
While many of us are familiar with the traditional robots.txt file or the robots meta tag that you can place in the HTML of a webpage, the X-Robots-Tag works a bit differently. It allows you to apply indexing instructions at the HTTP header level, which is useful for non-HTML files like PDFs, images, and other media types.
An example of what the HTTP response with the X-Robots-Tag may look like:
HTTP/1.1 200 OK
Date: Tue, 23 Oct 2024 10:32:20 GMT
Content-Encoding: gzip
(…)
X-Robots-Tag: noindex
(…)
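A client can read this header programmatically. Here is a minimal sketch in Python; the raw response string is a made-up example mirroring the one above, not output from a real server:

```python
# Minimal sketch: extract the X-Robots-Tag directives from a raw
# HTTP response. The raw_response string is a hypothetical example.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Tue, 23 Oct 2024 10:32:20 GMT\r\n"
    "Content-Encoding: gzip\r\n"
    "X-Robots-Tag: noindex, nofollow\r\n"
    "\r\n"
)

def robots_directives(raw: str) -> list[str]:
    """Return the X-Robots-Tag directives from a raw HTTP response."""
    header_block = raw.split("\r\n\r\n", 1)[0]
    for line in header_block.split("\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(":")
        if name.strip().lower() == "x-robots-tag":  # header names are case-insensitive
            return [d.strip() for d in value.split(",")]
    return []

print(robots_directives(raw_response))  # prints ['noindex', 'nofollow']
```

In practice you would use an HTTP library rather than parsing raw bytes, but the point is the same: the directive travels in the headers, not in the document body.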
Is the X-Robots-Tag Important?
The X-Robots-Tag provides a level of control that traditional methods can’t always achieve. For example, let’s say you have a downloadable PDF that you don’t want to appear in search results. By using the X-Robots-Tag, you can set it to noindex, nofollow for that specific file. This way, search engines will still crawl the file but recognise that they should neither index it nor follow any links within it.
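On Apache, for instance, a rule like the following attaches the header to every PDF the server delivers. This is a sketch assuming mod_headers is enabled; adapt the file pattern and placement (httpd.conf or .htaccess) to your setup:

```apache
# Send "noindex, nofollow" with every PDF response
# (requires mod_headers)
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```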
Here are a few reasons why it’s important:
- Enhanced Control: It gives you more precise control over how search engines interact with different types of content on your site.
- Preventing Duplicate Content: If you have multiple versions of the same content, you can use the X-Robots-Tag to keep the duplicates out of the index, so they don’t compete with the canonical version in search results.
- Optimising Crawl Budget: By signalling which pages or resources should stay out of the index, you help search engines spend their crawl budget on your most important content. Note that a noindex header must still be crawled to be seen; to block crawling outright, use robots.txt instead.
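To illustrate the PDF example in another common setup, here is how the same rule might look in nginx. This is a sketch assuming nginx serves the files; it goes inside the relevant server block:

```nginx
# Send X-Robots-Tag with every PDF the server delivers
location ~* \.pdf$ {
    add_header X-Robots-Tag "noindex, nofollow";
}
```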