
What Is the Robots Exclusion Protocol (Robots.txt)?

  • Felix Rose-Collins
  • 3 min read

Intro

The Robots Exclusion Protocol (REP) is a webmaster-created file used to instruct robots. The instructions help robots crawl web pages and index them for various websites. REP is sometimes referred to as robots.txt. The file is placed in the top level of the web server's directory to be most useful, for example: https://www.123abc.com/robots.txt. REP groups serve as a web standard that regulates bot actions and search engine indexing behavior. Between 1994 and 1997, the original REP defined bot behavior for robots.txt. In 1996, search engines supported additional REP X-robots tags. Search engines later handled links whose value contained a "nofollow" using the rel-nofollow microformat.

Robot Cheat Sheet

To block all web crawlers from all content

User-agent: *   
Disallow: /   

To block a specific web crawler from a target folder

User-agent: Googlebot   
Disallow: /no-google/   

To block a specific web crawler from a target web page

User-agent: Googlebot
Disallow: /no-google/blocked-page.html

To allow all web crawlers access to all content

User-agent: *
Disallow:

To point crawlers to a sitemap in a non-standard location

Sitemap: https://www.123abc.com/none-standard-location/sitemap.xml

Specific Robots Exclusion Protocol Tags

REP tags are applied to a URI's indexer tasks and, in some cases, to specific engines or search queries (for example nosnippet, noarchive, and noodp). When resources are tagged with exclusion tags, search engines such as Bing show these external links in SERP listings as forbidden URLs. Beyond crawler directives, specific search engines interpret REP tags differently. An example of this can be seen in how Bing will sometimes list outside references on its SERPs as forbidden, while Google takes the same listings and wipes out the URL and ODP references on its SERPs. The thinking is that X-Robots tags overrule directives that conflict with META elements.
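One way to apply these tags is as an X-Robots-Tag HTTP response header rather than a META element. A minimal sketch, assuming an Apache server with mod_headers enabled and PDF files as the resources to be excluded:

<Files ~ "\.pdf$">
  Header set X-Robots-Tag "noindex, noarchive, nosnippet"
</Files>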

Microformats

Particular HTML elements will overrule page-level settings in micro-formatted index directives. This method of programming requires skill and a very keen grasp of web servers and the HTTP protocol. An example would be a page whose X-Robots tag says follow while a particular link element says rel-nofollow, overriding the directive for that one link. Indexer directives are usually absent from robots.txt, but it is possible to set indexer directives for groups of URIs with server-side scripts at the site level.
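A minimal sketch of that override (the link target is hypothetical): the page-level META directive says follow, while one specific link carries rel="nofollow" and overrides it for that link only:

<head>
  <meta name="robots" content="index, follow">
</head>
<body>
  <!-- This single link overrides the page-level "follow" directive -->
  <a href="https://www.123abc.com/untrusted-page.html" rel="nofollow">Untrusted link</a>
</body>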

Pattern Matching

Webmasters can use two expressions for pattern matching to denote page exclusion: the asterisk (*) and the dollar sign ($). The asterisk represents any sequence of characters, and the dollar sign marks the end of a URL.
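A short sketch of both characters in use (the blocked paths are hypothetical): the first rule blocks any URL containing a query string, and the second blocks any URL that ends in .pdf:

User-agent: *
Disallow: /*?
Disallow: /*.pdf$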

Unrestricted Information

Robots.txt files are always public, so it's important to be aware that anyone can view the robots file attached to a web page. The locations the webmaster blocks the engines from on the server are also accessible information. These public files can therefore point to where private user data is kept, which could include data about private individuals. It is possible to add password protection to keep visitors and others from viewing classified pages that should not be indexed.
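One common way to add that protection is HTTP basic authentication at the server level. A minimal sketch, assuming an Apache server and an existing .htpasswd file at a hypothetical path:

AuthType Basic
AuthName "Restricted area"
AuthUserFile /path/to/.htpasswd
Require valid-user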

Additional Rules

  • Simple meta robots parameters, such as noindex and nofollow, should only be used to prevent page indexing and crawling.
  • Malicious bots will almost certainly ignore these commands, so they are useless as a security plan.
  • Each URL is only allowed one "Disallow:" line.
  • A separate robots.txt file is required on each subdomain (see the sketch after this list).
  • Filenames for the bots are case-sensitive.
  • Spacing does not separate search parameters.
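A quick sketch of the subdomain rule above (the blog subdomain and blocked folders are hypothetical): rules in the main domain's file do not carry over, so each host must serve its own file:

# https://www.123abc.com/robots.txt
User-agent: *
Disallow: /private/

# https://blog.123abc.com/robots.txt
User-agent: *
Disallow: /drafts/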

Top SEO Tactics: Robots.txt

Page Blocking – there are several ways to prevent a search engine from indexing and accessing a web page or domain.

Using Robots.txt to Block Pages

This exclusion tells the search engine not to crawl the page, but it may still index the page to show it in SERP listings.
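For example, a minimal sketch of such an exclusion in robots.txt (the blocked path is hypothetical):

User-agent: *
Disallow: /blocked-page.html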

No Index Page Blocking

This method of exclusion tells search engines they are allowed to visit the page, but they are not allowed to display the URL or save the page in their index. This is the preferred method of exclusion.

Nofollow Page Blocking

This is not a supported tactic. Search engines can still access pages despite this command. Even if the search engine cannot directly follow links to the page, it can reach the content through browser analytics or other linked pages.

Meta Robots vs. Robots.txt

An example of a website's robots.txt file can help clarify how this works. In the example, the robots file blocks a directory. When that URL is searched for in Google, it shows that 2,760 pages from the directory have been disallowed. Because the engine has not crawled those URLs, they do not appear like traditional listings. Once links point at them, these pages accumulate link juice, and they also begin to gain popularity and trust from appearing in searches. Yet because the pages are not being crawled, that ranking power cannot benefit the site. To fix this problem and avoid wasting a page's ranking power, it is prudent to use another method of exclusion to remove the individual pages: a meta robots tag, shown below. This method exhibits better performance than the previous one.
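A minimal sketch of that meta tag, placed in the page's <head> (the noindex, follow values are an assumption based on the behavior described above, keeping the page out of the index while still letting link equity flow):

<meta name="robots" content="noindex, follow">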

Felix Rose-Collins

Ranktracker's CEO/CMO & Co-founder

Felix Rose-Collins is the Co-founder and CEO/CMO of Ranktracker. With over 15 years of SEO experience, he has single-handedly scaled the Ranktracker site to over 500,000 monthly visits, with 390,000 of these stemming from organic searches each month.
