Robots.txt

There are several scenarios where we need to control the access of web robots (also known as web crawlers or spiders) to our website or to a part of it. Legitimate crawlers such as Googlebot visit our site to index it, but spam bots visit too, usually to collect information from our pages. Every crawl also consumes a considerable amount of the website’s bandwidth. We can control well-behaved robots by disallowing their access to our website, or to parts of it, through a simple ‘robots.txt’ file.

Creating a robots.txt:

Open a new file in any text editor, such as Notepad, and save it as ‘robots.txt’ in the root directory of the website (for example, https://example.com/robots.txt); crawlers look for the file only at that location.

The rules in the robots.txt file are entered as ‘field’: ‘value’ pairs.

<field>:<value>

<field>

The directive name. The common fields are User-agent, which names the robot that a group of rules applies to, and Disallow (or its counterpart Allow), which blocks (or permits) access to a URL path.

<value>

The robot’s name for a User-agent field, or the URL path that the rule applies to for Disallow and Allow.
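
For instance, the following record combines the two kinds of fields (the directory name /private/ is only an illustration); lines beginning with # are comments:

# Applies to every robot; blocks the /private/ directory
User-agent: *
Disallow: /private/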

Examples:

To exclude all search engine robots from crawling the entire website:

User-agent: *
Disallow: /

To exclude all bots from a certain directory within the website:

User-agent: *
Disallow: /aboutme/

To disallow multiple directories:

User-agent: *
Disallow: /aboutme/
Disallow: /stats/

To block access to a specific document:

User-agent: *
Disallow: /myFolder/name_me.html

To disallow a specific search engine bot from crawling the website, replace Robot_Name with that bot’s user-agent token:

User-agent: Robot_Name
Disallow: /
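
For instance, to turn away only Google’s crawler, whose user-agent token is Googlebot, while leaving every other robot unaffected:

User-agent: Googlebot
Disallow: /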

Advantages of Using Robots.txt:

  • Avoids wasting server resources on unwanted crawls.
  • Saves bandwidth.
  • Removes bot-generated clutter from web statistics, giving cleaner analytics.
  • Lets us refuse specific robots.
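
Since robots.txt is purely advisory, it is worth verifying that the rules behave as intended before relying on them. One quick way is Python’s standard-library urllib.robotparser; the sketch below parses the multiple-directory example from above (the example.com URLs are placeholders):

from urllib import robotparser

# Rules from the multiple-directory example above
rules = [
    "User-agent: *",
    "Disallow: /aboutme/",
    "Disallow: /stats/",
]

parser = robotparser.RobotFileParser()
parser.parse(rules)

# A compliant crawler checks each URL before fetching it
print(parser.can_fetch("*", "https://example.com/aboutme/bio.html"))  # False: blocked
print(parser.can_fetch("*", "https://example.com/blog/post.html"))   # True: allowed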
