Robots.txt & Spiders

. . . and finds:

    User-agent: *
    Disallow: /

The "User-agent: *" means this section applies to all robots.
The "Disallow: /" tells the robot that it should not visit any pages on the site.
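A robot applies these two lines mechanically. The sketch below uses Python's standard-library `urllib.robotparser` module and assumes the /robots.txt text has already been downloaded. The `is_allowed` helper and the example.com URL are illustrative only. Because the "*" record applies to any robot name and "Disallow: /" matches every path, each check returns False:

```python
from urllib.robotparser import RobotFileParser

# The two-line /robots.txt from the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "SomeOtherBot"):
        # Both agents fall under "User-agent: *", and "/" prefixes every path.
        print(agent, is_allowed(ROBOTS_TXT, agent, "http://www.example.com/welcome.html"))
```

Replacing "Disallow: /" with a narrower prefix such as "Disallow: /private/" would block only URLs whose path starts with that prefix.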

There are two important considerations when using /robots.txt:

  • Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention to it.
  • The /robots.txt file is publicly available. Anyone can see which sections of your server you don't want robots to visit.
So don't try to use /robots.txt to hide information.
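Because the file is public, anyone can extract its contents with a simple text scan. The sketch below assumes the file has already been fetched. The `EXAMPLE` content and the `disallowed_paths` helper are hypothetical. It lists every path a site asks robots to avoid, which is exactly the information a curious human, or a hostile robot, can read:

```python
# Hypothetical /robots.txt content, as anyone could download it from a site.
EXAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /backups/   # old database dumps
"""


def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect every non-empty Disallow value, in file order (hypothetical helper)."""
    paths = []
    for raw in robots_txt.splitlines():
        # Strip trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            # An empty Disallow value excludes nothing, so skip it.
            if value:
                paths.append(value)
    return paths


if __name__ == "__main__":
    print(disallowed_paths(EXAMPLE))
```

The output advertises /admin/ and /backups/ to every reader, so those areas need real access control rather than a /robots.txt entry.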

What is the rel="nofollow" link attribute?

rel="nofollow" is an attribute you can set on an HTML link tag. It was invented by Google and adopted by others. Links marked with it get no credit when Google ranks websites in its search results, which removes the main incentive for the robots that blog comment spammers use.

See "Preventing comment spam" on the Official Google Blog.

From that description, it sounds as if the attribute affects only the ranking, and the Google robot may still follow the links and index them. If so, its semantics differ from those of NOFOLLOW in the robots meta tag.
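The two mechanisms operate at different levels. rel="nofollow" marks an individual link, while a robots meta tag containing NOFOLLOW covers every link on the page. The sketch below uses Python's standard-library `html.parser` to tell them apart. The `LinkAudit` class and the `PAGE` markup are hypothetical, and the code only classifies links. It makes no claim about how Google or any other engine then treats them:

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Hypothetical parser: sort links by rel="nofollow" and detect a robots meta NOFOLLOW."""

    def __init__(self) -> None:
        super().__init__()
        self.followed: list[str] = []
        self.nofollow: list[str] = []
        self.meta_nofollow = False  # page-level directive from <meta name="robots">

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; valueless attributes map to None.
        a = {name: (value or "") for name, value in attrs}
        if tag == "a" and "href" in a:
            # rel may hold several space-separated tokens, e.g. "external nofollow".
            if "nofollow" in a.get("rel", "").lower().split():
                self.nofollow.append(a["href"])
            else:
                self.followed.append(a["href"])
        elif tag == "meta" and a.get("name", "").lower() == "robots":
            # content is a comma-separated list such as "noindex, nofollow".
            directives = [d.strip().lower() for d in a.get("content", "").split(",")]
            self.meta_nofollow = self.meta_nofollow or "nofollow" in directives


PAGE = """<html><head><meta name="robots" content="index, nofollow"></head>
<body><a href="/about">About</a>
<a href="http://spam.example/" rel="nofollow">comment link</a></body></html>"""

if __name__ == "__main__":
    audit = LinkAudit()
    audit.feed(PAGE)
    audit.close()
    print("followed:", audit.followed)
    print("rel=nofollow:", audit.nofollow)
    print("meta NOFOLLOW for whole page:", audit.meta_nofollow)
```

In this sample page, the meta tag asks robots not to follow any link, including /about, even though only one link carries rel="nofollow".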

Aren't robots bad for the web?

There are a few reasons people believe robots are bad for the Web:

  • Certain robot implementations can overload networks and servers, and have done so in the past.
  • Robots are operated by humans, who make configuration mistakes or simply don't consider the implications of their actions.
  • Web-wide indexing robots build a central database of documents, which doesn't scale well to millions of documents on millions of sites.

But at the same time, the majority of robots are well designed and professionally operated, cause no problems, and provide a valuable service in the absence of better, widely deployed solutions.

So no, robots aren't inherently bad, nor are they inherently brilliant; they need careful attention.