
More Control of Robots.txt, REP META Introduced

Posted on 8.15.2007

Google Webmaster Central posted an announcement that it has improved its robots.txt analysis tool to recognize Sitemap declarations and relative URLs. Previous versions of the console were not aware of Sitemaps at all and understood only absolute URLs. The new version tells webmasters whether their Sitemap's URL and scope are valid, and it now reports multiple problems per line, such as syntax errors, through the Webmaster Central console.
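In practice, this means the tool can now validate a robots.txt file along these lines (the domain and path are placeholders for illustration):

User-agent: *
Disallow: /private/
Sitemap: http://www.example.com/sitemap.xml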


Google also provided some insights into the unavailable_after META tag, which lets webmasters tell Google the exact date and time specific pages should stop being crawled and indexed - good for limited-time sales offers, for example.
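On an ordinary HTML page, this is expressed as a REP META tag in the document head; a sketch (the exact date string here is illustrative, not taken from Google's announcement):

<META NAME="GOOGLEBOT" CONTENT="unavailable_after: 31-Dec-2007 23:59:59 EST">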

The other piece of news from the post is an announcement of the new X-Robots-Tag directive, which adds Robots Exclusion Protocol (REP) META tag support for non-HTML pages. This will give webmasters control over videos, spreadsheets and other indexed file types. For example, if you have a promotion page in PDF format, you would use the following in the file's HTTP headers:

X-Robots-Tag: unavailable_after: 31 Dec 2007 23:59:59 EST
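The header itself has to be added by the web server rather than by the file. As one sketch only, and not part of Google's announcement, a site running Apache with mod_headers enabled could attach it to every PDF like this:

<FilesMatch "\.pdf$">
    Header set X-Robots-Tag "unavailable_after: 31 Dec 2007 23:59:59 EST"
</FilesMatch>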

From the official Google post: "REP META tags can be useful for implementing noarchive, nosnippet, and now unavailable_after tags for page-level instruction, as opposed to robots.txt, which is controlled at the domain root."
