Common Tag - The New Semantic Layer

Yahoo! announced support of Common Tag, a new semantic tagging format for Web pages. The easiest explanation of Common Tag is that it is an HTML extension that makes it easier for content to be indexed and categorized. Do I have your attention yet?

The goal of semantic standards like Common Tag is to allow better decentralized interoperability between many different kinds of companies, services and individuals. According to the Common Tag website, "The Common Tag format was developed to address the current shortcomings of tagging and help everyone - including end users, publishers and developers - get more out of Web content. With Common Tag, content is tagged with unique, well-defined concepts - everything about New York City is tagged with one concept for New York City and everything about jaguar the animal is tagged with one concept for jaguar the animal. Common Tag also provides access to useful metadata that defines each concept and describes how the concepts relate to one another. For example, metadata for the Barack Obama Common Tag indicates that he's the President of the United States and that he's married to Michelle Obama."
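The difference between a plain text tag and a well-defined concept can be sketched in a few lines of Python. This is only an illustration of the idea in the quote above, not part of the Common Tag specification: the Concept class, the build_index function, and the example.org URIs are all hypothetical names invented for this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A tag that points at one well-defined concept (hypothetical model)."""
    uri: str    # unique identifier for the concept (placeholder URIs below)
    label: str  # human-readable tag text, which may be ambiguous on its own


# Two concepts share the label "jaguar" but have different identifiers.
JAGUAR_ANIMAL = Concept("http://example.org/concept/jaguar-animal", "jaguar")
JAGUAR_CAR = Concept("http://example.org/concept/jaguar-car", "jaguar")
NYC = Concept("http://example.org/concept/new-york-city", "New York City")


def build_index(tagged_pages):
    """Group page URLs by concept URI rather than by label text."""
    index = defaultdict(list)
    for page_url, concepts in tagged_pages:
        for concept in concepts:
            index[concept.uri].append(page_url)
    return dict(index)


pages = [
    ("http://example.org/zoo", [JAGUAR_ANIMAL]),
    ("http://example.org/cars", [JAGUAR_CAR]),
    ("http://example.org/travel", [NYC, JAGUAR_ANIMAL]),
]
index = build_index(pages)
```

Because the index is keyed by URI, a lookup for the animal never returns the page about the car, even though both pages carry the same tag text.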

Common Tag was developed jointly by a group of Web companies including Zemanta, Metaweb, and Yahoo!. Its format essentially adds semantic meaning to tags, making Web content more discoverable and enabling the community to create more useful applications for aggregating, searching, and browsing the Web. "Semantic tagging is an important next step in the evolution of the Web. When we add semantic meaning to tags, the content that is tagged becomes significantly easier for machines to understand. That in turn allows for the development of more intelligent applications for aggregating, searching, and browsing the Web," said Peter Mika from Yahoo! Research.

Content using the Common Tag format becomes more discoverable over time as more and more content related to a specific concept is accessible through a single tag. As application developers begin to recognize or deploy offerings using Common Tag formatting, they'll deliver more related content to their users and in turn drive more traffic to publishers who use the Common Tag format. Services like DERI's Sindice.com provide developers with tools to find and incorporate related content into their applications using Common Tag. Google (through its Rich Snippets offering) and Yahoo! (through its SearchMonkey platform) already read RDFa (the markup standard used by the Common Tag format) to acquire information about sites. Learning as much as we can now and planning for support of the Common Tag format could pay dividends in the future.

Applications on the Web today could also use the Common Tag format to connect users. Since all content related to a particular concept can be connected to and organized by a single tag, support of the Common Tag metadata could conceivably help connect entire concepts to one another. This would allow publishers and developers to present end users with even more related content. For example, AdaptiveBlue's Glue service (a free browser plugin that helps users find content and media through friends as popular sites are browsed) plans to use the Common Tag format to help connect end users to other people with similar interests and to other related content across the Web.

The organizations that developed the Common Tag format offer a range of services to help publishers and bloggers take advantage of semantic tagging: a standard and extensible set of tags, simple tools to relate those tags to Web page content, and services that help users discover tagged content from other sites and popular search engines.

To tag their content using Common Tag, publishers can use automated tagging tools like those offered by Zemanta, or tag their content themselves. Zemanta is a service that enhances submitted content by analyzing it and returning relevant metadata (tags, entities, categories) and content enhancements (related article and image links). The service stores the derived metadata and content enhancements and makes them available to others, provided they possess the appropriate Request ID (RID). The service also stores submitted content, except in certain cases when handling sensitive information. Social tagging services like Faviki (which combines social bookmarking and Wikipedia) and Zigtag also allow end users to tag content using the Common Tag format.

Yahoo! has long been a proponent of tagging and of open formats (such as Microformats) that accelerate the structuring of the Web and improve the community's overall ability to understand it. The problem is we've been there and done that. Who is to say that we need a hierarchy and an authority for what it all means? And who appointed Common Tag to be that authority? There are existing solutions such as OpenCalais, which is accessible through offerings such as the Calais Modules for Drupal or Tagaroo, a Calais plugin for WordPress blogs.

Calais QuickMeta is a small code snippet you insert in your site's template. And, 99.99 percent of the time it does ... nothing. But when a metadata crawler such as Yahoo! comes along, it wakes up and goes into action. It automatically tags the content on your page, formats those tags in the microformats Yahoo! understands, and hands them back to Yahoo! for harvesting. Net result? You're part of the Semantic Web.

Tags are expressed using RDFa, a standard format for defining data in HTML. Relevant code can be found in the Common Tag Quick Start Guide. Interested parties can learn more in the Yahoo! Common Tag group. The challenge for Common Tag will be ensuring that it does not require more work on the part of publishers. Fortunately, a few tools already exist to facilitate development in and around Common Tag.
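To give a rough feel for what RDFa-based tags look like to a program, here is a minimal Python sketch that pulls tags out of a page. The sample markup follows the general pattern I recall from the Quick Start Guide (a ctag prefix, typeof="ctag:Tag", a "means" link, and a label), but treat the exact attribute layout as an assumption and check the guide before relying on it. The concept URI is an example.org placeholder, CommonTagExtractor and extract_common_tags are hypothetical names, and the code does simple attribute matching with the standard library rather than real RDFa processing.

```python
from html.parser import HTMLParser

# Sample page fragment. The attribute layout is an assumption modeled on the
# Quick Start Guide pattern; the concept URI is a made-up placeholder.
SAMPLE_HTML = """
<body xmlns:ctag="http://commontag.org/ns#" rel="ctag:tagged">
  <a typeof="ctag:Tag" rel="ctag:means"
     href="http://example.org/concept/new-york-city"
     property="ctag:label" content="New York City">NYC</a>
  <a href="http://example.org/about">About</a>
</body>
"""


class CommonTagExtractor(HTMLParser):
    """Collects elements marked typeof="ctag:Tag" (not a full RDFa parser)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Ordinary links and other elements are skipped.
        if attributes.get("typeof") != "ctag:Tag":
            return
        self.tags.append({
            "means": attributes.get("href"),     # the concept the tag refers to
            "label": attributes.get("content"),  # the human-readable label
        })


def extract_common_tags(html):
    """Return a list of {"means", "label"} dicts found in the HTML."""
    parser = CommonTagExtractor()
    parser.feed(html)
    return parser.tags


found = extract_common_tags(SAMPLE_HTML)
```

A production consumer would use a proper RDFa processor that resolves the ctag prefix to its namespace instead of matching the literal string, but the sketch shows why tagged content is easy for machines to pick up: the concept and its label sit in predictable attributes.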