By Dennis Gaskill
As most webmasters know, online content theft is a widespread problem. But you can
do something about it. Begin by finding the thief using these three methods:
Method One:
All content has unique text passages. By using Google and an
exact-phrase search, you can easily find other pages duplicating
your content.
Copy a unique sentence from your content and paste it into
Google with quotation marks around the phrase, then search. Only
pages with that exact sentence will be returned. Visit the returned
pages to verify that they contain your content. If many of the pages
returned do not have your content, choose a more distinctive text
passage to search.
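If you check many pages regularly, building the search address by hand gets tedious. Here is a minimal Python sketch of the exact-phrase search described above. The function name is my own invention for illustration, and it assumes Google's standard search address with a q parameter.

```python
from urllib.parse import quote_plus


def exact_phrase_search_url(sentence: str) -> str:
    """Return a Google search URL that looks for `sentence` as an exact phrase.

    Hypothetical helper for illustration; not part of any library.
    """
    # Remove quotation marks inside the sentence and collapse whitespace,
    # because the whole phrase is wrapped in a single pair of quotes.
    phrase = " ".join(sentence.replace('"', "").split())
    # quote_plus encodes the quotes as %22 and the spaces as +.
    return "https://www.google.com/search?q=" + quote_plus(f'"{phrase}"')


if __name__ == "__main__":
    print(exact_phrase_search_url("Creating quality content is hard work."))
```

Paste the printed address into your browser to run the search.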
Method Two:
Most content thieves are lazy. They’ll often view your source
code to steal content, and take it all.
By placing a hidden link in the middle of a long paragraph
(where it’s less likely to be noticed) you can easily track these
thieves. CSS lets you hide links from view, without running afoul
of the search engines.
To legally hide links, use the CSS display property with the
value set to “none.” Here’s an example:
<a href="http://www.yoursite.com/i.htm" style="display: none;">link</a>
You might be tempted to omit the style code and link text so the
link would be shorter — thus less likely to be ferreted out by the
thief — but search engines consider that an illegal hidden link.
With that in your code you can now use Google to find pages
that link to that specific page, which will only be content thieves.
To search for links to the page, just enter the following into
Google’s search box:
link:http://www.yoursite.com/i.htm
You should create a page with the page name you use in your
hidden link or it will appear to Google that you have a broken link
within your website from your own content.
Also use a robots.txt file and noindex metatag to keep Google
from indexing the page. Don’t link to it from other pages on your
site for your visitors to find. You only want to use it to catch
content thieves.
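Once a search turns up a suspect page, you may want to confirm that its source really contains your tracking link. The following Python sketch does that check on a page's HTML using only the standard library. The class and function names are hypothetical, and the URL is the example address used in this article.

```python
from html.parser import HTMLParser

# The article's example tracking address; substitute your own.
TRACKING_URL = "http://www.yoursite.com/i.htm"


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def contains_tracking_link(html: str, tracking_url: str = TRACKING_URL) -> bool:
    """Return True if any link in `html` points at `tracking_url`."""
    parser = LinkCollector()
    parser.feed(html)
    return tracking_url in parser.hrefs


if __name__ == "__main__":
    sample = '<p>Some copied text <a href="http://www.yoursite.com/i.htm">link</a> more text.</p>'
    print(contains_tracking_link(sample))
```

Save the suspect page's source to a file, read it into a string, and pass it to contains_tracking_link.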
Method Three:
The third method doesn’t rely on search engines, but uses your
website’s log files. Many content thieves are too lazy to even look
through your source code line by line. They copy and paste your
code and, if it looks good on the visible page, they consider the
job done.
Somewhere in your content, perhaps in the middle of a long
paragraph so it blends with the text, use a transparent GIF image
as a spacer between words. The page will look normal in a browser.
If someone copies and pastes your content (code and all) into
their own page, the transparent GIF will be requested from your
server every time a visitor loads the offender’s page. For this to
work, you must use the full URL to the image, not a relative path.
Then, by checking the referrer entries in your website’s log files,
you can see which pages are requesting the GIF image. Go to any
page that isn’t yours and you will find another thief.
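Reading log files by eye is slow, so here is a rough Python sketch of the log check. It assumes your server writes the common "combined" log format, where the referring page appears in quotes after the status code and byte count. The function name, image path, and sample log lines are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Matches the request, status, size, and referrer fields of a
# combined-format access log line (an assumption about your server).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)


def foreign_referrers(log_lines, image_path, own_host):
    """Return sorted referrer URLs on other hosts that requested image_path."""
    found = set()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or match.group("path") != image_path:
            continue
        # A referrer of "-" has no hostname and is skipped.
        host = urlsplit(match.group("referer")).hostname
        if host and host != own_host:
            found.add(match.group("referer"))
    return sorted(found)


if __name__ == "__main__":
    sample_log = [
        '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /images/spacer.gif HTTP/1.1" '
        '200 43 "http://thief.example/copied.html" "Mozilla/5.0"',
        '198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /images/spacer.gif HTTP/1.1" '
        '200 43 "http://www.yoursite.com/article.htm" "Mozilla/5.0"',
    ]
    for url in foreign_referrers(sample_log, "/images/spacer.gif", "www.yoursite.com"):
        print(url)
```

Keep in mind that some browsers withhold the referrer, so an empty result does not prove your content is safe.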
I have used all three methods outlined above, and they all
work. However, I prefer Method One because it does not rely on the
laziness of the offender.
If You Find a Thief: Look for contact information and serve
notice that your copyrighted material is being used illegally and ask
them to remove it. The content thief, contact person, and website
owner can all be different people, so start friendly but firm. If that
fails, use http://www.whois.net to find the site owner and contact
him or her directly.
WhoIs data will also include the site’s name server. Look that up
in WhoIs to find the hosting company and inform them of their customer’s
copyright violations. If those actions don’t work, get tougher.
WhoIs data also includes the site owner’s mailing address. Send
a formal “Cease and Desist” letter to the owner, then file a notice of
Digital Millennium Copyright Act (DMCA) infringement with search
engines and directories to have the offending site removed from
their databases. Use the link: search mentioned in Method Two to find
other sites that link to the violator.
For proof of infringement, try http://web.archive.org/ to show
that the content appeared on your site prior to the offender’s site.
If all else fails, you may have to hire an attorney.
Creating quality content is hard work. Don’t let others prosper at
your expense.
About the Author: Dennis Gaskill has been a full-time internet entrepreneur since 1999 and is the author of the print book “Web Site Design Made Easy,” now in its third edition.
“Web Site Design Made Easy” is the teaching text at hundreds of colleges nationwide. Visit http://www.BoogieJack.com for products and information of interest to webmasters of small and home businesses, as well as the hobbyist webmaster.