To Catch a [Content] Thief
By Dennis Gaskill
As most webmasters know, online content theft is a widespread problem. But you can do something about it. Begin by finding the thief using these three methods:
Method One: All content has unique text passages. By searching Google for an exact phrase, you can easily find other pages duplicating your content.
Copy a distinctive sentence from your content and paste it into Google with quotation marks around the phrase, then search. Only pages with that exact sentence will be returned. Visit each returned page to verify that it has your content. Should you see many pages returned that do not have your content, choose a more distinctive text passage to search.
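If you check many pages, you may want to build the quoted search by script. Here is a minimal sketch of one way to do it; the function name is made up for illustration, and it assumes Google’s ordinary search URL with the query in the q parameter.

```python
from urllib.parse import quote_plus


def exact_phrase_search_url(sentence):
    """Return a Google search URL that looks for `sentence` as an exact phrase."""
    # Collapse runs of whitespace and drop embedded double quotes,
    # since the whole phrase is wrapped in one pair of quotes.
    phrase = " ".join(sentence.split()).replace('"', "")
    return "https://www.google.com/search?q=" + quote_plus('"' + phrase + '"')


print(exact_phrase_search_url("Creating quality content is hard work."))
```

Paste the printed address into your browser, or save one per article so you can repeat the check every few weeks.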
Method Two: Most content thieves are lazy. They’ll often view your source code to steal content, and take it all.
By placing a hidden link in the middle of a long paragraph (where it’s less likely to be noticed) you can easily track these thieves. CSS lets you hide links from view, without running afoul of the search engines.
To hide a link acceptably, set the CSS display property to “none” in the link’s style attribute and keep normal link text inside the link.
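The markup itself is short. The sketch below only illustrates the idea and is not the author’s original example; the function name, the page address, and the link text are all placeholders you would replace with your own.

```python
from html import escape


def hidden_link(href, text):
    """Return an anchor tag that is hidden from visitors with display:none."""
    # The inline style hides the link; the link text is kept on purpose.
    return '<a href="{}" style="display:none">{}</a>'.format(
        escape(href, quote=True), escape(text)
    )


# Placeholder URL and text, for illustration only.
print(hidden_link("http://www.example.com/widget-notes.html", "widget notes"))
```

Drop the printed tag into the middle of a long paragraph in your page source.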
You might be tempted to omit the style code and link text so the link would be shorter (and thus less likely to be ferreted out by the thief), but search engines treat that as an illegitimate hidden link.
With that in your code you can now use Google to find pages that link to that specific page. Since nothing else links to it, those pages should all belong to content thieves. To search for links to the page, enter link: followed by the full URL of the page into Google’s search box.
You should create a page with the page name you use in your hidden link; otherwise it will appear to Google that your own content contains a broken link within your website.
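As a rough sketch, the link: search can be built the same way as any other query. The function name and the trap page address are placeholders, and support for the link: operator has changed over the years, so treat the results as a hint and not a complete list.

```python
from urllib.parse import quote_plus


def link_search_url(trap_page_url):
    """Return a Google search URL asking for pages that link to trap_page_url."""
    # The query is the link: operator followed directly by the full URL.
    query = "link:" + trap_page_url
    return "https://www.google.com/search?q=" + quote_plus(query)


# Placeholder address for the page named in the hidden link.
print(link_search_url("http://www.example.com/widget-notes.html"))
```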
Also use a robots.txt file and noindex metatag to keep Google from indexing the page. Don’t link to it from other pages on your site for your visitors to find; you only want to use it to catch content thieves.
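Before relying on the rule, it may help to confirm that your robots.txt really blocks the trap page. This is a sketch using Python’s standard robots.txt parser; the file contents, page name, and helper name are assumptions for illustration, and the metatag is shown only as a string you would paste into the trap page’s head section.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents that block only the trap page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /widget-notes.html
"""

# Metatag to place in the head section of the trap page.
NOINDEX_META = '<meta name="robots" content="noindex">'


def is_blocked(robots_txt, url, agent="Googlebot"):
    """Return True if robots_txt tells `agent` not to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


print(is_blocked(ROBOTS_TXT, "http://www.example.com/widget-notes.html"))
print(is_blocked(ROBOTS_TXT, "http://www.example.com/index.html"))
```

The first check should report the trap page as blocked and the second should leave your normal pages alone.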
Method Three: This method doesn’t rely on search engines, but uses your website’s log files. Many content thieves are too lazy to even look through your source code line by line. They copy and paste your code and, if it looks good on the visible page, they consider the job done.
Somewhere in your content, perhaps in the middle of a long paragraph so it blends with the text, use a transparent GIF image as a spacer between words. The page will look normal in a browser.
If someone copies and pastes your content (code and all) into their own page, every visitor’s browser will request the transparent GIF from your server whenever the offender’s page loads. You must use the full URL to the image in your code; with a relative URL the browser would look for the image on the offender’s server instead.
Then, by checking the referrer entries in your website’s log files, you can see which pages are requesting the GIF image. Any referring page that isn’t yours is worth a visit, and there you will find another thief.
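To make sure the image address is a full URL, you could let a small script build the tag. This is only a sketch; the function name, the site address, and the image file name are placeholders.

```python
from urllib.parse import urljoin, urlparse


def tracking_img_tag(site_root, image_path):
    """Return an img tag whose src is a full URL on your own server."""
    src = urljoin(site_root, image_path)
    # Without a host name the copied tag would point at the thief's server.
    if not urlparse(src).netloc:
        raise ValueError("image URL must include your host name")
    return '<img src="{}" width="1" height="1" alt="">'.format(src)


# Placeholder site and file name, for illustration only.
print(tracking_img_tag("http://www.example.com/", "images/spacer-x7.gif"))
```

Giving the GIF a file name used nowhere else on your site makes its log entries easy to pick out.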
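Reading a large log by eye gets tedious, so here is one way the check might be scripted. It assumes your server writes the common “combined” log format, which records the referring page in quotes after the status code and size; the function name, host names, file name, and sample lines are invented for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Matches the request, status, size, and referrer fields of a combined-format line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)


def foreign_referrers(log_lines, gif_path, own_host):
    """Count pages on other hosts whose visitors requested gif_path."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or match.group("path") != gif_path:
            continue
        referrer = match.group("referrer")
        host = urlparse(referrer).netloc.lower()
        # Skip empty referrers ("-") and requests coming from your own pages.
        if host and host != own_host.lower():
            hits[referrer] += 1
    return hits


# Made-up sample lines; in practice you would read your real log file.
sample = [
    '203.0.113.5 - - [10/Oct/2006:13:55:36 -0700] "GET /images/spacer-x7.gif HTTP/1.1" 200 43 "http://www.example.com/article.html" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2006:14:02:11 -0700] "GET /images/spacer-x7.gif HTTP/1.1" 200 43 "http://copycat.example.net/stolen.html" "Mozilla/5.0"',
    '198.51.100.9 - - [10/Oct/2006:14:05:40 -0700] "GET /images/spacer-x7.gif HTTP/1.1" 200 43 "-" "Mozilla/5.0"',
]

for page, count in foreign_referrers(sample, "/images/spacer-x7.gif", "www.example.com").items():
    print(count, page)
```

Some visitors send no referrer at all, so a quiet log does not prove that nobody copied your page.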
I have used all three methods outlined above, and they all work. However, I prefer Method One because it does not rely on the laziness of the offender.
If You Find a Thief: Look for contact information and serve notice that your copyrighted material is being used illegally and ask them to remove it. The content thief, contact person, and website owner can all be different people, so start friendly but firm. If that fails, use http://www.whois.net to find the site owner and contact him or her directly.
WhoIs data will also include the site’s name server. Look that up in WhoIs to find the hosting company and inform them of their customer’s copyright violations. If those actions don’t work, get tougher.
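If you save the WhoIs output as text, a few lines of script can pull out the name servers for you. This sketch assumes the record labels them “Name Server:”, which many but not all registries do; the function names and the sample record are made up, and the domain guess is deliberately naive.

```python
import re


def name_servers(whois_text):
    """Return the name servers listed in a WhoIs record, lowercased, in order."""
    found = re.findall(r"(?im)^\s*Name Server:\s*(\S+)", whois_text)
    servers = []
    for server in found:
        server = server.lower().rstrip(".")
        if server not in servers:
            servers.append(server)
    return servers


def parent_domain(host):
    """Naive guess at the domain to look up next (last two labels only)."""
    return ".".join(host.split(".")[-2:])


# Invented record, for illustration only.
SAMPLE_RECORD = """\
Domain Name: COPYCAT-EXAMPLE.NET
Name Server: NS1.HOSTING-EXAMPLE.COM
Name Server: NS2.HOSTING-EXAMPLE.COM
"""

for server in name_servers(SAMPLE_RECORD):
    print(server, "->", parent_domain(server))
```

Look up the printed domain in WhoIs to find the hosting company. The two-label guess will be wrong for addresses such as those ending in .co.uk, so check it by eye.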
WhoIs data also includes the site owner’s mailing address. Send a formal “Cease and Desist” letter to the owner, then file a notice of Digital Millennium Copyright Act (DMCA) infringement with search engines and directories to have the offending site removed from their databases. Use the link: search mentioned in Method Two to find other sites that link to the violator.
For proof of infringement, try http://web.archive.org/ to show that the content appeared on your site prior to the offender’s site. If all else fails, you may have to hire an attorney.
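To compare dates, you can open the archive’s history for both your page and the offender’s. The sketch below assumes the Wayback Machine’s usual address layout, where a timestamp or an asterisk sits between /web/ and the page address; the function names and page addresses are placeholders.

```python
def wayback_history_url(page_url):
    """Return the address of the archive's list of all captures of page_url."""
    return "http://web.archive.org/web/*/" + page_url


def wayback_capture_url(page_url, timestamp):
    """Return the address of the capture closest to timestamp (YYYYMMDD)."""
    return "http://web.archive.org/web/{}/{}".format(timestamp, page_url)


# Placeholder addresses, for illustration only.
print(wayback_history_url("http://www.example.com/article.html"))
print(wayback_capture_url("http://copycat.example.net/stolen.html", "20060101"))
```

The earliest capture shown for each page gives you dates to quote in your notice.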
Creating quality content is hard work. Don’t let others prosper at your expense.
About the Author: Dennis Gaskill has been a full-time internet entrepreneur since 1999 and is the author of the print book “Web Site Design Made Easy,” now in its third edition. “Web Site Design Made Easy” is the teaching text at hundreds of colleges nationwide. Visit http://www.BoogieJack.com for products and information of interest to webmasters of small and home businesses, as well as the hobbyist webmaster.