What TF-IDF Means for On-Page SEO
:: By Travis Bliffen, Stellar SEO ::
"Tf-Idf" is a term used in the field of information retrieval. TF in full is Term Frequency. It means the number of times a term appears in a document. In most cases a term appears more times in a long document than in a short document. When calculating term frequency (tf), you take the term and divide it by the number of words in the document. An example is when the number of terms in a document is 200 and a term has been used 4 times. This is the tf; 4/200 = 0.02. Those of you who have been working in SEO for a while should recognize that formula from calculating keyword density.
The latter portion, IDF, stands for Inverse Document Frequency. Inverse document frequency measures how important a term is across the corpus (the set of documents being compared). Some terms are used numerous times throughout the corpus even though they are of little importance. Inverse document frequency is meant to reduce the weight of these kinds of terms, giving more weight to rare but important words. The Tf-Idf weight is the product of the term frequency and the inverse document frequency. Information retrieval methods have changed a lot over the years. The simple statistical methods used years ago have been replaced by faster and more effective ones.
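To make the weighting concrete, here is a rough Python sketch of the Tf-Idf product described above. It assumes the basic form idf = log(N / n), where N is the number of documents in the corpus and n is the number of documents that contain the term. Tools differ in the logarithm base and in smoothing, so the exact numbers are illustrative only. The three-document corpus and all function names are made up for the example.

```python
import math
import re

def tokenize(text):
    # Assumed tokenization: lowercase runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())

def tf(term, words):
    return words.count(term) / len(words) if words else 0.0

def idf(term, corpus_words):
    # Number of documents in the corpus that contain the term at least once.
    containing = sum(1 for words in corpus_words if term in words)
    if containing == 0:
        return 0.0  # assumption: a term missing from the corpus gets no weight
    return math.log(len(corpus_words) / containing)

def tf_idf(term, words, corpus_words):
    return tf(term, words) * idf(term, corpus_words)

# Hypothetical corpus of three short "pages".
corpus = [
    "the best local seo services for the small business",
    "the agency offers web design and the hosting",
    "the guide to local seo and the citations",
]
corpus_words = [tokenize(doc) for doc in corpus]

common = tf_idf("the", corpus_words[0], corpus_words)     # in every document
rare = tf_idf("services", corpus_words[0], corpus_words)  # in one document
print(common, rare)
```

Even though "the" is the most frequent word on the first page, it appears in every document, so its idf is log(3/3) = 0 and its weight drops to zero. The less frequent "services" ends up with the higher score.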
Nowadays keyword density isn’t as important as it was a few years ago, either. Google’s algorithms have become more complex. Overuse of keywords in your document, popularly known as keyword stuffing, is more likely to earn penalties than better rankings. Tf-Idf calculations will, however, allow you to optimize the overall topic of your content using competitive analysis, just as you would have a few years back using keyword density comparisons.
How to Calculate Tf-Idf
If you love math and want to manually calculate Tf-Idf ratios for your content, Wikipedia has the formula mapped out with an example calculation. If, on the other hand, you would rather skip the math, onpage.org has an awesome tool in its dashboard that will give you the ratios for the top sites for any keyword.
As you can see above, you are given the term frequency and the Tf-Idf ratios in an easily digestible layout. Even better, they have a free option for a single URL so you can try the tool.
How to Use Tf-Idf to Your Advantage
Once you determine the ratios prevalent in your niche, how can you use this to your advantage? Even though Tf-Idf is a more advanced metric than keyword density, it is still easy to implement, especially using the onpage.org tool. As you see above, the first column listed is “documents”. Just above that is the data set used for the calculations (the corpus). In our case, it is the top 15 pages for the keyword “local SEO services”. This is essentially your “go to” list for LSI keywords and implementation ratios.
Referring to the above image again, notice that “Local Splash” appears in only two (13 percent) of the searched documents, while “local seo” appears in 93 percent of them. If you were missing “local seo” from your page, this is a clear sign to add it. While that example is pretty obvious, I think you can see how this could uncover a series of related terms that could be used in your content. Further down the same list, the term “digital marketing” shows up in 40 percent of the searched documents. It is a related term, but one you may not have in your content.
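The “documents” percentages discussed above are simply the share of pages in the corpus that contain a phrase. A small Python sketch of that calculation follows. The 15 placeholder pages are synthetic, built only so the shares match the figures quoted in this article, and the function name document_share and the simple substring matching are assumptions rather than how the onpage.org tool works internally.

```python
def document_share(phrase, documents):
    """Percentage of documents that contain the phrase (case-insensitive)."""
    phrase = phrase.lower()
    hits = sum(1 for doc in documents if phrase in doc.lower())
    return 100 * hits / len(documents)

# Synthetic stand-ins for the top 15 pages; NOT real page content.
documents = []
for i in range(15):
    parts = ["page", str(i)]
    if i < 14:
        parts.append("local seo")
    if i < 2:
        parts.append("local splash")
    if i < 6:
        parts.append("digital marketing")
    documents.append(" ".join(parts))

shares = {
    phrase: round(document_share(phrase, documents))
    for phrase in ("local seo", "local splash", "digital marketing")
}
print(shares)  # {'local seo': 93, 'local splash': 13, 'digital marketing': 40}

# Phrases most competitors use that a hypothetical page of yours is missing.
my_page = "We offer digital marketing for small businesses."
missing = [p for p, share in shares.items()
           if share >= 50 and p not in my_page.lower()]
print(missing)  # ['local seo']
```

The 50 percent cutoff is an arbitrary choice for the example. In practice, you would pick a threshold that fits your niche.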
The next part to consider is the ratio. Again, you could do this manually with the formula shown on Wikipedia or you could save a lot of time and hassle by using another tool from onpage.org.
As you can see above, this tool allows you to input your text for a specific keyword and then suggests terms that could be added, used more, or used less based upon your competition.
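For readers curious how an “add, use more, or use less” suggestion could be derived, here is one possible sketch in Python. It is not the onpage.org tool's actual logic, which is not documented here. It simply compares your term frequency for a single-word term with the average across competitor pages. The function names, the tolerance parameter, and the sample texts are all hypothetical.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def tf(term, words):
    return words.count(term) / len(words) if words else 0.0

def suggest(term, my_text, competitor_texts, tolerance=0.5):
    """Compare your term frequency with the competitor average."""
    if not competitor_texts:
        raise ValueError("need at least one competitor text")
    mine = tf(term, tokenize(my_text))
    rates = [tf(term, tokenize(text)) for text in competitor_texts]
    average = sum(rates) / len(rates)
    if average == 0:
        return "ok"  # competitors do not use the term, so nothing to match
    if mine == 0:
        return "add"
    if mine < average * (1 - tolerance):
        return "use more"
    if mine > average * (1 + tolerance):
        return "use less"
    return "ok"

# Hypothetical competitor snippets.
competitors = [
    "local seo services for local business owners",
    "local seo tips and local citations guide",
]

print(suggest("local", "we build websites and offer seo", competitors))  # add
print(suggest("seo", "seo seo seo seo agency", competitors))  # use less
```

A tolerance of 0.5 means anything within 50 percent of the competitor average is left alone. A tighter value would produce more suggestions.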
SEMrush, another one of my favorite tools, offers a similar feature under “seo ideas” that suggests related terms you can add to a page to better optimize it.
So, as you can see, content optimization is getting closer to a science every day. Hopefully this overview of Tf-Idf has you thinking of ways to better prepare your SEO strategy for the future.
Over to You…
Are any of you experimenting with Tf-Idf for on-page optimization?
About the Author
Travis Bliffen is the founder of Stellar SEO, a Web design and marketing firm located in Franklin, TN. Travis and his team are equipped to handle SEO projects of any size and have helped numerous businesses build a rock-solid online presence. When you are ready for more leads and sales, it is time to get #stellarized. Connect on Facebook or Twitter @theseoproz