SEO Fever

Search Engine Optimization
for the World Wide Web

Everything you've always wanted to know about
search engine optimization, but were afraid to ask.

Latent Semantic Indexing (LSI)

What is LSI?

Latent Semantic Indexing (LSI) is, in a nutshell, a technology for modelling the relationships between words. LSI is an important step in the document indexing process: it examines a collection of documents and produces a result set based on the similarity between them.

Documents that have many words in common are said to be semantically close, while those with few words in common are semantically distant. Because LSI places additional weight on related words in the content, and on words that occupy similar positions in other related documents, its net effect is to lower the value of pages that match only the specific search term and do not back it up with related terms.
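To make "semantically close" concrete, here is a minimal Python sketch that scores word overlap between documents using cosine similarity. It is only an illustration: the sample documents and the function names (term_vector, cosine_similarity) are invented for this example, and full LSI goes further by applying a singular value decomposition to the term-document matrix so that related words, not just identical ones, pull documents together.

```python
import math
from collections import Counter


def term_vector(text):
    """Lowercase the text, split on whitespace, and count each word."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    shared = a.keys() & b.keys()
    dot = sum(a[word] * b[word] for word in shared)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, with stop words already removed.
docs = {
    "dental_a": "dentist cleans teeth checks tooth",
    "dental_b": "dentist examines teeth oral health",
    "cars": "mechanic repairs engine brakes tires",
}
vectors = {name: term_vector(text) for name, text in docs.items()}

close = cosine_similarity(vectors["dental_a"], vectors["dental_b"])
distant = cosine_similarity(vectors["dental_a"], vectors["cars"])
print(f"dental_a vs dental_b: {close:.2f}")
print(f"dental_a vs cars:     {distant:.2f}")
```

The two dental documents share 'dentist' and 'teeth' and so score well above zero, while the dental and car documents share no words and score zero.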

Searching Google for a word preceded by a tilde (~) shows you which words Google believes are related to it (not necessarily synonyms). This data is collected during the 'thesaurus lookup' stage of query processing. For example, a search for ‘~dental’ will have ‘teeth’, ‘tooth’, ‘dentist’, ‘dentistry’, and ‘oral’ all highlighted in the SERPs. This is possible because all of these words appear in a multitude of semantically close documents relating to the dental industry.
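The idea that related words surface because they appear together in many documents can be sketched with a simple co-occurrence count. This is a toy approximation only, not a description of how Google builds its related-word data: the function name related_terms and the sample documents are assumptions made for the example.

```python
from collections import Counter


def related_terms(term, documents, top_n=3):
    """Return the words that most often share a document with `term`."""
    counts = Counter()
    for text in documents:
        words = set(text.lower().split())
        if term in words:
            # Count every other word in a document that contains the term.
            counts.update(words - {term})
    # Sort by descending count, then alphabetically, for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]


# Hypothetical mini-collection of documents.
documents = [
    "dental clinic dentist teeth",
    "dental dentist tooth oral",
    "dental teeth dentist oral",
    "car engine mechanic",
]

related = related_terms("dental", documents)
print(related)
```

Words from the unrelated car document never appear in the result, because that document does not contain the query term.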

For further reading on latent semantic indexing, please visit the following links:

Wikipedia's LSI/LSA Overview Page
SEOBook's excellent 'Patterns in Unstructured Data'
Telcordia's LSI Papers
Using Latent Semantic Indexing for Information Filtering

3 December 2008