SEO Fever

Search Engine Optimization
for the World Wide Web

Everything you've always wanted to know about
search engine optimization, but were afraid to ask.

Home>> SEO Knowledge Base >> Inverted Index Basics

Free SEO Tips RSS Feed

Inverted Index Basics

Search engine spiders will gather (crawl) every webpage on the internet by following links from page to page. They then extract all the content from the page and insert the content into their database in a specific location, where it can be easily located.

Each document is converted into a set of word occurrences called hits. The hits record the word, position(s) in document, an approximation of font size, and capitalization. Hit lists account for most of the space used in Google’s inverted index.

Let’s now look at a basic example of an inverted index. Say we’ve got three phrases as follows:

P0: “it is what it is”
P1: “what is it”
P2: “it is a banana”

This can be represented in an inverted index as follows:

where (x, y) can be read as (phrase number, word position in phrase).

"a": (2, 2)
"banana": (2, 3)
"is": (0, 1), (0, 4), (1, 1), (2, 1)
"it": (0, 0), (0, 3), (1, 2), (2, 0)
"what": (0, 2), (1, 0)

Note: word position starts at zero, so the word “what” occupies position 0 in phrase P1 (represented in bold above and diagrammatically below).

Phrase (P1) what is it
Position 0 1 2


For more on inverted indices please visit the following:

Definition of Inverted Index
Building an Inverted Index

19 December 2008