Tuesday, January 26, 2010

Re: does google focus on generating diverse results...?

First of all, my characterization of the relevance function in class is a normative one: I am talking about how we should be doing ranking, not about how any specific IR program (let alone Google) does it.

As for Google, there is no claim, proof, or statement on Google's part that they actually take result diversity into account. About the only research paper on Google is from 1998, and most of what Google currently does is not publicly documented anywhere.

That said, it does seem *empirically* that Google's ranking takes result diversity into account somehow. The reason you see two links to "Recent papers from yochan" in the results is easy to explain.
The page is presented once in the sub-menu indented below the first result (what Google calls "site links"), and once again as the next regular result. I believe this happens because site links are generated by a process completely orthogonal to result ranking: once a particular page is ranked at the top, its site links, if it has any, are simply output.
Here is a link that explains how "site links" work.
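To make the explanation concrete, here is a toy sketch of such a pipeline. It is purely hypothetical (it is not a documented description of Google, and the function render_results and the page names are made up): the ranked list is computed first, and the top result's site links are appended afterwards without any comparison against the remaining results, so the same page can appear twice.

```python
from typing import Dict, List


def render_results(ranked: List[str], site_links: Dict[str, List[str]]) -> List[str]:
    """Hypothetical rendering step: ranking is already done, and site links
    for the top result are attached afterwards as an independent step."""
    page: List[str] = []
    for i, url in enumerate(ranked):
        page.append(url)
        if i == 0:
            # Site links are output as-is (indented); nothing compares them
            # with ranked[1:], so duplicates with later results are possible.
            page.extend("    " + link for link in site_links.get(url, []))
    return page


# Made-up example: the top page's site links include a page that the
# ranker also placed second in the regular results.
ranked = ["home", "recent-papers"]
site_links = {"home": ["recent-papers", "courses"]}
for line in render_results(ranked, site_links):
    print(line)
```

Even if the ranker itself diversified the regular results, a duplicate can reappear because the site-link step never sees the rest of the list.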

(who hopes this won't become a class about search engine optimization ;-)

PS: Here is a link to a research paper on the issues involved in generating diverse results efficiently. We will probably read and cover it at some point: http://www.wsdm2009.org/papers/p5-agrawal.pdf

On Tue, Jan 26, 2010 at 5:27 PM,  <dthiruma@asu.edu> wrote:
Hello Professor,

I have a question. As you discussed in class today, my understanding of the relevance function R(d | Q, U, {d1, d2, d3}) is that if you have already shown a relevant document d1 that is close to d, then d need not be displayed next (instead we can show some other document). This can be achieved by displaying documents that are maximally distinct from one another, or by clustering.
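One standard way to realize "don't show d next if something close to it was already shown" is greedy reranking with Maximal Marginal Relevance (MMR, Carbonell and Goldstein, 1998). This is a swapped-in, well-known technique, not necessarily the formulation used in class or by any search engine. The sketch assumes each document is a set of terms, relevance scores are precomputed, and closeness is Jaccard similarity; the names mmr_rerank, jaccard, and the trade-off parameter lam are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Closeness of two documents represented as term sets (assumed representation)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mmr_rerank(
    relevance: Dict[str, float],
    terms: Dict[str, Set[str]],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedily pick k documents, trading off relevance against closeness
    to the documents already shown (Maximal Marginal Relevance)."""
    shown: List[str] = []
    candidates = set(relevance)
    while candidates and len(shown) < k:
        def marginal(d: str) -> float:
            # Penalize d by its similarity to the closest already-shown document.
            redundancy = max((jaccard(terms[d], terms[s]) for s in shown), default=0.0)
            return lam * relevance[d] - (1 - lam) * redundancy
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(candidates), key=marginal)
        shown.append(best)
        candidates.remove(best)
    return shown


# Made-up data: d1_copy is a near-duplicate of d1.
relevance = {"d1": 0.9, "d1_copy": 0.88, "d2": 0.6}
terms = {
    "d1": {"recent", "papers", "yochan"},
    "d1_copy": {"recent", "papers", "yochan"},
    "d2": {"kambhampati", "home"},
}
print(mmr_rerank(relevance, terms, k=2))  # → ['d1', 'd2']
```

With lam=1.0 the penalty vanishes and the result is plain relevance ranking (['d1', 'd1_copy']); lowering lam pushes near-duplicates further down.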

My question is (or rather, I was wondering):

When I searched for "Kambhampati" on Google today, it displayed "Recent papers from yochan" twice in the first two results (both pointing to the same link). Does that mean Google does not check for closeness to the documents it has already displayed? I am confused.

Dananjayan Thirumalai

