Saturday, April 17, 2010

Correlation coefficient viewed as a cosine theta measure

Although I didn't mention it explicitly in class, the Pearson correlation coefficient can be seen as the
vector similarity (cosine theta) between "centered rating vectors".

Suppose the two rating vectors are

[r11 r12 r13 r14]

and

[r21 r22 r23 r24]

Centering means subtracting the mean of the vector from each of its entries.

let r1 be the mean of r11..r14  and r2 be the mean of r21 ..r24

then centered vectors are

[r11-r1  r12-r1 r13-r1 r14-r1]

[r21-r2  r22-r2 r23-r2  r24-r2]

now if you take the cosine theta metric between these two vectors, you get 
the dot product divided by the product of the norms of the two vectors.

dot product will be of the form  [r11-r1]*[r21-r2]+ ...+[r14-r1]*[r24-r2]

this is the numerator of the Pearson correlation coefficient.

the norm of the first vector is 
sqrt[(r11-r1)^2 + ... + (r14-r1)^2]

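which is the square root of the sum of squared deviations of the first vector, i.e. sqrt(n) times its standard deviation (n = 4 here).
The norm of the second vector has the same form, and the product of the two norms is the denominator of the Pearson correlation coefficient.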

QED
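
If you want to check the argument numerically, here is a minimal sketch in Python. The function names (centered, cosine, pearson) and the two rating vectors are made up for illustration; they are not from the class material. The pearson function is written in the covariance / (std dev * std dev) form, so the comparison is not circular.

```python
import math

def centered(v):
    # Subtract the mean of the vector from each of its entries.
    mean = sum(v) / len(v)
    return [x - mean for x in v]

def cosine(u, v):
    # Dot product divided by the product of the two norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(u, v):
    # Covariance divided by the product of the standard deviations
    # (population versions, dividing by n; the n's cancel in the ratio).
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / n
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u) / n)
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v) / n)
    return cov / (sd_u * sd_v)

# Hypothetical rating vectors, chosen only for illustration.
ratings_1 = [5.0, 3.0, 4.0, 1.0]
ratings_2 = [4.0, 2.0, 5.0, 2.0]

cos_centered = cosine(centered(ratings_1), centered(ratings_2))
corr = pearson(ratings_1, ratings_2)
print(cos_centered, corr)  # the two values agree up to floating-point rounding
```

Note that the cosine of the raw (uncentered) vectors will in general be a different number; the equality only holds after centering.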

Rao
