Saturday, May 29, 2010

(with correct links): Fulton course/instructor evaluations


[The links should work now]


---------- Forwarded message ----------
From: Subbarao Kambhampati <rao@asu.edu>
Date: Sat, May 29, 2010 at 9:15 AM
Subject: Fulton course/instructor evaluations
To: Rao Kambhampati <rao@asu.edu>


Dear all:


I just received the results of the college teaching evaluations that you folks filled in and enjoyed reading them.

Thanks for taking the time to fill these out (the students who couldn't get around to it are of course expressly excluded from these thanks;
it is hoped that they will be riddled with guilt instead :)

It is my somewhat quixotic custom to give the students in the class access to the evaluations for a limited time. It might give you a feel for how your individual
views stacked up against the rest of the class. In keeping with it, here are links to the full evaluations--warts and all--in case you are interested:

http://rakaposhi.eas.asu.edu/tmp/494-s10-ug-evals.htm

http://rakaposhi.eas.asu.edu/tmp/494-s10-grad-evals.htm


If you have any other things you need to get off your chest regarding the course, feel free to let me know.

Otherwise, this will hopefully be the last communication on this mailing list.

Wishing you a relaxing Memorial Day weekend
Rao







Friday, May 21, 2010

Extra credit points...

Just as an FYI, here are the weighted extra credit points (over the four projects, 8.44 percentage points were available for extra credit; here is what
various people accumulated). As I said, the grade cutoffs are set only w.r.t. the regular cumulative (you have to trust me that I am not reading this
very email I am sending ;-)

Rao


==============
post id     Extra (max 8.44)

0182-085    1.75
2834-383    2.25
4227-658    0.00
1798-265    0.00
0983-622    4.39
8248-578    0.00
8414-781    1.11
0976-610    0.00
4263-811    0.00

2236-509    7.00
8069-301    3.08
6583-255    5.50
3175-812    5.97
9776-779    5.60
3625-775    5.67
0374-135    5.22
2075-379    5.86
8303-675    5.14
6377-700    6.58
6421-767    6.44
9686-775    2.60
7753-034    3.54
9247-014    5.43
1798-698    4.28
2000-612    4.78
9642-399    5.22
5150-913    6.14
6490-976    0.44
5481-043    2.25
0936-100    0.00

=============================================

Cumulatives--please let me know if you see any discrepancies...

Folks:

 Attached please find a pdf file with the cumulatives in rank order

I used the following procedure to compute the cumulative:

Projects      40pts (out of 100)
Exams         35pts (out of 100)
Homeworks     20pts (out of 100)
Participation  5pts (out of 100)

The three project parts and the demo were taken to be 10pts each.
Homeworks were each taken to be 5pts.
Exams were both taken to be 17.5pts each.
Participation grade is a numeric translation of the letter grade (in a 1-5 range).
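
The weighting above can be sketched in a few lines of Python. This is only an illustration: it assumes each component is scaled linearly by its maximum score before the weight is applied, and that there are four homeworks (20pts at 5pts each). The component names and the `cumulative` helper are hypothetical, not part of the actual grade book.

```python
# Weights (in cumulative points) as stated in the email; component names are made up.
WEIGHTS = {
    "project1": 10.0, "project2": 10.0, "project3": 10.0, "demo": 10.0,
    "hw1": 5.0, "hw2": 5.0, "hw3": 5.0, "hw4": 5.0,
    "midterm": 17.5, "final": 17.5,
    "participation": 5.0,
}

def cumulative(scores, max_scores):
    """Weighted cumulative out of 100, assuming linear scaling of each component."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        total += weight * scores[name] / max_scores[name]
    return total

# A student with full marks on everything should land on exactly 100.
full = {name: 40 for name in WEIGHTS}
print(cumulative(full, full))
```

Extra credit is deliberately left out of the sketch, since the cutoffs are set only on the regular cumulative.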


If you see any discrepancies or have reservations about the way I computed the cumulative, let me know ASAP.

regards
Rao

Participation grade

Here are the participation grades (the letters will be converted to numbers and combined with other scores to get the cumulative--which will be sent in a min)

regards
rao


post id     #absences   Participation (Qn+blog)   Pgrade   post id

0182-085    2           0+1                       B+       0182-085
2834-383    0           1+1                       A        2834-383
8414-781                                          B        8414-781
1798-265    20+         2+3                       D        1798-265
0976-610    0           2+2                       A        0976-610
0983-622    4?          5+4                       C+       0983-622
4227-658    1           0+1                       C+       4227-658
8248-578    2           8+0                       A-       8248-578
4263-811    6           3+1                       C        4263-811

2236-509    1(1)        1+3                       A        2236-509
6583-255    0           0+2                       A-       6583-255
8069-301    0           20+0                      A+       8069-301
2075-379    0           3+6                       A+       2075-379
3175-812    1           0+3                       A-       3175-812
0374-135    4(1)        1+2                       B        0374-135
2000-612    1(1)        6+5                       A-       2000-612
9776-779    0           3+2                       A-       9776-779
7753-034    0           12+5                      A        7753-034
3625-775    1           0+5                       A-       3625-775
9247-014    2(1)        10+10                     A        9247-014
6377-700    1(1)        30+5                      A+       6377-700
6421-767    0           3+2                       A        6421-767
8303-675    0           4+3                       A        8303-675
9686-775    1(1)        40+0                      A        9686-775
1798-698    2(1)        25+1                      A-       1798-698
5150-913    3(3)        1+3                       A-       5150-913
9642-399    1           7+0                       A-       9642-399
5481-043    0           10+3                      A        5481-043
6490-976    3(1)        2+0                       B+       6490-976
0936-100    0           0+5?                      A        0936-100

Grades for project part 3 and Demo

Here are the grades for project part 3 and demo that the TA sent:

===========
post id     Proj 3   P3 Ex   Demo   Dm Ex
(max)       40       10      40     6

0182-085    32       3       36     4
2834-383    39       3       35     6
8414-781    22       0       30     0
1798-265    19       0       34     0
0976-610    0        0       15     0
0983-622    28.5     6       32     0
4227-658    32       0       32     0
8248-578    29       0       33.5   0
4263-811    1.5      0

2236-509    40       6       40     6
6583-255    38       0       36     6
8069-301    39       2       40     5
2075-379    40       5       38.5   6
3175-812    35.5     9       40     6
0374-135    30.5     6       40     6
2000-612    32.5     6       33.5   6
9776-779    35       7.5     35     6
7753-034    39       6       34     5.5
3625-775    35.8     6       40     6
9247-014    33.5     3.5     37.5   4
6377-700    28       7       35.5   6
6421-767    38.5     10      34     6
8303-675    33.5     3       35     6
9686-775    37       4.5     36.5   5
1798-698    29       2       35     0
5150-913    32.5     7       35.5   6
9642-399    37.5     6       35     6
5481-043    29       3       29     6
6490-976    31.5     0       36.5   0
0936-100

Grades for the final exam in CSE 494

Folks:

 I finally completed grading the final. Here are the points:


post id     Final (max 90)

0182-085    36
2834-383    24
8414-781    24.5
1798-265    40
0976-610    33
0983-622    30
4227-658    40.5
8248-578    28
4263-811    7.5

============= 598 Section ====

2236-509    77.5
6583-255    66.5
8069-301    66
2075-379    38.5
3175-812    65.5
0374-135    60
2000-612    51
9776-779    74.5
7753-034    51
3625-775    61
9247-014    58.5
6377-700    66.5
6421-767    57
8303-675    61.5
9686-775    59.5
1798-698    63.5
5150-913    40
9642-399    60
5481-043    32.5
6490-976    31
0936-100    32

Tuesday, May 11, 2010

on grade posting time-frame..

Folks:

 Sorry I had to leave in the middle of the exam; hope you survived the rest without much pain.

 I am going to be out of town starting tomorrow until the 18th. I will try to post your grades as soon as I can, but I did get permission for late
posting just in case.

regards
rao

Saturday, May 8, 2010

Cheat sheet Re: Question regarding the scope of the final exam

Yes. One sheet 8.5x11 both sides

On Saturday, May 8, 2010, asael sorensen <Asael.Sorensen@asu.edu> wrote:
> Did you decide on whether we can use a cheat sheet or not?
> Ace Sorensen
> 602.633.5477
> acylt.com

Re: Question regarding the scope of the final exam

Comprehensive with bias towards post-midterm topics

rao


On Sat, May 8, 2010 at 12:14 PM, Jeff Zhang <xiaolong.zhang.1@asu.edu> wrote:
Hi Dr. Rao:

Someone asked this question in class once, but I forgot how you answered it. So I'm wondering if the final exam will be scoped towards contents after the mid-term or would it be comprehensive?

Thanks,
=============================
Jeff Zhang
Department of Computer Science Engineering
Arizona State University
699 S. Mill Ave Suite 371
Tempe, Arizona
Voice: (480)-208-5675

Wednesday, May 5, 2010

homework 4 solutions posted; acquired wisdom link posted

Folks:

 The solutions for homework 4 are online.

 I also put a link to the blog review of the course content (as posted by you) from the lecture notes section. I have read each one of them, and I would encourage you to do so yourself so you can get an idea of what you may have missed that others seem to have caught (or vice versa)

Rao


ps: Here is an interesting mini-project idea involving collaborative filtering to suggest ideas from the course to students
          1. Extract structured records from the postings
          2. Make a student-topic matrix
          3. Use collaborative filtering to recommend, for each student, one or two topics that they might like because students
               just like them seem to like those topics..
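
Steps 2 and 3 of the mini-project could look roughly like the following Python sketch. Everything here is assumed for illustration: the tiny student-topic matrix, the topic names, and the `recommend` helper are made up, and cosine similarity between students is just one reasonable choice of similarity measure.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(matrix, topics, student, k=2):
    """Suggest up to k topics the student did not mention, weighted by similar students."""
    target = matrix[student]
    scores = [0.0] * len(topics)
    for other, row in matrix.items():
        if other == student:
            continue
        sim = cosine(target, row)
        for j, liked in enumerate(row):
            if liked and not target[j]:
                scores[j] += sim
    candidates = [j for j in range(len(topics)) if scores[j] > 0]
    candidates.sort(key=lambda j: scores[j], reverse=True)
    return [topics[j] for j in candidates[:k]]

# Hypothetical student-topic matrix: 1 means the student listed the topic in the blog review.
topics = ["pagerank", "lsi", "clustering", "tf-idf"]
matrix = {
    "s1": [1, 1, 0, 0],
    "s2": [1, 1, 1, 0],
    "s3": [0, 0, 0, 1],
}
print(recommend(matrix, topics, "s1"))
```

Here s1 overlaps with s2 but not with s3, so only the topic that s2 liked and s1 did not mention is suggested.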


          

internship in IR and data mining with a startup company

Prof. Hasan Davulcu (HasanDavulcu@asu.edu) asked me to announce to the class that he has several internships in IR with a startup company that he believes the students of this class might be eligible for, and would be interested in.

If you are interested in finding out more, please send a note to him directly (mail address above)

regards
rao

Monday, May 3, 2010

Participation evaluation sheet

folks:

 Please note that you will need to fill out and turn in a hard copy of the enclosed participation evaluation sheet in tomorrow's class (I will bring a few blanks
just in case; but you might be better off filling it out at home so you have all the correct stats).

regards
Rao


Sunday, May 2, 2010

Re: Question about keyword search on RDF data

I think there are two parts to your question. One is what it means to do keyword search on structured data such as RDF. The second is whether keyword queries have to be converted into some form of structured queries just to run on RDF (or RDBMS).

This issue has received significant attention in the DB community (recall that RDF can be seen as just a format for Relational data), so let us start there.

Suppose I have a database, and a user gives a keyword query, what tuples in the database should be given as answers to this query? 

The answer is simple if the database has a single table--you just select all the rows that have the keywords in them (modulo some tf/idf extension)

However, most databases are "normalized"--that is, they split a wide "universal tuple" into many small tables. This means that you can have a situation where the keywords are spread over different rows in different tables. Suppose we have two tables, [sid, name] and [sid, hobby] (where sid is the student id). A keyword query "rao tennis" will now have to be answered by seeing if there is a join between these two tables over sid that gives a row which has both rao and tennis in it. [The issues remain the same whether the database is in RDF format or a normal RDBMS one.]

So, answering keyword queries will require you to either do arbitrary joins during the query processing stage, or *de-normalize* the database up-front so that you have the full universal relation in front of you (and so can do simple row selection). The former would require more work during query time (in as much as it will force you to rewrite the keyword query into a set of join queries), while the latter kills the structure to support keyword queries. 

The latter approach, as counter-intuitive as it might look for a database person--is actually the one that is used by most search engines that allow keyword access to large-scale databases (in fact, they write each individual universal tuple out as an html file. This process is given a fancy name--"Surfacing of the deep web"). 


The former approach--involving query-time joins to reconstruct universal tuples--started with a system called "BANKS". The problem becomes harder when the primary-key/foreign key joins between the various tables are lost, as might be the case for web databases. See http://rakaposhi.eas.asu.edu/smartint-icde10.pdf
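
To make the [sid, name] / [sid, hobby] example concrete, here is a minimal Python sketch of the up-front de-normalization route: join once on sid, then answer keyword queries by plain row selection. The table contents and the helper names (`denormalize`, `keyword_search`) are made up for illustration, and exact whole-field matching stands in for whatever tf/idf-style matching a real system would use.

```python
# Two normalized tables, as in the example: [sid, name] and [sid, hobby].
names = [(1, "rao"), (2, "siva")]
hobbies = [(1, "tennis"), (1, "chess"), (2, "chess")]

def denormalize(names, hobbies):
    """Join the two tables on sid up front, producing 'universal' rows."""
    return [
        (sid, name, hobby)
        for sid, name in names
        for hobby_sid, hobby in hobbies
        if sid == hobby_sid
    ]

def keyword_search(rows, keywords):
    """Select the rows in which every keyword matches some field exactly."""
    return [
        row for row in rows
        if all(any(kw == str(field) for field in row) for kw in keywords)
    ]

universal = denormalize(names, hobbies)
print(keyword_search(names, ["rao", "tennis"]))      # no single row of one table has both keywords
print(keyword_search(universal, ["rao", "tennis"]))  # the joined row does
```

A query-time-join system would perform the same join, but only when a query arrives and only over the tables whose rows mention the keywords.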



=======
As for the answer to the second question--whether keyword queries get converted to SQL-style queries--it is "yes" if you use a BANKS-style approach and "no" if you do surfacing.


Hope this answers your question.

Rao








On Thu, Apr 29, 2010 at 1:27 PM, Siva N <snatara5@asu.edu> wrote:

Professor,

I was having some questions about supporting search on RDF data..

I understand that RDF/RDFS data defines ontology and more semantics to the data and is much closer to structured data such as in relational DB. So does that mean that all queries to RDF data must be of SQL style queries and IR style keyword search may not be applicable ???

So if we were to try developing a keyword search engine on RDF data, does that have to be something like providing keyword search interface and then internally converting the keywords into SQL style queries and retrieve results from RDF data ??

Does doing plain IR-style keyword search on RDF data fail to fully utilize the structured nature of the data, and are ontology-based search engines always best suited for RDF data ??

Thanks,
Siva
--
Graduate Student, MS Computer Science
School of Computing and Informatics
Mobile : 520 582 4479

Friday, April 30, 2010

on the extra class and the tuesday's class

Thanks to all the students who showed up for the information extraction lecture today.  It would certainly not have been any fun lecturing to the walls ;-)

To those who couldn't make it, the video as well as audio are available on the class page. I hope you will make time to watch or hear it. As I said, IE is one of the more active topics of research now, and I would hate for you not to have any entry point into it--especially given that all you have done to date provides you a pretty good infrastructure.

For Tuesday's last class, I plan to spend the first 45min or so discussing some issues in "information integration" and then doing a quick wrap-up.

The slides for information integration are online, but are likely to change by class time.

regards
Rao



Thursday, April 29, 2010

Clarification regarding tomorrow's "extra class"

There was a question today as to whether the material to be covered in tomorrow's extra class will be "on the test."

My flippant response was possibly driven by the fact that I am always freshly surprised to find out that there actually
might be students taking one of my courses with an eye towards their GPA (clearly, ASU student corporate memory/social networking
isn't all that it is cracked up to be :-).

Thinking about it again, I can imagine some students being unduly inconvenienced by having to keep up with an extra meeting that they haven't bargained for, in the last week of classes. 

So, I will certify that the material covered in tomorrow's lecture will not be "on the test". 


cheers
Rao

Apology to the student who asked the "transitive relation" question

To the student who asked the question about "transitive relation" in semantic web:

 My sincere apologies for my poor choice of words in my first response. As I clarified, all I meant to tell you was to hold on a minute, as the answer to your question was going to be the center of the discussion that was to follow immediately. No disrespect was meant either for the question (which I repeatedly came back to as an important one) or the questioner.

regards
Rao "Thank God I am not Gordon Brown running for re-election" Kambhampati

Re: Tomorrow's class will be in BYAC 240

Oh--just to clarify, the class will start at 10:30 and will end at 11:45 (like normal classes). The room is reserved for a longer period just to allow you to get in ahead of time.

Rao



Tomorrow's class will be in BYAC 240

Folks:

 The "extra class" tomorrow will be in BYAC 240 (i.e., the same classroom building but on the second floor).

See you there (hopefully). Just in case it might help clinch the deal, let me go ahead and announce that there will likely be donuts ;-)

rao


Breakout session - Friday, April 30th, 2010 10:15 AM to 11:45 AM in BYAC, 240

Room reserved for : 10:05 AM to 11:55 AM
 

 


Wednesday, April 28, 2010

a clarification regarding blog comments (especially to thinking caps of yore)

Folks:
 
 There seems to be a sudden newfound interest in responding to the "thinking cap" questions from many moons ago. I can't be sure of its origins, but one explanation may be that people are thinking they need to beef up their participation credit ;-)

 I would like to clarify that I only follow the thinking cap responses in the short window surrounding their posting. For now, the two threads I will be following are in response to the two messages I posted yesterday (the "what did you like" question and the "what would you recommend from WWW 2010" question).

regards
rao



 

Current cumulatives

Folks

 Here are the cumulatives that take project 2 and homework 4 marks into account.

Still to come:

 --> Project 3 and Demo points
 --> participation points
 --> Final


rao

Tuesday, April 27, 2010

Assignment: Recommend a WWW 2010 paper to your classmates...[Due by 5/11--post to the blog]


So NPR has this nifty segment called "You must read this" (http://www.npr.org/templates/story/story.php?storyId=5432412 ) which gets writers and authors to recommend books that they think others should read.

Your last "homework" assignment for this course is patterned after it.

Here is the assignment:

================
1. Look at the papers being presented at the World Wide Web Conference this week
(the program --along with most of the pdf files--is available at  http://www2010.org/www/program/papers/ ; if you are interested in a particular paper but the pdf is not available, you can probably google the authors' pages--technical paper authors tend to be a narcissistic bunch and will put every paper up on their web page as soon as it is accepted ;-) 

2. Check out the "abstracts" of the papers whose titles seem interesting to you based broadly on the aims of this course.

3. If you like the abstract, try reading the introduction (optional, but recommended).

4. By 5/11, post a short comment in response to this article giving

    4.1. paper title and link to its pdf
    4.2. why you would like to read it and/or why you think others in the class should read it
    4.3. how the paper is connected to what we have done in the course (you could also phrase this as a recommendation:
         "if you liked that power iteration discussion, you will probably like this paper as it gives ways to speed the
         convergence")

    (your inputs to 4.2 and 4.3 can be interleaved).


===========

Here is the rationale for the assignment--unlike Physics 101, after which you don't expect to be able to read the state-of-the-art papers, this course is about an area that is very recent and still in progress (recall the Far Side Neanderthal archeologist cartoon..). So, you actually
do have a shot at understanding the directions of most of the work being done at the state of the art (and in some cases even understanding its contribution).

Rather than ask you to take this assertion at face value, I would like to encourage you to "do it" and thus "see it to believe it" as it were ;-) [Plus, this is a rather cheap way for me to figure out which WWW papers to read.]

Rao







Mandatory Blog qn (You should post your answer as a comment to this thread )

Here is the "interactive review" question that I promised.

"List five nontrivial ideas you came to appreciate during the course of this semester"
(You cannot list generic statements like "I thought pagerank was cool".  Any cross-topic connections
you saw are particularly welcome. )


Note that you must post your response as a comment to this particular message.
All comments must be done by the end of 4th May (next Tuesday--last class).

If you have forgotten what we have done, see the class home page for a description.

Rao



Project demo time slots

Hello everyone,
  The project demos will be held in the week of May 4th, Tuesday through Friday. Slots are available on a first come, first serve basis.
 
How to reserve your slot:
2. Find a slot that hasn't already been taken (Count == 0)
3. Click the checkbox for that slot, write your name and click 'Save'
 
On the day of the demo:
1. Bring your laptop with your code and program on it, or a link to the webserver where your program is running, or if nothing else works, your code and program on a flash drive.
2. Be prepared to answer questions regarding the implementation (did you use sparse arrays? did you cache the document norms in a file?)
3. Be prepared to run arbitrary queries on your system using any of the methods you have implemented.
 
If you absolutely cannot make it on any of the time slots, email me so we can work this out.

Thanks and Regards,
Sushovan De

A poll on a proposed extra class on Friday

Folks:

 I seem to have gone slower than expected and at the current rate, I am not going to be able to do justice to the information integration and information extraction topics in the remaining two classes. So, I am contemplating holding an extra class meeting on Friday. Please vote on the poll below so I can get a sense of your availability. I will use that to decide whether and at what time to hold the class. [Please vote by Wednesday night so I can make arrangements and announce by Thursday's class.]

regards
Rao

http://www.misterpoll.com/polls/483082


Monday, April 19, 2010

[Thinking Cap] on collaborative filtering...

Qn 1. We considered using the user-item matrix to find the users most similar to the current user, and using them to predict the ratings of new items for the current user. What if we decided instead to focus on items, and figure out which items seem to be most "similar" to each other in terms of the users who buy them?
   1.a. What techniques, that we have already seen, can be used to find the items most similar to each other?
   1.b. If you use those techniques to find, for each item, the k most similar items, then how can we use this information to do ratings prediction for the current user on an item she hasn't yet rated?
   1.c. How does this item-centered analysis compare to the user-centered one in terms of justifying the recommendations?

Qn 2. The user-item matrix has some similarities to the document-term matrix. In the case of the d-t matrix, we saw that latent semantic indexing can be quite useful in finding "latent regularities" (e.g. the "size" factor of the fishes, or the "database-oriented" vs. "statistics-oriented" factors of the db example).
   2.a. Do you think LSI-style analysis could be useful in improving collaborative filtering?
   2.b. What are some of the most immediate problems you run into when you try to do LSI-style analysis on the user-item matrix (as against the d-t matrix)?

Rao
-----------

Optional question: What does the following parable have to do with a CSE494/598 student who keeps asking the instructor "is there any way I can get extra credit to improve my grade?" ;-)

A man stood on top of his roof in the middle of one of the worst floods ever recorded. The water was rising around his feet so he looked up at the black sky and prayed. "Mighty God please save me!" An hour passed and the water was already over his feet when a man in a fishing boat came and called for him to get in. The man refused, saying, "No, I am waiting for God to save me." The man in the boat waited as long as he could, then left the man on his roof. Standing soaked on the roof as the waters rose, he stared at the sky waiting for God to save him. Then a rescue boat pulled up beside his house and threw him a rope but again the man refused. "No, I am waiting for God to save me." Hours passed and the man had to cling to his chimney to avoid being washed away. He thought he was going to drown when he saw a bright light overhead; he thought God had come to save him this time. Then he heard a loudspeaker and realized it was a helicopter. It dropped a ladder but again the man refused. "No! God will save me!" The man clung to the chimney until the water was over his head and he drowned. A moment later he stood before God weeping. "Mighty God, I prayed and prayed, why didn't you save me?"
God said "YOW. I sent you a fishing boat, a rescue boat *and* a helicopter to save you. What the heck are you doing here?"

Sunday, April 18, 2010

a more useful paper on netflix prize (and how LSI-like techniques helped the winning team)

I added the following paper to the collaborative filtering readings:

   http://rakaposhi.eas.asu.edu/cse494/lsi-for-collab-filtering.pdf

This is a paper from IEEE Computer last year, written by Yehuda Koren--the same guy who wrote the CACM paper. Unlike the CACM paper that focuses just on temporal concept drift--that helped them vault over the last little percentage--this one talks about LSI-like techniques for doing collaborative filtering, and how they allow him to combine multiple sources of feedback.

Certainly worth reading as most of the ideas he is using are in the epsilon-neighborhood of things we have covered in the class (and thus should make you feel warm and fuzzy about your knowledge state ;-)

rao

Saturday, April 17, 2010

Correlation coefficient viewed as cosine theta measure..

Although I didn't mention it explicitly in the class, the Pearson correlation coefficient can be seen as the
vector similarity between "centered rating vectors"

Suppose the two rating vectors are

[r11 r12 r13 r14]

and

[r21 r22 r23 r24]

Centering means subtracting the mean of the vector from the vector

let r1 be the mean of r11..r14  and r2 be the mean of r21 ..r24

then centered vectors are

[r11-r1  r12-r1 r13-r1 r14-r1]

[r21-r2  r22-r2 r23-r2  r24-r2]

now if you take the cosine theta metric between these two vectors, you get
the dot product divided by the product of the norms of the two vectors.

the dot product will be of the form  [r11-r1]*[r21-r2]+ ...+[r14-r1]*[r24-r2]

this is the numerator of the Pearson correlation coefficient.

the norm of the first vector is
sqrt[(r11-r1)^2 + ... + (r14-r1)^2]

which is sqrt(n) times the standard deviation of the first vector (here n=4); the product of the two norms is thus exactly the denominator of the Pearson correlation coefficient.

QED
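
For those who prefer to check numerically, here is a small Python sketch comparing the cosine of the centered vectors with the textbook covariance-over-standard-deviations form of the Pearson coefficient. The two rating vectors are made up for illustration, and the helper names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

def cosine(u, v):
    """Cosine theta: dot product over the product of the two norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def centered(v):
    """Subtract the mean of the vector from each of its entries."""
    m = mean(v)
    return [x - m for x in v]

def pearson(u, v):
    """Pearson correlation as covariance divided by the product of standard deviations."""
    mu, mv = mean(u), mean(v)
    covariance = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)
    return covariance / (pstdev(u) * pstdev(v))

# Made-up rating vectors for two users over four items.
r1 = [5, 3, 4, 1]
r2 = [4, 2, 5, 1]
print(cosine(centered(r1), centered(r2)), pearson(r1, r2))
```

The two printed numbers agree up to floating-point rounding; the factors of n in the covariance and in the two standard deviations cancel.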

Rao

Friday, April 16, 2010

Re: An article on collaborative filtering sensitive to changing user preferences..

oh--by the way--I forgot to mention that this approach also won the netflix prize..

rao



An article on collaborative filtering sensitive to changing user preferences..

Here is an optional reading on collaborative filtering under temporally evolving user preferences, that came out in this month's CACM.

It may be useful for those of you curious about the current status on the collaborative filtering research:

http://rakaposhi.eas.asu.edu/cse494/cacm-temporal-collab-filtering.pdf

Rao

Thursday, April 15, 2010

Required reading for next class: Database refresher

Folks:

 The next five classes will be about getting structure and exploiting it during search. Much of this material straddles IR on one end--where we assume no structure; and databases on the other--where we assume data is fully structured. It will thus be very helpful to have some rudimentary background in relational databases.

If you haven't taken a course in databases, you might want to look at the following set of "refresher" slides for databases--so you have enough familiarity with the concepts.

http://rakaposhi.eas.asu.edu/cse494/notes/s04-DBReview-ullas.ppt

Rao

Tuesday, April 13, 2010

Clarification on the "rand-index" based external cluster evaluation

Folks:

 In talking to a student, I realized I didn't make something clear in describing the rand-index based external cluster evaluation method.

Rand-Index classifies every pair of entities e1,e2 into four categories

[in-the-same-ground-truth-cluster, in-the-same-generated-cluster]  A
[in-the-same-ground-truth-cluster, in-different-generated-clusters]  C
[in-different-ground-truth-clusters, in-the-same-generated-cluster]  B
[in-different-ground-truth-clusters, in-different-generated-clusters]  D

If A, B, C, D are the numbers of pairs in each category, then A+B+C+D will be n*(n-1)/2 (which is the number of pairs over n entities).
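
A short Python sketch of the pair counting may help. The two labelings below are made up, the function names are hypothetical, and the final score uses the standard Rand index definition (A+D)/(A+B+C+D), i.e. the fraction of pairs on which the two clusterings agree.

```python
from itertools import combinations

def pair_counts(truth, generated):
    """Count pairs in the four categories A, B, C, D defined above."""
    a = b = c = d = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_generated = generated[i] == generated[j]
        if same_truth and same_generated:
            a += 1      # same ground-truth cluster, same generated cluster
        elif same_truth:
            c += 1      # same ground-truth cluster, different generated clusters
        elif same_generated:
            b += 1      # different ground-truth clusters, same generated cluster
        else:
            d += 1      # different in both
    return a, b, c, d

def rand_index(truth, generated):
    """Fraction of pairs on which the two clusterings agree."""
    a, b, c, d = pair_counts(truth, generated)
    return (a + d) / (a + b + c + d)

# Five entities: ground-truth cluster ids and generated cluster ids (made up).
truth = [0, 0, 0, 1, 1]
generated = [0, 0, 1, 1, 1]
print(pair_counts(truth, generated), rand_index(truth, generated))
```

With n=5 the four counts add up to 5*4/2 = 10 pairs, as the identity above requires.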


I modified the slide to make this point clear..

Rao

The Napoleon Dynamite problem

Check this New York Times article for some background on collaborative filtering and the Netflix prize..

http://rakaposhi.eas.asu.edu/f08-cse494-mailarchive/msg00084.html

Monday, April 12, 2010

Current snapshot of the grade book

Folks:

 Attached please find the current snapshot of the 494/598 grade book (posted by your *posting-id*--this is not your student id). It contains the grades for three homeworks, project 1 and mid-term. I will return homework 3 and midterm in class tomorrow.

To give you an idea of your relative standing, I also put in total and percentage columns. The total is calculated out of 45--
with 10pts for project 1, 5 points each for the homeworks, and 20 points for mid-term. Note that these weights are "approximate" and subject to change. Note that the extra credit is shown but is *not* included in the total.

The percentage column basically converts the raw score out of 45 into a percentage.

The first part--in light green--is the 494 section and the second part in the light red is the 598 section.

Please let me or the TA know if there are any discrepancies you find.

If you have any concerns about your performance please talk to me.

regards
Rao


Thursday, April 8, 2010

mid-term grading and course withdrawal deadline

Folks:

 I was hoping to be done with the mid-term grading by today so you would have that information by 4/9-11 which apparently is the course withdrawal period. I unfortunately didn't manage to complete it.

 I think most students probably made the "drop/stay" decision already, but if you are one of those who are waiting to make a decision based on your midterm score, please do let me know and I will try to get you that information before 4/11.

regards
rao

Homework 4 Released--will be due April 20th in class

I released homework 4 on clustering, classification and recommendation systems. It will be due on 20th April.
It contains a question on collaborative filtering--that will be covered next Tuesday.

(I know I said the last homework will be due the end of the semester--but I decided to split it into two parts, and make the first part be due 4/20 so you can get it back by the end of the semester).

Rao

Wednesday, April 7, 2010

Project part 3 released

It is now available from the projects page.

Please note that the deadline is being put on the last day of classes. This will be a non-negotiable deadline. Please plan to make sure you are done by then.

You will also be required to show a demo of all three parts sometime the week of May 4th. The TA will give out slot sign-up sheets.  Note that the demo marks are not just part of project part 3, but rather will be added to the cumulative--so those who didn't complete project part 1 or part 2 in time will get the demo-credit for those parts as long as they are running by the demo time.

Rao

Tuesday, April 6, 2010

[Thinking Cap--Easter Resurrection] on Classification/clustering

Comment on this on the blog

At the end of today's class, we saw that classification is in some sense a pretty easy extension of clustering--training data with different labels can be seen to be making up the different clusters. When test data comes, we just need to figure out which cluster it is closest to and assign it the label of that cluster. 
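The "straightforward" classifier described above can be sketched as follows. This is only one possible reading of it (each label's training points are summarized by their centroid, and a test point gets the label of the nearest centroid); the function names and the toy 2-D data are made up for illustration.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train(labeled_points):
    """Treat each label's training points as one cluster; keep its centroid."""
    groups = {}
    for point, label in labeled_points:
        groups.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in groups.items()}


def classify(centroids, point):
    """Assign the label of the closest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


# Toy training data: two well-separated groups.
training = [
    ((1.0, 1.0), "red"), ((1.5, 2.0), "red"),
    ((8.0, 8.0), "blue"), ((9.0, 7.5), "blue"),
]
model = train(training)
print(classify(model, (2.0, 1.0)))
print(classify(model, (7.0, 9.0)))
```

The toy data is deliberately easy; the questions below are about what happens when it is not.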

1. If classification is so darned straightforward, how come the whole entire field of machine learning is obsessed with more and more approaches for classification? What can be possibly wrong with the straightforward one we outlined? Can you list any problems our simple approach can run into? (Alternately, it is fine to just decide that Jieping Ye and Huan Liu cannot leave good enough alone... :-)

2. If you listed some problems in 1 (as against casting aspersions on Ye and Liu), then can you comment on the ramifications of those problems on clustering itself? Or is it that clustering is still pretty fine as it is?

rao

Sampling and Census... (according to Mr. Willis of Ohio )

Check this out for some context on the idea of using sampling in census (you can say I was almost channeling this in today's class; since as I told you my spring break involved watching west wing episodes that I bought for my son ;-)

  http://www.youtube.com/watch?v=be4DW_wHYX4&feature=PlayList&p=3C7EC3C175A19009&playnext_from=PL&playnext=1&index=32


Monday, April 5, 2010

An idf solution to the project 2 deadline

Some people have asked for an extension for the project. (Based on the mails I am getting, it is also possible that there is a bit of "behind the scenes organization" going on--since the frequency of mails has increased in the last couple of hours).

In extending deadlines at the last minute, my concern always is for those students who probably re-organized their lives to keep up with the announced deadline.  To make sure that these people do not feel penalized, I will do the following:  You can submit the project on Thursday with a flat 20% late penalty (this may look steep compared to zero penalty, but you should compare it to the "infinite" penalty implicit in the concept of "deadline"  and feel blessed ;-).

(If it is indeed the case that the whole class needs the two extra days, then all of you get 20% penalty--and the idf of the penalty will be zero.. )
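For the record, the arithmetic behind the joke, as a minimal sketch: it assumes the usual idf = log(N / df) form with a natural log, and a made-up class size of 40.

```python
import math


def idf(total, with_term):
    """Inverse document frequency, assuming the log(N / df) form."""
    return math.log(total / with_term)


class_size = 40  # hypothetical number of students

# A penalty that only 5 students take is "informative"...
print(idf(class_size, 5))
# ...but a penalty that everyone takes carries no information at all.
print(idf(class_size, class_size))
```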

Please note that projects are due at the *beginning* of the class; resist the temptation to skip class and give the project at the end of the day..

regards
Rao

ps: As I mentioned, you will still get partial credit for the earlier parts if you get all parts working by the final deadline.

Thursday, April 1, 2010

Fwd: On the Topeka Pagerank questions..


Some of you were confused as to how Topeka pagerank compares to the traditional one.
 
The important thing to note is that  Topeka Pagerank first separates web pages into smaller partitions based on their
color and then does equal amount of processing on each of them. It is thus a significantly more efficient computation--since for
each partition, we have a much smaller transition matrix M. The convergence is fast, but the final stationary page rank can be significantly different from the traditional one
(in particular, the separate but equal processing can sometimes prevent pages in some partitions from reaching their full importance--as the other partitions effectively act as rank sinks ). 

regards
rao


Sunday, March 28, 2010

Laptop for the mid-terms

Just wondering, does "open book, open notes" mean we can bring our laptop, as long as we disable the wireless?

J

Saturday, March 27, 2010

Reviewing topics covered

Here is a hint: For each class, along with the audio links, I had put a reasonably verbose description of topics covered in that class. If you read those descriptions, that should give you a "summary/review" of what happened.

For example, here is the description of what happened in Lecture 10--directly from the notes page

L10 Audio of [Feb 18, 2010] (***THE ADVANCED LSI LECTURE***) Lengthy recap of LSI analysis; discussion of class blog questions; discussion of relation between LSI and feature selection (feature selection looks for just subspaces of existing space); LSI and LDA (LDA takes the "class" information into account; LSI doesn't know what classification you might be attempting); LSI and nonlinear dimensionality reduction (one idea is to first blow up the dimensionality and then find a lower-dimensional hyperplane in this blown up space. So, you might go from 20-D to 300-D and come down to 4-D).

solutions to homework 3 posted

The solutions to homework 3 are posted and are available at
http://rakaposhi.eas.asu.edu/cse494/home-work.html

(I didn't put up separate solutions for the google paper, but gave a link to a class discussion from an earlier offering on that paper).

Rao

Thursday, March 25, 2010

voice of people is voice of dog

We will go with the majority opinion re: midterm.

cheers
rao

"decisions are made by those who show up"...

Tuesday, March 16, 2010

Re: (*Correction*) Important: Schedule for midterm

Yes, you are right. I changed the URL in the homework problem.

Rao


On Tue, Mar 16, 2010 at 10:02 AM, Kimberly  wrote:
Professor Rao,

On part 2 of question 3 on the homework the website listed doesn't work. I think that it is: http://academic.research.microsoft.com/CSDirectory/Author_category_5.htm

-Kim Bontrager





Sunday, March 14, 2010

Dr. Rao : Think Cap!...

Dear Dr. Rao

In PageRank, can we treat a node as a sink if it has "n" inbound links but only one outbound link, which points to itself?
What is the A+Z matrix for that?
I figured out that if a node's only link is to itself, then we have to omit the 1 in the column of A corresponding to that node.

But in Hub/Auth we can't omit that 1.

In PageRank we have to omit that 1; otherwise A+Z will not be a stochastic matrix.
PageRank is independent of self links.

Is my inference correct?
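
The column-sum part of this question can be checked on a tiny graph. The sketch below is an illustration only, not an answer from the instructor. It assumes the convention that A is the column-normalized link matrix (column j spreads page j's rank over its out-links) and that Z puts 1/n in every entry of a sink's column; the three-page graph and all function names are made up.

```python
def link_matrix(links, n, drop_self_links):
    """Column-normalized link matrix: a[t][s] = 1/outdegree(s) if s links to t."""
    a = [[0.0] * n for _ in range(n)]
    for src, targets in links.items():
        outs = [t for t in targets if not (drop_self_links and t == src)]
        for t in outs:
            a[t][src] = 1.0 / len(outs)
    return a


def sink_matrix(n, sinks):
    """Z: 1/n in every entry of each sink's column, 0 elsewhere."""
    return [[1.0 / n if j in sinks else 0.0 for j in range(n)] for _ in range(n)]


def add(a, z):
    return [[x + y for x, y in zip(ra, rz)] for ra, rz in zip(a, z)]


def column_sums(m):
    n = len(m)
    return [sum(m[i][j] for i in range(n)) for j in range(n)]


# Pages 0 and 1 link to page 2; page 2 links only to itself.
links = {0: [2], 1: [2], 2: [2]}
n = 3

# Treat page 2 as a sink but keep its self-link in A: column 2 sums to about 2.
kept = column_sums(add(link_matrix(links, n, False), sink_matrix(n, {2})))

# Treat page 2 as a sink and omit the self-link: every column sums to 1.
dropped = column_sums(add(link_matrix(links, n, True), sink_matrix(n, {2})))

print(kept)
print(dropped)
```

Note that under the same assumed convention, keeping the self-link and *not* treating page 2 as a sink (Z all zeros) also gives columns that sum to 1, so the issue only arises when the node is handled both ways at once.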

Friday, March 12, 2010

(*Correction*) Important: Schedule for midterm


[[Looks like I was not looking at the calendar carefully when I typed this mail--it would be reasonably hard for me to survive your collective ire if
you have to  come back in the middle of your  spring-break revelry to submit the homework. So here are the new dates:

Homework 3 due: 25th March
Exam:  30th March
Project part 2 Due date:  *shifted* to 6th April

Looking ahead, project part 3 will be due by 4th May, the last class, and will have to be demonstrated to the TA in that week.


regards
Rao

Important: Schedule for midterm

Folks:

 We will have the mid-term on 25th March. All the topics we have covered until spring break (i.e., including social networks) will be on the midterm.

An example midterm can be found at http://rakaposhi.eas.asu.edu/s07-specimen-exam.pdf


Homework 3 will be due on 18th March.  I have added a question on social networks to the homework.
(Please note that the graded homework 3 may not be available by the time of the exam; solutions will however be posted)

Please let me know if you have any questions/concerns.

regards
rao

Points adjustments for 'algorithm description' for Project Part 1

Hi everyone,
  A couple of you have expressed concerns about the points you lost on the 'algorithm description' part of project 1. I am persuaded that perhaps the problem specification was not clear enough in specifying what is required. Since others may also have lost points on this part, I would like to make class-wide amends.
  If you are one of the students whose project code was working, but did not get full points on the algorithm description part of the project, please contact me with your project so I can adjust your points.

 

Regards,
Sushovan
 

PS: The project part 2 has been updated with a section that tells you how many points each task of the project is worth.


Saturday, March 6, 2010

Project Part 1 - sample output

Some of you wanted to verify whether the output of your code was correct. Here are some sample outputs that I got, and my assumptions. If you used different assumptions, you would get different results, but it doesn't mean that your results are wrong.

Assumptions:
TF value is not normalized
IDF value is not normalized
in the TF-based similarity, all query words are given equal weight
in the TF-IDF based similarity, the query words are weighted by IDF alone, not (0.5 + 0.5*tf)*idf [slide 62]
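
To make these assumptions concrete, here is a minimal sketch of the two similarity measures on a toy corpus. It is not the code that produced the numbers below. It assumes cosine similarity, idf = log(N / df) with a natural log, raw (unnormalized) term counts, and a made-up three-document collection; all names are hypothetical.

```python
import math
from collections import Counter


def idf_table(docs):
    """idf = log(N / df), assuming this (unnormalized) form."""
    df = Counter()
    for words in docs.values():
        df.update(set(words))
    return {term: math.log(len(docs) / count) for term, count in df.items()}


def cosine(q, d):
    dot = sum(w * d.get(term, 0.0) for term, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0


def rank(query, docs, use_idf):
    idf = idf_table(docs)
    # Query weights: 1 per word for TF, idf alone for TF-IDF.
    q = {t: (idf.get(t, 0.0) if use_idf else 1.0) for t in set(query)}
    scores = {}
    for name, words in docs.items():
        tf = Counter(words)  # raw counts, not normalized
        d = {t: c * (idf[t] if use_idf else 1.0) for t, c in tf.items()}
        scores[name] = cosine(q, d)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy corpus (made up), already tokenized.
docs = {
    "semcal": ["fall", "semester", "calendar"],
    "forms": ["admissions", "forms", "fall"],
    "archive": ["archive", "library"],
}
tf_ranking = rank(["fall", "semester"], docs, use_idf=False)
tfidf_ranking = rank(["fall", "semester"], docs, use_idf=True)
print(tf_ranking)
print(tfidf_ranking)
```

Changing any of the assumptions (log base, tf normalization, query weighting) changes the scores, which is why results that differ from the ones below are not necessarily wrong.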

TF-based similarity

query> fall semester
[22932]result3/www.asu.edu/registrar/general/semcal.html: 0.3900947488027469
[22978]result3/www.asu.edu/sbs/paying_semester_selection.html: 0.28127197523150593
[3765]result3/www.asu.edu/fs/advantage/com/308-12.htm: 0.27980420556019375
[14669]result3/www.asu.edu/lib/archives/archive.htm: 0.2644429426739725
[22980]result3/www.asu.edu/sbs/tuition_semester_selection.html: 0.25993762245501817
[871]result3/www.asu.edu/aad/manuals/usi/usi301-02.html: 0.2434322477800738
[991]result3/www.asu.edu/admissions/nondegree/screen5.html: 0.23947373603569988
[1894]result3/www.asu.edu/clas/reesc/cli/yerevan2.htm: 0.23416772415725648

query> transcripts
[22956]result3/www.asu.edu/registrar/transcripts/index.html: 0.3460520089727601
[1055]result3/www.asu.edu/admissions/steps/transcripts.html: 0.26490647141300877
[4145]result3/www.asu.edu/graduate/generalinfo/DeMund/index.html: 0.10283867552865913
[3930]result3/www.asu.edu/graduate/admissions/domestic.html: 0.10173145278829236
[16633]result3/www.asu.edu/lib/archives/shema/shema.htm: 0.09531918345647832
[3871]result3/www.asu.edu/graduate/SEM/admissions/requirements.htm: 0.08987062325802277
[22928]result3/www.asu.edu/registrar/general/contacts.html: 0.08728715609439695

query> admissions
[3759]result3/www.asu.edu/forms/index.html: 0.4375949744936837
[3758]result3/www.asu.edu/forms/forms_adm.htm: 0.41178049395232924
[964]result3/www.asu.edu/admissions/international/index.html: 0.35909242322980395
[962]result3/www.asu.edu/admissions/international/expenses.html: 0.33409177193234335
[936]result3/www.asu.edu/admissions/contact/familymember.html: 0.31983816366946766
[22446]result3/www.asu.edu/provost/committees/FAS.html: 0.30983866769659335
[935]result3/www.asu.edu/admissions/contact/counselor.html: 0.2950148263727627
[937]result3/www.asu.edu/admissions/contact/freshman.html: 0.28966206101082875


TF-IDF based similarity:

query> fall semester
[22932]result3/www.asu.edu/registrar/general/semcal.html: 0.4359373503377644
[14669]result3/www.asu.edu/lib/archives/archive.htm: 0.2871305669297931
[991]result3/www.asu.edu/admissions/nondegree/screen5.html: 0.27532125556740966
[22978]result3/www.asu.edu/sbs/paying_semester_selection.html: 0.26913921652871753
[22980]result3/www.asu.edu/sbs/tuition_semester_selection.html: 0.2601997953589357
[3765]result3/www.asu.edu/fs/advantage/com/308-12.htm: 0.24906102399484592

query> admissions
[3758]result3/www.asu.edu/forms/forms_adm.htm: 0.5744080507440753
[938]result3/www.asu.edu/admissions/contact/index.html: 0.5031740605651273
[964]result3/www.asu.edu/admissions/international/index.html: 0.43059540671762864
[936]result3/www.asu.edu/admissions/contact/familymember.html: 0.4172350134375292
[963]result3/www.asu.edu/admissions/international/faq.html: 0.40879880279063435
[962]result3/www.asu.edu/admissions/international/expenses.html: 0.38631455298928125
[3759]result3/www.asu.edu/forms/index.html: 0.38614177431456176
[937]result3/www.asu.edu/admissions/contact/freshman.html: 0.38147065345148296

query> transcripts
[22956]result3/www.asu.edu/registrar/transcripts/index.html: 0.5123337680008732
[1055]result3/www.asu.edu/admissions/steps/transcripts.html: 0.47577601236762335
[3930]result3/www.asu.edu/graduate/admissions/domestic.html: 0.18180264245956912
[3871]result3/www.asu.edu/graduate/SEM/admissions/requirements.htm: 0.15646994303347006
[1051]result3/www.asu.edu/admissions/steps/printsteps.html: 0.1554359618428368
[3834]result3/www.asu.edu/graduate/SEM/8admiss.htm: 0.14754024695739043


Thanks and Regards,
Sushovan De

Regarding project part 1 specimen/solution code...

Some of you have asked for "specimen" code for project part 1 (to see how yours compares..)

Separately, some others have wondered how their inability to complete project part 1 fully  is going to affect their project grade.
I had informed this latter group that since the project is a cascaded one, as long as all the parts work by the final (i.e., part 3) deadline, and can be demonstrated to the TA, they will get (partial) credit for any earlier incomplete efforts.

Given this, we cannot post the "specimen code" for the entire class. If you would rather get the part 1 code so you can extend it to part 2 (and have no intentions of claiming partial credit later for part 1), please let us know and we can deal with that on a case-by-case basis.

Hope this clarification helps.

regards
Rao

Thursday, March 4, 2010

ONN on Google opt-out feature

Here is the video I mentioned ;-)

http://www.theonion.com/content/video/google_opt_out_feature_lets_users?utm_source=videoembed

rao

aardvark paper

is available on the readings for the social networks part of the course..

You can also get some more current gossip about it at
http://googleblog.blogspot.com/2010/02/google-acquires-aardvark.html

rao

Attempts to finitize my bacon number

So one of the hardy souls that took part in the class survey wondered if it is possible to get "videos" of the lectures (since apparently audios are not
giving the full color picture as it were).

Now, I have always been a fan of full class automation (as ably depicted in this Real Genius scene (start at 3min 10sec)  http://www.youtube.com/watch?v=CPdWmpMK64o&feature=related ).

So, I got one of 'em flip video recorders--and with the able cinematography of our own Abbas(i Kiarostami ;-) got the lecture recorded today. They are available online on the notes page. Let me know if you find them useful. (Since I haven't yet figured out how to stream them, you are currently limited to downloading them).

Rao

ps: Since I acted as an instructor in a class where Kevin Bacon appeared on one of the slides, I am hoping this will bump my bacon number..

The talk on network robustness I mentioned... Fwd: UPDATE: Seminar: Robust Performance of Networked Systems in Adverse and Uncertain Environments: March 8, 2010: 3:00pm - 4:00pm

You can go to this talk on Monday, and learn more basic stuff about what we talk about on Tu/Th

rao


---------- Forwarded message ----------
From: Audrey Avant <Audrey.Avant@asu.edu>
Date: Thu, Mar 4, 2010 at 10:40 AM
Subject: UPDATE: Seminar: Robust Performance of Networked Systems in Adverse and Uncertain Environments: March 8, 2010: 3:00pm - 4:00pm
To:


Please note change in time (3:00pm – 4:00pm).

Thank you~

 

SCHOOL OF COMPUTING, INFORMATICS, AND DECISION SYSTEMS ENGINEERING

engineering.asu.edu/cidse

 

SEMINAR

Robust Performance of Networked Systems in Adverse and Uncertain Environments

Monday, March 8, 2010

3:00pm – 4:00pm

BYENG 455

 

Dr. Vladimir L. Boginski

University of Florida Research and Engineering Education Facility (UF-REEF), Shalimar, FL

 

 

Abstract:

 

Networked systems (e.g., communication/sensor networks) play a crucial role in many military and civilian tasks nowadays. Clearly, robust and efficient design and functionality of these networked systems would provide superior capabilities in collecting, processing, and communicating various types of information between system components. We address the aspects of robust performance of communication/sensor networks and other types of networks in terms of designing and identifying robust network clusters. The important task that needs to be addressed is the ability to adequately respond to potential disruption/failure threats that may affect the efficiency of networked systems. These threats may be of various origins (e.g., enemy attacks); moreover, they are often uncertain by nature. We attempt to identify robust optimal strategies that take into account these factors and ensure the overall efficiency of networked systems under these conditions.

 

Bio:

 

Dr. Vladimir L. Boginski is a faculty member at the University of Florida Research and Engineering Education Facility (UF-REEF) located in Shalimar, FL. He holds a PhD degree in Industrial and Systems Engineering from the University of Florida (Gainesville, FL). He has conducted a number of research projects in the areas of network-based modeling and optimization, as well as data mining applications. His areas of interest are rather diverse. He has successfully conducted research in network-based modeling and optimization in communication/sensor networks, biological networks, financial markets, and social/collaboration networks, as well as predictive modeling data mining techniques in medicine, biochemistry, and military applications. His current research emphasis (in collaboration with the Air Force Research Laboratory) is on addressing uncertainty and robustness issues in the performance of networked systems by taking into account possible component failures, and developing optimal strategies for minimizing the negative impacts of these factors. His research has been sponsored by the U.S. Department of Defense (AFRL/AFOSR, DTRA), the U.S. Department of Energy, and the National Science Foundation. He is also a recent recipient of the DTRA Young Investigator Award.

 

Contact Arunabha Sen (arunabha.sen@asu.edu) for any questions you may have or if you'd like to schedule a meeting with Dr. Boginski.