Sunday, May 2, 2010

Re: Question about keyword search on RDF data

I think there are two parts to your question. The first is what it means to do keyword search on structured data such as RDF. The second is whether keyword queries have to be converted into some form of structured query just to run on RDF (or an RDBMS).

This issue has received significant attention in the DB community (recall that RDF can be seen as just a format for Relational data), so let us start there.

Suppose I have a database and a user gives a keyword query. What tuples in the database should be given as answers to this query?

The answer is simple if the database has a single table: you just select all the rows that have the keywords in them (modulo some tf/idf extension).
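As a rough illustration of the single-table case, here is a minimal sketch using Python's built-in sqlite3 module. The `student` table, its rows, and the `keyword_search` helper are made up for this example, and ranking (the tf/idf part) is left out entirely.

```python
import sqlite3

def keyword_search(conn, table, keywords):
    """Return the rows of `table` in which every keyword appears as a token."""
    # The table name is interpolated directly; acceptable for a toy example only.
    hits = []
    for row in conn.execute(f"SELECT * FROM {table}"):
        tokens = set(" ".join(str(v).lower() for v in row if v is not None).split())
        if all(k.lower() in tokens for k in keywords):
            hits.append(row)
    return hits

# Hypothetical single-table database: one wide row per (student, hobby).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER, name TEXT, hobby TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "rao", "tennis"), (2, "siva", "chess"), (3, "rao", "chess")],
)

single_table_hits = keyword_search(conn, "student", ["rao", "tennis"])
print(single_table_hits)
```

Because every row already holds the whole "universal tuple", a plain scan with a containment test is enough; no joins are needed.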

However, most databases are "normalized"; that is, they split a wide "universal tuple" into many small tables. This means that the keywords can be spread over different rows in different tables. Suppose we have two tables, [sid, name] and [sid, hobby] (where sid is the student id). A keyword query "rao tennis" will now have to be answered by checking whether a join of these two tables over sid gives a row that has both rao and tennis in it. [The issues remain the same whether the database is in RDF format or a normal RDBMS one.]
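The "rao tennis" example can be sketched as follows, again with sqlite3. The table names `names` and `hobbies`, the sample rows, and the `keyword_join` helper are assumptions for illustration. Since the system does not know in advance which keyword belongs to which column, the sketch simply tries both assignments.

```python
import sqlite3
from itertools import permutations

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE names   (sid INTEGER, name  TEXT);
    CREATE TABLE hobbies (sid INTEGER, hobby TEXT);
    INSERT INTO names   VALUES (1, 'rao'), (2, 'siva');
    INSERT INTO hobbies VALUES (1, 'tennis'), (2, 'chess');
""")

JOIN_SQL = """
    SELECT n.sid, n.name, h.hobby
    FROM names AS n JOIN hobbies AS h ON n.sid = h.sid
    WHERE n.name = ? AND h.hobby = ?
"""

def keyword_join(conn, keywords):
    """Try each assignment of the two keywords to (name, hobby) and join over sid."""
    hits = set()
    for name_kw, hobby_kw in permutations(keywords, 2):
        hits.update(conn.execute(JOIN_SQL, (name_kw, hobby_kw)).fetchall())
    return sorted(hits)

join_hits = keyword_join(conn, ["rao", "tennis"])
print(join_hits)
```

No single row in either table contains both keywords; only the joined row does. With more tables and more keywords, the number of candidate joins to try grows quickly, which is where the real difficulty lies.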

So, answering keyword queries will require you either to do arbitrary joins during the query processing stage, or to *de-normalize* the database up-front so that you have the full universal relation in front of you (and so can do simple row selection). The former requires more work at query time (inasmuch as it forces you to rewrite the keyword query into a set of join queries), while the latter kills the structure in order to support keyword queries.
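The up-front de-normalization option might look like the sketch below, under the same assumed `names` and `hobbies` tables. The `universal` table and the `select_rows` helper are hypothetical names. Note that the inner join used here silently drops any student with no hobby row; a real system would have to decide how to handle that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE names   (sid INTEGER, name  TEXT);
    CREATE TABLE hobbies (sid INTEGER, hobby TEXT);
    INSERT INTO names   VALUES (1, 'rao'), (2, 'siva');
    INSERT INTO hobbies VALUES (1, 'tennis'), (1, 'chess'), (2, 'tennis');

    -- De-normalize once, before any query arrives.
    CREATE TABLE universal AS
        SELECT n.sid, n.name, h.hobby
        FROM names AS n JOIN hobbies AS h ON n.sid = h.sid;
""")

def select_rows(conn, keywords):
    """Simple row selection over the materialized universal relation."""
    rows = conn.execute("SELECT sid, name, hobby FROM universal").fetchall()
    return [r for r in rows if set(keywords) <= {str(v) for v in r}]

universal_hits = select_rows(conn, ["rao", "tennis"])
universal_size = conn.execute("SELECT COUNT(*) FROM universal").fetchone()[0]
print(universal_hits, universal_size)
```

Query processing is now trivial, but "rao" is stored once per hobby, which is exactly the redundancy that normalization was meant to remove.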

The latter approach, as counter-intuitive as it might look to a database person, is actually the one used by most search engines that allow keyword access to large-scale databases. (In fact, they write each individual universal tuple out as an HTML file. This process is given a fancy name: "surfacing of the deep web".)
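A toy version of surfacing, assuming the universal tuples are already available as Python dicts: each tuple is rendered as a small HTML page (kept as a string here rather than written to a file), and an ordinary inverted index is built over the page text. The `surface`, `build_index`, and `search` helpers are illustrative names, not part of any real search engine.

```python
import html
import re
from collections import defaultdict

# Hypothetical universal tuples, one per (student, hobby) pair.
universal_tuples = [
    {"sid": 1, "name": "rao", "hobby": "tennis"},
    {"sid": 1, "name": "rao", "hobby": "chess"},
    {"sid": 2, "name": "siva", "hobby": "tennis"},
]

def surface(tup):
    """Render one universal tuple as a tiny HTML page."""
    items = "".join(
        f"<li>{html.escape(k)}: {html.escape(str(v))}</li>" for k, v in tup.items()
    )
    return f"<html><body><ul>{items}</ul></body></html>"

def build_index(pages):
    """Map each lowercase token to the set of page ids that contain it."""
    index = defaultdict(set)
    for doc_id, page in enumerate(pages):
        text = re.sub(r"<[^>]+>", " ", page)  # strip the markup
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
    return index

def search(index, keywords):
    """Return ids of pages containing all keywords (no ranking)."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    return sorted(set.intersection(*postings)) if postings else []

pages = [surface(t) for t in universal_tuples]
index = build_index(pages)
surfaced_hits = search(index, ["rao", "tennis"])
print(surfaced_hits)
```

Once the tuples are pages, the column names are just more words: a query for "name" matches every page. That is the loss of structure this approach accepts in exchange for plain IR-style search.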


The former approach, which involves query-time joins to reconstruct universal tuples, started with a system called "BANKS". The problem becomes harder when the primary-key/foreign-key joins between the various tables are lost, as might be the case for web databases. See http://rakaposhi.eas.asu.edu/smartint-icde10.pdf
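To give a flavor of the query-time approach, the sketch below treats each tuple as a graph node, links tuples that share a sid, and looks for a short path connecting a tuple that matches one keyword to a tuple that matches the other. This is a much-simplified illustration and not the actual BANKS algorithm; the node ids, the hand-written `edges` map, and the helper functions are all assumptions.

```python
from collections import deque

# Tuples as graph nodes (hypothetical data).
tuples = {
    "names:1": {"sid": 1, "name": "rao"},
    "names:2": {"sid": 2, "name": "siva"},
    "hobbies:1": {"sid": 1, "hobby": "tennis"},
    "hobbies:2": {"sid": 2, "hobby": "chess"},
}
# Undirected edges standing in for the key/foreign-key links over sid.
edges = {
    "names:1": ["hobbies:1"], "hobbies:1": ["names:1"],
    "names:2": ["hobbies:2"], "hobbies:2": ["names:2"],
}

def nodes_matching(keyword):
    """Ids of the tuples that contain the keyword as a value."""
    return [n for n, t in tuples.items() if keyword in map(str, t.values())]

def shortest_path(start, goals):
    """Breadth-first search from `start` to the nearest node in `goals`."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] in goals:
            return path
        for nxt in edges.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def connect(kw1, kw2):
    """Paths linking a tuple that matches kw1 to a tuple that matches kw2."""
    goals = set(nodes_matching(kw2))
    paths = [shortest_path(s, goals) for s in nodes_matching(kw1)]
    return [p for p in paths if p]

connected = connect("rao", "tennis")
print(connected)
```

Each returned path corresponds to a join that reconstructs a universal tuple containing both keywords. If the edges are missing, as when the key/foreign-key links are lost, there is nothing to traverse and the links would first have to be recovered.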



=======
As for the answer to the second question, whether keyword queries get converted to SQL-style queries: "yes" if you use a BANKS-style approach and "no" if you do surfacing.


Hope this answers your question.

Rao








On Thu, Apr 29, 2010 at 1:27 PM, Siva N <snatara5@asu.edu> wrote:

Professor,

I have some questions about supporting search on RDF data.

I understand that RDF/RDFS data defines an ontology and adds more semantics to the data, and is much closer to structured data such as that in a relational DB. So does that mean that all queries to RDF data must be SQL-style queries, and that IR-style keyword search may not be applicable?

So if we were to try developing a keyword search engine on RDF data, would it have to provide a keyword search interface and then internally convert the keywords into SQL-style queries to retrieve results from the RDF data?

Is it the case that plain IR-style keyword search on RDF data does not fully utilize the structured nature of the data, and are ontology-based search engines always best suited for RDF data?

Thanks,
Siva
--
Graduate Student, MS Computer Science
School of Computing and Informatics
Mobile : 520 582 4479
