10.1.1.44.6984

10.1.1.44.6984 - Latent Semantic Indexing: A Probabilistic Analysis Christos H. Papadimitriou Computer Science Division, U.C. Berke


Latent Semantic Indexing: A Probabilistic Analysis

Christos H. Papadimitriou
Computer Science Division, U.C. Berkeley
christos@cs.berkeley.edu

Prabhakar Raghavan
IBM Almaden Research Center
pragh@almaden.ibm.com

Hisao Tamaki
Computer Science Department, Meiji University
tamaki@cs.meiji.ac.jp

Santosh Vempala
Department of Mathematics, M.I.T.
vempa[email protected]ath.mit.edu

December,

Abstract

Latent semantic indexing (LSI) is an information retrieval technique based on the spectral analysis of the term-document matrix, whose empirical success had heretofore been without rigorous prediction and explanation. We prove that, under certain conditions, LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance. We also propose the technique of random projection as a way of speeding up LSI. We complement our theorems with encouraging experimental results. We also argue that our results may be viewed in a more general framework, as a theoretical basis for the use of spectral methods in a wider class of applications such as collaborative filtering.

Introduction

The field of information retrieval has traditionally been considered outside the scope of database theory. While database theory deals with queries that are precise predicates (the so-called "employee-manager-salary paradigm"), in information retrieval we have the rather nebulous and ill-defined concept of "relevance", which depends in intricate ways on the intent of the user and the nature of the corpus. Evidently, very little theory can be built on this basis. (See [, ] for surveys on information retrieval, including discussions of the technique that is the focus of this paper, from database and theoretical points of view, respectively; [ , ] are classical texts on the subject of information retrieval.)
However, the field of information retrieval has been evolving in directions that bring it closer to databases. Information retrieval systems are increasingly being built on relational (or object-relational) database systems, rather than on flat text and index files. Another important change is the dramatic expansion of the scope of information retrieval, with the advent of multimedia, the internet, and globalized information; database concepts and some theory have started to find fertile ground there (see for example [ ,,], as well as a record number of information retrieval papers in the SIGMOD Proceedings). And, perhaps as importantly, the stakes have become too high for database theory to lightly pass this field by. Secondly, the techniques employed in information retrieval have become more mathematical and sophisticated, more plausibly amenable to analytical treatment. The present paper is an attempt to treat rigorously one such technique, latent semantic indexing (LSI), introduced next. Thirdly, information retrieva...
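The abstract describes LSI as a technique based on the spectral analysis of the term-document matrix. The paper's own formal definition is not part of this preview, so the following is only a minimal sketch of the general idea, under stated assumptions: the six-term, four-document corpus is invented for illustration, the function names are hypothetical, and plain power iteration is swapped in for a library SVD routine so that the example needs nothing beyond the standard library. It projects documents and a query onto the top-k left singular vectors and compares cosine similarities before and after.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def top_left_singular_vectors(A, k, iters=200, seed=0):
    """Approximate the top-k left singular vectors of A (terms x docs).

    Hypothetical helper: power iteration on the term-term matrix A A^T,
    re-orthogonalizing against the vectors already found (Gram-Schmidt).
    """
    rng = random.Random(seed)
    m = len(A)
    B = [[dot(A[i], A[j]) for j in range(m)] for i in range(m)]  # A A^T
    basis = []
    for _ in range(k):
        v = normalize([rng.gauss(0.0, 1.0) for _ in range(m)])
        for _ in range(iters):
            v = matvec(B, v)
            for u in basis:
                c = dot(v, u)
                v = [x - c * y for x, y in zip(v, u)]
            v = normalize(v)
        basis.append(v)
    return basis


def project(basis, x):
    """Coordinates of term-space vector x in the k-dimensional subspace."""
    return [dot(u, x) for u in basis]


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu > 0 and nv > 0 else 0.0


# Toy corpus (an assumption for illustration, not data from the paper).
# Rows are terms, columns are documents.
TERMS = ["car", "automobile", "engine", "star", "galaxy", "telescope"]
A = [
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # star
    [0, 0, 1, 0],  # galaxy
    [0, 0, 0, 1],  # telescope
]
docs = [[row[j] for row in A] for j in range(4)]

basis = top_left_singular_vectors(A, k=2)
query = [1, 0, 0, 0, 0, 0]  # the single term "car"

raw = [cosine(query, d) for d in docs]
lsi = [cosine(project(basis, query), project(basis, d)) for d in docs]
print("raw term space:", [round(x, 3) for x in raw])
print("rank-2 space:  ", [round(x, 3) for x in lsi])
```

On this toy matrix the query "car" shares no term with the second document ("automobile", "engine"), so their cosine similarity in raw term space is zero; after projection onto the rank-2 subspace the two become nearly parallel, while the two astronomy documents stay nearly orthogonal to the query. This is the kind of behavior the abstract refers to as capturing the underlying semantics of the corpus, though the paper's conditions and guarantees are not reproduced here.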

This note was uploaded on 11/12/2010 for the course CSCI 271 taught by Professor Wilczynski during the Spring '08 term at USC.

