
Search Engines: 11-442 / 11-642
HW4: Learning to Rank
Due Nov 5, 11:59pm

The purpose of this assignment is to gain experience with using a machine learning algorithm to train a feature-based retrieval model.

All students must add learning-to-rank capabilities to the software that you developed for homework assignments 1-3. 11-442 students must submit a two-page report that contains a statement of collaboration and originality and a description of your custom features. 11-642 students must also use the software to conduct retrieval experiments and write a report that discusses the experimental results.

Software Development

This homework adds four new capabilities to the search engine that you developed for previous homework assignments. Your program must:

1. Read training queries and relevance judgments from input files;
2. Calculate feature vectors for training documents;
3. Write the feature vectors to a file;
4. Call SVMrank to train a retrieval model;
5. Read test queries from an input file;
6. Use BM25 to get initial rankings for test queries;
7. Calculate feature vectors for the top 100 ranked documents (for each query);
8. Write the feature vectors to a file;
9. Call SVMrank to calculate scores for test documents;
10. Read the scores produced by SVMrank; and
11. Write the final ranking in trec_eval input format.

Your program must implement the following features.

f1: Spam score for d (read from index).
    Hint: The spam score is stored in your index as the spamScore attribute. It is returned as a string.
        int spamScore = Integer.parseInt (Idx.getAttribute ("spamScore", docid));
    The score indicates the percentage of the documents in the corpus that are "spammier." That is, the spammiest 1% of the documents have percentile-score=0, the next spammiest have percentile-score=1, and so on. The least spammy 1% have percentile-score=99.

f2: URL depth for d (number of '/' in the rawUrl field).
    Hint: The raw URL is stored in your index as the rawUrl attribute.
        String rawUrl = Idx.getAttribute ("rawUrl", docid);

f3: FromWikipedia score for d (1 if the rawUrl contains "wikipedia.org", otherwise 0).

f4: PageRank score for d (read from index).
    Hint: The PageRank is stored in your index as the PageRank attribute. It is returned as a string.
        float prScore = Float.parseFloat (Idx.getAttribute ("PageRank", docid));
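As a rough illustration of the two URL-based features (f2 and f3), the sketch below works on a raw URL string such as the one returned by Idx.getAttribute ("rawUrl", docid). The class name UrlFeatures and its method names are hypothetical and are not part of the assignment's code. The sketch assumes the URL is non-null and that every '/' counts toward the depth, including the two in the scheme.

```java
// Hypothetical helper class for illustration only; the class and method
// names are assumptions, not part of the assignment's code base.
final class UrlFeatures {
    private UrlFeatures() {}

    /** f2: number of '/' characters in the raw URL (scheme slashes included). */
    static int urlDepth(String rawUrl) {
        int depth = 0;
        for (int i = 0; i < rawUrl.length(); i++) {
            if (rawUrl.charAt(i) == '/') {
                depth++;
            }
        }
        return depth;
    }

    /** f3: 1 if the raw URL contains "wikipedia.org", otherwise 0. */
    static int fromWikipedia(String rawUrl) {
        return rawUrl.contains("wikipedia.org") ? 1 : 0;
    }
}
```

Under these assumptions, urlDepth("http://en.wikipedia.org/wiki/Apple") counts four slashes, and fromWikipedia returns 1 for the same URL.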
f5: BM25 score for <q, d_body>.

f6: Indri score for <q, d_body>.

f7: Term overlap score (also called Coordination Match) for <q, d_body>.
    Hint: Term overlap is defined as the percentage of query terms that match the document field.

f8: BM25 score for <q, d_title>.

f9: Indri score for <q, d_title>.

f10: Term overlap score (also called Coordination Match) for <q, d_title>.
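The term overlap features (f7 and f10) can be sketched as follows, using the definition in the hint: the share of query terms that appear in the document field. The class name TermOverlap, the method name score, and the use of plain Java collections are assumptions made for illustration; an actual implementation would read the field's terms from the index. The sketch returns a fraction between 0 and 1 rather than a value between 0 and 100, counts each occurrence of a repeated query term, and returns 0 for an empty query. Those choices are also assumptions.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper class for illustration only; the names and the
// collection-based inputs are assumptions, not the assignment's API.
final class TermOverlap {
    private TermOverlap() {}

    /**
     * Fraction of query terms that occur in the given document field.
     * queryTerms: query terms after the same processing as the index.
     * fieldTerms: distinct terms of one document field (e.g. body or title).
     */
    static double score(List<String> queryTerms, Set<String> fieldTerms) {
        if (queryTerms.isEmpty()) {
            return 0.0; // assumed convention for an empty query
        }
        int matched = 0;
        for (String term : queryTerms) {
            if (fieldTerms.contains(term)) {
                matched++;
            }
        }
        return (double) matched / queryTerms.size();
    }
}
```

With a three-term query of which two terms occur in the field, this sketch yields two thirds.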

  • Spring '17
  • Callan
