Suppose you have joined a search engine development team to design a search algorithm based on both the Vector
model and the Boolean model.
You have collected the following three unstructured documents and plan to apply an indexing technique to convert them into an inverted index.
Doc 1: data science is a field to use scientific method, process, algorithm, system to extract knowledge.
Doc 2: data mining is the process to discover pattern in large data to involve method at the database system.
Doc 3: information system is the study of network of hardware and software that people use to process data.
To answer the questions below, you have to provide the detailed procedures step by step.
Question 1.1: In the process of creating the inverted index, please complete the following steps:
Remove all stop words and punctuation.
The list of stop words for this task is provided as follows:
Is, An, That, Use, And, To, From, In, Both, Of, At, The
Question 1.2: Create a merged inverted list including the within-document frequencies for each term.
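The stop-word removal and merged inverted list can be sketched in Python as follows. This is only an illustration under a few assumptions: "The" is treated as the last entry of the stop-word list, tokens are lowercased runs of letters, no stemming is applied, and the names `DOCS`, `STOP_WORDS`, `tokenize` and `build_inverted_list` are made up for this example. Note that "a" is not in the given stop-word list, so it survives as a term.

```python
import re
from collections import Counter, defaultdict

# The three documents from the task, keyed by document number.
DOCS = {
    1: "data science is a field to use scientific method, process, "
       "algorithm, system to extract knowledge.",
    2: "data mining is the process to discover pattern in large data "
       "to involve method at the database system.",
    3: "information system is the study of network of hardware and "
       "software that people use to process data.",
}

# Stop words from the task; "the" is an assumption (see the lead-in).
STOP_WORDS = {"is", "an", "that", "use", "and", "to", "from",
              "in", "both", "of", "at", "the"}


def tokenize(text):
    """Lowercase, drop punctuation, and remove stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS]


def build_inverted_list(docs):
    """Return {term: {doc_id: within-document frequency}}, sorted by term."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return dict(sorted(index.items()))


inverted = build_inverted_list(DOCS)
for term, postings in inverted.items():
    print(term, postings)
```

Each entry maps a term to the documents containing it together with the count in that document, which is the "merged" form the question asks for.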
Question 1.3: Use the index created above to create a dictionary and the related posting file.
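One possible way to split the index into a dictionary and a posting file is sketched below. The layout is an assumption, not the only valid answer: the dictionary stores each term with its document frequency and an offset, and the posting file is a flat list of (document, frequency) pairs. The names `dictionary`, `postings` and `lookup` are illustrative, and "the" is again treated as a stop word.

```python
import re
from collections import Counter

DOCS = {
    1: "data science is a field to use scientific method, process, "
       "algorithm, system to extract knowledge.",
    2: "data mining is the process to discover pattern in large data "
       "to involve method at the database system.",
    3: "information system is the study of network of hardware and "
       "software that people use to process data.",
}
STOP_WORDS = {"is", "an", "that", "use", "and", "to", "from",
              "in", "both", "of", "at", "the"}  # "the" is an assumption

# Merged inverted list: term -> {doc_id: within-document frequency}.
inverted = {}
for doc_id, text in DOCS.items():
    terms = [t for t in re.findall(r"[a-z]+", text.lower())
             if t not in STOP_WORDS]
    for term, tf in Counter(terms).items():
        inverted.setdefault(term, {})[doc_id] = tf

dictionary = {}  # term -> (document frequency, offset into postings)
postings = []    # flat posting file of (doc_id, tf) pairs
for term in sorted(inverted):
    entries = sorted(inverted[term].items())
    dictionary[term] = (len(entries), len(postings))
    postings.extend(entries)


def lookup(term):
    """Fetch a term's postings via its dictionary entry."""
    df, offset = dictionary.get(term, (0, 0))
    return postings[offset:offset + df]


print(dictionary["data"], lookup("data"))
```

Keeping only the offset and document frequency in the dictionary mirrors how the dictionary stays small enough for memory while the postings can live in a separate file.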
Question 1.4: Please design three Boolean queries (e.g., web AND search) and list the relevant documents for each query. Each query must contain at least two keywords, and no keyword may appear in only one document.
Question 1.5: Please use the Vector model to query the inverted index, and compare the result with the Boolean model.
(Hint: you can use cosine similarity and set a similarity threshold.)
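A rough sketch of how the two models could be compared on the same query is given below. Several choices here are assumptions rather than part of the task: the vectors use raw term frequencies (no IDF weighting), the threshold of 0.3 is arbitrary, the sample query "data method" is only an example, "the" is treated as a stop word, and the helper names are hypothetical.

```python
import math
import re
from collections import Counter

DOCS = {
    1: "data science is a field to use scientific method, process, "
       "algorithm, system to extract knowledge.",
    2: "data mining is the process to discover pattern in large data "
       "to involve method at the database system.",
    3: "information system is the study of network of hardware and "
       "software that people use to process data.",
}
STOP_WORDS = {"is", "an", "that", "use", "and", "to", "from",
              "in", "both", "of", "at", "the"}  # "the" is an assumption


def terms(text):
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS]


# Per-document term-frequency vectors (same data as the inverted index).
doc_vectors = {d: Counter(terms(text)) for d, text in DOCS.items()}


def boolean_and(*keywords):
    """Documents that contain every keyword."""
    return {d for d, vec in doc_vectors.items()
            if all(k in vec for k in keywords)}


def boolean_or(*keywords):
    """Documents that contain at least one keyword."""
    return {d for d, vec in doc_vectors.items()
            if any(k in vec for k in keywords)}


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def vector_search(query, threshold):
    """Rank documents whose similarity to the query meets the threshold."""
    q = Counter(terms(query))
    scores = {d: cosine(q, vec) for d, vec in doc_vectors.items()}
    hits = [d for d in scores if scores[d] >= threshold]
    return sorted(hits, key=lambda d: scores[d], reverse=True), scores


ranked, scores = vector_search("data method", threshold=0.3)
print("Boolean AND:", boolean_and("data", "method"))
print("Vector model:", ranked, scores)
```

The Boolean model returns an unordered set of exact matches, while the Vector model returns a ranked list whose membership depends on the chosen threshold, so lowering the threshold can admit documents that match only some of the keywords.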