Write a Java program that interacts with a user to process information retrieval queries. First, prompt for the directory containing the collection of data; then build an inverted index or incidence matrix.

Write a Java program that interacts with a user to process information retrieval queries. First, prompt for the directory containing the collection of data. Then build an inverted index or incidence matrix. Each entry in the inverted index should consist of a vocabulary word, the word’s document frequency, and the word’s postings. Each posting should contain a document ID and the term frequency of the word with respect to the document.
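The inverted index described above could be sketched roughly as follows. This is a minimal illustration, not a reference solution: the class name `InvertedIndexSketch`, its method names, the tokenization rule (lowercase, split on non-alphanumeric characters), and the choice to number documents in sorted file order are all assumptions, since the question does not specify them.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical inverted index: term -> postings (document ID -> term frequency). */
class InvertedIndexSketch {
    // TreeMap keeps the vocabulary and each postings list in sorted order.
    private final Map<String, TreeMap<Integer, Integer>> index = new TreeMap<>();

    /** Tokenizes the text (assumed rule) and counts each term for this document. */
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) continue; // split can yield a leading empty token
            index.computeIfAbsent(token, t -> new TreeMap<>())
                 .merge(docId, 1, Integer::sum);
        }
    }

    /** Document frequency: the number of documents whose postings mention the term. */
    int documentFrequency(String term) {
        Map<Integer, Integer> postings = index.get(term);
        return postings == null ? 0 : postings.size();
    }

    /** Term frequency of the term in one document (0 if the term is absent). */
    int termFrequency(String term, int docId) {
        Map<Integer, Integer> postings = index.get(term);
        return postings == null ? 0 : postings.getOrDefault(docId, 0);
    }

    /** Number of distinct vocabulary terms indexed so far. */
    int vocabularySize() {
        return index.size();
    }

    /** Indexes every regular file in the directory; IDs follow sorted file order (assumption). */
    void loadDirectory(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> stream = Files.list(dir)) {
            files = stream.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        int docId = 0;
        for (Path file : files) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            addDocument(docId++, text);
        }
    }

    public static void main(String[] args) throws IOException {
        Scanner in = new Scanner(System.in);
        System.out.print("Directory containing the collection: ");
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.loadDirectory(Paths.get(in.nextLine().trim()));
        System.out.println("Indexed " + idx.vocabularySize() + " distinct terms.");
    }
}
```

Storing the postings as a map from document ID to count means the document frequency is simply the size of that map, so it never has to be tracked separately.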

Alternatively, you may build a (non-boolean) incidence matrix. This would contain a table where each row corresponds to a vocabulary word, and each column corresponds to a document. Each cell in the table contains the term frequency (which is an integer representing the number of times the row’s word appears in the column’s document). With that information, the term frequency and inverse document frequency can be calculated when needed.
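As a rough sketch of how the document frequency, inverse document frequency, and a TF-IDF score might be derived from such a matrix on demand: the class `IncidenceMatrixSketch` and its method names are hypothetical, and the weighting used here (raw term frequency multiplied by log10(N / df)) is one common variant chosen as an assumption, because the question does not say which TF-IDF formula to use.

```java
/** Hypothetical helpers over a term-by-document matrix of raw term frequencies. */
class IncidenceMatrixSketch {

    /** Document frequency: the number of non-zero cells in the term's row. */
    static int documentFrequency(int[] row) {
        int df = 0;
        for (int tf : row) {
            if (tf > 0) df++;
        }
        return df;
    }

    /** Inverse document frequency, assumed here to be log10(N / df); 0 if the term never occurs. */
    static double inverseDocumentFrequency(int[] row) {
        int df = documentFrequency(row);
        return df == 0 ? 0.0 : Math.log10((double) row.length / df);
    }

    /** TF-IDF for one cell: raw term frequency times the row's IDF (assumed weighting). */
    static double tfIdf(int[][] matrix, int termRow, int docCol) {
        return matrix[termRow][docCol] * inverseDocumentFrequency(matrix[termRow]);
    }

    public static void main(String[] args) {
        // Rows are vocabulary words, columns are documents (made-up counts).
        int[][] matrix = {
            {2, 0, 1, 0},
            {1, 1, 1, 1},
        };
        System.out.println(tfIdf(matrix, 0, 0));
        System.out.println(tfIdf(matrix, 1, 0));
    }
}
```

With this weighting, a word that appears in every document (the second row above) gets an IDF of zero, so it contributes nothing to the score regardless of its term frequency.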

Next, you will need to build the permuterm index, in which each permuterm points back to the original vocabulary term. Thus, you will need an array where each record contains a permuterm and the vocabulary term that generated it.

Finally, you will need to build a querying component. The program should prompt the user for a query term and read it in. If the query contains an asterisk, your program should rotate the query into the permuterm form in which the asterisk is at the end. It should then search the permuterm index for the matching permuterms, which indicate the vocabulary terms to look up in the inverted index/incidence matrix. At that point, your program can compute the TF-IDF score for each matching vocabulary term and return the scores to the user.
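The permuterm lookup can be illustrated with a small sketch. Several things here are assumptions rather than part of the question: the class name `PermutermIndexSketch`, the use of `$` as the end-of-term marker, support for only a single asterisk per query, and the use of a sorted `TreeMap` in place of the array of records the question asks for (a sorted array with binary search would serve the same purpose).

```java
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical permuterm index: each rotation of term + "$" maps back to the term. */
class PermutermIndexSketch {
    // Sorted map so that all permuterms sharing a prefix form one contiguous range.
    private final TreeMap<String, String> rotations = new TreeMap<>();

    /** Adds every rotation of term + "$" (the assumed end marker). */
    void addTerm(String term) {
        String marked = term + "$";
        for (int i = 0; i < marked.length(); i++) {
            rotations.put(marked.substring(i) + marked.substring(0, i), term);
        }
    }

    /**
     * Rotates a single-asterisk query so the asterisk would come last, then drops it.
     * For example, "h*o" becomes "h*o$", rotates to "o$h*", and yields the prefix "o$h".
     */
    static String wildcardPrefix(String query) {
        String marked = query + "$";
        int star = marked.indexOf('*');
        return marked.substring(star + 1) + marked.substring(0, star);
    }

    /** Returns the vocabulary terms matching the query (exact match if no asterisk). */
    Set<String> lookup(String query) {
        Set<String> terms = new TreeSet<>();
        if (query.indexOf('*') < 0) {
            String term = rotations.get(query + "$");
            if (term != null) terms.add(term);
            return terms;
        }
        String prefix = wildcardPrefix(query);
        // All keys that start with the prefix lie in [prefix, prefix + '\uffff').
        terms.addAll(rotations.subMap(prefix, prefix + Character.MAX_VALUE).values());
        return terms;
    }

    public static void main(String[] args) {
        PermutermIndexSketch index = new PermutermIndexSketch();
        for (String term : new String[] {"hello", "help", "halo"}) {
            index.addTerm(term);
        }
        System.out.println(index.lookup("h*o")); // prints [halo, hello]
        System.out.println(index.lookup("he*")); // prints [hello, help]
    }
}
```

Each term returned by `lookup` is a vocabulary term that can then be looked up in the inverted index or incidence matrix to compute its TF-IDF score.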
