IMPLEMENTASI ALGORITMA RAPID AUTOMATIC KEYWORD EXTRACTION (RAKE) PADA PEMBUATAN INDEKS BUKU

Writer(s) : Chindy Christie Davina | Ingrid Nurtanio

Informatics Engineering | Informatics Engineering (S1, undergraduate)

Abstract

CHINDY CHRISTIE DAVINA. Implementation of the Rapid Automatic Keyword Extraction (RAKE) Algorithm in Book Index Creation (supervised by Ingrid Nurtanio). An index is a list of items (such as topics or names) discussed in a printed work, giving for each item the page number where it can be found. Index creation currently still requires a great deal of human labor, which makes it time-consuming and error-prone. Computational algorithms are expected to offer a more efficient alternative, and the Rapid Automatic Keyword Extraction (RAKE) algorithm is a candidate for speeding up and simplifying the creation of book indexes. This research aims to implement RAKE to create book indexes automatically and to evaluate the generated indexes by comparing them with manually created ones. Several scenarios were tried on the algorithm: using Part-of-Speech (PoS) tagging to detect words that act as phrase delimiters in addition to stopwords; filtering the keywords generated by RAKE; adding cosine similarity and capital-letter count features (with experiments on the weighting of both); and taking the top N ranked keywords. The evaluation compared the indexes generated by the system with the existing index at the back of each evaluation book. The results showed that the best scenario was to take the top 20 ranked index entries, use PoS tagging to detect additional phrase delimiters, apply keyword filtering, and weight the cosine similarity feature at 1 and the capital-letter count feature at 8.
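For orientation, the core RAKE scoring step can be sketched as follows. This is a minimal illustration of the commonly described RAKE procedure (split text into candidate phrases at stopwords and punctuation, score each word by degree divided by frequency, sum word scores per phrase), not the thesis implementation. The tiny stopword list and the function names `candidate_phrases`, `rake_scores`, and `top_n` are assumptions made for this example, and the PoS tagging, filtering, cosine similarity, and capital-letter features described above are not included.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real runs use a fuller list).
STOPWORDS = {"a", "an", "the", "of", "and", "is", "are", "in", "for", "to",
             "that", "can", "be", "each", "where"}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Punctuation always ends a phrase.
    for fragment in re.split(r"[.,;:!?()\[\]]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9][a-z0-9'-]*", fragment):
            if word in STOPWORDS:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """Return (phrase, score) pairs sorted by descending score."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word occurs
    degree = defaultdict(int)  # summed length of the phrases containing it
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def top_n(text, n=20):
    """Keep only the n best-ranked phrases, as in the top-N scenario."""
    return [phrase for phrase, _ in rake_scores(text)[:n]]
```

Calling `top_n(page_text, 20)` on the text of one page would give up to 20 candidate index entries for that page; how the thesis combines these scores with its additional weighted features is not specified in the abstract.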
In the book EM Modeling of Antennas and RF Components for Wireless Communication Systems, the average index rank was 8.3968, precision 0.02723, recall 0.45818, and f-measure 0.05141. In High-Performance Scientific Computing, the average index rank was 8.3955, precision 0.02177, recall 0.35733, and f-measure 0.04105. In Scientific Computing with MATLAB and Octave, the average index rank was 7.6805, precision 0.04653, recall 0.42289, and f-measure 0.08383. In Introduction to Deep Learning, the average index rank was 8.7226, precision 0.04115, recall 0.46441, and f-measure 0.07561.
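The evaluation metrics reported above can be illustrated with a short sketch. It assumes that a generated entry counts as correct only when it exactly matches a reference index entry after lowercasing and trimming; the abstract does not say how matches were determined or how values were averaged, so this matching rule and the names `f_measure` and `evaluate_index` are assumptions for illustration only.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def evaluate_index(generated, reference):
    """Compare generated index entries with a manually created index.

    Assumption: entries match only if equal after lowercasing and trimming.
    """
    gen = {term.strip().lower() for term in generated}
    ref = {term.strip().lower() for term in reference}
    matched = gen & ref
    precision = len(matched) / len(gen) if gen else 0.0
    recall = len(matched) / len(ref) if ref else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical entries, not taken from the evaluation books.
p, r, f = evaluate_index(["Antenna", "matrix", "loop"],
                         ["antenna", "Matrix", "fft", "mesh"])
print(round(p, 4), round(r, 4), round(f, 4))
```

As a consistency check, applying `f_measure` to the reported precision and recall of each book gives values that agree with the reported f-measures to about four decimal places, for example `f_measure(0.02723, 0.45818)` is approximately 0.0514.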

Keyword(s): RAKE, Rapid Automatic Keyword Extraction, book indexes, cosine similarity, capital letter, keyword

Year : 2020