Implementation of Clustering and Similarity Analysis for Detecting Content Similarity in Student Final Projects

Author(s) : Amil Ahmad Ilham | Anugrayani Bustamin | Iqra' Aswad | Fadhilah Armin

IOP Conference Series: Materials Science and Engineering

Abstract

To complete their studies, students are required to submit final projects. At some universities, these final projects do not have to be submitted for publication; instead, the reports are stored in a local database. As the number of final projects in the local database grows, similar content may appear across documents. Commercial tools cannot be used to detect this similarity because the documents are unpublished. This paper proposes a system to detect content similarity among documents stored in a local database. Given the number of stored documents, the system works in two steps: first, it clusters the documents to find the most closely related ones; second, it measures content similarity among the selected documents. The experimental results show that the system successfully clusters documents and detects content similarity using the TF-IDF and cosine similarity algorithms. The system is limited to processing documents written in Bahasa.
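The two-step idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, so step one is approximated here by a simple nearest-neighbour shortlist; the whitespace tokenizer, the smoothed IDF formula, the function names, and the sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; a real system would likely
    # add stemming and stop-word removal suited to the document language.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    # Assumption: smoothed IDF, ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_related(vectors, query_index, top_k=2):
    # Stand-in for the clustering step (hypothetical): shortlist the top_k
    # documents closest to the query document.
    scored = [(cosine_similarity(vectors[query_index], v), i)
              for i, v in enumerate(vectors) if i != query_index]
    scored.sort(reverse=True)
    return [i for _, i in scored[:top_k]]


# Hypothetical sample documents, for illustration only.
docs = [
    "sistem deteksi kemiripan dokumen tugas akhir mahasiswa",
    "deteksi kemiripan dokumen tugas akhir dengan cosine similarity",
    "analisis jaringan sensor nirkabel untuk pertanian",
    "perancangan jaringan sensor nirkabel hemat energi",
]
vectors = tfidf_vectors(docs)

# Step 1: shortlist the documents most related to document 0.
shortlist = most_related(vectors, 0, top_k=2)

# Step 2: score content similarity only against the shortlisted documents.
for i in shortlist:
    print(i, round(cosine_similarity(vectors[0], vectors[i]), 3))
```

Restricting the detailed comparison to a shortlist reflects the motivation given in the abstract: with many stored documents, comparing every pair in full becomes costly, so a cheaper grouping step narrows the candidates first.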