Amir Hamzah (Department of Informatics Engineering, Faculty of Industrial Technology, ISTA AKPRIND Yogyakarta), Adhi Susanto (Department of Electrical Engineering, Faculty of Engineering, Universitas Gadjah Mada), F. Soesianto (Department of Electrical Engineering, Faculty of Engineering, Universitas Gadjah Mada), Jazi Eko Istiyanto (Department of Physics, Faculty of Mathematics and Natural Sciences, UGM)


The rapid growth in the volume of text information has made information retrieval more difficult, particularly for models based on word matching. Page-rank based search results typically produce a very long list, so users have no time to browse it for relevant documents that may rank low in the list. Synonymy causes relevant documents not to be retrieved, whereas polysemy causes non-relevant documents to be retrieved. According to the cluster hypothesis, clustering the search results before they are presented to the user can increase the effectiveness of retrieval.
This study examines the application of document clustering to improve retrieval effectiveness by clustering the search results before they are presented to the user. The results are then presented to the user by cluster, with each cluster identified by a cluster label. Three clustering algorithms from the partitional approach (K-Means, Bisecting K-Means and Buckshot) and the hierarchical agglomerative approach with two cluster similarity functions (UPGMA and Complete Link) were chosen. Performance was measured using the F-measure, a metric derived from the precision and recall of retrieval. The test collections consist of 1000 news documents and 350 academic abstract documents.
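As an illustration of the performance metric, the following is a minimal sketch of how an F-measure can be derived from precision and recall for a single query. The abstract does not state which weighting was used, so the balanced form F = 2PR / (P + R) is an assumption here, and the function name and document identifiers are hypothetical.

```python
def f_measure(retrieved, relevant):
    """Return (precision, recall, F) for one query.

    Assumes the balanced F-measure, F = 2PR / (P + R); the weighting
    actually used in the study is not given in the abstract.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    if hits == 0:
        # No overlap (or an empty set): all three values are zero.
        return 0.0, 0.0, 0.0
    precision = hits / len(retrieved)  # fraction of retrieved docs that are relevant
    recall = hits / len(relevant)      # fraction of relevant docs that were retrieved
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Hypothetical document identifiers, for illustration only.
p, r, f = f_measure(["d1", "d2", "d3", "d4"], ["d2", "d4", "d5"])
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

The same function can be applied to the documents shown at the top of a ranked list and to the documents in a selected cluster, which allows the two presentation methods to be compared on the same scale.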
The results show that presenting search results by cluster increased the number of relevant documents in the top ranks. The improvement was statistically significant compared to the page-rank method. The improvement in F-measure as a performance metric is about 14.34% for news documents and 28.18% for abstract documents.

Keywords: search result clustering, retrieval effectiveness, F-measure.
