Semantic Apparatus – A state-of-the-art survey on semantic similarity for document clustering using GloVe and density-based algorithms

Cited by Lee Sonogan


Abstract by Shapol M. Mohammed, Karwan Jacksi, Subhi R. M. Zeebaree

Semantic similarity is the process of identifying relevant data by meaning. The traditional way of identifying document similarity relies on synonymous keywords and syntax. In comparison, semantic similarity finds similar data using the meaning of words and semantics. Clustering is the grouping of objects that share the same features and properties into a cluster, separate from objects that have different features and properties. In semantic document clustering, documents are clustered using semantic similarity techniques together with similarity measurements. One common approach to clustering documents is density-based clustering, which uses the density of data points as its main strategy for measuring the similarity between them. In this paper, a state-of-the-art survey is presented to analyze the density-based algorithms for clustering documents. Furthermore, the similarity and evaluation measures used with the selected algorithms are investigated to identify the common ones. The review revealed that the most used density-based algorithms in document clustering are DBSCAN and DPC. The most effective similarity measurement used with density-based algorithms, specifically DBSCAN and DPC, is cosine similarity, with F-measure for performance and accuracy evaluation.
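
To make the combination named in the abstract concrete, here is a minimal sketch of DBSCAN over cosine distance, scored with an F-measure. It is an illustration under stated assumptions, not the method of any paper in the survey: documents are represented as plain term-count vectors rather than GloVe embeddings, the `eps` and `min_pts` values and the toy documents are made up, and `pairwise_f_measure` is one of several F-measure variants used for clustering (the pairwise one).

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dbscan(vectors, eps, min_pts):
    """Minimal DBSCAN using cosine distance (1 - cosine similarity).

    Returns one label per vector; -1 marks noise.
    """
    n = len(vectors)

    def neighbors(i):
        # All points (including i itself) within distance eps of point i.
        return [j for j in range(n)
                if 1.0 - cosine_similarity(vectors[i], vectors[j]) <= eps]

    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is a core point, keep expanding
                queue.extend(j_nbrs)
    return labels


def pairwise_f_measure(predicted, truth):
    """Pairwise F-measure: a pair counts as positive if both items share a cluster."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = predicted[i] == predicted[j] and predicted[i] != -1
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy corpus and hand-assigned topics (both hypothetical).
docs = [
    "cats purr and cats sleep",
    "cats sleep all day",
    "stock markets fell today",
    "stock markets rose today",
    "quantum entanglement experiment",
]
truth = ["pets", "pets", "finance", "finance", "physics"]

vectors = [Counter(d.split()) for d in docs]
labels = dbscan(vectors, eps=0.6, min_pts=2)
score = pairwise_f_measure(labels, truth)
```

On this toy corpus the two pet documents and the two finance documents each form a cluster, and the physics document is left as noise. Swapping the count vectors for averaged word embeddings would keep the rest of the sketch unchanged, since only `cosine_similarity` touches the representation.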

Publication: Indonesian Journal of Electrical Engineering and Computer Science (Peer-Reviewed Journal)

Pub Date: April 2020. DOI: 10.11591/ijeecs.v22.i1.pp552-562

Keywords: Survey, Semantic Similarity, Document clustering, density-based algorithms (Plenty more sections and references in this research article)
