SSDR: An Algorithm for Clustering Categorical Data Using Rough Set Theory

B. K. Tripathy; Adhir Ghosh

Abstract

A large number of clustering algorithms are now available to group objects with similar characteristics, but many of them are difficult to apply to categorical data. Some existing algorithms cannot handle categorical data, while others cannot handle uncertainty; many also suffer from stability and efficiency problems. This motivated the development of algorithms that cluster categorical data and also deal with uncertainty. In 2007, an algorithm termed MMR was proposed [3], which uses rough set theory concepts to address these problems in clustering categorical data. In 2009, this algorithm was improved to develop MMeR [2], which can handle hybrid data. In 2011, MMeR was in turn improved to develop an algorithm called SDR [22], which can also handle hybrid data. The last two algorithms handle uncertainty and categorical data at the same time, but SDR is more efficient than both MMeR and MMR. In this paper, we propose a new algorithm in this sequence, which we call SSDR (Standard deviation of Standard Deviation Roughness), and which performs better than all its predecessors: MMR, MMeR and SDR. It handles numerical and categorical data simultaneously while also taking care of uncertainty, and it gives better performance when tested on well-known datasets.