Realization of Clustering with Golay Code Transformations

Faisal Alsaby, Simon Berkovich


With the recent explosion of Big Data, unsupervised learning has become an increasingly significant topic, and clustering is one of its central problems. Unsupervised clustering is studied in fields ranging from Big Data, machine learning, and computer vision to computational biology. This paper investigates a previously developed methodology that employs the error-correcting Golay code to cluster Big Data streams. Building on this previous research, we present a novel approach to generating the clustering keys that utilizes the Gray code property along with the Golay code. The methodology is distinctive in that it has linear time complexity and requires only a single pass through the file, which lets it outperform conventional techniques and is a decisive factor for realizing Big Data stream processing with on-the-fly computations. To extract meaningful knowledge from the resulting clusters, we provide a mechanism that facilitates pattern recognition based on a semi-supervised technique. Factors that influence the accuracy of this mechanism are also discussed. We give several theoretical justifications for the approach and report experiments on synthetic datasets. The presented method can be effectively applied to various computational intelligence problems in Big Data settings.


Keywords: Big Data; clustering algorithms; Golay code; Gray code property; machine learning; pattern recognition
