Enhance Supervised Self-Organization Clustering by Utilizing Unsupervised Learning Embeddings on Discrete Data
Pub. online: 1 April 2026
Type: Computing In Data Science
Open Access
Received: 22 May 2025
Accepted: 3 March 2026
Published: 1 April 2026
Abstract
The self-organizing map (SOM) is an unsupervised, competitive-learning neural network that projects high-dimensional data onto a low-dimensional grid while revealing the topological relationships within the original dataset. However, the conventional SOM training algorithm is restricted to numeric data. Categorical data typically must be converted into binary format before SOM training, which can discard crucial similarity information between categorical values. As a result, the trained SOM may not accurately reflect the true topological order. A training data splitting method (TDSM) can help identify perfect representative neurons and enhance clustering outcomes, but the training data itself often lacks sufficient information, such as the data distribution, and can be uncertain and ambiguous. Consequently, even when perfect neurons are identified, further improving the clustering results remains challenging. This paper investigates whether the performance of supervised TDSM SOM clustering can be improved by using unsupervised self-organization granule encoding for discrete data. This unsupervised approach helps uncover uncertain and ambiguous information within discrete data, leading to a more effective topological representation of the training data.
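The loss of similarity information under binary conversion can be illustrated with a minimal sketch. This is not the authors' code: the attribute values and the helper names `one_hot` and `euclidean` are hypothetical, chosen only for illustration. It shows that after one-hot (binary) encoding, every pair of distinct categorical values lies at the same Euclidean distance, so a distance-based method such as SOM has no way to tell that two values are semantically closer than a third.

```python
from itertools import combinations
from math import sqrt


def one_hot(value, categories):
    """Return the binary (one-hot) vector for `value` over `categories`."""
    return [1.0 if value == c else 0.0 for c in categories]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical categorical attribute: "coffee" and "tea" are arguably more
# similar to each other than to "juice", but the encoding cannot express that.
categories = ["coffee", "tea", "juice"]
encoded = {c: one_hot(c, categories) for c in categories}

# Distance between every pair of distinct categories after encoding.
pairwise = {
    (a, b): euclidean(encoded[a], encoded[b])
    for a, b in combinations(categories, 2)
}

for pair, distance in pairwise.items():
    print(pair, round(distance, 4))
```

Each pair differs in exactly two vector positions, so all three distances equal the square root of 2. Any encoding intended to preserve topological order for categorical data has to break this equidistance in some way.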
Supplementary material
All data and associated code are available in the GitHub repository https://github.com/foolishfool/TDSMSOG.