Journal of Data Science

Enhance Supervised Self-Organization Clustering by Utilizing Unsupervised Learning Embeddings on Discrete Data
Qiang Fu, Yuefeng Li

https://doi.org/10.6339/26-JDS1226
Pub. online: 1 April 2026      Type: Computing In Data Science      Open Access

Received: 22 May 2025
Accepted: 3 March 2026
Published: 1 April 2026

Abstract

The self-organizing map (SOM) is an unsupervised, competitive-learning neural network that projects high-dimensional data onto a low-dimensional grid in a way that reveals the topological relationships within the original dataset. However, the conventional SOM training algorithm is restricted to numeric data. Categorical data typically must be converted into a binary format before SOM training, which can discard crucial similarity information between categorical values, so the trained SOM may not accurately reflect the true topological order. A training data splitting method (TDSM) can help identify perfect representative neurons and improve clustering outcomes, but the training data itself often lacks sufficient information, such as the data distribution, and can be uncertain and ambiguous. Consequently, even when perfect neurons are identified, further improving the clustering results is challenging. This paper investigates whether the performance of supervised TDSM SOM clustering can be improved by using unsupervised self-organization granule encoding for discrete data. This unsupervised approach helps uncover uncertain and ambiguous information within discrete data, leading to a more effective topological representation of the training data.
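
The loss of similarity information under binary conversion can be illustrated with a minimal sketch. It assumes a 1-of-K (one-hot) encoding and Euclidean distance, and it uses a hypothetical three-valued attribute ("low", "medium", "high") that does not come from the paper. It is not the authors' encoding or the TDSM procedure; it only shows why a distance-based SOM cannot tell that some categorical values are closer than others once they are binarized.

```python
import itertools
import math


def one_hot(value, categories):
    """Return the 1-of-K (binary) encoding of `value` over `categories`."""
    return [1.0 if value == c else 0.0 for c in categories]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical categorical attribute (an assumption for illustration only).
categories = ["low", "medium", "high"]
encoded = {c: one_hot(c, categories) for c in categories}

# Distance between every pair of distinct categorical values.
pairwise = {
    (a, b): euclidean(encoded[a], encoded[b])
    for a, b in itertools.combinations(categories, 2)
}

for pair, dist in pairwise.items():
    print(pair, round(dist, 4))
```

Every pair of distinct values ends up at the same distance (the square root of 2), so "low" is exactly as far from "high" as it is from "medium". Any ordering or similarity among the values is invisible to a distance-based learner trained on this encoding.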

Supplementary material

All data and associated code are available in the GitHub repository https://github.com/foolishfool/TDSMSOG.

Copyright
2026 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
fuzzy set; granular computing; noncontinuous data clustering; self-organized granular encoding; training data splitting method

Funding
This article was partially supported by Grant DP220101360 from the Australian Research Council.

Journal of Data Science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
