Journal of Data Science

BIE: Binary Image Encoding for the Classification of Tabular Data
Volume 23, Issue 1 (2025), pp. 109–129
James Halladay, Drake Cullen, Nathan Briner, et al. (9 authors)
https://doi.org/10.6339/24-JDS1122
Pub. online: 19 April 2024      Type: Data Science In Action      Open Access

Received
13 October 2023
Accepted
13 February 2024
Published
19 April 2024

Abstract

Deep learning has made remarkable progress, particularly in image classification, object detection, speech recognition, and natural language processing. Convolutional Neural Networks (CNNs) have emerged as a dominant model in this domain, delivering exceptional accuracy in image recognition tasks. Inspired by this success, researchers have explored applying CNNs to tabular data. However, CNNs trained directly on structured tabular data often yield subpar results, leaving a demonstrated performance gap between deep learning models and shallow models on tabular data. To close this gap, Tabular-to-Image (T2I) algorithms have been introduced to convert tabular data into an unstructured image format. T2I algorithms encode spatial information into the image, which CNN models can then exploit for classification. In this work, we propose two novel T2I algorithms, Binary Image Encoding (BIE) and correlated Binary Image Encoding (cBIE), which preserve complex relationships in the generated image by leveraging the native binary representation of the data. Additionally, cBIE captures more spatial information by reordering columns based on their correlation to a feature. To evaluate our algorithms, we conducted experiments on four benchmark datasets, using ResNet-50 as the deep learning model. Our results show that ResNet-50 models trained on images generated with BIE and cBIE consistently matched or outperformed models trained on images created with the previous state-of-the-art method, Image Generator for Tabular Data (IGTD).
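The two ideas named in the abstract, using the native binary representation of each value and reordering columns by correlation to a feature, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes every feature is stored as a 32-bit IEEE 754 float, that each feature becomes one row of 0/1 pixels, and that columns are sorted by absolute Pearson correlation with a chosen anchor column. The function names `encode_row_as_bits` and `correlation_order` and the `anchor` parameter are hypothetical.

```python
import numpy as np


def encode_row_as_bits(row):
    """Turn one tabular row into a 0/1 image with one feature per image row.

    Assumption (not from the paper): each feature is a 32-bit IEEE 754 float.
    """
    values = np.asarray(row, dtype=">f4")  # big-endian, so the sign bit comes first
    raw_bytes = values.view(np.uint8)      # 4 bytes per feature
    bits = np.unpackbits(raw_bytes)        # most significant bit of each byte first
    return bits.reshape(len(values), 32)


def correlation_order(X, anchor=0):
    """Column indices sorted by |Pearson correlation| with the anchor column.

    Assumption (not from the paper): descending absolute correlation.
    """
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)[anchor]
    return np.argsort(-np.abs(corr), kind="stable")


# Toy table: 4 samples, 3 features. Column 2 tracks column 0 closely,
# column 1 does not.
X = np.array([
    [1.0, 5.0, 2.0],
    [2.0, 3.0, 4.0],
    [3.0, 6.0, 6.0],
    [4.0, 1.0, 9.0],
])

order = correlation_order(X, anchor=0)   # cBIE-style column reordering
image = encode_row_as_bits(X[0, order])  # BIE-style bit image of the first sample
print(order)
print(image.shape)
```

In this sketch, features that correlate strongly with the anchor land on adjacent pixel rows, which is one way a CNN could pick up spatial structure. A real pipeline would also need to resize or tile the bit image to the input size the network expects.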

Supplementary material

We provide the code and datasets separately in the supplementary material. The code also includes all figures from the paper in SVG, PDF, and PNG formats, and reflects the contents of the GitHub repository used for these experiments at the time of publication (Halladay et al., 2023).

References

 
Aeberhard S, Forina M (1991). Wine. UCI Machine Learning Repository. https://doi.org/10.24432/C5PC7J
 
Albanese C, Li D, Lobachevskiy E, Meissner G (2013). A comparative analysis of correlation approaches in finance. The Journal of Derivatives, 21(2): 42–66. https://doi.org/10.3905/jod.2013.21.2.042
 
Alom MZ, Taha TM, Yakopcic C, Westberg S, Sidike P, Nasrin MS, et al. (2018). The history began from AlexNet: A comprehensive survey on deep learning approaches.
 
Buturović L, Miljković D (2020). A novel method for classification of tabular data using convolutional neural networks. BioRxiv, 2020–05. https://doi.org/10.1101/2020.05.02.074203
 
Cawley GC, Talbot NL (2010). On over-fitting in model selection and subsequent selection bias in performance evaluation. Journal of Machine Learning Research, 11: 2079–2107.
 
Choi S, Fang C, Haddad D, Kim M (2022). Predictive modeling of charge levels for battery electric vehicles using CNN EfficientNet and IGTD algorithm. arXiv preprint: https://arxiv.org/abs/2206.03612
 
Das HP, Spanos CJ (2022). Improved dequantization and normalization methods for tabular data pre-processing in smart buildings. In: Proceedings of the 9th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, 168–177.
 
Fisher RA (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2): 179–188. https://doi.org/10.1111/j.1469-1809.1936.tb02137.x
 
Fisher RA (1988). Iris. UCI Machine Learning Repository. https://doi.org/10.24432/C56C76
 
Gokhale M, Mohanty SK, Ojha A (2023). GeneViT: Gene vision transformer with improved DeepInsight for cancer classification. Computers in Biology and Medicine, 155: 106643. https://doi.org/10.1016/j.compbiomed.2023.106643
 
Halladay J, Cullen D, Briner N, Warren J, Fye K, Basnet R, et al. (2022). Detection and characterization of DDoS attacks using time-based features. IEEE Access, 10: 49794–49807. https://doi.org/10.1109/ACCESS.2022.3173319
 
Halladay J, Cullen D, Briner N, Watson W, Miller D, Primeau R (2023). Binary image transformation: GitHub repository.
 
Hand DJ, Christen P, Kirielle N (2021). F*: An interpretable transformation of the F-measure. Machine Learning, 110(3): 451–456. https://doi.org/10.1007/s10994-021-05964-1
 
He H, Garcia EA (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9): 1263–1284. https://doi.org/10.1109/TKDE.2008.239
 
He K, Zhang X, Ren S, Sun J (2016a). Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778.
 
He K, Zhang X, Ren S, Sun J (2016b). Identity mappings in deep residual networks. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV, 630–645. Springer.
 
Hossin M, Sulaiman MN (2015). A review on evaluation metrics for data classification evaluations. International Journal of Data Mining & Knowledge Management Process, 5(2): 1. https://doi.org/10.5121/ijdkp.2015.5201
 
Iqbal MI, Mukta MSH, Hasan AR, Islam S (2022). A dynamic weighted tabular method for convolutional neural networks. IEEE Access, 10: 134183–134198. https://doi.org/10.1109/ACCESS.2022.3231102
 
Koonce B (2021). ResNet 50, 63–72. Apress, Berkeley, CA.
 
Krizhevsky A, Sutskever I, Hinton GE (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems (F Pereira, CJ Burges, L Bottou, KQ Weinberger, eds.), 25.
 
Krupski J, Graniszewski W, Iwanowski M (2021). Data transformation schemes for CNN-based network traffic analysis: A survey. Electronics, 10(16): 2042. https://doi.org/10.3390/electronics10162042
 
Ling CX, Huang J, Zhang H (2003). AUC: A better measure than accuracy in comparing learning algorithms. In: Advances in Artificial Intelligence: 16th Conference of the Canadian Society for Computational Studies of Intelligence (Y Xiang, B Chaib-Draa, eds.), volume 16 of Proceedings, AI 2003, Halifax, Canada, June 11–13, 2003, 329–341. Springer.
 
Moskalenko A, Moskalenko V, Shaiekhov A, Zaretskyi M (2020). Multi-layer model and training method for information-extreme malware traffic detector. In: CMIS, 288–299.
 
Noroozi M, Favaro P (2016). Unsupervised learning of visual representations by solving jigsaw puzzles. In: European Conference on Computer Vision, 69–84. Springer.
 
Rabbah J, Ridouani M, Hassouni L (2022). A new churn prediction model based on deep insight features transformation for convolution neural network architecture and stacknet. International Journal of Web-Based Learning and Teaching Technologies (IJWLTT), 17(1): 1–18. https://doi.org/10.4018/ijwltt.300342
 
Seger C (2018). An investigation of categorical variable encoding techniques in machine learning: Binary versus one-hot and feature hashing.
 
Sharafaldin I, Lashkari AH, Hakak S, Ghorbani AA (2019). Developing realistic distributed denial of service (DDoS) attack dataset and taxonomy. In: 2019 International Carnahan Conference on Security Technology (ICCST), 1–8. IEEE.
 
Sharma A, Vans E, Shigemizu D, Boroevich KA, Tsunoda T (2019). DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture. Scientific Reports, 9(1): 11399. https://doi.org/10.1038/s41598-019-47765-6
 
Simon M, Rodner E, Denzler J (2016). ImageNet pre-trained models with batch normalization. arXiv preprint: https://arxiv.org/abs/1612.01452
 
Simonyan K, Zisserman A (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint: https://arxiv.org/abs/1409.1556
 
Sun B, Yang L, Zhang W, Lin M, Dong P, Young C, et al. (2019). SuperTML: Two-dimensional word embedding for the precognition on structured tabular data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2973–2981.
 
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016). Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2818–2826.
 
Taheri A, Ebrahimnezhad H, Sedaaghi MH (2022). Prediction of the critical temperature of superconducting materials using image regression and ensemble deep learning. Materials Today Communications, 33: 104743. https://doi.org/10.1016/j.mtcomm.2022.104743
 
Wang W, Zhu M, Zeng X, Ye X, Sheng Y (2017). Malware traffic classification using convolutional neural network for representation learning. In: 2017 International Conference on Information Networking (ICOIN), 712–717. IEEE.
 
Wolberg W, Mangasarian O, Street N, Street W (1995). Breast Cancer Wisconsin (Diagnostic). UCI Machine Learning Repository. https://doi.org/10.24432/C5DW2B
 
Xie S, Girshick R, Dollár P, Tu Z, He K (2017). Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1492–1500.
 
Zhu Y, Brettin T, Xia F, Partin A, Shukla M, Yoo H, et al. (2021). Converting tabular data into images for deep learning with convolutional neural networks. Scientific Reports, 11(1): 11325. https://doi.org/10.1038/s41598-021-90923-y


Copyright
2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
computer vision; DeepInsight; IGTD; native representation; ResNet-50; tabular-to-image

Funding
This work was supported by the State of Colorado through funds appropriated for cybersecurity by a piece of legislation dubbed “Cyber Coding Cryptology for State Records.”


Journal of data science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
