Journal of Data Science (JDS)

ISSN 1680-743X


120308

DOI: 10.6339/JDS.201407_12(3).0008

Research Article

The K-NN Algorithm for Compositional Data: A Revised Approach with and without Zero Values Present

Michail Tsagris

School of Mathematical Sciences, University of Nottingham

Volume 12, Number 3, pages 519-534

Abstract: In compositional data, an observation is a vector with non-negative components that sum to a constant, typically 1. Data of this type arise in many fields, such as geology, archaeology, biology, economics and political science. The goal of this paper is to extend the taxicab metric, and a newly suggested metric for compositional data, by employing a power transformation. Both extended metrics can be used in the k-nearest neighbours (k-NN) algorithm whether or not zeros are present. The methods are illustrated with examples on real data.
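To make the idea concrete, the sketch below shows how a power transformation can be combined with two distances and plugged into a k-NN classifier. It is a minimal illustration under stated assumptions, not the paper's implementation. It assumes the power transformation raises each part to a positive power alpha and re-closes the vector to sum 1. It also assumes the newly suggested metric is of the Jensen-Shannon (entropy) type, namely the square root of twice the Jensen-Shannon divergence, which matches the "entropy" keyword. The exact formulas and any scaling constants used in the paper may differ. The function names (`power_transform`, `taxicab`, `es_ov`, `knn_predict`) and the toy data are hypothetical. Zeros need no special handling here: with alpha > 0 a zero part stays zero, and the entropy terms use the convention 0 log 0 = 0.

```python
import math
from collections import Counter

def power_transform(x, alpha):
    """Raise each part of composition x to the power alpha and re-close to sum 1.
    Assumption: alpha > 0, so zero parts stay zero (0 ** alpha == 0)."""
    powered = [xi ** alpha for xi in x]
    total = sum(powered)
    return [p / total for p in powered]

def taxicab(x, y, alpha=1.0):
    """Taxicab (L1) distance between power-transformed compositions."""
    u, v = power_transform(x, alpha), power_transform(y, alpha)
    return sum(abs(ui - vi) for ui, vi in zip(u, v))

def es_ov(x, y, alpha=1.0):
    """Square root of twice the Jensen-Shannon divergence between power-transformed
    compositions (assumed form). Terms with a zero part contribute 0 (0 log 0 = 0)."""
    u, v = power_transform(x, alpha), power_transform(y, alpha)
    total = 0.0
    for ui, vi in zip(u, v):
        m = ui + vi
        if ui > 0:
            total += ui * math.log(2 * ui / m)
        if vi > 0:
            total += vi * math.log(2 * vi / m)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(total, 0.0))

def knn_predict(train_x, train_y, query, k=3, alpha=1.0, metric=es_ov):
    """Classify `query` by majority vote among its k nearest training compositions."""
    dists = sorted((metric(query, xi, alpha), yi) for xi, yi in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy example (hypothetical data); one training point and one query contain zeros.
train_x = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7], [0.0, 0.3, 0.7]]
train_y = ["A", "A", "B", "B"]
print(knn_predict(train_x, train_y, [0.65, 0.25, 0.10], k=3, alpha=0.5))
print(knn_predict(train_x, train_y, [0.00, 0.25, 0.75], k=3, alpha=1.0, metric=taxicab))
```

In this sketch alpha acts as a tuning parameter: in practice one would choose both alpha and k, for example by cross-validation, rather than fixing them as above.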

Keywords: compositional data, entropy, k-NN algorithm