The Matrix Expression, Topological Index and Atomic Attribute of Molecular Topological Structure

The matrix expression, topological index and atomic attribute of molecular topological structure are reviewed. Nine matrices, twenty-six kinds of indices and eight methods dealing with weighted molecular graphs are summed up in three tables. Three shortcomings of topological indices are discussed: (1) the physical-chemical meaning of a topological index is not explicit; (2) it is difficult to interpret the QSAR and QSPR models derived from topological indices; and (3) topological indices usually neglect stereochemical information, that is, the three-dimensional structure of the molecule. Three directions in the development of topological indices are highlighted: (1) description of local information; (2) studies on the inter-correlation of topological indices; and (3) variable indices.

The first and most important step in QSAR/QSPR is to code the chemical structures of molecules numerically, so that a correlation model can be built between the structures of chemical compounds and their chemical and biological activities or properties. How to convert a chemical formula (or molecular graph) exactly into numerical form has therefore been a major task in QSAR/QSPR research. Of the many methods for quantifying molecular structure, the topological index is the most popular, since it can be obtained directly from the molecular structure and computed rapidly for large numbers of molecules. One study (Basak et al. 1999) concluded that the topological index is the first effective choice in QSAR research. Recently, many articles (Cao and Yuan 2001, Balaban 1998, Bonchev 2000, Miguel et al. 2001, Agrawal 2001, Madan 1997, 1999, Bonchev 2001, Li et al. 2000, Rücker and Rücker 1999, and Erovnik 1999), reviews (Bono et al. 2001, Pogliani 2000, Randić and Zupan 2001, Estrada and Molina 2001, Diudea et al. 1995, Katritzky and Gordeeva 1993, and Schultz 2000), and monographs (Devillers and Balaban 1999, Kier and Hall 1999, Xin 1991, Xu and Hu 2001, Trinajstic 1992, King 1992, Gutman et al. 1991, and King and Rouvray 1987) have given systematic and comprehensive treatments. Graph theory is nowadays a standard method of theoretical and computational chemistry (Harary 1969, and Cvetkovic et al. 1995), and a large number of references are available on its application in chemistry (Devillers and Balaban 1999, Kier and Hall 1999, Xin 1991, Xu and Hu 2001, Trinajstic 1992, King 1992, Gutman et al. 1991, King and Rouvray 1987, Balaban 1995, and Diudea and Ivanciuc 1995).
Numerous activities and properties of organic molecules depend on the presence of specific atoms and/or functional groups in their structures. The main aim of QSAR/QSPR is to link the structure of a molecule to a biological activity or a property by means of statistical tools, which can be expressed mathematically as follows (Devillers 1999):

$$A/P = f(\text{molecular structure}) = f(\text{molecular descriptors})$$

where $A/P$ denotes the activity or property, which is essentially a chemical or biological measurement. Commonly used properties include the normal boiling point, heat of formation, critical temperature, density, flash point, refractive index, chromatographic retention time, and octanol-water partition coefficient. Here, $f(\cdot)$ denotes a function of the molecular structure or of the molecular descriptors. In general, the model function $f(\cdot)$ can be linear or non-linear, depending on the complexity of the data.
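As a minimal sketch of the linear case of $f(\cdot)$, the example below fits a straight line between one descriptor and one property by ordinary least squares. The choice of descriptor (the number of carbon atoms in the n-alkanes butane to octane), the rounded handbook boiling points, and the helper name `fit_line` are assumptions made for illustration and do not come from the paper.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Descriptor (assumed for illustration): carbon count of n-butane ... n-octane.
n_carbons = [4, 5, 6, 7, 8]
# Property: approximate normal boiling points in degrees Celsius (rounded).
boiling_point_c = [-0.5, 36.1, 68.7, 98.4, 125.7]

slope, intercept = fit_line(n_carbons, boiling_point_c)
predicted = [slope * n + intercept for n in n_carbons]
residuals = [obs - pred for obs, pred in zip(boiling_point_c, predicted)]

print(f"A/P ~ {slope:.2f} * descriptor + {intercept:.2f}")
```

The residuals are largest at the ends of the series, which hints that even this simple data set is slightly non-linear; a real QSPR model would use more molecules and more informative descriptors.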
To evaluate the structural similarity and diversity of molecules, or to build a QSAR model of the form given in the above equation, one first needs suitable numerical molecular descriptors associated with the molecular structure. Many numerical molecular descriptors are available in chemistry, including physical-chemical parameters, topological indices, 3D descriptors and quantum chemical indices. In most cases, however, chemists prefer to use topological indices as molecular descriptors to evaluate toxicity and predict biological activity (David, A. C. 2000, Liu et al. 2001, Basak et al. 1994, and Basak and Grunwald 1994), since topological indices offer a simple way of measuring molecular branching, shape, size, cyclicity, symmetry, centricity and complexity.
The aim of this paper is to introduce the topological indices, which are essentially numerical molecular descriptors associated with the molecular structure. To make the methods for coding chemical structures from molecular graphs easier to follow, they are first classified into three major forms: matrix expression, topological index, and atomic attribute of molecular topological structure. Then nine matrices expressing molecular structure, twenty-six kinds of topological indices and eight methods dealing with weighted molecular graphs are summed up in three tables for the reader's convenience.

Three Methods Describing Graph Structure
Table 1: Graph matrices and their definitions

Adjacency matrix, $A = A(G)$ (Lukovits 2000): $[A]_{ij} = 1$ if $e_{ij} \in E(G)$, and $0$ otherwise, where $e_{ij}$ is the edge formed by atoms $i$ and $j$, and $E(G)$ is the set of edges in the molecular graph.

Distance matrix, $D$ (Rouvray 1986, Mihalic et al. 1992, and Bonchev and Trinajstic 1977): $[D]_{ij} = \min(l(p_{ij}))$ for $i \neq j$, and $0$ for $i = j$, where $\min(l(p_{ij}))$ is the length of the shortest path between atoms $i$ and $j$.

Reciprocal matrix, $RD$ (Ivanciuc 1989, Plavsic et al. 1993, and Ivanciuc et al. 1993): the elements of $RD$ are the reciprocals of the elements of the $D$ matrix (excluding the zero elements).

Detour matrix, $\Delta$ (Ivanciuc and Balaban 1994, Amic and Trinajstic 1995, Diudea et al. 1998, Rücker 1998, and Mihalic 1997): $[\Delta]_{ij} = \max(l(p_{ij}))$ for $i \neq j$, and $0$ for $i = j$, where $\max(l(p_{ij}))$ denotes the length of the longest path between atoms $i$ and $j$.

Edge-adjacency matrix, $E$ (Estrada 1995, 1996, 1999, Estrada and Ramirez 1996, and Estrada et al. 1998): $[E]_{ij} = 1$ if edges $i$ and $j$ have a common node, and $0$ otherwise.

Laplacian matrix, $L = DEG(G) - A(G)$: $DEG(G)$ is the diagonal matrix with entries $deg_i$, where $deg_i$ is the vertex degree of atom $i$.

χ matrix, $\chi = X(G)$ (Randić 1992): $[\chi]_{ij} = \ldots$; $0$ otherwise.

Resistance matrix, $RM = RM(G)$ (Klein and Randić 1993, and Bonchev et al. 1994): see the references, because the description of $RM(G)$ is too long.

Cluj matrix (Diudea 1997, Diudea et al. 1997, Kiss et al. 1997, and Gutman 1997): defined through $N_{i,p(m)}$, the number of vertices on each side of the path $p_{ij}$.
To extract as much structural information as possible, many methods focus on describing atoms and bonds.

Matrices
In a molecular graph, a vertex represents an atom and an edge represents a bond. Molecular graphs can therefore be expressed easily as matrices. From the matrix expression of a molecular graph, the matrix polynomial, determinant, paths, walks, and distances can be calculated (Berenike and Joachim 2001, Gutman et al. 2001, and Dayantis 1997). The matrix is thus the basis for computing other parameters. Some commonly used matrices expressing molecular graphs are summarized in Table 1.
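The step from a molecular graph to its matrices can be sketched in a few lines. The example below is an illustration under stated assumptions: the molecule (the hydrogen-depressed graph of 2-methylbutane), the vertex numbering, and all function names are chosen here and are not taken from the paper, and the graph is assumed to be connected.

```python
from collections import deque


def adjacency_matrix(n, edges):
    """[A]ij = 1 if atoms i and j are bonded, 0 otherwise."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1
    return a


def distance_matrix(a):
    """Shortest-path lengths by breadth-first search (connected graph assumed)."""
    n = len(a)
    d = [[0] * n for _ in range(n)]
    for start in range(n):
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            v, dist = queue.popleft()
            d[start][v] = dist
            for w in range(n):
                if a[v][w] and w not in seen:
                    seen.add(w)
                    queue.append((w, dist + 1))
    return d


def reciprocal_matrix(d):
    """Reciprocal of every nonzero element of the distance matrix."""
    return [[1.0 / x if x else 0.0 for x in row] for row in d]


def laplacian_matrix(a):
    """L = DEG - A, with the vertex degrees on the diagonal of DEG."""
    n = len(a)
    return [
        [(sum(a[i]) if i == j else 0) - a[i][j] for j in range(n)]
        for i in range(n)
    ]


# 2-methylbutane, heavy atoms only: chain 0-1-2-3 with a methyl (4) on atom 1.
edges = [(0, 1), (1, 2), (2, 3), (1, 4)]
A = adjacency_matrix(5, edges)
D = distance_matrix(A)
RD = reciprocal_matrix(D)
L = laplacian_matrix(A)

for row in D:
    print(row)
```

Breadth-first search is enough here because every bond counts as one step; weighted graphs would need a shortest-path algorithm that respects edge weights.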
Once the matrix of a molecular graph has been obtained, its characteristic determinants and spectra can be calculated. These have important applications in molecular orbital theory (Graovac et al. 1977, Gutman and Polansky 1986, Trinajstic 1992, and Knop and Trinajstic 1980) and are also important sources of molecular descriptors (Hosoya 1971, 1988, 1990, Trinajstic 1988, and Hosoya and Murakami 1975). There are many references (Graovac et al. 1977, Gutman and Polansky 1986, Trinajstic 1988, 1992, Knop and …

Table 2: Topological indices and their definitions

Autocorrelation descriptors (Moreau and Broto 1980): $g(i)$ and $g(j)$ are the contributions attributed to atoms $i$ and $j$.

Balaban index $J$ (Balaban 1982): $S_i$ and $S_j$ are the distance sums of the vertices $V_i$ and $V_j$; $q = n_e/(\mu + 1)$, in which $n_e$ is the number of edges and $\mu$ is the cycle number.

Bond flexibility index (Lieth et al. 1996): $\rho_{KB} = \sum \Phi_i - \Phi + 1$, in which $\Phi_i$ are the fragment flexibilities and $\Phi$ is the flexibility of the whole molecule.

Centric indices (Balaban 1979, and Hu et al. 2003a): $\delta_i$ is the vertices at each step; $U$ is the Kronecker delta depending on the parity of the number of vertices $n$.

…: $I_j$ is the intrinsic state of atom $j$, $V_j$ is the vertex degree of atom $j$, and $D_{ij}$ is the distance between atoms $i$ and $j$.

Detour index (Lukovits 1996, Razinger 1997): $\Delta_{ij}$ is an element of the detour matrix.

Edge connectivity index (Estrada 1995): $\delta(e_i)$ and $\delta(e_j)$ are the degrees of edges $e_i$ and $e_j$.

Electropy index (Yee et al. 1977): $N$ is the total number of atoms in the molecule; $N_i$ is the number of atoms of one kind.

E-state index (Kier and Hall 1990): $I_i$ is the intrinsic state value of atom $i$; $\Delta I_{ij}$ is the perturbation of $I_j$ on $I_i$, with the form $\Delta I_{ij} = (I_i - I_j)/D_{ij}^2$.

Extended ECI (Estrada et al. 1998): $\delta(e_i)$ is the degree of edge $e_i$.

Extended WI (Estrada et al. 1988): $\eta_i$ is the number of pairs of vertices at distance $i$.

Flexibility index (Kier 1989): $A$ is the number of atoms; ${}^1\kappa_\alpha$ and ${}^2\kappa_\alpha$ are the first- and second-order Kappa indices.

Fragment WI (Mekenyan et al. 1988): the index reflects the interaction between the excised fragment $F$ and the remainder of the molecular graph $(G - F)$.

GSI (Gordon and Scantlebury 1964): 2N 2 = n i,j, where $V_i$ and $V_j$ are the vertex degrees of atoms $i$ and $j$.

Harary index (Plavsic et al. 1993).

Hosoya index (Hosoya 1971): $p(G, k)$ is the number of ways in which $k$ edges of the graph may be chosen so that no two of them are adjacent.

Hyper-Wiener index (Klein et al. 1995).

Identification numbers (Randić 1977).

Kappa index (Kier 1985, 1986): ${}^1P_i$, ${}^2P_i$, and ${}^3P_i$ are the numbers of one-, two-, and three-bond paths; ${}^1P_{min}$ and ${}^1P_{max}$ are the numbers of one-bond paths of the linear and complete graphs; ${}^2P_{min}$ and ${}^2P_{max}$ are the numbers of two-bond paths of the linear and star graphs; ${}^3P_{min}$ and ${}^3P_{max}$ are the numbers of three-bond paths of the linear and twin star graphs.

Kirchhoff index (number) (Klein and Randić 1993): $\Omega_{ij}$ is the $(i, j)$ element of the resistance matrix.

Molecular connectivity index (Randić 1975, and Kier and Hall 1976): $V_i$, $V_j$, and $V_k$ are the vertex degrees of atoms $i$, $j$, and $k$.

Molecular topological index (Schultz 1989): $E_i$ is the row matrix consisting of $v(A + D)$, where $v$ is the vertex degree.

Topological index
More than 400 topological indices have been proposed since the first one was introduced. Topological indices can be used to evaluate structural similarity and diversity. Their main role is to serve as numerical molecular descriptors in QSAR/QSPR models (Ivanciuc et al. 1999). Some important indices are listed in Table 2.
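As a hedged illustration of how such indices are derived from a graph matrix, the sketch below computes three of them from the distance matrix of 2-methylbutane (a molecule chosen here for illustration, not taken from the paper). The formulas used are the commonly quoted textbook forms and are an assumption of this sketch: the Wiener index as the sum of distances over all vertex pairs, the Harary index as the sum of reciprocal distances over all vertex pairs, and the Balaban index as $J = q \sum_{edges} (S_i S_j)^{-1/2}$ with $q = n_e/(\mu + 1)$.

```python
import math

# 2-methylbutane, heavy atoms only: chain 0-1-2-3 with a methyl (4) on atom 1.
EDGES = [(0, 1), (1, 2), (2, 3), (1, 4)]
# Distance matrix of that graph, written out by hand.
D = [
    [0, 1, 2, 3, 2],
    [1, 0, 1, 2, 1],
    [2, 1, 0, 1, 2],
    [3, 2, 1, 0, 3],
    [2, 1, 2, 3, 0],
]


def wiener_index(d):
    """Sum of topological distances over all unordered vertex pairs."""
    n = len(d)
    return sum(d[i][j] for i in range(n) for j in range(i + 1, n))


def harary_index(d):
    """Sum of reciprocal distances over all unordered vertex pairs."""
    n = len(d)
    return sum(1.0 / d[i][j] for i in range(n) for j in range(i + 1, n))


def balaban_j(d, edges):
    """J = q * sum over edges of (S_i * S_j)^(-1/2), q = n_e / (mu + 1)."""
    n_vertices = len(d)
    n_edges = len(edges)
    mu = n_edges - n_vertices + 1          # cycle number of a connected graph
    q = n_edges / (mu + 1)
    s = [sum(row) for row in d]            # distance sum of each vertex
    return q * sum(1.0 / math.sqrt(s[i] * s[j]) for i, j in edges)


W = wiener_index(D)
H = harary_index(D)
J = balaban_j(D, EDGES)
print(W, round(H, 4), round(J, 4))
```

All three numbers come from the same matrix, which is one reason different indices often carry overlapping information.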

Atomic attribute
Nowadays, in addition to data on molecular connectivity, the information encoded by topological indices also includes the nature of the atoms and the bond multiplicity. Another important aspect of topological description is therefore the description of atoms and bonds, especially in weighted graphs. Ivanciuc and Balaban (1999) have reviewed the main schemes for computing vertex- and edge-weighted graph parameters and the related structural descriptors. Ivanciuc et al. (1998) proposed applying atomic electronegativity and covalent radius to obtain descriptors for weighted graphs.
The methods for describing the nature of atoms and bond multiplicity first introduce chemical parameters such as atomic number (Z), relative electronegativity (X), covalent radius (Y), atomic mass (A), atomic and adjacent hydrogen mass (AH), atomic polarizability (P), atomic radius (R), and atomic electronegativity (E). Based on these parameters, some topological invariants or indices have been proposed (Nikolic et al. 1993, Balaban 1986). These descriptors have been further applied in QSAR/QSPR research (Medic et al. 1992, Balaban et al. 1990, 1992, Ivanciuc et al. 2000, Ivanciuc 2000, and Estrada 1997). The methods are summarized in detail in Table 3.
Table 3: Schemes for describing weighted graph (attribute, definition for atoms and bonds, reference)

A (Ivanciuc and Balaban 1999): $A_C$ is the atomic mass of the carbon atom; $A_i$ and $A_j$ are the atomic masses of atoms $i$ and $j$; $Bo_{ij}$ is the topological bond order of the edge between atoms $i$ and $j$.

AH (Ivanciuc and Balaban 1999): $NoH_i$ is the number of hydrogen atoms bonded to the heavy atom $i$.

Z (Ivanciuc et al. 1998, and Barysz et al. 1983): $Z_C$ is the atomic number of the carbon atom; $Z_i$ and $Z_j$ are the atomic numbers of atoms $i$ and $j$; $Bo_{ij}$ is the topological bond order of the edge between atoms $i$ and $j$.

X: $X_i$ and $X_j$ are the relative electronegativities of atoms $i$ and $j$; $Bo_{ij}$ is the topological bond order of the edge between atoms $i$ and $j$.

Y (Sanderson 1983, Ivanciuc 1999): $Y_i$ and $Y_j$ are the covalent radii of atoms $i$ and $j$; $Bo_{ij}$ is the topological bond order of the edge between atoms $i$ and $j$.

P (Ivanciuc et al. 1998, and Nagle 1990): $\alpha_C$ is the atomic polarizability of the carbon atom; $\alpha_i$ and $\alpha_j$ are the atomic polarizabilities of atoms $i$ and $j$; $Bo_{ij}$ is the topological bond order of the edge between atoms $i$ and $j$.

R (Ivanciuc et al. 1998, and Nagle 1990): $r_C$ is the atomic radius of the carbon atom; $r_i$ and $r_j$ are the atomic radii of atoms $i$ and $j$; $Bo_{ij}$ is the topological bond order of the edge between atoms $i$ and $j$.
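To make the idea of a weighted molecular graph concrete, the sketch below builds a vertex- and edge-weighted adjacency matrix using atomic numbers. The weight formulas, $1 - Z_C/Z_i$ for a vertex and $Z_C Z_C/(Bo_{ij} Z_i Z_j)$ for an edge, are the form usually attributed to the atomic-number scheme in the literature; they are an assumption of this sketch, as are the function names and the example molecule (the heavy atoms of ethanol).

```python
Z_CARBON = 6  # atomic number of carbon, the reference atom


def vertex_weight(z_i):
    """Assumed vertex weight: 1 - Z_C / Z_i (zero for carbon)."""
    return 1.0 - Z_CARBON / z_i


def edge_weight(z_i, z_j, bond_order):
    """Assumed edge weight: Z_C * Z_C / (Bo_ij * Z_i * Z_j)."""
    return (Z_CARBON * Z_CARBON) / (bond_order * z_i * z_j)


def weighted_adjacency(atomic_numbers, bonds):
    """Vertex weights on the diagonal, edge weights off the diagonal.

    bonds is a list of (i, j, bond_order) tuples.
    """
    n = len(atomic_numbers)
    w = [[0.0] * n for _ in range(n)]
    for i, z in enumerate(atomic_numbers):
        w[i][i] = vertex_weight(z)
    for i, j, bond_order in bonds:
        value = edge_weight(atomic_numbers[i], atomic_numbers[j], bond_order)
        w[i][j] = w[j][i] = value
    return w


# Ethanol, heavy atoms only: C(0)-C(1)-O(2), all single bonds.
W = weighted_adjacency([6, 6, 8], [(0, 1, 1.0), (1, 2, 1.0)])
for row in W:
    print(row)
```

With these weights a pure hydrocarbon with single bonds reduces to the ordinary adjacency matrix, so the heteroatom and bond-order information enters only where the molecule departs from that reference case.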
Here we take the molecular connectivity index as an example to illustrate the calculation procedure. The molecular connectivity index describes the molecular connectivity (see its definition in Table 2). To obtain $\chi_1$, we first need the vertex degree $V_i$ of every vertex in the molecular graph; for atom $i$, $V_i$ is the sum of the $i$-th row of the $A$ matrix. This information is easily obtained from Figure 1, that is, … = 1, and $V_7 = 1$. The term $(V_i V_j)^{-1/2}$ is then calculated for every pair of adjacent vertices, and the terms are summed to give $\chi_1 = \sum (V_i V_j)^{-1/2}$, as shown in Table 5. In general, topological indices offer a simple way of coding molecular structure information into numerical values.
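The procedure just described can be sketched as follows. Because the adjacency matrix of 1ipC4 is not reproduced in the text, the sketch uses the hydrogen-depressed graph of 2-methylbutane instead; that substitution, the vertex numbering and the function names are assumptions made for illustration.

```python
import math


def vertex_degrees(a):
    """V_i is the sum of the i-th row of the adjacency matrix."""
    return [sum(row) for row in a]


def connectivity_index(a):
    """Sum of (V_i * V_j)^(-1/2) over every pair of adjacent vertices."""
    v = vertex_degrees(a)
    n = len(a)
    return sum(
        1.0 / math.sqrt(v[i] * v[j])
        for i in range(n)
        for j in range(i + 1, n)
        if a[i][j]
    )


# 2-methylbutane, heavy atoms only: chain 0-1-2-3 with a methyl (4) on atom 1.
A = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
]

V = vertex_degrees(A)
chi_1 = connectivity_index(A)
print(V, round(chi_1, 4))
```

Each bond is visited once (only pairs with $i < j$ are counted), so the four bonds of this molecule contribute four terms to the sum.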

Some Directions in Developing Topological Index
Although topological indices have wide applications in QSAR/QSPR, they have some shortcomings:
a. Compared with other parameters, the physical-chemical meaning of a topological index is not explicit (Devillers et al. 1997).
b. The degree of redundancy and degeneracy of certain topological indices can be very high. In that case, it is impossible to interpret the QSAR and QSPR models derived from these descriptors (Devillers 1999). This problem deserves further study, and topological indices should be employed only in contexts for which they are suitable.
c. Topological indices usually neglect stereochemical information, that is, the three-dimensional structure of the molecule (Ivanciuc et al. 1999).
To deal with these shortcomings, different methods have been proposed to improve topological indices. The main directions of development are as follows.

(1) Description of local information
Recent enrichments in this area include topological indices for molecular fragments, some stereochemical features, and electronic parameters associated with various atoms. The introduction of local information helps to explain the physical or chemical meaning of a topological index. Local descriptors can be geometric, steric, hydrophobic or hydrophilic constants, or atomic electron densities. Estrada considers a topological index successful if it has a direct structural or physical meaning and at the same time gives results similar to those of the original QSAR/QSPR model (Estrada 1999). One developing direction is to include not only topological but also geometric characteristics, in order to deal with the three-dimensional aspects of stereochemistry. Another direction is to use atomic properties to develop new indices or to improve existing ones.
(2) Studies on inter-correlation of topological index
Estrada holds that the resolution of practical problems should consider as many descriptors as possible, while the final QSAR/QSPR model should select as few descriptors as possible (Estrada 1999). The need to reduce the number of variables makes the study of correlation between topological indices important.
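A minimal sketch of such an inter-correlation study is given below: two indices are computed for a small set of alkanes and their Pearson correlation coefficient is evaluated. The set of molecules, the two indices chosen (the Wiener index and the first-order connectivity index, both in their commonly quoted forms) and the function names are assumptions made for illustration; a real study would use far more molecules and indices.

```python
import math
from collections import deque

# Hydrogen-depressed graphs: (number of heavy atoms, list of bonds).
MOLECULES = {
    "n-butane": (4, [(0, 1), (1, 2), (2, 3)]),
    "isobutane": (4, [(0, 1), (1, 2), (1, 3)]),
    "n-pentane": (5, [(0, 1), (1, 2), (2, 3), (3, 4)]),
    "2-methylbutane": (5, [(0, 1), (1, 2), (2, 3), (1, 4)]),
    "neopentane": (5, [(0, 1), (0, 2), (0, 3), (0, 4)]),
}


def adjacency_lists(n, edges):
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return nbrs


def wiener_index(n, edges):
    """Sum of shortest-path lengths over all unordered vertex pairs."""
    nbrs = adjacency_lists(n, edges)
    total = 0
    for start in range(n):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in nbrs[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total // 2  # every pair was counted from both ends


def connectivity_index(n, edges):
    """Sum of (V_i * V_j)^(-1/2) over all bonds."""
    nbrs = adjacency_lists(n, edges)
    return sum(1.0 / math.sqrt(len(nbrs[i]) * len(nbrs[j])) for i, j in edges)


def pearson_r(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


w_values = [wiener_index(n, e) for n, e in MOLECULES.values()]
chi_values = [connectivity_index(n, e) for n, e in MOLECULES.values()]
r = pearson_r(w_values, chi_values)
print(w_values, [round(c, 4) for c in chi_values], round(r, 3))
```

A coefficient close to 1 for a pair of indices suggests that including both in one regression model adds little independent information.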
In QSAR research, a very important problem is how to reduce the number of variables in order to improve the stability of the model. On the one hand, much of the information carried by different topological indices is duplicated. On the other hand, how to select the required variables from the large number of available topological indices is still unsolved. Solving these problems requires the study of correlation between variables, and some preliminary studies exist (Motoc and Balaban 1981, Motoc et al. 1982, Plavsic et al. 1996, 2000, Randić 2001, Rücker and Rücker 1994, and Chan et al. 1998).

(3) Variable index
A simple index should be modified if the molecules contain heteroatoms. The variable index is regarded as a novel way to describe heteroatoms. The variable index is a flexible function of adjustable parameters; their optimal values are found through the regression procedure, as the values that minimize the standard error of the regression (Randić and Pompe 2001b). There are many prior studies …

Figure 1. Molecular structure and the corresponding topological graph (hydrogen-depressed) of 1ipC4

Table 3: Schemes for describing weighted graph

Table 4: The adjacency, distance and detour matrices of molecule 1ipC4

Table 5: Some topological indices of molecule 1ipC4