A Growing Self-Organizing Neural Network for Lifestyle Segmentation

Lifestyles can be used to explain existing and to anticipate future consumer behavior, both in a geographical and a temporal context. Basing market segmentations on consumer lifestyles enables the development of purposeful advertising strategies and the design of new products meeting future demands. The present paper introduces a new growing self-organizing neural network which identifies lifestyles, or rather consumer types, in survey data largely autonomously. Before applying the algorithm to real marketing data, we demonstrate its general performance and adaptability by means of synthetic 2D data featuring distinct heterogeneity with respect to the arrangement of the individual data points.


Lifestyles and consumer typologies
Lifestyle is an important consumer characteristic and a determinant of individual purchasing behavior (Assael 1995; Engel, Blackwell, and Miniard 1995; Jobber and Lancaster 2003). It is represented by individuals' activities, interests and opinions. Insights from lifestyle segmentations influence marketing decisions in several ways, e.g. in media planning/selection and new product development for mass markets. The latter field of application can be motivated by the fact that many products and services are economically successful only when they are purchased by thousands or even millions of people. Thus the focus of marketing researchers and marketers is on the identification of broad trends and patterns corresponding to the consumers' daily life, leisure behavior, and spending habits. Referring to plausible lifestyle segments proves to be more promising in marketing planning than the isolated use of demographic and psychographic data (Wedel and Kamakura 2000).
However, it should be recognized that lifestyles are continuously changing (Mowen and Minor 1998) and depend on cultural and sociological developments. New lifestyles develop from time to time. In this respect existing consumer typologies or segmentations, such as AIO ('activities, interests, and opinions') by Wells and Tigert (1971) or VALS ('values and life-styles') by the Stanford Research Institute (Mitchell 1983), are always a compromise between universality and specificity. The VALS typology, e.g., results from a market segmentation based on the individual's resources, mainly income and education, and self-orientation, i.e. attitude towards oneself, one's inspirations and the things one does to communicate and achieve them (Brassington and Pettitt 2005). In many cases an additional analysis of current primary data is indispensable to complete standardized information services such as VALS. This particularly applies if special consumer areas in daily life, such as nutrition behavior, are at the center of interest. Advances in automatic data production, e.g. by electronic micro test markets like IRI BehaviorScan or online panels, enable vivid portraits of different consumer types.
The reliable detection of meaningful lifestyle patterns and the empirical determination of consumer typologies from survey data are elementary tasks of marketing research and contribute to a deeper understanding of existing relationships between products/services and consumers. Following Brassington and Pettitt (2005, p. 119), lifestyle segmentation methods 'can open the door to a better-tailored, more subtle offering to the customer on all aspects of the marketing mix'. Therefore different approaches, most of which apply traditional cluster or factor analysis, have been discussed in the past to address this topic. In both cases the analyst has to control the grouping process to a large extent. In hierarchical cluster analysis the selection of the similarity measure (e.g. the Tanimoto coefficient, which suggests itself in the case of binary coded consumer attributes), the fusion method (e.g. average linkage, where the distance between two clusters or segments is assumed to equal the average distance between pairs of observations, one in each cluster), and the criterion for determining an adequate number of clusters or consumer types (e.g. the cubic clustering criterion according to Milligan and Cooper (1985)) influence the final structure of the resulting typology. Partitioning algorithms such as k-means, on the other hand, necessitate a pre-specification of the number of clusters (consumer types) to be considered. Similar problems arise from applying factor analysis, e.g. regarding the number of factors (consumer types) to be extracted or assumed, the communality estimation, and the factor rotation. The adequate use of the respective algorithms requires extensive knowledge about the internal structure of the data to be analyzed. But the more intervention an algorithm requires from outside, the more subjectivity is involved regarding the final typology. Therefore, a methodology is desirable which enables the determination of meaningful consumer typologies as autonomously as possible. In this respect self-organizing neural networks applying unsupervised learning techniques seem to be a promising alternative. The so-called prototypes (see subsection 2.1) resulting from the application of this kind of neural network to appropriate data provide an almost 'natural' and easy-to-interpret basis for consumer typologies.

Related methodical work
Self-organizing neural networks have been an integral part of the data analytical instruments of natural and social sciences for several years. The spectrum of applications ranges from automatic image, text, and speech processing (Kohonen 2001) through the analysis of gas chromatographic patterns (Questier, Guo, Walczak, Massart, Boucon, and de Jong 2002), to financial data and industry analysis (DeBoeck and Kohonen 1998; Simula, Vasara, Vesanto, and Helminen 1999), qualification analysis in business administration (Wagner 2004), as well as market segmentation in e-commerce (Vellido, Lisboa, and Meehan 1999) and market basket analysis (Decker and Monien 2003).
As a consequence, this class of neural networks is the subject of continuing efforts for improvement. Corresponding research interests are devoted to both their individual design for specific areas of application and the elimination of existing methodological problems. Some of these problems, e.g. the flexible determination of network structure and size, have been solved or at least significantly reduced by algorithms such as the neural gas network (Martinetz and Schulten 1991), the growing neural gas network (Fritzke 1995), the growing self-organizing map (Villmann and Bauer 1998), the growing hierarchical self-organizing map (Dittenbach, Rauber, and Merkl 2002), and the grow-when-required network (Marsland, Shapiro, and Nehmzow 2002). In addition, proposals for speeding up the adaptation process are already available for particular neural networks, e.g. for neural gas networks (Atukurale and Suganthan 2000). However, the efficient determination of adequate parameter settings continues to be a crucial practical problem, which has to be solved for each data set by a more or less troublesome trial-and-error process if the internal structure of the data is unknown. This applies to neural networks with an a priori defined topology as well as to growing ones.

Motivation of the new approach
In contrast to natural and technical sciences, where unsupervised pattern detection has been an established subject of current research for a number of years, corresponding work is still in its infancy in marketing and consumer research. This, among other things, results from the fact that the application of most of the available self-organizing neural networks requires considerable experience regarding the adequate specification of the relevant control parameters. Experience of this kind is rarely available on the part of marketing practitioners.
Against this background we introduce a self-organizing neural network approach in which the number of parameters to be preset by the user at the beginning of the data analysis process is limited to a minimum. The Growing Neural Network with Autonomous Parameter Specification (GNNAPS) presented in this paper determines the majority of the control parameters required for an adequate network adaptation more or less autonomously. To our knowledge the present study is the first to use a growing self-organizing neural network within the scope of lifestyle segmentation and provides an impression of the insights attainable in this way.
The remainder of the paper is organized as follows: In section 2 we briefly reflect on some basic methodological aspects and outline the GNNAPS algorithm. Its performance and adaptability are demonstrated by means of a synthetic 2D data set in section 3 before real survey data collected from a German household panel are analyzed in section 4. The paper concludes with a brief discussion and an outlook on future research.

Preliminary remarks
The methodology underlying the GNNAPS algorithm is vector quantization, the basic idea of which is to represent J K-dimensional input vectors (consumer profiles) $t_j = (t_{j1}, \dots, t_{jk}, \dots, t_{jK})$, with $j \in \{1, \dots, J\}$, by an adequate number H of weight vectors $\eta_h = (\eta_{h1}, \dots, \eta_{hk}, \dots, \eta_{hK})$, with $h \in \{1, \dots, H\}$. Each weight vector $\eta_h$ defines one node ('neuron') of a neural network and will be interpreted in the following as the prototypical representative of a lifestyle segment. In the course of network adaptation each input vector (consumer profile) $t_j$ is assigned to one weight vector or prototype $\eta_h$ so that the distance between them is minimal. For H < J, which is the default, the assignment of J input vectors to H prototypes equals a compression of the data set considered (Gersho and Gray 1992). At the end of the network adaptation process the whole set of consumer profiles is represented by a set of 'prototypes' characterizing the individual lifestyle segments.
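The assignment of consumer profiles to their nearest prototypes can be illustrated with a minimal sketch. The function names (`dist`, `assign_to_prototypes`) and the toy data are assumptions made for illustration only; they are not part of GNNAPS, which additionally has to learn the prototypes themselves.

```python
import math

def dist(t, eta):
    """Euclidean distance between a consumer profile t and a prototype eta."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t, eta)))

def assign_to_prototypes(profiles, prototypes):
    """Return, for each profile t_j, the index h of its nearest prototype."""
    return [
        min(range(len(prototypes)), key=lambda h: dist(t, prototypes[h]))
        for t in profiles
    ]

# Invented toy data: J = 5 profiles, K = 2 items, H = 2 prototypes (H < J).
profiles = [(1.0, 1.0), (1.2, 0.8), (4.8, 5.0), (5.0, 4.6), (0.9, 1.3)]
prototypes = [(1.0, 1.0), (5.0, 5.0)]

assignment = assign_to_prototypes(profiles, prototypes)
print(assignment)  # one prototype index per profile
```

Because five profiles are mapped onto two prototypes, the result is a compressed description of the data in the sense described above.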
Since the algorithm should require as little prior knowledge as possible about the structure of the data to be analyzed, special attention is devoted to the autonomous, or rather data-driven, determination of network parameters. In the present case only two control parameters have to be preset by the user, namely the so-called compression level CL and the maximum number of adaptation steps L considered to be necessary to adequately represent the available data. The former determines the extent of data reduction, whereas the latter influences the accuracy of data representation. To obtain acceptable results L should be significantly larger than the number of observations J.
Setting CL close to 0 is tantamount to the objective of only slightly compressing the available data set and the permission of representing the relevant patterns by a comparatively large number of prototypes.If a compression level close to 1 is selected, only a few prototypes are generated, which involves a high level of generalization and the acceptance of comparatively large distances between the prototypes (weight vectors), if the data features heterogeneous patterns.In any case 0 < CL < 1 must apply.

Self-initialization:
At the beginning of the adaptation process, i.e. in adaptation step $l = 1$, the neural network, or rather the associated set of nodes ('neurons') $U$, contains only two non-connected nodes $u_1$ and $u_2$, i.e. $U = \{u_1, u_2\}$ applies, with associated K-dimensional weight vectors $\eta_1$ and $\eta_2$, both initialized with positive random values. The set of edges ('connections') $C$ between nodes is empty. In the final neural network, or rather its graphical visualization, the edges connect those nodes which represent similar lifestyle segments. Both sets together determine the topological structure of the initial neural network. The size of these sets, and with it the size of the whole network, grows in the course of network adaptation. This process comprises two aspects, namely the addition of new nodes and the adaptation of the weight vectors of already existing nodes. The latter is frequently equated with the term 'learning'.
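A minimal sketch of this initial state, assuming a dictionary of weight vectors for the node set U and a set of index pairs for the edge set C. The data layout, the function name `init_network`, and the choice of the interval (0, 1] for the random values are illustrative assumptions; the text only requires positive random values.

```python
import random

def init_network(K, seed=None):
    """Create the initial network: two unconnected nodes, no edges."""
    rng = random.Random(seed)
    # 1.0 - rng.random() lies in (0, 1], so every weight is strictly positive.
    weights = {h: [1.0 - rng.random() for _ in range(K)] for h in (1, 2)}
    edges = set()  # C is empty at adaptation step l = 1
    return {"weights": weights, "edges": edges}

network = init_network(K=3, seed=0)
print(sorted(network["weights"]))  # node indices of u_1 and u_2
print(network["edges"])            # empty edge set
```

Both containers grow (and may shrink) during adaptation, which is why mutable structures are used here.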
The decision as to whether a new node has to be added to the current network in adaptation step $l$ depends on the extent to which the best matching node $u_{h_{Best}}$ fits the current input vector. The best matching node is the one whose weight vector has the smallest Euclidean distance to the current input vector. Following Marsland, Shapiro, and Nehmzow (2002), we call this fit the activity $v_{h_{Best}}$ of the best matching node. A formal definition is given later. The smaller the aforesaid distance, the higher the activity. If the activity of the best matching node falls below a threshold $v_{Thres}$, we can take this as a hint at an insufficient fit between the respective weight vector and the input vector considered. Depending on the compression level, $v_{Thres}$ can take values between exp(− S Max 2 ), for CL → 1, and 1, for CL → 0, where $S_{Max}$ denotes the maximum Euclidean distance between two input vectors. A simple but also quite rough approximation is $S_{Max} = \sqrt{\sum_{k=1}^{K} (\max_j t_{jk} - \min_j t_{jk})^2}$, where $\max_j t_{jk}$ and $\min_j t_{jk}$ are the maximum and the minimum value of all input vectors with respect to dimension $k$. A methodologically more elegant way of approximating $S_{Max}$ is to calculate a random sample of Euclidean distances $D = \{d_{j_1}, \dots, d_{j_n}\}$, with $j_i \in \{1, \dots, J\}$, $j_i \neq j_{i'}$ for $i \neq i'$, and $n < J$, from the available data set and to define $S_{Max} = \max\{d_{j_1}, \dots, d_{j_n}\}$. In doing so the distribution of $D$ can be found and an estimation of the standard deviation becomes possible. The approximation of $S_{Max}$ spares us the time-consuming calculation of the $J^2$ distances required for determining the true maximum Euclidean distance.
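Both ways of approximating the maximum distance can be sketched in a few lines. The first function computes the diagonal of the bounding box of the data, which is how the rough approximation is read here. The second draws n random pairs of distinct input vectors; this pairing scheme is an assumption, since the text does not fix how the sampled distances are formed. All function names and the toy data are hypothetical.

```python
import math
import random
import statistics

def dist(a, b):
    """Euclidean distance between two input vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def s_max_bounding_box(data):
    """Rough approximation: diagonal of the bounding box of the data."""
    dims = range(len(data[0]))
    return math.sqrt(sum(
        (max(t[k] for t in data) - min(t[k] for t in data)) ** 2
        for k in dims
    ))

def s_max_sampled(data, n, seed=None):
    """Sample n distances between random pairs of distinct input vectors.

    Returns the largest sampled distance and the sample standard deviation.
    """
    rng = random.Random(seed)
    sample = [dist(*rng.sample(data, 2)) for _ in range(n)]
    return max(sample), statistics.stdev(sample)

# Invented toy data: J = 5 points in K = 2 dimensions.
data = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0), (1.0, 2.0)]

rough = s_max_bounding_box(data)
sampled, spread = s_max_sampled(data, n=4, seed=1)
print(rough, sampled, spread)
```

The bounding-box value can never be smaller than the true maximum distance, whereas the sampled value can never be larger, so the two approximations bracket the exact figure.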
The addition of new nodes is internally controlled by two further variables, namely the firing counter $y_{h_{Best}}$ and the training requirement $w_{h_{Best}} \in [0, 1]$ of the best matching node. The former records how often $u_{h_{Best}}$ has been the best matching node in the network adaptation process so far. The more often a particular node matches the current input best, as indicated by its firing counter, the lower its training requirement. Both variables are initialized as follows:
Like the addition of new nodes, the learning process, i.e. the adaptation of the existing nodes (weight vectors) to the current input, is controlled by special variables as well. The learning rates of the best and the second best matching node determine the degree to which these two nodes are adapted to the input. Restricting this process to two nodes is not unusual in the relevant literature (cf. e.g. Fritzke 1995). The higher the learning rates, the more strongly the weight vectors concerned are adapted to the present data. For the best matching node we define:
The learning rate of the best matching node is positively correlated with the compression level CL and the maximum distance $S_{Max}$, and it has an anchor point at 0.1. In that way we account for the prerequisite that, as a rule, the larger the range of the data, the faster the adaptation of weights should be. This learning rate decreases with an increasing maximum number of adaptation steps L, which amounts to a smoother and more moderate adaptation of the weight vectors. The learning rate of $u_{h_{Second}}$ corresponds directly to that of $u_{h_{Best}}$:
The smaller L, the more we have to extend the adaptation to the second best matching node and vice versa. A small maximum number of adaptation steps causes the algorithm to adapt the topological neighbors of the best matching node to a high degree in order to achieve a fast generation of the neural network. A large L, on the other hand, causes a rather moderate adaptation of weights.
Starting from these presets, one network adaptation step of the algorithm comprises the following sub-steps S1 to S10:

(S1) Input selection and determination of the best matching nodes:
An input vector $t_j = (t_{j1}, \dots, t_{jK})$, with $j \in \{1, \dots, J\}$, is randomly selected from the data base, and the Euclidean distances to all nodes of the current neural network are calculated: $\mathrm{dist}(t_j, \eta_h) = \sqrt{\sum_{k=1}^{K} (t_{jk} - \eta_{hk})^2}$ for all $u_h \in U$. The smallest distance of all determines the best matching node $u_{h_{Best}}$: $h_{Best} = \arg\min_{h} \mathrm{dist}(t_j, \eta_h)$. The second best matching node $u_{h_{Second}}$ follows from: $h_{Second} = \arg\min_{h \neq h_{Best}} \mathrm{dist}(t_j, \eta_h)$.
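Sub-step S1 can be sketched as follows, assuming the weight vectors are stored in a dictionary keyed by node index. The function names and the example values are illustrative assumptions, not taken from the original implementation.

```python
import math

def dist(t, eta):
    """Euclidean distance between input vector t and weight vector eta."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t, eta)))

def best_and_second(t, weights):
    """Return the indices of the best and second best matching nodes.

    weights maps a node index h to its weight vector eta_h and must
    contain at least two nodes.
    """
    ranked = sorted(weights, key=lambda h: dist(t, weights[h]))
    return ranked[0], ranked[1]

# Invented example: three nodes in K = 2 dimensions and one input vector.
weights = {1: (0.0, 0.0), 2: (1.0, 1.0), 3: (5.0, 5.0)}
t_j = (0.9, 0.8)

h_best, h_second = best_and_second(t_j, weights)
print(h_best, h_second)
```

Sorting all nodes is the simplest formulation; for large networks a single pass that tracks the two smallest distances would avoid the sort.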

(S2) Calculation of the activity of the best matching node:
The activity of the best matching node is a function of its (Euclidean) distance to the current input vector:

(S3) Calculation of the threshold for the training requirement:
Because a node can adequately represent a subset of the data only after a certain number of adaptations, its training requirement has to fall below a threshold before a new node is inserted. The larger the learning rate of the best matching node, the higher this threshold and the less intensively $u_{h_{Best}}$ has to be adapted before a new node may be added. The threshold increases with the number of nodes $|U|$ in the network as well. That is to say, the more nodes the network includes, the less intensively each of them has to be trained before a new node may be added.
The motivation for this is simple: The more advanced the adaptation process, the better the data set is represented by the available weight vectors. Because new nodes are always inserted in the neighborhood of already existing ones, the later they are added to the neural network, the less intensively they have to be trained.

(S4) Generation of connections between $u_{h_{Best}}$ and $u_{h_{Second}}$:
The nodes $u_{h_{Best}}$ and $u_{h_{Second}}$ will be connected if either the distance between the respective weight vectors is so small that no further node can be inserted between them or at least one of the two is not yet adapted sufficiently. In this case the age $a_{h_{Best} h_{Second}}$ of the corresponding connection is set to 0:
The above condition guarantees the generation of correct masked Voronoi polyhedra (Martinetz and Schulten 1994).

(S5) Addition of a new node:
If both the activity and the training requirement of the best matching node fall below the respective thresholds, and if the current number of nodes is smaller than the number of input vectors to be represented, then a new node (weight vector) is added to the network. Formally, this procedure looks as follows:
By inserting the new node between the current input vector $t_j$ and the weight vector $\eta_{h_{Best}}$ of the best matching node, the corresponding Voronoi polyhedron is divided, which improves the data representation. The connections between nodes are changed in such a way that the Delaunay triangulation (Martinetz and Schulten 1994) is preserved. The firing counter $y_{h_{New}}$ and the training requirement $w_{h_{New}}$ of the new node are initialized, whereas the corresponding control variables of the best matching node are reset.

(S6) Adaptation of weight vectors:
If no new node is added in sub-step S5, the weight vector of the best matching node as well as those of its topological neighbors are adapted by using the learning rates of the best and the second best matching node. For the best matching node we define:
With the exponent $\ln(l + \exp(1)) / \ln(L + \exp(1))$ the training requirement is related to the current state of network adaptation. At the beginning of the training process the strength of adaptation declines slowly, but the decline accelerates the more the algorithm approaches the maximum number of adaptation steps L. So the nodes have more time to fit the given data structure. Weighting the extent of adaptation resulting from $(t_j - \eta_{h_{Best}})$ with the training requirement has the effect that the more frequently a node has been adapted in the past, the less intensively it is adapted now. For the neighboring nodes the basic form is sufficient:
Accordingly, the extent of adaptation declines faster for the neighboring nodes than for the best matching one.

(S7) Update of control variables:
The firing counter and the training requirement of the best matching node are updated as follows:

(S8) Removal of old connections:
All connections between the best matching node and its topological neighbors are aged by 1:

(S9) Removal of nodes:
All nodes without a connection to any other node and whose contribution to the goodness of data representation is negligible are removed:

(S10) Check of the stopping criterion:
After the adaptation step counter has been updated according to $l = l + 1$, the stopping criterion $l > L$ is checked. If this holds, the algorithm stops and the connection matrix
Otherwise it continues with sub-step S1.

Measures of network performance
At the end of the adaptation process each node $u_h$ represents a non-empty subset of data points, or rather a lifestyle segment. The goodness of data representation within a segment can be assessed using the maximum distance $MD_h$ between the data points (consumer profiles) concerned and the associated weight vector (prototype) $\eta_h$. Considering the maximum of maxima across all segments leads to the so-called maximum distance error $MDE = \max_{h} MD_h = \max_{j} \mathrm{dist}(t_j, \eta_{m_j})$, where $m_j = \arg\min_{h \in \{1, \dots, H\}} \mathrm{dist}(t_j, \eta_h)$ is the subscript of the weight vector which best matches input vector $t_j$. So MDE refers to the worst match of a consumer profile to its associated prototype and assesses the balance of segmentation. A very popular performance measure is the quantization error QE (Kohonen 2001), which assesses the distortion due to data compression and the effectiveness of node placement in the neural network (Yerramalla, Cukic, and Fuller 2003). Both measures should take the lowest possible values.
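The maximum distance error can be illustrated with a short sketch that first determines the within-segment maxima MD_h and then takes their maximum. The function names and the toy data are assumptions for illustration; the quantization error is left out because its exact form is not restated here.

```python
import math

def dist(t, eta):
    """Euclidean distance between a profile t and a prototype eta."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t, eta)))

def max_distance_error(profiles, prototypes):
    """Return the list of per-segment maxima MD_h and their maximum MDE."""
    md = [0.0] * len(prototypes)
    for t in profiles:
        # m_j: index of the prototype that matches t best
        m_j = min(range(len(prototypes)), key=lambda h: dist(t, prototypes[h]))
        md[m_j] = max(md[m_j], dist(t, prototypes[m_j]))
    return md, max(md)

# Invented toy data: four profiles, two prototypes.
profiles = [(0.0, 0.0), (0.0, 1.0), (4.0, 4.0), (4.0, 7.0)]
prototypes = [(0.0, 0.0), (4.0, 4.0)]

md, mde = max_distance_error(profiles, prototypes)
print(md, mde)
```

In this toy case the second segment contains the worst-represented profile, so it alone determines MDE, which is what makes the measure sensitive to unbalanced segmentations.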
In lifestyle segmentation, as can be seen in the empirical part of this paper, we are not only interested in good data representation but also in an adequate network topology. The latter is related to the number of nodes and the lengths of the connections in the graph by which the neural network is represented. The fewer nodes this graph has and the shorter the paths between individual nodes (expressed by the edges or connecting lines), the more compact the neural network is. In other words, neighboring nodes should represent similar lifestyle segments and should therefore be connected directly by an edge, whereas nodes which are located further away from each other should not. Thus, in the ideal case, a well-trained neural network represents both the existing lifestyle patterns and the similarity of these patterns. We are going to demonstrate this with the synthetic 2D data in section 3. Referring to Marsland, Shapiro, and Nehmzow (2002), the following simple measure of compactness C1 can be defined:
where $c_{h_1 h_2}$ indicates whether nodes $u_{h_1}$ and $u_{h_2}$ are connected ($c_{h_1 h_2} = 1$) or not ($c_{h_1 h_2} = 0$). To facilitate comparisons of different adaptation runs regarding the compactness of the respective neural networks, C1 can be related to the number of connections, which yields the measure C2:
Again, both measures should take the lowest possible values.
For more global comparisons we can additionally consider the geometric mean GM of the above performance measures.Due to the congruence of the orientation of the individual measures GM should be as small as possible, too.
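As a plausibility check, the geometric mean can be recomputed from the four performance values QE, MDE, C1, and C2 reported for the synthetic 2D data in section 3. That GM is taken over exactly these four measures is an assumption; it is supported by the fact that the computation reproduces the reported GM of about 104.60 for GNGN. The function and variable names are illustrative.

```python
import math

def geometric_mean(values):
    """Geometric mean, computed via logarithms for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Performance values (QE, MDE, C1, C2) as reported in section 3.
gngn = (12422.49, 3.83, 870.44, 2.89)
gnnaps = (17343.50, 2.49, 801.62, 3.09)

gm_gngn = geometric_mean(gngn)
gm_gnnaps = geometric_mean(gnnaps)
print(gm_gngn, gm_gnnaps)
```

The GNNAPS value comes out slightly below the GNGN value, in line with the slight superiority of GNNAPS noted in the comparison.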

Performance Study with 2D Synthetic Data
The performance and adaptability of the GNNAPS approach are analyzed by means of a 2D synthetic data set, which is similar to that used by Martinetz and Schulten (1991) as well as Fritzke (1995). It contains J = 19 160 data points (input vectors), which define manifold graphical objects. Figure 1 illustrates the whole data set. Each object is characterized by the number $J_i$ (with $J = \sum_{i=1}^{7} J_i$) of its data points. The two single lines in particular are worth closer consideration. The significantly different numbers of data points ($J_4$ = 7 260 vs. $J_6$ = 100) result from the varying 'density' of the horizontal line on the right hand side. The further one goes to the right on this line, the more data points are represented by the respective section of the line. In contrast to this, each of the five black rectangles in the 'chessboard pattern', as well as the dotted rectangle on the left hand side, has been generated with 1 600 data points only. This topological heterogeneity of the input data is a special challenge to the algorithm's adaptability.
If we apply the GNNAPS algorithm to this data with CL = 0.15 and restrict the maximum number of adaptation steps to $L = 10^7$, we obtain the neural network with H = 181 nodes depicted in Figure 2.
To draw comparisons we additionally applied the growing neural gas network (GNGN) suggested by Fritzke (1995) to the present data. The GNGN algorithm is a powerful benchmark because of its impressive results in earlier studies, e.g. in a comprehensive comparison with k-means, growing k-means, and the original neural gas network published by Daszykowski, Walczak, and Massart (2002). Initializing the GNGN algorithm with control parameters similar to those used by Fritzke (1995), and fixing the step width for adding new nodes at 55 249 in combination with a maximum number of adaptation steps $L = 10^7$, results in a pattern representation with performance values QE = 12 422.49, MDE = 3.83, C1 = 870.44, C2 = 2.89, and GM = 104.60. The above-mentioned step width ensures the comparability of results by causing the GNGN algorithm to generate H = 182 nodes in total. The corresponding neural network is depicted in Figure 3.
Again the attained representation looks adequate. In particular the representation of the 'chessboard pattern' is without doubt convincing. But the data pattern underlying the horizontal 'density line' claims a comparatively large number of nodes, which slightly hampers the representation of the graphical objects in the lower right and left hand corner. The representation of the large rectangle equals the GNNAPS solution. The visual conformity of both representations is reflected in the individual performance measures as well. The superiority of GNGN over GNNAPS regarding QE and C2 (with values 12 422.49 vs. 17 343.50 and 2.89 vs. 3.09) is opposed to the superiority of GNNAPS over GNGN regarding MDE and C1 (with values 2.49 vs. 3.83 and 801.62 vs. 870.44). Considering the geometric mean one might conclude a slight superiority of GNNAPS over GNGN, which results from the fact that the balance in pattern representation turns out to be somewhat better with GNNAPS.
To get a closer impression of how GNNAPS learns the given topological relations, the performance measures for an increasing number of adaptation steps with compression level CL = 0.15 are shown in Table 1. The H values in the second column quickly converge to their final level of 181. However, GNNAPS does not increase the number of nodes (starting with $H = |U| = 2$) monotonically to achieve an acceptable representation of the data, but also reduces it if this is justifiable at the current stage of adaptation. The GNGN algorithm lacks this flexibility by definition. At the same time the GM value of GNNAPS declines continuously. As few as $10^6$ (i.e. ≈ 50 · J) adaptation steps are sufficient to approach a relatively stable representation of the relevant patterns. Larger numbers of adaptation steps primarily cause a kind of fine-tuning. For comparison purposes the performance measures of GNGN are given in brackets. For each L the step width for adding new nodes was selected in such a way that the GNGN solution is comparable to that of GNNAPS as far as possible. Once again the parametrization of GNGN was aligned with suggestions by Fritzke (1995). The impressive performance of GNGN with respect to the quantization error QE is opposed to the superiority of GNNAPS regarding the maximum distance error MDE. The more advanced the adaptation process, the better the measure of compactness C1 turns out to be with GNNAPS, whereas C2 levels off close to 3 for both algorithms.
The differences between both algorithms with respect to QE and MDE can easily be explained. The GNGN algorithm minimizes the quantization error, i.e. feature spaces showing a high density of data points are represented very well (cf. the 'chessboard pattern'), whereas the contrary applies to feature spaces with a comparatively low density of data points (cf. the graphical objects in the lower left and right hand corner). The GNNAPS algorithm, on the other hand, also minimizes the maximum distance error MDE. Thus, it is less susceptible to data heterogeneity and attains a more balanced pattern representation. Moreover, since the number of nodes generated by the GNGN algorithm directly depends on the number of input signals and the selected step width for adding new nodes, its use requires some experience regarding the definition of the latter. Otherwise the parametrization may degenerate into a troublesome trial-and-error process if the internal structure of the data is unknown. In this respect the adaptation of GNNAPS is self-controlled to a high degree, which eases its use by non-specialists, e.g. in marketing and consumer research.

The data
In this section we use the new algorithm to empirically determine a consumer typology from real survey data. The data underlying the following considerations was made accessible by the German ZUMA Institute and is part of a sub-sample of the 1995 GfK ConsumerScan Household Panel Data. For a detailed description of this data set see Papastefanou, Schmidt, Börsch-Supan, Lüdtke, and Oltersdorf (1999). It contains socio-economic and demographic characteristics of several thousand households/consumers as well as individual attitudes towards nutrition (e.g. slimness orientation, plain fare, and brand products), aspects of daily life (e.g. traditional living, convenience-orientated cooking, and mistrust towards new products), environment (e.g. ecological awareness, mobility, and industry), and shopping (e.g. tendency to purchase new products, price consciousness, and preference for small retail stores). A considerable number of the respective statements or items are more or less concerned with individual nutrition behavior (e.g.: 'Multivitamin juices are an important supplement to daily nutrition.'). The special relevance of food-related lifestyle analysis has already been emphasized by Grunert, Brunsø, and Bisp (1997), de Boer, McCarthy, Cowan, and Ryan (2004) and others. Hollensen (2004), in particular, states that food consumption habits may even be used as a general indicator of lifestyle in international or global marketing. To identify existing lifestyle patterns we consider the attitudes of J = 4 266 households/consumers measured by means of K = 81 mostly Likert-scaled items (with 1 ≡ 'I definitely disagree.', ..., 5 ≡ 'I definitely agree.'). The scale is assumed to be equidistant, and therefore the data can be treated as metric.

Selected results
From a practical point of view the given task requires a comparatively high degree of generalization to get an easy-to-grasp set of prototypes representing reasonable groups of consumers.Therefore GNNAPS was applied with a high compression level, namely CL = 0.98.Together with a maximum number of adaptation steps L = 10 7 we get H = 14 prototypes involving performance values QE = 36 355.28,MDE = 16.67,C1 = 193.71,and C2 = 5.87.Fixing CL close to 1 causes a strong compression of the data, and ergo, provides a small number of prototypes or consumer types respectively.
The numbers of consumers represented by weight vectors η 1 = (η 11 , . . ., η 1,81 ), . . ., η 14 = (η 14,1 , . . ., η 14,81 ) range between 129 and 514.Each 'lifestyle prototype' can be described by considering its specific attitude profile.As to be expected Prototype η 1 represents about 7.15 % (= 305) consumers with a rather hedonistic way of life.Weight η 13 = 4.54 ++ in the first row of the table, for example, indicates their clear agreement, on average, with the statement 'I like to have company.'.Consumers of this type like to try new and unknown things and are willing to take care of their body by attaching great importance to healthy nutrition.However they do not show a disposition to the extreme such as pure vegetarian diet.This applies to all groups with the exception of number 9 where a distinct refusal of 'normal' consumer behavior can be observed.Members of this group have maximum weights on items such as 'I regard most new products on the market as unnecessary.'(η 91 = 3.95 ++ ) and 'I am very distrustful of advertising messages.'(η 9,11 = 4.43 ++ ).They strictly disagree with the statement 'There is too much fuss about diets.'.Members of the first group are called the 'hedonists' in the following.
The 166 consumers (≡ 3.89 %) represented by prototype η_10 may be characterized as conservative and less health-orientated, with a frugal diet and lifestyle. They show little interest in new products and strictly reject extreme forms of nutrition such as vegetarianism. We label this group the 'conservatives' in the following. Like the 'hedonists', they tend to question whether it helps much to try to take all nutrition tips into account (see last row of Table 2). In this respect both consumer groups stand out against those represented by prototype η_5. Because this prototype represents the largest group of consumers, namely 514, we can use it as a reference object. This becomes apparent from the average level of the individual weights, which lie mostly between those of prototype η_1 and prototype η_10. The comparatively low value of group 5 regarding the last item results from the less distinct profile of these 'average consumers'. Moreover, none of the 81 weights of prototype η_5 equals the maximum or minimum across all groups.
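The segment shares quoted above follow directly from the segment sizes and the sample size J = 4 266. The short Python sketch below merely recomputes them as a plausibility check; the dictionary keys and the helper name share_percent are illustrative labels chosen here and are not part of GNNAPS.

```python
# Segment sizes as reported in the text; J is the number of households/consumers.
J = 4266
segment_sizes = {
    "hedonists (prototype 1)": 305,
    "conservatives (prototype 10)": 166,
    "average consumers (prototype 5)": 514,
}


def share_percent(count, total):
    """Return the share of `count` in `total` as a percentage."""
    return 100.0 * count / total


# Round to two decimals, matching the precision used in the text.
shares = {name: round(share_percent(n, J), 2) for name, n in segment_sizes.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.2f} %")
```

The first two values reproduce the 7.15 % and 3.89 % given in the text; the share of the reference group is not stated in the paper and is computed here only for comparison.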
Further items which strongly discriminate between the 14 prototypes are, e.g., the relevance of product quality in purchase decisions, the preference for brand products, the time spent on preparing a meal, the attention paid to mild/nonirritant foods, the purchase of products without additives, and the preference for domestic goods. The straightforward interpretability of the individual weights is a considerable advantage over traditional segmentation tools such as hierarchical cluster analysis and multidimensional scaling.
To get a global impression of the relations between the individual prototypes we can look at the 2D projection of the neural network. The connection graph in Figure 4 visualizes the topological structure of the network underlying the lifestyle segmentation. The relative size of the nodes indicates the number of consumers represented by the respective prototype. The age of the connections existing in adaptation step L is represented by the thickness of the corresponding edges in the connection graph. In this way the strength of the relations between the prototypes or lifestyle segments is depicted: the thicker an edge is drawn, the stronger the respective relation. From Figure 4 we learn that there is no direct connection (edge) between prototypes η_1 and η_10, which is in accordance with the dissimilarities between both groups of consumers regarding their lifestyles. The missing connection to the reference object η_5 in both cases is plausible as well. The connection graph as a whole expresses the complexity of the inter-segment relationships in the considered population.

Managerial implications
Lifestyle analyses by means of GNNAPS are useful for decision making in fields like product positioning and promotion strategy development (Mowen and Minor 1998). The empirical results motivate, for instance, the creation of new top-quality food products with a distinct health orientation. Top quality should also include high standards of taste. This is consistent with Jago (2000), who argues that health is one of the most important drivers of new product development in the food industry. Pronounced associations with vegetarian food should, however, be avoided here. Products of this type would precisely meet the preferences of a considerable 7.15 % of the population represented by the available data. The remaining consumer types or lifestyles can be analyzed in an analogous manner with respect to their implications for new product development or advertising planning, for instance.

Conclusions
In this paper we have introduced a new algorithm which has proven its ability to detect heterogeneous data patterns with a comparatively small number of parameters to be controlled by the user. In contrast to the established GNGN approach, which features significantly more parameters to be preset, GNNAPS works highly satisfactorily with only two external control parameters (L and CL). The process of determining the size and the structure of the neural network is quite autonomous in this respect. The comparatively small number of external parameters facilitates the use of the new algorithm in practice, e.g. in exploratory data analysis and marketing research, and reduces the total time required for network adjustment.
In the empirical study we were able to show that the principle of computing prototypes from consumer behavior data corresponds directly to classification tasks in marketing research, and particularly in lifestyle segmentation. The modest requirements of the algorithm regarding prior knowledge about the internal structure of the survey data enabled an exploratory analysis in the primary sense. In the present example a straightforward interpretation of the individual weights with regard to future food preferences and appropriate managerial implications was possible.

Figure 1: Visualization of the 2D synthetic data set results. The corresponding performance measures are QE = 17 343.50, MDE = 2.49, C1 = 801.62, and C2 = 3.09, which leads to a geometric mean GM = 101.70. The dots and lines denote the nodes and edges (connections) of the neural network. The larger a node is drawn, the more input vectors are represented by the respective weight vector. Obviously all graphical objects, including the horizontal 'density line' on the right-hand side, are represented adequately by the available set of nodes.
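The geometric mean quoted in the caption of Figure 1 can be checked numerically. The following Python sketch assumes that GM is the fourth root of the product of the four performance measures QE, MDE, C1 and C2; this reading is an assumption made here, although it agrees with the reported value. The function name geometric_mean is an illustrative choice.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms for stability."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Performance measures for the 2D synthetic data, taken from the caption of Figure 1.
synthetic = {"QE": 17343.50, "MDE": 2.49, "C1": 801.62, "C2": 3.09}

# Assumption: GM aggregates exactly these four measures with equal weight.
gm = geometric_mean(synthetic.values())
print(f"GM = {gm:.2f}")
```

Under the same assumption the function could be applied to the four measures reported for the survey data, for which the paper does not state a geometric mean.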

Figure 4: Connection graph (with |C| = 33)

The current maximum age of a connection is a_Max = |U| − 1 and equals the maximum number of edges emanating from a node u_h if all the other nodes are topological neighbors of u_h. The latter results if u_h is always the best or second-best matching node, and if the probability of becoming the best or second-best matching node is equal for all the other nodes. All connections exceeding the maximum age, and therefore being dispensable, are removed:
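The removal rule stated in prose above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the representation of the connection set C as a dictionary mapping node pairs to ages, the function name prune_old_edges, and the toy ages are all assumptions made for illustration.

```python
def prune_old_edges(edge_ages, num_nodes):
    """Keep only connections whose age does not exceed a_max = |U| - 1.

    edge_ages: dict mapping frozenset({i, j}) to the current age of edge (i, j)
    num_nodes: current number of nodes |U| in the network
    """
    a_max = num_nodes - 1
    return {edge: age for edge, age in edge_ages.items() if age <= a_max}


# Hypothetical network with |U| = 4 nodes, hence a_max = 3.
edge_ages = {
    frozenset({0, 1}): 0,
    frozenset({1, 2}): 3,
    frozenset({2, 3}): 4,  # older than a_max, so this edge is dropped
    frozenset({0, 3}): 2,
}

kept = prune_old_edges(edge_ages, num_nodes=4)
print(sorted(tuple(sorted(edge)) for edge in kept))
```

Storing edges as frozensets keeps the graph undirected, so the pair (i, j) and the pair (j, i) refer to the same connection.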

Table 1: Performance of GNNAPS and GNGN (in brackets) on the 2D data

Table 2: Profiles of selected 'lifestyle prototypes'

To create expressive consumer types it is advisable to focus on those items which show the clearest differences across all prototypes. In Table 2 the prototypes of three exemplary groups of consumers are depicted. Each prototype is represented by its weight vector, i.e. η_1, η_5 and η_10. Item weights equaling the maximum or minimum across all prototypes are marked with ++ and −−, whereas the signs + and − denote the second largest and second smallest values. The 'extreme values' indicate a clear approval or disapproval regarding the corresponding statement. For the sake of clarity only those items have been listed which have an 'extreme value' either for η_1 or η_10.
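The marking scheme used in Table 2 can be sketched in a few lines of Python. The weights below are invented toy values (five prototypes, two items), not the survey results, and the function name mark_extremes is hypothetical; the code uses the ASCII strings "--" and "-" in place of the signs −− and −.

```python
def mark_extremes(weights):
    """Mark, per item, the extreme weights across all prototypes.

    weights: list of weight vectors, one list of item weights per prototype.
    Returns a same-shaped list of marks: "++" (maximum), "--" (minimum),
    "+" (second largest), "-" (second smallest), or "" otherwise.
    """
    n_items = len(weights[0])
    marks = [["" for _ in range(n_items)] for _ in weights]
    for k in range(n_items):
        distinct = sorted({w[k] for w in weights})
        if len(distinct) < 2:
            continue  # all prototypes agree on this item, nothing to mark
        labels = {}
        if len(distinct) >= 4:
            # Second-rank marks are only unambiguous with four or more distinct values.
            labels[distinct[1]] = "-"
            labels[distinct[-2]] = "+"
        labels[distinct[0]] = "--"
        labels[distinct[-1]] = "++"
        for h, w in enumerate(weights):
            marks[h][k] = labels.get(w[k], "")
    return marks


# Toy weights on a 1-to-5 scale: rows are prototypes, columns are items.
toy_weights = [
    [4.5, 1.2],
    [3.0, 2.8],
    [2.1, 3.9],
    [3.8, 2.0],
    [2.6, 3.1],
]

marks = mark_extremes(toy_weights)
for row in marks:
    print(row)
```

Items could then be filtered to those carrying a mark for selected prototypes, in the spirit of the selection described for Table 2.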