Mosaics for Visualizing Knowledge Structures

Mosaic plots are state-of-the-art graphics for multivariate categorical data in statistical visualization. Knowledge structures are mathematical models that belong to the theory of knowledge spaces in psychometrics. This paper presents an application of mosaic plots to psychometric data arising from underlying knowledge structure models. In simulation trials and with empirical data, the scope of this graphing method in knowledge space theory is investigated.


Introduction
Graphics are a powerful tool for presenting and exploring data, especially when implemented in interactive visualization software. Graphics help in understanding data and in determining structure. They are easy to create, convenient to use, and they can present information effectively. Interactive graphical approaches become indispensable in particular when analyzing large and complex datasets. Theus and Urbanek (2008) present interactive graphics. Cook and Swayne (2007) address dynamic graphics. The "Handbook of Data Visualization" by Chen, Härdle and Unwin (2008) provides an overview of the current state of affairs.
In the graphics literature, various types of plots for categorical data have been proposed. Graphics for univariate categorical data are barcharts and spineplots (Hummel, 1996). For multivariate categorical data, the modern graphic is the mosaic plot, which was originally introduced by Hartigan and Kleiner (1981). The classical mosaic plot has been further developed and now comes with the variations equal binsize mode and doubledecker plot (for comparing highlighted proportions), fluctuation diagram (for identifying common combinations), and multiple barcharts (for comparing conditional distributions); see Friendly (1994, 2000) and Hofmann (2008). Other graphics for multivariate categorical data are, for instance, parallel sets (Kosara, Bendix and Hauser, 2006) and bargrams (Wittenburg, Lanning, Heinrichs and Stanton, 2001). Both graphical methods work well for few variables, and if the main focus is on analyzing pairs of variables. In the present context, however, we are interested in the total number of cases for every combination of all variables. Mosaic plots present exactly that information, hence we propose their usage in this paper. Nonetheless, graphics such as parallel sets and bargrams could be used to explore knowledge space theory (KST) data in lower dimensions, for instance for the visual prediction of mastery dependencies for pairs of test items (cf., Ünlü and Kickmeier-Rust, 2006). This is beyond the scope of the present paper, which gives a first application of the mosaic display in KST, but could be investigated in future research.
Doignon and Falmagne (1985) introduced knowledge space theory (KST). Most of the theory of knowledge spaces is presented in a monograph entitled "Knowledge Spaces" by Doignon and Falmagne (1999). A comprehensive bibliography on KST, including many references on applications of KST, can be retrieved from http://wundt.kfunigraz.ac.at/kst.php. For concrete application examples, see in particular Albert and Lukas (1999). The theory of knowledge spaces has been successfully applied for the computerized, adaptive assessment and training of knowledge; for instance, see the Assessment and LEarning in Knowledge Spaces (ALEKS) system, a fully automated math tutor on the Internet, http://www.aleks.com.
In KST, (interactive) statistical graphics have not been considered so far. Prerequisite (or precedence) diagrams and Hasse diagrams are the graphics commonly seen and used. However, they do not really provide new insights. Their shape is determined by the models, they do not graphically display the raw data, and they are merely utilized for static presentation rather than exploration. In this paper, the mosaic plot is applied to response data arising from underlying knowledge structure models in KST. Mosaic plots can display the data and knowledge structure jointly in a single graphic. Aggregating over items in the mosaic plot, as is implemented in an interactive manner in visualization software, allows visually displaying the traces of the data and knowledge structure on subsets of the item set. Considering projections of knowledge structures on subsets of item sets, in turn, is important in KST (e.g., Falmagne, 2008). In simulation trials, using the basic local independence model as the data generating model, the scope of this graphing method in visually detecting the underlying knowledge structure is investigated.
Mosaic plots provide an interesting new perspective on visualizing and deriving knowledge structures from data. The connection between the mosaic plot and KST has not been made before and is a novel contribution. At this point it is important to note that mosaic plots and numerical algorithms for automatically deriving knowledge structures are not competing approaches, but rather they are supportive of each other. The results gained from visually inspecting mosaic plots, for instance, can be consulted as a reference against which to compare the findings obtained from KST numerical data analysis methods. In particular, this paper does not investigate whether automatic approaches or the use of graphics lead to better results; the two are to be seen as complementary.
The graphics in this paper are created using Mondrian, a statistical data visualization software featuring modern interactive visualization techniques. Mondrian has been developed by Theus (2002) in the programming language Java. It is available at no cost for Windows, Mac OS X, and UNIX, by download from http://www.rosuda.org/Mondrian/.

Mosaic Plots
The mosaic plot was originally introduced by Hartigan and Kleiner (1981) for visualizing contingency tables. This section gives a brief introduction to mosaic plots.
A mosaic plot consists of groups of rectangular tiles. Each tile corresponds to one cell of a contingency table. The tile's area is proportional to the cell's count. Its shape and location in the display are determined by how the mosaic plot is constructed. The following step-wise construction of a p-dimensional mosaic plot is used (Hofmann, 2008): Let X_1, X_2, ..., X_p be p categorical variables. Let c_i be the number of categories of variable X_i (1 ≤ i ≤ p).
1. For i = 0, start with an initial single rectangle r_0, of width w_0 and height h_0. Let i = 1.
2. For each rectangle r (for r_0 take all the observations in the data): Determine the (relative) breakdown for the variable X_i; that is, among the observations corresponding to r, count the number of those observations that fall into each of the categories of X_i. Split the width (height) of rectangle r into c_i pieces, where the widths (heights) are proportional to the breakdown, and keep the height (width) of each the same as r.
3. Increase i by 1.
4. While i ≤ p, repeat steps 2 and 3 for all r.
The construction of a mosaic plot implies a unique labeling of the tiles according to the realized splits. Theoretically, as long as the order of the variables used in the mosaic plot is known, the labels of the tiles can be determined. Practically, however, there exists no good way of labeling the tiles for presentation purposes if the number of categories or variables is large. This problem can be solved using interactive graphics, where querying of the tiles for their labels is possible. See Figure 16 for an example.
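The step-wise construction above can be sketched in a few lines of Python. This is a minimal illustration and not Mondrian's implementation: the function name `mosaic_tiles`, the input format (a list of dictionaries), the choice to start by splitting the width and then alternate with the height, and the omission of gaps between tiles are all assumptions made for this sketch. The counts in the example are hypothetical.

```python
from collections import Counter


def mosaic_tiles(rows, variables, width=1.0, height=1.0):
    """Return {label: (x, y, w, h)} for a classical mosaic plot.

    rows      : list of dicts mapping variable name -> category
    variables : ordered variable names X_1, ..., X_p
    The label of a tile is the tuple of categories chosen at each split.
    """
    # Step 1: a single rectangle r_0 that holds all observations.
    tiles = [((), 0.0, 0.0, width, height, rows)]
    for i, var in enumerate(variables):
        categories = sorted({row[var] for row in rows})
        split_width = (i % 2 == 0)  # assumption: width first, then alternate
        next_tiles = []
        # Step 2: split every current rectangle by the breakdown of X_i.
        for label, x, y, w, h, obs in tiles:
            counts = Counter(o[var] for o in obs)
            total = len(obs)
            offset = 0.0
            for cat in categories:
                share = counts[cat] / total if total else 0.0
                sub = [o for o in obs if o[var] == cat]
                if split_width:
                    tile = (label + (cat,), x + offset * w, y, share * w, h, sub)
                else:
                    tile = (label + (cat,), x, y + offset * h, w, share * h, sub)
                next_tiles.append(tile)
                offset += share
        # Steps 3 and 4: move on to the next variable.
        tiles = next_tiles
    return {label: (x, y, w, h) for label, x, y, w, h, _ in tiles}


# Hypothetical two-item example (not the counts of Table 1).
example = ([{"a": 0, "b": 0}] * 1 + [{"a": 0, "b": 1}] * 1
           + [{"a": 1, "b": 0}] * 2 + [{"a": 1, "b": 1}] * 4)
for label, (x, y, w, h) in mosaic_tiles(example, ["a", "b"]).items():
    print(label, "area =", round(w * h, 3))
```

Because each split divides a rectangle in proportion to the conditional breakdown, the area of every final tile equals the relative frequency of its cell, which is the defining property stated above.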
Table 1 shows a cross-tabulation of 340 German students answering two dichotomous questions on mathematical literacy (see Section 5 for details). This is the classical form of the mosaic plot. There are the variations equal binsize mode (each cell is allocated the same amount of space, and the information is reduced to the binary case of whether a cell is or is not empty), doubledecker plot (instead of alternately splitting the x and y axes, only the x axis is used), fluctuation diagram (each cell is allocated the same amount of space, and the cell with the maximum frequency fills its space completely, thus fixing the scale for the rest of the diagram), and multiple barcharts (each cell is allocated the same amount of space, and only the heights of the bars in the cells are scaled).
Figure 2 shows a classical mosaic plot, fluctuation diagram, and multiple barcharts of the 340 German students answering five dichotomous test items of the mathematical literacy test, with the underlying (true) knowledge states highlighted (see Section 5 for details). Experimentation with the mosaic plot for visually detecting knowledge states from data has shown that the classical form of the mosaic plot is not as appropriate as the other two variants. This is also indicated by Figure 2. Furthermore, since it is easier to compare heights (in multiple barcharts) than areas (in the fluctuation diagram), we confine ourselves to using multiple barcharts in the sequel.
Theoretically, mosaic plots can be applied to any number of categorical variables. Practically, however, space and display resolution are limiting factors. Nevertheless, experimentation with the mosaic plot has shown that the fluctuation diagram and multiple barcharts are applicable for up to 11 to 12 dichotomous items. The classical mosaic plot can be used for up to 13 to 14 dichotomous items, although this variant is less effective than the former two for inspecting knowledge structures in response data.

Knowledge Space Theory
This section starts with a small motivating example (Falmagne, Cosyn, Doignon and Thiéry, 2006), and then reviews some of the basic deterministic and probabilistic concepts of KST. For details, the reader is referred to the aforementioned references.

Example: Elementary Algebra
A natural starting point for a theory of knowledge assessment and training stems from the observation that some pieces of knowledge may imply other pieces of knowledge. In the context of the following example, the mastery of some algebra problem may imply the mastery of other algebra problems. Such implications between pieces of knowledge can be used to design efficient computer-based, adaptive knowledge assessment and training procedures.
As an example, we consider six (dichotomous) problems in elementary algebra.
a. A car travels on the freeway at an average speed of 52 miles per hour. How many miles does it travel in 5 hours and 30 minutes?
b. Using the pencil, mark the point at the coordinates (1, 3).
c. Perform the multiplication 4x^4y^4 · 2x · 5y^2 and simplify your answer as much as possible.
d. Find the greatest common factor of the expressions 14t^6y and 4tu^5y^8. Simplify your answer as much as possible.
f. Write an equation for the line that passes through the point (−5, 3) and is perpendicular to the line 8x + 5y = 11.
A plausible prerequisite (or precedence) diagram of mastery dependencies for the six elementary algebra problems may look like the one in Figure 3. The prerequisite diagram in Figure 3 completely specifies the feasible knowledge states. A respondent can certainly master just Problem a. This does not imply mastery of any other problem. In that case, the knowledge state is {a}. However, if the respondent masters e, for instance, then a, b, and c must also be mastered. This gives the knowledge state {a, b, c, e}. Analyzing the prerequisite diagram in this way, we see that there are exactly 10 knowledge states consistent with the diagram. The set K of all possible knowledge states is called a knowledge structure. These notions are formalized mathematically in the next section.

Basic Deterministic Concepts
A general concept is that of a knowledge structure.
Definition 1. A knowledge structure is a pair (Q, K) in which Q is a non-empty set, and K is a collection of subsets of Q containing at least the empty set ∅ and Q. The set Q is called the domain of the knowledge structure. The elements q ∈ Q and K ∈ K are referred to as (test) items and (knowledge) states, respectively.
As an example knowledge structure consider the one described in Section 3.1, on the domain of the six elementary algebra problems.
Note that this example knowledge structure is closed under union and intersection.
The notions of a knowledge structure and (quasi ordinal) knowledge space are at the level of persons (representing collections of knowledge states of individuals). There is another important notion, that of a surmise relation, which is at the level of items (representing collections of mastery dependencies between items).
Definition 3. Let Q be a non-empty set of items. Any quasi order (that is, any reflexive and transitive binary relation) on Q is called a surmise relation.
Birkhoff's (1937) theorem (see also Doignon and Falmagne, 1999, Theorem 1.49) provides a linkage between quasi ordinal knowledge spaces and surmise relations on an item set.

As an example surmise relation consider the relation which corresponds to the prerequisite diagram of mastery dependencies in
Theorem 1. There exists a one-to-one correspondence between the collection of all quasi ordinal knowledge spaces K on a domain Q, and the collection of all surmise relations ⊑ on Q. Such a correspondence is defined through the two equivalences (p, q ∈ Q, K ⊂ Q):
$$p \sqsubseteq q \iff \bigl(\forall K \in \mathcal{K}:\ q \in K \Rightarrow p \in K\bigr),$$
$$K \in \mathcal{K} \iff \bigl(\forall (p \sqsubseteq q):\ q \in K \Rightarrow p \in K\bigr).$$
This theorem is important from a practical point of view. Though the quasi ordinal knowledge space and surmise relation models are empirically interpreted at two different levels, at the levels of persons and items, respectively, they are connected with each other mathematically, through Birkhoff's theorem.
The 10 knowledge states (of the example knowledge structure) consistent with the prerequisite diagram are obtained by applying the second equivalence of Birkhoff's theorem.
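How states follow from a surmise relation can be illustrated with a brute-force sketch: a subset of the domain is kept as a state exactly when it contains every prerequisite of each of its items. The function name `states_from_surmise` and the encoding of the relation as pairs (p, q), read as "mastering q implies mastering p", are assumptions of this sketch. Because the prerequisite diagram of Figure 3 is not reproduced here, the example uses a five-item chain like the one considered for the PISA items in Section 5.

```python
from itertools import combinations


def states_from_surmise(items, relation):
    """Enumerate the subsets of `items` that are closed under prerequisites.

    relation : set of pairs (p, q), read as 'mastering q implies mastering p'.
    Listing only the covering pairs of the diagram is sufficient, since the
    check is applied to every item contained in the candidate subset.
    """
    states = []
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            candidate = frozenset(subset)
            if all(p in candidate for (p, q) in relation if q in candidate):
                states.append(candidate)
    return states


# Illustrative chain a, b, c, d, e (each item requires all earlier ones).
items = ["a", "b", "c", "d", "e"]
chain = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")}
for state in states_from_surmise(items, chain):
    print(sorted(state))
```

For the chain, the sketch returns six states, from the empty set up to the full domain. Enumerating all 2^|Q| subsets is only practical for small domains such as the ones considered in this paper.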

Basic Probabilistic Concepts
Knowledge states are latent and not directly observable, due to random response errors. A person who is actually unable to solve an item, but does so, makes a lucky guess. On the other hand, a person makes a careless error if he/she fails to solve an item which he/she is capable of mastering. A probabilistic extension of the knowledge structure model covering random response errors is the basic local independence model. We use this probability model for simulating data in this paper (for the simulation algorithm, see Sargin and Ünlü, 2009).
with p(K) > 0 for any K ∈ K, and Σ_{K∈K} p(K) = 1; with constants β_q, η_q ∈ [0, 1[ for each q ∈ Q, respectively called careless error and lucky guess probabilities at q.
To each knowledge state K ∈ K is attached a probability p(K) measuring the likelihood that an examinee is in state K (Point 2). For R ∈ 2^Q and K ∈ K, r(R, K) specifies the conditional probability of response pattern R for an examinee in state K (Point 3). The item responses of an examinee are assumed to be independent given the knowledge state of the examinee. The response error probabilities β_q, η_q (q ∈ Q) are attached to the items and do not vary with the knowledge states (Point 4). Note that K \ R, K ∩ R, R \ K, and Q \ (R ∪ K) stand for the sets of items that are mastered but not solved (careless error), mastered and solved (no careless error), solved but not mastered (lucky guess), and not solved and not mastered (no lucky guess), respectively.
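For reference, the local independence assumption is commonly written as a product over the four item sets just described. The display below restates the standard form of the basic local independence model as given in the KST literature (cf. Doignon and Falmagne, 1999); it is included as an assumption about the formulation used here, not as a quotation from this paper.

```latex
r(R, K) \;=\;
  \prod_{q \in K \setminus R} \beta_q
  \;\cdot \prod_{q \in K \cap R} (1 - \beta_q)
  \;\cdot \prod_{q \in R \setminus K} \eta_q
  \;\cdot \prod_{q \in Q \setminus (R \cup K)} (1 - \eta_q)
```

Each factor corresponds to one item: a careless error, no careless error, a lucky guess, or no lucky guess, in that order.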
The BLIM allows expressing the (manifest) occurrence probabilities of the response patterns by means of the (latent) model parameters.
Corollary 1. Under the BLIM, the occurrence probabilities ρ(R) of response patterns R ∈ 2^Q are parameterized as
$$\rho(R) = \sum_{K \in \mathcal{K}} r(R, K)\, p(K).$$
The parameters of the BLIM are p(K) (K ∈ K) and β_q, η_q (q ∈ Q). The number of independent model parameters is 2|Q| + (|K| − 1). (For a set X, |X| denotes the size of X.) Because the size of K generally tends to be prohibitively large in practice, parameter estimation and model testing based on classical maximum likelihood methodology are not feasible in general (see, e.g., Ünlü, 2006).
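The parameterization of the response pattern probabilities can be made concrete with a short Python sketch. It assumes the standard product form of r(R, K) under local independence (one factor per item, depending on whether the item is mastered and whether it is solved); the function names, the dictionary-based parameter format, and the three-item chain used as an example are hypothetical.

```python
from itertools import combinations


def powerset(items):
    """All subsets of `items` as frozensets (the response patterns 2^Q)."""
    items = list(items)
    return [frozenset(c) for n in range(len(items) + 1)
            for c in combinations(items, n)]


def response_prob(R, K, Q, beta, eta):
    """r(R, K): probability of pattern R given state K, local independence."""
    prob = 1.0
    for q in Q:
        if q in K:   # mastered item: solved unless a careless error occurs
            prob *= (1 - beta[q]) if q in R else beta[q]
        else:        # non-mastered item: solved only by a lucky guess
            prob *= eta[q] if q in R else (1 - eta[q])
    return prob


def pattern_prob(R, Q, p, beta, eta):
    """rho(R): sum over all states K of r(R, K) * p(K)."""
    return sum(response_prob(R, K, Q, beta, eta) * p_K for K, p_K in p.items())


# Hypothetical example: chain on three items, uniform state distribution.
Q = ["a", "b", "c"]
states = [frozenset(), frozenset("a"), frozenset("ab"), frozenset("abc")]
p = {K: 1 / len(states) for K in states}
beta = {q: 0.05 for q in Q}
eta = {q: 0.05 for q in Q}
for R in powerset(Q):
    print(sorted(R), round(pattern_prob(R, Q, p, beta, eta), 4))
```

The probabilities over all 2^|Q| patterns sum to one, and with small error rates most of the mass falls on the patterns that coincide with knowledge states. This is the effect the mosaic plots in the following sections are meant to make visible.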
In KST, knowledge structures are built by qualitative or exploratory approaches; namely, by querying experts (e.g., Düntsch and Gediga, 1996), from postulated psychological assumptions (e.g., Düntsch and Gediga, 1995), or by numerical data analysis procedures (e.g., Sargin and Ünlü, 2009; Schrepp, 2003). In this paper, we consider two KST exploratory data analysis methods in more detail, and demonstrate that these methods can be more effective when combined with the graphical mosaic plot technique. The mosaic plot provides an interesting new perspective on graphically displaying and deriving knowledge structures from data.
We close this section with final remarks.
1. The connection of KST to other theories has been investigated in several publications. For instance, its connection to item response theory has been discussed by Stefanutti (2006) and Ünlü (2006, 2007). Schrepp (2005) and Ünlü (2006, 2011) have outlined KST's relationship to latent class (scaling) analysis (including Guttman scalogram analysis).
2. The connection of KST to log-linear models has not been worked out so far; there exists no log-linear reformulation of KST, although latent class models can also be specified as log-linear models (e.g., Haberman, 1979; Hagenaars, 1993). A formulation of KST in the framework of log-linear models would be useful particularly for the purposes of this paper: there is statistical literature published on the relationship between mosaic plots and log-linear models (e.g., Friendly, 1994, 2000; Theus and Lauer, 1999). However, this is out of the scope of this paper, which gives a first application of the mosaic plot technique to a traditional, latent class modeling based formulation of KST. In KST, the BLIM (a restricted latent class model) is fundamental, in the sense that most of the KST probabilistic models are special cases of the BLIM. Nevertheless, a log-linear reformulation of KST and systematic applications of published literature on mosaic plots and log-linear models in KST are important directions for further research (cf., Section 7).

Data Generating Models
Based on the BLIM (Section 3.3), we simulated data using each of these knowledge structures, endowed with the uniform probability distribution (and later, with a skewed distribution as well). We varied the sample size n, and a single response error rate β_q = η_q = τ (q ∈ Q). The aim is to investigate the extent to which the underlying knowledge states can be visually recovered from mosaic plots of the simulated data.
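A simulation of this kind can be sketched as follows. This is not the simulation algorithm of Sargin and Ünlü (2009), only a minimal illustration of drawing data from a BLIM with a single error rate τ: a state is sampled from the state distribution, and each item response is then flipped independently with probability τ. The function name, the argument names, and the chain structure in the example are hypothetical.

```python
import random


def simulate_blim(states, probs, items, tau, n, seed=None):
    """Draw n response patterns (dicts item -> 0/1) from a BLIM.

    states : list of knowledge states (sets of items)
    probs  : state weights, e.g. equal weights for the uniform case
    tau    : common careless error and lucky guess rate (beta_q = eta_q = tau)
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Latent step: sample the examinee's knowledge state.
        state = rng.choices(states, weights=probs, k=1)[0]
        pattern = {}
        for q in items:
            if q in state:
                # Mastered item: a careless error turns the response into 0.
                pattern[q] = 0 if rng.random() < tau else 1
            else:
                # Non-mastered item: a lucky guess turns the response into 1.
                pattern[q] = 1 if rng.random() < tau else 0
        data.append(pattern)
    return data


# Hypothetical structure on five items; n and tau as in the first setting.
items = ["a", "b", "c", "d", "e"]
states = [frozenset(items[:k]) for k in range(len(items) + 1)]
sample = simulate_blim(states, [1] * len(states), items, tau=0.03, n=1600, seed=1)
print(len(sample), sample[0])
```

The cell counts of the resulting 0/1 data are what a mosaic plot or multiple barcharts view would display. Fixing the seed makes a simulated dataset reproducible.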

Mosaic Plot Representation of the Knowledge Structures
The test items form a |Q|-way dichotomy. Each knowledge state uniquely corresponds to a cell of this cross classification, which subsumes exactly the items mastered by the examinee. The tiles of the mosaic plot corresponding to the knowledge states of the knowledge structures K_1, K_2, and K_3 are shown in Figure 5. § Note that the two tiles in the upper left and lower right corners of the mosaic plot correspond to ∅ and Q, respectively, which always are assumed to be knowledge states.
Figure 5: Mosaic plot tiles corresponding to the knowledge states of the quasi ordinal knowledge spaces K_1 (left), K_2 (middle), and K_3 (right) (highlighted)
§ As described in Section 2, the construction of a mosaic plot implies a unique labeling of the tiles according to the realized splits. In Figure 5 (and in all other plots of this paper), the order of the variables used is a, b, c, d, and e, and the category order is 0 and 1, for left and right or top and bottom on the x-axis or y-axis, respectively. See also Figure 16.
The mosaic plots in Figure 5 represent uniformly distributed knowledge states.A mosaic plot representation for the skewed case is given in Figure 6.

Results
First, we consider a "near to ideal" situation: a large sample size n = 1600 and a small response error rate τ = 0.03. Figure 7 shows the multiple barcharts views of the three datasets (one per knowledge structure) simulated for these settings. ¶
Figure 7: Mosaic plots of the datasets simulated from K_1 (left), K_2 (middle), and K_3 (right) for n = 1600 and τ = 0.03. The underlying knowledge states are highlighted
¶ Although, in the sequel, just one simulated dataset is presented (one per knowledge structure and setting), further simulation trials and the corresponding mosaic plots have yielded similar results. For typographic reasons, however, we omit these in the paper. Moreover, the problem of how to design and to present extensive and representative simulation studies and their results for investigating (interactive) mosaic plots, or (interactive) graphics per se, is an interesting and nontrivial problem on its own. Work of this type needs to be pursued in further research, and is beyond the scope of the present paper.
The three multiple barcharts in Figure 7 give an unambiguous picture. For each of the knowledge structures, the tiles corresponding to the knowledge states (cf., Figure 5) clearly emerge, as compared to the ones that do not correspond to the states. (The knowledge states are highlighted in red.)
Next, we consider a more realistic situation: a small sample size n = 100 and a larger response error rate τ = 0.05. Figure 8 shows the mosaic plots of the datasets simulated for these settings. Apart from one or two exceptions for K_3, the mosaic plots still give a clear picture of the underlying knowledge states, albeit not as unambiguous as the previous ones. As compared to the plots in Figure 7, the tiles that do not correspond to the knowledge states now have larger heights, yet they are still discriminable from the tiles corresponding to the states. The more knowledge states appear, the more difficult it is to spot them. As shown in Figure 9 (n = 1600, τ = 0.05), the mosaic plots become better, and again unambiguous, when sample size increases.
Finally, we consider a "far from ideal" situation: a small sample size n = 100 and a large response error rate τ = 0.20. Figure 10 shows the multiple barcharts views of the datasets simulated for these settings.
Having considered simulation results for the uniformly distributed case, we next present some findings for the skew distributed example (K_2, p) in Figure 6.
Figure 12 displays the results for a "near to ideal" situation on the left, and for a more lifelike situation on the right. In both plots, the underlying knowledge states can be identified by eye. In Figure 13, the response error rate has been increased (τ = 0.20), and this leads to different results for two sample sizes (n = 100 versus n = 3200). The knowledge states are discriminable, apart from some exceptions for the small sample size. The mosaic plot becomes better, and again unambiguous, when sample size increases. The results for the skew distributed knowledge states shown in Figures 12 and 13 are in line with those for the uniformly distributed case described above, as well as with findings obtained from further simulation trials using other skewed distributions (not reported in this paper).
In summary: The more knowledge states were used, the more difficult it was to recover them visually in mosaic displays. Smaller response error rates implied more reliable graphical detection. With increasing sample size the results became even better, both in the uniformly and the skew distributed examples.

Application to Empirical Data
In this section, the mosaic plot is applied for detecting knowledge states in an empirical dataset, which is from the Programme for International Student Assessment (PISA; http://www.pisa.oecd.org/). Static plots do not allow interacting with graphics. User interaction (e.g., Theus and Urbanek, 2008; Unwin, Theus and Hofmann, 2006) is seen to be helpful in exploring these assessment data.

PISA Dataset
We analyze part of the 2003 PISA data which consists of 340 German students answering five questions on mathematical literacy; Q = {a, b, c, d, e}.The dataset (available from the authors) consists of 0/1 scores; an incorrect answer is scored 0, and a correct answer 1.
Table 2 shows a frequency table of the incorrect and correct answers to the five test items of the 340 German students. These items form a Rasch scale. That is, the (dichotomous) one-parameter logistic item response theory model (Fischer and Molenaar, 1995) fits the data very well (goodness-of-fit and item fit). Under this model, the following item difficulties for the five questions are estimated: −2.07 (Item a), −1.22 (Item b), −0.04 (Item c), 1.44 (Item d), and 2.18 (Item e). Hence, the items most likely form a chain, which is considered as the underlying surmise relation in the subsequent analyses (see Figure 14).
The corresponding quasi ordinal knowledge space (Theorem 1) consists of the states ∅, {a}, {a, b}, {a, b, c}, {a, b, c, d}, and Q. The multiple barcharts in Figure 15 give a satisfactory picture. The two tiles in the upper left and lower right corners of the mosaic plot correspond to the knowledge states ∅ and Q. The tiles representing the remaining states reasonably emerge, as compared to the ones that do not correspond to the states.
In Figure 16, the three (most likely) non-state tiles with relatively large heights are queried.
Figure 16: Standard querying gives basic information about the three non-state tiles with relatively large heights. The underlying knowledge states are highlighted
Queried information, especially when utilized in combination with linked plots, may prove valuable in studying specific aspects of a phenomenon. For instance, consider the middle plot of Figure 16. If the data are open format test data such that guessing effects are nearly eliminated by appropriate item formulation, the majority of the cases landing in the queried tile most probably committed a careless error on Item d. This could furthermore explain why, compared to that non-state tile, there are fewer cases contained in the lower right corner tile, which corresponds to the knowledge state Q.
Other useful interactive visualization techniques are, for instance, aggregation and linking (see Figure 17). Interactive queries give the following information. There are 79 examinees solving Item d. Almost all of these cases fall into the three tiles of the aggregated mosaic plot corresponding to the induced states {a, b} (third row, third column), {a, b, c} (third row, fourth column), and {a, b, c, e} (fourth row, fourth column). The absolute / relative frequencies are, in respective order, 16 / 20.78%, 40 / 37.38%, and 12 / 41.38%. Note that these proportions can also be seen from the mosaic plot. The relative frequencies are estimates of the conditional probabilities that Item d is solved given any of the former induced states. The obtained figures support the underlying chain hierarchy among the five test items. Item d is a successor to Item c, and a predecessor to Item e. In other words, graphical considerations of such a type may help in classifying (e.g., new) test items appropriately.
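The quantities read off by aggregation and querying can also be computed directly from the 0/1 data. The sketch below groups examinees by their response pattern on the remaining items (the trace obtained by aggregating over one item) and reports, for each trace, how many examinees solved the aggregated item and what share of the group that is. The function name and the data format are assumptions; the small dataset is hypothetical and is not the PISA data.

```python
from collections import defaultdict


def solve_rate_by_trace(data, item):
    """Return {trace: (solved, total, rate)} after aggregating over `item`.

    data  : list of dicts item -> 0/1
    trace : frozenset of the other items that were solved
    rate  : relative frequency of solving `item` within that trace
    """
    counts = defaultdict(lambda: [0, 0])  # trace -> [solved, total]
    for row in data:
        trace = frozenset(q for q, x in row.items() if x == 1 and q != item)
        counts[trace][0] += row[item]
        counts[trace][1] += 1
    return {trace: (solved, total, solved / total)
            for trace, (solved, total) in counts.items()}


# Hypothetical toy data on three items.
toy = [
    {"a": 1, "b": 1, "d": 1},
    {"a": 1, "b": 1, "d": 0},
    {"a": 1, "b": 0, "d": 0},
]
for trace, (solved, total, rate) in solve_rate_by_trace(toy, "d").items():
    print(sorted(trace), solved, total, f"{rate:.2%}")
```

Each rate is the empirical counterpart of the conditional probability that the aggregated item is solved given the induced state, which is how the percentages above are interpreted.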

Numerical Data Analysis Methods
In this section, two KST numerical data analysis methods for deriving surmise relations on sets of dichotomous items are considered. These methods are shown to be more effective when applied in combination with the mosaic plot.

Inductive Item Tree Analysis
The two inductive item tree analysis (IITA) algorithms considered in this paper are the original algorithm by Schrepp (2003, 2006), and the corrected algorithm by Sargin and Ünlü (2009) and Ünlü and Sargin (2010). In both methods, competing surmise relations are generated, and a fit measure is computed for every relation in order to find the one that fits the data best.
The first step, the same for both algorithms, is the inductive generation of surmise relations. The idea is to eliminate iteratively item pairs that cause intransitivities. The constructed surmise relations constitute the selection set of the IITA procedures, from which the surmise relation that fits the data best is selected.
In a second step, the surmise relation that fits the data best has to be found. The idea is to estimate the expected numbers of counterexamples for each surmise relation of the selection set, where a counterexample is an observed response pattern contradicting a mastery dependency i ⊑ j (mastering j implies mastering i) of the surmise relation ⊑.
In a last step, over all competing surmise relations, the surmise relation with the minimum value for the discrepancy between the observed and expected numbers of counterexamples is chosen.The fit of each surmise relation to the data is quantified based on average sums of the quadratic differences between the observed and expected numbers of counterexamples.
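The fit computation described above can be sketched as follows. The sketch only covers the parts stated in the text: counting observed counterexamples (patterns in which j is solved but i is not) and averaging the squared differences to the expected counts. How the expected counts are estimated differs between the original and the corrected IITA algorithm and is not reproduced here, so `expected` must be supplied by the caller. The function names and the choice to average over all ordered pairs of distinct items are assumptions of this sketch.

```python
def observed_counterexamples(data, items):
    """Count, for each ordered pair (i, j) with i != j, the response patterns
    in which j is solved but i is not. These are the counterexamples to the
    dependency 'mastering j implies mastering i'.
    """
    return {
        (i, j): sum(1 for row in data if row[j] == 1 and row[i] == 0)
        for i in items for j in items if i != j
    }


def diff(observed, expected):
    """Average squared difference between observed and expected counts.

    observed, expected : dicts keyed by ordered item pairs (i, j)
    """
    pairs = list(observed)
    return sum((observed[pair] - expected[pair]) ** 2 for pair in pairs) / len(pairs)


# Hypothetical toy data on two items; the expected counts are made up.
toy = [{"a": 1, "b": 1}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 0}]
obs = observed_counterexamples(toy, ["a", "b"])
print(obs)
print(diff(obs, {("a", "b"): 0, ("b", "a"): 2}))
```

Among the competing surmise relations, the one with the smallest value of this discrepancy would be selected.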

Results
We reconsider the dataset simulated from K_3 for a sample size n = 1600 and a response error rate τ = 0.20, visualized in Figure 11 (right plot) of Section 4.3. Analyses of this dataset using the original and corrected IITA algorithms give the results reported in Figure 18. The original IITA algorithm is clearly outperformed by the corrected one. The former erroneously includes nine non-states (e.g., fourth row, first column), whereas the latter misses only one state (second row, second column). This confirms the results obtained in Sargin and Ünlü (2009). More importantly, by inspecting the mosaic plot, the knowledge state missed by the corrected IITA version could be detected visually and be added subsequently, giving the true knowledge structure underlying the dataset. The mosaic plot clearly substantiates the implausibility of the knowledge structure returned by the original IITA algorithm (e.g., first two non-state tiles in fourth row).
Figure 19 shows the knowledge structure resulting from an original IITA analysis of a dataset simulated from K 1 for a sample size n = 400 and a response error rate τ = 0.03.Note that this dataset has not been used before; however, see Figure 7 (left plot).
Figure 19: Mosaic plot of the dataset simulated from K_1 for n = 400 and τ = 0.03 (cf., Figure 7). The knowledge states obtained for this dataset under the original IITA algorithm are highlighted
The multiple barcharts in Figure 19 give an unambiguous picture. Indeed, they provide valuable information. The tile corresponding to the knowledge state missed by the original IITA algorithm (second row, second column) clearly emerges, as compared to the non-state tiles. This knowledge state could be added subsequently, giving the true knowledge structure underlying the simulated dataset. The corrected IITA algorithm applied to this dataset yields K_1.
Next, we reconsider the PISA dataset (Section 5), visualized in Figure 15. Analyzing this empirical dataset using the original and corrected IITA algorithms gives the results reported in Figure 20. The original and corrected IITA algorithms both detect all the underlying knowledge states. The original IITA algorithm additionally includes seven non-states; for instance, the non-state represented by the tile in the first row and third column. Using the mosaic plot, this tile would certainly be discarded. Note that the tiles in the upper left and lower right corners correspond to the knowledge states ∅ and Q. The corrected IITA algorithm, on the other hand, only includes one non-state. Yet the tile representing this non-state (third row, last column) has a relatively large height (cf., also the remarks to Figure 16). The inclusion of this non-state could probably be avoided using the mosaic plot. The empty set and the domain are always states. The heights of the remaining state tiles are clearly larger than the height of that non-state tile.

A Cautionary Note on the Scope of this Paper
As mentioned in Section 1, the mosaic plot and the IITA data analysis methods work together, in the sense that they are not competing approaches, but are supportive of each other. The IITA algorithms have been shown to be more effective when applied in combination with the mosaic plot. The results gained from visually inspecting the mosaic plots have been consulted as a reference against which to compare the findings obtained from the IITA procedures. Hence, the question of whether one approach, numerical or graphical, is better than the other is not relevant, and is even misleading. Nor can the aim of this paper be discovering or investigating general theoretical results. Graphics are intuitive and intrinsically empirical, and this in particular is why they effectively supplement theoretical approaches. Taking that into account, it is not surprising that both approaches (mosaic plot and IITA algorithms) lead to similar results. Nonetheless, one can benefit from using both approaches in combination.

Conclusion
In this paper, we have mentioned but not exploited the important connection between mosaic plots and log-linear models (see Remark 2 in Section 3.3). The papers published on this connection propose extensions of the mosaic plot to highlight patterns of deviations from log-linear models for categorical data. For instance, color and shading of the tiles are introduced to represent sign and magnitude of residuals from a specified log-linear model. Work of this type is out of the scope of the present paper and needs to be pursued in future research (after having developed a log-linear reformulation of KST). Such a refined approach incorporating residuals could be used for the stepwise graphical building of knowledge structures, and for visually judging the fit or adequacy of a KST log-linear model.
As a first application of mosaic plots in a traditional, latent class modeling based formulation of KST, we have investigated the scope of the mosaic display for visually inspecting latent knowledge states in discrete multivariate response data in KST. This connection has not been made before and is a novel contribution. We have seen that this graphing method reveals the underlying knowledge structure very well for smaller response error rates. The results become even better with increasing sample size. The more knowledge states occur, the more difficult it is to recover them. This applies to both uniform and skewed probabilistic knowledge structures. The mosaic plot has also been satisfactorily applied to the PISA dataset. Since numerical data analysis procedures in KST may produce implausible findings, a graphical approach such as the present one can provide valuable information with which to compare the results obtained from numerical methods (e.g., Figures 19 and 20). Graphics, especially when implemented in interactive software, are particularly valuable for exploratory work (e.g., Theus and Urbanek, 2008); see Figures 16 and 17. They complement numerical or confirmatory statistical approaches; they do not compete with or replace them.
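The simulation setting summarized above can be illustrated with a minimal Python sketch. It assumes that a knowledge state is drawn from the probabilistic knowledge structure and that each item response is then flipped independently with the same probability τ (acting as both careless error and lucky guess rate). The small structure, its distribution, and the function name simulate_patterns are hypothetical and are not taken from the paper; the resulting counts correspond to the tile sizes a mosaic plot would display.

```python
import random
from collections import Counter


def simulate_patterns(states, probs, n, tau, domain, seed=0):
    """Draw n response patterns: sample a state, then flip each
    item response independently with probability tau (assumption)."""
    rng = random.Random(seed)
    patterns = []
    for _ in range(n):
        state = rng.choices(states, weights=probs, k=1)[0]
        # An item is solved if it is mastered XOR a response error occurs.
        pattern = frozenset(
            item for item in domain
            if (item in state) != (rng.random() < tau)
        )
        patterns.append(pattern)
    return patterns


# Hypothetical chain-shaped structure on three items (illustration only).
domain = ("a", "b", "c")
states = [frozenset(), frozenset("a"), frozenset("ab"), frozenset("abc")]
probs = [0.1, 0.2, 0.3, 0.4]  # illustrative skewed distribution

patterns = simulate_patterns(states, probs, n=1600, tau=0.05, domain=domain)
counts = Counter(patterns)  # one count per response pattern (mosaic tile)

for pattern, count in counts.most_common():
    label = "".join(sorted(pattern)) or "(empty)"
    kind = "state" if pattern in states else "non-state"
    print(f"{label:>7}  {count:5d}  {kind}")
```

With a small τ, the patterns that are knowledge states should dominate the counts, which is the situation in which the tiles of the states stand out visually.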
Future research on graphics in KST could include visualizing combinatorial properties of a knowledge structure. The simplest such property is closedness under union. A more advanced property is wellgradedness (item-wise gradual learning), which leads to the pedagogically important concept of a learning space. Building substructures of a knowledge structure on subsets of the item set is also important. Knowledge structures may be too large to work with conveniently, or only a specific part of the domain may be needed. This can be realized graphically, for instance, through interactive mosaic plots. Since combinatorial structure, not data, is to be visualized, the mosaic plot representation of a knowledge structure introduced in Figure 5 is adequate. Interactively aggregating over items in such a mosaic plot representation makes it easy to visualize substructures. In particular, the aggregated mosaic plot then shows how many states of the parent structure induce the same traces on the subdomain. Selecting a trace state of the substructure, in turn, allows visually displaying the corresponding knowledge states of the parent structure.
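The two combinatorial notions mentioned here, closedness under union and traces on a subdomain, can be made concrete with a short Python sketch. The structure on three items and the function names is_union_closed and traces are hypothetical and chosen only for illustration; the trace of a state K on a subdomain Q' is taken to be the intersection K ∩ Q'.

```python
from collections import defaultdict
from itertools import combinations


def is_union_closed(states):
    """Check that the union of any two states is again a state."""
    family = set(states)
    return all((s | t) in family for s, t in combinations(family, 2))


def traces(states, subdomain):
    """Group the parent states by their trace on the subdomain."""
    sub = frozenset(subdomain)
    groups = defaultdict(list)
    for state in states:
        groups[state & sub].append(state)
    return dict(groups)


# Hypothetical knowledge structure on Q = {a, b, c} (illustration only).
states = [frozenset(s) for s in ("", "a", "b", "ab", "abc")]

print(is_union_closed(states))

# Aggregating over item c: how many parent states share each trace on {a, b}?
for trace, parents in traces(states, "ab").items():
    label = "".join(sorted(trace)) or "(empty)"
    print(label, len(parents))
```

The number of parent states per trace is what an aggregated mosaic plot representation would show as tile size, and the list stored for each trace is what selecting a trace state would highlight in the parent structure.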
To conclude, the mosaic plot represents a promising novel perspective on empirically evaluating knowledge structure models in KST.

Figure 1 illustrates the step-wise construction of the corresponding two-dimensional mosaic plot.

Figure 2: Classical mosaic plot (left), fluctuation diagram (middle), and multiple barcharts (right) for the five test items of the mathematical literacy test dataset. Underlying knowledge states are highlighted. Details on these data and concepts are discussed in Section 5.

Figure 3: Prerequisite diagram of mastery dependencies for the six elementary algebra problems a-f. Reflexivity and transitivity are assumed to hold and are not explicitly depicted. The mastery of Problem b is, for instance, a prerequisite for the mastery of Problem e. That is, the mastery of Problem e implies that of Problem b.
The relation depicted in Figure 3 is the union of the reflexive pairs {(x, x) : x ∈ {a, b, c, d, e, f}} with the set {(a, c), (a, d), (a, e), (a, f), (b, d), (b, e), (b, f), (c, d), (c, e), (c, f), (d, f), (e, f)}.
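To illustrate how such a relation determines knowledge states, the following Python sketch enumerates all subsets of the six problems that respect the pairs listed for Figure 3: a subset counts as a state if, whenever it contains a problem y, it also contains every problem x with (x, y) in the relation. This is the usual correspondence between surmise relations and quasi ordinal knowledge spaces; the function name is_state and the brute-force enumeration are illustrative choices, not the authors' implementation.

```python
from itertools import combinations

items = "abcdef"

# Non-reflexive pairs (x, y): mastering y implies mastering x.
pairs = {
    ("a", "c"), ("a", "d"), ("a", "e"), ("a", "f"),
    ("b", "d"), ("b", "e"), ("b", "f"),
    ("c", "d"), ("c", "e"), ("c", "f"),
    ("d", "f"), ("e", "f"),
}


def is_state(subset, pairs):
    """A subset is a state if it contains all prerequisites of its members."""
    return all(x in subset for (x, y) in pairs if y in subset)


# Brute-force enumeration over all 2^6 subsets of the domain.
states = [
    frozenset(combo)
    for size in range(len(items) + 1)
    for combo in combinations(items, size)
    if is_state(set(combo), pairs)
]

for state in states:
    print("".join(sorted(state)) or "(empty)")
```

For a domain of this size the exhaustive search is instantaneous; for larger item sets a dedicated construction would be preferable.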
On a domain Q = {a, b, c, d, e} of five dichotomous test items, we consider three (antisymmetric) surmise relations, indexed 1, 2, and 3 and depicted as Hasse diagrams in Figure 4. They consist of 15, 11, and 8 item pairs, respectively.

Figure 6: A skewed distribution on the quasi ordinal knowledge space K_2, where the states are listed from left to right in the same order as given in Section 4.1 and with probabilities p(K) on the y-axis (left; one-dimensional alignment). A mosaic plot representation of the skewed probabilistic knowledge structure is shown as well (right; two-dimensional alignment). The knowledge states and their proportions in the population are highlighted. The two-dimensional alignment of the mosaic plot reveals additional overall structural information.

Figure 12: Mosaic plots of the datasets simulated from K_2 equipped with the skewed distribution shown in Figure 6, for n = 1600 and τ = 0.03 (left) and for n = 250 and τ = 0.05 (right). The underlying knowledge states are highlighted.

Figure 13: Mosaic plots of the datasets simulated from K_2 equipped with the skewed distribution shown in Figure 6, for n = 100 and τ = 0.20 (left) and for n = 3200 and τ = 0.20 (right). The underlying knowledge states are highlighted.

Figure 14: Hasse diagram of the Rasch scale of five assessment items on the domain Q = {a, b, c, d, e}. The items are sorted from bottom to top according to increasing difficulty. This scale is assumed to underlie the PISA dataset.

Figure 15: Mosaic plot of the PISA dataset. The underlying knowledge states are highlighted.
Figure 17 shows a mosaic plot for the Items a, b, c, and e of the PISA dataset, aggregating over Item d, and a barchart for Item d. The examinees solving that particular item are selected and highlighted.

Figure 17: Aggregation and linking. Mosaic plot (top) for the Items a, b, c, and e of the PISA dataset, aggregating over Item d. Barchart (bottom) for Item d. Examinees solving Item d are selected and highlighted.

Figure 18: Mosaic plot of the dataset simulated from K_3 for n = 1600 and τ = 0.20 (cf. Figure 11). The knowledge states obtained for this dataset under the original (left) and corrected (right) IITA algorithms are highlighted.

Figure 20: Mosaic plot of the PISA dataset (cf. Figure 15). The knowledge states obtained for this dataset under the original (left) and corrected (right) IITA algorithms are highlighted.

Table 2: Frequency table of the incorrect and correct answers to the five PISA test items.