Textbooks on Differential Calculus in Eighteenth Century Europe: A Comparative Stylistic Analysis

Comparative mathematical textbook analysis aims to determine differences among countries in the development and transmission of mathematics. Textual statistics, for its part, provides a means to quantify a text by applying multivariate statistical techniques. So far this statistical approach has not been applied to comparative mathematical textbook analysis. The object of this paper is to quantify and compare the style of a number of textbooks on differential calculus written in 18th-century Europe. To that end two multivariate statistical techniques have been applied: 1) simple correspondence analysis and 2) hierarchical clustering analysis. The results of both analyses help to detect some interesting associations among the analysed textbooks.


Introduction
Schubring supports comparative textbook analysis as a means to determine the differences among countries with regard to style, meaning and epistemology in mathematics. To that end he claims that the educational system must be taken into account, because textbooks and their transmission depend on its constraints, values and styles. Following Schubring's views I wrote my PhD thesis, which aimed at analysing the mathematical development of calculus through a number of textbooks on differential calculus written in 18th-century Europe, namely in France, Germany, Italy and Great Britain (see Blanco, 2004). This paper deals with the analysis and comparison of the style of these textbooks.
To that purpose I worked with the following textbooks on differential calculus:
• Analyse des Infiniment Petits (1696) by the Marquis de L'Hôpital.
• The Doctrine and Application of Fluxions (1750) by Thomas Simpson.
In some cases only the parts concerning differential calculus have been analysed, rather than the whole textbook. Most of these works receive barely a comment, and are often ignored altogether, in the traditional histories of calculus. They have been overshadowed by the major works of their time and, consequently, have not been studied in detail so far.
Textual statistics makes it possible to quantify a text by applying multivariate statistical techniques that create associations from the computation of word frequencies (Greenacre, 1993; Lebart and Salem, 1994). In that sense it is worth mentioning the work of Ginebra and Cabos (1998) and that of Riba and Ginebra (2003), who apply multivariate statistical analysis to compare the style of the different parts of the chivalric novel Tirant lo Blanc. So far textual statistics has not been applied to the comparative analysis of historical mathematical textbooks. The aim of this paper is to analyse and compare the style of the studied textbooks. Multivariate statistics here provides the tools to detect any association among these textbooks.

Methodology
Following an approach similar to that of Ginebra and Cabos (1998) and Riba and Ginebra (2003), I took the words used by the authors to introduce a section as a means to describe the style of a textbook numerically. I grouped them into seven categories: 1) corollary; 2) example; 3) problem; 4) theorem; [...] (... definition, solution, proof). As far as the analysed textbooks are concerned, there is no digitized version available online. Hence, the collection of words had to be done by hand from manuscript copies. Once the count was complete, the words and their frequencies were arranged in a contingency table (Table 1), where columns and rows represent "authors" and "words", respectively. Here "Riccati" stands for Riccati-Saladini (1765-1767).
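As a minimal sketch of the tabulation step described above, the following Python fragment arranges hand-recorded tallies into a contingency table with "words" as rows and "authors" as columns. The author and category names are taken from the paper, but the counts, the `observations` list and the helper `contingency_table` are invented for illustration; they are not the data of Table 1.

```python
from collections import Counter

# Hypothetical tallies: one (author, word category) pair per section heading.
# These counts are illustrative only and do not reproduce Table 1.
observations = [
    ("Karsten", "corollary"), ("Karsten", "corollary"), ("Karsten", "theorem"),
    ("Kästner", "example"), ("Kästner", "problem"), ("Kästner", "example"),
    ("Maclaurin", "figure"), ("Maclaurin", "figure"), ("Maclaurin", "example"),
]


def contingency_table(pairs):
    """Return (words, authors, table) with words as rows and authors as columns."""
    counts = Counter(pairs)
    authors = sorted({author for author, _ in pairs})
    words = sorted({word for _, word in pairs})
    # Counter returns 0 for (author, word) pairs that were never observed.
    table = [[counts[(author, word)] for author in authors] for word in words]
    return words, authors, table


words, authors, table = contingency_table(observations)
print("authors:", authors)
for word, row in zip(words, table):
    print(f"{word:<10}", row)
```

The grand total of the table plays the role of k in the analysis that follows.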
In order to discover any association among the analysed textbooks and the frequencies of the words listed above I applied two multivariate techniques with the help of the statistical software MINITAB: 1) simple correspondence analysis; 2) hierarchical clustering analysis.

Simple correspondence analysis
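Simple correspondence analysis is an ordination and dimension-reduction technique, used to transform numerical information into graphical form. It deals with the representation of the rows and columns of a contingency table on a biplot, that is, a two-dimensional map. The principal axes or components find an optimal orientation in each cloud of points. The contingency table is a matrix of dimension $n \times p$ whose elements are frequencies ($n = 7$, $p = 14$ for Table 1). The corresponding relative frequencies, $f_{ij}$, $i = 1, \dots, n$; $j = 1, \dots, p$, are obtained by dividing the frequencies by the total $k$ ($k = 3607$ for Table 1). The row $i$ profile is

\[ \left\{ \frac{f_{ij}}{f_{i\cdot}},\ j = 1, \dots, p \right\}, \qquad f_{i\cdot} = \sum_{j=1}^{p} f_{ij}, \]

where the sum $f_{i\cdot}$ represents the distribution of frequencies in individual rows (marginal frequencies). Likewise, the column $j$ profile is

\[ \left\{ \frac{f_{ij}}{f_{\cdot j}},\ i = 1, \dots, n \right\}, \qquad f_{\cdot j} = \sum_{i=1}^{n} f_{ij}, \]

where the sum $f_{\cdot j}$ represents the distribution of frequencies in individual columns (marginal frequencies). These profiles define a cloud of $n$ row points in $\mathbb{R}^p$, or of $p$ column points in $\mathbb{R}^n$. How should the distance between two row (or column) points be interpreted? The distance between two row points $i$, $i'$ is defined as:

\[ d^2(i, i') = \sum_{j=1}^{p} \frac{1}{f_{\cdot j}} \left( \frac{f_{ij}}{f_{i\cdot}} - \frac{f_{i'j}}{f_{i'\cdot}} \right)^2 \]

and the distance between two column points $j$, $j'$ is:

\[ d^2(j, j') = \sum_{i=1}^{n} \frac{1}{f_{i\cdot}} \left( \frac{f_{ij}}{f_{\cdot j}} - \frac{f_{ij'}}{f_{\cdot j'}} \right)^2 \]

This distance, called chi-square ($\chi^2$) distance, resembles the usual Euclidean distance, except that each squared difference is weighted by the inverse of the corresponding marginal frequency.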
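The profiles and the chi-square distance can be illustrated with a short Python sketch. It assumes the standard definitions of correspondence analysis (profiles as rows divided by their marginal frequency, squared differences weighted by the inverse column masses); the table `toy` and the function names are hypothetical and unrelated to Table 1.

```python
def profiles_and_masses(table):
    """Return row profiles, row masses f_i. and column masses f_.j of a count table."""
    k = sum(sum(row) for row in table)                  # grand total
    f = [[count / k for count in row] for row in table]  # relative frequencies f_ij
    row_mass = [sum(row) for row in f]                  # f_i.
    col_mass = [sum(col) for col in zip(*f)]            # f_.j
    profiles = [[fij / fi for fij in row] for row, fi in zip(f, row_mass)]
    return profiles, row_mass, col_mass


def chi2_distance_sq(profile_a, profile_b, col_mass):
    """Squared chi-square distance between two row profiles."""
    return sum((a - b) ** 2 / m for a, b, m in zip(profile_a, profile_b, col_mass))


# Illustrative counts: rows 0 and 1 are proportional, row 2 is not.
toy = [[10, 20, 30],
       [20, 40, 60],
       [30, 10, 20]]

profiles, row_mass, col_mass = profiles_and_masses(toy)
print(chi2_distance_sq(profiles[0], profiles[1], col_mass))  # proportional rows
print(chi2_distance_sq(profiles[0], profiles[2], col_mass))  # different profiles
```

Two rows with proportional counts have the same profile, so their distance is zero (up to rounding) whatever their totals; this is why the analysis compares the shape of each distribution rather than its size.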
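Inertia is associated with the chi-square value of the contingency table, which in turn accounts for the variation in the table, that is, the dispersion of its points. Inertia is the weighted sum, over the points of a cloud, of each point's mass (i.e., its marginal frequency) multiplied by the squared chi-square distance $d^2$ between the point and the origin $O$, the centre of gravity of the cloud:

\[ I = \sum_{i=1}^{n} f_{i\cdot}\, d^2(i, O) \quad \text{(rows)}, \qquad I = \sum_{j=1}^{p} f_{\cdot j}\, d^2(j, O) \quad \text{(columns)}, \]

where $f_{i\cdot} = \sum_j f_{ij}$ and $f_{\cdot j} = \sum_i f_{ij}$ are the marginal relative frequencies. In both cases the development of the sum yields:

\[ I = \sum_{i=1}^{n} \sum_{j=1}^{p} \frac{(f_{ij} - f_{i\cdot} f_{\cdot j})^2}{f_{i\cdot} f_{\cdot j}} = \frac{\chi^2}{k}. \tag{3.9} \]

In order to detect the principal axes, those with maximum inertia in any cloud of points, we introduce the following matrices:
- the matrix of relative frequencies: $F = (f_{ij})$, of dimension $n \times p$;
- the diagonal matrix of row distributions: $D_n = \mathrm{diag}(f_{1\cdot}, \dots, f_{n\cdot})$;
- the diagonal matrix of column distributions: $D_p = \mathrm{diag}(f_{\cdot 1}, \dots, f_{\cdot p})$.

Therefore, $D_n^{-1} F$ and $D_p^{-1} F^T$ are the matrices of row profiles and of column profiles, respectively. In $\mathbb{R}^p$ the matrix $D_n^{-1} F$ provides the coordinates of row points, the elements of the matrix $D_n$ are the row masses and $D_p^{-1}$ represents the distance metric. The maximum inertia in $\mathbb{R}^p$ is determined as follows:

\[ \max_{u}\ u^T D_p^{-1} F^T D_n^{-1} F D_p^{-1} u \]

under the normalization restriction $u^T D_p^{-1} u = 1$. Hence $u$ is a normalized eigenvector of the matrix

\[ S = F^T D_n^{-1} F D_p^{-1}. \]

Its eigenvalue is denoted $\lambda$:

\[ S u = \lambda u. \]

The expression $\psi = (D_n^{-1} F) D_p^{-1} u$ displays the coordinates of row points in the factorial axes:

\[ \psi_{\alpha i} = \sum_{j=1}^{p} \frac{f_{ij}}{f_{i\cdot} f_{\cdot j}}\, u_{\alpha j}, \qquad i = 1, \dots, n. \]

Let us consider now the whole set of eigenvalues of the matrix $S$, leaving aside the trivial eigenvalue 1 associated with the centre of gravity:

\[ \lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_N, \]

where $N$ stands for the minimum of $n - 1$ and $p - 1$. It can be proved that the sum of the eigenvalues is:

\[ \sum_{\alpha=1}^{N} \lambda_\alpha = \sum_{i=1}^{n} \sum_{j=1}^{p} \frac{(f_{ij} - f_{i\cdot} f_{\cdot j})^2}{f_{i\cdot} f_{\cdot j}}, \]

that is to say, the inertia, as in (3.9). So far we have worked with the cloud of row points. The maximum inertia for the cloud of column points, in $\mathbb{R}^n$, can be determined similarly. In this case the matrix $D_p^{-1} F^T$ provides the coordinates of column points, the matrix $D_p$ contains the column masses and the matrix $D_n^{-1}$ defines the distance metric; the coordinate of column $j$ on the axis $\alpha$ is denoted $\varphi_{\alpha j}$.

Let us now define absolute and relative contributions. Absolute contributions indicate which points have been the most influential in determining the orientation of the principal axes, thus making the axes easier to interpret. The total contribution of all the rows to the factorial axis $\alpha$ equals the eigenvalue $\lambda_\alpha$:

\[ \sum_{i=1}^{n} f_{i\cdot}\, \psi_{\alpha i}^2 = \lambda_\alpha. \]

The term $f_{i\cdot}\, \psi_{\alpha i}^2$ is called the (absolute) row $i$ contribution to the inertia of the factorial axis $\alpha$. Then the relative row $i$ contribution to the axis $\alpha$ is:

\[ \frac{f_{i\cdot}\, \psi_{\alpha i}^2}{\lambda_\alpha}. \]

The (absolute) column $j$ contribution to the inertia of the factorial axis $\alpha$ and the relative column contributions can be calculated in a similar way.

Finally, we can compute the relative contributions of a specific axis to a row or a column, which measure the quality of the resulting representation:

\[ \text{Relative contribution of the } \alpha\text{-axis to row } i = C_\alpha(i) = \frac{\psi_{\alpha i}^2}{d^2(i, O)}, \]

\[ \text{Relative contribution of the } \alpha\text{-axis to column } j = C_\alpha(j) = \frac{\varphi_{\alpha j}^2}{d^2(j, O)}. \]

These contributions can be understood geometrically as the squared cosine of the angle between the $\alpha$-axis and the segment joining the origin to the point, that is, the share of the squared distance between the point and the origin that is recovered by its projection onto the axis. This is why they are also known as "squared cosines". The quality of the representation is computed from the sum of the relative contributions (or squared cosines) of the first $m$ axes to every row or column (here, $m = 2$). Therefore, quality provides information about the points which are best explained by the axes or by the subspace formed from the principal plane. It helps to interpret each profile's position, which means the ratio of appearances of a certain variable in a certain unit or point.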
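The quantities defined in this section can be computed with a few lines of NumPy. The sketch below is not the MINITAB procedure used in this study: it assumes the common formulation through the singular value decomposition of the standardized residuals, in which centring the table removes the trivial eigenvalue. The table `toy` and the function name `correspondence_analysis` are hypothetical.

```python
import numpy as np


def correspondence_analysis(table):
    """Eigenvalues, principal row coordinates, relative contributions, squared cosines."""
    N = np.asarray(table, dtype=float)
    F = N / N.sum()                          # relative frequencies f_ij
    r = F.sum(axis=1)                        # row masses f_i.
    c = F.sum(axis=0)                        # column masses f_.j
    # Standardized residuals; centring on r c^T discards the trivial eigenvalue 1.
    S = (F - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    n_axes = min(N.shape) - 1                # N = min(n - 1, p - 1)
    U, sv = U[:, :n_axes], sv[:n_axes]
    eig = sv ** 2                            # eigenvalues lambda_alpha
    psi = (U * sv) / np.sqrt(r)[:, None]     # principal coordinates of the rows
    contrib = r[:, None] * psi ** 2 / eig    # relative contribution of row i to axis alpha
    d2 = (psi ** 2).sum(axis=1)              # squared chi-square distance to the centroid
    cos2 = psi ** 2 / d2[:, None]            # squared cosines
    return eig, psi, contrib, cos2


# Illustrative counts only (3 rows, 4 columns).
toy = np.array([[12, 5, 3, 8],
                [4, 10, 6, 2],
                [7, 3, 14, 9]])

eig, psi, contrib, cos2 = correspondence_analysis(toy)
quality = cos2[:, :1].sum(axis=1)            # quality on the first m = 1 axis
print("eigenvalues:", eig)
print("share of inertia per axis:", eig / eig.sum())
print("quality on axis 1:", quality)
```

The sketch assumes that no retained eigenvalue is zero and that no row profile coincides with the average profile, since either case would lead to a division by zero.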
In the contingency table created for the purpose of this study, the units "authors" are represented by row points whereas the variables "words" are represented by column points (or vectors). The chi-square value related to the contingency table is 1904.124 (with 78 degrees of freedom). This value is highly significant, which confirms the association between rows and columns: somewhere in the contingency table there are significant differences among the profiles. Let us survey the table of correspondence analysis (Table 2) and that of relative inertia (Table 3).
Table 2 shows the corresponding correspondence analysis, related to total inertia. Total inertia (here, 0.527) accounts for the variation in the table. The first and second components of the chosen plane explain 83.60% of the inertia (in bold type in Table 2). Therefore, the information about the relative positions of the "authors" profiles, that is, the representation of distances, is quite accurate.
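The figures quoted above can be cross-checked with a short calculation, assuming the standard relations: degrees of freedom equal to (n − 1)(p − 1) and total inertia equal to the chi-square value divided by the grand total k. The variable names are illustrative; the numbers are those given in the text.

```python
chi_square = 1904.124    # chi-square value of the contingency table (from the text)
k = 3607                 # grand total of Table 1 (from the text)
n_rows, n_cols = 7, 14   # seven word categories, fourteen authors

degrees_of_freedom = (n_rows - 1) * (n_cols - 1)
total_inertia = chi_square / k

print(degrees_of_freedom)         # 78
print(round(total_inertia, 4))    # 0.5279
```

The quotient agrees with the reported total inertia of 0.527 once the third decimal is truncated.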
Table 3, in turn, displays the relative inertia of every author with respect to every word. Karsten makes the largest contribution to inertia (in bold type in Table 3).
I used an asymmetric map in order to display the results of the correspondence analysis graphically. Here the vertices are the seven categories of "words" (rows), which work as a reference system, assuming standard coordinates. The use of an asymmetric map is more convenient here because the closer a point is to a vertex, the higher its profile is with respect to that category. By contrast, in a symmetric map the closeness of a row and a column does not imply any association in the data, since the map is the overlay of two separate maps. The coordinates of "authors" are principal, that is, they refer to the principal axes. A closer look at contributions reveals the words which play a major part in the formation of an axis. Contributions quantify the attraction of the points towards the axes. That is to say, they show the most influential words in determining the principal orientation of the axes. Table 4 displays the quality, inertia, contributions and squared cosines of every "word", as well as their coordinates with respect to each component (axis). In Table 4 bold type is used to denote the most significant contributions.

Table 5 displays the quality, inertia, contributions and squared cosines of every "author", as well as their coordinates with respect to each component (axis). Quality (in bold type in Table 5) reveals the accuracy of the map in displaying each author's profile. Hence, higher quality means a more accurate representation. From the coordinates shown in Table 4 and Table 5, the plots in Figure 1 are obtained. These plots display the associations that can be detected. In them the first two components or axes are plotted against one another.
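The property that makes the asymmetric map readable can be sketched numerically: each point in principal coordinates is the average of the vertices in standard coordinates, weighted by that point's profile. The fragment below assumes the usual SVD formulation of correspondence analysis and, for convenience only, places the profile points in the rows and the vertices in the columns of a hypothetical table `toy`; the function name is likewise illustrative.

```python
import numpy as np


def asymmetric_map(table, n_axes=2):
    """Principal coordinates of the rows, standard coordinates of the columns."""
    N = np.asarray(table, dtype=float)
    F = N / N.sum()
    r, c = F.sum(axis=1), F.sum(axis=0)                  # row and column masses
    S = (F - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Vertices: columns in standard coordinates (unit inertia along each axis).
    vertices = Vt.T[:, :n_axes] / np.sqrt(c)[:, None]
    # Points: rows in principal coordinates (inertia lambda_alpha along each axis).
    points = (U[:, :n_axes] * sv[:n_axes]) / np.sqrt(r)[:, None]
    profiles = F / r[:, None]
    return points, vertices, profiles


# Illustrative counts only: four profile points, three vertices.
toy = [[20, 5, 5],
       [4, 18, 8],
       [6, 7, 17],
       [10, 10, 10]]

points, vertices, profiles = asymmetric_map(toy)
# Each point is the profile-weighted average of the vertices.
print(np.allclose(points, profiles @ vertices))
```

A point whose profile is concentrated on one category is therefore pulled towards the corresponding vertex, which is the reading rule applied to Figure 1.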
The group of German authors lies to the right (Figure 1a), whereas the rest lie to the left. Along the x-axis (Figure 1b) the greatest difference lies between the word figure to the left (with a contribution of 47.4%) and, to the right, the word corollary (with a contribution of 28.6%) and other words (with a contribution of 15.2%). Along the y-axis (Figure 1b) the greatest difference lies between the words example (contribution: 37.5%) and problem (contribution: 23.5%), at the top, and the words figure (contribution: 17.5%) and corollary (contribution: 14.9%), at the bottom.
Most of the points are well represented by this map (see the column for quality in Table 5), except Ditton (quality: 0.003) and Wolff (quality: 0.323), whose representation is not accurate enough. Hence a third axis or component should be taken into account for them. Along the x-axis (Figure 1c) the group of "authors" to the right is related to corollary and other words, and the one to the left to figure. When it comes to the y-axis, the difference between Karsten and Kästner is the greatest. Karsten is associated with corollary, whereas Kästner is associated with example and problem. The group of Karsten (quality: 0.911), Kästner (quality: 0.966) and Tempelhoff (quality: 0.923) has an accurate representation. To the left, the group of Saladini (quality: 0.806), Bézout (quality: 0.843), Maclaurin (quality: 0.939) and Lacroix (quality: 0.918) is also represented with high quality. Reyneau could be included in this group but with lower quality (quality: 0.635). This group is associated with the word figure. There is still another group with an accurate representation, which includes L'Hôpital (quality: 0.846), Agnesi (quality: 0.904), Riccati (quality: 0.946) and, to a lesser degree of accuracy, Simpson (quality: 0.587).

[...] calculus that Simpson read. Another interesting association is that of Lacroix, Bézout, Saladini and Maclaurin (and, to a lesser extent, Reyneau), whose works show a tendency towards the use of figures. To conclude, combined multivariate statistical techniques provide a new approach to analysing and comparing historical mathematical textbooks.

Table 1: Contingency table containing word frequencies, where columns represent the fourteen "authors" and rows represent the seven categories of "words".

Table 2: Table of correspondence analysis, displaying the proportion of variation (or inertia).

Table 3: Table of relative inertia of each "author" to each "word".