'CAinterprTools': R package for visual aid to Correspondence Analysis interpretation
Some of the features of the R script for CA (described on this site) have been turned into an R package. In its current version (0.7), the package is available from my GitHub repository, and can be installed from that repository straight into R via the 'devtools' package (see the instructions at the bottom of this page). Besides implementing some of the features of my CA script, the package allows you to calculate the significance of the CA dimensions and of the total inertia by means of a permutation test. The package also comes with a dataset ('greenacre_data') after Greenacre 2007 (p. 90, exhibit 12.1).
The package is also described in an article of mine published in Elsevier's SoftwareX journal. If you want to cite this package, you may use the following format:
Gianmarco Alberti, CAinterprTools: An R package to help interpreting Correspondence Analysis’ results, SoftwareX, Volumes 1–2, September 2015, Pages 26–31, ISSN 2352-7110, http://dx.doi.org/10.1016/j.softx.2015.07.001.
Here is a list of the implemented commands, with short examples of their use (using the 'greenacre_data' that comes with the package):
data("greenacre_data")
loads the sample dataset.
ca.corr(greenacre_data)
displays a bar plot of the strength of the correlation between rows and columns of the input contingency table.
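The quantity behind a plot of this kind can be sketched in base R. The example below assumes that the strength of the association is measured by the square root of the total inertia (the total inertia being the chi-square statistic divided by the grand total of the table). The `toy` table is made up for illustration and is not the package's dataset, and the code is not the package's own implementation.

```r
# Hypothetical 5x4 contingency table (made-up counts, for illustration only)
toy <- matrix(c(20,  5,  3,  7,
                 8, 15,  6,  4,
                 4,  7, 18,  9,
                10,  9, 11, 16,
                 6, 12,  5,  8),
              nrow = 5, byrow = TRUE,
              dimnames = list(paste0("row", 1:5), paste0("col", 1:4)))

# Chi-square statistic of the table
chi2 <- as.numeric(chisq.test(toy)$statistic)

# Total inertia = chi-square statistic / grand total
total_inertia <- chi2 / sum(toy)

# Assumed measure of the row-column correlation: square root of the total inertia
corr_coeff <- sqrt(total_inertia)

print(round(c(chi2 = chi2, total_inertia = total_inertia,
              corr_coeff = corr_coeff), 3))
```

The total inertia of a table cannot exceed the smaller of (rows - 1) and (columns - 1), which bounds the coefficient from above.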
sig.tot.inertia.perm(greenacre_data, k=10000)
calculates the significance of the CA total inertia via permutation test (using 10000 permutations); a density curve of the permuted total inertia is displayed along with the observed total inertia and the 95th percentile of the permuted total inertia. The latter can be regarded as a 0.05 alpha threshold for the observed total inertia's significance. The number of permutations can be set by the user (1000 is set by default).
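The logic of such a test can be sketched in base R as follows. This is only an illustration under stated assumptions: the permuted tables are drawn with `r2dtable()`, which generates random tables with the same row and column totals as the observed one, and this may differ from how the package permutes the data. The `toy` table and the helper `total_inertia()` are hypothetical.

```r
set.seed(1)  # for reproducibility of the random tables

# Hypothetical 5x4 contingency table (made-up counts, for illustration only)
toy <- matrix(c(20,  5,  3,  7,
                 8, 15,  6,  4,
                 4,  7, 18,  9,
                10,  9, 11, 16,
                 6, 12,  5,  8),
              nrow = 5, byrow = TRUE)

# Hypothetical helper: total inertia = chi-square statistic / grand total
total_inertia <- function(x) {
  expd <- outer(rowSums(x), colSums(x)) / sum(x)
  sum((x - expd)^2 / expd) / sum(x)
}

obs <- total_inertia(toy)

# Random tables with the observed margins (assumed permutation scheme)
n_perm <- 1000
perm_tabs <- r2dtable(n_perm, rowSums(toy), colSums(toy))
perm <- vapply(perm_tabs, total_inertia, numeric(1))

# 95th percentile of the permuted total inertia, used as the 0.05 threshold
thr <- quantile(perm, 0.95)

# Share of permuted values at least as large as the observed one
p_value <- mean(perm >= obs)

print(c(observed = obs, threshold = unname(thr), p_value = p_value))

# Density curve with the two reference lines (only drawn in interactive use)
if (interactive()) {
  plot(density(perm), main = "Permuted total inertia")
  abline(v = obs, lty = 1)  # observed total inertia
  abline(v = thr, lty = 2)  # 95th percentile of the permuted values
}
```

An observed total inertia that falls to the right of the 95th percentile would be read as significant at alpha 0.05.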
aver.rule(greenacre_data)
returns a chart suggesting which CA dimensions are important for data structure interpretation, according to the so-called 'average rule'.
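A minimal base R sketch of the idea follows. It assumes the rule in its simplest form: a dimension is retained when its share of the total inertia exceeds the average share, taken here as 100 divided by the number of dimensions. The exact reference value used by the package may differ. The `toy` table and the helper `ca_eigen()` are hypothetical.

```r
# Hypothetical 5x4 contingency table (made-up counts, for illustration only)
toy <- matrix(c(20,  5,  3,  7,
                 8, 15,  6,  4,
                 4,  7, 18,  9,
                10,  9, 11, 16,
                 6, 12,  5,  8),
              nrow = 5, byrow = TRUE)

# Hypothetical helper: CA eigenvalues (principal inertias) via the SVD
# of the matrix of standardized residuals
ca_eigen <- function(x) {
  P  <- x / sum(x)
  r  <- rowSums(P)
  cm <- colSums(P)
  S  <- (P - outer(r, cm)) / sqrt(outer(r, cm))
  K  <- min(dim(x)) - 1          # number of CA dimensions
  svd(S)$d[seq_len(K)]^2
}

eig <- ca_eigen(toy)
pct <- 100 * eig / sum(eig)      # percentage of inertia per dimension
avg <- 100 / length(eig)         # assumed form of the average threshold

keep <- which(pct > avg)         # dimensions exceeding the average

print(round(pct, 2))
print(keep)
```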
malinvaud(greenacre_data)
performs the Malinvaud test and prints the test's results on screen (among which the significance of the CA dimensions); a plot is also provided, wherein a red reference line indicates the 0.05 threshold.
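The test can be sketched in base R as follows, under the usual formulation: for k retained dimensions, the statistic is the grand total times the sum of the remaining eigenvalues, compared against a chi-square distribution with (I - k - 1)(J - k - 1) degrees of freedom, I and J being the number of rows and columns. The `toy` table and the variable names are hypothetical, and the layout of the output is not the one produced by the package.

```r
# Hypothetical 5x4 contingency table (made-up counts, for illustration only)
toy <- matrix(c(20,  5,  3,  7,
                 8, 15,  6,  4,
                 4,  7, 18,  9,
                10,  9, 11, 16,
                 6, 12,  5,  8),
              nrow = 5, byrow = TRUE)

n <- sum(toy)
I <- nrow(toy)
J <- ncol(toy)
K <- min(I, J) - 1               # number of CA dimensions

# CA eigenvalues from the SVD of the standardized residuals
P   <- toy / n
r   <- rowSums(P)
cm  <- colSums(P)
S   <- (P - outer(r, cm)) / sqrt(outer(r, cm))
eig <- svd(S)$d[seq_len(K)]^2

k  <- 0:(K - 1)                  # number of dimensions retained
Q  <- n * rev(cumsum(rev(eig)))  # n * sum of the eigenvalues beyond the k-th
df <- (I - k - 1) * (J - k - 1)
p  <- pchisq(Q, df, lower.tail = FALSE)

res <- data.frame(dims_retained = k, statistic = Q, df = df, p_value = p)
print(res)
```

With k = 0 the statistic coincides with the chi-square statistic of the whole table; the first k for which the p-value rises above 0.05 suggests that k dimensions are enough.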
sig.dim.perm(greenacre_data, 1, 2, k=10000)
calculates the significance of the 1st and 2nd CA dimensions via permutation test (using 10000 permutations), and displays the results as a scatterplot; reference lines provide information about the significance of the selected dimensions. The number of permutations can be set by the user (1000 is set by default).
rows.cntr(greenacre_data, 1, TRUE)
displays the contribution of the row categories to the 1st CA dimension; a reference line indicates the threshold above which a contribution can be considered important for the determination of the dimension. The third argument (TRUE) specifies that the categories' contribution to the total inertia is also shown (hollow circle).
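The numbers behind such a chart can be sketched in base R. The example assumes that the reference threshold is the average contribution (100 divided by the number of row categories, when contributions are expressed as percentages); the package may express it on a different scale. The `toy` table and the variable names are hypothetical.

```r
# Hypothetical 5x4 contingency table (made-up counts, for illustration only)
toy <- matrix(c(20,  5,  3,  7,
                 8, 15,  6,  4,
                 4,  7, 18,  9,
                10,  9, 11, 16,
                 6, 12,  5,  8),
              nrow = 5, byrow = TRUE,
              dimnames = list(paste0("row", 1:5), paste0("col", 1:4)))

# Standardized residuals and their SVD
P  <- toy / sum(toy)
r  <- rowSums(P)
cm <- colSums(P)
S  <- (P - outer(r, cm)) / sqrt(outer(r, cm))
sv <- svd(S)

dim_k <- 1  # dimension of interest

# Contribution (%) of each row to the selected dimension: the squared
# entries of the corresponding left singular vector sum to 1
cntr <- 100 * sv$u[, dim_k]^2
names(cntr) <- rownames(toy)

# Contribution (%) of each row to the total inertia
cntr_total <- 100 * rowSums(S^2) / sum(S^2)

# Assumed threshold: the average contribution
threshold <- 100 / nrow(toy)

print(round(cbind(cntr, cntr_total), 2))
print(names(cntr)[cntr > threshold])
```

The same computation on the right singular vectors (`sv$v`) and on `colSums(S^2)` gives the column categories' contributions.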
rows.cntr.scatter(greenacre_data, 1, 2)
displays a scatterplot of the row categories' contribution to dimensions 1 and 2.
rows.qlt(greenacre_data, 1, 2)
displays the quality of the row categories' display on the subspace determined by the 1st and 2nd CA dimensions.
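As a rough illustration, the quality of the display can be computed in base R as the sum, over the selected dimensions, of the squared correlations (squared cosines) between each row profile and the axes. The sketch below uses a made-up `toy` table with three CA dimensions, so that the quality on the first two is not trivially 1; the names are hypothetical and the code is not the package's implementation.

```r
# Hypothetical 5x4 contingency table (made-up counts, for illustration only)
toy <- matrix(c(20,  5,  3,  7,
                 8, 15,  6,  4,
                 4,  7, 18,  9,
                10,  9, 11, 16,
                 6, 12,  5,  8),
              nrow = 5, byrow = TRUE,
              dimnames = list(paste0("row", 1:5), paste0("col", 1:4)))

P  <- toy / sum(toy)
r  <- rowSums(P)
cm <- colSums(P)
S  <- (P - outer(r, cm)) / sqrt(outer(r, cm))
sv <- svd(S)
K  <- min(dim(toy)) - 1          # number of CA dimensions (3 here)

# Row principal coordinates: left singular vectors divided by the square
# root of the row masses, then scaled by the singular values
Fr <- sweep(sv$u[, 1:K, drop = FALSE] / sqrt(r), 2, sv$d[1:K], "*")
rownames(Fr) <- rownames(toy)

# Squared correlation of each row with each dimension
cos2 <- Fr^2 / rowSums(Fr^2)

# Quality of the display on the plane of dimensions 1 and 2
dims <- c(1, 2)
quality <- rowSums(cos2[, dims, drop = FALSE])

print(round(quality, 3))
```

Values close to 1 indicate rows that are well represented on the selected plane.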
rows.corr(greenacre_data, 1)
displays the correlation of the row categories with the 1st CA dimension.
rows.corr.scatter(greenacre_data, 1, 2)
displays a scatterplot of the row categories' correlation with dimensions 1 and 2.
cols.cntr(greenacre_data, 1, TRUE)
displays the contribution of the column categories to the 1st CA dimension; a reference line indicates the threshold above which a contribution can be considered important for the determination of the dimension. The third argument (TRUE) specifies that the categories' contribution to the total inertia is also shown (hollow circle).
cols.cntr.scatter(greenacre_data, 1, 2)
displays a scatterplot of the column categories' contribution to dimensions 1 and 2.
ca.cols.qlt(greenacre_data, 1, 2)
displays the quality of the column categories' display on the subspace determined by the 1st and 2nd CA dimensions.
cols.corr(greenacre_data, 1)
displays the correlation of the column categories with the 1st CA dimension.
cols.corr.scatter(greenacre_data, 1, 2)
displays a scatterplot of the column categories' correlation with dimensions 1 and 2.
As of version 0.5, 'CAinterprTools' integrates two functions that are described elsewhere on this site, as well as a brand new third one:
1) ca.scatter(): described on a separate page of this site
2) ca.plus(): described on a separate page of this site
3) sig.dim.perm.scree(): it tests the significance of the CA dimensions by means of permutation of the input contingency table. The number of permutations is entered by the user. The function returns a scree plot displaying, for each dimension, the observed eigenvalue and the 95th percentile of the permuted distribution of the corresponding eigenvalue. Observed eigenvalues that are larger than the corresponding 95th percentile are significant at alpha 0.05. See the command's help provided by the package for further details.
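The comparison between observed and permuted eigenvalues can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the permuted tables are generated with `r2dtable()` (random tables with the observed row and column totals), and the `toy` table and the helper `ca_eigen()` are hypothetical.

```r
set.seed(1)  # for reproducibility of the random tables

# Hypothetical 5x4 contingency table (made-up counts, for illustration only)
toy <- matrix(c(20,  5,  3,  7,
                 8, 15,  6,  4,
                 4,  7, 18,  9,
                10,  9, 11, 16,
                 6, 12,  5,  8),
              nrow = 5, byrow = TRUE)

K <- min(dim(toy)) - 1           # number of CA dimensions

# Hypothetical helper: CA eigenvalues via the SVD of the standardized residuals
ca_eigen <- function(x) {
  P  <- x / sum(x)
  r  <- rowSums(P)
  cm <- colSums(P)
  S  <- (P - outer(r, cm)) / sqrt(outer(r, cm))
  svd(S)$d[seq_len(min(dim(x)) - 1)]^2
}

obs <- ca_eigen(toy)

# Eigenvalues of the permuted tables: a K x n_perm matrix
n_perm   <- 1000
perm_eig <- vapply(r2dtable(n_perm, rowSums(toy), colSums(toy)),
                   ca_eigen, numeric(K))

# 95th percentile of the permuted eigenvalues, dimension by dimension
thr <- apply(perm_eig, 1, quantile, probs = 0.95)

# Dimensions whose observed eigenvalue exceeds the threshold
res <- data.frame(dimension = seq_len(K), observed = obs,
                  perc_95 = thr, significant = obs > thr)
print(res)

# Scree plot of observed eigenvalues and thresholds (interactive use only)
if (interactive()) {
  plot(obs, type = "b", xlab = "Dimension", ylab = "Eigenvalue",
       ylim = range(c(obs, thr)))
  lines(thr, type = "b", lty = 2)
}
```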
New in version 0.6: the 'ggplot2' and 'ggrepel' packages are used to produce the charts returned by the functions cols.cntr.scatter(), rows.cntr.scatter(), cols.corr.scatter(), and rows.corr.scatter(). The two packages have been preferred over R's base plotting facility for their ability to plot non-overlapping point labels. This allows complex charts to have less cluttered labels.
New in version 0.7: ca.percept() has been added to the package; the function is described on a separate page of this site. The brand_coffee dataset has also been included. The dataset is after Kennedy et al, Practical Applications of Correspondence Analysis to Categorical Data in Market Research, in Journal of Targeting Measurement and Analysis for Marketing, 1996. Minor corrections have been made to the help documentation of a handful of commands.
To install the package in R, follow the few steps listed below (you can copy and paste the pieces of code):
1) install the 'devtools' package:
install.packages("devtools", dependencies=TRUE)
2) load that package:
library(devtools)
3) download the 'CAinterprTools' package from GitHub via the 'devtools' command:
install_github("gianmarcoalberti/CAinterprTools_0.7")
4) load the package:
library(CAinterprTools)
5) enjoy!

Have you found this website helpful? Consider leaving a comment on this page.