Correspondence Analysis in R

CA in R

Given the relevance and utility of CA, Baxter and Cool are right to advocate a more widespread use of the technique (Baxter, Cool 2010, 213, 225), and their effort to provide a detailed guide to performing CA in the free R statistical environment (Ihaka, Gentleman 1996) is very welcome. All the main commercial statistical programs can perform CA, but their price is generally far beyond the budget of the average user, let alone students wishing to approach the technique to analyse data on their own. By contrast, many packages that perform CA are freely available in R, each with different features as far as graphical output and analytical tools are concerned. See, e.g., the ‘anacor’ (de Leeuw, Mair 2009), ‘ca’ (Nenadic, Greenacre 2007), and ‘FactoMineR’ packages (Lê et al. 2008; Husson et al. 2011), or those (namely, ‘MASS’, ‘ade4’, and ‘vegan’) used by Baxter and Cool (2010, references therein). The availability of many different tools allows users to choose the one(s) they consider most appropriate for their analytical tasks.

As for the choice I made, the decision to focus on the ‘ca’ and ‘FactoMineR’ packages rests both on personal taste and on the fact that an extensive literature exists, allowing users to go deeper into the details of both packages (Greenacre 2007, 213-258; Nenadic, Greenacre 2007; Lê et al. 2008; Husson et al. 2011). Notably, video tutorials on the use of ‘FactoMineR’ have been made available by F. Husson himself and can easily be found on his YouTube channel.

With my R script I wish:

˗ to expand on Baxter and Cool’s sensible idea of benefitting from the flexibility of the R environment to perform CA;
˗ to take a step towards freeing the user from manually entering long pieces of R code; to this end, an R script is proposed. It will soon be made available both online (http://uniud.academia.edu/GianmarcoAlberti/) and from the author upon request. A video tutorial on YouTube is also available on the same site.

It is not the intention of this article to instruct readers on the coding needed to perform CA. I do not want to replicate what already exists in the literature: many scholars have already provided line-by-line tutorials on CA in R (Nenadic, Greenacre 2007; Greenacre 2007, 213-258; Baxter, Cool 2010; Husson et al. 2011, 59-126; Glynn in press). Rather, this work is intended for archaeologists with little or no knowledge of R who are nevertheless willing to use it to perform CA. For this reason, the article concentrates on the output of the analysis rather than on the way to obtain it from R. More experienced R users will be familiar enough with the language to grasp the script on their own and to use (or even modify) it according to their personal taste and specific needs.

What are the advantages of the script? It allows the user to:

˗ draw on the best (or, at least, what I consider as such) of the two aforementioned R packages, developed by leading scholars in CA computation, to obtain a set of CA statistics and graphical outputs relevant to the analysis of data;
˗ obtain a textual summary of the CA output statistics;
˗ obtain graphs (some of them not native to the packages) that are important for CA interpretation;
˗ compare four different criteria for selecting an optimal dimensionality of the CA solution; in this respect, Malinvaud’s test has been implemented in R for the first time, to the best of my knowledge.
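
To give an idea of what the last point involves, Malinvaud’s test can be sketched in a few lines of base R. This is not the code of the script described here, only a minimal illustration under stated assumptions: the function name malinvaud_sketch and the small table of counts are hypothetical, and the table is assumed to have no empty rows or columns. For each number K of retained dimensions, the statistic is the grand total n multiplied by the sum of the eigenvalues of the discarded dimensions, referred to a chi-square distribution with (I-1-K)(J-1-K) degrees of freedom, I and J being the number of rows and columns.

```r
# Hypothetical helper, NOT the author's script: a base-R sketch of
# Malinvaud's test for the dimensionality of a CA solution.
malinvaud_sketch <- function(tab) {
  tab <- as.matrix(tab)
  n   <- sum(tab)                      # grand total of the table
  P   <- tab / n                       # correspondence matrix
  rm  <- rowSums(P)                    # row masses
  cm  <- colSums(P)                    # column masses
  E   <- outer(rm, cm)                 # expected proportions under independence
  S   <- (P - E) / sqrt(E)             # standardized residuals
  M   <- min(nrow(tab), ncol(tab)) - 1 # maximum number of dimensions
  eig <- svd(S)$d[seq_len(M)]^2        # eigenvalues (principal inertias)
  K   <- 0:(M - 1)                     # number of dimensions retained
  # n times the inertia left in the dimensions beyond the K-th
  stat <- sapply(K, function(k) n * sum(eig[(k + 1):M]))
  df   <- (nrow(tab) - 1 - K) * (ncol(tab) - 1 - K)
  data.frame(K = K, statistic = stat, df = df,
             p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Made-up counts, for illustration only (4 contexts by 3 artefact types)
counts <- matrix(c(30, 12, 25,
                   14, 28, 19,
                   22, 17, 35,
                   11, 26, 16),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("context", 1:4),
                                 c("typeA", "typeB", "typeC")))

res <- malinvaud_sketch(counts)
print(res)
```

With K = 0 the statistic coincides with the ordinary chi-square statistic of the table. Under this criterion, the smallest K whose test is not significant is usually taken as the suggested number of dimensions.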