R package for seriation by means of Correspondence Analysis

'CAseriation': R package for Seriation by means of Correspondence Analysis

In archaeology there is often the need to seriate contingency tables in order to devise a relative chronology of different types of contexts (e.g., graves). Different approaches exist in the literature for achieving an optimal ordering.

The method implemented in the 'CAseriation' package orders the rows and columns of a contingency table according to their scores on the Correspondence Analysis (CA) dimension selected by the user. The package also allows the user to plot the CA scatterplot of selected dimensions and to search for clusters in the dataset. For the seriation, two plots displaying the sorted contingency table are returned, and a 'battleship' chart of the sorted table is also produced. The results (i.e., the sorted tables) are exported to Excel spreadsheets. For the clustering rationale, see the documentation that comes with the 'FactoMineR' package, or my 2013 journal article cited here.
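The ordering step can be sketched in base R as below. This is a minimal illustration under stated assumptions, not the 'CAseriation' source: CA is computed from the singular value decomposition of the standardized residuals, the helper names ca_scores() and seriate_by_ca() are hypothetical, and the input is assumed to be a table of non-negative counts with no empty rows or columns.

```r
# Hypothetical helper: CA of a contingency table via SVD (base R only).
# Assumes non-negative counts and no empty rows or columns.
ca_scores <- function(tab) {
  P  <- as.matrix(tab) / sum(tab)                  # correspondence matrix
  r  <- rowSums(P)                                 # row masses
  cm <- colSums(P)                                 # column masses
  S  <- (P - outer(r, cm)) / sqrt(outer(r, cm))    # standardized residuals
  dec <- svd(S)
  list(
    rows = sweep(dec$u / sqrt(r), 2, dec$d, "*"),  # row principal coordinates
    cols = sweep(dec$v / sqrt(cm), 2, dec$d, "*"), # column principal coordinates
    sv   = dec$d                                   # singular values
  )
}

# Hypothetical helper: reorder rows and columns by their scores on one dimension.
seriate_by_ca <- function(tab, dim = 1) {
  sc <- ca_scores(tab)
  tab[order(sc$rows[, dim]), order(sc$cols[, dim]), drop = FALSE]
}

# Usage: a band-diagonal (perfectly seriated) 0/1 table, shuffled, then re-sorted.
perfect <- matrix(0, 6, 6, dimnames = list(paste0("G", 1:6), paste0("T", 1:6)))
for (i in 1:6) perfect[i, max(1, i - 1):min(6, i + 1)] <- 1
shuffled <- perfect[c(4, 1, 6, 2, 5, 3), c(2, 5, 1, 6, 3, 4)]
print(seriate_by_ca(shuffled, dim = 1))
```

Because the sign of each singular vector is arbitrary, the recovered order may come out reversed; the seriation itself is unaffected.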

The ideal workflow for using the package would be:
(a) feed the contingency table into R;
(b) inspect the Correspondence Analysis scatterplot in search of a seriation structure (i.e., the presence of the 'horseshoe' effect);
(c) sort the table according to the dimension the user is interested in;
(d) additionally, formally assess the existence of clusters in the data;
(e) gauge to what extent the seriation structure (if any) embedded in the data is close to a perfect seriation.

The functions implemented to achieve the above goals are:
(b) check.ca.plot()
(c) sort.table()
(d) plot.clusters.rows(); plot.clusters.cols()
(e) evaluate()

The 'CAseriation' package is currently available from my GitHub repository and can be installed from there directly into R via the 'devtools' package (see the instructions at the bottom of this page).

Here is a list of the implemented commands, with short examples of their use (using the 'perfect_seriation' dataset that comes with the package):
data("perfect_seriation")
load the sample dataset

check.ca.plot(perfect_seriation,1,2)
plot the Correspondence Analysis scatterplot of the first two dimensions to inspect the data structure (e.g., looking for the horseshoe effect)

sort.table(perfect_seriation,1)
sort the input contingency table according to the scores of the row and column categories on the first CA dimension; two seriation plots and a 'battleship' plot of the sorted table are also produced
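A 'battleship' chart of a sorted table can be approximated with base R graphics as below. This is a sketch, not the package's plotting code: the helper battleship_plot() is hypothetical, bar widths are assumed to represent row percentages (each row's counts divided by its total, as in classic Ford-style diagrams), and one common scale is used so that bars are comparable across columns.

```r
# Hypothetical helper: Ford-style battleship chart of an already sorted table.
# Each cell becomes a horizontal bar centered on its column; the bar's width is
# proportional to the cell's row percentage. Assumes named rows/columns and
# no empty rows.
battleship_plot <- function(tab, ...) {
  tab  <- as.matrix(tab)
  prop <- tab / rowSums(tab)          # row percentages (each row sums to 1)
  n_r  <- nrow(tab)
  n_c  <- ncol(tab)
  half <- 0.45 * prop / max(prop)     # common scale: the widest bar spans 0.9
  plot(NA, xlim = c(0.5, n_c + 0.5), ylim = c(n_r + 0.5, 0.5),  # first row on top
       xaxt = "n", yaxt = "n", xlab = "", ylab = "", ...)
  axis(1, at = seq_len(n_c), labels = colnames(tab), las = 2)
  axis(2, at = seq_len(n_r), labels = rownames(tab), las = 1)
  for (j in seq_len(n_c)) {
    rect(j - half[, j], seq_len(n_r) - 0.4, j + half[, j], seq_len(n_r) + 0.4,
         col = "grey40", border = NA)
  }
  invisible(prop)
}

# Usage on a small table whose rows are already in seriated order.
sorted_tab <- rbind(G1 = c(6, 2, 0), G2 = c(3, 5, 1), G3 = c(1, 4, 4), G4 = c(0, 2, 7))
colnames(sorted_tab) <- c("T1", "T2", "T3")
battleship_plot(sorted_tab, main = "Battleship chart")
```

In a well-seriated table, each column's bars should swell and then taper from top to bottom, giving the characteristic 'battleship' silhouette.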

plot.clusters.rows(perfect_seriation,1,2)
display the CA scatterplot for row categories, with different clusters of points being given different colors
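The clustering idea can be illustrated with a base-R stand-in. The clustering rationale used by the package is the one documented for 'FactoMineR'; the sketch below instead applies hierarchical clustering with Ward's criterion (hclust() with method "ward.D2") to the CA row coordinates, with the number of clusters k chosen by the user. The helper names ca_row_coords() and cluster_rows() are hypothetical.

```r
# Hypothetical helper: CA row principal coordinates via SVD (base R only).
# Assumes non-negative counts and no empty rows or columns.
ca_row_coords <- function(tab) {
  P  <- as.matrix(tab) / sum(tab)
  r  <- rowSums(P)
  cm <- colSums(P)
  S  <- (P - outer(r, cm)) / sqrt(outer(r, cm))   # standardized residuals
  dec <- svd(S)
  coords <- sweep(dec$u / sqrt(r), 2, dec$d, "*")
  rownames(coords) <- rownames(tab)
  coords
}

# Hypothetical helper: Ward clustering of row categories on two CA dimensions,
# optionally drawing the scatterplot with one color per cluster.
cluster_rows <- function(tab, dims = c(1, 2), k = 2, plot = TRUE) {
  coords <- ca_row_coords(tab)[, dims, drop = FALSE]
  hc  <- hclust(dist(coords), method = "ward.D2")
  grp <- cutree(hc, k = k)                        # cluster label per row
  if (plot) {
    plot(coords, col = grp, pch = 19,
         xlab = paste("Dim", dims[1]), ylab = paste("Dim", dims[2]))
    text(coords, labels = rownames(coords), pos = 3, cex = 0.8)
  }
  grp
}

# Usage: two groups of contexts that share no types.
tab <- rbind(G1 = c(4, 2, 0, 0), G2 = c(3, 3, 0, 0), G3 = c(2, 4, 0, 0),
             G4 = c(0, 0, 2, 2), G5 = c(0, 0, 3, 3), G6 = c(0, 0, 1, 1))
colnames(tab) <- paste0("T", 1:4)
print(cluster_rows(tab, dims = c(1, 2), k = 2))
```

Unlike an automatic cut of the dendrogram, this sketch leaves the number of clusters to the user.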

plot.clusters.cols(perfect_seriation,1,2)
the same as the preceding function, but applies to column categories

evaluate(perfect_seriation,1,2, which='R')
plot the CA scatterplot of the row categories on the first two dimensions and add a second-order polynomial fit; the R-squared value is also reported
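The reasoning behind evaluate() can be sketched as follows: if the data carry a seriation structure, the row points on the first two CA dimensions should lie close to a parabola (the horseshoe), so a second-order polynomial fit should explain most of their variance. This base-R sketch is not the package's code; the helpers ca_row_coords(), poly2_r2() and horseshoe_r2() are hypothetical, and R-squared is computed as 1 minus the residual sum of squares over the total sum of squares.

```r
# Hypothetical helper: CA row principal coordinates via SVD (base R only).
ca_row_coords <- function(tab) {
  P  <- as.matrix(tab) / sum(tab)
  r  <- rowSums(P)
  cm <- colSums(P)
  S  <- (P - outer(r, cm)) / sqrt(outer(r, cm))   # standardized residuals
  dec <- svd(S)
  sweep(dec$u / sqrt(r), 2, dec$d, "*")           # row principal coordinates
}

# R-squared of a second-order polynomial fit y ~ x + x^2.
poly2_r2 <- function(x, y) {
  fit <- lm(y ~ x + I(x^2))
  1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
}

# Hypothetical helper: horseshoe fit for the row points on two CA dimensions.
horseshoe_r2 <- function(tab, dims = c(1, 2)) {
  coords <- ca_row_coords(tab)
  poly2_r2(coords[, dims[1]], coords[, dims[2]])
}

# Usage on a band-diagonal (perfectly seriated) 0/1 table.
perfect <- matrix(0, 6, 6, dimnames = list(paste0("G", 1:6), paste0("T", 1:6)))
for (i in 1:6) perfect[i, max(1, i - 1):min(6, i + 1)] <- 1
print(horseshoe_r2(perfect))
```

An R-squared of 1 means the points lie exactly on a parabola; lower values indicate larger departures from a clean horseshoe.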

To install the package, just follow the few steps listed below (you can copy and paste the highlighted pieces of code):
1) install the 'devtools' package: install.packages("devtools", dependencies=TRUE)
2) load that package: library(devtools)
3) download and install the 'CAseriation' package from GitHub via the 'devtools' command: install_github("gianmarcoalberti/CAseriation")
4) load the package: library(CAseriation)
5) enjoy!
