Correspondence Analysis in Archaeology
'CAseriation': R package for Seriation by means of Correspondence Analysis
In archaeology there is often a need to seriate contingency tables in order to devise a relative chronology of different types of contexts (e.g., graves). Different approaches exist in the literature for arriving at a best ordering.

The method implemented in the 'CAseriation' package orders the rows and columns of a contingency table according to their scores on the Correspondence Analysis dimension selected by the user. The package also lets the user plot the CA scatterplot of selected dimensions and look for clusters in the dataset. For seriation, two plots displaying the sorted contingency table are returned, and a 'battleship' chart of the sorted table is also produced. The results (i.e., the sorted tables) are exported to Excel spreadsheets. For the clustering rationale, see the documentation that comes with the 'FactoMineR' package or my 2013 journal article cited here.
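
To make the idea of sorting by CA scores concrete, here is a minimal base-R sketch. It is not the package's own code: the helper names ca_coords() and seriate_by_ca() and the toy table are made up for illustration, and the CA is computed directly from the singular value decomposition of the standardised residuals.

```r
# Illustrative sketch only: hypothetical helpers, not functions of 'CAseriation'.

# Principal coordinates of rows and columns from a plain SVD-based CA.
ca_coords <- function(tab) {
  P  <- as.matrix(tab) / sum(tab)   # correspondence matrix (proportions)
  rm <- rowSums(P)                  # row masses
  cm <- colSums(P)                  # column masses
  E  <- outer(rm, cm)               # proportions expected under independence
  S  <- (P - E) / sqrt(E)           # standardised residuals
  dec <- svd(S)
  list(
    rows = sweep(dec$u / sqrt(rm), 2, dec$d, "*"),  # row principal coordinates
    cols = sweep(dec$v / sqrt(cm), 2, dec$d, "*")   # column principal coordinates
  )
}

# Reorder rows and columns by their scores on the chosen dimension.
seriate_by_ca <- function(tab, dim = 1) {
  co <- ca_coords(tab)
  tab[order(co$rows[, dim]), order(co$cols[, dim]), drop = FALSE]
}

# Made-up toy table: a banded (seriated) structure with the rows shuffled.
toy <- matrix(c(
  0, 0, 3, 5, 2,
  4, 2, 0, 0, 0,
  0, 3, 5, 2, 0,
  2, 5, 3, 0, 0,
  0, 0, 0, 2, 6),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("grave", 1:5), paste0("type", LETTERS[1:5])))

sorted_toy <- seriate_by_ca(toy, dim = 1)
print(sorted_toy)
```

Note that the sign of a CA dimension is arbitrary, so the sorted table may come out in either direction (earliest-to-latest or the reverse); the relative order is what matters.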

The ideal work-flow for the use of the package would be:
(a) feed the contingency table into R;
(b) inspect the Correspondence Analysis scatterplot in search of a seriation structure (i.e., presence of the 'horseshoe' effect);
(c) sort the table according to the dimension the user is interested in;
(d) additionally, formally assess the existence of clusters in the data;
(e) gauge to what extent the seriation structure (if any) embedded in the data is close to a perfect seriation.

Implemented functions to achieve the above goals:
(b) check.ca.plot()
(c) sort.table()
(d) plot.clusters.rows(); plot.clusters.cols()
(e) evaluate()
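
Step (a) has no dedicated function in the list above, so here is a small base-R sketch of one way to do it. The file, its layout (contexts in rows, types in columns, first column holding the context names), and the object name my_table are all assumptions made for the example; the CSV is written to a temporary file only so that the snippet is self-contained.

```r
# Illustrative sketch: the file layout and object names are assumptions.

# Write a tiny made-up contingency table to a temporary CSV file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "context,typeA,typeB,typeC",
  "grave1,4,2,0",
  "grave2,1,5,3",
  "grave3,0,2,6"), csv_path)

# Read it back, using the first column as row names and
# keeping the column names exactly as they appear in the file.
my_table <- read.csv(csv_path, row.names = 1, check.names = FALSE)

# Basic sanity checks before any analysis: numeric, non-negative counts,
# and no empty rows or columns (CA cannot handle zero margins).
stopifnot(all(sapply(my_table, is.numeric)), all(my_table >= 0))
stopifnot(all(rowSums(my_table) > 0), all(colSums(my_table) > 0))

print(my_table)
```

With your own data, replace csv_path with the path to your file; the resulting table is what would then be passed to the functions listed above.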

The 'CAseriation' package is currently available from my GitHub repository, and can be installed from that repository straight into R via the 'devtools' package (see the instructions at the bottom of this page).

Here is a list of the implemented commands, with short examples of their use (using the 'perfect_seriation' dataset that comes with the package):
data("perfect_seriation")
load the sample dataset

check.ca.plot(perfect_seriation,1,2)
plot the Correspondence Analysis scatterplot of the first 2 dimensions in order to inspect the data structure (e.g., looking for the horseshoe effect)

sort.table(perfect_seriation,1)
sort the input contingency table according to the scores of the row and column categories on the first CA dimension; two seriation plots and a 'battleship' plot for the sorted table are also produced

plot.clusters.rows(perfect_seriation,1,2)
display the CA scatterplot for row categories, with different clusters of points being given different colors

plot.clusters.cols(perfect_seriation,1,2)
the same as the preceding function, but applies to column categories

evaluate(perfect_seriation,1,2, which='R') 
plot the CA scatterplot of the row categories on the first two dimensions and add a second-order polynomial fit; the R-squared value is also reported
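
The idea behind the polynomial fit can be sketched in base R as follows. This is only an illustration of the principle (how closely the row points follow a parabola, i.e. the 'horseshoe'), not the code used by evaluate(); the toy table and all object names are made up, and the CA row coordinates are computed from a plain singular value decomposition.

```r
# Illustrative sketch only: not the implementation of evaluate().

# Made-up toy table with a banded (seriated) structure.
toy <- matrix(c(
  4, 2, 0, 0, 0,
  2, 5, 3, 0, 0,
  0, 3, 5, 2, 0,
  0, 0, 3, 5, 2,
  0, 0, 0, 2, 6),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("grave", 1:5), paste0("type", LETTERS[1:5])))

# Row principal coordinates from an SVD-based CA.
P   <- toy / sum(toy)
rm  <- rowSums(P)
cm  <- colSums(P)
E   <- outer(rm, cm)
dec <- svd((P - E) / sqrt(E))
row_coords <- sweep(dec$u / sqrt(rm), 2, dec$d, "*")

# Second-order polynomial fit of dimension 2 on dimension 1.
pts <- data.frame(x = row_coords[, 1], y = row_coords[, 2])
fit <- lm(y ~ poly(x, 2), data = pts)
r2  <- summary(fit)$r.squared
cat("R-squared of the quadratic fit:", round(r2, 3), "\n")

# Draw the points and the fitted curve when run in an interactive session.
if (interactive()) {
  plot(pts$x, pts$y, xlab = "Dim. 1", ylab = "Dim. 2", pch = 19)
  text(pts$x, pts$y, labels = rownames(toy), pos = 3)
  grid_x <- seq(min(pts$x), max(pts$x), length.out = 100)
  lines(grid_x, predict(fit, newdata = data.frame(x = grid_x)))
}
```

The closer the R-squared value is to 1, the more closely the row points follow a parabolic arch, which is the pattern expected from a table with a strong seriation structure.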



To install the package, just follow the few steps listed below (you can copy and paste the commands):
1) install the 'devtools' package: install.packages("devtools", dependencies=TRUE)
2) load that package: library(devtools)
3) download and install the 'CAseriation' package from GitHub with the 'devtools' command: install_github("gianmarcoalberti/CAseriation")
4) load the package: library(CAseriation)
5) enjoy!

