'plotJenks': R function for plotting univariate classification using Jenks' natural break method (DOI: 10.13140/RG.2.2.18011.05929)

'plotJenks' is an R function which allows to break a dataset down into a user-defined number of breaks and to nicely plot the results, adding a number of other relevant information. Implementing the Jenks' natural breaks method, it allows to find the best arrangement of values into different classes.

The function is quite straightforward:

where

To describe the use of the function, let's use a fictional dataset storing information about the wealth (in terms of the total cost of the accompayining burial items) of 78 graves in a cemetery. Let's assume the data are stored in a dataframe's column named

We can put the function to work as follows:

Note

The function returns the following chart:

The function is quite straightforward:

*plotJenks(data, n=3, brks.cex=0.70, top.margin=10, dist=5)*where

*data*: is a vector storing the data;*n*: is the desired number of classes in which the dataset must be broken down (3 by default);*brks.cex*: is used to adjust the size of the labels used in the returned plot to display the classes' break-points;*top.margin*: is used to adjust the distance of the labels from the top margin of the returned chart;*dist*: is used to adjust the distance of the labels from the dot used to display the data points.To describe the use of the function, let's use a fictional dataset storing information about the wealth (in terms of the total cost of the accompayining burial items) of 78 graves in a cemetery. Let's assume the data are stored in a dataframe's column named

*$wealth*. The dataframe name is*mydata*. Let's assume that we wish to break down the wealth values into 6 classes and we wish to work out the break-points that best capture "natural" groupings in the data.We can put the function to work as follows:

*plotJenks(mydata$wealth, n=6)*Note

*:*all the other parameters are omitted, that is are left as they are by default.The function returns the following chart:

The wealth values are arranged on the x-axis in ascending order, while the index of the individual observations is reported on the y-axis. Vertical dotted red lines correspond to the optimal break-points which best divide the wealth values into the selected 6 classes. The break-points (and their values) are reported in the upper part of the chart, onto the corresponding break lines. Also, the chart's subtitle reports the

*Goodness of Fit*value relative to the selected partition, and the partition which correspond to the maximum

*GoF*value.

Finally, the function also returns some textual information which can be saved as follows:

*res <- plotJenks(mydata$wealth, n=6)*

that is storing them into an object to be further explored.

The user can access the information stored in the object

*res*as follows:

*res$info*: provide information about whether or not the method created non-unique breaks;*res$classif*: returns the created classes and the number of observations falling in each class;*res$classif$brks*: returns the classes' break-points;*res$breaks$max.GoF*: returns the number of classes at which the maximum GoF is achieved;*res$class.data*: returns a dataframe storing the values and the class in which each value actually falls into.

The function can be downloaded HERE. Alternatively, you can copy/paste the code below. Please note that the 'classInt' package must be already installed in R in order for the function to work properly.

plotJenks <- function(data, n=3, brks.cex=0.70, top.margin=10, dist=5){

df <- data.frame(sorted.values=sort(data, decreasing=TRUE))

Jclassif <- classIntervals(df$sorted.values, n, style = "jenks") #requires the 'classInt' package

test <- jenks.tests(Jclassif) #requires the 'classInt' package

df$class <- cut(df$sorted.values, unique(Jclassif$brks), labels=FALSE, include.lowest=TRUE) #the function unique() is used to remove non-unique breaks, should the latter be produced. This is done because the cut() function cannot break the values into classes if non-unique breaks are provided

if(length(Jclassif$brks)!=length(unique(Jclassif$brks))){

info <- ("The method has produced non-unique breaks, which have been removed. Please, check '...$classif$brks'")

} else {info <- ("The method did not produce non-unique breaks.")}

loop.res <- numeric(nrow(df))

i <- 1

repeat{

i <- i+1

loop.class <- classIntervals(df$sorted.values, i, style = "jenks")

loop.test <- jenks.tests(loop.class)

loop.res[i] <- loop.test[[2]]

if(loop.res[i]>0.9999){

break

}

}

max.GoF.brks <- which.max(loop.res)

plot(x=df$sorted.values, y=c(1:nrow(df)), type="b", main=paste0("Jenks natural breaks optimization; number of classes: ", n), sub=paste0("Goodness of Fit: ", round(test[[2]],4), ". Max GoF (", round(max(loop.res),4), ") with classes:", max.GoF.brks), ylim =c(0, nrow(df)+top.margin), cex=0.75, cex.main=0.95, cex.sub=0.7, ylab="observation index", xlab="value (increasing order)")

abline(v=Jclassif$brks, lty=3, col="red")

text(x=Jclassif$brks, y= max(nrow(df)) + dist, labels=sort(round(Jclassif$brks, 2)), cex=brks.cex, srt=90)

results <- list("info"=info, "classif" = Jclassif, "breaks.max.GoF"=max.GoF.brks, "class.data" = df)

return(results)

}

df <- data.frame(sorted.values=sort(data, decreasing=TRUE))

Jclassif <- classIntervals(df$sorted.values, n, style = "jenks") #requires the 'classInt' package

test <- jenks.tests(Jclassif) #requires the 'classInt' package

df$class <- cut(df$sorted.values, unique(Jclassif$brks), labels=FALSE, include.lowest=TRUE) #the function unique() is used to remove non-unique breaks, should the latter be produced. This is done because the cut() function cannot break the values into classes if non-unique breaks are provided

if(length(Jclassif$brks)!=length(unique(Jclassif$brks))){

info <- ("The method has produced non-unique breaks, which have been removed. Please, check '...$classif$brks'")

} else {info <- ("The method did not produce non-unique breaks.")}

loop.res <- numeric(nrow(df))

i <- 1

repeat{

i <- i+1

loop.class <- classIntervals(df$sorted.values, i, style = "jenks")

loop.test <- jenks.tests(loop.class)

loop.res[i] <- loop.test[[2]]

if(loop.res[i]>0.9999){

break

}

}

max.GoF.brks <- which.max(loop.res)

plot(x=df$sorted.values, y=c(1:nrow(df)), type="b", main=paste0("Jenks natural breaks optimization; number of classes: ", n), sub=paste0("Goodness of Fit: ", round(test[[2]],4), ". Max GoF (", round(max(loop.res),4), ") with classes:", max.GoF.brks), ylim =c(0, nrow(df)+top.margin), cex=0.75, cex.main=0.95, cex.sub=0.7, ylab="observation index", xlab="value (increasing order)")

abline(v=Jclassif$brks, lty=3, col="red")

text(x=Jclassif$brks, y= max(nrow(df)) + dist, labels=sort(round(Jclassif$brks, 2)), cex=brks.cex, srt=90)

results <- list("info"=info, "classif" = Jclassif, "breaks.max.GoF"=max.GoF.brks, "class.data" = df)

return(results)

}

*Have you found this website helpful? Consider to leave a comment in this page.*