---
title: "Tutorial: Adaptive versus regular histograms"
author: "Joe Song"
date: "Updated 2020-11-07"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Tutorial: Adaptive versus regular histograms}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

Adaptive histograms can reveal patterns in data more effectively than regular (equal-bin-width) histograms. The "discontinuous" style adaptive histogram is recommended because it adapts to the data without trying to assign a non-zero density to every bin. When bins are not of equal width, the vertical axis of the histogram shows density instead of frequency.

## Example 1: Data from a Gaussian mixture model using one bin per cluster

Plot an adaptive histogram from data generated by a Gaussian mixture model with three components:

```{r, fig.width=6}
library("Ckmeans.1d.dp")

# Three Gaussian components with different means, spreads and sizes
x <- c(rnorm(40, mean=-2, sd=0.3),
       rnorm(45, mean=1, sd=0.1),
       rnorm(70, mean=3, sd=0.2))

# Adaptive histogram; sticks mark the individual data points
ahist(x, col="lightblue", sub=paste("n =", length(x)),
      col.stick="darkblue", lwd=2, xlim=c(-4,4),
      main="Example 1. Gaussian mixture model with 3 components\n(one bin per component)\nAdaptive histogram")
```

When `breaks` is specified, `ahist()` calls `hist()`, the regular histogram function in R.

```{r, fig.width=6}
# Regular histogram of the same data, requesting 3 bins
ahist(x, breaks=3, col="lightgreen", sub=paste("n =", length(x)),
      col.stick="forestgreen", lwd=2,
      main="Example 1. Regular histogram")
```

## Example 2: Data from a Gaussian mixture model using three bins per cluster

Plot an adaptive histogram of the same three-component Gaussian mixture data using a given number of bins (`k=9`):

```{r, fig.width=6}
# Adaptive histogram with exactly 9 bins
ahist(x, k=9, col="lavender", col.stick="navy",
      sub=paste("n =", length(x)), lwd=2,
      main="Example 2. Gaussian mixture model with 3 components\n(on average 3 bins per component)\nAdaptive histogram")
```

As in Example 1, when `breaks` is specified, `ahist()` calls `hist()`, the regular histogram function in R.
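Because `breaks` is handed on to `hist()` (assuming `ahist()` forwards it unchanged, as stated above), a single number such as `breaks=9` is only a suggestion: `hist()` rounds the breakpoints to "pretty" values, so the actual number of equal-width bins can differ from the number requested. The minimal base-R sketch below inspects the bins `hist()` would build without drawing anything; the vector `y_demo` is made-up illustrative data, not data from this tutorial.

```{r}
# Illustrative data only (not from this tutorial)
y_demo <- c(0.2, 0.5, 0.9, 1.1, 1.4, 2.3, 2.8, 3.1, 3.3, 4.7)

# Ask hist() for about 9 cells, but compute the bins without plotting
h_demo <- hist(y_demo, breaks = 9, plot = FALSE)

# Breakpoints actually used (chosen via pretty(), hence equal width)
h_demo$breaks

# Number of cells actually used; this may differ from the requested 9
length(h_demo$breaks) - 1

# The counts per cell add up to the sample size
sum(h_demo$counts)
```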
```{r, fig.width=6}
# Regular histogram of the same data, requesting 9 bins
ahist(x, breaks=9, col="lightgreen", col.stick="forestgreen",
      sub=paste("n =", length(x)), lwd=2,
      main="Example 2. Regular histogram")
```

## Example 3: Adaptive histogram of protein DNase

The `DNase` data frame has 176 rows and 3 columns of data obtained during development of an ELISA assay for the recombinant protein DNase in rat serum:

```{r, fig.show='hold', fig.width=6}
data(DNase)

# Cluster the optical densities; the number of clusters is selected automatically
res <- Ckmeans.1d.dp(DNase$density)
kopt <- length(res$size)

# One color per cluster for both the bins and the sticks
ahist(res, data=DNase$density, col=rainbow(kopt),
      col.stick=rainbow(kopt)[res$cluster],
      sub=paste("n =", length(DNase$density)), border="transparent",
      xlab="Optical density of protein DNase",
      main="Example 3. ELISA assay of DNase in rat serum\nAdaptive histogram")
```

Using the same data as Example 3, this example demonstrates the inadequacy of equal-bin-width histograms: the third bin gives a false sense of the sample distribution. Specifying `breaks="Sturges"` in `ahist()` produces an equal-bin-width histogram. The difference from R's own `hist()` function is that `ahist()` adds sticks to the histogram, whereas `hist()` does not.

```{r, fig.show='hold', fig.width=6}
# Equal-bin-width histogram using Sturges' rule, with sticks added
ahist(DNase$density, breaks="Sturges", col="palegreen",
      add.sticks=TRUE, col.stick="darkgreen",
      main="Example 3. ELISA assay of DNase in rat serum\nRegular histogram (equal bin width)",
      xlab="Optical density of protein DNase")
```

## Example 4: Repetitive data

Cluster data with repetitive elements:

```{r, fig.show='hold', fig.width=6}
x <- c(1,1,1,1, 3,4,4, 6,6,6)

# Adaptive histogram, allowing between 2 and 4 bins
ahist(x, k=c(2,4), col="gray", lwd=2,
      lwd.stick=6, col.stick="chocolate",
      main="Example 4. Adaptive histogram of repetitive elements")

# Regular histogram, requesting 3 bins
ahist(x, breaks=3, col="lightgreen", lwd=2,
      lwd.stick=6, col.stick="forestgreen",
      main="Example 4. Regular histogram")
```
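The introduction notes that with unequal bin widths the vertical axis shows density rather than frequency. The minimal base-R sketch below illustrates that convention on the repetitive values of Example 4, using hand-picked unequal breakpoints `b_demo`. These breakpoints are illustrative assumptions, not the bins `ahist()` would select. Each bar's density is its count divided by the sample size times the bin width, so the bar areas sum to one.

```{r}
# Repetitive values from Example 4
x_demo <- c(1,1,1,1, 3,4,4, 6,6,6)

# Hypothetical unequal breakpoints (widths 1, 3 and 2), chosen by hand
b_demo <- c(0.5, 1.5, 4.5, 6.5)
h_demo2 <- hist(x_demo, breaks = b_demo, plot = FALSE)

h_demo2$counts    # 4 3 3
h_demo2$equidist  # FALSE: the bins have unequal widths

# Density = count / (n * bin width)
dens_manual <- h_demo2$counts / (length(x_demo) * diff(b_demo))
dens_manual       # 0.40 0.10 0.15
all.equal(dens_manual, h_demo2$density)

# Bar areas (density times width) sum to 1
sum(h_demo2$density * diff(b_demo))
```

With frequencies on the vertical axis instead, the wide middle bin would look as tall as the narrow right bin (3 observations each), even though its values are spread over a wider range.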