Old Faithful is a cone geyser in Yellowstone National Park, Wyoming, United States. It is one of the most predictable geographical features on Earth, erupting almost every 91 minutes (source: http://en.wikipedia.org/wiki/Old_Faithful).
In this R Markdown document we briefly explore some properties of the Old Faithful geyser data.
Let's have a look at the data we are using (272 observations of 2 variables in total; the first six rows are shown):
eruptions | waiting |
---|---|
3.600 | 79 |
1.800 | 54 |
3.333 | 74 |
2.283 | 62 |
4.533 | 85 |
2.883 | 55 |
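The meaning of the variables is summarized below:
Variable | Meaning |
---|---|
eruptions | Eruption time in minutes |
waiting | Waiting time to the next eruption in minutes |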
A simple scatter plot of the data shows that there are two clusters.
Also, plotting the marginal distributions of the variables suggests that the data can be approximated by a mixture of 2D Gaussians.
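The two plots described above can be reproduced roughly as follows. This is a minimal sketch using base R graphics only; the number of histogram breaks and the titles are arbitrary choices, not necessarily those used for the original figures.

```r
# `faithful` ships with base R (package datasets).
data(faithful)

# Scatter plot of the two variables against each other.
plot(faithful$eruptions, faithful$waiting,
     xlab = "eruptions", ylab = "waiting",
     main = "Old Faithful")

# Marginal distributions as histograms, side by side.
old_par <- par(mfrow = c(1, 2))
hist(faithful$eruptions, breaks = 20, main = "eruptions", xlab = "eruptions")
hist(faithful$waiting, breaks = 20, main = "waiting", xlab = "waiting")
par(old_par)  # restore the previous plotting layout
```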
Thus, we hypothesize that the data \(\mathbf{x}\) follow a mixture of \(K\) Gaussians with mixing weights \(\phi_i\): \[\mathbf{x}\sim \sum_{i=1}^{K} \phi_i\cdot\mathcal{N}(\mathbf{x}|\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\]
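To make the mixture density concrete, the following base R sketch evaluates the weighted sum of Gaussian densities at a point. The function names `dmvnorm_sketch` and `dmixture_sketch` and all parameter values are illustrative assumptions; they are not estimates from the Old Faithful data.

```r
# Multivariate normal density at each row of x (base R only).
dmvnorm_sketch <- function(x, mu, Sigma) {
  x <- if (is.null(dim(x))) matrix(x, nrow = 1) else as.matrix(x)
  d <- length(mu)
  centered <- sweep(x, 2, mu)  # subtract the mean from every row
  # Squared Mahalanobis distance of every row
  maha <- rowSums((centered %*% solve(Sigma)) * centered)
  exp(-0.5 * maha) / sqrt((2 * pi)^d * det(Sigma))
}

# Mixture density: weighted sum of the component densities.
dmixture_sketch <- function(x, phi, mu, Sigma) {
  stopifnot(abs(sum(phi) - 1) < 1e-8,
            length(phi) == length(mu), length(phi) == length(Sigma))
  dens <- 0
  for (i in seq_along(phi)) {
    dens <- dens + phi[i] * dmvnorm_sketch(x, mu[[i]], Sigma[[i]])
  }
  dens
}

# Made-up parameters for K = 2 components in two dimensions.
phi   <- c(0.5, 0.5)
mu    <- list(c(0, 0), c(3, 3))
Sigma <- list(diag(2), diag(2))

dmixture_sketch(c(0, 0), phi, mu, Sigma)
```

Passing a matrix with one observation per row returns one density value per row, so the same function could be applied to a whole data set once parameters are available.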
In order to test our hypothesis we shall use a basic algorithm from the mclust library:
suppressMessages(library('mclust'))
faithfulMclust <- Mclust(faithful, G = 2)
plot(faithfulMclust, what = "classification")
summary(faithfulMclust)
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust VVV (ellipsoidal, varying volume, shape, and orientation) model with 2 components:
##
## log.likelihood n df BIC ICL
## -1130.264 272 11 -2322.192 -2322.695
##
## Clustering table:
## 1 2
## 175 97
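As a sanity check on the summary above, the reported BIC can be recomputed by hand. This sketch assumes the sign convention `2 * logLik - df * log(n)` (so that larger values are better) and copies the log-likelihood and sample size from the printed output; the variable names are illustrative.

```r
# Values copied from the summary output above.
log_lik <- -1130.264
n       <- 272
d       <- 2   # variables: eruptions, waiting
G       <- 2   # mixture components

# Free parameters of an unconstrained (VVV) mixture:
# (G - 1) mixing weights + G * d means + G * d * (d + 1) / 2 covariance entries.
df <- (G - 1) + G * d + G * d * (d + 1) / 2

# BIC in the convention assumed here (larger is better).
bic <- 2 * log_lik - df * log(n)

print(df)             # 11
print(round(bic, 2))  # -2322.19
```

Both numbers agree with the `df` and `BIC` columns of the summary, up to the rounding of the printed log-likelihood.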
… remainder of the analysis