Introduction

Old Faithful is a cone geyser in Yellowstone National Park, Wyoming, United States. It is one of the most predictable geographical features on Earth, erupting almost every 91 minutes (source: http://en.wikipedia.org/wiki/Old_Faithful).


In this R Markdown document we briefly explore some properties of the Old Faithful geyser data.

Data analysis

Data format

Let’s have a look at the data we are using (272 observations of 2 variables in total):

eruptions waiting
3.600 79
1.800 54
3.333 74
2.283 62
4.533 85
2.883 55

The meaning of the variables is summarized below:

  • eruptions: Eruption time in mins
  • waiting: Waiting time to next eruption (in mins)
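
The preview and the dimensions quoted above can be reproduced with a few base R calls. This is a minimal sketch, assuming the data are the built-in `faithful` data frame from R's `datasets` package (the same object passed to `Mclust` later in this document):

```r
# Load the built-in Old Faithful dataset (part of the 'datasets' package)
data(faithful)

# Number of observations (rows) and variables (columns)
dim(faithful)

# First six rows, as in the preview table
head(faithful)

# Five-number summary and mean of each variable
summary(faithful)
```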

Data distribution

A simple scatter plot of the data suggests that there are two clusters.
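
A scatter plot of this kind could be drawn as follows. The axis labels, title, and plotting symbols are illustrative choices, not necessarily those of the original figure:

```r
data(faithful)

# Scatter plot of waiting time against eruption duration
plot(faithful$eruptions, faithful$waiting,
     xlab = "Eruption time (min)",
     ylab = "Waiting time to next eruption (min)",
     main = "Old Faithful eruptions",
     pch = 19, col = "steelblue")
```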

Also, plotting the marginal distributions of the variables suggests that the data can be approximated by a mixture of 2D Gaussians.
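
One way to look at the marginal distributions is a histogram with a kernel density estimate overlaid for each variable. The sketch below uses base R only; the number of breaks is an arbitrary choice made for illustration:

```r
data(faithful)

# Draw the two marginals side by side, remembering the previous layout
old_par <- par(mfrow = c(1, 2))
for (v in names(faithful)) {
  # Histogram on the density scale so the overlaid curve is comparable
  hist(faithful[[v]], breaks = 20, freq = FALSE,
       main = paste("Marginal of", v), xlab = v)
  # Kernel density estimate of the same variable
  lines(density(faithful[[v]]), lwd = 2)
}
par(old_par)  # restore the previous plotting layout
```

If both marginals show two modes, that is consistent with a two-component mixture, although it does not prove it.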

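Thus, we hypothesize that the data \(\mathbf{x}\) follow a mixture of \(K\) Gaussian components: \[\mathbf{x}\sim \sum_{i=1}^{K} \phi_i\cdot\mathcal{N}(\mathbf{x}|\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\] Here \(\phi_i\) are the mixing weights (non-negative and summing to one), and \(\boldsymbol{\mu}_i\) and \(\boldsymbol{\Sigma}_i\) are the mean vector and covariance matrix of component \(i\).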
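
To make the mixture formula concrete, the sketch below evaluates such a density in base R. The function names `gauss_density` and `mixture_density` are hypothetical helpers written for this illustration, and the parameter values are rough guesses read off by eye, not fitted estimates:

```r
# Density of a d-dimensional Gaussian, evaluated at each row of x
gauss_density <- function(x, mu, Sigma) {
  x <- as.matrix(x)
  d <- ncol(x)
  centered <- sweep(x, 2, mu)  # subtract the mean from every row
  # Squared Mahalanobis distance of each row to the mean
  maha <- rowSums((centered %*% solve(Sigma)) * centered)
  exp(-0.5 * maha) / sqrt((2 * pi)^d * det(Sigma))
}

# Mixture density: weighted sum of the K component densities
mixture_density <- function(x, phi, mus, Sigmas) {
  stopifnot(length(phi) == length(mus),
            length(phi) == length(Sigmas),
            abs(sum(phi) - 1) < 1e-8)
  dens <- 0
  for (i in seq_along(phi)) {
    dens <- dens + phi[i] * gauss_density(x, mus[[i]], Sigmas[[i]])
  }
  dens
}

# Made-up parameters for K = 2 components (assumptions, for illustration only)
phi    <- c(0.4, 0.6)
mus    <- list(c(2, 55), c(4.3, 80))
Sigmas <- list(diag(c(0.1, 35)), diag(c(0.2, 36)))

# Mixture density at the first three observations
mixture_density(faithful[1:3, ], phi, mus, Sigmas)
```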

Clustering

To test our hypothesis, we use a basic algorithm from the mclust library:

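# Load mclust quietly (suppresses the startup banner)
suppressMessages(library('mclust'))
# Fit a Gaussian mixture model with G = 2 components via EM
faithfulMclust <- Mclust(faithful, G = 2)
# Plot the data coloured by assigned component
plot(faithfulMclust, what = "classification")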

summary(faithfulMclust)
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm 
## ----------------------------------------------------
## 
## Mclust VVV (ellipsoidal, varying volume, shape, and orientation) model with 2 components:
## 
##  log.likelihood   n df       BIC       ICL
##       -1130.264 272 11 -2322.192 -2322.695
## 
## Clustering table:
##   1   2 
## 175  97
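
Beyond the summary, the fitted object can be inspected directly. The sketch below refits the same model and extracts the estimated counterparts of \(\phi_i\), \(\boldsymbol{\mu}_i\) and \(\boldsymbol{\Sigma}_i\). The variable name `fit` is an illustrative choice, and no output is shown because the exact values depend on the installed mclust version:

```r
suppressMessages(library(mclust))

# Refit the two-component model so this block is self-contained
fit <- Mclust(faithful, G = 2)

# Estimated mixing proportions (the phi_i of the mixture formula)
fit$parameters$pro

# Estimated component means, one column per component
fit$parameters$mean

# Estimated covariance matrices, stored as a 2 x 2 x 2 array
fit$parameters$variance$sigma

# Hard cluster assignments and the per-observation assignment uncertainty
table(fit$classification)
head(fit$uncertainty)
```

Note that fixing `G = 2` fits the hypothesized model rather than testing it against alternatives.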

… remainder of the analysis