# Histograms and Density Plots

## Histograms

You can create histograms with the function **hist(***x***)** where *x* is a numeric vector of values to be plotted. The option ** freq=FALSE** plots probability densities instead of frequencies. The option **breaks=** controls the number of bins.

`# Simple Histogram`

hist(mtcars$mpg)

`# Colored Histogram with Different Number of Bins`

hist(mtcars$mpg, breaks=12, col="red")

`# Add a Normal Curve (Thanks to Peter Dalgaard)`

x <- mtcars$mpg

h<-hist(x, breaks=10, col="red", xlab="Miles Per Gallon",

main="Histogram with Normal Curve")

xfit<-seq(min(x),max(x),length=40)

yfit<-dnorm(xfit,mean=mean(x),sd=sd(x))

yfit <- yfit*diff(h$mids[1:2])*length(x)

lines(xfit, yfit, col="blue", lwd=2)

Histograms can be a poor method for determining the shape of a distribution because it is so strongly affected by the number of bins used.

## Kernel Density Plots

Kernal density plots are usually a much more effective way to view the distribution of a variable. Create the plot using **plot(density(***x***)) **where* x *is a numeric vector.

`# Kernel Density Plot`

d <- density(mtcars$mpg) # returns the density data

plot(d) # plots the results

`# Filled Density Plot`

d <- density(mtcars$mpg)

plot(d, main="Kernel Density of Miles Per Gallon")

polygon(d, col="red", border="blue")

## Comparing Groups VIA Kernal Density

The **sm.density.compare( ) **function in the **sm **package allows you to superimpose the kernal density plots of two or more groups. The format is **sm.density.compare(***x***, ***factor***)** where *x* is a numeric vector and *factor* is the grouping variable.

`# Compare MPG distributions for cars with `

#
4,6, or 8 cylinders

library(sm)

attach(mtcars)

# create value labels

cyl.f <- factor(cyl, levels= c(4,6,8),

labels = c("4 cylinder", "6 cylinder", "8 cylinder"))

# plot densities

sm.density.compare(mpg, cyl, xlab="Miles Per Gallon")

title(main="MPG Distribution by Car Cylinders")

# add legend via mouse click

colfill<-c(2:(2+length(levels(cyl.f))))

legend(locator(1), levels(cyl.f), fill=colfill)

## To Practice

Try the histograms and distributions section of this intro to statistics in R course.