# Descriptive Statistics

R provides a wide range of functions for obtaining summary statistics. One method of obtaining descriptive statistics is to use the **sapply( )** function with a specified summary statistic.

```r
# get means for variables in data frame mydata
# excluding missing values
sapply(mydata, mean, na.rm=TRUE)
```

Possible functions used in sapply include **mean, sd, var, min, max, median, range, and quantile**.
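As a minimal sketch of how this behaves, the example below builds a small made-up data frame (the `height` and `weight` columns and their values are assumptions for illustration only) and shows the effect of `na.rm=TRUE`, as well as what happens when the supplied function returns more than one value.

```r
# Hypothetical data frame with one missing value (illustrative values only)
mydata <- data.frame(
  height = c(150, 160, NA, 180),
  weight = c(50, 60, 70, 80)
)

# Without na.rm=TRUE, any column containing NA yields NA
sapply(mydata, mean)

# Arguments after the function name are passed on to that function
means <- sapply(mydata, mean, na.rm = TRUE)

# A function that returns several values produces a matrix,
# with one column per variable and one row per statistic
stats <- sapply(mydata, function(x) {
  c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
})
```

Functions such as range and quantile return more than one value per variable, so their results come back as a matrix in the same way as `stats` above.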

There are also numerous R functions designed to provide a range of descriptive statistics at once. For example:

```r
# mean, median, 1st and 3rd quartiles, min, max
summary(mydata)

# Tukey min, lower-hinge, median, upper-hinge, max
fivenum(x)
```
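Tukey's hinges are not always the same as the quartiles reported by summary( ), which relies on quantile( ) with its default settings. The short sketch below uses an arbitrary vector, chosen only for illustration, to show the difference.

```r
# Arbitrary example vector with an even number of values
x <- 1:8

# Tukey five-number summary: hinges are medians of the lower and upper halves
h <- fivenum(x)
h   # 1.0 2.5 4.5 6.5 8.0

# Default quantiles (type 7), which summary() also reports
q <- quantile(x)
q   # 0% = 1.00, 25% = 2.75, 50% = 4.50, 75% = 6.25, 100% = 8.00
```

The minimum, median, and maximum agree, but the hinges (2.5 and 6.5) differ from the quartiles (2.75 and 6.25) for this vector.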

Using the Hmisc package

```r
library(Hmisc)
describe(mydata)
# n, nmiss, unique, mean, 5,10,25,50,75,90,95th percentiles
# 5 lowest and 5 highest scores
```

Using the **pastecs** package

```r
library(pastecs)
stat.desc(mydata)
# nbr.val, nbr.null, nbr.na, min, max, range, sum,
# median, mean, SE.mean, CI.mean, var, std.dev, coef.var
```

Using the psych package

```r
library(psych)
describe(mydata)
# item name, item number, nvalid, mean, sd,
# median, mad, min, max, skew, kurtosis, se
```

## Summary Statistics by Group

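A simple way of generating summary statistics by a grouping variable is available in the psych package. Current versions of psych name this function describeBy( ), which replaces the older describe.by( ).

```r
library(psych)
describeBy(mydata, group,...)
```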

The doBy package provides much of the functionality of SAS PROC SUMMARY. It defines the desired table using a model formula and a function. Here is a simple example.

```r
library(doBy)
summaryBy(mpg + wt ~ cyl + vs, data = mtcars,
          FUN = function(x) {
            c(m = mean(x), s = sd(x))
          })
# produces mpg.m wt.m mpg.s wt.s for each
# combination of the levels of cyl and vs
```
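For comparison, a similar grouped table can be sketched in base R with aggregate( ), which accepts the same kind of model formula. This is offered as a rough equivalent rather than a replacement for summaryBy( ); the names `group_means` and `group_stats` are chosen here for illustration.

```r
# Mean of mpg and wt for each combination of cyl and vs
group_means <- aggregate(cbind(mpg, wt) ~ cyl + vs, data = mtcars, FUN = mean)
group_means

# A function returning two values: each result column becomes a
# two-column matrix (m and s) rather than separate mpg.m, mpg.s columns
group_stats <- aggregate(cbind(mpg, wt) ~ cyl + vs, data = mtcars,
                         FUN = function(x) c(m = mean(x), s = sd(x)))
group_stats
```

Only combinations of cyl and vs that actually occur in mtcars appear in the result, and the standard deviation is NA for any group containing a single car.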

**See also**: aggregating data.
