DataCamp R Course

Descriptive Statistics

R provides a wide range of functions for obtaining summary statistics. One way to obtain descriptive statistics is to use the sapply() function, which applies a specified summary function to each variable (column) of a data frame.

# get means for variables in data frame mydata
# excluding missing values
sapply(mydata, mean, na.rm=TRUE)

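Functions you can pass to sapply() include mean, sd, var, min, max, median, range, and quantile. Because range and quantile return several values per variable, sapply() returns a matrix for them rather than a vector.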
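Here is a short sketch of how sapply() behaves with these functions. The data frame below is a made-up stand-in for mydata. The numeric columns are selected first, because mean() and similar functions return NA (with a warning) for a character column. Functions that return one value give a named vector. range and quantile give a matrix with one column per variable.

```r
# Hypothetical stand-in for "mydata": two numeric columns
# (one containing a missing value) and one character column
mydata <- data.frame(
  height = c(150, 160, NA, 170),
  weight = c(50, 60, 70, 80),
  group  = c("a", "b", "a", "b")
)

# Keep only the numeric columns before summarizing
num <- mydata[sapply(mydata, is.numeric)]

# One value per variable: sapply() simplifies to a named vector
means <- sapply(num, mean, na.rm = TRUE)
means

# Several values per variable: sapply() simplifies to a matrix
# (one column per variable, one row per returned statistic).
# Extra arguments such as probs and na.rm are passed on to the function.
ranges <- sapply(num, range, na.rm = TRUE)
quarts <- sapply(num, quantile, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
quarts
```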

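There are also numerous R functions that provide a range of descriptive statistics at once. For example:

# min, 1st quartile (25th percentile), median, mean,
# 3rd quartile (75th percentile), max, and the number of NAs if any
summary(mydata)

# Tukey min, lower-hinge, median, upper-hinge, max
# (x is a numeric vector)
fivenum(x)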
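The two functions can give different results. This sketch uses a made-up vector x to compare them. Tukey's hinges from fivenum() are not always equal to the quartiles that summary() takes from quantile() (default type 7). With an even number of values, as here, the two differ.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8)   # hypothetical example vector

s <- summary(x)    # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
s

f <- fivenum(x)    # Tukey: min, lower hinge, median, upper hinge, max
f                  # hinges are 2.5 and 6.5

# The quartiles reported by summary() come from quantile()
q <- quantile(x, c(0.25, 0.75))
q                  # 2.75 and 6.25, not the same as the hinges
```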

Using the Hmisc package

library(Hmisc)
describe(mydata)
# n, nmiss, unique, mean, 5th, 10th, 25th, 50th, 75th, 90th, 95th percentiles
# 5 lowest and 5 highest scores
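If Hmisc is not installed, base R can approximate most of what describe() reports for a numeric variable. The following is a rough sketch on a made-up vector. It is not Hmisc's implementation. In particular, the quantile method that Hmisc uses may differ from the default in quantile().

```r
x <- c(1:20, NA)   # hypothetical vector: 20 values plus one missing value

v     <- x[!is.na(x)]         # non-missing values
n     <- length(v)            # number of non-missing values
nmiss <- sum(is.na(x))        # number of missing values
nuniq <- length(unique(v))    # number of distinct values

# Selected percentiles (quantile() default method, type 7)
pct <- quantile(v, c(0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95))

vals    <- sort(unique(v))
lowest  <- head(vals, 5)      # 5 lowest distinct values
highest <- tail(vals, 5)      # 5 highest distinct values
```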

Using the pastecs package

library(pastecs)
stat.desc(mydata)
# nbr.val, nbr.null, nbr.na, min, max, range, sum,
# median, mean, SE.mean, CI.mean, var, std.dev, coef.var
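The less familiar stat.desc() entries can be worked out by hand. The sketch below assumes the definitions in the pastecs documentation as we understand them: missing values are excluded, nbr.null counts zeros, CI.mean is the half-width of a 95% t-interval for the mean, and coef.var is sd divided by mean. Compare the results against real stat.desc() output before relying on them.

```r
# Hypothetical vector with one zero and one missing value
x <- c(0, 2, 4, 6, 8, NA)

v <- x[!is.na(x)]                           # drop missing values
n <- length(v)

nbr.val  <- n                               # number of non-missing values
nbr.null <- sum(v == 0)                     # number of zero values
nbr.na   <- sum(is.na(x))                   # number of missing values
SE.mean  <- sd(v) / sqrt(n)                 # standard error of the mean
CI.mean  <- qt(0.975, df = n - 1) * SE.mean # half-width of a 95% CI for the mean
coef.var <- sd(v) / mean(v)                 # coefficient of variation

c(nbr.val = nbr.val, nbr.null = nbr.null, nbr.na = nbr.na,
  mean = mean(v), SE.mean = SE.mean, CI.mean = CI.mean, coef.var = coef.var)
```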

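Using the psych package

library(psych)
describe(mydata)
# item name, item number, nvalid, mean, sd,
# median, mad, min, max, skew, kurtosis, se
# (Hmisc also defines describe(); if both are loaded, call psych::describe(mydata))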
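The mad column is the median absolute deviation, which is a robust alternative to sd. This sketch uses a made-up vector with one outlier to show the difference with base R's mad(). By default, mad() scales the raw median absolute deviation by the constant 1.4826.

```r
# Hypothetical vector with one extreme value
x <- c(1, 2, 3, 4, 100)

sd(x)     # pulled up strongly by the outlier
mad(x)    # barely affected by it

# mad() by hand: the median is 3, the absolute deviations are 2, 1, 0, 1, 97,
# and their median is 1, so mad = 1.4826 * 1
manual <- 1.4826 * median(abs(x - median(x)))
```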

Summary Statistics by Group

The psych package provides a simple way to generate summary statistics by a grouping variable.

library(psych)
describeBy(mydata, group, ...)
# group is the grouping variable
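If psych is not available, base R's tapply() and aggregate() also compute statistics by group. This sketch uses the built-in mtcars data in place of mydata, with cyl (number of cylinders) as the grouping variable.

```r
# Mean mpg for each number of cylinders (a vector named "4", "6", "8")
m <- tapply(mtcars$mpg, mtcars$cyl, mean)
round(m, 2)

# Several statistics per group; aggregate() returns a data frame whose
# mpg column is a matrix with columns n, mean and sd
stats <- aggregate(mpg ~ cyl, data = mtcars,
                   FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x)))
stats
```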

The doBy package provides much of the functionality of SAS PROC SUMMARY. It defines the desired table using a model formula and a function. Here is a simple example.

library(doBy)
summaryBy(mpg + wt ~ cyl + vs, data = mtcars,
  FUN = function(x) { c(m = mean(x), s = sd(x)) } )
# produces mpg.m wt.m mpg.s wt.s for each
# combination of the levels of cyl and vs
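For comparison, base R's aggregate() can build a similar table without doBy. This sketch shows an alternative approach, not how summaryBy() works internally. The statistics end up in matrix columns (mpg[, "m"], mpg[, "s"], and so on) instead of the separate mpg.m-style columns that summaryBy() creates. Only the cyl and vs combinations that occur in the data appear. A group with a single car gets an NA standard deviation.

```r
# Mean and sd of mpg and wt for each observed combination of cyl and vs
res <- aggregate(cbind(mpg, wt) ~ cyl + vs, data = mtcars,
                 FUN = function(x) c(m = mean(x), s = sd(x)))
res

# Pull out one value: mean mpg for 8-cylinder cars with vs = 0
res$mpg[res$cyl == 8 & res$vs == 0, "m"]
```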

See also: aggregating data.
