# Bootstrapping

## Nonparametric Bootstrapping

The boot package provides extensive facilities for bootstrapping and related resampling methods. You can bootstrap a single statistic (e.g., a median) or a vector of statistics (e.g., regression weights). This section will get you started with basic nonparametric bootstrapping.

The main bootstrapping function is boot(), which has the following format:

bootobject <- boot(data= , statistic= , R= , ...)

where

| parameter | description |
| --- | --- |
| data | A vector, matrix, or data frame |
| statistic | A function that produces the k statistics to be bootstrapped (k=1 if bootstrapping a single statistic). The function should include an indices parameter that the boot() function can use to select cases for each replication (see examples below). |
| R | Number of bootstrap replicates |
| ... | Additional parameters to be passed to the function that produces the statistic of interest |

boot() calls the statistic function R times. Each time, it generates a set of random indices, drawn with replacement, from the integers 1:nrow(data) (or 1:length(data) when data is a vector). These indices are used within the statistic function to select a sample. The statistics are calculated on the sample and the results are accumulated in the bootobject. The bootobject structure includes:
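The resampling loop described above can be sketched by hand. This is only an illustration of the idea, not the package's actual implementation; the function name med_mpg, the seed, and the small replicate count are arbitrary choices made for this sketch.

```r
# Hand-rolled sketch of the resampling loop (illustrative, not boot's own code)
set.seed(123)                       # arbitrary seed, for reproducibility only

# hypothetical statistic with the (data, indices) signature boot() expects
med_mpg <- function(data, indices) {
  median(data[indices, "mpg"])
}

R <- 200                            # number of replicates, kept small for speed
n <- nrow(mtcars)

# observed statistic: the original rows, in order
t0 <- med_mpg(mtcars, seq_len(n))

# each replicate draws n row indices with replacement and recomputes the statistic
t <- vapply(seq_len(R), function(i) {
  idx <- sample(seq_len(n), size = n, replace = TRUE)
  med_mpg(mtcars, idx)
}, numeric(1))

t0       # observed median
sd(t)    # bootstrap estimate of the standard error
```

In practice boot() does this bookkeeping for you and also stores what print(), plot(), and boot.ci() need.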

| element | description |
| --- | --- |
| t0 | The observed values of the k statistics applied to the original data. |
| t | An R x k matrix where each row is a bootstrap replicate of the k statistics. |

You can access these as bootobject\$t0 and bootobject\$t.
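As a small illustration of these two elements, the sketch below bootstraps the median of a single vector. The statistic name med, the object name b, and the seed are assumptions made for this example.

```r
library(boot)
set.seed(42)                        # arbitrary seed

# hypothetical statistic: median of the resampled elements of a vector
med <- function(x, indices) median(x[indices])

b <- boot(data = mtcars$mpg, statistic = med, R = 500)

b$t0         # observed median of the original data
dim(b$t)     # one row per replicate, one column per statistic
head(b$t)    # first few bootstrap replicates
```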

Once you generate the bootstrap samples, print(bootobject) and plot(bootobject) can be used to examine the results. If the results look reasonable, you can use the boot.ci() function to obtain confidence intervals for the statistic(s).

The format is

boot.ci(bootobject, conf= , type= )

where

| parameter | description |
| --- | --- |
| bootobject | The object returned by the boot function |
| conf | The desired confidence level (default: conf=0.95) |
| type | The type of confidence interval returned. Possible values are "norm", "basic", "stud", "perc", "bca" and "all" (default: type="all") |
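A minimal sketch of the conf and type arguments follows. The statistic mean_mpg and the seed are illustrative choices, not part of the package.

```r
library(boot)
set.seed(1)                         # arbitrary seed

# hypothetical statistic: mean mpg of the resampled rows
mean_mpg <- function(data, indices) mean(data[indices, "mpg"])

b <- boot(data = mtcars, statistic = mean_mpg, R = 1000)

# request only the normal-approximation and percentile intervals, at 90%
ci <- boot.ci(b, conf = 0.90, type = c("norm", "perc"))
ci

# the limits are stored in the returned object; for the percentile interval
# the last two entries of ci$percent are the lower and upper limits
ci$percent[4:5]
```

Requesting only the interval types you need also avoids the warnings that type="all" produces when the studentized interval cannot be computed.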

### Bootstrapping a Single Statistic (k=1)

The following example generates the bootstrapped 95% confidence interval for R-squared in the linear regression of miles per gallon (mpg) on car weight (wt) and displacement (disp). The data source is mtcars. The bootstrapped confidence interval is based on 1000 replications.

```
# Bootstrap 95% CI for R-Squared
library(boot)

# function to obtain R-Squared from the data
rsq <- function(formula, data, indices) {
  d <- data[indices,] # allows boot to select sample
  fit <- lm(formula, data=d)
  return(summary(fit)$r.squared)
}

# bootstrapping with 1000 replications
results <- boot(data=mtcars, statistic=rsq,
   R=1000, formula=mpg~wt+disp)

# view results
results
plot(results)

# get 95% confidence interval
boot.ci(results, type="bca")
```

### Bootstrapping Several Statistics (k>1)

In the example above, the function rsq returned a single number and boot.ci returned a single confidence interval. The statistic function you provide can also return a vector. In the next example we get the 95% CI for the three model regression coefficients (intercept, car weight, displacement). In this case we add an index parameter to plot() and boot.ci() to indicate which column in bootobject\$t is to be analyzed.

```
# Bootstrap 95% CI for regression coefficients
library(boot)

# function to obtain regression weights
bs <- function(formula, data, indices) {
  d <- data[indices,] # allows boot to select sample
  fit <- lm(formula, data=d)
  return(coef(fit))
}

# bootstrapping with 1000 replications
results <- boot(data=mtcars, statistic=bs,
   R=1000, formula=mpg~wt+disp)

# view results
results
plot(results, index=1) # intercept
plot(results, index=2) # wt
plot(results, index=3) # disp

# get 95% confidence intervals
boot.ci(results, type="bca", index=1) # intercept
boot.ci(results, type="bca", index=2) # wt
boot.ci(results, type="bca", index=3) # disp
```
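When there are many statistics, calling boot.ci() once per index gets repetitive. One possible approach, sketched here with percentile intervals and a hypothetical ci_table object, is to loop over the indices and collect the limits in a matrix.

```r
library(boot)
set.seed(99)                        # arbitrary seed

# same statistic as above: regression coefficients from the resampled rows
bs <- function(formula, data, indices) {
  coef(lm(formula, data = data[indices, ]))
}
results <- boot(data = mtcars, statistic = bs, R = 1000,
                formula = mpg ~ wt + disp)

# percentile interval for each coefficient, gathered into one matrix
ci_table <- t(sapply(seq_along(results$t0), function(i) {
  boot.ci(results, type = "perc", index = i)$percent[4:5]
}))
dimnames(ci_table) <- list(names(results$t0), c("lower", "upper"))
ci_table
```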

## Going Further

The boot() function supports both nonparametric and parametric resampling. For the nonparametric bootstrap, the available resampling methods are ordinary, balanced, antithetic, and permutation, and stratified resampling is supported. Importance resampling weights can also be specified.
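As a rough sketch of two of these options, the example below uses the strata and sim arguments of boot(). Stratifying by number of cylinders is an arbitrary choice for illustration, as are the statistic mean_mpg and the seed.

```r
library(boot)
set.seed(2024)                      # arbitrary seed

# hypothetical statistic: mean mpg of the resampled rows
mean_mpg <- function(data, indices) mean(data[indices, "mpg"])

# stratified: rows are resampled within each cylinder group,
# so every replicate keeps the original group sizes
b_strat <- boot(data = mtcars, statistic = mean_mpg, R = 500,
                strata = factor(mtcars$cyl))

# balanced: each row appears exactly R times across all replicates combined
b_bal <- boot(data = mtcars, statistic = mean_mpg, R = 500,
              sim = "balanced")

b_strat
b_bal
```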

The boot.ci() function takes a bootobject and generates five types of two-sided nonparametric confidence intervals: the first-order normal approximation, the basic bootstrap interval, the studentized bootstrap interval, the bootstrap percentile interval, and the adjusted bootstrap percentile (BCa) interval.
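The studentized interval needs a variance estimate for every replicate. One way to supply it, sketched here, is to have the statistic function return the estimate followed by its variance; when the statistic has two or more elements, boot.ci() by default treats the first as the statistic and the second as its variance. The function mean_var and the use of s^2/n as the variance of a mean are assumptions made for this example.

```r
library(boot)
set.seed(7)                         # arbitrary seed

# hypothetical statistic: the mean and an estimate of its variance (s^2 / n)
mean_var <- function(data, indices) {
  x <- data[indices, "mpg"]
  c(mean(x), var(x) / length(x))
}

b <- boot(data = mtcars, statistic = mean_var, R = 1000)

# all five interval types; "stud" uses the second element as the variance
ci <- boot.ci(b, type = c("norm", "basic", "stud", "perc", "bca"))
ci
```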

Look at help(boot), help(boot.ci), and help(plot.boot) for more details.

## Learning More

Good sources of information include Resampling Methods in R: The boot Package by Angelo Canty, Getting started with the boot package by Ajay Shah, Bootstrapping Regression Models by John Fox, and Bootstrap Methods and Their Application by Davison and Hinkley.
