# Subsetting Data

**R** has powerful indexing features for accessing object elements. These features can be used to select and exclude variables and observations. The following code snippets demonstrate ways to keep or delete variables and observations and to take random samples from a dataset.

## Selecting (Keeping) Variables

`# select variables v1, v2, v3`

`myvars <- c("v1", "v2", "v3")`

`newdata <- mydata[myvars]`

`# another method`

`myvars <- paste("v", 1:3, sep="")`

`newdata <- mydata[myvars]`

`# select 1st and 5th through 10th variables`

`newdata <- mydata[c(1,5:10)]`
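As a minimal sketch of the selections above, the following builds a small hypothetical data frame and inspects what each form returns. The column names `v1` to `v10`, the values, and the names `by_name` and `by_pos` are assumptions made purely for illustration.

```r
# Hypothetical data frame with ten variables v1..v10 (illustrative values only)
mydata <- as.data.frame(matrix(1:30, nrow = 3))
names(mydata) <- paste("v", 1:10, sep = "")

# select by name
by_name <- mydata[c("v1", "v2", "v3")]

# select by position: 1st and 5th through 10th variables
by_pos <- mydata[c(1, 5:10)]

names(by_name)  # the three requested columns
names(by_pos)   # v1 followed by v5 through v10
```

Indexing a data frame with a single index inside `[ ]` always returns a data frame, even when only one column is selected, whereas `mydata[, "v1"]` returns a plain vector by default.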

## Excluding (Dropping) Variables

`# exclude variables v1, v2, v3`

`myvars <- names(mydata) %in% c("v1", "v2", "v3")`

`newdata <- mydata[!myvars]`

`# exclude 3rd and 5th variables`

`newdata <- mydata[c(-3,-5)]`

`# delete variables v3 and v5 (this modifies mydata itself)`

`mydata$v3 <- mydata$v5 <- NULL`
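The three exclusion styles above can be compared side by side in a short sketch. The five-column data frame and the names `drop`, `kept_mask`, `kept_setdiff`, `kept_pos`, and `copy` are hypothetical; `setdiff()` is shown as an alternative to the `%in%` mask, not as part of the original examples.

```r
# Hypothetical data frame with five variables (illustrative values only)
mydata <- data.frame(v1 = 1:3, v2 = 4:6, v3 = 7:9, v4 = 10:12, v5 = 13:15)

# logical mask: TRUE marks the columns to drop, so negate it to keep the rest
drop <- names(mydata) %in% c("v1", "v2", "v3")
kept_mask <- mydata[!drop]

# alternative: keep the names that are not in the drop list
kept_setdiff <- mydata[setdiff(names(mydata), c("v1", "v2", "v3"))]

# negative positions drop the 3rd and 5th columns
kept_pos <- mydata[c(-3, -5)]

# assigning NULL removes columns in place, so work on a copy here
copy <- mydata
copy$v3 <- copy$v5 <- NULL

names(kept_mask)  # v4, v5
names(kept_pos)   # v1, v2, v4
names(copy)       # v1, v2, v4
```

Negative indices only work with positions. To exclude by name, use a logical mask or `setdiff()` as sketched here.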

## Selecting Observations

`# first 5 observations`

`newdata <- mydata[1:5,]`

`# based on variable values`

`newdata <- mydata[ which(mydata$gender=='F' & mydata$age > 65), ]`

`# or, attaching mydata so its columns can be referenced by name`

`attach(mydata)`

`newdata <- mydata[ which(gender=='F' & age > 65),]`

`detach(mydata)`
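The examples above wrap the condition in `which()`. The sketch below, on a hypothetical data frame containing missing values, suggests one reason this can matter: a bare logical index keeps an all-`NA` row for every `NA` in the condition, while `which()` returns only the positions that are `TRUE`. The data values and the names `cond`, `with_which`, `without_which`, and `via_with` are assumptions for illustration. `with()` is shown as one possible alternative to `attach()`/`detach()`.

```r
# Hypothetical data with missing values (illustrative only)
mydata <- data.frame(
  gender = c("F", "M", "F", NA, "F"),
  age    = c(70, 80, 40, 90, NA)
)

# the condition evaluates to TRUE, FALSE, FALSE, NA, NA
cond <- mydata$gender == "F" & mydata$age > 65

# which() keeps only the positions where the condition is TRUE
with_which <- mydata[which(cond), ]

# a bare logical index also returns one all-NA row per NA in the condition
without_which <- mydata[cond, ]

# with() evaluates the condition inside mydata without attaching it
via_with <- with(mydata, mydata[which(gender == "F" & age > 65), ])

nrow(with_which)     # 1
nrow(without_which)  # 3
```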

## Selection using the Subset Function

The **subset( )** function is the easiest way to select variables and observations. In the following example, we select all rows that have a value of age greater than or equal to 20 or age less than 10. We keep the ID and Weight columns.

`# using subset function`

`newdata <- subset(mydata, age >= 20 | age < 10, select=c(ID, Weight))`

In the next example, we select all men over the age of 25 and we keep variables weight *through* income (weight, income and all columns between them).

`# using subset function (part 2)`

`newdata <- subset(mydata, sex=="m" & age > 25, select=weight:income)`
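Both `subset()` calls can be tried on a small hypothetical data frame, as in the sketch below. The column layout (`ID`, `sex`, `age`, `weight`, `height`, `income`) and all values are assumptions. The lowercase `weight` is used in both calls so that a single data frame serves both examples.

```r
# Hypothetical data (illustrative only); height sits between weight and income
mydata <- data.frame(
  ID     = 1:6,
  sex    = c("m", "f", "m", "m", "f", "m"),
  age    = c(5, 30, 15, 40, 22, 26),
  weight = c(20, 60, 50, 80, 55, 75),
  height = c(110, 165, 160, 180, 170, 178),
  income = c(0, 42000, 0, 61000, 38000, 45000)
)

# rows with age >= 20 or age < 10, keeping two named columns
young_or_adult <- subset(mydata, age >= 20 | age < 10, select = c(ID, weight))

# men over 25, keeping the contiguous column range weight through income
men_over_25 <- subset(mydata, sex == "m" & age > 25, select = weight:income)

young_or_adult$ID   # 1 2 4 5 6
names(men_over_25)  # weight, height, income
```

Unlike a bare logical index, `subset()` treats an `NA` in the condition as `FALSE`, so rows with missing values in the tested columns are dropped rather than returned as `NA` rows.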

## Random Samples

Use the **sample( )** function to take a **random sample of size n** from a dataset.

`# take a random sample of size 50 from a dataset mydata`

`# sample without replacement`

`mysample <- mydata[sample(1:nrow(mydata), 50, replace=FALSE),]`
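A reproducible version of the sampling step might look like the sketch below. The 200-row data frame, its column names, and the seed value are assumptions for illustration. `set.seed()` is added only so that repeated runs draw the same rows.

```r
# fix the random number generator so the draw is reproducible (seed is arbitrary)
set.seed(42)

# Hypothetical dataset with 200 observations (illustrative only)
mydata <- data.frame(id = 1:200, score = rnorm(200))

# draw 50 distinct row indices, then use them to subset the rows
idx <- sample(nrow(mydata), 50, replace = FALSE)
mysample <- mydata[idx, ]

# with replace = TRUE the same row may be drawn more than once
resample <- mydata[sample(nrow(mydata), 50, replace = TRUE), ]

nrow(mysample)              # 50
anyDuplicated(mysample$id)  # 0, since sampling was without replacement
```

Passing a single positive integer to `sample()` draws from `1:n`, so `sample(nrow(mydata), 50)` is equivalent to `sample(1:nrow(mydata), 50)` whenever the data frame has at least one row.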

## Going Further

**R** has extensive facilities for sampling, including drawing and calibrating survey samples (see the sampling package), analyzing complex survey data (see the survey package and its homepage), and bootstrapping.