# Missing Data

In R, missing values are represented by the symbol **NA** (not available). Impossible values (e.g., dividing by zero) are represented by the symbol **NaN** (not a number). Unlike SAS, R uses the same symbol for character and numeric data.

For more practice on working with missing data, try this course on cleaning data in R.

## Testing for Missing Values

`is.na(x) # returns TRUE of x is missing`

y <- c(1,2,3,NA)

is.na(y) # returns a vector (F F F T)

## Recoding Values to Missing

`# recode 99 to missing for variable v1`

# select rows where v1 is 99 and recode column v1

mydata$v1[mydata$v1==99] <- NA

## Excluding Missing Values from Analyses

Arithmetic functions on missing values yield missing values.

`x <- c(1,2,NA,3)`

mean(x) # returns NA

mean(x, na.rm=TRUE) # returns 2

The function **complete.cases()** returns a logical vector indicating which cases are complete.

`# list rows of data that have missing values `

mydata[!complete.cases(mydata),]

The function **na.omit()** returns the object with listwise deletion of missing values.

`# create new dataset without missing data `

newdata <- na.omit(mydata)

## Advanced Handling of Missing Data

Most modeling functions in R offer options for dealing with missing values. You can go beyond pairwise of listwise deletion of missing values through methods such as multiple imputation. Good implementations that can be accessed through R include Amelia II, Mice, and mitools.