Creating new variables

Use the assignment operator <- to create new variables. The right-hand side can use any of R's wide array of operators and functions.

# Three examples for doing the same computations

mydata$sum <- mydata$x1 + mydata$x2
mydata$mean <- (mydata$x1 + mydata$x2)/2

attach(mydata)
mydata$sum <- x1 + x2
mydata$mean <- (x1 + x2)/2
detach(mydata)

mydata <- transform(mydata,
  sum = x1 + x2,
  mean = (x1 + x2)/2
)

Recoding variables

In order to recode data, you will probably use one or more of R's control structures.

# create 2 age categories

mydata$agecat <- ifelse(mydata$age > 70, "older", "younger")

# another example: create 3 age categories

attach(mydata)
mydata$agecat[age > 75] <- "Elder"
mydata$agecat[age > 45 & age <= 75] <- "Middle Aged"
mydata$agecat[age <= 45] <- "Young"
detach(mydata)

Renaming variables

You can rename variables programmatically or interactively.

# rename interactively
fix(mydata) # results are saved on close

# rename programmatically
library(reshape)
mydata <- rename(mydata, c(oldname="newname"))

# you can re-enter all the variable names in order,
# changing the ones you need to change. The limitation
# is that you need to enter all of them!
names(mydata) <- c("x1","age","y", "ses")
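
If only one column needs a new name, base R can target it by its current name, which avoids retyping every name. The sketch below assumes a small made-up data frame; oldname and newname are placeholders, as in the rename() example.

```r
# Hypothetical data frame for illustration only
mydata <- data.frame(x1 = 1:3, oldname = c(25, 50, 80))

# Replace just the matching name; all other names are left untouched
names(mydata)[names(mydata) == "oldname"] <- "newname"

names(mydata)   # "x1" "newname"
```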

Variable types in R

R supports a diverse range of variable types, each tailored to a specific kind of data, including numeric, character, logical, and factor.

When creating new variables, it's essential to ensure they are of the appropriate type for your analysis. If unsure, you can use the class() function to check a variable's type.
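
As a quick illustration of class(), the sketch below creates one made-up variable of each common type and checks it. The variable names and values are invented for the example.

```r
# Made-up values, one per common variable type
age    <- 42.5                        # numeric
name   <- "Ana"                       # character
smoker <- FALSE                       # logical
group  <- factor(c("a", "b", "a"))    # factor
visit  <- as.Date("2024-01-15")       # Date

class(age)      # "numeric"
class(name)     # "character"
class(smoker)   # "logical"
class(group)    # "factor"
class(visit)    # "Date"
```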

Checking and changing variable types

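Ensuring your variables are of the correct type is crucial for accurate analysis. Check a variable's type with class(), and change it with a conversion function such as as.numeric(), as.character(), or as.logical().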
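
A minimal sketch of checking and converting types, using made-up values. It also shows two common pitfalls: text that cannot be parsed becomes NA, and factors need to go through as.character() before as.numeric().

```r
# Numbers that were read in as text (made-up values)
score_chr <- c("10", "20", "n/a")
class(score_chr)                  # "character"

# Convert to numeric; "n/a" cannot be parsed, so it becomes NA
# (R also emits a warning, silenced here to keep the output clean)
score_num <- suppressWarnings(as.numeric(score_chr))
score_num                         # 10 20 NA
is.numeric(score_num)             # TRUE

# Factors store integer codes internally, so convert via character
# to recover the values shown in the labels
f <- factor(c("3", "7", "3"))
as.numeric(f)                     # 1 2 1 (the codes, usually not what you want)
as.numeric(as.character(f))       # 3 7 3
```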

Variable scope

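Understanding the scope of a variable is essential. A variable created inside a function is local to that function, while a variable created at the top level of a script or session is global.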
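
The following sketch illustrates the difference between local and global variables. The function add_bonus() and the variable names are hypothetical, chosen only for the example.

```r
total <- 100   # global: defined at the top level of the session

add_bonus <- function(x) {
  bonus <- 5           # local: exists only while add_bonus() runs
  total <- x + bonus   # creates a local `total`; the global one is untouched
  total
}

add_bonus(10)      # 15
total              # still 100
exists("bonus")    # FALSE: the local variable is not visible out here
```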

Using variables with functions

Variables play a central role when working with functions: you pass them in as arguments and assign the returned value to a new variable.
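
As a small sketch of how variables and functions interact, the code below passes a variable to a built-in function and to a user-defined one. The helper to_metres() and the data are made up for illustration.

```r
# Made-up measurements
heights_cm <- c(160, 172, 181)

# Pass a variable to a built-in function and store the result in a new variable
avg_height <- mean(heights_cm)

# A hypothetical helper: `cm` is a variable local to the function
to_metres <- function(cm) {
  cm / 100
}

heights_m <- to_metres(heights_cm)
heights_m     # 1.60 1.72 1.81
avg_height    # 171
```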

Variable operations

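Depending on their type, you can perform various operations on variables, such as arithmetic on numeric variables, comparisons and logical tests, and text manipulation on character variables.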
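
The sketch below shows a few typical operations for each type, using only base R functions and made-up values.

```r
x <- c(2, 4, 6)                    # numeric
x * 2                              # arithmetic: 4 8 12
sum(x)                             # 12

first <- "data"                    # character
paste(first, "frame")              # "data frame"
toupper(first)                     # "DATA"

flag <- x > 3                      # logical, from a comparison: FALSE TRUE TRUE
sum(flag)                          # TRUE counts as 1, so this is 2

grp <- factor(c("a", "b", "a"))    # factor
table(grp)                         # counts per level: a = 2, b = 1
```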

Recoding variables

Recoding involves changing the values of a variable based on certain conditions. For instance, you might want to group ages into categories like "young", "middle-aged", and "senior". R offers various control structures to facilitate this process. When recoding, always ensure that the new categories or values make logical sense and serve the purpose of your analysis.
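
One alternative to a chain of conditional assignments is base R's cut(), which bins a numeric variable in a single call and returns a factor. This is a sketch with made-up ages; the cutoffs of 45 and 75 mirror the three-category example earlier in this article.

```r
# Made-up ages for illustration
mydata <- data.frame(age = c(23, 45, 46, 75, 76, 90))

# cut() intervals are right-closed by default, so these breaks mean
# age <= 45, 45 < age <= 75, and age > 75
mydata$agecat <- cut(mydata$age,
                     breaks = c(-Inf, 45, 75, Inf),
                     labels = c("Young", "Middle Aged", "Elder"))

table(mydata$agecat)   # two observations in each category
```

Because the result is a factor, the categories keep their order (Young, Middle Aged, Elder) in tables and plots.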

Renaming variables

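There might be instances where you'd want to rename variables for clarity or consistency. R provides two primary ways to rename variables: interactively, with the data editor opened by fix(), or programmatically, for example with rename() from the reshape package or by assigning to names().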

Frequently Asked Questions (FAQs) about Variables in R

Q: What's the difference between <- and = for assignment in R?

A: Both <- and = can be used for assignment in R. However, <- is the more traditional and preferred method, especially in scripts and functions. The = operator is often used within function calls to specify named arguments.
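
A short sketch of the practical difference: inside a function call, = names an argument and creates no variable, whereas <- performs an assignment as a side effect. The variable name lower is arbitrary.

```r
# At the top level both forms create a variable
a <- 1
b = 2

# Inside a call, `=` only names the arguments; no variable called `from` is created
seq(from = 1, to = 3)    # 1 2 3
exists("from")           # FALSE

# `<-` inside a call assigns in the calling environment and then passes the value on
seq(lower <- 1, 3)       # 1 2 3
lower                    # 1
```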

Q: How can I check the type of a variable in R?

A: You can use the class() function to determine the type or class of a variable. This function will return values like "numeric", "character", "factor", and so on, depending on the variable's type.

Q: I mistakenly assigned a character value to a numeric variable. How can I correct it?

A: R provides type conversion functions like as.numeric(), as.character(), and as.logical(). You can use these functions to convert a variable to the desired type.

Q: What does "recoding variables" mean?

A: Recoding refers to the process of changing or transforming the values of a variable based on certain criteria. For instance, converting a continuous age variable into age categories (e.g., "young", "middle-aged", "senior") is an example of recoding.

Q: How can I rename a variable in my dataset?

A: R offers multiple ways to rename variables. You can do it interactively using the fix() function, which opens a data editor. Alternatively, there are various R packages and functions that allow for programmatic renaming of variables.

Q: Are variable names in R case-sensitive?

A: Yes, variable names in R are case-sensitive. This means that myVariable, MyVariable, and myvariable would be treated as three distinct variables.

Q: Can I use spaces in variable names?

A: It's not recommended to use spaces in variable names in R. Instead, you can use underscores (_) or periods (.) to separate words in variable names, like my_variable or my.variable.
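
To illustrate why spaces are awkward, the sketch below shows that such names only work when wrapped in backticks, and that base R's make.names() and data.frame() replace the space with a period. The names are made up for the example.

```r
# A name containing a space must be wrapped in backticks every time it is used
`my variable` <- 10
`my variable` + 1                  # 11

# make.names() converts arbitrary strings into syntactically valid names
make.names("my variable")          # "my.variable"

# data.frame() applies the same repair to column names by default
df <- data.frame("total score" = c(1, 2))
names(df)                          # "total.score"
```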

Q: How do I delete or remove a variable from my workspace?

A: Pass the variable name to the rm() function, for example rm(x), to remove it from your workspace. It's a good practice to clear unnecessary variables to free up memory.

Q: What's the difference between local and global variables?

A: Local variables are confined to the function or environment they are created in and can't be accessed outside of it. In contrast, global variables are accessible throughout your entire script or R session.

Q: How can I view all the variables currently in my workspace?

A: You can use the ls() function to list all the variables currently present in your workspace.
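
A brief sketch of ls() and rm() used together, with throwaway variable names chosen for illustration.

```r
# Create two throwaway variables (names are illustrative)
tmp_total <- 10
tmp_label <- "draft"

ls()                       # the listing now includes "tmp_label" and "tmp_total"

rm(tmp_total)              # remove a single variable
exists("tmp_total")        # FALSE
"tmp_label" %in% ls()      # TRUE: other variables are unaffected

# rm(list = ls()) would clear everything in the workspace; use it with care
```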