R: Tests

Resources for learning and using the R programming language.

Tests for analyzing a single categorical variable

Binomial test

binom.test()

This test requires the number of successes, the number of trials, and an expected proportion of successes.

Data set-up: Known values

The simplest way to run this test does not require a dataset at all, just these three numbers. In this example, there were 110 trials with 64 successes, and the expected proportion is 50%.

binom.test(x = 64, n = 110, p = 0.5)

Data set-up: Disaggregated (Raw data)

More commonly, you may have one dichotomous variable representing success or failure, as well as an expected proportion.

In this example, you calculate the number of successes from the vector data$Ran, in which 2 represents a success.

data <- read.delim("http://www.statsci.org/data/oz/ms212.txt")
binom.test(x = sum(data$Ran == 2), n = length(data$Ran), p = 0.5)

data$Ran == 2 creates a vector with TRUE or FALSE for each observation in data$Ran, depending on whether the observation is 2 (success) or not. sum() of that vector gives the number of successes, because TRUE is counted as 1. length(data$Ran) is the number of trials because it outputs the number of observations in data$Ran.
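
To make the counting step concrete, here is a minimal sketch on a small made-up vector. The name `ran` and its values are illustrative only and are not taken from the dataset above.

```r
# Hypothetical coded responses: 2 = success, 1 = failure
ran <- c(1, 2, 2, 1, 2)

is_success <- ran == 2          # logical vector: FALSE TRUE TRUE FALSE TRUE
n_successes <- sum(is_success)  # TRUE counts as 1, so this is 3
n_trials <- length(ran)         # 5 observations

binom.test(x = n_successes, n = n_trials, p = 0.5)
```

If the column can contain missing values, `sum(is_success, na.rm = TRUE)` and `sum(!is.na(ran))` would be safer choices for the two counts, because `sum()` returns NA when any element is NA and `length()` counts NA entries as trials.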

Data set-up: Tabulated (Contingency table)

The data may already be tabulated as a frequency table, rather than stored as a column with one row per trial.

frequency_table <- data.frame(success = c(1,2), frequency = c(46,64))

number_of_successes is the second value in the frequency column, and number_of_trials is the sum: successes plus failures.

number_of_successes <- frequency_table$frequency[2]
number_of_trials <- sum(frequency_table$frequency)
binom.test(x = number_of_successes, n = number_of_trials, p = 0.5)

See example output

Chi square (X²) goodness of fit test

chisq.test()

Requires one categorical variable and, optionally, the expected proportion for each of its categories.

Data set-up: Disaggregated (Raw data)

data <- read.delim("http://www.statsci.org/data/oz/ms212.txt")

table(data$Ran) creates a frequency table. If no probability is provided, it is assumed that each category is equally likely.

chisq.test(table(data$Ran))

To specify different expected probabilities, pass a vector of probabilities created with c() as the p argument, in the order the categories appear in the table. The probabilities must sum to 1.

chisq.test(table(data$Ran), p = c(0.75, 0.25))
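
As a rough check on how the `p` argument is used, the sketch below runs the test on the counts 46 and 64 (the frequencies used in the tabulated examples on this page) and inspects the expected counts stored in the result. The variable names are illustrative.

```r
# Observed counts for two categories, entered directly
observed <- c(46, 64)

# Expected proportions follow the category order and must sum to 1
result <- chisq.test(observed, p = c(0.75, 0.25))

result$expected  # total count times each proportion: 82.5 and 27.5
result$p.value   # p-value of the goodness-of-fit test
```

Comparing `result$expected` with the observed counts is a quick way to confirm that the probabilities were given in the intended order.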

Data set-up: Tabulated (Contingency table)

If the data are available only as a frequency table, and not as a column with a value for each observation as shown above, you can simply use the vector that represents the frequency of each category.

frequency_table <- data.frame(category = c(1,2), frequency = c(46,64))
chisq.test(frequency_table$frequency)

See example output

Tests for analyzing relationship between two categorical variables

Requires two categorical variables, with two or more possible values.

Chi square (X²) contingency test

chisq.test()

Data set-up: Disaggregated (Raw data)

data <- read.csv("http://users.stat.ufl.edu/~winner/data/marij1_indiv.csv")

chisq.test() expects a matrix where the rows and columns are possible values of the two variables, and the cells are the number of observations with each combination of values. This can be created in R using the table() function.

# table(data$marijUse, data$party) creates a matrix like this:

#      1   2   3
#  1  40 213 118
#  2   3  55  40
#  3   1  44  54
#  4   0  17  32

chisq.test(table(data$marijUse, data$party))
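
To see what `table()` produces from two vectors, here is a small sketch on made-up data. Both vector names and all values are hypothetical and are not from the dataset above.

```r
# Hypothetical paired observations for two categorical variables
use_level <- c(1, 1, 2, 2, 1, 2)
party     <- c(1, 2, 1, 2, 2, 2)

# Rows are values of the first variable, columns are values of the second
counts <- table(use_level, party)
counts

dim(counts)  # 2 rows by 2 columns
sum(counts)  # 6, one count per observation
```

Each cell holds the number of observations with that combination of values, which is the layout chisq.test() expects.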

Data set-up: Tabulated (Contingency table)

data <- read.csv("http://users.stat.ufl.edu/~winner/data/marij1.csv")

Data in this structure will require reshaping to create a suitable matrix for chisq.test(). We can accomplish this with the reshape2 package.

library(reshape2)
data_as_matrix <- acast(data, marijUse ~ party)
chisq.test(data_as_matrix)

See example output

Fisher's exact test

fisher.test()

Requires two categorical variables with two possible values each.

Data set-up: Disaggregated (Raw data)

data <- read.delim("http://www.statsci.org/data/oz/ms212.txt")
fisher.test(data$Gender, data$Smokes)

fisher.test() can accept either a matrix or two vectors. In this example, two vectors are taken from a dataframe.

Data set-up: Tabulated (Contingency table)

data <- data.frame(gender = c(1, 2, 1, 2), 
                   smokes = c(1, 1, 2,  2), 
                   frequency = c(8, 3, 51, 48))

With data in this tabulated form, it is easier to create a matrix using the reshape2 package.

library(reshape2)
data_as_matrix <- acast(data, gender ~ smokes, value.var = "frequency")

# the matrix looks like this:
#   1  2
# 1 8 51
# 2 3 48

fisher.test(data_as_matrix)
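
If installing reshape2 is not an option, base R's `xtabs()` is a possible alternative. The sketch below rebuilds the same counts from the data frame shown above. Using `xtabs()` in place of `acast()` is a suggestion here, not part of the original workflow.

```r
data <- data.frame(gender = c(1, 2, 1, 2),
                   smokes = c(1, 1, 2, 2),
                   frequency = c(8, 3, 51, 48))

# Sum the frequency column within each gender-by-smokes combination
counts <- xtabs(frequency ~ gender + smokes, data = data)
counts

fisher.test(counts)
```

The resulting table has gender in the rows and smokes in the columns, matching the matrix printed above.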

See example output

Tests for analyzing a single numerical variable

One-sample t-test

t.test()

Requires one normally distributed numerical variable and a hypothesized mean. See instructions for checking for normality.

data <- read.table("http://www.statsci.org/data/general/balaconc.txt", 
                   stringsAsFactors = FALSE, header = TRUE)
t.test(data$SideSway, mu = 11)

In this case, the hypothesized mean is 11.
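
The object returned by t.test() can be stored and inspected piece by piece. The following is a minimal sketch using a made-up vector; the name `sway` and its values are hypothetical.

```r
# Hypothetical measurements
sway <- c(9, 12, 14, 10, 15, 11, 13)

result <- t.test(sway, mu = 11)

result$estimate    # sample mean of sway, here 12
result$null.value  # the hypothesized mean, 11
result$conf.int    # 95% confidence interval for the mean
result$p.value     # two-sided p-value
```

Storing the result this way is handy when the p-value or confidence interval is needed in later calculations rather than only printed.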

See example output

Sign test for median

SignTest()

Requires one numerical variable and a hypothesized median. The numerical variable does not need to be normally distributed.

data <- read.table("http://www.statsci.org/data/general/balaconc.txt", 
                   stringsAsFactors = FALSE, header = TRUE)
library(DescTools)
SignTest(data$SideSway, mu = 22)

See example output

Tests with a numerical response variable and explanatory categorical variable(s) (Parametric)

Two-sample t-test

t.test()

Requires one normally distributed, numerical variable and one grouping variable with two values. The grouping variable may be numeric-type or string-type.

data <- read.table("http://www.statsci.org/data/general/balaconc.txt", 
                   stringsAsFactors = FALSE, header = TRUE)
t.test(SideSway ~ Age, data = data, var.equal = TRUE)

See example output

Paired t-test

t.test()

Requires two numerical variables that are paired. Paired samples are matched in some way; often they represent the same object or respondent tested at different points in time.

data <- read.table("http://www.statsci.org/data/oz/stroke.txt", 
         header = TRUE)
t.test(data$Bart1, data$Bart8, paired = TRUE)
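
One way to see what pairing does: a paired t-test gives the same result as a one-sample t-test on the within-pair differences. The sketch below illustrates this with made-up scores; the vectors `before` and `after` are hypothetical.

```r
# Hypothetical scores for the same five subjects at two time points
before <- c(40, 55, 60, 35, 70)
after  <- c(50, 65, 58, 45, 85)

paired_result <- t.test(before, after, paired = TRUE)
diff_result   <- t.test(before - after, mu = 0)

# Both calls give the same t statistic and p-value
all.equal(paired_result$statistic, diff_result$statistic)
all.equal(paired_result$p.value, diff_result$p.value)
```

Because the test works on differences, the two vectors must have the same length and be in the same subject order.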

See example output

One-way ANOVA

aov()

Requires one normally distributed, numerical response variable and one categorical grouping variable with two or more values. 

data <- read.table("http://www.statsci.org/data/general/wolfrive.txt", 
                   stringsAsFactors = FALSE, header = TRUE)
summary(aov(data$HCB ~ data$Depth))
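
One point worth checking, offered as a caution rather than something stated in the example above: if the grouping variable is stored as numeric codes, aov() treats it as a continuous predictor unless it is wrapped in factor(). A small sketch with made-up data (the data frame `toy` and its columns are hypothetical):

```r
# Hypothetical data: three groups coded 1, 2, 3
toy <- data.frame(
  response = c(4.1, 3.8, 4.4, 5.0, 5.3, 4.9, 6.2, 5.8, 6.1),
  group    = rep(c(1, 2, 3), each = 3)
)

# factor() makes aov() compare group means instead of fitting a slope
fit <- aov(response ~ factor(group), data = toy)
anova_table <- summary(fit)[[1]]

anova_table$Df           # 2 for the groups, 6 for the residuals
anova_table[["Pr(>F)"]]  # p-value for the group effect, then NA
```

With k groups, the grouping term should show k - 1 degrees of freedom. A value of 1 for a variable with more than two groups suggests it was treated as numeric.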

See example output

Welch's t-test

t.test()

Requires one normally distributed, numerical variable and one grouping variable with two values. The grouping variable may be numeric-type or string-type.

data <- read.table("http://www.statsci.org/data/general/balaconc.txt", 
                   stringsAsFactors = FALSE, header = TRUE)
t.test(data$SideSway ~ data$Age)
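
The only difference between this call and the two-sample t-test shown earlier is the `var.equal` argument: t.test() performs Welch's test by default and the pooled-variance test when `var.equal = TRUE`. The sketch below compares the two on made-up data; the data frame `toy` and its columns are hypothetical.

```r
# Hypothetical measurements for two age groups
toy <- data.frame(
  sway = c(14, 18, 21, 25, 30, 12, 13, 15, 14, 16),
  age  = rep(c("elderly", "young"), each = 5)
)

welch  <- t.test(sway ~ age, data = toy)                    # default: unequal variances
pooled <- t.test(sway ~ age, data = toy, var.equal = TRUE)  # pooled-variance test

welch$method      # contains "Welch"
pooled$parameter  # pooled degrees of freedom: 5 + 5 - 2 = 8
welch$parameter   # Welch degrees of freedom, generally not an integer
```

When the two groups have clearly different spreads, as in this toy data, the Welch degrees of freedom fall below the pooled value.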

See example output

Multiway ANOVA

aov()

Requires one normally distributed numerical response variable and two categorical grouping variables with two or more values. 

data <- read.table("http://www.statsci.org/data/general/fullmoon.txt", 
                   stringsAsFactors = FALSE, header = TRUE)
summary(aov(data$Admission ~ data$Month + data$Moon))

See example output

Tests with a numerical response variable and an explanatory categorical variable (Non-parametric)

Mann-Whitney U-test

wilcox.test()

Requires one numerical or ordinal variable, and one grouping variable with two values. 

data <- read.table("http://www.statsci.org/data/general/balaconc.txt", 
                   stringsAsFactors = FALSE, header = TRUE)
wilcox.test(data$FBSway~data$Age)

See example output

Kruskal-Wallis Test

kruskal.test()

Requires one numerical or ordinal variable, and one grouping variable with two or more values. 

data <- read.table("http://www.statsci.org/data/general/balaconc.txt", 
                   stringsAsFactors = FALSE, header = TRUE)
kruskal.test(data$FBSway ~ data$Age)

See example output

Tests for analyzing the relationship between numerical variables

Simple linear regression

lm()

Requires two numerical variables.

data <- read.table("http://www.statsci.org/data/general/kittiwak.txt", 
                   stringsAsFactors = FALSE, header = TRUE)
linear_model <- lm(data$Population ~ data$Area)
summary(linear_model)
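
Beyond the printed summary, individual pieces of the fitted model can be extracted directly. This is a minimal sketch on made-up numbers; the vectors `area` and `population` are hypothetical and do not come from the dataset above.

```r
# Hypothetical data
area       <- c(10, 20, 30, 40, 50)
population <- c(120, 190, 310, 390, 520)

fit <- lm(population ~ area)

coef(fit)                  # intercept and slope
summary(fit)$r.squared     # proportion of variance explained
summary(fit)$coefficients  # estimates, standard errors, t values, p-values

# For simple linear regression, the slope equals cov(x, y) / var(x)
cov(area, population) / var(area)
```

The last line is only a sanity check on the slope; lm() remains the practical way to fit the model.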

See example output

Linear correlation

cor.test(x, y, method=c("pearson"))

Requires two numerical variables. See setup for Simple linear regression above.

data <- read.table("http://www.statsci.org/data/general/kittiwak.txt", 
                   stringsAsFactors = FALSE, header = TRUE)
cor.test(data$Population, data$Area, method=c("pearson"))

See example output

Spearman's rank correlation

cor.test(x, y, method=c("spearman"))

Requires two numerical variables. See setup for Simple linear regression above.

data <- read.table("http://www.statsci.org/data/general/kittiwak.txt", 
                   stringsAsFactors = FALSE, header = TRUE)
cor.test(data$Population, data$Area, method=c("spearman"))
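
Spearman's coefficient is Pearson's correlation computed on the ranks of the data, which is why it captures monotonic relationships that are not linear. The sketch below shows this on made-up vectors; `x` and `y` are hypothetical.

```r
# Hypothetical data with a monotonic but nonlinear relationship
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 3, 7, 20, 60, 150)

spearman <- cor.test(x, y, method = "spearman")
pearson_on_ranks <- cor(rank(x), rank(y))

spearman$estimate  # rho, here 1 because y increases whenever x does
pearson_on_ranks   # same value
cor(x, y)          # Pearson's r on the raw values, below 1
```

If the data contain tied values, cor.test() cannot compute an exact p-value for the Spearman method and issues a warning.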

See example output