binom.test()
This test requires the number of successes, the number of trials, and an expected proportion of successes.
The simplest way to run this test does not require a dataset at all, just these three numbers. In this example, there were 110 trials with 64 successes, and the expected proportion is 50%.
binom.test(x = 64, n = 110, p = 0.5 )
More commonly, you may have one dichotomous variable representing success or failure, as well as an expected proportion.
In this example, you calculate the number of successes from the vector data$Ran, in which 2 represents a success.
data <- read.delim("http://www.statsci.org/data/oz/ms212.txt")
binom.test(sum(data$Ran == 2), length(data$Ran), p = 0.5)
data$Ran == 2 creates a vector with TRUE or FALSE for each observation in data$Ran, depending on whether the observation is 2 (success) or not. sum() of that vector gives the number of successes, because TRUE is counted as 1. length(data$Ran) is the number of trials because it outputs the number of observations in data$Ran.
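The counting step can be tried on a small made-up vector before running it on real data. This is only a sketch: the values in `ran` are hypothetical and are not taken from ms212.txt.

```r
# Hypothetical responses: 2 marks a success, 1 a failure
ran <- c(1, 2, 2, 1, 2)

is_success <- ran == 2         # logical vector: FALSE TRUE TRUE FALSE TRUE
successes <- sum(is_success)   # TRUE counts as 1, FALSE as 0
trials <- length(ran)          # one element per trial

binom.test(x = successes, n = trials, p = 0.5)
```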
The data may already be tabulated as a frequency table rather than as a column with one row per trial.
frequency_table <- data.frame(success = c(1,2), frequency = c(46,64))
number_of_successes is the second value in the frequency column, and number_of_trials is the sum: successes plus failures.
number_of_successes <- frequency_table$frequency[2]
number_of_trials <- sum(frequency_table$frequency)
binom.test(x = number_of_successes, n = number_of_trials, p = 0.5)
chisq.test()
Requires one categorical variable and, optionally, the expected proportion for each category.
data <- read.delim("http://www.statsci.org/data/oz/ms212.txt")
table(data$Ran) creates a frequency table. If no probabilities are provided, each category is assumed to be equally likely.
chisq.test(table(data$Ran))
To indicate different expected probabilities, create a vector of probabilities inside c(), in the order the categories appear in the table.
chisq.test(table(data$Ran), p = c(0.75, 0.25))
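One way to confirm that the probabilities line up with the categories is to inspect the expected counts stored in the test result. The sketch below reuses the counts 46 and 64 from the binomial example instead of downloading the file; the names `observed` and `result` are illustrative only.

```r
# Observed counts for category 1 and category 2 (110 observations in total)
observed <- c(46, 64)

# Expected proportions, given in the same order as the counts
result <- chisq.test(observed, p = c(0.75, 0.25))

# Expected counts are the total multiplied by each probability
result$expected
result$p.value
```

The probabilities in `p` must sum to 1; otherwise chisq.test() stops with an error unless you also pass rescale.p = TRUE.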
If the data are available only as a frequency table, and not as a column with a value for each observation as shown above, you can simply use the vector that represents the frequency of each category.
frequency_table <- data.frame(category = c(1,2), frequency = c(46,64))
chisq.test(frequency_table$frequency)
chisq.test()
Requires two categorical variables, with two or more possible values.
data <- read.csv("http://users.stat.ufl.edu/~winner/data/marij1_indiv.csv")
chisq.test() expects a matrix where the rows and columns are possible values of the two variables, and the cells are the number of observations with each combination of values. This can be created in R using the table() function.
# table(data$marijUse, data$party) creates a matrix like this:
#     1   2   3
# 1  40 213 118
# 2   3  55  40
# 3   1  44  54
# 4   0  17  32
chisq.test(table(data$marijUse, data$party))
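To see how table() cross-tabulates two columns, it can help to run it on a few made-up observations. The vectors `use` and `party` below are hypothetical and much smaller than the survey data.

```r
# Six hypothetical respondents, one element per respondent
use   <- c(1, 1, 2, 2, 2, 1)
party <- c("a", "b", "a", "b", "b", "a")

# Rows are values of the first argument, columns are values of the second
counts <- table(use, party)
counts

dim(counts)   # number of row categories, then number of column categories
```

Each cell holds the number of respondents with that combination of values, which is the layout chisq.test() expects.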
data <- read.csv("http://users.stat.ufl.edu/~winner/data/marij1.csv")
Data in this structure will require reshaping to create a suitable matrix for chisq.test(). We can accomplish this with the reshape2 package.
library(reshape2)
data_as_matrix <- acast(data, marijUse ~ party)
chisq.test(data_as_matrix)
fisher.test()
data <- read.delim("http://www.statsci.org/data/oz/ms212.txt")
fisher.test(data$Gender, data$Smokes)
fisher.test() can accept either a matrix or two vectors. In this example, two vectors are taken from a data frame.
data <- data.frame(gender = c(1, 2, 1, 2), smokes = c(1, 1, 2, 2), frequency = c(8, 3, 51, 48))
With data in this tabulated form, it is easier to create a matrix using the reshape2 package.
library(reshape2)
# value.var names the column that holds the cell counts
data_as_matrix <- acast(data, gender ~ smokes, value.var = "frequency")
# the matrix looks like this:
#    1  2
# 1  8 51
# 2  3 48
fisher.test(data_as_matrix)
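If you prefer not to load an extra package, base R's xtabs() can build the same two-way table from tabulated counts. This is an alternative sketch, not the approach used above; `data_as_table` is an illustrative name.

```r
# Same tabulated data as above: one row per combination, with its count
data <- data.frame(gender = c(1, 2, 1, 2),
                   smokes = c(1, 1, 2, 2),
                   frequency = c(8, 3, 51, 48))

# Left of ~ is the count column; right of ~ are the row and column variables
data_as_table <- xtabs(frequency ~ gender + smokes, data = data)
data_as_table

fisher.test(data_as_table)
```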
t.test()
Requires one normally distributed numerical variable and a hypothesized mean. See instructions for checking for normality.
data <- read.table("http://www.statsci.org/data/general/balaconc.txt", stringsAsFactors = FALSE, header = TRUE)
t.test(data$SideSway, mu = 11)
In this case, the hypothesized mean is 11.
SignTest()
Requires one numerical variable and a hypothesized median. The numerical variable does not need to be normally distributed.
library(DescTools)
data <- read.table("http://www.statsci.org/data/general/balaconc.txt", stringsAsFactors = FALSE, header = TRUE)
SignTest(data$SideSway, mu = 22)
t.test()
Requires one normally distributed, numerical variable and one grouping variable with two values. The grouping variable may be numeric-type or string-type.
data <- read.table("http://www.statsci.org/data/general/balaconc.txt", stringsAsFactors = FALSE, header = TRUE)
# var.equal = TRUE assumes the two groups have equal variances
t.test(SideSway ~ Age, data = data, var.equal = TRUE)
t.test()
Requires two numerical variables that are paired. Paired samples are matched in some way; often they represent the same object or respondent tested at different points in time.
data <- read.table("http://www.statsci.org/data/oz/stroke.txt", header = TRUE)
t.test(data$Bart1, data$Bart8, paired = TRUE)
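A paired t-test is equivalent to a one-sample t-test on the within-pair differences, tested against a mean of 0. The sketch below illustrates this with hypothetical `before` and `after` scores, not the stroke data.

```r
# Hypothetical scores for six respondents measured twice
before <- c(12, 15, 11, 18, 14, 16)
after  <- c(14, 18, 12, 21, 15, 20)

paired <- t.test(before, after, paired = TRUE)

# Same test expressed as a one-sample test on the differences
on_differences <- t.test(before - after, mu = 0)

# Both calls give the same t statistic and p-value
all.equal(paired$statistic, on_differences$statistic)
all.equal(paired$p.value, on_differences$p.value)
```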
aov()
Requires one normally distributed, numerical response variable and one categorical grouping variable with two or more values.
data <- read.table("http://www.statsci.org/data/general/wolfrive.txt", stringsAsFactors = FALSE, header = TRUE)
summary(aov(data$HCB ~ data$Depth))
t.test()
Requires one normally distributed, numerical variable and one grouping variable with two values. The grouping variable may be numeric-type or string-type.
data <- read.table("http://www.statsci.org/data/general/balaconc.txt", stringsAsFactors = FALSE, header = TRUE)
# var.equal defaults to FALSE, so this runs Welch's t-test, which does not assume equal variances
t.test(data$SideSway ~ data$Age)
aov()
Requires one normally distributed numerical response variable and two categorical grouping variables with two or more values.
data <- read.table("http://www.statsci.org/data/general/fullmoon.txt", stringsAsFactors = FALSE, header = TRUE)
summary(aov(data$Admission ~ data$Month + data$Moon))
wilcox.test()
data <- read.table("http://www.statsci.org/data/general/balaconc.txt", stringsAsFactors = FALSE, header = TRUE)
wilcox.test(data$FBSway ~ data$Age)
lm()
Requires two numerical variables.
data <- read.table("http://www.statsci.org/data/general/kittiwak.txt", stringsAsFactors = FALSE, header = TRUE)
linear_model <- lm(data$Population ~ data$Area)
summary(linear_model)
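Beyond the printed summary, individual pieces of the fitted model can be pulled out directly. The following sketch uses a small hypothetical data frame, `birds`, rather than the kittiwake file, and passes it through the data argument so the coefficients get short names.

```r
# Hypothetical data: five colonies with an area and a population
birds <- data.frame(Area = c(1, 2, 3, 4, 5),
                    Population = c(3.1, 4.9, 7.2, 8.8, 11.0))

model <- lm(Population ~ Area, data = birds)

coef(model)                # intercept and slope
summary(model)$r.squared   # proportion of variance explained
```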
cor.test(x, y, method=c("pearson"))
Requires two numerical variables. See setup for Simple linear regression above.
data <- read.table("http://www.statsci.org/data/general/kittiwak.txt", stringsAsFactors = FALSE, header = TRUE)
cor.test(data$Population, data$Area, method = c("pearson"))
cor.test(x, y, method=c("spearman"))
Requires two numerical variables. See setup for Simple linear regression above.
data <- read.table("http://www.statsci.org/data/general/kittiwak.txt", stringsAsFactors = FALSE, header = TRUE)
cor.test(data$Population, data$Area, method = c("spearman"))
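Spearman's correlation is the Pearson correlation computed on the ranks of the values, which is why it does not depend on the variables being normally distributed. A minimal sketch with hypothetical vectors `x` and `y` (no ties):

```r
# Hypothetical measurements with no tied values
x <- c(2, 5, 1, 8, 4, 7)
y <- c(10, 30, 5, 40, 80, 20)

spearman <- cor.test(x, y, method = "spearman")

# Rank each variable, then compute the ordinary Pearson correlation
pearson_on_ranks <- cor.test(rank(x), rank(y), method = "pearson")

# The two estimates agree
spearman$estimate
pearson_on_ranks$estimate
```

Only the estimates match; the p-values can differ because the two methods compute them differently.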