simplestats.Rmd
Many widely used and powerful statistical analysis commands, such as lm, glm, and lme4::lmer, have a simple and consistent calling syntax, often involving a “formula” (e.g., y ~ x), which makes them easy to remember and apply. Some other functions, even simple ones, don’t use the formula syntax, can be a bit awkward to use in some contexts, or require default values of arguments to be explicitly overridden. The psyntur package provides some tools that aim to make these functions easier to apply.
These functions and the accompanying data sets can be loaded with the usual library command.
library(psyntur)
t_test
R’s stats::t.test makes it easy to perform independent, paired, or one-sample t-tests. For the independent samples t-test, the default is the Welch two sample t-test. While this is arguably a good choice in practice, when t-tests are taught as a simple example of a normal linear model, homogeneity of variance is assumed. Obtaining that test from t.test requires setting var.equal = TRUE. The t_test function in psyntur is intended for cases where the standard independent samples t-test with homogeneity of variance is the desired default. For example, in the following, we use it with the faithfulfaces data set.
t_test(trustworthy ~ face_sex, data = faithfulfaces)
#>
#> Two Sample t-test
#>
#> data: trustworthy by face_sex
#> t = 1.9389, df = 168, p-value = 0.05419
#> alternative hypothesis: true difference in means between group female and group male is not equal to 0
#> 95 percent confidence interval:
#> -0.004253649 0.471193782
#> sample estimates:
#> mean in group female mean in group male
#> 4.444061 4.210591
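For comparison, the following is a minimal base R sketch of the difference between the Welch default and the equal-variance test. It assumes, as the description above suggests, that t_test corresponds to stats::t.test with var.equal = TRUE. It uses R's built-in mtcars data rather than faithfulfaces, purely for illustration.

```r
# Illustration with the built-in mtcars data (not the faithfulfaces data):
# compare fuel economy (mpg) between automatic (am = 0) and manual (am = 1) cars.

# Default: Welch two sample t-test, which does not assume equal variances.
welch <- t.test(mpg ~ am, data = mtcars)

# Student's two sample t-test, which pools the variance across both groups.
pooled <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

# The pooled test has n - 2 degrees of freedom; the Welch test uses an
# approximation that gives fewer (and usually non-integer) degrees of freedom.
welch$parameter
pooled$parameter
```

The two tests share the same group means and differ only in the standard error and degrees of freedom, which is why the equal-variance version matches the two-group normal linear model.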
paired_t_test
For paired t-tests, the paired_t_test function can be used. This function does not use a formula. Instead, it takes two variables in the same data frame, which are assumed to be paired in some manner. For example, the pairedsleep data set (included in psyntur) is as follows.
pairedsleep
#> # A tibble: 10 × 3
#> ID y1 y2
#> <fct> <dbl> <dbl>
#> 1 1 0.7 1.9
#> 2 2 -1.6 0.8
#> 3 3 -0.2 1.1
#> 4 4 -1.2 0.1
#> 5 5 -0.1 -0.1
#> 6 6 3.4 4.4
#> 7 7 3.7 5.5
#> 8 8 0.8 1.6
#> 9 9 0 4.6
#> 10 10 2 3.4
This gives the difference from control in number of hours slept by 10 different patients when each took two different drugs. These time differences under the two drugs are y1 and y2.
A paired samples t-test can be performed as follows with this data.
paired_t_test(y1, y2, data = pairedsleep)
#>
#> Paired t-test
#>
#> data: vec_1 and vec_2
#> t = -4.0621, df = 9, p-value = 0.002833
#> alternative hypothesis: true mean difference is not equal to 0
#> 95 percent confidence interval:
#> -2.4598858 -0.7001142
#> sample estimates:
#> mean difference
#> -1.58
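A paired t-test is equivalent to a one-sample t-test on the within-pair differences. The base R sketch below illustrates this using the y1 and y2 values typed in from the pairedsleep table above. The variable names paired and one_sample are chosen here for illustration, and the sketch does not show how paired_t_test is implemented.

```r
# Values copied from the pairedsleep table shown above.
y1 <- c(0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0)
y2 <- c(1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4)

# Paired t-test using base R directly.
paired <- t.test(y1, y2, paired = TRUE)

# The same test expressed as a one-sample t-test on the differences.
one_sample <- t.test(y1 - y2, mu = 0)

# Both approaches give the same t statistic and degrees of freedom.
all.equal(unname(paired$statistic), unname(one_sample$statistic))
paired$parameter
```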
pairwise_t_test
For independent samples t-tests applied to all pairs of levels of a categorical variable, with p-value adjustments, we can use pairwise_t_test. For example, the following creates a categorical variable with four values, which are the interaction of two binary variables.
data_df <- dplyr::mutate(vizverb, IV = interaction(task, response))
Independent samples t-tests with Bonferroni corrections on the time variable applied to all pairs of the four levels of the IV variable can be done as follows.
pairwise_t_test(time ~ IV, data = data_df)
#>
#> Pairwise comparisons using t tests with pooled SD
#>
#> data: y and x
#>
#> verbal.verbal visual.verbal verbal.visual
#> visual.verbal 0.0790 - -
#> verbal.visual 1.0000 0.0166 -
#> visual.visual 0.0044 2.9e-07 0.0241
#>
#> P value adjustment method: bonferroni
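The printed output above has the same layout as that of stats::pairwise.t.test, so a similar analysis can be sketched in base R. The example below is an illustration under that assumption, using the built-in mtcars data instead of vizverb. The names cars_df, bonf, and unadj are hypothetical. With four groups there are six pairwise comparisons, so each Bonferroni-adjusted p-value is the unadjusted value multiplied by six and capped at 1.

```r
# Build a four-level factor from two binary variables in mtcars:
# am (transmission) and vs (engine shape).
cars_df <- transform(mtcars, IV = interaction(am, vs))
nlevels(cars_df$IV)

# Pairwise t-tests with pooled SD and Bonferroni adjustment.
bonf <- pairwise.t.test(cars_df$mpg, cars_df$IV,
                        p.adjust.method = "bonferroni")

# The same comparisons without any adjustment.
unadj <- pairwise.t.test(cars_df$mpg, cars_df$IV,
                         p.adjust.method = "none")

# Bonferroni: multiply each p-value by the number of comparisons (6), cap at 1.
n_comparisons <- choose(nlevels(cars_df$IV), 2)
manual <- pmin(1, n_comparisons * as.vector(unadj$p.value))
all.equal(as.vector(bonf$p.value), manual)
```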
shapiro_test
The Shapiro-Wilk test of normality can be applied to a single numeric vector in a data frame as in the following example.
shapiro_test(time, data = data_df)
#> # A tibble: 1 × 2
#> statistic p_value
#> <dbl> <dbl>
#> 1 0.911 0.0000378
To test the normality of each subset of a variable, such as time, corresponding to the values of a categorical variable, we can use a by variable as in the following example.
shapiro_test(time, by = IV, data = data_df)
#> # A tibble: 4 × 3
#> IV statistic p_value
#> <fct> <dbl> <dbl>
#> 1 verbal.verbal 0.755 0.000198
#> 2 visual.verbal 0.861 0.00809
#> 3 verbal.visual 0.938 0.221
#> 4 visual.visual 0.914 0.0763
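For readers who want to see what a grouped Shapiro-Wilk test involves, the following base R sketch applies stats::shapiro.test to a whole vector and then to each subset defined by a grouping variable. It uses the built-in mtcars data for illustration. The column names statistic and p_value are chosen to mirror the output above, and the sketch is not a description of how shapiro_test is implemented.

```r
# Shapiro-Wilk test on a single numeric vector.
overall <- shapiro.test(mtcars$mpg)
c(statistic = unname(overall$statistic), p_value = overall$p.value)

# Split mpg by number of cylinders and test each subset separately.
by_group <- lapply(split(mtcars$mpg, mtcars$cyl), shapiro.test)

# Collect the results into one row per group.
result <- data.frame(
  cyl = names(by_group),
  statistic = sapply(by_group, function(res) unname(res$statistic)),
  p_value = sapply(by_group, function(res) res$p.value),
  row.names = NULL
)
result
```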