Title: | Judd, McClelland, & Ryan Formatting for ANOVA Output |
---|---|
Description: | Produces ANOVA tables in the format used by Judd, McClelland, and Ryan (2017, ISBN: 978-1138819832) in their introductory textbook, Data Analysis. This includes proportional reduction in error and formatting to ease the transition between the book and R. |
Authors: | Adam Blake [cre, aut], Jeff Chrabaszcz [aut], Ji Son [aut], Jim Stigler [aut] |
Maintainer: | Adam Blake <[email protected]> |
License: | GPL (>= 3) |
Version: | 3.0.0 |
Built: | 2024-11-26 04:53:36 UTC |
Source: | https://github.com/uclatall/supernova |
Plotting method for pairwise objects.
autoplot.pairwise(object, ...)

## S3 method for class 'pairwise'
plot(x, y, ...)
object |
A pairwise object to plot. |
... |
Additional arguments passed to the plotting geom. |
x |
A pairwise object to plot. |
y |
Ignored, required for compatibility with the plot() generic. |
This function requires an optional dependency: ggplot2. When that package is installed, calling autoplot() or plot() on a pairwise object will generate a plot of the pairwise comparisons. The plot shows the differences between the groups, with error bars representing the confidence intervals. The x-axis is labeled with the type of confidence interval used and the values of the differences, and the y-axis is labeled with the groups being compared. A dashed line at 0 is included to help visualize the differences.
if (require(ggplot2)) {
  # generate the plot immediately
  pairwise(lm(mpg ~ factor(am) + disp, data = mtcars), plot = TRUE)

  # or save the object and plot it later
  p <- pairwise(lm(mpg ~ factor(am) + disp, data = mtcars))
  plot(p)
}
This function is needed to re-fit the models for Type III SS. If you have a model with an interaction term (e.g. y ~ a + b + a:b), when you try to refit without one of the lower-order terms (e.g. y ~ a + a:b), lm() will add it back in. This function uses a fitting function that operates underneath lm() to circumvent this behavior. (It is very similar to drop1().)
drop_term(fit, term)
fit |
The model to update. |
term |
The term to drop from the model. |
An object of the class lm.
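For example, assuming the package is attached, a lower-order term can be dropped from a model with an interaction, which lm() would otherwise add back in:

```r
# fit a model with an interaction, then drop the lower-order hp term;
# the result is a refit lm object without hp forced back into the model
fit <- lm(mpg ~ hp * factor(am), data = mtcars)
reduced <- drop_term(fit, "hp")
```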
Print the output of lm() with the fitted equation.
equation(x, digits = max(3L, getOption("digits") - 3L))
x |
The fitted linear model to print. |
digits |
The minimal number of significant digits. |
Invisibly return the fitted linear model.
This function returns a list of lists where the top-level keys (names) of the items indicate the component of the full model (i.e. the term) that the generated models can be used to test. At each of these keys is a list with both the complex and simple models that can be compared to test the component. The complex models always include the target term, and the simple models are identical to the complex except the target term is removed. Thus, when the models are compared (e.g. using anova(), except for Type III; see details below), the resulting values will show the effect of adding the target term to the model. There are three generally used approaches to determining what the appropriate comparison models should be, called Type I, II, and III. See the sections below for more information on these types.
generate_models(model, type = 3)

## S3 method for class 'formula'
generate_models(model, type = 3)

## S3 method for class 'lm'
generate_models(model, type = 3)
model |
The model to generate the comparison models from: either a formula or a model fit by lm(). |
type |
The type of sums of squares to calculate: 1, 2, or 3 (see the sections below for details on each type). |
A list of the augmented models for each term, where the associated term is the key for each model in the list.
For Type I SS, or sequential SS, each term is considered in order after the preceding terms are considered. Consider the example model Y ~ A + B + A:B, where ":" indicates an interaction. To determine the Type I effect of A, we would compare the model Y ~ A to the same model without the term: Y ~ NULL. For B, we compare Y ~ A + B to Y ~ A; and for A:B, we compare Y ~ A + B + A:B to Y ~ A + B. Incidentally, the anova() function that ships with the base installation of R computes Type I statistics.
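Those sequential comparisons can be reproduced with base R's anova(); this is a minimal sketch using the mtcars data from the examples below:

```r
# Type I (sequential) SS with base R's anova(): each row tests the term
# added after those above it -- here hp, then factor(am), then the
# hp:factor(am) interaction, followed by the residuals row
fit <- lm(mpg ~ hp * factor(am), data = mtcars)
anova(fit)
```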
For Type II SS, or hierarchical SS, each term is considered in the presence of all of the terms that do not include it. For example, consider an example three-way factorial model Y ~ A + B + C + A:B + A:C + B:C + A:B:C, where ":" indicates an interaction. The effect of A is found by comparing Y ~ B + C + B:C + A to Y ~ B + C + B:C (the only terms included are those that do not include A). For B, the comparison models would be Y ~ A + C + A:C + B and Y ~ A + C + A:C; for A:B, the models would be Y ~ A + B + C + A:C + B:C + A:B and Y ~ A + B + C + A:C + B:C; and so on.
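As a sketch, the Type II comparison for a single term can be run by hand with anova() on two nested fits (using mtcars for illustration):

```r
# Type II effect of hp in mpg ~ hp * factor(am): the simple model holds
# every term that does not include hp (here just factor(am)), and the
# complex model adds hp; anova() compares the two nested fits
simple  <- lm(mpg ~ factor(am), data = mtcars)
complex <- lm(mpg ~ factor(am) + hp, data = mtcars)
anova(simple, complex)
```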
For Type III SS, or orthogonal SS, each term is considered in the presence of all of the other terms. For example, consider an example two-way factorial model Y ~ A + B + A:B, where ":" indicates an interaction between the terms. The effect of A is found by comparing Y ~ B + A:B + A to Y ~ B + A:B; for B, the comparison models would be Y ~ A + A:B + B and Y ~ A + A:B; and for A:B, the models would be Y ~ A + B + A:B and Y ~ A + B.
Unfortunately, anova() cannot be used to compare Type III models. anova() does not allow for violation of the principle of marginality, which is the rule that interactions should only be tested in the context of their lower-order terms. When an interaction term is present in a model, anova() will automatically add in the lower-order terms, making a model like Y ~ A + A:B unable to be compared: it will add the lower-order term B, and thus use the model Y ~ A + B + A:B instead. To get the appropriate statistics for Type III comparisons, use drop1() with the full scope, i.e. drop1(model_fit, scope = . ~ .).
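For example, a minimal sketch of that drop1() call on the mtcars model used in the examples:

```r
# Type III statistics via drop1() with the full scope: each term
# (including lower-order terms under the interaction) is tested against
# the model containing all other terms
fit <- lm(mpg ~ hp * factor(am), data = mtcars)
drop1(fit, scope = . ~ ., test = "F")
```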
# create all type 2 comparison models
models <- generate_models(
  lm(mpg ~ hp * factor(am), data = mtcars),
  type = 2
)

# compute the SS for the hp term
anova_hp <- anova(models$hp$simple, models$hp$complex)
anova_hp[["Sum of Sq"]][[2]]
Remove cases with missing values.
listwise_delete(obj, vars)

## S3 method for class 'data.frame'
listwise_delete(obj, vars = names(obj))

## S3 method for class 'lm'
listwise_delete(obj, vars = all.vars(formula(obj)))
obj |
The data.frame or fitted model (lm) to process. |
vars |
The variables to consider. |
For data.frames, the vars are checked for missing values. If one is found on any of the variables, the entire row is removed (list-wise deletion). For linear models, the model is refit after the underlying data have been processed.
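A minimal sketch, assuming the package is attached (the data frame here is made up for illustration):

```r
# list-wise deletion on a small data frame: row 2 has a missing value
# on y, so that entire row is removed from the result
df <- data.frame(x = 1:3, y = c(10, NA, 30))
listwise_delete(df)
```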
number vector

This creates a formatted double vector. You can specify the number of digits you want the value to display after the decimal, and the underlying value will not change. Additionally, you can explicitly set whether scientific notation should be used and whether numbers less than 1 should have a leading 0.
number(x = numeric(), digits = 3L, scientific = FALSE, leading_zero = TRUE)

is_number(x)

as_number(x)
x |
The numeric values to format. |
digits |
The number of digits to display after the decimal point. |
scientific |
Whether the number should be represented with scientific notation (e.g. 1e2). |
leading_zero |
Whether a leading zero should be used on numbers less than 1 (e.g. 0.001 vs. .001). |
An S3 vector of class supernova_number
. It should behave like a double, but be
formatted consistently.
number(1:5, digits = 3)
This function is useful for generating and testing all pairwise comparisons of categorical terms in a linear model. This can be done in base R using functions like pairwise.t.test and TukeyHSD, but these functions are inconsistent both in their output format and their general approach to pairwise comparisons. pairwise() will return a consistent table format, and will make consistent decisions about how to calculate error terms and confidence intervals. See the Details section below for more on how the models are tested (and why your output might not match other functions).
pairwise(
  fit,
  correction = "Tukey",
  term = NULL,
  alpha = 0.05,
  var_equal = TRUE,
  plot = FALSE
)

pairwise_t(fit, term = NULL, alpha = 0.05, correction = "none")

pairwise_bonferroni(fit, term = NULL, alpha = 0.05)

pairwise_tukey(fit, term = NULL, alpha = 0.05)
fit |
The linear model to test the comparisons with. |
correction |
The type of correction (if any) to perform to maintain the family-wise error-rate specified by alpha. |
term |
If supplied, only the comparisons for this term are computed; otherwise all categorical terms in the model are tested. |
alpha |
The family-wise error-rate to restrict the tests to. If "none" is given for correction, this is the error-rate for each test. |
var_equal |
If TRUE, the group variances are treated as equal when pooling the error term. |
plot |
Setting plot to TRUE will automatically call plot() on the returned object. |
For simple one-way models where a single categorical variable predicts an outcome, you will get output similar to other methods of computing pairwise comparisons. Essentially, the differences on the outcome between each of the groups defined by the categorical variable are compared with the requested test, and their confidence intervals and p-values are adjusted by the requested correction.
However, when more than two variables are entered into the model, the outcome will diverge somewhat from other methods of computing pairwise comparisons. For traditional pairwise tests you need to estimate an error term, usually by pooling the standard deviation of the groups being compared. This means that when you have other predictors in the model, their presence is ignored when running these tests. For the functions in this package, we instead compute the pooled standard error by using the mean squared error (MSE) from the full model fit.
Let's take a concrete example to explain that. If we are predicting a car's miles-per-gallon (mpg) based on whether it has an automatic or manual transmission (am), we can create that linear model and get the pairwise comparisons like this:

pairwise(lm(mpg ~ factor(am), data = mtcars))

The output of this code will have one table showing the comparison of manual and automatic transmissions with regard to miles-per-gallon. The pooled standard error is the same as the square root of the MSE from the full model.
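That equivalence can be checked directly in base R:

```r
# the pooled standard error for this model equals sqrt(MSE), where
# MSE = SSE / df_error from the full model fit; this is the same value
# returned by sigma(fit), the residual standard error
fit <- lm(mpg ~ factor(am), data = mtcars)
mse <- sum(residuals(fit)^2) / df.residual(fit)
sqrt(mse)
```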
In these data the am variable did not have any other values than automatic and manual, but we can imagine situations where the predictor has more than two levels. In these cases, the pooled SD would be calculated by taking the MSE of the full model (not of each group) and then weighting it based on the size of the groups in question (divide by n).
To improve our model, we might add the car's displacement (disp) as a quantitative predictor:

pairwise(lm(mpg ~ factor(am) + disp, data = mtcars))
Note that the output still only has a table for am. This is because we can't do a pairwise comparison using disp: there are no groups to compare. Most functions will drop this variable or not let you use it during pairwise comparisons. Instead, pairwise() uses the same approach as in the 3+ group situation: we use the MSE for the full model and then weight it by the size of the groups being compared. Because we are using the MSE for the full model, the effect of disp is accounted for in the error term even though we are not explicitly comparing different displacements. Importantly, the interpretation of the outcome is different than in other traditional t-tests. Instead of saying "there is a difference in miles-per-gallon based on the type of transmission," we must add that this difference is found "after accounting for displacement."
A list of tables organized by the terms in the model. For each term (categorical terms only, as splitting on a continuous variable is generally uninformative), the table describes all of the pairwise-comparisons possible.
An alternative set of summary statistics for ANOVA. Sums of squares, degrees of freedom, mean squares, and F value are all computed with Type III sums of squares, but for fully-between subjects designs you can set the type to I or II. This function adds to the output table the proportional reduction in error, an explicit summary of the whole model, separate formatting of p values, and is intended to match the output used in Judd, McClelland, and Ryan (2017).
supernova(fit, type = 3, verbose = TRUE)

## S3 method for class 'lm'
supernova(fit, type = 3, verbose = TRUE)

## S3 method for class 'lmerMod'
supernova(fit, type = 3, verbose = FALSE)
fit |
A model fit by lm() or lme4::lmer(). |
type |
The type of sums of squares to calculate (see generate_models()). |
verbose |
If TRUE, include a description of each part of the model in the output table. |
An object of the class supernova, which has a clean print method for displaying the ANOVA table in the console, as well as a named list:

tbl |
The ANOVA table as a data.frame. |
fit |
The original model fit. |
models |
Models created by generate_models(). |
Judd, C. M., McClelland, G. H., & Ryan, C. S. (2017). Data Analysis: A Model Comparison Approach to Regression, ANOVA, and Beyond (3rd ed.). New York: Routledge. ISBN: 978-1138819832
supernova(lm(mpg ~ disp, data = mtcars))

change_p_decimals <- supernova(lm(mpg ~ disp, data = mtcars))
print(change_p_decimals, pcut = 8)
stats::update() will perform the update in parent.frame() by default, but this can cause problems when the update is called by another function (so the parent frame is no longer the environment the user is in).
update_in_env(object, formula., ...)
object |
An existing fit from a model function such as lm(). |
formula. |
Changes to the formula – see update.formula() for details. |
... |
Additional arguments to the call, or arguments with changed values. Use name = NULL to remove the argument name. |
The updated model is returned.
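A minimal sketch of the situation this addresses; the wrapper function here is hypothetical:

```r
# a hypothetical wrapper around a model fit: stats::update() would try
# to re-evaluate the model call in this function's parent frame, where
# the original data may not be visible; update_in_env() re-evaluates
# the call in the environment the fit actually came from
drop_intercept <- function(fit) {
  update_in_env(fit, . ~ . - 1)
}

fit <- lm(mpg ~ hp, data = mtcars)
drop_intercept(fit)
```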
Extract the variables from a model formula
variables(object)

## S3 method for class 'supernova'
variables(object)

## S3 method for class 'formula'
variables(object)

## S3 method for class 'lm'
variables(object)

## S3 method for class 'lmerMod'
variables(object)
object |
The object to extract the variables from: a formula, a model fit by lm() or lme4::lmer(), or a supernova object. |

A list containing the outcome and predictor variables in the model.
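For example, assuming the package is attached, the variables can be pulled straight from a formula:

```r
# extract the variable names from a model formula; the returned list
# separates the outcome (left-hand side) from the predictors
variables(mpg ~ hp + disp)
```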