Here’s the email (her email is italicized and my responses are bolded):

I am running the following model: y ~ b0 + b1*x1 + b2*x2 + b3*x1*x2. x1 is continuous and x2 is categorical (gender, two levels, OR race, three levels). I'm interested in the interaction term.

I have been using the summary function to get the estimate and p-value for the interaction term. My questions are as follows: …

Q2: Does the magnitude of standardized beta for an interaction term really mean anything interesting when the moderator is categorical? In my head it means that for every SD increase in the interaction term the DV changes by "beta" SDs. Correct? The sign of it clearly means something (whether the relationship b/w y and x1 gets more pos or more neg when you change groups), but the actual value does not seem meaningful. Is that right?

Q3: Relatedly, is it better to just report the semi-partial R-squared for the interaction term? What is the most informative estimate here?

Q4: Can the magnitude of a standardized beta for an interaction term be above 1? This is what the internet seems to say but it is confusing. Some of my betas from summary(mod) are >1, which is why I initially went down this rabbit hole.

My response made sense to me, but I figured I'd actually simulate this to make sure.



```r
## simulate the same random normal data for both conditions
n = 300
set.seed(1212)
x = rnorm(n)
y = rnorm(n)
g = sample(c(1,0), size=n, replace=T)
## one model with a strong main effect
y_strong = .7*x - 1*g + .3*x*g + y
## one model with a weak negative main effect, but an identically sized interaction term
y_weak = -.3*x - 1*g + .3*x*g + y
## combine into a data frame
d = data.frame(x=x, y=y, g=g, y_strong=y_strong, y_weak=y_weak)
d$g = as.factor(d$g)
```

Now let’s visualize them:

```r
## visualize them
require(flexplot)
a = flexplot(y_strong~x + g, data=d, method="lm")
b = flexplot(y_weak~x + g, data=d, method="lm")
cowplot::plot_grid(a, b)
```

If we look at the models, the coefficients for the interaction are identical (as they should be):

```r
mod_strong = lm(y_strong~x*g, data=d)
mod_weak = lm(y_weak~x*g, data=d)
coef(mod_strong)
## (Intercept)           x          g1        x:g1 
## -0.06460258  0.62464595 -0.95230473  0.53946943
coef(mod_weak)
## (Intercept)           x          g1        x:g1 
## -0.06460258 -0.37535405 -0.95230473  0.53946943
```
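As a quick sanity check (a minimal sketch that regenerates similar data, not the exact draw above): since y_strong and y_weak differ only by 1.0*x, OLS linearity guarantees the fitted coefficients match everywhere except the x slope.

```r
## rebuild the simulation, then difference the two coefficient vectors.
## Because y_strong - y_weak = 1.0*x exactly (same noise, same groups),
## every coefficient must match except the x slope, which differs by 1.
set.seed(1212)
n = 300
x = rnorm(n); e = rnorm(n); g = sample(c(1,0), size=n, replace=TRUE)
y_strong = .7*x - 1*g + .3*x*g + e
y_weak = -.3*x - 1*g + .3*x*g + e
d2 = data.frame(x=x, g=factor(g), y_strong=y_strong, y_weak=y_weak)
round(coef(lm(y_strong ~ x*g, data=d2)) - coef(lm(y_weak ~ x*g, data=d2)), 8)
## only the x entry is nonzero (exactly 1); the interaction difference is 0
```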

Now, let’s look at the semi-partials:

```r
estimates(mod_strong)$semi.p
## Note: I am not reporting the semi-partial R squared for the main effects
## because an interaction is present. To obtain main effect sizes, drop the
## interaction from your model.
## Note: You didn't choose to plot x so I am inputting the median
##          x          g        x:g 
## 0.36870573 0.11421068 0.03484555
estimates(mod_weak)$semi.p
## Note: I am not reporting the semi-partial R squared for the main effects
## because an interaction is present. To obtain main effect sizes, drop the
## interaction from your model.
## Note: You didn't choose to plot x so I am inputting the median
##          x          g        x:g 
## 0.01169347 0.17879960 0.05455156
```

Notice that the semi-partials are different: the one from the weak-main-effect model is actually larger. Also, proportionally, the interaction's semi-partial in the strong main effect model is 0.035/0.518 ≈ 0.067 of the explained variance, while in the weak main effect model it is 0.055/0.245 ≈ 0.22. In other words, the semi-partial for the interaction looks much larger in the model with the weak main effect. Once again, this is because the semi-p assigns chunks of variance explained to each component. Though in *absolute* terms the interaction coefficients are identical, in *relative* terms the interaction in the weak-main-effect model seems much stronger.
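Those proportions are easy to reproduce directly from the (less-rounded) semi-partial output printed above:

```r
## semi-partial R^2 values from the two models above
strong = c(x = 0.3687, g = 0.1142, xg = 0.0348)
weak = c(x = 0.0117, g = 0.1788, xg = 0.0546)
## share of the total explained variance attributable to the interaction
strong["xg"] / sum(strong)  ## ~0.067
weak["xg"] / sum(weak)      ## ~0.22
```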


I'm not a fan of how introductory statistics is taught: in decision-tree cookbook fashion, where students have to memorize which analysis is most appropriate for which circumstance. A much better way to teach statistics (and a much better way to think about it) is to teach the general linear model (GLM). The GLM doesn't care whether there is one predictor variable or two, whether the variables are quantitative or qualitative, or how many levels each variable has. All it cares about is which variable is the outcome and which are the predictors. (And the GLM can even handle multiple outcomes!)

Methodologists, as far as I know, have no “name” for an analysis with two categorical predictors and two quantitative predictors. Instead, we simply plug those variables in to the equation:

\(Y = A + B + X + Z\)

where A/B are categorical and Z/X are numeric. The computer (and the mathematics) don’t care what we call it and the computer doesn’t require a decision tree (other than one that specifies which predictors are numeric and which are categorical).
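To make that concrete, here is a hypothetical sketch (the variable names A, B, X, Z are just the placeholders from the equation above): `lm()` fits this unnamed analysis the same way it fits any other, as long as the categorical predictors are factors.

```r
## hypothetical data mirroring Y = A + B + X + Z:
## two categorical predictors (A, B) and two numeric ones (X, Z)
set.seed(1)
n = 100
A = factor(sample(c("a1", "a2"), n, replace = TRUE))
B = factor(sample(c("b1", "b2", "b3"), n, replace = TRUE))
X = rnorm(n)
Z = rnorm(n)
Y = rnorm(n)
## one formula, no decision tree: lm() dummy-codes the factors itself
coef(lm(Y ~ A + B + X + Z))
```

The fitted model has six coefficients: an intercept, one dummy for A, two dummies for B, and one slope each for X and Z.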

Shouldn’t plotting be like that as well? Shouldn’t we just have to tell a computer our outcome variable (i.e., what’s on the Y axis) and what predictor variables we have? Then shouldn’t the computer figure out for us how to plot it?

That's the idea behind flexplot. Flexplot is a common language for plotting where the user simply specifies what the outcome is and what the predictors are. The computer then decides the optimal way of plotting the variables. However, the user does have the flexibility to decide which variables go on the X axis, which are plotted as separate lines/symbols/colors, and which are panelled.

In this article, I’m going to show you how to use flexplot to graph the most common sorts of analyses in psychology.

I’m just going to load a dataset that I’m going to use throughout this post. I made these data up a few years ago and they’re good for showing just about any sort of analysis you might be interested in. Essentially, it simulates data where participants were randomly assigned to different therapies for weight loss (behaviorist versus cognitive therapy). The dataset also contains other variables, such as motivation scores and income.

So with that, let’s load the fifer package, as well as the exercise datasets:

```r
require(fifer)
require(ggplot2)
### load the "exercise data" dataset
data("exercise_data")
### rename the exercise dataset (to make it easier to call objects within the dataset)
d = exercise_data
```

In this situation, we have a grouping variable (e.g., treatment versus control, male versus female, low/medium/high medication) and we want to see how scores on the dependent variable vary as a function of group. To do so, we can use a “median dot plot” as I call them (or a mean dot plot, if you choose to report the mean instead of the median). The median is shown as a large red dot, along with interquartile ranges. I prefer nonparametric versions (i.e., medians/IQRs) rather than means/standard errors, just in case the data are not normally distributed. The “scores” for the rewards/no-rewards conditions have been “jittered,” which just means that noise has been added so they don’t overlap as much.

flexplot(weight.loss~rewards, data=d)
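For comparison, here is a rough ggplot2 sketch of the same sort of "median dot plot." This is only an approximation of what flexplot draws, reusing the data frame `d` loaded above:

```r
## approximate a median dot plot by hand: jittered raw scores, a red
## median dot, and error bars at the 25th/75th percentiles (the IQR)
library(ggplot2)
ggplot(d, aes(x = rewards, y = weight.loss)) +
  geom_jitter(width = .1, alpha = .4) +
  stat_summary(fun = median, geom = "point", color = "red", size = 4) +
  stat_summary(fun.min = function(z) quantile(z, .25),
               fun.max = function(z) quantile(z, .75),
               geom = "errorbar", width = .2, color = "red")
```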