I received an email the other day. Before I share it, let me give a little background. This individual is trying to perform logistic regression using the following model:
log(p/(1-p)) = b0 + b1*x + b2*z + b3*x*z
Or, in R code:
model = glm(y~x*z, data=d, family=binomial)
Now, let’s see what this individual says:
We’ve got a significant interaction effect in our logistic regression. Now we want to break it down and interpret it using simple effects. In the attached handout on page 2, paragraph 2, (paragraph beginning “Simple slopes can be…”) the method describes testing the model 3 times with different scalings of the Z variable. One test should be with the Z variable scaled so that the mean is 0. The second test should be with the Z variable scaled at one SD above the mean. The third test is with the Z variable scaled at one SD below the mean. So the interaction effect is run 3 times. The subsequent pages show examples of this process.
The problem I’ve encountered is that when I do that, the Betas, Wald, Odds ratios etc are the same for all 3 of these tests for the interaction effect. I’ve missed something. Help is needed!
(First suggestion: avoid passive voice :))
And here is what the aforementioned handout says:
Simple slopes can be tested using a macro (e.g., Andrew Hayes’ macros for SPSS and SAS (http://afhayes.com/spss-sas-and-mplus-macros-and-code.html and http://processmacro.org/index.html)) or a computer method. In the computer method, the logistic model with the interaction is tested multiple times using different scalings for the Z variable. This method capitalizes on the fact that when Z is centered, the main effect for the X variable, β1, from the interaction model is a simple effect coefficient. It represents the effect of X when Z is equal to its mean, because the rescaled variable z has a value of 0 when Z is equal to its mean. Rescaling Z again, where the standard deviation (s_z) is subtracted, z_high = z - s_z, gives the simple effect coefficient and its significance test for X at one standard deviation above the mean of Z. The X variable’s scaling is unchanged, but the interaction must be computed anew, so that xz_high = z_high*x. The low-Z simple slope can be tested by rescaling Z again, this time adding one standard deviation to the mean of Z, where z_low = z + s_z, and then recalculating the interaction term.
So, in summary, the text this individual is following suggests that he needs to transform Z three different ways:
Or, in R code:
z_centered = z - mean(z)
z_low = z_centered - sd(z)
z_high = z_centered + sd(z)
Then compute the significance of the coefficients for each of these models:
Or, in R:
mod_centered = glm(y~x*z_centered, data=d, family=binomial)
mod_low = glm(y~x*z_low, data=d, family=binomial)
mod_high = glm(y~x*z_high, data=d, family=binomial)
And, I should probably be honest; I thought the answer to his question was obvious.
And, I was wrong.
Here’s what I started to write:
… he should have identical results. As they say in stats, “statistical models are invariant to linear transformations.” That’s just a fancy way of saying you can analyze, say, sexual aggression as-is, or you can analyze sexual aggression + 1, and you’re going to get the same results (though the intercept will be different). I’m not clear on what this paper is suggesting. If I didn’t know better, I would think it’s suggesting exactly what Damon did. But again, that’s a silly thing to do. I suspect what they meant is to test the X effect for data where Z is approximately 1 standard deviation above the mean, then do the same for where Z is approximately 1 standard deviation below the mean. If so, then your dataset for the two tests (±1 SD) will have smaller sample sizes than the full dataset.
But, I figured I would double-check and make sure I wasn’t deceiving myself. So, I simulated some data to verify what I was saying was correct. (This is for a regular regression, not logistic, but my conclusions would be the same either way):
require(tidyverse)
# simulate data
x = rnorm(100)
z = .2*x + rnorm(length(x), 0, sqrt(1.2^2))
y = .4*x + .3*z + .3*x*z + rnorm(length(x), 0, .75)
d = data.frame(y=y, x=x, z=z, z_low = z - sd(z), z_high = z + sd(z))
# fit the models
mod = lm(y~x*z, data=d)
mod_low = lm(y~x*z_low, data=d)
mod_high = lm(y~x*z_high, data=d)
# summarize p values in a table
p_centered = summary(mod)$coefficients[,4] %>% round(3)
p_low = summary(mod_low)$coefficients[,4] %>% round(3)
p_high = summary(mod_high)$coefficients[,4] %>% round(3)
It turns out the p-value for the interaction term is identical across the three models, but the p-values for the lower-order terms (the intercept, x, and z) are not. Rescaling Z doesn’t change the model’s fit; you’re not changing the R^2 (or even the semipartials), you’re just adjusting the values of the slopes/intercept. Since slopes/intercepts are inextricably tied to the actual values of the variables, changing the values of the variables will modify the slopes/intercepts. Since p-values test the deviation of each slope from zero, you will see different p-values for the slopes/intercept.
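To see what’s going on in a minimal setting, here’s a quick sketch in Python (plain least squares via numpy on simulated data; a parallel illustration, not my R simulation): rescaling z leaves the interaction coefficient untouched but shifts the coefficient on x.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 0.4*x + 0.3*z + 0.3*x*z + rng.normal(scale=0.75, size=n)

def ols_coefs(z_scaled):
    """Fit y ~ x * z_scaled by ordinary least squares; return [b0, b_x, b_z, b_xz]."""
    X = np.column_stack([np.ones(n), x, z_scaled, x * z_scaled])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_centered = ols_coefs(z - z.mean())
b_high = ols_coefs(z - z.mean() - z.std())  # "high" scaling: zero now means +1 SD

print(b_centered[3], b_high[3])  # interaction coefficient: identical
print(b_centered[1], b_high[1])  # x coefficient (the simple slope): different
```

Because shifting z is just a reparameterization of the same design, the interaction coefficient (and its test) is mathematically identical across the scalings, while the x coefficient becomes the simple slope at whatever value of Z now corresponds to zero.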
I find this approach very confusing, btw.
What do I suggest instead?
Just plot the thing. Who cares about significance?
require(flexplot)
flexplot(y~x | z, data=d, method="lm")
Simple Slopes Models in JASP/R
https://quantpsych.net/simple-slopes-models-in-jasp-r/
Tue, 01 Sep 2020 22:18:02 +0000
Yes, simple slopes….
So….apparently this is a thing.
But I’d never heard of it until a year ago. And I didn’t understand it until last week.
“Aren’t you a quantitative psychologist?” you say…
Why yes, yes I am.
“Isn’t that basic knowledge?”
Maybe. Maybe not. I suspect my unfamiliarity with simple slopes has little to do with my prowess at maintaining a pulse on quantpsych. Rather, people like to call different things by different names.
(Don’t get me started on SPSS’s “hierarchical regression models.”)
When do you use simple slopes?
Good question!
You (typically) do a simple slopes analysis after you’ve detected a statistically significant interaction. (I almost vomited writing the words ‘statistically significant’). In other words, interactions are an afterthought, at best.
(Maybe I’ve never heard of simple slopes because, to me, interactions are never an afterthought).
Let’s look at an example, shall we? Below is an ANOVA summary table of the avengers dataset:
                  Df    Sum Sq   Mean Sq  F value  Pr(>F)
speed              1 228840.41 228840.41 624.0194       0
superpower         1 121771.09 121771.09 332.0546       0
speed:superpower   1  83506.39  83506.39 227.7116       0
Residuals        808 296309.80    366.72       NA      NA
My oh my, how significant that interaction is.
Once our results tell us something we should have predicted in advance (i.e., an interaction is present), we now want to know the nature of the interaction. What does it look like?
That’s where simple slopes comes in.
A simple slopes analysis simply computes predictions for various levels of the data. So, we might try to see how our model predicts shots.taken (our outcome variable) for a few different values of speed and for each level of superpower (yes versus no). That might look like this (btw, this is a bad way of doing this, so don’t….):
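In code, the computation is nothing fancy. Here’s a sketch in Python (the coefficients are made up for illustration, not estimated from the avengers data):

```python
import numpy as np

rng = np.random.default_rng(1)
speed = rng.normal(loc=5.0, scale=1.5, size=400)  # made-up speed scores

# hypothetical coefficients: intercept, speed, superpower, speed:superpower
b0, b1, b2, b3 = 10.0, 2.0, 5.0, -3.5

def predict(speed_val, superpower):
    """Predicted shots.taken for one speed value and superpower status (0/1)."""
    return b0 + b1*speed_val + b2*superpower + b3*speed_val*superpower

# evaluate the model at just three speeds: -1 SD, the mean, +1 SD
levels = [speed.mean() - speed.std(), speed.mean(), speed.mean() + speed.std()]
for sp in (0, 1):
    print(sp, [round(predict(v, sp), 2) for v in levels])
```

With these made-up numbers, predictions rise with speed for one group (slope b1) and fall for the other (slope b1 + b3); a simple slopes table just reports the model at those three hand-picked speeds.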
So, apparently, more speed means more shots taken for non-superheroes and fewer shots taken for superheroes.
But, this is problematic.
Why?
Why Simple Slopes Models are Problematic
Well, there are worse things one could do. But this model takes a continuous variable (speed) and condenses all that rich information into just three values (-1 SD, mean, +1 SD). It’s not necessary. Instead, why not just plot a regular-old scatterplot (and maybe have different panels for superheroes and non-superheroes)?
By the way, this is exactly what flexplot does: where possible, it maintains the continuous nature of the data. There’s no funky coding we have to do. We don’t have to save predictions to excel then export them to another program. It just does this naturally.
Let’s go ahead and look at how to do it in JASP first, then I’ll show you how to do it in R.
Computing simple slopes with JASP’s Visual Modeling Module
It’s really so easy to do this, it shouldn’t require a tutorial. But, I’ll oblige.
If you haven’t added the Visual Modeling module yet, do so by pressing the + button at the top right:
then checking the Visual Modeling Module:
(sorry… my version is in Dutch for some reason… don’t ask :))
Now, import your dataset, then click on Visual Modeling at the top, and select “linear modeling” (second option)
Now select your predictors and your outcome:
Then you’ll have to add your interaction term under the “Model Terms” menu (or Modeltermen if you’re Dutch :)). To do so, select both predictor variables, then click the right arrow:
Then look to the right.
There’s your simple slopes graphic, except it doesn’t compress the data like the first figure did. And, it includes raw data so you can know if your model actually fits.
Which it doesn’t.
Let’s go ahead and add a polynomial term by first clicking “Add as a polynomial” on the speed variable:
Then click on the “Visual fitting” pane and select “Quadratic” from the menu:
And now our model has a polynomial term:
For more information on visualizing in JASP, see my article on JASP’s website or my YouTube playlist
Computing simple slopes in R with Flexplot
This is all quite easy in R, if you’re familiar with it:
require(flexplot)
data(avengers)
model = lm(shots.taken~speed+superpower + I(speed^2) + speed:superpower, data=avengers)
visualize(model, plot="model", ghost.line="gray")
For more information on doing this in R, see my Flexplot manual.
Automating Images Inserts From Screencaptures with R Studio!
https://quantpsych.net/automating-images-inserts-from-screencaptures-with-r-studio/
Mon, 31 Aug 2020 17:32:20 +0000
I love writing in rmarkdown. What I love most about it is the ability to have a one-stop shop for generating text/R code/output.
That works well and good for most of everything I do, except for when I’m trying to show users how to use JASP. I have found it extremely tedious to write about how to use the JASP interface, make a screencapture, save the screencapture to the appropriate project folder, search how to insert an image in rmarkdown, then finally type the image path.
But, oops, I misspelled the image name, so now I have to do that weird kinda-sorta double-clicky thingy on Mac to highlight the name of the file, then copy, then paste. But, oops! There are 100 images in there and I accidentally selected the wrong one.
So now I have to search through all those images to find the right one.
Yeah, it’s tedious and frustrating.
For this reason, I decided that my stats book would be R, then maybe I could tackle a JASP version.
That was, of course, until I read a blog post from Andrew Heiss about converting plain-old text into markdown-formatted text. To do so, he used Apple’s Automator.
I’d heard of Automator, but never got around to seeing how it would benefit me.
“I’m paid for the grand prowess of my enormous brain,” I had thought. “I’m so important none of my job requires automation.”
Oh how wrong I was.
After Andrew’s post, and knowing his brain’s way more prowessy than mine, I figured I ought to dive into it. Oddly enough, I ended up learning to automate through Keyboard Maestro. (Don’t ask me why I went with that). But, I’m sure the same can be done in automator, with a few modifications.
The basic idea
Here’s what I envisioned I could do:
- Change the default screenshot directory to the directory where my rmarkdown file is stored.
- Take a screenshot.
- Copy the file name of the screenshot I just made to my clipboard, along with the proper rmarkdown tag.
- Paste the proper code to my markdown document.
Care to see an example?
It seemed to me that I needed to come up with two macros: one that would easily allow me to specify a folder to dump my screenshots, that way I can use relative references in rmarkdown. The second macro would then search that folder for the newest screenshot.
That was easy enough to do with scripting:
To make it easier to copy and paste, here’s the AppleScript text:
set frontApp to (path to frontmost application as text)
if frontApp does not end with "Finder.app:" then
tell application frontApp
set dir to POSIX path of (choose folder with prompt "Choose Folder")
end tell
else
tell application "System Events"
activate
set dir to POSIX path of (choose folder with prompt "Choose Folder")
set frontmost of application process "Finder" to true
end tell
end if
And here are the terminal commands:
mkdir -p "$KMVAR_File_Path/screenshots"
defaults write com.apple.screencapture location "$KMVAR_File_Path/screenshots"
defaults write com.apple.screencapture name "$KMVAR_dirname"
killall SystemUIServer
Not too bad, eh?
Now the second macro needs to be able to identify the most recent screencapture, copy its relative file path, then paste it (with proper syntax) into R Studio.
Here’s the macro:
And voila! It works beautifully.
So now, all I have to do is
- Type ::screenshot to specify the directory (do once per project)
- Make a screenshot as you normally would (command-shift-4)
- Wait 5 seconds (it takes some time for the OS to dump the screencapture in the folder)
- Type option-4 to paste the rmarkdown tag into R Studio
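For the curious, the guts of that second macro boil down to a couple of shell commands (a sketch of the idea, assuming the screenshots folder from the first macro; the actual macro lives in Keyboard Maestro):

```shell
# most recently modified file in the screenshots folder
newest=$(ls -t screenshots | head -n 1)

# build the rmarkdown image tag with a relative path
printf '![](screenshots/%s)\n' "$newest"
```

Paste that printed tag into the .Rmd file and the image renders with a relative reference, so the project stays portable.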
I have a feeling this will save me hundreds of hours.
Undeniable proof GLMs can run t-tests!!
https://quantpsych.net/undeniable-proof-glms-can-run-t-test/
Wed, 19 Aug 2020 14:51:06 +0000
There are two possibilities: either I’m crazy, or everyone else is wrong.
Well, I’m ’bout to prove I’m not wrong.
Yes, I know it’s hard for people to accept we should abandon the standard stats curriculum in favor of the GLM.
But you can’t deny you’ll get identical results doing a t-test as a GLM.
“Prove it!,” you say?
Well, I am happy to oblige.
Let’s go ahead and run a regular old t-test:
require(flexplot)
data(avengers)
t.test(ptsd~north_south, data=avengers, var.equal=TRUE)
##
##  Two Sample t-test
##
## data:  ptsd by north_south
## t = -8.195, df = 810, p-value = 9.755e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.4112410 -0.2523058
## sample estimates:
## mean in group north mean in group south
##            3.834729            4.166502
(Note the default t-test in R, Welch’s, does a correction which will make it not equal to a regression, hence the var.equal=TRUE. Thanks for the reminder, Michael!)
Now let’s do the same thing as a glm:
mod = lm(ptsd~north_south, data=avengers)
summary(mod)
##
## Call:
## lm(formula = ptsd ~ north_south, data = avengers)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.0347 -0.3665 -0.0347  0.3653  3.2335
##
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)       3.83473    0.02863 133.954  < 2e-16 ***
## north_southsouth  0.33177    0.04048   8.195 9.76e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5768 on 810 degrees of freedom
## Multiple R-squared: 0.07656, Adjusted R-squared: 0.07542
## F-statistic: 67.16 on 1 and 810 DF, p-value: 9.755e-16
Oh, would you look at that. The t-statistic for the t-test (-8.195) is exactly the same as the t-statistic for the “slope” in the linear model (8.195), at least in absolute value. (The t-test version subtracts south from north, while the glm version does the opposite.)
It’s. The. Same. Thing.
Bazinga.
Except the GLM approach gives you more than the t-test and is easily expandable; the t-test is NOT.
And, of course, we might as well look at a graphic of the glm (which you cannot do with the t-test, btw).
visualize(mod, plot="model")
Objections to the GLM Approach to Statistics Curriculum
https://quantpsych.net/objections-to-the-general-linear-model-approach-to-statistics-curriculum/
Tue, 18 Aug 2020 00:08:40 +0000
I recently shared a video with an old friend who is an immunologist and statistics teacher. (That video will be published shortly, btw). In that video I argue we need to shift to a general linear model approach to teaching statistics. Here’s an old video that explains what I mean by teaching stats as a GLM:
Not surprisingly, she was skeptical. She sent me an email with her objections (and points of agreement). I figured I’d share it with y’all. (Her words are italicized; mine are indented.)
Okay, so here’s the things I thought.
Overwhelmingly, you and I feel exactly the same about pedagogy. It should be teaching concepts and connections, because memorization is pointless, particularly in the age of Google. So, things I agree with:
1) Don’t memorize. I don’t give tests, but when I did, I allowed a 3×5 that they could write all the memorization stuff (like formulas) on.
Nice to see we agree! Although, it’s not the memorization I find problematic, per se. It’s the fact that it requires too much intellectual effort to actually run the analysis. By the time they actually figure out what analysis to run and run the thing, there’s no energy left to interpret the results. Imagine if, in order to unlock your phone, you had to enter a password, enter your mother’s date of birth, swipe a fingerprint scanner, do 30 jumping jacks, then eat a pint-sized bowl of cereal. Only then would your phone unlock. That’s a massive human-factors nightmare. Software (and deciding which analyses to run) should not get in the way of giving us what we want (results). I’m just shortcutting the time from opening software to interpreting results.
2) It’s not a protocol/procedure. It’s a science that is changing constantly, and teaching it like a protocol is silly. Of course people will learn in chunks and not connect. I use several main concepts, like the ratio, to connect everything. We talk about linear regression that way (how could you not, with an F?).
Yes, agreed. I actually wrote a paper about this: I propose an eight-step approach to data analysis. It’s not a procedure, it’s a framework. Here’s the link, in case you’re interested:
btw, it has Harry Potter references and a birthday cake metaphor. Some of my best work.
…
4) Once the concepts are solid, use software. Don’t endlessly make people calculate z scores.
YES! I used to spend hours doing hand calculations in front of students. What a waste of time!
…
Things I had concerns about
1) We also don’t teach biology to nonbiologists in the way experts think about it. To experts, all biology is inclusive fitness, or what makes a population most likely to reproduce (to genes to be passed on). Complicated fields, like Immunology, make so much more sense when you realize it’s all about many cells competing for resources to be the ones to survive and generate copies of themselves. It’s a truly elegant system. But learning about each system in and of itself, in little chunks, is quite difficult without the interwoven themes. BUT – and here’s where I have trouble – It is easier, and more practical, to teach it how we teach it. The students don’t get overwhelmed, and the ones that aren’t too bright don’t get totally lost.
Good point. I too have wondered whether learning in discrete chunks is necessary before one comes to see things as interrelated. But, I don’t think that’s the case with statistics, or at least the discrete chunks we currently use aren’t serving us well. Case in point: you (and many others) seem to struggle with the idea that everything is just the linear model when it is, verifiably, all the linear model. Again, this is not intended as an insult or condescension; it’s just the way you were taught. The fact that it’s so hard for you (and others) to accept says, to me at least, the existing curriculum isn’t serving us well.
After writing that, it sounds like I’m being harsh and argumentative. I’m not. Just making the point that the way we currently teach requires students to make a really hard mental transition, one that is entirely unnecessary.
2) At the end of the day, sometimes we need a yes/no, actually, usually we need a yes/no. We make a cutoff, even though we KNOW we are losing information, because we have to generate something that is manageable, understood quickly by others in the field, and summarized as an asterisk in a paper so it doesn’t take up too much space.
I agree. But, sometimes is different than every time. Yes, sometimes making binary decisions is best. You’ll see in some videos in the coming weeks I demonstrate situations where I have to make binary decisions (in this case, whether we should keep an interaction term in our model). However, sometimes we do need to know something about the degree. The standard stats curriculum says little (though not nothing) about that. Mine does
I don’t disagree your way makes sense; however, it requires a LOT of people changing how they think about statistics.
Totally. But, are you familiar with the “replication crisis”? That too requires a LOT of people changing how they think about research. But it’s happening (at least in psychology). It’s a hard change, but things are changing.
And, just because it’s hard, doesn’t mean it’s not necessary
You’re going to have to have buyin from thousands or perhaps millions and you’ll have to convince journals to give more space to statistics.
I’d recommend reading the article I mentioned above. What I recommend actually doesn’t add that much, and what it does add is much more informative than a table of pvalues. Yes there is a cost, but the benefits far outweigh the costs.
3) You’ll have to forgive my piecemeal training and the fact that I am not even close to as much of an expert as you – but nonparametrics? How do you deal with that?
Great question! But, I need to clarify some things. How I handle messy data is very different now than when I handled it as a biostatistician. From what I remember, it’s very common to handle messy models with Mann-Whitneys or Friedman tests (among others). From what I’ve read (e.g., https://psycnet.apa.org/record/2008-14338-002), these are very dated ways of handling messy models. In fact, I don’t even use “modern” robust procedures (as that article advocates). Instead, rather than removing the parametric from statistics, I just assume a non-normal distribution. If a biomarker is super skewed and zero-inflated, maybe I’ll model it as a gamma or Poisson or zero-inflated model. In other words, rather than sweep the messiness under the rug as nonparametric procedures do, I’d rather model that messiness, except I model it with generalized linear models. Well, generalized linear models are just extensions of general linear models, so it’s an easy transition to make. (Although I don’t teach generalized linear models until their second statistics class.)
Hacking time-series data with Flexplot
https://quantpsych.net/hacking-time-series-data-with-flexplot/
Thu, 13 Aug 2020 14:35:12 +0000
I received an email from somebody the other day who’s learning the basics of flexplot. Here’s her question:
I have time series data for reading achievement at four school grades. I am fitting nonlinear latent growth curve models to map reading achievement over time.
In order to plot these data using flexplot (in R), I have cheated a bit by making the time points one variable and the reading achievement data a second variable. The idea is to plot the data points over time, along with the means and SDs.
So this works fine, flexplot reads the timepoints as categories and gives all the data points (jittered) plus means and SDs at each time. I was just wondering whether it is possible to join up the means at each timepoint in a horizontal line so that the nonlinear trend is clearer?
The beauty of flexplot is that all flexplot graphics are ggplot2 objects. Sooooo…..
All one has to do is figure out how you would do that in ggplot2 and layer that onto the flexplot object, like so:
require(flexplot)
require(ggplot2)
data(avengers)
### convert injuries to factor (just to fit your example)
avengers$injuries = factor(avengers$injuries, ordered=T)
plot = flexplot(ptsd~injuries, data=avengers)
plot + stat_summary(fun=median, colour="red", geom="line", aes(group=1))
And you end up getting a lovely plot, like so:
]]>
https://quantpsych.net/hackingtimeseriesdatawithflexplot/feed/
0

Some thoughts on interaction terms
https://quantpsych.net/somethoughtsoninteractiontermscoefficientssemipartialcorrelationsandstandardizedbetasgreaterthanone/
https://quantpsych.net/somethoughtsoninteractiontermscoefficientssemipartialcorrelationsandstandardizedbetasgreaterthanone/#respond
Wed, 05 Aug 2020 14:54:15 +0000
https://quantpsych.net/?p=184
I recently received an email from a student. The cool thing about answering questions is it gives you a chance to learn something you knew but didn’t know you knew (see my response to Questions 3/4 below).
Here’s the email (her email is italicized and my responses are bolded):
I am running the following model:
y~b0+(b1)x1+(b2)x2+(b3)x1*x2
x1 is continuous and x2 is categorical (gender, two levels OR race, three levels)
I’m interested in the interaction term.
I have been using the summary function to get the estimate and p-value for the interaction term. My questions are as follows:
…
Q2: Does the magnitude of standardized beta for an interaction term really mean anything interesting when the moderator is categorical? In my head it means that for every SD increase in the interaction term the DV changes by “beta” SDs. Correct? The sign of it clearly means something (whether the relationship b/w y and x1 gets more pos or more neg when you change groups), but the actual value does not seem meaningful. Is that right?
Let’s do some math:
y = b0 + b1*x + b2*group + b3*group*x
Let’s assume we’re the referent group (i.e., group = 0):
y = b0 + b1*x + b2*(0) + b3*(0)*x
y = b0 + b1*x
And for the other group:
y = b0 + b1*x + b2*(1) + b3*(1)*x
y = (b0 + b2) + (b1 + b3)*x
So, b2 is the difference between groups in the intercept, and b3 is the difference between groups in the slope. If it’s a standardized beta, b3 represents the difference in the amount of standard deviations y changes for a standard deviation change in x (relative to the referent group). Make sense?
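The algebra is easy to verify numerically. Here’s a sketch in Python (least squares via numpy; the data and coefficients are made up, and noise-free so the fits are exact):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
group = rng.integers(0, 2, size=n)

# hypothetical true coefficients: b0, b1 (x), b2 (group), b3 (interaction)
b0, b1, b2, b3 = 1.0, 0.5, -0.3, 0.8
y = b0 + b1*x + b2*group + b3*group*x  # no noise

# fit the full interaction model
X = np.column_stack([np.ones(n), x, group, group*x])
coefs = np.linalg.lstsq(X, y, rcond=None)[0]

# fit the slope separately within each group
slope0 = np.polyfit(x[group == 0], y[group == 0], 1)[0]
slope1 = np.polyfit(x[group == 1], y[group == 1], 1)[0]

print(coefs[3], slope1 - slope0)  # b3 equals the difference in slopes
```

The interaction coefficient from the full model matches the referent group’s slope subtracted from the other group’s slope, exactly as the derivation says.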
Q3: Relatedly, is it better to just report the semipartial Rsquared for the interaction term? What is the most informative estimate here?
That’s a tough question to answer. I’d probably be inclined to report the standardized beta and interpret it correctly in the paper (i.e., as a difference in slopes). The semipartial R-squared isn’t as standardized as you would think. Suppose you have two models that have slopes that deviate by exactly the same amount (i.e., the interaction betas are equivalent). Also suppose the first model has a large “main effect” (e.g., there’s generally a positive slope for x), and suppose the second model has no main effect (i.e., there’s a crossover interaction, producing an average main effect of zero). The semipartial will partition the explained variance. In the first model (the one with a large main effect), much of that variance is going to be sucked up by the main effect, and so the interaction effect is going to appear quite small. On the other hand, almost all of the explained variance for the second model is going to be given to the interaction. Remember, these two models have identical-sized interactions, yet the semipartials are going to be very different. Make sense?
Q4: Can the magnitude of a standardized beta for an interaction term be above 1? This is what the internet seems to say but it is confusing. Some of my betas from summary(mod) are >1, which is why I initially went down this rabbit hole.
I suppose it could. Because b3 is the difference in slopes, you could have a strongly negative slope for group 1 (e.g., b1 = -0.7) and a strongly positive slope for group 2 (e.g., +0.8). To make this happen, your interaction term must be greater than one (it would be 1.5 in this case).
My response makes sense, but I figured I’d actually simulate this to make sure it makes sense.
## simulate same random normal data for both conditions
n = 300
set.seed(1212)
x = rnorm(n)
y = rnorm(n)
g = sample(c(1,0), size=n, replace=T)
## one model with a strong main effect
y_strong = .7*x + 1*g + .3*x*g + y
## one model with a weak main effect, but an identically sized interaction term
y_weak = .3*x + 1*g + .3*x*g + y
## combined into data frame
d = data.frame(x=x, y=y, g=g, y_strong=y_strong, y_weak=y_weak)
d$g = as.factor(d$g)
Now let’s visualize them:
## visualize them
require(flexplot)
a = flexplot(y_strong~x + g, data=d, method="lm")
b = flexplot(y_weak~x + g, data=d, method="lm")
cowplot::plot_grid(a,b)
If we look at the models, the coefficients for the interaction are identical (as they should be):
mod_strong = lm(y_strong~x*g, data=d)
mod_weak = lm(y_weak~x*g, data=d)
coef(mod_strong)
## (Intercept)           x          g1        x:g1
##  0.06460258  0.62464595  0.95230473  0.53946943
coef(mod_weak)
## (Intercept)           x          g1        x:g1
##  0.06460258  0.37535405  0.95230473  0.53946943
Now, let’s look at the semipartials:
estimates(mod_strong)$semi.p
## Note: I am not reporting the semipartial R squared for the main effects because an interaction is present. To obtain main effect sizes, drop the interaction from your model.
## Note: You didn't choose to plot x so I am inputting the median
##          x          g        x:g
## 0.36870573 0.11421068 0.03484555
estimates(mod_weak)$semi.p
## Note: I am not reporting the semipartial R squared for the main effects because an interaction is present. To obtain main effect sizes, drop the interaction from your model.
## Note: You didn't choose to plot x so I am inputting the median
##          x          g        x:g
## 0.01169347 0.17879960 0.05455156
Notice that the semipartials are different: the interaction’s semipartial is larger in the model with the weak main effect. Also, proportionally, the interaction accounts for 0.03/0.518 = 0.058 of the explained variance in the strong main effect model, while it accounts for 0.054/0.245 = 0.22 in the weak main effect model. In other words, the semipartial for the model with a weak main effect seems larger than the one with the strong main effect. Once again, this is because the semipartial assigns chunks of explained variance to each component. Though the interactions are identical in absolute value, in relative value the one paired with the weak main effect seems much stronger.
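The proportions quoted above can be computed directly from the semipartial output. A minimal sketch, assuming (as the printed output suggests) that `estimates()$semi.p` returns a named numeric vector:

```r
## share of total explained variance attributable to the interaction
sp_strong = estimates(mod_strong)$semi.p
sp_weak = estimates(mod_weak)$semi.p
sp_strong["x:g"] / sum(sp_strong)  ## roughly 0.058 in the strong main effect model
sp_weak["x:g"] / sum(sp_weak)      ## roughly 0.22 in the weak main effect model
```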
Mapping Graphics to Common Statistical Analyses using Flexplot
https://quantpsych.net/mappinggraphicstocommonstatisticalanalysesusingflexplot/
Tue, 30 Oct 2018 20:33:32 +0000
The General Linear Model
I’m not a fan of how introductory statistics is taught, in decision-tree cookbook fashion where students have to memorize which analysis is most appropriate for which circumstance. I think a much better way to teach statistics (and a much better way to think of it) is to teach the general linear model (GLM). The GLM doesn’t care whether there are one or two predictor variables, whether the variables are quantitative or qualitative, or how many levels each variable has. All it cares about is which variable is the outcome and which are the predictors. (And the GLM can even handle multiple outcomes as well!)
Methodologists, as far as I know, have no “name” for an analysis with two categorical predictors and two quantitative predictors. Instead, we simply plug those variables into the equation:
\(Y = A + B + X + Z\)
where A/B are categorical and Z/X are numeric. The computer (and the mathematics) doesn’t care what we call it, and no decision tree is required (other than one that specifies which predictors are numeric and which are categorical).
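In R, a model with this mix of predictors needs no special name; `lm()` handles it in one call, provided the categorical predictors are coded as factors. A minimal sketch with made-up data (the variable names mirror the equation above; the coefficients are arbitrary):

```r
set.seed(1)
n = 200
A = factor(sample(c("a1", "a2"), n, replace = TRUE))  ## categorical
B = factor(sample(c("b1", "b2"), n, replace = TRUE))  ## categorical
X = rnorm(n)                                          ## numeric
Z = rnorm(n)                                          ## numeric
Y = 1*(A == "a2") - .5*(B == "b2") + .5*X + .3*Z + rnorm(n)
## one call covers what a decision tree splits into ANOVA, ANCOVA, regression, etc.
summary(lm(Y ~ A + B + X + Z))
```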
Shouldn’t plotting be like that as well? Shouldn’t we just have to tell a computer our outcome variable (i.e., what’s on the Y axis) and what predictor variables we have? Then shouldn’t the computer figure out for us how to plot it?
That’s the idea behind flexplot. Flexplot is a common language for plotting where the user simply specifies what the outcome is and what the predictors are. The computer then decides the optimal way of plotting the variables. However, the user does have flexibility to decide which variables are going to be on the X axis, which will be plotted as separate lines/symbols/colors, and which will be panelled.
In this article, I’m going to show you how to use flexplot to graph the most common sorts of analyses in psychology.
Preliminaries
I’m just going to load a dataset that I’m going to use throughout this post. I made these data up a few years ago and they’re good for showing just about any sort of analysis you might be interested in. Essentially, it simulates data where participants were randomly assigned to different therapies for weight loss (behaviorist versus cognitive therapy). The dataset also contains other variables, such as motivation scores and income.
So with that, let’s load the fifer package, as well as the exercise datasets:
require(fifer)
require(ggplot2)
### load the "exercise data" dataset
data("exercise_data")
### rename the exercise dataset (to make it easier to call objects within the dataset)
d = exercise_data
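Throughout this post, flexplot’s formula interface follows a fixed convention: the first predictor goes on the X axis, the second becomes separate lines/symbols/colors, and anything after `|` gets panelled. A quick sketch of my own (the column names `motivation` and `therapy.type` are assumed from the dataset description above):

```r
## therapy.type as separate lines/colors
flexplot(weight.loss~motivation + therapy.type, data=d)
## therapy.type panelled instead
flexplot(weight.loss~motivation | therapy.type, data=d)
```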
Independent T-Test/ANOVA
In this situation, we have a grouping variable (e.g., treatment versus control, male versus female, low/medium/high medication) and we want to see how scores on the dependent variable vary as a function of group. To do so, we can use a “median dot plot” as I call them (or a mean dot plot, if you choose to report the mean instead of the median). The median is shown as a large red dot, along with interquartile ranges. I prefer nonparametric versions (i.e., medians/IQRs) rather than means/standard errors, just in case the data are not normally distributed. The “scores” for the rewards/no-rewards conditions have been “jittered,” which just means that noise has been added so they don’t overlap as much.
flexplot(weight.loss~rewards, data=d)