One interesting and very useful aspect of dummy coding is the ability to use these dichotomously coded variables as predictors in a regression model.
A regression model does just that: it models, or represents, something.
Let’s investigate a hypothetical model of stress. We want to see if biofeedback training and gender have an effect on stress scores. This is a very simple model. A more detailed discussion and model specification were presented in our Data Cleaning webinar from February 2014 that you can access here.
In a regression model the dummy coded variables are often called indicator variables. This is because dummy variables are coded as 0’s and 1’s, and this makes each variable like an individual on/off switch in a regression model. 0 = off, 1 = on.
Our model has the following variable operationalizations (coding):
Gender:
- Males = 0
- Females = 1
Group:
- Control (did not receive biofeedback training) = 0
- Experimental (received biofeedback training) = 1
Dependent Variable:
- Stress Score (a continuous variable, with scores ranging from 0 to 56, higher scores indicating greater stress)
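The coding scheme above could be applied to raw category labels along the following lines. This is only a minimal sketch: the participant records, the label spellings, and the `dummy_code` helper are hypothetical and made up purely for illustration.

```python
# Hypothetical raw records; the labels below are invented for illustration only.
participants = [
    {"gender": "male", "group": "control"},
    {"gender": "female", "group": "control"},
    {"gender": "male", "group": "experimental"},
    {"gender": "female", "group": "experimental"},
]

# Coding scheme from the text: the category coded 0 is the reference category.
GENDER_CODES = {"male": 0, "female": 1}
GROUP_CODES = {"control": 0, "experimental": 1}


def dummy_code(record):
    """Translate one record's category labels into 0/1 indicator values."""
    return {
        "Gender": GENDER_CODES[record["gender"]],
        "Group": GROUP_CODES[record["group"]],
    }


coded = [dummy_code(p) for p in participants]

for raw, row in zip(participants, coded):
    print(raw, "->", row)
```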
So, our model can be specified as follows:
Stress Score = β0 + β1 (Gender) + β2 (Group)
The value of β0, which is the intercept, is the mean (average) Stress score when the two predictors of Gender and Group are set to zero. In other words, the intercept would be the average stress score of our study participants when Gender = 0 and Group = 0.
We see that we coded our variables such that Males = 0 and Controls = 0. So the reference model is specified such that the intercept β0 gives us the mean Stress Score for males who did not receive biofeedback training. This makes sense, because if Gender and Group are equal to zero, then the β1 and β2 values (coefficients) would be eliminated by multiplication by the zeros, leaving only the equation:
Stress Score = β0
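The on/off behavior of the indicators can be sketched in a few lines of Python. The function name `active_terms` and the labels `b0`, `b1`, `b2` (standing in for β0, β1, β2) are illustrative choices, not part of any statistics package; the sketch simply lists which coefficients remain in the equation for each combination of Gender and Group.

```python
from itertools import product


def active_terms(gender, group):
    """Return the coefficients left in the equation for one Gender/Group combination."""
    terms = ["b0"]  # the intercept is always present
    if gender == 1:
        terms.append("b1")  # Gender switch is on
    if group == 1:
        terms.append("b2")  # Group switch is on
    return " + ".join(terms)


# Walk through all four combinations of the two indicators.
for gender, group in product((0, 1), repeat=2):
    print(f"Gender={gender}, Group={group}: Stress Score = {active_terms(gender, group)}")
```

With both indicators at 0, only `b0` survives, which matches the reduced equation above.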
Here is an example of a coefficients table from SPSS with some hypothetical numbers:
Coefficients (a)

| Model | | B (unstandardized) | Std. Error (unstandardized) | Beta (standardized) | t | Sig. |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | (Constant) | 27.576 | .811 | | 34.003 | .000 |
| | Gender | -2.674 | 1.265 | -.166 | -2.115 | .037 |
| | Group | -7.867 | 1.117 | -.552 | -7.044 | .000 |

a. Dependent Variable: Stress Score
The B values are the numbers we are interested in. The value of the constant, which is synonymous with the terms “intercept” and “β0”, is 27.58 (27.576 rounded to two decimals). This is the mean score when all of the predictors are set to zero, i.e. when the participant is male and did not have biofeedback, he will have a Stress Score of 27.58 on average:
Stress Score = β0 + β1 (0) + β2 (0)
Stress Score = 27.58
However, if our values for Gender and/or Group are 1, the switch is turned on for those particular predictors, and the coefficient terms are added to the model. This would then adjust the mean Stress Score accordingly.
If the participant were female, then the equation would look like this:
Stress Score = β0 + β1 (1) + β2 (0)
Stress Score = 27.576 – 2.674(1) = 24.90
The predictor is significant (p = .037), so females score on average about 2.7 points lower on the Stress Score than males. This comparison holds when both are in the control group, because the indicator is still switched off for the variable of Group, and we know that we set Group = 0 as the Control Group.
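The same arithmetic can be written as a small Python function. This is a sketch rather than SPSS output: the function name `predicted_stress` and the constant names are illustrative, and the coefficients are the unrounded B values copied from the hypothetical table above.

```python
# Unstandardized coefficients (B) copied from the hypothetical coefficients table.
B0_CONSTANT = 27.576  # intercept: mean for males in the control group
B1_GENDER = -2.674    # added when Gender = 1 (female)
B2_GROUP = -7.867     # added when Group = 1 (received biofeedback training)


def predicted_stress(gender, group):
    """Predicted mean Stress Score for 0/1 indicator values of Gender and Group."""
    return B0_CONSTANT + B1_GENDER * gender + B2_GROUP * group


# Both switches off: male, control group.
print(f"Male, control:   {predicted_stress(0, 0):.2f}")
# Gender switch on, Group switch off: female, control group.
print(f"Female, control: {predicted_stress(1, 0):.2f}")
```

Passing other combinations of 0 and 1 to `predicted_stress` turns the corresponding switches on or off in the same way.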
Try it! Use the equation to predict the mean stress score when the indicators for both Gender and Group are switched on (when the participant is female and had biofeedback training). How about when you want to model males who received biofeedback training (turn on the switch for Group, and leave the switch off for Gender)?
Awesome stuff!