There are many ways to introduce this topic. One helpful way is to think about the usual formula for a regression model in a slightly different fashion. For example, when you consider a two-predictor regression equation in the usual way, you might express the predicted score on Y for each person in your study as follows:

$$Y' = a + b_1 X_1 + b_2 X_2$$
where the two X variables are the predictors and the letter a denotes the intercept (some folks use other letters, such as c or k, for the intercept, but it means the same thing; it's just a naming convention). When you use this equation to form predicted values, there's a pattern: you take the person's score on the first variable, multiply it by its b, add the next variable times its coefficient, and so on. The only term that does not follow this pattern is the intercept. Let's pretend, then, that we're interested in making the intercept follow this pattern as well. Let's assume that everyone in our study has a score on a new "variable" which has the value 1 for everyone in the data set. In equation form, it might look like this:

$$Y' = a(1) + b_1 X_1 + b_2 X_2$$
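To see that nothing changes numerically, here is a minimal sketch in Python. The coefficients and scores are made up purely for illustration (they come from no data set), and the variable names are my own.

```python
# Hypothetical intercept, coefficients, and one person's scores (illustration only).
a, b1, b2 = 2.0, 0.5, 1.5
x1, x2 = 4.0, 3.0

# Usual form: the intercept is added on its own.
usual = a + b1 * x1 + b2 * x2

# Same prediction, with the intercept treated as the coefficient
# of a constant "variable" that equals 1 for everyone.
coefficients = [a, b1, b2]
scores = [1.0, x1, x2]
as_sum_of_products = sum(b * x for b, x in zip(coefficients, scores))

print(usual, as_sum_of_products)
```

Both expressions give the same predicted score; the second simply writes every term, the intercept included, as "coefficient times score".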
We do this because it makes it much easier to express the information about the predictors in our model in matrix form; the only change is that we now include our "variable", a column of 1's, alongside the other "real" variables. To calculate the score matrix for use in models including the intercept, our raw data might look as follows (I'm using the raw data from the Pedhazur example which we used before; notice that the columns of this matrix are the variables reading, verbal, and achievement motivation, respectively):
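The layout of such a score matrix can be sketched in Python. The numbers below are made up for five hypothetical people; they are not the Pedhazur scores. I am also assuming the column of 1's is appended as the last column.

```python
# Made-up scores for five people (NOT the Pedhazur data).
# Columns: reading, verbal, achievement motivation.
raw = [
    [2, 1, 3],
    [4, 2, 5],
    [4, 1, 3],
    [1, 1, 4],
    [5, 3, 6],
]

# Append the constant "variable": a 1 for every person.
augmented = [row + [1] for row in raw]

for row in augmented:
    print(row)
```

Each row is still one person; the only addition is the trailing 1 that stands in for the intercept.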
We can now calculate a matrix which will look very much like a variance/covariance matrix or correlation matrix in that it will be symmetric. It is obtained by multiplying the transpose of the score matrix by the score matrix itself. This matrix, however, will be termed the "augmented sums of squares and cross-products" matrix because it includes our special column of 1's, and the elements where we are used to seeing variances, for example, are now sums of squares. Let's see how this looks for our sample data:
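As a rough sketch of the calculation, here is the same idea in Python on a tiny made-up data set (five hypothetical people, not the Pedhazur data). The helper name `sscp` is my own, and the column of 1's is assumed to sit in the last position.

```python
def sscp(rows):
    """Return X'X for a data matrix given as a list of rows."""
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(k)]
            for i in range(k)]

# Made-up scores: reading, verbal, achievement motivation, then the constant 1.
augmented = [
    [2, 1, 3, 1],
    [4, 2, 5, 1],
    [4, 1, 3, 1],
    [1, 1, 4, 1],
    [5, 3, 6, 1],
]

S = sscp(augmented)
for row in S:
    print(row)
```

With these numbers the last row comes out as the column sums (16, 8, 21) followed by the sample size (5), and the upper-left block holds the sums of squares and cross-products.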
We can now take this matrix and (as was done with the correlation matrix) divide by N, the sample size. (Notice that you can find this number, 20, as the diagonal element associated with the intercept variable.) That is, every element of the matrix is divided by 20.

Notice that the elements of the fourth row and column of this matrix are interesting in that they represent the means associated with reading, verbal, and achievement motivation, respectively (the diagonal element for the intercept itself becomes 1). Notice also that the other elements of the matrix have a meaning as well: they are just the sums of squares and cross-products associated with these variables, each divided by N. If I call these three variables R, V, and A, respectively, I might use sigma notation to describe this matrix as follows:

$$\frac{1}{N}\begin{bmatrix} \sum R^2 & \sum RV & \sum RA & \sum R \\ \sum RV & \sum V^2 & \sum VA & \sum V \\ \sum RA & \sum VA & \sum A^2 & \sum A \\ \sum R & \sum V & \sum A & N \end{bmatrix}$$
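If it helps to check this with numbers, here is a small Python sketch. The matrix is an augmented sums of squares and cross-products matrix for five hypothetical people (made-up values, not the Pedhazur data), with the intercept assumed to be in the last row and column.

```python
# Augmented SSCP matrix for made-up data on five people.
# Rows/columns: reading, verbal, achievement motivation, intercept.
S = [
    [62, 30, 72, 16],
    [30, 16, 38, 8],
    [72, 38, 95, 21],
    [16, 8, 21, 5],
]

# The sample size sits on the diagonal for the intercept.
n = S[-1][-1]

# Divide every element by N.
scaled = [[value / n for value in row] for row in S]

# The intercept row (minus its own diagonal element) now holds the means.
means = scaled[-1][:-1]
print(n, means)
```

After the division, the intercept row holds the three means, its diagonal element is 1, and the remaining elements are sums of squares and cross-products divided by N.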
There now, wasn't that fun?